I've got a scenario where I need to back up thousands of rather small (1-3 KB) files (in Azure Blob Storage, but that's not the point), which I have as a list of these models (whose source code I don't own - it's the Azure SDK):
class Model
{
    public string Name { get; set; }
    public Stream Data { get; }
    // bunch of other things I'd like to ignore
}
My best attempt so far (from a performance point of view) is merging them all into a single file using JsonTextWriter from Json.NET, but obviously I don't need a text format here and it introduces a lot of overhead. I'm wondering if there is any (binary?) serializer available which won't require me to decorate the existing model with specific attributes and will also have a nice API, including something like
var writer = new MagicWriterThatImLookingFor();
foreach (var model in models)
{
    writer.WriteString(model.Name);
    writer.WriteByteArray(model.Data.ToArray());
}
with a corresponding deserializer?
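If nothing off the shelf fits, a plain BinaryWriter/BinaryReader pair already gives roughly this API with no attributes required. A sketch under stated assumptions: the `Model` here is a hypothetical stand-in (the real SDK type has no setter on `Data`), and each payload is assumed to fit in memory one record at a time:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in for the SDK model above (the real Data property has no setter).
public class Model
{
    public string Name { get; set; }
    public Stream Data { get; set; }
}

public static class BlobBackup
{
    // Each record is written as: the name (BinaryWriter.Write(string) already
    // length-prefixes it), then the payload byte count, then the payload bytes.
    public static void Write(Stream output, IEnumerable<Model> models)
    {
        using (var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true))
        {
            foreach (var model in models)
            {
                using (var buffer = new MemoryStream())
                {
                    model.Data.CopyTo(buffer);
                    byte[] bytes = buffer.ToArray();
                    writer.Write(model.Name);
                    writer.Write(bytes.Length);
                    writer.Write(bytes);
                }
            }
        }
    }

    // Reads records back in the same order they were written.
    public static List<Model> Read(Stream input)
    {
        var models = new List<Model>();
        using (var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            while (input.Position < input.Length)
            {
                string name = reader.ReadString();
                int length = reader.ReadInt32();
                models.Add(new Model { Name = name, Data = new MemoryStream(reader.ReadBytes(length)) });
            }
        }
        return models;
    }
}
```

Because each record is length-prefixed, this also streams nicely: only one blob is held in memory at a time on either side.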
I can use the .Net ConfigurationManager to store strings, but how can I store structured data?
For example, I can do this:
var conf = ConfigurationManager.OpenExeConfiguration(...);
string s = "myval";
conf.AppSettings.Settings["mykey"].Value = s;
conf.Save(ConfigurationSaveMode.Modified);
And I would like to do this:
class myclass
{
    public string s;
    public int i;
    // ... more elements
};

myclass c = new myclass();
c.s = "mystring";
c.i = 1234;
// ...
conf.AppSettings.Settings["mykey"] = c;
conf.Save(ConfigurationSaveMode.Modified);
How do I store and retrieve structured data with the ConfigurationManager?
I implemented a solution as @sll suggested. The difficulty was then creating a new section in the configuration. Here is how this is done:
How to Write to a User.Config file through ConfigurationManager?
You can create your own configuration section type by inheriting from the ConfigurationSection class and use it to save/load any custom type information.
MSDN: How to: Create Custom Configuration Sections Using ConfigurationSection
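A minimal sketch of such a section (the property names userName and retryCount are purely illustrative, and the section still has to be registered under <configSections> in the .config file):

```csharp
using System.Configuration;

// Immutable custom section: values come from the .config file; no public setters.
public class MyConfigurationSection : ConfigurationSection
{
    [ConfigurationProperty("userName", DefaultValue = "")]
    public string UserName
    {
        get { return (string)this["userName"]; }
    }

    [ConfigurationProperty("retryCount", DefaultValue = 3)]
    public int RetryCount
    {
        get { return (int)this["retryCount"]; }
    }
}
```

At runtime you would load it with ConfigurationManager.GetSection("mySectionName") and cast the result to MyConfigurationSection.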
BTW, one piece of advice which might be helpful for you or others: it is good practice to make the custom configuration section class immutable (no public setters), so you can be sure the configuration cannot be changed at any stage of the application life cycle. But if you then decide to write unit tests for code which relies on the configuration section class, and need a section stub with some test values, you might get stuck, since there are no setters to use. The solution is to provide a new class which inherits from your section class and sets the values in its constructor using the protected indexer, as shown below:
public class TestSectionClass : MyConfigurationSection
{
    public TestSectionClass(string testUserName)
    {
        this["userName"] = testUserName;
    }
}
Serialization.
There are numerous ways of serializing data, so you'd need to pick one. .NET provides serialization APIs that suit a great many cases, and in working with AJAX calls recently I've found myself using JavaScriptSerializer heavily to turn things into JSON. There are also third-party libraries such as protobuf-net, and so on.
The key here is essentially to turn your data into a byte or string representation that can later be deserialized back to its original structure, allowing you to store it in some medium in the meantime, such as a configuration file, or to transmit it over a network.
As per @sll's answer, .NET can also handle serialization of data in and out of custom configuration sections; whether you want to start specifying types explicitly for this purpose is your call. The bottom line is the same: serialize, somehow.
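For example, a JSON round trip with the built-in DataContractJsonSerializer might look like this (Person is just an illustrative type, not something from the question):

```csharp
using System.IO;
using System.Runtime.Serialization.Json;
using System.Text;

// Illustrative type to serialize; plain get/set properties are enough.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class JsonHelper
{
    // Turn a value into its JSON string representation.
    public static string Serialize<T>(T value)
    {
        var serializer = new DataContractJsonSerializer(typeof(T));
        using (var ms = new MemoryStream())
        {
            serializer.WriteObject(ms, value);
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    // Turn the JSON string back into the original structure.
    public static T Deserialize<T>(string json)
    {
        var serializer = new DataContractJsonSerializer(typeof(T));
        using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            return (T)serializer.ReadObject(ms);
        }
    }
}
```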
I'm thinking of a mid/large-scale project which will need to store different data types and present them to different clients.
What I'm struggling with now is how to build a data and service layer capable of storing different types of objects and querying them when needed.
As an example, think of a client-server application in which clients can only read each individual server's broadcasts, and now think of a scenario where a flower shop and a restaurant broadcast their data to a person on the street with a smartphone.
class SoCalledServer
{
    public AccessibleClientData Broadcast(ClientData broadcastMessage)
    {
        Broadcast(broadcastMessage);
    }
}
class RestaurantClient : AbstractClient
{
    public SomeGenericDataType menu;

    public RestaurantClient()
    {
        menu = new SomeGenericDataType<List>();
        menu.Add("Sushi");
        menu.Add("Fried potatoes");
    }

    public override void BeginBroadcast()
    {
        server.Broadcast(menu);
    }
}
class FlowerShopClient : AbstractClient
{
    public SomeGenericDataType flowersOnSale;

    public FlowerShopClient()
    {
        flowersOnSale = new SomeGenericDataType<List>();
        flowersOnSale.Add("Daisy");
        flowersOnSale.Add("Rose");
    }

    public override void BeginBroadcast()
    {
        server.Broadcast(flowersOnSale);
    }
}
In this example, I have two different types of data (one is a restaurant's menu, the other is a flower shop's stock of flowers), each of which can have members of its own (e.g. the menu has prices and ingredients; the flower shop's data has flower names and a description, quantity and/or price, etc.), and these "client" examples can be extended.
How should I model this type of application? What kind of database schema should I use to store unidentified and varied types of data? How should my server and client applications communicate with each other? And most importantly, how should the client get the broadcast data type (from the generic type)?
How will this service manipulate the data? Will it only save it? Will it do some computations with it? Your description is too generic.
Assuming you only want to write/persist/read data, you can simply save strings and let clients do the parsing themselves. You can query based on an id. Key/value and document databases work like this.
For anything more, you should think about what the responsibility of the service should be and design the internal structure accordingly.
Another idea is to serialize/deserialize them as XML or JSON. Some hints:
// get stuff here
string json = GetJsonString(expression);
List<T> result;
using (var ms = new MemoryStream(Encoding.Unicode.GetBytes(json)))
{
    var serializer = new DataContractJsonSerializer(typeof(List<T>));
    result = (List<T>)serializer.ReadObject(ms);
}
Or XML:
http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.aspx
You can convert the objects to XML/JSON for transmission. Then, for storing, deserialize them as anonymous objects (with no specified class) on the side where the class is unknown.
With this, you always have the possibility to store all of the data even when the classes are unknown. Everywhere, every time.
I have an MVC web app where users upload a text file and I parse it out.
The requirement has just changed and they will now be uploading multiple files of the same kind. I parse a single file by sending a file path to the method below, ReadParts, which opens a stream and calls the method parseReplicateBlock to retrieve the desired fields. For multiple files I could read all the files into one big stream, but I am afraid it could exceed the buffer limit, etc.
So I am thinking of parsing file by file and populating the results into an object. My requirement then is to sort the records based on a date field.
I just need some help writing the method ReadLogFile in a better way, especially for sorting based on initiationDate and initiationTime. I want to find the minimum record based on initiationDate and initiationTime and then do some other logic.
The problem is that if I sort the list member within the object, I lose the positioning of the other records.
You appear to be storing each field of the record in a separate collection within LogFile. This seems a very strange way to store your data.
If you sort one of these collections, then of course it will no longer bear any relationship to the other fields, since they are unrelated. There is also huge scope for bugs if you are relying on all the collections tallying up (e.g. if a field is missing from one of the parsed records).
Instead you should have a class that represents a SINGLE record, and then LogFile has a SINGLE collection of those records, e.g.:
public class ReplicateBlock
{
    public string ReplicateId { get; set; }
    public string AssayNumber { get; set; }
    public DateTime InitiationDate { get; set; }
    // etc
}

public class LogFile
{
    public List<ReplicateBlock> ReplicateBlocks = new List<ReplicateBlock>();
}
I have to say that your code is very difficult to follow. The fact that all your functions are static makes me think that you're not particularly familiar with object-oriented programming. I would suggest getting a good book on the subject.
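To illustrate the point about sorting: once the records live in a single collection, finding the minimum record by date and time is a short LINQ query that leaves the original list order untouched. A sketch (the InitiationTime field is an assumption about the log format; adjust to the real fields):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ReplicateBlock
{
    public string ReplicateId { get; set; }
    public DateTime InitiationDate { get; set; }
    public TimeSpan InitiationTime { get; set; } // assumed field
}

public static class LogFileQueries
{
    // Earliest record by date, then time. OrderBy produces a new sequence,
    // so the source list keeps its original positions.
    public static ReplicateBlock Earliest(List<ReplicateBlock> blocks)
    {
        return blocks
            .OrderBy(b => b.InitiationDate)
            .ThenBy(b => b.InitiationTime)
            .FirstOrDefault();
    }
}
```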
I am implementing the Builder Pattern in order to generate a set of objects. These objects then have to be serialized to XML and deserialized from XML.
I know how to perform the serialization and deserialization however I am unsure how to integrate it into the design pattern.
For example suppose my code uses the builder to create products foo and bar. My first thought is to put a serialize function on each one because each product knows what to serialize.
My next thought was to put the deserialization in the Director or the ConcreteBuilder.
What I don't like about this is that the serialization and deserialization functions will then be in different places - one in the file containing the declarations of the foo and bar objects, and the other in the file for something else. I am worried that they might end up becoming out of sync with each other as I work on the product classes.
My final thought was for the Director or ConcreteBuilder to perform the serialization and deserialization. What I don't like about that is the products then have to know which builder was used or know who the Director is.
To clarify - there are two situations where a product can be created:
User clicks on a button in the user interface
User loads an XML project
Can you not simply have a static serialize/deserialize class and create a generic method that can take any type of object? Isn't the pattern simply for building the objects? You can then serialize as you wish?
Something like:
public static string Serialize<T>(T data)
{
    var xmlSerializer = new XmlSerializer(typeof(T));
    using (var sw = new StringWriter())
    {
        xmlSerializer.Serialize(sw, data);
        return sw.ToString();
    }
}
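The matching deserialize method is the mirror image; Serialize is repeated here so the sketch is self-contained:

```csharp
using System.IO;
using System.Xml.Serialization;

public static class XmlHelper
{
    // Serialize any XmlSerializer-compatible type to an XML string.
    public static string Serialize<T>(T data)
    {
        var xmlSerializer = new XmlSerializer(typeof(T));
        using (var sw = new StringWriter())
        {
            xmlSerializer.Serialize(sw, data);
            return sw.ToString();
        }
    }

    // Rebuild the object from its XML string.
    public static T Deserialize<T>(string xml)
    {
        var xmlSerializer = new XmlSerializer(typeof(T));
        using (var sr = new StringReader(xml))
        {
            return (T)xmlSerializer.Deserialize(sr);
        }
    }
}
```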
My current solution is to have the Product perform the serialization and the ConcreteBuilder perform the deserialization, then put both the Product and its ConcreteBuilder declarations into the same source file.
Although the task is spread across two classes it is at least kept together in one file.
Any better solutions are appreciated.
I need to model in memory a collection of web files, including the relationships between them. That is, file A (e.g. HTML) may have a link to file B (e.g. CSS) and file C (e.g. JavaScript). File D may also require file B. If I wanted to delete file A, I would need to make sure any files it uses (e.g. file B) are not also being used by another file (e.g. file D). Perhaps something like:
List<WebFile> list_of_webfiles
public class WebFile
- string url
- bool parentFile
public class FileRelationship
- private WebFile parentWebFile;
- private WebFile childWebFile;
QUESTION - What would be the best way to model this in C#? (e.g. which collection type, and how to model the relationships)
Note - it has to be modeled in memory (no database), and I need to be able to serialize it to XML too, to save it. An example of what I mean would be something that looked like this...
XmlSerializer serializer = new XmlSerializer(typeof(List<WebFile>));
TextWriter textWriter = new StreamWriter(CONFIG_FILE_PATH);
serializer.Serialize(textWriter, list_of_webfiles);
textWriter.Close();
Thanks
This seems to imply a hierarchical 'tree' relationship where you may have:
Class WebFile:
- URL : string
- Parent : WebFile
- Children : WebFile[] (could be a list depending on the need)
Then somewhere you have a
List<WebFile> webFiles;
This approach makes it easy to traverse the tree of web files and find the related ones, but harder to list all the files themselves.
Alternatively, you could store the list of files and the relationships separately:
Class WebFile
- URL : string
Class WebFileRelationship
- Parent : WebFile
- Child : WebFile
And you have 2 containers
List<WebFile> webFiles;
List<WebFileRelationship> relationships;
This approach makes it easy to list all the files or all the relationships, but harder to traverse the graph from a given file.
It all depends on your application, do you need more information about the individual files or the relationships?
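With the second model, the delete check from the question (which of file A's dependencies can go with it?) becomes a scan over the relationship list. A sketch only; the class shapes and the method name are assumptions:

```csharp
using System.Collections.Generic;
using System.Linq;

public class WebFile
{
    public string Url { get; set; }
}

public class WebFileRelationship
{
    public WebFile Parent { get; set; }
    public WebFile Child { get; set; }
}

public static class WebFileGraph
{
    // Children of 'file' that no other parent uses, and so can be
    // deleted along with it.
    public static List<WebFile> OrphanedByDelete(WebFile file, List<WebFileRelationship> relationships)
    {
        return relationships
            .Where(r => r.Parent == file)
            .Select(r => r.Child)
            .Distinct()
            .Where(child => !relationships.Any(r => r.Parent != file && r.Child == child))
            .ToList();
    }
}
```

So for the example in the question (A uses B and C, D also uses B), deleting A would orphan C but not B.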
The fact that you have duplicates (in terms of multiple files requiring B) means that it would be a pain to use the most obvious "requires" structure as a tree, since that would involve nesting B multiple times (under different parents). A few options:
keep the object-references in the object model, but only list the name (or some other reference) in the file; relatively simple to do, but requires fixups after deserialization
only list the name (or some other reference) in the relationship, and mirror this in the object model - i.e. "file.Parent" is a key, not another object
have the full object model, and use a graph serializer, such as DataContractSerializer with preserve-object-references enabled
I would probably choose between the last two; the last has "not very pretty" xml, but is relatively simple to implement. But I'd be tempted to just use the middle option, and have just the key references in the object model, i.e.
[XmlType("file"), XmlRoot("file")]
public class File
{
    [XmlAttribute("name")]
    public string Name { get; set; }

    [XmlElement("ref")]
    public List<string> References { get; set; }

    public File() { References = new List<string>(); }
}
maybe not pure OO, but simple to do. It also avoids the need to duplicate data: if you store it just like the above, you can always scan to see "what uses this file" (with some indexing if you need it). But trying to maintain relationships in both directions (i.e. a "UsedBy" list as well) is a nightmare.
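That "what uses this file" scan is then a single LINQ filter over the forward references; a sketch reusing the File class above (the FileIndex helper is an invented name):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Serialization;

[XmlType("file"), XmlRoot("file")]
public class File
{
    [XmlAttribute("name")]
    public string Name { get; set; }

    [XmlElement("ref")]
    public List<string> References { get; set; }

    public File() { References = new List<string>(); }
}

public static class FileIndex
{
    // Every file whose reference list names the given file.
    public static List<File> UsersOf(IEnumerable<File> files, string name)
    {
        return files.Where(f => f.References.Contains(name)).ToList();
    }
}
```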