I want to use the powerful DataContractSerializer to read and write data to an XML file.
But as I understand it, DataContractSerializer can only read or write an entire structure, or a list of structures, at once.
My use case is described below; I cannot figure out how to optimize performance with this API.
I have a structure named "Information" and a List<Information> with an unpredictable number of elements.
Users may update or add elements to this list very often.
For each operation (add or update), I must serialize every element in the list to the same XML file.
So I end up writing unmodified data to the XML file again. That does not make sense, but I cannot find any way to avoid it.
Due to the tombstoning mechanism, I must save all the information within 10 seconds.
I'm worried about performance and possible UI lag.
Is there any workaround to partially update or append data in the XML file with DataContractSerializer?
DataContractSerializer can be used to serialize selected items - what you need to do is come up with a scheme to identify changed data and a way to serialize it efficiently. For example, one approach could be:
Start by serializing the entire list of structures to a file.
Whenever an object is added/updated/removed from the list, create a diff object that identifies the kind of change and the object changed. Then serialize this object to XML and append the XML to the file.
While reading the file, apply similar logic: first read the list, then apply the diffs one after another.
Because you want to continuously append to the file, you shouldn't have a root element in it. In other words, the file with the diff info will not be a valid XML document; it will contain a series of XML fragments. To read it, you have to enclose these fragments in an XML declaration and a root element.
You may use a background task to periodically write the entire list, generating a valid XML file. At that point, you can discard your diff file. The idea is to mimic a transactional system: one structure holding the serialized/saved info and another containing the changes (akin to a transaction log).
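A minimal sketch of that diff-log idea, assuming a hypothetical ChangeRecord wrapper around the Information type; here the reader side uses ConformanceLevel.Fragment instead of wrapping the fragments in a root element, which achieves the same effect:

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

[DataContract]
public class Information
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
}

// Hypothetical diff record: what changed and the affected object.
[DataContract]
public class ChangeRecord
{
    [DataMember] public string ChangeType { get; set; }  // "Add", "Update", "Remove"
    [DataMember] public Information Item { get; set; }
}

public static class ChangeLog
{
    static readonly DataContractSerializer serializer =
        new DataContractSerializer(typeof(ChangeRecord));

    // Append one diff as an XML fragment to the end of the log file.
    public static void Append(string logPath, ChangeRecord change)
    {
        using (var stream = new FileStream(logPath, FileMode.Append))
        using (var writer = XmlWriter.Create(stream, new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            ConformanceLevel = ConformanceLevel.Fragment
        }))
        {
            serializer.WriteObject(writer, change);
        }
    }

    // Replay the fragments one after another when reading the log back.
    public static IEnumerable<ChangeRecord> Read(string logPath)
    {
        using (var stream = File.OpenRead(logPath))
        using (var reader = XmlReader.Create(stream, new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment
        }))
        {
            while (reader.MoveToContent() == XmlNodeType.Element)
                yield return (ChangeRecord)serializer.ReadObject(reader);
        }
    }
}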
If performance is a concern, consider using something other than DataContractSerializer.
There is a good comparison of the options at
http://blogs.claritycon.com/kevinmarshall/2010/11/03/wp7-serialization-comparison/
If the size of the list is a concern, you could try breaking it into smaller lists. The most appropriate way to do this will depend on the data in your list and typical usage/edit/addition patterns.
Depending on the frequency with which the data is changed, you could try saving it whenever it changes. This would remove the need to save everything in the time available for deactivation.
I know this sounds kind of confusing, but I was wondering if there is a way to maintain the structure of the file while editing it, whether that means adding data at some point in the file or editing a value at a certain position.
What I do right now to edit binary files is code the parser with the BinaryReader class (in C#), reading a certain structure with reader.ReadSingle, ReadInt32, and so on.
Then I write the exact same thing with BinaryWriter, which seems kind of inefficient, and I might make mistakes that introduce differences between the reader and the writer, making the format inconsistent.
Is there any way to define the file structure once and have reading and writing handled automatically from that single format definition? Or to open a file, edit some of its values (or add new ones, since it's not a fixed format; reading it would imply some for loops, for example), and save those changes?
I hope I explained myself in a slightly understandable way.
If you want to insert new data into a binary file, you have three options:
Move everything from that point forward down a bit so that you make space for the new data.
Somehow mark the existing data as no longer relevant (i.e. a deleted flag), and add the new data at the end of the file.
Replace the existing data with a pointer to another location in the file (typically the end of the file) where the new data is stored.
The first method requires rewriting the entire file.
The second method can work well if it's a file of records, for example, and if you don't depend on the order of records in the file. It becomes more difficult if the file has a complex structure of nested records, etc. It has the drawback of leaving a lot of empty space in the file.
The third method is similar to the second, but works well if you're using random access rather than sequential access. It still ends up wasting space in the file.
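As an illustration of the first method, here is a rough sketch (not production code) of inserting bytes at an offset by buffering and rewriting everything after the insertion point; for very large files you would copy the tail in chunks rather than holding it all in memory:

using System.IO;

static void InsertBytes(string path, long offset, byte[] newData)
{
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
    {
        // Buffer everything from the insertion point to the end of the file.
        fs.Position = offset;
        byte[] tail;
        using (var ms = new MemoryStream())
        {
            fs.CopyTo(ms);
            tail = ms.ToArray();
        }

        // Write the new data at the insertion point, then re-append the old tail.
        fs.Position = offset;
        fs.Write(newData, 0, newData.Length);
        fs.Write(tail, 0, tail.Length);
    }
}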
I have a requirement for a low memory footprint while processing XML files upwards of 400-500 MB in size. This means I can afford to have the file loaded in memory only once at any point in time (e.g. in a string object). The data structure is such that the elements nest only a few levels deep, but there are many of them (i.e. many rows of data, grouped into only a couple of levels).
During processing, I need to forward some of the data directly (i.e. exactly as read from the file, unicode character-for-character) to another stream. In other parts of the file, I need to remove/add information (usually in the form of attribute values) and possibly forward the result in a byte-consistent way to another stream (i.e. removing or adding data the same way will always produce the same result).
I've looked into XmlReader and XmlTextReader but they don't provide a way to get the exact text of the node that was Read(). Am I missing something?
Let's say I have a List<Book> that a user can add/remove books to/from. The list is persisted to an XML file so that the data is available between sessions on a website.
Now, obviously this is not an ideal solution, a database would be much nicer. But this is for a school project with some rigid constraints. What I'm trying to figure out is if there is a good way to "update" the xml file whenever the List<Book> changes instead of simply rewriting the entire file each time.
No*, it is not possible to "update" XML without rewriting the whole file. The XML format does not allow in-place modification or even a simple append.
*) One can come up with clever ways of allowing some in-place updates (e.g. leaving extra whitespace and caching file offsets of elements), but I'd recommend doing such things only for a personal entertainment project.
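In practice that means rewriting the file whenever the list changes; a minimal sketch using XmlSerializer (the Book type here is just a placeholder, and this is only one of several ways to serialize the list):

using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Book
{
    public string Title { get; set; }
    public string Author { get; set; }
}

public static class BookStore
{
    static readonly XmlSerializer serializer = new XmlSerializer(typeof(List<Book>));

    // Rewrites the whole file every time the list changes.
    public static void Save(string path, List<Book> books)
    {
        using (var stream = File.Create(path))
            serializer.Serialize(stream, books);
    }

    public static List<Book> Load(string path)
    {
        if (!File.Exists(path))
            return new List<Book>();
        using (var stream = File.OpenRead(path))
            return (List<Book>)serializer.Deserialize(stream);
    }
}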
I am trying to do a merge sort on sorted chunks of XML files on disk. There is no chance they all fit in memory. My XML files consist of records.
Say I have n XML files. If I had enough memory, I would read the entire contents of each file into a corresponding queue (one queue per file), compare the timestamps on the items at the front of each queue, and output the one with the smallest timestamp to another file (the merge file). This way, I merge all the little files into one big file with all the entries sorted by time.
The problem is that I don't have enough memory to read all the XML with .ReadToEnd and then pass it to XDocument.Parse.
Is there a clean way to read just enough records to keep each of the queues filled for the next pass that compares their "TimeStamp" XElement attributes, remembering which XElement it has already read from disk?
Thank you.
An XmlReader is what you are looking for.
Represents a reader that provides fast, non-cached, forward-only access to XML data.
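A rough sketch of how that could look, assuming each chunk file is shaped like <Records><Record TimeStamp="...">...</Record>...</Records>, the TimeStamp attribute parses as a DateTime, and each file is already sorted by TimeStamp (all of these names are assumptions for the example):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class XmlMerge
{
    // Streams one record at a time; only the current record is held in memory.
    static IEnumerable<XElement> StreamRecords(string path)
    {
        using (var reader = XmlReader.Create(path))
        {
            reader.MoveToContent();   // position on the root element
            reader.Read();            // move into its children
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Record")
                    yield return (XElement)XNode.ReadFrom(reader); // advances past the element
                else
                    reader.Read();
            }
        }
    }

    // K-way merge of the already-sorted chunk files into one output file.
    static void MergeChunks(IEnumerable<string> chunkPaths, string outputPath)
    {
        var sources = chunkPaths
            .Select(p => StreamRecords(p).GetEnumerator())
            .Where(e => e.MoveNext())
            .ToList();

        using (var writer = XmlWriter.Create(outputPath, new XmlWriterSettings { Indent = true }))
        {
            writer.WriteStartElement("Records");
            while (sources.Count > 0)
            {
                // Pick the source whose current record has the smallest timestamp.
                var next = sources
                    .OrderBy(e => (DateTime)e.Current.Attribute("TimeStamp"))
                    .First();
                next.Current.WriteTo(writer);
                if (!next.MoveNext())
                {
                    next.Dispose();
                    sources.Remove(next);
                }
            }
            writer.WriteEndElement();
        }
    }
}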
It has fallen out of fashion, but this is exactly the problem solved by SAX. It is the Simple API for XML, and is based on callbacks. You launch a read operation, and your code gets called back for each record. This may be an option, as it does not require the program to load the entire XML file (à la XmlDocument). Google SAX.
If you like the LINQ to XML API, this CodePlex project may suit your needs.
Before I begin, here's a small overview of what I'm trying to achieve, and then we'll get down to the gory details. At present I'm developing an application which will monitor a user's registry for changes to specific keys which relate to user preferences. We're currently using mandatory profiles (not my choice); the whole idea is to record the changes to a location from which they can be written back to the user's registry the next time they log on.
At the moment I have the system monitoring registry changes and firing events returning the key, value name and value that have changed. I was combining these into a single string per change and adding it to a list, then writing that list to a text file every so often. This has all been fine, but I need to change the way the data is held, as breaking the strings back down into key, value name and value for the write back to the registry requires too much overhead, and there are also problems breaking the strings up in a uniquely identifiable fashion.
So it was suggested that I look at XML, which I haven't used before. I've begun investigating it and it all looks simple enough, and I've also used LINQ before to connect to embedded databases. What I'm currently struggling to get my head around is how LINQ is able to retrieve and manipulate XML data in memory, as I don't want to be constantly accessing the XML file; I need to keep the application as quick as possible. At present all registry changes are cached in a List(String) and then written to a text file every minute or so.
At the moment the system returns the key, value name and value in separate strings and merges these into a single List(String) entry, whereas what I'm going to need is a table or equivalent representing a key, which contains multiple value names, with each value name holding a single value and finally a type (a number representing what kind of registry value this is: REG_SZ, REG_BINARY, etc.). I need this both in the XML file and in the program itself.
Also, what I don't quite get is that, unlike a database, the tables and their schemas won't exist until the program first runs, as it will create a new XML file rather than use an existing one. This is because the information is written back to the user's personal drive, so the file has to be created when the program first runs on the user's machine.
I've tried a few links and tutorials, etc., but nothing has clicked just yet, so if you have an example or could explain it a little better, it would be appreciated.
One final thing to add: my current idea for storing the data in the program is a List of values embedded in a List of value names, and a List of value names embedded in a List of keys. Does that sound OK?
I know this is long and kind of all over the place, so any help would be appreciated. If you require further information or clarification, please let me know and I'll do my best.
Thanks
From what I understand, you are just thinking in the wrong direction. Your application does not need to manipulate XML in memory. You just want to work with some data structure in memory and would like an easy way to store it to disk and read it back? If I understand that correctly:
Don't worry about LINQ to XML. Just have a look at the built-in XML serialization infrastructure. Build an internal data structure which fits your application's needs and use an XmlSerializer to write it to disk and read it back. No need to touch any XML by hand!
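A minimal sketch (in C#) of that approach, with hypothetical types mirroring the key / value name / value / type shape described in the question:

using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class RegistryValueEntry
{
    public string Name { get; set; }   // value name
    public string Data { get; set; }   // value data
    public int Kind { get; set; }      // numeric code for REG_SZ, REG_BINARY, etc.
}

public class RegistryKeyEntry
{
    public string KeyPath { get; set; }
    public List<RegistryValueEntry> Values { get; set; }

    public RegistryKeyEntry()
    {
        Values = new List<RegistryValueEntry>();
    }
}

public static class ChangeStore
{
    static readonly XmlSerializer serializer =
        new XmlSerializer(typeof(List<RegistryKeyEntry>));

    // Creates the XML file on first save; no schema has to exist up front.
    public static void Save(string path, List<RegistryKeyEntry> keys)
    {
        using (var stream = File.Create(path))
            serializer.Serialize(stream, keys);
    }

    public static List<RegistryKeyEntry> Load(string path)
    {
        if (!File.Exists(path))
            return new List<RegistryKeyEntry>();
        using (var stream = File.OpenRead(path))
            return (List<RegistryKeyEntry>)serializer.Deserialize(stream);
    }
}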
From what you describe it does seem like a good idea to use XML here.
As for accessing the XML data in memory, I found the MSDN documentation quite helpful:
http://msdn.microsoft.com/en-us/library/bb387098.aspx
The basic idea is that LINQ to XML is just LINQ to Objects, working with objects that represent the XML elements.
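For example, a small sketch of building an element tree in memory and querying it with ordinary LINQ (the element and attribute names here are made up for illustration):

using System.Linq;
using System.Xml.Linq;

var doc = new XDocument(
    new XElement("Keys",
        new XElement("Key", new XAttribute("path", @"Software\Example"),
            new XElement("Value",
                new XAttribute("name", "Theme"),
                new XAttribute("type", 1),   // e.g. a code standing for REG_SZ
                "Dark"))));

// LINQ to XML is just LINQ to Objects over XElement/XAttribute instances.
var themeValues =
    from key in doc.Root.Elements("Key")
    from value in key.Elements("Value")
    where (string)value.Attribute("name") == "Theme"
    select (string)value;

// Nothing touches the disk until you decide to save.
doc.Save("changes.xml");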
I'm afraid I don't quite get your second question.