C#: Keeping an XML file in sync with a data model

Let's say I have a List<Book> that a user can add books to and remove books from. The list is persisted to an XML file so that the data is available between sessions on a website.
Now, obviously this is not an ideal solution; a database would be much nicer. But this is for a school project with some rigid constraints. What I'm trying to figure out is whether there is a good way to "update" the XML file whenever the List<Book> changes, instead of simply rewriting the entire file each time.

No*, it is not possible to "update" XML without rewriting the whole file. The XML format does not allow for in-place modifications or even simple appends.
*) One can come up with clever ways of allowing some in-place updates (e.g. leaving extra white-space and caching file offsets of elements), but I'd recommend doing such things only for a personal entertainment project.
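In practice, then, the simplest robust approach is to reserialize the whole list on every change. A minimal sketch with XmlSerializer (the Book shape and file path here are assumptions, not from the question):

    using System.Collections.Generic;
    using System.IO;
    using System.Xml.Serialization;

    public class Book
    {
        public string Title { get; set; }
        public string Author { get; set; }
    }

    public static class BookStore
    {
        private static readonly XmlSerializer serializer = new XmlSerializer(typeof(List<Book>));

        // Rewrites the whole file on every change; fine for small lists.
        public static void Save(List<Book> books, string path)
        {
            using (var stream = File.Create(path))
                serializer.Serialize(stream, books);
        }

        public static List<Book> Load(string path)
        {
            if (!File.Exists(path))
                return new List<Book>();

            using (var stream = File.OpenRead(path))
                return (List<Book>)serializer.Deserialize(stream);
        }
    }

If you are worried about a crash halfway through a write, serialize to a temporary file first and then replace the old one.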

Related

What is the easiest way to store and query data without using a database?

I am fairly new to ASP.Net MVC, which is why I could use some direction.
I am building a site for a client that is not using a database.
I have several (~20) YouTube videos I would like to embed. The client is no longer producing these videos, and the list will not be updated often. I have created a template view for the video and its information. I would like to set up a model that can query a YouTube video from the data set.
My initial thought is to create a JSON file and a model class to query the information. Is that the best way to accomplish this?
JSON seems like a great idea to me. With only about 20 records total, you're near the point where it doesn't even make sense to be data-driven: just have 20 static pages with shared CSS and a Google Custom Search Engine for queries. However, I still tend to prefer relying on a data source whenever I can, and I like JSON for this.
JSON will work well here because you can use a *.js file that will be cached by most browsers, and you can execute your searches on the data without even needing to refresh the page. Especially if you're using a templating system like Knockout or Ember, this can be entirely a client-side application: no server code. Such an application would be very fast from the user's perspective, especially if you serve the template engine from a CDN, so that many users will already have it cached on first load.
You can use an XML document to store structured data, load it, and use XPath to query it (be mindful of XPath injection vulnerabilities). Or use the same XML to deserialize into a data model and use LINQ to query it.
(Btw, this is far from the only option - just the one and a half that come immediately to mind.)
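For example, the query side of the XML option might look roughly like this (the videos.xml layout and element names are assumptions for the sketch):

    using System.Linq;
    using System.Xml.Linq;
    using System.Xml.XPath;

    // Load once (e.g. at application start) and query the in-memory tree.
    var doc = XDocument.Load("videos.xml");

    // LINQ to XML: titles of all videos in a given category.
    var tutorials = doc.Descendants("Video")
                       .Where(v => (string)v.Element("Category") == "Tutorial")
                       .Select(v => (string)v.Element("Title"))
                       .ToList();

    // The same query via XPath; avoid splicing raw user input into the expression.
    var sameTitles = doc.XPathSelectElements("/Videos/Video[Category='Tutorial']")
                        .Select(v => (string)v.Element("Title"))
                        .ToList();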
I would put the data in a flat text file of my preferred format (personally, JSON as well), then deserialize it into a list of objects and use LINQ queries on it. Given the small amount of data in question, I would use a flat file instead of a database even if I had the option.
You could also use a resx file as part of the project, or the built-in settings as suggested in the comments. Regardless of how you do it, the amount of data is small enough that you may as well just read it into a collection in memory and then query that collection.
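A rough sketch of the JSON route with Json.NET (the Video class, its properties and the file name are assumptions):

    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using Newtonsoft.Json;

    public class Video
    {
        public string Id { get; set; }
        public string Title { get; set; }
        public string YouTubeKey { get; set; }
    }

    public class VideoRepository
    {
        private readonly List<Video> videos;

        public VideoRepository(string jsonPath)
        {
            // Read the whole file once; with ~20 records this is effectively free.
            videos = JsonConvert.DeserializeObject<List<Video>>(File.ReadAllText(jsonPath));
        }

        public Video Find(string id)
        {
            return videos.FirstOrDefault(v => v.Id == id);
        }

        public IEnumerable<Video> All()
        {
            return videos;
        }
    }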
Since it doesn't need to be updated very often, an easy approach would be to create a hard-coded list in code and generate the links from that. If you want to be able to update the links in the future without modifying code, then XML or JSON are likely your best bets.

How to use CSV files hierarchically as a database?

I use CSV files as a database in separate processes. I only store or read all the data into my DataGrid, with no relationships. Every record in every txt file has one unique number, starting from zero.
//While reaching countries, I read allcountries.txt,
//while reaching cities, I read allcities.txt,
//while reaching places, I read allplaces.txt.
But one country has many cities, and one city has many places. Yet I don't use any relationships. I want to, and I know there is a need for this. How can I read and write related data by adding one extra column to each text file?
And is it possible to access the data without SQL queries?
Text files don't have any mechanism for SELECTs or JOINs. You'll be at a pretty steep disadvantage.
However, LINQ gives you the ability to search through object collections in SQL-like ways, and you can certainly create entities that have relationships. The application (whatever you're building) will have to load everything from the text files at application start. Since your code will be doing the searching, it has to have the entities loaded (from text files) and in-memory.
That may or may not be a pleasant or effective experience, but if you're set on replacing SQL with text files, it's not impossible.
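For example, if each CSV row carries its own id plus its parent's id, the "relationship" becomes a LINQ join over the in-memory collections. A rough sketch (the column layout and delimiter are assumptions):

    using System;
    using System.IO;
    using System.Linq;

    public class Country { public int Id; public string Name; }
    public class City    { public int Id; public int CountryId; public string Name; }

    class Program
    {
        static void Main()
        {
            // Assumed layout: "id;name" for countries, "id;countryId;name" for cities.
            var countries = File.ReadLines("allcountries.txt")
                                .Select(l => l.Split(';'))
                                .Select(f => new Country { Id = int.Parse(f[0]), Name = f[1] })
                                .ToList();

            var cities = File.ReadLines("allcities.txt")
                             .Select(l => l.Split(';'))
                             .Select(f => new City { Id = int.Parse(f[0]), CountryId = int.Parse(f[1]), Name = f[2] })
                             .ToList();

            // "JOIN" in memory: all cities of a given country.
            var citiesOfTurkey =
                from country in countries
                join city in cities on country.Id equals city.CountryId
                where country.Name == "Turkey"
                select city.Name;

            foreach (var name in citiesOfTurkey)
                Console.WriteLine(name);
        }
    }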
CSV files are good for mass input and mass output. They are not good for point manipulations or maintaining relationships. Consider using a database. SQLite might be something useful in your application.
Based on your comments, it would make more sense to use XML instead of CSV. This meets your requirements for being human and machine readable, and XML has nice built in libraries for searching, manipulating, serializing etc.
You can run SQL queries against CSV files: see "How to use SQL against a CSV file". I have done it for reading but never for writing, so I don't know whether that part will work for you.

Search in thousands of XML files

I have around 50000 XML files with a size of 50KB per file. I want to search for data in these files, but my solution so far is very slow. Is there any way to enhance the search performance?
You could use Lucene.NET, a lightweight, fast, flat file search indexing engine.
See http://codeclimber.net.nz/archive/2009/09/02/lucene.net-your-first-application.aspx for a getting started tutorial.
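A rough sketch of what indexing and searching the files could look like (this assumes the Lucene.NET 3.x API; class names and signatures differ a little between versions, and the paths here are placeholders):

    using System;
    using System.IO;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.QueryParsers;
    using Lucene.Net.Search;
    using Lucene.Net.Store;
    using Version = Lucene.Net.Util.Version;

    class XmlSearchIndex
    {
        static void Main()
        {
            var indexDir = FSDirectory.Open(new DirectoryInfo(@"C:\xmlindex"));
            var analyzer = new StandardAnalyzer(Version.LUCENE_30);

            // Build the index once (or incrementally); here the XML is indexed as plain text.
            using (var writer = new IndexWriter(indexDir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
            {
                foreach (var file in System.IO.Directory.EnumerateFiles(@"C:\xmlfiles", "*.xml"))
                {
                    var doc = new Document();
                    doc.Add(new Field("path", file, Field.Store.YES, Field.Index.NOT_ANALYZED));
                    doc.Add(new Field("content", File.ReadAllText(file), Field.Store.NO, Field.Index.ANALYZED));
                    writer.AddDocument(doc);
                }
            }

            // Search the index instead of opening 50,000 files.
            using (var searcher = new IndexSearcher(indexDir, true))
            {
                var parser = new QueryParser(Version.LUCENE_30, "content", analyzer);
                foreach (var hit in searcher.Search(parser.Parse("foo"), 10).ScoreDocs)
                    Console.WriteLine(searcher.Doc(hit.Doc).Get("path"));
            }
        }
    }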
You can always index the content of the files into a database and perform the search there. Databases are pretty performant when it comes to search.
I am assuming you are using Windows; you can use Windows Desktop Search to search the files quickly. You would be using the Windows index, which is updated whenever a file changes. The SDK is available here and can be used from .NET.
A lot depends on the nature of these XML files. Are they just 50,000 XML files that won't be re-generated? Or are they constantly changing? Are there only certain elements within the XML files you want to index for searching?
Certainly opening 50k file handles, reading their contents, and searching for text is going to be very slow. I agree with Pavel: putting the data in a database will yield a big performance gain, but if your XML files change often, you will need some way to keep them synchronized with the database.
If you want to roll your own solution, I recommend scanning all the files and creating a word index. If your files change frequently, you will also want to keep track of each file's "last modified" date, and if a file has changed more recently than your index, update the index. This way you'll have one ginormous word index, and if the search is for "foo", the index will reveal that the word can be found in file39209.xml, file57209.xml and file01009.xml. Depending on the nature of the XML, you could even store the elements in the index file (which would, in essence, be like flattening all of your XML files into one).
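A deliberately naive sketch of such a word index (the tokenization is a placeholder; a real one would also persist the index and track last-modified dates):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;

    class WordIndex
    {
        // word -> set of files containing it
        private readonly Dictionary<string, HashSet<string>> index =
            new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

        public void AddFile(string path)
        {
            // Index only the text content, not the markup; tune the split characters as needed.
            var text = XDocument.Load(path).Root.Value;
            foreach (var word in text.Split(new[] { ' ', '\t', '\r', '\n', '.', ',', ';' },
                                            StringSplitOptions.RemoveEmptyEntries))
            {
                HashSet<string> files;
                if (!index.TryGetValue(word, out files))
                    index[word] = files = new HashSet<string>();
                files.Add(path);
            }
        }

        public IEnumerable<string> FilesContaining(string word)
        {
            HashSet<string> files;
            return index.TryGetValue(word, out files) ? files : Enumerable.Empty<string>();
        }
    }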
You could spin up a Splunk instance and have it index your files. It's billed mostly as a log parser but would still serve your needs. It tokenizes files into words, indexes those words, and provides both a web-based and a CLI-based search tool that supports complex search criteria.
Use an XML database. The usual recommendations are eXist if you want open source, MarkLogic if you want something commercial, but you could use SQL Server if being Microsoft matters to you and you don't want the ultimate in XML capability. And there are plenty of others if you want to evaluate them. All database products have a steep learning curve, but for these data volumes, it's the right solution.

How to use DataContractSerializer efficiently with this use case?

I want to use the powerful DataContractSerializer to write data to or read data from an XML file.
But as I understand it, DataContractSerializer can only read or write an entire structure or list of structures.
My use case is described below... I cannot figure out how to optimize performance using this API.
I have a structure named "Information" and a List<Information> with an unpredictable number of elements.
The user may update or add new elements to this list very often.
On every operation (add or update), I must serialize all the elements in the list to the same XML file.
So I will write the same data to XML again even when it has not been modified. It does not make sense, but I cannot find any approach to avoid it.
Due to the tombstoning mechanism, I must save all the information within 10 seconds.
I'm worried about the performance, and that it might make the UI lag...
Is there any workaround to partially update or add information to the XML file with DataContractSerializer?
DataContractSerializer can be used to serialize selected items - what you need to do is come up with a scheme to identify changed data and a way to serialize it efficiently. For example, one approach could be:
You start by serializing the entire list of structures to a file.
Whenever an object is added/updated/removed from the list, you create a diff object that identifies the kind of change and the object changed. Then you serialize this object to XML and append the XML to the file.
While reading the file, you apply similar logic: first read the list and then apply the diffs one after another.
Because you want to continuously append to the file, it shouldn't have a root element. In other words, the file with the diff info will not be a valid XML document; it will contain a series of XML fragments. To read it, you have to enclose these fragments in an XML declaration and a root element.
You may use some background task to periodically write the entire list out again, generating a valid XML file; at that point you can discard the diff file. The idea is to mimic a transactional system - one data structure holding the serialized/saved info, and another structure containing the changes (akin to a transaction log).
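A rough sketch of the diff-log idea (the InformationDiff type and the file handling are assumptions; on the phone you would open the file through isolated storage rather than a raw path):

    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.Serialization;
    using System.Xml.Linq;

    // Placeholder for the question's type; the real members come from the app.
    [DataContract]
    public class Information
    {
        [DataMember] public string Name { get; set; }
    }

    [DataContract]
    public class InformationDiff
    {
        [DataMember] public string Kind { get; set; }   // "Add", "Update" or "Remove"
        [DataMember] public Information Item { get; set; }
    }

    public static class DiffLog
    {
        private static readonly DataContractSerializer serializer =
            new DataContractSerializer(typeof(InformationDiff));

        // Append one diff as a bare XML fragment; the file is not a valid document on its own.
        public static void Append(string path, InformationDiff diff)
        {
            using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write))
                serializer.WriteObject(stream, diff);   // no XML declaration is written
        }

        // Wrap the fragments in a root element so they can be read back as one document.
        public static IEnumerable<InformationDiff> ReadAll(string path)
        {
            var doc = XDocument.Parse("<diffs>" + File.ReadAllText(path) + "</diffs>");
            foreach (var element in doc.Root.Elements())
            {
                using (var reader = element.CreateReader())
                    yield return (InformationDiff)serializer.ReadObject(reader);
            }
        }
    }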
If performance is a concern, then consider using something other than DataContractSerializer.
There is a good comparison of the options at
http://blogs.claritycon.com/kevinmarshall/2010/11/03/wp7-serialization-comparison/
If the size of the list is a concern, you could try breaking it into smaller lists. The most appropriate way to do this will depend on the data in your list and typical usage/edit/addition patterns.
Depending on the frequency with which the data is changed you could try saving it whenever it is changed. This would remove the need to save it in the time available for deactivation.

Reading and Writing XML as relational data - best practices

I'm supposed to do the following:
1) read a huge (700 MB, ~10 million elements) XML file;
2) parse it, preserving order;
3) create one or more text files with SQL insert statements to bulk load it into the DB;
4) read the relational tuples and write them back out as XML.
I'm here to exchange some ideas about the best (== fast fast fast...) way to do this. I will use C# 4.0 and SQL Server 2008.
I believe that XmlTextReader is a good start. But I do not know if it can handle such a huge file. Does it load the whole file when it is instantiated, or does it hold just the current node in memory? I suppose I can do a while(reader.Read()) and that should be fine.
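Roughly what I have in mind (the element names are just placeholders):

    using System;
    using System.Xml;

    // The plan: walk the file forward with a pull reader, one node at a time,
    // and emit an INSERT per element of interest.
    using (var reader = XmlReader.Create("huge.xml"))
    {
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "record")
            {
                string id = reader.GetAttribute("id");
                Console.WriteLine("INSERT INTO Records (Id) VALUES ({0});", id);
            }
        }
    }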
What is the best way to write the text files? Since I should preserve the ordering of the XML (adopting some numbering scheme), I will have to hold some parts of the tree in memory to do the calculations etc... Should I iterate with a StringBuilder?
I will have two scenarios: one where every node (element, attribute or text) will be in the same table (i.e., will be the same object), and another where for each type of node (just these three types, no comments etc.) I will have a table in the DB and a class to represent the entity.
My last specific question is: how good is DataSet's ds.WriteXml? Will it handle 10M tuples? Maybe it's best to bring chunks from the database and use an XmlWriter... I really don't know.
I'm testing all this stuff... But I decided to post this question to hear from you guys, hoping your expertise can help me do these things more correctly and faster.
Thanks in advance,
Pedro Dusso
I'd use the SQLXML Bulk Load Component for this. You provide a specially annotated XSD schema for your XML with embedded mappings to your relational model. It can then bulk load the XML data blazingly fast.
If your XML has no schema, you can create one from Visual Studio by loading the file and selecting Create Schema from the XML menu. You will need to add the mappings to your relational model yourself, however. This blog has some posts on how to do that.
Guess what? You don't have a SQL Server problem. You have an XML problem!
Faced with your situation, I wouldn't hesitate. I'd use Perl and one of its many XML modules to parse the data, create simple tab- or other-delimited files to bulk load, and bcp the resulting files.
Using the server to parse your XML has many disadvantages:
Not fast, more than likely
Positively useless error messages, in my experience
No debugger
Nowhere to turn when one of the above turns out to be true
If you use Perl on the other hand, you have line-by-line processing and debugging, error messages intended to guide a programmer, and many alternatives should your first choice of package turn out not to do the job.
If you do this kind of work often and don't know Perl, learn it. It will repay you many times over.
