I've looked at FileHelpers v2.0, but there is a serious problem with it: I cannot define a class that maps to the record in the source/destination file.
The reason is that I don't know what file I'm going to get. A big part of my program is mapping the file's fields to the database's fields... I don't know how many fields there will be, nor which ones will need to be imported.
I have no intention of rolling my own lib, especially since I have no control over the files that are going to be fed to my program.
Any solutions to this?
Dennis
Check out the Fast CSV reader on CodeProject. It helped me with my project a while ago. It's really easy to use, and is quite good.
You can use ADO.NET to directly read the .CSV file into a DataTable. If you don't know how many fields will exist in advance, this can be a useful means of working with the data. This also has the advantage of not requiring any external libraries.
For details, please see Deborah Kurata's article on the subject.
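For comparison, here is a minimal sketch in Python (not ADO.NET, and not the Fast CSV reader) of the same idea: the header row is read at run time, and a mapping from file columns to database columns, which in Dennis's program would come from the user, decides which fields are imported. The `read_mapped_rows` function and the `field_map` parameter are hypothetical names used only for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_mapped_rows(lines: Iterable[str], field_map: Dict[str, str]) -> List[Dict[str, str]]:
    """Read CSV text whose columns are only known at run time.

    field_map maps a column name found in the file's header row to a
    database column name; columns not in the map are skipped.
    """
    reader = csv.DictReader(lines)
    headers = reader.fieldnames or []  # the header row is read lazily here
    missing = [col for col in field_map if col not in headers]
    if missing:
        raise ValueError(f"columns not found in file: {missing}")
    # Keep only the mapped columns, renamed to their database names.
    return [
        {db_col: record[src_col] for src_col, db_col in field_map.items()}
        for record in reader
    ]


sample = io.StringIO('Name,Qty,Notes\nWidget,3,blue\n"Gadget, large",1,\n')
print(read_mapped_rows(sample, {"Name": "product_name", "Qty": "quantity"}))
# [{'product_name': 'Widget', 'quantity': '3'}, {'product_name': 'Gadget, large', 'quantity': '1'}]
```

Because the mapping is plain data, it can be built from whatever the user picks in the UI and stored for the next file of the same shape.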
StreamReader has been fast enough for me for pretty much every text file, though you are pretty screwed if you can't even guarantee value ordering.
I'm currently following a tutorial series for a Tile Engine which uses XML files to store conversations between NPCs. A topic it doesn't appear to cover (I have only quickly glanced through the subsequent videos) is how to prevent the user from either altering or knowing in advance what the NPC is going to say by opening the XML file easily with a generic text editor.
The 2nd point of being able to read future conversations is not a real issue but something I wanted to think about, so if that's hard to implement I am not too fussed at this point.
How would I go about making the XML uneditable? I know vaguely about CRC32s, which can check file integrity and may be useful, but I also think there might be better ways to go about that (i.e. not with a CRC32).
The most extreme action I can think of would be to create my own arbitrary encoding for the conversation data, but the usefulness of XML files deters me from that slightly, and since the tutorials I'm following are teaching me a lot of things I don't know, I would prefer not to deviate too far from them!
Just looking for a direction really, thanks!
XML is fundamentally an open format, so there is no way to make an XML file uneditable.
But you can keep a copy of the XML document (or some fingerprint of it) on your server (or at the endpoints of the NPC conversation) and then compare to see whether the XML document was edited or not.
If the document was edited, you can replace it with the backup version or tell the endpoints that the XML document was corrupted...
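A minimal sketch of the fingerprint approach, in Python with the standard `hashlib` and `hmac` modules (the function names are hypothetical). It assumes the expected digest is stored somewhere the player cannot change, such as a server; if the digest ships alongside the XML, a user can simply recompute it. A cryptographic hash such as SHA-256 is used here rather than CRC32, because a CRC is designed to catch accidental corruption and is easy to forge on purpose.

```python
import hashlib
import hmac


def fingerprint(xml_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of the raw XML file contents."""
    return hashlib.sha256(xml_bytes).hexdigest()


def is_unmodified(xml_bytes: bytes, expected_digest: str) -> bool:
    """Compare the file's current fingerprint with a stored copy.

    expected_digest is assumed to be kept out of the user's reach
    (for example, on a server).
    """
    return hmac.compare_digest(fingerprint(xml_bytes), expected_digest)


original = b'<Conversation npc="Bob"><Line>Hello!</Line></Conversation>'
stored = fingerprint(original)  # computed once, kept out of the user's reach

tampered = original.replace(b"Hello!", b"Give me gold!")
print(is_unmodified(original, stored))  # True
print(is_unmodified(tampered, stored))  # False
```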
Historically, many games wrap multiple resources into a single binary file.
You might put it in a ZIP file (and maybe change the file extension). That would allow you to avoid having an XML file with an obvious name as a temptation for your users :).
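A minimal Python sketch of the ZIP idea, using the standard `zipfile` module; the resource names and the `content.pak` filename are hypothetical. This only hides the XML from casual browsing: anyone who renames the file back to `.zip` can open it.

```python
import io
import zipfile


def pack_resources(resources: dict) -> bytes:
    """Bundle several named resources (str or bytes) into one ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in resources.items():
            archive.writestr(name, data)  # str data is stored as UTF-8
    return buffer.getvalue()


def load_resource(archive_bytes: bytes, name: str) -> bytes:
    """Read one resource back out of the bundle."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name)


bundle = pack_resources({"dialogue/bob.xml": '<Conversation npc="Bob"/>'})
# Written to disk as, say, "content.pak", the XML no longer has an obvious name.
print(load_resource(bundle, "dialogue/bob.xml").decode("utf-8"))
```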
Ultimately, you're asking something similar to the DRM question. I don't know whether your platform has an answer to that. (E.g., "using RSA encryption" is not secure as such; your program still has to decrypt the data at some point using the appropriate key, etc).
I've not done much with LINQ to XML, but all the examples I've seen load the entire XML document into memory.
What if the XML file is, say, 8GB, and you really don't have the option?
My first thought is to use the XElement.Load Method (TextReader) in combination with an instance of the FileStream Class.
QUESTION: will this work, and is this the right way to approach the problem of searching a very large XML file?
Note: high performance isn't required. I'm trying to get LINQ to XML to do basically the work of a program I could write that loops through every line of my big file and gathers things up; since LINQ is "loop-centric", I'd expect this to be possible...
Using XElement.Load will load the whole file into memory. Instead, use XmlReader with the XNode.ReadFrom function, where you can selectively load nodes found by XmlReader into an XElement for further processing, if you need to. MSDN has a very good example doing just that: http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom.aspx
If you just need to search the XML document, XmlReader alone will suffice and will not load the whole document into memory.
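The answer above is about .NET's `XmlReader` and `XNode.ReadFrom`; as an illustration of the same streaming pattern outside .NET, here is a Python sketch using `xml.etree.ElementTree.iterparse`. It hands back one detached copy of each matching record and then discards what has been parsed, so memory use should not grow with the file. It assumes the records (`item` in the example) are direct children of the root element; the function name is hypothetical.

```python
import copy
import io
import xml.etree.ElementTree as ET
from typing import IO, Callable, Iterator


def stream_matches(source: IO[bytes], tag: str,
                   predicate: Callable[[ET.Element], bool]) -> Iterator[ET.Element]:
    """Yield detached copies of <tag> elements that satisfy predicate.

    Assumes the <tag> records are direct children of the root element,
    so clearing the root after each record keeps memory use flat.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            if predicate(elem):
                # Copy before clearing, so the caller keeps a complete element.
                yield copy.deepcopy(elem)
            # Discard everything parsed so far.
            root.clear()


data = io.BytesIO(b"""<catalog>
  <item id="1"><name>Apple</name></item>
  <item id="2"><name>Pear</name></item>
  <item id="3"><name>Apple</name></item>
</catalog>""")
for match in stream_matches(data, "item", lambda e: e.findtext("name") == "Apple"):
    print(match.get("id"))  # prints 1, then 3
```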
Gabriel,
Dude, this isn't exactly answering your ACTUAL question (how to read big XML docs using LINQ), but you might want to check out my old question What's the best way to parse big XML documents in C-Sharp. The last "answer" (timewise) was a "note to self" on what ACTUALLY WORKED. It turns out that a hybrid document-XmlReader & doclet-XmlSerializer is fast (enough) AND flexible.
BUT note that I was dealing with docs of up to only 150MB. If you REALLY have to handle docs as big as 8GB, then I guess you're likely to encounter all sorts of problems, including issues with the O/S's LARGE_FILE (>2GB) handling... in which case I strongly suggest you keep things as-primitive-as-possible... and XmlReader is the most primitive (and, according to my testing, THE fastest) XML parser available in the Microsoft namespace.
Also: I've just noticed a belated comment in my old thread suggesting that I check out VTD-XML... I had a quick look at it just now... It "looks promising", even if the author seems to have contracted a terminal case of FIGJAM. He claims it'll handle docs of up to 256GB, to which I reply "Yeah, have you TESTED it? In WHAT environment?" It sounds like it should work though... I've used this same technique to implement "hyperlinks" in a textual help system, back before HTML.
Anyway good luck with this, and your overall project. Cheers. Keith.
I realize that this answer might be considered non-responsive and possibly annoying, but I would say that if you have an XML file which is 8GB, then at least some of what you are trying to do in XML should be done by the file system or database.
If you have huge chunks of text in that file, you could store them as individual files and store the metadata and the filenames separately. If you don't, you must have many levels of structured data, probably with a lot of repetition of the structures. If you can decide what is considered an individual 'record' which can be stored as a smaller XML file or in a column of a database, then you can structure your database based on the levels of nesting above that. XML is great for small and dirty, it's also good for quite unstructured data since it is self-structuring. But if you have 8GB of data which you are going to do something meaningful with, you must (usually) be able to count on some predictable structure somewhere in it.
Storing XML (or JSON) in a database, and querying and searching both for XML records, and within the XML is well supported nowadays both by SQL stuff and by the NoSQL paradigm.
Of course you might not have the choice of not using XML files this big, or you might have some situation where they are really the best solution. But for some people reading this it could be helpful to look at this alternative.
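To make the record-per-row idea concrete, here is a minimal Python sketch with the standard `sqlite3` module: each record becomes one row holding an indexed key plus the record's XML fragment. The table, column and function names are hypothetical, and for an 8GB file the `records` argument would need to come from a streaming parser (such as `ElementTree.iterparse`) rather than from `fromstring`, which is used here only to keep the example small.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Iterable


def store_records(conn: sqlite3.Connection, records: Iterable[ET.Element], key_attr: str) -> int:
    """Store each XML record as its own row: an indexed key plus the XML fragment."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        " record_key TEXT PRIMARY KEY,"
        " xml TEXT NOT NULL)"
    )
    count = 0
    for elem in records:
        conn.execute(
            "INSERT INTO records (record_key, xml) VALUES (?, ?)",
            (elem.get(key_attr), ET.tostring(elem, encoding="unicode")),
        )
        count += 1
    conn.commit()
    return count


doc = ET.fromstring(
    "<orders><order id='a1'><total>10</total></order>"
    "<order id='b2'><total>25</total></order></orders>"
)
conn = sqlite3.connect(":memory:")
print(store_records(conn, doc.findall("order"), "id"))  # 2
# Look up one record by key without touching the others.
row = conn.execute("SELECT xml FROM records WHERE record_key = ?", ("b2",)).fetchone()
print(row[0])  # <order id="b2"><total>25</total></order>
```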
I have a website where I allow businesses to register what products they sell individually. Then a consumer can go online, search for a product and receive a list of all the shops where it's currently being sold.
Although they can upload one product at a time, I want to allow businesses to mass upload things they offer.
I was thinking of using an Excel spreadsheet: have them download the template, and then have them upload the filled-in Excel sheet.
Others have suggested telling them to create a CSV file, but that is counter-intuitive in my honest opinion. Most likely a secretary will be creating the product sheets and she won't have a clue about what a CSV is.
What is the best way to approach this?
Well, it partly depends on the businesses. If they are medium or large businesses, they'd probably rather submit the data via a webservice anyway - then they don't have to get a human involved at all, after the initial development. They can write an application to periodically suck information from their (inevitable) database of products, and post to your web service.
If you're talking about very small companies without their own IT departments, that's less feasible, and either Excel or CSV would be a better approach. (As Caladain says, it's pretty simple to export to CSV... but you should try from a number of different spreadsheet programs as they may well have different subtleties in their export format. Things like text encoding will be important as well.)
But here's a novel idea... how about you ask some sample companies what they would like you to do? Presumably you have some companies in mind already - if you don't, it's potentially going to be pretty hard to make sure you're really building the right thing.
Find out how they already store their product list, and how they'd want to upload it to you. Then consider how difficult that would be, and possibly go back to them with something which is almost as easy for them, but a lot easier for you to implement, etc.
While I personally don't like Excel very much, it seems to be the best accepted format to do such things (involving a manual process).
My experience is that CSV breaks easily, for instance it uses the regional settings to determine the separator which can cause incompatibilities on either the client or the server side. Also, many people just save the file in any Excel format because they just don't know the difference.
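The separator problem can be partly worked around on the server. This Python sketch (standard `csv` module; the function name is hypothetical) strips a UTF-8 byte-order mark and lets `csv.Sniffer` guess between comma and semicolon. It assumes UTF-8 input, and since `Sniffer` is a heuristic it can guess wrong on small or unusual files, so it is a mitigation rather than a fix.

```python
import csv
import io
from typing import List


def read_uploaded_csv(raw: bytes) -> List[List[str]]:
    """Parse an uploaded CSV whose separator may depend on the sender's locale.

    Assumes UTF-8 input; 'utf-8-sig' also strips the byte-order mark some
    spreadsheet programs write. csv.Sniffer guesses between ',' and ';'
    and raises csv.Error if it cannot decide.
    """
    text = raw.decode("utf-8-sig")
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;")
    return list(csv.reader(io.StringIO(text), dialect))


# A semicolon-separated file with decimal commas, as some locales produce.
upload = b"\xef\xbb\xbfName;Price\nWidget;1,50\nGadget;2,00\n"
print(read_uploaded_csv(upload))
```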
Creating the files can be pretty easily done with some XSLT (e.g. create XMLSS format files, which are "XML Spreadsheet 2003" format).
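XSLT is one way to produce such files; to show what the output looks like, here is a Python sketch that writes a minimal "XML Spreadsheet 2003" workbook as plain text instead. The function name, the `Products` sheet name and the column names are hypothetical, and every cell is written as a string for simplicity.

```python
from typing import Iterable, Sequence
from xml.sax.saxutils import escape

SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def make_workbook(columns: Sequence[str], rows: Iterable[Sequence[object]] = ()) -> str:
    """Build a minimal "XML Spreadsheet 2003" workbook as a string.

    The first row holds the column headings; every cell is typed as String.
    """
    def row_xml(values: Sequence[object]) -> str:
        cells = "".join(
            f'<Cell><Data ss:Type="String">{escape(str(v))}</Data></Cell>'
            for v in values
        )
        return f"<Row>{cells}</Row>"

    body = "".join(row_xml(r) for r in [columns, *rows])
    return (
        '<?xml version="1.0"?>\n'
        '<?mso-application progid="Excel.Sheet"?>\n'
        f'<Workbook xmlns="{SS_NS}" xmlns:ss="{SS_NS}">'
        f'<Worksheet ss:Name="Products"><Table>{body}</Table></Worksheet>'
        "</Workbook>"
    )


# An empty template for businesses to download and fill in.
template = make_workbook(["Product name", "Price", "Quantity"])
print(template)
```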
You may also want to have a look at the Excel Data Reader on Codeplex for parsing the files.
Reading in an Excel file is actually pretty easy with ODBC; there is a tutorial on it.
I'm writing a simple program that will run entirely client-side. (Desktop programming? do people still do that?) and I need a simple way to store trivial amounts of data in a structured form, but really don't see any need to use a database system. What's more, some of the data needs to be serialized and passed around to different users, like some kind of "file" or perhaps a "document". (has anyone ever done that before?)
So, I've looked at using .Net DataSets, LINQ, direct XML manipulation, and they all seem like they would get the job done, but I would like to know before I dive into any of them if there's one method that is generally regarded as easier to code than others. As I said, the amount of data to be stored is trivial, even if one hundred people all used the same machine we're not talking about more than 10 MB, so performance is not as large a concern as is codeability/maintainability. Thank you all in advance!
Sounds like Linq-to-XML is a good option for this.
Tons of info out there on this.
Without knowing anything else about your app, the .Net DataSets would likely be your easiest option because WriteXml and ReadXml already exist.
Any serialization API should do fine here. I would recommend something that is contract based (not BinaryFormatter, which is type-based) as that will keep it usable over time (as your assembly changes).
So I would build a basic object model (DTO) and use any of:
XmlSerializer
DataContractSerializer
protobuf-net (you all knew it was coming...)
OO, simple, and easy. And easy to use for passing fragments of the data around (either between users or to a central server).
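The serializers listed are .NET ones; as a sketch of the same DTO idea in Python, here is a version using dataclasses and JSON. The class names are hypothetical. What it illustrates is the contract-based point: the file depends only on the field names, not on the program's type names, and unknown keys are ignored on load, so a file written by a newer version that added fields can still be read.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Entry:
    title: str
    notes: str = ""


@dataclass
class Document:
    owner: str
    entries: List[Entry] = field(default_factory=list)

    def to_json(self) -> str:
        # The JSON keys are the contract: they do not depend on the
        # class or module names, so refactoring the code keeps files readable.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Document":
        data = json.loads(text)
        # Read only the keys this version knows about; extra keys written
        # by a newer version are ignored, missing optional keys get defaults.
        return cls(
            owner=data["owner"],
            entries=[
                Entry(title=e["title"], notes=e.get("notes", ""))
                for e in data.get("entries", [])
            ],
        )


doc = Document("alice", [Entry("Groceries", "milk, eggs")])
text = doc.to_json()  # the "file" that can be passed to another user
print(Document.from_json(text) == doc)  # True
```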
I would choose an embedded database. Using something like SQLite doesn't seem like overkill to me. You may even try its C# port (http://code.google.com/p/csharp-sqlite/).
I want to be able to store information about a song that has been opened using my application. I would like for the user to be able to give the song a rating and this rating be loaded every time the users opens that file using my application.
I also need to know whether I should store the ratings in a database or an xml file.
C# ID3 Library is a .NET class library for editing ID3 tags (v1-2.4). I would store the ratings directly in the comments section of the MP3, since ID3v1 does not have many of the storage features that ID3v2 does. If you want to store additional information for each MP3, what about placing a unique identifier on the MP3 and then using it as the key for a database lookup?
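This does not show the C# ID3 Library's own API. It is a Python sketch of where the ID3v1 comment lives: an ID3v1 tag is the last 128 bytes of the file, starting with `TAG`, and in the ID3v1.1 layout the comment is 28 bytes at offset 97, followed by a zero byte and a track number. Storing a rating as text such as `rating=4` is a hypothetical convention, as are the function names.

```python
from typing import Optional

TAG_SIZE = 128            # an ID3v1 tag is the last 128 bytes of the file
COMMENT = slice(97, 125)  # 28-byte comment field (ID3v1.1 layout)


def read_comment(mp3: bytes) -> Optional[str]:
    """Return the ID3v1 comment, or None if the file has no ID3v1 tag."""
    tag = mp3[-TAG_SIZE:]
    if len(tag) < TAG_SIZE or tag[:3] != b"TAG":
        return None
    return tag[COMMENT].rstrip(b"\x00 ").decode("latin-1")


def with_comment(mp3: bytes, comment: str) -> bytes:
    """Return a copy of the file with its ID3v1 comment set, adding a tag if needed."""
    encoded = comment.encode("latin-1")[:28].ljust(28, b"\x00")
    if read_comment(mp3) is not None:
        body, tag = mp3[:-TAG_SIZE], bytearray(mp3[-TAG_SIZE:])
    else:
        # New empty tag: "TAG" marker, zeroed text fields, genre 255 (none).
        body, tag = mp3, bytearray(b"TAG" + b"\x00" * 124 + b"\xff")
    tag[COMMENT] = encoded  # leaves the track byte (offset 126) untouched
    return body + bytes(tag)


song = b"\xff\xfb" + bytes(500)  # stand-in for real MP3 frame data
tagged = with_comment(song, "rating=4")
print(read_comment(tagged))  # rating=4
```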
I would be cautious about adding custom tags to mp3s as it is an easy way to ruin a large library. Also, I have gone down this road before and while I enjoyed the programming knowledge that came out of it, trying something like the iTunes SDK or Last FM might be a better route.
I would use a single-file, zero-config database. SQL Server Compact in your case.
I don't think XML is a good idea. XML shines in data interchange and storing very small amounts of information. In this case a user may rate thousands of tracks (I have personally done so on online radio services that allow ratings), and you may have lots of other information to store about the track.
Export and import using XML export procedures if you have to. Don't use it as your main datastore.
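A minimal sketch of that split, using Python's built-in `sqlite3` module as a stand-in for SQL Server Compact (both are single-file, zero-config databases). Keying ratings by file path, the 1 to 5 scale, and all table and function names are assumptions; XML is produced only on export.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings ("
        " file_path TEXT PRIMARY KEY,"
        " rating INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5))"
    )
    return conn


def set_rating(conn: sqlite3.Connection, file_path: str, rating: int) -> None:
    # INSERT OR REPLACE: re-rating a song simply overwrites the old value.
    conn.execute(
        "INSERT OR REPLACE INTO ratings (file_path, rating) VALUES (?, ?)",
        (file_path, rating),
    )
    conn.commit()


def get_rating(conn: sqlite3.Connection, file_path: str) -> Optional[int]:
    row = conn.execute(
        "SELECT rating FROM ratings WHERE file_path = ?", (file_path,)
    ).fetchone()
    return row[0] if row else None


def export_xml(conn: sqlite3.Connection) -> str:
    """Export all ratings as XML for interchange, not as the main store."""
    root = ET.Element("Songs")
    for file_path, rating in conn.execute(
        "SELECT file_path, rating FROM ratings ORDER BY file_path"
    ):
        song = ET.SubElement(root, "Song")
        ET.SubElement(song, "Path").text = file_path
        ET.SubElement(song, "Rating").text = str(rating)
    return ET.tostring(root, encoding="unicode")


conn = open_store()
set_rating(conn, r"C:\Music\intro.mp3", 4)
print(get_rating(conn, r"C:\Music\intro.mp3"))  # 4
print(export_xml(conn))
```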
I would store it in a file as it is easier to keep with the mp3 file itself. If all you're doing is storing ratings, would you consider setting the ID3 rating field instead?
For this type of very simple storage I don't think it really matters all that much. The pros of XML are that it's very easy to deploy and it's editable outside of your app. The cons are that it's editable outside your application (could be good, could be bad, depends on your situation).
Maybe another option (just because you can ;-) is an OODBMS: check out DB4Objects, it's seriously addictive and very, very cool.
As mentioned earlier, it is better to store such information in the media file itself. My suggestion is to use the TagLib# library for this (the best media metadata library I could find). It's very powerful and easy to use.
I would store the ratings in an XML file; that way it's easy to edit from the outside, easy to read in .NET, and you don't have to worry about shipping a database with your application for something this simple.
Something like this might work for you:
    <Songs>
      <Song Title="{SongTitle}">
        <Path>{Song path}</Path>
        <Rating>3</Rating>
      </Song>
    </Songs>
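Reading and updating that file takes only a few lines. This Python sketch uses the standard `xml.etree.ElementTree` module against the structure above; matching songs by `<Path>` is an assumption, and the function names and the `ratings.xml` filename are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def lookup_rating(root: ET.Element, path: str) -> Optional[int]:
    """Return the rating stored for a song file, or None if it is unrated."""
    for song in root.iter("Song"):
        if song.findtext("Path") == path:
            return int(song.findtext("Rating"))
    return None


def store_rating(root: ET.Element, title: str, path: str, rating: int) -> None:
    """Update an existing <Song> entry, or append a new one."""
    for song in root.iter("Song"):
        if song.findtext("Path") == path:
            song.find("Rating").text = str(rating)
            return
    song = ET.SubElement(root, "Song", Title=title)
    ET.SubElement(song, "Path").text = path
    ET.SubElement(song, "Rating").text = str(rating)


# Load with ET.parse("ratings.xml").getroot(); an empty document is used here.
root = ET.fromstring("<Songs/>")
store_rating(root, "Intro", r"C:\Music\intro.mp3", 3)
store_rating(root, "Intro", r"C:\Music\intro.mp3", 4)  # updates, no duplicate
print(lookup_rating(root, r"C:\Music\intro.mp3"))  # 4
# Persist with: ET.ElementTree(root).write("ratings.xml", encoding="utf-8")
```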
If the song format supports suitable metadata (e.g. MP3), then follow Kevin's advice of using the metadata. This is by far the best way of doing it, and it is what the metadata is intended for.
If not, then it really depends on your application. If you want to share the rating information - especially over a web service, then I would go for XML: it would be trivial to supply your XML listings as one big feed, for example.
XML (like most other text formats) also has the advantage that it can be easily edited by a human in a text editor.
The database would have its advantages if you had a more closed system, wanted speed and fast indexing, and/or had other tables you might want to store as well (e.g. data about albums and bands).