I am just beginning to write an application. Part of what it needs to do is to run queries on a database of nutritional information. What I have is the USDA's SR21 Datasets in the form of flat delimited ASCII files.
What I need is advice. I am looking for the best way to import this data into the app and have it easily and quickly queryable at run time. I'll be using it for all the standard things: populating controls dynamically, datagrids, calculations, etc. I will also need user-specific persistent data storage. This will not be a commercial app, so hopefully that opens up the possibilities. I am fine with .NET Framework 3.5, so LINQ is a possibility when accessing the data (I just don't know if it would be the best solution or not). So, what are some suggestions for persistent storage in this scenario? What sort of gotchas should I be watching for? Links to examples are always appreciated, of course.
It looks pretty small, so I'd work out an appropriate object model, load the whole lot into memory, and then use LINQ to Objects.
I'm not quite sure what you're asking about in terms of "persistent storage" - aren't you just reading the data? Don't you already have that in the text files? I'm not sure why you'd want to introduce anything else.
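For illustration, a minimal LINQ to Objects sketch. The FoodItem model is hypothetical, and it assumes the SR21 convention of caret-delimited fields with tilde-quoted text - verify both against the actual files:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class FoodItem
{
    public string Id { get; set; }
    public string Description { get; set; }
}

public static class FoodLoader
{
    // Load one SR21-style flat file into memory; field positions are assumptions.
    public static List<FoodItem> Load(string path)
    {
        return File.ReadAllLines(path)
            .Select(line => line.Split('^'))
            .Select(f => new FoodItem
            {
                Id = f[0].Trim('~'),
                Description = f[1].Trim('~')
            })
            .ToList();
    }
}

// Usage: everything after Load is plain LINQ to Objects.
// var foods = FoodLoader.Load("FOOD_DES.txt");
// var cheeses = foods.Where(f => f.Description.Contains("Cheese")).ToList();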
I would import the flat files into SQL Server and access them via standard ADO.NET functionality. DB access is generally more robust and powerful than file I/O for querying and manipulating data, and you can also take advantage of SQL Server's caching capabilities, especially since this nutritional data won't be changing very often.
If you need to download updated flat files periodically, then look into developing a service that polls for these files and imports into SQL Server automatically.
EDIT: I refer to SQL Server, but feel free to use any DBMS.
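Once imported, access is plain ADO.NET. A minimal sketch, assuming a hypothetical dbo.FoodDescription table created by the import:

using System;
using System.Data.SqlClient;

using (var conn = new SqlConnection(@"Data Source=.\SQLEXPRESS;Initial Catalog=Nutrition;Integrated Security=True"))
using (var cmd = new SqlCommand(
    "SELECT FoodId, Description FROM dbo.FoodDescription WHERE Description LIKE @term", conn))
{
    cmd.Parameters.AddWithValue("@term", "%cheese%");  // parameterized, not concatenated
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine("{0}: {1}", reader["FoodId"], reader["Description"]);
    }
}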
My temptation would be to import the data into SQL Server (Express if you aren't looking to deploy the app) as it's a familiar source for me. Alternatively you can probably create an ODBC data source using the text file handler to get you a database-like connection.
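A hedged sketch of the ODBC route, using the stock Windows Microsoft Text Driver; note that delimiters other than commas or tabs generally require a schema.ini file in the data folder describing the layout:

using System;
using System.Data.Odbc;

using (var conn = new OdbcConnection(
    @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=C:\usda\sr21;Extensions=txt,csv"))
using (var cmd = new OdbcCommand("SELECT * FROM [FOOD_DES.txt]", conn))  // the file name acts as the table name
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine(reader[0]);
    }
}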
I agree that you would benefit from a database, especially for rapid querying, and even more so if you are saving user changes to the data. In order to load the flat file data into a SQL Server (including Express), you can use SSIS.
Use LINQ, or the read-the-text-data-into-a-list method:
1. Create a list.
2. Read the text file line by line (or all lines at once).
3. Process each line - extract the required data and attach it to the list.
4. Process the list for any further use.
The persistent storage stays in the files; the list itself is volatile, in-memory only. (See the sketch below.)
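A minimal sketch of those four steps (the delimiter and file name are assumptions):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

var items = new List<string[]>();                        // 1. create a list
foreach (var line in File.ReadAllLines("NUTR_DEF.txt"))  // 2. read the file line by line
{
    var fields = line.Split('^');                        // 3. extract the required data
    items.Add(fields);                                   //    and attach it to the list
}
var names = items.Select(f => f[1]).ToList();            // 4. process the list further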
The company I work for is running a C# project that crawls data from around 100 websites, saves it to the DB, and runs some procedures and calculations on that data.
Each of those 100 websites has around 10,000 events, and each event is saved to the DB.
After that, the saved data is aggregated and an XML file is generated for each event, so each of those 10,000 saved events is now represented as an XML file in the DB.
The design looks like this:
1) crawl 100 websites to collect the data and save it to the DB
2) collect the data that was saved to the DB and generate an XML file for each event
3) save the XML files to the DB
The main issue for this post is the selection of the saved XML files.
Each XML is about 1 MB, and given that there are around 10,000 events, I am not sure SQL Server 2008 R2 is the right option.
I tried Redis, and the save works very well (and fast!), but the query to get those XMLs is very slow (even locally, so network traffic won't be the issue).
I was wondering what your thoughts are. Please take into consideration that it is a real-time system, so caching is not an option here.
Any ideas are welcome.
Thanks.
Instead of using a DB you could try a cloud-based store (Azure blobs or Amazon S3); it seems like a good fit. See this post: azure blob storage effectiveness - same situation, except you have XML files instead of images. You can use a DB for storing the metadata (i.e. the source and event type of the XML, and its path in the cloud), but not the data itself.
You may also zip the files. I don't know the exact method offhand, but it can certainly be handled client-side; static data is often sent to clients in zipped form by default.
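A hedged sketch of the blob route with the Azure.Storage.Blobs client (the container and naming scheme are assumptions); the DB then only stores the blob's key and metadata:

using System.IO;
using System.IO.Compression;
using Azure.Storage.Blobs;

string connectionString = "<your storage connection string>";
string eventId = "event-12345";           // hypothetical key, stored in the DB
string xmlPayload = "<event>...</event>"; // the ~1 MB event XML

var container = new BlobContainerClient(connectionString, "event-xml");
container.CreateIfNotExists();

using (var buffer = new MemoryStream())
{
    // zip before upload, as suggested above
    using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
    using (var writer = new StreamWriter(gzip))
        writer.Write(xmlPayload);
    buffer.Position = 0;
    container.GetBlobClient(eventId + ".xml.gz").Upload(buffer, overwrite: true);
}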
Your question is missing some details, such as how long the data needs to remain in the database.
I'd avoid storing XML in the database if you already have the raw data. Why not have an application that queries the database and generates XML reports on demand? This will save you a lot of space.
~10 GB of data per day (10,000 events at ~1 MB each) is something SQL Server 2008 R2 can handle with the right hardware and good structural optimization. You'll need to investigate whether Standard edition will be enough or whether you'll have to use Enterprise or Datacenter licenses.
In any case the answer is yes - SQL Server is capable of handling this amount of data, but I'd check other solutions as well to see if it's possible to reduce the costs in any way.
Your basic architecture doesn't seem to be at fault; it's the way you're using Redis. If you design your key => value scheme right, there is no reason retrieval from Redis should be slow.
For example, say I have to store 1 million objects in Redis, each stored against an id (a GUID) as its key. The save will be really quick. But on retrieval, do I know the key? If I know the key, the lookup is fast; if I don't, or if I am trying to retrieve my data not by key but by some value inside my objects, then of course it will be slow.
The point is: when it comes to retrieval you should only ever work against the key, so design your key as a pre-calculated value in itself. Then, whenever I need some data from Redis/memcache, I can construct the key and do a single hit to get the data.
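A minimal sketch of key-first access with StackExchange.Redis (the key scheme is an assumption): build the key from values you already know at read time, so a single GET returns the XML:

using StackExchange.Redis;

var redis = ConnectionMultiplexer.Connect("localhost");
var db = redis.GetDatabase();

string websiteId = "site42";    // hypothetical identifiers known up front
string eventId = "event-12345";
string key = "event:" + websiteId + ":" + eventId;  // pre-calculated key

db.StringSet(key, "<event>...</event>");  // save is O(1)
string xml = db.StringGet(key);           // retrieval is a single O(1) hit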
If you could add more details, we'll be able to help you better.
I want to save a whole MS SQL 2008 Database into XML files... using asp.net.
Now I am a bit lost here... what would be the best method to achieve this? DataSets?
And I need to be able to restore the database later from these XML files. I am thinking about using DataSets for reading the tables and writing to XML, and using the SqlBulkCopy class to restore the database again. But I am not sure whether this would be the right approach.
Any clues and tips for me?
If you will need to restore it on the same server type (I mean SQL Server 2008 or higher) and don't care about the ability to see the actual data inside the XML, do the following:
1. Programmatically back up the DB using the "BACKUP DATABASE" T-SQL command.
2. Compress the backup.
3. Convert the backup to Base64.
4. Place the backup as the content of the XML file (like: <database name="..." compressionmethod="..." compressionlevel="...">the Base64 content here</database>).
5. On the server where you need to restore it, download the XML, extract the Base64 content, and use the attributes to know what compression was used. Decompress and restore using the T-SQL "RESTORE" command.
(A code sketch of these steps follows below.)
Would that approach work?
Of course, if you need to see the content of the database, you would need to develop an XML schema, go through each table, etc. But then you won't have SPs/views and other items backed up.
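A rough sketch of steps 1-4 above, assuming the backup path is writable by the SQL Server service; all names are hypothetical:

using System;
using System.Data.SqlClient;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

string bakPath = @"C:\backups\MyDb.bak";

// 1. back up the database server-side
using (var conn = new SqlConnection("Data Source=.;Initial Catalog=master;Integrated Security=True"))
using (var cmd = new SqlCommand("BACKUP DATABASE [MyDb] TO DISK = @path", conn))
{
    cmd.Parameters.AddWithValue("@path", bakPath);
    conn.Open();
    cmd.ExecuteNonQuery();
}

// 2-4. compress, Base64-encode, and wrap in the XML envelope
byte[] raw = File.ReadAllBytes(bakPath);
using (var buffer = new MemoryStream())
{
    using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
        gzip.Write(raw, 0, raw.Length);
    new XElement("database",
        new XAttribute("name", "MyDb"),
        new XAttribute("compressionmethod", "gzip"),
        Convert.ToBase64String(buffer.ToArray()))
        .Save("MyDb.backup.xml");
}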
Because you are talking about a CMS, I'm going to assume you are deploying into hosted environments where you might not have command line access.
Now, before I give you the link I want to state that this is a BAD idea. XML is way too verbose to transfer large amounts of data. Further, although it is relatively easy to pull data out, putting it back in will be difficult and a very time consuming development project in itself.
Next alert: as Denis suggested, you are going to miss all of your stored procedures, functions, etc. Your best bet is to use the normal sql server backup / restore process. (Incidentally, I upvoted his answer).
Finally, the last time I dealt with XML and SQL Server we noticed interesting issues that cropped up when data exceeded a 64KB boundary. Basically, at 63.5KB, the queries ran very quickly (200ms). At 64KB, the query times jumped to over a minute and sometimes quite a bit longer. We didn't bother testing anything over 100KB as that was taking 5 minutes on a fast/dedicated server with zero load.
http://msdn.microsoft.com/en-us/library/ms188273.aspx
See this for putting it back in:
How to insert FOR AUTO XML result into table?
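For the pulling-out side, a minimal sketch using FOR XML AUTO; ExecuteXmlReader streams the result (the table name is hypothetical):

using System.Data.SqlClient;
using System.Xml;

using (var conn = new SqlConnection("Data Source=.;Initial Catalog=MyDb;Integrated Security=True"))
using (var cmd = new SqlCommand(
    "SELECT * FROM dbo.Customers FOR XML AUTO, ROOT('Customers')", conn))
{
    conn.Open();
    using (XmlReader reader = cmd.ExecuteXmlReader())  // streams the XML result
    {
        var doc = new XmlDocument();
        doc.Load(reader);
        doc.Save("Customers.xml");
    }
}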
For kicks, here is a link talking about pulling the data out as JSON objects: http://weblogs.asp.net/thiagosantos/archive/2008/11/17/get-json-from-sql-server.aspx
You should also read (not for the faint of heart): http://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/
Of course, the commenters all recommend building something using a CLR approach, but that's probably not available to you in a shared database hosting environment.
At the end of the day, if you are truly insistent on this madness, you might be better served by simply iterating through your table list and exporting all the data to standard CSV files, then iterating the CSV files to load the data back in, a la C# - is there a way to stream a csv file into database?
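If you do go that route, the export half is roughly this (a sketch; proper CSV quoting and escaping is glossed over):

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.IO;

var tables = new List<string>();
using (var conn = new SqlConnection("Data Source=.;Initial Catalog=MyDb;Integrated Security=True"))
{
    conn.Open();
    using (var cmd = new SqlCommand("SELECT name FROM sys.tables", conn))
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            tables.Add(reader.GetString(0));

    foreach (var table in tables)
    {
        using (var cmd = new SqlCommand("SELECT * FROM [" + table + "]", conn))
        using (var reader = cmd.ExecuteReader())
        using (var writer = new StreamWriter(table + ".csv"))
        {
            while (reader.Read())
            {
                var values = new string[reader.FieldCount];
                for (int i = 0; i < reader.FieldCount; i++)
                    values[i] = Convert.ToString(reader.GetValue(i));
                writer.WriteLine(string.Join(",", values));  // no escaping: sketch only
            }
        }
    }
}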
Bear in mind that ALL of the above methods suffer from:
1) long processing times due to the data overhead, which leads to
2) a high potential for failure due to the various timeouts (page processing, command, connection, etc.); and
3) if your data model changes between the time it was exported and reimported, then you're back to writing custom translation code and ultimately screwed anyway.
So, only do this if you really really have to and are at least somewhat of a masochist at heart. If the purpose is simply to transfer some data from one installation to another, you might consider using one of the tools like SQL Compare and SQL Data Compare from RedGate to handle the transfer.
I don't care how much (or little) you make, the $1500 investment in their developer bundle is much cheaper than the months of time you are going to spend doing this, fixing it, redoing it, fixing it again, etc. (for the record I do NOT work for them. Their products are just top notch.)
Red Gate's SQL Packager lets you package a database into an exe or to a VS project, so you might want to take a look at that. You can specify which tables you want to consider for data.
Is there any specific reason you want to do this using xml?
I have a C# Windows service which manages some stuff for my server application. This is not the main application, but a helper process used to control my actual application. The user connects to this application via WCF using a WinForms application. It all looks a bit like the IIS manager.
I need a data store for this application.
Currently, I use separate XML files which are loaded at start-up, updated in memory, and flushed to disk on every change. I like this because:
We can simply edit the XML files in Notepad when issues arise;
I do not have external dependencies on e.g. MSSQL Express;
I do not have to update a database schema when the format changes.
However, I find that this is not stable and that the in-memory management is very fragile.
What should I use instead that is not overkill (like e.g. MSSQL Express would be) without losing too many of the above advantages?
SQLite is made for occasions like this where you need a solid data store, but do not require the power or scalability of a full database server.
If you do not want to worry about schema changes, you may be best off with your xml method or some variety of NoSQL database. What exactly is unstable about your xml setup?
If you have multiple concurrent processes accessing the xml file, you will have to load it quite often to ensure it remains synchronized. If this is a multiuser situation, xml files may not be feasible past a very very small scale. This is the problem database systems solve fairly effectively.
Try SQL CE or SQLite.
db4o
One solution would be to use an object database like db4o. It has an extremely small footprint, is fast as hell, and you can add properties to your persisted objects without needing to make schema changes. Also, you don't have to write any SQL.
Storing objects is as easy as:
// YapFileName is the path to the db4o database file
using (IObjectContainer db = Db4oEmbedded.OpenFile(YapFileName))
{
    Pilot pilot1 = new Pilot("Michael Schumacher", 100);
    db.Store(pilot1);  // persists the whole object, no mapping code or SQL
}
XML in Database
Another way to do it is to use something like SQLite or SQL CE (as mentioned by other posters) in conjunction with XML data.
Data Contract Serializer
If you're not already using the DataContractSerializer / DataContracts to generate / load your xml files, it's worth considering. It's the same robust framework that you're already using for WCF. It handles versioning pretty well. You could use this to deal with xml files on disk, or use it with a database.
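A minimal sketch of that approach; the Settings type here is a hypothetical stand-in for whatever the service persists:

using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Settings
{
    [DataMember] public string ListenAddress { get; set; }
    [DataMember] public int PollIntervalSeconds { get; set; }
}

public static class SettingsStore
{
    static readonly DataContractSerializer serializer =
        new DataContractSerializer(typeof(Settings));

    public static void Save(Settings s, string path)
    {
        using (var fs = File.Create(path))
            serializer.WriteObject(fs, s);   // writes readable XML to disk
    }

    public static Settings Load(string path)
    {
        using (var fs = File.OpenRead(path))
            return (Settings)serializer.ReadObject(fs);
    }
}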
Here's our situation.
We're receiving a dump of relational data in Access 2007 format. There are quite a few tables involved. We're writing a console app in C# to run various queries against this data. We only need read-only access; we're not updating the Access database.
I haven't used Access in a project since pre-LINQ days, and I'm hoping we don't have to go back to coding strings of SQL against an ADO.NET connection just because the database is Access. I gather LINQ to SQL is out of the question, but might Entity Framework be usable?
How would you approach this problem?
EDIT: The console app will be dropped by a business analyst into a folder containing the Access database, and when run will generate a text file created by querying the data. So unfortunately it's not an option to transfer the data to SQL Server beforehand!
If you must keep the data in Access, you can pull it into a DataSet via ADO.NET, and then use the LINQ extension methods that work against a DataSet to query the data.
It's not nearly as nice as working with SQL Server, but it does work.
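A sketch of that route, assuming the ACE OLE DB provider for Access 2007 files and hypothetical table/column names (LINQ to DataSet needs a reference to System.Data.DataSetExtensions):

using System;
using System.Data;
using System.Data.OleDb;
using System.Linq;

var table = new DataTable();
using (var conn = new OleDbConnection(
    @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\dump.accdb"))
using (var adapter = new OleDbDataAdapter("SELECT * FROM Orders", conn))
{
    adapter.Fill(table);  // pulls the Access table into memory
}

var bigOrders = table.AsEnumerable()
    .Where(r => r.Field<decimal>("Total") > 1000m)
    .Select(r => r.Field<string>("CustomerName"))
    .ToList();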
I would fully import the data into SQL Server, after which I would gleefully destroy all copies of the original Access file. Then you could get as LINQ-y as you like.
LINQ to SQL no, LINQ to XML yes! How hard would it be to dump the Access file to XML?
I'm not sure I understand the question, but have you considered a SQL Server linked server to the Access file? You could then use SQL Server to do the manipulation. I don't know what the limitations/pitfalls of that are, but it's a solution often suggested when there's no good direct way to manipulate the Jet/ACE database.
I have a WPF application that stores a large amount of information in XML files, and as the user uses the application they add more information to them. It's basically using the XML files as a database. Since the XML files have gotten quite large over the life of the program, and I've been thinking about putting the data on a website, I've been looking into how to move all the information into an SQL database.
I've used SQL databases with web applications (PHP, Ruby, and ASP.NET) but never with a desktop application. Ideally I'd like to be able to keep all the information in one database file and distribute it along with the application, without requiring the user to connect to a remote database (so they don't need an internet connection - though eventually it would be nice if it could compare the local file's version with one online somewhere and update if necessary) and without making them install a local database server on their computer. Is this possible?
I'd also like to use LINQ with any new database solution, so switching to a database doesn't force too many changes (I read the XML with LINQ).
I'm sure this question has been asked and that there are already some good tutorials on the subject but I just can't find them.
SQLite is a good embedded database that you can ship along with your application. I have not done much more than some prototyping with it, so I personally cannot say with 100% certainty that it will meet your needs. But from what I have read, and what little I have written against it, it seems appropriate for the job.
SQLite Homepage
ADO.NET Provider
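A minimal sketch with the System.Data.SQLite provider: the entire database is one file shipped next to the executable, with no server or install required:

using System.Data.SQLite;

using (var conn = new SQLiteConnection("Data Source=app.db"))
{
    conn.Open();  // creates app.db on first use

    using (var cmd = new SQLiteCommand(
        "CREATE TABLE IF NOT EXISTS Note (Id INTEGER PRIMARY KEY, Body TEXT)", conn))
        cmd.ExecuteNonQuery();

    using (var cmd = new SQLiteCommand("INSERT INTO Note (Body) VALUES (@b)", conn))
    {
        cmd.Parameters.AddWithValue("@b", "hello");
        cmd.ExecuteNonQuery();
    }
}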
If you know how your objects are all going to fit together, you could serialize/deserialize them to store them on disk as a set of protobuf objects (depending on their size, of course). I've found it a pretty simple, elegant solution for storing a set of interconnected classes. Every class that should be savable (that is, all your data) can be serialized using this method and then restored as necessary.
Here's the .NET link to it.
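A minimal sketch with the protobuf-net library (the Recipe type is hypothetical); numbered members keep the wire format compact and stable across versions:

using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class Recipe
{
    [ProtoMember(1)] public string Name { get; set; }
    [ProtoMember(2)] public List<string> Ingredients { get; set; }
}

public static class RecipeStore
{
    public static void Save(Recipe r, string path)
    {
        using (var fs = File.Create(path))
            Serializer.Serialize(fs, r);
    }

    public static Recipe Load(string path)
    {
        using (var fs = File.OpenRead(path))
            return Serializer.Deserialize<Recipe>(fs);
    }
}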
This is a previous question I asked on SO, and got several good responses.