In continuation of: Storing DataRelation in XML?
Thanks to everybody for the answers to my earlier thread. However, may I ask why nobody supports this XML-based approach? What exactly would the problems be? I can apply constraints to the DataSet, and I can, I guess, also use transactions.
I am new to this, so if you could point me to a link where I can find some sort of comparison, that would be really helpful.
According to the FAQ, discussions are not encouraged, but I guess this is quite specific. I hope not to be fired for this... :)
Thanks for reading,
Saurabh.
Database management systems are specifically designed to store data and retrieve it quickly, to preserve the integrity of the data, and to manage concurrent access to it.
XML, on the other hand, was originally designed for documents, separating content from presentation. It became a handy way to store simple data because the file structure is so well defined, and then it got out of hand, with people trying to store entire databases in an unsuited structure.
XML doesn't guarantee atomicity, concurrency, integrity, fast access or anything like that. Not inherently, anyway. .NET's DataSet libraries do help in that regard, but just because you can serialize DataSet objects to XML doesn't make it a good place to store data for multiple users.
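For context, serializing a DataSet to XML looks something like this minimal sketch (the table and column names are made up for illustration):

```csharp
using System;
using System.Data;

class DataSetXmlDemo
{
    static void Main()
    {
        // Build a trivial one-table DataSet; names are illustrative.
        var ds = new DataSet("Catalog");
        DataTable table = ds.Tables.Add("Products");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Widget");

        // WriteSchema embeds the XSD, so column types and constraints
        // survive the round trip instead of being inferred on read.
        ds.WriteXml("catalog.xml", XmlWriteMode.WriteSchema);

        var restored = new DataSet();
        restored.ReadXml("catalog.xml");
        Console.WriteLine(restored.Tables["Products"].Rows.Count); // prints 1
    }
}
```

It works, but notice that nothing in that file coordinates two processes writing to it at once; that's the part a DBMS gives you for free.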
When you're faced with two tools, one which was designed to do exactly what you need to do (in this case a DBMS) and one that was designed to do something else but has been kludged to do what you want, sorta (in this case XML), you should probably go with the first option.
Concurrency will be the main issue, where multiple users want to access the same "database" file. Performance is the other, because the whole file has to be loaded into memory; if the file grows, it'll become unmanageable. Query performance also suffers, because nothing you write over a flat file will be as efficient as letting something as tuned and honed as an RDBMS do it for you.
Though I would need to research to back this up, I suspect that there are some performance implications to not using an actual database.
If there is a chance that another application built on a different platform (not ADO.NET) might need to access your data in the future, having to work with a giant XML file will very likely make life more difficult. A relational DB is the standard approach to this sort of problem.
Related
I have a few very large queries that I need to convert to LINQ, because we are using Entity Framework and I can't use stored procedures (they break compatibility with other databases).
Using a tool like Linqer didn't even help, and even when I get it to work with some modifications to the generated LINQ, there is a huge performance issue.
So, what is the best option in a situation like this, where EF fails?
Please don't ask me to divide it into smaller queries, because that's not possible.
Moving this to an "answer" because what I want to say is too long for a comment.
It sounds like you're running into an inherent limitation to ORMs. You won't get perfect performance trying to do everything in code. It sounds like you're trying to use an ORM like a T-SQL interface rather than a mapping between objects and a relational instance of data.
You say you want to maintain compatibility between databases, but that's already a nonstarter if you consider schema differences from database to database. If you're already implementing a schema-validation step to ensure your code doesn't break, then there should be no reason why you can't use something like views.
You can say you don't want to support these things all day long but the simple point is that these things exist because they address certain problems. If you wholesale abandon them, then you can't really expect to get rid of the problem. Some things the database simply does better.
So, I think you're expecting something out of the technology that it wasn't meant to solve. You'll need to either reevaluate your strategy or use another tool to accomplish it. I think you may even need a couple different tools.
What you've been doing may have worked when your scale was smaller. I could see such a thing working for quite a while actually. However, it does have a scale limit, and I think you're coming up against it.
I think you need to make a determination on what databases you want to support. Saying "we support all databases" is untenable. Then, compare features and use the ones in common. If it's a MS SQL vs. MySQL thing, then there's no reason why you can't use views or stored procedures.
Check out LinqKit - this is a very useful tool for building up complex large EF queries.
http://www.albahari.com/nutshell/linqkit.aspx
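For a flavor of what LinqKit gives you, here is a rough sketch of its PredicateBuilder pattern; the Product entity and its fields are hypothetical:

```csharp
using System.Linq;
using LinqKit; // PredicateBuilder and AsExpandable live here

public class Product // stand-in entity for illustration
{
    public int Id { get; set; }
    public string Description { get; set; }
}

public static class ProductSearch
{
    // Compose one big OR filter from many small predicates, then let
    // LinqKit expand the expression tree so EF can translate it to SQL.
    public static IQueryable<Product> ByKeywords(
        IQueryable<Product> products, string[] keywords)
    {
        var predicate = PredicateBuilder.False<Product>(); // start from "match nothing"

        foreach (var keyword in keywords)
        {
            var temp = keyword; // copy to avoid the modified-closure pitfall
            predicate = predicate.Or(p => p.Description.Contains(temp));
        }

        return products.AsExpandable().Where(predicate);
    }
}
```

The win is that each small predicate stays readable and testable while still producing a single translated SQL query.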
I'm currently working on loading a lot of XML files (thousands, ranging from 1 KB to 6 MB) into destination databases. Currently I'm using the SQLXMLBULKLOAD COM object. One of the biggest problems I'm having is that the COM object doesn't always play nice within our transactional environment. There are other problems too, such as performance: the process really begins choking on files approaching ~2 MB, taking several minutes, if not longer in some cases (hours), to load into the tables.
So now I'm looking for an alternative, of which there seem to be two flavors:
1) Something like OPENXML, where XML is inserted as XML data into SQL Server
or
2) Solutions that parse the XML in memory, and load as rowsets into the database.
There are drawbacks to either approach, and I know I'm going to have to benchmark prototype solutions before I jump to any conclusions. The OPENXML approach is very attractive IMO, mainly because it promises some really good performance numbers (others claim going from hours to milliseconds). But it has the drawback of storing data as XML - not ideal in my particular scenario, since the destination tables already exist, and many other processes rely on queries and SPROCs that treat these tables as normal rowset data.
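To be concrete about flavor 2, here is a minimal sketch of what I mean (the target table name and connection details are placeholders):

```csharp
using System.Data;
using System.Data.SqlClient;

class XmlRowsetLoader
{
    // Parse the XML into DataTables in memory, then bulk-load the rowset.
    static void Load(string xmlPath, string connectionString)
    {
        var ds = new DataSet();
        ds.ReadXml(xmlPath); // infers one DataTable per repeating element

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "dbo.Orders"; // hypothetical target
                bulkCopy.WriteToServer(ds.Tables[0]);
            }
        }
    }
}
```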
Whatever solution I choose, I must meet the following requirements:
1) Must accept any XML file. Clients (in a business sense) need only provide an XSD, and an appropriate destination database/table(s) for the data.
2) Individual files (never larger than ~6MB) must be processed in under 1 minute (ideally even much quicker than that).
3) Inserted data must be able to accommodate existing queries and SPROCs (i.e., it must ultimately end up as normal rowset data)
So my question is, do you have any experience in this situation, and what are your thoughts and insights?
I am not completely opposed to an OPENXML-like solution, as long as the data can end up as normal rowset data at some point. I am also interested in 3rd-party solutions you may have experience with; this is an important part of our process, and we are willing to spend some $ if it gives us the best and most stable solution.
I'm also not opposed to "roll-your-own" suggestions, or things on CodePlex, etc. I came across the LINQ to XSD project, but couldn't find much documentation about what its capabilities are (just as an example of the things I am interested in).
I would revisit the performance issues you are having with the SQLXMLBULKLOAD COM object. I have used this component to load 500 MB XML files before. Can you post the code you are using to invoke the component?
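For comparison, a typical invocation looks roughly like this sketch, based on the documented SQLXMLBulkLoad4 object model (the connection string and file paths are placeholders, not your actual setup):

```csharp
// Requires a COM reference to the "Microsoft SQLXML Bulkload 4.0 Type Library".
class BulkLoadDemo
{
    static void Main()
    {
        var bulkLoad = new SQLXMLBULKLOADLib.SQLXMLBulkLoad4Class();
        bulkLoad.ConnectionString =
            "Provider=SQLOLEDB;Data Source=.;Initial Catalog=MyDb;Integrated Security=SSPI;";
        bulkLoad.ErrorLogFile = @"C:\temp\bulkload_errors.xml";
        bulkLoad.Transaction = false; // transacted loads spool to a temp file and are much slower
        bulkLoad.Execute(@"C:\temp\schema.xsd", @"C:\temp\data.xml");
    }
}
```

In particular, check whether you have Transaction set to true; that alone can explain both the transactional friction and the slowdown on larger files.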
I want to use a lookup map or dictionary in a C# application, but it is expected to store 1-2 GB of data.
Can someone please tell me if I will still be able to use the Dictionary class, or if I need to use some other class?
EDIT: We have an existing application which uses an Oracle database to query or look up object details. It is, however, too slow, since the same objects are repeatedly queried. I was feeling that it might be ideal to use a lookup map in this scenario to improve the response time. However, I am worried that the size will make it a problem.
Short Answer
Yes. If your machine has enough memory for the structure (and the overhead of the rest of the program and system including operating system).
Long Answer
Are you sure you want to? Without knowing more about your application, it's difficult to know what to suggest.
Where is the data coming from? A file? Files? A database? Services?
Is this a caching mechanism? If so, can you expire items out of the cache once they haven't been accessed for a while? This way, you don't have to hold everything in memory all the time.
As others have suggested, if you're just trying to store lots of data, can you just use a database? That way you don't have to have all of the information in memory at once. With indexing, most databases are excellent at performing fast retrieves. You could combine this approach with a cache.
Is the data that will be in memory read only, or will it have to be persisted back to some storage when something changes?
Scalability - do you expect that the amount of data stored in this dictionary will increase as time goes on? If so, you're going to hit a point where it's very expensive to buy machines that can handle that much data. You might want to look at a distributed caching system in that case (AppFabric comes to mind) so you can scale out horizontally (more machines) instead of vertically (one really big, expensive point of failure).
UPDATE
In light of the poster's edit, it sounds like caching would go a long way here. There are many ways to do this:
Simple dictionary caching - just cache stuff as it's requested (a sketch follows this list).
Memcache
Caching Application Block I'm not a huge fan of this implementation, but others have had success.
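For the first option, a minimal cache-aside sketch might look like this (the loader delegate stands in for your existing Oracle query):

```csharp
using System;
using System.Collections.Concurrent;

// Cache-aside over whatever loader you already have; "loadFromSource"
// is a placeholder for the existing database lookup.
class ObjectCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, TValue> _cache =
        new ConcurrentDictionary<TKey, TValue>();
    private readonly Func<TKey, TValue> _loadFromSource;

    public ObjectCache(Func<TKey, TValue> loadFromSource)
    {
        _loadFromSource = loadFromSource;
    }

    public TValue Get(TKey key)
    {
        // Hits the database only on a cache miss; repeats are served from memory.
        return _cache.GetOrAdd(key, _loadFromSource);
    }
}
```

Add expiry on top of this if the underlying data changes, so stale entries eventually drop out.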
As long as you're on a 64-bit machine with enough RAM, yes, you should be able to use a dictionary that large. However, if you have THAT much data, a database may be more appropriate (Cassandra is really nothing but a gigantic dictionary, and there's always MySQL).
When you say 1-2GB of data, I assume that you mean the items are complex objects that cumulatively contain 1-2GB.
Unless they're structs (and they shouldn't be), the dictionary stores only references, so it doesn't care how big the items are.
As long as you have fewer than about 2^24 items (I pulled that number out of a hat), you can store as much as you can fit in memory.
However, as everyone else has suggested, you should probably use a database instead.
You may want to use an embedded database such as SQL CE.
You can, but for a dictionary as large as that you are better off using a database.
Use a database.
Make sure you have a good DB model, put the correct indexes in place, and off you go.
You can use subdictionaries.
Dictionary<KeyA, Dictionary<KeyB, TValue>>
where KeyA is some common part of KeyB.
For example, if you have a string-keyed dictionary, you can use the first letter as KeyA.
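A rough sketch of the idea, assuming string keys partitioned by first letter (the value type here is arbitrary):

```csharp
using System.Collections.Generic;

class PartitionedLookup
{
    // Partition one huge string-keyed dictionary by first letter, as
    // suggested above; each bucket stays smaller, easing resize cost.
    private readonly Dictionary<char, Dictionary<string, string>> _buckets =
        new Dictionary<char, Dictionary<string, string>>();

    public void Add(string key, string value)
    {
        char first = key[0];
        Dictionary<string, string> bucket;
        if (!_buckets.TryGetValue(first, out bucket))
        {
            bucket = new Dictionary<string, string>();
            _buckets[first] = bucket;
        }
        bucket[key] = value;
    }

    public bool TryGetValue(string key, out string value)
    {
        Dictionary<string, string> bucket;
        if (_buckets.TryGetValue(key[0], out bucket))
            return bucket.TryGetValue(key, out value);
        value = null;
        return false;
    }
}
```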
I'm having a bit of a problem deciding how to store some data. Seen from a simple perspective, it will be a simple table of data, but there will be many tables. There will be about 7 columns in each table, but again, there will be a lot of tables (and they will be created at runtime, whenever the customer wants a clean grid).
The data has to be stored locally in a file (and there will not be multiple instances of the software running).
I'm using C# 4.0, and I have been looking at XML files (one file per table, or multiple tables per file), SQLite, SQL Server CE, Access, etc. I will be happy if someone here has comments or suggestions on what to do or not to do. Stability and reliability (e.g., no trashed databases because of unstable third-party software) are probably my biggest concern.
If you are looking to store the data locally in a file, I would recommend the SQLite option, since your data already seems to be organized as database tables. SQLite is built to handle multiple tables and columns, so it means less mental overhead for you, the developer.
http://web.archive.org/web/20100208133236/http://www.mikeduncan.com/sqlite-on-dotnet-in-3-mins/ is a decent tutorial to give a quick overview on how to set it up and get going.
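For a feel of the API, a minimal sketch with the System.Data.SQLite provider might look like this (the database file name and column names are invented):

```csharp
using System.Data.SQLite; // the System.Data.SQLite ADO.NET provider

class GridStore
{
    // Create a per-customer table at runtime, matching the "clean grid"
    // scenario in the question; table and column names are illustrative.
    static void CreateGrid(string tableName)
    {
        using (var connection = new SQLiteConnection("Data Source=grids.db"))
        {
            connection.Open();
            using (var command = connection.CreateCommand())
            {
                command.CommandText =
                    "CREATE TABLE IF NOT EXISTS [" + tableName + "] " +
                    "(Id INTEGER PRIMARY KEY, Col1 TEXT, Col2 TEXT)";
                command.ExecuteNonQuery();
            }
        }
    }
}
```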
As for what NOT to do: don't try to make your own scheme to save the data to a file, it's a well understood problem that has been solved many times over, why re-invent the wheel?
XML won't be a good choice if you are planning to run several queries, since loading text files becomes painful as they grow (talking about files over 1 MB). If you plan to keep the data small, XML would be good for keeping things simple. I still wouldn't use it, but if you have a background in it, the benefits will outweigh the learning curve.
If you have no expertise in any of them and the data is light, my suggestion is SQLite; I believe it is the best lightweight DB for .NET, and the provider is very good. You can find it easily on Google.
I would tell you that Access is not recommendable, but that is a personal opinion. Many people use it, and I think it is for a reason, so you should check it out and try it.
Again, my final recommendation is SQLite, unless you know another one very well, in which case you'll have to think about how much your data is going to grow. If you plan to have a DB around 100 MB, any of them except XML would do; if you think it'll grow bigger than that, lean heavily toward SQLite.
I've been reading about NoSQL (http://nosql.eventbrite.com/), a movement aimed at encouraging the dropping of traditional relational databases in favor of custom, application-suited storage systems.
I'm intrigued by the idea of writing a small personal storage system (for the .NET Framework) as a learning pet project. What are your suggestions or useful links? Where should I start? How do I balance what's on the hard drive and what's in memory?
I think this could be an interesting opportunity to learn the inner workings of databases, but I really lack the most basic theory of it.
Thanks.
The NoSQL movement is aimed at huge-scale systems, at sizes where the relational model truly breaks. Before you start writing your own storage, I highly recommend understanding the relational model, as it is one of the best documented and best understood domains in CS. Start with Gray and Reuter's Transaction Processing; this book explains everything there is to know about implementing a classic RDBMS. Next on your list should be Readings in Database Systems, a collection of the most relevant scientific papers and articles.
Before you get going, I would recommend looking into SQL Server's ability to store XML as BLOBs inside the relational database. Perhaps your storage system doesn't need to be built "from scratch"; it could be a hybrid on top of SQL Server's XML storage capability.
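A rough sketch of that hybrid from ADO.NET, assuming a hypothetical Documents table with an xml column named Body:

```csharp
using System.Data;
using System.Data.SqlClient;

class XmlColumnStore
{
    // Insert a document into an xml column of a relational table.
    static void Save(string connectionString, string xml)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "INSERT INTO Documents (Body) VALUES (@body)", connection))
        {
            // SqlDbType.Xml lets SQL Server validate and index the content.
            command.Parameters.Add("@body", SqlDbType.Xml).Value = xml;
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```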
Well it all depends on the app you are building.
For example, if your app just needs to persist a few hundred objects, cut through them in a few ways, and doesn't care if stuff gets corrupted once in a while, you could potentially just use LINQ to query a List and persist the List to disk once in a while.
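A minimal sketch of that approach (the Order type is invented for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

public class Order // a stand-in record type
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}

class ListStore
{
    static void Main()
    {
        var orders = new List<Order>
        {
            new Order { Id = 1, Total = 10m },
            new Order { Id = 2, Total = 250m }
        };

        // "Query" in memory with LINQ instead of SQL.
        int bigOrders = orders.Count(o => o.Total > 100m);
        Console.WriteLine(bigOrders); // prints 1

        // Persist the whole list to disk once in a while.
        var serializer = new XmlSerializer(typeof(List<Order>));
        using (FileStream stream = File.Create("orders.xml"))
            serializer.Serialize(stream, orders);
    }
}
```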
If you need anything that has the magic ACID properties, well, it's going to take tons of work.
If you need something that supports transactions, it's going to take tons of work.
If you need something that understands ANSI SQL, you are going to have to write a parser, which is lots of work.
Before embarking on writing any kind of database I think you should understand a lot of database theory, get a book, read it.
Take a look at the work done by the Prevayler guys. They make the observation that if you can fit the data in RAM, most usage scenarios show much better performance, and a lot less code to write, without an RDBMS. On the other hand, the Google and Amazon guys show that for large amounts of data you do not want to use an RDBMS either. As we're moving to 64-bit OSes and PCs with lots of RAM, RDBMSes are caught between a rock and a hard place.
The SO question "Implementing a database — How to get started" has some useful answers to your question!
Although this is a late response, there are a few basic scenarios you need to take into account before you do this, even if you have prior knowledge of how a database and its engine work.
1. Is it for heavy storage?
If so, you need to fine-tune the pages and work on a file format that doesn't take so much time to load and retrieve.
2. Does it need to handle many connections?
Again, the pages are important, but you may also need to build an engine as a service or an app-based instance working behind the scenes.
3. Is it for application usage or web usage?
If it is for the web, then really, use MySQL or MSSQL.
Do not opt for in-memory storage as your DB storage, because that nullifies the purpose of a database. Databases were created so that you can free up memory, releasing table objects after some amount of time and giving that memory back to the system.
If it is for light use, create a simple XML/custom-file database system, because you are not saving or altering large amounts of data at a time. Better yet, use SQLite, which is very well suited for that purpose. If it is for open-source or commercial use, do not go with in-memory storage, because you don't want to force someone to meet a high memory requirement; memory costs money, and some folks are still running 32-bit OSes.