SQLite, Berkeley DB benchmarking - C#

I want to create a desktop application in C#, and for that I want to use an embedded database like SQLite or Berkeley DB. How can I start benchmarking these databases?

Recently, Oracle added the sqlite3 interface on top of BDB's btree storage, so you should be able to write your code against sqlite3 and then plug in BDB. The catch is licensing: BDB forces you to either pay or go open source; SQLite lets you do whatever you want.
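If you do want to benchmark, a simple starting point is to time the same workload through the sqlite3-compatible API and compare engines. Below is a minimal sketch using the System.Data.SQLite ADO.NET provider and Stopwatch; the table name and row count are made up for illustration, and the idea is that the same code could later be pointed at BDB's SQL interface for comparison.

using System;
using System.Data.SQLite;
using System.Diagnostics;

class InsertBenchmark
{
    static void Main()
    {
        using (var conn = new SQLiteConnection("Data Source=bench.db"))
        {
            conn.Open();
            new SQLiteCommand(
                "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)",
                conn).ExecuteNonQuery();

            // Time a batch of inserts inside a single transaction.
            var sw = Stopwatch.StartNew();
            using (var tx = conn.BeginTransaction())
            {
                var cmd = new SQLiteCommand("INSERT INTO items (name) VALUES (@name)", conn, tx);
                cmd.Parameters.Add(new SQLiteParameter("@name"));
                for (int i = 0; i < 10000; i++)
                {
                    cmd.Parameters["@name"].Value = "item " + i;
                    cmd.ExecuteNonQuery();
                }
                tx.Commit();
            }
            sw.Stop();
            Console.WriteLine("10,000 inserts took {0} ms", sw.ElapsedMilliseconds);
        }
    }
}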

Before thinking about benchmarking, you need to compare the features of the databases.
SQLite and BDB are completely different in the features they support. If the data is complicated, I'd suggest SQLite for easier querying of relational data (if that's how your data is laid out).

I agree with Osama that you should compare the features you're after first.
However, I disagree that "complicated" data should automatically drive you toward sqlite. While I haven't seen any benchmarks (nor have cared to write any), I have a gut reaction (whatever that's worth) that says BerkeleyDB is going to outperform nearly every time.
That said, I don't think that's what I'd use to make my own decision. It goes back to those features. If all I want is a simple data store, then I'd probably choose sqlite because it's going to be easier. Likewise, if I want to be able to arbitrarily query my data on any field, or possibly one day store it in an "enterprise" SQL database, I'd likely go with sqlite because future migration will be easier. If, however, I intend to move beyond a simple data store, and am eyeing transactional safety, high concurrency, high availability, many readers and writers, etc., and I have a set of fairly well-defined "queries", then I probably want BDB.
Notice that "complexity" of my data doesn't really enter into these equations. The reason is simple. BDB can hold my object in its native serialized format. SQL of any flavor comes with the famous impedance mismatch which, IMO, complicates my application.
If you are seriously considering BDB, I need to warn you that you should decide the type of storage you're going to use up front, as the different types of stores that BDB offers are not necessarily compatible.

Related

Most effective way of storing and managing moderate number of users

In a current project of mine I need to manage and store a moderate number (from 10-100 to 5000+) of users (ID, username, and some other data).
This means I have to be able to find users quickly at runtime, and I have to be able to save and restore the database to continue statistics after a restart of the program. I will also need to register every connect/disconnect/login/logout of a user for the statistics. (And some other data as well, but you get the idea).
In the past, I saved settings and other stuff in encoded text files, or serialized the needed objects and wrote them out. But these methods require me to rewrite the whole database on each change, and that increasingly slows it down (especially with a growing number of users/entries), doesn't it?
Now the question is: What is the best way to do this kind of thing in C#?
Unfortunately, I don't have any experience in SQL or other query languages (except for a bit of LINQ), but that's not posing any problem for me, as I have the time and motivation to learn one (or more if required) for this task.
"Most effective" is highly subjective and depends on who you ask, even after narrowing the question down to specific needs. If you are storing non-relational data, Mongo or some other NoSQL type of database such as Raven DB would be effective. If your data has a relational shape, then an RDBMS such as MySQL, SQL Server, or Oracle would be effective. Relational databases are ideal if you are going to have heavy reporting requirements, as they allow non-developers more ease of access in writing simple SQL queries against the data. Also keep in mind the performance benefit of the disk caching that databases provide: commonly accessed data is stored in memory to save round trips to the disk (with hybrid drives, I suppose, accessing some files directly accomplishes the same thing; however, SSDs are still not as fast as RAM access). So you really need to ask yourself some questions to identify the best solution for you: What is the shape of your data (flat, relational, etc.)? Do you have reporting requirements where less technical team members need to be able to query the data repository? And what are your performance metrics?

ADO and Microsoft SQL database backup and archival

I am working on the re-engineering/upgrade of a tool. The database communication is in C++ (unmanaged ADO) and connects to SQL Server 2005.
I had a few queries regarding archiving and backup/restore techniques.
Generally, archiving is different from backup/restore. Can someone provide a link that explains this? Presently the solution uses the bcp tool for archival. I see a lot of dependency on table names in the code. What are the things I have to consider in choosing the design (considering I have to run the backup/archival on a button click, with a database size of 100 MB at most)?
Will moving the entire communication to .NET be of any help, considering the many ORM tools available? Also, all the business logic and UI is in C#.
What is the best method to verify the archived data?
PS: The question might be too high level, but I did not find any proper link to understand this. It will be really helpful if someone can answer. I can provide more details!
Thanks in advance!
At 100 MB, I would say you should probably not spend too much time on archiving, and just use traditional backup strategies. The size of your database is so small that archiving would be quite an elaborate operation with very little gain, as the archiving process would typically only be relevant in the case of huge databases.
Generally speaking, a backup in database terms is a way to provide recoverability in case of a disaster (accidental data deletion, server crash, etc). Archiving mostly means you partition your data.
A possible goal with archiving is to keep specific data available for querying, but without the ability to alter it. When dealing with high-volume databases, this is an excellent way to increase performance, as read-only data can be indexed much more densely than "hot" data. It also allows you to move the read-only data to an isolated RAID partition that is optimized for READ operations and will not have to bother with the typical RDBMS IO. Also, removing the non-active data from the regular database means the size of the data contained in your tables will decrease, which should boost performance of the overall system.
Archiving is typically done for legal reasons. The data in question might not be important for the business anymore, but the IRS or banking rules require it to be available for a certain amount of time.
Using SQL Server, you can archive your data using partitioning strategies. This normally involves figuring out the criteria on which you will split the data. An example of this could be a date (e.g. data older than 3 years will be moved to the archive part of the database). In the case of huge systems, it might also make sense to split data based on geographical criteria (e.g. the Americas on one server, Europe on another).
To answer your questions:
1) See the explanation written above
2) It really depends on what the goal of upgrading is. Moving it to .NET will make the code managed, but how important is that for the business?
3) If you do decide to partition, verifying it works could include issuing a query on the original database for data that contains values both before and after the threshold you will be using for partitioning, then splitting the data, and re-issuing the query afterwards to verify it still returns the same result set (a sketch of such a check is shown below). If you configure the system to use an automatic sliding window, you could also keep an eye on the system to ensure that data will automatically be moved to the archive partition.
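For point 3, a very simple sanity check is to run the same aggregate query against the live data and the archived data and compare the results. Here is a rough sketch using plain ADO.NET; the connection strings, table names, and cutoff date are hypothetical and only meant to illustrate the shape of such a check.

using System;
using System.Data.SqlClient;

class ArchiveCheck
{
    static long CountRows(string connectionString, string sql)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            return Convert.ToInt64(cmd.ExecuteScalar());
        }
    }

    static void Main()
    {
        // Hypothetical names: adjust to your own schema and archive location.
        long before = CountRows(
            "Data Source=.;Initial Catalog=ToolDb;Integrated Security=True",
            "SELECT COUNT(*) FROM Orders WHERE OrderDate < '2008-01-01'");
        long archived = CountRows(
            "Data Source=.;Initial Catalog=ToolDbArchive;Integrated Security=True",
            "SELECT COUNT(*) FROM Orders");

        Console.WriteLine(before == archived
            ? "Archive row count matches."
            : "Mismatch - investigate before purging the source data.");
    }
}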
Again, if the 100MB is not a typo, I would think your database is too small to really benefit from archiving. If your goal is to speed things up, put the system on a server that is able to load the whole database into RAM, or use SSD drives.
If you need to establish a data archive for legal or administrative reasons, give horizontal table partitioning a look. It's a pretty straight-forward process that is mostly handled by SQL Server automatically.
Hope this helps you out!

How to store my data (C#.net)

I'm having a bit of a problem deciding how to store some data. To see it from a simple perspective, it will be a simple table of data but there will be many tables. There will be about 7 columns in each table, but again there will be a lot of tables (and they will be created at runtime, whenever the customer wants a clean grid)
The data has to be stored locally in a file (and there will not be multiple instances of the software running).
I'm using C# 4.0 and I have been looking at using XML files (one file per table, or storing multiple tables in a file), SQLite, SQL Server CE, Access, etc. I will be happy if someone here has some comments or suggestions on what to do and what not to do. Stability and reliability (e.g. no trashed databases because of unstable third-party software) are probably my biggest concerns.
If you are looking to store the data locally in a file, I would recommend the sqlite option since it seems your data is created in the form of a database table already. Sqlite is already built to handle multiple tables and columns so it means less mental overhead for you, the developer.
http://web.archive.org/web/20100208133236/http://www.mikeduncan.com/sqlite-on-dotnet-in-3-mins/ is a decent tutorial to give a quick overview on how to set it up and get going.
As for what NOT to do: don't try to make your own scheme to save the data to a file, it's a well understood problem that has been solved many times over, why re-invent the wheel?
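To give an idea of how little code is involved, here is a minimal sketch using the System.Data.SQLite provider to create one of your runtime tables; the table and column names are placeholders, since your actual grid columns are up to you. (Note that table names cannot be passed as SQL parameters, so the name is concatenated here; only do this with names your own code generates.)

using System.Data.SQLite;

class GridStore
{
    public static void CreateGrid(string dbPath, string gridName)
    {
        using (var conn = new SQLiteConnection("Data Source=" + dbPath))
        {
            conn.Open();
            // Placeholder columns; a real grid would define its own seven columns.
            string sql = "CREATE TABLE IF NOT EXISTS [" + gridName + "] (" +
                         "Id INTEGER PRIMARY KEY, Col1 TEXT, Col2 TEXT, Col3 REAL)";
            using (var cmd = new SQLiteCommand(sql, conn))
            {
                cmd.ExecuteNonQuery();
            }
        }
    }
}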
XML won't be a good choice if you are planning to make several queries, since loading text files can be painful once they grow (talking about files over 1 MB). If you plan to keep the amount of data small, XML would be fine and keeps things simple. I still wouldn't use it, but if you already have a background with it, the benefits may outweigh the learning curve.
If you have no expertise in any of them, and the data is light, my suggestion is SQLite; I believe it is the best lightweight DB for .NET and the provider is very good. You can find it easily on Google.
I would tell you that Access is not recommendable, but this is a personal opinion. Many people use it, and I think that is for a reason. So you should check it out and try it.
Again, my final recommendation is SQLite, unless you know another one very well, in which case you'll have to think about how much your data is going to grow. If you plan to have a DB around 100 MB, any of them except XML would do; if you think it'll grow bigger than that, lean heavily toward SQLite.

Best way to query different database engines in a uniform way?

I work on a C# client application (SlimTune Profiler) that uses relational (and potentially embedded) database engines as its backing store. The current version already has to deal with SQLite and SQL Server Compact, and I'd like to experiment with support for other systems like MySQL, Firebird, and so on. Worse still, I'd like it to support plugins for any other backing data store -- and not necessarily ones that are SQL based, ideally. Topping off the cake, the frontend itself supports plugins, so I have an unknown many-to-many mapping between querying code and engines handling the queries.
Right now, queries are basically handled via raw SQL code. I've already run into trouble making complex SELECTs work in a portable way. The problem can only get worse over time, and that doesn't even consider the idea of supporting non-SQL data. So then, what is the best way to query wildly disparate engines in a sane way?
I've considered something based on LINQ, possibly the DbLinq project. Another option is object persistence frameworks, Subsonic for example. But I'm not too sure what's out there, what the limitations are, or if I'm just hoping for too much.
(An aside, for the inevitable question of why I don't settle on one engine. I like giving the user a choice of the engine that works best for them. SQL Compact allows replication to a full SQL Server instance. SQLite is portable and supports in-memory databases. I can imagine a situation where a company wants to drop in a MySQL plugin so that they can easily store and collate an application's performance data over the course of time. Last and most importantly, I find the idea that I should have to be dependent on the implementation details of my underlying database engine to be absurd.)
Your best bet is to use an interface for all of your database access. Then, for each database type you want to support, implement that interface for that database. That is what I've had to do for projects in the past.
The problem with many database systems and storage tools is that they aim to solve different problems. You might not even want to store your data in a SQL database but instead store it as files in the App_Data folder of a web application. With an interface method you could do that quite easily.
There generally isn't a solution that fits all database and storage solutions well or even a few of them well. If you find one that claims it does I still wouldn't trust it. When you have a problem with one of the databases it's going to be much easier for you to dig through your objects than it will be to go dig through theirs.
Use an object-relational mapper. This will provide a high level of abstraction away from the different database engines, and won't impose (many) limitations on the kind of queries you can run. Many ORMs also include LINQ support. There are numerous questions on SO providing recommendations and comparisons (e.g. What is your favorite ORM for .NET? appears to be the most recent and has links to several others).
I would recommend the repository pattern. You can create a class that encapsulates all the actions that you need the database for, and then create a different implementation for each database type you want to support. In many cases, for relational data stores, you can use the ADO.NET abstractions (IDbConnection, IDataReader, IDataAdapter, etc) and create a single generic repository, and only write specific implementations for the database types that do not provide an ADO.NET driver.
public interface IExecutionResultsRepository
{
    void SaveExecutionResults(string name, ExecutionResults results);
    ExecutionResults GetExecutionResults(int id);
}
I don't actually know what you are storing, so you'd have to adapt this for your actual needs. I'm also guessing this would require some heavy refactoring as you might have sql statements littered throughout your code. And pulling these out and encapsulating them might not be feasible. But IMO, that's the best way to achieve what you want to do.
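To make the repository idea a bit more concrete, here is a rough sketch of a SQLite-backed implementation of the interface above, written against the generic ADO.NET abstractions (IDbConnection/IDbCommand) so the same pattern could be reused for other providers. The table schema and the Serialize/Deserialize methods on ExecutionResults are invented for illustration; the real types depend on what SlimTune actually stores.

using System.Data;
using System.Data.SQLite;

public class SQLiteExecutionResultsRepository : IExecutionResultsRepository
{
    private readonly string _connectionString;

    public SQLiteExecutionResultsRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public void SaveExecutionResults(string name, ExecutionResults results)
    {
        using (IDbConnection conn = new SQLiteConnection(_connectionString))
        {
            conn.Open();
            IDbCommand cmd = conn.CreateCommand();
            cmd.CommandText = "INSERT INTO ExecutionResults (Name, Payload) VALUES (@name, @payload)";

            IDbDataParameter nameParam = cmd.CreateParameter();
            nameParam.ParameterName = "@name";
            nameParam.Value = name;
            cmd.Parameters.Add(nameParam);

            IDbDataParameter payloadParam = cmd.CreateParameter();
            payloadParam.ParameterName = "@payload";
            payloadParam.Value = results.Serialize(); // hypothetical serialization method
            cmd.Parameters.Add(payloadParam);

            cmd.ExecuteNonQuery();
        }
    }

    public ExecutionResults GetExecutionResults(int id)
    {
        using (IDbConnection conn = new SQLiteConnection(_connectionString))
        {
            conn.Open();
            IDbCommand cmd = conn.CreateCommand();
            cmd.CommandText = "SELECT Payload FROM ExecutionResults WHERE Id = @id";

            IDbDataParameter idParam = cmd.CreateParameter();
            idParam.ParameterName = "@id";
            idParam.Value = id;
            cmd.Parameters.Add(idParam);

            object payload = cmd.ExecuteScalar();
            return ExecutionResults.Deserialize(payload); // hypothetical deserialization method
        }
    }
}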

Simple Object to Database Product

I've been taking a look at some different products for .NET which propose to speed up development time by providing a way for business objects to map seamlessly to an automatically generated database. I've never had a problem writing a data access layer, but I'm wondering if this type of product will really save the time it claims. I also worry that I will be giving up too much control over the database and make it harder to track down any data level problems. Do these type of products make it better or worse in the already tough case that the database and business object structure must change?
For example:
Object Relation Mapping from Dev Express
In essence, is it worth it? Will I save "THAT" much time, effort, and future bugs?
I have used SubSonic and EntitySpaces. Once you get the hang of them, I believe they can save you time, but as the complexity of your app and the volume of data grow, you may outgrow these tools. You start to lose time trying to figure out whether something like a performance issue is related to the ORM or to your code. So, to answer your question, I think it depends. I tend to agree with Eric on this: high-volume enterprise apps are not a good place for general-purpose ORMs, but in standard-fare smaller CRUD-type apps, you might see some time saved.
I've found iBatis from the Apache group to be an excellent solution to this problem. My team is currently using iBatis to map all of our calls from Java to our MySQL backend. It's been a huge benefit as it's easy to manage all of our SQL queries and procedures because they're all located in XML files, not in our code. Separating SQL from your code, no matter what the language, is a great help.
Additionally, iBatis allows you to write your own data mappers to map data to and from your objects to the DB. We wanted this flexibility, as opposed to a Hibernate type solution that does everything for you, but also (IMO) limits your ability to perform complex queries.
There is a .NET version of iBatis as well.
I've recently set up ActiveRecord from the Castle Project for an app. It was pretty easy to get going. After creating a new app with it, I even used MyGeneration to script out class files for a legacy app that ActiveRecord could use, in a pretty short time. It uses NHibernate to interact with the database, but takes away all the XML mapping that comes with NHibernate. The nice thing, though, is that since you already have NHibernate in your project, you can use its full power if you have some special cases. I'd suggest taking a look at it.
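For a feel of what a mapped class looks like, here is a small sketch in the Castle ActiveRecord style. The entity and its properties are made up, and exact attribute usage can vary between versions, so treat it as illustrative rather than a drop-in snippet.

using Castle.ActiveRecord;

[ActiveRecord("Customers")]
public class Customer : ActiveRecordBase<Customer>
{
    [PrimaryKey]
    public int Id { get; set; }

    [Property]
    public string Name { get; set; }

    [Property]
    public string Email { get; set; }
}

// Usage (after ActiveRecordStarter.Initialize has been called with your configuration):
// var customer = new Customer { Name = "Acme", Email = "info@acme.example" };
// customer.Save();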
There are lots of choices of ORMs. Linq to Sql, nHibernate. For pure object databases there is db4o.
It depends on the application, but for a high volume enterprise application, I would not go this route. You need more control of your data.
I was discussing this with a friend over the weekend and it seems like the gains you make on ease of storage are lost if you need to be able to query the database outside of the application. My understanding is that these databases work by storing your object data in a denormalized fashion. This makes it fast to retrieve entire sets of objects, but if you need to select data from a perspective that doesn't match your object model, the ODBMS might have a hard time getting at the particular data you want.
