Question: I currently store ASP.NET application data in XML files.
Now the problem is that I have asynchronous operations, which means I ran into the problem of simultaneous write access to an XML file...
Now I'm considering moving to an embedded database to solve the issue.
I'm currently considering SQLite and embeddable Firebird.
I'm not sure, however, whether SQLite or Firebird can handle multiple concurrent write accesses.
And I certainly don't want the same problem again.
Does anybody know?
SQLite is certainly better known, but which one is better, SQLite or Firebird? I tend to say Firebird, but I don't really know.
No MS-Access or MS-SQL-Express recommendations please, I'm a sane person.
I would choose Firebird for many reasons, and for this one too:
Although it is transactional, SQLite does not support concurrent transactions, so if your embedded application needs two or more connections, they must be serialized.
An embedded Firebird database is simple to upgrade to a fully shared database - just change the shared library.
Maybe you can also check this.
SQLite can be configured to gracefully handle simultaneous writes in most situations. What happens is that when one thread or process begins a write to the database, the file is locked. When a second write is attempted and encounters the lock, it backs off for a short period before attempting the write again, until it succeeds or times out. The timeout is configurable, but otherwise all this happens without the application code having to do anything special except enable the option, like this:
// set SQLite to wait and retry for up to 100 ms if the database is locked
sqlite3_busy_timeout(db, 100);
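For .NET callers, here is a minimal sketch of the same setting using the System.Data.SQLite provider and SQLite's busy_timeout PRAGMA (the connection string is an assumption, not from the original post):

using System.Data.SQLite;

// Open a connection and ask SQLite to retry locked writes for up to 100 ms
using (var connection = new SQLiteConnection("Data Source=app.db"))
{
    connection.Open();
    using (var command = new SQLiteCommand("PRAGMA busy_timeout = 100;", connection))
    {
        command.ExecuteNonQuery(); // applies to this connection only
    }
}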
All this works very well and without any difficulty, except in two circumstances:
If an application does a great many writes, say a thousand inserts, all in one transaction, then the database will be locked up for a significant period and can cause problems for any other application attempting to write. The solution is to break up such large writes into separate transactions, so other applications can get access to the database (a sketch follows this list).
If the database is shared by different processes running on different machines via a network-mounted disk. Many operating systems have bugs in network-mounted disks that make file locking unreliable. There is no good answer to this. If you need to share a database on a network-mounted disk, you need another database engine such as MySQL.
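To illustrate the first circumstance, here is a hedged sketch of breaking a large batch of inserts into separate transactions so the lock is released between chunks; the Items table, column, and chunk size are assumptions:

using System;
using System.Collections.Generic;
using System.Data.SQLite;

// Commit every 100 rows so other writers can acquire the lock in between.
public static void InsertInChunks(SQLiteConnection connection, IReadOnlyList<string> values)
{
    const int chunkSize = 100;
    for (int start = 0; start < values.Count; start += chunkSize)
    {
        int end = Math.Min(start + chunkSize, values.Count);
        using (var transaction = connection.BeginTransaction())
        {
            for (int i = start; i < end; i++)
            {
                using (var command = new SQLiteCommand(
                    "INSERT INTO Items (Value) VALUES (@value)", connection, transaction))
                {
                    command.Parameters.AddWithValue("@value", values[i]);
                    command.ExecuteNonQuery();
                }
            }
            transaction.Commit(); // the file lock is released here
        }
    }
}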
I do not have any experience with Firebird. I have used SQLite in situations like this in many applications over several years.
Have you looked into Berkeley DB with the SQLite API for SQL support?
It sounds like SQLite will be a good fit. We use SQLite in a number of production apps; it supports, and in fact prefers, transactions, which go a long way toward handling concurrency.
Transactional SQLite? In C#?
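In case it helps, a minimal sketch of a transaction from C# using the System.Data.SQLite provider (the connection string and Log table are assumptions):

using System.Data.SQLite;

// All statements inside the transaction succeed or fail together.
using (var connection = new SQLiteConnection("Data Source=app.db"))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    using (var command = new SQLiteCommand(
        "INSERT INTO Log (Message) VALUES (@message)", connection, transaction))
    {
        command.Parameters.AddWithValue("@message", "hello");
        command.ExecuteNonQuery();
        transaction.Commit();
    }
}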
I would add #3 to the list from ravenspoint above: if you have a large call-center or order-processing center, say, where dozens of people might be hitting the SAVE button at the same time, even if each is updating or inserting just one record, you can run into problems using the busy timeout approach.
For scenario #3, a true SQL engine that can serialize writes is ideal; less ideal but serviceable is a DBMS that can do byte-range record locking of a shared file. But be aware that even byte-range record locking will be inadequate for a large number of concurrent writes when new records are appended to the end of the file like a caboose on the end of a freight train, so that multiple processes are trying to set a lock on the same byte range at the same time. On the other hand, a byte-range record locking scheme coupled with a hashed-key sparse-file approach (e.g. the old Revelation/OpenInsight database for LANs) will be far superior to ISAM for this scenario.
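To make the byte-range idea concrete, here is a hedged sketch using FileStream.Lock to lock only one record's range of a shared file; the fixed record size and layout are assumptions, and this illustrates the locking primitive, not a full DBMS:

using System.IO;

// Lock, overwrite, and unlock a single fixed-size record in a shared file.
public static void UpdateRecord(string path, int recordIndex, byte[] record)
{
    using (var stream = new FileStream(path, FileMode.Open,
        FileAccess.ReadWrite, FileShare.ReadWrite))
    {
        long offset = (long)recordIndex * record.Length;
        stream.Lock(offset, record.Length); // other processes can still touch other records
        try
        {
            stream.Seek(offset, SeekOrigin.Begin);
            stream.Write(record, 0, record.Length);
        }
        finally
        {
            stream.Unlock(offset, record.Length); // always release the range
        }
    }
}

As the answer notes, appends are the weak spot: every writer trying to extend the file contends for the same end-of-file range.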
Related
I have a number of images that are stored as VARBINARY(MAX) (using FILESTREAM) in a database. I'm looking to retrieve about 10 images or so at a time.
The prescribed, most common way in ASP.NET is to use an HTTP handler and hit the database for each individual image. That seems fine, but it is a bit slow at times.
Is it best to download all images for a given page at the same time in one big data chunk? Or should I try to grab each individually? Best practice?
Probably best to do them individually on a domain that doesn't have cookies set, or make sure your handler will work with multiple simultaneous requests. That way you can stream multiple results from the DB at the same time, and stream multiple images from your webserver as it gets them.
Well,
I think many people would have different opinions and reasons about what the best practice is for them, but in reality it all depends on the hardware, the software, the data structure, and whether the data is normalized.
In general, SQL Server prefers set-based operations, meaning loops are generally slower. But loops are safer with regard to IOPS-related issues, and they tend to cause fewer locks.
I am not sure which object mapper or built-in SQL library you are using (I have a feeling you may be using LINQ after having built a SQL class), but it also depends on the library, and I would definitely recommend Dapper.
I think reading them all at once would be faster, and here is why:
- If it is as you say and you hit the database for each image, then every request adds the delay of reconnecting to the database, so the latency accumulates. But with a single connection the data retrieval is direct, and the connection is already open at that moment without requiring further session authentication.
I would recommend downloading them all at once and informing the end user with a progress screen while that happens. Also, for retrieving data, I believe this link is very helpful: https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx
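As a rough illustration of the one-round-trip approach, here is a hedged sketch using Dapper (recommended above); the Images table, ImageData column, and PageId filter are assumptions:

using System.Collections.Generic;
using System.Data.SqlClient;
using Dapper;

// Fetch all images for a page in a single query instead of one query per image.
public static IEnumerable<byte[]> LoadPageImages(string connectionString, int pageId)
{
    using (var connection = new SqlConnection(connectionString))
    {
        // Dapper buffers the results by default, so the connection can be
        // disposed after the query returns.
        return connection.Query<byte[]>(
            "SELECT ImageData FROM Images WHERE PageId = @PageId",
            new { PageId = pageId });
    }
}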
Depending on the features of your server, and edition, you could definitely use different features.
I'm currently working on a C# project for an application we'd like to develop. We're brainstorming over the question of sharing the data between users. We'd like to be able to specify a folder where all the files of the application are going to be saved, and we'd like that folder to be able to live on a share (server, different PC or Mac, NAS, etc.).
The deployment would be like so:
Installation on the first PC: we choose a network drive, share, whatever, and create all the files for the application in this location.
On the second PC we install the application and choose the same location (on the network); the application doesn't create anything, it sees that the files already exist and uses them as the application's data.
Same thing on the other clients.
The application's files are going to be documents (most likely XML-formatted documents), and when opening the application we want to show all the existing documents. The thing is, we don't only want to have the list of documents and be able to edit their content; we would also like to be able to edit the documents' properties, so in a way we'd like a file (SQLite, XML, whatever) representing the list of all the documents and their attributes. Same thing for a list of addresses.
I know all that looks exactly like a client/server-with-database solution, but that solution is out of the question. I was first looking at SQLite for my data files, but I know concurrency can be a real problem and file locking doesn't work well. The thing is, I would have the same problem with simple XML files (refreshing the content while several users are working, accessing locked files).
So I guess my final question is: is it feasible? Is there an alternative I didn't see which would allow us to do this more easily?
EDIT:
OK, I'm not responding to every post or comment, because I'm currently testing concurrency with SQLite. What I did, and please correct me if the way I'm testing this is wrong, is launch X BackgroundWorkers, each of which inserts records into a sample database (which is recreated every time I start the application). I tried launching 100 iterations of INSERT into the database via these BackgroundWorkers.
Of course concurrency works with one application running; it simply waits for the last BackgroundWorker to do its job and then writes the next record. I also tried inserting at (almost) the same time, meaning I put a loop in every BackgroundWorker waiting for a modulo-5 timestamp (every 5 seconds, every BackgroundWorker runs). Again, it waits for the previous insert query to end before doing the next, and everything works fine. I even tried it with 500 BackgroundWorkers and it worked fine.
I then tried launching my app several times and running the instances simultaneously. When doing this I did have some issues. With two instances of my app it was still working fine, but when trying this with 4-5 instances it got really buggy and I got two types of error: 1. database is locked; 2. disk I/O failure. But mostly locked databases.
What I did was pretty intensive; in the scenario of my application it will never come to 5 processes simultaneously trying to insert 500 rows each at the same time (maybe I'll get a concurrency of two or three connections). But what really bugged me, and what makes me think my testing method is not a good one, is that I got these errors when working on a database on a shared network location, on a NAS, AND on my own HDD. Every time it worked for maybe 30-40 queries before throwing the "database is locked" error.
Am I testing it wrong? Maybe I shouldn't be trying so hard to make this work, but I'm still not convinced that SQLite is a bad alternative for what I'm trying to do, since the concurrency is going to be really small.
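For reference, the test harness looks roughly like this (a reconstruction using tasks instead of BackgroundWorkers; the connection string, table, and counts are placeholders):

using System.Data.SQLite;
using System.Threading.Tasks;

// N concurrent workers each insert rows into the same database file.
public static void RunConcurrencyTest(string path, int workers, int insertsPerWorker)
{
    Parallel.For(0, workers, worker =>
    {
        using (var connection = new SQLiteConnection("Data Source=" + path))
        {
            connection.Open();
            using (var pragma = new SQLiteCommand("PRAGMA busy_timeout = 5000;", connection))
            {
                pragma.ExecuteNonQuery();
            }
            for (int i = 0; i < insertsPerWorker; i++)
            {
                using (var command = new SQLiteCommand(
                    "INSERT INTO Sample (Value) VALUES (@value)", connection))
                {
                    command.Parameters.AddWithValue("@value", i);
                    command.ExecuteNonQuery(); // retries while locked, up to the busy timeout
                }
            }
        }
    });
}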
With your optimistic/pessimistic locking, you are ultimately trying to build a database. Also, you WILL have consistency issues while trying to keep multiple files in sync with each other. Think about what happens if you update the "metadata" file and the write fails half-way through because of a network blip: file corruption will ensue, and you will be left trying to reconstruct things from backups.
I would suggest a couple of likely solutions:
1) Host the content yourselves and let the workstations be pure clients (cloud-based deployments are ideal for this). Most network/firewall issues can be circumvented by using HTTP as your transport (web services).
2) Have one of the workstations be the "server", which keeps its data files on the NFS. This will give you transactional integrity, incremental backups, etc. There are lots of good embedded database management systems to help you manage this complexity. MS SQL Server even has some great options for this.
You're right: SQLite uses file locks on the database file, so storing all the data files in one database would bring a write-starvation problem when editing your documents.
Maybe it's a better choice to implement simple optimistic/pessimistic locking yourself at the individual-file level? For example, with pessimistic locking you just don't allow anyone to edit a particular file if somebody is already in the process of editing it. In that case you hold a lock on just one file, not on the entire database. If the possibility of a conflict (editing a particular file at the same time) is pretty low, it is better to go with optimistic locking.
A simple optimistic locking implementation:
When a user gets a file for reading, it's OK; no problem there. If a user gets a file for editing, you can calculate a hash of the file (or take the timestamp of the file's last update), and then, when the user tries to save the edited file, compare the current (at the moment of saving) hash/timestamp to make sure the file has not been changed by somebody else. If the file has not been changed, it's OK to save it. If the file has been changed, the current user is out of luck and you need to inform him about it. This optimistic scenario is nice when the possibility of being "out of luck" is pretty low; otherwise it's better to stick with pessimistic locking, where you do not allow a user to even start editing a file if somebody else is already doing it.
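A minimal sketch of that hash check in C# (using SHA-256, which is my choice rather than something from the answer):

using System;
using System.IO;
using System.Security.Cryptography;

// Hash the file when the user opens it for editing.
public static string ComputeFileHash(string path)
{
    using (var sha = SHA256.Create())
    using (var stream = File.OpenRead(path))
    {
        return Convert.ToBase64String(sha.ComputeHash(stream));
    }
}

// Re-hash just before saving; refuse the save if somebody else changed the file.
public static bool TrySave(string path, string hashAtOpen, byte[] newContent)
{
    if (ComputeFileHash(path) != hashAtOpen)
    {
        return false; // file changed since it was opened; inform the user
    }
    File.WriteAllBytes(path, newContent);
    return true;
}

A real implementation would also hold an exclusive lock on the file between the re-hash and the write, since another save could still slip in between the two steps.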
I am not sure whether this has been asked before (I googled it).
Well, I have written a web service that will be hosted with a SQLite database.
Many clients will be performing CRUD operations on it. I planned to use SQLite just for simplicity.
Now, having written most of my methods, it occurred to me that there is no DBMS server in front of SQLite (I suppose), so there may be conflicts and data-inconsistency issues if two or more client applications write through my application.
Or does SQLite support managing operations for multiple connections? Or do I have to switch to SQL Server 2008?
SQLite "supports managing of operation for multiple connections" in the sense that it won't blow up or cause data corruption. It is not, however, designed to be as efficient as MS-SQL Server is with a high load of concurrent operations. So, what it boils down to is how many is "Many clients". If you are talking about tens of simultaneous requests, you will be fine with SQLite. If you are talking about hundreds of simultaneous requests, you will probably need to migrate to MS-SQL Server. Note that in order for two requests to be simultaneous the two clients must press the 'Submit' button at roughly the same few-millisecond time window. So it takes hundreds of simultaneously connected clients to get dozens of simultaneous requests.
The short answer is yes. Take a look at this SQLite FAQ entry. The longer answer is a bit more complicated... Would you want to use SQLite in an architecture that is meant to handle heavy transaction loads? Probably not. If you do want to move in that direction, I would suggest starting with SQL Server Express. If you later need to upgrade to a full-blown SQL Server, it won't be an issue at all...
SQLite FAQ excerpt:
(5) Can multiple applications or multiple instances of the same application access a single database file at the same time?
Multiple processes can have the same database open at the same time. Multiple processes can be doing a SELECT at the same time. But only one process can be making changes to the database at any moment in time, however.
SQLite uses reader/writer locks to control access to the database. [...]
Yes, SQLite supports concurrency and locking.
I have a C# console application which does some processing and then writes to the database. It is deployed multiple times on a server with different config settings to do slightly different things. However, all the instances have to write to the same database (and may need to insert the same data into the same table if it doesn't already exist) using Linq to Entities.
If I were using threads, I could lock the method; with stored procedures, I could queue up the writes to avoid clashes. But is there any way to keep these as separate applications and prevent them from trying to write the same thing to the database at the same time?
I'm getting an exception every so often when there is a conflict.
Edit:
I'm not necessarily trying to debug why I'm getting the exception; I'm looking more for suggestions of a 'best practice' way of doing this, e.g. should this be handled at the console-app level, the L2E level, or the database level?
Why can't you start a transaction with a high isolation level, so that the lock is held on the server side?
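A hedged sketch of that idea, wrapping the Linq to Entities work in a serializable TransactionScope; MyEntities, Items, and the key check are placeholders for whatever the apps actually write:

using System.Linq;
using System.Transactions;

// Serializable isolation makes the check-then-insert atomic on the server,
// so two processes cannot both pass the existence check.
public static void InsertIfMissing(string key)
{
    var options = new TransactionOptions { IsolationLevel = IsolationLevel.Serializable };
    using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
    using (var context = new MyEntities()) // the generated object context
    {
        if (!context.Items.Any(item => item.Key == key))
        {
            context.Items.AddObject(new Item { Key = key });
            context.SaveChanges();
        }
        scope.Complete();
    }
}

Be aware that serializable check-then-insert can deadlock under contention, so catching the deadlock exception and retrying is a common companion pattern.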
You may use locks (pessimistic concurrency model) or timestamps (optimistic concurrency model) to deal with concurrency issues.
It is a very wide topic, so I would suggest you start by googling for database concurrency.
I have a data file, and from time to time I need to write a change to it. The change consists of changing information in more than one place, for example changing some data near the end of the file and also changing some information near the start. I want the two separate writes to either both succeed or both fail; otherwise the file is left in an uncertain state and is effectively corrupted. Is there any built-in support for this scenario in .NET, or in general?
If not, how do others solve this issue? How does a database on Windows solve it?
UPDATE: I do not want to use the Transactional NTFS capability because it is not available on older versions of Windows such as XP, and it is slow in the file-overwrite scenario described above.
Databases basically use a journal concept (at least the ones I'm aware of). The idea is that a write operation is recorded in the journal until the writer commits the transaction. (Sure, that's just a basic description; it's simplified.)
In your case, it could be a copy of your file to which you write the data; if everything finishes successfully, you substitute the original file with its copy.
The substitution is: rename the original file to an "old" name, then rename the new copy to the original name.
If the substitution fails, that is a critical error the application should handle via fault-tolerance strategies. It could, for example, inform the user about the failed save operation and try to recover. Note that at any moment you have both versions of your file: the one from when the write operation started and the one from when it finished.
We used this technique on past projects (VS-IDE-like systems for industrial control) with pretty good success.
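For the .NET case, a minimal sketch of the approach using File.Replace, which swaps the files and keeps the old copy in one call (the file-name suffixes are placeholders):

using System.IO;

// Write the full new version to a temp file first, then swap it in.
public static void SafeOverwrite(string path, byte[] newContent)
{
    string temp = path + ".new";
    File.WriteAllBytes(temp, newContent);
    // Replace swaps the temp file into place and keeps the previous version
    // as a backup; if it throws, the original file is still intact.
    File.Replace(temp, path, path + ".old");
}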
If you are using Windows Vista or later (Vista/7/2008/2008 R2), the NTFS filesystem supports transactions (including within a distributed transaction), but you will need to use P/Invoke to call the Win32 APIs (see this question).
If you need to run on older versions of Windows or on non-NTFS partitions, you would need to perform the transactions yourself. This is decidedly non-trivial: getting full ACID functionality while handling multiple processes (including remote access via shares) across process and system crashes, even with the assumption that only your access methods will be used (some other process using normal Win32 APIs would of course break things).
In this case a database will almost certainly be easier: there are a number of in-process databases (SQL Server Compact Edition, SQLite, ...), so a database doesn't require a server process.