Locking index with lucene.net - c#

I've got a Lucene index on a SAN (storage area network) which is used by several instances of Lucene distributed across several machines. When an instance of Lucene updates the index, say to add or update a document, it takes a NativeFSLock, which makes it impossible for others to write at the same moment. This works fine!
The thing is, I want to be able to send a batch of updates to any instance of Lucene and have it apply all the updates before releasing the lock. In Lucene.Net there is no AddDocuments method, only AddDocument, so I have to loop through all my documents and add them one at a time. As soon as one document is added, Lucene releases the lock and then takes a new lock for the next file. So if someone else tries to update or add a document at the same time, it sometimes obtains the lock in that little time span, and when that happens only some of my batch goes through (a race condition).
I want to obtain a lock and not release it until my whole batch is done. Any suggestions?
Best Regards

Don't use the NativeFSLockFactory on an index stored on a SAN, or use a dedicated node to write to the index.
Try using the SimpleFSLockFactory and see if it solves your locking issues. But with SimpleFSLockFactory another problem may arise: it leaves its lock behind if your process crashes while it holds the lock.
More details here:
http://lucene.apache.org/core/old_versioned_docs/versions/2_9_4/api/all/org/apache/lucene/store/NativeFSLockFactory.html
I'd suggest you test your environment with the locking test tools specified in the link above. (VerifyingLockFactory, LockVerifyServer and LockStressTest)
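For illustration, here is a minimal sketch (assuming a Lucene.Net 2.9-style API; the BatchIndexer class and paths are made up) that combines the lock-factory suggestion with the batching concern from the question: keep a single IndexWriter open for the whole batch, so write.lock is held from the first document to the last and only released when the writer is closed.

    using System.Collections.Generic;
    using System.IO;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Store;

    public static class BatchIndexer
    {
        public static void IndexBatch(string indexPath, IEnumerable<Document> batch)
        {
            // Open the shared directory and switch it to SimpleFSLockFactory.
            FSDirectory dir = FSDirectory.Open(new DirectoryInfo(indexPath));
            dir.SetLockFactory(new SimpleFSLockFactory(indexPath));

            var writer = new IndexWriter(dir,
                new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29),
                IndexWriter.MaxFieldLength.UNLIMITED);
            try
            {
                foreach (Document doc in batch)
                    writer.AddDocument(doc);   // write.lock stays held between documents

                writer.Commit();               // publish the whole batch at once
            }
            finally
            {
                writer.Close();                // releases write.lock
            }
        }
    }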

Related

nHibernate locking - does LockMode.UPGRADE block a subsequent LockMode.NONE read on the same row

We have a card order process which can take a few seconds to run. Data in this process is read using LockMode.UPGRADE in NHibernate.
A second (webhook) process which runs with LockMode.NONE is occasionally triggered before the first order process has completed, creating a race condition, and appears to be using the original row data. It is not blocked until the first process completes, so it gets old data.
Our database is not running with NOWAIT or any of the READ COMMITTED SNAPSHOT settings.
My question is: can LockMode.NONE somehow ignore the UPGRADE lock and read the old data (from a cache, perhaps)?
Thanks.
Upgrade locks are for preventing another transaction from acquiring the same data for upgrade too.
A transaction running in read committed mode without additional locks will still be able to read the data. You need an exclusive lock to block read committed reads (provided they are not using snapshot isolation). Read more on Technet.
As far as I know, NHibernate does not provide a way to issue exclusive locks on entity reads. (You will get them by actually updating the entities in the transaction.) You may work around this by using CreateSQLQuery to issue your locks directly in the DB, or have your webhook process acquire an upgrade lock too.
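A hedged sketch of that CreateSQLQuery workaround (the table name, column name and lock hints are illustrative and assume SQL Server; session and orderId come from your own code):

    using NHibernate;

    public static class OrderLocks
    {
        public static void ProcessOrderWithExclusiveLock(ISession session, int orderId)
        {
            using (ITransaction tx = session.BeginTransaction())
            {
                // Take an exclusive row lock inside the current transaction so that
                // plain read-committed readers block until the processing commits.
                session.CreateSQLQuery(
                        "SELECT Id FROM CardOrder WITH (XLOCK, ROWLOCK) WHERE Id = :id")
                    .SetParameter("id", orderId)
                    .UniqueResult();

                // ... rest of the order processing ...

                tx.Commit();   // commit releases the lock
            }
        }
    }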
But this path will surely lead to deadlocks or lock contention.
More generally, avoiding explicit locking by adapting your code patterns is preferable. I have just written an example here; see the "no explicit lock" solution.
As for your webhook process, is there really any difference between these two situations?
It obtains its data, which did not have any update lock on it, but before it processes the data, the data gets updated by the order process.
It obtains its data, including some rows which had an update lock on them.
Generally you still have the same problem to solve, and avoiding 2 will not resolve 1. Here too, it is the way the process works with its data that needs to be adapted to support concurrency. (Hint: using long-running transactions to solve 1 will surely lead to lock contention too.)

Application architecture with data on a shared network, without a database on the server

I'm currently working on a C# project of an application we'd like to develop. We're brainstorming over the question of sharing the data between users. We'd like to be able to specify a folder where all the files of the application are going to be saved and we'd like to be able to save them on a shared folder (server, different PC or Mac, Nas, etc.).
The deployment would go like this:
Installation on the first PC: we choose a network drive, share, whatever, and create all the files for the application in this location.
On the second PC we install the application and choose the same location (on the network); the application doesn't create anything, it sees that the files already exist and uses them as the application's data.
Same thing on the other clients.
The application's files are going to be documents (most likely XML formatted documents), and when opening the application we want to show all the existing documents. The thing is, we don't only want to have the list of documents and be able to edit their content, we would also like to be able to edit the documents' properties, so in a way we'd like a file (SQLite, XML, whatever) representing the list of all the documents and their attributes. Same thing for a list of addresses.
I know all that looks exactly like a client/server solution with a database, but that solution is out of the question. I was first looking at SQLite for my data files, but I know concurrency can be a real problem and file locking doesn't work well. The thing is, I would have the same problem with simple XML files (refreshing the content while several users are working, accessing locked files).
So I guess my final question is: is it feasible? Is there an alternative I didn't see which would allow us to do this more easily?
EDIT:
OK, I'm not responding to every post or comment because I'm currently testing concurrency with SQLite. What I did, and please correct me if the way I test this is wrong, is launch X BackgroundWorkers which all insert records into a sample database (which is recreated every time I start the application). I tried launching 100 iterations of INSERT into the database via these BackgroundWorkers.
Of course concurrency works with one application running; it simply waits for the last BackgroundWorker to do its job and then writes the next record. I also tried inserting at (almost) the same time, meaning I put a loop in every BackgroundWorker waiting for a modulo-5 timestamp (every 5 seconds, every BackgroundWorker runs). Again, it waits for the previous insert query to end before doing the next one, and everything works fine. I even tried it with 500 BackgroundWorkers and it worked fine.
I then tried launching my app several times and running the instances simultaneously. When doing this I did have some issues. With two instances of my app it was still working fine, but when trying this with 4-5 instances it got really buggy and I got two types of error: 1. database is locked, 2. disk I/O failure. But mostly locked databases.
What I did was pretty intensive; in the scenario of my application it will never come to 5 processes simultaneously inserting 500 rows at the same time (maybe I'll get a concurrency of two or three connections). But what really bugged me, and what makes me think my testing method is not really a good one, is that I got these errors trying to work on a database on a shared network, on a NAS AND on my own HDD. Every time it worked for maybe 30-40 queries before throwing the "database is locked" error.
Am I testing it wrong? Maybe I shouldn't be trying so hard to make this work, but I'm still not convinced that SQLite is not a good alternative for what I'm trying to do, since the concurrency is going to be really small.
With your optimistic/pessimistic locking, you are ultimately trying to build a database. Also, you WILL have issues with consistency while trying to keep multiple files in sync with each other. Think about what happens if you update the "metadata" file and the write fails half-way through because of a network blip. File corruption will ensue, and you will be left trying to reconstruct things from backups.
I would suggest a couple of likely solutions:
1) Host the content yourselves and let them be pure clients (cloud-based deployments are ideal for this). Most network/firewall issues can be circumvented by using HTTP as your transport (web services).
2) Have one of the workstations be the "server", which keeps its data files on the NFS. This will give you transactional integrity, incremental backups, etc. There are lots of good embedded database management systems to help you manage this complexity. MS SQL Server even has some great options for this.
You're right, SQLite uses file locks on the database file, so storing all data files in the database would create a write-starvation problem when editing your documents.
Maybe it's a better choice to implement simple optimistic/pessimistic locking yourself at the individual-file level? For example, with a pessimistic lock you just don't allow anyone to edit a particular file if somebody else is already in the process of editing it. In that case you hold a lock on just one file, not on the entire database. If the possibility of a conflict (editing the same file at the same time) is pretty low, it is better to go with optimistic locking.
A simple optimistic locking implementation:
When a user gets a file for reading, it's OK, no problem here. If a user gets a file for editing, you could calculate a hash for the file (or take the timestamp of its last update), and then, when the user tries to save the edited file, compare the current (at the moment of saving) hash/timestamp to make sure the file has not been changed by somebody else. If the file has not been changed, it's OK to save it. If the file has been changed, the current user is out of luck and you need to inform him about it. This optimistic scenario is nice when the possibility of being "out of luck" is pretty low. Otherwise it's better to stick with pessimistic locking, where you don't even allow a user to start editing a file if somebody else is doing it.
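A minimal sketch of that optimistic check (the class and method names are made up; it hashes the file when editing starts and compares the hash again at save time):

    using System.IO;
    using System.Linq;
    using System.Security.Cryptography;

    public static class OptimisticFileSave
    {
        // Call this when the user opens the file for editing, and keep the result.
        public static byte[] ComputeHash(string path)
        {
            using (var sha = SHA1.Create())
            using (var stream = File.OpenRead(path))
                return sha.ComputeHash(stream);
        }

        // Returns false if somebody else changed the file in the meantime.
        public static bool TrySave(string path, byte[] hashAtEditStart, string newContent)
        {
            byte[] currentHash = ComputeHash(path);
            if (!currentHash.SequenceEqual(hashAtEditStart))
                return false;   // conflict: inform the user instead of overwriting

            File.WriteAllText(path, newContent);
            return true;
        }
    }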

Queue file operations for later when file is locked

I am trying to implement a file-based auto-increment identity value (an int value stored in a TXT file) and I am trying to come up with the best way to handle concurrency issues. This identity will be used as the unique ID for my content. When saving new content, the file gets opened, the value gets read and incremented, the new content is saved, and the incremented value is written back to the file (whether we store the next available ID or the last issued one doesn't really matter). While this is being done, another process might come along and try to save new content. The first process opens the file with FileShare.None, so no other process can read the file until it is released. While the odds of this happening are minimal, it could still happen.
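For reference, a sketch of the scheme described above (names and path are illustrative); the FileStream opened with FileShare.None is what keeps other processes out until it is disposed:

    using System.IO;
    using System.Text;

    public static class IdentityFile
    {
        public static int NextId(string counterPath)
        {
            // Exclusive open: other processes fail to open the file until we dispose.
            using (var stream = new FileStream(counterPath,
                FileMode.Open, FileAccess.ReadWrite, FileShare.None))
            {
                var buffer = new byte[stream.Length];
                stream.Read(buffer, 0, buffer.Length);
                int current = int.Parse(Encoding.UTF8.GetString(buffer).Trim());

                // ... save the new content here, using 'current' as its unique ID ...

                byte[] next = Encoding.UTF8.GetBytes((current + 1).ToString());
                stream.SetLength(0);
                stream.Position = 0;
                stream.Write(next, 0, next.Length);
                return current;
            }
        }
    }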
Now when this does happen we have two options:
wait for the file to become available -
Emulate waiting on File.Open in C# when file is locked
we are talking about milliseconds here, so I guess this wouldn't normally be an issue, but if something strange happens and the file never becomes available, this solution would result in an infinite loop, so it's not ideal
implement some sort of queue and run all file operations through it. My user experience requirements are such that, at the time of saving/modifying files, the user should never be informed about exceptions or that something went wrong - he would get informed about them later, through a very friendly user interface, when operations fail on the queue too.
At the moment of writing this, the solution should work within an ASP.NET MVC application (both synchronously and asynchronously through AJAX) but, if possible, it should use concepts that could also work in a Silverlight, Windows Forms or WPF application.
With regard to those two options, which one do you think is better, and for the second option, what are possible technologies to implement it?
The ReaderWriterLockSlim class seems like a good solution for synchronizing access to the shared resource.
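A minimal sketch of what that could look like for the identity file (names are made up; note that ReaderWriterLockSlim only synchronizes threads inside a single process, so it covers the ASP.NET MVC scenario but not several separate processes touching the same file):

    using System.IO;
    using System.Threading;

    public static class IdentityCounter
    {
        private static readonly ReaderWriterLockSlim Lock = new ReaderWriterLockSlim();

        public static int NextId(string counterPath)
        {
            Lock.EnterWriteLock();   // only one thread increments at a time
            try
            {
                int current = int.Parse(File.ReadAllText(counterPath));
                File.WriteAllText(counterPath, (current + 1).ToString());
                return current;
            }
            finally
            {
                Lock.ExitWriteLock();
            }
        }
    }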

Lucene.Net writing/reading synchronization

Can I write (with IndexWriter) new documents into the index while it is open for reading (with IndexReader), or must I close reading before writing?
Can I read/search documents (with IndexReader) in the index while it is open for writing (with IndexWriter), or must I close writing before reading?
Is Lucene.Net thread-safe or not? Or must I handle synchronization myself?
You may have any number of readers/searchers open at any time, but only one writer. This is enforced by a directory-specific lock, usually involving a file named "write.lock".
Readers open snapshots, and writers add more data to the index. Readers need to be opened or reopened (IndexReader.Reopen) after your writer has committed (IndexWriter.Commit) the data for it to be seen, unless you're working with near-realtime searches. These involve a special reader returned from IndexWriter.GetReader which is able to see content up to the time the call to GetReader was executed. It also means the reader may see data that will never be committed, due to application logic calling IndexWriter.Rollback.
Searchers use readers, so the same limitations apply to them. (Unlimited number of them, they can only see what's already committed, unless based on a near-realtime reader.)
Lucene is thread-safe, and best practice is to share readers and searchers between several threads while checking that IndexReader.IsCurrent() == true. You could have a background thread running that reopens the reader once it detects changes, creates a new searcher, and then lets the main threads use it. This would also allow you to prewarm any FieldCache you use, to increase search speed once the new searcher is in place.
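A rough sketch of that background-reopen pattern (assuming a Lucene.Net 2.9/3.0-style API; the SearcherHolder class is made up for illustration):

    using Lucene.Net.Index;
    using Lucene.Net.Search;

    public class SearcherHolder
    {
        private IndexReader _reader;
        private volatile IndexSearcher _searcher;

        public SearcherHolder(IndexReader reader)
        {
            _reader = reader;
            _searcher = new IndexSearcher(reader);
        }

        // Call this from a background thread after the writer has committed.
        public void MaybeRefresh()
        {
            if (!_reader.IsCurrent())
            {
                IndexReader reopened = _reader.Reopen();   // new reader only if the index changed
                if (reopened != _reader)
                {
                    // Prewarm FieldCache etc. on the new reader here, then swap it in.
                    _reader = reopened;
                    _searcher = new IndexSearcher(reopened);
                    // The old reader/searcher should be closed once no search still uses them.
                }
            }
        }

        public IndexSearcher Current
        {
            get { return _searcher; }
        }
    }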
As I found in this mailing list:
Lucene.NET is thread-safe, so you can share the same instance of IndexWriter or IndexSearcher among threads. Using a write lock, it also prevents a second IndexWriter instance from being opened on the same index.
As I see it, I can write and read separately; I'll check it ;)

How to make a Windows application for a single system (machine)

I want to make an .exe for a desktop application which can only be used once, ever. Nobody should be able to run it twice.
You cannot do that reliably.
You may try simple stuff like writing a magic key in the registry or storing a magic file somewhere, but simple tools like Process Monitor will show your magic markers to anyone with Google skills.
You may try to delete the .exe when it is terminating, but if the user makes a copy before executing your file, you lose again.
You may write a rootkit that prevents the system from launching your application twice, but that is not very nice, and it can be detected and circumvented too.
You may create an online service where your application has to check for a one-time license before executing, but that can be cracked, and you will have a big mess keeping track of one-time licenses.
But in the end, if someone really wants to run your application more than once they will figure out how to do it.
How much protection do you want?
Delete itself as it exits?
What you are talking about is a single-instance application: one copy can start up, and no other copy can run. The single-instance start-up code is based on creating a mutex; when another copy is run, it checks whether the mutex is already allocated and, if it is, bails out and exits immediately. Have a look at this article on CodeProject that does exactly what you're looking for.
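A bare-bones version of the mutex approach (the mutex name and MainForm are placeholders for your own application):

    using System;
    using System.Threading;
    using System.Windows.Forms;

    static class Program
    {
        [STAThread]
        static void Main()
        {
            bool createdNew;
            // Pick a mutex name that is unique to your application.
            using (var mutex = new Mutex(true, @"Global\MyApp_SingleInstance", out createdNew))
            {
                if (!createdNew)
                    return;   // another copy is already running, bail out

                Application.EnableVisualStyles();
                Application.Run(new MainForm());   // MainForm is your application's main window
            }
        }
    }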
It's possible to use a combination of DPAPI calls (the ProtectedData class) to push a value into the registry and check for a second run, or alternatively to encode a value on first run and check that the result matches the machine you intend it to run on (and exit if not).
See DataProtectionScope.LocalMachine. The result of protecting the data will almost always differ per machine, so this works as a machine-specific check.
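A hedged sketch of the first idea (the registry path and value name are made up; a reference to System.Security is needed for ProtectedData):

    using System;
    using System.Security.Cryptography;
    using System.Text;
    using Microsoft.Win32;

    public static class RunOnceCheck
    {
        public static bool IsFirstRun()
        {
            using (RegistryKey key = Registry.CurrentUser.CreateSubKey(@"Software\MyApp"))
            {
                if (key.GetValue("RunMarker") != null)
                    return false;   // a marker already exists: the application has run before

                // Protect a machine-bound value so the marker is hard to forge on another PC.
                byte[] marker = ProtectedData.Protect(
                    Encoding.UTF8.GetBytes(Environment.MachineName),
                    null,
                    DataProtectionScope.LocalMachine);
                key.SetValue("RunMarker", marker, RegistryValueKind.Binary);
                return true;
            }
        }
    }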
You could create a registry key. If the key exists, abort execution.
