Lucene.Net writing/reading synchronization - C#

Can I write new documents into the index (with IndexWriter) while it is open for reading (with IndexReader)? Or must I close the reader before writing?
Can I read/search documents in the index (with IndexReader) while it is open for writing (with IndexWriter)? Or must I close the writer before reading?
Is Lucene.Net thread-safe or not? Or must I implement my own synchronization?

You may have any number of readers/searchers open at any time, but only one writer. This is enforced by a directory-specific lock, usually involving a file named "write.lock".
Readers open snapshots, and writers add more data to the index. Readers need to be opened or reopened (IndexReader.Reopen) after your writer has committed (IndexWriter.Commit) the data for it to be seen, unless you're working with near-real-time searches. This involves a special reader returned from IndexWriter.GetReader which is able to see content up to the moment the call to GetReader was executed. It also means that the reader may see data that will never be committed, due to application logic calling IndexWriter.Rollback.
Searchers use readers, so the same limitations apply to them: you can have an unlimited number, and they can only see what has already been committed, unless they are based on a near-real-time reader.
Lucene is thread-safe, and best practice is to share readers and searchers between several threads, while checking that IndexReader.IsCurrent() == true. You could have a background thread that reopens the reader once it detects changes, creates a new searcher, and then lets the main threads use it. This would also allow you to prewarm any FieldCache you use to increase search speed once the new searcher is in place.
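For illustration, a minimal sketch of the commit/reopen and near-real-time patterns described above, assuming Lucene.Net 3.0.3 (member names such as IsCurrent and Dispose/Close differ slightly between versions, and the index path is made up):
// A rough sketch, assuming Lucene.Net 3.0.3; the index path is hypothetical.
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

var dir = FSDirectory.Open(new DirectoryInfo(@"C:\my-index"));
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

using (var writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
    var doc = new Document();
    doc.Add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    writer.AddDocument(doc);

    // Near-real-time: a reader obtained from the writer sees the document
    // added above even though it has not been committed yet.
    using (var nrtReader = writer.GetReader())
    using (var nrtSearcher = new IndexSearcher(nrtReader))
    {
        TopDocs nrtHits = nrtSearcher.Search(new TermQuery(new Term("id", "1")), 10);
    }

    writer.Commit();    // make the document visible to ordinary readers
}

// An ordinary read-only reader only sees committed data and must be
// reopened after a commit to pick up the new segments.
var reader = IndexReader.Open(dir, true);
if (!reader.IsCurrent())
{
    var newReader = reader.Reopen();    // cheap: unchanged segments are shared
    if (newReader != reader)
    {
        reader.Dispose();               // Close() in older Lucene.Net versions
        reader = newReader;
    }
}
var searcher = new IndexSearcher(reader);
Reopen returns the same instance when nothing has changed, which is why the sketch only disposes the old reader when a new one actually comes back.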

As I found in this mailing list:
Lucene.NET is thread-safe, so you can share the same instance of IndexWriter
or IndexSearcher among threads. Using a write lock, it also prevents a second
IndexWriter instance from being opened on the same index.
As I see it, I can write and read separately; I'll check it ;)

Related

Locking index with lucene.net

I've got a Lucene index on a SAN (storage area network) which is used by several instances of Lucene, distributed across several machines. When updating the index, an instance of Lucene will, say, add or update a document; it takes a NativeFSLock, which makes it impossible for others to write at the same moment. This works fine!
The thing is, I want to be able to send a batch of updates to any instance of Lucene and have it do all the updates and then release the lock. In Lucene.Net there is no AddDocuments method, only AddDocument, so I have to loop through all my documents and add them one at a time. As soon as one document is added, Lucene releases the lock, then takes a new lock for the next one. So if someone else tries to update or add a document at the same time, it sometimes successfully obtains the lock in that little time span, and when that happens only some of my batch will go through (race condition).
I want to obtain a lock and not release it until my whole batch is done. Any suggestions?
Best Regards
Don't use the NativeFSLockFactory on an index stored on a SAN, or use a dedicated node to write to the index.
Try using the SimpleFSLockFactory and see if it solves your locking issues. But with SimpleFSLockFactory another problem may arise: it leaves its lock behind if your process crashes while holding the lock.
More details here:
http://lucene.apache.org/core/old_versioned_docs/versions/2_9_4/api/all/org/apache/lucene/store/NativeFSLockFactory.html
I'd suggest you test your environment with the locking test tools specified in the link above. (VerifyingLockFactory, LockVerifyServer and LockStressTest)
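For illustration, a rough sketch of switching the lock factory when opening the directory, assuming Lucene.Net 2.9.x/3.0.x (the exact overloads vary between versions, and the SAN path is made up):
using System.IO;
using Lucene.Net.Store;

var indexDir = new DirectoryInfo(@"\\san-share\my-index");
var lockFactory = new SimpleFSLockFactory(indexDir);

// SimpleFSLockFactory uses a plain lock file instead of OS-level native locks,
// but it leaves write.lock behind if the process crashes while holding it.
var directory = FSDirectory.Open(indexDir, lockFactory);
Also note that the write lock is normally held for as long as an IndexWriter instance stays open, so opening a single writer for the whole batch and calling Commit once at the end keeps other writers out until the batch is finished.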

Queue file operations for later when file is locked

I am trying to implement a file-based auto-increment identity value (an int value stored in a TXT file) and I am trying to come up with the best way to handle concurrency issues. This identity will be used as a unique ID for my content. When saving new content, this file gets opened, the value gets read and incremented, the new content is saved, and the incremented value is written back to the file (whether we store the next available ID or the last issued one doesn't really matter). While this is being done, another process might come along and try to save new content. The previous process opens the file with FileShare.None, so no other process will be able to read the file until it is released by the first process. While the odds of this happening are minimal, it could still happen.
Now when this does happen we have two options:
wait for the file to become available -
Emulate waiting on File.Open in C# when file is locked
We are talking about milliseconds here, so I guess this wouldn't be an issue; but if something strange happens and the file never becomes available, this solution would result in an infinite loop, so it's not ideal.
Implement some sort of queue and run all file operations through it. My user experience requirements are such that, at the time of saving/modifying files, the user should never be informed about exceptions or that something went wrong; they would be informed about them later, through a very friendly user interface, when the operations fail on the queue too.
At the moment of writing this, the solution should work within an ASP.NET MVC application (both synchronously and asynchronously through AJAX) but, if possible, it should use concepts that would also work in a Silverlight, Windows Forms or WPF application.
With regards to those two options which one do you think is better and for the second option what are possible technologies to implement this?
The ReaderWriterLockSlim class seems like a good solution for synchronizing access to the shared resource.
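For example, a minimal in-process sketch of guarding the counter file with ReaderWriterLockSlim (note it only coordinates threads within a single process, not separate processes; the class name and path below are made up):
using System.IO;
using System.Threading;

static class IdGenerator
{
    private static readonly ReaderWriterLockSlim Lock = new ReaderWriterLockSlim();
    private const string CounterFile = @"C:\data\next-id.txt";   // hypothetical path

    public static int NextId()
    {
        Lock.EnterWriteLock();
        try
        {
            // Read, increment and write back the counter while holding the write lock,
            // so no two threads can hand out the same ID.
            int current = int.Parse(File.ReadAllText(CounterFile));
            File.WriteAllText(CounterFile, (current + 1).ToString());
            return current;
        }
        finally
        {
            Lock.ExitWriteLock();
        }
    }
}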

Synchronize writing to a file at file-system level

I have a text file and multiple threads/processes will write to it (it's a log file).
The file gets corrupted sometimes because of concurrent writings.
I want all threads to open the file in a writing mode that is sequential at the file-system level itself.
I know it's possible to use locks (a mutex for multiple processes) and synchronize writing to this file, but I would prefer to open the file in the correct mode and leave the task to System.IO.
Is it possible? What's the best practice for this scenario?
Your best bet is just to use locks/mutexes. It's a simple approach, it works, and you can easily understand it and reason about it.
When it comes to synchronization it often pays to start with the simplest solution that could work and only try to refine if you hit problems.
To my knowledge, Windows doesn't have what you're looking for. There is no file handle object that does automatic synchronization by blocking all other users while one is writing to the file.
If your logging involves the three steps open file, write, close file, then you can have your threads try to open the file in exclusive mode (FileShare.None), catch the exception if they are unable to open it, and then try again until they succeed. I've found that tedious at best.
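A rough sketch of that retry loop; the method name, retry count and delay are arbitrary:
using System;
using System.IO;
using System.Text;
using System.Threading;

// Open exclusively, catch the sharing violation, back off briefly and retry.
static void AppendLineExclusive(string path, string line)
{
    for (int attempt = 0; attempt < 50; attempt++)
    {
        try
        {
            using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.None))
            {
                byte[] bytes = Encoding.UTF8.GetBytes(line + Environment.NewLine);
                stream.Write(bytes, 0, bytes.Length);
                return;
            }
        }
        catch (IOException)
        {
            Thread.Sleep(20);   // someone else holds the file; wait briefly and retry
        }
    }
    throw new IOException("Could not get exclusive access to " + path);
}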
In my programs that log from multiple threads, I created a TextWriter descendant that is essentially a queue. Threads call the Write or WriteLine methods on that object, which formats the output and places it into a queue (using a BlockingCollection). A separate logging thread services that queue, pulling things from it and writing them to the log file. This has a few benefits:
Threads don't have to wait on each other in order to log
Only one thread is writing to the file
It's trivial to rotate logs (i.e. start a new log file every hour, etc.)
There's zero chance of an error because I forgot to do the locking on some thread
Doing this across processes would be a lot more difficult. I've never even considered trying to share a log file across processes. Were I to need that, I would create a separate application (a logging service). That application would do the actual writes, with the other applications passing the strings to be written. Again, that ensures that I can't screw things up, and my code remains simple (i.e. no explicit locking code in the clients).
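A simplified sketch of that queue-based logger idea, using BlockingCollection from .NET 4 (this is not a full TextWriter descendant, and the class name is made up):
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Illustrative queue-based logger: many producer threads, one writer thread.
public sealed class QueuedLogger : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Task _worker;

    public QueuedLogger(string path)
    {
        // The single consumer task is the only code that touches the file.
        _worker = Task.Factory.StartNew(() =>
        {
            using (var writer = new StreamWriter(path, append: true))
            {
                foreach (var line in _queue.GetConsumingEnumerable())
                {
                    writer.WriteLine(line);
                    writer.Flush();
                }
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Called from any thread; never blocks on file I/O.
    public void WriteLine(string message)
    {
        _queue.Add(DateTime.Now.ToString("o") + " " + message);
    }

    public void Dispose()
    {
        _queue.CompleteAdding();   // lets GetConsumingEnumerable finish
        _worker.Wait();
    }
}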
You might be able to use File.Open() with a FileShare value set to None, and make each thread wait if it can't get access to the file.

Concurrent file write

How can I write to a text file that can be accessed by multiple sources (possibly concurrently) while ensuring that no write operation gets lost?
For example, if two different processes write to the file at the same moment, this can lead to problems. The simplest solution (not very fast and not very elegant) would be locking the file when beginning the process (creating a .lock file or similar) and releasing it (deleting the lock) when the writing is done.
Before beginning to write, I would check whether the .lock file exists and delay the writing until the file is released.
What is the recommended pattern to follow for this kind of situation?
Thanks
EDIT
I mean processes, like different programs from different clients, different users and so on, not threads within the same program
Consider using a simple database. You will get all this built-in safety relatively easily.
The fastest way of synchronizing access between processes is to use mutexes/semaphores. This thread explains how to use them to simulate a reader/writer lock pattern:
Is there a global named reader/writer lock?
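For example, a rough sketch of serializing writes across processes with a named system Mutex (the mutex name and helper method are made up):
using System;
using System.IO;
using System.Threading;

// A named mutex is visible to every process on the machine, so it can
// serialize writers across different programs, not just threads.
static void AppendLineCrossProcess(string path, string line)
{
    using (var mutex = new Mutex(false, @"Global\MyAppSharedFileMutex"))   // hypothetical name
    {
        mutex.WaitOne();
        try
        {
            File.AppendAllText(path, line + Environment.NewLine);
        }
        finally
        {
            mutex.ReleaseMutex();
        }
    }
}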
I suggest using the ReaderWriterLock. It's designed for multiple readers but ensures only a single writer can write data at any one time (MSDN).
I would look at something like the Command Pattern for doing this. Concurrent writing would be a nightmare for data integrity.
Essentially, you use this pattern to queue your write commands so that they are done in order of request.
You should also use the ReaderWriterLock to ensure that you can read from the file while writing occurs. This would be a second line of defense behind the command pattern, so that only one thread can write to the file at a given time.
You can try lock too. It's easy - "lock ensures that one thread does not enter a critical section while another thread is in the critical section of code. If another thread attempts to enter a locked code, it will wait (block) until the object is released."
http://msdn.microsoft.com/en-us/library/c5kehkcz%28VS.71%29.aspx
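A quick illustration of that (like ReaderWriterLock, the lock statement only coordinates threads inside a single process; the class here is made up):
using System;
using System.IO;

public class FileLogger
{
    private readonly object _sync = new object();
    private readonly string _path;

    public FileLogger(string path)
    {
        _path = path;
    }

    public void Append(string line)
    {
        // Only one thread at a time can be inside this block.
        lock (_sync)
        {
            File.AppendAllText(_path, line + Environment.NewLine);
        }
    }
}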
I would also recommend you look for examples of having multiple readers and only one writer in a critical section; here's a short paper with a good solution: http://arxiv.org/PS_cache/cs/pdf/0303/0303005v1.pdf
Alternatively, you could look at creating a copy of the file each time it is requested and, when it comes time to write any changes, merging them back into the original file.

How to avoid File Blocking

We are monitoring the progress of a customized app (whose source is not under our control) which writes to an XML manifest. At times, the application gets stuck because it is unable to write to the manifest file. We are covering our traces by explicitly closing the file handle using File.Close and by creating the file variables in using blocks, but somehow it keeps happening. (Our application is multithreaded and at most three threads might be accessing the file.)
Another interesting thing is that their app updates this manifest on three different events (adding items, deleting items, completion of items), but we only have trouble with one event (completion of items). My code is listed here:
using (var st = new FileStream(MenifestPath, FileMode.Open, FileAccess.Read))
{
    using (TextReader r = new StreamReader(st))
    {
        var xml = r.ReadToEnd();
        r.Close();
        st.Close();
        //................ Rest of our operations
    }
}
If you are only reading from the file, then you should be able to pass a flag to specify the sharing mode. I don't know how you specify this in .NET, but in WinAPI you'd pass FILE_SHARE_READ | FILE_SHARE_WRITE to CreateFile().
I suggest you check your file API documentation to see where it mentions sharing modes.
Two things:
You should do the rest of your operations outside the scopes of the using statements. This way, you won't risk using the closed stream and reader. Also, you needn't use the Close methods, because when you exit the scope of the using statement, Dispose is called, which is equivalent.
You should use the overload that has the FileShare enumeration. Locking is paranoid in nature, so the file may be locked automatically to protect you from yourself. :)
HTH.
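Putting both points together, a hedged sketch of reading the manifest with a sharing mode that doesn't block the other application's writer (reusing the MenifestPath variable from the question):
using System.IO;

string xml;
using (var st = new FileStream(MenifestPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var r = new StreamReader(st))
{
    // No explicit Close() calls: Dispose() at the end of the using blocks handles that.
    xml = r.ReadToEnd();
}
// ... rest of the operations, using xml, outside the using blocks ...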
The problem is different because that person has full control over file access for all processes, while, as I mentioned, one process is third party with no source access. Our applications are working fine; however, their application seems to get stuck if it can't get hold of the file. So I am looking for a method of file access that does not disturb their operation.
This could happen if one thread was attempting to read from the file while another was writing. To avoid this type of situation, where you want multiple readers but only one writer at a time, make use of the ReaderWriterLock or, in .NET 3.5 and later, the ReaderWriterLockSlim class in the System.Threading namespace.
Also, if you're using .NET 2.0+, you can simplify your code to just:
string xmlText = File.ReadAllText(ManifestFile);
See also: File.ReadAllText on MSDN.
