Chat logs management, performance-wise - C#

I've got an application receiving messages from different sources (chat rooms and private chats). Multiple instances of the application can be opened, and the final result should be something similar to the scenario in the sketch below.
[Sketch: several application instances at the top, all feeding a single shared set of log files at the bottom.]
Currently, each application instance saves logs in a directory whose name is the account used to log on to the chat server. While that is fine for private chat sources (unique to each instance), it is wasteful to save the same logs multiple times for chat rooms the instances have in common. Logs are saved in plain text format, so they can be accessed and read without being routed through the application.
If I don't save the logs in separate folders, I might get I/O exceptions because multiple processes would access the same file simultaneously, and I'd also need to verify that the line about to be saved hasn't already been written by another instance. I need to optimize the whole operation while keeping the code readable.
Besides, my current approach for writing lines is the following:
public void Write(string message)
{
    // Opens a new StreamWriter for every single line; the second argument
    // is the append flag, so this appends when the file exists and creates
    // it otherwise (passing true would behave the same way).
    using (var writer = new StreamWriter(_fileName, File.Exists(_fileName)))
        writer.WriteLine(message);
}
Considering that logs are written constantly, this might not be the most efficient solution.
Summed up, my questions are:
How do I create a unique log folder/database that keeps the plain text format but solves the duplicates/concurrent-access problem described above?
How do I improve, if possible, the writing method? Remember that logs need to be written constantly, and that simply closing the StreamWriter when the application exits is not a proper solution, as the application is meant to run for a long time.
Thank you.

I can offer a simple solution, which might be appropriate for your needs, though I'm not entirely sure.
My approach would be to use a single file for each chat session/room. When such a session starts, the application tries to create/open that file and takes a write lock on it. If it gets an IOException (because the file is already locked), it can simply skip logging completely.
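A minimal sketch of that idea, assuming a hypothetical TryOpenLog helper; opening with FileShare.None makes the operating system itself enforce the lock:

using System.IO;

// Tries to become the single logger for a room; returns null when another
// instance already holds the file, in which case this instance skips logging.
static StreamWriter TryOpenLog(string path)
{
    try
    {
        // FileShare.None: no other process can open the file while we hold it.
        var stream = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.None);
        return new StreamWriter(stream);
    }
    catch (IOException)
    {
        return null; // another instance logs this room; do nothing
    }
}

Whichever instance wins the race keeps the stream open for the lifetime of the session, which also avoids reopening the file on every line.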

To be honest, if I were you I would look at existing open source frameworks, e.g. NLog. It's fast enough and supports asynchronous logging, so it should do exactly what you're looking for.

Not sure if I should write this as an answer or a comment, but I might need the room:
You mentioned your sketch showing the desired result, but as I said, this will prevent you from deduplicating unless you couple the instances. So here is what I would suggest:
You create two applications: LogWriter, which is a singleton and sits at the bottom of your sketch, and
LogProcessor, which is the application instances in your sketch.
Upon startup, a LogProcessor instance spawns a LogWriter, or connects to it if one is already running.
LogProcessor handles the incoming log requests, preprocesses them if you need to do so, then sends them on to the LogWriter as a tuple of timestamp, chatroom ID, user ID (need not be unique), text, and maybe a hash for easier deduplication. Calculating the hash in the instances makes better use of multiple cores.
The LogWriter keeps a rolling list sorted by timestamp, containing the hashes, so it is able to quickly discard duplicate items.
For the remaining items, the LogWriter determines the log file path. If a stream to that path is already open, it writes the item out, updates the LastUsed timestamp on that stream, and is done.
If no stream is open, the LogWriter opens one, then writes.
If the maximum number of streams is reached, it closes the oldest stream (by the LastUsed timestamp mentioned above) and opens the needed new stream instead; a sketch of this stream cache follows.
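A minimal sketch of that stream cache, under stated assumptions: the names LogStreamCache and MaxStreams are invented, and the IPC between LogProcessor and LogWriter plus the hash-based deduplication are left out.

using System;
using System.Collections.Generic;
using System.IO;

// Keeps at most MaxStreams writers open, evicting the least recently used one.
class LogStreamCache
{
    private const int MaxStreams = 16; // assumed limit
    private readonly Dictionary<string, StreamWriter> _writers = new Dictionary<string, StreamWriter>();
    private readonly Dictionary<string, DateTime> _lastUsed = new Dictionary<string, DateTime>();

    public void Write(string path, string line)
    {
        StreamWriter writer;
        if (!_writers.TryGetValue(path, out writer))
        {
            if (_writers.Count >= MaxStreams)
                CloseOldest();
            writer = new StreamWriter(path, true); // append
            _writers[path] = writer;
        }
        writer.WriteLine(line);
        _lastUsed[path] = DateTime.UtcNow; // the LastUsed timestamp from above
    }

    private void CloseOldest()
    {
        string oldest = null;
        foreach (var pair in _lastUsed)
            if (oldest == null || pair.Value < _lastUsed[oldest])
                oldest = pair.Key;
        _writers[oldest].Dispose();
        _writers.Remove(oldest);
        _lastUsed.Remove(oldest);
    }
}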

Perhaps change your application design in such a way that a logger is attached to a chat room, not to a user.
When users enter a chat room, the chat room passes a logger object to them.
This way all users use the same logger. The problem then becomes: one consumer (the logger) and multiple producers (all those users who want to log).
See my answer to Write to FileOutputStream from multiple threads in Java: https://stackoverflow.com/a/8422621/1007845
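A minimal sketch of the room-owned logger described above, assuming an invented RoomLogger type; each chat room creates one instance and hands it to every user that enters, and the internal lock serializes the producers:

using System.IO;

// One RoomLogger per chat room; every user in the room shares this instance.
class RoomLogger
{
    private readonly object _gate = new object();
    private readonly StreamWriter _writer;

    public RoomLogger(string path)
    {
        _writer = new StreamWriter(path, true); // append, kept open for the room's lifetime
    }

    // Called by many producers (users); the lock serializes the writes.
    public void Log(string user, string message)
    {
        lock (_gate)
        {
            _writer.WriteLine("{0}: {1}", user, message);
            _writer.Flush();
        }
    }
}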

Related

Multiple files in log4net with same logger

I'm currently working on a chat API, and I receive multiple requests at the same time, from different sessions, so it's almost impossible to track each conversation separately, because it mixes with all the logs from the other conversations.
So I want to create a separate file for each session (conversation) dynamically, with the session ID as the filename, but if I create multiple loggers my application just freezes, because I can have more than 100 simultaneous sessions.
I have also tried to change the file path programmatically for each request, with its ID in it, but that also freezes the application after 1-2 hours.
Is there any solution for this problem?
If these conversation files are so important, consider options other than logging. A database might be appropriate.
Another solution might be to parse the log files and split them into conversation files in a separate (logical?) process, perhaps later, after the session has ended. This way the program doesn't need to keep track of many files at the same time, and parsing can be done faster/more efficiently; a sketch of such a splitter is below.
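A minimal sketch of that splitter, assuming each log line starts with the session ID followed by a space; that line format is an assumption, not something log4net produces by default:

using System.Collections.Generic;
using System.IO;

// Reads the combined log once and appends each line to its per-session file.
static void SplitLog(string combinedPath, string outputDir)
{
    var writers = new Dictionary<string, StreamWriter>();
    foreach (var line in File.ReadLines(combinedPath))
    {
        int space = line.IndexOf(' ');
        if (space <= 0) continue; // skip malformed lines

        string sessionId = line.Substring(0, space);
        StreamWriter writer;
        if (!writers.TryGetValue(sessionId, out writer))
        {
            writer = new StreamWriter(Path.Combine(outputDir, sessionId + ".log"), true);
            writers[sessionId] = writer;
        }
        writer.WriteLine(line.Substring(space + 1));
    }
    foreach (var w in writers.Values)
        w.Dispose();
}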

Queue file operations for later when file is locked

I am trying to implement a file-based autoincrement identity value (an int value stored in a TXT file) and I am trying to come up with the best way to handle concurrency issues. This identity will be used as a unique ID for my content. When saving new content, this file gets opened, the value gets read and incremented, the new content is saved, and the incremented value is written back to the file (whether we store the next available ID or the last issued one doesn't really matter). While this is being done, another process might come along and try to save new content. The previous process opens the file with FileShare.None, so no other process will be able to read the file until it is released by the first process. While the odds of this happening are minimal, it could still happen.
Now when this does happen we have two options:
wait for the file to become available - see Emulate waiting on File.Open in C# when file is locked.
We are talking about milliseconds here, so I guess this wouldn't be an issue, unless something strange happens and the file never becomes available; then this solution would result in an infinite loop, so it's not ideal.
implement some sort of a queue and run all file operations through it. My user experience requirements are such that, at the time of saving/modifying files, the user should never be informed about exceptions or that something went wrong; he would get informed about them later, through a very friendly user interface, when the operations fail on the queue too.
At the moment of writing this, the solution should work within an ASP.NET MVC application (both synchronously and asynchronously through AJAX) but, if possible, it should use concepts that would also work in a Silverlight, Windows Forms, or WPF application.
With regard to those two options, which one do you think is better, and for the second option, what are possible technologies to implement it?
The ReaderWriterLockSlim class seems like a good solution for synchronizing access to the shared resource.
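A minimal sketch of that idea; note that ReaderWriterLockSlim only synchronizes threads within a single process, so this assumes one process owns the counter file (synchronizing several processes would need something like a named Mutex instead), and the file path is a placeholder:

using System.IO;
using System.Threading;

static class IdentityFile
{
    private static readonly ReaderWriterLockSlim Lock = new ReaderWriterLockSlim();
    private const string FilePath = "identity.txt"; // assumed location

    public static int NextId()
    {
        Lock.EnterWriteLock();
        try
        {
            // Read the last issued ID, increment it, and write it back.
            int id = int.Parse(File.ReadAllText(FilePath)) + 1;
            File.WriteAllText(FilePath, id.ToString());
            return id;
        }
        finally
        {
            Lock.ExitWriteLock();
        }
    }
}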

Synchronize writing to a file at file-system level

I have a text file that multiple threads/processes will write to (it's a log file).
The file sometimes gets corrupted because of the concurrent writes.
I want all threads to write to the file in a mode that is sequential at the file-system level itself.
I know it's possible to use locks (a mutex for multiple processes) and synchronize writing to this file, but I'd prefer to open the file in the correct mode and leave the task to System.IO.
Is it possible? What's the best practice for this scenario?
Your best bet is just to use locks/mutexes. It's a simple approach, it works, and you can easily understand it and reason about it.
When it comes to synchronization it often pays to start with the simplest solution that could work and only try to refine if you hit problems.
To my knowledge, Windows doesn't have what you're looking for. There is no file handle object that does automatic synchronization by blocking all other users while one is writing to the file.
If your logging involves three steps (open file, write, close file), then you can have your threads try to open the file in exclusive mode (FileShare.None), catch the exception if unable to open, and then try again until success. I've found that tedious at best.
In my programs that log from multiple threads, I created a TextWriter descendant that is essentially a queue. Threads call the Write or WriteLine methods on that object, which formats the output and places it into a queue (using a BlockingCollection). A separate logging thread services that queue, pulling items from it and writing them to the log file. This has a few benefits (a sketch follows the list):
Threads don't have to wait on each other in order to log
Only one thread is writing to the file
It's trivial to rotate logs (i.e. start a new log file every hour, etc.)
There's zero chance of an error because I forgot to do the locking on some thread
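A minimal sketch of such a queue-backed logger; the class name QueuedLogger is invented, and the real version would derive from TextWriter as described above, which is omitted here for brevity:

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Producers call WriteLine; a single consumer task drains the queue to disk.
class QueuedLogger : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Task _consumer;

    public QueuedLogger(string path)
    {
        _consumer = Task.Factory.StartNew(() =>
        {
            using (var writer = new StreamWriter(path, true))
                foreach (var line in _queue.GetConsumingEnumerable())
                    writer.WriteLine(line); // only this thread ever touches the file
        }, TaskCreationOptions.LongRunning);
    }

    public void WriteLine(string message)
    {
        _queue.Add(message); // cheap; callers never wait on the disk
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // lets the consumer drain the queue and exit
        _consumer.Wait();
    }
}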
Doing this across processes would be a lot more difficult. I've never even considered trying to share a log file across processes. Were I to need that, I would create a separate application (a logging service). That application would do the actual writes, with the other applications passing the strings to be written. Again, that ensures that I can't screw things up, and my code remains simple (i.e. no explicit locking code in the clients).
You might be able to use File.Open() with a FileShare value set to None, and make each thread wait and retry if it can't get access to the file; a sketch of that loop is below.
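A minimal sketch of that retry loop; the back-off delay is an arbitrary choice, not anything File.Open prescribes:

using System.IO;
using System.Threading;

// Appends a line, retrying while another thread/process holds the file.
static void WriteWithRetry(string path, string line)
{
    while (true)
    {
        try
        {
            using (var stream = File.Open(path, FileMode.Append, FileAccess.Write, FileShare.None))
            using (var writer = new StreamWriter(stream))
            {
                writer.WriteLine(line);
                return;
            }
        }
        catch (IOException)
        {
            Thread.Sleep(10); // the file is locked; back off briefly and retry
        }
    }
}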

Logging from multiple processes to same file using Enterprise Library 4.1

I have several processes running concurrently that I want to log to the same file.
We have been using the Enterprise Library 4.1 Logging Application Block (with a RollingFlatFileTraceListener), and it works fine, apart from the fact that it prepends a GUID to the log file name when two processes try to write to the log file at the same time (a quirk of System.Diagnostics.TextWriterTraceListener, I believe).
I've tried various things, including calling Logger.Writer.Dispose() after writing to the log file, but it's not ideal to make a blocking call each time a log entry is written.
The EntLib forums suggest using MSMQ with a Distributor Service, but that is not an option as MSMQ is not allowed at my company.
Is there another way I can quickly and easily log from multiple threads/processes to the same file?
Sorry to say, but the answer is no. The file-based TraceListeners lock the output file, so only one TraceListener can log to a given file.
You can try other Trace Listeners that are not file based (e.g. Database, Event Log).
Another option I can think of would be to write your own logging service (out of process) that logs to the file and accepts LogEntries. Then create a custom trace listener that sends a message to your service.
It might not be a good idea, since you would have a bit of custom development, plus it could impact performance since it is an out-of-process call. Basically, you would be setting up your own simplified pseudo-distributor service.
EntLib locks the log file when it writes to it. Therefore, 2 processes cannot write to the same log file.
When we have had this problem of needing to log from many different places to the same place, we have used database logging.
If you are 100% stuck logging to a text file, then you could log to individual log files, and then write a program to merge these files.
I know this is old, but if you are still curious: log4net supports this:
http://logging.apache.org/log4net/release/faq.html#How do I get multiple process to log to the same file?
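What that FAQ entry recommends is the MinimalLock locking model, which acquires the file lock only for the duration of each write so that several processes can interleave on one file (configuration sketch; the appender name, file name, and layout pattern are placeholders):

<appender name="LogFileAppender" type="log4net.Appender.FileAppender">
  <file value="chat.log" />
  <appendToFile value="true" />
  <!-- Lock the file only while actually writing, so other processes can take turns -->
  <lockingModel type="log4net.Appender.FileAppender+MinimalLock" />
  <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%date [%thread] %message%newline" />
  </layout>
</appender>

Note that MinimalLock trades throughput for shareability, since the file is opened and closed around every write.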
The problem occurs when the app pool recycles and allows overlapping threads. The closing thread still has the file open, and the new thread gets the error. Try disabling the overlapping recycling behavior in IIS, or create your own version of the text writer.

WCF service with XML based storage. Concurrency issues?

I programmed a simple WCF service that stores messages sent by users and delivers them to the intended user when asked for. For now, persistence is implemented by creating username.xml files with the following structure:
<messages recipient="username">
  <message sender="otheruser">
  ...
  </message>
</messages>
It is possible for more than one user to send a message to the same recipient at the same time, possibly causing the xml file to be updated concurrently. The WCF service is currently implemented with basicHttp binding, without any provisions for concurrent access.
What concurrency risks are there? How should I deal with them? A ReadWrite lock on the xml file being accessed?
Currently the service runs with 5 users at the most, this may grow up to 50, but no more.
EDIT:
As stated above, the client will instantiate a new service class with every call it makes (InstanceContext is PerCall, ConcurrencyMode is irrelevant). This is inherent to the use of basicHttpBinding with default settings on the service.
The code below:
public class SomeWCFService : ISomeServiceContract
{
    ClassThatTriesToHoldSomeInfo useless;

    public SomeWCFService()
    {
        useless = new ClassThatTriesToHoldSomeInfo();
    }

    #region Implementation of ISomeServiceContract
    public void IncrementUseless()
    {
        useless.Counter++;
    }
    #endregion
}
behaves as if it were written:
public class SomeWCFService : ISomeServiceContract
{
    ClassThatTriesToHoldSomeInfo useless;

    public SomeWCFService()
    {}

    #region Implementation of ISomeServiceContract
    public void IncrementUseless()
    {
        useless = new ClassThatTriesToHoldSomeInfo();
        useless.Counter++;
    }
    #endregion
}
So concurrency is never an issue until you try to access externally stored data, as in a database or a file.
The downside is that you cannot store any data between method calls of the service unless you store it externally.
If your WCF service is a singleton service and guaranteed to be that way, then you don't need to do anything. Since WCF will allow only one request at a time to be processed, concurrent access to the username files is not an issue, unless the operation that serves a request spawns multiple threads that access the same file. However, as you can imagine, a singleton service is not very scalable, and I assume not something you want in your case.
If your WCF service is not a singleton, then concurrent access to the same user file is a very realistic scenario and you must definitely address it. Multiple instances of your service may concurrently attempt to access the same file to update it, and you will get a 'cannot access file because it is being used by another process' exception or something like that. So this means that you need to synchronize access to user files. You can use a monitor (lock), ReaderWriterLockSlim, etc. However, you want this lock to operate on a per-file basis. You don't want to lock out updates on other files while an update on a different file is going on. So you will need to maintain a lock object per file and lock on that object, e.g.:
// When a new user file is added, create a new sync object:
fileLockDictionary.Add("user1file.xml", new object());

// When updating a file:
lock (fileLockDictionary["user1file.xml"])
{
    // update file.
}
Note that the dictionary itself is also a shared resource that will require synchronized access; a sketch of one way to handle both levels is below.
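A minimal sketch of that per-file locking, assuming .NET 4's ConcurrentDictionary is available so the dictionary level takes care of itself (GetOrAdd creates the lock object on first use; UserFileStore and UpdateFile are invented names):

using System.Collections.Concurrent;

class UserFileStore
{
    private readonly ConcurrentDictionary<string, object> _fileLocks =
        new ConcurrentDictionary<string, object>();

    public void AppendMessage(string userFile, string messageXml)
    {
        // One lock object per file; GetOrAdd is itself thread-safe.
        object fileLock = _fileLocks.GetOrAdd(userFile, _ => new object());
        lock (fileLock)
        {
            UpdateFile(userFile, messageXml); // hypothetical helper
        }
    }

    private void UpdateFile(string userFile, string messageXml)
    {
        // omitted: load the user's XML, add the message element, save it back
    }
}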
Now, dealing with concurrency and ensuring synchronized access to shared resources at the appropriate granularity is very hard, not only in terms of coming up with the right solution but also in terms of debugging and maintaining it. Debugging a multi-threaded application is not fun, and problems are hard to reproduce. Sometimes you don't have an option, but sometimes you do. So, is there any particular reason why you're not using or considering a database-based solution? A database will handle concurrency for you; you don't need to do anything. If you are worried about the cost of purchasing a database, there are very good, proven open source databases out there, such as MySQL and PostgreSQL, that won't cost you anything.
Another problem with the XML file-based approach is that updating the files will be costly. You will be loading the XML from a user file into memory, creating a message element, and saving it back to the file. As that XML grows, the process will take longer, require more memory, etc. It will also hurt your scalability, because the update process holds the lock longer. Plus, I/O is expensive. There are also benefits that come with a database-based solution: transactions, backups, being able to easily query your data, replication, mirroring, etc.
I don't know your requirements and constraints, but I do think that a file-based solution will be problematic going forward.
You need to read the file before adding to it and writing it back to disk, so you do have a (fairly small) risk of two overlapping operations: the second operation reads from disk before the first has written its changes, and the first message gets overwritten when the second is committed.
A simple answer might be to queue your messages to ensure that they are processed serially. When the messages are received by your service, just dump their contents into an MSMQ queue. Have another single-threaded process read from the queue and write the appropriate changes to the XML file. That way you can ensure you only write to one file at a time and resolve any concurrency issues; a sketch of that hand-off is below.
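A minimal sketch of that hand-off, assuming the System.Messaging assembly is referenced; the ChatMessage DTO, the queue path, and the WriteMessageToXmlFile helper are all placeholders:

using System.Messaging;

// Assumed DTO; XmlMessageFormatter serializes public fields/properties.
public class ChatMessage { public string From, To, Text; }

class MessagePump
{
    private const string QueuePath = @".\private$\chatmessages"; // placeholder queue name

    // Called from the WCF service: enqueue and return immediately.
    public void EnqueueMessage(ChatMessage message)
    {
        using (var queue = new MessageQueue(QueuePath))
            queue.Send(message); // returns as soon as MSMQ has the message
    }

    // Runs in the single-threaded writer process: drain and persist serially.
    public void ProcessQueue()
    {
        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(ChatMessage) });
            while (true)
            {
                var message = (ChatMessage)queue.Receive().Body; // blocks until one arrives
                WriteMessageToXmlFile(message); // hypothetical helper
            }
        }
    }

    private void WriteMessageToXmlFile(ChatMessage message) { /* omitted */ }
}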
The basic problem is that when you access a global resource (like a static variable, or a file on the filesystem) you need to make sure you lock that resource or serialize access to it somehow.
My suggestion here (if you want to just get it done quickly without using a database or anything, which would be better) would be to insert your messages into a Queue structure in memory from your service code.
public class MyService : IMyService
{
    public static Queue queue = new Queue(); // System.Collections.Queue

    public void SendMessage(string from, string to, string message)
    {
        // Queue.Synchronized returns a thread-safe wrapper around the shared queue.
        Queue syncQueue = Queue.Synchronized(queue);
        syncQueue.Enqueue(new Message(from, to, message));
    }
}
Then somewhere else in your app you can create a background thread that reads from that queue and writes to the filesystem one update at a time.
void Main()
{
    // System.Timers.Timer with AutoReset off: the handler restarts the
    // timer itself after draining the queue, so ticks never overlap.
    Timer timer = new Timer(1000) { AutoReset = false };
    timer.Elapsed += (o, e) =>
    {
        Queue syncQueue = Queue.Synchronized(MyService.queue);
        while (syncQueue.Count > 0)
        {
            Message message = syncQueue.Dequeue() as Message;
            WriteMessageToXMLFile(message);
        }
        timer.Start();
    };
    timer.Start();

    // Or whatever you do here
    StartupService();
}
It's not pretty (and I'm not 100% sure it compiles) but it should work. It sort of follows the "get it done with the tools I have, not the tools I want" kind of approach I think you are looking for.
The clients are also off the line as soon as possible, rather than waiting for the file to be written to the filesystem before they disconnect. This can also be bad: clients might not know their message didn't get delivered if your app goes down after they disconnect but before the background thread has written their message.
The other approaches here are just as valid; I wanted to post the serialization approach rather than the locking approach others have suggested.
HTH,
Anderson
Well, it just so happens that I've done something almost exactly the same, except that it wasn't actually messages...
Here's how I'd handle it.
Your service itself talks to a central object (or objects), which can dispatch message requests based on the sender.
The object relating to each sender maintains an internal lock while updating anything. When it gets a new modification request, it can read from disk (if necessary), update the data, and write to disk (if necessary).
Because different updates will be happening on different threads, the internal lock serializes them. Just be sure to release the lock if you call any 'external' objects, to avoid deadlock scenarios.
If I/O becomes a bottleneck, you can look at different strategies involving putting messages in one file, separate files, not immediately writing them to disk, etc. In fact, I'd think about storing the messages for each user in a separate folder for exactly that reason.
The biggest point is that each service instance acts, essentially, as an adapter to the central class, and that only one instance of one class will ever be responsible for reading/writing messages for a given recipient. Other classes may request a read/write, but they do not actually perform it (or even know how it's performed). This also means that their code will look like 'AddMessage(message)', not 'SaveMessages(GetMessages.Add(message))'. A sketch of that shape follows.
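A minimal sketch of that shape; Mailbox and MailboxDispatcher are invented names, and the persistence details are omitted:

using System.Collections.Generic;

// One Mailbox per recipient; it alone reads/writes that recipient's storage.
class Mailbox
{
    private readonly object _gate = new object();
    private readonly string _recipient;

    public Mailbox(string recipient) { _recipient = recipient; }

    public void AddMessage(string message)
    {
        lock (_gate)
        {
            // omitted: read from disk if needed, update, write back
        }
    }
}

// The central object every service instance talks to.
class MailboxDispatcher
{
    private readonly Dictionary<string, Mailbox> _boxes = new Dictionary<string, Mailbox>();
    private readonly object _gate = new object();

    public Mailbox For(string recipient)
    {
        lock (_gate) // the dictionary itself is shared, so guard it too
        {
            Mailbox box;
            if (!_boxes.TryGetValue(recipient, out box))
                _boxes[recipient] = box = new Mailbox(recipient);
            return box;
        }
    }
}

Service code then reads exactly as described above: dispatcher.For(recipient).AddMessage(message).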
That said, using a database is a very good suggestion, and will likely save you a lot of headaches.
