Performance issues using task factory in C#

I am working on a logging system for a web application which logs a sequence of events in a dictionary object before sending it to my logging object using Task.Factory.StartNew(() => iLogEventSave()). The logger seemed to work fine, but in some instances some events were not being saved properly, so I used the lock() statement to correct the issue. That did the trick, but it has dramatically decreased the application's performance. How can I have the UI/page render without having to wait for the Tasks to finish their job?
Below is the code
private static readonly object Locker = new object();

public void iLogEventSave(object state)
{
    XmlDocument doc = new XmlDocument();
    IDictionary<string, string> EventDetails = (IDictionary<string, string>)state;
    string logFile = "";

    if (ConfigurationManager.AppSettings["Log_File_Path"].ToString() == "")
    {
        logFile = HttpRuntime.AppDomainAppPath + "Logs\\" + DateTime.Now.ToString("yyyy_MM_dd") + ".txt";
    }
    else
    {
        logFile = ConfigurationManager.AppSettings["Log_File_Path"].ToString() + DateTime.Now.ToString("yyyy_MM_dd") + ".txt";
    }

    lock (Locker)
    {
        if (File.Exists(logFile))
        {
            doc.Load(logFile);
        }
        else
        {
            var root = doc.CreateElement("Log");
            doc.AppendChild(root);
        }

        var el = (XmlElement)doc.DocumentElement.AppendChild(doc.CreateElement("Event"));
        foreach (KeyValuePair<string, string> item in EventDetails)
        {
            XmlElement Desc = doc.CreateElement("Details");
            Desc.SetAttribute(item.Key.ToString(), item.Value);
            el.AppendChild(Desc);
        }
        doc.Save(logFile);
    }
}

If your log did not save several events while being executed asynchronously, you have an unhandled error that you did not address. Considering that you're using a file, I'm going to go out on a limb and say that it failed because two threads were competing for access to the same log file and the first thread to grab it locked the other one out. This is why your lock now works: it prevents other threads from trying to grab the file.
But logging to a file means that you've effectively restricted yourself to one thread at a time, and you're dealing with the entire file as it grows. You have to load more and more, append more and more, and locking means that the more threads are waiting to log events, the higher your overhead. All of this can certainly add up to a decrease in performance.
May I recommend using a database table to log events? File I/O is very expensive, resource- and time-wise. Databases have less overhead and far better throughput in exactly these scenarios.
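To make that concrete, here is a minimal sketch of what a database-backed logger could look like. The EventLog table, its columns, and the "LoggingDb" connection string name are hypothetical (none of them appear in the original code), and it assumes System.Data.SqlClient and System.Configuration are referenced:
public void SaveEventToDatabase(IDictionary<string, string> eventDetails)
{
    // Hypothetical connection string name and table; adjust to your own schema.
    var connectionString = ConfigurationManager.ConnectionStrings["LoggingDb"].ConnectionString;
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "INSERT INTO EventLog (EventTime, EventKey, EventValue) VALUES (@time, @key, @value)", connection))
    {
        connection.Open();
        foreach (var item in eventDetails)
        {
            // One parameterized insert per detail; the database serializes concurrent writers for you.
            command.Parameters.Clear();
            command.Parameters.AddWithValue("@time", DateTime.UtcNow);
            command.Parameters.AddWithValue("@key", item.Key);
            command.Parameters.AddWithValue("@value", item.Value);
            command.ExecuteNonQuery();
        }
    }
}
Because the database handles concurrent inserts, the Task started with Task.Factory.StartNew no longer needs an application-level lock at all, so the page is never held up waiting for it.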

Related

Memory Mapped File gets deleted from memory

For some reason, when I read from a memory mapped file a couple of times, it just gets randomly deleted from memory; I don't know what's going on. Is the kernel or the GC deleting it from memory? If they are, how do I prevent them from doing so?
I am serializing an object to JSON and writing it to memory.
When trying to read again after a couple of times, I get an exception: FileNotFoundException: Unable to find the specified file.
private const String Protocol = @"Global\";
Code to write to memory mapped file:
public static Boolean WriteToMemoryFile<T>(List<T> data)
{
    try
    {
        if (data == null)
        {
            throw new ArgumentNullException("data", "Data cannot be null");
        }
        var mapName = typeof(T).FullName.ToLower();
        var mutexName = Protocol + typeof(T).FullName.ToLower();
        var serializedData = JsonConvert.SerializeObject(data);
        var capacity = serializedData.Length + 1;
        var mmf = MemoryMappedFile.CreateOrOpen(mapName, capacity);
        var isMutexCreated = false;
        var mutex = new Mutex(true, mutexName, out isMutexCreated);
        if (!isMutexCreated)
        {
            var isMutexOpen = false;
            do
            {
                isMutexOpen = mutex.WaitOne();
            }
            while (!isMutexOpen);
            var streamWriter = new StreamWriter(mmf.CreateViewStream());
            streamWriter.WriteLine(serializedData);
            streamWriter.Close();
            mutex.ReleaseMutex();
        }
        else
        {
            var streamWriter = new StreamWriter(mmf.CreateViewStream());
            streamWriter.WriteLine(serializedData);
            streamWriter.Close();
            mutex.ReleaseMutex();
        }
        return true;
    }
    catch (Exception ex)
    {
        return false;
    }
}
Code to read from memory mapped file:
public static List<T> ReadFromMemoryFile<T>()
{
    try
    {
        var mapName = typeof(T).FullName.ToLower();
        var mutexName = Protocol + typeof(T).FullName.ToLower();
        var mmf = MemoryMappedFile.OpenExisting(mapName);
        var mutex = Mutex.OpenExisting(mutexName);
        var isMutexOpen = false;
        do
        {
            isMutexOpen = mutex.WaitOne();
        }
        while (!isMutexOpen);
        var streamReader = new StreamReader(mmf.CreateViewStream());
        var serializedData = streamReader.ReadLine();
        streamReader.Close();
        mutex.ReleaseMutex();
        var data = JsonConvert.DeserializeObject<List<T>>(serializedData);
        mmf.Dispose();
        return data;
    }
    catch (Exception ex)
    {
        return default(List<T>);
    }
}
The process that created the memory mapped file must keep a reference to it for as long as you want it to live. Using CreateOrOpen is a bit tricky for exactly this reason - you don't know whether disposing the memory mapped file is going to destroy it or not.
You can easily see this at work by adding an explicit mmf.Dispose() to your WriteToMemoryFile method - it will close the file completely. The Dispose method is called from the finalizer of the mmf instance some time after all the references to it drop out of scope.
Or, to make it even more obvious that GC is the culprit, you can try invoking GC explicitly:
WriteToMemoryFile("Hi");
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
ReadFromMemoryFile().Dump(); // Nope, the value is lost now
Note that I changed your methods slightly to work with simple strings; you really want to produce the simplest possible code that reproduces the behaviour you observe. Even just having to get JsonConverter is an unnecessary complication, and might cause people to not even try running your code :)
And as a side note, you want to check for AbandonedMutexException when you're doing Mutex.WaitOne - it's not a failure, it means you took over the mutex. Most applications handle this wrong, leading to issues with deadlocks as well as mutex ownership and lifetime :) In other words, treat AbandonedMutexException as success. Oh, and it's a good idea to put stuff like Mutex.ReleaseMutex in a finally clause, to make sure it actually happens, even if you get an exception. A dead thread or process doesn't matter (that will just cause one of the other contenders to get AbandonedMutexException), but if you just get an exception that you "handle" with your return false;, the mutex will not be released until you close all your applications and start again fresh :)
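A small sketch of that pattern, with illustrative names (the mmf and mutex arguments stand in for whatever the surrounding code already created):
static void WriteUnderMutex(MemoryMappedFile mmf, Mutex mutex, string data)
{
    try
    {
        mutex.WaitOne();
    }
    catch (AbandonedMutexException)
    {
        // The previous owner died while holding the mutex.
        // We own it now, so treat this as success and carry on.
    }
    try
    {
        using (var writer = new StreamWriter(mmf.CreateViewStream()))
        {
            writer.WriteLine(data);
        }
    }
    finally
    {
        mutex.ReleaseMutex();   // always released, even if the write throws
    }
}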
Clearly, the problem is that the MMF loses its content when the writer's reference is collected, as explained by Luaan. But nobody has yet explained how to deal with it:
The code 'Write to MMF file' must run on a separate async thread.
The code 'Read from MMF' will notify once read completed that the MMF had been read. The notification can be a flag in a file for example.
The async thread running the 'Write to MMF file' code therefore keeps running (and keeps its reference to the MMF alive) until the second part has read the file. That creates the context within which the memory mapped file stays valid, as sketched below.
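A minimal sketch of that idea, assuming a plain string payload and using a named EventWaitHandle instead of the flag file mentioned above, purely for illustration (the map name, event name, and field names are all made up):
private static MemoryMappedFile _mmfKeepAlive;   // rooted in a static field so the GC cannot finalize it

public static void WriteToMemoryFile(string data)
{
    _mmfKeepAlive = MemoryMappedFile.CreateOrOpen("demo.map", data.Length + 1);
    using (var writer = new StreamWriter(_mmfKeepAlive.CreateViewStream()))
    {
        writer.WriteLine(data);
    }
}

public static void ReleaseWhenReaderIsDone()
{
    // The reader sets this named event once it has finished deserializing the data.
    using (var readerDone = new EventWaitHandle(false, EventResetMode.ManualReset, @"Global\demo.read"))
    {
        readerDone.WaitOne();
    }
    _mmfKeepAlive.Dispose();   // only now is it safe to let the mapping be destroyed
}
Running ReleaseWhenReaderIsDone on a background thread gives exactly the "context" described above: the writer's reference stays alive until the reader has signalled that it is done.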

Threading with writing to file system [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have this application for generating bank account numbers:
static void Main(string[] args)
{
    string path = @"G:\BankNumbers";
    var bans = BankAcoutNumbers.BANS;
    const int MAX_FILES = 80;
    const int BANS_PER_FILE = 81818182 / 80;
    int bansCounter = 0;
    var part = new List<int>();
    var maxNumberOfFiles = 10;
    Stopwatch timer = new Stopwatch();
    var fileCounter = 0;

    if (!Directory.Exists(path))
    {
        DirectoryInfo di = Directory.CreateDirectory(path);
    }

    try
    {
        while (fileCounter <= maxNumberOfFiles)
        {
            timer.Start();
            foreach (var bank in BankAcoutNumbers.BANS)
            {
                part.Add(bank);
                if (++bansCounter >= BANS_PER_FILE)
                {
                    string fileName = string.Format("{0}-{1}", part[0], part[part.Count - 1]);
                    string outputToFile = ""; // Otherwise you don't see the lines in the file. Just a single line!!
                    Console.WriteLine("NR{0}", fileName);
                    string subString = System.IO.Path.Combine(path, "BankNumbers"); // Needed, because otherwise the files will not be stored in the correct folder!!
                    fileName = subString + fileName;
                    foreach (var partBan in part)
                    {
                        Console.WriteLine(partBan);
                        outputToFile += partBan + Environment.NewLine; // Writing the lines to the file
                    }
                    System.IO.File.WriteAllText(fileName, outputToFile); // Writes to the file system.
                    part.Clear();
                    bansCounter = 0;
                    //System.IO.File.WriteAllText(fileName, part.ToString());
                    if (++fileCounter >= MAX_FILES)
                        break;
                }
            }
        }
        timer.Stop();
        Console.WriteLine(timer.Elapsed.Seconds);
    }
    catch (Exception)
    {
        throw;
    }
    System.Console.WriteLine("Press any key to exit.");
    System.Console.ReadKey();
}
This generates 81 million bank account records separated over 80 files. Can I speed up the process with threading?
You're talking about speeding up a process whose bottleneck is overwhelmingly likely the file write speed. You can't really effectively parallelize writing to a single disk.
You may see slight increases in speed if you spawn a worker thread responsible for just the file I/O. In other words, create a buffer and have your main thread dump contents into it while the other thread writes it to disk. It's the classic producer/consumer dynamic. I wouldn't expect serious speed gains, however.
Also keep in mind that writing to the console will slow you down, but you can keep that in the main thread and you'll probably be fine. Just make sure you put a limit on the buffer size and have the producer thread hang back when the buffer is full.
Edit: Also have a look at the link L-Three provided; using a BufferedStream would be an improvement (and would probably render a consumer thread unnecessary).
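As a rough illustration of that suggestion (the file name and buffer size are made up; part and path are the variables from the question):
string fileName = Path.Combine(path, "BankNumbers-0.txt");   // illustrative name
using (var fileStream = new FileStream(fileName, FileMode.Create, FileAccess.Write))
using (var buffered = new BufferedStream(fileStream, 64 * 1024))   // 64 KB buffer
using (var writer = new StreamWriter(buffered))
{
    foreach (var ban in part)
    {
        // Streaming each number avoids building one huge string with +=,
        // which gets slower and slower as the string grows.
        writer.WriteLine(ban);
    }
}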
Your process can be divided into two steps:
Generate an account
Save the account in the file
The first step can be done in parallel as there is no dependency between accounts. That is, while creating account number xyz you don't have to rely on data from account xyz - 1 (as it may not have been created yet).
The problematic bit is writing the data into file. You don't want several threads trying to access and write to the same file. And adding locks will likely make your code a nightmare to maintain. Other issue is that it's the writing to the file that slows the whole process down.
At the moment, in your code creating account and writing to the file happens in one process.
What you can try is to separate these processes. First you create all the accounts and keep them in some collection; here multi-threading can be used safely. Only when all the accounts are created do you save them.
Improving the saving process will take bit more work. You will have to divide all the accounts into 8 separate collections. For each collection you create a separate file. Then you can take first collection, first file, and create a thread that will write the data to the file. The same for second collection and second file. And so on. These 8 processes can run in parallel and you do not have to worry that more than one thread will try to access same file.
Below is some pseudo-code to illustrate the idea:
public void CreateAndSaveAccounts()
{
    List<Account> accounts = this.CreateAccounts();

    // Divide the accounts into separate batches.
    // Of course the process can (and should) be automated.
    List<List<Account>> accountsInSeparateBatches =
        new List<List<Account>>
        {
            accounts.GetRange(0, 10000000),        // First batch of 10 million
            accounts.GetRange(10000000, 10000000), // Second batch of 10 million
            accounts.GetRange(20000000, 10000000)  // Third batch of 10 million
            // ...
        };

    // Save accounts in parallel
    Parallel.For(0, accountsInSeparateBatches.Count,
        i =>
        {
            string filePath = string.Format(@"C:\file{0}", i);
            this.SaveAccounts(accountsInSeparateBatches[i], filePath);
        }
    );
}

public List<Account> CreateAccounts()
{
    // Create accounts here
    // and return them as a collection.
    // Use parallel processing wherever possible.
}

public void SaveAccounts(List<Account> accounts, string filePath)
{
    // Save accounts to a file.
    // The method creates a thread to do the work.
}

Can this code have a bottleneck or be resource-intensive?

It's code that will execute 4 threads at 15-minute intervals. The last time I ran it, the first 15-minute batch copied fast (20 files in 6 minutes), but the second 15-minute batch was much slower. It's sporadic, and I want to make certain that, if there's any bottleneck, it's a bandwidth limitation with the remote server.
EDIT: I'm monitoring the last run: the 15:00 and :45 batches copied in under 8 minutes each. The :15 hasn't finished and neither has :30, and both began at least 10 minutes before :45.
Here's my code:
static void Main(string[] args)
{
    Timer t0 = new Timer((s) =>
    {
        Class myClass0 = new Class();
        myClass0.DownloadFilesByPeriod(taskRunDateTime, 0, cts0.Token);
        Copy0Done.Set();
    }, null, TimeSpan.FromMinutes(20), TimeSpan.FromMilliseconds(-1));

    Timer t1 = new Timer((s) =>
    {
        Class myClass1 = new Class();
        myClass1.DownloadFilesByPeriod(taskRunDateTime, 1, cts1.Token);
        Copy1Done.Set();
    }, null, TimeSpan.FromMinutes(35), TimeSpan.FromMilliseconds(-1));

    Timer t2 = new Timer((s) =>
    {
        Class myClass2 = new Class();
        myClass2.DownloadFilesByPeriod(taskRunDateTime, 2, cts2.Token);
        Copy2Done.Set();
    }, null, TimeSpan.FromMinutes(50), TimeSpan.FromMilliseconds(-1));

    Timer t3 = new Timer((s) =>
    {
        Class myClass3 = new Class();
        myClass3.DownloadFilesByPeriod(taskRunDateTime, 3, cts3.Token);
        Copy3Done.Set();
    }, null, TimeSpan.FromMinutes(65), TimeSpan.FromMilliseconds(-1));
}

public struct FilesStruct
{
    public string RemoteFilePath;
    public string LocalFilePath;
}
private void DownloadFilesByPeriod(DateTime TaskRunDateTime, int Period, Object obj)
{
    FilesStruct[] Array = GetAllFiles(TaskRunDateTime, Period);
    // Array has 20 files for the specific period.
    using (Session session = new Session())
    {
        // Connect
        session.Open(sessionOptions);
        TransferOperationResult transferResult;
        foreach (FilesStruct u in Array)
        {
            if (session.FileExists(u.RemoteFilePath)) // File exists remotely
            {
                if (!File.Exists(u.LocalFilePath)) // File does not exist locally
                {
                    transferResult = session.GetFiles(u.RemoteFilePath, u.LocalFilePath);
                    transferResult.Check();
                    foreach (TransferEventArgs transfer in transferResult.Transfers)
                    {
                        // Log that the file has been transferred
                    }
                }
                else
                {
                    using (StreamWriter w = File.AppendText(Logger._LogName))
                    {
                        // Log that the file already exists locally
                    }
                }
            }
            else
            {
                using (StreamWriter w = File.AppendText(Logger._LogName))
                {
                    // Log that the file does not exist remotely
                }
            }
            if (token.IsCancellationRequested)
            {
                break;
            }
        }
    }
}
Something is not quite right here. The first thing is that you're setting up 4 timers to run in parallel. If you think about it, there is no need: you don't need 4 threads running in parallel all the time, you just need to initiate tasks at specific intervals. So how many timers do you need? ONE.
The second problem is: why TimeSpan.FromMilliseconds(-1)? What is the purpose of that? I can't figure out why you put it in there, but I wouldn't.
The third problem, not related to multi-threading but worth pointing out anyway, is that you create a new instance of Class each time, which is unnecessary. It would be necessary if your class needed constructor setup, or if your logic accessed different methods or fields of the class in some order. In your case, all you want to do is call the method, so you don't need a new instance of the class every time. You just need to make the method you're calling static.
Here is what I would do:
Store the files you need to download in an array / List<>. Can't you see that you're doing the same thing every time? Why write 4 different versions of the code for that? It's unnecessary. Store the items in an array, then just change the index in the call! (See the sketch after this answer.)
Set up the timer at, say, a 5-second interval. When it reaches the 20 min / 35 min / etc. mark, spawn a new thread to do the task. That way a new task can start even if the previous one is not finished.
Wait for all threads to complete (terminate). When they do, check if they throw exceptions, and handle them / log them if necessary.
After everything is done, terminate the program.
For step 2, you have the option to use the new async keyword if you're using .NET 4.5. But it won't make a noticeable difference if you use threads manually.
And as for why it is so slow: why don't you check your system status using Task Manager? Is the CPU maxed out, or is the network throughput occupied by something else? You can easily tell the answer yourself from there.
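A loose sketch of the single-timer idea (the first fire after 20 minutes and then every 15 minutes matches the 20/35/50/65-minute due times above; the Periods array, the MyDownloader class name, and making DownloadFilesByPeriod static are assumptions for illustration, and Task.Run needs .NET 4.5):
static readonly int[] Periods = { 0, 1, 2, 3 };
static int _nextPeriod;

static void Main(string[] args)
{
    var taskRunDateTime = DateTime.Now;
    var cts = new CancellationTokenSource();

    // One timer: first fire after 20 minutes, then every 15 minutes.
    var timer = new Timer(_ =>
    {
        if (_nextPeriod >= Periods.Length) return;
        int period = Periods[_nextPeriod++];

        // Spawn the work so a slow download never blocks the timer callback.
        Task.Run(() => MyDownloader.DownloadFilesByPeriod(taskRunDateTime, period, cts.Token));
    }, null, TimeSpan.FromMinutes(20), TimeSpan.FromMinutes(15));

    Console.ReadKey();   // keep the process alive while the downloads run
}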
The problem was the sftp client.
The purpose of the console application was to loop through a List<> and download the files. I tried winscp and, even though it did the job, it was very slow. I also tested sharpSSH and it was even slower than winscp.
I finally ended up using ssh.net which, at least in my particular case, was much faster than both winscp and sharpssh. I think the problem with winscp is that there was no evident way of disconnecting after I was done. With ssh.net I could connect/disconnect after every file download, something I couldn't do with winscp.

Writing to file in a thread safe manner

Writing Stringbuilder to file asynchronously. This code takes control of a file, writes a stream to it and releases it. It deals with requests from asynchronous operations, which may come in at any time.
The FilePath is set per class instance (so the lock object is per instance), but there is potential for conflict since these classes may share FilePaths. That sort of conflict, as well as all other types from outside the class instance, would be dealt with by retries.
Is this code suitable for its purpose? Is there a better way to handle this that means less (or no) reliance on the catch and retry mechanic?
Also, how do I avoid catching exceptions that have occurred for other reasons?
public string Filepath { get; set; }

private Object locker = new Object();

public async Task WriteToFile(StringBuilder text)
{
    int timeOut = 100;
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    while (true)
    {
        try
        {
            // Wait for the resource to be free
            lock (locker)
            {
                using (FileStream file = new FileStream(Filepath, FileMode.Append, FileAccess.Write, FileShare.Read))
                using (StreamWriter writer = new StreamWriter(file, Encoding.Unicode))
                {
                    writer.Write(text.ToString());
                }
            }
            break;
        }
        catch
        {
            // File not available, conflict with other class instances or another application
        }
        if (stopwatch.ElapsedMilliseconds > timeOut)
        {
            // Give up.
            break;
        }
        // Wait and retry
        await Task.Delay(5);
    }
    stopwatch.Stop();
}
How you approach this is going to depend a lot on how frequently you're writing. If you're writing a relatively small amount of text fairly infrequently, then just use a static lock and be done with it. That might be your best bet in any case because the disk drive can only satisfy one request at a time. Assuming that all of your output files are on the same drive (perhaps not a fair assumption, but bear with me), there's not going to be much difference between locking at the application level and the lock that's done at the OS level.
So if you declare locker as:
static object locker = new object();
You'll be assured that there are no conflicts with other threads in your program.
If you want this thing to be bulletproof (or at least reasonably so), you can't get away from catching exceptions. Bad things can happen. You must handle exceptions in some way. What you do in the face of error is something else entirely. You'll probably want to retry a few times if the file is locked. If you get a bad path or filename error or disk full or any of a number of other errors, you probably want to kill the program. Again, that's up to you. But you can't avoid exception handling unless you're okay with the program crashing on error.
By the way, you can replace all of this code:
using (FileStream file = new FileStream(Filepath, FileMode.Append, FileAccess.Write, FileShare.Read))
using (StreamWriter writer = new StreamWriter(file, Encoding.Unicode))
{
writer.Write(text.ToString());
}
With a single call:
File.AppendAllText(Filepath, text.ToString());
Assuming you're using .NET 4.0 or later. See File.AppendAllText.
One other way you could handle this is to have the threads write their messages to a queue, and have a dedicated thread that services that queue. You'd have a BlockingCollection of messages and associated file paths. For example:
class LogMessage
{
public string Filepath { get; set; }
public string Text { get; set; }
}
BlockingCollection<LogMessage> _logMessages = new BlockingCollection<LogMessage>();
Your threads write data to that queue:
_logMessages.Add(new LogMessage { Filepath = "foo.log", Text = "this is a test" });
You start a long-running background task that does nothing but service that queue:
foreach (var msg in _logMessages.GetConsumingEnumerable())
{
// of course you'll want your exception handling in here
File.AppendAllText(msg.Filepath, msg.Text);
}
Your potential risk here is that threads create messages too fast, causing the queue to grow without bound because the consumer can't keep up. Whether that's a real risk in your application is something only you can say. If you think it might be a risk, you can put a maximum size (number of entries) on the queue so that if the queue size exceeds that value, producers will wait until there is room in the queue before they can add.
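For example, capping the queue at 10,000 pending messages (an arbitrary number picked for illustration) is just a constructor argument:

BlockingCollection<LogMessage> _logMessages = new BlockingCollection<LogMessage>(boundedCapacity: 10000);

With a bound in place, Add blocks the producer until the consumer has made room, which gives you exactly the back-pressure described above.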
You could also use a ReaderWriterLock; it is considered a more 'appropriate' way to control thread safety when dealing with read/write operations...
To debug my web apps (when remote debugging fails) I use the following ('debug.txt' ends up in the \bin folder on the server):
public static class LoggingExtensions
{
    static ReaderWriterLock locker = new ReaderWriterLock();

    public static void WriteDebug(string text)
    {
        try
        {
            locker.AcquireWriterLock(int.MaxValue);
            System.IO.File.AppendAllLines(
                Path.Combine(
                    Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().GetName().CodeBase).Replace("file:\\", ""),
                    "debug.txt"),
                new[] { text });
        }
        finally
        {
            locker.ReleaseWriterLock();
        }
    }
}
Hope this saves you some time.

Thread safe writing to Lucene index files

I have an application that crawls a site and writes the content as Lucene index files into a physical directory.
When I use threads for this, I get write errors or errors due to the locks.
I want to use multiple threads and write into the index files without losing the work of any of the threads.
public class WriteDocument
{
    private static Analyzer _analyzer;
    private static IndexWriter indexWriter;
    private static string Host;

    public WriteDocument(string _Host)
    {
        Host = _Host;
        Lucene.Net.Store.Directory _directory = FSDirectory.GetDirectory(Host, false);
        _analyzer = new StandardAnalyzer();
        bool indexExists = IndexReader.IndexExists(_directory);
        bool createIndex = !indexExists;
        indexWriter = new IndexWriter(_directory, _analyzer, true);
    }

    public void AddDocument(object obj)
    {
        DocumentSettings doc = (DocumentSettings)obj;
        Document document = new Document(); // holds the fields for this downloaded page
        Field urlField = new Field("Url", doc.downloadedDocument.Uri.ToString(), Field.Store.YES, Field.Index.TOKENIZED);
        document.Add(urlField);
        indexWriter.AddDocument(document);
        document = null;
        doc.downloadedDocument = null;
        indexWriter.Optimize();
        indexWriter.Close();
    }
}
To the above class, I am passing the values like this:
DocumentSettings writedoc = new DocumentSettings()
{
    Host = Host,
    downloadedDocument = downloadDocument
};

Thread t = new Thread(() =>
{
    doc.AddDocument(writedoc);
});
t.Start();
If I add t.Join(); after t.Start(); the code works for me without any errors, but this slows down my process and is effectively the same as the output I get without using threads.
I am getting errors like:
Cannot rename /indexes/Segments.new to /indexes/Segments
the file is used by some other process.
Can anyone help me on this code?
An IndexWriter is not thread safe, so this is not possible.
If you want to use multiple threads for downloading, you will need to build some sort of single-threaded "message pump" to which you can feed the Documents you are downloading and creating, and which puts them in a queue.
For example, in your AddDocument method, instead of using the index directly, just send the documents off to a service that will index them eventually.
That service should keep indexing everything in its queue, and when the queue is empty for the moment, sleep for a while.
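A rough sketch of such a service, assuming the same Lucene.Net version as the question (the IndexingService name and its members are made up, not part of the original code); one dedicated thread owns the single IndexWriter and everything else only enqueues:
public class IndexingService
{
    private readonly BlockingCollection<Document> _queue = new BlockingCollection<Document>();
    private readonly IndexWriter _writer;
    private readonly Thread _worker;

    public IndexingService(Lucene.Net.Store.Directory directory, Analyzer analyzer)
    {
        _writer = new IndexWriter(directory, analyzer, !IndexReader.IndexExists(directory));
        _worker = new Thread(Consume) { IsBackground = true };
        _worker.Start();
    }

    // Called from any downloader thread.
    public void Enqueue(Document document)
    {
        _queue.Add(document);
    }

    private void Consume()
    {
        // Blocks while the queue is empty; the single writer thread drains it as items arrive.
        foreach (var document in _queue.GetConsumingEnumerable())
        {
            _writer.AddDocument(document);
        }
        _writer.Optimize();
        _writer.Close();
    }

    public void Finish()
    {
        _queue.CompleteAdding();   // lets Consume() drain the queue and close the writer
        _worker.Join();
    }
}
Your crawler threads would then call Enqueue(document) instead of touching the IndexWriter, and Finish() once the crawl is complete.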
An approach you can take would be to create a separate index for each of the threads and merge them all back at the end. e.g. index1, index2...indexn (corresponding to threads 1..n) and merge them.
