I have an 8-core system and I am processing a number of text files that contain millions of lines, say 23 files with a huge number of lines, which takes 2 to 3 hours to finish. I am thinking of using TPL tasks for processing the text files. As of now, the code I am using processes the text files sequentially, one by one, so I am thinking of splitting it up like 5 text files in one thread, 5 in another thread, and so on. Is that a good approach, or is there another way? I am using .NET 4.0, and the code I am using is shown below:
foreach (DataRow dtr in ds.Tables["test"].Rows)
{
    string filename = dtr["ID"].ToString() + "_cfg";
    try
    {
        foreach (var file in
            Directory.EnumerateFiles(Path.GetDirectoryName(dtr["FILE_PATH"].ToString()), "*.txt"))
        {
            id = file.Split('\\').Last();
            if (!id.Contains("GMML"))
            {
                strbsc = id.Split('_');
                id = strbsc[0];
            }
            else
            {
                strbsc = file.Split('-');
                id = ("RC" + strbsc[1]).Replace("SC", "");
            }
            ProcessFile(file, id, dtr["CODE"].ToString(), dtr["DOR_CODE"].ToString(), dtr["FILE_ID"].ToString());
        }
    }
    catch (Exception)
    {
        // handle or log the error here
    }
}
How do I split the text files into batches so that each batch runs in its own thread rather than one by one? For example, with 23 files that would be 7 in one thread, 7 in another, 7 in a third and 2 in a fourth. One more thing: I am moving all of this data from the text files into an Oracle database.
EDIT
If I use something like the following, will it be worth it? And how do I separate the files into batches?
Task.Factory.StartNew(() => {ProcessFile(file, id, dtr["CODE"].ToString(), dtr["DOR_CODE"].ToString(), dtr["FILE_ID"].ToString()); });
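For illustration, here is a minimal sketch (not the asker's code) of one way to split the collected file paths into fixed-size batches and start one TPL task per batch on .NET 4.0; GetAllFilePaths() and ProcessSingleFile() are hypothetical placeholders for the path-gathering and per-file logic in the loop above, and the snippet assumes System.Linq and System.Threading.Tasks.
// Hedged sketch: partition the file list into batches of 7 (23 files -> 7, 7, 7 and 2)
// and process each batch sequentially inside its own task.
List<string> files = GetAllFilePaths();
int batchSize = 7;

var tasks = new List<Task>();
for (int i = 0; i < files.Count; i += batchSize)
{
    List<string> batch = files.Skip(i).Take(batchSize).ToList();
    tasks.Add(Task.Factory.StartNew(() =>
    {
        foreach (string file in batch)
        {
            ProcessSingleFile(file); // the id/ProcessFile logic from the question goes here
        }
    }));
}

Task.WaitAll(tasks.ToArray()); // block until every batch has finished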
Splitting a file into multiple chunks does not seem like a good idea, because any performance boost from that depends on how the file is laid out on your disk. But because of the asynchronous nature of disk IO operations, I strongly recommend async access to the file. There are several ways to do this, and you can always choose a combination of them.
At the lowest level, you can use async methods such as StreamWriter.WriteAsync() or StreamReader.ReadAsync() to access the file on disk and cooperatively let the OS know that the thread can be released until the disk IO operation is finished. While it's useful to make async calls at this level, that alone does not have a significant impact on the overall performance of your application, since your app is still waiting for the disk operation to finish and does nothing in the meanwhile. (These calls can have a big impact on your software's responsiveness when they are made from the UI thread.)
So, I recommend splitting your logic into at least two separate parts running on two separate threads: one to read data from the file, and one to process the data that has been read. You can use the producer/consumer pattern to help these threads interact.
One great data structure provided by .NET is System.Collections.Concurrent.ConcurrentQueue, which is especially useful for implementing a multithreaded producer/consumer pattern.
So you can easily do something like this:
System.Collections.Concurrent.ConcurrentQueue<string> queue = new System.Collections.Concurrent.ConcurrentQueue<string>();
bool readFinished = false;
Task tRead = Task.Run(async () =>
{
    // filePath is the path of the text file to read; useAsync enables asynchronous disk IO
    using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, useAsync: true))
    using (StreamReader re = new StreamReader(fs))
    {
        while (!re.EndOfStream)
            queue.Enqueue(await re.ReadLineAsync());
    }
});
Task tLogic = Task.Run(async () =>
{
    string data = "";
    // keep consuming until the reader is done and the queue has been drained
    while (!readFinished || !queue.IsEmpty)
    {
        if (queue.TryDequeue(out data))
        {
            // Process data
        }
        else
        {
            await Task.Delay(100);
        }
    }
});
tRead.Wait();
readFinished = true;
tLogic.Wait();
This simple example uses StreamReader.ReadLineAsync() to read data from the file, while a better-performing practice can be to read a fixed length of characters into a char[] buffer and add that data to the queue. You can find the optimal buffer length after some tests.
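As a rough illustration of that buffer-based variant (my own sketch, not the answer's code), the read loop inside the same async lambda as tRead above could look something like this, with the buffer size left as a tuning knob:
// Sketch: read fixed-size character blocks instead of lines and enqueue each block.
// queue and filePath are assumed to be the same variables used above.
const int BufferLength = 4096; // tune this value after measuring
char[] buffer = new char[BufferLength];
using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, useAsync: true))
using (StreamReader re = new StreamReader(fs))
{
    int charsRead;
    while ((charsRead = await re.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        // enqueue only the characters that were actually read
        queue.Enqueue(new string(buffer, 0, charsRead));
    }
}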
All, the real bottleneck was the mass insertion. I was checking whether the data being inserted was already present in the database: I have a status column that is set to 'Y' or 'N' by an update statement. That update statement, running in conjunction with the inserts, was the culprit. After adding an index in the database, the run time dropped from 4 hours to 10 minutes. What an impact :)
Related
This might be a long shot but I might as well try here. There is a block of C# code that is rebuilding a Solr core. The steps are as follows:
Delete all the existing documents
Get the core entities
Split the entities into batches of 1000
Spin off threads to perform the next set of processes:
Serialize each batch to JSON and write the JSON to a file on the server hosting the core
Send a command to the core to upload that file using System.Net.WebClient solrurl/corename/update/json?stream.file=myfile.json&stream.contentType=application/json;charset=utf-8
Delete the file. I've also tried deleting the files after all the batches are done, as well as not deleting the files at all
After all batches are done it commits. I've also tried committing
after each batch is done.
My problem is that the last batch will not upload if it's much smaller than the batch size. It flows through as if the command was called, but nothing happens. It throws no exceptions and I see no errors in the Solr logs. My questions are: why, and how can I ensure the last batch always gets uploaded? We think it's a timing issue, but we've added Thread.Sleep(30000) in many parts of the code to test that theory and it still happens.
The only time it doesn't happen is:
if the batch is full or almost full
we don't run it in multiple threads
we put a break point at the File.Delete line on the last batch and wait for 30 seconds or so, then continue
Here is the code for writing the file and calling the update command. This is called for each batch.
private const string
FileUpdateCommand = "{1}/update/json?stream.file={0}&stream.contentType=application/json;charset=utf-8",
SolrFilesDir = @"\\MYSERVER\SolrFiles",
SolrFileNameFormat = SolrFilesDir + @"\{0}-{1}.json",
_solrUrl = "http://MYSERVER:8983/solr/",
CoreName = "MyCore";
public void UpdateCoreByFile(List<CoreModel> items)
{
if (items.Count == 0)
return;
var settings = new JsonSerializerSettings { DateTimeZoneHandling = DateTimeZoneHandling.Utc };
var dir = new DirectoryInfo(SolrFilesDir);
if (!dir.Exists)
dir.Create();
var filename = string.Format(SolrFileNameFormat, Guid.NewGuid(), CoreName);
using (var sw = new StreamWriter(filename))
{
sw.Write(JsonConvert.SerializeObject(items, settings));
}
var file = HttpUtility.UrlEncode(filename);
var command = string.Format(FileUpdateCommand, file, CoreName);
using (var client = _clientFactory.GetClient())//System.Net.WebClient
{
client.DownloadData(new Uri(_solrUrl + command));
}
//Thread.Sleep(30000);//doesn't work if I add this
File.Delete(filename);//works here if add breakpoint and wait 30 sec or so
}
I'm just trying to figure out why this is happening and how to address it. I hope this makes sense, and I have provided enough information and code. Thanks for any help.
Since changing the size of the data set and adding a breakpoint "fixes" it, this is most certainly a race condition. Since you haven't added the code that actually indexes the content, it's impossible to say what the issue really is, but my guess is that the last commit happens before all the threads have finished, and only works when all threads are done (if you sleep the threads, the issue will still be there, since all threads sleep for the same time).
The easy fix: use commitWithin instead, and never issue explicit commits. The commitWithin parameter makes sure that the documents become available in the index within the given time frame (given in milliseconds). To make sure that the documents you submit become available within ten seconds, append &commitWithin=10000 to your URL.
If there are already documents pending a commit, the documents added will be committed before the ten seconds have elapsed; but even if there is just one last document submitted as the last batch, it will never take more than ten seconds before it becomes visible (and there will be no documents left forever in an uncommitted limbo).
That way you won't have to keep your threads synchronized or issue a final commit, as long as you wait until all threads have finished before exiting your application (if it's an application that actually terminates).
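Applied to the code in the question, that just means tacking the parameter onto the update URL; a sketch of the adjusted constant (assuming a ten-second window) could look like this:
// Same format string as in the question, with commitWithin appended so Solr commits
// each upload within 10 seconds and no explicit commit command is needed.
private const string FileUpdateCommand =
    "{1}/update/json?stream.file={0}&stream.contentType=application/json;charset=utf-8&commitWithin=10000";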
Hello fellow developers,
I have a question about implementing multi-threading on my .NET (Framework 4.0) Windows Service.
Basically, what the service should be doing is the following:
Scans the filesystem (a specific directory) to see if there are files to process
If there are files that need to be processed, it should be using a thread pooling mechanism to issue threads up to a predetermined amount.
Each thread will perform an upload operation of a single file
As soon as one thread completes, the filesystem is scanned again to see if there are other files to process (I want to avoid having two threads perform the operation on the same file)
I am struggling to find a way that will allow me to do just that last step.
Right now, I have a function that retrieves the number of maximum number of concurrent threads that runs in the main thread:
int maximumNumberOfConcurrentThreads = getMaxThreads(databaseConnection);
Then, still in the main thread, I have a function that scans the directory and returns a list with the files to process
List<FileToUploadInfo> filesToUpload = getFilesToUploadFromFS(directory);
After this, I call the following function:
generateThreads(maximumNumberOfConcurrentThreads, filesToUpload);
Each thread should be calling the below function (returns void):
uploadFile(fileToUpload, databaseConnection, currentThread);
Right now, the way the program is structured, if the maximum number of threads is set to, say, 5, I am grabbing 5 elements from the list and uploading them.
As soon as all 5 are done, I grab 5 more and do the same until I don't have any left, as per the code below.
for (int index = 0; index < filesToUpload.Count; index = index + maximumNumberOfConcurrentThreads)
{
    try
    {
        Parallel.For(0, maximumNumberOfConcurrentThreads, iteration => { if (index + iteration < filesToUpload.Count) { uploadFile(filesToUpload[index + iteration], databaseConnection, iteration); } });
    }
    catch (System.ArgumentOutOfRangeException outOfRange)
    {
        debug("Exception in Parallel.For [" + outOfRange.Message + "]");
    }
}
However, if 4 files are small and the upload of each one takes 5 seconds, while the remaining one is big and takes 30 minutes, I will have, after the 4 files have been uploaded, only one file uploading, and I need to wait for it to finish before starting to upload others in the list.
After finishing uploading all the files in the list, my service goes to sleep, and then, when it wakes up again, it scans the file system again.
What is the strategy that best fits my needs? Is it advisable to go this route or will it create concurrency nightmares? I need to avoid uploading any file twice.
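For what it's worth, here is a hedged sketch of an alternative to fixed batches of 5, along the lines of the Parallel.ForEach / degree-of-parallelism suggestions elsewhere on this page (not something stated in the question): let one parallel loop run over the whole list with a capped degree of parallelism, so a new upload starts as soon as any slot frees up instead of waiting for the slowest file in the batch.
// Sketch: cap concurrency instead of batching, so finished slots are reused immediately.
// Assumes uploadFile and the variables from the question; sharing databaseConnection across
// threads is only safe if that object is itself thread-safe.
Parallel.ForEach(
    filesToUpload,
    new ParallelOptions { MaxDegreeOfParallelism = maximumNumberOfConcurrentThreads },
    fileToUpload =>
    {
        // the third argument was an iteration index in the question; a thread id is a stand-in here
        uploadFile(fileToUpload, databaseConnection, Thread.CurrentThread.ManagedThreadId);
    });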
I'm working on a program that takes information from files and then stores it in a MySQL database. This MySQL database is located on another dedicated server which is much more powerful than this server here. Data is being sent over the LAN using a 1 Gbps connection.
It is using 8 threads because my server has 8 cores, but somehow it runs very slowly.
CPU: Intel Xeon E3-1270 v3 @ 3.50 GHz
RAM: 16 GB ECC
HDD: SATA 3 1TB
My program's CPU usage is only 0-5%
CPU affinity is all 8 cores
So, do you have any ideas what's wrong or how can I increase the speed of my program?
UPDATE:
I updated my code and it appears to be faster:
Parallel.For(0, this.data_files.Count, new ParallelOptions { MaxDegreeOfParallelism = this.MaxThreads }, i =>
{
this.ThreadCount++;
this.ParseFile(this.GetSource());
});
Here's a code snippet that deploys threads:
while (true)
{
if (this.ThreadCount < this.MaxThreads)
{
Task.Factory.StartNew(() =>
this.ParseFile(this.GetFile())
);
this.ThreadCount++;
}
else
{
Thread.Sleep(1);
}
this.UpdateConsole();
}
GetFile function:
private string GetFile()
{
string file = "";
string source = "";
while (true)
{
if (this.data_files.Count() != 0)
{
file = this.data_files[0];
this.data_files.RemoveAt(0);
if (File.Exists(file) == true)
{
source = File.ReadAllText(file);
File.Delete(file);
break;
}
}
}
return source;
}
I'm working on one program that takes information from files and then stores them in MySQL database.
Clearly your program is not CPU bound, it's IO bound. The bottlenecks are going to be based on your hard disk(s) and your network connection. Odds are even a single thread is going to be able to ensure proper utilization of these resources (in a well designed application). Adding extra threads generally won't help, it'll just create a bunch of threads that will spend their time waiting on various IO operations.
To use all the hardware resources is not the right goal for a program to have.
Instead, a better goal is to be as fast as possible. This is significantly different. While using more hardware resources can help, it is not always sufficient.
Sometimes, adding more resources to a problem doesn't help. In those cases, don't. Adding threads makes your program more complex, but not necessarily faster as you've seen.
C# already has good Asynchronous programming features with the TPL (which you are already using), so why not take advantage of that?
This will mean that the .NET framework will automatically manage the threads for you in an efficient way.
Here's what I propose:
foreach (var file in GetFilesToRead()) {
    var task = PerformOperation(file);
    // Keep a list of tasks, if you wish.
}
...
async Task PerformOperation(string filename) {
    var contents = await ReadFile(filename);
    await ParseFile(contents);
    DoSomething();
}
Note that even in CPU-bound programs, threads (and tasks) may not help you if you're using locks.
Although locks help keep programs well-behaved, they come at a significant performance cost.
Within a lock, only one thread may be executing at a time.
This means that the first thread is locking your _lock instance, and then the other threads are waiting for that lock to be released.
In your program, only one thread is active at a time.
To solve this, don't use locks. Instead, write programs that do not need locks at all. Copy variables instead of sharing them. Use immutable collections instead of mutable collections and so on.
My program above uses exactly zero locks and, as such, will better utilize your threads.
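As a small, generic illustration of that point (my own example, not code from either program): incrementing a shared counter under a lock serializes the workers on that lock, while an Interlocked increment stays lock-free (System.Threading assumed).
// Lock-based version: every increment has to wait its turn on _lock.
private readonly object _lock = new object();
private int _processedCount;
private void RecordProcessedWithLock()
{
    lock (_lock)
    {
        _processedCount++;
    }
}

// Lock-free version: the increment is performed atomically without blocking other threads.
private int _processedCountLockFree;
private void RecordProcessedLockFree()
{
    Interlocked.Increment(ref _processedCountLockFree);
}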
I'm building a console application that has to process a bunch of data.
Basically, the application grabs references from a DB. For each reference, it parses the content of a file and makes some changes. The files are HTML files, and the process is doing heavy work with regex replacements (finding references and transforming them into links). The result is then stored on the file system and sent to an external system.
To summarize the process in a sequential way:
var refs = GetReferencesFromDB(); // ~5000 Datarow returned
foreach (var reference in refs)
{
    var filePath = GetFilePath(reference); // This method looks up in a previously loaded file list
    var html = File.ReadAllText(filePath); // Read html locally, or from a network drive
    var convertedHtml = ParseHtml(html);
    File.WriteAllText(destinationFilePath, convertedHtml); // Copy the result locally, or to a network drive
    SendToWs(reference, convertedHtml);
}
My program is working correctly but is quite slow. That's why I want to parallelise the process.
For now, I made a simple parallelization by adding AsParallel:
var refs = GetReferencesFromDB().AsParallel();
refs.ForAll(reference =>
{
    var filePath = GetFilePath(reference);
    var html = File.ReadAllText(filePath);
    var convertedHtml = ParseHtml(html);
    File.WriteAllText(destinationFilePath, convertedHtml);
    SendToWs(reference, convertedHtml);
});
This simple change decreases the duration of the process (25% less time). However, what I understand about parallelization is that there won't be much benefit (or worse, less benefit) when parallelizing over resources that rely on I/O, because the I/O won't magically double.
That's why I think I should change my approach not to parallelize the whole process, but to create dependent chained queued tasks.
I.e., I should create a flow like:
Queue read file. When finished, Queue ParseHtml. When finished, Queue both send to WS and write locally. When finished, log the result.
However, I don't know how to realize such a thing.
I feel it will end up as a set of consumer/producer queues, but I haven't found a correct sample.
And moreover, I'm not sure whether there will be benefits.
Thanks for any advice.
[Edit] In fact, I'm the perfect candidate for using C# 4.5... if only it were RTM :)
[Edit 2] Another thing making me think it's not correctly parallelized is that in Resource Monitor, the graphs of CPU, network I/O and disk I/O are not stable: when one is high, the others are low to medium.
You're not leveraging any async I/O APIs in any of your code. Everything you're doing is CPU bound and all your I/O operations are going to waste CPU resources blocking. AsParallel is for compute bound tasks, if you want to take advantage of async I/O you need to leverage the Asynchronous Programming Model (APM) based APIs today in <= v4.0. This is done by looking for BeginXXX/EndXXX methods on the I/O based classes you're using and leveraging those whenever available.
Read this post for starters: TPL TaskFactory.FromAsync vs Tasks with blocking methods
Next, you don't want to use AsParallel in this case anyway. AsParallel enables streaming, which will result in immediately scheduling a new Task per item, but you don't need/want that here. You'd be much better served by partitioning the work using Parallel::ForEach.
Let's see how you can use this knowledge to achieve max concurrency in your specific case:
var refs = GetReferencesFromDB();
// Using Parallel::ForEach here will partition and process your data on separate worker threads
Parallel.ForEach(
    refs,
    reference =>
    {
        string filePath = GetFilePath(reference);
byte[] fileDataBuffer = new byte[1048576];
// Need to use FileStream API directly so we can enable async I/O
FileStream sourceFileStream = new FileStream(
filePath,
FileMode.Open,
FileAccess.Read,
FileShare.Read,
8192,
true);
// Use FromAsync to read the data from the file
Task<int> readSourceFileStreamTask = Task.Factory.FromAsync(
    sourceFileStream.BeginRead,
    sourceFileStream.EndRead,
    fileDataBuffer,
    0,
    fileDataBuffer.Length,
    null);
// Add a continuation that will fire when the async read is completed
readSourceFileStreamTask.ContinueWith(readSourceFileStreamAntecedent =>
{
int sourceFileStreamBytesRead;
try
{
// Determine exactly how many bytes were read
// NOTE: this will propagate any potential exception that may have occurred in EndRead
sourceFileStreamBytesRead = readSourceFileStreamAntecedent.Result;
}
finally
{
// Always clean up the source stream
sourceFileStream.Close();
sourceFileStream = null;
}
// This is here to make sure you don't end up trying to read files larger than this sample code can handle
if(sourceFileStreamBytesRead == fileDataBuffer.Length)
{
throw new NotSupportedException("You need to implement reading files larger than 1MB. :P");
}
// Convert the file data to a string
string html = Encoding.UTF8.GetString(fileDataBuffer, 0, sourceFileStreamBytesRead);
// Parse the HTML
string convertedHtml = ParseHtml(html);
// This is here to make sure you don't end up trying to write files larger than this sample code can handle
if(Encoding.UTF8.GetByteCount(convertedHtml) > fileDataBuffer.Length)
{
throw new NotSupportedException("You need to implement writing files larger than 1MB. :P");
}
// Convert the file data back to bytes for writing, keeping track of how many bytes to write
int convertedHtmlByteCount = Encoding.UTF8.GetBytes(convertedHtml, 0, convertedHtml.Length, fileDataBuffer, 0);
// Need to use FileStream API directly so we can enable async I/O
FileStream destinationFileStream = new FileStream(
destinationFilePath,
FileMode.OpenOrCreate,
FileAccess.Write,
FileShare.None,
8192,
true);
// Use FromAsync to read the data from the file
Task destinationFileStreamWriteTask = Task.Factory.FromAsync(
destinationFileStream.BeginWrite,
destinationFileStream.EndWrite,
fileDataBuffer,
0,
convertedHtmlByteCount,
null);
// Add a continuation that will fire when the async write is completed
destinationFileStreamWriteTask.ContinueWith(destinationFileStreamWriteAntecedent =>
{
try
{
// NOTE: we call wait here to observe any potential exceptions that might have occurred in EndWrite
destinationFileStreamWriteAntecedent.Wait();
}
finally
{
// Always close the destination file stream
destinationFileStream.Close();
destinationFileStream = null;
}
},
TaskContinuationOptions.AttachedToParent);
// Send to external system **concurrent** to writing to destination file system above
SendToWs(reference, convertedHtml);
},
TaskContinuationOptions.AttachedToParent);
});
Now, here are a few notes:
This is sample code, so I'm using a 1MB buffer to read/write files. This is excessive for HTML files and wasteful of system resources. You can either lower it to suit your max needs or implement chained reads/writes into a StringBuilder, which is an exercise I leave up to you since I'd be writing ~500 more lines of code to do async chained reads/writes. :P
You'll note that on the continuations for the read/write tasks I have TaskContinuationOptions.AttachedToParent. This is very important as it will prevent the worker thread that the Parallel::ForEach starts the work with from completing until all the underlying async calls have completed. If this was not here you would kick off work for all 5000 items concurrently which would pollute the TPL subsystem with thousands of scheduled Tasks and not scale properly at all.
I call SendToWs concurrent to writing the file to the file share here. I don't know what is underlying the implementation of SendToWs, but it too sounds like a good candidate for making async. Right now it's assumed it's pure compute work and, as such, is going to burn a CPU thread while executing. I leave it as an exercise to you to figure out how best to leverage what I've shown you to improve throughput there.
This is all typed free form and my brain was the only compiler here and SO's syntax highlighting is all I used to make sure the syntax was good. So, please forgive any syntax errors and let me know if I screwed up anything too badly that you can't make heads or tails of it and I'll follow up.
The good news is your logic could be easily separated into steps that go into a producer-consumer pipeline.
Step 1: Read file
Step 2: Parse file
Step 3: Write file
Step 4: SendToWs
If you are using .NET 4.0 you can use the BlockingCollection data structure as the backbone for each step's producer-consumer queue. The main thread will enqueue each work item into step 1's queue, where it will be picked up and processed and then forwarded on to step 2's queue, and so on and so forth.
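A minimal sketch of what such a pipeline might look like on .NET 4.0 (my own illustration, reusing GetReferencesFromDB, GetFilePath, ParseHtml and SendToWs from the question; GetDestinationPath is a hypothetical helper, and the write and send steps are folded into one stage for brevity):
// Producer-consumer pipeline: read -> parse -> write + send.
// Each stage consumes the previous stage's BlockingCollection and feeds the next one.
var toParse = new BlockingCollection<Tuple<DataRow, string>>(boundedCapacity: 100);
var toOutput = new BlockingCollection<Tuple<DataRow, string>>(boundedCapacity: 100);

var readStage = Task.Factory.StartNew(() =>
{
    foreach (DataRow reference in GetReferencesFromDB()) // assuming it yields DataRow items
    {
        toParse.Add(Tuple.Create(reference, File.ReadAllText(GetFilePath(reference))));
    }
    toParse.CompleteAdding(); // tell the next stage that no more items are coming
});

var parseStage = Task.Factory.StartNew(() =>
{
    foreach (var item in toParse.GetConsumingEnumerable())
    {
        toOutput.Add(Tuple.Create(item.Item1, ParseHtml(item.Item2)));
    }
    toOutput.CompleteAdding();
});

var outputStage = Task.Factory.StartNew(() =>
{
    foreach (var item in toOutput.GetConsumingEnumerable())
    {
        File.WriteAllText(GetDestinationPath(item.Item1), item.Item2); // hypothetical path helper
        SendToWs(item.Item1, item.Item2);
    }
});

Task.WaitAll(readStage, parseStage, outputStage);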
If you are willing to move on to the Async CTP then you can take advantage of the new TPL Dataflow structures for this as well. There is the BufferBlock<T> data structure, among others, that behaves in a similar manner to BlockingCollection and integrates well with the new async and await keywords.
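And a hedged sketch of the Dataflow flavour (written against the System.Threading.Tasks.Dataflow package that later shipped, rather than the CTP bits, and using the same assumed helpers as above):
// TransformBlock/ActionBlock pipeline; each block has its own internal buffer,
// so linked blocks replace the hand-rolled BlockingCollection queues.
var parseBlock = new TransformBlock<Tuple<DataRow, string>, Tuple<DataRow, string>>(
    item => Tuple.Create(item.Item1, ParseHtml(item.Item2)));

var outputBlock = new ActionBlock<Tuple<DataRow, string>>(item =>
{
    File.WriteAllText(GetDestinationPath(item.Item1), item.Item2); // hypothetical path helper
    SendToWs(item.Item1, item.Item2);
});

parseBlock.LinkTo(outputBlock, new DataflowLinkOptions { PropagateCompletion = true });

foreach (DataRow reference in GetReferencesFromDB())
{
    parseBlock.Post(Tuple.Create(reference, File.ReadAllText(GetFilePath(reference))));
}
parseBlock.Complete();
outputBlock.Completion.Wait();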
Because your algorithm is IO bound the producer-consumer strategies may not get you the performance boost you are looking for, but at least you will have a very elegant solution that would scale well if you could increase the IO throughput. I am afraid steps 1 and 3 will be the bottlenecks and the pipeline will not balance well, but it is worth experimenting with.
Just a suggestion, but have you looked into the consumer/producer pattern? A certain number of threads would read your files on disk and feed the content to a queue. Then another set of threads, known as the consumers, would "consume" the queue as it's filled. http://zone.ni.com/devzone/cda/tut/p/id/3023
Your best bet in this kind of scenario is definitely the producer-consumer model: one thread to pull the data and a bunch of workers to process it. There's no easy way around the I/O, so you might as well just focus on optimizing the computation itself.
I will now try to sketch a model:
// shared state ('event' is a C# keyword, so the signal is named itemEnqueued here)
var queue = new Queue<DataRow>();
var itemEnqueued = new AutoResetEvent(false);

// producer thread
var refs = GetReferencesFromDB(); // ~5000 DataRows returned
foreach (var reference in refs)
{
    lock (queue)
    {
        queue.Enqueue(reference);
        itemEnqueued.Set();
    }
    // if the queue is limited, test if the queue is full and wait.
}

// consumer threads
while (true)
{
    DataRow value = null;
    lock (queue)
    {
        if (queue.Count > 0)
        {
            value = queue.Dequeue();
        }
    }
    if (value != null)
    {
        // process value
    }
    else
    {
        itemEnqueued.WaitOne(); // event to signal that an item was placed in the queue.
    }
}
You can find more details about producer/consumer in part 4 of Threading in C#: http://www.albahari.com/threading/part4.aspx
I think your approach of splitting up the list of files and processing each file in one batch is ok.
My feeling is that you might get more performance gain if you play with the degree of parallelism.
See: var refs = GetReferencesFromDB().AsParallel().WithDegreeOfParallelism(16); this would start processing 16 files at the same time. Currently you are probably processing 2 or 4 files, depending on the number of cores you have. That is only efficient when you have pure computation without IO. For IO-intensive tasks, adjusting the degree of parallelism can bring remarkable performance improvements by reducing processor idle time.
If you are going to split up and join tasks back using producer-consumer look at this sample: Using Parallel Linq Extensions to union two sequences, how can one yield the fastest results first?
Say the method below is being called several thousand times by different threads in a .net 4 application. What’s the best way to handle this situation? Understand that the disk is the bottleneck here but I’d like the WriteFile() method to return quickly.
Data can be up to a few MB. Are we talking thread pool, TPL or the like?
public void WriteFile(string FileName, MemoryStream Data)
{
try
{
using (FileStream DiskFile = File.OpenWrite(FileName))
{
Data.WriteTo(DiskFile);
DiskFile.Flush();
DiskFile.Close();
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
If you want to return quickly and don't really care that the operation is synchronous, you could create some kind of in-memory queue where you put the write requests; as long as the queue is not filled up, you can return from the method quickly. Another thread is then responsible for draining the queue and writing the files. If WriteFile is called while the queue is full, you will have to wait until you can enqueue, and execution becomes synchronous again; but that way you can have a big buffer, so if the file-write requests are not steady but come in spikes (with pauses between bursts of WriteFile calls), such a change can be seen as an improvement in your performance.
UPDATE:
I made a little picture for you. Notice that the bottleneck always exists; all you can possibly do is optimize the requests by using a queue. Notice also that the queue has limits, so when it's filled up you cannot instantly queue more files into it; you have to wait until there is free space in that buffer too. But for the situation presented in the picture (3 bucket requests), it's obvious you can quickly put the buckets into the queue and return, while in the first case you have to do that one by one and block execution.
Notice that you never need to run many IO threads at once, since they will all be hitting the same bottleneck and you will just be wasting memory if you try to parallelize this heavily. I believe 2-10 threads at most will easily take all the available IO bandwidth, and will limit application memory usage too.
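A minimal sketch of that bounded in-memory queue (my own illustration, using BlockingCollection for the buffer and a single drain task; the 64-item capacity is an arbitrary assumption):
// WriteFile returns as soon as the request fits in the bounded buffer; a single background
// task drains the buffer and does the actual disk writes.
private static readonly BlockingCollection<Tuple<string, MemoryStream>> PendingWrites =
    new BlockingCollection<Tuple<string, MemoryStream>>(boundedCapacity: 64);

private static readonly Task DrainTask = Task.Factory.StartNew(() =>
{
    foreach (var request in PendingWrites.GetConsumingEnumerable())
    {
        using (FileStream diskFile = File.OpenWrite(request.Item1))
        {
            request.Item2.WriteTo(diskFile);
        }
    }
}, TaskCreationOptions.LongRunning);

public void WriteFile(string fileName, MemoryStream data)
{
    // Blocks only when 64 requests are already queued (execution then becomes synchronous again).
    // Call PendingWrites.CompleteAdding() at shutdown so the drain task can finish.
    PendingWrites.Add(Tuple.Create(fileName, data));
}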
Since you say that the files don't need to be written in order nor immediately, the simplest approach would be to use a Task:
private void WriteFileAsynchronously(string FileName, MemoryStream Data)
{
Task.Factory.StartNew(() => WriteFileSynchronously(FileName, Data));
}
private void WriteFileSynchronously(string FileName, MemoryStream Data)
{
try
{
using (FileStream DiskFile = File.OpenWrite(FileName))
{
Data.WriteTo(DiskFile);
DiskFile.Flush();
DiskFile.Close();
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
The TPL uses the thread pool internally, and should be fairly efficient even for large numbers of tasks.
If data is coming in faster than you can log it, you have a real problem. A producer/consumer design that has WriteFile just throwing stuff into a ConcurrentQueue or similar structure, with a separate thread servicing that queue, works great... until the queue fills up. And if you're talking about opening 50,000 different files, things are going to back up quickly. Not to mention that your data, which can be several megabytes per file, is going to further limit the size of your queue.
I've had a similar problem that I solved by having the WriteFile method append to a single file. The records it wrote had a record number, file name, length, and then the data. As Hans pointed out in a comment to your original question, writing to a file is quick; opening a file is slow.
A second thread in my program starts reading that file that WriteFile is writing to. That thread reads each record header (number, filename, length), opens a new file, and then copies data from the log file to the final file.
This works better if the log file and the final file are on different disks, but it can still work well with a single spindle. It sure exercises your hard drive, though.
It has the drawback of requiring 2X the disk space, but with 2-terabyte drives under $150, I don't consider that much of a problem. It's also less efficient overall than directly writing the data (because you have to handle the data twice), but it has the benefit of not causing the main processing thread to stall.
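A rough sketch of that record format (my own simplification of the approach described, not the original code): WriteFile appends length-prefixed records to one always-open log under a lock, and a separate replay step (run afterwards, or from the second thread the answer describes) reads the records back and recreates the individual files. The log path is a hypothetical placeholder.
// Append [record number][file name][length][data] to a single log file.
private static readonly object LogLock = new object();
private static readonly BinaryWriter LogWriter =
    new BinaryWriter(new FileStream(@"C:\logs\pending-writes.bin", FileMode.Append, FileAccess.Write)); // hypothetical path
private static int _recordNumber;

public void WriteFile(string fileName, MemoryStream data)
{
    byte[] payload = data.ToArray();
    lock (LogLock) // appends must not interleave
    {
        LogWriter.Write(++_recordNumber);
        LogWriter.Write(fileName);
        LogWriter.Write(payload.Length);
        LogWriter.Write(payload);
        LogWriter.Flush();
    }
}

// Replay the log into the individual files.
public static void ReplayLog(string logPath)
{
    using (var reader = new BinaryReader(File.OpenRead(logPath)))
    {
        while (reader.BaseStream.Position < reader.BaseStream.Length)
        {
            int recordNumber = reader.ReadInt32();
            string fileName = reader.ReadString();
            int length = reader.ReadInt32();
            byte[] payload = reader.ReadBytes(length);
            File.WriteAllBytes(fileName, payload); // recreate the final file
        }
    }
}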
Encapsulate your complete method implementation in a new Thread(). Then you can "fire-and-forget" these threads and return to the main calling thread.
foreach (var file in filesArray)
{
try
{
System.Threading.Thread updateThread = new System.Threading.Thread(delegate()
{
WriteFileSynchronous(fileName, data);
});
updateThread.Start();
}
catch (Exception ex)
{
string errMsg = ex.Message;
Exception innerEx = ex.InnerException;
while (innerEx != null)
{
errMsg += "\n" + innerEx.Message;
innerEx = innerEx.InnerException;
}
errorMessages.Add(errMsg);
}
}