Parallel Concurrent Binary Readers - C#

I have a Parallel.ForEach loop creating BinaryReaders on the same group of large data files.
I was wondering whether it hurts performance that these readers are reading the same files in parallel (i.e., would it go faster if they were each reading exclusively different files?).
I am asking because there is a lot of disk I/O involved (I guess...).
Edit: I forgot to mention: I am using an Amazon EC2 instance and the data is on the C:\ disk assigned to it. I have no idea how that affects this issue.
Edit 2: I'll take measurements by duplicating the data folder, reading from the two different sources, and seeing what that gives.

It's not a good idea to read from the same disk using multiple threads. The disk's mechanical head has to seek to the next read location for every request, so multiple threads essentially bounce it back and forth across the platter, hurting performance.
The better approach is to read the files sequentially using a single thread and hand the chunks to a group of threads that process them in parallel, as in the sketch below.
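A minimal sketch of that pattern; the ProcessFile name, the 64 KB chunk size, and the worker count are illustrative, not taken from the question:

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class ChunkPipeline
{
    // One sequential reader keeps disk access linear; a pool of consumers
    // processes the chunks in parallel.
    public static void ProcessFile(string path, Action<byte[]> process, int workers = 4)
    {
        using (var chunks = new BlockingCollection<byte[]>(boundedCapacity: 16))
        {
            // Consumers: process chunks as they arrive.
            var consumers = Enumerable.Range(0, workers)
                .Select(_ => Task.Run(() =>
                {
                    foreach (var chunk in chunks.GetConsumingEnumerable())
                        process(chunk);
                }))
                .ToArray();

            // Producer: the only thread touching the disk.
            using (var reader = new BinaryReader(File.OpenRead(path)))
            {
                byte[] buffer;
                while ((buffer = reader.ReadBytes(64 * 1024)).Length > 0)
                    chunks.Add(buffer); // blocks when the consumers fall behind
            }
            chunks.CompleteAdding();

            Task.WaitAll(consumers);
        }
    }
}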

It depends on where your files are. If you're using a single mechanical hard disk, then no - don't read files in parallel; it's going to hurt performance. You may have other configurations, though:
On a single SSD, reading files in parallel will probably not hurt performance, but I don't expect you'll gain anything either.
On two mirrored disks using RAID 1 and a half-decent RAID controller, you can read two files at once and gain considerable performance.
If your files are stored on a SAN, you can most definitely read a few at a time and improve performance.
You'll have to try it, but be careful with this: if the files aren't large enough, the OS caching mechanisms are going to affect your measurements, and the second test run is going to be suspiciously fast. Something like the sketch below is a starting point.
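A rough way to take those measurements; this is a sketch, and File.ReadAllBytes stands in for whatever your readers actually do. Run each timing against data that is not already in the OS cache (fresh copies of the folder, or a reboot between runs):

using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

static class ReadBenchmark
{
    public static TimeSpan TimeSequential(string[] files)
    {
        var sw = Stopwatch.StartNew();
        foreach (var f in files)
            File.ReadAllBytes(f);                          // one file at a time
        return sw.Elapsed;
    }

    public static TimeSpan TimeParallel(string[] files)
    {
        var sw = Stopwatch.StartNew();
        Parallel.ForEach(files, f => File.ReadAllBytes(f)); // all at once
        return sw.Elapsed;
    }
}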

Related

Benefits of saving multiple files async

I'm writing an action on my controller which saves files to disk, on .NET Core 2.0.
I saw some code which saved files like this:
foreach (var formFile in files)
{
    if (formFile.Length > 0)
    {
        using (var stream = new FileStream(filePath, FileMode.Create))
        {
            await formFile.CopyToAsync(stream);
        }
    }
}
This saves the files asynchronously but sequentially, so I decided to write it a bit differently:
var fileTasks = files.Where(f => f.Length > 0).Select(f => this.SaveFile(f, BASE_PATH));
await Task.WhenAll(fileTasks);

protected async Task SaveFile(IFormFile file, string basePath)
{
    // Path.GetRandomFileName returns just a file name. (The original used
    // Path.GetTempFileName, which returns a rooted path into %TEMP% and
    // creates the file there, so Path.Combine would discard basePath.)
    var fileName = Path.GetRandomFileName();
    var filePath = Path.Combine(basePath, fileName);
    using (var stream = new FileStream(filePath, FileMode.Create))
    {
        await file.CopyToAsync(stream);
    }
}
Assuming I'm saving them all to the same drive, would there be any benefit to doing this?
I'm aware I wouldn't be blocking any threads, but would there still be a bottleneck at the disk? Or can modern computers save more than one file at once?
Would there still be a bottleneck at the disk? Or can modern computers save more than one file at once?
Yes, and yes. The disk, being orders of magnitude slower than the rest of the computer, will always be a bottleneck. And while it is not possible to literally write to more places on a disk at once than there are write heads (rotating-media disks almost all have multiple write heads, because almost all such disks have multiple platters and platter sides), modern computers (and even not-so-modern ones) can certainly track the I/O for multiple files at once.
The short answer to the broader question: the only way to know for sure, with respect to any performance question, is to test it. No one here can predict what the outcome will be. This is true even for relatively simple CPU-bound problems, and it's even more significant when you're dealing with something as complex as writing data to a storage device.
And even if you find you can make the file I/O faster now, that effort may or may not remain relevant in the future. It's even possible you could wind up with your code being slower than a simpler implementation.
The longer version…
Issues that affect the actual performance include:
Type of drive. Conventional hard disks with rotating media are generally much slower than SSD, but each type of drive has its own particular performance characteristics.
Configuration of drive. Different manufacturers ship drives with different disk RPMs (for rotating drives), different controllers, different cache sizes and types, and varying support for disk protocols. A logical drive might actually be multiple physical drives (e.g. RAID), and even within a drive the storage can be configured differently: rotating-media drives can have varying numbers of platters for a given amount of storage, and SSDs can use a variety of storage technologies and arrangements (e.g. single-level vs. multi-level cells, with different block sizes and layouts). This is far from an exhaustive list of the types of variations one might see in disk drives.
File system. Even Windows supports a wide range of file systems, and other OS's have an even broader variety of options. Each file system has specific things it's good at and poor at, and performance will depend on the exact nature of how the files are being accessed.
Driver software. Drives mostly use standardized APIs and typically a basic driver in the OS is used for all types of drives. But there are exceptions to the rule.
Operating system version and configuration. Different versions of Windows, or any other OS, have subtly different implementations for dealing with disk I/O. Even within a given version of an OS, a given drive may be configured differently, with options for caching.
Some generalizations can be made, but for every true generalization, there will be an exception. Murphy's Law leads us to conclude that if you ignore real-world testing of your implementation, you'll wind up being the exception.
All that said, it is possible that writing to multiple files concurrently can improve throughput, at least for disks with rotating media. Why?
While the comment above from @Plutonix is correct, it does gloss over the fact that the disk controller will optimize the writes as best it can. Having multiple writes queued at once (whether due to multiple files or a single file spread around the disk) allows the disk controller to take advantage of the current position of the disk.
Consider, for example, writing a file one block at a time: you write a block, and only once you learn it's been written do you write the next. By then the disk has rotated past, so now you have to wait for the proper location to come back around under the write head before the next write can complete.
So, what if you hand over two blocks to the OS at a time? Now, the disk controller can be told about both blocks, and if one block can be written immediately after another, it's there ready to be written. No waiting for another rotation of the disk.
The more blocks you can hand over at once, and the more the disk controller can see to write at once, the better the odds of it being able to write blocks continuously as the platter spins under the write head, without having to pause and wait for the right spot to come back around.
So, why not always write files this way? Well, the biggest reason is that we usually don't need to write data that fast. The user is not inconvenienced by file I/O taking 500 ms instead of 50.
Plus, it significantly increases the complexity of the code.
In addition, the programming frameworks, operating system, file system, and disk controller all have features that provide much or all of the same benefit without the program itself having to work harder. Buffering at every level of disk I/O means that when your program writes to a file, the write appears to complete very quickly; in reality, the data has just been squirreled away by one or more layers of the disk I/O pipeline, and those layers can then feed the disk enough data at once that the platter-position optimizations happen transparently to your program.
Often — almost all the time, I'd guess — if your program is simply streaming data sequentially quickly enough, even without any concurrency the disk can still be kept at a high level of efficiency, because the buffers are large enough to ensure that for any writeable block that goes under the write head, there's a block of data ready to write to it.
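As a sketch of what relying on those buffers looks like in C# (the method shape and the 1 MB buffer size are illustrative values I've chosen, not recommendations from the answer):

using System.IO;
using System.Threading.Tasks;

static class BufferedWriter
{
    // A plain sequential stream with a generous buffer: each WriteAsync
    // lands in buffers, and the lower layers keep the disk fed.
    public static async Task WriteSequentiallyAsync(string path, byte[][] blocks)
    {
        using (var output = new FileStream(
            path, FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 1 << 20, useAsync: true))
        {
            foreach (var block in blocks)
                await output.WriteAsync(block, 0, block.Length);
        }
    }
}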
Naturally, SSDs change the analysis significantly. Latency on the physical media is no longer an issue, but there are lots more different ways to build an SSD, and each will come with different performance characteristics. On top of that, the technology for SSDs is still changing quickly. The people who design and build SSDs, their controllers, and even the operating systems that use them, work hard to ensure that even naïve programs work efficiently.
So, in general, just write your code naïvely. It's a lot less work to do so, and in most cases it'll work just as well. If you do decide to measure performance, and find that you can make disk I/O work more efficiently by writing to multiple files asynchronously, plan on rechecking your results periodically over time. Changes to disk technology can easily render your optimizations null and void, or even counter-productive.
Related reading:
How to handle large numbers of concurrent disk write requests as efficiently as possible
outputing dictionary optimally
Performance creating multiple small files
What is the maximum number of simultaneous I/O operations in .net 4.5?

Optimizing File Operations

I have an application in C# which involves a lot of file operations: reading, moving, deleting, appending, etc. For example, a file is read from a source path on the local FS; after processing, it is deleted from there and the processed file is written to a target location on the local FS. This is all done in parallel on a group of systems, with each system working only on its local files. (Files are distributed among them by the load balancer.)
How can I possibly improve the performance of this application?
Things that I can think of are:
1.) Create a queue for a particular type of operation, such as delete. Put the required info in the queue and have a separate thread process the queue.
2.) Instead of working on the FS, use an in-memory data store such as Redis. As the data will be in cache, operations will be faster.
3.) Increase the parallelism of the code. Each thread will work on a separate file, which should be faster.
Will the above approaches work? Please suggest any other alternatives that might be worth giving a thought.
1.) I would suggest batching together operations with a common context, to reduce synchronization/context-switching overhead and take advantage of your processor's caching mechanism.
2.) Grouping files together into a single file will reduce Windows's per-file handshake performance penalty.
3.) Try using pointers and/or the Win32 API, which in many cases turn out to be faster than their managed wrappers/library implementations.
4.) Blocking-collection queues (producer/consumer) can be a good starting point; see the sketch below.
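A minimal sketch of ideas 1 and 4 applied to the delete operation; the class name and the bare-bones error handling are mine, not from the answer:

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

sealed class DeleteQueue : IDisposable
{
    private readonly BlockingCollection<string> _paths =
        new BlockingCollection<string>();
    private readonly Task _consumer;

    public DeleteQueue()
    {
        // A single consumer drains the queue, so worker threads never
        // block on the file system just to delete a file.
        _consumer = Task.Run(() =>
        {
            foreach (var path in _paths.GetConsumingEnumerable())
            {
                try { File.Delete(path); }
                catch (IOException) { /* log and move on */ }
            }
        });
    }

    public void Enqueue(string path) => _paths.Add(path);

    public void Dispose()
    {
        _paths.CompleteAdding();   // let the consumer finish the backlog
        _consumer.Wait();
        _paths.Dispose();
    }
}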

Multithreaded application does not reach 100% of processor usage

My multithreaded application takes some files from the HD and then processes the data in those files. I reuse the same instance of a class (dataProcessing) to create the threads (I just change the parameters of the method being called).
processingThread[i] = new Thread(new ThreadStart(dataProcessing.parseAll));
I am wondering if the cause could be all threads reading from the same memory.
It takes about half a minute to process each file. The files are read quickly, since they are just 200 KB. After I process the files, I write all the results to a single destination file. I don't think the problem is reading from or writing to the disk. All the threads are working on the task, but for some reason the processor is not fully used. I tried adding more threads to see if I could reach 100% processor usage, but past a certain point adding threads slows things down and decreases processor usage instead. Does anyone have an idea what could be wrong?
Here are some points you might want to consider:
Most CPUs today are hyper-threaded. Even though the OS treats each hyper-threaded core as two pipelines, this is not really the case, and it depends heavily on the CPU and on the arithmetic operations you are performing. While most CPUs have two integer units per pipeline, there is only one floating-point unit, so most FP operations gain no benefit from the hyper-threaded architecture.
Since each file is only 200 KB, I can only assume it is all copied to the cache, so this is not a memory/disk issue.
Are you using external DLLs? Some operations, like reading/saving JPEG files using the native Bitmap class, are not parallel, and you won't see any speed-up from running multiple executions at once.
Performance decreases once you reach the point where switching between the threads costs more than the work they are doing.
Are you only reading the data, or are you also modifying it? If each thread also modifies the data, there will be heavy lock contention on the cache. It would be better for each thread to gather its results in its own memory and combine all the data only after all the threads have done their job; the sketch below shows one way to do that.
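A sketch of that last point using Parallel.ForEach with thread-local state; the ParseAll name and the parse delegate are stand-ins for the question's dataProcessing.parseAll work:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class LocalThenCombine
{
    public static ConcurrentBag<TResult> ParseAll<TResult>(
        IEnumerable<string> files, Func<string, TResult> parse)
    {
        var allResults = new ConcurrentBag<TResult>();
        Parallel.ForEach(
            files,
            () => new List<TResult>(),        // per-thread local list
            (file, loopState, local) =>
            {
                local.Add(parse(file));       // no shared state, no locks
                return local;
            },
            local =>                          // runs once per thread, at the end
            {
                foreach (var r in local)
                    allResults.Add(r);
            });
        return allResults;
    }
}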

How to improve read and write speed or performance for a large number of small files

Yesterday I asked a question here: how to disable the disk cache in C# by invoking the Win32 CreateFile API with FILE_FLAG_NO_BUFFERING.
My performance test (a write-and-read test, 1000 files totalling 220 MB) shows that FILE_FLAG_NO_BUFFERING doesn't improve performance for me; it is slower than the .NET default disk cache. When I change FILE_FLAG_NO_BUFFERING to FILE_FLAG_SEQUENTIAL_SCAN, it matches the .NET default disk cache and is slightly faster.
Before this, I tried using MongoDB's GridFS feature to replace the Windows file system, with no luck (I don't need the distributed feature; it was just an experiment).
In my product, the server receives a lot of small files (60-100 KB) per second over TCP/IP and then needs to save them to disk, and a third service reads these files once (just reads them once and processes them). Would asynchronous I/O help me get the best speed with the lowest CPU usage? Can someone give me a suggestion? Or can I still use the FileStream class?
Update 1
Could a memory-mapped file meet my needs, i.e. writing all the files into one big file (or several) and reading from it?
If your PC is taking 5-10 seconds to write a 100kB file to disk, then you either have the world's oldest, slowest PC, or your code is doing something very inefficient.
Turning off disk caching will probably make things worse rather than better. With a disk cache in place, your writes will be fast, and Windows will do the slow part of flushing the data to disk later. Indeed, increasing I/O buffering usually results in significantly improved I/O in general.
You definitely want to use asynchronous writes - that means your server starts the data writing, and then goes back to responding to its clients while the OS deals with writing the data to disk in the background.
There shouldn't be any need to queue the writes (the OS will already be doing that if disk caching is enabled), but it is something you could try if all else fails - it could potentially help by writing only one file at a time, minimising the need for disk seeks.
Generally for I/O, using larger buffers helps to increase your throughput. For example, instead of writing each individual byte to the file in a loop, write a buffer-ful of data (ideally the entire file, for the sizes you mentioned) in one write operation, as sketched below. This minimises the overhead: instead of calling a write function for every byte, you call it once for the entire file. I suspect you may be doing something like the byte-by-byte version, as it's the only way I know of to reduce performance to the levels you've described.
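To illustrate the difference; this is a sketch, and the byte-by-byte loop is the pathological case, not something your code is known to do:

using System.IO;

static class WriteWholeBuffer
{
    // Pathological: one method call per byte; per-call overhead dominates.
    public static void WriteByteByByte(string path, byte[] data)
    {
        using (var fs = new FileStream(path, FileMode.Create))
            for (int i = 0; i < data.Length; i++)
                fs.WriteByte(data[i]);
    }

    // Preferred: hand the entire 60-100 KB file to the OS in one call.
    public static void WriteInOneCall(string path, byte[] data)
    {
        File.WriteAllBytes(path, data);
    }
}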
Memory-mapped files will not help you. They're really best for accessing the contents of huge files.
One of the biggest and most significant improvements in your case, IMO, would be to process the files without saving them to disk first, and afterwards, if you really need to store them, push them onto a queue and have another thread save them to disk. That way you immediately get the processed data you need, without losing time saving the data to disk, yet you still end up with a file on disk afterwards, without stealing computational power from your file processor.

Performance creating multiple small files

I need a test app that will create a big number of small files on disk as fast as possible.
Will async ops help with creating the files, or just with writing them? Is there a way to speed up the whole process? (Writing to a single file is not possible.)
Wouldn't physical drive I/O be the bottleneck here? You'll probably get different results if you write to a 4200 RPM drive versus a 10,000 RPM drive versus an ultrafast SSD.
It's hard for me to say without writing a test app myself, but disk access will be serialized anyway, so it's not as though you'll have multiple threads writing to the disk at the same time. You could improve performance by using threads if a fair amount of processing were done before each file is written out.
If it's possible to test your app using a ramdisk it would probably speed up things considerably.
If possible, don't write them all to the same directory. Many filesystems slow down when dealing with directories containing large numbers of files. (I once brought our file server at work, which normally happily serves the whole office, to its knees by writing thousands of files to the same directory.)
Instead, make a new directory for each 1000 files or so, as in the sketch below.
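A sketch of that layout; the root parameter, the "D4"/"D7" name widths, and the .dat extension are placeholders:

using System.IO;

static class ShardedOutput
{
    // Puts roughly 1000 files in each subdirectory: 0000, 0001, 0002, ...
    public static string PathFor(string root, int fileIndex)
    {
        string dir = Path.Combine(root, (fileIndex / 1000).ToString("D4"));
        Directory.CreateDirectory(dir);   // no-op if it already exists
        return Path.Combine(dir, string.Format("file_{0:D7}.dat", fileIndex));
    }
}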
