Save multiple images to disk rapidly in C#

I have a program in C# which saves a large number of images to disk after processing them. This seems to be taking quite a bit of time because so many images need to be saved.
Now, I was wondering: is there any way to speed up saving images in C#? At the moment, I'm using the standard bmp.Save(filename) approach.
If it helps, part of the image generation process involves using LockBits to access and modify the pixel values more quickly, so perhaps the images could be saved to disk at the same time as I do this? Apologies if this idea is daft, but I'm still somewhat new to C#.

You could certainly start a new thread for each image save. That would reduce the time taken somewhat, although the disk would then become the bottleneck.
One other option would be to save the images to a temporary buffer list and then return control to the program, and have a background thread write each one to disk. Of course, that would only give the appearance of the saves happening quickly, but it could well serve your needs.
I am sure that .NET has some sort of asynchronous I/O to do this for you. I know Windows does, so it makes sense that it would be exposed in .NET.
This may be helpful.
http://msdn.microsoft.com/en-us/library/kztecsys(v=vs.71).aspx
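As a rough illustration of the buffer-and-background-writer idea (not the only way to do it), here is a minimal sketch assuming the processed images are System.Drawing Bitmaps; saveQueue, processedBitmap and fileName are hypothetical names:
using System;
using System.Collections.Concurrent;
using System.Drawing;
using System.Threading.Tasks;
// Hypothetical queue of (image, destination path) pairs filled by the processing code.
var saveQueue = new BlockingCollection<Tuple<Bitmap, string>>();
// One background consumer writes images to disk while processing continues.
Task writer = Task.Run(() =>
{
    foreach (var item in saveQueue.GetConsumingEnumerable())
    {
        item.Item1.Save(item.Item2); // still disk-bound, but off the processing thread
        item.Item1.Dispose();
    }
});
// Producer side, after each image is processed:
// saveQueue.Add(Tuple.Create(processedBitmap, fileName));
// When everything has been queued:
// saveQueue.CompleteAdding();
// writer.Wait();
The disk remains the bottleneck, but the processing loop no longer waits on each Save call.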

Related

Having multiple simultaneous writers (no readers) to a single file: is it possible to accomplish this in a performant way in .NET?

I'm developing a multi-segment file downloader. To accomplish this I'm currently creating as many temporary files on disk as I have segments (their number is fixed for the duration of the download). At the end I just create a new file f and copy all the segments' contents into f.
I was wondering if there isn't a better way to accomplish this. My ideal would be to create f at its full size initially and then have the different threads write directly to their own portions. There need not be any interaction between them; we can assume each one will start at its own starting point in the file and then fill in data sequentially until its task is over.
I've heard about Memory-Mapped files (http://msdn.microsoft.com/en-us/library/dd997372(v=vs.110).aspx) and I'm wondering if they are the solution to my problem or not.
Thanks
Using the memory-mapped API is absolutely doable and it will probably perform quite well - of course, some testing is recommended.
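For reference, a minimal sketch of the memory-mapped approach, assuming the total file size is known up front; targetPath, totalSize, segmentOffset and segmentBytes are hypothetical names:
using System.IO;
using System.IO.MemoryMappedFiles;
// Create the target file at its final size once, before the downloads start.
using (var mmf = MemoryMappedFile.CreateFromFile(targetPath, FileMode.Create, null, totalSize))
{
    // Each download thread can open its own view over its portion of the file and write there.
    using (var view = mmf.CreateViewStream(segmentOffset, segmentBytes.Length))
    {
        view.Write(segmentBytes, 0, segmentBytes.Length);
    }
}
In a real implementation the MemoryMappedFile would be created once and shared by all the download threads, with only the views being per-thread.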
If you want to look for a possible alternative implementation, I have the following suggestion.
Create a static stack data structure, where the download threads can push each file segment as soon as it's downloaded.
Have a separate thread listen for push notifications on the stack. Pop the file segments off the stack and save each segment into the target file in a single-threaded way.
By following the above pattern, you separate the downloading of file segments from the saving into a regular file by putting a stack container in between.
Depending on the implementation of the stack handling, you will be able to implement this with very little thread locking, which will maximise performance.
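A rough sketch of that decoupling, assuming a hypothetical Segment type with Offset and Data properties and a targetPath variable:
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;
// BlockingCollection backed by a ConcurrentStack gives thread-safe push/pop with blocking waits.
var segments = new BlockingCollection<Segment>(new ConcurrentStack<Segment>());
// Download threads push completed segments:
// segments.Add(new Segment { Offset = ..., Data = ... });
// A single writer thread drains the stack and writes into the target file.
Task writer = Task.Run(() =>
{
    using (var target = new FileStream(targetPath, FileMode.OpenOrCreate, FileAccess.Write))
    {
        foreach (var segment in segments.GetConsumingEnumerable())
        {
            target.Position = segment.Offset;
            target.Write(segment.Data, 0, segment.Data.Length);
        }
    }
});
// Call segments.CompleteAdding() when all downloads have finished, then writer.Wait().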
The advantage of this is that you have 100% control over what is going on, and a solution that might be more portable (if that should ever be a concern).
The stack decoupling pattern can also be implemented fairly generically and might even be reused in the future.
The implementation is not that complex, and probably on par with what would be needed around the memory-mapped API.
Have fun...
/Anders
The answers posted so far do, of course, address your question, but you should also consider the fact that multi-threaded I/O writes will most likely NOT give you any performance gain.
The reason for multi-threading downloads is obvious and has dramatic results. When you combine the files, though, remember that you are having multiple threads manipulate a mechanical head on a conventional hard drive. With SSDs you may see better performance.
A single thread writing SEQUENTIALLY can already saturate the HDD's write capacity, and sequential writing is by definition the fastest way to write to conventional disks.
If you believe otherwise, I would be interested to know why. I would rather concentrate on tweaking the write performance of a single thread by playing around with buffer sizes, etc.
Yes, it is possible, but the only precaution you need to take is to ensure that no two threads write to the same location in the file; otherwise the file content will be corrupted.
FileStream writeStream = new FileStream(destinationPath, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write);
writeStream.Position = startPositionOfSegments; // REMEMBER: this offset calculation is important
// A simple write: just read from your source and then write at the segment's offset
writeStream.Write(ReadBytes, 0, bytesReadFromInputStream);
After each Write we call writeStream.Flush() so that the buffered data gets written to the file, but you can change this according to your requirements.
Since you already have working code that downloads the file segments in parallel, the only change you need to make is to open the file stream as shown above and, instead of creating many segment files locally, open a stream on a single file.
The startPositionOfSegments value is very important; calculate it carefully so that no two segments write their downloaded bytes to the same location in the file, otherwise the result will be incorrect.
The above procedure works perfectly fine at our end, but it can be a problem if your segment sizes are too small (we ran into this too, and increasing the segment size fixed it). If you run into any exceptions, you can also synchronize just the Write part.
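For illustration only, a hedged sketch of how each download thread might derive its own startPositionOfSegments, assuming equal-sized segments; segmentIndex, segmentSize and segmentBytes are hypothetical names:
// Each thread writes at its own offset; only the last segment may be shorter than segmentSize.
long startPositionOfSegments = (long)segmentIndex * segmentSize;
using (var writeStream = new FileStream(destinationPath, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write))
{
    writeStream.Position = startPositionOfSegments;
    writeStream.Write(segmentBytes, 0, segmentBytes.Length);
    writeStream.Flush(); // push the buffered data out before the stream is disposed
}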

Best way to keep image in memory

I want to keep 1000-2000 images in memory. I tried imageToByteArray and stored them in key-value pairs, but that obviously gives a memory leak. Is there any other way, or am I lost?
The reason for keeping them in memory is for very fast reading but it looks like a bad idea.
The images are pretty small (450 x 250) and I will use them in WinForms. The problem is that they are grouped into clips, so at runtime I will show 25 pictures per second; that's why I need them in memory.
Thanks in advance,
Is there any situation where you need all 1000 images at once?
If you keep them all, then depending on the image size you will definitely hit memory limits in the long run. You need some cache mechanism to manage them smartly. Maybe you can use a simple DB like SQLite to manage them efficiently, or your own caching scheme based on your application's needs.
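As a sketch of the "own caching scheme" option, here is a minimal bounded cache that evicts the oldest entry once a limit is reached; ImageCache and maxItems are hypothetical names, and the images are kept as raw byte arrays:
using System.Collections.Generic;
using System.IO;
class ImageCache
{
    private readonly Dictionary<string, byte[]> cache = new Dictionary<string, byte[]>();
    private readonly Queue<string> insertionOrder = new Queue<string>();
    private readonly int maxItems;
    public ImageCache(int maxItems) { this.maxItems = maxItems; }
    // Returns the cached bytes, loading on a miss and evicting the oldest entry if over the limit.
    public byte[] GetOrAdd(string path)
    {
        byte[] bytes;
        if (!cache.TryGetValue(path, out bytes))
        {
            bytes = File.ReadAllBytes(path);
            cache[path] = bytes;
            insertionOrder.Enqueue(path);
            if (cache.Count > maxItems)
                cache.Remove(insertionOrder.Dequeue());
        }
        return bytes;
    }
}
For the 25 frames/second clip scenario, you would size maxItems to hold roughly one clip's worth of frames.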
If you are working with WinForms, have you considered using actual animated GIFs? It's not so hard to make them out of a series of images representing separate frames...

What is the correct strategy for keeping lots of text data in memory? System.Runtime.Caching or custom classes?

Before refactoring my code to start experimenting, I'm hoping the wisdom of the community can advise me on the correct path.
Problem: I have a WPF program that diffs hundreds of INI files. For the performance of the diffing I'd like to keep several hundred of the base files that other files are diffed against in memory. I've found that using custom classes to store this data starts to bring my GUI to a halt once I've loaded 10-15 files with approximately 4000 lines of data each.
I'm considering several strategies to improve performance:
Don't store more than a few files in memory at a time, and forget about what I hoped would be a perf improvement in parsing by keeping them in memory
Experiment with handling all the base file data on a BackgroundWorker thread. I'm not doing any work on these files on the GUI thread, but maybe all that stored data is affecting it somehow. I'm guessing here.
Experiment with System.Runtime.Caching class.
The question asked here on SO didn't, in my mind, answer the question of what's the best strategy for this type of work. Thanks in advance for any help you can provide!
Assuming 100-character lines of text, 15 * 4000 * 100 is only about 6 MB (roughly double that as .NET UTF-16 strings, but still trivial) on a modern PC. If your GUI is coming to a halt, then to me that is an indication of virtual memory being swapped in and out to disk. That doesn't make sense for only 6 MB, so I'd figure out how much memory it's really taking up and why. It may well be some trivial mistake that would be easier to fix than rethinking your whole strategy. The other possibility is that it has nothing to do with memory consumption at all and is rather an algorithmic issue.
You should use MemoryCache for this.
It works much like the ASP.NET Cache class, and allows you to set when it should clean up, what should be cleaned up first, and so on.
It also allows you to reload items based on dependencies or after a certain time, and it has callbacks on removal.
Very complete.
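A minimal sketch of how MemoryCache might be wired up here, assuming the base files are cached as plain strings; BaseFileCache and GetBaseFile are hypothetical names:
using System;
using System.IO;
using System.Runtime.Caching;
static class BaseFileCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;
    public static string GetBaseFile(string path)
    {
        var content = Cache.Get(path) as string;
        if (content == null)
        {
            content = File.ReadAllText(path); // or whatever parsed representation you diff against
            var policy = new CacheItemPolicy
            {
                SlidingExpiration = TimeSpan.FromMinutes(10), // evict base files not used recently
                RemovedCallback = args => { /* log evictions here if useful */ }
            };
            Cache.Set(path, content, policy);
        }
        return content;
    }
}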
If your application starts to hang, it is more likely that you are doing intensive processing on your GUI thread, which consumes too much CPU or memory on that thread, so the GUI thread can't repaint your UI in time.
The best way to resolve it is to spawn a separate thread to do the diff operation; as you mentioned in your post, you can use a BackgroundWorker, or you can use the thread pool to run the diffs.
I don't think you need to cache the files in memory. I think it would be more appropriate to save the results to a file and load it on demand; it shouldn't become a bottleneck in your application.
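A small sketch of moving the diff off the UI thread with Task.Run, where RunDiff, ShowResult, baseFilePath and otherFilePath are hypothetical placeholders for your own diff and display code:
using System.Threading.Tasks;
using System.Windows;
// WPF event handler: run the CPU-bound diff on a thread-pool thread, then return to the UI thread.
private async void DiffButton_Click(object sender, RoutedEventArgs e)
{
    var result = await Task.Run(() => RunDiff(baseFilePath, otherFilePath));
    ShowResult(result); // back on the UI thread here, so touching controls is safe
}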

How to improve read and write speed or performance for a large number of small files

Yesterday I asked a question here: how to disable the disk cache in C# by invoking the Win32 CreateFile API with FILE_FLAG_NO_BUFFERING.
My performance test (a write and read test with 1000 files totalling 220 MB) shows that FILE_FLAG_NO_BUFFERING doesn't improve performance; it is actually slower than the default .NET disk cache. Changing FILE_FLAG_NO_BUFFERING to FILE_FLAG_SEQUENTIAL_SCAN gets me back to the default .NET disk cache performance, and even slightly faster.
Before that, I tried using MongoDB's GridFS feature to replace the Windows file system, with no luck (and I don't need the distributed features; I was just experimenting).
In my product, the server receives a lot of small files (60-100 KB) per second over TCP/IP and needs to save them to disk; a third service then reads these files once (just one read, then processing). Would asynchronous I/O help me here, and would it give the best speed with the lowest CPU usage? Can someone give me a suggestion, or should I just keep using the FileStream class?
Update 1
Could a memory-mapped file meet my needs, i.e. writing all the files into one big file (or a few) and reading from it?
If your PC is taking 5-10 seconds to write a 100kB file to disk, then you either have the world's oldest, slowest PC, or your code is doing something very inefficient.
Turning off disk caching will probably make things worse rather than better. With a disk cache in place, your writes will be fast, and Windows will do the slow part of flushing the data to disk later. Indeed, increasing I/O buffering usually results in significantly improved I/O in general.
You definitely want to use asynchronous writes - that means your server starts the data writing, and then goes back to responding to its clients while the OS deals with writing the data to disk in the background.
There shouldn't be any need to queue the writes (as the OS will already be doing that if disk caching is enabled), but that is something you could try if all else fails - it could potentially help by writing only one file at a time to minimise the need for disk seeks.
Generally for I/O, using larger buffers helps to increase your throughput. For example instead of writing each individual byte to the file in a loop, write a buffer-ful of data (ideally the entire file, for the sizes you mentioned) in one Write operation. This will minimise the overhead (instead of calling a write function for every byte, you call a function once for the entire file). I suspect you may be doing something like this, as it's the only way I know to reduce performance to the levels you've suggested you are getting.
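To make the asynchronous, single-buffer write concrete, here is a minimal sketch using FileStream with useAsync: true; SaveFileAsync, path and fileBytes are hypothetical names:
using System.IO;
using System.Threading.Tasks;
static async Task SaveFileAsync(string path, byte[] fileBytes)
{
    // useAsync: true requests overlapped I/O; the whole file goes out in a single WriteAsync call.
    using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None,
                                   bufferSize: 64 * 1024, useAsync: true))
    {
        await fs.WriteAsync(fileBytes, 0, fileBytes.Length);
    }
}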
Memory-mapped files will not help you. They're really best for accessing the contents of huge files.
One of the biggest and most significant improvements in your case, IMO, would be to process the files without saving them to disk first and then, if you really need to store them, push them onto a queue and have another thread save them to disk. By doing this you get the processed data you need immediately, without losing time saving data to disk, but you still end up with a file on disk afterwards, without losing computational power in your file processor.

C# Parallel Task usage in OCR Application?

I'm building a Windows Service application that takes as input a directory containing scanned images. My application iterates through all the images and, for every image, performs some OCR operations in order to grab the barcode, invoice number, and customer number.
Some background info:
The tasks performed by the application are pretty CPU intensive
There are a large number of images to process and the scanned image files are large (~2 MB)
The application runs on an 8-core server with 16 GB of RAM.
My question:
Since it's doing stuff with images on the file system, I'm unsure whether it will really make a difference if I change my application to use .NET Parallel Tasks.
Can anybody give me advice about this?
Many thanks!
If processing an image takes longer than reading N images from the disk, then processing multiple images concurrently is a win. Figure you can read a 2 MB file from disk in under 100 ms (including seek time), so roughly one second to read 8 images into memory.
So if your image processing takes more than a second per image, I/O isn't a problem. Do it concurrently. You can scale that down if you need to (i.e. if processing takes 1/2 second, then you're probably best off with only 4 concurrent images).
You should be able to test this fairly quickly: write a program that randomly reads images off the disk, and calculate the average time to open, read, and close the file. Also write a program that processes a sample of the images and compute the average processing time. Those numbers should tell you whether or not concurrent processing will be helpful.
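A hedged sketch of that kind of bounded parallelism with Parallel.ForEach, where inputDirectory, the "*.tif" filter and ProcessImage are all hypothetical stand-ins for your own code:
using System;
using System.IO;
using System.Threading.Tasks;
var files = Directory.EnumerateFiles(inputDirectory, "*.tif");
// Cap the degree of parallelism at the core count (or lower if I/O turns out to dominate).
Parallel.ForEach(files,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    file =>
    {
        var imageBytes = File.ReadAllBytes(file); // reading is cheap relative to the OCR work
        ProcessImage(imageBytes);                 // hypothetical OCR/barcode extraction step
    });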
I think the answer is, 'It Depends'.
I'd try running the application with some type of Performance Monitoring (even the one in Task Manager) and see how high the CPU gets.
If the CPU is maxing out, it would improve performance to run it in parallel. If not, the disk is the bottleneck, and without some other changes you probably wouldn't get much (if any) gain.
