Speeding up the loading of a List of images - c#

I'm loading a List<Image> from a folder of about 250 images. I did a DateTime comparison and it takes a full 11 seconds to load those 250 images. That's slow as hell, and I'd very much like to speed it up.
The images are on my local harddrive, not even an external one.
The code:
DialogResult dr = imageFolderBrowser.ShowDialog();
if(dr == DialogResult.OK) {
    DateTime start = DateTime.Now;
    //Get all images in the folder and place them in a List<>
    files = Directory.GetFiles(imageFolderBrowser.SelectedPath);
    foreach(string file in files) {
        sourceImages.Add(Image.FromFile(file));
    }
    DateTime end = DateTime.Now;
    timeLabel.Text = end.Subtract(start).TotalMilliseconds.ToString();
}
EDIT: yes, I need all the pictures. The thing I'm planning to do is take the center 30 pixel columns of each image and make a new image out of them. Kinda like a 360-degree picture. Only right now, I'm just testing with random images.
I know there are probably way better frameworks out there to do this, but I need this to work first.
EDIT2: Switched to a stopwatch, the difference is just a few milliseconds. Also tried it with Directory.EnumerateFiles, but no difference at all.
EDIT3: I am running .NET 4, on a 32-bit Win7 client.

Do you actually need to load all the images? Can you get away with loading them lazily? Alternatively, can you load them on a separate thread?

You cannot speed up HDD access or decoding. However, a good idea would be to load the images on a background thread.
Perhaps you should consider showing a placeholder until the image is actually loaded.
Caution: you'll need to marshal the loaded images back to your UI thread anyway!
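A rough sketch of that idea, assuming a WinForms form: sourceImages and imageFolderBrowser come from the question, previewBox is a hypothetical PictureBox showing the placeholder, and the loading happens on a thread-pool thread.
ThreadPool.QueueUserWorkItem(_ =>
{
    foreach (string file in Directory.GetFiles(imageFolderBrowser.SelectedPath))
    {
        Image img = Image.FromFile(file);
        // Marshal back to the UI thread before touching UI-bound state.
        BeginInvoke((Action)(() =>
        {
            sourceImages.Add(img);
            previewBox.Image = img; // e.g. replace the placeholder
        }));
    }
});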

You could use Directory.EnumerateFiles together with PLINQ or Parallel.ForEach to spread the work over as many CPU cores as you have. With PLINQ, for example:
var directory = "C:\\foo";
var files = Directory.EnumerateFiles(directory, "*.jpg");
var images = files.AsParallel().Select(file => Image.FromFile(file)).ToList();
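If you would rather use the Parallel.ForEach form, a minimal sketch (same directory variable as above) needs a thread-safe collection, since several threads add images at once:
var images = new ConcurrentBag<Image>(); // System.Collections.Concurrent
Parallel.ForEach(Directory.EnumerateFiles(directory, "*.jpg"), file =>
{
    images.Add(Image.FromFile(file));
});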

As loading an image does both file I/O and CPU work, you should get some speedup by using more than one thread.
If you are using .NET 4, Tasks would be the way to go.
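A rough sketch of a Task-based version, assuming .NET 4 (hence Task.Factory.StartNew rather than Task.Run) and reusing the sourceImages list and folder browser from the question:
var loadTasks = Directory.EnumerateFiles(imageFolderBrowser.SelectedPath)
    .Select(file => Task.Factory.StartNew(() => Image.FromFile(file)))
    .ToArray();
Task.WaitAll(loadTasks); // this blocks, so run it off the UI thread to keep the form responsive
sourceImages.AddRange(loadTasks.Select(t => t.Result));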

Given that you likely already know the path (from the dialog box?), you might be better off using Directory.EnumerateFiles and then working with the collection it returns instead of a list.
http://msdn.microsoft.com/en-us/library/dd383458.aspx
[edit]
Just noticed you're also loading the files into your app within the loop - how big are they? Depending on their size, that might actually be a pretty reasonable speed!
Do you need to load them at this point? Can you change some display code elsewhere to load on demand?

You probably can't speed things up much, as the bottleneck is reading the files themselves from disk and perhaps parsing them as images.
What you can do, though, is cache the list after it's loaded; any subsequent calls to your code will then be a lot faster.
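A minimal sketch of that caching idea (the field and method names here are made up), so selecting the same folder twice skips the disk entirely:
private static readonly Dictionary<string, List<Image>> _imageCache =
    new Dictionary<string, List<Image>>();

private List<Image> LoadImages(string folder)
{
    List<Image> cached;
    if (_imageCache.TryGetValue(folder, out cached))
        return cached; // second request for this folder returns instantly

    var images = Directory.EnumerateFiles(folder)
        .Select(f => Image.FromFile(f))
        .ToList();
    _imageCache[folder] = images;
    return images;
}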

Related

System.IO.Compression - Counting the number of files using ZipFileArchive is very slow

In order to update a progress bar with the number of files to extract, my program goes over a list of Zip files and collects the number of files in each of them. The combined number is approximately 22000 files.
The code I am using:
foreach (string filepath in zipFiles)
{
    ZipArchive zip = ZipFile.OpenRead(filepath);
    archives.Add(zip);
    filesCounter += zip.Entries.Count;
}
However, it looks like zip.Entries.Count does some kind of traversal, and it takes ages for this count to complete (several minutes, and much, much more if the internet connection is not great).
To get a sense of how much this could improve, I compared the above to the performance of 7-Zip.
I took one of the zip files, which contains ~11000 files and folders:
2 seconds to open the archive in 7-Zip.
1 second to get the file properties.
In the properties I can see 10016 files + 882 folders - meaning it takes 7-Zip ~3 seconds to know there are 10898 entries in the Zip file.
Any idea, suggestion or alternative method that quickly counts the number of files will be appreciated.
Using DotNetZip to count is actually much faster, but due to some internal bureaucratic issues, I can't use it.
I need a solution that does not involve third-party libraries; I can still use the standard Microsoft libraries.
My progress bar issue is solved by taking a new approach to the matter.
I simply accumulate the sizes of all the ZIP files, and that sum serves as the maximum. Then, for each individual file that is extracted, I add its compressed size to the progress. This way the progress bar does not show the number of files; it shows progress in bytes (e.g. if I have 4 GB in total to extract, when the progress bar is a quarter green I know about 1 GB has been extracted). That seems like a better representation of reality.
foreach (string filepath in zipFiles)
{
    ZipArchive zip = ZipFile.OpenRead(filepath);
    archives.Add(zip);
    // Accumulating the Zip file sizes.
    filesCounter += new FileInfo(filepath).Length;
}
// To utilize multiple processors it is possible to activate this loop
// in a thread for each ZipArchive -> currentZip!
// :
// :
foreach (ZipArchiveEntry entry in currentZip.Entries) {
    // Doing my extract code here.
    // :
    // :
    // Accumulate the compressed size of each file.
    compressedFileSize += entry.CompressedLength;
    // Doing other stuff
    // :
    // :
}
So the issue of improving the performance of zip.Entries.Count is still open, and I am still interested in knowing how to solve this specific problem (what does 7-Zip do to be so quick - maybe it uses DotNetZip or other C++ libraries?).
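For what it's worth, the ZIP format itself stores the total entry count in the End of Central Directory (EOCD) record at the tail of the archive, which is almost certainly how 7-Zip can report it so quickly, whereas ZipArchive.Entries parses the whole central directory and builds an object per entry. A minimal sketch that reads just that count with standard libraries only (assuming a plain, non-ZIP64 archive with fewer than 65,535 entries):
static int CountZipEntries(string zipPath)
{
    using (var fs = File.OpenRead(zipPath))
    {
        // The EOCD record is 22 bytes plus an optional comment of up to 65,535 bytes,
        // so reading only the tail of the file is enough to find it.
        int tailSize = (int)Math.Min(fs.Length, 22 + 65535);
        var tail = new byte[tailSize];
        fs.Seek(-tailSize, SeekOrigin.End);
        int read = 0;
        while (read < tailSize)
        {
            int n = fs.Read(tail, read, tailSize - read);
            if (n == 0) break; // unexpected end of stream
            read += n;
        }
        // Scan backwards for the EOCD signature 0x06054b50 ("PK\x05\x06").
        for (int i = tailSize - 22; i >= 0; i--)
        {
            if (tail[i] == 0x50 && tail[i + 1] == 0x4B && tail[i + 2] == 0x05 && tail[i + 3] == 0x06)
            {
                // Offset 10 within the record holds the total number of entries (2 bytes).
                // Note: this count includes directory entries as well as files.
                return BitConverter.ToUInt16(tail, i + 10);
            }
        }
    }
    throw new InvalidDataException("End of Central Directory record not found.");
}
Over a slow connection this reads at most about 64 KB from the end of each archive instead of walking the entire central directory.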

Generating PDF for 90K records

Currently I am using LocalReport.Render to create PDFs for 90K records. Using a normal 'for' loop, it takes around 4 hours just to create the PDFs. I have tried many options.
I tried Parallel.ForEach with and without setting MaxDegreeOfParallelism to different values. There are 2 processors in my system. With MaxDegreeOfParallelism (MDP) = 4, it takes about the same time as the normal 'for' loop. I thought increasing MDP to 40 would speed up the process, but I didn't get the expected results, since it took 900 minutes.
I also used:
var list = new List<Thread>();
foreach (var record in records) {
    var thread = new Thread(() => GeneratePDF());
    thread.Start();
    list.Add(thread);
}
foreach (var listThread in list) {
    listThread.Join();
}
I used the code above like that, but it ended up creating too many threads and took even longer.
I need help using Parallel.ForEach to speed up the process of creating PDFs for 90K records. Suggestions to change the code are also welcome.
Any help would be much appreciated.
Thanks
I don't know any PDF generators, so I can only assume there is a lot of overhead in initializing and finalizing things. Here's what I'd do:
Find an open source pdf generator.
Let it generate a few separate pieces of a pdf - header, footer, etc.
Dig into the code to find where the header/footer is produced and try to work around those steps, reusing generator state without running through the entire process.
Try to stitch together a PDF from stored states, with the generator writing only the parts that differ.
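For the Parallel.ForEach route the question asks about, a minimal sketch might look like the following; it assumes GeneratePDF can be given the record (a hypothetical overload) and that each parallel call renders with its own report instance rather than sharing one. Keeping MaxDegreeOfParallelism near the processor count avoids the oversubscription that made 40 threads slower:
Parallel.ForEach(
    records,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    record =>
    {
        // Each iteration should build and render its own report object.
        GeneratePDF(record);
    });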

Passthru reading of all files in folder

I've got a pretty unusual request:
I would like to load all files from a specific folder (so far, easy). I need something with a very small memory footprint.
Now it gets complicated (at least for me). I DON'T need to store or use the content of the files - I just need to force the block-level caching mechanism to cache all the blocks used by that specific folder.
I know there are many different methods (BinaryReader, StreamReader etc.), but my case is quite special, since I don't care about the content...
Any idea what would be the best way how to achieve this?
Should I use a small buffer? But since it would fill up quickly, wouldn't flushing the buffer actually slow down the operation?
Thanks,
Martin
I would perhaps memory-map the files and then loop over each one, touching a byte at regular (block-sized) intervals.
Assuming of course that you are able to use .Net 4.0.
In pseudocode you'd do something like:
using (var mmf = MemoryMappedFile.CreateFromFile(path))
{
    long fileSize = new FileInfo(path).Length;
    for (long offset = 0; offset < fileSize; offset += blockSize)
    {
        using (var acc = mmf.CreateViewAccessor(offset, 1))
        {
            // Positions are relative to the start of the view, so reading byte 0
            // of each one-byte view touches one byte per block.
            acc.ReadByte(0);
        }
    }
}
But at the end of the day, each method will have different performance characteristics so you might have to use a bit of trial and error to find out which is the most performant.
I would simply read those files. When you do that, CacheManager in NTFS caches these files automatically, and you don't have to care about anything else - that's exactly the role of CacheManager, and by reading these files, you give it a hint that these files should be cached.
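A minimal sketch of that "just read them" approach (folderPath is a placeholder, and the 64 KB buffer size is an assumption); the buffer is reused for every file and the contents are simply discarded, so the memory footprint stays tiny while the OS caches the blocks:
var buffer = new byte[64 * 1024];
foreach (string file in Directory.EnumerateFiles(folderPath))
{
    using (var fs = File.OpenRead(file))
    {
        while (fs.Read(buffer, 0, buffer.Length) > 0)
        {
            // Intentionally do nothing: reading is enough for the OS to cache the blocks.
        }
    }
}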

Background Changer application takes too much memory

I'm working on a background-changing application. Part of the application is a slideshow with 3 image previews (3 picture boxes): previous, current and next image. The problem is that each time the timer ticks, the application takes about another 8 MB of memory. I know it's most likely caused by the image drawing class, but I have no idea how to dispose of the images I'm no longer using.
UPDATE:
Thank you so much. I needed to adjust the code you provided a little, but it works now. When I tried using the Dispose method before, I had used it on a completely different object.
Thank you.
It works in the following order.
Load multiple images
Retrieve the image path
Set the time interval at which the images will be changed
Start the timer
With each timer tick, the timer does the following:
pictureBoxCurr.BackgroundImage = Image.FromFile(_filenames.ElementAt(_currNum));
pictureBoxPrev.BackgroundImage = Image.FromFile(_filenames.ElementAt(_currNum - 1));
pictureBoxNext.BackgroundImage = Image.FromFile(_filenames.ElementAt(_currNum + 1));
Each time new previews are shown, memory usage grows by another 8 MB or so. I have no idea what exactly is taking that space.
Please let me know if you know what is causing the problem or have any clues.
I would recommend calling the following code at every timer tick, prior to changing the images.
pictureBoxCurr.BackgroundImage.Dispose();
pictureBoxPrev.BackgroundImage.Dispose();
pictureBoxNext.BackgroundImage.Dispose();
This will free the unmanaged image resources immediately, rather than waiting for the Garbage Collector.
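Putting it together, a sketch of the whole tick handler (the handler name is made up; the picture boxes and fields come from the question, and the null checks cover the very first tick, before any image has been assigned):
private void slideshowTimer_Tick(object sender, EventArgs e)
{
    if (pictureBoxCurr.BackgroundImage != null) pictureBoxCurr.BackgroundImage.Dispose();
    if (pictureBoxPrev.BackgroundImage != null) pictureBoxPrev.BackgroundImage.Dispose();
    if (pictureBoxNext.BackgroundImage != null) pictureBoxNext.BackgroundImage.Dispose();

    pictureBoxCurr.BackgroundImage = Image.FromFile(_filenames.ElementAt(_currNum));
    pictureBoxPrev.BackgroundImage = Image.FromFile(_filenames.ElementAt(_currNum - 1));
    pictureBoxNext.BackgroundImage = Image.FromFile(_filenames.ElementAt(_currNum + 1));
}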

Show progress when searching all files in a directory

I previously asked the question Get all files and directories in specific path fast in order to find files as fast as possible. I am using that solution to find the file names that match a regular expression.
I was hoping to show a progress bar, because with some really large and slow hard drives it still takes about 1 minute to execute. The solution I posted at the other link does not let me know how many more files remain to be traversed, which I would need in order to show a progress bar.
One solution I was thinking about was to obtain the size of the directory I am planning to traverse. For example, when I right-click the folder C:\Users I can get an estimate of how big that directory is. If I know the size, then I can show progress by adding up the size of every file that I find. In other words, progress = (current sum of file sizes) / directory size.
For some reason I have not been able to efficiently get the size of that directory.
Some of the questions on stack overflow use the following approach:
But note that I get an exception and am not able to enumerate the files. I am curious about trying that method on my C drive.
In that picture I was trying to count the number of files in order to show progress. I am probably not going to be able to get the number of files efficiently using that approach. I was just trying some of the answers on Stack Overflow to questions about how to get the number of files in a directory and how to get the size of a directory.
Solving this is going to leave you with one of a few possibilities...
Not displaying a progress
Using an up-front cost to compute (like Windows)
Performing the operation while computing the cost
If speed is that important and you expect large directory trees, I would lean toward the last of these options. I've added an answer on the linked question Get all files and directories in specific path fast that demonstrates a faster means of counting files and sizes than you are currently using. To combine this into a multi-threaded piece of code for option #3, the following can be done...
static void Main()
{
    const string directory = @"C:\Program Files";
    // Create an enumeration of the files we will want to process that simply accumulates these values...
    long total = 0;
    var fcounter = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
    fcounter.RaiseOnAccessDenied = false;
    fcounter.FileFound +=
        (o, e) =>
        {
            if (!e.IsDirectory)
            {
                Interlocked.Increment(ref total);
            }
        };
    // Start a high-priority thread to perform the accumulation
    Thread t = new Thread(fcounter.Find)
    {
        IsBackground = true,
        Priority = ThreadPriority.AboveNormal,
        Name = "file enum"
    };
    t.Start();
    // Allow the accumulator thread to get a head-start on us
    do { Thread.Sleep(100); }
    while (total < 100 && t.IsAlive);
    // Now we can process the files normally and update a percentage
    long count = 0, percentage = 0;
    var task = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
    task.RaiseOnAccessDenied = false;
    task.FileFound +=
        (o, e) =>
        {
            if (!e.IsDirectory)
            {
                ProcessFile(e.FullPath);
                // Update the percentage complete...
                long progress = ++count * 100 / Interlocked.Read(ref total);
                if (progress > percentage && progress <= 100)
                {
                    percentage = progress;
                    Console.WriteLine("{0}% complete.", percentage);
                }
            }
        };
    task.Find();
}
The FindFile class implementation can be found at FindFile.cs.
Depending on how expensive your file-processing task is (the ProcessFile function above) you should see a very clean progression of the progress on large volumes of files. If your file-processing is extremely fast, you may want to increase the lag between the start of enumeration and start of processing.
The event argument is of type FindFile.FileFoundEventArgs and is a mutable class, so be sure you don't keep a reference to the event argument, as its values will change.
Ideally you will want to add error handling and probably the ability to abort both enumerations. Aborting the enumeration can be done by setting "CancelEnumeration" on the event argument.
What you are asking for may not be possible because of how the file system stores its data.
It is a file system limitation
There is no way to know the total size of a folder, nor the total file count inside a folder, without enumerating the files one by one. Neither piece of information is stored in the file system.
This is why Windows shows a message like "Calculating space" before copying folders with a lot of files... it is actually counting how many files are inside the folder and summing their sizes, so that it can show the progress bar while doing the real copy operation (it also uses that information to know whether the destination has enough space to hold all the data being copied).
Also when you right-click a folder, and go to properties, note that it takes some time to count all files and to sum all the file sizes. That is caused by the same limitation.
To know how large a folder is, or how many files are there inside a folder, you must enumerate the files one-by-one.
Fast files enumeration
Of course, as you already know, there are a lot of ways of doing the enumeration itself... but none will be instantaneous. You could try using the USN Journal of the file system to do the scan. Take a look at this project in CodePlex: MFT Scanner in VB.NET (the code is actually in C#... don't know why the author says it is VB.NET) ... it found all the files in my IDE SATA (not SSD) drive in less than 15 seconds, and found 311000 files.
You will have to filter the files by path, so that only the files inside the path you are looking at are returned. But that is the easy part of the job!
Hope this helps in your project... good luck!
