I am searching for a very fast way of loading text content from a 1 GB text file into a WPF control (a ListView, for example). I want to load the content within 2 seconds.
Reading the content line by line takes too long, so I think reading it as bytes will be faster. So far I have:
byte[] buffer = new byte[4096];
int bytesRead = 0;

using (FileStream fs = new FileStream("myfile.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        // only the first 'bytesRead' bytes are valid, especially on the last read
        Encoding.Unicode.GetString(buffer, 0, bytesRead);
    }
}
Is there any way of transforming the bytes into string lines and adding those to a ListView/ListBox?
Is this the fastest way of loading file content into a WPF GUI control? There are various applications that can load the content of a 1 GB file within 1 second.
EDIT: would it help to read the file with multiple threads? For example:
var t1 = Task.Factory.StartNew(() =>
{
    // read content / load into GUI...
});
EDIT 2: I am planning to use pagination/paging as suggested below, but when I want to scroll down or up, the file content has to be read again to get to the part that is being displayed, so I would like to use:
fs.Seek(bytePosition, SeekOrigin.Begin);
but would that be faster than reading line by line, in multiple threads? Example:
long fileLength = fs.Length;
long halfFile = (fileLength / 2);
FileStream fs2 = fs;
byte[] buffer2 = new byte[4096];
int bytesRead2 = 0;

var t1 = Task.Factory.StartNew(() =>
{
    while ((bytesRead += fs.Read(buffer, 0, buffer.Length)) < (halfFile - 1))
    {
        Encoding.Unicode.GetString(buffer);
        // convert bytes into string lines...
    }
});

var t2 = Task.Factory.StartNew(() =>
{
    fs2.Seek(halfFile, SeekOrigin.Begin);
    while ((bytesRead2 += fs2.Read(buffer2, 0, buffer2.Length)) < fileLength)
    {
        Encoding.Unicode.GetString(buffer2);
        // convert bytes into string lines...
    }
});
Using a thread won't make it any faster (technically there is a slight expense to threads, so loading may take slightly longer), though it may make your app more responsive. I don't know if File.ReadAllText() is any faster.
Where you will have a problem, though, is data binding. Say you load your 1 GB file on a worker thread (regardless of technique): you will then have 1 GB worth of lines to data-bind to your ListView/ListBox. I recommend you don't loop around adding line by line to your control via, say, an ObservableCollection.
Instead, have the worker thread send batches of items to the UI thread, where each batch is appended to the ListView/ListBox in one go.
This cuts down on the Invoke overhead that would otherwise flood the UI message pump.
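For illustration, here is a rough sketch of that batching approach. It assumes an ObservableCollection&lt;string&gt; called Lines is bound to the ListView and that the method lives in a WPF window's code-behind; both names are made up for this example (needs using System; System.Collections.Generic; System.IO; System.Threading.Tasks).

// Sketch only: read on a worker thread, push lines to the UI in batches
// so there is one dispatcher call per batch instead of one per line.
private void LoadFileInBatches(string path)
{
    const int batchSize = 1000; // arbitrary batch size
    Task.Factory.StartNew(() =>
    {
        var batch = new List<string>(batchSize);
        foreach (string line in File.ReadLines(path))
        {
            batch.Add(line);
            if (batch.Count == batchSize)
            {
                var toAdd = batch;                   // hand the full batch to the UI thread
                batch = new List<string>(batchSize); // and start a fresh one
                Dispatcher.BeginInvoke(new Action(() =>
                {
                    foreach (var l in toAdd) Lines.Add(l);
                }));
            }
        }
        var remainder = batch; // flush whatever is left
        Dispatcher.BeginInvoke(new Action(() =>
        {
            foreach (var l in remainder) Lines.Add(l);
        }));
    });
}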
Since you want to read this quickly, I suggest using the System.IO.File class for your WPF desktop application.
MyText = File.ReadAllText("myFile.txt", Encoding.Unicode); // If you want to read as is
string[] lines = File.ReadAllLines("myFile.txt", Encoding.Unicode); // If you want to place each line of text into an array
Together with data binding, your WPF application should be able to read the text file and display it in the UI quickly.
About performance, you can refer to this answer.
So use File.ReadAllText() instead of ReadToEnd() as it makes your code shorter and more readable. It also takes care of properly disposing resources, as you might forget doing with a StreamReader (as you did in your snippet). - Darin Dimitrov
Also, you must consider the specs of the machine that will run your application.
When you say "Reading the content line by line takes too long", what do you mean? How are you actually reading the content?
However, more than anything else, let's take a step back and look at the idea of loading 1 GB of data into a ListView.
Personally, I would use an IEnumerable to read the file, for example:
foreach (string line in File.ReadLines(path))
{
    // process each line here; File.ReadLines streams lazily,
    // so the whole file is never held in memory at once
}
But more importantly, you should implement pagination in your UI and cut down what's visible and what's loaded immediately. This will cut down your resource use massively and keep the UI usable. You can use IEnumerable methods such as Skip() and Take(), which help you avoid loading unused data (see the sketch below).
You wouldn't need any extra threads either (aside from the background thread and the UI thread), but I would suggest using MVVM and INotifyPropertyChanged to avoid worrying about threading altogether.
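A minimal sketch of the paging idea; the page size and class name here are arbitrary, not from the question:

using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sketch: materialise only the lines for the page being displayed.
public static class FilePager
{
    public const int PageSize = 1000; // arbitrary page size

    public static List<string> GetPage(string path, int pageIndex)
    {
        // File.ReadLines is lazy, so only the requested page is kept in memory.
        return File.ReadLines(path)
                   .Skip(pageIndex * PageSize)
                   .Take(PageSize)
                   .ToList();
    }
}

Note that Skip() still has to enumerate (and decode) every skipped line, so for pages deep into a 1 GB file you may eventually want the byte-offset Seek() approach mentioned in the question.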
Related
I am facing an OutOfMemoryException when I add a large number of files to a ZipFile. The sample code is below:
ZipFile file = new ZipFile("E:\\test1.zip");
file.UseZip64WhenSaving = Zip64Option.AsNecessary;
file.ParallelDeflateThreshold = -1;

for (Int64 i = 0; i < 1000000; i++)
{
    file.CompressionLevel = Ionic.Zlib.CompressionLevel.None;
    byte[] data = Encoding.ASCII.GetBytes("rama");
    ZipEntry entry = file.AddEntry(@"myFolder1/test1/myhtml111.html" + i.ToString(), data);
}
file.Save();
I have downloaded the source code of the Ionic.Zip library and I see that every Add*() method (AddEntry(), AddFile(), etc.) adds an item to a dictionary called _entry.
This dictionary does not get cleared when we call Save() or Dispose() on the ZipFile object.
I feel this is the root cause of the OutOfMemoryException.
How do I overcome this issue? Is there another way to achieve the same result without running into an OutOfMemoryException? Am I missing something?
I am open to using other open-source libraries too.
The dictionary holding the internal structure of the archive shouldn't be a problem.
Assuming your entry 'path' is a string of about 50 bytes, even 1,000,000 entries should amount to about 50 MB (a lot, but nowhere near the 2 GB limit). While I haven't bothered checking the size of ZipEntry, I also doubt it is large enough to matter (each would need to be around 2 KB).
I also think your expectation that this entry dictionary would be cleared is wrong. Since it is the informational structure describing the contents of the zip file, it needs to hold all the entries.
From this point on I am going to assume that the posted line
byte[] data = Encoding.ASCII.GetBytes("rama");
is a placeholder for actual file data in bytes (since 1M x 4 bytes would be under 4 MB).
The most likely issue here is that the declared byte[] data remains in memory until the entire ZipFile is disposed.
It makes sense for the library to keep this array until the data is saved.
The simplest way to work around this is to wrap the ZipFile in a using block, re-opening and closing it for every file you want to add:
var zipFileName = "E:\\test1.zip";

for (int i = 0; i < 1000000; ++i)
{
    using (ZipFile zf = new ZipFile(zipFileName))
    {
        byte[] data = File.ReadAllBytes(file2Zip); // file2Zip = path of the file to add
        ZipEntry entry = zf.AddEntry(@"myFolder1/test1/myhtml111.html" + i.ToString(), data);
        zf.Save();
    }
}
This approach might seem wasteful if you are saving a lot of small files; since you are using byte[] directly, it would be quite simple to implement a buffering mechanism that adds entries in batches (sketched below).
While it is true that it's possible to sidestep this issue by compiling for 64-bit, unless you are really just barely going over the 2 GB limit, that would create a very memory-hungry app.
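A minimal sketch of that buffering idea, reusing the same re-open pattern as above; the batch size is an arbitrary illustration:

var zipFileName = "E:\\test1.zip";
const int batchSize = 1000; // arbitrary: how many entries to add per Save()

for (int i = 0; i < 1000000; i += batchSize)
{
    using (ZipFile zf = new ZipFile(zipFileName))
    {
        zf.UseZip64WhenSaving = Zip64Option.AsNecessary;
        for (int j = i; j < i + batchSize && j < 1000000; ++j)
        {
            byte[] data = Encoding.ASCII.GetBytes("rama"); // placeholder data, as in the question
            zf.AddEntry(@"myFolder1/test1/myhtml111.html" + j.ToString(), data);
        }
        zf.Save(); // the batch's byte[] buffers can be collected once this scope ends
    }
}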
I am reading files into an array; here is the relevant code. A new DiskReader is created for each file, and path is determined using an OpenFileDialog.
class DiskReader
{
    // from variables section:
    long MAX_STREAM_SIZE = 300 * 1024 * 1024; // 300 MB
    FileStream fs;
    public Byte[] fileData;

    ...

    // Get file size, check it is within the allowed size (MAX_STREAM_SIZE),
    // start the process including the progress bar.
    using (fs = File.OpenRead(path))
    {
        if (fs.Length < MAX_STREAM_SIZE)
        {
            long NumBytes = (fs.Length < MAX_STREAM_SIZE ? fs.Length : MAX_STREAM_SIZE);
            updateValues[0] = (NumBytes / 1024 / 1024).ToString("#,###.0");
            result = LoadData(NumBytes);
        }
        else
        {
            // Need something to handle big files
        }

        if (result)
        {
            mainForm.ShowProgress(true);
            bw.RunWorkerAsync();
        }
    }

    ...

    bool LoadData(long NumBytes)
    {
        try
        {
            fileData = new Byte[NumBytes];
            fs.Read(fileData, 0, fileData.Length);
            return true;
        }
        catch (Exception e)
        {
            return false;
        }
    }
The first time I run this, it works fine. The second time I run it, sometimes it works fine, but most times it throws a System.OutOfMemoryException at
[Edit: "first time I run this" was a bad choice of words. I meant that starting the programme and opening a file is fine; I get the problem when I try to open a different file without exiting the programme. When I open the second file, I set the DiskReader to a new instance, which means the fileData array is also a new instance. I hope that makes it clearer.]
fileData = new Byte[NumBytes];
There is no obvious pattern to it running and throwing an exception.
I don't think it's relevant, but although the maximum file size is set to 300 MB, the files I am using to test this are between 49 and 64 MB.
Any suggestions on what is going wrong here and how I can correct it?
If the exception is being thrown at that line only, then my guess is that you've got a problem somewhere else in your code, as the comments suggest. Reading the documentation of that exception here, I'd bet you call this function one too many times somewhere and simply go over the limit on object size in memory, since there don't seem to be any problem spots in the code you posted.
The fs.Length property requires the whole stream to be evaluated, hence the file to be read anyway. Try something like:
byte[] result;
if (new FileInfo(path).Length < MAX_STREAM_SIZE)
{
    result = File.ReadAllBytes(path);
}
Also, depending on your needs, you might avoid using a byte array and read the data directly from the file stream. This should have a much lower memory footprint; for example:
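Here is a sketch of processing the file in fixed-size chunks straight from the stream; processChunk is just a placeholder for whatever the application does with the bytes:

using System;
using System.IO;

static class ChunkedReader
{
    // Sketch: stream the file through one small reusable buffer instead of
    // allocating a byte[] for the whole file.
    public static void ReadInChunks(string path, Action<byte[], int> processChunk)
    {
        byte[] buffer = new byte[64 * 1024]; // 64 KB reusable buffer
        using (FileStream fs = File.OpenRead(path))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                processChunk(buffer, read); // only the first 'read' bytes are valid
            }
        }
    }
}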
If I understand correctly what you want to do, I have this proposal: allocate one static array of a defined MAX size at the beginning, then keep that array and only fill it with new data from the next file. This way your memory use should be absolutely fine. You just need to store the file size in a separate variable, because the array will always have the same MAX size.
This is a common approach in systems with automatic memory management: the program is faster when you allocate a constant amount of memory at the start and then never allocate anything during the computation, because the garbage collector does not have to run as often.
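A rough sketch of that pre-allocation approach, reusing MAX_STREAM_SIZE from the question; the other member names are illustrative:

using System.IO;

class DiskReader
{
    const int MAX_STREAM_SIZE = 300 * 1024 * 1024; // 300 MB

    // One buffer, allocated once and reused for every file.
    static readonly byte[] fileData = new byte[MAX_STREAM_SIZE];
    long fileLength; // how many bytes of fileData are valid for the current file

    public bool LoadData(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            if (fs.Length > MAX_STREAM_SIZE)
                return false; // big files still need separate handling

            fileLength = fs.Length;
            int offset = 0;
            int read;
            // Read may return fewer bytes than requested, so loop until done.
            while (offset < fileLength &&
                   (read = fs.Read(fileData, offset, (int)(fileLength - offset))) > 0)
            {
                offset += read;
            }
            return true;
        }
    }
}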
Using C# (.NET 4.5), I want to copy a set of files to multiple locations (e.g. the contents of a folder to 2 USB drives attached to the computer).
Is there a more efficient way of doing that than just using foreach loops and File.Copy?
Working towards a (possible) solution.
My first thought was some kind of multi-threaded approach. After some reading and research I discovered that just blindly setting up some kind of parallel and/or async process is not a good idea when it comes to IO (as per Why is Parallel.ForEach much faster then AsParallel().ForAll() even though MSDN suggests otherwise?).
The bottleneck is the disk, especially if it's a traditional drive, as it can only read/write synchronously. That got me thinking: what if I read it once and then output it to multiple locations? After all, in my USB drive scenario I'm dealing with multiple (output) disks.
I'm having trouble figuring out how to do that, though. One idea I saw (Copy same file from multiple threads to multiple destinations) was to just read all the bytes of each file into memory, then loop through the destinations and write out the bytes to each location before moving on to the next file. That seems like a bad idea if the files might be large. Some of the files I'll be copying will be videos and could be 1 GB (or more). I can't imagine it's a good idea to load a 1 GB file into memory just to copy it to another disk.
So, allowing flexibility for larger files, the closest I've gotten is below (based on How to copy one file to many locations simultaneously). The problem with this code is that I still don't have a single read and multiple writes happening; it's currently multi-read and multi-write. Is there a way to further optimise this code? Could I read chunks into memory and then write each chunk to every destination before moving on to the next chunk (like the idea above, but with chunked files instead of whole ones)?
files.ForEach(fileDetail =>
    Parallel.ForEach(fileDetail.DestinationPaths, new ParallelOptions(),
        destinationPath =>
        {
            using (var source = new FileStream(fileDetail.SourcePath, FileMode.Open, FileAccess.Read, FileShare.Read))
            using (var destination = new FileStream(destinationPath, FileMode.Create))
            {
                var buffer = new byte[1024];
                int read;
                while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                {
                    destination.Write(buffer, 0, read);
                }
            }
        }));
I thought I'd post my current solution for anyone else who comes across this question.
If anyone discovers a more efficient/quicker way to do this then please let me know!
My code seems to copy files a bit quicker than just running the copy synchronously but it's still not as fast as I'd like (nor as fast as I've seen some other programs do it). I should note that performance may vary depending on .NET version and your system (I'm using Win 10 with .NET 4.5.2 on a 13" MBP with 2.9GHz i5 (5287U - 2 core / 4 thread) + 16GB RAM). I've not even figured out the best combination of method (e.g. FileStream.Write, FileStream.WriteAsync, BinaryWriter.Write) and buffer size yet.
foreach (var fileDetail in files)
{
    foreach (var destinationPath in fileDetail.DestinationPaths)
        Directory.CreateDirectory(Path.GetDirectoryName(destinationPath));

    // Set up progress
    FileCopyEntryProgress progress = new FileCopyEntryProgress(fileDetail);

    // Set up the source and outputs
    using (var source = new FileStream(fileDetail.SourcePath, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize, FileOptions.SequentialScan))
    using (var outputs = new CompositeDisposable(fileDetail.DestinationPaths.Select(p => new FileStream(p, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize))))
    {
        // Set up the copy operation
        var buffer = new byte[bufferSize];
        int read;

        // Read the file
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Copy to each drive
            await Task.WhenAll(outputs.Select(async destination => await ((FileStream)destination).WriteAsync(buffer, 0, read)));

            // Report progress
            if (onDriveCopyFile != null)
            {
                progress.BytesCopied = read;
                progress.TotalBytesCopied += read;
                onDriveCopyFile.Report(progress);
            }
        }
    }

    if (ct.IsCancellationRequested)
        break;
}
I'm using CompositeDisposable from Reactive Extensions (https://github.com/Reactive-Extensions/Rx.NET).
IO operations in general should be considered asynchronous, as there are hardware operations which run outside your code, so you can try to introduce some async/await constructs for the read/write operations; that lets execution continue during the hardware operations.
while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
    await destination.WriteAsync(buffer, 0, read);
}
You also have to mark your lambda delegate as async to make this work:
async destinationPath =>
...
And you should await the resulting tasks all the way up. You may find more information here:
Parallel foreach with asynchronous lambda
Nesting await in Parallel.ForEach
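For illustration, awaiting everything "all the way" could look roughly like this if the Parallel.ForEach is replaced with Task.WhenAll. The variable names follow the question, and it assumes the code runs inside an async method with the usual usings:

// Sketch: one async copy per destination, all awaited together, instead of
// Parallel.ForEach with an async lambda (which effectively becomes fire-and-forget).
foreach (var fileDetail in files)
{
    await Task.WhenAll(fileDetail.DestinationPaths.Select(async destinationPath =>
    {
        using (var source = new FileStream(fileDetail.SourcePath, FileMode.Open,
                                           FileAccess.Read, FileShare.Read))
        using (var destination = new FileStream(destinationPath, FileMode.Create))
        {
            var buffer = new byte[81920];
            int read;
            while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                await destination.WriteAsync(buffer, 0, read);
            }
        }
    }));
}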
In my project, the user can upload a file of up to 1 GB. I want to copy that uploaded file's stream data to a second stream.
If I use something like this:
int i;
while ((i = fuVideo.FileContent.ReadByte()) != -1)
{
    strm.WriteByte((byte)i);
}
then it takes a very long time.
If I try to do it with a byte array, I need to specify the array size as a long, which is not valid.
If someone has a better idea for doing this, please let me know.
--
Hi Khepri, thanks for your response. I tried Stream.CopyTo but it takes a very long time to copy one stream object to the second.
I tried with an 8.02 MB file and it took 3 to 4 minutes.
The code I have added is:
Stream fs = fuVideo.FileContent; //fileInf.OpenRead();
Stream strm = ftp.GetRequestStream();
fs.CopyTo(strm);
If I am doing something wrong, please let me know.
Is this .NET 4.0?
If so, Stream.CopyTo is probably your best bet.
If not, and to give credit where credit is due, see the answer in this SO thread. If you're not on .NET 4.0, make sure to read the comments in that thread, as there are some alternative solutions (async stream reading/writing) that may be worth investigating if performance is at an absolute premium, which may be your case.
EDIT:
Based on the update, are you trying to copy the file to another remote destination? (Just guessing based on GetRequestStream().) The time is going into the actual transfer of the file content to the destination. So in this case, when you do fs.CopyTo(strm), it has to move those bytes from the source stream to the remote server; that's where the time is coming from. You're literally doing a file upload of a huge file. CopyTo will block your processing until it completes.
I'd recommend spinning this kind of processing off to another task, or at the least looking at the asynchronous option I listed. You can't really avoid this taking a long time: you're constrained by the file size and the available upload bandwidth.
I verified that when working locally, CopyTo is sub-second. I tested with a half-gig file, and a quick Stopwatch returned a processing time of 800 milliseconds.
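For example, a minimal sketch of pushing the blocking copy onto a background task; the stream names follow the question, and error handling and the FTP response handling are omitted:

// Sketch: run the blocking CopyTo on a worker thread so the caller isn't blocked.
Stream fs = fuVideo.FileContent;
Stream strm = ftp.GetRequestStream();

Task uploadTask = Task.Factory.StartNew(() =>
{
    using (fs)
    using (strm)
    {
        fs.CopyTo(strm); // still takes as long as the upload itself
    }
});
// later: uploadTask.Wait(), or a ContinueWith(...) to report completion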
If you are not on .NET 4.0, use this:
static void CopyTo(Stream fromStream, Stream destination, int bufferSize)
{
    int num;
    byte[] buffer = new byte[bufferSize];
    while ((num = fromStream.Read(buffer, 0, buffer.Length)) != 0)
    {
        destination.Write(buffer, 0, num);
    }
}
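Hypothetical usage with the streams from the question:

// copy the uploaded stream to the FTP request stream in 64 KB chunks
CopyTo(fuVideo.FileContent, ftp.GetRequestStream(), 64 * 1024);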
I am trying to join a number of binary files that were split during download. The requirement stemmed from the project http://asproxy.sourceforge.net/. In this project the author allows you to download files by providing a URL.
The problem comes in where my server does not have enough memory to keep a file larger than 20 MB in memory. So to solve this problem I modified the code to not download files larger than 10 MB; if the file is larger, it allows the user to download the first 10 MB. The user must then continue the download and hopefully get the second 10 MB. Now I have got all this working, except that when the user needs to join the files they downloaded, I end up with corrupt files; as far as I can tell, something is either being added or removed during the download.
I am currently joining the files together by reading all the files and then writing them to one file. This should work, since I am reading and writing in bytes. The code I used to join the files is listed here: http://www.geekpedia.com/tutorial201_Splitting-and-joining-files-using-C.html
I do not have the exact code with me at the moment; as soon as I am home I will post the exact code if anyone is willing to help out.
Please let me know if I am missing anything or if there is a better way to do this, i.e. what could I use as an alternative to a memory stream? The source code for the original project which I made changes to can be found here: http://asproxy.sourceforge.net/download.html (it should be noted I am using version 5.0). The file I modified is called WebDataCore.cs, and I modified line 606 so that it only loads until 10 MB of data has been read before continuing execution.
Let me know if there is anything I missed.
Thanks
You shouldn't split for memory reasons... the usual reason to split is to avoid having to re-download everything in case of failure. If memory is an issue, you are doing it wrong: you shouldn't be buffering in memory, for example.
The easiest way to download a file is simply:
using (WebClient client = new WebClient())
{
    client.DownloadFile(remoteUrl, localPath);
}
Re your split/join code: again, the problem is that you are buffering everything in memory; File.ReadAllBytes is a bad idea unless you know you have small files. What you should have is something like:
byte[] buffer = new byte[8192]; // why not...
int read;
while ((read = inStream.Read(buffer, 0, buffer.Length)) > 0)
{
    outStream.Write(buffer, 0, read);
}
This uses a moderate buffer to pump data between the two as a stream. A lot more efficient. The loop says:
try to read some data (at most, the buffer-size)
(this will read at least 1 byte, or we have reached the end of the stream)
if we read something, write this many bytes from the buffer to the output
In the end I found that by using an FTP request I was able to get around the memory issue, and the file is saved correctly.
Thanks for all the help
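For reference, a streaming FTP upload along the lines of that resolution might look roughly like this; the FTP URI, credentials, and local path are placeholders, and the chunked loop keeps only a small buffer in memory:

using System.IO;
using System.Net;

static void UploadViaFtp(string localPath, string ftpUri, string user, string password)
{
    // Sketch only: stream the file straight into the FTP request.
    FtpWebRequest request = (FtpWebRequest)WebRequest.Create(ftpUri);
    request.Method = WebRequestMethods.Ftp.UploadFile;
    request.Credentials = new NetworkCredential(user, password);

    using (FileStream fileStream = File.OpenRead(localPath))
    using (Stream ftpStream = request.GetRequestStream())
    {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = fileStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            ftpStream.Write(buffer, 0, read);
        }
    }

    using (var response = (FtpWebResponse)request.GetResponse())
    {
        // response.StatusDescription could be logged here if needed
    }
}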
That example loads each entire chunk into memory; instead, you could do something like this:
int bufSize = 1024 * 32;
byte[] buffer = new byte[bufSize];

using (FileStream outputFile = new FileStream(OutputFileName, FileMode.OpenOrCreate,
    FileAccess.Write, FileShare.None, bufSize))
{
    foreach (string inputFileName in inputFiles)
    {
        // open each part for reading (not FileMode.Append/FileAccess.Write)
        using (FileStream inputFile = new FileStream(inputFileName, FileMode.Open,
            FileAccess.Read, FileShare.Read, buffer.Length))
        {
            int bytesRead = 0;
            while ((bytesRead = inputFile.Read(buffer, 0, buffer.Length)) != 0)
            {
                outputFile.Write(buffer, 0, bytesRead);
            }
        }
    }
}