so I'm trying to load a file into a richtextbox, but I'm having some problems. No matter what method I used, binaryreader, filestream, streamreader I always encountered a problem with loading a file into a richtextbox in chunks. (I can't use LoadFile as it doesn't let me specify encoding). It seems that if the buffer size is too small, smaller than 3MB, AppendText sometimes adds a few extra empty lines. The file itself doesn't lose any data, there are just a few extra lines appended to it. Here is the code I'm using:
richTextBox.Clear();
progressBar.Value = 0;
const int bufferSize = 1024 * 1024 * 3; //I've tried smaller buffers but they ALL seem to append a few extra lines (empty lines)
using (StreamReader streamReader = new StreamReader(path))
{
while (streamReader.Peek() != -1)
{
char[] buffer = new char[bufferSize];
await streamReader.ReadBlockAsync(buffer, 0, bufferSize);
richTextBox.AppendText(new string(buffer));
progressBar.Value = (int)(((double)streamReader.BaseStream.Position) / streamReader.BaseStream.Length * 100);
}
}
This code seems to work, but I'm paranoid that it might still append extra lines at times depending on the circumstances. Does anybody know why this could be occurring?
*Extra Questions
Is using StreamReader slower than FileStream or binaryreader?
Should I use readblock or read?
This is the simple way you can use; it gives same format as in file:
OpenFileDialog fil = new OpenFileDialog();
if (fil.ShowDialog()== DialogResult.OK)
{
richTextBox1.Clear();
richTextBox1.Text = File.ReadAllText(fil.FileName);
}
Related
I want to read a text file line by line. I wanted to know if I'm doing it as efficiently as possible within the .NET C# scope of things.
This is what I'm trying so far:
var filestream = new System.IO.FileStream(textFilePath,
System.IO.FileMode.Open,
System.IO.FileAccess.Read,
System.IO.FileShare.ReadWrite);
var file = new System.IO.StreamReader(filestream, System.Text.Encoding.UTF8, true, 128);
while ((lineOfText = file.ReadLine()) != null)
{
//Do something with the lineOfText
}
To find the fastest way to read a file line by line you will have to do some benchmarking. I have done some small tests on my computer but you cannot expect that my results apply to your environment.
Using StreamReader.ReadLine
This is basically your method. For some reason you set the buffer size to the smallest possible value (128). Increasing this will in general increase performance. The default size is 1,024 and other good choices are 512 (the sector size in Windows) or 4,096 (the cluster size in NTFS). You will have to run a benchmark to determine an optimal buffer size. A bigger buffer is - if not faster - at least not slower than a smaller buffer.
const Int32 BufferSize = 128;
using (var fileStream = File.OpenRead(fileName))
using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize)) {
String line;
while ((line = streamReader.ReadLine()) != null)
{
// Process line
}
}
The FileStream constructor allows you to specify FileOptions. For example, if you are reading a large file sequentially from beginning to end, you may benefit from FileOptions.SequentialScan. Again, benchmarking is the best thing you can do.
Using File.ReadLines
This is very much like your own solution except that it is implemented using a StreamReader with a fixed buffer size of 1,024. On my computer this results in slightly better performance compared to your code with the buffer size of 128. However, you can get the same performance increase by using a larger buffer size. This method is implemented using an iterator block and does not consume memory for all lines.
var lines = File.ReadLines(fileName);
foreach (var line in lines)
// Process line
Using File.ReadAllLines
This is very much like the previous method except that this method grows a list of strings used to create the returned array of lines so the memory requirements are higher. However, it returns String[] and not an IEnumerable<String> allowing you to randomly access the lines.
var lines = File.ReadAllLines(fileName);
for (var i = 0; i < lines.Length; i += 1) {
var line = lines[i];
// Process line
}
Using String.Split
This method is considerably slower, at least on big files (tested on a 511 KB file), probably due to how String.Split is implemented. It also allocates an array for all the lines increasing the memory required compared to your solution.
using (var streamReader = File.OpenText(fileName)) {
var lines = streamReader.ReadToEnd().Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
foreach (var line in lines)
// Process line
}
My suggestion is to use File.ReadLines because it is clean and efficient. If you require special sharing options (for example you use FileShare.ReadWrite), you can use your own code but you should increase the buffer size.
If you're using .NET 4, simply use File.ReadLines which does it all for you. I suspect it's much the same as yours, except it may also use FileOptions.SequentialScan and a larger buffer (128 seems very small).
While File.ReadAllLines() is one of the simplest ways to read a file, it is also one of the slowest.
If you're just wanting to read lines in a file without doing much, according to these benchmarks, the fastest way to read a file is the age old method of:
using (StreamReader sr = File.OpenText(fileName))
{
string s = String.Empty;
while ((s = sr.ReadLine()) != null)
{
//do minimal amount of work here
}
}
However, if you have to do a lot with each line, then this article concludes that the best way is the following (and it's faster to pre-allocate a string[] if you know how many lines you're going to read) :
AllLines = new string[MAX]; //only allocate memory here
using (StreamReader sr = File.OpenText(fileName))
{
int x = 0;
while (!sr.EndOfStream)
{
AllLines[x] = sr.ReadLine();
x += 1;
}
} //Finished. Close the file
//Now parallel process each line in the file
Parallel.For(0, AllLines.Length, x =>
{
DoYourStuff(AllLines[x]); //do your work here
});
Use the following code:
foreach (string line in File.ReadAllLines(fileName))
This was a HUGE difference in reading performance.
It comes at the cost of memory consumption, but totally worth it!
If the file size is not big, then it is faster to read the entire file and split it afterwards
var filestreams = sr.ReadToEnd().Split(Environment.NewLine,
StringSplitOptions.RemoveEmptyEntries);
There's a good topic about this in Stack Overflow question Is 'yield return' slower than "old school" return?.
It says:
ReadAllLines loads all of the lines into memory and returns a
string[]. All well and good if the file is small. If the file is
larger than will fit in memory, you'll run out of memory.
ReadLines, on the other hand, uses yield return to return one line at
a time. With it, you can read any size file. It doesn't load the whole
file into memory.
Say you wanted to find the first line that contains the word "foo",
and then exit. Using ReadAllLines, you'd have to read the entire file
into memory, even if "foo" occurs on the first line. With ReadLines,
you only read one line. Which one would be faster?
If you have enough memory, I've found some performance gains by reading the entire file into a memory stream, and then opening a stream reader on that to read the lines. As long as you actually plan on reading the whole file anyway, this can yield some improvements.
You can't get any faster if you want to use an existing API to read the lines. But reading larger chunks and manually find each new line in the read buffer would probably be faster.
When you need to efficiently read and process a HUGE text file, ReadLines() and ReadAllLines() are likely to throw Out of Memory exception, this was my case. On the other hand, reading each line separately would take ages. The solution was to read the file in blocks, like below.
The class:
//can return empty lines sometimes
class LinePortionTextReader
{
private const int BUFFER_SIZE = 100000000; //100M characters
StreamReader sr = null;
string remainder = "";
public LinePortionTextReader(string filePath)
{
if (File.Exists(filePath))
{
sr = new StreamReader(filePath);
remainder = "";
}
}
~LinePortionTextReader()
{
if(null != sr) { sr.Close(); }
}
public string[] ReadBlock()
{
if(null==sr) { return new string[] { }; }
char[] buffer = new char[BUFFER_SIZE];
int charactersRead = sr.Read(buffer, 0, BUFFER_SIZE);
if (charactersRead < 1) { return new string[] { }; }
bool lastPart = (charactersRead < BUFFER_SIZE);
if (lastPart)
{
char[] buffer2 = buffer.Take<char>(charactersRead).ToArray();
buffer = buffer2;
}
string s = new string(buffer);
string[] sresult = s.Split(new string[] { "\r\n" }, StringSplitOptions.None);
sresult[0] = remainder + sresult[0];
if (!lastPart)
{
remainder = sresult[sresult.Length - 1];
sresult[sresult.Length - 1] = "";
}
return sresult;
}
public bool EOS
{
get
{
return (null == sr) ? true: sr.EndOfStream;
}
}
}
Example of use:
class Program
{
static void Main(string[] args)
{
if (args.Length < 3)
{
Console.WriteLine("multifind.exe <where to search> <what to look for, one value per line> <where to put the result>");
return;
}
if (!File.Exists(args[0]))
{
Console.WriteLine("source file not found");
return;
}
if (!File.Exists(args[1]))
{
Console.WriteLine("reference file not found");
return;
}
TextWriter tw = new StreamWriter(args[2], false);
string[] refLines = File.ReadAllLines(args[1]);
LinePortionTextReader lptr = new LinePortionTextReader(args[0]);
int blockCounter = 0;
while (!lptr.EOS)
{
string[] srcLines = lptr.ReadBlock();
for (int i = 0; i < srcLines.Length; i += 1)
{
string theLine = srcLines[i];
if (!string.IsNullOrEmpty(theLine)) //can return empty lines sometimes
{
for (int j = 0; j < refLines.Length; j += 1)
{
if (theLine.Contains(refLines[j]))
{
tw.WriteLine(theLine);
break;
}
}
}
}
blockCounter += 1;
Console.WriteLine(String.Format("100 Mb blocks processed: {0}", blockCounter));
}
tw.Close();
}
}
I believe splitting strings and array handling can be significantly improved,
yet the goal here was to minimize number of disk reads.
i searched in stackoverflow and got one way but this method only let me to write word by word in the console. My goal is to get the end of my file but get the complete result not char by char.
This code only show me char by char the end of my file:
using (var reader = new StreamReader("file.dll")
{
if (reader.BaseStream.Length > 1024)
{
reader.BaseStream.Seek(-1024, SeekOrigin.End);
}
string line;
while ((line = reader.ReadLine()) != null)
{
Console.WriteLine(line);
Console.ReadKey();
}
}
I was trying to get something like this, it's c++ but i was trying to get the same result in c#.
QFile *archivo;
archivo = new QFile();
archivo->setFileName("file.dll");
archivo->open(QFile::ReadOnly);
archivo->seek(archivo->size() - 1024);
trama = archivo->read(1024);
It's possible to get the complete result of the end of my file in c#?
If the file is line-delimited text file, you can use ReadAllLines.
string[] lines = System.IO.File.ReadAllLines("file.txt");
If it's a binary file, you can use ReadAllBytes. Shocker, I know.
byte[] data = System.IO.File.ReadAllBytes("file.dll");
if you want to be able to seek first (e.g. if you want only the last 1024 bytes of the file) you can use the stream's Read method. Again, crazy.
reader.BaseStream.Seek(-1024, SeekOrigin.End);
var chars = new char[1024];
reader.Read(chars, 0, 1024);
And before you ask, you can convert the characters to a string by passing them to the constructor:
char[] chars = new char[1024];
string s = new string(chars);
Console.WriteLine(s);
Not sure what it'll look like, since you're reading characters from a binary file, but good luck. My guess is you should be reading bytes instead though:
reader.BaseStream.Seek(-1024, SeekOrigin.End);
var bytes = new byte[1024];
reader.BaseStream.Read(bytes, 0, 1024);
(Notice you don't even need the StreamReader, since the FileStream (your base stream) exposes the Read method you need).
I would like to set a small buffer so the buffer can be written to file more frequently. But it seems this does not work. I wrote the following code and check the text file from time to time and find that text is written to file when i = 840 and the file size is exactly 4K, which is the default buffer size. How come?
using (StreamWriter sw = new StreamWriter("u:\\log.txt", true, Encoding.UTF8, 1))
{
for (int i = 0; i < 300000; i++)
{
sw.WriteLine(i);
Console.Write(i);
Console.ReadLine();
}
}
StreamWriter uses an underlying FileStream, and based on the source code, it looks like the buffer size isn't passed to the file stream.
You can google: .net source code streamwriter
I have c# code reading a text file and printing it out which looks like this:
StreamReader sr = new StreamReader(File.OpenRead(ofd.FileName));
byte[] buffer = new byte[100]; //is there a way to simply specify the length of this to be the number of bytes in the file?
sr.BaseStream.Read(buffer, 0, buffer.Length);
foreach (byte b in buffer)
{
label1.Text += b.ToString("x") + " ";
}
Is there anyway I can know how many bytes my file has?
I want to know the length of the byte[] buffer in advance so that in the Read function, I can simply pass in buffer.length as the third argument.
System.IO.FileInfo fi = new System.IO.FileInfo("myfile.exe");
long size = fi.Length;
In order to find the file size, the system has to read from the disk. So, the above example performs data read from disk but does not read file content.
It's not clear why you're using StreamReader at all if you're going to read binary data. Just use FileStream instead. You can use the Length property to find the length of the file.
Note, however, that that still doesn't mean you should just call Read and *assume` that a single call will read all the data. You should loop until you've read everything:
byte[] data;
using (var stream = File.OpenRead(...))
{
data = new byte[(int) stream.Length];
int offset = 0;
while (offset < data.Length)
{
int chunk = stream.Read(data, offset, data.Length - offset);
if (chunk == 0)
{
// Or handle this some other way
throw new IOException("File has shrunk while reading");
}
offset += chunk;
}
}
Note that this is assuming you do want to read the data. If you don't want to even open the stream, use FileInfo.Length as other answers have shown. Note that both FileStream.Length and FileInfo.Length have a type of long, whereas arrays are limited to 32-bit lengths. What do you want to happen with a file which is bigger than 2 gigs?
You can use the FileInfo.Length method.
Take a look at the example given in the link.
I would imagine something in here should help.
I doubt you can preemptively guess the size of a file without reading it...
How do I use File.ReadAllBytes In chunks
If it is a large file; then reading in chunks should might help
I need to read the first line from a stream to determine file's encoding, and then recreate the stream with that Encoding
The following code does not work correctly:
var r = response.GetResponseStream();
var sr = new StreamReader(r);
string firstLine = sr.ReadLine();
string encoding = GetEncodingFromFirstLine(firstLine);
string text = new StreamReader(r, Encoding.GetEncoding(encoding)).ReadToEnd();
The text variable doesn't contain the whole text. For some reason the first line and several lines after it are skipped.
I tried everything: closing the StreamReader, resetting it, calling a separate GetResponseStream... but nothing worked.
I can't get the response stream again as I'm getting this file from the internet, and redownloading it again would be bad performance wise.
Update
Here's what GetEncodingFromFirstLine() looks like:
public static string GetEncodingFromFirstLine(string line)
{
int encodingIndex = line.IndexOf("encoding=");
if (encodingIndex == -1)
{
return "utf-8";
}
return line.Substring(encodingIndex + "encoding=".Length).Replace("\"", "").Replace("'", "").Replace("?", "").Replace(">", "");
}
...
// true
Assert.AreEqual("windows-1251", GetEncodingFromFirstLine(#"<?xml version=""1.0"" encoding=""windows-1251""?>"));
** Update 2 **
I'm working with XML files, and the text variable is parsed as XML:
var feedItems = XElement.Parse(text);
Well you're asking it to detect the encoding... and that requires it to read data. That's reading it from the underlying stream, and you're then creating another StreamReader around the same stream.
I suggest you:
Get the response stream
Retrieve all the data into a byte array (or MemoryStream)
Detect the encoding (which should be performed on bytes, not text - currently you're already assuming UTF-8 by creating a StreamReader)
Create a MemoryStream around the byte array, and a StreamReader around that
It's not clear what your GetEncodingFromFirstLine method does... or what this file really is. More information may make it easier to help you.
EDIT: If this is to load some XML, don't reinvent the wheel. Just give the stream to one of the existing XML-parsing classes, which will perform the appropriate detection for you.
You need to change the current position in the stream to the beginning.
r.Position = 0;
string text = new StreamReader(r, Encoding.GetEncoding(encoding)).ReadToEnd();
I found the answer to my question here:
How can I read an Http response stream twice in C#?
Stream responseStream = CopyAndClose(resp.GetResponseStream());
// Do something with the stream
responseStream.Position = 0;
// Do something with the stream again
private static Stream CopyAndClose(Stream inputStream)
{
const int readSize = 256;
byte[] buffer = new byte[readSize];
MemoryStream ms = new MemoryStream();
int count = inputStream.Read(buffer, 0, readSize);
while (count > 0)
{
ms.Write(buffer, 0, count);
count = inputStream.Read(buffer, 0, readSize);
}
ms.Position = 0;
inputStream.Close();
return ms;
}