I'm trying to load a file into a RichTextBox, but I'm having some problems. No matter which method I use (BinaryReader, FileStream, StreamReader), I always run into an issue when loading the file into the RichTextBox in chunks. (I can't use LoadFile because it doesn't let me specify an encoding.) It seems that if the buffer size is too small (smaller than 3 MB), AppendText sometimes adds a few extra empty lines. The file itself doesn't lose any data; there are just a few extra lines appended to it. Here is the code I'm using:
richTextBox.Clear();
progressBar.Value = 0;
const int bufferSize = 1024 * 1024 * 3; //I've tried smaller buffers but they ALL seem to append a few extra lines (empty lines)
using (StreamReader streamReader = new StreamReader(path))
{
while (streamReader.Peek() != -1)
{
char[] buffer = new char[bufferSize];
await streamReader.ReadBlockAsync(buffer, 0, bufferSize);
richTextBox.AppendText(new string(buffer));
progressBar.Value = (int)(((double)streamReader.BaseStream.Position) / streamReader.BaseStream.Length * 100);
}
}
This code seems to work, but I'm paranoid that it might still append extra lines at times depending on the circumstances. Does anybody know why this could be occurring?
Extra questions:
Is using StreamReader slower than FileStream or BinaryReader?
Should I use ReadBlock or Read?
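For reference, here is a variant I'm experimenting with that only appends the characters actually read, in case the unused tail of the buffer (which ReadBlockAsync leaves as '\0' characters whenever it reads less than a full buffer) is what shows up as the extra lines. I'm not certain this is the cause:
char[] buffer = new char[bufferSize];
int charsRead;
while ((charsRead = await streamReader.ReadBlockAsync(buffer, 0, bufferSize)) > 0)
{
    richTextBox.AppendText(new string(buffer, 0, charsRead)); // append only the characters actually read
    progressBar.Value = (int)((double)streamReader.BaseStream.Position / streamReader.BaseStream.Length * 100);
}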
This is a simple way to do it; it gives the same formatting as in the file:
OpenFileDialog fil = new OpenFileDialog();
if (fil.ShowDialog() == DialogResult.OK)
{
richTextBox1.Clear();
richTextBox1.Text = File.ReadAllText(fil.FileName);
}
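If the reason you avoided LoadFile was the encoding, note that File.ReadAllText also has an overload that takes an Encoding. A minimal sketch (the UTF-8 choice here is just an example):
richTextBox1.Text = File.ReadAllText(fil.FileName, Encoding.UTF8); // substitute whatever encoding the file actually uses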
I am currently testing several decompression libraries for a project I'm involved with, to decompress HTTP file streams on the fly. I have tried two very promising libraries and found an issue that appears in both of them.
This is what I am doing:
video.avi compressed to video.zip on HTTP server test.com/video.zip (~20MB)
HttpWebRequest to read stream from the server
Write HttpWebRequest ResponseStream data into MemoryStream
Let decompression library read from MemoryStream
Read decompressed file stream while it's being downloaded by HttpWebRequest
The whole idea works fine; I'm able to uncompress and stream the compressed video directly into VLC's stdin and it renders just fine. However, I have to use a read buffer of one byte on the decompression library. Any buffer larger than one byte causes the uncompressed data stream to be cut off. As a test, I wrote the decompressed stream into a file and compared it with the original video.avi, and some data is simply skipped by the decompression. When streaming this broken data into VLC it causes a lot of video artifacts, and the playback speed is also greatly reduced.
If I knew the size of what is available to read, I could trim my buffer accordingly, but no library makes this information public, so all I can do is read the data with a one-byte buffer. Maybe my approach is wrong? Or maybe I'm overlooking something?
Here's an example code (requires VLC):
ICSharpCode.SharpZipLib (http://icsharpcode.github.io/SharpZipLib/)
static void Main(string[] args)
{
// Initialise VLC
Process vlc = new Process()
{
StartInfo =
{
FileName = @"C:\Program Files\VideoLAN\vlc.exe", // Adjust as required to test the code
RedirectStandardInput = true,
UseShellExecute = false,
Arguments = "-"
}
};
vlc.Start();
Stream outStream = vlc.StandardInput.BaseStream;
// Get source stream
HttpWebRequest stream = (HttpWebRequest)WebRequest.Create("http://codefreak.net/~daniel/apps/stream60s-large.zip");
Stream compressedVideoStream = stream.GetResponse().GetResponseStream();
// Create local decompression loop
MemoryStream compressedLoopback = new MemoryStream();
ZipInputStream zipStream = new ZipInputStream(compressedLoopback);
ZipEntry currentEntry = null;
byte[] videoStreamBuffer = new byte[8129]; // 8kb read buffer
int read = 0;
long totalRead = 0;
while ((read = compressedVideoStream.Read(videoStreamBuffer, 0, videoStreamBuffer.Length)) > 0)
{
// Write compressed video stream into compressed loopback without affecting current read position
long previousPosition = compressedLoopback.Position; // Store current read position
compressedLoopback.Position = totalRead; // Jump to last write position
totalRead += read; // Increase last write position by current read size
compressedLoopback.Write(videoStreamBuffer, 0, read); // Write data into loopback
compressedLoopback.Position = previousPosition; // Restore reading position
// If not already, move to first entry
if (currentEntry == null)
currentEntry = zipStream.GetNextEntry();
byte[] outputBuffer = new byte[1]; // Decompression read buffer, this is the bad one!
int zipRead = 0;
while ((zipRead = zipStream.Read(outputBuffer, 0, outputBuffer.Length)) > 0)
outStream.Write(outputBuffer, 0, outputBuffer.Length); // Write directly to VLC stdin
}
}
SharpCompress (https://github.com/adamhathcock/sharpcompress)
static void Main(string[] args)
{
// Initialise VLC
Process vlc = new Process()
{
StartInfo =
{
FileName = @"C:\Program Files\VideoLAN\vlc.exe", // Adjust as required to test the code
RedirectStandardInput = true,
UseShellExecute = false,
Arguments = "-"
}
};
vlc.Start();
Stream outStream = vlc.StandardInput.BaseStream;
// Get source stream
HttpWebRequest stream = (HttpWebRequest)WebRequest.Create("http://codefreak.net/~daniel/apps/stream60s-large.zip");
Stream compressedVideoStream = stream.GetResponse().GetResponseStream();
// Create local decompression loop
MemoryStream compressedLoopback = new MemoryStream();
ZipReader zipStream = null;
EntryStream currentEntry = null;
byte[] videoStreamBuffer = new byte[8129]; // 8kb read buffer
int read = 0;
long totalRead = 0;
while ((read = compressedVideoStream.Read(videoStreamBuffer, 0, videoStreamBuffer.Length)) > 0)
{
// Write compressed video stream into compressed loopback without affecting current read position
long previousPosition = compressedLoopback.Position; // Store current read position
compressedLoopback.Position = totalRead; // Jump to last write position
totalRead += read; // Increase last write position by current read size
compressedLoopback.Write(videoStreamBuffer, 0, read); // Write data into loopback
compressedLoopback.Position = previousPosition; // Restore reading position
// Open stream after writing to it because otherwise it will not be able to identify the compression type
if (zipStream == null)
zipStream = (ZipReader)ReaderFactory.Open(compressedLoopback); // Cast to ZipReader, as we know the type
// If not already, move to first entry
if (currentEntry == null)
{
zipStream.MoveToNextEntry();
currentEntry = zipStream.OpenEntryStream();
}
byte[] outputBuffer = new byte[1]; // Decompression read buffer, this is the bad one!
int zipRead = 0;
while ((zipRead = currentEntry.Read(outputBuffer, 0, outputBuffer.Length)) > 0)
outStream.Write(outputBuffer, 0, outputBuffer.Length); // Write directly to VLC stdin
}
}
To test this code I recommend setting the output buffer for SharpZipLib to 2 bytes and for SharpCompress to 8 bytes. You will see the artifacts and also that the playback speed of the video is wrong; the seek time should always be aligned with the number counting up in the video.
I haven't found any good explanation of why a larger outputBuffer reading from the decompression library causes these problems, or any way to solve this other than using the tiniest possible buffer.
So my question is: what am I doing wrong, or is this a general issue when reading compressed files from streams? How can I increase the outputBuffer while still reading the correct data?
Any help is greatly appreciated!
Regards,
Gachl
You need to write only as many bytes as you read. Writing the entire buffer length will add extra bytes (whatever happened to be in the buffer beforehand); zipStream.Read is not required to read as many bytes as you request.
while ((zipRead = zipStream.Read(outputBuffer, 0, outputBuffer.Length)) > 0)
outStream.Write(outputBuffer, 0, zipRead); // Write directly to VLC stdin
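The same change applies to the SharpCompress variant, whose inner loop has the identical problem:
while ((zipRead = currentEntry.Read(outputBuffer, 0, outputBuffer.Length)) > 0)
    outStream.Write(outputBuffer, 0, zipRead); // Write only the bytes actually read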
I have C# code that reads a text file and prints it out, which looks like this:
StreamReader sr = new StreamReader(File.OpenRead(ofd.FileName));
byte[] buffer = new byte[100]; //is there a way to simply specify the length of this to be the number of bytes in the file?
sr.BaseStream.Read(buffer, 0, buffer.Length);
foreach (byte b in buffer)
{
label1.Text += b.ToString("x") + " ";
}
Is there any way I can know how many bytes my file has?
I want to know the length of the byte[] buffer in advance, so that in the Read call I can simply pass buffer.Length as the third argument.
System.IO.FileInfo fi = new System.IO.FileInfo("myfile.exe");
long size = fi.Length;
In order to find the file size, the system has to read from the disk. So the above example reads the file's metadata from disk, but does not read the file's content.
It's not clear why you're using StreamReader at all if you're going to read binary data. Just use FileStream instead. You can use the Length property to find the length of the file.
Note, however, that that still doesn't mean you should just call Read and assume that a single call will read all the data. You should loop until you've read everything:
byte[] data;
using (var stream = File.OpenRead(...))
{
data = new byte[(int) stream.Length];
int offset = 0;
while (offset < data.Length)
{
int chunk = stream.Read(data, offset, data.Length - offset);
if (chunk == 0)
{
// Or handle this some other way
throw new IOException("File has shrunk while reading");
}
offset += chunk;
}
}
Note that this is assuming you do want to read the data. If you don't want to even open the stream, use FileInfo.Length as other answers have shown. Note that both FileStream.Length and FileInfo.Length have a type of long, whereas arrays are limited to 32-bit lengths. What do you want to happen with a file which is bigger than 2 gigs?
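For example, one way to handle that case is to fail fast before allocating the array (just a sketch):
if (stream.Length > int.MaxValue)
    throw new IOException("File is too large to read into a single byte array"); // or fall back to processing it in chunks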
You can use the FileInfo.Length property.
Take a look at the example given in the link.
I would imagine something in here should help.
I doubt you can preemptively guess the size of a file without reading it...
How do I use File.ReadAllBytes in chunks?
If it is a large file, then reading it in chunks might help.
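A minimal sketch of chunked reading with FileStream (the path variable and the 64 KB buffer size are just placeholders):
using (var stream = File.OpenRead(path)) // 'path' is whatever file you're reading
{
    byte[] buffer = new byte[64 * 1024];
    int read;
    while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // process only the first 'read' bytes of 'buffer' here
    }
}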
I am using fs.Length, where fs is a FileStream.
Is this an O(1) operation? I would think it just reads the file's properties, as opposed to scanning through the file to find where the end is. The file I am trying to find the length of could easily range from 1 MB to 4-5 GB.
However I noticed that there is a FileInfo class, which also has a Length property.
Do both of these Length properties theoretically take the same amount of time? Or is fs.Length slower because it must open the FileStream first?
The natural way to get the file size in .NET is the FileInfo.Length property you mentioned.
I am not sure Stream.Length is slower (it won't read the whole file anyway), but it's definitely more natural to use FileInfo instead of a FileStream if you do not plan to read the file.
Here's a small benchmark that will provide some numeric values:
private static void Main(string[] args)
{
string filePath = ...; // Path to 2.5 GB file here
Stopwatch z1 = new Stopwatch();
Stopwatch z2 = new Stopwatch();
int count = 10000;
z1.Start();
for (int i = 0; i < count; i++)
{
long length;
using (Stream stream = new FileStream(filePath, FileMode.Open))
{
length = stream.Length;
}
}
z1.Stop();
z2.Start();
for (int i = 0; i < count; i++)
{
long length = new FileInfo(filePath).Length;
}
z2.Stop();
Console.WriteLine(string.Format("Stream: {0}", z1.ElapsedMilliseconds));
Console.WriteLine(string.Format("FileInfo: {0}", z2.ElapsedMilliseconds));
Console.ReadKey();
}
Results:
Stream: 886
FileInfo: 727
Both will access the file system metadata rather than reading the whole file. I don't know which is necessarily more efficient; as a rule of thumb, I'd say that if you only want to know the length (and other metadata), use FileInfo, whereas if you're opening the file as a stream anyway, use FileStream.Length.
I have a huge file where I have to insert certain characters at a specific location. What is the easiest way to do that in C# without rewriting the whole file?
Filesystems do not support "inserting" data in the middle of a file. If you really have a need for a file that can be written to in a sorted kind of way, I suggest you look into using an embedded database.
You might want to take a look at SQLite or BerkeleyDB.
Then again, you might be working with a text file or a legacy binary file. In that case your only option is to rewrite the file, at least from the insertion point up to the end.
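A minimal sketch of that rewrite, assuming the data after the insertion point fits in memory (the method name and parameters are only illustrative):
static void InsertIntoFile(string path, long position, byte[] insertion)
{
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
    {
        // Read everything after the insertion point into memory
        fs.Position = position;
        byte[] tail = new byte[fs.Length - position];
        int offset = 0;
        while (offset < tail.Length)
        {
            int read = fs.Read(tail, offset, tail.Length - offset);
            if (read == 0) break;
            offset += read;
        }
        // Write the new data at the insertion point, then put the original tail back after it
        fs.Position = position;
        fs.Write(insertion, 0, insertion.Length);
        fs.Write(tail, 0, tail.Length);
    }
}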
I would look at the FileStream class to do random I/O in C#.
You will probably need to rewrite the file from the point where you insert the changes to the end. You might be best off always writing to the end of the file and using tools such as sort and grep to get the data out in the desired order. I am assuming you are talking about a text file here, not a binary file.
There is no way to insert characters into a file without rewriting them. In C# it can be done with any of the Stream classes. If the files are huge, I would recommend using the GNU Core Utilities from your C# code; they are the fastest. I used to handle very large text files (4 GB, 8 GB or more) with the core utils. Commands like head, tail, split, csplit, cat, shuf, shred and uniq really help a lot with text manipulation.
For example, if you need to put some characters into a 2 GB file, you can use split -b BYTECOUNT, put the output into a file, append the new text to it, and then get the rest of the content and add it on. This should supposedly be faster than any other way.
Hope it works. Give it a try.
You can use random access to write to specific locations of a file, but you won't be able to do it in text format; you'll have to work with bytes directly.
If you know the specific location to which you want to write the new data, use the BinaryWriter class:
using (BinaryWriter bw = new BinaryWriter (File.Open (strFile, FileMode.Open)))
{
string strNewData = "this is some new data";
byte[] byteNewData = new byte[strNewData.Length];
// copy contents of string to byte array
for (var i = 0; i < strNewData.Length; i++)
{
byteNewData[i] = Convert.ToByte (strNewData[i]);
}
// write new data to file
bw.Seek (15, SeekOrigin.Begin); // seek to position 15
bw.Write (byteNewData, 0, byteNewData.Length);
}
You may take a look at this project:
Win Data Inspector
Basically, the code is the following:
// this.Stream is the stream in which you insert data
{
long position = this.Stream.Position;
long length = this.Stream.Length;
MemoryStream ms = new MemoryStream();
this.Stream.Position = 0;
DIUtils.CopyStream(this.Stream, ms, position, progressCallback);
ms.Write(data, 0, data.Length);
this.Stream.Position = position;
DIUtils.CopyStream(this.Stream, ms, this.Stream.Length - position, progressCallback);
this.Stream = ms;
}
#region Delegates
public delegate void ProgressCallback(long position, long total);
#endregion
DIUtils.cs
public static void CopyStream(Stream input, Stream output, long length, DataInspector.ProgressCallback callback)
{
long totalsize = input.Length;
long byteswritten = 0;
const int size = 32768;
byte[] buffer = new byte[size];
int read;
int readlen = length < size ? (int)length : size;
while (length > 0 && (read = input.Read(buffer, 0, readlen)) > 0)
{
output.Write(buffer, 0, read);
byteswritten += read;
length -= read;
readlen = length < size ? (int)length : size;
if (callback != null)
callback(byteswritten, totalsize);
}
}
Depending on the scope of your project, you may want to store each line of text from your file in a table data structure, rather like a database table. That way you can insert at a specific location at any given moment without having to read in, modify, and write out the entire text file each time. This is given the fact that your data is "huge", as you put it. You would still recreate the file, but at least you'd have a scalable solution.
It may be "possible", depending on how the filesystem stores files, to quickly insert (i.e., add additional) bytes in the middle. If it is remotely possible, it may only be feasible a full block at a time, and only by either doing low-level modification of the filesystem itself or by using a filesystem-specific interface.
Filesystems are not generally designed for this operation. If you need to quickly do inserts you really need a more general database.
Depending on your application a middle ground would be to bunch your inserts together, so you only do one rewrite of the file rather than twenty.
You will always have to rewrite the remaining bytes from the insertion point. If this point is at 0, then you will rewrite the whole file. If it is 10 bytes before the last byte, then you will rewrite the last 10 bytes.
In any case there is no function to directly support "insert to file". But the following code can do it accurately.
var sw = new Stopwatch();
var ab = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ";
// create
var fs = new FileStream(@"d:\test.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite, 262144, FileOptions.None);
sw.Restart();
fs.Seek(0, SeekOrigin.Begin);
for (var i = 0; i < 40000000; i++) fs.Write(ASCIIEncoding.ASCII.GetBytes(ab), 0, ab.Length);
sw.Stop();
Console.WriteLine("{0} ms", sw.Elapsed.TotalMilliseconds);
fs.Dispose();
// insert
fs = new FileStream(@"d:\test.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite, 262144, FileOptions.None);
sw.Restart();
byte[] b = new byte[262144];
long target = 10, offset = fs.Length - b.Length;
while (offset != 0)
{
if (offset < 0)
{
offset = b.Length - target;
b = new byte[offset];
}
fs.Position = offset; fs.Read(b, 0, b.Length);
fs.Position = offset + target; fs.Write(b, 0, b.Length);
offset -= b.Length;
}
fs.Position = target; fs.Write(ASCIIEncoding.ASCII.GetBytes(ab), 0, ab.Length);
sw.Stop();
Console.WriteLine("{0} ms", sw.Elapsed.TotalMilliseconds);
To gain better performance for file I/O, play with "magic" power-of-two numbers like those in the code above. The creation of the file uses a buffer of 262144 bytes (256 KB), which does not help at all. The same buffer for the insertion does the "performance job", as you can see from the Stopwatch results if you run the code. A draft test on my PC gave the following results:
13628.8 ms for creation and 3597.0971 ms for insertion.
Note that the target byte for insertion is 10, meaning that almost the whole file was rewritten.
Why don't you put a pointer to the end of the file (literally, the current size of the file stored in four bytes) and then, at the end of the file, write the length of the inserted data followed by the data you want to insert? For example, if you have a string in the middle of the file and you want to insert a few characters in the middle of that string, you can overwrite four characters of the string with a pointer to the end of the file, and then write those four characters to the end together with the characters you originally wanted to insert. It's all about ordering the data. Of course, you can only do this if you are writing the whole file yourself, i.e. you are not using other codecs.
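A rough sketch of that idea, assuming a custom format whose reader knows to follow such pointers (the method name and layout are only illustrative, and the 4-byte pointer limits the file to under 2 GB):
static void AppendInsert(string path, long position, byte[] insertion)
{
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
    {
        long endOfFile = fs.Length;

        // Save the 4 bytes we are about to overwrite with the pointer
        byte[] overwritten = new byte[4];
        fs.Position = position;
        fs.Read(overwritten, 0, overwritten.Length); // a robust version would check the return value

        // Overwrite them with a pointer to the record we append at the end
        fs.Position = position;
        fs.Write(BitConverter.GetBytes((int)endOfFile), 0, 4);

        // Append the record: length, then the displaced bytes, then the new data
        fs.Position = endOfFile;
        fs.Write(BitConverter.GetBytes(overwritten.Length + insertion.Length), 0, 4);
        fs.Write(overwritten, 0, overwritten.Length);
        fs.Write(insertion, 0, insertion.Length);
    }
}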