C# BinaryReader notably slower. Alternative?

For my project I need to read UInt16, UInt32, bytes and strings from a file. I started with a simple class I wrote like this:
public FileReader(string path) // constructor
{
    if (!System.IO.File.Exists(path))
        throw new Exception("FileReader::File not found.");
    m_byteFile = System.IO.File.ReadAllBytes(path);
    m_readPos = 0;
}

public UInt16 getU16() // basic function for reading
{
    if (m_readPos + 1 >= m_byteFile.Length)
        return 0;
    UInt16 ret = (UInt16)((m_byteFile[m_readPos + 0])
                        + (m_byteFile[m_readPos + 1] << 8));
    m_readPos += 2;
    return ret;
}
I thought it might be better to use the already existing BinaryReader, so I tried it, but I noticed that it is slower than my approach.
Can somebody explain why this is and if there is another already existing class I could use to load a file and read from it?
~Adura

You have all the data up front in an in-memory array, whereas BinaryReader streams the bytes in one at a time from the source, which I guess is a file on disk. I guess you could speed it up by passing it a stream that reads from an in-memory array:
Stream stream = new MemoryStream(byteArray);
//Pass the stream to BinaryReader
Note that with this approach you need to load the entire file into memory at once, but I guess that's OK for you.
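For example, a minimal sketch of that (assuming the whole file fits in memory; ReadUInt16/ReadUInt32 read little-endian values, which matches the getU16 above):

byte[] bytes = System.IO.File.ReadAllBytes(path);
using (var stream = new System.IO.MemoryStream(bytes))
using (var reader = new System.IO.BinaryReader(stream))
{
    ushort u16 = reader.ReadUInt16(); // equivalent to getU16()
    uint u32 = reader.ReadUInt32();
    byte b = reader.ReadByte();
}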

Related

How to find the no. of bytes of a text file without reading it?

I have C# code reading a text file and printing it out which looks like this:
StreamReader sr = new StreamReader(File.OpenRead(ofd.FileName));
byte[] buffer = new byte[100]; // is there a way to simply specify the length of this to be the number of bytes in the file?
sr.BaseStream.Read(buffer, 0, buffer.Length);
foreach (byte b in buffer)
{
    label1.Text += b.ToString("x") + " ";
}
Is there anyway I can know how many bytes my file has?
I want to know the length of the byte[] buffer in advance so that in the Read function, I can simply pass in buffer.length as the third argument.
System.IO.FileInfo fi = new System.IO.FileInfo("myfile.exe");
long size = fi.Length;
To find the file size, the system still has to touch the disk, but the example above reads only the file's metadata, not its content.
It's not clear why you're using StreamReader at all if you're going to read binary data. Just use FileStream instead. You can use the Length property to find the length of the file.
Note, however, that this still doesn't mean you should just call Read and assume that a single call will read all the data. You should loop until you've read everything:
byte[] data;
using (var stream = File.OpenRead(...))
{
    data = new byte[(int) stream.Length];
    int offset = 0;
    while (offset < data.Length)
    {
        int chunk = stream.Read(data, offset, data.Length - offset);
        if (chunk == 0)
        {
            // Or handle this some other way
            throw new IOException("File has shrunk while reading");
        }
        offset += chunk;
    }
}
Note that this is assuming you do want to read the data. If you don't want to even open the stream, use FileInfo.Length as other answers have shown. Note that both FileStream.Length and FileInfo.Length have a type of long, whereas arrays are limited to 32-bit lengths. What do you want to happen with a file which is bigger than 2 gigs?
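If you do stick with a single array, one defensive option (a sketch, not part of the original answer) is to check the length before allocating:

using (var stream = File.OpenRead(path))
{
    // Guard against files whose length doesn't fit in a single byte array.
    if (stream.Length > int.MaxValue)
        throw new IOException("File is too large to read into a single byte array.");
    byte[] data = new byte[(int) stream.Length];
    // ... fill it with the read loop shown above ...
}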
You can use the FileInfo.Length property.
Take a look at the example given in the link.
I would imagine something in here should help.
I doubt you can preemptively guess the size of a file without reading it...
How do I use File.ReadAllBytes In chunks
If it is a large file, then reading it in chunks might help.
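A minimal sketch of that idea, reading with a FileStream and a fixed-size buffer (the 64 KB size and the ProcessChunk handler are placeholders):

using (var stream = File.OpenRead(path))
{
    byte[] buffer = new byte[64 * 1024]; // arbitrary chunk size
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Only the first bytesRead bytes of buffer are valid here.
        ProcessChunk(buffer, bytesRead); // hypothetical per-chunk handler
    }
}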

Issue with memory management and program performance

OK, I made a C# winform app, it's a File_Splitter_Joiner.
You just give it a file and it splits it for you to a number of pieces you specify.
The splitting is done in a separate thread.
Everything was working fine until I split a 1 GB file!
In the task manager, I saw that my program started consuming about 1 GB of memory and my computer almost died!
Not only that: when the splitting finished, the memory usage didn't change!
(I don't know if this means the garbage collector isn't working, although I'm pretty sure I lost all references to whatever was holding the big data chunks, so it should work.)
Here's the Splitter constructor (just to give you a better idea):
public FileSplitter(string FileToSplitPath, string PiecesFolder, int NumberOfPieces, int PieceSize, SplittingMethod Method)
{
    FileToSplitInfo = new FileInfo(FileToSplitPath);
    this.FileToSplitPath = FileToSplitPath;
    this.PiecesFolder = PiecesFolder;
    this.NumberOfPieces = NumberOfPieces;
    this.PieceSize = PieceSize;
    this.Method = Method;
    SplitterThread = new Thread(Split);
}
And here is the method that did the actual splitting:
(I'm still a newbie, so what you're about to see 'may not' be done in the best way ever possible, I'm just learning here)
private void Split()
{
    int remainingSize = 0;
    int remainingPos = -1;
    bool isNumberOfPiecesEqualInSize = true;
    int fileSize = (int)FileToSplitInfo.Length; // FileToSplitInfo is a FileInfo object
    if (fileSize % PieceSize != 0)
    {
        remainingSize = fileSize % PieceSize;
        remainingPos = fileSize - remainingSize;
        isNumberOfPiecesEqualInSize = false;
    }
    byte[] fileBytes = new byte[fileSize];
    var _fs = File.Open(FileToSplitPath, FileMode.Open);
    BinaryReader br = new BinaryReader(_fs);
    br.Read(fileBytes, 0, fileSize);
    br.Close();
    _fs.Close();
    for (int i = 0, index = 0; i < NumberOfPieces; i++, index += PieceSize)
    {
        var fs = File.Create(PiecesFolder + "\\" + Path.GetFileName(FileToSplitPath) + "." + (i+1).ToString());
        var bw = new BinaryWriter(fs);
        bw.Write(fileBytes, index, PieceSize);
        if (i == NumberOfPieces-1 && !isNumberOfPiecesEqualInSize && Method == SplittingMethod.NumberOfPieces)
            bw.Write(fileBytes, remainingPos, remainingSize);
        bw.Close();
        fs.Close();
    }
    MessageBox.Show("File has been splitted successfully!");
    SplitterThread.Abort();
}
Now, instead of reading the bytes of the file via a BinaryReader, I first read them via the File.ReadAllBytes method. It worked fine with small files, but I got an OutOfMemoryException when I dealt with our big guy; I don't know why I didn't get that exception when I read the bytes via a BinaryReader.
(that was an in-between question)
So, the main question is: how can I load big files (we're talking gigabytes) in a way that doesn't consume so much memory? I mean, how can I make my program not consume all that memory?
And how can I free the used memory after the splitting is done?
(I actually used
bw.Dispose(); fs.Dispose();
instead of
bw.Close(); fs.Close();
and it was the same.)
I know the question might not make sense, because when we load something it goes into our memory, not somewhere else. But the reason I asked it like that is that I used another splitting/joining program (not written by me) just to see whether it had the same problem: I loaded the file, the program consumed about 5 MB of RAM, and when I started splitting it used about 10 MB!
Now that is a VERY big difference. Probably that app was written in C/C++.
So to sum up, who sucks? Is it my code, and if so how can I fix it? Or is it C# when it comes to performance?
Thank you SOOO much for anything you could hook me up with :)
The following 2 lines will kill you:
int fileSize = (int)FileToSplitInfo.Length; // a FileInfo object
...
byte[] fileBytes = new byte[fileSize];
Your code will fail when the size is over Int32.MaxValue. The cast is unnecessary; just use long fileSize = FileToSplitInfo.Length;
This corrected code will fail when there is not enough contiguous memory. Fragmentation (of the LOH) will bring you down sooner or later.
You allocate memory for the entire file, but you only need PieceSize bytes at a time.
You don't even need to know the fileSize; just:
byte[] pieceBuffer = new byte[PieceSize];
while (true)
{
    int nBytes = br.Read(pieceBuffer, 0, pieceBuffer.Length);
    if (nBytes == 0)
        break;
    // write this piece, the length is nBytes
}
There are several aspects that could be improved:
if you are working with a big file, why first read everything into an array and only afterwards write it into another file? Just write into the new file while reading from the old one (see the sketch after this list).
use using to guarantee disposal of the streams in every case, whether an exception is thrown or not.
if you begin to work with really big files, like 1 GB or more, I would recommend looking at memory-mapped files. You get a big memory-consumption benefit at some added performance cost.
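As a rough illustration of the first two points, here is a sketch that streams each piece straight from the source file to its own output file; sourcePath, piecesFolder and pieceSize are stand-in names for the poster's fields:

byte[] buffer = new byte[pieceSize];
using (var input = File.OpenRead(sourcePath))
{
    int pieceIndex = 1;
    while (true)
    {
        // Read may return fewer bytes than requested; a real splitter would
        // loop until the piece buffer is full or the file ends.
        int read = input.Read(buffer, 0, buffer.Length);
        if (read == 0)
            break;
        string piecePath = Path.Combine(piecesFolder,
            Path.GetFileName(sourcePath) + "." + pieceIndex);
        using (var output = File.Create(piecePath))
        {
            output.Write(buffer, 0, read);
        }
        pieceIndex++;
    }
}

This never holds more than one piece in memory, regardless of the size of the source file.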

File Chunking Performance in C#

I am trying to empower users to upload large files. Before I upload a file, I want to chunk it up. Each chunk needs to be a C# object. The reason why is for logging purposes. It's a long story, but I need to create actual C# objects that represent each file chunk. Regardless, I'm trying the following approach:
public static List<FileChunk> GetAllForFile(byte[] fileBytes)
{
    List<FileChunk> chunks = new List<FileChunk>();
    if (fileBytes.Length > 0)
    {
        FileChunk chunk = new FileChunk();
        for (int i = 0; i < (fileBytes.Length / 512); i++)
        {
            chunk.Number = (i + 1);
            chunk.Offset = (i * 512);
            chunk.Bytes = fileBytes.Skip(chunk.Offset).Take(512).ToArray();
            chunks.Add(chunk);
            chunk = new FileChunk();
        }
    }
    return chunks;
}
Unfortunately, this approach seems to be incredibly slow. Does anyone know how I can improve the performance while still creating objects for each chunk?
thank you
I suspect this is going to hurt a little:
chunk.Bytes = fileBytes.Skip(chunk.Offset).Take(512).ToArray();
Try this instead:
byte[] buffer = new byte[512];
Buffer.BlockCopy(fileBytes, chunk.Offset, buffer, 0, 512);
chunk.Bytes = buffer;
(Code not tested)
And the reason why this code would likely be slow is because Skip doesn't do anything special for arrays (though it could). This means that every pass through your loop is iterating the first 512*n items in the array, which results in O(n^2) performance, where you should just be seeing O(n).
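Putting that together, a sketch of the whole method with Buffer.BlockCopy, assuming FileChunk exposes the settable Number, Offset and Bytes members shown in the question (unlike the original loop, it also keeps the final partial chunk):

public static List<FileChunk> GetAllForFile(byte[] fileBytes)
{
    const int chunkSize = 512;
    var chunks = new List<FileChunk>();
    for (int offset = 0, number = 1; offset < fileBytes.Length; offset += chunkSize, number++)
    {
        // Copy at most chunkSize bytes; the last chunk may be shorter.
        int size = Math.Min(chunkSize, fileBytes.Length - offset);
        byte[] buffer = new byte[size];
        Buffer.BlockCopy(fileBytes, offset, buffer, 0, size);
        chunks.Add(new FileChunk { Number = number, Offset = offset, Bytes = buffer });
    }
    return chunks;
}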
Try something like this (untested code):
public static List<FileChunk> GetAllForFile(string fileName)
{
    var chunks = new List<FileChunk>();
    using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        int i = 0;
        while (stream.Position < stream.Length)
        {
            var chunk = new FileChunk();
            chunk.Number = i;
            chunk.Offset = (i * 512);
            byte[] buffer = new byte[512];
            int bytesRead = stream.Read(buffer, 0, 512);
            if (bytesRead < 512)
                Array.Resize(ref buffer, bytesRead); // the last chunk may be shorter
            chunk.Bytes = buffer;
            chunks.Add(chunk);
            i++;
        }
    }
    return chunks;
}
The above code skips several steps in your process, preferring to read the bytes from the file directly.
Note that, if the file is not an even multiple of 512, the last chunk will contain less than 512 bytes.
Same as Robert Harvey's answer, but using a BinaryReader; that way I don't need to specify an offset. If you use a BinaryWriter on the other end to reassemble the file, you won't need the Offset member of FileChunk.
public static List<FileChunk> GetAllForFile(string fileName) {
    var chunks = new List<FileChunk>();
    using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read)) {
        BinaryReader reader = new BinaryReader(stream);
        int i = 0;
        bool eof = false;
        while (!eof) {
            var chunk = new FileChunk();
            chunk.Number = i;
            chunk.Offset = (i * 512);
            chunk.Bytes = reader.ReadBytes(512);
            chunks.Add(chunk);
            i++;
            if (chunk.Bytes.Length < 512) { eof = true; }
        }
    }
    return chunks;
}
Have you thought about what you're going to do to compensate for packet loss and data corruption?
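For instance, one common approach is to send a hash alongside each chunk so the receiver can verify it after transfer (a sketch; chunks is the list built above, and where you store the hash is up to you since the question's FileChunk has no such member):

using (var sha = System.Security.Cryptography.SHA256.Create())
{
    foreach (var chunk in chunks)
    {
        // Compute a per-chunk SHA-256 the receiver can recompute and compare.
        byte[] hash = sha.ComputeHash(chunk.Bytes);
        // e.g. transmit hash together with chunk.Number and chunk.Bytes
    }
}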
Since you mentioned that the load is taking a long time then I would use asynchronous file reading in order to speed up the loading process. The hard disk is the slowest component of a computer. Google does asynchronous reads and writes on Google Chrome to improve their load times. I had to do something like this in C# in a previous job.
The idea would be to spawn several asynchronous requests over different parts of the file. Then when a request comes in, take the byte array and create your FileChunk objects taking 512 bytes at a time. There are several benefits to this:
If you have this run in a separate thread, then you won't have the whole program waiting to load the large file you have.
You can process a byte array, creating FileChunk objects, while the hard disk is still fulfilling read requests on other parts of the file.
You will save on RAM if you limit the number of pending read requests. This causes less page faulting to the hard disk and uses the RAM and CPU cache more efficiently, which speeds up processing further.
You would want to use the following methods in the FileStream class.
[HostProtectionAttribute(SecurityAction.LinkDemand, ExternalThreading = true)]
public virtual IAsyncResult BeginRead(
    byte[] buffer,
    int offset,
    int count,
    AsyncCallback callback,
    Object state
)

public virtual int EndRead(
    IAsyncResult asyncResult
)
Also this is what you will get in the asyncResult:
// Extract the FileStream (state) out of the IAsyncResult object
FileStream fs = (FileStream) ar.AsyncState;
// Get the result
Int32 bytesRead = fs.EndRead(ar);
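A minimal sketch of wiring those together (the buffer size, path and follow-up logic are illustrative):

// Open the file for asynchronous access and start a read.
FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                               FileShare.Read, 4096, true /* useAsync */);
byte[] buffer = new byte[4096];
fs.BeginRead(buffer, 0, buffer.Length, ar =>
{
    // Extract the FileStream (state) out of the IAsyncResult object
    FileStream stream = (FileStream)ar.AsyncState;
    int bytesRead = stream.EndRead(ar);
    // The first bytesRead bytes of buffer now hold file data; build your
    // FileChunk objects here, then issue the next BeginRead if more remains.
}, fs);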
Here is some reference material for you to read.
This is a code sample of working with Asynchronous File I/O Models.
This is a MS documentation reference for Asynchronous File I/O.

Transfering a File with a NetworkStream then rebuilding the file fails

I am trying to send a file over a NetworkStream and rebuild it on the client side. I can get the data over correctly (I think), but when I use either a BinaryWriter or a FileStream object to recreate the file, the file is cut off at the beginning, at the same point no matter which approach I use.
private void ReadandSaveFileFromServer(ref TcpClient clientATF, ref NetworkStream currentStream, string locationToSave)
{
    int fileSize = 0;
    string fileName = "";
    fileName = ReadStringFromServer(ref clientATF, ref currentStream);
    fileSize = ReadIntFromServer(ref clientATF, ref currentStream);
    byte[] fileSent = new byte[fileSize];
    if (currentStream.CanRead && clientATF.Connected)
    {
        currentStream.Read(fileSent, 0, fileSent.Length);
        WriteToConsole("Log Recieved");
    }
    else
    {
        WriteToConsole("Log Transfer Failed");
    }
    FileStream fileToCreate = new FileStream(locationToSave + "\\" + fileName, FileMode.Create);
    fileToCreate.Seek(0, SeekOrigin.Begin);
    fileToCreate.Write(fileSent, 0, fileSent.Length);
    fileToCreate.Close();
    //binWriter = new BinaryWriter(File.Open(locationToSave + "\\" + fileName, FileMode.Create));
    //binWriter.Write(fileSent);
    //binWriter.Close();
}
When I step through and check fileName and fileSize, they are correct. The byte[] is also fully populated. Any clue as to what I can do next?
Thanks in advance...
Sean
EDIT!!!:
So I figured out what is happening. When I read the string and then the int from the stream, the byte array used for the string read is 256 bytes long, so the string read is also swallowing the int and clobbering the data that follows. Need to figure this out...
For one thing, you can use the convenience method File.WriteAllBytes to write the data more simply. But I doubt that that's the problem.
You're assuming you can read all the data in a single call to Read. You're ignoring the return value. Don't do that - instead, read multiple times until either you've read everything you expect to, or you've reached the end of the stream. See this article for more details. If you're using .NET 4, there's a new CopyTo method you may find useful.
(As an aside, your use of ref suggests that you don't understand what it really means. It's well worth making sure you understand how arguments are passed in C#.)
To add to Jon Skeet's answer, your reading code should be:
int bytesRead;
int readPos = 0;
do
{
    bytesRead = currentStream.Read(fileSent, readPos, fileSent.Length - readPos);
    readPos += bytesRead;
} while (bytesRead > 0);
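Wrapped up as a helper it might look like this (a sketch; ReadExactly is an illustrative name, not part of the original code):

private static byte[] ReadExactly(NetworkStream stream, int count)
{
    byte[] buffer = new byte[count];
    int readPos = 0;
    while (readPos < count)
    {
        // Read returns however many bytes are currently available, so keep going.
        int bytesRead = stream.Read(buffer, readPos, count - readPos);
        if (bytesRead == 0)
            throw new IOException("Connection closed before all data arrived.");
        readPos += bytesRead;
    }
    return buffer;
}

The receiving code then becomes byte[] fileSent = ReadExactly(currentStream, fileSize);.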
If you are looking for a general solution for sending and receiving files over a network have you considered using a C# network library? It has probably solved most of the issues you will come across when trying to do this.
Disclaimer: I'm one of the developers of this library.

Bytes consumed by StreamReader

Is there a way to know how many bytes of a stream have been used by StreamReader?
I have a project where we need to read a file that has a text header followed by the start of the binary data. My initial attempt to read this file was something like this:
private int _dataOffset;

void ReadHeader(string path)
{
    using (FileStream stream = File.OpenRead(path))
    {
        StreamReader textReader = new StreamReader(stream);
        string line;
        do
        {
            line = textReader.ReadLine();
            handleHeaderLine(line);
        } while (line != "DATA"); // Yes, they used "DATA" to mark the end of the header
        _dataOffset = (int)stream.Position;
    }
}

private byte[] ReadDataFrame(string path, int frameNum)
{
    using (FileStream stream = File.OpenRead(path))
    {
        stream.Seek(_dataOffset + frameNum * cbFrame, SeekOrigin.Begin);
        byte[] data = new byte[cbFrame];
        stream.Read(data, 0, cbFrame);
        return data;
    }
}
The problem is that when I set _dataOffset to stream.Position, I get the position that the StreamReader has read to, not the end of the header. As soon as I thought about it this made sense, but I still need to be able to know where the end of the header is and I'm not sure if there's a way to do it and still take advantage of StreamReader.
You can find out how many bytes the StreamReader has actually returned (as opposed to read from the stream) in a number of ways, none of them too straightforward I'm afraid.
Get the result of textReader.CurrentEncoding.GetByteCount(allTextReadSoFar) and then seek to this position in the stream.
Use some reflection hackery to retrieve the value of the private variable of the StreamReader object that corresponds to the current byte position within the internal buffer (different from the stream's position - usually behind it, but at most equal to it, of course). Judging by .NET Reflector, this variable seems to be named bytePos.
Don't bother using a StreamReader at all; instead implement your own ReadLine function built on top of the Stream or even BinaryReader (BinaryReader is guaranteed never to read further ahead than what you request). This custom function must read from the stream char by char, so you'd actually have to use the low-level Decoder object (unless the encoding is ASCII/ANSI, in which case things are a bit simpler due to the single-byte encoding).
Option 1 is going to be the least efficient, I would imagine (since you're effectively re-encoding text you just decoded), and option 3 the hardest to implement, though perhaps the most elegant. I'd probably recommend against the ugly reflection hack (option 2), even though it looks tempting, being the most direct solution and only taking a couple of lines. (To be quite honest, the StreamReader class really ought to expose this variable via a public property, but alas it does not.) So in the end, it's up to you, but either method 1 or 3 should do the job nicely enough...
Hope that helps.
So the data is UTF-8 (the default encoding for StreamReader). This is a multi-byte encoding, so IndexOf would be inadvisable. You could call:
Encoding.UTF8.GetByteCount(line)
on each line of your data so far, adding 1 or 2 bytes for the stripped line ending.
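A sketch of that idea, accumulating the byte offset line by line (it assumes UTF-8 without a byte-order mark and "\r\n" line endings; use +1 instead of +2 if the lines end with a bare "\n"):

long headerBytes = 0;
using (var stream = File.OpenRead(path))
using (var reader = new StreamReader(stream))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        headerBytes += Encoding.UTF8.GetByteCount(line) + 2; // +2 for "\r\n"
        if (line == "DATA")
            break;
    }
}
// headerBytes is now the offset at which the binary data begins.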
If you need to count bytes, I'd go with the BinaryReader. You can take the results and cast them about as needed, but I find its idea of its current position to be more reliable (since it reads in binary, it's immune to character-set problems).
So your last line contains 'DATA' plus an unknown number of data bytes. You could extract the position by using IndexOf() on your last read line, then readjust the stream.Position.
But I am not sure if you should use ReadLine() at all in this case. Maybe it would be better to read byte by byte until you reach the 'DATA' mark.
The line breaks are easily identifiable without needing to decode the stream first (except for some encodings rarely used for text files like EBCDIC, UTF-16, UTF-32), so you can just read each line as bytes and then decode the entire line:
using (FileStream stream = File.OpenRead(path)) {
    List<byte> buffer = new List<byte>();
    bool hasCr = false;
    bool done = false;
    while (!done) {
        int b = stream.ReadByte();
        if (b == -1) throw new IOException("End of file reached in header.");
        if (b == 13) {
            hasCr = true;
        } else if (b == 10 && hasCr) {
            string line = Encoding.UTF8.GetString(buffer.ToArray(), 0, buffer.Count);
            if (line == "DATA") {
                done = true;
            } else {
                HandleHeaderLine(line);
            }
            buffer.Clear();
            hasCr = false;
        } else {
            if (hasCr) buffer.Add(13);
            hasCr = false;
            buffer.Add((byte)b);
        }
    }
    _dataOffset = (int)stream.Position;
}
Instead of closing the stream and opening it again, you could of course just keep on reading the data.
