streamreader/buffered stream C#

streamreader/buffered stream C# - c#

I have a class Value
the output of Value is used as an input to other classes and eventually in Main.
In Main a logic is performed and output is produced for first 512 bits. I want my program to return back to value() to start with next 512 bits of file.txt. How can I do that?
public static byte[] Value()
{
byte[] numbers = new byte[9999];
using (FileStream fs = File.Open(#"C:\Users\file.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
string line;
while ((line = sr.ReadLine()) != null)
{
for (int i = 0; i < 512; i++)
{
numbers[i] = Byte.Parse(line[i].ToString());
}
}
}
return numbers;
}

What can be done is to pass Value() an offset and a length parameter.
But there is a problem with your method, you are actually taking the first bytes for each line in the file, which I don't know is what you want to do. So I corrected this to make sure you return only length bytes.
using System.Linq Skip and Take methods, you may find things easier as well
public static byte[] Value(int startOffset, int length)
{
byte allBytes = File.ReadAllBytes(#"C:\Users\file.txt");
return allBytes.Skip(startOffset).Take(length);
}

It seems like what you are trying to do is use a recursive call on Value() this is based on your comment, but it is not clear, so I am going to do that assumption.
there is a problem I see and it's like in your scenario you're returning a byte[], So I modified your code a little bit to make it as closest as your's.
/// <summary>
/// This method will call your `value` methodand return the bytes and it is the entry point for the loci.
/// </summary>
/// <returns></returns>
public static byte[] ByteValueCaller()
{
byte[] numbers = new byte[9999];
Value(0, numbers);
return numbers;
}
public static void Value(int startingByte, byte[] numbers)
{
using (FileStream fs = File.Open(#"C:\Users\file.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BinaryReader br = new BinaryReader(fs))
{
//64bytes == 512bit
//determines if the last position to use is inside your stream, or if the last position is the end of the stream.
int bytesToRead = startingByte + 64 > br.BaseStream.Length ? (int)br.BaseStream.Length - startingByte : 64;
//move your stream to the given possition
br.BaseStream.Seek(startingByte, SeekOrigin.Begin);
//populates databuffer with the given bytes
byte[] dataBuffer = br.ReadBytes(bytesToRead);
//This method will migrate from our temporal databuffer to the numbers array.
TransformBufferArrayToNumbers(startingByte, dataBuffer, numbers);
//recursive call to the same
if (startingByte + bytesToRead < fs.Length)
Value(startingByte + bytesToRead, numbers);
}
static void TransformBufferArrayToNumbers(int startingByte, byte[] dataBuffer, byte[] numbers)
{
for (var i = 0; i < dataBuffer.Length; i++)
{
numbers[startingByte + i] = dataBuffer[i];
}
}
}
Also, be careful with the byte[9999] as you are limiting the characters you can get, if that's a hardcoded limit, I will add also that information on the if that determines the recursive call.

#TiGreX
public static List<byte> ByteValueCaller()
{
List<byte> numbers = new List<byte>();
GetValue(0, numbers);
return numbers;
}
public static void GetValue(int startingByte, List<byte> numbers)
{
using (FileStream fs = File.Open(#"C:\Users\file1.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BinaryReader br = new BinaryReader(fs))
{
//64bytes == 512bit
//determines if the last position to use is inside your stream, or if the last position is the end of the stream.
int bytesToRead = startingByte + 64 > br.BaseStream.Length ? (int)br.BaseStream.Length - startingByte : 64;
//move your stream to the given possition
br.BaseStream.Seek(startingByte, SeekOrigin.Begin);
//populates databuffer with the given bytes
byte[] dataBuffer = br.ReadBytes(bytesToRead);
numbers.AddRange(dataBuffer);
//recursive call to the same
if (startingByte + bytesToRead < fs.Length)
GetValue(startingByte + bytesToRead, numbers);
}
}

Related

C# FileStream.Read doesn't read last block

I read binary file to hex by block.
It is diffrent when I use FileStream.Read and File.ReadAllBytes
FileSteram.Read
int limit = 0;
if (openFileDlg.FileName.Length > 0)
{
fileName = openFileDlg.FileName;
FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
fsLen = (int)fs.Length;
int count = 0;
limit = 100;
byte[] read_buff = new byte[limit];
StringBuilder sb = new StringBuilder();
while ( (count = fs.Read(read_buff, 0, limit)) > 0)
{
foreach (byte b in read_buff)
{
sb.Append(Convert.ToString(b, 16).PadLeft(2, '0'));
}
}
rtxb_bin.AppendText(sb.ToString() + "\n");
}
File.ReadAllBytes
if (openFileDlg.FileName.Length > 0)
{
fileName = openFileDlg.FileName;
byte[] fileBytes = File.ReadAllBytes(fileName);
StringBuilder sb2 = new StringBuilder();
foreach (byte b2 in fileBytes)
{
sb2.Append(Convert.ToString(b2, 16).PadLeft(2, '0'));
}
rtxb_allbin.AppendText(sb2.ToString());
}
case 1, reasult is ...
........04c0020f00452a00421346108129844f2138448500208020250405250043188510812e0
and case 2 is
.......04c0020f00452a00421346108129844f2138448500208020250405250043188510812e044f212cc48120c24125404f2069c2c0008bff35f8f401efbd17047
FileStream.Read doesn't read after '12e0'
'44f212cc48120c24125404f2069c2c0008bff35f8f401efbd17047' is missing
How can I read all bytes using FileStream.Read?
Why FileStream.Read doesn't read last block?

Most likely it appears to you that it does not read last block. Suppose you have file of length 102. First iteration of you loop reads first 100 bytes, all is fine. But what happens on second (last) one? You read two bytes into read_buff, which is of length 100. Now that buffer contains 2 bytes of last block and 98 bytes of previous (first) block, because Read doesn't clear the buffer. Then you proceed with:
foreach (byte b in read_buff)
{
sb.Append(Convert.ToString(b, 16).PadLeft(2, '0'));
}
In result, sb has 100 bytes of first block, 2 bytes of last block, and then again 98 bytes of first block. If you don't look too closely, it might appear that it just skipped last block, while in reality it duplicated part of the previous one.
To fix, use count (indicating how much bytes were really read into the buffer) to work only with valid part of read_buff:
for (int i = 0; i < count; i++) {
sb.Append(Convert.ToString(read_buff[i], 16).PadLeft(2, '0'));
}

You need update offset and count.
Sintaxis
public override int Read(
byte[] array,
int offset,
int count
)
Example
public static byte[] ReadFile(string filePath)
{
byte[] buffer;
FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
try
{
int length = (int)fileStream.Length; // get file length
buffer = new byte[length]; // create buffer
int count; // actual number of bytes read
int sum = 0; // total number of bytes read
// read until Read method returns 0 (end of the stream has been reached)
while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
sum += count; // sum is a buffer offset for next reading
}
finally
{
fileStream.Close();
}
return buffer;
}
Reference

public static void ReadAndProcessLargeFile(string theFilename, long whereToStartReading = 0)
{
FileInfo info = new FileInfo(theFilename);
long fileLength = info.Length;
long timesToRead = (fileLength / megabyte);
long ctr = 0;
long timesRead = 0;
FileStream fileStram = new FileStream(theFilename, FileMode.Open, FileAccess.Read);
using (fileStram)
{
byte[] buffer = new byte[megabyte];
fileStram.Seek(whereToStartReading, SeekOrigin.Begin);
int bytesRead = 0;
//bytesRead = fileStram.Read(buffer, 0, megabyte);
//ctr = ctr + 1;
while ((bytesRead = fileStram.Read(buffer, 0, megabyte)) > 0)
{
ProcessChunk(buffer, bytesRead);
buffer = new byte[megabyte]; // This solves last read prob
}
}
}
private static void ProcessChunk(byte[] buffer, int bytesRead)
{
// Do the processing here
string utfString = Encoding.UTF8.GetString(buffer, 0, bytesRead);
Console.Write(utfString);
}

FileStream Seek fails on large files at second call

I'm working with large files , beginning from 10Gb. I'm loading the parts of the file in the memory for processing. Following code works fine for smaller files (700Mb)
byte[] byteArr = new byte[layerPixelCount];
using (FileStream fs = File.OpenRead(recFileName))
{
using (BinaryReader br = new BinaryReader(fs))
{
fs.Seek(offset, SeekOrigin.Begin);
for (int i = 0; i < byteArr.Length; i++)
{
byteArr[i] = (byte)(br.ReadUInt16() / 256);
}
}
}
After opening a 10Gb file, the first run of this function is OK. But the second Seek() throws an IO exception:
An attempt was made to move the file pointer before the beginning of the file.
The numbers are:
fs.Length = 11998628352
offset = 4252580352
byteArr.Length = 7746048
I assumed that GC didn't collect the closed fs reference before the second call and tried
GC.Collect();
GC.WaitForPendingFinalizers();
but no luck.
Any help is apreciated

I'm guessing it's because either your signed integer indexer or offset is rolling over to negative values. Try declaring offset and i as long.
//Offest is now long
long offset = 4252580352;
byte[] byteArr = new byte[layerPixelCount];
using (FileStream fs = File.OpenRead(recFileName))
{
using (BinaryReader br = new BinaryReader(fs))
{
fs.Seek(offset, SeekOrigin.Begin);
for (long i = 0; i < byteArr.Length; i++)
{
byteArr[i] = (byte)(br.ReadUInt16() / 256);
}
}
}

My following written code logic is appropriate with large files beyond 4GB. The key issue to notice is the LONG data type used with the SEEK method. As a LONG is able to point beyond 2^32 data boundaries. In this example, the code is processing first processing the large file in chunks of 1GB, after the large whole 1GB chunks are processed, the left over (<1GB) bytes are processed. I use this code with calculating the CRC of files beyond the 4GB size. (using https://crc32c.machinezoo.com/ for the crc32c calculation in this example)
private uint Crc32CAlgorithmBigCrc(string fileName)
{
uint hash = 0;
byte[] buffer = null;
FileInfo fileInfo = new FileInfo(fileName);
long fileLength = fileInfo.Length;
int blockSize = 1024000000;
decimal div = fileLength / blockSize;
int blocks = (int)Math.Floor(div);
int restBytes = (int)(fileLength - (blocks * blockSize));
long offsetFile = 0;
uint interHash = 0;
Crc32CAlgorithm Crc32CAlgorithm = new Crc32CAlgorithm();
bool firstBlock = true;
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
buffer = new byte[blockSize];
using (BinaryReader br = new BinaryReader(fs))
{
while (blocks > 0)
{
blocks -= 1;
fs.Seek(offsetFile, SeekOrigin.Begin);
buffer = br.ReadBytes(blockSize);
if (firstBlock)
{
firstBlock = false;
interHash = Crc32CAlgorithm.Compute(buffer);
hash = interHash;
}
else
{
hash = Crc32CAlgorithm.Append(interHash, buffer);
}
offsetFile += blockSize;
}
if (restBytes > 0)
{
Array.Resize(ref buffer, restBytes);
fs.Seek(offsetFile, SeekOrigin.Begin);
buffer = br.ReadBytes(restBytes);
hash = Crc32CAlgorithm.Append(interHash, buffer);
}
buffer = null;
}
}
//MessageBox.Show(hash.ToString());
//MessageBox.Show(hash.ToString("X"));
return hash;
}

Declaring byte array for getting bytes from a file

Basically I'm reading all the bytes from a file into a byte array using stream reader.
The array I have declared looks like this : byte[] array = new byte[256];
The size of the array 256 can read the whole bytes from the file? Saying that a file has 500 bytes instead of 256?
Or the each element from the array has the size 256 bytes?

Just use
byte[] byteData = System.IO.File.ReadAllBytes(fileName);
and then you can find out how long the file was by looking at the byteData.Length property.

You could use File.ReadAllBytes instead:
byte[] fileBytes = File.ReadAllBytes(path);
or if you just want to know the size, with a FileInfo object:
FileInfo f = new FileInfo(path);
long s1 = f.Length;
Edit: If you want to it "in a classical way" as commented:
byte[] array;
using (FileStream fileStream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
{
int num = 0;
long length = fileStream.Length;
if (length > 2147483647L)
{
throw new ArgumentException("File is greater than 2GB, hence it is too large!", "path");
}
int i = (int)length;
array = new byte[i];
while (i > 0)
{
int num2 = fileStream.Read(array, num, i);
num += num2;
i -= num2;
}
}
(reflected via ILSpy)

An elegant way to consume (all bytes of a) BinaryReader?

Is there an elegant to emulate the StreamReader.ReadToEnd method with BinaryReader? Perhaps to put all the bytes into a byte array?
I do this:
read1.ReadBytes((int)read1.BaseStream.Length);
...but there must be a better way.

Original Answer (Read Update Below!)
Simply do:
byte[] allData = read1.ReadBytes(int.MaxValue);
The documentation says that it will read all bytes until the end of the stream is reached.
Update
Although this seems elegant, and the documentation seems to indicate that this would work, the actual implementation (checked in .NET 2, 3.5, and 4) allocates a full-size byte array for the data, which will probably cause an OutOfMemoryException on a 32-bit system.
Therefore, I would say that actually there isn't an elegant way.
Instead, I would recommend the following variation of #iano's answer. This variant doesn't rely on .NET 4:
Create an extension method for BinaryReader (or Stream, the code is the same for either).
public static byte[] ReadAllBytes(this BinaryReader reader)
{
const int bufferSize = 4096;
using (var ms = new MemoryStream())
{
byte[] buffer = new byte[bufferSize];
int count;
while ((count = reader.Read(buffer, 0, buffer.Length)) != 0)
ms.Write(buffer, 0, count);
return ms.ToArray();
}
}

There is not an easy way to do this with BinaryReader. If you don't know the count you need to read ahead of time, a better bet is to use MemoryStream:
public byte[] ReadAllBytes(Stream stream)
{
using (var ms = new MemoryStream())
{
stream.CopyTo(ms);
return ms.ToArray();
}
}
To avoid the additional copy when calling ToArray(), you could instead return the Position and buffer, via GetBuffer().

To copy the content of a stream to another, I've solved reading "some" bytes until the end of the file is reached:
private const int READ_BUFFER_SIZE = 1024;
using (BinaryReader reader = new BinaryReader(responseStream))
{
using (BinaryWriter writer = new BinaryWriter(File.Open(localPath, FileMode.Create)))
{
int byteRead = 0;
do
{
byte[] buffer = reader.ReadBytes(READ_BUFFER_SIZE);
byteRead = buffer.Length;
writer.Write(buffer);
byteTransfered += byteRead;
} while (byteRead == READ_BUFFER_SIZE);
}
}

Had the same problem.
First, get the file's size using FileInfo.Length.
Next, create a byte array and set its value to BinaryReader.ReadBytes(FileInfo.Length).
e.g.
var size = new FileInfo(yourImagePath).Length;
byte[] allBytes = yourReader.ReadBytes(System.Convert.ToInt32(size));

Another approach to this problem is to use C# extension methods:
public static class StreamHelpers
{
public static byte[] ReadAllBytes(this BinaryReader reader)
{
// Pre .Net version 4.0
const int bufferSize = 4096;
using (var ms = new MemoryStream())
{
byte[] buffer = new byte[bufferSize];
int count;
while ((count = reader.Read(buffer, 0, buffer.Length)) != 0)
ms.Write(buffer, 0, count);
return ms.ToArray();
}
// .Net 4.0 or Newer
using (var ms = new MemoryStream())
{
stream.CopyTo(ms);
return ms.ToArray();
}
}
}
Using this approach will allow for both reusable as well as readable code.

I use this, which utilizes the underlying BaseStream property to give you the length info you need. It keeps things nice and simple.
Below are three extension methods on BinaryReader:
The first reads from wherever the stream's current position is to the end
The second reads the entire stream in one go
The third utilizes the Range type to specify the subset of data you are interested in.
public static class BinaryReaderExtensions {
public static byte[] ReadBytesToEnd(this BinaryReader binaryReader) {
var length = binaryReader.BaseStream.Length - binaryReader.BaseStream.Position;
return binaryReader.ReadBytes((int)length);
}
public static byte[] ReadAllBytes(this BinaryReader binaryReader) {
binaryReader.BaseStream.Position = 0;
return binaryReader.ReadBytes((int)binaryReader.BaseStream.Length);
}
public static byte[] ReadBytes(this BinaryReader binaryReader, Range range) {
var (offset, length) = range.GetOffsetAndLength((int)binaryReader.BaseStream.Length);
binaryReader.BaseStream.Position = offset;
return binaryReader.ReadBytes(length);
}
}
Using them is then trivial and clear...
// 1 - Reads everything in as a byte array
var rawBytes = myBinaryReader.ReadAllBytes();
// 2 - Reads a string, then reads the remaining data as a byte array
var someString = myBinaryReader.ReadString();
var rawBytes = myBinaryReader.ReadBytesToEnd();
// 3 - Uses a range to read the last 44 bytes
var rawBytes = myBinaryReader.ReadBytes(^44..);

Compare binary files in C#

I want to compare two binary files. One of them is already stored on the server with a pre-calculated CRC32 in the database from when I stored it originally.
I know that if the CRC is different, then the files are definitely different. However, if the CRC is the same, I don't know that the files are. So, I'm looking for a nice efficient way of comparing the two streams: one from the posted file and one from the file system.
I'm not an expert on streams, but I'm well aware that I could easily shoot myself in the foot here as far as memory usage is concerned.

static bool FileEquals(string fileName1, string fileName2)
{
// Check the file size and CRC equality here.. if they are equal...
using (var file1 = new FileStream(fileName1, FileMode.Open))
using (var file2 = new FileStream(fileName2, FileMode.Open))
return FileStreamEquals(file1, file2);
}
static bool FileStreamEquals(Stream stream1, Stream stream2)
{
const int bufferSize = 2048;
byte[] buffer1 = new byte[bufferSize]; //buffer size
byte[] buffer2 = new byte[bufferSize];
while (true) {
int count1 = stream1.Read(buffer1, 0, bufferSize);
int count2 = stream2.Read(buffer2, 0, bufferSize);
if (count1 != count2)
return false;
if (count1 == 0)
return true;
// You might replace the following with an efficient "memcmp"
if (!buffer1.Take(count1).SequenceEqual(buffer2.Take(count2)))
return false;
}
}

I sped up the "memcmp" by using a Int64 compare in a loop over the read stream chunks. This reduced time to about 1/4.
private static bool StreamsContentsAreEqual(Stream stream1, Stream stream2)
{
const int bufferSize = 2048 * 2;
var buffer1 = new byte[bufferSize];
var buffer2 = new byte[bufferSize];
while (true)
{
int count1 = stream1.Read(buffer1, 0, bufferSize);
int count2 = stream2.Read(buffer2, 0, bufferSize);
if (count1 != count2)
{
return false;
}
if (count1 == 0)
{
return true;
}
int iterations = (int)Math.Ceiling((double)count1 / sizeof(Int64));
for (int i = 0; i < iterations; i++)
{
if (BitConverter.ToInt64(buffer1, i * sizeof(Int64)) != BitConverter.ToInt64(buffer2, i * sizeof(Int64)))
{
return false;
}
}
}
}

This is how I would do it if you didn't want to rely on crc:
/// <summary>
/// Binary comparison of two files
/// </summary>
/// <param name="fileName1">the file to compare</param>
/// <param name="fileName2">the other file to compare</param>
/// <returns>a value indicateing weather the file are identical</returns>
public static bool CompareFiles(string fileName1, string fileName2)
{
FileInfo info1 = new FileInfo(fileName1);
FileInfo info2 = new FileInfo(fileName2);
bool same = info1.Length == info2.Length;
if (same)
{
using (FileStream fs1 = info1.OpenRead())
using (FileStream fs2 = info2.OpenRead())
using (BufferedStream bs1 = new BufferedStream(fs1))
using (BufferedStream bs2 = new BufferedStream(fs2))
{
for (long i = 0; i < info1.Length; i++)
{
if (bs1.ReadByte() != bs2.ReadByte())
{
same = false;
break;
}
}
}
}
return same;
}

The accepted answer had an error that was pointed out, but never corrected: stream read calls are not guaranteed to return all bytes requested.
BinaryReader ReadBytes calls are guaranteed to return as many bytes as requested unless the end of the stream is reached first.
The following code takes advantage of BinaryReader to do the comparison:
static private bool FileEquals(string file1, string file2)
{
using (FileStream s1 = new FileStream(file1, FileMode.Open, FileAccess.Read, FileShare.Read))
using (FileStream s2 = new FileStream(file2, FileMode.Open, FileAccess.Read, FileShare.Read))
using (BinaryReader b1 = new BinaryReader(s1))
using (BinaryReader b2 = new BinaryReader(s2))
{
while (true)
{
byte[] data1 = b1.ReadBytes(64 * 1024);
byte[] data2 = b2.ReadBytes(64 * 1024);
if (data1.Length != data2.Length)
return false;
if (data1.Length == 0)
return true;
if (!data1.SequenceEqual(data2))
return false;
}
}
}

if you change that crc to a sha1 signature the chances of it being different but with the same signature are astronomicly small

You can check the length and dates of the two files even before checking the CRC to possibly avoid the CRC check.
But if you have to compare the entire file contents, one neat trick I've seen is reading the bytes in strides equal to the bitness of the CPU. For example, on a 32 bit PC, read 4 bytes at a time and compare them as int32's. On a 64 bit PC you can read 8 bytes at a time. This is roughly 4 or 8 times as fast as doing it byte by byte. You also would probably wanna use an unsafe code block so that you could use pointers instead of doing a bunch of bit shifting and OR'ing to get the bytes into the native int sizes.
You can use IntPtr.Size to determine the ideal size for the current processor architecture.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

streamreader/buffered stream C# - c#

Related

C# FileStream.Read doesn't read last block

FileStream Seek fails on large files at second call

Declaring byte array for getting bytes from a file

An elegant way to consume (all bytes of a) BinaryReader?

Compare binary files in C#

Categories

Resources