C# FileStream.Read doesn't read last block - c#

I read binary file to hex by block.
It is diffrent when I use FileStream.Read and File.ReadAllBytes
FileSteram.Read
int limit = 0;
if (openFileDlg.FileName.Length > 0)
{
fileName = openFileDlg.FileName;
FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
fsLen = (int)fs.Length;
int count = 0;
limit = 100;
byte[] read_buff = new byte[limit];
StringBuilder sb = new StringBuilder();
while ( (count = fs.Read(read_buff, 0, limit)) > 0)
{
foreach (byte b in read_buff)
{
sb.Append(Convert.ToString(b, 16).PadLeft(2, '0'));
}
}
rtxb_bin.AppendText(sb.ToString() + "\n");
}
File.ReadAllBytes
if (openFileDlg.FileName.Length > 0)
{
fileName = openFileDlg.FileName;
byte[] fileBytes = File.ReadAllBytes(fileName);
StringBuilder sb2 = new StringBuilder();
foreach (byte b2 in fileBytes)
{
sb2.Append(Convert.ToString(b2, 16).PadLeft(2, '0'));
}
rtxb_allbin.AppendText(sb2.ToString());
}
case 1, reasult is ...
........04c0020f00452a00421346108129844f2138448500208020250405250043188510812e0
and case 2 is
.......04c0020f00452a00421346108129844f2138448500208020250405250043188510812e044f212cc48120c24125404f2069c2c0008bff35f8f401efbd17047
FileStream.Read doesn't read after '12e0'
'44f212cc48120c24125404f2069c2c0008bff35f8f401efbd17047' is missing
How can I read all bytes using FileStream.Read?
Why FileStream.Read doesn't read last block?

Most likely it appears to you that it does not read last block. Suppose you have file of length 102. First iteration of you loop reads first 100 bytes, all is fine. But what happens on second (last) one? You read two bytes into read_buff, which is of length 100. Now that buffer contains 2 bytes of last block and 98 bytes of previous (first) block, because Read doesn't clear the buffer. Then you proceed with:
foreach (byte b in read_buff)
{
sb.Append(Convert.ToString(b, 16).PadLeft(2, '0'));
}
In result, sb has 100 bytes of first block, 2 bytes of last block, and then again 98 bytes of first block. If you don't look too closely, it might appear that it just skipped last block, while in reality it duplicated part of the previous one.
To fix, use count (indicating how much bytes were really read into the buffer) to work only with valid part of read_buff:
for (int i = 0; i < count; i++) {
sb.Append(Convert.ToString(read_buff[i], 16).PadLeft(2, '0'));
}

You need update offset and count.
Sintaxis
public override int Read(
byte[] array,
int offset,
int count
)
Example
public static byte[] ReadFile(string filePath)
{
byte[] buffer;
FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
try
{
int length = (int)fileStream.Length; // get file length
buffer = new byte[length]; // create buffer
int count; // actual number of bytes read
int sum = 0; // total number of bytes read
// read until Read method returns 0 (end of the stream has been reached)
while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
sum += count; // sum is a buffer offset for next reading
}
finally
{
fileStream.Close();
}
return buffer;
}
Reference

public static void ReadAndProcessLargeFile(string theFilename, long whereToStartReading = 0)
{
FileInfo info = new FileInfo(theFilename);
long fileLength = info.Length;
long timesToRead = (fileLength / megabyte);
long ctr = 0;
long timesRead = 0;
FileStream fileStram = new FileStream(theFilename, FileMode.Open, FileAccess.Read);
using (fileStram)
{
byte[] buffer = new byte[megabyte];
fileStram.Seek(whereToStartReading, SeekOrigin.Begin);
int bytesRead = 0;
//bytesRead = fileStram.Read(buffer, 0, megabyte);
//ctr = ctr + 1;
while ((bytesRead = fileStram.Read(buffer, 0, megabyte)) > 0)
{
ProcessChunk(buffer, bytesRead);
buffer = new byte[megabyte]; // This solves last read prob
}
}
}
private static void ProcessChunk(byte[] buffer, int bytesRead)
{
// Do the processing here
string utfString = Encoding.UTF8.GetString(buffer, 0, bytesRead);
Console.Write(utfString);
}

Related

Copy files with a chunked stream causes the files to be different sizes due to last read

Could someone be kind enough to explain how I get my files the same size after copying it using a chunked stream? I presume it is because the last chunk still has a buffer size of 2048 so it is putting empty bytes at the end, but I am unsure how I would adjust the last read?
Original size: 15.1 MB (15,835,745 bytes)
New size: 15.1 MB (15,837,184 bytes)
static FileStream incomingFile;
static void Main(string[] args)
{
incomingFile = new FileStream(
#"D:\temp\" + Guid.NewGuid().ToString() + ".png",
FileMode.Create,
FileAccess.Write);
FileCopy();
}
private static void FileCopy()
{
using (Stream source = File.OpenRead(#"D:\temp\test.png"))
{
byte[] buffer = new byte[2048];
var chunkCount = source.Length;
for (int i = 0; i < (chunkCount / 2048) + 1; i++)
{
source.Position = i * 2048;
source.Read(buffer, 0, 2048);
WriteFile(buffer);
}
incomingFile.Close();
}
}
private static void WriteFile(byte[] buffer)
{
incomingFile.Write(buffer, 0, buffer.Length);
}
The last buffer read does not necessary contain exactly 2048 bytes (it can well be incomplete). Imagine, we have a file of 5000 bytes; in this case will read 3 chunks: 2 complete and 1 incomplete
2048
2048
904 the last incomplete buffer
Code:
using (Stream source = File.OpenRead(#"D:\temp\test.png"))
{
byte[] buffer = new byte[2048];
var chunkCount = source.Length;
for (int i = 0; i < (chunkCount / 2048) + 1; i++)
{
source.Position = i * 2048;
// size - number of actually bytes read
int size = source.Read(buffer, 0, 2048);
// if we bytes to write, do it
if (size > 0)
WriteFile(buffer, size);
}
incomingFile.Close();
}
...
private static void WriteFile(byte[] buffer, int size = -1)
{
incomingFile.Write(buffer, 0, size < 0 ? buffer.Length : size);
}
In your case you write 15837184 == 7733 * 2048 bytes (7733 complete chunks) when you should write 15835745 == 7732 * 2048 + 609 bytes - 7732 complete chunks and the last incomplete one of 609 bytes

C# split byte array from file

Hello I'm doing an encryption algorithm which reads bytes from file (any type) and outputs them into a file. The problem is my encryption program takes only blocks of 16 bytes so if the file is bigger it has to be split into blocks of 16, or if there's a way to read 16 bytes from the file each time it's fine.
The algorithm is working fine with hard coded input of 16 bytes. The ciphered result has to be saved in a list or array because it has to be deciphered the same way later. I can't post all my program but here's what I do in main so far and cannot get results
static void Main(String[] args)
{
byte[] bytes = File.ReadAllBytes("path to file");
var stream = new StreamReader(new MemoryStream(bytes));
byte[] cipherText = new byte[16];
byte[] decipheredText = new byte[16];
Console.WriteLine("\nThe message is: ");
Console.WriteLine(stream.ReadToEnd());
AES a = new AES(keyInput);
var list1 = new List<byte[]>();
for (int i = 0; i < bytes.Length; i+=16)
{
a.Cipher(bytes, cipherText);
list1.Add(cipherText);
}
Console.WriteLine("\nThe resulting ciphertext is: ");
foreach (byte[] b in list1)
{
ToBytes(b);
}
}
I know that my loops always add the first 16 bytes from the byte array but I tried many ways and nothing work. It won't let me index the bytes array or copy an item to a temp variable like temp = bytes[i]. The ToBytes method is irrelevant, it just prints the elements as bytes.
I would like to recommend you to change the interface for your Cipher() method: instead of passing the entire array, it would be better to pass the source and destination arrays and offset - block by block encryption.
Pseudo-code is below.
void Cipher(byte[] source, int srcOffset, byte[] dest, int destOffset)
{
// Cipher these bytes from (source + offset) to (source + offset + 16),
// write the cipher to (dest + offset) to (dest + offset + 16)
// Also I'd recommend to check that the source and dest Length is less equal to (offset + 16)!
}
Usage:
For small files (one memory allocation for destination buffer, block by block encryption):
// You can allocate the entire destination buffer before encryption!
byte[] sourceBuffer = File.ReadAllBytes("path to file");
byte[] destBuffer = new byte[sourceBuffer.Length];
// Encrypt each block.
for (int offset = 0; i < sourceBuffer.Length; offset += 16)
{
Cipher(sourceBuffer, offset, destBuffer, offset);
}
So, the main advantage of this approach - it elimitates additional memory allocations: the destination array is allocated at once. There is also no copy-memory operations.
For files of any size (streams, block by block encryption):
byte[] inputBlock = new byte[16];
byte[] outputBlock = new byte[16];
using (var inputStream = File.OpenRead("input path"))
using (var outputStream = File.Create("output path"))
{
int bytesRead;
while ((bytesRead = inputStream.Read(inputBlock, 0, inputBlock.Length)) > 0)
{
if (bytesRead < 16)
{
// Throw or use padding technique.
throw new InvalidOperationException("Read block size is not equal to 16 bytes");
// Fill the remaining bytes of input block with some bytes.
// This operation for last block is called "padding".
// See http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Padding
}
Cipher(inputBlock, 0, outputBlock, 0);
outputStream.Write(outputBlock, 0, outputBlock.Length);
}
}
No need to read the whole mess into memory if you can only process it a bit at a time...
var filename = #"c:\temp\foo.bin";
using(var fileStream = new FileStream(filename, FileMode.Open))
{
var buffer = new byte[16];
var bytesRead = 0;
while((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) > 0)
{
// do whatever you need to with the next 16-byte block
Console.WriteLine("Read {0} bytes: {1}",
bytesRead,
string.Join(",", buffer));
}
}
You can use Array.Copy
byte[] temp = new byte[16];
Array.Copy(bytes, i, temp, 0, 16);

FileStream Seek fails on large files at second call

I'm working with large files , beginning from 10Gb. I'm loading the parts of the file in the memory for processing. Following code works fine for smaller files (700Mb)
byte[] byteArr = new byte[layerPixelCount];
using (FileStream fs = File.OpenRead(recFileName))
{
using (BinaryReader br = new BinaryReader(fs))
{
fs.Seek(offset, SeekOrigin.Begin);
for (int i = 0; i < byteArr.Length; i++)
{
byteArr[i] = (byte)(br.ReadUInt16() / 256);
}
}
}
After opening a 10Gb file, the first run of this function is OK. But the second Seek() throws an IO exception:
An attempt was made to move the file pointer before the beginning of the file.
The numbers are:
fs.Length = 11998628352
offset = 4252580352
byteArr.Length = 7746048
I assumed that GC didn't collect the closed fs reference before the second call and tried
GC.Collect();
GC.WaitForPendingFinalizers();
but no luck.
Any help is apreciated
I'm guessing it's because either your signed integer indexer or offset is rolling over to negative values. Try declaring offset and i as long.
//Offest is now long
long offset = 4252580352;
byte[] byteArr = new byte[layerPixelCount];
using (FileStream fs = File.OpenRead(recFileName))
{
using (BinaryReader br = new BinaryReader(fs))
{
fs.Seek(offset, SeekOrigin.Begin);
for (long i = 0; i < byteArr.Length; i++)
{
byteArr[i] = (byte)(br.ReadUInt16() / 256);
}
}
}
My following written code logic is appropriate with large files beyond 4GB. The key issue to notice is the LONG data type used with the SEEK method. As a LONG is able to point beyond 2^32 data boundaries. In this example, the code is processing first processing the large file in chunks of 1GB, after the large whole 1GB chunks are processed, the left over (<1GB) bytes are processed. I use this code with calculating the CRC of files beyond the 4GB size. (using https://crc32c.machinezoo.com/ for the crc32c calculation in this example)
private uint Crc32CAlgorithmBigCrc(string fileName)
{
uint hash = 0;
byte[] buffer = null;
FileInfo fileInfo = new FileInfo(fileName);
long fileLength = fileInfo.Length;
int blockSize = 1024000000;
decimal div = fileLength / blockSize;
int blocks = (int)Math.Floor(div);
int restBytes = (int)(fileLength - (blocks * blockSize));
long offsetFile = 0;
uint interHash = 0;
Crc32CAlgorithm Crc32CAlgorithm = new Crc32CAlgorithm();
bool firstBlock = true;
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
buffer = new byte[blockSize];
using (BinaryReader br = new BinaryReader(fs))
{
while (blocks > 0)
{
blocks -= 1;
fs.Seek(offsetFile, SeekOrigin.Begin);
buffer = br.ReadBytes(blockSize);
if (firstBlock)
{
firstBlock = false;
interHash = Crc32CAlgorithm.Compute(buffer);
hash = interHash;
}
else
{
hash = Crc32CAlgorithm.Append(interHash, buffer);
}
offsetFile += blockSize;
}
if (restBytes > 0)
{
Array.Resize(ref buffer, restBytes);
fs.Seek(offsetFile, SeekOrigin.Begin);
buffer = br.ReadBytes(restBytes);
hash = Crc32CAlgorithm.Append(interHash, buffer);
}
buffer = null;
}
}
//MessageBox.Show(hash.ToString());
//MessageBox.Show(hash.ToString("X"));
return hash;
}

Processing C# filestream input in WHILE loop causing execution time error

I have a C# console app that I'm trying to create that processes all the files in a given directory and writes output to another given directory. I want to process the input files X bytes at a time.
namespace FileConverter
{
class Program
{
static void Main(string[] args)
{
string srcFolder = args[0];
string destFolder = args[1];
string[] srcFiles = Directory.GetFiles(srcFolder);
for (int s = 0; s < srcFiles.Length; s++)
{
byte[] fileBuffer;
int numBytesRead = 0;
int readBuffer = 10000;
FileStream srcStream = new FileStream(srcFiles[s], FileMode.Open, FileAccess.Read);
int fileLength = (int)srcStream.Length;
string destFile = destFolder + "\\" + Path.GetFileName(srcFiles[s]) + "-processed";
FileStream destStream = new FileStream(destFile, FileMode.OpenOrCreate, FileAccess.Write);
//Read and process the source file by some chunk of bytes at a time
while (numBytesRead < fileLength)
{
fileBuffer = new byte[readBuffer];
//Read some bytes into the fileBuffer
//TODO: This doesn't work on subsequent blocks
int n = srcStream.Read(fileBuffer, numBytesRead, readBuffer);
//If we didn't read anything, there's no more to process
if (n == 0)
break;
//Process the fileBuffer
for (int i = 0; i < fileBuffer.Length; i++)
{
//Process each byte in the array here
}
//Write data
destStream.Write(fileBuffer, numBytesRead, readBuffer);
numBytesRead += readBuffer;
}
srcStream.Close();
destStream.Close();
}
}
}
}
I'm running into an error at execution time at:
//Read some bytes into the fileBuffer
//TODO: This doesn't work on subsequent blocks
int n = srcStream.Read(fileBuffer, numBytesRead, readBuffer);
I don't want to load the entire file into memory, as it could possibly be many gigabytes in size. I really want to be able to read some number of bytes, process them, write them out to a file, and then read in the next X bytes and repeat.
It gets through one iteration of the loop, and then dies on the second. The error I get is:
"Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."
The sample file I'm working with is about 32k.
Can anyone tell me what I'm doing wrong here?
The second parameter to Read is not the offset into the file - it is the offset into the buffer at which to start writing data. So just pass 0.
Also, don't assume the buffer is filled each time: you should only process "n" bytes from the buffer. And the buffer should be reused between iterations.
If you need to read exactly a number of bytes:
static void ReadOrThrow(Stream source, byte[] buffer, int count) {
int read, offset = 0;
while(count > 0 && (read = source.Read(buffer, offset, count)) > 0) {
offset += read;
count -= read;
}
if(count != 0) throw new EndOfStreamException();
}
Note that Write works similarly, so you need to pass 0 as the offset and n as the count.
It should be
destStream.Write(fileBuffer, numBytesRead, n);
numBytesRead += n;
because n is the actual number of bytes that was read

Problem in splitting a file

int bufferlength = 12488;
int pointer = 1;
int offset = 0;
int length = 0;
FileStream fstwrite = new FileStream("D:\\Movie.wmv", FileMode.Create);
while (pointer != 0)
{
byte[] buff = new byte[bufferlength];
FileStream fst = new FileStream("E:\\Movie.wmv", FileMode.Open);
pointer = fst.Read(buff, 0, bufferlength);
fst.Close();
fstwrite.Write(buff, offset , pointer);
offset += pointer;
}
I used the above code for splitting a file and place it in other drive.Im not able to set the correct offset and length for this routine can anyone help me to fix this
splitting in the sense ,i split it in "x" kbs and pass it somewhere make the same file in some other location
I find it atlast ,thanks to evry one who gave their valueble responses.
Currently you're always reading from the start of the file... and even if you weren't you'd just be copying the whole file.
Here's some code which will actually split a single file into multiple files:
public static void SplitFile(string inputFile,
string outputPrefix,
int chunkSize)
{
byte[] buffer = new byte[chunkSize];
using (Stream input = File.OpenRead(inputFile))
{
int index = 0;
while (input.Position < input.Length)
{
using (Stream output = File.Create(outputPrefix + index))
{
int chunkBytesRead = 0;
while (chunkBytesRead < chunkSize)
{
int bytesRead = input.Read(buffer,
chunkBytesRead,
chunkSize - chunkBytesRead);
// End of input
if (bytesRead == 0)
{
break;
}
chunkBytesRead += bytesRead;
}
output.Write(buffer, 0, chunkBytesRead);
}
index++;
}
}
}
Your reading bufferlength of bytes. Shouldn't you set the offset like this then?
offset += bufferlength;
Don't open your source file inside the loop, or you'll always read the first chunk.
Open it before the loop, then make sure your offset is applied to the read.

Categories