I'm a little confused about how I should read a large file (> 8 GB) in chunks when each chunk has its own size.
If I know the chunk size, the code looks like this:
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, ProgramOptions.BufferSizeForChunkProcessing))
{
    using (BufferedStream bs = new BufferedStream(fs, ProgramOptions.BufferSizeForChunkProcessing))
    {
        byte[] buffer = new byte[ProgramOptions.BufferSizeForChunkProcessing];
        int byteRead;
        while ((byteRead = bs.Read(buffer, 0, ProgramOptions.BufferSizeForChunkProcessing)) > 0)
        {
            byte[] originalBytes;
            using (MemoryStream mStream = new MemoryStream())
            {
                mStream.Write(buffer, 0, byteRead);
                originalBytes = mStream.ToArray();
            }
        }
    }
}
But imagine I've read a large file in chunks, applied some encoding to each chunk (which changed the chunk's size), and written all the processed chunks to a new file. Now I need to do the reverse operation, but I don't know the exact chunk sizes. My idea: after each chunk is processed, write the new chunk size before the chunk bytes, like this:
Number of block bytes
Block bytes
Number of block bytes
Block bytes
So in that case, the first thing I need to do is read the chunk's header to learn the exact chunk size. I read from and write to the file only as byte arrays. But I have a question: what should the chunk's header look like? Maybe the header has to contain some boundary?
If the file is rigidly structured so that each block of data is preceded by a 32-bit length value, then it is easy to read. The "header" for each block is just the 32-bit length value.
If you want to read such a file, the easiest way is probably to encapsulate the reading into a method that returns IEnumerable<byte[]> like so:
public static IEnumerable<byte[]> ReadChunks(string path)
{
    var lengthBytes = new byte[sizeof(int)];
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        // Loop until we run out of blocks.
        while (true)
        {
            int n = fs.Read(lengthBytes, 0, sizeof(int)); // Read block size.
            if (n == 0) // End of file.
                yield break;
            if (n != sizeof(int))
                throw new InvalidOperationException("Invalid header");
            int blockLength = BitConverter.ToInt32(lengthBytes, 0);
            var buffer = new byte[blockLength];
            n = fs.Read(buffer, 0, blockLength);
            if (n != blockLength)
                throw new InvalidOperationException("Missing data");
            yield return buffer;
        }
    }
}
Then you can use it simply:
foreach (var block in ReadChunks("MyFileName"))
{
    // Process block.
}
Note that you don't need to provide your own buffering.
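For completeness, here's a minimal sketch of the matching writer side, assuming reader and writer agree on endianness (BitConverter uses the machine's native byte order); the method name WriteChunk is just illustrative:
public static void WriteChunk(Stream output, byte[] chunk)
{
    // Write the 32-bit length prefix first, then the chunk bytes themselves.
    byte[] lengthBytes = BitConverter.GetBytes(chunk.Length);
    output.Write(lengthBytes, 0, lengthBytes.Length);
    output.Write(chunk, 0, chunk.Length);
}
No boundary marker is needed: because every block carries its own length, the reader always knows exactly where the next header begins.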
Try this:
public static IEnumerable<byte[]> ReadChunks(string fileName)
{
    const int MAX_BUFFER = 1048576; // 1 MB
    int numBytes;
    using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        long remainBytes = fs.Length;
        int bufferBytes = MAX_BUFFER;
        while (true)
        {
            if (remainBytes <= MAX_BUFFER)
            {
                bufferBytes = (int)remainBytes;
            }
            // Allocate a fresh array for every chunk so callers can safely hold on to
            // previously yielded chunks.
            byte[] filechunk = new byte[bufferBytes];
            if ((numBytes = fs.Read(filechunk, 0, bufferBytes)) > 0)
            {
                remainBytes -= numBytes;
                yield return filechunk;
            }
            else
            {
                break;
            }
        }
    }
}
int n = 0;
string encodeString = string.Empty;
using (FileStream fsSource = new FileStream("test.pdf", FileMode.Open, FileAccess.Read))
{
    byte[] bytes = new byte[count];
    n = fsSource.Read(bytes, offset, count);
    encodeString = System.Convert.ToBase64String(bytes);
}
The above code works fine if I provide offset 0 and length 1024, but the second time, when I provide offset 1024 and length 1024, it returns an error.
My requirement is to get the byte array data from offset to length:
1st chunk = 0-1024
2nd chunk = 1024-2048
...
Last chunk = SomeValue-FileSize
For example, in Node.js, readChunk.sync(file_path, Number(offset), Number(size)) is able to get the byte array of data from offset to length.
public static string ReadFileStreamInChunks()
{
    const int readChunkBufferLength = 1024;
    string filePath = "test.pdf";
    var readChunk = new char[readChunkBufferLength];
    int readChunkLength;
    using (StringWriter sw = new StringWriter())
    using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    using (StreamReader sr = new StreamReader(fs))
    {
        do
        {
            readChunkLength = sr.ReadBlock(readChunk, 0, readChunkBufferLength);
            sw.Write(readChunk, 0, readChunkLength);
        } while (readChunkLength > 0);
        return sw.ToString();
    }
}
Actually, I think your problem is understanding these parameters in your code: count is your chunk size, and offset is where to start writing into your byte array, not a position in the file. So (1) if you want to read from some point in the file through to the end, advance the stream's position by the number of bytes you want to skip; but (2) if you want to read a part of the file from the middle, you shouldn't modify count (that is your chunk size); you control how far to read in the copy loop. Usually it's a do-while loop like the one below (stopPosition stands in for "the value you want"):
long position = 0;
do
{
    // Read bytes from the input stream.
    int bytesRead = request.FileByteStream.Read(buffer, 0, chunkSize);
    if (bytesRead == 0)
    {
        break;
    }
    // Write bytes to the output stream.
    writeStream.Write(buffer, 0, bytesRead);
    position += bytesRead;
    if (position >= stopPosition) // stopPosition: however many bytes you want to copy
        break;
} while (true);
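If what you want is the Node.js readChunk.sync behavior shown in the question, i.e. "give me size bytes starting at offset", a minimal sketch is to Seek in the file first and use the array offset only for filling the buffer (ReadChunkAt is just an illustrative name):
public static byte[] ReadChunkAt(string path, long offset, int size)
{
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        fs.Seek(offset, SeekOrigin.Begin);   // position within the file
        byte[] bytes = new byte[size];
        int total = 0;
        while (total < size)
        {
            // The second argument here indexes the array, not the file.
            int n = fs.Read(bytes, total, size - total);
            if (n == 0)
                break; // reached end of file
            total += n;
        }
        if (total < size)
            Array.Resize(ref bytes, total); // the last chunk may be shorter
        return bytes;
    }
}
Each call then maps to one offset/size pair, e.g. ReadChunkAt("test.pdf", 1024, 1024) for the second chunk.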
Because the maximum size of a byte array is 2 GB, let's say I have a larger file and need to convert it to a byte array. Since I can't hold the whole file in one array, how should I convert it into two?
I tried:
long length = new System.IO.FileInfo(@"c:\a.mp4").Length;
int chunkSize = Convert.ToInt32(length / 2);
byte[] part2;
FileStream fileStream = new FileStream(filepath, FileMode.Open, FileAccess.Read);
try
{
    part2 = new byte[chunkSize]; // create buffer
    fileStream.Read(part2, 0, chunkSize);
}
finally
{
    fileStream.Close();
}
byte[] part3;
fileStream = new FileStream(filepath, FileMode.Open, FileAccess.Read);
try
{
    part3 = new byte[chunkSize]; // create buffer
    fileStream.Read(part3, 5, (int)(length - (long)chunkSize));
}
finally
{
    fileStream.Close();
}
But it's not working. Any ideas?
You can use a FileStream to read, in chunks, a file too large to fit into a single byte array:
const int max = 1024 * 1024;

public void ReadALargeFile(string file, int start = 0)
{
    using (FileStream fileStream = new FileStream(file, FileMode.Open, FileAccess.Read))
    {
        byte[] buffer = new byte[max];
        fileStream.Seek(start, SeekOrigin.Begin);
        // The second argument is the offset into the buffer, not the file, so it stays 0.
        int bytesRead = fileStream.Read(buffer, 0, max);
        while (bytesRead > 0)
        {
            DoSomething(buffer, bytesRead);
            bytesRead = fileStream.Read(buffer, 0, max);
        }
    }
}
If you are working with extremely large files, you should use MemoryMappedFile, which maps a physical file to a memory space:
using (var mmf = MemoryMappedFile.CreateFromFile(@"c:\path\to\big.file"))
{
    using (var accessor = mmf.CreateViewAccessor())
    {
        byte myValue = accessor.ReadByte(someOffset);
        accessor.Write(someOffset, (byte)someValue);
    }
}
See also: MemoryMappedViewAccessor
You can also read/write chunks of the file with the different methods in MemoryMappedViewAccessor.
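For example, a chunked read and write through the view accessor might look like this (a sketch; the 4096-byte chunk size and zero positions are placeholders, and it needs using System.IO.MemoryMappedFiles;):
using (var mmf = MemoryMappedFile.CreateFromFile(@"c:\path\to\big.file"))
using (var accessor = mmf.CreateViewAccessor())
{
    var chunk = new byte[4096];
    // ReadArray copies count bytes starting at the given position in the view.
    int read = accessor.ReadArray(0, chunk, 0, chunk.Length);
    // WriteArray is the symmetric call for writing a chunk back at a position.
    accessor.WriteArray(0, chunk, 0, read);
}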
This was my solution:
long length = new FileInfo(filepath).Length; // file length, as in the question
byte[] part1;
byte[] part2;
bool odd = false;
int chunkSize = Convert.ToInt32(length / 2);
if (length % 2 == 0)
{
    part1 = new byte[chunkSize];
    part2 = new byte[chunkSize];
}
else
{
    part1 = new byte[chunkSize];
    part2 = new byte[chunkSize + 1];
    odd = true;
}
using (FileStream fileStream = new FileStream(filepath, FileMode.Open, FileAccess.Read))
{
    fileStream.Seek(0, SeekOrigin.Begin);
    int bytesRead = fileStream.Read(part1, 0, chunkSize);
    if (odd)
    {
        bytesRead = fileStream.Read(part2, 0, chunkSize + 1);
    }
    else
    {
        bytesRead = fileStream.Read(part2, 0, chunkSize);
    }
}
I'm using the following function to compress (thanks to http://www.dotnetperls.com/):
public static void CompressStringToFile(string fileName, string value)
{
    // A.
    // Write string to temporary file.
    string temp = Path.GetTempFileName();
    File.WriteAllText(temp, value);

    // B.
    // Read file into byte array buffer.
    byte[] b;
    using (FileStream f = new FileStream(temp, FileMode.Open))
    {
        b = new byte[f.Length];
        f.Read(b, 0, (int)f.Length);
    }

    // C.
    // Use GZipStream to write compressed bytes to target file.
    using (FileStream f2 = new FileStream(fileName, FileMode.Create))
    using (GZipStream gz = new GZipStream(f2, CompressionMode.Compress, false))
    {
        gz.Write(b, 0, b.Length);
    }
}
And to decompress:
static byte[] Decompress(byte[] gzip)
{
    // Create a GZIP stream with decompression mode.
    // ... Then create a buffer and write into it while reading from the GZIP stream.
    using (GZipStream stream = new GZipStream(new MemoryStream(gzip), CompressionMode.Decompress))
    {
        const int size = 4096;
        byte[] buffer = new byte[size];
        using (MemoryStream memory = new MemoryStream())
        {
            int count = 0;
            do
            {
                count = stream.Read(buffer, 0, size);
                if (count > 0)
                {
                    memory.Write(buffer, 0, count);
                }
            }
            while (count > 0);
            return memory.ToArray();
        }
    }
}
So my goal is to compress log files, then decompress them in memory and compare the decompressed data to the original file, in order to check that the compression succeeded and that I can open the compressed file successfully.
The problem is that the decompressed data is most of the time bigger than the original file, so my comparison check fails, although the compression probably succeeded.
Any idea why?
By the way, here is how I compare the decompressed data to the original file:
static bool FileEquals(byte[] file1, byte[] file2)
{
    if (file1.Length == file2.Length)
    {
        for (int i = 0; i < file1.Length; i++)
        {
            if (file1[i] != file2[i])
            {
                return false;
            }
        }
        return true;
    }
    return false;
}
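Incidentally, Enumerable.SequenceEqual from System.Linq performs the same comparison, so the check can be shortened to:
static bool FileEquals(byte[] file1, byte[] file2)
{
    // SequenceEqual returns false when lengths differ and otherwise compares elements pairwise.
    return file1.SequenceEqual(file2);
}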
Try this method to compress a file:
public static byte[] Compress(byte[] raw)
{
    using (MemoryStream memory = new MemoryStream())
    {
        using (GZipStream gzip = new GZipStream(memory,
            CompressionMode.Compress, true))
        {
            gzip.Write(raw, 0, raw.Length);
        }
        return memory.ToArray();
    }
}
And this to decompress:
static byte[] Decompress(byte[] gzip)
{
    // Create a GZIP stream with decompression mode.
    // ... Then create a buffer and write into it while reading from the GZIP stream.
    using (GZipStream stream = new GZipStream(new MemoryStream(gzip), CompressionMode.Decompress))
    {
        const int size = 4096;
        byte[] buffer = new byte[size];
        using (MemoryStream memory = new MemoryStream())
        {
            int count = 0;
            do
            {
                count = stream.Read(buffer, 0, size);
                if (count > 0)
                {
                    memory.Write(buffer, 0, count);
                }
            }
            while (count > 0);
            return memory.ToArray();
        }
    }
}
Tell me if it worked. Good luck.
I think you'd be better off with the simplest API call: try Stream.CopyTo(). I can't find the error in your code. If I were working on it, I'd probably make sure everything is getting flushed properly; I can't recall whether GZipStream flushes its output to the FileStream when the using block closes. But then you are also saying that the final file is larger, not smaller.
Anyhow, the best policy in my experience: don't rewrite gotcha-prone code when you don't need to. At least you tested it ;)
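For example, a minimal sketch of the compress/verify round trip with Stream.CopyTo (the file names are placeholders; needs System.IO and System.IO.Compression):
// Compress: stream the plain file through a GZipStream into the target file.
using (var input = File.OpenRead("app.log"))
using (var output = File.Create("app.log.gz"))
using (var gz = new GZipStream(output, CompressionMode.Compress))
{
    input.CopyTo(gz);
}

// Decompress into memory for the comparison step.
byte[] decompressed;
using (var input = File.OpenRead("app.log.gz"))
using (var gz = new GZipStream(input, CompressionMode.Decompress))
using (var memory = new MemoryStream())
{
    gz.CopyTo(memory);
    decompressed = memory.ToArray();
}
Because the original bytes never pass through a string here, the decompressed array should compare equal to File.ReadAllBytes("app.log").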
I have an application that reads string data in from a stream. The string data is typically in English but on occasion it encounters something like 'Jalapeño' and the 'ñ' comes out as '?'. In my implementation I'd prefer to read the stream contents into a byte array but I could get by reading the contents into a string. Any idea what I can do to make this work right?
Current code is as follows:
byte[] data = new byte[len]; // len is known a priori
byte[] temp = new byte[2];
StreamReader sr = new StreamReader(input_stream);
int position = 0;
while (!sr.EndOfStream)
{
    int c = sr.Read();
    temp = System.BitConverter.GetBytes(c);
    data[position] = temp[0];
    position++;
}
input_stream.Close();
sr.Close();
You can pass the encoding to the StreamReader as in:
StreamReader sr = new StreamReader(input_stream, Encoding.UTF8);
However, I understand that Encoding.UTF8 is used by default according to the documentation.
Update
The following reads 'Jalapeño' fine:
byte[] bytes;
using (var stream = new FileStream("input.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    var index = 0;
    var count = (int)stream.Length;
    bytes = new byte[count];
    while (count > 0)
    {
        int n = stream.Read(bytes, index, count);
        if (n == 0)
            throw new EndOfStreamException();
        index += n;
        count -= n;
    }
}
// test
string s = Encoding.UTF8.GetString(bytes);
Console.WriteLine(s);
As does this:
byte[] bytes;
using (var stream = new FileStream("input.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    var reader = new StreamReader(stream);
    string text = reader.ReadToEnd();
    bytes = Encoding.UTF8.GetBytes(text);
}
// test
string s = Encoding.UTF8.GetString(bytes);
Console.WriteLine(s);
From what I understand, the 'ñ' character is represented as the two-byte sequence 0xC3 0xB1 when the text is stored with UTF-8 encoding. When you only read a single byte, you'll lose data.
I'd suggest reading the whole stream as a byte array (the first example) and then doing the encoding, or using StreamReader to do the work for you.
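A quick way to see the multi-byte behavior (a standalone check, not part of the code above):
byte[] b = Encoding.UTF8.GetBytes("ñ");
Console.WriteLine(BitConverter.ToString(b)); // prints "C3-B1": two bytes for a single 'ñ'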
Since you're trying to fill the contents into a byte-array, don't bother with the reader - it isn't helping you. Use just the stream:
byte[] data = new byte[len];
int read, offset = 0;
while (len > 0 &&
       (read = input_stream.Read(data, offset, len)) > 0)
{
    len -= read;
    offset += read;
}
if (len != 0) throw new EndOfStreamException();
I get a text file from a mainframe, and sometimes there are 0x0D bytes injected into the middle of the text lines.
The previous programmer created a method using the FileStream class. This method works fine but takes around 30 minutes to go through the entire file.
My thought was to pass only the text lines that are needed (about 25 lines) to a method, to decrease the processing time.
I've been working with the MemoryStream class but am having an issue where it does not find the 0x0D control code.
Here is the current FileStream method:
private void ReplaceFileStream(string strInputFile)
{
    FileStream fileStream = new FileStream(strInputFile, FileMode.Open, FileAccess.ReadWrite);
    byte filebyte;
    while (fileStream.Position < fileStream.Length)
    {
        filebyte = (byte)fileStream.ReadByte();
        if (filebyte == 0x0D)
        {
            filebyte = 0x20;
            fileStream.Position = fileStream.Position - 1;
            fileStream.WriteByte(filebyte);
        }
    }
    fileStream.Close();
}
And here is the MemoryStream method:
private void ReplaceMemoryStream(string strInputLine)
{
    byte[] byteArray = Encoding.ASCII.GetBytes(strInputLine);
    MemoryStream fileStream = new MemoryStream(byteArray);
    byte filebyte;
    while (fileStream.Position < fileStream.Length)
    {
        filebyte = (byte)fileStream.ReadByte();
        if (filebyte == 0x0D)
        {
            filebyte = 0x20;
            fileStream.Position = fileStream.Position - 1;
            fileStream.WriteByte(filebyte);
        }
    }
    fileStream.Close();
}
As I have not used the MemoryStream class before, I am not that familiar with it. Any tips or ideas?
I don't know the size of your files, but if they are small enough that you can load the whole thing in memory at once, then you could do something like this:
private void ReplaceFileStream(string strInputFile)
{
    byte[] fileBytes = File.ReadAllBytes(strInputFile);
    bool modified = false;
    for (int i = 0; i < fileBytes.Length; ++i)
    {
        if (fileBytes[i] == 0x0D)
        {
            fileBytes[i] = 0x20;
            modified = true;
        }
    }
    if (modified)
    {
        File.WriteAllBytes(strInputFile, fileBytes);
    }
}
If you can't read the whole file in at once, then you should switch to a buffered reading setup. Here is an example that reads from the file, writes to a temp file, and in the end copies the temp file over the original file. This should yield better performance than reading the file one byte at a time:
private void ReplaceFileStream(string strInputFile)
{
    string tempFile = Path.GetTempFileName();
    try
    {
        using (FileStream input = new FileStream(strInputFile,
            FileMode.Open, FileAccess.Read))
        using (FileStream output = new FileStream(tempFile,
            FileMode.Create, FileAccess.Write))
        {
            byte[] buffer = new byte[4096];
            int bytesRead = input.Read(buffer, 0, 4096);
            while (bytesRead > 0)
            {
                for (int i = 0; i < bytesRead; ++i)
                {
                    if (buffer[i] == 0x0D)
                    {
                        buffer[i] = 0x20;
                    }
                }
                output.Write(buffer, 0, bytesRead);
                bytesRead = input.Read(buffer, 0, 4096);
            }
            output.Flush();
        }
        File.Copy(tempFile, strInputFile, true); // overwrite the original file
    }
    finally
    {
        if (File.Exists(tempFile))
        {
            File.Delete(tempFile);
        }
    }
}
If your replacement code does not find the 0x0D in the stream while the previous FileStream-based method does, I think it could be because of the encoding you are using to get the bytes of the file; you can try some other encoding types.
Otherwise your code seems fine. I would use a using block around the MemoryStream to be sure it gets closed and disposed, something like this:
using (var fileStream = new MemoryStream(byteArray))
{
    byte filebyte;
    // your while loop...
}
Looking at your code, I am not 100% sure the changes you make to the memory stream will be persisted; actually, I think that if you do not save it after the changes, your changes will be lost. I could be wrong about this, but you should test and see; if it does not save, you should write the modified data back out after the changes.
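For what it's worth, a MemoryStream constructed over an existing byte array writes through to that array (the stream is writable but not resizable), so one way to persist the changes is to write the backing array out again afterwards. A minimal sketch, assuming the bytes came from a file at strInputFile:
byte[] byteArray = File.ReadAllBytes(strInputFile);
using (var memStream = new MemoryStream(byteArray))
{
    while (memStream.Position < memStream.Length)
    {
        byte filebyte = (byte)memStream.ReadByte();
        if (filebyte == 0x0D)
        {
            memStream.Position -= 1;
            memStream.WriteByte(0x20); // this writes through to byteArray
        }
    }
}
File.WriteAllBytes(strInputFile, byteArray); // persist the modified bytes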