Writing and reading streams with offset to reduce disc seeks

Writing and reading streams with offset to reduce disc seeks - c#

I am reading a file and writing stream of that to a file, what I want to do is writing multiple files in a single file and reading them by their offset.
While writing the files, I understand that i need to know the file offset and length of the stream to read back the file.
var file = #"d:\foo.pdf";
var stream = File.ReadAllBytes(file);
// here i have the length of the
Console.WriteLine(stream.LongLength);
using (var br = new BinaryWriter(File.Open(#"d:\foo.bin", FileMode.OpenOrCreate)))
{
br.Write(stream);
}
I need to find the offset while writing multiple files.
Also while reading back the files, how do I start from an offset and read forwards as long as the length?
Finally, Does this method reduces number of disc seeks?

To read back various fragments, you will need to store the individual file lengths. For example:
using(var dest = File.Open(#"d:\foo.bin", FileMode.OpenOrCreate))
{
Append(dest, file);
Append(dest, anotherFile);
}
...
static void AppendFile(Stream dest, string path)
{
using(var source = File.OpenRead(path))
{
var lenHeader = BitConverter.GetBytes(source.Length);
dest.Write(lenHeader, 0, 4);
source.CopyTo(dest);
}
}
Then to read back you can do things like:
using(var source = File.OpenRead(...))
{
int len = ReadLength(source);
stream.Seek(len, SeekOrigin.Current); // skip the first file
len = ReadLength(source);
// TODO: now read len-many bytes from the second file
}
static int ReadLength(Stream stream)
{
byte[] buffer = new byte[4];
int count = 4, offset = 0, read;
while(count != 0 && (read = stream.Read(buffer, offset, count)) > 0)
{
count -= read;
offset += read;
}
if (count != 0) throw new EndOfStreamException();
return BitConverter.ToInt32(buffer, 0);
}
As for reading len-many bytes; you can either just keep track of it and decrement it while reading, or you can create a length-limited Stream wrapper. Either works.

Related

Processing C# filestream input in WHILE loop causing execution time error

I have a C# console app that I'm trying to create that processes all the files in a given directory and writes output to another given directory. I want to process the input files X bytes at a time.
namespace FileConverter
{
class Program
{
static void Main(string[] args)
{
string srcFolder = args[0];
string destFolder = args[1];
string[] srcFiles = Directory.GetFiles(srcFolder);
for (int s = 0; s < srcFiles.Length; s++)
{
byte[] fileBuffer;
int numBytesRead = 0;
int readBuffer = 10000;
FileStream srcStream = new FileStream(srcFiles[s], FileMode.Open, FileAccess.Read);
int fileLength = (int)srcStream.Length;
string destFile = destFolder + "\\" + Path.GetFileName(srcFiles[s]) + "-processed";
FileStream destStream = new FileStream(destFile, FileMode.OpenOrCreate, FileAccess.Write);
//Read and process the source file by some chunk of bytes at a time
while (numBytesRead < fileLength)
{
fileBuffer = new byte[readBuffer];
//Read some bytes into the fileBuffer
//TODO: This doesn't work on subsequent blocks
int n = srcStream.Read(fileBuffer, numBytesRead, readBuffer);
//If we didn't read anything, there's no more to process
if (n == 0)
break;
//Process the fileBuffer
for (int i = 0; i < fileBuffer.Length; i++)
{
//Process each byte in the array here
}
//Write data
destStream.Write(fileBuffer, numBytesRead, readBuffer);
numBytesRead += readBuffer;
}
srcStream.Close();
destStream.Close();
}
}
}
}
I'm running into an error at execution time at:
//Read some bytes into the fileBuffer
//TODO: This doesn't work on subsequent blocks
int n = srcStream.Read(fileBuffer, numBytesRead, readBuffer);
I don't want to load the entire file into memory, as it could possibly be many gigabytes in size. I really want to be able to read some number of bytes, process them, write them out to a file, and then read in the next X bytes and repeat.
It gets through one iteration of the loop, and then dies on the second. The error I get is:
"Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."
The sample file I'm working with is about 32k.
Can anyone tell me what I'm doing wrong here?

The second parameter to Read is not the offset into the file - it is the offset into the buffer at which to start writing data. So just pass 0.
Also, don't assume the buffer is filled each time: you should only process "n" bytes from the buffer. And the buffer should be reused between iterations.
If you need to read exactly a number of bytes:
static void ReadOrThrow(Stream source, byte[] buffer, int count) {
int read, offset = 0;
while(count > 0 && (read = source.Read(buffer, offset, count)) > 0) {
offset += read;
count -= read;
}
if(count != 0) throw new EndOfStreamException();
}
Note that Write works similarly, so you need to pass 0 as the offset and n as the count.

It should be
destStream.Write(fileBuffer, numBytesRead, n);
numBytesRead += n;
because n is the actual number of bytes that was read

Setting the offset in a stream

It says here msdn.microsoft.com/en-us/library/system.io.stream.read.aspx that the Stream.Read and Stream.Write methods both advance the position/offset in the stream automatically so why is the examples here http://msdn.microsoft.com/en-us/library/system.io.stream.read.aspx and http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx manually changing the offset?
Do you only set the offset in a loop if you know the size of the stream and set it to 0 if you don't know the size and using a buffer?
// Now read s into a byte buffer.
byte[] bytes = new byte[s.Length];
int numBytesToRead = (int) s.Length;
int numBytesRead = 0;
while (numBytesToRead > 0)
{
// Read may return anything from 0 to 10.
int n = s.Read(bytes, numBytesRead, 10);
// The end of the file is reached.
if (n == 0)
{
break;
}
numBytesRead += n;
numBytesToRead -= n;
}
and
using (GZipStream stream = new GZipStream(new MemoryStream(gzip), CompressionMode.Decompress))
{
const int size = 4096;
byte[] buffer = new byte[size];
using (MemoryStream memory = new MemoryStream())
{
int count = 0;
do
{
count = stream.Read(buffer, 0, size);
if (count > 0)
{
memory.Write(buffer, 0, count);
}
}
while (count > 0);
return memory.ToArray();
}
}

The offset is actually the offset of the buffer, not the stream. Streams are advanced automatically as they are read.

Edit (to the edited question):
In none of the code snippets you pasted into the question I see any stream offset being set.
I think you are mistaking the calculation of bytes to read vs. bytes received. This protocol may seem funny (why would you receive fewer bytes than requested?) but it makes sense when you consider that you might be reading from a high-latency packet oriented source (think: network sockets).
You might be receiving 6 characters in one burst (from a TCP packet) and only receive the remaining 4 characters in your next read (when the next packet has arrived).
Edit In response to your linked example from the comment:
using (GZipStream stream = new GZipStream(new MemoryStream(gzip), CompressionMode.Decompress))
{
// ... snip
count = stream.Read(buffer, 0, size);
if (count > 0)
{
memory.Write(buffer, 0, count);
}
It appears that the coders use prior knowledge about the underlying stream implementation, that stream.Read will always return 0 OR the size requested. That seems like a risky bet, to me. But if the docs for GZipStream do state that, it could be alright. However, since the MSDN samples use a generic Stream variable, it is (way) more correct to check the exact number of bytes read.
The first linked example uses a MemoryStream in both Write and Read fashion. The position is reset in between, so the data that was written first will be read:
Stream s = new MemoryStream();
for (int i = 0; i < 100; i++)
{
s.WriteByte((byte)i);
}
s.Position = 0;
The second example linked does not set the stream position. You'd typically have seen a call to Seek if it did. You maybe confusing the offsets into the data buffer with the stream position?

GZIP file Total length in C#

I have a zipped file having size of several GBs, I want to get the size of Unzipped contents but don't want to actually unzip the file in C#, What might be the Library I can use? When I right click on the .gz file and go to Properties then under the Archive Tab there is a property name TotalLength which is showing this value. But I want to get it Programmatically using C#.. Any idea?

The last 4 bytes of the gz file contains the length.
So it should be something like:
using(var fs = File.OpenRead(path))
{
fs.Position = fs.Length - 4;
var b = new byte[4];
fs.Read(b, 0, 4);
uint length = BitConverter.ToUInt32(b, 0);
Console.WriteLine(length);
}

The last for bytes of a .gz file are the uncompressed input size modulo 2^32. If your uncompressed file isn't larger than 4GB, just read the last 4 bytes of the file. If you have a larger file, I'm not sure that it's possible to get without uncompressing the stream.

EDIT: See the answers by Leppie and Gabe; the only reason I'm keeping this (rather than deleting it) is that it may be necessary if you suspect the length is > 4GB
For gzip, that data doesn't seem to be directly available - I've looked at GZipStream and the SharpZipLib equivalent - neither works. The best I can suggest is to run it locally:
long length = 0;
using(var fs = File.OpenRead(path))
using (var gzip = new GZipStream(fs, CompressionMode.Decompress)) {
var buffer = new byte[10240];
int count;
while ((count = gzip.Read(buffer, 0, buffer.Length)) > 0) {
length += count;
}
}
If it was a zip, then SharpZipLib:
long size = 0;
using(var zip = new ZipFile(path)) {
foreach (ZipEntry entry in zip) {
size += entry.Size;
}
}

public static long mGetFileLength(string strFilePath)
{
if (!string.IsNullOrEmpty(strFilePath))
{
System.IO.FileInfo info = new System.IO.FileInfo(strFilePath);
return info.Length;
}
return 0;
}

Problem in splitting a file

int bufferlength = 12488;
int pointer = 1;
int offset = 0;
int length = 0;
FileStream fstwrite = new FileStream("D:\\Movie.wmv", FileMode.Create);
while (pointer != 0)
{
byte[] buff = new byte[bufferlength];
FileStream fst = new FileStream("E:\\Movie.wmv", FileMode.Open);
pointer = fst.Read(buff, 0, bufferlength);
fst.Close();
fstwrite.Write(buff, offset , pointer);
offset += pointer;
}
I used the above code for splitting a file and place it in other drive.Im not able to set the correct offset and length for this routine can anyone help me to fix this
splitting in the sense ,i split it in "x" kbs and pass it somewhere make the same file in some other location
I find it atlast ,thanks to evry one who gave their valueble responses.

Currently you're always reading from the start of the file... and even if you weren't you'd just be copying the whole file.
Here's some code which will actually split a single file into multiple files:
public static void SplitFile(string inputFile,
string outputPrefix,
int chunkSize)
{
byte[] buffer = new byte[chunkSize];
using (Stream input = File.OpenRead(inputFile))
{
int index = 0;
while (input.Position < input.Length)
{
using (Stream output = File.Create(outputPrefix + index))
{
int chunkBytesRead = 0;
while (chunkBytesRead < chunkSize)
{
int bytesRead = input.Read(buffer,
chunkBytesRead,
chunkSize - chunkBytesRead);
// End of input
if (bytesRead == 0)
{
break;
}
chunkBytesRead += bytesRead;
}
output.Write(buffer, 0, chunkBytesRead);
}
index++;
}
}
}

Your reading bufferlength of bytes. Shouldn't you set the offset like this then?
offset += bufferlength;

Don't open your source file inside the loop, or you'll always read the first chunk.
Open it before the loop, then make sure your offset is applied to the read.

How do I convert a Stream into a byte[] in C#? [duplicate]

This question already has answers here:
Creating a byte array from a stream
(18 answers)
Closed 6 years ago.
Is there a simple way or method to convert a Stream into a byte[] in C#?

The shortest solution I know:
using(var memoryStream = new MemoryStream())
{
sourceStream.CopyTo(memoryStream);
return memoryStream.ToArray();
}

Call next function like
byte[] m_Bytes = StreamHelper.ReadToEnd (mystream);
Function:
public static byte[] ReadToEnd(System.IO.Stream stream)
{
long originalPosition = 0;
if(stream.CanSeek)
{
originalPosition = stream.Position;
stream.Position = 0;
}
try
{
byte[] readBuffer = new byte[4096];
int totalBytesRead = 0;
int bytesRead;
while ((bytesRead = stream.Read(readBuffer, totalBytesRead, readBuffer.Length - totalBytesRead)) > 0)
{
totalBytesRead += bytesRead;
if (totalBytesRead == readBuffer.Length)
{
int nextByte = stream.ReadByte();
if (nextByte != -1)
{
byte[] temp = new byte[readBuffer.Length * 2];
Buffer.BlockCopy(readBuffer, 0, temp, 0, readBuffer.Length);
Buffer.SetByte(temp, totalBytesRead, (byte)nextByte);
readBuffer = temp;
totalBytesRead++;
}
}
}
byte[] buffer = readBuffer;
if (readBuffer.Length != totalBytesRead)
{
buffer = new byte[totalBytesRead];
Buffer.BlockCopy(readBuffer, 0, buffer, 0, totalBytesRead);
}
return buffer;
}
finally
{
if(stream.CanSeek)
{
stream.Position = originalPosition;
}
}
}

I use this extension class:
public static class StreamExtensions
{
public static byte[] ReadAllBytes(this Stream instream)
{
if (instream is MemoryStream)
return ((MemoryStream) instream).ToArray();
using (var memoryStream = new MemoryStream())
{
instream.CopyTo(memoryStream);
return memoryStream.ToArray();
}
}
}
Just copy the class to your solution and you can use it on every stream:
byte[] bytes = myStream.ReadAllBytes()
Works great for all my streams and saves a lot of code!
Of course you can modify this method to use some of the other approaches here to improve performance if needed, but I like to keep it simple.

In .NET Framework 4 and later, the Stream class has a built-in CopyTo method that you can use.
For earlier versions of the framework, the handy helper function to have is:
public static void CopyStream(Stream input, Stream output)
{
byte[] b = new byte[32768];
int r;
while ((r = input.Read(b, 0, b.Length)) > 0)
output.Write(b, 0, r);
}
Then use one of the above methods to copy to a MemoryStream and call GetBuffer on it:
var file = new FileStream("c:\\foo.txt", FileMode.Open);
var mem = new MemoryStream();
// If using .NET 4 or later:
file.CopyTo(mem);
// Otherwise:
CopyStream(file, mem);
// getting the internal buffer (no additional copying)
byte[] buffer = mem.GetBuffer();
long length = mem.Length; // the actual length of the data
// (the array may be longer)
// if you need the array to be exactly as long as the data
byte[] truncated = mem.ToArray(); // makes another copy
Edit: originally I suggested using Jason's answer for a Stream that supports the Length property. But it had a flaw because it assumed that the Stream would return all its contents in a single Read, which is not necessarily true (not for a Socket, for example.) I don't know if there is an example of a Stream implementation in the BCL that does support Length but might return the data in shorter chunks than you request, but as anyone can inherit Stream this could easily be the case.
It's probably simpler for most cases to use the above general solution, but supposing you did want to read directly into an array that is bigEnough:
byte[] b = new byte[bigEnough];
int r, offset;
while ((r = input.Read(b, offset, b.Length - offset)) > 0)
offset += r;
That is, repeatedly call Read and move the position you will be storing the data at.

Byte[] Content = new BinaryReader(file.InputStream).ReadBytes(file.ContentLength);

byte[] buf; // byte array
Stream stream=Page.Request.InputStream; //initialise new stream
buf = new byte[stream.Length]; //declare arraysize
stream.Read(buf, 0, buf.Length); // read from stream to byte array

Ok, maybe I'm missing something here, but this is the way I do it:
public static Byte[] ToByteArray(this Stream stream) {
Int32 length = stream.Length > Int32.MaxValue ? Int32.MaxValue : Convert.ToInt32(stream.Length);
Byte[] buffer = new Byte[length];
stream.Read(buffer, 0, length);
return buffer;
}

if you post a file from mobile device or other
byte[] fileData = null;
using (var binaryReader = new BinaryReader(Request.Files[0].InputStream))
{
fileData = binaryReader.ReadBytes(Request.Files[0].ContentLength);
}

Stream s;
int len = (int)s.Length;
byte[] b = new byte[len];
int pos = 0;
while((r = s.Read(b, pos, len - pos)) > 0) {
pos += r;
}
A slightly more complicated solution is necesary is s.Length exceeds Int32.MaxValue. But if you need to read a stream that large into memory, you might want to think about a different approach to your problem.
Edit: If your stream does not support the Length property, modify using Earwicker's workaround.
public static class StreamExtensions {
// Credit to Earwicker
public static void CopyStream(this Stream input, Stream output) {
byte[] b = new byte[32768];
int r;
while ((r = input.Read(b, 0, b.Length)) > 0) {
output.Write(b, 0, r);
}
}
}
[...]
Stream s;
MemoryStream ms = new MemoryStream();
s.CopyStream(ms);
byte[] b = ms.GetBuffer();

"bigEnough" array is a bit of a stretch. Sure, buffer needs to be "big ebough" but proper design of an application should include transactions and delimiters. In this configuration each transaction would have a preset length thus your array would anticipate certain number of bytes and insert it into correctly sized buffer. Delimiters would ensure transaction integrity and would be supplied within each transaction. To make your application even better, you could use 2 channels (2 sockets). One would communicate fixed length control message transactions that would include information about size and sequence number of data transaction to be transferred using data channel. Receiver would acknowledge buffer creation and only then data would be sent.
If you have no control over stream sender than you need multidimensional array as a buffer. Component arrays would be small enough to be manageable and big enough to be practical based on your estimate of expected data. Process logic would seek known start delimiters and then ending delimiter in subsequent element arrays. Once ending delimiter is found, new buffer would be created to store relevant data between delimiters and initial buffer would have to be restructured to allow data disposal.
As far as a code to convert stream into byte array is one below.
Stream s = yourStream;
int streamEnd = Convert.ToInt32(s.Length);
byte[] buffer = new byte[streamEnd];
s.Read(buffer, 0, streamEnd);

Quick and dirty technique:
static byte[] StreamToByteArray(Stream inputStream)
{
if (!inputStream.CanRead)
{
throw new ArgumentException();
}
// This is optional
if (inputStream.CanSeek)
{
inputStream.Seek(0, SeekOrigin.Begin);
}
byte[] output = new byte[inputStream.Length];
int bytesRead = inputStream.Read(output, 0, output.Length);
Debug.Assert(bytesRead == output.Length, "Bytes read from stream matches stream length");
return output;
}
Test:
static void Main(string[] args)
{
byte[] data;
string path = #"C:\Windows\System32\notepad.exe";
using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read))
{
data = StreamToByteArray(fs);
}
Debug.Assert(data.Length > 0);
Debug.Assert(new FileInfo(path).Length == data.Length);
}
I would ask, why do you want to read a stream into a byte[], if you are wishing to copy the contents of a stream, may I suggest using MemoryStream and writing your input stream into a memory stream.

You could also try just reading in parts at a time and expanding the byte array being returned:
public byte[] StreamToByteArray(string fileName)
{
byte[] total_stream = new byte[0];
using (Stream input = File.Open(fileName, FileMode.Open, FileAccess.Read))
{
byte[] stream_array = new byte[0];
// Setup whatever read size you want (small here for testing)
byte[] buffer = new byte[32];// * 1024];
int read = 0;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
stream_array = new byte[total_stream.Length + read];
total_stream.CopyTo(stream_array, 0);
Array.Copy(buffer, 0, stream_array, total_stream.Length, read);
total_stream = stream_array;
}
}
return total_stream;
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Writing and reading streams with offset to reduce disc seeks - c#

Related

Processing C# filestream input in WHILE loop causing execution time error

Setting the offset in a stream

GZIP file Total length in C#

Problem in splitting a file

How do I convert a Stream into a byte[] in C#? [duplicate]

Categories

Resources