I worked out how to read a large file into a smaller file, but only if I start at the beginning. I would like to be able to read from the middle to an arbitrary point. I realize it sounds crazy, but I have my reasons. I keep getting nothing written out to the file when I set the position greater than 0 for some reason. I will end up with a file full of null values.
I thought this would read 300K in from 2.5MB into the file.
public static FileStream stream = new FileStream(#"file.dat", FileMode.Open, FileAccess.Read);
public static FileStream shortFile = null;
int limit = 300000;
public MainWindow()
{
byte[] block = new byte[limit];
using (FileStream fs = File.Create("tempfile.dat"))
{
var newposition = stream.Seek(2500000, SeekOrigin.Begin);
stream.Position = newposition;
while (stream.Read(block, 0, limit) > 0 && stream.Position <= limit)
{
fs.Write(block, 0, block.Length);
}
}
InitializeComponent();
}
You have said "I thought this would read 300K in from 2.5MB into the file"; but you have the limit and the Seek values the other way around. The Seek position needs to be set to 300000; the limit should be 2500000.
Other tips:
Streams are disposable so better to keep them as local variables, declared in using blocks (i.e. do that for stream).
You don't need to set stream.Position = newposition; since that is what the Seek has just done.
Related
I am currently testing several decompression libraries for a project I'm involved with to decompress http file streams on the fly. I have tried two very promising libraries and found an issue that seems to appear in both of them.
This is what I am doing:
video.avi compressed to video.zip on HTTP server test.com/video.zip (~20MB)
HttpWebRequest to read stream from the server
Write HttpWebRequest ResponseStream data into MemoryStream
Let decompression library read from MemoryStream
Read decompressed file stream while it's being downloaded by HttpWebRequest
The whole idea works fine, I'm able to uncompress and stream the compressed video directly into VLC stdin and it's rendered just fine. However I have to use a read buffer of one byte on the decompression library. Any buffer larger than one byte will cause the uncompressed data stream to be cut off. For a test I've written the decompressed stream into a file and compared it with the original video.avi and some data is just skipped by the decompression. When streaming this broken data into VLC it causes a lot of video artifacts and the playback speed is also greatly reduced.
If I knew the size of what is available to read I could trim my buffer accordingly but no library would make this information public so all I can do is read the data with a one byte buffer. Maybe my approach is wrong? Or maybe I'm overlooking something?
Here's an example code (requires VLC):
ICSharpCode.SharpZLib (http://icsharpcode.github.io/SharpZipLib/)
static void Main(string[] args)
{
// Initialise VLC
Process vlc = new Process()
{
StartInfo =
{
FileName = #"C:\Program Files\VideoLAN\vlc.exe", // Adjust as required to test the code
RedirectStandardInput = true,
UseShellExecute = false,
Arguments = "-"
}
};
vlc.Start();
Stream outStream = vlc.StandardInput.BaseStream;
// Get source stream
HttpWebRequest stream = (HttpWebRequest)WebRequest.Create("http://codefreak.net/~daniel/apps/stream60s-large.zip");
Stream compressedVideoStream = stream.GetResponse().GetResponseStream();
// Create local decompression loop
MemoryStream compressedLoopback = new MemoryStream();
ZipInputStream zipStream = new ZipInputStream(compressedLoopback);
ZipEntry currentEntry = null;
byte[] videoStreamBuffer = new byte[8129]; // 8kb read buffer
int read = 0;
long totalRead = 0;
while ((read = compressedVideoStream.Read(videoStreamBuffer, 0, videoStreamBuffer.Length)) > 0)
{
// Write compressed video stream into compressed loopback without affecting current read position
long previousPosition = compressedLoopback.Position; // Store current read position
compressedLoopback.Position = totalRead; // Jump to last write position
totalRead += read; // Increase last write position by current read size
compressedLoopback.Write(videoStreamBuffer, 0, read); // Write data into loopback
compressedLoopback.Position = previousPosition; // Restore reading position
// If not already, move to first entry
if (currentEntry == null)
currentEntry = zipStream.GetNextEntry();
byte[] outputBuffer = new byte[1]; // Decompression read buffer, this is the bad one!
int zipRead = 0;
while ((zipRead = zipStream.Read(outputBuffer, 0, outputBuffer.Length)) > 0)
outStream.Write(outputBuffer, 0, outputBuffer.Length); // Write directly to VLC stdin
}
}
SharpCompress (https://github.com/adamhathcock/sharpcompress)
static void Main(string[] args)
{
// Initialise VLC
Process vlc = new Process()
{
StartInfo =
{
FileName = #"C:\Program Files\VideoLAN\vlc.exe", // Adjust as required to test the code
RedirectStandardInput = true,
UseShellExecute = false,
Arguments = "-"
}
};
vlc.Start();
Stream outStream = vlc.StandardInput.BaseStream;
// Get source stream
HttpWebRequest stream = (HttpWebRequest)WebRequest.Create("http://codefreak.net/~daniel/apps/stream60s-large.zip");
Stream compressedVideoStream = stream.GetResponse().GetResponseStream();
// Create local decompression loop
MemoryStream compressedLoopback = new MemoryStream();
ZipReader zipStream = null;
EntryStream currentEntry = null;
byte[] videoStreamBuffer = new byte[8129]; // 8kb read buffer
int read = 0;
long totalRead = 0;
while ((read = compressedVideoStream.Read(videoStreamBuffer, 0, videoStreamBuffer.Length)) > 0)
{
// Write compressed video stream into compressed loopback without affecting current read position
long previousPosition = compressedLoopback.Position; // Store current read position
compressedLoopback.Position = totalRead; // Jump to last write position
totalRead += read; // Increase last write position by current read size
compressedLoopback.Write(videoStreamBuffer, 0, read); // Write data into loopback
compressedLoopback.Position = previousPosition; // Restore reading position
// Open stream after writing to it because otherwise it will not be able to identify the compression type
if (zipStream == null)
zipStream = (ZipReader)ReaderFactory.Open(compressedLoopback); // Cast to ZipReader, as we know the type
// If not already, move to first entry
if (currentEntry == null)
{
zipStream.MoveToNextEntry();
currentEntry = zipStream.OpenEntryStream();
}
byte[] outputBuffer = new byte[1]; // Decompression read buffer, this is the bad one!
int zipRead = 0;
while ((zipRead = currentEntry.Read(outputBuffer, 0, outputBuffer.Length)) > 0)
outStream.Write(outputBuffer, 0, outputBuffer.Length); // Write directly to VLC stdin
}
}
To test this code I recommend setting the output buffer for SharpZipLib to 2 bytes and for SharpCompress to 8 bytes. You will see the artifacts and also that the play speed of the video is wrong, the seek time should always be aligned with the number that is counting in the video.
I haven't really found any good explanation of why a larger outputBuffer that is reading from the decompression lib is causing these problems or a way to solve this other than having the tiniest possible buffer.
So my question is what I am doing wrong or if this is a general issue when reading compressed files from streams? How could I increase the outputBuffer while reading the correct data?
Any help is greatly appreciated!
Regards,
Gachl
You need to write only how many bytes you read. Writing the entire buffer size will add additional bytes (whatever happened to be in the buffer before). zipStream.Read is not required to read as many bytes as you request.
while ((zipRead = zipStream.Read(outputBuffer, 0, outputBuffer.Length)) > 0)
outStream.Write(outputBuffer, 0, zipRead); // Write directly to VLC stdin
Edit: Solution is at bottom of post
I am trying my luck with reading binary files. Since I don't want to rely on byte[] AllBytes = File.ReadAllBytes(myPath), because the binary file might be rather big, I want to read small portions of the same size (which fits nicely with the file format to read) in a loop, using what I would call a "buffer".
public void ReadStream(MemoryStream ContentStream)
{
byte[] buffer = new byte[sizePerHour];
for (int hours = 0; hours < NumberHours; hours++)
{
int t = ContentStream.Read(buffer, 0, sizePerHour);
SecondsToAdd = BitConverter.ToUInt32(buffer, 0);
// further processing of my byte[] buffer
}
}
My stream contains all the bytes I want, which is a good thing. When I enter the loop several things cease to work.
My int t is 0although I would presume that ContentStream.Read() would process information from within the stream to my bytearray, but that isn't the case.
I tried buffer = ContentStream.GetBuffer(), but that results in my buffer containing all of my stream, a behaviour I wanted to avoid by using reading to a buffer.
Also resetting the stream to position 0 before reading did not help, as did specifying an offset for my Stream.Read(), which means I am lost.
Can anyone point me to reading small portions of a stream to a byte[]? Maybe with some code?
Thanks in advance
Edit:
Pointing me to the right direction was the answer, that .Read() returns 0 if the end of stream is reached. I modified my code to the following:
public void ReadStream(MemoryStream ContentStream)
{
byte[] buffer = new byte[sizePerHour];
ContentStream.Seek(0, SeekOrigin.Begin); //Added this line
for (int hours = 0; hours < NumberHours; hours++)
{
int t = ContentStream.Read(buffer, 0, sizePerHour);
SecondsToAdd = BitConverter.ToUInt32(buffer, 0);
// further processing of my byte[] buffer
}
}
And everything works like a charm. I initially reset the stream to its origin every time I iterated over hour and giving an offset. Moving the "set to beginning-Part" outside my look and leaving the offset at 0 did the trick.
Read returns zero if the end of the stream is reached. Are you sure, that your memory stream has the content you expect? I´ve tried the following and it works as expected:
// Create the source of the memory stream.
UInt32[] source = {42, 4711};
List<byte> sourceBuffer = new List<byte>();
Array.ForEach(source, v => sourceBuffer.AddRange(BitConverter.GetBytes(v)));
// Read the stream.
using (MemoryStream contentStream = new MemoryStream(sourceBuffer.ToArray()))
{
byte[] buffer = new byte[sizeof (UInt32)];
int t;
do
{
t = contentStream.Read(buffer, 0, buffer.Length);
if (t > 0)
{
UInt32 value = BitConverter.ToUInt32(buffer, 0);
}
} while (t > 0);
}
Do I need a Seek call in this code?
// Assume bytes = byte[] of some bytes
using (var memoryStream = new MemoryStream(bytes))
{
memoryStream.Seek(0, SeekOrigin.Begin);
return new BinaryFormatter().Deserialize(memoryStream);
}
No, there is no need to Seek on stream that was just created.
You need to Seek or set Position is you wrote something to the stream before.
I.e. common question is "how to return MemoryStream with some serialized data" - you need to write data to the stream and than Seek to the beginning of the stream so Read will start from the beginning and not the last position of Write (hence always saying that there nothing left to read). Sample question - Can't create MemoryStream.
No, you don't have to. For proving that, you can check the constructor code:
public MemoryStream(byte[] buffer, bool writable)
{
if (buffer == null) throw new ArgumentNullException("buffer", Environment.GetResourceString("ArgumentNull_Buffer"));
Contract.EndContractBlock();
_buffer = buffer;
_length = _capacity = buffer.Length;
_writable = writable;
_exposable = false;
_origin = 0;
_isOpen = true;
}
Seek changes _position (in your example to 0), which is not assigned in the constructor, so upon construction of the object Position will have the default long value of 0.
It's a different story though if you perform further operations on the stream that could change its Position before reading from it.
Is it possible to read a large binary file from a particular position?
I don't want to read the file from the beginning because I can calculate the start position and the length of the stream I need.
using (FileStream sr = File.OpenRead("someFile.dat"))
{
sr.Seek(100, SeekOrigin.Begin);
int read = sr.ReadByte();
//...
}
According to #shenhengbin answord.
Use BinaryReader.BaseStream.Seek.
using (BinaryReader b = new BinaryReader(File.Open("perls.bin", FileMode.Open)))
{
int pos = 50000;
int required = 2000;
// Seek to our required position.
b.BaseStream.Seek(pos, SeekOrigin.Begin);
// Read the next 2000 bytes.
byte[] by = b.ReadBytes(required);
}
Well if you know streams, why not using (File)Stream.Seek(...) ?
Of course it is possible.See this here.See the offset.you can read from the offset
I have a web server which will read large binary files (several megabytes) into byte arrays. The server could be reading several files at the same time (different page requests), so I am looking for the most optimized way for doing this without taxing the CPU too much. Is the code below good enough?
public byte[] FileToByteArray(string fileName)
{
byte[] buff = null;
FileStream fs = new FileStream(fileName,
FileMode.Open,
FileAccess.Read);
BinaryReader br = new BinaryReader(fs);
long numBytes = new FileInfo(fileName).Length;
buff = br.ReadBytes((int) numBytes);
return buff;
}
Simply replace the whole thing with:
return File.ReadAllBytes(fileName);
However, if you are concerned about the memory consumption, you should not read the whole file into memory all at once at all. You should do that in chunks.
I might argue that the answer here generally is "don't". Unless you absolutely need all the data at once, consider using a Stream-based API (or some variant of reader / iterator). That is especially important when you have multiple parallel operations (as suggested by the question) to minimise system load and maximise throughput.
For example, if you are streaming data to a caller:
Stream dest = ...
using(Stream source = File.OpenRead(path)) {
byte[] buffer = new byte[2048];
int bytesRead;
while((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0) {
dest.Write(buffer, 0, bytesRead);
}
}
I would think this:
byte[] file = System.IO.File.ReadAllBytes(fileName);
Your code can be factored to this (in lieu of File.ReadAllBytes):
public byte[] ReadAllBytes(string fileName)
{
byte[] buffer = null;
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
buffer = new byte[fs.Length];
fs.Read(buffer, 0, (int)fs.Length);
}
return buffer;
}
Note the Integer.MaxValue - file size limitation placed by the Read method. In other words you can only read a 2GB chunk at once.
Also note that the last argument to the FileStream is a buffer size.
I would also suggest reading about FileStream and BufferedStream.
As always a simple sample program to profile which is fastest will be most beneficial.
Also your underlying hardware will have a large effect on performance. Are you using server based hard disk drives with large caches and a RAID card with onboard memory cache? Or are you using a standard drive connected to the IDE port?
Depending on the frequency of operations, the size of the files, and the number of files you're looking at, there are other performance issues to take into consideration. One thing to remember, is that each of your byte arrays will be released at the mercy of the garbage collector. If you're not caching any of that data, you could end up creating a lot of garbage and be losing most of your performance to % Time in GC. If the chunks are larger than 85K, you'll be allocating to the Large Object Heap(LOH) which will require a collection of all generations to free up (this is very expensive, and on a server will stop all execution while it's going on). Additionally, if you have a ton of objects on the LOH, you can end up with LOH fragmentation (the LOH is never compacted) which leads to poor performance and out of memory exceptions. You can recycle the process once you hit a certain point, but I don't know if that's a best practice.
The point is, you should consider the full life cycle of your app before necessarily just reading all the bytes into memory the fastest way possible or you might be trading short term performance for overall performance.
I'd say BinaryReader is fine, but can be refactored to this, instead of all those lines of code for getting the length of the buffer:
public byte[] FileToByteArray(string fileName)
{
byte[] fileData = null;
using (FileStream fs = File.OpenRead(fileName))
{
using (BinaryReader binaryReader = new BinaryReader(fs))
{
fileData = binaryReader.ReadBytes((int)fs.Length);
}
}
return fileData;
}
Should be better than using .ReadAllBytes(), since I saw in the comments on the top response that includes .ReadAllBytes() that one of the commenters had problems with files > 600 MB, since a BinaryReader is meant for this sort of thing. Also, putting it in a using statement ensures the FileStream and BinaryReader are closed and disposed.
In case with 'a large file' is meant beyond the 4GB limit, then my following written code logic is appropriate. The key issue to notice is the LONG data type used with the SEEK method. As a LONG is able to point beyond 2^32 data boundaries.
In this example, the code is processing first processing the large file in chunks of 1GB, after the large whole 1GB chunks are processed, the left over (<1GB) bytes are processed. I use this code with calculating the CRC of files beyond the 4GB size.
(using https://crc32c.machinezoo.com/ for the crc32c calculation in this example)
private uint Crc32CAlgorithmBigCrc(string fileName)
{
uint hash = 0;
byte[] buffer = null;
FileInfo fileInfo = new FileInfo(fileName);
long fileLength = fileInfo.Length;
int blockSize = 1024000000;
decimal div = fileLength / blockSize;
int blocks = (int)Math.Floor(div);
int restBytes = (int)(fileLength - (blocks * blockSize));
long offsetFile = 0;
uint interHash = 0;
Crc32CAlgorithm Crc32CAlgorithm = new Crc32CAlgorithm();
bool firstBlock = true;
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
buffer = new byte[blockSize];
using (BinaryReader br = new BinaryReader(fs))
{
while (blocks > 0)
{
blocks -= 1;
fs.Seek(offsetFile, SeekOrigin.Begin);
buffer = br.ReadBytes(blockSize);
if (firstBlock)
{
firstBlock = false;
interHash = Crc32CAlgorithm.Compute(buffer);
hash = interHash;
}
else
{
hash = Crc32CAlgorithm.Append(interHash, buffer);
}
offsetFile += blockSize;
}
if (restBytes > 0)
{
Array.Resize(ref buffer, restBytes);
fs.Seek(offsetFile, SeekOrigin.Begin);
buffer = br.ReadBytes(restBytes);
hash = Crc32CAlgorithm.Append(interHash, buffer);
}
buffer = null;
}
}
//MessageBox.Show(hash.ToString());
//MessageBox.Show(hash.ToString("X"));
return hash;
}
Overview: if your image is added as a action= embedded resource then use the GetExecutingAssembly to retrieve the jpg resource into a stream then read the binary data in the stream into an byte array
public byte[] GetAImage()
{
byte[] bytes=null;
var assembly = Assembly.GetExecutingAssembly();
var resourceName = "MYWebApi.Images.X_my_image.jpg";
using (Stream stream = assembly.GetManifestResourceStream(resourceName))
{
bytes = new byte[stream.Length];
stream.Read(bytes, 0, (int)stream.Length);
}
return bytes;
}
Use the BufferedStream class in C# to improve performance. A buffer is a block of bytes in memory used to cache data, thereby reducing the number of calls to the operating system. Buffers improve read and write performance.
See the following for a code example and additional explanation:
http://msdn.microsoft.com/en-us/library/system.io.bufferedstream.aspx
use this:
bytesRead = responseStream.ReadAsync(buffer, 0, Length).Result;
I would recommend trying the Response.TransferFile() method then a Response.Flush() and Response.End() for serving your large files.
If you're dealing with files above 2 GB, you'll find that the above methods fail.
It's much easier just to hand the stream off to MD5 and allow that to chunk your file for you:
private byte[] computeFileHash(string filename)
{
MD5 md5 = MD5.Create();
using (FileStream fs = new FileStream(filename, FileMode.Open))
{
byte[] hash = md5.ComputeHash(fs);
return hash;
}
}