If I have a single MemoryStream of which I know I sent multiple files (example 5 files) to this MemoryStream. Is it possible to read from this MemoryStream and be able to break apart file by file?
My gut is telling me no since when we Read, we are reading byte by byte... Any help and a possible snippet would be great. I haven't been able to find anything on google or here :(
You can't directly, not if you don't delimit the files in some way or know the exact size of each file as it was put into the buffer.
You can use a compressed file such as a zip file to transfer multiple files instead.
A stream is just a line of bytes. If you put the files next to each other in the stream, you need to know how to separate them. That means you must know the length of the files, or you should have used some separator. Some (most) file types have a kind of header, but looking for this in an entire stream may not be waterproof either, since the header of a file could just as well be data in another file.
So, if you need to write files to such a stream, it is wise to add some extra information. For instance, start with a version number, then, write the size of the first file, write the file itself and then write the size of the next file, etc....
By starting with a version number, you can make alterations to this format. In the future you may decide you need to store the file name as well. In that case, you can increase version number, make up a new format, and still be able to read streams that you created earlier.
This is of course especially useful if you store these streams too.
Since you're sending them, you'll have to send them into the stream in such a way that you'll know how to pull them out. The most common way of doing this is to use a length specification. For example, to write the files to the stream:
write an integer to the stream to indicate the number of files
Then for each file,
write an integer (or a long if the files are large) to indicate the number of bytes in the file
write the file
To read the files back,
read an integer (n) to determine the number of files in the stream
Then, iterating n times,
read an integer (or long if that's what you chose) to determine the number of bytes in the file
read the file
You could use an IEnumerable<Stream> instead.
You need to implement this yourself, what you would want to do is write in some sort of 'delimited' into the stream. As you're reading, look for that delimited, and you'll know when you have hit a new file.
Here's a quick and dirty example:
byte[] delimiter = System.Encoding.Default.GetBytes("++MyDelimited++");
ms.Write(myFirstFile);
ms.Write(delimiter);
ms.Write(mySecondFile);
....
int len;
do {
len = ms.ReadByte(buffer, lastOffest, delimiter.Length);
if(buffer == delimiter)
{
// Close and open a new file stream
}
// Write buffer to output stream
} while(len > 0);
Related
This is a follow up question to this question:
Difference between file path and file stream?
I didn't fully understand everything answered in the linked question.
I am using the Microsoft.SqlServer.Dac.BacPackage which contains a Load method with 2 overloads - one that receives a string path and one that receives a Stream.
This is the documentation of the Load method:
https://learn.microsoft.com/en-us/dotnet/api/microsoft.sqlserver.dac.bacpackage.load?view=sql-dacfx-150
What exactly is the difference between the two? Am I correct in assuming that the overloading of the string path saves all the file in the memory first, while the stream isn't? Are there other differences?
No, the file will not usually be fully loaded all at once.
A string path parameter normally means it will just open the file as a FileStream and pass it to the other version of the function. There is no reason why the stream should fully load the file into memory unless requested.
A Stream parameter means you open the file and pass the resulting Stream. You could also pass any other type of Stream, such as a network stream, a zip or decryption stream, a memory-backed stream, anything really.
Short answer:
The fact that you have two methods, one that accepts a filename and one that accepts a stream is just for convenience. Internally, the one with the filename will open the file as a stream and call the other method.
Longer answer
You can consider a stream as a sequence of bytes. The reason to use a stream instead of a byte[] or List<byte>, is, that if the sequence is really, really large, and you don't need to have access to all bytes at once, it would be a waste to put all bytes in memory before processing them.
For instance, if you want to calculate the checksum for all bytes in a file: you don't need to put all data in memory before you can start calculating the sum. In fact, anything that efficiently can deliver you the bytes one by one would suffice.
That is the reason why people would want to read a file as a stream.
The reason why people want a stream as input for their data, is that they want to give the caller the opportunity to specify the source of their data: callers can provide a stream that reads from a file, but also a stream with data from the internet, or from a database, or from a textBox, the procedure does not care, as long as it can read the bytes one by one or sometimes per chunk of bytes:
using (Stream fileStream = File.Open(fileName)
{
ProcessInputData(fileStream);
}
Or:
byte[] bytesToProcess = ...
using (Stream memoryStream = new MemoryStream(bytesToProcess))
{
ProcessInputData(memoryStream);
}
Or:
string operatorInput = this.textBox1.Text;
using (Stream memoryStream = new MemoryStream(operatorInput))
{
ProcessInputData(memoryStream);
}
Conclusioin
Methods use streams in their interface to indicate that they don't need all data in memory at once. One-by-one, or per chunk is enough. The caller is free to decide where the data comes from.
This question is similar to How to read data from a zip file without having to unzip the entire file excepted I'd like to go a bit further and read only a part of a file, i.e. get the file stream, and seek to a position for which I know the offset in bytes. I don't know if the zip format allows that in the first place.
I've tried to seek in the stream returned by ZipArchive.Entries[...].Open(), but it throws, saying the operation is not supported.
I can of course just read (and discard) the contents up to the point I'm interested in, but this is slow for big files.
EDIT: An example to make clear what I want to do:
Let's say I have a file archive.zip containing several files, one of them is bigfile.bin. I already know how to decompress bigfile.bin without decompressing the other files of archive.zip, no problem there. My question is: can I skip 10 000 000 bytes of bigfile.bin and start reading what remains? Those 10 000 000 bytes would be measured in the decompressed stream of course.
using (var archive = new ZipArchive("archive.zip"))
{
using (var data = archive.Entries.Single(e => e.Name == "bigfile.bin").Open())
{
data.Seek(10000000, SeekOrigin.Begin); // this is what I want to do but it doesn't work
data.Read(/*etc*/);
}
}
EDIT 1:
I build a torrent application; Downloading from diffrent clients simultaneously. Each download represent a portion for my file and diffrent clients have diffrent portions.
After a download is complete, I need to know which portion I need to achieve now by Finding "empty" portions in my file.
One way to creat a file with fixed size:
File.WriteAllBytes(#"C:\upload\BigFile.rar", new byte[Big Size]);
My portion Arr that represent my file as portions:
BitArray TorrentPartsState = new BitArray(10);
For example:
File size is 100.
TorrentPartsState[0] = true; // thats mean that in my file, from position 0 until 9 I **dont** need to fill in some information.
TorrentPartsState[1] = true; // thats mean that in my file, from position 10 until 19 I **need** to fill in some information.
I seatch an effective way to save what the BitArray is containing even if the computer/application is shut down. One way I tought of, is by xml file and to update it each time a portion is complete.
I don't think its smart and effective solution. Any idea for other one?
It sounds like you know the following when you start a transfer:
The size of the final file.
The (maximum) number of streams you intend to use for the file.
Create the output file and allocate the required space.
Create a second "control" file with a related filename, e.g. add you own extension. In that file maintain an array of stream status structures corresponding to the network streams. Each status consists of the starting offset and number of bytes transferred. Periodically flush the stream buffers and then update the control file to reflect the progress made and committed.
Variations on the theme:
The control file can define segments to be transferred, e.g. 16MB chunks, and treated as a work queue by threads that look for an incomplete segment and a suitable server from which to retrieve it.
The control file could be a separate fork within the result file. (Who am I kidding?)
You could use a BitArray (in System.Collections).
Then, when you visit on offset in the file, you can set the BitArray at that offset to true.
So for your 10,000 byte file:
BitArray ba = new BitArray(10000);
// Visited offset, mark in the BitArray
ba[4] = true;
Implement a file system (like on a disk) in your file - just use something simple, should be something available in the FOS arena
I'm new to programming in general (My understanding of programming concepts is still growing.). So this question is about learning, so please provide enough info for me to learn but not so much that I can't, thank you.
(I would also like input on how to make the code reusable with in the project.)
The goal of the project I'm working on consists of:
Read binary file.
I have known offsets I need to read to find a particular chunk of data from within this file.
First offset is first 4 bytes(Offset for end of my chunk).
Second offset is 16 bytes from end of file. I read for 4 bytes.(Gives size of chunk in hex).
Third offset is the 4 bytes following previous, read for 4 bytes(Offset for start of chunk in hex).
Locate parts in the chunk to modify by searching ASCII text as well as offsets.
Now I have the start offset, end offset and size of my chunk.
This should allow me to read bytes from file into a byte array and know the size of the array ahead of time.
(Questions: 1. Is knowing the size important? Other than verification. 2. Is reading part of a file into a byte array in order to change bytes and overwrite that part of the file the best method?)
So far I have managed to read the offsets from the file using BinaryReader on a MemoryStream. I then locate the chunk of data I need and read that into a byte array.
I'm stuck in several ways:
What are the best practices for binary Reading / Writing?
What's the best storage convention for the data that is read?
When I need to modify bytes how do I go about that.
Should I be using FileStream?
Since you want to both read and write, it makes sense to use the FileStream class directly (using FileMode.Open and FileAccess.ReadWrite). See FileStream on MSDN for a good overall example.
You do need to know the number of bytes that you are going to be reading from the stream. See the FileStream.Read documentation.
Fundamentally, you have to read the bytes into memory at some point if you're going to use and later modify their contents. So you will have to make an in-memory copy (using the Read method is the right way to go if you're reading a variable-length chunk at a time).
As for best practices, always dispose your streams when you're done; e.g.:
using (var stream = File.Open(FILE_NAME, FileMode.Open, FileAccess.ReadWrite))
{
//Do work with the FileStream here.
}
If you're going to do a large amount of work, you should be doing the work asynchronously. (Let us know if that's the case.)
And, of course, check the FileStream.Read documentation and also the FileStream.Write documentation before using those methods.
Reading bytes is best done by pre-allocating an in-memory array of bytes with the length that you're going to read, then reading those bytes. The following will read the chunk of bytes that you're interested in, let you do work on it, and then replace the original contents (assuming the length of the chunk hasn't changed):
EDIT: I've added a helper method to do work on the chunk, per the comments on variable scope.
using (var stream = File.Open(FILE_NAME, FileMode.Open, FileAccess.ReadWrite))
{
var chunk = new byte[numOfBytesInChunk];
var offsetOfChunkInFile = stream.Position; // It sounds like you've already calculated this.
stream.Read(chunk, 0, numOfBytesInChunk);
DoWorkOnChunk(ref chunk);
stream.Seek(offsetOfChunkInFile, SeekOrigin.Begin);
stream.Write(chunk, 0, numOfBytesInChunk);
}
private void DoWorkOnChunk(ref byte[] chunk)
{
//TODO: Any mutation done here to the data in 'chunk' will be written out to the stream.
}
I want to write a function in a way if i call it with an argument
of(100) and the path of particular file it will gets the first 100kb
data of the file, and when i call it for the second time with 200 it should return the next 200kb of data leaving the first 100. If there is no more left in the file it should return 0;
Thanks
most of what you want is handled by the System.IO.File and FileStream. If you want that exact function signature.
Step 1) you need to open a file with a particular path. System.IO.File has a few methods for doing this, including Open and OpenRead, as well as ReadAllXXXX, allowing you to access the flie contents in multiple ways. The one you'd probably want is OpenRead, which returns a FileStream object.
Step 2) you need to read a certain number of bytes. Once you have the FileStream from step 1, you should look at the Stream.ReadBytes method. given an array of bytes, it will read a specified number of bytes from the stream into the array
You may look at the StreamReader Class. It may get you to where you want to go, though I'm not sure of how specifically to break it down into the kb chunks you want.