Saving zip file entry into a list - c#

I have a List and I want to save several pictures into it, then I will binary serialize that list into a file.
I am getting the pictures from a zip file like this:
zip.GetEntry(path).Open()
The zip file opens correctly, and if I replace Open with ExtractToFile and extract the picture into a folder, it works with no problems.
But when I try to save the body of the picture into the list instead, as a stream, it doesn't work:
List.Add(zip.GetEntry(path).Open());
The picture is over 2 MB, yet when I serialize the list the resulting file is barely 2 kilobytes.
What am I doing wrong here?

ZipArchiveEntry.Open() returns a stream.
You need to read the stream using the Stream.Read(...) method somewhere in your code.
You can save a list of streams if you want, as long as you read them when you want to export the data.
The stream itself isn't the data; it only lets you read it.
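For example, a minimal sketch of draining the entry's stream with Stream.Read before storing the bytes (names taken from the question):
using (Stream entryStream = zip.GetEntry(path).Open())
using (MemoryStream ms = new MemoryStream())
{
    byte[] buffer = new byte[4096];
    int read;
    while ((read = entryStream.Read(buffer, 0, buffer.Length)) > 0)
        ms.Write(buffer, 0, read);
    byte[] pictureBytes = ms.ToArray(); // store these bytes, not the stream
}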

You can't directly serialize a Stream object. You should first read the contents into a byte[], then serialize that array.
First change your List:
List<byte[]> List = new List<byte[]>();
Then read the streams into this list. Since the Length property is not supported on compression streams, it is simpler to use a MemoryStream as a buffer:
using (Stream entryStream = zip.GetEntry(path).Open())
using (MemoryStream ms = new MemoryStream())
{
    entryStream.CopyTo(ms);
    List.Add(ms.ToArray());
}
And finally serialize the List.
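A sketch of that last step with the classic BinaryFormatter (obsolete in modern .NET; any serializer that handles List<byte[]> works, and the file name here is made up):
using (FileStream fs = new FileStream("pictures.bin", FileMode.Create))
{
    var formatter = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
    formatter.Serialize(fs, List); // writes the whole list of byte arrays to disk
}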

Difference between loading a file from a path and from a stream C#

This is a follow up question to this question:
Difference between file path and file stream?
I didn't fully understand everything answered in the linked question.
I am using the Microsoft.SqlServer.Dac.BacPackage which contains a Load method with 2 overloads - one that receives a string path and one that receives a Stream.
This is the documentation of the Load method:
https://learn.microsoft.com/en-us/dotnet/api/microsoft.sqlserver.dac.bacpackage.load?view=sql-dacfx-150
What exactly is the difference between the two? Am I correct in assuming that the string-path overload loads the whole file into memory first, while the stream overload doesn't? Are there other differences?
No, the file will not usually be fully loaded all at once.
A string path parameter normally means it will just open the file as a FileStream and pass it to the other version of the function. There is no reason why the stream should fully load the file into memory unless requested.
A Stream parameter means you open the file and pass the resulting Stream. You could also pass any other type of Stream, such as a network stream, a zip or decryption stream, a memory-backed stream, anything really.
Short answer:
The fact that you have two methods, one that accepts a filename and one that accepts a stream is just for convenience. Internally, the one with the filename will open the file as a stream and call the other method.
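As an illustration of that pattern (a sketch only, not the actual DacFx source):
// Sketch: how a path overload typically delegates to a stream overload
public static BacPackage Load(string packageFileName)
{
    using (FileStream stream = File.OpenRead(packageFileName))
    {
        return Load(stream); // the real work happens in the stream overload
    }
}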
Longer answer:
You can consider a stream as a sequence of bytes. The reason to use a stream instead of a byte[] or List<byte> is that if the sequence is really, really large, and you don't need access to all the bytes at once, it would be a waste to put them all in memory before processing them.
For instance, if you want to calculate the checksum for all bytes in a file, you don't need to put all the data in memory before you can start calculating the sum. In fact, anything that can efficiently deliver the bytes one by one would suffice.
That is the reason why people would want to read a file as a stream.
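A sketch of that checksum idea, processing the stream in chunks without ever holding the whole file in memory (the simple additive sum is just for illustration):
static long ComputeChecksum(Stream input)
{
    long sum = 0;
    byte[] buffer = new byte[8192];
    int read;
    // Read chunk by chunk; memory use stays constant regardless of file size.
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < read; i++)
            sum += buffer[i];
    }
    return sum;
}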
The reason why people want a stream as input for their data is that it gives the caller the opportunity to specify the source of the data: callers can provide a stream that reads from a file, but also a stream with data from the internet, from a database, or from a text box; the procedure does not care, as long as it can read the bytes one by one, or sometimes per chunk of bytes:
using (Stream fileStream = File.OpenRead(fileName))
{
    ProcessInputData(fileStream);
}
Or:
byte[] bytesToProcess = ...
using (Stream memoryStream = new MemoryStream(bytesToProcess))
{
    ProcessInputData(memoryStream);
}
Or:
string operatorInput = this.textBox1.Text;
using (Stream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(operatorInput)))
{
    ProcessInputData(memoryStream);
}
Conclusion
Methods use streams in their interface to indicate that they don't need all data in memory at once. One-by-one, or per chunk is enough. The caller is free to decide where the data comes from.

Read a part of a file contained in a zip archive

This question is similar to How to read data from a zip file without having to unzip the entire file, except that I'd like to go a bit further and read only part of a file, i.e. get the file stream and seek to a position whose offset in bytes I know. I don't know if the zip format allows that in the first place.
I've tried to seek in the stream returned by ZipArchive.Entries[...].Open(), but it throws, saying the operation is not supported.
I can of course just read (and discard) the contents up to the point I'm interested in, but this is slow for big files.
EDIT: An example to make clear what I want to do:
Let's say I have a file archive.zip containing several files, one of them is bigfile.bin. I already know how to decompress bigfile.bin without decompressing the other files of archive.zip, no problem there. My question is: can I skip 10 000 000 bytes of bigfile.bin and start reading what remains? Those 10 000 000 bytes would be measured in the decompressed stream of course.
using (var archive = ZipFile.OpenRead("archive.zip"))
{
    using (var data = archive.Entries.Single(e => e.Name == "bigfile.bin").Open())
    {
        data.Seek(10000000, SeekOrigin.Begin); // this is what I want to do but it doesn't work
        data.Read(/*etc*/);
    }
}
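For reference, the read-and-discard workaround mentioned above looks something like this (a sketch; the buffer size is arbitrary):
static void Skip(Stream stream, long bytesToSkip)
{
    byte[] buffer = new byte[81920];
    // The deflate stream can't seek, so read past the bytes you don't need.
    while (bytesToSkip > 0)
    {
        int read = stream.Read(buffer, 0, (int)Math.Min(buffer.Length, bytesToSkip));
        if (read == 0) break; // end of stream reached early
        bytesToSkip -= read;
    }
}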

Appending bytes to an existing file C#

I am working on a steganography software in C#, more precisely for video files. My approach is to append the extra information at the end of a video file. However, I must first read the whole video file into memory. I used the File.ReadAllBytes() method in C# to read a video file (around 200 MB) into a byte array. Then I create a new array with the video's bytes and my data's bytes. But this sometimes causes an OutOfMemoryException, and when it doesn't, it is very slow. Is there a more efficient way to append bytes to an existing file in C# that will solve this issue? Thank you.
Open the file with FileMode.Append
var stream = new FileStream(path, FileMode.Append);
FileMode Enumeration
FileMode.Append:
Opens the file if it exists and seeks to the end of the file, or creates a new file. This requires FileIOPermissionAccess.Append permission. FileMode.Append can be used only in conjunction with FileAccess.Write. Trying to seek to a position before the end of the file throws an IOException exception, and any attempt to read fails and throws a NotSupportedException exception.
Sure, it's easy:
using (var stream = File.Open(path, FileMode.Append))
{
    stream.Write(extraData, 0, extraData.Length);
}
No need to read the file first.
I wouldn't class this as steganography though - that would involve making subtle changes to the video frames such that it's still a valid video and looks the same to the human eye, but the extra data is encoded within those frames so it can be extracted later.
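If you later need to extract the appended data, one option is to also append a fixed-size length trailer; a minimal sketch (this trailer format is my own assumption, not something from the question):
// Sketch: write the payload, then an 8-byte length trailer.
static void AppendPayload(string path, byte[] payload)
{
    using (var stream = File.Open(path, FileMode.Append))
    {
        stream.Write(payload, 0, payload.Length);
        stream.Write(BitConverter.GetBytes((long)payload.Length), 0, 8);
    }
}

// Sketch: read the trailer, then the payload that precedes it.
// A robust version would loop until Read has delivered all requested bytes.
static byte[] ReadPayload(string path)
{
    using (var stream = File.OpenRead(path))
    {
        byte[] trailer = new byte[8];
        stream.Seek(-8, SeekOrigin.End);
        stream.Read(trailer, 0, 8);
        long length = BitConverter.ToInt64(trailer, 0);

        byte[] payload = new byte[length];
        stream.Seek(-8 - length, SeekOrigin.End);
        stream.Read(payload, 0, (int)length);
        return payload;
    }
}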
attempt this method, I am unsure if it will yield faster results, but logically it should.
: https://stackoverflow.com/a/6862460/2835725

How to "include" a .PNG image in a binary file?

I'm doing some resource management code where I take a bunch of different resources (image positions etc.) along with the actual images and make a single binary file out of them. Now, how do I actually include the .PNG file in a binary file and how do I read it back again? I would like to retain the .PNG compression.
I use BinaryWriter to write the data into a file, and BinaryReader to read it back. Here's an example of the format I'm using:
BinaryWriter writer = new BinaryWriter(new FileStream("file.tmp", FileMode.Create));
writer.Write(name);
writer.Write(positionX);
writer.Write(positionY);
// Here should be the binary data of the PNG image
writer.Close();

BinaryReader reader = new BinaryReader(new FileStream("file.tmp", FileMode.Open));
string name = reader.ReadString();
float posX = reader.ReadSingle();
float posY = reader.ReadSingle();
Bitmap bitmap = ... // Here I'd like to get the PNG data
reader.Close();
There is some other data too, both before and after the PNG data. Basically I will merge multiple PNG files into this one binary file.
You will need to use some sort of prefix (int), followed by a length indicator (int), followed by your payload (variable length); or, if you know this will be the last thing in your file, you can skip the prefix/size and just read until the end of the stream.
Then when you save your various parts, you write your prefix, then your length, and then your data.
Ideally you could use one of the serialisers like protobuf to do a lot of the serialising for you; then you can just load your class back. I do this in one of my projects for plugin installers. The final file is a zip, but the structures that generate the file (filenames, description, actual file locations, etc.) are stored in a custom file like the one you are describing.
If you are doing this all in memory, you could serialise your PNG image to a MemoryStream (to get the size), then write the size to the FileStream (file.tmp), followed by the MemoryStream buffer:
using (MemoryStream ms = new MemoryStream())
{
    bitmap.Save(ms, ImageFormat.Png); // keeps the PNG compression
    writer.Write(ms.Length);
    ms.Position = 0;
    ms.CopyTo(writer.BaseStream);
}
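Reading it back mirrors the write: read the Int64 length prefix written above, then the PNG bytes, then load the image from a MemoryStream (a sketch):
long length = reader.ReadInt64();
byte[] pngData = reader.ReadBytes((int)length);
// GDI+ keeps using the stream, so don't dispose it while the Bitmap is alive.
MemoryStream ms = new MemoryStream(pngData);
Bitmap bitmap = new Bitmap(ms);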
Basically Paul Farry's answer is what you need. Read up on binary formats like the PNG format (see file header, chunks) and the ZIP format (file headers, data descriptor), which implement -- on a more elaborate level than you need -- the mechanism of storing different chunks of data in one file.

Is there a way to make this faster? MemoryStream vs FileStream

I am working with iTextSharp, and need to generate hundreds of thousands of RTF documents - the resulting files are between 5KB and 500KB.
I am listing two approaches below. The original approach wasn't necessarily slow, but I figured: why write to and read from a file just to get the output string I need? I saw the other approach using MemoryStream, but it actually slowed things down. I essentially just need the outputted RTF content, so that I can run some filters on it to clean up unnecessary formatting. The queries bringing back the data are very quick (instant-seeming). Generating 1,000 files (actually 2,000 files are created in the process) takes about 15 minutes with the original approach; the same run takes about 25-30 minutes with the second approach. The resulting files average around 80 KB.
Is there something wrong with the second approach? Seems like it should be faster than the first one, not slower.
Original approach:
RtfWriter2.GetInstance(doc, new FileStream(RTFFilePathName, FileMode.Create));
doc.Open();
//Add Tables and stuff here
doc.Close(); // It saves a file here to RTFFilePathName
StreamReader srRTF = new StreamReader(RTFFilePathName);
string rtfText = srRTF.ReadToEnd();
srRTF.Close();
//Do additional things with rtfText before writing to my final file
New approach, trying to speed things up, but it is actually half as fast:
MemoryStream stream = new MemoryStream();
RtfWriter2.GetInstance(doc, stream);
doc.Open();
//Add Tables and stuff here
doc.Close();
string rtfText = ASCIIEncoding.ASCII.GetString(stream.GetBuffer());
stream.Close();
//Do additional things with rtfText before writing to my final file
The second approach I am trying I found here:
iTextSharp - How to generate a RTF document in the ClipBoard instead of a file
How big is your resulting stream? MemoryStream performs a lot of memory copy operations while growing, so for large results it may take significantly longer to write data in small chunks compared with FileStream.
To verify whether this is the problem, set the initial size of the MemoryStream to some large value around the resulting size and re-run the code.
To fix it, you can pre-grow the memory stream initially (if you know the approximate output size) or write your own stream that uses a different growth scheme. Using a temporary file might also be good enough for your purposes as is.
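A sketch of that pre-sizing, based on the question's second approach (the 100 KB capacity is just a guess from the ~80 KB average):
using (var stream = new MemoryStream(100 * 1024)) // pre-sized, so it doesn't re-allocate while growing
{
    RtfWriter2.GetInstance(doc, stream);
    doc.Open();
    // Add tables and stuff here
    doc.Close();
    string rtfText = Encoding.ASCII.GetString(stream.ToArray()); // ToArray copies only the bytes written
}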
Like Alexei said, it's probably caused by the fact that you are creating a MemoryStream every time, and it continuously re-allocates memory as it grows. Try creating only one stream and resetting it to the beginning before every write.
Also, I think stream.GetBuffer() returns new memory each time, so try using the same StreamReader with your MemoryStream.
It also seems your code can easily be parallelised, so you could try running it using Parallel Extensions or the ThreadPool.
And it seems a little weird that you are writing your text as bytes into a stream, then reading the stream back as bytes and converting it to text. Wouldn't it be possible to save your document directly as text?
A MemoryStream is not associated with a file, and has no concept of a filename. Basically, you can't do that.
You certainly can't cast between them; you can only cast upwards and downwards - not sideways; to visualise:
         Stream
        /      \
FileStream    MemoryStream
You can cast a MemoryStream to a Stream trivially, and a Stream to a MemoryStream via a type-check; but never a FileStream to a MemoryStream. That is like saying a dog is an animal, and an elephant is an animal, so we can cast a dog to an elephant.
You could subclass MemoryStream and add a Name property (that you supply a value for), but there would still be no commonality between a FileStream and a YourCustomMemoryStream, and FileStream doesn't implement a pre-existing interface to get a Name; so the caller would have to explicitly handle both separately, or use duck-typing (maybe via dynamic or reflection).
Another option (perhaps easier) might be: write your data to a temporary file; use a FileStream from there; then (later) delete the file.
I know this is old, but there is a lot of misinformation in this thread.
It's all about buffer size. The internal buffers are significantly smaller with a memory stream vs a file stream. Smaller buffers cause more reads/writes.
Just initialize your memory stream with either a file stream or a byte array sized around 80 KB. Close the doc, set the stream position to 0, and read the contents to the end.
On a side note, GetBuffer() will return the whole allocated buffer. So if you only wrote 1 byte and the buffer is 4 KB, you will have a lot of garbage in your string.
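Concretely, with ms as the MemoryStream (a sketch):
// GetBuffer() returns the entire allocated buffer, so limit the conversion
// to the bytes actually written:
string rtfText = Encoding.ASCII.GetString(ms.GetBuffer(), 0, (int)ms.Length);
// or copy exactly the written bytes:
string rtfTextCopy = Encoding.ASCII.GetString(ms.ToArray());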
