I have a test program that demonstrates the end result that I am hoping for (even though in this test program the steps may seem unnecessary).
The program compresses data to a file using GZipStream. The resulting compressed file is C:\mydata.dat.
I then read this file, and write it to a new file.
//Read original file
string compressedFile = String.Empty;
using (StreamReader reader = new StreamReader(@"C:\mydata.dat"))
{
    compressedFile = reader.ReadToEnd();
    reader.Close();
    reader.Dispose();
}

//Write to a new file
using (StreamWriter file = new StreamWriter(@"C:\mynewdata.dat"))
{
    file.WriteLine(compressedFile);
}
When I try to decompress the two files, the original one decompresses perfectly, but the new file throws an InvalidDataException with the message "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream."
Why are these files different?
StreamReader is for reading a sequence of characters, not bytes. The same applies to StreamWriter. Since treating compressed files as a stream of characters doesn't make any sense, you should use some implementation of Stream. If you want to get the stream as an array of bytes, you can use MemoryStream.
The exact reason why using character streams doesn't work is that they assume the UTF-8 encoding by default. If some byte is not valid UTF-8 (like the second byte of the GZip header, 0x8B), it's represented as the Unicode “replacement character” (U+FFFD). When the string is written back, that character is encoded using UTF-8 into something completely different from what was in the source.
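For instance, a quick sketch of the mangling on just the first two header bytes (assuming System and System.Text are imported):

// The GZip header starts with 0x1F 0x8B; 0x8B on its own is not valid UTF-8.
byte[] original = { 0x1F, 0x8B };
string text = Encoding.UTF8.GetString(original);    // 0x8B becomes U+FFFD
byte[] roundTripped = Encoding.UTF8.GetBytes(text); // U+FFFD re-encodes as EF BF BD
Console.WriteLine(BitConverter.ToString(roundTripped)); // "1F-EF-BF-BD", not "1F-8B"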
For example, to read a file from a stream, get it as an array of bytes and then write it to another file as a stream:
byte[] bytes;
using (var fileStream = new FileStream(@"C:\mydata.dat", FileMode.Open))
using (var memoryStream = new MemoryStream())
{
    fileStream.CopyTo(memoryStream);
    bytes = memoryStream.ToArray();
}

using (var memoryStream = new MemoryStream(bytes))
using (var fileStream = new FileStream(@"C:\mynewdata.dat", FileMode.Create))
{
    memoryStream.CopyTo(fileStream);
}
The CopyTo() method is only available in .NET 4, but you can write your own if you use an older version.
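For example, a minimal sketch of such a hand-rolled copy (the 4 KB buffer size is an arbitrary choice):

// A possible stand-in for Stream.CopyTo() on .NET versions before 4
static void CopyStream(Stream input, Stream output)
{
    byte[] buffer = new byte[4096];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        output.Write(buffer, 0, read);
}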
Of course, for this simple example, there is no need to use streams. You can simply do:
byte[] bytes = File.ReadAllBytes(@"C:\mydata.dat");
File.WriteAllBytes(@"C:\mynewdata.dat", bytes);
EDIT: Apparently, my suggestions are wrong/invalid/whatever... please use one of the others, which have no doubt been highly refactored to the point where no extra performance could possibly be achieved (else, that would mean they are just as invalid as mine)
using (System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\mydata.dat"))
{
    using (System.IO.StreamWriter sw = new System.IO.StreamWriter(@"C:\mynewdata.dat"))
    {
        byte[] bytes = new byte[1024];
        int count = 0;
        while ((count = sr.BaseStream.Read(bytes, 0, bytes.Length)) > 0)
        {
            sw.BaseStream.Write(bytes, 0, count);
        }
    }
}
Read all bytes
byte[] bytes = null;
using (System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\mydata.dat"))
{
    bytes = new byte[sr.BaseStream.Length];
    int index = 0;
    int count = 0;
    // read at most what's left in the array, so the last Read doesn't overrun it
    while ((count = sr.BaseStream.Read(bytes, index, Math.Min(1024, bytes.Length - index))) > 0)
    {
        index += count;
    }
}
Read all bytes/write all bytes (from svick's answer):
byte[] bytes = File.ReadAllBytes(@"C:\mydata.dat");
File.WriteAllBytes(@"C:\mynewdata.dat", bytes);
PERFORMANCE TESTING WITH OTHER ANSWERS:
Just did a quick test between my answer (StreamReader) (first part above, file copy) and svick's answer (FileStream/MemoryStream) (the first one). The test is 1000 iterations of the code; here are the results from 4 tests (results are in whole seconds; all actual results were slightly over these values):
My Code | svick code
--------------------
9 | 12
9 | 14
8 | 13
8 | 14
As you can see, in my test at least, my code performed better. One thing perhaps to note with mine is that I am not reading a character stream; I am in fact accessing the BaseStream, which provides a byte stream. Perhaps svick's answer is slow because he is using two streams for reading, then two for writing. Of course, there is a lot of optimisation that could be done to svick's answer to improve the performance (and he also provided an alternative for simple file copy).
Testing with the third option (ReadAllBytes/WriteAllBytes):
My Code | svick code | 3rd
----------------------------
8 | 14 | 7
9 | 18 | 9
9 | 17 | 8
9 | 17 | 9
Note: measured in milliseconds, the third option was always better.
Related
I'm trying to replace only one byte of data in a file, meaning something like 0x05 -> 0x15.
I'm using the Replace function to do this.
string content;
using (StreamReader reader = new System.IO.StreamReader(Inputfile))
{
    content = reader.ReadToEnd();
    content = content.Replace("0x05", "0x15");
    reader.Close();
}

using (FileStream stream = new FileStream(outputfile, FileMode.Create))
{
    using (BinaryWriter writer = new BinaryWriter(stream, Encoding.UTF8))
    {
        writer.Write(content);
    }
}
Technically speaking, only that one byte of data had to be replaced with the new byte, but I see there are many bytes of data changed.
Why are other bytes changing? How can I achieve this?
You're talking about bytes, but you've written code that reads strings; strings are an interpretation of bytes, so if you truly do mean bytes, mangling them through strings is the wrong way to go.
Anyways, there are helper methods to make your life easy, if the file is relatively small (maybe up to 500 MB - I'd switch to an incremental streaming reading/changing/writing method if it's bigger than this).
If you want bytes changed:
var b = File.ReadAllBytes("path");
for (int x = 0; x < b.Length; x++)
    if (b[x] == 0x5)
        b[x] = (byte)0x15;
File.WriteAllBytes("path", b);
If your file is a text file that literally has "0x05" in it:
File.WriteAllText("path", File.ReadAllText("path").Replace("0x05", "0x15"));
In response to your question in the comments, and assuming you want your file to grow by 2 bytes more for each 0x05 it contains (so a 1000-byte file that contains three 0x05 bytes will be 1006 bytes after being written), it is probably simplest to:
var b = File.ReadAllBytes("path");
using (FileStream fs = new FileStream("path", FileMode.Create)) //replace file
{
    for (int x = 0; x < b.Length; x++)
    {
        if (b[x] == 0x5)
        {
            fs.WriteByte((byte)0x15);
            fs.WriteByte((byte)0x5);
            fs.WriteByte((byte)0x15);
        }
        else
        {
            fs.WriteByte(b[x]);
        }
    }
}
Don't worry about writing a single byte at a time - it is buffered elsewhere in the IO chain. You could go for a solution that writes blocks of bytes from the array if you wanted; this is just easier to code/understand.
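For example, a rough sketch of that block-writing variant (same hypothetical "path" as above):

var b = File.ReadAllBytes("path");
using (FileStream fs = new FileStream("path", FileMode.Create))
{
    int start = 0; // start of the current unmodified run
    for (int x = 0; x < b.Length; x++)
    {
        if (b[x] == 0x5)
        {
            fs.Write(b, start, x - start); // flush the untouched bytes in one call
            fs.WriteByte(0x15);
            fs.WriteByte(0x5);
            fs.WriteByte(0x15);
            start = x + 1;
        }
    }
    fs.Write(b, start, b.Length - start); // write the trailing run
}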
The file is only 14 KB (14,000 bytes). I have read that the varbinary(max) column type (which is what I am using) only supports 8,000 bytes. Is that correct? How can I upload my file into the database?
if (file.ContentLength < (3 * 1048576))
{
    // extract only the filename
    var fileName = Path.GetFileName(file.FileName);
    using (MemoryStream ms = new MemoryStream())
    {
        file.InputStream.CopyTo(ms);
        byte[] array = ms.GetBuffer();
        adj.resumeFile = array;
        adj.resumeFileContentType = file.ContentType;
    }
}
The error:
String or binary data would be truncated. The statement has been terminated.
Check your other columns that you are inserting into during this process. I would especially check the ContentType column as this will be something like image/jpeg and not simply image or jpeg.
Here is a list of possible content types so that you can create enough space in your ContentType column accordingly.
varbinary [ ( n | max) ]
Variable-length binary data. n can be a value from 1 through 8,000.
max indicates that the maximum storage size is 2^31-1 bytes.
http://msdn.microsoft.com/en-us/library/ms188362.aspx
So that is 2 GB.
If you defined your column as VARBINARY(MAX) in the table definition, then you should have up to 2 GB of storage space. If you specified the maximum column size as a number, then you can only explicitly ask for up to VARBINARY(8000).
See this question for more details
AFAIK VARBINARY(MAX) only appeared in SQL Server 2005, so if your database pre-dates that version you might need to upgrade it.
I know this isn't the answer to your question, but ms.GetBuffer() will get the underlying buffer which probably isn't the exact size of your data. The MemoryStream allocates extra room for writing and you are probably inserting extra bytes from the unused buffer. Here you can see that GetBuffer() returns a 256 byte array even though the file is only 5 bytes long:
using (MemoryStream ms = new MemoryStream())
{
    using (FileStream fs = File.OpenRead("C:\\t\\hello.txt"))
    {
        fs.CopyTo(ms);
        byte[] results = ms.GetBuffer();
        Console.WriteLine("Size: {0}", results.Length); // 256
        byte[] justdata = new byte[ms.Length];
        Array.Copy(results, justdata, ms.Length);
        Console.WriteLine("Size: {0}", justdata.Length); // 5
    }
}
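For what it's worth, MemoryStream.ToArray() performs exactly this right-sized copy for you, so the manual copy can collapse to one line:

byte[] justdata = ms.ToArray(); // returns only the ms.Length bytes actually written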
I have a 4 GB file that I want to perform a byte-based find and replace on. I have written a simple program to do it, but it takes far too long (90+ minutes) to do just one find and replace. A few hex editors I have tried can perform the task in under 3 minutes and don't load the entire target file into memory. Does anyone know a method where I can accomplish the same thing? Here is my current code:
public int ReplaceBytes(string File, byte[] Find, byte[] Replace)
{
    var Stream = new FileStream(File, FileMode.Open, FileAccess.ReadWrite);
    int FindPoint = 0;
    int Results = 0;
    for (long i = 0; i < Stream.Length; i++)
    {
        if (Find[FindPoint] == Stream.ReadByte())
        {
            FindPoint++;
            if (FindPoint > Find.Length - 1)
            {
                Results++;
                FindPoint = 0;
                Stream.Seek(-Find.Length, SeekOrigin.Current);
                Stream.Write(Replace, 0, Replace.Length);
            }
        }
        else
        {
            FindPoint = 0;
        }
    }
    Stream.Close();
    return Results;
}
Find and Replace are relatively small compared with the 4 GB "File", by the way. I can easily see why my algorithm is slow, but I am not sure how I could do it better.
Part of the problem may be that you're reading the stream one byte at a time. Try reading larger chunks and doing a replace on those. I'd start with about 8 KB and then test with some larger or smaller chunks to see what gives you the best performance.
There are lots of better algorithms for finding a substring in a string (which is basically what you are doing).
Start here:
http://en.wikipedia.org/wiki/String_searching_algorithm
The gist of them is that you can skip a lot of bytes by analyzing your substring. Here's a simple example:
4 GB file starts with: A B C D E F G H I J K L M N O P
Your substring is: N O P
You skip ahead by the length of the substring minus 1 and check against the last byte, so you compare C to P.
It doesn't match, so the substring is not in the first 3 bytes.
Also, C isn't in the substring at all, so you can skip 3 more bytes (the length of the substring).
Compare F to P: it doesn't match, F isn't in the substring, skip 3.
Compare I to P, etc., etc.
If you match, go backwards. If the character doesn't match but is in the substring, then you have to do some more comparing at that point (read the link for details).
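For illustration, a rough Boyer-Moore-Horspool sketch over a byte buffer (not the asker's code; it searches a single in-memory buffer, so matches spanning buffer boundaries would still need separate handling):

// Returns the index of the first occurrence of needle in haystack, or -1.
static int IndexOf(byte[] haystack, byte[] needle)
{
    // Bad-character table: how far we may shift when the byte under the
    // pattern's last position is the given value.
    int[] skip = new int[256];
    for (int i = 0; i < 256; i++)
        skip[i] = needle.Length;                 // byte not in pattern: skip whole length
    for (int i = 0; i < needle.Length - 1; i++)
        skip[needle[i]] = needle.Length - 1 - i; // distance from this byte to the end

    int pos = 0;
    while (pos <= haystack.Length - needle.Length)
    {
        int j = needle.Length - 1;
        while (j >= 0 && haystack[pos + j] == needle[j])
            j--;                                 // compare right to left
        if (j < 0)
            return pos;                          // full match
        pos += skip[haystack[pos + needle.Length - 1]];
    }
    return -1;
}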
Instead of reading the file byte by byte, read it buffer by buffer:
int bufferSize = 8 * 1024;
byte[] buffer = new byte[bufferSize];
long currentPos = 0;
long length = Stream.Length;
int count;
while ((count = Stream.Read(buffer, 0, bufferSize)) > 0)
{
    currentPos += count;
    ....
}
Another, easier way of reading more than one byte at a time:
var Stream = new BufferedStream(new FileStream(File, FileMode.Open, FileAccess.ReadWrite));
Combining this with Saeed Amiri's example of how to read into a buffer, and one of the better binary find/replace algorithms should give you better results.
You should try using memory-mapped files. .NET supports them starting with version 4.0 (System.IO.MemoryMappedFiles).
A memory-mapped file contains the contents of a file in virtual memory.
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
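A minimal sketch of the API (hypothetical path; byte-by-byte only to keep it short - in practice you would scan in blocks):

using System.IO;
using System.IO.MemoryMappedFiles;

string path = @"C:\hugefile.dat";            // hypothetical file
long fileLength = new FileInfo(path).Length;

using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
using (var accessor = mmf.CreateViewAccessor(0, fileLength))
{
    // single-byte find/replace, just to show the shape of the accessor API
    for (long i = 0; i < fileLength; i++)
    {
        if (accessor.ReadByte(i) == 0x05)
            accessor.Write(i, (byte)0x15);
    }
}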
I wrote a quick method to write a file from a stream, but it's not done yet. I receive this exception and I can't figure out why:
Unable to read beyond the end of the stream
Is there anyone who could help me debug it?
public static bool WriteFileFromStream(Stream stream, string toFile)
{
    FileStream fileToSave = new FileStream(toFile, FileMode.Create);
    BinaryWriter binaryWriter = new BinaryWriter(fileToSave);

    using (BinaryReader binaryReader = new BinaryReader(stream))
    {
        int pos = 0;
        int length = (int)stream.Length;
        while (pos < length)
        {
            int readInteger = binaryReader.ReadInt32();
            binaryWriter.Write(readInteger);
            pos += sizeof(int);
        }
    }

    return true;
}
Thanks a lot!
Not really an answer to your question, but this method could be so much simpler, like this:
public static void WriteFileFromStream(Stream stream, string toFile)
{
    // don't forget the using for releasing the file handle after the copy
    using (FileStream fileToSave = new FileStream(toFile, FileMode.Create))
    {
        stream.CopyTo(fileToSave);
    }
}
Note that I also removed the return value, since it's pretty much useless; in your code there is only one return statement.
Apart from that, you perform a Length check on the stream, but many streams don't support checking Length.
As for your problem: you first check whether the stream is at its end. If not, you read 4 bytes. Here is the problem. Let's say you have an input stream of 6 bytes. First you check if the stream is at its end. The answer is no, since there are 6 bytes left. You read 4 bytes and check again. Of course the answer is still no, since there are 2 bytes left. Now you read another 4 bytes, but that of course fails, since there are only 2 bytes (ReadInt32 reads the next 4 bytes).
I presume that the input stream has ints only (Int32). You need to test the PeekChar() method:
while (binaryReader.PeekChar() != -1)
{
    int readInteger = binaryReader.ReadInt32();
    binaryWriter.Write(readInteger);
}
You are doing while (pos < length), and length is the actual length of the stream in bytes. So you are effectively counting the bytes in the stream and then trying to read that many ints (which is incorrect). You could take length to be stream.Length / 4, since an Int32 is 4 bytes.
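In other words, a sketch of the loop with the count fixed (assuming the stream holds nothing but whole Int32 values):

int intCount = (int)(stream.Length / sizeof(int)); // number of ints, not bytes
for (int i = 0; i < intCount; i++)
{
    binaryWriter.Write(binaryReader.ReadInt32());
}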
Try:
int length = (int)binaryReader.BaseStream.Length;
After the binary reader has read the stream, the stream's position is at the end; you have to set the position back to zero: stream.Position = 0;
I am trying to implement this protocol (http://developer.valvesoftware.com/wiki/Source_RCON_Protocol) from a C# .NET application. The part applicable to the code I am implementing is under the heading "Receiving". However, I am not positive I have the byte sizes correct when constructing the packet.
Here is my function to construct a packet...
private static byte[] ConstructPacket(int request_id, int cmdtype, string cmd)
{
    MemoryStream stream = new MemoryStream();
    using (BinaryWriter writer = new BinaryWriter(stream))
    {
        byte[] cmdBytes = ConvertStringToByteArray(cmd);
        int packetSize = 12 + cmdBytes.Length;

        // Packet Contents
        writer.Write((int)packetSize);  // Byte size of Packet not including This
        writer.Write((int)request_id);  // 4 Bytes
        writer.Write((int)cmdtype);     // 4 Bytes
        writer.Write(cmdBytes);         // 8 Bytes ??

        // NULL String 1
        writer.Write((byte)0x00);
        writer.Write((byte)0x00);

        // NULL String 2
        writer.Write((byte)0x00);
        writer.Write((byte)0x00);

        // Memory Stream to Byte Array
        byte[] buffer = stream.ToArray();
        return buffer;
    }
}
According to the Protocol specifications, packetSize is the byte size of the packet not including itself.
The first two ints would make it 8 bytes...
The "cmdBytes", which in this particular instance is "testpass", would be 8 bytes I believe...
Then the final two null-delimited strings (if I set these up right) would be 4 bytes.
So by my calculations, the packet should be 20 bytes big, but it doesn't seem to be working properly. Are the sizes I calculated correct, and am I setting up the NULL-delimited strings properly for C# .NET?
You write two zeros too many. Pretty easy to see in the examples: they all end with two zeros, not four.
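In other words, something like this (a sketch reusing the question's variable names; if I'm reading the spec right, the size field excludes itself):

int packetSize = 10 + cmdBytes.Length; // 4 (id) + 4 (type) + body + 2 null bytes

writer.Write(packetSize);
writer.Write(request_id);
writer.Write(cmdtype);
writer.Write(cmdBytes);
writer.Write((byte)0x00); // terminates the command string
writer.Write((byte)0x00); // the empty second string's terminator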
You should probably call writer.Flush() after the last writer.Write(). Otherwise you run the risk of disposing the writer before it's finished writing everything to the stream.
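For example (a sketch using the names from the question):

writer.Write((byte)0x00);         // last write
writer.Flush();                   // make sure everything has reached the MemoryStream
byte[] buffer = stream.ToArray(); // now safe to snapshot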