GZipStream -- the decompressed file is missing data - C#

I need to get a file from my server via FTP into a memory stream and then decompress it so I can further work with it.
I do the following, but the decompressed file is truncated every time.
I can see that the FTP part is working correctly (I checked that ms.Length equals the correct file size on the server, about 700KB).
res.Length is only about 400KB, but it should be about 10MB (I can also see in the Console.WriteLine(res) output that the file is truncated).
I get a MemoryStream from my FTP code then...
var decompress = new GZipStream(ms, CompressionMode.Decompress);
using (var sr = new StreamReader(decompress))
{
    var res = sr.ReadToEnd();
    Console.WriteLine(res);
}

Related

Convert object to CSV and then compress without touching physical storage

Scenario
I have an object that I convert to a flat CSV and then compress and upload to a file store.
I could easily do this by following the steps below:
Convert object to CSV file.
Compress file.
Upload file.
However
I do not want the penalty that comes with touching physical storage, so I would like to do all of this in memory.
Current Incorrect Implementation
Convert object to CSV byte array
Compress byte array
Upload byte array to file store
Problem
What I'm essentially doing is compressing a byte array and uploading that, which is obviously wrong (because when the compressed GZip file is decompressed, it contains a byte array of the CSV and not the actual CSV itself).
Is it possible to create a file like "file.csv" in memory and then compress that in memory, instead of compressing a byte array?
The problem I'm having is that it seems I can only name the file and specify its extension when saving to a physical location.
Code Example of Current Implementation
public byte[] Example(IEnumerable<object> data)
{
    // Convert object to CSV and write to byte array.
    byte[] bytes = null;
    using (var ms = new MemoryStream())
    {
        TextWriter writer = new StreamWriter(ms);
        var csv = new CsvWriter(writer);
        csv.WriteRecords(data);
        writer.Flush();
        bytes = ms.ToArray();
    }
    // Compress byte array
    using (var sourceStream = new MemoryStream(bytes))
    using (var resultStream = new MemoryStream())
    {
        using (var zipStream = new GZipStream(resultStream, CompressionMode.Compress))
        {
            sourceStream.CopyTo(zipStream);
        }
        var gzipByteArray = resultStream.ToArray();
        // Upload to AzureStorage
        new AzureHelper().UploadFromByteArray(gzipByteArray, 0, gzipByteArray.Length);
        return gzipByteArray;
    }
}
Wrap the Stream you use for the upload in a GZipStream, write your CSV to that, and then you'll have uploaded the compressed CSV.
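For example, a minimal sketch of that approach, reusing the CsvWriter and AzureHelper types from the question (the leaveOpen flag keeps the MemoryStream alive when the GZipStream is disposed, which is the moment the gzip trailer gets written):
public void Example(IEnumerable<object> data)
{
    using (var ms = new MemoryStream())
    {
        using (var gzip = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
        using (var writer = new StreamWriter(gzip))
        using (var csv = new CsvWriter(writer))   // CsvWriter as used in the question
        {
            // The CSV is written straight into the compression stream;
            // no intermediate file or CSV byte array ever exists.
            csv.WriteRecords(data);
        } // disposing the GZipStream here flushes the gzip trailer
        var gzipByteArray = ms.ToArray();
        new AzureHelper().UploadFromByteArray(gzipByteArray, 0, gzipByteArray.Length);
    }
}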

Decompress file with wrong size

I have a method that decompresses *.gz files:
using (FileStream originalFileStream = new FileStream(gztempfilename, FileMode.Open, FileAccess.Read))
{
    using (FileStream decompressedFileStream = new FileStream(outputtempfilename, FileMode.Create, FileAccess.Write))
    {
        using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
        {
            decompressionStream.CopyTo(decompressedFileStream);
        }
    }
}
It worked perfectly, but recently I received a pack of files with the wrong size:
When I open them with 7-Zip they have Packed Size ~ 1,600,000 and Size = 7 (it should be ~20,000,000).
So when I extract them using this code I get only a part of the file. But when I extract the same file using 7-Zip I get the full file.
How can I handle this situation in my code?
My guess is that the other end makes a mistake when GZipping the files. It looks like it does not set the ISIZE bytes correctly.
The ISIZE bytes are the last four bytes of a valid GZip file and come after a 32-bit CRC value which in turn comes directly after the compressed data bytes.
7-Zip seems to be robust against such mistakes whereas the GZipStream is not. It is odd however that 7-Zip is not showing you any errors. It should show you (tested with 7-Zip 16.02 x64/Win7)...
CRC error in case the size is simply wrong,
"Unexpected end of data" in case some or all of the ISIZE bytes are cut off,
"There are some data after end of the payload data" in case there is more data following the ISIZE bytes.
7-Zip always uses the last four bytes of the packed file to determine the size of the original unpacked file without checking if the file is valid and whether the bytes read for that are actually the ISIZE bytes.
You can verify this by checking those last four bytes of the GZipped file with a hex viewer. For your example they should be exactly 07 00 00 00.
If you know the exact size of the unpacked original file you could replace those bytes so that they specify the correct size. For instance, if the unpacked file's size is 20,000,078, which is 01312D4E in hex (0-padded to eight digits), those bytes should be 4E 2D 31 01.
In case you don't know the exact size you can try replacing them with the maximum value, i.e. FF FF FF FF.
After that try your unpack code again.
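A minimal sketch of both steps (the file name and size are illustrative; ISIZE is the uncompressed size modulo 2^32, stored little-endian, which is what BitConverter produces on the usual little-endian platforms):
// Read the trailing four bytes and print the size the file claims to have.
byte[] tail = new byte[4];
using (var fs = File.OpenRead("received.gz")) // file name is illustrative
{
    fs.Seek(-4, SeekOrigin.End);
    fs.Read(tail, 0, 4);
}
Console.WriteLine(BitConverter.ToUInt32(tail, 0)); // prints 7 for the broken files

// Patch the ISIZE field with the real size (or 0xFFFFFFFF if unknown).
uint realSize = 20000000; // assumed known uncompressed size
using (var fs = new FileStream("received.gz", FileMode.Open, FileAccess.Write))
{
    fs.Seek(-4, SeekOrigin.End);
    byte[] isize = BitConverter.GetBytes(realSize);
    fs.Write(isize, 0, isize.Length);
}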
This is obviously only a hacky solution to your problem. Better try fixing the code that GZips the files you receive or try to find a library that is more robust than GZipStream.
I've used ICSharpCode.SharpZipLib.GZip.GZipInputStream from this library instead of System.IO.Compression.GZipStream and it helped.
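For reference, the drop-in replacement looks roughly like this (same file handling as the question's code, with SharpZipLib's GZipInputStream doing the decompression; the class ships in the ICSharpCode.SharpZipLib package):
using (FileStream originalFileStream = new FileStream(gztempfilename, FileMode.Open, FileAccess.Read))
using (FileStream decompressedFileStream = new FileStream(outputtempfilename, FileMode.Create, FileAccess.Write))
using (var decompressionStream = new ICSharpCode.SharpZipLib.GZip.GZipInputStream(originalFileStream))
{
    decompressionStream.CopyTo(decompressedFileStream);
}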
Did you try this to check the size? i.e.:
byte[] bArray;
using (FileStream f = new FileStream(tempFile, FileMode.Open))
{
    bArray = new byte[f.Length];
    f.Read(bArray, 0, (int)f.Length);
}
Regards
try:
GZipStream uncompressed = new GZipStream(streamIn, CompressionMode.Decompress, true);
FileStream streamOut = new FileStream(tempDoc[0], FileMode.Create, FileAccess.Write, FileShare.None);
What looks like a bug in GZipStream (not writing the original file length at the end of the .gz file) is actually by design: the gzip trailer, which holds the CRC-32 and the original length (ISIZE), is only written when the GZipStream is closed or disposed.
You need to change the way you compress your files using GZipStream, so that the output is read only after the compression stream has been disposed.
This way will work:
inputBytes = Encoding.UTF8.GetBytes(output);
using (var outputStream = new MemoryStream())
{
    using (var gZipStream = new GZipStream(outputStream, CompressionMode.Compress))
        gZipStream.Write(inputBytes, 0, inputBytes.Length);
    System.IO.File.WriteAllBytes("file.xml.gz", outputStream.ToArray());
}
And this way will cause the error you have (no matter whether you call Flush() or not):
inputBytes = Encoding.UTF8.GetBytes(output);
using (var outputStream = new MemoryStream())
{
    using (var gZipStream = new GZipStream(outputStream, CompressionMode.Compress))
    {
        gZipStream.Write(inputBytes, 0, inputBytes.Length);
        System.IO.File.WriteAllBytes("file.xml.gz", outputStream.ToArray());
    }
}
You might need to call decompressedStream.Seek() after closing the gZip stream.
As shown here.

Download a PDF from a third party using ASP.NET HttpWebRequest/HttpWebResponse

I want to send a URL as a query string, e.g.
localhost/abc.aspx?url=http://www.site.com/report.pdf
and detect whether the URL returns a PDF file. If it returns a PDF, it should be saved automatically; otherwise an error should be shown.
There are some pages that use a handler to fetch the files, so I want to detect and download those as well.
localhost/abc.aspx?url=http://www.site.com/page.aspx?fileId=223344
The above may return a PDF file.
What is the best way to capture this?
Thanks
You can download a PDF like this
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
HttpWebResponse response = (HttpWebResponse)req.GetResponse();
// Check the content type returned; it may carry a charset suffix,
// e.g. "application/pdf; charset=utf-8", so take the part before ';'.
string fileType = null;
string contentType = response.ContentType;
if (contentType != null)
{
    string[] splitString = contentType.Split(';');
    fileType = splitString[0];
}
// See if it's a PDF
if (fileType == "application/pdf")
{
    // Save it; response streams are not seekable and may not report a
    // Length, so copy the stream instead of reading into a sized buffer.
    using (Stream stream = response.GetResponseStream())
    using (FileStream fileStream = File.Create(fileFullPath))
    {
        stream.CopyTo(fileStream);
    }
}
response.Close();
The fact that it may come from an .aspx handler doesn't actually matter; it's the MIME type returned in the server response that is used.
If you are getting a generic MIME type, like application/octet-stream, then you must use a more heuristic approach.
Assuming you cannot simply use the file extension (eg for .aspx), then you can copy the file to a MemoryStream first (see How to get a MemoryStream from a Stream in .NET?). Once you have a memory stream of the file, you can take a 'cheeky' peek at it (I say cheeky because it's not the correct way to parse a PDF file)
I'm not an expert on PDF format, but I believe reading the first 5 chars with an ASCII reader will yield "%PDF-", so you can identify that with
bool isPDF;
// leaveOpen: true (requires .NET 4.5), otherwise disposing the reader
// also disposes the MemoryStream and the Position reset below would throw.
using (StreamReader srAsciiFromStream = new StreamReader(memoryStream,
    System.Text.Encoding.ASCII, detectEncodingFromByteOrderMarks: false,
    bufferSize: 1024, leaveOpen: true))
{
    string firstLine = srAsciiFromStream.ReadLine();
    isPDF = firstLine != null && firstLine.StartsWith("%PDF-");
}
// Set the memory stream back to the start so you can save the file
memoryStream.Position = 0;

send GZIP stream over WCF

Below is my code.
I set the Content-Encoding header, write the file stream into a memory stream using gzip compression, and finally return the memory stream.
However, the Android, iOS, and web browser clients all receive corrupt copies of the stream. None of them are able to fully read the decompressed stream on the other side. Which vital part am I missing?
public Stream GetFileStream(String path, String basePath)
{
    FileInfo fi = new FileInfo(basePath + path);
    //WebOperationContext.Current.OutgoingResponse.ContentType = "application/x-gzip";
    WebOperationContext.Current.OutgoingResponse.Headers.Add("Content-Encoding", "gzip");
    MemoryStream ms = new MemoryStream();
    GZipStream CompressStream = new GZipStream(ms, CompressionMode.Compress);
    // Get the stream of the source file.
    FileStream inFile = fi.OpenRead();
    // Prevent compressing hidden and already compressed files.
    if ((File.GetAttributes(fi.FullName) & FileAttributes.Hidden)
        != FileAttributes.Hidden & fi.Extension != ".gz")
    {
        // Copy the source file into the compression stream.
        inFile.CopyTo(CompressStream);
        Log.d(String.Format("Compressed {0} from {1} to {2} bytes.",
            fi.Name, fi.Length.ToString(), ms.Length.ToString()));
    }
    ms.Position = 0;
    inFile.Close();
    return ms;
}
I'd strongly recommend sending a byte array instead. Then, on the client side, create a gzip stream from the received byte array.
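A minimal sketch of that approach, assuming the service contract can be changed to return byte[] (names mirror the question). The key difference from the question's code is that the GZipStream is disposed before the bytes are taken, which is what writes the gzip CRC-32/ISIZE trailer:
public byte[] GetFileBytes(String path, String basePath)
{
    WebOperationContext.Current.OutgoingResponse.Headers.Add("Content-Encoding", "gzip");
    using (var ms = new MemoryStream())
    {
        // leaveOpen: true so the MemoryStream survives the GZipStream's Dispose.
        using (var compressStream = new GZipStream(ms, CompressionMode.Compress, leaveOpen: true))
        using (var inFile = File.OpenRead(basePath + path))
        {
            inFile.CopyTo(compressStream);
        } // the trailer is written here
        return ms.ToArray();
    }
}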

LINQtoCSV Stream provided to read is either null, or does not support seek

I am trying to use LINQtoCSV to parse out a CSV file into a list of objects and am receiving the error "Stream provided to read is either null, or does not support seek."
The error is happening at foreach(StockQuote sq in stockQuotesStream)
Below is the method that is throwing the error. The .CSV file is being downloaded from the internet and is never stored to disk (only stored to StreamReader).
public List<StockQuote> CreateStockQuotes(string symbol)
{
    List<StockQuote> stockQuotes = new List<StockQuote>();
    CsvFileDescription inputFileDescription = new CsvFileDescription
    {
        SeparatorChar = ',',
        FirstLineHasColumnNames = false
    };
    CsvContext cc = new CsvContext();
    IEnumerable<StockQuote> stockQuotesStream = cc.Read<StockQuote>(GetCsvData(symbol));
    foreach (StockQuote sq in stockQuotesStream)
    {
        stockQuotes.Add(sq);
    }
    return stockQuotes;
}
The .CSV file is being downloaded from the internet and is never stored to disk (only stored to StreamReader).
Well presumably that's the problem. It's not quite clear what you mean by this, in that if you have wrapped a StreamReader around it, that's a pain in terms of the underlying stream - but you can't typically seek on a stream being downloaded from the net, and it sounds like the code you're using requires a seekable stream.
One simple option is to download the whole stream into a MemoryStream (use Stream.CopyTo if you're using .NET 4), then rewind the MemoryStream (set Position to 0) and pass that to the Read method.
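A minimal sketch of that suggestion, assuming the CSV is fetched from a URL (cc, inputFileDescription and stockQuotes are the objects from the question; LINQtoCSV's Read<T> also accepts a StreamReader plus a CsvFileDescription):
using (var client = new WebClient())
using (Stream webStream = client.OpenRead(url)) // url is illustrative
{
    var buffered = new MemoryStream();
    webStream.CopyTo(buffered); // Stream.CopyTo requires .NET 4
    buffered.Position = 0;      // rewind so the CSV reader can seek
    foreach (StockQuote sq in cc.Read<StockQuote>(new StreamReader(buffered), inputFileDescription))
    {
        stockQuotes.Add(sq);
    }
}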
Using a MemoryStream first and then a StreamReader was the answer, but I went about it a little differently than mentioned.
WebClient client = new WebClient();
// Note: no using blocks here; disposing the StreamReader (or the
// MemoryStream under it) before returning would close the stream
// the caller still needs to read from.
MemoryStream download = new MemoryStream(client.DownloadData(url));
StreamReader dataReader = new StreamReader(download, System.Text.Encoding.Default, true);
return dataReader;
