Sending GZipped data via TcpClient [duplicate] - c#

I've got a pesky problem with GZipStream targeting .NET 3.5. This is my first time working with GZipStream; I have modeled my code after a number of tutorials, including here, but I'm still stuck.
My app serializes a DataTable to XML and inserts it into a database, storing the compressed data in a varbinary(max) field along with the original length of the uncompressed buffer. When I need it later, I retrieve that data, decompress it, and recreate the DataTable. The decompression is what seems to fail.
EDIT: Sadly, after changing GetBuffer to ToArray as suggested, my issue remains. Code updated below.
Compress code:
DataTable dt = new DataTable("MyUnit");
//do stuff with dt
//okay... now compress the table
using (MemoryStream xmlstream = new MemoryStream())
{
    //instead of stream, use xmlwriter?
    System.Xml.XmlWriterSettings settings = new System.Xml.XmlWriterSettings();
    settings.Encoding = Encoding.GetEncoding(1252);
    settings.Indent = false;
    System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(xmlstream, settings);
    try
    {
        dt.WriteXml(writer);
        writer.Flush();
    }
    catch (ArgumentException)
    {
        //likely an encoding issue... okay, base64 encode it
        var base64 = Convert.ToBase64String(xmlstream.ToArray());
        xmlstream.Write(Encoding.GetEncoding(1252).GetBytes(base64), 0, Encoding.GetEncoding(1252).GetBytes(base64).Length);
    }
    using (MemoryStream zipstream = new MemoryStream())
    {
        GZipStream zip = new GZipStream(zipstream, CompressionMode.Compress);
        log.DebugFormat("Compressing commands...");
        zip.Write(xmlstream.GetBuffer(), 0, xmlstream.ToArray().Length);
        zip.Flush();
        float ratio = (float)zipstream.ToArray().Length / (float)xmlstream.ToArray().Length;
        log.InfoFormat("Resulting compressed size is {0:P2} of original", ratio);
        using (SqlCommand cmd = new SqlCommand())
        {
            cmd.CommandText = "INSERT INTO tinydup (lastid, command, compressedlength) VALUES (@lastid, @compressed, @length)";
            cmd.Connection = db;
            cmd.Parameters.Add("@lastid", SqlDbType.Int).Value = lastid;
            cmd.Parameters.Add("@compressed", SqlDbType.VarBinary).Value = zipstream.ToArray();
            cmd.Parameters.Add("@length", SqlDbType.Int).Value = xmlstream.ToArray().Length;
            cmd.ExecuteNonQuery();
        }
    }
}
Decompress Code:
/* This is an encapsulation of what I get from the database
public class DupUnit{
    public uint lastid;
    public uint complength;
    public byte[] compressed;
}*/
//I have already retrieved my list of work to do from the database in a List<DupUnit> dupunits
foreach (DupUnit unit in dupunits)
{
    DataSet ds = new DataSet();
    //DataTable dt = new DataTable();
    //uncompress and extract to original datatable
    try
    {
        using (MemoryStream zipstream = new MemoryStream(unit.compressed))
        {
            GZipStream zip = new GZipStream(zipstream, CompressionMode.Decompress);
            byte[] xmlbits = new byte[unit.complength];
            //WHY ARE YOU ALWAYS 0!!!!!!!!
            int bytesdecompressed = zip.Read(xmlbits, 0, unit.compressed.Length);
            MemoryStream xmlstream = new MemoryStream(xmlbits);
            log.DebugFormat("Uncompressed XML against {0} is: {1}", m_source.DSN, Encoding.GetEncoding(1252).GetString(xmlstream.ToArray()));
            try
            {
                ds.ReadXml(xmlstream);
            }
            catch (Exception)
            {
                //it may have been base64 encoded... decode first.
                ds.ReadXml(Encoding.GetEncoding(1254).GetString(
                    Convert.FromBase64String(
                        Encoding.GetEncoding(1254).GetString(xmlstream.ToArray())))
                    );
            }
            xmlstream.Dispose();
        }
    }
    catch (Exception e)
    {
        log.Error(e);
        Thread.Sleep(1000);//sleep a sec!
        continue;
    }
}
Note the comment above... bytesdecompressed is always 0. Any ideas? Am I doing it wrong?
EDIT 2:
So this is weird. I added the following debug code to the decompression routine:
GZipStream zip = new GZipStream(zipstream, CompressionMode.Decompress);
byte[] xmlbits = new byte[unit.complength];
int offset = 0;
while (zip.CanRead && offset < xmlbits.Length)
{
    while (zip.Read(xmlbits, offset, 1) == 0) ;
    offset++;
}
When debugging, sometimes that loop would complete, but other times it would hang. When I paused the debugger, it would be at byte 1600 of 1616; when I continued, it wouldn't move at all.
EDIT 3: The bug appears to be in the compress code. For whatever reason, it is not saving all of the data. When I try to decompress the data using a third party gzip mechanism, I only get part of the original data.
I'd start a bounty, but I really don't have much reputation to give as of now :-(

Finally found the answer. The compressed data wasn't complete because GZipStream.Flush() does absolutely nothing to ensure that all of the data is out of the buffer - you need to use GZipStream.Close() as pointed out here. Of course, if you get a bad compress, it all goes downhill - if you try to decompress it, you will always get 0 returned from the Read().
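For reference, here is a minimal sketch of the compress side with the GZipStream disposed (which calls Close) before the backing MemoryStream is read. It assumes the same DataTable, logger, and column layout as the question; it is an illustration of the fix, not a drop-in replacement:

    // Serialize the DataTable's XML into a byte array first.
    byte[] xmlBytes;
    using (var xmlstream = new MemoryStream())
    {
        dt.WriteXml(xmlstream);          // dt is the DataTable from the question
        xmlBytes = xmlstream.ToArray();  // ToArray, not GetBuffer
    }

    // Compress it; the GZipStream must be closed/disposed BEFORE reading zipstream,
    // otherwise the final compressed block and gzip trailer are never written.
    byte[] compressed;
    using (var zipstream = new MemoryStream())
    {
        using (var zip = new GZipStream(zipstream, CompressionMode.Compress))
        {
            zip.Write(xmlBytes, 0, xmlBytes.Length);
        }   // Dispose flushes the remaining buffered data and writes the trailer
        compressed = zipstream.ToArray();  // works even though the stream is closed
    }
    // 'compressed' goes into the varbinary(max) column, xmlBytes.Length into the length column.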

I'd say this line, at least, is the most wrong:
cmd.Parameters.Add("#compressed", SqlDbType.VarBinary).Value = zipstream.GetBuffer();
MemoryStream.GetBuffer:
Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method.
It should also be noted that the zip format works by first locating data stored at the end of the file - so if you've stored more bytes than were actually written, the entries expected at the "end" of the file aren't where the reader looks for them.
As an aside, I'd also recommend a different name for your compressedlength column - I'd initially taken it (despite your narrative) as being intended to store, well, the length of the compressed data (and written part of my answer to address that). Maybe originalLength would be a better name?
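To make the GetBuffer/ToArray distinction concrete, here is a small illustrative snippet (the 256 is just the typical initial buffer growth, an implementation detail):

    using (var ms = new MemoryStream())
    {
        byte[] payload = Encoding.ASCII.GetBytes("test");
        ms.Write(payload, 0, payload.Length);

        byte[] raw = ms.GetBuffer();   // whole internal buffer, typically 256 bytes here
        byte[] data = ms.ToArray();    // exactly the 4 bytes that were written

        Console.WriteLine("{0} vs {1}", raw.Length, data.Length);
    }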

Related

File API seems to always write corrupt files when used in a loop, except for the last file

I know the title is long, but it describes the problem exactly. I didn't know how else to explain it because this is totally out there.
I have a utility written in C# targeting .NET Core 2.1 that downloads and decrypts (AES encryption) files originally uploaded by our clients from our encrypted store, so they can be reprocessed through some of our services in the case that they fail. This utility is run via CLI using database IDs for the files as arguments, for example download.bat 101 102 103 would download 3 files with the corresponding IDs. I'm receiving byte data through a message queue (really not much more than a TCP socket) which describes a .TIF image.
I have a good reason to believe that the byte data is not ever corrupted on the server. That reason is when I run the utility with only one ID parameter, such as download.bat 101, then it works just fine. Furthermore, when I run it with multiple IDs, the last file that is downloaded by the utility is always intact, but the rest are always corrupted.
This odd behavior has persisted across two different implementations for writing the byte data to a file. Those implementations are below.
File.WriteAllBytes implementation:
private static void WriteMessageContents(FileServiceResponseEnvelope envelope, string destination, byte[] encryptionKey, byte[] macInitialVector)
{
    using (var inputStream = new MemoryStream(envelope.Payload))
    using (var outputStream = new MemoryStream(envelope.Payload.Length))
    {
        var sha512 = YellowAesEncryptor.DecryptStream(inputStream, outputStream, encryptionKey, macInitialVector, 0);
        File.WriteAllBytes(destination, outputStream.ToArray());
        _logger.LogStatement($"Finished writing [{envelope.Payload.Length} bytes] to [{destination}].", LogLevel.Debug);
    }
}
FileStream implementation:
private static void WriteMessageContents(FileServiceResponseEnvelope envelope, string destination, byte[] encryptionKey, byte[] macInitialVector)
{
    using (var inputStream = new MemoryStream(envelope.Payload))
    using (var outputStream = new MemoryStream(envelope.Payload.Length))
    {
        var sha512 = YellowAesEncryptor.DecryptStream(inputStream, outputStream, encryptionKey, macInitialVector, 0);
        using (FileStream fs = new FileStream(destination, FileMode.Create))
        {
            var bytes = outputStream.ToArray();
            fs.Write(bytes, 0, envelope.Payload.Length);
            _logger.LogStatement($"File byte content: [{string.Join(", ", bytes.Take(16))}]", LogLevel.Trace);
            fs.Flush();
        }
        _logger.LogStatement($"Finished writing [{envelope.Payload.Length} bytes] to [{destination}].", LogLevel.Debug);
    }
}
This method is called from a loop that first receives the messages I described earlier and then feeds their payloads to the method above:
using (var requestSocket = new RequestSocket(fileServiceEndpoint))
{
    // Envelopes is constructed beforehand
    foreach (var envelope in envelopes)
    {
        var timer = Stopwatch.StartNew();
        requestSocket.SendMoreFrame(messageTypeBytes);
        requestSocket.SendMoreFrame(SerializationHelper.SerializeObjectToBuffer(envelope));
        if (!requestSocket.TrySendFrame(_timeout, signedPayloadBytes, signedPayloadBytes.Length))
        {
            var message = $"Timeout exceeded while processing [{envelope.ActionType}] request.";
            _logger.LogStatement(message, LogLevel.Error);
            throw new Exception(message);
        }
        var responseReceived = requestSocket.TryReceiveFrameBytes(_timeout, out byte[] responseBytes);
        ...
        var responseEnvelope = SerializationHelper.DeserializeObject<FileServiceResponseEnvelope>(responseBytes);
        ...
        _logger.LogStatement($"Received response with payload of [{responseEnvelope.Payload.Length} bytes].", LogLevel.Info);
        var destDir = downloadDetails.GetDestinationPath(responseEnvelope.FileId);
        if (!Directory.Exists(destDir))
            Directory.CreateDirectory(destDir);
        var dest = Path.Combine(destDir, idsToFileNames[responseEnvelope.FileId]);
        WriteMessageContents(responseEnvelope, dest, encryptionKey, macInitialVector);
    }
}
I also know that TIFs have a very specific header, which looks something like this in raw bytes:
[73, 73, 42, 0, 8, 0, 0, 0, 20, 0...
It always begins with "II" (73, 73) or "MM" (77, 77) followed by 42 (probably a Hitchhiker's reference). I analyzed the bytes written by the utility. The last file always has a header that resembles this one. The rest are always random bytes; seemingly jumbled or mis-ordered image binary data. Any insight on this would be greatly appreciated because I can't wrap my mind around what I would even need to do to diagnose this.
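As a diagnostic aid, the header check described above can be written as a small helper. This is a hypothetical sketch (the method name is made up; the magic values come from the TIFF signature mentioned in the question):

    // Returns true if the buffer starts with a plausible TIFF header:
    // "II" (little-endian) or "MM" (big-endian) followed by the magic number 42.
    private static bool LooksLikeTiff(byte[] bytes)
    {
        if (bytes == null || bytes.Length < 4)
            return false;

        bool littleEndian = bytes[0] == 0x49 && bytes[1] == 0x49; // "II" (73, 73)
        bool bigEndian    = bytes[0] == 0x4D && bytes[1] == 0x4D; // "MM" (77, 77)
        if (!littleEndian && !bigEndian)
            return false;

        ushort magic = littleEndian
            ? (ushort)(bytes[2] | bytes[3] << 8)
            : (ushort)(bytes[2] << 8 | bytes[3]);
        return magic == 42;
    }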
UPDATE
I was able to figure out this problem with the help of elgonzo in the comments. Sometimes it isn't a direct answer that helps, but someone picking your brain until you look in the right place.
All right, as I suspected this was a dumb mistake (I had severe doubts that the File API was simply this flawed for so long). I just needed help thinking through it. There was an additional bit of code which I didn't post that was biting me, when I was retrieving the metadata for the file so that I could then request the file from our storage box.
byte[] encryptionKey = null;
byte[] macInitialVector = null;
...
using (var conn = new SqlConnection(ConnectionString))
using (var cmd = new SqlCommand(uploadedFileQuery, conn))
{
    conn.Open();
    var reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        FileServiceMessageEnvelope readAllEnvelope = null;
        var originalFileName = reader["UploadedFileClientName"].ToString();
        var fileId = Convert.ToInt64(reader["UploadedFileId"].ToString());
        //var originalFileExtension = originalFileName.Substring(originalFileName.IndexOf('.'));
        //_logger.LogStatement($"Scooped extension: {originalFileExtension}", LogLevel.Trace);
        envelopes.Add(readAllEnvelope = new FileServiceMessageEnvelope
        {
            ActionType = FileServiceActionTypeEnum.ReadAll,
            FileType = FileTypeEnum.UploadedFile,
            FileName = reader["UploadedFileServerName"].ToString(),
            FileId = fileId,
            WorkerAuthorization = null,
            BinaryTimestamp = DateTime.Now.ToBinary(),
            Position = 0,
            Count = Convert.ToInt32(reader["UploadedFileSize"]),
            SignerFqdn = _messengerConfig.FullyQualifiedDomainName
        });
        readAllEnvelope.SignMessage(_messengerConfig.PrivateKeyBytes, _messengerConfig.PrivateKeyPassword);
        signedPayload = new SecureMessage { Payload = new byte[0] };
        signedPayload.SignMessage(_messengerConfig.PrivateKeyBytes, _messengerConfig.PrivateKeyPassword);
        signedPayloadBytes = SerializationHelper.SerializeObjectToBuffer(signedPayload);
        encryptionKey = (byte[])reader["UploadedFileEncryptionKey"];
        macInitialVector = (byte[])reader["UploadedFileEncryptionMacInitialVector"];
    }
    conn.Close();
}
Eagle-eyed observers might realize that I have not properly coupled the encryptionKey and macInitialVector to the correct record, since each file has a unique key and vector. This means I was using the key for one of the files to decrypt all of them which is why they were all corrupt except for one file -- they were not properly decrypted. I solved this issue by coupling them together with the ID in a simple POCO and retrieving the appropriate key and vector for each file upon decryption.
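A minimal sketch of that coupling, with made-up type and property names (the real envelope and reader columns are the ones shown above):

    // Hypothetical POCO tying each file's decryption material to its ID.
    public class FileCryptoInfo
    {
        public long FileId { get; set; }
        public byte[] EncryptionKey { get; set; }
        public byte[] MacInitialVector { get; set; }
    }

    // While reading the metadata rows, keep one entry per file...
    var cryptoByFileId = new Dictionary<long, FileCryptoInfo>();
    cryptoByFileId[fileId] = new FileCryptoInfo
    {
        FileId = fileId,
        EncryptionKey = (byte[])reader["UploadedFileEncryptionKey"],
        MacInitialVector = (byte[])reader["UploadedFileEncryptionMacInitialVector"]
    };

    // ...and later look up the matching key/IV for the file being decrypted.
    var crypto = cryptoByFileId[responseEnvelope.FileId];
    WriteMessageContents(responseEnvelope, dest, crypto.EncryptionKey, crypto.MacInitialVector);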

Validate potential new rows to add to DataGridView from a .BIN file

Ignore/skip the first constant record which contains info like store id, location and date opened... Read the rest (info like purchase id, item name, date purchased)...
I am creating an application that decodes .bin files and displays the info in a DataGridView table. When a row is selected, more info is displayed in additional fields. However, unwanted data always ends up in the first row of each file and sometimes the last row. I want to validate the records and display only the ones that pass. My code simply adds whatever it sees after some conversion. When a user selects a file to import, the method Info_to_Table is called.
byte[] rec_arr = new byte[32767];

private void Info_to_Table()
{
    rec_arr = File.ReadAllBytes(import.FileName);
    FileInfo fi = new FileInfo(import.FileName);
    long length = fi.Length;
    long count = 0;
    using (FileStream Fs = new FileStream(import.FileName, FileMode.Open, FileAccess.Read))
    using (BinaryReader Br = new BinaryReader(Fs))
    {
        // Read other info here
        while ((length = Br.Read(rec_arr, (int)0, 1024)) > 0) // *
        {
            // Read other info here
            Dgv.Rows.Add(DecodeLong(rec_arr, 4), count++, DecodeDateTime(rec_arr, 28));
        }
        Br.Close();
        Fs.Close();
    }
}
When a record (row) is selected:
private void Dgv_SelectionChanged(object sender, EventArgs e)
{
    To_Fields();
}
Each time a record is selected:
private void To_Fields()
{
    rec_arr = File.ReadAllBytes(import.FileName);
    FileInfo file_info_bin = new FileInfo(import.FileName);
    long length_bin = file_info_bin.Length;
    int rec_num_to_read = Dgv.CurrentRow.Index;
    using (FileStream FS = new FileStream(import.FileName, FileMode.Open, FileAccess.Read))
    using (BinaryReader BR = new BinaryReader(FS))
    {
        do
        {
            FS.Seek(rec_num_to_read * 1024, SeekOrigin.Begin);
            // Read other info here!
            Status(rec_arr);
            foreach (var rec in rec_arr)
            {
                rec_num_to_read++;
            }
        }
        while ((length_bin = BR.Read(rec_arr, 0, 1024)) > 0);
        BR.Close();
        FS.Close();
    }
}
Is there a way of validating the file information before it populates the table, ignoring any records that aren't correct? The entire row is wrong if the first column's number is larger than 22500.
What it does now (wrong) and what I would like it to do (correct) are shown in two screenshots (not reproduced here).
Each record in the file is 1024 bytes long. I thought the solution would be in the line marked with * above, since that is where I read bytes from the stream (byte[], index, count), with index as the starting point in the array.
Generally, the files all have the same structure; the screenshot showed a large file with 682 records.
Since you didn't provide any description of the data you're dealing with, it's hard to come up with anything better than this:
while ((length = Br.Read(rec_arr, (int)0, 1024)) > 0)
{
    // Read other info here
    var foo = DecodeLong(rec_arr, 4);
    if (foo != 825045041)
        Dgv.Rows.Add(foo, count++, DecodeDateTime(rec_arr, 28));
}
And in the To_Fields method you can just skip one record:
int rec_num_to_read = Dgv.CurrentRow.Index + 1;
However, I tried to run your code with the data from the screenshot, and it didn't work. I skipped 1024 bytes (the unwanted record), and what I got from the beginning of the next record doesn't seem valid either. The value is 542966816, and the date is just a mess. I don't know the date format, but it's clear that something is wrong.
I don't have enough reputation to post images, so here's a link: http://i.stack.imgur.com/AdIAa.gif
I'd like to add that several statements are useless.
You don't need to call Close(), that's what using is for. Explanation on MSDN.
Br.Close();
Fs.Close();
The remarks below hold, assuming there's no code you have omitted.
Reading is done in a loop, so ReadAllBytes is not needed.
rec_arr = File.ReadAllBytes(import.FileName);
length is not used either.
FileInfo fi = new FileInfo(import.FileName);
long length = fi.Length;
And one more thing, long is 64 bit in C#. Use int which is 32 bit instead. So it's better to rename DecodeLong to DecodeInt32, because it's quite confusing really.
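Putting those remarks together, a trimmed version of the read loop might look roughly like this. This is a sketch only: it keeps the question's identifiers (import, Dgv, DecodeDateTime), uses the record-skip value from the earlier answer, and assumes DecodeLong has been renamed to DecodeInt32 as suggested:

    private void Info_to_Table()
    {
        long count = 0;
        var record = new byte[1024];

        using (var fs = new FileStream(import.FileName, FileMode.Open, FileAccess.Read))
        using (var br = new BinaryReader(fs))
        {
            // No Close() calls needed: the using blocks dispose both streams.
            while (br.Read(record, 0, record.Length) > 0)
            {
                var id = DecodeInt32(record, 4);
                if (id != 825045041)   // skip the constant header record, as suggested above
                    Dgv.Rows.Add(id, count++, DecodeDateTime(record, 28));
            }
        }
    }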

LINQtoCSV Stream provided to read is either null, or does not support seek

I am trying to use LINQtoCSV to parse out a CSV file into a list of objects and am receiving the error "Stream provided to read is either null, or does not support seek."
The error is happening at foreach(StockQuote sq in stockQuotesStream)
Below is the method that is throwing the error. The .CSV file is being downloaded from the internet and is never stored to disk (only stored to StreamReader).
public List<StockQuote> CreateStockQuotes(string symbol)
{
    List<StockQuote> stockQuotes = new List<StockQuote>();
    CsvFileDescription inputFileDescription = new CsvFileDescription
    {
        SeparatorChar = ',',
        FirstLineHasColumnNames = false
    };
    CsvContext cc = new CsvContext();
    IEnumerable<StockQuote> stockQuotesStream = cc.Read<StockQuote>(GetCsvData(symbol));
    foreach (StockQuote sq in stockQuotesStream)
    {
        stockQuotes.Add(sq);
    }
    return stockQuotes;
}
The .CSV file is being downloaded from the internet and is never stored to disk (only stored to StreamReader).
Well presumably that's the problem. It's not quite clear what you mean by this, in that if you have wrapped a StreamReader around it, that's a pain in terms of the underlying stream - but you can't typically seek on a stream being downloaded from the net, and it sounds like the code you're using requires a seekable stream.
One simple option is to download the whole stream into a MemoryStream (use Stream.CopyTo if you're using .NET 4), then rewind the MemoryStream (set Position to 0) and pass that to the Read method.
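A rough sketch of that suggestion, assuming a CsvContext.Read overload that accepts a StreamReader (as the question's GetCsvData already returns one); csvUrl and the surrounding method are placeholders:

    // Download the CSV fully into memory so LINQtoCSV gets a seekable stream.
    using (var client = new WebClient())
    using (var buffer = new MemoryStream(client.DownloadData(csvUrl)))
    {
        buffer.Position = 0; // rewind before reading (explicit, even though this MemoryStream starts at 0)
        using (var reader = new StreamReader(buffer))
        {
            var cc = new CsvContext();
            // Materialize with ToList() while the stream is still open.
            List<StockQuote> quotes = cc.Read<StockQuote>(reader, inputFileDescription).ToList();
            return quotes;
        }
    }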
Using a MemoryStream first and then a StreamReader was the answer, but I went about it a little differently than mentioned.
WebClient client = new WebClient();
using (MemoryStream download = new MemoryStream(client.DownloadData(url)))
{
    using (StreamReader dataReader = new StreamReader(download, System.Text.Encoding.Default, true))
    {
        return dataReader;
    }
}

Why is http content different when sent from c# vs java?

I have an xml file that I need to send to a REST server as a post. When I read the exact same file from c# and java the bytes do not match when they arrive at the server. The java ones fail with a 500 Internal Server Error while the c# one works perfectly. The server is c#.
The file in c# is read as follows:
using (ms = new MemoryStream())
{
    string fullPath = @"c:\pathtofile\datalast.xml";
    using (FileStream outStream = File.OpenRead(fullPath))
    {
        outStream.CopyTo(ms);
        outStream.Flush();
    }
    ms.Position = 0;
    var xmlDoc = new XmlDocument();
    xmlDoc.Load(ms);
    content = xmlDoc.OuterXml;
}
content is then sent to a call that uses an HttpWebResponse
The java (Android) code reads the file like this:
FileInputStream fis = app.openFileInput(DATA_LAST_FILE_NAME);
byte[] buffer = new byte[1024];
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
int len;
while ((len = fis.read(buffer)) != -1)
{
    outputStream.write(buffer, 0, len);
}
outputStream.close();
fis.close();
ByteArrayEntity data = new ByteArrayEntity(buffer);
data.setContentType("application/xml");
post.setEntity(data);
HttpResponse response = request.execute(post);
For the most part the arrays generated are identical. The only difference seems to be in the first 3 bytes. The c# byte array's first 3 values are:
239,187,191
The java ones are:
-17,-69,-65
What is happening here? What should I do?
Thanks.
Look at what you're doing here:
FileInputStream fis = app.openFileInput(DATA_LAST_FILE_NAME);
byte[] buffer = new byte[1024];
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
int len;
while ((len = fis.read(buffer)) != -1)
{
    outputStream.write(buffer, 0, len);
}
outputStream.close();
fis.close();
ByteArrayEntity data = new ByteArrayEntity(buffer);
You're creating the ByteArrayEntity from the buffer that you've used when reading the data. It's almost certainly not the right length (it will always be length 1024), and it may well not have all the data either.
You should be using the ByteArrayOutputStream you've been writing into, e.g.
ByteArrayEntity data = new ByteArrayEntity(outputStream.toByteArray());
(You should be closing fis in a finally block, by the way.)
EDIT: The values you've printed to the console are indeed just showing the differences between signed and unsigned representations. They have nothing to do with the reason the Java code is failing, which is due to the above problem, I believe. You should look at what's being sent over the wire in Wireshark - that'll show you what's really going on.
Take a look at this: http://en.wikipedia.org/wiki/Byte_order_mark
EDIT: The reason the printed values differ between Java and C# is that C#'s byte is unsigned while Java's byte is signed. The underlying binary values are the same.
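For illustration, the three bytes in question are simply the UTF-8 byte order mark; a quick C# sketch showing both representations:

    // The UTF-8 BOM that XmlDocument/StreamWriter emit with a UTF-8 encoding.
    byte[] bom = Encoding.UTF8.GetPreamble(); // { 0xEF, 0xBB, 0xBF }
    foreach (byte b in bom)
        Console.WriteLine("unsigned {0}  signed {1}", b, (sbyte)b); // 239/-17, 187/-69, 191/-65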

string serialization and deserialization problem

I'm trying to serialize and deserialize a string using the following code:
private byte[] StrToBytes(string str)
{
    BinaryFormatter bf = new BinaryFormatter();
    MemoryStream ms = new MemoryStream();
    bf.Serialize(ms, str);
    ms.Seek(0, 0);
    return ms.ToArray();
}

private string BytesToStr(byte[] bytes)
{
    BinaryFormatter bfx = new BinaryFormatter();
    MemoryStream msx = new MemoryStream();
    msx.Write(bytes, 0, bytes.Length);
    msx.Seek(0, 0);
    return Convert.ToString(bfx.Deserialize(msx));
}
These two methods work fine as long as I only play with string variables in memory.
But if I serialize a string and save it to a file, then read it back and deserialize it, I end up with only the first portion of the string.
So I believe I have a problem with my file save/read operations. Here is the code for my save/read:
private byte[] ReadWhole(string fileName)
{
    try
    {
        using (BinaryReader br = new BinaryReader(new FileStream(fileName, FileMode.Open)))
        {
            return br.ReadBytes((int)br.BaseStream.Length);
        }
    }
    catch (Exception)
    {
        return null;
    }
}

private void WriteWhole(byte[] wrt, string fileName, bool append)
{
    FileMode fm = FileMode.OpenOrCreate;
    if (append)
        fm = FileMode.Append;
    using (BinaryWriter bw = new BinaryWriter(new FileStream(fileName, fm)))
    {
        bw.Write(wrt);
    }
    return;
}
Any help will be appreciated.
Many thanks
Sample Problematic Run:
WriteWhole(StrToBytes("First portion of text"),"filename",true);
WriteWhole(StrToBytes("Second portion of text"),"filename",true);
byte[] readBytes = ReadWhole("filename");
string deserializedStr = BytesToStr(readBytes); // here deserializedStr becomes "First portion of text"
Just use
Encoding.UTF8.GetBytes(string s)
Encoding.UTF8.GetString(byte[] b)
and don't forget to add System.Text in your using statements
BTW, why do you need to serialize a string and save it that way?
You can just use File.WriteAllText() or File.WriteAllBytes. The same way you can read it back, File.ReadAllBytes() and File.ReadAllText()
The problem is that you are writing two strings to the file, but only reading one back.
If you want to read back multiple strings, then you must deserialize multiple strings. If there are always two strings, then you can just deserialize two strings. If you want to store any number of strings, then you must first store how many strings there are, so that you can control the deserialization process.
If you are trying to hide data (as indicated by your comment to another answer), then this is not a reliable way to accomplish that goal. On the other hand, if you are storing data an a user's hard-drive, and the user is running your program on their local machine, then there is no way to hide the data from them, so this is as good as anything else.
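As a sketch of the "store how many strings there are" idea from above (BinaryWriter/BinaryReader already length-prefix each string, so a count up front is all that's needed; the method names are illustrative):

    // Write: the count, followed by each string (BinaryWriter length-prefixes strings itself).
    private static void WriteStrings(string fileName, IList<string> values)
    {
        using (var bw = new BinaryWriter(new FileStream(fileName, FileMode.Create)))
        {
            bw.Write(values.Count);
            foreach (var value in values)
                bw.Write(value);
        }
    }

    // Read: the stored count tells us exactly how many strings to pull back out.
    private static List<string> ReadStrings(string fileName)
    {
        var result = new List<string>();
        using (var br = new BinaryReader(new FileStream(fileName, FileMode.Open)))
        {
            int count = br.ReadInt32();
            for (int i = 0; i < count; i++)
                result.Add(br.ReadString());
        }
        return result;
    }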
