Content-Length Occasionally Wrong on Simple C# HTTP Server - c#

For some experimentation I was working with the Simple HTTP Server code here.
In one case I wanted it to serve some ANSI-encoded text configuration files. I am aware there are more issues with this code, but the only one I'm currently concerned with is that Content-Length is wrong, but only for certain text files.
Example code:
Output stream initialisation:
outputStream = new StreamWriter(new BufferedStream(socket.GetStream()));
The handling of HTTP get:
public override void handleGETRequest(HttpProcessor p)
{
    if (p.http_url.EndsWith(".pac"))
    {
        string filename = Path.Combine(Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location), p.http_url.Substring(1));
        Console.WriteLine(string.Format("HTTP request for : {0}", filename));
        if (File.Exists(filename))
        {
            FileInfo fi = new FileInfo(filename);
            DateTime lastWrite = fi.LastWriteTime;
            Stream fs = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.Read);
            StreamReader sr = new StreamReader(fs);
            string result = sr.ReadToEnd().Trim();
            Console.WriteLine(fi.Length);
            Console.WriteLine(result.Length);
            p.writeSuccess("application/x-javascript-config", result.Length, lastWrite);
            p.outputStream.Write(result);
            // fs.CopyTo(p.outputStream.BaseStream);
            p.outputStream.BaseStream.Flush();
            fs.Close();
        }
        else
        {
            Console.WriteLine("404 - FILE not found!");
            p.writeFailure();
        }
    }
}
public void writeSuccess(string content_type, long length, DateTime lastModified) {
    outputStream.Write("HTTP/1.0 200 OK\r\n");
    outputStream.Write("Content-Type: " + content_type + "\r\n");
    outputStream.Write("Last-Modified: {0}\r\n", lastModified.ToUniversalTime().ToString("r"));
    outputStream.Write("Accept-Range: bytes\r\n");
    outputStream.Write("Server: FlakyHTTPServer/1.3\r\n");
    outputStream.Write("Date: {0}\r\n", DateTime.Now.ToUniversalTime().ToString("r"));
    outputStream.Write(string.Format("Content-Length: {0}\r\n\r\n", length));
}
For most files I've tested with, Content-Length is correct. However, when testing with the HTTP debugging tool Fiddler, a protocol violation on Content-Length is sometimes reported.
For example fiddler says:
Request Count: 1
Bytes Sent: 303 (headers:303; body:0)
Bytes Received: 29,847 (headers:224; body:29,623)
So Content-Length should be 29623. But the HTTP header generated is
Content-Length: 29617
I saved the body of the HTTP content from Fiddler and visually compared the files, but couldn't notice any difference. Then I loaded them into Beyond Compare's hex compare; there are several differences like these:
Original File: 2D 2D 96 20 2A 2F
HTTP Content : 2D 2D EF BF BD 20 2A 2F
Original File: 27 3B 0D 0A 09 7D 0D 0A 0D 0A 09
HTTP Content : 27 3B 0A 09 7D 0A 0A 09
I suspect the problem is related to encoding, but I'm not exactly sure. I'm only serving ANSI-encoded files, no Unicode.
I made the file serve correctly, with the right Content-Length, by modifying byte sequences in three parts of the file:
2D 2D 96 (--–) to 2D 2D 2D (---)

Based on the bytes you pasted, it looks like there are a couple of things going wrong here. First, it seems that CRLF in your input file (0D 0A) is being converted to just LF (0A). Second, it looks like the character encoding is changing, either when reading the file into a string or when writing the string to the HTTP client.
The HTTP Content-Length represents the number of bytes in the stream, whereas string.Length gives you the number of characters in the string. Unless your file is exclusively using the first 128 ASCII characters (which precludes non-English characters as well as special windows-1252 characters like the euro sign), it's unlikely that string.Length will exactly equal the length of the string encoded in either UTF-8 or ISO-8859-1.
If you convert the string to a byte[] before sending it to the client, you'll be able to get the "true" Content-Length. However, you'll still end up with mangled text if you didn't read the file using the proper encoding. (Whether you specify the encoding or not, a conversion is happening when reading the file into a string of Unicode characters.)
I highly recommend specifying the charset in the Content-Type header (e.g. application/x-javascript-config;charset=utf-8). It doesn't matter whether your charset is utf-8, utf-16, iso-8859-1, windows-1251, etc., as long as it's the same character encoding you use when converting your string into a byte[].
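To make the character-versus-byte point concrete, here is a minimal sketch (my own illustration, not the asker's code). The en dash in the question's hex dump (0x96 in windows-1252) is a single character but three bytes in UTF-8, so string.Length and the encoded byte count diverge; Content-Length must come from the byte array you actually send:

```csharp
// Sketch: string.Length counts characters, Content-Length counts bytes.
// The en dash below is one char but three bytes in UTF-8.
using System;
using System.Text;

class ContentLengthDemo
{
    static void Main()
    {
        string text = "--\u2013 */";                // "--– */", 6 characters
        byte[] body = Encoding.UTF8.GetBytes(text); // 2D 2D E2 80 93 20 2A 2F

        Console.WriteLine(text.Length);  // 6
        Console.WriteLine(body.Length);  // 8 -- use this for Content-Length
    }
}
```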

Related

ZLIB.net (.NET) zeros after inflating a byte array (randomly working)

I am confused by the behavior of the Deflate algorithm: the first chunk of bytes (about 12–13 KB) always decompresses successfully, but the second decompression never succeeds.
I am using DotNetZip (DeflateStream) with simple code; later I switched to ZLIB.NET (ComponentAce), Org.BouncyCastle, and a variety of other C# libraries.
The compression is done in C++ (the server that sends the packets) with deflateInit2, windowBits = -15 (15, nowrap).
What could be going wrong such that I'm getting zeros at the end of the buffer even though the decompression reported success?
An example with "Org.BouncyCastle.Utilities.Zlib" (it's pretty much the same code for almost any library: DotNetZip, ZLIB.NET, ...):
internal static bool Inflate(byte[] compressed, out byte[] decompressed)
{
    using (var inputStream = new MemoryStream(compressed))
    using (var zInputStream = new ZInputStream(inputStream, true))
    using (var outputStream = new MemoryStream())
    {
        zInputStream.CopyTo(outputStream);
        decompressed = outputStream.ToArray();
    }
    return true;
}
To make sure everything works correctly, you should check the following:
Both zlib versions match on both sides (compression on the server, decompression on the client).
The flush mode is set to sync; that means the buffer must be synchronized in order to decompress further packets sent by the server.
Ensure that the packets you received are actually correct. In my specific case, I was appending a constant-size array (0xFFFF) which could differ from the size of the received data (and it did in most cases).
[EDIT 13th Nov. '19]
Keep in mind that, by convention, the server may not send the last four bytes (the sync flush tail 00 00 FF FF) if both sides have agreed that the flush type is sync, so be prepared to add them manually.
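A sketch of that last point (my own code, not the poster's): re-append the four-byte sync-flush tail before inflating, assuming the server sends raw deflate (windowBits = -15) so System.IO.Compression.DeflateStream can read it directly:

```csharp
// Sketch: inflate a raw-deflate packet, re-appending the 00 00 FF FF
// sync-flush tail the server may have omitted under a sync-flush contract.
using System;
using System.IO;
using System.IO.Compression;

static class SyncInflate
{
    public static byte[] Inflate(byte[] packet)
    {
        byte[] tail = { 0x00, 0x00, 0xFF, 0xFF };
        byte[] withTail = new byte[packet.Length + tail.Length];
        Buffer.BlockCopy(packet, 0, withTail, 0, packet.Length);
        Buffer.BlockCopy(tail, 0, withTail, packet.Length, tail.Length);

        using (var input = new MemoryStream(withTail))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```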

load screenshot from adb through c#

I want to get a screenshot into C# using adb without saving files to the filesystem all the time.
I'm using SharpAdbClient to talk with the device.
I'm on a Windows platform.
This is what I've got so far:
AdbServer server = new AdbServer();
StartServerResult result = server.StartServer(@"path\to\adb.exe", restartServerIfNewer: false);
DeviceData device = AdbClient.Instance.GetDevices().First();
ConsoleOutputReceiver receiver = new ConsoleOutputReceiver();
AdbClient.Instance.ExecuteRemoteCommand("screencap -p", device, receiver);
string str_image = receiver.ToString().Replace("\r\r", "");
byte[] bytes = Encoding.ASCII.GetBytes(str_image);
Image image = Image.FromStream(new MemoryStream(bytes));
I can successfully load str_image and create the byte array, but it keeps throwing a System.ArgumentException when trying to load it into an Image.
I also tried saving the data to a file, but the file is corrupt.
I tried replacing both "\r\r" and "\r\n", with the same result.
Does anyone have some insight into how to load this file?
It would actually be preferable if it could be loaded into an Emgu image, since I'm going to do some CV on it later.
One possible cause is the nonprintable ASCII characters in the string.
Look at the code below
string str_image = File.ReadAllText("test.png");
byte[] bytes = Encoding.ASCII.GetBytes(str_image);
byte[] actualBytes = File.ReadAllBytes("test.png");
str_image is shown in the screenshot below; note that there are some non-printable characters (displayed as question marks).
The first eight bytes of a PNG file are always
137 80 78 71 13 10 26 10
When you read the console output as a string, then use ASCII to encode the string, the first byte becomes 63 (0x3F), which is the ASCII code for a question mark.
Also note that the size of the two byte arrays vary hugely (7828/7378).
Another thing: you are replacing "\r\r", while a new-line sequence on Windows is actually "\r\n".
So my conclusion is some image data is lost or modified in the output redirection by the ConsoleOutputReceiver, and you cannot recover the original data from the output string.
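The corruption is easy to reproduce without adb at all. A minimal sketch (my own, not from the answer): round-trip the PNG signature through an ASCII string and watch the first byte turn into a question mark:

```csharp
// Sketch: bytes above 127 don't survive a round trip through an ASCII string.
using System;
using System.Text;

class PngAsciiDemo
{
    static void Main()
    {
        byte[] pngSignature = { 137, 80, 78, 71, 13, 10, 26, 10 };

        string asText = Encoding.ASCII.GetString(pngSignature); // 137 -> '?'
        byte[] roundTripped = Encoding.ASCII.GetBytes(asText);

        Console.WriteLine(roundTripped[0]); // 63 ('?'), not 137
        Console.WriteLine(roundTripped[1]); // 80 ('P') survives
    }
}
```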

How to determine encoding of image using header bytes

So I am using C#, and I need to determine the actual encoding of an image file. Most images can be in one format while simultaneously having a different extension and still work in general.
My needs require precise knowledge of the image format.
There is one other thread that deals with this: Determine Image Encoding of Image File
It shows how to find the actual encoding once you have the image's header information. I need to open the image and extract this header information.
FileStream imageFile = new FileStream("myImage.gif", FileMode.Open);
After this bit, how do I open only the bytes which contain the header?
Thank you.
You can't really read "just the header" unless you know its size.
Instead, determine the minimum amount of bytes you need to be able to distinguish between the formats you need to support, and read only those bytes. Most likely all of the formats you need will have a unique header.
For example, if you need to support png & jpeg, those formats start with:
PNG: 89 50 4E 47 0D 0A 1A 0A
JPEG: FF D8 FF E0
So in that case you'd only have to read a single byte to tell the two apart. In reality I'd say use a few more bytes, just in case you encounter other file formats.
To read, say 8 bytes, from the beginning of a file:
using( var sr = new FileStream( "file", FileMode.Open ) )
{
var data = new byte[8];
int numRead = sr.Read( data, 0, data.Length );
// numRead gives you the number of bytes read
}
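Putting the two signatures together, a sniffing helper might look like this (a sketch of my own; note the JPEG check here matches any SOI-started file, not just JFIF's FF D8 FF E0):

```csharp
// Sketch: identify a format from the first few bytes of a file.
using System.Linq;

static class FormatSniffer
{
    static readonly byte[] PngSig  = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    static readonly byte[] JpegSig = { 0xFF, 0xD8, 0xFF };

    public static string Sniff(byte[] header)
    {
        if (header.Length >= PngSig.Length && header.Take(PngSig.Length).SequenceEqual(PngSig))
            return "png";
        if (header.Length >= JpegSig.Length && header.Take(JpegSig.Length).SequenceEqual(JpegSig))
            return "jpeg";
        return "unknown";
    }
}
```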
Well, I figured it out in the end, so I'm going to update the thread and close it. The only issue with my solution is that it requires opening the entire image file rather than just the required bytes. This uses a lot more memory and takes longer, so it isn't the optimal solution when speed is a concern.
Just to give credit where it's due, this code was created from a couple of sources here on Stack Overflow; you can find the links in the OP and earlier comments. The rest of the code was written by me.
If anyone feels like modifying the code to only open the correct number of bytes, feel free.
TextWriterTraceListener writer = new TextWriterTraceListener(System.Console.Out);
Debug.Listeners.Add(writer);
// PNG file contains an 8-byte header.
// JPEG file contains a 2-byte header (SOI) followed by a series of markers;
// some markers can be followed by a data array. Each type of marker has a different header format.
// The bytes where the image is stored follow the SOF0 marker (10 bytes long).
// However, between the JPEG header and the SOF0 marker there can be other segments.
// BMP file contains a 14-byte header.
// GIF file contains at least 14 bytes in its header.
FileStream memStream = new FileStream(@"C:\a.png", FileMode.Open);
Image fileImage = Image.FromStream(memStream);
//get image format
var fileImageFormat = typeof(System.Drawing.Imaging.ImageFormat)
    .GetProperties(System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Static)
    .ToList()
    .ConvertAll(property => property.GetValue(null, null))
    .Single(image_format => image_format.Equals(fileImage.RawFormat));
MessageBox.Show("File Format: " + fileImageFormat);
//get image codec
var fileImageFormatCodec = System.Drawing.Imaging.ImageCodecInfo.GetImageDecoders()
    .ToList()
    .Single(image_codec => image_codec.FormatID == fileImage.RawFormat.Guid);
MessageBox.Show("MimeType: " + fileImageFormatCodec.MimeType + " \n" + "Extension: " + fileImageFormatCodec.FilenameExtension + "\n" + "Actual Codec: " + fileImageFormatCodec.CodecName);
Output is as Expected:
file_image_format: Png
Built-in PNG Codec, mime: image/png, extension: *.PNG

decompressing compressed binary file

I have a compressed file (a binary file / compressed string; I'm not sure which),
and I'm trying to decompress this file with C#/VB.NET.
I tried to decompress it with GZip:
Private Shared Function gzuncompress(ByVal data() As Byte) As Byte()
Dim input As MemoryStream = New MemoryStream(data)
Dim gzip As GZipStream = New GZipStream(input, CompressionMode.Decompress)
Dim output As MemoryStream = New MemoryStream
gzip.CopyTo(output)
Return output.ToArray
End Function
gzuncompress(New System.Net.WebClient().DownloadData("http://haxball.com/list3"))
but there is an exception (at gzip.CopyTo(output)):
The magic number in GZip header is not correct
But when I tried to decompress it with PHP, it worked:
<?php
header('Content-Type: text/html; charset=utf-8');
$list = file_get_contents('http://haxball.com/list3');
$list = gzuncompress($list);
$len = implode('', unpack('n*', $list));
$bytes = unpack('c*', $list);
$string = implode('', array_map('chr', $bytes));
echo $string;
you can check the code here:
http://www.compileonline.com/execute_php_online.php
Does someone have a C#/VB.NET alternative to PHP's gzuncompress?
Even an external exe that does the same as PHP's gzuncompress function would be a very good answer,
something like:
Process.Start("c:\umcompress.exe -f c:\list3 -o c:\res.txt")
Note: a good example is better than an explanation.
Update:
The First 30 Bytes Of The File:
78 DA 8C BD 79 F4 5D D7 55 26 78 65 0D F1 24 0F 89 E3 98 4C 5C 47 21 71 E2 C8 B9 E7 9E E1
That is a zlib stream. The zlib format is described in RFC 1950, and consists of a two-byte header and a four-byte trailer around a deflate stream. You will need to write your own code to process the header and trailer, and you can use the DeflateStream class to decompress the deflate stream.
Or you can use DotNetZip which will process the zlib stream directly.
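A sketch of the DeflateStream route (my own code, under the assumption that the stream really is zlib, as the 78 DA header suggests): skip the two-byte header and inflate the rest. The four-byte Adler-32 trailer is simply left unverified here; on .NET 6+ you could use System.IO.Compression.ZLibStream instead and skip nothing:

```csharp
// Sketch: decompress an RFC 1950 zlib stream (e.g. starting 78 DA) by
// skipping the 2-byte header and feeding the raw deflate data to DeflateStream.
// The trailing 4-byte Adler-32 checksum is ignored, not verified.
using System.IO;
using System.IO.Compression;

static class ZlibInflate
{
    public static byte[] Decompress(byte[] zlibData)
    {
        using (var input = new MemoryStream(zlibData, 2, zlibData.Length - 2))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```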

How to encode Unicode so both iPad and Excel can understand?

I have a CSV that is encoded with UTF-32. When I open the stream in IE and open it with Excel, I can read everything. On the iPad I stream it and get a blank page with no content whatsoever. (I don't know how to view source on an iPad, so there could be something hidden in the HTML.)
The http response is written in asp.net C#
Response.Clear();
Response.Buffer = true;
Response.ContentType = "text/comma-separated-values";
Response.AddHeader("Content-Disposition", "attachment;filename=\"InventoryCount.csv\"");
Response.RedirectLocation = "InventoryCount.csv";
Response.ContentEncoding = Encoding.UTF32;//works on Excel wrong in iPad
//Response.ContentEncoding = Encoding.UTF8;//works on iPad wrong in Excel
Response.Charset = "UTF-8";//tried also adding Charset just to see if it works somehow, but it does not.
EnableViewState = false;
NMDUtilities.Export oUtilities = new NMDUtilities.Export();
Response.Write(oUtilities.DataGridToCSV(gvExport, ","));
Response.End();
The only guess I can make is that iPad cannot read UTF32, is that true? How can I view source on iPad?
UPDATE
I just made an interesting discovery. When my encoding is UTF8 things work on iPad and characters are displayed properly, but Excel messes up a character. But when I use UTF32 the inverse is true. iPad displays nothing, but Excel works perfectly. I really have no idea what I can do about this.
iPad UTF8 outputs = " Quattrode® "
Excel UTF8 outputs = " Quattrode® "
iPad UTF32 outputs = " "
Excel UTF32 outputs = " Quattrode® "
Here's my implementation of DataGridToCsv
public string DataGridToCsv(GridView input, string delimiter)
{
StringBuilder sb = new StringBuilder();
//iterate Gridview and put row results in stringbuilder...
string result = HttpUtility.HtmlDecode(sb.ToString());
return result;
}
UPDATE2: Excel is barfing on UTF-8 >:{. Man. I just undid the second option he lists because it doesn't work on iPad. I can't win for losing on this one.
UPDATE3: Per your suggestions I have looked at the hex code. There is no BOM, but there is a difference between the file layouts.
UTF-8: 4D 61 74 65 (MATE from the first word MATERIAL)
UTF-32: 4D 00 00 00 (M from the first word MATERIAL)
So it looks like UTF-32 lays each character out in 32 bits vs UTF-8 doing it in 8 bits. I think this is why Excel can guess. Now I will try your suggested fixes.
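That observation is easy to confirm in code; a quick sketch of the two byte layouts (using .NET's Encoding classes, which for UTF32 means little-endian without a BOM):

```csharp
// Sketch: the same character encoded in UTF-8 vs UTF-32 (little-endian).
using System;
using System.Text;

class EncodingLayoutDemo
{
    static void Main()
    {
        byte[] utf8  = Encoding.UTF8.GetBytes("M");   // 4D
        byte[] utf32 = Encoding.UTF32.GetBytes("M");  // 4D 00 00 00

        Console.WriteLine(BitConverter.ToString(utf8));   // 4D
        Console.WriteLine(BitConverter.ToString(utf32));  // 4D-00-00-00
    }
}
```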
The problem is that the browser knows your data's encoding is UTF-8, but it has no way of telling Excel. When Excel opens the file, it assumes your system's default encoding. If you copy some non-ASCII text, paste it in Notepad, and save it with UTF-8 encoding, though, you'll see that Excel can properly detect it. It works on the iPad because its default encoding just happens to be UTF-8.
The reason is that Notepad puts the proper byte order mark (EF BB BF for UTF-8) in the beginning of the file. You can try it yourself by using a hex editor or some other means to create a file containing
EF BB BF 20 51 75 61 74 74 72 6F 64 65 C2 AE 20
and opening that file in Excel. (I used Excel 2010, but I assume it would work with all recent versions.)
Try making sure your output starts with those first 3 bytes.
How to write a BOM in C#
byte[] BOM = new byte[] { 0xef, 0xbb, 0xbf };
Response.BinaryWrite(BOM);//write the BOM first
Response.Write(utility.DataGridToCSV(gvExport, ","));//then write your CSV
Excel tries to infer the encoding based on your file contents, and ASCII and UTF-8 happen to overlap on the first 128 characters (letters and numbers). When you use UTF-16 and UTF-32, it can figure out that the content isn't ASCII, but since most of your content using UTF-8 matches ASCII, if you want your file to be read in as UTF-8, you have to tell it explicitly that the content is UTF-8 by writing the byte order mark as Gabe said in his answer. Also, see the answer by Andrew Csontos on this other question:
What's the best way to export UTF8 data into Excel?
