Open a *.COM file using stream - c#

Just found myself in need of opening *.COM files in a C# application.
*.COM files are generated by fasm, with assembly code like this one:
org 100h
jmp start
msg: db "Hi", 0Dh,0Ah, 24h
start:
mov dx, msg
mov ah, 09h
int 21h
mov ah, 0
int 16h
ret
When opened in a text editor such as Sublime, the file is displayed as these hex bytes:
eb05 4869 0d0a 24ba 0201 b409 cd21 b400
cd16 c3
I tried to open this file in my application with code like this:
string COMtext = File.ReadAllText(filename,encoding);
byte[] info = new UTF8Encoding(true).GetBytes(COMtext);
When I check the result with MessageBox.Show(info[i].ToString("x2"));
the first byte is EF and the second is BF (as if EB were split in half), followed by an additional third byte (BD). After that, everything parses as expected:
4th: 05, 5th: 48, etc.
What am I doing wrong, and is there a way to fix it without a workaround? (I am not sure a workaround would be reliable at this stage, because I don't know whether other files would show the same behavior.)

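You need to open the file as a binary stream, not a text stream. When the file is read as text, bytes that are not valid in the chosen encoding (such as EB here) are replaced with the Unicode replacement character U+FFFD, which UTF-8 encodes as EF BF BD.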
Use File.OpenRead with a byte array. You can also use File.ReadAllBytes but I don't recommend it since a large file will cause an OutOfMemoryException.
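To make the difference concrete, here is a minimal sketch. The class and method names (ComFileDemo, RoundTripAsUtf8Text, ReadRaw, ToHex) are made up for illustration and are not from the question. It shows that pushing the bytes through a UTF-8 string reproduces the EF BF BD prefix, while reading through File.OpenRead leaves every byte untouched.

```csharp
using System;
using System.IO;
using System.Text;

public static class ComFileDemo
{
    // Mimics File.ReadAllText + GetBytes: decode the raw bytes as UTF-8 text,
    // then encode the text back to bytes. Invalid sequences become U+FFFD.
    public static byte[] RoundTripAsUtf8Text(byte[] raw)
    {
        string text = Encoding.UTF8.GetString(raw);
        return Encoding.UTF8.GetBytes(text);
    }

    // Reads the file as raw bytes. No decoding happens, so nothing is replaced.
    public static byte[] ReadRaw(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        using (MemoryStream ms = new MemoryStream())
        {
            fs.CopyTo(ms);
            return ms.ToArray();
        }
    }

    // Formats bytes as space-separated upper-case hex, e.g. "EB 05 48".
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }
}
```

With the first bytes of the file from the question (EB 05 48 69), the text round trip should give EF BF BD 05 48 69, whereas ReadRaw returns the original four bytes.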

Related

How to extract image from TCP Stream

I need your help.
I was creating an application in c# that converts the data from the IP camera to an image (JPEG).
I was able to convert the image using the below code:
hex = "FFD8FFDB008400130D0F1.........";/// supply this with the attached hex dump.
byte[] image = HexString2Bytes(hex);
File.WriteAllBytes("visio.png", image);
Process.Start("visio.png");
private static byte[] HexString2Bytes(string hexString)
{
int bytesCount = (hexString.Length) / 2;
byte[] bytes = new byte[bytesCount];
for (int x = 0; x < bytesCount; ++x)
{
bytes[x] = Convert.ToByte(hexString.Substring(x * 2, 2), 16);
}
return bytes;
}
Sometimes I get a good image, as expected: https://ibb.co/pxrwn6p
but sometimes I get a distorted image after converting: https://ibb.co/9twx5ZT
I was wondering whether there is a problem with the conversion or with the way I save the image,
because according to the supplier I should save the image directly from the stream.
But since I receive it as bytes and still need to convert it, maybe there is something wrong with my code.
The image data starts with ÿØÿÛ (FF D8 FF DB) and ends with ÿÙÿÿÿÿ (FF D9 FF FF FF FF).
here's the hex dump from their sample app:
https://drive.google.com/file/d/1CMlQ0xaVjM0jfU5A4MB-_HwK54dUMTOr/view?usp=sharing
Using their test application, the image is captured and converted perfectly.
Image captured using their application: https://ibb.co/2KgyLTc
Using the hex from the sniffed traffic and converting it with my code:
Image converted using my code: https://ibb.co/G0WMjht
Sample source code:
Please bear with my code; this is only a test app before I integrate the feature into another app.
https://drive.google.com/file/d/1Ux7zsR39IVNyd1wrBxQPQKA6yM4YnwJN/view?usp=sharing
Thank You in advance.
Looking at the hex-dump it looks like some kind of XML file with embedded image data. Trying to convert this directly to an image will most likely not work, you would need to parse the XML-data to extract the actual image file. But it looks like you have a valid Jpeg header, so I would guess you have found the start of the image at least. But you probably also need to check the length property from the XML-data to find the length of the image-data block.
However, the datablock looks like it contains large sections of zeros, this should not be present in a jpeg file, so it might indicate some data corruption. Possibly from the way the network data is captured.
I would expect cameras to use some higher-level protocol than raw TCP, such as Real Time Streaming Protocol, GigE Vision, or MJPEG over HTTP. I have not seen any camera that requires you to process raw TCP streams. But since you do not show how the data is fetched, it is difficult to tell whether there are any mistakes in that code.
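As a rough illustration of pulling the image out of a larger payload, here is a sketch that cuts a buffer from the first FF D8 marker through the last FF D9 marker. This is an assumption-laden fallback, not the camera's protocol: the length field from the surrounding data is the more reliable guide, and marker scanning can be fooled by embedded thumbnails or by FF D9 appearing in trailing data. JpegCarver and ExtractJpeg are hypothetical names.

```csharp
using System;

public static class JpegCarver
{
    // Returns the bytes from the first FF D8 (start of image) through the
    // last FF D9 (end of image), or null when no such pair exists.
    public static byte[] ExtractJpeg(byte[] buffer)
    {
        int start = -1;
        for (int i = 0; i + 1 < buffer.Length; i++)
        {
            if (buffer[i] == 0xFF && buffer[i + 1] == 0xD8)
            {
                start = i;
                break;
            }
        }
        if (start < 0) return null;

        // Scan backwards so trailing padding after the image is dropped.
        int end = -1;
        for (int i = buffer.Length - 2; i > start; i--)
        {
            if (buffer[i] == 0xFF && buffer[i + 1] == 0xD9)
            {
                end = i + 2; // exclusive end, just past the D9 byte
                break;
            }
        }
        if (end < 0) return null;

        byte[] jpeg = new byte[end - start];
        Array.Copy(buffer, start, jpeg, 0, jpeg.Length);
        return jpeg;
    }
}
```

The result can be written with File.WriteAllBytes using a .jpg extension. If the extracted block still contains long runs of zeros, the capture itself is probably incomplete and no amount of trimming will repair it.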

load screenshot from adb through c#

I want to get a screenshot into c# using adb without saving files to the filesystem all the time.
I'm using the SharpAdbClient to talk with the device.
I'm on a windows platform.
This is what i got so far:
AdbServer server = new AdbServer();
StartServerResult result = server.StartServer(@"path\to\adb.exe", restartServerIfNewer: false);
DeviceData device = AdbClient.Instance.GetDevices().First();
ConsoleOutputReceiver receiver = new ConsoleOutputReceiver();
AdbClient.Instance.ExecuteRemoteCommand("screencap -p", device, receiver);
string str_image = receiver.ToString().Replace("\r\r", "");
byte[] bytes = Encoding.ASCII.GetBytes(str_image);
Image image = Image.FromStream(new MemoryStream(bytes));
I can successfully load both str_image, and create the byte array but it keeps saying System.ArgumentException when trying to load it into an Image.
I also tried saving the data to a file, but the file is corrupt.
I tried both replacing "\r\r" and "\r\n", both same result.
Anyone has some insight in how to load this file?
It's actually preferred if it could be loaded into an Emgu image, since I'm going to do some CV on it later.
One possible cause is the nonprintable ASCII characters in the string.
Look at the code below
string str_image = File.ReadAllText("test.png");
byte[] bytes = Encoding.ASCII.GetBytes(str_image);
byte[] actualBytes = File.ReadAllBytes("test.png");
str_image is shown in the below screencap, note that there are some non-printable chars (displayed as question mark).
The first eight bytes of a PNG file are always
137 80 78 71 13 10 26 10
When you read the console output as a string and then use ASCII to encode the string, the first byte becomes 63 (0x3F), which is the ASCII code for a question mark.
Also note that the sizes of the two byte arrays differ considerably (7828 vs. 7378).
Another thing: you are replacing "\r\r", while a new line in Windows is actually "\r\n".
So my conclusion is some image data is lost or modified in the output redirection by the ConsoleOutputReceiver, and you cannot recover the original data from the output string.
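The loss can be reproduced without a device. The sketch below (PngTextRoundTrip and its members are illustrative names, not part of SharpAdbClient) pushes the eight PNG signature bytes through an ASCII string and back, which is enough to break the signature.

```csharp
using System;
using System.Text;

public static class PngTextRoundTrip
{
    // The fixed eight-byte signature every PNG file starts with.
    public static readonly byte[] PngSignature = { 137, 80, 78, 71, 13, 10, 26, 10 };

    // Decodes the bytes as ASCII text and encodes the text again.
    // Any byte above 127 comes back as 63 ('?').
    public static byte[] ThroughAscii(byte[] raw)
    {
        return Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(raw));
    }

    // True when the data starts with the PNG signature.
    public static bool HasPngSignature(byte[] data)
    {
        if (data.Length < PngSignature.Length) return false;
        for (int i = 0; i < PngSignature.Length; i++)
        {
            if (data[i] != PngSignature[i]) return false;
        }
        return true;
    }
}
```

Once the first byte has turned from 137 into 63 there is no way to tell which original value it had, so the screenshot has to be fetched as bytes from the start rather than repaired afterwards.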

Issues Decoding Flate from PDF Embedded Font

Ok, before we start. I work for a company that has a license to redistribute PDF files from various publishers in any media form. So, that being said, the extraction of embedded fonts from the given PDF files is not only legal - but also vital to the presentation.
I am using code found on this site, however I do not recall the author, when I find it I will reference them. I have located the stream within the PDF file that contains the embedded fonts, I have isolated this encoded stream as a string and then into a byte[]. When I use the following code I get an error
Block length does not match with its complement.
Code (the error occurs in the while line below):
private static byte[] DecodeFlateDecodeData(byte[] data)
{
MemoryStream outputStream;
using (outputStream = new MemoryStream())
{
using (var compressedDataStream = new MemoryStream(data))
{
// Remove the first two bytes to skip the header (it isn't recognized by the DeflateStream class)
compressedDataStream.ReadByte();
compressedDataStream.ReadByte();
var deflateStream = new DeflateStream(compressedDataStream, CompressionMode.Decompress, true);
var decompressedBuffer = new byte[compressedDataStream.Length];
int read;
// The error occurs in the following line
while ((read = deflateStream.Read(decompressedBuffer, 0, decompressedBuffer.Length)) != 0)
{
outputStream.Write(decompressedBuffer, 0, read);
}
outputStream.Flush();
compressedDataStream.Close();
}
return ReadFully(outputStream);
}
}
After using the usual tools (Google, Bing, archives here) I found that the majority of the time that this occurs is when one has not consumed the first two bytes of the encoding stream - but this is done here so i cannot find the source of this error. Below is the encoded stream:
H‰LT}lg?7ñù¤aŽÂ½ãnÕ´jh›Ú?-T’ÑRL–¦
ëš:Uí6Ÿ¶“ø+ñ÷ùü™”ÒÆŸŸíóWlDZ“ºu“°tƒ¦t0ÊD¶jˆ
Ö m:$½×^*qABBï?Þç÷|ýÞßóJÖˆD"yâP—òpgÇó¦Q¾S¯9£Û¾mçÁçÚ„cÂÛO¡É‡·¥ï~á³ÇãO¡ŸØö=öPD"d‚ìA—$H'‚DC¢D®¤·éC'Å:È—€ìEV%cÿŽS;þÔ’kYkùcË_ZÇZ/·þYº(ý݇Ã_ó3m¤[3¤²4ÿo?²õñÖ*Z/Þiãÿ¿¾õ8Ü ?»„O Ê£ðÅ­P9ÿ•¿Â¯*–z×No˜0ãÆ-êàîoR‹×ÉêÊêÂulaƒÝü
Please help, I am beating my head against the wall here!
NOTE: The stream above is the encoded version of Arial Black - according to the specs inside the PDF:
661 0 obj
<<
/Type /FontDescriptor
/FontFile3 662 0 R
/FontBBox [ -194 -307 1688 1083 ]
/FontName /HLJOBA+ArialBlack
/Flags 4
/StemV 0
/CapHeight 715
/XHeight 518
/Ascent 0
/Descent -209
/ItalicAngle 0
/CharSet (/space/T/e/s/t/a/k/i/n/g/S/r/E/x/m/O/u/l)
>>
endobj
662 0 obj
<< /Length 1700 /Filter /FlateDecode /Subtype /Type1C >>
stream
H‰LT}lg?7ñù¤aŽÂ½ãnÕ´jh›Ú?-T’ÑRL–¦
ëš:Uí6Ÿ¶“ø+ñ÷ùü™”ÒÆŸŸíóWlDZ“ºu“°tƒ¦t0ÊD¶jˆ
Ö m:$½×^*qABBï?Þç÷|ýÞßóJÖˆD"yâP—òpgÇó¦Q¾S¯9£Û¾mçÁçÚ„cÂÛO¡É‡·¥ï~á³ÇãO¡ŸØö=öPD"d‚ìA—$H'‚DC¢D®¤·éC'Å:È—€ìEV%cÿŽS;þÔ’kYkùcË_ZÇZ/·þYº(ý݇Ã_ó3m¤[3¤²4ÿo?²õñÖ*Z/Þiãÿ¿¾õ8Ü ?»„O Ê£ðÅ­P9ÿ•¿Â¯*–z×No˜0ãÆ-êàîoR‹×ÉêÊêÂulaƒÝü
Is there a particular reason why you're not using the GetStreamBytes() method that is provided with iText? What about data? Are you sure you are looking at the correct bytes? Did you create the PRStream object correctly and did you get the bytes with PdfReader.GetStreamBytesRaw()? If so, why decode the bytes yourself? Which brings me to my initial counter-question: is there a particular reason why you're not using the GetStreamBytes() method?
Looks like GetStreamBytes() might solve your problem outright, but let me point out that I think you're doing something dangerous concerning end-of-line markers. The PDF specification states in 7.3.8.1 that:
The keyword stream that follows the stream dictionary shall be
followed by an end-of-line marker consisting of either a CARRIAGE
RETURN and a LINE FEED or just a LINE FEED, and not by a CARRIAGE
RETURN alone.
In your code it looks like you always skip two bytes while the spec says it could be either one or two (CR LF or LF).
You should be able to catch whether you are running into this by comparing the exact number of bytes you want to decode with the value of the (Required) "Length" key in the stream dictionary.
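For anyone who still wants to decode by hand, here is a minimal sketch of the two steps discussed above: skipping the end-of-line marker after the stream keyword (CR LF or a lone LF), and inflating the data after dropping the two-byte zlib header. PdfStreamHelper, SkipStreamEol and InflateZlib are hypothetical names, and the sketch assumes the data is held as a byte[] the whole time. One thing worth checking in the question's code is the step that isolates the stream "as a string and then into a byte[]", since a text round trip can alter binary data before it ever reaches DeflateStream.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class PdfStreamHelper
{
    // Given the index just past the "stream" keyword, returns the index of the
    // first data byte. Accepts CR LF or a lone LF, as the specification requires.
    public static int SkipStreamEol(byte[] pdf, int afterKeyword)
    {
        if (afterKeyword + 1 < pdf.Length && pdf[afterKeyword] == 0x0D && pdf[afterKeyword + 1] == 0x0A)
            return afterKeyword + 2;
        if (afterKeyword < pdf.Length && pdf[afterKeyword] == 0x0A)
            return afterKeyword + 1;
        throw new InvalidDataException("Expected CR LF or LF after the stream keyword.");
    }

    // Inflates zlib-wrapped data: skips the 2-byte zlib header and hands the
    // remaining bytes to DeflateStream.
    public static byte[] InflateZlib(byte[] data)
    {
        using (var input = new MemoryStream(data, 2, data.Length - 2))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The number of bytes passed to InflateZlib should equal the Length value from the stream dictionary (1700 in the object shown in the question); a mismatch suggests the data start or end was located incorrectly.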
Okay, for anyone who might stumble across this issue themselves allow me to warn you - this is a rocky road without a great deal of good solutions. I eventually moved away from writing all of the code to extract the fonts myself. I simply downloaded MuPDF (open source) and then made command line calls to mutool.exe:
mutool extract C:\mypdf.pdf
This pulls all of the fonts into the folder mutool resides in (it also extracts some images (these are the fonts that could not be converted (usually small subsets I think))). I then wrote a method to move those from that folder into the one I wanted them in.
Of course, to convert these to anything usable is a headache in itself - but I have found it to be doable.
As a reminder, font piracy IS piracy.

How to encode Unicode so both iPad and Excel can understand?

I have a CSV that is encoded with UTF32. When I open stream in IE and open with Excel I can read everything. On iPad I stream and I get a blank page with no content whatsoever. (I don't know how to view source on iPad so there could be something hidden in HTML).
The http response is written in asp.net C#
Response.Clear();
Response.Buffer = true;
Response.ContentType = "text/comma-separated-values";
Response.AddHeader("Content-Disposition", "attachment;filename=\"InventoryCount.csv\"");
Response.RedirectLocation = "InventoryCount.csv";
Response.ContentEncoding = Encoding.UTF32;//works on Excel wrong in iPad
//Response.ContentEncoding = Encoding.UTF8;//works on iPad wrong in Excel
Response.Charset = "UTF-8";//tried also adding Charset just to see if it works somehow, but it does not.
EnableViewState = false;
NMDUtilities.Export oUtilities = new NMDUtilities.Export();
Response.Write(oUtilities.DataGridToCSV(gvExport, ","));
Response.End();
The only guess I can make is that iPad cannot read UTF32, is that true? How can I view source on iPad?
UPDATE
I just made an interesting discovery. When my encoding is UTF8 things work on iPad and characters are displayed properly, but Excel messes up a character. But when I use UTF32 the inverse is true. iPad displays nothing, but Excel works perfectly. I really have no idea what I can do about this.
iPad UTF8 outputs = " Quattrode® "
Excel UTF8 outputs = " Quattrode® "
iPad UTF32 outputs = " "
Excel UTF32 outputs = " Quattrode® "
Here's my implementation of DataGridToCsv
public string DataGridToCsv(GridView input, string delimiter)
{
StringBuilder sb = new StringBuilder();
//iterate Gridview and put row results in stringbuilder...
string result = HttpUtility.HtmlDecode(sb.ToString());
return result;
}
UPDATE2 Excel is barfing on UTF8 >:{. Man. I just undid the second option he lists because it doesnt work on iPad. I cant win for losing on this one.
UPDATE3 Per your suggestions I have looked at the hex code. There is no BOM, but there is a difference between the file layouts.
UTF8: 4D 61 74 65 (MATE from the first word MATERIAL)
UTF32: 4D 00 00 00 (M from the first word MATERIAL)
So it looks like UTF32 lays things out in 32 bits vs UTF8 doing it in 8 bits. I think this is why Excel can guess. Now I will try your suggested fixes.
The problem is that the browser knows your data's encoding is UTF-8, but it has no way of telling Excel. When Excel opens the file, it assumes your system's default encoding. If you copy some non-ASCII text, paste it in Notepad, and save it with UTF-8 encoding, though, you'll see that Excel can properly detect it. It works on the iPad because its default encoding just happens to be UTF-8.
The reason is that Notepad puts the proper byte order mark (EF BB BF for UTF-8) in the beginning of the file. You can try it yourself by using a hex editor or some other means to create a file containing
EF BB BF 20 51 75 61 74 74 72 6F 64 65 C2 AE 20
and opening that file in Excel. (I used Excel 2010, but I assume it would work with all recent versions.)
Try making sure your output starts with those first 3 bytes.
How to write a BOM in C#
byte[] BOM = new byte[] { 0xef, 0xbb, 0xbf };
Response.BinaryWrite(BOM);//write the BOM first
Response.Write(utility.DataGridToCSV(gvExport, ","));//then write your CSV
Excel tries to infer the encoding based on your file contents, and ASCII and UTF-8 happen to overlap on the first 128 characters (letters and numbers). When you use UTF-16 and UTF-32, it can figure out that the content isn't ASCII, but since most of your content using UTF-8 matches ASCII, if you want your file to be read in as UTF-8, you have to tell it explicitly that the content is UTF-8 by writing the byte order mark as Gabe said in his answer. Also, see the answer by Andrew Csontos on this other question:
What's the best way to export UTF8 data into Excel?
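As a small sketch of the BOM approach, the helper below builds the complete byte payload (BOM followed by the UTF-8 encoded CSV) in one array. CsvBom and WithUtf8Bom are names invented for this example. Encoding.UTF8.GetPreamble() returns the same three bytes (EF BB BF) that are written by hand above.

```csharp
using System;
using System.Text;

public static class CsvBom
{
    // Returns the UTF-8 byte order mark followed by the UTF-8 bytes of the CSV text.
    public static byte[] WithUtf8Bom(string csv)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();             // EF BB BF
        byte[] body = new UTF8Encoding(false).GetBytes(csv);  // text bytes, no BOM
        byte[] result = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, result, 0, bom.Length);
        Buffer.BlockCopy(body, 0, result, bom.Length, body.Length);
        return result;
    }
}
```

For the text " Quattrode\u00AE " this should produce exactly the sixteen bytes listed in the hex example above. In the page from the question, the array could then be sent with Response.BinaryWrite instead of mixing BinaryWrite and Write.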

How to detect if a file is PDF or TIFF?

Please bear with me as I've been thrown into the middle of this project without knowing all the background. If you've got WTF questions, trust me, I have them too.
Here is the scenario: I've got a bunch of files residing on an IIS server. They have no file extension on them. Just naked files with names like "asda-2342-sd3rs-asd24-ut57" and so on. Nothing intuitive.
The problem is I need to serve up files on an ASP.NET (2.0) page and display the tiff files as tiff and the PDF files as PDF. Unfortunately I don't know which is which and I need to be able to display them appropriately in their respective formats.
For example, lets say that there are 2 files I need to display, one is tiff and one is PDF. The page should show up with a tiff image, and perhaps a link that would open up the PDF in a new tab/window.
The problem:
As these files are all extension-less I had to force IIS to just serve everything up as TIFF. But if I do this, the PDF files won't display. I could change IIS to force the MIME type to be PDF for unknown file extensions but I'd have the reverse problem.
http://support.microsoft.com/kb/326965
Is this problem easier than I think or is it as nasty as I am expecting?
OK, enough people are getting this wrong that I'm going to post some code I have to identify TIFFs:
private const int kTiffTagLength = 12;
private const int kHeaderSize = 2;
private const int kMinimumTiffSize = 8;
private const byte kIntelMark = 0x49;
private const byte kMotorolaMark = 0x4d;
private const ushort kTiffMagicNumber = 42;
private bool IsTiff(Stream stm)
{
    // Start from the beginning of the stream.
    stm.Seek(0, SeekOrigin.Begin);
    if (stm.Length < kMinimumTiffSize)
        return false;
    byte[] header = new byte[kHeaderSize];
    stm.Read(header, 0, header.Length);
    // Both byte-order bytes must match: "II" (Intel) or "MM" (Motorola).
    if (header[0] != header[1] || (header[0] != kIntelMark && header[0] != kMotorolaMark))
        return false;
    bool isIntel = header[0] == kIntelMark;
    ushort magicNumber = ReadShort(stm, isIntel);
    if (magicNumber != kTiffMagicNumber)
        return false;
    return true;
}
private ushort ReadShort(Stream stm, bool isIntel)
{
    byte[] b = new byte[2];
    stm.Read(b, 0, b.Length);
    return ToShort(isIntel, b[0], b[1]);
}
private static ushort ToShort(bool isIntel, byte b0, byte b1)
{
if (isIntel)
{
return (ushort)(((int)b1 << 8) | (int)b0);
}
else
{
return (ushort)(((int)b0 << 8) | (int)b1);
}
}
I hacked apart some much more general code to get this.
For PDF, I have code that looks like this:
public bool IsPdf(Stream stm)
{
stm.Seek(0, SeekOrigin.Begin);
PdfToken token;
while ((token = GetToken(stm)) != null)
{
if (token.TokenType == MLPdfTokenType.Comment)
{
if (token.Text.StartsWith("%PDF-1."))
return true;
}
if (stm.Position > 1024)
break;
}
return false;
}
Now, GetToken() is a call into a scanner that tokenizes a Stream into PDF tokens. This is non-trivial, so I'm not going to paste it here. I'm using the tokenizer instead of looking at substring to avoid a problem like this:
% the following is a PostScript file, NOT a PDF file
% you'll note that in our previous version, it started with %PDF-1.3,
% incorrectly marking it as a PDF
%
clippath stroke showpage
This file is marked as NOT a PDF by the above code snippet, whereas a more simplistic chunk of code would incorrectly mark it as a PDF.
I should also point out that the current ISO spec is devoid of the implementation notes that were in the previous Adobe-owned specification. Most importantly from the PDF Reference, version 1.6:
Acrobat viewers require only that the header appear somewhere within
the first 1024 bytes of the file.
TIFF can be detected by peeking at first bytes http://local.wasp.uwa.edu.au/~pbourke/dataformats/tiff/
The first 8 bytes forms the header.
The first two bytes of which is either
"II" for little endian byte ordering
or "MM" for big endian byte ordering.
About PDF: http://www.adobe.com/devnet/livecycle/articles/lc_pdf_overview_format.pdf
The header contains just one line that
identifies the version of PDF.
Example: %PDF-1.6
Reading the specification for each file format will tell you how to identify files of that format.
TIFF files - Check bytes 0-1 for 0x4D4D or 0x4949 and bytes 2-3 for the value '42'.
Page 13 of the spec reads:
A TIFF file begins with an 8-byte
image file header, containing the
following information: Bytes 0-1: The
byte order used within the file. Legal
values are: “II” (4949.H) “MM”
(4D4D.H) In the “II” format, byte
order is always from the least
significant byte to the most
significant byte, for both 16-bit and
32-bit integers This is called
little-endian byte order. In the “MM”
format, byte order is always from most
significant to least significant, for
both 16-bit and 32-bit integers. This
is called big-endian byte order. Bytes
2-3 An arbitrary but carefully chosen
number (42) that further identifies
the file as a TIFF file. The byte
order depends on the value of Bytes
0-1.
PDF files start with the PDF version followed by several binary bytes. (I think you now have to purchase the ISO spec for the current version.)
Section 7.5.2
The first line of a PDF file shall be
a header consisting of the 5
characters %PDF– followed by a version
number of the form 1.N, where N is a
digit between 0 and 7. A conforming
reader shall accept files with any of
the following headers: %PDF–1.0,
%PDF–1.1, %PDF–1.2, %PDF–1.3, %PDF–1.4,
%PDF–1.5, %PDF–1.6, %PDF–1.7 Beginning
with PDF 1.4, the Version entry in the
document’s catalog dictionary (located
via the Root entry in the file’s
trailer, as described in 7.5.5, "File
Trailer"), if present, shall be used
instead of the version specified in
the Header.
If a PDF file contains binary data, as
most do (see 7.2, "Lexical
Conventions"), the header line shall
be immediately followed by a comment
line containing at least four binary
characters—that is, characters whose
codes are 128 or greater. This ensures
proper behaviour of file transfer
applications that inspect data near
the beginning of a file to determine
whether to treat the file’s contents
as text or as binary.
Of course you could do a "deeper" check on each file by checking more file specific items.
A very useful list of File Signatures aka "magic numbers" by Gary Kessler is available http://www.garykessler.net/library/file_sigs.html
Internally, the file header information should help. If you do a low-level file open, such as StreamReader() or FOPEN(), look at the first two characters in the file... Almost every file type has its own signature.
PDF always starts with "%P" (but more specifically would have like %PDF)
TIFF starts with "II" (or "MM" for big-endian files)
Bitmap files with "BM"
Executable files with "MZ"
I've had to deal with this in the past too... also to help prevent unwanted files from being uploaded to a given site and immediately aborting it once checked.
EDIT -- Posted sample code to read and test file header types
String fn = "Example.pdf";
char[] buf = new char[5];
int read;
// StreamReader decodes the bytes as text, so only ASCII signatures compare reliably here.
using (StreamReader sr = new StreamReader(fn))
{
    read = sr.Read(buf, 0, buf.Length);
}
// Build the header from the characters that were actually read.
String Hdr = new String(buf, 0, read);
String WhatType;
if (Hdr.StartsWith("%PDF"))
WhatType = "PDF";
else if (Hdr.StartsWith("MZ"))
WhatType = "EXE or DLL";
else if (Hdr.StartsWith("BM"))
WhatType = "BMP";
else if (Hdr.StartsWith("?_"))
WhatType = "HLP (help file)";
else if (Hdr.StartsWith("\0\0\1"))
WhatType = "Icon (.ico)";
else if (Hdr.StartsWith("\0\0\2"))
WhatType = "Cursor (.cur)";
else
WhatType = "Unknown";
If you go here, you will see that the TIFF usually starts with "magic numbers" 0x49 0x49 0x2A 0x00 (some other definitions are also given), which is the first 4 bytes of the file.
So just use these first 4 bytes to determine whether file is TIFF or not.
EDIT, it is probably better to do it the other way, and detect PDF first. The magic numbers for PDF are more standardized: As Plinth kindly pointed out they start with "%PDF" somewhere in the first 1024 bytes (0x25 0x50 0x44 0x46). source
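Putting the signatures from these answers together, a byte-level check might look like the sketch below. FileSniffer and Detect are illustrative names. It looks for "%PDF-" within the first 1024 bytes and then for the two TIFF signatures at offset 0. Note that this is the simplistic substring approach the tokenizer-based answer warns about, so a PostScript file that mentions %PDF- in a comment would be misreported.

```csharp
using System;
using System.Text;

public static class FileSniffer
{
    // Returns a MIME type for PDF or TIFF data, or null when neither matches.
    public static string Detect(byte[] head)
    {
        // PDF: the "%PDF-" header may appear anywhere in the first 1024 bytes.
        if (IndexOf(head, Encoding.ASCII.GetBytes("%PDF-"), 1024) >= 0)
            return "application/pdf";

        // TIFF: "II" + 42 little-endian, or "MM" + 42 big-endian, at offset 0.
        if (head.Length >= 4)
        {
            bool littleEndian = head[0] == 0x49 && head[1] == 0x49 && head[2] == 0x2A && head[3] == 0x00;
            bool bigEndian = head[0] == 0x4D && head[1] == 0x4D && head[2] == 0x00 && head[3] == 0x2A;
            if (littleEndian || bigEndian)
                return "image/tiff";
        }
        return null;
    }

    // Finds needle within the first 'limit' bytes of haystack; returns -1 if absent.
    private static int IndexOf(byte[] haystack, byte[] needle, int limit)
    {
        int last = Math.Min(haystack.Length, limit) - needle.Length;
        for (int i = 0; i <= last; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```

Reading the first 1024 bytes of each file with a FileStream and passing them to Detect would be enough to choose the Content-Type header before streaming the file to the browser.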
You are going to have to write an ashx handler to serve the requested file.
Then your handler should read the first few bytes to determine what the file type really is. PDFs and TIFFs have "magic numbers" at the beginning of the file that you can use to determine this; then set your response headers accordingly.
You can use Myrmec to identify the file type; this library uses the file's leading bytes. It is available on NuGet as "Myrmec" (and this is the repo). Myrmec also supports MIME types, so you can try it. The code looks like this:
// create a sniffer instance.
Sniffer sniffer = new Sniffer();
// populate with metadata.
sniffer.Populate(FileTypes.CommonFileTypes);
// get the file head bytes; 20 bytes may be enough.
byte[] fileHead = ReadFileHead();
// start match.
List<string> results = sniffer.Match(fileHead);
And to get the MIME type:
List<string> result = sniffer.Match(fileHead);
string mimeType = MimeTypes.GetMimeType(result.First());
Note that it supports only two TIFF signatures, "49 49 2A 00" and "4D 4D 00 2A"; if you have more, you can add them yourself. See the Myrmec readme for help (Myrmec GitHub repo).
