How to determine encoding of image using header bytes

So I am using C#, and I need to determine the actual encoding of an image file. An image can be stored in one format while carrying a different extension, and it will still generally open.
My needs require precise knowledge of the image format.
There is one other thread that deals with this: Determine Image Encoding of Image File
This shows how to find the actual encoding once you have the image's header information. I need to open the image and extract this header information.
FileStream imageFile = new FileStream("myImage.gif", FileMode.Open);
After this bit, how do I open only the bytes which contain the header?
Thank you.

You can't really read "just the header" unless you know its size.
Instead, determine the minimum number of bytes you need to distinguish between the formats you need to support, and read only those bytes. Most likely all of the formats you need will have a unique header.
For example, if you need to support PNG and JPEG, those formats start with:
PNG: 89 50 4E 47 0D 0A 1A 0A
JPEG: FF D8 FF E0 (the fourth byte varies; for example, E1 for Exif files)
So in that case you'd only have to read a single byte to distinguish between the two. In practice I'd read a few more bytes, in case you encounter other file formats.
To read, say 8 bytes, from the beginning of a file:
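using( var sr = new FileStream( "file", FileMode.Open, FileAccess.Read ) )
{
    var data = new byte[8];
    int numRead = sr.Read( data, 0, data.Length );
    // numRead is the number of bytes actually read; it can be less than
    // data.Length (for example, when the file is shorter than 8 bytes)
}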
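To make the idea concrete, here is a minimal sketch that reads only the first few bytes and compares them against a handful of well-known signatures. The class and method names (`ImageSniffer`, `ReadHeader`, `Identify`) are made up for illustration, and the GIF and BMP signatures are additions beyond the two formats listed above; extend the list for whatever formats you actually need.

```csharp
using System;
using System.IO;

public static class ImageSniffer
{
    // Reads up to `count` bytes from the start of a file. Stream.Read may
    // return fewer bytes than requested, so loop until done or end of file.
    public static byte[] ReadHeader(string path, int count)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            var buffer = new byte[count];
            int total = 0;
            while (total < count)
            {
                int n = fs.Read(buffer, total, count - total);
                if (n == 0) break; // end of file reached
                total += n;
            }
            Array.Resize(ref buffer, total);
            return buffer;
        }
    }

    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }

    // Returns a short format name based on the leading bytes only.
    public static string Identify(byte[] header)
    {
        if (StartsWith(header, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return "png";
        if (StartsWith(header, 0xFF, 0xD8, 0xFF)) return "jpeg";
        if (StartsWith(header, 0x47, 0x49, 0x46, 0x38)) return "gif"; // "GIF8"
        if (StartsWith(header, 0x42, 0x4D)) return "bmp";             // "BM"
        return "unknown";
    }
}
```

Usage would be something like `ImageSniffer.Identify(ImageSniffer.ReadHeader("myImage.gif", 8))`. Only eight bytes are read from disk, no matter how large the image is.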

Well, I figured it out in the end, so I'm going to update the thread and close it. The only issue with my solution is that it requires opening the entire image file, rather than just the required bytes. This uses a lot more memory and takes longer, so it isn't the optimal solution when speed is a concern.
Just to give credit where it's due, this code was created from a
couple of sources here on Stack Overflow; you can find the links in
the OP and earlier comments. The rest of the code was written by me.
If anyone feels like modifying the code to only open the correct number of bytes, feel free.
TextWriterTraceListener writer = new TextWriterTraceListener(System.Console.Out);
Debug.Listeners.Add(writer);
// PNG file contains 8 - bytes header.
// JPEG file contains 2 - bytes header(SOI) followed by series of markers,
// some markers can be followed by data array. Each type of marker has different header format.
// The bytes where the image is stored follows SOF0 marker(10 - bytes length).
// However, between JPEG header and SOF0 marker there can be other segments.
// BMP file contains 14 - bytes header.
// GIF file contains at least 14 bytes in its header.
FileStream memStream = new FileStream(@"C:\a.png", FileMode.Open);
Image fileImage = Image.FromStream(memStream);
//get image format
var fileImageFormat = typeof(System.Drawing.Imaging.ImageFormat).GetProperties(System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Static).ToList().ConvertAll(property => property.GetValue(null, null)).Single(image_format => image_format.Equals(fileImage.RawFormat));
MessageBox.Show("File Format: " + fileImageFormat);
//get image codec
var fileImageFormatCodec = System.Drawing.Imaging.ImageCodecInfo.GetImageDecoders().ToList().Single(image_codec => image_codec.FormatID == fileImage.RawFormat.Guid);
MessageBox.Show("MimeType: " + fileImageFormatCodec.MimeType + " \n" + "Extension: " + fileImageFormatCodec.FilenameExtension + "\n" + "Actual Codec: " + fileImageFormatCodec.CodecName);
Output is as Expected:
file_image_format: Png
Built-in PNG Codec, mime: image/png, extension: *.PNG

Related

How to extract image from TCP Stream

I need your help.
I was creating an application in c# that converts the data from the IP camera to an image (JPEG).
I was able to convert the image using the below code:
hex = "FFD8FFDB008400130D0F1.........";/// supply this with the attached hex dump.
byte[] image = HexString2Bytes(hex);
File.WriteAllBytes("visio.png", image);
Process.Start("visio.png");
private static byte[] HexString2Bytes(string hexString)
{
int bytesCount = (hexString.Length) / 2;
byte[] bytes = new byte[bytesCount];
for (int x = 0; x < bytesCount; ++x)
{
bytes[x] = Convert.ToByte(hexString.Substring(x * 2, 2), 16);
}
return bytes;
}
Sometimes I get a good image, as expected: https://ibb.co/pxrwn6p
but sometimes I get a distorted image after converting: https://ibb.co/9twx5ZT
I was wondering if there is a problem with the conversion or with the way I save the image,
because according to the supplier, what I need to do is save the image directly from the stream.
But since I receive it as bytes and still need to convert it, maybe there is something wrong with my code.
The image also starts with ÿØÿÛ (FF D8 FF DB) and ends with ÿÙÿÿÿÿ (FF D9 FF FF FF FF).
here's the hex dump from their sample app:
https://drive.google.com/file/d/1CMlQ0xaVjM0jfU5A4MB-_HwK54dUMTOr/view?usp=sharing
Using their test application, the image can be captured and converted perfectly.
Image captured using their application: https://ibb.co/2KgyLTc
Using the hex from the sniff and converting it with my code:
Image converted using my code: https://ibb.co/G0WMjht
Sample source code:
Please bear with my code, because currently this is only my test app before integrating this feature into another app.
https://drive.google.com/file/d/1Ux7zsR39IVNyd1wrBxQPQKA6yM4YnwJN/view?usp=sharing
Thank You in advance.

Looking at the hex dump, it looks like some kind of XML file with embedded image data. Trying to convert this directly to an image will most likely not work; you would need to parse the XML data to extract the actual image file. But it looks like you have a valid JPEG header, so I would guess you have found the start of the image at least. You probably also need to check the length property from the XML data to find the length of the image data block.
However, the data block looks like it contains large sections of zeros. These should not be present in a JPEG file, so they might indicate some data corruption, possibly from the way the network data is captured.
I would expect cameras to use some higher-level protocol than raw TCP, like Real Time Streaming Protocol, GigE Vision, or MJPEG over HTTP. I have not seen any camera that requires you to process a raw TCP stream. But since you do not show how the data is fetched, it is difficult to tell if there are any mistakes in that code.
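If the XML length field turns out to be unavailable, a rough fallback is to cut the buffer at the JPEG markers themselves. The sketch below is only a heuristic and not the camera vendor's method: it takes everything from the first FF D8 (start of image) to the last FF D9 (end of image), which also drops trailing padding such as the FF FF FF FF mentioned in the question. The `JpegExtractor` name is invented for this example, and it will not repair data that is already corrupted.

```csharp
using System;

public static class JpegExtractor
{
    // Returns the bytes from the first FF D8 (SOI) through the last FF D9 (EOI),
    // or null if either marker is missing.
    public static byte[] Extract(byte[] data)
    {
        int start = -1;
        for (int i = 0; i + 1 < data.Length; i++)
        {
            if (data[i] == 0xFF && data[i + 1] == 0xD8) { start = i; break; }
        }
        if (start < 0) return null;

        // Search backwards so that embedded thumbnails (which have their own
        // FF D9) do not end the image early.
        int end = -1;
        for (int i = data.Length - 2; i > start; i--)
        {
            if (data[i] == 0xFF && data[i + 1] == 0xD9) { end = i + 2; break; }
        }
        if (end < 0) return null;

        var result = new byte[end - start];
        Array.Copy(data, start, result, 0, result.Length);
        return result;
    }
}
```

The result can then be written with `File.WriteAllBytes` using a `.jpg` extension, since the payload is JPEG rather than PNG.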

load screenshot from adb through c#

I want to get a screenshot into c# using adb without saving files to the filesystem all the time.
I'm using the SharpAdbClient to talk with the device.
I'm on a windows platform.
This is what i got so far:
AdbServer server = new AdbServer();
StartServerResult result = server.StartServer(@"path\to\adb.exe", restartServerIfNewer: false);
DeviceData device = AdbClient.Instance.GetDevices().First();
ConsoleOutputReceiver receiver = new ConsoleOutputReceiver();
AdbClient.Instance.ExecuteRemoteCommand("screencap -p", device, receiver);
string str_image = receiver.ToString().Replace("\r\r", "");
byte[] bytes = Encoding.ASCII.GetBytes(str_image);
Image image = Image.FromStream(new MemoryStream(bytes));
I can successfully load str_image and create the byte array, but it keeps throwing System.ArgumentException when I try to load it into an Image.
I also tried saving the data to a file, but the file is corrupt.
I tried replacing both "\r\r" and "\r\n", with the same result.
Does anyone have some insight into how to load this file?
It would actually be preferable to load it into an Emgu image, since I'm going to do some CV on it later.

One possible cause is the non-printable ASCII characters in the string.
Look at the code below:
string str_image = File.ReadAllText("test.png");
byte[] bytes = Encoding.ASCII.GetBytes(str_image);
byte[] actualBytes = File.ReadAllBytes("test.png");
str_image contains some non-printable characters (displayed as question marks in the debugger).
The first eight bytes of a PNG file are always
137 80 78 71 13 10 26 10
When you read the console output as a string and then use ASCII to encode the string, the first byte becomes 63 (0x3F), which is the ASCII code for a question mark.
Also note that the sizes of the two byte arrays differ considerably (7828/7378).
Another thing is that you are replacing "\r\r", while a new line on Windows is actually "\r\n".
So my conclusion is that some image data is lost or modified in the output redirection by the ConsoleOutputReceiver, and you cannot recover the original data from the output string.
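The loss is easy to reproduce without a device. The following sketch (the `AsciiRoundTrip` class is just an illustration, not part of SharpAdbClient) pushes the PNG signature through a string and back through `Encoding.ASCII`; any byte above 127 has no ASCII representation and comes back as a question mark.

```csharp
using System;
using System.Text;

public static class AsciiRoundTrip
{
    // Simulates binary data being treated as text and then re-encoded as ASCII.
    public static byte[] ThroughAscii(byte[] original)
    {
        // Map each byte to one char, standing in for "binary read as text".
        string asText = new string(Array.ConvertAll(original, b => (char)b));
        // Characters above 127 cannot be encoded and are replaced with '?' (63).
        return Encoding.ASCII.GetBytes(asText);
    }
}
```

Feeding it `137 80 78 71 13 10 26 10` gives back `63 80 78 71 13 10 26 10`: the first byte is gone for good, which is why the image has to be fetched as raw bytes rather than through a text receiver.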

How to Read .DSS format audio files into Byte array

In my application, I read .DSS format audio files into a byte array with the following code:
byte[] bt = File.ReadAllBytes(Filepath);
But I am unable to get the data into the bytes, even though the file plays fine in the audio player.
How can I read the file into a byte array?
I am attaching a snapshot of what bt contains; it shows 255 for all bytes.
TIA

To ensure this is not the issue with File.ReadAllBytes, try to read file using stream, like this:
using (var fileStream = new FileStream(FilePath, FileMode.Open, FileAccess.Read))
{
byte[] buffer = new byte[fileStream.Length];
fileStream.Read(buffer, 0, (int) fileStream.Length);
// use buffer;
}
UPDATE: since that doesn't work either, there may be an issue with your file. Try to find any process that may be locking or using it at the moment. Also, try to open the file with a hex editor and see whether any meaningful data is really present. I'd also create a clean test app/sandbox to check whether it works there.
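If no hex editor is at hand, a few lines of C# give a similar view of the bytes you actually read. This is only a convenience sketch; `HexPeek` and `Prefix` are made-up names.

```csharp
using System;

public static class HexPeek
{
    // Formats the first `max` bytes as hex pairs separated by dashes,
    // e.g. "FF-00-2A", so the start of a buffer can be inspected quickly.
    public static string Prefix(byte[] data, int max)
    {
        int n = Math.Min(max, data.Length);
        return BitConverter.ToString(data, 0, n);
    }
}
```

For example, `Console.WriteLine(HexPeek.Prefix(File.ReadAllBytes(Filepath), 64))` prints the first 64 bytes of the file, which makes it easy to see whether anything other than FF is present near the start.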

Well, the Dss format is copyrighted, and you'll likely not find a lot of information about it.
255 or 0xFF is commonly used in Dss files to indicate that a byte is not in use. You will see many of them in the header of the Dss file, later in the audio part they will be more sparse.
That means: a value of 255 in the region of bytes 83-97 which you show does NOT mean that something went wrong.

How to encode Unicode so both iPad and Excel can understand?

I have a CSV that is encoded with UTF32. When I open stream in IE and open with Excel I can read everything. On iPad I stream and I get a blank page with no content whatsoever. (I don't know how to view source on iPad so there could be something hidden in HTML).
The http response is written in asp.net C#
Response.Clear();
Response.Buffer = true;
Response.ContentType = "text/comma-separated-values";
Response.AddHeader("Content-Disposition", "attachment;filename=\"InventoryCount.csv\"");
Response.RedirectLocation = "InventoryCount.csv";
Response.ContentEncoding = Encoding.UTF32;//works on Excel wrong in iPad
//Response.ContentEncoding = Encoding.UTF8;//works on iPad wrong in Excel
Response.Charset = "UTF-8";//tried also adding Charset just to see if it works somehow, but it does not.
EnableViewState = false;
NMDUtilities.Export oUtilities = new NMDUtilities.Export();
Response.Write(oUtilities.DataGridToCSV(gvExport, ","));
Response.End();
The only guess I can make is that iPad cannot read UTF32, is that true? How can I view source on iPad?
UPDATE
I just made an interesting discovery. When my encoding is UTF8 things work on iPad and characters are displayed properly, but Excel messes up a character. But when I use UTF32 the inverse is true. iPad displays nothing, but Excel works perfectly. I really have no idea what I can do about this.
iPad UTF8 outputs = " Quattrode® "
Excel UTF8 outputs = " QuattrodeÂ® "
iPad UTF32 outputs = " "
Excel UTF32 outputs = " Quattrode® "
Here's my implementation of DataGridToCsv
public string DataGridToCsv(GridView input, string delimiter)
{
StringBuilder sb = new StringBuilder();
//iterate Gridview and put row results in stringbuilder...
string result = HttpUtility.HtmlDecode(sb.ToString());
return result;
}
UPDATE2: Excel is barfing on UTF8 >:{. Man. I just undid the second option he lists because it doesn't work on iPad. I can't win for losing on this one.
UPDATE3: Per your suggestions I have looked at the hex code. There is no BOM, but there is a difference between the file layouts.
UTF8: 4D 61 74 65 (MATE from the first word MATERIAL)
UTF32: 4D 00 00 00 (M from the first word MATERIAL)
So it looks like UTF32 lays each character out in 32 bits, whereas UTF8 uses 8 bits for these characters. I think this is why Excel can guess. Now I will try your suggested fixes.

The problem is that the browser knows your data's encoding is UTF-8, but it has no way of telling Excel. When Excel opens the file, it assumes your system's default encoding. If you copy some non-ASCII text, paste it in Notepad, and save it with UTF-8 encoding, though, you'll see that Excel can properly detect it. It works on the iPad because its default encoding just happens to be UTF-8.
The reason is that Notepad puts the proper byte order mark (EF BB BF for UTF-8) in the beginning of the file. You can try it yourself by using a hex editor or some other means to create a file containing
EF BB BF 20 51 75 61 74 74 72 6F 64 65 C2 AE 20
and opening that file in Excel. (I used Excel 2010, but I assume it would work with all recent versions.)
Try making sure your output starts with those first 3 bytes.
How to write a BOM in C#
byte[] BOM = new byte[] { 0xef, 0xbb, 0xbf };
Response.BinaryWrite(BOM);//write the BOM first
Response.Write(utility.DataGridToCSV(gvExport, ","));//then write your CSV
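As a variation on the snippet above, the BOM does not have to be hard-coded: `Encoding.UTF8.GetPreamble()` returns the same three bytes. The helper below is a sketch (the `CsvBom` name is invented here) that builds the complete payload as a byte array, which could then be passed to `Response.BinaryWrite` in one call.

```csharp
using System;
using System.Text;

public static class CsvBom
{
    // Returns the UTF-8 bytes of `csv`, prefixed with the UTF-8 byte order mark.
    public static byte[] WithUtf8Bom(string csv)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();   // EF BB BF
        byte[] body = Encoding.UTF8.GetBytes(csv);
        var result = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, result, 0, bom.Length);
        Buffer.BlockCopy(body, 0, result, bom.Length, body.Length);
        return result;
    }
}
```

For the string " Quattrode® " this produces exactly the byte sequence listed earlier in this answer: EF BB BF 20 51 75 61 74 74 72 6F 64 65 C2 AE 20.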

Excel tries to infer the encoding based on your file contents, and ASCII and UTF-8 happen to overlap on the first 128 characters (letters and numbers). When you use UTF-16 and UTF-32, it can figure out that the content isn't ASCII, but since most of your content using UTF-8 matches ASCII, if you want your file to be read in as UTF-8, you have to tell it explicitly that the content is UTF-8 by writing the byte order mark as Gabe said in his answer. Also, see the answer by Andrew Csontos on this other question:
What's the best way to export UTF8 data into Excel?

How to detect if a file is PDF or TIFF?

Please bear with me as I've been thrown into the middle of this project without knowing all the background. If you've got WTF questions, trust me, I have them too.
Here is the scenario: I've got a bunch of files residing on an IIS server. They have no file extension on them. Just naked files with names like "asda-2342-sd3rs-asd24-ut57" and so on. Nothing intuitive.
The problem is I need to serve up files on an ASP.NET (2.0) page and display the tiff files as tiff and the PDF files as PDF. Unfortunately I don't know which is which and I need to be able to display them appropriately in their respective formats.
For example, lets say that there are 2 files I need to display, one is tiff and one is PDF. The page should show up with a tiff image, and perhaps a link that would open up the PDF in a new tab/window.
The problem:
As these files are all extension-less I had to force IIS to just serve everything up as TIFF. But if I do this, the PDF files won't display. I could change IIS to force the MIME type to be PDF for unknown file extensions but I'd have the reverse problem.
http://support.microsoft.com/kb/326965
Is this problem easier than I think or is it as nasty as I am expecting?

OK, enough people are getting this wrong that I'm going to post some code I have to identify TIFFs:
private const int kTiffTagLength = 12;
private const int kHeaderSize = 2;
private const int kMinimumTiffSize = 8;
private const byte kIntelMark = 0x49;
private const byte kMotorolaMark = 0x4d;
private const ushort kTiffMagicNumber = 42;
private bool IsTiff(Stream stm)
{
stm.Seek(0, SeekOrigin.Begin);
if (stm.Length < kMinimumTiffSize)
return false;
byte[] header = new byte[kHeaderSize];
stm.Read(header, 0, header.Length);
if (header[0] != header[1] || (header[0] != kIntelMark && header[0] != kMotorolaMark))
return false;
bool isIntel = header[0] == kIntelMark;
ushort magicNumber = ReadShort(stm, isIntel);
if (magicNumber != kTiffMagicNumber)
return false;
return true;
}
private ushort ReadShort(Stream stm, bool isIntel)
{
byte[] b = new byte[2];
stm.Read(b, 0, b.Length);
return ToShort(isIntel, b[0], b[1]);
}
private static ushort ToShort(bool isIntel, byte b0, byte b1)
{
if (isIntel)
{
return (ushort)(((int)b1 << 8) | (int)b0);
}
else
{
return (ushort)(((int)b0 << 8) | (int)b1);
}
}
I hacked apart some much more general code to get this.
For PDF, I have code that looks like this:
public bool IsPdf(Stream stm)
{
stm.Seek(0, SeekOrigin.Begin);
PdfToken token;
while ((token = GetToken(stm)) != null)
{
if (token.TokenType == MLPdfTokenType.Comment)
{
if (token.Text.StartsWith("%PDF-1."))
return true;
}
if (stm.Position > 1024)
break;
}
return false;
}
Now, GetToken() is a call into a scanner that tokenizes a Stream into PDF tokens. This is non-trivial, so I'm not going to paste it here. I'm using the tokenizer instead of looking at substring to avoid a problem like this:
% the following is a PostScript file, NOT a PDF file
% you'll note that in our previous version, it started with %PDF-1.3,
% incorrectly marking it as a PDF
%
clippath stroke showpage
The PostScript sample above is marked as NOT a PDF by the tokenizer-based check, whereas a more simplistic chunk of code will incorrectly mark it as a PDF.
I should also point out that the current ISO spec is devoid of the implementation notes that were in the previous Adobe-owned specification. Most importantly from the PDF Reference, version 1.6:
Acrobat viewers require only that the header appear somewhere within
the first 1024 bytes of the file.

TIFF can be detected by peeking at first bytes http://local.wasp.uwa.edu.au/~pbourke/dataformats/tiff/
The first 8 bytes forms the header.
The first two bytes of which is either
"II" for little endian byte ordering
or "MM" for big endian byte ordering.
About PDF: http://www.adobe.com/devnet/livecycle/articles/lc_pdf_overview_format.pdf
The header contains just one line that
identifies the version of PDF.
Example: %PDF-1.6

Reading the specification for each file format will tell you how to identify files of that format.
TIFF files - Check bytes 0-1 for 0x4D4D or 0x4949 and bytes 2-3 for the value '42'.
Page 13 of the spec reads:
A TIFF file begins with an 8-byte
image file header, containing the
following information: Bytes 0-1: The
byte order used within the file. Legal
values are: “II” (4949.H) “MM”
(4D4D.H) In the “II” format, byte
order is always from the least
significant byte to the most
significant byte, for both 16-bit and
32-bit integers This is called
little-endian byte order. In the “MM”
format, byte order is always from most
significant to least significant, for
both 16-bit and 32-bit integers. This
is called big-endian byte order. Bytes
2-3 An arbitrary but carefully chosen
number (42) that further identifies
the file as a TIFF file. The byte
order depends on the value of Bytes
0-1.
PDF files start with the PDF version followed by several binary bytes. (I think you now have to purchase the ISO spec for the current version.)
Section 7.5.2
The first line of a PDF file shall be
a header consisting of the 5
characters %PDF– followed by a version
number of the form 1.N, where N is a
digit between 0 and 7. A conforming
reader shall accept files with any of
the following headers: %PDF–1.0,
%PDF–1.1, %PDF–1.2, %PDF–1.3, %PDF–1.4,
%PDF–1.5, %PDF–1.6, %PDF–1.7 Beginning
with PDF 1.4, the Version entry in the
document’s catalog dictionary (located
via the Root entry in the file’s
trailer, as described in 7.5.5, "File
Trailer"), if present, shall be used
instead of the version specified in
the Header.
If a PDF file contains binary data, as
most do (see 7.2, "Lexical
Conventions"), the header line shall
be immediately followed by a comment
line containing at least four binary
characters—that is, characters whose
codes are 128 or greater. This ensures
proper behaviour of file transfer
applications that inspect data near
the beginning of a file to determine
whether to treat the file’s contents
as text or as binary.
Of course you could do a "deeper" check on each file by checking more file specific items.
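Put together, the two header rules quoted above can be sketched in a few lines. This is a simplified check with invented names (`DocSniffer`, `LooksLikeTiff`, `LooksLikePdf`), and it assumes you have already read at least the first few bytes of the file into an array. It only accepts a PDF header at offset 0, which follows the ISO wording but is stricter than viewers that tolerate leading junk.

```csharp
using System;
using System.Text;

public static class DocSniffer
{
    // TIFF: bytes 0-1 are "II" or "MM", bytes 2-3 hold 42 in that byte order.
    public static bool LooksLikeTiff(byte[] h)
    {
        if (h.Length < 4) return false;
        bool little = h[0] == 0x49 && h[1] == 0x49; // "II", little-endian
        bool big = h[0] == 0x4D && h[1] == 0x4D;    // "MM", big-endian
        if (little) return h[2] == 42 && h[3] == 0;
        if (big) return h[2] == 0 && h[3] == 42;
        return false;
    }

    // PDF: the file starts with the five ASCII characters "%PDF-".
    public static bool LooksLikePdf(byte[] h)
    {
        byte[] sig = Encoding.ASCII.GetBytes("%PDF-");
        if (h.Length < sig.Length) return false;
        for (int i = 0; i < sig.Length; i++)
        {
            if (h[i] != sig[i]) return false;
        }
        return true;
    }
}
```

Once the type is known, the page can choose the matching content type (for example `image/tiff` or `application/pdf`) instead of relying on a single IIS-wide MIME mapping.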

A very useful list of File Signatures aka "magic numbers" by Gary Kessler is available http://www.garykessler.net/library/file_sigs.html

Internally, the file header information should help. If you do a low-level file open, such as StreamReader() or fopen(), look at the first two characters in the file... Almost every file type has its own signature.
PDF always starts with "%P" (more specifically, "%PDF")
TIFF starts with "II" or "MM"
Bitmap files with "BM"
Executable files with "MZ"
I've had to deal with this in the past too, also to keep unwanted files from being uploaded to a given site by aborting as soon as the header check fails.
EDIT -- Posted sample code to read and test file header types
String fn = "Example.pdf";
char[] buf = new char[5];
// Note: StreamReader decodes the bytes as text, which is fine for these ASCII signatures
using (StreamReader sr = new StreamReader( fn ))
{
    // fill the whole buffer so that all five characters are defined
    sr.Read( buf, 0, buf.Length );
}
String Hdr = new String(buf);
String WhatType;
if (Hdr.StartsWith("%PDF"))
WhatType = "PDF";
else if (Hdr.StartsWith("MZ"))
WhatType = "EXE or DLL";
else if (Hdr.StartsWith("BM"))
WhatType = "BMP";
else if (Hdr.StartsWith("?_"))
WhatType = "HLP (help file)";
else if (Hdr.StartsWith("\0\0\u0001")) // C# has no "\1" escape
WhatType = "Icon (.ico)";
else if (Hdr.StartsWith("\0\0\u0002"))
WhatType = "Cursor (.cur)";
else
WhatType = "Unknown";

If you look at a list of file signatures, you will see that a TIFF usually starts with the "magic numbers" 0x49 0x49 0x2A 0x00 (some other definitions are also given), which are the first 4 bytes of the file.
So just use these first 4 bytes to determine whether the file is a TIFF or not.
EDIT: it is probably better to do it the other way around and detect PDF first. The magic numbers for PDF are more standardized: as Plinth kindly pointed out, they start with "%PDF" somewhere in the first 1024 bytes (0x25 0x50 0x44 0x46).

You are going to have to write an ashx handler to serve the requested file.
Your handler should then read the first few bytes to determine what the file type really is. PDFs and TIFFs have "magic numbers" at the beginning of the file that you can use to determine this. Then set your response headers accordingly.

You can use Myrmec to identify the file type; this library inspects the leading bytes of the file. It is available on NuGet as "Myrmec", and it also supports MIME types, so you can give it a try. The code looks like this:
// create a sniffer instance.
Sniffer sniffer = new Sniffer();
// populate with metadata.
sniffer.Populate(FileTypes.CommonFileTypes);
// get the file head bytes; 20 bytes may be enough.
byte[] fileHead = ReadFileHead();
// start match.
List<string> results = sniffer.Match(fileHead);
And to get the MIME type:
List<string> result = sniffer.Match(head);
string mimeType = MimeTypes.GetMimeType(result.First());
Note that for TIFF it only supports the two signatures "49 49 2A 00" and "4D 4D 00 2A"; if you need more, you can add them yourself. See the readme file in the Myrmec GitHub repo for help.
