KOFAX Bitmap Files : How can I open them?


My C# application receives image files from the KOFAX VRS TWAIN driver in TWSX_FILE mode, but neither my own .NET-based application nor the default Windows image viewer can open them. Adobe Photoshop, however, opens them without any problem.
I tried the FreeImage library; it detects the dimensions correctly but renders black images.
It seems that KOFAX has its own bitmap format whose header differs from that of normal BMP files:
http://www.fileformat.info/mirror/egff/ch03_03.htm
I have uploaded one of these files here:
http://www.box.net/shared/aby42aagz4
How can I open these images in my application? Does anybody know a lightweight open source/free library, or a C++/C# code snippet, that supports this image format?

You've basically answered your own question: The file is neither a Windows bitmap file nor is it in the documented Kofax Raster Format.
As you pointed out, the first two bytes are 'BM', which would indicate the file is purporting to be a Windows bitmap. However, if that were truly the case the next four bytes would contain the file size. In your sample file, the next four bytes contain a value much bigger than the actual file size so it can't be correctly interpreted as a Windows bitmap file.
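The size check described above is easy to automate. The following is a minimal sketch, assuming the whole file fits in memory; BmpHeaderCheck and LooksLikeWindowsBitmap are names made up for this illustration. It only tests the 'BM' signature and the 32-bit little-endian size field at offset 2, nothing else in the header.

```csharp
using System;
using System.IO;

// Hypothetical helper: checks the two facts discussed above, nothing more.
static class BmpHeaderCheck
{
    // Returns true when the data starts with 'BM' and the size field
    // (bytes 2..5, little-endian) equals the actual length of the data.
    public static bool LooksLikeWindowsBitmap(byte[] data, out uint declaredSize)
    {
        declaredSize = 0;
        if (data == null || data.Length < 6)
            return false;
        if (data[0] != (byte)'B' || data[1] != (byte)'M')
            return false;

        // Assemble the value byte by byte so the result does not depend
        // on the endianness of the machine running the check.
        declaredSize = unchecked((uint)(data[2] | (data[3] << 8) | (data[4] << 16) | (data[5] << 24)));
        return declaredSize == (uint)data.Length;
    }
}
```

Calling it as BmpHeaderCheck.LooksLikeWindowsBitmap(File.ReadAllBytes(path), out size) on a file like the sample would return false while still reporting the oversized value found in the header.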
As the fileformat.info site you linked to states, if the file was truly in Kofax Raster Format, it would start with the bytes '68464B2Eh'. Thus, your file isn't in Kofax Raster Format either. In fact, I tried opening it with Kofax's VCDemo software and got the following error: "Error 20204 - Internal invalid state"
Thus, Kofax's own software thinks the file is corrupt.
The fact that Photoshop can open it and display something doesn't necessarily mean it's a valid image file format. Image processing software packages will often simply try to guess at interpreting the raw bytes of the file. Sometimes they get lucky, sometimes not.
Trying to find something that can read the files assumes that the file is in a standard format, which it isn't. Thus, I wouldn't search for something that could read the file but instead search for why the VRS/TWAIN configuration you are using is producing a non-standard format.

Related

Microsoft Photo Editor | Conversion | File Signatures

I'm working on extracting images that were saved in an Access database. Some were saved as Bitmap Image, some were saved as Microsoft Photo Editor (which shows up as MSPhotoEd.3). For each record containing bitmap images I'm obtaining the hexadecimal data that is saved, using the file signatures to locate the image data, and converting it to binary. This works perfectly and as expected.
I'm currently attempting to do the same thing for the Microsoft Photo Editor images, but I'm having a difficult time finding documentation on how this file format is structured. Going through the hex, I see signatures that would correlate to a JPEG file, but there are several of them (3 markers FFD8FF) and they do not match up with each other (22 trailers FFD9). It could be that this format is completely different and those just happen to show up... I'm not sure.
Has anyone attempted to do this before? Is it possible to extract a JPEG image from this file format (Microsoft Photo Editor)? If not, does anyone know the best way to convert this file format to any other usable image format programmatically?

Reading font file content in winrt

How can I read a font file stream on the WinRT platform? I need to get font file content from C# UWP. As you probably know, there is no way to read files from the Fonts folder directly. FilePicker is also not an option for me, since it's not the user's responsibility to choose this folder. I found a way to enumerate font names using DirectWrite (C++) and then wrap it in a COM component that is available in C# (https://code.msdn.microsoft.com/FontExplorer-lets-you-f01d415e#content). I wonder if a similar thing can be done to read font file content as byte[] or Stream?

You cannot directly read the TTF file from a UWP app without the user navigating to the file manually. The UWP application is not allowed to open files without the user being prompted unless they are in specific locations.
Also, as mentioned in a comment, many fonts may not be distributed or embedded without special licenses.

Good news: PDF export doesn't make much sense in Windows 10. Windows 10 has a built-in PDF printer. So it's better to kill two birds with one stone: implement printing and get PDF export free of charge.

Assuming you have already created an IDWriteFontFile instance, it's easy to read an arbitrary file fragment:
1. Get the file reference key with IDWriteFontFile::GetReferenceKey().
2. Get the loader interface with IDWriteFontFile::GetLoader().
3. Create a stream instance with IDWriteFontFileLoader::CreateStreamFromKey(), using the key from step 1.
4. Use IDWriteFontFileStream::ReadFileFragment/ReleaseFileFragment to read from the file stream into your buffer.

OutOfMemoryException: Out of memory - System.Drawing.Graphics.FromImage

I get Out of Memory exception when using System.Drawing.Graphics.FromImage (using latest versions of .NET software on Windows 2012 server), ONLY on a very few specific image files. Most of the time the code works fine.
Typical answers to above issue indicate that certain resources are not being released.
Please consider the following before answering:-
This specific image is a 34 KB .JPG file. The server is idle and has over 32 GB of RAM.
If I look at the properties of this jpg file in Windows Explorer (by right-clicking on the file), Windows says: 96 dpi and 32 bit depth.
BUT, if I open this jpg file using any graphics program (e.g. Photoshop), the file properties show as: 72 dpi and 24 bit depth.
So, there is a mismatch between what I think the file header properties say and what the file actually contains.
Further, if I open the jpg file using a graphics program and just re-save it without changing anything, the file properties in Windows Explorer now read correctly (72 dpi and 24 bit depth), and the file is processed by System.Drawing.Graphics correctly, without throwing an exception.
Due to my limited knowledge of the subject, I don't know if the file header of an image file can contain different data from actual file contents.
Questions:
How can I fix this problem? Or how can I tell System.Drawing.Graphics to ignore file header data and just look at actual image file contents? (as all graphics programs such as photoshop appear to do).
Thanks!

While I'm not a guru on the JPEG file format, I did some research on the subject, and here's what I found that could help you with your problem/questions.
Note that this answer makes assumptions about the source of your problem rather than pinpointing it, because there is no example file to inspect and compare against what the .NET/GDI+ JPEG/JFIF decoder expects.
The JPEG/JFIF format
Starting off, you might want some insight into the JPEG/JFIF format itself. After all, you have just encountered a file that .NET/GDI+ cannot load/parse. Since I don't have the file you experience issues with, I would suggest you load it up in a hex editor of your choice that can highlight the file structure based on a template/code/parser.
I used 010 Editor and the JPEG Template from Sweetscape's online template repository.
010 Editor comes with a 30-day free trial.
What you are specifically looking for is the SOFn identifier and data in your bad JPEG.
In the SOFn data I can see that my image is Y (154) pixels high and X (640) pixels wide with a precision of 8 bits per component using 3 components, making it 24 bits per pixel.
The JPEG/JFIF format is a huge mix of many different implementations/formats. Obviously, a library that has been around since long before the odder JPEG variants appeared, as GDI+ has, won't support every variant of the format.
In your case, I suspect you have run into the commonly asked about CMYK color space in your JPEG files.
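The SOFn inspection described above can also be done in code rather than in a hex editor. The following is a minimal sketch, assuming a reasonably well-formed marker stream; JpegFrameInfo and JpegSofReader are names made up for this illustration and are not part of any library. It walks the JPEG segments until it finds a start-of-frame marker and reads the precision, height, width and component count from it. A component count of 4 usually indicates CMYK or YCCK data.

```csharp
using System;
using System.IO;

// Hypothetical container for the fields stored in a SOFn segment.
sealed class JpegFrameInfo
{
    public int Precision;   // bits per component
    public int Height;      // Y
    public int Width;       // X
    public int Components;  // 1 = grayscale, 3 = YCbCr/RGB, 4 = usually CMYK/YCCK
}

static class JpegSofReader
{
    // Returns the first SOFn header found, or null if none is found.
    public static JpegFrameInfo ReadFrameInfo(Stream stream)
    {
        // Every JPEG stream starts with the SOI marker FF D8.
        if (stream.ReadByte() != 0xFF || stream.ReadByte() != 0xD8)
            return null;

        while (true)
        {
            int b = stream.ReadByte();
            if (b < 0) return null;          // end of stream
            if (b != 0xFF) continue;         // resynchronise on the next FF

            int marker = stream.ReadByte();
            while (marker == 0xFF) marker = stream.ReadByte(); // skip fill bytes
            if (marker < 0) return null;

            // Markers that carry no length field.
            if (marker == 0x00 || marker == 0x01 || (marker >= 0xD0 && marker <= 0xD8))
                continue;
            // End of image or start of scan reached without seeing a frame header.
            if (marker == 0xD9 || marker == 0xDA)
                return null;

            int length = ReadUInt16BigEndian(stream); // includes the two length bytes
            if (length < 2) return null;

            // SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC).
            bool isSof = marker >= 0xC0 && marker <= 0xCF
                         && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
            if (isSof)
            {
                var info = new JpegFrameInfo();
                info.Precision = stream.ReadByte();
                info.Height = ReadUInt16BigEndian(stream);
                info.Width = ReadUInt16BigEndian(stream);
                info.Components = stream.ReadByte();
                return info.Components < 0 ? null : info; // negative means truncated file
            }

            // Not a frame header: skip the rest of this segment.
            for (int i = 0; i < length - 2; i++)
                if (stream.ReadByte() < 0) return null;
        }
    }

    // Reads a 16-bit big-endian value; returns -1 at end of stream.
    private static int ReadUInt16BigEndian(Stream stream)
    {
        int hi = stream.ReadByte();
        int lo = stream.ReadByte();
        return (hi < 0 || lo < 0) ? -1 : (hi << 8) | lo;
    }
}
```

Bits per pixel is then Precision * Components, so a caller could open the file with File.OpenRead, call ReadFrameInfo, and route files with four components to a different loader before GDI+ ever sees them.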
The .Net implementation
You said you used System.Drawing.Graphics.FromImage so i will assume your code looks like one of the following:
Graphics.FromImage(Image.FromFile("nope.jpg"));
Graphics.FromImage(Image.FromFile("nope.jpg", true));
Graphics.FromImage(Image.FromStream(nopeJpegStream));
From those calls, you may get an OutOfMemoryException when the native gdiplus.dll calls...
GdipGetImageGraphicsContext
GdipLoadImageFromFile
GdipLoadImageFromFileICM (or their respective *Stream variants) or
GdipImageForceValidation
... returns code 3 or 5 (Out of memory or Insufficient buffer respectively)
I gathered this from looking through the .NET sources on referencesource.microsoft.com.
In any case, this most likely isn't an issue with .NET but an issue with GDI+ (gdiplus.dll), which Microsoft doesn't provide source code for. That also means there is no way of controlling how the image loads using the .NET wrappers, and no way to check WHY it fails (though I still suspect your JPEG is saved with CMYK).
Unfortunately, you are going to find many, many more of these strange exceptions/errors as you move along in GDI+ land, as the library is all but deprecated in favor of the Windows Presentation Foundation (WPF) and the Windows Imaging Component (WIC).
My own testing
Since you never provided an image or any additional details on the subject, I attempted to reproduce your issue, which was a task in and of itself: Image.FromFile (GdipLoadImageFromFile) will fail on many different file formats. At least it doesn't care what the file extension is, which thankfully Photoshop does.
So with your information, I finally managed to produce a .jpg file that loads fine in Photoshop and shows DPI as 96 and bit depth as 32. Of course, if I knew more about the JPEG format I probably could have gotten to the solution right away.
Showing this file (which I had to set to the CMYK color space in Photoshop) in 010 Editor gave me the following SOFn data: Y (154) pixels high and X (640) pixels wide with a precision of 8 bits per component using 4 components, making it 32 bits per pixel.
I suspect you would see the same on your "bad" file.
And yes, Image.FromFile now throws an OutOfMemoryException!
Possible solutions
Use an external library for loading image files. (An exercise I leave to you, but ImageMagick, a.k.a. Magick.NET, seems like a good bet.)
Make use of a command line tool (invoked when you get this exception) that can convert an image from one format to another, or from JPEG to JPEG as may be the case here. (Once again, ImageMagick's "convert" command line tool seems like a good bet.)
Use the Windows Presentation Foundation assemblies...
// Needs: using System; using System.Drawing; using System.IO; using System.Windows.Media.Imaging;
public static Image ImageFromFileWpf(string filename) {
    /* Load the image into an encoder using the WPF imaging classes.
     * This is done by adding a frame (which in layman's terms is a layer) to a class derived from BitmapEncoder.
     * Only TIFF, GIF and JPEG XR support multiple frames.
     * Since we are going to convert our image to a GDI+ resource we won't support this, as GDI+ doesn't (really) support it either.
     * If you want/need support for layers/animated GIF files, create a similar method to this one that takes a BitmapFrame as an argument and then...
     * 1. Instantiate the appropriate BitmapDecoder.
     * 2. Iterate over the BitmapDecoder's frames, feeding them to the new method.
     * 3. Store the returned images in a collection of images.
     *
     * Finally, I opted to use a PngBitmapEncoder here, which supports image transparency.
     */
    var bitmapEncoder = new PngBitmapEncoder();
    // Note: new Uri(string) expects an absolute path; a relative file name throws UriFormatException.
    bitmapEncoder.Frames.Add(BitmapFrame.Create(new Uri(filename)));
    // Use a MemoryStream as a handover from one file format to another.
    using (var memoryStream = new MemoryStream()) {
        bitmapEncoder.Save(memoryStream);
        // Rewind so that decoding starts at the beginning of the PNG data.
        memoryStream.Position = 0;
        /* We MUST create a copy of our image from the stream; MSDN specifically states that the stream must remain
         * open throughout the lifetime of the image.
         * We cannot instantiate the Image class, so we instantiate a Bitmap from our temporary image instead.
         * Bitmap derives from Image anyway, so this is perfectly fine.
         */
        using (var tempImage = Image.FromStream(memoryStream)) {
            return new Bitmap(tempImage);
        }
    }
}
Based on this answer...
... which I would say is a good option, as it keeps you within the .NET Framework.
Please keep in mind that when the method returns, you do specifically get a PNG image back. If you call Image.Save(string) on it you WILL save a PNG file, no matter what extension you save it as.
There is an overload Image.Save(string, ImageFormat) that will save the file using the intended file format. However, using that overload with ImageFormat.Jpeg will cause a loss in quality in the resulting file on more than one level.
That can be somewhat remedied by using the third overload:
foreach (var encoder in ImageCodecInfo.GetImageEncoders()) {
    if (encoder.MimeType == "image/jpeg")
        image.Save(filename, encoder, new EncoderParameters { Param = new [] { new EncoderParameter(Encoder.Quality, 100L) }});
}
This will, at least, save a JPEG with "almost" no compression. GDI+ still doesn't do a good job at it.
However, no matter how much you twist and turn it, GDI+ will not be as good as a proper image library, which once again would most likely be ImageMagick. The further away you can get from GDI+, the better off you will be.
Conclusion / TL;DR and other notes.
Q: Can I load these files in .NET?
A: Yes, with a bit of fiddling and by not using GDI+ for the initial loading of the file, as GDI+ doesn't support the CMYK color space in JPEG files.
And even so, GDI+ lacks support for many things which is why i would recommend an external image library over GDI+.
Q: Mismatch in DPI and bit depth for file between Windows and <insert photo app here>
A: This is just proof that Windows' JPEG loading differs from other applications' JPEG loading routines. Only applications that use GDI or GDI+ would see the same information that Windows does when showing image details.
If you are using Windows 7+ then it isn't using GDI+ to show the information nor the image. It is using WPF or WIC to do so which are somewhat more up to date.
Q: If I open the jpg file using a graphics program and just re-save without changing anything, the file properties in windows explorer now match/read correct (72 dpi and 24 bit depth)
A: If you are using Adobe Photoshop and you use "Save for web" then the JPEG image will not be saved in CMYK format. Use "Save As..." instead and you will find that the color space (and bit depth) stays the same.
However, I wasn't able to reproduce your discrepancy in DPI and bit depth when loading my file in Photoshop. They are reported as the same in both Windows and Photoshop.

I had the same issue with this bug - seems as though the Graphics / Bitmap / Image library throws an exception with certain malformed images. Narrowing it down more than that, as Cadde shows, is difficult.
Following on from the great answer made by Cadde (which left using an external library as an exercise to the reader), I changed my code to the following using MagickNet which you can get here, or simply with NuGet: PM> Install-Package Magick.NET-Q16-x86.
The code tries to create a Graphics object from the image, and if it fails, uses ImageMagick to load the image again, convert to a Bitmap, and attempts to load from there.
Image bitmap = Bitmap.FromFile(filename, false);
Graphics graphics = null;
try
{
    graphics = Graphics.FromImage(bitmap);
}
catch (OutOfMemoryException)
{
    // Well, this looks like a buggy image.
    // Try the alternate method: let ImageMagick decode it and hand GDI+ a fresh Bitmap.
    ImageMagick.MagickImage image = new ImageMagick.MagickImage(filename);
    image.Resize(image.Width, image.Height);
    image.Quality = 90;
    image.CompressionMethod = ImageMagick.CompressionMethod.JPEG;
    graphics = Graphics.FromImage(image.ToBitmap());
}

I had the same problem. My jpg file was generated by Photoshop. A simple solution is to open the jpg file with Windows Paint and save it as a new jpg file. Import the new jpg file into the C# project and the problem will disappear.

How to connect to a print driver in C#?

I have a task of converting a bunch of file types (.pdf, .doc, .jpg, .xls, .txt, .bmp) into .png format. I found a print driver that does that.
But how do I connect to that printer driver in .NET? This will be a server-side component. I need to print documents into a folder using this print driver.
I am wondering how that can be done.
Thanks

Based on your updated comments, it sounds as if you are looking to convert a variety of images and document types to a single common image type. The process of taking one of the several possible source formats you mention and converting it to a bitmapped format such as .PNG is referred to as RENDERING or RASTERIZING. You want to take one of the input formats, render it to a bitmap representation, then write it to a file in .PNG format.
While it certainly might be possible to do this using a print driver, to do so you would typically be relying on an installed application that would allow you to pass the source document to it for printing to the driver. For this to work, each of the source file types you want to handle this way needs to have an application installed which can take actions from the shell and do what you request. So for example, if you want to do this with a .DOC file, you need Microsoft Word installed, as it does properly respond to the PRINT shell command.
However, the limitation of the shell-based method is that it is always going to print to the DEFAULT system printer. So your driver would need to be set up as the default printer for the machine you are going to run your process on. Therefore you would need to see if each of the source types you want to handle has an installed or installable application which will allow you to print it using the shell and the PRINT action verb.
Reference URLs:
Windows Shell Verbs and File Associations
Creating Shortcut Menu Handlers
The problem with this technique is not all applications respond to the PRINT verb correctly or at all. This usually works with all the major Microsoft applications, but you should test any other document types you want to support before going much further with this technique.
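For reference, invoking the PRINT verb from .NET goes through System.Diagnostics.Process. The following is a minimal sketch, assuming a Windows machine with an application registered for the document type; ShellPrint and its methods are names made up for this illustration, and the 60-second wait is an arbitrary value. ProcessStartInfo.Verbs lists the verbs registered for the file's extension, which gives a cheap way to test whether a type supports printing at all before relying on it.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper around the shell PRINT verb.
static class ShellPrint
{
    // Builds the start info that asks the shell to print the document.
    public static ProcessStartInfo BuildPrintStartInfo(string documentPath)
    {
        return new ProcessStartInfo
        {
            FileName = documentPath,
            Verb = "print",
            UseShellExecute = true,               // verbs are only honoured via the shell
            CreateNoWindow = true,
            WindowStyle = ProcessWindowStyle.Hidden
        };
    }

    // True if the application registered for this file type exposes a print verb.
    public static bool SupportsPrintVerb(string documentPath)
    {
        string[] verbs = new ProcessStartInfo(documentPath).Verbs;
        return Array.Exists(verbs, v => string.Equals(v, "print", StringComparison.OrdinalIgnoreCase));
    }

    // Sends the document to the default printer and waits briefly for the handler.
    public static void Print(string documentPath)
    {
        using (Process process = Process.Start(BuildPrintStartInfo(documentPath)))
        {
            // Process.Start can return null when an already running instance handles the request.
            if (process != null)
                process.WaitForExit(60000); // assumed timeout; some applications stay open
        }
    }
}
```

Note that nothing here selects a printer: as described above, the output goes to whatever the default printer is at the time of the call.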
This also raises other questions that this doesn't even begin to address such as what to do about multiple page formats. You listed a few image types that are straight-forward and can be converted to PNG files pretty directly. But how do you want to render a multiple page Word document files into PNG format? Do you intend for only one very large PNG with all the pages one after another? Or do you intend for one PNG file per corresponding source document page? The print driver method might not give you very much control over that.
Depending on some of these details and just how much control and reliability you need in the process, you might want to consider a completely different route to your process. Maybe you should consider using tools/libraries that can read the source file formats you want to support and render them directly, after which you can save into your PNG files. One library I have used in the past that would seem to fit and allow you a high degree of control over the conversion (rendering/rasterization) process is LeadTools. It is a fairly pricey product, but my experience with it has been that it does support a wide variety of formats reliably.
LeadTools PDF and Document Readers SDK
There may be some other open source tools available that you could pull together to support this type of functionality, but I'm not familiar with any to point you to anything specific. But hopefully this helps give you some information to look at putting together a process that might be more reliable and give you greater control than trying to coerce a printer driver to do something you might not quite be able to make work reliably.

Server-side component implies something that doesn't have a human sitting at it (at least, not the human that is trying to use that printer). If this is the case then a print driver will not work - Print drivers that write their output to disk instead of a device always, in my experience, ask the user to select a place to save the file (present a Save As dialog).

To elaborate a little bit on what Boo mentioned :
Depending on the printer driver you are using, you may be able to tell it where to save your file.
The catch with using a printer is that, while you can normally print from any application to a .png file, the application itself has to know how to open and render the content of the original file (which is a separate job from talking to the printer).
To continue down this path, you have to make sure your server component knows how to read and render content of each file type (.jpg, .pdf, .doc, etc.).
Assuming your server component knows how to render the content, the next step from here is to use the .NET Printing namespace to print your content to the .png printer.
For more details go to : http://msdn.microsoft.com/en-us/magazine/cc188767.aspx

How to create an image from a raw data of DICOM image

I have raw pixel data in a byte[] from a DICOM image.
Now I would like to convert this byte[] to an Image object.
I tried:
Image img = Image.FromStream(new MemoryStream(byteArray));
but this is not working for me. What else should I be using ?

One thing to be aware of is that a dicom "image" is not necessarily just image data. The dicom file format contains much more than raw image data. This may be where you're getting hung up. Consider checking out the dicom file standard which you should be able to find linked on the wikipedia article for dicom. This should help you figure out how to parse out the information you're actually interested in.
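A quick way to tell whether a byte[] holds a complete DICOM file rather than bare pixel data is to look for the file prefix. The following is a minimal sketch; DicomSniffer is a name made up for this illustration. It relies on the fact that a DICOM Part 10 file begins with a 128-byte preamble followed by the four characters "DICM". Data without that prefix may still be DICOM in an older or non-conforming layout, so a negative result is only a hint.

```csharp
// Hypothetical helper: detects the standard DICOM file prefix.
static class DicomSniffer
{
    // True when bytes 128..131 spell "DICM", i.e. the buffer starts with
    // the 128-byte preamble and magic prefix of a DICOM Part 10 file.
    public static bool HasDicomPrefix(byte[] data)
    {
        return data != null
            && data.Length >= 132
            && data[128] == (byte)'D'
            && data[129] == (byte)'I'
            && data[130] == (byte)'C'
            && data[131] == (byte)'M';
    }
}
```

If this returns true for the array being passed to Image.FromStream, the array still contains the DICOM header and data elements, and the pixel data has to be extracted first.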

You have to do the following:
1. Identify the PIXEL DATA tag in the file. You may use FileStream to read byte by byte.
2. Read the pixel data.
3. Convert it to RGB.
4. Create a Bitmap object from the RGB data.
5. Use the Graphics class to draw the Bitmap on a panel.
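Steps 3 and 4 above can be sketched as follows. This is a minimal illustration under strong assumptions: the pixel data is uncompressed, 8 bits per sample, single-channel grayscale, and the width and height are already known (for example from the Columns and Rows elements of the file). RawPixelsToBitmap and its methods are names made up for this sketch. Wider samples or compressed transfer syntaxes need extra handling that is not shown here.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

// Hypothetical helper: turns raw 8-bit grayscale samples into a GDI+ Bitmap.
static class RawPixelsToBitmap
{
    // Step 3: expand each gray sample to three identical B, G, R bytes.
    // Rows are padded to 'stride' bytes, as GDI+ expects.
    public static byte[] GrayToBgr24(byte[] gray, int width, int height, int stride)
    {
        if (gray == null || gray.Length < width * height)
            throw new ArgumentException("Not enough pixel data for the given dimensions.");
        if (stride < width * 3)
            throw new ArgumentException("Stride is too small for a 24-bit row.");

        var bgr = new byte[stride * height];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                byte value = gray[y * width + x];
                int offset = y * stride + x * 3;
                bgr[offset] = value;      // blue
                bgr[offset + 1] = value;  // green
                bgr[offset + 2] = value;  // red
            }
        }
        return bgr;
    }

    // Step 4: copy the expanded rows into a new 24-bit Bitmap.
    public static Bitmap FromGray8(byte[] gray, int width, int height)
    {
        var bitmap = new Bitmap(width, height, PixelFormat.Format24bppRgb);
        BitmapData data = bitmap.LockBits(
            new Rectangle(0, 0, width, height),
            ImageLockMode.WriteOnly,
            PixelFormat.Format24bppRgb);
        try
        {
            byte[] bgr = GrayToBgr24(gray, width, height, data.Stride);
            Marshal.Copy(bgr, 0, data.Scan0, bgr.Length);
        }
        finally
        {
            bitmap.UnlockBits(data);
        }
        return bitmap;
    }
}
```

The resulting Bitmap can then be drawn with Graphics.DrawImage or assigned to a PictureBox. Unlike Image.FromStream, this path does not expect the bytes to be in an encoded image file format.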

The pixel data usually (if not always) ends up at the end of the DICOM data. If you can figure out width, height, stride and color depth, it should be doable to skip to the (7FE0,0010) data element value and just grab the succeeding bytes. This is the trick that most normal image viewers use when they show DICOM images.

There is a C# library called EvilDicom (http://rexcardan.com/evildicom/) that can be used to pull the image out of a DICOM file. It has a tutorial on how to do it on the website.

You should use GDCM.
Grassroots DiCoM is a C++ library for DICOM medical files. It is automatically wrapped to python/C#/Java (using swig). It supports RAW, JPEG 8/12/16bits (lossy/lossless), JPEG 2000, JPEG-LS, RLE and deflated (zlib).
It is portable and is known to run on most systems (Win32, Linux, Mac OS X).
http://gdcm.sourceforge.net/wiki/index.php/GDCM_Release_2.4
See for example:
http://gdcm.sourceforge.net/html/DecompressImage_8cs-example.html

Are you working with a pure standard DICOM file? I've been maintaining a DICOM parser for over two years and I have come across some really strange DICOM files that didn't completely fulfill the standard (companies implementing their "own" twisted standard DICOM files). Flush your byte array into a file and test whether your image viewer (IrfanView, Picasa or whatever) can show it. If your code works with a normal JPEG stream then, from my experience, there is a 99.9999% chance that this is simply because the file violates the standard in some strange way (and believe me, medical companies do that a lot).
Also note that the DICOM standard supports several variants of the JPEG standard. It could be that the Bitmap class doesn't support the data you get from the DICOM file. Can you please write down the transfer syntax?
You are welcome to send me the file (if it's not big) at yossi1981#gmail.com and I can check it out. There was a time when I hex-edited DICOM files for half a year.

DICOM is a ridiculous specification and I sincerely hope it gets overhauled in the near future. That said Offis has a software suite "DCMTK" which is fairly good at converting dicoms with the various popular encodings. Just trying to skip ahead in the file x-bytes will probably be fine for a single file but if you have a volume or several volumes a more robust strategy is in order. I used DCMTK's conversion code and just grabbed the image bits before they went into a pnm. The file you'll be looking for in DCMTK is dcm2pnm or possibly dcmj2pnm depending on the encoding scheme.
I had a problem with the scale window that I fixed with one of the runtime flags. DCMTK is open source and comes with fairly simple build instructions.
