Where Can I Find Current Adobe Image Format Specifications? "Clipping Paths" - c#

This is in regards to Adobe's Image Resource Blocks(IRB), that they store in TIFF, PSD, JPEG Formats. It's also called "8BIM", This standard was released with Adobe's Photoshop 3 (November 1994).
IRB contains information on color profiles and clipping paths(what i am interested in).
The only piece of documentation i can find on the internet is this 4 page document provided by Adobe in 1990.
I have been searching imagemagick source code to find that the IRB ID's for clipping paths are from 2000 to 2998, to thats a usable 998 clipping paths.
So I managed to get a IRB Byte Array of each resource block from JPEG and a TIFF file, specified in the four page document. I rolled my own and tested out Graphics Mill to see if managed to get the same information.
I am not sure how to convert the clipping path byte array into anything usable since I don't even know the format that adobe photoshop uses. The idea was to map the clipping path to a c# GDI+ Graphics Path.
I think that it's kind of pathetic that adobe has been around for so many years being the industry leader in graphic design, but yet they can't even provide necessary documentation.
Can anybody suggest any documentation that i could use?

Barely it's still actual for you, but some info can be found here
http://www.adobe.com/devnet-apps/photoshop/fileformatashtml/PhotoshopFileFormats.htm

+1 to mistika
Adobe specification contains description of PSD clipping path. The last revision is dated by Oct 2013 and looks like Adobe is currently working on it. At least I have feeling that new stuff was added.
If you are looking for a code using PSD format, take a look into libpsd. That’s a nice open source, pretty easy to read. Sometimes more informative that spec.
As for GraphicsMill, since 6x version it can transform Adobe Clipping Path to GdiPlus GraphicsPath.

Related

OutOfMemoryException: Out of memory - System.Drawing.Graphics.FromImage

I get Out of Memory exception when using System.Drawing.Graphics.FromImage (using latest versions of .NET software on Windows 2012 server), ONLY on a very few specific image files. Most of the time the code works fine.
Typical answers to above issue indicate that certain resources are not being released.
Please consider the following before answering:-
This specific image is 34KB in size, is a .JPG image. Server is idle and has over 32GB RAM.
If I look at properties of
this jpg file, using windows explorer, by right-clicking on file, Windows says: 96 dpi and 32 bit depth.
BUT, if I open this jpg file using any graphics program (e.g. photoshop), the file properties show as: 72 dpi and 24 bit depth.
So, there is a mis-match between what I think file header properties
say and what the file actually contains.
Further, if I open the jpg
file using a graphics program and just re-save without changing
anything, the file properties in windows explorer now match/read correct
(72 dpi and 24 bit depth); and the file is processed by
System.Drawing.Graphics correctly, without throwing exception.
Due to my limited knowledge of the subject, I don't know if the file header of an image file can contain different data from actual file contents.
Questions:
How can I fix this problem? Or how can I tell System.Drawing.Graphics to ignore file header data and just look at actual image file contents? (as all graphics programs such as photoshop appear to do).
Thanks!
While I'm not a guru on the JPEG file format i did some research on the subject and here's what i found that could help you with your problem/questions.
Note that this answer will assume rather than specifically pinpoint the source of your problem due to the lack of an example file to inspect and tell what differs it from what the .Net/GDI+ JPEG/JFIF decoder expects.
The JPEG/JFIF format
Starting off, you might want to have some insight into the JPEG/JFIF format itself. After all, you have just encountered a file that .Net/GDI+ cannot load/parse. Since i don't have the file you experience issues with i would suggest you load it up in a hex editor of choice... that has the capability to highlight the file based on a template/code/parser.
I used 010 Editor and the JPEG Template from Sweetscape's online template repository.
010 Editor comes with a 30-day free trial.
What you are specifically looking for is the SOFn identifier and data in your bad JPEG.
In the SOFn data i can see that my image is Y (154) pixels high and X (640) pixels wide with a precision of 8 bits per component using 3 components, making it 24 bits per pixel.
The JPEG/JFIF format is a huge mix of many different implementations/formats. Obviously, you won't find every variant of the format in any library that has been around since long long ago before the odd JPEG formats appeared. Which the GDI+ library has.
In your case, i suspect you have run into the commonly asked about CMYK color profile on your JPEG files.
The .Net implementation
You said you used System.Drawing.Graphics.FromImage so i will assume your code looks like one of the following:
Graphics.FromImage(Image.FromFile("nope.jpg"));
Graphics.FromImage(Image.FromFile("nope.jpg", true));
Graphics.FromImage(Image.FromStream(nopeJpegStream));
From those calls, you may get an OutOfMemoryException when the native gdiplus.dll calls...
GdipGetImageGraphicsContext
GdipLoadImageFromFile
GdipLoadImageFromFileICM (or their respective *Stream variants) or
GdipImageForceValidation
... returns code 3 or 5 (Out of memory or Insufficient buffer respectively)
Which i gathered from referencesource.microsoft.com looking through the .Net sources there.
In any case, this most likely isn't an issue with .Net but an issue with GDI+ (gdiplus.dll) which Microsoft doesn't provide source code for. Which also means that there is no way of controlling how the image loads using the .Net wrappers and there's no way to check WHY it fails. (though i still suspect your JPEG is saved with CMYK)
Unfortunately, you are going to find many many more of these strange exceptions/errors as you move along in GDI+ land. As the library is all but deprecated in favor of the Windows Presentation Framework (WPF) and the Windows Imaging Component. (WIC)
My own testing
Since you never provided an image or any additional details on the subject i attempted to reproduce your issue. Which was a task in of itself, Image.FromFile (GdipLoadImageFromFile) will fail on many different file formats. At least it doesn't care what the file extension is, which thankfully Photoshop does.
So with your information, i finally managed to reproduce a .jpg file that loads fine in Photoshop, shows DPI as 96 and bit depth as 32. Of course, if i knew more about the JPEG format i probably could have gotten to the solution right away.
Showing this file (which i had to set to CMYK color space in Photoshop) in 010 Editor gave me the following SOFn data: Y (154) pixels high and X (640) pixels wide with a precision of 8 bits per component using 4 components, making it 32 bits per pixel.
I suspect you would see the same on your "bad" file.
And yes, Image.FromFile now throws an OutOfMemoryException!
Possible solutions
Use an external library for loading image files. (An exercise i leave to you but ImageMagick A.K.A Magick.NET seems like a good bet)
Make use of a command line tool (invoked when you get this exception) that can convert an image from one format to another. Or from JPEG to JPEG as it may be in this case. (Once again, ImageMagick's "convert" command line tool seems like a good bet)
Use the Windows Presentation Framework assemblies...
public static Image ImageFromFileWpf(string filename) {
/* Load the image into an encoder using the Presentation Framework.
* This is done by adding a frame (which in laymans terms is a layer) to a class derived BitmapEncoder.
* Only TIFF, Gif and JPEG XR supports multiple frames.
* Since we are going to convert our image to a GDI+ resource we won't support this as GDI+ doesn't (really) support it either.
* If you want/need support for layers/animated Gif files, create a similar method to this one that takes a BitmapFrame as an argument and then...
* 1. Instanciate the appropriate BitmapDecoder.
* 2. Iterate over the BitmapDecoders frames, feeding them to the new method.
* 3. Store the returned images in a collection of images.
*
* Finally, i opted to use a PngBitmapEncoder here which supports image transparency.
*/
var bitmapEncoder = new PngBitmapEncoder();
bitmapEncoder.Frames.Add(BitmapFrame.Create(new Uri(filename)));
// Use a memorystream as a handover from one file format to another.
using (var memoryStream = new MemoryStream()) {
bitmapEncoder.Save(memoryStream);
/* We MUST create a copy of our image from stream, MSDN specifically states that the stream must remain
* open throughout the lifetime of the image.
* We cannot instanciate the Image class, so we instanciate a Bitmap from our temporary image instead.
* Bitmaps are derived from Image anyways, so this is perfectly fine.
*/
var tempImage = Image.FromStream(memoryStream);
return new Bitmap(tempImage);
}
}
Based on this answer...
... Which i would say is a good option as it keeps you within the .Net framework.
Please keep in mind that when the method returns, you do specifically get a PNG image back. If you call Image.Save(string) on it you WILL save a PNG file, no matter what extension you save it as.
There is an overload Image.Save(string, ImageFormat) that will save the file using the intended file format. However, using that overload with ImageFormat.Jpeg will cause a loss in quality in the resulting file on more than one level.
That can be somewhat remedied by using the third overload:
foreach (var encoder in ImageCodecInfo.GetImageEncoders()) {
if (encoder.MimeType == "image/jpeg")
image.Save(filename, encoder, new EncoderParameters { Param = new [] { new EncoderParameter(Encoder.Quality, 100L) }});
}
Which, at least, will save a JPEG with "almost" no compression. GDI+ still doesn't do a good job at it.
However, no matter how much you twist and turn it. GDI+ will not be as good as a proper image library, which once again would most likely be ImageMagick. The further away you can get from GDI+, the better off you will be.
Conclusion / TL:DR and other notes.
Q: Can i load these files in .Net?
A: Yes, with a bit of fiddling and not using GDI+ for the initial loading of the file as GDI+ doesn't support the CMYK color space in JPEG files.
And even so, GDI+ lacks support for many things which is why i would recommend an external image library over GDI+.
Q: Mismatch in DPI and bit depth for file between Windows and <insert photo app here>
A: This is just proof that Windows JPEG loading differs from other applications JPEG loading routines. Only applications that use GDI or GDI+ would see the same information that Windows does when showing image details.
If you are using Windows 7+ then it isn't using GDI+ to show the information nor the image. It is using WPF or WIC to do so which are somewhat more up to date.
Q: If I open the jpg file using a graphics program and just re-save without changing anything, the file properties in windows explorer now match/read correct (72 dpi and 24 bit depth)
A: If you are using Adobe Photoshop and you use "Save for web" then the JPEG image will not be saved in CMYK format. Use "Save As..." instead and you will find that the color space (and bit depth) stays the same.
However, i wasn't able to reproduce your discrepancy in DPI and bit depth when loading my file in Photoshop. They are reported as the same in both Windows and Photoshop.
I had the same issue with this bug - seems as though the Graphics / Bitmap / Image library throws an exception with certain malformed images. Narrowing it down more than that, as Cadde shows, is difficult.
Following on from the great answer made by Cadde (which left using an external library as an exercise to the reader), I changed my code to the following using MagickNet which you can get here, or simply with NuGet: PM> Install-Package Magick.NET-Q16-x86.
The code tries to create a Graphics object from the image, and if it fails, uses ImageMagick to load the image again, convert to a Bitmap, and attempts to load from there.
Image bitmap = Bitmap.FromFile(filename, false);
Graphics graphics = null;
try
{
graphics = Graphics.FromImage(bitmap);
}
catch (OutOfMemoryException oome)
{
// Well, this looks like a buggy image.
// Try using alternate method
ImageMagick.MagickImage image = new ImageMagick.MagickImage(filename);
image.Resize(image.Width, image.Height);
image.Quality = 90;
image.CompressionMethod = ImageMagick.CompressionMethod.JPEG;
graphics = Graphics.FromImage(image.ToBitmap());
}
I had the same problem. My jpg file was generated from Photoshop. A simple solution is to open the jpg file with Winodws Paint, and save as a new jpg file. Import the new jpg file to C# project and the problem will be disappear.

OCR engine to capture characters from images

i'm using c# tessnet2 wrapper for Tesseract OCR engine to capture chracters of image files. i been searching everywhere if tessnet2 has any build in functions to overwrite certain characters and saved them into the same image file it's reading but have not found anything in regards to that. so what i'm thinking of doing is creating a new imagine file base on what i'm receiving from tessnet2 but i need to create the new image the same exact way but change just few things in the new created image. i'm not sure if i'm using the correct methology or if there is other c# assemblies out there that allow you to read characters from image file and at the same time allow you to manipulate as you need them.
Good luck--but tess has no way of replacing in the proper font. Raster graphics don't generally store glyph information. Even if it did, you would potentially be in violation of licenses and/or copyrights surrounding the fonts you'd be writing in. I'm not an expert in OCR, but I will confidently say that this is something not readily available out there in the wild.
To expand on Brian's answer:
You will need to do this yourself. I have not worked with Tesseract, but I have used the Nuance OCR engine. It will return you font information as well as coordinates for the character it has recognized (note that you will most likely have to compute the actual image coordinate as the OCR engine will have deskewed the image before performing the recognition). Once you get the coordinates and the deskew so that you can compute the actual coordinate, you can then use any image manipulation library (Leadtools, Accusoft, etc) or just straight GDI+ functions to clear the character, then using the font info and size info create a new character and merge it into the image. This is not trivial but certainly doable.
Edit:
It was late when I wrote the initial answer, wanted to clarify what is meant by font information. The OCR engine will give you information regarding the point size, whether its bold/italicized and the font family (Seriph, etc). I do not know of one that will tell you the exact font that the document is in. If you have a sample of the documents that you will process, then you can make a good guess based on the info the OCR engine gives you.

.NET component for color PDF to grayscale conversion

Currently i use Ghostscript to convert color PDF's to grayscale PDF's. Now i'm looking for reliable .NET commercial or not commercial component/library for ghostscript replacement. I googled and I did not find any component/library that is able to do that easily or to do that at all.
EDIT #1:
Why Ghostscript does not work for me:
I implemented Ghostscript and I'm using it's native API's. The problem is that Ghostscript does not support multiple instances of the interpreter within a single process. -dJOBSERVER mode also does not work for me because i don't collect all job and them process them all at once. It happens that Ghostscript is processing large job which takes around 20 minutes and meanwhile i get some smaller job which has to be processed ASAP and cannot wait 20 minutes. Other problem is that Ghostscript page processed events are not easily to catch. I wrote a parser for ghostscript stdout messages and i can read out processed page number but not for each page when it's processed as ghostscript pushes message for group of processed pages. There are couple of more problems with Ghostscript like producing bad pdf's, duplicating font problems.....
You can find one more problem i had with ghostscript here: Ghostscript - PS to PDF - Inverted images problem
-
a year after UPDATE:
Before a year a go i asked this question. Later i made my own solution by using iTextSharp.
You can take a look at the converting PDF to grayscale solution here:
http://habjan.blogspot.com/2013/09/proof-of-concept-converting-pdf-files.html
or
https://itextsharpextended.codeplex.com/
Works for me in most cases :)
Not quite an answer, but I think you dismiss Ghostscript too quickly.
Are you aware of the GhostScript API (for in-process Ghostscript)? Or of the -dJOBSERVER mode that can take a series of PS commands piped to its standard in?
That still won't get you your callbacks however, and it's still not multi-threaded.
As previously stated, iText could do it, but it would be a matter of walking through all the content and images looking for non-grayscale color spaces and converting them in a space-specific manner.
You'd also have to replace the pixel data in any images you might find.
The good news is that iText[Sharp] is capable of operating in multiple threads, provided each document is used from one thread at a time.
I suspect this is also the case for the suggested commercial library, which isn't such a good deal.
And then a light went on above my head... drawn in gray scale.
Blending modes and transparency groups!
Take all the current page content and stick it in a transparency group that is blended with a solid black rectangle that covers the page. I think there's even a luminosity to alpha blend mode... lets see here.
Yep, PDF reference section 11.6.5.2 "Soft Mask Dictionaries". You'll want a "luminosity" group.
Now, the bad news. If your goal in switching to gray scale is to save space, this will fail utterly. It'll actually make each file a little larger... say a 100 bytes per page, give or take.
The software rendering the PDF better be pretty hot stuff too. Your cousin's undergrad rendering project need not apply. This is advanced graphics stuff here, infrequently used by Common PDF Files, so the last sort of thing to be implemented.
So... For each original page
Create a new page.
Cover it with a black background.
Cover it with a white rectangle (had it backwards earlier) in a transparency group that uses a soft mask dictionary set to be the luminosity of the original page's content (now stashed in an XObject Form).
Because this is all your own code, you'll have ample opportunity to do whatever it is you want to do at the beginning or end of each page.
By golly, that's just crazy enough to work! It does require some PDF-Fu, but not nearly as much as the "convert each color space and image in various ways as I step through the document". Deeper knowledge, less code to write.
This isn't a .net library, but rather a potential work-around. You could install a virtual printer that is capable of writing PDF files. I would suggest CutePDF, as it's free, easy to use and does a great job 'printing' a large number of file formats to PDF. You can do nearly everything with CutePDF that you can do with a normal printer, including printing to grayscale.
After the virtual printer is installed, you can use c# to 'print' a greyscale version.
Edit: I just remembered that the free version is not silent. Once you print to the CutePDF printer, it will ask you to 'Save As'. They do have an SDK available for purchase, but I couldn't say whether it would be able to help you convert to grayscale.
If a commercial product is a valid option for you, allow me to recommend Amyuni PDF Creator .Net. By using it you will be able to enumerate all items inside the page and change their colors accordingly, images can also be set as grayscale. Usual disclaimers apply
Sample code using Amyuni PDF Creator ActiveX, the .Net version would be similar:
pdfdoc.ReportState = ReportStateConstants.acReportStateDesign;
object[] page_items = (object[])pdfdoc.get_ObjectAttribute("Pages[1]", "Objects");
string[] color_attributes = new string[] { "TextColor", "BackColor", "BorderColor", "StrokeColor" };
foreach (acObject page_item in page_items)
{
object _type = page_item["ObjectType"];
if ((ACPDFCREACTIVEX.ObjectTypeConstants)_type == ACPDFCREACTIVEX.ObjectTypeConstants.acObjectTypePicture)
{
page_item["GrayScale"] = true;
}
else
foreach (string attr_name in color_attributes)
{
try
{
Color color = System.Drawing.ColorTranslator.FromWin32((int)page_item[attr_name]);
int grayColor = (int)(0.3 * color.R + 0.59 * color.G + 0.11 * color.B);
int newColorRef = System.Drawing.ColorTranslator.ToWin32(Color.FromArgb(grayColor, grayColor, grayColor));
page_item[attr_name] = newColorRef;
}
catch { } //not all items have all kinds of color attributes
}
}
Before a year a go i asked this question. Later i made my own solution by using iTextSharp.
You can take a look at the converting PDF to grayscale solution here: https://itextsharpextended.codeplex.com/
iTextPdf a good product for creating/managing pdf it has got both commercial and free versions.
Have a look at aspose.pdf for .net it provides below features and a lot more.
Add and remove watermarks from PDF document
Set page margin, size, orientation, transition type, zoom factor and appearance of PDF document
..
And here is a list of open source pdf libraries.
After a lot of investigation i found out about ABCpdf from Websupergoo. Their component can easily convert any PDF page to grayscale by simple call to Recolor method. The component is commercial.

How do I check for corrupt TIFF images in C#?

I searched on how to check if a TIFF file is corrupt or not. Most suggests wrapping the Image.FromFile function in a try block. If it throws an OutOfMemoryException, its corrupt. Has anyone used this? Is it effective? Any alternatives?
Please check out the freeware called LibTiff .NET. It has the function to check if every page in a TIF file is corrupted or not. Even partially corrupt also no problem
http://bitmiracle.com/libtiff/
Thanks
Many tiff files won't open in the standard GDI+ .NET. That is, if you're running on Windows XP. Window 7 is much better. So any file which is not supported by GDI+ (i.e. fax, 16 bit gray scale, 48bpp RGB, tiled tiff, piramidical tiled tiff etc.) are then seen as 'corrupt'. And not just that, anything resulting in a bitmap over a few 100 MByte on a 32-bit system will also cause an out-of-memory exception.
If your goal is to support as much as possible of the TIFF standard, please start from LibTiff (derivates). I've used LibTiff.NET from BitMiracle (LGPL), which worked well for me. Please see my other posts
Many of the TIFF utilities are also based on LibTIFF, some of them are ported to C#.NET. This would be my suggestion if you want to validate the TIFF.
As for the TIFF specification suggested in other replies: of course this gives you bit-level control. But to my experience you won't need to go that low to have good TIFF support. The format is so versatile that it will cost you an enormous amount of time to start support from scratch.
It will only be corrupt in the sense that the frameworks methods cant open it.
There are some TIFF types that the framework cannot open -( In my case I cant remember the exact one, think it was one of the FAX type ones...)
That may be enough for you, if you are just looking a using the framework to manipulate images. After all I you cant open it, you cant use it...
ImageMagic - may give you more scope here
Without looking at the tiff, it may be difficult to see if its corrupt from a visual perspective, but if you have issues with processing an image, just create a function that does a basic test for this type of processing and handle the error?

Draw emf antialiased

Is there a way to draw an emf metafile (exported form a drawing tool) with antialiasing enabled? The tools I tried are not capable of exporting emf files antaliased so I wondered if I can turn it back on manually when drawing the emf in the OnPaint override of my Controls.
If anyone can confirm that is technically possible to generate antialiased emf files, another solution would be to use a drawing tool that can export to antialiased emf or have a 3rd party converter do this later. If anyone knowns such a tool, please let me know.
EDIT: When looking at the emf instructions it doesn't seem that emf itself can actually store the information whether it is to be rendered antialiased or not. At least I couldn't find anything. It is more likely that the antialiasing is done by the playback engine. For example when I open an emf in Word 2007 it is rendered antialiased. But not when I draw it with GDI+ "playback engine" (Graphics.DrawImage(...)). or when I view it the standard windows image viewer.
This makes me believe that some tools actually have their own emf playback engine. So maybe there is free .NET library (preferably with source code) that give me an object model of the emf instructions stored in the parsed emf file so I can play it back myself instead of using Graphics.DrawImage(...)?
We had a similar issue in a DirectX project. Upscaling and downscaling works to a certain degree, but it's faking it. If it's something you need to do over and over, you could perhaps parse the records of the WMF and draw them with GDI+ antialiased.
The following threads back this up (but they're from 2005 so things might have changed):
http://www.dotnet247.com/247reference/msgs/28/144605.aspx
http://www.dotnetmonster.com/Uwe/Forum.aspx/dotnet-sdk/1127/Graphics-DrawImage-metafile-no-antialiasing
[Edit:]
These three programs might do the job for you: I'm assuming you're ok with doing it by hand:
http://emf-to-vector-converter-command-line-ser.smartcode.com/info.html
http://www.verypdf.com/pdf-editor/index.html
http://www.ivanview.com/converter/emf-batch-converter.html
[Edit II:]
Well, here's a program that will let you inspect an EMF in various ways:
http://download.cnet.com/windows/3055-2383_4-10558240.html?tag=pdl-redir
...and here's a freeware library that will let you parse 122 of the EMF commands and output them in GDI+. That should probably do the trick:
http://www.codeproject.com/KB/GDI-plus/emfexplorer.aspx?msg=2359423
...oh, and notice also comment #3 on the codeproject page. Looks like someone have banged their heads against the wall before. Hope this solves your problem.
EMF is using GDI commands, not GDI+, so it has no notion of antialiasing. I suspect that when you ask GDI+ to render the file, it sends it to GDI and just copies the resulting bitmap.
Duplicating this in code would be the same as reimplementing GDI, so it's not terribly feasible. Not impossible, just a larger job than the benefit would justify. If there is an open source utility that can open EMF files outside of Windows, you might look into the source code.
My guess is that Word is using the downsampling trick.
EMF file is a list of GDI commands. So it won't be anti-aliaised, even if under GDI+, you put a SmoothingMode() call before the drawing. You'll have to enumerate the GDI commands, then translate it into GDI+ commands.
Under Vista/Seven, you can use GDI+ 1.1 function named GdipConvertToEmfPlus/ConvertToEmfPlus. If you want your program to work with XP, you should write your own enumeration, then conversion to GDI+ commands.
The GDI enumeration then conversion to GDI+ is possible has been done by emfexplorer, but I've written some code perhaps more easy to follow, even if it's written in Delphi.
I'm putting this answer just now (I'm late), because I spent a lot of time finding out a solution using ConvertToEmfPlus, and writing some tuned open source code in case this method is not available.

Categories