I have a system for creating a PDF book from users' own images. The images are high resolution, and the PDF ends up at around 70 pages with pictures on most of them.
When the PDF is generated in a local application on the server, the process uses around 3 GB of RAM, which makes it crash more often than it succeeds. The files are also really huge, around 1.2 GB. Running it through a print-to-PDF driver would make it a hundred times smaller.
Is there a way to make ABCpdf use less memory and create smaller files?
I have had a very similar experience with iTextSharp, where I was basically running out of memory any time I created a large PDF with images in it.
I found that there is a function I should call to release an image once I am done with it, since the library holds it in memory in case you want to use it again, or until you finally close the PDF.
Either reuse the images if they are repeating header/footer logos, or release them as you go.
Most likely that is the issue you are facing, but I have no experience with ABCpdf.
I've not used ABCpdf directly, but I'd suspect that the images are the source of your issues. Resize them before they are included in the PDF objects; I suspect that's what a print-to-PDF process does.
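As a rough illustration of resizing before embedding, here is a minimal sketch that uses only System.Drawing, not ABCpdf. The class and method names, the print size in inches, the DPI and the JPEG quality are all assumptions made for the example; the idea is simply to cap each image at the pixel size it needs for its printed size before it is handed to the PDF library.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

// Hypothetical helper, not part of ABCpdf or any other PDF library.
public static class PrintImageScaler
{
    // Largest pixel size worth keeping for an image that will be printed
    // at the given physical size and resolution. Never upscales.
    public static Size TargetSize(Size source, double printWidthInches,
                                  double printHeightInches, int dpi)
    {
        double maxWidth = printWidthInches * dpi;
        double maxHeight = printHeightInches * dpi;
        double scale = Math.Min(1.0,
            Math.Min(maxWidth / source.Width, maxHeight / source.Height));
        return new Size(
            Math.Max(1, (int)Math.Round(source.Width * scale)),
            Math.Max(1, (int)Math.Round(source.Height * scale)));
    }

    // Writes a downsampled JPEG copy that can be embedded instead of the original.
    public static void SaveDownsampledJpeg(string inputPath, string outputPath,
        double printWidthInches, double printHeightInches, int dpi, long quality)
    {
        using (Image source = Image.FromFile(inputPath))
        {
            Size target = TargetSize(source.Size, printWidthInches, printHeightInches, dpi);
            using (var resized = new Bitmap(target.Width, target.Height))
            {
                using (Graphics g = Graphics.FromImage(resized))
                {
                    g.InterpolationMode = InterpolationMode.HighQualityBicubic;
                    g.DrawImage(source, 0, 0, target.Width, target.Height);
                }

                // Pick the built-in JPEG encoder and set its quality (0-100).
                ImageCodecInfo jpeg = Array.Find(ImageCodecInfo.GetImageEncoders(),
                    c => c.FormatID == ImageFormat.Jpeg.Guid);
                using (var parameters = new EncoderParameters(1))
                {
                    parameters.Param[0] = new EncoderParameter(Encoder.Quality, quality);
                    resized.Save(outputPath, jpeg, parameters);
                }
            }
        }
    }
}
```

For example, a 6000x4000 photo placed in a 10x8 inch area at 300 DPI would be reduced to 3000x2000 pixels, a quarter of the original pixel count, before it ever reaches the PDF.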
One other note: for very large PDFs you may want to set "linearize" to false.
pdfDoc.SaveOptions.Linearize = false;
Linearization optimizes the PDF for web streaming, so if you are streaming the PDF you might want to leave it as true, but I've found it drastically increases the memory used by ABCpdf during the save.
Related
I have written a program that has a feature for viewing images. It works fine for common file types; I simply use a PictureBox. But I want to be able to load DNG (Digital Negative) images too.
I don't need to load the whole thing. Loading just the baked-in JPG preview would be fine; even though not all DNGs have one, I am willing to settle for this to avoid a huge hassle. Plus, since DNGs are so big and my program is meant to browse through them quickly, this might be best anyway.
I have tried using WindowsAPICodePack.Shell to get the Extra Large Bitmap or Icon from the file. I am already using this for my "explorer" thumbnails. But there is nothing larger than "Large", which I believe is around 256px. I know that the image has a JPG preview of around 1024px.
Is there an easy way to get at the jpg preview of a DNG file?
I'm looking for a way to convert a high-resolution PDF file to a low-resolution PDF file from an ASP.NET application (C#).
Users will import high-resolution PDFs, and the solution should then be able to provide both a high-resolution and a low-resolution PDF.
I'm looking for an API to do that. I have found a lot of PDF APIs, but none of them seems to do what I'm looking for.
ABCpdf .NET will do this for you. There are a variety of functions for resizing, resampling or recompressing the images within a PDF document. However, given your requirements, you probably just want to use the document reduce size operation.
To do this you just need code of the following form:
Doc doc = new Doc();
doc.Read(Server.MapPath("../mypics/sample.pdf"));
using (ReduceSizeOperation op = new ReduceSizeOperation(doc)) {
op.UnembedSimpleFonts = false; // though of course making these true...
op.UnembedComplexFonts = false; // ... would further reduce file size.
op.MonochromeImageDpi = 72;
op.GrayImageDpi = 72;
op.ColorImageDpi = 144;
op.Compact(true);
}
doc.Save(Server.MapPath("ReduceSizeOperation.pdf"));
I work on the ABCpdf .NET software component so my replies may feature concepts based around ABCpdf. It's just what I know. :-)
There are a number of possible approaches to this problem, one of which would simply be to export from InDesign twice (which would allow you to make the two required versions of the PDF). If that is not feasible (which it might not be, as exporting from InDesign can take a bit of time), there are definitely libraries on the market that can do what you want. Here too you have a number of approaches:
1) While this will get me shot by most Adobe employees, you could re-distill your PDF file into a smaller file. I would not advocate doing this; I'm mentioning it to be complete. Risks involved would be introducing quality loss, artefacts and so on (mostly because PostScript doesn't support a number of features PDF does support, and by redistilling you'd lose those features).
2) Use an application or library that was made for this task. callas pdfToolbox for example (warning, I'm affiliated with this company!) is capable of taking a PDF file and running corrections on it. Amongst those corrections are image downsampling, conversion to RGB, image re-compression (for example with JPEG-2000), throwing away unnecessary data in the PDF and much more. Given that the application has a command-line interface, it can easily be integrated into a C# process. Other applications with similar functionality would be from companies such as Enfocus, Apago and others.
3) Use a low-level PDF library (such as the Adobe PDF library that can be licensed from Adobe through DataLogics) and implement the necessary changes yourself. More work, but potentially more flexible.
Whatever approach you choose, know that you are starting from a high-quality PDF file and that your process should try to retain as much of that quality as possible (dependent on which application you have for the low-resolution PDF file, of course). Make sure you don't get into trouble by losing proper overprint, transparency etc.
You can probably just resize the images in the high-resolution PDF, and this will give you much smaller files.
You can resize and/or recompress images using Docotic.Pdf library. For more details please take a look at my answer to a similar question.
Disclaimer: I am one of the developers of Docotic.Pdf library.
I'm having speed issues generating documentation in C#.
I am basically trying to create documents with 600+ pages, but the tools I have used handle this very slowly.
I first tried using DocX by Novacode. Creating this document with 600+ pages takes upwards of 3 minutes. I learned that there could be an issue with the function "InsertDocument", so I tried to find a different solution.
I started looking into opening an HTML document in Word. While this is a fast solution, images are not embedded into the document. The HTML syntax (src="") is not supported in MS Word.
I could use URLs to the images, but then if the internet connection is down, the images would not display.
I then started looking into an HTML->PDF solution. iTextSharp is a little faster than the DocX solution, but still takes 1-2 minutes to generate this document.
I am simply out of ideas. I'm not sure a commercial product would be better, and I don't want to shell out that kind of cash, to just have the same speed issue.
Has anyone had experience with creating Word/PDF documents with 600+ pages in C# fairly quickly (1-5 seconds)?
If you are trying to do this from a web server, you should be careful about the resource consumption of this process, since you can quite easily run out of memory, for example.
If at some point you decide to consider commercial libraries, maybe you could give Amyuni PDF Creator .Net a try. Amyuni PDF Creator .Net provides a "page by page" mode that saves resources when processing exceptionally long PDF documents. The idea is to save each page to the output file as soon as it is generated, maybe keeping a few pages in memory in case they need to be modified.
Take a look at these links for more details:
StartSave Method
EndSave Method
Processing large PDF files
usual disclaimer applies
You should be able to create a rich formatted DOCX file with 600+ pages in that time frame, but for PDF file I'm not sure... it will probably depend on your document content.
Anyway, I'm able to create a rather large DOCX file with GemBox.Document in just a few seconds (0-4 sec), and a PDF file as well, but that does take a bit more time than DOCX output.
You can also convert HTML to DOCX or HTML to PDF really fast, but that can depend on the HTML content itself.
If possible, you should prefer well-written HTML content that's "printer-friendly", doesn't have too many nesting levels, has optimized images, has a single CSS file, etc. Also, if you're providing a URL as an input path, then I think it's better to have embedded base64 images than links, in order to avoid making additional web requests.
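To make the base64 suggestion concrete, here is a minimal sketch of turning an image into a `data:` URI that can be placed directly in an `img` tag. The class and method names are made up for illustration, and whether a given HTML-to-DOCX or HTML-to-PDF converter accepts data URIs is an assumption you would need to check for your library.

```csharp
using System;
using System.IO;

// Hypothetical helper for inlining images into HTML.
public static class HtmlImageEmbedder
{
    // Builds a data URI such as "data:image/png;base64,...." from raw image bytes.
    public static string ToDataUri(byte[] imageBytes, string mimeType)
    {
        return "data:" + mimeType + ";base64," + Convert.ToBase64String(imageBytes);
    }

    // Reads an image file and returns an <img> tag with the image inlined,
    // so rendering the HTML needs no extra web or disk request.
    public static string ImgTagFromFile(string path, string mimeType)
    {
        return "<img src=\"" + ToDataUri(File.ReadAllBytes(path), mimeType) + "\" alt=\"\" />";
    }
}
```

Base64 makes the encoded image about a third larger than the binary file, so this trades document size for fewer requests.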
Last, I don't think there is much difference between Flat OPC XML and DOCX. Basically they both generate the same content; it's just that the DOCX file is additionally zipped, which is a negligible performance penalty.
I searched for how to check whether a TIFF file is corrupt. Most answers suggest wrapping the Image.FromFile function in a try block: if it throws an OutOfMemoryException, the file is corrupt. Has anyone used this? Is it effective? Any alternatives?
Please check out the free library LibTiff.NET. It has a function to check whether each page in a TIFF file is corrupt, and it copes with partially corrupt files as well.
http://bitmiracle.com/libtiff/
Thanks
Many TIFF files won't open in standard GDI+ under .NET, at least if you're running on Windows XP. Windows 7 is much better. So any file which is not supported by GDI+ (e.g. fax, 16-bit grayscale, 48bpp RGB, tiled TIFF, pyramidal tiled TIFF etc.) is then seen as 'corrupt'. On top of that, anything resulting in a bitmap over a few hundred MB on a 32-bit system will also cause an out-of-memory exception.
If your goal is to support as much as possible of the TIFF standard, please start from LibTiff (or its derivatives). I've used LibTiff.NET from BitMiracle (LGPL), which worked well for me. Please see my other posts.
Many of the TIFF utilities are also based on LibTIFF, and some of them have been ported to C#/.NET. This would be my suggestion if you want to validate the TIFF.
As for the TIFF specification suggested in other replies: of course this gives you bit-level control. But in my experience you won't need to go that low to have good TIFF support. The format is so versatile that it will cost you an enormous amount of time to build support from scratch.
It will only be corrupt in the sense that the framework's methods can't open it.
There are some TIFF types that the framework cannot open. (In my case I can't remember the exact one; I think it was one of the fax types.)
That may be enough for you if you are just looking at using the framework to manipulate images. After all, if you can't open it, you can't use it...
ImageMagick may give you more scope here.
Without looking at the TIFF, it may be difficult to tell whether it's corrupt from a visual perspective, but if you have issues with processing an image, why not just create a function that does a basic test for this type of processing and handles the error?
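A basic test function of that kind might look like the sketch below. It only answers "can GDI+ open this file and step through its pages?", which, as noted above, is not the same as "is this a valid TIFF". The enum and class names are invented for the example, and selecting a frame is a fairly shallow check that may not catch damage deep inside the pixel data.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical result type for the check below.
public enum TiffCheckResult { Ok, NotReadableByGdiPlus, FileError }

public static class TiffChecker
{
    public static TiffCheckResult Check(string path)
    {
        try
        {
            using (Image image = Image.FromFile(path))
            {
                // Step through every page of a multi-page TIFF.
                var dimension = new FrameDimension(image.FrameDimensionsList[0]);
                int pages = image.GetFrameCount(dimension);
                for (int i = 0; i < pages; i++)
                {
                    image.SelectActiveFrame(dimension, i);
                }
                return TiffCheckResult.Ok;
            }
        }
        catch (OutOfMemoryException)
        {
            // GDI+ reports unsupported or invalid image data this way.
            return TiffCheckResult.NotReadableByGdiPlus;
        }
        catch (ExternalException)
        {
            // "A generic error occurred in GDI+", e.g. on a damaged page.
            return TiffCheckResult.NotReadableByGdiPlus;
        }
        catch (IOException)
        {
            // Missing file, sharing violation and similar I/O problems.
            return TiffCheckResult.FileError;
        }
    }
}
```

Keeping "not readable by GDI+" separate from "corrupt" lets the caller fall back to another decoder, such as one based on LibTiff, before rejecting the file.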
Got a question. I have images hosted on my server. I already know of the method when an image is uploaded to resize it and save, but I have another thought in mind.
I was wondering if there is a way to resize the image when it is requested by the user, not when it is uploaded.
So, for example, a user uploads an image and I DO NOT resize it or save a resized copy. Instead, when the image is requested via an ASP.NET img control/tag, it is resized on the fly and displayed through that img tag/control.
Why would I want to do this?
To save on disk space. Most servers have a disk space limit, but not a server processing limit. So I would like to save on disk space and use the processing space instead.
EDIT: As a startup website, it's currently better for me to save disk space than processing time. I don't have much money for a large amount of space at the moment. Hopefully that will change when the site launches.
Any ideas? Thanks guys and girls.
I assume you can 'control' the urls to the resized images, so for example the full-sized image might be referenced as <img src="uploads/myphoto.jpg"/> the thumbnail could be to an ASPX or ASHX like <img src="uploads/myphoto.jpg.ashx"/>?
This article on CodeProject - Dynamic Image Resize seems to have exactly the source code you are looking for (and although it's in VB, it shouldn't be hard to port if you're a C# person). Hope that helps.
Finally, I'd encourage you to consider the various forms of caching (both using HTTP headers, to ensure the images are cached at the client or proxy whenever possible, and using built-in ASP.NET features to avoid processing the same images over and over).
Although you'll be saving disk-quota, you're effectively slowing down every other page/request... just a thought.
Dynamic image resizing has numerous advantages, the least of which is reduced disk space usage. However, it does need to be combined with a form of persistent caching, such as either Amazon CloudFront or a disk cache.
Dynamic image resizing gives you great agility on your web site, whereas pre-generating image variants locks you in, preventing the eventual changes you will have to make. When combined with caching, there is no run-time performance difference between the two.
The ImageResizer library offers disk caching, CloudFront caching, and correct memory and cache management. It's been constantly improved and maintained since 2007, and is quite bulletproof. It's running a few social networking sites as well, some having over a million images.
It's a time-tested, traffic-tested, and unit-tested library :) It's also extremely simple to use - you just add ?width=x&height=y to the query string. Functionality can be added via 20+ plugins, so you won't be weighed down by unused code and features.
The article mentioned by CraigD is inherently limited in its performance by the fact that it uses an HttpHandler instead of using an HttpModule - an HttpHandler cannot pass a request back to IIS native code for execution after the resized image is written to disk. It also doesn't adjust jpeg encoding properly or play well with the ASP.NET cache or URL authorization system. Although, I do have to admit - compared to most of the sample code I've seen, it violates far fewer of the image resizing pitfalls I've compiled.
I strongly suggest using the ImageResizer library. It's good code, I wrote it :) If you do end up using sample code or writing your own, please avoid these pitfalls!
You can create an implementation of IHttpHandler to respond to image requests. In that handler you can have code that loads the image from disk and transforms it to the size that is needed. You need to return the proper MIME type with the response and write the image bytes to it (for example with Response.BinaryWrite, or by saving to Response.OutputStream). Also, you may want to look into content expiration headers, so that the image does not have to be loaded every time by the same client but is cached instead.
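A minimal sketch of such a handler is shown below, assuming classic ASP.NET (System.Web) and System.Drawing. The URL convention (`?file=...&width=...`), the `~/uploads/` folder, the width limit and the seven-day cache lifetime are all assumptions chosen for the example, not something prescribed by ASP.NET.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;
using System.IO;
using System.Web;

// Hypothetical handler, e.g. requested as /resize.ashx?file=myphoto.jpg&width=200
public class ResizeImageHandler : IHttpHandler
{
    private const int MaxWidth = 2000; // assumed upper bound on requested width

    public bool IsReusable { get { return true; } }

    // Height that keeps the source aspect ratio for the requested width.
    public static int ScaledHeight(int sourceWidth, int sourceHeight, int width)
    {
        return Math.Max(1, (int)((long)sourceHeight * width / sourceWidth));
    }

    public void ProcessRequest(HttpContext context)
    {
        // GetFileName drops any directory part, so the request cannot leave the folder.
        string name = Path.GetFileName(context.Request.QueryString["file"] ?? "");
        int width;
        bool validWidth = int.TryParse(context.Request.QueryString["width"], out width)
                          && width >= 1 && width <= MaxWidth;
        if (name.Length == 0 || !validWidth)
        {
            context.Response.StatusCode = 400;
            return;
        }

        string path = context.Server.MapPath("~/uploads/" + name);
        if (!File.Exists(path))
        {
            context.Response.StatusCode = 404;
            return;
        }

        using (Image source = Image.FromFile(path))
        {
            width = Math.Min(width, source.Width); // never upscale
            int height = ScaledHeight(source.Width, source.Height, width);
            using (var resized = new Bitmap(width, height))
            {
                using (Graphics g = Graphics.FromImage(resized))
                {
                    g.InterpolationMode = InterpolationMode.HighQualityBicubic;
                    g.DrawImage(source, 0, 0, width, height);
                }

                // Proper MIME type plus expiration headers so clients and proxies can cache.
                context.Response.ContentType = "image/jpeg";
                context.Response.Cache.SetCacheability(HttpCacheability.Public);
                context.Response.Cache.SetMaxAge(TimeSpan.FromDays(7));
                resized.Save(context.Response.OutputStream, ImageFormat.Jpeg);
            }
        }
    }
}
```

This sketch resizes on every request that is not served from a client or proxy cache, so on a busy site it would still need a server-side cache of the resized output.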
You claim unlimited processing but limited disk space. Most of the time, even if your host doesn't enforce a processing limit, processing will become a worse bottleneck than storage space as you get more customers and more hits to your site, and it will cost more to add processing capacity.
Furthermore, if you have large images, resized and compressed versions will occupy about 10% of the space of the originals, even if you store both a display and a thumbnail version.
Otherwise, just serve the originals and let the browser display them resized; it will be faster.
This is not really an actual resize of the image, but rather a resize at display time. I have used this simple markup with success:
<img src="myimage" height="height you want to give" width="width you want to give" alt="" />
It works every time.