Opening large PDF and checking signature in C#

We are working with PDF files and want to check whether they are signed.
Currently we are using SecureBlackbox, and it works fine for small files.
The problem is with large files (more than 2 GB): loading them fails with an out-of-memory or similar error.
Is there any way, or any .NET library, to open large files and get the signature info from the PDF?
What I have tried so far with no success: iTextSharp, PdfSharp, SecureBlackbox.
Ideally, the whole file would not be loaded, only the information that is needed, so as not to use so much memory or disk space.
Thank you for your suggestions.
Have a nice day.
Fitik
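
As a rough illustration of the "do not load the whole file" idea, the sketch below scans a file in fixed-size chunks for the ASCII token /ByteRange, which PDF signature dictionaries normally contain. It is a heuristic under stated assumptions, not a signature check: it does not parse the PDF, does not validate anything, and can report false positives if the token happens to occur in other data. The names PdfSignatureProbe, ContainsToken and LooksSigned are hypothetical, and only standard .NET classes are used.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: a heuristic probe, not a PDF parser or signature validator.
public static class PdfSignatureProbe
{
    // Returns true if the ASCII token occurs anywhere in the stream.
    // Only one buffer is held in memory, regardless of the stream length.
    public static bool ContainsToken(Stream stream, string token, int bufferSize = 1 << 20)
    {
        byte[] needle = Encoding.ASCII.GetBytes(token);
        if (needle.Length == 0) throw new ArgumentException("Token must not be empty.", nameof(token));

        // Keep the last (needle.Length - 1) bytes of each chunk so that a match
        // spanning two chunks is still found.
        int overlap = needle.Length - 1;
        byte[] buffer = new byte[Math.Max(bufferSize, needle.Length) + overlap];
        int carried = 0;

        while (true)
        {
            int read = stream.Read(buffer, carried, buffer.Length - carried);
            if (read <= 0) return false;

            int valid = carried + read;
            if (IndexOf(buffer, valid, needle) >= 0) return true;

            carried = Math.Min(overlap, valid);
            Buffer.BlockCopy(buffer, valid - carried, buffer, 0, carried);
        }
    }

    // Assumption: the presence of "/ByteRange" is treated as "probably signed".
    public static bool LooksSigned(string path)
    {
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                                         4096, FileOptions.SequentialScan))
        {
            return ContainsToken(file, "/ByteRange");
        }
    }

    // Naive byte search over the first 'length' bytes of 'haystack'.
    private static int IndexOf(byte[] haystack, int length, byte[] needle)
    {
        for (int i = 0; i <= length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```

A call such as PdfSignatureProbe.LooksSigned(path) reads the file sequentially, so memory use stays around the buffer size even for files over 2 GB. Extracting or verifying the actual signature would still require a library that can parse the signature dictionary.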

Related

Temporary files without saving to HDD

Here is my case:
I'm using ABCPDF to generate an HTML document from a .DOCX file that I need to show on the web.
When you export to HTML from ABCPDF, you get an HTML file and a folder with support files (.css, .js, .png).
These HTML files may contain quite sensitive data, so immediately after generating the files I move them to a password-protected .zip file (from which I fetch them later).
The problem is that this leaves the files unencrypted on the HDD for a few seconds, and even longer if I'm (for some reason) unable to delete them at once.
I'd like suggestions for another way of doing this. I've looked into a RAM drive, but I'm not happy with installing such drivers on my servers. (And the RAM drive would still be accessible from the OS.)
The cause of the problem here might be that ABCPDF can only export HTML as files (since it produces multiple files) and not as a stream.
Any ideas?
I'm using .NET 4.6.x and c#

Since all of your files except the .HTML are anonymous, you can write the HTML to a stream, as the documentation suggests. Only the other files will then be stored on the file system.
http://www.websupergoo.com/helppdfnet/source/5-abcpdf/doc/1-methods/save.htm
When saving to a Stream the format can be indicated using a Doc.SaveOptions.FileExtension property such as ".htm" or ".xps". For HTML you must provide a sensible value for the Doc.SaveOptions.Folder property.
http://www.websupergoo.com/helppdfnet/source/5-abcpdf/xsaveoptions/2-properties/folder.htm
This property specifies the folder where to store additional data such as images and fonts. It is only used when exporting documents to HTML. It is ignored otherwise.
For a start, try using a simple MemoryStream to hold the sensitive data. If you get large files or high traffic, open an encrypted stream to a file on your system.
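
The "encrypted stream to a file" part could look roughly like the sketch below, which uses only the standard .NET Aes and CryptoStream classes. The class EncryptedTempFile and its methods are hypothetical names, the source stream stands in for whatever stream the HTML was saved into, and the key is assumed to be a 16-, 24- or 32-byte value that is kept in memory and never written to disk. The default AES mode here (CBC) gives confidentiality only, not tamper detection.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: keeps sensitive output encrypted whenever it touches the disk.
public static class EncryptedTempFile
{
    // Encrypts 'source' into the file at 'path'. A fresh IV is generated per file
    // and stored unencrypted at the start of the file (the IV is not secret).
    public static void Write(string path, byte[] key, Stream source)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            aes.GenerateIV();
            using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
            {
                file.Write(aes.IV, 0, aes.IV.Length);
                using (var encryptor = aes.CreateEncryptor())
                using (var crypto = new CryptoStream(file, encryptor, CryptoStreamMode.Write))
                {
                    source.CopyTo(crypto);
                }
            }
        }
    }

    // Decrypts the file at 'path' into memory and returns a stream positioned at 0.
    public static MemoryStream Read(string path, byte[] key)
    {
        using (var aes = Aes.Create())
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Read the IV that Write stored in front of the ciphertext.
            var iv = new byte[aes.BlockSize / 8];
            int offset = 0;
            while (offset < iv.Length)
            {
                int n = file.Read(iv, offset, iv.Length - offset);
                if (n == 0) throw new EndOfStreamException("File too short to contain an IV.");
                offset += n;
            }

            aes.Key = key;
            aes.IV = iv;

            var plain = new MemoryStream();
            using (var decryptor = aes.CreateDecryptor())
            using (var crypto = new CryptoStream(file, decryptor, CryptoStreamMode.Read))
            {
                crypto.CopyTo(plain);
            }
            plain.Position = 0;
            return plain;
        }
    }
}
```

With this shape, the HTML would go from a MemoryStream straight into EncryptedTempFile.Write, so plaintext never reaches the file system; the support files written to the folder are not covered by this sketch.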

.NET graphic libraries to display images (pdf, .docx and any other format of image) in the browser

I am developing an ASP.NET MVC application where users are able to upload files to a repository. Those files could be PDF, DOC, any type of image, and so on.
When the user selects a file to be imported, I would like to display this file in the browser so they can review its contents before the upload.
I know I could use some sort of IFrame to display a PDF, but I am looking for specific classes or .NET libraries to implement this feature.
I just need a pointer in the right direction.

This is an extremely difficult problem. There are some libraries that can help. For instance PDF files might be rendered to images with ghostscript. Word and Excel files might be converted to PDF or image with a number of libraries. None of them, AFAIK, are very good at it so I can not recommend one.
You could automate MSO to perform the conversion to PDF, but that is decidedly not safe for server code. Another possibility is to convert the source documents to SWF files (as FlexPaper does) and display them in Flash. There are some great libraries out there, but this will limit your supported clients. SharePoint has support for providing some of this capability as well. Others have used OpenOffice to convert MSO documents, but at a loss of quality.
I can't really advise any specific direction as it is highly dependent on what you/your company is willing to spend and the desired results. Good luck.

You could try to rely on Windows and the explorer thumbnails for it, like here, but then you'd have to make sure that:
You can abuse the server in the most elaborate way (install stuff, talk to the shell from ASP.NET)
You have a thumbnail provider installed on the server for every type that you want to preview. I guess from the moment you can see the thumbnail in explorer, you're set. So for pdf, you might need to install PDF Reader from Adobe.
Docx files should be saved with thumbnail checked (see link). There seems to be no other easy, free way to convert a docx to a thumbnail. The "best" solution I came across, was saving it automatically again somehow, and making sure the thumbnail option is checked.

I don't want to say that's impossible, but it can't be done with finite effort.
What you are asking for is a browser-based solution, because you want the user to be able to "review" the document before uploading.
Therefore you cannot use a server side solution, which is essentially what is being asked by referring to a ".Net library".
.Net libraries are dependent on an installed version of .Net, which does not exist in all versions for all operating systems for which graphical browsers exist.
Next, recent changes in browser security no longer allow reading the full client-side file name of the file selected in the input field.
You'd have to rely on HTML5 and its FileReader to access the file's byte stream, but even then you can only retrieve images from image files. (see sample)
Excluding browser-based solutions in Flash, ActiveX, Java, due to browser and platform support, this leaves JavaScript as the only "reasonable" solution: you'd need a library for each supported format to either convert a file into an image in an image format supported by browsers, or extract the text(+image) representation of a file.

Great answers... I just want to share the result of my research: I found a nice client-based solution supported by Mozilla Labs. It is a framework based on HTML5 and JavaScript, with no native code needed.
Here is the project website:
https://github.com/mozilla/pdf.js
This is what it is capable of:
http://mozilla.github.com/pdf.js/web/viewer.html
And lastly, a great video explaining how everything works:
http://www.youtube.com/watch?v=Iv15UY-4Fg8&noredirect=1
Regarding my question, we are going to convert every possible file to PDF on the server and then render this PDF using this framework.

how to write dll file to access pdf

Hi, I'm new to programming and I have written a few applications that access PDF content by using some DLL files. My question now is: how can we write our own DLL to access PDF files? I know it's a big process, but I'm very much interested in learning about this. Can anyone please help me?

You can start by reading the PDF specification (warning: 32 MB behind this link) in order to understand how the PDF file format is implemented. This is necessary if you want to be able to parse it and extract the information you are interested in.
In the meantime (as this reading might occupy you for some time), if you have pressing project deadlines you probably want to use an existing library such as iTextSharp.

I know it's a big process but i'm very much interested to learn about this.
That's true. I'd suggest studying some open source APIs (such as iTextSharp) and PDF SDKs.

Word/PDF Generation in C# with hundreds of pages is too slow

I'm having a speed issue generating documentation in C#.
I am basically trying to create documents with 600+ pages, but the tools I have used handle this very slowly.
I first tried using DocX by Novacode. Creating this document with 600+ pages takes upwards of 3 minutes. I learned that there could be an issue with the function "InsertDocument", so I tried to find a different solution.
I started looking into opening an HTML document in Word. While this is a fast solution, images are not embedded into the document. The HTML syntax (src="data:image/png;base64,xxxx") is not supported in MS Word.
I could use URLs to the images, but then if the internet connection is down, the images would not display.
I then started looking into an HTML->PDF solution. iTextSharp is a little faster than the DocX solution, but still takes 1-2 minutes to generate this document.
I am simply out of ideas. I'm not sure a commercial product would be better, and I don't want to shell out that kind of cash just to have the same speed issue.
Has anyone had experience with creating Word/PDF documents with 600+ pages in C# fairly quickly (1-5 seconds)?

If you are trying to do this from a web server, you should be careful about the resource consumption of this process, since you may, for example, run out of memory quite easily.
If at some point you decide to consider commercial libraries, maybe you could give Amyuni PDF Creator .Net a try. Amyuni PDF Creator .Net provides a "page by page" mode that saves resources when processing exceptionally long PDF documents. The idea is to save each page to the output file as soon as it is generated, maybe keeping a few pages in memory in case they need to be modified.
Take a look at these links for more details:
StartSave Method
EndSave Method
Processing large PDF files
usual disclaimer applies

You should be able to create a richly formatted DOCX file with 600+ pages in that time frame, but for a PDF file I'm not sure... it will probably depend on your document content.
Anyway, I'm able to create a rather large DOCX file with GemBox.Document in just a few seconds (0-4 sec), and a PDF file as well, but that does take a bit more time than DOCX output.
You can also convert HTML to DOCX or HTML to PDF really fast, but that can depend on the HTML content itself.
If possible, you should prefer well-written HTML content that is "printer-friendly", doesn't have too many nesting levels, has optimized images, has a single CSS file, etc. Also, if you're providing a URL as an input path, then I think it's better to have embedded base64 images than links, in order to avoid making additional web requests.
Last, I don't think there is much difference between Flat OPC XML and DOCX. Basically they both generate the same content; it's just that a DOCX file is additionally zipped, which is a negligible performance penalty.

ABCPdf uses a lot of memory and generates huge files. Solution?

I have a system for creating a PDF book from users' own images. The images are in high resolution, and the PDF ends up with around 70 pages, with pictures on most of them.
When generating the PDF in a local application on the server, the process uses around 3 GB of RAM, which makes it crash more often than it succeeds. The files are also really huge, around 1.2 GB. Running it through a print-to-PDF would make it a hundred times smaller.
Is there a way to make ABCPdf use less memory and create smaller files?

I have had a very similar experience with iTextSharp, where I was basically running out of memory any time I created a large PDF with images in it.
I found that there is a function I should call to release each image once I am done with it, since otherwise it is held in memory in case you want to use it again, or until you finally close the PDF.
Either reuse the images, if they are repeating header/footer logos, or release images as you go.
Most likely that is the issue you are facing, but I have no experience in ABCPdf.

I've not used ABCPdf directly, but I'd suspect that the images are the source of your issues; resize them before they are included in the PDF objects. I suspect that's what a print-to-PDF process will be doing.

One other note: for very large PDFs you may want to set "linearize" to false.
pdfDoc.SaveOptions.Linearize = false;
Linearization optimizes the PDF for web streaming, so if you are streaming the PDF you might want to leave it set to true, but I've found it drastically increases the memory used by ABCPDF during the save.
