Search and Replace PDF using Itext - c#

I need to generate a PDF based on some user inputs. The output PDF has some images, tables, and text. I don't find iText very user friendly for generating this report programmatically.
Since the report I need to generate is quite complicated, I was wondering if it is possible to create a template PDF and then load -> search -> replace the strings/images I want.
The template PDF can be a tagged PDF.
Is it possible to do that?
Is it the best approach?
EDIT: I'm using WPF + MVVM + .NET 3.5

Replacing text within a PDF file is not simple. The PDF file format keeps a cross-reference table near the end of the file that lists each object with its byte offset within the file, and some elements (streams, for example) also carry a field stating their own length in bytes. If these offsets and lengths no longer match after an edit, the reader will probably report a broken PDF.
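To make the offset problem concrete, here is a minimal sketch in C#. It uses a toy ASCII string that only imitates the layout of PDF objects (it is not a valid PDF), and the class and method names are made up for illustration. It shows that a naive search and replace with a string of a different length moves every object that follows, so an offset recorded beforehand no longer points at that object.

```csharp
using System;

static class XrefOffsetDemo
{
    // For pure ASCII text, the character index equals the byte offset.
    public static int OffsetOf(string file, string marker)
    {
        return file.IndexOf(marker, StringComparison.Ordinal);
    }

    static void Main()
    {
        // Toy stand-in for a PDF body with two objects (not a valid PDF).
        string original = "1 0 obj (Hello {NAME}) endobj\n2 0 obj (Footer) endobj\n";

        // A real file would record this offset for object 2 in its table.
        int recorded = OffsetOf(original, "2 0 obj");

        // Naive search and replace with a longer string.
        string edited = original.Replace("{NAME}", "Alexander");
        int actual = OffsetOf(edited, "2 0 obj");

        Console.WriteLine("recorded offset: " + recorded);
        Console.WriteLine("actual offset:   " + actual);
        Console.WriteLine("table still valid: " + (recorded == actual));
    }
}
```

A replacement of exactly the same byte length would keep the offsets intact, but that is rarely practical for real report data.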
You should have a look at reporting, as it is made for these tasks:
http://msdn.microsoft.com/en-us/library/bb885185%28v=vs.100%29.aspx
You can create a template with the report designer, set your data, and export it to PDF.

Related

How to get a part of docx file as image, not whole page

I have been trying to insert the contents of a docx file into Crystal Reports. (In Crystal Reports, if I insert an OLE object and select a docx file, the program imports only the used part of the Word file, not the whole page.)
I have searched for ways to convert a Word document to an image, but all I found was about getting a whole page or a specific image in a page. In my case I can have text, line objects, and maybe some graphics in maybe 6-7 paragraphs, and only the used part of the page (those 6-7 paragraphs) should end up in my image file.
Our customers use these Word files as the header of the report. For now the only way I have is to take a screenshot of the design, and it doesn't always look good. Is there a way to get this part as an image?
Note: I can't use the doc file directly in Crystal Reports; I need to have the data in a SQL database, so it has to be a graphic file.
You may find the Selection.CopyAsPicture method useful; it works the same way as the Copy method. Basically, you select the required content in the Word document and then use this method to take a picture of the selection and keep it on the clipboard for further use. For example, the following macro copies the contents of the active document as a picture and pastes it as a picture at the end of the document:
Sub CopyPasteAsPicture()
    ActiveDocument.Content.Select
    With Selection
        .CopyAsPicture
        .Collapse Direction:=wdCollapseEnd
        .PasteSpecial DataType:=wdPasteMetafilePicture
    End With
End Sub
In C# you can use the same properties and methods; the Word object model is the same for all kinds of applications.

How to create a text file from an existing Pdf document in C#.net

I have a PDF document with data in a table structure, and I would like to convert it into a text file that keeps the same structure, including the margins and the spacing between pieces of text.
You need to write your own PDF tool then, which is not exactly an easy task. Honestly, third-party tools make your job much easier; why don't you want to use one?
If you change your mind, I can suggest iTextSharp. I've used it in the past with great success. Here are some examples to get you going:
http://www.codeproject.com/Articles/12445/Converting-PDF-to-Text-in-C
P.S. There are 3 tools used in there.

How can i generate a word document docx with existing text as in a template, but fill in value thru a asp.net web form?

I want to generate a Word document (.docx) based on a template, which could be a Word template or text stored in code.
The generated document should be based on input provided by the user through an ASP.NET Web Form with text boxes for entering the values.
These values will be taken in and placed in placeholder positions in the Word template.
How can I use a Word template in an ASP.NET web project?
How can I take the values from the ASP.NET web form, pass them to the Word template, fill in the placeholders with the text, and then generate the Word document?
I have seen some examples where the text is generated as HTML and the exported file is given the msword file type so that it saves as a Word file. But in that case I wonder how I can add page numbers and other header or footer values to the generated document.
So if I can use a Word template, just fill in the values the user wants, and then generate the final document, that would be great.
Is there anything new to do this in Visual Studio 2012? I am trying to do this using C#.
You have several options for creating Word documents:
Use Word automation - this uses MS Word directly. Not recommended if you're running on a server - it's slow and suffers from extensibility and other issues.
Use OpenXML - this is Microsoft's SDK for working with Office 2007+ file formats. There are several ways to accomplish your goal with OpenXML, such as using bookmarks or content controls. Look at this SO question - How to replace content in template docx document and Open XML SDK 2.0 (Aug 09)? or this one - How to create *.docx files from a template in C# or google it (while not that difficult, the entire solution is more than a few lines of code and does require a bit of understanding).
Use third-party tools or controls - I'm not really familiar with them.
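As a rough illustration of the OpenXML option, the sketch below copies a template .docx and replaces {KEY} style placeholders in the main document body using the Open XML SDK (the DocumentFormat.OpenXml package). The class name, method name, and the {KEY} placeholder convention are assumptions for this example. It also assumes each placeholder sits inside a single text run; bookmarks or content controls, as mentioned above, are more robust because Word may split typed text across several runs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class DocxTemplateFiller
{
    // Copies the template and replaces {KEY} placeholders in the main body.
    // Headers and footers live in separate parts and are not touched here.
    public static void Fill(string templatePath, string outputPath,
                            IDictionary<string, string> values)
    {
        // Work on a copy so the template itself stays unchanged.
        File.Copy(templatePath, outputPath, true);

        using (WordprocessingDocument doc = WordprocessingDocument.Open(outputPath, true))
        {
            Document document = doc.MainDocumentPart.Document;
            foreach (Text text in document.Descendants<Text>())
            {
                foreach (KeyValuePair<string, string> pair in values)
                {
                    // Only finds placeholders contained in one text run.
                    text.Text = text.Text.Replace("{" + pair.Key + "}", pair.Value);
                }
            }
            document.Save();
        }
    }
}
```

In a web form handler, the values dictionary would be filled from the text boxes and the output file streamed back to the browser.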
You can create your template by creating a Word document (including logos, header, footer, etc.), putting in the placeholders, and saving it as a "Word 2003 XML" document.
Next, you can read template.xml as a text file, replace the placeholders with the values, and send the entire string as the response with the content type "application/msword".
A placeholder could look like {MY_PLACEHOLDER1}.
Before creating the template, you must deactivate (in Word) everything related to spell checking, because the "bad spelling" highlights are stored as part of the XML document, and {MY_PLACEHOLDER1} can end up stored as MY_PLACEHOLDER1 with the "{" and "}" characters separated into their own Word XML tags, so the placeholder no longer appears as one contiguous string.
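The replacement step can be sketched in a few lines of C#. The helper name XmlTemplateFiller and its Fill method are made up for this example; the only additions beyond the description above are that values are XML-escaped first, so that user input containing characters such as & or < does not break the document.

```csharp
using System;
using System.Collections.Generic;
using System.Security;

static class XmlTemplateFiller
{
    // Replaces every {KEY} in the template text with its XML-escaped value.
    public static string Fill(string templateXml, IDictionary<string, string> values)
    {
        string result = templateXml;
        foreach (KeyValuePair<string, string> pair in values)
        {
            // Escape &, <, > and quotes so user input cannot break the XML.
            string safe = SecurityElement.Escape(pair.Value);
            result = result.Replace("{" + pair.Key + "}", safe);
        }
        return result;
    }
}
```

In an ASP.NET page, the template would typically be loaded with File.ReadAllText, passed through Fill, and written to the response after setting the content type to "application/msword".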
Regards,
You can use TemplateEngine.Docx to replace the text in a template document.

C# PDFSharp: Examples of how to strip text from PDF?

I have a fairly simple task: I need to read a PDF file and write out its image contents while ignoring its text contents. So essentially I need to do the complement of "save as text".
Ideally, I would prefer to avoid any sort of re-compression of the image contents but if it's not possible, it's ok too.
Are there examples of how to do it?
Thanks!
Extracting text from a PDF file with PDFsharp is not a simple task.
It was discussed recently in this thread:
https://stackoverflow.com/a/9161732/162529
Extracting text from a PDF with PdfSharp can actually be very easy, depending on the document type and what you intend to do with it. If the text is in the document as text, and not an image, and you don't care about the position or format, then it's quite simple. This code gets all of the text of the first page in the PDFs I'm working with:
var doc = PdfReader.Open(docPath);
string pageText = doc.Pages[0].Contents.Elements.GetDictionary(0).Stream.ToString();
doc.Pages.Count gives you the total number of pages, and you access each one through the doc.Pages array with the index. I don't recommend using foreach and Linq here, as the interfaces aren't implemented well. The index passed into GetDictionary is for which PDF document element - this may vary based on how the documents are produced. If you don't get the text you're looking for, try looping through all of the elements.
The text that this produces will be full of various PDF formatting codes. If all you need to do is extract strings, though, you can find the ones you want using Regex or any other appropriate string searching code. If you need to do anything with the formatting or positioning, then good luck - from what I can tell, you'll need it.
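As a sketch of the Regex approach, the helper below pulls the literal strings, written as (text) in PDF content streams, out of such a raw stream. The class and method names are invented for the example. It assumes the stream is already uncompressed and that the font uses a plain text encoding; nested unescaped parentheses, hex strings written as <...>, and fonts with custom encodings are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ContentStreamText
{
    // Matches PDF literal strings such as (Hello), allowing backslash escapes.
    static readonly Regex LiteralString =
        new Regex(@"\((?<text>(?:\\.|[^\\()])*)\)", RegexOptions.Singleline);

    public static List<string> ExtractStrings(string contentStream)
    {
        var result = new List<string>();
        foreach (Match m in LiteralString.Matches(contentStream))
        {
            // Undo the escapes for parentheses and backslash only.
            string raw = m.Groups["text"].Value;
            result.Add(Regex.Replace(raw, @"\\([()\\])", "$1"));
        }
        return result;
    }
}
```

Calling ExtractStrings on the pageText value from the snippet above returns the string fragments in the order they appear in the stream, which is not necessarily reading order.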
Example of PDFSharp libraries extracting images from .pdf file:
link
library
EDIT:
Then, if you want to extract text from an image, you have to use an OCR library.
There are two good OCR options: tessnet and MODI.
Link to thread on stack
I can fully recommend MODI, which I am using now. There are some samples on CodeProject.
EDIT 2 :
If you don't want to read text from the extracted images, you should write a new PDF document and put all of them into it. For writing PDFs I use MigraDoc; the library is not difficult to use.

Create PDF from Merge document via ASP C#

Okay, here is what I want to do.
A user will run a query that returns 3 pieces of data (really more, but for this example let's say 3, such as name, city, state).
The user will then be presented with a list of one or more documents that are basically Word merge documents. They do not have to be Word, but they need to be something standard.
Once they pick the document, we need to merge the data with the document and then display a PDF back to the user so they can print or save it.
So I'm looking for ideas on a suggested format for storing the merge documents, as well as how to do the merge and create the PDF via ASP C#.
Thanks!
First of all, stop thinking of "doc" as a standard, because it is not.
PDF is much more of a standard than doc, and with any online viewer it is easy to provide access to its content.
Now...
For manipulating PDF files for free, you have a very nice C# library called iTextSharp.
