Converting HTML to PDF exact replica not shown - c#

I am doing R&D on converting HTML to PDF.
We have created a page in ASP.NET and placed a CKEditor on it with simple options for selecting font, font size, bold, italic, etc. There are two more text boxes where the user can enter the height and width of the PDF to be generated. In addition, we have a div that shows a preview of the content based on the text entered in the editor. The div's height and width are set at run time; with this div we want to show what the PDF is going to look like.
We are using the wkhtmltopdf executable to generate the PDF.
My problem is that the generated PDF is not an exact replica of the content shown in the div. Sometimes it shows the exact content line by line, but sometimes some words wrap to the next line in the PDF.
We have tried many things to get an exact result but have not succeeded. Any help is appreciated.

One thing you might try is using DOMPDF:
https://code.google.com/p/dompdf/
It's updated pretty frequently and has always given me great results.
Another suggestion would be to create styles specific to the PDF to get the desired result, and then apply them when the PDF is created.
In other words, if you have a "generate pdf" button, have it take all the HTML content and insert PDF-specific style tags that you've tested in your PDF generator and that make the rendering match the browser. You could do this by just replacing:
</head>
With
<style>STYLES SPECIFIC TO PDF</style>
</head>
Upon creation of the PDF.
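Since the question is tagged C#, here is a minimal sketch of that replacement. The class name PdfStyleInjector, the method name InjectPdfStyles, and the CSS passed in are all hypothetical; the actual rules have to be found by testing against your PDF generator.

```csharp
using System;

public static class PdfStyleInjector
{
    // Hypothetical helper: inserts a <style> block holding PDF-only rules
    // immediately before the first closing </head> tag.
    public static string InjectPdfStyles(string html, string pdfCss)
    {
        int headEnd = html.IndexOf("</head>", StringComparison.OrdinalIgnoreCase);
        if (headEnd < 0)
        {
            // No </head> found: return the markup unchanged.
            return html;
        }
        return html.Insert(headEnd, "<style>" + pdfCss + "</style>");
    }
}
```

You would call this on the HTML string just before handing it to the converter, so the browser preview keeps its normal styles and only the PDF input gets the extra rules.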
In my experience, PDF rendering usually requires a few extra rules to make the output match what web rendering engines produce.
Hope that helps.

Related

Differentiate between blank page VS X-Cross page in PDF file Using iText7 C#

I am splitting a PDF document into multiple PDF documents using iText 7. For example, I have a PDF document that contains the following combination of pages:
Page 1- Page with X-Cross symbol
Page 2- Blank Page
Page 3- Page with Text
Page 4- Page with X-Cross symbol
Page 5- Blank Page
Page 6- Page with Text
Page 7- Page with X-Cross symbol
When I read the text of each page, pages 1 and 2 both return empty text.
My question is: how can I distinguish a blank page from a page with the X-Cross symbol? Any help would be greatly appreciated.

In a comment you explained that the X-Cross symbol is actually a bitmap image. Thus, to check whether there is such a symbol on a page, you have to apply bitmap image extraction, not text extraction. There are a number of questions and answers on bitmap image extraction on Stack Overflow, for example this answer by Alexey Subach, who is on the iText 7 development team.
If you are lucky, the blank pages really are blank (and don't contain e.g. a purely white or purely transparent bitmap image). In that case you only need to check whether or not a page has (a) any text (which you already check) and whether or not it has (b) any bitmap:
If it has neither, it is a Blank Page.
If it has only an image, it is a Page with X-Cross symbol.
If it has text, it is a Page with Text.
If things are more complicated, you'll have to look more closely, e.g. analyze the bitmaps found on a page. If all those X-Cross symbol bitmaps are identical, you may for example compare found bitmap images with an example one you extracted first.
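The three-way decision for the simple case can be sketched as below. This covers only the decision rule: the names PageKind and PageClassifier are hypothetical, and the two flags are assumed to come from your existing text extraction and from a bitmap image extraction pass over the page, neither of which is shown here.

```csharp
public enum PageKind
{
    Blank,
    XCross,
    Text
}

public static class PageClassifier
{
    // hasText:  the page yielded non-empty text on extraction.
    // hasImage: the page contains at least one bitmap image.
    public static PageKind Classify(bool hasText, bool hasImage)
    {
        if (hasText)
        {
            return PageKind.Text;   // text wins, with or without an image
        }
        if (hasImage)
        {
            return PageKind.XCross; // image only
        }
        return PageKind.Blank;      // neither text nor image
    }
}
```

If the blank pages turn out to carry a white or transparent bitmap, hasImage alone is not enough and the image comparison described above is still needed.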

MigraDoc: How to add a blank page after a section on odd pages?

I want to create a PDF document for double-sided printing to save paper on a report.
When a section ends on an odd page, it should be followed by a blank page reading "This page is intentionally left blank.", so that the report can be split by chapters as needed.
Any hints how to do this?

MigraDoc creates blank pages as needed. Just indicate in the PageSetup that you are creating a double-sided layout.
The tricky part is adding the text "This page is intentionally left blank". I think I'd add that text with PDFsharp after rendering the page with MigraDoc. But I'd rather leave the page blank, assuming the reader is smart enough to realize it was left blank to get a double-sided layout.

Setting pages in a PDF from HTML

I'm converting ePubs into PDFs using iTextSharp, and I have it all working fine using XMLWorkerHelper. However, when generating the PDF, it splits certain content across multiple pages. Is there a way to get it to start a new page using XMLWorker? See the image below to see what I mean with a contents table.
As you can see, at the top it finishes writing the text and then immediately starts the contents table, when ideally I'd like the contents table to start on a new page.

You can use the CSS properties page-break-before and page-break-after. Only the value always is supported.
Assuming your contents table is a <table>, you can do something like this:
<table id="contents" style="page-break-before: always">
<!-- rest of the contents table -->
</table>

How do I insert an image in a word document as footer

I need to create and insert a QR code into existing word documents using .NET.
I've done the QR generation part. The 2 things I need to accomplish are:
Inserting the QR code in the footer of an existing word document (preferably using Open XML).
Each page of the Word document has a unique QR code. This means that each footer would have to be different. (I could eliminate the footer and place the QR code as part of the body, but that would make the flow of text complicated.)
Is it possible to accomplish this?

I haven't done this, but I believe that what you will need to do is:
1. Put each page in a separate Word section (which means, in effect, that you will need to decide what your page size and layout are).
2. Create a footer containing one QR code to find out what XML Word expects, and what type of image data you need to store in the .docx (assuming that you are not attempting to store your image data externally in separate files).
3. Create a footer for each section (and ensure that the footers are not "linked to previous"), replicating the format you discovered in point (2).
4. Create a part for each QR code image, and a relationship to that part.
What I am even less sure about is whether Word will insist that you also store each image in another format (e.g. Windows Metafile or Enhanced Metafile format). My guess is that Word will generate what it needs from your .jpg (or whatever). Or maybe you can use "AltChunks" in some useful way here.
The background to this is that if it were a .doc format document, you could have created a single footer containing a set of nested field codes that used the { PAGE } page number field to link to the correct image for each page - e.g.
{ INCLUDETEXT "c:\\myqrcodes\\qr{ PAGE }.jpg" }
or more likely, the slightly more complicated
{ PAGE \#"'{ INCLUDETEXT "c:\\myqrcodes\\qr{ PAGE }.jpg" }'" }
But if you try to save that in .docx format, even in compatibility mode, I think you will just see one image on all pages when you close and re-open the document. Further, even though that approach works with the .doc format, it only works if the external image files are actually there and located at absolute addresses in the file system. If they are located at relative addresses (there is a way to do that), you or the end user will probably have to update the footer field codes to get the correct results.

read word file with text and picture

I have a Word (Office) file. This file contains text and pictures.
How can I read this file and show it in a <textarea> </textarea>?

The best way to display rich content such as a Word document in a UI is through HTML. You can export your Word document to HTML and render it in ASP.NET UI controls. If you prefer a textarea, you will have to implement a custom textarea that supports images from the Word-generated HTML file.
Alternatively, you can use a WebBrowser control instead of a textarea to display the Word-generated HTML file.

I don't believe you can read this into a <textarea> as you ask. (I will watch to see if somebody else shows how, because I want to see that too...) I believe the closest you will get is to open the document in an <iframe> with the application/msword content type. If you are looking for some flexibility in this, wrap the space in a <div> and swap out the <textarea> for an <iframe> at the server when appropriate.
