Poor image quality when converting Word docs with EVO PDF - C#

I use the WordToPdfConverter from evo to convert a Word document to a PDF. The Word document, which is in RTF format, contains images such as a QR code.
Unfortunately, the image quality in the resulting PDF is very poor, so the QR code is not readable. Even if I disable image compression or set it to the lowest level (=> best quality), the resulting image is still of very poor quality.
Is there any other way to control the image quality? Or is there a way to tell evo's WordToPdfConverter not to use JPG as the resulting image format but to stick with the source format (e.g. PNG)?
var pdfConverter = new WordToPdfConverter();
// Set Pdf image options
pdfConverter.PdfDocumentOptions.JpegCompressionEnabled = false;
pdfConverter.PdfDocumentOptions.JpegCompressionLevel = 0;
var filename = @"C:\temp\evo\TestWordDoc.rtf";
pdfConverter.ConvertWordFileToFile(filename, Path.Combine(Path.GetDirectoryName(filename), $"{Path.GetFileNameWithoutExtension(filename)}_{DateTime.Now:yyyyMMddHHmmss}.pdf"));

Since RTF is a text format, you should be able to convert it to PDF without any image compression. Compression takes longer to process, results in a larger output file, and can cause quality problems with embedded images.
I created a sample RTF file (test.rtf) that contains a QR code as you described.
I then ran the RTF through the Document Converter from the Leadtools.Document.Sdk NuGet package. Just as a disclaimer: I am associated with this library.
This document converter preserves the text and parses the images as-is from the source document, then outputs it to PDF.
You can download the output PDF from here: test.pdf
Here is some sample code:
using (var documentConverter = new DocumentConverter())
{
var filename = @"C:\temp\evo\TestWordDoc.rtf";
// filename is a path, not a stream, so load it with LoadFromFile
var document = DocumentFactory.LoadFromFile(filename, new LoadDocumentOptions());
var jobData = DocumentConverterJobs.CreateJobData(filename, Path.Combine(Path.GetDirectoryName(filename), $"{Path.GetFileNameWithoutExtension(filename)}_{DateTime.Now:yyyyMMddHHmmss}.pdf"), DocumentFormat.Pdf);
var job = documentConverter.Jobs.CreateJob(jobData);
documentConverter.Jobs.RunJob(job);
}

I am failing to see why people have issues with QR codes such as this one, which is just a template. (I could not download any of the older samples above for comparison.)
It is a PNG demo template designed to be scanned from up to 4 feet away (e.g. on a poster), but in production it could be much smaller, i.e. at a lower scale for page scanning.
I drop the RTF on the WordPad print-to-PDF shortcut and get the PDF shown in the viewer almost instantly.
There is some natural degradation when using an RTF-embedded PNG and an aliased viewer, but the key is maintaining natural scales. Everything you need is native, as supplied with Windows: MSPaint, WordPad, and CMD printing. I could also have sent the preview to the PDFium viewer in Edge.

Related

C# iText7 - iterate through PDF images and change size and DPI

I have a lot of very large PDF files that contain huge images (scans).
The goal is to open each PDF, read all its images, change their DPI and resolution, and compress them.
How can this be done with iText7?
And, more generally, how do I iterate through all the images in a PDF?
using (iText.Kernel.Pdf.PdfReader pdfReader = new iText.Kernel.Pdf.PdfReader(inputPdfFile))
{
using (iText.Kernel.Pdf.PdfDocument pdfDocument = new iText.Kernel.Pdf.PdfDocument(pdfReader))
{
//??
//foreach (var image in pdfDocumentImagesList)
//{
// //image.SetNewDPI()
//}
}
}
How to go through all the PDF's images?
https://github.com/itext/i7js-book/blob/develop/src/test/java/com/itextpdf/samples/book/part4/chapter15/Listing_15_30_ExtractImages.java
https://github.com/itext/i7js-book/blob/develop/src/test/java/com/itextpdf/samples/book/part4/chapter15/Listing_15_31_MyImageRenderListener.java
How to change the image's dpi and resolution?
That's not part of iText's functionality, since iText is a PDF-processing rather than an image-processing library. I advise you to process the extracted images with some other tool and then either put them into a new document or replace the images in the PDF. The latter is not very easy. The following SO answer may shed some light on it: http://stackoverflow.com/questions/26580912/pdf-convert-to-black-and-white-pngs
(its code, but in iText7: https://github.com/itext/i7js-examples/blob/develop/src/test/java/com/itextpdf/samples/sandbox/images/ReplaceImage.java)
How to compress an image?
https://github.com/itext/i7js-book/blob/develop/src/test/java/com/itextpdf/samples/book/part3/chapter10/Listing_10_12_CompressImage.java
Hope that would be useful!

How to add an SVG graphic to a PDF document using iText 7 in C#?

I'm working on a sticker generator using iText7 in C#.
The final sticker should look like this:
https://drive.google.com/open?id=16q_sMP5H0eiVhq85DDRGgE-CDlX3fOB5
I have a problem adding SVG graphics to a PDF document. The graphics used in the sticker above are here:
https://drive.google.com/open?id=1bw2E5hVhKjjwYqn6aGbe_tNqPmYmXu4b
https://drive.google.com/open?id=1lEqhrh2zAlOGlA1WMKfGtuhue6TBtcbc
I cannot find any practical example on the Internet of how to read an SVG file and add it to a PDF document using iText7.
Can anyone help me with this topic?
Using the latest release 7.1.4, you would add an SVG to a document like this:
public static void Convert(Stream svg, Stream pdfOutputStream) {
SvgConverter.CreatePdf(svg, pdfOutputStream);
}
There are many other possibilities in this class to convert to PDF, but this is the easiest method to use.
I use this code to add SVG graphic to pdf document:
string enc_text = File.ReadAllText(SVG);
SvgConverter.DrawOnCanvas(enc_text, pdfCanvas);
but it works only for simple SVG graphics like the one below:
https://www.w3schools.com/graphics/tryit.asp?filename=trysvg_ellipse3
It doesn't work for this SVG, which was created and saved in CorelDRAW:
https://drive.google.com/file/d/1bw2E5hVhKjjwYqn6aGbe_tNqPmYmXu4b/view
Is it possible to draw this graphic on a PDF using iText7 in C#?
This is the code:
public const String SVG = @"C:\Users\Desktop\logo.svg";
....
string enc_text = File.ReadAllText(SVG);
SvgConverter.DrawOnCanvas(enc_text, pdfCanvas);
I tried three times and below are results:
Attempt 1
SVG
https://drive.google.com/open?id=1ibg_KwvviRQ4b9suniZwBJRdBdgM02te
PDF - the result is: OK:
https://drive.google.com/open?id=1DGGLUowlEYpAydbWTTSJORVxP67LGrf5
Attempt 2
SVG:
https://drive.google.com/open?id=1UHASgAxAaPONIo9fc6VZ9q4cjjK-uOzj
PDF - the result is: only partly correct
https://drive.google.com/open?id=1yzLF-fQQcOQvEXVyDN-UuK0YVeHOn5B_
Attempt 3
SVG
https://drive.google.com/open?id=1ZNLDkc2x4WvHouKw-A4AgRrDuQ2pAoDE
PDF - the result is: no logo
https://drive.google.com/open?id=1nJVNT5oAMoI8HuUURZQv1yhu7IbFoI6H
The SVG graphics display correctly in browsers, but iText cannot draw them properly, especially the complicated ones.

iText GetTextFromPage exception with inline image

I have the same problem as was discussed here, which was not solved. My objective is to extract the text from an existing pdf file. I get the error message Could not find image data or EI for a certain pdf, which I cannot share as a sample. It works for other pdfs, with the following code
string fileURI = "C:\\Test\\Sample.pdf";
PdfReader reader = new PdfReader(fileURI);
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
string s = PdfTextExtractor.GetTextFromPage(reader, 1, strategy);
Debug.WriteLine(s);
I am using iTextSharp 5.5.0 and tried changing found == 1 to found <= 1 as suggested in other posts. It does not help.
Would it help to remove all images in the pdf? I really just need the text. Which commands from iText could help me with this?
I downloaded the trial version of Acrobat to create a version of the PDF file that I could share. After opening the file and saving it again as "Optimized PDF" in Acrobat, the code worked and I could extract the text.
So the solution to the problem is probably opening each file in Acrobat and saving it again with the right settings using the Acrobat reference and then extracting the text.

Can a PDF be converted to a vector image format that can be printed from .NET?

We have a .NET app which prints to both real printers and PDF, currently using PDFsharp, although that part can be changed if there's a better option. Most of the output is generated text or images, but there can be one or more pages that get appended to the end. Those pages are provided by the end user in PDF format.
When printing to paper, our users use pre-printed paper, but in the case of an exported PDF, we concatenate those pages to the end, since they're already in PDF format.
We want to be able to embed those PDFs directly into the print stream so they don't need pre-printed paper. However, there aren't really any good options for rendering a PDF to a GDI page (System.Drawing.Graphics).
Is there a vector format the PDF could be converted to by some external program, that could then be rendered to a GDI+ page without being degraded by conversion to a bitmap first?
In an article titled "How To Convert PDF to EMF In .NET," I have shown how to do this using our PDFOne .NET product. EMFs are vector graphics and you can render them on the printer canvas.
A simpler alternative for you is PDF overlay, explained in another article titled "PDF Overlay - Stitching PDF Pages Together in .NET." PDFOne allows x-y offsets in overlays, which lets you stitch pages together at the edges. In the article cited here, I overlaid the pages one over another by setting the offsets to zero. You will have to set them to the page width and height.
DISCLAIMER: I work for Gnostice.
Ghostscript can output PostScript (which is a vector format), which can be sent directly to some types of printers. For example, if you're using an LPR-capable printer, the PS file can be sent directly to that printer using something like this project: http://www.codeproject.com/KB/printing/lpr.aspx
There are also some commercial options which can print a PDF (although I'm not sure if the internal mechanism is vector or bitmap based), for example http://www.tallcomponents.com/pdfcontrols2-features.aspx or http://www.tallcomponents.com/pdfrasterizer3.aspx
I finally figured out that there is an option that addresses my general requirement of embedding a vector format into a print job, but it doesn't work with GDI based printing.
The XPS file format created by Microsoft XPS Writer print driver can be printed from WPF, using the ReachFramework.dll included in .NET. By using WPF for printing instead of GDI, it's possible to embed an XPS document page into a larger print document.
The downside is that WPF printing works quite differently, so all the support code that directly uses types in the System.Drawing namespace has to be rewritten.
Here's the basic outline of how to embed the XPS document:
Open the document:
XpsDocument xpsDoc = new XpsDocument(filename, System.IO.FileAccess.Read);
var document = xpsDoc.GetFixedDocumentSequence().DocumentPaginator;
// pass the document into a custom DocumentPaginator that will decide
// what order to print the pages:
var mypaginator = new myDocumentPaginator(new DocumentPaginator[] { document });
// pass the paginator into PrintDialog.PrintDocument() to do the actual printing:
new PrintDialog().PrintDocument(mypaginator, "printjobname");
Then create a descendant of DocumentPaginator, that will do your actual printing. Override the abstract methods, in particular the GetPage should return DocumentPages in the correct order. Here's my proof of concept code that demonstrates how to append custom content to a list of Xps documents:
public override DocumentPage GetPage(int pageNumber)
{
for (int i = 0; i < children.Count; i++)
{
if (pageNumber >= pageCounts[i])
pageNumber -= pageCounts[i];
else
return FixFixedPage(children[i].GetPage(pageNumber));
}
if (pageNumber < PageCount)
{
DrawingVisual dv = new DrawingVisual();
// RenderOpen() returns the DrawingContext used to draw into the visual
DrawingContext dc = dv.RenderOpen();
DoRender(pageNumber, dc); // some method to render stuff to the DrawingContext
dc.Close();
return new DocumentPage(dv);
}
return null;
}
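The page-number arithmetic in GetPage above can be checked in isolation. The following is a minimal sketch, not the answer's own code: PageLocator and Locate are hypothetical names, and the page counts are plain integers rather than real DocumentPaginator instances. It maps a global page number to the child document that owns it, or reports that the page falls into the custom content appended after all the child documents.

```csharp
using System;
using System.Collections.Generic;

public static class PageLocator
{
    // Maps a zero-based global page number to (childIndex, localPage).
    // childIndex == -1 means the page lies past all child documents;
    // localPage is then the index within the appended custom content.
    public static (int ChildIndex, int LocalPage) Locate(
        IReadOnlyList<int> pageCounts, int pageNumber)
    {
        if (pageNumber < 0)
            throw new ArgumentOutOfRangeException(nameof(pageNumber));

        for (int i = 0; i < pageCounts.Count; i++)
        {
            if (pageNumber < pageCounts[i])
                return (i, pageNumber);   // page belongs to child i
            pageNumber -= pageCounts[i];  // skip past child i's pages
        }
        return (-1, pageNumber);          // custom content after the children
    }

    public static void Main()
    {
        // Hypothetical input: two child documents with 2 and 3 pages.
        var counts = new[] { 2, 3 };
        Console.WriteLine(Locate(counts, 0)); // (0, 0)
        Console.WriteLine(Locate(counts, 4)); // (1, 2)
        Console.WriteLine(Locate(counts, 5)); // (-1, 0)
    }
}
```

Keeping this mapping separate from the rendering makes it easy to verify the page order before wiring it into a DocumentPaginator descendant.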
When trying to print to another XPS document, it gives an exception "FixedPage cannot contain another FixedPage", and a post by H.Alipourian demonstrates how to fix it: http://social.msdn.microsoft.com/Forums/da/wpf/thread/841e804b-9130-4476-8709-0d2854c11582
private DocumentPage FixFixedPage(DocumentPage page)
{
if (!(page.Visual is FixedPage))
return page;
// Create a new ContainerVisual as a new parent for page children
var cv = new ContainerVisual();
foreach (var child in ((FixedPage)page.Visual).Children)
{
// Make a shallow clone of the child using reflection
var childClone = (UIElement)child.GetType().GetMethod(
"MemberwiseClone", BindingFlags.Instance | BindingFlags.NonPublic
).Invoke(child, null);
// Setting the parent of the cloned child to the created ContainerVisual
// by using Reflection.
// WARNING: If we use Add and Remove methods on the FixedPage.Children,
// for some reason it will throw an exception concerning event handlers
// after the printing job has finished.
var parentField = childClone.GetType().GetField(
"_parent", BindingFlags.Instance | BindingFlags.NonPublic);
if (parentField != null)
{
parentField.SetValue(childClone, null);
cv.Children.Add(childClone);
}
}
return new DocumentPage(cv, page.Size, page.BleedBox, page.ContentBox);
}
Sorry that it's not exactly compiling code, I just wanted to provide an overview of the pieces of code necessary to make it work to give other people a head start on all the disparate pieces that need to come together to make it work. Trying to create a more generalized solution would be much more complex than the scope of this answer.
While not open source and not .NET native (Delphi based I believe, but offers a precompiled .NET library), Quick PDF can render a PDF to an EMF file which you could load into your Graphics object.

Generate a thumbnail of a Word document

I have a website where users upload Word documents and I want to display thumbnails of these word documents. If anyone of you knows how to display the first page of a Word file as an image using C# please tell me.
Also if you know a trusted .NET library to convert word files to images that requires no office interop that would be great.
http://blogs.msdn.com/windowssdk/archive/2009/06/12/windows-api-code-pack-for-microsoft-net-framework.aspx
ShellFile shellFile = ShellFile.FromFilePath(pathToYourFile);
Bitmap shellThumb = shellFile.Thumbnail.ExtraLargeBitmap;
This uses Microsoft's Windows API Code Pack.
I found this question (7 yrs later) while searching for a similar solution. I'm evaluating 2JPEG and it appears to support 275 formats including Word, Excel, Publisher & Powerpoint files. fCoder recommends running 2JPEG as a scheduled background task. The command line syntax is pretty comprehensive.
Here's a sample snippet to generate a thumbnail for a specific file:
2jpeg.exe -src "c:\files\myfile.docx" -dst "c:\files" -oper Resize size:"100 200" fmode:fit_width -options pages:"1" scansf:no overwrite:yes template:"{Title}_thumb.jpg" silent:yes
A preview image of the 1st page of a .doc or .docx document can easily be created with a tool called Free Spire.Doc for .NET (a totally free word API for commercial and personal use). I found it to be fast and accurate.
Note from the developer's page:
"The featured function, conversion allows converting Word documents (Word 97-2003, Word 2007, Word 2010, Word 2013, Word 2016 and Word 2019) to commonly used file format, such as XML, RTF, TXT, PDF, XPS, EPUB, HTML and Image etc.
Friendly Reminder:
Free version is limited to 500 paragraphs and 25 tables... "
This C# code creates a System.Drawing.Image object of the 1st page of a .docx file:
using Spire.Doc;
byte[] docContent = File.ReadAllBytes(@"C:\Temp\word_document.docx");
using (MemoryStream ms = new MemoryStream(docContent))
{
// Creates a Spire.Doc object to work with
Spire.Doc.Document doc = new Spire.Doc.Document(ms, Spire.Doc.FileFormat.Auto);
// SaveToImages creates an array of System.Drawing.Image, we take only the 1st element
System.Drawing.Image img = doc.SaveToImages(0, 1, Spire.Doc.Documents.ImageType.Bitmap)[0];
}
To create the thumbnail image, the following C# example includes a second using block to do it, and then converts to a base64 string:
using Spire.Doc;
byte[] docContent = File.ReadAllBytes(@"C:\Temp\word_document.docx");
using (MemoryStream ms = new MemoryStream(docContent))
{
// Creates a Spire.Doc object to work with
Spire.Doc.Document doc = new Spire.Doc.Document(ms, Spire.Doc.FileFormat.Auto);
// SaveToImages creates an array of System.Drawing.Image, we take only the 1st element
System.Drawing.Image img = doc.SaveToImages(0, 1, Spire.Doc.Documents.ImageType.Bitmap)[0];
using (var ms2 = new MemoryStream())
{
// Delegate required by GetThumbnailImage (GDI+ does not actually call it)
System.Drawing.Image.GetThumbnailImageAbort myCallback = () => false;
// We create a thumbnail (0.5 width and height = 50%)
img.GetThumbnailImage((int)(img.Width * 0.5), (int)(img.Height * 0.5), myCallback, IntPtr.Zero).Save(ms2, System.Drawing.Imaging.ImageFormat.Png);
// Convert to Base64 string representation of the image
return Convert.ToBase64String(ms2.ToArray());
}
}
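The snippet above scales the page image by a fixed 50%. If the thumbnail instead has to fit inside a fixed box, the target size can be computed first and then passed to GetThumbnailImage. This is a minimal sketch under that assumption; ThumbnailSize and FitWithin are hypothetical names and are not part of Spire.Doc or System.Drawing.

```csharp
using System;

public static class ThumbnailSize
{
    // Returns the largest size that fits inside maxWidth x maxHeight
    // while preserving the source aspect ratio. Never upscales.
    public static (int Width, int Height) FitWithin(
        int srcWidth, int srcHeight, int maxWidth, int maxHeight)
    {
        if (srcWidth <= 0 || srcHeight <= 0 || maxWidth <= 0 || maxHeight <= 0)
            throw new ArgumentOutOfRangeException("All dimensions must be positive.");

        // The tighter of the two constraints decides the scale factor.
        double scale = Math.Min((double)maxWidth / srcWidth,
                                (double)maxHeight / srcHeight);
        scale = Math.Min(scale, 1.0);

        int width = Math.Max(1, (int)Math.Round(srcWidth * scale));
        int height = Math.Max(1, (int)Math.Round(srcHeight * scale));
        return (width, height);
    }

    public static void Main()
    {
        // Hypothetical 800x600 page image fitted into a 200x200 box.
        Console.WriteLine(FitWithin(800, 600, 200, 200)); // (200, 150)
    }
}
```

The resulting width and height would replace the `img.Width * 0.5` and `img.Height * 0.5` arguments in the call to GetThumbnailImage.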
In addition, the library can also convert in other ways, for instance this function returns .SVG files with each page:
doc.SaveToFile("resulting_file_name.svg", Spire.Doc.FileFormat.SVG);
