Generate a thumbnail of a Word document - C#

I have a website where users upload Word documents, and I want to display thumbnails of these documents. If anyone knows how to render the first page of a Word file as an image using C#, please tell me.
Also, if you know a trusted .NET library that converts Word files to images without requiring Office Interop, that would be great.

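Use Microsoft's Windows API Code Pack for .NET Framework: http://blogs.msdn.com/windowssdk/archive/2009/06/12/windows-api-code-pack-for-microsoft-net-framework.aspx
using System.Drawing;
using Microsoft.WindowsAPICodePack.Shell;
// Ask the Windows shell for the thumbnail it generates for the file
ShellFile shellFile = ShellFile.FromFilePath(pathToYourFile);
Bitmap shellThumb = shellFile.Thumbnail.ExtraLargeBitmap;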

I found this question (7 years later) while searching for a similar solution. I'm evaluating 2JPEG, which appears to support 275 formats including Word, Excel, Publisher and PowerPoint files. fCoder, the vendor, recommends running 2JPEG as a scheduled background task. The command-line syntax is pretty comprehensive.
Here's a sample snippet that generates a thumbnail for a specific file:
2jpeg.exe -src "c:\files\myfile.docx" -dst "c:\files" -oper Resize size:"100 200" fmode:fit_width -options pages:"1" scansf:no overwrite:yes template:"{Title}_thumb.jpg" silent:yes
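Since the site itself is written in C#, the command above could be launched from a background job. The sketch below only rebuilds the same argument string for a given file; `TwoJpegThumbnail.BuildStartInfo` is a hypothetical helper, the switches are copied verbatim from the example, and the location of 2jpeg.exe is an assumption you would supply.

```csharp
using System.Diagnostics;

public static class TwoJpegThumbnail
{
    // Hypothetical helper: rebuilds the 2JPEG command line shown above for one file.
    // The switches come from that example; exePath (where 2jpeg.exe lives) is an assumption.
    public static ProcessStartInfo BuildStartInfo(string exePath, string srcFile, string dstDir)
    {
        string args =
            $"-src \"{srcFile}\" -dst \"{dstDir}\" " +
            "-oper Resize size:\"100 200\" fmode:fit_width " +
            "-options pages:\"1\" scansf:no overwrite:yes " +
            // Not an interpolated string, so {Title} reaches 2JPEG literally
            "template:\"{Title}_thumb.jpg\" silent:yes";

        return new ProcessStartInfo(exePath, args)
        {
            UseShellExecute = false, // start 2jpeg.exe directly, not through the shell
            CreateNoWindow = true    // no console window when run from a background task
        };
    }
}
```

You would then pass the result to `Process.Start` and call `WaitForExit()` on the returned process before reading the thumbnail from the destination folder.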

A preview image of the first page of a .doc or .docx document can easily be created with Free Spire.Doc for .NET, a free Word API for commercial and personal use. I found it to be fast and accurate.
Note from the developer's page:
"The featured function, conversion allows converting Word documents (Word 97-2003, Word 2007, Word 2010, Word 2013, Word 2016 and Word 2019) to commonly used file format, such as XML, RTF, TXT, PDF, XPS, EPUB, HTML and Image etc.
Friendly Reminder:
Free version is limited to 500 paragraphs and 25 tables... "
This C# code creates a System.Drawing.Image object of the 1st page of a .docx file:
using System.IO;
using Spire.Doc;

byte[] docContent = File.ReadAllBytes(@"C:\Temp\word_document.docx");
using (MemoryStream ms = new MemoryStream(docContent))
{
    // Creates a Spire.Doc object to work with
    Spire.Doc.Document doc = new Spire.Doc.Document(ms, Spire.Doc.FileFormat.Auto);
    // SaveToImages creates an array of System.Drawing.Image; we take only the 1st element
    System.Drawing.Image img = doc.SaveToImages(0, 1, Spire.Doc.Documents.ImageType.Bitmap)[0];
    // Use img here (e.g. save or resize it)
}
To create the thumbnail, the following C# example adds a second using block that shrinks the page image and then converts it to a Base64 string:
using System;
using System.IO;
using Spire.Doc;

static string GetFirstPageThumbnailBase64(string path)
{
    byte[] docContent = File.ReadAllBytes(path);
    using (MemoryStream ms = new MemoryStream(docContent))
    {
        // Creates a Spire.Doc object to work with
        Spire.Doc.Document doc = new Spire.Doc.Document(ms, Spire.Doc.FileFormat.Auto);
        // SaveToImages creates an array of System.Drawing.Image; we take only the 1st element
        using (System.Drawing.Image img = doc.SaveToImages(0, 1, Spire.Doc.Documents.ImageType.Bitmap)[0])
        using (var ms2 = new MemoryStream())
        {
            // Auxiliary delegate required by GetThumbnailImage
            System.Drawing.Image.GetThumbnailImageAbort myCallback = new System.Drawing.Image.GetThumbnailImageAbort(ThumbnailCallback);
            // Create a thumbnail at 50% of the original width and height
            using (System.Drawing.Image thumb = img.GetThumbnailImage((int)(img.Width * 0.5), (int)(img.Height * 0.5), myCallback, IntPtr.Zero))
            {
                thumb.Save(ms2, System.Drawing.Imaging.ImageFormat.Png);
            }
            // Convert to a Base64 string representation of the image
            return Convert.ToBase64String(ms2.ToArray());
        }
    }
}

// GDI+ does not call this delegate, but GetThumbnailImage requires one
static bool ThumbnailCallback()
{
    return false;
}
The library can also convert to other formats; for instance, this call produces an .svg file for each page:
doc.SaveToFile("resulting_file_name.svg", Spire.Doc.FileFormat.SVG);
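The Spire.Doc thumbnail example above halves the width and height, so the thumbnail size still depends on the page size. If the thumbnails should fit a fixed box instead, the target size can be computed first and then passed to `GetThumbnailImage`. This is a minimal sketch of that arithmetic only; `ThumbnailMath.FitWithin` is a hypothetical helper, and the choice never to enlarge small pages is an assumption.

```csharp
using System;

public static class ThumbnailMath
{
    // Returns the largest size that fits inside maxWidth x maxHeight while keeping
    // the source aspect ratio. The scale is capped at 1, so small pages are never enlarged.
    public static (int Width, int Height) FitWithin(int srcWidth, int srcHeight, int maxWidth, int maxHeight)
    {
        if (srcWidth <= 0 || srcHeight <= 0 || maxWidth <= 0 || maxHeight <= 0)
            throw new ArgumentException("All dimensions must be positive.");

        // Use the tighter of the two axis ratios so that both sides fit.
        double scale = Math.Min(1.0, Math.Min((double)maxWidth / srcWidth, (double)maxHeight / srcHeight));

        // Round to whole pixels, keeping at least 1 pixel per side.
        int width = Math.Max(1, (int)Math.Round(srcWidth * scale));
        int height = Math.Max(1, (int)Math.Round(srcHeight * scale));
        return (width, height);
    }
}
```

For example, a 200×200 box turns a 794×1123 page image (roughly A4 at 96 dpi) into a 141×200 thumbnail; those two values would replace the `img.Width * 0.5` and `img.Height * 0.5` arguments.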

Related

Poor image quality when converting word docs with evo pdf

I use the WordToPdfConverter from evo to convert a Word document to a PDF. The Word document, which is in RTF format, contains images such as a QR code.
Unfortunately, the image quality in the resulting PDF is very poor, so the QR code cannot be read. Even if I disable image compression or set it to the lowest level (i.e. best quality), the resulting image is still of very poor quality.
Is there any other way to control the image quality? Or is there a way to tell evo's WordToPdfConverter not to use JPG as the resulting image format but to stick with the source format (e.g. PNG)?
var pdfConverter = new WordToPdfConverter();
// Set Pdf image options
pdfConverter.PdfDocumentOptions.JpegCompressionEnabled = false;
pdfConverter.PdfDocumentOptions.JpegCompressionLevel = 0;
var filename = @"C:\temp\evo\TestWordDoc.rtf";
pdfConverter.ConvertWordFileToFile(filename, Path.Combine(Path.GetDirectoryName(filename), $"{Path.GetFileNameWithoutExtension(filename)}_{DateTime.Now:yyyyMMddHHmmss}.pdf"));
Since RTF is a text format, you should be able to convert it to PDF without any image compression: compressing takes longer to process, results in a larger output file, and can degrade the quality of embedded images.
I created a sample RTF file (test.rtf) that contains a QR code, as you described.
I then ran the RTF through the Document Converter from the Leadtools.Document.sdk NuGet package. Disclaimer: I am associated with this library.
This document converter preserves the text and takes the images as-is from the source document, then outputs them to PDF.
You can download the output PDF from here: test.pdf
Here is some sample code:
using (var documentConverter = new DocumentConverter())
{
var filename = @"C:\temp\evo\TestWordDoc.rtf";
// The input is a file path, so load it with LoadFromFile rather than LoadFromStream
var document = DocumentFactory.LoadFromFile(filename, new LoadDocumentOptions());
var jobData = DocumentConverterJobs.CreateJobData(filename, Path.Combine(Path.GetDirectoryName(filename), $"{Path.GetFileNameWithoutExtension(filename)}_{DateTime.Now:yyyyMMddHHmmss}.pdf"), DocumentFormat.Pdf);
var job = documentConverter.Jobs.CreateJob(jobData);
documentConverter.Jobs.RunJob(job);
}
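Both the evo and the LEADTOOLS snippets build the output name inline with `DateTime.Now:yyyyMMddHHmmss`. A small sketch of the same naming rule as a reusable method; `OutputNaming.TimestampedPdfPath` is a hypothetical helper, and formatting with the invariant culture is a choice of this sketch (a culture with a non-Gregorian default calendar would otherwise change the year in the timestamp).

```csharp
using System;
using System.Globalization;
using System.IO;

public static class OutputNaming
{
    // Builds "<dir>/<name>_<yyyyMMddHHmmss>.pdf" next to the input file,
    // mirroring the inline expression used in both snippets.
    public static string TimestampedPdfPath(string inputPath, DateTime timestamp)
    {
        string directory = Path.GetDirectoryName(inputPath) ?? string.Empty;
        string name = Path.GetFileNameWithoutExtension(inputPath);
        // Invariant culture keeps the Gregorian calendar regardless of server settings.
        string stamp = timestamp.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
        return Path.Combine(directory, $"{name}_{stamp}.pdf");
    }
}
```

Passing an explicit timestamp rather than reading `DateTime.Now` inside the method also makes the name predictable in tests.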
I fail to see why people have issues with QR codes such as this one, which is just a template. (I could not download any of the older samples above for comparison.)
It is a PNG demo template designed to be scanned from up to 4 feet away (e.g. on a poster), but for production it could be much smaller, i.e. at a lower scale for page scanning.
I drop the RTF onto a WordPad "print to PDF" shortcut and get the PDF shown in the viewer almost instantly.
There is some natural degradation when using an RTF-embedded PNG and an aliased viewer, but the key is maintaining natural scales. Everything you need ships natively with Windows:
MSPaint, WordPad and CMD printing. I could also have sent the preview to the PDFium viewer in Edge.

How to add an SVG graphic to a PDF document using iText 7 in C#?

I'm working on a sticker generator using iText 7 for C#.
The final sticker should look like this:
https://drive.google.com/open?id=16q_sMP5H0eiVhq85DDRGgE-CDlX3fOB5
I have a problem adding SVG graphics to the PDF document. The graphics are at the links below:
https://drive.google.com/open?id=1bw2E5hVhKjjwYqn6aGbe_tNqPmYmXu4b
https://drive.google.com/open?id=1lEqhrh2zAlOGlA1WMKfGtuhue6TBtcbc
I cannot find any practical example on the Internet of how to read an SVG file and add it to a PDF document using iText 7.
Can anyone help me with this?
Using the latest release 7.1.4, you would add an SVG to a document like this:
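using System.IO;
using iText.Svg.Converter;

public static void Convert(Stream svg, Stream pdfOutputStream) {
    // Converts the SVG into a new PDF document written to pdfOutputStream
    SvgConverter.CreatePdf(svg, pdfOutputStream);
}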
There are many other possibilities in this class to convert to PDF, but this is the easiest method to use.
I use this code to add an SVG graphic to the PDF document:
string enc_text = File.ReadAllText(SVG);
SvgConverter.DrawOnCanvas(enc_text, pdfCanvas);
but it only works for simple SVG graphics like the one below:
https://www.w3schools.com/graphics/tryit.asp?filename=trysvg_ellipse3
It doesn't work for this SVG, created and saved in CorelDRAW:
https://drive.google.com/file/d/1bw2E5hVhKjjwYqn6aGbe_tNqPmYmXu4b/view
Is it possible to draw this graphic on a PDF using iText 7 in C#?
This is the code:
public const String SVG = @"C:\Users\Desktop\logo.svg";
....
string enc_text = File.ReadAllText(SVG);
SvgConverter.DrawOnCanvas(enc_text, pdfCanvas);
I tried three times; the results are below:
Attempt 1
SVG
https://drive.google.com/open?id=1ibg_KwvviRQ4b9suniZwBJRdBdgM02te
PDF - the result is: OK:
https://drive.google.com/open?id=1DGGLUowlEYpAydbWTTSJORVxP67LGrf5
Attempt 2
SVG:
https://drive.google.com/open?id=1UHASgAxAaPONIo9fc6VZ9q4cjjK-uOzj
PDF - the result is: partially correct
https://drive.google.com/open?id=1yzLF-fQQcOQvEXVyDN-UuK0YVeHOn5B_
Attempt 3
SVG
https://drive.google.com/open?id=1ZNLDkc2x4WvHouKw-A4AgRrDuQ2pAoDE
PDF - the result is: no logo
https://drive.google.com/open?id=1nJVNT5oAMoI8HuUURZQv1yhu7IbFoI6H
The SVG graphics display well in browsers, but iText cannot draw them properly, especially the more complicated ones.

How to create an SVG file in C# and import it into an iTextSharp PDF?

I've been working on an ASP.NET MVC application (C#) in Visual Studio 2012. I've created several reports with MS Charts (to show them in a PDF file I've used iTextSharp). To show a chart as an image in iTextSharp I've used the following code:
using (var chartimage = new MemoryStream())
{
chartCentersByYear.SaveImage(chartimage, ChartImageFormat.Png);
// ToArray() returns only the bytes written; GetBuffer() may include unused trailing bytes
Byte[] newChart = chartimage.ToArray();
var image = Image.GetInstance(newChart);
image.ScalePercent(50f);
image.SetAbsolutePosition(document.LeftMargin + 40, document.BottomMargin + 95);
document.Add(image);
}
But my charts look very poor when they're zoomed to 200%. Because of that, I would like to use the SVG format for the charts. How can I do that in C# using iTextSharp? The GetInstance method of iTextSharp doesn't recognize the SVG format. Thank you in advance for any help.
Currently, iText does not support the SVG format.
This is on the long-term roadmap, however.
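A related detail when handing a chart image to `Image.GetInstance` as a byte array: `MemoryStream.GetBuffer()` returns the stream's whole internal buffer, which can be longer than the data written, whereas `ToArray()` copies only the written bytes. This small sketch just shows the two lengths side by side; `StreamBytes.Compare` is a hypothetical helper, and whether iTextSharp tolerates the trailing bytes is not something it checks, so `ToArray()` is the safer input.

```csharp
using System.IO;

public static class StreamBytes
{
    // Writes payload to a fresh MemoryStream and reports both lengths:
    // GetBuffer() exposes the internal buffer (its capacity, possibly with unused
    // trailing bytes), while ToArray() copies exactly the bytes that were written.
    public static (int BufferLength, int ArrayLength) Compare(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);
            return (ms.GetBuffer().Length, ms.ToArray().Length);
        }
    }
}
```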

MemoryStream (pdf) to Ghostscript to MemoryStream (jpg)

I did see "PDF to Image using GhostScript. No image file has to be created", but that only (sort of) answered half my question. Is it possible to use GhostScriptSharp (or the regular Ghostscript DLL) to convert a PDF in a MemoryStream to a JPG in a MemoryStream? I'm talking about a PDF form dynamically filled in with iTextSharp, which I am already directing to a MemoryStream to save to a database or stream to an HTTP response, and I'd really love to avoid saving to a file (and the subsequent cleanup) if I can.
The sole answer to the question I referenced claimed that one has to go down to the Ghostscript DLL to do the latter part, but it was obvious I would need to do a good bit of legwork to figure out what that meant. Does anyone have a good resource that could help me on this journey?
The thing is that the PDF language, unlike the PostScript language, inherently requires random access to the file. If you provide a PDF directly via standard input or a pipe, Ghostscript will copy it to a temporary file before interpreting it. So there is no point in passing the PDF as a MemoryStream (or byte array), as it will end up on disk anyway before it is interpreted.
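Given that the PDF ends up on disk anyway, one option is to write the temporary file yourself and guarantee its cleanup, which addresses the "subsequent cleanup" concern in the question. This is a minimal plain-.NET sketch; `TempPdf.WithTempFile` is a hypothetical helper, and the callback is where you would hand the path to Ghostscript (the Ghostscript call itself is not shown).

```csharp
using System;
using System.IO;

public static class TempPdf
{
    // Writes the in-memory PDF to a uniquely named temporary file, passes its path
    // to `work`, and deletes the file afterwards, even if `work` throws.
    public static T WithTempFile<T>(MemoryStream pdf, Func<string, T> work)
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".pdf");
        try
        {
            using (var file = new FileStream(path, FileMode.CreateNew, FileAccess.Write))
            {
                // WriteTo copies the whole stream regardless of its current Position.
                pdf.WriteTo(file);
            }
            return work(path);
        }
        finally
        {
            if (File.Exists(path))
                File.Delete(path);
        }
    }
}
```

The `finally` block is what makes the cleanup automatic; the caller never sees the file outside the callback.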
Take a look at Ghostscript.NET and its GhostscriptRasterizer sample for 'in-memory' output.
Ghostscript.NET is a wrapper around the Ghostscript DLL. It can now take a stream object and return an image that can be saved to a stream. Here is an example I used on an ASP.NET page to render a PDF held in a memory stream as page images. I haven't completely figured out the best way to handle the Ghostscript DLL and where to locate it on the server.
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using Ghostscript.NET;
using Ghostscript.NET.Rasterizer;
// AspImage is presumably an alias for the Web Forms control, e.g.
// using AspImage = System.Web.UI.WebControls.Image;
void PDFToImage(MemoryStream inputMS, int dpi)
{
    GhostscriptVersionInfo version = new GhostscriptVersionInfo(
        new Version(0, 0, 0), @"C:\PathToDll\gsdll32.dll",
        string.Empty, GhostscriptLicense.GPL);
    using (GhostscriptRasterizer rasterizer = new GhostscriptRasterizer())
    {
        rasterizer.Open(inputMS, version, false);
        // Page numbers are 1-based
        for (int i = 1; i <= rasterizer.PageCount; i++)
        {
            using (MemoryStream ms = new MemoryStream())
            using (Image img = rasterizer.GetPage(dpi, dpi, i))
            {
                img.Save(ms, ImageFormat.Jpeg);
                AspImage newPage = new AspImage();
                // The page is encoded as JPEG, so the data URI must declare image/jpeg
                newPage.ImageUrl = "data:image/jpeg;base64," + Convert.ToBase64String(ms.ToArray());
                Document1Image.Controls.Add(newPage);
            }
        }
        rasterizer.Close();
    }
}

Save the output as PDF file

We are currently using SoftArtisans to generate Excel and Word files.
We need to extend this to also create PDF files.
Does OfficeWriter currently support this?
If not, are there any plans to add this feature? Or is there an open-source library that can be used to convert Excel/Word files to PDF?
As far as I know, PDFsharp and MigraDoc are the best and most popular. MigraDoc is a higher-level layer on top of PDFsharp.
Note: I work for SoftArtisans, makers of OfficeWriter.
OfficeWriter does not currently support converting Excel/Word files to PDF. We generally recommend a third-party component to convert to PDF. However, many of these components require having Office installed on the server, which Microsoft does not advise. Therefore, it is important that you choose a converter that either does not require having Office on the server, or manages it carefully.
Here are a few solutions for converting Word to PDF that we’ve recommended to our users in the past:
• Word Services for SharePoint – If you are using SharePoint Server 2010, you can use Word Services to perform the format conversion. More information about this solution can be found at: http://msdn.microsoft.com/en-us/library/office/ff181518.aspx
• Rainbow PDF - rainbowpdf.com
• EasyPDF - pdfonline.com/easypdf/
For more information, please see our blog post: http://blog.softartisans.com/2011/08/05/kb-tools-to-convert-word-documents-to-pdf/
PDFsharp is quite popular. Here is an example from CodeProject that creates a simple PDF, to give you a feel for how to use it:
using System.Diagnostics;
using PdfSharp.Drawing;
using PdfSharp.Pdf;

class Program
{
    static void Main(string[] args)
    {
        // Create a new PDF document
        PdfDocument document = new PdfDocument();
        document.Info.Title = "Created with PDFsharp";
        // Create an empty page
        PdfPage page = document.AddPage();
        // Get an XGraphics object for drawing
        XGraphics gfx = XGraphics.FromPdfPage(page);
        // Create a font
        XFont font = new XFont("Verdana", 20, XFontStyle.BoldItalic);
        // Draw the text centered on the page
        gfx.DrawString("Hello, World!", font, XBrushes.Black,
            new XRect(0, 0, page.Width, page.Height),
            XStringFormats.Center);
        // Save the document...
        const string filename = "HelloWorld.pdf";
        document.Save(filename);
        // ...and open it in the default viewer (UseShellExecute is required on .NET Core and later)
        Process.Start(new ProcessStartInfo(filename) { UseShellExecute = true });
    }
}
