HTML to Word or PDF with GEMBOX - c#

I am using Gembox.Documents to insert an HTML file into a Word or PDF document.
Unfortunately, in the resulting Word (or pdf), the height of the contents of the rows (cells) in the table is too high and does not correspond to the original one in the HTML file and I cannot change this with the help of CSS or HTML.
Can you, please, suggest solutions to the problem?
string fileName="zzzz";
var destinationDocument = new DocumentModel();
var section = new Section(destinationDocument);
destinationDocument.Sections.Add(section);
var srcDocument = DocumentModel.Load(TempPath + fileName + ".html");
var pageSetup = srcDocument.Sections[0].PageSetup;
var destpagesPageSetup = destinationDocument.Sections[0].PageSetup;
destpagesPageSetup.Orientation = Orientation.Landscape;
destpagesPageSetup.PageWidth = 1000;
destpagesPageSetup.PageHeight = 1000;
destpagesPageSetup.RightToLeft = true;
destpagesPageSetup.PageMargins.Left = 20;
destpagesPageSetup.PageMargins.Right = 0;
destpagesPageSetup.PageMargins.Bottom = 0;
destpagesPageSetup.PageMargins.Top = 0;
destpagesPageSetup.PageMargins.Gutter = 0;
destpagesPageSetup.PageMargins.Footer = 0;
var mapping = new ImportMapping(srcDocument, destinationDocument, false);
var blocks = srcDocument.Sections[0].Blocks;
foreach (Block b in blocks)
{
//b.ParentCollection.TableFormat.DefaultCellSpacing = 1;
Block b1 = destinationDocument.Import(b, true, mapping);
section.Blocks.Add(b1);
}
var pageSetup1 = section.PageSetup;
destinationDocument.Save(TempPath + fileName + ".pdf");
thanks

This issue occurred because of the cell margins appearing from the HTML content.
After investigating that HTML, the issue was resolved, the fix is available in the current latest bugfix version:
https://www.gemboxsoftware.com/document/downloads/bugfixes.html
Or in the current latest NuGet package:
https://www.nuget.org/packages/GemBox.Document/

Related

How to extract all pages and attachments from PDF to PNG

I am trying to create a process in .NET to convert a PDF and all it's pages + attachments to PNGs. I am evaluating libraries and came across PDFiumSharp but it is not working for me. Here is my code:
string Inputfile = "input.pdf";
string OutputFolder = "Output";
string fileName = Path.GetFileNameWithoutExtension(Inputfile);
using (PdfDocument doc = new PdfDocument(Inputfile))
{
for (int i = 0; i < doc.Pages.Count; i++)
{
var page = doc.Pages[i];
using (var bitmap = new PDFiumBitmap((int)page.Width, (int)page.Height, false))
{
page.Render(bitmap);
var targetFile = Path.Combine(OutputFolder, fileName + "_" + i + ".png");
bitmap.Save(targetFile);
}
}
}
When I run this code, I get this exception:
screenshot of exception
Does anyone know how to fix this? Also does PDFiumSharp support extracting PDF attachments? If not, does anyone have any other ideas on how to achieve my goal?
PDFium does not look like it supports extracting PDF attachments. If you want to achieve your goal, then you can take a look at another library that supports both extracting PDF attachments as well as converting PDFs to PNGs.
I am an employee of the LEADTOOLS PDF SDK which you can try out via these 2 nuget packages:
https://www.nuget.org/packages/Leadtools.Pdf/
https://www.nuget.org/packages/Leadtools.Document.Sdk/
Here is some code that will convert a PDF + all attachments in the PDF to separate PNGs in an output directory:
SetLicense();
cache = new FileCache { CacheDirectory = "cache" };
List<LEADDocument> documents = new List<LEADDocument>();
if (!Directory.Exists(OutputDir))
Directory.CreateDirectory(OutputDir);
using var document = DocumentFactory.LoadFromFile("attachments.pdf", new LoadDocumentOptions { Cache = cache, LoadAttachmentsMode = DocumentLoadAttachmentsMode.AsAttachments });
if (document.Pages.Count > 0)
documents.Add(document);
foreach (var attachment in document.Attachments)
documents.Add(document.LoadDocumentAttachment(new LoadAttachmentOptions { AttachmentNumber = attachment.AttachmentNumber }));
ConvertDocuments(documents, RasterImageFormat.Png);
And the ConvertDocuments method:
static void ConvertDocuments(IEnumerable<LEADDocument> documents, RasterImageFormat imageFormat)
{
using var converter = new DocumentConverter();
using var ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
ocrEngine.Startup(null, null, null, null);
converter.SetOcrEngineInstance(ocrEngine, false);
converter.SetDocumentWriterInstance(new DocumentWriter());
foreach (var document in documents)
{
var name = string.IsNullOrEmpty(document.Name) ? "Attachment" : document.Name;
string outputFile = Path.Combine(OutputDir, $"{name}.{RasterCodecs.GetExtension(imageFormat)}");
int count = 1;
while (File.Exists(outputFile))
outputFile = Path.Combine(OutputDir, $"{name}({count++}).{RasterCodecs.GetExtension(imageFormat)}");
var jobData = new DocumentConverterJobData
{
Document = document,
Cache = cache,
DocumentFormat = DocumentFormat.User,
RasterImageFormat = imageFormat,
RasterImageBitsPerPixel = 0,
OutputDocumentFileName = outputFile,
};
var job = converter.Jobs.CreateJob(jobData);
converter.Jobs.RunJob(job);
}
}

Reading text from pdf with iText7 + C#, text not recognized

i want to read data from pdf document. I use iText7:
var src = "<file location>";
var pdfDocument = new PdfDocument(new PdfReader(src));
var strategy = new LocationTextExtractionStrategy();
for (int i = 1; i <= pdfDocument.GetNumberOfPages(); ++i)
{
var page = pdfDocument.GetPage(i);
string text = PdfTextExtractor.GetTextFromPage(page, strategy);
string processed = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(text)));
}
pdfDocument.Close();
It works, but doesn't recognize letters. All text looks like
"����������\n�������������������������\n�����������������������������������\n
It is in English, so I don't expect any problems with encoding. What is the cause of this issue and how can I fix it?
You don't need the conversion you're doing. Change the code to:
StringBuilder processed = new StringBuilder();
for (int i = 1; i <= pdfDocument.GetNumberOfPages(); ++i)
{
var page = pdfDocument.GetPage(i);
string text = PdfTextExtractor.GetTextFromPage(page, strategy);
processed.Append(text);
}

Read more than one file

I am writing a pdf to word converter which works perfectly fine for me. But I want to be able to convert more than one file.
What happens now is that it read the first file and does the convert process.
public static void PdfToImage()
{
try
{
Application application = null;
application = new Application();
var doc = application.Documents.Add();
string path = #"C:\Users\Test\Desktop\pdfToWord\";
foreach (string file in Directory.EnumerateFiles(path, "*.pdf"))
{
using (var document = PdfiumViewer.PdfDocument.Load(file))
{
int pagecount = document.PageCount;
for (int index = 0; index < pagecount; index++)
{
var image = document.Render(index, 200, 200, true);
image.Save(#"C:\Users\chnikos\Desktop\pdfToWord\output" + index.ToString("000") + ".png", ImageFormat.Png);
application.Selection.InlineShapes.AddPicture(#"C:\Users\chnikos\Desktop\pdfToWord\output" + index.ToString("000") + ".png");
}
string getFileName = file.Substring(file.LastIndexOf("\\"));
string getFileWithoutExtras = Regex.Replace(getFileName, #"\\", "");
string getFileWihtoutExtension = Regex.Replace(getFileWithoutExtras, #".pdf", "");
string fileName = #"C:\Users\Test\Desktop\pdfToWord\" + getFileWihtoutExtension;
doc.PageSetup.PaperSize = WdPaperSize.wdPaperA4;
foreach (Microsoft.Office.Interop.Word.InlineShape inline in doc.InlineShapes)
{
if (inline.Height > inline.Width)
{
inline.ScaleWidth = 250;
inline.ScaleHeight = 250;
}
}
doc.PageSetup.TopMargin = 28.29f;
doc.PageSetup.LeftMargin = 28.29f;
doc.PageSetup.RightMargin = 30.29f;
doc.PageSetup.BottomMargin = 28.29f;
application.ActiveDocument.SaveAs(fileName, WdSaveFormat.wdFormatDocument);
doc.Close();
}
}
I thought that with my foreach that problem should not occur. And yes there are more than one pdf in this folder
The line
var doc = application.Documents.Add();
is outside the foreach loop. So you only create a single word document for all your *.pdf files.
Move the above line inside the foreach loop to add a new word document for each *.pdf file.

Conditional Formatting using EPPlus

I am trying to achieve the following: I have a C# application which does some data processing and then outputs to a .xlsx using EPPlus. I want to add some conditional formatting to the excel and tried the following method, first I made a template blank excel with all conditional formatting rules set up and then tried dumping the data in it. The snippet below is my approach. p is an Excel package. Currently this does not work, the data is written correctly however the formatting rules that I set up are lost. I'm guessing because it basically clears everything before writing. Any help will be appreciated!
Byte[] bin = p.GetAsByteArray();
File.Copy("C:\\template.xlsx", "C:\\result.xlsx");
using (FileStream fs = File.OpenWrite("C:\\result.xlsx")) {
fs.Write(bin, 0, bin.Length);
}
Note :: I tried the following as well to avoid the whole external template situation.. check snippet below. The problem with this is that, after the .xlsx is generated and I open it, it says the file has unreadable or not displayable content and that it needs to repair it and after I do that, everything is fine and the conditional formatting has also worked. I have no clue why its doing that or how I can get rid of the error upon file opening.
string _statement = "$E1=\"3\"";
var _cond = ws.ConditionalFormatting.AddExpression(_formatRangeAddress);
_cond.Style.Fill.PatternType = OfficeOpenXml.Style.ExcelFillStyle.Solid;
_cond.Style.Fill.BackgroundColor.Color = Color.LightCyan;
_cond.Formula = _statement;
Any help will be appreciated!!
The method of using fs.Write will simply overwrite the copied file with the epplus generated file since you are doing it at the byte/stream level. So that will not get you what you want. (#MatthewD was showing you this in his post).
As for applying the format itself, what you have should work but if you are getting that kind of error I suspect you are mixing epplus and non-epplus manipulation of the excel file. This is how you should be doing it roughly:
[TestMethod]
public void Conditional_Format_Test()
{
//http://stackoverflow.com/questions/31296039/conditional-formatting-using-epplus
var existingFile = new FileInfo(#"c:\temp\temp.xlsx");
if (existingFile.Exists)
existingFile.Delete();
//Throw in some data
var datatable = new DataTable("tblData");
datatable.Columns.Add(new DataColumn("Col1", typeof(int)));
datatable.Columns.Add(new DataColumn("Col2", typeof(int)));
datatable.Columns.Add(new DataColumn("Col3", typeof(int)));
for (var i = 0; i < 20; i++)
{
var row = datatable.NewRow();
row["Col1"] = i;
row["Col2"] = i * 10;
row["Col3"] = i * 100;
datatable.Rows.Add(row);
}
using (var pack = new ExcelPackage(existingFile))
{
var ws = pack.Workbook.Worksheets.Add("Content");
ws.Cells["E1"].LoadFromDataTable(datatable, true);
//Override E1
ws.Cells["E1"].Value = "3";
string _statement = "$E1=\"3\"";
var _cond = ws.ConditionalFormatting.AddExpression(new ExcelAddress(ws.Dimension.Address));
_cond.Style.Fill.PatternType = ExcelFillStyle.Solid;
_cond.Style.Fill.BackgroundColor.Color = Color.LightCyan;
_cond.Formula = _statement;
pack.SaveAs(existingFile);
}
}
To expand on #Ernie code sample, here's a working example that colors a range according to cell's value. Each cell of the range can have any of three colors depending on the cell's value (<.01, <.05, <.1).
ExcelRange rng = ws.Cells[statsTableRowStart, 10, statsTableRowStart + gud.levels.level.Count() - 1, 10];
OfficeOpenXml.ConditionalFormatting.Contracts.IExcelConditionalFormattingExpression _condp01 = ws.ConditionalFormatting.AddExpression(rng);
_condp01.Style.Fill.PatternType = OfficeOpenXml.Style.ExcelFillStyle.Solid;
_condp01.Style.Fill.BackgroundColor.Color = System.Drawing.Color.OrangeRed;
_condp01.Formula = new ExcelFormulaAddress(rng.Address) + "<.01";
OfficeOpenXml.ConditionalFormatting.Contracts.IExcelConditionalFormattingExpression _condp05 = ws.ConditionalFormatting.AddExpression(rng);
_condp05.Style.Fill.PatternType = OfficeOpenXml.Style.ExcelFillStyle.Solid;
_condp05.Style.Fill.BackgroundColor.Color = System.Drawing.Color.OliveDrab;
_condp05.Formula = new ExcelFormulaAddress(rng.Address) + "<.05";
OfficeOpenXml.ConditionalFormatting.Contracts.IExcelConditionalFormattingExpression _condp1 = ws.ConditionalFormatting.AddExpression(rng);
_condp1.Style.Fill.PatternType = OfficeOpenXml.Style.ExcelFillStyle.Solid;
_condp1.Style.Fill.BackgroundColor.Color = System.Drawing.Color.LightCyan;
_condp1.Formula = new ExcelFormulaAddress(rng.Address) + "<.1";

Excel Group Text Box and Picture

I found some code to add an image to an Excel-sheet with the SDK 2.0. And this part works fine. Now I want a Text Box under the Image, but I don't know how to get a TextBox in general.
Which classes do I need an what is appands what or which property?
Furthermore it would be nice if it be groupt. So that when you drag one the other is following.
The code look like this (I know it's a bit much, but I couldt cut it more):
private void addImage(Offset offset, Extents extents, string sImagePath, string description)
{
WorksheetPart worksheetPart = this.arbeitsBlatt.WorksheetPart;
DrawingsPart drawingsPart;
ImagePart imagePart;
XDrSp.WorksheetDrawing worksheetDrawing;
ImagePartType imagePartType = getImageType(sImagePath);
{
// --- use the existing DrawingPart
drawingsPart = worksheetPart.DrawingsPart;
imagePart = drawingsPart.AddImagePart(imagePartType);
drawingsPart.CreateRelationshipToPart(imagePart);
worksheetDrawing = drawingsPart.WorksheetDrawing;
}
using (FileStream fileStream = new FileStream(sImagePath, FileMode.Open))
{
imagePart.FeedData(fileStream);
}
int imageNumber = drawingsPart.ImageParts.Count<ImagePart>();
if (imageNumber == 1)
{
Drawing drawing = new Drawing();
drawing.Id = drawingsPart.GetIdOfPart(imagePart);
this.arbeitsBlatt.Append(drawing);
}
XDrSp.NonVisualDrawingProperties noVisualDrawingProps = new XDrSp.NonVisualDrawingProperties();
XDrSp.NonVisualPictureDrawingProperties noVisualPictureDrawingProps = new XDrSp.NonVisualPictureDrawingProperties();
noVisualDrawingProps.Id = new UInt32Value((uint)(1024 + imageNumber));
noVisualDrawingProps.Name = "Picture " + imageNumber.ToString();
noVisualDrawingProps.Description = beschreibung;
PictureLocks picLocks = new PictureLocks();
picLocks.NoChangeAspect = true;
picLocks.NoChangeArrowheads = true;
noVisualPictureDrawingProps.PictureLocks = picLocks;
XDrSp.NonVisualPictureProperties noVisualPictureProps = new XDrSp.NonVisualPictureProperties();
noVisualPictureProps.NonVisualDrawingProperties = noVisualDrawingProps;
noVisualPictureProps.NonVisualPictureDrawingProperties = noVisualPictureDrawingProps;
Stretch stretch = new Stretch();
stretch.FillRectangle = new FillRectangle();
XDrSp.BlipFill blipFill = new XDrSp.BlipFill();
Blip blip = new Blip();
blip.Embed = drawingsPart.GetIdOfPart(imagePart);
blip.CompressionState = BlipCompressionValues.Print;
blipFill.Blip = blip;
blipFill.SourceRectangle = new SourceRectangle();
blipFill.Append(stretch);
Transform2D t2d = new Transform2D();
t2d.Offset = offset;
t2d.Extents = extents;
XDrSp.ShapeProperties sp = new XDrSp.ShapeProperties();
sp.BlackWhiteMode = BlackWhiteModeValues.Auto;
sp.Transform2D = t2d;
PresetGeometry prstGeom = new PresetGeometry();
prstGeom.Preset = ShapeTypeValues.Rectangle;
prstGeom.AdjustValueList = new AdjustValueList();
sp.Append(prstGeom);
sp.Append(new NoFill());
XDrSp.Picture picture = new XDrSp.Picture();
picture.NonVisualPictureProperties = noVisualPictureProps;
picture.BlipFill = blipFill;
picture.ShapeProperties = sp;
XDrSp.OneCellAnchor anchor = this.getCellAnchor();
XDrSp.Extent extent = new XDrSp.Extent();
extent.Cx = extents.Cx;
extent.Cy = extents.Cy;
anchor.Extent = extent;
anchor.Append(picture);
anchor.Append(new XDrSp.ClientData());
worksheetDrawing.Append(anchor);
worksheetDrawing.Save(drawingsPart);
#endregion
}
I think you are new to OpenXml SDK
First of all you need to use the newest version of Open XMl SDK - Version 2.5 [Download - http://www.microsoft.com/en-us/download/details.aspx?id=30425]
Here download BOTH OpenXMLSDKV25.msi , OpenXMLSDKToolV25.msi .. Install BOTH.
Now here is the trick, OpenXML productivity tool is the one you need here. It allows you to brows an existing Excel file and break it down to CODES [watch here - https://www.youtube.com/watch?v=KSSMLR19JWA]
Now what you need to do is create an Excel sheet manually with what you want [In your case add Text Box under the Image] Then open this Excel file with productivity tool and understand the CODE . Note that you will need to understand Spreadsheet file structure to understand this CODE [ Reffer this - https://www.google.com/#q=open+xml+sdk] .. Now write your codes to meet your requirement using codes of Productivity tool
NOTE - Once you analyse the dummy Spreadsheet with Productivity tool you will understand why giving or guiding with CODE examples as an answer is not practical.
- Happy Coding-

Categories