Blank 1st page while converting from HTML to PDF using OpenOffice - c#

When I convert HTML (very basic HTML) to PDF, the first page of the output PDF is always blank. How can I solve this?
Am I missing any specific properties?
xComponent = InitDocument(aLoader, PathConverter(inputFile), "_blank");
SaveDocument(xComponent, inputFile, PathConverter(outputFile));
private static void SaveDocument(XComponent xComponent, string sourceFile, string destinationFile)
{
    var propertyValues = new PropertyValue[2];
    // Set the export filter name
    propertyValues[0] = new PropertyValue
    {
        Name = "FilterName",
        Value = new Any("writer_pdf_Export")
    };
    // Allow overwriting an existing output file
    propertyValues[1] = new PropertyValue { Name = "Overwrite", Value = new Any(true) };
    ((XStorable)xComponent).storeToURL(destinationFile, propertyValues);
}
As a workaround, I tried deleting the first page using PDFsharp, but after deleting it I get the error "The documents page tree has invalid node".

Related

Aspose.Words ImportNode ignores font formatting when appending child

I am currently using Aspose.Words to open a document, pull content between a bookmark start and a bookmark end, and then place that content into another document. The issue I'm having is that when I use the ImportNode method, the content is imported into my document, but all of the fonts change from Calibri to Times New Roman and the font size changes from whatever it was in the original document to 12pt.
The way I'm obtaining the content from the bookmark is by using the Aspose ExtractContent method.
Because ImportNode strips my font formatting, I tried saving each node to an HTML string using ToString(HtmlSaveOptions) instead. This mostly works, but it strips out the returns (paragraph breaks) from the Word document, so none of my text has the appropriate spacing. The returns end up as HTML in the following format:
"<p style=\"margin-top:0pt; margin-bottom:8pt; line-height:108%; font-size:11pt\"><span style=\"font-family:Calibri; display:none; -aw-import:ignore\"> </span></p>"
When using
DocumentBuilder.InsertHtml("<p style=\"margin-top:0pt; margin-bottom:8pt; line-height:108%; font-size:11pt\"><span style=\"font-family:Calibri; display:none; -aw-import:ignore\"> </span></p>");
it does not correctly add the return to the Word document.
Here is the code I'm using. Please forgive the comments; they are left over from my attempts at fixing this.
public async Task<string> GenerateHtmlString(Document srcDoc, ArrayList nodes)
{
// Create a blank document.
Document dstDoc = new Document();
ELSLogHelper.InsertInfoLog(_callContext, ELSLogHelper.AsposeLogMessage("Open"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod().DeclaringType?.Name, Environment.StackTrace);
// Remove the first paragraph from the empty document.
dstDoc.FirstSection.Body.RemoveAllChildren();
// Create a new Builder for the temporary document that gets generated with the header or footer data.
// This allows us to control font and styles separately from the main document being built.
var newBuilder = new DocumentBuilder(dstDoc);
Aspose.Words.Saving.HtmlSaveOptions htmlSaveOptions = new Aspose.Words.Saving.HtmlSaveOptions();
htmlSaveOptions.ExportImagesAsBase64 = true;
htmlSaveOptions.SaveFormat = SaveFormat.Html;
htmlSaveOptions.ExportFontsAsBase64 = true;
htmlSaveOptions.ExportFontResources = true;
htmlSaveOptions.ExportTextBoxAsSvg = true;
htmlSaveOptions.ExportRoundtripInformation = true;
htmlSaveOptions.Encoding = Encoding.UTF8;
// Obtain all the links from the source document
// This is used later to add hyperlinks to the html
// because by default extracting nodes using Aspose
// does not pull in the links in a usable way.
var srcDocLinks = srcDoc.Range.Fields.Where(x => x.Type == Aspose.Words.Fields.FieldType.FieldHyperlink).GroupBy(x => x.DisplayResult).Select(x => x.First()).ToList();
var childNodes = nodes.Cast<Node>().Select(x => x).ToList();
var oldBuilder = new DocumentBuilder(srcDoc);
oldBuilder.MoveToBookmark("Header");
var allchildren = oldBuilder.CurrentParagraph.Runs;
var allChildNodes = childNodes[0].Document.GetChildNodes(NodeType.Any, true);
var headerText = allChildNodes[0].Range.Bookmarks["Header"].BookmarkStart.GetText();
foreach (Node node in nodes)
{
var html = node.ToString(htmlSaveOptions);
try
{
// &#xa0; is used by Aspose because it works in XML
// If we see this character and the text of the node is \r we need to insert a break
if (html.Contains("&#xa0;") && node.Range.Text == "\r")
{
newBuilder.InsertHtml(html, false);
// Change the node into an HTML string
/*var htmlString = node.ToString(SaveFormat.Html);
var tempHtmlLinkDoc = new HtmlDocument();
tempHtmlLinkDoc.LoadHtml(htmlString);
// Get all the child nodes of the html document
var allChildNodes = tempHtmlLinkDoc.DocumentNode.SelectNodes("//*");
// Loop over all child nodes so we can make sure we apply the correct font family and size to the break.
foreach (var childNode in allChildNodes)
{
// Get the style attribute from the child node
var childNodeStyles = childNode.GetAttributeValue("style", "").Split(';');
foreach (var childNodeStyle in childNodeStyles)
{
// Apply the font name and size to the new builder on the document.
if (childNodeStyle.ToLower().Contains("font-family"))
{
newBuilder.Font.Name = childNodeStyle.Split(':')[1].Trim();
}
if (childNodeStyle.ToLower().Contains("font-size"))
{
// Strip "rem" before "em", otherwise "1rem" becomes "1r" and Convert.ToDouble fails
newBuilder.Font.Size = Convert.ToDouble(childNodeStyle.Split(':')[1]
.Replace("pt", "")
.Replace("px", "")
.Replace("rem", "")
.Replace("em", "")
.Replace("%", "")
.Trim());
}
}
}
// Insert the break with the corresponding font size and name.
newBuilder.InsertBreak(BreakType.ParagraphBreak);*/
}
else
{
// Loop through the source document links so the link can be applied to the HTML.
foreach (var srcDocLink in srcDocLinks)
{
if (html.Contains(srcDocLink.DisplayResult))
{
// Now that we know the html string has one of the links in it we need to get the address from the node.
var linkAddress = srcDocLink.Start.NextSibling.GetText().Replace(" HYPERLINK \"", "").Replace("\"", "");
//Convert the node into an HTML String so we can get the correct font color, name, size, and any text decoration.
var htmlString = srcDocLink.Start.NextSibling.ToString(SaveFormat.Html);
var tempHtmlLinkDoc = new HtmlDocument();
tempHtmlLinkDoc.LoadHtml(htmlString);
var linkStyles = tempHtmlLinkDoc.DocumentNode.ChildNodes[0].GetAttributeValue("style", "").Split(';');
var linkStyleHtml = "";
foreach (var linkStyle in linkStyles)
{
if (linkStyle.ToLower().Contains("color"))
{
linkStyleHtml += $"color:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("font-family"))
{
linkStyleHtml += $"font-family:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("font-size"))
{
linkStyleHtml += $"font-size:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("text-decoration"))
{
linkStyleHtml += $"text-decoration:{linkStyle.Split(':')[1].Trim()};";
}
}
if (linkAddress.ToLower().Contains("mailto:"))
{
// Since the link has mailto included don't add the target attribute to the link.
html = new Regex($@"\b{Regex.Escape(srcDocLink.DisplayResult)}\b").Replace(html, $"{srcDocLink.DisplayResult}");
//html = html.Replace(srcDocLink.DisplayResult, $"{srcDocLink.DisplayResult}");
}
else
{
// Since the links is not an email include the target attribute.
html = new Regex($@"\b{Regex.Escape(srcDocLink.DisplayResult)}\b").Replace(html, $"{srcDocLink.DisplayResult}");
//html = html.Replace(srcDocLink.DisplayResult, $"{srcDocLink.DisplayResult}");
}
}
}
// Insert the HTML string into the temporary document.
newBuilder.InsertHtml(html, false);
}
}
catch (Exception ex)
{
throw;
}
}
// This is just for debugging/troubleshooting purposes and to make sure things look correct
string tempDocxPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "temp", "TemporaryCompiledDocument.docx");
dstDoc.Save(tempDocxPath);
// We generate this HTML file then load it back up and pass the DocumentNode.OuterHtml back to the requesting method.
ELSLogHelper.InsertInfoLog(_callContext, ELSLogHelper.AsposeLogMessage("Save"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod().DeclaringType?.Name, Environment.StackTrace);
string tempHtmlPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "temp", "TemporaryCompiledDocument.html");
dstDoc.Save(tempHtmlPath, htmlSaveOptions);
var tempHtmlDoc = new HtmlDocument();
tempHtmlDoc.Load(tempHtmlPath);
var htmlText = tempHtmlDoc.DocumentNode.OuterHtml;
// Clean up our mess...
if (File.Exists(tempDocxPath))
{
File.Delete(tempDocxPath);
}
if (File.Exists(tempHtmlPath))
{
File.Delete(tempHtmlPath);
}
// Return the generated HTML string.
return htmlText;
}
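Two details in the link-styling part of the loop are worth isolating. `linkStyle.ToLower().Contains("color")` also matches `background-color`, and a regex built from `srcDocLink.DisplayResult` needs `Regex.Escape`, otherwise a `.` or `?` in the link text is read as regex syntax; `\b` also fails to match when the link text starts or ends with punctuation. The sketch below is plain C# with no Aspose or HtmlAgilityPack calls. It assumes the inline style string and the link text have already been extracted as in the question; `HtmlLinkHelpers`, `FilterStyle` and `WrapLink` are hypothetical names, and the `<a>` markup it produces is an assumption, since the replacement strings in the question's code are not shown in full.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlLinkHelpers
{
    // Parses an inline CSS declaration list such as "color:#0563C1; font-size:11pt"
    // into name/value pairs. Property names are compared exactly (ignoring case).
    public static Dictionary<string, string> ParseInlineStyle(string style)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var declaration in style.Split(';'))
        {
            int colon = declaration.IndexOf(':');
            if (colon <= 0) continue; // skip empty or malformed declarations
            var name = declaration.Substring(0, colon).Trim();
            var value = declaration.Substring(colon + 1).Trim();
            if (name.Length > 0) result[name] = value;
        }
        return result;
    }

    // Keeps only the whitelisted properties, in the order given, so that
    // "background-color" is not picked up when asking for "color".
    public static string FilterStyle(string style, params string[] allowed)
    {
        var parsed = ParseInlineStyle(style);
        return string.Concat(allowed
            .Where(name => parsed.ContainsKey(name))
            .Select(name => $"{name}:{parsed[name]};"));
    }

    // Wraps whole-word occurrences of displayText in an anchor. The text is escaped
    // with Regex.Escape, and lookarounds are used instead of \b so text that begins
    // or ends with punctuation still matches. target="_blank" is only added for
    // links that are not mailto: links.
    public static string WrapLink(string html, string displayText, string address, string style)
    {
        bool isMail = address.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase);
        string target = isMail ? "" : " target=\"_blank\"";
        string anchor = $"<a href=\"{WebUtility.HtmlEncode(address)}\" style=\"{style}\"{target}>{displayText}</a>";
        var pattern = new Regex($@"(?<!\w){Regex.Escape(displayText)}(?!\w)");
        // A MatchEvaluator avoids "$" in the anchor being treated as a substitution.
        return pattern.Replace(html, _ => anchor);
    }
}
```

This is still plain text replacement, so a link text that also appears inside an existing attribute value would be wrapped too; working on the parsed HTML nodes avoids that.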
Saving each node to HTML and then inserting it into the destination document is not a good idea, because not all nodes can be saved to HTML properly, and some formatting can be lost in the Aspose.Words DOM -> HTML -> Aspose.Words DOM roundtrip.
Regarding the original issue, the problem might occur because you are using ImportFormatMode.UseDestinationStyles. In this mode the styles and defaults of the destination document are used, so the font might change. If you need to keep the source document's formatting, use ImportFormatMode.KeepSourceFormatting.
If the problem occurs even with ImportFormatMode.KeepSourceFormatting this must be a bug and you should report this to Aspose.Words staff in the support forum.

iText 7 PdfAction.CreateGoTo() links not working in final document

I've converted HTML into a PDF and created a table of contents from the header tags. The bookmarks in the document work fine, but clicking on a line in the table of contents doesn't do anything. The cursor doesn't change icons like it does when I put a URL in the link.
I used iText RUPS to inspect the final PDF, and the named destinations are in the final file.
I tried hard-coding a couple of the names just to see what happens, but they didn't work either. Using .CreateURI with google.com works fine.
One thing I'm doing that may or may not be an issue: I create the body document, then create the table of contents, and then merge the two documents.
Maybe Bruno can make a cameo on this one.
private static List ProcessOutlineChildren(PdfDocument pdfDocument, List tableOfContents, IEnumerable<PdfOutline> pdfOutlines, IDictionary<String, PdfObject> names = null)
{
List<TabStop> tabStops = new List<TabStop>();
tabStops.Add(new TabStop(580, TabAlignment.RIGHT));
foreach (var o in pdfOutlines)
{
ListItem currentOutlineItem = new ListItem();
Paragraph paragraph = new Paragraph();
paragraph.AddTabStops(tabStops);
paragraph.Add(o.GetTitle());
paragraph.Add(new Tab());
paragraph.Add((pdfDocument.GetPageNumber((PdfDictionary) o.GetDestination().GetDestinationPage(names))).ToString());
paragraph.SetAction(PdfAction.CreateGoTo(o.GetDestination()));
currentOutlineItem.Add(paragraph);
if (o.GetAllChildren().Any())
{
currentOutlineItem.Add(ProcessOutlineChildren(pdfDocument, new List(), o.GetAllChildren(), names));
}
tableOfContents.Add(currentOutlineItem);
}
return tableOfContents;
}
public class CustomOutlineHandler : OutlineHandler
{
// PDFs require a unique name for destinations; this is how the actions/bookmarks jump to a location.
protected override string GenerateUniqueDestinationName(IElementNode element)
{
string destinationName = base.GenerateUniqueDestinationName(element);
if ("p".Equals(element.Name()))
{
destinationName = destinationName.Replace(GetDestinationNamePrefix(), "paragraph-prefix-");
}
return destinationName;
}
}
// From my main method that converts things into PDF.
OutlineHandler customOutlineHandler = new CustomOutlineHandler().PutAllTagPriorityMappings(priorityMappings);
customOutlineHandler.SetDestinationNamePrefix("destination-name-");
properties.SetOutlineHandler(customOutlineHandler);
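ProcessOutlineChildren walks the outline depth-first: it adds an entry for each outline, then recurses into that outline's children to build a nested list. The same traversal without iText is sketched below, using a hypothetical `OutlineEntry` class (not an iText type) and plain strings in place of `ListItem`/`Paragraph`. It only illustrates the ordering and nesting of the generated table of contents; it does not reproduce or fix the GoTo link behavior asked about.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for an outline node; this is NOT an iText type.
public sealed class OutlineEntry
{
    public string Title { get; }
    public int PageNumber { get; }
    public List<OutlineEntry> Children { get; } = new List<OutlineEntry>();

    public OutlineEntry(string title, int pageNumber, params OutlineEntry[] children)
    {
        Title = title;
        PageNumber = pageNumber;
        Children.AddRange(children);
    }
}

public static class TocBuilder
{
    // Depth-first walk with the same shape as ProcessOutlineChildren:
    // emit the entry, then recurse into its children one level deeper.
    public static List<string> Build(IEnumerable<OutlineEntry> entries, int depth = 0, List<string> lines = null)
    {
        lines = lines ?? new List<string>();
        foreach (var entry in entries)
        {
            // Two spaces of indentation per level; a tab separates title and page,
            // standing in for the Tab + right-aligned TabStop in the iText version.
            lines.Add(new string(' ', depth * 2) + entry.Title + "\t" + entry.PageNumber);
            Build(entry.Children, depth + 1, lines);
        }
        return lines;
    }
}
```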

Aspose PDF - get text from page that has a matching string

I'm working with an existing library whose goal is to pull text out of PDFs and verify it against expected values, to quality-check recorded data against the data in the PDF.
I'm looking for a way to succinctly pull a specific page worth of text given a string that should only fall on that specific page.
var pdfDocument = new Document(file.PdfFilePath);
var textAbsorber = new TextAbsorber{
ExtractionOptions = {
FormattingMode = TextExtractionOptions.TextFormattingMode.Pure
}
};
pdfDocument.Pages.Accept(textAbsorber);
foreach (var page in pdfDocument.Pages)
{
}
I'm stuck inside the foreach (var page in pdfDocument.Pages) loop. Or is that even the right place to be looking?
Answer: recreate the TextAbsorber for each page, inside the foreach loop.
If the absorber isn't recreated, it keeps the text from previous iterations.
public List<string> ProcessPage(MyInfoClass file, string find)
{
    var pdfDocument = new Document(file.PdfFilePath);
    foreach (Page page in pdfDocument.Pages)
    {
        // A fresh absorber per page, so text from earlier pages doesn't accumulate
        var textAbsorber = new TextAbsorber
        {
            ExtractionOptions =
            {
                FormattingMode = TextExtractionOptions.TextFormattingMode.Pure
            }
        };
        page.Accept(textAbsorber);
        var ext = textAbsorber.Text;
        if (ext.Contains(find))
        {
            // Return the matching page's text as a list of lines
            return ext.Replace("\n", "").Split('\r').ToList();
        }
    }
    return null;
}
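The answer's loop boils down to: extract each page's text separately, stop at the first page that contains the search string, and split that page into lines. A sketch of just that matching logic in plain C#, assuming the per-page text has already been extracted (for example with a new TextAbsorber per page, as the answer does); `PageTextSearch` and `FindPageLines` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PageTextSearch
{
    // Returns the lines of the first page whose text contains `find`,
    // or null when no page matches.
    public static List<string> FindPageLines(IEnumerable<string> pageTexts, string find)
    {
        foreach (var text in pageTexts)
        {
            if (!text.Contains(find)) continue;
            // Same normalization as the answer: drop '\n', then split on '\r'.
            return text.Replace("\n", "").Split('\r').ToList();
        }
        return null;
    }
}
```

Keeping extraction and matching separate makes the matching testable without a PDF file.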

Show loading screen before dynamically created PDF

I have a view that, instead of returning a View(), returns a dynamically created PDF and then shows the PDF in a new tab. I'm not saving the PDF or storing it anywhere. What I would like is to have a loading screen show up while the PDF is being created. Can this be done?
public ActionResult SolicitorActionReport_Load(SolicitorActionParamsViewModel viewModel) {
var cultivationModel = new CultivationModel(viewModel, ConstituentRepository, CampaignRepository);
var cultivationData = cultivationModel.GetCultivationActivityData();
var reportParamModel = new List<ReportParamModel>
{new ReportParamModel {AgencyName = SelectedUserAgency.AgencyName, StartDate = viewModel.StartDate, EndDate = viewModel.EndDate}};
var reportToRun = "ActionDateCultivationReport";
if (viewModel.SortActionBy == SolicitorActionReportSortType.Constituent) {
reportToRun = "ConstituentCultivationReport";
} else if (viewModel.SortActionBy == SolicitorActionReportSortType.Solicitor) {
reportToRun = "SolicitorCultivationReport";
}
return FileContentPdf("Constituent", reportToRun, cultivationData, reportParamModel, new List<FundraisingAppealMassSummary>(), new List<FundraisingAppealPortfolioSummary>());
}
public FileContentResult FileContentPdf(string folder, string reportName, object dataSet,object reportParamModel,object appealMassDataSet, object appealPortfolioDataSet) {
var localReport = new LocalReport();
localReport.ReportPath = Server.MapPath("~/bin/Reports/" + folder + "/rpt" + reportName + ".rdlc");
var reportDataSource = new ReportDataSource(reportName + "DataSet", dataSet);
var reportParamsDataSource = new ReportDataSource("ReportParamModelDataSet", reportParamModel);
var reportParamsDataSourceMass = new ReportDataSource("FundraisingAppealMassSummaryDataSet", appealMassDataSet);
var reportParamsDataSourcePortfolio = new ReportDataSource("FundraisingAppealPortfolioSummaryDataSet", appealPortfolioDataSet);
#region Setting ReportViewControl
localReport.DataSources.Add(reportDataSource);
localReport.DataSources.Add(reportParamsDataSource);
localReport.DataSources.Add(reportParamsDataSourceMass);
localReport.DataSources.Add(reportParamsDataSourcePortfolio);
localReport.SubreportProcessing += (s, e) => { e.DataSources.Add(reportDataSource); };
string reportType = "pdf";
string mimeType;
string encoding;
string fileNameExtension;
//The DeviceInfo settings should be changed based on the reportType
//http://msdn2.microsoft.com/en-us/library/ms155397.aspx
string deviceInfo = "<DeviceInfo><OutputFormat>PDF</OutputFormat></DeviceInfo>";
Warning[] warnings;
string[] streams;
byte[] renderedBytes;
//Render the report
renderedBytes = localReport.Render(reportType, deviceInfo, out mimeType, out encoding, out fileNameExtension, out streams, out warnings);
#endregion
return File(renderedBytes, mimeType);
}
Short Answer
No, not in a new tab.
The main problem with what you're trying to do is how little control you have over the browser, specifically when you tell an anchor to open its hyperlink in a new tab (i.e. target="_blank"). There are hacky ways around this, but they are generally just going to frustrate your users, because you're changing behavior that they might be relying on.
Workaround
You can get very close to your desired outcome by using the jQuery File Download plugin. Basically, it manipulates an iframe to queue a download. This makes it possible to show a loading div while also keeping the user on the active page (not directing them to another tab). The user can then click the downloaded PDF, which will most likely open in a new tab (depending on the browser).
If you decide to use this plugin, here are the steps to applying it:
Download the plugin's JS source and include it in your Scripts folder.
Include the FileDownloadAttribute class provided in the plugin MVC Demo:
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, Inherited = true, AllowMultiple = false)]
public class FileDownloadAttribute: ActionFilterAttribute
{
public FileDownloadAttribute(string cookieName = "fileDownload", string cookiePath = "/")
{
CookieName = cookieName;
CookiePath = cookiePath;
}
public string CookieName { get; set; }
public string CookiePath { get; set; }
/// <summary>
/// If the current response is a FileResult (an MVC base class for files) then write a
/// cookie to inform jquery.fileDownload that a successful file download has occurred
/// </summary>
/// <param name="filterContext"></param>
private void CheckAndHandleFileResult(ActionExecutedContext filterContext)
{
var httpContext = filterContext.HttpContext;
var response = httpContext.Response;
if (filterContext.Result is FileResult)
//jquery.fileDownload uses this cookie to determine that a file download has completed successfully
response.AppendCookie(new HttpCookie(CookieName, "true") { Path = CookiePath });
else
//ensure that the cookie is removed in case someone did a file download without using jquery.fileDownload
if (httpContext.Request.Cookies[CookieName] != null)
{
response.AppendCookie(new HttpCookie(CookieName, "true") { Expires = DateTime.Now.AddYears(-1), Path = CookiePath });
}
}
public override void OnActionExecuted(ActionExecutedContext filterContext)
{
CheckAndHandleFileResult(filterContext);
base.OnActionExecuted(filterContext);
}
}
(Source: GitHub.)
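The attribute's cookie handling reduces to a small decision table: a FileResult sets the cookie that, according to the attribute's comments, jquery.fileDownload uses to detect a completed download; any other result expires a stale cookie if the request carried one. A sketch of that branch as a pure function, so the three outcomes are easy to see and test; `CookieAction` and `FileDownloadCookie.Decide` are hypothetical names, not part of the plugin:

```csharp
// Outcome of the cookie check performed after each action.
public enum CookieAction
{
    None,        // no file result and no leftover cookie: nothing to do
    SetSuccess,  // write the cookie that signals a completed download
    Expire       // re-send the cookie with a past expiry to remove it
}

public static class FileDownloadCookie
{
    // Pure version of the branch in CheckAndHandleFileResult.
    public static CookieAction Decide(bool isFileResult, bool requestHasCookie)
    {
        if (isFileResult)
            return CookieAction.SetSuccess;
        return requestHasCookie ? CookieAction.Expire : CookieAction.None;
    }
}
```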
Apply the FileDownload attribute to your ActionResult method:
[FileDownload]
public ActionResult SolicitorActionReport_Load(SolicitorActionParamsViewModel viewModel) {
...
return FileContentPdf("Constituent", reportToRun, cultivationData, reportParamModel, new List<FundraisingAppealMassSummary>(), new List<FundraisingAppealPortfolioSummary>());
}
Include the necessary markup in the view that links to the report:
<a class="report-download" href="/Route/To/SolicitorActionReport">Download PDF</a>
Attach an event handler to the report-download anchor:
$(document).on("click", "a.report-download", function () {
$.fileDownload($(this).prop('href'), {
preparingMessageHtml: "We are preparing your report, please wait...",
failMessageHtml: "There was a problem generating your report, please try again."
});
return false; //this is critical to stop the click event which will trigger a normal file download!
});
You can view working demos at http://jqueryfiledownload.apphb.com/. There is also a demo that uses pre-styled jQuery UI modals to "prettify" the user experience.
You can also download the demo ASP.NET MVC solution from the johnculviner/jquery.fileDownload repository on GitHub to see all of this working.
I think you have two choices:
Redirect to a "loading" page with fancy GIF spinners, then direct the request to the PDF (this works if the PDF takes a little server time to generate: the visitor looks at the loading page while waiting for the next page to load)
or
Use an iframe: load a page that contains an iframe. This page can overlay a spinning GIF and a loading message while the iframe loads the PDF itself. Note: you could make the iframe 100% width and height.

Replace MergeFields in a Word 2003 document and keep style

I've been trying to create a library to replace the MergeFields in a Word 2003 document. Everything works fine, except that I lose the style applied to the field when I replace it. Is there a way to keep it?
This is the code I'm using to replace the fields:
private void FillFields2003(string template, Dictionary<string, string> values)
{
object missing = Missing.Value;
var application = new ApplicationClass();
var document = new Microsoft.Office.Interop.Word.Document();
try
{
// Open the file
foreach (Field mergeField in document.Fields)
{
if (mergeField.Type == WdFieldType.wdFieldMergeField)
{
string fieldText = mergeField.Code.Text;
string fieldName = Extensions.GetFieldName(fieldText);
if (values.ContainsKey(fieldName))
{
mergeField.Select();
application.Selection.TypeText(values[fieldName]);
}
}
}
document.Save();
}
finally
{
// Release resources
}
}
I tried using the CopyFormat and PasteFormat methods on the selection, and also get_style and set_style, but to no avail.
Instead of using TypeText over the top of your selection, use the Result property of the Field:
if (values.ContainsKey(fieldName))
{
    // Result is a Range, so set its Text rather than assigning a string to it
    mergeField.Result.Text = values[fieldName];
}
This will ensure any formatting in the field is retained.
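`Extensions.GetFieldName` is called in the question but not shown. A hypothetical, regex-based version is sketched below, assuming field codes of the form ` MERGEFIELD  FirstName  \* MERGEFORMAT ` or ` MERGEFIELD "First Name" `. It returns the name after `MERGEFIELD`, honoring quotes, or null when the code is not a merge field:

```csharp
using System.Text.RegularExpressions;

public static class MergeFieldNames
{
    // Hypothetical stand-in for Extensions.GetFieldName (not shown in the question).
    // Matches MERGEFIELD followed by either a quoted name or an unquoted token that
    // stops at whitespace or a backslash (the start of a switch such as \* MERGEFORMAT).
    private static readonly Regex FieldNamePattern = new Regex(
        @"^\s*MERGEFIELD\s+(?:""(?<name>[^""]+)""|(?<name>[^\s\\]+))",
        RegexOptions.IgnoreCase);

    public static string GetFieldName(string fieldCode)
    {
        var match = FieldNamePattern.Match(fieldCode);
        return match.Success ? match.Groups["name"].Value : null;
    }
}
```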
