I am using Aspose.Words to create reports from a template file (.docx filetype).
After I use Aspose.Words to modify the template and save it to a new file, the template's formatting is lost (bold text, comments, etc.).
I have tried:
Aspose.Words.Document doc = new Document(inputStream);
var outputStream = new MemoryStream();
doc.Save(outputStream, SaveFormat.Docx);
What I did not expect is that outputStream contains far fewer bytes than inputStream, even though I have not yet modified doc. That may be why the report file loses its formatting.
What should I try now?
OK, the problem is that the version of Aspose.Words I'm using does not support the .docx file type. It can still read the text of a .docx file, but only the text, without any associated formatting.
I have an XSL-FO file and data is available in xml/json formats. I wanted to create a pdf using this xsl structure.
Can anyone suggest any open source libraries for conversion? I want it to be done at the C# level.
Note: I tried converting to HTML, but because the source is an XSL-FO file I can't get the alignment right.
You can use the Apache FOP tool (https://xmlgraphics.apache.org/fop/). It can generate a PDF document from XSL-FO input.
From C#, you can start an Apache FOP process and point it at the XSL-FO file (or use stdin, so you don't need any temporary files). After the process exits, you get the PDF (as a file on disk or on stdout).
To start, you can make Apache FOP read the XSL-FO file and write the PDF file to disk. For that, use the Process class (https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.process?view=netframework-4.8):
Draft code snippet (it may contain errors, but it should be a good start for you):
Process.Start("C:\\path\\to\\fop", "input_xsl-fo.xml output.pdf").WaitForExit();
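A slightly fuller sketch of the same idea, assuming FOP's `-fo <infile> -pdf <outfile>` command-line form. The `FopRunner` class, its method names, and any paths you pass in are hypothetical and only for illustration. On Windows the FOP launcher is a script, so the path most likely needs to include its extension.

```csharp
using System;
using System.Diagnostics;

public static class FopRunner
{
    // Builds the start info for: fop -fo <foPath> -pdf <pdfPath>
    // fopPath is an assumption: point it at your own FOP launcher script.
    public static ProcessStartInfo BuildStartInfo(string fopPath, string foPath, string pdfPath)
    {
        return new ProcessStartInfo
        {
            FileName = fopPath,
            // Quote both paths so that spaces in file names survive.
            Arguments = $"-fo \"{foPath}\" -pdf \"{pdfPath}\"",
            UseShellExecute = false,
            RedirectStandardError = true,
            CreateNoWindow = true
        };
    }

    // Runs FOP and throws if it reports a non-zero exit code.
    public static void Run(string fopPath, string foPath, string pdfPath)
    {
        using (Process process = Process.Start(BuildStartInfo(fopPath, foPath, pdfPath)))
        {
            // Read stderr before waiting so a full pipe cannot block the child process.
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException("FOP failed: " + errors);
            }
        }
    }
}
```

Separating `BuildStartInfo` from `Run` keeps the argument handling checkable without FOP installed; redirecting standard error makes FOP's own messages available when the conversion fails.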
I tried FO.NET and it worked for me.
Here is the sample code:
// Verbatim strings (@"...") keep \t, \b and \r in the paths from being read as escape sequences.
string lBaseDir = System.IO.Path.GetDirectoryName(@"e:\thermalpdf.xsl");
XslCompiledTransform lXslt = new XslCompiledTransform();
lXslt.Load(@"e:\thermalpdf.xsl");
// Step 1: apply the stylesheet to the XML data to produce an XSL-FO file.
lXslt.Transform(@"e:\billingData1.xml", "books1.fo");
// Step 2: render the XSL-FO file to PDF.
FileStream lFileInputStreamFo = new FileStream("books1.fo", FileMode.Open);
FileStream lFileOutputStreamPDF = new FileStream(@"e:\response2.pdf", FileMode.Create);
FonetDriver lDriver = FonetDriver.Make();
lDriver.BaseDirectory = new DirectoryInfo(lBaseDir);
lDriver.CloseOnExit = true;
lDriver.Render(lFileInputStreamFo, lFileOutputStreamPDF);
lFileInputStreamFo.Close();
lFileOutputStreamPDF.Close();
I am working on a project where I need to create a Word file. For this purpose, I am using MigraDoc library for C#.
Using this library, I can easily generate an RTF file by writing:
Document document = CreateDocument();
RtfDocumentRenderer rtf = new RtfDocumentRenderer();
rtf.Render(document, "test.rtf", null);
Process.Start("test.rtf");
But the requirement now asks me to get a DOC or DOCX file rather than a RTF file. Is there a way to generate a DOC or DOCX file using MigraDoc? And if so, how may I achieve this?
MigraDoc cannot generate DOC or DOCX files. Since MigraDoc is open source, you could add a renderer for DOCX if you have the knowledge and the time.
MigraDoc as it is cannot generate DOC/DOCX, but maybe you can invoke an external conversion tool after generating the RTF file.
I don't know any such tools. Word can open RTF quickly and so far our customers never complained about getting RTF, not DOC or DOCX.
Update (2019-07-29): The website mentions "Word", but this only refers to RTF. There never was an implementation for .DOC or .DOCX.
It seems there is no MigraDoc renderer that supports the DOC or DOCX formats.
The documentation page lists this MigraDoc feature:
Supports different output formats (PDF, Word, HTML, any printer supported by Windows)
However, the documentation appears to mean the RTF format, which works well with Word. I have reviewed the MigraDoc repository and do not see any DOC renderer; the only Word-compatible output is the RTF renderer. So we can't generate a DOC file directly with this package.
But we can convert RTF to DOC or DOCX easily (and for free) using the FreeSpire.Doc NuGet package.
Here is a full code example:
using MigraDoc.DocumentObjectModel;
using MigraDoc.RtfRendering;
using Spire.Doc;
using System.IO;

namespace MigraDocTest
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var stream = new MemoryStream())
            {
                // Generate RTF (using MigraDoc)
                var migraDoc = new MigraDoc.DocumentObjectModel.Document();
                var section = migraDoc.AddSection();
                var paragraph = section.AddParagraph();
                paragraph.AddFormattedText("Hello World!", TextFormat.Bold);
                var rtfDocumentRenderer = new RtfDocumentRenderer();
                rtfDocumentRenderer.Render(migraDoc, stream, false, null);

                // Rewind so the reader starts at the beginning of the RTF data
                stream.Position = 0;

                // Convert RTF to DOCX (using Spire.Doc)
                var spireDoc = new Spire.Doc.Document();
                spireDoc.LoadFromStream(stream, FileFormat.Auto);
                spireDoc.SaveToFile("D:\\example.docx", FileFormat.Docx);
            }
        }
    }
}
You can use Microsoft's DocumentFormat.OpenXML library which has a NuGet package.
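As a rough illustration of that suggestion, here is a minimal sketch that writes a one-paragraph DOCX with the Open XML SDK. It builds the document directly and does not convert MigraDoc output, so any content built with MigraDoc would have to be expressed again in these terms. The `DocxWriter` class and the `WriteHelloWorld` method are hypothetical names.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DocxWriter
{
    // Creates a DOCX file containing a single bold paragraph.
    public static void WriteHelloWorld(string path)
    {
        using (WordprocessingDocument doc =
            WordprocessingDocument.Create(path, WordprocessingDocumentType.Document))
        {
            // Every Word document needs a main document part holding the body.
            MainDocumentPart mainPart = doc.AddMainDocumentPart();

            // A run is a span of text that shares the same formatting.
            Run run = new Run(
                new RunProperties(new Bold()),
                new Text("Hello World!"));

            mainPart.Document = new Document(new Body(new Paragraph(run)));
            mainPart.Document.Save();
        }
    }
}
```

Calling `DocxWriter.WriteHelloWorld("example.docx")` should produce a file that Word can open; the output path here is only an example.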
I know there are a lot of questions with the same title, but none of them showed me the right way to solve my issue.
I am using Open XML SDK 2.5 along with PowerTools for Open XML to convert a .docx file to an .html file, using the HtmlConverter class for the conversion.
I can successfully convert the .docx file to HTML, but the HTML file doesn't retain the original formatting of the document; e.g. font size, color, underline, bold, etc. are not reflected in the HTML file.
Here is my existing code:
public void ConvertDocxToHtml(string fileName)
{
    byte[] byteArray = File.ReadAllBytes(fileName);
    using (MemoryStream memoryStream = new MemoryStream())
    {
        // Copy the file into an expandable stream so it can be opened for editing
        memoryStream.Write(byteArray, 0, byteArray.Length);
        using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
        {
            HtmlConverterSettings settings = new HtmlConverterSettings()
            {
                PageTitle = "My Page Title"
            };
            XElement html = HtmlConverter.ConvertToHtml(doc, settings);
            File.WriteAllText(@"E:\Test.html", html.ToStringNewLineOnAttributes());
        }
    }
}
So I just want to know whether there is any way to retain the formatting in the converted HTML file.
I know about some third-party APIs that do the same thing, but I would prefer a solution that uses Open XML or another open-source library.
PowerTools for Open XML just released a new HtmlConverter module. It now contains an open source, free implementation of a conversion from DOCX to HTML formatted with CSS. The module HtmlConverter.cs supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See https://openxmldeveloper.org/
Your end result will not look exactly the way your Word Document turns out, but this link might help.
You might want to find an external tool to help you do this, like Aspose Words
You can use the OpenXML Viewer extension for Firefox to convert with the formatting preserved.
http://openxmlviewer.codeplex.com
This works for me. Hope this helps.
I know there are already questions similar to this one, and that they suggest Open XML and the like.
I am using Open XML, but it works only with inline styles.
Is there any solution to this, or a better way to convert HTML to DOCX than Open XML?
Thanks!
You can inline a CSS file using a tool like the one described here.
Then, to perform the conversion (adapted from Eric White's blog):
using (WordprocessingDocument myDoc =
    WordprocessingDocument.Open("ConvertedDocument.docx", true))
{
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    var chunk = mainPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.Html, altChunkId);
    using (FileStream fileStream = File.Open("YourHtmlDocument.html", FileMode.Open))
    {
        chunk.FeedData(fileStream);
    }
    AltChunk altChunk = new AltChunk() { Id = altChunkId };
    mainPart.Document.Body.InsertAfter(
        altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
    mainPart.Document.Save();
}
This isn't exactly converting HTML to DOCX. It's appending YourHtmlDocument.html to ConvertedDocument.docx. If ConvertedDocument.docx is initially empty this approach is effectively a conversion.
Whenever you use an AltChunk to build a document, your HTML is embedded in the document until the next time the document is opened in Word. At that point, the HTML is converted to WordprocessingML markup. This is really only an issue if the document won't be opened in MS Word. If you are uploading to Google Docs, opening in OpenOffice, or using COM to convert to a PDF, Open XML alone won't be sufficient. In that case, you'll probably need to resort to a paid tool like Aspose.Words.
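For the "initially empty document" case mentioned above, a minimal sketch could create the DOCX from scratch and append the AltChunk straight to the empty body, which avoids needing an existing paragraph to insert after. The `HtmlToDocx` class and its `Convert` method are hypothetical names, and the HTML is assumed to be a complete, UTF-8 encoded document.

```csharp
using System.IO;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class HtmlToDocx
{
    // Creates a new DOCX whose only content is an AltChunk pointing at the HTML.
    public static void Convert(string html, string docxPath)
    {
        using (WordprocessingDocument doc =
            WordprocessingDocument.Create(docxPath, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = doc.AddMainDocumentPart();
            mainPart.Document = new Document(new Body());

            // Store the HTML as an alternative-format part inside the package.
            const string altChunkId = "AltChunkId1";
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.Html, altChunkId);
            using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            {
                chunk.FeedData(htmlStream);
            }

            // The body is empty, so append the reference instead of inserting after a paragraph.
            mainPart.Document.Body.Append(new AltChunk { Id = altChunkId });
            mainPart.Document.Save();
        }
    }
}
```

The same caveat applies here: the HTML stays embedded as-is until Word opens and saves the file, so this sketch only produces real WordprocessingML once Word has processed it.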
I am using iTextSharp to create a PDF document in C#. I would like to attach another file to the PDF, and I'm having a lot of trouble doing so. The examples here show some annotations, which is apparently what attachments are.
This is what I've tried:
writer.AddAnnotation(its.pdf.PdfAnnotation.CreateFileAttachment(writer, new iTextSharp.text.Rectangle(100,100,100,100), "File Attachment", its.pdf.PdfFileSpecification.FileExtern(writer, "C:\\test.xml")));
What happens is that it adds an annotation to the PDF (it appears as a little comment balloon), which I don't want. test.xml is shown in the attachments pane in Adobe Reader, but it can't be read or saved, and its file size is unknown, so it is likely never being properly attached.
Any suggestions?
Well, I got some code working to attach it:
its.Document PDFD = new its.Document(its.PageSize.LETTER);
its.pdf.PdfWriter writer;
writer = its.pdf.PdfWriter.GetInstance(PDFD, new FileStream(targetpath, FileMode.Create));
its.pdf.PdfFileSpecification pfs = its.pdf.PdfFileSpecification.FileEmbedded(writer, "C:\\test.xml", "New.xml", null);
writer.AddFileAttachment(pfs);
where "its" is an alias for iTextSharp.text (using its = iTextSharp.text;)
Now to read the attachment!