How can I convert an RTF file to PDF? I have the Adobe PDF printer; should I use it? If so, how can I access it programmatically?
You can use a PDF printer, but then you still have a few problems to solve.
To handle text that spans multiple pages, you need a descendant of RichTextBox that handles the EM_FORMATRANGE message; this article shows how to create one.
There are a lot of (free) PDF printers out there, but I found that only BioPdf lets you control the file name of the output. They also have reasonable rates for licensed versions.
I have used this to create complex reports (combinations of multiple RTF segments and custom graphics) as attachments for emailing.
You could use the virtual printer driver doPDF (http://www.dopdf.com/) if it is permitted on the production machine. It converts more or less any printable file type to PDF, not just RTF, and once installed it simply appears as another printer in Print Manager.
To use it from, say, WinForms code, I adapted the code from the MSDN printing example at http://msdn.microsoft.com/en-us/library/system.drawing.printing.printdocument.aspx:
private void button1_Click(object sender, EventArgs e)
{
try
{
streamToPrint = new System.IO.StreamReader
(@"F:\temp\labTest.txt");
try
{
printFont = new Font("Arial", 10);
PrintDocument pd = new PrintDocument();
pd.PrinterSettings.PrinterName = "doPDF v6";//<-------added
pd.PrintPage += new PrintPageEventHandler
(this.pd_PrintPage);
pd.Print();
}
finally
{
streamToPrint.Close();
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
The only line I needed to add is the one marked above: pd.PrinterSettings.PrinterName = "doPDF v6";
Enumerating the installed printers would be more elegant and robust: you could check that the print driver actually exists, perhaps against a printer name stored in a config file.
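A minimal sketch of such a check. It assumes the installed printer names come from PrinterSettings.InstalledPrinters (System.Drawing.Printing) and that the caller supplies the preferred names, for example "doPDF v6" from the code above or a value read from a config file. PrinterLookup and FindFirstInstalled are hypothetical names, and the case-insensitive comparison is a precaution rather than a documented requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: picks the first preferred PDF printer that is installed.
public static class PrinterLookup
{
    // Returns the installed printer name (with its installed casing) matching the
    // earliest entry in 'preferred', or null if none of them is installed.
    // Names are compared case-insensitively as a precaution (an assumption).
    public static string FindFirstInstalled(IEnumerable<string> installed, params string[] preferred)
    {
        List<string> installedNames = installed.ToList();
        foreach (string wanted in preferred)
        {
            string match = installedNames.FirstOrDefault(
                name => string.Equals(name, wanted, StringComparison.OrdinalIgnoreCase));
            if (match != null)
            {
                return match;
            }
        }
        return null;
    }
}
```

In the button handler this could be called as PrinterLookup.FindFirstInstalled(PrinterSettings.InstalledPrinters.Cast<string>(), "doPDF v6"), setting pd.PrinterSettings.PrinterName and calling pd.Print() only when the result is not null.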
Update:
Multiple pages are handled in the pd_PrintPage method, as in the MSDN sample.
PrintDocument also supports printing a from/to page range.
doPDF automatically pops up a Save As dialog, so the output can be saved as a PDF document.
What about RTF, though?
It seems to be a Microsoft format that is not very well supported. This article, http://msdn.microsoft.com/en-us/library/ms996492.aspx, with demo code, uses the RichTextBox as a starting point and uses P/Invoke to leverage Win32 to print RTF as WYSIWYG. The control defines its own page-length method, replacing the one used in the code snippet above, and still uses PrintDocument, so it should be easy to use. You can assign any RTF through the control's Rtf property (Rtb.Rtf).
An RTF document has to be read and interpreted by some app that can understand that format. You would need to programmatically launch that app, load your RTF file, and send it to the PDF printer. Word would be good for that, since it has a nice .NET interface. An overview of the steps would be:
ApplicationClass word = new ApplicationClass();
Document doc = word.Documents.Open(ref filename, ...);
doc.PrintOut(...);
You will need to use the Microsoft.Office.Interop.Word namespace and add a reference to the Microsoft.Office.Interop.Word.dll assembly.
Actually, none of these is terribly reliable or does what I want. The solution is simple: install Adobe Acrobat and have it open the RTF file using the Process class.
I also found a more reasonable approach: I save the content as an RTF file, then open it in Word and save it as PDF (Word's Save as PDF add-in must be installed).
SaveFileDialog sfd = new SaveFileDialog();
sfd.Filter = "Personal Document File (*.pdf)|*.pdf";
if (sfd.ShowDialog() == DialogResult.OK) {
String filename = Path.GetTempFileName() + ".rtf";
using (StreamWriter sw = new StreamWriter(filename)) {
sw.Write(previous);
}
Object oMissing = System.Reflection.Missing.Value; //null for VB
Object oTrue = true;
Object oFalse = false;
Microsoft.Office.Interop.Word.Application oWord = new Microsoft.Office.Interop.Word.Application();
Microsoft.Office.Interop.Word.Document oWordDoc = new Microsoft.Office.Interop.Word.Document();
oWord.Visible = false;
Object rtfFile = filename;
Object saveLoc = sfd.FileName;
Object wdFormatPDF = 17; //WdSaveFormat Enumeration
oWordDoc = oWord.Documents.Add(ref rtfFile, ref oMissing, ref oMissing, ref oMissing);
oWordDoc.SaveAs(ref saveLoc, ref wdFormatPDF, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
oWordDoc.Close(ref oFalse, ref oMissing, ref oMissing);
oWord.Quit(ref oFalse, ref oMissing, ref oMissing);
//Get the MD5 hash and save it with it
FileStream file = new FileStream(sfd.FileName, FileMode.Open);
MD5 md5 = new MD5CryptoServiceProvider();
byte[] retVal = md5.ComputeHash(file);
file.Close();
using (StreamWriter sw = new StreamWriter(sfd.FileName + ".md5")) {
sw.WriteLine(sfd.FileName + " - " + DateTime.Now.ToLongDateString() + " " + DateTime.Now.ToShortTimeString() + " md5: " + BinaryToHexConverter.To64CharChunks(retVal)[0]);
}
}
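The code above relies on BinaryToHexConverter.To64CharChunks, a helper that isn't shown. A minimal sketch of the same MD5 sidecar step using only the base class library, swapping in BitConverter for the hex encoding; Md5Sidecar, Md5Hex and WriteSidecar are hypothetical names. MD5 is adequate for spotting accidental changes, but it is not collision-resistant, so it should not be relied on to prove a PDF was not tampered with.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical replacement for the hashing part of the snippet above.
public static class Md5Sidecar
{
    // Lowercase hex MD5 of the stream's contents (32 characters).
    public static string Md5Hex(Stream stream)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Hashes 'path' and writes "<path> - <date> <time> md5: <hex>" to "<path>.md5",
    // mirroring the line format used above. Returns the hex digest.
    public static string WriteSidecar(string path)
    {
        string hex;
        using (FileStream file = File.OpenRead(path)) // disposed even if hashing throws
        {
            hex = Md5Hex(file);
        }
        string line = path + " - " + DateTime.Now.ToLongDateString() + " "
            + DateTime.Now.ToShortTimeString() + " md5: " + hex;
        File.WriteAllText(path + ".md5", line + Environment.NewLine);
        return hex;
    }
}
```

In the snippet above, the FileStream/MD5CryptoServiceProvider lines and the StreamWriter block would collapse to Md5Sidecar.WriteSidecar(sfd.FileName).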
Background
I have a SharePoint 2013 (SP2013) environment (env) where Word Automation Services (WAS) has stopped working, so my application has been failing to convert XML documents to PDF.
Previous Status
I use the Open XML SDK to convert the InfoPath XML document to a Word document (this works as expected), and then convert the Word document to PDF using WAS on SharePoint.
Current Status
WAS stopped working. My application converts the XML to Word but never converts it to PDF. As a stopgap, I am using the C# code snippet below to convert to PDF, but I keep getting the error "Object reference not set to an instance of an object."
...
using Word = Microsoft.Office.Interop.Word;
...
string fileName = generatedDoc; //generatedDoc is Word doc converted from XML
string pdfFileName = System.IO.Path.ChangeExtension(fileName, ".pdf"); // Replace("docx", "pdf") would also alter "docx" elsewhere in the name
string sourceUrl = siteUrl + "/DocLibMemo/" + fileName;
string destUrl = siteUrl + "/ApprovedMemoPDF/" + pdfFileName;
Convert(sourceUrl, destUrl, Word.WdSaveFormat.wdFormatPDF);
public static void Convert(string input, string output, Word.WdSaveFormat format)
{
// Create an instance of Word.exe
Word._Application oWord = new Word.Application();
// Make this instance of word invisible (Can still see it in the taskmgr).
oWord.Visible = false;
oWord.ScreenUpdating = false;
// Interop requires objects.
object oMissing = System.Reflection.Missing.Value;
object isVisible = false;
object readOnly = false;
object doNotSaveChanges = Word.WdSaveOptions.wdDoNotSaveChanges;
object oInput = input;
object oOutput = output;
object oFormat = format;
// Load a document into our instance of word.exe
Word._Document oDoc = oWord.Documents.Open(
ref oInput, ref oMissing, ref readOnly,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref isVisible, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
// Make this document the active document.
oDoc.Activate(); // The exception is hit here
// Save this document using Word
oDoc.SaveAs(ref oOutput, ref oFormat, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing);
// Always close Word.exe.
oWord.Quit(ref doNotSaveChanges, ref oMissing, ref oMissing); // SaveChanges is Quit's first parameter
}
The snippet above worked when I tested it in a console application with the Word files on my C: drive. However, now that the Word files are in a SharePoint library, it doesn't convert them to PDF.
I have encountered the same problem. A SharePoint library looks like a network location, but it has no physical location: SharePoint files are actually stored in a database, which is why we cannot write and save files directly into a SharePoint library. MS Office takes a different approach to saving files to a SharePoint library: it uploads them rather than saving them directly.
The solution is to make a local copy of the Word file, make the changes on the local copy, and then upload (i.e. copy) it back to the same location in the SharePoint library.
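A minimal sketch of the first part of that workflow: deciding where the local copy and the converted PDF should live. It assumes the library file is addressed by an absolute URL (as sourceUrl is in the question) and that a writable local work directory is available; the download and upload steps depend on how you talk to SharePoint and are not shown. LocalWorkingCopy and PlanPaths are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper for the "work on a local copy" approach.
public static class LocalWorkingCopy
{
    // Maps a document URL such as https://server/site/DocLibMemo/memo.docx to a
    // local copy (<workDir>/memo.docx) and a local PDF path (<workDir>/memo.pdf).
    public static (string LocalDoc, string LocalPdf) PlanPaths(string documentUrl, string workDir)
    {
        var uri = new Uri(documentUrl);
        // LocalPath is unescaped, so "My%20Memo.docx" becomes "My Memo.docx".
        string fileName = Path.GetFileName(uri.LocalPath);
        string localDoc = Path.Combine(workDir, fileName);
        // ChangeExtension only touches the extension, unlike a plain string Replace.
        string localPdf = Path.ChangeExtension(localDoc, ".pdf");
        return (localDoc, localPdf);
    }
}
```

After downloading the file to LocalDoc, the Convert method from the question (which worked against local files) could write LocalPdf, and that PDF would then be uploaded to the target library.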
Hope this helps.
Thanks.
I'm using Microsoft Interop Word version 15.0.0.0 in order to create a new Word document, insert some text into it, and save it.
When I'm saving it using the following command:
document.SaveAs2(wordFilePath);
the document is saved in format DOCX.
But when I'm saving it using the following command:
document.SaveAs2(wordFilePath, Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatDocument97);
the document is seemingly saved as Word 97 DOC (Windows Explorer displays it with the Word 97 DOC icon and type), but it is actually saved internally as DOCX. I can tell in two ways: it has the same size as the corresponding DOCX, and when I open it in Word 2016 and select Save As, the default save format is DOCX.
How can I save a document in the real Word 97 (.doc) format?
Here's the function used to create a new Word document, whose type depends on the extension (DOC vs. DOCX) of the given file path:
public static void TextToMsWordDocument(string body, string wordFilePath)
{
Microsoft.Office.Interop.Word.Application winword = new Microsoft.Office.Interop.Word.Application();
winword.Visible = false;
object missing = System.Reflection.Missing.Value;
Microsoft.Office.Interop.Word.Document document = winword.Documents.Add(ref missing, ref missing, ref missing, ref missing);
if (body != null)
{
document.Content.SetRange(0, 0);
document.Content.Text = (body + System.Environment.NewLine);
}
if (System.IO.Path.GetExtension(wordFilePath).ToLower() == "doc")
document.SaveAs2(wordFilePath, Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatDocument97);
else // Assuming a "docx" extension:
document.SaveAs2(wordFilePath);
document.Close(ref missing, ref missing, ref missing);
document = null;
winword.Quit(ref missing, ref missing, ref missing);
winword = null;
}
And here's the code used to call this function:
TextToMsWordDocument("abcdefghijklmnopqrstuvwxyz", "text.doc");
TextToMsWordDocument("abcdefghijklmnopqrstuvwxyz", "text.docx");
It was a rather silly error: Path.GetExtension returns the extension including the leading dot, so the comparison should be == ".doc" rather than == "doc".
I didn't notice it because, when SaveAs2 received a file path with a .doc extension and no WdSaveFormat, it (strangely enough) created a Word file with exactly the problem I described above.
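A minimal sketch of the corrected check. Path.GetExtension returns the extension with its leading dot, so the comparison has to be against ".doc"; the case-insensitive comparison (so REPORT.DOC also matches) goes slightly beyond the original fix. WordFormatChooser and IsLegacyDoc are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper for choosing the save format from the file extension.
public static class WordFormatChooser
{
    // True for "*.doc" (any case); false for "*.docx" and everything else.
    // Path.GetExtension("text.doc") returns ".doc", including the dot.
    public static bool IsLegacyDoc(string path)
    {
        return string.Equals(Path.GetExtension(path), ".doc", StringComparison.OrdinalIgnoreCase);
    }
}
```

In TextToMsWordDocument the branch would then test WordFormatChooser.IsLegacyDoc(wordFilePath) before saving with wdFormatDocument97; the else branch could pass WdSaveFormat.wdFormatXMLDocument explicitly instead of relying on the default.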
I have put together a simple C# WinForms application that queries AS400 data based on user criteria and returns the results in a ListView control. On Button_Click() I then store the headers and data in a .txt file. Below, I am trying to use that .txt file as the data source for a mail merge Word document.
When the code reaches oWrdDoc = oWord.Documents.Open(oTemplatePath); the program seems to just freeze. Nothing happens and I cannot step to the next line. Does anyone have ideas about what I am doing wrong?
public void Print(string docLoc, string docSource)
{
try
{
Word.Application oWord = new Word.Application();
Word.Document oWrdDoc = new Word.Document();
oWord.Visible = true;
Object oTemplatePath = "C:\\Users\\NAME\\Desktop\\B-AIAddChgDual10-06-NEW.doc";
oWrdDoc = oWord.Documents.Open(oTemplatePath);
Object oMissing = System.Reflection.Missing.Value;
oWrdDoc.MailMerge.OpenDataSource("C:\\Users\\NAME\\Desktop\\Test2.txt", oMissing, oMissing, oMissing,
oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing);
oWrdDoc.MailMerge.Execute();
}
catch (Exception ex)
{
MessageBox.Show("Source:\t" + ex.Source + "\nMessage: \t" + ex.Message + "\nData:\t" + ex.Data);
}
finally
{
//
}
}
EDIT: It turns out I did not have the new Word instance set to Visible = true, so when the instance brought up a dialog saying the file was locked for editing (by myself?), it was prompting me to open a read-only version that I could not see, which made it look as if everything had frozen. I've modified my code above to reflect the changes.
Any ideas for why I'm locked out of my own file, and how to prevent this?
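These are the dialogs I receive after agreeing to open a read-only document (in order):
[Screenshots not preserved: the dialogs shown after accepting the read-only copy, the result after selecting how to replace the fields, the original mail merge fields, and the fields after my own selections.]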
How do I tell the Word Application to use the '!' character as the Field Delimiter in my C# code?
Also, how do I proceed with the dialogs? I'm assuming I receive each one due to my datasource not containing fields matching those listed as Mail Merge Fields?
Here are my Mail Merge Fields:
-fuldate
-sys
-memno
-name
-address1
-address2
-address3
-sal
And here are my Delimited fields from my .txt DataSource file:
memno!name!addr1!addr2!city!state!zip!old_addr1!old_addr2!old_city!old_state!old_zip
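One way to test the assumption that the dialogs come from fields that don't match: split the header line of the '!'-delimited file and list the merge fields that have no matching column. This is only a diagnostic sketch under that assumption; it does not tell Word which delimiter to use. MergeFieldCheck and MissingFields are hypothetical names, and the case-insensitive match is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical diagnostic: which mail merge fields have no column in the data source?
public static class MergeFieldCheck
{
    // Splits the header line on 'delimiter' and returns, in their original order,
    // the merge fields that do not match any column name (case-insensitive).
    public static List<string> MissingFields(string headerLine, char delimiter, IEnumerable<string> mergeFields)
    {
        var columns = new HashSet<string>(
            headerLine.Split(delimiter).Select(c => c.Trim()),
            StringComparer.OrdinalIgnoreCase);
        return mergeFields.Where(f => !columns.Contains(f)).ToList();
    }
}
```

For the fields and header shown here, the result is fuldate, sys, address1, address2, address3 and sal; only memno and name line up.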
You can set OpenAndRepair to true when opening the document.
Replace this:
oWrdDoc = oWord.Documents.Open(oTemplatePath);
With the following:
oWrdDoc = oWord.Documents.Open(oTemplatePath, OpenAndRepair: true);
How can I copy the contents of one Word document and insert them into another, pre-existing Word document using C#? I've had a look around, but everything looks really complicated (I'm a newbie). Surely there must be an easy solution?
I have found this code, which gives me no errors, but it doesn't seem to do anything. It's certainly not copying into the correct Word document, put it that way.
Word.Application oWord = new Word.Application();
Word.Document oWordDoc = new Word.Document();
Object oMissing = System.Reflection.Missing.Value;
oWordDoc = oWord.Documents.Add(ref oTemplatePath, ref oMissing, ref oMissing, ref oMissing);
oWordDoc.ActiveWindow.Selection.WholeStory();
oWordDoc.ActiveWindow.Selection.Copy();
oWord.ActiveDocument.Select();
oWordDoc.ActiveWindow.Selection.PasteAndFormat(Word.WdRecoveryType.wdPasteDefault);
P.S. These Word documents are .doc files.
Word.Application oWord = new Word.Application();
Word.Document oWordDoc = new Word.Document();
Object oMissing = System.Reflection.Missing.Value;
object oTemplatePath = @"C:\Documents and Settings\Student\Desktop\ExportFiles\" + "The_One.docx";
oWordDoc = oWord.Documents.Add(ref oTemplatePath, ref oMissing, ref oMissing, ref oMissing);
oWordDoc.ActiveWindow.Selection.WholeStory();
oWordDoc.ActiveWindow.Selection.Copy();
oWord.ActiveDocument.Select();
// The Visible flag is what you missed. You actually succeeded in making
// the copy, but because your Word app remained hidden and the newly created
// document was unsaved, you could not see the results.
oWord.Visible = true;
oWordDoc.ActiveWindow.Selection.PasteAndFormat(Word.WdRecoveryType.wdPasteDefault);
It's funny to see all the C# developers now asking the questions that VBA developers have been answering for 15 years. If you have to work on Microsoft Office automation, it's worth digging into VB6 and VBA code samples.
As for "nothing happens", it's simple: if you start Word through automation, you must also make the application visible. Your code does work, but Word remains an invisible instance (open Windows Task Manager to see it).
As for an "easy solution", you can try inserting a document at a given range with the range's InsertFile method, e.g. like this:
static void Main(string[] args)
{
Word.Application oWord = new Word.Application();
oWord.Visible = true; // shows Word application
Word.Document oWordDoc = new Word.Document();
Object oMissing = System.Reflection.Missing.Value;
oWordDoc = oWord.Documents.Add(ref oMissing);
Word.Range r = oWordDoc.Range();
r.InsertAfter("Some text added through automation!");
r.InsertParagraphAfter();
r.InsertParagraphAfter();
r.Collapse(Word.WdCollapseDirection.wdCollapseEnd); // Moves range at the end of the text
string path = @"C:\Temp\Letter.doc";
// Insert the whole Word document at the given range, omitting the page layout
// of the inserted document (if it doesn't contain section breaks)
r.InsertFile(path, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
}
NOTE: I was using .NET Framework 4.0 for this example, which allows optional parameters.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I want to convert an HTML page to DOCX in C#. How can I do it?
My solution uses Html2OpenXml along with DocumentFormat.OpenXml (NuGet package for Html2OpenXml is here) to provide an elegant solution for ASP.NET MVC.
WordHelper.cs
using System;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using HtmlToOpenXml;
public static class WordHelper
{
public static byte[] HtmlToWord(String html)
{
using (MemoryStream generatedDocument = new MemoryStream())
{
using (WordprocessingDocument package = WordprocessingDocument.Create(
generatedDocument, WordprocessingDocumentType.Document))
{
MainDocumentPart mainPart = package.MainDocumentPart;
if (mainPart == null)
{
mainPart = package.AddMainDocumentPart();
new Document(new Body()).Save(mainPart);
}
HtmlConverter converter = new HtmlConverter(mainPart);
Body body = mainPart.Document.Body;
var paragraphs = converter.Parse(html);
for (int i = 0; i < paragraphs.Count; i++)
{
body.Append(paragraphs[i]);
}
mainPart.Document.Save();
}
return generatedDocument.ToArray();
}
}
}
Controller
[HttpPost]
[ValidateInput(false)]
public FileResult Demo(CkEditorViewModel viewModel)
{
return File(WordHelper.HtmlToWord(viewModel.CkEditorContent),
"application/vnd.openxmlformats-officedocument.wordprocessingml.document");
}
I'm using CKEditor to generate HTML for this sample.
The code below does the same thing as Luis's code, but is a bit more readable and applied to an ASP.NET MVC application:
var word = new Microsoft.Office.Interop.Word.Application();
word.Visible = false;
var filePath = Server.MapPath("~/MyFiles/Html2PdfTest.html");
var savePathPdf = Server.MapPath("~/MyFiles/Html2PdfTest.pdf");
var wordDoc = word.Documents.Open(FileName: filePath, ReadOnly: false);
wordDoc.SaveAs2(FileName: savePathPdf, FileFormat: WdSaveFormat.wdFormatPDF);
You can also save in other formats, such as DOCX, like this:
var savePathDocx = Server.MapPath("~/MyFiles/Html2PdfTest.docx");
var wordDoc = word.Documents.Open(FileName: filePath, ReadOnly: false);
wordDoc.SaveAs2(FileName: savePathDocx, FileFormat: WdSaveFormat.wdFormatXMLDocument);
Use the following code to convert:
Microsoft.Office.Interop.Word.Application word =
new Microsoft.Office.Interop.Word.Application();
Microsoft.Office.Interop.Word.Document wordDoc =
new Microsoft.Office.Interop.Word.Document();
Object oMissing = System.Reflection.Missing.Value;
wordDoc = word.Documents.Add(ref oMissing, ref oMissing, ref oMissing, ref oMissing);
word.Visible = false;
Object filepath = "c:\\page.html";
Object confirmconversion = System.Reflection.Missing.Value;
Object readOnly = false;
Object saveto = "c:\\doc.pdf";
Object oallowsubstitution = System.Reflection.Missing.Value;
wordDoc = word.Documents.Open(ref filepath, ref confirmconversion,
ref readOnly, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing);
object fileFormat = WdSaveFormat.wdFormatPDF;
wordDoc.SaveAs(ref saveto, ref fileFormat, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oallowsubstitution, ref oMissing,
ref oMissing);
The OpenXML SDK allows you to programmatically build docx documents:
OpenXml SDK Download
You might consider using altChunk. See, amongst others, adding images to openxml doc created from altchunk
If you don't want to rely on Word to convert the HTML, you could try docx4j-ImportXHTML for .NET; see this walkthrough.
Aspose.Words for .NET is a commercial component allowing you to achieve this.
MigraDoc can help.
Or using VS tools for Office.
Or connecting to Office via COM.
Using office applications on the web server is not recommended by Microsoft.
However, this can be done fairly easily using Open XML SDK 2.5.
All you really have to do is split the HTML on "<" and ">",
then put each part through a switch to identify whether or not it is an HTML tag.
Then, for each part, you can start converting the HTML into Run and RunProperties elements, and the non-HTML text is simply placed into a Text element.
It sounds harder than it is... and yes, I have no idea why there isn't code available to do exactly this.
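A minimal, dependency-free sketch of the approach just described, under stated assumptions: the HTML is simple and well-formed (attributes, comments, scripts and entities are not handled), and only b/strong and i/em are recognized. It stops at producing (text, bold, italic) segments; turning each segment into an Open XML Run with RunProperties is left out. HtmlSplitter, Split and ToRuns are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class HtmlSplitter
{
    // A token is either a tag such as "b" or "/b", or a run of plain text.
    public sealed class Token
    {
        public bool IsTag { get; }
        public string Value { get; }
        public Token(bool isTag, string value) { IsTag = isTag; Value = value; }
    }

    // Splits on '<' and '>': whatever lies between them is a tag, the rest is text.
    // An unmatched '<' and everything after it is treated as text.
    public static List<Token> Split(string html)
    {
        var tokens = new List<Token>();
        int pos = 0;
        while (pos < html.Length)
        {
            int open = html.IndexOf('<', pos);
            if (open < 0)
            {
                tokens.Add(new Token(false, html.Substring(pos)));
                break;
            }
            if (open > pos)
            {
                tokens.Add(new Token(false, html.Substring(pos, open - pos)));
            }
            int close = html.IndexOf('>', open + 1);
            if (close < 0)
            {
                tokens.Add(new Token(false, html.Substring(open)));
                break;
            }
            tokens.Add(new Token(true, html.Substring(open + 1, close - open - 1).Trim()));
            pos = close + 1;
        }
        return tokens;
    }

    // The "switch" step: track bold/italic nesting and attach it to each text token.
    // Each resulting segment would become a Run whose RunProperties carry Bold/Italic.
    public static List<(string Text, bool Bold, bool Italic)> ToRuns(List<Token> tokens)
    {
        var runs = new List<(string Text, bool Bold, bool Italic)>();
        int bold = 0, italic = 0;
        foreach (Token t in tokens)
        {
            if (!t.IsTag)
            {
                runs.Add((t.Value, bold > 0, italic > 0));
                continue;
            }
            switch (t.Value.ToLowerInvariant())
            {
                case "b": case "strong": bold++; break;
                case "/b": case "/strong": bold = Math.Max(0, bold - 1); break;
                case "i": case "em": italic++; break;
                case "/i": case "/em": italic = Math.Max(0, italic - 1); break;
                default: break; // other tags are ignored in this sketch
            }
        }
        return runs;
    }
}
```

For "Hello <b>bold</b> world", Split yields five tokens and ToRuns yields three segments, with only "bold" marked as bold.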
Things to keep in mind:
The two formats do not convert cleanly into each other, so if you aim for the cleanest code possible, you will run into cases where the format itself becomes messy.
You may consider using PHPDocX, which offers a very convenient tool for converting HTML files and/or HTML strings into WordML.
It has plenty of options, among them:
You can use CSS-style selectors to filter which chunks of HTML are inserted into the Word document.
You can choose whether to download images or leave them as external links.
It parses HTML forms.
You can use native Word styles for tables and paragraphs, overriding the original CSS.
It transforms HTML anchors into Word bookmarks.
Etcetera.
I hope you find it useful :-)