Convert SharePoint Library XML document to PDF document using C#

Background
I have a SharePoint 2013 (SP2013) environment where Word Automation Services (WAS) has stopped working, so my application has been failing to convert XML documents to PDF.
Previous Status
I used the OpenXML SDK to convert the InfoPath XML document to a Word document (this works as expected), then converted the Word document to PDF using WAS on SP.
Current Status
WAS has stopped working. My application still converts the XML to Word but never produces the PDF. As a stopgap I am using the C# snippet below to try converting to PDF, but I keep getting the error "Object reference not set to an instance of an object."
...
using Word = Microsoft.Office.Interop.Word;
...
string fileName = generatedDoc; //generatedDoc is Word doc converted from XML
string pdfFileName = fileName.Replace("docx", "pdf");
string sourceUrl = siteUrl + "/DocLibMemo/" + fileName;
string destUrl = siteUrl + "/ApprovedMemoPDF/" + pdfFileName;
Convert(sourceUrl, destUrl, Word.WdSaveFormat.wdFormatPDF);
public static void Convert(string input, string output, Word.WdSaveFormat format)
{
// Create an instance of Word.exe
Word._Application oWord = new Word.Application();
// Make this instance of word invisible (Can still see it in the taskmgr).
oWord.Visible = false;
oWord.ScreenUpdating = false;
// Interop requires objects.
object oMissing = System.Reflection.Missing.Value;
object isVisible = false;
object readOnly = false;
object doNotSaveChanges = Word.WdSaveOptions.wdDoNotSaveChanges;
object oInput = input;
object oOutput = output;
object oFormat = format;
// Load a document into our instance of word.exe
Word._Document oDoc = oWord.Documents.Open(
ref oInput, ref oMissing, ref readOnly,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref isVisible, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
// Make this document the active document.
oDoc.Activate(); // The exception is thrown here
// Save this document using Word
oDoc.SaveAs(ref oOutput, ref oFormat, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing);
// Always close Word.exe.
oWord.Quit(ref oMissing, ref oMissing, ref doNotSaveChanges);
}
The above snippet worked when I tested it in a console application with the Word files on my C drive. Now that the Word files are in a SP library, it doesn't convert them to PDF.

I ran into the same problem. A SharePoint library is addressed by URL and has no physical location on disk: the files are stored in the content database, which is why Word interop cannot open or save files there directly. Office applications handle this differently; they upload the file to the library rather than writing it to a path.
The solution is to work on a local copy: download the Word file, make the changes (here, the PDF conversion) on the local copy, and upload (i.e. copy) the result to the target location in the SharePoint library.
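The local-copy approach could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (SharePointPdfBridge, ConvertViaLocalCopy, ToPdfName) are made up for the example, the library is assumed to accept Windows-authenticated HTTP GET and PUT, and the actual conversion is passed in as a delegate so that the Convert method from the question can be reused on local paths.

```csharp
using System;
using System.IO;
using System.Net;

public static class SharePointPdfBridge
{
    // Swap only the extension, so a "docx" elsewhere in the name is left alone
    // (fileName.Replace("docx", "pdf") would also rewrite "docx-memo.docx").
    public static string ToPdfName(string fileName)
    {
        return Path.ChangeExtension(fileName, ".pdf");
    }

    // convertLocal receives (localSourcePath, localPdfPath), for example
    // (src, dst) => Convert(src, dst, Word.WdSaveFormat.wdFormatPDF).
    public static void ConvertViaLocalCopy(string sourceUrl, string destUrl,
                                           Action<string, string> convertLocal)
    {
        // Work in a throwaway folder so Word only ever sees real file paths.
        string workDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(workDir);
        string localDoc = Path.Combine(workDir, Path.GetFileName(new Uri(sourceUrl).LocalPath));
        string localPdf = ToPdfName(localDoc);
        try
        {
            using (var client = new WebClient())
            {
                client.UseDefaultCredentials = true;        // assumption: Windows auth
                client.DownloadFile(sourceUrl, localDoc);   // 1. take a local copy
                convertLocal(localDoc, localPdf);           // 2. convert locally
                client.UploadData(destUrl, "PUT",           // 3. upload the result
                                  File.ReadAllBytes(localPdf));
            }
        }
        finally
        {
            Directory.Delete(workDir, true);
        }
    }
}
```

With the variables from the question, a call might look like SharePointPdfBridge.ConvertViaLocalCopy(sourceUrl, destUrl, (src, dst) => Convert(src, dst, Word.WdSaveFormat.wdFormatPDF)). Code running on the SharePoint server itself could use the server object model for the download and upload steps instead of HTTP.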
Hope this helps.
Thanks.

Related

Customize Microsoft Word built-in style with numbering

I have a Word document in which a built-in style has already been customized.
I want to change that built-in style by running the C# code below.
// open document
Object oFilePath = "C://Users/myDoc.docx";
Microsoft.Office.Interop.Word._Document myDoc;
myDoc = wrdApp.Documents.Open(ref oFilePath, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing
);
// Change header2 style
myDoc.Styles[WdBuiltinStyle.wdStyleHeading2].Font.Color = WdColor.wdColorOrange;
//save and close doc
myDoc.Save();
Object oFalse = false;
myDoc.Close(ref oFalse, ref oMissing, ref oMissing);
The code successfully changes the color of the heading text, but the number before the text remains green; it is not affected by the code.
Please give me a hint on how to apply the color change to the heading numbering as well. Thank you.

Yes, you can do that with the API.
// To remove the numbering
ListTemplate lt = null;
myDoc.Styles[WdBuiltinStyle.wdStyleHeading2].LinkToListTemplate(lt);
// To add the first level of numbering
ListGallery gallery = wrdApp.ListGalleries[WdListGalleryType.wdNumberGallery];
myDoc.Styles[WdBuiltinStyle.wdStyleHeading2].LinkToListTemplate(gallery.ListTemplates[1]);

How to change excel filename

My question is how to change an Excel file name programmatically instead of renaming it by hand.
For example, I get a list of names in an Excel file from a vendor, who puts it in a specified location. I need to run a program that uses this file to generate a cancellation report as an Excel file named in the format MMYYSP15.
Here is my code; I would like to add this function to it. Kindly advise.
object oMissing = System.Reflection.Missing.Value;
Microsoft.Office.Interop.Excel.ApplicationClass xl = new Microsoft.Office.Interop.Excel.ApplicationClass();
Microsoft.Office.Interop.Excel.Workbook xlBook;
Microsoft.Office.Interop.Excel.Worksheet xlSheet;
//System.Threading.Thread.CurrentThread.CurrentCulture = oldCI;
string laPath = System.IO.Path.GetFullPath("D:\\New & Renewal Summary Report 201409.xls");
xlBook = (Microsoft.Office.Interop.Excel.Workbook)xl.Workbooks.Open(laPath, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing, oMissing);
xlSheet = (Microsoft.Office.Interop.Excel.Worksheet)xlBook.Worksheets.get_Item(1);
xlSheet.Name = "Sheet 1";
xlBook.Save();
xl.Application.Workbooks.Close();

You have to call xlBook.SaveAs. You can't change the current file's name.
xlBook.SaveAs(Filename: yourFileName);
If you do not intend to do anything with the Excel workbook itself, a simple File.Move would do.
File.Move(fromFileName, toFileName);
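As an illustration of the File.Move route, here is a minimal sketch. It assumes that "MMYYSP15" means a two-digit month, a two-digit year and the literal suffix "SP15"; the question does not spell this out, and the ReportRenamer class with its method names is hypothetical.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class ReportRenamer
{
    // Assumption: MMYYSP15 = month + two-digit year + the literal "SP15".
    public static string BuildTargetName(DateTime period, string extension)
    {
        return period.ToString("MMyy", CultureInfo.InvariantCulture) + "SP15" + extension;
    }

    // Renames the file in place (same folder, same extension) and returns the new path.
    public static string Rename(string sourcePath, DateTime period)
    {
        string fullPath = Path.GetFullPath(sourcePath);
        string folder = Path.GetDirectoryName(fullPath);
        string target = Path.Combine(folder, BuildTargetName(period, Path.GetExtension(fullPath)));
        File.Move(fullPath, target); // throws IOException if the target already exists
        return target;
    }
}
```

For the file in the question, ReportRenamer.Rename(laPath, new DateTime(2014, 9, 1)) would produce 0914SP15.xls in the same folder. The workbook has to be closed before the move, since Excel keeps the file locked while it is open.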

How to grammatically parse sentences using Microsoft Word Automation

I have written some code that parses sentences from a Microsoft Word document based on the suggestions found at the following URLs:
Open Word document using C#
Using VBA to parse text in an MS Word document
How to Automate Microsoft Word using C#
I wrote a small function that reads in a document and outputs its sentences via debug statements:
using Microsoft.Office.Interop.Word;
private void button2_Click(object sender, EventArgs e)
{
oWord.Visible = true;
object filename = textBox1.Text;
oDoc = oWord.Documents.Open(filename, ref oMissing, true, false, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
Sentences sentences;
sentences = oDoc.Sentences;
Debug.WriteLine("sentences=" + sentences.ToString());
foreach (Range r in sentences)
{
Debug.WriteLine("range.Text=" + r.Text);
}
}
It does about as good a job as my previous attempt using the Mid function on strings. I was expecting it to do much better, considering that MS Word has grammar-checking capabilities. Is there some way to make use of Word's grammar features to get smarter sentence parsing?

Getting word message box contents using c#

I have a c# application that is opening a word document, running a .bas macro, and closing word. All of that works fine. The macro generates 2 message box dialogs with the result of the macro. I want to communicate these messages to my c# application. How can I do this?
Here is my code:
// Object for missing (or optional) arguments.
object oMissing = System.Reflection.Missing.Value;
// Create an instance of Word, make it visible,
// and open Doc1.doc.
Word.Application oWord = new Word.Application();
oWord.Visible = true;
Word.Documents oDocs = oWord.Documents;
object oFile = @"c:\Macro.docm";
// Open the file.
Word._Document oDoc = oDocs.Open(ref oFile, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing);
// Run the macro.
oWord.GetType().InvokeMember("Run",
System.Reflection.BindingFlags.Default |
System.Reflection.BindingFlags.InvokeMethod,
null, oWord, new Object[] { "MyMacro" });
// Quit Word and clean up.
oDoc.Close(ref oMissing, ref oMissing, ref oMissing);
System.Runtime.InteropServices.Marshal.ReleaseComObject(oDoc);
oDoc = null;
System.Runtime.InteropServices.Marshal.ReleaseComObject(oDocs);
oDocs = null;
oWord.Quit(ref oMissing, ref oMissing, ref oMissing);
System.Runtime.InteropServices.Marshal.ReleaseComObject(oWord);
oWord = null;
return "all done!";

The solution I've used in the past for this is not for the faint of heart. It basically involves using good old-fashioned Windows API calls to find the message box and then enumerating its "windows" (controls) until you find the control with the text you are after.
If the message box always has the same title, you should be able to locate the window using the API call FindWindowEx. Once you have its window handle, you can use EnumChildWindows to run through its controls until you find the one you are after. You can usually identify the right control with GetWindowText, GetClassName, or a combination of both. Generally the text of a control should be available through GetWindowText, but I don't know which control MS used for this particular window.
Good luck!
FindWindowEx example
EnumChildWindows example
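A rough P/Invoke sketch of the approach described above might look like this. It rests on several assumptions: the message box uses the standard dialog window class "#32770", its message lives in a child control of class "Static", and GetWindowText can read that text from another process. The MessageBoxReader class and the ReadStaticTexts method are names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

public static class MessageBoxReader
{
    private delegate bool EnumWindowsProc(IntPtr hWnd, IntPtr lParam);

    [DllImport("user32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern IntPtr FindWindowEx(IntPtr parent, IntPtr childAfter,
                                              string className, string windowTitle);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool EnumChildWindows(IntPtr parent, EnumWindowsProc callback,
                                                IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    private static extern int GetWindowText(IntPtr hWnd, StringBuilder text, int maxCount);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    private static extern int GetClassName(IntPtr hWnd, StringBuilder className, int maxCount);

    // Returns the text of every "Static" child control of the first top-level
    // dialog whose title matches dialogTitle; empty if no such dialog is open.
    public static List<string> ReadStaticTexts(string dialogTitle)
    {
        var texts = new List<string>();
        // Assumption: the message box is a standard dialog (class "#32770").
        IntPtr dialog = FindWindowEx(IntPtr.Zero, IntPtr.Zero, "#32770", dialogTitle);
        if (dialog == IntPtr.Zero) return texts;

        EnumWindowsProc visit = (hWnd, lParam) =>
        {
            var cls = new StringBuilder(256);
            GetClassName(hWnd, cls, cls.Capacity);
            if (string.Equals(cls.ToString(), "Static", StringComparison.OrdinalIgnoreCase))
            {
                var text = new StringBuilder(1024);
                GetWindowText(hWnd, text, text.Capacity);
                if (text.Length > 0) texts.Add(text.ToString());
            }
            return true; // keep enumerating
        };
        EnumChildWindows(dialog, visit, IntPtr.Zero);
        GC.KeepAlive(visit);
        return texts;
    }
}
```

Because the macro call blocks until the message box is dismissed, something like this would probably have to poll from a second thread while the macro runs. If GetWindowText comes back empty for the control, sending WM_GETTEXT is the usual fallback.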

Convert Html to Docx in c# [closed]

I want to convert an HTML page to docx in C#. How can I do it?

My solution uses Html2OpenXml along with DocumentFormat.OpenXml (NuGet package for Html2OpenXml is here) to provide an elegant solution for ASP.NET MVC.
WordHelper.cs
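using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using HtmlToOpenXml; // namespace used by recent Html2OpenXml releases

public static class WordHelper
{
    // Converts an HTML string into the bytes of a .docx document, entirely in memory.
    public static byte[] HtmlToWord(string html)
    {
        using (MemoryStream generatedDocument = new MemoryStream())
        {
            using (WordprocessingDocument package = WordprocessingDocument.Create(
                generatedDocument, WordprocessingDocumentType.Document))
            {
                // A freshly created package has no main part yet, so add one.
                MainDocumentPart mainPart = package.MainDocumentPart;
                if (mainPart == null)
                {
                    mainPart = package.AddMainDocumentPart();
                    new Document(new Body()).Save(mainPart);
                }
                // Parse the HTML into OpenXML elements and append them to the body.
                HtmlConverter converter = new HtmlConverter(mainPart);
                Body body = mainPart.Document.Body;
                var paragraphs = converter.Parse(html);
                for (int i = 0; i < paragraphs.Count; i++)
                {
                    body.Append(paragraphs[i]);
                }
                mainPart.Document.Save();
            }
            // Read the stream only after the package is disposed, so it is complete.
            return generatedDocument.ToArray();
        }
    }
}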
Controller
[HttpPost]
[ValidateInput(false)]
public FileResult Demo(CkEditorViewModel viewModel)
{
return File(WordHelper.HtmlToWord(viewModel.CkEditorContent),
"application/vnd.openxmlformats-officedocument.wordprocessingml.document");
}
I'm using CKEditor to generate HTML for this sample.

The code below does the same thing as Luis's code, but is a bit more readable and is applied to an ASP.NET MVC application:
var word = new Microsoft.Office.Interop.Word.Application();
word.Visible = false;
var filePath = Server.MapPath("~/MyFiles/Html2PdfTest.html");
var savePathPdf = Server.MapPath("~/MyFiles/Html2PdfTest.pdf");
var wordDoc = word.Documents.Open(FileName: filePath, ReadOnly: false);
wordDoc.SaveAs2(FileName: savePathPdf, FileFormat: WdSaveFormat.wdFormatPDF);
You can also save in other formats, such as docx, by reusing the open document:
var savePathDocx = Server.MapPath("~/MyFiles/Html2PdfTest.docx");
wordDoc.SaveAs2(FileName: savePathDocx, FileFormat: WdSaveFormat.wdFormatXMLDocument);
// Close the document and quit Word so no WINWORD.EXE process is left behind.
wordDoc.Close(SaveChanges: false);
word.Quit();

Using that code to convert
Microsoft.Office.Interop.Word.Application word =
new Microsoft.Office.Interop.Word.Application();
Microsoft.Office.Interop.Word.Document wordDoc =
new Microsoft.Office.Interop.Word.Document();
Object oMissing = System.Reflection.Missing.Value;
wordDoc = word.Documents.Add(ref oMissing, ref oMissing, ref oMissing, ref oMissing);
word.Visible = false;
Object filepath = "c:\\page.html";
Object confirmconversion = System.Reflection.Missing.Value;
Object readOnly = false;
Object saveto = "c:\\doc.pdf";
Object oallowsubstitution = System.Reflection.Missing.Value;
wordDoc = word.Documents.Open(ref filepath, ref confirmconversion,
ref readOnly, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing);
object fileFormat = WdSaveFormat.wdFormatPDF;
wordDoc.SaveAs(ref saveto, ref fileFormat, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
ref oMissing, ref oMissing, ref oMissing, ref oallowsubstitution, ref oMissing,
ref oMissing);

The OpenXML SDK allows you to programmatically build docx documents:
OpenXml SDK Download

You might consider using altChunk. See, amongst others, adding images to openxml doc created from altchunk
If you don't want to rely on Word to convert the HTML, you could try docx4j-ImportXHTML for .NET; see this walkthrough.

Aspose.Words for .NET is a commercial component allowing you to achieve this.

MigraDoc can help.
Or using VS tools for Office.
Or connecting to Office via COM.

Microsoft does not recommend running Office applications on a web server.
However, this can be done fairly easily using OpenXML 2.5.
All you really have to do is split the HTML on "<" and ">",
then pass each part through a switch to identify whether it is an HTML tag or not.
For each part you can then convert the HTML tags to "Run" and "RunProperties" elements, while the non-HTML text is simply placed into a "Text" element.
It sounds harder than it is, and I have no idea why there isn't code available to do exactly this.
One thing to keep in mind:
The two formats do not convert cleanly into each other, so if you aim for the cleanest code possible you will run into cases where the formatting itself becomes messy.
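The split-and-switch idea could be sketched as below. This is not the answerer's code, just a minimal illustration with assumptions: it uses a regular expression rather than a literal split on "<" and ">", it only recognizes bold and italic tags, and it produces plain (text, bold, italic) records instead of OpenXML elements. HtmlRuns and TextRun are hypothetical names; each record would map onto a Run whose RunProperties carry the bold and italic flags and whose Text holds the string.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlRuns
{
    public sealed class TextRun
    {
        public string Text;
        public bool Bold;
        public bool Italic;
    }

    public static List<TextRun> Parse(string html)
    {
        var runs = new List<TextRun>();
        int bold = 0, italic = 0; // nesting depth of the open formatting tags

        // The capturing group makes Regex.Split keep the tags as separate parts.
        foreach (string part in Regex.Split(html, "(<[^>]+>)"))
        {
            if (part.Length == 0) continue;

            if (part[0] == '<' && part[part.Length - 1] == '>')
            {
                switch (part.ToLowerInvariant())
                {
                    case "<b>": case "<strong>": bold++; break;
                    case "</b>": case "</strong>": if (bold > 0) bold--; break;
                    case "<i>": case "<em>": italic++; break;
                    case "</i>": case "</em>": if (italic > 0) italic--; break;
                    // Any other tag is ignored in this sketch.
                }
            }
            else
            {
                runs.Add(new TextRun
                {
                    Text = WebUtility.HtmlDecode(part), // "&amp;" becomes "&"
                    Bold = bold > 0,
                    Italic = italic > 0
                });
            }
        }
        return runs;
    }
}
```

Tags with attributes, block elements, lists and tables are where the formats stop mapping cleanly, which is why a maintained converter is usually the safer choice for anything beyond simple inline formatting.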

You may consider using PHPDocX, which offers a very convenient tool for converting HTML files and/or HTML strings into WordML.
It has plenty of options, among them:
You can use CSS-style selectors to filter which chunks of HTML should be inserted into the Word document.
You may choose whether to download the images or leave them as external links.
It parses HTML forms.
You may use native Word styles for tables and paragraphs, overriding the original CSS.
It transforms HTML anchors into Word bookmarks.
Et cetera.
I hope you find it useful :-)
