Inserting word content into a VSTO document level customization - c#

I have a VSTO document level customization that performs specific functionality when opened from within our application. Basically, we open normal documents from inside of our application and I copy the content from the normal docx file into the VSTO document file which is stored inside of our database.
var app = new Microsoft.Office.Interop.Word.Application();
var docs = app.Documents;
var vstoDoc = docs.Open(vstoDocPath);
var doc = docs.Open(currentDocPath);
doc.Range().Copy();
vstoDoc.Range().PasteAndFormat(WdRecoveryType.wdFormatOriginalFormatting);
Everything works, but the code above drops certain formatting from the document. The code below fixes the issues I have found so far, but there will most likely be more, and I would have to address them one by one as I come across them ...
for (int i = 0; i < doc.Sections.Count; i++)
{
var footerFont = doc.Sections[i + 1].Footers.GetEnumerator();
var headerFont = doc.Sections[i + 1].Headers.GetEnumerator();
var footNoteFont = doc.Footnotes.GetEnumerator();
foreach (HeaderFooter foot in vstoDoc.Sections[i + 1].Footers)
{
footerFont.MoveNext();
foot.Range.Font.Name = ((HeaderFooter)footerFont.Current).Range.Font.Name;
}
foreach (HeaderFooter head in vstoDoc.Sections[i + 1].Headers)
{
headerFont.MoveNext();
head.Range.Font.Name = ((HeaderFooter)headerFont.Current).Range.Font.Name;
}
foreach (Footnote footNote in vstoDoc.Footnotes)
{
footNoteFont.MoveNext();
footNote.Range.Font.Name = ((Footnote)footNoteFont.Current).Range.Font.Name;
}
}
I need a foolproof, safe way of copying the content of one docx file to another docx file while preserving formatting and eliminating the risk of corrupting the document. I've tried using reflection to copy the properties of one document onto the other, but the code starts to look ugly and I always worry that some of the properties I'm setting may have undesirable side effects. I've also tried unzipping the docx files, editing the XML manually and rezipping afterwards. This hasn't worked well: I ended up corrupting a few of the documents in the process.
If anyone has dealt with a similar issue in the past, please could you point me in the right direction.
Thank you for your time

This code copies and keeps source formatting.
bookmark.Range.Copy();
Document newDocument = WordInstance.Documents.Add();
newDocument.Activate();
newDocument.Application.CommandBars.ExecuteMso("PasteSourceFormatting");

There is a more elegant way to do it, based on Range.ImportFragment:
Globals.ThisAddIn.Application.ActiveDocument.Range().ImportFragment(filePath);
or, to import into the current selection range instead:
Globals.ThisAddIn.Application.Selection.Range.ImportFragment(filePath);
where filePath is the path to the document you are copying from.

Receiving message of issue with content in Excel File

I am currently building an Excel file by hand using OpenXml. I'm in the process of adding the sheets, however, I have come across an issue. I have a loop that adds the names of each sheet in but once it runs and I try to open the file, I get the following message:
"We found a problem with some content in 'FileName.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, Click Yes."
I think the issue might be that I am adding the name of each sheet using a string variable. When I take it out and add something else, it works. Below is the code where I loop through and add my sheets.
//Technology Areas
foreach (DataRow dr in techAreaDS.Rows)
{
var data = dr["TechAreaName"].ToString().Split('-');
var techArea = data[2].TrimStart();
var techAreaSheet = new Sheet { Id = workbookPart.GetIdOfPart(worksheetPart),
SheetId = sheetId, Name = techArea };
sheets.Append(techAreaSheet);
sheetId++;
}
I've seen people mention it is an issue with cells having strings that can be converted into numbers, but in this case, the string will always be a string. Any help would be appreciated.
EDIT: I've figured out the problem. The issue is the Name property has a Max Length of 31. One of my items has a 42 length, hence the error. I did find a cool set of code to validate my OpenXml. Link.
UPDATE:
Oddly enough, someone thinks this question was about finding some code to help validate what I was doing. It was not... The question is clear: why was I receiving an error when trying to name sheets. I was not asking for validation code, though I found some.
I do ask that if you wish to help, please read the question versus assume what I was asking, and if you don't know what I wish to have answered, ask...
In order to find out the issue(s) causing this error, you need to validate the generated document.
Besides using the built in validation method as described here, which doesn't show you all issues as I found out, I suggest that you download and install Microsoft's Open XML SDK 2.5 for Microsoft Office.
It contains Microsoft's Open XML SDK 2.5 Productivity Tool, which is very helpful here:
1. Create a copy of the damaged XLSX file and apply the fixes Microsoft Excel suggests (suppose you have the files FileName_corrupt.xlsx and FileName_fixed.xlsx).
2. Run Microsoft's Open XML SDK 2.5 Productivity Tool, open FileName_corrupt.xlsx, select "Compare Files" and specify the second file, FileName_fixed.xlsx. This allows you to compare the XML structure of both files.
3. Let the Productivity Tool generate C# code from both files: open them first, then right-click on the root level and select "Reflect Code". This creates C# code that generates the same file. Save both C# code versions (i.e. FileName_corrupt.cs and FileName_fixed.cs).
4. Now you can compare the differences via Visual Studio. Either use
devenv.exe /diff FileName_corrupt.cs FileName_fixed.cs
to compare them, or use the batch file I've created to launch the VS compare. This is a hidden feature in Visual Studio that lets you compare two local files that are not part of TFS.
This way you should be able to work out the differences and fix your code.
NOTE: For a first validation, I suggest using the validation code. Only if it still fails, use the steps above. For validation you can use
public static string ValidateOpenXmlDocument(OpenXmlPackage pXmlDoc, bool throwExceptionOnValidationFail=false)
{
using (var docToValidate = pXmlDoc)
{
var validator = new DocumentFormat.OpenXml.Validation.OpenXmlValidator();
var validationErrors = validator.Validate(docToValidate).ToList();
var errors = new System.Text.StringBuilder();
if (validationErrors.Any())
{
var errorMessage = string.Format("ValidateOpenXmlDocument: {0} validation error(s) with document", validationErrors.Count);
errors.AppendLine(errorMessage);
errors.AppendLine();
}
foreach (var error in validationErrors)
{
errors.AppendLine("Description: " + error.Description);
errors.AppendLine("ErrorType: " + error.ErrorType);
errors.AppendLine("Node: " + error.Node);
errors.AppendLine("Path: " + error.Path.XPath);
errors.AppendLine("Part: " + error.Part.Uri);
if (error.RelatedNode != null)
{
errors.AppendLine("Related Node: " + error.RelatedNode);
errors.AppendLine("Related Node Inner Text: " + error.RelatedNode.InnerText);
}
errors.AppendLine();
errors.AppendLine("==============================");
errors.AppendLine();
}
if (validationErrors.Any() && throwExceptionOnValidationFail)
{
throw new Exception(errors.ToString());
}
if (errors.Length > 0)
{
System.Diagnostics.Debug.WriteLine(errors.ToString());
}
return errors.ToString();
}
}
along with
public static void ValidateExcelDocument(string fileName)
{
using (var xlsx = SpreadsheetDocument.Open(fileName, true))
{
ValidateOpenXmlDocument(xlsx);
}
}
With a slight modification, you can easily use the code above for Microsoft Word validation too:
public static void ValidateWordDocument(string fileName)
{
using (var docx = WordprocessingDocument.Open(fileName, true))
{
ValidateOpenXmlDocument(docx);
}
}
I've figured out the problem. The issue is the Name property has a Max Length of 31 characters. The text I'm trying to use sometimes exceeds that limit (one has 42 characters). I also found a pretty cool set of code to validate my Open Xml to find out what the specific issue is. Link
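A minimal sketch of guarding against this before creating each Sheet, assuming you are happy to truncate long names. The 31-character limit is the one described above; the list of forbidden characters is Excel's usual sheet-name restriction and is my addition, and the SheetNames class and Sanitize method are hypothetical names, not part of the Open XML SDK. Truncation can make two names collide, which this sketch does not handle.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the Open XML SDK.
public static class SheetNames
{
    // Maximum sheet name length, as found in the post above.
    private const int MaxLength = 31;

    // Characters Excel does not accept in a sheet name (assumption added here).
    private static readonly char[] Invalid = { ':', '\\', '/', '?', '*', '[', ']' };

    public static string Sanitize(string raw)
    {
        // Fall back to a placeholder when there is nothing usable.
        if (string.IsNullOrWhiteSpace(raw))
        {
            return "Sheet";
        }

        // Replace forbidden characters with an underscore.
        var cleaned = new string(
            raw.Trim().Select(c => Invalid.Contains(c) ? '_' : c).ToArray());

        // Truncate to the maximum length.
        return cleaned.Length <= MaxLength ? cleaned : cleaned.Substring(0, MaxLength);
    }
}
```

In the loop from the question, this would be used as Name = SheetNames.Sanitize(techArea).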

Set BaseUrl of an existing Pdf Document

We're having trouble setting a BaseUrl using iTextSharp. We used Adobe's implementation for this in the past, but we ran into severe performance issues, so we switched to iTextSharp, which is approximately 10 times faster.
Adobe enabled us to set a base URL for each document. We really need this in order to deploy our documents on different servers, but we can't seem to find the right code to do it.
This code is what we used with Adobe:
public bool SetBaseUrl(object jso, string baseUrl)
{
try
{
object result = jso.GetType().InvokeMember("baseURL", BindingFlags.SetProperty, null, jso, new Object[] {baseUrl });
return result != null;
}
catch
{
return false;
}
}
A lot of solutions describe how to insert links in new or empty documents. But our documents already exist and contain more than just text. We want to overlay specific words with a link that leads to one or more other documents. Therefore, it's really important to us that we can insert a link without accessing the text itself. Maybe lay a box on top of these words and set its position (since we know where the words are located in the document).
We have tried different implementations using the setAction method, but it doesn't seem to work properly. In most cases the result was that we saw our box, but there was no link inside or associated with it (the cursor didn't change and nothing happened when I clicked inside the box).
Any help is appreciated.
I've made you a couple of examples.
First, let's take a look at BaseURL1. In your comment, you referred to JavaScript, so I created a document to which I added a snippet of document-level JavaScript:
writer.addJavaScript("this.baseURL = \"http://itextpdf.com/\";");
This works perfectly in Adobe Acrobat, but when you try this in Adobe Reader, you get the following error:
NotAllowedError: Security settings prevent access to this property or
method. Doc.baseURL:1:Document-Level:0000000000000000
This is consistent with the JavaScript reference for Acrobat where it is clearly indicated that special permissions are needed to change the base URL.
So instead of following your suggested path, I consulted ISO-32000-1 (which was what I asked you to do, but... I've beaten you in speed).
I discovered that you can add a URI dictionary to the catalog with a Base entry. So I wrote a second example, BaseURL2, where I add this dictionary to the root dictionary of the PDF:
PdfDictionary uri = new PdfDictionary(PdfName.URI);
uri.put(new PdfName("Base"), new PdfString("http://itextpdf.com/"));
writer.getExtraCatalog().put(PdfName.URI, uri);
Now the BaseURL works in both Acrobat and Reader.
Assuming that you want to add a BaseURL to existing documents, I wrote BaseURL3. In this example, we add the same dictionary to the root dictionary of an existing PDF:
PdfReader reader = new PdfReader(src);
PdfDictionary uri = new PdfDictionary(PdfName.URI);
uri.put(new PdfName("Base"), new PdfString("http://itextpdf.com/"));
reader.getCatalog().put(PdfName.URI, uri);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
Using this code, you can change a link that points to "index.php" (base_url.pdf) into a link that points to "http://itextpdf.com/index.php" (base_url_3.pdf).
Now you can replace your Adobe license with a less expensive iTextSharp license ;-)

YASR - Yet another search and replace question

Environment: asp.net c# openxml
Ok, so I've been reading a ton of snippets and trying to recreate the wheel, but I'm hoping that someone can help me get to my destination faster. I have multiple documents that I need to merge together... check... I'm able to do that with the Open XML SDK. Birds are singing, sun is shining so far. Now that I have the document the way I want it, I need to search and replace text and/or content controls.
I've tried using my own text - {replace this} - but when I look at the XML (rename docx to zip and view the file), the { is nowhere near the text. So I either need to know how to protect that within the document so they don't diverge, or I need to find another way to search and replace.
I'm able to search/replace if it is an XML file, but then I'm back to not being able to combine the documents easily.
Code below... and as I mentioned... document merge works fine... just need to replace stuff.
* Update * changed my replace call to go after the tag instead of regex. I have the right info now, but the .Replace call doesn't seem to want to work. Last four lines are for validation that I was seeing the right tag contents. I simply want to replace those contents now.
protected void exeProcessTheDoc(object sender, EventArgs e)
{
string doc1 = Server.MapPath("~/Templates/doc1.docx");
string doc2 = Server.MapPath("~/Templates/doc2.docx");
string final_doc = Server.MapPath("~/Templates/extFinal.docx");
File.Delete(final_doc);
File.Copy(doc1, final_doc);
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(final_doc, true))
{
string altChunkId = "AltChunkId2";
MainDocumentPart mainPart = myDoc.MainDocumentPart;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
using (FileStream fileStream = File.Open(doc2, FileMode.Open))
chunk.FeedData(fileStream);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
mainPart.Document.Save();
}
exeSearchReplace(final_doc);
}
public static void GetPropertyFromDocument(string document, string outdoc)
{
XmlDocument xmlProperties = new XmlDocument();
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, false))
{
ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
xmlProperties.Load(appPart.GetStream());
}
XmlNodeList chars = xmlProperties.GetElementsByTagName("Company");
chars.Item(0).InnerText.Replace("{ClientName}", "Penn Inc.");
StreamWriter sw;
sw = File.CreateText(outdoc);
sw.WriteLine(chars.Item(0).InnerText);
sw.Close();
}
}
}
If I'm reading this right, you have something like "{replace me}" in a .docx and then when you loop through the XML, you're finding things like <t>{replace</t><t> me</t><t>}</t> or some such havoc. Now, with XML like that, it's impossible to create a routine that will replace "{replace me}".
If that's the case, then it's very, very likely related to the fact that it's considered a proofing error. i.e. it's misspelled as far as Word is concerned. The cause of it is that you've opened the document in Word and have proofing turned on. As such, the text is marked as "isDirty" and split up into different runs.
There are two ways to fix this:
Client-side. In Word, just make sure all proofing errors are either corrected or ignored.
Format-side. Use the MarkupSimplifier tool that is part of Open XML Package Editor Power Tool for Visual Studio 2010 to fix this outside of the client. Eric White has a great (and timely for you - just a few days old) write up here on it: Getting Started with Open XML PowerTools Markup Simplifier
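Separately, on the update in the question about the .Replace call: string.Replace returns a new string and leaves the original untouched, so the result has to be assigned back to InnerText. A minimal sketch, assuming the properties XML has already been loaded as a string; the CompanyReplaceDemo class and ReplaceCompany method are hypothetical names used only for illustration. To persist the change in the .docx, the modified XML would still have to be written back to the part of a document opened for editing.

```csharp
using System;
using System.Xml;

// Hypothetical helper for illustration only.
public static class CompanyReplaceDemo
{
    public static string ReplaceCompany(string propertiesXml, string placeholder, string value)
    {
        var doc = new XmlDocument();
        doc.LoadXml(propertiesXml);

        XmlNode company = doc.GetElementsByTagName("Company").Item(0);
        if (company == null)
        {
            // No Company element: return the XML unchanged.
            return doc.OuterXml;
        }

        // Strings are immutable, so assign the result of Replace back to the node.
        company.InnerText = company.InnerText.Replace(placeholder, value);
        return doc.OuterXml;
    }
}
```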
If you want to search and replace text in a WordprocessingML document, there is a fairly easy algorithm that you can use:
Break all runs into runs of a single character. This includes runs that have special characters such as a line break, carriage return, or hard tab.
It is then pretty easy to find a set of runs that match the characters in your search string.
Once you have identified a set of runs that match, then you can replace that set of runs with a newly created run (which has the run properties of the run containing the first character that matched the search string).
After replacing the single-character runs with a newly created run, you can then consolidate adjacent runs with identical formatting.
I've written a blog post and recorded a screen-cast that walks through this algorithm.
Blog post: http://openxmldeveloper.org/archive/2011/05/12/148357.aspx
Screen cast: http://www.youtube.com/watch?v=w128hJUu3GM
-Eric
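The four steps above can be sketched without the Open XML SDK at all. This is only an illustration of the algorithm, not the code from the blog post: SimpleRun is a hypothetical stand-in for a WordprocessingML run (its Format string stands in for the run properties), and RunReplacer is a hypothetical name. Real documents also need handling for special run content such as breaks and tabs, which this sketch ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a run: some text plus a key describing its formatting.
public sealed class SimpleRun
{
    public SimpleRun(string text, string format) { Text = text; Format = format; }
    public string Text { get; }
    public string Format { get; }
}

public static class RunReplacer
{
    public static List<SimpleRun> Replace(IEnumerable<SimpleRun> runs, string search, string replacement)
    {
        if (string.IsNullOrEmpty(search))
            throw new ArgumentException("Search text must not be empty.", nameof(search));
        replacement = replacement ?? string.Empty;

        // Step 1: break every run into single-character runs.
        var chars = runs
            .SelectMany(r => r.Text.Select(c => new SimpleRun(c.ToString(), r.Format)))
            .ToList();

        // Steps 2 and 3: find matching sequences and replace each with one new run
        // that takes the formatting of the first matched character.
        var replaced = new List<SimpleRun>();
        int i = 0;
        while (i < chars.Count)
        {
            if (MatchesAt(chars, i, search))
            {
                if (replacement.Length > 0)
                    replaced.Add(new SimpleRun(replacement, chars[i].Format));
                i += search.Length;
            }
            else
            {
                replaced.Add(chars[i]);
                i++;
            }
        }

        // Step 4: consolidate adjacent runs with identical formatting.
        var merged = new List<SimpleRun>();
        foreach (var run in replaced)
        {
            int last = merged.Count - 1;
            if (last >= 0 && merged[last].Format == run.Format)
                merged[last] = new SimpleRun(merged[last].Text + run.Text, run.Format);
            else
                merged.Add(run);
        }
        return merged;
    }

    private static bool MatchesAt(List<SimpleRun> chars, int start, string search)
    {
        if (start + search.Length > chars.Count) return false;
        for (int k = 0; k < search.Length; k++)
        {
            if (chars[start + k].Text[0] != search[k]) return false;
        }
        return true;
    }
}
```

Because matching happens on single characters, it no longer matters how Word originally split "{replace me}" across runs.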

Using the SharpSVN api are there any methods available to get the number of lines contained in a file at a Revision without Exporting it?

I was just wondering if I missed anything inside the documentation that would allow me to get the number of lines contained in a file at a certain revision (or even number of lines changed from a SvnChangeItem, that would be nice too) without having to directly export the file to the filesystem and parse through it counting each line.
Any help would be appreciated. Thanks.
Nope, you're stuck with exactly the solution you named: export to a temp file, count the lines, delete the file. That is a fairly expensive operation if you're doing it file by file. If you need to line-count every file, it may be better to fetch the entire repo and reuse the working directory for future runs.
Metadata such as the current line count is not stored in the repository, but you can get the file contents without messy temp files.
For brevity, I've excluded the code that iterates over revisions, etc.
using (var client = new SvnClient())
{
using (MemoryStream memoryStream = new MemoryStream())
{
client.Write(new SvnUriTarget(urlToFile), memoryStream);
memoryStream.Position = 0;
var streamReader = new StreamReader(memoryStream);
int lineCount = 0;
while (streamReader.ReadLine() != null)
{
lineCount++;
}
}
}

Validate a Word 2007 Template file

I'm developing a solution that allows people to upload a DOCX file as a template. This template is used for generating Word documents with database info.
What I would like to do is once a template gets uploaded, to check it for errors. (I don't want my parser crashing when a template is used.)
I've seen the question about checking a signature of a Word template, but that isn't enough to validate the integrity of the file. Of course it is possible to try to unzip the file, validate the XML in there, and so on, but this is rather CPU intensive and I'd like a different approach if there is one.
Are there any solutions that are part of the Open XML SDK, or other standard approaches to this? Any ideas are appreciated.
In C#, from the MSDN site:
public static bool IsDocumentValid(WordprocessingDocument mydoc)
{
OpenXmlValidator validator = new OpenXmlValidator();
var errors = validator.Validate(mydoc);
foreach (ValidationErrorInfo error in errors)
Debug.Write(error.Description);
return (errors.Count() == 0);
}
