I was able to access a bookmark in my word document using this code:
var res = from bm in mainPart.Document.Body.Descendants<BookmarkStart>()
where bm.Name == "BookmarkName"
select bm;
Now I want to insert a paragraph and a table after this bookmark. How do I do that? (example code would be appreciated)
Code
Once you have the bookmark you can access its parent element and add the other items after it.
using (WordprocessingDocument document = WordprocessingDocument.Open(@"C:\Path\filename.docx", true))
{
var mainPart = document.MainDocumentPart;
var res = from bm in mainPart.Document.Body.Descendants<BookmarkStart>()
where bm.Name == "BookmarkName"
select bm;
var bookmark = res.SingleOrDefault();
if (bookmark != null)
{
var parent = bookmark.Parent; // bookmark's parent element
// simple paragraph in one declaration
//Paragraph newParagraph = new Paragraph(new Run(new Text("Hello, World!")));
// build paragraph piece by piece
Text text = new Text("Hello, World!");
Run run = new Run(new RunProperties(new Bold()));
run.Append(text);
Paragraph newParagraph = new Paragraph(run);
// insert after bookmark parent
parent.InsertAfterSelf(newParagraph);
var table = new Table(
new TableProperties(
new TableStyle() { Val = "TableGrid" },
new TableWidth() { Width = 0, Type = TableWidthUnitValues.Auto }
),
new TableGrid(
new GridColumn() { Width = (UInt32Value)1018U },
new GridColumn() { Width = (UInt32Value)3544U }),
new TableRow(
new TableCell(
new TableCellProperties(
new TableCellWidth() { Width = 0, Type = TableWidthUnitValues.Auto }),
new Paragraph(
new Run(
new Text("Category Name"))
)),
new TableCell(
new TableCellProperties(
new TableCellWidth() { Width = 4788, Type = TableWidthUnitValues.Dxa }),
new Paragraph(
new Run(
new Text("Value"))
))
),
new TableRow(
new TableCell(
new TableCellProperties(
new TableCellWidth() { Width = 0, Type = TableWidthUnitValues.Auto }),
new Paragraph(
new Run(
new Text("C1"))
)),
new TableCell(
new TableCellProperties(
new TableCellWidth() { Width = 0, Type = TableWidthUnitValues.Auto }),
new Paragraph(
new Run(
new Text("V1"))
))
));
// insert after new paragraph
newParagraph.InsertAfterSelf(table);
}
// close saves all parts and closes the document
document.Close();
}
The above code should do it. However, I'll explain some special circumstances.
Be aware that it will attempt the insertions after the parent element of the bookmark. What behavior do you expect if your bookmark happens to be part of a paragraph inside a table? Should it append the new paragraph and table right after it, within that table? Or should it do it after that table?
These questions matter because the answer determines where the insertion occurs. If the bookmark's parent is inside a table, the code above places the new table inside that table's cell. Nesting tables is allowed, but the result can be an invalid OpenXml structure: a TableCell must end with a Paragraph, so if the inserted table is the last element in the original table's TableCell, a Paragraph has to be added after the inserted table, before the cell's closing tag. You would discover the problem as soon as you tried to open the document in MS Word.
The solution is to determine whether you are indeed performing the insertion within a table.
To do so, we can add to the above code (after the parent var):
var parent = bookmark.Parent; // bookmark's parent element
// loop till we get the containing element in case bookmark is inside a table etc.
// keep checking the element's parent and update it till we reach the Body
var tempParent = bookmark.Parent;
bool isInTable = false;
while (tempParent.Parent != mainPart.Document.Body)
{
tempParent = tempParent.Parent;
if (tempParent is Table && !isInTable)
isInTable = true;
}
// ...
newParagraph.InsertAfterSelf(table); // from above sample
// if bookmark is in a table, add a paragraph after table
if (isInTable)
table.InsertAfterSelf(new Paragraph());
That should prevent the error from occurring and give you valid OpenXml. The same while loop idea can be used if, in answer to my earlier question, you want the insertion to happen after the containing table rather than inside it as the above code would do. In that case the issue above is no longer a concern, and you can replace that loop and boolean with the following:
var parent = bookmark.Parent; // bookmark's parent element
while (parent.Parent != mainPart.Document.Body)
{
parent = parent.Parent;
}
This keeps re-assigning the parent till it's the main containing element at the Body level. So if the bookmark was in a paragraph that was in a table, it would go from Paragraph to TableCell to TableRow to Table and stop there since the Table's parent is the Body. At that point parent = Table element and we can insert after it.
That should cover some different approaches, depending on your original intent. Let me know if you need any clarification after trying it out.
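As a minimal sketch of the two ideas above (climbing to the Body-level element, and making sure a cell ends with a paragraph), the loops can be wrapped in small helpers. The class and method names here are my own, not part of the SDK, and the code assumes the DocumentFormat.OpenXml package is referenced. Unlike the loops above, the first helper returns null instead of walking past the root when the element is not inside the given Body.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class BookmarkInsertHelpers
{
    // Hypothetical helper: climbs from 'element' until the current node is a
    // direct child of 'body'. Returns null when 'element' is not inside 'body'
    // (or is the body itself) rather than dereferencing a null parent.
    public static OpenXmlElement FindBodyLevelAncestor(OpenXmlElement element, Body body)
    {
        OpenXmlElement current = element;
        while (current != null && current.Parent != body)
        {
            current = current.Parent;
        }
        return current;
    }

    // Hypothetical helper: appends an empty paragraph when the last child of
    // the cell is anything other than a paragraph (for example a nested table).
    public static void EnsureCellEndsWithParagraph(TableCell cell)
    {
        if (!(cell.LastChild is Paragraph))
        {
            cell.AppendChild(new Paragraph());
        }
    }
}

public static class BookmarkInsertDemo
{
    public static void Main()
    {
        // Build a small in-memory tree: Body > Table > TableRow > TableCell > Paragraph > BookmarkStart
        var bookmark = new BookmarkStart { Name = "BookmarkName", Id = "0" };
        var cell = new TableCell(new Paragraph(bookmark));
        var outerTable = new Table(new TableRow(cell));
        var body = new Body(outerTable);

        // Option 1: insert after the containing Body-level element.
        OpenXmlElement anchor = BookmarkInsertHelpers.FindBodyLevelAncestor(bookmark, body);
        Console.WriteLine(anchor.GetType().Name);

        // Option 2: insert a nested table inside the cell, then repair the cell ending.
        bookmark.Parent.InsertAfterSelf(new Table(new TableRow(new TableCell(new Paragraph()))));
        BookmarkInsertHelpers.EnsureCellEndsWithParagraph(cell);
        Console.WriteLine(cell.LastChild.GetType().Name);
    }
}
```

Checking the last child of the cell, rather than tracking an isInTable flag, also covers the case where other content already follows the inserted table, so no extra paragraph is added when one is already there.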
Document Reflector
You might be wondering how I determined the GridColumn.Width values. I made a table and used the Document Reflector tool to get them. If you installed the productivity tools along with the Open Xml SDK, they are located in C:\Program Files\Open XML Format SDK\V2.0\tools (or similar).
The best way to learn how the *.docx format works (or any Open Xml formatted doc) is to open an existing file with the Document Reflector tool. Navigate the document part, and locate the items you want to replicate. The tool shows you the actual code used to generate the entire document. This is code you can copy/paste into your application to generate similar results. You can ignore all the reference IDs usually; you'll have to take a look and try it out to get a feel for it.
As I mentioned, the above Table code was adapted from a sample document. I added a simple table to a docx, then opened it in the tool, and copied the code generated by the tool (I removed some extras to clean it up). That gave me a working sample to add a table.
It is especially helpful when you want to know how to write code that generates something, such as formatted tables and paragraphs with styles etc.
Take a look at this link for screenshots and info on the other tools included in the SDK: An introduction to Open XML SDK 2.0.
Code Snippets
You might also be interested in the code snippets for Open Xml. For a list of snippets check this blog post. You can download them from here: 2007 Office System Sample: Open XML Format SDK 2.0 Code Snippets for Visual Studio 2008.
Once installed you would add them from Tools | Code Snippet Manager menu. Select C# for the language, click the Add button, and navigate to PersonalFolder\Visual Studio 2008\Code Snippets\Visual C#\Open XML SDK 2.0 for Microsoft Office to add them. From your code you would right-click and select "Insert Snippet" and select the one you want.
Related
I've parsed html into a PDF and created a table of contents from the Header tags. The bookmarks in the document work fine, but clicking on the line in the table of contents doesn't do anything. The cursor doesn't change icons like it does if I put a URL in the link.
I used Itext RUPS to inspect the final PDF and the named destinations are in the final file.
I tried hard coding a couple of the names in just to see what happens, but they also didn't work. Putting in .CreateURL and google.com works fine.
The one thing I'm doing that may or may not be an issue is I'm creating the body document, then creating the table of contents and merging the two documents.
Maybe Bruno can make a cameo on this one.
private static List ProcessOutlineChildren(PdfDocument pdfDocument, List tableOfContents, IEnumerable<PdfOutline> pdfOutlines, IDictionary<String, PdfObject> names = null)
{
List<TabStop> tabStops = new List<TabStop>();
tabStops.Add(new TabStop(580, TabAlignment.RIGHT));
foreach (var o in pdfOutlines)
{
ListItem currentOutlineItem = new ListItem();
Paragraph paragraph = new Paragraph();
paragraph.AddTabStops(tabStops);
paragraph.Add(o.GetTitle());
paragraph.Add(new Tab());
paragraph.Add((pdfDocument.GetPageNumber((PdfDictionary) o.GetDestination().GetDestinationPage(names))).ToString());
paragraph.SetAction(PdfAction.CreateGoTo(o.GetDestination()));
currentOutlineItem.Add(paragraph);
if (o.GetAllChildren().Any())
{
currentOutlineItem.Add(ProcessOutlineChildren(pdfDocument, new List(), o.GetAllChildren(), names));
}
tableOfContents.Add(currentOutlineItem);
}
return tableOfContents;
}
public class CustomOutlineHandler : OutlineHandler
{
// PDFs require a unique name for each destination; this is how the actions/bookmarks jump to a location.
protected override string GenerateUniqueDestinationName(IElementNode element)
{
string destinationName = base.GenerateUniqueDestinationName(element);
if ("p".Equals(element.Name()))
{
destinationName = destinationName.Replace(GetDestinationNamePrefix(), "paragraph-prefix-");
}
return destinationName;
}
}
//From my main method converting things into PDF.
OutlineHandler customOutlineHandler = new CustomOutlineHandler().PutAllTagPriorityMappings(priorityMappings);
customOutlineHandler.SetDestinationNamePrefix("destination-name-");
properties.SetOutlineHandler(customOutlineHandler);
Hi!!
I'm able to write charts to my XLSX file, but I'm stuck adding a simple title to each chart. No styles, just simple plain text.
My code is like this:
String Dtitulo = "Hello chart";
DocumentFormat.OpenXml.Drawing.Charts.Title chartTitle = new DocumentFormat.OpenXml.Drawing.Charts.Title();
chartTitle.ChartText = new ChartText();
chartTitle.ChartText.RichText = new RichText();
DocumentFormat.OpenXml.Drawing.Paragraph parrafoTitulo = new DocumentFormat.OpenXml.Drawing.Paragraph();
DocumentFormat.OpenXml.Drawing.Run run = parrafoTitulo.AppendChild(new DocumentFormat.OpenXml.Drawing.Run());
run.AppendChild(new DocumentFormat.OpenXml.Drawing.Text(Dtitulo));
chartTitle.ChartText.RichText.AppendChild<DocumentFormat.OpenXml.Drawing.Paragraph>(parrafoTitulo);
chart.Title = chartTitle;
But when I open the file, Excel says "file is corrupt" or something like that.
A bit late, but I was faced with the same task. I created an Excel sheet, manually added a chart with a chart title, then opened the XML to see which tags were needed. After a while I got it working and moved everything into a small function, shown below.
Pass your chart object and the title you want to the function below and it will add the chart title.
Note: I'm using Open XML SDK 2.0 for Microsoft Office.
private void AddChartTitle(DocumentFormat.OpenXml.Drawing.Charts.Chart chart,string title)
{
var ctitle = chart.AppendChild(new Title());
var chartText = ctitle.AppendChild(new ChartText());
var richText = chartText.AppendChild(new RichText());
var bodyPr = richText.AppendChild(new BodyProperties());
var lstStyle = richText.AppendChild(new ListStyle());
var paragraph = richText.AppendChild(new Paragraph());
var apPr = paragraph.AppendChild(new ParagraphProperties());
apPr.AppendChild(new DefaultRunProperties());
var run = paragraph.AppendChild(new DocumentFormat.OpenXml.Drawing.Run());
run.AppendChild(new DocumentFormat.OpenXml.Drawing.RunProperties() { Language = "en-CA" });
run.AppendChild(new DocumentFormat.OpenXml.Drawing.Text() { Text = title });
}
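If you want a full example, you can review the official one here and call the above function right after the chart object is created; it will add the chart title. Calling it before any other child elements are appended matters, because the function uses AppendChild and, as far as I can tell from the schema, the title element is expected to come first among the chart's children.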
Using C#, I need to pull data from a word document. I have NetOffice for word installed in the project. The data is in two parts.
First, I need to pull data from the document settings.
Second, I need to pull the content of controls in the document. The content of the fields includes checkboxes, a date, and a few paragraphs. The input method is via controls, so there must be some way to interact with the controls via the api, but I don't know how to do that.
Right now, I have the following code to pull the flat text from the document:
private static string wordDocument2String(string file)
{
NetOffice.WordApi.Application wordApplication = new NetOffice.WordApi.Application();
NetOffice.WordApi.Document newDocument = wordApplication.Documents.Open(file);
string txt = newDocument.Content.Text;
wordApplication.Quit();
wordApplication.Dispose();
return txt;
}
So the question is: how do I pull the data from the controls from the document, and how do I pull the document settings (such as the title, author, etc. as seen from word), using either NetOffice, or some other package?
I did not bother to implement NetOffice, but the commands should mostly be the same (except probably for implementation and disposal methods).
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
string file = "C:\\Hello World.docx";
Microsoft.Office.Interop.Word.Document doc = word.Documents.Open(file);
// look for a specific type of Field (there are about 200 to choose from).
foreach (Field f in doc.Fields)
{
if (f.Type == WdFieldType.wdFieldDate)
{
//do something
}
}
// example of the myriad properties that could be associated with "document settings"
WdProtectionType protType = doc.ProtectionType;
if (protType.Equals(WdProtectionType.wdAllowOnlyComments))
{
//do something else
}
The MSDN reference on Word Interop is where you will find information on just about anything you need access to in a Word document.
UPDATE:
After reading your comment, here are a few document settings you can access:
string author = doc.BuiltInDocumentProperties("Author").Value;
string name = doc.Name; // this gives you the file name.
// not clear what you mean by "title"
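Since the question allows "some other package": if the title meant here is the one shown in Word's document properties, a rough sketch with the Open XML SDK (instead of Interop, so Word does not have to be installed) could look like the following. This is an alternative to the Interop calls above, not a translation of them. It assumes the DocumentFormat.OpenXml package is referenced; the class and helper names are made up for illustration, and the sample document is built in memory only so that the sketch needs no file on disk.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DocumentPropertiesDemo
{
    // Hypothetical helper: builds a tiny .docx in memory with a title and an author set.
    public static byte[] BuildSampleDocx(string title, string author)
    {
        using (var stream = new MemoryStream())
        {
            using (var doc = WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document))
            {
                var mainPart = doc.AddMainDocumentPart();
                mainPart.Document = new Document(new Body(new Paragraph()));
                mainPart.Document.Save();
                doc.PackageProperties.Title = title;
                doc.PackageProperties.Creator = author; // Word displays this as "Author"
            }
            return stream.ToArray();
        }
    }

    // Reads the core package properties without starting Word.
    public static string DescribeProperties(byte[] docx)
    {
        using (var stream = new MemoryStream(docx))
        using (var doc = WordprocessingDocument.Open(stream, false))
        {
            var props = doc.PackageProperties;
            return "Title=" + props.Title + "; Author=" + props.Creator;
        }
    }

    public static void Main()
    {
        byte[] docx = BuildSampleDocx("Quarterly report", "Jane Doe");
        Console.WriteLine(DescribeProperties(docx));
    }
}
```

For a real file you would pass a path to WordprocessingDocument.Open instead of a stream; properties that were never set come back as null.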
As for understanding what text you are getting from a "legacy control", I need more information about exactly what kind of control you are extracting from. Try getting the name of the control/textbox/form/etc. from within the document itself and then search for that type online.
As a stab in the dark, here is an (incomplete) example of getting text from textboxes in the document:
List<string> textBoxText = new List<string>();
foreach (Microsoft.Office.Interop.Word.Shape s in doc.Shapes)
{
textBoxText.Add(s.TextFrame.TextRange.Text); //this could result in an error if there are shapes that don't contain text.
}
Another possibility is Content Controls, of which there are several types. They are often used to gather user input.
Here is some code to catch a rich text Content Control:
List<string> contentControlText = new List<string>();
foreach(ContentControl CC in doc.ContentControls)
{
if (CC.Type == WdContentControlType.wdContentControlRichText)
{
contentControlText.Add(CC.Range.Text);
}
}
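If you would rather not automate Word at all, content controls can also be read from the file itself. Below is a minimal sketch using the Open XML SDK, assuming the DocumentFormat.OpenXml package is referenced; the class and helper names are hypothetical, and the sample document is generated in memory purely for demonstration. It lists the tag and the text of every content control in the main document body.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlDemo
{
    // Hypothetical helper: builds a .docx in memory with one block-level content control.
    public static byte[] BuildSampleDocx()
    {
        using (var stream = new MemoryStream())
        {
            using (var doc = WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document))
            {
                var mainPart = doc.AddMainDocumentPart();
                mainPart.Document = new Document(new Body(
                    new SdtBlock(
                        new SdtProperties(new Tag { Val = "Summary" }),
                        new SdtContentBlock(
                            new Paragraph(new Run(new Text("Hello from a control")))))));
                mainPart.Document.Save();
            }
            return stream.ToArray();
        }
    }

    // Returns (tag, text) pairs for every content control in the body.
    // Controls without a tag get an empty string as key.
    public static List<KeyValuePair<string, string>> ReadContentControls(byte[] docx)
    {
        var result = new List<KeyValuePair<string, string>>();
        using (var stream = new MemoryStream(docx))
        using (var doc = WordprocessingDocument.Open(stream, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            // SdtElement is the common base of block, run, row and cell level controls.
            foreach (SdtElement sdt in body.Descendants<SdtElement>())
            {
                string tag = sdt.GetFirstChild<SdtProperties>()?.GetFirstChild<Tag>()?.Val?.Value ?? "";
                result.Add(new KeyValuePair<string, string>(tag, sdt.InnerText));
            }
        }
        return result;
    }

    public static void Main()
    {
        foreach (var control in ReadContentControls(BuildSampleDocx()))
        {
            Console.WriteLine(control.Key + ": " + control.Value);
        }
    }
}
```

Note that Descendants also returns nested controls, so text inside a nested control shows up once for the inner control and once as part of the outer one. For checkboxes and date pickers, InnerText only gives the displayed text; the underlying state lives in the control's properties and would need separate handling.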
Is it possible to enable "Sharing" on excel documents through OpenXML or ClosedXML? Or any other library if it can help... I believe this is usually performed when you save the document (at least that's how it works in VBA), but I can't find how to specify saving arguments in C#.
I'd like to avoid using InterOp since I might batch this process on multiple files through a network.
EDIT: According to some old pages from 2009, there are limitations where OpenXML cannot operate on protected files. However, would that apply to sharing too?
Sharing Excel documents using the OpenXML SDK is not well documented. I did some tests and found that it is possible to enable sharing on Excel documents using the OpenXML SDK. The following steps are necessary to enable sharing:
1. Add a WorkbookUserDataPart to your Excel document and add an empty Users collection to the part. In this collection Excel stores all users who currently have this shared workbook open.
2. Add a WorkbookRevisionHeaderPart to your Excel document and add a Headers collection to the part. In this collection Excel stores references to history, version and revision information. Add a first element (Header) to the collection which contains the SheetIdMap (used for tracking revision records). In the code sample below I've added all worksheets included in the document.
3. Add a WorkbookRevisionLogPart to the workbook's revision header part. The log part stores the list of revisions made to the document.
The code sample below shows how to enable sharing on an Excel document. It also checks whether sharing is already enabled on the document. Before you enable sharing you should create a backup of your original documents.
using (SpreadsheetDocument sd = SpreadsheetDocument.Open("c:\\temp\\enable_sharing.xlsx", true))
{
WorkbookPart workbookPart = sd.WorkbookPart;
if (workbookPart.GetPartsCountOfType<WorkbookRevisionHeaderPart>() != 0)
{
Console.Out.WriteLine("Excel document already shared!");
return;
}
// Create user data part if it does not exist.
if (workbookPart.GetPartsCountOfType<WorkbookUserDataPart>() == 0)
{
Console.Out.WriteLine("Adding user data part");
WorkbookUserDataPart workbookUserDataPart = workbookPart.AddNewPart<WorkbookUserDataPart>();
Users users = new Users() { Count = (UInt32Value)0U };
users.AddNamespaceDeclaration("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
workbookUserDataPart.Users = users;
}
// Create revision header part and revision log part.
WorkbookRevisionHeaderPart workbookRevisonHeaderPart = workbookPart.AddNewPart<WorkbookRevisionHeaderPart>();
WorkbookRevisionLogPart workbookRevisionLogPart = workbookRevisonHeaderPart.AddNewPart<WorkbookRevisionLogPart>();
// Create empty collection of revisions.
Revisions revisions = new Revisions();
revisions.AddNamespaceDeclaration("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
workbookRevisionLogPart.Revisions = revisions;
string lastSetOfRevisionsGuid = Guid.NewGuid().ToString("B");
// Create headers collection (references to history, revisions)
Headers headers = new Headers() { Guid = lastSetOfRevisionsGuid };
headers.AddNamespaceDeclaration("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
int worksheetPartsCount = workbookPart.GetPartsCountOfType<WorksheetPart>();
// Create first element in headers collection
// which contains the SheetIdMap.
Header header = new Header() { Guid = lastSetOfRevisionsGuid, DateTime = DateTime.Now,
MaxSheetId = (UInt32Value)(uint)worksheetPartsCount+1, UserName = "hans", Id = "rId1" };
// Create the list of sheet IDs that are used for tracking
// revision records. For every worksheet in the document
// create one SheetId.
SheetIdMap sheetIdMap = new SheetIdMap() { Count = (UInt32Value)(uint)worksheetPartsCount };
for (uint i = 1; i <= worksheetPartsCount; i++)
{
SheetId sheetId = new SheetId() { Val = (UInt32Value)i };
sheetIdMap.Append(sheetId);
}
header.Append(sheetIdMap);
headers.Append(header);
workbookRevisonHeaderPart.Headers = headers;
}
In the image below there is an area, which has an unknown (custom) class. That's not a Grid or a Table.
I need to be able:
to select Rows in this area
to grab a Value from each cell
The problem is since that's not a common type element - I have no idea how to google this problem or solve it myself. So far the code is following:
Process[] proc = Process.GetProcessesByName("programname");
AutomationElement window = AutomationElement.FromHandle(proc[0].MainWindowHandle);
PropertyCondition xEllist2 = new PropertyCondition(AutomationElement.ClassNameProperty, "CustomListClass", PropertyConditionFlags.IgnoreCase);
AutomationElement targetElement = window.FindFirst(TreeScope.Children, xEllist2);
I've already tried to treat this area as a textbox, as a grid, and as a combobox, but nothing has solved my problem so far. Does anybody have any advice on how to grab data from this area and iterate through its rows?
EDIT: sorry I've made a wrong assumption. Actually, the header(column 1, column 2, column 3) and the "lower half" of this area are different control-types!!
Thanks to Wininspector I was able to dig more information regarding these control types:
The header has following properties: HeaderControl 0x056407DC (90441692) Atom: #43288 0xFFFFFFFF (-1)
and the lower half has these: ListControl 0x056408A4 (90441892) Atom: #43288 0x02A6FDA0 (44498336)
The code I showed earlier retrieved only the "List" element, so here is the update:
Process[] proc = Process.GetProcessesByName("programname");
AutomationElement window = AutomationElement.FromHandle(proc[0].MainWindowHandle);
//getting the header
PropertyCondition xEllist3 = new PropertyCondition(AutomationElement.ClassNameProperty, "CustomHeaderClass", PropertyConditionFlags.IgnoreCase);
AutomationElement headerEl = window.FindFirst(TreeScope.Children, xEllist3); // originally called on an undeclared XElAE; window is assumed here
//getting the list
PropertyCondition xEllist2 = new PropertyCondition(AutomationElement.ClassNameProperty, "CustomListClass", PropertyConditionFlags.IgnoreCase);
AutomationElement targetElement = window.FindFirst(TreeScope.Children, xEllist2);
After giving it further thought, I tried to get all the column names:
AutomationElementCollection headerLines = headerEl.FindAll(TreeScope.Children, new PropertyCondition(AutomationElement.ControlTypeProperty, ControlType.HeaderItem));
string headertest = headerLines[0].GetCurrentPropertyValue(AutomationElement.NameProperty) as string;
textBox2.AppendText("Header 1: " + headertest + Environment.NewLine);
Unfortunately, in debug mode the element count in "headerLines" is 0, so accessing headerLines[0] throws an error.
Edit 2: Thanks to the answer below - I've installed Unmanaged UI Automation, which holds better possibilities than the default UIA. http://uiacomwrapper.codeplex.com/
How do you use the legacy pattern to grab data from unknown control-type?
if((bool)datagrid.GetCurrentPropertyValue(AutomationElementIdentifiers.IsLegacyIAccessiblePatternAvailableProperty))
{
var pattern = ((LegacyIAccessiblePattern)datagrid.GetCurrentPattern(LegacyIAccessiblePattern.Pattern));
var state = pattern.Current.State;
}
Edit 3. IUIAutomation approach (non-working as of now)
_automation = new CUIAutomation();
cacheRequest = _automation.CreateCacheRequest();
cacheRequest.AddPattern(UiaConstants.UIA_LegacyIAccessiblePatternId);
cacheRequest.AddProperty(UiaConstants.UIA_LegacyIAccessibleNamePropertyId);
cacheRequest.TreeFilter = _automation.ContentViewCondition;
trueCondition = _automation.CreateTrueCondition();
Process[] ps = Process.GetProcessesByName("program");
IntPtr hwnd = ps[0].MainWindowHandle;
IUIAutomationElement elementMailAppWindow = _automation.ElementFromHandle(hwnd);
List<IntPtr> ls = new List<IntPtr>();
ls = GetChildWindows(hwnd);
foreach (var child in ls)
{
IUIAutomationElement iuiae = _automation.ElementFromHandle(child);
if (iuiae.CurrentClassName == "CustomListClass")
{
var outerArayOfStuff = iuiae.FindAllBuildCache(interop.UIAutomationCore.TreeScope.TreeScope_Children, trueCondition, cacheRequest.Clone());
var outerArayOfStuff2 = iuiae.FindAll(interop.UIAutomationCore.TreeScope.TreeScope_Children, trueCondition);
var countOuter = outerArayOfStuff.Length;
var countOuter2 = outerArayOfStuff2.Length;
var uiAutomationElement = outerArayOfStuff.GetElement(0); // error
var uiAutomationElement2 = outerArayOfStuff2.GetElement(0); // error
//...
//I've erased what's followed next because the code isn't working even now..
}
}
The code was implemented based on this question:
Read cell Items from data grid in SysListView32 of another application using C#
As the result:
countOuter and countOuter2 lengths = 0
impossible to select elements (rows from list)
impossible to get ANY value
nothing is working
You might want to try using the core UI automation classes. It requires that you import the dll to use it in C#. Add this to your pre-build event (or do it just once, etc):
"%PROGRAMFILES%\Microsoft SDKs\Windows\v7.0A\bin\tlbimp.exe" %windir%\system32\UIAutomationCore.dll /out:..\interop.UIAutomationCore.dll
You can then use the IUIAutomationLegacyIAccessiblePattern.
Get the constants that you need for the calls from:
C:\Program Files\Microsoft SDKs\Windows\v7.1\Include\UIAutomationClient.h
I am able to read Infragistics Ultragrids this way.
If that is too painful, try using MSAA. I used this project as a starting point with MSAA before converting to all UIA Core: MSAA Sample Code
----- Edited on 6/25/12 ------
I would definitely say that finding the proper 'identifiers' is the most painful part of using the MS UIAutomation stuff. What has helped me very much is to create a simple form application that I can use as 'location recorder'. Essentially, all you need are two things:
a way to hold focus even when you are off of your form's window (see: Holding focus)
a call to ElementFromPoint() using the x,y coordinates of where the mouse is. There is an implementation of this in the CUIAutomation class.
I use the CTRL button to tell my app to grab the mouse coordinates (System.Windows.Forms.Cursor.Position). I then get the element from the point and recursively get the element's parent until I reach the desktop.
var desktop = auto.GetRootElement();
var walker = GetRawTreeWalker();
while (true)
{
element = walker.GetParentElement(element);
if (auto.CompareElements(desktop, element) == 1){ break;}
}
----- edit on 6/26/12 -----
Once you can recursively find automation identifiers and/or names, you can rather easily modify the code here: http://blog.functionalfun.net/2009/06/introduction-to-ui-automation-with.html to be used with the Core UI Automation classes. This will allow you to build up a string as you recurse which can be used to identify a control nested in an application with an XPath style syntax.