Read Word bookmarks - c#

Can anyone tell me how to read all bookmarks in a Word 2010 document using Open XML SDK 2.0? I was using Microsoft.Office.Interop.Word to read bookmarks, but I could not deploy my website because of issues with it, so I switched to Open XML. How do I read all the bookmarks?

You can iterate through all
file.MainDocumentPart.RootElement.Descendants<BookmarkStart>()
like this:
IDictionary<string, BookmarkStart> bookmarkMap =
    new Dictionary<string, BookmarkStart>();
// collect every bookmark start element, keyed by bookmark name
foreach (BookmarkStart bookmarkStart in file.MainDocumentPart.RootElement.Descendants<BookmarkStart>())
{
    bookmarkMap[bookmarkStart.Name] = bookmarkStart;
}
// get the text of the run that follows each bookmark start
foreach (BookmarkStart bookmarkStart in bookmarkMap.Values)
{
    Run bookmarkRun = bookmarkStart.NextSibling<Run>();
    if (bookmarkRun != null)
    {
        // a run may contain no text element, so check before reading it
        Text textElement = bookmarkRun.GetFirstChild<Text>();
        string bookmarkText = textElement != null ? textElement.Text : null;
    }
}
Code extracted from https://stackoverflow.com/a/3318381/28004
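
The same idea can also be sketched without the Open XML SDK, using LINQ to XML over the XML of the main document part (word/document.xml). This is a swapped-in technique, not the method from the answer above, and the names BookmarkLister, ListBookmarkNames and GetBookmarkText are hypothetical. It assumes the part has already been loaded into an XDocument.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class BookmarkLister
{
    // WordprocessingML main namespace (the "w" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the w:name of every w:bookmarkStart element, in document order.
    public static List<string> ListBookmarkNames(XDocument documentXml)
    {
        return documentXml
            .Descendants(W + "bookmarkStart")
            .Select(b => (string)b.Attribute(W + "name"))
            .Where(name => name != null)
            .ToList();
    }

    // Concatenates the w:t text found between a bookmark's start element
    // and the w:bookmarkEnd with the same w:id. Returns null if not found.
    public static string GetBookmarkText(XDocument documentXml, string name)
    {
        XElement start = documentXml
            .Descendants(W + "bookmarkStart")
            .FirstOrDefault(b => (string)b.Attribute(W + "name") == name);
        if (start == null)
            return null;

        string id = (string)start.Attribute(W + "id");
        return string.Concat(
            documentXml.Descendants()
                .SkipWhile(e => e != start)
                .Skip(1)
                .TakeWhile(e => !(e.Name == W + "bookmarkEnd"
                                  && (string)e.Attribute(W + "id") == id))
                .Where(e => e.Name == W + "t")
                .Select(e => e.Value));
    }
}
```

Unlike NextSibling<Run>(), this walks the document in order until the matching end marker, so it also picks up text when the bookmark spans several runs or paragraphs.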

Try this. I have used the same approach in my project:
http://www.legalcube.de/post/Word-openxml-sdk-bookmark-handling.aspx

Related

OpenXml to remove watermarks from Word, Excel and Powerpoint

I'm new to OpenXml. I'm developing a Windows application in C# and I need to remove the watermark (at run time) from a selected Word, Excel or PowerPoint file. The watermark was added manually by the user (don't ask me why he can't remove it manually... it's a customer request...).
I created an "empty" docx file (Hello Word! in the body) with a "DRAFT" watermark, and implemented a simple example application to remove it using the code from this topic (code 1): Removing watermark in word with OpenXml & C# corrupts document
but the application throws a System.ArgumentOutOfRangeException.
The code is the following:
public Form1()
{
    InitializeComponent();
    string document = @"D:\Work\EsempioFiligrana\doc1.docx";
    // Open the file in editable mode.
    using (WordprocessingDocument wordprocessingDocument =
        WordprocessingDocument.Open(document, true))
    {
        DeleteCustomWatermark(wordprocessingDocument, "DRAFT");
    }
}
private static void DeleteCustomWatermark(WordprocessingDocument package, string watermarkId)
{
    MainDocumentPart maindoc = package.MainDocumentPart;
    if (maindoc != null)
    {
        var headers = maindoc.GetPartsOfType<HeaderPart>();
        if (headers != null)
        {
            var head = headers.First(); // we are sure that this header part contains the watermark with id=watermarkId
            var watermark = head.GetPartById(watermarkId); // !! This statement generates the exception !!
            if (watermark != null)
                head.DeletePart(watermark);
        }
    }
}
What's wrong? What can I do to remove the watermark from the document?
Thanks

how to get Styles from existing word document by using Novacode.Docx?

This is example code using Open XML SDK 2.5:
void AddStylesPart()
{
    StyleDefinitionsPart styleDefinitionsPart = mainPart.StyleDefinitionsPart;
    styleDefinitionsPart = mainPart.AddNewPart<StyleDefinitionsPart>();
    Styles styles1 = new Styles();
    styles1.Save(styleDefinitionsPart);
    if (styleDefinitionsPart != null)
    {
        using (WordprocessingDocument wordTemplate = WordprocessingDocument.Open(@"..\AT\Docs\FPMaster-4DEV.docx", false))
        {
            foreach (var templateStyle in wordTemplate.MainDocumentPart.StyleDefinitionsPart.Styles)
            {
                styleDefinitionsPart.Styles.Append(templateStyle.CloneNode(true));
            }
        }
    }
}
Here an existing document is opened with the WordprocessingDocument class and all the styles present in it are cloned.
I want to do the same using the Novacode.Docx DLL. How can I get the styles used in an existing document using the Novacode.Docx DLL? Please help.

I found an alternative solution; I hope this helps.
Using the Novacode.Docx DLL we can easily clone the styles used in the original document.
This can be done by creating a template from the original document.
Once that is done, apply the template in your project:
document.ApplyTemplate(@"..\TemplateFileName.dotx", false);
Now we can use all the styles present in the original document.

Read/import existing Excel file programmatically (cell-by-cell) in Windows Phone 8

I am working on a Windows Phone 8 app that reads and writes Excel files. I asked a question here about this, and the comment provided, along with many other links, led me to OpenXml.
All of this helped me with creating an Excel file and launching it. But now I am stuck on the most basic part: how to read an existing Excel file (probably created outside the app using MS Excel) cell by cell, i.e. I want to access each cell and its value from my code. With OpenXML I did this:
Stream localFile = App.GetResourceStream(
    new Uri("/ReadExcel;component/jai.xlsx", UriKind.Relative)).Stream;
MemoryStream ms = new MemoryStream();
localFile.CopyTo(ms);
DocumentFormat.OpenXml.Packaging.SpreadsheetDocument spreadsheetDoc =
    DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open(localFile, true);
{
    var a = spreadsheetDoc.Package;
    // Do work here
}
But it gives me this error:
The type 'System.IO.Packaging.Package' is defined in an assembly that is not
referenced. You must add a reference to assembly 'WindowsBase, Version=4.0.0.0
So basically I am stuck on this WindowsBase.dll. I tried various ways to import the assembly (unblocking it and so on), but nothing works.
All I want to do is programmatically access the content of an existing Excel file from my code, cell by cell.
Please help, or tell me whether this is even possible in WP8 as of now.

I used the following method to read cells from an xlsx Excel file on Windows Phone 8:
Add the Microsoft Compression library to your project using NuGet
Adapt the code sample from the developer network to your needs - it shows how to read cells from an Excel file (and it needs the Compression lib)
Since I already extended the code a bit to handle empty columns and empty files properly you can also use my code:
public class ExcelReader
{
    List<string> _sharedStrings;
    List<Dictionary<string, string>> _derivedData;
    public List<Dictionary<string, string>> DerivedData
    {
        get
        {
            return _derivedData;
        }
    }
    List<string> _header;
    public List<string> Headers { get { return _header; } }
    // e.g. cellID = H2 - only works with up to 26 cells
    private int GetColumnIndex(string cellID)
    {
        return cellID[0] - 'A';
    }
    public void StartReadFile(Stream input)
    {
        ZipArchive z = new ZipArchive(input, ZipArchiveMode.Read);
        var worksheet = z.GetEntry("xl/worksheets/sheet1.xml");
        var sharedString = z.GetEntry("xl/sharedStrings.xml");
        // get shared string
        _sharedStrings = new List<string>();
        // if there is no content the sharedStrings will be null
        if (sharedString != null)
        {
            using (var sr = sharedString.Open())
            {
                XDocument xdoc = XDocument.Load(sr);
                _sharedStrings =
                    (
                        from e in xdoc.Root.Elements()
                        select e.Elements().First().Value
                    ).ToList();
            }
        }
        // get header
        using (var sr = worksheet.Open())
        {
            XDocument xdoc = XDocument.Load(sr);
            // get element to first sheet data
            XNamespace xmlns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
            XElement sheetData = xdoc.Root.Element(xmlns + "sheetData");
            _header = new List<string>();
            _derivedData = new List<Dictionary<string, string>>();
            // worksheet empty?
            if (!sheetData.Elements().Any())
                return;
            // build header first
            var firstRow = sheetData.Elements().First();
            // full of c
            foreach (var c in firstRow.Elements())
            {
                // the c element, if have attribute t, will need to consult sharedStrings
                string val = c.Elements().First().Value;
                if (c.Attribute("t") != null)
                {
                    _header.Add(_sharedStrings[Convert.ToInt32(val)]);
                }
                else
                {
                    _header.Add(val);
                }
            }
            // build content now
            foreach (var row in sheetData.Elements())
            {
                // skip row 1
                if (row.Attribute("r").Value == "1")
                    continue;
                Dictionary<string, string> rowData = new Dictionary<string, string>();
                // the "c" elements each represent a column
                foreach (var c in row.Elements())
                {
                    var cellID = c.Attribute("r").Value; // e.g. H2
                    // each "c" element has a "v" element representing the value
                    string val = c.Elements().First().Value;
                    // a string? look up in shared string file
                    if (c.Attribute("t") != null)
                    {
                        rowData.Add(_header[GetColumnIndex(cellID)], _sharedStrings[Convert.ToInt32(val)]);
                    }
                    else
                    {
                        // number
                        rowData.Add(_header[GetColumnIndex(cellID)], val);
                    }
                }
                _derivedData.Add(rowData);
            }
        }
    }
}
This works for simple Excel files having one work sheet and some text and number cells. It assumes there is a header row.
Usage is as follows:
var excelReader = new ExcelReader();
excelReader.StartReadFile(excelStream);
After reading excelReader.Headers contains the header names, excelReader.DerivedData contains the rows. Each row is a Dictionary having the header as key and the data as value. Empty cells won't be in there.
Hope this gets you started.
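
The GetColumnIndex helper above only looks at the first letter, so it stops working from column AA onward. A minimal sketch of a replacement that handles multi-letter columns could look like this. The class name CellReference is hypothetical, and the method assumes A1-style references such as H2 or AA10.

```csharp
using System;

public static class CellReference
{
    // Converts the column letters of an A1-style reference to a zero-based
    // column index, treating the letters as a bijective base-26 number
    // (A = 1 ... Z = 26) and subtracting one at the end.
    public static int GetColumnIndex(string cellId)
    {
        if (string.IsNullOrEmpty(cellId))
            throw new ArgumentException("Cell reference is empty.", "cellId");

        int index = 0;
        int i = 0;
        while (i < cellId.Length)
        {
            char c = char.ToUpperInvariant(cellId[i]);
            if (c < 'A' || c > 'Z')
                break; // reached the row digits
            index = index * 26 + (c - 'A' + 1);
            i++;
        }

        if (i == 0)
            throw new ArgumentException("Cell reference has no column letters.", "cellId");

        return index - 1;
    }
}
```

With this version H2 still maps to 7, while AA10 maps to 26 instead of colliding with column A.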

Unfortunately, it is not possible to use the official OpenXML SDK by Microsoft. The reason is exactly the exception you already ran into. WP8 does not have the System.IO.Packaging namespace, which is required to extract/compress the zip-based xlsx file format. Adding WindowsBase.dll won't work either because it is not compiled for WP8.
After searching on and off for this over the last two years, the only three solutions I know of (apart from developing Excel support from scratch on your own :) ) are:
Use the Ag.OpenXML open source project, which you can find at http://agopenxml.codeplex.com/ . The source repository contains an implementation to write an Excel file (the downloadable package only contains Word export). I have used this in my WP8 app for quite some time and it works well despite lacking a lot of features. Unfortunately, this package has not been maintained since 2011. However, it might be a good start for you.
Use the commercial libraries of ComponentOne https://www.componentone.com/SuperProducts/StudioWindowsPhone/
Use the commercial libraries of Syncfusion http://www.syncfusion.com/products/windows-phone

How to read PDF bookmarks programmatically

I'm using a PDF converter to access the graphical data within a PDF. Everything works fine, except that I don't get a list of the bookmarks. Is there a command-line app or a C# component that can read a PDF's bookmarks? I found the iText and SharpPDF libraries and I'm currently looking through them. Have you ever done such a thing?

Try the following code
PdfReader pdfReader = new PdfReader(filename);
IList<Dictionary<string, object>> bookmarks = SimpleBookmark.GetBookmark(pdfReader);
for (int i = 0; i < bookmarks.Count; i++)
{
    MessageBox.Show(bookmarks[i].Values.ToArray().GetValue(0).ToString());
    if (bookmarks[i].Count > 3)
    {
        MessageBox.Show(bookmarks[i].ToList().Count.ToString());
    }
}
Note: Don't forget to add iTextSharp DLL to your project.

As the bookmarks are in a tree structure (https://en.wikipedia.org/wiki/Tree_(data_structure)),
I've used some recursion here to collect all bookmarks and their children.
iTextSharp solved it for me.
dotnet add package iTextSharp
I collected all bookmarks with the following code:
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using iTextSharp.text.pdf;
namespace PdfManipulation
{
    class Program
    {
        static void Main(string[] args)
        {
            StringBuilder bookmarks = ExtractAllBookmarks("myPdfFile.pdf");
        }
        private static StringBuilder ExtractAllBookmarks(string pdf)
        {
            StringBuilder sb = new StringBuilder();
            PdfReader reader = new PdfReader(pdf);
            IList<Dictionary<string, object>> bookmarksTree = SimpleBookmark.GetBookmark(reader);
            foreach (var node in bookmarksTree)
            {
                sb.AppendLine(PercorreBookmarks(node).ToString());
            }
            return RemoveAllBlankLines(sb);
        }
        private static StringBuilder RemoveAllBlankLines(StringBuilder sb)
        {
            return new StringBuilder().Append(Regex.Replace(sb.ToString(), @"^\s+$[\r\n]*", string.Empty, RegexOptions.Multiline));
        }
        private static StringBuilder PercorreBookmarks(Dictionary<string, object> bookmark)
        {
            StringBuilder sb = new StringBuilder();
            // check for null before reading any key
            if (bookmark == null)
                return sb;
            sb.AppendLine(bookmark["Title"].ToString());
            if (bookmark.ContainsKey("Kids"))
            {
                IList<Dictionary<string, object>> children = (IList<Dictionary<string, object>>) bookmark["Kids"];
                foreach (var bm in children)
                {
                    sb.AppendLine(PercorreBookmarks(bm).ToString());
                }
            }
            return sb;
        }
    }
}
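
The recursion above flattens the tree, so the nesting level is lost. A possible variation that indents each title by its depth is sketched below. It only assumes the dictionary shape used above ("Title" holding the text, "Kids" holding a list of child dictionaries), so it runs without iTextSharp, and the name BookmarkTreePrinter is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

public static class BookmarkTreePrinter
{
    // Returns one line per bookmark, indented two spaces per nesting level.
    public static string Flatten(IList<Dictionary<string, object>> nodes)
    {
        var sb = new StringBuilder();
        Append(nodes, 0, sb);
        return sb.ToString();
    }

    private static void Append(IList<Dictionary<string, object>> nodes, int depth, StringBuilder sb)
    {
        if (nodes == null)
            return;

        foreach (var node in nodes)
        {
            object title;
            if (node.TryGetValue("Title", out title) && title != null)
                sb.Append(' ', depth * 2).Append(title).Append('\n');

            // Children, when present, are stored under the "Kids" key.
            object kids;
            if (node.TryGetValue("Kids", out kids))
                Append(kids as IList<Dictionary<string, object>>, depth + 1, sb);
        }
    }
}
```

Passing a single StringBuilder down the recursion also avoids the blank lines that the nested AppendLine calls produce, so no regex cleanup is needed afterwards.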

You can use the PDFsharp library. It is published under the MIT License so it can be used even in corporate development. Here is an untested example.
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
using (PdfDocument document = PdfReader.Open("bookmarked.pdf", PdfDocumentOpenMode.Import))
{
    PdfDictionary outline = document.Internals.Catalog.Elements.GetDictionary("/Outlines");
    // a PDF without bookmarks has no /Outlines entry
    if (outline != null)
        PrintBookmark(outline);
}
void PrintBookmark(PdfDictionary bookmark)
{
    Console.WriteLine(bookmark.Elements.GetString("/Title"));
    // children form a linked list: /First points to the first child, /Next to its sibling
    for (PdfDictionary child = bookmark.Elements.GetDictionary("/First"); child != null; child = child.Elements.GetDictionary("/Next"))
    {
        PrintBookmark(child);
    }
}
Gotchas:
PDFsharp does not open PDFs above version 1.6 very well. (It throws: "cannot handle iref streams. The current implementation of PDFsharp cannot handle this PDF feature introduced with Acrobat 6".)
There are many types of strings in PDFs, which PDFsharp returns as is, including UTF-16BE strings (ISO 32000-1:2008, section 7.9.2.1).
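
For the second gotcha, a UTF-16BE text string in a PDF starts with the byte order mark FE FF. The sketch below decodes such a title under one assumption that has not been checked against PDFsharp itself: that the raw string exposes each PDF byte as one char. The name PdfTextString is hypothetical.

```csharp
using System.Text;

public static class PdfTextString
{
    // Decodes a raw PDF text string. Strings that start with the
    // UTF-16BE byte order mark (FE FF) are decoded as UTF-16BE;
    // anything else is returned unchanged.
    public static string Decode(string raw)
    {
        if (raw == null || raw.Length < 2 || raw[0] != '\u00FE' || raw[1] != '\u00FF')
            return raw;

        // Assumption: every char carries exactly one byte of the original string.
        var bytes = new byte[raw.Length - 2];
        for (int i = 2; i < raw.Length; i++)
            bytes[i - 2] = (byte)raw[i];

        return Encoding.BigEndianUnicode.GetString(bytes);
    }
}
```

A title could then be read as PdfTextString.Decode(bookmark.Elements.GetString("/Title")), provided the assumption above holds for the PDFsharp version in use.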

You might try the Docotic.Pdf library for the task if you are fine with a commercial solution.
Here is sample code that lists all top-level bookmark items with some of their properties.
using (PdfDocument doc = new PdfDocument("file.pdf"))
{
    PdfOutlineItem root = doc.OutlineRoot;
    foreach (PdfOutlineItem item in root.Children)
    {
        Console.WriteLine("{0} ({1} child nodes, points to page {2})",
            item.Title, item.ChildCount, item.PageIndex);
    }
}
PdfOutlineItem class also provides properties related to outline item styles and more.
Disclaimer: I work for the vendor of the library.

If a commercial library is an option for you you could give Amyuni PDF Creator .Net a try.
Use the class Amyuni.PDFCreator.IacDocument.RootBookmark to retrieve the root of the bookmarks' tree, then the properties in IacBookmark to access each tree element, to navigate through the tree, and to add, edit or remove elements if needed.
Usual disclaimer applies

Manipulating Word 2007 Document XML in C#

I am trying to manipulate the XML of a Word 2007 document in C#. I have managed to find and manipulate the node that I want but now I can't seem to figure out how to save it back. Here is what I am trying:
// Open the document from memoryStream
Package pkgFile = Package.Open(memoryStream, FileMode.Open, FileAccess.ReadWrite);
PackageRelationshipCollection pkgrcOfficeDocument = pkgFile.GetRelationshipsByType(strRelRoot);
foreach (PackageRelationship pkgr in pkgrcOfficeDocument)
{
    if (pkgr.SourceUri.OriginalString == "/")
    {
        Uri uriData = new Uri("/word/document.xml", UriKind.Relative);
        PackagePart pkgprtData = pkgFile.GetPart(uriData);
        XmlDocument doc = new XmlDocument();
        doc.Load(pkgprtData.GetStream());
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("w", nsUri);
        XmlNodeList nodes = doc.SelectNodes("//w:body/w:p/w:r/w:t", nsManager);
        foreach (XmlNode node in nodes)
        {
            if (node.InnerText == "{{TextToChange}}")
            {
                node.InnerText = "success";
            }
        }
        if (pkgFile.PartExists(uriData))
        {
            // Delete template "/customXML/item1.xml" part
            pkgFile.DeletePart(uriData);
        }
        PackagePart newPkgprtData = pkgFile.CreatePart(uriData, "application/xml");
        StreamWriter partWrtr = new StreamWriter(newPkgprtData.GetStream(FileMode.Create, FileAccess.Write));
        doc.Save(partWrtr);
        partWrtr.Close();
    }
}
pkgFile.Close();
I get the error 'Memory stream is not expandable'. Any ideas?

I would recommend that you use Open XML SDK instead of hacking the format by yourself.

Using OpenXML SDK 2.0, I do this:
public void SearchAndReplace(Dictionary<string, string> tokens)
{
    using (WordprocessingDocument doc = WordprocessingDocument.Open(_filename, true))
        ProcessDocument(doc, tokens);
}
private string GetPartAsString(OpenXmlPart part)
{
    string text = String.Empty;
    using (StreamReader sr = new StreamReader(part.GetStream()))
    {
        text = sr.ReadToEnd();
    }
    return text;
}
private void SavePart(OpenXmlPart part, string text)
{
    using (StreamWriter sw = new StreamWriter(part.GetStream(FileMode.Create)))
    {
        sw.Write(text);
    }
}
private void ProcessDocument(WordprocessingDocument doc, Dictionary<string, string> tokenDict)
{
    ProcessPart(doc.MainDocumentPart, tokenDict);
    foreach (var part in doc.MainDocumentPart.HeaderParts)
    {
        ProcessPart(part, tokenDict);
    }
    foreach (var part in doc.MainDocumentPart.FooterParts)
    {
        ProcessPart(part, tokenDict);
    }
}
private void ProcessPart(OpenXmlPart part, Dictionary<string, string> tokenDict)
{
    string docText = GetPartAsString(part);
    foreach (var keyval in tokenDict)
    {
        Regex expr = new Regex(_starttag + keyval.Key + _endtag);
        docText = expr.Replace(docText, keyval.Value);
    }
    SavePart(part, docText);
}
From this you could write a GetPartAsXmlDocument, do what you want with it, and then stream it back with SavePart(part, xmlString).
Hope this helps!
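
The GetPartAsXmlDocument idea mentioned above could be sketched roughly as follows, using XDocument rather than XmlDocument. To keep the sketch independent of the SDK it works on any seekable Stream, such as the one returned by part.GetStream(). The names PartXml, Load and Save are hypothetical.

```csharp
using System.IO;
using System.Xml.Linq;

public static class PartXml
{
    // Reads the whole stream as an XML document.
    public static XDocument Load(Stream partStream)
    {
        partStream.Position = 0;
        return XDocument.Load(partStream);
    }

    // Replaces the stream content with the given document. The stream is
    // truncated first so that a shorter document leaves no stale bytes
    // behind, which would otherwise corrupt the part.
    public static void Save(Stream partStream, XDocument xml)
    {
        partStream.Position = 0;
        partStream.SetLength(0);
        xml.Save(partStream);
        partStream.Flush();
    }
}
```

Working on parsed XML instead of the raw string means replacement values are escaped automatically, whereas the regex approach writes them into the markup verbatim.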

You should use the OpenXML SDK to work on docx files and not write your own wrapper.
Getting Started with the Open XML SDK 2.0 for Microsoft Office
Introducing the Office (2007) Open XML File Formats
How to: Manipulate Office Open XML Formats Documents
Manipulate Docx with C# without Microsoft Word installed with OpenXML SDK

The problem appears to be doc.Save(partWrtr), which is built using newPkgprtData, which is built using pkgFile, which loads from a memory stream... Because you loaded from a memory stream it's trying to save the document back to that same memory stream. This leads to the error you are seeing.
Instead of saving it to the memory stream try saving it to a new file or to a new memory stream.

The short and simple answer to the 'Memory stream is not expandable' issue is:
Do not open the document from memoryStream.
So in that respect the earlier answer is correct: simply open a file instead.
In my experience, opening from a MemoryStream and editing the document easily leads to 'Memory stream is not expandable'.
I suppose the message appears when one makes edits that require the memory stream to expand.
I have found that I can make some edits, but nothing that adds to the size.
So, for example, deleting a custom XML part is OK, but adding one with some data is not.
So if you actually need to open a memory stream and want to add to it, you must figure out how to open an expandable MemoryStream.
I have a need for this and hope to find a solution.
Stein-Tore Erdal
PS: I just noticed the answer from "Jan 26 '11 at 15:18".
I don't think that is the answer in all situations.
I get the error when trying this:
var ms = new MemoryStream(bytes);
using (WordprocessingDocument wd = WordprocessingDocument.Open(ms, true))
{
...
using (MemoryStream msData = new MemoryStream())
{
xdoc.Save(msData);
msData.Position = 0;
ourCxp.FeedData(msData); // Memory stream is not expandable.
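
One likely cause of the error in snippets like this is the constructor: new MemoryStream(bytes) wraps the given array and has a fixed capacity, while a stream created with the parameterless constructor can grow. A minimal sketch of the usual workaround, with the hypothetical helper name ToExpandableStream, is:

```csharp
using System.IO;

public static class StreamHelper
{
    // Copies the bytes into a MemoryStream created with the default
    // constructor. Unlike new MemoryStream(bytes), which cannot grow past
    // the length of the wrapped array, this stream is resizable.
    public static MemoryStream ToExpandableStream(byte[] bytes)
    {
        var stream = new MemoryStream();
        stream.Write(bytes, 0, bytes.Length);
        stream.Position = 0; // rewind so the package can be read from the start
        return stream;
    }
}
```

The document could then be opened with WordprocessingDocument.Open(StreamHelper.ToExpandableStream(bytes), true), and the edited bytes read back with ToArray() once the document has been closed.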
