Incorrect parsing of Textbox in docx by OpenXML - c#

I am reading a .docx file using OpenXML in C#. It reads everything correctly but strangely, the content of textbox is being read thrice. What could be wrong? Here is the code to read .docx:
public static string TextFromWord(String file)
{
const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
StringBuilder textBuilder = new StringBuilder();
using (WordprocessingDocument wdDoc = WordprocessingDocument.Open(file, false))
{
// Manage namespaces to perform XPath queries.
NameTable nt = new NameTable();
XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
nsManager.AddNamespace("w", wordmlNamespace);
// Get the document part from the package.
// Load the XML in the document part into an XmlDocument instance.
XmlDocument xdoc = new XmlDocument(nt);
xdoc.Load(wdDoc.MainDocumentPart.GetStream());
XmlNodeList paragraphNodes = xdoc.SelectNodes("//w:p", nsManager);
foreach (XmlNode paragraphNode in paragraphNodes)
{
XmlNodeList textNodes = paragraphNode.SelectNodes(".//w:t", nsManager);
foreach (System.Xml.XmlNode textNode in textNodes)
{
textBuilder.Append(textNode.InnerText);
}
textBuilder.Append(Environment.NewLine);
}
}
return textBuilder.ToString();
}
The part of file I am talking about is:
The result is: I read it in a test application like this:
What's wrong here?

Related

iterating XmlDocument with xpath

How do we iterate an XmlDocument using an xpath?
I'm attempting to return a list of nodes by xpath:
public static List<string> Filter(string xpath, string input, string ns, string nsUrl)
{
var bytes = Encoding.UTF8.GetBytes(input); //i believe this unescapes the string
var stream = new MemoryStream(bytes);
var doc = new XmlDocument();
XmlNamespaceManager namespaceManager = new XmlNamespaceManager(doc.NameTable);
namespaceManager.AddNamespace(ns, nsUrl);
var links = new List<string>();
var nodes = doc.SelectNodes(xpath, namespaceManager);
using (var reader = new XmlTextReader(stream))
{
reader.Namespaces = false;
doc.Load(reader);
}
foreach (XmlNode node in nodes)
{
if (IsNullOrWhiteSpace(node.InnerText))
{
continue;
}
links.Add(node.InnerText);
}
return links;
}
however, the count is always 0 !
I'm using this xpath. notice how i am using only 1 namespace:
/ns0:Visit/ns0:DocumentInterface/ns0:Documents/ns0:Document/ns0:BinaryData
The header of the file looks like this:
<ns0:Visit xmlns:ns0="http://NameSpace.ExternalSchemas.Patient"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
I'm certain that I am using the right xpath because I tested it against my payload:
I'm calling the function this way:
var links = Filter(xpath, xml, "ns0", "http://NameSpace.ExternalSchemas.Patient");
How do we iterate an XmlDocument using an xpath? Perhaps the XmlDocument should be an XDocument instead?

add new lines when generating xml from object

this is the code
XmlDocument xml = new XmlDocument();
XmlElement root = xml.CreateElement("customers");
xml.AppendChild(root);
foreach (var cust in customerlist)
{
XmlElement child = xml.CreateElement("customer");
child.SetAttribute("CustomerId", cust.CustomerId.ToString());
child.SetAttribute("CustomerName", cust.CustomerName);
child.SetAttribute("PhoneNumber", cust.PhoneNumber);
child.SetAttribute("Email", cust.Email);
root.AppendChild(child);
}
string s = xml.OuterXml;
I want my string to have next lines added to it instead of a single xml document
My string is coming as continuous
< x >xxxxx< /x > < x >xxxxx< /x >
You can use the XmlTextWriter class to format the XML as a string like this:
StringWriter string_writer = new StringWriter();
XmlTextWriter xml_text_writer = new XmlTextWriter(string_writer);
xml_text_writer.Formatting = Formatting.Indented;
xml.WriteTo(xml_text_writer); // xml is your XmlDocument
string formattedXml = string_writer.ToString();

C# append object to xml file using serialization

I am trying append a serialized object to an existing xml file beneath the root element, which I thought would be simple but is proving to be a little challenging.
The problem is in the AddShortcut method but I added some more code for completeness.
I believe what I need to do is:
load the file into an XmlDocument.
navigate to the node I want to append beneath (here the node name is Shortcuts).
create some type of writer and then serialize the object.
save the XmlDocument.
The trouble is in steps 2 and 3. I have tried different variations but I think using XPathNavigator somehow to find the "root" node to append under is a step in the right direction.
I have also looked at almost every question on Stack Overflow on the subject.
Any suggestions welcome. Here is my code
class XmlEngine
{
public string FullPath { get; set; } // the full path to the xmlDocument
private readonly XmlDocument xDoc;
public XmlEngine(string fullPath, string startElement, string[] rElements)
{
FullPath = fullPath;
xDoc = new XmlDocument();
CreateXmlFile(FullPath, startElement, rElements);
}
public void CreateXmlFile(string path, string startElement, string[] rElements)
{
try
{
if (!File.Exists(path))
{
// create a txt writer
XmlTextWriter wtr = new XmlTextWriter(path, System.Text.Encoding.UTF8);
// make sure the file is well formatted
wtr.Formatting = Formatting.Indented;
wtr.WriteProcessingInstruction("xml", "version='1.0' encoding='UTF-8'");
wtr.WriteStartElement(startElement);
wtr.Close();
// write the top level root elements
writeRootElements(path, rElements);
}
}
catch (Exception ex)
{
Console.WriteLine("Error: " + ex.Message);
Console.WriteLine("Could not create file: " + path);
}
}
public void AddShortcut(Shortcut s)
{
xDoc.Load(FullPath);
rootNode = xDoc.AppendChild(xDoc.CreateElement("Shortcuts"));
var serializer = new XmlSerializer(s.GetType());
using (var writer = new StreamWriter(FullPath, true))
{
XmlWriterSettings ws = new XmlWriterSettings();
ws.OmitXmlDeclaration = true;
serializer.Serialize(writer, s);
}
xDoc.Save(FullPath);
}
}
This code sample worked for me:
xml:
<?xml version="1.0" encoding="UTF-8"?>
<Launchpad>
<Shortcuts>
<Shortcut Id="1">
<Type>Folder</Type>
<FullPath>C:\SomePath</FullPath>
<Name>SomeFolderName</Name>
</Shortcut>
</Shortcuts>
</Launchpad>
Method:
public void AddShortcut(Shortcut s)
{
xDoc.Load(FullPath);
var rootNode = xDoc.GetElementsByTagName("Shortcuts")[0];
var nav = rootNode.CreateNavigator();
var emptyNamepsaces = new XmlSerializerNamespaces(new[] {
XmlQualifiedName.Empty
});
using (var writer = nav.AppendChild())
{
var serializer = new XmlSerializer(s.GetType());
writer.WriteWhitespace("");
serializer.Serialize(writer, s, emptyNamepsaces);
writer.Close();
}
xDoc.Save(FullPath);
}
load the file into an XmlDocument.
navigate to the node I want to append beneath (here the node name is Shortcuts).
create some type of writer and then serialize the object.
save the XmlDocument
So:
public void AddShortcut(Shortcut s)
{
// 1. load existing xml
xDoc.Load(FullPath);
// 2. create an XML node from object
XmlElement node = SerializeToXmlElement(s);
// 3. append that node to Shortcuts node under XML root
var shortcutsNode = xDoc.CreateElement("Shortcuts")
shortcutsNode.AppendChild(node);
xDoc.DocumentElement.AppendChild(shortcutsNode);
// 4. save changes
xDoc.Save(FullPath);
}
public static XmlElement SerializeToXmlElement(object o)
{
XmlDocument doc = new XmlDocument();
using(XmlWriter writer = doc.CreateNavigator().AppendChild())
{
new XmlSerializer(o.GetType()).Serialize(writer, o);
}
return doc.DocumentElement;
}
This post

Can not read XML document containing ampersand symbol

I am writing a program that reads a XML file with Visual C#. I have a problem reading the Xml file, because it contains invalid XML symbols, for example '&'.
I have to read the XML but I can not modify the document. How can I modify the Xml file using C#? My code so far:
private void button1_Click(object sender, EventArgs e)
{
XmlDocument doc;
doc = new XmlDocument();
doc.Load("nuevo.xml");
XmlNodeList Xpersonas = doc.GetElementsByTagName("personas");
XmlNodeList Xlista = ((XmlElement)Xpersonas[0]).GetElementsByTagName("edad");
foreach (XmlElement nodo in Xlista)
{
string edad = nodo.GetAttribute("edad");
string nombre = nodo.InnerText;
textBox1.Text = nodo.InnerXml;
}
As #EBrown suggested, one possibility would be read the file content in a string variable and replace the & symbol with the correct representation for propert XML & and then parse the XML structure. A possible solution could look like this:
var xmlContent = File.ReadAllText(#"nuevo.xml");
XmlDocument doc;
doc = new XmlDocument();
doc.LoadXml(xmlContent.Replace("&", "&"));
XmlNodeList Xpersonas = doc.GetElementsByTagName("personas");
XmlNodeList Xlista = ((XmlElement)Xpersonas[0]).GetElementsByTagName("edad");
foreach (XmlElement nodo in Xlista)
{
string edad = nodo.GetAttribute("edad");
string nombre = nodo.InnerText;
Console.WriteLine(nodo.InnerXml.Replace("&", "&"));
}
The output is:
34 & 34
If it is ok to use LINQ2XML, then the solution is even shorter, and there is no need to write the reverse(second) replace, because LINQ2XML make this for you automatically:
var xmlContent = File.ReadAllText(#"nuevo.xml");
var xmlDocument = XDocument.Parse(xmlContent.Replace("&", "&"));
var edad = xmlDocument.Root.Element("edad").Value;
Console.WriteLine(edad);
The output is the same as above.

How to load multiple files(*.xml) via XmlDocument C#

I wanted to load multiple .xml files in C#. Currently, I am able to load only 1 .xml file.
But not able to find out how will be able to load multiple files .
My code:
XmlDocument doc = new XmlDocument();
string path = #"path of *.xml file"; //
doc.Load(path);
Your code doesn't have to bee too big, you just need to store all the paths in some collection, then you have to apply the same operation to each XML file.
string newValue = "1234";
XmlDocument doc;
var paths = new[] { "config1.xml", "config2.xml" };
paths.ToList().ForEach(path =>
{
doc = new XmlDocument();
doc.Load(path);
// process the document
var nm = new XmlNamespaceManager(doc.NameTable);
var a = doc.SelectSingleNode("//SomeKeyValue", nm);
a.InnerText = newValue;
// save the file
doc.Save(path);
});

Categories