I have this example code in my c# win form...
List<string> listFurniture = new List<string>();
XDocument xml = XDocument.Load(Application.StartupPath + @"\Furniture.xml");
foreach (XElement quality in xml.Descendants("Quality"))
listFurniture.Add(quality.Value);
maintextbox.Text = listFurniture[0];
... And this example xml
<Furniture>
<Table>
<Quality>textbox1.Text + "and" + textbox2.Text + "but" + textbox3.Text</Quality>
...
</Table>
</Furniture>
My dilemma is, the maintextbox is producing the actual string "textbox1.Text", instead of the value of textbox1.
I want the xml value to be read as:
maintextbox.Text = textbox1.Text + "and" + textbox2.Text + "but" + textbox3.Text;
not as:
maintextbox.Text = "textbox1.Text + "and" + textbox2.Text + "but" + textbox3.Text";
I tried using a text file as well with StreamReader and I got the same result.
The reason for coding my project this way is because the sequence of the textboxes changes and so does the "and" and the "but". When that change happens, I wouldn't have to rewrite the code and recompile the program. I would just make the changes in xml.
The XML parsing in your solution is fine. What you need is to process the Quality strings.
string[] parts = quality.Split('+');
Regex regex = new Regex(@"^""(.*)""$");
var textBoxes = Controls.OfType<TextBox>().ToList();
for (int i = 0; i < parts.Length; i++)
{
string part = parts[i].Trim();
var match = regex.Match(part);
if (match.Success)
{
parts[i] = match.Groups[1].Value;
continue;
}
var textBox = textBoxes.FirstOrDefault(tb => tb.Name + ".Text" == part);
if (textBox != null) // possibly its an error if textbox not found
parts[i] = textBox.Text;
}
mainTextBox.Text = String.Join(" ", parts);
What happened here:
We split the quality string on + characters to get an array of string parts.
With a regular expression we check whether a part looks like something in quotes, i.e. "something". If it does, it is a literal connective word such as "and" or "but", and we keep the text between the quotes.
Otherwise, we look through all textboxes for one whose name matches the part. If one matches, we replace the part with the text from that textbox.
Finally, we join the parts to get the result string.
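The same steps can be sketched without a form, which makes them easier to try out. This is only an illustration: the QualityTemplate class and the values dictionary (mapping a reference such as textbox1.Text to its current text) are hypothetical names, not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QualityTemplate
{
    // Matches a part that is wrapped in double quotes, e.g. "and".
    private static readonly Regex Quoted = new Regex(@"^""(.*)""$");

    // 'values' is a hypothetical stand-in for the form's textboxes:
    // it maps a reference such as "textbox1.Text" to the current text.
    public static string Evaluate(string template, IDictionary<string, string> values)
    {
        string[] parts = template.Split('+');
        for (int i = 0; i < parts.Length; i++)
        {
            string part = parts[i].Trim();
            Match match = Quoted.Match(part);
            string value;
            if (match.Success)
                parts[i] = match.Groups[1].Value;   // literal: strip the quotes
            else if (values.TryGetValue(part, out value))
                parts[i] = value;                   // reference: substitute the text
            else
                parts[i] = part;                    // unknown: keep it as written
        }
        return string.Join(" ", parts);
    }
}
```

With textbox1.Text = "oak", textbox2.Text = "pine" and textbox3.Text = "cheap", evaluating the Quality string from the question gives "oak and pine but cheap". Note that a quoted literal containing a + character would be split incorrectly by this approach.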
BTW you can parse Xml in one line:
var listFurniture = xml.Descendants("Quality")
.Select(q => (string)q)
.ToList();
Update:
Since I received a comment asking me to explain the code a bit, I'll do so.
First, XML as a language is designed for structure. That structure is what provides the flexibility and power to pass data between languages or applications seamlessly. Your original question states that your textbox is showing the literal string textbox1.Text from your XML.
The XML needs to be structured; an example structure would be:
<Furniture>
<Table>
<Color> Red </Color>
<Quality> 10 </Quality>
<Material> Wood </Material>
</Table>
</Furniture>
So if you were to read your XML, it would start by finding the root tag. All other components are nodes. These nodes need to be indexed, or stepped through, to obtain the values you want represented in your textbox.
That is what this code is doing; I'll break it down step by step.
// String you will format with the XML Structure.
StringBuilder output = new StringBuilder();
The next part will be as follows:
// Create an XmlReader. Wrapping it in 'using' ensures the object is disposed of once you're done, rather than leaving the connection to your document open.
using (XmlReader reader = XmlReader.Create(new StringReader(xmlString)))
{
// Read forward to the element we are interested in (in our case Table), then move to its first attribute.
reader.ReadToFollowing("Table");
reader.MoveToFirstAttribute();
string color = reader.Value;
output.AppendLine("The color of the table " + color);
// Nothing fancy here: it reads forward to the Table element, moves to its first attribute, and appends that value to our StringBuilder. Note that MoveToFirstAttribute() only finds attributes, so the sample XML above would need e.g. a Color attribute on <Table> for this to return a value.
reader.ReadToFollowing("Material");
output.AppendLine("The material: " + reader.ReadElementContentAsString());
// Same result as we used earlier; just a different method to attain our value.
}
// Now we output our block.
OutputTextBlock.Text = output.ToString();
Now all the data is pushed into a string; obviously you can use the above code with a textbox to retrieve those values as well.
That is how you correctly read XML into your application, but you mentioned two things earlier. It sounds like you're trying to use the textbox to physically write to the document, which can be done through the XmlWriter.
The reason you keep getting the literal text is that, as far as the XML is concerned, textbox1.Text is simply the value of the node. Your structure states that this string is the value.
To achieve your goal, you would have one method to write the value to the document and another to read it, so that the data properly transitions in and out of your document and is represented correctly.
<Quality>Textbox1.Text</Quality> That doesn't allow the textbox value to automatically be read into your document and textbox. You're assigning a string value to the node. You would physically have to write the values to the document before they can be read.
MSDN has examples of how to properly parse the data; hopefully I've clarified some of the reasons why you are having your issue.
More code; straight from MSDN:
StringBuilder output = new StringBuilder();
String xmlString =
@"<?xml version='1.0'?>
<!-- This is a sample XML document -->
<Items>
<Item>test with a child element <more/> stuff</Item>
</Items>";
// Create an XmlReader
using (XmlReader reader = XmlReader.Create(new StringReader(xmlString)))
{
XmlWriterSettings ws = new XmlWriterSettings();
ws.Indent = true;
using (XmlWriter writer = XmlWriter.Create(output, ws))
{
// Parse the file and display each of the nodes.
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
writer.WriteStartElement(reader.Name);
break;
case XmlNodeType.Text:
writer.WriteString(reader.Value);
break;
case XmlNodeType.XmlDeclaration:
case XmlNodeType.ProcessingInstruction:
writer.WriteProcessingInstruction(reader.Name, reader.Value);
break;
case XmlNodeType.Comment:
writer.WriteComment(reader.Value);
break;
case XmlNodeType.EndElement:
writer.WriteFullEndElement();
break;
}
}
}
}
OutputTextBlock.Text = output.ToString();
or
StringBuilder output = new StringBuilder();
String xmlString =
@"<bookstore>
<book genre='autobiography' publicationdate='1981-03-22' ISBN='1-861003-11-0'>
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
</bookstore>";
// Create an XmlReader
using (XmlReader reader = XmlReader.Create(new StringReader(xmlString)))
{
reader.ReadToFollowing("book");
reader.MoveToFirstAttribute();
string genre = reader.Value;
output.AppendLine("The genre value: " + genre);
reader.ReadToFollowing("title");
output.AppendLine("Content of the title element: " + reader.ReadElementContentAsString());
}
OutputTextBlock.Text = output.ToString();
My current understanding of how the C# XmlReader works is that it takes a given XML file and reads it node by node when I wrap it in the following construct:
using System.Xml;
using System;
using System.Diagnostics;
...
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreComments = true;
settings.IgnoreWhitespace = true;
settings.IgnoreProcessingInstructions = true;
using (XmlReader reader = XmlReader.Create(path, settings))
{
while (reader.Read())
{
// All reader methods I call here will reference the current node
// until I move the pointer to some further node by calling methods like
// reader.Read(), reader.MoveToContent(), reader.MoveToElement() etc
}
}
Why will the following two snippets (within the above construct) produce two very different results, even though they both call the same methods?
I used this example file for testing.
(Snippet 1)
Debug.WriteLine(new string(' ', reader.Depth * 2) + "<" + reader.NodeType.ToString() + "|" + reader.Name + ">" + reader.ReadString() + "</>");
vs
(Snippet 2)
string xmlcontent = reader.ReadString();
string xmlname = reader.Name.ToString();
string xmltype = reader.NodeType.ToString();
int xmldepth = reader.Depth;
Debug.WriteLine(new string(' ', xmldepth * 2) + "<" + xmltype + "|" + xmlname + ">" + xmlcontent + "</>");
Output of Snippet 1:
<XmlDeclaration|xml></>
<Element|rss></>
<Element|head></>
<Text|>Test Xml File</>
<Element|description>This will test my xml reader</>
<EndElement|head></>
<Element|body></>
<Element|g:id>1QBX23</>
<Element|g:title>Example Title</>
<Element|g:description>Example Description</>
<EndElement|item></>
<Element|item></>
<Text|>2QXB32</>
<Element|g:title>Example Title</>
<Element|g:description>Example Description</>
<EndElement|item></>
<EndElement|body></>
<EndElement|xml></>
<EndElement|rss></>
Yes, this is formatted as it was in my output window. As you can see, it skipped certain elements and output a wrong depth for a few others. However, the NodeTypes are correct, unlike Snippet Number 2, which outputs:
<XmlDeclaration|xml></>
<Element|xml></>
<Element|title></>
<EndElement|title>Test Xml File</>
<EndElement|description>This will test my xml reader</>
<EndElement|head></>
<Element|item></>
<EndElement|g:id>1QBX23</>
<EndElement|g:title>Example Title</>
<EndElement|g:description>Example Description</>
<EndElement|item></>
<Element|g:id></>
<EndElement|g:id>2QXB32</>
<EndElement|g:title>Example Title</>
<EndElement|g:description>Example Description</>
<EndElement|item></>
<EndElement|body></>
<EndElement|xml></>
<EndElement|rss></>
Once again, the depth is messed up, but it's not as critical as with Snippet Number 1. It also skipped some elements and assigned wrong NodeTypes.
Why can't it output the expected result? And why do these two snippets produce two totally different outputs with different depths, NodeTypes and skipped nodes?
I'd appreciate any help on this. I searched a lot for any answers on this but it seems like I'm the only one experiencing these issues. I'm using the .NET Framework 4.6.2 with Asp.net Web Forms in Visual Studio 2017.
Firstly, you are using a method XmlReader.ReadString() that is deprecated:
XmlReader.ReadString Method
... reads the contents of an element or text node as a string. However, we recommend that you use the ReadElementContentAsString method instead, because it provides a more straightforward way to handle this operation.
However, beyond warning us off the method, the documentation doesn't precisely specify what it actually does. To determine that, we need to go to the reference source:
public virtual string ReadString() {
if (this.ReadState != ReadState.Interactive) {
return string.Empty;
}
this.MoveToElement();
if (this.NodeType == XmlNodeType.Element) {
if (this.IsEmptyElement) {
return string.Empty;
}
else if (!this.Read()) {
throw new InvalidOperationException(Res.GetString(Res.Xml_InvalidOperation));
}
if (this.NodeType == XmlNodeType.EndElement) {
return string.Empty;
}
}
string result = string.Empty;
while (IsTextualNode(this.NodeType)) {
result += this.Value;
if (!this.Read()) {
break;
}
}
return result;
}
This method does the following:
If the current node is an empty element node, return an empty string.
If the current node is an element that is not empty, advance the reader.
If the now-current node is the end of the element, return an empty string.
While the current node is a text node, add the text to a string and advance the reader. As soon as the current node is not a text node, return the accumulated string.
Thus we can see that this method is designed to advance the reader. We can also see that, given mixed-content XML like <head>text <b>BOLD</b> more text</head>, ReadString() will only partially read the <head> element, leaving the reader positioned on <b>. This oddity is likely why Microsoft deprecated the method.
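A small sketch makes the partial read visible. It uses the mixed-content example from the paragraph above; the ReadStringDemo class name is made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReadStringDemo
{
    // Returns what ReadString() produced, plus where the reader ended up.
    public static string Run()
    {
        string xml = "<head>text <b>BOLD</b> more text</head>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();              // positions the reader on <head>
            string text = reader.ReadString();   // reads only the leading text node
            return "[" + text + "] then " + reader.NodeType + " " + reader.Name;
        }
    }
}
```

Run() should return "[text ] then Element b": only the text before <b> was read, and the reader is left on the <b> element rather than at the end of <head>.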
We can also see why your two snippets function differently. In the first, you get reader.Depth and reader.NodeType before calling ReadString() and advancing the reader. In the second you get these properties after advancing the reader.
Since your intent is to iterate through the nodes and get the value of each, rather than ReadString() or ReadElementContentAsString() you should just use XmlReader.Value:
gets the text value of the current node.
Thus your corrected code should look like:
string xmlcontent = reader.Value;
string xmlname = reader.Name.ToString();
string xmltype = reader.NodeType.ToString();
int xmldepth = reader.Depth;
Console.WriteLine(new string(' ', xmldepth * 2) + "<" + xmltype + "|" + xmlname + ">" + xmlcontent + "</>");
XmlReader is tricky to work with. You always need to check the documentation to determine exactly where a given method positions the reader. For instance, XmlReader.ReadElementContentAsString() moves the reader past the end of the element, whereas XmlReader.ReadSubtree() moves the reader to the end of the element. But as a general rule any method named Read is going to advance the reader, so you need to be careful using a Read method inside an outer while (reader.Read()) loop.
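To illustrate the last point, here is a minimal sketch (the ReaderLoopDemo class and its tiny sample document are invented for this example) that compares a loop which always calls Read() with one that only calls Read() when no other method has already moved the reader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ReaderLoopDemo
{
    private const string Xml = "<r><a>1</a><b>2</b></r>";

    // Risky: ReadElementContentAsString() already moves past </a>,
    // then the loop's Read() advances again and steps over <b>.
    public static List<string> Naive()
    {
        var found = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(Xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name != "r")
                    found.Add(reader.Name + "=" + reader.ReadElementContentAsString());
            }
        }
        return found;
    }

    // Safer: call Read() only when nothing else has advanced the reader.
    public static List<string> Careful()
    {
        var found = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(Xml)))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name != "r")
                    found.Add(reader.Name + "=" + reader.ReadElementContentAsString());
                else
                    reader.Read();
            }
        }
        return found;
    }
}
```

On this input Naive() finds only a=1, while Careful() finds both a=1 and b=2. The sample document has no whitespace between elements; with indented XML the naive loop can appear to work, because the extra Read() then consumes a whitespace node instead of an element.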
I have an XML file that I loop through; when a child node matches, I get the value of an attribute. The difficulty is matching these values when they contain a * or ? character, in a regex-like style. Can someone tell me how to do this? For example, if a request comes in as g.portal.com, it should match the second node. I am using .NET 2.0.
Below is my XML file
<Test>
<Test Text="portal.com" Sample="1" />
<Test Text="*.portal.com" Sample="201309" />
<Test Text="portal-0?.com" Sample="201309" />
</Test>
XmlDocument xDoc = new XmlDocument();
xDoc.Load(PathToXMLFile);
foreach (XmlNode node in xDoc.DocumentElement.ChildNodes)
{
if (node.Attributes["Sample"].InnerText == value)
{
}
}
What you need to do is first convert each Text attribute into a valid Regex pattern and then use it to match your input. Something like this:
string input = "g.portal.com";
XmlNode foundNode = null;
foreach (XmlNode node in xDoc.DocumentElement.ChildNodes)
{
string value = node.Attributes["Text"].Value;
string pattern = Regex.Escape(value)
.Replace(@"\*", ".*")
.Replace(@"\?", ".");
if (Regex.IsMatch(input, "^" + pattern + "$"))
{
foundNode = node;
break; //remove if you want to continue searching
}
}
After executing the above code, foundNode should contain the second node from the xml file.
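The conversion step can also be pulled out into a small helper so it is easy to check against the three patterns from the question. The WildcardMatcher class below is a hypothetical name, and it assumes * means "any run of characters" and ? means "exactly one character".

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardMatcher
{
    // Escapes the pattern first, so that '.' and other regex metacharacters
    // are matched literally, then turns the escaped wildcards back into
    // their regex equivalents and anchors the result.
    public static string ToRegexPattern(string wildcard)
    {
        return "^" + Regex.Escape(wildcard)
                          .Replace(@"\*", ".*")
                          .Replace(@"\?", ".") + "$";
    }

    public static bool IsMatch(string input, string wildcard)
    {
        return Regex.IsMatch(input, ToRegexPattern(wildcard));
    }
}
```

For example, IsMatch("g.portal.com", "*.portal.com") is true, IsMatch("g.portal.com", "portal.com") is false, and IsMatch("portal-01.com", "portal-0?.com") is true.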
So you have an XML file that sets up patterns, right? You'll want to feed those patterns into Regexes and then stream a number of requests through them. Did I get that correct?
Assuming the XML file doesn't change, it only needs to be processed into the corresponding Regexes once. For example *.portal.com would translate to
new Regex("\\w+\\.portal\\.com");
You'll just have to escape the dots, and replace * with \\w+ and ? with \\w, if I guessed the semantics of your match patterns correctly.
Look up the correct replacements at http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
I have here a small code:
string attributeValue = "Hello" + Environment.NewLine + " Hello 2";
XElement element = new XElement("test");
XElement subElement = new XElement("subTest");
XAttribute attribute = new XAttribute("key", "Hello");
XAttribute attribute2 = new XAttribute("key2", attributeValue);
subElement.Add(attribute);
subElement.Add(attribute2);
element.Add(subElement);
Console.Write(element.ToString());
Console.ReadLine();
I have an issue: the \r\n (the new line) is converted to &#xD;&#xA; in the attribute, but I don't want that. I want to keep it as \r\n, because when I use this XML with a Microsoft Word document template, the new lines are not applied. Although the text is multi-line, in the Word document I only get spaces, no new lines. :/
Anyone has any idea?
This happens even though I've set the "allow multi-line" property of the field in the template.
Actually, the behaviour you get with &#xD;&#xA; is the same as that of Environment.NewLine. You can do a simple test to confirm this (add two TextBoxes to your Form with the Multiline property set to True: textBox1 and textBox2):
textBox1.Text = element.ToString(); // &#xD;&#xA;
string text = element.ToString().Replace("&#xD;&#xA;", Environment.NewLine);
textBox2.Text = text; // \r\n
On the other hand, if you want to avoid the &#xD;&#xA; part anyway (for example, because you want to output the given string to an external program that does not run on .NET), you can just rely on the aforementioned Replace after dealing with XElement and new lines.
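A short sketch can confirm that the character references &#xD;&#xA; are only the serialized form of the line break, and that the original characters come back when the XML is parsed again. The AttributeNewLineDemo class is a made-up name, and the sketch uses an explicit "\r\n" instead of Environment.NewLine so that it behaves the same on every platform.

```csharp
using System;
using System.Xml.Linq;

public static class AttributeNewLineDemo
{
    // Builds the same shape of XML as in the question and serializes it.
    public static string Serialize(string value)
    {
        var element = new XElement("test",
            new XElement("subTest", new XAttribute("key2", value)));
        return element.ToString();
    }

    // Parses the serialized XML again and returns the attribute value.
    public static string RoundTrip(string value)
    {
        XElement parsed = XElement.Parse(Serialize(value));
        return parsed.Element("subTest").Attribute("key2").Value;
    }
}
```

Serialize("Hello\r\n Hello 2") produces markup containing &#xD;&#xA;, while RoundTrip("Hello\r\n Hello 2") returns the original string with a real carriage return and line feed. A consumer that parses the XML therefore sees the line break; only a consumer that treats the markup as plain text sees the character references.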
I have an XML document that I'm trying to import using XmlTextReader in C#. My code works well except for one case: when the tag is not on the same line as the actual text/content, for example with product_name:
<product>
<sku>27939</sku>
<product_name>
Sof-Therm Warm-Up Jacket
</product_name>
<supplier_number>ALNN1064</supplier_number>
</product>
My code to try to sort the XML document is as such:
while (reader.Read())
{
switch (reader.Name)
{
case "sku":
newEle = new XMLElement();
newEle.SKU = reader.ReadString();
break;
case "product_name":
newEle.ProductName = reader.ReadString();
break;
case "supplier_number":
newEle.SupplierNumber = reader.ReadString();
products.Add(newEle);
break;
}
}
I have tried almost everything I found in the XmlTextReader documentation
reader.MoveToElement();
reader.MoveToContent();
reader.MoveToNextAttribute();
and a couple others that made less sense, but none of them seem to be able to consistently deal with this issue. Obviously I could fix this one case, but then it would break the regular cases. So my question is, would there be a way to have it after I find the "product_name" tag to go to the next line that contains text and extract it?
I should have mentioned, I am outputting it to an HTML table after and the element is coming up blank so I'm fairly certain it is not reading it correctly.
Thanks in advance!
I think you will find Linq To Xml easier to use
var xDoc = XDocument.Parse(xmlstring); //or XDocument.Load(filename);
int sku = (int)xDoc.Root.Element("sku");
string name = (string)xDoc.Root.Element("product_name");
string supplier = (string)xDoc.Root.Element("supplier_number");
You can also convert your xml to dictionary
var dict = xDoc.Root.Elements()
.ToDictionary(e => e.Name.LocalName, e => (string)e);
Console.WriteLine(dict["sku"]);
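For the product_name case from the question, the casts above still return the surrounding line breaks and indentation, so the value usually needs trimming. A minimal sketch, assuming each <product> element is available as an XElement (the ProductParser class and ChildText helper are hypothetical names):

```csharp
using System;
using System.Xml.Linq;

public static class ProductParser
{
    // Returns the trimmed text of a direct child element,
    // or null if the child element is missing.
    public static string ChildText(XElement parent, string name)
    {
        string raw = (string)parent.Element(name);
        return raw == null ? null : raw.Trim();
    }
}
```

Usage: after XElement product = XElement.Parse(xmlstring), calling ProductParser.ChildText(product, "product_name") returns "Sof-Therm Warm-Up Jacket" for the sample in the question, whether or not the text sits on its own line.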
It looks like you may need to remove the carriage returns, line feeds, tabs, and spaces before and after the text in the XML element. In your example, you have
<!-- 1. Original example -->
<product_name>
Sof-Therm Warm-Up Jacket
</product_name>
<!-- 2. It should probably be. If possible correct the XML generator. -->
<product_name>Sof-Therm Warm-Up Jacket</product_name>
<!-- 3a. If white space is important, then preserve it -->
<product_name xml:space='preserve'>
Sof-Therm Warm-Up Jacket
</product_name>
<!-- 3b. If White space is important, use CDATA -->
<product_name><![CDATA[
Sof-Therm Warm-Up Jacket
]]></product_name>
The XmlTextReader has a WhitespaceHandling property, but when I tested it, it still included the returns and indentation:
reader.WhitespaceHandling = WhitespaceHandling.None;
An option is to use a method to remove the extra characters while you are parsing the document. This method removes the normal white space at the beginning and end of a string:
string TrimCrLf(string value)
{
return Regex.Replace(value, @"^[\r\n\t ]+|[\r\n\t ]+$", "");
}
// Then in your loop...
case "product_name":
// Trim the contents of the 'product_name' element to remove extra returns
newEle.ProductName = TrimCrLf(reader.ReadString());
break;
You can also use this method, TrimCrLf(), with Linq to Xml and the traditional XmlDocument. You can even make it an extension method:
public static class StringExtensions
{
public static string TrimCrLf(this string value)
{
return Regex.Replace(value, @"^[\r\n\t ]+|[\r\n\t ]+$", "");
}
}
// Use it like:
newEle.ProductName = reader.ReadString().TrimCrLf();
Regular expression explanation:
^ = Beginning of field
$ = End of field
[]+= Match 1 or more of any of the contained characters
\r = carriage return (0x0D / 13)
\n = line feed (0x0A / 10)
\t = tab (0x09 / 9)
' '= space (0x20 / 32)
I have run into a similar problem before when dealing with text that originated on a Mac platform due to reversed \r\n in newlines. Suggest you try Ryan's regex solution, but with the following regex:
"^[\r\n]+|[\r\n]+$"
In an XmlDocument, either when writing it or when modifying it later, is it possible to remove the self-closing tags (i.e. />) for certain elements?
For example: change
<img /> or <img></img> to <img>.
<br /> to <br>.
Why, you ask? I'm trying to conform to the HTML for Word 2007 schema; the resulting HTML will be displayed in Microsoft Outlook 2007 or later.
After reading another StackOverflow question, I tried setting the IsEmpty property to false, like so.
var imgElements = finalHtmlDoc.SelectNodes("//*[local-name()=\"img\"]").OfType<XmlElement>();
foreach (var element in imgElements)
{
element.IsEmpty = false;
}
However, that resulted in <img /> becoming <img></img>. Also, as a hack, I tried changing the OuterXml property directly, but that doesn't work (I didn't expect it to).
Question
Can you remove the self-closing tags from an XmlDocument? I honestly do not think you can, as the result would be invalid XML (no closing tag), but I thought I would put the question to the community.
Update:
I ended up fixing the HTML string after exporting from the XmlDocument using a regular expression (written in the wonderful RegexBuddy).
var fixHtmlRegex = new Regex("<(?<tag>meta|img|br)(?<attributes>.*?)/>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
return fixHtmlRegex.Replace(htmlStringBuilder.ToString(), "<$1$2>");
It cleared many errors from the validation pass and allowed me to focus on the real compatibility problems.
You're right: it's not possible simply because it's invalid (or rather, not well-formed) XML. Empty elements in XML must be closed, be it with the shortcut syntax /> or with an immediate closing tag.
Both HTML and XML are applications of SGML. While HTML and SGML allow unclosed tags like <br>, XML does not.
A bit embarrassed by my answer, but it worked for what I needed. After you have a complete xml document you can string manipulate it to clean it up...
private string RemoveSelfClosingTags(string xml)
{
char[] seperators = { ' ', '\t', '\r', '\n' };
int prevIndex = -1;
while (xml.Contains("/>"))
{
int selfCloseIndex = xml.IndexOf("/>");
if (prevIndex == selfCloseIndex)
return xml; // we are in a loop...
prevIndex = selfCloseIndex;
// Find the '<' that opens this tag by searching backwards from "/>".
int tagStartIndex = xml.LastIndexOf('<', selfCloseIndex);
// The tag name ends at the first whitespace, or at "/>" if the tag has no attributes.
int tagEndIndex = xml.IndexOfAny(seperators, tagStartIndex);
if (tagEndIndex < 0 || tagEndIndex > selfCloseIndex)
tagEndIndex = selfCloseIndex;
string tag = xml.Substring(tagStartIndex + 1, tagEndIndex - tagStartIndex - 1);
xml = xml.Substring(0, selfCloseIndex) + "></" + tag + ">" + xml.Substring(selfCloseIndex + 2);
}
return xml;
}
<img> would not be valid XML, so no, you can't do this.