I would like some advice on working with XML data in C#.
I have a small practice exercise in which I need to retrieve the text value of a specific tag.
I have assigned the names of the element nodes to string values. The user enters a string at the console, and if it matches the name of a tag, I want to retrieve the text value of that tag.
This is the C# code I have so far, but I am not sure how to retrieve the text value at the named tag.
int priceSpecific;
string destination;
ArrayList array = new ArrayList();
xRootNode = xdoc.DocumentElement;
string firstValue = xRootNode.FirstChild.FirstChild.Name;
string secondValue = xRootNode.FirstChild.FirstChild.NextSibling.Name;
string thirdValue = xRootNode.FirstChild.FirstChild.NextSibling.NextSibling.Name;
string fourthValue = xRootNode.FirstChild.FirstChild.NextSibling.NextSibling.NextSibling.Name;
array.AddRange(new object[] { firstValue, secondValue, thirdValue, fourthValue});
Console.WriteLine("Please enter your destination, first letter capital");
destination = Console.ReadLine();
The idea is to loop through the arraylist and retrieve the name of the element node that is the same as the string input of the user.
Any advice as to how to retrieve the text value?
Regards
That is some pretty nasty-looking code there! I would recommend that you spend a few hours learning about LINQ to XML. Roughly speaking, if you want to find the value of an element with a given name, it can be done as follows:
string elementName = "foo";
XDocument doc = XDocument.Parse("<xml document goes here>");
string matchedValue = doc.Descendants(elementName).Single().Value;
Much simpler!
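To tie this back to the question, here is a minimal sketch of looking up an element by a name typed at the console. The `ElementLookup` class and `FindValue` method are hypothetical names, and the sample XML in the usage note is an assumption, since the asker's document is not shown. It matches on `Name.LocalName` and uses `FirstOrDefault` instead of `Single()`, so an unknown or oddly formatted input returns null rather than throwing.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ElementLookup
{
    // Returns the text of the first element whose local name equals elementName,
    // or null when the document contains no such element.
    public static string FindValue(XDocument doc, string elementName)
    {
        XElement match = doc.Descendants()
                            .FirstOrDefault(e => e.Name.LocalName == elementName);
        return match == null ? null : match.Value;
    }
}
```

With a document such as `<prices><Paris>120</Paris><Rome>95</Rome></prices>`, calling `ElementLookup.FindValue(doc, Console.ReadLine())` would return the text of whichever element the user named.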
You can use several approaches; the most useful in your scenario seem to be:
1. XmlDocument + XPath (supported in all .NET versions)
2. XmlReader (supported in all .NET versions)
3. XDocument (LINQ to XML, supported since .NET 3.5)
4. XDocument with LINQ query syntax
Choices 3 and 4 are preferred if .NET 3.5 or above is available and the XML document is not too big (a document size of several MB is the boundary).
Choice 1 uses XPath, which allows very powerful queries into the document structure.
1.
XPathDocument document = new XPathDocument(@"myFile.xml");
XPathNavigator navigator = document.CreateNavigator();
string foundElementContent =
navigator.SelectSingleNode("//myElement[position()=1]/text()").ToString();
2.
string elementNameToFind = "myElement";
XmlReader xmlReader = XmlReader.Create(@"myFile.xml");
string foundElementContent = string.Empty;
while (xmlReader.Read())
{
if(xmlReader.NodeType==XmlNodeType.Element &&
xmlReader.Name == elementNameToFind)
{
foundElementContent=xmlReader.ReadInnerXml();
break;
}
}
xmlReader.Close();
3.
string elementNameToFind = "myElement";
XDocument xmlInMemoryDoc = XDocument.Load(@"myFile.xml");
XElement foundElement = xmlInMemoryDoc.Descendants(elementNameToFind).First();
4.
string elementNameToFind = "myElement";
XDocument xmlInMemoryDoc = XDocument.Load(@"myFile.xml");
XElement foundElement =
(
from e in xmlInMemoryDoc.Descendants()
where e.Name == elementNameToFind
select e
).First();
I have a requirement to edit and save an XML file within a C# project: specifically, to replace a piece of text in the XML with a value from the arguments and save the result in a temp location. The value is replaced and the file is saved, but some extra characters, [], are added, and when I use the XML as input to another application it is reported as incorrect XML! Even when I remove the extra characters, save, and rerun, it shows the same error. However, when I remove the extra characters and paste the whole XML into a new file, it works fine! I don't understand what the issue is. I have pasted my code below:
{
parameterFileName = "test";
tempPath = Path.GetTempPath() + parameterFileName + DateTime.Now.ToString("dd-MM-yyyy_hh-mm-ss") + ".xml";
XmlDocument xdoc = GetParameterXML(parameterFileName);
XmlNode root = xdoc.DocumentElement;
XmlNode node = xdoc.DocumentElement.SelectSingleNode(@"/root/inputParameters");
XmlNode childNode = node.ChildNodes[0];
if (childNode is XmlCDataSection)
{
XmlCDataSection cdataSection = childNode as XmlCDataSection;
if (cdataSection.Value.Contains("ID_VALUE"))
{
cdataSection.Value = cdataSection.Value.Replace("ID_VALUE", id);
}
}
xdoc.Save(tempPath);
}
public static XmlDocument GetParameterXML(string parameterFileName)
{
var sDllPath = AppDomain.CurrentDomain.BaseDirectory;
XmlDocument xDoc = new XmlDocument();
xDoc.Load(sDllPath + "\\Templates\\" + parameterFileName + ".xml");
return xDoc;
}
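When you load an XML document that has a DTD using XmlDocument, an empty internal subset, written as square brackets [], is automatically inserted into the DOCTYPE when the document is saved. To avoid this, replace the document type with one whose internal subset is null: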
public static XmlDocument GetParameterXML(string parameterFileName)
{
var sDllPath = AppDomain.CurrentDomain.BaseDirectory;
XmlDocument xDoc = new XmlDocument();
xDoc.Load(sDllPath + "\\Templates\\" + parameterFileName + ".xml");
if (xDoc.DocumentType != null)
{
var name = xDoc.DocumentType.Name;
var publicId = xDoc.DocumentType.PublicId;
var systemId = xDoc.DocumentType.SystemId;
var parent = xDoc.DocumentType.ParentNode;
var documentTypeWithNullInternalSubset = xDoc.CreateDocumentType(name, publicId, systemId, null);
parent.ReplaceChild(documentTypeWithNullInternalSubset, xDoc.DocumentType);
}
return xDoc;
}
Does it matter?
No, this does not matter; the XML is still well-formed. If your XML doesn't contain an internal subset, it is represented as empty square brackets [], which simply means that the document has no internal subset.
When a document with no internal subset is parsed with XDocument, XDocument appends empty square brackets [] to the DOCTYPE instead of displaying nothing.
What does an empty internal subset do?
The basic purpose of an internal entity is to avoid typing the same content (such as the name of the organization) again and again. Instead, you can define an internal entity that contains the text and then use the entity wherever you want to insert the text. Because the entity is expanded by the parser, you can be assured that you'll get the same text in every location. The parser will also catch a misspelled entity name.
You can read more about Internal Subset here
I'm currently using a loop which gives me a variable, which then needs to be fed into an XPath method to get any nodes with an attribute equal to my variable. So far, I've learned that XPath allows you to select a node from the XML document using
root.SelectNodes("Element[@Attribute='SpecificValue']")
However, I'd like to know if there's a way I can insert a predefined variable where the specific value is, so I can grab a different set of nodes with each iteration of my loop.
For example something like this:
string attribValue = "test";
root.SelectNodes("Element[@Attribute = attribValue]")
Use string formatting:
string attribValue = "test";
string expression = String.Format("Element[@Attribute = '{0}']", attribValue);
root.SelectNodes(expression);
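Applied to the loop from the question, this might look like the sketch below. The `Element` and `Attribute` names come from the question; the `AttributeQuery` class and its methods are hypothetical. It assumes the values contain no single quote, because XPath 1.0 string literals have no escape sequence, and it simply rejects values that do.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class AttributeQuery
{
    // Builds "Element[@Attribute='value']" for a single value.
    public static string BuildExpression(string attribValue)
    {
        if (attribValue.Contains("'"))
            throw new ArgumentException("Value must not contain a single quote.", "attribValue");
        return String.Format("Element[@Attribute='{0}']", attribValue);
    }

    // Runs one XPath query per value and records how many nodes each one matched.
    public static Dictionary<string, int> CountMatches(XmlNode root, IEnumerable<string> values)
    {
        var counts = new Dictionary<string, int>();
        foreach (string value in values)
        {
            XmlNodeList nodes = root.SelectNodes(BuildExpression(value));
            counts[value] = nodes.Count;
        }
        return counts;
    }
}
```

In real code the loop body would work with `nodes` directly; counting is only there to keep the sketch easy to check.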
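Using LINQ to XML:
XDocument doc = XDocument.Load("file.xml"); // placeholder path; load your own document here
XElement root = doc.Root;
string attribValue = "test";
// The (string) cast yields null instead of throwing when the attribute is missing.
var results = root.Descendants("Element").Where(x => (string)x.Attribute("Attribute") == attribValue).ToList();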
This is not a homework; I need this for my unit tests.
Sample input: <rows><row><a>1234</a><b>Hello</b>...</row><row>...</rows>.
Sample output: <rows><row><a>0.0</a><b>0.0</b>...</row><row>...</rows>.
You may assume that the document starts with <rows> and that parent node has children named <row>. You do not know the name of nodes a, b, etc.
For extra credit: how to make this work with an arbitrary well-formed, "free-form" XML?
I have tried this with a regex :) without luck. I could make it "non-greedy on the right", but not on the left. Thanks for your help.
EDIT: Here is what I tried:
private static string ReplaceValuesWithZeroes(string gridXml)
{
Assert.IsTrue(gridXml.StartsWith("<row>"), "Xml representation must start with '<row>'.");
Assert.IsTrue(gridXml.EndsWith("</row>"), "Xml representation must end with '</row>'.");
gridXml = "<deleteme>" + gridXml.Trim() + "</deleteme>"; // Fake parent.
var xmlDoc = XDocument.Parse(gridXml);
var descendants = xmlDoc.Root.Descendants("row");
int rowCount = descendants.Count();
for (int rowNumber = 0; rowNumber < rowCount; rowNumber++)
{
var row = descendants.ElementAt(0);
Assert.AreEqual<string>(row.Value /* Does not work */, String.Empty, "There should be nothing between <row> and </row>!");
Assert.AreEqual<string>(row.Name.ToString(), "row");
var rowChildren = row.Descendants();
foreach (var child in rowChildren)
{
child.Value = "0.0"; // Does not work.
}
}
// Not the most efficient but still fast enough.
return xmlDoc.ToString().Replace("<deleteme>", String.Empty).Replace("</deleteme>", String.Empty);
}
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
foreach (XmlElement el in doc.SelectNodes("//*[not(*)]"))
el.InnerText = "0.0";
xml = doc.OuterXml;
or to be more selective about non-empty text nodes:
foreach (XmlText el in doc.SelectNodes("//text()[.!='']"))
el.InnerText = "0.0";
XDocument xml = XDocument.Load(myXmlFile);
foreach (var element in xml.Descendants("row").SelectMany(r => r.Elements()))
{
element.Value = "0.0";
}
Note that this general search using Descendants("row") is not very efficient, but it satisfies the 'arbitrary format' requirement.
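For the "extra credit" case of arbitrary well-formed XML, one possible sketch is to overwrite every leaf element, meaning every element that has no child elements, regardless of its name. The `XmlZeroer` class and `ReplaceLeafValues` method are hypothetical names. Note that under this approach empty elements such as `<c/>` also receive the value "0.0".

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlZeroer
{
    // Replaces the content of every leaf element (one with no child elements) with "0.0".
    public static string ReplaceLeafValues(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        // ToList() takes a snapshot of the leaves before any of them is modified.
        foreach (XElement leaf in doc.Descendants().Where(e => !e.HasElements).ToList())
        {
            leaf.Value = "0.0";
        }
        // DisableFormatting keeps the output on one line, like the sample in the question.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```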
You should take a look at HTML Agility Pack. It allows you to treat HTML documents as well-formed XML, so you can parse them and change values.
I think you can use the Regex.Replace method in C#. I used the regex below to replace all the XML element values:
[>]+[a-zA-Z0-9]+[<]+
This will basically match text that starts with '>', continues with letters or digits, and ends with '<'.
I was able to use this successfully in Notepad++. You can write a small program as well using this.
I've got a problem which I've been trying to solve for almost a week now, but unfortunately it seems that I can't manage it by myself.
Maybe somebody could help me.
I’ve got this type of source XML:
<data>
<para1>24499</para1>
<para2>32080148</para2>
<para4>20e500cc6008d0f8ab1fd108b220ca261f85edd9</para4>
<para6></para6>
<timetype>4</timetype>
<fkcontent>964342</fkcontent>
<season>0</season>
<fmstoken><![CDATA[7bca3c544ad64e526806fb5a6b845148]]></fmstoken>
<fmstoken_user>32010484</fmstoken_user>
<fmstoken_time>1283165972</fmstoken_time>
<fmstoken_renew><![CDATA[http://www.sky.com/logic/fmstoken.php?method=refresh]]></fmstoken_renew>
<adserverXML><![CDATA[http://www.sky.de/dummy.xml]]></adserverXML>
<playlist>
<videoinfo quality="0" name="DSL 1000">
<id>24499</id>
<noad>1</noad>
<productplacement>0</productplacement>
<filename>http://www.sky.com/video/1/V_53511_BB00_E81016_46324_16x9-lq-512x288-vp6-c0_bbb491b3ce64ef667340a21e2bfb3594.f4v</filename>
<title><![CDATA[Who will be the winner?]]></title>
</videoinfo>
<videoinfo quality="1" name="DSL 2000">
<id>24499</id>
<noad>1</noad>
<productplacement>0</productplacement>
<filename>http://www.sky.de/video/1/V_53513_BB00_E81016_46324_16x9-hq-512x288-vp6-c0_fa948bc5429cf28455779666cc59cf5e.f4v</filename>
<title><![CDATA[Who will be the winner?]]></title>
</videoinfo>
</playlist>
</data>
And here are parts of the code that let me get required tag content from xml page above:
private static string getTagContent(string source, string tag)
{
string fullTagBegin = "<" + tag + ">";
string fullTagEnd = "</" + tag + ">";
int indexBegin = source.IndexOf(fullTagBegin) + fullTagBegin.Length;
int indexEnd = source.IndexOf(fullTagEnd);
int indexLength = indexEnd - indexBegin;
if (indexBegin == -1 || indexEnd == -1)
return "UNKNOWN";
return source.Substring(indexBegin, indexLength);
}
public static void Start(String url)
{
try
{
String urlXML = url;
WebClient wClient = new WebClient();
string sourceXML = wClient.DownloadString(urlXML);
sourceXML = sourceXML.Replace("]]>", "");
sourceXML = sourceXML.Replace("<![CDATA[", "");
String para1 = getTagContent(sourceXML, "para1");
String para2 = getTagContent(sourceXML, "para2");
String para4 = getTagContent(sourceXML, "para4");
String timetype = getTagContent(sourceXML, "timetype");
String fkcontent = getTagContent(sourceXML, "fkcontent");
String season = getTagContent(sourceXML, "season");
String fmstoken = getTagContent(sourceXML, "fmstoken");
String fmstoken_user = getTagContent(sourceXML, "fmstoken_user");
String fmstoken_time = getTagContent(sourceXML, "fmstoken_time");
String fmstoken_renew = getTagContent(sourceXML, "fmstoken_renew");
String filename = getTagContent(sourceXML, "filename").Replace("http://", "");
String title = System.Text.RegularExpressions.Regex.Replace(getTagContent(sourceXML, "title"), @"[^a-zA-Z0-9]","_");
The problem is:
Everything works fine except that there are two "filename" and two "title" tags in the source XML, and I need only the second ones, those under this line:
<videoinfo quality="1" name="DSL 2000">,
and I need to somehow skip/ignore the first ones, which are above that line and right under this line:
<videoinfo quality="0" name="DSL 1000">
I can't figure out how to do that.
(My only guess is that maybe it has something to do with XPathNavigator, but I’m not sure if that’s a right guess, and anyway, I don’t really understand how to use it properly).
Edit: problem solved.
I want to thank everyone who replied for your suggestions.
Really appreciated!
This is really not the right way to go about working with XML in .Net.
You didn't mention which version of .NET you are developing for. Depending on the version, look into using XmlDocument or XDocument / LINQ to XML.
MSDN on LINQ to XML
MSDN on XmlDocument
You should really load the XML into an XmlDocument object and then edit it.
But if you prefer to keep your existing code, this quick-and-dirty code should do the trick.
int indexBegin = source.LastIndexOf(fullTagBegin) + fullTagBegin.Length;
int indexEnd = source.LastIndexOf(fullTagEnd);
This will move the indexes to the last occurrence of whatever tag you're looking for. Just replace your declarations with these.
Edit: Additionally, you can use these few lines to find/manipulate your XML in a much cleaner way.
XmlDocument doc = new XmlDocument();
doc.Load(filename);
// or doc.LoadXml(fullXMLcode);
var elements = doc.GetElementsByTagName("title");
var element = elements.Item(elements.Count - 1); // returns the last element
// element.InnerText gets the value you need. You can use this property to change it, too
Hope this helps.
You need this XPath expression:
/data/playlist/videoinfo[2]/filename | /data/playlist/videoinfo[2]/title
Or
/data/playlist/videoinfo[2]/*[self::filename or self::title]
These expressions return a node set with the filename and title elements in document order.
In C# (I'm not an expert):
XPathDocument doc = new XPathDocument("document.xml");
XPathNodeIterator nodeset = doc.CreateNavigator()
.Select("/data/playlist/videoinfo[2]/*[self::filename or self::title]");
foreach (XPathNavigator node in nodeset)
{
// Your code
}
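Selecting by position breaks if the order of the `videoinfo` elements ever changes. A variant that selects by the `quality` attribute instead could be sketched with `XmlDocument` as follows; the `PlaylistReader` class and `GetVideoInfoValue` method are hypothetical names, and the XPath assumes the document structure shown in the question.

```csharp
using System.Xml;

public static class PlaylistReader
{
    // Returns the text of <childName> inside the videoinfo element with the given
    // quality attribute, or null when no such element exists.
    public static string GetVideoInfoValue(XmlDocument doc, string quality, string childName)
    {
        string xpath = string.Format(
            "/data/playlist/videoinfo[@quality='{0}']/{1}", quality, childName);
        XmlNode node = doc.SelectSingleNode(xpath);
        // InnerText also returns the content of CDATA sections, so no string stripping is needed.
        return node == null ? null : node.InnerText;
    }
}
```

After `doc.LoadXml(sourceXML)`, a call such as `PlaylistReader.GetVideoInfoValue(doc, "1", "title")` would return the title of the "DSL 2000" entry without removing the CDATA markers by hand.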
As many people have already said, XPath and LINQ are both suitable. Here's a LINQ to XML sample:
XDocument doc = XDocument.Load("yourXml.xml");
var result =
(from videoInfo in doc.Descendants("videoinfo")
let quality = videoInfo.Attribute("quality")
let name = videoInfo.Attribute("name")
where (quality != null && quality.Value == "1")
&& (name != null && name.Value == "DSL 2000")
select new
{
Title = videoInfo.Element("title"),
FileName = videoInfo.Element("filename")
}
).First();
string title = result.Title.Value;
string fileName = result.FileName.Value;
I have several XML files that I wish to read attributes from. My main objective is to apply syntax highlighting to rich text box.
For example in one of my XML docs I have: <Keyword name="using">[..] All the files have the same element: Keyword.
So, how can I get the value of the name attribute and put the values in a collection of strings for each XML file?
I am using Visual C# 2008.
The other answers will do the job, but the syntax highlighting and the several XML files you mention make me think you need something faster. Why not use a lean and mean XmlReader?
private string[] getNames(string fileName)
{
List<string> names = new List<string>();
// The using block disposes the reader (and closes the file) when reading is done.
using (XmlReader xmlReader = XmlReader.Create(fileName))
{
while (xmlReader.Read())
{
//keep reading until we see your element
if (xmlReader.Name.Equals("Keyword") && (xmlReader.NodeType == XmlNodeType.Element))
{
// get attribute from the Xml element here
string name = xmlReader.GetAttribute("name");
// --> now **add to collection** - or whatever
names.Add(name);
}
}
}
return names.ToArray();
}
Another good option would be the XPathNavigator class, which is faster than XmlDocument and lets you use XPath.
Also, I would suggest going with this approach only if you have tried the straightforward options and are not happy with the performance.
You could use XPath to get all the elements, then a LINQ query to get the values of all the name attributes you find:
XDocument doc = yourDocument;
var nodes = from element in doc.XPathSelectElements("//Keyword")
let att = element.Attribute("name")
where att != null
select att.Value;
string[] names = nodes.ToArray();
The //Keyword XPath expression means "all elements in the document named Keyword".
Edit: Just saw that you only want elements named Keyword. Updated the code sample.
Like others, I would suggest using LINQ to XML - but I don't think there's much need to use XPath here. Here's a simple method to return all the keyword names within a file:
static IEnumerable<string> GetKeywordNames(string file)
{
return XDocument.Load(file)
.Descendants("Keyword")
.Attributes("name")
.Select(attr => attr.Value);
}
Nice and declarative :)
Note that if you're going to want to use the result more than once, you should call ToList() or ToArray() on it, otherwise it'll reload the file each time. Of course you could change the method to return List<string> or string[] by adding the relevant call to the end of the chain of method calls, e.g.
static List<string> GetKeywordNames(string file)
{
return XDocument.Load(file)
.Descendants("Keyword")
.Attributes("name")
.Select(attr => attr.Value)
.ToList();
}
Also note that this just gives you the names - I would have expected you to want the other details of the elements, in which case you'd probably want something slightly different. If it turns out you need more, please let us know.
You could use LINQ to XML.
Example:
var xmlFile = XDocument.Load(someFile);
var query = from item in xmlFile.Descendants("childobject")
where !String.IsNullOrEmpty((string)item.Attribute("using"))
select new
{
AttributeValue = item.Attribute("using").Value
};
You'll likely want to use XPath. //Keyword/@name should get you all of the keyword names.
Here's a good introduction: .Net and XML XPath Queries
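As a rough sketch of that XPath approach with `XmlDocument` (the `KeywordXPath` class and `GetKeywordNames` method are hypothetical names), selecting attribute nodes directly might look like this:

```csharp
using System.Collections.Generic;
using System.Xml;

public static class KeywordXPath
{
    // Collects the value of every name attribute found on a Keyword element.
    public static List<string> GetKeywordNames(XmlDocument doc)
    {
        var names = new List<string>();
        // The XPath selects the attribute nodes themselves, so Value is the attribute text.
        foreach (XmlNode attribute in doc.SelectNodes("//Keyword/@name"))
        {
            names.Add(attribute.Value);
        }
        return names;
    }
}
```

Keyword elements that lack a name attribute simply contribute nothing to the result.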
<Countries>
<Country name ="ANDORRA">
<state>Andorra (general)</state>
<state>Andorra</state>
</Country>
<Country name ="United Arab Emirates">
<state>Abu Z¸aby</state>
<state>Umm al Qaywayn</state>
</Country>
</Countries>
public void datass()
{
string file = HttpContext.Current.Server.MapPath("~/App_Data/CS.xml");
XmlDocument doc = new XmlDocument();
if (!System.IO.File.Exists(file))
{
// Nothing to read; without this check doc.DocumentElement would be null below.
return;
}
//Load the XML File
doc.Load(file);
//Get the root element
XmlElement root = doc.DocumentElement;
XmlNodeList subroot = root.SelectNodes("Country");
for (int i = 0; i < subroot.Count; i++)
{
XmlNode elem = subroot.Item(i);
string attrVal = elem.Attributes["name"].Value;
Response.Write(attrVal);
XmlNodeList sub = elem.SelectNodes("state");
for (int j = 0; j < sub.Count; j++)
{
XmlNode elem1 = sub.Item(j);
Response.Write(elem1.InnerText);
}
}
}