var xml = @"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<metadata created=""2014-11-03T18:13:02.769Z"" xmlns=""http://example.com/ns/mmd-2.0#"" xmlns:ext=""http://example.com/ns/ext#-2.0"">
<customer-list count=""112"" offset=""0"">
<customer id=""5f6ab597-f57a-40da-be9e-adad48708203"" type=""Person"" ext:score=""100"">
<name>Bobby Smith</name>
<gender>male</gender>
<country>US</country>
<date-span>
<begin>1965-02-18</begin>
<end>false</end>
</date-span>
</customer>
<customer id=""22"" type=""Person"" ext:score=""100"">
<name>Tina Smith</name>
<gender>Female</gender>
<country>US</country>
<date-span>
<end>false</end>
</date-span>
</customer>
<customer id=""30"" type=""Person"" ext:score=""500"">
<name>George</name>
<gender>Male</gender>
<country>US</country>
<date-span>
<begin>1965</begin>
<end>false</end>
</date-span>
</customer>
</customer-list>
</metadata>";
I'm using the above XML. The problem I have is that the date (I'm referring to the <begin> element inside <date-span>) can be in any format, so I am trying to use the code below to handle the date format:
GetCustomers = from c in XDoc.Descendants(ns + "customer")
select
new Customer
{
Name = c.Element(ns + "name").Value,
DateOfBirth = Convert.ToDateTime(c.Element(ns + "date-span").Elements(ns + "begin").Any() ? c.Element(ns + "date-span").Element(ns + "begin").Value : DateTime.Now.ToString())
};
The above works, but it crashed as soon as the XML contained 1965. Unfortunately, I have no control over the XML. So I tried to use TryParse to convert 1965 to dd/mm/1965, where dd and mm could be today's day and the current month, but I can't seem to get it working:
BeginDate = Convert.ToDateTime(c.Element(ns + "date-span").Elements(ns + "begin").Any() ? DateTime.TryParse( c.Element(ns + "date-span").Element(ns + "begin").Value, culture, styles, out dateResult) : DateTime.Now).ToString())
Could anyone guide me here how to resolve the issue?
Edit 1
var ModifyBeginDate = XDoc.Descendants(ns + "customer").Elements(ns + "date-span").Elements(ns + "begin");
The above retrieves all the dates, but how do I write the values back to the XML after I have changed them? (I don't think I can use this variable in my code, because when I iterate through the XML it would go straight back to the original XML.)
If the data can be in any format then you will have to preprocess the data before trying to parse it into a DateTime.
If I were going to implement this, the first thing I would do is break the input into an array of integers. If there is only one item in the array, I would check its length; if it is 4 digits long, I would assume it is a year and instantiate a new DateTime of January 1 of that year. If I found an array with lengths 4,2,2 or 2,2,4, I would parse them accordingly. Obviously some of it will be left to guessing, but if you can't control the format of the XML there will always be something left to chance.
You could use something like this (but modified to return only the integer parts and skip the separator, which could be /, -, etc.) to split the date string into an array containing the integer values:
https://stackoverflow.com/a/13548184/184746
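If the formats you actually see are limited to a known set, an alternative to splitting the input by hand is DateTime.TryParseExact with a list of candidate formats. The sketch below is only an illustration: the class name FlexibleDate and the three formats are assumptions, not something taken from the question's data, and a year-only value is mapped to January 1 of that year.

```csharp
using System;
using System.Globalization;

public static class FlexibleDate
{
    // Candidate formats are an assumption; extend the list to match the data you receive.
    private static readonly string[] Formats = { "yyyy-MM-dd", "yyyy-MM", "yyyy" };

    // Returns the parsed date, or null when the text matches none of the formats.
    public static DateTime? Parse(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return null;
        }

        DateTime result;
        bool ok = DateTime.TryParseExact(
            text.Trim(),
            Formats,
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out result);

        // Missing month or day components default to 1.
        return ok ? result : (DateTime?)null;
    }
}
```

In the query, assuming every customer has a <date-span>, (string)c.Element(ns + "date-span").Element(ns + "begin") yields null when <begin> is missing, so it can be passed straight to FlexibleDate.Parse and defaulted with ?? DateTime.Now.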
In my code I'm creating a new XML file with LINQ to XML, and I have a specific format of XML that I'm trying to put into the file on creation. However, when I pass the string variable in, it gives the error "Non white space characters cannot be added to content." How would I correctly add that string value to the XML file?
string firstPart = @"<?xml version=""1.0"" encoding=""utf-8""?>
< wiidisc version = ""1"" >
< id game = ""RMCE"" disc = ""0"" version = ""0"" >
</ id > ";
XDocument doc = new XDocument(firstPart);
doc.Save(riivolutionXmls + @"\" + xmlFileName + ".xml");
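This isn't valid XML: you're missing a closing tag for wiidisc, and the spaces after < and </ are not allowed either.
Also, you can't use the constructor to create an XDocument from a string of markup: the constructor treats a string argument as text content, which is what triggers the error. Use the XDocument.Parse method instead: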
https://learn.microsoft.com/en-us/dotnet/api/system.xml.linq.xdocument.parse?view=netframework-4.7.2#System_Xml_Linq_XDocument_Parse_System_String_
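A minimal sketch of the Parse approach, assuming the intended document is a wiidisc element containing a single id element (the closing tag and the class name WiiDiscXml are added here for illustration):

```csharp
using System.Xml.Linq;

public static class WiiDiscXml
{
    // Parses a well-formed version of the markup from the question.
    public static XDocument Build()
    {
        string firstPart = @"<?xml version=""1.0"" encoding=""utf-8""?>
<wiidisc version=""1"">
  <id game=""RMCE"" disc=""0"" version=""0""></id>
</wiidisc>";

        // Parse reads the string as markup; the constructor would treat it as text content.
        return XDocument.Parse(firstPart);
    }
}
```

The returned document can then be written out with Save, as in the question's last line.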
I have a task where I need to create a program that converts OmniPage XML OCR output into ALTO XML.
The OmniPage XML output is very different from ALTO XML.
I found an ALTO XML file and am trying to figure out where its values come from.
I need the formula for the WIDTH of the SP tag.
Below is the sample XML I'm trying to figure out:
<TextLine ID="P1_TL00002" HPOS="26.00" VPOS="98.00" WIDTH="1667.00" HEIGHT="130.00">
<String ID="P1_ST00002" HPOS="26.00" VPOS="106.00" WIDTH="387.00" HEIGHT="95.00" CONTENT="Twenties" WC="0.99" CC="06370005"/>
<SP ID="P1_SP00001" HPOS="413.00" VPOS="201.00" WIDTH="29.00"/>
<String ID="P1_ST00003" HPOS="442.00" VPOS="98.00" WIDTH="246.00" HEIGHT="103.00" CONTENT="Glrls" WC="0.78" CC="00045"/>
<SP ID="P1_SP00002" HPOS="688.00" VPOS="201.00" WIDTH="26.00"/>
<String ID="P1_ST00004" HPOS="714.00" VPOS="98.00" WIDTH="178.00" HEIGHT="103.00" CONTENT="ancl" WC="0.54" CC="5660"/>
<SP ID="P1_SP00003" HPOS="892.00" VPOS="201.00" WIDTH="39.00"/>
<String ID="P1_ST00005" HPOS="931.00" VPOS="98.00" WIDTH="368.00" HEIGHT="130.00" CONTENT="FUppER" WC="0.83" CC="090000"/>
<SP ID="P1_SP00004" HPOS="1299.00" VPOS="228.00" WIDTH="32.00"/>
<String ID="P1_ST00006" HPOS="1331.00" VPOS="98.00" WIDTH="362.00" HEIGHT="106.00" CONTENT="PAshiON" WC="0.76" CC="0008206"/>
</TextLine>
I already figured out the values of HPOS and VPOS.
I used the C# Rect class:
Rect r = new Rect(26, 106, 387, 95);
Debug.WriteLine("BottomRight: " + r.BottomRight);
BottomRight: 413,201
But I can't figure where the SP tag's WIDTH value comes from.
Please help me.
It looks like it's just the difference between the SP tag's HPOS and the following String tag's HPOS, e.g. 413.00 + 29.00 = 442.00, 688.00 + 26.00 = 714.00, and so on.
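The arithmetic above can be sketched as follows. This is inferred from the sample values only, not from the ALTO specification, and the class and method names are made up for illustration:

```csharp
public static class AltoSpacing
{
    // The SP element starts where the preceding String ends.
    public static double SpHpos(double prevHpos, double prevWidth)
    {
        return prevHpos + prevWidth;
    }

    // The SP width is the gap between the end of the preceding String
    // and the HPOS of the following String.
    public static double SpWidth(double prevHpos, double prevWidth, double nextHpos)
    {
        return nextHpos - (prevHpos + prevWidth);
    }
}
```

For the first gap in the sample, SpWidth(26, 387, 442) gives 29, matching WIDTH="29.00" on P1_SP00001.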
Below is the sample XML:
<?xml version="1.0" encoding="utf-8"?>
<UsersList>
<User>
<Name>sam&Tim</Name>
<Address>21, bills street, CA</Address>
<Issues>"Issues1", "Issues2"</Issues>
</User>
</UsersList>
C#:
string xml = System.IO.File.ReadAllText(@"E:\Sample.xml");
xml = System.Text.RegularExpressions.Regex.Replace(xml, "<(?![_:a-z][-._:a-z0-9]*\b[^<>]*>)", "&lt;");
XDocument doc = XDocument.Parse(xml);
I need to convert the special characters (<, >, ", ', &) and I am using the above regex, but the Parse method throws an error. How can I resolve this?
Your current code converts the XML like this:
<?xml version="1.0" encoding="utf-8"?>
<UsersList>
<User>
<Name>sam&Tim</Name>
<Address>21, bills street, CA</Address>
<Issues>"Issues1", "Issues2"</Issues>
</User>
</UsersList>
whereas Parse expects something like this:
<?xml version="1.0" encoding="utf-8"?>
<UsersList>
<User>
<Name>sam and Tim</Name>
<Address>21, bills street, CA</Address>
<Issues>"Issues1", "Issues2"</Issues>
</User>
</UsersList>
So you should not be converting < to &lt;. The real problem is that XML containing sam&Tim cannot be parsed, so you can use
xml = xml.Replace("&", " n "); // n, "and", or any other character or string you want
instead of
xml = System.Text.RegularExpressions.Regex.Replace(xml, "<(?![_:a-z][-._:a-z0-9]*\b[^<>]*>)", "&lt;");
Hope this will help you to parse it.
You can try:
string xml = System.IO.File.ReadAllText(@"E:\Sample.xml");
xml = ReplaceXMLEncodedCharacters(xml);
public string ReplaceXMLEncodedCharacters(string input)
{
const string pattern = @"&#(x?)([A-Fa-f0-9]+);";
MatchCollection matches = Regex.Matches(input, pattern);
int offset = 0;
foreach (Match match in matches)
{
int charCode = 0;
if (string.IsNullOrEmpty(match.Groups[1].Value))
charCode = int.Parse(match.Groups[2].Value);
else
charCode = int.Parse(match.Groups[2].Value, System.Globalization.NumberStyles.HexNumber);
char character = (char)charCode;
input = input.Remove(match.Index - offset, match.Length).Insert(match.Index - offset, character.ToString());
offset += match.Length - 1;
}
return input;
}
Your problem is that your original XML is not a valid XML document, because it contains an unescaped ampersand ('&'), which is explicitly forbidden by the standard, which says:
The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section.
To make it valid, you must use &amp; instead of a literal &. Trying to "correct" it after the fact is impractical and a bad idea in the general case, because you can never be sure where in your XML & stands for a literal & and where it is part of an XML entity. If it were possible to distinguish these usages unambiguously, that rule could be embedded in XML parsers and we would not have to deal with it.
A valid, standard-conformant representation of your document would be
<?xml version="1.0" encoding="utf-8"?>
<UsersList>
<User>
<Name>sam&amp;Tim</Name>
<Address>21, bills street, CA</Address>
<Issues>"Issues1", "Issues2"</Issues>
</User>
</UsersList>
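If you control the code that produces the file, the escaping does not have to be done by hand. As a small sketch (the class name EscapingDemo is hypothetical), building the element with LINQ to XML escapes the ampersand on output, and parsing the escaped form returns the original text:

```csharp
using System.Xml.Linq;

public static class EscapingDemo
{
    // Builds the <Name> element from raw text; XElement escapes the ampersand on output.
    public static string BuildName(string rawName)
    {
        return new XElement("Name", rawName).ToString();
    }

    // Parsing the escaped markup gives back the original, unescaped text.
    public static string ReadName(string xml)
    {
        return XElement.Parse(xml).Value;
    }
}
```

This only helps on the producing side; it does not repair a file that already contains a bare &.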
I am trying to replace within a string
<?xml version="1.0" encoding="UTF-8"?>
<response success="true">
<output><![CDATA[
And
]]></output>
</response>
with nothing.
The problem I am running into is that the <, > and " characters are interfering with the replace: the lines are not being treated as one complete string, but are broken up wherever a <, > or " appears. Here is what I have, but I know it isn't right:
String responseString = reader.ReadToEnd();
responseString.Replace(@"<<?xml version=""1.0"" encoding=""UTF-8""?><response success=""true""><output><![CDATA[[", "");
responseString.Replace(@"]]\></output\></response\>", "");
What would be the correct code to get the replace to see these lines as just a string?
A string never changes: strings are immutable, so Replace returns a new string instead of modifying the original. The Replace method works as follows:
string x = "AAA";
string y = x.Replace("A", "B");
//x == "AAA", y == "BBB"
However, the real problem is how you handle the XML response data.
You should reconsider your approach of handling incoming XML by string replacement. Just get the CDATA content using the standard XML library. It's as easy as this:
using System.Linq;
using System.Xml.Linq;
...
XDocument doc = XDocument.Load(reader);
var responseString = doc.Descendants("output").First().Value;
The CDATA markers will already have been removed. A LINQ to XML tutorial will teach you more about working with XML documents in C#.
Given your document structure, you could simply say something like this:
string response = @"<?xml version=""1.0"" encoding=""UTF-8""?>"
+ @"<response success=""true"">"
+ @" <output><![CDATA["
+ @"The output is some arbitrary text and it may be found here."
+ "]]></output>"
+ "</response>"
;
XmlDocument document = new XmlDocument() ;
document.LoadXml( response ) ;
bool success ;
bool.TryParse( document.DocumentElement.GetAttribute("success"), out success) ;
string content = document.DocumentElement.InnerText ;
Console.WriteLine( "The response indicated {0}." , success ? "success" : "failure" ) ;
Console.WriteLine( "response content: {0}" , content ) ;
And see the expected results on the console:
The response indicated success.
response content: The output is some arbitrary text and it may be found here.
If your XML document is a wee bit more complex, you can easily select the desired node(s) using an XPath query, thus:
string content = document.SelectSingleNode( @"/response/output" ).InnerText;
I have this XML file that I created programmatically using C#:
<Years>
<Year Year="2011">
<Month Month="10">
<Day Day="10" AccessStartTime="01:15 PM" ExitTime="01:15 PM" />
<Day Day="11" AccessStartTime="01:15 PM" ExitTime="01:15 PM" />
<Day Day="12" AccessStartTime="01:15 PM" ExitTime="01:15 PM" />
<Day Day="13" AccessStartTime="01:15 PM" ExitTime="01:15 PM" />
</Month>
<Month Month="11">
<Day Day="12" AccessStartTime="01:16 PM" ExitTime="01:16 PM" />
</Month>
</Year>
</Years>
I am having problems getting specific data out of it using XmlReader, or perhaps I am doing it the wrong way, because each time the reader reads a single line. What I want is a list of all the days in a specific month and year.
Use LINQ to XML, or post the code you have tried.
var list = from ele in XDocument.Load(@"c:\filename.xml").Descendants("Year")
select new
{
Year = (string)ele.Attribute("Year"),
Month= (string)ele.Element("Month").Attribute("Month"),
Day = (string)ele.Element("Month").Element("Day").Attribute("Day")
};
foreach (var t in list)
{
Console.WriteLine(t.Year + " " + t.Month + " " + t.Day );
}
I agree with AVD's suggestion of using LINQ to XML. Finding all the days for a specific year and month is simple:
XDocument doc = XDocument.Load("file.xml");
var days = doc.Root.Elements("Year").Where(y => (int) y.Attribute("Year") == year)
.Elements("Month").Where(m => (int) m.Attribute("Month") == month)
.Elements("Day");
(This assumes that Month and Year attributes are specified on all Month and Year elements.)
The result is a sequence of the Day elements for the specified month and year.
In most cases I'd actually write one method call per line, but in this case I thought it looked better to have one full filter of both element and attribute per line.
Note that in LINQ, some queries end up being more readable using query expressions, and some are more readable in the "dot notation" I've used above.
You asked for an explanation of AVD's code, so you may be similarly perplexed by mine - rather than explain the bits of LINQ to XML and LINQ that my code happens to use, I strongly recommend that you read good tutorials on both LINQ and LINQ to XML. They're wonderful technologies which will help your code all over the place.
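For reference, here is a self-contained variant of the query above that returns the day numbers as integers. The class and method names are made up for this sketch, and it assumes the <Years> root element shown in the question, which is why the query starts from doc.Root:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AccessLog
{
    // Returns the Day attribute of every <Day> under the given year and month.
    public static List<int> DaysFor(XDocument doc, int year, int month)
    {
        return doc.Root
            .Elements("Year").Where(y => (int)y.Attribute("Year") == year)
            .Elements("Month").Where(m => (int)m.Attribute("Month") == month)
            .Elements("Day")
            .Select(d => (int)d.Attribute("Day"))
            .ToList();
    }
}
```

With the document from the question, DaysFor(doc, 2011, 10) yields 10, 11, 12 and 13.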
Take a look at this example of how to represent the XML with a root node and how to read the data using XmlReader:
using System;
using System.Xml;
class Program
{
static void Main()
{
// Create an XML reader for this file.
using (XmlReader reader = XmlReader.Create("perls.xml"))
{
while (reader.Read())
{
// Only detect start elements.
if (reader.IsStartElement())
{
// Get element name and switch on it.
switch (reader.Name)
{
case "perls":
// Detect this element.
Console.WriteLine("Start <perls> element.");
break;
case "article":
// Detect this article element.
Console.WriteLine("Start <article> element.");
// Search for the attribute name on this current node.
string attribute = reader["name"];
if (attribute != null)
{
Console.WriteLine(" Has attribute name: " + attribute);
}
// Next read will contain text.
if (reader.Read())
{
Console.WriteLine(" Text node: " + reader.Value.Trim());
}
break;
}
}
}
}
}
}
Input text [perls.xml]
<?xml version="1.0" encoding="utf-8" ?>
<perls>
<article name="backgroundworker">
Example text.
</article>
<article name="threadpool">
More text.
</article>
<article></article>
<article>Final text.</article>
</perls>
Output
Start <perls> element.
Start <article> element.
Has attribute name: backgroundworker
Text node: Example text.
Start <article> element.
Has attribute name: threadpool
Text node: More text.
Start <article> element.
Text node:
Start <article> element.
Text node: Final text.
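The same pattern can be adapted to the Years/Year/Month/Day file from the question. This is only a sketch under that assumption: the class name DayReader is hypothetical, and the XML is passed in as a string so the example does not depend on a file path:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DayReader
{
    // Collects the Day attribute of each <Day> inside the requested year and month.
    public static List<string> ReadDays(string xml, string year, string month)
    {
        var days = new List<string>();
        string currentYear = null;
        string currentMonth = null;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Only start elements carry the attributes we need.
                if (!reader.IsStartElement())
                {
                    continue;
                }

                switch (reader.Name)
                {
                    case "Year":
                        currentYear = reader["Year"];
                        break;
                    case "Month":
                        currentMonth = reader["Month"];
                        break;
                    case "Day":
                        if (currentYear == year && currentMonth == month)
                        {
                            days.Add(reader["Day"]);
                        }
                        break;
                }
            }
        }

        return days;
    }
}
```

The reader is forward-only, so the current year and month have to be remembered as the document streams past; that bookkeeping is what the LINQ to XML version avoids.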