I have an XSD schema already for the following xml file
<?xml version="1.0"?>
<note>
<to> </to>
<from> </from>
<datetime> </datetime>
<heading> </heading>
<body> </body>
</note>
I implemented a NoteGnerator to generate xml files based on the schema. The xml files must have to generated regarding some templates/specifications, such as:
<?xml version="1.0"?>
<note>
<to> Lucy </to>
<from> Lily </from>
<datetime> --date--time-- </datetime>
<heading> reminder </heading>
<body> do not forget my pen </body>
</note>
Another template/specification would be like:
<?xml version="1.0"?>
<note>
<to> Lily </to>
<from> Lucy </from>
<datetime> --date--time-- </datetime>
<heading> reply </heading>
<body> no problem </body>
</note>
, where <datetime> is a dynamic value when the xml is generated (so this value cannot be predetermined). Based on the XSD scheme and these two XML specifications, I can easily generate XML messages.
How can I unit test the generated XML files?
Do I need to validate the generated XML files again the schema? Or I need to use some diff tool to compare the generated xml files and the template? Because the datetime is dynamic, it is different each time when an xml file is generated, so how to compare them with the template? Or I need to deserialise xml to c# object and then test the c# object ?
This might be helpful for you. In this I am creating a object, assigning values, writing it to XML, reading the XML, and comparing it to original object. I am assuming that you have whole class structure.
// This is your expected object which you are going to write to xml.
var sourceObject = new SomeClassToWriteInXML();
// Writing object to XML.
var document = new XDocument();
var serializer = new XMLSerializer(typeof(SomeClassToWriteInXML));
using (var writer = document .CreateWriter())
{
serializer.Serialize(writer, source);
}
// write document to a file.
// Now document has the XML document.
// Need to read file you have just created. For testing sake I am reading document.
var actual = new SomeClassToWriteInXML();
// Deserialize xml to get actual object (which should be equal to sourceObject)
using (var reader = document.CreateReader())
{
actual = (SomeClassToWriteInXML)serializer.Deserialize(reader);
}
Assert.AreEqual(expected.First(), actual.First());
You can easily compare generated XML node values, except from the datetime. This is because of its non-deterministic nature. In unit testing (and code design) such problems are usually solved in either of two ways:
removing non-determinism altogether
loosening your requirements relating to non-determinism (eg. by not performing exact matching but rather some sort of fuzzy/approximated one)
With first solution, your note generating component would need to abstract out current date time to external service/dependency, say:
public class NoteGenerator
{
private readonly ICurrentDateProvider currentDateProvider;
public NoteGenerator(ICurrentDateProvider )currentDateProvider
{
this.currentDateProvider = currentDateProvider;
}
public string GenerateNote()
{
var currentDate = currentDateProvider.Now;
// ...
Now in unit test you can fake that dependency using your isolation framework of choice and perform assertions against deterministic value you set yourself (example with FakeItEasy):
var dateProvider = A.Fake<ICurrentDateProvider>();
A.CallTo(() => dateProvider.Now).Returns(new DateTime(2014, 01, 31, 10, 30));
var generator = new NoteGenerator(dateProvider);
// ...
The second approach is to replace the date time must be this value-matching with date time must not be older than-matching, for example:
var oneMinuteAgo = DateTime.Now.AddMinutes(-1.0);
var generator = new NoteGenerator();
var dateFromXml = // extract
Assert.That(dateFromXml, Is.GreaterThan(oneMinuteAgo));
Related
Trying to read XML file with nested XML object with own XML declaration. As expected got exception:
Unexpected XML declaration. The XML declaration must be the first node in the document, and no white space characters are allowed to appear before it.
How can i read that specific element as text and parse it as separate XML document for later deserialization?
<?xml version="1.0" encoding="UTF-8"?>
<Data>
<Items>
<Item>
<Target type="System.String">Some target</Target>
<Content type="System.String"><?xml version="1.0" encoding="utf-8"?><Data><Items><Item><surname type="System.String">Some Surname</surname><name type="System.String">Some Name</name></Item></Items></Data></Content>
</Item>
</Items>
</Data>
Every approach i'm trying fail due to declaration exception.
var xml = System.IO.File.ReadAllText("Info.xml");
var xDoc = XDocument.Parse(xml); // Exception
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xml); // Exception
var xmlReader = XmlReader.Create(new StringReader(xml));
xmlReader.ReadToFollowing("Content"); // Exception
I have no control over XML creation.
The only way I would know is by getting rid of the illegal second <?xml> declaration. I wrote a sample that will simply look for and discard the second <?xml>. After that the string has become valid XML and can be parsed. You may need to tweak it a bit to make it work for your exact scenario.
Code:
using System;
using System.Xml;
public class Program
{
public static void Main()
{
var badXML = #"<?xml version=""1.0"" encoding=""UTF-8""?>
<Data>
<Items>
<Item>
<Target type=""System.String"">Some target</Target>
<Content type=""System.String""><?xml version=""1.0"" encoding=""utf-8""?><Data><Items><Item><surname type=""System.String"">Some Surname</surname><name type=""System.String"">Some Name</name></Item></Items></Data></Content>
</Item>
</Items>
</Data>";
var goodXML = badXML.Replace(#"<Content type=""System.String""><?xml version=""1.0"" encoding=""utf-8""?>"
, #"<Content type=""System.String"">");
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(goodXML);
XmlNodeList itemRefList = xmlDoc.GetElementsByTagName("Content");
foreach (XmlNode xn in itemRefList)
{
Console.WriteLine(xn.InnerXml);
}
}
}
Output:
<Data><Items><Item><surname type="System.String">Some Surname</surname><name type="System.String">Some Name</name></Item></Items></Data>
Working DotNetFiddle: https://dotnetfiddle.net/ShmZCy
Perhaps needless to say: all of this would not have been needed if the thing that created this invalid XML would have applied the common rule to wrap the nested XML in a <![CDATA[ .... ]]> block.
The <?xml ...?> processing declaration is only valid on the first line of an XML document, and so the XML that you've been given isn't well-formed XML. This will make it quite difficult to parse as is without either changing the source document (and you've indicated that's not possible) or preprocessing the source.
You could try:
Stripping out the <?xml ?> instruction with regex or string manipulation, but the cure there may be worse than the disease.
The HTMLAgilityPack, which implements a more forgiving parser, may work with an XML document
Other than that, the producer of the document should look to produce well-formed XML:
CDATA sections can help this, but be aware that CDATA can't contain the ]]> end tag.
XML escaping the XML text can work fine; that is, use the standard routines to turn < into < and so forth.
XML namespaces can also help here, but they can be daunting in the beginning.
I currently have an XML file that is rather large in size (roughly 800MB). I've tried some attempts (here is one dealing with compression) to work with it in its current condition; however, they haven't been very successful as they take quite some time.
The XML file structure is similar to below (the generation pre-dates me):
<Name>Something</Name>
<Description>Some description.</Description>
<CollectionOfObjects>
<Object>
<Name>Name Of Object</Name>
<Description>Description of object.</Description>
<AltName>Alternate name</AltName>
<ContainerName>Container</ContainerName>
<Required>true</Required>
<Length>1</Length>
<Info>
<Name>Name</Name>
<File>Filename</File>
<Size>20</Size>
<SizeUnit>MB</SizeUnit>
</Info>
</Object>
</CollectionOfObjects>
There is quite a large chunk of data under each object, and a lot of these child nodes can be made into attributes on their parents:
<CollectionOfObjects Name="Something" Description="Some description.">
<Object Name="Name Of Object" AltName="Alternate name" Container="Container" Required="true" Length="1" Description="Description of object.">
<Info Name="Name" File="Filename" Size="20" SizeUnit="MB" />
</Object>
</CollectionOfObjects>
Now, obviously not everything under each node will become an attribute; the above is just an example. There is so much data in this file it breaks Notepad and takes Visual Studio approximately 2 minutes to even open. Heaven helps you if you try to search the file because it takes an hour or longer.
You can see how this is problematic. I've done a test on the size difference (obviously not with this file) but with a demo file. I created a file and converted unnecessary child nodes into attributes and it reduced the demo files size by 53%. I have no doubt in my mind that performing the same work on this file will reduce its size by 30% or more (hoping for the more).
Now that you understand the why, let's get to the question; how do I move these child nodes to attributes. The file is generated via XmlSerializer and uses reflection to build the nodes based on the classes and properties available:
internal class DemoClass {
[CategoryAttribute("Properties"), DescriptionAttribute("The name of this object.")]
public string Name { get; set; }
}
internal bool Serialize(DemoClass demo, FileStream fs) {
XmlSerializer serializer = new XmlSerializer(typeof(DemoClass));
XmlWriterSettings settings = null;
XmlWriter writer = null;
bool result = true;
try {
settings = new XmlWriterSettings() {
Indent = true,
IndentChars = ("\t"),
Encoding = Encoding.UTF8,
NewLineOnAttributes = false,
NewLineChars = Environment.NewLine,
NewLineHandling = NewLineHandling.Replace
};
writer = XmlWriter.Create(fs, settings);
serializer.Serialize(writer, demo);
} catch { result = false; } finally { writer.Close(); }
return result;
}
It is my understanding that I can just add the XmlAttribute tag to it and it will write all future versions of the file with that tag as attributes; however, I was told that in order to convert the data from the old way to the new way I may need some kind of "binder" which I am unsure of.
Any recommendations are going to be helpful here.
NOTE: I know the following can be done to reduce file size as well (dropped by 28%):
Indent = false,
Encoding = Encoding.UTF8,
NewLineOnAttributes = false,
Update: I am currently attempting to simply use the XmlAttribute tag on properties and I've encountered an error (which I expected) where the reflection failed on deserialization:
There was an error reflecting type DemoClass.
Update 2: Now working a new angle here; I've decided to copy all of the needed classes, update them with the XmlAttribute tag; then load the old file with the old classes and write the new file with the new classes. If this works then it'll be a great workaround. However, I'm sure there's a way to do this without this workaround.
Update 3: The method in Update 2 (above) did not work the way I expected and I ended up encountering this issue. Since this approach is also heavily involved, I ended up writing a custom conversion method that used the original serialization to load the XML, then using XDocument from the System.Xml.Linq namespace, I created a new XML document by hand. This ended up being a time consuming task, but less overall change in the long run. It serializes the file in the way expected (with some tweaking here and there of course). The next step was to update the old serialization now that the old files had been converted. I've made it approximately 80% of the way through this process, still hitting some road bumps here and there with reflection:
The type for XmlAttribute may not be specified for primitive types.
This occurs when attempting to de-serialize an enum value. The serializer seems to believe it is a string value instead.
here's the code that worked for me.
static void Main()
{
var element = XElement.Load(#"C:\Users\user\Downloads\CollectionOfObjects.xml");
ElementsToAttributes(element);
element.Save(#"C:\Users\user\Downloads\CollectionOfObjects-copy.xml");
}
static void ElementsToAttributes(XElement element)
{
foreach(var el in element.Elements().ToList())
{
if(!el.HasAttributes && !el.HasElements)
{
var attribute = new XAttribute(el.Name, el.Value);
element.Add(attribute);
el.Remove();
}
else
ElementsToAttributes(el);
}
}
The Xml in CollectionOfObjects.xml
<CollectionOfObjects>
<Name>Something</Name>
<Description>Some description.</Description>
<Object>
<Name>Name Of Object</Name>
<Description>Description of object.</Description>
<AltName>Alternate name</AltName>
<ContainerName>Container</ContainerName>
<Required>true</Required>
<Length>1</Length>
<Info>
<Name>Name</Name>
<File>Filename</File>
<Size>20</Size>
<SizeUnit>MB</SizeUnit>
</Info>
</Object>
</CollectionOfObjects>
The result Xml in CollectionOfObjects-copy.xml
<?xml version="1.0" encoding="utf-8"?>
<CollectionOfObjects Name="Something" Description="Some description.">
<Object Name="Name Of Object" Description="Description of object." AltName="Alternate name" ContainerName="Container" Required="true" Length="1">
<Info Name="Name" File="Filename" Size="20" SizeUnit="MB" />
</Object>
</CollectionOfObjects>
I have a xsd file, defined by an external company, that I used with xsd.exe to generate classes. I can use a provided xml file to deserialize into an object using the generated classes just fine, but there are a few cases where I need to have smaller portions of the xml as a XDocument. I won't know the path in these portions until run time, so I'm using the xml for:
XElement element = xml.XPathSelectElement(path);
The issue I'm having is that serialized result doesn't match the incoming xml quite right, which makes the select return null. How do I get a serialized object to look like the incoming file? Did I possibly generate the classes incorrectly with xsd.exe? I'll eventually need to use the same generated code to generate my own xml files.
Here's the code I'm currently using to serialize
var xml = new XDocument();
using (var writer = xml.CreateWriter())
{
List<Type> known = new List<Type>();
known.Add(typeof(ObjType1));
...
var serializer = new DataContractSerializer(typeof(Detail), known);
serializer.WriteObject(writer, sourceDetailObj);
}
The serialized result:
<Detail xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/CustomNameSpace">
...
<numberField>1</numberField>
<detailTypeField>
<objField i:type="ObjType1">
<valObjField i:nil="true" />
...
</objField>
</detailTypeField>
...
</Detail>
What it should look like:
<Detail>
...
<Number>1</Number>
<DetailType>
<ObjType1>
...
</ObjType1>
</DetailType>
...
</Detail>
Here's one of the classes xsd generates:
public partial class DetailType {
private object objField;
[System.Xml.Serialization.XmlElementAttribute("ObjType1", typeof(ObjType1))]
...
public object Obj {
get {
return this.objField;
}
set {
this.objField = value;
}
}
}
Obj can be one of several classes.
The problem with using a DataContractSerializer is that it is optimized for sending messages between WCF services and won't necessarily produce the same "classic" xml that the XmlSerializer does.
In particular, XmlSerializer will serialize all public members unless you tell it not to, but for DataContractSerializer it won't serialize unless you tell it to. This was done to help make WCF faster; you only get what you ask for.
So, if you're not generating XML for WCF services, I suggest that you use the XmlSerialiser instead.
I started with three (3) XSD files provided from an external party (one XSD links to the other two). I used the xsd.exe tool to generate a .NET object by running the following command: xsd.exe mof-simpleTypes.xsd mof-isp.xsd esf-submission.xsd /c and it generated a single CS file with a handful of partial objects.
I've created an XmlSerializerNamespaces object and fill with the namespaces required (two directly used in the provided sample XML file as well as two others that don't appear to be referenced). I have successfully generated an XML file using the following method:
private XmlDocument ConvertEsfToXml(ESFSubmissionType type)
{
var xml = new XmlDocument();
var serializer = new XmlSerializer(type.GetType());
string result;
using (var writer = new Utf8StringWriter()) //override of StringWriter to force UTF-8
{
serializer.Serialize(writer, type, _namespaces); //_namespaces object holds all 4 namespaces
result = writer.ToString();
}
xml.LoadXml(result);
return xml;
}
My problem that I'm facing is in the generated CS file, one of the objects has a property (another generated partial object) that is of type XmlElement. I have successfully built the object in code, and I'm having an issue converting the object to an XmlElement. The questions and answers I have found here on SO say convert it to an XmlDocument first and then take the DocumentElement property. This works, however the returned XML has namespaces embedded in the element as follows:
<esf:ESFSubmission xmlns:isp="http://www.for.gov.bc.ca/schema/isp" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:esf="http://www.for.gov.bc.ca/schema/esf">
<esf:submissionMetadata>
<esf:emailAddress>test#test.com</esf:emailAddress>
<esf:telephoneNumber>1234567890</esf:telephoneNumber>
</esf:submissionMetadata>
<esf:submissionContent>
<isp:ISPSubmission xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:esf="http://www.for.gov.bc.ca/schema/esf" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:isp="http://www.for.gov.bc.ca/schema/isp">
<isp:ISPMillReport>
<isp:reportMonth>12</isp:reportMonth>
<isp:reportYear>2014</isp:reportYear>
<isp:reportComment>comment</isp:reportComment>
<isp:ISPLumberDetail>
<isp:species>FI</isp:species>
Note: this is just a partial of the generated XML file (for illustration purposes).
As you can see, each XML node is prefixed with the namespace variable. My question is: how can I do this in code? Is my approach sound and if so, then how do NOT include the namespaces in the ISPSubmission node OR if there is a better way to approach this problem that I overlooked, please provide insight. My desired outcome is to have all namespace definitions at the top of the document (their appropriate location) and not on the sub elements - as well as maintain the namespace variables on each element as illustrated above.
EDIT (after reggaeguitar's comment)
Here is the sample XML document I was provided
<?xml version="1.0" encoding="UTF-8"?>
<esf:ESFSubmission xmlns:esf="http://www.for.gov.bc.ca/schema/esf"
xmlns:isp="http://www.for.gov.bc.ca/schema/isp" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.for.gov.bc.ca/schema/esf esf-submission.xsd
http://www.for.gov.bc.ca/schema/isp mof-isp.xsd">
<esf:submissionMetadata>
<esf:emailAddress>mailto:eric.murphy#cgi.com</esf:emailAddress>
<esf:telephoneNumber>6044445555</esf:telephoneNumber>
</esf:submissionMetadata>
<esf:submissionContent>
<isp:ISPSubmission>
<isp:ISPMillReport>
<isp:reportMonth>06</isp:reportMonth>
<isp:reportYear>2014</isp:reportYear>
<isp:reportComment>Up to 4000 characters is permitted for notes in this element.</isp:reportComment>
<isp:ISPLumberDetail>
<isp:species>FI</isp:species>
<isp:lumberGrade>EC</isp:lumberGrade>
<isp:gradeDescription/>
<isp:size>2x4</isp:size>
<isp:finishType/>
<isp:length>10</isp:length>
<isp:thickWidthUom>IN</isp:thickWidthUom>
<isp:volumeUnitOfMeasure>MBM</isp:volumeUnitOfMeasure>
<isp:volume>11543.987</isp:volume>
<isp:amount>1467893.98</isp:amount>
<isp:invoiceNumber>837261</isp:invoiceNumber>
</isp:ISPLumberDetail>
<isp:ISPLumberDetail>
<isp:species>CE</isp:species>
<isp:lumberGrade/>
<isp:gradeDescription/>
<isp:size/>
<isp:finishType>D</isp:finishType>
<isp:thickness>40</isp:thickness>
<isp:width>100</isp:width>
<isp:thickWidthUom>MM</isp:thickWidthUom>
<isp:volumeUnitOfMeasure>MBM</isp:volumeUnitOfMeasure>
<isp:volume>9743.987</isp:volume>
<isp:amount>1247893.98</isp:amount>
<isp:invoiceNumber/>
</isp:ISPLumberDetail>
<isp:ISPChipDetail>
<isp:species>CE</isp:species>
<isp:unitOfMeasure>BDT</isp:unitOfMeasure>
<isp:wholeLogInd>N</isp:wholeLogInd>
<isp:destinationCode>FBCO</isp:destinationCode>
<isp:destinationDescription/>
<isp:volume>563</isp:volume>
<isp:amount>54463</isp:amount>
<isp:invoiceNumber>12345679</isp:invoiceNumber>
</isp:ISPChipDetail>
</isp:ISPMillReport>
<isp:ISPSubmitter>
<isp:millNumber>103</isp:millNumber>
<isp:contactName>Dave Marotto</isp:contactName>
<isp:contactEmail>eric.murphy#cgi.com</isp:contactEmail>
<isp:contactPhone>2507775555</isp:contactPhone>
<isp:contactPhoneExtension>1234</isp:contactPhoneExtension>
</isp:ISPSubmitter>
</isp:ISPSubmission>
</esf:submissionContent>
</esf:ESFSubmission>
Solved my problem by doing the whole thing in code and not even using the xsd.exe to generate a .NET object.
I am querying a soap based service and wish to analyze the XML returned however when I try to load the XML into an XDoc in order to query the data. am getting an 'illegal characters in path' error message? This (below) is the XML returned from the service. I simply want to get the list of competitions and put them into a List I have setup. The XML does load into an XML Document though so must be correctly formatted?.
Any advice on the best way to do this and get round the error would be greatly appreciated.
<?xml version="1.0" ?>
- <gsmrs version="2.0" sport="soccer" lang="en" last_generated="2010-08-27 20:40:05">
- <method method_id="3" name="get_competitions">
<parameter name="area_id" value="1" />
<parameter name="authorized" value="yes" />
<parameter name="lang" value="en" />
</method>
<competition competition_id="11" name="2. Bundesliga" soccertype="default" teamtype="default" display_order="20" type="club" area_id="80" last_updated="2010-08-27 19:53:14" area_name="Germany" countrycode="DEU" />
</gsmrs>
Here is my code, I need to be able to query the data in an XDoc:
string theXml = myGSM.get_competitions("", "", 1, "en", "yes");
XmlDocument myDoc = new XmlDocument();
MyDoc.LoadXml(theXml);
XDocument xDoc = XDocument.Load(myDoc.InnerXml);
You don't show your source code, however I guess what you are doing is this:
string xml = ... retrieve ...;
XmlDocument doc = new XmlDocument();
doc.Load(xml); // error thrown here
The Load method expects a file name not an XML itself. To load an actual XML, just use the LoadXml method:
... same code ...
doc.LoadXml(xml);
Similarly, using XDocument the Load(string) method expects a filename, not an actual XML. However, there's no LoadXml method, so the correct way of loading the XML from a string is like this:
string xml = ... retrieve ...;
XDocument doc;
using (StringReader s = new StringReader(xml))
{
doc = XDocument.Load(s);
}
As a matter of fact when developing anything, it's a very good idea to pay attention to the semantics (meaning) of parameters not just their types. When the type of a parameter is a string it doesn't mean one can feed in just anything that is a string.
Also in respect to your updated question, it makes no sense to use XmlDocument and XDocument at the same time. Choose one or the another.
Following up on Ondrej Tucny's answer :
If you would like to use an xml string instead, you can use an XElement, and call the "parse" method. (Since for your needs, XElement and XDocument would meet your needs)
For example ;
string theXML = '... get something xml-ish...';
XElement xEle = XElement.Parse(theXML);
// do something with your XElement
The XElement's Parse method lets you pass in an XML string, while the Load method needs a file name.
Why not
XDocument.Parse(theXml);
I assume this will be the right solution
If this is really your output it is illegal XML because of the minus characters ('-'). I suspect that you have cut and pasted this from a browser such as IE. You must show the exact XML from a text editor, not a browser.