How to search through XML to find bad nodes - c#

I have a large XML file (68 MB), and I am using SQL Server Business Intelligence Studio 2008 to extract the XML data into a database. There is an error somewhere in the XML file that prevents it from executing, possibly a missing tag or something like that. The file is so large that I can't manually sort through it looking for the error.
Below is a sample of the XML schema used.
How can I use XPath to sort through the XML in VS 2012 using C#?
An example would be great!
-<PhoneNumberList>
<PhoneNumber value="1234567890" type="Phone"/>
</PhoneNumberList>
-<YearsOfServiceList>
<YearsOfService experienceInMonths="24" description="SuperAdmin" objectCode="049"/>
</YearsOfServiceList>
</Person>
-<Person dob="1960-01-09T00:00:00" lastName="Smith" middleName="Will" firstName="John" id="9999-9999-9999">
-<SiteList>
-<Site id="2014" siteLongName="HA" siteCode="1255" systemCode="999">
-<StaffPositionList>
<StaffPosition id="73" staffPosition="Administrator"/>
</StaffPositionList>
</Site>
</SiteList>
-<ProgramList>
<Program id="1234" siteLongName="ABC" siteCode="0000" systemCode="205"/>
<Program id="5678" siteLongName="DEF" siteCode="0000" systemCode="357"/>
</ProgramList>
-<TypeList>
<Type Description="Leader" certificateType="D"/>
<Type Description="Professional" certificateType="P"/>
</TypeList>
-<EmailList>
<Email value="jsmith#somesite.com" type="Email"/>
</EmailList>
-<PhoneNumberList>
<PhoneNumber value="1234567890" type="Phone"/>
</PhoneNumberList>
-<YearsOfServiceList>
<YearsOfService experienceInMonths="24" description="SuperAdmin" objectCode="049"/>
</YearsOfServiceList>
</Person>
</PersonList>
</GetPersonDetail>

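If you want to do it in code, create an XSD file describing a valid format for the data, embed it as a resource in your app, and then use code like this:
// Needs System.IO, System.Collections.Generic, System.Xml, System.Xml.Schema and System.Xml.Linq.
// 'document' is an XDocument that has already been loaded from the XML file.
var errors = new List<string>();
var schemaSet = new XmlSchemaSet();
schemaSet.Add("", XmlReader.Create(new StringReader(Properties.Resources.NameOfXSDResource)));
document.Validate(schemaSet, (sender, args) =>
{
    // Collect every validation message instead of stopping at the first one.
    errors.Add(args.Message);
});
This will give you a list of validation errors.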

You don't need to search "by hand" if you use a competent text editor. Notepad++'s XML plugin, for instance, can check whether your XML as a whole is well-formed or valid, and each check reports its own error messages.
If you don't have a schema and the file is well-formed, you can use the .NET System.Xml namespace to read in the document and then iterate through its nodes using LINQ to XML, which gives you fine control over which nodes go where. With LINQ, you could create a new XML file containing only the valid entries, programmatically correct the invalid entries as you find them, or even write to your SQL Server database directly.
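As a rough illustration of the LINQ to XML idea, here is a minimal sketch that picks out Person elements that look suspicious. It assumes the file is already well-formed, that the elements are in no namespace (the top of the sample is cut off, so this is a guess), and that a "bad" Person is one with a missing id or an unparsable dob. The class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class PersonChecks
{
    // Returns the Person elements whose "id" attribute is missing or empty,
    // or whose "dob" attribute is missing or does not parse as a date.
    // The rules here are assumptions; adjust them to what the import expects.
    public static List<XElement> FindSuspectPersons(XDocument doc)
    {
        return doc.Descendants("Person")
            .Where(p => string.IsNullOrEmpty((string)p.Attribute("id"))
                        || !HasValidDate(p.Attribute("dob")))
            .ToList();
    }

    private static bool HasValidDate(XAttribute attribute)
    {
        DateTime parsed;
        return attribute != null
            && DateTime.TryParse(attribute.Value, CultureInfo.InvariantCulture,
                                 DateTimeStyles.RoundtripKind, out parsed);
    }
}
```

Typical use would be something like PersonChecks.FindSuspectPersons(XDocument.Load(path)), then printing each returned element to see what is wrong with it.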
Your troubleshooting process should go something like this:
1. Is the XML well-formed? I.e., does it conform to the fundamental rules of XML?
2. Is the XML valid? I.e., does it have the elements and attributes you expect?
3. Is your import query accurate?
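The first question in that list (is the XML well-formed?) can also be answered in code. The following is a minimal sketch, not a definitive tool: it streams through the input with an XmlReader and reports where the parser gave up. The class and method names are invented for the example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedCheck
{
    // Returns null when the XML is well-formed; otherwise a message that
    // includes the line and position at which the parser stopped.
    public static string FindFirstError(TextReader input)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(input))
            {
                // Read every node without keeping the document in memory,
                // so the size of the file should not matter much.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return string.Format("Line {0}, position {1}: {2}",
                ex.LineNumber, ex.LinePosition, ex.Message);
        }
    }
}
```

You could call it as WellFormedCheck.FindFirstError(File.OpenText(path)). The parser cannot recover after a well-formedness error, so only the first problem is reported; fix it and run the check again. Note that the reported position is where the parser noticed the problem, which can be some way after the tag that actually caused it.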

For things like this I usually have luck checking and fixing the data in Notepad++. Install the XmlTools plugin, which has a menu for checking the XML syntax and tags.
Also, those dashes will give you problems; it's best to save the XML file out directly rather than copying it by hand.

A 68 MB XML file is no problem for XML editors such as XMLBlueprint 64-bit (http://www.xmlblueprint.com/) or Stylus Studio (http://www.stylusstudio.com/). Just check the well-formedness of your XML file (F7 in XMLBlueprint) and the editor will display the errors.

Related

Importing XML with 'µ' symbol into excel

I am trying to import an XML file into Excel using Data -> Other Sources -> From XML Data import. When the file contains a 'µ' symbol, it gives the following error:
Invalid file reference. The path to the file is invalid, or one or
more of the referenced schemas could not be found.
The XML looks like this:
<root>
<File>
<FileName>Data\7.5 µg_mL Sample.pdf</FileName>
</File>
</root>
If I remove the microgram symbol, it works and Excel imports the data.
I am generating the XML file in .NET using XNode.ToString(), and if I run the XML through a validator, it returns no errors. It doesn't seem to matter whether I put the XML declaration at the top of the file and declare it as UTF-8 or UTF-16, either.
Any pointers are welcome. Ideally I would like to check for any characters that might cause this problem, as I am guessing there are more than just the microgram symbol.
I am passing the XML string to a function that swaps out a custom XML file; I don't seem to have the option to change the file format here.
'Uses Ionic.Zip.ZipFile
Using zip As ZipFile = ZipFile.Read(fileDest)
zip.RemoveEntry(xmlPath)
zip.Save()
zip.AddEntry(xmlPath, customXml)
zip.Save()
End Using
Per the docs for the overload of AddEntry you are using:
The content for the entry is encoded using the default text encoding for the machine
You want this to be UTF-8, so you can use the overload that allows you to specify the encoding:
zip.AddEntry(xmlPath, customXml, Encoding.UTF8);
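On the side question of which other characters might cause trouble: if the cause is the encoding mismatch described above, then any character outside 7-bit ASCII is a candidate, because those are the ones whose bytes differ between UTF-8 and a typical machine-default code page. A minimal sketch for listing them follows (in C#, although the question's code is VB; the class and method names are invented for the example).

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EncodingChecks
{
    // Returns the distinct characters in the text that fall outside the
    // 7-bit ASCII range, in order of first appearance.
    public static List<char> FindNonAscii(string text)
    {
        return text.Where(c => c > 127).Distinct().ToList();
    }
}
```

With the UTF-8 overload in place this check should not be needed for correctness; it is only a way to see which characters were at risk.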

How to get XML content and edit an XML file using C#

I have an XML file called Emails.xml:
<Root>
<Emails>
<address>dfg#fds.com</address>
</Emails>
<Emails>
<address>adsfZSdf#.com</address>
</Emails>
</Root>
I'm using Visual Studio and ASP.NET. I want to get the addresses using C# code and also edit one or more of them (e.g. change "dfg#fds.com" to "ddfla#fds.com").
Furthermore, I want to add new addresses to this XML file.
(path of xml file is: C:\Documents and Settings\Administrator\Desktop\Emails.xml)
How can I do that?
Use the XmlDocument class. Then you can filter to the addresses with a call to GetElementsByTagName(string). From there, modification should be easy.
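A minimal sketch of that suggestion, assuming the Emails.xml layout shown in the question. The class and method names are invented for the example; only the XmlDocument calls come from the framework.

```csharp
using System.Linq;
using System.Xml;

public static class EmailEditor
{
    // Replaces the text of every <address> element equal to oldAddress
    // and returns how many elements were changed.
    public static int ReplaceAddress(XmlDocument doc, string oldAddress, string newAddress)
    {
        int changed = 0;
        // Copy the matches to a list first so the edit loop does not
        // depend on how the live node list reacts to changes.
        var addresses = doc.GetElementsByTagName("address").Cast<XmlNode>().ToList();
        foreach (XmlNode node in addresses)
        {
            if (node.InnerText == oldAddress)
            {
                node.InnerText = newAddress;
                changed++;
            }
        }
        return changed;
    }

    // Appends <Emails><address>...</address></Emails> under the root element.
    public static void AddAddress(XmlDocument doc, string address)
    {
        XmlElement emails = doc.CreateElement("Emails");
        XmlElement addressElement = doc.CreateElement("address");
        addressElement.InnerText = address;
        emails.AppendChild(addressElement);
        doc.DocumentElement.AppendChild(emails);
    }
}
```

To use it, create an XmlDocument, call doc.Load(path) with the path to Emails.xml, call the two methods, and then call doc.Save(path) to write the changes back.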

Check XML via XSD schemas which are specified in xsi:schemaLocation attribute

Sorry for my English.
C# 4.0, LINQ to XML.
I get XDocument from an XML file, for example:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../../support/localization.xslt"?>
<doc:resources xmlns:doc="http://mea-orbis.com/2012/XMLSchema/localization"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://mea-orbis.com/2012/XMLSchema/localization ../../support/localization.xsd">
<!--Заголовки столбцов таблицы-->
<doc:record id="commandName">Команда</doc:record>
<doc:record id="commandNameDescript">Краткое описание</doc:record>
<doc:record id="commandNameNotes">Примечание</doc:record>
<!--******************************************-->
<!--Наименования групп команд-->
<doc:record id="group1">Команды смены кодировок</doc:record>
<!--******************************************-->
<!--Наименования команд, их краткое описание и примечания-->
<doc:record id="dwgconvertName">DWGCONVERT</doc:record>
<doc:record id="dwgconvertKeyWords">кодировка</doc:record>
<doc:record id="dwgconvertDescr">конвертация текущего чертежа (версии AutoCAD до 2011 включительно)</doc:record>
<doc:record id="dwgconvertcpName">DWGCONVERTCP</doc:record>
<doc:record id="dwgconvertcpKeyWords">кодировка</doc:record>
<doc:record id="dwgconvertcpDescr">конвертация текущего чертежа (версии AutoCAD с 2008)</doc:record>
<doc:record id="dwgconvertfilesName">DWGCONVERTFILES</doc:record>
<doc:record id="dwgconvertfilesKeyW">кодировка</doc:record>
<doc:record id="dwgconvertfilesDescr">конвертация выбранных пользователем чертежей</doc:record>
<doc:record id="dwgconvertstrName">DWGCONVERTSTR</doc:record>
<doc:record id="dwgconvertstrKeyW">кодировка</doc:record>
<doc:record id="dwgconvertstrDescr">
конвертация отдельного текстового примитива (примитивов)
из текущего чертежа
</doc:record>
<doc:record id="ns">DWGCONVERT</doc:record>
<doc:record id="arxload">Загрузка всех ARX файлов</doc:record>
<doc:record id="netload">Загрузка всех DLL файлов</doc:record>
</doc:resources>
I need to validate the XDocument against its XSD schemas. I found two examples in MSDN:
first, second.
But in the samples, the XSD schema is supplied separately from the file. I don't want to do superfluous operations, because the schemas are already specified in the xsi:schemaLocation attribute of my XML file.
What is the correct way to validate an XDocument object when all the necessary schemas are already specified in its xsi:schemaLocation attribute?
Regards
This may be a little late, but I found this question, and then I found this answer elsewhere on Stack Overflow: Validating an XML against referenced XSD in C#. I just checked that it worked at least for a locally stored xsd.
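For reference, one way to have a validating reader pick up the schemas named in xsi:schemaLocation is the ProcessSchemaLocation validation flag. The sketch below works on a file path rather than an XDocument, and the class and method names are invented for the example.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

public static class SchemaLocationValidation
{
    // Validates the file against the schemas referenced by xsi:schemaLocation
    // and returns the validation messages (empty when everything passed).
    public static List<string> Validate(string xmlPath)
    {
        var messages = new List<string>();
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        // Ask the reader to load the schemas named in the document itself.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation;
        // Without this, a schema that cannot be found passes silently.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        // Set the resolver explicitly; some runtimes may not resolve
        // external schemas unless one is supplied.
        settings.XmlResolver = new XmlUrlResolver();
        settings.ValidationEventHandler +=
            (sender, e) => messages.Add(e.Severity + ": " + e.Message);

        using (XmlReader reader = XmlReader.Create(xmlPath, settings))
        {
            while (reader.Read()) { }
        }
        return messages;
    }
}
```

Relative schema locations such as ../../support/localization.xsd are resolved against the location of the XML file, so the file has to be read from its real path for this to work.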
Processing of the xsi attributes for schema locations is not done for you when you validate an XDocument; you will have to do that yourself.
The way I've done it involves the following steps:
1. Read the schemaLocation or noNamespaceSchemaLocation attributes on your document's root element. This is where you have to come up with the solution that best fits your needs. If you don't care about performance, you can simply use the DOM-based API; it may result in going over the source XML twice: once to parse it into memory, then again to validate it. Alternatively, use a fast, forward-only reader to read only the attributes of the root node, looking for your xsi: ones, and abandon the reading once past the root element.
2. Once found, parse the attribute values; typically you call string.Split() on whitespace (\t, \r, \n, 0x20), trimming all tokens, discarding empties, and making namespace/location pairs (when namespaces are used). Ultimately, this gives you the list of URIs where your XSDs are located.
3. Resolve each URI to an absolute URI, converting any relative ones using the absolute base URI of your XML file.
4. Build an XmlSchemaSet by adding all the XSDs; compile it and use it for validation by getting a reader from your source XML.
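The steps above can be sketched for an XDocument roughly as follows. This is an illustration under assumptions, not the answerer's actual code: it handles only xsi:schemaLocation (not noNamespaceSchemaLocation), takes the base URI of the XML file as a parameter, validates with the XDocument.Validate extension instead of a reader, and uses invented class and method names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.Schema;

public static class XDocumentSchemaLocation
{
    static readonly XNamespace Xsi = "http://www.w3.org/2001/XMLSchema-instance";

    // Steps 1 to 3: read xsi:schemaLocation from the root element, split it
    // on whitespace into namespace/location pairs, and resolve each location
    // against the base URI of the XML file.
    public static List<KeyValuePair<string, Uri>> ReadSchemaLocations(XDocument doc, Uri baseUri)
    {
        var result = new List<KeyValuePair<string, Uri>>();
        XAttribute attribute = doc.Root.Attribute(Xsi + "schemaLocation");
        if (attribute == null) return result;

        string[] tokens = attribute.Value.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i + 1 < tokens.Length; i += 2)
        {
            result.Add(new KeyValuePair<string, Uri>(tokens[i], new Uri(baseUri, tokens[i + 1])));
        }
        return result;
    }

    // Step 4: build and compile the schema set, then validate the document.
    public static List<string> Validate(XDocument doc, Uri baseUri)
    {
        var schemas = new XmlSchemaSet();
        foreach (var pair in ReadSchemaLocations(doc, baseUri))
        {
            schemas.Add(pair.Key, pair.Value.AbsoluteUri);
        }
        schemas.Compile();

        var errors = new List<string>();
        doc.Validate(schemas, (sender, e) => errors.Add(e.Message));
        return errors;
    }
}
```

The base URI would normally be the absolute location of the XML file, for example new Uri(Path.GetFullPath(xmlPath)).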

How to validate an XML file in C# 2.0

I need help with validating an XML file in a simple way.
I googled and found some tutorials saying that a developer can validate an XML file against an existing XSD schema file (as in the snippet below).
In my case, I don't have an XSD file. What can I do? Must I generate an XSD file with a tool like XSD.exe?
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add("", @"c:\mySchema.xsd"); // verbatim string, so the backslash is not read as an escape
settings.ValidationEventHandler += new ValidationEventHandler(OnValidationError);
XmlReader reader = XmlReader.Create("", settings);
XPathDocument doc = new XPathDocument(reader);
XPathNavigator navigatore = doc.CreateNavigator();
Actually, the validation I need is very simple: I just want to make sure that all the items and nested sub-items in the XML are properly paired. My application opens and writes the XML file, but sometimes, for some reason, the write does not complete successfully. The next time the application loads the file, it throws an exception. That is why I need to validate the XML file before loading it.
I would appreciate your comments and suggestions.
If you don't have an xsd you should create one. If you are trying to validate any specific structure this is your best option.
If you just want to make sure a document is made up of valid XML you could get away with not having one but if you care about the schema then you must create a schema definition.
You can write your own XSD or use any number of tools. My advice is to write your own. It's not hard and it's worth knowing how to do.
Here is a link to get you started: http://www.w3schools.com/schema/default.asp

Parse an xml document when namespace is no-longer available

I have a number of rather large, complex XML documents that I need to loop through. An xmlns is defined at the top of each document; however, the URL it points to is no longer available.
What's the best way to parse the file to get the important data from it using C#?
I tried to load it into a DataSet but would occasionally receive the errors:
The table (endpoint) cannot be the child table to itself in nested relations.
or
Cannot add a SimpleContent column to a table containing element columns or nested relations.
XPath was my next port of call but I had problems because of the lack of namespace.
I suspect this is seriously limiting my options, but does anyone have any suggestions?
Snippet of the XML document:
<?xml version="1.0" encoding="UTF-8"?>
<cdr:cdr_set xmlns:cdr="http://www.naturalconvergence.com/schema/cdr/v3/cdr">
<!-- Copyright (c) 2001-2009, all rights reserved -->
<cdr:cdr xmlns:cdr="http://www.naturalconvergence.com/schema/cdr/v3/cdr">
<cdr:call_id>2040-1247062136726-5485131</cdr:call_id>
<cdr:cdr_id>1</cdr:cdr_id>
<cdr:status>Normal</cdr:status>
<cdr:responsibility>
<cdr:tenant id="17">
<cdr:name>SpiriTel plc</cdr:name>
</cdr:tenant>
<cdr:site id="45">
<cdr:name>KWS</cdr:name>
<cdr:time_zone>GB</cdr:time_zone>
</cdr:site>
</cdr:responsibility>
<cdr:originator type="sipGateway">
<cdr:sipGateway id="3">
<cdr:name>Audiocodes-91</cdr:name>
</cdr:sipGateway>
</cdr:originator>
<cdr:terminator type="group">
<cdr:group>
<cdr:tenant id="17">
<cdr:name>SpiriTel plc</cdr:name>
</cdr:tenant>
<cdr:type>Broadcast</cdr:type>
<cdr:extension>6024</cdr:extension>
<cdr:name>OLD PMS DDIS DO NOT USE</cdr:name>
</cdr:group>
</cdr:terminator>
<cdr:initiation>Dialed</cdr:initiation>
<cdr:calling_number>02087893850</cdr:calling_number>
<cdr:dialed_number>01942760142</cdr:dialed_number>
<cdr:target>6024</cdr:target>
<cdr:direction>Inbound</cdr:direction>
<cdr:disposition>No Answer</cdr:disposition>
<cdr:timezone>GB</cdr:timezone>
<cdr:origination_timestamp>2009-07-08T15:08:56.727+01:00</cdr:origination_timestamp>
<cdr:release_timestamp>2009-07-08T15:09:26.493+01:00</cdr:release_timestamp>
<cdr:release_cause>Normal Clearing</cdr:release_cause>
<cdr:call_duration>PT29S</cdr:call_duration>
<cdr:redirected>false</cdr:redirected>
<cdr:conference>false</cdr:conference>
<cdr:transferred>false</cdr:transferred>
<cdr:estimated>false</cdr:estimated>
<cdr:interim>false</cdr:interim>
<cdr:segments>
<cdr:segment>
<cdr:originationTimestamp>2009-07-08T15:08:56.727+01:00</cdr:originationTimestamp>
<cdr:initiation>Dialed</cdr:initiation>
<cdr:call_id>2040-1247062136726-5485131</cdr:call_id>
<cdr:originator type="sipGateway">
<cdr:sipGateway id="3">
<cdr:name>Audiocodes-91</cdr:name>
</cdr:sipGateway>
</cdr:originator>
<cdr:termination_attempt>
<cdr:termination_timestamp>2009-07-08T15:08:56.728+01:00</cdr:termination_timestamp>
<cdr:terminator type="group">
<cdr:group>
<cdr:tenant id="17">
<cdr:name>SpiriTel plc</cdr:name>
</cdr:tenant>
<cdr:type>Broadcast</cdr:type>
<cdr:extension>6024</cdr:extension>
<cdr:name>OLD PMS DDIS DO NOT USE</cdr:name>
</cdr:group>
</cdr:terminator>
<cdr:provided_address>01942760142</cdr:provided_address>
<cdr:direction>Inbound</cdr:direction>
<cdr:disposition>No Answer</cdr:disposition>
</cdr:termination_attempt>
</cdr:segment>
</cdr:segments>
</cdr:cdr>
...
</cdr:cdr_set>
Each entry is essentially the same, but there are sometimes differences; for example, some fields may be missing if they aren't required.
The namespace values in an XML file are identifiers, not locators. Unless you are expecting to download a schema, the URL does not need to be reachable at all, and could be "flibble" if needed. I expect the best thing would be to just load the file into an XmlDocument / XDocument and try to access the data.
For example:
XmlDocument doc = new XmlDocument();
doc.Load("cdr.xml");
XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
ns.AddNamespace("cdr", "http://www.naturalconvergence.com/schema/cdr/v3/cdr");
XmlElement el = (XmlElement)doc.SelectSingleNode(
"cdr:cdr_set/cdr:cdr/cdr:originator", ns);
Console.WriteLine(el.GetAttribute("type"));
or to loop over the cdr elements:
foreach (XmlElement cdr in doc.SelectNodes("/cdr:cdr_set/cdr:cdr", ns))
{
Console.WriteLine(cdr.SelectSingleNode("cdr:call_id", ns).InnerText);
}
Note that the aliases used in the document are largely unrelated to the aliases used in the XmlNamespaceManager, hence you need to re-declare it. I could have used x as my alias in the C# just as easily.
Of course, if you prefer to work with an object model; run it through xsd (where cdr.xml is your example file):
xsd cdr.xml
xsd cdr.xsd /classes
Now you can load it with XmlSerializer.
Alternatively, load it into an XDocument and use LINQ to XML, although you might just get the same error.
I don't know what data you want, so it's hard to suggest a query.
I personally prefer XDocument to XmlDocument now in most cases.
The only problem with automatically generating an XSD is that it can get your data types pretty badly wrong if you are not using a good-sized chunk of sample data.
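To make the XDocument suggestion concrete, here is a minimal sketch that reads the call_id of each record in the cdr sample. The namespace URI is the one from the question and is used purely as a string, so it does not matter that the URL is unreachable. The class and method names are invented, and which fields are worth extracting is a guess.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CdrReader
{
    // The namespace is only compared as a string; nothing is downloaded.
    static readonly XNamespace Cdr = "http://www.naturalconvergence.com/schema/cdr/v3/cdr";

    // Returns the text of the <cdr:call_id> child of every <cdr:cdr> record
    // directly under the root, skipping records that have no call_id.
    public static List<string> ReadCallIds(XDocument doc)
    {
        return doc.Root.Elements(Cdr + "cdr")
            .Select(record => (string)record.Element(Cdr + "call_id"))
            .Where(id => id != null)
            .ToList();
    }
}
```

Casting an element to string yields null when the element is absent, which is a convenient way to cope with the optional fields mentioned in the question.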
