Loading multiple XDocuments, and working with its documents - c#

I have written some code, but I am still stuck on this:
I need to load many XML documents from a web library. I don't know how many documents there are, so I am not sure which kind of loop to use while loading:
XDocument doc = XDocument.Load("http://" + i);
where i is the document's identifier number.
I tried loading until I got a document without meaningful content (assuming that marked the end and the rest were empty), but the problem is that several documents in the middle of the library are empty as well.
XML with content looks like
<?xml version="1.0" encoding="utf-8"?>
<OP xmlns="" xmlns:xsi="" xsi:schemaLocation="">
<request verb="GR" identifier="53" metadataPrefix="p"></request>
<GR>
<header>
<identifier>53,number of doc...used for counting</identifier>
</header>
<metadata>
<P xmlns="" xsi:schemaLocation="">
<TITLE>title</TITLE>
<CERTIFICATE NAME="different names">
</CERTIFICATE>
<YEAR>
<DATE>2012-10-18T00:00:00Z</DATE>
</YEAR>
<MINIATURE>
<COPY>
<CNAME>Copy name</CNAME>
<FORMAT>obj/max/dxf/3ds/...</FORMAT>
</COPY>
</MINIATURE>
</P>
</metadata>
</GR>
</OP>
XML without content
<?xml version="1.0" encoding="utf-8"?>
<OP xmlns="" xmlns:xsi="" xsi:schemaLocation="">
<request verb="GR" identifier="53" metadataPrefix="p"></request>
Furthermore, I need to compute some counts:
Total number of documents
Number of documents per certificate (<CERTIFICATE>)
Number of documents per year (<YEAR><DATE>)
Number of documents per format (<MINIATURE><COPY><FORMAT>)
and my output should look like this:
<?xml version="1.0" encoding="UTF-8" ?>
<Statistic>
<DocSum>21220</DocSum>
<Certificates>
<Certificate id="certificateName">17098</Certificate>
…
</Certificates>
<Years>
<Year year="2014">23</Year>
…
</Years>
<Miniature>
<Format post="obj">11723</Format>
…
</Miniature>
</Statistic>
I would appreciate any help, hints, or tips on how to deal with this.

The posted answer by smink to the following thread should get you on the right path.
C# HttpWebRequest command to get directory listing
One of the easiest ways to get a list of the files in a web directory, without knowing exactly how many there are or what they are called, is to parse the HTML of the directory listing and pull out the link (anchor) tags.
You can then iterate through these tags and filter them out for the files by extensions that you need. I can provide a more in-depth example if necessary.
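As a rough illustration of the idea above, here is a minimal C# sketch. It assumes the listing HTML has already been downloaded into a string and that files appear as ordinary anchor tags with an href attribute. The names DirectoryListing and ExtractLinks are made up for this example, and a regular expression is only a shortcut here; a real HTML parser would be more robust.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DirectoryListing
{
    // Matches href="..." or href='...' inside an <a ...> tag and captures the target.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns the distinct link targets in listingHtml that end with the given
    // extension (compared case-insensitively), in order of first appearance.
    public static List<string> ExtractLinks(string listingHtml, string extension)
    {
        return HrefPattern.Matches(listingHtml)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .Where(href => href.EndsWith(extension, StringComparison.OrdinalIgnoreCase))
            .Distinct()
            .ToList();
    }
}
```

Each returned name could then be combined with the base URL and passed to XDocument.Load, so the loop runs over a known list instead of guessing where the library ends.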

Related

Removing Attribute value based on value from an XML using VB.Net

I have an XML as below
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope
xmlns="http://com/uhg/uht/uhtSoapMsg_V1"
xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
<env:Header>
<uhtHeader
xmlns="http://com/uhg/uht/uhtHeader_V1">
<consumer>COMET</consumer>
<auditId></auditId>
<sendTimestamp>2020-09-03T18:15:40.942-05:00</sendTimestamp>
<environment>P</environment>
<businessService version="24">getClaimHistory</businessService>
<status>success</status>
</uhtHeader>
</env:Header>
<env:Body>
<srvcRspn
xmlns="http://com/uhg/uht/getClaimHistory_V24">
<srvcErrList arrayType="srvcErrOccur[1]" type="Array">
<srvcErrOccur>
<orig>Foundation</orig>
<rtnCd>00</rtnCd>
<explCd>000</explCd>
<desc></desc>
</srvcErrOccur>
</srvcErrList>
</srvcRspn>
</env:Body>
</env:Envelope>
I want to blank out every attribute value that contains "http", as shown below:
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope
xmlns=""
xmlns:env="">
<env:Header>
<uhtHeader
xmlns="">
<consumer>COMET</consumer>
<auditId></auditId>
<sendTimestamp>2020-09-03T18:15:40.942-05:00</sendTimestamp>
<environment>P</environment>
<businessService version="24">getClaimHistory</businessService>
<status>success</status>
</uhtHeader>
</env:Header>
<env:Body>
<srvcRspn
xmlns="">
<srvcErrList arrayType="srvcErrOccur[1]" type="Array">
<srvcErrOccur>
<orig>Foundation</orig>
<rtnCd>00</rtnCd>
<explCd>000</explCd>
<desc></desc>
</srvcErrOccur>
</srvcErrList>
</srvcRspn>
</env:Body>
</env:Envelope>
I have tried several approaches, but none of them has worked for me. Can anyone suggest the fastest way to do this in VB.NET or C#?
The actual response is very large (at least roughly 100,000 lines of XML), and a For Each loop would take a considerable amount of time. Is there a parsing method or LINQ query that can do it faster?
I found a way to do it using Regex:
Return Regex.Replace(xmlDoc, "((?<=<|<\/)|(?<= ))[A-Za-z0-9]+:| xmlns(:[A-Za-z0-9]+)?="".*?""", "")
It serves my purpose completely. Thanks, Cleptus, for your quick reference.
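For comparison, here is a minimal parser-based sketch in C# using LINQ to XML. Like the regex above, it drops namespace declarations and prefixes altogether rather than leaving xmlns="" behind. NamespaceStripper and Strip are names invented for this example, and I make no claim about how its speed compares with the regex on a very large response.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceStripper
{
    // Rebuilds the element tree using local names only.
    // Namespace declarations (xmlns, xmlns:prefix) are dropped; every other
    // attribute is kept under its local name. Note: two attributes that differ
    // only by namespace would collide and cause an exception.
    public static XElement Strip(XElement element)
    {
        return new XElement(
            element.Name.LocalName,
            element.Attributes()
                   .Where(a => !a.IsNamespaceDeclaration)
                   .Select(a => new XAttribute(a.Name.LocalName, a.Value)),
            // Child elements are rebuilt recursively; text, comments, and other
            // nodes are copied (LINQ to XML clones nodes that already have a parent).
            element.Nodes().Select(n => n is XElement child ? Strip(child) : n));
    }
}
```

Usage would be along the lines of NamespaceStripper.Strip(XElement.Parse(xmlDoc)).ToString(), where xmlDoc is the response string from the question.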

Remove <?xml version="1.0" encoding="UTF-8"?> from string c#

I want to know how to remove:
<?xml version="1.0" encoding="UTF-8"?>
from a string (data).
I have tried this, but it doesn't work:
string result = data.Replace("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>", "");
(I am not working with XML as such; it is just a response that I want to manipulate without the header.)
Let's look at your two strings. Removing the escapes they are:
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="UTF-8" ?>
In other words you've managed to add an extra space. Remove that and your code will succeed.
More broadly, one wonders why you are attempting to do this. Simple text processing of XML files is liable to lead to pain and suffering. Perhaps you should consider using a parser.
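Both routes can be sketched in C# as follows. This assumes the string is well-formed XML for the parser variant; the class and method names are made up for this example. The text variant avoids depending on the exact spacing inside the declaration, which is what tripped up the Replace call.

```csharp
using System;
using System.Xml.Linq;

public static class XmlDeclarationRemover
{
    // Parser-based: XDocument.ToString() does not emit the XML declaration.
    // Throws if data is not well-formed XML.
    public static string RemoveWithParser(string data)
    {
        return XDocument.Parse(data).ToString(SaveOptions.DisableFormatting);
    }

    // Text-based: drops a leading <?xml ... ?> regardless of its inner spacing.
    // Returns data unchanged if it does not start with an XML declaration.
    public static string RemoveWithText(string data)
    {
        string trimmed = data.TrimStart();
        bool startsWithDeclaration =
            trimmed.StartsWith("<?xml", StringComparison.Ordinal)
            && trimmed.Length > 5
            && char.IsWhiteSpace(trimmed[5]); // rules out e.g. <?xml-stylesheet ...?>
        if (!startsWithDeclaration)
        {
            return data;
        }
        int end = trimmed.IndexOf("?>", StringComparison.Ordinal);
        return end < 0 ? data : trimmed.Substring(end + 2).TrimStart();
    }
}
```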

Adding an XML file to your XNA project

I'm creating an XNA game. I've made it so I can specify all the level details in an XML file which is then de-serialized and used to set up the level details.
At the moment, it's just referencing a file on my computer - my question is, how do I reference this more generically?
Adding the XML file to my content folder produced a multitude of complaints about schemas and the like, which made me think that was probably not the correct route.
Any suggestions?
I tried removing all the entries from the XNA; this gives:
Attempt to access the method failed: System.IO.StreamReader..ctor(System.String)
EDIT:
The xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<XnaContent>
<Asset Type = "RDrop.Level[]">
<Item>
(stuff)
</Item>
<Item>
(stuff)
</Item>
</Asset>
</XnaContent>
EDIT:
I've started a new Windows Phone project (the previous one wasn't one). I've copied everything over and added "dataTypes" à la this tutorial:
http://msdn.microsoft.com/en-us/library/ff604979.aspx
Game project references -> content, MyDataTypes.
Content references -> MyDataTypes.
The XML is as is in previous edit and is contained in the content folder via Add-> Existing Item-> Level.XML.
Any ideas?
You can leave the build action as "Compile". One method to do what you want is the following:
Create a class that the xml is going to be describing. Example: Level.cs
Then structure your xml file like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<XnaContent>
<Asset Type="The_Level_class_namespace.Level">
<Property1>Value</Property1>
<Property2>Value</Property2>
<Property3>Value</Property3>
<Property4>Value</Property4>
</Asset>
</XnaContent>
If you want the XML to describe an array of objects, you can structure it like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<XnaContent>
<Asset Type="The_Level_class_namespace.Level[]">
<Item>
<Property1>Value</Property1>
<Property2>Value</Property2>
<Property3>Value</Property3>
<Property4>Value</Property4>
</Item>
</Asset>
</XnaContent>
From there you just need to make sure your values are in the proper format. For example, a Vector2 property would be written like this:
<Vector2Property>x_value y_value</Vector2Property>
Make sure that your content project references the game project or library project.
Hope this helps :)
Open the properties of your XML document (right-click it in your content folder). You can set the Build Action to None.
That way, the compiler won't analyse your schema, thus it won't produce any warnings.
(I'm not entirely sure about this, just my first guess)

Ill Formed XML Code

I am currently trying to update product data with Amazon MWS and the Feeds API. My problem: updating the inventory and setting a new quantity for my products results in errors like this:
The XML you submitted is ill-formed at the Amazon Envelope XML level
at (or near) line X, column Y.
On the other hand, I export nearly the same XML to update the prices. That works just fine...
Here is an example of the XML that I upload to the Feeds API to update the quantity:
<?xml version="1.0" encoding="utf-8"?>
<AmazonEnvelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" noNamespaceSchemaLocation="amznenvelope.xsd">
<Header>
<DocumentVersion>1.01</DocumentVersion>
<MerchantIdentifier>{SellerID}</MerchantIdentifier>
</Header>
<MessageType>Inventory</MessageType>
<Message>
<MessageID>1</MessageID>
<Inventory>
<SKU>ArtNoXX</SKU>
<Quantity>10</Quantity>
</Inventory>
</Message>
<Message>
<MessageID>2</MessageID>
<Inventory>
<SKU>ArtNoXY</SKU>
<Quantity>23</Quantity>
</Inventory>
</Message>
</AmazonEnvelope>
P.S.: I'm using C# and an XmlDocument to create the XML file.
Edit: The error is shown multiple times. Only the first and the last 3 lines don't appear in the error log.
Example:
... (or near) line 10, column 16.
That would be
<Inventory>
As for the column, it should be
>
Wrong namespace in your config?
Yours:
noNamespaceSchemaLocation="amznenvelope.xsd"
Should be:
noNamespaceSchemaLocation="amzn-envelope.xsd"
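As a sketch of how the envelope could be generated with the corrected schema file name, here is a small C# example. It uses XDocument (LINQ to XML) rather than the XmlDocument mentioned in the question. Writing the attribute with the xsi: prefix is my assumption, based on how XML Schema instance attributes are normally written; the answer above only corrects the file name. InventoryFeed and Build are hypothetical names, and the element layout simply mirrors the XML posted in the question.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class InventoryFeed
{
    // Builds an envelope shaped like the one in the question.
    // items: SKU/quantity pairs, emitted in order with MessageID 1, 2, ...
    public static XDocument Build(string sellerId, IList<KeyValuePair<string, int>> items)
    {
        XNamespace xsi = "http://www.w3.org/2001/XMLSchema-instance";

        var envelope = new XElement("AmazonEnvelope",
            new XAttribute(XNamespace.Xmlns + "xsi", xsi),
            // Assumption: the attribute lives in the xsi namespace.
            new XAttribute(xsi + "noNamespaceSchemaLocation", "amzn-envelope.xsd"),
            new XElement("Header",
                new XElement("DocumentVersion", "1.01"),
                new XElement("MerchantIdentifier", sellerId)),
            new XElement("MessageType", "Inventory"));

        int messageId = 1;
        foreach (var item in items)
        {
            envelope.Add(new XElement("Message",
                new XElement("MessageID", messageId++),
                new XElement("Inventory",
                    new XElement("SKU", item.Key),
                    new XElement("Quantity", item.Value))));
        }

        return new XDocument(new XDeclaration("1.0", "utf-8", null), envelope);
    }
}
```

Building the tree this way keeps the namespace declaration and the attribute prefix consistent, which is easy to get wrong when attribute names are written as plain strings.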

How to read nested XML using xDocument in Silver light?

Hi, I currently have a nested XML document with the following structure:
<?xml version="1.0" encoding="utf-8" ?>
<Response>
<Result>
<item id="something" />
<price na="something" />
<?xml version="1.0" encoding="UTF-8" ?>
<DIDL-Lite xmlns="urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:upnp="urn:schemas-upnp-org:metadata-1-0/upnp/" xmlns:dlna="urn:schemas-dlna-org:metadata-1-0/">
</Result>
<NumberReturned>10</NumberReturned>
<TotalMatches>10</TotalMatches>
</Response>
Any help on how to read this using XDocument or XmlReader would be really helpful.
Thanks,
Subhendu
XDocument and XmlReader are both XML parsers that expect well-formed XML as input. What you have shown is not well-formed XML. So the first task is to extract the nested XML, and because the input is not valid XML you cannot rely on any parser to do this job. You'll need to resort to string manipulation and/or regular expressions.
My suggestion would be to fix the procedure that generates this invalid XML in the first place. Another suggestion is to never generate an XML file manually, but to use an appropriate tool for it (XmlWriter, XDocument, ...).
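If the generator cannot be fixed, the string-manipulation route might look roughly like this in C#. The sketch assumes the outer document starts with its own XML declaration, that the nested document begins at the second "<?xml", and that it runs up to the closing tag of the containing element. NestedXml and ExtractInner are names invented for this example.

```csharp
using System;

public static class NestedXml
{
    // Returns the text from the second "<?xml" up to (not including) the closing
    // tag of containerName, or null if that shape is not found.
    public static string ExtractInner(string raw, string containerName)
    {
        // Skip the outer document's own declaration.
        int first = raw.IndexOf("<?xml", StringComparison.Ordinal);
        if (first < 0) return null;

        // The nested document is assumed to start at the next declaration.
        int inner = raw.IndexOf("<?xml", first + 5, StringComparison.Ordinal);
        if (inner < 0) return null;

        // It is assumed to end where the container element closes.
        int close = raw.IndexOf("</" + containerName + ">", inner, StringComparison.Ordinal);
        if (close < 0) return null;

        return raw.Substring(inner, close - inner).Trim();
    }
}
```

The extracted fragment can then be handed to XDocument.Parse on its own, provided it is well-formed once separated from the outer document.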
