How to read only a small part of an XML file - C#

I built an application that reads a feed file, but even though my connection is fast, the page takes several seconds to load. I would like to know how to read only the first records of this .xml:
string rssURL = "http://www.cnt.org.br/Paginas/feed.aspx?t=n";
System.Net.WebRequest myRequest = System.Net.WebRequest.Create(rssURL);
System.Net.WebResponse myResponse = myRequest.GetResponse();
System.IO.Stream rssStream = myResponse.GetResponseStream();
System.Xml.XmlDocument rssDoc = new System.Xml.XmlDocument();
rssDoc.Load(rssStream);
System.Xml.XmlNodeList rssItems = rssDoc.SelectNodes("rss/channel/item");
Thanks.

As the previous posters mention, you can't download only part of a web request. But you can start parsing the XML before the request has finished. XmlDocument is the wrong approach for your use case, because it needs the complete response before it can build the object. Try XmlTextReader (or its factory-based successor, XmlReader) instead.
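A minimal sketch of that approach, using the question's feed URL and stopping after the first five items (the count is an arbitrary choice); disposing the reader early abandons the rest of the download:

using System;
using System.Net;
using System.Xml;

class FeedPeek
{
    static void Main()
    {
        // Stream-parse the feed instead of letting XmlDocument buffer it all.
        string rssURL = "http://www.cnt.org.br/Paginas/feed.aspx?t=n";
        WebRequest request = WebRequest.Create(rssURL);

        using (WebResponse response = request.GetResponse())
        using (XmlReader reader = XmlReader.Create(response.GetResponseStream()))
        {
            int itemsRead = 0;
            while (itemsRead < 5 && !reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    // ReadOuterXml consumes the whole <item> and advances the reader.
                    Console.WriteLine(reader.ReadOuterXml());
                    itemsRead++;
                }
                else
                {
                    reader.Read();
                }
            }
        } // disposing closes the connection before the rest is downloaded
    }
}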

There is no easy way to download only part of a web request and ensure it is what you want. One workaround would be to use the Google Feed API.
You'd have to use the JSON interface, since they don't provide a library for C#, but because it goes through Google's servers it will be much faster. You'd have to modify your code a little, since it returns JSON by default instead of XML, but that is a trivial change to make. You can also pass the parameter output=xml to retrieve the XML representation of the data.
Try going to this page; it is your same feed with fewer elements, and it loads much faster. It only returns a few elements, but if you want 10, all you have to do is add num=10 to the URL. For example, this URL returns 10 elements. Read the API documentation a little more to see what parameters you can add to tailor the request to what you want to do.
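A hedged sketch of what that request could look like from C#; the endpoint and the v / num / output / q parameter names are assumptions based on the classic Google AJAX Feed API described above:

using System;
using System.Net;

class GoogleFeedFetch
{
    static void Main()
    {
        // Assumed endpoint and parameters of the Google AJAX Feed API.
        string feed = Uri.EscapeDataString("http://www.cnt.org.br/Paginas/feed.aspx?t=n");
        string url = "http://ajax.googleapis.com/ajax/services/feed/load"
                   + "?v=1.0&num=10&output=xml&q=" + feed;

        using (var wc = new WebClient())
        {
            // output=xml asks for the XML representation instead of JSON.
            Console.WriteLine(wc.DownloadString(url));
        }
    }
}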

Related

Walmart API POST failing with 400 Bad Request (inventory feed) ARCA

I am having problems with a POST request to the Walmart Marketplace API for bulk data exchange and am hoping for some help.
Background:
I have been successful in writing signature authentication routines and can successfully execute GET commands such as "get products". This indicates to me that the authentication signatures are properly formatted and the headers are (for the most part) correct.
Problem:
I am receiving a 400 Bad Request ("Request Content is Invalid") response when attempting to submit a test feed to Walmart's API. I have read that this problem is common, but I have yet to find a forum post that clearly explains the actual problem or how to fix it. Here are my current parameters:
Client: ARCA (ARCA Rest Client for Chrome)
URL:
https://marketplace.walmartapis.com/v2/feeds?feedType=inventory
Headers:
Accept: application/xml
WM_SVC.NAME: Walmart Marketplace
WM_CONSUMER.ID: <Consumer ID>
WM_SEC.AUTH_SIGNATURE: <Good Auth Signature>
WM_QOS.CORRELATION_ID: 15649814651
WM_SEC.TIMESTAMP: <Timestamp>
WM_CONSUMER.CHANNEL.TYPE: <Channel Type>
Content-Type: multipart/form-data
Body: file attachment (not a raw payload, although that has been tried too):
<?xml version="1.0" encoding="utf-8"?>
<InventoryFeed xmlns="http://walmart.com/">
  <InventoryHeader>
    <version>1.4</version>
  </InventoryHeader>
  <inventory>
    <sku>KON04418</sku>
    <quantity>
      <unit>EACH</unit>
      <amount>4</amount>
    </quantity>
    <fulfillmentLagTime>1</fulfillmentLagTime>
  </inventory>
</InventoryFeed>
When I take this exact same XML and test it at the Walmart API Explorer, the file is accepted with response code 200 (OK).
I have validated with the Notepad++ XML Tools plugin that the XML conforms to the XSD provided by Walmart. I have seen numerous posts about boundaries that need to be applied, so I have additionally tried changing the Content-Type and adding boundaries, but I have been unable to get the request accepted.
Any help in getting this request to return a response code 200 would be greatly appreciated.
Lastly, once this request validates in ARCA, I will be implementing it in C#. I already have all of the code written, but there's a bit of fuzziness about how to add an attachment to an HttpWebRequest vs. just submitting a raw data stream. If any clarity could be provided on the difference, I would again appreciate it.
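On the attachment-versus-raw-stream question: a file "attachment" in an HTTP POST is just the raw bytes wrapped in multipart framing. Below is a general, hedged sketch of that framing with HttpWebRequest; the boundary text and the "file" part name are placeholders, and the exact part formatting Walmart expects is precisely what the answer below struggles with:

using System;
using System.IO;
using System.Net;
using System.Text;

class MultipartSketch
{
    static void Main()
    {
        string boundary = "----feedboundary" + DateTime.Now.Ticks.ToString("x");
        byte[] fileBytes = File.ReadAllBytes("inventoryFeed.xml");

        var request = (HttpWebRequest)WebRequest.Create(
            "https://marketplace.walmartapis.com/v2/feeds?feedType=inventory");
        request.Method = "POST";
        request.ContentType = "multipart/form-data; boundary=" + boundary;
        // ... the WM_SVC / WM_CONSUMER / WM_SEC headers from the question go here ...

        using (Stream body = request.GetRequestStream())
        {
            // Part header: boundary line, disposition, content type, blank line.
            string head = "--" + boundary + "\r\n"
                        + "Content-Disposition: form-data; name=\"file\"; filename=\"inventoryFeed.xml\"\r\n"
                        + "Content-Type: application/xml\r\n\r\n";
            string tail = "\r\n--" + boundary + "--\r\n"; // closing boundary

            byte[] headBytes = Encoding.UTF8.GetBytes(head);
            byte[] tailBytes = Encoding.UTF8.GetBytes(tail);
            body.Write(headBytes, 0, headBytes.Length);
            body.Write(fileBytes, 0, fileBytes.Length);   // the raw XML payload
            body.Write(tailBytes, 0, tailBytes.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine((int)response.StatusCode);
        }
    }
}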
So this answer isn't clean and elegant; it's more of a workaround than anything. I have spoken with a few individuals inside the Walmart engineering team and have been told that a C# SDK should be forthcoming in the next few months.
After all of my research, it appears there are some tricks to how you submit a multipart form to Walmart, and the system is very inflexible. I've seen posts about adding specifically formatted boundaries into the body of the HTTP request, but had no such luck. I was unable to attach the body to the request either as a file or as a data stream.
The workaround is pretty simple, and unfortunately ugly. It takes a bit of setup, but you can create a .jar wrapper around the Walmart Java SDK and call it from your .NET program.
So, the steps in the process:
1. Grab the appropriate .XSD files and generate C# classes from them.
2. Build a properly formatted XML inventory file. Make sure to include namespaces! Walmart will fail particular calls if you don't include the appropriate ns2/ns3 namespaces.
3. Dynamically generate a batch file to call your Java module. Spawning a command-line process directly seemed to make things cranky for some reason, so I opted for the batch file instead:
string path = Directory.GetParent(Environment.CurrentDirectory).ToString();
if (File.Exists(@"../inventory.bat"))
{
    File.Delete(@"../inventory.bat");
}
string batchCommand = @"cd " + path + Environment.NewLine
                    + @"java -jar WalmartWrapper.jar SubmitInventoryFeed inventoryFeed.xml";
File.WriteAllText(path + @"\inventory.bat", batchCommand);

// Run the batch file and wait for the Java module to finish.
ProcessStartInfo info = new ProcessStartInfo();
info.UseShellExecute = true;
info.FileName = "inventory.bat";
info.WorkingDirectory = path;
var p = Process.Start(info);
p.WaitForExit();
From here, the Java module takes over. It took a bit of hacking around to make it work more like an SDK and less like a sample program. Here's some of the sample code for making things work.
Entry Point
if ("SubmitInventoryFeed".equals(args[0].trim())) {
if (args.length < 2)
{
System.out.println("Need second argument for SubmitInventoryFeed");
return;
}
String filename = args[1];
Feed inventoryFeed = new Feed();
try
{
inventoryFeed.submitInventoryFeed(filename);
} catch (Exception ex) {
System.out.println("submitInventoryFeed failed: " + ex.getMessage());
}
}
SDK Call (This is the bare bones of submitInventoryFeed without error checking)
String path = Paths.get(".").toAbsolutePath().normalize().toString();
File itemFile = FileHandler.getFile(filename.trim());
String filePath = path + "\\" + "MarketplaceClientConfig.properties";
WalmartMarketplace wm = Utils.getClient(filePath);
Response response = wm.submitFeed(MarketplaceFeedType.inventory, itemFile);
You can use ResponseChecker.isResponse200(response, true) to test for a successful submission, and FeedAcknowledgement ack = response.readEntity(FeedAcknowledgement.class); to grab the actual response body and check it for errors.
I will be the first to say I can't wait to replace this workaround with the C# SDK pending from Walmart, but for the time being, this is the only way I have been able to submit feeds. I've looked through the Walmart code in depth, but unfortunately there is some Java magic happening under the hood to do the file attachment, so there's no real way to see the exact procedure and reverse engineer it for C#. Someone who really knows Java inside and out could probably figure it out, but I have just enough Java background to cobble together a workable, albeit ugly, solution.

Parsing XML from webbrowser control?

I need to parse an XML file (which is generated by PHP) from a WebBrowser control, because the page I am trying to parse requires cookies to track some things. When I use something like:
string xmlURL = "urltophpfile";
XmlTextReader reader = new XmlTextReader(xmlURL);
to parse it, cookies aren't enabled, so I need to use a WebBrowser control or something else that will allow me to use cookies.
The problem I am having is that when I try to put the WebBrowser text into a string (string info = webBrowser2.DocumentText.ToString();), it gives the full source of the web page, and therefore I can't parse it.
Does anyone have any suggestions on how I can work this out please?
You should use HttpWebRequest and specify the CookieContainer property.
This URL has a good example of it: http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.cookiecontainer.aspx
EDIT: To be clear, I mean use HttpWebRequest to fetch the XML, and then load the XML using XmlReader.Create with one of the overloads that accepts a stream or direct string input.
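A minimal sketch of that combination; "urltophpfile" is the question's placeholder URL, and the element-name dump at the end stands in for whatever parsing you actually need:

using System;
using System.Net;
using System.Xml;

class CookieXmlFetch
{
    static void Main()
    {
        // Reuse the same container across requests to keep session cookies alive.
        var cookies = new CookieContainer();
        var request = (HttpWebRequest)WebRequest.Create("urltophpfile");
        request.CookieContainer = cookies;

        using (WebResponse response = request.GetResponse())
        using (XmlReader reader = XmlReader.Create(response.GetResponseStream()))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    Console.WriteLine(reader.Name);
            }
        }
    }
}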

How to get raw page source (not generated source) from c#

The goal is to get the raw source of the page; I mean, do not run the scripts or let the browser reformat the page at all. For example: suppose the source is <table><tr></table>. After the response, I don't want to get <table><tbody><tr></tr></tbody></table>. How do I do this in C# code?
More info: for example, typing "view-source:http://feeds.gawker.com/kotaku/full" in the browser's address bar will give you an XML file, but if you just navigate to "http://feeds.gawker.com/kotaku/full" it will render an HTML page. What I want is the XML file. Hope this is clear.
Here's one way, but it's not really clear what you actually want.
using (var wc = new WebClient())
{
    // DownloadString returns the markup exactly as the server sent it;
    // no scripts run and no DOM normalization happens.
    var source = wc.DownloadString("http://google.com");
}
If you mean when rendering your own page: you can get access to the raw page content using a response filter, or by overriding the page's Render method. I would question your motives for doing this, though.
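A response filter, for reference, is just a Stream wrapper installed over Response.Filter before rendering; a minimal sketch (the CaptureFilter name is made up for illustration, installed with Response.Filter = new CaptureFilter(Response.Filter); in Page_Load or an HttpModule):

using System.IO;

public class CaptureFilter : Stream
{
    private readonly Stream _inner;
    private readonly MemoryStream _captured = new MemoryStream();

    public CaptureFilter(Stream inner) { _inner = inner; }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _captured.Write(buffer, offset, count); // keep a copy of the raw markup
        _inner.Write(buffer, offset, count);    // pass it through unchanged
    }

    // Boilerplate required by the abstract Stream class:
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { return _inner.Length; } }
    public override long Position
    {
        get { return _inner.Position; }
        set { _inner.Position = value; }
    }
    public override void Flush() { _inner.Flush(); }
    public override int Read(byte[] buffer, int offset, int count)
    { throw new System.NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin)
    { throw new System.NotSupportedException(); }
    public override void SetLength(long value)
    { throw new System.NotSupportedException(); }
}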
Scripts run client-side, so they have no bearing on any C# code.
You can use a tool such as Fiddler to see what is actually being sent over the wire.
disclaimer: I think Fiddler is amazing

Read only the title and/or META tag of HTML file, without loading complete HTML file

Scenario :
I need to parse millions of HTML files/pages (as fast as I can) and then read only the title and/or META part of each and dump it to a database.
What I am doing is using the System.Net.WebClient class's DownloadString(url_path) to download, and then saving it to the database with LINQ to SQL.
But this DownloadString function gives me the complete HTML source; I need only the title and META tag parts.
Any ideas, to download only that much content?
I think you can open a stream for this URL and use it to read only the first x bytes. I can't tell you the exact number, but I think you can set it to a reasonable value that captures the title and the description:
HttpWebRequest fileToDownload = (HttpWebRequest)HttpWebRequest.Create("YourURL");
using (WebResponse fileDownloadResponse = fileToDownload.GetResponse())
{
    using (Stream fileStream = fileDownloadResponse.GetResponseStream())
    {
        using (StreamReader fileStreamReader = new StreamReader(fileStream))
        {
            // Read only the first Number characters instead of the whole page.
            char[] x = new char[Number];
            int charsRead = fileStreamReader.Read(x, 0, Number);
            string data = new string(x, 0, charsRead);
        }
    }
}
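As a follow-up, if the <title> tag happens to fall inside that first chunk, a simple regex over data can pull it out. This is a sketch, not a robust HTML parser, and ExtractTitle is a made-up helper name:

using System.Text.RegularExpressions;

static class HtmlPeek
{
    // Only works if <title> fell inside the characters already read above.
    public static string ExtractTitle(string partialHtml)
    {
        Match m = Regex.Match(partialHtml, @"<title[^>]*>\s*(.*?)\s*</title>",
                              RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }
}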
I suspect that WebClient will try to download the whole page first, in which case you'd probably want a raw client socket. Send the appropriate HTTP request (manually, since you're using raw sockets), start reading the response (which will not arrive all at once), and kill the connection when you've read enough. However, the rest will probably already have been sent from the server and be winging its way to your PC whether you want it or not, so you might not save much bandwidth, if any.
Depending on what you want it for, many half decent websites have a custom 404 page which is a lot simpler than a known page. Whether that has the information you're after is another matter.
You can use the verb "HEAD" in an HttpWebRequest to retrieve only the response headers (not the HTML <head> element). To get the full <head> element with the metadata, you'll need to download the page and parse out the parts you want.
var request = System.Net.WebRequest.Create(uri);
request.Method = "HEAD";

How to cache XML url?

I am retrieving an XML string through a URL. My code works great, but I do not know how to add caching to it. I am not sure if I am able to cache XML streams or if that is even the right approach. What is the best way to add caching here?
XmlTextReader xmlTextReader = new XmlTextReader(this.RssUrl);
XmlDataDocument xdoc1 = new XmlDataDocument();
xdoc1.DataSet.ReadXml(xmlTextReader, XmlReadMode.InferSchema);
return xdoc1.DataSet.Tables["item"];
You could save the XML together with a timestamp, and when you need to reread the XML, read from the stored copy until the timestamp is older than a preconfigured value. Most RSS readers do this to avoid flooding the RSS service with requests.
Or, if you control the RSSUrl, you could implement the caching there. That would utilize HTTP caching and the fact that the web server can return 304 Not Modified if no new items have been added to the feed.
Cache the entire XmlDataDocument. If you only cache the raw XML, you'll have to parse it every time.
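A minimal sketch of that idea, assuming System.Runtime.Caching is available (.NET 4+); the cache key prefix and the 15-minute lifetime are arbitrary placeholder choices:

using System;
using System.Data;
using System.Runtime.Caching;
using System.Xml;

class CachedFeed
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public DataTable GetItems(string rssUrl)
    {
        string key = "rss:" + rssUrl;
        var cached = Cache.Get(key) as DataTable;
        if (cached != null)
            return cached; // served from cache: no download, no parsing

        // Same parsing approach as the question's code.
        var xdoc = new XmlDataDocument();
        using (var reader = new XmlTextReader(rssUrl))
        {
            xdoc.DataSet.ReadXml(reader, XmlReadMode.InferSchema);
        }

        DataTable items = xdoc.DataSet.Tables["item"];
        Cache.Set(key, items, DateTimeOffset.Now.AddMinutes(15));
        return items;
    }
}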
