I'm looking for a way to save a table from an HTML page as XML or JSON. The method I'm currently using saves the entire page as an .xls sheet and then reads the sheet using Office.Interop.Excel. I want to skip saving the file and just read directly from the page using an HTTP request. Any ideas?
I assume you mean that you'd like to scrape the contents of a web page without going through File -> Save As?
CodeProject has a write-up explaining how to use HttpWebRequest to do just that. Or you could use the newer HttpClient. Once you retrieve the HTML, you'll have to parse it yourself.
In the MSDN article, they request JSON directly, so they don't have to deal with parsing, but for an HTML page you could write a regular expression to capture the table body.
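As a rough sketch of that approach (not the code from either linked article), the following fetches a page with HttpClient, captures the first table body with a regular expression, and turns the rows into JSON. The class and method names are made up for illustration, System.Text.Json is assumed to be available (.NET Core 3.0 or later, or the NuGet package), and a regex like this will break on nested tables or unusual markup.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text.Json;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper; none of these names come from the original posts.
public static class TableScraper
{
    private static readonly HttpClient Client = new HttpClient();
    private const RegexOptions Options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Downloads the raw HTML of the page as a string.
    public static Task<string> FetchHtmlAsync(string url) => Client.GetStringAsync(url);

    // Returns the inner markup of the first <tbody>, or an empty string if there is none.
    public static string ExtractTableBody(string html)
    {
        Match match = Regex.Match(html, @"<tbody\b[^>]*>(.*?)</tbody>", Options);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    // Splits the table body into rows of cell text.
    // Markup nested inside a cell is left in place; only HTML entities are decoded.
    public static List<string[]> ExtractRows(string tableBody)
    {
        var rows = new List<string[]>();
        foreach (Match row in Regex.Matches(tableBody, @"<tr\b[^>]*>(.*?)</tr>", Options))
        {
            string[] cells = Regex.Matches(row.Groups[1].Value, @"<t[dh]\b[^>]*>(.*?)</t[dh]>", Options)
                .Cast<Match>()
                .Select(cell => WebUtility.HtmlDecode(cell.Groups[1].Value.Trim()))
                .ToArray();
            rows.Add(cells);
        }
        return rows;
    }

    // Serializes the rows as a JSON array of arrays.
    public static string ToJson(List<string[]> rows) => JsonSerializer.Serialize(rows);
}
```

Typical use would be to await TableScraper.FetchHtmlAsync(url), pass the result through ExtractTableBody and ExtractRows, and write the output of ToJson wherever you need it.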
Related
I have an API that can pass a search query to a website that I use to look up products. I use the catalog number to obtain the device identifier. The response that is returned is HTML, and I need to extract one line from the HTML to write to a file. Is it possible to select a specific div in a web API response?
My goal is to eventually loop over each product search, pull the one line I need, and then write it to an Excel file.
Here is an example of the API searching for a product, and the response. (Screenshot: API working.)
Here is the single line that I need to extract from the response. I then want to concatenate it to the URL and write out the whole link with each specific device identifier. (Screenshot: line of code I need.)
I hope this makes sense.
This is a parsing problem, and since the file/content you want to extract from is HTML, it would be a straightforward task.
You have three main steps to get this done:
1. Parse the content, whether it's on the web or in a downloaded file.
2. Use a selector to get the "a" tag you're looking for.
3. Extract the URL from the "href" attribute of the "a" tag.
I see you're using C#, so I would recommend this library. Use its parser to parse the file, then query the result with a CSS selector to get your data.
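The link to the recommended library is not visible here, so the sketch below does not guess at its API. It swaps in a plain Regex from the .NET base class library in place of a real parser and CSS selector, which is cruder but shows the same steps: find the "a" tag, read its "href", and join it to the site URL. The class name, the marker string, and the example URLs in the comments are all hypothetical.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; a regex stand-in for a proper HTML parser with CSS selectors.
public static class LinkExtractor
{
    // Returns the href of the first <a> tag whose href contains `marker`,
    // or an empty string if no such link exists.
    public static string FindHref(string html, string marker)
    {
        const string pattern = @"<a\b[^>]*?\bhref\s*=\s*[""']([^""']*)[""']";
        foreach (Match match in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            string href = WebUtility.HtmlDecode(match.Groups[1].Value);
            if (href.Contains(marker))
            {
                return href;
            }
        }
        return string.Empty;
    }

    // Resolves a relative href against the site's base URL,
    // e.g. "https://example.com/search" + "/devices/123" -> "https://example.com/devices/123".
    public static string ToAbsoluteUrl(string baseUrl, string href)
    {
        return new Uri(new Uri(baseUrl), href).AbsoluteUri;
    }
}
```

In the loop over product searches you would call FindHref on each response, pass the result to ToAbsoluteUrl, and collect the links for the Excel export.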
Let me know if you still need more details.
I want to convert an InfoPath form to PDF.
I have the URL and access to the host, for example:
http://hostserver/PWA/_layouts/15/FormServer.aspx?XmlLocation=/PWA/InfoPath%20Title.xml
Is there a way to read the InfoPath form as XML and convert it to PDF?
You have to download the content at the URL and parse it into an object, e.g. with XElement or HtmlAgilityPack, then create the PDF file, e.g. using TextWriter.
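For the parsing half, a minimal sketch with XElement might look like this. It assumes you have already downloaded the form XML as a string, and it simply lists every leaf element as a name/value pair; the class name is made up, and real InfoPath field names depend on your form template. Producing the PDF from these values is left to whichever PDF tool you pick.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for reading field values out of form XML.
public static class FormXmlReader
{
    // Returns (local element name, text value) for every element without child elements.
    public static List<KeyValuePair<string, string>> ReadFields(string xml)
    {
        XElement root = XElement.Parse(xml);
        return root.Descendants()
            .Where(element => !element.HasElements)
            .Select(element => new KeyValuePair<string, string>(element.Name.LocalName, element.Value))
            .ToList();
    }
}
```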
It depends on how and where you want to handle it. ABCpdf, iTextSharp, and many other third-party tools let you do the conversion on the server side.
You can use jsPDF or an equivalent to do it on the client side. Analyse what you need and choose your tool.
I am attempting to retrieve data that is dynamically loaded onto a web page using hash links, e.g. http://www.westfield.com.au/au/retailers#page=5
My question is what technology is being used to load the data onto the page?
Secondly, how would one approach retrieving this data using C#?
My attempts so far have used WebClient to download the page at this link. Unfortunately, the HTML only contains the data from the very first page, no matter which page link I use.
What technology is being used to load the data onto the page?
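JavaScript is used to load the data from a server, parse it into HTML and put it in the right place in the DOM. The part of the URL after the # is never sent to the server, which is why WebClient always receives the first page.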
Secondly, how would one approach retrieving this data using C#?
Make a request to http://www.westfield.com.au/api/v1/countries/au/retail-chains/search.json?page=5. It will return a structured JSON document containing the data you need.
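A minimal sketch of that request in C#, using HttpClient and System.Text.Json (assumed available, i.e. .NET Core 3.0 or later or the NuGet package). The endpoint is the one given above; the layout of the JSON it returns is not shown in this thread, so the "retailers" and "name" property names in the usage note are purely hypothetical and need to be replaced after inspecting a real response.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper; only the endpoint URL comes from the answer above.
public static class RetailerApi
{
    private static readonly HttpClient Client = new HttpClient();

    // Builds the search URL for a given page number.
    public static string BuildSearchUrl(int page)
    {
        return "http://www.westfield.com.au/api/v1/countries/au/retail-chains/search.json?page=" + page;
    }

    // Downloads the JSON for one page as a string.
    public static Task<string> FetchPageAsync(int page) => Client.GetStringAsync(BuildSearchUrl(page));

    // Collects the string value of `field` from each object in the array stored
    // under `arrayProperty` of the root object. Both names are caller-supplied
    // because the real response layout is an assumption here.
    public static List<string> ReadStrings(string json, string arrayProperty, string field)
    {
        var values = new List<string>();
        using (JsonDocument document = JsonDocument.Parse(json))
        {
            JsonElement root = document.RootElement;
            if (root.ValueKind == JsonValueKind.Object
                && root.TryGetProperty(arrayProperty, out JsonElement items)
                && items.ValueKind == JsonValueKind.Array)
            {
                foreach (JsonElement item in items.EnumerateArray())
                {
                    if (item.ValueKind == JsonValueKind.Object
                        && item.TryGetProperty(field, out JsonElement value)
                        && value.ValueKind == JsonValueKind.String)
                    {
                        values.Add(value.GetString());
                    }
                }
            }
        }
        return values;
    }
}
```

For example, after string json = await RetailerApi.FetchPageAsync(5), a call such as RetailerApi.ReadStrings(json, "retailers", "name") would list the names, provided the response really has that shape.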
If all you need is the JSON structure, Jon's answer sounds like a good place to start.
If you want a good stack for true rendered scraping, I'd use a combination of PhantomJS and Selenium to help bridge it to .NET.
This article is a great place to start.
I don't know if this is possible or if I am thinking about this in the wrong way, but this is what I want to do:
I have an XML file linked to an XSLT file and I want to use C# to get the output of the transformed XML file and Response.Write() that wherever I want on the page.
I have found questions on Stack Overflow about saving the transformed output to a new file etc., but I don't want to save it to a file. I just want to display it with Response.Write() anywhere on my aspx page.
Is there any way to do this in C#?
Any help is appreciated.
Yes, write the transformed output to a MemoryStream (so it stays in memory rather than on the hard disk). You can then read that back into a string using a StreamReader.
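A minimal sketch of the MemoryStream approach, assuming XslCompiledTransform from System.Xml.Xsl and taking the XML and XSLT as strings to keep the example short (the class and method names are made up). In a real page you would more likely load both from files or URLs.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper that runs an XSLT transform entirely in memory.
public static class XsltRenderer
{
    public static string Transform(string xml, string xslt)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(stylesheet);
        }

        using (var output = new MemoryStream())
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        {
            // Write the transformed output into memory instead of a file.
            transform.Transform(input, null, output);

            // Rewind and read the bytes back as text.
            output.Position = 0;
            using (var reader = new StreamReader(output, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

On the aspx page this would be used as Response.Write(XsltRenderer.Transform(xml, xslt)) at the point where the output should appear.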
Another way of doing it is by using the ASP.NET Xml server control, which has DocumentSource and TransformSource properties for the XML and XSLT files.
You could save yourself the effort and simply serve up the XML to the browser. As long as the XML document references the URL of the corresponding XSLT document, the browser will render the page for you.
Use HttpResponse.OutputStream as the output stream for the transform, so the result is written straight to the response.
I am working on an application that has to retrieve data from an online CSV file and display it at the click of a button. How can I automatically store the CSV file in a safe place where I can access the information? I am working in C# with Visual Studio.
Thank you.
You want to use the WebClient class to make an HTTP request to the server for the CSV file. It can read the whole contents as a string, which you can then parse and manipulate at your leisure.
http://msdn.microsoft.com/en-us/library/system.net.webclient(VS.100).aspx
Use System.IO.File to write the contents to a file.
http://msdn.microsoft.com/en-us/library/system.io.file.aspx
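Putting those two pieces together, a minimal sketch might look like the following. The class and method names are made up, the folder is passed in by the caller, and the parser is a naive comma split that does not handle quoted fields. WebClient is marked obsolete on .NET 6 and later, hence the pragma; HttpClient would be the usual replacement there.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;

// Hypothetical helper combining WebClient and System.IO.File.
public static class CsvDownloader
{
    // Downloads the CSV file and returns its contents as a string.
#pragma warning disable SYSLIB0014 // WebClient is obsolete on newer .NET versions
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }
#pragma warning restore SYSLIB0014

    // Writes the CSV text to folder\fileName (creating the folder if needed)
    // and returns the full path.
    public static string Save(string csv, string folder, string fileName)
    {
        Directory.CreateDirectory(folder);
        string path = Path.Combine(folder, fileName);
        File.WriteAllText(path, csv);
        return path;
    }

    // Naive parser: one row per non-empty line, fields split on commas.
    // Quoted fields containing commas are NOT handled.
    public static List<string[]> ParseSimpleCsv(string csv)
    {
        var rows = new List<string[]>();
        using (var reader = new StringReader(csv))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length > 0)
                {
                    rows.Add(line.Split(','));
                }
            }
        }
        return rows;
    }
}
```

For the "safe place", one reasonable choice is a subfolder of Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData), which is private to the current user.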
FileHelpers is a free, easy-to-use .NET library for importing/exporting data from fixed-length or delimited records in files, strings or streams.
The FileHelpers Library
http://www.filehelpers.com/