Like the statement,
string value = document.forms["sap.client.SsrClient.form"].elements["sapwdssr..requestCounter"].value;
in javascript, is there a corresponding statement to get the value of a particular input element within a particular form in C#?
I can do so by using HTMLDocument and mshtml interface. But that is a rather cumbersome process so if any direct method or property exists it would be great.
I assume you are asking to parse HTML, rather than attempting to do some form of runtime manipulation of a rendered web page, correct?
If that's the case, I highly suggest you look into the HTML Agility Pack, which we have used very successfully to parse HTML as if it were XML. You could do your stuff with a simple XPath query.
Related
I have html code:
<p>Answer1</p>
<h2>Category1</h2>
<p>Answer2</p>
<p>Answer3</p>
I need to do parsing so that each answer (p) belongs to the category(h2) above.
If nothing is above, then the category will be null.
Look like this :
obj1.category = null; obj1.answer = "Answer1";
obj2.category ="Category1"; obj2.answer = "Answer2";
obj3.category ="Category1"; obj3.answer = "Answer3";
I tried to solve this, but it was useless.
Use HTMLAgilityPack. It will parse HTML and allow you do use LINQ to SELECT whatever you need from the DOM structure.
In addition to HTMLAgilityPack, I've also written a light weight HTML parse for C#.
There's no big secret to the technique, but it's sort of detailed work. You just go through the text character by character and pull out HTML elements.
My parser is on Github as HtmlMonkey.
UPDATE:
I just added support for fairly advanced selectors to easily find nodes within a parsed document.
I need to retrieve some info from an html doc since the web service to get a json or an xml is still not ready. Im working with c# and using regular expressions to get the data i need from the html string. I've managed to get the div i want to work with from the whole html string but now i'm having trouble getting the info between the first span tag.
I've attempted to retrieve the data between ; and the first closing span tag but what i really want is the content between the first span tag.
Here's the regular expression i've written so far, but it's not working:
".*;(?<Content>(\r|\n|.)*)</span>"
I also tried this but didnt work either:
"<span class=""type"">(?<Content>(\r|\n|.)*)</span>"
Here is the div i want to retrieve the data from:
<div class="main">ABASASDFÓ 18/06/2014 17:38h Blabla Balbal <span class="type">15.80€ </span>+1.94 % +0.30€ | HOME <SPAN class="type2">11,398.70</span> +0.65 % +74.10</div>
EDIT: I can't use Htmlagilitypack since my client does not want us to use any external library. I've also heard about using the XmlReader but i'm not sure the structure of the html will match an xml one accordingly.
This regex will capture the string:
"<span class=\"type\">(?<Content>([^<]*))</span>"
Although, I agree with other answers, you should use something like Path instead of Regexes for parsing html.
Here's how it is done with a regex in Javascript. You should be able to adapt this for C# pretty easily.
var inner = html.match( /<span class="type"(?:\s+[a-z]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)))*\s*>([\S\s]*)<\/span>/i)[1];
Fiddle: http://jsfiddle.net/GarryPas/uk32r8vz/
You want to use XPath for that. Something like this:
div/span/text()
I understand not wanting some external 3rd party library in your solution, the solution to that is to go fetch the source code of the entire library:
https://htmlagilitypack.codeplex.com/
Now you don't have an external library, you have an internal library and you can use the right tool for the job!
XmlReader is a fairly low-level tool, it could technically do the job for you but what you're more after is "use XmlReader to do XPath" which is talked about here: https://msdn.microsoft.com/en-us/library/ms950778.aspx
The XPathReader class is the result of all that, which has been superseded by LINQ to XML: https://msdn.microsoft.com/en-ca/library/bb387098.aspx
So another option here is to try to use some LINQ to process your HTML file, but that might be tricky since HTML isn't good XML. Still, it's another option if you're looking for those.
I don't know what it called, but i think this is possible
I am looking to write something(don't know the exact name) that will,
go to a webpage and select a value from drop-down box on that page and read values from that page after selection, I am not sure weather it called crawler or activity, i am new to this but i heard long time back from one of my friend this can be done,
can any one please give me a head start
Thanks
You need an HTTP client library (perhaps libcurl in C, or some C# wrapper for it, or some native C# HTTP client library like this).
You also need to parse the retrieved HTML content. So you probably need an HTML parsing library (maybe HTML agility pack).
If the targeted webpage is nearly fixed and has e.g. some comments to ease finding the relevant part, you might use simpler or ad-hoc parsing techniques.
Some sites might send a nearly empty static HTML client, with the actual page being dynamically constructed by Javascript scripts (Ajax). In that case, you are unlucky.
Maybe you want some web service ....
One simple way (but not the most efficient way) is to simply read the webpage as String using the WebClient, for example:
WebClient Web = new WebClient();
String Data = Web.DownloadString("Address");
Now since HTML is simply an XML document you can parse the string to a XDocument and look up the tag that represents the dropdown box. Parsing the string to XDocument is done this way:
XDocument xdoc = XDocument.Pase(Data);
Update:
If you want to read the result of the selected value, and that result is displayed within the page do this:
Get all the items as I explained.
If the page does not make use of models, then you can use your selected value as an argument for example :
www.somepage.com/Name=YourItem?
Read the page again and find the value
Within a c# project I'm sending a WebRequest to a php website, which takes the values and uses a select statement to query the DB and return an HTML page. Since there is only one value that comes back from that query, I need to assign this value to my c# code.
The source of the body-tag of the returned HTML-page (and with the StreamReader in my c#) looks like this:
<table border='1'>
<tr><td>ValueINeed</td></tr>
</table>
How do I access the value inside this in order to assign it to a string in my c# code?
thank you.
If you are the author of the PHP code as well, I would suggest that you make another page that returns json or something instead, this way you would be able to avoid parsing HTML.
But if this really is what you are stuck with, I would suggest that you take a look at Html Agility Pack. Here is another quesiton here on StackOverflow that are about how to use the Html Agility Pack.
If the result is always the same you could just split the string or use regex.
If not you may use a html parser: http://htmlagilitypack.codeplex.com/
I have a big xml string that needs to be displayed as a web page. I can achieve this with xslt. Now the users will make changes to certain attributes of the xml displayed on the web page.
When they are done I need to save it back in the same xml format with the modified values.
Please guide me on what would be the best method to handle this.
using asp.net + c#
I've tried something like this in the past, and resorted to using two separate XSLT sheets, one to transform to (X)HTML, and another to transform the edited one back.
Unfortunately there isn't a 'generic' way of doing it, XSLT is a one way transform; e.g. if a stylesheet disregards an element altogether, there's obviously no way of writing an inverse XSLT that will restore it.
Another possibility is to have your XML->HTML stylesheet generate id attributes on input elements in the HTML, and give the value of that attribute a value that can easily be used as a lookup in your source XML. Then you can probably just iterate through each such element in the HTML, and lookup the related element in the source and replace the value. Or the other way round, go through each element in your source, and find the value in the HTML, either works.
Take a look at this utility:
http://www.chilkatsoft.com/refdoc/csHtmlToXmlRef.html
It might be possible to use XSLT to transform it back to HTML too, but without seeing the markup it's tough to tell.