My HTML string looks like this and is stored in a variable named sourceCode:
<ul class="yom-list col first" style="width:33.333333333333%">
<li class="first">
<a href="/india/andaman-and-nicobar-islands/">
<span>Andaman and Nicobar Islands</span>
</a>
</li>
<li>
<a href="/india/jammu-and-kashmir/">
<span>Jammu and Kashmir</span>
</a>
</li>
<li class="last">
<a href="/india/andhra-pradesh/">
<span>Andhra Pradesh</span>
</a>
</li>
<li>
<a href="/india/jammu-and-kashmir/">
<span>Jammu and Kashmir</span>
</a>
</li>
</ul>
I want to convert it into a generic List
so that I can access the data inside it (href, name, etc.) in my code.
I have tried something like this:
foreach (Match match in Regex.Matches(sourceCode, @"<li><a href=""(?<url>[^""])</a></li>"))
items.Add(new Item()
{
name = match.Groups["span"].Value, // i don't know how to get value inside that span
url = match.Groups["url"].Value,
});
But it does not work; the regex is probably wrong. Can anyone tell me what I am doing wrong?
Note: I can't use HTMLAgilityPack in this project
Try the regex below. It captures the href value of the <a> tag and the text of the <span> tag, but only when they appear inside an <li> tag.
/<li>\s*<a href=\"(?<url>[^"]*)\">\s*<span>(?<span>[^<]*)<\/span>/m
Your c# code would be,
Regex rgx = new Regex(@"<li>\s*<a href=""(?<url>[^""]*)"">\s*<span>(?<span>[^<]*)</span>");
foreach (Match m in rgx.Matches(input))
{
Console.WriteLine(m.Groups["url"].Value);
Console.WriteLine(m.Groups["span"].Value);
}
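To connect this back to the original goal of a generic List, here is a minimal sketch that fills a List<Item> from the matches. The Item class with Name and Url properties and the ListParser helper are names assumed for illustration; they are not from the question. Note that a literal <li> in a pattern does not match <li class="first">, so this sketch uses <li[^>]*> to allow attributes on the list item.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container type; the question only shows fields called name and url.
public class Item
{
    public string Name { get; set; }
    public string Url { get; set; }
}

public static class ListParser
{
    // <li[^>]*> allows attributes such as class="first" on the list item.
    private static readonly Regex LinkPattern = new Regex(
        @"<li[^>]*>\s*<a\s+href=""(?<url>[^""]*)""[^>]*>\s*<span>(?<name>[^<]*)</span>",
        RegexOptions.IgnoreCase);

    public static List<Item> Parse(string sourceCode)
    {
        var items = new List<Item>();
        foreach (Match m in LinkPattern.Matches(sourceCode))
        {
            items.Add(new Item
            {
                Name = m.Groups["name"].Value.Trim(),
                Url = m.Groups["url"].Value
            });
        }
        return items;
    }

    public static void Main()
    {
        const string sourceCode = @"<ul class=""yom-list col first"">
<li class=""first"">
<a href=""/india/andaman-and-nicobar-islands/"">
<span>Andaman and Nicobar Islands</span>
</a>
</li>
<li>
<a href=""/india/jammu-and-kashmir/"">
<span>Jammu and Kashmir</span>
</a>
</li>
</ul>";
        foreach (Item item in Parse(sourceCode))
        {
            Console.WriteLine(item.Name + " -> " + item.Url);
        }
    }
}
```

This only holds up while the markup keeps the exact li > a > span shape; any extra attribute before href or nested tag inside the span would need a looser pattern.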
Given an HTML string like the one below:
...
<ul class="not-this-class" ...>
... <!-- some "<li> <a href.. > </a> </li>" here -->
</ul>
<ul class="yes-this-class" ...>
<li>
some text 1
</li>
<li>
some text 2
</li>
<li>
some text 3
</li>
...
</ul>
<ul class="not-this-class-either" ...>
... <!-- some "<li> <a href.. > </a> </li>" here -->
</ul>
I can extract the element with the class yes-this-class using the following code:
HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(responseString.ToString());
HtmlNode htmlNode =
document.DocumentNode.SelectSingleNode("//*[@class='yes-this-class']");
Then I do some string manipulation (regular expressions) to extract the text below:
some text 1
some text 2
some text 3
How can I extract the result just above using only HtmlAgilityPack, without regular expressions? I tried something like the code below, but it didn't work.
HtmlNodeCollection htmlNodes =
document.DocumentNode.SelectNodes("//*[@class='yes-this-class']/[@a='href']");
Use the following XPath query:
//ul[@class='yes-this-class']/li//text()
Then run Trim() on each result to remove any leading and trailing whitespace around the string.
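Put together, the query and the Trim() step might look like the sketch below. It assumes the HtmlAgilityPack package is referenced; ListTextExtractor and GetItemTexts are names made up for the example, and the sample markup is a shortened version of the one in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ListTextExtractor
{
    public static List<string> GetItemTexts(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection textNodes = document.DocumentNode
            .SelectNodes("//ul[@class='yes-this-class']/li//text()");
        if (textNodes == null)
        {
            return new List<string>();
        }

        // Trim each text node and drop the ones that were only whitespace.
        return textNodes
            .Select(n => n.InnerText.Trim())
            .Where(t => t.Length > 0)
            .ToList();
    }

    public static void Main()
    {
        const string html = @"<ul class=""not-this-class""><li><a href=""/x"">skip</a></li></ul>
<ul class=""yes-this-class"">
  <li> some text 1 </li>
  <li> some text 2 </li>
  <li> some text 3 </li>
</ul>";
        foreach (string text in GetItemTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```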
Requirement : Get the value 4.030 by locating the node containing 'Last'
<tbody><tr>
<td rowspan="2" class="bg1 W1">
<ul class="UL1">
<li class="LI1 font12_grey W1">Last</li>
</ul>
<ul class="UL1">
<li class="LI2 font28 C bold W1"><span class="pos bold">4.030</span></li>
</ul>
nameNodes = doc.DocumentNode.SelectNodes("//td[text()='Last']/ul/li/span");
foreach (HtmlNode x in nameNodes)
Debug.WriteLine(x.InnerText);
I tried many other ways but am still not able to get the 4.030.
I would appreciate it if anyone could help.
Try this:
nameNodes = doc.DocumentNode.SelectNodes("//*[@class='UL1']/li/span");
foreach (HtmlNode x in nameNodes)
Debug.WriteLine(x.InnerText);
Not tested, but give it a try!
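A class-based query returns every span under any UL1 list, so if the requirement is really to start from the node containing 'Last', one option is to anchor on that label and step to the next sibling list. The sketch below assumes HtmlAgilityPack is referenced and that the value always sits in the <ul> immediately following the label's <ul>; LastValueFinder and FindValueAfterLabel are hypothetical names, and the closing tags in the sample were added for the example.

```csharp
using System;
using HtmlAgilityPack;

public static class LastValueFinder
{
    public static string FindValueAfterLabel(string html, string label)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Find the <ul> that holds an <li> whose text equals the label,
        // then move to the next sibling <ul> and take the span inside it.
        // The label is concatenated into the XPath, so it must not contain a single quote.
        string xpath = "//ul[li[normalize-space()='" + label + "']]"
                     + "/following-sibling::ul[1]/li/span";

        HtmlNode span = doc.DocumentNode.SelectSingleNode(xpath);
        return span == null ? null : span.InnerText.Trim();
    }

    public static void Main()
    {
        const string html = @"<table><tbody><tr>
<td rowspan=""2"" class=""bg1 W1"">
<ul class=""UL1"">
<li class=""LI1 font12_grey W1"">Last</li>
</ul>
<ul class=""UL1"">
<li class=""LI2 font28 C bold W1""><span class=""pos bold"">4.030</span></li>
</ul>
</td></tr></tbody></table>";
        Console.WriteLine(FindValueAfterLabel(html, "Last"));
    }
}
```

The original attempt, //td[text()='Last'], fails because 'Last' is the text of an <li> two levels below the <td>, not a direct text node of the <td> itself.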
I have the following XPath, fetched using a Firefox XPath plugin:
id('some_id')/x:ul/x:li[4]/x:span
Using HTML Agility Pack I'm able to fetch id('some_id')/x:ul/x:li[4]:
htmlDoc.DocumentNode.SelectNodes(@"//div[@id='some_id']/ul/li[4]").FirstOrDefault();
but I don't know how to get the span value.
Update:
<div id="some_id">
<ul>
<li></li>
<li></li>
<li></li>
<li>
Some text
<span>text I want to grab</span>
</li>
</ul>
</div>
You don't need to parse the HTML with LINQ2XML. HtmlAgilityPack is built for this, and it is easier to obtain the node in the following way:
var html = @" <div id=""some_id"">
<ul>
<li></li>
<li></li>
<li></li>
<li>
Some text
<span>text I want to grab</span>
</li>
</ul>
</div>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var value = doc.DocumentNode.SelectSingleNode("div[@id='some_id']/ul/li/span").InnerText;
Console.WriteLine(value);
An alternative approach (without html-agility-pack) would be to use LINQ2XML. You can use the XDocument.Descendants method to find the span element and take its value:
var xml = @" <div id=""some_id"">
<ul>
<li></li>
<li></li>
<li></li>
<li>
Some text
<span>text I want to grab</span>
</li>
</ul>
</div>";
var doc = XDocument.Parse(xml);
Console.WriteLine(doc.Root.Descendants("span").FirstOrDefault().Value);
The code can be extended to check whether the div element has the matching id, using the XElement.Attribute method:
var doc = XDocument.Parse(xml);
Console.WriteLine(doc.Elements("div").Where(e => e.Attribute("id").Value == "some_id").Descendants("span").FirstOrDefault().Value);
One drawback of this solution is that the XML structure (HTML, XHTML) needs to be properly closed or else the parsing will fail.
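For reference, a self-contained and slightly more defensive variant of the LINQ to XML approach could look like this. SpanReader and GetSpanText are names invented for the sketch; it returns null instead of throwing when the div or the span is missing, and it still requires well-formed markup.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SpanReader
{
    // Returns the text of the first <span> under the <div> with the given id,
    // or null when no such div or span exists.
    public static string GetSpanText(string xml, string divId)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("div")
            // Casting XAttribute to string yields null when the attribute is absent.
            .Where(d => (string)d.Attribute("id") == divId)
            .Descendants("span")
            .Select(s => s.Value)
            .FirstOrDefault();
    }

    public static void Main()
    {
        const string xml = @"<div id=""some_id"">
  <ul>
    <li></li>
    <li></li>
    <li></li>
    <li>
      Some text
      <span>text I want to grab</span>
    </li>
  </ul>
</div>";
        Console.WriteLine(GetSpanText(xml, "some_id"));
    }
}
```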
I'm developing a program in C# and I need some help. I'm trying to create an array or a list of the items that are displayed on a certain website. What I'm trying to do is read each anchor's text and its href. For example, this is the HTML:
<div class="menu-1">
<div class="items">
<div class="minor">
<ul>
<li class="menu-item">
<a class="menu-link" title="Item-1" id="menu-item-1"
href="/?item=1">Item 1</a>
</li>
<li class="menu-item">
<a class="menu-link" title="Item-1" id="menu-item-2"
href="/?item=2">Item 2</a>
</li>
<li class="menu-item">
<a class="menu-link" title="Item-1" id="menu-item-3"
href="/?item=3">Item 3</a>
</li>
<li class="menu-item">
<a class="menu-link" title="Item-1" id="menu-item-4"
href="/?item=4">Item 4</a>
</li>
<li class="menu-item">
<a class="menu-link" title="Item-1" id="menu-item-5"
href="/?item=5">Item 5</a>
</li>
</ul>
</div>
</div>
</div>
So from that HTML I would like to read this:
string[,] array = {{"Item 1", "/?item=1"}, {"Item 2", "/?item=2"},
{"Item 3", "/?item=3"}, {"Item 4", "/?item=4"}, {"Item 5", "/?item=5"}};
The HTML is an example I had written, the actual site does not look like that.
As others have said, HtmlAgilityPack is the best option for HTML parsing. Also be sure to download HAP Explorer from the HtmlAgilityPack site and use it to test your selectors. The following SelectNodes call gets all anchors whose id starts with menu-item-:
HtmlDocument doc = new HtmlDocument();
doc.Load(htmlFile);
var myNodes = doc.DocumentNode.SelectNodes("//a[starts-with(@id,'menu-item-')]");
foreach (HtmlNode node in myNodes)
{
Console.WriteLine(node.Id);
}
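The loop above prints only each anchor's id. To get the anchor text and href pairs the question asks for, the same selection can be extended roughly as follows. This is a sketch that assumes HtmlAgilityPack is referenced; MenuLinkReader and GetLinks are illustrative names, and the HTML is loaded from a string here rather than from a file.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class MenuLinkReader
{
    public static List<KeyValuePair<string, string>> GetLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<KeyValuePair<string, string>>();
        HtmlNodeCollection anchors =
            doc.DocumentNode.SelectNodes("//a[starts-with(@id,'menu-item-')]");
        if (anchors == null)
        {
            return links; // SelectNodes returns null when nothing matches.
        }

        foreach (HtmlNode anchor in anchors)
        {
            string text = anchor.InnerText.Trim();
            // The second argument is the default used when the attribute is missing.
            string href = anchor.GetAttributeValue("href", string.Empty);
            links.Add(new KeyValuePair<string, string>(text, href));
        }
        return links;
    }

    public static void Main()
    {
        const string html = @"<ul>
<li class=""menu-item""><a class=""menu-link"" id=""menu-item-1""
   href=""/?item=1"">Item 1</a></li>
<li class=""menu-item""><a class=""menu-link"" id=""menu-item-2""
   href=""/?item=2"">Item 2</a></li>
</ul>";
        foreach (KeyValuePair<string, string> link in GetLinks(html))
        {
            Console.WriteLine(link.Key + " -> " + link.Value);
        }
    }
}
```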
If the HTML is valid XML, you can load it using the XmlDocument class and then access the pieces you want using XPath, or you can use an XmlReader as Adriano suggests (a bit more work).
If the HTML is not valid XML, I'd suggest using one of the existing HTML parsers - see for example this - that worked OK for us.
You can also use the HtmlAgilityPack.
I think this case is simple enough to use a regular expression, like <a.*title="([^"]*)".*href="([^"]*)":
string strRegex = @"<a.*title=""([^""]*)"".*href=""([^""]*)""";
RegexOptions myRegexOptions = RegexOptions.None;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = ...;
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Use the groups matched
}
}
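One caveat with the pattern above: without RegexOptions.Singleline, .* does not cross line breaks, and in the sample HTML the href sits on the line after the title and id attributes. A sketch of a variant that avoids .* and captures the anchor text together with the href is shown below; AnchorRegexReader, ReadAnchors and the group names are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorRegexReader
{
    // [^>]* also matches newlines, so attributes may be spread over several lines.
    private static readonly Regex AnchorPattern = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*""(?<href>[^""]*)""[^>]*>(?<text>[^<]*)</a>",
        RegexOptions.IgnoreCase);

    public static List<string[]> ReadAnchors(string html)
    {
        var result = new List<string[]>();
        foreach (Match m in AnchorPattern.Matches(html))
        {
            // Each entry is { anchor text, href }, mirroring the array in the question.
            result.Add(new[] { m.Groups["text"].Value.Trim(), m.Groups["href"].Value });
        }
        return result;
    }

    public static void Main()
    {
        const string html = @"<li class=""menu-item"">
<a class=""menu-link"" title=""Item-1"" id=""menu-item-1""
href=""/?item=1"">Item 1</a>
</li>
<li class=""menu-item"">
<a class=""menu-link"" title=""Item-1"" id=""menu-item-2""
href=""/?item=2"">Item 2</a>
</li>";
        foreach (string[] pair in ReadAnchors(html))
        {
            Console.WriteLine(pair[0] + " -> " + pair[1]);
        }
    }
}
```

As with any regex over HTML, this only works while the anchors contain plain text and double-quoted href values.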
I have a ul with 10 li's which the user has populated through jquery sortable. Each li has a hidden span with the ID in it. How do I read the IDs and pass them into my C# code behind when the user clicks an ASP.NET button?
Here is my thought process:
Build a jquery function that loops and retrieves all the IDs and passes them into an array.
Pass only the array back to C#.
In my code behind: read the array and do whatever I need from there.
My problem:
I don't know what jquery is involved to create the array
I don't know how to pass the array to the code behind
I don't know how to read the array in my code behind
Any help is appreciated.
Thanks!
<li class="ui-state-default">
<span id="competencyID">3</span>
<h1><span id="competencyTitle">Comp Title</span></h1>
</li>
<li class="ui-state-default">
<span id="competencyID">18</span>
<h1><span id="competencyTitle">Comp Title</span></h1>
</li>
<li class="ui-state-default">
<span id="competencyID">103</span>
<h1><span id="competencyTitle">Comp Title</span></h1>
</li>
<li class="ui-state-default">
<span id="competencyID">6</span>
<h1><span id="competencyTitle">Comp Title</span></h1>
</li>
<li class="ui-state-default">
<span id="competencyID">25</span>
<h1><span id="competencyTitle">Comp Title</span></h1>
</li>
First you need to make id's unique, I would replace the id's of the <span> tags with classes.
Change:
<li class="ui-state-default">
<span id="competencyID">25</span>
<h1><span id="competencyTitle">Comp Title</span></h1>
</li>
To:
<li class="ui-state-default">
<span class="competencyID">25</span>
<h1><span class="competencyTitle">Comp Title</span></h1>
</li>
For your jQuery you can do something like this:
$('#button_id').bind('click', function () {
var arr = {};//create new object (empty)
$('#ul_id').find('.competencyID').each(function (index, value) {
arr[index] = $(value).text();
});
//send data to server-side script
$.get('path/to_server.file', $.param(arr), function (response) {
//this is the callback function, once your server-side script runs you can output data that you can retrieve here via the response variable
});
});
Here is a jsfiddle of collecting the id's and adding them to a JavaScript object: http://jsfiddle.net/rre47/1/
--UPDATE--
If you want to return an array of ids that can more easily be parsed by server-side scripts you can use the following code:
Change:
var arr = {}; to var arr = {'id' : []};
Change:
arr[index] = $(value).text(); to arr.id[index] = $(value).text();
And Change:
$.param(arr) to decodeURIComponent($.param(arr))
The output will look like this:
id[]=3&id[]=18&id[]=103&id[]=6&id[]=25
jsfiddle of the above code: http://jsfiddle.net/rre47/4/
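On the C# side, a query string in this id[] form can be read back as an array. The sketch below parses the string directly with HttpUtility.ParseQueryString so that it runs on its own; in a code-behind the same values should be available through Request.QueryString.GetValues("id[]"). IdQueryReader and ReadIds are hypothetical names, and the sketch assumes every value is a valid integer.

```csharp
using System;
using System.Collections.Specialized;
using System.Linq;
using System.Web;

public static class IdQueryReader
{
    public static int[] ReadIds(string queryString)
    {
        NameValueCollection query = HttpUtility.ParseQueryString(queryString);

        // GetValues returns every value sent under the repeated key, or null if the key is absent.
        string[] values = query.GetValues("id[]");
        if (values == null)
        {
            return new int[0];
        }
        return values.Select(v => int.Parse(v)).ToArray();
    }

    public static void Main()
    {
        int[] ids = ReadIds("id[]=3&id[]=18&id[]=103&id[]=6&id[]=25");
        Console.WriteLine(string.Join(",", ids)); // prints 3,18,103,6,25
    }
}
```

The order of the ids follows the order in the query string, which in turn follows the order of the sorted list items on the page.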
Take a look at this: http://msdn.microsoft.com/en-us/library/byxd99hx%28v=vs.71%29.aspx
You can write a web method and call it from JavaScript; when you call it, you can pass JavaScript variables as arguments.
You will then receive your JavaScript variables in the code behind.
For the loop you can do :
$('li').each(function(index) {
alert(index + ': ' + $(this).text());
});
You should be using Hidden fields instead of spans to store data that must go to the server:
<input type="hidden" name="competencyTitle" value="Comp Title" />
These hidden fields can be read in the server using the Request.Params property:
var value = Request.Params["competencyTitle"];