I am searching through a database to find span tags with video information for the purpose of migration.
My regex works well and, for the most part, I can extract all of the information I need. The trouble starts when the style attribute is in a different position than expected. This throws off the expression, and I get only about two-thirds of the captures I would expect.
If I try to nest the style capture group inside the main capture group, it fails to capture anything. I also tried negative and positive lookaheads, but that only ever works if I make it an optional capture group. I think the problem is that I'm not nesting it correctly. Most of the related questions suggest a negative lookbehind, but my understanding is that that is more of an assertion/quantifier.
So how can I always capture the style attribute regardless of its position in the span tag?
The regex flavor is .NET (server side).
I have a Regexr setup.
/(?<tag><span class='vidly-vid' data-thumb='(?<thumb>http.+\.jpg)'.+aspect-ratio='(?<aspect>\d{1,3}:\d{1,3})'.+sources='\[{"file":.+"(?<src>(?<uri>https:\/\/cf1234.cloudfront\.net\/Vids\/)(?<key>(?<ident>[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}|[a-z0-9]{6})\/(?<mp4>mp4_1080.mp4|mp4_720.mp4|mp4_480.mp4|mp4_360.mp4|mp4.mp4))).+style='(?<style>.+width: (?<width>.+)px.+height: (?<height>.+)px.+)'.+<\/span>)/gmi
Sample Data
All of these should match. The first one does NOT, the other three do.
<span class='vidly-vid' data-thumb='https://cf1234.cloudfront.net/Vids/Thumbnails/691DBB43-5EC8-4D57-AF7B-99896D9BD5D1_19127.jpg' data-aspect-ratio='4:3' style='border-width: 0px; width: 352px; height: 240px;' data-sources='[{"file":"https://cf1234.cloudfront.net/Vids/6v1j0a/hls.m3u8","label":"HD"},{"file":"https://cf1234.cloudfront.net/Vids/6v1j0a/mp4_360.mp4","label":"360p SD"}]'> </span>
<span class='vidly-vid' data-thumb='https://cf1234.cloudfront.net/Vids/Thumbnails/b181cfa5-565d-470a-b93a-2610987bb4da_28142.jpg' data-aspect-ratio='160:117' data-sources='[{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/hls.m3u8","label":"HD"},{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/mp4_480.mp4","label":"480p SD"},{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/mp4_360.mp4","label":"360p SD"},{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/mp4_720.mp4","label":"720p HD"},{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/mp4_1080.mp4","label":"1080p HD"}]' style='border-width: 0px; width: 600px; height: 480px;'> </span>
<table align="left" border="0" cellpadding="5" cellspacing="5" style="width:600px"> <tbody> <tr> <td><img alt="" src="/content/generator/Course_90016206/Case-10-LMLO_MG_FLAVOR1label.jpg" style="height:497px; width:324px" /></td> <td><span class='vidly-vid' data-thumb='https://cf1234.cloudfront.net/Vids/Thumbnails/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414_28142.jpg' data-aspect-ratio='146:225' data-sources='[{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/hls.m3u8","label":"HD"},{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/mp4_480.mp4","label":"480p SD"},{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/mp4_360.mp4","label":"360p SD"},{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/mp4_720.mp4","label":"720p HD"},{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/mp4_1080.mp4","label":"1080p HD"}]' style='border-width: 0px; width: 324px; height: 500px;'> </span></td> </tr> </tbody> </table>
<span class='vidly-vid' data-thumb='https://cf1234.cloudfront.net/Vids/Thumbnails/231913a7-b608-4d8b-9332-64b6840c22f0_28142.jpg' data-aspect-ratio='16:9' data-sources='[{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/hls.m3u8","label":"HD"},{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/mp4_480.mp4","label":"480p SD"},{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/mp4_360.mp4","label":"360p SD"},{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/mp4_720.mp4","label":"720p HD"},{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/mp4_1080.mp4","label":"1080p HD"}]' style='border-width: 0px; width: 920px; height: 520px;'> </span>
I'd personally just split up the regex into more manageable chunks, like so:
var spanRegex = new Regex(@"<span class='vidly-vid'.+<\/span>");
var attrRegexes = new[]{
        @"data-thumb='(?<thumb>http.+\.jpg)'",
        @"aspect-ratio='(?<aspect>\d{1,3}:\d{1,3})'",
        @"sources='\[{""file"":.+""(?<src>(?<uri>https:\/\/cf1234.cloudfront\.net\/Vids\/)(?<key>(?<ident>[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}|[a-z0-9]{6})\/(?<mp4>mp4_1080.mp4|mp4_720.mp4|mp4_480.mp4|mp4_360.mp4|mp4.mp4)))",
        @"style='(?<style>.+width: (?<width>.+)px.+height: (?<height>.+)px.+)'",
    }
    .Select(r => new Regex(r))
    .ToList();
var results = inputs.Select(i => spanRegex.Match(i).Value)
    .Select(i => new
    {
        i,
        attributes =
            from r in attrRegexes
            let match = r.Match(i)
            from g in match.Groups.Cast<Group>().Skip(1)
            select new {g.Name, capture = g.Value}
    });
Linqpad example
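Another option, closer to what the question originally attempted, is to put each attribute in its own positive lookahead so that the order inside the tag stops mattering. The following is only a sketch in JavaScript, not .NET; the `(?=...)` and `(?<name>...)` syntax is the same in both flavors. The function name `extractVideoSpans` and the shortened sample tags (with `example.invalid` URLs and empty `data-sources`) are made up for illustration, and the pattern assumes that attribute values contain no single quotes and that no `>` appears before the end of the opening tag.

```javascript
// Every lookahead is evaluated from the same position (right after "<span"),
// so each attribute is found wherever it sits inside the opening tag.
const spanPattern = new RegExp(
  [
    String.raw`<span\b`,
    String.raw`(?=[^>]*\sclass='vidly-vid')`,
    String.raw`(?=[^>]*\sdata-thumb='(?<thumb>[^']*)')`,
    String.raw`(?=[^>]*\sdata-aspect-ratio='(?<aspect>\d{1,3}:\d{1,3})')`,
    String.raw`(?=[^>]*\sstyle='(?<style>[^']*)')`,
    String.raw`[^>]*>`,
  ].join(""),
  "gi"
);

// Hypothetical helper: returns one record per matching span tag.
function extractVideoSpans(html) {
  const results = [];
  for (const m of html.matchAll(spanPattern)) {
    const { thumb, aspect, style } = m.groups;
    // Require a separator before "width" so "border-width" is not picked up.
    const width = /(?:^|[;\s])width:\s*(\d+)px/i.exec(style);
    const height = /(?:^|[;\s])height:\s*(\d+)px/i.exec(style);
    results.push({
      thumb,
      aspect,
      style,
      width: width ? Number(width[1]) : null,
      height: height ? Number(height[1]) : null,
    });
  }
  return results;
}

// Shortened stand-ins for the sample data: style first, then style last.
const styleFirst =
  "<span class='vidly-vid' data-thumb='https://example.invalid/a.jpg' " +
  "data-aspect-ratio='4:3' style='border-width: 0px; width: 352px; height: 240px;' " +
  "data-sources='[]'> </span>";
const styleLast =
  "<span class='vidly-vid' data-thumb='https://example.invalid/b.jpg' " +
  "data-aspect-ratio='16:9' data-sources='[]' " +
  "style='border-width: 0px; width: 920px; height: 520px;'> </span>";

console.log(extractVideoSpans(styleFirst + "\n" + styleLast));
```

The data-sources attribute could get its own lookahead in the same way. Parsing width and height from the captured style value in a second step keeps the main pattern short.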
I need to get rid of the borders around the individual checkboxes that are rendered by a CheckBoxList control. Here's what it looks like now:
The ASP.Net markup is straightforward:
<asp:CheckBoxList ID="cblEthnicity" runat="server" RepeatDirection="Vertical"
RepeatColumns="3" RepeatLayout="Table" BorderStyle="None" BorderWidth="0">
</asp:CheckBoxList>
which is in a cell in a table with the class formTable applied (see below).
As you can see, I've tried setting the attributes BorderStyle="None" and BorderWidth="0" to no effect.
I'm pretty sure that what's behind this is the following CSS, which puts rounded corner borders around the enclosing table cells, which I want to keep:
.formTable
{
background-color: #eeeeee;
border: solid 1px #bbbbbb;
-moz-border-radius: 7px;
-webkit-border-radius: 7px;
border-radius: 7px;
}
.formTable tr, .formTable tr td, .formTable tr th
{
background-color: #eeeeee;
padding: 3px;
border: solid 1px #bbbbbb;
vertical-align: top;
}
I added the following CSS, which also did nothing:
.formTable tr td input[type="checkbox"]
{
border: none;
}
Finally, the HTML rendered from the .aspx for the CheckBoxList, as seen in Chrome DevTools, looks like this (edited a little for brevity):
<table id="main_cblEthnicity" style="border-width:0px; border-style:None; border-top-left-radius:5px; border-top-right-radius:5px; border-bottom-left-radius:5px; border-bottom-right-radius:5px;">
<tbody>
<tr>
<td style="border-top-left-radius:5px; border-top-right-radius:5px; border-bottom-left-radius:5px; border-bottom-right-radius:5px;">
<input id="main_cblEthnicity_0" type="checkbox" name="ctl00$main$cblEthnicity$0"
checked="checked" value="Native American" />
<label for="main_cblEthnicity_0">Native American</label>
</td>
...
</tr>
</tbody>
</table>
Any suggestions on how I can get rid of the unwanted borders?
UPDATE: Here are some images to make it more clear what's going on and what I'm trying to accomplish:
This is what I'm getting now:
This is what I get if I use either suggestion that has been presented so far:
This is what I'm trying to achieve:
In addition to the suggestions made here, I tried adding this to the CSS, but it made no difference:
.formTable tr td > input[type="checkbox"] {
border: none;
}
I also tried this in Javascript/jQuery:
<script type="text/javascript">
$(document).ready(function() {
$('.formTable tr td > input[type="checkbox"]').removeAttr("border");
});
</script>
The problem isn't the input but its td.
Look:
<td style="border-top-left-radius:5px; border-top-right-radius:5px; border-bottom-left-radius:5px; border-bottom-right-radius:5px;">
The border radius is defined here (above), and the border color here (below):
.formTable tr, .formTable tr td, .formTable tr th
{
background-color: #eeeeee;
padding: 3px;
border: solid 1px #bbbbbb;
vertical-align: top;
}
So, to change this, add the following just after the CSS above:
.formTable tr td
{
border:0;
}
Doing this removes only the td borders, not the borders of tr or th.
UPDATE AFTER OP's CLARIFICATIONS
Oh, all right. With the new screenshots we can see clearly what you're trying to achieve.
Anyway, you're still trying to remove a border from the input, but I repeat: the problem isn't the input but its td.
Let me explain using the code you gave us:
<table id="main_cblEthnicity" style="border-width:0px; border-style:None; border-top-left-radius:5px; border-top-right-radius:5px; border-bottom-left-radius:5px; border-bottom-right-radius:5px;">
<tbody>
<tr>
<td style="border-top-left-radius:5px; border-top-right-radius:5px; border-bottom-left-radius:5px; border-bottom-right-radius:5px;">
<input id="main_cblEthnicity_0" type="checkbox" name="ctl00$main$cblEthnicity$0"
checked="checked" value="Native American" />
<label for="main_cblEthnicity_0">Native American</label>
</td>
...
</tr>
</tbody>
</table>
This is the HTML of the table that contains all the checkboxes. All of its TDs have rounded borders, as we already know. That table is itself inside a bigger TD (whose borders you want to keep). We're in the following situation:
So you have two ways to fix this without changing all of your HTML: CSS or jQuery.
The CSS way
This is pretty simple: put an inline style on the table cells that contain the checkboxes, using style="border:0" instead of style="border-top-left-radius:5px; border-top-right-radius:5px; border-bottom-left-radius:5px; border-bottom-right-radius:5px;". Or just create a new CSS class like this
.no-borders {
border:0;
}
and apply it to every td whose border you don't want to see.
The jQuery way
<script type="text/javascript">
$(document).ready(function() {
$('.formTable input[type="checkbox"]').parent().css('border','none');
});
</script>
Your code isn't showing it, but apparently at some point class .formTable is being assigned to the CheckBoxList. Just remove border: solid 1px #bbbbbb; from the second class declaration:
.formTable tr, .formTable tr td, .formTable tr th
{
background-color: #eeeeee;
padding: 3px;
vertical-align: top;
}
Demo: http://jsfiddle.net/pgpR3/1/
I have exhausted my mental resources and can't solve this puzzle. I am attempting to extract the text from the span with id="lookupCount". I want the "9" from there, but no matter what I try, it doesn't work. Please help; the HTML is below.
Just to be clear, I want the value of this text: "9"
<div class="addressSelectionDiv" style="width:330px; margin-left:0px; margin-top:40px; ">
<table id="addressSelectionTable" align="center" width="100%" cellspacing="2" cellpadding="0">
<tbody>
<tr style="height:15px;">
<td>
<div id="App.ctl00_leftContent_addressSelection_validationLabel_Container" style="display:inline;">
<label id="ctl00_leftContent_addressSelection_validationLabel" class="x-label x-label-default x-border-box" for="" style="color:#981e32;font-size:1.0em;">
<img id="ext-gen1029" class="x-label-icon" src="" style="display: none;"/>
<span id="ext-gen1030" class="x-label-value"/>
</label>
</div>
<span id="lookups" style="visibility: hidden; float: right;">
<span id="lookupCount">9</span>
/
<span id="lookupLimit">100</span>
</span>
</td>
</tr>
<tr valign="top">
<tr>
</tbody>
</table>
</div>
Here is what I tried:
var x = Driver.FindElement(By.Id("lookupCount")).Text; // returns ""
var x = Driver.FindElement(By.Id("lookups")).Text; // returns ""
Neither of the objects above contains any information that leads me to the answer, even if I remove the .Text property.
The two lines below return "" in the element at index 0.
ICollection<IWebElement> table = Driver.FindElements(By.Id("lookups"));
List<IWebElement> elements = table.ToList();
The one below returns this string:
Search for Household Decisions by entering an address or ZIP code:\r\n\r\n AL\r\n AK\r\n AZ\r\n AR\r\n CA\r\n CO\r\n CT\r\n DC\r\n DE\r\n FL\r\n GA\r\n HI\r\n ID\r\n IL\r\n IN\r\n IA\r\n KS\r\n KY\r\n LA\r\n ME\r\n MD\r\n MA\r\n MI\r\n MN\r\n MS\r\n MO\r\n MT\r\n NE\r\n NV\r\n NH\r\n NJ\r\n NM\r\n NY\r\n NC\r\n ND\r\n OH\r\n OK\r\n OR\r\n PA\r\n RI\r\n SC\r\n SD\r\n TN\r\n TX\r\n UT\r\n VT\r\n VA\r\n WA\r\n WV\r\n WI\r\n WY
ICollection<IWebElement> table = Driver.FindElements(By.Id("addressSelectionTable"));
List<IWebElement> elements = table.ToList();
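Returns the visible text between the opening and closing tags of the element with id "lookupCount" (Selenium returns an empty string here for as long as the parent span has visibility: hidden):
driver.FindElement(By.Id("lookupCount")).Text;
Returns the HTML between the opening and closing tags of the element with id "lookupCount", whether or not the element is visible:
driver.FindElement(By.Id("lookupCount")).GetAttribute("innerHTML");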
I have HTML code like this:
<tr>
<th colspan="2" style="padding: 10px; font-size: 11px; background: #eee; border: 1px solid white" align="left">
Some Text Here
</th>
</tr>
I am storing this in string like this:
string gtr =
@"<tr>
<th colspan=""2"" style=""padding: 10px; font-size: 11px; background: #eee; border: 1px solid white"" align=""left"">
Some Text Here
</th>
</tr>";
But when I debug, the string is shown like this:
<tr>
<th colspan=\"2\" style=\"padding: 10px; font-size: 11px; background: #eee; border: 1px solid white\" align=\"left\">
Some Text Here
</th>
</tr>
It shows escape sequence characters.
I tried to remove them like this:
gtr = gtr.Replace(@"\", "");
and tried every other method I could think of.
But this does not work; the string gtr is always shown with the escape characters.
How can I get the string without the escape characters, so that it contains only clean HTML?
I am using only ASP.NET with C# (not MVC), and this is static content.
But when I debug, the string is shown like this
That's because you're looking at the string in the debugger. The string doesn't actually contain those backslashes - they're just part of the debug output, which escapes various characters to make it look like it would as a regular string literal in code.
Write the string to a file or the console and you'll see the backslashes really aren't there.
As an alternative way of convincing yourself of this even in the debugger, try this:
string x = "\"\"";
int y = x.Length;
char z = x[0];
Then in the debugger you'll see that y is 2, and z is just " - it may be escaped again, but clearly it can't be both characters in \" as it's just a char.
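The same check can be run outside a debugger. The sketch below uses JavaScript rather than C#, purely so that it runs standalone; that substitution is an assumption of this example, not something from the question. `JSON.stringify` stands in for the debugger's escaped display, and the variable names mirror the C# snippet above.

```javascript
// Two double-quote characters. The backslashes exist only in the source
// code; they are not stored in the string.
const x = "\"\"";
const y = x.length;                          // characters actually stored
const z = x[0];                              // first stored character
const containsBackslash = x.includes("\\");  // is a backslash stored at all?

// JSON.stringify re-escapes the value, much as a debugger display does.
const escapedView = JSON.stringify(x);

console.log(y, z, containsBackslash, escapedView);
```

The escaped view is longer than the string itself, which is the same effect as the debugger showing \" for a single stored " character.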
When I check your code in the Text Visualizer from Quick Watch, it shows the string without escape characters. See the following screenshot:
I have a table like this, and I want to get just the text FOO COMPANY from between the td tags. How can I get it?
<table class="left_company">
<tr>
<td style="BORDER-RIGHT: medium none; bordercolor="#FF0000" align="left" width="291" bgcolor="#FF0000">
<table cellspacing="0" cellpadding="0" width="103%" border="0">
<tr style="CURSOR: hand" onclick="window.open('http://www.foo.com')">
<td class="title_post" title="FOO" valign="center" align="left" colspan="2">
<font style="font-weight: 700" face="Tahoma" color="#FFFFFF" size="2">***FOO COMPANY***</font>
</td>
</tr>
</table>
</td>
</tr>
<table>
I'm using the following code, but nS is null.
doc = hw.Load("http://www.foo.aspx?page=" + j);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//table[@class='left_company']"))
{
    nS = doc.DocumentNode.SelectNodes("//td[@class='title_post']");
}
var text = doc.DocumentNode.Descendants()
    .FirstOrDefault(n => n.Attributes["class"] != null &&
                         n.Attributes["class"].Value == "title_post")
    .Element("font").InnerText;
or
var text2 = doc.DocumentNode.SelectNodes("//td[@class='title_post']/font")
    .First().InnerText;
The page you are calling likely generates the content of interest using JavaScript. HtmlAgilityPack does not execute JavaScript, so such content cannot be extracted. One way to confirm this is to visit the page with scripting turned off and check whether the element you are interested in still exists.
Add an attribute to the font element, such as company="FOO", then use jQuery to get that element:
alert($('font[company="FOO"]').html())
Close: nS = doc.DocumentNode.SelectNodes("//td[@class='title_post']//text()");
You can then open the nS node to retrieve the text. If there's more than one text node, you'll need to iterate over them.
I want to open a group of HTML elements when I click on a checkbox. It works fine for one group (because then I only have one id). But if I have more groups, each group has a dynamic id (for the div tag and the input tag). This is my HTML code:
<div style="line-height: 1.7em; background-color: #eee;">
    <span style="padding-left: 8px; color: #eb8f00; font-size: 1.1em; font-weight: bold; font-family: 'Trebuchet MS', Verdana, Helvetica, Sans-Serif;">
        <input style="vertical-align: middle;" id="@currentElement.sGroupId" type="checkbox" name="@currentElement.sGroupId" />
        <label for="@currentElement.sGroupId">@currentElement.sGroupName</label>
    </span>
</div>
<div style="background-color: #eee;" class="@currentElement.sGroupName">
    <!-- Dynamic content -->
</div>
These strings are dynamic because of my foreach loop.
@currentElement.sGroupId looks like 'idgr_12' (only the number changes).
@currentElement.sGroupName is the name of the group that I want to open (the whole name changes with every loop iteration).
This is my JavaScript code:
$(function () {
    $("id from div").hide();
    $("id from input").change(function () {
        var $this = $(this);
        if ($this.is(":checked")) {
            $("id from div").show(250);
        }
        else {
            $("id from div").hide(250);
        }
    });
});
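Now I get, for example, three groups, each with a checkbox. But when I click on one checkbox, every group expands. How do I get these dynamic ids into my JavaScript method, so that only the group with the checked checkbox expands?
In the body of your if, you can use a selector relative to the clicked checkbox instead of an id. With the markup above, the checkbox's direct parent is the span, so go up to the enclosing div and then to the group div that follows it:
$(this).closest('div').next('div').show(250)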