Regex to get a 32/34-character variable from a script - C#

From the following script I am trying to get the value of the variable. I'm interested in the text between the double quotes:
var code = "a37965dcd8421328a767c697448ed735";
XPathResult xpathResult = geckoWebBrowser1.Document.EvaluateXPath("/html/body/table[3]/tbody/tr[1]/td[2]/script");
var foundNodes = xpathResult.GetNodes();
foreach (var node in foundNodes)
{
    var x = node.TextContent; // text contained by this node (including children)
    GeckoHtmlElement element = node as GeckoHtmlElement; // cast to access InnerHtml/OuterHtml
    string inner = element.InnerHtml;
    string outer = element.OuterHtml;
    String pattern = ".[0-9a-zA-Z]{34}$.";
    Match match = Regex.Match(inner, pattern);
}
Is the regex correct? What am I doing wrong?

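Try the pattern [0-9a-zA-Z]{32,34} in place of your current one.
Both dots should go: with the default options a . placed after $ can never match, because nothing but a final newline can follow $ and . does not match a newline. The {34} quantifier also rejects the 32-character value in your example.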

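You can also test the value like this:
bool result = Regex.Match(inner, @"^[0-9a-zA-Z]{32,34}$").Success;
Console.WriteLine(result);
If result is true, the match succeeded. Note that the ^ and $ anchors require the whole of inner to consist of those 32 to 34 characters, so drop them if inner still contains the surrounding var code = "..."; text.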
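To pull the value itself out of the script text rather than only test for it, a capture group around the character class is one option. The following is a minimal sketch, assuming the script text looks like the line in the question; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScriptValueDemo
{
    // Returns the 32 to 34 character alphanumeric value found between
    // double quotes, or null when the text contains no such value.
    public static string ExtractCode(string script)
    {
        Match match = Regex.Match(script, "\"([0-9a-zA-Z]{32,34})\"");
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Sample input copied from the question.
        string inner = "var code = \"a37965dcd8421328a767c697448ed735\";";
        Console.WriteLine(ExtractCode(inner)); // a37965dcd8421328a767c697448ed735
    }
}
```

Requiring the quotes on both sides keeps the pattern from matching a 32-character slice of a longer token.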

Related

HTMLAgilityPack selects nodes from first iteration through divs

I'm trying to use HTMLAgilityPack to parse a website for the first time. Everything works as expected, but only for the first iteration. On each iteration I get a unique div with its data, but SelectNodes() always returns the data from the first iteration.
The code below shows the problem.
All the properties of station get their values from the first iteration.
static void Main(string[] args)
{
    List<Station> stations = new List<Station>();
    wClient = new WebClient();
    wClient.Proxy = null;
    wClient.Encoding = encode;
    for (int i = 1; i <= 1; i++)
    {
        HtmlDocument html = new HtmlDocument();
        string link = string.Format("http://energybase.ru/powerPlant/index?PowerPlant_page={0}&pageSize=20&q=/powerPlant", i);
        html.LoadHtml(wClient.DownloadString(link));
        var stationList = html.DocumentNode.SelectNodes("//div[@class='items']").First().ChildNodes.Where(x=>x.Name=="div").ToList(); // get list of nodes with PowerStation data
        foreach (var item in stationList) // each iteration returns an item with unique InnerHtml
        {
            Station st = new Station();
            st.Name = item.SelectNodes("//div[@class='col-md-20']").First().SelectNodes("//div[@class='name']").First().ChildNodes["a"].InnerText; // gets name from first iteration
            st.Url = item.SelectNodes("//div[@class='col-md-20']").First().SelectNodes("//div[@class='name']").First().ChildNodes["a"].Attributes["href"].Value; // gets url from first iteration and so on
            st.Company = item.SelectNodes("//div[@class='col-md-20']").First().SelectNodes("//div[@class='name']").First().ChildNodes["small"].ChildNodes["em"].ChildNodes["a"].InnerText;
            stations.Add(st);
        }
    }
}
Maybe I am missing some of the essentials of OOP?
Your code can be greatly simplified by using the full power of XPath.
var stationList = html.DocumentNode.SelectNodes("//div[@class='items']/div");
// The XPath expression could also be "//div[@class='items'][1]/div",
// where [1] means the first node
foreach (var item in stationList)
{
    Station st = new Station();
    st.Name = item.SelectSingleNode("div[@class='col-md-20']/div[@class='name']/a").InnerText;
    st.Url = item.SelectSingleNode("div[@class='col-md-20']/div[@class='name']/a").Attributes["href"].Value;
    string rawText = item.SelectSingleNode("div[@class='col-md-20']/div[@class='name']/small/em").InnerText;
    st.Company = HttpUtility.HtmlDecode(rawText.Trim());
    stations.Add(st);
}
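Your mistake was to use the XPath descendant axis. An expression that starts with //, such as //div, is evaluated from the document root even when you call it on item, so every iteration returns the same first match. A relative path such as div[...] or .//div[...] is evaluated against the current node.
Even better, rewrite the loop body like this: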
var divName = item.SelectSingleNode("div[@class='col-md-20']/div[@class='name']");
var nodeA = divName.SelectSingleNode("a");
st.Name = nodeA.InnerText;
st.Url = nodeA.Attributes["href"].Value;
string rawText = divName.SelectSingleNode("small/em").InnerText;
st.Company = HttpUtility.HtmlDecode(rawText.Trim());
This article contains some good examples on various aspects of Html Agility Pack.
Have a look at it; it will give you a quick start.
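The difference between an absolute and a relative XPath can be reproduced without any third-party library. The sketch below uses System.Xml's XmlDocument rather than Html Agility Pack, on the assumption that the rule for a leading // is the same in both; the markup and all names are invented for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathContextDemo
{
    // Hypothetical, well-formed stand-in for the scraped page.
    private const string Markup =
        "<div class='items'>" +
        "<div><div class='name'>First</div></div>" +
        "<div><div class='name'>Second</div></div>" +
        "</div>";

    // Evaluates the given XPath once per item node and collects the text found.
    public static List<string> Names(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        var names = new List<string>();
        foreach (XmlNode item in doc.SelectNodes("//div[@class='items']/div"))
        {
            names.Add(item.SelectSingleNode(xpath).InnerText);
        }
        return names;
    }

    public static void Main()
    {
        // Absolute path: searched from the document root on every iteration.
        Console.WriteLine(string.Join(", ", Names("//div[@class='name']"))); // First, First
        // Relative path: searched below the current item only.
        Console.WriteLine(string.Join(", ", Names("div[@class='name']")));   // First, Second
    }
}
```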

How to get Last Index Of '\' or '//', whichever comes last?

I want to get the last index of '\' or '/' in a URL that comes from the database, whichever comes last.
For example, I have strings like these:
Administration\Masters\EmployeePulseDetailsMaster.aspx
Administration/Masters/SearchKnowYourCollegues.aspx
Administration//SMS//PushSMS.aspx
I am using this code:
foreach (var item in SessionClass.UserDetails.SubModules)
{
    if (Request.RawUrl.Contains(item.PageURL.Substring(item.PageURL.LastIndexOf('\\') + 1))
        || Request.RawUrl.Contains(item.PageURL.Substring(item.PageURL.LastIndexOf('/') + 1)))
    {
        Response.RedirectPermanent("~/Login.aspx");
    }
}
You can use a regular expression to find the last occurrence of any character in a group by constructing a regular expression that looks like this:
[target-group][^target-group]*$
In your case, the target group is [/\\], so the search would look like this:
var match = Regex.Match(s, @"[/\\][^/\\]*$");
Here is a running example:
var data = new[] {
    @"quick/brown/fox"
,   @"jumps\over\the\lazy\dog"
,   @"Administration\Masters\EmployeePulseDetailsMaster.aspx"
,   @"Administration/Masters/SearchKnowYourCollegues.aspx"
,   @"Administration//SMS//PushSMS.aspx"
};
foreach (var s in data) {
    var m = Regex.Match(s, @"[/\\][^/\\]*$");
    if (m.Success) {
        Console.WriteLine(s.Substring(m.Index+1));
    }
}
This prints
fox
dog
EmployeePulseDetailsMaster.aspx
SearchKnowYourCollegues.aspx
PushSMS.aspx
I guess you want to determine whether the name of the current page is in the list SessionClass.UserDetails.SubModules. In that case I'd use Request.Url.Segments.Last() to get only the name of the current page (for example PushSMS.aspx) and System.IO.Path.GetFileName to get the file name of each URL. On Windows, GetFileName treats both / and \ as separators:
string pageName = Request.Url.Segments.Last();
bool anyMatch = SessionClass.UserDetails.SubModules
.Any(module => pageName == System.IO.Path.GetFileName(module.PageURL));
if(anyMatch) Response.RedirectPermanent("~/Login.aspx");
You need to add using System.Linq; for Enumerable.Any.
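If only the text after the last separator is needed, String.LastIndexOfAny is a plain alternative to both the regular expression and Path.GetFileName, and it does not depend on the platform's path rules. This is a small sketch using the sample strings from the question; the class and method names are illustrative.

```csharp
using System;

public static class LastSeparatorDemo
{
    private static readonly char[] Separators = { '/', '\\' };

    // Returns the text after the last '/' or '\',
    // or the whole string when neither character occurs.
    public static string LastSegment(string path)
    {
        int index = path.LastIndexOfAny(Separators); // -1 when not found
        return path.Substring(index + 1);
    }

    public static void Main()
    {
        Console.WriteLine(LastSegment(@"Administration\Masters\EmployeePulseDetailsMaster.aspx"));
        Console.WriteLine(LastSegment("Administration/Masters/SearchKnowYourCollegues.aspx"));
        Console.WriteLine(LastSegment("Administration//SMS//PushSMS.aspx"));
    }
}
```

Because LastIndexOfAny returns -1 when no separator is present, index + 1 becomes 0 and the input is returned unchanged.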

Regex expression to search upto nested level

How can I search a string down to the innermost nested level using a regular expression?
For example, I have a string like this:
var str = "samir patel {samirpatel#test1.com{sam#somedomain.com}}";
The output should be sam#somedomain.com
You could simply use this pattern:
{([^{}]*)}
This matches any text of the form {some content} whose content does not itself contain { or }, that is, only an innermost group.
You can capture this using:
var str = "samir patel {samirpatel#test1.com{sam#somedomain.com}}";
var regex = new Regex("{([^{}]*)}");
var matches = regex.Matches(str);
var output = matches[0].Groups[1].Value;
// output == "sam#somedomain.com"
Or more simply:
var str = "samir patel {samirpatel#test1.com{sam#somedomain.com}}";
var output = Regex.Match(str, "{([^{}]*)}").Groups[1].Value;
// output == "sam#somedomain.com"
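When the input holds several brace groups, the same pattern returns every innermost one through Regex.Matches. A short sketch, assuming a made-up input string and illustrative names; the braces are escaped here only for readability.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class InnermostBracesDemo
{
    // Returns the content of every {...} group that contains no further braces.
    public static string[] Innermost(string text)
    {
        return Regex.Matches(text, @"\{([^{}]*)\}")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    public static void Main()
    {
        // "{b {c}}" is skipped as a whole; only its inner group "c" qualifies.
        Console.WriteLine(string.Join(", ", Innermost("a {b {c}} d {e}"))); // c, e
    }
}
```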
You could also get this result with (?<=\{)[^{}]*(?=\}), assuming a regex engine that supports lookbehind (JavaScript engines older than ES2018 do not). In C#, for example, that's
result = Regex.Match(str, @"(?<=\{)[^{}]*(?=\})").Value;
If your JavaScript engine has no lookbehind, use \{([^{}]*)\} and read capture group 1 for the result:
var myregexp = /\{([^{}]*)\}/;
var match = myregexp.exec(subject);
if (match != null) {
result = match[1];
}

Get href from html using mshtml in C#

I am trying to get the href link out of the following HTML code using mshtml in C# (WPF).
<a class="button_link" href="https://rhystowey.com/account/confirm_email/2842S-B2EB5-136382?t=1&sig=b0dbd522380a21007d8c375iuc583f46a90365d9&iid=am-130280753913638201274485430&ac=1&uid=1284488216&nid=18+308" style="border:none;color:#0084b4;text-decoration:none;color:#ffffff;font-size:13px;font-weight:bold;font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;">Confirm your account now</a>
I have tried using the following code to make this work by using mshtml in C# (WPF) but I have failed miserably.
HTMLDocument mdoc = (HTMLDocument)browser.Document;
string innerHtml = mdoc.body.outerText;
string str = "https://rhystowey.com/account/confirm_email/";
int index = innerHtml.IndexOf(str);
innerHtml = innerHtml.Remove(0, index + str.Length);
int startIndex = innerHtml.IndexOf("\"");
string str3 = innerHtml.Remove(startIndex, innerHtml.Length - startIndex);
string thelink = "https://rhystowey.com/account/confirm_email/" + str3;
Can someone please help me to get this to work.
Use this:
var ex = new Regex("href=\"(.*)\" style");
var tag = "<a class=\"button_link\" href=\"https://rhystowey.com/account/confirm_email/2842S-B2EB5-136382?t=1&sig=b0dbd522380a21007d8c375iuc583f46a90365d9&iid=am-130280753913638201274485430&ac=1&uid=1284488216&nid=18+308\" style=\"border:none;color:#0084b4;text-decoration:none;color:#ffffff;font-size:13px;font-weight:bold;font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;\">Confirm your account now</a>";
var address = ex.Match(tag).Groups[1].ToString();
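You should add a check before using the result: when nothing matches, Groups[1].Value is just an empty string, so test Match.Success first. Note too that the greedy (.*) extends to the last " style on the line, so [^"]+ is safer when a line holds several tags.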
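One way to add such a check is to wrap the match in a helper that returns null when no href is found. This is a sketch only: the helper name is invented, and the sample tag is a shortened version of the one in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefDemo
{
    // Returns the first href value in the markup, or null when there is none.
    public static string FirstHref(string html)
    {
        // [^"]+ stops at the closing quote, unlike a greedy (.*).
        Match match = Regex.Match(html, "href=\"([^\"]+)\"");
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Shortened sample tag; the query string from the question is cut down.
        string tag = "<a class=\"button_link\" href=\"https://rhystowey.com/account/confirm_email/2842S-B2EB5-136382?t=1\" style=\"border:none;\">Confirm your account now</a>";
        Console.WriteLine(FirstHref(tag) ?? "(no link)");
        Console.WriteLine(FirstHref("<p>no anchor here</p>") ?? "(no link)");
    }
}
```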
In your example
HTMLDocument mdoc = (HTMLDocument)browser.Document;
string innerHtml = mdoc.body.innerHTML; // innerHTML keeps the markup; outerText would strip the tags and the href with them
var ex = new Regex("href=\"([^\"\"]+)\"");
var address = ex.Match(innerHtml).Groups[1].ToString();
will match the first href="...". Or you select all occurrences:
var matches = (from Match match in ex.Matches(innerHtml) select match.Groups[1].Value).ToList();
This will give you a List<string> with all the links in your HTML. To filter this, you can either go this way
var wantedMatches = matches.Where(m => m.StartsWith("https://rhystowey.com/account/confirm_email/"));
which is more flexible because you could check against a list of start strings or whatever. Or you do it in your regex, which will lead to better performance:
var ex = new Regex("href=\"(https://rhystowey\\.com/account/confirm_email/[^\"\"]+)\"");
Bringing it all together to what you want, as far as I understand:
var ex = new Regex("href=\"(https://rhystowey\\.com/account/confirm_email/[^\"\"]+)\"");
var matches = (from Match match in ex.Matches(innerHtml)
               where match.Groups[1].Success
               select match.Groups[1].Value).ToList();
var firstAddress = matches.FirstOrDefault();
firstAddress holds your link, if there is one.
If your link will always start with the same path and isn't repeated on the page, you can use this (untested):
var match = Regex.Match(html, @"href=""(?<href>https\:\/\/rhystowey\.com\/account\/confirm_email\/[^""]+)""");
if (match.Success)
{
var href = match.Groups["href"].Value;
....
}

Custom Regex for Parsing Custom Fields in HTML String

I am sending some html in a hidden field, and on server side I would be parsing it with regex. Currently I am able to parse
<div id="4059">asd</div>
and the code below gives me "id" in match.Groups[2] and "4059" in match.Groups[4], "div" comes at first index and 3rd comes empty.
string regex2 = @"<(?<Tag_Name>(a)|div)\b[^>]*?\b(?<URL_Type>(?(1)id))\s*=\s*(?:""(?<URL>(?:\\""|[^""])*)""|'(?<URL>(?:\\'|[^'])*)')";
var matches = Regex.Matches(myDiv, regex2, RegexOptions.IgnoreCase | RegexOptions.Singleline);
var links = new List<string>();
foreach (Match item in matches)
{
    if (item.Groups[2].Value == "div")
    {
        employee.ID = item.Groups[4].Value;
    }
}
Can someone please edit this regex,
<(?<Tag_Name>(a)|div)\b[^>]*?\b(?<URL_Type>(?(1)id))\s*=\s*(?:""(?<URL>(?:\\""|[^""])*)""|'(?<URL>(?:\\'|[^'])*)')
so that I could parse
<div id="5094" fieldA="asd" fieldB="def" fieldC="ghi"></div>
and so that further fields could be added later.
I should also mention here that I am working on a custom control and I CAN NOT USE HTML AGILITY PACK as the assemblies conflict as I add this in my project.
If you already know that the string contains only <div field="value" field="value" ...></div> (i.e. there's nothing but this div in the string), then just simplify your regex to pick out the field and value, and run it in a loop:
// In a verbatim string a double quote is written as "", not \"
string regstr = @"\s+(?<field>[^\s=]+)\s*=\s*""(?<value>[^""]+)""";
var reg = new Regex(regstr);
var m = reg.Match(myDiv);
while (m.Success)
{
    // m.Groups["field"] and m.Groups["value"] hold your field and value
    // get the next match
    m = m.NextMatch();
}
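As a runnable illustration of that loop, the sketch below collects the fields of the sample div into a dictionary. The class name, the method name and the case-insensitive dictionary are assumptions for the example. The value part uses * instead of + so that empty values such as fieldA="" are also captured.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldParserDemo
{
    // Collects every name="value" pair found in the tag.
    public static Dictionary<string, string> ParseFields(string tag)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        var regex = new Regex(@"\s+(?<field>[^\s=]+)\s*=\s*""(?<value>[^""]*)""");
        for (Match m = regex.Match(tag); m.Success; m = m.NextMatch())
        {
            fields[m.Groups["field"].Value] = m.Groups["value"].Value;
        }
        return fields;
    }

    public static void Main()
    {
        // Sample markup copied from the question.
        string myDiv = "<div id=\"5094\" fieldA=\"asd\" fieldB=\"def\" fieldC=\"ghi\"></div>";
        foreach (var pair in ParseFields(myDiv))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

New fields need no change to the pattern; they simply show up as additional dictionary entries.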
