I want to remove subdomain names from a URI.
Example:
I want to return 'baseurl.com' from the Uri "subdomain.sub2.baseurl.com".
Is there a way of doing this using URI class or is Regex the only solution?
Thank you.
This should get it done:
var tlds = new List<string>()
{
//the second- and third-level TLDs you expect go here, set to null if working with single-level TLDs only
"co.uk"
};
Uri request = new Uri("http://subdomain.domain.co.uk");
string host = request.Host;
string hostWithoutPrefix = null;
if (tlds != null)
{
foreach (var tld in tlds)
{
Regex regex = new Regex($"(?<=\\.|)\\w+\\.{Regex.Escape(tld)}$"); // escape the TLD so its dots match literally
Match match = regex.Match(host);
if (match.Success)
hostWithoutPrefix = match.Groups[0].Value;
}
}
//second/third levels not provided or not found -- try single-level
if (string.IsNullOrWhiteSpace(hostWithoutPrefix))
{
Regex regex = new Regex("(?<=\\.|)\\w+\\.\\w+$");
Match match = regex.Match(host);
if (match.Success)
hostWithoutPrefix = match.Groups[0].Value;
}
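For comparison, here is a minimal sketch of the same idea without a regex, relying only on Uri.Host and label counting. The class name HostHelper, the method GetBaseDomain, and its multiLevelSuffixes parameter are hypothetical names chosen for illustration. The suffix list is still something you have to supply yourself, exactly like tlds above.

```csharp
using System;
using System.Collections.Generic;

public static class HostHelper
{
    // Hypothetical helper: returns the last labels of a DNS host name.
    // multiLevelSuffixes plays the same role as the "tlds" list above.
    public static string GetBaseDomain(Uri uri, IEnumerable<string> multiLevelSuffixes = null)
    {
        string host = uri.Host;

        // IP addresses and other non-DNS hosts are returned unchanged.
        if (uri.HostNameType != UriHostNameType.Dns)
        {
            return host;
        }

        // Default: keep the last two labels ("baseurl.com").
        int keep = 2;

        if (multiLevelSuffixes != null)
        {
            foreach (string suffix in multiLevelSuffixes)
            {
                // "co.uk" has two labels, so keep three ("domain.co.uk").
                if (host.EndsWith("." + suffix, StringComparison.OrdinalIgnoreCase))
                {
                    keep = Math.Max(keep, suffix.Split('.').Length + 1);
                }
            }
        }

        string[] labels = host.Split('.');

        // Hosts with too few labels (e.g. "localhost") are returned unchanged.
        if (labels.Length <= keep)
        {
            return host;
        }

        return string.Join(".", labels, labels.Length - keep, keep);
    }
}
```

Calling HostHelper.GetBaseDomain(new Uri("http://subdomain.sub2.baseurl.com")) yields "baseurl.com", and passing new[] { "co.uk" } as the second argument turns "http://subdomain.domain.co.uk" into "domain.co.uk".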
I have this string:
http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid=&q=1007040&a=2
How can I pick out the number between "q=" and "&" as an integer?
So in this case I want to get the number: 1007040
What you're actually doing is parsing a URI, so you can use the .NET library to do this properly as follows:
var str = "http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid=&q=1007040&a=2";
var uri = new Uri(str);
var query = uri.Query;
var dict = System.Web.HttpUtility.ParseQueryString(query);
Console.WriteLine(dict["q"]); // Outputs 1007040
If you want the numeric string as an integer, you need to parse it:
int number = int.Parse(dict["q"]);
Consider using regular expressions
String str = "http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid=&q=1007040&a=2";
Match match = Regex.Match(str, @"q=\d+&");
if (match.Success)
{
string resultStr = match.Value.Replace("q=", String.Empty).Replace("&", String.Empty);
int.TryParse(resultStr, out int result); // result = 1007040
}
It seems you want a query parameter from a URI that is HTML-encoded. You could do:
Uri uri = new Uri(HttpUtility.HtmlDecode("http://www.edrdg.org/jmdictdb/cgi-bin/edform.py?svc=jmdict&sid=&q=1007040&a=2"));
string q = HttpUtility.ParseQueryString(uri.Query).Get("q");
int qint = int.Parse(q);
A regex approach using groups:
public int GetInt(string str)
{
var match = Regex.Match(str, @"q=(\d*)&");
return int.Parse(match.Groups[1].Value);
}
Absolutely no error checking in that!
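A minimal sketch of the same group-based regex with the error checking added. QueryHelper and TryGetQ are hypothetical names, and the sketch assumes the URL has already been HTML-decoded (plain & separators, not &amp;).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class QueryHelper
{
    // Hypothetical helper: returns false instead of throwing when "q" is
    // missing, empty, non-numeric, or too large for an int.
    public static bool TryGetQ(string url, out int value)
    {
        value = 0;
        if (string.IsNullOrEmpty(url))
        {
            return false;
        }

        // "q" must start a parameter (after '?' or '&'); its digits must be
        // followed by '&', '#', or the end of the string.
        Match match = Regex.Match(url, @"[?&]q=(\d+)(?=[&#]|$)");

        return match.Success
            && int.TryParse(match.Groups[1].Value, NumberStyles.None,
                            CultureInfo.InvariantCulture, out value);
    }
}
```

Unlike the version above, this also works when q is the last parameter, because the trailing & is not required.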
I need to parse a site and I know where to find the element I'm searching: it's a span with class="metadata_with_icon-tags-primary_tag".
My C# code:
var page = new HtmlWeb().Load(url).DocumentNode.Descendants("span").Where(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("metadata_with_icon-tags-primary_tag"));
Item that I need:
To get your span with class="metadata_with_icon-tags-primary_tag":
HtmlNode node = htmlDoc.DocumentNode.SelectSingleNode("//span[@class='metadata_with_icon-tags-primary_tag']");
Try this
HtmlWeb website = new HtmlWeb();
var html = website.Load("https://genius.com/Eminem-space-bound-lyrics").DocumentNode.InnerHtml;
Regex rgx = new Regex(@"<script\b[^>]*>([\s\S]*?)<\/script>", RegexOptions.IgnoreCase);
var matches = rgx.Matches(html);
var g = matches[14].Value;
Regex regex = new Regex(
@"(\[{.*}\])",
RegexOptions.Multiline
);
Match match = regex.Match(g);
var json = match.Value;
I am using webrequest to download a source from a page and then I need to use Regex to grab the string and store it in a string:
U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..
also need:
bpvsid=nvnN2JFJqJc.&dcz=1
Both out of:
<td style="cursor:pointer;" class="" onclick="NewWindow('U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..', 'bpvsid=nvnN2JFJqJc.&dcz=1', 'bpvstage_edit', '1200', '800')" onmouseout="HideHover();"><img src="gfx/info.gif" alt="" tipwidth="450" ajaxtip="openajax.php?target=modules/bpv/bpvstage_hover_info.php&rid=&oid=&bpvsid=&bpvname=" /></td>
It keeps giving me errors like "not enough )'s".
Thanks in advance.
Current code, probably wrong in every way. Really new to this:
Regex rx = new Regex("(?<=class=\"\" onclick=\"NewWindow(').*(?=')");
longId = (rx.Match(textBox2.Text).Value);
textBox1.Text = longId;
var match = Regex.Match(s, @"onclick=""NewWindow\('([^']*)',\s*'([^']*)',.*");
if (match.Success)
{
string longId = match.Groups[1].Value;
string other = match.Groups[2].Value;
}
That will give you two groups with values:
U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..
bpvsid=nvnN2JFJqJc.&dcz=1
The regex NewWindow\('([^']*)', '([^']*) will match what you require. The two strings required will be in Groups[1] and Groups[2].
var match = Regex.Match(textBox2.Text, @"NewWindow\('([^']*)', '([^']*)"); // verbatim string, so \( reaches the regex engine
var id1 = match.Groups[1].Value;
var id2 = match.Groups[2].Value;
Note that you could also use simple string functions instead of a regex:
var s = "<td style=\"cursor:pointer;\" class=\"\" onclick=\"NewWindow('U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..', 'bpvsid=nvnN2JFJqJc.&dcz=1', 'bpvstage_edit', '1200', '800')\" onmouseout=\"HideHover();\"><img src=\"gfx/info.gif\" alt=\"\" tipwidth=\"450\" ajaxtip=\"openajax.php?target=modules/bpv/bpvstage_hover_info.php&rid=&oid=&bpvsid=&bpvname=\" /></td>";
var tmp = s.Substring(s.IndexOf("NewWindow('")).Split('\'');
var value1 = tmp[1]; // U_nQgAjU_tdUnfcA7lT5opoTLyLdslWDTpiNzcdkLoHlobS_HbujMw..
var value2 = tmp[3]; // bpvsid=nvnN2JFJqJc.&dcz=1
I would use HtmlAgilityPack to parse HTML, then this non-regex approach works:
string html = // get your html ...
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html); // doc.Load can also consume a response-stream directly
var result = Enumerable.Empty<string>();
var firstTD = doc.DocumentNode.SelectNodes("//td").FirstOrDefault();
if (firstTD != null)
{
if (firstTD.Attributes.Contains("onclick"))
{
string onclick = firstTD.Attributes["onclick"].Value;
int newWindowIndex = onclick.IndexOf("newWindow(", StringComparison.OrdinalIgnoreCase);
if (newWindowIndex >= 0)
{
string functionBody = onclick.Substring(newWindowIndex + "newWindow(".Length);
string[] tokens = functionBody.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
result = tokens.Take(2).Select(s => s.Trim(' ', '\''));
}
}
}
When I use the code below to get the list of groups,
I get a long string representing the group name:
CN=group.xy.admin.si,OU=Other,OU=Groups,OU=03,OU=UWP Customers,DC=WIN,DC=CORP,DC=com
But I just want the group name, which in this case is group.xy.admin.si
public static List<string> GetGroups(DirectoryEntry de)
{
var memberGroups = de.Properties["memberOf"].Value;
var groups = new List<string>();
if (memberGroups != null)
{
if (memberGroups is string)
{
groups.Add((String)memberGroups);
}
else if (memberGroups.GetType().IsArray)
{
var memberGroupsEnumerable = memberGroups as IEnumerable;
if (memberGroupsEnumerable != null)
{
foreach (var groupname in memberGroupsEnumerable)
{
groups.Add(groupname.ToString());
}
}
}
}
return groups;
}
There are two options here:
Use the distinguishedName you got to retrieve the group object from AD, then read its 'name' attribute.
Use a regex to extract the group name.
pseudo-code for regular expression:
string Pattern = @"^CN=(.*?)(?<!\\),.*";
string group = Regex.Replace(groupname.ToString(), Pattern, "$1");
groups.Add(group);
Name can contain "," that is escaped by "\", so this regex should work fine even if you have groups named "Foo, Bar"
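As a minimal, self-contained sketch of the regex option: GroupNameHelper and GetCommonName are hypothetical names, and the pattern is the one above with the trailing .* dropped, since only the first group is read. An input that does not start with "CN=" (or has no unescaped comma) is returned unchanged, which mirrors what Regex.Replace does when nothing matches.

```csharp
using System.Text.RegularExpressions;

public static class GroupNameHelper
{
    // Captures everything between "CN=" and the first comma that is not
    // preceded by a backslash (escaped commas belong to the name).
    private static readonly Regex CnPattern =
        new Regex(@"^CN=(.*?)(?<!\\),", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the CN value of a distinguished name,
    // or the input itself when the pattern does not match.
    public static string GetCommonName(string distinguishedName)
    {
        Match match = CnPattern.Match(distinguishedName);
        return match.Success ? match.Groups[1].Value : distinguishedName;
    }
}
```

Note that an escaped comma stays escaped in the result: "CN=Foo\, Bar,OU=Groups" gives "Foo\, Bar", backslash included.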
For example I have an input: "Test your Internet connection bandwidth. Test your Internet connection bandwidth." (two times repeated) and I want to search for strings internet and bandwidth.
string keyword = tbSearch.Text; // That holds value: "internet bandwidth"
string input = "Test your Internet connection bandwidth. Test your Internet connection bandwidth.";
Regex r = new Regex(keyword.Replace(' ', '|'), RegexOptions.IgnoreCase);
if (r.Matches(input).Count == keyword.Split(' ').Length)
{
//Do something
}
This doesn't work because it finds 2 "internet" and 2 "bandwidth", so it counts 4 matches, but the keyword length is 2. What can I do?
var pattern = keyword.Split()
.Aggregate(new StringBuilder(),
(sb, s) => sb.AppendFormat(@"(?=.*\b{0}\b)", Regex.Escape(s)),
sb => sb.ToString());
if (Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase))
{
// contains all keywords
}
The first part generates the pattern from your keywords. If there are two keywords, "internet bandwidth", the generated regex pattern will look like:
"(?=.*\binternet\b)(?=.*\bbandwidth\b)"
It will match following inputs:
"Test your Internet connection bandwidth."
"Test your Internet connection bandwidth. Test your Internet bandwidth."
Following inputs will not match (not all words contained):
"Test your Internet2 connection bandwidth bandwidth."
"Test your connection bandwidth."
Another option (verifying each keyword separately):
var allWordsContained = keyword.Split().All(word =>
Regex.IsMatch(input, String.Format(@"\b{0}\b", Regex.Escape(word)), RegexOptions.IgnoreCase));
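To make the lookahead approach easy to try, here is a minimal sketch that builds the pattern and checks an input against it. KeywordHelper, BuildPattern and ContainsAll are hypothetical names. The sketch assumes single-line input: because . does not match a newline by default, multi-line text would need RegexOptions.Singleline as well.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordHelper
{
    // Builds one lookahead per keyword, e.g. "internet bandwidth" becomes
    // (?=.*\binternet\b)(?=.*\bbandwidth\b)
    public static string BuildPattern(string keywords)
    {
        var parts = keywords
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => @"(?=.*\b" + Regex.Escape(word) + @"\b)");
        return string.Concat(parts);
    }

    // True when every keyword occurs as a whole word, ignoring case.
    public static bool ContainsAll(string input, string keywords)
    {
        return Regex.IsMatch(input, BuildPattern(keywords), RegexOptions.IgnoreCase);
    }
}
```

With the keyword string "internet bandwidth", ContainsAll accepts "Test your Internet connection bandwidth." and rejects "Test your connection bandwidth.", no matter how many times each word is repeated.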
Not sure what you are trying to do, but you could try something like this:
public bool allWordsContained(string input, string keyword)
{
bool result = true;
string[] words = keyword.Split(' ');
foreach (var word in words)
{
if (input.IndexOf(word, StringComparison.OrdinalIgnoreCase) < 0) // case-insensitive, so "internet" finds "Internet"
result = false;
}
return result;
}
public bool atLeastOneWordContained(string input, string keyword)
{
bool result = false;
string[] words = keyword.Split(' ');
foreach (var word in words)
{
if (input.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0) // case-insensitive, so "internet" finds "Internet"
result = true;
}
return result;
}
Here is the solution. The key is to collect the matches into a list and apply Distinct()...
string keyword = "internet bandwidth";
string input = "Test your Internet connection bandwidth. Test your Internet connection bandwidth.";
Regex r = new Regex(keyword.Replace(' ', '|'), RegexOptions.IgnoreCase);
MatchCollection mc = r.Matches(input);
List<string> res = new List<string>();
for (int i = 0; i < mc.Count;i++ )
{
res.Add(mc[i].Value);
}
if (res.Distinct(StringComparer.OrdinalIgnoreCase).Count() == keyword.Split(' ').Length) // ignore case so "Internet" and "internet" count once
{
//Do something
}
Regex r = new Regex(keyword.Replace(' ', '|'), RegexOptions.IgnoreCase);
int distinctKeywordsFound = r.Matches(input)
.Cast<Match>()
.Select(m => m.Value)
.Distinct(StringComparer.OrdinalIgnoreCase) // ignore case so "Internet" and "internet" count once
.Count();
if (distinctKeywordsFound == keyword.Split(' ').Length)
{
//Do something
}