Regex Matching First URL with word - c#

What I need:
I have a string like this:
Bike’s: http://website.net/bikeurl Toys: http://website.net/rc-cars
Calendar: http://website.net/schedule
I want to match the word I specify and the first URL after it. So if I specify the word "Bike" I should get:
Bike’s: http://website.net/bikeurl
Or if possible only the url of the Bike word:
http://website.net/bikeurl
Or if I specify Toys I should get:
Toys: http://website.net/rc-cars
or if possible
http://website.net/rc-cars
What I am using:
I am using this regex:
(Bike)(.*)((https?|ftp):/?/?)(?:(.*?)(?::(.*?)|)#)?([^:/\s]+)(:([^/]*))?(((?:/\w+)*)/)([-\w.]+[^#?\s]*)?(\?([^#]*))?(#(.*))?
Result:
It is matching:
Bike’s: http://website.net/bikeurl Toys: http://website.net/rc-cars
I only want:
Bike’s: http://website.net/bikeurl
I am not a regex expert. I tried using {n} and {n,}, but the regex either matched nothing or matched the same text as before.
I am using .NET C# so I am testing here http://regexhero.net/tester/

Here is another approach:
Bike(.*?):\s\S*
and here is an example how to get the corresponding URL-candidate only:
var inputString = "Bike’s: http://website.net/bikeurl Toys: http://website.net/rc-cars Calendar: http://website.net/schedule";
var word = "Bike";
var url = new Regex( word + @"(.*?):\s(?<URL>\S*)" )
.Match( inputString )
.Result( "${URL}" );
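A slightly more defensive sketch of the same idea, assuming the word may contain regex metacharacters and may not occur in the input at all. The names UrlFinder and FindUrlAfterWord are made up for illustration, and "URL" here just means the first whitespace-delimited token after the colon; nothing is validated.

```csharp
using System.Text.RegularExpressions;

static class UrlFinder
{
    // Returns the first whitespace-delimited token that follows "<word>...: ",
    // or null when the word is not followed by such a token.
    public static string FindUrlAfterWord(string input, string word)
    {
        // Regex.Escape keeps characters such as '.' or '+' in the word literal.
        var pattern = Regex.Escape(word) + @"\S*?:\s+(?<URL>\S+)";
        var match = Regex.Match(input, pattern);
        return match.Success ? match.Groups["URL"].Value : null;
    }
}
```

Checking match.Success avoids calling Result on a failed match, which would throw.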

If you really need to make sure it's an url look at this:
Validate urls with Regex
Regex to check a valid Url
Here's another solution. I would separate the Bike's, Toys and Calendar in a dictionary and put the url as a value then when needed call it.
Dictionary<string, string> myDic = new Dictionary<string, string>()
{
{ "Bike’s:", "http://website.net/bikeurl" },
{ "Toys:", "http://website.net/rc-cars" },
{ "Calendar:", "http://website.net/schedule" }
};
foreach (KeyValuePair<string, string> item in myDic)
{
if (item.Key.Equals("Bike’s:")) // must match the stored key exactly, including the colon
{
//do something
}
}
Hope one of my ideas helps you.
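If the dictionary should be built from the input string rather than typed by hand, something like the following might work. It is only a sketch: LabelUrlParser is a hypothetical name, and it assumes every entry has the shape "label: url" with no whitespace inside the label or the URL.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class LabelUrlParser
{
    // Maps each label (the text before ':') to the whitespace-delimited token after it.
    // ToDictionary throws an ArgumentException if the same label occurs twice.
    public static Dictionary<string, string> Parse(string input)
    {
        return Regex.Matches(input, @"(?<label>\S+):\s+(?<url>\S+)")
            .Cast<Match>()
            .ToDictionary(m => m.Groups["label"].Value, m => m.Groups["url"].Value);
    }
}
```

Note that the keys come out without the trailing colon, so a lookup would use "Toys" rather than "Toys:".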

If I understood your problem correctly, you need a generic regex that selects a URL based on a word. Here is one that selects the URL with bike in it:
(.(?<!\s))*\/\/((?!\s).)*bike((?!\s).)*
If you replace bike with any other word, it selects the URLs containing that word instead.
EDIT 1:
Based on your edit, here is one that would select based on the word preceding the URL:
(TOKEN((?!\s).)*\s+)((?!\s).)*
It would select the word plus the URL, e.g.
(Bike((?!\s).)*\s+)((?!\s).)* would select Bike’s: http://website.net/bikeurl
(Toy((?!\s).)*\s+)((?!\s).)* would select Toys: http://website.net/rc-cars
(Calendar((?!\s).)*\s+)((?!\s).)* would select Calendar: http://website.net/schedule
If you want to make sure the string contains a URL, you can use this instead:
(TOKEN((?!\s).)*\s+)((?!\s).)*\/\/((?!\s).)*
It makes sure that the second part of the match, i.e. the one that is supposed to contain a URL, has a // in it.


Find all string parts starting with [ and ends with ] in long string

I have an interesting problem for which I want to find the best solution. I have tried my best with regex. What I want is to find all the col_x values in this string with C#, using a regular expression or any other method.
[col_5] is a central heating boiler manufacturer produce boilers under [col_6]
brand name . Your selected [col_7] model name is a [col_6] [col_15] boiler.
[col_6] [col_15] boiler [col_7] model [col_10] came in production untill
[col_11]. [col_6] model product index number is [col_1] given by SEDBUK
'Seasonal Efficiency of a Domestic Boiler in the UK'. [col_6] model have
qualifier [col_8] and GCN [col_9] 'Boiler Gas Council No'. [col_7] model
source of heat for a boiler combustion is a [col_12].
The output expected is an array
var data =["col_5","col_10","etc..."]
Edit
my attempt :
string text = "[col_1]cc[col_2]asdfsd[col_3]";
var matches = Regex.Matches(text, @"[[^@]*]");
var uniques = matches.Cast<Match>().Select(match => match.Value).ToList().Distinct();
foreach(string m in uniques)
{
Console.WriteLine(m);
}
but no success.
Try something like this:
string[] result = Regex.Matches(input, @"\[(col_\d+)\]").
Cast<Match>().
Select(x => x.Groups[1].Value).
ToArray();
I think that's what you need:
string pattern = @"\[(col_\d+)\]";
MatchCollection matches = Regex.Matches(input, pattern);
string[] results = matches.Cast<Match>().Select(x => x.Groups[1].Value).ToArray();
Replace input with your input string.
I hope it helps
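Since the sample text repeats several tokens and the attempt in the question calls Distinct, here is a sketch that combines the capture-group pattern above with de-duplication. ColumnTokens and Extract are illustrative names only.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class ColumnTokens
{
    // Collects each col_N name once, without the surrounding brackets.
    public static string[] Extract(string input)
    {
        return Regex.Matches(input, @"\[(col_\d+)\]")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value) // group 1 is the text inside [ ]
            .Distinct()
            .ToArray();
    }
}
```

In practice Distinct yields items in order of first appearance, although the documentation describes the result as unordered, so sort explicitly if the order matters.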
This is a little hacky but you could do this.
var myMessage = @"[col_5] is a central heating boiler..."; //etc.
var values = Enumerable.Range(1, 100)
.Select(x => "[col_" + x + "]")
.Where(x => myMessage.Contains(x))
.ToList();
This assumes there is a known maximum x for col_x (I assumed 100 here). It simply tries every candidate by brute force and returns only the ones it finds in the text.
If you know that there are only so many columns to hunt for, I would try this instead of Regex personally as I have had too many bad experiences burning hours on Regex.

How to get Last Index Of '\' or '//', whichever comes last?

I want to get the index of the last separator in a URL that comes from the database, where the separator may be '\', '/' or '//'.
For example, I have strings like these:
Administration\Masters\EmployeePulseDetailsMaster.aspx
Administration/Masters/SearchKnowYourCollegues.aspx
Administration//SMS//PushSMS.aspx
I am using this code:
foreach (var item in SessionClass.UserDetails.SubModules)
{
if (Request.RawUrl.Contains(item.PageURL.Substring(item.PageURL.LastIndexOf('\\') + 1))
|| Request.RawUrl.Contains(item.PageURL.Substring(item.PageURL.LastIndexOf('/') + 1)))
{
Response.RedirectPermanent("~/Login.aspx");
}
}
You can use a regular expression to find the last occurrence of any character in a group by constructing a regular expression that looks like this:
[target-group][^target-group]*$
In your case, the target group is [/\\], so the search would look like this:
var match = Regex.Match(s, @"[/\\][^/\\]*$");
Here is a running example:
var data = new[] {
    @"quick/brown/fox"
,   @"jumps\over\the\lazy\dog"
,   @"Administration\Masters\EmployeePulseDetailsMaster.aspx"
,   @"Administration/Masters/SearchKnowYourCollegues.aspx"
,   @"Administration//SMS//PushSMS.aspx"
};
foreach (var s in data) {
    var m = Regex.Match(s, @"[/\\][^/\\]*$");
    if (m.Success) {
        Console.WriteLine(s.Substring(m.Index+1));
    }
}
This prints
fox
dog
EmployeePulseDetailsMaster.aspx
SearchKnowYourCollegues.aspx
PushSMS.aspx
I guess you want to determine whether the name of the current page is in the list of SessionClass.UserDetails.SubModules. Then I'd use Request.Url.Segments.Last() to get only the name of the current page (e.g. PushSMS.aspx) and System.IO.Path.GetFileName to get the name of each URL. GetFileName works with / or \:
string pageName = Request.Url.Segments.Last();
bool anyMatch = SessionClass.UserDetails.SubModules
.Any(module => pageName == System.IO.Path.GetFileName(module.PageURL));
if(anyMatch) Response.RedirectPermanent("~/Login.aspx");
You need to add using System.Linq; for Enumerable.Any.
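One caveat: Path.GetFileName treats '\' as a separator on Windows, but on Unix-like .NET runtimes only '/' counts. If that matters, a platform-independent sketch with String.LastIndexOfAny could look like this (UrlTail and LastSegment are hypothetical names):

```csharp
static class UrlTail
{
    private static readonly char[] Separators = { '/', '\\' };

    // Returns the text after the last '/' or '\',
    // or the whole string when neither character occurs.
    public static string LastSegment(string path)
    {
        // LastIndexOfAny returns -1 when nothing is found, so index + 1 is 0.
        int index = path.LastIndexOfAny(Separators);
        return path.Substring(index + 1);
    }
}
```

This answers the question in the title directly: LastIndexOfAny reports whichever of the listed characters comes last, so no regex and no pair of LastIndexOf calls is needed.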

Process part of the regex match before replacing it

I'm writing a function that will parse a file similar to an XML file from a legacy system.
....
<prod pid="5" cat='gov'>bla bla</prod>
.....
<prod cat='chi'>etc etc</prod>
....
.....
I currently have this code:
buf = Regex.Replace(entry, "<prod(?:.*?)>(.*?)</prod>", "<span class='prod'>$1</span>");
Which was working fine until it was decided that we also wanted to show the categories.
The problem is, categories are optional and I need to run the category abbreviation through a SQL query to retrieve the category's full name.
eg:
SELECT * FROM cats WHERE abbr='gov'
The final output should be:
<span class='prod'>bla bla</span><span class='cat'>Government</span>
Any idea on how I could do this?
Note1: The function is done already (except this part) and working fine.
Note2: Cannot use XML libraries, regex has to be used
Regex.Replace has an overload that takes a MatchEvaluator, which is basically a Func<Match, string>. So, you can dynamically generate a replacement string.
buf = Regex.Replace(entry, @"<prod(?<attr>.*?)>(?<text>.*?)</prod>", match => {
    var attrText = match.Groups["attr"].Value;
    var text = match.Groups["text"].Value;
    // Now, parse your attributes (\1 refers back to the opening quote)
    var attributes = Regex.Matches(attrText, @"(?<name>\w+)\s*=\s*(['""])(?<value>.*?)\1")
        .Cast<Match>()
        .ToDictionary(
            m => m.Groups["name"].Value,
            m => m.Groups["value"].Value);
    string category;
    if (attributes.TryGetValue("cat", out category))
    {
        // Your SQL here etc...
        var label = GetLabelForCategory(category);
        return String.Format("<span class='prod'>{0}</span><span class='cat'>{1}</span>", WebUtility.HtmlEncode(text), WebUtility.HtmlEncode(label));
    }
    // Generate the result string
    return String.Format("<span class='prod'>{0}</span>", WebUtility.HtmlEncode(text));
});
This should get you started.

Replace {x} tokens in strings

We have a template URL like:
http://api.example.com/sale?auth_user=xxxxx&auth_pass=xxxxx&networkid={networkid}&category=b2c&country=IT&pageid={pageid}&programid=133&saleid=1&m={master}&optinfo={optinfo}&publisher={publisher}&msisdn={userId}
and I have values for these constant tokens. How can I replace all these tokens in C#?
A simple approach is to use a foreach and a Dictionary with a String.Replace:
var values = new Dictionary<string, string> {
{ "{networkid}", "WHEEE!!" }
// etc.
};
var url = "http://api.example.com/sale?auth_user=xxxxx&auth_pass=xxxxx&networkid={networkid}&category=b2c&country=IT&pageid={pageid}&programid=133&saleid=1&m={master}&optinfo={optinfo}&publisher={publisher}&msisdn={userId}";
foreach(var key in values.Keys){
url = url.Replace(key,values[key]);
}
There is no standard way to "replace with dictionary values" in .NET. While there are a number of template engines, it's not very hard to write a small solution for such an operation. Here is an example which runs in LINQPad and utilizes a Regular Expression with a Match Evaluator.
As the result is a URL, it is the caller's responsibility to make sure all the supplied values are correctly encoded. I recommend using Uri.EscapeDataString as appropriate, but make sure not to double-encode if the value is processed elsewhere.
Additionally, the rules for what to do when no replacement is found should be tailored to your needs. If tokens without a replacement should be eliminated entirely, along with their query string key, the regular expression can be expanded to @"\w+=({\w+})" so that it also captures the parameter key in this specific template situation.
string ReplaceUsingDictionary (string src, IDictionary<string, object> replacements) {
return Regex.Replace(src, @"{(\w+)}", (m) => {
object replacement;
var key = m.Groups[1].Value;
if (replacements.TryGetValue(key, out replacement)) {
return Convert.ToString(replacement);
} else {
return m.Groups[0].Value;
}
});
}
void Main()
{
var replacements = new Dictionary<string, object> {
{ "networkid", "WHEEE!!" }
// etc.
};
var src = "http://api.example.com/sale?auth_user=xxxxx&auth_pass=xxxxx&networkid={networkid}&category=b2c&country=IT&pageid={pageid}&programid=133&saleid=1&m={master}&optinfo={optinfo}&publisher={publisher}&msisdn={userId}";
var res = ReplaceUsingDictionary(src, replacements);
// -> "http://api.example.com/sale?..&networkid=WHEEE!!&..&pageid={pageid}&..
res.Dump();
}
More advanced techniques, like reflection and transforms, are possible - but those should be left for the real template engines.
I am guessing you are trying to replace parameters in the URL with your own values. This can be done with HttpUtility.ParseQueryString.
Parse the query string of the current URL:
var _builder = new UriBuilder(Request.Url);
var _query = System.Web.HttpUtility.ParseQueryString(_builder.Query);
Read a parameter from the query string:
string value1 = _query["networkid"];
Write a value into the query string collection:
_query["networkid"] = "Your Value";
And finally turn it back into a URL:
_builder.Query = _query.ToString();
string _myUrl = _builder.ToString();
You can also do this using LINQ:
Dictionary<string, string> myVal = new Dictionary<string, string>();
myVal.Add("networkid", "1");
myVal.Add("pageid", "2");
myVal.Add("master", "3");
myVal.Add("optinfo", "4");
myVal.Add("publisher", "5");
myVal.Add("userId", "6");
string url = @"http://api.example.com/sale?auth_user=xxxxx&auth_pass=xxxxx&networkid={networkid}&category=b2c&country=IT&pageid={pageid}&programid=133&saleid=1&m={master}&optinfo={optinfo}&publisher={publisher}&msisdn={userId}";
myVal.Select(a => url = url.Replace(string.Concat("{", a.Key, "}"), a.Value)).ToList();
The last line above does the actual replacement; the ToList() call is what forces the lazy Select to run.
There is a Nuget called StringTokenFormatter that does this well
https://www.nuget.org/packages/StringTokenFormatter/
Regex.Replace makes a single pass over a template string, offering you an opportunity to replace matched expressions. Use it by creating a regular expression that matches any token. Then look up replacement values for the tokens in a dictionary.
static string ReplaceTokens(string template, Dictionary<string, string> replacements) =>
Regex.Replace(template, @"{(\w+)}",
match => replacements.TryGetValue(match.Groups[1].Value, out string replacement) ? replacement : match.Value);
The algorithm completes in time linear with the size of the template string and the replacement strings, so O(t + r). Beware of algorithms that make multiple passes. They run slowly in time O(t * r) and will give incorrect results if one of the replacement values contains a token for later replacement. This unit test shows the pitfall:
public void TestReplaceTokens() {
var replacements = new Dictionary<string, string> {
["Movement"] = "drive {DontReplace}",
["DontReplace"] = "Should not appear"
};
string withReplacements = ReplaceTokens("I always {Movement} in {Direction}.", replacements);
Assert.AreEqual("I always drive {DontReplace} in {Direction}.", withReplacements);
}
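For contrast, here is a sketch of the multi-pass approach that the test above guards against. MultiPassReplace is an illustrative name. The method takes an ordered sequence of pairs rather than a dictionary so that the replacement order is explicit.

```csharp
using System.Collections.Generic;

static class MultiPassReplace
{
    // Naive approach: one String.Replace pass over the template per token.
    public static string ReplaceTokens(
        string template, IEnumerable<KeyValuePair<string, string>> replacements)
    {
        foreach (var pair in replacements)
        {
            // Text inserted by an earlier pass is scanned again by later passes.
            template = template.Replace("{" + pair.Key + "}", pair.Value);
        }
        return template;
    }
}
```

With "Movement" processed before "DontReplace", the template "I always {Movement} in {Direction}." becomes "I always drive Should not appear in {Direction}.", because the second pass rewrites the token that the first pass inserted. With the pairs in the opposite order the output happens to be correct, which makes the bug order-dependent.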

Getting groups with Regex

{%if Lang="english", Site="testsite"}
content
{%endif}
I need to get the groups Lang, Site and the content
This is what I am using to get the content part
".*}(.*){%.*"
You can capture groups by using ?<groupname>.
This is a very crude regex to get the groups you want:
\{\%if\s.*(Lang=\"(?<lang>[^\"]*))\".*(Site=\"(?<site>[^\"]*))\"\}(?<content>[^\{]*)\{\%endif\}
When you use a regex from C#, you can get the groups from the Match object (input is the string you are searching):
var _regex = new Regex(...);
var _match = _regex.Match(input);
var _language = _match.Groups["lang"].Value;
I'm not sure how flexible you need your regex to be, but try something like:
{%if Lang="(.*)",\s.*Site="(.*?)"}(\r\n)*?(.*?)(\r\n)*?{%endif}
You can then get the matches from there.
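Putting the named groups together, a complete sketch might look like the following. It assumes the block always has exactly the shape shown in the question (Lang first, then Site, both double-quoted); IfBlockParser and TryParse are made-up names.

```csharp
using System.Text.RegularExpressions;

static class IfBlockParser
{
    // RegexOptions.Singleline lets '.' span the line breaks around the content.
    private static readonly Regex IfBlock = new Regex(
        @"\{%if\s+Lang=""(?<lang>[^""]*)"",\s*Site=""(?<site>[^""]*)""\}(?<content>.*?)\{%endif\}",
        RegexOptions.Singleline);

    // Returns false, with empty out values, when the input does not match.
    public static bool TryParse(string input, out string lang, out string site, out string content)
    {
        var match = IfBlock.Match(input);
        lang = match.Groups["lang"].Value;
        site = match.Groups["site"].Value;
        // Trim drops the newlines that surround the content in the template.
        content = match.Groups["content"].Value.Trim();
        return match.Success;
    }
}
```

The lazy `.*?` in the content group stops at the first {%endif}, so nested if-blocks are outside what this sketch handles.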
