Regex implementation - c#

I have encountered this piece of code that is supposed to determine the parent url in a hierarchy of dynamic (rewritten) urls. The basic logic goes like this:
"/testing/parent/default.aspx" --> "/testing/default.aspx"
"/testing/parent.aspx" --> "/testing/default.aspx"
"/testing/default.aspx" --> "/default.aspx"
"/default.aspx" --> null
...
private string GetParentUrl(string url)
{
string parentUrl = url;
if (parentUrl.EndsWith("Default.aspx", StringComparison.OrdinalIgnoreCase))
{
parentUrl = parentUrl.Substring(0, parentUrl.Length - 12);
if (parentUrl.EndsWith("/"))
parentUrl = parentUrl.Substring(0, parentUrl.Length - 1);
}
int i = parentUrl.LastIndexOf("/");
if (i < 2) return null;
parentUrl = parentUrl.Substring(0, i + 1);
return string.Format(CultureInfo.InvariantCulture, "{0}Default.aspx", parentUrl);
}
This code works but it smells to me. It will not work with urls that have a querystring. How can I improve it using regex?

Have a look at the answers to SO question "Getting the parent name of a URI/URL from absolute name C#"
This will show you how to use System.Uri to access the segments of a URL. System.Uri also lets you manipulate the URL in the way you want (well, not the custom logic) without the danger of creating invalid URLs. There is no need to hack your own functions to dissect URLs.

A straightforward approach is to split the URL on "?", work out the parent from the path part, and concatenate the query string back onto the end.

I recommend not using Regex in this scenario. A regex that solves this task would be the real code smell. The code above isn't so bad; use f3lix's and Leon Shmulevich's recommendations to make it better.
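The split-on-"?" suggestion can be sketched roughly as follows. This is only an illustration: the class name ParentUrlSketch is made up, the method follows the mapping listed in the question (lowercase default.aspx, null at the root), and keeping the query string on the parent URL is an assumption rather than something the question specifies.

```csharp
using System;

public static class ParentUrlSketch
{
    // Hypothetical helper: same idea as GetParentUrl in the question, but the
    // query string is split off first and appended to the result again.
    public static string GetParentUrl(string url)
    {
        int q = url.IndexOf('?');
        string path = q >= 0 ? url.Substring(0, q) : url;
        string query = q >= 0 ? url.Substring(q) : string.Empty;

        const string defaultPage = "default.aspx";
        if (path.EndsWith(defaultPage, StringComparison.OrdinalIgnoreCase))
        {
            // Drop the page name, then the slash in front of it.
            path = path.Substring(0, path.Length - defaultPage.Length).TrimEnd('/');
        }

        int i = path.LastIndexOf('/');
        if (i < 0) return null; // already at the root, no parent

        // Assumption: the parent keeps the original query string.
        return path.Substring(0, i + 1) + defaultPage + query;
    }
}
```

For example, "/testing/parent.aspx?id=1" would map to "/testing/default.aspx?id=1", and "/default.aspx" still maps to null.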

Related

How can I get a part/subdomain of my URL in C#?

I have a URL like the following
http://yellowcpd.testpace.net
How can I get yellowcpd from this? I know I can do that with string parsing, but is there a builtin way in C#?
Assuming your URLs will always be testpace.net, try this:
var subdomain = Request.Url.Host.Replace("testpace.net", "").TrimEnd('.');
It'll just give you the non-testpace.net part of the Host. If you don't have Request.Url.Host, you can do new Uri(myString).Host instead.
try this
string url = Request.Url.Host; // the host name, not AbsolutePath, contains the subdomain
var myvalues= url.Split('.');
How can I get yellowcpd from this? I know I can do that with string
parsing, but is there a builtin way in C#?
.Net doesn't provide a built-in feature to extract specific parts from Uri.Host. You will have to use string manipulation or a regular expression yourself.
The only constant part of the domain string is the TLD. The TLD is the very last bit of the domain string, e.g. .com, .net, .uk etc. Everything else under that depends on the particular TLD for its position, so you can't assume the next-to-last part is the "domain name"; for .co.uk it would be .co.
This fits the bill.
Split over two lines:
string rawURL = Request.Url.Host;
string domainName = rawURL.Split('.')[0]; // index 0 is "yellowcpd" for yellowcpd.testpace.net
Or over one:
string domainName = Request.Url.Host.Split('.')[0];
The simple answer to your question is no there isn't a built in method to extract JUST the sub-domain. With that said this is the solution that I use...
public enum GetSubDomainOption
{
    ExcludeWWW,
    IncludeWWW
}

public static class Extensions
{
    public static string GetSubDomain(this Uri uri,
        GetSubDomainOption getSubDomainOption = GetSubDomainOption.IncludeWWW)
    {
        // Split the host once; the last two labels are assumed to be the domain
        var labels = uri.Host.Split('.');
        var subdomain = new StringBuilder();
        for (var i = 0; i < labels.Length - 2; i++)
        {
            // Ignore any www labels if the ExcludeWWW option is set
            if (getSubDomainOption == GetSubDomainOption.ExcludeWWW &&
                labels[i].ToLowerInvariant() == "www") continue;
            // Append a "." unless this is the last subdomain label or the next label is "www".
            // I use a ternary operator here; it could easily be converted to an if/else.
            subdomain.Append((i < labels.Length - 3 &&
                              labels[i + 1].ToLowerInvariant() != "www") ?
                labels[i] + "." :
                labels[i]);
        }
        return subdomain.ToString();
    }
}
USAGE:
var subDomain = Request.Url.GetSubDomain(GetSubDomainOption.ExcludeWWW);
or
var subDomain = Request.Url.GetSubDomain();
I currently have the default set to include the WWW. You could easily reverse this by switching the optional parameter's default value in the GetSubDomain() method.
In my opinion this allows for an option that looks nice in code and, without digging in, appears to be 'built-in' to C#. Just to confirm your expectations: I tested three values, and this method always returns just "yellowcpd" if the exclude flag is used.
www.yellowcpd.testpace.net
yellowcpd.testpace.net
www.yellowcpd.www.testpace.net
One assumption that I use is that...splitting the hostname on a . will always result in the last two values being the domain (i.e. something.com)
As others have pointed out, you can do something like this:
var req = new HttpRequest(filename: "search", url: "http://www.yellowcpd.testpace.net", queryString: "q=alaska");
var host = req.Url.Host;
var yellow = host.Split('.')[1];
The portion of the URL you want is part of the hostname. You may hope to find some method that directly addresses that portion of the name, e.g. "the subdomain (yellowcpd) within testpace.net", but this is probably not possible, because the rules for valid host names allow for any number of labels (see Valid Host Names). The host name can have any number of labels, separated by periods. You will have to add additional restrictions to get what you want, e.g. "Separate the host name into labels, discard www if present and take the next label".
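The rule quoted above ("separate the host name into labels, discard www if present and take the next label") might look something like this. It is a sketch only: HostLabelSketch and FirstSubdomainLabel are made-up names, and the code assumes the registered domain is exactly two labels long (as with testpace.net), which does not hold for hosts such as example.co.uk or for IP addresses.

```csharp
using System;

public static class HostLabelSketch
{
    // Hypothetical helper: returns the first host label after an optional
    // leading "www", or null when no label is left in front of the domain.
    public static string FirstSubdomainLabel(Uri uri)
    {
        string[] labels = uri.Host.Split('.');

        // Skip a leading "www" label if there is one.
        int start = labels[0].Equals("www", StringComparison.OrdinalIgnoreCase) ? 1 : 0;

        // Assumption: the last two labels form the domain (e.g. testpace.net).
        return labels.Length - start > 2 ? labels[start] : null;
    }
}
```

With this sketch, both http://yellowcpd.testpace.net and http://www.yellowcpd.testpace.net give "yellowcpd", while http://testpace.net gives null.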

Getting string after a specific slash

I have a string and I want to get whatever is after the 3rd slash.
I don't know of any other way I can do this, and I don't really want to use regex if I don't need it.
http://www.website.com/hello for example would be hello
I have used str.LastIndexOf('/') before like:
string str3 = str.Substring(str.LastIndexOf('/') + 1);
However I am still trying to figure out how to do this for a slash that is not the first or last
string s = "some/string/you/want/to/split";
string.Join("/", s.Split('/').Skip(3).ToArray());
As suggested by C.Evenhuis, you should rely on the native System.Uri class:
string url = "http://stackoverflow.com/questions/20213490/getting-string-after-a-specific-slash";
Uri asUri = new Uri(url);
string result = asUri.LocalPath;
Console.WriteLine(result);
(live at http://csharpfiddle.com/LlLbriBm)
This will output:
/questions/20213490/getting-string-after-a-specific-slash
If you don't want the first / in the result, simply use:
string url = "http://stackoverflow.com/questions/20213490/getting-string-after-a-specific-slash";
Uri asUri = new Uri(url);
string result = asUri.LocalPath.TrimStart('/');
Console.WriteLine(result);
You should take a look at the System.Uri class documentation. There are plenty of properties you can play with, depending on what you actually want to keep from the URL (query parameters, fragment, etc.).
If you're manipulating URLs, then use the Uri class instead of rolling your own.
But if you want to do it manually for educational reasons, you could do something like this:
int startPos = 0;
for(int i = 0; i < 3; i++)
{
startPos = s.IndexOf('/', startPos)+1;
}
var stringOfInterest = s.Substring(startPos);
There are lots of ways this might fail if the string isn't in the form you expect, so it's just an example to get you started.
Although this is premature optimisation, this sort of approach is more efficient than smashing the whole string into components and putting them back together again.
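As a rough illustration of guarding the manual loop against input with too few slashes, something like the following could work. The names SlashSketch and AfterNthSlash are invented for this example, and returning null on failure is just one possible choice.

```csharp
using System;

public static class SlashSketch
{
    // Hypothetical helper: returns the text after the n-th '/' in s,
    // or null when s contains fewer than n slashes.
    public static string AfterNthSlash(string s, int n)
    {
        int pos = -1;
        for (int i = 0; i < n; i++)
        {
            // Search from just past the previous slash.
            pos = s.IndexOf('/', pos + 1);
            if (pos < 0) return null; // ran out of slashes
        }
        return s.Substring(pos + 1);
    }
}
```

Called with "http://www.website.com/hello" and n = 3, this would return "hello".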

find string using c#?

I am trying to find a string within the string below.
http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779
using the string http://example.com/TIGS/SIM/Lists. How can I get the words Team Discussion from it?
Sometimes the strings will be
http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779
I need `Team Discussion`
http://example.com/TIGS/ALIF/Lists/Artifical Lift Discussion Forum 2/DispForm.aspx?ID=8
I need `Artifical Lift Discussion Forum 2`
If you're always following that pattern, I recommend @Justin's answer. However, if you want a more robust method, you can always couple the System.Uri and Path.GetDirectoryName methods, then perform a String.Split. Like this example:
String url = @"http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779";
System.Uri uri = new System.Uri(url);
String dir = Path.GetDirectoryName(uri.AbsolutePath);
String[] parts = dir.Split(new[]{ Path.DirectorySeparatorChar });
Console.WriteLine(parts[parts.Length - 1]);
The only major problem, however, is you're going to wind up with a path that's been "encoded" (i.e. your space is now going to be represented by a %20)
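One way around the %20 issue might be to decode the directory name with Uri.UnescapeDataString. The sketch below uses Uri.Segments instead of Path.GetDirectoryName; LastDirectorySketch and LastDirectory are made-up names, and the code assumes the URL ends in a file name (such as DispForm.aspx) rather than a trailing slash.

```csharp
using System;

public static class LastDirectorySketch
{
    // Hypothetical helper: returns the decoded name of the directory that
    // contains the last path segment, or null if there is no such directory.
    public static string LastDirectory(string url)
    {
        // Segments look like "/", "TIGS/", ..., "Team%20Discussion/", "DispForm.aspx".
        string[] segments = new Uri(url).Segments;
        if (segments.Length < 3) return null;

        string dir = segments[segments.Length - 2].TrimEnd('/');
        return Uri.UnescapeDataString(dir); // turns %20 back into a space
    }
}
```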
This solution will get you the last directory of your URL regardless of how many directories are in your URL.
string[] arr = s.Split('/');
string lastPart = arr[arr.Length - 2];
You could combine this solution into one line, however it would require splitting the string twice, once for the values, the second for the length.
If you wanted to see a regular expression example:
string input = "http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx?ID=1779";
string given = "http://example.com/TIGS/SIM/Lists";
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(given + @"\/(.+)\/");
System.Text.RegularExpressions.Match match = regex.Match(input);
Console.WriteLine(match.Groups[1]); // Team Discussion
Here's a simple approach, assuming that your URL always has the same number of slashes before the part you want:
var value = url.Split(new[]{'/'}, StringSplitOptions.RemoveEmptyEntries)[5];
Here is another solution that provides the following advantages:
Does not require the use of regular expressions.
Does not require a certain 'count' of slashes to be present (indexing based on a specific number). I consider this a key benefit because it makes the code less likely to fail if some part of the URL changes. Ultimately it is best to base your parsing logic on whichever part of the text's structure you consider least likely to change.
This method, however, DOES rely on the following assumptions, which I consider to be the least likely to change:
URL must have "/Lists/" right before target text.
URL must have "/" right after target text.
Basically, I just split the string twice, using text that I expect to be surrounding the area I am interested in.
String urlToSearch = "http://example.com/TIGS/SIM/Lists/Team Discussion/DispForm.aspx";
String result = "";
// First, get everything after "/Lists/"
string[] temp1 = urlToSearch.Split(new String[] { "/Lists/" }, StringSplitOptions.RemoveEmptyEntries);
if (temp1.Length > 1)
{
// Next, get everything before the first "/"
string[] temp2 = temp1[1].Split(new String[] { "/" }, StringSplitOptions.RemoveEmptyEntries);
result = temp2[0];
}
Your answer will then be stored in the 'result' variable.

How to use C# regular expressions to emulate forum tags

I am building a forum and I want to be able to use simple square bracket tags to allow users to format text. I am currently accomplishing this by parsing the string and looking for the tags. It's very tedious, especially when I run into a tag like this [url=http://www.something.com]Some text[/url]. Having to parse the attribute, and the value, and make sure it has proper opening and closing tags is kind of a pain and seems silly. I know how powerful regular expressions are but I'm not good at them and they frustrate me to no end.
Any of you regex gurus willing to help me out? I think an example would get me started. Just a regex for finding tags like [b]bolded text[/b] and tags with attributes like the link one I listed above would be helpful. Thanks in advance!
Edit: Links to laymen's terms tutorials for regex are also helpful.
This should work. The "=something.com" is optional and accommodates single or double quotes and it also makes sure that the closing tag matches the opening tag.
protected void Page_Load(object sender, EventArgs e)
{
    string input = @"My link: [url='http://www.something.com'][b]Some text[/b][/url] is awesome. Jazz hands activate!!";
    string result = Parse(input);
}
// Result: My link: <a href="http://www.something.com"><b>Some text</b></a> is awesome. Jazz hands activate!!
private static string Parse(string input)
{
    string regex = @"\[([^=]+)[=\x22']*(\S*?)['\x22]*\](.+?)\[/(\1)\]";
    MatchCollection matches = new Regex(regex).Matches(input);
    for (int i = 0; i < matches.Count; i++)
    {
        var tag = matches[i].Groups[1].Value;
        var optionalValue = matches[i].Groups[2].Value;
        var content = matches[i].Groups[3].Value;
        // Handle nested tags first by parsing the inner content recursively
        if (Regex.IsMatch(content, regex))
        {
            content = Parse(content);
        }
        content = HandleTags(content, optionalValue, tag);
        input = input.Replace(matches[i].Groups[0].Value, content);
    }
    return input;
}
private static string HandleTags(string content, string optionalValue, string tag)
{
    switch (tag.ToLower())
    {
        case "url":
            return string.Format("<a href=\"{0}\">{1}</a>", optionalValue, content);
        case "b":
            return string.Format("<b>{0}</b>", content);
        default:
            return string.Empty;
    }
}
UPDATE
Now I'm just having fun with this. I cleaned it up a bit and replaced the " with \x22 so the entire string can easily be escaped per @Brad Christie's suggestion. Also, this regex won't break if there are "[" or "]" characters in the content, and it handles tags recursively.
I'm not saying that you can't do this with regular expressions, but I think you're going to find it very, very difficult. You'll have to decide what to do with things like [b]this is [bold text[/b], and other cases where the user has [ or ] characters. And will you allow nesting? (i.e. [b]this is bold, [i]italic[/i] text[/b]).
I would suggest that you look into using something like Markdown.
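For readers who still want a regex starting point, a much simpler, non-recursive variant using Regex.Replace with capture groups could look like the sketch below. This is a different technique from the recursive Parse method above, not a cleaned-up version of it. BbCodeSketch and ToHtml are invented names, only [b] and [url=...] are handled, only http/https links are accepted, and unmatched or malformed tags are simply left in the text as typed.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class BbCodeSketch
{
    // Hypothetical helper: converts [b]...[/b] and [url=...]...[/url] to HTML.
    public static string ToHtml(string input)
    {
        // HTML-encode first so user-supplied markup cannot inject raw HTML.
        string html = WebUtility.HtmlEncode(input);

        const RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        // [b]text[/b] -> <b>text</b>
        html = Regex.Replace(html, @"\[b\](.+?)\[/b\]", "<b>$1</b>", options);

        // [url=http://...]text[/url] -> <a href="http://...">text</a>
        html = Regex.Replace(html,
            @"\[url=(https?://[^\]\s]+)\](.+?)\[/url\]",
            "<a href=\"$1\">$2</a>", options);

        return html;
    }
}
```

Because each tag type is replaced in its own pass, a [b] inside a [url] works, but deeper nesting of the same tag is not handled; that limitation is part of why a dedicated parser or Markdown is often suggested instead.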

How do I replace all the spaces with %20 in C#?

I want to make a string into a URL using C#. There must be something in the .NET framework that should help, right?
Another way of doing this is using Uri.EscapeUriString(stringToEscape).
I believe you're looking for HttpServerUtility.UrlEncode.
System.Web.HttpUtility.UrlEncode(string url)
I found useful System.Web.HttpUtility.UrlPathEncode(string str);
It replaces spaces with %20 and not with +.
To properly escape spaces as well as the rest of the special characters, use System.Uri.EscapeDataString(string stringToEscape).
As commented on the accepted answer, the HttpServerUtility.UrlEncode method replaces spaces with + instead of %20.
Use one of these two methods instead: Uri.EscapeUriString() or Uri.EscapeDataString()
Sample code:
HttpUtility.UrlEncode("https://mywebsite.com/api/get me this file.jpg")
//Output: "https%3a%2f%2fmywebsite.com%2fapi%2fget+me+this+file.jpg"
Uri.EscapeUriString("https://mywebsite.com/api/get me this file.jpg");
//Output: "https://mywebsite.com/api/get%20me%20this%20file.jpg"
Uri.EscapeDataString("https://mywebsite.com/api/get me this file.jpg");
//Output: "https%3A%2F%2Fmywebsite.com%2Fapi%2Fget%20me%20this%20file.jpg"
//When your url has a query string:
Uri.EscapeUriString("https://mywebsite.com/api/get?id=123&name=get me this file.jpg");
//Output: "https://mywebsite.com/api/get?id=123&name=get%20me%20this%20file.jpg"
Uri.EscapeDataString("https://mywebsite.com/api/get?id=123&name=get me this file.jpg");
//Output: "https%3A%2F%2Fmywebsite.com%2Fapi%2Fget%3Fid%3D123%26name%3Dget%20me%20this%20file.jpg"
I needed to do this too and found this question from years ago, but the question title and text don't quite match up. Using Uri.EscapeDataString or UrlEncode (please don't use that one!) on a whole URL doesn't usually make sense unless we are talking about passing URLs as parameters to other URLs.
(For example, passing a callback URL when doing open ID authentication, Azure AD, etc.)
Hoping this is a more pragmatic answer to the question: I want to make a string into a URL using C#, there must be something in the .NET framework that should help, right?
Yes - two functions are helpful for making URL strings in C#
String.Format for formatting the URL
Uri.EscapeDataString for escaping any parameters in the URL
This code
String.Format("https://site/app/?q={0}&redirectUrl={1}",
Uri.EscapeDataString("search for cats"),
Uri.EscapeDataString("https://mysite/myapp/?state=from idp"))
produces this result
https://site/app/?q=search%20for%20cats&redirectUrl=https%3A%2F%2Fmysite%2Fmyapp%2F%3Fstate%3Dfrom%20idp
This can be safely copied and pasted into a browser's address bar, or the href attribute of an HTML a tag, or used with curl, or encoded into a QR code, etc.
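The same idea generalises to any number of parameters. The following is a minimal sketch, assuming the base URL has no query string yet; QuerySketch and BuildUrl are made-up names, not framework APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuerySketch
{
    // Hypothetical helper: appends escaped name=value pairs to a base URL.
    // Assumes baseUrl does not already contain a '?'.
    public static string BuildUrl(string baseUrl,
        IEnumerable<KeyValuePair<string, string>> parameters)
    {
        // Escape both names and values so characters like '&', '=' and ' '
        // cannot break the structure of the query string.
        string query = string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

        return query.Length == 0 ? baseUrl : baseUrl + "?" + query;
    }
}
```

Passing the two parameters from the String.Format example (q and redirectUrl) with the base URL https://site/app/ should produce the same kind of URL as above.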
Use HttpServerUtility.UrlEncode
HttpUtility.UrlDecode works for me:
var str = "name=John%20Doe";
var str2 = HttpUtility.UrlDecode(str);
// str2 is now "name=John Doe"
HttpUtility.UrlEncode Method (String)
The code below replaces each run of repeated spaces with a single %20 sequence.
Example:
Input is:
Code by Hitesh Jain
Output:
Code%20by%20Hitesh%20Jain
Code
static void Main(string[] args)
{
Console.WriteLine("Enter a string");
string str = Console.ReadLine();
string replacedStr = null;
// This loop collapses each run of repeated blank spaces into a single space
for (int i = 0; i < str.Length - 1; i++)
{
if (!(Convert.ToString(str[i]) == " " &&
Convert.ToString(str[i + 1]) == " "))
{
replacedStr = replacedStr + str[i];
}
}
replacedStr = replacedStr + str[str.Length-1]; // Append last character
replacedStr = replacedStr.Replace(" ", "%20");
Console.WriteLine(replacedStr);
Console.ReadLine();
}
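The same collapsing behaviour can probably be had in one call with a regular expression. This is only a sketch with made-up names (SpaceSketch, CollapseSpaces); it handles plain space characters only and does not escape any other special characters, so Uri.EscapeDataString remains the better choice for real URL parameters.

```csharp
using System.Text.RegularExpressions;

public static class SpaceSketch
{
    // Hypothetical helper: each run of one or more spaces becomes one "%20".
    public static string CollapseSpaces(string input)
    {
        return Regex.Replace(input, " +", "%20");
    }
}
```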
HttpServerUtility.HtmlEncode
From the documentation:
String TestString = "This is a <Test String>.";
String EncodedString = Server.HtmlEncode(TestString);
But this actually encodes HTML, not URLs. Instead use UrlEncode(TestString).
