RegEx to extract a sub level from url - c#

I have the following set of URLs:
http://test/mediacenter/Photo Gallery/Conf 1/1.jpg
http://test/mediacenter/Photo Gallery/Conf 2/3.jpg
http://test/mediacenter/Photo Gallery/Conf 3/Conf 4/1.jpg
All I want to do is extract Conf 1, Conf 2 and Conf 3 from the URLs, i.e. the level right after 'Photo Gallery'. (The URLs are not static; the only level they share is Photo Gallery.)
Any help is appreciated.

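Is it necessary to use Regex? You can get it without Regex, like this:
string str = @"http://test/mediacenter/Photo Gallery/Conf 1/1.jpg";
var z = str.Split('/')[5]; // "Conf 1"
or
// Segments are escaped and keep their trailing slash ("Conf%201/"), so clean them up
var x = Uri.UnescapeDataString(new Uri(str).Segments[3]).TrimEnd('/'); // "Conf 1"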
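Because the URLs are not static, a fixed index such as Segments[3] only works while 'Photo Gallery' stays at the same depth. A minimal sketch, assuming the shared folder is always literally named Photo Gallery: walk Uri.Segments, unescape each one, and return whatever follows the marker (SegmentAfter is a hypothetical helper name).

```csharp
using System;

// Hypothetical helper: returns the unescaped segment right after `marker`,
// or null if the marker is missing or is the last segment.
static string? SegmentAfter(string url, string marker)
{
    // Uri.Segments keeps the escaped form and the trailing '/' ("Photo%20Gallery/"),
    // so each segment is unescaped and trimmed before comparing.
    string[] segments = new Uri(url).Segments;
    for (int i = 0; i < segments.Length - 1; i++)
    {
        string current = Uri.UnescapeDataString(segments[i]).TrimEnd('/');
        if (current == marker)
        {
            return Uri.UnescapeDataString(segments[i + 1]).TrimEnd('/');
        }
    }
    return null;
}

Console.WriteLine(SegmentAfter("http://test/mediacenter/Photo Gallery/Conf 3/Conf 4/1.jpg", "Photo Gallery"));
```

If the marker is directly followed by the file, this returns the file name (for example 1.jpg), so filter on the extension if that case can occur.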

This ought to do you:
var s = @"http://test/mediacenter/Photo Gallery/Conf 11/1.jpg";
var regex = new Regex(@"(Conf \d*)");
var match = regex.Match(s);
Console.WriteLine(match.Groups[0].Value); // Prints "Conf 11"
Of course, you'd have to be confident the 'Conf x' (where x is a number) wasn't going to be elsewhere in the URL.
This improves it slightly by also capturing multiple consecutive folders (Conf 3/Conf 4 in your example):
var regex = new Regex(@"((Conf \d*/*)+)");
It leaves the trailing / in the match, though.
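To avoid both the "Conf elsewhere in the URL" concern and the trailing slash, the pattern can be anchored on the shared folder instead of on the word Conf. A sketch, assuming every URL contains a literal, unescaped Photo Gallery folder as in the question (Extract is a hypothetical helper):

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: every URL contains the literal folder "/Photo Gallery/".
// [^/]+ stops at the next slash, so the result has no trailing '/', and requiring
// that slash means a file name directly after the folder is not returned.
var level = new Regex(@"/Photo Gallery/(?<level>[^/]+)/");

string? Extract(string url)
{
    Match m = level.Match(url);
    return m.Success ? m.Groups["level"].Value : null;
}

Console.WriteLine(Extract("http://test/mediacenter/Photo Gallery/Conf 3/Conf 4/1.jpg")); // Conf 3
```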

No need for regex.
string testCase = "http://test/mediacenter/Photo Gallery/Conf 1/1.jpg";
string urlBase = "http://test/mediacenter/Photo Gallery/";
if(!testCase.StartsWith(urlBase))
{
throw new Exception("URL supplied doesn't belong to base URL.");
}
Uri uriTestCase = new Uri(testCase);
Uri uriBase = new Uri(urlBase);
if(uriTestCase.Segments.Length > uriBase.Segments.Length)
{
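System.Console.Out.WriteLine(Uri.UnescapeDataString(uriTestCase.Segments[uriBase.Segments.Length]).TrimEnd('/')); // the segment is escaped ("Conf%201/"), so unescape it and drop the trailing '/'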
}
else
{
Console.Out.WriteLine("No child segment...");
}

Try a regex like this:
Conf[^\/]*
This should give you all the "Conf" parts of the URLs.
I hope that helps.
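With that pattern (in .NET the slash does not need escaping), Regex.Matches returns every Conf folder in one call. A small sketch; ConfParts is a hypothetical name:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Every run of characters starting with "Conf" up to the next '/'.
// Cast<Match>() is only needed on .NET Framework, where MatchCollection is non-generic.
string[] ConfParts(string url) =>
    Regex.Matches(url, @"Conf[^/]*").Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(", ", ConfParts("http://test/mediacenter/Photo Gallery/Conf 3/Conf 4/1.jpg")));
```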

Related

How to Extract Domain name from string with Regex in C#?

I want to extract top-level domain names and country-code top-level domain names from a string with Regex. I tested many Regexes, like this code:
var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
Match m = linkParser.Match(Url);
Console.WriteLine(m.Value);
But none of them did it properly.
The text entered by the user can take any of the following forms:
jonasjohn.com
http://www.jonasjohn.de/snippets/csharp/
jonasjohn.de
www.jonasjohn.de/snippets/csharp/
http://www.answers.com/article/1194427/8-habits-of-extraordinarily-likeable-people
http://www.apple.com
https://www.cnn.com.au
http://www.downloads.news.com.au
https://ftp.android.co.nz
http://global.news.ca
https://www.apple.com/
https://ftp.android.co.nz/
http://global.news.ca/
https://www.apple.com/
https://johnsmith.eu
ftp://johnsmith.eu
johnsmith.gov.ae
johnsmith.eu
www.jonasjohn.de
www.jonasjohn.ac.ir/snippets/csharp
http://www.jonasjohn.de/
ftp://www.jonasjohn.de/
https://subdomain.abc.def.jonasjohn.de/test.htm
The Regexes I tested:
^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/\n]+)
\b(?:https?://|www\.)\S+\b
://(?<host>([a-z\\d][-a-z\\d]*[a-z\\d]\\.)*[a-z][-a-z\\d]+[a-z])
and many others.
I just need the domain name and I don't need a protocol or a subdomain.
Like:
Domainname.gTLD or DomainName.ccTLD or DomainName.xyz.ccTLD
I got the list of them from the Public Suffix List.
Of course, I've seen a lot of posts on stackoverflow.com, but none of them answered my question.
You don't need a Regex to parse a URL. If you have a valid URL, you can use one of the Uri constructors or Uri.TryCreate to parse it:
if (Uri.TryCreate("http://google.com/asdfs", UriKind.RelativeOrAbsolute, out var uri))
{
Console.WriteLine(uri.Host);
}
www.jonasjohn.de/snippets/csharp/ and jonasjohn.de/snippets/csharp/ aren't valid URLs though. TryCreate can still parse them as relative URLs, but reading Host throws System.InvalidOperationException: This operation is not supported for a relative URI.
In that case you can use the UriBuilder class to parse and modify the URL, e.g.:
var bld=new UriBuilder("jonasjohn.com");
Console.WriteLine(bld.Host);
This prints
jonasjohn.com
Setting the Scheme property produces a valid, complete URL:
bld.Scheme="https";
Console.WriteLine(bld.Uri);
This produces:
https://jonasjohn.com:80/
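Note the :80 in that output: the builder keeps the port of the implied http scheme, and setting bld.Port = -1 asks for the scheme's default port instead. For the original question only Host is needed; a sketch that applies UriBuilder to a few of the listed inputs (HostOf is a hypothetical helper):

```csharp
using System;

// new UriBuilder(string) falls back to "http://" when the input has no scheme,
// so bare inputs such as "jonasjohn.com" still yield a Host.
string HostOf(string input) => new UriBuilder(input).Host;

foreach (var s in new[] { "jonasjohn.com", "www.jonasjohn.de/snippets/csharp/", "ftp://johnsmith.eu", "https://www.cnn.com.au" })
{
    Console.WriteLine($"{s} => {HostOf(s)}");
}
```

The host still contains subdomains such as www; stripping those down to the registrable domain needs the suffix list.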
Based on Lidqy's answer, I wrote this function, which I think covers most of the possible inputs; anything else you can treat as an exception.
// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
public static string ExtractDomainName(string url)
{
var regex = new Regex(@"^((https?|ftp)://)?(www\.)?(?<domain>[^/]+)(/|$)");
Match match = regex.Match(url);
if (!match.Success)
{
return String.Empty;
}
string domain = match.Groups["domain"].Value;
// Drop leading labels until at most two dots remain. This is a heuristic:
// "def.jonasjohn.de" and "global.news.ca" are left as they are.
int freq = domain.Count(c => c == '.');
while (freq > 2)
{
domain = domain.Split('.', 2)[1];
freq = domain.Count(c => c == '.');
}
return domain;
}
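Counting dots cannot tell def.jonasjohn.de (which should become jonasjohn.de) apart from android.co.nz (which should stay as it is), and it leaves global.news.ca unchanged. Since the question already mentions the Public Suffix List, a lookup against it handles these cases. A minimal sketch, assuming a tiny hand-picked subset of suffixes stands in for the full list (which also has wildcard and exception rules this ignores) and that the input is already a bare host, for example from UriBuilder.Host (RegistrableDomain is a hypothetical helper):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, hand-picked subset of the Public Suffix List.
var suffixes = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "com", "de", "eu", "ca", "ir", "ae", "au", "nz",
    "com.au", "co.nz", "gov.ae", "ac.ir",
};

// Returns the longest listed suffix plus one more label; null when the host
// is itself a suffix or no listed suffix matches.
string? RegistrableDomain(string host)
{
    string[] labels = host.Split('.');
    // Candidates go from the whole host down to the last label, so the longest suffix wins.
    for (int start = 0; start < labels.Length; start++)
    {
        string candidate = string.Join(".", labels, start, labels.Length - start);
        if (suffixes.Contains(candidate))
        {
            return start == 0 ? null : string.Join(".", labels, start - 1, labels.Length - start + 1);
        }
    }
    return null;
}

Console.WriteLine(RegistrableDomain("subdomain.abc.def.jonasjohn.de")); // jonasjohn.de
```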
var rx = new Regex(@"^((https?|ftp)://)?(www\.)?(?<domain>[^/]+)(/|$)");
var data = new[] { "jonasjohn.com",
"http://www.jonasjohn.de/snippets/csharp/",
"jonasjohn.de",
"www.jonasjohn.de/snippets/csharp/",
"http://www.answers.com/article/1194427/8-habits-of-extraordinarily-likeable-people",
"http://www.apple.com",
"https://www.cnn.com.au",
"http://www.downloads.news.com.au",
"https://ftp.android.co.nz",
"http://global.news.ca",
"https://www.apple.com/",
"https://ftp.android.co.nz/",
"http://global.news.ca/",
"https://www.apple.com/",
"https://johnsmith.eu",
"ftp://johnsmith.eu",
"johnsmith.gov.ae",
"johnsmith.eu",
"www.jonasjohn.de",
"www.jonasjohn.ac.ir/snippets/csharp",
"http://www.jonasjohn.de/",
"ftp://www.jonasjohn.de/",
"https://subdomain.abc.def.jonasjohn.de/test.htm"
};
foreach (var dat in data) {
var match = rx.Match(dat);
if (match.Success)
Console.WriteLine("{0} => {1}", dat, match.Groups["domain"].Value);
else {
Console.WriteLine("{0} => NO MATCH", dat);
}
}

Extract ID and replace everything in `Example HTML`

I'm new to regular expressions. I have the following text in my HTML and would like to replace it with something else.
Example HTML:
{{Object id='foo'}}
Extract the id into a variable like this:
string strId = "foo";
So far I have the following Regular Expression code that will capture the Example HTML:
string strStart = "Object";
string strFind = "{{(" + strStart + ".*?)}}";
Regex regExp = new Regex(strFind, RegexOptions.IgnoreCase);
Match matchRegExp = regExp.Match(html);
while (matchRegExp.Success)
{
//At this point, I have this variable:
//{{Object id='foo'}}
//I can find the id='foo' (see below)
//but not sure how to extract 'foo' and use it
string strFindInner = "id='(.*?)'"; //"{{Slider";
Regex regExpInner = new Regex(strFindInner, RegexOptions.IgnoreCase);
Match matchRegExpInner = regExpInner.Match(matchRegExp.Value.ToString());
//Do something with 'foo'
matchRegExp = matchRegExp.NextMatch();
}
I understand this might have a simple solution. I am hoping to learn more about regular expressions, but more importantly I am hoping for a suggestion on how to approach this more cleanly and efficiently.
Thank you
Edit:
Is this an example that I could potentially use: c# regex replace
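Inside the question's loop, 'foo' is already available as matchRegExpInner.Groups[1].Value, since group 1 is the (.*?) in id='(.*?)'. A cleaner route is a single pattern with a named group plus Regex.Replace with a MatchEvaluator, which replaces every token in one pass. A sketch, where RenderObject is a hypothetical stand-in for the database lookup:

```csharp
using System;
using System.Text.RegularExpressions;

// One pattern finds the whole {{Object id='...'}} token and captures the id.
// The braces are escaped so they are always read as literal characters.
var token = new Regex(@"\{\{Object\s+id='(?<id>[^']*)'\}\}", RegexOptions.IgnoreCase);

// Hypothetical stand-in for the database lookup that builds the replacement HTML.
string RenderObject(string id) => $"<p id=\"{id}\">This is the replacement text</p>";

string html = "<p>Start of Example</p>{{Object id='foo'}}<p>End of example</p>";

// The evaluator runs once per match, so every token in the page is replaced in one pass.
string result = token.Replace(html, m => RenderObject(m.Groups["id"].Value));
Console.WriteLine(result);
```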
I did not end up solving my initial question with regular expressions; for the time being I moved to a simpler solution using Substring, IndexOf and string.Split. I understand that my code needs to be cleaned up, but I thought I would post the answer I have so far.
string html = "<p>Start of Example</p>{{Object id='foo'}}<p>End of example</p>"
string strObject = "Slider"; //Example
//When found, this will contain "{{Object id='foo'}}"
string strCode = "";
//ie: "id='foo'"
string strCodeInner = "";
//Tags will be a list, but in this example, only "id='foo'"
string[] tags = { };
//Looking for the following "{{Object "
string strFindStart = "{{" + strObject + " ";
int intFindStart = html.IndexOf(strFindStart);
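//Then ending in the following, searched only after the start so an earlier "}}" is ignored
string strFindEnd = "}}";
int intFindEnd = intFindStart == -1 ? -1 : html.IndexOf(strFindEnd, intFindStart);
//Must find both Start and End conditions
if (intFindStart != -1 && intFindEnd != -1)
{
intFindEnd += strFindEnd.Length;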
strCode = html.Substring(intFindStart, intFindEnd - intFindStart);
//Remove Start and End
strCodeInner = strCode.Replace(strFindStart, "").Replace(strFindEnd, "");
//Split by spaces, this needs to be improved if more than IDs are to be used
//but for proof of concept this is perfect
tags = strCodeInner.Split(new char[] { ' ' });
}
Dictionary<string, string> dictTags = new Dictionary<string, string>();
foreach (string tag in tags)
{
string[] tagSplit = tag.Split(new char[] { '=' });
dictTags.Add(tagSplit[0], tagSplit[1].Replace("'", "").Replace("\"", ""));
}
//At this point, I can replace "{{Object id='foo'}}" with anything I'd like
//What I don't show is that I go into the website's database,
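//get the object (ie: Slider) and return the html for slider with the ID of foo
if (strCode != "")
{
//Placeholder for the database lookup that produces the view
string strView = "<p id=\"" + dictTags["id"] + "\">This is the replacement text</p>";
html = html.Replace(strCode, strView);
}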
/*
"html" variable may contain:
<p>Start of Example</p>
<p id="foo">This is the replacement text</p>
<p>End of example</p>
*/
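The split-on-spaces step is flagged in the code comment as needing work once more than id is used. A sketch that parses name='value' pairs with a regex instead, assuming values are wrapped in single or double quotes (ParseTags is a hypothetical helper):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Matches name='value' or name="value"; \1 requires the same quote to close the value,
// so quoted values may contain spaces.
var attribute = new Regex(@"(?<name>\w+)\s*=\s*(['""])(?<value>.*?)\1");

Dictionary<string, string> ParseTags(string inner)
{
    var tags = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (Match m in attribute.Matches(inner))
    {
        // Later duplicates overwrite earlier ones instead of throwing like Add would.
        tags[m.Groups["name"].Value] = m.Groups["value"].Value;
    }
    return tags;
}

var parsed = ParseTags("id='foo' title=\"Hello world\"");
Console.WriteLine(parsed["title"]); // Hello world
```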

The fastest way to trim string in C#

I need to trim the paths in a million strings like this:
C:\workspace\my_projects\my_app\src\my_component\my_file.cpp
to
src\my_component\my_file.cpp
I.e. remove the absolute part of the path. What is the fastest way to do that?
My attempt using regex:
path = Regex.Replace(path, @"^.*?\\src\\", @"src\"); // \\ is a literal backslash; a bare \s would mean whitespace
I wouldn't use regex for this; use plain old string methods.
If the path prefix is always the same:
const string partToRemove = @"C:\workspace\my_projects\my_app\";
if (path.StartsWith(partToRemove, StringComparison.OrdinalIgnoreCase))
path = path.Substring(partToRemove.Length);
If the prefix is variable, you can get the last index of \src\:
var startIndex = path.LastIndexOf(@"\src\", StringComparison.OrdinalIgnoreCase);
if (startIndex >= 0)
path = path.Substring(startIndex + 1);
Define the regex once with new and reuse it;
there is a (significant) cost to creating a regex.
string input = "This is text with far too much " +
"whitespace.";
string pattern = "\\s+";
string replacement = " ";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
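Applied to the path question, that means building the Regex once (in a real class, a static readonly field) and reusing it for every path. A sketch, assuming every path to trim contains a \src\ folder; RegexOptions.Compiled trades slower construction for faster matching, which usually pays off over a million inputs (TrimToSrc is a hypothetical helper):

```csharp
using System;
using System.Text.RegularExpressions;

// Built once and reused for every call.
// "^.*?\\(?=src\\)" lazily consumes everything up to the backslash before the first "src\".
var srcPrefix = new Regex(@"^.*?\\(?=src\\)", RegexOptions.Compiled | RegexOptions.IgnoreCase);

string TrimToSrc(string path) => srcPrefix.Replace(path, string.Empty);

Console.WriteLine(TrimToSrc(@"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp"));
```

Paths without a \src\ folder are returned unchanged.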
I'm not sure if you need speed here, but if the prefix is always the same, you could do a simple .Substring():
var path = @"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
Console.WriteLine(path.Substring(32)); // 32 is the length of C:\workspace\my_projects\my_app\
However, I think you should sanitize your input first; in this case, the Uri class could do the parsing step:
var root = @"C:\workspace\my_projects\my_app\";
var path = @"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
var relative = new Uri(root).MakeRelativeUri(new Uri(path));
Console.WriteLine(relative.OriginalString.Replace("/", "\\"));
Note that Uri replaces each \ with /, which is why the .Replace is needed.
I can't think of anything faster than this:
path = path.Substring(32);
What comes before src is constant, and src starts at index 32:
C:\workspace\my_projects\my_app\src\my_component\my_file.cpp
                                ^
However, if it's not always constant, you can find the index once and do the rest inside the loop:
int startInd = path.IndexOf(@"\src\") + 1;
// Do this inside the loop, 1 million times (assumes every path has the same prefix)
var trimmed = path.Substring(startInd);
If your files will all end in "src/filename.ext", you could use the Path class in the .NET Framework and get around all the caveats that come with paths and file names:
result = @"src\" + Path.GetFileName(path);
That said, you should first double-check that the conversion is actually what takes too long.
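A quick way to check is to time the trimming on its own with Stopwatch. A rough sketch, assuming a synthetic array of one million copies of the sample path; it uses the LastIndexOf/Substring approach from the first answer, and a single Stopwatch run is only a ballpark figure (a tool such as BenchmarkDotNet gives more reliable numbers):

```csharp
using System;
using System.Diagnostics;

// Synthetic input: one million copies of the sample path (real paths come from your data).
const string sample = @"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
var paths = new string[1_000_000];
Array.Fill(paths, sample);

var results = new string[paths.Length];
var sw = Stopwatch.StartNew();
for (int i = 0; i < paths.Length; i++)
{
    // Find the last "\src\" and keep everything from "src" onwards.
    int idx = paths[i].LastIndexOf(@"\src\", StringComparison.Ordinal);
    results[i] = idx >= 0 ? paths[i].Substring(idx + 1) : paths[i];
}
sw.Stop();
Console.WriteLine($"{paths.Length:N0} paths trimmed in {sw.ElapsedMilliseconds} ms");
```

If this loop is fast compared with the whole job, the time is going somewhere else (reading or writing the strings, for example).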

Parsing Image Path in ImageResizer

I'm resizing an image dynamically thus:
ImageJob i = new ImageJob(file, "~/eventimages/<guid>_<filename:A-Za-z0-9>.<ext>",
new ResizeSettings("width=200&height=133&format=jpg&crop=auto"));
i.Build();
I'm attempting to store the image's relative URL in the DB. The i.FinalPath property gives me:
C:\inetpub\wwwroot\Church\eventimages\56b640bff5ba43e8aa161fff775c5f97_scenery.jpg
How can I obtain just the image's relative path? What is the best way to parse this?
Desired string: /eventimages/56b640bff5ba43e8aa161fff775c5f97_scenery.jpg
Something like this:
var sitePath = MapPath(@"~");
var relativePath = i.FinalPath.Replace(sitePath, "~");
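The same idea works as a plain string operation: strip the site's physical root and switch to forward slashes. A sketch, assuming the root would come from something like MapPath("~") and that FinalPath always lies under it (ToSiteUrl is a hypothetical helper):

```csharp
using System;

// Hypothetical helper: maps a physical path under siteRoot to a root-relative URL.
string ToSiteUrl(string physicalPath, string siteRoot)
{
    // Normalise the root so it ends with exactly one backslash.
    string root = siteRoot.TrimEnd('\\') + "\\";
    if (!physicalPath.StartsWith(root, StringComparison.OrdinalIgnoreCase))
    {
        throw new ArgumentException("Path is not under the site root.", nameof(physicalPath));
    }
    // Keep a leading slash and switch to URL separators.
    return "/" + physicalPath.Substring(root.Length).Replace('\\', '/');
}

Console.WriteLine(ToSiteUrl(@"C:\inetpub\wwwroot\Church\eventimages\56b640bff5ba43e8aa161fff775c5f97_scenery.jpg", @"C:\inetpub\wwwroot\Church\"));
```

If the application runs under a virtual directory, that directory's name would still have to be prepended to get a site-absolute URL.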
Just use regular expressions (Regex.Match): create your pattern and extract the desired value.
string input = "C:\\inetpub\\wwwroot\\Church\\eventimages\\56b640bff5ba43e8aa161fff775c5f97_scenery.jpg";
Match match = Regex.Match(input, @"^C:\\[A-Za-z0-9_]+\\[A-Za-z0-9_]+\\[A-Za-z0-9_]+\\([A-Za-z0-9_]+\\[A-Za-z0-9_]+\.jpg)$", RegexOptions.IgnoreCase);
if (match.Success)
{
// Finally, we get the Group value, add the leading slash and switch to URL separators.
string path = "/" + match.Groups[1].Value.Replace("\\", "/");
}
Here is what I use in a utility method:
Uri uri1 = new Uri(i.FinalPath);
Uri uri2 = new Uri(HttpContext.Current.Server.MapPath("/"));
Uri relativeUri = uri2.MakeRelativeUri(uri1);
(stolen from someone else... can't remember who, but thanks)

Regex required for renaming file in C#

I need a regex for renaming files in C#. My file name is 22px-Flag_Of_Sweden.svg.png, and I want to rename it to sweden.png.
So I need a regex for that. Please help me.
I have more than 300 files like the ones below:
22px-Flag_Of_Sweden.svg.png - should become sweden.png
13px-Flag_Of_UnitedStates.svg.png - unitedstates.png
17px-Flag_Of_India.svg.png - india.png
22px-Flag_Of_Ghana.svg.png - ghana.png
These are country flags. I want to extract countryname.fileextension, that's all.
var fileNames = new [] {
"22px-Flag_Of_Sweden.svg.png"
,"13px-Flag_Of_UnitedStates.svg.png"
,"17px-Flag_Of_India.svg.png"
,"22px-Flag_Of_Ghana.svg.png"
,"asd.png"
};
var regEx = new Regex(@"^.+Flag_Of_(?<country>.+)\.svg\.png$");
foreach ( var fileName in fileNames )
{
if ( regEx.IsMatch(fileName))
{
var newFileName = regEx.Replace(fileName,"${country}.png").ToLower();
//File.Move(Path.Combine(root, fileName), Path.Combine(root, newFileName));
}
}
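A variant that keeps whatever extension follows .svg and returns null for names that don't fit. A sketch, assuming all files follow the <n>px-Flag_Of_<Country>.svg.<ext> pattern from the question; NewName is a hypothetical helper and the commented File.Move lines use a hypothetical folder variable:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed naming scheme: "<n>px-Flag_Of_<Country>.svg.<ext>".
var flag = new Regex(@"^\d+px-Flag_Of_(?<country>[^.]+)\.svg(?<ext>\.\w+)$", RegexOptions.IgnoreCase);

// Returns the lowercase "<country><ext>" name, or null when the name doesn't fit the scheme.
string? NewName(string fileName)
{
    Match m = flag.Match(fileName);
    return m.Success ? (m.Groups["country"].Value + m.Groups["ext"].Value).ToLowerInvariant() : null;
}

// To rename on disk (folder is a hypothetical variable):
// string? target = NewName(oldName);
// if (target != null) File.Move(Path.Combine(folder, oldName), Path.Combine(folder, target));

Console.WriteLine(NewName("22px-Flag_Of_Sweden.svg.png")); // sweden.png
```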
I am not exactly sure how this would look in C# (although the regex is what matters here, not the language), but in Java it would look like this:
String input = "22px-Flag_Of_Sweden.svg.png";
Pattern p = Pattern.compile(".+_(.+?)\\..+?(\\..+?)$");
Matcher m = p.matcher(input);
System.out.println(m.matches());
System.out.println(m.group(1).toLowerCase() + m.group(2));
The relevant part for you is this:
".+_(.+?)\\..+?(\\..+?)$"
Just concatenate the two groups.
I wish I knew a bit of C# right now :)
Cheers Eugene.
This will return country in the first capture group: ([a-zA-Z]+)\.svg\.png$
I don't know C#, but the regex could be:
^.+_(\p{L}+)\.svg\.png
and the replacement part is: $1.png
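In .NET, Unicode categories must use the braced form (\p{L}; the short \pL is rejected), and $1 in the replacement refers to the first group. A C# sketch of this answer, with ToLowerInvariant() added to get the lowercase names the question asks for (Rename is a hypothetical helper):

```csharp
using System;
using System.Text.RegularExpressions;

// \p{L}+ is one or more Unicode letters; "$1" in the replacement is the first group.
// Names that don't match are returned unchanged apart from lowercasing.
string Rename(string fileName) =>
    Regex.Replace(fileName, @"^.+_(\p{L}+)\.svg\.png$", "$1.png").ToLowerInvariant();

foreach (var name in new[] { "22px-Flag_Of_Sweden.svg.png", "13px-Flag_Of_UnitedStates.svg.png" })
{
    Console.WriteLine($"{name} -> {Rename(name)}");
}
```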
