Using .NET RegEx to retrieve part of a string after the second '-' - c#

This is my first stack message. Hope you can help.
I have several strings I need to break up for use later. Here are a couple of examples of what I mean:
fred-064528-NEEDED
frederic-84728957-NEEDED
sam-028-NEEDED
As you can see above, the string lengths vary greatly, so I believe regex is the only way to achieve what I want. What I need is the rest of the string after the second hyphen ('-').
I am very weak at regex, so any help would be great.
Thanks in advance.

Just to offer an alternative without using regex:
foreach (string s in list)
{
    // Index of the last hyphen; everything after it is the part we want.
    int x = s.LastIndexOf('-');
    string sub = s.Substring(x + 1);
}
Add validation to taste.
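Note that LastIndexOf gives the text after the last hyphen, which is the same as "after the second hyphen" only when the string contains exactly two. A minimal sketch of a validated variant, assuming you want everything after the second hyphen and null when there are fewer than two (the helper name AfterSecondHyphen is made up for illustration):

```csharp
using System;

// Hypothetical helper: returns everything after the second '-',
// or null when the input contains fewer than two hyphens.
static string AfterSecondHyphen(string s)
{
    int first = s.IndexOf('-');
    if (first < 0) return null;

    int second = s.IndexOf('-', first + 1);
    if (second < 0) return null;

    return s.Substring(second + 1);
}

Console.WriteLine(AfterSecondHyphen("fred-064528-NEEDED"));        // NEEDED
Console.WriteLine(AfterSecondHyphen("frederic-84728957-NEE-DED")); // NEE-DED
```

Whether null, an empty string, or an exception is the right answer for malformed input depends on where the strings come from.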

Something like this. It will take anything (except line breaks) after the second '-', including any further '-' signs.
var exp = @"^\w*-\w*-(.*)$";
var match = Regex.Match("frederic-84728957-NEE-DED", exp);
if (match.Success)
{
var result = match.Groups[1]; //Result is NEE-DED
Console.WriteLine(result);
}
EDIT: I answered another question which relates to this. Except, it asked for a LINQ solution and my answer was the following which I find pretty clear.
Pimp my LINQ: a learning exercise based upon another post
var result = String.Join("-", inputData.Split('-').Skip(2));
or
var result = inputData.Split('-').Skip(2).FirstOrDefault(); //If the last part is NEE-DED then only NEE is returned.
As mentioned in the other SO thread it is not the fastest way of doing this.

If they are part of larger text:
(\w+-){2}(\w+)
If they are presented as whole lines, and you know you don't have other hyphens, you may also use:
[^-]*$
Another option, if you have each line as a string, is to use split (again, depending on whether or not you're expecting extra hyphens, you may omit the count parameter, or use LastIndexOf):
string[] tokens = line.Split("-".ToCharArray(), 3);
string s = tokens.Last();
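To make the effect of the count parameter concrete, here is a small sketch (the helper name LastOfThree is hypothetical). With a count of 3, Split stops after producing three pieces, so any extra hyphens stay in the last one:

```csharp
using System;
using System.Linq;

// Hypothetical helper: split into at most three pieces and keep the last.
static string LastOfThree(string line)
{
    string[] tokens = line.Split(new[] { '-' }, 3);
    return tokens.Last();
}

Console.WriteLine(LastOfThree("sam-028-NEEDED"));            // NEEDED
Console.WriteLine(LastOfThree("frederic-84728957-NEE-DED")); // NEE-DED
```

If the input has fewer than two hyphens, this simply returns the last piece that exists, so check tokens.Length first if that matters to you.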

This should work:
.*?-.*?-(.*)

This should do the trick:
([^\-]+)\-([^\-]+)\-(.*?)$

The regex pattern will be
(?<first>.*)?-(?<second>.*)?-(?<third>.*)?(\s|$)
Then you can get the named group "third" to get the text after the 2nd hyphen.
Alternatively, you can do a string.Split('-') and get the item at index 2 from the array.
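In a pattern of this shape, the text after the second hyphen lands in the third group. A rough sketch of reading it, using a slightly tightened variant of the pattern (the [^-]* parts and the ThirdPart helper are my own additions, not part of the answer above):

```csharp
using System;
using System.Text.RegularExpressions;

// Variant of the pattern above: [^-]* keeps each of the first two groups
// from running past a hyphen, so "third" gets everything after the second one.
var pattern = new Regex(@"^(?<first>[^-]*)-(?<second>[^-]*)-(?<third>.*)$");

// Hypothetical helper: returns the "third" group, or null when there is no match.
static string ThirdPart(Regex regex, string input)
{
    Match m = regex.Match(input);
    return m.Success ? m.Groups["third"].Value : null;
}

Console.WriteLine(ThirdPart(pattern, "fred-064528-NEEDED")); // NEEDED
```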

Related

Regex : Replace text between semicolons a certain amount of times

i'm a bit confused with regex, i have a line which looks like something like this :
test = "article;vendor;qty;desc;price1;price2"
and what i'm trying to do is to only get price1.
I'm currently using this function :
Regex.Replace(test, @".*;[^;]*;", "");
which permit me to get price2 but I can't see how I can isolate price1.
Have you considered just using a String.Split() call instead to break your current semicolon-delimited string into an array:
var input = "article;vendor;qty;desc;price1;price2";
var output = input.Split(';');
And then you could simply access your value by its index :
var result = output[4]; // yields "price1"
You will only want to use a Regular Expression if there is a specific pattern that you can use to match and select exactly what you are looking for, but for delimited lists, the String.Split() method will usually make things easier (especially if there is nothing to uniquely identify the item you are trying to pull from the list).
Use the following regex:
(?:[^;]*;){4}([^;]*);
And replace the first match group.
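As a hedged illustration of how that pattern picks out the fifth field, here is a sketch that reads the capture group with Regex.Match instead of replacing (the leading ^ anchor is my addition, so the count starts at the beginning of the string):

```csharp
using System;
using System.Text.RegularExpressions;

var test = "article;vendor;qty;desc;price1;price2";

// Skip four "field;" chunks, then capture the fifth field.
Match m = Regex.Match(test, @"^(?:[^;]*;){4}([^;]*);");
string price1 = m.Success ? m.Groups[1].Value : null;

Console.WriteLine(price1); // price1
```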

How to remove a pattern from a string using Regex

I want to find paths from a string and remove them, e.g.:
string1 = "'c:\a\b\c'!MyUDF(param1, param2,..) + 'c:\a\b\c'!MyUDF(param3, param4,..)..."
I'd like a regex to find the pattern '[some path]'!MyUDF, and remove '[path]'.
Thanks.
Edit:
Example input:
string1 = "'c:\a\b\c'!MyUDF(param1, param2,..) + 'c:\a\b\c'!MyUDF(param3, param4,..)";
Expected output: "MyUDF(param1, param2,...) + MyUDF(param3, param4,...)"
where MyUDF is a function name, so it consists of only letters
input=Regex.Replace(input,"'[^']+'(?=!MyUDF)","");
In case if the path is followed by ! and some other word you can use
input=Regex.Replace(input,@"'[^']+'(?=!\w+)","");
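One detail worth checking: with the ! inside the lookahead, the ! itself stays in the result, while the expected output in the question has no !. A small sketch of a variant that consumes the ! and leaves only the function name to the lookahead (the shortened input string is made up for the demo):

```csharp
using System;
using System.Text.RegularExpressions;

var input = @"'c:\a\b\c'!MyUDF(param1, param2) + 'c:\a\b\c'!MyUDF(param3, param4)";

// Remove a quoted path plus the "!" that follows it, but only when a word
// (the function name) comes next; the lookahead leaves that name in place.
var result = Regex.Replace(input, @"'[^']+'!(?=\w+)", "");

Console.WriteLine(result); // MyUDF(param1, param2) + MyUDF(param3, param4)
```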
Alright, if the ! is always in the string as you suggest, this Regex !(.*)?\( will get you what you want. Here is a Regex 101 to prove it.
To use it, you might do something like this:
var result = Regex.Replace(myString, @"!(.*)?\(");
The feature you want, if you are dealing with file paths, is in System.IO.Path.
There are many methods there, but that is one of its specific purposes.

String-parsing-fu: Can you help me find a way to retrieve this value?

I need to somehow detect if there is a parent OU value, and if there is retrieve it.
For example, here there is no parent:
LDAP://servera/OU=Santa Cruz,DC=contoso,DC=com
But here, there is a parent:
LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com
So I would need to retrieve that "Ventas" string.
Another example:
LDAP://servera/OU=Contabilidad,OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com
I would need to retrieve that "Ventas" string as well.
Any suggestions on how to tackle this?
string ldap = "LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com";
Match match = Regex.Match(ldap, @"LDAP://\w+/OU=(?<toplevelou>\w+?),OU=");
if(match.Success)
{
Console.WriteLine(match.Result("${toplevelou}"));
}
I'd find the first occurrence of OU=... and get its value. Then I'd check if there was another occurrence after it. If so, return the value I've got. If not, return whatever it is you want if there's no parent (String.Empty, or null, or whatever).
You could also use a regular expression like this:
var regex = new Regex(@"OU=(.*?),");
var matches = regex.Matches(ldapString);
Then check how many matches there are. If >1 return the captured value from the first match.
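Put together, the count-the-matches idea could look roughly like this. It is only a sketch: the helper name FirstOuIfNested is hypothetical, and returning null for "no parent" is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper following the description above: collect every "OU=...,"
// and return the first captured value only if more than one was found.
static string FirstOuIfNested(string ldapString)
{
    MatchCollection matches = Regex.Matches(ldapString, @"OU=(.*?),");
    return matches.Count > 1 ? matches[0].Groups[1].Value : null;
}

Console.WriteLine(FirstOuIfNested("LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com")); // Ventas
Console.WriteLine(FirstOuIfNested("LDAP://servera/OU=Santa Cruz,DC=contoso,DC=com") ?? "(no parent)"); // (no parent)
```

For the three-level example in the question this returns "Contabilidad", because it always takes the first match. If you want "Ventas" there, take matches[matches.Count - 2] instead.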
Update
The regex above needs to be improved to allow the case where there's an escaped comma (\,) in the LDAP string. Maybe something like:
var regex = new Regex(@"OU=((.*?(\\\,)+?)+?),");
That may be broken, and there may be a simpler way to do the same thing. I'm not a regex wizard.
Another Update
Per Kimberly's comment below the regex should be @"OU=((?:.*?(?:\\\,)*?)+?),".
Call me crazy, but I'd do it this way (hey ma, look, a one-liner!):
var str = "LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com";
var result = str.Substring(str.LastIndexOf('/') + 1).Split(',')
.Select(s => s.Split('='))
.Where(a => a[0] == "OU")
.Select(a => a[1])
.Reverse().Skip(1).FirstOrDefault();
result is either null or has the string you want. This will work no matter how many OUs are in there and return the second-to-last one, as long as the format of the string is valid to begin with.
Update: possible improvements:
The above will not work correctly if your DN contains an escaped forward slash or an escaped comma.
To fix both of these you need to use regular expressions. Change:
str.Substring(str.LastIndexOf('/') + 1).Split(',')
to:
Regex.Split(Regex.Split(str, "(?<!\\\\)/").Last(), "(?<!\\\\),")
What this does is separate out the DN by taking the last part of str after splitting on forward slashes, and then split the DN into parts by splitting on commas. In both cases, negative lookbehind is used to make sure that the slashes/commas are not escaped.
Not as pretty, I know. But it's still a one-liner (yay!) and it still allows you to use LINQ further down to handle multiple OUs any way you choose to.

C# Regular expression problem

I have the following string:
http://www.powerwXXe.com/text1 123-456 text2 text3/
Can someone give me advice on how to get the value of text1, text2 and text3 and put them into a string. I have heard of regular expressions but have no idea how to use them.
Instead of going the RegEx route, if you know that the string will always be of a similar format, you can use string.Split, first on /, then on space, and retrieve the results from the resulting string arrays.
string[] slashes = myString.Split('/');
string[] textVals = slashes[3].Split(' ');
// at this point:
// textVals[0] = "text1"
// textVals[1] = "123-456"
// textVals[2] = "text2"
// textVals[3] = "text3"
Here is a link on getting started with regular expressions in C#: Regular Expression Tutorial
I don't think it is appropriate to write out a tutorial here since the information is online, so please check out the link and let me know if you have a specific question.
Instead of using regex, you can use string.Format("http://myurl.com/{0}{1}{2}", value1, textbox2.Text, textbox3.Text) and format the URL in whatever fashion. If you are looking to go the regex route, you can always check regexlib.
The use of regular expressions relies on patterns you see in your strings - you need to be able to generalize the pattern of strings you're looking for before you can use a regular expression.
For a problem of this scope, if you can pin down the pattern, you're probably better off using other string parsing methods, such as String.IndexOf and String.Split.
Regular expressions are a powerful tool, and certainly worth learning, but they might not be necessary here.
Based on the example you gave, it looks as though text1, text2 and text3 are separated by spaces? If so, and if you always know the positions they'll be in, you may want to skip regular expressions and just use .Split(' ') to split the string into an array of strings and then grab the pertinent items from there. Something like this:
string foo = "http://www.powerwXXe.com/text1 123-456 text2 text3/";
string[] fooParts = foo.Split(' ');
string text1 = fooParts[0].Replace("http://www.powerwXXe.com/", "");
string text2 = fooParts[2];
string text3 = fooParts[3].Replace("/", "");
You'd want to perform bounds checking on the string[] before trying to grab anything from it, but this would work. Regex is awesome for string parsing, but when it's simple stuff you need to do, sometimes it's overkill when simple methods from the string class will do.
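The bounds checking mentioned above might look something like this. It is a sketch only: the TryParseParts name and the choice to return false rather than throw are assumptions.

```csharp
using System;

// Hypothetical helper: returns false instead of throwing when the input
// does not split into the expected four space-separated parts.
static bool TryParseParts(string foo, out string text1, out string text2, out string text3)
{
    text1 = text2 = text3 = null;
    string[] fooParts = foo.Split(' ');
    if (fooParts.Length < 4) return false;

    text1 = fooParts[0].Replace("http://www.powerwXXe.com/", "");
    text2 = fooParts[2];
    text3 = fooParts[3].Replace("/", "");
    return true;
}

if (TryParseParts("http://www.powerwXXe.com/text1 123-456 text2 text3/", out var a, out var b, out var c))
    Console.WriteLine($"{a}, {b}, {c}"); // text1, text2, text3
```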
It all depends on how much you know about the string you are parsing. Where does the string come from and how much do you know about its formatting?
Based on your example string you could get away with something as simple as
string pattern = @"http://www.powerwXXe.com/(?<myGroup1>\S+)\s\S+\s(?<myGroup2>\S+)\s(?<myGroup3>\S+)/";
var reg = new System.Text.RegularExpressions.Regex(pattern);
string input = "http://www.powerwXXe.com/text1 123-456 text2 text3/";
System.Text.RegularExpressions.Match myMatch = reg.Match(input);
The captured strings would then be contained in myMatch.Groups["myGroup1"], myMatch.Groups["myGroup2"] and myMatch.Groups["myGroup3"] respectively.
This however assumes that your string always begins with http://www.powerwXXe.com/, that there will always be three groups to capture and that the groups are separated by a space (which is an illegal character in URLs and would in almost all cases be converted to %20, which would have to be accounted for in the pattern).
So, how much do you know about your string? And, as some have already stated, do you really need regular expressions?

Extract substring from string with Regex

Imagine that users are inserting strings in several computers.
On one computer, the pattern in the configuration will extract some characters of that string, let's say positions 4 to 5.
On another computer, the extract pattern will return other characters, for instance, last 3 positions of the string.
These configurations (the Regex patterns) are different for each computer, and should be available for change by the administrator, without having to change the source code.
Some examples:
User    Original_String   Return_Value
User1   abcd78defg123     78
User2   abcd78defg123     78g1
User3   mm127788abcd      12
User4   123456pp12asd     ppsd
Can it be done with Regex?
Thanks.
Why do you want to use regex for this? What is wrong with:
string foo = s.Substring(4,2);
string bar = s.Substring(s.Length-3,3);
(you can wrap those up to do a bit of bounds-checking on the length easily enough)
If you really want, you could wrap it up in a Func<string,string> to put somewhere - not sure I'd bother, though:
Func<string, string> get4and5 = s => s.Substring(4, 2);
Func<string,string> getLast3 = s => s.Substring(s.Length - 3, 3);
string value = "abcd78defg123";
string foo = getLast3(value);
string bar = get4and5(value);
If you really want to use regex:
^....(..)
And:
.*(...)$
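For comparison with the Substring calls, here is a quick sketch of reading the captures in C#. The first pattern skips four characters so that the capture lines up with Substring(4, 2); the second drops the leading .* since the $ anchor is enough.

```csharp
using System;
using System.Text.RegularExpressions;

string value = "abcd78defg123";

// Skip four characters, then capture the next two (same as Substring(4, 2)).
string middle = Regex.Match(value, @"^.{4}(..)").Groups[1].Value;

// Capture the last three characters (same as Substring(Length - 3, 3)).
string tail = Regex.Match(value, @"(...)$").Groups[1].Value;

Console.WriteLine(middle); // 78
Console.WriteLine(tail);   // 123
```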
To have a regex capture values for further use, you typically use (). In .NET, as in most regex flavors, parentheses create capture groups, whereas [] defines a character class.
Example
User4 - 123456pp12asd ppsd
is most interesting in that you have here 2 separate capture areas. Is there some default rule on how to join them together, or would you then want to be able to specify how to make the result?
Perhaps something like
r/......(..)...(..)/\1\2/ for ppsd
r/......(..)...(..)/\2-\1/ for sd-pp
Do you want to run a regex to get the captures and handle them yourself, or do you want to run more advanced manipulation commands?
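In .NET, one way to get this "pattern plus join rule" behaviour is to store both a pattern and a replacement template per machine and expand the template with Match.Result. A minimal sketch, assuming the two strings come from configuration (the Extract helper and the variable names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical per-machine settings; in practice these two strings would be
// read from a configuration file rather than hard-coded.
string pattern = @"^.{6}(..).{3}(..).*$";
string template = "$1$2";

// Hypothetical helper: returns the expanded template, or null on no match.
static string Extract(string input, string regexPattern, string replacement)
{
    Match m = Regex.Match(input, regexPattern);
    // Result() expands $1, $2, ... in the replacement from the match's groups.
    return m.Success ? m.Result(replacement) : null;
}

Console.WriteLine(Extract("123456pp12asd", pattern, template)); // ppsd
Console.WriteLine(Extract("123456pp12asd", pattern, "$2-$1"));  // sd-pp
```

This keeps the administrator-editable part down to two strings, at the cost of the administrator needing to know .NET's $1-style substitution syntax.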
I'm not sure what you are hoping to get by using RegEx. RegEx is used for pattern matching. If you want to extract based on position, just use substring.
It seems to me that Regex really isn't the solution here. To return a section of a string beginning at position pos (starting at 0) and of length length, you simply call the Substring function as such:
string section = str.Substring(pos, length);
Grouping. You could match on /^.{3}(.{2})/ and then look at group $1 for example.
The question is why? Normal string handling i.e. actual substring methods are going to be faster and clearer in intent.
