Writing a regex command to split an array in c# - c#

Regex is one of those things I've wanted to be able to write myself, and although I have a basic understanding of how it works, I've never found myself in a situation where I needed a pattern that isn't already widely available on the web (such as for validating email addresses).
A problem that I have is that I am receiving a string which is comma separated, however some of the string values contain commas also. For example I might receive:
$COMMAND=1,2,3,"string","another,string",4,5,6
Generally I will never receive anything like this, however the device sending me this string array allows for it to happen so I would like to be able to split the array accordingly if it ever were to occur.
So obviously just splitting it like so (where rawResponse has the $COMMAND= part removed):
string[] response = rawResponse.Split(',');
is not good enough! I think regex is the correct tool for the job. Could anyone help me write it?

string rawResponse = @"1,2,3,""string"",""another,string"",4,5";
string pattern = @"[^,""]+|""([^""]*)""";
foreach(Match match in Regex.Matches(rawResponse, pattern))
{
// use match.Value
}
Results:
1
2
3
"string"
"another,string"
4
5
If you need the response as an array of strings, you can use LINQ:
var response = Regex.Matches(rawResponse, pattern).Cast<Match>()
.Select(m => m.Value).ToArray();

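string originalString = @"1,2,3,""string"",""another,string"",4,5,6";
string regexPattern = @"(("".*?"")|(.*?))(,|$)";
foreach(Match match in Regex.Matches(originalString, regexPattern))
{
// match.Groups[1].Value is the field without its trailing comma;
// the final match at the end of the input is empty and can be skipped
}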
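Building on the first pattern above, here is a minimal sketch that also strips the surrounding quotes from quoted fields. The class and method names (CsvFields, Split) are made up for illustration. It assumes quoted fields never contain escaped quotes, and it silently skips empty unquoted fields such as the one in 1,,2.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of any library.
public static class CsvFields
{
    // Either a run of characters that are neither commas nor quotes,
    // or a quoted field whose content is captured in group 1.
    private static readonly Regex FieldPattern =
        new Regex(@"[^,""]+|""([^""]*)""");

    public static List<string> Split(string rawResponse)
    {
        var fields = new List<string>();
        foreach (Match match in FieldPattern.Matches(rawResponse))
        {
            // Group 1 only participates when the quoted alternative matched;
            // in that case use its content so the quotes are dropped.
            fields.Add(match.Groups[1].Success
                ? match.Groups[1].Value
                : match.Value);
        }
        return fields;
    }
}
```

Calling CsvFields.Split on the sample response (with $COMMAND= removed) should keep another,string together as a single field.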

Related

Pattern not correct with regular expressions in c#

I am working on a project and I want to remove the HTTP protocol prefix from a string. My Excel sheet contains two kinds of values: one is http://www.email@domain.com and the other is http://email@domain.com. I have tried many combinations but I can't find the right one.
My code only works with the first type and not with the second one:
var website_domain_in_excel = list_of_information_in_excel[2];
string pattern = "(http://\\www.)";
Console.WriteLine(Regex.Replace(website_domain_in_excel, pattern, String.Empty));
Thank you for your time
The pattern you want is this:
string pattern = @"http:\/\/(?:www\.)?";
This matches http:// and then an optional non-capturing group matching www..
You can see an explanation of the regex here and this fiddle for a working demo in C#.
You can use the pattern "http://www\.|http://", which matches either "http://www." or "http://". The longer alternative must come first so that "www." is consumed when it is present.
The code would look like this:
var website_domain_in_excel = list_of_information_in_excel[2];
string pattern = @"http:\/\/www\.|http:\/\/";
Console.WriteLine(Regex.Replace(website_domain_in_excel, pattern, String.Empty));
A non-regex solution:
var eml = "http://www.email@domain.com";
eml = eml.Replace("http://", "").Replace("www.", "");
// eml now is "email@domain.com"
Note that this removes "www." anywhere in the string, so you might want to check that it only appears at the start. The (unusual) "email@www.domain.com" should remain intact.
But if you really want a regex:
eml = Regex.Replace(eml, "^https?://(www\\.)?", "");
This also catches "https", because of the ? after the "s".
It will also find and remove an optional "www.", but only at the start.

RegEx to extract partial string

So simple, but I'm struggling. I only use regular expressions every two years or so, so I'm rusty.
I have these two url strings
http://localhost:58876/Products/Product1
https://localhost:58876/Products/Product1
The result I want is
localhost:58876
Basically, remove the http(s):// and everything from the first single / onwards, so I end up with the domain with or without the port number.
P.S: I'm working with C#
This worked for me (tested in Notepad++):
(\w+:\d+)
You can use the following regex to split the URL:
((http[s]?|ftp):/)?/?([^:/\s]+)(:([^/]*))?((/\w+)*/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?
Capture groups 3 (host) and 5 (port) are the ones you are looking for.
(^[^h]|\/\/)([\w\d\:\#\.]+:?[\d]?+)
then in c#:
string address = ...
char[] MyChar = {'/'};
string NewString = address.TrimStart(MyChar);
EDIT: this also worked with localhost:58876/Products/Product1!
Just match anything but a slash: /^https?:\/\/([^\/]+)\/.*$/
var url = 'http://localhost:58876/Products/Product1';
var match = url.match(/^https?:\/\/([^\/]+)\/.*$/);
if(match&&match.length>0)document.write(match[1]);
Even shorter: /\/\/([^\/]+)/. Note that there are (a lot) better ways to parse URLs. Depending on your platform, there’s PHP’s parse_url, NodeJS’s url module or libraries like uri.js that handle the many faces of valid URIs.
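Since the question is about C#, the same idea can be sketched there in two ways: letting System.Uri do the parsing, or using the "anything but a slash" pattern from above. The class and method names (HostExtractor, FromUri, FromRegex) are invented for this example. Note that Uri.Authority leaves out the port when it is the default for the scheme, so the two methods can differ for URLs such as http://example.com:80/.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are for illustration only.
public static class HostExtractor
{
    // Uses the built-in URI parser; throws UriFormatException on invalid input.
    public static string FromUri(string url)
    {
        return new Uri(url).Authority;
    }

    // Regex variant: capture everything between "://" and the next slash.
    // Returns an empty string when the input does not start with http(s)://.
    public static string FromRegex(string url)
    {
        Match match = Regex.Match(url, @"^https?://([^/]+)");
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}
```

For both example URLs in the question, either method should give localhost:58876.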

Alternative to RegEx

I am currently passing a parameter to a SQL string like this -
grid=0&
And I am using a RegEx to get the 0 value like so-
Match match = Regex.Match(input, @"grid=([A-Za-z0-9\-]+)\&$",
RegexOptions.IgnoreCase);
string grid = match.Groups[1].Value;
which works perfectly.
However as development has progressed it is clear that more parameters will be added to the string like so-
grid=0&hr=3&tb=0
These parameters may come in a different order in the string each time, so clearly the RegEx I am currently using won't work. I have looked into it and think Split may be an option, but I'm not sure.
What would the best method be and how could I apply it to my current problem?
If you're parsing a query string and looking for an alternative to Regex, there is a specialized class and method for that. It returns a collection of parameters:
string s = "http://www.something.com?grid=0&hr=3&tb=0";
Uri uri = new Uri(s);
var result = HttpUtility.ParseQueryString(uri.Query);
You have to include the System.Web namespace.
You can access each parameter's value by using its key:
foreach (string key in result.Keys)
{
string value = result[key];
// action...
}
Regexes can still be used here. Consider adding another capture group to capture the property name, and then looping over all of the results using Matches rather than Match, or calling Match multiple times.
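A rough sketch of that two-group approach follows. The class and method names (QueryParams, Parse) are hypothetical, and the pattern assumes that names and values contain no & or = characters and are not URL-encoded.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the original question's code.
public static class QueryParams
{
    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        // Group 1 is the parameter name, group 2 is its (possibly empty) value.
        foreach (Match match in Regex.Matches(input, @"([^&=]+)=([^&]*)"))
        {
            // A repeated name overwrites the earlier value.
            result[match.Groups[1].Value] = match.Groups[2].Value;
        }
        return result;
    }
}
```

Because every name=value pair is collected into a dictionary, QueryParams.Parse(input)["grid"] does not depend on where grid appears in the string.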

C# Regular expression problem

I have the following string:
http://www.powerwXXe.com/text1 123-456 text2 text3/
Can someone give me advice on how to get the values of text1, text2 and text3 and put them into strings? I have heard of regular expressions but have no idea how to use them.
Instead of going the RegEx route, if you know that the string will always be of a similar format, you can use string.Split, first on / and then on space, and retrieve the results from the resulting string arrays.
string[] slashes = myString.Split('/');
string[] textVals = slashes[3].Split(' ');
// at this point:
// textVals[0] = "text1"
// textVals[1] = "123-456"
// textVals[2] = "text2"
// textVals[3] = "text3"
Here is a link on getting started with regular expressions in C#: Regular Expression Tutorial
I don't think it is appropriate to write out a tutorial here since the information is online, so please check out the link and let me know if you have a specific question.
Instead of using regex, you can use string.Format("http://myurl.com/{0}{1}{2}", value1, textbox2.Text, textbox3.Text) and format the URL in whatever fashion you like. If you are looking to go the regex route, you can always check regexlib.
The use of regular expressions relies on patterns you see in your strings - you need to be able to generalize the pattern of strings you're looking for before you can use a regular expression.
For a problem of this scope, if you can pin down the pattern, you're probably better off using other string parsing methods, such as String.IndexOf and String.Split.
Regular expressions are a powerful tool, and certainly worth learning, but they might not be necessary here.
Based on the example you gave, it looks as though text1, text2 and text3 are separated by spaces? If so, and if you always know the positions they'll be in, you may want to skip regular expressions and just use .Split(' ') to split the string into an array of strings and then grab the pertinent items from there. Something like this:
string foo = "http://www.powerwXXe.com/text1 123-456 text2 text3/";
string[] fooParts = foo.Split(' ');
string text1 = fooParts[0].Replace("http://www.powerwXXe.com/", "");
string text2 = fooParts[2];
string text3 = fooParts[3].Replace("/", "");
You'd want to perform bounds checking on the string[] before trying to grab anything from it, but this would work. Regex is awesome for string parsing, but when it's simple stuff you need to do, sometimes it's overkill when simple methods from the string class will do.
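The bounds checking mentioned above might look something like the following sketch. The names (UrlTextParts, TryExtract) and the idea of passing the known prefix as a parameter are assumptions made for this example, not something from the original answer.

```csharp
using System;

// Hypothetical helper illustrating Split plus bounds checking.
public static class UrlTextParts
{
    public static bool TryExtract(string input, string prefix,
        out string text1, out string text2, out string text3)
    {
        text1 = text2 = text3 = string.Empty;

        // Reject input that does not start with the expected prefix.
        if (input == null || !input.StartsWith(prefix, StringComparison.Ordinal))
        {
            return false;
        }

        // Drop the prefix and the trailing slash, then split on spaces.
        string[] parts = input.Substring(prefix.Length).TrimEnd('/').Split(' ');

        // Bounds check: text1, the number block, text2 and text3 are expected.
        if (parts.Length < 4)
        {
            return false;
        }

        text1 = parts[0];
        text2 = parts[2];
        text3 = parts[3];
        return true;
    }
}
```

With the example string and the prefix http://www.powerwXXe.com/, this should return true and fill in text1, text2 and text3; a string with fewer parts returns false instead of throwing.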
It all depends on how much you know about the string you are parsing. Where does the string come from, and how much do you know about its formatting?
Based on your example string you could get away with something as simple as
string pattern = @"http://www.powerwXXe.com/(?<myGroup1>\S+)\s\S+\s(?<myGroup2>\S+)\s(?<myGroup3>\S+)/";
var reg = new System.Text.RegularExpressions.Regex(pattern);
string input = "http://www.powerwXXe.com/text1 123-456 text2 text3/";
System.Text.RegularExpressions.Match myMatch = reg.Match(input);
The captured strings would then be contained in myMatch.Groups["myGroup1"], ["myGroup2"] and ["myGroup3"] respectively.
This, however, assumes that your string always begins with http://www.powerwXXe.com/, that there will always be three groups to capture, and that the groups are separated by a space (which is an illegal character in URLs and would in almost all cases be converted to %20, which would have to be accounted for in the pattern).
So, how much do you know about your string? And, as some have already stated, do you really need regular expressions?

Using .NET RegEx to retrieve part of a string after the second '-'

This is my first stack message. Hope you can help.
I have several strings I need to break up for use later. Here are a couple of examples of what I mean:
fred-064528-NEEDED
frederic-84728957-NEEDED
sam-028-NEEDED
As you can see above, the string lengths vary greatly, so I believe regex is the only way to achieve what I want. What I need is the rest of the string after the second hyphen ('-').
I am very weak at regex, so any help would be great.
Thanks in advance.
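Just to offer an alternative without using regex (this takes everything after the last hyphen, which is the same thing when there are exactly two):
foreach(string s in list)
{
int x = s.LastIndexOf('-');
string sub = s.Substring(x + 1);
}
Add validation to taste.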
Something like this. It will take anything (except line breaks) after the second '-' including the '-' sign.
var exp = #"^\w*-\w*-(.*)$";
var match = Regex.Match("frederic-84728957-NEE-DED", exp);
if (match.Success)
{
var result = match.Groups[1]; //Result is NEE-DED
Console.WriteLine(result);
}
EDIT: I answered another question which relates to this. Except, it asked for a LINQ solution and my answer was the following which I find pretty clear.
Pimp my LINQ: a learning exercise based upon another post
var result = String.Join("-", inputData.Split('-').Skip(2));
or
var result = inputData.Split('-').Skip(2).FirstOrDefault(); //If the last part is NEE-DED then only NEE is returned.
As mentioned in the other SO thread it is not the fastest way of doing this.
If they are part of larger text:
(\w+-){2}(\w+)
If they are presented as whole lines, and you know you don't have other hyphens, you may also use:
[^-]*$
Another option, if you have each line as a string, is to use split (again, depending on whether or not you're expecting extra hyphens, you may omit the count parameter, or use LastIndexOf):
string[] tokens = line.Split("-".ToCharArray(), 3);
string s = tokens.Last();
This should work:
.*?-.*?-(.*)
This should do the trick:
([^\-]+)\-([^\-]+)\-(.*?)$
The regex pattern will be
(?<first>.*)?-(?<second>.*)?-(?<third>.*)?(\s|$)
Then you can read the named group "third" to get the text after the second hyphen.
Alternatively, you can do a string.Split('-') and take the element at index 2 of the array.
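As a non-regex sketch that keeps any further hyphens in the result, IndexOf can be called twice to locate the second hyphen. The class and method names (HyphenParts, AfterSecondHyphen) are made up for this example, and returning an empty string when there are fewer than two hyphens is an arbitrary choice.

```csharp
// Hypothetical helper; names are for illustration only.
public static class HyphenParts
{
    public static string AfterSecondHyphen(string s)
    {
        // Position of the first hyphen, or -1 if there is none.
        int first = s.IndexOf('-');
        if (first < 0)
        {
            return string.Empty;
        }

        // Search for the second hyphen starting just after the first one.
        int second = s.IndexOf('-', first + 1);
        if (second < 0)
        {
            return string.Empty;
        }

        // Everything after the second hyphen, including any later hyphens.
        return s.Substring(second + 1);
    }
}
```

For fred-064528-NEEDED this should give NEEDED, and for frederic-84728957-NEE-DED it should give NEE-DED.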
