Pattern not correct with regular expressions in c#

I am working on a project and want to remove the http:// prefix from a string. My Excel sheet contains two kinds of values: http://www.email@domain.com and http://email@domain.com. I have tried many combinations but can't find the right one.
My code only works with the first kind and not with the second:
var website_domain_in_excel = list_of_information_in_excel[2];
string pattern = "(http://\\www.)";
Console.WriteLine(Regex.Replace(website_domain_in_excel, pattern, String.Empty));
Thank you for your time

The pattern you want is this:
string pattern = @"http:\/\/(?:www\.)?";
This matches http:// and then an optional non-capturing group matching www..
You can see an explanation of the regex here and this fiddle for a working demo in C#.

You can use the pattern "http://www\.|http://", which matches either "http://www." or "http://".
The code would look like this:
var website_domain_in_excel = list_of_information_in_excel[2];
string pattern = @"http:\/\/www\.|http:\/\/";
Console.WriteLine(Regex.Replace(website_domain_in_excel, pattern, String.Empty));

A non-regex solution:
var eml = "http://www.email@domain.com";
eml = eml.Replace("http://", "").Replace("www.", "");
// eml is now "email@domain.com"
You might want to check that "www." only appears at the start. The (unusual) "email@www.domain.com" should remain intact.
But if you really want a regex:
eml = Regex.Replace(eml, "^https?://(www\\.)?", "");
This also catches "https", because of the ? after that "s"
It will also find and replace an optional "www.", but only at the start
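A minimal sketch of the anchored pattern applied to both shapes of input. The names UrlPrefix and Strip and the example.com sample values are made up for illustration; RegexOptions.IgnoreCase is an added assumption, not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

static class UrlPrefix
{
    // Anchored at the start: optional "s" after "http", optional "www." after "://".
    private static readonly Regex Prefix =
        new Regex(@"^https?://(?:www\.)?", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the input without its leading scheme and "www.".
    public static string Strip(string value) => Prefix.Replace(value, string.Empty);

    static void Main()
    {
        Console.WriteLine(Strip("http://www.example.com"));      // example.com
        Console.WriteLine(Strip("http://example.com"));          // example.com
        Console.WriteLine(Strip("http://mail.www.example.com")); // mail.www.example.com
    }
}
```

Because of the ^ anchor, a "www." that appears later in the string is left alone, which is the case the chained Replace calls get wrong.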

Related

How to replace all the text before a word with a blank space using Regex.Replace in C#

I'm having problems trying to replace a text before a word using C#
I want to delete all the text found before a specific word.
I tried this but it didn't work
string filecomplete = File.ReadAllText(Path.GetFullPath(ofd.FileName));
string nonimportant = ".*?(?=WORD)";
string file = System.Text.RegularExpressions.Regex.Replace(filecomplete,
nonimportant, string.Empty);
What I want is to keep all the text after the WORD and delete everything before it.
Thanks,

Instead of a regex, you can use the string methods IndexOf and Substring, like this:
var result = content.Substring(content.IndexOf("word"));
Also, because you are reading the whole file into memory, for bigger files it would be better to read it line by line.
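One thing to watch with the IndexOf approach: IndexOf returns -1 when the word is missing, and Substring then throws. A small sketch that guards against that follows; the names TextAfterWord and FromWord are hypothetical, and returning the input unchanged when the word is absent is just one possible choice.

```csharp
using System;

static class TextAfterWord
{
    // Hypothetical helper: returns the text starting at the first occurrence of
    // `word`, or the input unchanged when `word` is absent.
    public static string FromWord(string content, string word)
    {
        int index = content.IndexOf(word, StringComparison.Ordinal);
        // IndexOf returns -1 when nothing is found; Substring(-1) would throw.
        return index < 0 ? content : content.Substring(index);
    }

    static void Main()
    {
        Console.WriteLine(FromWord("header\nmore header\nWORD and the rest", "WORD"));
        // prints "WORD and the rest"
        Console.WriteLine(FromWord("no marker here", "WORD"));
        // prints "no marker here"
    }
}
```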

The following option uses the MatchEvaluator overload of Regex.Replace. Note that you just need to put WORD in the pattern and capture it, together with what comes after it, in a capture group via ().
string nonimportant = ".*?(WORD.*)";
string st = "aslkfdalksdWORDewot";
string file = Regex.Replace(st, nonimportant, (Match m) => String.Format("{0}",m.Groups[1]));

. doesn’t match newlines by default – I’m assuming you have some, given the ReadAllText. You can pass RegexOptions.Singleline to change this behaviour.
string file = Regex.Replace(filecomplete, nonimportant, string.Empty,
RegexOptions.Singleline);
(Use using System.Text.RegularExpressions;.)

How to remove a pattern from a string using Regex

I want to find paths from a string and remove them, e.g.:
string1 = "'c:\a\b\c'!MyUDF(param1, param2,..) + 'c:\a\b\c'!MyUDF(param3, param4,..)..."
I'd like a regex to find the pattern '[some path]'!MyUDF, and remove '[path]'.
Thanks.
Edit:
Example input:
string1 = "'c:\a\b\c'!MyUDF(param1, param2,..) + 'c:\a\b\c'!MyUDF(param3, param4,..)";
Expected output: "MyUDF(param1, param2,...) + MyUDF(param3, param4,...)"
where MyUDF is a function name, so it consists of only letters

input=Regex.Replace(input,"'[^']+'(?=!MyUDF)","");
If the path may be followed by ! and some other word, you can use
input=Regex.Replace(input,@"'[^']+'(?=!\w+)","");
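Note that a lookahead is zero-width, so with the patterns above the "!" itself stays in the result. If the expected output should not contain it, one option is to move the "!" into the matched part. A sketch under that assumption (PathPrefix and Remove are hypothetical names, and the letters-only function name comes from the question):

```csharp
using System;
using System.Text.RegularExpressions;

static class PathPrefix
{
    // '[^']+'  a quoted path, !  the separator, then a lookahead that
    // requires a letters-only function name followed by "(".
    private static readonly Regex QuotedPath =
        new Regex(@"'[^']+'!(?=[A-Za-z]+\()");

    // Hypothetical helper: removes every quoted path together with its "!".
    public static string Remove(string formula) => QuotedPath.Replace(formula, string.Empty);

    static void Main()
    {
        string input = @"'c:\a\b\c'!MyUDF(param1, param2,..) + 'c:\a\b\c'!MyUDF(param3, param4,..)";
        Console.WriteLine(Remove(input));
        // prints: MyUDF(param1, param2,..) + MyUDF(param3, param4,..)
    }
}
```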

Alright, if the ! is always in the string as you suggest, this Regex !(.*)?\( will get you what you want. Here is a Regex 101 to prove it.
To use it, you might do something like this:
var result = Regex.Replace(myString, @"!(.*)?\(");

If you are dealing with file paths, the feature you want is in System.IO.Path.
It has many methods, and handling paths is one of its specific purposes.

what is the best way to parse out string from longer string?

I have a string that looks like this:
"/dir/location/test-load-ABCD.p"
and I need to parse out "ABCD" (where ABCD will be a different value every day).
The only things that I know will always be consistent (to use for the parsing logic) are:
There will always be a ".p" after the value
There will always be a "test-load-" before the value.
What I thought of was to grab everything past the last "/", then remove the last 2 characters (to take care of the ".p"), and then do a
.Replace("test-load-", "")
but it felt kind of hacky, so I wanted to see if people had any suggestions for a more elegant solution.

You can use a regex:
static readonly Regex parser = new Regex(@"/test-load-(.+)\.p");
string part = parser.Match(str).Groups[1].Value;
For added resilience, replace .+ with a character class containing only the characters that can appear in that part.
Bonus:
You probably next want
DateTime date = DateTime.ParseExact(part, "yyyy-MM-dd", CultureInfo.InvariantCulture);

Since this is a file name, use the file name parsing facility offered by the framework:
var fileName = System.IO.Path.GetFileNameWithoutExtension("/dir/location/test-load-ABCD.p");
string result = fileName.Replace("test-load-", "");
A “less hacky” solution than using Replace would be the use of regular expressions to capture the solution but I think this would be overkill in this case.
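A variation on the Path approach that checks the prefix instead of calling Replace, so a "test-load-" that happens to occur inside the value is not removed as well. This is only a sketch: LoadFileName and ExtractValue are hypothetical names, and returning null for a non-matching name is an assumption about what the caller wants.

```csharp
using System;
using System.IO;

static class LoadFileName
{
    private const string Prefix = "test-load-";

    // Hypothetical helper: returns the part between "test-load-" and the
    // extension, or null when the file name does not start with the prefix.
    public static string ExtractValue(string path)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        return name.StartsWith(Prefix, StringComparison.Ordinal)
            ? name.Substring(Prefix.Length)
            : null;
    }

    static void Main()
    {
        Console.WriteLine(ExtractValue("/dir/location/test-load-ABCD.p")); // ABCD
    }
}
```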

string input = "/dir/location/test-load-ABCD.p";
string value = Regex.Match(input, @"test-load-([a-zA-Z]+)\.p$").Groups[1].Value;

C# Regular expression problem

I have the following string:
http://www.powerwXXe.com/text1 123-456 text2 text3/
Can someone give me advice on how to get the value of text1, text2 and text3 and put them into a string. I have heard of regular expressions but have no idea how to use them.

Instead of going the RegEx route, if you know that the string will always be of a similar format, you can use string.Split, first on /, then on space, and retrieve the results from the resulting string arrays.
string[] slashes = myString.Split('/');
string[] textVals = slashes[3].Split(' ');
// at this point:
// textVals[0] = "text1"
// textVals[1] = "123-456"
// textVals[2] = "text2"
// textVals[3] = "text3"

Here is a link on getting started with regular expressions in C#: Regular Expression Tutorial
I don't think it is appropriate to write out a tutorial here since the information is online, so please check out the link and let me know if you have a specific question.

Instead of using regex, you can use string.Format("http://myurl.com/{0}{1}{2}", value1, textbox2.Text, textbox3.Text) and format the url in whatever fashion. If you are looking to go the regex route, you can always check regexlib.

The use of regular expressions relies on patterns you see in your strings - you need to be able to generalize the pattern of strings you're looking for before you can use a regular expression.
For a problem of this scope, if you can pin down the pattern, you're probably better off using other string parsing methods, such as String.IndexOf and String.Split.
Regular expressions are a powerful tool, and certainly worth learning, but they might not be necessary here.

Based on the example you gave, it looks as though text1, text2 and text3 are separated by spaces? If so, and if you always know the positions they'll be in, you may want to skip regular expressions and just use .Split(' ') to split the string into an array of strings and then grab the pertinent items from there. Something like this:
string foo = "http://www.powerwXXe.com/text1 123-456 text2 text3/";
string[] fooParts = foo.Split(' ');
string text1 = fooParts[0].Replace("http://www.powerwXXe.com/", "");
string text2 = fooParts[2];
string text3 = fooParts[3].Replace("/", "");
You'd want to perform bounds checking on the string[] before trying to grab anything from it, but this would work. Regex is awesome for string parsing, but when it's simple stuff you need to do, sometimes it's overkill when simple methods from the string class will do.
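The bounds checking mentioned above could look roughly like this. It is a sketch that assumes the last path segment always has the shape "text1 123-456 text2 text3"; UrlTexts and TryParse are hypothetical names.

```csharp
using System;

static class UrlTexts
{
    // Hypothetical helper: pulls text1, text2 and text3 out of the last path
    // segment, returning false when the input does not have the expected shape.
    public static bool TryParse(string url, out string text1, out string text2, out string text3)
    {
        text1 = text2 = text3 = string.Empty;

        // Drop the trailing "/" and keep what follows the last remaining "/".
        string trimmed = url.TrimEnd('/');
        string segment = trimmed.Substring(trimmed.LastIndexOf('/') + 1);

        string[] parts = segment.Split(' ');
        if (parts.Length != 4)
        {
            return false; // bounds check: expected "text1 123-456 text2 text3"
        }

        text1 = parts[0];
        text2 = parts[2];
        text3 = parts[3];
        return true;
    }

    static void Main()
    {
        if (TryParse("http://www.powerwXXe.com/text1 123-456 text2 text3/",
                     out string a, out string b, out string c))
        {
            Console.WriteLine($"{a}, {b}, {c}"); // text1, text2, text3
        }
    }
}
```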

It all depends on how much you know about the string you are parsing. Where does the string come from, and how much do you know about its formatting?
Based on your example string you could get away with something as simple as
string pattern = @"http://www.powerwXXe.com/(?<myGroup1>\S+)\s\S+\s(?<myGroup2>\S+)\s(?<myGroup3>\S+)/";
var reg = new System.Text.RegularExpressions.Regex(pattern);
string input = "http://www.powerwXXe.com/text1 123-456 text2 text3/";
System.Text.RegularExpressions.Match myMatch = reg.Match(input);
The captured strings would then be contained in myMatch.Groups["myGroup1"], ["myGroup2"], ["myGroup3"] respectively.
This however assumes that your string always begins with http://www.powerwXXe.com/, that there will always be three groups to capture, and that the groups are separated by a space (which is an illegal character in URLs and would in almost all cases be converted to %20, which would have to be accounted for in the pattern).
So, how much do you know about your string? And, as some have already stated, do you really need regular expressions?

Using .NET RegEx to retrieve part of a string after the second '-'

This is my first stack message. Hope you can help.
I have several strings I need to break up for use later. Here are a couple of examples of what I mean:
fred-064528-NEEDED
frederic-84728957-NEEDED
sam-028-NEEDED
As you can see above, the string lengths vary greatly, so I believe regex is the only way to achieve what I want. What I need is the rest of the string after the second hyphen ('-').
I am very weak at regex, so any help would be great.
Thanks in advance.

Just to offer an alternative without using regex:
foreach (string s in list)
{
    int x = s.LastIndexOf('-');
    string sub = s.Substring(x + 1);
}
Add validation to taste.

Something like this. It takes everything (except line breaks) after the second '-', including any further '-' signs.
var exp = @"^\w*-\w*-(.*)$";
var match = Regex.Match("frederic-84728957-NEE-DED", exp);
if (match.Success)
{
var result = match.Groups[1]; //Result is NEE-DED
Console.WriteLine(result);
}
EDIT: I answered another question which relates to this. Except, it asked for a LINQ solution and my answer was the following which I find pretty clear.
Pimp my LINQ: a learning exercise based upon another post
var result = String.Join("-", inputData.Split('-').Skip(2));
or
var result = inputData.Split('-').Skip(2).FirstOrDefault(); //If the last part is NEE-DED then only NEE is returned.
As mentioned in the other SO thread it is not the fastest way of doing this.

If they are part of larger text:
(\w+-){2}(\w+)
If they are presented as whole lines, and you know you don't have other hyphens, you may also use:
[^-]*$
Another option, if you have each line as a string, is to use split (again, depending on whether or not you're expecting extra hyphens, you may omit the count parameter, or use LastIndexOf):
string[] tokens = line.Split("-".ToCharArray(), 3);
string s = tokens.Last();
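To make the effect of the count parameter concrete, here is a small sketch of the Split variant. AfterSecondHyphen and Extract are hypothetical names, and returning null for input with fewer than two hyphens is an assumption.

```csharp
using System;

static class AfterSecondHyphen
{
    // Hypothetical helper: returns everything after the second '-', or null
    // when the input has fewer than two hyphens.
    public static string Extract(string line)
    {
        // A count of 3 stops splitting after two hyphens, so later hyphens
        // stay inside the last element.
        string[] tokens = line.Split(new[] { '-' }, 3);
        return tokens.Length == 3 ? tokens[2] : null;
    }

    static void Main()
    {
        Console.WriteLine(Extract("fred-064528-NEEDED"));        // NEEDED
        Console.WriteLine(Extract("frederic-84728957-NEE-DED")); // NEE-DED
    }
}
```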

This should work:
.*?-.*?-(.*)

This should do the trick:
([^\-]+)\-([^\-]+)\-(.*?)$

The regex pattern will be
(?<first>.*)?-(?<second>.*)?-(?<third>.*)?(\s|$)
Then you can read the named group "third" to get the text after the second hyphen.
Alternatively, you can do a string.Split('-') and take the item at index 2 from the array.
