Why does my Regex.Replace string contain the replacement value twice? - c#

I have the following string: aWesdE, which I want to convert to http://myserver.com/aWesdE.jpg using Regex.Replace(string, string, string, RegexOptions)
Currently, I use this code:
string input = "aWesdE";
string match = "(.*)";
string replacement = "http://myserver.com/$1.jpg";
string output = Regex.Replace(input, match, replacement,
RegexOptions.IgnoreCase | RegexOptions.Singleline);
The result is that output is ending up as: http://myserver.com/aWesdE.jpghttp://myserver.com/.jpg
So, the replacement value shows up correctly, and then appears to be appended again - very strange. What's going on here?

Your pattern actually produces two matches. You defined it like this:
string match = "(.*)";
.* matches zero or more characters, so you get two matches: first your whole text, and then an empty string at the end of the input. The replacement is applied to both. To fix it, change the pattern to
string match = "(.+)";
.+ matches one or more characters, so the empty string no longer matches and you get only a single match.
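To see both matches for yourself, here is a minimal sketch (the class and method names are made up for illustration) that lists every match of a pattern with its index and runs the replacement:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyMatchDemo
{
    // Returns "index:'value'" for every match of the pattern in the input.
    public static string[] DescribeMatches(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => $"{m.Index}:'{m.Value}'")
             .ToArray();

    // Same replacement string as in the question.
    public static string ToUrl(string input, string pattern) =>
        Regex.Replace(input, pattern, "http://myserver.com/$1.jpg");

    public static void Main()
    {
        // (.*) matches the whole text, then an empty string at the end.
        Console.WriteLine(string.Join(", ", DescribeMatches("aWesdE", "(.*)")));
        // prints: 0:'aWesdE', 6:''

        // (.+) needs at least one character, so the empty match disappears.
        Console.WriteLine(string.Join(", ", DescribeMatches("aWesdE", "(.+)")));
        // prints: 0:'aWesdE'

        Console.WriteLine(ToUrl("aWesdE", "(.+)"));
        // prints: http://myserver.com/aWesdE.jpg
    }
}
```

Anchoring the original pattern as "^(.*)" should also work for single-line input, since the empty match at the end no longer starts at the beginning of the string.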

Related

Replace ab in string |var11=ab|var12=100|var21=cd|var22=200| using regular expression

I want to replace the ab that follows var11= in the given string.
Input:|var11=ab|var12=100|var21=cd|var22=200|
My code is as follows:
string input = "|var11=ab|var12=100|var21=cd|var22=200|";
string pattern = @"^.var11=([a-z]+).";
string value = Regex.Replace(input, pattern, "ep");
and the output I got is:
epvar12=100|var21=cd|var22=200|
but the expected output was:
|var11=ep|var12=100|var21=cd|var22=200|
You may use
string input = "|var11=ab|var12=100|var21=cd|var22=200|";
string pattern = @"(?<=\bvar11=)[^|]+";
string value = Regex.Replace(input, pattern, "ep");
Or, a capturing group approach:
string pattern = @"\b(var11=)[^|]+";
string value = Regex.Replace(input, pattern, "${1}ep");
See the .NET regex demo
Details
(?<=\bvar11=) - a location immediately preceded with a whole word var11=
[^|]+ - 1+ non-pipe chars.
If you want to update the var11 value only when it is preceded by | or is at the start of the string, use
string pattern = @"(?<=(?:^|\|)var11=)[^|]+";
where (?:^|\|) matches start of string (^) or (|) a pipe char (\|).
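If you need this for more than one key, the lookbehind pattern can be wrapped in a small helper. This is only a sketch: PipeFields and SetValue are hypothetical names, and it assumes the key is followed by = and the value never contains a pipe.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeFields
{
    // Hypothetical helper: replaces the value of "key=" in a |-delimited string.
    public static string SetValue(string input, string key, string newValue)
    {
        // The key must follow the start of the string or a pipe.
        // Regex.Escape keeps any special characters in the key literal.
        string pattern = @"(?<=(?:^|\|)" + Regex.Escape(key) + @"=)[^|]+";

        // A MatchEvaluator inserts newValue verbatim, so a "$" inside it
        // is not interpreted as a substitution such as $1.
        return Regex.Replace(input, pattern, _ => newValue);
    }

    public static void Main()
    {
        string input = "|var11=ab|var12=100|var21=cd|var22=200|";
        Console.WriteLine(SetValue(input, "var11", "ep"));
        // prints: |var11=ep|var12=100|var21=cd|var22=200|
    }
}
```

With the capturing group variant, ${1} rather than $1 matters when the new value starts with a digit, because "$1300" would be read as a reference to group 1300.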

Extract value from a string in C# from a specific position

I have bunch of files in a folder and I am looping through them.
How do I extract the value from the below example? I need the value 0519 only.
DOC 75-20-0519-1.PDF
The below code gives the complete part include -1.
Convert.ToInt32(Path.GetFileNameWithoutExtension(objFile).Split('-')[2]);
Appreciate any help.
You can try regular expressions in order to match the value.
pattern:
[0-9]+ - one or more digits
(?=[^0-9][0-9]+$) - followed by a non-digit, one or more digits, and the end of the string
code:
using System.Text.RegularExpressions;
...
string file = "DOC 75-20-0519-1.PDF";
// "0519"
string result = Regex
.Match(Path.GetFileNameWithoutExtension(file), @"[0-9]+(?=[^0-9][0-9]+$)")
.Value;
If Split('-') fails, and you have an entire string as a result, it seems that you have a wrong delimiter. It can be, say, one of the dashes:
"DOC 75–20–0519–1.PDF"; // n-dash
"DOC 75—20—0519—1.PDF"; // m-dash
You can use REGEX for this
Match match = Regex.Match("DOC 75-20-0519-1.PDF", @"DOC\s+\d+\-\d+\-(\d+)\-\d+", RegexOptions.IgnoreCase);
string data = match.Groups[1].Value;
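Note that the lookahead pattern from the first answer does not care which separator is used, since [^0-9] accepts any non-digit. A small sketch to check that (FileNumber and SecondToLastNumber are made-up names; the en dash file name is an assumed example):

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class FileNumber
{
    // Hypothetical helper: returns the digit run that sits right before the
    // final "<separator><digits>" of the file name, whatever the separator is.
    public static string SecondToLastNumber(string file)
    {
        string name = Path.GetFileNameWithoutExtension(file);
        return Regex.Match(name, @"[0-9]+(?=[^0-9][0-9]+$)").Value;
    }

    public static void Main()
    {
        // Plain hyphen-minus separators.
        Console.WriteLine(SecondToLastNumber("DOC 75-20-0519-1.PDF"));
        // prints: 0519

        // En dash (U+2013) separators, where Split('-') would not split at all.
        Console.WriteLine(SecondToLastNumber("DOC 75\u201320\u20130519\u20131.PDF"));
        // prints: 0519
    }
}
```

The value is returned as a string on purpose: Convert.ToInt32 would turn 0519 into 519 and drop the leading zero.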

Using Regex to extract part of a string from a HTML/text file

I have a C# regular expression to match author names in a text document that is written as:
"author":"AUTHOR'S NAME"
The regex is as follows:
new Regex("\"author\":\"[A-Za-z0-9]*\\s?[A-Za-z0-9]*")
This returns "author":"AUTHOR'S NAME. However, I don't want the quotation marks or the word Author before. I just want the name.
Could anyone help me get the expected value please?
Use regex groups to get a part of the string. ( ) acts as a capture group and can be accessed by the .Groups field.
.Groups[0] holds the whole match
.Groups[1] holds the first capture group (and so on)
string pattern = "\"author\":\"([A-Za-z0-9]*\\s?[A-Za-z0-9]*)\"";
var match = Regex.Match("\"author\":\"Name123\"", pattern);
string authorName = match.Groups[1].Value;
You can also use look-around approach to only get a match value:
var txt = "\"author\":\"AUTHOR'S NAME\"";
var rgx = new Regex(@"(?<=""author"":"")[^""]+(?="")");
var result = rgx.Match(txt).Value;
My regex runs at 555,020 iterations per second with this input string, which should be fast enough.
result will be AUTHOR'S NAME.
(?<="author":") checks if we have "author":" before the match, [^"]+ looks safe since you only want to match alphanumerics and space between the quotes, and (?=") is checking the trailing quote.

Regular expression: take string literally with special re-characters

Maybe simple question..
String text = "fake 43 60 fake";
String patt = "[43.60]";
Match m = Regex.Match(text, patt);
In this situation m.Success is true, because the dot matches any character (including the space). But I need the pattern to be matched literally.
Of course, I can put a '\' before the dot in the pattern:
String patt = @"[43\.60]";
Then m.Success is false, but there are more special characters in the regular expression world.
My question is: how can I make a regular expression treat a string literally, exactly as given? So '43.60' must match exactly '43.60', '43?60' must match '43?60', and so on.
thanks.
To get a regex-safe literal:
string escaped = Regex.Escape(input);
For example, to match the literal [43.60]:
string escaped = Regex.Escape(@"[43.60]");
gives the string with content: \[43\.60].
You can then use this escaped content to create a regex; for example:
string find = "43?60";
string escaped = Regex.Escape(find);
bool match = Regex.IsMatch("abc 43?60", escaped);
Note that in many cases you will want to combine the escaped string with some other regex fragment to make a complete pattern.
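As an illustration of combining the escaped string with other fragments, here is a sketch that only accepts the literal when it stands alone as a whitespace-delimited token. LiteralSearch and ContainsToken are hypothetical names, and the token rule is just an assumed requirement for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralSearch
{
    // Hypothetical helper: true when `literal` occurs in `text` as a whole
    // token, i.e. with whitespace or the string boundary on both sides.
    public static bool ContainsToken(string text, string literal)
    {
        // Only the user-supplied part is escaped; the lookarounds stay regex.
        string pattern = @"(?<!\S)" + Regex.Escape(literal) + @"(?!\S)";
        return Regex.IsMatch(text, pattern);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsToken("abc 43.60 def", "43.60"));   // True
        Console.WriteLine(ContainsToken("fake 43 60 fake", "43.60")); // False: the dot is literal
        Console.WriteLine(ContainsToken("abc 143.605", "43.60"));     // False: not a whole token
        Console.WriteLine(ContainsToken("abc 43?60", "43?60"));       // True
    }
}
```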

problem in regular expression

I have a regular expression
Regex r = new Regex(@"(\s*)([A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]\d(?!.*[DFIOQU])(?:[A-Z](\s?)\d[A-Z]\d))(\s*)",RegexOptions.IgnoreCase);
and a string
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
I have to fetch C1C 1C1. This works fine.
But if I modify the test string to
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
then it is unable to find the pattern, i.e. C1C 1C1.
Any idea why this expression is failing?
You have a negative look ahead:
(?!.*[DFIOQU])
That matches the "O" in "ON" and since it is a negative look ahead, the whole pattern fails. And, as an aside, I think you want to replace this:
[A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]
With this:
[A-CEGHJ-NPR-TVYX]
A pipe (|) is a literal character inside a character class, not an alternation, and you can use ranges to help highlight the characters that you're leaving out.
A single regex might not be the best way to parse that string. Or perhaps you just need a looser regex.
Your negative lookahead (?!.*[DFIOQU]) requires that none of the characters DFIOQU appears anywhere after that position.
In your second string there is an O at the end, in ON, so the match fails.
If you remove the .* from the negative lookahead, it checks only the character that directly follows instead of the rest of the string (is this what you want?).
\s*([ABCEGHJKLMNPRSTVYX]\d(?![DFIOQU])(?:[A-Z]\s?\d[A-Z]\d))\s*
With this pattern it works, see it here on Regexr. It now checks only that the character directly after the digit is not one of the characters in the class; I don't know if this is intended.
By the way, I removed the | from your first character class, since it's not needed, and also some parentheses around your whitespace, which are also not needed.
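To compare the two lookaheads side by side, here is a rough sketch. LookaheadScope and Find are made-up names, and the pattern is the simplified one with only the lookahead swapped out.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadScope
{
    // The simplified pattern, split around the lookahead so it can be swapped.
    const string Head = @"\s*([ABCEGHJKLMNPRSTVYX]\d";
    const string Tail = @"(?:[A-Z]\s?\d[A-Z]\d))\s*";

    // Returns group 1 of the first match, or "" when nothing matches.
    public static string Find(string input, string lookahead) =>
        Regex.Match(input, Head + lookahead + Tail, RegexOptions.IgnoreCase)
             .Groups[1].Value;

    public static void Main()
    {
        string second = "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";

        // With .* the lookahead scans to the end of the string and sees the O in ON.
        Console.WriteLine("[" + Find(second, @"(?!.*[DFIOQU])") + "]"); // prints: []

        // Without .* only the character right after the digit is checked.
        Console.WriteLine("[" + Find(second, @"(?![DFIOQU])") + "]");   // prints: [C1C 1C1]
    }
}
```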
As I understand it, you need to find the C1C 1C1 text in your string.
I used this regex to do that:
string strRegex = @"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
After that you can extract the text from the named groups:
string strRegex = @"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
RegexOptions myRegexOptions = RegexOptions.Multiline;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
string secondStr = "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
Match match = myRegex.Match(strTargetString);
string c1c = match.Groups["c1c"].Value;
string c1c2 = match.Groups["c1c2"].Value;
Console.WriteLine(c1c + " " +c1c2);
