How to replace multiple occurrences in single pass? - c#

I have the following string:
abc
def
abc
xyz
pop
mmm
091
abc
I need to replace each occurrence of abc, in order, with the next element of the array ["123", "456", "789"], so the final string will look like this:
123
def
456
xyz
pop
mmm
091
789
I would like to do it without iteration, with just a single expression. How can I do it?

Here is a "single expression version":
edit: Delegate instead of Lambda for 3.5
string[] replaces = {"123","456","789"};
Regex regEx = new Regex("abc");
int index = 0;
string result = regEx.Replace(input, delegate(Match match) { return replaces[index++];} );

"do it without iteration, with just single expression"
This example uses the static Regex.Replace(String, String, MatchEvaluator) method. The MatchEvaluator delegate (System.Text.RegularExpressions) replaces each match with the next item from a queue, and Regex.Replace returns the resulting string:
var data =
@"abc
def
abc
xyz
pop
mmm
091
abc";
var replacements = new Queue<string>(new[] {"123", "456", "789"});
string result = Regex.Replace(data,
    "(abc)",  // Each match is replaced with the next queue item
    (mt) =>   // rather than with one fixed string.
    {
        return replacements.Dequeue();
    });
Result
123
def
456
xyz
pop
mmm
091
789
.NET 3.5 delegate version
For the constraint "whereas I am limited to 3.5", the same call written with an anonymous delegate:
Regex.Replace(data, "(abc)", delegate(Match match) { return replacements.Dequeue(); } )
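
One caveat with both versions above: if the input contains more matches than there are replacement values, Dequeue() throws InvalidOperationException (and the array version throws IndexOutOfRangeException). A minimal sketch of one way to guard against that, assuming surplus matches should simply be left unchanged. The class and method names are hypothetical, not part of the original answers:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SequentialReplacer
{
    // Replaces successive matches of 'pattern' with successive items of 'values'.
    // Assumption: once 'values' runs out, the remaining matches are kept as-is.
    public static string ReplaceInOrder(string input, string pattern, IEnumerable<string> values)
    {
        var queue = new Queue<string>(values);
        return Regex.Replace(input, pattern,
            m => queue.Count > 0 ? queue.Dequeue() : m.Value);
    }
}
```

For example, SequentialReplacer.ReplaceInOrder("abc def abc xyz abc abc", "abc", new[] { "123", "456", "789" }) should give "123 def 456 xyz 789 abc".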

Related

C# regular expression

I have a string like this:
{F971h}[0]<0>some result code: 1
and I want to split it into:
F971
0
0
some result code: 1
I know I can first split it on "{|}|[|]|<|>" into:
{F971h}
[0]
<0>
some result code: 1
and next: {F971h} -> F971; [0] -> 0; etc.
But how can I do it with one regular expression?
I tried something like this:
Regex rgx = new Regex(@"(?<timestamp>[0-9A-F]+)" + @"(?<subsystem>\d+)" + @"(?<level>\d+)" + @"(?<messagep>[0-9A-Za-z]+)");
var result = rgx.Matches(input);

You can try just Split without any regular expressions:
string source = "{F971h}[0]<0>some result code: 1";
string[] items = source.Split(new char[] { '{', '}', '[', ']', '<', '>' },
StringSplitOptions.RemoveEmptyEntries);
Test:
// F971h
// 0
// 0
// some result code: 1
Console.Write(String.Join(Environment.NewLine, items));

There are two issues with your regex:
You do not allow lowercase ASCII letters in the first capture group (add a-z or a RegexOptions.IgnoreCase flag)
The delimiting characters are missing in the pattern (<, >, [, ], etc.)
Use
{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>.+)
^                 ^^^      ^^^                 ^^             ^
Since the messagep group should match just the rest of the line, I suggest just using .+ at the end. Else, you'd need to replace your [0-9A-Za-z]+ that does not allow whitespace with something like [\w\s]+ (match all word chars and whitespaces, 1 or more times).
C# code:
var s = @"{F971h}[0]<0>some result code: 1";
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>.+)";
var m = Regex.Match(s, pat);
if (m.Success)
{
    Console.Out.WriteLine(m.Groups["timestamp"].Value);
    Console.Out.WriteLine(m.Groups["subsystem"].Value);
    Console.Out.WriteLine(m.Groups["level"].Value);
    Console.Out.WriteLine(m.Groups["messagep"].Value);
}
Or for a multiline string containing multiple matches:
var s = "{F971h}[0]<0>some result code: 1\r\n{FA71h}[0]<0>some result code: 3\r\n{FB72h}[0]<0>some result code: 5";
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>[^\r\n]+)";
var res = System.Text.RegularExpressions.Regex.Matches(s, pat)
    .Cast<System.Text.RegularExpressions.Match>()
    .Select(x => new[] {
        x.Groups["timestamp"].Value,
        x.Groups["subsystem"].Value,
        x.Groups["level"].Value,
        x.Groups["messagep"].Value})
    .ToList();

You can get it like that:
string line = @"{F971h}[0]<0>some result code: 1";
var matchCollection = Regex.Matches(line, @"\{(?<timestamp>.*?)\}\[(?<subsystem>.*?)\]<(?<level>.*?)>(?<messagep>.*)");
if (matchCollection.Count > 0)
{
    string timestamp = matchCollection[0].Groups["timestamp"].Value;
    string subsystem = matchCollection[0].Groups["subsystem"].Value;
    string level = matchCollection[0].Groups["level"].Value;
    string messagep = matchCollection[0].Groups["messagep"].Value;
    Console.Out.WriteLine("First part is {0}, second: {1}, third: {2}, last: {3}", timestamp, subsystem, level, messagep);
}
else
{
    Console.Out.WriteLine("No match found.");
}
You can watch it live here on regex storm. You'll have to learn about:
Named capture groups
Repetitions

Thank you all! The code below works for me. I missed that the input can contain multiple lines:
{F971h}[0]<0>some result code: 1\r\n{FA71h}[0]<0>some result code: 3\r\n{FB72h}[0]<0>some result code: 5
code:
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<message>.+)";
var collection = Regex.Matches(input, pat);
foreach (Match m in collection)
{
    var timestamp = m.Groups["timestamp"];
    var subsystem = m.Groups["subsystem"];
    var level = m.Groups["level"];
    var message = m.Groups["message"];
}

Get a specific part from a string based on a pattern

I have a string in this format:
ABCD_EFDG20120700.0.xml
This has a pattern which has three parts to it:
First is the set of chars before the '_', the 'ABCD'
Second are the set of chars 'EFDG' after the '_'
Third are the remaining 20120700.0.xml
I can split the original string and get the number(s) from the second element of the split result using this pattern:
\d+
Match m = Regex.Match(splitname[1], "\\d+");
That returns only '20120700'. But I need '20120700.0'.
How do I get the required string?

You can extend your regex to look for any number of digits, then period and then any number of digits once again:
Match m = Regex.Match(splitname[1], "\\d+\\.\\d+");
Although with such regular expression you don't even need to split the string:
string s = "ABCD_EFDG20120700.0.xml";
Match m = Regex.Match(s, "\\d+\\.\\d+");
string result = m.Value; // result is 20120700.0

I suggest using a single regex operation to get everything you want, like this:
var rgx = new Regex(@"^([^_]+)_([^\d.]+)([\d.]+\d+)\.(.*)$");
var matches = rgx.Matches(input);
if (matches.Count > 0)
{
    Console.WriteLine("{0}", matches[0].Groups[0]); // The whole input string
    Console.WriteLine("{0}", matches[0].Groups[1]); // ABCD
    Console.WriteLine("{0}", matches[0].Groups[2]); // EFDG
    Console.WriteLine("{0}", matches[0].Groups[3]); // 20120700.0
    Console.WriteLine("{0}", matches[0].Groups[4]); // xml
}

c# RegExp shows only one capture

private String pattern = @"^{(.*)|(.*)}$";
ret = groups[0].Captures.Count.ToString(); // returns 1
Shouldn't it return 2 captures, since I have two () groups in my RegExp?
My string, for example:
{test1 | test2}
The first capture should be test1 and the second test2, but I get the whole string back and the capture count is 1. Why is that?
UPDATE:
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(_sourceString);
String ret = "";
foreach (Match match in matches)
{
    GroupCollection groups = match.Groups;
    ret = groups[0].Captures[0].Value;
}
return ret; // returns the whole string, but I expected 'test1'

| has a special meaning in regular expression. A|B matches A or B.
To match | literally, you need to escape it:
@"^{(.*)\|(.*)}$";
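
Two further details help explain the original observation: Groups[0] is always the whole match, and Captures.Count on a group counts how many times that one group captured, not how many groups the pattern has. A minimal sketch, assuming the whitespace around each part should be trimmed (the \s* and the lazy .*? are additions for this illustration, and the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeSplitDemo
{
    // Returns the two parts of "{left | right}", or null if the input does not match.
    public static string[] Parts(string source)
    {
        Match m = Regex.Match(source, @"^\{\s*(.*?)\s*\|\s*(.*?)\s*\}$");
        if (!m.Success) return null;
        // Groups[0] is the entire match; the parenthesized groups start at index 1.
        return new[] { m.Groups[1].Value, m.Groups[2].Value };
    }
}
```

With the input {test1 | test2}, Parts should return test1 and test2.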

Regex replace with separate replacement depending on the match

Suppose I have a dictionary of words and their replacements:
var replacements = new Dictionary<string,string>();
replacements.Add("str", "string0");
replacements.Add("str1", "string1");
replacements.Add("str2", "string2");
...
and an input string:
string input = @"#str is a #str is a [str1] is a &str1 #str2 one test $str2 also [str]";
Edit:
Expected output :
string0 is a string0 is a string1 is a string1 string2 one test string2 also string0
I want to replace all occurrences of '[CharSymbol]word' with the corresponding value from the dictionary,
where CharSymbol can be any of @#$%^&*[] and so on; a ']' after the word is also valid, i.e. [str].
I tried the following replacement:
string input = @"#str is a #str is a [str1] is a &str1 #str2 one test $str2 also [str]";
string pattern = @"(([@$&#\[]+)([a-zA-Z])*(\])*)"; // correct?
Regex re = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
// string outputString = re.Replace(input,"string0");
string newString = re.Replace(input, match =>
{
    Debug.WriteLine(match.ToString()); // match is e.g. [str] or #str
    string lookup = "~"; // default value
    replacements.TryGetValue(match.Value, out lookup);
    return lookup;
});
How do I get the match as str, str1, etc., i.e. the word without the CharSymbol?

Change your regex to this:
// Move the * inside the parentheses around [a-zA-Z], so the group captures the whole word.
// Change [a-zA-Z] to \w, to include digits.
string pattern = @"(([@$&#\[]+)(\w*)(\])*)";
Change this line:
replacements.TryGetValue(match.Value,out lookup);
to this:
replacements.TryGetValue(match.Groups[3].Value,out lookup);
Note: Your IgnoreCase isn't necessary, since you match both upper and lower case in the regex.
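
Putting the pieces together, here is a minimal, self-contained sketch of the dictionary lookup through a MatchEvaluator. It assumes the symbol set @ # $ & [ and that unknown words should be left untouched; the class name and the named group "word" are hypothetical. Note that TryGetValue sets its out argument to null when the key is missing, so the "~" default in the question's code is overwritten either way.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SymbolReplacer
{
    // One or more leading symbols, the word itself (group "word"), an optional closing bracket.
    private static readonly Regex Pattern = new Regex(@"[@#$&\[]+(?<word>\w+)\]?");

    public static string Replace(string input, IDictionary<string, string> replacements)
    {
        return Pattern.Replace(input, m =>
        {
            string value;
            // Assumption: a word without a dictionary entry keeps its original text.
            return replacements.TryGetValue(m.Groups["word"].Value, out value) ? value : m.Value;
        });
    }
}
```

With the dictionary from the question, "#str is a [str1] and $str2 plus &nope" should become "string0 is a string1 and string2 plus &nope".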

Does this suit?
(?<=[@#&$])(\w+)|[[\w+]]
It matches the following in your example:
#str is a #str is a [str] is a &str1 #str2 one test $str2

Try this regex: ([@$&#\[])[a-zA-Z]*(\])?, and replace with string0.
Your code should look like this:
var replacements = new Dictionary<string, string>
{
    {"str", "string0"},
    {"str1", "string1"},
    {"str2", "string2"}
};
String input = "#str is a #str is a [str] is a &str #str can be done $str is #str";
string output = input;
foreach (var replacement in replacements)
{
    // \b keeps the key "str" from also matching the start of "str1" or "str2".
    string pattern = String.Format(@"([@$&#\[]){0}\b(\])?", Regex.Escape(replacement.Key));
    var re = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
    // Feed each result into the next pass so that all replacements accumulate.
    output = re.Replace(output, replacement.Value);
}

Extract portion of these urls with RegEx and c#

I have to check if these two url match a pattern (or 2, to be more accurate). If so, I'd like to extract some portion of data.
1) /selector/en/any-string-chain-you-want.aspx?FamiId=32
Then I need to extract "en" and "32" into variables. To me, the regex should look roughly like /selector/{0}/any-string-chain-you-want.aspx?FamiId={1}
2) /selector/en/F/32/any-string-chain-you-want.html
where en and 32 must be assigned into variables. So:
/selector/{0}/F/{1}/any-string-chain-you-want.html
{0}: 2 letters language code such as en, fr, nl, es,...
{1}: family id integers (2 or 3 numbers) such as 12, 38, 124, etc but not 1212
Any idea on how to achieve it?
Thanks in advance,
Roland

This is the regex:
/selector/([a-z]{2})/.+?\.aspx\?FamiId=([0-9]+)
Code:
var regex = new Regex(@"/selector/([a-z]{2})/.+?\.aspx\?FamiId=([0-9]+)");
var test = "/selector/en/any-string-chain-you-want.aspx?FamiId=32";
foreach (Match match in regex.Matches(test))
{
    var lang = match.Groups[1].Value;
    var id = Convert.ToInt32(match.Groups[2].Value);
    Console.WriteLine("lang: {0}, id: {1}", lang, id);
}
Regex for second case: /selector/([a-z]{2})/F/([0-9]+)/.+?\.html (code doesn't change)

You should have a look at this tutorial on Regular Expressions.
You could use the following expressions:
\/selector\/([a-z]{2})\/.*\.aspx\?FamiId=([0-9]{2,3})
and
\/selector\/([a-z]{2})\/F\/([0-9]{2,3})\/.*\.html

You can try with:
^/.*?/(\w{2})/(?:F/|.*?FamiId=)(\d{2,3}).*$
It works for both urls.
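
Note that with (\d{2,3}).*$ an id such as 1212 still matches, as 121, because the trailing .* absorbs the remaining digit. A minimal sketch of a stricter variant covering both URL shapes, assuming the path always starts with /selector/ and the id is followed by / or by the end of the string. The class and method names are hypothetical, and the pattern relies on .NET allowing the same group name in both alternatives:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SelectorUrl
{
    // Alternative 1: /selector/{lang}/F/{id}/....html
    // Alternative 2: /selector/{lang}/....aspx?FamiId={id}
    private static readonly Regex Pattern = new Regex(
        @"^/selector/(?<lang>[a-z]{2})/(?:F/(?<id>\d{2,3})/.+\.html|.+\.aspx\?FamiId=(?<id>\d{2,3}))$");

    // Returns true and fills 'lang' and 'familyId' when 'url' has one of the two shapes.
    public static bool TryParse(string url, out string lang, out int familyId)
    {
        lang = null;
        familyId = 0;
        Match m = Pattern.Match(url);
        if (!m.Success) return false;
        lang = m.Groups["lang"].Value;
        familyId = int.Parse(m.Groups["id"].Value);
        return true;
    }
}
```

Both example URLs from the question should parse to en and 32, while a four-digit id such as 1212 should be rejected.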

Case 1
private const string Regex1 = @"/selector/(\w\w)/.+\.aspx\?FamiId=(\d+)";
Case 2
private const string Regex2 = @"/selector/(\w\w)/F/(\d+)/.+\.html";
Usage
Match m = Regex.Match(myString, Regex2);
string lang = m.Groups[1].Value;
string numericValue = m.Groups[2].Value;

string str = @"/selector/en/any-string-chain-you-want.aspx?FamiId=32";
Match m = Regex.Match(str, @"/selector/(\w{2})/.+\.aspx\?FamiId=(\d{2,3})");
string result = String.Format(@"/selector/{0}/F/{1}/any-string-chain-you-want.html", m.Groups[1].Value, m.Groups[2].Value);
There you go.

It's useful to learn a bit about regular expressions for cases like this. RegExr is a free online regex building tool. However, the most useful tool I have found is Expresso.

You can use something like this:
// Named groups need the (?<name>...) syntax, and the literal '?' in the URL must be escaped.
String urlSchema1 = @"/selector/(?<lang>\w\w)/.+\.aspx\?FamiId=(?<FamId>\d+)";
Match mExprStatic1 = Regex.Match(inputString, urlSchema1, RegexOptions.IgnoreCase | RegexOptions.Singleline);
if (mExprStatic1.Success)
{
    String language = mExprStatic1.Groups["lang"].Value;
    String FamId = mExprStatic1.Groups["FamId"].Value;
}
String urlSchema2 = @"/selector/(?<lang>\w\w)/F/(?<FamId>\d+)/.+\.html";
Match mExprStatic2 = Regex.Match(inputString, urlSchema2, RegexOptions.IgnoreCase | RegexOptions.Singleline);
if (mExprStatic2.Success)
{
    String language = mExprStatic2.Groups["lang"].Value;
    String FamId = mExprStatic2.Groups["FamId"].Value;
}
