.NET Regex.Replace substring with special chars

I want to insert a dollar sign at a specific position between two named capturing groups. The problem is that this means two immediately following dollar-signs in the replacement-string which results in problems.
How am I able to do that directly with the Replace-method? I only found a workaround by adding some temporary garbage that I instantly remove again.
See code for the problem:
// We want to add a dollar sign before a number and use named groups for capturing;
// varying parts of the strings are in brackets []
// [somebody] has [some-dollar-amount] in his [something]
string joeHas = "Joe has 500 in his wallet.";
string jackHas = "Jack has 500 in his pocket.";
string jimHas = "Jim has 740 in his bag.";
string jasonHas = "Jason has 900 in his car.";
Regex dollarInsertion = new Regex(@"(?<start>^.*? has )(?<end>\d+ in his .*?$)", RegexOptions.Multiline);
Console.WriteLine(joeHas);
Console.WriteLine(jackHas);
Console.WriteLine(jimHas);
Console.WriteLine(jasonHas);
Console.WriteLine("--------------------------");
joeHas = dollarInsertion.Replace(joeHas, @"${start}$${end}");
jackHas = dollarInsertion.Replace(jackHas, @"${start}$-${end}");
jimHas = dollarInsertion.Replace(jimHas, @"${start}\$${end}");
jasonHas = dollarInsertion.Replace(jasonHas, @"${start}$kkkkkk----kkkk${end}").Replace("kkkkkk----kkkk", "");
Console.WriteLine(joeHas);
Console.WriteLine(jackHas);
Console.WriteLine(jimHas);
Console.WriteLine(jasonHas);
Output:
Joe has 500 in his wallet.
Jack has 500 in his pocket.
Jim has 740 in his bag.
Jason has 900 in his car.
--------------------------
Joe has ${end}
Jack has $-500 in his pocket.
Jim has \${end}
Jason has $900 in his car.

Use this replacement pattern: "${start}$$${end}"
The double $$ escapes the $ so that it is treated as a literal character. The third $ is really part of the named group ${end}. You can read about this on the MSDN Substitutions page.
I would stick with the above approach. Alternately you can use the Replace overload that accepts a MatchEvaluator and concatenate what you need, similar to the following:
jackHas = dollarInsertion.Replace(jackHas,
m => m.Groups["start"].Value + "$" + m.Groups["end"].Value);
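For reference, here is a minimal sketch of the $$ escape applied to the question's own pattern. The class and method names (DollarEscapeDemo, InsertDollar) are made up for illustration; only the regex and the replacement string come from the posts above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DollarEscapeDemo
{
    // Same pattern as in the question: "start" is everything up to " has ",
    // "end" is the amount plus the rest of the line.
    private static readonly Regex DollarInsertion = new Regex(
        @"(?<start>^.*? has )(?<end>\d+ in his .*?$)", RegexOptions.Multiline);

    // Hypothetical helper name, used only for this example.
    public static string InsertDollar(string input)
    {
        // "$$" emits one literal '$'; the "${end}" that follows is the named group.
        return DollarInsertion.Replace(input, "${start}$$${end}");
    }

    public static void Main()
    {
        Console.WriteLine(InsertDollar("Joe has 500 in his wallet."));
        // prints: Joe has $500 in his wallet.
    }
}
```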

Why are you using regex for this in the first place?
string name = "Joe";
int amount = 500;
string place = "car";
string output = string.Format("{0} has ${1} in his {2}",name,amount,place);

Related

Substring until space

I have string like this:
Some data of the string Job ID_Of_the_job some other data of the string
I need to get this ID_Of_the_job
I have this stored in a string variable named notes:
intIndex = notes.IndexOf("Job ")
strJob = notes.Substring(intIndex+4, ???)
I don't know how to get the length of this job ID.
Thanks for help,
Marc
Since you're already using string.IndexOf, here's a solution which builds on that.
Note that there's an overload of String.IndexOf which takes a parameter saying where to start searching.
We've managed to find the beginning of the Job ID, by doing:
int startIndex = notes.IndexOf("Job ") + "Job ".Length;
startIndex is the index of the "I" in "ID_Of_the_job".
We can then use IndexOf again to find the next space -- which will be the space following "ID_Of_the_job":
int endIndex = notes.IndexOf(" ", startIndex);
We can then use Substring:
string jobId = notes.Substring(startIndex, endIndex - startIndex);
Note that there's no error-handling here: if either of the IndexOf fails to find the thing you're looking for, it will return -1, and your code will do strange things. It would be a good idea to handle these cases!
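One way the missing error handling could look is sketched below. The helper name TryGetJobId and the decision to treat a missing trailing space as "the ID runs to the end of the string" are assumptions for illustration, not part of the original answer.

```csharp
using System;

public static class JobIdParser
{
    // Hypothetical helper: returns false instead of misbehaving when "Job " is absent.
    public static bool TryGetJobId(string notes, out string jobId)
    {
        jobId = string.Empty;
        const string marker = "Job ";

        int markerIndex = notes.IndexOf(marker, StringComparison.Ordinal);
        if (markerIndex < 0)
        {
            return false; // the marker was not found
        }

        int startIndex = markerIndex + marker.Length;
        int endIndex = notes.IndexOf(' ', startIndex);
        if (endIndex < 0)
        {
            endIndex = notes.Length; // assumption: no following space means the ID ends the string
        }

        jobId = notes.Substring(startIndex, endIndex - startIndex);
        return jobId.Length > 0;
    }

    public static void Main()
    {
        string id;
        if (TryGetJobId("Some data of the string Job ID_Of_the_job some other data", out id))
        {
            Console.WriteLine(id); // prints: ID_Of_the_job
        }
    }
}
```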
Another, terser solution is to use Regex.
string jobId = Regex.Match(notes, @"Job (\S+)").Groups[1].Value;
The regular expression Job (\S+) looks for the text "Job ", followed by 1 or more non-whitespace characters. It puts those non-whitespace characters into a capture group (which becomes Groups[1]), which we can read out.
In this case, jobId will be an empty string if the regex doesn't match.
See these working on dotnetfiddle.
I think I'd make life easy, split the string on spaces and take the string after the array slot that had Job in it:
var notes = "Some data of the string Job ID_Of_the_job some other data of the string";
var bits = notes.Split();
var job = bits[Array.IndexOf(bits, "Job") + 1];
If you know the job number will occur within the first 10 (say) words, you can stop splitting after a certain number of words, with e.g. Split(new[]{' '}, 10). This gives the first 9 words, then the rest of the string in the 10th slot, which could be a useful performance boost on long strings.
You could also pull this fairly easily with regex:
var r = new Regex("Job (?<j>[^ ]+)"); // greedy: a lazy +? at the end of the pattern would capture only one character
var m = r.Match(notes);
var job = m.Groups["j"].Value;
If you can define the format of a job number more precisely, e.g. "it's 2-3 digits, then an underscore, backslash or hyphen, followed by 4 digits", then you don't even have to use Job to locate it; you can put the pattern into the regex:
var r = new Regex(@"(?<j>\d{2,3}[-_\\]\d{4})");
That will pick out a string of the given pattern (\d digits {2 to 3 of}, then [hyphen or underscore or backslash], then \d digits {4 of}).
You already did the first step: find the string "Job ". The second step is to split the remainder on ' ' and take the first piece, which is the ID.
var input = "Some data of the string Job ID_Of_the_job some other data of the string";
Console.WriteLine(input.Substring(input.IndexOf("Job") + 4).Split(' ')[0]);
Fiddle.

Remove substring if number exists before keyword

I have strings of the form:
5 dogs = 1 medium size house
4 cats = 2 small houses
one bird = 1 bird cage
What I am trying to do is remove the substring before the equals sign, but only if that substring contains a keyword and the data before that keyword is an integer.
So in this example my key words are:
dogs,
cats,
bird
In the above example, the ideal output of my process would be:
1 medium size house
2 small houses
one bird = 1 bird cage
My code so far looks like this (I am hard coding the keyword values/strings for now)
var originalstring = "5 dogs = 1 medium size house";
int equalsindex = originalstring.IndexOf('=');
var prefix = originalstring.Substring(0, equalsindex);
if (prefix.Contains("dogs"))
{
// Strip the prefix and the equals sign
var modifiedstring = originalstring.Replace(prefix, string.Empty).Replace("=", string.Empty);
return modifiedstring;
}
return originalstring;
The issue here is that I am removing the whole substring regardless of whether or not the data preceding the keyword is a number.
Would somebody be able to help me with this additional logic?
Thanks so much as always for anybody who takes a few minutes to read this question.
Mick
You can do it with a simple regex of the form
\d+\s+(?:kw1|kw2|kw3|...)\s*=\s*
where kwX is the corresponding keyword.
var data = new[] {
"5 dogs = 1 medium size house",
"4 cats = 2 small houses",
"one bird = 1 bird cage"
};
var keywords = new[] {"dogs", "cats", "bird"};
var regexStr = string.Format(@"\d+\s+(?:{0})\s*=\s*", string.Join("|", keywords));
var regex = new Regex(regexStr);
foreach (var s in data) {
Console.WriteLine("'{0}'", regex.Replace(s, string.Empty));
}
In the example above the call of string.Format pastes the list of keywords joined by | into the "template" of the expression at the top of the post, i.e.
\d+\s+(?:dogs|cats|bird)\s*=\s*
This expression matches
One or more digits \d+, followed by
One or more whitespace characters \s+, followed by
A keyword from the list: dogs, cats, bird (?:dogs|cats|bird), followed by
Zero or more whitespace characters \s*, followed by
An equal sign =, followed by
Zero or more whitespace characters \s*
The rest is easy: since this regex matches the part that you wish to remove, you need to call Replace and pass it string.Empty.
Demo.
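One caveat with building the pattern from a keyword list: a keyword containing regex metacharacters (a dot or a plus sign, for example) would change the meaning of the expression. A minimal sketch that guards against this with Regex.Escape follows; the class and method names are hypothetical, and the keywords are the ones from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordPrefixRemover
{
    // Builds \d+\s+(?:kw1|kw2|...)\s*=\s* with every keyword escaped.
    public static Regex BuildRegex(string[] keywords)
    {
        string alternatives = string.Join("|", keywords.Select(k => Regex.Escape(k)));
        return new Regex(@"\d+\s+(?:" + alternatives + @")\s*=\s*");
    }

    // Removes the "<number> <keyword> = " prefix when present; otherwise returns the line unchanged.
    public static string RemovePrefix(Regex regex, string line)
    {
        return regex.Replace(line, string.Empty);
    }

    public static void Main()
    {
        Regex regex = BuildRegex(new[] { "dogs", "cats", "bird" });
        Console.WriteLine(RemovePrefix(regex, "5 dogs = 1 medium size house")); // 1 medium size house
        Console.WriteLine(RemovePrefix(regex, "one bird = 1 bird cage"));       // one bird = 1 bird cage
    }
}
```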
You can use regex (System.Text.RegularExpressions) to identify whether or not there is a number in the string.
Regex r = new Regex("[0-9]"); //Look for any digit from 0 to 9
bool hasNumber = r.IsMatch(prefix);
This Regex simply searches for any digit in the string. If you want to search for a digit, a space, and then a letter, you could use [0-9] [a-zA-Z]. The character class covers both upper and lower case letters, so either results in a match.
You can try something like this:
int i;
if(int.TryParse(prefix.Substring(0, 1), out i)) //try to get an int from first char of prefix
{
//remove prefix
}
This will only work for single-digit integers, however.
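To cover multi-digit numbers with the same TryParse idea, one option is to parse the whole first word of the prefix rather than its first character. This is only a sketch: the helper name StartsWithInteger is made up, and it assumes the number and the keyword are separated by spaces.

```csharp
using System;

public static class PrefixCheck
{
    // Hypothetical helper: true when the first space-separated token of the prefix is an integer.
    public static bool StartsWithInteger(string prefix)
    {
        string[] tokens = prefix.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int ignored;
        return tokens.Length > 0 && int.TryParse(tokens[0], out ignored);
    }

    public static void Main()
    {
        Console.WriteLine(StartsWithInteger("5 dogs "));   // True
        Console.WriteLine(StartsWithInteger("12 cats "));  // True
        Console.WriteLine(StartsWithInteger("one bird ")); // False
    }
}
```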

Regex cut number in a string c#

I have a string such as 2 - 5, and I want to get the number 5 from it with Regex in C# (I'm new to Regex). Could you suggest an approach? Thanks
You can use String.Split method simply:
int number = int.Parse("2 - 5".Split('-', ' ').Last());
This works if there is no space after the last number. If there is a trailing space, use:
int number = int.Parse("2 - 5 ".Split('-', ' ')
.Last(x => x.Any() && x.All(char.IsDigit)));
Very simply as follows:
'\s-\s(\d)'
and extract first matching group
@SShashank has the right of it, but I thought I'd supply some code, since you mentioned you were new to Regex:
string s = "something 2-5 another";
Regex rx = new Regex(@"-(\d)");
if (rx.IsMatch(s))
{
Match m = rx.Match(s);
System.Console.WriteLine("First match: " + m.Groups[1].Value);
}
Groups[0] is the entire match and Groups[1] is the first matched group (stuff in parens).
If you really want to use regex, you can simply do:
string text = "2 - 5";
string found = Regex.Match(text, @"\d+", RegexOptions.RightToLeft).Value;
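Because RegexOptions.RightToLeft starts the search from the end of the input, the first match returned is the last number in the string, and \d+ picks up all of its digits. A small sketch, with a hypothetical helper name LastNumber:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastNumberDemo
{
    // Hypothetical helper: returns the last run of digits in the text, or "" if there is none.
    public static string LastNumber(string text)
    {
        return Regex.Match(text, @"\d+", RegexOptions.RightToLeft).Value;
    }

    public static void Main()
    {
        Console.WriteLine(LastNumber("2 - 5"));    // 5
        Console.WriteLine(LastNumber("10 - 250")); // 250
    }
}
```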

Regular expression replace in C#

I'm fairly new to using regular expressions, and, based on a few tutorials I've read, I'm unable to get this step in my Regex.Replace formatted properly.
Here's the scenario I'm working on... When I pull my data from the listbox, I want to format it into a CSV like format, and then save the file. Is using the Replace option an ideal solution for this scenario?
Before the regular expression formatting example.
FirstName LastName Salary Position
-------------------------------------
John Smith $100,000.00 M
Proposed format after regular expression replace
John Smith,100000,M
Current formatting status output:
John,Smith,100000,M
*Note - is there a way I can replace the first comma with a whitespace?
Snippet of my code
using(var fs = new FileStream(filepath, FileMode.OpenOrCreate, FileAccess.Write))
{
using(var sw = new StreamWriter(fs))
{
foreach (string stw in listBox1.Items)
{
StringBuilder sb = new StringBuilder();
sb.AppendLine(stw);
//Piecing the list back to the original format
string sb_trim = Regex.Replace(stw, @"[$,]", "");
sb_trim = Regex.Replace(sb_trim, @"[.][0-9]+", "");
sb_trim = Regex.Replace(sb_trim, @"\s", ",");
sw.WriteLine(sb_trim);
}
}
}
You can do this with two replacements:
//let stw be "John Smith $100,000.00 M"
sb_trim = Regex.Replace(stw, @"\s+\$|\s+(?=\w+$)", ",");
//sb_trim becomes "John Smith,100,000.00,M"
sb_trim = Regex.Replace(sb_trim, @"(?<=\d),(?=\d)|[.]0+(?=,)", "");
//sb_trim becomes "John Smith,100000,M"
sw.WriteLine(sb_trim);
Try this:
sb_trim = Regex.Replace(stw, @"(\D+)\s+\$([\d,]+)\.\d+\s+(.)",
m => string.Format(
"{0},{1},{2}",
m.Groups[1].Value,
m.Groups[2].Value.Replace(",", string.Empty),
m.Groups[3].Value));
This is about as clean an answer as you'll get, at least with regexes.
(\D+): First capture group. One or more non-digit characters.
\s+\$: One or more spacing characters, then a literal dollar sign ($).
([\d,]+): Second capture group. One or more digits and/or commas.
\.\d+: Decimal point, then at least one digit.
\s+: One or more spacing characters.
(.): Third capture group. Any non-line-breaking character.
The second capture group additionally needs to have its commas stripped. You could do this with another regex, but it's really unnecessary and bad for performance. This is why we need to use a lambda expression and string format to piece together the replacement. If it weren't for that, we could just use this as the replacement, in place of the lambda expression:
"$1,$2,$3"
Add the following 2 lines
var regex = new Regex(Regex.Escape(","));
sb_trim = regex.Replace(sb_trim, " ", 1);
If sb_trim is "John,Smith,100000,M", the above code will return "John Smith,100000,M".
This should do the job:
var result=Regex.Replace("John Smith $100,000.00 M", @"^(\w+)\s+(\w+)\s+\$([\d,\.]+)\s+(\w+)$","$1,$2,$3,$4");
//result: "John,Smith,100,000.00,M"
For simplicity, if you just need the digits from the currency value:
Regex.Replace(yourcurrency, "[^0-9]","")
