RegEx - Find and Replace while ignoring a number in the middle? - c#

In the middle of a long string, I am looking for "No. 1234. "
The number (1234) in my example above can be any length whole number. It also has to match on the space at the end.
So I am looking for matches like these:
1) This is a test No. 42. Hello Nice People
2) I have no idea wtf No. 1234412344124. I am doing.
I have figured out a way to match on this pattern with the following regex:
(No. [\d]{1,}. )
What I cannot figure out, though, is how to do one simple thing when finding a match: Replace that last period with a darn comma!
So, with the two examples up above, I want to transform them into:
1) This is a test No. 42, Hello Nice People
2) I have no idea wtf No. 1234412344124, I am doing.
(Notice the commas now after the numbers)
How might one do this in C# and RegEx? Thank you!
EDIT:
Another way of looking at this is...
I can do this easily and have for years:
str = Replace(str, "Find this", "Replace it with this")
However, how can I do that with a regex, given the unknown-length number in the middle, so that only the last period is replaced? (Not to be confused with the last character, since the last character still needs to be a space.)
This is a test No. 42. Hello Nice People
This is a test No. (some unknown length number). Hello Nice People
becomes
This is a test No. 42, Hello Nice People
This is a test No. (some unknown length number), Hello Nice People
(Notice the comma)

So you are essentially trying to match two adjacent groups, "\d+" and ". ", then replace the second with ", ".
var r = new Regex(@"(\d+)(\. )");
var input = "This is a test No. 42. Hello Nice People";
var output = r.Replace(input, "$1, ");
Use the parentheses to capture two groups; then, in the replacement, keep the first group ($1) and put ", " in place of the second.
Edit: derp, escape that period.
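A minimal sketch of the same capture-group idea, assuming the replacement should only happen right after "No. " (so that other "digits, period, space" sequences in the string are left alone). The class and method names `NoCommaFixer` and `FixNumberPeriod` are hypothetical, chosen for illustration.

```csharp
using System.Text.RegularExpressions;

public static class NoCommaFixer
{
    // Group 1 captures "No. " plus the digits. The literal ". " after it is
    // matched but not captured, so the replacement drops it.
    private static readonly Regex Pattern = new Regex(@"(No\. \d+)\. ");

    public static string FixNumberPeriod(string input)
    {
        // $1 writes "No. <digits>" back unchanged, followed by ", ".
        return Pattern.Replace(input, "$1, ");
    }
}
```

Because the pattern is case-sensitive and requires the literal "No. ", the lowercase "no idea" in the second example is not touched.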

Edit - #1:
neilh's way is much better!
OK, I know the code looks ugly; I don't know how to edit the last character of a match directly in a regex.
using System;
using System.Text;
using System.Text.RegularExpressions;
string[] stringhe = new string[5] {
"This is a test No. 42, Hello Nice People",
"I have no idea wtf No. 1234412344124. I am doing.",
"Very long No. 74385748957348957893458934; Hello World",
"Nope No. 48394839!!!",
"Nope"
};
// Escape the period so it matches only a literal "."
Regex reg = new Regex(@"No\.\s*([0-9]+)");
Match match;
int idx = 0;
StringBuilder builder;
foreach(string stringa in stringhe)
{
    match = reg.Match(stringa);
    if (match.Success)
    {
        Console.WriteLine("No. String #" + idx + ": " + stringhe[idx]);
        // Index of the character right after the number
        int indexEnd = match.Groups[1].Index + match.Groups[1].Length;
        // Skip numbers that end the string (there is no character to replace)
        if (indexEnd < stringa.Length)
        {
            builder = new StringBuilder(stringa);
            builder[indexEnd] = ',';
            stringhe[idx] = builder.ToString();
        }
        Console.WriteLine("New String: " + stringhe[idx]);
    }
    ++idx;
}
Console.ReadKey(true);
If you want to edit the character after the number only if it is a '.':
int indexEnd = match.Groups[1].Index + match.Groups[1].Length;
if (indexEnd < stringa.Length && stringa[indexEnd] == '.')
{
    builder = new StringBuilder(stringa);
    builder[indexEnd] = ',';
    stringhe[idx] = builder.ToString();
    Console.WriteLine("New String: " + stringhe[idx]);
}
Or (better anyway), we can edit the regex so that it only matches when the number is followed by a period:
No\.\s*([0-9]+)\.
I'm not the best at Regex, but this should do what you want.
No\.\s+([0-9]+)
If you expect zero or more whitespace characters between "No." and the number, this regex should do the job:
No\.\s*([0-9]+)
Here is an example of what the C# code could look like:
string[] stringhe = new string[4] {
"This is a test No. 42, Hello Nice People",
"I have no idea wtf No. 1234412344124. I am doing.",
"Very long No. 74385748957348957893458934; Hello World",
"Nope No. 48394839!!!"
};
Regex reg = new Regex(@"No\.\s+([0-9]+)");
Match match;
int idx = 0;
foreach(string stringa in stringhe)
{
    match = reg.Match(stringa);
    if (match.Success)
    {
        Console.WriteLine("No. String #" + idx + ": " + match.Groups[1].Value);
    }
    ++idx;
}

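Here is the code:
private string Format(string input)
{
    // Escape the periods and require at least one digit
    Match m = Regex.Match(input, @"No\. [0-9]+\.");
    // Leave the input unchanged when it contains no "No. <number>."
    if (!m.Success)
    {
        return input;
    }
    // Index of the final "." of the match
    int targetIndex = m.Index + m.Length - 1;
    return input.Remove(targetIndex, 1).Insert(targetIndex, ",");
}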
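.NET regular expressions support variable-length lookbehind, so another option is to match only the period and assert what surrounds it. With this approach nothing has to be captured and copied back. A sketch under that assumption; `LookbehindFixer` and `Fix` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class LookbehindFixer
{
    // (?<=No\. \d+) requires "No. " and one or more digits immediately before
    // the match, and (?= ) requires a space right after it. Only the "." itself
    // is consumed, so it is the only character that gets replaced.
    private static readonly Regex Pattern = new Regex(@"(?<=No\. \d+)\.(?= )");

    public static string Fix(string input)
    {
        return Pattern.Replace(input, ",");
    }
}
```

The period after "No" is never replaced, because the lookbehind needs "No. " plus digits in front of the match.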

Related

How do I replace exact phrases in C# String.Replace?

I am trying to ensure that each phrase in a list starts on its own line, by finding the phrase and replacing it with \n + the phrase. For example:
your name: joe your age: 28
becomes
your name: joe
your age: 28
I have a file of phrases that I read and loop through, doing the replace for each one. Because some phrases contain two words, I use \b to mark where the phrase starts and ends.
This doesn't seem to work. Does anybody know why?
For example, the string 'Name: xxxxxx' does not get edited.
output = output.Replace('\b' + "Name" + '\b', "match");
Using a regular expression that accounts for any number of words separated by any number of spaces:
using System.Text.RegularExpressions;
Regex re = new Regex("(?<key>\\w+(\\b\\s+\\w+)*)\\s*:\\s*(?<value>\\w+)");
MatchCollection mc = re.Matches("your name: joe your age: 28 ");
foreach (Match m in mc) {
    string key = m.Groups["key"].Value;
    string value = m.Groups["value"].Value;
    //accumulate into a list, but I'll just write to console
    Console.WriteLine(key + " : " + value);
}
Here is some explanation:
Suppose that what is to the left of the colon (:) is called a key, and what is to the right, a value.
These key/value pairs are separated by at least one space. Because of this, the value has to be exactly one word (otherwise we'd have ambiguity).
The regular expression above uses named groups to make the code more readable.
Got it:
for (int headerNo = 0; headerNo < PhraseArray.Length; headerNo++)
{
    // Regex.Escape keeps characters such as "." or "?" in a phrase literal
    string searchPhrase = @"\b" + Regex.Escape(PhraseArray[headerNo]) + @"\b";
    string newPhrase = "match";
    output = Regex.Replace(output, searchPhrase, newPhrase);
}
Following your example, you can do this:
output = output.Replace("your", "\nyour");
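For the original goal (each phrase starting on its own line), a sketch using `Regex.Replace` instead of `String.Replace`. It assumes the phrases are plain text read from a file, so each one is passed through `Regex.Escape` before the `\b` word boundaries are added, and it assumes the spaces in front of a phrase should be replaced by the newline. `PhraseSplitter` and `PutPhrasesOnOwnLines` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class PhraseSplitter
{
    public static string PutPhrasesOnOwnLines(string text, string[] phrases)
    {
        foreach (string phrase in phrases)
        {
            // @"\b" is a regex word boundary. The char literal '\b' is a
            // backspace character, and String.Replace matches literally,
            // so the question's version never found anything.
            // \s* consumes the spaces before the phrase so the newline
            // replaces them instead of leaving trailing spaces behind.
            string pattern = @"\s*(\b" + Regex.Escape(phrase) + @"\b)";
            text = Regex.Replace(text, pattern, "\n$1");
        }
        // A phrase at the very start of the text gets a leading newline; drop it.
        return text.TrimStart('\n');
    }
}
```

For "your name: joe your age: 28" with the phrases "your name" and "your age", this yields "your name: joe" and "your age: 28" on separate lines.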

Regex cut number in a string c#

I have a string like "2 - 5", and now I want to get the number 5 with a regex in C# (I'm new to regex). Could you suggest an idea? Thanks.
You can simply use the String.Split method (Last() requires using System.Linq;):
int number = int.Parse("2 - 5".Split('-', ' ').Last());
This works if there is no space after the last number. If there is, then:
int number = int.Parse("2 - 5 ".Split('-', ' ')
.Last(x => x.Any() && x.All(char.IsDigit)));
Very simply, as follows:
\s-\s(\d+)
and extract the first capturing group.
@SShashank has the right of it, but I thought I'd supply some code, since you mentioned you were new to Regex:
string s = "something 2-5 another";
Regex rx = new Regex(@"-(\d+)");
if (rx.IsMatch(s))
{
    Match m = rx.Match(s);
    System.Console.WriteLine("First match: " + m.Groups[1].Value);
}
Groups[0] is the entire match and Groups[1] is the first capturing group (the part in parentheses).
If you really want to use regex, you can simply do:
string text = "2 - 5";
string found = Regex.Match(text, @"\d+", RegexOptions.RightToLeft).Value;
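If the number you want is specifically the one after the dash, a sketch that anchors on the dash, allows optional whitespace and multi-digit numbers, and reports a missing match instead of throwing. `DashNumber` and `AfterDash` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class DashNumber
{
    // "-", optional whitespace, then one or more digits captured in group 1.
    private static readonly Regex Pattern = new Regex(@"-\s*(\d+)");

    public static int? AfterDash(string text)
    {
        Match m = Pattern.Match(text);
        if (!m.Success)
        {
            return null; // no "-<number>" in the input
        }
        // Groups[1] holds only the digits; Groups[0] would include the dash.
        return int.Parse(m.Groups[1].Value);
    }
}
```

Trailing spaces after the number do not matter here, since the pattern stops at the last digit.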

C# Regex wildcard multiple replace

I am searching for different strings using wildcards, for example a search for test0? (there is a space after the ?). The strings the search produces are:
test01
test02
test03
(and so on)
The replacement text should be for example:
test0? -
The wildcard above in test0? - represents the 1, 2, or 3...
So, the replacement strings should be:
test01 -
test02 -
test03 -
string pattern = WildcardToRegex(originalText);
fileName = Regex.Replace(originalText, pattern, replacementText);
public string WildcardToRegex(string pattern)
{
return "^" + System.Text.RegularExpressions.Regex.Escape(pattern).
Replace("\\*", ".*").Replace("\\?", ".") + "$";
}
The problem is saving the new string with the original character(s) plus the added characters. I could search the string and save the original with some string manipulation, but that seems like too much overhead. There has to be an easier way.
Thanks for any input.
EDIT:
Search for strings using the wildcard ?
Possible string are:
test01 someText
test02 someotherText
test03 moreText
Using Regex, the search string pattern will be:
test0? -
So, each string should then read:
test01 - someText
test02 - someotherText
test03 - moreText
How do I keep the character that was matched by the regex wildcard '?'?
As my code stands, it will come out as test? - someText
That is wrong.
Thanks.
EDIT Num 2
First, thanks everyone for their answers and direction.
It did help and led me onto the right track, and now I can better ask the exact question:
It has to do with substitution.
Inserting text after the Regex.
The sample strings I gave may not always be in that format. I have been looking into substitution but just can't seem to get the syntax right. And I am using VS 2008.
Any more suggestions?
Thanks
If you want to replace "test0? " with "test0? -", you would write:
string bar = Regex.Replace(foo, "^test0. ", "$0- ");
The key here is the $0 substitution, which will include the matched text.
So if I understand your question correctly, you just want your replacementText to be "$0- ".
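A sketch that combines the question's wildcard conversion with the `$0` substitution. It assumes a variant of `WildcardToRegex` that drops the trailing `$` anchor, because "test0? " only matches the beginning of lines such as "test01 someText"; with the `$` in place those lines would not match at all. `WildcardReplace`, `PrefixWildcardToRegex` and `AddDash` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class WildcardReplace
{
    // Same conversion as the question's WildcardToRegex, but without the
    // trailing "$", so the pattern only has to match the start of the line.
    public static string PrefixWildcardToRegex(string pattern)
    {
        return "^" + Regex.Escape(pattern)
            .Replace("\\*", ".*")
            .Replace("\\?", ".");
    }

    public static string AddDash(string line, string wildcard)
    {
        // $0 is the whole matched text (e.g. "test01 "), so whatever character
        // the ? matched is kept, and "- " is appended right after it.
        return Regex.Replace(line, PrefixWildcardToRegex(wildcard), "$0- ");
    }
}
```

Lines that do not start with the wildcard text are returned unchanged.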
If I understand the question correctly, couldn't you just use a match?
//Convert the wildcard text (searchText, e.g. "test0? ") to a regex with WildcardToRegex from the question
Regex regex = new Regex(WildcardToRegex(searchText));
//Replace each match with the original matched value + " -"
string result = regex.Replace(originalText, m => m.Groups[0].Value + " -");
So I'm not 100% clear on what you're doing, but I'll give it a try.
I'm going with the assumption that you want to use "file wildcards" (?/*) to search for a set of values that match (while retaining the values matched by the placeholders themselves), then replace them with the new value (re-inserting those placeholder values). Given that, and with probably a lot of overkill (since your requirement is kind of weird), here's what I came up with:
// Helper function to turn the file search pattern in to a
// regex pattern.
private Regex BuildRegexFromPattern(String input)
{
String pattern = String.Concat(input.ToCharArray().Select(i => {
String c = i.ToString();
return c == "?" ? "(.)"
: c == "*" ? "(.*)"
: c == " " ? "\\s"
: Regex.Escape(c);
}));
return new Regex(pattern);
}
// perform the actual replacement
private IEnumerable<String> ReplaceUsingPattern(IEnumerable<String> items, String searchPattern, String replacementPattern)
{
Regex searchRe = BuildRegexFromPattern(searchPattern);
return items.Where(s => searchRe.IsMatch(s)).Select (s => {
Match match = searchRe.Match(s);
Int32 m = 1;
return String.Concat(replacementPattern.ToCharArray().Select(i => {
String c = i.ToString();
if (c != "?" && c != "*")
{
return c;
}
// Groups[0] is the whole match, so the placeholder groups are 1 .. Count - 1
if (m >= match.Groups.Count)
{
throw new InvalidOperationException("Replacement placeholders exceed locator placeholders.");
}
return match.Groups[m++].Value;
}));
});
}
Then, in practice:
String[] samples = new String[]{
"foo01", "foo02 ", "foo 03",
"bar0?", "bar0? ", "bar03 -",
"test01 ", "test02 ", "test03 "
};
String searchTemplate = "test0? ";
String replaceTemplate = "test0? -";
var results = ReplaceUsingPattern(samples, searchTemplate, replaceTemplate);
Which, from the samples list above, gives me:
matched:    modified to:
test01      test01 -
test02      test02 -
test03      test03 -
However, if you really want to save yourself headaches, you should be using replacement references; there's no need to reinvent the wheel. The above, with replacements, could have been written as:
Regex searchRe = new Regex(@"test0(.)\s");
var results = samples.Select(x => searchRe.Replace(x, "test0$1 -"));
You can capture any piece of the matched string with parentheses "()" and place it anywhere in the replacement string, using the $ symbol followed by the index of the captured group (numbering starts at 1).
Example:
If I have several strings containing testXYZ, where XYZ is a 3-digit number, and I need to replace each one with, say, testZYX (reversing the 3 digits), I would do:
string result = Regex.Replace(source, "test([0-9])([0-9])([0-9])", "test$3$2$1");
So, in your case, it can be done like this:
string result = Regex.Replace(source, "test0([0-9]) ", "test0$1 - ");

How to find the number of occurrences of a letter in only the first sentence of a string?

I want to find the number of occurrences of the letter "a" in only the first sentence. The code below counts "a" in all sentences, but I want it counted only in the first sentence.
static void Main(string[] args)
{
string text; int k = 0;
text = "bla bla bla. something second. maybe last sentence.";
foreach (char a in text)
{
char b = 'a';
if (b == a)
{
k += 1;
}
}
Console.WriteLine("number of a in first sentence is " + k);
Console.ReadKey();
}
This splits the string into an array on '.', '!' and '?', then counts the number of 'a' characters in the first element of the array (the first sentence). Count() requires using System.Linq;:
var count = text.Split(new[] { '.', '!', '?' })[0].Count(c => c == 'a');
This example assumes a sentence is separated by a ., ? or !. If you have a decimal number in your string (e.g. 123.456), that will count as a sentence break. Breaking up a string into accurate sentences is a fairly complex exercise.
This is perhaps more verbose than what you were looking for, but hopefully it'll breed understanding as you read through it.
public static void Main()
{
//Make an array of the possible sentence enders. Doing this pattern lets us easily update
// the code later if it becomes necessary, or allows us easily to move this to an input
// parameter
string[] SentenceEnders = new string[] {"$", @"\.", @"\?", @"\!" /* Add Any Others */};
string WhatToFind = "a"; //What are we looking for? Regular Expressions Will Work Too!!!
string SentenceToCheck = "This, but not to exclude any others, is a sample."; //First example
string MultipleSentencesToCheck = @"
Is this a sentence
that breaks up
among multiple lines?
Yes!
It also has
more than one
sentence.
"; //Second Example
//This will split the input on all the enders put together (by way of joining them in [] inside a regular
// expression).
string[] SplitSentences = Regex.Split(SentenceToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase);
//SplitSentences is an array, with sentences on each index. The first index is the first sentence
string FirstSentence = SplitSentences[0];
//Now, split that single sentence on our matching pattern for what we should be counting
string[] SubSplitSentence = Regex.Split(FirstSentence, WhatToFind, RegexOptions.IgnoreCase);
//Now that it's split, it's split a number of times that matches how many matches we found, plus one
// (the "left over" piece is the +1)
int HowMany = SubSplitSentence.Length - 1;
System.Console.WriteLine(string.Format("We found, in the first sentence, {0} '{1}'.", HowMany, WhatToFind));
//Do all this again for the second example. Note that ideally, this would be in a separate function
// and you wouldn't be writing code twice, but I wanted you to see it without all the comments so you can
// compare and contrast
SplitSentences = Regex.Split(MultipleSentencesToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase | RegexOptions.Singleline);
SubSplitSentence = Regex.Split(SplitSentences[0], WhatToFind, RegexOptions.IgnoreCase | RegexOptions.Singleline);
HowMany = SubSplitSentence.Length - 1;
System.Console.WriteLine(string.Format("We found, in the second sentence, {0} '{1}'.", HowMany, WhatToFind));
}
Here is the output:
We found, in the first sentence, 3 'a'.
We found, in the second sentence, 4 'a'.
You didn't define "sentence", but if we assume it's always terminated by a period (.), just add this inside the loop:
if (a == '.') {
break;
}
Expand from this to support other sentence delimiters.
Simply "break" out of the foreach(...) loop when you encounter a "." (period).
Well, assuming you define a sentence as ending with a '.', use String.IndexOf() to find the position of the first '.'. After that, search in a Substring instead of the entire string.
1) Find the position of the '.' in the text (you can also use Split).
2) Count the 'a' characters in the text from position 0 up to that '.'.
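The two steps above as a sketch. It assumes a sentence ends at the first '.', '!' or '?' (so the decimal-number caveat mentioned earlier still applies) and counts only the exact character passed in; `FirstSentence.CountLetter` is a hypothetical helper.

```csharp
using System.Linq;

public static class FirstSentence
{
    public static int CountLetter(string text, char letter)
    {
        // Position of the first sentence terminator, or -1 if there is none.
        int end = text.IndexOfAny(new[] { '.', '!', '?' });
        // With no terminator, treat the whole string as one sentence.
        string first = end >= 0 ? text.Substring(0, end) : text;
        return first.Count(c => c == letter);
    }
}
```

For the question's text, "bla bla bla. something second. maybe last sentence.", only "bla bla bla" is searched, giving 3.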
string SentenceToCheck = "Hi, I can wonder this situation where I can do best";
//Here I am giving several ways to find this
//Using Regular Expressions (the Split version ignores case, the Matches version does not)
int HowMany = Regex.Split(SentenceToCheck, "a", RegexOptions.IgnoreCase).Length - 1;
int i = Regex.Matches(SentenceToCheck, "a").Count;
// Simple way (case-sensitive)
int Count = SentenceToCheck.Length - SentenceToCheck.Replace("a", "").Length;
//Linq, method syntax (case-insensitive)
int _lamdaCount = SentenceToCheck.Count(t => char.ToUpper(t) == 'A');
//Linq, query syntax (case-insensitive)
var _linqAIEnumareable = from _char in SentenceToCheck
                         where char.ToUpper(_char) == 'A'
                         select _char;
int a = _linqAIEnumareable.Count();
//Linq, query syntax (case-sensitive)
var _linqCount = from g in SentenceToCheck
                 where g == 'a'
                 select g;
int b = _linqCount.Count();

String split using C#

I have the following string:
string text = "1. This is first sentence. 2. This is the second sentence. 3. This is the third sentence. 4. This is the fourth sentence.";
I want to split it according to 1. 2. 3. and so on:
result[0] == "This is first sentence."
result[1] == "This is the second sentence."
result[2] == "This is the third sentence."
result[3] == "This is the fourth sentence."
Is there any way I can do it in C#?
Assuming this pattern never occurs inside your sentences: X. (an integer, followed by a period, followed by a space), this should work:
String[] result = Regex.Split(text, @"[0-9]+\. ");
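Note that `Regex.Split` also returns an empty string for the (empty) text in front of "1. ", and each piece keeps the space that preceded the next number. A sketch that trims the pieces and drops the empty ones, under the same assumption that "number, period, space" never appears inside a sentence; `NumberedSplit` is a hypothetical name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberedSplit
{
    public static string[] Split(string text)
    {
        return Regex.Split(text, @"[0-9]+\. ")
            // Remove the space left over before each "N. " marker.
            .Select(s => s.Trim())
            // The text before "1. " is empty, so drop empty pieces.
            .Where(s => s.Length > 0)
            .ToArray();
    }
}
```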
Is it possible that there will be numbers in the sentences too?
As I do not know your formatting, and you already said you cannot split on EOL/new line, I would try something like...
List<string> lines = new List<string>();
string buffer = "";
int count = 1;
foreach(char c in input)
{
if(c.ToString() == count.ToString())
{
if(!string.IsNullOrEmpty(buffer))
{
lines.Add(buffer);
buffer = "";
}
count++;
}
buffer += c;
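}
// Add the last sentence, which is still in the buffer when the loop ends
if(!string.IsNullOrEmpty(buffer))
{
lines.Add(buffer);
}
//lines will now contain your split data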
You can then access each sentence like this...
string s1 = lines[0];
string s2 = lines[1];
string s3 = lines[2];
Important: make sure you check the number of lines before getting a sentence, like...
string s1 = lines.Count > 0 ? lines[0] : "";
This makes a big assumption: that a given sentence will not contain the next line's number (i.e. sentence 2 will not contain the number 3). Because the code compares one character at a time, it also only splits correctly for up to 9 sentences.
If this does not help, then please provide your input in its original format (do not add line breaks if there are none).
EDIT: Fixed my code (wrong variable, sorry)
int index = 1;
String[] result = Regex.Split(text, @"[0-9]+\. ").Where(i => !string.IsNullOrEmpty(i)).Select(i => (index++).ToString() + ". " + i).ToArray();
result will contain your sentences, including the "line number".
You could split on the '.' character and drop anything shorter than 2 characters from the resulting array.
Of course, this relies on you having no data points of 1 character other than the numeric indicator; if that were the case, you could also check whether the piece is a numeric value.
This answer would also drop the period from your sentences, so you'd have to add it back in. There is a lot of manipulation, but this saves you from having to read each character and decide on it independently.
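A sketch of this split-and-filter approach. Instead of a length check, it drops pieces made only of digits (the numeric check suggested above), which also handles markers such as "10". It assumes the sentences contain no other periods (an abbreviation like "e.g." would be cut apart); `PeriodSplit` is a hypothetical name.

```csharp
using System.Linq;

public static class PeriodSplit
{
    public static string[] Split(string text)
    {
        return text.Split('.')
            .Select(s => s.Trim())
            // Drop the empty tail and the number markers ("1", "2", "10", ...).
            .Where(s => s.Length > 0 && !s.All(char.IsDigit))
            // Splitting removed the periods, so add them back.
            .Select(s => s + ".")
            .ToArray();
    }
}
```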
This is the easiest way:
var str = "1. This is first sentence." +
"2. This is the second sentence." +
"3. This is the third sentence." +
"n. This is the nth sentence";
//set your max number, e.g. 10000
var num = Enumerable.Range(1, 10000).Select(x=>x.ToString()+".").ToArray();
var res=str.Split(num ,StringSplitOptions.RemoveEmptyEntries);
Hope this helps ;)
