Is there a way to count the number of replacements a Regex.Replace call makes?
E.g. for Regex.Replace("aaa", "a", "b"); I want to get the number 3 out (result is "bbb"); for Regex.Replace("aaa", "(?<test>aa?)", "${test}b"); I want to get the number 2 out (result is "aabab").
Ways I can think to do this:
Use a MatchEvaluator that increments a captured variable, doing the replacement manually
Get a MatchCollection and iterate it, doing the replacement manually and keeping a count
Search first and get a MatchCollection, get the count from that, then do a separate replace
Methods 1 and 2 require manual parsing of $ replacements, method 3 requires regex matching the string twice. Is there a better way.
Thanks to both Chevex and Guffa. I started looking for a better way to get the results and found that there is a Result method on the Match class that does the substitution. That's the missing piece of the jigsaw. Example code below:
using System.Text.RegularExpressions;
namespace regexrep
{
class Program
{
static int Main(string[] args)
{
string fileText = System.IO.File.ReadAllText(args[0]);
int matchCount = 0;
string newText = Regex.Replace(fileText, args[1],
(match) =>
{
matchCount++;
return match.Result(args[2]);
});
System.IO.File.WriteAllText(args[0], newText);
return matchCount;
}
}
}
With a file test.txt containing aaa, the command line regexrep test.txt "(?<test>aa?)" ${test}b will set %errorlevel% to 2 and change the text to aabab.
You can use a MatchEvaluator that runs for each replacement, that way you can count how many times it occurs:
int cnt = 0;
string result = Regex.Replace("aaa", "a", m => {
cnt++;
return "b";
});
The second case is trickier as you have to produce the same result as the replacement pattern would:
int cnt = 0;
string result = Regex.Replace("aaa", "(?<test>aa?)", m => {
cnt++;
return m.Groups["test"] + "b";
});
This should do it.
int count = 0;
string text = Regex.Replace(text,
#"(((http|ftp|https):\/\/|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?)", //Example expression. This one captures URLs.
match =>
{
string replacementValue = String.Format("<a href='{0}'>{0}</a>", match.Value);
count++;
return replacementValue;
});
I am not on my dev computer so I can't do it right now, but I am going to experiment later and see if there is a way to do this with lambda expressions instead of declaring the method IncrementCount() just to increment an int.
EDIT modified to use a lambda expression instead of declaring another method.
EDIT2 If you don't know the pattern in advance, you can still get all the groupings (The $ groups you refer to) within the match object as they are included as a GroupCollection. Like so:
int count = 0;
string text = Regex.Replace(text,
#"(((http|ftp|https):\/\/|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?)", //Example expression. This one captures URLs.
match =>
{
string replacementValue = String.Format("<a href='{0}'>{0}</a>", match.Value);
count++;
foreach (Group g in match.Groups)
{
g.Value; //Do stuff with g.Value
}
return replacementValue;
});
Related
I have this code:
string firstTag = "Forums2008/forumPage.aspx?forumId=";
string endTag = "</a>";
index = forums.IndexOf(firstTag, index1);
if (index == -1)
continue;
var secondIndex = forums.IndexOf(endTag, index);
result = forums.Substring(index + firstTag.Length + 12, secondIndex - (index + firstTag.Length - 50));
The string i want to extract from is for example:
הנקה
What i want to get is the word after the title only this: הנקה
And the second problem is that when i'm extracting it i see instead hebrew some gibrish like this: ������
One powerful way to do this is to use Regular Expressions instead of trying to find a starting position and use a substring. Try out this code, and you'll see that it extracts the anchor tag's title:
var input = "הנקה";
var expression = new System.Text.RegularExpressions.Regex(#"title=\""([^\""]+)\""");
var match = expression.Match(input);
if (match.Success) {
Console.WriteLine(match.Groups[1]);
}
else {
Console.WriteLine("not found");
}
And for the curious, here is a version in JavaScript:
var input = 'הנקה';
var expression = new RegExp('title=\"([^\"]+)\"');
var results = expression.exec(input);
if (results) {
document.write(results[1]);
}
else {
document.write("not found");
}
Okay here is the solution using String.Substring() String.Split() and String.IndexOf()
String str = "הנקה"; // <== Assume this is passing string. Yes unusual scape sequence are added
int splitStart = str.IndexOf("title="); // < Where to start splitting
int splitEnd = str.LastIndexOf("</a>"); // < = Where to end
/* What we try to extract is this : title="הנקה">הנקה
* (Given without escape sequence)
*/
String extracted = str.Substring(splitStart, splitEnd - splitStart); // <=Extracting required portion
String[] splitted = extracted.Split('"'); // < = Now split with "
Console.WriteLine(splitted[1]); // <= Try to Out but yes will produce ???? But put a breakpoint here and check the values in split array
Now the problem, here you can see that i have to use escape sequence in an unusual way. You may ignore that since you are simply passing the scanning string.
And this actually works, but you cannot visualize it with the provided Console.WriteLine(splitted[1]);
But if you put a break point and check the extracted split array you can see that text are extracted. you can confirm it with following screenshot
Suppose I have written "5 and 6" or "5+6". How can I assign 5 and 6 to two different variables in c# ?
P.S. I also want to do certain work if certain chars are found in string. Suppose I have written 5+5. Will this code do that ?
if(string.Contains("+"))
{
sum=x+y;
}
string input="5+5";
var numbers = Regex.Matches(input, #"\d+")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
Personally, I would vote against doing some splitting and regular expression stuff.
Instead I would (and did in the past) use one of the many Expression Evaluation libraries, like e.g. this one over at Code Project (and the updated version over at CodePlex).
Using the parser/tool above, you could do things like:
A simple expression evaluation then could look like:
Expression e = new Expression("5 + 6");
Debug.Assert(11 == e.Evaluate());
To me this is much more error-proof than doing the parsing all by myself, including regular expressions and the like.
You should use another name for your string than string
var numbers = yourString.Split("+");
var sum = Convert.ToInt32(numbers[0]) + Convert.ToInt32(numbers[1]);
Note: Thats an implementation without any error checking or error handling...
If you want to assign numbers from string to variables, you will have to parse string and make conversion.
Simple example, if you have text with only one number
string text = "500";
int num = int.Parse(text);
Now, if you want to parse something more complicated, you can use split() and/or regex to get all numbers and operators between them. Than you just iterate array and assign numbers to variables.
string text = "500+400";
if (text.Contains("+"))
{
String[] data = text.Split("+");
int a = int.Parse(data[0]);
int b = int.Parse(data[1]);
int res = a + b;
}
Basicly, if you have just 2 numbers and operazor between them, its ok. If you want to make "calculator" you will need something more, like Binary Trees or Stack.
Use the String.Split method. It splits your string rom the given character and returns a string array containing the value that is broken down into multiple pieces depending on the character to break, in this case, its "+".
int x = 0;
int y = 0;
int z = 0;
string value = "5+6";
if (value.Contains("+"))
{
string[] returnedArray = value.Split('+');
x = Convert.ToInt32(returnedArray[0]);
y = Convert.ToInt32(returnedArray[1]);
z = x + y;
}
Something like this may helpful
string strMy = "5&6";
char[] arr = strMy.ToCharArray();
List<int> list = new List<int>();
foreach (char item in arr)
{
int value;
if (int.TryParse(item.ToString(), out value))
{
list.Add(item);
}
}
list will contains all the integer values
You can use String.Split method like;
string s = "5 and 6";
string[] a = s.Split(new string[] { "and", "+" }, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(a[0].Trim());
Console.WriteLine(a[1].Trim());
Here is a DEMO.
Use regex to get those value and then switch on the operand to do the calculation
string str = "51 + 6";
str = str.Replace(" ", "");
Regex regex = new Regex(#"(?<rightHand>\d+)(?<operand>\+|and)(?<leftHand>\d+)");
var match = regex.Match(str);
int rightHand = int.Parse(match.Groups["rightHand"].Value);
int leftHand = int.Parse(match.Groups["leftHand"].Value);
string op = match.Groups["operand"].Value;
switch (op)
{
case "+":
.
.
.
}
Split function maybe is comfortable in use but it is space inefficient
because it needs array of strings
Maybe Trim(), IndexOf(), Substring() can replace Split() function
If the title isn't clear enough, here's a procedural way of approaching the problem:
[TestMethod]
public void Foo()
{
var start = "9954-4740-4491-4414";
var sb = new StringBuilder();
var j = 0;
for (var i = 0 ; i < start.Length; i++)
{
if ( start[i] != '-')
{
if (j == 2)
{
sb.AppendFormat(":{0}", start[i]);
j = 1;
}
else
{
sb.Append(start[i]);
j++;
}
}
}
var end = sb.ToString();
Assert.AreEqual(end, "99:54:47:40:44:91:44:14");
}
If you're using C# 4 all you need is this:
string result = string.Join(":", Regex.Matches(start, #"\d{2}").Cast<Match>());
For C# 3 you need to provide a string[] to Join:
string[] digitPairs = Regex.Matches(start, #"\d{2}")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
string result = string.Join(":", digitPairs);
I agree with "why bother with regular expressions?"
string.Join(":", str.Split('-').Select(s => s.Insert(2, ":"));
Regex.Replace version, although I like Mark's answer better:
string res = Regex.Replace(start,
#"(\d{2})(\d{2})-(\d{2})(\d{2})-(\d{2})(\d{2})-(\d{2})(\d{2})",
#"$1:$2:$3:$4:$5:$6:$7:$8");
After a while of experimenting, I've found a way to do it by using a single regular expression that works with input of unlimited length:
Regex.Replace(start, #"(?'group'\d\d)-|(?'group'\d\d)(?!$)", #"$1:")
When using named groups (the (?'name') stuff) with same name, captures are stored in the same group. That way, it is possible to replace distinct matches with same value.
It also makes use of negative lookahead (the (?!) stuff).
You don't need them: strip the '-' characters and then insert a colon between each pair of numbers. Unless I've misunderstood the desired output format.
I have a large string, where there can be specific words (text followed by a single colon, like "test:") occurring more than once. For example, like this:
word:
TEST:
word:
TEST:
TEST: // random text
"word" occurs twice and "TEST" occurs thrice, but the amount can be variable. Also, these words don't have to be in the same order and there can be more text in the same line as the word (as shown in the last example of "TEST"). What I need to do is append the occurrence number to each word, for example the output string needs to be this:
word_ONE:
TEST_ONE:
word_TWO:
TEST_TWO:
TEST_THREE: // random text
The RegEx for getting these words which I've written is ^\b[A-Za-z0-9_]{4,}\b:. However, I don't know how to accomplish the above in a fast way. Any ideas?
Regex is perfect for this job - using Replace with a match evaluator:
This example is not tested nor compiled:
public class Fix
{
public static String Execute(string largeText)
{
return Regex.Replace(largeText, "^(\w{4,}):", new Fix().Evaluator);
}
private Dictionary<String, int> counters = new Dictionary<String, int>();
private static String[] numbers = {"ONE", "TWO", "THREE",...};
public String Evaluator(Match m)
{
String word = m.Groups[1].Value;
int count;
if (!counters.TryGetValue(word, out count))
count = 0;
count++;
counters[word] = count;
return word + "_" + numbers[count-1] + ":";
}
}
This should return what you requested when calling:
result = Fix.Execute(largeText);
i think you can do this with Regax.Replace(string, string, MatchEvaluator) and a dictionary.
Dictionary<string, int> wordCount=new Dictionary<string,int>();
string AppendIndex(Match m)
{
string matchedString = m.ToString();
if(wordCount.Contains(matchedString))
wordCount[matchedString]=wordCount[matchedString]+1;
else
wordCount.Add(matchedString, 1);
return matchedString + "_"+ wordCount.ToString();// in the format: word_1, word_2
}
string inputText = "....";
string regexText = #"";
static void Main()
{
string text = "....";
string result = Regex.Replace(text, #"^\b[A-Za-z0-9_]{4,}\b:",
new MatchEvaluator(AppendIndex));
}
see this:
http://msdn.microsoft.com/en-US/library/cft8645c(v=VS.80).aspx
If I understand you correctly, regex is not necessary here.
You can split your large string by the ':' character. Maybe you also need to read line by line (split by '\n'). After that you just create a dictionary (IDictionary<string, int>), which counts the occurrences of certain words. Every time you find word x, you increase the counter in the dictionary.
EDIT
Read your file line by line OR split the string by '\n'
Check if your delimiter is present. Either by splitting by ':' OR using regex.
Get the first item from the split array OR the first match of your regex.
Use a dictionary to count your occurrences.
if (dictionary.Contains(key)) dictionary[key]++;
else dictionary.Add(key, 1);
If you need words instead of numbers, then create another dictionary for these. So that dictionary[key] equals one if key equals 1. Mabye there is another solution for that.
Look at this example (I know it's not perfect and not so nice)
lets leave the exact argument for the Split function, I think it can help
static void Main(string[] args)
{
string a = "word:word:test:-1+234=567:test:test:";
string[] tks = a.Split(':');
Regex re = new Regex(#"^\b[A-Za-z0-9_]{4,}\b");
var res = from x in tks
where re.Matches(x).Count > 0
select x + DecodeNO(tks.Count(y=>y.Equals(x)));
foreach (var item in res)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
private static string DecodeNO(int n)
{
switch (n)
{
case 1:
return "_one";
case 2:
return "_two";
case 3:
return "_three";
}
return "";
}
I have a block of text and I want to get its lines without losing the \r and \n at the end. Right now, I have the following (suboptimal code):
string[] lines = tbIn.Text.Split('\n')
.Select(t => t.Replace("\r", "\r\n")).ToArray();
So I'm wondering - is there a better way to do it?
Accepted answer
string[] lines = Regex.Split(tbIn.Text, #"(?<=\r\n)(?!$)");
The following seems to do the job:
string[] lines = Regex.Split(tbIn.Text, #"(?<=\r\n)(?!$)");
(?<=\r\n) uses 'positive lookbehind' to match after \r\n without consuming it.
(?!$) uses negative lookahead to prevent matching at the end of the input and so avoids a final line that is just an empty string.
Something along the lines of using this regular expression:
[^\n\r]*\r\n
Then use Regex.Matches().
The problem is you need Group(1) out of each match and create your string list from that. In Python you'd just use the map() function. Not sure the best way to do it in .NET, you take it from there ;-)
Dmitri, your solution is actually pretty compact and straightforward. The only thing more efficient would be to keep the string-splitting characters in the generated array, but the APIs simply don't allow for that. As a result, every solution will require iterating over the array and performing some kind of modification (which in C# means allocating new strings every time). I think the best you can hope for is to not re-create the array:
string[] lines = tbIn.Text.Split('\n');
for (int i = 0; i < lines.Length; ++i)
{
lines[i] = lines[i].Replace("\r", "\r\n");
}
... but as you can see that looks a lot more cumbersome! If performance matters, this may be a bit better. If it really matters, you should consider manually parsing the string by using IndexOf() to find the '\r's one at a time, and then create the array yourself. This is significantly more code, though, and probably not necessary.
One of the side effects of both your solution and this one is that you won't get a terminating "\r\n" on the last line if there wasn't one already there in the TextBox. Is this what you expect? What about blank lines... do you expect them to show up in 'lines'?
If you are just going to replace the newline (\n) then do something like this:
string[] lines = tbIn.Text.Split('\n')
.Select(t => t + "\r\n").ToArray();
Edit: Regex.Replace allows you to split on a string.
string[] lines = Regex.Split(tbIn.Text, "\r\n")
.Select(t => t + "\r\n").ToArray();
As always, extension method goodies :)
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
usage:
string text = "One,Two,Three,Four";
foreach (var s in text.SplitAndKeep(","))
{
Console.WriteLine(s);
}
Output:
One,
Two,
Three,
Four
You can achieve this with a regular expression. Here's an extension method with it:
public static string[] SplitAndKeepDelimiter(this string input, string delimiter)
{
MatchCollection matches = Regex.Matches(input, #"[^" + delimiter + "]+(" + delimiter + "|$)", RegexOptions.Multiline);
string[] result = new string[matches.Count];
for (int i = 0; i < matches.Count ; i++)
{
result[i] = matches[i].Value;
}
return result;
}
I'm not sure if this is a better solution. Yours is very compact and simple.