How do I get the exact position of regex matches? - c#

For example, assume I have this text:
What is the value of pn in 1 ;/
This is a test 12./ lop
I want to get the exact line position (the line number and the position within that line) of the matches for the regex pattern \d\s?[.,;:]\s?/. How can I do that?
I've tried:
static void Main()
{
string text = @"What is the value of pn in 1 ;/
This is a test 12./ lop";
string pattern = @"\d\s?[.,;:]\s?/";
foreach (Match m in Regex.Matches(text, pattern))
{
var info = LineFromPos(text, m.Index);
Console.WriteLine(info + "," + m.Index);
}
Console.Read();
}
public static int LineFromPos(string S, int Pos)
{
int Res = 1;
for (int i = 0; i <= Pos - 1; i++)
if (S[i] == '\n') Res++;
return Res;
}
But the code outputs:
1,27
2,49
Whereas it should output:
1,27
2,16
How do I fix this?

You can try something like this:
string text = @"What is the value of pn in 1 ;/
This is a test 12./ lop";
string pattern = @"\d\s?[.,;:]\s?/";
// Split on any line ending. Empty lines are kept so that the line numbers stay correct.
var lines = Regex.Split(text, "\r\n|\r|\n");
for (int i = 0; i < lines.Length; i++)
{
    // m.Index is relative to the current line here.
    // Note: a match that would span a line break is not found this way.
    foreach (Match m in Regex.Matches(lines[i], pattern))
    {
        Console.WriteLine("{0},{1}", i + 1, m.Index);
    }
}

You're currently treating m.Index as if it's the position within the line, but it's actually the position within the whole string. It sounds like you may want to write a method that converts a string index into a position (both the line and the index within that line), assuming you want to keep matching against a single string.
For example (using ValueTuple and C# 7 syntax - you could create your own line/column type otherwise):
static (int line, int column) FindPosition(string text, int index)
{
int line = 0;
int current = 0;
while (true)
{
int next = text.IndexOf('\n', current);
if (next >= index || next == -1) // >= so that an index pointing at '\n' stays on its own line
{
return (line, index - current);
}
current = next + 1;
line++;
}
}
We could be more efficient than that by remembering the position of the previous match, but it's simpler to keep it as just accepting the string and index.
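A different way to get that efficiency, if you have many matches, is to record where every line starts once and then binary-search that list for each match index. A minimal sketch, assuming '\n' ends a line (as FindPosition above does); LineIndex is a hypothetical helper name, not a framework type:

```csharp
using System.Collections.Generic;

// Hypothetical helper: records where every line starts, once, so each lookup is a binary search.
public sealed class LineIndex
{
    // lineStarts[n] is the string index of the first character of line n (0-based).
    private readonly List<int> lineStarts = new List<int> { 0 };

    public LineIndex(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            // '\n' ends a line; a preceding '\r' is counted as part of the earlier line.
            if (text[i] == '\n')
                lineStarts.Add(i + 1);
        }
    }

    // Returns the 0-based line and column of a character index in the original text.
    public (int line, int column) FindPosition(int index)
    {
        int found = lineStarts.BinarySearch(index);
        // When index is not itself a line start, BinarySearch returns the bitwise
        // complement of the next larger entry, so the containing line is the one before it.
        int line = found >= 0 ? found : ~found - 1;
        return (line, index - lineStarts[line]);
    }
}
```

Build it once with new LineIndex(text) and call FindPosition(m.Index) for each match; each lookup then costs a binary search over the line starts instead of a scan from the beginning of the text. Like FindPosition, it returns 0-based values.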
Here's a complete example of that in your code:
using System;
using System.Text.RegularExpressions;
class Test
{
static void Main()
{
string text = @"What is the value of pn in 1 ;/
This is a test 12./ lop";
string pattern = @"\d\s?[.,;:]\s?/";
foreach (Match m in Regex.Matches(text, pattern))
{
var position = FindPosition(text, m.Index);
Console.WriteLine($"{position.line}, {position.column}");
}
}
static (int line, int column) FindPosition(string text, int index)
{
int line = 0;
int current = 0;
while (true)
{
int next = text.IndexOf('\n', current);
if (next >= index || next == -1) // >= so that an index pointing at '\n' stays on its own line
{
return (line, index - current);
}
current = next + 1;
line++;
}
}
}
That prints:
0, 27
1, 16
That's using 0-based line and column numbers - obviously you can add 1 when you display the values if you want to.

Related

Get Occurrences of Letters in A String

I'm trying to count consecutive occurrences of letters in a string, and I almost get the right result with the code below:
public static void GetNoofLetters()
{
string str = "AAAAABBCCCDDDD";
int count = 1;
char[] charVal = str.ToCharArray();
List<string> charCnt = new List<string>();
string concat = "";
//Getting each letters using foreach loop
foreach (var ch in charVal)
{
int index = charCnt.FindIndex(c => c.Contains(ch.ToString())); //Checks if there's any existing letter in the list
if(index >= 0) //If letter exists, then count and replace the last value
{
count++;
charCnt[charCnt.Count - 1] = count.ToString() + ch.ToString();
}
else
{
charCnt.Add(ch.ToString()); //If no matching letter exists, then add it to the list initially
count = 1;
}
}
foreach (var item in charCnt)
{
concat += item;
}
Console.WriteLine(concat.Trim());
}
The code works for that input and returns 5A2B3C4D, as expected.
But say I have this second input:
string str = "AAAAABBCCCDDDDAA";
Expected output:
5A2B3C4D2A
With the code above, the output is:
5A2B3C6A
This happens because of the following part of the code:
if(index >= 0) //If letter found, then count and replace the last value
{
count++;
charCnt[charCnt.Count - 1] = count.ToString() + ch.ToString();
}
Is there a better way to get the expected output for the second input? I feel I'm close and may just be missing something simple.
Why not just loop over the string and count? At each character there are two possibilities:
When the character c differs from current (a new run starts), write down the previous run and start a new one.
Otherwise, add 1 to count.
Code:
private static string Compress(string value) {
if (string.IsNullOrEmpty(value))
return value;
char current = '\0';
int count = 0;
StringBuilder result = new StringBuilder(2 * value.Length);
foreach (char c in value) {
if (count != 0 && c != current) {
result.Append(count);
result.Append(current);
count = 0;
}
current = c;
count += 1;
}
result.Append(count);
result.Append(current);
return result.ToString();
}
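For comparison, the same run-length encoding can be written with a regular expression: the pattern (.)\1* uses a backreference, so each match is one maximal run of a repeated character. A sketch, assuming the same output format as Compress above (the count is written even for a run of one); RunLength is a hypothetical class name:

```csharp
using System.Text.RegularExpressions;

public static class RunLength
{
    // (.) captures one character and \1* matches any immediate repeats of it,
    // so every match is one maximal run such as "AAAAA".
    // Singleline lets '.' match '\n' too, so line breaks are encoded like other characters.
    private static readonly Regex Runs = new Regex(@"(.)\1*", RegexOptions.Singleline);

    // Replaces every run with its length followed by the character, e.g. "AAB" -> "2A1B".
    public static string Compress(string value)
    {
        if (string.IsNullOrEmpty(value))
            return value;
        return Runs.Replace(value, m => m.Length.ToString() + m.Groups[1].Value);
    }
}
```

The StringBuilder loop above is the more direct option; the regex form is shown only as a compact alternative.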
Well, I ended up with the following code:
public static void Main()
{
string str = "AAAAABBCCCDDDDAABBBBAABB";
int count = 1;
char[] charVal = str.ToCharArray();
List<string> charCnt = new List<string>();
charCnt.Add("");
string concat = "";
//Getting each letters using foreach loop
foreach (var ch in charVal)
{
var lastItem = charCnt.LastOrDefault();
if (lastItem.EndsWith((ch.ToString()))) //If letter exists, then count and replace the last value
{
count++;
charCnt[charCnt.Count - 1] = count.ToString() + ch.ToString();
}
else
{
charCnt.Add(ch.ToString()); //If no matching letter exists, then add it to the list initially
count = 1;
}
}
foreach (var item in charCnt)
{
concat += item; //Concatenate items from the list
}
Console.WriteLine(concat.Trim());
}
Here's a working sample: Get Occurrences of Letters in A String

How to reverse an array of strings without changing the position of special characters in C#

I'm working on reversing a sentence, and I can reverse the words. But I'm not sure how to reverse each word without changing the positions of the special characters. I'm using a regex, but as soon as it finds a special character it stops reversing the word.
Following is the code:
Console.WriteLine("Enter:");
string w = Console.ReadLine();
string rw = String.Empty;
String[] arr = w.Split(' ');
var regexItem = new Regex("^[a-zA-Z0-9]*$");
StringBuilder appendString = new StringBuilder();
for (int i = 0; i < arr.Length; i++)
{
char[] chararray = arr[i].ToCharArray();
for (int j = chararray.Length - 1; j >= 0; j--)
{
if (regexItem.IsMatch(rw))
{
rw = appendString.Append(chararray[j]).ToString();
}
}
appendString.Append(' ');
}
Console.WriteLine(rw);
Console.ReadLine();
Example input:
Marshall! Hello.
Expected output:
llahsraM! olleH.
A basic solution with regex and LINQ:
public static void Main()
{
Console.WriteLine("Marshall! Hello.");
Console.WriteLine(Reverse("Marshall! Hello."));
}
public static string Reverse(string source)
{
// we split by groups to keep delimiters
var parts = Regex.Split(source, @"([^a-zA-Z0-9])");
// if we got a group of valid characters
var results = parts.Select(x => x.All(char.IsLetterOrDigit)
// we reverse it
? new string(x.Reverse().ToArray())
// or we keep the delimiters as it
: x);
// then we concat all of them
return string.Concat(results);
}
The same solution without LINQ:
public static void Main()
{
Console.WriteLine("Marshall! Hello.");
Console.WriteLine(Reverse("Marshall! Hello."));
}
public static bool IsLettersOrDigits(string s)
{
foreach (var c in s)
{
if (!char.IsLetterOrDigit(c))
{
return false;
}
}
return true;
}
public static string Reverse(char[] s)
{
Array.Reverse(s);
return new string(s);
}
public static string Reverse(string source)
{
var parts = Regex.Split(source, @"([^a-zA-Z0-9])");
var results = new List<string>();
foreach(var x in parts)
{
results.Add(IsLettersOrDigits(x)
? Reverse(x.ToCharArray())
: x);
}
return string.Concat(results);
}
This is a solution without LINQ. I wasn't sure what you consider special characters, so it treats anything that isn't a letter or digit as special.
string sentence = "Marshall! Hello.";
string[] words = sentence.Split(' ');
List<string> reversedWords = new List<string>();
foreach (string word in words)
{
char[] arr = new char[word.Length];
for( int i=0; i<word.Length; i++)
{
if(!Char.IsLetterOrDigit((word[i])))
{
for ( int x=0; x< i; x++)
{
arr[x] = arr[x + 1];
}
arr[i] = word[i];
}
else
{
arr[word.Length - 1 - i] = word[i];
}
}
reversedWords.Add(new string(arr));
}
string reversedSentence = string.Join(" ", reversedWords);
Console.WriteLine(reversedSentence);
And this is the output:
llahsraM! olleH.
Here is a non-regex version that does what you want:
var sentence = "Hello, john!";
var parts = sentence.Split(' ');
var reversed = new StringBuilder();
var charPositions = sentence.Select((c, idx) => new { Char = c, Index = idx })
.Where(_ => !char.IsLetterOrDigit(_.Char));
for (int i = 0; i < parts.Length; i++)
{
var chars = parts[i].ToCharArray();
for (int j = chars.Length - 1; j >= 0; j--)
{
if (char.IsLetterOrDigit(chars[j]))
{
reversed.Append(chars[j]);
}
}
}
foreach (var ch in charPositions)
{
reversed.Insert(ch.Index, ch.Char);
}
// olleH, nhoj!
Console.WriteLine(reversed.ToString());
Basically, the trick is to remember the positions of the special (i.e. non-letter, non-digit) characters and, at the end, insert them back at those positions.
This solution uses neither LINQ nor Regex. It may not be the most efficient answer, but it works properly for short strings.
// This reverses the letters of the whole string; every other character stays at its old position.
public string ReverseString(string rString)
{
    StringBuilder ss = new StringBuilder(rString);
    int y = 0;
    // The idea is to swap values: the last letter with the first, the second-to-last with the second, and so on.
    // This outer loop walks from the end of the string.
    for (int i = rString.Length - 1; i >= 0; i--)
    {
        // Anything that is not a letter counts as a special character and stays where it is.
        if (Char.IsLetter(rString[i]))
        {
            // The inner loop walks from the front (starting at y) to the next letter to swap with.
            for (int k = y; k < rString.Length; k++)
            {
                if (Char.IsLetter(rString[k]))
                {
                    // Swap by position (i and k) rather than looking positions up with IndexOf,
                    // which would find the first occurrence of a repeated letter.
                    ss[i] = rString[k];
                    ss[k] = rString[i];
                    y = k + 1;
                    break;
                }
            }
        }
    }
    // The loop does not stop at the middle; the second half rewrites the same pairs with the same values.
    return ss.ToString();
}
Thanks!
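The swapping idea above can also be written with two indices that move toward each other, skipping characters that are not letters or digits. Applied to each space-separated word separately, it gives the per-word output the question asks for. A sketch; WordReverser and its methods are hypothetical names, and "special character" is assumed to mean anything for which char.IsLetterOrDigit returns false:

```csharp
public static class WordReverser
{
    // Reverses the letters/digits of each space-separated word,
    // leaving every other character at its original index.
    public static string ReverseWords(string sentence)
    {
        char[] chars = sentence.ToCharArray();
        int wordStart = 0;
        for (int i = 0; i <= chars.Length; i++)
        {
            // A word ends at a space or at the end of the string.
            if (i == chars.Length || chars[i] == ' ')
            {
                ReverseKeepingSpecials(chars, wordStart, i - 1);
                wordStart = i + 1;
            }
        }
        return new string(chars);
    }

    // Two-pointer swap over chars[left..right] that skips non-alphanumeric characters.
    private static void ReverseKeepingSpecials(char[] chars, int left, int right)
    {
        while (left < right)
        {
            if (!char.IsLetterOrDigit(chars[left]))
                left++;
            else if (!char.IsLetterOrDigit(chars[right]))
                right--;
            else
            {
                char tmp = chars[left];
                chars[left] = chars[right];
                chars[right] = tmp;
                left++;
                right--;
            }
        }
    }
}
```

Each word is processed in a single pass, and special characters anywhere in the word (not only at the end) keep their positions.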

Get permutation of specific characters in strings

Given a string like "N00MNM", I need all permutations of the zero '0' characters inside the string, keeping all other characters in their fixed order.
The result must be:
"N0M0NM" "N0MN0M" "N0MNM0" "NM00NM" "NM0N0M" "NM0NM0" "NMN0M0" "NMNM00"
"0N0MNM" "0NM0NM" "0NMN0M" "0NMNM0"
A standard permutation function takes too long for this (about 1500 ms), and the strings I need to test are longer than this sample.
Is there an algorithm for this?
You can do this by first removing every occurrence of the character (0 in this case) from the string, then computing all sets of distinct positions in the remaining string where the removed characters can be reinserted one per position, and finally inserting the whole run of removed characters (00 in this case) at each single position. The code below does that:
public static IEnumerable<string> Combs(string str, char c)
{
int count = str.Count(_c => _c == c);
string _str = new string(str.Where(_c => _c != c).ToArray());
// Compute all combinations with different positions
foreach (var positions in GetPositionsSets(0, _str.Length, count))
{
StringBuilder _b = new StringBuilder();
int index = 0;
foreach (var _char in _str)
{
if (positions.Contains(index))
{ _b.Append($"{c}{_char}"); }
else
{ _b.Append(_char); }
index++;
}
if (positions.Contains(index))
_b.Append(c);
yield return _b.ToString();
}
//Compute the remaining combinations, i.e. those where all the supplied
//characters sit together at the same position.
string p = new string(c, count);
for (int i = 0; i < _str.Length; i++)
{
yield return _str.Insert(i, p);
}
yield return _str + p;
}
//Gets all possible position sets that can be obtained from minPos
//to maxPos with positionsCount positions, that is, C(n,k)
//where n = maxPos - minPos + 1 and k = positionsCount
private static IEnumerable<HashSet<int>> GetPositionsSets(int minPos, int maxPos, int positionsCount)
{
if (positionsCount == 0)
{
yield return new HashSet<int>();
yield break; // without this, the loop below keeps recursing with negative counts
}
for (int i = minPos; i <= maxPos; i++)
{
foreach (var positions in GetPositionsSets(i + 1, maxPos, positionsCount - 1))
{
positions.Add(i);
yield return positions;
}
}
}
The output of the code above for "N00MNM" is:
0N0MNM
0NM0NM
0NMN0M
0NMNM0
N0M0NM
N0MN0M
N0MNM0
NM0N0M
NM0NM0
NMN0M0
00NMNM
N00MNM
NM00NM
NMN00M
NMNM00
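A more general way to see the problem: in a string of length n that contains k copies of the character, each result is just a choice of which k of the n slots hold that character, with the other characters filling the remaining slots in their original order, so there are C(n, k) results (15 for "N00MNM"). The code above is written for exactly two copies: with one copy its second loop yields the same strings again, and with three or more it misses arrangements where only some of the copies share a gap (such as "00N0M" for "N000M"). A recursive sketch that handles any count; CharPlacements is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CharPlacements
{
    // Returns every string in which all occurrences of c are redistributed
    // among the other characters, whose relative order is kept.
    public static List<string> Placements(string str, char c)
    {
        int count = str.Count(ch => ch == c);
        string others = new string(str.Where(ch => ch != c).ToArray());
        var results = new List<string>();
        Fill(new char[str.Length], 0, count, 0, others, c, results);
        return results;
    }

    // Decides slot by slot whether it holds c or the next "other" character.
    // Every path uses up exactly count copies of c and all the others, so each leaf is a valid result.
    private static void Fill(char[] buffer, int slot, int cLeft, int nextOther,
                             string others, char c, List<string> results)
    {
        if (slot == buffer.Length)
        {
            results.Add(new string(buffer));
            return;
        }
        if (cLeft > 0)
        {
            buffer[slot] = c;
            Fill(buffer, slot + 1, cLeft - 1, nextOther, others, c, results);
        }
        if (nextOther < others.Length)
        {
            buffer[slot] = others[nextOther];
            Fill(buffer, slot + 1, cLeft, nextOther + 1, others, c, results);
        }
    }
}
```

No duplicates are generated, so nothing needs to be filtered afterwards. Because C(n, k) grows quickly, for long strings it may be preferable to turn Fill into an iterator (yield return) instead of collecting everything in a list.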

Iterating through string?

Not entirely sure this is possible, but say I have two strings like so:
"IAmAString-00001"
"IAmAString-00023"
What would be a quick and easy way to iterate from IAmAString-00001 to IAmAString-00023 by incrementing just the number at the end?
The problem is a bit more general than that: the string I'm dealing with could be in any format, but the last characters will always be digits. For example, with Super_Confusing-String#w00t0003, the final 0003 is what I'd iterate on.
Any ideas?
You can use char.IsDigit:
static void Main(string[] args)
{
var s = "IAmAString-00001";
int index = -1;
for (int i = 0; i < s.Length; i++)
{
if (char.IsDigit(s[i]))
{
index = i;
break;
}
}
if (index == -1)
Console.WriteLine("digits not found");
else
Console.WriteLine("digits: {0}", s.Substring(index));
}
which produces this output:
digits: 00001
string.Format and a for loop should do what you want.
for (int i = 1; i <= 23; i++)
{
    Console.WriteLine(string.Format("IAmAString-{0:D5}", i));
}
or something close to that (not sitting in front of a compiler).
string start = "IAmAString-00001";
string end = "IAmAString-00023";
// match constant part and ending digits
var matchstart = Regex.Match(start,@"^(.*?)(\d+)$");
int numberstart = int.Parse(matchstart.Groups[2].Value);
var matchend = Regex.Match(end,@"^(.*?)(\d+)$");
int numberend = int.Parse(matchend.Groups[2].Value);
// constant parts must be the same
if (matchstart.Groups[1].Value != matchend.Groups[1].Value)
throw new ArgumentException("");
// create a format string with same number of digits as original
string format = new string('0', matchstart.Groups[2].Length);
for (int ii = numberstart; ii <= numberend; ++ii)
Console.WriteLine(matchstart.Groups[1].Value + ii.ToString(format));
You could use a Regex:
var match=Regex.Match("Super_Confusing-String#w00t0003",@"(?<=(^.*\D)|^)\d+$");
if(match.Success)
{
var val=int.Parse(match.Value);
Console.WriteLine(val);
}
To answer more specifically, you could use named groups to extract what you need:
var match=Regex.Match(
"Super_Confusing-String#w00t0003",
@"(?<prefix>(^.*\D)|^)(?<digits>\d+)$");
if(match.Success)
{
var prefix=match.Groups["prefix"].Value;
Console.WriteLine(prefix);
var val=int.Parse(match.Groups["digits"].Value);
Console.WriteLine(val);
}
If you can assume that the last 5 characters are the number then:
string prefix = "myprefix-";
for (int i=1; i <=23; i++)
{
Console.WriteLine(prefix+i.ToString("D5"));
}
This function will find the trailing number.
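private int FindTrailingNumber(string str)
{
    string numString = "";
    int numTest;
    // Walk backwards from the last character (down to and including index 0)
    for (int i = str.Length - 1; i >= 0; i--)
    {
        char c = str[i];
        // Stop at the first non-digit so that digits earlier in the string are not picked up
        if (!int.TryParse(c.ToString(), out numTest))
            break;
        numString = c + numString;
    }
    // Throws a FormatException if the string does not end in a digit
    return int.Parse(numString);
}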
Assuming all your base strings are the same, this would iterate between strings.
string s1 = "asdf123";
string s2 = "asdf127";
int num1 = FindTrailingNumber(s1);
int num2 = FindTrailingNumber(s2);
string strBase = s1.Replace(num1.ToString(), "");
for (int i = num1; i <= num2; i++)
{
Console.WriteLine(strBase + i.ToString());
}
I think it would be better to do the search from the end (Rick, I already upvoted you since this is your logic :-))
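static void Main(string[] args)
{
    var s = "IAmAString-00001";
    int index = -1;
    // Walk backwards to the last non-digit character
    for (int i = s.Length - 1; i >= 0; i--)
    {
        if (!char.IsDigit(s[i]))
        {
            index = i;
            break;
        }
    }
    // The trailing digits start right after that character (index stays -1 if the whole string is digits)
    if (index == s.Length - 1)
        Console.WriteLine("digits not found");
    else
        Console.WriteLine("digits: {0}", s.Substring(index + 1));
    Console.ReadKey();
}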
HTH
If the last x characters are always digits, then:
int x = 5;
string s = "IAmAString-00001";
int num = int.Parse(s.Substring(s.Length - x, x));
Console.WriteLine("Your Number is: {0}", num);
If the last digits can be 3, 4, or 5 in length, then you will need a little more logic:
int x = 0;
string s = "IAmAString-00001";
foreach (char c in s.Reverse())//Use Reverse() so you start with digits only.
{
if(char.IsDigit(c) == false)
break;//If we start hitting non-digit characters, then exit the loop.
++x;
}
int num = int.Parse(s.Substring(s.Length - x, x));
Console.WriteLine("Your Number is: {0}", num);
I'm not good with complicated regular expressions, so I avoid them when maximum optimization isn't necessary. A regex doesn't always parse strings the way you expect it to, so if there is an alternative solution that still runs fast, I'd rather use it: it's easier for me to understand and to be sure that it works with any combination of strings.
For example, if you use some of the other solutions presented here with a string like "I2AmAString-000001", you can get "2000001" as your number instead of "1".
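Putting the pieces of these answers together: scanning backwards from the end avoids the "I2AmAString-000001" problem, and formatting with a zero-padded custom format keeps the original width. A sketch without regex, assuming both strings share the same prefix and end in ASCII digits; TrailingNumber, Split and Range are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class TrailingNumber
{
    // Splits "Super_Confusing-String#w00t0003" into ("Super_Confusing-String#w00t", "0003")
    // by scanning backwards while the characters are ASCII digits.
    // (char.IsDigit would also accept other Unicode digits that long.Parse may reject.)
    public static (string prefix, string digits) Split(string s)
    {
        int start = s.Length;
        while (start > 0 && s[start - 1] >= '0' && s[start - 1] <= '9')
            start--;
        return (s.Substring(0, start), s.Substring(start));
    }

    // Yields every string from first to last, keeping the prefix and the digit width of first.
    public static IEnumerable<string> Range(string first, string last)
    {
        var (prefix, firstDigits) = Split(first);
        var (lastPrefix, lastDigits) = Split(last);
        if (firstDigits.Length == 0 || lastDigits.Length == 0)
            throw new ArgumentException("Both strings must end in digits.");
        if (prefix != lastPrefix)
            throw new ArgumentException("Both strings must share the same prefix.");

        long from = long.Parse(firstDigits);
        long to = long.Parse(lastDigits);
        // e.g. "00000" for five digits, so 23 is written as "00023"
        string format = new string('0', firstDigits.Length);
        for (long n = from; n <= to; n++)
            yield return prefix + n.ToString(format);
    }
}
```

Because Range is an iterator, the argument checks run when enumeration starts rather than at the call itself.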

How do you perform string replacement on just a subsection of a string?

I'd like an efficient method that works something like this:
EDIT: Sorry, I didn't include what I'd tried before. I've updated the example.
// Method signature, Only replaces first instance or how many are specified in max
public int MyReplace(ref string source,string org, string replace, int start, int max)
{
int ret = 0;
int len = replace.Length;
int olen = org.Length;
for(int i = 0; i < max; i++)
{
// Find the next instance of the search string
int x = source.IndexOf(org, ret + olen);
if(x > ret)
ret = x;
else
break;
// Insert the replacement
source = source.Insert(x, replace);
// And remove the original
source = source.Remove(x + len, olen); // removes original string
}
return ret;
}
string source = "The cat can fly but only if he is the cat in the hat";
int i = MyReplace(ref source,"cat", "giraffe", 8, 1);
// Results in the string "The cat can fly but only if he is the giraffe in the hat"
// i contains the index of the first letter of "giraffe" in the new string
I'm only asking because I imagine my implementation getting slow with thousands of replacements.
How about:
public static int MyReplace(ref string source,
string org, string replace, int start, int max)
{
if (start < 0) throw new System.ArgumentOutOfRangeException("start");
if (max <= 0) return 0;
start = source.IndexOf(org, start);
if (start < 0) return 0;
StringBuilder sb = new StringBuilder(source, 0, start, source.Length);
int found = 0;
while (max-- > 0) {
int index = source.IndexOf(org, start);
if (index < 0) break;
sb.Append(source, start, index - start).Append(replace);
start = index + org.Length;
found++;
}
sb.Append(source, start, source.Length - start);
source = sb.ToString();
return found;
}
It uses StringBuilder to avoid lots of intermediate strings. I haven't tested it rigorously, but it seems to work. It also avoids creating an extra string when there are no matches.
To start, try something like this:
int count = 0;
source = Regex.Replace(source, Regex.Escape(literal), (match) =>
{
return (count++ > something) ? "new value" : match.Value;
});
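Regex also has an instance Replace overload that takes a maximum number of replacements and a start position, which maps closely onto the MyReplace signature in the question. A sketch; SubstringReplace and ReplaceFrom are hypothetical names, and unlike MyReplace this returns the new string rather than an index:

```csharp
using System.Text.RegularExpressions;

public static class SubstringReplace
{
    // Replaces at most max occurrences of org, searching from index start onwards.
    public static string ReplaceFrom(string source, string org, string replace, int start, int max)
    {
        // Regex.Escape makes org match literally, even if it contains characters such as '.' or '('.
        var regex = new Regex(Regex.Escape(org));
        // '$' introduces substitutions in .NET replacement strings, so double it to insert it literally.
        return regex.Replace(source, replace.Replace("$", "$$"), max, start);
    }
}
```

Whether this beats a hand-written IndexOf loop for thousands of replacements would need measuring on your own data.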
To replace only the first match:
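private string ReplaceFirst(string source, string oldString, string newString)
{
    // Ordinal search; return the source unchanged if there is nothing to replace
    var index = source.IndexOf(oldString, StringComparison.Ordinal);
    if (index < 0)
        return source;
    var begin = source.Substring(0, index);
    var end = source.Substring(index + oldString.Length);
    return begin + newString + end;
}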
You have a bug in that you will miss the item to replace if it is in the beginning.
Change these lines:
int ret = start; // instead of zero, or you ignore the start parameter
// Find the next instance of the search string
// Do not skip olen for the first search!
int x = i == 0 ? source.IndexOf(org, ret) : source.IndexOf(org, ret + olen);
Also, your routine does 300 thousand replacements a second on my machine. Are you sure this will be a bottleneck?
I also just found that your code has an issue when you replace longer text with shorter text.
Compared with the original code posted in the question, the code below is about twice as fast with four replacements and around 10% faster with one. It uses the specified start parameter and works when replacing longer text with shorter text.
Marc Gravell's solution is (no offense ;-) 60% slower than the original code, and it also returns a different value.
// Method signature, Only replaces first instance or how many are specified in max
public static int MyReplace(ref string source, string org, string replace, int start, int max)
{
var ret = 0;
int x = start;
int reps = 0;
int l = source.Length;
int lastIdx = 0;
string repstring = "";
while (x <= l - org.Length) // stop where org no longer fits, so source[x + y] stays in range
{
if ((source[x] == org[0]) && (reps < max) && (x >= start))
{
bool match = true;
for (int y = 1; y < org.Length; y++)
{
if (source[x + y] != org[y])
{
match = false;
break;
}
}
if (match)
{
repstring += source.Substring(lastIdx, x - lastIdx) + replace;
ret = x;
x += org.Length - 1;
reps++;
lastIdx = x + 1;
// Done?
if (reps == max)
{
source = repstring + source.Substring(lastIdx);
return ret;
}
}
}
x++;
}
if (reps > 0) // a single match at index 0 leaves ret == 0, so test reps instead
{
source = repstring + source.Substring(lastIdx);
}
return ret;
}
