Replacing chars with padded index - c#

I Have a string with special chars and i have to replace those chars with an index (padded n '0' left).
Fast example for better explanation:
I have the string "0980 0099 8383 $$$$" and an index (integer) 3
result should be "0980 0099 8383 0003"
The special characters are not necessarily in sequence.
the source string could be empty or it may not contain any special characters
I've already written functions that works.
public static class StringExtensions
{
public static string ReplaceCounter(this string source, int counter, string character)
{
string res = source;
try
{
if (!string.IsNullOrEmpty(character))
{
if (res.Contains(character))
{
// Get ALL Indexes position of character
var Indexes = GetIndexes(res, character);
int max = GetMaxValue(Indexes.Count);
while (counter >= max)
{
counter -= max;
}
var new_value = counter.ToString().PadLeft(Indexes.Count, '0');
for (int i = 0; i < Indexes.Count; i++)
{
res = res.Remove(Indexes[i], 1).Insert(Indexes[i], new_value[i].ToString());
}
}
}
}
catch (Exception)
{
res = source;
}
return res;
}
private static List<int> GetIndexes(string mainString, string toFind)
{
var Indexes = new List<int>();
for (int i = mainString.IndexOf(toFind); i > -1; i = mainString.IndexOf(toFind, i + 1))
{
// for loop end when i=-1 (line.counter not found)
Indexes.Add(i);
}
return Indexes;
}
private static int GetMaxValue(int numIndexes)
{
int max = 0;
for (int i = 0; i < numIndexes; i++)
{
if (i == 0)
max = 9;
else
max = max * 10 + 9;
}
return max;
}
}
but i don't really like it (first of all because i'm passing the char as string.. and not as a char).
string source = "000081059671####=1811";
int index = 5;
string character = "#";
string result = source.ReplaceCounter(index, character);
can it be more optimized and compact?
Can some good soul help me?
Thanks in advance
EDIT
The index is variable so:
If the index is 15
string source = "000081059671####=1811";
int index = 15;
string character = "#";
string result = source.ReplaceCounter(index, character);
// result = "0000810596710015=1811"
it should be a check if the index > max number
in my code i posted above, if this case happened i remove from index the "max" value until index < max number
What is mux number? if the special chars number is 4 (as in the example below) the max number will be 9999
string source = "000081059671####=1811";
// max number 9999

Yet another edit
From a comment it seems that more than one digit can be used. In this case the counter can be converted to a string and treated as a char[] to pick the character to use in each iteration :
public static string ReplaceCounter(this string source,
int counter,
char character)
{
var sb=new StringBuilder(source);
var replacements=counter.ToString();
int r=replacements.Length-1;
for(int i=sb.Length-1;i>=0;i--)
{
if(sb[i]==character)
{
sb[i]=r>=0 ? replacements[r--] : '0';
}
}
return sb.ToString();
}
This can be used for any number of digits."0980 0099 8383 $$$$".ReplaceCounter(15,'$') produces 0980 0099 8383 0015
An edit
After posting the original answer I remembered one can modify a string without allocations by using a StringBuilder. In this case, the last match needs to be replaced with one character, all other matches with another. This ca be a simple reverse iteration :
public static string ReplaceCounter(this string source,
int counter,
char character)
{
var sb=new StringBuilder(source);
bool useChar=true;
for(int i=sb.Length-1;i>=0;i--)
{
if(sb[i]==character)
{
sb[i]=useChar?(char)('0'+counter):'0';
useChar=false;
}
}
return sb.ToString();
}
Console.WriteLine("0000##81#059671####=1811".ReplaceCounter(5,'#'));
Console.WriteLine("0980 0099 8383 $$$$".ReplaceCounter(3,'$'));
------
0000008100596710005=1811
0980 0099 8383 0003
Original Answer
Any string modification operation produces a new temporary string that need to be garbage collected. This adds up so quickly that avoiding temporary strings can result in >10x speed improvements when processing lots of text or lots of requests. That's better than using parallel processing.
You can use Regex.Replace to perform complex replacements without allocating temporary strings. You can use one of the Replace overloads that use a MatchEvaluator to produce dynamic output, not just a single value.
In this case :
var source = "0000##81#059671####=1811";
var result = Regex.Replace(source,"#", m=>m.NextMatch().Success?"0":"5");
Console.WriteLine(result);
--------
0000008100596710005=1811
Match.NextMatch() returns the next match in the source, so m.NextMatch().Success can be used to identify the last match and replace it with the index.
This would fail if the character was one of the Regex pattern characters. This can be avoided by escaping the character with Regex.Escape(string)
This can be packed in an extension method
public static string ReplaceCounter(this string source,
int counter,
string character)
{
return Regex.Replace(source,
Regex.Escape(character),
m=>m.NextMatch().Success?"0":counter.ToString());
}
public static string ReplaceCounter(this string source,
int counter,
char character)
=>ReplaceCounter(source,counter,character.ToString());
This code
var source= "0980 0099 8383 $$$$";
var result=source.ReplaceCounter(5,"$");
Returns
0980 0099 8383 0003

I would suggest such solutiuon (got rid out of helper methods:
public static class StringExtensions
{
public static string ReplaceCounter(this string source, int counter, char character)
{
string res = source;
string strCounter = counter.ToString();
bool counterTooLong = false;
int idx;
// Going from the and backwards, we fill with counter digits.
for(int i = strCounter.Length - 1; i >= 0; i--)
{
idx = res.LastIndexOf(character);
// if we run out of special characters, break the loop.
if (idx == -1)
{
counterTooLong = true;
break;
}
res = res.Remove(idx, 1).Insert(idx, strCounter[i].ToString());
}
// If we could not fit the counter, we simply throw exception
if (counterTooLong) throw new InvalidOperationException();
// If we did not fill all placeholders, we fill it with zeros.
while (-1 != (idx = res.IndexOf(character))) res = res.Remove(idx, 1).Insert(idx, "0");
return res;
}
}
Here's fiddle

Related

function complex_decode( string str) that takes a non-simple repeated encoded string, and returns the original un-encoded string

Im trying to write a function complex_decode( string str) in c sharp that takes a non-simple repeated encoded string, and returns the original un-encoded string.
for example, "t11h12e14" would return "ttttttttttthhhhhhhhhhhheeeeeeeeeeeeee". I have been successful in decoding strings where the length is less than 10, but unable to work with length for than 10. I am not allowed to use regex, libraries or loops. Only recursions.
This is my code for simple decode which decodes when length less than 10.
public string decode(string str)
{
if (str.Length < 1)
return "";
if(str.Length==2)
return repeat_char(str[0], char_to_int(str[1]));
else
return repeat_char(str[0], char_to_int(str[1]))+decode(str.Substring(2));
}
public int char_to_int(char c)
{
return (int)(c-48);
}
public string repeat_char(char c, int n)
{
if (n < 1)
return "";
if (n == 1)
return ""+c;
else
return c + repeat_char(c, n - 1);
}
This works as intended, for example input "a5" returns "aaaaa", "t1h1e1" returns "the"
Any help is appreciated.
Here is another way of doing this, assuming the repeating string is always one character long and using only recursion (and a StringBuilder object):
private static string decode(string value)
{
var position = 0;
var result = decode_char(value, ref position);
return result;
}
private static string decode_char(string value, ref int position)
{
var next = value[position++];
var countBuilder = new StringBuilder();
get_number(value, ref position, countBuilder);
var result = new string(next, Convert.ToInt32(countBuilder.ToString()));
if (position < value.Length)
result += decode_char(value, ref position);
return result;
}
private static void get_number(string value, ref int position, StringBuilder countBuilder)
{
if (position < value.Length && char.IsNumber(value[position]))
{
countBuilder.Append(value[position++]);
get_number(value, ref position, countBuilder);
}
}
I've refactored your code a bit. I've removed 2 unnecessary methods that you don't actually need. So, the logic is simple and it works like this;
Example input: t3h2e4
Get the first digit. (Which is 2 and has index of 1)
Get the first letter comes after that index, which is our next letter. (Which is "h" and has index of 2)
Slice the string. Start from index 1 and end the slicing on index 2 to get repeat count. (Which is 3)
Repeat the first letter of string for repeat count times and combine it with the result you got from step 5.
Slice the starting from the next letter index we got in second step, to the very end of the string and pass this to recursive method.
public static string Decode(string input)
{
// If the string is empty or has only 1 character, return the string itself to not proceed.
if (input.Length <= 1)
{
return input;
}
// Convert string into character list.
var characters = new List<char>();
characters.AddRange(input);
var firstDigitIndex = characters.FindIndex(c => char.IsDigit(c)); // Get first digit
var nextLetterIndex = characters.FindIndex(firstDigitIndex, c => char.IsLetter(c)); // Get the next letter after that digit
if (nextLetterIndex == -1)
{
// This has only one reason. Let's say you are in the last recursion and you have c2
// There is no letter after the digit, so the index will -1, which means "not found"
// So, it will raise an exception, since we try to use the -1 in slicing part
// Instead, if it's not found, we set the next letter index to length of the string
// With doing that, you either get repeatCount correctly (since remaining part is only digits)
// or you will get empty string in the next recursion, which will stop the recursion.
nextLetterIndex = input.Length;
}
// Let's say first digit's index is 1 and the next letter's index is 2
// str[2..3] will start to slice the string from index 2 and will stop in index 3
// So, it will basically return us the repeat count.
var repeatCount = int.Parse(input[firstDigitIndex..nextLetterIndex]);
// string(character, repeatCount) constructor will repeat the "character" you passed to it for "repeatCount" times
return new string(input[0], repeatCount) + Decode(input[nextLetterIndex..]);
}
Examples;
Console.WriteLine(Decode("t1h1e1")); // the
Console.WriteLine(Decode("t2h3e4")); // tthhheeee
Console.WriteLine(Decode("t3h3e3")); // ttthhheee
Console.WriteLine(Decode("t2h10e2")); // tthhhhhhhhhhee
Console.WriteLine(Decode("t2h10e10")); // tthhhhhhhhhheeeeeeeeee
First you can simplify your repeat_char function, you have to have a clear stop condition:
public static string repeat_char(char c, int resultLength)
{
if(resultLength < 1) throw new ArgumentOutOfRangeException("resultLength");
if(resultLength == 1) return c.ToString();
return c + repeat_char(c, resultLength - 1);
}
See the use of the parameter as equivalent of a counter on a loop.
So you can have something similar on the main function, a parameter that tells when your substring is not an int anymore.
public static string decode(string str, int repeatNumberLength = 1)
{
if(repeatNumberLength < 1) throw new ArgumentOutOfRangeException("length");
//stop condition
if(string.IsNullOrWhiteSpace(str)) return str;
if(repeatNumberLength >= str.Length) repeatNumberLength = str.Length; //Some validation, just to be safe
//keep going until str[1...repeatNumberLength] is not an int
int charLength;
if(repeatNumberLength < str.Length && int.TryParse(str.Substring(1, repeatNumberLength), out charLength))
{
return decode(str, repeatNumberLength + 1);
}
repeatNumberLength--;
//Get the repeated Char.
charLength = int.Parse(str.Substring(1, repeatNumberLength));
var repeatedChar = repeat_char(str[0], charLength);
//decode the substring
var decodedSubstring = decode(str.Substring(repeatNumberLength + 1));
return repeatedChar + decodedSubstring;
}
I used a default parameter, but you can easily change it for a more traditonal style.
This also assumes that the original str is in a correct format.
An excellent exercise is to change the function so that you can have a word, instead of a char before the number. Then you could, for example, have "the3" as the parameter (resulting in "thethethe").
I took more of a Lisp-style head and tail approach (car and cdr if you speak Lisp) and created a State class to carry around the current state of the parsing.
First the State class:
internal class State
{
public State()
{
LastLetter = string.Empty;
CurrentCount = 0;
HasStarted = false;
CurrentValue = string.Empty;
}
public string LastLetter { get; private set; }
public int CurrentCount { get; private set; }
public bool HasStarted { get; private set; }
public string CurrentValue { get; private set; }
public override string ToString()
{
return $"LastLetter: {LastLetter}, CurrentCount: {CurrentCount}, HasStarted: {HasStarted}, CurrentValue: {CurrentValue}";
}
public void AddLetter(string letter)
{
CurrentCount = 0;
LastLetter = letter;
HasStarted = true;
}
public int AddDigit(string digit)
{
if (!HasStarted)
{
throw new InvalidOperationException($"The input must start with a letter, not a digit");
}
if (!int.TryParse(digit, out var num))
{
throw new InvalidOperationException($"Digit passed to {nameof(AddDigit)} ({digit}) is not a number");
}
CurrentCount = CurrentCount * 10 + num;
return CurrentCount;
}
public string GetValue()
{
if (string.IsNullOrEmpty(LastLetter))
{
return string.Empty;
}
CurrentValue = new string(LastLetter[0], CurrentCount);
return CurrentValue;
}
}
You'll notice it's got some stuff in there for debugging (example, the ToString override and the CurrentValue property)
Once you have that, the decoder is easy, it just recurses over the string it's given (along with (initially) a freshly constructed State instance):
private string Decode(string input, State state)
{
if (input.Length == 0)
{
_buffer.Append(state.GetValue());
return _buffer.ToString();
}
var head = input[0];
var tail = input.Substring(1);
var headString = head.ToString();
if (char.IsDigit(head))
{
state.AddDigit(headString);
}
else // it's a character
{
_buffer.Append(state.GetValue());
state.AddLetter(headString);
}
Decode(tail, state);
return _buffer.ToString();
}
I did this in a simple Windows Forms app, with a text box for input, a label for output and a button to crank her up:
const string NotAllowedPattern = #"[^a-zA-Z0-9]";
private static Regex NotAllowedRegex = new Regex(NotAllowedPattern);
private StringBuilder _buffer = new StringBuilder();
private void button1_Click(object sender, EventArgs e)
{
if (textBox1.Text.Length == 0 || NotAllowedRegex.IsMatch(textBox1.Text))
{
MessageBox.Show(this, "Only Letters and Digits Allowed", "Bad Input", MessageBoxButtons.OK, MessageBoxIcon.Error);
return;
}
label1.Text = string.Empty;
_buffer.Clear();
var result = Decode(textBox1.Text, new State());
label1.Text = result;
}
Yeah, there's a Regex there, but it's just to make sure that the input is valid; it's not involved in calculating the output.

Count how many time each Char is in a txt file in C#

I have a school homework, where I need to count how many times each letter is in a txt file, I can't use a dictionary and can't use LinQ, I then need to put it in order alphabetically, and in order of iterations.
string = "hello world"
output =
D=1
E=1
H=1
L=3
O=2
R=1
W=1
L=3
O=2
D=1
E=1
H=1
R=1
W=1
What I have so far works just for the order alphabetically , not for the itterations.
public void testChar() {
string text = File.ReadAllText(# "C:\Ecole\Session 2\Prog\Bloc 4\test.txt")?.ToUpper();
text = Regex.Replace(text, # "[^a-zA-Z]", "");
List < char > listChar = new List < char > ();
foreach(char lettre in text) {
listChar.Add(lettre);
}
int countPosition = 0;
List < int > position = new List < int > ();
listOfChar.Add(listChar[0]);
listOfRepetitions.Add(1);
position.Add(countPosition);
int jumpFirstItteration = 0;
foreach(var item in listChar) {
if (jumpFirstItteration == 0) {
jumpFirstItteration++;
}
if (listOfChar.Contains(item)) {
int pos = listOfChar.IndexOf(item);
listOfRepetitions[pos] += 1;
} else if (!listOfChar.Contains(item)) {
listOfChar.Add(item);
listOfRepetitions.Add(1);
countPosition++;
position.Add(countPosition);
}
}
}
Please help :D
The canonical way of computing a concordance is to use an array of integers for counting the letters, the same size as the number of different letters in the text - in this case, just the normal uppercase alphabetic characters A-Z.
Then you iterate through the uppercased letters and if they are in range, increment the count corresponding to that letter.
To simplify this you can make two observations:
To convert a letter to an index, just subtract 'A' from the character code.
To convert an index to a letter, just add 'A' to the index and cast the result back to a char. (The cast is necessary because the result of adding an int to a char is an int, not a char.)
Once you've done that, you'll have all the counts for the characters in alphabetical order. However, you also need the letters in order of frequency of occurrence. To compute that, you can use an overload of Array.Sort() that takes two arrays: The first parameter is an array to sort, and the second parameter is an array to sort in the same way as the first array.
If you pass the array of counts as the first array, and an array of all the letters being counted in alphabetical order as the second array (i.e. the letters A..Z) then after sorting the second array will give you the letters in the correct order to display with the first array.
Putting all that together:
public void testChar()
{
string filename = #"C:\Ecole\Session 2\Prog\Bloc 4\test.txt";
string text = File.ReadAllText(filename).ToUpper();
int[] concordance = new int[26]; // 26 different letters of the alphabet to count.
foreach (char c in text)
{
int index = c - 'A'; // A..Z will convert to 0..25; other chars will be outside that range.
if (index >= 0 && index < 26)
++concordance[index];
}
// Display frequency in alphabetic order, omitting chars with 0 occurances.
for (int i = 0; i < concordance.Length; ++i)
{
if (concordance[i] > 0)
Console.WriteLine($"{(char)('A'+i)} = {concordance[i]}");
}
Console.WriteLine();
// For sorting by frequency we need another array of chars A..Z in alphabetical order.
char[] aToZ = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray();
Array.Sort(concordance, aToZ);
// Display frequency in occurance order, omitting chars with 0 occurances.
for (int i = 0; i < concordance.Length; ++i)
{
if (concordance[i] > 0)
Console.WriteLine($"{aToZ[i]} = {concordance[i]}");
}
}
maybe add something like a sort function?
position.sort((a,b)=> if a < b return 1; if a > b return -1; return 0;)
This would be a normal sorting function based on their integer value, but since you are dealing with a list, it becomes a little more tedious.
position.sort((a,b)=> if a[i] < b[i] return 1; if a[i] > b[i] return -1; return 0;)
where i the index of integers to be compared
Forgive me, I come from java
Not necessarilly the most efficient, but works.
First, a simple letter class thant can compare its properies.
public class Letter
{
public char Symbole { get; set; }
public int Frequency { get; set; }
public int CompareLetter(object obj)
{
Letter other = obj as Letter;
if (other == null) return 1;
return this.Symbole.CompareTo(other.Symbole);
}
public int CompareFrequency(object obj)
{
Letter other = obj as Letter;
if (other == null) return 1;
return this.Frequency.CompareTo(other.Frequency);
}
}
Then a method to populate a List of Letter
public static List<Letter> ReturnLettersCount(string fileName)
{
string text;
List<Letter> letters = new List<Letter>();
using (StreamReader sr = new StreamReader(fileName))
{
text = sr.ReadToEnd().ToLower();
}
foreach (char c in text)
{
if (char.IsLetter(c))
{
Letter letter = letters.Find((letter) => letter.Symbole == c);
if (letter == null)
{
letters.Add(new Letter() { Symbole = c, Frequency = 1 });
}
else
{
letter.Frequency++;
}
}
}
return letters;
}
And a user code
static void Main(string[] args)
{
string fileName = #"path\to\your\textFile.txt";
List<Letter> letters = ReturnLettersCount(fileName);
letters.Sort( (a, b) => a.CompareLetter(b) );
foreach(Letter letter in letters)
{
Console.WriteLine($"{letter.Symbole}: {letter.Frequency}");
}
Console.WriteLine("--------------------------");
letters.Sort((b, a) => a.CompareFrequency(b));
foreach (Letter letter in letters)
{
Console.WriteLine($"{letter.Symbole}: {letter.Frequency}");
}
}
Without Linq and Dictionary it's like walking by foot to the grocery 20 miles away. ;)
//C# vs >= 8.0
using System;
using Entry = System.Collections.Generic.KeyValuePair<char, int>;
// class header ...
public static void Sort() {
var text = "hello world";
var contest = new System.Collections.Generic.List<Entry>(text.Length);
foreach (var c in text) {
if (!char.IsLetter(c)) {
continue;
}
var i = contest.FindIndex(kv => kv.Key == c);
if (i < 0) {
contest.Add(new(c, 1));
}
else {
contest[i] = new(c, contest[i].Value + 1);
}
}
contest.Sort((e1, e2) => e1.Key - e2.Key);
Console.Write("\n{0}\n\n", string.Join('\n', contest));
contest.Sort((e1, e2) => e2.Value - e1.Value);
Console.Write("\n{0}\n\n", string.Join('\n', contest));
}
A few suggestions that may come in handy.
You could build a container (the CharInfo class object here), to store the information you collect about each char found, it's position and how many times it's found in your text file.
This class container implements IComparable (the comparer just tests the char value, as an integer, against the char value of another object).
This makes it simple to sort a List<CharInfo>, just calling its Sort() method:
public class CharInfo : IComparable<CharInfo>
{
public CharInfo(char letter, int position) {
Character = letter;
Positions.Add(position);
}
public char Character { get; }
public int Occurrences { get; set; } = 1;
public List<int> Positions { get; } = new List<int>();
public static List<string> CharsFound { get; } = new List<string>();
public int CompareTo(CharInfo other) => this.Character - other.Character;
}
Since you can use the Regex class, you can then use Regex.Matches() to return all characters in the [a-zA-Z] range. Each Match object also stores the position where the character was found.
(As a note, you could just use the Matches collection and GroupBy() the results, but you cannot use LINQ and the other suggestions would just be mute :)
Looping the Matches collection, you test whether a char is already present in the list; if it is, add 1 to the Occurrences Property and add its position to the Positions Property.
If it's not already there, add a new CharInfo object to the collection. The constructor of the class takes a char and a position. The Occurrences Property defaults to 1 (you create a new CharInfo because you have found a new char, which is occurrence n° 1).
In the end, just Sort() the collection using its custom comparer. Sort() will call the IComparable.CompareTo() method of the class.
string text = File.ReadAllText([File Path]);
var matches = Regex.Matches(text, #"[a-zA-Z]", RegexOptions.Multiline);
var charsInfo = new List<CharInfo>();
foreach (Match m in matches) {
int pos = CharInfo.CharsFound.IndexOf(m.Value);
if (pos >= 0) {
charsInfo[pos].Occurrences += 1;
charsInfo[pos].Positions.Add(m.Index);
}
else {
CharInfo.CharsFound.Add(m.Value);
charsInfo.Add(new CharInfo(m.Value[0], m.Index));
}
}
// Sort the List<CharInfo> using the provided comparer
charsInfo.Sort();
Call CharInfo.CharsFound.Clear() before you search another text file.
You can then print the results as:
foreach (var chInfo in charsInfo) {
Console.WriteLine(
$"Char: {chInfo.Character} " +
$"Occurrences: {chInfo.Occurrences} " +
$"Positions: {string.Join(",", chInfo.Positions)}");
}
Note that upper and lower case chars are treated as distinct elements.
Modify as required.

Can I limit the number of times a character appears in a string?

For instance, I have a string and I only want the character '<' to appear 10 times in the string, and create a substring where the cutoff point is the 10th appearance of that character. Is this possible?
A manual solution could be like the following:
class Program
{
static void Main(string[] args)
{
int maxNum = 10;
string initialString = "a<b<c<d<e<f<g<h<i<j<k<l<m<n<o<p<q<r<s<t<u<v<w<x<y<z";
string[] splitString = initialString.Split('<');
string result = "";
Console.WriteLine(splitString.Length);
if (splitString.Length > maxNum)
{
for (int i = 0; i < maxNum; i++) {
result += splitString[i];
result += "<";
}
}
else
{
result = initialString;
}
Console.WriteLine(result);
Console.ReadKey();
}
}
By the way, it may be better to try to do it using Regex (in case you may have other replacement rules in the future, or need to make changes, etc). However, given your problem, something like that will work, too.
You can utilize TakeWhile for your purpose, given the string s, your character < as c and your count 10 as count, following function would solve your problem:
public static string foo(string s, char c, int count)
{
var i = 0;
return string.Concat(s.TakeWhile(x => (x == c ? i++ : i) < count));
}
Regex.Matches can be used to count the number of occurrences of a patter in a string.
It also reference the position of each occurrence, the Capture.Index property.
You can read the Index of the Nth occurrence and cut your string there:
(The RegexOptions are there just in case the pattern is something different. Modify as required.)
int cutAtOccurrence = 10;
string input = "one<two<three<four<five<six<seven<eight<nine<ten<eleven<twelve<thirteen<fourteen<fifteen";
var regx = Regex.Matches(input, "<", RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
if (regx.Count >= cutAtOccurrence) {
input = input.Substring(0, regx[cutAtOccurrence - 1].Index);
}
input is now:
one<two<three<four<five<six<seven<eight<nine<ten
If you need to use this procedure many times, it's bettern to build a method that returns a StringBuilder instead.

What is a Unicode safe replica of String.IndexOf(string input) that can handle Surrogate Pairs?

I am trying to figure out an equivalent to C# string.IndexOf(string) that can handle surrogate pairs in Unicode characters.
I am able to get the index when only comparing single characters, like in the code below:
public static int UnicodeIndexOf(this string input, string find)
{
return input.ToTextElements().ToList().IndexOf(find);
}
public static IEnumerable<string> ToTextElements(this string input)
{
var e = StringInfo.GetTextElementEnumerator(input);
while (e.MoveNext())
{
yield return e.GetTextElement();
}
}
But if I try to actually use a string as the find variable then it won't work because each text element only contains a single character to compare against.
Are there any suggestions as to how to go about writing this?
Thanks for any and all help.
EDIT:
Below is an example of why this is necessary:
CODE
Console.WriteLine("HolyCow𪘁BUBBYY𪘁YY𪘁Y".IndexOf("BUBB"));
Console.WriteLine("HolyCow#BUBBYY#YY#Y".IndexOf("BUBB"));
OUTPUT
9
8
Notice where I replace the 𪘁 character with # the values change.
You basically want to find index of one string array in another string array. We can adapt code from this question for that:
public static class Extensions {
public static int UnicodeIndexOf(this string input, string find, StringComparison comparison = StringComparison.CurrentCulture) {
return IndexOf(
// split input by code points
input.ToTextElements().ToArray(),
// split searched value by code points
find.ToTextElements().ToArray(),
comparison);
}
// code from another answer
private static int IndexOf(string[] haystack, string[] needle, StringComparison comparision) {
var len = needle.Length;
var limit = haystack.Length - len;
for (var i = 0; i <= limit; i++) {
var k = 0;
for (; k < len; k++) {
if (!String.Equals(needle[k], haystack[i + k], comparision)) break;
}
if (k == len) return i;
}
return -1;
}
public static IEnumerable<string> ToTextElements(this string input) {
var e = StringInfo.GetTextElementEnumerator(input);
while (e.MoveNext()) {
yield return e.GetTextElement();
}
}
}

How to get number from string in C#

i have a String in HTML (1-3 of 3 Trip) how do i get the number 3(before trip) and convert it to int.I want to use it as a count
Found this code
public static string GetNumberFromStr(string str)
{
str = str.Trim();
Match m = Regex.Match(str, #"^[\+\-]?\d*\.?[Ee]?[\+\-]?\d*$");
return (m.Value);
}
But it can only get 1 number
Regex is unnecessary overhead in your case. try this:
int ExtractNumber(string input)
{
int number = Convert.ToInt32(input.Split(' ')[2]);
return number;
}
Other useful methods for Googlers:
// throws exception if it fails
int i = int.Parse(someString);
// returns false if it fails, returns true and changes `i` if it succeeds
bool b = int.TryParse(someString, out i);
// this one is able to convert any numeric Unicode character to a double. Returns -1 if it fails
double two = char.GetNumericValue('٢')
Forget Regex. This code splits the string using a space as a delimiter and gets the number in the index 2 position.
string trip = "1-3 of 3 trip";
string[] array = trip.Split(' ');
int theNumberYouWant = int.Parse(array[2]);
Try this:
public static int GetNumberFromStr(string str)
{
str = str.Trim();
Match m = Regex.Match(str, #"^.*of\s(?<TripCount>\d+)");
return m.Groups["TripCount"].Length > 0 ? int.Parse(m.Groups["TripCount"].Value) : 0;
}
Another way to do it:
public static int[] GetNumbersFromString(string str)
{
List<int> result = new List<int>();
string[] numbers = Regex.Split(input, #"\D+");
int i;
foreach (string value in numbers)
{
if (int.TryParse(value, out i))
{
result.Add(i);
}
}
return result.ToArray();
}
Example of how to use:
const string input = "There are 4 numbers in this string: 40, 30, and 10.";
int[] numbers = MyHelperClass.GetNumbersFromString();
for(i = 0; i < numbers.length; i++)
{
Console.WriteLine("Number {0}: {1}", i + 1, number[i]);
}
Output:
Number: 4
Number: 40
Number: 30
Number: 10
Thanks to: http://www.dotnetperls.com/regex-split-numbers
If I'm reading your question properly, you'll get a string that is a single digit number followed by ' Trip' and you want to get the numeric value out?
public static int GetTripNumber(string tripEntry)
{
return int.Parse(tripEntry.ToCharArray()[0]);
}
Not really sure if you mean that you always have "(x-y of y Trip)" as a part of the string you parse...if you look at the pattern it only catches the "x-y" part thought with the acceptance of .Ee+- as seperators. If you want to catch the "y Trip" part you will have to look at another regex instead.
You could do a simple, if you change the return type to int instead of string:
Match m = Regex.Match(str, #"(?<maxTrips>\d+)\sTrip");
return m.Groups["maxTrips"].Lenght > 0 ? Convert.ToInt32(m.Groups["maxTrips"].Value) : 0;

Categories