C# How to generate a new string based on multiple ranged indices

Let's say I have a string like this one: the left part is a word, the right part is a collection of indices (single or range) used to reference furigana (phonetics) for the kanji in my word:
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
The pattern in detail:
word,<startIndex>(-<endIndex>):<furigana>
What would be the best way to achieve something like this (with a space in front of the kanji to mark which part is linked to the [furigana]):
子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Edit: (thanks for your comments guys)
Here is what I wrote so far:
static void Main(string[] args)
{
string myString = "ABCDEF,1:test;3:test2";
//Split Kanjis / Indices
string[] tokens = myString.Split(',');
//Extract furigana indices
string[] indices = tokens[1].Split(';');
//Dictionary to store furigana indices
Dictionary<string, string> furiganaIndices = new Dictionary<string, string>();
//Collect
foreach (string index in indices)
{
string[] splitIndex = index.Split(':');
furiganaIndices.Add(splitIndex[0], splitIndex[1]);
}
//Processing
string result = tokens[0] + ",";
for (int i = 0; i < tokens[0].Length; i++)
{
string currentIndex = i.ToString();
if (furiganaIndices.ContainsKey(currentIndex)) //add [furigana]
{
string currentFurigana = furiganaIndices[currentIndex].ToString();
result = result + " " + tokens[0].ElementAt(i) + string.Format("[{0}]", currentFurigana);
}
else //nothing to add
{
result = result + tokens[0].ElementAt(i);
}
}
File.AppendAllText(@"D:\test.txt", result + Environment.NewLine);
}
Result:
ABCDEF,A B[test]C D[test2]EF
I struggle to find a way to process ranged indices:
string myString = "ABCDEF,1:test;2-3:test2";
Result : ABCDEF,A B[test] CD[test2]EF

I don't have anything against manually manipulating strings per se. But given that you seem to have a regular pattern describing the inputs, it seems to me that a solution that uses regex would be more maintainable and readable. So with that in mind, here's an example program that takes that approach:
using System;
using System.Text;
using System.Text.RegularExpressions;
class Program
{
private const string _kinvalidFormatException = "Invalid format for edit specification";
private static readonly Regex
regex1 = new Regex(@"(?<word>[^,]+),(?<edit>(?:\d+)(?:-(?:\d+))?:(?:[^;]+);?)+", RegexOptions.Compiled),
regex2 = new Regex(@"(?<start>\d+)(?:-(?<end>\d+))?:(?<furigana>[^;]+);?", RegexOptions.Compiled);
static void Main(string[] args)
{
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
string result = EditString(myString);
}
private static string EditString(string myString)
{
Match editsMatch = regex1.Match(myString);
if (!editsMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int ichCur = 0;
string input = editsMatch.Groups["word"].Value;
StringBuilder text = new StringBuilder();
foreach (Capture capture in editsMatch.Groups["edit"].Captures)
{
Match oneEditMatch = regex2.Match(capture.Value);
if (!oneEditMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int start, end;
if (!int.TryParse(oneEditMatch.Groups["start"].Value, out start))
{
throw new ArgumentException(_kinvalidFormatException);
}
Group endGroup = oneEditMatch.Groups["end"];
if (endGroup.Success)
{
if (!int.TryParse(endGroup.Value, out end))
{
throw new ArgumentException(_kinvalidFormatException);
}
}
else
{
end = start;
}
text.Append(input.Substring(ichCur, start - ichCur));
if (text.Length > 0)
{
text.Append(' ');
}
ichCur = end + 1;
text.Append(input.Substring(start, ichCur - start));
text.Append(string.Format("[{0}]", oneEditMatch.Groups["furigana"]));
}
if (ichCur < input.Length)
{
text.Append(input.Substring(ichCur));
}
return text.ToString();
}
}
Notes:
This implementation assumes that the edit specifications will be listed in order and won't overlap. It makes no attempt to validate that part of the input; depending on where you are getting your input from you may want to add that. If it's valid for the specifications to be listed out of order, you can also extend the above to first store the edits in a list and sort the list by the start index before actually editing the string. (In similar fashion to the way the other proposed answer works; though, why they are using a dictionary instead of a simple list to store the individual edits, I have no idea…that seems arbitrarily complicated to me.)
I included basic input validation, throwing exceptions where failures occur in the pattern matching. A more user-friendly implementation would add more specific information to each exception, describing what part of the input actually was invalid.
The Regex class actually has a Replace() method, which allows for complete customization. The above could have been implemented that way, using Replace() and a MatchEvaluator to provide the replacement text, instead of just appending text to a StringBuilder. Which way to do it is mostly a matter of preference, though the MatchEvaluator might be preferred if you have a need for more flexible implementation options (i.e. if the exact format of the result can vary).
If you do choose to use the other proposed answer, I strongly recommend you use StringBuilder instead of simply concatenating onto the results variable. For short strings it won't matter much, but you should get into the habit of always using StringBuilder when you have a loop that is incrementally adding onto a string value, because for long strings the performance implications of using concatenation can be very negative.
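The first note mentions storing the edits in a list and sorting by start index so that out-of-order specifications still work. Here is a minimal sketch of that idea, not the answer's actual code: the names FuriganaSketch and Annotate are made up for illustration, it parses with Split instead of Regex, and it assumes that overlapping or out-of-range specifications should be rejected with an exception.

```csharp
using System;
using System.Linq;
using System.Text;

public static class FuriganaSketch
{
    // Hypothetical helper: annotates "word,spec;spec;..." where the specs may come in any order.
    public static string Annotate(string input)
    {
        int comma = input.IndexOf(',');
        if (comma < 0) throw new ArgumentException("Missing ',' separator.");
        string word = input.Substring(0, comma);

        // Parse every "start(-end):furigana" item, then sort the edits by start index.
        var edits = input.Substring(comma + 1)
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(item =>
            {
                string[] parts = item.Split(new[] { ':' }, 2);
                if (parts.Length != 2) throw new ArgumentException("Missing ':' in \"" + item + "\".");
                string[] range = parts[0].Split('-');
                int start = int.Parse(range[0]);
                int end = range.Length > 1 ? int.Parse(range[1]) : start;
                return (Start: start, End: end, Furigana: parts[1]);
            })
            .OrderBy(e => e.Start)
            .ToList();

        var text = new StringBuilder();
        int cursor = 0; // index of the first character of word not copied yet
        foreach (var e in edits)
        {
            // Assumption: overlapping, reversed, or out-of-range edits are errors.
            if (e.Start < cursor || e.End < e.Start || e.End >= word.Length)
                throw new ArgumentException("Invalid range " + e.Start + "-" + e.End + ".");
            text.Append(word, cursor, e.Start - cursor);      // untouched characters before the edit
            if (text.Length > 0) text.Append(' ');            // space marks where the annotated part begins
            text.Append(word, e.Start, e.End - e.Start + 1);  // the annotated characters
            text.Append('[').Append(e.Furigana).Append(']');
            cursor = e.End + 1;
        }
        text.Append(word, cursor, word.Length - cursor);      // whatever remains after the last edit
        return text.ToString();
    }
}
```

With this sketch, FuriganaSketch.Annotate("子で子にならぬ時鳥,7-8:ほととぎす;0:こ;2:こ") should give 子[こ]で 子[こ]にならぬ 時鳥[ほととぎす] even though the ranged edit is listed first.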

This should do it (and even handle ranged indices), based on the formatting of the input string you have:
using System;
using System.Collections.Generic;
public class stringParser
{
private struct IndexElements
{
public int start;
public int end;
public string value;
}
public static void Main()
{
//input string
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
int wordIndexSplit = myString.IndexOf(',');
string word = myString.Substring(0,wordIndexSplit);
string indices = myString.Substring(wordIndexSplit + 1);
string[] eachIndex = indices.Split(';');
Dictionary<int,IndexElements> index = new Dictionary<int,IndexElements>();
string[] elements;
IndexElements e;
int dash;
int n = 0;
int last = -1;
string results = "";
foreach (string s in eachIndex)
{
e = new IndexElements();
elements = s.Split(':');
if (elements[0].Contains("-"))
{
dash = elements[0].IndexOf('-');
e.start = int.Parse(elements[0].Substring(0,dash));
e.end = int.Parse(elements[0].Substring(dash + 1));
}
else
{
e.start = int.Parse(elements[0]);
e.end = e.start;
}
e.value = elements[1];
index.Add(n,e);
n++;
}
//this is the part that takes the "setup" from the parts above and forms the result string
//loop through each of the "indices" parsed above
for (int i = 0; i < index.Count; i++)
{
//if this is the first iteration through the loop, and the first "index" does not start
//at position 0, add the beginning characters before its start
if (last == -1 && index[i].start > 0)
{
results += word.Substring(0,index[i].start);
}
//if this is not the first iteration through the loop, and the previous iteration did
//not stop at the position directly before the start of the current iteration, add
//the intermediary characters
else if (last != -1 && last + 1 != index[i].start)
{
results += word.Substring(last + 1,index[i].start - (last + 1));
}
//add the space before the "index" match, the actual match, and then the formatted "index"
results += " " + word.Substring(index[i].start,(index[i].end - index[i].start) + 1)
+ "[" + index[i].value + "]";
//remember the position of the ending for the next iteration
last = index[i].end;
}
//if the last "index" did not stop at the end of the input string, add the remaining characters
if (index[index.Keys.Count - 1].end + 1 < word.Length)
{
results += word.Substring(index[index.Keys.Count-1].end + 1);
}
//trimming spaces that may be left behind
results = results.Trim();
Console.WriteLine("INPUT - " + myString);
Console.WriteLine("OUTPUT - " + results);
Console.Read();
}
}
input - 子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす
output - 子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Note that this should also work with characters of the English alphabet if you wanted to use English instead:
input - iliketocodeverymuch,2:A;4-6:B;9-12:CDEFG
output - il i[A]k eto[B]co deve[CDEFG]rymuch

Related

C#: Need to split a string into a string[] and keeping the delimiter (also a string) at the beginning of the string

I think I am too dumb to solve this problem...
I have some formulas which need to be "translated" from one syntax to another.
Let's say I have a formula that goes like that (it's a simple one, others have many "Ceilings" in it):
string formulaString = "If([Param1] = 0, 1, Ceiling([Param2] / 0.55) * [Param3])";
I need to replace "Ceiling()" with "Ceiling(; 1)" (basically, insert "; 1" before the ")").
My attempt is to split the formulaString at "Ceiling(" so I am able to iterate through the string array and insert my string at the correct index (counting every "(" and ")" to get the right index).
What I have so far:
//splits correct, but loses "CEILING("
string[] parts = formulaString.Split(new[] { "CEILING(" }, StringSplitOptions.None);
//splits almost correct, "CEILING(" is in another group
string[] parts = Regex.Split(formulaString, @"(CEILING\()");
//splits almost every letter
string[] parts = Regex.Split(formulaString, @"(?=[(CEILING\()])");
When everything is done, I concat the string so I have my complete formula again.
What do I have to set as Regex pattern to achieve this sample? (Or any other method that will help me)
part1 = "If([Param1] = 0, 1, ";
part2 = "Ceiling([Param2] / 0.55) * [Param3])";
//part3 = next "CEILING(" in a longer formula and so on...
As I mention in a comment, you almost got it: (?=Ceiling). This is incomplete for your use case unfortunately.
I need to replace "Ceiling()" with "Ceiling(; 1)" (basically, insert "; 1" before the ")").
Depending on your regex engine (for example JS) this works:
string[] parts = Regex.Split(formulaString, @"(?<=Ceiling\([^)]*(?=\)))");
string modifiedFormula = String.Join("; 1", parts);
The regex
(?<=Ceiling\([^)]*(?=\)))
(?<= ) Positive lookbehind
Ceiling\( Search for literal "Ceiling("
[^)] Match any char which is not ")" ..
* .. 0 or more times
(?=\)) Positive lookahead for ")", effectively making us stop before the ")"
This regex is a zero-width assertion, so nothing is lost, and it will cut your string just before the last ")" of every "Ceiling()".
This solution would break whenever you have nested "Ceiling()". Then your only solution would be writing your own parser for the same reasons why you can't parse markup with regex.
Regex.Replace(formulaString, @"(?<=Ceiling\()(.*?)(?=\))","$1; 1");
Note: This will not work for nested "Ceilings", but it does for a simple Ceiling(). It will also not work for Ceiling(AnotherFunc(x)). For that you need something like:
Regex.Replace(formulaString, @"(?<=Ceiling\()((.*\((?>[^()]+|(?1))*\))*|[^\)]*)(\))","$1; 1$3");
but I could not get that to work with .NET, only in JavaScript.
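The .NET regex engine does not support the (?1) recursion syntax, but it does have balancing groups, which can match nested parentheses. The following is only a sketch of how that could look: the names CeilingSketch and AddSignificance and the group names args and depth are made up here. It also assumes that every "Ceiling(" in the formula has balanced parentheses, and it is case-sensitive.

```csharp
using System.Text.RegularExpressions;

public static class CeilingSketch
{
    // "Ceiling(", then the arguments with balanced parentheses, then the closing ")".
    // Each "(" pushes onto the depth stack and each ")" pops it; (?(depth)(?!)) fails
    // the match if anything is still open when the argument list ends.
    private static readonly Regex CeilingCall = new Regex(
        @"Ceiling\((?<args>(?>[^()]+|\((?<depth>)|\)(?<-depth>))*(?(depth)(?!)))\)");

    // Hypothetical helper: appends "; 1" to the argument list of every Ceiling(...),
    // recursing into the arguments so that nested Ceiling calls are handled too.
    public static string AddSignificance(string formula)
    {
        return CeilingCall.Replace(
            formula,
            m => "Ceiling(" + AddSignificance(m.Groups["args"].Value) + "; 1)");
    }
}
```

For the formula from the question this should give If([Param1] = 0, 1, Ceiling([Param2] / 0.55; 1) * [Param3]), and Ceiling(Ceiling(x) + Func(y)) should become Ceiling(Ceiling(x; 1) + Func(y); 1). Note that the pattern would also match inside a longer function name that ends in "Ceiling".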
This is my solution:
private string ConvertCeiling(string formula)
{
int ceilingsCount = formula.CountOccurences("Ceiling(");
int startIndex = 0;
int bracketCounter;
for (int i = 0; i < ceilingsCount; i++)
{
startIndex = formula.IndexOf("Ceiling(", startIndex);
bracketCounter = 0;
for (int j = 0; j < formula.Length; j++)
{
if (j < startIndex) continue;
var c = formula[j];
if (c == '(')
{
bracketCounter++;
}
if (c == ')')
{
bracketCounter--;
if (bracketCounter == 0)
{
// found end
formula = formula.Insert(j, "; 1");
startIndex++;
break;
}
}
}
}
return formula;
}
And CountOccurences (an extension method, so it must be declared in a static class):
public static int CountOccurences(this string value, string parameter)
{
int counter = 0;
int startIndex = 0;
int indexOfCeiling;
do
{
indexOfCeiling = value.IndexOf(parameter, startIndex);
if (indexOfCeiling < 0)
{
break;
}
else
{
startIndex = indexOfCeiling + 1;
counter++;
}
} while (true);
return counter;
}

C# Console Word Wrap

I have a string with newline characters and I want to wrap the words. I want to keep the newline characters so that when I display the text it looks like separate paragraphs. Does anyone have a good function to do this? Current function and code below (not my own function). The WordWrap function seems to be stripping out \n characters.
static void Main(string[] args){
StreamReader streamReader = new StreamReader("E:/Adventure Story/Intro.txt");
string intro = "";
string line;
while ((line = streamReader.ReadLine()) != null)
{
intro += line;
if(line == "")
{
intro += "\n\n";
}
}
WordWrap(intro);
}
public static void WordWrap(string paragraph)
{
paragraph = new Regex(@" {2,}").Replace(paragraph.Trim(), @" ");
var left = Console.CursorLeft; var top = Console.CursorTop; var lines = new List<string>();
for (var i = 0; paragraph.Length > 0; i++)
{
lines.Add(paragraph.Substring(0, Math.Min(Console.WindowWidth, paragraph.Length)));
var length = lines[i].LastIndexOf(" ", StringComparison.Ordinal);
if (length > 0) lines[i] = lines[i].Remove(length);
paragraph = paragraph.Substring(Math.Min(lines[i].Length + 1, paragraph.Length));
Console.SetCursorPosition(left, top + i); Console.WriteLine(lines[i]);
}
}
Here is a word wrap function that works by using regular expressions to find the places that it's ok to break and places where it must break. Then it returns pieces of the original text based on the "break zones". It even allows for breaks at hyphens (and other characters) without removing the hyphens (since the regex uses a zero-width positive lookbehind assertion).
IEnumerable<string> WordWrap(string text, int width)
{
const string forcedBreakZonePattern = @"\n";
const string normalBreakZonePattern = @"\s+|(?<=[-,.;])|$";
var forcedZones = Regex.Matches(text, forcedBreakZonePattern).Cast<Match>().ToList();
var normalZones = Regex.Matches(text, normalBreakZonePattern).Cast<Match>().ToList();
int start = 0;
while (start < text.Length)
{
var zone =
forcedZones.Find(z => z.Index >= start && z.Index <= start + width) ??
normalZones.FindLast(z => z.Index >= start && z.Index <= start + width);
if (zone == null)
{
yield return text.Substring(start, width);
start += width;
}
else
{
yield return text.Substring(start, zone.Index - start);
start = zone.Index + zone.Length;
}
}
}
If you want another newline to make the text look like paragraphs, just use the Replace method of your String object.
var str =
"Line 1\n" +
"Line 2\n" +
"Line 3\n";
Console.WriteLine("Before:\n" + str);
str = str.Replace("\n", "\n\n");
Console.WriteLine("After:\n" + str);
Recently I've been working on creating some abstractions that imitate window-like features in a performance- and memory-sensitive console context.
To this end I had to implement word-wrapping functionality without any unnecessary string allocations.
The following is what I managed to simplify it into. This method:
preserves new-lines in the input string,
allows you to specify what characters it should break on (space, hyphen, etc.),
returns the start indices and lengths of the lines via Microsoft.Extensions.Primitives.StringSegment struct instances (but it's very simple to replace this struct with your own, or append directly to a StringBuilder).
public static IEnumerable<StringSegment> WordWrap(string input, int maxLineLength, char[] breakableCharacters)
{
int lastBreakIndex = 0;
while (true)
{
var nextForcedLineBreak = lastBreakIndex + maxLineLength;
// If the remainder is shorter than the allowed line-length, return the remainder. Short-circuits instantly for strings shorter than line-length.
if (nextForcedLineBreak >= input.Length)
{
yield return new StringSegment(input, lastBreakIndex, input.Length - lastBreakIndex);
yield break;
}
// If there are native new lines before the next forced break position, use the last native new line as the starting position of our next line.
int nativeNewlineIndex = input.LastIndexOf(Environment.NewLine, nextForcedLineBreak, maxLineLength);
if (nativeNewlineIndex > -1)
{
nextForcedLineBreak = nativeNewlineIndex + Environment.NewLine.Length + maxLineLength;
}
// Find the last breakable point preceding the next forced break position (and include the breakable character, which might be a hyphen).
var nextBreakIndex = input.LastIndexOfAny(breakableCharacters, nextForcedLineBreak, maxLineLength) + 1;
// If there is no breakable point, which means a word is longer than line length, force-break it.
if (nextBreakIndex == 0)
{
nextBreakIndex = nextForcedLineBreak;
}
yield return new StringSegment(input, lastBreakIndex, nextBreakIndex - lastBreakIndex);
lastBreakIndex = nextBreakIndex;
}
}

How to run-length encode 'EEDDDNE' to '2E3DNE'?

Explanation: The task itself is that we have 13 strings (stored in the sor[] array) like the one in the title or 'EEENKDDDDKKKNNKDK'
and we have to shorten it in a way that if there are two or more of the same letter next to each other then we have to write it in the form of 'NumberoflettersLetter'
So by this rule, 'EEENKDDDDKKKNNKDK' would become '3ENK4D3K2NKDK'
using System;
public class Program
{
public static void Main(string[] args)
{
string[] sor = new string[] { "EEENKDDDDKKKNNKDK", "'EEDDDNE'" };
char holder;
int counter = 0;
string temporary;
int indexholder;
for (int i = 0; i < sor.Length; i++)
{
for (int q = 0; q < sor[i].Length; q++)
{
holder = sor[i][q];
indexholder = q;
counter = 0;
while (sor[i][q] == holder)
{
q++;
counter++;
}
if (counter > 1)
{
temporary = Convert.ToString(counter) + holder;
sor[i].Replace(sor[i].Substring(indexholder, q), temporary); // EX here
}
}
}
Console.ReadLine();
}
}
Sorry I didn't make the error clear, it says that :
"The value of index and length has to represent a place inside the string (System.ArgumentOutOfRangeException) - name of parameter: length"
...but I have no clue what's wrong with it, maybe it's a tiny little mistake, maybe the whole thing is messed up, so this is why I'd like someone to help me with this D:
(Ps 'indexholder' is there because i need it for another exercise)
EDIT:
'sor' is the string array that holds these strings (there are 13 of them) like the one mentioned in the title or in the example
You can use regex for this:
Regex.Replace("EEENKDDDDKKKNNKDK", @"(.)\1+", m => $"{m.Length}{m.Groups[1].Value}")
Explanation:
(.) matches any character and puts it in group #1
\1+ matches group #1 as many times as it can
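For reference, here is the same regex wrapped in a small method so it can be applied to every string in the array. RleSketch and Encode are names chosen for this sketch only. It assumes the input contains no digits (otherwise the encoded result would be ambiguous) and no line breaks (which . does not match by default).

```csharp
using System.Text.RegularExpressions;

public static class RleSketch
{
    // Replaces every run of two or more identical characters with "<run length><character>".
    // m.Length is the length of the whole run; group 1 holds the repeated character.
    public static string Encode(string s)
    {
        return Regex.Replace(s, @"(.)\1+", m => m.Length + m.Groups[1].Value);
    }
}
```

RleSketch.Encode("EEDDDNE") should return "2E3DNE", and RleSketch.Encode("EEENKDDDDKKKNNKDK") should return "3ENK4D3K2NKDK". Characters that occur only once are left alone because \1+ needs at least one repetition.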
Shortening the same string in place is more difficult than constructing a new one while iterating over the old one char by char. If you plan to add to a string iteratively, it is better to use the StringBuilder class instead of adding directly to a string (for performance reasons).
You can streamline your approach by using the LINQ Aggregate function (Enumerable.Aggregate), which does the iteration over one string for you automatically:
using System;
using System.Linq;
using System.Text;
public class Program
{
public static string RunLengthEncode(string s)
{
if (string.IsNullOrEmpty(s)) // avoid null ref ex and do simple case
return "";
// we need a "state" between the different chars of s that we store here:
char curr_c = s[0]; // our current char, we start with the 1st one
int count = 0; // our char counter, we start with 0 as it will be
// incremented as soon as it is processed by Aggregate
// ( and then incremented to 1)
var agg = s.Aggregate(new StringBuilder(), (acc, c) => // StringBuilder
// performs better for multiple string-"additions" than string itself
{
if (c == curr_c)
count++; // same char, increment
else
{
// other char
if (count > 1) // store count if > 1
acc.AppendFormat("{0}", count);
acc.Append(curr_c); // store char
curr_c = c; // set current char to new one
count = 1; // startcount now is 1
}
return acc;
});
// add last things
if (count > 1) // store count if > 1
agg.AppendFormat("{0}", count);
agg.Append(curr_c); // store char
return agg.ToString(); // return the "simple" string
}
Test with
public static void Main(string[] args)
{
Console.WriteLine(RunLengthEncode("'EEENKDDDDKKKNNKDK' "));
Console.ReadLine();
}
}
Output for "'EEENKDDDDKKKNNKDK' ":
'3ENK4D3K2NKDK'
Your approach without using the same string is more like this:
var data = "'EEENKDDDDKKKNNKDK' ";
char curr_c = '\x0'; // avoid unassigned warning
int count = 0; // counter for the curr_c occurrences in a row
string result = string.Empty; // resulting string
foreach (var c in data) // process every character of data in order
{
if (c != curr_c) // new character found
{
if (count > 1) // more than 1, add count as string and the char
result += Convert.ToString(count) + curr_c;
else if (count > 0) // avoid initial `\x0` being put into string
result += curr_c;
curr_c = c; // remember new character
count = 1; // so far we found this one
}
else
count++; // not new, increment counter
}
// add the last counted char as well
if (count > 1)
result += Convert.ToString(count) + curr_c;
else
result += curr_c;
// output
Console.WriteLine(data + " ==> " + result);
Output:
'EEENKDDDDKKKNNKDK' ==> '3ENK4D3K2NKDK'
Instead of using the indexing operator [] on your string and having to struggle with indexes all over, I use foreach (var c in "sometext") ..., which proceeds char-wise through the string - much less hassle.
If you need to run-length encode an array/list (your sor) of strings, simply apply the code to each one (preferably by using foreach (var s in yourStringList) ...).

C# More intuitive way to split a string into tokens?

I have a method which takes in a string, which contains various characters, but I'm only concerned about underscores '_' and dollar signs '$'. I want to split up the string into tokens by underscores as each piece b/w the underscores contains important information.
However, if a $ is contained in an area between underscores, then a token should be created from the last occurrence of an underscore to the end (ignoring any underscores in this last section).
Example
input: Hello_To_The$Great_World
expected tokens: Hello, To, The$Great_World
Question
I have a solution below, but I'm wondering is there a cleaner/more intuitive way of doing this than what I have below?
var aTokens = new List<string>();
var aPos = 0;
for (var aNum = 0; aNum < item.Length; aNum++)
{
if (aNum == item.Length - 1)
{
aTokens.Add(item.Substring(aPos, item.Length - aPos));
break;
}
if (item[aNum] == '$')
{
aTokens.Add(item.Substring(aPos, item.Length - aPos));
break;
}
if (item[aNum] == '_')
{
aTokens.Add(item.Substring(aPos, aNum - aPos));
aPos = aNum + 1;
}
}
You can split the string on each _ that has no $ anywhere before it.
For that you can use the following regex:
(?<!\$.*)_
Sample code:
string input = "Hello_To_The$Great_World";
string[] output = Regex.Split(input, @"(?<!\$.*)_");
You also can do the task without regex and without loops, but with the help of 2 splits:
string input = "Hello_To_The$Great_World";
string[] temp = input.Split(new[] { '$' }, 2);
string[] output = temp[0].Split('_');
if (temp.Length > 1)
output[output.Length - 1] = output[output.Length - 1] + "$" + temp[1];
This method is not efficient or clean, but it gives you a general idea of how to do this:
Split your string into tokens
Find the index of the first string to contain $
Return a new array with the first n tokens and the final token is the remaining strings concatenated.
It's probably more useful to take advantage of IEnumerable or do things over a for loop instead of all this Array.Copy stuff... but you get the gist of it.
private string[] SomeMethod(string arg)
{
var strings = arg.Split(new[] { '_' });
var indexedValue = strings.Select((v, i) => new { Value = v, Index = i }).FirstOrDefault(x => x.Value.Contains("$"));
if (indexedValue != null)
{
var count = indexedValue.Index + 1;
string[] final = new string[count];
Array.Copy(strings, 0, final, 0, indexedValue.Index);
final[indexedValue.Index] = String.Join("_", strings, indexedValue.Index, strings.Length - indexedValue.Index);
return final;
}
return strings;
}
Here's my version (loops are so last year...)
const char dollar = '$';
const char underscore = '_';
var item = "Hello_To_The$Great_World";
var aTokens = new List<string>();
int dollarIndex = item.IndexOf(dollar);
if (dollarIndex >= 0)
{
int lastUnderscoreIndex = item.LastIndexOf(underscore, dollarIndex);
if (lastUnderscoreIndex >= 0)
{
aTokens.AddRange(item.Substring(0, lastUnderscoreIndex).Split(underscore));
aTokens.Add(item.Substring(lastUnderscoreIndex + 1));
}
else
{
aTokens.Add(item);
}
}
else
{
aTokens.AddRange(item.Split(underscore));
}
Edit:
I should have added, cleaner/more intuitive is very subjective, as you have found out by the variety of answers provided. From a maintainability point of view, it's much more important that the method you write to do the parsing is unit tested!
It's also an interesting exercise to test the performance of the various methods posted here - it quickly becomes apparent that your original version is much faster than using regular expressions! (Although in a real life situation, it's probably quite unlikely that the performance of this method will make any difference to your application!)
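To make the point about unit testing concrete, here is a small framework-free sketch of such a check. TokenizerChecks, Tokenize and RunChecks are names invented here. Tokenize just wraps the IndexOf/LastIndexOf approach from above in a method, and the expected tokens for the second and third case are my reading of the rules in the question, not something the question states.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TokenizerChecks
{
    // Hypothetical wrapper around the IndexOf/LastIndexOf approach.
    public static string[] Tokenize(string item)
    {
        int dollar = item.IndexOf('$');
        if (dollar < 0) return item.Split('_');                 // no '$': plain split
        int lastUnderscore = item.LastIndexOf('_', dollar);     // last '_' before the '$'
        if (lastUnderscore < 0) return new[] { item };          // '$' is in the first token
        return item.Substring(0, lastUnderscore).Split('_')
            .Concat(new[] { item.Substring(lastUnderscore + 1) })
            .ToArray();
    }

    // Throws on the first input whose tokens differ from the expected ones.
    public static void RunChecks()
    {
        var cases = new Dictionary<string, string[]>
        {
            ["Hello_To_The$Great_World"] = new[] { "Hello", "To", "The$Great_World" },
            ["Hello_World"] = new[] { "Hello", "World" },        // assumed expectation
            ["No$Under_score"] = new[] { "No$Under_score" },     // assumed expectation
        };
        foreach (var pair in cases)
        {
            string[] actual = Tokenize(pair.Key);
            if (!actual.SequenceEqual(pair.Value))
                throw new Exception("Unexpected tokens for " + pair.Key + ": " + string.Join(", ", actual));
        }
    }
}
```

In a real project the same cases would probably live in a test framework of your choice. The point is only that each variant posted here can be run against one shared table of inputs and expected tokens.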

Find and count letter pairs from a string by alphabetical order

I am working on a small project in C# where I want to find and count the letter pairs which come in alphabetical order, ignoring spaces and special characters.
e.g.
This is a absolutely easy.
Here my output should be
hi 1
ab 1
I referred to this post, but I did not get a clear idea of how to count letter pairs.
First I remove the spaces and special characters as you specified by simply going though the string and checking whether the current character is a letter:
private static string GetLetters(string s)
{
string newString = "";
foreach (var item in s)
{
if (char.IsLetter(item))
{
newString += item;
}
}
return newString;
}
Then I wrote a method which checks whether the next letter is in alphabetical order using simple logic. I lower the characters' case and check if the current character's code + 1 is equal to the next one's. If it is, the two letters are consecutive in the alphabet:
private static string[] GetLetterPairsInAlphabeticalOrder(string s)
{
List<string> pairs = new List<string>();
for (int i = 0; i < s.Length - 1; i++)
{
if (char.ToLower(s[i]) + 1 == char.ToLower(s[i + 1]))
{
pairs.Add(s[i].ToString() + s[i+1].ToString());
}
}
return pairs.ToArray();
}
Here is how the main method will look like:
static void Main()
{
string s = "This is a absolutely easy.";
s = GetLetters(s);
string[] pairOfLetters = GetLetterPairsInAlphabeticalOrder(s);
foreach (var item in pairOfLetters)
{
Console.WriteLine(item);
}
}
First, I would normalize the string to reduce confusion from special characters like this:
string str = "This is a absolutely easy.";
Regex rgx = new Regex("[^a-zA-Z]");
str = rgx.Replace(str, "");
str = str.ToLower();
Then, I would loop over all of the characters in the string and see if their neighbor is the next letter in the alphabet.
Dictionary<string, int> counts = new Dictionary<string, int>();
for (int i = 0; i < str.Length - 1; i++)
{
if (str[i+1] == (char)(str[i]+1))
{
string index = "" + str[i] + str[i+1];
if (!counts.ContainsKey(index))
counts.Add(index, 0);
counts[index]++;
}
}
Printing the counts from there is pretty straightforward.
foreach (string s in counts.Keys)
{
Console.WriteLine(s + " " + counts[s]);
}
