How to run-length encode 'EEDDDNE' to '2E3DNE'? - c#

Explanation: The task itself is that we have 13 strings (stored in the sor[] array) like the one in the title or 'EEENKDDDDKKKNNKDK'
and we have to shorten it in a way that if there's two or more of the same letter next to eachother then we have to write it in the form of 'NumberoflettersLetter'
So by this rule, 'EEENKDDDDKKKNNKDK' would become '3ENK4D3K2NKDK'
using System;
public class Program
{
public static void Main(string[] args)
{
string[] sor = new string[] { "EEENKDDDDKKKNNKDK", "'EEDDDNE'" };
char holder;
int counter = 0;
string temporary;
int indexholder;
for (int i = 0; i < sor.Length; i++)
{
for (int q = 0; q < sor[i].Length; q++)
{
holder = sor[i][q];
indexholder = q;
counter = 0;
while (sor[i][q] == holder)
{
q++;
counter++;
}
if (counter > 1)
{
temporary = Convert.ToString(counter) + holder;
sor[i].Replace(sor[i].Substring(indexholder, q), temporary); // EX here
}
}
}
Console.ReadLine();
}
}
Sorry I didn't make the error clear, it says that :
"The value of index and length has to represent a place inside the string (System.ArgumentOutOfRangeException) - name of parameter: length"
...but I have no clue what's wrong with it, maybe it's a tiny little mistake, maybe the whole thing is messed up, so this is why I'd like someone to help me with this D:
(Ps 'indexholder' is there because i need it for another exercise)
EDIT:
'sor' is the string array that holds these strings (there are 13 of them) like the one mentioned in the title or in the example

You can use regex for this:
Regex.Replace("EEENKDDDDKKKNNKDK", #"(.)\1+", m => $"{m.Length}{m.Groups[1].Value}")
Explanation:
(.) matches any character and puts it in group #1
\1+ matches group #1 as many times can it can

Shortening the same string inplace is more difficult then construction a new one while iterating the old one char by char. If you plan to iteratively add to a string it is better to use the StringBuilder - class instead of adding directly to a string (performance reasons).
You can streamline your approach by using IEnumerable.Aggregate function wich does the iteration on one string for you automatically:
using System;
using System.Linq;
using System.Text;
public class Program
{
public static string RunLengthEncode(string s)
{
if (string.IsNullOrEmpty(s)) // avoid null ref ex and do simple case
return "";
// we need a "state" between the differenc chars of s that we store here:
char curr_c = s[0]; // our current char, we start with the 1st one
int count = 0; // our char counter, we start with 0 as it will be
// incremented as soon as it is processed by Aggregate
// ( and then incremented to 1)
var agg = s.Aggregate(new StringBuilder(), (acc, c) => // StringBuilder
// performs better for multiple string-"additions" then string itself
{
if (c == curr_c)
count++; // same char, increment
else
{
// other char
if (count > 1) // store count if > 1
acc.AppendFormat("{0}", count);
acc.Append(curr_c); // store char
curr_c = c; // set current char to new one
count = 1; // startcount now is 1
}
return acc;
});
// add last things
if (count > 1) // store count if > 1
agg.AppendFormat("{0}", count);
agg.Append(curr_c); // store char
return agg.ToString(); // return the "simple" string
}
Test with
public static void Main(string[] args)
{
Console.WriteLine(RunLengthEncode("'EEENKDDDDKKKNNKDK' "));
Console.ReadLine();
}
}
Output for "'EEENKDDDDKKKNNKDK' ":
'3ENK4D3K2NKDK'
Your approach without using the same string is more like this:
var data = "'EEENKDDDDKKKNNKDK' ";
char curr_c = '\x0'; // avoid unasssinged warning
int count = 0; // counter for the curr_c occurences in row
string result = string.Empty; // resulting string
foreach (var c in data) // process every character of data in order
{
if (c != curr_c) // new character found
{
if (count > 1) // more then 1, add count as string and the char
result += Convert.ToString(count) + curr_c;
else if (count > 0) // avoid initial `\x0` being put into string
result += curr_c;
curr_c = c; // remember new character
count = 1; // so far we found this one
}
else
count++; // not new, increment counter
}
// add the last counted char as well
if (count > 1)
result += Convert.ToString(count) + curr_c;
else
result += curr_c;
// output
Console.WriteLine(data + " ==> " + result);
Output:
'EEENKDDDDKKKNNKDK' ==> '3ENK4D3K2NKDK'
Instead of using the indexing operator [] on your string and have to struggle with indexes all over I use foreach c in "sometext" ... which will proceed char-wise through the string - much less hassle.
If you need to run-length encode an array/list (your sor) of strings, simply apply the code to each one (preferably by using foreach s in yourStringList ....

Related

Count how many time each Char is in a txt file in C#

I have a school homework, where I need to count how many times each letter is in a txt file, I can't use a dictionary and can't use LinQ, I then need to put it in order alphabetically, and in order of iterations.
string = "hello world"
output =
D=1
E=1
H=1
L=3
O=2
R=1
W=1
L=3
O=2
D=1
E=1
H=1
R=1
W=1
What I have so far works just for the order alphabetically , not for the itterations.
public void testChar() {
string text = File.ReadAllText(# "C:\Ecole\Session 2\Prog\Bloc 4\test.txt")?.ToUpper();
text = Regex.Replace(text, # "[^a-zA-Z]", "");
List < char > listChar = new List < char > ();
foreach(char lettre in text) {
listChar.Add(lettre);
}
int countPosition = 0;
List < int > position = new List < int > ();
listOfChar.Add(listChar[0]);
listOfRepetitions.Add(1);
position.Add(countPosition);
int jumpFirstItteration = 0;
foreach(var item in listChar) {
if (jumpFirstItteration == 0) {
jumpFirstItteration++;
}
if (listOfChar.Contains(item)) {
int pos = listOfChar.IndexOf(item);
listOfRepetitions[pos] += 1;
} else if (!listOfChar.Contains(item)) {
listOfChar.Add(item);
listOfRepetitions.Add(1);
countPosition++;
position.Add(countPosition);
}
}
}
Please help :D
The canonical way of computing a concordance is to use an array of integers for counting the letters, the same size as the number of different letters in the text - in this case, just the normal uppercase alphabetic characters A-Z.
Then you iterate through the uppercased letters and if they are in range, increment the count corresponding to that letter.
To simplify this you can make two observations:
To convert a letter to an index, just subtract 'A' from the character code.
To convert an index to a letter, just add 'A' to the index and cast the result back to a char. (The cast is necessary because the result of adding an int to a char is an int, not a char.)
Once you've done that, you'll have all the counts for the characters in alphabetical order. However, you also need the letters in order of frequency of occurrence. To compute that, you can use an overload of Array.Sort() that takes two arrays: The first parameter is an array to sort, and the second parameter is an array to sort in the same way as the first array.
If you pass the array of counts as the first array, and an array of all the letters being counted in alphabetical order as the second array (i.e. the letters A..Z) then after sorting the second array will give you the letters in the correct order to display with the first array.
Putting all that together:
public void testChar()
{
string filename = #"C:\Ecole\Session 2\Prog\Bloc 4\test.txt";
string text = File.ReadAllText(filename).ToUpper();
int[] concordance = new int[26]; // 26 different letters of the alphabet to count.
foreach (char c in text)
{
int index = c - 'A'; // A..Z will convert to 0..25; other chars will be outside that range.
if (index >= 0 && index < 26)
++concordance[index];
}
// Display frequency in alphabetic order, omitting chars with 0 occurances.
for (int i = 0; i < concordance.Length; ++i)
{
if (concordance[i] > 0)
Console.WriteLine($"{(char)('A'+i)} = {concordance[i]}");
}
Console.WriteLine();
// For sorting by frequency we need another array of chars A..Z in alphabetical order.
char[] aToZ = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray();
Array.Sort(concordance, aToZ);
// Display frequency in occurance order, omitting chars with 0 occurances.
for (int i = 0; i < concordance.Length; ++i)
{
if (concordance[i] > 0)
Console.WriteLine($"{aToZ[i]} = {concordance[i]}");
}
}
maybe add something like a sort function?
position.sort((a,b)=> if a < b return 1; if a > b return -1; return 0;)
This would be a normal sorting function based on their integer value, but since you are dealing with a list, it becomes a little more tedious.
position.sort((a,b)=> if a[i] < b[i] return 1; if a[i] > b[i] return -1; return 0;)
where i the index of integers to be compared
Forgive me, I come from java
Not necessarilly the most efficient, but works.
First, a simple letter class thant can compare its properies.
public class Letter
{
public char Symbole { get; set; }
public int Frequency { get; set; }
public int CompareLetter(object obj)
{
Letter other = obj as Letter;
if (other == null) return 1;
return this.Symbole.CompareTo(other.Symbole);
}
public int CompareFrequency(object obj)
{
Letter other = obj as Letter;
if (other == null) return 1;
return this.Frequency.CompareTo(other.Frequency);
}
}
Then a method to populate a List of Letter
public static List<Letter> ReturnLettersCount(string fileName)
{
string text;
List<Letter> letters = new List<Letter>();
using (StreamReader sr = new StreamReader(fileName))
{
text = sr.ReadToEnd().ToLower();
}
foreach (char c in text)
{
if (char.IsLetter(c))
{
Letter letter = letters.Find((letter) => letter.Symbole == c);
if (letter == null)
{
letters.Add(new Letter() { Symbole = c, Frequency = 1 });
}
else
{
letter.Frequency++;
}
}
}
return letters;
}
And a user code
static void Main(string[] args)
{
string fileName = #"path\to\your\textFile.txt";
List<Letter> letters = ReturnLettersCount(fileName);
letters.Sort( (a, b) => a.CompareLetter(b) );
foreach(Letter letter in letters)
{
Console.WriteLine($"{letter.Symbole}: {letter.Frequency}");
}
Console.WriteLine("--------------------------");
letters.Sort((b, a) => a.CompareFrequency(b));
foreach (Letter letter in letters)
{
Console.WriteLine($"{letter.Symbole}: {letter.Frequency}");
}
}
Without Linq and Dictionary it's like walking by foot to the grocery 20 miles away. ;)
//C# vs >= 8.0
using System;
using Entry = System.Collections.Generic.KeyValuePair<char, int>;
// class header ...
public static void Sort() {
var text = "hello world";
var contest = new System.Collections.Generic.List<Entry>(text.Length);
foreach (var c in text) {
if (!char.IsLetter(c)) {
continue;
}
var i = contest.FindIndex(kv => kv.Key == c);
if (i < 0) {
contest.Add(new(c, 1));
}
else {
contest[i] = new(c, contest[i].Value + 1);
}
}
contest.Sort((e1, e2) => e1.Key - e2.Key);
Console.Write("\n{0}\n\n", string.Join('\n', contest));
contest.Sort((e1, e2) => e2.Value - e1.Value);
Console.Write("\n{0}\n\n", string.Join('\n', contest));
}
A few suggestions that may come in handy.
You could build a container (the CharInfo class object here), to store the information you collect about each char found, it's position and how many times it's found in your text file.
This class container implements IComparable (the comparer just tests the char value, as an integer, against the char value of another object).
This makes it simple to sort a List<CharInfo>, just calling its Sort() method:
public class CharInfo : IComparable<CharInfo>
{
public CharInfo(char letter, int position) {
Character = letter;
Positions.Add(position);
}
public char Character { get; }
public int Occurrences { get; set; } = 1;
public List<int> Positions { get; } = new List<int>();
public static List<string> CharsFound { get; } = new List<string>();
public int CompareTo(CharInfo other) => this.Character - other.Character;
}
Since you can use the Regex class, you can then use Regex.Matches() to return all characters in the [a-zA-Z] range. Each Match object also stores the position where the character was found.
(As a note, you could just use the Matches collection and GroupBy() the results, but you cannot use LINQ and the other suggestions would just be mute :)
Looping the Matches collection, you test whether a char is already present in the list; if it is, add 1 to the Occurrences Property and add its position to the Positions Property.
If it's not already there, add a new CharInfo object to the collection. The constructor of the class takes a char and a position. The Occurrences Property defaults to 1 (you create a new CharInfo because you have found a new char, which is occurrence n° 1).
In the end, just Sort() the collection using its custom comparer. Sort() will call the IComparable.CompareTo() method of the class.
string text = File.ReadAllText([File Path]);
var matches = Regex.Matches(text, #"[a-zA-Z]", RegexOptions.Multiline);
var charsInfo = new List<CharInfo>();
foreach (Match m in matches) {
int pos = CharInfo.CharsFound.IndexOf(m.Value);
if (pos >= 0) {
charsInfo[pos].Occurrences += 1;
charsInfo[pos].Positions.Add(m.Index);
}
else {
CharInfo.CharsFound.Add(m.Value);
charsInfo.Add(new CharInfo(m.Value[0], m.Index));
}
}
// Sort the List<CharInfo> using the provided comparer
charsInfo.Sort();
Call CharInfo.CharsFound.Clear() before you search another text file.
You can then print the results as:
foreach (var chInfo in charsInfo) {
Console.WriteLine(
$"Char: {chInfo.Character} " +
$"Occurrences: {chInfo.Occurrences} " +
$"Positions: {string.Join(",", chInfo.Positions)}");
}
Note that upper and lower case chars are treated as distinct elements.
Modify as required.

C# iterate a continuously growing multi dimensional array

Imagine I wanted to iterate from A to Z. We would use either Foreach or For loop. After attaining Z I would then like to iterate from AA to ZZ, so it starts at AA, then goes to AB, AC...AZ, BA, BC..BZ..ZA,ZB, ZZ. At which point we would move to three chars, then 4 etc up to an undefined point.
Because we don't have a defined length for the array we cannot use nested for loops... so
Question: How can this be done?
Note, No code has been given because we all know how to foreach over an array and nest foreach loops.
Here's some code that will do what you want. Full explanation follows but in summary it takes advantage of the fact that once you have done all the letters of a given length you do A followed by that entire sequence again then B followed by the entire sequence again, etc.
private IEnumerable<string> EnumerateLetters()
{
int count = 1;
while (true)
{
foreach(var letters in EnumerateLetters(count))
{
yield return letters;
}
count++;
}
}
private IEnumerable<string> EnumerateLetters(int count)
{
if (count==0)
{
yield return String.Empty;
}
else
{
char letter = 'A';
while(letter<='Z')
{
foreach(var letters in EnumerateLetters(count-1))
{
yield return letter+letters;
}
letter++;
}
}
}
There are two methods. The first is the one that you call and will generate an infinite sequence of letters. The second does the recursion magic.
The first is pretty simple. it has a count of how many letters we are on, calls the second method with that count and then enumerates through them returning them. Once it has done all for one size it increases the count and loops.
The second method is the one that does the magic. It takes in a count for the number of letters in the generated string. If the count is zero it returns an empty string and breaks.
If the count is more than one it will loop through the letters A to Z and for each letter it will append the sequence that it one shorter than it to the A. Then for the B and so on.
This will then keep going indefinitely.
The sequence will keep generating indefinitely. Because it uses recursion it would be theoretically possible to start stack overflowing if your letter string becomes too long but at one level of recursion per letter in the string you will need to be getting up to very long strings before you need to worry about that (and I suspect if you've gone that far in a loop that you'll run into other problems first).
The other key point (if you are not aware) is that yield return uses deferred execution so it will generate each new element in the sequence as it is needed so it will only generate as many items as you ask for. If you iterate through five times it will only generate A-E and won't have wasted any time thinking about what comes next.
Yet another generator (adding 1 to a number with radix == 26: A stands for 0, B for 1, ... Z for 25):
// please, notice, that Generator() can potentially spawn ifinitely many items
private static IEnumerable<String> Generator() {
char[] data = new char[] { 'A' }; // number to start with - "A"
while (true) {
yield return new string(data);
// trying to add one
for (int i = data.Length - 1; i >= 0; --i)
if (data[i] == 'Z')
data[i] = 'A';
else {
data[i] = (char) (data[i] + 1);
break;
}
// have we exhausted N-length numbers?
if (data.All(item => item == 'A'))
data = Enumerable
.Repeat('A', data.Length + 1) // ... continue with N + 1-length numbers
.ToArray();
}
}
Test
// take first 1000 items:
foreach (var item in Generator().Take(1000))
Console.WriteLine(item);
Outcome
A
B
C
..
X
Y
Z
AA
AB
..
AZ
BA
BB
BC
..
ZY
ZZ
AAA
AAB
AAC
..
ALK
ALL
You could do something like this, it gives me unending output of your pattern (sorry, not exact your pattern, but you understand how to do it)
public static IEnumerable<string> Produce()
{
string seed = "A";
int i = 0;
while (true)
{
yield return String.Join("", Enumerable.Repeat(seed, i));
if (seed == "Z")
{
seed = "A";
i++;
}
else
{
seed = ((char)(seed[0]+1)).ToString();
}
}
}
And than :
foreach (var s in Produce())
{
//Do something
}
EDIT I have desired output with this method :
public static IEnumerable<string> Produce()
{
int i = 1;
while (true)
{
foreach(var c in produceAmount(i))
{
yield return c;
}
i++;
}
}
private static IEnumerable<string> produceAmount(int i)
{
var firstRow = Enumerable.Range('A', 'Z' - 'A'+1).Select(x => ((char)x).ToString());
if (i >= 1)
{
var second = produceAmount(i - 1);
foreach (var c in firstRow)
{
foreach (var s in second)
{
yield return c + s;
}
}
}
else
{
yield return "";
}
}
The way to go is to use simple recursive approach. C# is a good language to present an idea with the use of generators:
private static IEnumerable<string> EnumerateLetters(int length) {
for (int i = 1; i <= length; i++) {
foreach (var letters in EnumerateLettersExact(i)) {
yield return letters;
}
}
}
private static IEnumerable<string> EnumerateLettersExact(int length) {
if (length == 0) {
yield return "";
}
else {
for (char c = 'A'; c <= 'Z'; ++c) {
foreach (var letters in EnumerateLettersExact(length - 1)) {
yield return c + letters;
}
}
}
}
private static void Main(string[] args) {
foreach (var letters in EnumerateLetters(2)) {
Console.Write($"{letters} ");
}
}
EnumerateLetters generates successive sequences of letters. The parameter decides up to which length would you like to request sequences.
EnumerateLettersExact takes care of generating sequences recursively. It can either be empty or is a concatenation of some letter with all sequences of shorter length.
Your're about to have an array from A to Z [A,...,Z].
Then your going to make multiple for loops
for example:
PSEUDOCODE
foreach(in array){
first = declare first variable (array)
foreach(in array{
second =declare 2nd variable (array)
return first + second
}
}
Try the following. This is a method to generate the appropriate string for a given number. You can write a for loop for however many number of iterations you want.
string SingleEntry(int number)
{
char[] array = " ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToArray();
Stack<string> entry = new Stack<string>();
List<string> list = new List<string>();
int bas = 26;
int remainder = number, index = 0;
do
{
if ((remainder % bas) == 0)
{
index = bas;
remainder--;
}
else
index = remainder % bas;
entry.Push(array[index].ToString());
remainder = remainder / bas;
}
while (remainder != 0);
string s = "";
while (entry.Count > 0)
{
s += entry.Pop();
}
return s;
}

C# How to generate a new string based on multiple ranged index

Let's say I have a string like this one, left part is a word, right part is a collection of indices (single or range) used to reference furigana (phonetics) for kanjis in my word:
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす"
The pattern in detail:
word,<startIndex>(-<endIndex>):<furigana>
What would be the best way to achieve something like this (with a space in front of the kanji to mark which part is linked to the [furigana]):
子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Edit: (thanks for your comments guys)
Here is what I wrote so far:
static void Main(string[] args)
{
string myString = "ABCDEF,1:test;3:test2";
//Split Kanjis / Indices
string[] tokens = myString.Split(',');
//Extract furigana indices
string[] indices = tokens[1].Split(';');
//Dictionnary to store furigana indices
Dictionary<string, string> furiganaIndices = new Dictionary<string, string>();
//Collect
foreach (string index in indices)
{
string[] splitIndex = index.Split(':');
furiganaIndices.Add(splitIndex[0], splitIndex[1]);
}
//Processing
string result = tokens[0] + ",";
for (int i = 0; i < tokens[0].Length; i++)
{
string currentIndex = i.ToString();
if (furiganaIndices.ContainsKey(currentIndex)) //add [furigana]
{
string currentFurigana = furiganaIndices[currentIndex].ToString();
result = result + " " + tokens[0].ElementAt(i) + string.Format("[{0}]", currentFurigana);
}
else //nothing to add
{
result = result + tokens[0].ElementAt(i);
}
}
File.AppendAllText(#"D:\test.txt", result + Environment.NewLine);
}
Result:
ABCDEF,A B[test]C D[test2]EF
I struggle to find a way to process ranged indices:
string myString = "ABCDEF,1:test;2-3:test2";
Result : ABCDEF,A B[test] CD[test2]EF
I don't have anything against manually manipulating strings per se. But given that you seem to have a regular pattern describing the inputs, it seems to me that a solution that uses regex would be more maintainable and readable. So with that in mind, here's an example program that takes that approach:
class Program
{
private const string _kinvalidFormatException = "Invalid format for edit specification";
private static readonly Regex
regex1 = new Regex(#"(?<word>[^,]+),(?<edit>(?:\d+)(?:-(?:\d+))?:(?:[^;]+);?)+", RegexOptions.Compiled),
regex2 = new Regex(#"(?<start>\d+)(?:-(?<end>\d+))?:(?<furigana>[^;]+);?", RegexOptions.Compiled);
static void Main(string[] args)
{
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
string result = EditString(myString);
}
private static string EditString(string myString)
{
Match editsMatch = regex1.Match(myString);
if (!editsMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int ichCur = 0;
string input = editsMatch.Groups["word"].Value;
StringBuilder text = new StringBuilder();
foreach (Capture capture in editsMatch.Groups["edit"].Captures)
{
Match oneEditMatch = regex2.Match(capture.Value);
if (!oneEditMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int start, end;
if (!int.TryParse(oneEditMatch.Groups["start"].Value, out start))
{
throw new ArgumentException(_kinvalidFormatException);
}
Group endGroup = oneEditMatch.Groups["end"];
if (endGroup.Success)
{
if (!int.TryParse(endGroup.Value, out end))
{
throw new ArgumentException(_kinvalidFormatException);
}
}
else
{
end = start;
}
text.Append(input.Substring(ichCur, start - ichCur));
if (text.Length > 0)
{
text.Append(' ');
}
ichCur = end + 1;
text.Append(input.Substring(start, ichCur - start));
text.Append(string.Format("[{0}]", oneEditMatch.Groups["furigana"]));
}
if (ichCur < input.Length)
{
text.Append(input.Substring(ichCur));
}
return text.ToString();
}
}
Notes:
This implementation assumes that the edit specifications will be listed in order and won't overlap. It makes no attempt to validate that part of the input; depending on where you are getting your input from you may want to add that. If it's valid for the specifications to be listed out of order, you can also extend the above to first store the edits in a list and sort the list by the start index before actually editing the string. (In similar fashion to the way the other proposed answer works; though, why they are using a dictionary instead of a simple list to store the individual edits, I have no idea…that seems arbitrarily complicated to me.)
I included basic input validation, throwing exceptions where failures occur in the pattern matching. A more user-friendly implementation would add more specific information to each exception, describing what part of the input actually was invalid.
The Regex class actually has a Replace() method, which allows for complete customization. The above could have been implemented that way, using Replace() and a MatchEvaluator to provide the replacement text, instead of just appending text to a StringBuilder. Which way to do it is mostly a matter of preference, though the MatchEvaluator might be preferred if you have a need for more flexible implementation options (i.e. if the exact format of the result can vary).
If you do choose to use the other proposed answer, I strongly recommend you use StringBuilder instead of simply concatenating onto the results variable. For short strings it won't matter much, but you should get into the habit of always using StringBuilder when you have a loop that is incrementally adding onto a string value, because for long string the performance implications of using concatenation can be very negative.
This should do it (and even handle ranged indices), based on the formatting of the input string you have-
using System;
using System.Collections.Generic;
public class stringParser
{
private struct IndexElements
{
public int start;
public int end;
public string value;
}
public static void Main()
{
//input string
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
int wordIndexSplit = myString.IndexOf(',');
string word = myString.Substring(0,wordIndexSplit);
string indices = myString.Substring(wordIndexSplit + 1);
string[] eachIndex = indices.Split(';');
Dictionary<int,IndexElements> index = new Dictionary<int,IndexElements>();
string[] elements;
IndexElements e;
int dash;
int n = 0;
int last = -1;
string results = "";
foreach (string s in eachIndex)
{
e = new IndexElements();
elements = s.Split(':');
if (elements[0].Contains("-"))
{
dash = elements[0].IndexOf('-');
e.start = int.Parse(elements[0].Substring(0,dash));
e.end = int.Parse(elements[0].Substring(dash + 1));
}
else
{
e.start = int.Parse(elements[0]);
e.end = e.start;
}
e.value = elements[1];
index.Add(n,e);
n++;
}
//this is the part that takes the "setup" from the parts above and forms the result string
//loop through each of the "indices" parsed above
for (int i = 0; i < index.Count; i++)
{
//if this is the first iteration through the loop, and the first "index" does not start
//at position 0, add the beginning characters before its start
if (last == -1 && index[i].start > 0)
{
results += word.Substring(0,index[i].start);
}
//if this is not the first iteration through the loop, and the previous iteration did
//not stop at the position directly before the start of the current iteration, add
//the intermediary chracters
else if (last != -1 && last + 1 != index[i].start)
{
results += word.Substring(last + 1,index[i].start - (last + 1));
}
//add the space before the "index" match, the actual match, and then the formatted "index"
results += " " + word.Substring(index[i].start,(index[i].end - index[i].start) + 1)
+ "[" + index[i].value + "]";
//remember the position of the ending for the next iteration
last = index[i].end;
}
//if the last "index" did not stop at the end of the input string, add the remaining characters
if (index[index.Keys.Count - 1].end + 1 < word.Length)
{
results += word.Substring(index[index.Keys.Count-1].end + 1);
}
//trimming spaces that may be left behind
results = results.Trim();
Console.WriteLine("INPUT - " + myString);
Console.WriteLine("OUTPUT - " + results);
Console.Read();
}
}
input - 子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす
output - 子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Note that this should also work with characters the English alphabet if you wanted to use English instead-
input - iliketocodeverymuch,2:A;4-6:B;9-12:CDEFG
output - il i[A]k eto[B]co deve[CDEFG]rymuch

How to get all permutations of groups in a string?

This is not homework, although it may seem like it. I've been browsing through the UK Computing Olympiad's website and found this problem (Question 1): here. I was baffled by it, and I'd want to see what you guys thought of how to do it. I can't think of any neat ways to get everything into groups (checking whether it's a palindrome after that is simple enough, i.e. originalString == new String(groupedString.Reverse.SelectMany(c => c).ToArray), assuming it is a char array).
Any ideas? Thanks!
Text for those at work:
A palindrome is a word that shows the same sequence of letters when
reversed. If a word can have its letters grouped together in two or
more blocks (each containing one or more adjacent letters) then it is
a block palindrome if reversing the order of those blocks results in
the same sequence of blocks.
For example, using brackets to indicate blocks, the following are
block palindromes:
• BONBON can be grouped together as (BON)(BON);
• ONION can be grouped together as (ON)(I)(ON);
• BBACBB can be grouped together as (B)(BACB)(B) or (BB)(AC)(BB) or
(B)(B)(AC)(B)(B)
Note that (BB)(AC)(B)(B) is not valid as the reverse (B)(B)(AC)(BB)
shows the blocks in a different order.
And the question is essentially how to generate all of those groups, to then check whether they are palindromes!
And the question is essentially how to generate all of those groups, to then check whether they are palindromes!
I note that this is not necessarily the best strategy. Generating all the groups first and then checking to see if they are palidromes is considerably more inefficient than generating only those groups which are palindromes.
But in the spirit of answering the question asked, let's solve the problem recursively. I will just generate all the groups; checking whether a set of groups is a palindrome is left as an exercise. I am also going to ignore the requirement that a set of groups contains at least two elements; that is easily checked.
The way to solve this problem elegantly is to reason recursively. As with all recursive solutions, we begin with a trivial base case:
How many groupings are there of the empty string? There is only the empty grouping; that is, the grouping with no elements in it.
Now we assume that we have a solution to a smaller problem, and ask "if we had a solution to a smaller problem, how could we use that solution to solve a larger problem?"
OK, suppose we have a larger problem. We have a string with 6 characters in it and we wish to produce all the groupings. Moreover, the groupings are symmetrical; the first group is the same size as the last group. By assumption we know how to solve the problem for any smaller string.
We solve the problem as follows. Suppose the string is ABCDEF. We peel off A and F from both ends, we solve the problem for BCDE, which remember we know how to do by assumption, and now we prepend A and append F to each of those solutions.
The solutions for BCDE are (B)(C)(D)(E), (B)(CD)(E), (BC)(DE), (BCDE). Again, we assume as our inductive hypothesis that we have the solution to the smaller problem. We then combine those with A and F to produce the solutions for ABCDEF: (A)(B)(C)(D)(E)(F), (A)(B)(CD)(E)(F), (A)(BC)(DE)(F) and (A)(BCDE)(F).
We've made good progress. Are we done? No. Next we peel off AB and EF, and recursively solve the problem for CD. I won't labour how that is done. Are we done? No. We peel off ABC and DEF and recursively solve the problem for the empty string in the middle. Are we done? No. (ABCDEF) is also a solution. Now we're done.
I hope that sketch motivates the solution, which is now straightforward. We begin with a helper function:
public static IEnumerable<T> AffixSequence<T>(T first, IEnumerable<T> body, T last)
{
yield return first;
foreach (T item in body)
yield return item;
yield return last;
}
That should be easy to understand. Now we do the real work:
public static IEnumerable<IEnumerable<string>> GenerateBlocks(string s)
{
// The base case is trivial: the blocks of the empty string
// is the empty set of blocks.
if (s.Length == 0)
{
yield return new string[0];
yield break;
}
// Generate all the sequences for the middle;
// combine them with all possible prefixes and suffixes.
for (int i = 1; s.Length >= 2 * i; ++i)
{
string prefix = s.Substring(0, i);
string suffix = s.Substring(s.Length - i, i);
string middle = s.Substring(i, s.Length - 2 * i);
foreach (var body in GenerateBlocks(middle))
yield return AffixSequence(prefix, body, suffix);
}
// Finally, the set of blocks that contains only this string
// is a solution.
yield return new[] { s };
}
Let's test it.
foreach (var blocks in GenerateBlocks("ABCDEF"))
Console.WriteLine($"({string.Join(")(", blocks)})");
The output is
(A)(B)(C)(D)(E)(F)
(A)(B)(CD)(E)(F)
(A)(BC)(DE)(F)
(A)(BCDE)(F)
(AB)(C)(D)(EF)
(AB)(CD)(EF)
(ABC)(DEF)
(ABCDEF)
So there you go.
You could now check to see whether each grouping is a palindrome, but why? The algorithm presented above can be easily modified to eliminate all non-palindromes by simply not recursing if the prefix and suffix are unequal:
if (prefix != suffix) continue;
The algorithm now enumerates only block palindromes. Let's test it:
foreach (var blocks in GenerateBlocks("BBACBB"))
Console.WriteLine($"({string.Join(")(", blocks)})");
The output is below; again, note that I am not filtering out the "entire string" block but doing so is straightforward.
(B)(B)(AC)(B)(B)
(B)(BACB)(B)
(BB)(AC)(BB)
(BBACBB)
If this subject interests you, consider reading my series of articles on using this same technique to generate every possible tree topology and every possible string in a language. It starts here:
http://blogs.msdn.com/b/ericlippert/archive/2010/04/19/every-binary-tree-there-is.aspx
This should work:
public List<string> BlockPalin(string s) {
var list = new List<string>();
for (int i = 1; i <= s.Length / 2; i++) {
int backInx = s.Length - i;
if (s.Substring(0, i) == s.Substring(backInx, i)) {
var result = string.Format("({0})", s.Substring(0, i));
result += "|" + result;
var rest = s.Substring(i, backInx - i);
if (rest == string.Empty) {
list.Add(result.Replace("|", rest));
return list;
}
else if (rest.Length == 1) {
list.Add(result.Replace("|", string.Format("({0})", rest)));
return list;
}
else {
list.Add(result.Replace("|", string.Format("({0})", rest)));
var recursiveList = BlockPalin(rest);
if (recursiveList.Count > 0) {
foreach (var recursiveResult in recursiveList) {
list.Add(result.Replace("|", recursiveResult));
}
}
else {
//EDIT: Thx to #juharr this list.Add is not needed...
// list.Add(result.Replace("|",string.Format("({0})",rest)));
return list;
}
}
}
}
return list;
}
And call it like this (EDIT: Again thx to #juharr, the distinct is not needed):
var x = BlockPalin("BONBON");//.Distinct().ToList();
var y = BlockPalin("ONION");//.Distinct().ToList();
var z = BlockPalin("BBACBB");//.Distinct().ToList();
The result:
x contains 1 element: (BON)(BON)
y contains 1 element: (ON)(I)(ON)
z contains 3 elements: (B)(BACB)(B),(B)(B)(AC)(B)(B) and (BB)(AC)(BB)
Although not so elegant as the one provided by #Eric Lippert, one might find interesting the following iterative string allocation free solution:
struct Range
{
public int Start, End;
public int Length { get { return End - Start; } }
public Range(int start, int length) { Start = start; End = start + length; }
}
static IEnumerable<Range[]> GetPalindromeBlocks(string input)
{
int maxLength = input.Length / 2;
var ranges = new Range[maxLength];
int count = 0;
for (var range = new Range(0, 1); ; range.End++)
{
if (range.End <= maxLength)
{
if (!IsPalindromeBlock(input, range)) continue;
ranges[count++] = range;
range.Start = range.End;
}
else
{
if (count == 0) break;
yield return GenerateResult(input, ranges, count);
range = ranges[--count];
}
}
}
static bool IsPalindromeBlock(string input, Range range)
{
return string.Compare(input, range.Start, input, input.Length - range.End, range.Length) == 0;
}
static Range[] GenerateResult(string input, Range[] ranges, int count)
{
var last = ranges[count - 1];
int midLength = input.Length - 2 * last.End;
var result = new Range[2 * count + (midLength > 0 ? 1 : 0)];
for (int i = 0; i < count; i++)
{
var range = result[i] = ranges[i];
result[result.Length - 1 - i] = new Range(input.Length - range.End, range.Length);
}
if (midLength > 0)
result[count] = new Range(last.End, midLength);
return result;
}
Test:
foreach (var input in new [] { "BONBON", "ONION", "BBACBB" })
{
Console.WriteLine(input);
var blocks = GetPalindromeBlocks(input);
foreach (var blockList in blocks)
Console.WriteLine(string.Concat(blockList.Select(range => "(" + input.Substring(range.Start, range.Length) + ")")));
}
Removing the line if (!IsPalindromeBlock(input, range)) continue; will produce the answer to the OP question.
It's not clear if you want all possible groupings, or just a possible grouping. This is one way, off the top-of-my-head, that you might get a grouping:
public static IEnumerable<string> GetBlocks(string testString)
{
if (testString.Length == 0)
{
yield break;
}
int mid = testString.Length / 2;
int i = 0;
while (i < mid)
{
if (testString.Take(i + 1).SequenceEqual(testString.Skip(testString.Length - (i + 1))))
{
yield return new String(testString.Take(i+1).ToArray());
break;
}
i++;
}
if (i == mid)
{
yield return testString;
}
else
{
foreach (var block in GetBlocks(new String(testString.Skip(i + 1).Take(testString.Length - (i + 1) * 2).ToArray())))
{
yield return block;
}
}
}
If you give it bonbon, it'll return bon. If you give it onion it'll give you back on, i. If you give it bbacbb, it'll give you b,b,ac.
Here's my solution (didn't have VS so I did it using java):
int matches = 0;
public void findMatch(String pal) {
String st1 = "", st2 = "";
int l = pal.length() - 1;
for (int i = 0; i < (pal.length())/2 ; i ++ ) {
st1 = st1 + pal.charAt(i);
st2 = pal.charAt(l) + st2;
if (st1.equals(st2)) {
matches++;
// DO THE SAME THING FOR THE MATCH
findMatch(st1);
}
l--;
}
}
The logic is pretty simple. I made two array of characters and compare them to find a match in each step. The key is you need to check the same thing for each match too.
findMatch("bonbon"); // 1
findMatch("bbacbb"); // 3
What about something like this for BONBON...
string bonBon = "BONBON";
First check character count for even or odd.
bool isEven = bonBon.Length % 2 == 0;
Now, if it is even, split the string in half.
if (isEven)
{
int halfInd = bonBon.Length / 2;
string firstHalf = bonBon.Substring(0, halfInd );
string secondHalf = bonBon.Substring(halfInd);
}
Now, if it is odd, split the string into 3 string.
else
{
int halfInd = (bonBon.Length - 1) / 2;
string firstHalf = bonBon.Substring(0, halfInd);
string middle = bonBon.Substring(halfInd, bonBon.Length - halfInd);
string secondHalf = bonBon.Substring(firstHalf.Length + middle.length);
}
May not be exactly correct, but it's a start....
Still have to add checking if it is actually a palindrome...
Good luck!!

Parsing strings recursively

I am trying to extract information out of a string - a fortran formatting string to be specific. The string is formatted like:
F8.3, I5, 3(5X, 2(A20,F10.3)), 'XXX'
with formatting fields delimited by "," and formatting groups inside brackets, with the number in front of the brackets indicating how many consecutive times the formatting pattern is repeated. So, the string above expands to:
F8.3, I5, 5X, A20,F10.3, A20,F10.3, 5X, A20,F10.3, A20,F10.3, 5X, A20,F10.3, A20,F10.3, 'XXX'
I am trying to make something in C# that will expand a string that conforms to that pattern. I have started going about it with lots of switch and if statements, but am wondering if I am not going about it the wrong way?
I was basically wondering if some Regex wizzard thinks that Regular expressions can do this in one neat-fell swoop? I know nothing about regular expressions, but if this could solve my problem I am considering putting in some time to learn how to use them... on the other hand if regular expressions can't sort this out then I'd rather spend my time looking at another method.
This has to be doable with Regex :)
I've expanded my previous example and it test nicely with your example.
// regex to match the inner most patterns of n(X) and capture the values of n and X.
private static readonly Regex matcher = new Regex(#"(\d+)\(([^(]*?)\)", RegexOptions.None);
// create new string by repeating X n times, separated with ','
private static string Join(Match m)
{
var n = Convert.ToInt32(m.Groups[1].Value); // get value of n
var x = m.Groups[2].Value; // get value of X
return String.Join(",", Enumerable.Repeat(x, n));
}
// expand the string by recursively replacing the innermost values of n(X).
private static string Expand(string text)
{
var s = matcher.Replace(text, Join);
return (matcher.IsMatch(s)) ? Expand(s) : s;
}
// parse a string for occurenses of n(X) pattern and expand then.
// return the string as a tokenized array.
public static string[] Parse(string text)
{
// Check that the number of parantheses is even.
if (text.Sum(c => (c == '(' || c == ')') ? 1 : 0) % 2 == 1)
throw new ArgumentException("The string contains an odd number of parantheses.");
return Expand(text).Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
}
I would suggest using a recusive method like the example below( not tested ):
ResultData Parse(String value, ref Int32 index)
{
ResultData result = new ResultData();
Index startIndex = index; // Used to get substrings
while (index < value.Length)
{
Char current = value[index];
if (current == '(')
{
index++;
result.Add(Parse(value, ref index));
startIndex = index;
continue;
}
if (current == ')')
{
// Push last result
index++;
return result;
}
// Process all other chars here
}
// We can't find the closing bracket
throw new Exception("String is not valid");
}
You maybe need to modify some parts of the code, but this method have i used when writing a simple compiler. Although it's not completed, just a example.
Personally, I would suggest using a recursive function instead. Every time you hit an opening parenthesis, call the function again to parse that part. I'm not sure if you can use a regex to match a recursive data structure.
(Edit: Removed incorrect regex)
Ended up rewriting this today. It turns out that this can be done in one single method:
private static string ExpandBrackets(string Format)
{
int maxLevel = CountNesting(Format);
for (int currentLevel = maxLevel; currentLevel > 0; currentLevel--)
{
int level = 0;
int start = 0;
int end = 0;
for (int i = 0; i < Format.Length; i++)
{
char thisChar = Format[i];
switch (Format[i])
{
case '(':
level++;
if (level == currentLevel)
{
string group = string.Empty;
int repeat = 0;
/// Isolate the number of repeats if any
/// If there are 0 repeats the set to 1 so group will be replaced by itself with the brackets removed
for (int j = i - 1; j >= 0; j--)
{
char c = Format[j];
if (c == ',')
{
start = j + 1;
break;
}
if (char.IsDigit(c))
repeat = int.Parse(c + (repeat != 0 ? repeat.ToString() : string.Empty));
else
throw new Exception("Non-numeric character " + c + " found in front of the brackets");
}
if (repeat == 0)
repeat = 1;
/// Isolate the format group
/// Parse until the first closing bracket. Level is decremented as this effectively takes us down one level
for (int j = i + 1; j < Format.Length; j++)
{
char c = Format[j];
if (c == ')')
{
level--;
end = j;
break;
}
group += c;
}
/// Substitute the expanded group for the original group in the format string
/// If the group is empty then just remove it from the string
if (string.IsNullOrEmpty(group))
{
Format = Format.Remove(start - 1, end - start + 2);
i = start;
}
else
{
string repeatedGroup = RepeatString(group, repeat);
Format = Format.Remove(start, end - start + 1).Insert(start, repeatedGroup);
i = start + repeatedGroup.Length - 1;
}
}
break;
case ')':
level--;
break;
}
}
}
return Format;
}
CountNesting() returns the highest level of bracket nesting in the format statement, but could be passed in as a parameter to the method. RepeatString() just repeats a string the specified number of times and substitutes it for the bracketed group in the format string.

Categories