This question already has answers here:
Parsing a chemical formula
(5 answers)
Closed 1 year ago.
I am trying to parse a chemical formula (in the format, for example: Al2O3 or O3 or C or C11H22O12) in C# from a string. It works fine unless there is only one atom of a particular element (e.g. the oxygen atom in H2O). How can I fix that problem, and in addition, is there a better way to parse a chemical formula string than I am doing?
ChemicalElement is a class representing a chemical element. It has properties AtomicNumber (int), Name (string), Symbol (string).
ChemicalFormulaComponent is a class representing a chemical element and atom count (e.g. part of a formula). It has properties Element (ChemicalElement), AtomCount (int).
The rest should be clear enough to understand (I hope) but please let me know with a comment if I can clarify anything, before you answer.
Here is my current code:
/// <summary>
/// Parses a chemical formula from a string.
/// </summary>
/// <param name="chemicalFormula">The string to parse.</param>
/// <exception cref="FormatException">The chemical formula was in an invalid format.</exception>
public static Collection<ChemicalFormulaComponent> FormulaFromString(string chemicalFormula)
{
Collection<ChemicalFormulaComponent> formula = new Collection<ChemicalFormulaComponent>();
string nameBuffer = string.Empty;
int countBuffer = 0;
for (int i = 0; i < chemicalFormula.Length; i++)
{
char c = chemicalFormula[i];
if (!char.IsLetterOrDigit(c) || !char.IsUpper(chemicalFormula, 0))
{
throw new FormatException("Input string was in an incorrect format.");
}
else if (char.IsUpper(c))
{
// Add the chemical element and its atom count
if (countBuffer > 0)
{
formula.Add(new ChemicalFormulaComponent(ChemicalElement.ElementFromSymbol(nameBuffer), countBuffer));
// Reset
nameBuffer = string.Empty;
countBuffer = 0;
}
nameBuffer += c;
}
else if (char.IsLower(c))
{
nameBuffer += c;
}
else if (char.IsDigit(c))
{
if (countBuffer == 0)
{
countBuffer = c - '0';
}
else
{
countBuffer = (countBuffer * 10) + (c - '0');
}
}
}
return formula;
}
I rewrote your parser using regular expressions. Regular expressions fit the bill perfectly for what you're doing. Hope this helps.
public static void Main(string[] args)
{
var testCases = new List<string>
{
"C11H22O12",
"Al2O3",
"O3",
"C",
"H2O"
};
foreach (string testCase in testCases)
{
Console.WriteLine("Testing {0}", testCase);
var formula = FormulaFromString(testCase);
foreach (var element in formula)
{
Console.WriteLine("{0} : {1}", element.Element, element.Count);
}
Console.WriteLine();
}
/* Produced the following output
Testing C11H22O12
C : 11
H : 22
O : 12
Testing Al2O3
Al : 2
O : 3
Testing O3
O : 3
Testing C
C : 1
Testing H2O
H : 2
O : 1
*/
}
private static Collection<ChemicalFormulaComponent> FormulaFromString(string chemicalFormula)
{
Collection<ChemicalFormulaComponent> formula = new Collection<ChemicalFormulaComponent>();
string elementRegex = "([A-Z][a-z]*)([0-9]*)";
string validateRegex = "^(" + elementRegex + ")+$";
if (!Regex.IsMatch(chemicalFormula, validateRegex))
throw new FormatException("Input string was in an incorrect format.");
foreach (Match match in Regex.Matches(chemicalFormula, elementRegex))
{
string name = match.Groups[1].Value;
int count =
match.Groups[2].Value != "" ?
int.Parse(match.Groups[2].Value) :
1;
formula.Add(new ChemicalFormulaComponent(ChemicalElement.ElementFromSymbol(name), count));
}
return formula;
}
The problem with your method is here:
// Add the chemical element and its atom count
if (countBuffer > 0)
When you don't have a number, count buffer will be 0, I think this will work
// Add the chemical element and its atom count
if (countBuffer > 0 || nameBuffer != String.Empty)
This will work when for formulas like HO2 or something like that.
I believe that your method will never insert into the formula collection the las element of the chemical formula.
You should add the last element of the bufer to the collection before return the result, like this:
formula.Add(new ChemicalFormulaComponent(ChemicalElement.ElementFromSymbol(nameBuffer), countBuffer));
return formula;
}
first of all: I haven't used a parser generator in .net, but I'm pretty sure you could find something appropriate. This would allow you to write the grammar of Chemical Formulas in a far more readable form. See for example this question for a first start.
If you want to keep your approach: Is it possible that you do not add your last element no matter if it has a number or not? You might want to run your loop with i<= chemicalFormula.Length and in case of i==chemicalFormula.Length also add what you have to your Formula. You then also have to remove your if (countBuffer > 0) condition because countBuffer can actually be zero!
Regex should work fine with simple formula, if you want to split something like:
(Zn2(Ca(BrO4))K(Pb)2Rb)3
it might be easier to use the parser for it (because of compound nesting). Any parser should be capable of handling it.
I spotted this problem few days ago I thought it would be good example how one can write grammar for a parser, so I included simple chemical formula grammar into my NLT suite. The key rules are -- for lexer:
"(" -> LPAREN;
")" -> RPAREN;
/[0-9]+/ -> NUM, Convert.ToInt32($text);
/[A-Z][a-z]*/ -> ATOM;
and for parser:
comp -> e:elem { e };
elem -> LPAREN e:elem RPAREN n:NUM? { new Element(e,$(n : 1)) }
| e:elem++ { new Element(e,1) }
| a:ATOM n:NUM? { new Element(a,$(n : 1)) }
;
Related
List<int> arr = new List<int>();
long max = 0;
long mul = 1;
string abc = #"73167176531330624919225119674426574742355349194934
85861560789112949495459501737958331952853208805511
96983520312774506326239578318016984801869478851843
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450";
foreach (char a in abc)
{
if(arr.Count == 13)
{
arr.RemoveAt(0);
}
int value = (int)Char.GetNumericValue(a);
arr.Add(value);
if(arr.Count == 13)
{
foreach(int b in arr)
{
mul = mul * b;
if (mul > max)
{
max = mul;
}
}
mul = 1;
}
}
Console.WriteLine(max);
I am getting 5377010688 which is a wrong answer and when I am trying same logic with given example in project Euler it is working fine, please help me.
Don't say the answer just correct me where I am doing wrong or where the code is not running as it should.
The string constant, as it is written down like above, contains blanks and \r\n's, e.g. between the last '4' of the first line and the first '8' on the second line. Char.GetNumericValue() returns -1 for a blank.
Propably the character sequence with the highest product spans across adjacent lines in your string, therefore there are blanks in between, which count as -1, which disables your code in finding them.
Write your constant like this:
string abc = #"73167176531330624919225119674426574742355349194934" +
"85861560789112949495459501737958331952853208805511" +
"96983520312774506326239578318016984801869478851843" + etc.
The result is then 23514624000, I hope that's correct.
Don't say the answer just correct me where I am doing wrong or where
the code is not running as it should
You have included all characters into calculation but you should not do that. The input string also contains for example carriage return '\n' at the end of each line.
Your actual string look like this:
string abc = #"73167176531330624919225119674426574742355349194934\r\n
85861560789112949495459501737958331952853208805511\r\n
...
How to solve this? You should ignore these characters, one possible solution is to check each char if it is a digit:
if(!char.IsDigit(a))
{
continue;
}
Example
If I had a text file with these lines:
The cat meowed.
The dog barked.
The cat ran up a tree.
I would want to end up with a matrix of rows and columns like this:
0 1 2 3 4 5 6 7 8 9
0| t-h-e- -c-a-t- -m-e-o-w-e-d-.- - - - - - - -
1| t-h-e- -d-o-g- -b-a-r-k-e-d-.- - - - - - - -
2| t-h-e- -c-a-t- -r-a-n- -u-p- -a- -t-r-e-e-.-
Then I would like to query this matrix to quickly determine information about the text file itself. For example, I would quickly be able to tell if everything in column "0" is a "t" (it is).
I realize that this might seem like a strange thing to do. I am trying to ultimately (among other things) determine if various text files are fixed-width delimited without any prior knowledge about the file. I also want to use this matrix to detect patterns.
The actual files that will go through this are quite large.
Thanks!
For example, I would quickly be able to tell if everything in column "0" is a "t" (it is).
int column = 0;
char charToCheck = 't';
bool b = File.ReadLines(filename)
.All(s => (s.Length > column ? s[column] : '\0') == charToCheck);
What you can do is read the first line of your text file and use it as a mask. Compare every next line to the mask and remove every character from the mask that is not the same as the character at the same position. After processing al lines you'll have a list of delimiters.
Btw, code is not very clean but it is a good starter I think.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace DynamicallyDetectFixedWithDelimiter
{
class Program
{
static void Main(string[] args)
{
var sr = new StreamReader(#"C:\Temp\test.txt");
// Get initial list of delimiters
char[] firstLine = sr.ReadLine().ToCharArray();
Dictionary<int, char> delimiters = new Dictionary<int, char>();
for (int i = 0; i < firstLine.Count(); i++)
{
delimiters.Add(i, firstLine[i]);
}
// Read subsequent lines, remove delimeters from
// the dictionary that are not present in subsequent lines
string line;
while ((line = sr.ReadLine()) != null && delimiters.Count() != 0)
{
var subsequentLine = line.ToCharArray();
var invalidDelimiters = new List<int>();
// Compare all chars in first and subsequent line
foreach (var delimiter in delimiters)
{
if (delimiter.Key >= subsequentLine.Count())
{
invalidDelimiters.Add(delimiter.Key);
continue;
}
// Remove delimiter when it differs from the
// character at the same position in a subsequent line
if (subsequentLine[delimiter.Key] != delimiter.Value)
{
invalidDelimiters.Add(delimiter.Key);
}
}
foreach (var invalidDelimiter in invalidDelimiters)
{
delimiters.Remove(invalidDelimiter);
}
}
foreach (var delimiter in delimiters)
{
Console.WriteLine(String.Format("Delimiter at {0} = {1}", delimiter.Key, delimiter.Value));
}
sr.Close();
}
}
}
"I am trying to ultimately (among other things) determine if various text files are fixed-width (...)"
If that's so, you could try this:
public bool isFixedWidth (string fileName)
{
string[] lines = File.ReadAllLines(fileName);
int length = lines[0].Length;
foreach (string s in lines)
{
if (s.length != Length)
{
return false;
}
}
return true;
}
Once you get that lines variable, you can access any character as though they were in a matrix. Like char c = lines[3][1];. However, there is no hard guarantee that all lines are the same length. You could pad them to be the same length as the longest one, if you so wanted.
Also,
"how would I query to get a list of all columns that contain a space character for ALL rows (for example)"
You could try this:
public bool CheckIfAllCharactersInAColumnAreTheSame (string[] lines, int colIndex)
{
char c = lines[0][colIndex];
try
{
foreach (string s in lines)
{
if (s[colIndex] != c)
{
return false;
}
}
return true;
}
catch (IndexOutOfRangeException ex)
{
return false;
}
}
Since it's not clear where you're have difficulty exactly, here are a few pointers.
Reading the file as strings, one per line:
string[] lines = File.ReadAllLines("filename.txt");
Obtaning a jagged array (a matrix) of characters from the lines (this step seems unnecessary since strings can be indexed just like character arrays):
char[][] charMatrix = lines.Select(l => l.ToCharArray()).ToArray();
Example query: whether every character in column 0 is a 't':
bool allTs = charMatrix.All(row => row[0] == 't');
I have a spinning text : {T1{M1|{A1|B1}|M2}F1|{X1|X2}}
My question is : How can i find all permutations in C# ?
T1M1F1
T1M2F1
T1A1F1
T1B1F1
X1
X2
Any suggestions ?
Edit :
Thank you for your help but M1,A1, .. are examples
With words that could give :
{my name is james vick and i am a {member|user|visitor} on this {forum|website|site} and i am loving it | i am admin and i am a {supervisor|admin|moderator} on this {forum|website|site} and i am loving it}.
my name is james vick and i am a {member|user|visitor} on this {forum|website|site} and i am loving it => 3 * 3 => 9 permutations
i am admin and i am a {supervisor|admin|moderator} on this {forum|website|site} and i am loving it => 3 * 3 => 9 permutations
Result : 18 permutations
Method to generate all permuatuons of spinnable strings
I've implemented a simple method to solve this problem.
It takes an ArrayList argument containing spinnable text string(s).
I use it to generate all the permutations of multiple spinnable strings.
It comes with extra functionality of support of optional blocks, surronded by "[ ]" brackets.
Eq.:
If you have a single string object in the ArrayList with content of:
{A | {B1 | B2 } [B optional] }
It populates the array list with all the permutations, "extracted"
Contents after invocation of method:
A
B1
B1 B optional
B2
B2 B optional
You can also pass multiple strings as argument to generate permutations for all of them:
Eg.:
Input:
ArraList with two string
{A1 | A2}
{B1 | B2}
Contents after invocation:
A1
A2
B1
B2
This implementation works by always finding the inner most bracket pair in the first spinnable section, then extract it. I do this until all the special {}, [] characters are removed.
private void ExtractVersions(ArrayList list)
{
ArrayList IndicesToRemove = new ArrayList();
for (int i = 0; i < list.Count; i++)
{
string s = list[i].ToString();
int firstIndexOfCurlyClosing = s.IndexOf('}');
int firstIndexOfBracketClosing = s.IndexOf(']');
if ((firstIndexOfCurlyClosing > -1) || (firstIndexOfBracketClosing > -1))
{
char type = ' ';
int endi = -1;
int starti = -1;
if ((firstIndexOfBracketClosing == -1) && (firstIndexOfCurlyClosing > -1))
{ // Only Curly
endi = firstIndexOfCurlyClosing;
type = '{';
}
else
{
if ((firstIndexOfBracketClosing > -1) && (firstIndexOfCurlyClosing == -1))
{ // Only bracket
endi = firstIndexOfBracketClosing;
type = '[';
}
else
{
// Both
endi = Math.Min(firstIndexOfBracketClosing, firstIndexOfCurlyClosing);
type = s[endi];
if (type == ']')
{
type = '[';
}
else
{
type = '{';
}
}
}
starti = s.Substring(0, endi).LastIndexOf(type);
if (starti == -1)
{
throw new Exception("Brackets are not valid.");
}
// start index, end index and type found. -> make changes
if (type == '[')
{
// Add two new lines, one with the optional part, one without it
list.Add(s.Remove(starti, endi - starti+1));
list.Add(s.Remove(starti, 1).Remove(endi-1, 1));
IndicesToRemove.Add(i);
}
else
if (type == '{')
{
// Add as many new lines as many alternatives there are. This must be an in most bracket.
string alternatives = s.Substring(starti + 1, endi - starti - 1);
foreach(string alt in alternatives.Split('|'))
{
list.Add(s.Remove(starti,endi-starti+1).Insert(starti,alt));
}
IndicesToRemove.Add(i);
}
} // End of if( >-1 && >-1)
} // End of for loop
for (int i = IndicesToRemove.Count-1; i >= 0; i--)
{
list.RemoveAt((int)IndicesToRemove[i]);
}
}
I hope I've helped.
Maybe it is not the simplest and best implementation, but it works well for me. Please feedback, and vote!
In my opinion, you should proceed like this:
All nested choice lists i.e. between { } should be "flattened" to a single choice list. Like in your example:
{M1|{A1|B1}|M2} -> {M1|A1|B1|M2}
Use recursion to generate all possible combinations. For example, starting from an empty array, first place T1 since it is the only option. Then from the nested list {M1|A1|B1|M2} choose each element in turn an place it on the next position and then finally F1. Repeat until all possibilities are exhausted.
This is just a rough hint, you need to fill in the rest of the details.
i'd like to know how to read and parse specific integer value form a text file and add it to listbox in c#. For example I have a text file MyText.txt like this:
<>
101
192
-
399
~
99
128
-
366
~
101
192
-
403
~
And I want to parse the integer value between '-' and '~' and add each one of it to items in list box for example:
#listBox1
399
366
403
Notice that each line of value separated by Carriage Return and Line Feed. And by the way, it is a data transmitted through RS-232 Serial Communication from microcontroller. Sorry, I'm just new in c# programming. Thanks in advance.
Here's a way to do it with LINQ:
bool keep = false;
listBox1.Items.AddRange(
File.ReadLines("MyText.txt")
.Where(l =>
{
if (l == "-") keep = true;
else if (l == "~") keep = false;
else return keep;
return false;
})
.ToArray());
you could use regular expressions like so:
var s = System.Text.RegularExpressions.Regex.Matches(stringtomatch,#"(?<=-\s*)[0-9]+\b(?=\s*~)");
The regex basically looks for a number. It then checks the characters behind, looks for an optional whitespace and a dash (-). then it matches all the numbers until it encounters another non-word character. it checks for an optional whitespace and then a required ~ (dunno what that's called). Also, it only returns the number (not the whitespace and symbols).
So basically this method returns a list of matches. you could then use it like so:
for (int i = 0; i < s.Count; i++)
{
listBox1.Items.Add(s[i]);
}
EDIT:
typo in the regex and updated the loop (for some reason, foreach doesn't work with the MatchCollection).
you can try running this test script:
var stringtomatch = " asdjasdk jh kjh asd\n-\n123123\n~\nasdasd";
var s = System.Text.RegularExpressions.Regex.Matches(stringtomatch,#"(?<=-\s*)[0-9]+\b(?=\s*~)");
Console.WriteLine(stringtomatch);
for (int i = 0; i < s.Count; i++)
{
listBox1.Items.Add(s[i]);
}
Try
List<Int32> values = new List<Int32>();
bool open = false;
String[] lines = File.ReadAllLines(fileName);
foreach(String line in lines)
{
if( (!open) && (line == "-") )
{
open = true;
}
else if( (open) && (line == "~") )
{
open = false;
}
else if(open)
{
Int32 v;
if(Int32.TryParse(line, out v))
{
values.Add(v);
}
}
}
Listbox.Items.AddRange(values);
This is a easy piece of code with reading a file, converting to integer (although you could stay with strings) and handling lists. You should start with some basic .NET/C# tutorials.
Edit: To add the values to the listbox you can switch to values.ForEach(v => listbox.Items.Add(v.ToString()) if you use .NET 3.5. Otherwise make a foreach yourself.
I'm doing some work with strings, and I have a scenario where I need to determine if a string (usually a small one < 10 characters) contains repeated characters.
`ABCDE` // does not contain repeats
`AABCD` // does contain repeats, ie A is repeated
I can loop through the string.ToCharArray() and test each character against every other character in the char[], but I feel like I am missing something obvious.... maybe I just need coffee. Can anyone help?
EDIT:
The string will be sorted, so order is not important so ABCDA => AABCD
The frequency of repeats is also important, so I need to know if the repeat is pair or triplet etc.
If the string is sorted, you could just remember each character in turn and check to make sure the next character is never identical to the last character.
Other than that, for strings under ten characters, just testing each character against all the rest is probably as fast or faster than most other things. A bit vector, as suggested by another commenter, may be faster (helps if you have a small set of legal characters.)
Bonus: here's a slick LINQ solution to implement Jon's functionality:
int longestRun =
s.Select((c, i) => s.Substring(i).TakeWhile(x => x == c).Count()).Max();
So, OK, it's not very fast! You got a problem with that?!
:-)
If the string is short, then just looping and testing may well be the simplest and most efficient way. I mean you could create a hash set (in whatever platform you're using) and iterate through the characters, failing if the character is already in the set and adding it to the set otherwise - but that's only likely to provide any benefit when the strings are longer.
EDIT: Now that we know it's sorted, mquander's answer is the best one IMO. Here's an implementation:
public static bool IsSortedNoRepeats(string text)
{
if (text.Length == 0)
{
return true;
}
char current = text[0];
for (int i=1; i < text.Length; i++)
{
char next = text[i];
if (next <= current)
{
return false;
}
current = next;
}
return true;
}
A shorter alternative if you don't mind repeating the indexer use:
public static bool IsSortedNoRepeats(string text)
{
for (int i=1; i < text.Length; i++)
{
if (text[i] <= text[i-1])
{
return false;
}
}
return true;
}
EDIT: Okay, with the "frequency" side, I'll turn the problem round a bit. I'm still going to assume that the string is sorted, so what we want to know is the length of the longest run. When there are no repeats, the longest run length will be 0 (for an empty string) or 1 (for a non-empty string). Otherwise, it'll be 2 or more.
First a string-specific version:
public static int LongestRun(string text)
{
if (text.Length == 0)
{
return 0;
}
char current = text[0];
int currentRun = 1;
int bestRun = 0;
for (int i=1; i < text.Length; i++)
{
if (current != text[i])
{
bestRun = Math.Max(currentRun, bestRun);
currentRun = 0;
current = text[i];
}
currentRun++;
}
// It's possible that the final run is the best one
return Math.Max(currentRun, bestRun);
}
Now we can also do this as a general extension method on IEnumerable<T>:
public static int LongestRun(this IEnumerable<T> source)
{
bool first = true;
T current = default(T);
int currentRun = 0;
int bestRun = 0;
foreach (T element in source)
{
if (first || !EqualityComparer<T>.Default(element, current))
{
first = false;
bestRun = Math.Max(currentRun, bestRun);
currentRun = 0;
current = element;
}
}
// It's possible that the final run is the best one
return Math.Max(currentRun, bestRun);
}
Then you can call "AABCD".LongestRun() for example.
This will tell you very quickly if a string contains duplicates:
bool containsDups = "ABCDEA".Length != s.Distinct().Count();
It just checks the number of distinct characters against the original length. If they're different, you've got duplicates...
Edit: I guess this doesn't take care of the frequency of dups you noted in your edit though... but some other suggestions here already take care of that, so I won't post the code as I note a number of them already give you a reasonably elegant solution. I particularly like Joe's implementation using LINQ extensions.
Since you're using 3.5, you could do this in one LINQ query:
var results = stringInput
.ToCharArray() // not actually needed, I've left it here to show what's actually happening
.GroupBy(c=>c)
.Where(g=>g.Count()>1)
.Select(g=>new {Letter=g.First(),Count=g.Count()})
;
For each character that appears more than once in the input, this will give you the character and the count of occurances.
I think the easiest way to achieve that is to use this simple regex
bool foundMatch = false;
foundMatch = Regex.IsMatch(yourString, #"(\w)\1");
If you need more information about the match (start, length etc)
Match match = null;
string testString = "ABCDE AABCD";
match = Regex.Match(testString, #"(\w)\1+?");
if (match.Success)
{
string matchText = match.Value; // AA
int matchIndnex = match.Index; // 6
int matchLength = match.Length; // 2
}
How about something like:
string strString = "AA BRA KA DABRA";
var grp = from c in strString.ToCharArray()
group c by c into m
select new { Key = m.Key, Count = m.Count() };
foreach (var item in grp)
{
Console.WriteLine(
string.Format("Character:{0} Appears {1} times",
item.Key.ToString(), item.Count));
}
Update Now, you'd need an array of counters to maintain a count.
Keep a bit array, with one bit representing a unique character. Turn the bit on when you encounter a character, and run over the string once. A mapping of the bit array index and the character set is upto you to decide. Break if you see that a particular bit is on already.
/(.).*\1/
(or whatever the equivalent is in your regex library's syntax)
Not the most efficient, since it will probably backtrack to every character in the string and then scan forward again. And I don't usually advocate regular expressions. But if you want brevity...
I started looking for some info on the net and I got to the following solution.
string input = "aaaaabbcbbbcccddefgg";
char[] chars = input.ToCharArray();
Dictionary<char, int> dictionary = new Dictionary<char,int>();
foreach (char c in chars)
{
if (!dictionary.ContainsKey(c))
{
dictionary[c] = 1; //
}
else
{
dictionary[c]++;
}
}
foreach (KeyValuePair<char, int> combo in dictionary)
{
if (combo.Value > 1) //If the vale of the key is greater than 1 it means the letter is repeated
{
Console.WriteLine("Letter " + combo.Key + " " + "is repeated " + combo.Value.ToString() + " times");
}
}
I hope it helps, I had a job interview in which the interviewer asked me to solve this and I understand it is a common question.
When there is no order to work on you could use a dictionary to keep the counts:
String input = "AABCD";
var result = new Dictionary<Char, int>(26);
var chars = input.ToCharArray();
foreach (var c in chars)
{
if (!result.ContainsKey(c))
{
result[c] = 0; // initialize the counter in the result
}
result[c]++;
}
foreach (var charCombo in result)
{
Console.WriteLine("{0}: {1}",charCombo.Key, charCombo.Value);
}
The hash solution Jon was describing is probably the best. You could use a HybridDictionary since that works well with small and large data sets. Where the letter is the key and the value is the frequency. (Update the frequency every time the add fails or the HybridDictionary returns true for .Contains(key))