String Array Searching - c#

I had an interview for a Jr. developer position a few days ago, and they asked:
"If you had an array of letters "a" and "b" how would you write a method to count how many instances of those letters are in the array?"
I said that you would have a for loop with an if else statement that would increment 1 of 2 counter variables. After that, though, they asked how I would solve that same problem, if the array could contain any letter of the alphabet. I said that I would go about it the same way, with a long IF statement, or a switch statement. In hindsight, that doesn't seem so efficient; is there an easier way to go about doing this?

You could declare the array of size 256 (number of possible character codes) zero it and simply increase the one which corresponds to a char code you read.
For example if you are reading the 'a' the corresponding code is ASCII 97 so you increase the array[97] you can optimize the amount of memory decreasing the code by 97 (if you know the input is going to be characters only) you also need to be aware what to do with capital characters ( are you conciser them as different or not) also in this case you need to take care to decrease the character by 65.
So at the end code would look like this:
int counts[122 - 97] = {0}; // codes of a - z
char a = get_next_char();
if ( is_capital(a)){
counts[a - 65]++;
}
else
{
counts[a - 97] ++;
}
this code assumes the 'A' = 'a'
if its not the case you need to have different translation in the if's but you can probably figure out the idea now. This saves a lot of comparing as opposed to your approach.

Depending on whether the objective is CPU efficiency, memory efficiency, or developer efficiency, you could just do:
foreach(var grp in theString.GroupBy(c => c)) {
Console.WriteLine("{0}: {1}", grp.Key, grp.Count());
}
Not awesome efficiency, but fine for virtually on non-pathological scenarios. In real scenarios, due to unicode, I'd probably use a dictionary as a counter - unicode is to big to pre-allocate an array.
Dictionary<char, int> counts = new Dictionary<char, int>();
foreach(char c in theString) {
int count;
if(!counts.TryGetValue(c, out count)) count = 0;
counts[c] = count + 1;
}
foreach(var pair in counts) {
Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
}

You can create Dictionary<string, int>, then iterate through array, check if element exist as key in dictionary and increment value.
Dictionary<string, int> counter = new Dictionary<string, int>();
foreach(var item in items)
{
if(counter.ContainsKey(item))
{
counter[item] = counter[item] + 1;
}
}

Here is wonderful example given, it may resolve your query.
http://www.dotnetperls.com/array-find
string[] array1 = { "cat", "dog", "carrot", "bird" };</br>
//
// Find first element starting with substring.
//
string value1 = Array.Find(array1,
element => element.StartsWith("car", StringComparison.Ordinal));</br>
//
// Find first element of three characters length.
//
string value2 = Array.Find(array1,
element => element.Length == 3);
//
// Find all elements not greater than four letters long.
//
string[] array2 = Array.FindAll(array1,
element => element.Length <= 4);
Console.WriteLine(value1);
Console.WriteLine(value2);
Console.WriteLine(string.Join(",", array2));

Related

Marking position in a C# list

I have a list of strings which have been read in from a dictionary file (sorted into alphabetical order). I want to create an index of the last position of each starting letter, so if there were 1000 words beginning with A it would be recorded as position 999 (because arrays start from 0). 1000 words beginning with B would mean the end position of B is 1999 and so on. These position values will be stored in a int array.
The only way I can think to do this is loop through the whole list, and have lots of else if statements to look at the first letter of the word. Not the most elegant solution.
Does anyone know of a simple way to do this, rather than 26 if statements?
Edit: The purpose of this is to generate random words. If I wanted a word beginning with B I would generate a random number between 1000 and 1999 and get the word from that position in the list.
Well, you could create a dictionary using LINQ:
// Note: assumes no empty words
Dictionary<char, int> lastEntries = words
.Select((index, value) => new { index, value })
.GroupBy(pair => pair.value[0])
.ToDictionary(g => g.Key, g => g.Max(p => p.index));
Or more usefully, keep the first and last indexes:
Dictionary<char, Tuple<int, int>> entryMinMax = words
.Select((value, index) => new { value, index })
.GroupBy(pair => pair.value[0])
.ToDictionary(g => g.Key,
g => Tuple.Of(g.Min(p => p.index), g.Max(p => p.index));
Alternatively, if the point is to effectively group the words by first letter, just do that, using a lookup:
ILookup<char, string> lookup = words.ToLookup(word => word[0]);
Then you can use:
char first = 'B'; // Or whatever
Random rng = new Random(); // But don't create a new one each time...
var range = lookup[first];
var count = range.Count();
if (count == 0)
{
// No words starting with that letter!
}
int index = random.Next(count);
var randomWord = range.ElementAt(index);
I would handle this a different way, and it doesn't require it to be ordered.
public List<string> GetAllStringsStartingWith(char startsWith, List<string> allWords)
{
List<string> letterSpecificWords = allWords.FindAll(word => word.ToLower()[0].Equals(startsWith));
return letterSpecificWords;
}
From here you now have a list containing only words that start with the letter "a". You can change out "a" with a variable for whatever you need it to be, and it will always find all of them beginning with that letter.
Notes:
word.ToLower() is used to make sure it's a lowercase value. If you switch the letter you're looking for to a variable, you'll want to do this on the variable as well.
You still need to handle the random integer, but you now have an accurate count (words.Count) to use.
This assumes no empty entries in the list.
words.ToLower()[0] gets the first character
This might be a case of an xy problem. Why do you need the index of the last occurrence of each letter? Chances are, this isn't really what you want to do.
To answer your question anyway, for each letter, you could use the FindLastIndex method.
int index = myList.FindLastIndex(i => i.ToLower()[0] == 'a');
I like Jon Skeet's method better though since you don't have to loop through each letter.
You could loop through the list with a for loop and compare the first letter of the current element to the first letter of the next element. If the letter is the same continue the loop, if it is different then store the index of the next element and then continue the loop.
To retrieve last:
words.LastOrDefault(i => i[0].ToLower() == 'a');
To get index:
words.FindLastIndex(i => i[0].ToLower() == 'a');
You can do this per one cycle:
public Dictionary<char, int> GetCharIndex(IList<string> words)
{
if (words == null || !words.Any()) throw new ArgumentException("words can't be null or empty");
Dictionary<char, int> charIndex = new Dictionary<char, int>();
char prevLetter = words[0][0];
for(int i = 1;i < words.Count;i++)
{
char letter = words[i][0];
if (letter != prevLetter) //change of first letter of the word -> add previous letter to dictionary
{
charIndex.Add(prevLetter, i - 1);
prevLetter = letter;
}
}
charIndex.Add(words[words.Count - 1][0], words.Count - 1); //special case for last word
return charIndex;
}

Constantly Incrementing String

So, what I'm trying to do this something like this: (example)
a,b,c,d.. etc. aa,ab,ac.. etc. ba,bb,bc, etc.
So, this can essentially be explained as generally increasing and just printing all possible variations, starting at a. So far, I've been able to do it with one letter, starting out like this:
for (int i = 97; i <= 122; i++)
{
item = (char)i
}
But, I'm unable to eventually add the second letter, third letter, and so forth. Is anyone able to provide input? Thanks.
Since there hasn't been a solution so far that would literally "increment a string", here is one that does:
static string Increment(string s) {
if (s.All(c => c == 'z')) {
return new string('a', s.Length + 1);
}
var res = s.ToCharArray();
var pos = res.Length - 1;
do {
if (res[pos] != 'z') {
res[pos]++;
break;
}
res[pos--] = 'a';
} while (true);
return new string(res);
}
The idea is simple: pretend that letters are your digits, and do an increment the way they teach in an elementary school. Start from the rightmost "digit", and increment it. If you hit a nine (which is 'z' in our system), move on to the prior digit; otherwise, you are done incrementing.
The obvious special case is when the "number" is composed entirely of nines. This is when your "counter" needs to roll to the next size up, and add a "digit". This special condition is checked at the beginning of the method: if the string is composed of N letters 'z', a string of N+1 letter 'a's is returned.
Here is a link to a quick demonstration of this code on ideone.
Each iteration of Your for loop is completely
overwriting what is in "item" - the for loop is just assigning one character "i" at a time
If item is a String, Use something like this:
item = "";
for (int i = 97; i <= 122; i++)
{
item += (char)i;
}
something to the affect of
public string IncrementString(string value)
{
if (string.IsNullOrEmpty(value)) return "a";
var chars = value.ToArray();
var last = chars.Last();
if(char.ToByte() == 122)
return value + "a";
return value.SubString(0, value.Length) + (char)(char.ToByte()+1);
}
you'll probably need to convert the char to a byte. That can be encapsulated in an extension method like static int ToByte(this char);
StringBuilder is a better choice when building large amounts of strings. so you may want to consider using that instead of string concatenation.
Another way to look at this is that you want to count in base 26. The computer is very good at counting and since it always has to convert from base 2 (binary), which is the way it stores values, to base 10 (decimal--the number system you and I generally think in), converting to different number bases is also very easy.
There's a general base converter here https://stackoverflow.com/a/3265796/351385 which converts an array of bytes to an arbitrary base. Once you have a good understanding of number bases and can understand that code, it's a simple matter to create a base 26 counter that counts in binary, but converts to base 26 for display.

What is the simplest way to refine a list's contents (words) based on character frequency and position? (C#)

I'm writing a console-environment Hangman game for my introductory programming class. The player chooses the word length and number of guesses they would like. 'Easy mode' is simple enough... generate a random number to use as the list's index and check that the chosen word is the right length. However, 'hard mode' requires the list to be refined as the game progresses, choosing the largest list of possibilities given the letters guessed.
I should note, we are not using the C# List class but instead, creating array-based structs:
struct ListType
{
public type[] items;
public int count;
}
//defined as:
ListType myList = new ListType();
myList.items = new type[max value];
myList.count = 0;
Anyway, here's an example of the way 'hard mode' should go:
Word List:
hole
airplane
lame
photos
cart
mole
(player chooses word length of 4)
Word List (refined):
hole
lame
cart
mole
(player guesses "l", then "e")
Word List (refined):
hole
mole
"Lame" is omitted because more words have the "...le..." pattern. The technique that makes sense to me (but isn't working the way I'd like) is storing each word's pattern to an array (ie: "mole" and "hole" = 0011, and "lame" = 1001), and counting up the duplicates to determine the larger list.
Is this the way I should be doing it? I'm new to programming and have just under a year's worth of experience, so I guess answer as such.
Thanks!!
There are a few ways of approaching this. A simple way would be to keep track of a list for all candidate words and calculate the amount of matching sequences for that word as well as log the best matching sequence. This way you can both sort on the best sequence and the amount of sequences when the best sequence alone is not a good enough measurement tool. I hope it becomes obvious how to modify this code in order to only sort on the best sequence.
Firstly i setup a test case like:
// mimic the scenario given by the QA
string[] wordList = new string[] { "hole", "airplane", "lame", "photos", "cart", "mole" };
int wordLength = 4;
List<char> requiredCharacters = new List<char>{ 'l', 'e'};
After which i filter the wordList and calculate the best matches which i finally group together to produce the desired result:
// filter all words that dont match the required length
var candidateWords = wordList.Where(x => x.Length == wordLength);
// define a result set holding all the words and all their matches
Dictionary<string, List<int>> refinedWordSet = new Dictionary<string, List<int>>();
foreach (string word in candidateWords)
{
List<int> matches = new List<int>() { 0 };
int currentMatchCount = 0;
foreach (char character in word)
{
if (requiredCharacters.Contains(character))
{
currentMatchCount++;
}
else
{
// if there were previous matches
if (currentMatchCount > 0)
{
// save the current match
matches.Add(currentMatchCount);
currentMatchCount = 0;
}
}
}
// if there was a match at the end
if (currentMatchCount > 0)
{
// save the last match
matches.Add(currentMatchCount);
}
refinedWordSet.Add(word, matches);
}
// sort by a combination of the total amount of matches as well as the highest match
var goupedRefinedWords = from entry in refinedWordSet
group entry.Key by new { Max = entry.Value.Max(), Total = entry.Value.Sum() } into grouped
select grouped;
foreach (var entry in goupedRefinedWords)
{
Console.WriteLine("Word list with best match: {0} and total match {1}: {2}",
entry.Key.Max,
entry.Key.Total,
entry.Aggregate("", (result, nextWord) => result += nextWord + ", "));
}
Console.ReadLine();
Pay attention to the comments in the code
So you look through the array for strings that match the guess pattern.
In the specific case of "le" you culd simply use String.IndexOf(). If you require a more cmplex pattern.. say "*le?" (where * and ? follow DOS-like wildcard pattern) you could employ a dynamicly-cnstructed regex pattern (easy, but performace-heavy if used in a near-realtime system), or character scanning (read each char from the screen and match to your pattern) (more difficult, harder to maintain, better performance for a small number of elements in a near-RT system).
As this is homework, I wouldn't worry about performace profiling at all right now.
Also, that struct looks mighty goofy. There are certainly better constructs for this type of thing. Like a List<String>, or just a String[]... both of which have a .Count property.

Making a Dictionary's key based on a for loop position

I am going to a directory picking up some files and then adding them to a Dictionary.
The first time in the loop the key needs to be A, second time B etc. Afer 26/Z the number represents different characters and from 33 it starts at lowercase a up to 49 which is lowercase q.
Without having a massive if statement to say if i == 1 then Key is 'A' etc etc how can I can keep this code tidy?
Sounds like you just need to keep an index of where you've got to, then some mapping function:
int index = 0;
foreach (...)
{
...
string key = MapIndexToKey(index);
dictionary[key] = value;
index++;
}
...
// Keys as per comments
private static readonly List<string> Keys =
"ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopq"
.Select(x => x.ToString())
.ToList();
// This doesn't really need to be a separate method at the moment, but
// it means it's flexible for future expansion.
private static string MapIndexToKey(int index)
{
return Keys[index];
}
EDIT: I've updated the MapIndexToKey method to make it simpler. It's not clear why you want a string key if you only ever use a single character though...
Another edit: I believe you could actually just use:
string key = ((char) (index + 'A')).ToString();
instead of having the mapping function at all, given your requirements, as the characters are contiguous in Unicode order from 'A'...
Keep incrementing from 101 to 132, ignoring missing sequence, and convert them to character. http://www.asciitable.com/
Use reminder (divide by 132) to identify second loop
This gives you the opportunity to map letters to specific numbers, perhaps not alphabet ordered.
var letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
.Select((chr, index) => new {character = chr, index = index + 1 });
foreach(var letter in letters)
{
int index = letter.index;
char chr = letter.character;
// do something
}
How about:
for(int i=0; i<26; ++i)
{
dict[(char)('A'+ (i % 26))] = GetValueFor(i);
}

Testing for repeated characters in a string

I'm doing some work with strings, and I have a scenario where I need to determine if a string (usually a small one < 10 characters) contains repeated characters.
`ABCDE` // does not contain repeats
`AABCD` // does contain repeats, ie A is repeated
I can loop through the string.ToCharArray() and test each character against every other character in the char[], but I feel like I am missing something obvious.... maybe I just need coffee. Can anyone help?
EDIT:
The string will be sorted, so order is not important so ABCDA => AABCD
The frequency of repeats is also important, so I need to know if the repeat is pair or triplet etc.
If the string is sorted, you could just remember each character in turn and check to make sure the next character is never identical to the last character.
Other than that, for strings under ten characters, just testing each character against all the rest is probably as fast or faster than most other things. A bit vector, as suggested by another commenter, may be faster (helps if you have a small set of legal characters.)
Bonus: here's a slick LINQ solution to implement Jon's functionality:
int longestRun =
s.Select((c, i) => s.Substring(i).TakeWhile(x => x == c).Count()).Max();
So, OK, it's not very fast! You got a problem with that?!
:-)
If the string is short, then just looping and testing may well be the simplest and most efficient way. I mean you could create a hash set (in whatever platform you're using) and iterate through the characters, failing if the character is already in the set and adding it to the set otherwise - but that's only likely to provide any benefit when the strings are longer.
EDIT: Now that we know it's sorted, mquander's answer is the best one IMO. Here's an implementation:
public static bool IsSortedNoRepeats(string text)
{
if (text.Length == 0)
{
return true;
}
char current = text[0];
for (int i=1; i < text.Length; i++)
{
char next = text[i];
if (next <= current)
{
return false;
}
current = next;
}
return true;
}
A shorter alternative if you don't mind repeating the indexer use:
public static bool IsSortedNoRepeats(string text)
{
for (int i=1; i < text.Length; i++)
{
if (text[i] <= text[i-1])
{
return false;
}
}
return true;
}
EDIT: Okay, with the "frequency" side, I'll turn the problem round a bit. I'm still going to assume that the string is sorted, so what we want to know is the length of the longest run. When there are no repeats, the longest run length will be 0 (for an empty string) or 1 (for a non-empty string). Otherwise, it'll be 2 or more.
First a string-specific version:
public static int LongestRun(string text)
{
if (text.Length == 0)
{
return 0;
}
char current = text[0];
int currentRun = 1;
int bestRun = 0;
for (int i=1; i < text.Length; i++)
{
if (current != text[i])
{
bestRun = Math.Max(currentRun, bestRun);
currentRun = 0;
current = text[i];
}
currentRun++;
}
// It's possible that the final run is the best one
return Math.Max(currentRun, bestRun);
}
Now we can also do this as a general extension method on IEnumerable<T>:
public static int LongestRun(this IEnumerable<T> source)
{
bool first = true;
T current = default(T);
int currentRun = 0;
int bestRun = 0;
foreach (T element in source)
{
if (first || !EqualityComparer<T>.Default(element, current))
{
first = false;
bestRun = Math.Max(currentRun, bestRun);
currentRun = 0;
current = element;
}
}
// It's possible that the final run is the best one
return Math.Max(currentRun, bestRun);
}
Then you can call "AABCD".LongestRun() for example.
This will tell you very quickly if a string contains duplicates:
bool containsDups = "ABCDEA".Length != s.Distinct().Count();
It just checks the number of distinct characters against the original length. If they're different, you've got duplicates...
Edit: I guess this doesn't take care of the frequency of dups you noted in your edit though... but some other suggestions here already take care of that, so I won't post the code as I note a number of them already give you a reasonably elegant solution. I particularly like Joe's implementation using LINQ extensions.
Since you're using 3.5, you could do this in one LINQ query:
var results = stringInput
.ToCharArray() // not actually needed, I've left it here to show what's actually happening
.GroupBy(c=>c)
.Where(g=>g.Count()>1)
.Select(g=>new {Letter=g.First(),Count=g.Count()})
;
For each character that appears more than once in the input, this will give you the character and the count of occurances.
I think the easiest way to achieve that is to use this simple regex
bool foundMatch = false;
foundMatch = Regex.IsMatch(yourString, #"(\w)\1");
If you need more information about the match (start, length etc)
Match match = null;
string testString = "ABCDE AABCD";
match = Regex.Match(testString, #"(\w)\1+?");
if (match.Success)
{
string matchText = match.Value; // AA
int matchIndnex = match.Index; // 6
int matchLength = match.Length; // 2
}
How about something like:
string strString = "AA BRA KA DABRA";
var grp = from c in strString.ToCharArray()
group c by c into m
select new { Key = m.Key, Count = m.Count() };
foreach (var item in grp)
{
Console.WriteLine(
string.Format("Character:{0} Appears {1} times",
item.Key.ToString(), item.Count));
}
Update Now, you'd need an array of counters to maintain a count.
Keep a bit array, with one bit representing a unique character. Turn the bit on when you encounter a character, and run over the string once. A mapping of the bit array index and the character set is upto you to decide. Break if you see that a particular bit is on already.
/(.).*\1/
(or whatever the equivalent is in your regex library's syntax)
Not the most efficient, since it will probably backtrack to every character in the string and then scan forward again. And I don't usually advocate regular expressions. But if you want brevity...
I started looking for some info on the net and I got to the following solution.
string input = "aaaaabbcbbbcccddefgg";
char[] chars = input.ToCharArray();
Dictionary<char, int> dictionary = new Dictionary<char,int>();
foreach (char c in chars)
{
if (!dictionary.ContainsKey(c))
{
dictionary[c] = 1; //
}
else
{
dictionary[c]++;
}
}
foreach (KeyValuePair<char, int> combo in dictionary)
{
if (combo.Value > 1) //If the vale of the key is greater than 1 it means the letter is repeated
{
Console.WriteLine("Letter " + combo.Key + " " + "is repeated " + combo.Value.ToString() + " times");
}
}
I hope it helps, I had a job interview in which the interviewer asked me to solve this and I understand it is a common question.
When there is no order to work on you could use a dictionary to keep the counts:
String input = "AABCD";
var result = new Dictionary<Char, int>(26);
var chars = input.ToCharArray();
foreach (var c in chars)
{
if (!result.ContainsKey(c))
{
result[c] = 0; // initialize the counter in the result
}
result[c]++;
}
foreach (var charCombo in result)
{
Console.WriteLine("{0}: {1}",charCombo.Key, charCombo.Value);
}
The hash solution Jon was describing is probably the best. You could use a HybridDictionary since that works well with small and large data sets. Where the letter is the key and the value is the frequency. (Update the frequency every time the add fails or the HybridDictionary returns true for .Contains(key))

Categories