Marking position in a C# list - c#

I have a list of strings which have been read in from a dictionary file (sorted into alphabetical order). I want to create an index of the last position of each starting letter, so if there were 1000 words beginning with A it would be recorded as position 999 (because arrays start from 0). 1000 words beginning with B would mean the end position of B is 1999, and so on. These position values will be stored in an int array.
The only way I can think to do this is loop through the whole list, and have lots of else if statements to look at the first letter of the word. Not the most elegant solution.
Does anyone know of a simple way to do this, rather than 26 if statements?
Edit: The purpose of this is to generate random words. If I wanted a word beginning with B I would generate a random number between 1000 and 1999 and get the word from that position in the list.

Well, you could create a dictionary using LINQ:
// Note: assumes no empty words
Dictionary<char, int> lastEntries = words
    .Select((value, index) => new { index, value })
    .GroupBy(pair => pair.value[0])
    .ToDictionary(g => g.Key, g => g.Max(p => p.index));
Or more usefully, keep the first and last indexes:
Dictionary<char, Tuple<int, int>> entryMinMax = words
    .Select((value, index) => new { value, index })
    .GroupBy(pair => pair.value[0])
    .ToDictionary(g => g.Key,
                  g => Tuple.Create(g.Min(p => p.index), g.Max(p => p.index)));
Alternatively, if the point is to effectively group the words by first letter, just do that, using a lookup:
ILookup<char, string> lookup = words.ToLookup(word => word[0]);
Then you can use:
char first = 'B'; // Or whatever
Random rng = new Random(); // But don't create a new one each time...
var range = lookup[first];
var count = range.Count();
if (count == 0)
{
    // No words starting with that letter!
}
else
{
    int index = rng.Next(count);
    var randomWord = range.ElementAt(index);
}

I would handle this a different way, one that doesn't require the list to be ordered.
public List<string> GetAllStringsStartingWith(char startsWith, List<string> allWords)
{
    char target = char.ToLower(startsWith);
    List<string> letterSpecificWords = allWords.FindAll(word => char.ToLower(word[0]) == target);
    return letterSpecificWords;
}
From here you have a list containing only the words that start with the requested letter (for example 'a'), and it will always find all of them, wherever they are in the list.
Notes:
char.ToLower is applied to both the first character of each word and to startsWith, so the match is case-insensitive.
You still need to handle the random integer, but you now have an accurate count (letterSpecificWords.Count) to use.
This assumes no empty entries in the list.
word[0] gets the first character.

This might be a case of an XY problem. Why do you need the index of the last occurrence of each letter? Chances are, this isn't really what you want to do.
To answer your question anyway, for each letter, you could use the FindLastIndex method.
int index = myList.FindLastIndex(i => i.ToLower()[0] == 'a');
I like Jon Skeet's method better though since you don't have to loop through each letter.

You could loop through the list with a for loop and compare the first letter of the current element with the first letter of the next element. If the letter is the same, continue the loop; if it is different, store the index (the current element is the last word for its letter, and the next element is the first word of the new letter), then continue the loop.
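The adjacent-comparison loop described above can be sketched as follows. This is a minimal sketch, assuming a sorted list of lower-case words that each start with a letter a-z; the names LetterRanges, LastIndexPerLetter and RandomWord are hypothetical. It fills the int array the question asks for (slot 0 for 'a', 1 for 'b', ...) by computing letter - 'a' instead of using 26 if statements, and then picks a random word in the range for one letter.

```csharp
using System;
using System.Collections.Generic;

public static class LetterRanges
{
    // Single pass over a sorted, lower-case word list: a word is the last one for its
    // letter when it is the final word or the next word starts with a different letter.
    // Slots stay -1 for letters with no words. Assumes every word starts with 'a'..'z'.
    public static int[] LastIndexPerLetter(IList<string> words)
    {
        int[] last = new int[26];
        for (int k = 0; k < last.Length; k++) last[k] = -1;

        for (int i = 0; i < words.Count; i++)
        {
            bool letterEndsHere = i == words.Count - 1 || words[i + 1][0] != words[i][0];
            if (letterEndsHere)
            {
                last[words[i][0] - 'a'] = i; // 'a' -> 0, 'b' -> 1, ... no if-chain needed
            }
        }
        return last;
    }

    // Returns a random word starting with `letter` (assumed 'a'..'z'), or null if none.
    // Because the list is sorted, the range starts right after the previous
    // letter's last index (or at 0 if no earlier letter has words).
    public static string RandomWord(IList<string> words, int[] last, char letter, Random rng)
    {
        int k = letter - 'a';
        if (last[k] == -1) return null;

        int start = 0;
        for (int j = k - 1; j >= 0; j--)
        {
            if (last[j] != -1) { start = last[j] + 1; break; }
        }
        return words[rng.Next(start, last[k] + 1)]; // Next's upper bound is exclusive
    }
}
```

For the list "apple", "avocado", "banana", "cat", "cherry", the array holds 1 for 'a', 2 for 'b' and 4 for 'c', and a random 'c' word is drawn from indexes 3 to 4.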

To retrieve the last matching word:
words.LastOrDefault(i => char.ToLower(i[0]) == 'a');
To get its index:
words.FindLastIndex(i => char.ToLower(i[0]) == 'a');

You can do this in a single pass:
public Dictionary<char, int> GetCharIndex(IList<string> words)
{
    // assumes the list is sorted and contains no empty words
    if (words == null || !words.Any()) throw new ArgumentException("words can't be null or empty");
    Dictionary<char, int> charIndex = new Dictionary<char, int>();
    char prevLetter = words[0][0];
    for (int i = 1; i < words.Count; i++)
    {
        char letter = words[i][0];
        if (letter != prevLetter) // change of first letter of the word -> add previous letter to dictionary
        {
            charIndex.Add(prevLetter, i - 1);
            prevLetter = letter;
        }
    }
    charIndex.Add(words[words.Count - 1][0], words.Count - 1); // special case for last word
    return charIndex;
}

Related

String Array Searching

I had an interview for a Jr. developer position a few days ago, and they asked:
"If you had an array of letters "a" and "b" how would you write a method to count how many instances of those letters are in the array?"
I said that you would have a for loop with an if/else statement that would increment one of two counter variables. After that, though, they asked how I would solve the same problem if the array could contain any letter of the alphabet. I said that I would go about it the same way, with a long if statement or a switch statement. In hindsight, that doesn't seem very efficient; is there an easier way to do this?
You could declare an array of size 256 (the number of possible 8-bit character codes), zero it, and simply increment the element that corresponds to the character code you read.
For example, if you read 'a', its ASCII code is 97, so you increment array[97]. You can reduce the memory needed by subtracting 97 from the code (if you know the input will contain only letters). You also need to decide what to do with capital letters (whether you consider them different or not); if you treat them as the same letter, subtract 65 ('A') from capital letters instead.
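So in the end the code would look like this:
int[] counts = new int['z' - 'a' + 1]; // one slot per letter a-z (26), zero-initialized
char a = get_next_char(); // placeholder: however you read the next character
if (char.IsUpper(a))
{
    counts[a - 'A']++; // 'A' is 65
}
else
{
    counts[a - 'a']++; // 'a' is 97
}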
This code treats 'A' and 'a' as the same letter. If that's not the case, you need a different translation in the ifs (and a bigger array), but you can probably figure out the idea now. This saves a lot of comparisons compared with your approach.
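Applied to a whole string, the letter-counting idea above becomes a small method. This is a minimal sketch, assuming only the ASCII letters a-z and A-Z are counted (case-insensitively) and everything else is skipped; LetterCounter and CountLetters are illustrative names.

```csharp
public static class LetterCounter
{
    // Counts a-z case-insensitively: 'A' and 'a' share slot 0, 'Z' and 'z' share slot 25.
    // Any other character (digits, punctuation, non-ASCII letters) is ignored.
    public static int[] CountLetters(string text)
    {
        int[] counts = new int[26]; // C# arrays start zero-initialized
        foreach (char c in text)
        {
            if (c >= 'a' && c <= 'z') counts[c - 'a']++;
            else if (c >= 'A' && c <= 'Z') counts[c - 'A']++;
        }
        return counts;
    }
}
```

To print the result, loop over the array and convert each slot back with (char)('a' + i). There is one array update per character and no chain of comparisons per letter.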
Depending on whether the objective is CPU efficiency, memory efficiency, or developer efficiency, you could just do:
foreach (var grp in theString.GroupBy(c => c)) {
    Console.WriteLine("{0}: {1}", grp.Key, grp.Count());
}
This isn't the most efficient approach, but it is fine for virtually all non-pathological scenarios. In real scenarios, because of Unicode, I'd probably use a dictionary as a counter; Unicode is too big to pre-allocate an array for.
Dictionary<char, int> counts = new Dictionary<char, int>();
foreach (char c in theString) {
    int count;
    if (!counts.TryGetValue(c, out count)) count = 0;
    counts[c] = count + 1;
}
foreach (var pair in counts) {
    Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
}
You can create a Dictionary<string, int>, then iterate through the array, check whether each element already exists as a key in the dictionary, and either increment its value or add it with a count of 1.
Dictionary<string, int> counter = new Dictionary<string, int>();
foreach (var item in items)
{
    if (counter.ContainsKey(item))
    {
        counter[item] = counter[item] + 1;
    }
    else
    {
        counter[item] = 1;
    }
}
Here is a good example that may answer your question:
http://www.dotnetperls.com/array-find
string[] array1 = { "cat", "dog", "carrot", "bird" };
//
// Find first element starting with substring.
//
string value1 = Array.Find(array1,
    element => element.StartsWith("car", StringComparison.Ordinal));
//
// Find first element of three characters length.
//
string value2 = Array.Find(array1,
    element => element.Length == 3);
//
// Find all elements not greater than four letters long.
//
string[] array2 = Array.FindAll(array1,
    element => element.Length <= 4);
Console.WriteLine(value1);
Console.WriteLine(value2);
Console.WriteLine(string.Join(",", array2));

What is the simplest way to refine a list's contents (words) based on character frequency and position? (C#)

I'm writing a console-environment Hangman game for my introductory programming class. The player chooses the word length and number of guesses they would like. 'Easy mode' is simple enough... generate a random number to use as the list's index and check that the chosen word is the right length. However, 'hard mode' requires the list to be refined as the game progresses, choosing the largest list of possibilities given the letters guessed.
I should note, we are not using the C# List class but instead, creating array-based structs:
struct ListType
{
    public type[] items;
    public int count;
}
//defined as:
ListType myList = new ListType();
myList.items = new type[max value];
myList.count = 0;
Anyway, here's an example of the way 'hard mode' should go:
Word List:
hole
airplane
lame
photos
cart
mole
(player chooses word length of 4)
Word List (refined):
hole
lame
cart
mole
(player guesses "l", then "e")
Word List (refined):
hole
mole
"Lame" is omitted because more words have the "...le..." pattern. The technique that makes sense to me (but isn't working the way I'd like) is storing each word's pattern to an array (ie: "mole" and "hole" = 0011, and "lame" = 1001), and counting up the duplicates to determine the larger list.
Is this the way I should be doing it? I'm new to programming and have just under a year's worth of experience, so please answer accordingly.
Thanks!!
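The pattern idea described in the question can be sketched like this. It is a minimal sketch, not the assignment's array-based ListType: it uses LINQ and List&lt;string&gt; for brevity, and WordFamilies, PatternFor and LargestFamily are hypothetical names. Instead of 0/1 patterns it keeps the guessed letters themselves ("--le" for hole and mole, "l--e" for lame), which avoids lumping together words such as "hole" and "hoel" that would both be 0011. Words sharing a pattern land in the same group, and the largest group becomes the new word list.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WordFamilies
{
    // Builds a pattern such as "--le": guessed letters are kept, every other letter becomes '-'.
    public static string PatternFor(string word, ISet<char> guessed)
    {
        char[] pattern = new char[word.Length];
        for (int i = 0; i < word.Length; i++)
        {
            pattern[i] = guessed.Contains(word[i]) ? word[i] : '-';
        }
        return new string(pattern);
    }

    // Keeps only words of the requested length, groups them by pattern and returns
    // the largest group. On a tie, the group that first appeared in the input wins
    // (OrderByDescending is a stable sort).
    public static List<string> LargestFamily(IEnumerable<string> words, int length, ISet<char> guessed)
    {
        return words
            .Where(w => w.Length == length)
            .GroupBy(w => PatternFor(w, guessed))
            .OrderByDescending(g => g.Count())
            .Select(g => g.ToList())
            .FirstOrDefault() ?? new List<string>();
    }
}
```

With the word list from the example, a word length of 4 and the guesses l and e, LargestFamily returns hole and mole. The same grouping can be done with plain arrays by storing each word's pattern and counting duplicates, as the question suggests.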
There are a few ways of approaching this. A simple way would be to keep a list of all candidate words and, for each word, find its matching sequences (runs of guessed letters), recording both the best (longest) sequence and the total number of matched characters. This way you can sort on the best sequence and fall back on the total when the best sequence alone is not a good enough measure. I hope it becomes obvious how to modify this code to sort only on the best sequence.
First, I set up a test case:
// mimic the scenario given in the question
string[] wordList = new string[] { "hole", "airplane", "lame", "photos", "cart", "mole" };
int wordLength = 4;
List<char> requiredCharacters = new List<char>{ 'l', 'e'};
After that, I filter wordList, calculate the best matches, and finally group them together to produce the desired result:
// filter out all words that don't match the required length
var candidateWords = wordList.Where(x => x.Length == wordLength);
// define a result set holding all the words and all their matches
Dictionary<string, List<int>> refinedWordSet = new Dictionary<string, List<int>>();
foreach (string word in candidateWords)
{
    List<int> matches = new List<int>() { 0 };
    int currentMatchCount = 0;
    foreach (char character in word)
    {
        if (requiredCharacters.Contains(character))
        {
            currentMatchCount++;
        }
        else
        {
            // if there were previous matches
            if (currentMatchCount > 0)
            {
                // save the current match
                matches.Add(currentMatchCount);
                currentMatchCount = 0;
            }
        }
    }
    // if there was a match at the end
    if (currentMatchCount > 0)
    {
        // save the last match
        matches.Add(currentMatchCount);
    }
    refinedWordSet.Add(word, matches);
}
// group by a combination of the highest match and the total amount of matches,
// then sort so that the best groups come first
var groupedRefinedWords = from entry in refinedWordSet
                          group entry.Key by new { Max = entry.Value.Max(), Total = entry.Value.Sum() } into grouped
                          orderby grouped.Key.Max descending, grouped.Key.Total descending
                          select grouped;
foreach (var entry in groupedRefinedWords)
{
    Console.WriteLine("Word list with best match: {0} and total match {1}: {2}",
        entry.Key.Max,
        entry.Key.Total,
        entry.Aggregate("", (result, nextWord) => result + nextWord + ", "));
}
Console.ReadLine();
Pay attention to the comments in the code
So you look through the array for strings that match the guess pattern.
In the specific case of "le" you could simply use String.IndexOf(). If you require a more complex pattern, say "*le?" (where * and ? follow DOS-like wildcard rules), you could employ a dynamically constructed regex pattern (easy, but performance-heavy if used in a near-real-time system), or character scanning (read each char from the string and match it against your pattern), which is more difficult and harder to maintain but performs better for a small number of elements in a near-real-time system.
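A dynamically constructed regex for the DOS-style patterns mentioned above might look like the sketch below. It assumes '*' means any run of characters and '?' exactly one character, with every other character matched literally (escaped with Regex.Escape); WildcardFilter, ToRegex and Matching are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WildcardFilter
{
    // Turns a DOS-style pattern into an anchored regex:
    // '*' -> ".*", '?' -> ".", anything else is escaped and matched literally.
    public static Regex ToRegex(string wildcard)
    {
        string body = string.Concat(wildcard.Select(c =>
            c == '*' ? ".*" : c == '?' ? "." : Regex.Escape(c.ToString())));
        return new Regex("^" + body + "$");
    }

    // Returns the words that match the whole pattern, in their original order.
    public static string[] Matching(string[] words, string wildcard)
    {
        Regex regex = ToRegex(wildcard);
        return words.Where(w => regex.IsMatch(w)).ToArray();
    }
}
```

For the word list in the question, the pattern "??le" keeps hole and mole. Building the Regex once per pattern, rather than once per word, keeps the cost down.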
As this is homework, I wouldn't worry about performance profiling at all right now.
Also, that struct looks mighty goofy. There are certainly better constructs for this type of thing, like a List<string> (which has a Count property) or just a string[] (which has a Length property).

How to check if letters are in a string?

It's quite a hard question to ask, but I will try.
I have four letters: m, u, g, o. I also have three words.
Let's say: og, ogg, muogss. I am looking for a clever way to check whether I can construct each word using only my letters. Note that once we have used g, we can't use it again.
og - possible, because we need only **g** and **o**
ogg - not possible: we took **o** and **g**, but need a second **g**
muogss - not possible: we used all four letters and still need two **s**
So my tactic is to put my letters into a char array, remove them one by one, and check what is left to build the word(s). But is it possible to do this in a few lines somehow? I don't know, maybe with a regex?
Your method is only a few lines:
public static bool CanBeMadeFrom(string word, string letters)
{
    // Select is lazy, so each IndexOf sees the letters left after the previous removals
    foreach (var i in word.Select(c => letters.IndexOf(c, 0)))
    {
        if (i == -1) return false;
        letters = letters.Remove(i, 1);
    }
    return true;
}
Here's a simple approach:
For your source word, create an array of size 26 and use it to count how many times each letter appears.
Do the same for each word in your dictionary.
Then compare the two.
If every letter occurs no more times in the dictionary word than in the source word, then the dictionary word can be made from the source word's letters. If not, then it cannot.
C# code (assumes the words contain only the letters A-Z, in either case):
/** Converts characters to a 0 to 25 code representing alphabet position.
This is specific to the English language and would need to be modified if used
for other languages. */
static int CharToLetter(char c)
{
    return char.ToUpper(c) - 'A';
}
/** Given a source word and an array of other words to check, returns all
words from the array which can be made from the letters of the source word. */
static List<string> CheckSubWords(string source, string[] dictionary)
{
    List<string> output = new List<string>();
    // Stores how many of each letter are in the source word.
    int[] sourceCount = new int[26]; // C# arrays are zero-initialized
    foreach (char c in source)
    {
        sourceCount[CharToLetter(c)]++;
    }
    foreach (string s in dictionary)
    {
        // Stores how many of each letter are in the dictionary word.
        int[] dictCount = new int[26];
        foreach (char c in s)
        {
            dictCount[CharToLetter(c)]++;
        }
        // Then we check that no letter appears more often in the
        // dictionary word than in the source word.
        bool isSubWord = true;
        for (int i = 0; i < 26; i++)
        {
            if (dictCount[i] > sourceCount[i])
            {
                isSubWord = false;
                break;
            }
        }
        // If they're all less than or equal, we add it to the output.
        if (isSubWord)
        {
            output.Add(s);
        }
    }
    return output;
}
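The same counting idea also fits in a few lines with LINQ, which is closer to what the question asked for. This is a sketch, assuming a case-sensitive comparison; LetterCheck and CanBuild are hypothetical names. Each distinct character of the word must be available at least as many times in the letters.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LetterCheck
{
    // True if every character of `word` can be taken from `letters`,
    // using each available letter at most once (case-sensitive).
    public static bool CanBuild(string word, string letters)
    {
        // How many of each letter we have to spend.
        Dictionary<char, int> available = letters
            .GroupBy(c => c)
            .ToDictionary(g => g.Key, g => g.Count());

        // Every distinct character of the word must be available often enough.
        return word
            .GroupBy(c => c)
            .All(g => available.TryGetValue(g.Key, out int have) && have >= g.Count());
    }
}
```

With the letters "mugo", CanBuild returns true for "og" and false for "ogg" and "muogss", matching the three cases in the question.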
If your definition of a word is any arbitrary permutation of the available characters, then why do you need a regex? Just make sure you use each character only once. A regex doesn't know what a "correct word" is, and it's better for your algorithm to avoid using invalid characters than to use them AND then use a regex to check that you didn't.

Dictionary<char, char> to map an alphabet -- with unique keys & values

I need to create a Dictionary that expresses a mapping between each char in an alphabet and another char in that alphabet, where both the key and value are unique -- like a very simple cipher that expresses how to code/decode a message. There can be no duplicate keys or values.
Does anyone see what is wrong with this code? It is still producing duplicate values in the mapping despite the fact that on each iteration the pool of available characters decreases for each value already used.
string source_alphabet = _alphabet; //ie "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
string target_alphabet = _alphabet;
Dictionary<char, char> _map = new Dictionary<char, char>();
for (int i = 0; i < source_alphabet.Length; i++)
{
    int random = _random.Next(target_alphabet.Length - 1); //select a random index
    char _output = target_alphabet[random]; //get the char at the random index
    _map.Add(source_alphabet[i], _output); //add to the dictionary
    target_alphabet = target_alphabet.Replace(_output.ToString(), string.Empty);
    // remove the char we just added from the remaining alphabet
}
Thanks.
I would consider performing a simple Fisher-Yates shuffle over one or both sequences of the alphabet; then you can simply iterate over the output and put together your mapper.
Pseudocode
Shuffle(sequence1)
Shuffle(sequence2)
for index 0 to length - 1
    dictionary add sequence1[index], sequence2[index]
When you try to select a random value each time, then there is a high probability that you will get a collision and therefore have a non-unique value selected. The answer is usually to shuffle, then select in order.
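In C#, the shuffle-then-pair approach could look like the sketch below. It shuffles only the target sequence (shuffling one sequence is enough for a random one-to-one mapping) and assumes the alphabet contains no repeated characters; CipherMap, Shuffle and Build are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class CipherMap
{
    // In-place Fisher-Yates shuffle: walking backwards, position i swaps with a random j in [0, i].
    public static void Shuffle<T>(IList<T> items, Random rng)
    {
        for (int i = items.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // Next's upper bound is exclusive, so j is in 0..i
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }

    // Pairs each source character with the character at the same position in a
    // shuffled copy of the alphabet, so keys and values are both unique.
    // A character may map to itself.
    public static Dictionary<char, char> Build(string alphabet, Random rng)
    {
        char[] targets = alphabet.ToCharArray();
        Shuffle(targets, rng);
        var map = new Dictionary<char, char>();
        for (int i = 0; i < alphabet.Length; i++)
        {
            map.Add(alphabet[i], targets[i]);
        }
        return map;
    }
}
```

If a character must never map to itself, one option is Sattolo's variant of the shuffle (rng.Next(i) instead of rng.Next(i + 1)), which only produces permutations made of a single cycle, so no character stays in place.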
"a quick fix" though not optimal would be (if mapping A to A is NOT allowed)
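int random = _random.Next(target_alphabet.Length); // upper bound is exclusive, so every index can be picked
// keep drawing until the target differs from the source character
// (note: this never ends if the only remaining character equals source_alphabet[i])
while (source_alphabet[i] == target_alphabet[random]) { random = _random.Next(target_alphabet.Length); }
If mapping A to A is allowed, ignore the above change, BUT at least change the last line to:
target_alphabet = target_alphabet.Remove(random, 1);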
I guess you could add another "for" loop over target_alphabet inside the existing "for" loop, check that the characters are not the same with a small "if" condition, and continue the inner loop if they are the same or break if not.
This works.
for (int i = 0; i < source_alphabet.Length; i++)
{
    int random = _random.Next(target_alphabet.Length); //select a random index (upper bound is exclusive)
    char _output = target_alphabet[random]; //get the char at the random index
    _map.Add(source_alphabet[i], _output); //add to the dictionary
    // remove the char we just added from the remaining alphabet
    target_alphabet = target_alphabet.Remove(random, 1);
}

Making a Dictionary's key based on a for loop position

I am going to a directory picking up some files and then adding them to a Dictionary.
The first time through the loop the key needs to be A, the second time B, and so on. After 26 (Z) the numbers represent other characters, and from 33 the keys run from lowercase a up to 49, which is lowercase q.
Without having a massive if statement to say if i == 1 then Key is 'A' etc., how can I keep this code tidy?
Sounds like you just need to keep an index of where you've got to, then some mapping function:
int index = 0;
foreach (...)
{
    ...
    string key = MapIndexToKey(index);
    dictionary[key] = value;
    index++;
}
...
// Keys as per comments
private static readonly List<string> Keys =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopq"
        .Select(x => x.ToString())
        .ToList();
// This doesn't really need to be a separate method at the moment, but
// it means it's flexible for future expansion.
private static string MapIndexToKey(int index)
{
    return Keys[index];
}
EDIT: I've updated the MapIndexToKey method to make it simpler. It's not clear why you want a string key if you only ever use a single character though...
Another edit: I believe you could actually just use:
string key = ((char) (index + 'A')).ToString();
instead of having the mapping function at all, given your requirements, as the characters are contiguous in Unicode order from 'A'...
Keep incrementing from 101 to 132 (the octal ASCII codes for 'A' to 'Z', which are 65 to 90 in decimal), skipping the gaps in the sequence, and convert each value to a character. http://www.asciitable.com/
Use the remainder (divide by 132) to identify the second loop.
This gives you the opportunity to map letters to specific numbers, perhaps not in alphabetical order.
var letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    .Select((chr, index) => new { character = chr, index = index + 1 });
foreach (var letter in letters)
{
    int index = letter.index;
    char chr = letter.character;
    // do something
}
How about:
for (int i = 0; i < 26; ++i)
{
    dict[(char)('A' + (i % 26))] = GetValueFor(i);
}
