I have a text file which contains a list of about 150,000 words. I loaded the words into a dictionary, and word lookup works fine for me. Now I want to search the dictionary to see whether it contains a word starting with a particular letter.
I am using
foreach (KeyValuePair<string, string> pair in dict) {
}
for this purpose. This seems to work fine for smaller word lists, but it doesn't work for the 150,000-word list. Could anyone help, please?
void WordAvailable (char sLetter)
{
    notAvail = true;
    int count = 0;
    do {
        foreach (KeyValuePair<string,string> pair in dict) {
            randWord = pair.Value;
            count++;
            if (randWord [0] == sLetter && !ListTest.usedWordsList.Contains (randWord)) {
                notAvail = false;
                startingLetter = char.ToString (sLetter);
                break;
            }
            if (count >= dict.Count) {
                ChooseRandomAlpha ();
                sLetter = alpha;
                count = 0;
            }
        }
    } while(notAvail);
}
Now I want to search the dictionary to see if the dictionary contains a word starting from a particular alphabet.
That sounds like you want a SortedSet rather than a Dictionary. You can use GetViewBetween to find all the entries in the set that lie between two bounds. The lower bound would probably be "the string you're starting with" and for the upper bound, you could either work out "the last possible string starting with those characters" or use an exclusive upper bound by manually ignoring anything that doesn't start with your prefix, and "incrementing" the last character of your prefix. So for example, to find all words beginning with "tim" you can use GetViewBetween("tim", "tin") and ignore tin if it's in the dictionary.
Note that ordering can be an "interesting" exercise when it comes to multiple cultures etc. If this is just an academic exercise and you'll only have ASCII characters, you might want to lower case each word as you add it to the set, and then use an ordinal comparison. If you do need a culture-sensitive ordering, you could make that case-insensitive easily... but working out the upper bound for the prefix will be trickier.
Example of using GetViewBetween:
using System;
using System.Collections.Generic;

class Test
{
    static void Main()
    {
        var words = new SortedSet<string>(StringComparer.Ordinal)
        {
            "cat", "dog", "banana", "laptop", "mug",
            "coffee", "microphone", "water", "stairs", "phone"
        };
        foreach (var word in words.GetViewBetween("d", "n"))
        {
            Console.WriteLine(word);
        }
    }
}
Output:
dog
laptop
microphone
mug
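The "tim"/"tin" bound trick described above could be wrapped in a small helper. This is only a minimal sketch: it assumes the set was created with StringComparer.Ordinal, that the prefix is non-empty, and that its last character is not char.MaxValue. The names PrefixSearch and WordsWithPrefix are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PrefixSearch
{
    // Hypothetical helper: returns every word in the set that starts with prefix.
    // Assumes an ordinal comparer and a non-empty prefix (see the lead-in).
    public static IEnumerable<string> WordsWithPrefix(SortedSet<string> words, string prefix)
    {
        // "Increment" the last character of the prefix: "tim" -> "tin".
        char last = prefix[prefix.Length - 1];
        string upper = prefix.Substring(0, prefix.Length - 1) + (char)(last + 1);

        // GetViewBetween includes both bounds, so filter out the upper bound
        // itself ("tin") if it happens to be in the set.
        return words.GetViewBetween(prefix, upper)
                    .Where(w => w.StartsWith(prefix, StringComparison.Ordinal));
    }
}
```

To answer "is there any word starting with this letter?", something like PrefixSearch.WordsWithPrefix(words, "t").Any() should be enough, and it avoids walking all 150,000 entries.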
An alternative would be to build your own trie implementation (or find an existing one) but I don't know of one in the BCL.
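For what a hand-rolled trie might look like, here is a rough sketch. It is not from any library; the class name Trie and its members are assumptions made for illustration, and it stores plain chars with no culture handling.

```csharp
using System.Collections.Generic;

// Minimal trie sketch (hypothetical class, not part of the BCL).
class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;
    }

    private readonly Node root = new Node();

    // Adds a word, creating one node per character as needed.
    public void Add(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out Node next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsWord = true;
    }

    // True if the exact word was added.
    public bool Contains(string word)
    {
        Node node = Find(word);
        return node != null && node.IsWord;
    }

    // True if at least one added word starts with the prefix.
    // Note: an empty prefix trivially returns true, even for an empty trie.
    public bool HasWordStartingWith(string prefix)
    {
        return Find(prefix) != null;
    }

    // Walks the trie along the given characters; returns null if the path is missing.
    private Node Find(string text)
    {
        Node node = root;
        foreach (char c in text)
        {
            if (!node.Children.TryGetValue(c, out node)) return null;
        }
        return node;
    }
}
```

Every non-root node is created by Add, so reaching a node means some stored word passes through it; that is why HasWordStartingWith only needs to check that the path exists.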
I'm using the code below to work out all the words that could be spelled from the alphabet variable. The problem is that I build this alphabet variable each time I call this, based on the board of random letters in front of the user, and what I see, of course, is output such as "aaab".
What I'm after is code that uses each letter only as many times as it appears in the alphabet variable, so that it can't produce something like "aaab", only "ab".
I understand that this code, which I found in another thread, is made to build combinations (arrangements) of the letters into 4-letter words.
I'm wondering if there's a simple way, using SelectMany or Select, to stop a letter being added again once it has already been used. Keep in mind that there could be multiple a's in the alphabet variable to begin with, so if there are two A's in there, it should still be able to produce AAB, just not AAAB. I am a newbie. I know that I could go through my own list and add letters together based on how many times they actually exist in the alphabet string; I'm just wondering if there's a way to interrupt i or x and not add to q if the letter has already been used.
Sorry if this is confusing. Thank you :)
// I found this in another thread and seemed to work great and fast.
var alphabet = "abcd";
var q = alphabet.Select(x => x.ToString());
int size = 4;
for (int i = 0; i < size - 1; i++)
    q = q.SelectMany(x => alphabet, (x, y) => x + y);
foreach (var item in q)
{
    // DO STUFF
}
To reach your goal, you must find a way to mark letters in your alphabet which are already used and avoid using these letters a second time.
To do so you need a data structure which can store more than the letters alone, so a list of letters (or a string) is not sufficient.
Try building a list of objects of a class like this one:
class UsedLetter
{
    public char letter;
    public bool used;   // set to true once the letter has been drawn
}
Then you can mark each letter as used after you drew it from the list.
Improvement
You may also store your alphabet as a list of characters:
List<char> alphabet;
and remove each letter from the alphabet after it's drawn.
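As a rough illustration of "consume a letter when it is drawn", the sketch below keeps a remaining count per letter instead of a used flag. That is a swapped-in variation, not the exact structure suggested above, and the names Arrangements, Build and Extend are made up. It builds the distinct arrangements of a given size, using each letter at most as often as it occurs in the alphabet string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Arrangements
{
    // Hypothetical helper: distinct arrangements of `size` letters, where each
    // letter is used at most as many times as it appears in `alphabet`.
    public static List<string> Build(string alphabet, int size)
    {
        // How many copies of each letter are still available.
        var remaining = new SortedDictionary<char, int>();
        foreach (char c in alphabet)
        {
            remaining.TryGetValue(c, out int n); // n is 0 if the key is missing
            remaining[c] = n + 1;
        }

        var results = new List<string>();
        Extend(remaining, new char[size], 0, results);
        return results;
    }

    private static void Extend(SortedDictionary<char, int> remaining,
                               char[] buffer, int pos, List<string> results)
    {
        if (pos == buffer.Length)
        {
            results.Add(new string(buffer));
            return;
        }

        // Snapshot the keys so the counts can be changed while looping.
        foreach (char c in remaining.Keys.ToList())
        {
            if (remaining[c] == 0) continue; // every copy of this letter is in use

            remaining[c]--;                  // draw the letter
            buffer[pos] = c;
            Extend(remaining, buffer, pos + 1, results);
            remaining[c]++;                  // put it back for the next branch
        }
    }
}
```

With this sketch, Arrangements.Build("aab", 3) gives "aab", "aba" and "baa", and never "aaa", because only two a's are available. If size is larger than the number of letters, the result is simply empty.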
Here's how I have achieved what I think you're after:
using System;
using System.Collections.Generic;
using System.Linq;

namespace WordPerms
{
    class Program
    {
        Stack<char> chars = new Stack<char>();
        List<string> words = new List<string>();

        static void Main(string[] args)
        {
            Program p = new Program();
            p.GetChar("abad");
            foreach (string word in p.words)
            {
                Console.WriteLine(word);
            }
        }

        // This is called recursively to build the list of words.
        private void GetChar(string alpha)
        {
            string beta;
            for (int i = 0; i < alpha.Length; i++)
            {
                chars.Push(alpha[i]);
                beta = alpha.Remove(i, 1);
                GetChar(beta);
            }
            char[] charArray = chars.Reverse().ToArray();
            words.Add(new string(charArray));
            if (chars.Count() >= 1)
            {
                chars.Pop();
            }
        }
    }
}
Hope that helps, Greg.
I had an interview for a Jr. developer position a few days ago, and they asked:
"If you had an array of letters "a" and "b" how would you write a method to count how many instances of those letters are in the array?"
I said that you would have a for loop with an if else statement that would increment 1 of 2 counter variables. After that, though, they asked how I would solve that same problem, if the array could contain any letter of the alphabet. I said that I would go about it the same way, with a long IF statement, or a switch statement. In hindsight, that doesn't seem so efficient; is there an easier way to go about doing this?
You could declare an array of size 256 (the number of possible single-byte character codes), zero it, and simply increment the element that corresponds to the character code you read.
For example, if you read an 'a', the corresponding ASCII code is 97, so you increment array[97]. You can reduce the amount of memory by subtracting 97 from the code (if you know the input is going to be lowercase letters only). You also need to decide what to do with capital letters (do you consider them different or not?); if you count them together with the lowercase ones, subtract 65 from a capital letter instead.
So at the end code would look like this:
int counts[26] = {0}; // one counter per letter a - z (codes 97 - 122)
char a = get_next_char();
if (is_capital(a)) {
    counts[a - 65]++;   // 'A' is code 65
}
else
{
    counts[a - 97]++;   // 'a' is code 97
}
This code assumes that 'A' counts the same as 'a'.
If that's not the case, you need a different translation in the ifs, but you can probably figure out the idea now. This saves a lot of comparisons compared with your approach.
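Since the rest of this thread is in C#, here is roughly how the same counting-array idea might look there. It is a sketch under the same assumption as above (upper and lower case are counted together, and anything outside a-z is skipped); LetterCounter and Count are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class LetterCounter
{
    // Hypothetical helper: counts a-z case-insensitively; other characters are skipped.
    public static int[] Count(IEnumerable<char> letters)
    {
        var counts = new int[26]; // zero-initialized by the runtime
        foreach (char c in letters)
        {
            char lower = char.ToLowerInvariant(c);
            if (lower >= 'a' && lower <= 'z')
            {
                counts[lower - 'a']++; // index 0 is 'a', index 25 is 'z'
            }
        }
        return counts;
    }
}
```

It accepts a char[] or a string, so LetterCounter.Count("abBa") would report two a's at index 0 and two b's at index 1, with one array lookup per character and no if/switch chain.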
Depending on whether the objective is CPU efficiency, memory efficiency, or developer efficiency, you could just do:
foreach(var grp in theString.GroupBy(c => c)) {
    Console.WriteLine("{0}: {1}", grp.Key, grp.Count());
}
Not awesome efficiency, but fine for virtually all non-pathological scenarios. In real scenarios, due to Unicode, I'd probably use a dictionary as a counter: Unicode is too big to pre-allocate an array.
Dictionary<char, int> counts = new Dictionary<char, int>();
foreach(char c in theString) {
    int count;
    if(!counts.TryGetValue(c, out count)) count = 0;
    counts[c] = count + 1;
}
foreach(var pair in counts) {
    Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
}
You can create a Dictionary<string, int>, then iterate through the array and check whether each element already exists as a key in the dictionary: if it does, increment its value; if not, add it with a count of 1.
Dictionary<string, int> counter = new Dictionary<string, int>();
foreach(var item in items)
{
    if(counter.ContainsKey(item))
    {
        counter[item] = counter[item] + 1;
    }
    else
    {
        // first time we see this item
        counter[item] = 1;
    }
}
Here is a good example that may answer your question:
http://www.dotnetperls.com/array-find
string[] array1 = { "cat", "dog", "carrot", "bird" };
//
// Find first element starting with substring.
//
string value1 = Array.Find(array1,
    element => element.StartsWith("car", StringComparison.Ordinal));
//
// Find first element of three characters length.
//
string value2 = Array.Find(array1,
    element => element.Length == 3);
//
// Find all elements not greater than four letters long.
//
string[] array2 = Array.FindAll(array1,
    element => element.Length <= 4);
Console.WriteLine(value1);
Console.WriteLine(value2);
Console.WriteLine(string.Join(",", array2));
I'm writing a console-environment Hangman game for my introductory programming class. The player chooses the word length and number of guesses they would like. 'Easy mode' is simple enough... generate a random number to use as the list's index and check that the chosen word is the right length. However, 'hard mode' requires the list to be refined as the game progresses, choosing the largest list of possibilities given the letters guessed.
I should note, we are not using the C# List class but instead, creating array-based structs:
struct ListType
{
    public type[] items;
    public int count;
}
//defined as:
ListType myList = new ListType();
myList.items = new type[max value];
myList.count = 0;
Anyway, here's an example of the way 'hard mode' should go:
Word List:
hole
airplane
lame
photos
cart
mole
(player chooses word length of 4)
Word List (refined):
hole
lame
cart
mole
(player guesses "l", then "e")
Word List (refined):
hole
mole
"Lame" is omitted because more words have the "...le..." pattern. The technique that makes sense to me (but isn't working the way I'd like) is storing each word's pattern to an array (ie: "mole" and "hole" = 0011, and "lame" = 1001), and counting up the duplicates to determine the larger list.
Is this the way I should be doing it? I'm new to programming and have just under a year's worth of experience, so I guess answer as such.
Thanks!!
There are a few ways of approaching this. A simple way would be to keep a list of all candidate words and, for each word, calculate the number of matching sequences as well as record the best (longest) matching sequence. This way you can sort both on the best sequence and on the number of sequences when the best sequence alone is not a good enough measure. I hope it is obvious how to modify this code in order to sort only on the best sequence.
First, I set up a test case like this:
// mimic the scenario given in the question
string[] wordList = new string[] { "hole", "airplane", "lame", "photos", "cart", "mole" };
int wordLength = 4;
List<char> requiredCharacters = new List<char> { 'l', 'e' };
After that, I filter the wordList and calculate the best matches, which I finally group together to produce the desired result:
// filter out all words that don't match the required length
var candidateWords = wordList.Where(x => x.Length == wordLength);
// define a result set holding all the words and all their matches
Dictionary<string, List<int>> refinedWordSet = new Dictionary<string, List<int>>();
foreach (string word in candidateWords)
{
    List<int> matches = new List<int>() { 0 };
    int currentMatchCount = 0;
    foreach (char character in word)
    {
        if (requiredCharacters.Contains(character))
        {
            currentMatchCount++;
        }
        else
        {
            // if there were previous matches
            if (currentMatchCount > 0)
            {
                // save the current match
                matches.Add(currentMatchCount);
                currentMatchCount = 0;
            }
        }
    }
    // if there was a match at the end
    if (currentMatchCount > 0)
    {
        // save the last match
        matches.Add(currentMatchCount);
    }
    refinedWordSet.Add(word, matches);
}
// group by a combination of the total amount of matches as well as the highest match
var groupedRefinedWords = from entry in refinedWordSet
                          group entry.Key by new { Max = entry.Value.Max(), Total = entry.Value.Sum() } into grouped
                          select grouped;
foreach (var entry in groupedRefinedWords)
{
    Console.WriteLine("Word list with best match: {0} and total match {1}: {2}",
        entry.Key.Max,
        entry.Key.Total,
        entry.Aggregate("", (result, nextWord) => result + nextWord + ", "));
}
Console.ReadLine();
Pay attention to the comments in the code
So you look through the array for strings that match the guess pattern.
In the specific case of "le" you could simply use String.IndexOf(). If you require a more complex pattern, say "*le?" (where * and ? follow DOS-like wildcard rules), you could employ a dynamically constructed regex pattern (easy, but performance-heavy if used in a near-real-time system), or character scanning (read each char from the screen and match it to your pattern), which is more difficult and harder to maintain, but performs better for a small number of elements in a near-real-time system.
As this is homework, I wouldn't worry about performance profiling at all right now.
Also, that struct looks mighty goofy. There are certainly better constructs for this type of thing, like a List<string> (which has a Count property) or just a string[] (which has Length).
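The pattern-counting idea from the question (group the words by where the guessed letters fall, then keep the biggest group) could be sketched with LINQ as below. This is only an illustration: it uses List and HashSet rather than the array-based struct the class requires, and HangmanFamilies, PatternOf and LargestFamily are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HangmanFamilies
{
    // Replaces every letter that has not been guessed with '-',
    // e.g. "hole" with guesses {l, e} becomes "--le".
    public static string PatternOf(string word, ISet<char> guessed)
    {
        return new string(word.Select(c => guessed.Contains(c) ? c : '-').ToArray());
    }

    // Groups the words by pattern and returns the largest group.
    // Ties go to the pattern that appears first; an empty word list throws.
    public static List<string> LargestFamily(IEnumerable<string> words, ISet<char> guessed)
    {
        return words
            .GroupBy(w => PatternOf(w, guessed))
            .OrderByDescending(g => g.Count())
            .First()
            .ToList();
    }
}
```

For the four-letter list hole, lame, cart, mole with guesses l and e, the patterns are "--le", "l--e", "----" and "--le", so the largest family is hole and mole, which matches the refined list in the question.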
This is quite a hard question to ask, but I will try.
I have my 4 letters: m u g o. I also have some arbitrary word(s).
Let's say: og ogg muogss. I am looking for a clever method to check whether I can construct the word(s) using only my letters. Please note that once we have used g, we won't be able to use it again.
og - possible, because we need only **g** and **o**
ogg - not possible: we took **o** and **g**, and need a second **g**
muogss - not possible: we took all the letters, and also need additional **s**
So my tactic is to put my letters in a char array, remove them one by one, and check how many are left to build the word(s). But is it possible to do this somehow in a few lines, perhaps with a regex?
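Your method is only a few lines...
// requires: using System.Linq;
public static bool CanBeMadeFrom(string word, string letters)
{
    // For each character of the word, find it among the letters still available.
    foreach (var i in word.Select(c => letters.IndexOf(c, 0)))
    {
        if (i == -1) return false;        // letter not available (any more)
        letters = letters.Remove(i, 1);   // use the letter up
    }
    return true;
}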
Here's a simple approach:
For your source word, create an array of size 26 and use it to count how many times each letter appears.
Do the same for each word in your dictionary.
Then compare the two.
If every letter occurs no more often in the dictionary word than in the source word, then the source word's letters can be used to make that dictionary word. If not, they cannot.
C# sketch (methods to place inside a class; needs using System.Collections.Generic):
/** Converts characters to a 0 to 25 code representing alphabet position.
    This is specific to the English language and would need to be modified if used
    for other languages. */
static int charToLetter(char c) {
    return char.ToUpperInvariant(c) - 'A';
}
/** Counts how many times each of the 26 letters appears in a word.
    Characters outside A-Z are ignored. */
static int[] countLetters(string word) {
    int[] counts = new int[26]; // Initialized to 0 automatically
    foreach (char c in word) {
        int letter = charToLetter(c);
        if (letter >= 0 && letter < 26) {
            counts[letter]++;
        }
    }
    return counts;
}
/** Given a source word and an array of other words to check, returns all
    words from the array which can be made from the letters of the source word. */
static List<string> checkSubWords(string source, string[] dictionary) {
    List<string> output = new List<string>();
    // Stores how many of each letter are in the source word.
    int[] sourcecount = countLetters(source);
    foreach (string s in dictionary) {
        // Stores how many of each letter are in the dictionary word.
        int[] dictcount = countLetters(s);
        // Then we check that there exist no letters which appear more in the
        // dictionary word than the source word.
        bool isSubWord = true;
        for (int i = 0; i < 26; i++) {
            if (dictcount[i] > sourcecount[i]) {
                isSubWord = false;
            }
        }
        // If they're all less than or equal to, then we add it to the output.
        if (isSubWord) {
            output.Add(s);
        }
    }
    return output;
}
If your definition of a word is any arbitrary permutation of the available characters, then why do you need a regex? Just make sure you use each character once. A regex doesn't know what a "correct word" is, and it's better for your algorithm to avoid using invalid characters in the first place than to use them AND then use a regex to make sure you didn't.
I am going through a directory, picking up some files, and then adding them to a Dictionary.
The first time through the loop the key needs to be A, the second time B, etc. After 26 (Z) the number represents different characters, and from 33 it starts at lowercase a, up to 49, which is lowercase q.
Without having a massive if statement saying if i == 1 then the key is 'A', and so on, how can I keep this code tidy?
Sounds like you just need to keep an index of where you've got to, then some mapping function:
int index = 0;
foreach (...)
{
    ...
    string key = MapIndexToKey(index);
    dictionary[key] = value;
    index++;
}
...
// Keys as per comments
private static readonly List<string> Keys =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopq"
        .Select(x => x.ToString())
        .ToList();
// This doesn't really need to be a separate method at the moment, but
// it means it's flexible for future expansion.
private static string MapIndexToKey(int index)
{
    return Keys[index];
}
EDIT: I've updated the MapIndexToKey method to make it simpler. It's not clear why you want a string key if you only ever use a single character though...
Another edit: I believe you could actually just use:
string key = ((char) (index + 'A')).ToString();
instead of having the mapping function at all, given your requirements, as the characters are contiguous in Unicode order from 'A'...
Keep incrementing from 101 to 132, ignoring the missing sequence, and convert the numbers to characters (see http://www.asciitable.com/).
Use the remainder (divide by 132) to identify the second loop.
This gives you the opportunity to map letters to specific numbers, perhaps not alphabet ordered.
var letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    .Select((chr, index) => new {character = chr, index = index + 1 });
foreach(var letter in letters)
{
    int index = letter.index;
    char chr = letter.character;
    // do something
}
How about:
for(int i=0; i<26; ++i)
{
    dict[(char)('A'+ (i % 26))] = GetValueFor(i);
}