Performance issue with DictionaryEntry looping - C#

I currently have a collection of 231,556 words, and I run the loop below to check every word for duplicates.
I am using this function:
public bool IsContainStringCIAI(string wordIn, HybridDictionary hd, out string wordOut)
{
    int iValue = 1;
    foreach (DictionaryEntry de2 in hd)
    {
        iValue = CultureInfo.CurrentCulture.CompareInfo.Compare(wordIn.ToLower(), de2.Key.ToString().ToLower(), CompareOptions.IgnoreNonSpace);
        if (iValue == 0)
        {
            wordOut = de2.Key.ToString(); //Assign the existing word
            return true;
        }
    }
    wordOut = wordIn;
    return false;
}
It takes around 20 hours to finish, because each word is compared against every word already in the dictionary and is added only if no match is found. Is there anything I can do to speed up this loop? Thanks in advance.

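Can you convert your HybridDictionary to a Dictionary<string, string> where each key is already converted into a form you can compare directly (lower case, unwanted characters stripped out, and so on) and each value is the original word? Then your method pretty much becomes this:
return hd.TryGetValue(wordIn.ToLower(), out wordOut); // normalize wordIn the same way as the keys
And Dictionary is pretty fast: a lookup is a hash probe (O(1) on average) instead of a comparison against every stored word ;]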
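A minimal sketch of that idea, assuming the goal is to reproduce the question's case-insensitive, accent-insensitive (CompareOptions.IgnoreNonSpace) comparison. The Normalize and ContainsOrAdd helpers are hypothetical names, and stripping nonspacing marks after FormD decomposition only approximates culture-aware comparison; it is not guaranteed to agree with CompareInfo.Compare in every culture.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class WordIndex
{
    // Approximates the question's comparison: lower-case with the invariant culture,
    // decompose accented characters (FormD), then drop the nonspacing marks,
    // e.g. "Café" -> "cafe".
    public static string Normalize(string word)
    {
        string decomposed = word.ToLowerInvariant().Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Hypothetical replacement for IsContainStringCIAI: one hash lookup instead of a full scan.
    // Returns true and the stored word if an equivalent word exists;
    // otherwise stores wordIn and returns false with wordOut = wordIn.
    public static bool ContainsOrAdd(Dictionary<string, string> words, string wordIn, out string wordOut)
    {
        string key = Normalize(wordIn);
        if (words.TryGetValue(key, out wordOut))
            return true;
        words.Add(key, wordIn);
        wordOut = wordIn;
        return false;
    }
}
```

Loading the 231,556 words through ContainsOrAdd costs one normalization and one hash lookup per word, so the whole pass grows roughly linearly instead of comparing each word against every word already stored.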

Related

I wonder what the exact meaning of "return" is

I've been studying recursive methods and the Dictionary class (I'm not sure these are the correct English terms, because I'm Korean), and there was an example in a C# book that calculates Fibonacci numbers.
class Fibonacci
{
    private static Dictionary<int, long> memo = new Dictionary<int, long>();
    public static long Get(int i)
    {
        if (i < 0) { return 0; }
        if (i == 1) { return 1; }
        if (memo.ContainsKey(i)) { return memo[i]; }
        else
        {
            long value = Get(i - 2) + Get(i - 1);
            memo[i] = value;
            return value;
        }
    }
}
I wonder which code puts the key and value into the dictionary. On the i < 0 and i == 1 lines, it seems to me that "return" does that job, but at
if (memo.ContainsKey(i)) { return memo[i]; }
"return" seems to pass (hand over?) the value when the same key already exists in the dictionary.
So I'm curious what "return" actually means. Help!
The i < 0 and i == 1 lines do not use the dictionary at all. For those early terms, the dictionary lookup would be slower than the calculation anyway.
The if (memo.ContainsKey(i)) { return memo[i]; } line also does not set anything new in the dictionary. It reads from the dictionary, but does not write to it, because either the value is already there or we don't know what to write yet. Today, I might update this code to use TryGetValue() instead, in order to save one hash calculation and lookup on success.
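A sketch of the TryGetValue variant described above. The class name MemoFibonacci is made up for illustration; the base cases are kept exactly as in the book's example, and only the lookup changes.

```csharp
using System.Collections.Generic;

public static class MemoFibonacci
{
    private static readonly Dictionary<int, long> memo = new Dictionary<int, long>();

    public static long Get(int i)
    {
        if (i < 0) { return 0; }
        if (i == 1) { return 1; }
        // TryGetValue tests for the key and reads the value in a single lookup,
        // instead of ContainsKey followed by memo[i].
        if (memo.TryGetValue(i, out long cached)) { return cached; }
        long value = Get(i - 2) + Get(i - 1);
        memo[i] = value; // the only line that writes to the dictionary
        return value;
    }
}
```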
The line that actually writes to the dictionary is this:
memo[i] = value;
The dictionary is named memo because the code uses a technique called memoization.
memo[i] = value; adds value to the dictionary. More precisely, it adds the key-value pair if the key didn't exist, or overwrites the value for the key if it did. Take a look:
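private static Dictionary<int, long> memo = new Dictionary<int, long>();
public static void AddToDictionary(int key, long value)
{
    System.Console.WriteLine($"Adding {key},{value} key-value pair.");
    memo[key] = value; // adds the pair, or overwrites the value if the key already exists
    foreach (var item in memo)
    {
        System.Console.WriteLine($"{item.Key}: {item.Value}");
    }
}
// Called as:
AddToDictionary(1, 2);
AddToDictionary(1, 8);
AddToDictionary(2, 2);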
Result:
Adding 1,2 key-value pair.
1: 2
Adding 1,8 key-value pair.
1: 8
Adding 2,2 key-value pair.
1: 8
2: 2

Best way to find if a value is present in the array, and if so execute code

I'm a student and I was wondering what the most efficient way is to check whether a certain value is present in an array.
My second attempt:
string value = "pow";
string[] array = new string[] { "pong", "ping", "pow" };
bool valueIsInArray = false;
foreach(var s in array) if (s == value) valueIsInArray = true;
if (valueIsInArray)
{
    // code here
}
I've researched and found if I were to use LINQ the code would look like this:
string value = "oink"; // value given to the method
string[] array = new string[] { "oink", "oink", "baboinkadoink" };
if (array.Contains(value))
{
    //code here
}
The question is whether using LINQ in any way negatively impacts the speed or consistency of the code, and whether there is an even better way to do this.
Use LINQ's Any(): the enumeration of the source stops as soon as the result can be determined.
using System.Linq; // needed for Any()
string value = "pow";
string[] array = new string[] { "pong", "ping", "pow" };
bool isValuePresent = array.Any(x => x == value);
https://msdn.microsoft.com/en-us/library/bb534972(v=vs.110).aspx
As a commenter said, LINQ won't really trouble you here. The difference is negligible (even on larger collections). However, if you must use an alternative, use IndexOf. See: https://msdn.microsoft.com/en-us/library/system.array.indexof(v=vs.110).aspx
Example:
string value = "oink"; // value given to the method
string[] array = new string[] { "oink", "oink", "baboinkadoink" };
if (Array.IndexOf(array, value) > -1)
{
    //code here
}
I'm not sure what Contains ends up doing under the hood, but it probably calls IndexOf as well.
Whatever you do, you have to scan the array up to the first match (or the entire array if there's no match). You can use a foreach loop:
bool valueIsInArray = false;
foreach (var item in array)
    if (item == value) {
        valueIsInArray = true;
        break;
    }
or a for loop:
bool valueIsInArray = false;
for (int i = 0; i < array.Length; ++i)
    if (array[i] == value) {
        valueIsInArray = true;
        break;
    }
Let the framework do it with Contains (an extension method from System.Linq):
bool valueIsInArray = array.Contains(value);
Or write it with LINQ's Any:
bool valueIsInArray = array.Any(item => item == value);
The difference between these methods is a matter of microseconds (if any), so choose the version that is most readable to you. My own choice is array.Contains(value): let the system do the work and hide unwanted details (e.g. the break in the loop).
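If the same array is checked for many different values, every option above rescans it each time. A sketch of the alternative, assuming the array does not change between checks: build a HashSet<string> once, after which each Contains call is O(1) on average. RepeatedLookup and CountPresent are illustrative names, not part of the original answers.

```csharp
using System;
using System.Collections.Generic;

public static class RepeatedLookup
{
    // Counts how many of the queries occur in the array.
    // The set is built once (O(n)); each Contains call afterwards is O(1) on average,
    // whereas array.Contains(query) would rescan the array for every query.
    public static int CountPresent(string[] array, string[] queries)
    {
        // Ordinal comparison matches the behavior of == on strings.
        var set = new HashSet<string>(array, StringComparer.Ordinal);
        int found = 0;
        foreach (string query in queries)
        {
            if (set.Contains(query))
                found++;
        }
        return found;
    }
}
```

For a single check, building the set costs more than one linear scan, so this only pays off when the lookups are repeated.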
You have to iterate through the array (all of it, if the value isn't there) to check for the value, either with LINQ or with conventional loops. You can also use Array.Find() for the same purpose (or Array.Exists(), which returns a bool directly). I'd go with LINQ, since it makes the code simpler.
Happy coding

Using LINQ on a string array to improve efficiency in C#

I have an equation string, and when I split it with my pattern I get the following string array:
string[] equationList = {"code1","+","code2","-","code3"};
Then from this I create a list which only contains the codes.
List<string> codeList = new List<string> {"code1","code2","code3"};
Then the existing code loops through codeList, retrieves the value of each code, and replaces that code in equationList with its value, using the code below:
foreach (var code in codeList) {
    var codeVal = GetCodeValue(code);
    for (var i = 0; i < equationList.Length; i++) {
        if (!equationList[i].Equals(code, StringComparison.InvariantCultureIgnoreCase)) continue;
        equationList[i] = codeVal;
        break;
    }
}
I am trying to improve the efficiency, and I believe I can get rid of the for loop within the foreach by using LINQ.
My question is: would that make the process any faster?
If so, can you please help with the LINQ statement?
Before jumping to LINQ (which doesn't solve any of the problems you've described), let's look at the logic you have here.
We split a string with a 'pattern'. How?
We then create a new list of codes. How?
We then loop through those codes and decode them. How?
But since we forgot to keep track of where those codes came from, we now loop through equationList (which is an array, not a List<T>) to substitute the results.
Seems a little convoluted to me.
Maybe a simpler solution would be:
Take in a string, and return an IEnumerable<string> of words (similar to what you do now).
Take in an IEnumerable<string> of words, and return an IEnumerable<?> of values.
That is, in this second step, iterate over the words and simply return the value you want for each one, rather than extracting certain values, parsing them, and then inserting them back into a collection.
//Ideally we return something more specific, e.g. IEnumerable<Token>
// IsOperator, IsCode, ToOperator and ToCode are helpers you would write yourself
public IEnumerable<string> ParseEquation(IEnumerable<string> words)
{
    foreach (var word in words)
    {
        if (IsOperator(word)) yield return ToOperator(word);
        else if (IsCode(word)) yield return ToCode(word);
        else throw new ArgumentException($"Unexpected word: {word}");
    }
}
This is quite similar to LINQ's Select; if you insist on LINQ, I would suggest something like this:
var tokens = equationList.Select(ToToken);
...
public Token ToToken(string word)
{
    if (IsOperator(word)) return ToOperator(word);
    else if (IsCode(word)) return ToCode(word);
    else throw new ArgumentException($"Unexpected word: {word}");
}
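To make the Select idea concrete, here is a compilable sketch. Token, OperatorToken, ValueToken and the operator set are hypothetical stand-ins for the types the answer leaves undefined, and the getCodeValue delegate stands in for the question's GetCodeValue. It assumes C# 9 or later (records and or-patterns).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical token types; the answer leaves Token, ToOperator and ToCode undefined.
public abstract record Token { }
public sealed record OperatorToken(char Op) : Token;
public sealed record ValueToken(string Value) : Token;

public static class EquationTokenizer
{
    // Assumed operator set for illustration.
    private static bool IsOperator(string word) => word is "+" or "-" or "*" or "/";

    // Maps one word to a token.
    public static Token ToToken(string word, Func<string, string> getCodeValue)
    {
        if (IsOperator(word)) return new OperatorToken(word[0]);
        return new ValueToken(getCodeValue(word)); // everything else is treated as a code
    }

    // The Select version: one pass over the words, no mutation of the input array.
    public static List<Token> Tokenize(IEnumerable<string> words, Func<string, string> getCodeValue) =>
        words.Select(w => ToToken(w, getCodeValue)).ToList();
}
```

Records give value equality for free, which makes the resulting token list easy to compare in tests.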
If GetCodeValue(code) doesn't already do so, it could probably use some sort of caching (e.g. a dictionary) in its implementation, though that depends on the specifics.
The benefits of this approach are that it is flexible (we can easily add more processing steps), simple to follow (we put in these values and get these as a result, with no mutating state) and easy to write. It also breaks the problem down into nice little chunks that each solve their own task, which helps immensely when refactoring or hunting down niggly bugs and performance issues.
If your array always alternates code, operator, code, and so on, this LINQ should do what you want:
string[] equationList = { "code1", "+", "code2", "-", "code3" };
var processedList = equationList.Select((s, j) => (j % 2 == 1) ? s : GetCodeValue(s)).ToArray();
You will need to measure whether it is actually faster.
I think the fastest solution will be this:
var codeCache = new Dictionary<string, string>();
for (var i = equationList.Length - 1; i >= 0; --i)
{
    var item = equationList[i];
    if (!IsCode(item)) // IsCode is a placeholder: you know which items are codes because you created the codeList
        continue;
    string codeVal;
    if (!codeCache.TryGetValue(item, out codeVal))
    {
        codeVal = GetCodeValue(item);
        codeCache.Add(item, codeVal);
    }
    equationList[i] = codeVal;
}
You don't need a codeList. If every code is unique, you can remove the codeCache.
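A self-contained version of that single-pass, cached substitution, sketched under a few assumptions: the caller supplies an isCode predicate (in place of the answer's validity check) and a getCodeValue delegate (in place of GetCodeValue), and codes compare case-insensitively as in the question's InvariantCultureIgnoreCase check. EquationSubstitution.Substitute is an illustrative name, and it writes to a new array rather than mutating equationList.

```csharp
using System;
using System.Collections.Generic;

public static class EquationSubstitution
{
    // Replaces every code in tokens with its value in one pass,
    // calling getCodeValue at most once per distinct code.
    public static string[] Substitute(string[] tokens, Func<string, bool> isCode, Func<string, string> getCodeValue)
    {
        var cache = new Dictionary<string, string>(StringComparer.InvariantCultureIgnoreCase);
        var result = new string[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
        {
            string token = tokens[i];
            if (!isCode(token))
            {
                result[i] = token; // operators pass through unchanged
                continue;
            }
            if (!cache.TryGetValue(token, out string value))
            {
                value = getCodeValue(token);
                cache.Add(token, value);
            }
            result[i] = value;
        }
        return result;
    }
}
```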

How to check if random values are unique?

C# code:
I have 20 random numbers between 1 and 100 in an array, and the program should check whether every value is unique. I need a separate method that returns true if all values in the array are unique and false if any value is repeated. I would appreciate it if someone could help me with this.
bool allUnique = array.Distinct().Count() == array.Count(); // or array.Length
or
var uniqueNumbers = new HashSet<int>(array);
bool allUnique = uniqueNumbers.Count == array.Count();
A small alternative to Tim Schmelter's excellent answers that can run a bit more efficiently:
// Declare inside a static class; All() needs using System.Linq;
public static bool AllUniq<T> (this IEnumerable<T> data) {
    HashSet<T> hs = new HashSet<T>();
    return data.All(hs.Add);
}
What this basically does is equivalent to this loop:
public static bool AllUniq<T> (this IEnumerable<T> data) {
    HashSet<T> hs = new HashSet<T>();
    foreach(T x in data) {
        if(!hs.Add(x)) {
            return false;
        }
    }
    return true;
}
As soon as one hs.Add call fails (because the element already exists), the method returns false; if no such element is found, it returns true.
This can be faster because it stops as soon as a duplicate is found, whereas the previously discussed approaches first construct the full collection of unique numbers and then compare the sizes. If you iterate over a large amount of numbers, constructing the entire distinct collection can be computationally intensive.
Furthermore, note that there are cleverer ways than generate-and-test to produce distinct random numbers, for instance interleaving the generate and test steps. A project I once had to correct generated Sudokus by generate-and-test; the result was that one had to wait entire days before it came up with a puzzle.
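A sketch of the interleaved approach for the question's case (20 distinct values between 1 and 100): each number is tested against a HashSet<int> as soon as it is drawn and rejected if already present, so the array never contains duplicates in the first place. DistinctRandom.Generate is an illustrative name; when count is close to the size of the range, shuffling the whole range would be the better technique.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctRandom
{
    // Interleaved generate-and-test: each candidate is checked right after it is drawn.
    public static int[] Generate(int count, int minInclusive, int maxInclusive, Random rng)
    {
        // Without enough distinct values in the range, the loop below could never finish.
        if (count > maxInclusive - minInclusive + 1)
            throw new ArgumentException("Not enough distinct values in the range.");
        var seen = new HashSet<int>();
        var result = new int[count];
        int filled = 0;
        while (filled < count)
        {
            int candidate = rng.Next(minInclusive, maxInclusive + 1); // upper bound is exclusive
            if (seen.Add(candidate)) // Add returns false for a value already drawn
                result[filled++] = candidate;
        }
        return result;
    }
}
```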
Here's a non-LINQ solution:
// Compares every pair once (O(n^2)) and prints a message for each repeated pair
for(int i=0; i< YourArray.Length;i++)
{
    for(int x=i+1; x< YourArray.Length; x++)
    {
        if(YourArray[i] == YourArray[x])
        {
            Console.WriteLine("Found repeated value");
        }
    }
}
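The question asks for a method that returns true or false rather than printing. The same nested loop can be wrapped that way, returning as soon as the first repeated value is found (UniqueCheck.AllUnique is an illustrative name). It is still O(n^2), which is harmless for 20 numbers.

```csharp
public static class UniqueCheck
{
    // Non-LINQ pairwise check: true if no value appears twice.
    public static bool AllUnique(int[] values)
    {
        for (int i = 0; i < values.Length; i++)
        {
            for (int x = i + 1; x < values.Length; x++)
            {
                if (values[i] == values[x])
                    return false; // stop at the first repeated value
            }
        }
        return true;
    }
}
```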

Returning potential strings and their next chars from a list of strings: performance

This is a question about efficiently returning, from a string array:
the strings that start with the supplied user input, and
the next letter of each of those strings, as a collection of chars.
The idea is that when the user types a letter, the potential responses are displayed along with their next letters. Response time is therefore important, so a performant algorithm is required.
For example, if the string array contained:
string[] stringArray = new string[] { "Moose", "Mouse", "Moorhen", "Leopard", "Aardvark" };
If the user types in “Mo”, then “Moose”, “Mouse” and “Moorhen” should be returned along with chars “o” and “u” for the potential next letters.
This felt like a job for LINQ, so my current implementation, as a static method, is as follows (I store the output in a Suggestions object, which just has properties for the two returned lists):
public static Suggestions GetSuggestions(String userInput, String[] stringArray)
{
    // Get all possible strings based on the user input. This will always contain
    // values which are the same length or longer than the user input.
    IEnumerable<string> possibleStrings = stringArray.Where(x => x.StartsWith(userInput));
    IEnumerable<char> nextLetterChars = null;
    // If we have possible strings and we have some input, get the next letter(s)
    if (possibleStrings.Any() &&
        !string.IsNullOrEmpty(userInput))
    {
        // the user input contains chars, so let's find the possible next letters.
        nextLetterChars =
            possibleStrings.Select<string, char>
            (x =>
            {
                // The input is the same as the possible string so return an empty char.
                if (x == userInput)
                {
                    return '\0';
                }
                else
                {
                    // Remove the user input from the start of the possible string, then get
                    // the next character.
                    return x.Substring(userInput.Length, x.Length - userInput.Length)[0];
                }
            });
    } // End if
    // (the rest of the method, which builds and returns the Suggestions object, is not shown)
}
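For comparison, a sketch of the same linear scan with a few adjustments, assuming ordinal (case-sensitive, culture-insensitive) matching is acceptable: the matches are materialized once instead of re-running the Where filter on every enumeration, exact matches are skipped instead of producing '\0', w[userInput.Length] replaces the Substring call, and Distinct() removes repeated letters so "Mo" yields o and u rather than o, u, o. Unlike the original, an empty input returns the first letters of all words. The tuple return type stands in for the Suggestions class, whose definition is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixSuggestions
{
    public static (List<string> Words, List<char> NextLetters) Get(string userInput, string[] words)
    {
        // Materialize once so later queries don't re-run the filter.
        List<string> matches = words
            .Where(w => w.StartsWith(userInput, StringComparison.Ordinal))
            .ToList();

        List<char> nextLetters = matches
            .Where(w => w.Length > userInput.Length) // exact matches have no next letter
            .Select(w => w[userInput.Length])        // same char as Substring(...)[0]
            .Distinct()
            .ToList();

        return (matches, nextLetters);
    }
}
```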
I implemented a second version that stores all typing combinations in a list of dictionaries, one per word, with each typed prefix as the key and the full animal name as the value, e.g.:
Dictionary 1:
Key      Value
“M”      “Moose”
“Mo”     “Moose”
etc.
Dictionary 2:
Key      Value
“M”      “Mouse”
“Mo”     “Mouse”
etc.
Since dictionary access has O(1) retrieval time, I thought this might be a better approach.
So, for loading the dictionaries at startup:
List<Dictionary<string, string>> animalCombinations = new List<Dictionary<string, string>>();
foreach (string animal in stringArray)
{
    Dictionary<string, string> animalCombination = new Dictionary<string, string>();
    string accumulatedAnimalString = string.Empty;
    foreach (char character in animal)
    {
        accumulatedAnimalString += character;
        animalCombination[accumulatedAnimalString] = animal;
    }
    animalCombinations.Add(animalCombination);
}
And then at runtime to get possible strings:
// From each per-word dictionary whose keys contain the user input,
// select the animal stored under that key and flatten into one list.
IEnumerable<string> possibleStrings =
    animalCombinations
        .Where(animalCombination => animalCombination.ContainsKey(userInput))
        .Select(animalCombination => animalCombination[userInput]);
So questions are:
Which approach is better?
Is there a better approach to this which has better performance?
Are LINQ expressions expensive to process?
Thanks
Which approach is better?
Probably the dictionary approach, but you'll have to profile to find out.
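One reason to profile: the list-of-dictionaries version still probes one dictionary per word, so each keystroke costs O(number of words). A sketch of a single dictionary from each prefix to all the words sharing it, which makes the lookup one hash probe at the cost of storing every prefix (PrefixIndex.Build is an illustrative name):

```csharp
using System.Collections.Generic;

public static class PrefixIndex
{
    // Maps every prefix ("M", "Mo", "Moo", ...) to the words that start with it.
    // Memory grows with the total number of characters across all words.
    public static Dictionary<string, List<string>> Build(IEnumerable<string> words)
    {
        var index = new Dictionary<string, List<string>>();
        foreach (string word in words)
        {
            for (int length = 1; length <= word.Length; length++)
            {
                string prefix = word.Substring(0, length);
                if (!index.TryGetValue(prefix, out List<string> matches))
                {
                    matches = new List<string>();
                    index.Add(prefix, matches);
                }
                matches.Add(word);
            }
        }
        return index;
    }
}
```

At runtime, index.TryGetValue(userInput, out var matches) returns the candidate words directly.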
Is there a better approach to this which has better performance?
Use a prefix tree.
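A minimal prefix tree (trie) sketch, assuming case-sensitive character matching. Each node maps the next character to a child and, to keep lookups simple, also stores every word passing through it; that trades memory for speed and is a design choice of this sketch, not a requirement of tries. Finding a prefix walks userInput.Length nodes regardless of how many words are stored, and the child keys of the final node are exactly the possible next letters. PrefixTree, Add and Find are illustrative names.

```csharp
using System.Collections.Generic;

public sealed class PrefixTree
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public readonly List<string> Words = new List<string>(); // every word with this node's prefix
    }

    private readonly Node root = new Node();

    public void Add(string word)
    {
        Node node = root;
        node.Words.Add(word);
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out Node child))
            {
                child = new Node();
                node.Children.Add(c, child);
            }
            node = child;
            node.Words.Add(word);
        }
    }

    // Returns the words starting with prefix and the possible next letters.
    public (IReadOnlyList<string> Words, IReadOnlyCollection<char> NextLetters) Find(string prefix)
    {
        Node node = root;
        foreach (char c in prefix)
        {
            if (!node.Children.TryGetValue(c, out node))
                return (new List<string>(), new List<char>()); // no word has this prefix
        }
        return (node.Words, node.Children.Keys);
    }
}
```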
Are LINQ expressions expensive to process?
Written correctly, they add very little overhead compared to imperative versions of the same code. Since they are easier to read, write, and maintain, they are usually the way to go.
