Simple(?) logic concerning HashSet - c#

I have a HashSet filled with about 50 posts which I want to insert two by two into my database (each pair is a title and a description that belong together). The problem is that I can't work out the logic. The code below may explain a little better what I have in mind:
foreach(string item in hash)
{
// Here something that assigns every uneven HashSet-post to item1, the even ones to item2
var NewsItem = new News
{
NewsTitle = item1,
NewsDescription = item2
};
dbContext db = new dbContext();
db.News.Add(NewsItem);
db.SaveChanges();
}

You cannot "pair up" items from hash-based containers, because from the logical standpoint these containers are ordered arbitrarily *.
Therefore, you need to pair up the titles and descriptions when you insert your data into hash sets, like this:
class Message {
public string Title {get;set;}
public string Description {get;set;}
public override int GetHashCode() {return 31*Title.GetHashCode()+Description.GetHashCode();}
public override bool Equals(object other) {
if (other == this) return true;
Message obj = other as Message;
if (obj == null) return false;
return Title.Equals(obj.Title) && Description.Equals(obj.Description);
}
}
ISet<Message> hash = new HashSet<Message>();
At this point you can insert messages into your hash set. The titles and descriptions will always be paired up explicitly, because each pair lives in a single Message object.
* The current implementation from Microsoft happens to enumerate items in insertion order as long as nothing has been removed, but this is an implementation detail that you should not rely on.
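As a lighter alternative to a hand-written Message class, the pairing can be sketched with value tuples, which already compare by value. This is only an illustration: the NewsPairs class and its Pair method are hypothetical names, and it assumes the strings arrive in an ordered collection (title, description, title, description, ...) rather than in a HashSet<string>.

```csharp
using System;
using System.Collections.Generic;

public static class NewsPairs
{
    // Hypothetical helper: turns a flat, ordered sequence
    // (title, description, title, description, ...) into a set of pairs.
    public static HashSet<(string Title, string Description)> Pair(IReadOnlyList<string> ordered)
    {
        var pairs = new HashSet<(string Title, string Description)>();
        // Step by two; a trailing element without a partner is ignored.
        for (int i = 0; i + 1 < ordered.Count; i += 2)
        {
            pairs.Add((ordered[i], ordered[i + 1]));
        }
        return pairs;
    }
}
```

Because the tuple carries both strings, a duplicate title with a different description is still treated as a distinct entry.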

I define the first item in the HashSet is odd(1), and the second even(2), etc.
Then a HashSet is not the right data structure. HashSets are not in any particular order, so if you need to extract the items sequentially, a plain List<string> would work.
That said, one way to do what you need is to use a for loop that gets the items two at a time:
using(dbContext db = new dbContext())
{
for(int i = 0; i < list.Count - 1; i += 2)
{
var NewsItem = new News
{
NewsTitle = list[i],
NewsDescription = list[i+1]
};
db.News.Add(NewsItem);
}
// Save once, while the context is still in scope
db.SaveChanges();
}

Related

Is it possible to speed up a from-where-select query?

I have two variants of the code.
The first:
struct pair_fiodat {string fio; string dat;}
List<pair_fiodat> list_fiodat = new List<pair_fiodat>();
// list filled 200.000 records, omitted.
foreach(string fname in XML_files)
{
// get FullName and Birthday from file. Omitted.
var usersLookUp = list_fiodat.ToLookup(u => u.fio, u => u.dat); // create map
var dates = usersLookUp[FullName];
if (dates.Count() > 0)
{
foreach (var dt in dates)
{
if (dt == BirthDate) return true;
}
}
}
and second:
struct pair_fiodat {string fio; string dat;}
List<pair_fiodat> list_fiodat = new List<pair_fiodat>();
// list filled 200.000 records, omitted.
foreach(string fname in XML_files)
{
// get FullName and Birthday from file. Omitted.
var members = from s in list_fiodat where s.fio == FullName && s.dat == BirthDate select s;
if (members.Count() > 0) return true;
}
Both do the same job: searching for a user by name and birthday.
The first one works very quickly.
The second is very slow (10x-50x slower).
Please tell me whether it is possible to speed up the second one.
I mean, maybe the list needs some special preparation?
I tried sorting: list_fiodat_sorted = list_fiodat.OrderBy(x => x.fio).ToList();, but...
I skipped your first test and changed Count() to Any(): Count() iterates the whole list, while Any() stops as soon as it finds one element.
public bool Test1(List<pair_fiodat> list_fiodat)
{
foreach (string fname in XML_files)
{
var members = from s in list_fiodat
where s.fio == fname & s.dat == BirthDate
select s;
if (members.Any())
return true;
}
return false;
}
If you want to optimize something, you have to give up some of the conveniences the language offers, because they are usually not free; they have a cost.
For example, a for loop is usually a little faster than foreach. It is a bit uglier, since you need an extra statement to get the variable, but it is faster. If you iterate over a very big collection, the cost of each iteration adds up.
LINQ is very powerful and wonderful to work with, but it has a cost. If you replace it with another for loop, you save time.
public bool Test2(List<pair_fiodat> list_fiodat)
{
for (int i = 0; i < XML_files.Count; i++)
{
string fname = XML_files[i];
for (int j = 0; j < list_fiodat.Count; j++)
{
var s = list_fiodat[j];
if (s.fio == fname & s.dat == BirthDate)
{
return true;
}
}
}
return false;
}
With normal collections there is no noticeable difference, and you would usually use foreach, LINQ and so on, but in extreme cases you must go low level.
In your first test, ToLookup is the key. It takes a long time. Think about it: you iterate over the whole list, creating and filling the map. That is bad in any case, but consider the case in which the item you are looking for is at the start of the list: you need only a few iterations to find it, yet you still spend time on every item of your list to create the map. Only in the worst case is the time similar, and even then the map version is slower because of the creation itself.
The map is worthwhile if you need, for example, all the items that match some condition, a list rather than a single item. You spend time creating the map once, but you use it many times, and each time you save time (a map is "direct access", whereas the for loop is "sequential").
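The "create once, use many times" idea can be sketched as follows: build a hash-based index of (name, birth date) pairs before the loop over the files, then probe it once per file. The PersonIndex class and its method names are hypothetical, and the sketch assumes both fields are plain strings as in the question.

```csharp
using System;
using System.Collections.Generic;

public static class PersonIndex
{
    // Build the index once, before looping over the files.
    // Value tuples compare by value, so they work as set elements.
    public static HashSet<(string Name, string BirthDate)> Build(
        IEnumerable<(string Name, string BirthDate)> records)
    {
        return new HashSet<(string Name, string BirthDate)>(records);
    }

    // Each lookup is a single hash probe instead of a scan of the whole list.
    public static bool Exists(
        HashSet<(string Name, string BirthDate)> index, string name, string birthDate)
    {
        return index.Contains((name, birthDate));
    }
}
```

With 200,000 records the index is built a single time, and every file then costs one Contains call rather than a pass over the list.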

How to return from Recursive Foreach Function

I am implementing recursion for one of my requirements, which is as follows.
There is one master table called Inventory, which has many records, say "Inventory A", "Inventory B", "Inventory C".
There is one more table called Inventory Bundle, which links one inventory with another. The Inventory Bundle table has two columns, SI and TI, which represent the Source Inventory ID and the Target Inventory ID.
Record Ex.
SI TI
A B
B C
In my requirement, if I click on any inventory, the associated inventories should also be fetched.
For example, if I click on B, then A and C should be fetched. I use the following recursive method to do this:
List<Guid> vmAllBundle = new List<Guid>();
List<Guid> vmRecursiveBundle = new List<Guid>();
List<Guid> processedList = new List<Guid>();
public List<Guid> GetAllRecursiveBundle(Guid invId, Guid originalInvId)
{
List<Guid> vmInvSrcBundleList = GetSourceInventory(invId); //Fetch to get All Related Source Inventories
List<Guid> vmInvTarBundleList = GetTargetInventory(invId); //Fetch to get All Related Target Inventories
vmAllBundle.AddRange(vmInvSrcBundleList);
vmAllBundle.AddRange(vmInvTarBundleList);
if (vmAllBundle.Contains(originalInvId))
vmAllBundle.Remove(originalInvId);
vmAllBundle = vmAllBundle.Distinct().ToList();
vmRecursiveBundle = vmAllBundle.ToList().Except(processedList).ToList();
foreach (Guid vmInvBundle in vmRecursiveBundle)
{
vmRecursiveBundle.Remove(vmInvBundle);
processedList.Add(vmInvBundle);
GetAllRecursiveBundle(vmInvBundle, originalInvId);
if (vmRecursiveBundle.Count == 0)
return vmAllBundle;
}
return null;
}
I am able to fetch the data using this method, but I am facing a problem while returning.
When it returns, it calls GetAllRecursiveBundle() within the foreach loop and continues to call it until all the items in vmAllBundle are finished. After this it exits the recursion.
This is new to me, so I am posting the question to ask whether this is normal behavior or whether some code logic has to be changed.
Modified Code
public List<Guid> GetAllRecursiveBundle(Guid invId, Guid originalInvId)
{
if (vmRecursiveBundle.Count > 0)
vmRecursiveBundle.Remove(invId);
List<Guid> vmInvSrcBundleList = GetSourceInventory(invId); //Fetch to get All Related Source Inventories
List<Guid> vmInvTarBundleList = GetTargetInventory(invId); //Fetch to get All Related Target Inventories
vmAllBundle.AddRange(vmInvSrcBundleList);
vmAllBundle.AddRange(vmInvTarBundleList);
if (vmAllBundle.Contains(originalInvId))
vmAllBundle.Remove(originalInvId);
vmAllBundle = vmAllBundle.Distinct().ToList();
vmRecursiveBundle = vmAllBundle.ToList().Except(processedList).ToList();
foreach (Guid vmInvBundle in vmRecursiveBundle)
{
processedList.Add(vmInvBundle);
GetAllRecursiveBundle(vmInvBundle, originalInvId);
if (vmRecursiveBundle.Count == 0)
break;
}
return vmAllBundle;
}
I am very surprised that your code runs at all.
You are modifying the list being iterated by a foreach - normally that would throw an exception.
foreach (Guid vmInvBundle in vmRecursiveBundle)
{
vmRecursiveBundle.Remove(vmInvBundle); // **CRASHES HERE**
}
Modifying the collection being iterated by the foreach is not allowed, and would be considered bad practice even if it were allowed (because it frequently causes bugs).
You could change to a for loop, which has no such scruples:
for (int i = 0; i < vmRecursiveBundle.Count; i++)
{
Guid vmInvBundle = vmRecursiveBundle[i];
vmRecursiveBundle.Remove(vmInvBundle); // **NO CRASH**
i--; // counteracts the i++ so the next Guid is not skipped
}
For further details, see What is the best way to modify a list in a 'foreach' loop?
Normally, a recursive method needs something like a break value that is checked on return, to signal the end of the recursion and to stop further recursive calls. I do not fully understand your code, so here is an example:
private string SearchFileRecursive(string directory, string fileToFind)
{
string filePath = null;
string[] files = Directory.GetFiles(directory);
string foundFile = files.FirstOrDefault( file => (0 == string.Compare(Path.GetFileName(file), fileToFind, true)));
if(string.IsNullOrEmpty(foundFile))
{ // not found
string[] subDirectories = Directory.GetDirectories(directory);
foreach(string subDirectory in subDirectories)
{
filePath = SearchFileRecursive(subDirectory, fileToFind);
if(!string.IsNullOrEmpty(filePath)) // found
break;
}
}
else
{ // found
filePath = foundFile; // Directory.GetFiles already returns the path including the directory
}
return filePath;
}
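For the inventory question itself, the same result can be sketched without shared mutable lists by tracking visited ids in a HashSet and working through a queue. This is an assumption-laden illustration, not the asker's code: BundleTraversal and CollectRelated are hypothetical names, and getLinked stands in for the combined results of GetSourceInventory and GetTargetInventory.

```csharp
using System;
using System.Collections.Generic;

public static class BundleTraversal
{
    // getLinked is a hypothetical stand-in for the GetSourceInventory and
    // GetTargetInventory calls: it returns every id directly linked to the given one.
    public static List<Guid> CollectRelated(Guid start, Func<Guid, IEnumerable<Guid>> getLinked)
    {
        var visited = new HashSet<Guid> { start };
        var pending = new Queue<Guid>();
        pending.Enqueue(start);
        var related = new List<Guid>();

        while (pending.Count > 0)
        {
            Guid current = pending.Dequeue();
            foreach (Guid next in getLinked(current))
            {
                // Add returns false for ids already seen, which also
                // protects against cycles such as A -> B -> A.
                if (visited.Add(next))
                {
                    related.Add(next);
                    pending.Enqueue(next);
                }
            }
        }
        return related; // everything reachable from start, excluding start itself
    }
}
```

No collection is modified while it is being enumerated, so the foreach problem described above does not arise.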

Generate unique list variable

I have a C# program with a list (List<string>) of unique strings. These strings are the names of different cases. It is not important what they are, but they have to be unique.
cases = new List<string> { "case1", "case3", "case4" };
Sometimes I read cases saved in a text format into my program, and sometimes a case stored on file has the same name as a case in my program. Then I have to rename the new case. Let's say that the name of the case I load from a file is case1.
The trouble is how to rename it without adding a large random string. In my case it should ideally be called case2, but I cannot find a good algorithm for that. I want to find the smallest number I can append that makes the name unique.
I would use a HashSet, which only accepts unique values.
List<string> cases = new List<string>() { "case1", "case3", "case4" };
HashSet<string> hcases = new HashSet<string>(cases);
string Result = Enumerable.Range(1, 100).Select(x => "case" + x).First(x => hcases.Add(x));
// Result is "case2"
In this sample I try to add the names case1 through case100 to the hash set and take the first one for which Add() succeeds.
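A variant of the same idea without the fixed upper bound of 100 might look like the sketch below. CaseNames and NextFree are hypothetical names, and the sketch assumes every name has the form prefix plus a positive integer.

```csharp
using System;
using System.Collections.Generic;

public static class CaseNames
{
    // Returns the first "<prefix><n>" (n = 1, 2, 3, ...) that is not in existing.
    public static string NextFree(IEnumerable<string> existing, string prefix = "case")
    {
        var taken = new HashSet<string>(existing);
        int n = 1;
        // Each Contains call is a hash lookup, so the loop stays cheap.
        while (taken.Contains(prefix + n))
        {
            n++;
        }
        return prefix + n;
    }
}
```

Unlike the Add-based one-liner, this version leaves the input collection untouched, so the caller decides when to store the new name.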
If you have a list of unique strings, consider using a HashSet<string> instead. Since you want incrementing numbers, it sounds as if you should actually use a custom class instead of a string, one that contains a name and a number property. Then you can increment the number, and if you want the full name, use Name + Number (or override ToString).
Let's say that class is Case; you could then fill a HashSet<Case>. HashSet.Add returns false on duplicates, so use a loop which increments the number until the case can be added.
Something like this:
var cases = new HashSet<Case>();
// fill it ...
// later you want to add one from file:
while(!cases.Add(caseFromFile))
{
// you will get here if the set already contained one with this name+number
caseFromFile.Number++;
}
A possible implementation:
public class Case
{
public string Name { get; set; }
public int Number { get; set; }
// other properties
public override string ToString()
{
return Name + Number;
}
public override bool Equals(object obj)
{
Case other = obj as Case;
if (other == null) return false;
return other.ToString() == this.ToString();
}
public override int GetHashCode()
{
return (ToString() ?? "").GetHashCode();
}
// other methods
}
The solution is quite simple. Get the max number of case currently stored in the list, increment by one and add the new value:
var max = myList.Max(x => Convert.ToInt32(x.Substring("case".Length))) + 1;
myList.Add("case" + max);
EDIT: For filling any "holes" within your collection you may use this:
// Assumes every entry has the form "case<N>"
var numbers = myList.Select(x => Convert.ToInt32(x.Substring("case".Length)))
.OrderBy(n => n)
.ToList();
int candidate = 1;
foreach (int n in numbers)
{
if (n > candidate) break; // hole found just before n
if (n == candidate) candidate++;
}
myList.Add("case" + candidate);
It sorts the numbers that follow "case" and walks through them, counting up from 1. The loop stops at the first number that is larger than the expected one, which means there is a hole in the list, and the new case gets the missing number. If there is no hole, the new case gets the next number after the largest one.

Parallel loop in c#, accessing the same variable

I have an Item object with a property called generator_list (a hash set of strings). I have 8000 objects, and for each object I'd like to see how its generator_list intersects with every other generator_list. Then I'd like to store the intersection counts in a List<int>, which will logically have 8000 elements.
The process takes about 8 minutes sequentially, and only a few minutes with parallel processing, but I don't think I'm doing the parallel part right, hence the question. Can anyone please tell me if and how I need to modify my code to take advantage of parallel loops?
The code for my Item object is:
public class Item
{
public int index { get; set; }
public HashSet<string> generator_list = new HashSet<string>();
}
I stored all my Item objects in a List<Item> items (8000 elements). I created a method that takes in items (the list I want to compare) and 1 Item (what I want to compare to), and it's like this:
public void Relatedness2(List<Item> compare, Item compare_to)
{
int compare_to_length = compare_to.generator_list.Count;
foreach (Item block in compare)
{
int block_length = block.generator_list.Count;
int both = 0; //this counts the intersection number
if (compare_to_length < block_length) //to make sure I'm looping
//over the smaller set
{
foreach (string word in compare_to.generator_list)
{
if (block.generator_list.Contains(word))
{
both = both + 1;
}
}
}
else
{
foreach (string word in block.generator_list)
{
if (compare_to.generator_list.Contains(word))
{
both = both + 1;
}
}
}
// I'd like to store the intersection number, both,
// somewhere so I can effectively use parallel loops
}
}
And finally, my Parallel forloop is:
Parallel.ForEach(items, (kk, state, index) => Relatedness2(items, kk));
Any suggestions?
Maybe something like this
public Dictionary<int, int> Relatedness2(IList<Item> compare, Item compare_to)
{
int compare_to_length = compare_to.generator_list.Count;
var intersectionData = new Dictionary<int, int>();
foreach (Item block in compare)
{
int block_length = block.generator_list.Count;
int both = 0;
if (compare_to_length < block_length)
{
foreach (string word in compare_to.generator_list)
{
if (block.generator_list.Contains(word))
{
both = both + 1;
}
}
}
else
{
foreach (string word in block.generator_list)
{
if (compare_to.generator_list.Contains(word))
{
both = both + 1;
}
}
}
intersectionData[block.index] = both;
}
return intersectionData;
}
And
List<Item> items = new List<Item>(8000);
//add to list
var dictionary = new ConcurrentDictionary<int, Dictionary<int, int>>();//thread-safe dictionary
var readOnlyItems = items.AsReadOnly();// if you sure you wouldn't modify collection, feel free use items directly
Parallel.ForEach(readOnlyItems, item =>
{
dictionary[item.index] = Relatedness2(readOnlyItems, item);
});
I assumed that index is unique.
I used dictionaries, but you may want to use your own classes.
In my example you can access the data in the following manner:
var intersectionData = dictionary[1]; // dictionary of intersections for the item with index 1
var countOfIntersectionItemIndex1AndItemIndex3 = dictionary[1][3];
var countOfIntersectionItemIndex3AndItemIndex7 = dictionary[3][7];
Don't forget that dictionary[i] may be missing: the indexer throws a KeyNotFoundException for an absent key, so use TryGetValue when in doubt.
Thread-safe collections are probably what you are looking for: http://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx
When working in multithreaded environment, you need to make sure that
you are not manipulating shared data at the same time without
synchronizing access.
the .NET Framework offers some collection classes that are created
specifically for use in concurrent environments, which is what you
have when you're using multithreading. These collections are
thread-safe, which means that they internally use synchronization to
make sure that they can be accessed by multiple threads at the same
time.
Source: Programming in C# Exam Ref 70-483, Objective 1.1: Implement multithreading and asynchronous processing, Using Concurrent collections
These are the following collections:
BlockingCollection<T>
ConcurrentBag<T>
ConcurrentDictionary<TKey, TValue>
ConcurrentQueue<T>
ConcurrentStack<T>
If your Item's index is contiguous and starts at 0, you don't need the Item class at all. Just use a List<HashSet<string>>; it'll take care of the indexes for you. This solution finds the intersection count between one item and all the others with a parallel LINQ query. It then runs that on every item of your collection in another parallel LINQ query. Like so:
var items = new List<HashSet<string>>
{
new HashSet<string> {"1", "2"},
new HashSet<string> {"2", "3"},
new HashSet<string> {"3", "4"},
new HashSet<string>{"1", "4"}
};
var intersects = items.AsParallel().AsOrdered().Select( //Outer loop to run on all items; AsOrdered keeps the results in index order
item => items.AsParallel().AsOrdered().Select( //Inner loop to calculate intersects
item2 => item.Intersect(item2).Count())
//This ToList will create a single List<int>
//with the intersects for that item
.ToList()
//This ToList will create the final List<List<int>>
//that contains all intersects.
).ToList();
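If you prefer an explicit loop over PLINQ, the same table of counts can be sketched with Parallel.For and a preallocated array. IntersectionCounts and Compute are hypothetical names. The sketch relies on each parallel iteration writing only to its own row, so no lock or concurrent collection is needed, and on the sets not being modified while the loop runs.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class IntersectionCounts
{
    // Returns counts[i][j] = number of strings shared by sets[i] and sets[j].
    public static int[][] Compute(IReadOnlyList<HashSet<string>> sets)
    {
        var counts = new int[sets.Count][];
        // Each iteration fills and stores only row i, so threads never
        // write to the same array element.
        Parallel.For(0, sets.Count, i =>
        {
            var row = new int[sets.Count];
            for (int j = 0; j < sets.Count; j++)
            {
                HashSet<string> a = sets[i];
                HashSet<string> b = sets[j];
                // Enumerate the smaller set and probe the larger one.
                HashSet<string> small = a.Count <= b.Count ? a : b;
                HashSet<string> large = a.Count <= b.Count ? b : a;
                int both = 0;
                foreach (string word in small)
                {
                    if (large.Contains(word)) both++;
                }
                row[j] = both;
            }
            counts[i] = row;
        });
        return counts;
    }
}
```

Row i then plays the role of the List<int> the question asks for, with one entry per compared item (including the item itself).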

Am I using Dictionary wrong? It seems too slow

I've used the VS profiler and noticed that the program spends ~40% of its time in the lines below.
I'm using title1 and color1 because either Visual Studio or ReSharper suggested doing so. Are there any performance issues in the code below?
Dictionary<Item, int> price_cache = new Dictionary<Item, int>();
....
string title1 = title;
string color1 = color;
if (price_cache.Keys.Any(item => item.Title == title1 && item.Color == color1))
{
price = price_cache[price_cache.Keys.First(item => item.Title == title1 && item.Color == color1)];
The problem is that your Keys.Any method iterates through all keys in your dictionary to find if there is a match. After that, you use the First method to do the same thing again.
Dictionary is suited for operations when you already have the key and want to get the value fast. In that case, it will calculate the hash code of your key (Item, in your case) and use it to "jump" to the bucket where your item is stored.
First, you need to make your custom comparer to let the Dictionary know how to compare items.
class TitleColorEqualityComparer : IEqualityComparer<Item>
{
public bool Equals(Item a, Item b)
{
// you might also check for nulls here
return a.Title == b.Title &&
a.Color == b.Color;
}
public int GetHashCode(Item obj)
{
// hash codes should be as well distributed as possible,
// but not too complicated to calculate
int hash = 17;
hash = hash * 31 + obj.Title.GetHashCode();
hash = hash * 31 + obj.Color.GetHashCode();
return hash;
}
}
Then, instantiate your dictionary using your custom comparer:
Dictionary<Item, int> price_cache =
new Dictionary<Item, int>(new TitleColorEqualityComparer());
From this point on, you can simply write:
Item some_item = GetSomeItem();
price_cache[some_item] = 5; // to quickly set or change a value
or, to search the dictionary:
Item item = GetSomeItem();
int price = 0;
if (price_cache.TryGetValue(item, out price))
{
// we got the price
}
else
{
// there is no such key in the dictionary
}
[Edit]
And to emphasize again: never iterate the Keys property to look for a key. If you do that, you don't need a Dictionary at all; you can simply use a list and get the same (or even slightly better) performance.
Try using an IEqualityComparer as shown in the sample code on this page: http://msdn.microsoft.com/en-us/library/ms132151.aspx and make it calculate the hash code based on the title and color.
As Jesus Ramos suggested (when he said use a different data structure), you could make the key a string that is a concatenation of the title and color, then concatenate the search string and look for that. It should be faster.
So a key could look like name1:FFFFFF (the name, a colon, then the hex of the color), then you would just format the search string the same way.
Replace your price_cache.Keys.Any() with price_cache.Keys.SingleOrDefault() and this way you can store the result in a variable, check for nullity and if not you already have the searched item instead of searching for it twice like you do here.
If you want fast access to your hashtable, you need to implement the GetHashCode and Equals methods:
public class Item
{
.....
public override int GetHashCode()
{
return (this.color.GetHashCode() + this.title.GetHashCode())/2;
}
public override bool Equals(object o)
{
if (this == o) return true;
var item = o as Item;
return (item != null) && (item.color == color) && (item.title== title) ;
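}
}
Access your dictionary like this:
Item item = ...// create sample item
int price = 0;
bool found = price_cache.ContainsKey(item); // hashed lookup, no iteration
price = price_cache[item]; // throws KeyNotFoundException if the key is missing
price_cache.TryGetValue(item, out price); // returns false instead of throwing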
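If only the title and color identify a price, another option is to key the dictionary on a value tuple, which already implements value-based Equals and GetHashCode. The sketch below is an illustration under that assumption; PriceCache, Set and TryGet are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class PriceCache
{
    // The tuple key hashes and compares both strings by value,
    // so no custom comparer or overrides are required.
    private static readonly Dictionary<(string Title, string Color), int> Cache =
        new Dictionary<(string Title, string Color), int>();

    public static void Set(string title, string color, int price)
    {
        Cache[(title, color)] = price; // inserts or overwrites
    }

    // Single hashed lookup; returns false when the pair is not cached.
    public static bool TryGet(string title, string color, out int price)
    {
        return Cache.TryGetValue((title, color), out price);
    }
}
```

This avoids both the scan over Keys and the second search with First, since one TryGetValue call answers "is it there?" and "what is the price?" together.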
