What is the point of Lookup<TKey, TElement>? - c#

MSDN explains Lookup like this:
A Lookup<TKey, TElement> resembles a Dictionary<TKey, TValue>. The difference is that a Dictionary<TKey, TValue> maps keys to single values, whereas a Lookup<TKey, TElement> maps keys to collections of values.
I don't find that explanation particularly helpful. What is Lookup used for?

It's a cross between an IGrouping and a dictionary. It lets you group items together by a key, but then access them via that key in an efficient manner (rather than just iterating over them all, which is what GroupBy lets you do).
For example, you could take a load of .NET types and build a lookup by namespace... then get to all the types in a particular namespace very easily:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
public class Test
{
    static void Main()
    {
        // Just types covering some different assemblies
        Type[] sampleTypes = new[] { typeof(List<>), typeof(string),
                                     typeof(Enumerable), typeof(XmlReader) };
        // All the types in those assemblies (Distinct, because List<> and string
        // live in the same assembly and would otherwise be scanned twice)
        IEnumerable<Type> allTypes = sampleTypes.Select(t => t.Assembly)
                                                .Distinct()
                                                .SelectMany(a => a.GetTypes());
        // Grouped by namespace, but indexable
        ILookup<string, Type> lookup = allTypes.ToLookup(t => t.Namespace);
        foreach (Type type in lookup["System"])
        {
            Console.WriteLine("{0}: {1}",
                type.FullName, type.Assembly.GetName().Name);
        }
    }
}
(I'd normally use var for most of these declarations, in normal code.)

One way to think about it is this: Lookup<TKey, TElement> is similar to Dictionary<TKey, Collection<TElement>>. Basically, a list of zero or more elements can be returned via the same key.
namespace LookupSample
{
    using System;
    using System.Collections.Generic;
    using System.Linq;
    class Program
    {
        static void Main(string[] args)
        {
            List<string> names = new List<string>();
            names.Add("Smith");
            names.Add("Stevenson");
            names.Add("Jones");
            ILookup<char, string> namesByInitial = names.ToLookup((n) => n[0]);
            // count the names
            Console.WriteLine("J's: {0}", namesByInitial['J'].Count()); // 1
            Console.WriteLine("S's: {0}", namesByInitial['S'].Count()); // 2
            Console.WriteLine("Z's: {0}", namesByInitial['Z'].Count()); // 0, does not throw
        }
    }
}
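The Dictionary<TKey, Collection<TElement>> comparison can be made concrete with a small sketch (the names are reused from the example above; the hand-built dictionary of lists is just one way to build the equivalent). Both structures answer "which names start with S?", but they differ on a missing key: the lookup yields an empty sequence, while the dictionary indexer throws KeyNotFoundException. The lookup is also read-only once ToLookup has built it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "Smith", "Stevenson", "Jones" };

// Lookup: built in one go by ToLookup and read-only afterwards.
ILookup<char, string> lookup = names.ToLookup(n => n[0]);

// A roughly equivalent dictionary of lists, built by hand.
var dictionary = new Dictionary<char, List<string>>();
foreach (string name in names)
{
    if (!dictionary.TryGetValue(name[0], out var list))
    {
        list = new List<string>();
        dictionary[name[0]] = list;
    }
    list.Add(name);
}

int sViaLookup = lookup['S'].Count();          // 2
int sViaDictionary = dictionary['S'].Count;    // 2
int zViaLookup = lookup['Z'].Count();          // 0: a missing key gives an empty sequence

bool dictionaryThrew = false;
try
{
    _ = dictionary['Z'];                       // a missing key on a Dictionary throws
}
catch (KeyNotFoundException)
{
    dictionaryThrew = true;
}

Console.WriteLine($"{sViaLookup} {sViaDictionary} {zViaLookup} {dictionaryThrew}");
```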

One use of Lookup could be to reverse a Dictionary.
Suppose you have a phonebook implemented as a Dictionary with a bunch of (unique) names as keys, each name associated with a phone number. But two people with different names might share the same phone number. This isn't a problem for a Dictionary, which doesn't care that two keys correspond to the same value.
Now you want a way of looking up who a given phone number belongs to. You build a Lookup from all the KeyValuePairs in your Dictionary, but backwards, with the value as the key and the key as the value. You can now query a phone number and obtain a list of the names of all the people who have that number. Building a Dictionary with the same data would drop data (or fail, depending on how you did it), since doing
dictionary["555-6593"] = "Dr. Emmett Brown";
dictionary["555-6593"] = "Marty McFly";
means that the second entry overwrites the first - the Doc is no longer listed.
Trying to write the same data in a slightly different way:
dictionary.Add("555-6593", "Dr. Emmett Brown");
dictionary.Add("555-6593", "Marty McFly");
would throw an exception on the second line since you can't Add a key which is already in the Dictionary.
[Of course, you might want to use some other single data structure to do lookups in both directions, etc. This example means that you have to regenerate the Lookup from the Dictionary each time the latter changes. But for some data it could be the right solution.]
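Reversing the phonebook takes a single ToLookup call, using the overload that takes both a key selector and an element selector. A minimal sketch; apart from the two names used in the text above, the entries are made-up sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Name -> phone number: names are unique, numbers are not (sample data).
var phonebook = new Dictionary<string, string>
{
    ["Dr. Emmett Brown"] = "555-6593",
    ["Marty McFly"] = "555-6593",
    ["Biff Tannen"] = "555-1955",
};

// Reverse it: each value becomes a key, each key becomes an element.
ILookup<string, string> namesByNumber = phonebook.ToLookup(kv => kv.Value, kv => kv.Key);

foreach (string name in namesByNumber["555-6593"])
    Console.WriteLine(name);   // both people who share the number are listed
```

Nothing is overwritten and nothing throws, because a Lookup keeps every element that maps to the same key.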

I haven't successfully used it before, but here is my go:
A Lookup<TKey, TElement> would behave pretty much like a (relational) database index on a table without a unique constraint. Use it in the same places you would use the other.

I guess you could argue it this way: imagine you're creating a data structure to hold the contents of a phone book. You want to key by lastName and then by firstName. Using a dictionary here would be dangerous because many people can have the same name. So a Dictionary will always, at most, map to a single value.
A Lookup will map to potentially several values.
A lookup keyed on both names, such as lookup[("Smith", "John")], will be a collection of size one billion.
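Because the ILookup indexer returns an IEnumerable<TElement>, a single lookup cannot be indexed twice in a row. One way to key by last name and first name is a tuple key (C# 7 value tuples assumed); another is to key by last name only and filter each group. A sketch with made-up people:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new[]
{
    (First: "John", Last: "Smith", Phone: "555-0101"),
    (First: "John", Last: "Smith", Phone: "555-0102"),
    (First: "Jane", Last: "Smith", Phone: "555-0103"),
    (First: "John", Last: "Jones", Phone: "555-0104"),
};

// Key on (last name, first name); people with the same name are all kept.
var byFullName = people.ToLookup(p => (p.Last, p.First));

// Alternative: key on last name alone and filter the group by first name.
var byLastName = people.ToLookup(p => p.Last);

int johnSmiths = byFullName[("Smith", "John")].Count();                  // 2
int johnSmithsAgain = byLastName["Smith"].Count(p => p.First == "John"); // 2

Console.WriteLine($"{johnSmiths} {johnSmithsAgain}");
```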

To add a bit more:
ToLookup executes immediately and caches the result in memory. GroupBy uses deferred execution: it does not cache the grouped result, and it regroups the source every time you enumerate it.
If you need to access the same grouped, unchanging data repeatedly, choose ToLookup to get a Lookup instance. Especially when the amount of data is large or you access it many times, re-enumerating a GroupBy query regroups everything each time and can cause serious performance problems; ToLookup costs more memory, but its cached grouping results will give your code better performance.
BTW: a Lookup can sometimes be used as an "EasyDictionary", because it doesn't throw an exception for a nonexistent key.
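The difference can be observed directly. In this sketch (the counter and the sample numbers are purely illustrative), the source list is changed after both the GroupBy query and the Lookup are created: the GroupBy query sees the change and runs its key selector again on every enumeration, while the Lookup keeps the groups it computed when ToLookup was called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3, 4 };
int keyCalls = 0;

// Deferred: nothing runs yet; every enumeration regroups the current source.
IEnumerable<IGrouping<bool, int>> deferred =
    numbers.GroupBy(n => { keyCalls++; return n % 2 == 0; });

// Immediate: the groups are computed and stored right here.
ILookup<bool, int> snapshot = numbers.ToLookup(n => n % 2 == 0);

numbers.Add(6); // change the source after both were created

int evensViaGroupBy = deferred.Single(g => g.Key).Count(); // regrouped, sees the 6
int evensViaLookup = snapshot[true].Count();                // cached, does not
int evensAgain = deferred.Single(g => g.Key).Count();      // regroups a second time

Console.WriteLine($"GroupBy: {evensViaGroupBy}, ToLookup: {evensViaLookup}");
Console.WriteLine($"GroupBy key selector calls: {keyCalls}"); // 5 items x 2 enumerations
```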

Related

How is a SortedList with SortedLists as keys ordered? (C#)

I'm trying to create a SortedList that contains SortedLists as keys and I want them to be ordered by the first element on the key.
However, I do not know how to find information about how different sorted lists compare to each other (do they use the pointer/address value? Do they use the first element of the list? Etc.)
I thought about creating a class that inherits from SortedList (so the keys of the outer SortedList would be of this new custom class type), but I'm struggling because I cannot find what characteristics the key and value must have (to define the generic constraints).
Thanks in advance!
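As juharr said, using a SortedList as a key in a SortedList does not give a useful ordering by default. SortedList does not implement IComparable, so unless you pass the outer list a custom IComparer, the default comparer has no way to compare two such keys and adding a second key throws.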
I did not manage to sort it the way I wanted, but I got an answer.
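One way to get the ordering asked for in the question is to give the outer SortedList an IComparer that looks at the first key of each inner list. A minimal sketch, assuming the inner lists are SortedList<int, string> (adjust to your types); Comparer<T>.Create needs .NET Framework 4.5 or later, otherwise a small class implementing IComparer<T> does the same job.

```csharp
using System;
using System.Collections.Generic;

// Orders inner lists by their first (smallest) key; empty lists sort before non-empty ones.
// Assumption: the inner lists are SortedList<int, string>; adjust the types to your data.
IComparer<SortedList<int, string>> byFirstKey = Comparer<SortedList<int, string>>.Create((a, b) =>
{
    if (a.Count == 0 || b.Count == 0)
        return a.Count.CompareTo(b.Count);
    return a.Keys[0].CompareTo(b.Keys[0]);
});

var outer = new SortedList<SortedList<int, string>, string>(byFirstKey);
outer.Add(new SortedList<int, string> { { 5, "e" }, { 9, "i" } }, "starts at 5");
outer.Add(new SortedList<int, string> { { 2, "b" } }, "starts at 2");
outer.Add(new SortedList<int, string> { { 7, "g" }, { 3, "c" } }, "starts at 3");

foreach (string description in outer.Values)
    Console.WriteLine(description);   // starts at 2, starts at 3, starts at 5
```

Two inner lists with the same first key compare as equal, so the outer SortedList treats them as one key and Add throws an ArgumentException. Changing an inner list after it has been used as a key can also silently break the outer list's ordering.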

Linq to get list of custom objects where a certain value contained within dictionary property equals a specific value

I am struggling to solve this issue and have searched multiple ways and cannot seem to find an answer. I inherited this app from someone else and need to add a couple features to the app. I have not worked much with dictionaries and linq before, so I have been searching and trying to gain knowledge to do what I need to do.
There is a class with the following properties (I removed some properties not necessary for this discussion):
class EmailRecord
{
    public Dictionary<string, List<string>> emails = new Dictionary<string, List<string>>();
    public string RecordID { get; set; }
    [followed by additional properties and constructors...]
When the objects are created, the emails property holds a template string in the key and a list of strings containing email addresses in the value. For my purposes, I do not need to know what is in the key.
I have a list of EmailRecord objects called allRecords. I need to query allRecords to get a list of all EmailRecord objects where the emails dictionary property's list of values contains a specific email address I have stored in a variable called recipientEmail. The key doesn't matter, and it doesn't matter how many times the email shows up. I just need the instance of the object included in the results if the email shows up anywhere in the values of the emails property. In an instance of EmailRecord, the emails dictionary property may have two keys and within each of those keys, multiple emails in a list of strings for the value. I don't need to limit to a specific key, I just need to know if an email exists anywhere within the list of email strings anywhere in that dictionary.
I've tried a few things, with the latest being this (which doesn't work):
var results = EmailRecords
    .SelectMany(x => x.emails)
    .Where(x => x.Value.Contains(recipientEmail));
The above just seems to be returning the dictionary property, not the entire object.
I want to be able to loop through the results with something like this:
foreach (EmailRecord foundRecord in results) {
    ...do work here
}
Any thoughts or suggestions to assist me as I am trying to learn Linq? Thank you in advance for any help you can provide.
If you want to loop through the EmailRecord objects where one of the emails property values contains recipientEmail, you need a list of EmailRecord objects first, and then search through them. The following should do the trick:
List<EmailRecord> EmailRecords = new List<EmailRecord>();
//Fill the EmailRecords somewhere
foreach (EmailRecord foundRecord in EmailRecords)
{
    // Skip records where no list in the emails dictionary contains the address
    if (!foundRecord.emails.Any(x => x.Value.Contains(recipientEmail)))
        continue;
    //do work here
}
When you call EmailRecords.SelectMany(x => x.Emails) what you get back is an IEnumerable<KeyValuePair<string, List<string>>> or similar. Obviously this is not what you're after for your result since it strips away all that other information.
With LINQ, the first thing to consider at each stage is what you expect to get out of the query. In this case the query results should be an enumeration of EmailRecord instances, which is also what we're feeding in. Filtering that list is most simply done with the Where method, so that's where you should do all the work.
Next decide on your filter criteria and write the filter predicate to suit. For any given EmailRecord we want to find out if any of the dictionary entries contains a particular email address. Since the dictionary values are lists we'll use Contains to do the actual comparison, and Any to test the dictionary itself.
Which looks like this:
var filtered = EmailRecords.Where(e =>
    e.Emails.Any(kv =>
        kv.Value.Contains(recipientEmail)
    )
);
This works because a dictionary is also an enumerable, with each entry in the enumeration being a key/value pair.
Using Any will stop when it finds a single matching entry instead of continuing to the end of the Emails dictionary for every EmailRecord instance. If there are a lot of emails and you're expecting a high number of selections, this might save some time. Probably not, however, since generally this sort of structure doesn't have a lot of duplicate email addresses in it.
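A self-contained version of that query that can be run as-is. Since the question's EmailRecord class is only partly shown, this sketch stands it in with a tuple of RecordID and an Emails dictionary; the records and addresses are made-up sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string recipientEmail = "doc@example.com";

// Stand-in for EmailRecord: (RecordID, Emails), where Emails maps a template to addresses.
var emailRecords = new List<(string RecordID, Dictionary<string, List<string>> Emails)>
{
    ("r1", new Dictionary<string, List<string>>
    {
        ["welcome"] = new List<string> { "marty@example.com" },
        ["reminder"] = new List<string> { "doc@example.com", "biff@example.com" },
    }),
    ("r2", new Dictionary<string, List<string>>
    {
        ["welcome"] = new List<string> { "biff@example.com" },
    }),
    ("r3", new Dictionary<string, List<string>>
    {
        ["welcome"] = new List<string> { "doc@example.com" },
        ["reminder"] = new List<string> { "doc@example.com" },
    }),
};

// Keep whole records whose dictionary has the address in any of its value lists.
var filtered = emailRecords
    .Where(e => e.Emails.Any(kv => kv.Value.Contains(recipientEmail)))
    .ToList();

foreach (var foundRecord in filtered)
    Console.WriteLine(foundRecord.RecordID);   // r1, then r3
```

Record r3 lists the address under two keys but is returned once, because Where tests each record a single time.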
Depending on how often you want to do this, however, it might be quicker to build a lookup and query that. Assuming that your EmailRecords list changes infrequently and you are doing a lot of this sort of lookup, you could get a large speedup.
I'll use a Dictionary<string, EmailRecord[]> for the lookup because it's (fairly) simple to build once we get a list of all of the pairs of email address and EmailRecord objects:
var emailReferences = EmailRecords.SelectMany(e =>
    e.Emails.SelectMany(kv =>
        kv.Value.Select(v =>
            new { address = v, record = e }
        )
    )
);
var lookup =
    emailReferences
        .GroupBy(i => i.address, i => i.record)
        // Distinct: a record that lists the same address under two keys appears only once
        .ToDictionary(g => g.Key, g => g.Distinct().ToArray());
From this you will be able to locate an email address and get its referencing EmailRecord instances fairly simply and quickly:
EmailRecord[] filtered = null;
lookup.TryGetValue(recipientEmail, out filtered);
This will be faster per lookup than the LINQ equivalent above, but the setup could consume a fair amount of time and memory for large lists. If you have small or frequently changing lists (since the lookup has to be regenerated or at least invalidated at each change) then this won't improve your program's speed.
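Since this is exactly the job Lookup was made for, the same index could also be built with ToLookup instead of GroupBy plus ToDictionary (a swapped-in alternative, not the code above). An unknown address then gives an empty sequence instead of leaving filtered null, but the lookup still has to be rebuilt whenever the records change. This sketch uses a tuple stand-in for EmailRecord and made-up data; Distinct keeps a record from being listed twice when it mentions an address under more than one key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for EmailRecord: (RecordID, Emails).
var emailRecords = new List<(string RecordID, Dictionary<string, List<string>> Emails)>
{
    ("r1", new Dictionary<string, List<string>>
    {
        ["welcome"] = new List<string> { "doc@example.com" },
        ["reminder"] = new List<string> { "doc@example.com", "marty@example.com" },
    }),
    ("r2", new Dictionary<string, List<string>>
    {
        ["welcome"] = new List<string> { "marty@example.com" },
    }),
};

// Flatten to (address, record) pairs, once per address per record, then index by address.
var recordsByAddress = emailRecords
    .SelectMany(e => e.Emails.Values
        .SelectMany(addresses => addresses)
        .Distinct()
        .Select(address => (Address: address, Record: e)))
    .ToLookup(p => p.Address, p => p.Record);

foreach (var record in recordsByAddress["marty@example.com"])
    Console.WriteLine(record.RecordID);   // r1, then r2

Console.WriteLine(recordsByAddress["nobody@example.com"].Count());   // 0, no null check needed
```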
As a completely unsolicited aside, here's an extension method I use when dealing with dictionaries that have List<> as the value:
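public static partial class extensions
{
    // Adds value to the list stored under key, creating that list the first time the key is seen.
    public static Dictionary<TKey, List<TElem>> Add<TKey, TElem>(this Dictionary<TKey, List<TElem>> dict, TKey key, TElem value)
    {
        // TryGetValue needs one hash lookup instead of ContainsKey followed by the indexer
        if (!dict.TryGetValue(key, out List<TElem> list))
            dict[key] = list = new List<TElem>();
        list.Add(value);
        // Returning the dictionary lets calls be chained
        return dict;
    }
}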
It helps make your adds simpler and easier to read.

Which data structure/class I can use to represent one-to-many relation?

I am trying to write a program that uses a data structure/class to hold multiple data entries for one key. This will be somewhat similar to a Dictionary, but the relation is one-to-many rather than one-to-one. I am trying to think of a class that I can use, but I cannot figure anything out.
For instance, here is how it might look:
I have a parameter xValue and 3 different values in different files, so I would have:
xValue, <1.txt, 1>
xValue, <2.txt, 2>
xValue, <3.txt, 3>
Any ideas ?
EDIT:
I have figured this out: after all, I can use Dictionary<string, Dictionary<..., ...>>, can't I?
As there is no multiset in .NET natively, I would go for Dictionary<Key, HashSet<XValue>> in your case.
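A minimal sketch of that suggestion with the question's sample data. Storing each value as a (File, Value) tuple is an assumption about what the entries mean, and the helper name Add is hypothetical.

```csharp
using System;
using System.Collections.Generic;

var index = new Dictionary<string, HashSet<(string File, int Value)>>();

// Hypothetical helper: create the set for a key on first use, then add to it.
void Add(string key, string file, int value)
{
    if (!index.TryGetValue(key, out var set))
    {
        set = new HashSet<(string File, int Value)>();
        index[key] = set;
    }
    set.Add((file, value));   // an exact duplicate pair is ignored by the HashSet
}

Add("xValue", "1.txt", 1);
Add("xValue", "2.txt", 2);
Add("xValue", "3.txt", 3);
Add("xValue", "3.txt", 3);   // duplicate, not stored twice

foreach (var (file, value) in index["xValue"])
    Console.WriteLine($"{file}: {value}");
```

A HashSet does not guarantee enumeration order; use a List instead if the order of the entries matters or duplicates should be kept.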
If you are OK with using 3rd-party containers, you can look at the answers here, e.g., Wintellect PowerCollections.
If you do not need to modify this collection after initialization and only need to search it, you can leverage the built-in Lookup<TKey, TElement> class. It is tricky and useful only in rare cases, when you already have IEnumerable<> instances and want to flatten them into a lookup data structure, but it is still worth keeping in mind that .NET provides such an interesting class.
MSDN:
Represents a collection of keys each mapped to one or more values. A Lookup<TKey, TElement> resembles a Dictionary<TKey, TValue>. The difference is that a Dictionary<TKey, TValue> maps keys to single values, whereas a Lookup<TKey, TElement> maps keys to collections of values.
You cannot instantiate it explicitly; you can only get an instance through the LINQ ToLookup() method. There are major restrictions, so you can only use this class as a lookup data structure, i.e. for searching:
There is no public constructor to create a new instance of a Lookup. Additionally, Lookup objects are immutable, that is, you cannot add or remove elements or keys from a Lookup object after it has been created.
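For the question's data, if all entries are known up front, the Lookup route might look like this sketch (the anonymous-type rows are an assumed shape for the input):

```csharp
using System;
using System.Linq;

var rows = new[]
{
    new { Key = "xValue", File = "1.txt", Value = 1 },
    new { Key = "xValue", File = "2.txt", Value = 2 },
    new { Key = "xValue", File = "3.txt", Value = 3 },
};

// Built once by ToLookup; afterwards it can only be searched, not modified.
var lookup = rows.ToLookup(r => r.Key, r => (r.File, r.Value));

foreach (var (file, value) in lookup["xValue"])
    Console.WriteLine($"{file}: {value}");   // 1.txt: 1, 2.txt: 2, 3.txt: 3

Console.WriteLine(lookup["yValue"].Any());   // False: an unknown key gives an empty sequence
```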

Membase server And Enyim -- Integrating LINQ and/or Collections

I just installed membase and the enyim client for .NET, and came across an article that mentions this technique for integrating linq:
public static IEnumerable<T> CachedQuery<T>
    (this IQueryable<T> query, MembaseClient cache, string key) where T : class
{
    object result;
    if (cache.TryGet(key, out result))
    {
        return (IEnumerable<T>)result;
    }
    else
    {
        IEnumerable<T> items = query.ToList();
        cache.Store(StoreMode.Set, key, items);
        return items;
    }
}
It checks whether the required data is already in the cache; if not, it runs the query, caches the result and then returns it.
Currently I am using a Dictionary<String, List> in my application and want to replace this with a membase/memcached type approach.
What about a similar pattern for adding items to a List<T> or using Linq operators on a cached list? It seems to me that it could be a bad idea to store an entire List<T> in the cache under a single key and have to retrieve it, add to it, and then re-set it each time you want to add an element. Or is this an acceptable practice?
public bool Add(T item)
{
    object list;
    if (cache.TryGet(this.Key, out list))
    {
        var _list = list as List<T>;
        _list.Add(item);
        return cache.Store(StoreMode.Set, this.Key, _list);
    }
    else
    {
        var _list = new List<T>(new T[] { item });
        return cache.Store(StoreMode.Set, this.Key, _list);
    }
}
How are collections usually handled in a caching situation like this? Are hashing algorithms usually used instead, or some sort of key-prefixing system to identify 'Lists' of type T within the key-value store of the cache?
It depends on several factors:
Is this supposed to be scalable? Is this list user-specific, so that you can be certain "Add" won't be called twice at the same time for the same list? Otherwise race conditions are a risk.
I did implement such a thing where I stored a generic list in membase, but it's user-specific, so I can be pretty certain that there will be no race condition.
You should also consider the size of the serialized list, which may be large. In my case the lists were pretty small.
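The race in the question's Add method (get the list, append, store it back) can lose updates when two writers interleave. memcached-style stores offer a check-and-set (CAS) operation for this situation; check your client's documentation for the exact calls. The sketch below only illustrates the retry pattern in-process: a ConcurrentDictionary stands in for the cache and its TryUpdate plays the role of check-and-set. It is not the Enyim client API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// In-process stand-in for the cache (assumption: this is NOT the Enyim/Membase API).
var cache = new ConcurrentDictionary<string, List<int>>();

// Read-modify-write guarded by compare-and-swap: copy the current list, append to the copy,
// and store it only if no other writer replaced the list in the meantime; otherwise retry.
bool AddItem(string key, int item)
{
    while (true)
    {
        if (!cache.TryGetValue(key, out var current))
        {
            if (cache.TryAdd(key, new List<int> { item }))
                return true;
            continue;   // another writer created the list first; retry against it
        }

        var updated = new List<int>(current) { item };   // never mutate a published list
        if (cache.TryUpdate(key, updated, current))
            return true;
        // the list was replaced concurrently; loop and retry with the newer one
    }
}

// 1000 concurrent appends to the same key; a naive get/append/store could lose some of them.
Parallel.For(0, 1000, i => AddItem("list", i));
Console.WriteLine(cache["list"].Count);   // 1000
```

Copying the whole list on every append also shows the cost the question worries about: the bigger the serialized list, the more each update has to move.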
Not sure if it helps, but I implemented a very basic iterable list with random access over membase (via double indirection). Random access is done via a composite key (which is composed of several fields).
You need to:
Have a key that holds the list's length.
Have the ability to build the composite key (e.g. one or more fields from your object).
Have the value that you'd like to save (e.g. another field).
E.g:
list_length = 3
prefix1_0-> prefix2_[field1.value][field2.value][field3.value] -> field4.value
prefix1_1-> prefix2_[field1.value][field2.value][field3.value] -> field4.value
prefix1_2-> prefix2_[field1.value][field2.value][field3.value] -> field4.value
To perform serial access you iterate over the keys with "prefix1". To perform random access you use the keys with "prefix2" and the fields that compose the key.
I hope it's clear enough.
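The scheme above can be sketched in a few lines. A Dictionary<string, string> stands in for the membase store, and the key formats and helper names (IndexKey, CompositeKey, Length, Append, Iterate) are hypothetical illustrations of the prefix1/prefix2 layout, not the original implementation.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the key-value store (assumption: a plain Dictionary, not the Membase client).
var store = new Dictionary<string, string>();

// Hypothetical key builders following the prefix1/prefix2 layout.
string IndexKey(int position) => $"prefix1_{position}";
string CompositeKey(string f1, string f2, string f3) => $"prefix2_[{f1}][{f2}][{f3}]";
int Length() => store.TryGetValue("list_length", out var text) ? int.Parse(text) : 0;

// Append: store the value under its composite key, point the next position at that key,
// then bump the length. Not atomic: concurrent writers would need a CAS or a lock.
void Append(string f1, string f2, string f3, string f4)
{
    int position = Length();
    string composite = CompositeKey(f1, f2, f3);
    store[composite] = f4;
    store[IndexKey(position)] = composite;
    store["list_length"] = (position + 1).ToString();
}

// Serial access: walk positions 0..length-1, following each to its composite key.
IEnumerable<string> Iterate()
{
    int count = Length();
    for (int i = 0; i < count; i++)
        yield return store[store[IndexKey(i)]];   // double indirection
}

Append("a", "b", "c", "first");
Append("x", "y", "z", "second");

Console.WriteLine(string.Join(", ", Iterate()));        // first, second
Console.WriteLine(store[CompositeKey("x", "y", "z")]);  // random access: second
```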

Using Custom Generic Collection faster with objects than List

I'm iterating through a List<> to find a matching element. The problem is that the object has only 2 significant values, Name and Link (both strings), but has some other values which I don't want to compare.
I'm thinking about using something like HashSet (which is exactly what I'm searching for -- fast) from .NET 3.5 but target framework has to be 2.0. There is something called Power Collections here: http://powercollections.codeplex.com/, should I use that?
But maybe there is other way? If not, can you suggest me a suitable custom collection?
In .NET 2.0, instead of a HashSet<T> you can use a Dictionary<K, V>.
Dictionary uses the hash code to perform key lookups, so it has similar performance to the HashSet. There are at least two approaches:
Create a custom class or struct containing the Name and Link and use that as the key in the dictionary, and put the object as the value.
Store the entire object as the key and provide a custom equality comparer that only looks at the Name and Link member, and set the value to null.
The second method is very similar to how you would use a HashSet if it were available.
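A sketch of the first approach. It uses C# 7 value tuples as a ready-made (Name, Link) key for brevity, which needs a newer framework than .NET 2.0; on 2.0 you would write the small struct with Equals and GetHashCode over Name and Link, as the answer says. The item fields are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the question's objects (hypothetical fields): only Name and Link should be
// compared; Clicks represents the other values that must be ignored.
var items = new List<(string Name, string Link, int Clicks)>
{
    ("Home", "/", 10),
    ("Docs", "/docs", 3),
    ("Blog", "/blog", 7),
};

// Approach 1: key on (Name, Link) and store the whole object as the value.
var index = new Dictionary<(string Name, string Link), (string Name, string Link, int Clicks)>();
foreach (var item in items)
    index[(item.Name, item.Link)] = item;   // a later item with the same Name and Link overwrites

// Hash-based lookup instead of a linear scan of the List.
bool found = index.TryGetValue(("Docs", "/docs"), out var match);
Console.WriteLine(found ? $"Found {match.Name} ({match.Clicks} clicks)" : "Not found");
```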
How about this:
A custom class/collection which will hold a List of objects and two dictionaries, one for the name and one for the link. Both of them will have an int value which will be the index of the object. I think that in that case I will only need to check whether the int value in the name dictionary equals the int value in the link dictionary.
Is this a good approach?
