I just installed Membase and the Enyim client for .NET, and came across an article that mentions this technique for integrating LINQ:
public static IEnumerable<T> CachedQuery<T>(
    this IQueryable<T> query, MembaseClient cache, string key) where T : class
{
    object result;
    if (cache.TryGet(key, out result))
    {
        // Cache hit: return the previously stored list.
        return (IEnumerable<T>)result;
    }
    else
    {
        // Cache miss: run the query, cache the result, then return it.
        IEnumerable<T> items = query.ToList();
        cache.Store(StoreMode.Set, key, items);
        return items;
    }
}
It first checks whether the required data is in the cache; if not, it runs the query, caches the result, and returns it.
Currently I am using a Dictionary<string, List<T>> in my application and want to replace it with a Membase/Memcached-style approach.
What about a similar pattern for adding items to a List<T>, or for using LINQ operators on a cached list? It seems to me that it could be a bad idea to store an entire List<T> in cache under a single key and have to retrieve it, add to it, and then re-store it each time you want to add an element. Or is this an acceptable practice?
public bool Add(T item)
{
    object list;
    if (cache.TryGet(this.Key, out list))
    {
        // The list already exists in the cache: append and re-store it.
        var _list = list as List<T>;
        _list.Add(item);
        return cache.Store(StoreMode.Set, this.Key, _list);
    }
    else
    {
        // First item: create a new list and store it under the key.
        var _list = new List<T> { item };
        return cache.Store(StoreMode.Set, this.Key, _list);
    }
}
How are collections usually handled in a caching situation like this? Are hashing algorithms usually used instead, or some sort of key-prefixing system to identify 'Lists' of type T within the key-value store of the cache?
It depends on several factors:
Is this supposed to be scalable? Is the list user-specific, and can you be certain that Add won't be called twice at the same time for the same list? Race conditions are a risk.
I did implement such a thing, storing a generic list in Membase, but it was user-specific, so I could be pretty certain there would be no race condition.
You should also consider the size of the serialized list, which may be large. In my case the lists were pretty small.
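If concurrent Adds are a real possibility, a check-and-set (CAS) loop is the usual mitigation. Below is a minimal sketch, assuming the Enyim client's GetWithCas/Cas methods; treat the exact signatures as an assumption to verify against your client version:

public bool Add(T item)
{
    while (true)
    {
        // Read the list together with its CAS version token (assumed API).
        var res = cache.GetWithCas<List<T>>(this.Key);
        if (res.Result == null)
        {
            // No list yet: StoreMode.Add only succeeds if nobody created it first.
            if (cache.Store(StoreMode.Add, this.Key, new List<T> { item }))
                return true;
            continue; // another writer created it; retry via the CAS path
        }

        res.Result.Add(item);

        // The store succeeds only if the version is unchanged since our read.
        if (cache.Cas(StoreMode.Set, this.Key, res.Result, res.Cas).Result)
            return true;
        // Another writer got in between; loop and try again.
    }
}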
Not sure if it helps, but I implemented a very basic iterable list with random access over Membase (via double indirection). Random access is done via a composite key (composed of several fields).
You need to:
Have a key that holds the list's length.
Have the ability to build the composite key (e.g. one or more fields from your object).
Have the value that you'd like to save (e.g. another field).
E.g.:
list_length = 3
prefix1_0-> prefix2_[field1.value][field2.value][field3.value] -> field4.value
prefix1_1-> prefix2_[field1.value][field2.value][field3.value] -> field4.value
prefix1_2-> prefix2_[field1.value][field2.value][field3.value] -> field4.value
To perform serial access you iterate over the keys with "prefix1". To perform random access you use the keys with "prefix2" and the fields that compose the key.
I hope it's clear enough.
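To make the scheme concrete, here is a minimal sketch of such a doubly indirected list over a memcached-style client. The key prefixes, the ComposeKey helper, the Field4 property, and the cache variable are illustrative assumptions, not part of any library:

// Append: store the value under its composite key, point an index slot
// at that key, then bump the length counter.
public void Append(MyItem item)
{
    object lenObj;
    int length = cache.TryGet("mylist_length", out lenObj) ? (int)lenObj : 0;

    string compositeKey = "prefix2_" + ComposeKey(item);            // random-access key
    cache.Store(StoreMode.Set, "prefix1_" + length, compositeKey);  // index slot -> composite key
    cache.Store(StoreMode.Set, compositeKey, item.Field4);          // composite key -> value

    cache.Store(StoreMode.Set, "mylist_length", length + 1);
}

// Serial access: walk the index keys in order.
public IEnumerable<object> Iterate()
{
    object lenObj;
    int length = cache.TryGet("mylist_length", out lenObj) ? (int)lenObj : 0;
    for (int i = 0; i < length; i++)
    {
        var compositeKey = cache.Get<string>("prefix1_" + i);
        yield return cache.Get(compositeKey);
    }
}

Random access then amounts to building "prefix2_" + ComposeKey(...) from the fields and doing a single Get.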
I am struggling to solve this issue and have searched multiple ways and cannot seem to find an answer. I inherited this app from someone else and need to add a couple features to the app. I have not worked much with dictionaries and linq before, so I have been searching and trying to gain knowledge to do what I need to do.
There is a class with the following properties (some properties not necessary for this discussion have been removed):
class EmailRecord
{
    public Dictionary<string, List<string>> emails = new Dictionary<string, List<string>>();
    public string RecordID { get; set; }
    // [followed by additional properties and constructors...]
}
When the objects are created, the emails property has a template string as the key and a list of strings containing email addresses as the values. For my purposes, I do not need to know what is in the key.
I have a list of EmailRecord objects called allRecords. I need to query allRecords for every EmailRecord whose emails dictionary contains, anywhere in its lists of values, a specific email address stored in a variable called recipientEmail. The key doesn't matter, and neither does how many times the email shows up; I just need the object included in the results if the email appears anywhere in the values of the emails property. In a given instance, the dictionary may have two keys, each with multiple emails in its list of strings.
I've tried a few things, with the latest being this (which doesn't work):
var results = EmailRecords
    .SelectMany(x => x.emails)
    .Where(x => x.Value.Contains(recipientEmail));
The above seems to return the dictionary entries, not the entire object.
I want to be able to loop through the results with something like this:
foreach (EmailRecord foundRecord in results)
{
    // ...do work here
}
Any thoughts or suggestions to assist me as I am trying to learn Linq? Thank you in advance for any help you can provide.
If you want to loop through the EmailRecord objects whose emails property contains recipientEmail among its values, then you need to iterate over the list of EmailRecord objects and search through each one's dictionary. The following should do the trick.
List<EmailRecord> EmailRecords = new List<EmailRecord>();
// Fill the EmailRecords somewhere

foreach (EmailRecord record in EmailRecords)
{
    foreach (KeyValuePair<string, List<string>> emailfoundRecord in
             record.emails.Where(x => x.Value.Contains(recipientEmail)))
    {
        // do work here
    }
}
When you call EmailRecords.SelectMany(x => x.emails) what you get back is an IEnumerable<KeyValuePair<string, List<string>>>. Obviously this is not what you're after for your result, since it strips away all the other information.
With LINQ, the first thing to consider at each stage is what you expect to get out of the query. In this case the query results should be an enumeration of EmailRecord instances, which is also what we're feeding in. Filtering that list is most simply done with the Where method, so that's where you should do all the work.
Next decide on your filter criteria and write the filter predicate to suit. For any given EmailRecord we want to find out if any of the dictionary entries contains a particular email address. Since the dictionary values are lists we'll use Contains to do the actual comparison, and Any to test the dictionary itself.
Which looks like this:
var filtered = EmailRecords.Where(e =>
    e.emails.Any(kv =>
        kv.Value.Contains(recipientEmail)
    )
);
This works because a dictionary is also an enumerable, with each entry in the enumeration being a key/value pair.
Using Any will stop when it finds a single matching entry instead of continuing to the end of the Emails dictionary for every EmailRecord instance. If there are a lot of emails and you're expecting a high number of selections then this might save some time. Probably not however, since generally this sort of structure doesn't have a lot of duplicate email addresses in it.
Depending on how often you want to do this, however, it might be quicker to build a lookup and query that. Assuming that your EmailRecords list changes infrequently and you are doing a lot of this sort of lookup, you could get a large speedup.
I'll use a Dictionary<string, EmailRecord[]> for the lookup because it's (fairly) simple to build once we get a list of all of the pairs of email address and EmailRecord objects:
var emailReferences = EmailRecords.SelectMany(e =>
    e.emails.SelectMany(kv =>
        kv.Value.Select(v =>
            new { address = v, record = e }
        )
    )
);

var lookup = emailReferences
    .GroupBy(i => i.address, i => i.record)
    .ToDictionary(g => g.Key, g => g.ToArray());
From this you will be able to locate an email address and get its referencing EmailRecord instances fairly simply and quickly:
EmailRecord[] filtered = null;
lookup.TryGetValue(recipientEmail, out filtered);
This will be faster per lookup than the LINQ equivalent above, but the setup could consume a fair amount of time and memory for large lists. If you have small or frequently changing lists (since the lookup has to be regenerated or at least invalidated at each change) then this won't improve your program's speed.
As a completely unsolicited aside, here's an extension method I use when dealing with dictionaries that have List<> as the value:
public static partial class extensions
{
    public static Dictionary<TKey, List<TElem>> Add<TKey, TElem>(
        this Dictionary<TKey, List<TElem>> dict, TKey key, TElem value)
    {
        // Fetch the existing list for the key, or create and register a new one.
        List<TElem> list;
        if (!dict.TryGetValue(key, out list))
            dict[key] = list = new List<TElem>();
        list.Add(value);
        return dict; // returning the dictionary allows chained adds
    }
}
It helps make your adds simpler and easier to read.
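For instance, a quick usage sketch (the keys and addresses are made up):

var emails = new Dictionary<string, List<string>>();

// Each call creates the inner list on demand; chaining works because
// the extension method returns the dictionary itself.
emails.Add("template1", "doc@example.com")
      .Add("template1", "marty@example.com")
      .Add("template2", "doc@example.com");

The compiler picks the extension overload here because the built-in Add expects a List<string> as its second argument.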
I'm trying to implement A* and I ran into a problem. I have a set where I need to find the minimum value of a given function, but I also need to be able to check whether a given cell is in that set. To do this efficiently, I need the set to be sorted both by position and by value.
It doesn't seem too difficult to write such a data structure: I just need one collection sorted by position and one by value, and have each refer to the other. There are two problems with this. First, to do it well, the structures need to be able to refer to parts of each other; there's no point in searching through a tree in log time if I can just point to the particular element. To do that, I'd pretty much need to rewrite trees from scratch. Second, it doesn't seem like the sort of thing I should be writing; data structures are supposed to be part of the libraries.
What's the name of the sort of data structure I need, and where can I find a C# library for it?
There is no need for the two data structures to interact at all. Just have two data structures side by side. Make sure that when you add/remove an item you add/remove it from both. You can then fetch the minimum value of either collection based on which property you're interested in.
The only real reason to create a new data structure would be to ensure that adding/removing items was kept in sync between the two collections. There would be no need to manipulate the actual trees explicitly.
Such a custom type would look something like this (other operations omitted; they all just delegate to first and/or second).
public class SetPair<T>
{
    private SortedSet<T> first;
    private SortedSet<T> second;

    public SetPair(IComparer<T> firstComparer, IComparer<T> secondComparer)
    {
        first = new SortedSet<T>(firstComparer ?? Comparer<T>.Default);
        second = new SortedSet<T>(secondComparer ?? Comparer<T>.Default);
    }

    public T FirstMin { get { return first.Min; } }
    public T SecondMin { get { return second.Min; } }

    public bool Add(T item)
    {
        // Non-short-circuiting & keeps the sets in sync: with && a failed
        // first.Add would skip second.Add entirely.
        return first.Add(item) & second.Add(item);
    }

    public bool Remove(T item)
    {
        return first.Remove(item) & second.Remove(item);
    }
}
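A quick usage sketch for an A* open set; the Cell type and its fields are assumptions for illustration:

// Hypothetical cell: compare by grid position for membership checks,
// and by f-score for extracting the best candidate.
class Cell { public int X, Y; public double FScore; }

var byPosition = Comparer<Cell>.Create((a, b) =>
    a.X != b.X ? a.X.CompareTo(b.X) : a.Y.CompareTo(b.Y));

// Break f-score ties by position so SortedSet doesn't treat two distinct
// cells with equal scores as duplicates and silently drop one.
var byScore = Comparer<Cell>.Create((a, b) =>
{
    int c = a.FScore.CompareTo(b.FScore);
    return c != 0 ? c : byPosition.Compare(a, b);
});

var open = new SetPair<Cell>(byPosition, byScore);
open.Add(new Cell { X = 0, Y = 0, FScore = 1.5 });

Cell best = open.SecondMin;  // cell with the lowest f-score
open.Remove(best);           // removed from both orderings at once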
I am building web services for many different clients to connect to a database of automotive parts. These parts have a wide variety of properties. Different clients will need different subsets of properties to do their thing.
All clients will need at least an ID, a part number, and a name. Some might need prices, some might need URLs to images, etc. The next client might be written years from now and require yet another subset of properties. I'd rather not send more than they need.
I have been building separate PartDTOs with subsets of properties for each of these requirements, and serving them up as separate web-service methods that return the same list of parts but with different properties on each. Rather than build this up for every client and come up with logical names for the DTOs and methods, I'd like a way for the client to specify what they want. I'm returning JSON, so I was thinking of the client passing me a JSON object listing the properties they want in the result set:
ret = { ImageUrl: true, RetailPrice: true, ... }
First off, does this make sense?
Second, what I'd rather not lose here is the nice syntax of returning an IEnumerable<DTO> and letting the JSON tools serialize it. I could certainly build up a JSON string and return that, but that seems pretty kludgey.
Suggestions? C# 'dynamic'?
This is a very good candidate for the Entity-Attribute-Value model. Basically you have a table of (ID, Name, Value) and you allow each customer/facet to store whatever they want. Then when they query, you return their name-value pairs and let them use them as they please.
PROS: super flexible. Good for situations where a strong schema adds tons of complexity vs value. Single endpoint for multiple clients.
CONS: a generally disliked pattern; very hard to select from efficiently and also hard to index. However, if all you do is store and return collections of name-value pairs, it should be fine.
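As a rough illustration of the shape this takes (the row type and names are made up):

// Hypothetical EAV row: one record per part attribute, so each client
// can store and retrieve exactly the properties it cares about.
public class PartAttribute
{
    public int PartId { get; set; }    // the entity
    public string Name { get; set; }   // the attribute, e.g. "RetailPrice"
    public string Value { get; set; }  // the value, serialized as text
}

// Returning a part is then just its name-value pairs:
public IDictionary<string, string> GetPart(int partId, IEnumerable<PartAttribute> rows)
{
    return rows.Where(r => r.PartId == partId)
               .ToDictionary(r => r.Name, r => r.Value);
}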
I ended up going the dictionary route. I defined a base class:
public abstract class DictionaryAsDTO<T> : IReadOnlyDictionary<string, object>
{
    protected DictionaryAsDTO(T t, string listOfProperties)
    {
        // Populate an internal dictionary with a subset of t's props based on the string
    }
}
Then a DTO for Part like so:
public class PartDTO : DictionaryAsDTO<Part>
{
    public PartDTO(Part p, string listOfProperties) : base(p, listOfProperties) { }

    // Override method to populate base's dictionary with Part properties based on
    // listOfProperties
}
Then I wrote a JSON.NET converter for DictionaryAsDTO which emits JSON-y object-properties instead of key-value-pairs.
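A minimal sketch of what such a converter might look like, assuming the DTO exposes its entries through IReadOnlyDictionary<string, object> as above (write-only; reading is not needed here):

public class DictionaryAsDTOConverter : JsonConverter
{
    public override bool CanConvert(Type objectType)
    {
        return typeof(IReadOnlyDictionary<string, object>).IsAssignableFrom(objectType);
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        // Emit each dictionary entry as a JSON object property
        // instead of serializing the DTO as an array of key-value pairs.
        writer.WriteStartObject();
        foreach (var pair in (IReadOnlyDictionary<string, object>)value)
        {
            writer.WritePropertyName(pair.Key);
            serializer.Serialize(writer, pair.Value);
        }
        writer.WriteEndObject();
    }

    public override bool CanRead { get { return false; } }

    public override object ReadJson(JsonReader reader, Type objectType,
        object existingValue, JsonSerializer serializer)
    {
        throw new NotSupportedException();
    }
}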
The web service builds an IEnumerable<PartDTO> based on queries that return IEnumerable<Part>, and serializes it.
Voilà!
I'm building a repository with caching using Spring.NET. Can I update/add/delete one item in the cached list without having to rebuild the whole list?
Looking at the documentation and the example project from their site, they always clear the cache whenever they update/add/delete an item. As long as you only read an object or the list of objects, the caching works well, but it seems wasteful to rebuild the whole cache just because one item changed.
Example:
// Cache per item and a list of items
[CacheResult("DefaultCache", "'AllMovies'", TimeToLive = "2m")]
[CacheResultItems("DefaultCache", "'Movie-' + ID")]
public IEnumerable<Movie> FindAll()
{
    return movies.Values;
}

// Update or add an item, invalidating the list of objects
[InvalidateCache("DefaultCache", Keys = "'AllMovies'")]
public void Save([CacheParameter("DefaultCache", "'Movie-' + ID")] Movie movie)
{
    if (this.movies.ContainsKey(movie.ID))
    {
        this.movies[movie.ID] = movie;
    }
    else
    {
        this.movies.Add(movie.ID, movie);
    }
}
Storing mutable objects in the cache strikes me as a fountain of horrible side effects, and that is what you would need in order to add/remove entries of a cached list.
The implementations of CacheResultAdvice and InvalidateCacheAdvice let you store and invalidate an object (key) -> object (value) combination. You could add another layer and retrieve the movies one by one (see the sketch below), but I think that is just a case of premature optimization (with the opposite effect).
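For what it's worth, here is a minimal sketch of that extra layer, using a hypothetical key-value cache client rather than the Spring.NET attributes; only the touched entry is rewritten on save:

// "cache" and "repository" are assumed helpers, not Spring.NET APIs.
public Movie FindById(int id)
{
    var movie = cache.Get<Movie>("Movie-" + id);
    if (movie == null)
    {
        movie = repository.Load(id);        // hit the database once
        cache.Store("Movie-" + id, movie);  // cache it under its own key
    }
    return movie;
}

public void Save(Movie movie)
{
    bool isNew = !repository.Exists(movie.ID);
    repository.Save(movie);
    cache.Store("Movie-" + movie.ID, movie);  // refresh just this entry
    if (isNew)
        cache.Remove("AllMovieIds");          // the list's membership changed
}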
Edit: By the way, if you use a mature ORM, look for integrated level-2 caching if you want to avoid hitting the DB server: http://www.klopfenstein.net/lorenz.aspx/using-syscache-as-secondary-cache-in-nhibernate
The MSDN explains Lookup like this:
A Lookup<TKey, TElement> resembles a Dictionary<TKey, TValue>. The difference is that a Dictionary<TKey, TValue> maps keys to single values, whereas a Lookup<TKey, TElement> maps keys to collections of values.
I don't find that explanation particularly helpful. What is Lookup used for?
It's a cross between an IGrouping and a dictionary. It lets you group items together by a key, but then access them via that key in an efficient manner (rather than just iterating over them all, which is what GroupBy lets you do).
For example, you could take a load of .NET types and build a lookup by namespace... then get to all the types in a particular namespace very easily:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
public class Test
{
    static void Main()
    {
        // Just types covering some different assemblies
        Type[] sampleTypes = new[] { typeof(List<>), typeof(string),
                                     typeof(Enumerable), typeof(XmlReader) };

        // All the types in those assemblies
        IEnumerable<Type> allTypes = sampleTypes.Select(t => t.Assembly)
                                                .SelectMany(a => a.GetTypes());

        // Grouped by namespace, but indexable
        ILookup<string, Type> lookup = allTypes.ToLookup(t => t.Namespace);

        foreach (Type type in lookup["System"])
        {
            Console.WriteLine("{0}: {1}",
                type.FullName, type.Assembly.GetName().Name);
        }
    }
}
(I'd normally use var for most of these declarations, in normal code.)
One way to think about it is this: Lookup<TKey, TElement> is similar to Dictionary<TKey, Collection<TElement>>. Basically a list of zero or more elements can be returned via the same key.
namespace LookupSample
{
    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            List<string> names = new List<string>();
            names.Add("Smith");
            names.Add("Stevenson");
            names.Add("Jones");

            ILookup<char, string> namesByInitial = names.ToLookup((n) => n[0]);

            // count the names
            Console.WriteLine("J's: {0}", namesByInitial['J'].Count()); // 1
            Console.WriteLine("S's: {0}", namesByInitial['S'].Count()); // 2
            Console.WriteLine("Z's: {0}", namesByInitial['Z'].Count()); // 0, does not throw
        }
    }
}
One use of Lookup could be to reverse a Dictionary.
Suppose you have a phonebook implemented as a Dictionary with a bunch of (unique) names as keys, each name associated with a phone number. But two people with different names might share the same phone number. This isn't a problem for a Dictionary, which doesn't care that two keys correspond to the same value.
Now you want a way of looking up who a given phone number belongs to. You build a Lookup, adding all the KeyValuePairs from your Dictionary, but backwards, with the value as the key and the key as the value. You can now query a phone number, and obtain a list of names of all the people whose phone number that is. Building a Dictionary with the same data would drop data (or fail, depending on how you did it), since doing
dictionary["555-6593"] = "Dr. Emmett Brown";
dictionary["555-6593"] = "Marty McFly";
means that the second entry overwrites the first - the Doc is no longer listed.
Trying to write the same data in a slightly different way:
dictionary.Add("555-6593", "Dr. Emmett Brown");
dictionary.Add("555-6593", "Marty McFly");
would throw an exception on the second line since you can't Add a key which is already in the Dictionary.
[Of course, you might want to use some other single data structure to do lookups in both directions, etc. This example means that you have to regenerate the Lookup from the Dictionary each time the latter changes. But for some data it could be the right solution.]
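For the phone-book case, a minimal sketch of that reversal (sample data made up):

var phoneBook = new Dictionary<string, string>
{
    { "Dr. Emmett Brown", "555-6593" },
    { "Marty McFly", "555-6593" },
    { "Biff Tannen", "555-4753" }
};

// Reverse it: number -> all names sharing that number.
ILookup<string, string> byNumber =
    phoneBook.ToLookup(kv => kv.Value, kv => kv.Key);

foreach (string name in byNumber["555-6593"])
    Console.WriteLine(name);  // Dr. Emmett Brown, Marty McFly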
I haven't successfully used it before, but here is my go:
A Lookup<TKey, TElement> behaves pretty much like a (relational) database index on a table without a unique constraint. Use it in the same places you would use such an index.
I guess you could argue it this way: imagine you're creating a data structure to hold the contents of a phone book keyed by last name. Using a dictionary here would be dangerous, because many people can share the same name; a Dictionary will always map a key to at most a single value.
A Lookup will map a key to potentially several values.
lookup["Smith"] will be a collection of size one billion.
One more point to add:
ToLookup executes immediately and caches its result in memory, whereas GroupBy is deferred: it does not cache the grouped result, and it regroups each time you enumerate it.
If you need repeated access to fixed, grouped data, use ToLookup to get a Lookup instance. Especially when the amount of data is large or it is accessed many times, GroupBy can cause serious performance problems; ToLookup, at the cost of using more memory, caches the grouping result and gives your code better performance.
By the way, a Lookup can sometimes be used as a forgiving dictionary, because it doesn't throw an exception for a key that doesn't exist; it just returns an empty sequence.
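A small sketch of both points (sample data made up):

var words = new List<string> { "apple", "avocado", "banana" };

// ToLookup runs once, here, and caches the groups in memory.
ILookup<char, string> byInitial = words.ToLookup(w => w[0]);

Console.WriteLine(byInitial['a'].Count()); // 2
Console.WriteLine(byInitial['z'].Count()); // 0: empty sequence, no exception

// GroupBy is deferred: this query regroups on every enumeration.
IEnumerable<IGrouping<char, string>> groups = words.GroupBy(w => w[0]);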