Efficient Hierarchical Linq query using multiple properties - c#

I have a fairly large collection of foo { int id, int parentid, string name}.
I am looking to collect a list of foo objects where the object has a name of "bar3", and is a child of an object named "bar2", which is a child of an object with an ID of 1.
What sort of collection should I use (I've been playing with lookups and dictionaries with not a whole lot of success) and how should I write this to make an efficient function out of this? There are approximately 30K foo objects and my method is choking to death.
Thanks!

If I really had to stick with this layout for foo, and I really had to make lookups as fast as possible (I don't care about memory size, and will be reusing the same objects repeatedly, so the cost of setting up a set of large structures in memory would be worth it), then I would do:
var byNameAndParentLookup = fooSource.ToLookup(f => Tuple.Create(f.parentid, f.name)); //will reuse this repeatedly
var results = byNameAndParentLookup[Tuple.Create(1, "bar2")].SelectMany(f => byNameAndParentLookup[Tuple.Create(f.id, "bar3")]);
That said, if I was going to store tree data in memory, I'd prefer to create a tree-structure, where each foo had a children collection (perhaps a dictionary keyed on name).
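Here is a minimal sketch of that tree-structure alternative. It uses a hypothetical FooNode type whose children are stored in a dictionary keyed on name, with a list per name because siblings may share a name. FooNode, FooTree and Build are illustrative names, not an existing API, and the rows are assumed to arrive as (id, parentid, name) tuples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical node type: one foo plus its children grouped by name.
public class FooNode
{
    public int Id;
    public string Name;
    // Several siblings can share a name, so each name maps to a list.
    public Dictionary<string, List<FooNode>> ChildrenByName = new Dictionary<string, List<FooNode>>();

    // Children with the given name, or an empty sequence if there are none.
    public IEnumerable<FooNode> Children(string name)
    {
        List<FooNode> list;
        return ChildrenByName.TryGetValue(name, out list) ? list : Enumerable.Empty<FooNode>();
    }
}

public static class FooTree
{
    // Builds nodes from flat rows and links each one to its parent.
    // Returns all nodes indexed by id; rows whose parent is missing are roots.
    public static Dictionary<int, FooNode> Build(IEnumerable<(int id, int parentid, string name)> rows)
    {
        var rowList = rows.ToList();
        var nodes = rowList.ToDictionary(r => r.id, r => new FooNode { Id = r.id, Name = r.name });
        foreach (var r in rowList)
        {
            FooNode parent;
            if (!nodes.TryGetValue(r.parentid, out parent)) continue; // root or orphan
            List<FooNode> siblings;
            if (!parent.ChildrenByName.TryGetValue(r.name, out siblings))
            {
                siblings = new List<FooNode>();
                parent.ChildrenByName[r.name] = siblings;
            }
            siblings.Add(nodes[r.id]);
        }
        return nodes;
    }
}
```

Once the tree is built, the question's query becomes nodes[1].Children("bar2").SelectMany(n => n.Children("bar3")), which only touches the branches that match.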
Edit: To explain a bit.
fooSource.ToLookup(f => Tuple.Create(f.parentid, f.name))
Goes through all the items in fooSource (wherever our foo objects are coming from) and, for each one, creates a tuple of the parentid and the name. This tuple is used as the key of a lookup, so for each parentid-name combination we can retrieve 0 or more foo objects with that combination. (This uses the default string comparison. If you want something else, such as case-insensitive matching, create an IEqualityComparer<Tuple<int, string>> implementation that does the comparison you want and use .ToLookup(f => Tuple.Create(f.parentid, f.name), new MyTupleComparer()).)
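Here is one possible MyTupleComparer. It assumes ids should match exactly and names should match case-insensitively using ordinal comparison. GetHashCode has to hash the name in the same case-insensitive way, otherwise keys that compare equal could end up in different buckets.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: exact match on the int (parentid), case-insensitive match on the string (name).
public class MyTupleComparer : IEqualityComparer<Tuple<int, string>>
{
    public bool Equals(Tuple<int, string> x, Tuple<int, string> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Item1 == y.Item1 &&
               string.Equals(x.Item2, y.Item2, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(Tuple<int, string> obj)
    {
        // Must agree with Equals, so the name is hashed case-insensitively too.
        int nameHash = obj.Item2 == null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Item2);
        return obj.Item1 * 397 ^ nameHash;
    }
}
```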
The second line can be broken down into:
var partWayResults = byNameAndParentLookup[Tuple.Create(1, "bar2")];
var results = partWayResults.SelectMany(f => byNameAndParentLookup[Tuple.Create(f.id, "bar3")]);
The first line simply does a search on our lookup, so it returns an enumeration of those foo objects which have a parentid of 1 and a name of "bar2".
SelectMany takes each item of an enumeration or queryable, and computes an expression that returns an enumeration, which is then flattened into a single enumeration.
In other words, it works a bit like this:
public static IEnumerable<TResult> SelectMany<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> func)
{
    foreach (TSource item in source)
        foreach (TResult producedItem in func(item))
            yield return producedItem;
}
In our case, the expression passed in takes the id of each element found by the first lookup, and then looks for any elements that have that id as their parentid and have the name "bar3".
Hence, for every item with parentid 1 and name bar2, we find every item with that first item's id as its parentid and the name bar3. Which is what was wanted.
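Here is a self-contained sketch of this lookup approach. It is wrapped in helpers so the lookup is built once and reused. The Foo class with public fields and the names FooQueries, BuildLookup and FindGrandchildren are assumptions for illustration. The query itself is the ToLookup/SelectMany pair from this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mirrors the question's foo { int id, int parentid, string name }.
public class Foo
{
    public int id;
    public int parentid;
    public string name;
}

public static class FooQueries
{
    // One pass over the source; keep the result around and reuse it for many queries.
    public static ILookup<Tuple<int, string>, Foo> BuildLookup(IEnumerable<Foo> fooSource)
    {
        return fooSource.ToLookup(f => Tuple.Create(f.parentid, f.name));
    }

    // Items named childName whose parent is named parentName and has parentid rootId.
    // A missing key in an ILookup yields an empty sequence, so no checks are needed.
    public static List<Foo> FindGrandchildren(
        ILookup<Tuple<int, string>, Foo> byNameAndParentLookup,
        int rootId, string parentName, string childName)
    {
        return byNameAndParentLookup[Tuple.Create(rootId, parentName)]
            .SelectMany(f => byNameAndParentLookup[Tuple.Create(f.id, childName)])
            .ToList();
    }
}
```

Building the lookup is a single pass over the roughly 30K items. After that, each query only reads the buckets whose keys match.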

Check this out: QuickGraph
I've never actually used it but it seems well documented.
Alternatively you can try the C5 Generic Collection Library
I got this from this thread.

I suggest grouping all items by parentid first and then applying the conditions. First you need to find the group containing the bar1 element, then select all of its children and look for elements named bar2, and so on.
Here is one such solution. It is not the best, but it works (thirdLevelElements will contain the needed elements). I've used foreach loops to keep it clear. This logic could be written as LINQ statements, but I find that harder to understand.
var items = new[]
{
new Foo{id=1,parentid = 0, name="bar1"},
new Foo{id=2,parentid = 1, name="bar2"},
new Foo{id=3,parentid = 2, name="bar3"},
new Foo{id=4,parentid = 0, name="bar12"},
new Foo{id=5,parentid = 1, name="bar13"},
new Foo{id=6,parentid = 2, name="bar14"},
new Foo{id=7,parentid = 2, name="bar3"}
};
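// ToLookup indexes the groups by parentid (a missing key yields an empty sequence); indexing GroupBy(...).ToList() would use list positions instead
var groups = items.ToLookup(item => item.parentid);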
var firstLevelElements = items.Where(item => item.name == "bar1");
List<Foo> secondLevelElements = new List<Foo>();
foreach (var firstLevelElement in firstLevelElements)
{
secondLevelElements.AddRange(groups[firstLevelElement.id]
.Where(item => item.name == "bar2"));
}
List<Foo> thirdLevelElements = new List<Foo>();
foreach (var secondLevelElement in secondLevelElements)
{
thirdLevelElements.AddRange(groups[secondLevelElement.id]
.Where(item => item.name == "bar3"));
}
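For comparison, the same three-level walk can be written as a single query over a lookup. This is a sketch: LevelSearch and Find are hypothetical names, and Foo mirrors the fields used in the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public int id;
    public int parentid;
    public string name;
}

public static class LevelSearch
{
    // Same logic as the two foreach loops: level 1 by name, then children
    // filtered by name, then their children filtered by name.
    public static List<Foo> Find(IEnumerable<Foo> items, string first, string second, string third)
    {
        var itemList = items.ToList();
        var byParent = itemList.ToLookup(item => item.parentid);
        var query = from a in itemList where a.name == first
                    from b in byParent[a.id] where b.name == second
                    from c in byParent[b.id] where c.name == third
                    select c;
        return query.ToList();
    }
}
```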

Related

Create dictionary outside and initialize it using LINQ

I have a dictionary, indices, and want to add several keys to it from another dictionary using LINQ.
var indices = new Dictionary<string, int>();
var source = new Dictionary<string, int> { { "1", 1 }, { "2", 2 } };
source.Select(name => indices[name.Key] = 0); // doesn't work
var res = indices.Count; // returns 0
Then I replace Select with Min and everything works as expected, LINQ creates new keys in my dictionary.
source.Min(name => indices[name.Key] = 0); // works!!!
var res = indices.Count; // returns 2
Question
All I want to do is initialize the dictionary without a foreach. Why do the dictionary keys disappear when the LINQ is executed? What iterator or aggregator could I use instead of Min to create keys in a dictionary declared outside of the LINQ query?
Update #1
Decided to go with System.Interactive extension.
Update #2
I appreciate and have upvoted all the answers, but I need to clarify that the purpose of the question is not to copy a dictionary but to execute some code in a LINQ query. To give more context: I actually have a hierarchical structure of classes with dictionaries, and at some point they need to be synchronized, so I want to create a flat, non-hierarchical dictionary, used for tracking, that includes all the hierarchical keys.
class Account
{
Dictionary<string, User> Users;
}
class User
{
Dictionary<string, Activity> Activities;
}
class Activity
{
string Name;
DateTime Time;
}
Now I want to sync all actions by time, so I need a tracker that will help me align all of them by time. I don't want to write three nested loops for Account, User, and Activity, because that would be a hierarchical hell of loops, just like async or callback hell. With LINQ I don't have to put a loop inside a loop inside a loop.
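// ForEach here stands for an IEnumerable extension such as the one in System.Interactive; Dictionary has no built-in ForEach
Accounts.ForEach(
    account => account.Value.Users.ForEach(
        user => user.Value.Activities.ForEach(
            activity => indices[account.Key + user.Key + activity.Key] = 0)));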
Also, loops that could be replaced with LINQ are sometimes considered a code smell. That is not my own opinion, but I agree with it, because with too many loops you will probably end up with duplicated code.
https://jasonneylon.wordpress.com/2010/02/23/refactoring-to-linq-part-1-death-to-the-foreach/
You could say that LINQ is for querying, not for setting variables; I would say I'm querying ... the KEYS.
Linq is not intended to be used to mutate the elements of a sequence. Rather, it is intended to be used to traverse, filter and project elements of a sequence. In this respect, it is intended to be used more in a "functional programming" style.
As you have discovered, Linq can be used in other than a functional programming style - but by using it in that way you are really misusing it.
Technically, the reason that source.Min() has the effect you were looking for is that it has to visit each of the elements of your sequence in order to determine the minimum element.
Because your selector for Min() has a side-effect (i.e. indices[name.Key] = 0) then a side-effect of finding the minimum value is to add each element's key to indices, but with a value of zero rather than the original value.
(I suspect you might have meant to put indices[name.Key] = name.Value...)
The reason that your use of Select() has no effect is that it has not been used to traverse the sequence - it uses "deferred execution".
You can force it to traverse the sequence by counting the elements, like so:
source.Select(name => indices[name.Key] = 0).Count();
However, that is also counter-intuitive and is a misuse of Linq.
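To see the deferred execution described above, the sketch below counts the keys in indices before and after the query is enumerated. DeferredDemo is a hypothetical name. It only illustrates the mechanism. As this answer says, relying on side effects inside Select is not recommended.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns the number of keys in indices before and after enumerating the query.
    public static (int beforeEnumeration, int afterEnumeration) Run()
    {
        var indices = new Dictionary<string, int>();
        var source = new Dictionary<string, int> { { "1", 1 }, { "2", 2 } };

        // This only builds the query; the lambda has not run yet.
        var query = source.Select(pair => indices[pair.Key] = 0);
        int before = indices.Count;

        // Enumerating the query (ToList, Count, foreach, ...) is what runs the lambda.
        query.ToList();
        int after = indices.Count;

        return (before, after);
    }
}
```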
The correct solution is to use foreach. This expresses your intent clearly and unambiguously.
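Here is a sketch of the foreach approach applied to the hierarchy from the question's Update #2. SelectMany, written as several from clauses, flattens Account, User and Activity into one sequence of composite keys without nested loops. A single foreach then performs the side effect. It assumes the dictionaries are public fields (in the question they are private), and Tracker and BuildIndices are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Public versions of the question's classes (the originals use private fields).
public class Activity { public string Name; public DateTime Time; }
public class User { public Dictionary<string, Activity> Activities = new Dictionary<string, Activity>(); }
public class Account { public Dictionary<string, User> Users = new Dictionary<string, User>(); }

public static class Tracker
{
    public static Dictionary<string, int> BuildIndices(Dictionary<string, Account> accounts)
    {
        // The query only describes the flattened keys; nothing is mutated here.
        var keys = from account in accounts
                   from user in account.Value.Users
                   from activity in user.Value.Activities
                   select account.Key + user.Key + activity.Key;

        // The side effect is kept in one explicit, flat loop.
        var indices = new Dictionary<string, int>();
        foreach (var key in keys)
        {
            indices[key] = 0;
        }
        return indices;
    }
}
```

Concatenating the keys without a separator, as the question does, lets different paths collide ("a" + "bc" and "ab" + "c" both give "abc"); inserting a separator character avoids that.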
An alternative approach is to write an AddRange() extension method for Dictionary like so:
public static class DictionaryExt
{
public static Dictionary<TKey, TValue> AddRange<TKey, TValue>(
this Dictionary<TKey, TValue> self,
IEnumerable<KeyValuePair<TKey, TValue>> items)
{
foreach (var item in items)
{
self[item.Key] = item.Value;
}
return self;
}
}
Then you can just call indices.AddRange(source); to achieve your aim.
Interestingly, the ImmutableDictionary type does already have an AddRange() method that you could use like so:
var indices = ImmutableDictionary.Create<string, int>();
var source = new Dictionary<string, int> { { "1", 1 }, { "2", 2 } };
indices = indices.AddRange(source);
Console.WriteLine(indices.Count);
But I wouldn't recommend you change over to using ImmutableDictionary just so you can use its AddRange().
Also note that ImmutableDictionary is, well, immutable, so you can't just do indices.AddRange(source); you have to assign the result back, as in indices = indices.AddRange(source); (like when you modify a string using ToUpper()).
You wrote:
All I want to do is to initialize dictionary without foreach
Do you want to replace the values in your indices dictionary with the values in source? Use Enumerable.ToDictionary
indices = source // a Dictionary<string, int> is already an IEnumerable<KeyValuePair<string, int>>
    .ToDictionary(pair => pair.Key, // the key is the key from the original dictionary
                  pair => pair.Value); // the value is the value from the original
Or do you want to add the values from source to the values already in indices? If you don't want a foreach, you'll have to take the current entries of both dictionaries and Concat them, then use ToDictionary to create a new Dictionary.
indices = indices
    .Concat(source)
    .ToDictionary(pair => pair.Key, pair => pair.Value); // throws if a key is in both dictionaries
However this would be a waste of processing power.
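If you do go the Concat route anyway, keep in mind that ToDictionary throws an ArgumentException when the same key appears in both dictionaries. The sketch below groups by key first and lets the value from source win. MergeDemo and Merge are hypothetical names, and "later value wins" is just one possible policy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeDemo
{
    // Concat keeps order (indices first, then source), so g.Last() picks the
    // value from source whenever a key exists in both dictionaries.
    public static Dictionary<string, int> Merge(
        Dictionary<string, int> indices, Dictionary<string, int> source)
    {
        return indices.Concat(source)
                      .GroupBy(pair => pair.Key)
                      .ToDictionary(g => g.Key, g => g.Last().Value);
    }
}
```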
Consider creating extension functions for Dictionary. See Extension Methods Demystified
public static Dictionary<TKey, TValue> Copy<TKey, TValue>(
this Dictionary<TKey, TValue> source)
{
return source.ToDictionary(x => x.Key, x => x.Value);
}
public static void AddRange<TKey, TValue>(
this Dictionary<TKey, TValue> destination,
Dictionary<TKey, TValue> source)
{
foreach (var keyValuePair in source)
{
destination.Add(keyValuePair.Key, keyValuePair.Value);
// TODO: decide what to do if Key already in Destination
}
}
Usage:
// initialize:
var indices = source.Copy();
// add values:
indices.AddRange(otherDictionary);

How to swap the data source associated with a Linq query?

I have read in other answers how the parameters of a LINQ query can be changed at runtime. But is it possible to change the data source that the query runs over after the query has been created? Could the query perhaps be given an empty wrapper into which data sources can be plugged in and out, unbeknownst to the query?
For example, let's assume this situation:
// d1 is a dictionary
var keys = from entry in d1
where entry.Value < 5
select entry.Key;
Now, let's assume that rather than modifying d1, I want the query to stay the same but process an entirely new dictionary, d2.
The reason for this is that I'd like to decouple whoever provides the query from whoever provides the data source. Think of the dictionaries as metadata associated with services, and of the query as the means by which consumers of those services discover which services match their criteria. I need to apply the same query to the metadata of each service.
I guess one parallel that I'm (maybe erroneously) making is with Regular Expression: one can compile a regular expression and then apply it to any string. I'd like to do the same with queries and dictionaries.
A LINQ query is really nothing other than a method call. As such, you can't "reassign" the object to which you've already called a method after it's been setup.
You could simulate this by creating a wrapper class that lets you change what actually gets enumerated, for example:
using System.Collections;
using System.Collections.Generic;
public class MutatingSource<T> : IEnumerable<T>
{
public MutatingSource(IEnumerable<T> originalSource)
{
this.Source = originalSource;
}
public IEnumerable<T> Source { get; set; }
IEnumerator IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return Source.GetEnumerator();
}
}
This would allow you to create the query and then "change your mind" afterwards, for example:
var someStrings = new List<string> { "One", "Two", "Three" };
var source = new MutatingSource<string>(someStrings);
// Build the query
var query = source.Where(i => i.Length < 4);
source.Source = new[] {"Foo", "Bar", "Baz", "Uh oh"};
foreach(var item in query)
Console.WriteLine(item);
This will print Foo, Bar, and Baz (from the "changed" source items).
Edit in response to comments/edit:
I guess one parallel that I'm (maybe erroneously) making is with Regular Expression: one can compile a regular expression and then apply it to any string. I'd like to do the same with queries and dictionaries.
A query is not like a regular expression, in this case. The query is translated directly into a series of method calls, which will only work against the underlying source. As such, you can't change that source (there isn't a "query object", just the return value of a method call).
A better approach would be to move the query into a method, for example:
// the value type is fixed to int, because a generic TValue cannot be compared with < 5
IEnumerable<TKey> QueryDictionary<TKey>(IDictionary<TKey, int> dictionary)
{
var keys = from entry in dictionary
where entry.Value < 5
select entry.Key;
return keys;
}
You could then use this as needed:
var query1 = QueryDictionary(d1);
var query2 = QueryDictionary(d2);
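Closer to the regex analogy from the question, the query can also be stored as a delegate that is defined once and applied to any dictionary. This is a minimal sketch that assumes string keys and int values; ReusableQuery and SmallValueKeys are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReusableQuery
{
    // Defined once, independent of any particular data source.
    // Each call runs the query against whatever dictionary is passed in.
    public static readonly Func<IEnumerable<KeyValuePair<string, int>>, IEnumerable<string>> SmallValueKeys =
        dictionary => from entry in dictionary
                      where entry.Value < 5
                      select entry.Key;
}
```

Usage mirrors the method version: ReusableQuery.SmallValueKeys(d1) and ReusableQuery.SmallValueKeys(d2). The delegate can also be passed around or swapped like any other value.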

Find any entities that contain any one string from a list

I can't quite get my head around this one for some reason.
Say we have a class Foo
public class Foo
{
public string Name {get;set;}
}
And we have a generic list of them. I want to search through the generic list and pick out those that have a Name that contains any from a list of strings.
So something like
var source = GetListOfFoos();//assume a collection of Foo objects
var keywords = GetListOfKeyWords();//assume list/array of strings
var temp = new List<Foo>();
foreach(var keyword in keywords)
{
temp.AddRange(source.Where(x => x.Name.Contains(keyword)));
}
The issues here are a) the loop (it doesn't feel optimal to me) and b) each object might appear more than once (if the name was 'Rob StackOverflow' and there were keywords 'Rob' and 'StackOverflow').
I guess I could call Distinct() but again, it just doesn't feel optimal.
I think I'm approaching this incorrectly - what am I doing wrong?
I want to search through the generic list and pick out those that have
a Name that contains any from a list of strings.
Sounds rather easy:
var query = source.Where(e => keywords.Any(k => e.Name.Contains(k)));
Add ToList() to get results as a List<Foo>:
var temp = query.ToList();
Put the keywords into a HashSet for fast lookup, so that you're not doing an O(N²) loop.
HashSet<string> keywords = new HashSet<string>(GetListOfKeyWords(), StringComparer.InvariantCultureIgnoreCase);
var query = source.Where(x => keywords.Contains(x.Name));
EDIT: Actually, I re-read the question, and was wrong. This will only match the entire keyword, not see if the Name contains the keyword. Working on a better fix.
I like MarcinJuraszek's answer, but I would also assume you want case-insensitive matching of the keywords, so I'd try something like this:
var query = source.Where(f => keywords.Any(k => f.Name.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0));
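Here is a runnable sketch of the Any-based query with case-insensitive matching. It shows that a Foo whose name matches several keywords is returned only once. KeywordSearch and Matching are hypothetical names. Each Foo is checked against the keywords only until the first hit, so the worst case is still about items × keywords comparisons. The gain is that you avoid the duplicates and the Distinct() call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public string Name { get; set; }
}

public static class KeywordSearch
{
    // Each Foo is tested once, so a name matching several keywords still
    // appears only once in the result. Matching is ordinal and case-insensitive.
    public static List<Foo> Matching(IEnumerable<Foo> source, IEnumerable<string> keywords)
    {
        var keywordList = keywords.ToList(); // avoid re-enumerating a lazy keyword source per Foo
        return source
            .Where(f => keywordList.Any(k => f.Name.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }
}
```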

Remove items from list 1 not in list 2

I am learning to write lambda expressions, and I need help on how to remove all elements from a list which are not in another list.
var list = new List<int> {1, 2, 2, 4, 5};
var list2 = new List<int> { 4, 5 };
// Remove all list items not in List2
// new List Should contain {4,5}
// The lambda expression is the Predicate.
list.RemoveAll(item => item. /*solution expression here*/ );
// Display results.
foreach (int i in list)
{
Console.WriteLine(i);
}
You can do this via RemoveAll using Contains:
list.RemoveAll( item => !list2.Contains(item));
Alternatively, if you just want the intersection, using Enumerable.Intersect would be more efficient:
list = list.Intersect(list2).ToList();
The difference is, in the latter case, you will not get duplicate entries. For example, if list2 contained 2, in the first case, you'd get {2,2,4,5}, in the second, you'd get {2,4,5}.
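Here is a sketch contrasting the two approaches. ListFilter and its methods are hypothetical names. Copying list2 into a HashSet<int> turns each membership test in RemoveAll into a hash lookup instead of a linear scan, which matters when list2 is large.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListFilter
{
    // In place, keeps duplicates; HashSet.Contains replaces a scan of list2.
    public static void KeepOnlyIn(List<int> list, IEnumerable<int> list2)
    {
        var keep = new HashSet<int>(list2);
        list.RemoveAll(item => !keep.Contains(item));
    }

    // New list with set semantics, so duplicates collapse.
    public static List<int> IntersectCopy(List<int> list, IEnumerable<int> list2)
    {
        return list.Intersect(list2).ToList();
    }
}
```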
Solution for objects (maybe easier than Horace's solution):
If your list contains objects rather than scalars, it is still simple if you remove items by comparing one selected property of the objects:
var a = allActivePatientContracts.RemoveAll(x => !allPatients.Select(y => y.Id).Contains(x.PatientId));
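For the object case, the same HashSet idea applies to the selected property: collect the ids once instead of re-running Select(...).Contains(...) for every contract. Patient, PatientContract, the int Id type and the ContractCleanup helper are assumptions standing in for the classes in this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the patient and contract classes.
public class Patient { public int Id; }
public class PatientContract { public int PatientId; }

public static class ContractCleanup
{
    // Removes contracts whose PatientId is not among the patients' ids.
    // Returns the number of removed contracts, as List<T>.RemoveAll does.
    public static int RemoveOrphans(List<PatientContract> contracts, IEnumerable<Patient> patients)
    {
        var patientIds = new HashSet<int>(patients.Select(p => p.Id));
        return contracts.RemoveAll(c => !patientIds.Contains(c.PatientId));
    }
}
```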
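list = list.Except(list2).ToList();
Note that Except keeps the items of list that are not in list2 (here {1, 2}), which is the opposite of what the question asks; Intersect keeps the common items.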
This question has been marked as answered, but there is a catch. If your list contains an object, rather than a scalar, you need to do a bit more work.
I tried this over and over with Remove() and RemoveAt() and all sorts of things and none of them worked correctly. I couldn't even get a Contains() to work correctly. Never matched anything. I was stumped until I got the suspicion that maybe it could not match up the item correctly.
When I realized this, I refactored the item class to implement IEquatable, and then it started working.
Here is my solution:
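class GenericLookupE : IEquatable<GenericLookupE>
{
    public string ID { get; set; }
    public bool Equals(GenericLookupE other)
    {
        return other != null && this.ID == other.ID;
    }
    // Keep object.Equals and GetHashCode consistent, so hash-based code (Intersect, Except, HashSet) also works
    public override bool Equals(object obj) => Equals(obj as GenericLookupE);
    public override int GetHashCode() => ID == null ? 0 : ID.GetHashCode();
}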
After I did this, the above RemoveAll() answer by Reed Copsey worked perfectly for me.
See: http://msdn.microsoft.com/en-us/library/bhkz42b3.aspx

Union two Lists in C#

I want to union (merge) items into a List that contains both sets of references. This is my code; how can I define a list suitable for this purpose?
if (e.CommandName == "AddtoSelected")
{
List<DetalleCita> lstAux = new List<DetalleCita>();
foreach (GridViewRow row in this.dgvEstudios.Rows)
{
var GridData = GetValues(row);
var GridData2 = GetValues(row);
IList AftList2 = GridData2.Values.Where(r => r != null).ToList();
AftList2.Cast<DetalleCita>();
chkEstudio = dgvEstudios.Rows[index].FindControl("ChkAsignar") as CheckBox;
if (chkEstudio.Checked)
{
IList AftList = GridData.Values.Where(r => r != null).ToList();
lstAux.Add(
new DetalleCita
{
codigoclase = Convert.ToInt32(AftList[0]),
nombreestudio = AftList[1].ToString(),
precioestudio = Convert.ToDouble(AftList[2]),
horacita = dt,
codigoestudio = AftList[4].ToString()
});
}
index++;
//this line to merge
lstAux.ToList().AddRange(AftList2);
}
dgvEstudios.DataSource = lstAux;
dgvEstudios.DataBind();
}
This is inside a RowCommand event handler.
If you want to add all entries from AftList2 to lstAux, you should define AftList2 as an IEnumerable<> with elements of type DetalleCita (an IEnumerable<DetalleCita> is enough to be passed to AddRange() on a List<DetalleCita>). For example like this:
var AftList2 = GridData2.Values.Where(r => r != null).Cast<DetalleCita>();
And then you can add all its elements to lstAux:
lstAux.AddRange(AftList2);
Clarification:
I think you are misunderstanding what the extension method ToList() does. It creates a new list from an IEnumerable<T>, and its result is not connected to the original IEnumerable<T> it was applied to.
That is why list.ToList().AddRange(...) does nothing useful: you copy list into another, newly created list (from ToList()), update that copy, and then throw it away. The original list stays unchanged. If you call ToList(), you most likely want to store its result, as in list2 = list1.ToList().
Also, you don't usually need to convert one list into another list. ToList() is useful when you need a list (List<T>) but have an IEnumerable<T>: an IEnumerable<T> is not indexable, and it may be lazily evaluated when you need all results calculated right now. Both situations can arise when you use the result of a LINQ to Objects query, for example: IEnumerable<int> ints = from i in anotherInts where i > 20 select i; Even if anotherInts is a List<int>, the query result ints cannot be cast to List<int>, because it is not a list but some other implementation of IEnumerable<int>. In this case you can use ToList() to get a list anyway: List<int> ints = (from i in anotherInts where i > 20 select i).ToList();
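Here is a small sketch of that point. ToListDemo is a hypothetical name. AddRange on the result of ToList() changes only the throwaway copy, while AddRange on the list itself changes the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToListDemo
{
    // The original list's count after calling AddRange on a ToList() copy.
    public static int CountAfterAddingToCopy()
    {
        var original = new List<int> { 1, 2 };
        original.ToList().AddRange(new[] { 3, 4 }); // modifies a copy that is immediately discarded
        return original.Count;
    }

    // The original list's count after calling AddRange on the list itself.
    public static int CountAfterAddingDirectly()
    {
        var original = new List<int> { 1, 2 };
        original.AddRange(new[] { 3, 4 }); // modifies the list itself
        return original.Count;
    }
}
```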
UPDATE:
If you really mean union semantics (e.g. the union of { 1, 2 } and { 1, 3 } is { 1, 2, 3 }, with no duplication of equal elements from the two collections), consider switching to HashSet<T> (it is most likely available to you, since you are using C# 3.0 and presumably a recent .NET Framework), or use the Union() extension method instead of AddRange. I don't think Union() is better than the first solution, and be careful: it works more like ToList(), since a.Union(b) returns a new sequence and does NOT update either a or b.
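Here is a sketch of both union options mentioned above. UnionDemo is a hypothetical name. HashSet<T>.UnionWith modifies the set in place, while Enumerable.Union leaves its inputs alone and returns a new sequence that has to be stored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionDemo
{
    // HashSet<T>.UnionWith adds the missing elements of b to the set itself.
    public static HashSet<int> InPlace(IEnumerable<int> a, IEnumerable<int> b)
    {
        var set = new HashSet<int>(a);
        set.UnionWith(b);
        return set;
    }

    // Enumerable.Union returns a new sequence; a and b are left unchanged,
    // so the result must be captured (here via ToList).
    public static List<int> AsNewList(List<int> a, List<int> b)
    {
        return a.Union(b).ToList();
    }
}
```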
