Using lambda expressions to get a subset where array elements are equal - c#

I have an interesting problem, and I can't seem to figure out the lambda expression to make this work.
I have the following code:
List<string[]> list = GetSomeData(); // Returns large number of string[]'s
List<string[]> list2 = GetSomeData2(); // similar data, but smaller subset
List<string[]> newList = list.FindAll(predicate(string[] line){
return (???);
});
I want to return only those records in list in which element 0 of each string[] is equal to one of the element 0's in list2.
list contains data like this:
"000", "Data", "more data", "etc..."
list2 contains data like this:
"000", "different data", "even more different data"
Fundamentally, i could write this code like this:
List<string[]> newList = new List<string[]>();
foreach(var e in list)
{
foreach(var e2 in list2)
{
if (e[0] == e2[0])
newList.Add(e);
}
}
return newList;
But, i'm trying to use generics and lambda's more, so i'm looking for a nice clean solution. This one is frustrating me though.. maybe a Find inside of a Find?
EDIT:
Marc's answer below lead me to experiment with a varation that looks like this:
var z = list.Where(x => list2.Select(y => y[0]).Contains(x[0])).ToList();
I'm not sure how efficent this is, but it works and is sufficiently succinct. Anyone else have any suggestions?

You could join? I'd use two steps myself, though:
var keys = new HashSet<string>(list2.Select(x => x[0]));
var data = list.Where(x => keys.Contains(x[0]));
If you only have .NET 2.0, then either install LINQBridge and use the above (or similar with a Dictionary<> if LINQBridge doesn't include HashSet<>), or perhaps use nested Find:
var data = list.FindAll(arr => list2.Find(arr2 => arr2[0] == arr[0]) != null);
note though that the Find approach is O(n*m), where-as the HashSet<> approach is O(n+m)...

You could use the Intersect extension method in System.Linq, but you would need to provide an IEqualityComparer to do the work.
static void Main(string[] args)
{
List<string[]> data1 = new List<string[]>();
List<string[]> data2 = new List<string[]>();
var result = data1.Intersect(data2, new Comparer());
}
class Comparer : IEqualityComparer<string[]>
{
#region IEqualityComparer<string[]> Members
bool IEqualityComparer<string[]>.Equals(string[] x, string[] y)
{
return x[0] == y[0];
}
int IEqualityComparer<string[]>.GetHashCode(string[] obj)
{
return obj.GetHashCode();
}
#endregion
}

Intersect may work for you.
Intersect finds all the items that are in both lists.
Ok re-read the question. Intersect doesn't take the order into account.
I have written a slightly more complex linq expression that will return a list of items that are in the same position (index) with the same value.
List<String> list1 = new List<String>() {"000","33", "22", "11", "111"};
List<String> list2 = new List<String>() {"000", "22", "33", "11"};
List<String> subList = list1.Select ((value, index) => new { Value = value, Index = index})
.Where(w => list2.Skip(w.Index).FirstOrDefault() == w.Value )
.Select (s => s.Value).ToList();
Result: {"000", "11"}
Explanation of the query:
Select a set of values and position of that value.
Filter that set where the item in the same position in the second list has the same value.
Select just the value (not the index as well).
Note I used:
list2.Skip(w.Index).FirstOrDefault()
//instead of
list2[w.Index]
So that it will handle lists of different lengths.
If you know the lists will be the same length or list1 will always be shorter then list2[w.Index] would probably a bit faster.

Related

Filter a list of address objects by a list of string postcodes [duplicate]

I have a list of parameters like this:
public class parameter
{
public string name {get; set;}
public string paramtype {get; set;}
public string source {get; set;}
}
IEnumerable<Parameter> parameters;
And a array of strings i want to check it against.
string[] myStrings = new string[] { "one", "two"};
I want to iterate over the parameter list and check if the source property is equal to any of the myStrings array. I can do this with nested foreach's but i would like to learn how to do it in a nicer way as i have been playing around with linq and like the extension methods on enumerable like where etc so nested foreachs just feel wrong. Is there a more elegant preferred linq/lambda/delegete way to do this.
Thanks
You could use a nested Any() for this check which is available on any Enumerable:
bool hasMatch = myStrings.Any(x => parameters.Any(y => y.source == x));
Faster performing on larger collections would be to project parameters to source and then use Intersect which internally uses a HashSet<T> so instead of O(n^2) for the first approach (the equivalent of two nested loops) you can do the check in O(n) :
bool hasMatch = parameters.Select(x => x.source)
.Intersect(myStrings)
.Any();
Also as a side comment you should capitalize your class names and property names to conform with the C# style guidelines.
Here is a sample to find if there are match elements in another list
List<int> nums1 = new List<int> { 2, 4, 6, 8, 10 };
List<int> nums2 = new List<int> { 1, 3, 6, 9, 12};
if (nums1.Any(x => nums2.Any(y => y == x)))
{
Console.WriteLine("There are equal elements");
}
else
{
Console.WriteLine("No Match Found!");
}
If both the list are too big and when we use lamda expression then it will take a long time to fetch . Better to use linq in this case to fetch parameters list:
var items = (from x in parameters
join y in myStrings on x.Source equals y
select x)
.ToList();
list1.Select(l1 => l1.Id).Intersect(list2.Select(l2 => l2.Id)).ToList();
var list1 = await _service1.GetAll();
var list2 = await _service2.GetAll();
// Create a list of Ids from list1
var list1_Ids = list1.Select(l => l.Id).ToList();
// filter list2 according to list1 Ids
var list2 = list2.Where(l => list1_Ids.Contains(l.Id)).ToList();

How do you use LINQ to combine multiple lists in to one list, but with only what is common among all lists? [duplicate]

I have a list of lists which I want to find the intersection for like this:
var list1 = new List<int>() { 1, 2, 3 };
var list2 = new List<int>() { 2, 3, 4 };
var list3 = new List<int>() { 3, 4, 5 };
var listOfLists = new List<List<int>>() { list1, list2, list3 };
// expected intersection is List<int>() { 3 };
Is there some way to do this with IEnumerable.Intersect()?
EDIT:
I should have been more clear on this: I really have a list of lists, I don't know how many there will be, the three lists above was just an example, what I have is actually an IEnumerable<IEnumerable<SomeClass>>
SOLUTION
Thanks for all great answers. It turned out there were four options for solving this: List+aggregate (#Marcel Gosselin), List+foreach (#JaredPar, #Gabe Moothart), HashSet+aggregate (#jesperll) and HashSet+foreach (#Tony the Pony). I did some performance testing on these solutions (varying number of lists, number of elements in each list and random number max size.
It turns out that for most situations the HashSet performs better than the List (except with large lists and small random number size, because of the nature of HashSet I guess.)
I couldn't find any real difference between the foreach method and the aggregate method (the foreach method performs slightly better.)
To me, the aggregate method is really appealing (and I'm going with that as the accepted answer) but I wouldn't say it's the most readable solution.. Thanks again all!
How about:
var intersection = listOfLists
.Skip(1)
.Aggregate(
new HashSet<T>(listOfLists.First()),
(h, e) => { h.IntersectWith(e); return h; }
);
That way it's optimized by using the same HashSet throughout and still in a single statement. Just make sure that the listOfLists always contains at least one list.
You can indeed use Intersect twice. However, I believe this will be more efficient:
HashSet<int> hashSet = new HashSet<int>(list1);
hashSet.IntersectWith(list2);
hashSet.IntersectWith(list3);
List<int> intersection = hashSet.ToList();
Not an issue with small sets of course, but if you have a lot of large sets it could be significant.
Basically Enumerable.Intersect needs to create a set on each call - if you know that you're going to be doing more set operations, you might as well keep that set around.
As ever, keep a close eye on performance vs readability - the method chaining of calling Intersect twice is very appealing.
EDIT: For the updated question:
public List<T> IntersectAll<T>(IEnumerable<IEnumerable<T>> lists)
{
HashSet<T> hashSet = null;
foreach (var list in lists)
{
if (hashSet == null)
{
hashSet = new HashSet<T>(list);
}
else
{
hashSet.IntersectWith(list);
}
}
return hashSet == null ? new List<T>() : hashSet.ToList();
}
Or if you know it won't be empty, and that Skip will be relatively cheap:
public List<T> IntersectAll<T>(IEnumerable<IEnumerable<T>> lists)
{
HashSet<T> hashSet = new HashSet<T>(lists.First());
foreach (var list in lists.Skip(1))
{
hashSet.IntersectWith(list);
}
return hashSet.ToList();
}
Try this, it works but I'd really like to get rid of the .ToList() in the aggregate.
var list1 = new List<int>() { 1, 2, 3 };
var list2 = new List<int>() { 2, 3, 4 };
var list3 = new List<int>() { 3, 4, 5 };
var listOfLists = new List<List<int>>() { list1, list2, list3 };
var intersection = listOfLists.Aggregate((previousList, nextList) => previousList.Intersect(nextList).ToList());
Update:
Following comment from #pomber, it is possible to get rid of the ToList() inside the Aggregate call and move it outside to execute it only once. I did not test for performance whether previous code is faster than the new one. The change needed is to specify the generic type parameter of the Aggregate method on the last line like below:
var intersection = listOfLists.Aggregate<IEnumerable<int>>(
(previousList, nextList) => previousList.Intersect(nextList)
).ToList();
You could do the following
var result = list1.Intersect(list2).Intersect(list3).ToList();
This is my version of the solution with an extension method that I called IntersectMany.
public static IEnumerable<TResult> IntersectMany<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector)
{
using (var enumerator = source.GetEnumerator())
{
if(!enumerator.MoveNext())
return new TResult[0];
var ret = selector(enumerator.Current);
while (enumerator.MoveNext())
{
ret = ret.Intersect(selector(enumerator.Current));
}
return ret;
}
}
So the usage would be something like this:
var intersection = (new[] { list1, list2, list3 }).IntersectMany(l => l).ToList();
This is my one-row solution for List of List (ListOfLists) without intersect function:
var intersect = ListOfLists.SelectMany(x=>x).Distinct().Where(w=> ListOfLists.TrueForAll(t=>t.Contains(w))).ToList()
This should work for .net 4 (or later)
After searching the 'net and not really coming up with something I liked (or that worked), I slept on it and came up with this. Mine uses a class (SearchResult) which has an EmployeeId in it and that's the thing I need to be common across lists. I return all records that have an EmployeeId in every list. It's not fancy, but it's simple and easy to understand, just what I like. For small lists (my case) it should perform just fineā€”and anyone can understand it!
private List<SearchResult> GetFinalSearchResults(IEnumerable<IEnumerable<SearchResult>> lists)
{
Dictionary<int, SearchResult> oldList = new Dictionary<int, SearchResult>();
Dictionary<int, SearchResult> newList = new Dictionary<int, SearchResult>();
oldList = lists.First().ToDictionary(x => x.EmployeeId, x => x);
foreach (List<SearchResult> list in lists.Skip(1))
{
foreach (SearchResult emp in list)
{
if (oldList.Keys.Contains(emp.EmployeeId))
{
newList.Add(emp.EmployeeId, emp);
}
}
oldList = new Dictionary<int, SearchResult>(newList);
newList.Clear();
}
return oldList.Values.ToList();
}
Here's an example just using a list of ints, not a class (this was my original implementation).
static List<int> FindCommon(List<List<int>> items)
{
Dictionary<int, int> oldList = new Dictionary<int, int>();
Dictionary<int, int> newList = new Dictionary<int, int>();
oldList = items[0].ToDictionary(x => x, x => x);
foreach (List<int> list in items.Skip(1))
{
foreach (int i in list)
{
if (oldList.Keys.Contains(i))
{
newList.Add(i, i);
}
}
oldList = new Dictionary<int, int>(newList);
newList.Clear();
}
return oldList.Values.ToList();
}
This is a simple solution if your lists are all small. If you have larger lists, it's not as performing as hash set:
public static IEnumerable<T> IntersectMany<T>(this IEnumerable<IEnumerable<T>> input)
{
if (!input.Any())
return new List<T>();
return input.Aggregate(Enumerable.Intersect);
}

How do I access things returned by GroupBy

Ridiculously simple question that for the life of me I cant figure out. How do I 'get' at the values returned by GroupBy?
Take simple example below. I want to print out the first value that occurs more than once. Looking at the output in the watch window (image below) it sort of suggests that list3[0][0] might get at "one". But it gives me an error.
Note, I'm looking for the general solution - ie understanding what GroupBy returns.
Also, I would like to use the watch window to help me figure out for my self how I would access variables (as I find much of MSDN reference incomprehensible) - is this possible?
var list1 = new List<String>() {
"one", "two", "three", "one", "two"};
var list3 = list1
.GroupBy(x => x)
.Where(x => x.Count() > 1)
.ToList();
Console.WriteLine("list3[0][0]=" + list3[0][0]); //error
While the VS debugger shows you an "index" number because the underlying type is a collection, the grouping is exposed as an IGrouping<T> that does not have an indexer. If you just want the first item in the first group do:
Console.WriteLine("list3[0][0] =" + list3.First().First());
If you want to see all if the items you cam loop through the groupings:
int gi = 0, ii = 0;
foreach(var g in list3)
{
foreach(item i in g)
{
Console.WriteLine("list3[{0}][{1}] = {2}", gi, ii, i);
ii++;
}
gi++;
}
You are looking for the .Key property, as GroupBy returns an IEnumerable containing IGrouping elements.
If you look at the documentation of GroupBy you'll see it returns a IEnumerable<IGrouping<TKey, TSource>>.
IGrouping<TKey,TSource> has a single property Key and itself inherits IEnumerable<TElement>.
So you can enumerate over the list of items returned from a call to GroupBy and each element will have a Key property (which is whatever you grouped by) as well as enumerate each item (which will be the list of items grouped together)
Hopefully this demonstrates a bit clearer. Given a class:
public class Person
{
public string Name{get;set;}
public int Age{get;set;}
}
And a list:
var people = new List<Person>{
new Person{Name="Jamie",Age=35},
new Person{Name="Bob",Age=45},
new Person{Name="Fred",Age=35},
};
Grouping and enumerating as follows:
var groupedByAge = people.GroupBy(x => x.Age);
foreach(var item in groupedByAge)
{
Console.WriteLine("Age:{0}", item.Key);
foreach(var person in item)
{
Console.WriteLine("{0}",person.Name);
}
}
Gives this output:
Age:35
Jamie
Fred
Age:45
Bob
Live example: http://rextester.com/OWPR50756
GroupBy return an IEnumerable<IGrouping<TKey, TSource>> where each IGrouping<TKey, TElement> object contains a sequence of objects and a key it's not a Multidimensional Array which can be accessed by index [][].
To access the first element try this
static void Main(string[] args)
{
var list1 = new List<String>() {
"one", "two", "three", "one", "two"};
var list3 = list1
.GroupBy(x => x)
.Where(x => x.Count() > 1)
.ToList();
Console.WriteLine("list3[0][0]=" + list3[0].ToList()[0].ToString());
//OR Console.WriteLine("list3[0][0]=" + list3[0].First());
}

Intersect Two Lists in C#

I have two lists:
List<int> data1 = new List<int> {1,2,3,4,5};
List<string> data2 = new List<string>{"6","3"};
I want do to something like
var newData = data1.intersect(data2, lambda expression);
The lambda expression should return true if data1[index].ToString() == data2[index]
You need to first transform data1, in your case by calling ToString() on each element.
Use this if you want to return strings.
List<int> data1 = new List<int> {1,2,3,4,5};
List<string> data2 = new List<string>{"6","3"};
var newData = data1.Select(i => i.ToString()).Intersect(data2);
Use this if you want to return integers.
List<int> data1 = new List<int> {1,2,3,4,5};
List<string> data2 = new List<string>{"6","3"};
var newData = data1.Intersect(data2.Select(s => int.Parse(s));
Note that this will throw an exception if not all strings are numbers. So you could do the following first to check:
int temp;
if(data2.All(s => int.TryParse(s, out temp)))
{
// All data2 strings are int's
}
If you have objects, not structs (or strings), then you'll have to intersect their keys first, and then select objects by those keys:
var ids = list1.Select(x => x.Id).Intersect(list2.Select(x => x.Id));
var result = list1.Where(x => ids.Contains(x.Id));
From performance point of view if two lists contain number of elements that differ significantly, you can try such approach (using conditional operator ?:):
1.First you need to declare a converter:
Converter<string, int> del = delegate(string s) { return Int32.Parse(s); };
2.Then you use a conditional operator:
var r = data1.Count > data2.Count ?
data2.ConvertAll<int>(del).Intersect(data1) :
data1.Select(v => v.ToString()).Intersect(data2).ToList<string>().ConvertAll<int>(del);
You convert elements of shorter list to match the type of longer list. Imagine an execution speed if your first set contains 1000 elements and second only 10 (or opposite as it doesn't matter) ;-)
As you want to have a result as List, in a last line you convert the result (only result) back to int.
public static List<T> ListCompare<T>(List<T> List1 , List<T> List2 , string key )
{
return List1.Select(t => t.GetType().GetProperty(key).GetValue(t))
.Intersect(List2.Select(t => t.GetType().GetProperty(key).GetValue(t))).ToList();
}

Intersection of 6 List<int> objects

As I mentioned in the title I've got 6 List objects in my hand. I want to find the intersection of them except the ones who has no item.
intersectionResultSet =
list1.
Intersect(list2).
Intersect(list3).
Intersect(list4).
Intersect(list5).
Intersect(list6).ToList();
When one of them has no item, normally I get empty set as a result. So I want to exclude the ones that has no item from intersection operation. What's the best way to do that?
Thanks in advance,
You could use something like this:
// Your handful of lists
IEnumerable<IEnumerable<int>> lists = new[]
{
new List<int> { 1, 2, 3 },
new List<int>(),
null,
new List<int> { 2, 3, 4 }
};
List<int> intersection = lists
.Where(c => c != null && c.Any())
.Aggregate(Enumerable.Intersect)
.ToList();
foreach (int value in intersection)
{
Console.WriteLine(value);
}
This has been tested and produces the following output:
2
3
With thanks to #Matajon for pointing out a cleaner (and more performant) use of Enumerable.Intersect in the Aggregate function.
Simply, using LINQ too.
var lists = new List<IEnumerable<int>>() { list1, list2, list3, list4, list5, list6 };
var result = lists
.Where(x => x.Any())
.Aggregate(Enumerable.Intersect)
.ToList();
You could use LINQ to get all the list that are longer then 0 , and then send them to the function you've described.
Another option :
Override/Extend "Intersect" to a function that does Intersect on a list only if it's not empty , and call it instead of Intersect.

Categories