How take each two items from IEnumerable as a pair? - c#

I have IEnumerable<string> which looks like {"First", "1", "Second", "2", ... }.
I need to iterate through the list and create IEnumerable<Tuple<string, string>> where Tuples will look like:
"First", "1"
"Second", "2"
So I need to create pairs from a list I have to get pairs as mentioned above.

A lazy extension method to achieve this is:
public static IEnumerable<Tuple<T, T>> Tupelize<T>(this IEnumerable<T> source)
{
using (var enumerator = source.GetEnumerator())
while (enumerator.MoveNext())
{
var item1 = enumerator.Current;
if (!enumerator.MoveNext())
throw new ArgumentException();
var item2 = enumerator.Current;
yield return new Tuple<T, T>(item1, item2);
}
}
Note that if the number of elements happens to not be even this will throw. Another way would be to use this extensions method to split the source collection into chunks of 2:
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> list, int batchSize)
{
var batch = new List<T>(batchSize);
foreach (var item in list)
{
batch.Add(item);
if (batch.Count == batchSize)
{
yield return batch;
batch = new List<T>(batchSize);
}
}
if (batch.Count > 0)
yield return batch;
}
Then you can do:
var tuples = items.Chunk(2)
.Select(x => new Tuple<string, string>(x.First(), x.Skip(1).First()))
.ToArray();
Finally, to use only existing extension methods:
var tuples = items.Where((x, i) => i % 2 == 0)
.Zip(items.Where((x, i) => i % 2 == 1),
(a, b) => new Tuple<string, string>(a, b))
.ToArray();

morelinq contains a Batch extension method which can do what you want:
var str = new string[] { "First", "1", "Second", "2", "Third", "3" };
var tuples = str.Batch(2, r => new Tuple<string, string>(r.FirstOrDefault(), r.LastOrDefault()));

You could do something like:
var pairs = source.Select((value, index) => new {Index = index, Value = value})
.GroupBy(x => x.Index / 2)
.Select(g => new Tuple<string, string>(g.ElementAt(0).Value,
g.ElementAt(1).Value));
This will get you an IEnumerable<Tuple<string, string>>. It works by grouping the elements by their odd/even positions and then expanding each group into a Tuple. The benefit of this approach over the Zip approach suggested by BrokenGlass is that it only enumerates the original enumerable once.
It is however hard for someone to understand at first glance, so I would either do it another way (ie. not using linq), or document its intention next to where it is used.

You can make this work using the LINQ .Zip() extension method:
IEnumerable<string> source = new List<string> { "First", "1", "Second", "2" };
var tupleList = source.Zip(source.Skip(1),
(a, b) => new Tuple<string, string>(a, b))
.Where((x, i) => i % 2 == 0)
.ToList();
Basically the approach is zipping up the source Enumerable with itself, skipping the first element so the second enumeration is one off - that will give you the pairs ("First, "1"), ("1", "Second"), ("Second", "2").
Then we are filtering the odd tuples since we don't want those and end up with the right tuple pairs ("First, "1"), ("Second", "2") and so on.
Edit:
I actually agree with the sentiment of the comments - this is what I would consider "clever" code - looks smart, but has obvious (and not so obvious) downsides:
Performance: the Enumerable has to
be traversed twice - for the same
reason it cannot be used on
Enumerables that consume their
source, i.e. data from network
streams.
Maintenance: It's not obvious what
the code does - if someone else is
tasked to maintain the code there
might be trouble ahead, especially
given point 1.
Having said that, I'd probably use a good old foreach loop myself given the choice, or with a list as source collection a for loop so I can use the index directly.

IEnumerable<T> items = ...;
using (var enumerator = items.GetEnumerator())
{
while (enumerator.MoveNext())
{
T first = enumerator.Current;
bool hasSecond = enumerator.MoveNext();
Trace.Assert(hasSecond, "Collection must have even number of elements.");
T second = enumerator.Current;
var tuple = new Tuple<T, T>(first, second);
//Now you have the tuple
}
}

Starting from NET 6.0, you can use
Enumerable.Chunk(IEnumerable, Int32)
var tuples = new[] {"First", "1", "Second", "2", "Incomplete" }
.Chunk(2)
.Where(chunk => chunk.Length == 2)
.Select(chunk => (chunk[0], chunk[1]));

If you are using .NET 4.0, then you can use tuple object (see http://mutelight.org/articles/finally-tuples-in-c-sharp.html). Together with LINQ it should give you what you need. If not, then you probably need to define your own tuples to do that or encode those strings like for example "First:1", "Second:2" and then decode it (also with LINQ).

Related

How do you use LINQ to combine multiple lists in to one list, but with only what is common among all lists? [duplicate]

I have a list of lists which I want to find the intersection for like this:
var list1 = new List<int>() { 1, 2, 3 };
var list2 = new List<int>() { 2, 3, 4 };
var list3 = new List<int>() { 3, 4, 5 };
var listOfLists = new List<List<int>>() { list1, list2, list3 };
// expected intersection is List<int>() { 3 };
Is there some way to do this with IEnumerable.Intersect()?
EDIT:
I should have been more clear on this: I really have a list of lists, I don't know how many there will be, the three lists above was just an example, what I have is actually an IEnumerable<IEnumerable<SomeClass>>
SOLUTION
Thanks for all great answers. It turned out there were four options for solving this: List+aggregate (#Marcel Gosselin), List+foreach (#JaredPar, #Gabe Moothart), HashSet+aggregate (#jesperll) and HashSet+foreach (#Tony the Pony). I did some performance testing on these solutions (varying number of lists, number of elements in each list and random number max size.
It turns out that for most situations the HashSet performs better than the List (except with large lists and small random number size, because of the nature of HashSet I guess.)
I couldn't find any real difference between the foreach method and the aggregate method (the foreach method performs slightly better.)
To me, the aggregate method is really appealing (and I'm going with that as the accepted answer) but I wouldn't say it's the most readable solution.. Thanks again all!
How about:
var intersection = listOfLists
.Skip(1)
.Aggregate(
new HashSet<T>(listOfLists.First()),
(h, e) => { h.IntersectWith(e); return h; }
);
That way it's optimized by using the same HashSet throughout and still in a single statement. Just make sure that the listOfLists always contains at least one list.
You can indeed use Intersect twice. However, I believe this will be more efficient:
HashSet<int> hashSet = new HashSet<int>(list1);
hashSet.IntersectWith(list2);
hashSet.IntersectWith(list3);
List<int> intersection = hashSet.ToList();
Not an issue with small sets of course, but if you have a lot of large sets it could be significant.
Basically Enumerable.Intersect needs to create a set on each call - if you know that you're going to be doing more set operations, you might as well keep that set around.
As ever, keep a close eye on performance vs readability - the method chaining of calling Intersect twice is very appealing.
EDIT: For the updated question:
public List<T> IntersectAll<T>(IEnumerable<IEnumerable<T>> lists)
{
HashSet<T> hashSet = null;
foreach (var list in lists)
{
if (hashSet == null)
{
hashSet = new HashSet<T>(list);
}
else
{
hashSet.IntersectWith(list);
}
}
return hashSet == null ? new List<T>() : hashSet.ToList();
}
Or if you know it won't be empty, and that Skip will be relatively cheap:
public List<T> IntersectAll<T>(IEnumerable<IEnumerable<T>> lists)
{
HashSet<T> hashSet = new HashSet<T>(lists.First());
foreach (var list in lists.Skip(1))
{
hashSet.IntersectWith(list);
}
return hashSet.ToList();
}
Try this, it works but I'd really like to get rid of the .ToList() in the aggregate.
var list1 = new List<int>() { 1, 2, 3 };
var list2 = new List<int>() { 2, 3, 4 };
var list3 = new List<int>() { 3, 4, 5 };
var listOfLists = new List<List<int>>() { list1, list2, list3 };
var intersection = listOfLists.Aggregate((previousList, nextList) => previousList.Intersect(nextList).ToList());
Update:
Following comment from #pomber, it is possible to get rid of the ToList() inside the Aggregate call and move it outside to execute it only once. I did not test for performance whether previous code is faster than the new one. The change needed is to specify the generic type parameter of the Aggregate method on the last line like below:
var intersection = listOfLists.Aggregate<IEnumerable<int>>(
(previousList, nextList) => previousList.Intersect(nextList)
).ToList();
You could do the following
var result = list1.Intersect(list2).Intersect(list3).ToList();
This is my version of the solution with an extension method that I called IntersectMany.
public static IEnumerable<TResult> IntersectMany<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector)
{
using (var enumerator = source.GetEnumerator())
{
if(!enumerator.MoveNext())
return new TResult[0];
var ret = selector(enumerator.Current);
while (enumerator.MoveNext())
{
ret = ret.Intersect(selector(enumerator.Current));
}
return ret;
}
}
So the usage would be something like this:
var intersection = (new[] { list1, list2, list3 }).IntersectMany(l => l).ToList();
This is my one-row solution for List of List (ListOfLists) without intersect function:
var intersect = ListOfLists.SelectMany(x=>x).Distinct().Where(w=> ListOfLists.TrueForAll(t=>t.Contains(w))).ToList()
This should work for .net 4 (or later)
After searching the 'net and not really coming up with something I liked (or that worked), I slept on it and came up with this. Mine uses a class (SearchResult) which has an EmployeeId in it and that's the thing I need to be common across lists. I return all records that have an EmployeeId in every list. It's not fancy, but it's simple and easy to understand, just what I like. For small lists (my case) it should perform just fine—and anyone can understand it!
private List<SearchResult> GetFinalSearchResults(IEnumerable<IEnumerable<SearchResult>> lists)
{
Dictionary<int, SearchResult> oldList = new Dictionary<int, SearchResult>();
Dictionary<int, SearchResult> newList = new Dictionary<int, SearchResult>();
oldList = lists.First().ToDictionary(x => x.EmployeeId, x => x);
foreach (List<SearchResult> list in lists.Skip(1))
{
foreach (SearchResult emp in list)
{
if (oldList.Keys.Contains(emp.EmployeeId))
{
newList.Add(emp.EmployeeId, emp);
}
}
oldList = new Dictionary<int, SearchResult>(newList);
newList.Clear();
}
return oldList.Values.ToList();
}
Here's an example just using a list of ints, not a class (this was my original implementation).
static List<int> FindCommon(List<List<int>> items)
{
Dictionary<int, int> oldList = new Dictionary<int, int>();
Dictionary<int, int> newList = new Dictionary<int, int>();
oldList = items[0].ToDictionary(x => x, x => x);
foreach (List<int> list in items.Skip(1))
{
foreach (int i in list)
{
if (oldList.Keys.Contains(i))
{
newList.Add(i, i);
}
}
oldList = new Dictionary<int, int>(newList);
newList.Clear();
}
return oldList.Values.ToList();
}
This is a simple solution if your lists are all small. If you have larger lists, it's not as performing as hash set:
public static IEnumerable<T> IntersectMany<T>(this IEnumerable<IEnumerable<T>> input)
{
if (!input.Any())
return new List<T>();
return input.Aggregate(Enumerable.Intersect);
}

LINQ Aggregate special attention for last element of list

I would like to pass in a different function to Aggregate for the last element in the collection.
A use for this would be:
List<string> listString = new List{"1", "2", "3"};
string joined = listString.Aggregate(new StringBuilder(),
(sb,s) => sb.Append(s).Append(", "),
(sb,s) => sb.Append(s)).ToString();
//joined => "1, 2, 3"
What would be a custom implementation if no other exists?
P.S. I would like to do this w/ composable functions iterating once through the collection. In other words, I do not want to do a Select wrapped in a String.Join
Aggregate does not allow that in natural way.
You can carry previous element and do you final handling after Aggregate. Also I think your best bet would be to write custom method that does that custom handling for last (and possibly first) element.
Some approximate code to special case last item with Aggregate (does not handle most special case like empty/short list):
var firstLast = seq.Aggregate(
Tuple.Create(new StringBuilder(), default(string)),
(sum, cur) =>
{
if (sum.Item2 != null)
{
sum.Item1.Append(",");
sum.Item1.Append(sum.Item2);
}
return Tuple.Create(sum.Item1, cur);
});
firstLast.Item1.Append(SpecialProcessingForLast(sum.Item2));
return firstLast.Item1.ToString();
Aggregate with special case for "last". Sample is ready to copy/paste to LinqPad/console app, uncomment "this" when making extension function. Main shows aggregating array with summing all but last element, last one is subtracted from result:
void Main()
{
Console.WriteLine(AggregateWithLast(new[] {1,1,1,-3}, 0, (s,c)=>s+c, (s,c)=>s-c));
Console.WriteLine(AggregateWithLast(new[] {1,1,1,+3}, 0, (s,c)=>s+c, (s,c)=>s-c));
}
public static TAccumulate AggregateWithLast<TSource, TAccumulate>(
/*this */ IEnumerable<TSource> source,
TAccumulate seed,
Func<TAccumulate, TSource, TAccumulate> funcAll,
Func<TAccumulate, TSource, TAccumulate> funcLast)
{
using (IEnumerator<TSource> sourceIterator = source.GetEnumerator())
{
if (!sourceIterator.MoveNext())
{
return seed;
}
TSource last = sourceIterator.Current;
TAccumulate total = seed;
while (sourceIterator.MoveNext())
{
total = funcAll(total, last);
last = sourceIterator.Current;
}
return funcLast(total, last);
}
}
Note: if you need just String.Join than one in .Net 4.0+ takes IEnumerable<T> - so it will iterate sequence only once without need to ToList/ToArray.
Another approach for your particular example, is to skip the comma for the first element and prepend it to the tail elements, like this:
List<string> listString = new() { "1", "2", "3" };
string joined = listString
.Select((value, index) => (value, index))
.Aggregate(new StringBuilder(), (sb, s) =>
s.index == 0
? sb.Append(s.value)
: sb.Append(", ").Append(s.value))
.ToString();
I know this does not address the question in the title, but for most "join things with some infix" problems, this works well. That is, when string.Join is not the solution.

How to get duplicate items from a list using LINQ? [duplicate]

This question already has answers here:
C# LINQ find duplicates in List
(13 answers)
Closed 3 years ago.
I'm having a List<string> like:
List<String> list = new List<String>{"6","1","2","4","6","5","1"};
I need to get the duplicate items in the list into a new list. Now I'm using a nested for loop to do this.
The resulting list will contain {"6","1"}.
Is there any idea to do this using LINQ or lambda expressions?
var duplicates = lst.GroupBy(s => s)
.SelectMany(grp => grp.Skip(1));
Note that this will return all duplicates, so if you only want to know which items are duplicated in the source list, you could apply Distinct to the resulting sequence or use the solution given by Mark Byers.
Here is one way to do it:
List<String> duplicates = lst.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(g => g.Key)
.ToList();
The GroupBy groups the elements that are the same together, and the Where filters out those that only appear once, leaving you with only the duplicates.
Here's another option:
var list = new List<string> { "6", "1", "2", "4", "6", "5", "1" };
var set = new HashSet<string>();
var duplicates = list.Where(x => !set.Add(x));
I know it's not the answer to the original question, but you may find yourself here with this problem.
If you want all of the duplicate items in your results, the following works.
var duplicates = list
.GroupBy( x => x ) // group matching items
.Where( g => g.Skip(1).Any() ) // where the group contains more than one item
.SelectMany( g => g ); // re-expand the groups with more than one item
In my situation I need all duplicates so that I can mark them in the UI as being errors.
I wrote this extension method based off #Lee's response to the OP. Note, a default parameter was used (requiring C# 4.0). However, an overloaded method call in C# 3.0 would suffice.
/// <summary>
/// Method that returns all the duplicates (distinct) in the collection.
/// </summary>
/// <typeparam name="T">The type of the collection.</typeparam>
/// <param name="source">The source collection to detect for duplicates</param>
/// <param name="distinct">Specify <b>true</b> to only return distinct elements.</param>
/// <returns>A distinct list of duplicates found in the source collection.</returns>
/// <remarks>This is an extension method to IEnumerable<T></remarks>
public static IEnumerable<T> Duplicates<T>
(this IEnumerable<T> source, bool distinct = true)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
// select the elements that are repeated
IEnumerable<T> result = source.GroupBy(a => a).SelectMany(a => a.Skip(1));
// distinct?
if (distinct == true)
{
// deferred execution helps us here
result = result.Distinct();
}
return result;
}
List<String> list = new List<String> { "6", "1", "2", "4", "6", "5", "1" };
var q = from s in list
group s by s into g
where g.Count() > 1
select g.First();
foreach (var item in q)
{
Console.WriteLine(item);
}
Hope this wil help
int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };
var duplicates = listOfItems
.GroupBy(i => i)
.Where(g => g.Count() > 1)
.Select(g => g.Key);
foreach (var d in duplicates)
Console.WriteLine(d);
I was trying to solve the same with a list of objects and was having issues because I was trying to repack the list of groups into the original list. So I came up with looping through the groups to repack the original List with items that have duplicates.
public List<MediaFileInfo> GetDuplicatePictures()
{
List<MediaFileInfo> dupes = new List<MediaFileInfo>();
var grpDupes = from f in _fileRepo
group f by f.Length into grps
where grps.Count() >1
select grps;
foreach (var item in grpDupes)
{
foreach (var thing in item)
{
dupes.Add(thing);
}
}
return dupes;
}
All mentioned solutions until now perform a GroupBy. Even if I only need the first Duplicate all elements of the collections are enumerated at least once.
The following extension function stops enumerating as soon as a duplicate has been found. It continues if a next duplicate is requested.
As always in LINQ there are two versions, one with IEqualityComparer and one without it.
public static IEnumerable<TSource> ExtractDuplicates(this IEnumerable<TSource> source)
{
return source.ExtractDuplicates(null);
}
public static IEnumerable<TSource> ExtractDuplicates(this IEnumerable<TSource source,
IEqualityComparer<TSource> comparer);
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (comparer == null)
comparer = EqualityCompare<TSource>.Default;
HashSet<TSource> foundElements = new HashSet<TSource>(comparer);
foreach (TSource sourceItem in source)
{
if (!foundElements.Contains(sourceItem))
{ // we've not seen this sourceItem before. Add to the foundElements
foundElements.Add(sourceItem);
}
else
{ // we've seen this item before. It is a duplicate!
yield return sourceItem;
}
}
}
Usage:
IEnumerable<MyClass> myObjects = ...
// check if has duplicates:
bool hasDuplicates = myObjects.ExtractDuplicates().Any();
// or find the first three duplicates:
IEnumerable<MyClass> first3Duplicates = myObjects.ExtractDuplicates().Take(3)
// or find the first 5 duplicates that have a Name = "MyName"
IEnumerable<MyClass> myNameDuplicates = myObjects.ExtractDuplicates()
.Where(duplicate => duplicate.Name == "MyName")
.Take(5);
For all these linq statements the collection is only parsed until the requested items are found. The rest of the sequence is not interpreted.
IMHO that is an efficiency boost to consider.

Intersection of multiple lists with IEnumerable.Intersect()

I have a list of lists which I want to find the intersection for like this:
var list1 = new List<int>() { 1, 2, 3 };
var list2 = new List<int>() { 2, 3, 4 };
var list3 = new List<int>() { 3, 4, 5 };
var listOfLists = new List<List<int>>() { list1, list2, list3 };
// expected intersection is List<int>() { 3 };
Is there some way to do this with IEnumerable.Intersect()?
EDIT:
I should have been more clear on this: I really have a list of lists, I don't know how many there will be, the three lists above was just an example, what I have is actually an IEnumerable<IEnumerable<SomeClass>>
SOLUTION
Thanks for all great answers. It turned out there were four options for solving this: List+aggregate (#Marcel Gosselin), List+foreach (#JaredPar, #Gabe Moothart), HashSet+aggregate (#jesperll) and HashSet+foreach (#Tony the Pony). I did some performance testing on these solutions (varying number of lists, number of elements in each list and random number max size.
It turns out that for most situations the HashSet performs better than the List (except with large lists and small random number size, because of the nature of HashSet I guess.)
I couldn't find any real difference between the foreach method and the aggregate method (the foreach method performs slightly better.)
To me, the aggregate method is really appealing (and I'm going with that as the accepted answer) but I wouldn't say it's the most readable solution.. Thanks again all!
How about:
var intersection = listOfLists
.Skip(1)
.Aggregate(
new HashSet<T>(listOfLists.First()),
(h, e) => { h.IntersectWith(e); return h; }
);
That way it's optimized by using the same HashSet throughout and still in a single statement. Just make sure that the listOfLists always contains at least one list.
You can indeed use Intersect twice. However, I believe this will be more efficient:
HashSet<int> hashSet = new HashSet<int>(list1);
hashSet.IntersectWith(list2);
hashSet.IntersectWith(list3);
List<int> intersection = hashSet.ToList();
Not an issue with small sets of course, but if you have a lot of large sets it could be significant.
Basically Enumerable.Intersect needs to create a set on each call - if you know that you're going to be doing more set operations, you might as well keep that set around.
As ever, keep a close eye on performance vs readability - the method chaining of calling Intersect twice is very appealing.
EDIT: For the updated question:
public List<T> IntersectAll<T>(IEnumerable<IEnumerable<T>> lists)
{
HashSet<T> hashSet = null;
foreach (var list in lists)
{
if (hashSet == null)
{
hashSet = new HashSet<T>(list);
}
else
{
hashSet.IntersectWith(list);
}
}
return hashSet == null ? new List<T>() : hashSet.ToList();
}
Or if you know it won't be empty, and that Skip will be relatively cheap:
public List<T> IntersectAll<T>(IEnumerable<IEnumerable<T>> lists)
{
HashSet<T> hashSet = new HashSet<T>(lists.First());
foreach (var list in lists.Skip(1))
{
hashSet.IntersectWith(list);
}
return hashSet.ToList();
}
Try this, it works but I'd really like to get rid of the .ToList() in the aggregate.
var list1 = new List<int>() { 1, 2, 3 };
var list2 = new List<int>() { 2, 3, 4 };
var list3 = new List<int>() { 3, 4, 5 };
var listOfLists = new List<List<int>>() { list1, list2, list3 };
var intersection = listOfLists.Aggregate((previousList, nextList) => previousList.Intersect(nextList).ToList());
Update:
Following comment from #pomber, it is possible to get rid of the ToList() inside the Aggregate call and move it outside to execute it only once. I did not test for performance whether previous code is faster than the new one. The change needed is to specify the generic type parameter of the Aggregate method on the last line like below:
var intersection = listOfLists.Aggregate<IEnumerable<int>>(
(previousList, nextList) => previousList.Intersect(nextList)
).ToList();
You could do the following
var result = list1.Intersect(list2).Intersect(list3).ToList();
This is my version of the solution with an extension method that I called IntersectMany.
public static IEnumerable<TResult> IntersectMany<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector)
{
using (var enumerator = source.GetEnumerator())
{
if(!enumerator.MoveNext())
return new TResult[0];
var ret = selector(enumerator.Current);
while (enumerator.MoveNext())
{
ret = ret.Intersect(selector(enumerator.Current));
}
return ret;
}
}
So the usage would be something like this:
var intersection = (new[] { list1, list2, list3 }).IntersectMany(l => l).ToList();
This is my one-row solution for List of List (ListOfLists) without intersect function:
var intersect = ListOfLists.SelectMany(x=>x).Distinct().Where(w=> ListOfLists.TrueForAll(t=>t.Contains(w))).ToList()
This should work for .net 4 (or later)
After searching the 'net and not really coming up with something I liked (or that worked), I slept on it and came up with this. Mine uses a class (SearchResult) which has an EmployeeId in it and that's the thing I need to be common across lists. I return all records that have an EmployeeId in every list. It's not fancy, but it's simple and easy to understand, just what I like. For small lists (my case) it should perform just fine—and anyone can understand it!
private List<SearchResult> GetFinalSearchResults(IEnumerable<IEnumerable<SearchResult>> lists)
{
Dictionary<int, SearchResult> oldList = new Dictionary<int, SearchResult>();
Dictionary<int, SearchResult> newList = new Dictionary<int, SearchResult>();
oldList = lists.First().ToDictionary(x => x.EmployeeId, x => x);
foreach (List<SearchResult> list in lists.Skip(1))
{
foreach (SearchResult emp in list)
{
if (oldList.Keys.Contains(emp.EmployeeId))
{
newList.Add(emp.EmployeeId, emp);
}
}
oldList = new Dictionary<int, SearchResult>(newList);
newList.Clear();
}
return oldList.Values.ToList();
}
Here's an example just using a list of ints, not a class (this was my original implementation).
static List<int> FindCommon(List<List<int>> items)
{
Dictionary<int, int> oldList = new Dictionary<int, int>();
Dictionary<int, int> newList = new Dictionary<int, int>();
oldList = items[0].ToDictionary(x => x, x => x);
foreach (List<int> list in items.Skip(1))
{
foreach (int i in list)
{
if (oldList.Keys.Contains(i))
{
newList.Add(i, i);
}
}
oldList = new Dictionary<int, int>(newList);
newList.Clear();
}
return oldList.Values.ToList();
}
This is a simple solution if your lists are all small. If you have larger lists, it's not as performing as hash set:
public static IEnumerable<T> IntersectMany<T>(this IEnumerable<IEnumerable<T>> input)
{
if (!input.Any())
return new List<T>();
return input.Aggregate(Enumerable.Intersect);
}

Using lambda expressions to get a subset where array elements are equal

I have an interesting problem, and I can't seem to figure out the lambda expression to make this work.
I have the following code:
List<string[]> list = GetSomeData(); // Returns large number of string[]'s
List<string[]> list2 = GetSomeData2(); // similar data, but smaller subset
List<string[]> newList = list.FindAll(predicate(string[] line){
return (???);
});
I want to return only those records in list in which element 0 of each string[] is equal to one of the element 0's in list2.
list contains data like this:
"000", "Data", "more data", "etc..."
list2 contains data like this:
"000", "different data", "even more different data"
Fundamentally, i could write this code like this:
List<string[]> newList = new List<string[]>();
foreach(var e in list)
{
foreach(var e2 in list2)
{
if (e[0] == e2[0])
newList.Add(e);
}
}
return newList;
But, i'm trying to use generics and lambda's more, so i'm looking for a nice clean solution. This one is frustrating me though.. maybe a Find inside of a Find?
EDIT:
Marc's answer below lead me to experiment with a varation that looks like this:
var z = list.Where(x => list2.Select(y => y[0]).Contains(x[0])).ToList();
I'm not sure how efficent this is, but it works and is sufficiently succinct. Anyone else have any suggestions?
You could join? I'd use two steps myself, though:
var keys = new HashSet<string>(list2.Select(x => x[0]));
var data = list.Where(x => keys.Contains(x[0]));
If you only have .NET 2.0, then either install LINQBridge and use the above (or similar with a Dictionary<> if LINQBridge doesn't include HashSet<>), or perhaps use nested Find:
var data = list.FindAll(arr => list2.Find(arr2 => arr2[0] == arr[0]) != null);
note though that the Find approach is O(n*m), where-as the HashSet<> approach is O(n+m)...
You could use the Intersect extension method in System.Linq, but you would need to provide an IEqualityComparer to do the work.
static void Main(string[] args)
{
List<string[]> data1 = new List<string[]>();
List<string[]> data2 = new List<string[]>();
var result = data1.Intersect(data2, new Comparer());
}
class Comparer : IEqualityComparer<string[]>
{
#region IEqualityComparer<string[]> Members
bool IEqualityComparer<string[]>.Equals(string[] x, string[] y)
{
return x[0] == y[0];
}
int IEqualityComparer<string[]>.GetHashCode(string[] obj)
{
return obj.GetHashCode();
}
#endregion
}
Intersect may work for you.
Intersect finds all the items that are in both lists.
Ok re-read the question. Intersect doesn't take the order into account.
I have written a slightly more complex linq expression that will return a list of items that are in the same position (index) with the same value.
List<String> list1 = new List<String>() {"000","33", "22", "11", "111"};
List<String> list2 = new List<String>() {"000", "22", "33", "11"};
List<String> subList = list1.Select ((value, index) => new { Value = value, Index = index})
.Where(w => list2.Skip(w.Index).FirstOrDefault() == w.Value )
.Select (s => s.Value).ToList();
Result: {"000", "11"}
Explanation of the query:
Select a set of values and position of that value.
Filter that set where the item in the same position in the second list has the same value.
Select just the value (not the index as well).
Note I used:
list2.Skip(w.Index).FirstOrDefault()
//instead of
list2[w.Index]
So that it will handle lists of different lengths.
If you know the lists will be the same length or list1 will always be shorter then list2[w.Index] would probably a bit faster.

Categories