Looping through to get duplicates in an object C#

I have a collection of objects where one property can have duplicate values. I am trying to find an algorithm that loops through the collection and builds a new list of objects grouped by those duplicate values.
How do I loop through the collection and create a new object in C#? I have a view model, e.g.:
public class pViewModel {
public string itemFullName { get; set; }
public Item Item { get; set; }
public string itemAddress { get; set; }
public string itemCountry { get; set; }
public string addressId { get; set; }
}
public class Item {
public int itemId { get; set; }
}
I want to create a new object after finding entries with a matching full name but different ids, so my new object will have itemFullName, item.itemId (pipe-delimited values for all the items in the group), itemAddress and itemCountry in it.
Any help will be awesome. Thank you.
Someone pointed me to this:
var itemsAndIds = list
.GroupBy(m => m.itemFullName, m => m.Item.itemId)
.Select(g => new { ItemFullName = g.Key, ItemIds = string.Join("|", g) });
But now I need the new properties added to this object.

This answer could help you:
https://stackoverflow.com/a/5232194/1341189
Given that ItemAddress and ItemCountry are always the same for a given ItemFullName in your example, you can do something like this:
var itemsAndIds = list
.GroupBy(m => new {
ItemFullName = m.itemFullName,
ItemAddress = m.itemAddress,
ItemCountry = m.itemCountry },
m => m.Item.itemId)
.Select(g => new {
ItemFullName = g.Key.ItemFullName,
ItemIds = string.Join("|", g),
ItemAddress = g.Key.ItemAddress,
ItemCountry = g.Key.ItemCountry});
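If the grouped result needs to leave the method (anonymous types are awkward to pass around), the same projection can target a small named class instead. A sketch; GroupedItemViewModel is a made-up name, and the grouping properties simply mirror the query above:
public class GroupedItemViewModel {
    public string ItemFullName { get; set; }
    public string ItemIds { get; set; } // pipe-delimited item ids
    public string ItemAddress { get; set; }
    public string ItemCountry { get; set; }
}
var groupedItems = list
    .GroupBy(m => new { m.itemFullName, m.itemAddress, m.itemCountry },
             m => m.Item.itemId)
    .Select(g => new GroupedItemViewModel {
        ItemFullName = g.Key.itemFullName,
        ItemIds = string.Join("|", g),
        ItemAddress = g.Key.itemAddress,
        ItemCountry = g.Key.itemCountry })
    .ToList();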
Also I would like to suggest that you read the Microsoft Naming Conventions.
Property Names should be written in PascalCase (same goes for Class Names).
https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/names-of-type-members

You could override the bool Equals(object obj) method in your class. This way you can group by whole items and then just select new ones. You keep your comparison logic sealed inside the object, so you can easily use built-in mechanics like == comparison or all the LINQ magic without any hacks, and the class stays in line with SOLID principles.
Keep in mind that to preserve consistency, if you override Equals it is also worth overriding the int GetHashCode() method. This way you keep the comparison consistent (to be precise, you need to make sure your implementation follows these three main rules):
a == a and a.Equals(a) should always be true (Reflexivity).
a == b, b == a, a.Equals(b) and b.Equals(a) should always give the same result. (Symmetry)
If a == b is true and b == c is true, then a == c should also be true (Transitivity). The same applies to a.Equals(b), b.Equals(c) and a.Equals(c).
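As a rough illustration of that suggestion, here is a minimal sketch of what overriding Equals and GetHashCode could look like for the view model from the question, assuming equality should be based on itemFullName, itemAddress and itemCountry (HashCode.Combine needs .NET Core 2.1 or later; on older frameworks you can combine the hashes manually):
public class pViewModel {
    public string itemFullName { get; set; }
    public Item Item { get; set; }
    public string itemAddress { get; set; }
    public string itemCountry { get; set; }
    public string addressId { get; set; }
    // Two view models are considered equal when the grouping properties match.
    public override bool Equals(object obj) {
        return obj is pViewModel other
            && itemFullName == other.itemFullName
            && itemAddress == other.itemAddress
            && itemCountry == other.itemCountry;
    }
    // Keep GetHashCode consistent with Equals by combining the same properties.
    public override int GetHashCode() {
        return HashCode.Combine(itemFullName, itemAddress, itemCountry);
    }
}
With that in place, list.GroupBy(m => m, m => m.Item.itemId) groups by the whole view model, and the group key carries every property you need.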

Related

How to check if properties across objects in a list are equal?

I have two objects in a list and want to ensure that all properties of these objects have the same values.
For example:
List<Person> persons = new List<Person>
{
new Person { Id = "1", Name = "peter" },
new Person { Id = "1", Name = "peter" }
};
Now I want to get true since both objects' properties are the same. I have tried the following lambda expression:
var areEqual = persons.All(o => o == persons.First());
but I'm getting false in areEqual. I'm unable to understand why this is so and want to know how to do it correctly.
You can find out if all elements are the same by using:
persons.Distinct().Count() == 1
If it's zero, there were no entries in the first place; if it's greater than 1, you had entries that were not the same.
Now... how do you make sure the .Distinct() call knows when two objects are the same?
Option 1: Person is already a record. Great. Built-in functionality. Done.
Option 2: Person implements IEquatable<Person> and it does the check you want.
Option 3: Person overrides Object.Equals and Object.GetHashCode on its own, in a way you need.
Option 4: Person is neither of the above and you don't want to change it to check one of those boxes. Then you can still implement your own IEqualityComparer<Person> and pass an instance of it to the distinct method like this:
persons.Distinct(new MyCustomPersonEqualityComparer()).Count() == 1
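For option 4, a sketch of what that comparer could look like, assuming Id and Name are the properties that define equality:
public class MyCustomPersonEqualityComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id && x.Name == y.Name;
    }
    public int GetHashCode(Person p)
    {
        // Combine the same properties used in Equals (HashCode.Combine needs .NET Core 2.1+).
        return HashCode.Combine(p.Id, p.Name);
    }
}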
This query would be meaningless if you have fewer than 2 items. So, take the first element and compare it to the rest:
var allAreSame = persons
.All(p => p.Id == persons[0].Id && p.Name == persons[0].Name);
Or (faster way)
var allAreSame = !persons
.Any(p => p.Id != persons[0].Id || p.Name != persons[0].Name);
Person is a reference type, so the default == performs a reference (memory location) equality check. In order to perform your own equality check you can implement IEquatable<Person> (and, if you want == itself to compare values, also overload the == and != operators).
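A minimal IEquatable<Person> implementation might look like this sketch (again assuming Id and Name define equality):
public class Person : IEquatable<Person>
{
    public string Id { get; set; }
    public string Name { get; set; }
    public bool Equals(Person other) =>
        other != null && Id == other.Id && Name == other.Name;
    public override bool Equals(object obj) => Equals(obj as Person);
    public override int GetHashCode() => HashCode.Combine(Id, Name);
}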
You can also use a record instead of a class; this implements the equality checks over the record's public properties under the hood.
public record Person
{
public string Id { get; set; }
public string Name { get; set; }
}
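With the record version, value equality kicks in automatically, so both the original check and the Distinct-based check should now return true; a quick sketch:
var persons = new List<Person>
{
    new Person { Id = "1", Name = "peter" },
    new Person { Id = "1", Name = "peter" }
};
// Records compare by value, so both of these are true.
var areEqual = persons.All(p => p == persons.First());
var allDistinctEqual = persons.Distinct().Count() == 1;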

LINQ with Querying "Memory"

Does LINQ have a way to "memorize" its previous query results while querying?
Consider the following case:
public class Foo {
public int Id { get; set; }
public ICollection<Bar> Bars { get; set; }
}
public class Bar {
public int Id { get; set; }
}
Now, if two or more Foos have the same collection of Bars (no matter what the order is), they are considered similar Foos.
Example:
foo1.Bars = new List<Bar>() { bar1, bar2 };
foo2.Bars = new List<Bar>() { bar2, bar1 };
foo3.Bars = new List<Bar>() { bar3, bar1, bar2 };
In the above case, foo1 is similar to foo2, but neither foo1 nor foo2 is similar to foo3.
Suppose we have a query result consisting of an IEnumerable or IOrderedEnumerable of Foo. From the query, we are to find the first N Foos which are not similar.
This task seems to require a memory of the collection of bars which have been chosen before.
With partial LINQ we could do it like this:
private bool areBarsSimilar(ICollection<Bar> bars1, ICollection<Bar> bars2) {
return bars1.Count == bars2.Count && //have the same amount of bars
!bars1.Select(x => x.Id)
.Except(bars2.Select(y => y.Id))
.Any(); //and Except() returning no elements means the bars are similar
}
public void somewhereWithQueryResult(){
.
.
List<Foo> topNFoos = new List<Foo>(); //this serves as a memory for the previous query
int N = 50; //can be any number
foreach (var q in query) { //query is IOrderedEnumerable or IEnumerable
if (topNFoos.Count == 0 || !topNFoos.Any(foo => areBarsSimilar(foo.Bars, q.Bars)))
topNFoos.Add(q);
if (topNFoos.Count >= N) //We have had enough Foo
break;
}
}
The topNFoos list serves as a memory of the previous query results, so in the foreach loop we can skip any Foo q whose Bars are identical to those of any Foo already in topNFoos.
My question is, is there any way to do that in LINQ (fully LINQ)?
var topNFoos = from q in query
//put something
select q;
If the "memory" required is from a particular query item q or a variable outside of the query, then we could use let variable to cache it:
int index = 0;
var topNFoos = from q in query
let qc = index++ + q.Id //depends on q or variable outside like index, then it is OK
select q;
But if it must come from the previous querying of the query itself then things start to get more troublesome.
Is there any way to do that?
Edit:
(I am currently creating a test case (github link) for the answers, and am still figuring out how to test all the answers fairly.)
(Most of the answers below aim to solve my particular question and are good in themselves (Rob's, spender's, and David B's answers, which use IEqualityComparer, are particularly awesome). Nevertheless, if anyone can answer my more general question, "does LINQ have a way to 'memorize' its previous query results while querying", I would also be glad.)
(Apart from the significant difference in performance for the particular case I presented above when using fully/partial LINQ, one answer aimed at my general question about LINQ memory is Ivan Stoev's. Another one with a good combination is Rob's. To make myself clearer, I am looking for a general and efficient solution using LINQ, if there is one.)
I'm not going to answer your question directly, but rather propose a method that will be close to optimal for filtering the first N non-similar items.
First, consider writing an IEqualityComparer<Foo> that uses the Bars collection to measure equality. Here I'm assuming that the lists might contain duplicate entries, so I use quite a strict definition of similarity:
public class FooSimilarityComparer:IEqualityComparer<Foo>
{
public bool Equals(Foo a, Foo b)
{
//called infrequently
return a.Bars.OrderBy(bar => bar.Id).SequenceEqual(b.Bars.OrderBy(bar => bar.Id));
}
public int GetHashCode(Foo foo)
{
//called frequently
unchecked
{
// Enumerable.Sum uses checked arithmetic and can throw on overflow,
// so accumulate the hash codes manually instead.
return foo.Bars.Aggregate(0, (hash, b) => hash + b.GetHashCode());
}
}
}
You can really efficiently get the top N non-similar items by using a HashSet with the IEqualityComparer above:
IEnumerable<Foo> someFoos; //= some list of Foo
var hs = new HashSet<Foo>(new FooSimilarityComparer());
foreach(var f in someFoos)
{
hs.Add(f); //hashsets don't add duplicates, as measured by the FooSimilarityComparer
if(hs.Count >= 50)
{
break;
}
}
@Rob's approach above is broadly similar, and shows how you can use the comparer directly in LINQ, but pay attention to the comments I made on his answer.
So, it's ... possible. But this is far from performant code.
var res = query.Select(q => new {
original = q,
matches = query.Where(innerQ => areBarsSimilar(q.Bars, innerQ.Bars))
}).Select(g => new { original = g, joinKey = string.Join(",", g.matches.Select(m => m.Id)) })
.GroupBy (g => g.joinKey)
.Select(g => g.First().original.original)
.Take(N);
This assumes that the Ids are unique for each Foo (you could also use their GetHashCode(), I suppose).
A much better solution is to either keep what you've done, or implement a custom comparer, as follows:
Note: As pointed out in the comments by #spender, the below Equals and GetHashCode will not work for collections with duplicates. Refer to their answer for a better implementation - however, the usage code would remain the same
class MyComparer : IEqualityComparer<Foo>
{
public bool Equals(Foo left, Foo right)
{
return left.Bars.Count() == right.Bars.Count() && //have the same amount of bars
left.Bars.Select(x => x.Id)
.Except(right.Bars.Select(y => y.Id))
.ToList().Count == 0; //and when excepted returns 0, mean similar bar
}
public int GetHashCode(Foo foo)
{
unchecked {
int hc = 0;
if (foo.Bars != null)
foreach (var p in foo.Bars)
hc ^= p.GetHashCode();
return hc;
}
}
}
And then your query becomes simply:
var res = query
.GroupBy (q => q, new MyComparer())
.Select(g => g.First())
.Take(N);
IEnumerable<Foo> dissimilarFoos =
from foo in query
let key = string.Join("|",
from bar in foo.Bars
orderby bar.Id
select bar.Id.ToString())
group foo by key into g
select g.First();
IEnumerable<Foo> firstDissimilarFoos =
dissimilarFoos.Take(50);
Sometimes you may not like the behavior of the group by in the above queries: at the time the query is enumerated, group by will enumerate the entire source. If you only want partial enumeration, then you should switch to Distinct and a comparer:
class FooComparer : IEqualityComparer<Foo>
{
private string keyGen(Foo foo)
{
return string.Join("|",
from bar in foo.Bars
orderby bar.Id
select bar.Id.ToString());
}
public bool Equals(Foo left, Foo right)
{
if (left == null || right == null) return false;
return keyGen(left) == keyGen(right);
}
public int GetHashCode(Foo foo)
{
return keyGen(foo).GetHashCode();
}
}
then write:
IEnumerable<Foo> dissimilarFoos = query.Distinct(new FooComparer());
IEnumerable<Foo> firstDissimilarFoos = dissimilarFoos.Take(50);
Idea. You might be able to hack something by devising your own fluent interface of mutators over a cache that you'd capture in "let x = ..." clauses, along the lines of,
from q in query
let qc = ... // your cache mechanism here
select ...
but I suspect you'll have to be careful to limit the updates to your cache to those "let ..." clauses only, as I doubt the implementation of the standard LINQ operators and extension methods will be happy if you allow such side effects to happen behind their back through predicates applied in the "where", "join", "group by", etc. clauses.
HTH,
I guess by "full LINQ" you mean standard LINQ operators/Enumerable extension methods.
I don't think this can be done with LINQ query syntax. Of the standard methods, the only one that supports mutable processing state is Enumerable.Aggregate, but it gives you nothing more than a LINQ flavor over a plain foreach:
var result = query.Aggregate(new List<Foo>(), (list, next) =>
{
if (list.Count < 50 && !list.Any(item => areBarsSimilar(item.Bars, next.Bars)))
list.Add(next);
return list;
});
Since it looks like we are allowed to use helper methods (like areBarsSimilar), the best we can do is make it at least look more LINQ-ish by defining and using a custom extension method:
var result = query.Aggregate(new List<Foo>(), (list, next) => list.Count < 50 &&
!list.Any(item => areBarsSimilar(item.Bars, next.Bars)) ? list.Concat(next) : list);
where the custom method is
public static class Utils
{
public static List<T> Concat<T>(this List<T> list, T item) { list.Add(item); return list; }
}
But note that compared to a vanilla foreach, Aggregate has the additional drawback of not being able to exit early, and thus will consume the whole input sequence (which, besides the performance, also means it doesn't work with infinite sequences).
Conclusion: while this should answer your original question, i.e. it is technically possible to do what you are asking for, LINQ (like standard SQL) is not well suited to this type of processing.

LINQ conversion to List object

I am using the following code to return an IList:
FileName = Path.GetFileName(files[i]);
IList<DataX> QueryListFromFTP = DataX.GetListFromFTP(FileName);
QueryListFromFTP = (IList<DataX>)QueryListFromFTP
.Select(x => new { x.user_id, x.date, x.application_ID })
.ToList()
.Distinct();
However I keep getting this error:
Unable to cast object of type 'd__7a`1[<>f__AnonymousType0`3[System.String,System.String,System.String]]' to type 'System.Collections.Generic.IList`1[DataXLibrary.DataX]'.
What am I doing wrong?
If what you want is a List<DataX>, then all you need is:
IList<DataX> QueryListFromFTP = DataX.GetListFromFTP(FileName).Distinct().ToList();
// Use QueryListFromFTP here.
If you want a list of a different type of object as the result of your .Select, then you need to store the result in a list of that type, i.e. a list of the anonymous type if that's what you want.
The following line creates an anonymous type in C#, which does not correspond to the type DataX:
new { x.user_id, x.date, x.application_ID })
You should alter it to something like this:
Select(x => new DataX() { user_id = x.user_id, date = x.date, application_ID = x.application_ID })
There are two problems in your code:
You're converting the List of DataX objects to an "anonymous type object" (the new { x.user_id, x.date, x.application_ID }). This object is not the same type as DataX, and it can't be coerced back to a DataX object automatically.
Trying to read between the lines a little, it looks like you want a distinct list of DataX objects, where distinctness is determined by a subset of the properties of a DataX object. So you have to answer the question, what will you do with duplicates (by this definition) that have different data in other properties? You have to discard some of them. Distinct() is not the right tool for this, because it only applies to the entire object of the IEnumerable it is applied to.
It's almost like you need a DistinctBy with one parameter giving the properties to calculate distinctness with, and a second parameter giving some logic for deciding which of the non-distinct "duplicates" to select. But this can be achieved with multiple IEnumerable methods: GroupBy and a further expression to select an appropriate single item from each resulting group. Here's one possible solution:
FileName = Path.GetFileName(files[i]);
IList<DataX> QueryListFromFTP = DataX.GetListFromFTP(FileName)
.GroupBy(datax => new { datax.user_id, datax.date, datax.application_ID })
.Select(g => g.First()) // or another expression to choose one item per group
.ToList();
If, for example, there were a version field and you wanted the most recent one for each "duplicate", you could:
.Select(g => g.OrderByDescending(datax => datax.version).First())
Please note, however, that if you just want distinctness over all the properties of the object (which requires DataX to provide value equality, e.g. by overriding Equals and GetHashCode; otherwise Distinct() compares by reference), and there is no need to select one particular value (in order to get its additional properties after throwing away some objects considered duplicates), then it may be as simple as this:
IList<DataX> QueryListFromFTP = DataX.GetListFromFTP(FileName)
.Distinct()
.ToList();
I would furthermore advise that you use IReadOnlyCollection where possible (that's .ToList().AsReadOnly()), and that, depending on your data, you may want to make the GetListFromFTP function perform the de-duplication/distinctness instead.
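For example, the read-only variant of the simple version above could look like this (just a sketch of the shape):
IReadOnlyCollection<DataX> QueryListFromFTP = DataX.GetListFromFTP(FileName)
    .Distinct()
    .ToList()
    .AsReadOnly();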
To answer any concerns that GroupBy isn't the right answer because it may not perform well enough, here is an alternate way to handle this (though I wholeheartedly disagree; until tests prove it's slow, it's a perfectly fine answer).
// in a static helper class of some kind
public static IEnumerable<T> DistinctBy<T, TKey>(
this IEnumerable<T> source,
Func<T, TKey> keySelector
) {
if (source == null) {
throw new ArgumentNullException("source", "Source enumerable cannot be null.");
}
if (keySelector == null) {
throw new ArgumentNullException("keySelector", "keySelector function cannot be null. To perform a generic distinct, use .Distinct().");
}
return DistinctByImpl(source, keySelector);
}
private static IEnumerable<T> DistinctByImpl<T, TKey>(
this IEnumerable<T> source,
Func<T, TKey> keySelector
) {
HashSet<TKey> keys = new HashSet<TKey>();
return source.Where(s => keys.Add(keySelector(s)));
}
It is used like this:
public class Animal {
public string Name { get; set; }
public string AnimalType { get; set; }
public decimal Weight { get; set; }
}
IEnumerable<Animal> animals = new List<Animal> {
new Animal { Name = "Fido", AnimalType = "Dog", Weight = 15.0M },
new Animal { Name = "Trixie", AnimalType = "Dog", Weight = 15.0M },
new Animal { Name = "Juliet", AnimalType = "Cat", Weight = 12.0M },
new Animal { Name = "Juliet", AnimalType = "Fish", Weight = 1.0M }
};
var filtered1 = animals.DistinctBy(a => new { a.AnimalType, a.Weight });
/* returns:
Name Type Weight
Fido Dog 15.0
Juliet Cat 12.0
Juliet Fish 1.0
*/
var filtered2 = animals.DistinctBy(a => a.Name); // or a simple property
/* returns:
Name Type Weight
Fido Dog 15.0
Trixie Dog 15.0
Juliet Cat 12.0
*/

c# Linq deferred execution challenge - help needed in creating 3 different lists

I am trying to create 3 different lists (1,2,3) from 2 existing lists (A,B).
The 3 lists need to identify the following relationships.
List 1 - the items that are in list A and not in list B
List 2 - the items that are in list B and not in list A
List 3 - the items that are in both lists.
I then want to join all the lists together into one list.
My problem is that I want to identify the differences by adding an enum to the items of each list identifying the relationship. But once the enum is added, the Except LINQ function (obviously) no longer recognizes that the lists are the same. Because the LINQ queries are deferred, I cannot resolve this by changing the order of my statements, i.e. identifying the lists first and then adding the enums.
This is the code that I have got to (it doesn't work properly).
There might be a better approach.
List<ManufactorListItem> manufactorItemList =
manufactorRepository.GetManufactorList();
// Get the Manufactors from the Families repository
List<ManufactorListItem> familyManufactorList =
this.familyRepository.GetManufactorList(familyGuid);
// Identify Manufactors that are only found in the Manufactor Repository
List<ManufactorListItem> inManufactorsOnly =
manufactorItemList.Except(familyManufactorList).ToList();
// Mark them as (Parent Only)
foreach (ManufactorListItem manOnly in inManufactorsOnly) {
manOnly.InheritanceState = EnumInheritanceState.InParent;
}
// Identify Manufactors that are only found in the Family Repository
List<ManufactorListItem> inFamiliesOnly =
familyManufactorList.Except(manufactorItemList).ToList();
// Mark them as (Child Only)
foreach (ManufactorListItem famOnly in inFamiliesOnly) {
famOnly.InheritanceState = EnumInheritanceState.InChild;
}
// Identify Manufactors that are found in both Repositories
List<ManufactorListItem> sameList =
manufactorItemList.Intersect(familyManufactorList).ToList();
// Mark them Accordingly
foreach (ManufactorListItem same in sameList) {
same.InheritanceState = EnumInheritanceState.InBoth;
}
// Create an output List
List<ManufactorListItem> manufactors = new List<ManufactorListItem>();
// Join all of the lists together.
manufactors = sameList.Union(inManufactorsOnly).
Union(inFamiliesOnly).ToList();
Any ideas how to get around this?
Thanks in advance.
You can make it much simpler:
List<ManufactorListItem> manufactorItemList = ...;
List<ManufactorListItem> familyManufactorList = ...;
var allItems = manufactorItemList.ToDictionary(i => i, i => EnumInheritanceState.InParent);
foreach (var familyManufactor in familyManufactorList)
{
allItems[familyManufactor] = allItems.ContainsKey(familyManufactor) ?
EnumInheritanceState.InBoth :
EnumInheritanceState.InChild;
}
//that's all, now we can get any subset of items:
var inFamiliesOnly = allItems.Where(p => p.Value == EnumInheritanceState.InChild).Select(p => p.Key);
var inManufactorsOnly = allItems.Where(p => p.Value == EnumInheritanceState.InParent).Select(p => p.Key);
var allManufactors = allItems.Keys;
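If you also need the state written back onto the items and a single combined list, as in the original code, a short follow-up sketch:
foreach (var pair in allItems)
{
    pair.Key.InheritanceState = pair.Value;
}
List<ManufactorListItem> manufactors = allItems.Keys.ToList();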
This seems like the simplest way to me:
(I'm using the following Enum for simplicity:
public enum ContainedIn
{
AOnly,
BOnly,
Both
}
)
var la = new List<int> {1, 2, 3};
var lb = new List<int> {2, 3, 4};
var l1 = la.Except(lb)
.Select(i => new Tuple<int, ContainedIn>(i, ContainedIn.AOnly));
var l2 = lb.Except(la)
.Select(i => new Tuple<int, ContainedIn>(i, ContainedIn.BOnly));
var l3 = la.Intersect(lb)
.Select(i => new Tuple<int, ContainedIn>(i, ContainedIn.Both));
var combined = l1.Union(l2).Union(l3);
So long as you have access to the Tuple<T1, T2> class (I think it's a .NET 4 addition).
If the problem is with the Except() statement, then I suggest you use the overload of Except that takes an IEqualityComparer<ManufactorListItem>, so you can provide a custom comparer which tests the appropriate ManufactorListItem fields, but not InheritanceState.
e.g. your equality comparer might look like:
public class ManufactorComparer : IEqualityComparer<ManufactorListItem> {
public bool Equals(ManufactorListItem x, ManufactorListItem y) {
// you need to write a method here that tests all the fields except InheritanceState
}
public int GetHashCode(ManufactorListItem obj) {
// you need to write a simple hash code generator here using any/all the fields except InheritanceState
}
}
and then you would call this using code a bit like
// Identify Manufactors that are only found in the Manufactor Repository
List<ManufactorListItem> inManufactorsOnly =
manufactorItemList.Except(familyManufactorList, new ManufactorComparer()).ToList();
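For completeness, here is a sketch of what the comparer bodies could look like. The ManufactorGuid and Name properties are purely hypothetical, since the question doesn't show the real fields of ManufactorListItem; substitute whatever identifies a manufactor:
public class ManufactorComparer : IEqualityComparer<ManufactorListItem> {
    public bool Equals(ManufactorListItem x, ManufactorListItem y) {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // ManufactorGuid and Name are assumed, illustrative property names.
        return x.ManufactorGuid == y.ManufactorGuid && x.Name == y.Name;
    }
    public int GetHashCode(ManufactorListItem obj) {
        // Combine the same fields used in Equals.
        return HashCode.Combine(obj.ManufactorGuid, obj.Name);
    }
}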

LINQ Combine Queries

I have two collections of objects of different types. Let's call them type ALPHA and type BRAVO. Each of these types has a property that is the "ID" for the object. No ID is duplicated within a class, so for any given ID there is at most one ALPHA and one BRAVO instance. What I need to do is divide them into 3 categories:
Instances of the ID in ALPHA which do not appear in the BRAVO collection;
Instances of the ID in BRAVO which do not appear in the ALPHA collection;
Instances of the ID which appear in both collections.
In all 3 cases, I need to have the actual objects from the collections at hand for subsequent manipulation.
I know for the #3 case, I can do something like:
var myCorrelatedItems = myAlphaItems.Join(myBravoItems, alpha => alpha.Id, beta => beta.Id, (alpha, beta) => new
{
alpha,
beta
});
I can also write code for the #1 and #2 cases which look something like
var myUnmatchedAlphas = myAlphaItems.Where(alpha=>!myBravoItems.Any(bravo=>alpha.Id==bravo.Id));
And similarly for unMatchedBravos. Unfortunately, this would result in iterating the collection of alphas (which may be very large!) many times, and the collection of bravos (which may also be very large!) many times as well.
Is there any way to unify these query concepts so as to minimize iteration over the lists? These collections can have thousands of items.
If you are only interested in the IDs,
var alphaIds = myAlphaItems.Select(alpha => alpha.ID);
var bravoIds = myBravoItems.Select(bravo => bravo.ID);
var alphaIdsNotInBravo = alphaIds.Except(bravoIds);
var bravoIdsNotInAlpha = bravoIds.Except(alphaIds);
If you want the alphas and bravos themselves,
var alphaIdsSet = new HashSet<int>(alphaIds);
var bravoIdsSet = new HashSet<int>(bravoIds);
var alphasNotInBravo = myAlphaItems
.Where(alpha => !bravoIdsSet.Contains(alpha.ID));
var bravosNotInAlpha = myBravoItems
.Where(bravo => !alphaIdsSet.Contains(bravo.ID));
EDIT:
A few other options:
The ExceptBy method from MoreLinq.
The Enumerable.ToDictionary method.
If both types inherit from a common type (e.g. an IHasId interface), you could write your own IEqualityComparer<T> implementation; Enumerable.Except has an overload that accepts an equality-comparer as a parameter.
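As a sketch of that last option, assuming Alpha and Bravo both expose an int Id, the shared interface and comparer might look like this (IHasId comes from the suggestion above; the comparer name is made up):
public interface IHasId {
    int Id { get; }
}
// Compares any two IHasId instances purely by their Id.
public class HasIdComparer : IEqualityComparer<IHasId> {
    public bool Equals(IHasId x, IHasId y) => x?.Id == y?.Id;
    public int GetHashCode(IHasId obj) => obj.Id.GetHashCode();
}
// Thanks to IEnumerable<T> covariance, both lists can be treated as IEnumerable<IHasId>:
var alphasNotInBravo = myAlphaItems.Except<IHasId>(myBravoItems, new HasIdComparer());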
Sometimes LINQ is not the answer. This is the kind of problem where I would consider using a HashSet<T> with a custom comparer to reduce the work of performing set operations. HashSets are much more efficient at performing set operations than lists - and (depending on the data) can reduce the work considerably:
// create a wrapper class that can accommodate either an Alpha or a Bravo
class ABItem {
public Object Instance { get; private set; }
public int Id { get; private set; }
public ABItem( Alpha a ) { Instance = a; Id = a.Id; }
public ABItem( Bravo b ) { Instance = b; Id = b.Id; }
}
// comparer that compares ABItems (and therefore Alphas and Bravos) by id
class ABItemComparer : IEqualityComparer<ABItem> {
public bool Equals( ABItem a, ABItem b ) {
return a.Id == b.Id;
}
public int GetHashCode( ABItem item ) {
return item.Id.GetHashCode();
}
}
// create a comparer based on comparing the ID's of ABItems
var comparer = new ABItemComparer();
var hashAlphas =
new HashSet<ABItem>(myAlphaItems.Select(x => new ABItem(x)), comparer);
var hashBravos =
new HashSet<ABItem>(myBravoItems.Select(x => new ABItem(x)), comparer);
// items with common IDs in the Alpha and Bravo sets:
var hashCommon = new HashSet<ABItem>(hashAlphas, comparer);
hashCommon.IntersectWith( hashBravos );
hashAlphas.ExceptWith( hashCommon ); // items only in Alpha
hashBravos.ExceptWith( hashCommon ); // items only in Bravo
Dictionary<int, Alpha> alphaDictionary = myAlphaItems.ToDictionary(a => a.Id);
Dictionary<int, Bravo> bravoDictionary = myBravoItems.ToDictionary(b => b.Id);
ILookup<string, int> keyLookup = alphaDictionary.Keys
.Union(bravoDictionary.Keys)
.ToLookup(x => alphaDictionary.ContainsKey(x) ?
(bravoDictionary.ContainsKey(x) ? "both" : "alpha") :
"bravo");
List<Alpha> alphaBoth = keyLookup["both"].Select(x => alphaDictionary[x]).ToList();
List<Bravo> bravoBoth = keyLookup["both"].Select(x => bravoDictionary[x]).ToList();
List<Alpha> alphaOnly = keyLookup["alpha"].Select(x => alphaDictionary[x]).ToList();
List<Bravo> bravoOnly = keyLookup["bravo"].Select(x => bravoDictionary[x]).ToList();
Here is one possible LINQ solution that performs a full outer join on both sets and appends a property showing which group each item belongs to. This solution might lose its luster, however, when you try to separate the groups into different variables; it all really depends on what kind of actions you need to perform on these objects. At any rate, this ran at what I thought was an acceptable speed (0.5 seconds) for me on lists of 5000 items:
var q =
from g in
(from id in myAlphaItems.Select(a => a.ID).Union(myBravoItems.Select(b => b.ID))
join a in myAlphaItems on id equals a.ID into ja
from a in ja.DefaultIfEmpty()
join b in myBravoItems on id equals b.ID into jb
from b in jb.DefaultIfEmpty()
select (a == null ?
new { ID = b.ID, Group = "Bravo Only" } :
(b == null ?
new { ID = a.ID, Group = "Alpha Only" } :
new { ID = a.ID, Group = "Both" }
)
)
)
group g.ID by g.Group;
You can remove the 'group by' query or create a dictionary from this (q.ToDictionary(x => x.Key, x => x.Select(y => y))), or whatever! This is simply a way of categorizing your items. I'm sure there are better solutions out there, but this seemed like a truly interesting question so I thought I might as well give it a shot!
I think LINQ is not the best answer to this problem if you want to traverse and compare the minimum number of times. I think the following iterative solution is more performant, and I believe that code readability doesn't suffer.
var dictUnmatchedAlphas = myAlphaItems.ToDictionary(a => a.Id);
var myCorrelatedItems = new List<AlphaAndBravo>();
var myUnmatchedBravos = new List<Bravo>();
foreach (Bravo b in myBravoItems)
{
var id = b.Id;
if (dictUnmatchedAlphas.ContainsKey(id))
{
var a = dictUnmatchedAlphas[id];
dictUnmatchedAlphas.Remove(id); //to get just the unmatched alphas
myCorrelatedItems.Add(new AlphaAndBravo { a = a, b = b});
}
else
{
myUnmatchedBravos.Add(b);
}
}
Definition of AlphaAndBravo:
public class AlphaAndBravo {
public Alpha a { get; set; }
public Bravo b { get; set; }
}
