Linq: find similar objects from two different lists

I've got two separate lists of custom objects. In these two separate lists, there may be some objects that are identical between the two lists, with the exception of one field ("id"). I'd like to know a smart way to query these two lists to find this overlap. I've attached some code to help clarify. Any suggestions would be appreciated.
namespace ConsoleApplication1
{
class userObj
{
public int id;
public DateTime BirthDate;
public string FirstName;
public string LastName;
}
class Program
{
static void Main(string[] args)
{
List<userObj> list1 = new List<userObj>();
list1.Add(new userObj()
{
BirthDate=DateTime.Parse("1/1/2000"),
FirstName="John",
LastName="Smith",
id=0
});
list1.Add(new userObj()
{
BirthDate = DateTime.Parse("2/2/2000"),
FirstName = "Jane",
LastName = "Doe",
id = 1
});
list1.Add(new userObj()
{
BirthDate = DateTime.Parse("3/3/2000"),
FirstName = "Sam",
LastName = "Smith",
id = 2
});
List<userObj> list2 = new List<userObj>();
list2.Add(new userObj()
{
BirthDate = DateTime.Parse("1/1/2000"),
FirstName = "John",
LastName = "Smith",
id = 3
});
list2.Add(new userObj()
{
BirthDate = DateTime.Parse("2/2/2000"),
FirstName = "Jane",
LastName = "Doe",
id = 4
});
List<int> similarObjectsFromTwoLists = null;
//Would like this equal to the overlap. It could be the IDs on either side that have a "buddy" on the other side: (3,4) or (0,1) in the above case.
}
}
}

I don't know why you want a List<int>, but I assume this is what you want:
var intersectingUser = from l1 in list1
join l2 in list2
on new { l1.FirstName, l1.LastName, l1.BirthDate }
equals new { l2.FirstName, l2.LastName, l2.BirthDate }
select new { ID1 = l1.id, ID2 = l2.id };
foreach (var bothIDs in intersectingUser)
{
Console.WriteLine("ID in List1: {0} ID in List2: {1}",
bothIDs.ID1, bothIDs.ID2);
}
Output:
ID in List1: 0 ID in List2: 3
ID in List1: 1 ID in List2: 4

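You can implement your own IEqualityComparer<T> for your userObj class and use that to run a comparison between the two lists. Intersect builds a hash set from the comparer, so the work grows roughly linearly with the combined size of the lists rather than comparing every pair.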
public class NameAndBirthdayComparer : IEqualityComparer<userObj>
{
public bool Equals(userObj x, userObj y)
{
return x.FirstName == y.FirstName && x.LastName == y.LastName && x.BirthDate == y.BirthDate;
}
public int GetHashCode(userObj obj)
{
unchecked
{
var hash = (int)2166136261;
hash = hash * 16777619 ^ obj.FirstName.GetHashCode();
hash = hash * 16777619 ^ obj.LastName.GetHashCode();
hash = hash * 16777619 ^ obj.BirthDate.GetHashCode();
return hash;
}
}
}
You can use this comparer like this:
list1.Intersect(list2, new NameAndBirthdayComparer()).Select(obj => obj.id).ToList();

You could simply join the lists on those 3 properties:
var result = from l1 in list1
join l2 in list2
on new {l1.BirthDate, l1.FirstName, l1.LastName}
equals new {l2.BirthDate, l2.FirstName, l2.LastName}
select new
{
fname = l1.FirstName,
name = l1.LastName,
bday = l1.BirthDate
};
Instead of a simple join on just one property (column), two anonymous objects are created, new { prop1, prop2, ..., propN }, and the join is executed on those.
In your case we take all properties except the id, which you want ignored, and voilà.
And Tim beat me to it by a minute.
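The reason an anonymous object works as a join key is that C# anonymous types override Equals and GetHashCode to compare property values. Both sides of the join must use the same property names in the same order. A minimal sketch (AnonymousKeyDemo and SameKey are names made up for this illustration):

```csharp
using System;

public static class AnonymousKeyDemo
{
    // Builds two anonymous keys and reports whether they compare equal by value.
    public static bool SameKey(string first1, string last1, string first2, string last2)
    {
        var left = new { FirstName = first1, LastName = last1 };
        var right = new { FirstName = first2, LastName = last2 };
        // Equals on anonymous types compares the property values, not references.
        return left.Equals(right);
    }

    public static void Main()
    {
        Console.WriteLine(SameKey("John", "Smith", "John", "Smith")); // True
        Console.WriteLine(SameKey("John", "Smith", "Jane", "Doe"));   // False
    }
}
```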

var similarObjectsFromTwoLists = list1.Where(x =>
list2.Exists(y => y.BirthDate == x.BirthDate && y.FirstName == x.FirstName && y.LastName == x.LastName)
).ToList();
This is shorter, but for large lists "Intersect" or "Join" is more efficient:
var similarObjectsFromTwoLists =
list1.Join(list2, x => x.GetHashCode(), y => y.GetHashCode(), (x, y) => x).ToList();
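(supposing GetHashCode() is overridden for userObj so that it hashes FirstName, LastName and BirthDate; equal hash codes do not guarantee equal objects, so a collision can produce a false match)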
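One caveat with joining on GetHashCode(): different objects can share a hash code, so the join may pair up users that are not actually equal. A possible alternative is the Join overload that accepts an IEqualityComparer<TKey>, which checks both the hash and Equals. The User and IgnoreIdComparer types below are stand-ins written for this sketch, not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public int Id;
    public DateTime BirthDate;
    public string FirstName;
    public string LastName;
}

// Hypothetical comparer: two users are equal when everything except Id matches.
public class IgnoreIdComparer : IEqualityComparer<User>
{
    public bool Equals(User x, User y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.FirstName == y.FirstName
            && x.LastName == y.LastName
            && x.BirthDate == y.BirthDate;
    }

    public int GetHashCode(User obj)
    {
        // An anonymous type hashes its property values and tolerates null strings.
        return new { obj.FirstName, obj.LastName, obj.BirthDate }.GetHashCode();
    }
}

public static class JoinWithComparerDemo
{
    // Returns { idFromList1, idFromList2 } for every pair that matches apart from Id.
    public static List<int[]> MatchingIdPairs(List<User> list1, List<User> list2)
    {
        return list1
            .Join(list2, x => x, y => y,
                  (x, y) => new[] { x.Id, y.Id },
                  new IgnoreIdComparer())
            .ToList();
    }

    public static void Main()
    {
        var list1 = new List<User>
        {
            new User { Id = 0, FirstName = "John", LastName = "Smith", BirthDate = new DateTime(2000, 1, 1) },
            new User { Id = 2, FirstName = "Sam", LastName = "Smith", BirthDate = new DateTime(2000, 3, 3) }
        };
        var list2 = new List<User>
        {
            new User { Id = 3, FirstName = "John", LastName = "Smith", BirthDate = new DateTime(2000, 1, 1) }
        };
        foreach (var pair in MatchingIdPairs(list1, list2))
        {
            Console.WriteLine(pair[0] + " " + pair[1]); // 0 3
        }
    }
}
```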

var query = list1.Join (list2,
obj => new {FirstName=obj.FirstName,LastName=obj.LastName, BirthDate=obj.BirthDate},
innObj => new {FirstName=innObj.FirstName, LastName=innObj.LastName, BirthDate=innObj.BirthDate},
(obj, userObj) => (new {List1Id = obj.id, List2Id = userObj.id}));
foreach (var item in query)
{
Console.WriteLine(item.List1Id + " " + item.List2Id);
}

Related

Combine two lists of entities with a condition

Say I have a class defined as
class Object
{
public int ID { get;set; }
public string Property { get; set; }
public override bool Equals(object obj)
{
Object Item = obj as Object;
return Item != null && Item.ID == this.ID;
}
public override int GetHashCode()
{
int hash = 13;
hash = (hash * 7) + ID.GetHashCode();
return hash;
}
}
And two lists, defined like so:
List<Object> List1;
List<Object> List2;
These two lists contain objects where the ID fields could be the same, but the Property fields may or may not be. I want a result containing all objects in List1 together with all objects in List2, with the condition that the Property field must be set to "1" if it is set to "1" in either of those lists. The result must contain distinct values (distinct IDs).
For example, if we have 2 lists like this:
List1
-----
ID = 0, Property = "1"
ID = 1, Property = ""
ID = 2, Property = "1"
ID = 3, Property = ""
List2
-----
ID = 1, Property = "1"
ID = 2, Property = ""
ID = 3, Property = ""
I need a result to look like this:
Result
-------
ID = 0, Property = "1"
ID = 1, Property = "1"
ID = 2, Property = "1"
ID = 3, Property = ""
Currently it works like this:
var Result = List1.Except(List2).Concat(List2.Except(List1));
var Intersection = List1.Intersect(List2).ToList();
Intersection.ForEach(x => {
x.Property = List1.Single(y => y.ID == x.ID).Property == "1" ? "1" : List2.Single(y => y.ID == x.ID).Property == "1" ? "1" : "";
});
Result = Result.Concat(Intersection);
...but ForEach is very slow. Can someone suggest a faster way?

var result = List1.Concat(List2)
.GroupBy(o => o.ID)
.Select(g => new Object() {
ID=g.Key,
Property=g.Any(o=>o.Property=="1")?"1":""
})
.ToList();

var result = List1.Concat(List2)
.OrderByDescending(o => o.Property)
.GroupBy(o => o.ID)
.Select(g => g.First())
.ToList();

c# - AutoMapper: how to copy fields from 2 different sources into 1 destination

I have a flat file with a bunch of records, let's say it's a sequence of 2 record types
--- Record1: ID;NAME;SURNAME
--- Record2: AGE;SEX;
Let's call R1 the class representing Record1 and R2 the class representing Record2.
At the moment I have an array of R1 and another array of R2.
If I have a POCO called Subject that has 5 fields, named exactly as the union of the fields of R1 and R2, how do I configure AutoMapper to do the magic for me?
Now I'm trying this:
var subjects = Mapper.Map<IEnumerable<R1>, List<Subject>>(arrayOfR1s);
Mapper.Map<IEnumerable<R2>, List<Subject>>(arrayOfR2s, subjects);
After the first mapping, I get an array of Subjects, in every element of the array the fields ID, SURNAME, NAME are correctly filled with values. AGE and SEX are left to NULL as expected.
But after the second mapping, all the fields from R1 (ID, NAME, SURNAME) are initialized to NULL and I only get fields from R2 (AGE and SEX).
How do I get the complete union of the fields?
Can someone point me to the right approach?

How about the straightforward dynamic mapping of joined (anonymously typed) objects?
Record1[] firstRecords = new[]
{
new Record1
{
ID = Guid.NewGuid(),
Name = "John", Surname = "Doe"
},
new Record1
{
ID = Guid.NewGuid(),
Name = "Jane", Surname = "Roe"
}
};
Record2[] secondRecords = new[]
{
new Record2 { Age = 20, Sex = Sex.Male },
new Record2 { Age = 20, Sex = Sex.Female }
};
var subjects = firstRecords
.Select((first, index) =>
{
var second = secondRecords[index];
var r = new
{
ID = first.ID,
Name = first.Name,
Surname = first.Surname,
Age = second.Age,
Sex = second.Sex
};
return Mapper.DynamicMap<Subject>(r);
})
.ToArray();
By the way, you can map these objects without AutoMapper, using just LINQ Select():
var subjects = firstRecords
.Select((first, index) =>
{
var second = secondRecords[index];
var r = new Subject
{
ID = first.ID,
Name = first.Name,
Surname = first.Surname,
Age = second.Age,
Sex = second.Sex
};
return r;
})
.ToArray();
Update
If you need to copy a lot of properties, please take a look at ValueInjecter. InjectFrom() FTW!
var subjects = firstRecords
.Select((first, index) =>
{
var second = secondRecords[index];
var r = new Subject();
r.InjectFrom(first).InjectFrom(second);
return r;
})
.ToArray();
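If the two arrays are paired purely by position, Enumerable.Zip expresses the same idea without indexing into secondRecords. This is a rough sketch; the Record1, Record2, Subject and Sex definitions are guesses at the shapes implied above, and ZipDemo.Combine is a made-up name:

```csharp
using System;
using System.Linq;

public enum Sex { Male, Female }

public class Record1 { public Guid ID; public string Name; public string Surname; }
public class Record2 { public int Age; public Sex Sex; }

public class Subject
{
    public Guid ID;
    public string Name;
    public string Surname;
    public int Age;
    public Sex Sex;
}

public static class ZipDemo
{
    // Pairs the n-th Record1 with the n-th Record2 and copies all five fields.
    public static Subject[] Combine(Record1[] firstRecords, Record2[] secondRecords)
    {
        return firstRecords
            .Zip(secondRecords, (first, second) => new Subject
            {
                ID = first.ID,
                Name = first.Name,
                Surname = first.Surname,
                Age = second.Age,
                Sex = second.Sex
            })
            .ToArray();
    }

    public static void Main()
    {
        var firsts = new[] { new Record1 { ID = Guid.NewGuid(), Name = "John", Surname = "Doe" } };
        var seconds = new[] { new Record2 { Age = 20, Sex = Sex.Male } };
        foreach (var s in Combine(firsts, seconds))
        {
            Console.WriteLine(s.Name + " " + s.Surname + " " + s.Age + " " + s.Sex);
        }
    }
}
```

Unlike secondRecords[index], Zip stops at the end of the shorter sequence instead of throwing when the arrays differ in length, so a length check beforehand may still be worthwhile.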

Merge contents of multiple lists of custom objects - C#

I have a class Project as
public class Project
{ public int ProjectId { get; set; }
public string ProjectName { get; set; }
public string Customer { get; set; }
public string Address{ get; set; }
}
and I have 3 lists
List<Project> lst1; List<Project> lst2; List<Project> lst3;
lst1 contains Project objects with ProjectId and ProjectName.
ProjectId =1, ProjectName = "X", Customer = null, Address = null
ProjectId =2, ProjectName = "Y", Customer = null, Address = null
lst2 contains Project objects with ProjectId and Customer
ProjectId =1,ProjectName = null, Customer = "c1", Address = null
ProjectId =2,ProjectName = null, Customer = "c2", Address = null
and lst3 contains Project objects with ProjectId and Address
ProjectId = 1, ProjectName = null, Customer =null, Address = "a1"
ProjectId = 2, ProjectName = null, Customer =null, Address = "a2".
Considering there are multiple such records in each list and ProjectId is unique for each project, how can I merge/combine these lists to get one list with merged objects?
ProjectId=1, ProjectName="X", Customer="c1", address="a1"
ProjectId=2, ProjectName="Y", Customer="c2", address="a2"
I found these similar links and tried them, but could not get the results I need:
Create a list from two object lists with linq
How to merge two lists using LINQ?
Thank You.

This could be done pretty simply in a multi-step approach. First, define a Func<Project, Project, Project> to handle the actual record merging. That is, you are defining a method with a signature equivalent to public Project SomeMethod(Project p1, Project p2). This method implements the merging logic you outlined above. Next, we concatenate the elements of the lists together before grouping them by ProjectId, using our merge delegate as the aggregate function in the overload of GroupBy which accepts a result selector:
Func<Project, Project, Project> mergeFunc = (p1,p2) => new Project
{
ProjectId = p1.ProjectId,
ProjectName = p1.ProjectName == null ? p2.ProjectName : p1.ProjectName,
Customer = p1.Customer == null ? p2.Customer : p1.Customer,
Address = p1.Address == null ? p2.Address : p1.Address
};
var output = lst1.Concat(lst2).Concat(lst3)
.GroupBy(x => x.ProjectId, (k, g) => g.Aggregate(mergeFunc));
Here's a quick and dirty test of the above logic along with output:
List<Project> lst1; List<Project> lst2; List<Project> lst3;
lst1 = new List<Project>
{
new Project { ProjectId = 1, ProjectName = "P1" },
new Project { ProjectId = 2, ProjectName = "P2" },
new Project { ProjectId = 3, ProjectName = "P3" }
};
lst2 = new List<Project>
{
new Project { ProjectId = 1, Customer = "Cust1"},
new Project { ProjectId = 2, Customer = "Cust2"},
new Project { ProjectId = 3, Customer = "Cust3"}
};
lst3 = new List<Project>
{
new Project { ProjectId = 1, Address = "Add1"},
new Project { ProjectId = 2, Address = "Add2"},
new Project { ProjectId = 3, Address = "Add3"}
};
Func<Project, Project, Project> mergeFunc = (p1,p2) => new Project
{
ProjectId = p1.ProjectId,
ProjectName = p1.ProjectName == null ? p2.ProjectName : p1.ProjectName,
Customer = p1.Customer == null ? p2.Customer : p1.Customer,
Address = p1.Address == null ? p2.Address : p1.Address
};
var output = lst1
.Concat(lst2)
.Concat(lst3)
.GroupBy(x => x.ProjectId, (k, g) => g.Aggregate(mergeFunc));
IEnumerable<bool> assertedCollection = output.Select((x, i) =>
x.ProjectId == (i + 1)
&& x.ProjectName == "P" + (i+1)
&& x.Customer == "Cust" + (i+1)
&& x.Address == "Add" + (i+1));
Debug.Assert(output.Count() == 3);
Debug.Assert(assertedCollection.All(x => x == true));
--- output ---
IEnumerable<Project> (3 items)
ProjectId  ProjectName  Customer  Address
1          P1           Cust1     Add1
2          P2           Cust2     Add2
3          P3           Cust3     Add3

Using a Lookup you can do it like this:
List<Project> lst = lst1.Union(lst2).Union(lst3).ToLookup(x => x.ProjectId).Select(x => new Project()
{
ProjectId = x.Key,
ProjectName = x.Select(y => y.ProjectName).Aggregate((z1,z2) => z1 ?? z2),
Customer = x.Select(y => y.Customer).Aggregate((z1, z2) => z1 ?? z2),
Address = x.Select(y => y.Address).Aggregate((z1, z2) => z1 ?? z2)
}).ToList();

I believe the following is how LINQ Join works:
var mergedProjects =
lst1
.Join(lst2,
proj1 => proj1.ProjectId,
proj2 => proj2.ProjectId,
(proj1, proj2) => new { Proj1 = proj1, Proj2 = proj2 })
.Join(lst3,
pair => pair.Proj1.ProjectId,
proj3 => proj3.ProjectId,
(pair, proj3) => new Project
{
ProjectId = proj3.ProjectId,
ProjectName = pair.Proj1.ProjectName,
Customer = pair.Proj2.Customer,
Address = proj3.Address
});
This will not return any results where the ProjectId is not found in all three lists.
If this is a problem, I think you'd be better off doing this manually rather than using LINQ.
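For the case where a ProjectId may be missing from one of the lists, one manual approach is to accumulate into a dictionary keyed by ProjectId. This is only a sketch: OuterMergeDemo and Merge are made-up names, and it assumes the first non-null value seen for each field should win.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Project
{
    public int ProjectId { get; set; }
    public string ProjectName { get; set; }
    public string Customer { get; set; }
    public string Address { get; set; }
}

public static class OuterMergeDemo
{
    // Merges any number of partial lists; a project appears in the result
    // even if only one of the lists mentions its ProjectId.
    public static List<Project> Merge(params IEnumerable<Project>[] lists)
    {
        var byId = new Dictionary<int, Project>();
        foreach (var project in lists.SelectMany(list => list))
        {
            Project merged;
            if (!byId.TryGetValue(project.ProjectId, out merged))
            {
                merged = new Project { ProjectId = project.ProjectId };
                byId.Add(project.ProjectId, merged);
            }
            // Keep the value already stored; otherwise take the new one.
            merged.ProjectName = merged.ProjectName ?? project.ProjectName;
            merged.Customer = merged.Customer ?? project.Customer;
            merged.Address = merged.Address ?? project.Address;
        }
        return byId.Values.OrderBy(p => p.ProjectId).ToList();
    }

    public static void Main()
    {
        var lst1 = new List<Project> { new Project { ProjectId = 1, ProjectName = "X" }, new Project { ProjectId = 2, ProjectName = "Y" } };
        var lst2 = new List<Project> { new Project { ProjectId = 1, Customer = "c1" } };
        var lst3 = new List<Project> { new Project { ProjectId = 2, Address = "a2" } };
        foreach (var p in Merge(lst1, lst2, lst3))
        {
            Console.WriteLine("{0}: {1}, {2}, {3}", p.ProjectId, p.ProjectName, p.Customer, p.Address);
        }
    }
}
```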

I assume that the lists contain the same number of items and are sorted by ProjectId.
List<Project> lst1; List<Project> lst2; List<Project> lst3;
If the lists are not sorted, you can sort them first.
lst1.Sort((a, b) => a.ProjectId.CompareTo(b.ProjectId));
lst2.Sort((a, b) => a.ProjectId.CompareTo(b.ProjectId));
lst3.Sort((a, b) => a.ProjectId.CompareTo(b.ProjectId));
For merging the objects:
List<Project> lst4 = new List<Project>();
for (int i = 0; i < lst1.Count; i++)
{
lst4.Add(new Project
{
ProjectId = lst1[i].ProjectId,
ProjectName = lst1[i].ProjectName,
Customer = lst2[i].Customer,
Address = lst3[i].Address
});
}

Although overkill, I was tempted to make this an extension method:
public static List<T> MergeWith<T,TKey>(this List<T> list, List<T> other, Func<T,TKey> keySelector, Func<T,T,T> merge)
{
var newList = new List<T>();
foreach(var item in list)
{
var otherItem = other.SingleOrDefault((i) => keySelector(i).Equals(keySelector(item)));
if(otherItem != null)
{
newList.Add(merge(item,otherItem));
}
}
return newList;
}
Usage would then be:
var merged = list1
.MergeWith(list2, i => i.ProjectId,
(lhs,rhs) => new Project{ProjectId=lhs.ProjectId,ProjectName=lhs.ProjectName, Customer=rhs.Customer})
.MergeWith(list3,i => i.ProjectId,
(lhs,rhs) => new Project{ProjectId=lhs.ProjectId,ProjectName=lhs.ProjectName, Customer=lhs.Customer,Address=rhs.Address});
Live example: http://rextester.com/ETIVB14254

This is assuming that you want to take the first non-null value, or revert to the default value - in this case null for a string.
private static IEnumerable<Project> GetMergedProjects(IEnumerable<List<Project>> projects)
{
var projectGrouping = projects.SelectMany(p => p).GroupBy(p => p.ProjectId);
foreach (var projectGroup in projectGrouping)
{
yield return new Project
{
ProjectId = projectGroup.Key,
ProjectName =
projectGroup.Select(p => p.ProjectName).FirstOrDefault(
p => !string.IsNullOrEmpty(p)),
Customer =
projectGroup.Select(c => c.Customer).FirstOrDefault(
c => !string.IsNullOrEmpty(c)),
Address =
projectGroup.Select(a => a.Address).FirstOrDefault(
a => !string.IsNullOrEmpty(a)),
};
}
}
You could also make this an extension method if needed.

Using List<Person> Distinct() to return 2 values

I have a Person class, with Name and AreaID properties.
public class Person
{
public string Name;
public int AreaID;
// snip
}
I have a List<Person> with the potential for hundreds of Person objects in the list.
e.g., 100 Persons with AreaID = 1 and 100 Persons with AreaID = 2
I want to return a distinct list of AreaIDs and how many Persons have each AreaID.
For example,
AreaID = 1 Persons = 100
AreaID = 2 Persons = 100

Use the GroupBy method.
var list = ...list of Persons...
var areas = list.GroupBy( p => p.AreaID )
.Select( g => new {
AreaID = g.Key,
Count = g.Count()
});

Looks like you want to group by area ID then:
var groups = from person in persons
group 1 by person.AreaID into area
select new { AreaID = area.Key, Persons = area.Count() };
I'm using "group 1" to indicate that I really don't care about the data within each group - only the count and the key.
This is inefficient in that it has to buffer all the results for the sake of grouping. You may well be able to use Reactive LINQ in .NET 4.0 to do this more efficiently, or you could certainly use Push LINQ if you wanted to. Then again, for relatively small datasets it probably doesn't matter :)

Surprisingly, nobody has suggested overriding Equals and GetHashCode. If you do so, you can do the following:
List<Person> unique = personList.Distinct().ToList();
Or even
var areaGroups = personList.GroupBy(p => p.AreaID);
int area1Count = personList.Where(p => p.AreaID == 1).Count();
This gives you more flexibility, with no need for a throwaway anonymous class.
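A sketch of what such an override could look like. The answer doesn't say which fields define equality, so this assumes two Person objects are equal when both Name and AreaID match; PersonEqualityDemo is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person : IEquatable<Person>
{
    public string Name;
    public int AreaID;

    // Assumption: equality means same Name and same AreaID.
    public bool Equals(Person other)
    {
        return !ReferenceEquals(other, null)
            && Name == other.Name
            && AreaID == other.AreaID;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Person);
    }

    public override int GetHashCode()
    {
        // Must agree with Equals: equal people produce equal hashes.
        return new { Name, AreaID }.GetHashCode();
    }
}

public static class PersonEqualityDemo
{
    // Distinct() picks up the overrides through the default equality comparer.
    public static List<Person> DistinctPeople(IEnumerable<Person> people)
    {
        return people.Distinct().ToList();
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Name = "Alex", AreaID = 1 },
            new Person { Name = "Alex", AreaID = 1 },
            new Person { Name = "Sam", AreaID = 1 }
        };
        Console.WriteLine(DistinctPeople(people).Count); // 2
    }
}
```

Note that this removes duplicate people; counting people per AreaID still needs GroupBy or a similar grouping step.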

return list.GroupBy(p => p.AreaID)
.Select(g => new { AreaID = g.Key, People = g.Count() });

You could use list.GroupBy(x => x.AreaID);

You can try this:
var groups = from person in list
group person by person.AreaID into areaGroup
select new {
AreaID = areaGroup.Key,
Count = areaGroup.Count()
};

var people = new List<Person>();
// The query is only defined here; it runs when enumerated in the foreach below.
var q = from p in people
group p by p.AreaID into g
select new { Id = g.Key, Total = g.Count() };
people.Add(new Person { AreaID = 1, Name = "Alex" });
people.Add(new Person { AreaID = 1, Name = "Alex" });
people.Add(new Person { AreaID = 2, Name = "Alex" });
people.Add(new Person { AreaID = 3, Name = "Alex" });
people.Add(new Person { AreaID = 3, Name = "Alex" });
people.Add(new Person { AreaID = 4, Name = "Alex" });
people.Add(new Person { AreaID = 2, Name = "Alex" });
people.Add(new Person { AreaID = 4, Name = "Alex" });
people.Add(new Person { AreaID = 1, Name = "Alex" });
foreach (var item in q)
{
Console.WriteLine("AreaId: {0}, Total: {1}",item.Id,item.Total);
}

Something like this, perhaps ?
List<Person> persons = new List<Person> ();
persons.Add (new Person (1, "test1"));
persons.Add (new Person (1, "test2"));
persons.Add (new Person (2, "test3"));
var results =
persons.GroupBy (p => p.AreaId);
foreach( var r in results )
{
Console.WriteLine (String.Format ("Area Id: {0} - Number of members: {1}", r.Key, r.Count ()));
}
Console.ReadLine ();

Instead of distinct, use GroupBy, or the more succinct LINQ statement:
var results = from p in PersonList
group p by p.AreaID into g
select new { AreaID=g.Key, Count=g.Count() };
foreach(var item in results)
Console.WriteLine("There were {0} items in Area {1}", item.Count, item.AreaID);

ToLookup() will do what you want.
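For illustration, a minimal ToLookup sketch, reusing the Person shape from the question (LookupDemo and ByArea are made-up names). Each entry in the lookup is a group with a Key and the matching people, so the count per area is just Count() on the group:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name;
    public int AreaID;
}

public static class LookupDemo
{
    // Groups people by AreaID eagerly; the result can be indexed by area.
    public static ILookup<int, Person> ByArea(IEnumerable<Person> people)
    {
        return people.ToLookup(p => p.AreaID);
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Name = "A", AreaID = 1 },
            new Person { Name = "B", AreaID = 1 },
            new Person { Name = "C", AreaID = 2 }
        };
        foreach (var area in ByArea(people))
        {
            Console.WriteLine("AreaID = {0} Persons = {1}", area.Key, area.Count());
        }
    }
}
```

One difference from GroupBy is that the lookup is built immediately and can be queried by key afterwards; indexing it with an AreaID that is not present yields an empty sequence rather than throwing.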

Can a single LINQ Query Expression be framed in this scenario?

I am facing a scenario where I have to filter a single object based on many objects.
For the sake of example, I have a Grocery object which has both Fruit and Vegetable properties. Then I have the individual Fruit and Vegetable objects.
My objective is this:
var groceryList = from grocery in Grocery.ToList()
from fruit in Fruit.ToList()
from veggie in Vegetable.ToList()
where (grocery.fruitId == fruit.fruitId)
where (grocery.vegId == veggie.vegId)
select (grocery);
The problem I am facing is when Fruit and Vegetable objects are empty.
By empty, I mean their list count is 0 and I want to apply the filter only if the filter list is populated.
I am also NOT able to use something like the following, since the objects are null:
var groceryList = from grocery in Grocery.ToList()
from fruit in Fruit.ToList()
from veggie in Vegetable.ToList()
where (grocery.fruitId == fruit.fruitId || fruit.fruitId == String.Empty)
where (grocery.vegId == veggie.vegId || veggie.vegId == String.Empty)
select (grocery);
So, I intend to check for Fruit and Vegetable list count...and filter them as separate expressions on successively filtered Grocery objects.
But is there a way to still get the list in case of null objects in a single query expression?

I think the LINQ GroupJoin operator will help you here. It's similar to the T-SQL LEFT OUTER JOIN.

IEnumerable<Grocery> query = Grocery;
if (Fruit != null)
{
query = query.Where(grocery =>
Fruit.Any(fruit => fruit.FruitId == grocery.FruitId));
}
if (Vegetable != null)
{
query = query.Where(grocery =>
Vegetable.Any(veggie => veggie.VegetableId == grocery.VegetableId));
}
List<Grocery> results = query.ToList();
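Since the question says the filter lists may be empty (count 0) rather than null, a variant that checks for both might look like the sketch below. The Grocery, Fruit and Vegetable classes and the string-typed ids are assumptions made for this example, as is the OptionalFilterDemo name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Grocery { public string FruitId; public string VegetableId; }
public class Fruit { public string FruitId; }
public class Vegetable { public string VegetableId; }

public static class OptionalFilterDemo
{
    // Applies each filter only when its list is non-null and has items.
    public static List<Grocery> Filter(
        List<Grocery> groceries, List<Fruit> fruits, List<Vegetable> vegetables)
    {
        IEnumerable<Grocery> query = groceries;
        if (fruits != null && fruits.Count > 0)
        {
            // A HashSet avoids rescanning the fruit list for every grocery.
            var fruitIds = new HashSet<string>(fruits.Select(f => f.FruitId));
            query = query.Where(g => fruitIds.Contains(g.FruitId));
        }
        if (vegetables != null && vegetables.Count > 0)
        {
            var vegetableIds = new HashSet<string>(vegetables.Select(v => v.VegetableId));
            query = query.Where(g => vegetableIds.Contains(g.VegetableId));
        }
        return query.ToList();
    }

    public static void Main()
    {
        var groceries = new List<Grocery>
        {
            new Grocery { FruitId = "f1", VegetableId = "v1" },
            new Grocery { FruitId = "f2", VegetableId = "v2" }
        };
        var fruits = new List<Fruit> { new Fruit { FruitId = "f1" } };
        // Empty vegetable list: no vegetable filter is applied.
        Console.WriteLine(Filter(groceries, fruits, new List<Vegetable>()).Count); // 1
    }
}
```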

Try something like the following:
var joined = grocery.Join(fruit, g => g.fruitId,
f => f.fruitId,
(g, f) => new Grocery() { /*set grocery properties*/ }).
Join(veggie, g => g.vegId,
v => v.vegId,
(g, v) => new Grocery() { /*set grocery properties*/ });
Where I have written "set grocery properties", you can set the properties of the Grocery object from the g, f and v variables of the selector. Of interest will obviously be setting g.fruitId = f.fruitId and g.vegId = v.vegId.

var groceryList =
from grocery in Grocery.ToList()
join fruit in Fruit.ToList()
on grocery.fruitId equals fruit.fruitId
into groceryFruits
join veggie in Vegetable.ToList()
on grocery.vegId equals veggie.vegId
into groceryVeggies
where ... // filter as needed
select new
{
Grocery = grocery,
GroceryFruits = groceryFruits,
GroceryVeggies = groceryVeggies
};

You have to use a left outer join (as in T-SQL) for this. The query below does the trick:
private void test()
{
var grocery = new List<groceryy>() { new groceryy { fruitId = 1, vegid = 1, name = "s" }, new groceryy { fruitId = 2, vegid = 2, name = "a" }, new groceryy { fruitId = 3, vegid = 3, name = "h" } };
var fruit = new List<fruitt>() { new fruitt { fruitId = 1, fname = "s" }, new fruitt { fruitId = 2, fname = "a" } };
var veggie = new List<veggiee>() { new veggiee { vegid = 1, vname = "s" }, new veggiee { vegid = 2, vname = "a" } };
//var fruit= new List<fruitt>();
//var veggie = new List<veggiee>();
var result = from g in grocery
join f in fruit on g.fruitId equals f.fruitId into tempFruit
join v in veggie on g.vegid equals v.vegid into tempVegg
from joinedFruit in tempFruit.DefaultIfEmpty()
from joinedVegg in tempVegg.DefaultIfEmpty()
select new { g.fruitId, g.vegid, fname = ((joinedFruit == null) ? string.Empty : joinedFruit.fname), vname = ((joinedVegg == null) ? string.Empty : joinedVegg.vname) };
foreach (var outt in result)
Console.WriteLine(outt.fruitId + " " + outt.vegid + " " + outt.fname + " " + outt.vname);
}
public class groceryy
{
public int fruitId;
public int vegid;
public string name;
}
public class fruitt
{
public int fruitId;
public string fname;
}
public class veggiee
{
public int vegid;
public string vname;
}
EDIT:
This is the sample result:
1 1 s s
2 2 a a
3 3
