comparing two lists with LINQ - c#

Let's say that I have these two lists of Persons. The Person object has FirstName, LastName, and Age properties.
List A
David Smith, 38
David Smith, 38
Susan Johnson, 23
List B
David Smith, 38
David Smith, 38
Susan Johnson, 23
Daniel Wallace, 55
I want to see if A is a subset of B by comparing the three properties. No, in this case I do not have a unique ID for each person.
EDIT: There can be duplicates in List A (David Smith, 38). List B must contain the same duplicates to qualify as a superset of A.

Once you've got a class which implements IEquatable<T> or IEqualityComparer<T>, it's easy to do the rest with Except and Any:
if (collectionA.Except(collectionB).Any())
{
// There are elements in A which aren't in B
}
or
if (collectionA.Except(collectionB, equalityComparer).Any())
{
// There are elements in A which aren't in B
}
EDIT: If there are duplicates, you'd probably want to group each collection, then check the counts:
var groupedA = collectionA.GroupBy(p => p,
(Value, g) => new { Value, Count = g.Count() });
var groupedB = collectionB.GroupBy(p => p,
(Value, g) => new { Value, Count = g.Count() });
var extras = from a in groupedA
join b in groupedB on a.Value equals b.Value into match
where !match.Any() || a.Count > match.First().Count
select a;
// ListA has at least one entry not in B, or with more duplicates than in B
if (extras.Any())
{
}
This is pretty horrible though...
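As a possible alternative to the grouped join, here is a minimal sketch that counts occurrences in a dictionary. It assumes a hypothetical Person class that implements IEquatable<Person> over the three properties from the question; the constructor and the SubsetCheck helper are illustrative names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Person type; only the three property names come from the question.
public sealed class Person : IEquatable<Person>
{
    public string FirstName { get; }
    public string LastName { get; }
    public int Age { get; }

    public Person(string firstName, string lastName, int age)
    {
        FirstName = firstName;
        LastName = lastName;
        Age = age;
    }

    // Two persons are equal when all three properties match.
    public bool Equals(Person other) =>
        other != null &&
        FirstName == other.FirstName &&
        LastName == other.LastName &&
        Age == other.Age;

    public override bool Equals(object obj) => Equals(obj as Person);

    // Must agree with Equals so that GroupBy and Dictionary work correctly.
    public override int GetHashCode() => (FirstName, LastName, Age).GetHashCode();
}

public static class SubsetCheck
{
    // True when every element of a occurs in b at least as many times as in a.
    public static bool IsMultisetSubset<T>(IEnumerable<T> a, IEnumerable<T> b)
    {
        // Count how many copies of each element b can still "spend".
        var remaining = b.GroupBy(x => x).ToDictionary(g => g.Key, g => g.Count());
        foreach (var item in a)
        {
            if (!remaining.TryGetValue(item, out var count) || count == 0)
                return false;
            remaining[item] = count - 1;
        }
        return true;
    }

    public static void Main()
    {
        var listA = new List<Person>
        {
            new Person("David", "Smith", 38),
            new Person("David", "Smith", 38),
            new Person("Susan", "Johnson", 23),
        };
        var listB = new List<Person>(listA) { new Person("Daniel", "Wallace", 55) };

        Console.WriteLine(SubsetCheck.IsMultisetSubset(listA, listB)); // prints True
    }
}
```

Each list is walked once, so this should scale better than nested Any calls, at the cost of building one dictionary.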

If Person does not implement IEquatable<Person>, the "brute force" method would be:
var isSubset = listA.All(pa => listB.Any(pb => pb.FirstName == pa.FirstName &&
pb.LastName == pa.LastName &&
pb.Age == pa.Age));
Note that this only checks membership, so it does not account for how many times a duplicate appears.

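You can use a join:
var lst1 = new List<Person>(); // Subset
var lst2 = new List<Person>(); // Set of all values
var res = from l1 in lst1
join l2 in lst2
on new { l1.FirstName, l1.LastName, l1.Age } equals new { l2.FirstName, l2.LastName, l2.Age }
select new { result = l1 };
Then compare the counts. If they are equal, the set contains the subset. This holds only when neither list contains duplicates, because a join emits one row per matching pair:
bool flag = res.Count() == lst1.Count();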

Related

LINQ to JSON group query on array

I have a sample of JSON data that I am converting to a JArray with NewtonSoft.
string jsonString = @"[{'features': ['sunroof','mag wheels']},{'features': ['sunroof']},{'features': ['mag wheels']},{'features': ['sunroof','mag wheels','spoiler']},{'features': ['sunroof','spoiler']},{'features': ['sunroof','mag wheels']},{'features': ['spoiler']}]";
I am trying to retrieve the features that are most commonly requested together. Based on the above dataset, my expected output would be:
sunroof, mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof, mag wheels, spoiler, 1
sunroof, spoiler, 1
spoiler, 1
However, my LINQ is rusty, and the code I am using to query my JSON data is returning the count of the individual features, not the features selected together:
JArray autoFeatures = JArray.Parse(jsonString);
var features = from f in autoFeatures.Select(feat => feat["features"]).Values<string>()
group f by f into grp
orderby grp.Count() descending
select new { indFeature = grp.Key, count = grp.Count() };
foreach (var feature in features)
{
Console.WriteLine("{0}, {1}", feature.indFeature, feature.count);
}
Actual Output:
sunroof, 5
mag wheels, 4
spoiler, 3
I was thinking maybe my query needs a 'distinct' in it, but I'm just not sure.
This is a problem with the Select. You are telling it to treat each value found in the arrays as its own item. What you actually need is to combine all the values of each features array into a single string. Here is how you do it:
var features = from f in autoFeatures.Select(feat => string.Join(",",feat["features"].Values<string>()))
group f by f into grp
orderby grp.Count() descending
select new { indFeature = grp.Key, count = grp.Count() };
Produces the following output
sunroof,mag wheels, 2
sunroof, 1
mag wheels, 1
sunroof,mag wheels,spoiler, 1
sunroof,spoiler, 1
spoiler, 1
You could use a HashSet to identify the distinct sets of features, and group on those sets. That way, your Linq looks basically identical to what you have now, but you need an additional IEqualityComparer class in the GroupBy to help compare one set of features to another to check if they're the same.
For example:
var featureSets = autoFeatures
.Select(feature => new HashSet<string>(feature["features"].Values<string>()))
.GroupBy(a => a, new HashSetComparer<string>())
.Select(a => new { Set = a.Key, Count = a.Count() })
.OrderByDescending(a => a.Count);
foreach (var result in featureSets)
{
Console.WriteLine($"{String.Join(",", result.Set)}: {result.Count}");
}
And the comparer class leverages the SetEquals method of the HashSet class to check if one set is the same as another (and this handles the strings being in a different order within the set, etc.)
public class HashSetComparer<T> : IEqualityComparer<HashSet<T>>
{
public bool Equals(HashSet<T> x, HashSet<T> y)
{
// so if x and y both contain "sunroof" only, this is true
// even if x and y are a different instance
return x.SetEquals(y);
}
public int GetHashCode(HashSet<T> obj)
{
// force comparison every time by always returning the same,
// or we could do something smarter like hash the contents
return 0;
}
}
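The comment in GetHashCode hints at hashing the contents instead of always returning 0. A minimal sketch of that idea follows; ContentHashSetComparer and Demo are hypothetical names, and the XOR combination is just one order-independent choice, not necessarily what the answer's author had in mind.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of the comparer above that hashes the set contents.
public class ContentHashSetComparer<T> : IEqualityComparer<HashSet<T>>
{
    public bool Equals(HashSet<T> x, HashSet<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SetEquals(y);
    }

    public int GetHashCode(HashSet<T> obj)
    {
        // XOR is commutative, so the result does not depend on enumeration order.
        int hash = 0;
        foreach (var item in obj)
            hash ^= item == null ? 0 : obj.Comparer.GetHashCode(item);
        return hash;
    }
}

public static class Demo
{
    public static void Main()
    {
        var sets = new[]
        {
            new HashSet<string> { "sunroof", "mag wheels" },
            new HashSet<string> { "mag wheels", "sunroof" },
            new HashSet<string> { "spoiler" },
        };

        // The first two sets land in the same group despite different insertion order.
        var groups = sets.GroupBy(s => s, new ContentHashSetComparer<string>());
        foreach (var g in groups)
            Console.WriteLine($"{string.Join(",", g.Key)}: {g.Count()}");
    }
}
```

Returning a constant hash forces every key into the same bucket, so grouping degrades to pairwise comparisons; hashing the contents avoids that. The framework also provides HashSet<T>.CreateSetComparer(), which may make a hand-written comparer unnecessary.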

LINQ - GroupBy multiple columns and merge the result

I am working with a sizeable set of data (~130.000 records), and I've managed to transform it the way I want it (to CSV).
Here is a simplified example of what the list looks like:
"Surname1, Name1;Address1;State1;YES;Group1"
"Surname2, Name2;Address2;State2;YES;Group2"
"Surname2, Name2;Address2;State2;YES;Group1"
"Surname3, Name3;Address3;State3;NO;Group1"
"Surname1, Name1;Address2;State1;YES;Group1"
Now, I would like to merge the records if 1st, 2nd AND 3rd column match, like so:
output
"Surname1, Name1;Address1;State1;YES;Group1"
"Surname2, Name2;Address2;State2;YES;Group2 Group1"
"Surname3, Name3;Address3;State3;NO;Group1"
"Surname1, Name1;Address2;State1;YES;Group1"
Here's what I've got so far:
output.GroupBy(x => new { c1 = x.Split(';')[0], c2 = x.Split(';')[1], c3 = x.Split(';')[2] }).Select(/* have no idea what should go here */);
First, project the columns you need into an anonymous type:
var query = from r in output
let columns = r.Split(';')
select new { c1 = columns[0], c2 = columns[1], c3 = columns[2], c5 = columns[4] };
Then create the groups, using the anonymous type defined in the previous query:
var result = query.GroupBy(e => new { e.c1, e.c2, e.c3 })
.Select(g => new { Name = g.Key.c1,
Address = g.Key.c2,
State = g.Key.c3,
Groups = String.Join(" ", g.Select(e => e.c5)) });
I know I'm missing some columns, but I think you get the idea.
PS: I split the logic into two queries only for readability. You can compose both into one, but that will not change the performance, because LINQ uses deferred execution.
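The same grouping idea can be sketched as a single pipeline that rebuilds the full semicolon-separated line. RecordMerger and its property names are hypothetical, the fourth column is taken from the first record of each group, and dropping repeated group names with Distinct is an assumption about the desired output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordMerger
{
    // Merges lines whose first three ';'-separated columns match and joins
    // the distinct fifth-column values of each group with a space.
    public static List<string> Merge(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split(';'))
            // Anonymous types compare by value, so no custom comparer is needed.
            .GroupBy(c => new { Name = c[0], Address = c[1], State = c[2] })
            .Select(g => string.Join(";",
                g.Key.Name,
                g.Key.Address,
                g.Key.State,
                g.First()[3], // assumption: keep the flag of the first record
                string.Join(" ", g.Select(c => c[4]).Distinct())))
            .ToList();
    }

    public static void Main()
    {
        var input = new List<string>
        {
            "Surname1, Name1;Address1;State1;YES;Group1",
            "Surname2, Name2;Address2;State2;YES;Group2",
            "Surname2, Name2;Address2;State2;YES;Group1",
            "Surname3, Name3;Address3;State3;NO;Group1",
            "Surname1, Name1;Address2;State1;YES;Group1",
        };

        foreach (var line in Merge(input))
            Console.WriteLine(line);
    }
}
```

GroupBy on in-memory collections yields groups in order of first appearance, so the merged lines keep the order of the input.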
This is how I would do it:
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args)
{
List<string> input = new List<string> {
"Surname1, Name1;Address1;State1;YES;Group1",
"Surname2, Name2;Address2;State2;YES;Group2",
"Surname2, Name2;Address2;State2;YES;Group1",
"Surname3, Name3;Address3;State3;NO;Group1",
"Surname1, Name1;Address2;State1;YES;Group1",
};
var transformed = input.Select(s => s.Split(';'))
.GroupBy( s => new string[] { s[0], s[1], s[2], s[3] },
(key, elements) => string.Join(";", key) + ";" + string.Join(" ", elements.Select(e => e.Last())),
new MyEqualityComparer())
.ToList();
}
}
internal class MyEqualityComparer : IEqualityComparer<string[]>
{
public bool Equals(string[] x, string[] y)
{
return x[0] == y[0] && x[1] == y[1] && x[2] == y[2];
}
public int GetHashCode(string[] obj)
{
int hashCode = obj[0].GetHashCode();
hashCode = hashCode ^ obj[1].GetHashCode();
hashCode = hashCode ^ obj[2].GetHashCode();
return hashCode;
}
}
Consider the first 4 columns as the grouping key, but only use the first 3 for the comparison (hence the custom IEqualityComparer).
Once you have the (key, elements) groups, transform each one by joining the elements of the key with ; (remember, the key consists of the first 4 columns) and appending the last element of every member of the group, joined with a space.

How to compare two lists with multiple objects and set values?

I have two lists. Each list has a Name object and a Value object. I want to loop through list1 and check whether each list1 Name is the same as a list2 Name (the LINQ code below does this).
If they match, I want the list1 Value to be set to the list2 Value. How can this be done?
list1            list2
Name   Value     Name   Value
john   apple     John   orange
peter  null      Peter  grape
I need it to look like this:
list1            list2
Name   Value     Name   Value
john   orange    john   orange
peter  grape     peter  grape
Linq code:
var x = list1.Where(n => list2.Select(n1 => n1.Name).Contains(n.Name));
For filtering you can use LINQ, to set the values use a loop:
var commonItems = from x in list1
join y in list2
on x.Name equals y.Name
select new { Item = x, NewValue = y.Value };
foreach(var x in commonItems)
{
x.Item.Value = x.NewValue;
}
In one result, you can get the objects joined together:
var output= from l1 in list1
join l2 in list2
on l1.Name equals l2.Name
select new { List1 = l1, List2 = l2};
Then manipulate the objects in the returned results by looping through each one and setting the value:
foreach (var result in output)
result.List1.Value = result.List2.Value;
You are looking for a left join:
var x = from l1 in list1
join l2 in list2 on l1.Name equals l2.Name into l3
from l2 in l3.DefaultIfEmpty()
select new { Name = l1.Name, Value = (l2 == null ? l1.Value : l2.Value) };
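The sample data shows "john" in list1 and "John" in list2, and the default string equality used by a join is case-sensitive. If the names should match regardless of case (an assumption, the question does not say so), a dictionary with a case-insensitive comparer is one way to sketch the update. Item, ValueCopier and CopyValues are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with the two members described in the question.
public class Item
{
    public string Name { get; set; }
    public string Value { get; set; }
}

public static class ValueCopier
{
    // Copies Value from source into every target item with a matching Name.
    public static void CopyValues(IEnumerable<Item> target, IEnumerable<Item> source)
    {
        // Assumption: names are compared without regard to case.
        var lookup = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var s in source)
            lookup[s.Name] = s.Value; // the last entry wins if a name repeats

        foreach (var t in target)
        {
            if (lookup.TryGetValue(t.Name, out var value))
                t.Value = value;
        }
    }

    public static void Main()
    {
        var list1 = new List<Item>
        {
            new Item { Name = "john", Value = "apple" },
            new Item { Name = "peter", Value = null },
        };
        var list2 = new List<Item>
        {
            new Item { Name = "John", Value = "orange" },
            new Item { Name = "Peter", Value = "grape" },
        };

        CopyValues(list1, list2);
        foreach (var item in list1)
            Console.WriteLine($"{item.Name} {item.Value}");
    }
}
```

Items in list1 without a counterpart in list2 keep their current Value, which mirrors the fallback in the left join above.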

c# where in with list and linq

I have two lists: one holds objects of type A and the other holds objects of type B, like this:
ObjectA
{
Int64 idObjectA;
String name;
....
}
ObjectB
{
Int64 idObjectB;
Int64 idObjectA;
String name;
....
}
I want to create a new list C that contains only those B objects whose idObjectA matches the ID of some object in list A.
In SQL it would be something like this:
select * from B where IDObjectA IN(1,2,3,4...);
In my case, the list of values for the IN clause is the list of ObjectA, which have the property idObjectA.
You can use the Join linq method to achieve this by joining listB and listA by their idObjectA, then select itemB.
var result = (from itemB in listB
join itemA in listA on itemB.idObjectA equals itemA.idObjectA
select itemB).ToList();
This method runs in roughly linear time, O(n + m), because Join builds a hash lookup over one of the lists. Using Where(... => ....Contains()) on a list, or a double foreach, has quadratic complexity, O(n * m).
The same with Join and without Contains:
var listC = listB.Join(listA, b => b.idObjectA, a => a.idObjectA, (b, a) => b).ToList();
This is slightly different way of doing it as opposed to a join.
List<ObjectA> listA = ..
List<ObjectB> listB = ..
List<long> listAIds = listA.Select(a => a.idObjectA).ToList();
//^^ this projects the list of objects into a list of IDs (idObjectA is an Int64)
//It reads like this...
//get items in listB WHERE..
var listC = listB.Where(b => listAIds.Contains(b.idObjectA)).ToList();
//b.idObjectA is in listA, OR where listA contains b.idObjectA
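Contains on a List<T> scans the list on every call. A minimal sketch of the same filter using a HashSet<long> for the lookup is shown below; the ObjectA and ObjectB classes are reduced to the fields from the question, and WhereIn and FilterByParentIds are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced versions of the question's types; other members are omitted.
public class ObjectA
{
    public long idObjectA;
    public string name;
}

public class ObjectB
{
    public long idObjectB;
    public long idObjectA;
    public string name;
}

public static class WhereIn
{
    // Returns the B items whose idObjectA appears among the IDs of listA.
    public static List<ObjectB> FilterByParentIds(IEnumerable<ObjectA> listA, IEnumerable<ObjectB> listB)
    {
        // HashSet lookups take constant time on average.
        var ids = new HashSet<long>(listA.Select(a => a.idObjectA));
        return listB.Where(b => ids.Contains(b.idObjectA)).ToList();
    }

    public static void Main()
    {
        var listA = new List<ObjectA>
        {
            new ObjectA { idObjectA = 1, name = "first" },
            new ObjectA { idObjectA = 2, name = "second" },
        };
        var listB = new List<ObjectB>
        {
            new ObjectB { idObjectB = 10, idObjectA = 1, name = "child of 1" },
            new ObjectB { idObjectB = 11, idObjectA = 3, name = "child of 3" },
        };

        foreach (var b in FilterByParentIds(listA, listB))
            Console.WriteLine(b.name); // prints "child of 1"
    }
}
```

Unlike a join, this returns each B item at most once even if list A happens to contain the same ID twice.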
Not linq, but does what you want it to:
List<ObjectB> C = new List<ObjectB>();
foreach (var n in B)
{
foreach (var c in A)
{
if (n.idObjectA == c.idObjectA)
{
C.Add(n);
break;
}
}
}
Or if you wanted higher performance, use a for, and higher than that use Cédric Bignon's solution.

Complex Linq query

I have a container object named Flight.
It contains:
List<Segment> Segements
List<Passenger> Pax
List<Award> Awards
Each award contains:
List<Segment> Segements
Passenger Pax
I want to generate all combinations of Segements and Pax (taken from the Flight object) and compare them to the combinations that already exist within each Award,
so that I finally get a list of Awards whose combinations do not exist in any Award object.
How can I do that in one LINQ query?
Something like this?
var flight = new Flight();
var x = from s in flight.Segements
from p in flight.Pax
select new
{
Pax = p ,
Segemnt = s
};
var y = from a in flight.Awards
from s in a.Segements
select new
{
Pax = a.Pax,
Segemnt = s
};
var result = x.Except(y);
I think this single query will produce the desired result:
var query =
from f in flights
from p in f.Pax
from s in f.Segements
where !f.Awards.Any(a => (a.Pax.Name == p.Name) && (a.Segements.Select(_ => _.Id).Contains(s.Id)))
select new Award { Pax = p, Segements = new List<Segment> { s } };
Obviously, I made some assumptions on how to identify individual passengers and segments. Also, I'd be very surprised if this query worked as-is when querying an entity framework data source directly.
