IEnumerable.Except() between different classes with a common field - c#

Is it possible to use Except() on two lists that hold different classes with a common field? I have List<User1> and List<User2> collections. Their properties differ except for the Id column, and I want to find the records that differ between them using this Id. I'm trying to use List<>.Except() but I'm getting this error:
The type arguments for method 'System.Linq.Enumerable.Except<TSource>(System.Collections.Generic.IEnumerable<TSource>, System.Collections.Generic.IEnumerable<TSource>)' cannot be inferred from the usage. Try specifying the type arguments explicitly.
Here's what I'm trying:
List<User1> list1 = List1();
List<User2> list2 = List2();
var listdiff = list1.Except(list2.Select(row => row.Id));
What am I doing wrong?

List1 contains instances of User1 and List2 contains instances of User2.
What type of instance should be produced by list1.Except(list2.Select(row => row.Id))?
In other words if type inference was not available, what would you replace var with?
If User1 and User2 inherit from the same ancestor (with ID), use List<User> instead.
Otherwise:
var list2Lookup = list2.ToLookup(user => user.Id);
var listdiff = list1.Where(user => !list2Lookup.Contains(user.Id));

Not Except, but the correct results and similar performance:
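// assumes that the Id property is an Int32
var tempKeys = new HashSet<int>(list2.Select(x => x.Id));
// Add returns false for keys that are already in the set, so the filter mutates tempKeys;
// materialize the result so the query is evaluated only once
var listdiff = list1.Where(x => tempKeys.Add(x.Id)).ToList();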
And, of course, you can wrap it all up in your own re-usable extension method:
var listdiff = list1.Except(list2, x => x.Id, y => y.Id);
// ...
public static class EnumerableExtensions
{
    public static IEnumerable<TFirst> Except<TFirst, TSecond, TKey>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TKey> firstKeySelector,
        Func<TSecond, TKey> secondKeySelector)
    {
        // argument null checking etc omitted for brevity
        // iterator block: the key set is rebuilt on every enumeration,
        // so the result can safely be enumerated more than once
        var keys = new HashSet<TKey>(second.Select(secondKeySelector));
        foreach (var x in first)
        {
            if (keys.Add(firstKeySelector(x)))
                yield return x;
        }
    }
}
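For a quick check of the keyed-difference idea, here is a minimal, self-contained sketch. The User1 and User2 classes, their Name and Email properties, and the ExceptByKey name are hypothetical stand-ins, not the asker's real types. Unlike the extension above, this variant only filters, so duplicates in the first list are kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's classes; only Id is shared.
public class User1 { public int Id { get; set; } public string Name { get; set; } }
public class User2 { public int Id { get; set; } public string Email { get; set; } }

public static class KeyedExceptDemo
{
    // Returns the items of first whose key does not occur in second.
    // Items are only filtered, so duplicates in first are preserved.
    public static IEnumerable<TFirst> ExceptByKey<TFirst, TSecond, TKey>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TKey> firstKeySelector,
        Func<TSecond, TKey> secondKeySelector)
    {
        var excluded = new HashSet<TKey>(second.Select(secondKeySelector));
        foreach (var item in first)
        {
            if (!excluded.Contains(firstKeySelector(item)))
                yield return item;
        }
    }

    // Builds two small sample lists and returns the Ids found only in list1.
    public static List<int> Run()
    {
        var list1 = new List<User1>
        {
            new User1 { Id = 1, Name = "Ann" },
            new User1 { Id = 2, Name = "Bob" },
            new User1 { Id = 3, Name = "Cid" }
        };
        var list2 = new List<User2>
        {
            new User2 { Id = 2, Email = "bob@example.com" },
            new User2 { Id = 4, Email = "dee@example.com" }
        };
        return list1.ExceptByKey(list2, u => u.Id, u => u.Id)
                    .Select(u => u.Id)
                    .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "1, 3"
    }
}
```

The name ExceptByKey is chosen only to avoid clashing with framework methods of a similar name.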

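Briefly: treat both lists as sequences of object and read the Id through dynamic (a C# feature available since .NET 4.0). Both sides have to be projected to the Id, so the result is the set of Ids in list1 that are missing from list2, not the User1 objects.
Example:
var listDiff = list1
    .AsEnumerable<object>()
    .Select(row => ((dynamic)row).Id)
    .Except(list2
        .AsEnumerable<object>()
        .Select(row => ((dynamic)row).Id));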

If you just want the Ids in list1 that are not in list2, you can do:
var idsInList1NotInList2 = list1.Select(user1 => user1.Id)
.Except(list2.Select(user2 => user2.Id));
If you need the associated User1 objects too, here's one way (assuming Ids are unique for a User1 object):
// Create lookup from Id to the associated User1 object
var user1sById = list1.ToDictionary(user1 => user1.Id);
// Find Ids from the lookup that are not present for User2s from list2
// and then retrieve their associated User1s from the lookup
var user1sNotInList2 = user1sById.Keys
.Except(list2.Select(user2 => user2.Id))
.Select(key => user1sById[key]);
EDIT: vc74's take on this idea is slightly better; it doesn't require uniqueness.

public static IEnumerable<TSource> Except<TSource, CSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> TSelector, IEnumerable<CSource> csource, Func<CSource, TKey> CSelector)
{
bool EqualFlag = false;
foreach (var s in source)
{
EqualFlag = false;
foreach (var c in csource)
{
var svalue = TSelector(s);
var cvalue = CSelector(c);
if (svalue != null)
{
if (svalue.Equals(cvalue))
{
EqualFlag = true;
break;
}
}
else if (svalue == null && cvalue == null)
{
EqualFlag = true;
break;
}
}
if (EqualFlag)
continue;
else
{
yield return s;
}
}
}

Try
list1.Where(user1 => !list2.Any(user2 => user2.Id.Equals(user1.Id)));

Related

How to Except<> specifying another key? Or a faster way to diff two huge List<>?

I have a list of AE_AlignedPartners items in the db, which I retrieve with:
List<AE_AlignedPartners> ae_alignedPartners_olds = ctx.AE_AlignedPartners.AsNoTracking().ToList();
Then I get and deserialize a new list (of the same object type) from JSON:
List<AE_AlignedPartners> ae_alignedPartners_news = GetJSONPartnersList();
Then I'm getting the intersection of both:
var IDSIntersections = (from itemNew in ae_alignedPartners_news
join itemOld in ae_alignedPartners_olds on itemNew.ObjectID equals itemOld.ObjectID
select itemNew).Select(p => p.ObjectID).ToList();
Now, based on this intersection, I need to create two new lists: the added items (ae_alignedPartners_news - intersections) and the deleted ones (ae_alignedPartners_olds - intersections). Here's the code:
// to create
IList<AE_AlignedPartners> ae_alignedPartners_toCreate = ae_alignedPartners_news.Where(p => !IDSIntersections.Contains(p.ObjectID)).ToList();
// to delete
IList<AE_AlignedPartners> ae_alignedPartners_toDelete = ae_alignedPartners_olds.Where(p => !IDSIntersections.Contains(p.ObjectID)).ToList();
But with many records (~100k) it takes too much time.
Is there a sort of Except<> that lets me specify which key needs to be compared? In my case it's not p.ID (which is the primary key in the DB), but p.ObjectID.
Or any other faster way?
There is an Except function that you can use with a custom comparer:
class PartnerComparer : IEqualityComparer<AE_AlignedPartners>
{
    // Partners are equal if their ObjectIDs are equal.
    public bool Equals(AE_AlignedPartners x, AE_AlignedPartners y)
    {
        // Check whether the partners' ObjectIDs are equal.
        return x.ObjectID == y.ObjectID;
    }
    public int GetHashCode(AE_AlignedPartners ap)
    {
        return ap.ObjectID.GetHashCode();
    }
}
var comparer = new PartnerComparer();
// Pass the comparer to Intersect as well; without it, Intersect uses the default
// equality comparer (reference equality unless AE_AlignedPartners overrides Equals).
var intersect = ae_alignedPartners_news.Intersect(ae_alignedPartners_olds, comparer);
var creates = ae_alignedPartners_news.Except(intersect, comparer);
var deletes = ae_alignedPartners_olds.Except(intersect, comparer);
This should give you a reasonable boost in performance.
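The slowness in the question most likely comes from calling Contains on a List inside Where: each call scans the list, so the work grows with the product of the two list sizes. A minimal sketch of the same diff using HashSet lookups follows. The Partner class is a hypothetical, trimmed-down stand-in for AE_AlignedPartners, and ObjectID is assumed to be a string, which the question does not state.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for AE_AlignedPartners.
// ObjectID is assumed to be a string here.
public class Partner
{
    public int ID { get; set; }
    public string ObjectID { get; set; }
}

public static class PartnerDiff
{
    // Splits the two lists into items to create (only in news)
    // and items to delete (only in olds), comparing by ObjectID.
    public static (List<Partner> ToCreate, List<Partner> ToDelete) Diff(
        IReadOnlyCollection<Partner> olds,
        IReadOnlyCollection<Partner> news)
    {
        // HashSet.Contains is a hash lookup instead of a linear scan.
        var oldIds = new HashSet<string>(olds.Select(p => p.ObjectID));
        var newIds = new HashSet<string>(news.Select(p => p.ObjectID));

        var toCreate = news.Where(p => !oldIds.Contains(p.ObjectID)).ToList();
        var toDelete = olds.Where(p => !newIds.Contains(p.ObjectID)).ToList();
        return (toCreate, toDelete);
    }

    public static void Main()
    {
        var olds = new List<Partner>
        {
            new Partner { ID = 1, ObjectID = "A" },
            new Partner { ID = 2, ObjectID = "B" },
            new Partner { ID = 3, ObjectID = "C" }
        };
        var news = new List<Partner>
        {
            new Partner { ID = 0, ObjectID = "B" },
            new Partner { ID = 0, ObjectID = "C" },
            new Partner { ID = 0, ObjectID = "D" }
        };

        var (toCreate, toDelete) = Diff(olds, news);
        Console.WriteLine("create: " + string.Join(", ", toCreate.Select(p => p.ObjectID))); // prints "create: D"
        Console.WriteLine("delete: " + string.Join(", ", toDelete.Select(p => p.ObjectID))); // prints "delete: A"
    }
}
```

No separate intersection step is needed with this approach, since each list is filtered directly against the other list's keys.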
You don't need an inner join, you need a full outer join on the key. LINQ has no built-in full outer join, but it is easy to extend IEnumerable with one.
From the Stack Overflow question "LINQ full outer join" I took the solution that uses deferred execution. It only works if the key selectors produce unique keys.
public static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
    this IEnumerable<TA> sequenceA,
    IEnumerable<TB> sequenceB,
    Func<TA, TKey> keyASelector,
    Func<TB, TKey> keyBSelector,
    Func<TKey, TA, TB, TResult> resultSelector,
    IEqualityComparer<TKey> comparer = null)
{
    if (comparer == null) comparer = EqualityComparer<TKey>.Default;
    // create two lookup tables:
    var aLookup = sequenceA.ToLookup(keyASelector, comparer);
    var bLookup = sequenceB.ToLookup(keyBSelector, comparer);
    // all used keys:
    var aKeys = aLookup.Select(p => p.Key);
    var bKeys = bLookup.Select(p => p.Key);
    var allUsedKeys = aKeys.Concat(bKeys).Distinct(comparer);
    // for every used key:
    // get the value from A with this key, or default if it is not a key used by A
    // and the value from B with this key, or default if it is not a key used by B
    // put the key and the fetched values in the resultSelector
    foreach (TKey key in allUsedKeys)
    {
        TA fetchedA = aLookup[key].FirstOrDefault();
        TB fetchedB = bLookup[key].FirstOrDefault();
        TResult result = resultSelector(key, fetchedA, fetchedB);
        yield return result;
    }
}
I use this function to produce three kinds of results. With the old items as A and the new items as B:
Values in A but not in B: (A, null) => must be removed
Values in B but not in A: (null, B) => must be added
Values in A and in B: (A, B) => need further inspection to see if an update is needed
IEnumerable<AlignedPartners> olds = ...
IEnumerable<AlignedPartners> news = ...
var joinResult = olds.FullOuterJoin(news, // join old and new
oldItem => oldItem.Id, // from every old take the Id
newItem => newItem.Id, // from every new take the Id
(key, oldItem, newItem) => new // when they match make one new object
{ // containing the following properties
OldItem = oldItem,
NewItem = newItem,
});
Note: until now nothing has been enumerated!
foreach (var joinedItem in joinResult)
{
if (joinedItem.OldItem == null)
{
// we won't have both items null, so we know NewItem is not null
AddItem(joinedItem.NewItem);
}
else if (joinedItem.NewItem == null)
{ // old not null, new equals null
DeleteItem(joinedItem.OldItem);
}
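else
{ // both old and new not null, if desired: check if update needed
// comparer is assumed to be an IEqualityComparer<AlignedPartners> that you supply
if (!comparer.Equals(joinedItem.OldItem, joinedItem.NewItem))
{ // changed
UpdateItems(joinedItem.OldItem, joinedItem.NewItem);
}
}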
}

C# How to split a List in two using LINQ [duplicate]

This question already has answers here:
Can I split an IEnumerable into two by a boolean criteria without two queries?
(6 answers)
Closed 2 years ago.
I am trying to split a List into two Lists using LINQ without iterating the 'master' list twice. One List should contain the elements for which the LINQ condition is true, and the other should contain all the other elements. Is this at all possible?
Right now I just use two LINQ queries, thus iterating the (huge) master List twice.
Here's the (pseudo) code I am using right now:
List<EventModel> events = GetAllEvents();
List<EventModel> openEvents = events.Where(e => e.Closer_User_ID == null).ToList();
List<EventModel> closedEvents = events.Where(e => e.Closer_User_ID != null).ToList();
Is it possible to yield the same results without iterating the original List twice?
You can use ToLookup extension method as follows:
List<Foo> items = new List<Foo> { new Foo { Name="A",Condition=true},new Foo { Name = "B", Condition = true },new Foo { Name = "C", Condition = false } };
var lookupItems = items.ToLookup(item => item.Condition);
var lstTrueItems = lookupItems[true];
var lstFalseItems = lookupItems[false];
You can do this in one statement by converting it into a Lookup table:
var splitTables = events.ToLookup(e => e.Closer_User_ID == null);
This returns a lookup with at most two groups, each an IGrouping<bool, EventModel>. The Key says whether the group holds the events with a null Closer_User_ID (true) or the others (false).
However this looks rather mystical. My advice would be to extend LINQ with a new function.
This function takes a sequence of any kind, and a predicate that divides the sequence into two groups: the group that matches the predicate and the group that doesn't match the predicate.
This way you can use the function to divide all kinds of IEnumerable sequences into two sequences.
See Extension methods demystified
public static IEnumerable<IGrouping<bool, TSource>> Split<TSource>(
this IEnumerable<TSource> source,
Func<TSource,bool> predicate)
{
return source.ToLookup(predicate);
}
Usage:
IEnumerable<Person> persons = ...
// divide the persons into adults and non-adults:
var result = persons.Split(person => person.IsAdult);
Result has two elements: the one with Key true has all Adults.
Although usage has now become easier to read, you still have the problem that the complete sequence is processed, while in fact you might only want to use a few of the resulting items.
Let's return an IEnumerable<KeyValuePair<bool, TSource>>, where the Boolean value indicates whether the item matches or doesn't match:
public static IEnumerable<KeyValuePair<bool, TSource>> Audit<TSource>(
this IEnumerable<TSource> source,
Func<TSource,bool> predicate)
{
foreach (var sourceItem in source)
{
yield return new KeyValuePair<bool, TSource>(predicate(sourceItem), sourceItem);
}
}
Now you get a sequence, where every element says whether it matches or not. If you only need a few of them, the rest of the sequence is not processed:
IEnumerable<EventModel> eventModels = ...
EventModel firstOpenEvent = eventModels.Audit(e => e.Closer_User_ID == null)
    .Where(splitEvent => splitEvent.Key)
    .Select(splitEvent => splitEvent.Value)
    .FirstOrDefault();
The Where says that you only want those audited items that passed auditing (Key is true), and the Select unwraps the original item.
Because you only need the first element, the rest of the sequence is not audited.
GroupBy and Single should accomplish what you're looking for:
var groups = events.GroupBy(e => e.Closer_User_ID == null).ToList(); // As others mentioned this needs to be materialized to prevent `events` from being iterated twice.
var openEvents = groups.SingleOrDefault(grp => grp.Key == true)?.ToList() ?? new List<EventModel>();
var closedEvents = groups.SingleOrDefault(grp => grp.Key == false)?.ToList() ?? new List<EventModel>();
One line solution by using ForEach method of List:
List<EventModel> events = GetAllEvents();
List<EventModel> openEvents = new List<EventModel>();
List<EventModel> closedEvents = new List<EventModel>();
events.ForEach(x => (x.Closer_User_ID == null ? openEvents : closedEvents).Add(x));
You can do without LINQ. Switch to conventional loop approach.
List<EventModel> openEvents = new List<EventModel>();
List<EventModel> closedEvents = new List<EventModel>();
foreach(var e in events)
{
if(e.Closer_User_ID == null)
{
openEvents.Add(e);
}
else
{
closedEvents.Add(e);
}
}
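The loop above can also be wrapped in a small reusable helper. This is only a sketch: the Partition name is made up, and the EventModel class below is a hypothetical stand-in that assumes Closer_User_ID is a nullable int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in; Closer_User_ID is assumed to be a nullable int.
public class EventModel
{
    public int Id { get; set; }
    public int? Closer_User_ID { get; set; }
}

public static class PartitionExtensions
{
    // Walks the source exactly once and sorts every item into one of two lists.
    public static (List<T> Matches, List<T> Rest) Partition<T>(
        this IEnumerable<T> source,
        Func<T, bool> predicate)
    {
        var matches = new List<T>();
        var rest = new List<T>();
        foreach (var item in source)
        {
            (predicate(item) ? matches : rest).Add(item);
        }
        return (matches, rest);
    }
}

public static class PartitionDemo
{
    public static void Main()
    {
        var events = new List<EventModel>
        {
            new EventModel { Id = 1, Closer_User_ID = null },
            new EventModel { Id = 2, Closer_User_ID = 5 },
            new EventModel { Id = 3, Closer_User_ID = null }
        };

        var (openEvents, closedEvents) = events.Partition(e => e.Closer_User_ID == null);

        Console.WriteLine("open: " + string.Join(", ", openEvents.Select(e => e.Id)));     // prints "open: 1, 3"
        Console.WriteLine("closed: " + string.Join(", ", closedEvents.Select(e => e.Id))); // prints "closed: 2"
    }
}
```

Both lists are built eagerly, which fits the question's requirement of a single pass but gives up deferred execution.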

LINQ conversion to List object

I am using the following code to return an IList:
FileName = Path.GetFileName(files[i]);
IList<DataX> QueryListFromFTP = DataX.GetListFromFTP(FileName);
QueryListFromFTP = (IList<DataX>)QueryListFromFTP
.Select(x => new { x.user_id, x.date, x.application_ID })
.ToList()
.Distinct();
However I keep getting this error:
Unable to cast object of type 'd__7a`1[<>f__AnonymousType0`3[System.String,System.String,System.String]]' to type 'System.Collections.Generic.IList`1[DataXLibrary.DataX]'.
What am I doing wrong?
If what you want is a List<DataX>, then all you need is:
IList<DataX> QueryListFromFTP = DataX.GetListFromFTP(FileName).Distinct().ToList();
// Use QueryListFromFTP here.
If you want a List of a different type of object as a result of your .Select, then you need to store the result in a List of that type, i.e. anonymous if that's what you want.
The following line creates an anonymous type in C#, which does not correspond to the type DataX:
new { x.user_id, x.date, x.application_ID }
You should alter it to something like this:
Select(x => new DataX() { User_id = x.user_id, Date = x.date, Application = x.application_ID })
There are two problems in your code:
You're converting the List of DataX objects to an "anonymous type object" (the new { x.user_id, x.date, x.application_ID }). This object is not the same type as DataX, and it can't be coerced back to a DataX object automatically.
Trying to read between the lines a little, it looks like you want a distinct list of DataX objects, where distinctness is determined by a subset of the properties of a DataX object. So you have to answer the question, what will you do with duplicates (by this definition) that have different data in other properties? You have to discard some of them. Distinct() is not the right tool for this, because it only applies to the entire object of the IEnumerable it is applied to.
It's almost like you need a DistinctBy with one parameter giving the properties to calculate distinctness with, and a second parameter giving some logic for deciding which of the non-distinct "duplicates" to select. But this can be achieved with multiple IEnumerable methods: GroupBy and a further expression to select an appropriate single item from each resulting group. Here's one possible solution:
FileName = Path.GetFileName(files[i]);
IList<DataX> QueryListFromFTP = DataX.GetListFromFTP(FileName)
.GroupBy(datax => new { datax.user_id, datax.date, datax.application_ID })
.Select(g => g.First()) // or another expression to choose one item per group
.ToList();
If, for example, there were a version field and you wanted the most recent one for each "duplicate", you could:
.Select(g => g.OrderByDescending(datax => datax.version).First())
Please note, however, that if you just want distinctness over all the properties of the object, and there is no need to select one particular value (in order to get its additional properties after throwing away some objects considered duplicates), then it may be as simple as this:
IList<DataX> QueryListFromFTP = DataX.GetListFromFTP(FileName)
.Distinct()
.ToList();
I would furthermore advise that you use IReadOnlyCollection where possible (that's .ToList().AsReadOnly()) and that, depending on your data, you may want to make the GetListFromFTP function perform the de-duplication/distinctness instead.
To answer any concerns that GroupBy isn't the right answer because it may not perform well enough, here is an alternate way to handle this (though I wholeheartedly disagree with you--until tests prove it's slow, it's a perfectly fine answer).
// in a static helper class of some kind
public static IEnumerable<T> DistinctBy<T, TKey>(
this IEnumerable<T> source,
Func<T, TKey> keySelector
) {
if (source == null) {
throw new ArgumentNullException("source", "Source enumerable cannot be null.");
}
if (keySelector == null) {
throw new ArgumentNullException("keySelector", "keySelector function cannot be null. To perform a generic distinct, use .Distinct().");
}
return DistinctByImpl(source, keySelector);
}
private static IEnumerable<T> DistinctByImpl<T, TKey>(
    this IEnumerable<T> source,
    Func<T, TKey> keySelector
) {
    // iterator block: a fresh key set is created on every enumeration
    HashSet<TKey> keys = new HashSet<TKey>();
    foreach (T item in source) {
        if (keys.Add(keySelector(item))) {
            yield return item;
        }
    }
}
It is used like this:
public class Animal {
public string Name { get; set; }
public string AnimalType { get; set; }
public decimal Weight { get; set; }
}
IEnumerable<Animal> animals = new List<Animal> {
new Animal { Name = "Fido", AnimalType = "Dog", Weight = 15.0M },
new Animal { Name = "Trixie", AnimalType = "Dog", Weight = 15.0M },
new Animal { Name = "Juliet", AnimalType = "Cat", Weight = 12.0M },
new Animal { Name = "Juliet", AnimalType = "Fish", Weight = 1.0M }
};
var filtered1 = animals.DistinctBy(a => new { a.AnimalType, a.Weight });
/* returns:
Name Type Weight
Fido Dog 15.0
Juliet Cat 12.0
Juliet Fish 1.0
*/
var filtered2 = animals.DistinctBy(a => a.Name); // or a simple property
/* returns:
Name Type Weight
Fido Dog 15.0
Trixie Dog 15.0
Juliet Cat 12.0
*/

Trying GroupBy within List in c#

In the case below, I want to count how many times each employee appears. For example, if the list contains EmpA 25 times, I would like to get 25. I am trying GroupBy but not getting results. I could skip through the records and count them, but there are a lot of records.
So in the example below, lineEmpNrs is the list, and I want the results grouped by employee ID.
Please suggest.
public static IEnumerable<string> ReadLines(StreamReader input)
{
string line;
while ( (line = input.ReadLine()) != null)
yield return line;
}
private taMibMsftEmpDetails BuildLine(string EmpId, string EmpName, String ExpnsDate)
{
taMibMsftEmpDetails empSlNr = new taMibMsftEmpDetails();
empSlNr.EmployeeId = EmpId;
empSlNr.EmployeeName = EmpName;
empSlNr.ExpenseDate = ExpnsDate;
return empSlNr;
}
List<taMibMsftEmpDetails> lineEmpNrs = new List<taMibMsftEmpDetails>();
foreach (string line in ReadLines(HeaderFile))
{
headerFields = line.Split(',');
lineEmpNrs.Add(BuildLine(headerFields[1],headerFields[2],headerFields[3]));
}
You can define the following delegate, which you will use to select the grouping key from list elements. It matches any method that accepts one argument and returns some value (the key):
public delegate TResult Func<T, TResult>(T arg);
And the following generic method, which converts any list to a dictionary of grouped items:
public static Dictionary<TKey, List<T>> ToDictionary<T, TKey>(
List<T> source, Func<T, TKey> keySelector)
{
Dictionary<TKey, List<T>> result = new Dictionary<TKey, List<T>>();
foreach (T item in source)
{
TKey key = keySelector(item);
if (!result.ContainsKey(key))
result[key] = new List<T>();
result[key].Add(item);
}
return result;
}
Now you will be able to group any list into dictionary by any property of list items:
List<taMibMsftEmpDetails> lineEmpNrs = new List<taMibMsftEmpDetails>();
// we are grouping by EmployeeId here
// EmployeeId is a string in the question's BuildLine, so the key type is string
Func<taMibMsftEmpDetails, string> keySelector =
delegate(taMibMsftEmpDetails emp) { return emp.EmployeeId; };
Dictionary<string, List<taMibMsftEmpDetails>> groupedEmployees =
ToDictionary(lineEmpNrs, keySelector);
GroupBy should work if you use it like this:
var foo = lineEmpNrs.GroupBy(e => e.EmployeeId);
And if you'd want to get an enumerable with all the employees of the specified ID:
var list = lineEmpNrs.Where(e => e.EmployeeId == "1"); // Or whatever employee ID you want to match
Combining the two should get you the results you're after.
If you wanted to see how many records there were with each employee, you can use GroupBy as:
foreach (var g in lineEmpNrs.GroupBy(e => e.EmployeeId))
{
Console.WriteLine("{0} records with Id '{1}'", g.Count(), g.Key);
}
To simply find out how many records there are for a specified Id, however, it may be simpler to use Where instead:
Console.WriteLine("{0} records with Id '{1}'", lineEmpNrs.Where(e => e.EmployeeId == id).Count(), id);
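If the counts for all employees are needed at once, GroupBy can feed a dictionary directly. A minimal sketch, in which EmpDetails is a hypothetical stand-in for taMibMsftEmpDetails with a string EmployeeId:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for taMibMsftEmpDetails.
public class EmpDetails
{
    public string EmployeeId { get; set; }
    public string EmployeeName { get; set; }
    public string ExpenseDate { get; set; }
}

public static class EmployeeCounts
{
    // Maps each EmployeeId to the number of rows that carry it.
    public static Dictionary<string, int> CountById(IEnumerable<EmpDetails> rows)
    {
        return rows.GroupBy(e => e.EmployeeId)
                   .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        var rows = new List<EmpDetails>
        {
            new EmpDetails { EmployeeId = "EmpA", EmployeeName = "Ann", ExpenseDate = "2020-01-01" },
            new EmpDetails { EmployeeId = "EmpB", EmployeeName = "Bob", ExpenseDate = "2020-01-02" },
            new EmpDetails { EmployeeId = "EmpA", EmployeeName = "Ann", ExpenseDate = "2020-01-03" }
        };

        // Sort by key only to make the printed order predictable.
        foreach (var pair in CountById(rows).OrderBy(p => p.Key, StringComparer.Ordinal))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```

A single lookup such as counts["EmpA"] then answers "how many times does this employee appear" without rescanning the list.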

Re-use parts of a linq query

We have a program that uses Linq-To-Sql and does a lot of similar queries on our tables. In particular, we have a timestamp column, and we pull the most recent timestamp to see if the table has changed since our last query.
System.Data.Linq.Binary ts;
ts = context.Documents
.Select(r => r.m_Timestamp)
.OrderByDescending(r => r)
.FirstOrDefault();
We repeat this query often on whichever tables are relevant for the current form or query. What I would like to do is create a "function" or something that can repeat the last 3 lines of this query (and ideally would work on every table). Here's what I would like to write:
System.Data.Linq.Binary ts;
ts = context.Documents
.GetMostRecentTimestamp();
But I have no idea how to create this "GetMostRecentTimestamp". Also, these queries are never this simple. They usually filter by the Customer, or by the current order, so a more valid query might be
ts = context.Documents
.Where(r => r.CustomerID == ThisCustomerID)
.GetMostRecentTimestamp();
Any help? Thanks!
Update [Solution]
I selected Bala R's answer, here's the code updated so it compiles:
public static System.Data.Linq.Binary GetMostRecentTimestamp(this IQueryable<Data.Document> docs)
{
return docs
.Select(r => r.m_Timestamp)
.OrderByDescending(r => r)
.FirstOrDefault();
}
The only drawback to this solution is that I will have to write this function for each table. I would have loved Tejs's answer, if it actually worked, but I'm not re-designing my database for it. Plus DateTime is not a good way to do timestamps.
Update #2 (Not so fast)
While I can do a query such as Documents.Where( ... ).GetMostRecentTimestamp(), this solution fails if I try to do an association based query such as MyCustomer.Orders.GetMostRecentTimestamp(), or MyCustomer.Orders.AsQueryable().GetMostRecentTimestamp();
This is actually pretty easy to do. You simply need to define an interface and implement it on the entities you wish to provide this for:
public interface ITimestamp { DateTime m_Timestamp { get; } }
public class MyEntity : ITimestamp
Then, your extension method:
public static DateTime GetMostRecentTimestamp<T>(this IQueryable<T> queryable)
where T : ITimestamp
{
return queryable.Select(x => x.m_Timestamp)
.OrderByDescending(r => r)
.FirstOrDefault();
}
This is then useful on any entity that matches the interface:
context.Documents.GetMostRecentTimestamp()
context.SomeOtherEntity.GetMostRecentTimestamp()
How about an extension like this
public static DateTime GetMostRecentTimestamp (this IQueryable<Document> docs)
{
return docs.Select(r => r.m_Timestamp)
.OrderByDescending(r => r)
.FirstOrDefault();
}
Hmm...
DateTime timeStamp1 = dataContext.Customers.Max(c => c.TimeStamp);
DateTime timeStamp2 = dataContext.Orders.Max(c => c.TimeStamp);
DateTime timeStamp3 = dataContext.Details.Max(c => c.TimeStamp);
I created a pair of extension methods that could help you out: ObjectWithMin and ObjectWithMax:
public static T ObjectWithMax<T, TResult>(this IEnumerable<T> elements, Func<T, TResult> projection)
where TResult : IComparable<TResult>
{
if (elements == null) throw new ArgumentNullException("elements", "Sequence is null.");
if (!elements.Any()) throw new ArgumentException("Sequence contains no elements.");
var seed = elements.Select(t => new {Object = t, Projection = projection(t)}).First();
return elements.Aggregate(seed,
(s, x) =>
projection(x).CompareTo(s.Projection) >= 0
? new {Object = x, Projection = projection(x)}
: s
).Object;
}
public static T ObjectWithMin<T, TResult>(this IEnumerable<T> elements, Func<T, TResult> projection)
where TResult : IComparable<TResult>
{
if (elements == null) throw new ArgumentNullException("elements", "Sequence is null.");
if (!elements.Any()) throw new ArgumentException("Sequence contains no elements.");
var seed = elements.Select(t => new {Object = t, Projection = projection(t)}).First();
return elements.Aggregate(seed,
(s, x) =>
projection(x).CompareTo(s.Projection) < 0
? new {Object = x, Projection = projection(x)}
: s
).Object;
}
These two are more efficient than OrderBy().FirstOrDefault() when used on in-memory collections; however they're unparseable by IQueryable providers. You'd use them something like this:
ts = context.Documents
.Where(r => r.CustomerID == ThisCustomerID)
.ObjectWithMax(r=>r.m_Timestamp);
ts is now the object having the most recent timestamp, so you don't need a second query to get it.
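Another way to avoid writing one function per table, without an interface, is to pass the timestamp column in as an expression. The sketch below only demonstrates this against an in-memory list via AsQueryable; whether a given provider such as LINQ to SQL translates it has not been verified here. The Document class, its long Version column, and the GetMostRecent name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity; Version stands in for the real timestamp column.
public class Document
{
    public int CustomerID { get; set; }
    public long Version { get; set; }
}

public static class TimestampExtensions
{
    // Projects each row to the selected column, sorts descending, and takes the first value.
    // Returns default(TKey) when the source is empty.
    public static TKey GetMostRecent<T, TKey>(
        this IQueryable<T> source,
        Expression<Func<T, TKey>> timestampSelector)
    {
        return source.Select(timestampSelector)
                     .OrderByDescending(ts => ts)
                     .FirstOrDefault();
    }
}

public static class TimestampDemo
{
    public static void Main()
    {
        IQueryable<Document> documents = new List<Document>
        {
            new Document { CustomerID = 7, Version = 10 },
            new Document { CustomerID = 7, Version = 42 },
            new Document { CustomerID = 8, Version = 99 }
        }.AsQueryable();

        long latest = documents
            .Where(d => d.CustomerID == 7)
            .GetMostRecent(d => d.Version);

        Console.WriteLine(latest); // prints "42"
    }
}
```

Because the selector is an expression tree rather than a compiled delegate, the whole chain stays an IQueryable query up to FirstOrDefault.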
