Combine two lists into one based on property - c#

I would like to ask whether there's an elegant and efficient way to merge two lists of MyClass into one?
MyClass looks like this:
ID: int
Name: string
ExtID: int?
The lists are populated from different sources, and objects in the two lists share IDs, so it looks like this:
MyClass instance from List1
ID = someInt
Name = someString
ExtID = null
And MyClass instance from List2
ID = someInt (same as List1)
Name = someString (same as List1)
ExtID = someInt
What I basically need is to combine these two lists, so the outcome is a list containing:
ID = someInt (from List1)
Name = someString (from List1)
ExtID = someInt (null if no corresponding item - based on ID - on List2)
I know I can do this with a simple foreach loop, but I'd love to know if there's a more elegant and maybe preferable (for performance or readability) method.

There are many approaches, depending on your priority. For example, Union + ToLookup:
//this will create key-value pairs: id -> matching instances
var idMap = list1.Union(list2).ToLookup(myClass => myClass.ID);
//now just select for each ID the instance you want, e.g. the one with a value
var mergedInstances = idMap.Select(row =>
row.FirstOrDefault(myClass => myClass.ExtID.HasValue) ?? row.First());
The benefit of the above is that it works with any number of lists, even if they contain many duplicated instances, and you can easily modify the merge conditions.
A small improvement would be to extract a method to merge instances:
MyClass MergeInstances(IEnumerable<MyClass> instances){
return instances.FirstOrDefault(myClass => myClass.ExtID.HasValue)
?? instances.First(); //or whatever else you imagine
}
and now just use it in the code above:
var mergedInstances = idMap.Select(MergeInstances);
Clean, flexible, simple, no additional conditions. Performance-wise not perfect, but who cares.
Edit: since performance is the priority, some more options:
Build a lookup as above, but only for the smaller list. Then iterate through the bigger list and make the needed changes. Because the lookup is hash-based, this is O(m) + O(n), where m is the size of the smaller list and n is the size of the bigger one. This should be the fastest.
Sort both lists by ID. Then walk both with a single loop, keeping an index into each list. When both indexes point at the same ID, merge and advance both; otherwise advance only the index that points at the smaller ID. O(n log n) + O(m log m) + O(n + m).
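The first option could be sketched roughly as below. This is only an illustration under some assumptions: MyClass has settable ID, Name and ExtID properties as described in the question, IDs are unique within list2, and list2 is the side that gets indexed. The names ListMerger and MergeById are made up for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MyClass
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int? ExtID { get; set; }
}

public static class ListMerger
{
    // Hypothetical helper: keeps ID and Name from list1 and takes ExtID
    // from the item in list2 that has the same ID.
    public static List<MyClass> MergeById(IEnumerable<MyClass> list1, IEnumerable<MyClass> list2)
    {
        // Assumption: IDs are unique within list2 (ToDictionary throws on duplicate keys).
        Dictionary<int, int?> extIdById = list2.ToDictionary(x => x.ID, x => x.ExtID);

        return list1
            .Select(x => new MyClass
            {
                ID = x.ID,
                Name = x.Name,
                // When list2 has no item with this ID, fall back to list1's own
                // ExtID, which is null in the question's scenario.
                ExtID = extIdById.TryGetValue(x.ID, out int? extId) ? extId : x.ExtID
            })
            .ToList();
    }
}
```

Building the dictionary is one pass over list2 and the Select is one pass over list1, so each item is touched once apart from the hash lookups.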

Is this what you want?
var joined = from Item1 in list1
join Item2 in list2
on Item1.ID equals Item2.ID // join on some property
select new MyClass(Item1.ID, Item1.Name, Item1.ExtID ?? Item2.ExtID);
Edit: If you're looking for an outer join,
var query = from Item1 in list1
join Item2 in list2 on Item1.ID equals Item2.ID into gj
from sublist2 in gj.DefaultIfEmpty()
select new MyClass(Item1.ID, Item1.Name, sublist2?.ExtID);
Readability-wise, using a foreach loop is not too bad an idea.

I'd suggest putting the foreach loop in a method of that class, so every time you need to do this you'd use something like
instanceList1.MergeLists(instanceList2)
and within this method you could control everything you want in the merge operation.

Related

How to operate on single items at an index in a group in IGrouping?

I have simplified groupby code below, where I know there exist at most two records for each group, with each group being grouped by the value at index two in the string array. I want to iterate through the list of keys in the IGrouping, combine some values in each group, and then add that result to a final list, but I am new to LINQ so I don't exactly know how to access the first and/or second values at an index.
Can anyone shed some light on how to do this?
each Group derived from var lines is something like this:
key string[]
---- -------------
123 A, stuff, stuff
123 B, stuff, stuff
and I want the result to be a string[] that combines elements of each group in the "final" list like:
string[]
-------
A, B
my code:
var lines = File.ReadAllLines(path).Skip(1).Select(r => r.Split('\t')).ToList();
List<string[]> final = new List<string[]>();
var groups = lines.GroupBy(r => r[2]);
foreach (var pairs in groups)
{
// I need to combine items in each group here; maybe a for() statement would be better so I can peek ahead??
foreach (string[] item in pairs)
{
string[] s = new string[] { };
s[0] = item[0];
s[1] = second item in group - not sure what to do here or if I am going about this the right way
final.Add(s);
}
}
There's not too much support on the subject either, so I figured it may be helpful to somebody.
It sounds like all you're missing is calling ToList or ToArray on the group:
foreach (var group in groups)
{
List<string[]> pairs = group.ToList();
// Now you can access pairs[0] for the first item in the group,
// pairs[1] for the second item, pairs.Count to check how many
// items there are, or whatever.
}
Or you could avoid creating a list, and call Count(), ElementAt(), ElementAtOrDefault() etc on the group.
Now depending on what you're actually doing in the body of your nested foreach loop (it's not clear, and the code you've given so far won't work because you're trying to assign a value into an empty array) you may be able to get away with:
var final = lines.GroupBy(r => r[2])
.Select(g => ...)
.ToList();
where the ... is "whatever you want to do with a group". If you can possibly do that, it would make the code a lot clearer.
With more information in the question, it looks like you want just:
var final = lines.GroupBy(r => r[2])
.Select(g => g.Select(array => array[0]).ToArray())
.ToList();

Finding the list of common objects between two lists

I have list of objects of a class for example:
class MyClass
{
public string id;
public string name;
public string lastname;
}
so for example: List<MyClass> myClassList;
and also I have list of string of some ids, so for example:
List<string> myIdList;
Now I am looking for a method that accepts these two as parameters and returns a List<MyClass> of the objects whose id appears in myIdList.
NOTE: Always the bigger list is myClassList and always myIdList is a smaller subset of that.
How can we find this intersection?
So you're looking to find all the elements in myClassList where myIdList contains the ID? That suggests:
var query = myClassList.Where(c => myIdList.Contains(c.id));
Note that if you could use a HashSet<string> instead of a List<string>, each Contains test will potentially be more efficient - certainly if your list of IDs grows large. (If the list of IDs is tiny, there may well be very little difference at all.)
It's important to consider the difference between a join and the above approach in the face of duplicate elements in either myClassList or myIdList. A join will yield every matching pair - the above will yield either 0 or 1 element per item in myClassList.
Which of those you want is up to you.
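To make the duplicate behaviour concrete, here is a small sketch. It assumes a MyClass with public string fields as in the question; DuplicateDemo, ViaContains and ViaJoin are names invented for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MyClass
{
    public string id;
    public string name;
    public string lastname;
}

public static class DuplicateDemo
{
    // Where + Contains: each element of classList is yielded at most once,
    // no matter how often its id appears in idList.
    public static List<MyClass> ViaContains(List<MyClass> classList, List<string> idList)
    {
        var idSet = new HashSet<string>(idList);
        return classList.Where(c => idSet.Contains(c.id)).ToList();
    }

    // Join: one result per matching (element, id) pair, so an id that
    // appears twice in idList yields the matching element twice.
    public static List<MyClass> ViaJoin(List<MyClass> classList, List<string> idList)
    {
        return classList.Join(idList, c => c.id, id => id, (c, id) => c).ToList();
    }
}
```

With one element whose id is "1" and an id list of "1", "1", ViaContains returns that element once and ViaJoin returns it twice.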
EDIT: If you're talking to a database, it would be best if you didn't use a List<T> for the entities in the first place - unless you need them for something else, it would be much more sensible to do the query in the database than fetching all the data and then performing the query locally.
That isn't strictly an intersection (unless the ids are unique), but you can simply use Contains, i.e.
var sublist = myClassList.Where(x => myIdList.Contains(x.id));
You will, however, get significantly better performance if you create a HashSet<T> first:
var hash = new HashSet<string>(myIdList);
var sublist = myClassList.Where(x => hash.Contains(x.id));
You can use a join between the two lists:
return myClassList.Join(
myIdList,
item => item.id,
id => id,
(item, id) => item)
.ToList();
It is a kind of intersection between two lists, so read it as "I want the items from one list that are present in the second list". Here the ToList() call executes the query immediately.
var lst = myClassList.Where(x => myIdList.Contains(x.id)).ToList();
You can use the code below:
var samedata = myClassList.Where(p => myIdList.Any(q => q == p.id));
myClassList.Where(x => myIdList.Contains(x.id));
Try
List<MyClass> GetMatchingObjects(List<MyClass> classList, List<string> idList)
{
return classList.Where(myClass => idList.Any(x => myClass.id == x)).ToList();
}
var q = myClassList.Where(x => myIdList.Contains(x.id));

Linq optimization of query and foreach

I return a List from a LINQ query, and afterwards I have to fill in its values with a for loop.
The problem is that it is too slow.
var formentries = (from f in db.bNetFormEntries
join s in db.bNetFormStatus on f.StatusID.Value equals s.StatusID into entryStatus
join s2 in db.bNetFormStatus on f.ExternalStatusID.Value equals s2.StatusID into entryStatus2
where f.FormID == formID
orderby f.FormEntryID descending
select new FormEntry
{
FormEntryID = f.FormEntryID,
FormID = f.FormID,
IPAddress = f.IpAddress,
UserAgent = f.UserAgent,
CreatedBy = f.CreatedBy,
CreatedDate = f.CreatedDate,
UpdatedBy = f.UpdatedBy,
UpdatedDate = f.UpdatedDate,
StatusID = f.StatusID,
StatusText = entryStatus.FirstOrDefault().Status,
ExternalStatusID = f.ExternalStatusID,
ExternalStatusText = entryStatus2.FirstOrDefault().Status
}).ToList();
and then I use the for in this way:
for(var x=0; x<formentries.Count(); x++)
{
var values = (from e in entryvalues
where e.FormEntryID.Equals(formentries.ElementAt(x).FormEntryID)
select e).ToList<FormEntryValue>();
formentries.ElementAt(x).Values = values;
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);
But it is definitely too slow.
Is there a way to make it faster?
it is definitely too slow. Is there a way to make it faster?
Maybe. Maybe not. But that's not the right question to ask. The right question is:
Why is it so slow?
It is a lot easier to figure out the answer to the first question if you have an answer to the second question! If the answer to the second question is "because the database is in Tokyo and I'm in Rome, and the fact that the packets move no faster than speed of light is the cause of my unacceptable slowdown", then the way you make it faster is you move to Japan; no amount of fixing the query is going to change the speed of light.
To figure out why it is so slow, get a profiler. Run the code through the profiler and use that to identify where you are spending most of your time. Then see if you can speed up that part.
From what I see, you are iterating through formentries two more times without reason: when you populate the values, and when you convert to a dictionary.
If entryvalues is database-driven, i.e. you get the values from the database, then put the value field population in the first query.
If it's not, then you do not need to invoke ToList() on the first query; do the loop and the Dictionary creation in one pass.
var formentries = from f in db.bNetFormEntries
join s in db.bNetFormStatus on f.StatusID.Value equals s.StatusID into entryStatus
join s2 in db.bNetFormStatus on f.ExternalStatusID.Value equals s2.StatusID into entryStatus2
where f.FormID == formID
orderby f.FormEntryID descending
select new FormEntry
{
FormEntryID = f.FormEntryID,
FormID = f.FormID,
IPAddress = f.IpAddress,
UserAgent = f.UserAgent,
CreatedBy = f.CreatedBy,
CreatedDate = f.CreatedDate,
UpdatedBy = f.UpdatedBy,
UpdatedDate = f.UpdatedDate,
StatusID = f.StatusID,
StatusText = entryStatus.FirstOrDefault().Status,
ExternalStatusID = f.ExternalStatusID,
ExternalStatusText = entryStatus2.FirstOrDefault().Status
};
var formEntryDictionary = new Dictionary<int, FormEntry>();
foreach (var formEntry in formentries)
{
formEntry.Values = GetValuesForFormEntry(formEntry, entryvalues);
formEntryDictionary.Add(formEntry.FormEntryID, formEntry);
}
return formEntryDictionary;
And the values preparation:
private IList<FormEntryValue> GetValuesForFormEntry(FormEntry formEntry, IEnumerable<FormEntryValue> entryValues)
{
return (from e in entryValues
where e.FormEntryID.Equals(formEntry.FormEntryID)
select e).ToList<FormEntryValue>();
}
You can change the private method to accept only the entry ID instead of the whole formEntry if you wish.
It's slow because you're doing O(N*M) work, where N is formentries.Count and M is entryvalues.Count. Even in a simple test it was more than 20 times slower with only 1000 elements, and my type only had an int id field; with 10000 elements in the list it was over 1600 times slower than the code below!
Assuming your entryvalues is a local list and not hitting a database (just .ToList() it to a new variable somewhere if that's the case), and assuming FormEntryID is unique within formentries (which it seems to be from the .ToDictionary call), try this instead:
var entryvaluesLookup = entryvalues.ToLookup(entry => entry.FormEntryID);
for(var x=0; x<formentries.Count; x++)
{
formentries[x].Values = entryvaluesLookup[formentries[x].FormEntryID].ToList();
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);
It should go a long way to making it at least scale better.
Changes: .Count instead of .Count(), just because it's better not to call an extension method when you don't need to. Using a lookup keyed by FormEntryID to find the values, rather than doing a where for every x value in the for loop, effectively removes the M from the big O.
If this isn't entirely correct I'm sure you can change whatever is missing to suit your use case. But as an aside, you should really consider using camel case for your variable names: formEntries versus formentries, one is just that little bit easier to read.
There are some reasons why this might be slow, regarding the way you use formentries.
The formentries List<T> from above has a Count property, but you are calling the enumerable Count() extension method instead. In LINQ to Objects, Count() checks whether the source implements ICollection<T> and defers to its Count property, so on a List<T> the overhead is small, but the property states the intent more directly.
Similarly, the formentries.ElementAt(x) expression is used twice. ElementAt has a comparable shortcut for sources that implement IList<T>; for any other sequence, LINQ has to walk it redundantly to reach the xth item.
The above evaluation may miss the real problem, which you'll only really know if you profile. However, you can avoid the above while making your code significantly easier to read if you switch how you iterate the collection of formentries as follows:
foreach(var fe in formentries)
{
fe.Values = entryvalues
.Where(e => e.FormEntryID.Equals(fe.FormEntryID))
.ToList<FormEntryValue>();
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);
You may have resorted to the for(var x=...) ...ElementAt(x) approach because you thought you could not modify properties on the object referenced by the foreach loop variable fe.
That said, another point that could be an issue is if formentries has multiple items with the same FormEntryID. This would result in the same work being done multiple times inside the loop. While the top query appears to be against a database, you can still do joins with data in linq-to-object land. Happy optimizing/profiling/coding - let us know what works for you.

LINQ group items. A single item may be in several groups

I have an IEnumerable of items that I would like to group by associated categories. The items are grouped by the categories that are associated with them - which is a List - so a single item can potentially be a part of multiple categories.
var categories = numbers.SelectMany(x => x.Categories).Distinct();
var query =
from cat in categories
select new {Key = cat,
Values = numbers.Where(n => n.Categories.Contains(cat))};
I use the above code, and it does in fact work, but I was wondering if there was a more efficient way of doing this because this operation will likely perform slowly when numbers contains thousands of values.
I am pretty much asking for a refactoring of the code to be more efficient.
You can use LINQ's built-in grouping capabilities, which should be faster than a contains lookup. However, as with any performance-related question, you should really write code to collect performance metrics before deciding how to rewrite code that you know works. It may turn out that there's no performance problem at all for the volumes you will be working with.
So, here's the code. This isn't tested, but something like it should work:
var result = from n in numbers
from c in n.Categories
select new {Key = c, n.Value}
into x group x by x.Key into g
select g;
Each group contains a key and a sequence of values that belong to that key:
foreach( var group in result )
{
Console.WriteLine( group.Key );
foreach( var value in group )
Console.WriteLine( value );
}
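For reference, the same idea in method syntax, as a self-contained sketch. The Item type (with Value and Categories), the CategoryGrouping class and the GroupByCategory method are assumptions made for this example, since the question does not show the item type.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type: a value plus the categories it belongs to.
public class Item
{
    public int Value { get; set; }
    public List<string> Categories { get; set; }
}

public static class CategoryGrouping
{
    // Flattens every item into (category, value) pairs, then groups the
    // pairs by category, so an item with several categories lands in several groups.
    public static Dictionary<string, List<int>> GroupByCategory(IEnumerable<Item> numbers)
    {
        return numbers
            .SelectMany(n => n.Categories, (n, c) => new { Category = c, n.Value })
            .GroupBy(x => x.Category, x => x.Value)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

This makes a single pass over the flattened pairs instead of scanning numbers once per category.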

Linq to update a collection with values from another collection?

I have IQueryable<someClass> baseList
and List<someOtherClass> someData
What I want to do is update attributes in some items in baseList.
For every item in someData, I want to find the corresponding item in baselist and update a property of the item.
someOtherClass.someCode == baseList.myCode
Can I do some type of join with LINQ and set baseList.someData += someOtherClass.DataIWantToConcantenate?
I could probably do this by iteration, but is there a fancy Linq way I can do this in just a couple lines of code?
Thanks for any tips,
~ck in San Diego
To pair elements in the two lists you can use a LINQ join:
var pairs = from d in someData
join b in baseList.AsEnumerable()
on d.someCode equals b.myCode
select new { b, d };
This will give you an enumeration of each item in someData paired with its counterpart in baseList. From there, you can concatenate in a loop:
foreach(var pair in pairs)
pair.b.SomeData += pair.d.DataIWantToConcantenate;
If you really meant set concatenation rather than +=, take a look at LINQ's Union, Intersect or Except methods.
LINQ is for querying - not for updating. That means it'll be fine to use LINQ to find the corresponding item, but for the modification you should be using iteration.
Admittedly you might want to perform some appropriate query to get baseList into an efficient form first - e.g. a Dictionary<string, SomeClass> based on the property you'll be using to find the corresponding item.
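A minimal sketch of that combination, assuming string codes that are unique within baseList. The property names follow the question loosely, and Updater and AppendData are names made up for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public class SomeClass
{
    public string MyCode { get; set; }
    public string SomeData { get; set; }
}

public class SomeOtherClass
{
    public string SomeCode { get; set; }
    public string DataIWantToConcatenate { get; set; }
}

public static class Updater
{
    // Hypothetical helper: query with LINQ, update with a plain loop.
    public static void AppendData(IEnumerable<SomeClass> baseList, IEnumerable<SomeOtherClass> someData)
    {
        // Query step: index the base items by code once.
        // Assumption: codes are unique (ToDictionary throws on duplicate keys).
        Dictionary<string, SomeClass> byCode = baseList.ToDictionary(b => b.MyCode);

        // Update step: plain iteration with one hash lookup per item.
        foreach (SomeOtherClass d in someData)
        {
            if (byCode.TryGetValue(d.SomeCode, out SomeClass match))
            {
                match.SomeData += d.DataIWantToConcatenate;
            }
        }
    }
}
```

If baseList is an IQueryable<T>, ToDictionary enumerates it once, and the updates apply to the objects materialized by that enumeration.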
You can convert the IQueryable<SomeClass> into a List<SomeClass>, use the ForEach method to loop over it and update the elements, then convert back to IQueryable:
List<SomeClass> convertedList = baseList.ToList();
convertedList.ForEach(sc =>
{
SomeOtherClass oc = someData.FirstOrDefault(obj => obj.SomeCode == sc.MyCode);
if (oc != null)
{
sc.SomeData += oc.DataIWantToConcatenate;
}
});
baseList = convertedList.AsQueryable(); // back to IQueryable
But it may be more efficient to do this using non-LINQ constructs.
As mentioned before, it should be a combination of loop and LINQ
foreach (var someDataItem in someData)
{
someDataItem.PropertyToUpdate = (baseList.FirstOrDefault(baseListItem => baseListItem.key == someDataItem.key) ?? new SomeClass(){OtherProperty = "OptionalDefaultValue"}).OtherProperty;
}
You can't simply find objects that are in one list but not the other, because they are two different types. I'll assume you're comparing a property called OtherProperty that is common to the two different classes, and shares the same type. In that case, using nothing but Linq queries:
// update those items that match by creating a new item with an
// updated property
var updated =
from d in data
join b in baseList on d.OtherProperty equals b.OtherProperty
select new MyType()
{
PropertyToUpdate = d.PropertyToUpdate,
OtherProperty = d.OtherProperty
};
// and now add to that all the items in baseList that weren't found in data
var result =
(from b in baseList
where !updated.Select(x => x.OtherProperty).Contains(b.OtherProperty)
select b).Concat(updated);
