LINQ Combine Queries - c#

I have two collections of objects of different types. Let's call them type ALPHA and type BRAVO. Each of these types has a property that is the "ID" for the object. No ID is duplicated within a class, so for any given ID, there is at most one ALPHA and one BRAVO instance. What I need to do is divide them into 3 categories:
Instances of the ID in ALPHA which do not appear in the BRAVO collection;
Instances of the ID in BRAVO which do not appear in the ALPHA collection;
Instances of the ID which appear in both collections.
In all 3 cases, I need to have the actual objects from the collections at hand for subsequent manipulation.
I know for the #3 case, I can do something like:
var myCorrelatedItems = myAlphaItems.Join(myBravoItems, alpha => alpha.Id, beta => beta.Id, (inner, outer) => new
{
alpha = inner,
beta = outer
});
I can also write code for the #1 and #2 cases which looks something like:
var myUnmatchedAlphas = myAlphaItems.Where(alpha=>!myBravoItems.Any(bravo=>alpha.Id==bravo.Id));
And similarly for unMatchedBravos. Unfortunately, this would result in iterating the collection of alphas (which may be very large!) many times, and the collection of bravos (which may also be very large!) many times as well.
Is there any way to unify these query concepts so as to minimize iteration over the lists? These collections can have thousands of items.

If you are only interested in the IDs,
var alphaIds = myAlphaItems.Select(alpha => alpha.ID);
var bravoIds = myBravoItems.Select(bravo => bravo.ID);
var alphaIdsNotInBravo = alphaIds.Except(bravoIds);
var bravoIdsNotInAlpha = bravoIds.Except(alphaIds);
If you want the alphas and bravos themselves,
var alphaIdsSet = new HashSet<int>(alphaIds);
var bravoIdsSet = new HashSet<int>(bravoIds);
var alphasNotInBravo = myAlphaItems
.Where(alpha => !bravoIdsSet.Contains(alpha.ID));
var bravosNotInAlpha = myBravoItems
.Where(bravo => !alphaIdsSet.Contains(bravo.ID));
EDIT:
A few other options:
The ExceptBy method from MoreLinq.
The Enumerable.ToDictionary method.
If both types inherit from a common type (e.g. an IHasId interface), you could write your own IEqualityComparer<T> implementation; Enumerable.Except has an overload that accepts an equality-comparer as a parameter.
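
As a minimal sketch, the HashSet filtering above can be extended to cover the "in both" case from the question as well. The `Alpha` and `Bravo` classes here are hypothetical stand-ins with an `int Id` property (the question does not show the real types), and `SetPartition.Partition` is an illustrative name, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types; only Id matters here.
public class Alpha { public int Id { get; set; } }
public class Bravo { public int Id { get; set; } }

public static class SetPartition
{
    // Builds one HashSet of IDs per collection, then filters each collection
    // with constant-time Contains checks instead of nested Any() scans.
    public static (List<Alpha> AlphaOnly, List<Bravo> BravoOnly, List<(Alpha A, Bravo B)> Both)
        Partition(IReadOnlyCollection<Alpha> alphas, IReadOnlyCollection<Bravo> bravos)
    {
        var alphaIds = new HashSet<int>(alphas.Select(a => a.Id));
        var bravoIds = new HashSet<int>(bravos.Select(b => b.Id));

        var alphaOnly = alphas.Where(a => !bravoIds.Contains(a.Id)).ToList();
        var bravoOnly = bravos.Where(b => !alphaIds.Contains(b.Id)).ToList();

        // Enumerable.Join is hash-based too, so the matched pairs cost one more pass.
        var both = alphas.Join(bravos, a => a.Id, b => b.Id, (a, b) => (A: a, B: b)).ToList();

        return (alphaOnly, bravoOnly, both);
    }

    public static void Main()
    {
        var alphas = new[] { 1, 2, 3 }.Select(i => new Alpha { Id = i }).ToList();
        var bravos = new[] { 2, 3, 4 }.Select(i => new Bravo { Id = i }).ToList();
        var result = Partition(alphas, bravos);

        Console.WriteLine(string.Join(",", result.AlphaOnly.Select(a => a.Id))); // 1
        Console.WriteLine(string.Join(",", result.BravoOnly.Select(b => b.Id))); // 4
        Console.WriteLine(string.Join(",", result.Both.Select(p => p.A.Id)));    // 2,3
    }
}
```

Each collection is enumerated a small, fixed number of times, regardless of its size.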

Sometimes LINQ is not the answer. This is the kind of problem where I would consider using a HashSet<T> with a custom comparer to reduce the work of performing set operations. HashSets are much more efficient at performing set operations than lists - and (depending on the data) can reduce the work considerably:
// create a wrapper class that can accommodate either an Alpha or a Bravo
class ABItem {
public object Instance { get; private set; }
public int Id { get; private set; }
public ABItem( Alpha a ) { Instance = a; Id = a.Id; }
public ABItem( Bravo b ) { Instance = b; Id = b.Id; }
}
// equality comparer that treats two ABItems as equal when their IDs match
// (HashSet<T> takes an IEqualityComparer<T>, not an IComparer)
class ABItemComparer : IEqualityComparer<ABItem> {
public bool Equals( ABItem x, ABItem y ) {
return x.Id == y.Id;
}
public int GetHashCode( ABItem item ) {
return item.Id.GetHashCode();
}
}
// create a comparer based on comparing the IDs of ABItems
var comparer = new ABItemComparer();
var hashAlphas =
new HashSet<ABItem>(myAlphaItems.Select(x => new ABItem(x)), comparer);
var hashBravos =
new HashSet<ABItem>(myBravoItems.Select(x => new ABItem(x)), comparer);
// items with common IDs in Alpha and Bravo sets
// (IntersectWith returns void and modifies the set in place, so work on a copy;
// the copy holds the Alpha-side wrappers for the shared IDs):
var hashCommon = new HashSet<ABItem>(hashAlphas, comparer);
hashCommon.IntersectWith( hashBravos );
hashAlphas.ExceptWith( hashCommon ); // items only in Alpha
hashBravos.ExceptWith( hashCommon ); // items only in Bravo

Dictionary<int, Alpha> alphaDictionary = myAlphaItems.ToDictionary(a => a.Id);
Dictionary<int, Bravo> bravoDictionary = myBravoItems.ToDictionary(b => b.Id);
ILookup<string, int> keyLookup = alphaDictionary.Keys
.Union(bravoDictionary.Keys)
.ToLookup(x => alphaDictionary.ContainsKey(x) ?
(bravoDictionary.ContainsKey(x) ? "both" : "alpha") :
"bravo");
List<Alpha> alphaBoth = keyLookup["both"].Select(x => alphaDictionary[x]).ToList();
List<Bravo> bravoBoth = keyLookup["both"].Select(x => bravoDictionary[x]).ToList();
List<Alpha> alphaOnly = keyLookup["alpha"].Select(x => alphaDictionary[x]).ToList();
List<Bravo> bravoOnly = keyLookup["bravo"].Select(x => bravoDictionary[x]).ToList();

Here is one possible LINQ solution that performs a full outer join on both sets and appends a property to them showing which group they belong to. This solution might lose its luster, however, when you try to separate the groups into different variables. It all really depends on what kind of actions you need to perform on these objects. At any rate this ran at (I thought) an acceptable speed (.5 seconds) for me on lists of 5000 items:
var q =
from g in
(from id in myAlphaItems.Select(a => a.ID).Union(myBravoItems.Select(b => b.ID))
join a in myAlphaItems on id equals a.ID into ja
from a in ja.DefaultIfEmpty()
join b in myBravoItems on id equals b.ID into jb
from b in jb.DefaultIfEmpty()
select (a == null ?
new { ID = b.ID, Group = "Bravo Only" } :
(b == null ?
new { ID = a.ID, Group = "Alpha Only" } :
new { ID = a.ID, Group = "Both" }
)
)
)
group g.ID by g.Group;
You can remove the 'group by' query or create a dictionary from this (q.ToDictionary(x => x.Key, x => x.Select(y => y))), or whatever! This is simply a way of categorizing your items. I'm sure there are better solutions out there, but this seemed like a truly interesting question so I thought I might as well give it a shot!

I think LINQ is not the best answer to this problem if you want to traverse and compare the minimum number of times. I think the following iterative solution performs better, and I believe code readability doesn't suffer.
var dictUnmatchedAlphas = myAlphaItems.ToDictionary(a => a.Id);
var myCorrelatedItems = new List<AlphaAndBravo>();
var myUnmatchedBravos = new List<Bravo>();
foreach (Bravo b in myBravoItems)
{
var id = b.Id;
if (dictUnmatchedAlphas.ContainsKey(id))
{
var a = dictUnmatchedAlphas[id];
dictUnmatchedAlphas.Remove(id); //to get just the unmatched alphas
myCorrelatedItems.Add(new AlphaAndBravo { a = a, b = b});
}
else
{
myUnmatchedBravos.Add(b);
}
}
Definition of AlphaAndBravo:
public class AlphaAndBravo {
public Alpha a { get; set; }
public Bravo b { get; set; }
}
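
A self-contained variant of the loop above, as a sketch: it uses `TryGetValue` so each ID is looked up only once, and it makes explicit that the alphas left in the dictionary after the loop are the unmatched ones. `Alpha` and `Bravo` are hypothetical stand-ins with an `int Id`, a tuple replaces `AlphaAndBravo`, and `SinglePassPartition` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types.
public class Alpha { public int Id { get; set; } }
public class Bravo { public int Id { get; set; } }

public static class SinglePassPartition
{
    public static (List<Alpha> UnmatchedAlphas, List<Bravo> UnmatchedBravos, List<(Alpha A, Bravo B)> Correlated)
        Partition(IEnumerable<Alpha> alphas, IEnumerable<Bravo> bravos)
    {
        // One pass over the alphas; ToDictionary throws if an Id repeats,
        // which matches the question's "no duplicated ID" guarantee.
        var unmatchedAlphas = alphas.ToDictionary(a => a.Id);
        var unmatchedBravos = new List<Bravo>();
        var correlated = new List<(Alpha A, Bravo B)>();

        // One pass over the bravos.
        foreach (var b in bravos)
        {
            if (unmatchedAlphas.TryGetValue(b.Id, out var a))
            {
                unmatchedAlphas.Remove(b.Id); // matched, so no longer "unmatched"
                correlated.Add((a, b));
            }
            else
            {
                unmatchedBravos.Add(b);
            }
        }

        // Whatever is still in the dictionary never found a partner.
        return (unmatchedAlphas.Values.ToList(), unmatchedBravos, correlated);
    }

    public static void Main()
    {
        var alphas = new[] { 1, 2, 3 }.Select(i => new Alpha { Id = i });
        var bravos = new[] { 2, 3, 4 }.Select(i => new Bravo { Id = i });
        var r = Partition(alphas, bravos);

        Console.WriteLine(string.Join(",", r.UnmatchedAlphas.Select(a => a.Id))); // 1
        Console.WriteLine(string.Join(",", r.UnmatchedBravos.Select(b => b.Id))); // 4
        Console.WriteLine(string.Join(",", r.Correlated.Select(p => p.A.Id)));    // 2,3
    }
}
```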

Related

Check if elements from one list elements present in another list

I have 2 C# classes -
class ABC
{
string LogId;
string Name;
}
class XYZ
{
string LogId;
string Name;
}
class Checker
{
public void comparelists()
{
List<ABC> lstABC =new List<ABC>();
lstABC.Add(new ABC...);
lstABC.Add(new ABC...);
lstABC.Add(new ABC...);
List<XYZ> lstXYZ =new List<XYZ>();
lstXYZ.Add(new XYZ...);
lstXYZ.Add(new XYZ...);
lstXYZ.Add(new XYZ...);
var commonLogId = lstABC
.Where(x => lstXYZ.All(y => y.LogId.Contains(x.LogId)))
.ToList();
}
}
As seen from the code, I want to fetch all logids from lstABC which are present in lstXYZ.
Eg. lstABC has ->
LogId="1", Name="somename1"
LogId="2", Name="somename2"
LogId="3", Name="somename3"
LogId="4", Name="somename4"
LogId="5", Name="somename5"
lstXYZ has ->
LogId="1", Name="somename11"
LogId="2", Name="somename22"
LogId="3", Name="somename33"
LogId="8", Name="somename8"
LogId="9", Name="somename9"
Then all logids from lstABC which are present in lstXYZ are - 1,2,3 ; so all those records are expected to get fetched.
But with below linq query -
var commonLogId = lstABC
.Where(x => lstXYZ.All(y => y.LogId.Contains(x.LogId)))
.ToList();
0 records are getting fetched/selected.
An approach with Any():
var res = lstABC.Where(x => (lstXYZ.Any(y => y.LogId == x.LogId))).Select(x => x.LogId);
https://dotnetfiddle.net/jRnUwS
Another approach would be Intersect(), which felt a bit more natural to me:
var res = lstABC.Select(x => x.LogId).Intersect(lstXYZ.Select(y => y.LogId));
https://dotnetfiddle.net/7iWYDO
You are using the wrong LINQ function. Try Any():
var commonLogId = lstABC
.Where(x => lstXYZ.Any(y => y.LogId == x.LogId))
.ToList();
Also note that the id comparison with Contains() was wrong. Just use == instead.
All() checks if all elements in a list satisfy the specified condition. Any() on the other hand only checks if at least one of the elements does.
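
To make the difference concrete, here is a small sketch that runs both operators over the LogId values from the question. Plain strings stand in for the `ABC` and `XYZ` objects, and the `AllVersusAny` class with its method names is illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AllVersusAny
{
    // Keeps an id only if EVERY id in "other" equals it.
    public static List<string> WithAll(IEnumerable<string> ids, IEnumerable<string> other) =>
        ids.Where(x => other.All(y => y == x)).ToList();

    // Keeps an id if AT LEAST ONE id in "other" equals it.
    public static List<string> WithAny(IEnumerable<string> ids, IEnumerable<string> other) =>
        ids.Where(x => other.Any(y => y == x)).ToList();

    public static void Main()
    {
        var abcIds = new[] { "1", "2", "3", "4", "5" };
        var xyzIds = new[] { "1", "2", "3", "8", "9" };

        Console.WriteLine(WithAll(abcIds, xyzIds).Count);             // 0
        Console.WriteLine(string.Join(",", WithAny(abcIds, xyzIds))); // 1,2,3
    }
}
```

With `All()`, no single id can equal all five different ids in the other list, which is why the original query returned nothing.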
Be aware that your implementation will be very slow when both lists are large, because its runtime grows quadratically with the number of elements to compare. A faster implementation would use Join(), which was created and optimized for exactly this purpose:
var commonLogIds = lstABC
.Join(
lstXYZ,
x => x.LogId, // Defines what to use as key in `lstABC`.
y => y.LogId, // Defines what to use as key in `lstXYZ`.
(x, y) => x) // Defines the output of matched pairs. Here
// we simply use the values of `lstABC`.
.ToList();
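
One caveat worth a sketch: `Join()` emits one result per matching pair, so if `lstXYZ` ever contained the same LogId twice, the matching `ABC` would appear twice in the output. Filtering against a `HashSet<string>` of the LogIds is also linear and returns each `ABC` at most once. The class and method names below are illustrative, and the duplicate "3" in the sample data is an assumption added to show the difference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ABC { public string LogId { get; set; } public string Name { get; set; } }
public class XYZ { public string LogId { get; set; } public string Name { get; set; } }

public static class CommonLogIds
{
    // One result per matching (ABC, XYZ) pair.
    public static List<ABC> ViaJoin(IEnumerable<ABC> abcs, IEnumerable<XYZ> xyzs) =>
        abcs.Join(xyzs, x => x.LogId, y => y.LogId, (x, y) => x).ToList();

    // Each ABC at most once, however often its LogId occurs in xyzs.
    public static List<ABC> ViaHashSet(IEnumerable<ABC> abcs, IEnumerable<XYZ> xyzs)
    {
        var xyzIds = new HashSet<string>(xyzs.Select(y => y.LogId));
        return abcs.Where(x => xyzIds.Contains(x.LogId)).ToList();
    }

    public static void Main()
    {
        var abcs = new[] { "1", "2", "3", "4", "5" }.Select(id => new ABC { LogId = id }).ToList();
        // "3" appears twice here on purpose.
        var xyzs = new[] { "1", "2", "3", "3", "8" }.Select(id => new XYZ { LogId = id }).ToList();

        Console.WriteLine(ViaJoin(abcs, xyzs).Count);    // 4
        Console.WriteLine(ViaHashSet(abcs, xyzs).Count); // 3
    }
}
```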
It seems pretty unnatural to intersect entirely different types, so I would be tempted to interface the commonality and write an EqualityComparer:
class ABC : ILogIdProvider
{
public string LogId {get;set;}
public string Name;
}
class XYZ : ILogIdProvider
{
public string LogId{get;set;}
public string Name;
}
interface ILogIdProvider
{
string LogId{get;}
}
class LogIdComparer : EqualityComparer<ILogIdProvider>
{
public override int GetHashCode(ILogIdProvider obj) => obj.LogId.GetHashCode();
public override bool Equals(ILogIdProvider x, ILogIdProvider y) => x.LogId == y.LogId;
}
Then you can Intersect the lists more naturally:
var res = lstABC.Intersect(lstXYZ, new LogIdComparer());
Live example: https://dotnetfiddle.net/0Tt6eu

Method that receives LINQ IGrouping as Parameter

Edit 3: Improved question wording and examples
I have the following linq query that uses grouping. The grouping and select operations are complex, so I abstracted one of the selects to a method that makes some choices on how to render the data.
My query works correctly inside the anonymous group definition, but as soon as I type it to a class in order to pass it to a method as an IGrouping object it stops grouping the results.
public class TestController : Controller
{
public JsonResult ThisWorks()
{
var valueList = DataMocker.GetTestValues();
var group = from v in valueList.AsEnumerable()
where (v.Data != 0)
group v by new
{
Year = v.Fecha.Value.Year,
Trimester = string.Empty,
Month = v.Fecha.Value.Month,
Day = 0,
}
into g
select new SeriesDataPoint
{
y = g.OrderByDescending(obd => obd.Fecha)
.Select(obd => obd.Data.Value)
.FirstOrDefault(),
color = "black",
month = g.Key.Month,
year = g.Key.Year,
seriesName = "Test Series",
};
return Json(group, JsonRequestBehavior.AllowGet);
}
public JsonResult ThisDoesnt()
{
var valueList = DataMocker.GetTestValues();
var group = from v in valueList.AsEnumerable()
where (v.Data != 0)
group v by new Models.SeriesResultGroup
{
Year = v.Fecha.Value.Year,
Trimester = string.Empty,
Month = v.Fecha.Value.Month,
Day = 0,
}
into g
select new SeriesDataPoint
{
y = RenderDataPoint(valueList, g),
color = "black",
month = g.Key.Month,
year = g.Key.Year,
seriesName = "Test Series",
};
return Json(group, JsonRequestBehavior.AllowGet);
}
public static decimal? RenderDataPoint(List<Models.ValoresResultSet> valores, IGrouping<Models.SeriesResultGroup, Models.ValoresResultSet> group)
{
return group.OrderByDescending(obd => obd.Fecha)
.Select(obd => obd.Data.Value)
.FirstOrDefault();
}
}
This is the correct output: https://dl.dropbox.com/u/9764/Thisworks.txt
This is the wrong output: https://dl.dropbox.com/u/9764/ThisDoesnt.txt
In the first case you group by an anonymous type generated by the compiler. The compiler also generates Equals and GetHashCode overrides for that type (you can check this with ildasm). An anonymous type's Equals runs the default equality comparer on each property. I think this was designed for cases like this one.
In the second case you group by your own class. Since it is a reference type that does not override Equals, the default equality comparer compares instances by reference. The query creates a new key object for every element, so every key is unique and the default equality check treats them all as different.
Solutions (choose either):
Override Equals and GetHashCode.
Make the type a struct instead of a class.
If you override Equals, don't forget to override GetHashCode as well.
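
A minimal sketch of the first option. The property types of `SeriesResultGroup` are assumptions inferred from the initializer in the question (the real class is not shown), and `GroupingDemo` is a hypothetical helper that only demonstrates the effect on `GroupBy`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the key class; the real one is not shown in the question.
public class SeriesResultGroup : IEquatable<SeriesResultGroup>
{
    public int Year { get; set; }
    public string Trimester { get; set; }
    public int Month { get; set; }
    public int Day { get; set; }

    // Value-based equality: two keys are equal when all four properties match.
    public bool Equals(SeriesResultGroup other) =>
        other != null
        && Year == other.Year
        && Trimester == other.Trimester
        && Month == other.Month
        && Day == other.Day;

    public override bool Equals(object obj) => Equals(obj as SeriesResultGroup);

    // Must agree with Equals: equal keys have to produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + Year;
            hash = hash * 31 + (Trimester?.GetHashCode() ?? 0);
            hash = hash * 31 + Month;
            hash = hash * 31 + Day;
            return hash;
        }
    }
}

public static class GroupingDemo
{
    // Groups dates by year and month using the custom key class.
    public static int CountGroups(IEnumerable<DateTime> dates) =>
        dates.GroupBy(d => new SeriesResultGroup
        {
            Year = d.Year, Trimester = string.Empty, Month = d.Month, Day = 0
        }).Count();

    public static void Main()
    {
        var dates = new[]
        {
            new DateTime(2012, 1, 5), new DateTime(2012, 1, 20), new DateTime(2012, 2, 1)
        };
        Console.WriteLine(CountGroups(dates)); // 2 (January and February)
    }
}
```

Without the two overrides, the same query would produce three groups, one per element. Because the hash code depends on mutable properties, a key should not be modified once it is in use.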

Is there any way to reduce duplication in these two linq queries

Building a bunch of reports, have to do the same thing over and over with different fields
public List<ReportSummary> ListProducer()
{
return (from p in Context.stdReports
group p by new { p.txt_company, p.int_agencyId }
into g
select new ReportSummary
{
PKi = g.Key.int_agencyId,
Name = g.Key.txt_company,
Sum = g.Sum(foo => foo.lng_premium),
Count = g.Count()
}).OrderBy(q => q.Name).ToList();
}
public List<ReportSummary> ListCarrier()
{
return (from p in Context.stdReports
group p by new { p.txt_carrier, p.int_carrierId }
into g
select new ReportSummary
{
PKi = g.Key.int_carrierId,
Name = g.Key.txt_carrier,
Sum = g.Sum(foo => foo.lng_premium),
Count = g.Count()
}).OrderBy(q => q.Name).ToList();
}
My mind is drawing a blank on how I might be able to bring these two together.
It looks like the only thing that changes are the names of the grouping parameters. Could you write a wrapper function that accepts lambdas specifying the grouping parameters? Or even a wrapper function that accepts two strings and then builds raw T-SQL, instead of using LINQ?
Or, and I don't know if this would compile, can you alias the fields in the group statement so that the grouping construct can always be referenced the same way, such as g.Key.id1 and g.Key.id2? You could then pass the grouping construct into the ReportSummary constructor and do the left-hand/right-hand assignment in one place. (You'd need to pass it as dynamic though, since it's an anonymous object at the call site.)
You could do something like this:
public List<ReportSummary> GetList(Func<Record, Tuple<string, int>> fieldSelector)
{
return (from p in Context.stdReports
group p by fieldSelector(p)
into g
select new ReportSummary
{
PKi = g.Key.Item2,
Name = g.Key.Item1,
Sum = g.Sum(foo => foo.lng_premium),
Count = g.Count()
}).OrderBy(q => q.Name).ToList();
}
And then you could call it like this:
var summary = GetList(rec => Tuple.Create(rec.txt_company, rec.int_agencyId));
or:
var summary = GetList(rec => Tuple.Create(rec.txt_carrier, rec.int_carrierId));
Of course, you'll want to replace Record with whatever type Context.stdReports is actually returning.
I haven't checked to see if that will compile, but you get the idea.
Since all that changes between the two queries is the group key, parameterize it. Since it's a composite key (has more than one value within), you'll need to create a simple class which can hold those values (with generic names).
In this case, to parameterize it, make the key selector a parameter to your function. It would have to be an expression and the method syntax to get this to work. You could then generalize it into a function:
public class GroupKey
{
public int Id { get; set; }
public string Name { get; set; }
}
private IQueryable<ReportSummary> GetReport(
Expression<Func<stdReport, GroupKey>> groupKeySelector)
{
return Context.stdReports
.GroupBy(groupKeySelector)
.Select(g => new ReportSummary
{
PKi = g.Key.Id,
Name = g.Key.Name,
Sum = g.Sum(report => report.lng_premium),
Count = g.Count(),
})
.OrderBy(summary => summary.Name);
}
Then just make use of this function in your queries using the appropriate key selectors.
public List<ReportSummary> ListProducer()
{
return GetReport(r =>
new GroupKey
{
Id = r.int_agencyId,
Name = r.txt_company,
})
.ToList();
}
public List<ReportSummary> ListCarrier()
{
return GetReport(r =>
new GroupKey
{
Id = r.int_carrierId,
Name = r.txt_carrier,
})
.ToList();
}
I don't know what types you have mapped for your entities so I made some assumptions. Use whatever is appropriate in your case.

C# LINQ deferred execution challenge - help needed in creating 3 different lists

I am trying to create 3 different lists (1,2,3) from 2 existing lists (A,B).
The 3 lists need to identify the following relationships.
List 1 - the items that are in list A and not in list B
List 2 - the items that are in list B and not in list A
List 3 - the items that are in both lists.
I then want to join all the lists together into one list.
My problem is that I want to identify the differences by adding an enum identifying the relationship to the items of each list. But by adding the enum, the Except LINQ function (obviously) no longer recognizes that the items are the same. Because the LINQ queries are deferred, I cannot resolve this by changing the order of my statements, i.e. identifying the lists first and then adding the enums.
This is the code that I have got to (Doesn't work properly)
There might be a better approach.
List<ManufactorListItem> manufactorItemList =
manufactorRepository.GetManufactorList();
// Get the Manufactors from the Families repository
List<ManufactorListItem> familyManufactorList =
this.familyRepository.GetManufactorList(familyGuid);
// Identify Manufactors that are only found in the Manufactor Repository
List<ManufactorListItem> inManufactorsOnly =
manufactorItemList.Except(familyManufactorList).ToList();
// Mark them as (Parent Only)
foreach (ManufactorListItem manOnly in inManufactorsOnly) {
manOnly.InheritanceState = EnumInheritanceState.InParent;
}
// Identify Manufactors that are only found in the Family Repository
List<ManufactorListItem> inFamiliesOnly =
familyManufactorList.Except(manufactorItemList).ToList();
// Mark them as (Child Only)
foreach (ManufactorListItem famOnly in inFamiliesOnly) {
famOnly.InheritanceState = EnumInheritanceState.InChild;
}
// Identify Manufactors that are found in both Repositories
List<ManufactorListItem> sameList =
manufactorItemList.Intersect(familyManufactorList).ToList();
// Mark them Accordingly
foreach (ManufactorListItem same in sameList) {
same.InheritanceState = EnumInheritanceState.InBoth;
}
// Create an output List
List<ManufactorListItem> manufactors = new List<ManufactorListItem>();
// Join all of the lists together.
manufactors = sameList.Union(inManufactorsOnly).
Union(inFamiliesOnly).ToList();
Any ideas how to get around this?
Thanks in advance
You can make it much simpler:
List<ManufactorListItem> manufactorItemList = ...;
List<ManufactorListItem> familyManufactorList = ...;
var allItems = manufactorItemList.ToDictionary(i => i, i => EnumInheritanceState.InParent);
foreach (var familyManufactor in familyManufactorList)
{
allItems[familyManufactor] = allItems.ContainsKey(familyManufactor) ?
EnumInheritanceState.InBoth :
EnumInheritanceState.InChild;
}
//that's all, now we can get any subset of items:
var inFamiliesOnly = allItems.Where(p => p.Value == EnumInheritanceState.InChild).Select(p => p.Key);
var inManufactorsOnly = allItems.Where(p => p.Value == EnumInheritanceState.InParent).Select(p => p.Key);
var allManufactors = allItems.Keys;
This seems like the simplest way to me:
(I'm using the following Enum for simplicity:
public enum ContainedIn
{
AOnly,
BOnly,
Both
}
)
var la = new List<int> {1, 2, 3};
var lb = new List<int> {2, 3, 4};
var l1 = la.Except(lb)
.Select(i => new Tuple<int, ContainedIn>(i, ContainedIn.AOnly));
var l2 = lb.Except(la)
.Select(i => new Tuple<int, ContainedIn>(i, ContainedIn.BOnly));
var l3 = la.Intersect(lb)
.Select(i => new Tuple<int, ContainedIn>(i, ContainedIn.Both));
var combined = l1.Union(l2).Union(l3);
This works as long as you have access to the Tuple<T1, T2> class (added in .NET Framework 4).
If the problem is with the Except() call, then I suggest you use the overload of Except that takes a comparer as an additional parameter, and pass it a custom IEqualityComparer<ManufactorListItem> which tests the appropriate ManufactorListItem fields, but not the InheritanceState.
e.g. your equality comparer might look like:
public class ManufactorComparer : IEqualityComparer<ManufactorListItem> {
public bool Equals(ManufactorListItem x, ManufactorListItem y) {
// you need to write a method here that tests all the fields except InheritanceState
}
public int GetHashCode(ManufactorListItem obj) {
// you need to write a simple hash code generator here using any/all the fields except InheritanceState
}
}
and then you would call this using code a bit like
// Identify Manufactors that are only found in the Manufactor Repository
List<ManufactorListItem> inManufactorsOnly =
manufactorItemList.Except(familyManufactorList, new ManufactorComparer()).ToList();
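
The two method bodies above are left for the reader. A sketch of how they might be filled in follows, with the loud caveat that the question never shows the fields of `ManufactorListItem`: the `Id` and `Name` properties and the `ComparerDemo` helper below are purely hypothetical, chosen only so the example compiles and runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum EnumInheritanceState { InParent, InChild, InBoth }

// Hypothetical shape: Id and Name are assumptions, not the real fields.
public class ManufactorListItem
{
    public int Id { get; set; }
    public string Name { get; set; }
    public EnumInheritanceState InheritanceState { get; set; }
}

public class ManufactorComparer : IEqualityComparer<ManufactorListItem>
{
    // Compares every field except InheritanceState.
    public bool Equals(ManufactorListItem x, ManufactorListItem y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id && x.Name == y.Name;
    }

    // Built from the same fields as Equals, so equal items hash equally.
    public int GetHashCode(ManufactorListItem obj) =>
        obj is null ? 0 : obj.Id.GetHashCode() ^ (obj.Name?.GetHashCode() ?? 0);
}

public static class ComparerDemo
{
    public static List<int> ParentOnlyIds(List<ManufactorListItem> parents, List<ManufactorListItem> family) =>
        parents.Except(family, new ManufactorComparer()).Select(m => m.Id).ToList();

    public static List<int> SharedIds(List<ManufactorListItem> parents, List<ManufactorListItem> family) =>
        parents.Intersect(family, new ManufactorComparer()).Select(m => m.Id).ToList();

    public static void Main()
    {
        var parents = new List<ManufactorListItem>
        {
            new ManufactorListItem { Id = 1, Name = "Acme" },
            new ManufactorListItem { Id = 2, Name = "Bolt" },
        };
        // Different instances, and a different InheritanceState on the shared item.
        var family = new List<ManufactorListItem>
        {
            new ManufactorListItem { Id = 2, Name = "Bolt", InheritanceState = EnumInheritanceState.InChild },
            new ManufactorListItem { Id = 3, Name = "Cog" },
        };

        Console.WriteLine(string.Join(",", ParentOnlyIds(parents, family))); // 1
        Console.WriteLine(string.Join(",", SharedIds(parents, family)));     // 2
    }
}
```

Because the comparer ignores `InheritanceState`, setting the enum on an item no longer changes how `Except` and `Intersect` treat it.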

How to match the results back to an array

I have an array of objects. Each object has two properties: a value and an index.
I use a LINQ to Entities query with Contains to bring back all rows in a table that match one of the values.
Now here is the issue: I want to match the results back up to the object index.
What is the fastest way to do this? I can add properties to the object.
It is almost like I want the query results to return this:
index = 1;
value = "searchkey"
queryvalue = "query value"
From your question I think I can assume that you have the following variables defined:
Lookup[] (Your look-up array)
IEnumerable<Record> (The results returned by your query)
... and the types look roughly like this:
public class Lookup
{
public int Index { get; set; }
public int Value { get; set; }
}
public class Record
{
public int Value { get; set; }
/* plus other fields */
}
Then you can solve your problem in a couple of ways.
First using an anonymous type:
var matches
= from r in records
join l in lookups on r.Value equals l.Value
group r by l.Index into grs
select new
{
Index = grs.Key,
Records = grs.ToArray(),
};
The other two just use standard LINQ GroupBy & ToLookup:
IEnumerable<IGrouping<int, Record>> matches2
= from r in records
join l in lookups on r.Value equals l.Value
group r by l.Index;
ILookup<int, Record[]> matches3
= matches2.ToLookup(m => m.Key, m => m.ToArray());
Do these solve your problem?
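
For a runnable picture of the join, here is a sketch that produces an `ILookup<int, Record>` directly from the joined pairs, keyed by the look-up index. The `Lookup` and `Record` classes follow the assumed shapes above; the `QueryValue` property and the `MatchBack` helper are hypothetical additions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Lookup
{
    public int Index { get; set; }
    public int Value { get; set; }
}

public class Record
{
    public int Value { get; set; }
    public string QueryValue { get; set; } // hypothetical extra field
}

public static class MatchBack
{
    // Joins records to look-up entries on Value, then indexes the records
    // by the Index of the look-up entry they matched.
    public static ILookup<int, Record> ByIndex(IEnumerable<Lookup> lookups, IEnumerable<Record> records) =>
        (from r in records
         join l in lookups on r.Value equals l.Value
         select new { l.Index, Record = r })
        .ToLookup(p => p.Index, p => p.Record);

    public static void Main()
    {
        var lookups = new[]
        {
            new Lookup { Index = 0, Value = 10 },
            new Lookup { Index = 1, Value = 20 },
        };
        var records = new[]
        {
            new Record { Value = 10, QueryValue = "a" },
            new Record { Value = 20, QueryValue = "b" },
            new Record { Value = 20, QueryValue = "c" },
        };

        var byIndex = ByIndex(lookups, records);
        Console.WriteLine(string.Join(",", byIndex[1].Select(r => r.QueryValue))); // b,c
        Console.WriteLine(byIndex[5].Count()); // 0 (a missing key yields an empty sequence)
    }
}
```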
Just a shot in the dark as to what you need, but the LINQ extension methods can pass the element's index as a second parameter to the lambda, e.g.:
someCollection.Select( (x,i) => new { SomeProperty = x.Property, Index = i } );
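
A small runnable sketch of that overload, using plain strings as the collection. The `IndexedSelect` class and `WithIndex` method are illustrative names; the index passed to the lambda is zero-based.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedSelect
{
    // Pairs every element with its zero-based position in the sequence.
    public static List<(int Index, string Value)> WithIndex(IEnumerable<string> values) =>
        values.Select((v, i) => (Index: i, Value: v)).ToList();

    public static void Main()
    {
        foreach (var item in WithIndex(new[] { "alpha", "bravo", "charlie" }))
        {
            Console.WriteLine($"{item.Index}: {item.Value}");
        }
    }
}
```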
