LINQ - GroupBy multiple columns and merge the result - c#

I am working with sizeable set of data (~130.000 records), I've managed to transform it the way I want it (to csv).
Here is a simplified example of how the List looks like:
"Surname1, Name1;Address1;State1;YES;Group1"
"Surname2, Name2;Address2;State2;YES;Group2"
"Surname2, Name2;Address2;State2;YES;Group1"
"Surname3, Name3;Address3;State3;NO;Group1"
"Surname1, Name1;Address2;State1;YES;Group1"
Now, I would like to merge the records if 1st, 2nd AND 3rd column match, like so:
output
"Surname1, Name1;Address1;State1;YES;Group1"
"Surname2, Name2;Address2;State2;YES;Group2 Group1"
"Surname3, Name3;Address3;State3;NO;Group1"
"Surname1, Name1;Address2;State1;YES;Group1"
Here's what I've got so far:
output.GroupBy(x => new { c1 = x.Split(';')[0], c2 = x.Split(';')[1], c3 = x.Split(';')[2] }).Select(//have no idea what should go here);

First try to get the columns you need projecting the result in an anonymous type:
var query= from r in output
let columns= r.Split(';')
select new { c1 =columns[0], c2 =columns[1], c3 = columns[2] ,c5=columns[4]};
And then create the groups but now using the anonymous object you define in the previous query:
var result= query.GroupBy(e=>new {e.c1, e.c2, e.c3})
.Select(g=> new {SurName=g.Key.c1,
Name=g.Key.c2,
Address=g.Key.c3,
Groups=String.Join(",",g.Select(e=>e.c4)});
I know I'm missing some columns but I think you can get the idea.
PS: The fact I have separated the logic in two queries is just for readability propose, you can compose both queries in one but that is not going to change the performance because LINQ use deferred evaluation.

This is how I would do it:
class Program
{
static void Main(string[] args)
{
List<string> input = new List<string> {
"Surname1, Name1;Address1;State1;YES;Group1",
"Surname2, Name2;Address2;State2;YES;Group2",
"Surname2, Name2;Address2;State2;YES;Group1",
"Surname3, Name3;Address3;State3;NO;Group1",
"Surname1, Name1;Address2;State1;YES;Group1",
};
var transformed = input.Select(s => s.Split(';'))
.GroupBy( s => new string[] { s[0], s[1], s[2], s[3] },
(key, elements) => string.Join(";", key) + ";" + string.Join(" ", elements.Select(e => e.Last())),
new MyEqualityComparer())
.ToList();
}
}
internal class MyEqualityComparer : IEqualityComparer<string[]>
{
public bool Equals(string[] x, string[] y)
{
return x[0] == y[0] && x[1] == y[1] && x[2] == y[2];
}
public int GetHashCode(string[] obj)
{
int hashCode = obj[0].GetHashCode();
hashCode = hashCode ^ obj[1].GetHashCode();
hashCode = hashCode ^ obj[2].GetHashCode();
return hashCode;
}
}
Consider the first 4 columns as the grouping key, but only use the first 3 for the comparison (hence the custom IEqualityComparer).
Then if you have the (key, elements) groups, transform them so that you join the elements of the key with ; (remember, the key consists of the first 4 columns) and add to it the last element from every member of the group, joined with a space.

Related

Check if elements from one list elements present in another list

I have 2 c# classes -
class ABC
{
string LogId;
string Name;
}
class XYZ
{
string LogId;
string Name;
}
class Checker
{
public void comparelists()
{
List<ABC> lstABC =new List<ABC>();
lstABC.Add(new ABC...);
lstABC.Add(new ABC...);
lstABC.Add(new ABC...);
List<XYZ> lstXYZ =new List<XYZ>();
lstXYZ.Add(new XYZ...);
lstXYZ.Add(new XYZ...);
lstXYZ.Add(new XYZ...);
var commonLogId = lstABC
.Where(x => lstXYZ.All(y => y.LogId.Contains(x.LogId)))
.ToList();
}
}
As seen from the code , I want to fetch all logids from lstABC which are present in lstXYZ.
Eg. lstABC has ->
LogId="1", Name="somename1"
LogId="2", Name="somename2"
LogId="3", Name="somename3"
LogId="4", Name="somename4"
LogId="5", Name="somename5"
lstXYZ has ->
LogId="1", Name="somename11"
LogId="2", Name="somename22"
LogId="3", Name="somename33"
LogId="8", Name="somename8"
LogId="9", Name="somename9"
Then all logids from lstABC which are present in lstXYZ are - 1,2,3 ; so all those records are expected to get fetched.
But with below linq query -
var commonLogId = lstABC
.Where(x => lstXYZ.All(y => y.LogId.Contains(x.LogId)))
.ToList();
0 records are getting fetched/selected.
approach with Any()
var res = lstABC.Where(x => (lstXYZ.Any(y => y.LogId == x.LogId))).Select(x => x.LogId);
https://dotnetfiddle.net/jRnUwS
another approach would be Intersect() which felt a bit more natural to me
var res = lstABC.Select(x => x.LogId).Intersect(lstXYZ.Select(y => y.LogId));
https://dotnetfiddle.net/7iWYDO
You are using the wrong LINQ function. Try Any():
var commonLogId = lstABC
.Where(x => lstXYZ.Any(y => y.LogId == x.LogId))
.ToList();
Also note that the id comparison with Contains() was wrong. Just use == instead.
All() checks if all elements in a list satisfy the specified condition. Any() on the other hand only checks if at least one of the elements does.
Be aware that your implementation will be very slow when both lists are large, because it's runtime complexity grows quadratically with number of elements to compare. A faster implementation would use Join() which was created and optimized exactly for this purpose:
var commonLogIds = lstABC
.Join(
lstXYZ,
x => x.LogId, // Defines what to use as key in `lstABC`.
y => y.LogId, // Defines what to use as key in `lstXYZ`.
(x, y) => x) // Defines the output of matched pairs. Here
// we simply use the values of `lstABC`.
.ToList();
It seems pretty unnatural to intersect entirely different types, so I would be tempted to interface the commonality and write an EqualityComparer:
class ABC : ILogIdProvider
{
public string LogId {get;set;}
public string Name;
}
class XYZ : ILogIdProvider
{
public string LogId{get;set;}
public string Name;
}
interface ILogIdProvider
{
string LogId{get;}
}
class LogIdComparer : EqualityComparer<ILogIdProvider>
{
public override int GetHashCode(ILogIdProvider obj) => obj.LogId.GetHashCode();
public override bool Equals(ILogIdProvider x, ILogIdProvider y) => x.LogId == y.LogId;
}
Then you can Intersect the lists more naturally:
var res = lstABC.Intersect(lstXYZ, new LogIdComparer());
Live example: https://dotnetfiddle.net/0Tt6eu

Conversion of DataTable to Dictionary<String,StringBuilder>

DataTable has following column names:
Name, Value, Type, Info
Need to convert a structure Dictionary
Where Name is the Key (string) and other values will be appended to the StringBuilder like "Value,Type,Info", however it is possible to have repetitive Name column value, then successive each appended value to the StringBuilder will use a separator like ?: to depict the next set of value.
For eg: if the DataTable data is like:
Name, Value, Type, Info
one 1 a aa
one 11 b bb
two 2 c cc
two 22 dd ddd
two 222 ee eee
Now the result structure should be like:
Dictionary<String,StringBuilder> detail = new Dictionary<String,StringBuilder>
{
{[one],[1,a,aa?:11,b,bb},
{[two],[2,c,cc?:22,dd,ddd?:222,ee,eee}
}
It is easy to achieve the same using for loop, but I was trying to do it via Linq, so I tried something like:
datatable.AsEnumerable.Select(row =>
{
KeyValuePair<String,StringBuilder> kv = new KeyValuePair<String,StringBuilder>();
kv.key = row[Name];
kv.Value = row[Value]+","+row[Type]+","+row[Info]
return kv;
}).ToDictionary(y=>y.Key,y=>y.Value)
This code doesn't take care of repetitive keys and thus appending, probably I need to use SelectMany to flatten the structure, but how would it work in giving me a dictionary with requirements specified above, so that delimiters can be added to be existing key's value. Any pointer that can direct me in the correct direction.
Edited:
datatable.AsEnumerable()
.GroupBy(r => (string)r["Name"])
.Select(g => new
{
Key = g.Key,
// Preferred Solution
Value = new StringBuilder(
g.Select(r => string.Format("{0}, {1}, {2}",
r["Value"], r["Type"], r["Info"]))
.Aggregate((s1, s2) => s1 + "?:" + s2))
/*
//as proposed by juharr
Value = new StringBuilder(string.Join("?:", g.Select( r => string.Format("{0}, {1}, {2}", r["Value"], r["Type"], r["Info"]))))
*/
})
.ToDictionary(p => p.Key, p => p.Value);
Something like this should work, and it avoid some complex Linq that could get irritating to debug:
public static Dictionary<string, StringBuilder> GetData(DataTable table)
{
const string delimiter = "?:";
var collection = new Dictionary<string, StringBuilder>();
// dotNetFiddle wasn't liking the `.AsEnumerable()` extension
// But you should still be able to use it here
foreach (DataRow row in table.Rows)
{
var key = (string)row["Name"];
var #value = string.Format("{0},{1},{2}",
row["Value"],
row["Type"],
row["Info"]);
StringBuilder existingSb;
if (collection.TryGetValue(key, out existingSb))
{
existingSb.Append(delimiter + #value);
}
else
{
existingSb = new StringBuilder();
existingSb.Append(#value);
collection.Add(key, existingSb);
}
}
return collection;
}

C# take a duplicate entry in a CSV file and remove the duplicate by taking an average

My program creates a .csv file with a persons name and an integer next to them.
Occasionally there are two entries of the same name in the file, but with a different time. I only want one instance of each person.
I would like to take the mean of the two numbers to produce just one row for the name, where the number will be the average of the two existing.
So here Alex Pitt has two numbers. How can I take the mean of 105 and 71 (in this case) to produce a row that just includes Alex Pitt, 88?
Here is how I am creating my CSV file if reference is required.
public void CreateCsvFile()
{
PaceCalculator ListGather = new PaceCalculator();
List<string> NList = ListGather.NameGain();
List<int> PList = ListGather.PaceGain();
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
string filepath = #"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
using (var file = File.CreateText(filepath))
{
foreach (var arr in nAndPList)
{
if (arr == null || arr.Length == 0) continue;
file.Write(arr[0]);
for (int i = 1; i < arr.Length; i++)
{
file.Write(arr[i]);
}
file.WriteLine();
}
}
}
To start with, you can write your current CreateCsvFile much more simply like this:
public void CreateCsvFile()
{
var filepath = #"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
var ListGather = new PaceCalculator();
var records =
ListGather.NameGain()
.Zip(ListGather.PaceGain(),
(a, b) => String.Format("{0},{1}", a, b));
File.WriteAllLines(filepath, records);
}
Now, it can easily be changed to work out the average pace if you have duplicate names, like this:
public void CreateCsvFile()
{
var filepath = #"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
var ListGather = new PaceCalculator();
var records =
from record in ListGather.NameGain()
.Zip(ListGather.PaceGain(),
(a, b) => new { Name = a, Pace = b })
group record.Pace by record.Name into grs
select String.Format("{0},{1}", grs.Key, grs.Average());
File.WriteAllLines(filepath, records);
}
I would recommend to merge the duplicates before you put everything into the CSV file.
use:
// The List with all duplicate values
List<string> duplicateChecker = new List<string>();
//Takes the duplicates and puts them in a new List. I'm using the NList because I assume the Names are the important part.
duplicateChecker = NList .Distinct().ToList();
Now you can simply Iterrate through the new list and search their values in your NList. Use a foreach loop which is looking up the index of the Name value in Nlist. After that you can use the Index to merge the integers with a simple math method.
//Something like this:
Make a foreach loop for every entry in your duplicateChecker =>
Use Distrinc again on duplicateChecker to make sure you won't go twice through the same duplicate =>
Get the Value of the current String and search it in Nlist =>
Get the Index of the current Element in Nlist and search for the Index in Plist =>
Get the Integer of Plist and store it in a array =>
// make sure your math method runs before a new name starts. After that store the new values in your nAndPList
Once the Loop is through with the first name use a math method.
I hope you understand what I was trying to say. However I would recommend using a unique identifier for your persons. Sooner or later 2 persons will appear with the same name (like in a huge company).
Change the code below:
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
To
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b)
.ToList()
.GroupBy(x => x.[The field you want to group by])
.Select(y => y.First);

Querying a list of entities with composite keys in EF [duplicate]

given a list of ids, I can query all relevant rows by:
context.Table.Where(q => listOfIds.Contains(q.Id));
But how do you achieve the same functionality when the Table has a composite key?
This is a nasty problem for which I don't know any elegant solution.
Suppose you have these key combinations, and you only want to select the marked ones (*).
Id1 Id2
--- ---
1 2 *
1 3
1 6
2 2 *
2 3 *
... (many more)
How to do this is a way that Entity Framework is happy? Let's look at some possible solutions and see if they're any good.
Solution 1: Join (or Contains) with pairs
The best solution would be to create a list of the pairs you want, for instance Tuples, (List<Tuple<int,int>>) and join the database data with this list:
from entity in db.Table // db is a DbContext
join pair in Tuples on new { entity.Id1, entity.Id2 }
equals new { Id1 = pair.Item1, Id2 = pair.Item2 }
select entity
In LINQ to objects this would be perfect, but, too bad, EF will throw an exception like
Unable to create a constant value of type 'System.Tuple`2 (...) Only primitive types or enumeration types are supported in this context.
which is a rather clumsy way to tell you that it can't translate this statement into SQL, because Tuples is not a list of primitive values (like int or string). For the same reason a similar statement using Contains (or any other LINQ statement) would fail.
Solution 2: In-memory
Of course we could turn the problem into simple LINQ to objects like so:
from entity in db.Table.AsEnumerable() // fetch db.Table into memory first
join pair Tuples on new { entity.Id1, entity.Id2 }
equals new { Id1 = pair.Item1, Id2 = pair.Item2 }
select entity
Needless to say that this is not a good solution. db.Table could contain millions of records.
Solution 3: Two Contains statements (incorrect)
So let's offer EF two lists of primitive values, [1,2] for Id1 and [2,3] for Id2. We don't want to use join, so let's use Contains:
from entity in db.Table
where ids1.Contains(entity.Id1) && ids2.Contains(entity.Id2)
select entity
But now the results also contains entity {1,3}! Well, of course, this entity perfectly matches the two predicates. But let's keep in mind that we're getting closer. In stead of pulling millions of entities into memory, we now only get four of them.
Solution 4: One Contains with computed values
Solution 3 failed because the two separate Contains statements don't only filter the combinations of their values. What if we create a list of combinations first and try to match these combinations? We know from solution 1 that this list should contain primitive values. For instance:
var computed = ids1.Zip(ids2, (i1,i2) => i1 * i2); // [2,6]
and the LINQ statement:
from entity in db.Table
where computed.Contains(entity.Id1 * entity.Id2)
select entity
There are some problems with this approach. First, you'll see that this also returns entity {1,6}. The combination function (a*b) does not produce values that uniquely identify a pair in the database. Now we could create a list of strings like ["Id1=1,Id2=2","Id1=2,Id2=3]" and do
from entity in db.Table
where computed.Contains("Id1=" + entity.Id1 + "," + "Id2=" + entity.Id2)
select entity
(This would work in EF6, not in earlier versions).
This is getting pretty messy. But a more important problem is that this solution is not sargable, which means: it bypasses any database indexes on Id1 and Id2 that could have been used otherwise. This will perform very very poorly.
Solution 5: Best of 2 and 3
So the most viable solution I can think of is a combination of Contains and a join in memory: First do the contains statement as in solution 3. Remember, it got us very close to what we wanted. Then refine the query result by joining the result as an in-memory list:
var rawSelection = from entity in db.Table
where ids1.Contains(entity.Id1) && ids2.Contains(entity.Id2)
select entity;
var refined = from entity in rawSelection.AsEnumerable()
join pair in Tuples on new { entity.Id1, entity.Id2 }
equals new { Id1 = pair.Item1, Id2 = pair.Item2 }
select entity;
It's not elegant, messy all the same maybe, but so far it's the only scalable1 solution to this problem I found, and applied in my own code.
Solution 6: Build a query with OR clauses
Using a Predicate builder like Linqkit or alternatives, you can build a query that contains an OR clause for each element in the list of combinations. This could be a viable option for really short lists. With a couple of hundreds of elements, the query will start performing very poorly. So I don't consider this a good solution unless you can be 100% sure that there will always be a small number of elements. One elaboration of this option can be found here.
Solution 7: Unions
There's also a solution using UNIONs that I posted later here.
1As far as the Contains statement is scalable: Scalable Contains method for LINQ against a SQL backend
Solution for Entity Framework Core with SQL Server
🎉 NEW! QueryableValues EF6 Edition has arrived!
The following solution makes use of QueryableValues. This is a library that I wrote to primarily solve the problem of query plan cache pollution in SQL Server caused by queries that compose local values using the Contains LINQ method. It also allows you to compose values of complex types in your queries in a performant way, which will achieve what's being asked in this question.
First you will need to install and set up the library, after doing that you can use any of the following patterns that will allow you to query your entities using a composite key:
// Required to make the AsQueryableValues method available on the DbContext.
using BlazarTech.QueryableValues;
// Local data that will be used to query by the composite key
// of the fictitious OrderProduct table.
var values = new[]
{
new { OrderId = 1, ProductId = 10 },
new { OrderId = 2, ProductId = 20 },
new { OrderId = 3, ProductId = 30 }
};
// Optional helper variable (needed by the second example due to CS0854)
var queryableValues = dbContext.AsQueryableValues(values);
// Example 1 - Using a Join (preferred).
var example1Results = dbContext
.OrderProduct
.Join(
queryableValues,
e => new { e.OrderId, e.ProductId },
v => new { v.OrderId, v.ProductId },
(e, v) => e
)
.ToList();
// Example 2 - Using Any (similar behavior as Contains).
var example2Results = dbContext
.OrderProduct
.Where(e => queryableValues
.Where(v =>
v.OrderId == e.OrderId &&
v.ProductId == e.ProductId
)
.Any()
)
.ToList();
Useful Links
Nuget Package
GitHub Repository
Benchmarks
QueryableValues is distributed under the MIT license.
You can use Union for each composite primary key:
var compositeKeys = new List<CK>
{
new CK { id1 = 1, id2 = 2 },
new CK { id1 = 1, id2 = 3 },
new CK { id1 = 2, id2 = 4 }
};
IQuerable<CK> query = null;
foreach(var ck in compositeKeys)
{
var temp = context.Table.Where(x => x.id1 == ck.id1 && x.id2 == ck.id2);
query = query == null ? temp : query.Union(temp);
}
var result = query.ToList();
You can create a collection of strings with both keys like this (I am assuming that your keys are int type):
var id1id2Strings = listOfIds.Select(p => p.Id1+ "-" + p.Id2);
Then you can just use "Contains" on your db:
using (dbEntities context = new dbEntities())
{
var rec = await context.Table1.Where(entity => id1id2Strings .Contains(entity.Id1+ "-" + entity.Id2));
return rec.ToList();
}
You need a set of objects representing the keys you want to query.
class Key
{
int Id1 {get;set;}
int Id2 {get;set;}
If you have two lists and you simply check that each value appears in their respective list then you are getting the cartesian product of the lists - which is likely not what you want. Instead you need to query the specific combinations required
List<Key> keys = // get keys;
context.Table.Where(q => keys.Any(k => k.Id1 == q.Id1 && k.Id2 == q.Id2));
I'm not completely sure that this is valid use of Entity Framework; you may have issues with sending the Key type to the database. If that happens then you can be creative:
var composites = keys.Select(k => p1 * k.Id1 + p2 * k.Id2).ToList();
context.Table.Where(q => composites.Contains(p1 * q.Id1 + p2 * q.Id2));
You can create an isomorphic function (prime numbers are good for this), something like a hashcode, which you can use to compare the pair of values. As long as the multiplicative factors are co-prime this pattern will be isomorphic (one-to-one) - i.e. the result of p1*Id1 + p2*Id2 will uniquely identify the values of Id1 and Id2 as long as the prime numbers are correctly chosen.
But then you end up in a situation where you're implementing complex concepts and someone is going to have to support this. Probably better to write a stored procedure which takes the valid key objects.
Ran into this issue as well and needed a solution that both did not perform a table scan and also provided exact matches.
This can be achieved by combining Solution 3 and Solution 4 from Gert Arnold's Answer
var firstIds = results.Select(r => r.FirstId);
var secondIds = results.Select(r => r.SecondId);
var compositeIds = results.Select(r => $"{r.FirstId}:{r.SecondId}");
var query = from e in dbContext.Table
//first check the indexes to avoid a table scan
where firstIds.Contains(e.FirstId) && secondIds.Contains(e.SecondId))
//then compare the compositeId for an exact match
//ToString() must be called unless using EF Core 5+
where compositeIds.Contains(e.FirstId.ToString() + ":" + e.SecondId.ToString()))
select e;
var entities = await query.ToListAsync();
For EF Core I use a slightly modified version of the bucketized IN method by EricEJ to map composite keys as tuples. It performs pretty well for small sets of data.
Sample usage
List<(int Id, int Id2)> listOfIds = ...
context.Table.In(listOfIds, q => q.Id, q => q.Id2);
Implementation
public static IQueryable<TQuery> In<TKey1, TKey2, TQuery>(
this IQueryable<TQuery> queryable,
IEnumerable<(TKey1, TKey2)> values,
Expression<Func<TQuery, TKey1>> key1Selector,
Expression<Func<TQuery, TKey2>> key2Selector)
{
if (values is null)
{
throw new ArgumentNullException(nameof(values));
}
if (key1Selector is null)
{
throw new ArgumentNullException(nameof(key1Selector));
}
if (key2Selector is null)
{
throw new ArgumentNullException(nameof(key2Selector));
}
if (!values.Any())
{
return queryable.Take(0);
}
var distinctValues = Bucketize(values);
if (distinctValues.Length > 1024)
{
throw new ArgumentException("Too many parameters for SQL Server, reduce the number of parameters", nameof(values));
}
var predicates = distinctValues
.Select(v =>
{
// Create an expression that captures the variable so EF can turn this into a parameterized SQL query
Expression<Func<TKey1>> value1AsExpression = () => v.Item1;
Expression<Func<TKey2>> value2AsExpression = () => v.Item2;
var firstEqual = Expression.Equal(key1Selector.Body, value1AsExpression.Body);
var visitor = new ReplaceParameterVisitor(key2Selector.Parameters[0], key1Selector.Parameters[0]);
var secondEqual = Expression.Equal(visitor.Visit(key2Selector.Body), value2AsExpression.Body);
return Expression.AndAlso(firstEqual, secondEqual);
})
.ToList();
while (predicates.Count > 1)
{
predicates = PairWise(predicates).Select(p => Expression.OrElse(p.Item1, p.Item2)).ToList();
}
var body = predicates.Single();
var clause = Expression.Lambda<Func<TQuery, bool>>(body, key1Selector.Parameters[0]);
return queryable.Where(clause);
}
class ReplaceParameterVisitor : ExpressionVisitor
{
private ParameterExpression _oldParameter;
private ParameterExpression _newParameter;
public ReplaceParameterVisitor(ParameterExpression oldParameter, ParameterExpression newParameter)
{
_oldParameter = oldParameter;
_newParameter = newParameter;
}
protected override Expression VisitParameter(ParameterExpression node)
{
if (ReferenceEquals(node, _oldParameter))
return _newParameter;
return base.VisitParameter(node);
}
}
/// <summary>
/// Break a list of items tuples of pairs.
/// </summary>
private static IEnumerable<(T, T)> PairWise<T>(this IEnumerable<T> source)
{
var sourceEnumerator = source.GetEnumerator();
while (sourceEnumerator.MoveNext())
{
var a = sourceEnumerator.Current;
sourceEnumerator.MoveNext();
var b = sourceEnumerator.Current;
yield return (a, b);
}
}
private static TKey[] Bucketize<TKey>(IEnumerable<TKey> values)
{
var distinctValueList = values.Distinct().ToList();
// Calculate bucket size as 1,2,4,8,16,32,64,...
var bucket = 1;
while (distinctValueList.Count > bucket)
{
bucket *= 2;
}
// Fill all slots.
var lastValue = distinctValueList.Last();
for (var index = distinctValueList.Count; index < bucket; index++)
{
distinctValueList.Add(lastValue);
}
var distinctValues = distinctValueList.ToArray();
return distinctValues;
}
In the absence of a general solution, I think there are two things to consider:
Avoid multi-column primary keys (will make unit testing easier too).
But if you have to, chances are that one of them will reduce the
query result size to O(n) where n is the size of the ideal query
result. From here, its Solution 5 from Gerd Arnold above.
For example, the problem leading me to this question was querying order lines, where the key is order id + order line number + order type, and the source had the order type being implicit. That is, the order type was a constant, order ID would reduce the query set to order lines of relevant orders, and there would usually be 5 or less of these per order.
To rephrase: If you have a composite key, changes are that one of them have very few duplicates. Apply Solution 5 from above with that.
I tried this solution and it worked with me and the output query was perfect without any parameters
using LinqKit; // nuget
var customField_Ids = customFields?.Select(t => new CustomFieldKey { Id = t.Id, TicketId = t.TicketId }).ToList();
var uniqueIds1 = customField_Ids.Select(cf => cf.Id).Distinct().ToList();
var uniqueIds2 = customField_Ids.Select(cf => cf.TicketId).Distinct().ToList();
var predicate = PredicateBuilder.New<CustomFieldKey>(false); //LinqKit
var lambdas = new List<Expression<Func<CustomFieldKey, bool>>>();
foreach (var cfKey in customField_Ids)
{
var id = uniqueIds1.Where(uid => uid == cfKey.Id).Take(1).ToList();
var ticketId = uniqueIds2.Where(uid => uid == cfKey.TicketId).Take(1).ToList();
lambdas.Add(t => id.Contains(t.Id) && ticketId.Contains(t.TicketId));
}
predicate = AggregateExtensions.AggregateBalanced(lambdas.ToArray(), (expr1, expr2) =>
{
var invokedExpr = Expression.Invoke(expr2, expr1.Parameters.Cast<Expression>());
return Expression.Lambda<Func<CustomFieldKey, bool>>
(Expression.OrElse(expr1.Body, invokedExpr), expr1.Parameters);
});
var modifiedCustomField_Ids = repository.GetTable<CustomFieldLocal>()
.Select(cf => new CustomFieldKey() { Id = cf.Id, TicketId = cf.TicketId }).Where(predicate).ToArray();
I ended up writing a helper for this problem that relies on System.Linq.Dynamic.Core;
Its a lot of code and don't have time to refactor at the moment but input / suggestions appreciated.
public static IQueryable<TEntity> WhereIsOneOf<TEntity, TSource>(this IQueryable<TEntity> dbSet,
IEnumerable<TSource> source,
Expression<Func<TEntity, TSource,bool>> predicate) where TEntity : class
{
var (where, pDict) = GetEntityPredicate(predicate, source);
return dbSet.Where(where, pDict);
(string WhereStr, IDictionary<string, object> paramDict) GetEntityPredicate(Expression<Func<TEntity, TSource, bool>> func, IEnumerable<TSource> source)
{
var firstP = func.Parameters[0];
var binaryExpressions = RecurseBinaryExpressions((BinaryExpression)func.Body);
var i = 0;
var paramDict = new Dictionary<string, object>();
var res = new List<string>();
foreach (var sourceItem in source)
{
var innerRes = new List<string>();
foreach (var bExp in binaryExpressions)
{
var emp = ToEMemberPredicate(firstP, bExp);
var val = emp.GetKeyValue(sourceItem);
var pName = $"#{i++}";
paramDict.Add(pName, val);
var str = $"{emp.EntityMemberName} {emp.SQLOperator} {pName}";
innerRes.Add(str);
}
res.Add( "(" + string.Join(" and ", innerRes) + ")");
}
var sRes = string.Join(" || ", res);
return (sRes, paramDict);
}
EMemberPredicate ToEMemberPredicate(ParameterExpression firstP, BinaryExpression bExp)
{
var lMember = (MemberExpression)bExp.Left;
var rMember = (MemberExpression)bExp.Right;
var entityMember = lMember.Expression == firstP ? lMember : rMember;
var keyMember = entityMember == lMember ? rMember : lMember;
return new EMemberPredicate(entityMember, keyMember, bExp.NodeType);
}
List<BinaryExpression> RecurseBinaryExpressions(BinaryExpression e, List<BinaryExpression> runningList = null)
{
if (runningList == null) runningList = new List<BinaryExpression>();
if (e.Left is BinaryExpression lbe)
{
var additions = RecurseBinaryExpressions(lbe);
runningList.AddRange(additions);
}
if (e.Right is BinaryExpression rbe)
{
var additions = RecurseBinaryExpressions(rbe);
runningList.AddRange(additions);
}
if (e.Left is MemberExpression && e.Right is MemberExpression)
{
runningList.Add(e);
}
return runningList;
}
}
Helper class:
public class EMemberPredicate
{
public readonly MemberExpression EntityMember;
public readonly MemberExpression KeyMember;
public readonly PropertyInfo KeyMemberPropInfo;
public readonly string EntityMemberName;
public readonly string SQLOperator;
public EMemberPredicate(MemberExpression entityMember, MemberExpression keyMember, ExpressionType eType)
{
EntityMember = entityMember;
KeyMember = keyMember;
KeyMemberPropInfo = (PropertyInfo)keyMember.Member;
EntityMemberName = entityMember.Member.Name;
SQLOperator = BinaryExpressionToMSSQLOperator(eType);
}
public object GetKeyValue(object o)
{
return KeyMemberPropInfo.GetValue(o, null);
}
private string BinaryExpressionToMSSQLOperator(ExpressionType eType)
{
switch (eType)
{
case ExpressionType.Equal:
return "==";
case ExpressionType.GreaterThan:
return ">";
case ExpressionType.GreaterThanOrEqual:
return ">=";
case ExpressionType.LessThan:
return "<";
case ExpressionType.LessThanOrEqual:
return "<=";
case ExpressionType.NotEqual:
return "<>";
default:
throw new ArgumentException($"{eType} is not a handled Expression Type.");
}
}
}
Use Like so:
// This can be a Tuple or whatever.. If Tuple, then y below would be .Item1, etc.
// This data structure is up to you but is what I use.
[FromBody] List<CustomerAddressPk> cKeys
var res = await dbCtx.CustomerAddress
.WhereIsOneOf(cKeys, (x, y) => y.CustomerId == x.CustomerId
&& x.AddressId == y.AddressId)
.ToListAsync();
Hope this helps others.
in Case of composite key you can use another idlist and add a condition for that in your code
context.Table.Where(q => listOfIds.Contains(q.Id) && listOfIds2.Contains(q.Id2));
or you can use one another trick create a list of your keys by adding them
listofid.add(id+id1+......)
context.Table.Where(q => listOfIds.Contains(q.Id+q.id1+.......));
I tried this on EF Core 5.0.3 with the Postgres provider.
context.Table
.Select(entity => new
{
Entity = entity,
CompositeKey = entity.Id1 + entity.Id2,
})
.Where(x => compositeKeys.Contains(x.CompositeKey))
.Select(x => x.Entity);
This produced SQL like:
SELECT *
FROM table AS t
WHERE t.Id1 + t.Id2 IN (#__compositeKeys_0)),
Caveats
this should only be used where the combination of Id1 and Id2 will always produce a unique result (e.g., they're both UUIDs)
this cannot use indexes, though you could save the composite key to the db with an index

LINQ: Collapsing a series of strings into a set of "ranges"

I have an array of strings similar to this (shown on separate lines to illustrate the pattern):
{ "aa002","aa003","aa004","aa005","aa006","aa007", // note that aa008 is missing
"aa009"
"ba023","ba024","ba025"
"bb025",
"ca002","ca003",
"cb004",
...}
...and the goal is to collapse those strings into this comma-separated string of "ranges":
"aa002-aa007,aa009,ba023-ba025,bb025,ca002-ca003,cb004, ... "
I want to collapse them so I can construct a URL. There are hundreds of elements, but I can still convey all the information if I collapse them this way - putting them all into a URL "longhand" (it has to be a GET, not a POST) isn't feasible.
I've had the idea to separate them into groups using the first two characters as the key - but does anyone have any clever ideas for collapsing those sequences (without gaps) into ranges? I'm struggling with it, and everything I've come up with looks like spaghetti.
So the first thing that you need to do is parse the strings. It's important to have the alphabetic prefix and the integer value separately.
Next you want to group the items on the prefix.
For each of the items in that group, you want to order them by number, and then group items while the previous value's number is one less than the current item's number. (Or, put another way, while the previous item plus one is equal to the current item.)
Once you've grouped all of those items you want to project that group out to a value based on that range's prefix, as well as the first and last number. No other information from these groups is needed.
We then flatten the list of strings for each group into just a regular list of strings, since once we're all done there is no need to separate out ranges from different groups. This is done using SelectMany.
When that's all said and done, that, translated into code, is this:
public static IEnumerable<string> Foo(IEnumerable<string> data)
{
return data.Select(item => new
{
Prefix = item.Substring(0, 2),
Number = int.Parse(item.Substring(2))
})
.GroupBy(item => item.Prefix)
.SelectMany(group => group.OrderBy(item => item.Number)
.GroupWhile((prev, current) =>
prev.Number + 1 == current.Number)
.Select(range =>
RangeAsString(group.Key,
range.First().Number,
range.Last().Number)));
}
The GroupWhile method can be implemented like so:
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
this IEnumerable<T> source, Func<T, T, bool> predicate)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
List<T> list = new List<T>() { iterator.Current };
T previous = iterator.Current;
while (iterator.MoveNext())
{
if (!predicate(previous, iterator.Current))
{
yield return list;
list = new List<T>();
}
list.Add(iterator.Current);
previous = iterator.Current;
}
yield return list;
}
}
And then the simple helper method to convert each range into a string:
private static string RangeAsString(string prefix, int start, int end)
{
if (start == end)
return prefix + start;
else
return string.Format("{0}{1}-{0}{2}", prefix, start, end);
}
Here's a LINQ version without the need to add new extension methods:
var data2 = data.Skip(1).Zip(data, (d1, d0) => new
{
value = d1,
jump = d1.Substring(0, 2) == d0.Substring(0, 2)
? int.Parse(d1.Substring(2)) - int.Parse(d0.Substring(2))
: -1,
});
var agg = new { f = data.First(), t = data.First(), };
var query2 =
data2
.Aggregate(new [] { agg }.ToList(), (a, x) =>
{
var last = a.Last();
if (x.jump == 1)
{
a.RemoveAt(a.Count() - 1);
a.Add(new { f = last.f, t = x.value, });
}
else
{
a.Add(new { f = x.value, t = x.value, });
}
return a;
});
var query3 =
from q in query2
select (q.f) + (q.f == q.t ? "" : "-" + q.t);
I get these results:

Categories