How to dynamically GroupBy using Linq - c#

There are several similar sounding posts, but none that do exactly what I want.
Okay, so imagine that I have the following data structure (simplified for this LinqPad example)
public class Row
{
public List<string> Columns { get; set; }
}
public List<Row> Data
=> new List<Row>
{
new Row { Columns = new List<string>{ "A","C","Field3"}},
new Row { Columns = new List<string>{ "A","D","Field3"}},
new Row { Columns = new List<string>{ "A","C","Field3"}},
new Row { Columns = new List<string>{ "B","D","Field3"}},
new Row { Columns = new List<string>{ "B","C","Field3"}},
new Row { Columns = new List<string>{ "B","D","Field3"}},
};
For the property "Data", the user will tell me which column ordinals to GroupBy; they may say "don't group by anything", or they may say "group by Column[1]" or "group by Column[0] and Column[1]".
If I want to group by a single column, I can use:
var groups = Data.GroupBy(d => d.Columns[i]);
And if I want to group by 2 columns, I can use:
var groups = Data.GroupBy(d => new { A = d.Columns[i1], B = d.Columns[i2] });
However, the number of columns is variable (zero -> many); Data could contain hundreds of columns and the user may want to GroupBy dozens of columns.
So the question is, how can I create this GroupBy at runtime (dynamically)?
Thanks
Griff

With that Row data structure, what you are asking for is relatively easy.
Start by implementing a custom IEqualityComparer<IEnumerable<string>>:
public class ColumnEqualityComparer : EqualityComparer<IEnumerable<string>>
{
public static readonly ColumnEqualityComparer Instance = new ColumnEqualityComparer();
private ColumnEqualityComparer() { }
public override int GetHashCode(IEnumerable<string> obj)
{
if (obj == null) return 0;
// You can implement better hash function
int hashCode = 0;
foreach (var item in obj)
hashCode ^= item != null ? item.GetHashCode() : 0;
return hashCode;
}
public override bool Equals(IEnumerable<string> x, IEnumerable<string> y)
{
if (x == y) return true;
if (x == null || y == null) return false;
return x.SequenceEqual(y);
}
}
Now you can have a method like this:
public IEnumerable<IGrouping<IEnumerable<string>, Row>> GroupData(IEnumerable<int> columnIndexes = null)
{
if (columnIndexes == null) columnIndexes = Enumerable.Empty<int>();
return Data.GroupBy(r => columnIndexes.Select(c => r.Columns[c]), ColumnEqualityComparer.Instance);
}
Note that the grouping Key type is IEnumerable<string> and contains the row values selected by the columnIndexes parameter. That is why a custom equality comparer is needed: otherwise the keys would be compared by reference, which doesn't produce the required behavior.
For instance, to group by columns 0 and 2 you could use something like this:
var result = GroupData(new [] { 0, 2 });
Passing null or an empty columnIndexes effectively produces a single group, i.e. no grouping.
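As a variation on the approach above, here is a minimal sketch that materializes each key as a string[] (so the key is not re-enumerated on every comparison) and uses an order-sensitive hash instead of XOR. The names StringArrayComparer, DynamicGrouping and GroupByColumns are made up for this illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Row
{
    public List<string> Columns { get; set; }
}

// Hypothetical comparer: two keys are equal when they hold the same strings in the same order.
public sealed class StringArrayComparer : IEqualityComparer<string[]>
{
    public static readonly StringArrayComparer Instance = new StringArrayComparer();

    public bool Equals(string[] x, string[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(string[] obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            // Order-sensitive combination, so ("A","C") and ("C","A") usually hash differently.
            int hash = 17;
            foreach (var item in obj)
                hash = hash * 31 + (item != null ? item.GetHashCode() : 0);
            return hash;
        }
    }
}

public static class DynamicGrouping
{
    // Groups rows by the values found at the given column ordinals.
    // No ordinals (or null) yields one group containing every row.
    public static IEnumerable<IGrouping<string[], Row>> GroupByColumns(
        IEnumerable<Row> rows, params int[] columnIndexes)
    {
        var indexes = columnIndexes ?? new int[0];
        return rows.GroupBy(
            r => indexes.Select(c => r.Columns[c]).ToArray(),
            StringArrayComparer.Instance);
    }
}
```

For the sample Data in the question, DynamicGrouping.GroupByColumns(Data, 0, 1) should produce four groups (A/C, A/D, B/D, B/C), and calling it with no ordinals should produce a single group of six rows.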

You can use a recursive function to build the lambda expression dynamically, but the columns must be hard-coded in the function.

Related

How to compare two csv files by 2 columns?

I have 2 csv files
1.csv
spain;russia;japan
italy;russia;france
2.csv
spain;russia;japan
india;iran;pakistan
I read both files and add data to lists
var lst1= File.ReadAllLines("1.csv").ToList();
var lst2= File.ReadAllLines("2.csv").ToList();
Then I find all unique strings from both lists and add it to result lists
var rezList = lst1.Except(lst2).Union(lst2.Except(lst1)).ToList();
rezlist contains this data
[0] = "italy;russia;france"
[1] = "india;iran;pakistan"
Now I want to do the same comparison (Except and Union), but based only on the second and third columns of each row.
1.csv
spain;russia;japan
italy;russia;france
2.csv
spain;russia;japan
india;iran;pakistan
I think I need to split all rows on the ';' character and then perform all three operations (Except, Distinct and Union), but I cannot work out how.
rezList must contain
india;iran;pakistan
I added class
class StringLengthEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
...
}
public int GetHashCode(string obj)
{
...
}
}
StringLengthEqualityComparer stringLengthComparer = new StringLengthEqualityComparer();
var rezList = lst1.Except(lst2,stringLengthComparer ).Union(lst2.Except(lst1,stringLengthComparer),stringLengthComparer).ToList();
Your question is not very clear: for instance, is india;iran;pakistan the desired result primarily because russia is at element[1]? Isn't it also included because element [2] pakistan does not match france and japan? Even though that's unclear, I assume the desired result comes from either situation.
Then there is this: find all unique string from both lists, which changes the nature dramatically. So, I take it that the desired results are because "iran" appears in column[1] and nowhere else in column[1] in either file, and even if it did, that row would still be unique due to "pakistan" in col[2].
Also note that a data sample of 2 leaves room for a fair amount of error.
Trying to do it in one step makes it very confusing. Since eliminating dupes found in 1.CSV is pretty easy, do it first:
// parse "1.CSV"
List<string[]> lst1 = File.ReadAllLines(@"C:\Temp\1.csv").
Select(line => line.Split(';')).
ToList();
// parse "2.CSV"
List<string[]> lst2 = File.ReadAllLines(@"C:\Temp\2.csv").
Select(line => line.Split(';')).
ToList();
// extracting once speeds things up in the next step
// and leaves open the possibility of iterating in a method
List<List<string>> tgts = new List<List<string>>();
tgts.Add(lst1.Select(z => z[1]).Distinct().ToList());
tgts.Add(lst1.Select(z => z[2]).Distinct().ToList());
var tmpLst = lst2.Where(x => !tgts[0].Contains(x[1]) ||
!tgts[1].Contains(x[2])).
ToList();
That results in the items which are not in 1.CSV (no matching text in Col[1] nor Col[2]). If that is really all you need, you are done.
Getting unique rows within 2.CSV is trickier because you have to actually count the number of times each Col[1] item occurs to see if it is unique; then repeat for Col[2]. This uses GroupBy:
var unique = tmpLst.
GroupBy(g => g[1], (key, values) =>
new GroupItem(key,
values.ToArray()[0],
values.Count())
).Where(q => q.Count == 1).
GroupBy(g => g.Data[2], (key, values) => new
{
Item = string.Join(";", values.ToArray()[0]),
Count = values.Count()
}
).Where(q => q.Count == 1).Select(s => s.Item).
ToList();
The GroupItem class is trivial:
class GroupItem
{
public string Item { set; get; } // debug aide
public string[] Data { set; get; }
public int Count { set; get; }
public GroupItem(string n, string[] d, int c)
{
Item = n;
Data = d;
Count = c;
}
public override string ToString()
{
return string.Join(";", Data);
}
}
It starts with tmpLst and gets the rows with a unique element at [1]. It uses a class for storage since at this point we need the array data for further review.
The second GroupBy acts on those results, this time looking at col[2]. Finally, it selects the joined string data.
Results
Using 50,000 random items in File1 (1.3 MB), 15,000 in File2 (390 kb). There were no naturally occurring unique items, so I manually made 8 unique in 2.CSV and copied 2 of them into 1.CSV. The copies in 1.CSV should eliminate 2 of the 8 unique rows in 2.CSV, making the expected result 6 unique rows.
NepalX and ItalyX were the repeats in both files and they correctly eliminated each other.
With each step it is scanning and working with less and less data, which seems to make it pretty fast for 65,000 rows / 130,000 data elements.
The GetHashCode() method in your EqualityComparer is buggy. Fixed version:
public int GetHashCode(string obj)
{
return obj.Split(';')[1].GetHashCode();
}
Now the result is correct:
// one result: "india;iran;pakistan"
By the way, "StringLengthEqualityComparer" is not a good name ;-)
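To make the hash/equality relationship concrete, here is a minimal sketch of a complete comparer. The question elides the body of Equals, so this assumes, purely for illustration, that two lines count as equal when their second column matches, which is consistent with the fixed GetHashCode above. SecondColumnComparer and CsvDiff are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: lines are "equal" when their second ';'-separated column matches.
public sealed class SecondColumnComparer : IEqualityComparer<string>
{
    private static string Key(string line)
    {
        return line.Split(';')[1];
    }

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return Key(x) == Key(y);
    }

    // Must agree with Equals: equal lines always produce the same hash code.
    public int GetHashCode(string obj)
    {
        return obj == null ? 0 : Key(obj).GetHashCode();
    }
}

public static class CsvDiff
{
    // Lines that appear (by the comparer's notion of equality) in only one of the two inputs.
    public static List<string> SymmetricDifference(
        IEnumerable<string> a, IEnumerable<string> b, IEqualityComparer<string> comparer)
    {
        var first = a.ToList();
        var second = b.ToList();
        return first.Except(second, comparer)
                    .Union(second.Except(first, comparer), comparer)
                    .ToList();
    }
}
```

With the two sample files from the question, this sketch should return the single line india;iran;pakistan, because both rows of 1.csv share "russia" with a row of 2.csv. A comparer with a different Equals rule (for example, both the second and third columns must match) would give a different result.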
private List<string> GetUnion(List<string> lst1, List<string> lst2)
{
List<string> lstUnion = new List<string>();
foreach (string value in lst1)
{
// Split once; only the second and third columns are compared
string[] columns = value.Split(';');
string valueColumn2 = columns[1];
string valueColumn3 = columns[2];
string result = lst2.FirstOrDefault(s => s.Contains(";" + valueColumn2 + ";" + valueColumn3));
if (result != null)
{
if (!lstUnion.Contains(result))
{
lstUnion.Add(result);
}
}
}
// Rows of lst2 whose second and third columns match a row in lst1
return lstUnion;
}
class Program
{
static void Main(string[] args)
{
var lst1 = File.ReadLines(@"D:\test\1.csv").Select(x => new StringWrapper(x)).ToList();
var lst2 = File.ReadLines(@"D:\test\2.csv").Select(x => new StringWrapper(x));
var set = new HashSet<StringWrapper>(lst1);
set.SymmetricExceptWith(lst2);
foreach (var x in set)
{
Console.WriteLine(x.Value);
}
}
}
struct StringWrapper : IEquatable<StringWrapper>
{
public string Value { get; }
private readonly string _comparand0;
private readonly string _comparand14;
public StringWrapper(string value)
{
Value = value;
var split = value.Split(';');
_comparand0 = split[0];
_comparand14 = split[14];
}
public bool Equals(StringWrapper other)
{
return string.Equals(_comparand0, other._comparand0, StringComparison.OrdinalIgnoreCase)
&& string.Equals(_comparand14, other._comparand14, StringComparison.OrdinalIgnoreCase);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
return obj is StringWrapper && Equals((StringWrapper) obj);
}
public override int GetHashCode()
{
unchecked
{
return ((_comparand0 != null ? StringComparer.OrdinalIgnoreCase.GetHashCode(_comparand0) : 0)*397)
^ (_comparand14 != null ? StringComparer.OrdinalIgnoreCase.GetHashCode(_comparand14) : 0);
}
}
}

How to get distinct values with LINQ or a lambda?

I have a list of items, and I am trying to get the unique items by distinct keys.
The class:
class TempClass
{
public string One { get; set; }
public string Two { get; set; }
public string Key
{
get
{
return "Key_" + One + "_" + Two;
}
}
}
I build the dummy list as follows:
List<TempClass> l = new List<TempClass>()
{
new TempClass(){ One="Da" , Two = "Mi"},
new TempClass(){ One="Da" , Two = "Mi"},
new TempClass(){ One="Da" , Two = "Mi"},
new TempClass(){ One="Mi" , Two = "Da"},
new TempClass(){ One="Mi" , Two = "Da"},
};
My question is: how do I get only one item per unique key? Here "Key_Da_Mi" and "Key_Mi_Da" should count as the same key, so only one of the items above should remain.
How can I achieve that?
Group each of the items on a HashSet of strings containing both keys, use HashSet's set comparer to compare the items as sets (sets are unordered) and then pull out the first (or whichever) item from each group:
var distinct = l.GroupBy(item => new HashSet<string>() { item.One, item.Two },
HashSet<string>.CreateSetComparer())
.Select(group => group.First());
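A self-contained sketch of the same idea, for readers who want to try it in isolation. Pair and UnorderedDistinct are hypothetical stand-ins for the question's TempClass and are not from the original post. HashSet<string>.CreateSetComparer() returns a comparer that treats two sets as equal when they contain the same elements, regardless of order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's TempClass.
public class Pair
{
    public string One { get; set; }
    public string Two { get; set; }
}

public static class UnorderedDistinct
{
    // Keeps the first item of each group of pairs that hold the same two values in any order.
    public static List<Pair> DistinctIgnoringOrder(IEnumerable<Pair> items)
    {
        return items
            .GroupBy(p => new HashSet<string> { p.One, p.Two },
                     HashSet<string>.CreateSetComparer())
            .Select(g => g.First())
            .ToList();
    }
}
```

Applied to the five dummy items from the question, this should return a single item (One = "Da", Two = "Mi"). One consequence of using a set as the key: a pair whose two values are identical collapses to a one-element set.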
You should either implement equality comparison, or implement IEqualityComparer<T> with your specific logic:
class TempClassEqualityComparer : IEqualityComparer<TempClass>
{
public bool Equals(TempClass x, TempClass y)
{
if (Object.ReferenceEquals(x, y)) return true;
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
// For comparison check both combinations
return (x.One == y.One && x.Two == y.Two) || (x.One == y.Two && x.Two == y.One);
}
public int GetHashCode(TempClass x)
{
if (Object.ReferenceEquals(x, null)) return 0;
return x.One.GetHashCode() ^ x.Two.GetHashCode();
}
}
Then you can use this comparer in Distinct method:
var result = l.Distinct(new TempClassEqualityComparer());
Just order them before you create the key.
public string Key
{
get{
List<string> l = new List<string>{One, Two};
l = l.OrderBy(x => x).ToList();
return "Key_" + string.Join("_", l);
}
}

Case insensitive group on multiple columns

Is there any way to write a LINQ2SQL query that does something similar to this:
var result = source.GroupBy(a => new { a.Column1, a.Column2 });
or
var result = from s in source
group s by new { s.Column1, s.Column2 } into c
select new { Column1 = c.Key.Column1, Column2 = c.Key.Column2 };
but with ignoring the case of the contents of the grouped columns?
You can pass StringComparer.InvariantCultureIgnoreCase to the GroupBy extension method.
var result = source.GroupBy(a => new { a.Column1, a.Column2 },
StringComparer.InvariantCultureIgnoreCase);
Or you can use ToUpperInvariant on each field, as suggested by Hamlet Hakobyan in a comment. I recommend ToUpperInvariant or ToUpper rather than ToLower or ToLowerInvariant because upper-casing is optimized for programmatic comparison.
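A minimal sketch of the ToUpperInvariant variant for an in-memory sequence (LINQ to Objects). It normalizes the case inside the anonymous key, so no comparer argument is needed, because anonymous types compare by value. Record, CaseInsensitiveGrouping and CountGroups are hypothetical names, non-null column values are assumed, and whether a given LINQ provider can translate the call to SQL is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical source type with two string columns.
public class Record
{
    public string Column1 { get; set; }
    public string Column2 { get; set; }
}

public static class CaseInsensitiveGrouping
{
    // Groups by both columns, ignoring case, and returns each normalized key with its row count.
    public static List<(string Column1, string Column2, int Count)> CountGroups(
        IEnumerable<Record> source)
    {
        return source
            .GroupBy(a => new
            {
                Column1 = a.Column1.ToUpperInvariant(),
                Column2 = a.Column2.ToUpperInvariant()
            })
            .Select(g => (Column1: g.Key.Column1, Column2: g.Key.Column2, Count: g.Count()))
            .ToList();
    }
}
```

The trade-off is that the group keys come back upper-cased. If the original casing matters, take it from a member of the group (for example g.First().Column1) rather than from the key.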
I couldn't get NaveenBhat's solution to work, getting a compile error:
The type arguments for method
'System.Linq.Enumerable.GroupBy<TSource,TKey>(System.Collections.Generic.IEnumerable<TSource>,
System.Func<TSource,TKey>,
System.Collections.Generic.IEqualityComparer<TKey>)' cannot be
inferred from the usage. Try specifying the type arguments explicitly.
To make it work, I found it easiest and clearest to define a new class to store my key columns (GroupKey), then a separate class that implements IEqualityComparer (KeyComparer). I can then call
var result= source.GroupBy(r => new GroupKey(r), new KeyComparer());
The KeyComparer class does compare the strings with the InvariantCultureIgnoreCase comparer, so kudos to NaveenBhat for pointing me in the right direction.
Simplified versions of my classes:
private class GroupKey
{
public string Column1{ get; set; }
public string Column2{ get; set; }
public GroupKey(SourceObject r) {
this.Column1 = r.Column1;
this.Column2 = r.Column2;
}
}
private class KeyComparer: IEqualityComparer<GroupKey>
{
bool IEqualityComparer<GroupKey>.Equals(GroupKey x, GroupKey y)
{
if (!x.Column1.Equals(y.Column1, StringComparison.InvariantCultureIgnoreCase)) return false;
if (!x.Column2.Equals(y.Column2, StringComparison.InvariantCultureIgnoreCase)) return false;
return true;
//my actual code is more complex than this, more columns to compare
//and handles null strings, but you get the idea.
}
int IEqualityComparer<GroupKey>.GetHashCode(GroupKey obj)
{
return 0.GetHashCode() ; // forces calling Equals
//Note, it would be more efficient to do something like
//string hcode = obj.Column1.ToLower() + obj.Column2.ToLower();
//return hcode.GetHashCode();
//but my object is more complex than this simplified example
}
}
I had the same issue grouping by the values of DataRow objects from a Table, but I just used .ToString() on the DataRow object to get past the compiler issue, e.g.
MyTable.AsEnumerable().GroupBy(
dataRow => dataRow["Value"].ToString(),
StringComparer.InvariantCultureIgnoreCase)
instead of
MyTable.AsEnumerable().GroupBy(
dataRow => dataRow["Value"],
StringComparer.InvariantCultureIgnoreCase)
I've expanded on Bill B's answer to make things a little more dynamic and to avoid hardcoding the column properties in the GroupKey and IEqualityComparer<>.
private class GroupKey
{
public List<string> Columns { get; } = new List<string>();
public GroupKey(params string[] columns)
{
foreach (var column in columns)
{
// Using 'ToUpperInvariant()' if user calls Distinct() after
// the grouping, matching strings with a different case will
// be dropped and not duplicated
Columns.Add(column.ToUpperInvariant());
}
}
}
private class KeyComparer : IEqualityComparer<GroupKey>
{
bool IEqualityComparer<GroupKey>.Equals(GroupKey x, GroupKey y)
{
for (var i = 0; i < x.Columns.Count; i++)
{
if (!x.Columns[i].Equals(y.Columns[i], StringComparison.OrdinalIgnoreCase)) return false;
}
return true;
}
int IEqualityComparer<GroupKey>.GetHashCode(GroupKey obj)
{
var hashcode = obj.Columns[0].GetHashCode();
for (var i = 1; i < obj.Columns.Count; i++)
{
var column = obj.Columns[i];
// *397 is normally generated by ReSharper to create more unique hash values
// So I added it here
// (do keep in mind that multiplying each hash code by the same prime is more prone to hash collisions than using a different prime initially)
hashcode = (hashcode * 397) ^ (column != null ? column.GetHashCode() : 0);
}
return hashcode;
}
}
Usage:
var result = source.GroupBy(r => new GroupKey(r.Column1, r.Column2, r.Column3), new KeyComparer());
This way, you can pass any number of columns into the GroupKey constructor.

LINQ (or something else) to compare a pair of values from two lists (in any order)?

Basically, I have two IEnumerable<FooClass>s where each FooClass instance contains 2 properties: FirstName, LastName.
The instances in each of the enumerables are NOT the same. Instead, I need to check against the properties on each of the instances. I'm not sure of the most efficient way to do this, but basically I need to make sure that both lists contain similar data (not the same instance, but the same values on the properties). I don't have access to the FooClass itself to modify it.
I should say that the FooClass is a type of Attribute class, which has access to the Attribute.Match() method, so I don't need to check each property individually.
Based on the comments, I've updated the question to be more specific and changed it slightly... This is what I have so far:
public void Foo()
{
var info = typeof(MyClass);
var attributes = info.GetCustomAttributes(typeof(FooAttribute), false) as IEnumerable<FooAttribute>;
var validateAttributeList = new Collection<FooAttribute>
{
new FooAttribute(typeof(int), typeof(double)),
new FooAttribute(typeof(int), typeof(float))
};
//Make sure that the each item in validateAttributeList is contained in
//the attributes list (additional items in the attributes list don't matter).
//I know I can use the Attribute.Match(obj) to compare.
}
Enumerable.SequenceEqual will tell you if the two sequences are identical.
If FooClass has an overridden Equals method that compares the FirstName and LastName, then you should be able to write:
bool equal = List1.SequenceEqual(List2);
If FooClass doesn't have an overridden Equals method, then you need to create an IEqualityComparer<FooClass>:
class FooComparer: IEqualityComparer<FooClass>
{
public bool Equals(FooClass f1, FooClass f2)
{
return (f1.FirstName == f2.FirstName) && (f1.LastName == f2.LastName);
}
public int GetHashCode(FooClass f)
{
return f.FirstName.GetHashCode() ^ f.LastName.GetHashCode();
}
}
and then you write:
var comparer = new FooComparer();
bool identical = List1.SequenceEqual(List2, comparer);
You can do it this way:
Define a custom IEqualityComparer<FooAttribute> :
class FooAttributeComparer : IEqualityComparer<FooAttribute>
{
public bool Equals(FooAttribute x, FooAttribute y)
{
return x.Match(y);
}
public int GetHashCode(FooAttribute obj)
{
return 0;
// This makes lookups complexity O(n) but it could be reasonable for small lists
// or if you're not sure about GetHashCode() implementation to do.
// If you want more speed you could return e.g. :
// return obj.Field1.GetHashCode() ^ (17 * obj.Field2.GetHashCode());
}
}
Define an extension method to compare lists in any order and having the same number of equal elements:
public static bool ListContentIsEqualInAnyOrder<T>(
this IEnumerable<T> list1, IEnumerable<T> list2, IEqualityComparer<T> comparer)
{
var lookup1 = list1.ToLookup(x => x, comparer);
var lookup2 = list2.ToLookup(x => x, comparer);
if (lookup1.Count != lookup2.Count)
return false;
return lookup1.All(el1 => lookup2.Contains(el1.Key) &&
lookup2[el1.Key].Count() == el1.Count());
}
Usage example:
static void Main(string[] args)
{
List<FooAttribute> attrs = new List<FooAttribute>
{
new FooAttribute(typeof(int), typeof(double)),
new FooAttribute(typeof(int), typeof(double)),
new FooAttribute(typeof(bool), typeof(float)),
new FooAttribute(typeof(uint), typeof(string)),
};
List<FooAttribute> attrs2 = new List<FooAttribute>
{
new FooAttribute(typeof(uint), typeof(string)),
new FooAttribute(typeof(int), typeof(double)),
new FooAttribute(typeof(int), typeof(double)),
new FooAttribute(typeof(bool), typeof(float)),
};
// this returns true
var listEqual1 = attrs.ListContentIsEqualInAnyOrder(attrs2, new FooAttributeComparer());
// this returns false
attrs2.RemoveAt(1);
var listEqual2 = attrs.ListContentIsEqualInAnyOrder(attrs2, new FooAttributeComparer());
}
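The extension method above checks that both lists hold the same elements with the same counts. The question, as updated, only asks that every item of validateAttributeList be present in the attributes list, with extra attributes ignored. A minimal sketch of that weaker check follows; SequenceChecks and ContainsAll are hypothetical names, and the match rule is passed in as a delegate so the sketch does not depend on how FooAttribute is defined.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceChecks
{
    // True when every required item has at least one matching item in actual.
    // Extra items in actual are ignored; duplicates in required need only one match.
    public static bool ContainsAll<T>(
        IEnumerable<T> actual, IEnumerable<T> required, Func<T, T, bool> matches)
    {
        var actualList = actual.ToList(); // enumerate the source only once
        return required.All(r => actualList.Any(a => matches(a, r)));
    }
}
```

For the attribute case, the call would look something like SequenceChecks.ContainsAll(attributes, validateAttributeList, (a, v) => a.Match(v)). This is a nested scan, O(n·m), which seems reasonable for the handful of attributes a type usually carries.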
Assuming that
The lists both fit in memory and are unsorted
Case doesn't matter
Names don't contain the character "!"
Names do not contain duplicates:
then
var setA = new HashSet<String>(
firstEnumerable.Select(i => i.FirstName.ToUpper() + "!" + i.LastName.ToUpper()));
var setB = new HashSet<String>(
secondEnumerable.Select(i => i.FirstName.ToUpper() + "!" + i.LastName.ToUpper()));
return setA.SetEquals(setB);

LINQ, SelectMany with multiple possible outcomes

I have a situation where I have lists of objects that have to be merged. Each object in the list will have a property that explains how it should be treated in the merger. So assume the following..
enum Cascade {
Full,
Unique,
Right,
Left
}
class Note {
public int Id { get; set; }
public Cascade Cascade { get; set; }
// lots of other data.
}
var list1 = new List<Note>{
new Note {
Id = 1,
Cascade = Cascade.Full,
// data
},
new Note {
Id = 2,
Cascade = Cascade.Right,
// data
}
};
var list2 = new List<Note>{
new Note {
Id = 1,
Cascade = Cascade.Left,
// data
}
};
var list3 = new List<Note>{
new Note {
Id = 1,
Cascade = Cascade.Unique,
// data similar to list1.Note[0]
}
};
So then, I'll have a method ...
Composite(this IList<IList<Note>> notes){
return new List<Note> {
notes.SelectMany(g => g).Where(g => g.Cascade == Cascade.Full).ToList()
// Here is the problem...
.SelectMany(g => g).Where(g => g.Cascade == Cascade.Right)
.Select( // I want to do a _LastOrDefault_ )
// continuing for the other cascades.
}
}
This is where I get lost. I need to do multiple SelectMany statements, but I don't know how to. But this is the expected behavior.
Cascade.Full
The Note will be in the final collection no matter what.
Cascade.Unique
The Note will be in the final collection one time, ignoring any duplicates.
Cascade.Left
The Note will be in the final collection, First instances superseding subsequent instances. (So then, Notes 1, 2, 3 are identical. Note 1 gets pushed through)
Cascade.Right
The Note will be in the final collection, Last instance superseding duplicates. (So Notes 1, 2, 3 are identical. Note 3 gets pushed through)
I think you should decompose the problem into smaller parts. For example, you can implement the cascade rules for an individual list in a separate extension method. Here's my untested take at it:
public static IEnumerable<Note> ApplyCascades(this IEnumerable<Note> notes)
{
var uniques = new HashSet<Note>();
Note rightToYield = null;
// Declared outside the loop so the flag persists across notes
bool leftYielded = false;
foreach (var n in notes)
{
if (n.Cascade == Cascade.Full) yield return n;
if (n.Cascade == Cascade.Left && !leftYielded)
{
yield return n;
leftYielded = true;
}
if (n.Cascade == Cascade.Right)
{
rightToYield = n;
}
if (n.Cascade == Cascade.Unique && !uniques.Contains(n))
{
yield return n;
uniques.Add(n);
}
}
if (rightToYield != null) yield return rightToYield;
}
With this method, the original extension method could be implemented something like this:
List<Note> Composite(IList<IList<Note>> notes)
{
var result = from list in notes
from note in list.ApplyCascades()
select note;
return result.ToList();
}
