LINQ, SelectMany with multiple possible outcomes - c#

I have a situation where I have lists of objects that have to be merged. Each object in the list will have a property that explains how it should be treated in the merger. So assume the following..
enum Cascade {
Full,
Unique,
Right,
Left
}
class Note {
int Id { get; set; }
Cascade Cascade { get; set; }
// lots of other data.
}
var list1 = new List<Note>{
new Note {
Id = 1,
Cascade.Full,
// data
},
new Note {
Id = 2,
Cascade.Right,
// data
}
};
var list2 = new List<Note>{
new Note {
Id = 1,
Cascade.Left,
// data
}
};
var list3 = new List<Note>{
new Note {
Id = 1,
Cascade.Unique,
// data similar to list1.Note[0]
}
}
So then, I'll have a method ...
Composite(this IList<IList<Note>> notes){
return new List<Note> {
notes.SelectMany(g => g).Where(g => g.Cascade == Cascade.All).ToList()
// Here is the problem...
.SelectMany(g => g).Where(g => g.Cascade == Cascade.Right)
.Select( // I want to do a _LastOrDefault_ )
// continuing for the other cascades.
}
}
This is where I get lost. I need to do multiple SelectMany statements, but I don't know how to. But this is the expected behavior.
Cascade.Full
The Note will be in the final collection no matter what.
Cascade.Unique
The Note will be in the final collection one time, ignoring any duplicates.
Cascade.Left
The Note will be in the final collection, First instances superseding subsequent instances. (So then, Notes 1, 2, 3 are identical. Note 1 gets pushed through)
Cascade.Right
The Note will be in the final collection, Last instance superseding duplicates. (So Notes 1, 2, 3 are identical. Note 3 gets pushed trough)

I think you should decompose the problem in smaller parts. For example, you can implement the cascade rules for an individual list in a seperate extension method. Here's my untested take at it:
public static IEnumerable<Note> ApplyCascades(this IEnumerable<Note> notes)
{
var uniques = new HashSet<Note>();
Note rightToYield = null;
foreach (var n in notes)
{
bool leftYielded = false;
if (n.Cascade == Cascade.All) yield return n;
if (n.Cascade == Cascade.Left && !leftYielded)
{
yield return n;
leftYielded = true;
}
if (n.Cascade == Cascade.Right)
{
rightToYield = n;
}
if (n.Cascade == Cascade.Unique && !uniques.Contains(n))
{
yield return n;
uniques.Add(n);
}
}
if (rightToYield != null) yield return rightToYield;
}
}
This method would allow to implement the original extension method something like this:
List<Note> Composite(IList<IList<Note>> notes)
{
var result = from list in notes
from note in list.ApplyCascades()
select note;
return result.ToList();
}

Related

How to Update or Add to a Collection properly in Parallel processing (ForEach)

I am trying to process a list of addresses into 2 collections; a list of addresses that meets requirements; a list of scores for each of those addresses that meets requirements.
I have several different criteria to slice and dice the addresses, but I am running into an incorrect implementation of modifying a collection during parallel processing.
For instance, I take the zip code and for each (+388k records) I determine if the Levenshtein Distance is greater or equal to a filtering value. If it does, then add the address to the compiled list of addresses. Then using the SET method, determine if the score for the address already exists. If is does not, then add. Else use the score currently in the collection and update the appropriate property.
I am hitting the "Collection Has Been Modified" error and I can not think of another way to implement the solution. I am looking for alternative ways to accomplish this.
private LevenshteinDistance _ld = new LevenshteinDistance();
private int _filter_zip = int.Parse(ConfigHelper.GetAppSettingByKey("filter_zip"));
public Factory ScoreAddresses(Factory model)
{
Factory result = model;
IList<Address> _compiledList = new List<Address>();
IList<Scores> _compiledScores = new List<Scores>();
ParallelLoopResult MailLoopResult = Parallel.ForEach(
result.ALL_Addresses, factory =>
{
if (factory.MAIL_ZIP_CODE_NBR != null
&& factory.MAIL_ZIP_CODE_NBR.Length >= 1)
{
int _zipDistance = _ld.Compute(
result.Target_Address.MAIL_ZIP_CODE_NBR,
factory.MAIL_ZIP_CODE_NBR);
if (_zipDistance <= _filter_zip)
{
_compiledList.Add(factory);
Scores _score = new Scores();
_compiledScores = _score.Set(_compiledScores,
factory.INDIVIDUAL_ID, "Levenshtein_MAIL_ZIP_CODE_NBR",
_zipDistance, string.Empty, null);
}
}
});
return result;
}
public IList<Scores> Set(IList<Scores> current, int Individual_ID,
string Score_Type, int Levenshtein, string Methaphone, decimal? other)
{
int currentcount = current.Count();
bool updating = false;
IList<Scores> result = current;
try
{
Scores newscore = new Scores();
//Check if scoreing for individual already present
Scores lookingforIndv = current.AsParallel().Where(
p => p.INDIVIDUAL_ID == Individual_ID).FirstOrDefault();
if (lookingforIndv != null)
{
newscore = lookingforIndv;
updating = true;
}
else
{
newscore.INDIVIDUAL_ID = Individual_ID;
updating = false;
}
//Add score based upon indicated score type
switch (Score_Type)
{
case "Levenshtein_MAIL_ZIP_CODE_NBR":
newscore.Levenshtein_MAIL_ZIP_CODE_NBR = Levenshtein;
result.Add(newscore);
break;
//More and More Cases
}
}
catch (Exception e)
{
}
return result;
}

How to dynamically GroupBy using Linq

There are several similar sounding posts, but none that do exactly what I want.
Okay, so imagine that I have the following data structure (simplified for this LinqPad example)
public class Row
{
public List<string> Columns { get; set; }
}
public List<Row> Data
=> new List<Row>
{
new Row { Columns = new List<string>{ "A","C","Field3"}},
new Row { Columns = new List<string>{ "A","D","Field3"}},
new Row { Columns = new List<string>{ "A","C","Field3"}},
new Row { Columns = new List<string>{ "B","D","Field3"}},
new Row { Columns = new List<string>{ "B","C","Field3"}},
new Row { Columns = new List<string>{ "B","D","Field3"}},
};
For the property "Data", the user will tell me which column ordinals to GroupBy; they may say "don't group by anything", or they may say "group by Column[1]" or "group by Column[0] and Column[1]".
If I want to group by a single column, I can use:
var groups = Data.GroupBy(d => d.Columns[i]);
And if I want to group by 2 columns, I can use:
var groups = Data.GroupBy(d => new { A = d.Columns[i1], B = d.Columns[i2] });
However, the number of columns is variable (zero -> many); Data could contain hundreds of columns and the user may want to GroupBy dozens of columns.
So the question is, how can I create this GroupBy at runtime (dynamically)?
Thanks
Griff
With that Row data structure what are you asking for is relatively easy.
Start by implementing a custom IEqualityComparer<IEnumerable<string>>:
public class ColumnEqualityComparer : EqualityComparer<IEnumerable<string>>
{
public static readonly ColumnEqualityComparer Instance = new ColumnEqualityComparer();
private ColumnEqualityComparer() { }
public override int GetHashCode(IEnumerable<string> obj)
{
if (obj == null) return 0;
// You can implement better hash function
int hashCode = 0;
foreach (var item in obj)
hashCode ^= item != null ? item.GetHashCode() : 0;
return hashCode;
}
public override bool Equals(IEnumerable<string> x, IEnumerable<string> y)
{
if (x == y) return true;
if (x == null || y == null) return false;
return x.SequenceEqual(y);
}
}
Now you can have a method like this:
public IEnumerable<IGrouping<IEnumerable<string>, Row>> GroupData(IEnumerable<int> columnIndexes = null)
{
if (columnIndexes == null) columnIndexes = Enumerable.Empty<int>();
return Data.GroupBy(r => columnIndexes.Select(c => r.Columns[c]), ColumnEqualityComparer.Instance);
}
Note the grouping Key type is IEnumerable<string> and contains the selected row values specified by the columnIndexes parameter, that's why we needed a custom equality comparer (otherwise they will be compared by reference, which doesn't produce the required behavior).
For instance, to group by columns 0 and 2 you could use something like this:
var result = GroupData(new [] { 0, 2 });
Passing null or empty columnIndexes will effectively produce single group, i.e. no grouping.
you can use a Recursive function for create dynamic lambdaExpression. but you must define columns HardCode in the function.

Case insensitive group on multiple columns

Is there anyway to do a LINQ2SQL query doing something similar to this:
var result = source.GroupBy(a => new { a.Column1, a.Column2 });
or
var result = from s in source
group s by new { s.Column1, s.Column2 } into c
select new { Column1 = c.Key.Column1, Column2 = c.Key.Column2 };
but with ignoring the case of the contents of the grouped columns?
You can pass StringComparer.InvariantCultureIgnoreCase to the GroupBy extension method.
var result = source.GroupBy(a => new { a.Column1, a.Column2 },
StringComparer.InvariantCultureIgnoreCase);
Or you can use ToUpperInvariant on each field as suggested by Hamlet Hakobyan on comment. I recommend ToUpperInvariant or ToUpper rather than ToLower or ToLowerInvariant because it is optimized for programmatic comparison purpose.
I couldn't get NaveenBhat's solution to work, getting a compile error:
The type arguments for method
'System.Linq.Enumerable.GroupBy(System.Collections.Generic.IEnumerable,
System.Func,
System.Collections.Generic.IEqualityComparer)' cannot be
inferred from the usage. Try specifying the type arguments explicitly.
To make it work, I found it easiest and clearest to define a new class to store my key columns (GroupKey), then a separate class that implements IEqualityComparer (KeyComparer). I can then call
var result= source.GroupBy(r => new GroupKey(r), new KeyComparer());
The KeyComparer class does compare the strings with the InvariantCultureIgnoreCase comparer, so kudos to NaveenBhat for pointing me in the right direction.
Simplified versions of my classes:
private class GroupKey
{
public string Column1{ get; set; }
public string Column2{ get; set; }
public GroupKey(SourceObject r) {
this.Column1 = r.Column1;
this.Column2 = r.Column2;
}
}
private class KeyComparer: IEqualityComparer<GroupKey>
{
bool IEqualityComparer<GroupKey>.Equals(GroupKey x, GroupKey y)
{
if (!x.Column1.Equals(y.Column1,StringComparer.InvariantCultureIgnoreCase) return false;
if (!x.Column2.Equals(y.Column2,StringComparer.InvariantCultureIgnoreCase) return false;
return true;
//my actual code is more complex than this, more columns to compare
//and handles null strings, but you get the idea.
}
int IEqualityComparer<GroupKey>.GetHashCode(GroupKey obj)
{
return 0.GetHashCode() ; // forces calling Equals
//Note, it would be more efficient to do something like
//string hcode = Column1.ToLower() + Column2.ToLower();
//return hcode.GetHashCode();
//but my object is more complex than this simplified example
}
}
I had the same issue grouping by the values of DataRow objects from a Table, but I just used .ToString() on the DataRow object to get past the compiler issue, e.g.
MyTable.AsEnumerable().GroupBy(
dataRow => dataRow["Value"].ToString(),
StringComparer.InvariantCultureIgnoreCase)
instead of
MyTable.AsEnumerable().GroupBy(
dataRow => dataRow["Value"],
StringComparer.InvariantCultureIgnoreCase)
I've expanded on Bill B's answer to make things a little more dynamic and to avoid hardcoding the column properties in the GroupKey and IQualityComparer<>.
private class GroupKey
{
public List<string> Columns { get; } = new List<string>();
public GroupKey(params string[] columns)
{
foreach (var column in columns)
{
// Using 'ToUpperInvariant()' if user calls Distinct() after
// the grouping, matching strings with a different case will
// be dropped and not duplicated
Columns.Add(column.ToUpperInvariant());
}
}
}
private class KeyComparer : IEqualityComparer<GroupKey>
{
bool IEqualityComparer<GroupKey>.Equals(GroupKey x, GroupKey y)
{
for (var i = 0; i < x.Columns.Count; i++)
{
if (!x.Columns[i].Equals(y.Columns[i], StringComparison.OrdinalIgnoreCase)) return false;
}
return true;
}
int IEqualityComparer<GroupKey>.GetHashCode(GroupKey obj)
{
var hashcode = obj.Columns[0].GetHashCode();
for (var i = 1; i < obj.Columns.Count; i++)
{
var column = obj.Columns[i];
// *397 is normally generated by ReSharper to create more unique hash values
// So I added it here
// (do keep in mind that multiplying each hash code by the same prime is more prone to hash collisions than using a different prime initially)
hashcode = (hashcode * 397) ^ (column != null ? column.GetHashCode() : 0);
}
return hashcode;
}
}
Usage:
var result = source.GroupBy(r => new GroupKey(r.Column1, r.Column2, r.Column3), new KeyComparer());
This way, you can pass any number of columns into the GroupKey constructor.

c# Intersection and Union not working correctly

I am using C# 4.0 in VS 2010 and trying to produce either an intersection or a union of n sets of objects.
The following works correctly:
IEnumerable<String> t1 = new List<string>() { "one", "two", "three" };
IEnumerable<String> t2 = new List<string>() { "three", "four", "five" };
List<String> tInt = t1.Intersect(t2).ToList<String>();
List<String> tUnion = t1.Union(t2).ToList<String>();
// this also works
t1 = t1.Union(t2);
// as does this (but not at the same time!)
t1 = t1.Intersect(t2);
However, the following doesn't. These are code snippets.
My class is:
public class ICD10
{
public string ICD10Code { get; set; }
public string ICD10CodeSearchTitle { get; set; }
}
In the following:
IEnumerable<ICD10Codes> codes = Enumerable.Empty<ICD10Codes>();
IEnumerable<ICD10Codes> codesTemp;
List<List<String>> terms;
// I create terms here ----
// and then ...
foreach (List<string> item in terms)
{
// the following line produces the correct results
codesTemp = dataContextCommonCodes.ICD10Codes.Where(e => item.Any(k => e.ICD10CodeSearchTitle.Contains(k)));
if (codes.Count() == 0)
{
codes = codesTemp;
}
else if (intersectionRequired)
{
codes = codes.Intersect(codesTemp, new ICD10Comparer());
}
else
{
codes = codes.Union(codesTemp, new ICD10Comparer());
}
}
return codes;
The above only ever returns the results of the last item searched.
I also added my own comparer just in case, but this made no difference:
public class ICD10Comparer : IEqualityComparer<ICD10Codes>
{
public bool Equals(ICD10Codes Code1, ICD10Codes Code2)
{
if (Code1.ICD10Code == Code2.ICD10Code) { return true; }
return false;
}
public int GetHashCode(ICD10Codes Code1)
{
return Code1.ICD10Code.GetHashCode();
}
}
I am certain I am overlooking something obvious - I just cannot see what it is!
This code: return codes; returns a deferred enumerable. None of the queries have been executed to fill the set. Some queries get executed each time through the loop to make a Count though.
This deferred execution is a problem because of the closure issue... at the return, item is bound to the last loop execution.
Resolve this by forcing the queries to execute in each loop execution:
if (codes.Count() == 0)
{
codes = codesTemp.ToList();
}
else if (intersectionRequired)
{
codes = codes.Intersect(codesTemp, new ICD10Comparer()).ToList();
}
else
{
codes = codes.Union(codesTemp, new ICD10Comparer()).ToList();
}
if you are using an own comparer, you should take a look at the correct implementation of the GetHashCode function. the linq operators use this comparison too. you can take a look here:
http://msdn.microsoft.com/en-us/library/system.object.gethashcode(v=vs.80).aspx
you could try changing the hash function to "return 0", to see if it is the problem. ICD10Code.GetHashCode will return perhaps different values if it is a class object
Your problem definitely is not connect to Intersect or Union LINQ extension methods. I've just tested following:
var t1 = new List<ICD10>()
{
new ICD10() { ICD10Code = "123" },
new ICD10() { ICD10Code = "234" },
new ICD10() { ICD10Code = "345" }
};
var t2 = new List<ICD10>()
{
new ICD10() { ICD10Code = "234" },
new ICD10() { ICD10Code = "456" }
};
// returns list with just one element - the one with ICF10Code == "234"
var results = t1.Intersect(t2, new ICD10Comparer()).ToList();
// return list with 4 elements
var results2 = t1.Union(t2, new ICD10Comparer()).ToList();
Using your ICD10 and ICD10Comparer classes declarations. Everything works just fine! You have to search for bug in your custom code, because LINQ works just fine.

Sort list given a list of keys?

I have a List<Project>
A Project has an ID which is an int.
I then have a list of int which is correspond to IDs of projects.
The projects need to be processed in the order of the list of int.
There might be projects with a null ID.
Any project not having an ID or have an id not in the list will go to the bottom (or even better, be removed from the results lists).
I can think of an O(N^2) way to do this but I am wondering if there might be a better way with LINQ or something that could be more m + n or n or something...
Thanks
class Project
{
public int? id;
public Project(int? iid) { id = iid; }
}
public class Program
{
static void Main(string[] args)
{
List<Project> pros = new List<Project>() { new Project(null), new Project(10), new Project(50), new Project(1), new Project(null) };
var x = new Comparison<Project>((Project r, Project l) =>
{
if (r.id == null && l.id == null)
return 0;
if (r.id == null)
{
return 1;
}
if (l.id == null)
{
return -1;
}
return Math.Sign(r.id.Value - l.id.Value);
});
pros.Sort(x);
Console.ReadLine();
}
}
You can change who minuses who and the polarity of -1 or 1 to get the nans where you want it to go. This one pushes the nans to the end and sorts smallest to largest.
Alternatively, if you don't want to process the nans at all and don't want to sort by ID, just use a where statement to get the iterator without the null ids:
var nonulls = pros.Where(pr => (pr.id != null));
Which lazily evaluates as just the set without the nulls, doesn't actually store the intermediates so you don't have to worry about the storage issues. O(N), little to no overhead.
With next solution projects with null Id will be ignored. O(N)
int[] ids = new int[10];
List<Project> projects = new List<Project>();
var projectsDictionary = projects.ToDictionary(proj=> proj.Id, proj => proj);
var orderedProjects = ids.Select(id => projectsDictionary[id]);
Use a custom comparison to handle the null project numbers any way you like:
class Project {
int? ID { get; set; }
}
...
Comparison<Project> comparison = delegate(Project x, Project y)
{
int xkey = x.ID.HasValue ? x.ID.Value : int.MaxValue;
int ykey = y.ID.HasValue ? y.ID.Value : int.MaxValue;
return xkey.CompareTo(ykey);
};
list.Sort(comparison);
Here's a very simple LINQ way to do it. Not sure about the runtime though.
List<int?> pids = new List<int?>() { 2, 4, 3 };
List<Project> projects = new List<Project>() {
new Project(1), new Project(2),
new Project(3), new Project(4),
new Project(5), new Project(null) };
List<Project> sortedProjectsByPids = pids
.Select(pid => projects.First(p => p.ID == pid))
.ToList<Project>();
Assuming your Project class looks like the following:
class Project
{
public int? ID;
public Project(int? id)
{
ID = id;
}
}
Hope this helps!

Categories