C# Intersection and Union not working correctly

I am using C# 4.0 in VS 2010 and trying to produce either an intersection or a union of n sets of objects.
The following works correctly:
IEnumerable<String> t1 = new List<string>() { "one", "two", "three" };
IEnumerable<String> t2 = new List<string>() { "three", "four", "five" };
List<String> tInt = t1.Intersect(t2).ToList<String>();
List<String> tUnion = t1.Union(t2).ToList<String>();
// this also works
t1 = t1.Union(t2);
// as does this (but not at the same time!)
t1 = t1.Intersect(t2);
However, the following doesn't. These are code snippets.
My class is:
public class ICD10
{
public string ICD10Code { get; set; }
public string ICD10CodeSearchTitle { get; set; }
}
In the following:
IEnumerable<ICD10Codes> codes = Enumerable.Empty<ICD10Codes>();
IEnumerable<ICD10Codes> codesTemp;
List<List<String>> terms;
// I create terms here ----
// and then ...
foreach (List<string> item in terms)
{
// the following line produces the correct results
codesTemp = dataContextCommonCodes.ICD10Codes.Where(e => item.Any(k => e.ICD10CodeSearchTitle.Contains(k)));
if (codes.Count() == 0)
{
codes = codesTemp;
}
else if (intersectionRequired)
{
codes = codes.Intersect(codesTemp, new ICD10Comparer());
}
else
{
codes = codes.Union(codesTemp, new ICD10Comparer());
}
}
return codes;
The above only ever returns the results of the last item searched.
I also added my own comparer just in case, but this made no difference:
public class ICD10Comparer : IEqualityComparer<ICD10Codes>
{
public bool Equals(ICD10Codes Code1, ICD10Codes Code2)
{
if (Code1.ICD10Code == Code2.ICD10Code) { return true; }
return false;
}
public int GetHashCode(ICD10Codes Code1)
{
return Code1.ICD10Code.GetHashCode();
}
}
I am certain I am overlooking something obvious - I just cannot see what it is!

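The line return codes; returns a deferred enumerable. None of the queries has been executed to fill the set at that point, although codes.Count() does run the queries built so far on each pass through the loop.
This deferred execution is a problem because of the closure issue: in C# 4.0 the foreach variable item is shared by every iteration, so by the time the returned query is enumerated, every lambda sees the item from the last loop iteration. (From C# 5 onward, each iteration gets its own copy of the variable.)
Resolve this by forcing the queries to execute inside each loop iteration: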
if (codes.Count() == 0)
{
codes = codesTemp.ToList();
}
else if (intersectionRequired)
{
codes = codes.Intersect(codesTemp, new ICD10Comparer()).ToList();
}
else
{
codes = codes.Union(codesTemp, new ICD10Comparer()).ToList();
}
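The effect described above can be reproduced without a database. The following is only a sketch: the class and method names are made up for illustration, and a variable declared outside the loop stands in for the shared foreach variable of C# 4.0, so the behaviour is the same on newer compilers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredClosureDemo
{
    // Builds one deferred query per term list. Every lambda captures the same
    // 'item' variable, as the C# 4.0 foreach variable would be captured.
    public static List<string> Lazy(List<List<string>> terms, IEnumerable<string> source)
    {
        IEnumerable<string> result = Enumerable.Empty<string>();
        List<string> item = null; // one variable shared by all lambdas
        for (int i = 0; i < terms.Count; i++)
        {
            item = terms[i];
            // Nothing runs here; 'item' is read only when the query is enumerated.
            IEnumerable<string> temp = source.Where(s => item.Any(k => s.Contains(k)));
            result = result.Union(temp);
        }
        return result.ToList(); // every Where now sees the last 'item'
    }

    // Same loop, but each step is materialised with ToList() while 'item'
    // still holds the value for that iteration.
    public static List<string> Eager(List<List<string>> terms, IEnumerable<string> source)
    {
        IEnumerable<string> result = Enumerable.Empty<string>();
        List<string> item = null;
        for (int i = 0; i < terms.Count; i++)
        {
            item = terms[i];
            IEnumerable<string> temp = source.Where(s => item.Any(k => s.Contains(k)));
            result = result.Union(temp).ToList();
        }
        return result.ToList();
    }
}
```

With the source "apple pie", "banana split", "cherry tart" and the term lists { "apple" } and { "cherry" }, Lazy returns only "cherry tart", while Eager returns "apple pie" and "cherry tart".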

If you are using your own comparer, you should check that the GetHashCode function is implemented correctly. The LINQ operators use it for the comparison too. You can take a look here:
http://msdn.microsoft.com/en-us/library/system.object.gethashcode(v=vs.80).aspx
You could try changing the hash function to "return 0" to see whether it is the problem. If ICD10Code is a class object rather than a string, ICD10Code.GetHashCode may return different values for objects you consider equal.
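To illustrate why the hash function matters, here is a small sketch with two hypothetical string comparers (not the ICD10Comparer from the question). Both treat codes as equal regardless of case, but only one of them hashes consistently with that rule. Intersect looks up candidates by hash code first, so the inconsistent comparer will normally miss matches that Equals would accept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Equals ignores case, but GetHashCode is case-sensitive: the two disagree.
public sealed class InconsistentCodeComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string s)
    {
        return s.GetHashCode();
    }
}

// Equals and GetHashCode both ignore case, so equal items share a hash code.
public sealed class ConsistentCodeComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string s)
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(s);
    }
}

public static class ComparerDemo
{
    // Counts the codes common to two small sample sets under the given comparer.
    public static int CountCommon(IEqualityComparer<string> comparer)
    {
        var first = new[] { "A10", "B20" };
        var second = new[] { "a10", "C30" };
        return first.Intersect(second, comparer).Count();
    }
}
```

ComparerDemo.CountCommon(new ConsistentCodeComparer()) finds the one shared code. The comparer in the question hashes and compares the same string property, so it does not have this particular problem.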

Your problem is definitely not connected to the Intersect or Union LINQ extension methods. I've just tested the following:
var t1 = new List<ICD10>()
{
new ICD10() { ICD10Code = "123" },
new ICD10() { ICD10Code = "234" },
new ICD10() { ICD10Code = "345" }
};
var t2 = new List<ICD10>()
{
new ICD10() { ICD10Code = "234" },
new ICD10() { ICD10Code = "456" }
};
// returns a list with just one element - the one with ICD10Code == "234"
var results = t1.Intersect(t2, new ICD10Comparer()).ToList();
// returns a list with 4 elements
var results2 = t1.Union(t2, new ICD10Comparer()).ToList();
This uses your ICD10 and ICD10Comparer class declarations. Everything works just fine! You have to look for the bug in your own code, because LINQ works as expected.

Related

Is there a way to compare two lists of known equal size functionally with a given comparison function in C#?

I am trying to compare two arrays of known equal size and compare items by index. Normally the following code is what I would use to achieve this:
public bool CompareLists(object[] arr1, object[] arr2)
{
for (int i = 0; i < arr1.Length; ++i)
{
if (!( <compare arr1[i] to arr2[i]; may not be equality comparison> ))
return false;
}
return true;
}
Is there a way to do this functionally in C#? What about using Linq? I am trying to see if something like the following is possible:
return arr.Compare(arr2, (item1, item2) => < some comparison here > );
I think you're looking for SequenceEqual. You can provide it an IEqualityComparer. Example from MSDN:
Product[] storeA = { new Product { Name = "apple", Code = 9 },
new Product { Name = "orange", Code = 4 } };
Product[] storeB = { new Product { Name = "apple", Code = 9 },
new Product { Name = "orange", Code = 4 } };
bool equalAB = storeA.SequenceEqual(storeB, new ProductComparer());
Console.WriteLine("Equal? " + equalAB);
If you want to avoid creating a class that implements IEqualityComparer for each comparison, you'll have to create your own overload of the SequenceEqual method.
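One way to get the lambda-based call from the question without writing a comparer class is to combine Zip and All. This is only a sketch; PairwiseAll is a made-up name, not a framework method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairwiseExtensions
{
    // Hypothetical helper: true when both lists have the same length and
    // 'compare' holds for every pair of items at the same index.
    public static bool PairwiseAll<T>(this IList<T> first, IList<T> second, Func<T, T, bool> compare)
    {
        if (first.Count != second.Count) return false;
        // Zip pairs items by index and applies 'compare' to each pair.
        return first.Zip(second, compare).All(ok => ok);
    }
}
```

Usage would look like arr1.PairwiseAll(arr2, (item1, item2) => item1 < item2). The comparison does not have to be an equality test, which is the main difference from SequenceEqual.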

Convert object to a queryable

I used a modified version of this answer: How to dynamically create a class in C#? to create a dynamic object that represents a typed class.
public static object CreateNewObject(string[] columnNames)
{
var myType = CompileResultType(columnNames);
return Activator.CreateInstance(myType) as IQueryable;
}
Then in the main app:
var obj = MyTypeBuilder.CreateNewObject(rs.ColumnNames);
I need to somehow convert that to an IQueryable so I can make some LINQ calls on it, such as .Where(), .Select(), etc. Naturally, I can't at the moment, because my app doesn't know exactly what is in that object, or what that object is.
So what I need is:
var obj = MyTypeBuilder.CreateNewObject(rs.ColumnNames);
List<obj> aListICanFill = new List<obj>();
..
aListICanFill.where(x => x.Equals("")).take(3);
I've blindly tried different casts, and even tried and failed to iterate through the object, and now I'm completely stuck.
Is there any way to do this?
http://msdn.microsoft.com/en-us/library/bb882637.aspx seems to be something I should hook onto.
What my object looks like:
If you can use List<dynamic>, you can use the Where and Select IEnumerable<T> extension methods as shown below. This does not work with IQueryable, because its methods take expression trees, which cannot contain dynamic operations.
using System;
using System.Collections.Generic;
using System.Linq;
namespace DynamicListTest
{
internal class Program
{
private static void Main(string[] args)
{
var dynamicObjects = GetDynamicObjects().Cast<dynamic>().AsEnumerable();
var itemsToPrint = dynamicObjects
.Where(item => item.Age > 30);
foreach (var item in itemsToPrint)
{
Console.WriteLine(item);
}
Console.ReadKey();
}
private static IQueryable GetDynamicObjects()
{
return new List<dynamic>()
{
new { Name = "A", Age = 10 },
new { Name = "B", Age = 20 },
new { Name = "C", Age = 30 },
new { Name = "D", Age = 40 },
new { Name = "E", Age = 50 },
}.AsQueryable();
}
}
}
This prints
{ Name = D, Age = 40 }
{ Name = E, Age = 50 }
Check out LINQ to Objects:
http://msdn.microsoft.com/en-us/library/bb397919.aspx
Hopefully your object contains an array?
Could you give a sample of how you want to query it? And also what CompileResultType does?
var myType = CompileResultType(columnNames);
EDIT
For future reference: as suggested by Shane, the OP is trying out Dynamic LINQ (dynamiclinq.codeplex.com).

A VFP-Cursor in C#?

I have an old Visual FoxPro program which I need to rewrite in C#.
There we used VFP cursors to read .txt files and load them into temporary cursors.
For example, it looks like this in FoxPro (mb5b is the mb5b text file):
SELECT werk,matnr,ALLTRIM(matnr)+ALLTRIM(werk) as matwerk,sum(zugang) as zugang,sum(abgang) as abgang INTO CURSOR mb5b_temp FROM mb5b GROUP BY werk,matnr
Those cursors don't exist in C#. (I didn't find anything like them.)
So I'm creating a DataTable and inserting the rows into it while reading the file.
DataTable dt_mb5b_temp = new DataTable();
dt_mb5b_temp.Columns.Add("matnr");
dt_mb5b_temp.Columns.Add("werk");
dt_mb5b_temp.Columns.Add("matwerk");
dt_mb5b_temp.Columns.Add("zugang");
dt_mb5b_temp.Columns.Add("abgang");
while ((mb5bline = sr_mb5b.ReadLine()) != null)
{
DataRow dr = dt_mb5b_temp.NewRow();
string[] mb5b = mb5bline.Split(new Char[] { '|' });
dr["matnr"] = mb5b[1].Trim();
dr["werk"] = mb5b[2].Trim();
dr["matwerk"] = mb5b[1].Trim() + mb5b[2].Trim();
dr["zugang"] = mb5b[6].Trim();
dr["abgang"] = mb5b[7].Trim();
dt_mb5b_temp.Rows.Add(dr); // the new row must be added to the table
}
I thought I might be able to use DataTable.Select() with a select statement like the one above, but it doesn't work, and no other solution comes to mind at the moment.
Of course I could also insert the data into a database and then use a select, but I'm trying to avoid this (it would need two extra tables, and I think those inserts and selects would take a long time).
Is there any way to get this working?
Thanks!
If you need any more information, please ask.
Look at this site: http://www.dotnetperls.com/readline
using System;
using System.Collections.Generic;
using System.IO;
class Program
{
static void Main()
{
const string f = "TextFile1.txt";
// 1
// Declare new List.
List<string> lines = new List<string>();
// 2
// Use using StreamReader for disposing.
using (StreamReader r = new StreamReader(f))
{
// 3
// Use while != null pattern for loop
string line;
while ((line = r.ReadLine()) != null)
{
// 4
// Insert logic here.
// ...
// "line" is a line in the file. Add it to our List.
lines.Add(line);
}
}
// 5
// Print out all the lines.
foreach (string s in lines)
{
Console.WriteLine(s);
}
}
}
Output
(Prints contents of TextFile1.txt)
This is a text file I created,
Just for this article.
Group by on an IEnumerable:
class Pet
{
public string Name { get; set; }
public int Age { get; set; }
}
// Uses method-based query syntax.
public static void GroupByEx1()
{
// Create a list of pets.
List<Pet> pets =
new List<Pet>{ new Pet { Name="Barley", Age=8 },
new Pet { Name="Boots", Age=4 },
new Pet { Name="Whiskers", Age=1 },
new Pet { Name="Daisy", Age=4 } };
// Group the pets using Age as the key value
// and selecting only the pet's Name for each value.
IEnumerable<IGrouping<int, string>> query =
pets.GroupBy(pet => pet.Age, pet => pet.Name);
// Iterate over each IGrouping in the collection.
foreach (IGrouping<int, string> petGroup in query)
{
// Print the key value of the IGrouping.
Console.WriteLine(petGroup.Key);
// Iterate over each value in the
// IGrouping and print the value.
foreach (string name in petGroup)
Console.WriteLine(" {0}", name);
}
}
/*
This code produces the following output:
8
Barley
4
Boots
Daisy
1
Whiskers
*/
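Putting the two pieces together, the FoxPro SELECT from the question could be approximated with GroupBy and Sum over the parsed lines. This is a sketch under assumptions: the class names are invented, the column positions are taken from the question's parsing code, and the amounts are assumed to parse as invariant-culture decimals.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical row type for one parsed line of the mb5b file.
public class Mb5bRow
{
    public string Werk { get; set; }
    public string Matnr { get; set; }
    public decimal Zugang { get; set; }
    public decimal Abgang { get; set; }
}

// Hypothetical result type, one per (werk, matnr) group.
public class Mb5bTotal
{
    public string Werk { get; set; }
    public string Matnr { get; set; }
    public string Matwerk { get; set; }
    public decimal Zugang { get; set; }
    public decimal Abgang { get; set; }
}

public static class Mb5bQuery
{
    // Parses one '|'-separated line; indexes follow the question's code.
    public static Mb5bRow Parse(string line)
    {
        string[] f = line.Split('|');
        return new Mb5bRow
        {
            Matnr = f[1].Trim(),
            Werk = f[2].Trim(),
            // Assumption: numbers use invariant-culture formatting.
            Zugang = decimal.Parse(f[6].Trim(), CultureInfo.InvariantCulture),
            Abgang = decimal.Parse(f[7].Trim(), CultureInfo.InvariantCulture)
        };
    }

    // Rough equivalent of: SELECT werk, matnr, matnr+werk, SUM(zugang), SUM(abgang)
    // ... GROUP BY werk, matnr
    public static List<Mb5bTotal> Summarize(IEnumerable<string> lines)
    {
        return lines
            .Select(line => Parse(line))
            .GroupBy(r => new { r.Werk, r.Matnr })
            .Select(g => new Mb5bTotal
            {
                Werk = g.Key.Werk,
                Matnr = g.Key.Matnr,
                Matwerk = g.Key.Matnr + g.Key.Werk,
                Zugang = g.Sum(r => r.Zugang),
                Abgang = g.Sum(r => r.Abgang)
            })
            .ToList();
    }
}
```

Summarize could be fed from File.ReadLines or from the List<string> built in the ReadLine example above. The result is an in-memory list, which plays roughly the role of the temporary cursor.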

Case insensitive group on multiple columns

Is there any way to do a LINQ2SQL query similar to this:
var result = source.GroupBy(a => new { a.Column1, a.Column2 });
or
var result = from s in source
group s by new { s.Column1, s.Column2 } into c
select new { Column1 = c.Key.Column1, Column2 = c.Key.Column2 };
but with ignoring the case of the contents of the grouped columns?
You can pass StringComparer.InvariantCultureIgnoreCase to the GroupBy extension method.
var result = source.GroupBy(a => new { a.Column1, a.Column2 },
StringComparer.InvariantCultureIgnoreCase);
Or you can use ToUpperInvariant on each field, as suggested by Hamlet Hakobyan in a comment. I recommend ToUpperInvariant or ToUpper rather than ToLower or ToLowerInvariant because it is optimized for programmatic comparison purposes.
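The ToUpperInvariant variant might look like the sketch below. It runs against in-memory objects (LINQ to Objects); SourceRow and CountGroups are invented names, and whether a LINQ to SQL provider translates the call is something to check separately. Anonymous types compare their properties by value, so no comparer argument is needed once the key columns are normalised.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the rows in 'source'.
public class SourceRow
{
    public string Column1 { get; set; }
    public string Column2 { get; set; }
}

public static class UpperKeyGrouping
{
    // Groups case-insensitively by upper-casing each key column and returns
    // (Column1, Column2, count) for every group.
    // Assumption: neither column is null (ToUpperInvariant would throw).
    public static List<Tuple<string, string, int>> CountGroups(IEnumerable<SourceRow> source)
    {
        return source
            .GroupBy(a => new
            {
                Column1 = a.Column1.ToUpperInvariant(),
                Column2 = a.Column2.ToUpperInvariant()
            })
            .Select(g => Tuple.Create(g.Key.Column1, g.Key.Column2, g.Count()))
            .ToList();
    }
}
```

One consequence of this approach is that the group keys come back upper-cased rather than in their original casing.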
I couldn't get NaveenBhat's solution to work, getting a compile error:
The type arguments for method
'System.Linq.Enumerable.GroupBy<TSource,TKey>(System.Collections.Generic.IEnumerable<TSource>,
System.Func<TSource,TKey>,
System.Collections.Generic.IEqualityComparer<TKey>)' cannot be
inferred from the usage. Try specifying the type arguments explicitly.
To make it work, I found it easiest and clearest to define a new class to store my key columns (GroupKey), then a separate class that implements IEqualityComparer (KeyComparer). I can then call
var result= source.GroupBy(r => new GroupKey(r), new KeyComparer());
The KeyComparer class does compare the strings with the InvariantCultureIgnoreCase comparer, so kudos to NaveenBhat for pointing me in the right direction.
Simplified versions of my classes:
private class GroupKey
{
public string Column1{ get; set; }
public string Column2{ get; set; }
public GroupKey(SourceObject r) {
this.Column1 = r.Column1;
this.Column2 = r.Column2;
}
}
private class KeyComparer: IEqualityComparer<GroupKey>
{
bool IEqualityComparer<GroupKey>.Equals(GroupKey x, GroupKey y)
{
if (!x.Column1.Equals(y.Column1, StringComparison.InvariantCultureIgnoreCase)) return false;
if (!x.Column2.Equals(y.Column2, StringComparison.InvariantCultureIgnoreCase)) return false;
return true;
//my actual code is more complex than this, more columns to compare
//and handles null strings, but you get the idea.
}
int IEqualityComparer<GroupKey>.GetHashCode(GroupKey obj)
{
return 0.GetHashCode() ; // forces calling Equals
//Note, it would be more efficient to do something like
//string hcode = obj.Column1.ToLower() + obj.Column2.ToLower();
//return hcode.GetHashCode();
//but my object is more complex than this simplified example
}
}
I had the same issue grouping by the values of DataRow objects from a Table, but I just used .ToString() on the DataRow object to get past the compiler issue, e.g.
MyTable.AsEnumerable().GroupBy(
dataRow => dataRow["Value"].ToString(),
StringComparer.InvariantCultureIgnoreCase)
instead of
MyTable.AsEnumerable().GroupBy(
dataRow => dataRow["Value"],
StringComparer.InvariantCultureIgnoreCase)
I've expanded on Bill B's answer to make things a little more dynamic and to avoid hardcoding the column properties in the GroupKey and IEqualityComparer<>.
private class GroupKey
{
public List<string> Columns { get; } = new List<string>();
public GroupKey(params string[] columns)
{
foreach (var column in columns)
{
// Using 'ToUpperInvariant()' if user calls Distinct() after
// the grouping, matching strings with a different case will
// be dropped and not duplicated
Columns.Add(column.ToUpperInvariant());
}
}
}
private class KeyComparer : IEqualityComparer<GroupKey>
{
bool IEqualityComparer<GroupKey>.Equals(GroupKey x, GroupKey y)
{
for (var i = 0; i < x.Columns.Count; i++)
{
if (!x.Columns[i].Equals(y.Columns[i], StringComparison.OrdinalIgnoreCase)) return false;
}
return true;
}
int IEqualityComparer<GroupKey>.GetHashCode(GroupKey obj)
{
var hashcode = obj.Columns[0].GetHashCode();
for (var i = 1; i < obj.Columns.Count; i++)
{
var column = obj.Columns[i];
// *397 is normally generated by ReSharper to create more unique hash values
// So I added it here
// (do keep in mind that multiplying each hash code by the same prime is more prone to hash collisions than using a different prime initially)
hashcode = (hashcode * 397) ^ (column != null ? column.GetHashCode() : 0);
}
return hashcode;
}
}
Usage:
var result = source.GroupBy(r => new GroupKey(r.Column1, r.Column2, r.Column3), new KeyComparer());
This way, you can pass any number of columns into the GroupKey constructor.

LINQ, SelectMany with multiple possible outcomes

I have a situation where I have lists of objects that have to be merged. Each object in the list will have a property that explains how it should be treated in the merger. So assume the following..
enum Cascade {
Full,
Unique,
Right,
Left
}
class Note {
    public int Id { get; set; }
    public Cascade Cascade { get; set; }
    // lots of other data.
}
var list1 = new List<Note>{
    new Note {
        Id = 1,
        Cascade = Cascade.Full,
        // data
    },
    new Note {
        Id = 2,
        Cascade = Cascade.Right,
        // data
    }
};
var list2 = new List<Note>{
    new Note {
        Id = 1,
        Cascade = Cascade.Left,
        // data
    }
};
var list3 = new List<Note>{
    new Note {
        Id = 1,
        Cascade = Cascade.Unique,
        // data similar to list1.Note[0]
    }
};
So then, I'll have a method ...
Composite(this IList<IList<Note>> notes){
return new List<Note> {
notes.SelectMany(g => g).Where(g => g.Cascade == Cascade.Full).ToList()
// Here is the problem...
.SelectMany(g => g).Where(g => g.Cascade == Cascade.Right)
.Select( // I want to do a _LastOrDefault_ )
// continuing for the other cascades.
}
}
This is where I get lost. I need to do multiple SelectMany statements, but I don't know how to. But this is the expected behavior.
Cascade.Full
The Note will be in the final collection no matter what.
Cascade.Unique
The Note will be in the final collection one time, ignoring any duplicates.
Cascade.Left
The Note will be in the final collection, first instance superseding subsequent instances. (So if Notes 1, 2, 3 are identical, Note 1 gets pushed through.)
Cascade.Right
The Note will be in the final collection, last instance superseding duplicates. (So if Notes 1, 2, 3 are identical, Note 3 gets pushed through.)
I think you should decompose the problem into smaller parts. For example, you can implement the cascade rules for an individual list in a separate extension method. Here's my untested take on it:
public static class NoteExtensions
{
    public static IEnumerable<Note> ApplyCascades(this IEnumerable<Note> notes)
    {
        // Without a custom comparer, HashSet<Note> treats two notes as
        // duplicates only when they are the same instance.
        var uniques = new HashSet<Note>();
        Note rightToYield = null;
        bool leftYielded = false; // declared outside the loop so it is not reset on each iteration
        foreach (var n in notes)
        {
            if (n.Cascade == Cascade.Full) yield return n;
            if (n.Cascade == Cascade.Left && !leftYielded)
            {
                yield return n;
                leftYielded = true;
            }
            if (n.Cascade == Cascade.Right)
            {
                rightToYield = n; // keep only the last one seen
            }
            if (n.Cascade == Cascade.Unique && uniques.Add(n))
            {
                yield return n;
            }
        }
        if (rightToYield != null) yield return rightToYield;
    }
}
This method would let you implement the original extension method something like this:
List<Note> Composite(IList<IList<Note>> notes)
{
var result = from list in notes
from note in list.ApplyCascades()
select note;
return result.ToList();
}
