Assume there is this class:
public class Foo
{
public int Id { get; set; }
public int? NullableId { get; set; }
public Foo(int id, int? nullableId)
{
Id = id;
NullableId = nullableId;
}
}
I need to compare these objects by following rules:
If both objects have value for NullableId then we compare both Id
and NullableId
If some of the objects/both of them do not have NullableId then
ignore it and compare only Id.
To achieve it I have overwritten Equals and GetHashCode like this:
public override bool Equals(object obj)
{
var otherFoo = (Foo)obj;
var equalityCondition = Id == otherFoo.Id;
if (NullableId.HasValue && otherFoo.NullableId.HasValue)
equalityCondition &= (NullableId== otherFoo.NullableId);
return equalityCondition;
}
public override int GetHashCode()
{
var hashCode = 806340729;
hashCode = hashCode * -1521134295 + Id.GetHashCode();
return hashCode;
}
Further down I have two lists of Foo:
var first = new List<Foo> { new Foo(1, null) };
var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3) };
Next, I want to join these lists. If I do it like this:
var result = second.Join(first, s => s, f => f, (f, s) => new {f, s}).ToList();
then the result would be as I expected and I will get 3 items.
But, if I change order and join first with second:
var result = first.Join(second, f => f, s => s, (f, s) => new {f, s}).ToList();
then the result would only have 1 item - new Foo(1, null) and new Foo(1 ,3)
I can not get what am I doing wrong. If try to put a break point in Equals method then I can see that it tries to compare items from same list (e. g. compare new Foo(1, 1) and new Foo(1 ,2)). For me it looks like that happens because of Lookup that is being created inside Join method.
Could someone clarify what happens there? What should I change to achieve desired behavior?
Your Equals method is reflexive and symmetric, but it is not transitive.
Your implementation doesn't meet the requirements specified in the docs:
If (x.Equals(y) && y.Equals(z)) returns true, then x.Equals(z) returns true.
from https://learn.microsoft.com/en-us/dotnet/api/system.object.equals?view=netframework-4.8
For example, suppose you have:
var x = new Foo(1, 100);
var y = new Foo(1, null);
var z = new Foo(1, 200);
You have x.Equals(y) and y.Equals(z) which implies that you should also have x.Equals(z), but your implementation does not do this. Since you don't meet the specification, you can't expect any algorithms reliant on your Equals method to behave correctly.
You ask what you can do instead. This depends on exactly what you need to do. Part of the problem is that it's not really clear what is intended in the corner-cases, if indeed they can appear. What should happen if one Id appears multiple times with the same NullableId in one or both lists? For a simple example, if new Foo(1, 1) exists in the first list three times, and the second list three times, what should be in the output? Nine items, one for each pairing?
Here's a naive attempt to solve your problem. This joins on only Id and then filters out any pairings that have incompatible NullableId. But you might not be expecting the duplicates when an Id appears multiple times in each list, as can be seen in the example output.
using System;
using System.Linq;
using System.Collections.Generic;
public class Foo
{
public int Id { get; set; }
public int? NullableId { get; set; }
public Foo(int id, int? nullableId)
{
Id = id;
NullableId = nullableId;
}
public override string ToString() => $"Foo({Id}, {NullableId?.ToString()??"null"})";
}
class MainClass {
public static IEnumerable<Foo> JoinFoos(IEnumerable<Foo> first, IEnumerable<Foo> second) {
return first
.Join(second, f=>f.Id, s=>s.Id, (f,s) => new {f,s})
.Where(fs =>
fs.f.NullableId == null ||
fs.s.NullableId == null ||
fs.f.NullableId == fs.s.NullableId)
.Select(fs => new Foo(fs.f.Id, fs.f.NullableId ?? fs.s.NullableId));
}
public static void Main (string[] args) {
var first = new List<Foo> { new Foo(1, null), new Foo(1, null), new Foo(1, 3) };
var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3), new Foo(1, null) };
foreach (var f in JoinFoos(first, second)) {
Console.WriteLine(f);
}
}
}
Output:
Foo(1, 1)
Foo(1, 2)
Foo(1, 3)
Foo(1, null)
Foo(1, 1)
Foo(1, 2)
Foo(1, 3)
Foo(1, null)
Foo(1, 3)
Foo(1, 3)
It also might be too slow for you if you have tens of thousands of items with the same Id, because it builds up every possible pair with matching Id before filtering them out. If each list has 10,000 items with Id == 1 then that's 100,000,000 pairs to pick through.
My answer contains a program that I believe is better than the one proposed in Weeble's answer but first I would like to demonstrate how the Join method works and talk about problems I see in your approach.
As you can see here https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.join?view=netframework-4.8
the Join method
Correlates the elements of two sequences based on matching keys.
If the keys don't match then elements from both collections are not included. For example, remove your Equals and GetHashCode methods and try this code:
var first = new List<Foo> { new Foo(1, 1) };
var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3) };
//This is your original code that returns no results
var result = second.Join(first, s => s, f => f, (f, s) => new { f, s }).ToList();
result = first.Join(second, s => s, f => f, (f, s) => new { f, s }).ToList();
//This code is mine and it returns in both calls of the Join method one element in the resulting collection; the element contains two instances of Foo (1,1) - f and s
result = second.Join(first, s => new { s.Id, s.NullableId }, f => new { f.Id, f.NullableId }, (f, s) => new { f, s }).ToList();
result = first.Join(second, s => new { s.Id, s.NullableId }, f => new { f.Id, f.NullableId }, (f, s) => new { f, s }).ToList();
But if you set your original data input that contains null with my code:
var first = new List<Foo> { new Foo(1, null) };
var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3) };
var result = second.Join(first, s => new { s.Id, s.NullableId }, f => new { f.Id, f.NullableId }, (f, s) => new { f, s }).ToList();
result = first.Join(second, s => new { s.Id, s.NullableId }, f => new { f.Id, f.NullableId }, (f, s) => new { f, s }).ToList();
the result variable will be empty in both cases since the key { 1, null } doesn't match any other key, i.e. { 1, 1 }, { 1, 2 }, { 1, 3 }.
Now returning to your question. I would suggest you reconsider your entire approach in cases like this and here is why. Let us imagine that your implementation of the Equals and GetHashCode methods worked as you expected and you even didn't post your question. Then your solution creates the following outcomes, as I see it:
To understand how your code calculates its output the user of your code has to have access to the code of the Foo type and spend time reviewing your implementation of the Equals and GetHashCode methods (or reading documentation).
With such implementation of the Equals and GetHashCode methods, you are trying to change the expected behavior of the Join method. The user may expect that the first element of the first collection Foo(1, null) will not be considered equal to the first element of the second collection Foo(1, 1).
Let us imagine that you have multiple classes to join, each is written by some individual, and each class has its own logic in the Equals and GetHashCode methods. To figure out how actually your joining works with each type the user instead of looking into a joining method implementation only once would need to check the source code of all those classes trying to understand how each type handles its own comparison facing different variations of things like this with magic numbers (taken from your code):
public override int GetHashCode()
{
var hashCode = 806340729;
hashCode = hashCode * -1521134295 + Id.GetHashCode();
return hashCode;
}
It may don't seem a big problem but imagine you are a new person on the
the project, you have a lot of classes with logic like this and limited time
to complete your task, e.g. you have an urgent change request, huge sets
of data input, and no unit tests.
If someone inherites from your class Foo and put an instance of Foo1 to the collection among with Foo instances:
public class Foo1 : Foo
{
public Foo1(int id, int? nullableId) : base (id, nullableId)
{
Id = id;
NullableId = nullableId;
}
public override bool Equals(object obj)
{
var otherFoo1 = (Foo1)obj;
return Id == otherFoo1.Id;
}
public override int GetHashCode()
{
var hashCode = 806340729;
hashCode = hashCode * -1521134295 + Id.GetHashCode();
return hashCode;
}
}
var first = new List<Foo> { new Foo1(1, 1) };
var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3)};
var result = second.Join(first, s => s, f => f, (f, s) => new { f, s }).ToList();
result = first.Join(second, s => s, f => f, (f, s) => new { f, s }).ToList();
then you have here a run-time exception in the Equals method of the type Foo1:
System.InvalidCastException, Message=Unable to cast object of type
'ConsoleApp1.Foo' to type 'ConsoleApp1.Foo1'. With the same input data, my code
would work fine in this situation:
var result = second.Join(first, s => s.Id, f => f.Id, (f, s) => new { f, s }).ToList();
result = first.Join(second, s => s.Id, f => f.Id, (f, s) => new { f, s }).ToList();
With your implementation of the Equals and GetHashCode methods when someone modifies the joining code like this:
var result = second.Join(first, s => new { s.Id, s.NullableId }, f => new { f.Id, f.NullableId }, (f, s) => new { f, s }).ToList();
result = first.Join(second, s => new { s.Id, s.NullableId }, f => new { f.Id, f.NullableId }, (f, s) => new { f, s }).ToList();
then your logic in the Equals and GetHashCode methods will be ignored and
you will have a different result.
In my opinion, this approach (with overriding Equals and GetHashCode methods) may be a source of multiple bugs. I think it is better when your code performing joining has an implementation that can be understood without any extra information, the implementation of the logic is concentrated within one method, the implementation is clear, predictable, maintainable, and it is simple to understand.
Please also note that with your input data:
var first = new List<Foo> { new Foo(1, null) };
var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3) };
the code in the Weeble's answer generates the following output:
Foo(1, 1)
Foo(1, 2)
Foo(1, 3)
while as far as I understand you asked for an implementation that with the input produces output that looks like this:
Foo(1, null), Foo(1, 1)
Foo(1, null), Foo(1, 2)
Foo(1, null), Foo(1, 3)
Please consider updating your solution with my code since it produces a result in the format you asked for, my code is easier to understand, and it has other advantages as you can see:
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp40
{
public class Foo
{
public int Id { get; set; }
public int? NullableId { get; set; }
public Foo(int id, int? nullableId)
{
Id = id;
NullableId = nullableId;
}
public override string ToString() => $"Foo({Id}, {NullableId?.ToString() ?? "null"})";
}
class Program
{
static void Main(string[] args)
{
var first = new List<Foo> { new Foo(1, null), new Foo(1, 5), new Foo(2, 3), new Foo(6, 2) };
var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3), new Foo(2, null) };
var result = second.Join(first, s=>s.Id, f=>f.Id, (f, s) => new { f, s })
.Where(o => !((o.f.NullableId != null && o.s.NullableId != null) &&
(o.f.NullableId != o.s.NullableId)));
foreach (var o in result) {
Console.WriteLine(o.f + ", " + o.s);
}
Console.ReadLine();
}
}
}
Output:
Foo(1, 1), Foo(1, null)
Foo(1, 2), Foo(1, null)
Foo(1, 3), Foo(1, null)
Foo(2, null), Foo(2, 3)
I got a class:
public class result
{
public int A;
public int B;
public int C;
}
I make a list of it:
public static List<result> results = new List<result>();
I then fill that list with random data
Somthing like 10,000,000 enteries, where a, b and c will have a value of 0 to 24.
I would like to show in my console what combo has been found and how many
in SQL it would be somthing like:
SELECT A, B, C, COUNT(*) AS total
FROM results
GROUP BY A, B, C
And i have tried so many things, i think i can write a book about it.
some of the stuff i tried:
var query1 = results.GroupBy(x => new { x.A, x.B, x.C }).Select(group => new { Value = group.Key, Count = group.Count() });
var query2 = from r in results group r by new { r.A, r.B, r.C } into rGroup select rGroup;
var query3 = results.GroupBy(x => new { x.A, x.B, x.C }) .Where(g => g.Count() > 1).ToDictionary(x => x.Key, y => y.Count());
var query4 = from r in results
group r by new { r.A, r.B, r.C } into rGroup
select new { key = rGroup.Key, cnt = rGroup.Count() };
But nothing seems to work.
I would like to get back a list with the a,b,c values and a count of how many have been found.
Yet im unable to get it working, i tried hours of googleing and tried everything, at this point i am completly lost.
For completeness sake a full example.
Same solutions as nlawalker though, producing a dict.
public class result
{
public int A;
public int B;
public int C;
public result(int a, int b, int c)
{
A = a;
B = b;
C = c;
}
}
static void Main(string[] args)
{
Random r = new Random(23);
var data = new List<result>();
for (int i = 0; i < 100; i++)
data.Add(new result(r.Next(1, 3), r.Next(1, 3), r.Next(1, 3)));
var dic = data
.GroupBy(k => new { k.A, k.B, k.C })
.ToDictionary(g => g.Key, g => g.Count());
foreach (var kvp in dic)
Console.WriteLine($"({kvp.Key.A},{kvp.Key.B},{kvp.Key.C}) : {kvp.Value}");
Console.ReadLine();
}
Output:
(2,2,2) : 13
(1,2,2) : 11
(2,1,1) : 9
(1,1,1) : 16
(1,2,1) : 14
(1,1,2) : 15
(2,1,2) : 7
(2,2,1) : 15
You got the GroupBy part right - you just need to select the groups into another object that has the A, B and C values of the group, along with the count of the group:
results.GroupBy(x => new { x.A, x.B, x.C })
.Select(g => new { g.Key.A, g.Key.B, g.Key.C, Count = g.Count()})
I have a dictionary that contains mapping of employee and his/her manager like this
Dictionary<string, string> employees = new Dictionary<string, string>()
{
{ "A","C" },
{ "B","C" },
{ "C","F" },
{ "D","E" },
{ "E","F" },
{ "F","F" }
};
I want to get no of employees under each manager in the hierarchy not just their direct reports but down the hierarchy chain.
In the above dictionary the root node/ceo is listed as reporting to himself. This is the only node that is guaranteed to have this self relationship.
How can I find total no of employees reporting to each manager. Output should be
A - 0
B - 0
C - 2
D - 0
E - 1
F - 5
This is what I tried so far but it only gives counts of direct reports not all reports in the chain
var reports = employees
.GroupBy(e => e.Value, (key, g) => new { employee = key, reports = g.Count() });
The problem you describe is virtually identical to the problem described in this blog post.
Your spec could be written as (this is a trivial adaptation from the quoted text):
The complete set of reports upon which an Employee depends is the transitive closure of the directly-reports-to relationship.
The post then proceeds to provide the following code to compute the transitive closure of a particular relationship:
static HashSet<T> TransitiveClosure<T>(
this Func<T, IEnumerable<T>> relation,
T item)
{
var closure = new HashSet<T>();
var stack = new Stack<T>();
stack.Push(item);
while (stack.Count > 0)
{
T current = stack.Pop();
foreach (T newItem in relation(current))
{
if (!closure.Contains(newItem))
{
closure.Add(newItem);
stack.Push(newItem);
}
}
}
return closure;
}
So all that's left is to provide the code for the directly-reports-to relationship.
This can be easily computed by creating a lookup from your dictionary mapping each employee to their reports:
var directReportLookup = employees.ToLookup(pair => pair.Value, pair => pair.Key);
Func<string, IEnumerable<string>> directReports =
employee => directReportLookup[employee];
employees.Select(pair => new
{
Employee = pair.Key,
Count = directReports.TransitiveClosure(pair.Key).Count,
});
You might be interested in a recursion in the anonymous methods. There is one interesting approach from the functional languages: fixed-point combinator.
It looks like this:
public static class Combinator
{
public static Func<TInput, TResult> Y<TInput, TResult>(Func<Func<TInput,TResult>, TInput, TResult> function)
{
return input => function(Y(function), input);
}
}
And can be used like this:
var result = employees
.Select(employee => Combinator.Y<string, int>
(
(f, e) => employees.Where(x => x.Value == e && x.Value != x.Key)
.Aggregate(employees.Count(x => x.Value == e && x.Value != x.Key), (current, next) => current + f(next.Key))
)
.Invoke(employee.Key))
.ToList();
Of course, it will be more useful for simler tasks, like this:
var fact = Combinator.Y<int, int>((f, n) => n > 1 ? n * f(n - 1) : 1);
var fib = Combinator.Y<uint, int>((f, n) => n > 2 ? f(n - 1) + f(n - 2) : (n == 0 ? 0 : 1));