Doing pivot with LINQ - c#

I've got this problem. I have a CSV file in the following format (customer, bought item pairs):
customer1 item1
customer1 item2
customer1 item3
customer2 item4
customer2 item2
customer3 item5
customer3 item1
customer3 item2
customer4 item1
customer4 item2
customer5 item5
customer5 item1
Now, I wish to show in the query results:
item x; item y; how many customers have bought item x and item y together
For example:
item1 item2 3 (because cust1, cust3 and cust4 bought item1 and item2 together)
item1 item5 2 (because cust3 and cust5 bought item1 and item5 together)
The query should return all possible combinations of items that customers have bought in pairs. Also notice that Pair(x, y) is the same as Pair(y, x).
An SQL query would look like this:
SELECT a1.item_id, a2.item_id, COUNT(a1.cust_id) AS how_many_custs_bought_both
FROM data AS a1
INNER JOIN data AS a2
ON a2.cust_id=a1.cust_id AND a2.item_id<>a1.item_id AND a1.item_id<a2.item_id
GROUP BY a1.item_id, a2.item_id
How would you do that in C# 1) using regular for/foreach loops, 2) using LINQ?
I tried doing it in LINQ first but got stuck when I noticed that LINQ doesn't support multiple equals keywords in a join clause. Then I tried using normal loops; however, that became so inefficient that it could only process about 30 lines (CSV file rows) per second.
Please advise!

Using LINQ (and following the first 5 lines from Tim's answer) combining the chained method syntax with the query syntax for the join part:
var custItems = new [] {
    new { customer = 1, item = 1 },
    new { customer = 1, item = 2 },
    new { customer = 1, item = 3 },
    new { customer = 2, item = 4 },
    new { customer = 2, item = 2 },
    new { customer = 3, item = 5 },
    new { customer = 3, item = 1 },
    new { customer = 3, item = 2 },
    new { customer = 4, item = 1 },
    new { customer = 4, item = 2 },
    new { customer = 5, item = 5 },
    new { customer = 5, item = 1 }
};
var pairs = custItems.GroupBy(x => x.customer)
    .Where(g => g.Count() > 1)
    .Select(x => (from a in x.Select( y => y.item )
                  from b in x.Select( y => y.item )
                  where a < b //If you want to avoid duplicate (a,b)+(b,a)
                  // or just: where a != b, if you want to keep the dupes.
                  select new { a, b}))
    .SelectMany(x => x)
    .GroupBy(x => x)
    .Select(g => new { Pair = g.Key, Count = g.Count() })
    .ToList();
pairs.ForEach(x => Console.WriteLine(x));
EDIT: Forgot that OP wanted the pair occurrence count, added another .GroupBy() magic.
EDIT: Completed the example to show what it would output:
{ Pair = { a = 1, b = 2 }, Count = 3 }
{ Pair = { a = 1, b = 3 }, Count = 1 }
{ Pair = { a = 2, b = 3 }, Count = 1 }
{ Pair = { a = 2, b = 4 }, Count = 1 }
{ Pair = { a = 1, b = 5 }, Count = 2 }
{ Pair = { a = 2, b = 5 }, Count = 1 }
EDIT: rolled back and changed strings to integers, as OP shows a dataset with integers as IDs, and that removes the need for .GetHashCode()

Perhaps:
var lines = File.ReadLines(csvFilePath);
var custItems = lines
    .Select(l => new { split = l.Split() })
    .Select(x => new { customer = x.split[0].Trim(), item = x.split[1].Trim() })
    .ToList();
var groups = from ci1 in custItems
             join ci2 in custItems
             on ci1.customer equals ci2.customer
             where ci1.item != ci2.item
             group new { Item1 = ci1.item, Item2 = ci2.item } by new { Item1 = ci1.item, Item2 = ci2.item } into ItemGroup
             select ItemGroup;
var result = groups.Select(g => new
{
    g.Key.Item1,
    g.Key.Item2,
    how_many_custs_bought_both = g.Count()
});
Note that the materialization with ToList is important when the file is large because of the self-join.
Output (each pair appears in both orders because the filter is ci1.item != ci2.item):
{ Item1 = item1, Item2 = item2, how_many_custs_bought_both = 3 }
{ Item1 = item1, Item2 = item3, how_many_custs_bought_both = 1 }
{ Item1 = item2, Item2 = item1, how_many_custs_bought_both = 3 }
{ Item1 = item2, Item2 = item3, how_many_custs_bought_both = 1 }
{ Item1 = item3, Item2 = item1, how_many_custs_bought_both = 1 }
{ Item1 = item3, Item2 = item2, how_many_custs_bought_both = 1 }
{ Item1 = item4, Item2 = item2, how_many_custs_bought_both = 1 }
{ Item1 = item2, Item2 = item4, how_many_custs_bought_both = 1 }
{ Item1 = item5, Item2 = item1, how_many_custs_bought_both = 2 }
{ Item1 = item5, Item2 = item2, how_many_custs_bought_both = 1 }
{ Item1 = item1, Item2 = item5, how_many_custs_bought_both = 2 }
{ Item1 = item2, Item2 = item5, how_many_custs_bought_both = 1 }

You can write something like this:
IDictionary<int, int> pivotResult = customerItems.ToLookup(c => c.Customer)
    .ToDictionary(x => x.Key, y => y.Count());

Working LINQ example, not too pretty!
using System;
using System.Collections.Generic;
using System.Linq;
class Data
{
    public Data(int cust, int item)
    {
        item_id = item;
        cust_id = cust;
    }
    public int item_id { get; set; }
    public int cust_id { get; set; }
    static void Main(string[] args)
    {
        var data = new List<Data>
        {
            new Data(1,1), new Data(1,2), new Data(1,3),
            new Data(2,4), new Data(2,2), new Data(3,5),
            new Data(3,1), new Data(3,2), new Data(4,1),
            new Data(4,2), new Data(5,5), new Data(5,1)
        };
        (from a1 in data
         from a2 in data
         where a2.cust_id == a1.cust_id && a2.item_id != a1.item_id && a1.item_id < a2.item_id
         group new {a1, a2} by new {item1 = a1.item_id, item2 = a2.item_id}
         into g
         select new {g.Key.item1, g.Key.item2, count = g.Count()})
            .ToList()
            .ForEach(x => Console.WriteLine("{0} {1} {2}", x.item1, x.item2, x.count));
        Console.Read();
    }
}
Output:
1 2 3
1 3 1
2 3 1
2 4 1
1 5 2
2 5 1
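The question also asks for a plain for/foreach version, which none of the answers show. Below is a minimal sketch, not a tuned implementation: it first groups the items per customer and only then enumerates pairs inside each customer's basket, so it never compares rows of different customers. The class name PairCounter, the method CountPairs, and the tuple-based input shape are assumptions made for illustration; parsing the CSV into (customer, item) tuples is left out.

```csharp
using System;
using System.Collections.Generic;

static class PairCounter
{
    // Counts, for every unordered item pair, how many customers bought both items.
    // The (Customer, Item) tuple input is an assumption of this sketch.
    public static Dictionary<(string, string), int> CountPairs(
        IEnumerable<(string Customer, string Item)> rows)
    {
        // Step 1: collect the distinct items of each customer, kept in sorted order.
        var itemsByCustomer = new Dictionary<string, SortedSet<string>>();
        foreach (var (customer, item) in rows)
        {
            if (!itemsByCustomer.TryGetValue(customer, out var items))
            {
                items = new SortedSet<string>(StringComparer.Ordinal);
                itemsByCustomer[customer] = items;
            }
            items.Add(item);
        }

        // Step 2: per customer, count each pair (a, b) with a sorted before b,
        // so that (x, y) and (y, x) end up under the same key.
        var counts = new Dictionary<(string, string), int>();
        foreach (var items in itemsByCustomer.Values)
        {
            var sorted = new List<string>(items);
            for (int i = 0; i < sorted.Count; i++)
            {
                for (int j = i + 1; j < sorted.Count; j++)
                {
                    var key = (sorted[i], sorted[j]);
                    counts.TryGetValue(key, out var n); // n is 0 when the key is new
                    counts[key] = n + 1;
                }
            }
        }
        return counts;
    }

    static void Main()
    {
        var rows = new (string Customer, string Item)[]
        {
            ("customer1", "item1"), ("customer1", "item2"), ("customer1", "item3"),
            ("customer2", "item4"), ("customer2", "item2"),
            ("customer3", "item5"), ("customer3", "item1"), ("customer3", "item2"),
            ("customer4", "item1"), ("customer4", "item2"),
            ("customer5", "item5"), ("customer5", "item1")
        };
        foreach (var kv in CountPairs(rows))
            Console.WriteLine("{0} {1} {2}", kv.Key.Item1, kv.Key.Item2, kv.Value);
    }
}
```

The work is proportional to the sum of the squared basket sizes rather than to the square of the number of rows, which is probably where a naive nested loop over all rows loses its time.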

Related

LINQ Select Multiple Elements from a List<object[]>

I have a List<object[]> which is populated with data from a database.
The object array has, say, 10 elements when populated.
I want to do a LINQ Select statement that returns a List<object[]> with just 2 elements. How can I select these elements, 1 and 2?
I have tried the following, which works for element 0, but how can I get element 0 and element 1?
var resultDistinct = result.Select(p => p.GetValue(0)).Distinct();
var resultDistinct2 = result.Select(p => p.ElementAt(0)).Distinct();
You could use an anonymous object for this:
var items = result.Select(p => new { ValueA = p.GetValue(0), ValueB = p.GetValue(1) });
Then access each item:
foreach(var item in items)
{
    var valueA = item.ValueA;
    var valueB = item.ValueB;
}
You can use the Take extension method:
items.Take(x);
This will return the first x items of a collection.
If you want to skip over some elements, you can use Skip(x) before calling Take. These two methods are very often used for paging.
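As a rough illustration of the paging use mentioned above, here is a small sketch. The helper GetPage and its 1-based pageNumber parameter are hypothetical names chosen for this example; only Skip, Take and ToList are actual LINQ methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Paging
{
    // Returns one page of a sequence. pageNumber is 1-based.
    // GetPage is a hypothetical helper, not part of LINQ.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageNumber, int pageSize)
    {
        return source
            .Skip((pageNumber - 1) * pageSize) // jump over the earlier pages
            .Take(pageSize)                    // keep at most one page of items
            .ToList();
    }

    static void Main()
    {
        var numbers = Enumerable.Range(1, 10);
        // Page 2 with 3 items per page skips 1, 2, 3 and takes the next three.
        Console.WriteLine(string.Join(", ", GetPage(numbers, 2, 3))); // prints "4, 5, 6"
    }
}
```

If the requested page lies past the end of the sequence, Take simply returns fewer items (or none), so no bounds check is needed.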
If you want the distinct items and then the first 2 of them:
result.Select(p => p).Distinct().Take(2);
If you just want the first 2:
result.Take(2);
private class Foo
{
    public int Item1;
    public int Item2;
    public int Item3;
}
static void Main(string[] args)
{
    List<Foo> foos = new List<Foo>
    {
        new Foo() { Item1 = 1, Item2 = 2, Item3 = 3 },
        new Foo() { Item1 = 4, Item2 = 5, Item3 = 6 },
        new Foo() { Item1 = 7, Item2 = 8, Item3 = 9 }
    };
    // Create a list of lists where each list has three elements corresponding to
    // the values stored in Item1, Item2, and Item3. Then use SelectMany
    // to flatten the list of lists.
    var items = foos.Select(f => new List<int>() { f.Item1, f.Item2, f.Item3 }).SelectMany(item => item).Distinct();
    foreach (int item in items)
        Console.WriteLine(item.ToString());
    Console.ReadLine();
}
refer to: https://nickstips.wordpress.com/2010/09/16/linq-selecting-multiple-properties-from-a-list-of-objects/

merging 2 lists into 1 with logic

I have 2 lists of the same type.
List 1:
ID, Name, Value
1, "Prod1", 0
2, "Prod2", 50
3, "Prod3", 0
List 2:
ID, Name, Value
1, "Prod1", 25
2, "Prod2", 100
3, "Prod3", 75
I need to combine these 2 lists into 1, but I only want the values from list2 if the corresponding value from list1 == 0
So my new list should look like this:
1,"Prod1", 25
2,"Prod2", 50
3,"Prod3", 75
I've tried many variations of something like this:
var joined = from l1 in List1.Where(x=>x.Value == "0")
             join l2 in List2 on l1.ID equals l2.ID into gj
             select new { gj };
I've also tried a variation using Concat.
What is the best way of doing this?
You just need to select the individual properties and conditionally select either the Value from the first or second list item.
var List1 = new[]
{
    new { Name = "Prod1", Id = 1, Value = 0 },
    new { Name = "Prod2", Id = 2, Value = 50 },
    new { Name = "Prod3", Id = 3, Value = 0 },
    new { Name = "NotInList2", Id = 4, Value = 0}
};
var List2 = new[]
{
    new { Name = "Prod1", Id = 1, Value = 25 },
    new { Name = "Prod2", Id = 2, Value = 100 },
    new { Name = "Prod3", Id = 3, Value = 75 }
};
var results = from l1 in List1
              join l2temp in List2 on l1.Id equals l2temp.Id into grpj
              from l2 in grpj.DefaultIfEmpty()
              select new
              {
                  l1.Id,
                  l1.Name,
                  Value = l1.Value == 0 && l2 != null ? l2.Value : l1.Value
              };
foreach(var item in results)
    Console.WriteLine(item);
Will output
{ Id = 1, Name = Prod1, Value = 25 }
{ Id = 2, Name = Prod2, Value = 50 }
{ Id = 3, Name = Prod3, Value = 75 }
{ Id = 4, Name = NotInList2, Value = 0 }
NOTE: This assumes that you only want all the ids that are in List1 (not any that are only in List2) and that the ids are unique and that the Name from List1 is what you want even if it is different in List2.
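Clone l1, then copy over the values that are 0:
foreach (var item in l1Clone)
    if (item.value == 0)
    {
        // Assign (=, not ==) the value of the matching l2 item, if there is one.
        var match = l2.FirstOrDefault(l2item => l2item.ID == item.ID);
        if (match != null) item.value = match.value;
    }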
Refer to the code below:
IEnumerable<item> join_lists(IEnumerable<item> list1, IEnumerable<item> list2)
{
    var map = list2.ToDictionary(i => i.id);
    return list1.Select(i => new item()
    {
        id = i.id,
        name = i.name,
        value = i.value == 0 ? map[i.id].value : i.value
    });
}
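You could use Zip, provided both lists contain the same products in the same order (Zip pairs elements by position, not by ID):
var combined = list1
    .Zip(list2, (product1, product2) => product1.Value == 0 ? product2 : product1);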

Merge equal items with multiple lists but different length

I have 2 lists.
They are different in length but same type.
I want an item from list2 to replace the equal item in list1.
var item1 = new Item { Id = 1, Name = "Test1" };
var item2 = new Item { Id = 2, Name = "Test2" };
var item3 = new Item { Id = 3, Name = "Test3" };
var item4 = new Item { Id = 4, Name = "Test4" };
var item5 = new Item { Id = 5, Name = "Test5" };
var list1 = new List<Item> { item1, item2, item3, item4, item5 };
var list2 = new List<Item> { new Item { Id = 1, Name = "NewValue" } };
As a result I expect a list with 5 items where the item with Id = 1 has the Name "NewValue".
How can I do that, preferably with LINQ?
UPDATE
I extend my question:
How can the replacement happen without copying all properties manually? Just imagine I have 100 properties...
This is one way to do it:
First define an equality comparer that depends only on the Id property of the Item class like this:
public class IdBasedItemEqualityComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        return x.Id == y.Id;
    }
    public int GetHashCode(Item obj)
    {
        return obj.Id.GetHashCode();
    }
}
Then you can take the items from list1 that don't have corresponding items in list2 using the Except method, and concatenate the result with list2 using the Concat method, like this:
var result = list1.Except(list2, new IdBasedItemEqualityComparer()).Concat(list2).ToList();
Notice how I use the IdBasedItemEqualityComparer with the Except method, so that comparison is based only on Id.
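One detail worth knowing about the Except/Concat approach: the replacement items are appended at the end, so the item with Id = 1 would come last in the result. The sketch below adds an OrderBy on Id to restore the order, assuming that order matters and that Id order is the one you want. The Item class shape is taken from the question; IdComparer is a shortened stand-in for the comparer above, and MergeDemo.Merge is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Compares items by Id only (same idea as IdBasedItemEqualityComparer).
class IdComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y) => x.Id == y.Id;
    public int GetHashCode(Item obj) => obj.Id.GetHashCode();
}

static class MergeDemo
{
    // Hypothetical helper: items of list2 win over items of list1 with the same Id.
    public static List<Item> Merge(List<Item> list1, List<Item> list2)
    {
        return list1.Except(list2, new IdComparer()) // list1 items without a match in list2
                    .Concat(list2)                   // then every list2 item
                    .OrderBy(i => i.Id)              // restore Id order (assumption: order matters)
                    .ToList();
    }

    static void Main()
    {
        var list1 = new List<Item>
        {
            new Item { Id = 1, Name = "Test1" },
            new Item { Id = 2, Name = "Test2" },
            new Item { Id = 3, Name = "Test3" }
        };
        var list2 = new List<Item> { new Item { Id = 1, Name = "NewValue" } };

        foreach (var item in Merge(list1, list2))
            Console.WriteLine("{0}: {1}", item.Id, item.Name);
    }
}
```

Two caveats follow from how these methods work: Concat adds every item of list2, including ones whose Id does not occur in list1, and Except has set semantics, so items in list1 that share an Id are collapsed to one.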
Off the top of my head, this is one solution:
var list3 = new List<Item>();
foreach (var item in list1)
    list3.Add(list2.FirstOrDefault(s => s.Id == item.Id) ?? item);
I think a LEFT OUTER JOIN in LINQ can merge the 2 lists regardless of the number of properties (columns), like this:
List<Item> newItems =
    (from l1 in list1
     join l2 in list2 on l1.Id equals l2.Id into l12
     from l2 in l12.DefaultIfEmpty()
     select new { Item = (l2 == null) ? l1 : l2 }).Select(r => r.Item).ToList();

Remove duplicate items and calculate average values using LINQ

For example I have a list of objects (properties: Name and value)
item1 20;
item2 30;
item1 50;
I want the result:
item1 35 (20+50)/2
item2 30
How can I do this?
Sorry guys, duplicate is based on item.Name.
var results =
    from kvp in source
    group kvp by kvp.Key.ToUpper() into g
    select new
    {
        Key = g.Key,
        Value = g.Average(kvp => kvp.Value)
    };
or
var results = source.GroupBy(c => c.Name)
    .Select(c => new { Name = c.Key, Value = c.Average(d => d.Value) });
You could do it using average and group by:
public class myObject
{
    public string Name {get;set;}
    public double Value {get;set;}
}
var testData = new List<myObject>() {
    new myObject() { Name = "item1", Value = 20 },
    new myObject() { Name = "item2", Value = 30 },
    new myObject() { Name = "item1", Value = 50 }
};
var result = from x in testData
             group x by x.Name into grp
             select new myObject() {
                 Name = grp.Key,
                 Value = grp.Average(obj => obj.Value)
             };

how to get a SUM in Linq?

I need to do the following: I have a List of a class which contains 2 integers, id and count.
Now I want to do the following LINQ query:
get the sum of the count for each id
but there can be items with the same id, so they should be summed up, e.g.:
id=1, count=12
id=2, count=1
id=1, count=2
should be:
id=1 -> sum 14
id=2 -> sum 1
How can I do this?
Group the items by Id and then sum the Counts in each group:
var result = items.GroupBy(x => x.Id)
    .Select(g => new { Id = g.Key, Sum = g.Sum(x => x.Count) });
Try this:
.GroupBy(x => x.id)
.Select(n => n.Sum(m => m.count));
The following program...
using System;
using System.Linq;
struct Item {
    public int Id;
    public int Count;
}
class Program {
    static void Main(string[] args) {
        var items = new [] {
            new Item { Id = 1, Count = 12 },
            new Item { Id = 2, Count = 1 },
            new Item { Id = 1, Count = 2 }
        };
        var results =
            from item in items
            group item by item.Id
            into g
            select new { Id = g.Key, Count = g.Sum(item => item.Count) };
        foreach (var result in results) {
            Console.Write(result.Id);
            Console.Write("\t");
            Console.WriteLine(result.Count);
        }
    }
}
...prints:
1 14
2 1
