How can I select group by values in specific property? - c#

I am using a group by select using Entity Framework Core and LINQ.
var list = context.Ways.GroupBy(s => s.Type).Select(w => new {
type = w.Key,
total = (int)w.Sum(b => b.Length)
});
This gives me a list.
type total
T1 2541
T2 5481
T5 4
T9 2
T11 856
T3 25
So I want to group the types into "Others" if the total is smaller than 100, like the following:
type total
T1 2541
T2 5481
T11 856
OTHERS 31
Is this possible?

You can do this with a second group by
var list = context.Ways.GroupBy(s => s.Type).Select(w => new
{
type = w.Key,
total = (int)w.Sum(b => b.Length)
}).GroupBy(s => s.total < 100 ? "Others" : s.type)
.Select(w => new
{
type = w.Key,
total = (int)w.Sum(b => b.total)
});
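To see the second grouping in isolation, here is a minimal in-memory sketch using LINQ to Objects rather than Entity Framework. The names OthersGrouping and Collapse are made up for illustration, and the sample totals are the ones listed in the question; whether the same expression translates to SQL depends on the EF Core version, which this sketch does not test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OthersGrouping
{
    // Collapses every row whose total is below the threshold into one "Others" row.
    public static List<(string Type, int Total)> Collapse(
        IEnumerable<(string Type, int Total)> totals, int threshold)
    {
        return totals
            .GroupBy(t => t.Total < threshold ? "Others" : t.Type)
            .Select(g => (Type: g.Key, Total: g.Sum(t => t.Total)))
            .ToList();
    }

    public static void Main()
    {
        // Per-type totals copied from the question.
        var totals = new List<(string Type, int Total)>
        {
            ("T1", 2541), ("T2", 5481), ("T5", 4),
            ("T9", 2), ("T11", 856), ("T3", 25)
        };

        foreach (var row in Collapse(totals, 100))
            Console.WriteLine($"{row.Type} {row.Total}");
        // Prints T1 2541, T2 5481, Others 31, T11 856 (groups appear in first-seen order).
    }
}
```

In memory, GroupBy yields groups in the order their keys are first seen, so "Others" shows up where the first small group was; add an OrderBy if it should come last.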

You can't do this with Entity Framework, but you can write a method to iterate over the list in memory. For example, assuming you have a class to hold the key and value like this (or you could rewrite using KeyValuePair or a tuple):
public class ItemCount
{
public string Name { get; set; }
public int Count { get; set; }
}
An extension method to aggregate the smaller values could look like this:
public static IEnumerable<ItemCount> AggregateWithThreshold(this IEnumerable<ItemCount> source,
int threshold)
{
// The total item to return
var total = new ItemCount
{
Name = "Others",
Count = 0
};
foreach (var item in source)
{
if (item.Count >= threshold)
{
// If count is above threshold, just return the value
yield return item;
}
else
{
// Keep the total count
total.Count += item.Count;
}
}
// No need to return a zero count if all values were above the threshold
if(total.Count > 0)
{
yield return total;
}
}
And you would call it like this:
var list = context.Ways
.GroupBy(s => s.Type)
.Select(w => new ItemCount // Note we are using the new class here
{
Name = w.Key,
Count = (int)w.Sum(b => b.Length)
});
var result = list.AggregateWithThreshold(100);

Technically, you can add an additional operation that calculates the Others value from two values you already have: the overall sum and the sum of the groups you kept. Like this:
var list = context.Ways.GroupBy(s => s.Type).Select(w => new {
type = w.Key,
total = (int)w.Sum(b => b.Length)
})
.Where(x => x.total >= 100) // keep only the large groups
.ToList(); // materialize so that Add is available
var totalSum = (int)context.Ways.Sum(x => x.Length);
var listSum = list.Sum(x => x.total);
list.Add(new {
type = "Others",
total = totalSum - listSum
});

Related

Merge first list with second list based on standard deviation of second list C#

Given 2 datasets (which are both a sequence of standard deviations away from a number, we are looking for the overlapping sections):
var list1 = new decimal[] { 357.06m, 366.88m, 376.70m, 386.52m, 406.15m };
var list2 = new decimal[] { 370.51m, 375.62m, 380.72m, 385.82m, 390.93m };
I would like to perform a merge with items from List2 being placed closest to items of List1, within a certain range, i.e. merge List2 element within 5.10 (standard deviation) of List1 element:
357.06
366.88 => 370.51
376.70 => 375.62, 380.72
386.52 => 390.93
406.15
The idea is to cluster values from List2 and count them. In this case the element with value 376.70 would have the highest significance, as it has 2 close neighbors, 375.62 and 380.72 (whereas 366.88 and 386.52 have only 1 match each, and the remaining elements have none within range).
Which C# math/stats libraries could be used for this (or would there be a better way to combine statistically)?
If this is more of a computer science or stats question, apologies in advance; I will close it and reopen it on the relevant SO site.
Assuming that list2 is sorted (if not, put Array.Sort(list2);) you can try Binary Search:
Given:
var list1 = new decimal[] { 357.06m, 366.88m, 376.70m, 386.52m, 406.15m };
var list2 = new decimal[] { 370.51m, 375.62m, 380.72m, 385.82m, 390.93m };
decimal sd = 5.10m;
Code:
// Array.Sort(list2); // Uncomment, if list2 is not sorted
List<(decimal value, decimal[] list)> result = new List<(decimal value, decimal[] list)>();
foreach (decimal value in list1) {
int leftIndex = Array.BinarySearch<decimal>(list2, value - sd);
if (leftIndex < 0)
leftIndex = -leftIndex - 1;
else // edge case: exact match, step back to the first duplicate
for (; leftIndex >= 1 && list2[leftIndex - 1] == value - sd; --leftIndex) ;
int rightIndex = Array.BinarySearch<decimal>(list2, value + sd);
if (rightIndex < 0)
rightIndex = -rightIndex - 1;
else { // edge case: exact match, move past the last duplicate so it is included
for (; rightIndex < list2.Length - 1 && list2[rightIndex + 1] == value + sd; ++rightIndex) ;
++rightIndex;
}
result.Add((value, list2.Skip(leftIndex).Take(rightIndex - leftIndex).ToArray()));
}
Let's have a look:
string report = string.Join(Environment.NewLine, result
.Select(item => $"{item.value} => [{string.Join(", ", item.list)}]"));
Console.Write(report);
Outcome:
357.06 => []
366.88 => [370.51]
376.70 => [375.62, 380.72]
386.52 => [385.82, 390.93]
406.15 => []
Something like this should work
var list1 = new double[] { 357.06, 366.88, 376.70, 386.52, 406.15 };
var list2 = new double[] { 370.51, 375.62, 380.72, 385.82, 390.93 };
double dev = 5.1;
var result = new Dictionary<double, List<double>>();
foreach (var l in list2) {
var diffs = list1.Select(r => new { diff = Math.Abs(r - l), r })
.Where(d => d.diff <= dev)
.MinBy(r => r.diff)
.FirstOrDefault();
if (diffs == null) {
continue;
}
List<double> list;
if (! result.TryGetValue(diffs.r, out list)) {
list = new List<double>();
result.Add(diffs.r, list);
}
list.Add(l);
}
It uses MinBy from MoreLinq, but it is easy to modify to work without it.
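As a rough sketch of the "without MoreLinq" variant, OrderBy followed by FirstOrDefault can stand in for MinBy. The class and method names (NearestMerge, Assign) are invented for this example, and the data is the pair of lists from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NearestMerge
{
    // Assigns each candidate to the closest anchor that lies within maxDistance.
    public static Dictionary<double, List<double>> Assign(
        double[] anchors, double[] candidates, double maxDistance)
    {
        var result = new Dictionary<double, List<double>>();
        foreach (var candidate in candidates)
        {
            // OrderBy + FirstOrDefault replaces MoreLinq's MinBy here.
            var nearest = anchors
                .Select(a => new { Anchor = a, Diff = Math.Abs(a - candidate) })
                .Where(x => x.Diff <= maxDistance)
                .OrderBy(x => x.Diff)
                .FirstOrDefault();
            if (nearest == null) continue; // no anchor within range

            if (!result.TryGetValue(nearest.Anchor, out var list))
            {
                list = new List<double>();
                result.Add(nearest.Anchor, list);
            }
            list.Add(candidate);
        }
        return result;
    }

    public static void Main()
    {
        var list1 = new[] { 357.06, 366.88, 376.70, 386.52, 406.15 };
        var list2 = new[] { 370.51, 375.62, 380.72, 385.82, 390.93 };
        foreach (var pair in Assign(list1, list2, 5.1))
            Console.WriteLine($"{pair.Key} => {string.Join(", ", pair.Value)}");
    }
}
```

Sorting just to take the minimum costs more than a single pass, but for lists this small the difference is unlikely to matter.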
In fact, you don't need extra libs or something else. You can use just LINQ for this.
internal class Program
{
private static void Main(string[] args)
{
var deviation = 5.1M;
var list1 = new decimal[] { 357.06M, 366.88M, 376.70M, 386.52M, 406.15M };
var list2 = new decimal[] { 370.51M, 375.62M, 380.72M, 385.82M, 390.93M };
var result = GetDistribution(list1.ToList(), list2.ToList(), deviation);
result.ForEach(x => Console.WriteLine($"{x.BaseValue} => {string.Join(", ", x.Destribution)} [{x.Weight}]"));
Console.ReadLine();
}
private static List<Distribution> GetDistribution(List<decimal> baseList, List<decimal> distrebutedList, decimal deviation)
{
return baseList.Select(x =>
new Distribution
{
BaseValue = x,
Destribution = distrebutedList.Where(y => x - deviation < y && y < x + deviation).ToList()
}).ToList();
}
}
internal class Distribution
{
public decimal BaseValue { get; set; }
public List<decimal> Destribution { get; set; }
public int Weight => Destribution.Count;
}
I hope it was useful for you.

Apply function to some elements of list

I have a list of complex objects i.e.
class MyObject
{
public bool selected;
public int id;
public string name;
}
List<MyObject> theObjects = functionThatSelectsObjectsFromContainer();
And I have a list from another source that just give me int ids that are in the list of objects
List<int> idList = functionThatReturnsListOfIds();
Now for each of the items in the idList I want to set the selected property true. I know I can set up a foreach of one list and then search for the matching items in the other list and set it that way, but I was wondering if there's a different way that might be quicker.
Conclusion
I did some testing on all my methods below, as well as un-lucky's answer, and the fastest of them all was option 2 below, ie
var results = theObjects.Join(idList, o => o.id, id => id, (o, id) => o).ToList();
results.ForEach(o => o.selected = true);
Another way of doing it with LINQ, where we iterate over theObjects and check each one to see if its id exists in idList:
1
theObjects.ForEach(o => o.selected = idList.Contains(o.id));
or using Join and ForEach, where we first extract the matching items using Join and then iterate around those items:
2
var results = theObjects.Join(idList, o => o.id, id => id, (o, id) => o).ToList();
results.ForEach(o => o.selected = true);
or, you can use Select with ForEach and FirstOrDefault. This is probably going to be slower than the other 2:
3
theObjects
.Select(o => o.id)
.Where(i => idList.Contains(i)).ToList()
.ForEach(i =>
theObjects.FirstOrDefault(o => o.id == i).selected = true);
I did some testing on the 3 methods I posted, where we have 10000 MyObjects and 1000 unique ids. I ran each method 1000 times, and then got the mean ElapsedMilliseconds for each.
The results were
1: 8.288 ms
2: 0.19 ms
3: 57.342 ms
one = 0;
two = 0;
three = 0;
for (var i = 0; i <1000; i++) {
RunTest();
}
oneMean = one / 1000;
twoMean = two / 1000;
threeMean = three / 1000;
where
private void RunTest()
{
ResetData();
var stopwatch = Stopwatch.StartNew();
theObjects.ForEach(o => o.selected = idList.Contains(o.id) ? true : false);
stopwatch.Stop();
one += stopwatch.ElapsedMilliseconds;
ResetData();
stopwatch = Stopwatch.StartNew();
var results = theObjects.Join(idList, o => o.id, id => id, (o, id) => o).ToList();
results.ForEach(o => o.selected = true);
stopwatch.Stop();
two += stopwatch.ElapsedMilliseconds;
ResetData();
stopwatch = Stopwatch.StartNew();
theObjects
.Select(o => o.id)
.Where(i => idList.Contains(i)).ToList()
.ForEach(i =>
theObjects.FirstOrDefault(o => o.id == i).selected = true);
stopwatch.Stop();
three += stopwatch.ElapsedMilliseconds;
}
private void ResetData()
{
theObjects = new List<MyObject>();
idList = new List<int>();
var rnd = new Random();
for (var i=0; i<10000; i++) {
theObjects.Add(new MyObject(){id = i});
}
for (var i=0; i<=1000; i++) {
var r = rnd.Next(0, 1000);
while (idList.Contains(r)) {
r = rnd.Next(0, 10000);
}
idList.Add(r);
}
}
I tested un-lucky's answer (most upvotes right now) and it got a mean score of 147.676
foreach(var obj in theObjects.Where(o => idList.Any(i=> i == o.id)))
{
obj.selected = true;
}
I think you can do something like this to make that work:
foreach(var obj in theObjects.Where(o => idList.Any(i=> i == o.id)))
{
obj.selected = true;
}
With the help of Linq, you can use Where, ToList and ForEach to achieve your required behaviour -
theObjects.Where(x => idList.Contains(x.id)).ToList().ForEach(y => y.selected = true);
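Using LINQ you can do it like so (ForEach is a List<T> method, so the filtered sequence is converted with ToList first):
theObjects.Where(g => idList.Contains(g.id)).ToList().ForEach(g => g.selected = true);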
You can try the solution below, where you don't need to use Where and Contains:
theObjects.ForEach(a => a.selected = idList.Exists(b => a.id == b));
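One more variation, offered as a sketch and not measured against the timings quoted earlier: copy the ids into a HashSet<int> first, so that each membership check is a hash lookup and not a scan of the list. SelectionHelper and MarkSelected are hypothetical names; MyObject mirrors the class from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public bool selected;
    public int id;
    public string name;
}

public static class SelectionHelper
{
    // Marks every object whose id appears in ids; returns how many were marked.
    public static int MarkSelected(List<MyObject> objects, IEnumerable<int> ids)
    {
        var idSet = new HashSet<int>(ids); // built once, then used for every lookup
        var marked = 0;
        foreach (var obj in objects)
        {
            if (idSet.Contains(obj.id))
            {
                obj.selected = true;
                marked++;
            }
        }
        return marked;
    }

    public static void Main()
    {
        var objects = Enumerable.Range(0, 10)
            .Select(i => new MyObject { id = i, name = "item" + i })
            .ToList();
        var count = MarkSelected(objects, new List<int> { 2, 5, 7 });
        Console.WriteLine(count); // 3
    }
}
```

Unlike the ForEach variants that assign the result of Contains or Exists, this leaves objects that are not in the id list untouched and does not reset them to false.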

How to pass the current index iteration inside a select new MyObject

This is my code:
infoGraphic.chartData = (from x in db.MyDataSource
group x by x.Data.Value.Year into g
select new MyObject
{
index = "", // here I need a string such as "index is:" + index
counter = g.Count()
});
I need the current index iteration inside the select new. Where do I pass it?
EDIT - My current query:
var test = db.MyData
.GroupBy(item => item.Data.Value.Year)
.Select((item, index ) => new ChartData()
{
index = ((double)(3 + index ) / 10).ToString(),
value = item.Count().ToString(),
fill = index.ToString(),
label = item.First().Data.Value.Year.ToString(),
}).ToList();
public class ChartData
{
public string index { get; set; }
public string value { get; set; }
public string fill { get; set; }
public string label { get; set; }
}
Use the IEnumerable extension methods; I think the syntax is more straightforward.
You need the second Select overload, which receives both the item and its index.
infoGraphic.chartData.Select((item, index) => {
//what you want to do here
});
Do you want to apply grouping on your chartData, and afterwards select a subset / generate a projection on the resulting data?
Your solution should look like:
infoGraphic.chartData
.GroupBy(...)
.Select((item, index) => {
//what you want to do here
});
Abstracting the dataSource as x:
x.GroupBy(item => item.Data.Value.Year)
.Select((item, index) => new { index = index, counter = item.Count() });
As a follow up to your new question...
here is a simple working scenario with a custom type (like your ChartData):
class Program
{
static void Main(string[] args)
{
List<int> data = new List<int> { 1, 872, -7, 271 ,-3, 7123, -721, -67, 68 ,15 };
IEnumerable<A> result = data
.GroupBy(key => Math.Sign(key))
.Select((item, index) => new A { groupCount = item.Count(), str = item.Where(i => Math.Sign(i) > 0).Count() == 0 ? "negative" : "positive" });
foreach(A a in result)
{
Console.WriteLine(a);
}
}
}
public class A
{
public int groupCount;
public string str;
public override string ToString()
{
return string.Format("Group Count: [{0}], String: [{1}].", groupCount, str);
}
}
/* Output:
* -------
* Group Count: [6], String: [positive].
* Group Count: [4], String: [negative].
*/
Important: Make sure the object you call the extension method on implements IEnumerable<T>; otherwise the Select overload that my solution relies on will not be available.
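The indexed Select overload can be tried out on plain in-memory data. The sketch below assumes a list of DateTime values in place of db.MyDataSource, and the names IndexedGroups and CountPerYear are made up. The AsEnumerable call does nothing for an in-memory list; it is included on the assumption that, with a database-backed query, the provider may be unable to translate the indexed overload, in which case the projection has to run in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedGroups
{
    // Groups dates by year and labels each group with its position in the sequence.
    public static List<(string Index, int Counter)> CountPerYear(IEnumerable<DateTime> dates)
    {
        return dates
            .GroupBy(d => d.Year)
            .AsEnumerable() // no-op here; switches to LINQ to Objects for a database source
            .Select((g, index) => (Index: "index is: " + index, Counter: g.Count()))
            .ToList();
    }

    public static void Main()
    {
        var dates = new[]
        {
            new DateTime(2014, 1, 5), new DateTime(2014, 6, 1), new DateTime(2015, 3, 9)
        };
        foreach (var row in CountPerYear(dates))
            Console.WriteLine($"{row.Index}, counter = {row.Counter}");
        // index is: 0, counter = 2
        // index is: 1, counter = 1
    }
}
```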
You can do something like this:
let currIndex = collection.IndexOf(collectionItem)
Your code would then become:
infoGraphic.chartData =
(from x in db.MyDataSource group x by x.Data.Value.Year into g
// Get Iterator Index Here
let currIndex = db.MyDataSource.IndexOf(x)
select new MyObject
{index = currIndex.ToString(), // Your Iterator Index
counter = g.Count()
});

query to get all action during for each hour

I want to run and print a query that shows the number of orders for each hour of the day (24 hours).
It should look like:
hour-1:00, number of orders-5
hour-2:00, number of orders-45
hour-3:00, number of orders-25
hour-4:00, number of orders-3
hour-5:00, number of orders-43
and so on...
I tried:
public void ShowBestHours()
{
using (NorthwindDataContext db = new NorthwindDataContext())
{
var query =
from z in db.Orders
select new Stime
{
HourTime = db.Orders.GroupBy(x => x.OrderDate.Value.Hour).Count(),
};
foreach (var item in query)
{
Console.WriteLine("Hour : {0},Order(s) Number : {1}", item.HourTime, item.Count);
}
}
}
public class Stime
{
public int HourTime { get; set; }
public int Count { get; set; }
}
You need to change your query to
var query =
from z in db.Orders
group z by z.OrderDate.Value.Hour into g
select new Stime{ HourTime = g.Key, Count=g.Count () };
or alternatively
var query = db.Orders.GroupBy (o => o.OrderDate.Value.Hour).Select (
g => new Stime{ HourTime=g.Key, Count=g.Count () });
In my copy of Northwind all of the OrderDate values are dates only so the result is just
HourTime = 0, Count = 830.
I'm assuming you're just experimenting with grouping. Try grouping by day of week like this
var query = db.Orders.GroupBy (o => o.OrderDate.Value.DayOfWeek).Select (
g => new { DayOfWeek=g.Key, Count=g.Count () });
which gives a more useful result.
You aren't setting Stime.Count anywhere in your query and you aren't grouping by hour correctly. I haven't seen your exact setup of course, but I think the following should work for you.
var query =
from z in db.Orders
group z by z.OrderDate.Value.Hour into g
select new Stime() { HourTime = g.Key, Count = g.Count() };
foreach (var item in query)
{
Console.WriteLine("Hour : {0},Order(s) Number : {1}", item.HourTime, item.Count);
}
Try this:
public void ShowBestHours()
{
using (NorthwindDataContext db = new NorthwindDataContext())
{
var query = db.Orders.GroupBy(x => x.OrderDate.Value.Hour).OrderByDescending(x => x.Count()).Select(x => new Stime { HourTime = x.Key, Count = x.Count() });
foreach (var item in query)
{
Console.WriteLine("Hour : {0},Order(s) Number : {1}", item.HourTime, item.Count);
}
}
}
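The grouped queries above only return hours that have at least one order. If all 24 hours should be printed, one possible approach is to count in memory and fill the gaps with zero. This is a sketch on plain DateTime values, not on the Northwind context; HourlyCounts and PerHour are hypothetical names and the timestamps are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HourlyCounts
{
    // Returns one (Hour, Count) pair for each hour 0..23, with 0 for hours without orders.
    public static List<(int Hour, int Count)> PerHour(IEnumerable<DateTime> orderDates)
    {
        var counts = orderDates
            .GroupBy(d => d.Hour)
            .ToDictionary(g => g.Key, g => g.Count());

        return Enumerable.Range(0, 24)
            .Select(h => (Hour: h, Count: counts.TryGetValue(h, out var c) ? c : 0))
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical order timestamps, used only for illustration.
        var orderDates = new[]
        {
            new DateTime(2020, 1, 1, 1, 15, 0),
            new DateTime(2020, 1, 1, 1, 45, 0),
            new DateTime(2020, 1, 2, 3, 0, 0)
        };
        foreach (var row in PerHour(orderDates))
            Console.WriteLine("Hour : {0},Order(s) Number : {1}", row.Hour, row.Count);
    }
}
```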

Combine Elements in a List based on Type and Summate their Values, LINQ

Given this structure..
I basically want to be able to take a list of items with multiple types, and create a new list that condenses down the sum of the values of each like-type. However the names of the types are dynamic (they may or may not have a specific order, and there is no finite list of them)
using System.Linq;
using System.Collections.Generic;
class Item
{
public ItemType Type;
public int Value;
public int Add(Item item)
{
return this.Value + item.Value;
}
}
class ItemType
{
public string Name;
}
class Test
{
public static void Main()
{
List<ItemType> types = new List<ItemType>();
types.Add(new ItemType { Name = "Type1" });
types.Add(new ItemType { Name = "Type2" });
types.Add(new ItemType { Name = "Type3" });
List<Item> items = new List<Item>();
for (int i = 0; i < 10; i++)
{
items.Add(new Item
{
Type = types.Single(t => t.Name == "Type1"),
Value = 1
});
}
for (int i = 0; i < 10; i++)
{
items.Add(new Item
{
Type = types.Single(t => t.Name == "Type2"),
Value = 1
});
}
for (int i = 0; i < 10; i++)
{
items.Add(new Item
{
Type = types.Single(t => t.Name == "Type3"),
Value = 1
});
}
List<Item> combined = new List<Item>();
// create a list with 3 items, one of each 'type', with the sum of the total values of that type.
// types included are not always known at runtime.
}
}
Something like this should work. Warning: I didn't compile this.
items.GroupBy(i => i.Type.Name)
.Select(g => new Item { Type = g.First().Type, Value = g.Sum(i => i.Value) })
.ToList();
List<Item> combined = items.GroupBy(i => i.Type).Select(g => new Item { Type = g.Key, Value = g.Sum(i => i.Value) }).ToList();
var itemsByType = items.ToLookup(i => i.Type);
var sums = from g in itemsByType
select new Item {
Type = g.Key,
Value = g.Sum(i => i.Value)
};
var sumList = sums.ToList();
It seems to me like you are trying to get a list of Types along with their count (since Value will always be 1 in your example). Below is some code that should do this:
from i in items
group i by i.Type into t
select new { t.Key, TypeCount = t.Count() }
This would return 3 objects (displayed in table form below):
Type TypeCount
-------- ---------
Type1 10
Type2 10
Type3 10
If value is always going to be one then I believe it's the same as just getting the count.
