Looping through fields to get sum - c#

I know there are probably 100 much easier ways to do this but until I see it I can't comprehend how to go about it. I'm using linqpad for this. Before I go to phase 2 I need to get this part to work!
I've connected to an SQL database.
I'm running a query to retrieve some desired records.
var DesiredSym = (from r in Symptoms
where r.Status.Equals(1) && r.Create_Date < TimespanSecs
select r).Take(5);
So, in this example, I retrieve 5 'records' essentially in my DesiredSym variable as iQueryable (linqpad tells me this)
The DesiredSym contains a large number of fields including a number feilds that hold a int of Month1_Used, Month2_Used, Month3_Used .... Month12_Use.
So I want to loop through the DesiredSym and basically get the sum of all the Monthx_Used fields.
foreach (var MonthUse in DesiredSym)
{
// get sum of all fields where they start with MonthX_Used;
}
This is where I'm not clear on how to proceed or even if I'm on the right track. Thanks for getting me on the right track.

Since you've got a static number of fields, I'd recommend this:
var DesiredSym =
(from r in Symptoms
where r.Status.Equals(1) && r.Create_Date < TimespanSecs
select retireMe)
.Take(5);
var sum = DesiredSym.Sum(s => s.Month1_Use + s.Month2_Use + ... + s.Month12_Use);
You could use reflection, but that would be significantly slower and require more resources, since you'd need to pull the whole result set into memory first. But just for the sake of argument, it would look something like this:
var t = DesiredSym.GetType().GenericTypeArguments[0];
var props = t.GetProperties().Where(p => p.Name.StartsWith("Month"));
var sum = DesiredSym.AsEnumerable()
.Sum(s => props.Sum(p => (int)p.GetValue(s, null)));
Or this, which is a more complicated use of reflection, but it has the benefit of still being executed on the database:
var t = DesiredSym.GetType().GenericTypeArguments[0];
var param = Expression.Parameter(t);
var exp = t.GetProperties()
.Where(p => p.Name.StartsWith("Month"))
.Select(p => (Expression)Expression.Property(param, p))
.Aggregate((x, y) => Expression.Add(x, y));
var lambda = Expression.Lambda(exp, param);
var sum = DesiredSym.Sum(lambda);
Now, to these methods (except the third) to calculate the sum in batches of 5, you can use MoreLINQ's Batch method (also available on NuGet):
var DesiredSym =
from r in Symptoms
where r.Status.Equals(1) && r.Create_Date < TimespanSecs
select retireMe;
// first method
var batchSums = DesiredSym.Batch(5, b => b.Sum(s => s.Month1_Use ...));
// second method
var t = DesiredSym.GetType().GenericTypeArguments[0];
var props = t.GetProperties().Where(p => p.Name.StartsWith("Month"));
var batchSums = DesiredSym.Batch(5, b => b.Sum(s => props.Sum(p => (int)p.GetValue(s, null))));
Both these methods will be a bit slower and use more resources since all the processing has to be don in memory. For the same reason the third method will not work, since MoreLinq does not support the IQueryable interface.

Related

Refactor GroupBy to avoid slowing down operation on big dataset

I have a big collection where i need to get the newest item based on two properties.
The first step is ordering the list based on the date prop. This is all fine and pretty quick.
Then I group the newlist by two properties, and take the first item from each.
var one = Fisks.Where(s=>s.Havn.Id == 1).OrderByDescending(s=>s.Date);
var two = one.GroupBy(s=>new {s.Arter.Name, s.Sort});
var three = two.Select(s=>s.FirstOrDefault());
This works, but it is really slow when using it on the large collection. How can I avoid using the groupBy but still get the same result?
Thanks!
Using LINQ only for the first step and then taking the first ones in a loop gives you more control over the process and avoids grouping altogether:
var query = Fisks
.Where(f => f.Havn.Id == 1)
.OrderByDescending(f => f.Date)
.ThenBy(f => f.Arter.Name)
.ThenBy(f => f.Sort);
var list = new List<Fisk>();
foreach (Fisk fisk in query) {
if (list.Count == 0) {
list.Add(fisk);
} else {
Fisk last = list[list.Count - 1];
if (fisk.Sort != last.Sort || fisk.Arter.Name != last.Arter.Name) {
list.Add(fisk);
}
}
}
Generally I advise against ordering before doing something that possibly destroys that order (such as GroupBy can do in SQL as generated by LINQ2SQL). Also try ordering only the stuff you are going to use. You can improve your query performance, if you limit the selection only the required fields/properties. You can fiddle around with this sample and use your real backend instead:
var Fisks=new[]{
new {Havn=new{Id=1},Date=DateTime.MinValue,Arter=new{Name="A"},Sort=1,Title="A1"},
new {Havn=new{Id=1},Date=DateTime.MinValue.AddDays(1),Arter=new{Name="A"},Sort=1,Title="A2"},
new {Havn=new{Id=1},Date=DateTime.MinValue,Arter=new{Name="B"},Sort=1,Title="B1",},
new {Havn=new{Id=1},Date=DateTime.MinValue.AddDays(2),Arter=new{Name="B"},Sort=1,Title="B2",},
new {Havn=new{Id=1},Date=DateTime.MinValue.AddDays(2),Arter=new{Name="B"},Sort=1,Title="B3",},
};
var stopwatch=Stopwatch.StartNew();
var one = Fisks.Where(s=>s.Havn.Id == 1).OrderByDescending(s=>s.Date);
var two = one.GroupBy(s=>new {s.Arter.Name, s.Sort});
var three = two.Select(s=>s.FirstOrDefault());
var answer=three.ToArray();
stopwatch.Stop();
stopwatch.ElapsedTicks.Dump("elapsed Ticks");
answer.Dump();
stopwatch.Restart();
answer=Fisks
.Where(f=>f.Havn.Id.Equals(1))
.GroupBy(s=>new {s.Arter.Name, s.Sort},(k,g)=>new{
s=g.OrderByDescending(s=>s.Date).First()//TOP 1 -> quite fast
})
.Select(g=>g.s)
.OrderByDescending(s=>s.Date) // only fully order results
.ToArray();
stopwatch.Stop();
stopwatch.ElapsedTicks.Dump("elapsed Ticks");
answer.Dump();
If you're working against any SQL Server you should check the generated SQL in LINQPad. You don't want to end up with a n+1 Query. Having an index on Havn.Id and Fisks.Date might also help.

Linq with boolean function to relational db in Entity Framework

Probably a few things wrong with my code here but I'm mostly having a problem with the syntax. Entry is a model for use in Entries and contains a TimeStamp for each entry. Member is a model for people who are assigned entries and contains an fk for Entry. I want to sort my list of members based off of how many entries the member has within a given period (arbitrarily chose 30 days).
A. I'm not sure that the function I created works correctly, but this is aside from the main point because I haven't really dug into it yet.
B. I cannot figure out the syntax of the Linq statement or if it's even possible.
Function:
private bool TimeCompare(DateTime TimeStamp)
{
DateTime bound = DateTime.Today.AddDays(-30);
if (bound <= TimeStamp)
{
return true;
}
return false;
}
Member list:
public PartialViewResult List()
{
var query = repository.Members.OrderByDescending(p => p.Entry.Count).Where(TimeCompare(p => p.Entry.Select(e => e.TimeStamp));
//return PartialView(repository.Members);
return PartialView(query);
}
the var query is my problem here and I can't seem to find a way to incorporate a boolean function into a .where statement in a linq.
EDIT
To summarize I am simply trying to query all entries timestamped within the past 30 days.
I also have to emphasize the relational/fk part as that appears to be forcing the Timestamp to be IEnumerable of System.Datetime instead of simple System.Datetime.
This errors with "Cannot implicitly convert timestamp to bool" on the E.TimeStamp:
var query = repository.Members.Where(p => p.Entry.First(e => e.TimeStamp) <= past30).OrderByDescending(p => p.Entry.Count);
This errors with Operator '<=' cannot be applied to operands of type 'System.Collections.Generic.IEnumerable' and 'System.DateTime'
var query = repository.Members.Where(p => p.Entry.Select(e => e.TimeStamp) <= past30).OrderByDescending(p => p.Entry.Count);
EDIT2
Syntactically correct but not semantically:
var query = repository.Members.Where(p => p.Entry.Select(e => e.TimeStamp).FirstOrDefault() <= timeComparison).OrderByDescending(p => p.Entry.Count);
The desired result is to pull all members and then sort by the number of entries they have, this pulls members with entries and then orders by the number of entries they have. Essentially the .where should somehow be nested inside of the .count.
EDIT3
Syntactically correct but results in a runtime error (Exception Details: System.ArgumentException: DbSortClause expressions must have a type that is order comparable.
Parameter name: key):
var query = repository.Members.OrderByDescending(p => p.Entry.Where(e => e.TimeStamp <= timeComparison));
EDIT4
Closer (as this line compiles) but it doesn't seem to be having any effect on the object. Regardless of how many entries I add for a user it doesn't change the sort order as desired (or at all).
var timeComparison = DateTime.Today.AddDays(-30).Day;
var query = repository.Members.OrderByDescending(p => p.Entry.Select(e => e.TimeStamp.Day <= timeComparison).FirstOrDefault());
A bit of research dictates that Linq to Entities (IE: This section)
...var query = repository.Members.OrderByDescending(...
tends to really not like it if you use your own functions, since it will try to map to a SQL variant.
Try something along the lines of this, and see if it helps:
var query = repository.Members.AsEnumerable().Where(TimeCompare(p => p.Entry.Select(e => e.TimeStamp).OrderByDescending(p => p.Entry.Count));
Edit: I should just read what you are trying to do. You want it to grab only the ones within the last X number of days, correct? I believe the following should work, but I would need to test when I get to my home computer...
public PartialViewResult List()
{
var timeComparison = DateTime.Today.AddDays(-30);
var query = repository.Members.Where(p => p.Entry.Select(e => e.TimeStamp).FirstOrDefault() <= timeComparison).OrderByDescending(p => p.Entry.Count));
//return PartialView(repository.Members);
return PartialView(query);
}
Edit2: This may be a lack of understanding from your code, but is e the same type as p? If so, you should be able to just reference the timestamp like so:
public PartialViewResult List()
{
var timeComparison = DateTime.Today.AddDays(-30);
var query = repository.Members.Where(p => p.TimeStamp <= timeComparison).OrderByDescending(p => p.Entry.Count));
//return PartialView(repository.Members);
return PartialView(query);
}
Edit3: In Edit3, I see what you are trying to do now (I believe). You're close, but OrderByDescending would need to go on the end. Try this:
var query = repository.Members
.Select(p => p.Entry.Where(e => e.TimeStamp <= timeComparison))
.OrderByDescending(p => p.Entry.Count);
Thanks for all the help Dylan but here is the final answer:
public PartialViewResult List()
{
var timeComparison = DateTime.Today.AddDays(-30).Day;
var query = repository.Members
.OrderBy(m => m.Entry.Where(e => e.TimeStamp.Day <= timeComparison).Count());
return PartialView(query);
}

How to combine a slow moving observable with the most recent value of a fast moving observable

A friend asked me this - I thought it was a good question so I am reposting it and my answer here:
I have these two streams:
var slowSource = Observable.Interval(TimeSpan.FromSeconds(1));
var fastSource = Observable.Interval(TimeSpan.FromMilliseconds(100));
and I’d like to combine them so that I produce output pairs which contain
- The next value from slowSource
- The most recent value from fastSource
I only want one output pair per value from slowSource. For example, the first three output values might look something like this:
0,8
1,18,
2,28
A join gets me close but I end up with more than one output per slowSource (due to the way that the durations overlap, I guess):
var qry = slowSource.Join(
right: fastSource,
leftDurationSelector: i => fastSource,
rightDurationSelector: j => fastSource,
resultSelector: (l, r) => {return new {L = l, R = r};})
.Subscribe(Console.WriteLine);
Using a GroupJoin and a Select produces output that looks about right:
var qry2 = slowSource.GroupJoin(
right: fastSource,
leftDurationSelector: i => fastSource,
rightDurationSelector: j => fastSource,
resultSelector: (l, r) => {return new {L= l, R = r};}
)
.Select(async item => {
return new {L = item.L, R = await item.R.FirstAsync()};})
.Subscribe(Console.WriteLine);
However, this doesn’t feel like a great approach; there must be a better way that uses the other combinators to do stuff like this in a simpler way. Is there?
How about this overload of Zip which combines an IObservable with an IEnumerable. It uses MostRecent() to get a sample of the latest value of a stream as the enumerable.
slowSource.Zip(fastSource.MostRecent(0), (l,r) => new {l,r})
Observable.CombineLatest http://msdn.microsoft.com/en-us/library/hh211991(v=vs.103).aspx can help and then you need to resample the fastSource at the rate of the slowSource.
var combinedObservable = slowSource.CombineLatest
( fastSource.Sample(slowSource)
, (s,f)=>new {s,f}
);

Linq select from list where property matches a condition

I have an collection of Videos who have a field typeidentifier that tells me if a video is a trailer, clip or interview.
I need to put them in 3 seperate collections.
var trailers = myMediaObject.Videos.Where(type => type.TypeIdentifier == 1);
var clips = myMediaObject.Videos.Where(type => type.TypeIdentifier == 2);
var interviews = myMediaObject.Videos.Where(type => type.TypeIdentifier == 3);
Is there a more efficient way of doing this? I love using Linq here though.
How about:
var lookup = myMediaObject.Videos.ToLookup(type => type.TypeIdentifier);
var trailers = lookup[1];
var clips = lookup[2];
var interviews = lookup[3];
Note that this will materialize the results immediately, whereas your first version didn't. If you still want deferred execution, you might want to use GroupBy instead - although that will be slightly trickier later on. It really depends what you need to do with the results.

Lambda expression to find difference

With the following data
string[] data = { "a", "a", "b" };
I'd very much like to find duplicates and get this result:
a
I tried the following code
var a = data.Distinct().ToList();
var b = a.Except(a).ToList();
obviously this didn't work, I can see what is happening above but I'm not sure how to fix it.
When runtime is no problem, you could use
var duplicates = data.Where(s => data.Count(t => t == s) > 1).Distinct().ToList();
Good old O(n^n) =)
Edit: Now for a better solution. =)
If you define a new extension method like
static class Extensions
{
public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
{
HashSet<T> hash = new HashSet<T>();
foreach (T item in input)
{
if (!hash.Contains(item))
{
hash.Add(item);
}
else
{
yield return item;
}
}
}
}
you can use
var duplicates = data.Duplicates().Distinct().ToArray();
Use the group by stuff, the performance of these methods are reasonably good. Only concern is big memory overhead if you are working with large data sets.
from g in (from x in data group x by x)
where g.Count() > 1
select g.Key;
--OR if you prefer extension methods
data.GroupBy(x => x)
.Where(x => x.Count() > 1)
.Select(x => x.Key)
Where Count() == 1 that's your distinct items and where Count() > 1 that's one or more duplicate items.
Since LINQ is kind of lazy, if you don't want to reevaluate your computation you can do this:
var g = (from x in data group x by x).ToList(); // grouping result
// duplicates
from x in g
where x.Count() > 1
select x.Key;
// distinct
from x in g
where x.Count() == 1
select x.Key;
When creating the grouping a set of sets will be created. Assuming that it's a set with O(1) insertion the running time of the group by approach is O(n). The incurred cost for each operation is somewhat high, but it should equate to near linear performance.
Sort the data, iterate through it and remember the last item. When the current item is the same as the last, its a duplicate. This can be easily implemented either iteratively or using a lambda expression in O(n*log(n)) time.

Categories