BehaviorSubject per group with GroupBy and Switch()

My code needs a GroupBy, with a separate BehaviorSubject per group that is consumed through Switch().
We have a stream of stock market values that we group by Symbol, and we detect level crossings against a set of levels (the levels are held in a BehaviorSubject, and Switch() ensures the latest values are always used).
So I need to go from this:
var feed = new Subject<double>();
var levels = new BehaviorSubject<double[]>(new[] { 400.0, 500.0, 600.0, 700.0 });
levels
.Select(thresholds => feed
.Buffer(2, 1)
.Where(x => x.Count == 2)
.Select(x => new { LevelsCrossed = thresholds.GetCrossovers(x[0], x[1]), Previous = x[0], Current = x[1] })
.Where(x => x.LevelsCrossed.Any())
.SelectMany(x => x.LevelsCrossed.Select(level => new ThresholdCrossedEvent(level, x.Previous, x.Current))))
.Switch()
.Subscribe(x => Console.WriteLine(JsonConvert.SerializeObject(x)));
And adapt the above to take a stream of Tick (below), grouped by Symbol, with each group running its own threshold detection on its Value.
class Tick
{
    public string Symbol { get; set; } // The name.
    public double Value { get; set; } // The value (double, to match Subject<double> and the test data).
}
Outline:
Take Market data
Group by Symbol
Alert on levels (depending on group name, using a dictionary of BehaviorSubject)
Output
Use Switch() to always use latest values from the dictionary
With a naive implementation I have a wrapper class (ReactiveSymbolFeed below); however, mixing non-reactive and reactive code can introduce concurrency issues that Reactive Extensions otherwise handles neatly.
Questions:
Am I introducing any side effects, or will this cause issues at scale (say 100,000 messages per second across 2,000 groups)?
Since we have many groups, each with its own BehaviorSubject that needs Switch(), can we rewrite our Reactive Extensions statement block to include the threshold levels per symbol group, or is the wrapper class below the right way to do this?
Further context and the wrapper class solution
Instead I create a ReactiveSymbolFeed wrapper that will form the value part of a dictionary per symbol key.
class ReactiveSymbolFeed
{
readonly BehaviorSubject<double[]> levels;
readonly Subject<double> feed;
public ReactiveSymbolFeed(double[] levels)
{
this.feed = new Subject<double>();
this.levels = new BehaviorSubject<double[]>(levels);
this.levels
.Select(thresholds => this.feed
.Buffer(2, 1)
.Where(x => x.Count == 2)
.Select(x => new { LevelsCrossed = thresholds.GetCrossovers(x[0], x[1]), Previous = x[0], Current = x[1] })
.Where(x => x.LevelsCrossed.Any())
.SelectMany(x => x.LevelsCrossed.Select(level => new ThresholdCrossedEvent(level, x.Previous, x.Current))))
.Switch()
.DistinctUntilChanged(x => x.Threshold)
.Subscribe(x => Console.WriteLine(JsonConvert.SerializeObject(x)));
}
public void OnNext(double value) => this.feed.OnNext(value);
public void UpdateThresholds(double[] levels) => this.levels.OnNext(levels);
}
And then use with the below:
// Setup the detection thresholds per Symbol - each Symbol has 1 set of thresholds
var dictionary = new Dictionary<string, ReactiveSymbolFeed>();
dictionary.Add("AAPL", new ReactiveSymbolFeed(new[] { 120.0, 125.0, 130.0 }));
dictionary.Add("VXX", new ReactiveSymbolFeed(new[] { 10.5, 15, 18.5, 20 }));
// Create some test tick data.
var ticks = new[]
{
new Tick { Symbol = "AAPL", Value = 119.0 },
new Tick { Symbol = "VXX", Value = 10.3 },
new Tick { Symbol = "VXX", Value = 10.8 },
new Tick { Symbol = "AAPL", Value = 121.0 },
new Tick { Symbol = "AAPL", Value = 121.0 }
// Followed by many other different Symbols and Values
};
// Loop through test data and dispatch it.
foreach(var tick in ticks)
{
if(dictionary.TryGetValue(tick.Symbol, out var value))
value.OnNext(tick.Value);
}
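The snippets above call GetCrossovers and construct ThresholdCrossedEvent without showing either. For anyone trying to run them, here is a minimal sketch of what they might look like. Both definitions are assumptions, not the asker's code: the constructor order mirrors the call in the question, the Threshold property is inferred from the DistinctUntilChanged call, and the crossing rule (previous strictly on one side, current at or past the level) is just one reasonable choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event type; the constructor order mirrors the call in the question.
public sealed class ThresholdCrossedEvent
{
    public ThresholdCrossedEvent(double threshold, double previous, double current)
    {
        Threshold = threshold;
        Previous = previous;
        Current = current;
    }

    public double Threshold { get; }
    public double Previous { get; }
    public double Current { get; }
}

public static class ThresholdExtensions
{
    // Assumed semantics: a level counts as crossed when the previous value is
    // strictly on one side of it and the current value reaches or passes it.
    public static IReadOnlyList<double> GetCrossovers(
        this double[] thresholds, double previous, double current)
    {
        return thresholds
            .Where(level =>
                (previous < level && current >= level) ||   // crossed upward
                (previous > level && current <= level))     // crossed downward
            .ToArray();
    }
}
```

Under these assumptions, the AAPL test ticks 119.0 then 121.0 cross the 120.0 level once, and the repeated 121.0 tick crosses nothing.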

Related

Using GroupBy to compute average or count based on the whole data until the corresponding date

I have the AssessmentItems DB object, which records which user evaluated (EvaluatorId), which submission (SubmissionId), based on which rubric item or criterion (RubricItemId), and when (DateCreated).
I group this object by RubricItemId and DateCreated to compute some daily statistics for each assessment criterion (or rubric item).
For example, I compute the AverageScore, which works fine and returns an output like: RubricItem: 1, Day: 15/01/2019, AverageScore: 3.2.
_context.AssessmentItems
.Include(ai => ai.RubricItem)
.Include(ai => ai.Assessment)
.Where(ai => ai.RubricItem.RubricId == rubricId && ai.Assessment.Submission.ReviewRoundId == reviewRoundId)
.Select(ai => new
{
ai.Id,
DateCreated = ai.DateCreated.ToShortDateString(),//.ToString(@"yyyy-MM-dd"),
ai.CurrentScore,
ai.RubricItemId,
ai.Assessment.SubmissionId,
ai.Assessment.EvaluatorId
})
.GroupBy(ai => new { ai.RubricItemId, ai.DateCreated })
.Select(g => new
{
g.Key.RubricItemId,
g.Key.DateCreated,
AverageScore = g.Average(ai => ai.CurrentScore),
NumberOfStudentsEvaluating = g.Select(ai => ai.EvaluatorId).Distinct().Count(),
}).ToList();
What I want to do is compute the average up to that day. That is, instead of calculating the average for the day alone, I want the average to include the assessment scores of all preceding days. In the same way, when I compute NumberOfStudentsEvaluating, I want the total number of students who participated in the evaluation up to that day.
One approach to achieve this could be to iterate through the result object and compute these properties again:
foreach (var i in result)
{
    i.AverageScore = result.Where(r => r.DateCreated <= i.DateCreated).Select(r => r.AverageScore).Average();
}
But, this is quite costly. I wonder if it is possible to tweak the code a bit to achieve this, or should I start from scratch with another approach.

If you split the query into two halves, you can compute the average the way you want (I also computed NumberOfStudentsEvaluating on the same criteria), but I am not sure whether EF/EF Core will be able to translate it to SQL:
var base1 = _context.AssessmentItems
.Include(ai => ai.RubricItem)
.Include(ai => ai.Assessment)
.Where(ai => ai.RubricItem.RubricId == rubricId && ai.Assessment.Submission.ReviewRoundId == reviewRoundId)
.Select(ai => new {
ai.Id,
ai.DateCreated,
ai.CurrentScore,
ai.RubricItemId,
ai.Assessment.SubmissionId,
ai.Assessment.EvaluatorId
})
.GroupBy(ai => ai.RubricItemId);
var ans1 = base1
.SelectMany(rig => rig.Select(ai => ai.DateCreated).Distinct().Select(DateCreated => new { RubricItemId = rig.Key, DateCreated, Items = rig.Where(b => b.DateCreated <= DateCreated) }))
.Select(g => new {
g.RubricItemId,
DateCreated = g.DateCreated.ToShortDateString(), //.ToString(@"yyyy-MM-dd"),
AverageScore = g.Items.Average(ai => ai.CurrentScore),
NumberOfStudentsEvaluating = g.Items.Select(ai => ai.EvaluatorId).Distinct().Count(),
}).ToList();
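If the provider cannot translate the query above, one fallback is to materialize the filtered rows and accumulate the statistics in memory, carrying a running sum, count, and evaluator set forward per rubric item. The sketch below is only an illustration of that idea: ScoreRow, RunningStat, and RunningStats are hypothetical names, EvaluatorId is assumed to be an int, and days are taken as DateCreated.Date.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shapes; the real entity has more fields.
public sealed record ScoreRow(int RubricItemId, DateTime DateCreated, double CurrentScore, int EvaluatorId);
public sealed record RunningStat(int RubricItemId, DateTime Day, double AverageScore, int NumberOfStudentsEvaluating);

public static class RunningStats
{
    public static List<RunningStat> Compute(IEnumerable<ScoreRow> rows)
    {
        var result = new List<RunningStat>();
        foreach (var rubricGroup in rows.GroupBy(r => r.RubricItemId))
        {
            // Running totals are carried forward from day to day.
            double sum = 0;
            int count = 0;
            var evaluators = new HashSet<int>();

            foreach (var day in rubricGroup.GroupBy(r => r.DateCreated.Date).OrderBy(g => g.Key))
            {
                foreach (var row in day)
                {
                    sum += row.CurrentScore;
                    count++;
                    evaluators.Add(row.EvaluatorId);
                }
                // Average and distinct evaluators over all rows up to and including this day.
                result.Add(new RunningStat(rubricGroup.Key, day.Key, sum / count, evaluators.Count));
            }
        }
        return result;
    }
}
```

Each row is visited once per rubric item, so this avoids re-scanning all earlier days for every output row.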

how to aggregate a linq query by different groupings

How do you perform multiple separate aggregations on different groupings in LINQ?
For example, I have a table:
UNO     YOS  Ranking  Score
123456  1    42       17
645123  3    84       20
I want to perform a set of aggregations on this data, both grouped and ungrouped, like:
var grouped = table.GroupBy(x => x.Score)
.Select(x => new
{
Score = x.Key.ToString(),
OverallAverageRank = x.Average(y => y.Ranking),
Year1RankAvg = x.Where(y => y.YOS == 1).Average(y => y.Ranking),
Year2RankAvg = x.Where(y => y.YOS == 2).Average(y => y.Ranking)
//...etc
});
I also want to perform different aggregations (standard deviation) on the same slices and whole-set data.
I can't figure out how to both group and not group by YOS at the same time. While this compiles fine, at runtime I get "Sequence contains no elements" whenever any of the per-YOS averages are included.

Like anything programming, when you have a sequence of similar items, use a collection. In this case, I left it IEnumerable, but you could make it a List, or a Dictionary by YOS, if desired.
var ans = table.GroupBy(t => t.Score)
.Select(tg => new {
Score = tg.Key,
OverallAverageRank = tg.Average(t => t.Ranking),
YearRankAvgs = tg.GroupBy(t => t.YOS).Select(tyg => new { YOS = tyg.Key, RankAvg = tyg.Average(t => t.Ranking) })
});
If you need the range of years from 1 to max (or some other number) filled in, you can modify the answer:
var ans2 = ans.Select(soryr => new {
soryr.Score,
soryr.OverallAverageRank,
YearRankDict = soryr.YearRankAvgs.ToDictionary(yr => yr.YOS),
YearMax = soryr.YearRankAvgs.Max(yr => yr.YOS)
})
.Select(soryr => new {
Score = soryr.Score,
OverallAverageRank = soryr.OverallAverageRank,
YearRankAvgs = Enumerable.Range(1, soryr.YearMax).Select(yos => soryr.YearRankDict.ContainsKey(yos) ? soryr.YearRankDict[yos] : new { YOS = yos, RankAvg = 0.0 }).ToList()
});
If you preferred, you could modify the original ans to return RankAvg as double? and put null in place of 0.0 when adding missing years.
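On the "Sequence contains no elements" error itself: in LINQ to Objects, Average throws on an empty sequence of non-nullable values, but the nullable overloads return null instead. A small sketch of that, assuming an in-memory collection (Row and SafeAverages are hypothetical names, and a database provider may behave differently):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type matching the table in the question.
public sealed record Row(int Uno, int Yos, int Ranking, int Score);

public static class SafeAverages
{
    // Projecting to double? selects the nullable Average overload, which
    // returns null for an empty slice instead of throwing.
    public static double? AverageRankForYear(IEnumerable<Row> rows, int yos) =>
        rows.Where(r => r.Yos == yos).Average(r => (double?)r.Ranking);
}
```

With the two sample rows from the question, AverageRankForYear(rows, 1) gives 42 and AverageRankForYear(rows, 2) gives null rather than an exception.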

Can this query about finding missing keys be improved? (either SQL or LINQ)

I am developing an ASP.NET MVC website and am looking for a way to improve this routine. It can be improved either at the LINQ level or at the SQL Server level. Ideally we could do it within one query call.
Here are the tables involved and some example data:
We have no constraint that every Key has to have each LanguageId value, and indeed the business logic does not allow such a constraint. However, at the application level, we want to warn the admin when a key is missing one or more language values. So I have this class and query:
public class LocalizationKeyWithMissingCodes
{
public string Key { get; set; }
public IEnumerable<string> MissingCodes { get; set; }
}
This method gets the Key list, as well as any missing codes (for example, if we have en + jp + ch language codes and a key only has values for en + ch, the list will contain jp):
public IEnumerable<LocalizationKeyWithMissingCodes> GetAllKeysWithMissingCodes()
{
var languageList = Utils.ResolveDependency<ILanguageRepository>().GetActive();
var languageIdList = languageList.Select(q => q.Id);
var languageIdDictionary = languageList.ToDictionary(q => q.Id);
var keyList = this.GetActive()
.Select(q => q.Key)
.Distinct();
var result = new List<LocalizationKeyWithMissingCodes>();
foreach (var key in keyList)
{
// Get missing codes
var existingCodes = this.Get(q => q.Active && q.Key == key)
.Select(q => q.LanguageId);
// ToList to make sure it is processed at application
var missingLangId = languageList.Where(q => !existingCodes.Contains(q.Id))
.ToList();
result.Add(new LocalizationKeyWithMissingCodes()
{
Key = key,
MissingCodes = missingLangId
.Select(q => languageIdDictionary[q.Id].Code),
});
}
result = result.OrderByDescending(q => q.MissingCodes.Count() > 0)
.ThenBy(q => q.Key)
.ToList();
return result;
}
I think my current solution is not good, because it makes a query call for each key. Is there a way to improve it, either by making it faster or by packing it into one query call?
EDIT: This is the final query of the answer:
public IQueryable<LocalizationKeyWithMissingCodes> GetAllKeysWithMissingCodes()
{
var languageList = Utils.ResolveDependency<ILanguageRepository>().GetActive();
var localizationList = this.GetActive();
return localizationList
.GroupBy(q => q.Key, (key, items) => new LocalizationKeyWithMissingCodes()
{
Key = key,
MissingCodes = languageList
.GroupJoin(
items,
lang => lang.Id,
loc => loc.LanguageId,
(lang, loc) => loc.Any() ? null : lang)
.Where(q => q != null)
.Select(q => q.Code)
}).OrderByDescending(q => q.MissingCodes.Count() > 0) // Show the missing keys on the top
.ThenBy(q => q.Key);
}

Another possibility, using LINQ:
public IEnumerable<LocalizationKeyWithMissingCodes> GetAllKeysWithMissingCodes(
List<Language> languages,
List<Localization> localizations)
{
return localizations
.GroupBy(x => x.Key, (key, items) => new LocalizationKeyWithMissingCodes
{
Key = key,
MissingCodes = languages
.GroupJoin( // check if there is one or more match for each language
items,
x => x.Id,
y => y.LanguageId,
(x, ys) => ys.Any() ? null : x)
.Where(x => x != null) // eliminate all languages with a match
.Select(x => x.Code) // grab the code
})
.Where(x => x.MissingCodes.Any()); // eliminate all complete keys
}

Here is the SQL logic to identify the keys that are missing "complete" language assignments:
SELECT
    allPairs.[Key],
    allPairs.LanguageId
FROM
(
    SELECT
        loc.[Key],
        lang.LanguageId
    FROM
        Language lang
    FULL OUTER JOIN
        Localization loc
        ON (1 = 1)
    WHERE
        lang.Active = 1
) allPairs
LEFT JOIN
    Localization loc
    ON (loc.[Key] = allPairs.[Key])
    AND (loc.LanguageId = allPairs.LanguageId)
WHERE
    loc.[Key] IS NULL;
To see all keys (instead of filtering):
SELECT
    allPairs.[Key],
    allPairs.LanguageId,
    CASE WHEN loc.[Key] IS NULL THEN 1 ELSE 0 END AS Flagged
FROM
(
    SELECT
        loc.[Key],
        lang.LanguageId
    FROM
        Language lang
    FULL OUTER JOIN
        Localization loc
        ON (1 = 1)
    WHERE
        lang.Active = 1
) allPairs
LEFT JOIN
    Localization loc
    ON (loc.[Key] = allPairs.[Key])
    AND (loc.LanguageId = allPairs.LanguageId);

Your code seems to be doing a lot of database querying and materialization.
In terms of LINQ, the single query would look like this.
We take the Cartesian product of the language and localization tables to get all combinations of (key, code), and then subtract the (key, code) tuples that exist in the relationship. This gives us the (key, code) combinations that don't exist.
var result = context.Languages.Join(context.Localizations, lang => true,
loc => true, (lang, loc) => new { Key = loc.Key, Code = lang.Code })
.Except(context.Languages.Join(context.Localizations, lang => lang.Id,
loc => loc.LanguageId, (lang, loc) => new { Key = loc.Key, Code = lang.Code }))
.GroupBy(r => r.Key).Select(r => new LocalizationKeyWithMissingCodes
{
Key = r.Key,
MissingCodes = r.Select(kc => kc.Code).ToList()
})
.ToList()
.OrderByDescending(lkmc => lkmc.MissingCodes.Count())
.ThenBy(lkmc => lkmc.Key).ToList();
P.S. I typed this LINQ query on the go, so let me know if it has syntax issues.
The gist of the query is that we take a Cartesian product and subtract the matching rows.
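To make the product-minus-matches idea concrete, here is a small in-memory sketch using value tuples. It is not the repository code from the question: MissingCodeFinder and the tuple shapes are hypothetical, and it runs in LINQ to Objects rather than against the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MissingCodeFinder
{
    public static Dictionary<string, List<string>> Find(
        IEnumerable<(int Id, string Code)> languages,
        IEnumerable<(string Key, int LanguageId)> localizations)
    {
        var langs = languages.ToList();
        var locs = localizations.ToList();

        // Every (key, code) pair that could exist: distinct keys x all languages.
        var allPairs =
            from key in locs.Select(l => l.Key).Distinct()
            from lang in langs
            select (Key: key, Code: lang.Code);

        // Every (key, code) pair that does exist.
        var existingPairs =
            from loc in locs
            join lang in langs on loc.LanguageId equals lang.Id
            select (Key: loc.Key, Code: lang.Code);

        // Subtract the existing pairs; keys with nothing missing drop out entirely.
        return allPairs
            .Except(existingPairs)
            .GroupBy(pair => pair.Key)
            .ToDictionary(g => g.Key, g => g.Select(pair => pair.Code).ToList());
    }
}
```

With languages en, jp, ch and a key that only has en and ch values, the result maps that key to a list containing jp, and fully translated keys do not appear.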

Multiple fields in GroupBy clause in LINQ (where one is computed field)

I have a LINQ query to which I need to add two fields as group-by keys. I can easily group by any number of plain column fields, but the problem occurs when one of the fields is a calculated field. I can't get my head around how to add the second field in this case.
var values = intermediateValues
//.GroupBy(x => new {x.Rate, x.ExpiryDate })
.GroupBy(r => new { Rate = ((int)(r.Rate / BucketSize)) * BucketSize })
.Select(y => new FXOptionScatterplotValue
{
Volume = y.Sum(z => z.TransactionType == "TERMINATION" ? -z.Volume : z.Volume),
Rate = y.Key.Rate,
ExpiryDate = y.Key.ExpiryDate,
Count = y.Count()
}).ToArray();
In the above code sample I would like to add ExpiryDate to my existing GroupBy clause, which already has a computed Rate field. The code looks like this in the VS editor

So just include it as you have in the commented-out code:
.GroupBy(r => new { Rate = ((int)(r.Rate / BucketSize)) * BucketSize,
r.ExpiryDate })

This might help you
var values = intermediateValues
//.GroupBy(x => new {x.Rate, x.ExpiryDate })
.GroupBy(r => new { Rate = ((int)(r.Rate / BucketSize) ) * BucketSize,ExpiryDate1 = r.ExpiryDate })
.Select(y => new FXOptionScatterplotValue
{
Volume = y.Sum(z => z.TransactionType == "TERMINATION" ? -z.Volume : z.Volume),
Rate = y.Key.Rate,
ExpiryDate = y.Key.ExpiryDate1,
Count = y.Count()
}).ToArray();
Just give the field a name such as ExpiryDate1 in the anonymous type, and use that name when reading it from the key.

Select single item from each group in multiple groups

I have a list (specifically IEnumerable) of items of a specific class:
internal class MyItem
{
public MyItem(DateTime timestamp, string code)
{
Timestamp = timestamp;
Code = code;
}
public DateTime Timestamp { get; private set; }
public string Code { get; private set; }
}
Within this list, there will be multiple items with the same code. Each will have a timestamp, which may or may not be unique.
I'm attempting to retrieve a dictionary of MyItem's (Dictionary<string, MyItem>) where the key is the code associated with the item.
public Dictionary<string, MyItem> GetLatestCodes(IEnumerable<MyItem> items, DateTime latestAllowableTimestamp)
Given this signature, how would I retrieve the MyItem with a timestamp closest to, but not after latestAllowableTimestamp for each code?
For example, given the following for input:
IEnumerable<MyItem> items = new List<MyItem>{
new MyItem(DateTime.Parse("1/1/2014"), "1"),
new MyItem(DateTime.Parse("1/2/2014"), "2"),
new MyItem(DateTime.Parse("1/3/2014"), "1"),
new MyItem(DateTime.Parse("1/4/2014"), "1"),
new MyItem(DateTime.Parse("1/4/2014"), "2")};
If the latestAllowableTimestamp is 1/3/2014, the result would contain only the following items:
Timestamp | Code
----------------
1/3/2014 | 1
1/2/2014 | 2
I can manage to filter the list down to only those timestamps prior to latestAllowableTimestamp, but I don't know linq well enough to pick the most recent for each code and insert it into a dictionary.
var output = items.Where(t => (t.Timestamp <= latestAllowableTimestamp)).GroupBy(t => t.Code);
At this point, I've ended up with two groups, but don't know how to select a single item across each group.

Here is the actual method you are trying to write. It even returns a dictionary and everything:
static Dictionary<string, MyItem> GetLatestCodes(
    IEnumerable<MyItem> items, DateTime latestAllowableTimestamp)
{
    return items
        // Keep only items at or before the cutoff.
        .Where(item => item.Timestamp <= latestAllowableTimestamp)
        .GroupBy(item => item.Code)
        // Within each code, take the most recent remaining item.
        .Select(group => group
            .OrderByDescending(item => item.Timestamp)
            .First())
        .ToDictionary(item => item.Code);
}
See Enumerable.ToDictionary

This is the part you should have posted in your question (as LB pointed out)
var list = new List<MyItem>()
{
new MyItem(){ code = "1" , timestamp = new DateTime(2014,1,1)},
new MyItem(){ code = "2" , timestamp = new DateTime(2014,1,2)},
new MyItem(){ code = "1" , timestamp = new DateTime(2014,1,3)},
new MyItem(){ code = "1" , timestamp = new DateTime(2014,1,4)},
new MyItem(){ code = "2" , timestamp = new DateTime(2014,1,4)}
};
DateTime latestAllowableTimestamp = new DateTime(2014, 1, 3);
This is my answer
var result = list.GroupBy(x => x.code)
.Select(x => x.OrderByDescending(y => y.timestamp)
.FirstOrDefault(z => z.timestamp <= latestAllowableTimestamp))
.ToList();

To create your Dictionary, you could construct your query like so:
var newDict = items.Where(a => a.Timestamp <= latestAllowableTimestamp)
.GroupBy(b => b.Timestamp)
.ToDictionary(c => c.First().Timestamp, c => c.First());
This should create a Dictionary from your data, with no duplicate days. Note that without the GroupBy query, you'll raise an exception, because ToDictionary doesn't filter out keys it's already seen.
And then if you wanted to get only one MyItem for any given code number, you could use this query:
newDict.Select(a => a.Value)
.OrderByDescending(b => b.Timestamp)
.GroupBy(c => c.Code)
.Select(d => d.First());
The First call returns only one element from each group. This will give you the MyItem closest to the latest date for any given code.
