Rx Window, Join, GroupJoin? - C#

I have generated and tested two observables that need to be combined to execute a single query.
A user can have multiple roles. Whenever their role Id changes, the data needs to be updated, but only if the query is active (i.e. some control currently needs the data).
A role Id change can also happen while the query is suspended; when the query becomes active again, the data should then load.
//Tuple has the Id of the current Role and the time that the Id updated
IObservable<Tuple<Guid, DateTime>> idUpdate
//Tuple has the state of the query (true=active or false=suspended)
//and the time the state of the query updated
IObservable<Tuple<bool, DateTime>> queryStateUpdate
I would like to create
//A hot observable that pushes true whenever the query should execute
IObservable<bool> execute
I broke it down into two cases that could be merged, but I cannot figure out how to create the case observables.
case a) the role Id updated and the last state was Active
case b) the state updated to Active and this is the first Active state since the role Id updated
I have looked through the videos, Lee Campbell's site, the beginner's TOC, etc., but I can't seem to find a good example of this kind of Rx join. Any ideas on how to create the execute or case observables?

Given the problem as described - which is a little vague, as I don't see what the actual Id (Guid) or the DateTime values are used for - I've got the following query, which appears to solve your problem:
IObservable<bool> execute =
    idUpdate
        .Publish(_idUpdate =>
            from qsu in queryStateUpdate
            select qsu.Item1
                ? _idUpdate.Select(x => true)
                : Observable.Empty<bool>())
        .Switch();
I've tested this with the following idUpdate & queryStateUpdate observables.
var rnd = new Random();

IObservable<Tuple<Guid, DateTime>> idUpdate =
    Observable
        .Generate(
            0,
            n => n < 10000,
            n => n + 1,
            n => Tuple.Create(Guid.NewGuid(), DateTime.Now),
            n => TimeSpan.FromSeconds(rnd.NextDouble() * 0.1));

IObservable<Tuple<bool, DateTime>> queryStateUpdate =
    Observable
        .Generate(
            0,
            n => n < 100,
            n => n + 1,
            n => n % 2 == 0,
            n => TimeSpan.FromSeconds(rnd.NextDouble() * 2.0))
        .StartWith(true)
        .DistinctUntilChanged()
        .Select(b => Tuple.Create(b, DateTime.Now));
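To eyeball the output while testing, a throwaway subscription like the following works (this is just test scaffolding, not part of the solution itself):
execute.Subscribe(_ => Console.WriteLine("execute @ " + DateTime.Now.ToString("HH:mm:ss.fff")));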
If you can provide some clarification around your problem I will probably be able to provide a better answer to suit your needs.
EDIT: Added the "replay(1)" behaviour required when the Id changes while the query is inactive.
Please note that I have also gotten rid of the need for tuples with DateTime.
IObservable<Guid> idUpdate = ...
IObservable<bool> queryStateUpdate = ...
var replay = new ReplaySubject<Guid>(1);
var disposer = new SerialDisposable();

// While inactive, cache the latest Id in a fresh ReplaySubject(1);
// when active again, replay that cached Id and then continue with live Ids.
Func<bool, IObservable<bool>, IObservable<Guid>,
    IObservable<Guid>> getSwitch = (qsu, qsus, iu) =>
{
    if (qsu)
    {
        // Active: replay the Id cached while suspended, merged with live updates.
        return replay.Merge(iu);
    }
    else
    {
        // Suspended: start caching the latest Id until the next state change.
        replay.Dispose();
        replay = new ReplaySubject<Guid>(1);
        disposer.Disposable = iu.TakeUntil(qsus).Subscribe(replay);
        return Observable.Empty<Guid>();
    }
};

var query =
    queryStateUpdate
        .DistinctUntilChanged()
        .Publish(qsus =>
            idUpdate
                .Publish(ius =>
                    qsus.Select(qsu => getSwitch(qsu, qsus, ius))))
        .Switch();

I read the question as saying that there is a stream of notifications idUpdate, which will be processed as long as queryStateUpdate is set. When queryStateUpdate isn't set, then the notifications should pause until queryStateUpdate is set again.
In which case the join operator is not going to solve your problem.
I would suggest that you need some form of cache for while queryStateUpdate is unset, e.g.
List<Tuple<Guid, DateTime>> cache = new List<Tuple<Guid, DateTime>>();
Subject<Tuple<Guid, DateTime>> execute = new Subject<Tuple<Guid, DateTime>>();

idUpdate.Subscribe(x =>
{
    if (queryStateUpdate.Last().Item1) // might be missing something here with Last; you might need to copy the state out
        execute.OnNext(x);
    else
        cache.Add(x);
});

queryStateUpdate.Subscribe(x =>
{
    if (x.Item1)
    {
        // needs threadsafety
        foreach (var item in cache)
            execute.OnNext(item);
        cache.Clear();
    }
});
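To address the "needs threadsafety" comment, a minimal option (the gate object is my addition, not part of the original answer) is to take one lock around every cache access in both subscriptions:
var gate = new object();

// in the idUpdate subscription:
lock (gate) { cache.Add(x); }

// in the queryStateUpdate subscription:
lock (gate)
{
    // no Ids can be cached while we drain
    foreach (var item in cache)
        execute.OnNext(item);
    cache.Clear();
}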

Thanks to Enigmativity and AlSki. Using a cache I came up with the answer.
var execute = new Subject<Guid>();
var cache = new Stack<Guid>();

idUpdate
    .CombineLatest(queryStateUpdate, (id, qs) => new { id, qs })
    .Subscribe(anon =>
    {
        var id = anon.id;
        var queryState = anon.qs;

        // The role Id updated after the query state updated
        if (id.Item2 > queryState.Item2)
        {
            // If the query state is active, call execute
            if (queryState.Item1)
            {
                cache.Clear();
                execute.OnNext(id.Item1);
                return;
            }
            // If the Id updated and the state is suspended, cache it
            cache.Push(id.Item1);
        }
        // The query state updated after the role Id
        else if (queryState.Item2 > id.Item2)
        {
            // If the query state is active and a role Id update has been cached, call execute
            if (queryState.Item1 && cache.Count > 0)
            {
                execute.OnNext(cache.Pop());
                cache.Clear();
            }
        }
    });
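Consumers of execute then just subscribe to it; for example (LoadData here is a hypothetical method that runs the query for the new role):
execute.Subscribe(roleId => LoadData(roleId));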

Related

Renumbering database rows is inefficient

My database has a Task table with a Sequence column. The Sequence column specifies the order of the tasks.
In some cases, I need to change the order. So I would use something like this:
var tasks = dbContext.Tasks
    .Where(t => t.UserId == userId)
    .ToList();
for (int i = 0; i < tasks.Count; i++)
{
    // Set new sequence
    tasks[i].Sequence = i;
}
dbContext.SaveChanges();
It seems rather inefficient to retrieve every column of every Task in the set.
Is there a more efficient way to do this?
Note: Please don't get caught up in the fact that I'm simply setting Sequence to i in the code above. The real code will receive data that indicates the correct values. But if I can optimize the code above, I can then adapt it to my final needs.
You should be able to pull down only the column you want to update by using a Select statement, and then, according to this answer, update just that column.
This example might work, but unfortunately I can't test it right now:
// Query just a single column
var tasks = dbContext.Tasks
    .Where(t => t.UserId == userId)
    .Select(t => new Task { UserId = t.UserId, Sequence = t.Sequence })
    .ToList();

// Update a single column and tell EF to track it
for (int i = 0; i < tasks.Count; i++)
{
    tasks[i].Sequence = i;
    dbContext.Attach(tasks[i]);
    dbContext.Entry(tasks[i]).Property(t => t.Sequence).IsModified = true;
}

// Save the changes to that column
dbContext.SaveChanges();
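One caveat with the sketch above: Attach locates an entity by its primary key, so the projection must include the key property. I've assumed UserId is the key since it's what the query filters on; if Task instead has its own (hypothetical) Id key, project that too, e.g. new Task { Id = t.Id, Sequence = t.Sequence }.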

The query processor ran out of internal resources and could not produce a query plan in EF

I have a query in EF that checks whether the values in a List of strings exist in another table.
Please see the query below for details.
Code
List<string> ItmsStock = item.Select(ds => ds.ItemNum).ToList(); // This list currently contains 80,000 records.
this.Db.Database.CommandTimeout = 180;
var existsStckList = Db.Stocktakes.Where(ds => ItmsStock.Contains(ds.ItemNo)).Select(ds => ds.ItemNo).ToList();
item.RemoveAll(ds => existsStckList.Contains(ds.ItemNum));
var ItmsExists = Db.Items.Where(ds => ItmsStock.Contains(ds.ItemNo)).Select(ds => ds.ItemNo).ToList();
ItmsExists = Db.Stocktakes.Where(ds => !ItmsExists.Contains(ds.ItemNo)).Select(ds => ds.ItemNo).ToList();
I searched on the internet and found that the generated SQL uses IN to check for existence, so the limit on the size of an IN clause is what causes the problem. My question is: how can I perform the above actions efficiently without using a for loop?
I'd appreciate it if anybody could help me out.
Edit
Previously I had the code below; after facing performance issues with it, I wrote the version above.
foreach (var stockitems in item)
{
    if (Db.Stocktakes.Any(a => a.ItemNo == stockitems.ItemNum))
    {
        StockResult ss = new StockResult();
        ss.ItemNumber = stockitems.ItemNum;
        ss.FileName = stockitems.FileName;
        Stockres.Add(ss);
    }
    else if (!Db.Stocktakes.Any(a => a.ItemNo == stockitems.ItemNum)
             && Db.Items.Any(a => a.ItemNo == stockitems.ItemNum))
    {
        var ItemNo = stockitems.ItemNum;
        var AdminId = Convert.ToInt32(Session["AccId"]);
        var CreatedOn = System.DateTime.Now;
        int dbres = Db.Database.ExecuteSqlCommand(
            "insert into Stocktake values({0},{1},{2})", ItemNo, AdminId, CreatedOn);
        Db.SaveChanges();
        totalcount = totalcount + 1;
    }
    else
    {
        StockResult sss = new StockResult();
        sss.ItemNumber = stockitems.ItemNum;
        sss.FileName = stockitems.FileName;
        Stockitemsdup.Add(sss);
    }
}
Thanks.
Issue batches of 1000 item IDs to the database, or use native SQL and submit a table-valued parameter, or a temp table filled with SqlBulkCopy.
I'm surprised you got this particular message. The parameter limit is about 2000 parameters; your query should have been rejected.
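For the batching option, here's a minimal sketch using the names from the question's code (1,000 ids per round trip keeps each IN clause well under the limit):
var existsStckList = new List<string>();
for (int i = 0; i < ItmsStock.Count; i += 1000)
{
    // Take the next slice of up to 1,000 ids and query them in one round trip
    var batch = ItmsStock.GetRange(i, Math.Min(1000, ItmsStock.Count - i));
    existsStckList.AddRange(
        Db.Stocktakes
            .Where(ds => batch.Contains(ds.ItemNo))
            .Select(ds => ds.ItemNo));
}
item.RemoveAll(ds => existsStckList.Contains(ds.ItemNum)); // a HashSet<string> would speed this step up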

Complexity limits of Linq queries

I'm a big fan of Linq, and I have been really enjoying the power of expression trees etc. But I have found that whenever I try to get too clever with my queries, I hit some kind of limitation in the framework: while the query can take a very short time to run on the database (as shown by performance analyzer), the results take ages to materialize. When that happens I know I've been too fancy, and I start breaking the query up into smaller, bite sized chunks - so I have a solution for that, though it might not always be the most optimal.
But I'd like to understand:
What is it that pushes the Linq framework over the edge in terms of materializing the query results?
Where can I read about the mechanism of materializing query results?
Is there a certain measurable complexity limit for Linq queries that should be avoided?
What design patterns are known to cause this problem, and what patterns can remedy it?
EDIT: As requested in comments, here's an example of a query that I measured to run on SQL Server in a few seconds, but took almost 2 minutes to materialize. I'm not going to try explaining all the stuff in context; it's here just so you can view the constructs and see an example of what I'm talking about:
Expression<Func<Staff, TeacherInfo>> teacherInfo =
    st => new TeacherInfo
    {
        ID = st.ID,
        Name = st.FirstName + " " + st.LastName,
        Email = st.Email,
        Phone = st.TelMobile,
    };
var step1 =
    currentReportCards.AsExpandable()
        .GroupJoin(db.ScholarReportCards,
            current =>
                new { current.ScholarID, current.AcademicTerm.AcademicYearID },
            past => new { past.ScholarID, past.AcademicTerm.AcademicYearID },
            (current, past) => new
            {
                Current = current,
                PastCards =
                    past.Where(
                        rc =>
                            rc.AcademicTerm.StartDate <
                                current.AcademicTerm.StartDate &&
                            rc.AcademicTerm.Grade == current.AcademicTerm.Grade &&
                            rc.AcademicTerm.SchoolID == current.AcademicTerm.SchoolID)
            });
// This materialization is what takes a long time:
var subjects = step1.SelectMany(x =>
    from key in x.Current.Subjects
        .Select(s => new { s.Subject.SubjectID, s.Subject.SubjectCategoryID })
        .Union(x.PastCards.SelectMany(c => c.Subjects)
            .Select(s => new { s.Subject.SubjectID, s.Subject.SubjectCategoryID }))
    join cur in x.Current.Subjects
        on key equals new { cur.Subject.SubjectID, cur.Subject.SubjectCategoryID }
        into jcur
    from cur in jcur.DefaultIfEmpty()
    join past in x.PastCards.SelectMany(p => p.Subjects)
        on key equals new { past.Subject.SubjectID, past.Subject.SubjectCategoryID }
        into past
    select new
    {
        x.Current.ScholarID,
        IncludeInContactSection =
            // ReSharper disable ConstantNullCoalescingCondition
            (bool?)cur.Subject.IncludeInContactSection ?? false,
        IncludeGrades = (bool?)cur.Subject.IncludeGrades ?? true,
        // ReSharper restore ConstantNullCoalescingCondition
        SubjectName =
            cur.Subject.Subject.Name ?? past.FirstOrDefault().Subject.Subject.Name,
        SubjectCategoryName = cur.Subject.SubjectCategory.Description,
        ClassInfo = (from ce in myDb.ClassEnrollments
                         .Where(ce => ce.Class.SubjectID == cur.Subject.SubjectID
                                      && ce.ScholarID == x.Current.ScholarID)
                         .Where(enrollmentExpr)
                         .OrderByDescending(ce => ce.TerminationDate ?? DateTime.Today)
                     let teacher = ce.Class.Teacher
                     let secTeachers = ce.Class.SecondaryTeachers
                     select new
                     {
                         ce.Class.Nickname,
                         Primary = teacherInfo.Invoke(teacher),
                         Secondaries = secTeachers.AsQueryable().AsExpandable()
                             .Select(ti => teacherInfo.Invoke(ti))
                     })
                    .FirstOrDefault(),
        Comments = cur.Comments
            .Select(cc => new
            {
                Staff = cc.Staff.FirstName + " " + cc.Staff.LastName,
                Comment = cc.CommentTemplate.Text ?? cc.CommentFreeText
            }),
        // ReSharper disable ConstantNullCoalescingCondition
        DisplayOrder = (byte?)cur.Subject.DisplayOrder ?? (byte)99,
        // ReSharper restore ConstantNullCoalescingCondition
        cur.Percentile,
        cur.Score,
        cur.Symbol,
        cur.MasteryLevel,
        PastScores = past.Select(p => new
        {
            p.Score,
            p.Symbol,
            p.MasteryLevel,
            p.ScholarReportCard.AcademicTermID
        }),
        Assessments = cur.Assessments
            .Select(a => new
            {
                a.ScholarAssessment.AssessmentID,
                a.ScholarAssessment.Assessment.Description,
                a.ScholarAssessment.Assessment.Type.Nickname,
                a.ScholarAssessment.AssessmentDate,
                a.ScoreDesc,
                a.ScorePerc,
                a.MasteryLevel,
                a.ScholarAssessment.Assessment.Type.AssessmentFormat,
                a.ScholarAssessment.PublishedStatus,
                a.ScholarAssessment.FPScore,
                a.ScholarAssessment.TotalScore,
                a.ScholarAssessment.Assessment.Type.ScoreType,
                a.ScholarAssessment.Assessment.Type.OverrideBelowLabel,
                a.ScholarAssessment.Assessment.Type.OverrideApproachingLabel,
                a.ScholarAssessment.Assessment.Type.OverrideMeetingLabel,
                a.ScholarAssessment.Assessment.Type.OverrideExceedingLabel,
            })
    })
    .ToList();
LINQ uses deferred execution for some tasks - for example, nothing runs until you iterate the IEnumerable<> - so what you call materialization includes the actual data fetching.
var reportCards = db.ScholarReportCards.Where(cr => ...); // this prepares the query
foreach (var rc in reportCards) {} // this executes your query and calls the DB
I think that if you trace/time queries on your SQL server you may see some queries arriving during the "materialization" step. This problem may even be exacerbated by anti-patterns such as the "select N+1" problem: for example, it looks like you're not including the AcademicTerm objects in your request; if you don't, resolving them will result in a select N+1, that is, for every ScholarReportCard there will be a call to the DB to lazily resolve the attached AcademicTerm.
If we focus on the LINQ-to-DB aspect, at least try not to (see the sketch after this list):
select N+1: Include the related data tables you will need
select too much data: retrieve only the columns you need in your projection, and Include only the tables you need
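A minimal sketch of both points against the query above (assuming an EF context; the string overload of Include is used here):
// select N+1: eager-load the term with each card, instead of one lazy query per card
var cards = db.ScholarReportCards.Include("AcademicTerm").ToList();

// too much data: project only the columns you actually need
var slim = db.ScholarReportCards
    .Select(rc => new { rc.ScholarID, rc.AcademicTerm.StartDate, rc.AcademicTerm.Grade })
    .ToList();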

Optimize LINQ to Objects query

I have around 200K records in a list, and I'm looping through them and forming another collection. This works fine on my local 64-bit Win 7 machine, but when I move it to a Windows Server 2008 R2, it takes a lot of time; the difference is almost an hour!
I tried looking at compiled queries and am still figuring them out.
For various reasons, we can't do a database join and retrieve the child values.
Here is the code:
// listOfDetails is another collection
List<SomeDetails> myDetails = null;
foreach (CustomerDetails myItem in customerDetails)
{
    var myList = from ss in listOfDetails
                 where ss.CustomerNumber == myItem.CustomerNum
                       && ss.ID == myItem.ID
                 select ss;
    myDetails = (List<SomeDetails>)(myList.ToList());
    myItem.SomeDetails = myDetails;
}
I would do this differently:
var lookup = listOfDetails.ToLookup(x => new { x.CustomerNumber, x.ID });
foreach (var item in customerDetails)
{
    var key = new { CustomerNumber = item.CustomerNum, item.ID };
    item.SomeDetails = lookup[key].ToList();
}
The big benefit of this code is that it only has to loop through the listOfDetails once to build the lookup - which is nothing more than a hash map. After that we just get the values using the key, which is very fast as that is what hash maps are built for.
I don't know why you have the difference in performance, but you should be able to make that code perform better.
// listOfDetails is another collection
List<SomeDetails> listOfDetails = ...;
var detailsGrouped = listOfDetails.ToLookup(x => new { x.CustomerNumber, x.ID });
foreach (CustomerDetails myItem in customerDetails)
{
    var myList = detailsGrouped[new { CustomerNumber = myItem.CustomerNum, myItem.ID }];
    myItem.SomeDetails = myList.ToList();
}
The idea here is to avoid the repeated looping on myDetails, and build a hash based lookup instead. Once that is built, it is very cheap to do a lookup.
The inner ToList() is forcing an evaluation on each loop, which has got to hurt. A SelectMany might let you avoid the per-item ToList by flattening all the matches into one sequence first, something like this:
var details = customerDetails.SelectMany(item => listOfDetails
    .Where(detail => detail.CustomerNumber == item.CustomerNum)
    .Where(detail => detail.ID == item.ID));
If you first get all the SomeDetails and then assign them to the items, it might speed up. Or it might not. You should really profile to see where the time is being taken.
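Even a crude Stopwatch measurement around each candidate version will tell you where the time goes, for example:
var sw = System.Diagnostics.Stopwatch.StartNew();
// ... run the loop being tested ...
Console.WriteLine(sw.Elapsed);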
I think you'd probably benefit from a join here, so:
var mods = customerDetails
.Join(
listOfDetails,
x => Tuple.Create(x.ID, x.CustomerNum),
x => Tuple.Create(x.ID, x.CustomerNumber),
(a, b) => new {custDet = a, listDet = b})
.GroupBy(x => x.custDet)
.Select(g => new{custDet = g.Key,items = g.Select(x => x.listDet).ToList()});
foreach(var mod in mods)
{
mod.custDet.SomeDetails = mod.items;
}
I didn't compile this code...
With a join the matching of items from one list against another is done by building a hashtable-like collection (Lookup) of the second list in O(n) time. Then it's a matter of iterating the first list and pulling items from the Lookup. As pulling data from a hashtable is O(1), the iterate/match phase also only takes O(n), as does the subsequent GroupBy. So in all the operation should take ~O(3n) which is equivalent to O(n), where n is the length of the longer list.

Use LINQ to group a sequence by date with no gaps

I'm trying to select a subgroup of a list where items have contiguous dates, e.g.
ID  StaffID  Title              ActivityDate
--  -------  -----------------  ------------
1   41       Meeting with John  03/06/2010
2   41       Meeting with John  08/06/2010
3   41       Meeting Continues  09/06/2010
4   41       Meeting Continues  10/06/2010
5   41       Meeting with Kay   14/06/2010
6   41       Meeting Continues  15/06/2010
I'm using a pivot point each time; taking the example pivot item as 3, I'd like to get the following contiguous events around the pivot:
ID  StaffID  Title              ActivityDate
--  -------  -----------------  ------------
2   41       Meeting with John  08/06/2010
3   41       Meeting Continues  09/06/2010
4   41       Meeting Continues  10/06/2010
My current implementation is a laborious "walk" into the past, then into the future, to build the list:
var activity = ...; // item number 3: Meeting Continues (09/06/2010)
var relatedActivities = new List<Activity>(); // Activity = whatever your event class is
var orderedEvents = activities.OrderBy(a => a.ActivityDate).ToArray();

// Walk into the past until a gap is found
var precedingEvents = orderedEvents.TakeWhile(a => a.ID != activity.ID);
DateTime dayBefore;
var previousEvent = activity;
while (previousEvent != null)
{
    dayBefore = previousEvent.ActivityDate.AddDays(-1).Date;
    previousEvent = precedingEvents.TakeWhile(a => a.ID != previousEvent.ID).LastOrDefault();
    if (previousEvent != null)
    {
        if (previousEvent.ActivityDate.Date == dayBefore)
            relatedActivities.Insert(0, previousEvent);
        else
            previousEvent = null;
    }
}

// Walk into the future until a gap is found
var followingEvents = orderedEvents.SkipWhile(a => a.ID != activity.ID);
DateTime dayAfter;
var nextEvent = activity;
while (nextEvent != null)
{
    dayAfter = nextEvent.ActivityDate.AddDays(1).Date;
    nextEvent = followingEvents.SkipWhile(a => a.ID != nextEvent.ID).Skip(1).FirstOrDefault();
    if (nextEvent != null)
    {
        if (nextEvent.ActivityDate.Date == dayAfter)
            relatedActivities.Add(nextEvent);
        else
            nextEvent = null;
    }
}
The list relatedActivities should then contain the contiguous events, in order.
Is there a better way (maybe using LINQ) for this?
I had an idea of using .Aggregate() but couldn't think how to get the aggregate to break out when it finds a gap in the sequence.
Here's an implementation:
public static IEnumerable<IGrouping<int, T>> GroupByContiguous<T>(
    this IEnumerable<T> source,
    Func<T, int> keySelector
)
{
    int keyGroup = Int32.MinValue;
    int currentGroupValue = Int32.MinValue;
    return source
        .Select(t => new { obj = t, key = keySelector(t) })
        .OrderBy(x => x.key)
        .GroupBy(x =>
        {
            // A gap of more than 1 between consecutive keys starts a new group
            if (currentGroupValue + 1 < x.key)
            {
                keyGroup = x.key;
            }
            currentGroupValue = x.key;
            return keyGroup;
        }, x => x.obj);
}
You can either convert the dates to ints by means of subtraction, or adapt this to a DateTime version easily.
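For instance, to drive it from the question's ActivityDate (a sketch; baseDate is an arbitrary anchor for the day count, and activities is the sequence from the question):
var baseDate = new DateTime(2000, 1, 1);
var groups = activities.GroupByContiguous(a => (int)(a.ActivityDate.Date - baseDate).TotalDays);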
In this case I think that a standard foreach loop is probably more readable than a LINQ query:
var relatedActivities = new List<TActivity>();
bool found = false;
foreach (var item in activities.OrderBy(a => a.ActivityDate))
{
    int count = relatedActivities.Count;
    if ((count > 0) && (relatedActivities[count - 1].ActivityDate.Date.AddDays(1) != item.ActivityDate.Date))
    {
        if (found)
            break;
        relatedActivities.Clear();
    }
    relatedActivities.Add(item);
    if (item.ID == activity.ID)
        found = true;
}
if (!found)
    relatedActivities.Clear();
For what it's worth, here's a roughly equivalent -- and far less readable -- LINQ query:
var relatedActivities = activities
    .OrderBy(x => x.ActivityDate)
    .Aggregate
    (
        new { List = new List<TActivity>(), Found = false, ShortCircuit = false },
        (a, x) =>
        {
            if (a.ShortCircuit)
                return a;
            int count = a.List.Count;
            if ((count > 0) && (a.List[count - 1].ActivityDate.Date.AddDays(1) != x.ActivityDate.Date))
            {
                if (a.Found)
                    return new { a.List, a.Found, ShortCircuit = true };
                a.List.Clear();
            }
            a.List.Add(x);
            return new { a.List, Found = a.Found || (x.ID == activity.ID), a.ShortCircuit };
        },
        a => a.Found ? a.List : new List<TActivity>()
    );
Somehow, I don't think LINQ was truly meant to be used for bidirectional one-dimensional depth-first searches, but I constructed a working LINQ query using Aggregate. For this example I'm going to use a List instead of an array. Also, I'm going to use Activity to refer to whatever class you are storing the data in. Replace it with whatever is appropriate for your code.
Before we even start, we need a small helper function. List<T>.Add(T) returns void, but we want to be able to accumulate into a list and return the new list inside the aggregate function. So all you need is a simple function like the following.
private List<T> ListWithAdd<T>(List<T> src, T obj)
{
    src.Add(obj);
    return src;
}
First, we get the sorted list of all activities, and then initialize the list of related activities. This initial list will contain the target activity only, to start.
List<Activity> orderedEvents = activities.OrderBy(a => a.ActivityDate).ToList();
List<Activity> relatedActivities = new List<Activity>();
relatedActivities.Add(activity);
We have to break this into two lists, the past and the future, just like you currently do.
We'll start with the past; the construction should look mostly familiar. Then we'll aggregate all of it into relatedActivities. This uses the ListWithAdd function we wrote earlier. You could condense it into one line and skip declaring previousEvents as its own variable, but I kept it separate for this example.
var previousEvents = orderedEvents.TakeWhile(a => a.ID != activity.ID).Reverse();
relatedActivities = previousEvents
    .Aggregate<Activity, List<Activity>>(
        relatedActivities,
        (items, prevItem) =>
            items.OrderBy(a => a.ActivityDate).First().ActivityDate
                 .Subtract(prevItem.ActivityDate).Days.Equals(1)
                ? ListWithAdd(items, prevItem)
                : items)
    .ToList();
Next, we'll build the following events in a similar fashion, and likewise aggregate it.
var nextEvents = orderedEvents.SkipWhile(a => a.ID != activity.ID);
relatedActivities = nextEvents
    .Aggregate<Activity, List<Activity>>(
        relatedActivities,
        (items, nextItem) =>
            nextItem.ActivityDate
                    .Subtract(items.OrderBy(a => a.ActivityDate).Last().ActivityDate)
                    .Days.Equals(1)
                ? ListWithAdd(items, nextItem)
                : items)
    .ToList();
You can properly sort the result afterwards; at that point relatedActivities should contain all the activities with no gaps. It won't immediately stop when it hits the first gap (I don't think you can literally break out of a LINQ aggregate); instead it just ignores anything it finds past a gap.
Note that this example code only operates on the actual difference in time. Your example output seems to imply that you need some other comparison factors, but this should be enough to get you started. Just add the necessary logic to the date subtraction comparison in both entries.
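For instance, if runs should also be restricted to a single StaffID (a guess based on the sample data), the first aggregate's comparison might become:
relatedActivities = previousEvents.Aggregate<Activity, List<Activity>>(
    relatedActivities,
    (items, prevItem) =>
        items.OrderBy(a => a.ActivityDate).First().ActivityDate
             .Subtract(prevItem.ActivityDate).Days.Equals(1)
        && prevItem.StaffID == activity.StaffID // extra comparison factor
            ? ListWithAdd(items, prevItem)
            : items).ToList();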
