Using Linq to look for gap in dates - c#

I have a single queryable that is ordered by id and then by date in descending order (most recent first), tracking when each item (id) was online. If a gap is detected (more than 1 day), I want to filter out all of the dates that come before the gap, so as to get only the most recent range of online days.
Currently I am looping through with for loops; however, the data set is very large, so I would like to improve performance using LINQ.
Are there any ways to compare the records by id, then date, and remove elements of that id after a gap is detected (current.date - next.date != 1)?
Id  Date
1   2022/01/01
1   2021/12/31
1   2021/12/25
2   2021/12/20
2   2021/12/19
2   2021/12/18
2   2021/12/15
would return:
Id  Date
1   2022/01/01
1   2021/12/31
2   2021/12/20
2   2021/12/19
2   2021/12/18

var result = queryable
    .GroupBy(entry => entry.Id)
    .AsEnumerable()
    .Select(entryGroup => entryGroup
        .OrderByDescending(entry => entry.Date)
        .Aggregate((EntryGroup: new List<Entry>(), GapDetected: false), (accumulated, current) =>
        {
            // Once a gap has been seen, ignore every older entry of this id.
            if (accumulated.GapDetected) return accumulated;
            var previous = accumulated.EntryGroup.LastOrDefault();
            if (previous == null || (previous.Date - current.Date).Days < 2) accumulated.EntryGroup.Add(current);
            else accumulated.GapDetected = true;
            return accumulated;
        }))
    .SelectMany(entryGroup => entryGroup.EntryGroup)
    .ToList();
Note that only the GroupBy portion of the code is actually executed as an SQL query; the rest of the query runs locally because it cannot be translated to SQL. I couldn't come up with a solution where the entire query could be translated to SQL, but I wanted to show how this can be done with LINQ.
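For comparison, here is a minimal in-memory sketch of the same idea using the indexed TakeWhile overload instead of Aggregate. It assumes the data has already been loaded into memory (LINQ to Objects, C# 9 or later for top-level statements); the tuple shape and the MostRecentRuns helper are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data matching the table in the question.
var entries = new List<(int Id, DateTime Date)>
{
    (1, new DateTime(2022, 1, 1)),
    (1, new DateTime(2021, 12, 31)),
    (1, new DateTime(2021, 12, 25)),
    (2, new DateTime(2021, 12, 20)),
    (2, new DateTime(2021, 12, 19)),
    (2, new DateTime(2021, 12, 18)),
    (2, new DateTime(2021, 12, 15)),
};

// Keeps only the most recent run of consecutive days for each Id.
List<(int Id, DateTime Date)> MostRecentRuns(IEnumerable<(int Id, DateTime Date)> source) =>
    source
        .GroupBy(e => e.Id)
        .SelectMany(g =>
        {
            var ordered = g.OrderByDescending(e => e.Date).ToList();
            // TakeWhile stops at the first entry that is more than one day
            // older than the entry immediately before it.
            return ordered.TakeWhile((e, i) => i == 0 || (ordered[i - 1].Date - e.Date).Days <= 1);
        })
        .ToList();

var result = MostRecentRuns(entries);
foreach (var (id, date) in result)
    Console.WriteLine($"{id} {date:yyyy-MM-dd}");
```

Like the Aggregate version, this does its work on the client; it only changes how the gap check is expressed.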

Related

How to search from top xxx rows in database using Entity Framework

I have a database with over 1 million records. I want to find a value in this database, but I know that this value is somewhere in the top 1000 records.
List<string> onHoldOrderslist =
    orderList.Where(m => (m.customerId == item.customerId)
                      && (m.Marketplace == Market.US)
                      && (m.OrderStatus == "onHold"))
             .Select(s => s.OrderId)
             .ToList();
In the code, I do not want to search the whole orderList database table, just the top xxx records.
My questions are:
How is it done with linq? I couldn't find any example!
Does it enhance the query performance?
Use
List<string> onHoldOrderslist = orderList
    .Where(m => (m.customerId == item.customerId) && (m.Marketplace == Market.US) && (m.OrderStatus == "onHold"))
    .OrderBy(x => x.WhateverMakesSense)
    .Take(1000)
    .Select(s => s.OrderId)
    .ToList();
Please note that ordering is important as otherwise you may get random 1000 elements...
Given you say this is "a record" and you are only returning one, don't worry about the fact that it's in the top 1000 (and top doesn't even mean anything unless you specify an order). Using Take(1000) after the where clause will do nothing as there is only one record anyway. All you need is an index, in this case on customerId, Marketplace, and OrderStatus.
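The position of Take relative to Where changes what is being asked. The following in-memory sketch (LINQ to Objects, with a made-up orders list standing in for the table) contrasts "the first 1000 matches" with "the matches among the first 1000 rows", which is what the question literally describes. It says nothing about how a database would execute either form.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the orders table: 5000 rows, every 10th row on hold.
var orders = Enumerable.Range(1, 5000)
    .Select(i => (OrderId: i, OrderStatus: i % 10 == 0 ? "onHold" : "shipped"))
    .ToList();

// Filter first, then cap: up to 1000 matching rows from the whole set.
var firstThousandMatches = orders
    .Where(o => o.OrderStatus == "onHold")
    .OrderBy(o => o.OrderId)
    .Take(1000)
    .Select(o => o.OrderId)
    .ToList();

// Cap first, then filter: only the matches found among the first 1000 rows.
var matchesInFirstThousand = orders
    .OrderBy(o => o.OrderId)
    .Take(1000)
    .Where(o => o.OrderStatus == "onHold")
    .Select(o => o.OrderId)
    .ToList();

Console.WriteLine(firstThousandMatches.Count);   // 500
Console.WriteLine(matchesInFirstThousand.Count); // 100
```

In both forms the explicit OrderBy is what gives "first 1000" a defined meaning.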

how to take 100 records from linq query based on a condition

I have a query that returns a result set. Based on a condition, I want to take only 100 records. That is, I have a variable x: if the value of x is 100, I have to do .Take(100); otherwise I need to get all the records.
var abc = (from st in Context.STopics
           where st.IsActive == true && st.StudentID == 123
           select new result()
           {
               name = st.name
           }).ToList().Take(100);
Because LINQ returns an IQueryable which has deferred execution, you can create your query, then restrict it to the first 100 records if your condition is true and then get the results. That way, if your condition is false, you will get all results.
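var abc = (from st in Context.STopics
           where st.IsActive && st.StudentID == 123
           select new result
           {
               name = st.name
           });
if (x == 100)
    abc = abc.Take(100);
// ToList() returns a List<result>, which cannot be assigned back to the
// IQueryable<result> variable, so the materialized list gets its own variable.
var results = abc.ToList();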
Note that it is important to do the Take before the ToList, otherwise, it would retrieve all the records, and then only keep the first 100 - it is much more efficient to get only the records you need, especially if it is a query on a database table that could contain hundreds of thousands of rows.
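To make the composition pattern concrete, here is a small self-contained sketch that uses AsQueryable over an in-memory sequence as a stand-in for Context.STopics. The Load helper and the sample data are hypothetical. Against a real database provider the same shape applies, but the filtering then happens in SQL rather than in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the table: 250 topic names.
var topics = Enumerable.Range(1, 250).Select(i => $"topic {i}").AsQueryable();

// Builds the query step by step and only executes it at ToList().
List<string> Load(IQueryable<string> source, int x)
{
    // Declared as IQueryable<string> (not var) so that the result of Take,
    // which is not an IOrderedQueryable, can be assigned back to it.
    IQueryable<string> query = source.OrderBy(name => name);
    if (x == 100)
        query = query.Take(100);
    return query.ToList();
}

Console.WriteLine(Load(topics, 100).Count); // 100
Console.WriteLine(Load(topics, 0).Count);   // 250
```

Nothing is enumerated until ToList() runs, which is why the Take can be added conditionally.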
One of the most important concepts with the SQL TOP command is ORDER BY. You should not use TOP without ORDER BY, because it may return different results in different situations.
The same concept applies to LINQ too.
var results = Context.STopics
    .Where(st => st.IsActive && st.StudentID == 123)
    .Select(st => new result() { name = st.name })
    .OrderBy(r => r.name)
    .Take(100)
    .ToList();
Take and Skip operations are well defined only against ordered sets.
Although the other users are correct in giving you the results you want...
This is NOT how you should be using Entity Framework.
This is the better way to use EF.
var query = from student in Context.Students
            where student.Id == 123
            from topic in student.Topics
            orderby topic.Name
            select topic;
Notice how the structure more closely follows the logic of the business requirements.
You can almost read the code in English.

RavenDB Select Distinct on a single property

I have an object stored in RavenDB with three properties: ID, Score, Date.
I want to create an index for retrieving the top 5 scores within a given date range. However, I only want to retrieve one record per ID. If a single ID shows up more than once in the top scores, I only want to retrieve the highest score for that ID, then move on to the next ID.
example scores:
Score____ID____
1000 1
950 1
900 1
850 2
800 2
750 3
700 4
650 5
600 6
550 7
desired query results:
Score____ID____
1000 1
850 2
750 3
700 4
650 5
I have created an explicit index similar to this (adjusted for simplicity):
Map = docs => from doc in docs
              orderby doc.Score descending
              select new
              {
                  Score = doc.Score,
                  ID = doc.ID,
                  Date = doc.Date
              };
I call my query with code similar to this (adjusted for simplicity):
HighScores = RavenSession.Query<Score, Scores_ByDate>()
    .Customize(x => x.WaitForNonStaleResultsAsOfNow())
    .Where(x => x.Date > StartDate)
    .Where(x => x.Date < EndDate)
    .OrderByDescending(x => x.Score)
    .Take(5)
    .ToList();
I don't know how to say "only give me the results from each ID one time in the list."
So a few pointers:
Don't order in the Map function. Maps are designed to just dump documents out.
Use the Reduce to do grouping, as this is the way they work by design
Add a hint to RavenDB that a particular column will be sorted in code, and what type of field it is.
By default, the map/reduce assumes the sorting is for text, even if it is a number - (I learned this the hard way and got help for it.)
So..
Just define the Map/Reduce index as normal, and add a sort condition at the end, like this:
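public class Score_TopScoringIndex : AbstractIndexCreationTask<Score, Score>
{
    public Score_TopScoringIndex()
    {
        // Map: emit every document as-is, without ordering.
        Map = docs => from doc in docs
                      select new
                      {
                          Score = doc.Score,
                          ID = doc.ID,
                          Date = doc.Date
                      };
        // Reduce: one result per ID.
        Reduce = results => from result in results
                            group result by result.ID into g
                            select new
                            {
                                // Max rather than First(): the order within a group is not guaranteed.
                                Score = g.Max(x => x.Score),
                                ID = g.Key,
                                Date = g.First().Date
                            };
        // Tell RavenDB to sort Score as a number instead of as text.
        Sort(x => x.Score, SortOptions.Int);
    }
}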
Make sure the index exists in the database by running this at application startup:
IndexCreation.CreateIndexes(typeof(Score_TopScoringIndex).Assembly, documentStore);
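Now, when you query with OrderByDescending, it will be very fast.
using (var session = store.OpenSession())
{
    // Referencing the index by its class avoids a mismatch between the class name and a hand-typed index name.
    var highScores = session.Query<Score, Score_TopScoringIndex>()
        .OrderByDescending(x => x.Score)
        .Take(5)
        .ToList();
}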
You can try the MoreLINQ library
https://code.google.com/p/morelinq/
which has a DistinctBy extension.
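For reference, the desired result ("highest score per ID, then the top 5") can be sketched with plain LINQ to Objects once the documents are in memory. This is only a client-side illustration using the example scores from the question; it is not a RavenDB index, and the tuple shape is made up for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The example scores from the question, as hypothetical (Score, ID) tuples.
var scores = new List<(int Score, int ID)>
{
    (1000, 1), (950, 1), (900, 1), (850, 2), (800, 2),
    (750, 3), (700, 4), (650, 5), (600, 6), (550, 7),
};

// One row per ID carrying that ID's highest score, then the five best rows.
var top = scores
    .GroupBy(s => s.ID)
    .Select(g => (Score: g.Max(s => s.Score), ID: g.Key))
    .OrderByDescending(s => s.Score)
    .Take(5)
    .ToList();

foreach (var (score, id) in top)
    Console.WriteLine($"{score} {id}");
```

A DistinctBy-based variant would sort by score descending first and then keep the first row seen for each ID.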

Converting SQL to LINQ needing to group records created per second

I am looking for a way in C# LINQ, using lambda syntax, to group records per second. In my search I have yet to find a good way to do this.
The SQL query is as follows:
select count(cct_id) as 'cnt'
    ,Year(cct_date_created)
    ,Month(cct_date_created)
    ,datepart(dd,cct_date_created)
    ,datepart(hh,cct_date_created)
    ,datepart(mi,cct_date_created)
    ,datepart(ss,cct_date_created)
from ams_transactions with (nolock)
where cct_date_created between dateadd(dd,-1,getdate()) and getdate()
group by
    Year(cct_date_created)
    ,Month(cct_date_created)
    ,datepart(dd,cct_date_created)
    ,datepart(hh,cct_date_created)
    ,datepart(mi,cct_date_created)
    ,datepart(ss,cct_date_created)
The closest I was able to come was the following, but it is not giving me the right results.
var groupedResult = MyTable
    .Where(t => t.cct_date_created > start
             && t.cct_date_created < end)
    .GroupBy(t => new { t.cct_date_created.Month,
                        t.cct_date_created.Day,
                        t.cct_date_created.Hour,
                        t.cct_date_created.Minute,
                        t.cct_date_created.Second })
    .Select(group => new {
        TPS = group.Key.Second
    });
This appears to group by second, but instead of treating each minute in the date range separately, it counts that second across every minute in the range. To get transactions per second, I need each month, day, hour, and minute to be considered separately.
The goal is then to pull a max and an average from this grouped list. Any help would be greatly appreciated :)
Currently you're selecting the second, rather than the count - why? (You're also using an anonymous type for no obvious reason - whenever you have a single property, consider just selecting that property instead of wrapping it in an anonymous type.)
So change your Select to:
.Select(group => new { Key = group.Key,
                       Transactions = group.Count() });
Or to have all of the key properties separately:
.Select(group => new { group.Key.Month,
                       group.Key.Day,
                       group.Key.Hour,
                       group.Key.Minute,
                       group.Key.Second,
                       Transactions = group.Count() });
(As an aside, do you definitely not need the year part? It's in your SQL...)
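Putting the pieces together, here is a minimal in-memory sketch of counting per second and then taking the max and the average, which is the stated goal. It assumes LINQ to Objects over a hypothetical list of timestamps and includes the year in the key. Note that the average here only covers seconds that had at least one transaction.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical timestamps: 3 in 10:00:00, 1 in 10:00:01, 2 in 10:01:00.
var start = new DateTime(2024, 1, 1, 10, 0, 0);
var timestamps = new List<DateTime>
{
    start,
    start.AddMilliseconds(200),
    start.AddMilliseconds(900),
    start.AddSeconds(1),
    start.AddMinutes(1),
    start.AddMinutes(1).AddMilliseconds(500),
};

// Group on every component down to the second, then count each group.
var perSecond = timestamps
    .GroupBy(t => new { t.Year, t.Month, t.Day, t.Hour, t.Minute, t.Second })
    .Select(g => new { g.Key, Transactions = g.Count() })
    .ToList();

int maxPerSecond = perSecond.Max(g => g.Transactions);
double averagePerSecond = perSecond.Average(g => g.Transactions);

Console.WriteLine(maxPerSecond);     // 3
Console.WriteLine(averagePerSecond); // 2
```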

Is possible to run a query with linq to search for a period of time?

Problem details:
SQL Server 2005;
Entity Framework 4.0.
I'm trying to use LINQ to run a query that filters on the time of day only. Example:
I have the following datetime data in my server:
30/03/2012 12:53:22
30/03/2012 17:23:29
04/04/2012 11:10:14
04/04/2012 19:06:55
I want to run a query that returns all data with a time between 12:00 and 20:00. The query should return the following data:
30/03/2012 12:53:22
30/03/2012 17:23:29
04/04/2012 19:06:55
Or between 11:00 and 13:00, where the query should return the following data:
30/03/2012 12:53:22
04/04/2012 11:10:14
How can I do this with LINQ? Is it possible to ignore the date and use only the time?
// Hours 11 and 12 cover 11:00:00 through 12:59:59; "<= 13" would also match 13:00 through 13:59.
var filteredTimes = myContext.MyTable
    .Where(r => SqlFunctions.DatePart("hour", r.DateField) >= 11 &&
                SqlFunctions.DatePart("hour", r.DateField) < 13);
You need to include System.Data.Objects.SqlClient to get SqlFunctions.
If you convert the date value to a double and use the fractional part of it, you get a number between 0 and 1 that represents a time in a day. Having that, it is trivial to test for time intervals, where e.g. 13:45 would be 0,5729166667 (or more precise: 13:45.345 is 0,572920679).
You can do this because EF (i.e. 4.3, the version I use) translates Cast and even Math functions into SQL:
mycontext.Data.Where(dt => dt.DateTime.HasValue)
    .Select(dt => dt.DateTime.Value).Cast<double>()
    .Select(d => d - Math.Floor(d))
    .Where(... your comparisons
This translates to Sql containing CAST([date column] AS float) and Sql's FLOOR function.
After your comments:
It looked so easy, but I can't find a way to instruct EF to do a CAST on a single property in a Select(). Only the Cast<>() function is translated to CAST (sql), but that operates on a set.
Well, fortunately, there is another way:
mycontext.Data.Where(dt => dt.DateTime.HasValue)
    .Select(dt => new
    {
        Date = DbFunctions.TruncateTime(dt.DateTime.Value),
        Ms = DbFunctions.DiffMilliseconds(
            DbFunctions.TruncateTime(dt.DateTime.Value), dt.DateTime.Value)
    })
    .Where(x => x.Ms > lower && x.Ms < upper)
where lower and upper are the TotalMilliseconds property of TimeSpan objects.
Note that this is horribly inefficient in sql! The date functions are repeated for each comparison. So make sure that you do the comparisons on a set of data that has been confined as much as possible by other criteria. It may even be better to fetch data in memory first and then do the comparisons.
Note: prior to EF6, DbFunctions was EntityFunctions.
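To illustrate the "milliseconds since midnight" comparison outside the database, here is a small in-memory sketch that uses the sample timestamps from the question. The InWindow helper is hypothetical and runs as LINQ to Objects; it mirrors the lower and upper bounds described above but does not use DbFunctions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The four sample timestamps from the question (day/month/year in the original).
var data = new List<DateTime>
{
    new DateTime(2012, 3, 30, 12, 53, 22),
    new DateTime(2012, 3, 30, 17, 23, 29),
    new DateTime(2012, 4, 4, 11, 10, 14),
    new DateTime(2012, 4, 4, 19, 6, 55),
};

// Keeps the values whose time of day lies strictly between the two bounds.
List<DateTime> InWindow(IEnumerable<DateTime> source, TimeSpan windowStart, TimeSpan windowEnd)
{
    double lower = windowStart.TotalMilliseconds;
    double upper = windowEnd.TotalMilliseconds;
    return source
        // d - d.Date is the time elapsed since midnight of the same day.
        .Where(d => (d - d.Date).TotalMilliseconds > lower
                 && (d - d.Date).TotalMilliseconds < upper)
        .ToList();
}

Console.WriteLine(InWindow(data, TimeSpan.FromHours(12), TimeSpan.FromHours(20)).Count); // 3
Console.WriteLine(InWindow(data, TimeSpan.FromHours(11), TimeSpan.FromHours(13)).Count); // 2
```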
If you had a date and a time column in your table (as possible with SQL Server 2008), you could do this directly in SQL.
As that's not the case, you have to do it like this:
// the first two lines can be omitted, if your bounds are already timespans
var start = startDate.TimeOfDay;
var end = endDate.TimeOfDay;
var filteredItems = context.Items.ToList()
    .Where(x => x.DateTimeColumn.TimeOfDay >= start
             && x.DateTimeColumn.TimeOfDay <= end);
You can use the TimeOfDay property on the dates to compare them.
string timeAsString = "13:00";
from f in TableRows
where f.Date.TimeOfDay > DateTime.Parse("11-12-2012 "+timeAsString).TimeOfDay
select f
EDIT
here is some code you can test that runs for me:
DateTime d = DateTime.Parse("12-12-2012 13:00");
List<DateTime> dates = new List<DateTime>();
dates.Add(DateTime.Parse("12-12-2012 13:54"));
dates.Add(DateTime.Parse("12-12-2012 12:55"));
dates.Add(DateTime.Parse("12-12-2012 11:34"));
dates.Add(DateTime.Parse("12-12-2012 14:53"));
var result = (from f in dates where f.TimeOfDay > d.TimeOfDay select f);
EDIT 2
Yeah, it seems that you needed to call .ToList(), which, tbh, you should have been able to figure out. I had no way of knowing what kind of collection you had. I'm not sure either of us deserves a downvote for trying to help when you don't supply much information about the problem.
