RavenDB Select Distinct on a single property - c#

I have an object stored in RavenDB with three properties: ID, Score, Date.
I want to create an index for retrieving the top 5 scores within a given date range. However, I only want to retrieve one record per ID. If a single ID shows up more than once in the top scores, I only want to retrieve the highest score for that ID, then move on to the next ID.
Example scores:
Score  ID
1000   1
950    1
900    1
850    2
800    2
750    3
700    4
650    5
600    6
550    7
Desired query results:
Score  ID
1000   1
850    2
750    3
700    4
650    5
I have created an explicit index similar to this (adjusted for simplicity):
Map = docs => from doc in docs
orderby doc.Score descending
select new
{
Score = doc.Score,
ID = doc.ID,
Date = doc.Date
};
I call my query with code similar to this (adjusted for simplicity):
HighScores = RavenSession.Query<Score, Scores_ByDate>()
.Customize(x => x.WaitForNonStaleResultsAsOfNow())
.Where(x => x.Date > StartDate)
.Where(x => x.Date < EndDate)
.OrderByDescending(x => x.Score)
.Take(5)
.ToList();
I don't know how to say "only give me the results from each ID one time in the list."

So a few pointers:
Don't order in the Map function. Maps are designed just to emit one entry per document.
Use the Reduce to do the grouping; that is what it is designed for.
Tell RavenDB which field you will sort on, and what type of field it is.
By default, the index sorts a field as text, even if it holds a number (I learned this the hard way and got help for it).
So, just define the Map/Reduce index as normal, and add a sort option at the end, like this:
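public class Score_TopScoringIndex : AbstractIndexCreationTask<Score, Score>
{
    public Score_TopScoringIndex()
    {
        // Map: emit one entry per document, without any ordering.
        Map = docs => from doc in docs
                      select new
                      {
                          Score = doc.Score,
                          ID = doc.ID,
                          Date = doc.Date
                      };

        // Reduce: one entry per ID, holding that ID's highest score
        // and the date of that score.
        Reduce = results => from result in results
                            group result by result.ID into g
                            select new
                            {
                                Score = g.Max(x => x.Score),
                                ID = g.Key,
                                Date = g.OrderByDescending(x => x.Score).First().Date
                            };

        // Sort Score numerically instead of as text.
        Sort(x => x.Score, SortOptions.Int);
    }
}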
Make sure the index exists in the database by calling this at the startup of your application:
IndexCreation.CreateIndexes(typeof(Score_TopScoringIndex).Assembly, documentStore);
Now, when you query with OrderByDescending, the scores are sorted numerically and the query will be very fast.
using (var session = store.OpenSession())
{
    // Query the index by its class, so the name always matches the definition.
    var highScores = session.Query<Score, Score_TopScoringIndex>()
        .OrderByDescending(x => x.Score)
        .Take(5)
        .ToList();
}
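To check the intended result outside RavenDB, here is a minimal in-memory sketch of what the reduce step plus the descending sort compute, using the example scores from the question as plain (Score, ID) tuples. It is LINQ to Objects only, not a RavenDB query, and it leaves out the Date field and the date-range filter.

```csharp
using System;
using System.Linq;

// The example scores from the question as (Score, ID) pairs.
var scores = new (int Score, int ID)[]
{
    (1000, 1), (950, 1), (900, 1), (850, 2), (800, 2),
    (750, 3), (700, 4), (650, 5), (600, 6), (550, 7),
};

// Like the reduce: one row per ID holding its best score.
// Then sort descending and keep the top five.
var top5 = scores
    .GroupBy(s => s.ID)
    .Select(g => (Score: g.Max(s => s.Score), ID: g.Key))
    .OrderByDescending(s => s.Score)
    .Take(5)
    .ToList();

foreach (var (score, id) in top5)
    Console.WriteLine($"{score} {id}");
```

This prints the five rows listed under "Desired query results" in the question.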

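You can try using the MoreLINQ library
https://code.google.com/p/morelinq/
which has a DistinctBy extension. Note that it works on IEnumerable, so it runs in memory after the query results have been loaded.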
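As an aside, .NET 6 and later include a built-in Enumerable.DistinctBy in System.Linq, so the same idea can be sketched without MoreLINQ. This assumes the scores are already in memory (shown here as the question's example data in tuples); DistinctBy keeps the first element it sees for each key, which is why the sort comes first.

```csharp
using System;
using System.Linq;

// The question's example scores; DistinctBy requires .NET 6 or later.
var scores = new (int Score, int ID)[]
{
    (1000, 1), (950, 1), (900, 1), (850, 2), (800, 2),
    (750, 3), (700, 4), (650, 5), (600, 6), (550, 7),
};

// Sort first so that DistinctBy, which keeps the first element
// for each key, keeps each ID's highest score.
var top5 = scores
    .OrderByDescending(s => s.Score)
    .DistinctBy(s => s.ID)
    .Take(5)
    .ToList();

Console.WriteLine(string.Join(", ", top5));
```

If MoreLINQ is also imported, the two DistinctBy extensions can clash, so calling one of them explicitly may be needed.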

Related

Using Linq to look for gap in dates

I have a single queryable that is ordered by id and then by date in descending order (most recent first), tracking when the item (id) was online. If a gap is detected (more than 1 day), I want to filter out all of the dates that come before the gap, so as to get only the most recent range of online days.
Currently I am looping through with for loops; however, the data set is very large, so I would like to improve performance using LINQ.
Is there any way to compare the records by id, then date, and remove the elements of that id after a gap is detected (current.date - next.date != 1)?
Id  Date
1   2022/01/01
1   2021/12/31
1   2021/12/25
2   2021/12/20
2   2021/12/19
2   2021/12/18
2   2021/12/15
This would return:
Id  Date
1   2022/01/01
1   2021/12/31
2   2021/12/20
2   2021/12/19
2   2021/12/18
var result = queryable
.GroupBy(entry => entry.Id)
.AsEnumerable()
.Select(entryGroup => entryGroup
.OrderByDescending(entry => entry.Date)
.Aggregate((EntryGroup: new List<Entry>(), GapDetected: false), (accumulated, current) =>
{
if (accumulated.GapDetected) return accumulated;
var previous = accumulated.EntryGroup.LastOrDefault();
if (previous == null || (previous.Date - current.Date).Days < 2) accumulated.EntryGroup.Add(current);
else accumulated.GapDetected = true;
return accumulated;
}))
.SelectMany(entryGroup => entryGroup.EntryGroup)
.ToList();
Note that only the GroupBy portion of the code is actually executed as an SQL query; the rest is done locally, since it cannot be translated to SQL. I couldn't come up with a solution where the entire query could be translated to SQL, but I wanted to show how this can be done with LINQ.
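For comparison, here is a minimal sketch of the same filtering with a small helper instead of Aggregate: per id, sort newest first and keep entries while each one is at most one day older than the previous, stopping at the first gap. It assumes the data is already in memory; the rows are the question's sample data as (Id, Date) tuples, and the helper name MostRecentRun is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The question's sample rows as (Id, Date) tuples.
var entries = new List<(int Id, DateTime Date)>
{
    (1, new DateTime(2022, 1, 1)),
    (1, new DateTime(2021, 12, 31)),
    (1, new DateTime(2021, 12, 25)),
    (2, new DateTime(2021, 12, 20)),
    (2, new DateTime(2021, 12, 19)),
    (2, new DateTime(2021, 12, 18)),
    (2, new DateTime(2021, 12, 15)),
};

var result = entries
    .GroupBy(e => e.Id)
    .SelectMany(g => MostRecentRun(g))
    .ToList();

foreach (var (id, date) in result)
    Console.WriteLine($"{id} {date:yyyy-MM-dd}");

// Hypothetical helper: keeps the newest entry plus every older entry that is
// at most one day older than the entry before it; stops at the first gap.
static IEnumerable<(int Id, DateTime Date)> MostRecentRun(IEnumerable<(int Id, DateTime Date)> rows)
{
    var ordered = rows.OrderByDescending(e => e.Date).ToList();
    int keep = 1;
    while (keep < ordered.Count &&
           (ordered[keep - 1].Date - ordered[keep].Date).TotalDays <= 1)
    {
        keep++;
    }
    return ordered.Take(keep);
}
```

The loop stops at the first gap, so older entries of that id are never examined.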

Using LINQ to aggregate and group a list of data into a new list but Sum is 0

I have a list of data retrieved from SQL and stored in a class. I want to now aggregate the data using LINQ in C# rather than querying the database again on a different dataset.
Example data I have is above.
The columns are Date, Period, Price and Vol, and I am trying to create a histogram from this data. I tried the LINQ code below, but I seem to be getting a sum of 0.
Period needs to be a Where clause based on a variable.
Volume needs to be aggregated for the price ranges.
Price needs to be a bucket and grouped on this column.
I don't want a range, just a number for each bucket.
Example output I want is (not real data just as example):
Bucket SumVol
18000 50
18100 30
18200 20
I attempted the following LINQ query, but my Sum seems to be empty. I still need to add my Where clause, but for some reason the data is not aggregating.
var ranges = new[] { 10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000 };
var priceGroups = eod.GroupBy(x => ranges.FirstOrDefault(r => r > x.price))
.Select(g => new { Price = g.Key, Sum = g.Sum(s => s.vol)})
.ToList();
var grouped = ranges.Select(r => new
{
Price = r,
Sum = priceGroups.Where(g => g.Price > r || g.Price == 0).Sum(g => g.Sum)
});
First things first... There seems to be nothing wrong with your priceGroups list. I've run that on my end and, as far as I can understand your purpose, it seems to be grabbing the expected values from your dataset.
var ranges = new[] { 10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000 };
var priceGroups = eod.GroupBy(x => ranges.FirstOrDefault(r => r > x.price))
.Select(g => new { Price = g.Key, Sum = g.Sum(s => s.vol) })
.ToList();
Now, I assume your intent with the grouped list was to obtain yet another anonymous type list, much like you did with your priceGroups list, which is also an anonymous type list... List<'a> in C#.
var grouped = ranges.Select(r => new
{
Price = r,
Sum = priceGroups.Where(g => g.Price > r || g.Price == 0).Sum(g => g.Sum)
});
For starters, you are missing the ToList() call at the end of it. However, that's not the main issue here, as you could still work with an IEnumerable<'a> just as well for most purposes.
As I see it, the core problem is the assignment of the anonymous property Sum. Why are you filtering for g.Price > r || g.Price == 0?
There is no element with Price equal to zero in your priceGroups list. Those prices are a subset of ranges, and there is no zero there (unless a price is 20000 or more, in which case FirstOrDefault returns 0). Then you are comparing every value in ranges against that subset in priceGroups, and adding up the Sums of every element in priceGroups whose Price is higher than the range being evaluated. In other words, the Sum property in your grouped list is a sum of sums.
Keep in mind that priceGroups is already an aggregated list. It seems to me you are trying to aggregate it again when you call Sum() after a Where() clause like that. That doesn't make much sense.
What you want (I believe) for the Sum property in the grouped list is the Sum property from the priceGroups list, if the range being evaluated matches the Price being evaluated. Furthermore, where there are no matches, you want your grouped list Sum to be zero, as that means the range being evaluated was not in the original dataset. You can achieve that with the following instead:
Sum = priceGroups.FirstOrDefault(g => g.Price == r)?.Sum ?? 0
You said your Sum was "empty" in your post, but that's not the behavior I saw on my end. Try the above and, if still not behaving as you would expect, share a small dataset for which you know the expected output with me and I can try to help you further.
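Putting the pieces together, a self-contained sketch might look like the following. The rows, the period value "P1", and the use of double for price are all hypothetical stand-ins, since the question's data isn't shown; the period filter is the Where clause the question still needed. As in the question's code, each row is counted under the first range value above its price.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for the question's data: (period, price, vol).
var eod = new[]
{
    (period: "P1", price: 18050.0, vol: 20),
    (period: "P1", price: 18500.0, vol: 30),
    (period: "P1", price: 18950.0, vol: 10),
    (period: "P2", price: 18100.0, vol: 99),
    (period: "P1", price: 19500.0, vol: 5),
};
var period = "P1"; // hypothetical filter value

var ranges = new[] { 10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000 };

// Filter by period, then put each row under the first range above its price.
var priceGroups = eod
    .Where(x => x.period == period)
    .GroupBy(x => ranges.FirstOrDefault(r => r > x.price))
    .Select(g => new { Price = g.Key, Sum = g.Sum(s => s.vol) })
    .ToList();

// One row per range; a range with no matching group gets 0.
var grouped = ranges
    .Select(r => new
    {
        Price = r,
        Sum = priceGroups.FirstOrDefault(g => g.Price == r)?.Sum ?? 0
    })
    .ToList();

foreach (var g in grouped)
    Console.WriteLine($"{g.Price} {g.Sum}");
```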
Using LINQ instead of querying the DB again is a good idea, mainly because it saves a new call to your DB. As long as your DB is not updated frequently (so the data does not change very quickly), you can use the retrieved data to calculate everything with LINQ.

Need to generate report using LINQ

I have a table and I want to generate a report from it.
I need to generate a report using LINQ that, given the route and shift, returns the list of dates with the number of subscriptions for each day.
For example, given route 2022 and shift 2120, it should give me a list like the following:
date count
2015-05-01 5
2015-05-02 10
2015-05-03 8
....
Any clue?
Your input does not match the output, but I assume you are showing only part of your data. Then you can try:
var result = mydata.Where(x=>x.route.Equals("2022") && x.shift.Equals("2120"))
.GroupBy(x=> x.Date).Select(grp => new {Date = grp.Key, Count = grp.Count()});
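Here is a runnable sketch of that query on a few hypothetical rows. It assumes route and shift are strings and that Date may carry a time of day, so it groups on Date.Date (the calendar day) and orders by day; if your Date column holds only dates, grouping on x.Date as above gives the same result.

```csharp
using System;
using System.Linq;

// Hypothetical subscription rows; route and shift are assumed to be strings.
var mydata = new[]
{
    (route: "2022", shift: "2120", Date: new DateTime(2015, 5, 1, 8, 0, 0)),
    (route: "2022", shift: "2120", Date: new DateTime(2015, 5, 1, 9, 30, 0)),
    (route: "2022", shift: "2120", Date: new DateTime(2015, 5, 2, 7, 15, 0)),
    (route: "2022", shift: "9999", Date: new DateTime(2015, 5, 2, 7, 15, 0)),
    (route: "1000", shift: "2120", Date: new DateTime(2015, 5, 3, 7, 15, 0)),
};

// Filter to one route and shift, then count rows per calendar day.
var result = mydata
    .Where(x => x.route == "2022" && x.shift == "2120")
    .GroupBy(x => x.Date.Date)
    .OrderBy(g => g.Key)
    .Select(g => new { Date = g.Key, Count = g.Count() })
    .ToList();

foreach (var row in result)
    Console.WriteLine($"{row.Date:yyyy-MM-dd} {row.Count}");
```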

Converting SQL to LINQ needing to group records created per second

I am looking for a way in C# LINQ, using lambda format, to group records per second. In my search I have yet to find a good way to do this.
The SQL query is as follows:
select count(cct_id) as 'cnt'
,Year(cct_date_created)
,Month(cct_date_created)
,datepart(dd,cct_date_created)
,datepart(hh,cct_date_created)
,datepart(mi,cct_date_created)
,datepart(ss,cct_date_created)
from ams_transactions with (nolock)
where cct_date_created between dateadd(dd,-1,getdate()) and getdate()
group by
Year(cct_date_created)
,Month(cct_date_created)
,datepart(dd,cct_date_created)
,datepart(hh,cct_date_created)
,datepart(mi,cct_date_created)
,datepart(ss,cct_date_created)
Now, the closest I was able to come is the following, but it is not giving me the right results:
var groupedResult = MyTable.Where(t => t.cct_date_created > start
&& t.cct_date_created < end)
.GroupBy(t => new { t.cct_date_created.Month,
t.cct_date_created.Day,
t.cct_date_created.Hour,
t.cct_date_created.Minute,
t.cct_date_created.Second })
.Select(group => new {
TPS = group.Key.Second
});
This appears to be grouping by second, but not treating each minute in the date range individually; instead it seems to combine that second across every minute in the range. To get transactions per second, I need it to consider each month, day, hour, and minute separately.
The goal will be to pull a Max and Average then from this grouped list. Any help would be greatly appreciated :)
Currently you're selecting the second, rather than the count - why? (You're also using an anonymous type for no obvious reason - whenever you have a single property, consider just selecting that property instead of wrapping it in an anonymous type.)
So change your Select to:
.Select(group => new { Key = group.Key,
Transactions = group.Count() });
Or to have all of the key properties separately:
.Select(group => new { group.Key.Month,
group.Key.Day,
group.Key.Hour,
group.Key.Minute,
group.Key.Second,
Transactions = group.Count() });
(As an aside, do you definitely not need the year part? It's in your SQL...)
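For the stated goal (a Max and an Average), here is an in-memory sketch under a different keying choice: each timestamp is truncated to the whole second, so one DateTime key covers year through second, then Max and Average run over the per-second counts. The timestamps are hypothetical, and I would not assume every LINQ-to-SQL provider can translate the Ticks arithmetic; the anonymous-type key (with Year added) is the form closer to the SQL.

```csharp
using System;
using System.Linq;

// Hypothetical creation timestamps standing in for cct_date_created.
var created = new[]
{
    new DateTime(2024, 1, 1, 10, 0, 0, 100),
    new DateTime(2024, 1, 1, 10, 0, 0, 900),
    new DateTime(2024, 1, 1, 10, 0, 1, 0),
    new DateTime(2024, 1, 1, 10, 1, 0, 0),
    new DateTime(2024, 1, 1, 10, 1, 0, 500),
    new DateTime(2024, 1, 1, 10, 1, 0, 700),
};

// Truncate each timestamp to the whole second; the key then distinguishes
// year, month, day, hour, minute and second together.
var perSecond = created
    .GroupBy(d => new DateTime(d.Ticks - d.Ticks % TimeSpan.TicksPerSecond))
    .Select(g => new { Second = g.Key, Transactions = g.Count() })
    .ToList();

var maxTps = perSecond.Max(x => x.Transactions);
var avgTps = perSecond.Average(x => x.Transactions);
Console.WriteLine($"Max TPS: {maxTps}, average TPS: {avgTps}");
```

Note that seconds with no transactions produce no group, so this average covers only seconds that had at least one transaction.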

Identifying Date Clashes using linq

I am looking to identify rows where there is a date clash, using LINQ. I have (for this example) 5 columns:
ID ref_id ref_Name Borrow_Date Return_Date
1 1343 Gate 13/09/2011 20/09/2011
2 1352 Door 20/09/2011 22/09/2011
3 1343 Gate 17/09/2011 21/09/2011
In this case my 'Gate' is clashing because someone wants to borrow it when someone else also wants to borrow it.
Is there any way to identify the date range clashes easily using LINQ?
There may be more efficient variants out there, but one way would be this:
var collisions = myList.Where(d1 => !myList.Where(d => d != d1 && d.ref_id == d1.ref_id).All(d2 => d1.Return_Date <= d2.Borrow_Date || d1.Borrow_Date >= d2.Return_Date));
This returns every row that overlaps at least one other row for the same item (same ref_id). For the data above it returns rows 1 and 3 (both Gate); row 2 is a different item (Door). If you change row 1 to have Return_Date 17/09/2011, no rows are returned, because a return on the same day as the next borrow is not treated as a clash.
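A self-contained sketch of this Where/All check on the question's three rows, as tuples. It compares rows by ID instead of by reference (tuples are values) and only against rows with the same ref_id; the Any(!...) form is equivalent to !All(...).

```csharp
using System;
using System.Globalization;
using System.Linq;

DateTime D(string s) => DateTime.ParseExact(s, "dd/MM/yyyy", CultureInfo.InvariantCulture);

// The three rows from the question.
var myList = new[]
{
    (ID: 1, ref_id: 1343, ref_Name: "Gate", Borrow_Date: D("13/09/2011"), Return_Date: D("20/09/2011")),
    (ID: 2, ref_id: 1352, ref_Name: "Door", Borrow_Date: D("20/09/2011"), Return_Date: D("22/09/2011")),
    (ID: 3, ref_id: 1343, ref_Name: "Gate", Borrow_Date: D("17/09/2011"), Return_Date: D("21/09/2011")),
};

// A row clashes if another row for the same item has an overlapping range.
// Two ranges do not overlap when one ends on or before the other starts.
var collisions = myList
    .Where(d1 => myList
        .Where(d2 => d2.ID != d1.ID && d2.ref_id == d1.ref_id)
        .Any(d2 => !(d1.Return_Date <= d2.Borrow_Date || d1.Borrow_Date >= d2.Return_Date)))
    .ToList();

Console.WriteLine(string.Join(", ", collisions.Select(c => c.ID)));
```

This compares every pair of rows, so it is quadratic; for a large table, grouping by ref_id first would limit the comparisons to rows for the same item.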
If you have a list of objects with properties as shown in your table, you can find the items with the same name that have conflicting dates using something like this:
(I haven't tested this code, so there might be some typos.)
var collisions = collection
.Join(collection, x => x.ref_Name, y => y.ref_Name,
(x, y) => new {
ID_x = x.ID,
ID_y = y.ID,
ref_id = x.ref_id,
ref_Name = x.ref_Name,
Borrow_Date_x = x.Borrow_Date,
Borrow_Date_y = y.Borrow_Date,
Return_Date_x = x.Return_Date,
Return_Date_y = y.Return_Date
}
)
.Where( z => (z.Return_Date_x > z.Borrow_Date_y && z.Borrow_Date_x < z.Return_Date_y))
.Where( z => z.ID_x != z.ID_y);
You will get each clash twice (i.e. IDs 1 and 3, and IDs 3 and 1). Changing the last filter to z.ID_x < z.ID_y keeps only one of each pair.
Although it is certainly possible to identify these clashes in the database once they have occurred, would it not be better to prevent the second person from borrowing an item when it is already scheduled to be borrowed? In this case, this would be a simple matter of checking that no existing rows with a ref_id of 1343 have a return date equal to or greater than the new requested borrow date.
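A minimal sketch of such a pre-insert check, using the first two rows from the question as existing bookings. The function name CanBorrow is hypothetical. It uses the full overlap test (existing borrow date before the requested return date, and requested borrow date before the existing return date), so bookings further in the future are also handled, and a return on the day the next borrow starts is not treated as a clash.

```csharp
using System;
using System.Linq;

// Existing bookings: the first two rows from the question.
var bookings = new[]
{
    (ref_id: 1343, Borrow_Date: new DateTime(2011, 9, 13), Return_Date: new DateTime(2011, 9, 20)),
    (ref_id: 1352, Borrow_Date: new DateTime(2011, 9, 20), Return_Date: new DateTime(2011, 9, 22)),
};

// Hypothetical helper: a request is allowed only if no existing booking
// for the same item overlaps the requested range.
bool CanBorrow(int refId, DateTime borrow, DateTime ret) =>
    !bookings.Any(b => b.ref_id == refId
                       && b.Borrow_Date < ret
                       && borrow < b.Return_Date);

Console.WriteLine(CanBorrow(1343, new DateTime(2011, 9, 17), new DateTime(2011, 9, 21))); // False
Console.WriteLine(CanBorrow(1343, new DateTime(2011, 9, 20), new DateTime(2011, 9, 25))); // True
```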
