How to use GroupBy on an index in RavenDB? - c#

I have this document, a post :
{Content:"blabla",Tags:["test","toto"], CreatedOn:"2019-05-01 01:02:01"}
I want to have a page that displays the most used tags over the last 30 days.
So far I tried to create an index like this
public class Toss_TagPerDay : AbstractIndexCreationTask<TossEntity, TagByDayIndex>
{
public Toss_TagPerDay()
{
Map = tosses => from toss in tosses
from tag in toss.Tags
select new TagByDayIndex()
{
Tag = tag,
CreatedOn = toss.CreatedOn.Date,
Count = 1
};
Reduce = results => from result in results
group result by new { result.Tag, result.CreatedOn }
into g
select new TagByDayIndex()
{
Tag = g.Key.Tag,
CreatedOn = g.Key.CreatedOn,
Count = g.Sum(i => i.Count)
};
}
}
And I query it like that
await _session
.Query<TagByDayIndex, Toss_TagPerDay>()
.Where(i => i.CreatedOn >= firstDay)
.GroupBy(i => i.Tag)
.OrderByDescending(g => g.Sum(i => i.Count))
.Take(50)
.Select(t => new BestTagsResult()
{
CountLastMonth = t.Count(),
Tag = t.Key
})
.ToListAsync()
But this gives me the error
Message: System.NotSupportedException : Could not understand expression: from index 'Toss/TagPerDay'.Where(i => (Convert(i.CreatedOn, DateTimeOffset) >= value(Toss.Server.Models.Tosses.BestTagsQueryHandler+<>c__DisplayClass3_0).firstDay)).GroupBy(i => i.Tag).OrderByDescending(g => g.Sum(i => i.Count)).Take(50).Select(t => new BestTagsResult() {CountLastMonth = t.Count(), Tag = t.Key})
---- System.NotSupportedException : GroupBy method is only supported in dynamic map-reduce queries
Any idea how I can make this work? I could query all the index data from the past 30 days and do the group by / order / take in memory, but this could make my app load a lot of data.

The results from the map-reduce index you created will give you the number of tags per day. You want to have the most popular ones from the last 30 days so you need to do the following query:
var tagCountPerDay = session
.Query<TagByDayIndex, Toss_TagPerDay>()
.Where(i => i.CreatedOn >= DateTime.Now.AddDays(-30))
.ToList();
Then you can do the client-side grouping by Tag:
var mostUsedTags = tagCountPerDay.GroupBy(x => x.Tag)
.Select(t => new BestTagsResult()
{
CountLastMonth = t.Sum(x => x.Count), // sum the per-day counts; t.Count() would only count the days
Tag = t.Key
})
.OrderByDescending(g => g.CountLastMonth)
.ToList();
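As a minimal in-memory sketch of that client-side step (plain LINQ to Objects, no RavenDB involved), the per-day rows can be collapsed into one total per tag by summing Count and then trimmed to the top entries. The `TagByDay` class, the `TopTags` helper and the sample rows are hypothetical stand-ins for the index results, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one row returned by the per-day index.
public class TagByDay
{
    public string Tag { get; set; }
    public DateTime CreatedOn { get; set; }
    public int Count { get; set; }
}

public static class TopTagsSketch
{
    // Collapses per-day rows into one total per tag and keeps the `take` largest.
    public static List<(string Tag, int Total)> TopTags(IEnumerable<TagByDay> rows, int take)
    {
        return rows
            .GroupBy(r => r.Tag)
            .Select(g => (Tag: g.Key, Total: g.Sum(r => r.Count)))
            .OrderByDescending(t => t.Total)
            .ThenBy(t => t.Tag, StringComparer.Ordinal) // deterministic order when totals tie
            .Take(take)
            .ToList();
    }

    public static void Main()
    {
        var day = new DateTime(2019, 5, 1);
        var rows = new List<TagByDay>
        {
            new TagByDay { Tag = "test", CreatedOn = day, Count = 3 },
            new TagByDay { Tag = "toto", CreatedOn = day, Count = 1 },
            new TagByDay { Tag = "test", CreatedOn = day.AddDays(1), Count = 2 },
        };

        foreach (var (tag, total) in TopTags(rows, 50))
            Console.WriteLine($"{tag}: {total}");
    }
}
```

With the sample rows, "test" ends up with a total of 5 and "toto" with 1. The amount of data loaded is bounded by (distinct tags) x (days in the window), which is usually far smaller than the number of posts.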

@Kuepper, based on your index definition, you can handle that with the following index:
public class TrendingSongs : AbstractIndexCreationTask<TrackPlayedEvent, TrendingSongs.Result>
{
public TrendingSongs()
{
Map = events => from e in events
where e.TypeOfTrack == TrackSubtype.song && e.Percentage >= 80 && !e.Tags.Contains(Podcast.Tags.FraKaare)
select new Result
{
TrackId = e.TrackId,
Count = 1,
Timestamp = new DateTime(e.TimestampStart.Year, e.TimestampStart.Month, e.TimestampStart.Day)
};
Reduce = results => from r in results
group r by new {r.TrackId, r.Timestamp}
into g
select new Result
{
TrackId = g.Key.TrackId,
Count = g.Sum(x => x.Count),
Timestamp = g.Key.Timestamp
};
}
}
and the query using facets:
from index TrendingSongs where Timestamp between $then and $now select facet(TrackId, sum(Count))

The reason for the error is that you can't use 'GroupBy' in a query made on an index.
'GroupBy' can be used when performing a 'dynamic query',
i.e. a query that is made on a collection, without specifying an index.
See:
https://ravendb.net/docs/article-page/4.1/Csharp/client-api/session/querying/how-to-perform-group-by-query

I solved a similar problem, by using AdditionalSources that uses dynamic values.
Then I update the index every morning to move the Earliest timestamp forward:
await IndexCreation.CreateIndexesAsync(new AbstractIndexCreationTask[] {new TrendingSongs()}, _store);
I still have to try it in production, but my tests so far look like it's a lot faster than the alternatives. It does feel pretty hacky though and I'm surprised RavenDB does not offer a better solution.
public class TrendingSongs : AbstractIndexCreationTask<TrackPlayedEvent, TrendingSongs.Result>
{
public DateTime Earliest = DateTime.UtcNow.AddDays(-16);
public TrendingSongs()
{
Map = events => from e in events
where e.TypeOfTrack == TrackSubtype.song && e.Percentage >= 80 && !e.Tags.Contains(Podcast.Tags.FraKaare)
&& e.TimestampStart > new DateTime(TrendingHelpers.Year, TrendingHelpers.Month, TrendingHelpers.Day)
select new Result
{
TrackId = e.TrackId,
Count = 1
};
Reduce = results => from r in results
group r by new {r.TrackId}
into g
select new Result
{
TrackId = g.Key.TrackId,
Count = g.Sum(x => x.Count)
};
AdditionalSources = new Dictionary<string, string>
{
{
"TrendingHelpers",
@"namespace Helpers
{
public static class TrendingHelpers
{
public static int Day = "+Earliest.Day+@";
public static int Month = "+Earliest.Month+@";
public static int Year = "+Earliest.Year+@";
}
}"
}
};
}
}

Related

Linq query optimization/summarize

I have the following query:
var countA = await _importContext.table1.CountAsync(ssc => ssc.ImportId == importId);
var countB = await _importContext.table2.CountAsync(ssc => ssc.ImportId == importId);
var countC = await _importContext.table3.CountAsync(ssc => ssc.ImportId == importId);
var countD = await _importContext.table4.CountAsync(ssc => ssc.ImportId == importId);
There are 9 more counts from different tables. Is there a way to combine these queries to optimize them and remove the redundancy?
I tried wrapping up the queries like:
var result = new
{
countA = context.table1.Count(),
countB = context.table2.Count(),
.....
};
but this takes more time than the first one above.
You can't really optimise it, as you seem to need the counts from all of the tables. Your second method of getting the data still calls the database the same number of times as the first, but also creates an object containing all of the counts, so it is likely to take longer.
The only thing you can really do to make it faster is to get the data in parallel but this might be overkill. I would just go with your first option unless it's really slow.
You can build such a query by grouping by a constant and using the Concat operator:
Helper class:
public class TableResult
{
public string Name { get; set; }
public int Count { get; set; }
}
Query:
var query = _importContext.table1.Where(ssc => ssc.ImportId == importId).GroupBy(x => 1).Select(g => new TableResult { Name = "table1", Count = g.Count() })
.Concat(_importContext.table2.Where(ssc => ssc.ImportId == importId).GroupBy(x => 1).Select(g => new TableResult { Name = "table2", Count = g.Count() }))
.Concat(_importContext.table3.Where(ssc => ssc.ImportId == importId).GroupBy(x => 1).Select(g => new TableResult { Name = "table3", Count = g.Count() }))
.Concat(_importContext.table4.Where(ssc => ssc.ImportId == importId).GroupBy(x => 1).Select(g => new TableResult { Name = "table4", Count = g.Count() }));
var result = await query.ToListAsync();
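One caveat worth checking with this approach: grouping by a constant produces no group at all when the filtered source is empty, so a table with zero matching rows may simply be absent from the concatenated result instead of appearing with a count of 0. The in-memory sketch below (LINQ to Objects only; the `CountRows` and `CountsByTable` helpers and the sample lists are hypothetical) illustrates that behaviour and one way to fill in the missing names afterwards. The SQL generated by a real provider should be verified separately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TableResult
{
    public string Name { get; set; }
    public int Count { get; set; }
}

public static class ConcatCountsSketch
{
    // Same shape as the query above: group by a constant, then count the single group.
    public static IEnumerable<TableResult> CountRows<T>(IEnumerable<T> source, string name)
    {
        return source
            .GroupBy(x => 1)
            .Select(g => new TableResult { Name = name, Count = g.Count() });
    }

    // Starts every expected table at 0, then overwrites with the counts that came back.
    public static Dictionary<string, int> CountsByTable(
        IEnumerable<TableResult> rows, IEnumerable<string> allNames)
    {
        var counts = allNames.ToDictionary(n => n, n => 0);
        foreach (var row in rows)
            counts[row.Name] = row.Count;
        return counts;
    }

    public static void Main()
    {
        var table1 = new List<int> { 7, 7, 7 }; // three rows match the filter
        var table2 = new List<int>();           // nothing matches the filter

        var rows = CountRows(table1, "table1")
            .Concat(CountRows(table2, "table2"))
            .ToList();

        Console.WriteLine(rows.Count); // only table1 produced a row

        foreach (var pair in CountsByTable(rows, new[] { "table1", "table2" }))
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```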

Query CosmosDB document using C#

I have documents stored in Cosmos DB, with multiple documents for the same "stationkey" (the partition key). In this example, stationkey "ABC" has more than one document, with "yymm" values such as "2018-02" and "2018-01".
The query I am trying to write should return all "avg" and "dd" fields along with "yymm" for a given stationkey and yymm filter combination.
I am querying from C# and want only the "avg", "dd" and "yymm" fields from the "data" array, but the query I have written returns the entire "data" array.
var weatherQuery = this.docClient.CreateDocumentQuery<WeatherStation>(docUri, queryOptions)
.Where(wq => wq.stationName == stationKey && lstYearMonthFilter.Contains(wq.yearMonth))
.Select(s => s.data);
What is the best way to query specific fields from a document array?
So you got the data in s => s.data. To get only the avg from the array you have to do another projection as following:
.Select (s => s.data.Select ( a => a.avg ))
Modifying my answer as you say you don't find 'Select' on 'data'.
Define a class MyDocument as such:
public class Datum
{
[JsonProperty("dd")]
public string dd;
[JsonProperty("max")]
public int max;
[JsonProperty("min")]
public int min;
[JsonProperty("avg")]
public int avg;
}
public class MyDocument : Document
{
[JsonProperty("id")]
public string id;
[JsonProperty("data")]
public Datum[] data;
}
modify your code accordingly
IDocumentQuery<MyDocument> query = client.CreateDocumentQuery<MyDocument>(UriFactory.CreateDocumentCollectionUri(_database, _collection),
new FeedOptions { MaxItemCount = -1, EnableCrossPartitionQuery = true, MaxDegreeOfParallelism = 199, MaxBufferedItemCount = 100000})
.Where(predicate)
.AsDocumentQuery();
while (query.HasMoreResults)
{
FeedResponse<MyDocument> feedResponse = await query.ExecuteNextAsync<MyDocument>();
Console.WriteLine (feedResponse.Select(x => x.data.Select(y => y.avg)));
}
HTH
You can select only specific fields from the array items using a double-nested anonymous class - see the altered SelectMany below. This will return yymm with every Datum, so may not be as efficient as just selecting the entire array - definitely measure the RU/s in both cases.
var weatherQuery = this.docClient.CreateDocumentQuery<WeatherStation>(docUri, queryOptions)
.Where(wq => wq.stationName == stationKey && lstYearMonthFilter.Contains(wq.yearMonth))
.SelectMany(x => x.data.Select(y => new { x.yymm, data = new[] { new { y.dd, y.avg } } }))
.AsDocumentQuery();
var results = new List<WeatherStation>();
while (weatherQuery.HasMoreResults)
{
results.AddRange(await weatherQuery.ExecuteNextAsync<WeatherStation>());
}
var groupedResults = results
.GroupBy(x => x.yymm)
.Select(x => new { x.First().yymm, data = x.SelectMany(y => y.data).ToArray() })
.Select(x => new WeatherStation() { yymm = x.yymm, data = x.data });
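The shape of that flattening projection may be easier to see in memory. The following is a minimal sketch using LINQ to Objects only; the `Station` and `Reading` classes and the `Flatten` helper are hypothetical stand-ins for the document model, and the sketch says nothing about how the Cosmos DB LINQ provider translates the query or what it costs in RUs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the stored document and its array items.
public class Reading
{
    public string dd { get; set; }
    public int avg { get; set; }
    public int max { get; set; }
}

public class Station
{
    public string yymm { get; set; }
    public Reading[] data { get; set; }
}

public static class FlattenSketch
{
    // One output row per array item, carrying the parent's yymm plus only dd and avg.
    public static List<(string YearMonth, string Day, int Avg)> Flatten(IEnumerable<Station> stations)
    {
        return stations
            .SelectMany(s => s.data.Select(d => (YearMonth: s.yymm, Day: d.dd, Avg: d.avg)))
            .ToList();
    }

    public static void Main()
    {
        var stations = new List<Station>
        {
            new Station
            {
                yymm = "2018-01",
                data = new[]
                {
                    new Reading { dd = "01", avg = 10, max = 15 },
                    new Reading { dd = "02", avg = 12, max = 18 },
                },
            },
            new Station
            {
                yymm = "2018-02",
                data = new[] { new Reading { dd = "01", avg = 9, max = 11 } },
            },
        };

        foreach (var row in Flatten(stations))
            Console.WriteLine($"{row.YearMonth}-{row.Day}: avg {row.Avg}");
    }
}
```

Because yymm is repeated on every row, the flattened form trades a larger payload for a simpler shape, which is the cost the answer above suggests measuring.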

Include count = 0 in linq results

I have a table having TeamName and CurrentStatus fields. I am making a linq query to get for each team and for each status the count of records:
var teamStatusCounts = models.GroupBy(x => new { x.CurrentStatus, x.TeamName })
.Select(g => new { g.Key, Count = g.Count() });
The results of this query returns all the counts except where count is 0. I need to get the rows where there is no record for a specific team and a specific status (where count = 0).
You could have a separate collection for team name and statuses you are expecting and add the missing ones to the result set
// assuming allTeamNamesAndStatuses is a cross join of all 'CurrentStatus' and 'TeamName' values
var teamStatusCounts = models.GroupBy(x => new { x.CurrentStatus, x.TeamName })
.Select(g => new { g.Key, Count = g.Count() })
.ToList();
var missingTeamsAndStatuses = allTeamNamesAndStatuses
.Where(a=>
!teamStatusCounts.Any(b=>
b.Key.CurrentStatus == a.CurrentStatus
&& b.Key.TeamName == a.TeamName))
.Select(a=>new {
Key = new { a.CurrentStatus, a.TeamName },
Count = 0
});
teamStatusCounts.AddRange(missingTeamsAndStatuses);
I've created a fiddle demonstrating the answer as well
I would select the team names and status first:
var teams = models.Select(x => x.TeamName).Distinct().ToList();
var states = models.Select(x => x.CurrentStatus).Distinct().ToList();
You can skip this if you know the list entries already.
Then you can select for each team and each state the number of models:
var teamStatusCounts = teams.SelectMany(team => states.Select(state =>
new
{
TeamName = team,
CurrentStatus = state,
Count = models.Count(model =>
model.TeamName == team && model.CurrentStatus == state)
}));
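A variation on the same idea, sketched in memory under a few assumptions: the records are grouped once into a dictionary, and the cross join of team names and statuses then looks up each pair and falls back to 0. This avoids rescanning the whole list for every pair. The `Model` class and the `CountAll` helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one record of the table.
public class Model
{
    public string TeamName { get; set; }
    public string CurrentStatus { get; set; }
}

public static class ZeroCountSketch
{
    // Returns one row per (team, status) pair, including pairs with no records.
    public static List<(string Team, string Status, int Count)> CountAll(
        IReadOnlyCollection<Model> models,
        IEnumerable<string> teams,
        IReadOnlyCollection<string> statuses)
    {
        // Single pass over the data: count the pairs that actually occur.
        var counts = models
            .GroupBy(m => (m.TeamName, m.CurrentStatus))
            .ToDictionary(g => g.Key, g => g.Count());

        // Cross join of all expected pairs; missing pairs get a count of 0.
        return teams
            .SelectMany(team => statuses.Select(status =>
                (Team: team,
                 Status: status,
                 Count: counts.TryGetValue((team, status), out var c) ? c : 0)))
            .ToList();
    }

    public static void Main()
    {
        var models = new List<Model>
        {
            new Model { TeamName = "Red", CurrentStatus = "Open" },
            new Model { TeamName = "Red", CurrentStatus = "Open" },
            new Model { TeamName = "Blue", CurrentStatus = "Closed" },
        };

        var rows = CountAll(models, new[] { "Red", "Blue" }, new[] { "Open", "Closed" });
        foreach (var row in rows)
            Console.WriteLine($"{row.Team} / {row.Status}: {row.Count}");
    }
}
```

With the sample data, the Red/Closed and Blue/Open pairs are reported with a count of 0 instead of being left out.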

LINQ:Select Newest From Each Group

So, I know I'm very rusty, but I never thought this would be so difficult in spite of spending hours trying different solutions. I'm trying to select the newest record from each group, after eliminating a particular clause and then binding it to a grid. Having essentially no luck... this is where I left off:
var widgets = db.Updates
.GroupBy(c => c.widgetType)
.SelectMany(s => s)
.Where(c => c.Sold.Equals(false))
.OrderByDescending(x => x.TimeStamp)
.FirstOrDefault();
var list = new List<Update>() {widgets};
widgetsGrid.DataSource = list;
widgetsGrid.DataBind();
I added the cast to list since I was getting a data type error, and at present it returns only the last item of all records, rather than the last item from each group.
Thanks in advance for any help!!!
The OrderByDescending should be applied within each group, and the Sold filter has to come before GroupBy because a group has no Sold property. You also don't need SelectMany, because it flattens the groups back into a single list.
var widgets = db.Updates
.Where(c => c.Sold.Equals(false))
.GroupBy(c => c.widgetType)
.Select(x => x.OrderByDescending(y => y.TimeStamp).First());
var widgets = db.Updates
.GroupBy(c => c.widgetType)
.SelectMany(s => s)
.Where(c => c.Sold.Equals(false))
.OrderByDescending(x => x.TimeStamp)
.Select(y=>y.First());
var list = new List<Update>() {widgets};
widgetsGrid.DataSource = list;
widgetsGrid.DataBind();
It seems the key thing you need to do is order by and select the first item from the list. Here's an example program:
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args)
{
var foodOrders = new List<FoodOrder>
{
new FoodOrder { FoodName = "hotdog", OrderDate = new DateTime(2016, 7, 7) },
new FoodOrder { FoodName = "hamburger", OrderDate = new DateTime(2016, 7, 6) },
new FoodOrder { FoodName = "taco", OrderDate = new DateTime(2016, 7, 5) },
};
var mostRecentFoodOrder = foodOrders.OrderByDescending(f => f.OrderDate).First().FoodName;
Console.WriteLine(mostRecentFoodOrder);
//cmd
//hotdog
//Press any key to continue . . .
}
}
class FoodOrder
{
public string FoodName { get; set; }
public DateTime OrderDate { get; set; }
}
var widgets = db.Updates
.Where(c => c.Sold.Equals(false))
.GroupBy(c => c.widgetType)
.Select(x => x.OrderByDescending(y => y.TimeStamp).First()).ToList();
widgetGrid.DataSource = widgets;
widgetGrid.DataBind();
I've seen this question asked a hundred times all over the web... this worked for me. Courtesy of, and many thanks to, @Ahmad Ibrahim.
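For reference, the filter, group, then pick-newest pattern can be tried in memory without a database. This is a small sketch over LINQ to Objects; the `Update` class shape, the `Label` property and the `NewestUnsoldPerType` helper are assumptions made for the example, not the original entity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Update entity.
public class Update
{
    public string WidgetType { get; set; }
    public string Label { get; set; }
    public bool Sold { get; set; }
    public DateTime TimeStamp { get; set; }
}

public static class NewestPerGroupSketch
{
    // Drops sold items first, then keeps the most recent item of each widget type.
    public static List<Update> NewestUnsoldPerType(IEnumerable<Update> updates)
    {
        return updates
            .Where(u => !u.Sold)
            .GroupBy(u => u.WidgetType)
            .Select(g => g.OrderByDescending(u => u.TimeStamp).First())
            .ToList();
    }

    public static void Main()
    {
        var updates = new List<Update>
        {
            new Update { WidgetType = "gear", Label = "old gear", TimeStamp = new DateTime(2016, 7, 5) },
            new Update { WidgetType = "gear", Label = "new gear", TimeStamp = new DateTime(2016, 7, 7) },
            new Update { WidgetType = "gear", Label = "sold gear", Sold = true, TimeStamp = new DateTime(2016, 7, 9) },
            new Update { WidgetType = "bolt", Label = "only bolt", TimeStamp = new DateTime(2016, 7, 6) },
        };

        foreach (var u in NewestUnsoldPerType(updates))
            Console.WriteLine($"{u.WidgetType}: {u.Label}");
    }
}
```

The sold gear is the newest record overall, but it is filtered out before grouping, so "new gear" is returned for that type.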

Classify values on ranges

I have an enumeration as follows:
public enum BPLevel {
Normal = 1,
HighNormal = 2,
HypertensionStage1 = 3,
ModerateHypertensionStage2 = 4,
SeverHypertensionStage3 = 5,
} // BloodPressureLevel
And I have the following classification:
I am using Entity Framework and I need to count how many persons are in each level:
IDictionary<BPLevel, Int32> stats = context
.Persons
.Select(x => new { PersonId = x.Person.Id, BPDiastolic = x.BPDiastolic, BPSystolic = x.BPSystolic })
.Count( ...
My problem is how can I apply this classification in my query?
I would just add a classification member that is assigned to the result of a function call
IDictionary<BPLevel, Int32> stats = context
.Persons
.Select(x => new { PersonId = x.Person.Id, BPDiastolic = x.BPDiastolic,
BPSystolic = x.BPSystolic,
Classification = GetClassification(x.BPDiastolic, x.BPSystolic) })
.Count( ...
BPLevel GetClassification(int diastolic, int systolic)
{
...
}
Queries to EF sometimes don't like operations happening inside the queries, so you may need to do a ToList before the Select to get it into memory (so its LINQ to objects).
What I'd do in this case, is make a helper property in Person
public BPLevel BpLevel
{
get
{
if(BPSystolic >= 180)
return BPLevel.SeverHypertensionStage3;
else if
...
}
}
and then I'd do a group by
.ToList() // you need to execute against the DB before you call the helper property
.GroupBy(x => x.BpLevel)
.Select(x => /*moar data transformation
x is a collection of Person
x.Key, is the BPLevel*/ )
Make sure that you do that ToList() part, or else you might get a not supported exception when it tries to convert your helper property to SQL
Another option would be to Select into a concrete class that has a BPLevel property getter that does the classification for you:
public class PersonWithBP {
// other properties
public BPLevel BPClassification {
get {
// logic to calculate BPLevel
return bpLevel;
}
}
}
Now your select becomes
.Select(x => new PersonWithBP { /* assign the other properties here */ })
Using a GroupBy statement would allow you to classify everyone into their respective BPLevel, at which point you merely need to perform a ToDictionary and count the people in each category. Thus
IDictionary<BPLevel, Int32> stats = context
.Persons
.Select(x => new { PersonId = x.Person.Id, BPDiastolic = x.BPDiastolic, BPSystolic = x.BPSystolic })
.AsEnumerable() // I'm not completely familiar with Entity Framework, so this line may be necessary to force evaluation to continue in-memory from this point forward
.GroupBy(p => ... /* test which returns a BPLevel */)
.ToDictionary(g => g.Key, g => g.Count());
I would write a private static helper function in your class that does the classification for you and insert a call to that function in your projection. Something like:
private static BPLevel ClassifyBP(int diastolic, int systolic) {
// Appropriate switch statement here
}
and then your Select projection looks like:
.Select(x => new { PersonId = x.Person.Id,
BPDiastolic = x.BPDiastolic,
BPSystolic = x.BPSystolic,
BPLevel = ClassifyBP(x.BPDiastolic, x.BPSystolic) })
This is not pretty, but the count query will be executed in database.
var stats = context.Persons
.Select(x => new
{
Level = x.BPDiastolic < 85 && x.BPSystolic < 130
? BPLevel.Normal
: (x.BPDiastolic < 90 && x.BPSystolic < 140
? BPLevel.HighNormal
: (x.BPDiastolic < 100 && x.BPSystolic < 160
? BPLevel.HypertensionStage1
: (x.BPDiastolic < 110 && x.BPSystolic < 180
? BPLevel.ModerateHypertensionStage2
: BPLevel.SeverHypertensionStage3)))
})
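.GroupBy(x => x.Level)
.Select(g => new { Level = g.Key, Count = g.Count() }) // count inside the query so it runs in the database
.ToDictionary(x => x.Level, x => x.Count) // executes the query, one row per level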
.Union(Enum.GetValues(typeof(BPLevel))
.OfType<BPLevel>()
.ToDictionary(x => x, x => 0)) // default empty level
.GroupBy(x => x.Key)
.ToDictionary(x => x.Key, x => x.Sum(y => y.Value)); // combine both
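For comparison, here is an in-memory sketch of the helper-function approach suggested in the earlier answers. The thresholds are copied from the ternary expression in the answer above and have not been checked against the original classification table; the `Classify` and `CountByLevel` names are hypothetical. Every level is seeded with 0 so that empty levels still appear in the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum BPLevel
{
    Normal = 1,
    HighNormal = 2,
    HypertensionStage1 = 3,
    ModerateHypertensionStage2 = 4,
    SeverHypertensionStage3 = 5,
}

public static class BloodPressureSketch
{
    // Thresholds mirror the ternary expression above (an assumption, not a medical reference).
    public static BPLevel Classify(int diastolic, int systolic)
    {
        if (diastolic < 85 && systolic < 130) return BPLevel.Normal;
        if (diastolic < 90 && systolic < 140) return BPLevel.HighNormal;
        if (diastolic < 100 && systolic < 160) return BPLevel.HypertensionStage1;
        if (diastolic < 110 && systolic < 180) return BPLevel.ModerateHypertensionStage2;
        return BPLevel.SeverHypertensionStage3;
    }

    // Counts readings per level; levels without any reading stay at 0.
    public static Dictionary<BPLevel, int> CountByLevel(
        IEnumerable<(int Diastolic, int Systolic)> readings)
    {
        var stats = Enum.GetValues(typeof(BPLevel))
            .Cast<BPLevel>()
            .ToDictionary(level => level, level => 0);

        foreach (var reading in readings)
            stats[Classify(reading.Diastolic, reading.Systolic)]++;

        return stats;
    }

    public static void Main()
    {
        var readings = new List<(int Diastolic, int Systolic)>
        {
            (80, 120), (82, 125), (95, 150), (120, 190),
        };

        foreach (var pair in CountByLevel(readings))
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```

Since `Classify` cannot be translated to SQL, this version only applies after the two blood pressure columns have been loaded into memory, for example following `AsEnumerable()` as in the earlier answers.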
