I'm pretty new to RavenDB and am struggling to find a solution to the following:
I have a collection called ServiceCalls whose documents look like this:
public class ServiceCall
{
public int ID { get; set; }
public string IncidentNumber { get; set; }
public string Category { get; set; }
public string SubCategory { get; set; }
public DateTime ReportedDateTime { get; set; }
public string Block { get; set; }
public decimal Latitude { get; set; }
public decimal Longitude { get; set; }
}
I have an index named ServiceCalls/CallsByCategory that looks like this:
Map = docs => from doc in docs
select new
{
Category = doc.Category,
CategoryCount = 1,
ServiceCalls = doc,
};
Reduce = results => from result in results
group result by result.Category into g
select new
{
Category = g.Key,
CategoryCount = g.Count(),
ServiceCalls = g.Select(i => i.ServiceCalls)
};
So the output is:
public class ServiceCallsByCategory
{
public string Category { get; set; }
public int CategoryCount { get; set; }
public IEnumerable<ServiceCall> ServiceCalls { get; set; }
}
Using this query, everything works as it should:
var q = from i in session.Query<ServiceCallsByCategory>("ServiceCalls/CallsByCategory") select i
Where I am absolutely lost is writing an index that would allow me to query by ReportedDateTime. Something that would allow me to do this:
var q = from i in session.Query<ServiceCallsByCategory>("ServiceCalls/CallsByCategory")
where i.ServiceCalls.Any(x=>x.ReportedDateTime >= new DateTime(2012,10,1))
select i
Any guidance would be MUCH appreciated.
A few things:
You can't have a .Count() method in your reduce clause. If you look closely, you will find your counts are wrong. As of build 2151, this will actually throw an exception. Instead, you want CategoryCount = g.Sum(x => x.CategoryCount)
You always want the structure of the map to match the structure of the reduce. If you're going to build a list of things, then you should map a single element array of each thing, and use .SelectMany() in the reduce step. The way you have it now only works due to a quirk that will probably be fixed at some point.
By building the result as a list of ServiceCalls, you are copying the entire document into the index storage. Not only is that inefficient, but it's unnecessary. You would do better keeping a list of just the ids. Raven has an .Include() method that you can use if you need to retrieve the full document. The main advantage here is that you are guaranteed to have the most current data for each item you get back, even if your index results are still stale.
Putting all three together, the correct index would be:
public class ServiceCallsByCategory
{
public string Category { get; set; }
public int CategoryCount { get; set; }
public int[] ServiceCallIds { get; set; }
}
public class ServiceCalls_CallsByCategory : AbstractIndexCreationTask<ServiceCall, ServiceCallsByCategory>
{
public ServiceCalls_CallsByCategory()
{
Map = docs => from doc in docs
select new {
Category = doc.Category,
CategoryCount = 1,
ServiceCallIds = new[] { doc.ID },
};
Reduce = results => from result in results
group result by result.Category
into g
select new {
Category = g.Key,
CategoryCount = g.Sum(x => x.CategoryCount),
ServiceCallIds = g.SelectMany(i => i.ServiceCallIds)
};
}
}
Querying it with includes would look like this:
var q = session.Query<ServiceCallsByCategory, ServiceCalls_CallsByCategory>()
.Include<ServiceCallsByCategory, ServiceCall>(x => x.ServiceCallIds);
When you need a document, you still load it with session.Load<ServiceCall>(id) but Raven will not have to make a round trip back to the server to get it.
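For example, here is a minimal sketch of consuming the query above (the loop shape and variable names are just illustrative):
foreach (var group in q.ToList())
{
    foreach (var id in group.ServiceCallIds)
    {
        // Served from the session cache because of the Include above,
        // so no extra round trip to the server.
        var call = session.Load<ServiceCall>(id);
        // ... use call.ReportedDateTime, call.Block, etc.
    }
}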
NOW - that doesn't address your question about how to filter the results by date. For that, you really need to think about what you are trying to accomplish. All of the above would assume that you really want every service call shown for each category at once. Most of the time, that's not going to be practical because you want to paginate results. You probably DON'T want to even use what I've described above. I am making some grand assumptions here, but most of the time one would filter by category, not group by it.
Let's say you had an index that just counts the categories (the above index without the list of service calls). You might use that to display an overview screen. But you wouldn't be interested in the documents that were in each category until you clicked one and drilled into a details screen. At that point, you know which category you're in, and you can filter by it and restrict to a date range without a static index:
var q = session.Query<ServiceCall>().Where(x=> x.Category == category && x.ReportedDateTime >= datetime)
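For completeness, the counts-only overview index mentioned above is just the same map/reduce as before, minus the id list, e.g. (a sketch):
Map = docs => from doc in docs
              select new
              {
                  Category = doc.Category,
                  CategoryCount = 1
              };
Reduce = results => from result in results
                    group result by result.Category into g
                    select new
                    {
                        Category = g.Key,
                        CategoryCount = g.Sum(x => x.CategoryCount)
                    };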
If I am wrong and you really DO need to show all documents from all categories, grouped by category, and filtered by date, then you are going to have to adopt an advanced technique like the one I described in this other StackOverflow answer. If this is really what you need, let me know in comments and I'll see if I can write it for you. You will need Raven 2.0 to make it work.
Also - be very careful about what you are storing for ReportedDateTime. If you are going to be doing any comparisons at all, you need to understand the difference between calendar time and instantaneous time. Calendar time has quirks like daylight savings transitions, time zone differences, and more. Instantaneous time tracks the moment something happened, regardless of who's asking. You probably want instantaneous time for your usage, which means either using a UTC DateTime, or switching to DateTimeOffset which will let you represent instantaneous time without losing the local contextual value.
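For example, a sketch of capturing instantaneous time when the call is created (assuming you control that code):
// UTC DateTime: an unambiguous instant, though the local offset is lost
var call = new ServiceCall { ReportedDateTime = DateTime.UtcNow };

// Alternatively, if the property type were changed to DateTimeOffset:
// ReportedDateTime = DateTimeOffset.Now;  // the instant plus the local offset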
Update
I experimented with trying to build an index that would use the technique I described to let you have all results in your category groups but still filter by date. Unfortunately, it's just not possible. You would have to have all ServiceCalls grouped together in the original document and express it in the Map. It doesn't work the same way at all if you have to Reduce first. So you really should just consider a simple query for ServiceCalls once you are in a specific Category.
Could you add ReportedDateTime to the Map and aggregate it in the Reduce? If you only care about the max per category, something like this should be sufficient.
Map = docs => from doc in docs
select new
{
Category = doc.Category,
CategoryCount = 1,
ServiceCalls = doc,
ReportedDateTime = doc.ReportedDateTime
};
Reduce = results => from result in results
group result by result.Category into g
select new
{
Category = g.Key,
CategoryCount = g.Sum(x => x.CategoryCount),
ServiceCalls = g.Select(i => i.ServiceCalls),
ReportedDateTime = g.Max(rdt => rdt.ReportedDateTime)
};
You could then query it just based on the aggregated ReportedDateTime:
var q = from i in session.Query<ServiceCallsByCategory>("ServiceCalls/CallsByCategory")
where i.ReportedDateTime >= new DateTime(2012,10,1)
select i
Related
I have two models:
public class Employee
{
public int Id { get; set; }
public IList<Skill> Skills { get; set; }
}
public class Skill
{
public int Id { get; set; }
}
And I have a filter with a list of skill ids that an employee should have:
public class Filter
{
public IList<int> SkillIds { get; set; }
}
I want to write a query to get all employees that have all of the skills from the filter.
I tried:
query.Where(e => filter.SkillIds.All(id => e.Skills.Any(skill => skill.Id == id)));
And:
query = query.Where(e => e.Skills
.Select(x => x.Id)
.Intersect(filter.SkillIds)
.Count() == filter.SkillIds.Count);
But as a result I get an exception saying that the query could not be translated.
It is going to be a difficult, if not impossible, task to run a query like this on the SQL Server side.
This is because, to make this work on the SQL side, you would be grouping each set of employee skills into a single row, which would need a new column for every skill listed in the skills table.
SQL Server wasn't really made to handle grouping with an unknown set of columns passed into a query. Although this kind of query is technically possible, it's probably not very easy to do through a model-binding framework like EF Core.
It would be easier to do this on the .NET side using something like:
var employees = _context.Employees.Include(x => x.Skills).ToList();
var filter = someFilter;
var result = employees
    .Where(emp => filter.SkillIds.All(skillId => emp.Skills.Any(skill => skill.Id == skillId)))
    .ToList();
This solution works:
foreach (int skillId in filter.SkillIds)
{
query = query.Where(e => e.Skills.Any(skill => skill.Id == skillId));
}
I am not sure about its performance, but it works pretty fast with a small amount of data.
I've also encountered this issue several times now; this is the query I've come up with that works best and does not result in an exception.
query.Where(e => e.Skills.Where(s => filter.SkillIds.Contains(s.Id)).Count() == filter.SkillIds.Count);
I want to make a query over a million documents in Elasticsearch using Nest. My code:
var response = client.Search<MyObject>(s => s
.Index("test")
.Type("one")
.Query(q => q
    .Term(t => t.name, "A")
)
.Size(10000)
.Scroll("10m")
.Pretty()
);
My MyObject class:
public class MyObject
{
public int id { get; set; }
public int age { get; set; }
public string lastname { get; set; }
public string name { get; set; }
}
The problem is that when the query finds no matches in the first 10k documents, it won't continue searching the rest of the results with the scroll API.
My question is how to achieve this (i.e. moving through all the pages of the scroll API even when a page has no hits)?
The query will search all documents, but will only return the top .Size number of documents.
You can paginate results using .From() and .Size(); however, deep pagination is likely a concern when paging over a million documents. For this, you would be better off using the scroll API to efficiently retrieve 1 million documents. NEST has an observable helper, ScrollAll(), to help with this:
var client = new ElasticClient();
// number of slices in slice scroll
var numberOfSlices = 4;
var scrollObserver = client.ScrollAll<MyObject>("1m", numberOfSlices, s => s
.MaxDegreeOfParallelism(numberOfSlices)
.Search(search => search
.Index("test")
.Type("one")
.Query(q => q.Term(t => t.name, "A"))
)
).Wait(TimeSpan.FromMinutes(60), r =>
{
// do something with documents from a given response.
var documents = r.SearchResponse.Documents;
});
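Since responses from the slices can arrive in parallel, one way to accumulate every hit (a sketch, using a thread-safe collection) is:
var allDocuments = new System.Collections.Concurrent.ConcurrentBag<MyObject>();

client.ScrollAll<MyObject>("1m", numberOfSlices, s => s
    .MaxDegreeOfParallelism(numberOfSlices)
    .Search(search => search
        .Index("test")
        .Type("one")
        .Query(q => q.Term(t => t.name, "A"))
    )
).Wait(TimeSpan.FromMinutes(60), r =>
{
    // collect the hits from each scroll response
    foreach (var doc in r.SearchResponse.Documents)
    {
        allDocuments.Add(doc);
    }
});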
I am rather new to programming, < 2 years. I am trying to take a flat table that is currently a stored procedure in MS-SQL and turn it into a complex data structure. What I'm trying to accomplish is returning all the changes for the various release versions of a project.
These are the model classes I currently have:
public class ReleaseNote
{
public string ReleaseVersion { get; set; }
public DateTime ReleaseDate { get; set; }
public List<ReleaseNoteItems> ReleaseNoteItems { get; set; }
}
public class ReleaseNoteItems
{
public string ChangeType { get; set; }
public List<string> Changes { get; set; }
}
And this is the business logic in the DAL class I have:
public IEnumerable<ReleaseNote> GetAllReleaseNotes()
{
string cmdText = ConfigurationManager.AppSettings["ReleaseNotesAll"];
Func<DataTable, List<ReleaseNote>> transform = releaseNoteTransform;
return getRecords<ReleaseNote>(cmdText, transform);
}
public List<ReleaseNote> releaseNoteTransform(DataTable data)
{
//DISTINCT LIST OF ALL VERSIONS (PARENT RECORDS)
var versions = data.AsEnumerable().Select(row => new ReleaseNote
{
ReleaseVersion = row["ReleaseVersion"].ToString(),
ReleaseDate = DateTime.Parse(row["ReleaseDate"].ToString())
}).Distinct().ToList();
//ENUMERATE VERSIONS AND BUILD OUT RELEASENOTEITEMS
versions.ForEach(version =>
{
//GET LIST OF ROWS THAT BELONG TO THIS VERSION NUMBER
var rows = data.AsEnumerable().Where(row => row["ReleaseVersion"].ToString() == version.ReleaseVersion).ToList();
//GET DISTINCT LIST OF CHANGE TYPES IN THIS VERSION
var changeTypes = rows.Select(row => row["ChangeType"].ToString()).Distinct().ToList();
//INSTANTIATE LIST FOR RELEASENOTE ITEMS
version.ReleaseNoteItems = new List<ReleaseNoteItems>();
//ENUMERATE CHANGE TYPES AND CREATE THEM
changeTypes.ForEach(changeType =>
{
//FILTER FOR CHANGES FOR THIS SPECIFIC CHANGE TYPE AND PROJECT TO LIST<STRING>
var changes = rows.Where(row => row["ChangeType"].ToString() == changeType)
.Select(row => row["ReleaseNote"].ToString()).ToList();
//CREATE THE ITEM AND POPULATE IT
var releaseNoteDetail = new ReleaseNoteItems();
releaseNoteDetail.ChangeType = changeType;
releaseNoteDetail.Changes = changes;
version.ReleaseNoteItems.Add(releaseNoteDetail);
});
});
return versions;
}
I'm presently using Postman to look at the returned JSON object, and the issue I'm having is that it is not returning unique objects or release versions; it is still giving me duplicates.
These are some links I've looked at. None I've found provide solutions for the specific implementation I'm using. I've tried different implementations, but it seems they fall outside the framework of what I'm trying to accomplish.
Please let me know if you need more information. I'm trying to follow the question protocol, but I'm sure there is something I've left out.
Thanks in advance!
Nice & universal way to convert List of items to Tree
Is there a way to easily convert a flat DataTable to a nested .NET object?
Recursive method turning flat structure to recursive
Sounds like your data has duplicates. A given ReleaseVersion may have more than one record. When you take DISTINCT in your example, you are enforcing uniqueness over {ReleaseVersion, ReleaseDate}, which apparently is not good enough.
If you want to have rows that are unique with respect to ReleaseVersion, you need to figure out how to populate ReleaseDate when there is more than one possible value. I would suggest that it should be populated with the latest release date associated with that version. You can enforce that logic with LINQ GroupBy and Max, like this:
var uniqueRows = dt.AsEnumerable()
.GroupBy(row => row["ReleaseVersion"])
.Select (group => new ReleaseNote
{
ReleaseVersion = group.Key as string,
ReleaseDate = group.Max(row => (DateTime)row["ReleaseDate"])
}
);
This LINQ will create one row per release version. The release date will be populated with the latest (max) release date, given the release version.
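If you also want the nested ReleaseNoteItems built in the same pass (not part of the answer above, just a sketch using the same DataTable columns), each version group can be grouped again by ChangeType:
var releaseNotes = dt.AsEnumerable()
    .GroupBy(row => row["ReleaseVersion"].ToString())
    .Select(versionGroup => new ReleaseNote
    {
        ReleaseVersion = versionGroup.Key,
        ReleaseDate = versionGroup.Max(row => (DateTime)row["ReleaseDate"]),
        ReleaseNoteItems = versionGroup
            .GroupBy(row => row["ChangeType"].ToString())
            .Select(changeGroup => new ReleaseNoteItems
            {
                ChangeType = changeGroup.Key,
                Changes = changeGroup.Select(row => row["ReleaseNote"].ToString()).ToList()
            })
            .ToList()
    })
    .ToList();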
I am trying to get the latest contact with a given user, grouped by user:
public class ChatMessage
{
public string SentTo { get; set; }
public string SentFrom { get; set; }
public string MessageBody { get; set; }
public string SendDate { get; set; }
}
The user's contact info could either be in SentTo or SentFrom.
List<ChatMessage> ecml = new List<ChatMessage>();
var q = ecml.OrderByDescending(m => m.SendDate).First();
would give me the latest message, but I need the last message per user.
The closest solution I could find was LINQ Max Date with group by, but I can't seem to figure out the correct syntax. I would rather not create multiple List objects if I don't have to.
If the user's info is in SentTo, my info will be in SentFrom, and vice-versa, so I do have some way of checking where the user's data is.
Did I mention I was very new to LINQ? Any help would be greatly appreciated.
Since you need to interpret each record twice - i.e. as a SentTo and a SentFrom, the query becomes a bit tricky:
var res = ecml
.SelectMany(m => new[] {
new { User = m.SentFrom, m.SendDate }
, new { User = m.SentTo, m.SendDate }
})
.GroupBy(p => p.User)
.Select(g => new {
User = g.Key
, Last = g.OrderByDescending(m => m.SendDate).First()
});
The key trick is in SelectMany, which makes each ChatMessage item into two anonymous items - one that pairs up the SentFrom user with SendDate, and one that pairs up the SentTo user with the same date.
Once you have both records in an enumerable, the rest is straightforward: you group by the user, and then apply the query from your post to each group.
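If you want the whole ChatMessage back per user rather than just the date, a small variation (a sketch, not part of the original answer) carries the message through each pair:
var lastMessagePerUser = ecml
    .SelectMany(m => new[]
    {
        new { User = m.SentFrom, Message = m },
        new { User = m.SentTo, Message = m }
    })
    .GroupBy(p => p.User)
    .Select(g => new
    {
        User = g.Key,
        // SendDate is a string here, so this orders lexicographically,
        // just like the query in the question
        LastMessage = g.OrderByDescending(p => p.Message.SendDate).First().Message
    });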
It should be pretty easy, look at this code:
string username = "John";
var q = ecml.Where(i=>i.SentFrom == username || i.SentTo == username).OrderByDescending(m => m.SendDate).First();
It simply filters your collection by choosing items whose SentFrom or SentTo is equal to username.
Still learning Linq here. So I've got a class that looks like this (comes from a db table):
class FundRaisingData
{
public double funds { get; set; }
public DateTime date { get; set; }
public string agentsGender { get; set; }
}
What I need to do is transform this into a list of anonymous objects that looks something like this (it will be transformed and returned as JSON later). I know this is not an anonymous object, but it will give you some idea of what I'm trying to do:
class groupedFunds
{
public string gender { get; set; }
public List<double> fundsRaised { get; set; }
}
So I need to figure out how I can sum the funds for each year in the right order (2010-2014).
Eventually it should look like this in JSON:
ListOfGroupedFunds[
{
"gender" : "Female",
"fundsRaised" : [2000, 2500, 3000]
},
{
"gender" : "Male",
"fundsRaised": [4300,2300,3100]
}
];
So fundsRaised[0] would correspond to 2012, fundsRaised[1] would correspond to 2013, etc. but not actually say "2013" anywhere in the object.
I've read a ton of articles today on Linq and searched through similar StackOverflow articles but I still just can't quite figure it out. Any help in pointing me in the right direction would be awesome.
-- Edit 2 (changing code to more closely match the solution) --
I think the code by Mike worked well, but because I'm not sure how many years there will be in advance I've modified it slightly:
var results = data.GroupBy(g => g.agentsGender)
    .Select(g => new
    {
        gender = g.Key,
        years = g.GroupBy(y => y.date.Year)
                 .OrderBy(y => y.Key)
                 .Select(y => y.Sum(z => z.funds))
                 .ToArray()
    })
    .ToArray();
Is there anything wrong about the above? It seems to work but I'm open to better solutions of course.
Querying for any number of years is just another sub GroupBy() in your gender group.
var results = data.GroupBy(g => g.agentsGender)
    .Select(g => new
    {
        gender = g.Key,
        years = g.GroupBy(y => y.date.Year)
                 .OrderBy(y => y.Key)
                 .Select(y => y.Sum(z => z.funds))
                 .ToArray()
    })
    .ToArray();
This LINQ statement will group by gender, then get the sum of funds per year as a List.
var query = from d in data
group d by d.agentsGender into g
select new { gender = g.Key, fundsRaised = new List<double> {
g.Where(f => f.date.Year == 2012).Sum(f => f.funds),
g.Where(f => f.date.Year == 2013).Sum(f => f.funds),
g.Where(f => f.date.Year == 2014).Sum(f => f.funds),
}};