Understanding MongoDb .NET driver indexing - c#

I'm currently working on an application where MongoDb is used for quite a large amount of data.
The objects I'm storing in MongoDb look like this:
public class PowerPlantDataReading
{
[BsonId]
public int ID { get; set; }
[BsonElement("EDIEL")]
public string EDIEL { get; set; }
[BsonElement("EndDate")]
public DateTime EndDate { get; set; }
[BsonElement("Created")]
public DateTime Created { get; set; }
[BsonElement("DataReading")]
public DataReading DataReading { get; set; }
}
public class DataReading
{
[BsonElement("Version")]
public int Version { get; set; }
[BsonElement("OriginalId")]
public int OriginalId { get; set; }
[BsonElement("Unit")]
public string Unit { get; set; }
[BsonRepresentation(MongoDB.Bson.BsonType.Double)]
[BsonElement("Quantity")]
public decimal Quantity { get; set; }
[BsonElement("Quality")]
public string Quality { get; set; }
[BsonElement("StartDate")]
public DateTime StartDate { get; set; }
}
And the query I'm running against MongoDb looks like this:
DateTime startDateUtc = DateTime.UtcNow.AddDays(-5);
DateTime endDateUtc = DateTime.UtcNow;
var queryBuilder = Builders<PowerPlantDataReading>.Filter;
var filter = queryBuilder.Where(x => x.EndDate >= startDateUtc && x.EndDate < endDateUtc);
var query = collection.Find(filter).ToListAsync();
return query.Result;
The query returns around 825.000 objects, but takes well over 4 minutes to run.
I then tried to create an index like this:
IMongoCollection<PowerPlantDataReading> collection = GetCollection();
collection.Indexes.CreateOne(Builders<PowerPlantDataReading>.IndexKeys.Descending(x => x.EndDate));
Then ran the query again, but to my surprise, it didn't make a difference at all.
I'm not sure if I'm creating the index correctly? If not, how should I create my index to get the best possible performance for the query?
Thanks in advance.

Related

Filtering on the Collection Navigation property

I would like to filter my 'TranslationSet' entities, based on their 'Translations' Collection Navigation Property.
E.g.
If a 'Translation' has a 'LanguageId' of 5 (Italian), then the 'TranslationSet' that contains this 'Translation' should be removed from the result.
Here are my Entity classes:
public class Language
{
public int LanguageId { get; set; }
public string NationalLanguage { get; set; }
//Make table multi tenanted.
public int TenantId { get; set; }
public ApplicationTenant Tenant { get; set; }
public List<Translation> Translation { get; set; } = new List<Translation>();
}
public class Translation
{
public int TranslationId { get; set; }
public string TranslatedText { get; set; }
public int LanguageId { get; set; }
public Language Language { get; set; }
//Make table multi tenanted.
public int TenantId { get; set; }
public ApplicationTenant Tenant { get; set; }
public int TranslationSetId { get; set; }
public TranslationSet TranslationSet {get; set;}
}
public class TranslationSet
{
public int TranslationSetId { get; set; }
public int TenantId { get; set; }
public ApplicationTenant Tenant { get; set; }
public IEnumerable<Translation> Translations { get; set; }
}
Here is my attempt
From the image you can see that the query fails because a Translation exists with LanguageId of 5.
I have made many attempts to resolve this, but I can't even get close to a LINQ query that returns the correct result.
Please let me know if any further clarification is needed and thanks in advance to anybody who offers help.
My rule of thumb, which nearly always works, is: start by querying the entities you want. That prevents the duplicates you see in your query result. Then add predicates that filter those entities, using navigation properties. That will be:
var sets = TranslationSets // start the query here
.Where(ts => ts.Translations.All(t => t.LanguageId != 5)); // Filter
Or if you like this better:
var sets = TranslationSets // start the query here
.Where(ts => !ts.Translations.Any(t => t.LanguageId == 5)); // Filter
EF will translate both queries as WHERE NOT EXISTS.
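To see that the two predicates select the same sets, here is a minimal LINQ to Objects sketch. It runs in memory rather than through EF, the classes are trimmed-down stand-ins for the entities in the question, and `SampleSets`, `WithAll` and `WithNotAny` are hypothetical helper names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down stand-ins for the entities in the question.
public class Translation
{
    public int TranslationId { get; set; }
    public int LanguageId { get; set; }
}

public class TranslationSet
{
    public int TranslationSetId { get; set; }
    public List<Translation> Translations { get; set; } = new List<Translation>();
}

public static class TranslationFilterDemo
{
    // Keeps the sets in which every translation has a language other than the excluded one.
    public static List<int> WithAll(IEnumerable<TranslationSet> sets, int excludedLanguageId) =>
        sets.Where(ts => ts.Translations.All(t => t.LanguageId != excludedLanguageId))
            .Select(ts => ts.TranslationSetId)
            .ToList();

    // Keeps the sets that contain no translation in the excluded language.
    public static List<int> WithNotAny(IEnumerable<TranslationSet> sets, int excludedLanguageId) =>
        sets.Where(ts => !ts.Translations.Any(t => t.LanguageId == excludedLanguageId))
            .Select(ts => ts.TranslationSetId)
            .ToList();

    // Hypothetical sample data: set 1 contains language 5, set 2 does not, set 3 is empty.
    public static List<TranslationSet> SampleSets() => new List<TranslationSet>
    {
        new TranslationSet
        {
            TranslationSetId = 1,
            Translations =
            {
                new Translation { TranslationId = 1, LanguageId = 1 },
                new Translation { TranslationId = 2, LanguageId = 5 }
            }
        },
        new TranslationSet
        {
            TranslationSetId = 2,
            Translations =
            {
                new Translation { TranslationId = 3, LanguageId = 1 },
                new Translation { TranslationId = 4, LanguageId = 2 }
            }
        },
        new TranslationSet { TranslationSetId = 3 }
    };

    public static void Main()
    {
        var sets = SampleSets();
        Console.WriteLine(string.Join(",", WithAll(sets, 5)));    // prints 2,3
        Console.WriteLine(string.Join(",", WithNotAny(sets, 5))); // prints 2,3
    }
}
```

Note that a set with no translations at all passes both filters, because `All` is true and `Any` is false on an empty collection. If empty sets should be excluded, that needs an extra condition.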

Entity Framework - Loading Subclass Entities with TPT Setup

Our system is receiving input from two external sources (phone call/web submission).
// Table-Per-Type Hierarchy
public class Submission
{
public int SubmissionId { get; set; } // Primary Key
public int? PersonId { get; set; }
public int? CompanyId { get; set; }
public long? EmployeeId { get; set; }
public bool? Completed { get; set; }
public string AbsenceReason { get; set; }
public string AbsenceType { get; set; }
public DateTime? AbsenceDate { get; set; }
}
public class CallSubmission : Submission
{
public string CallerId { get; set; }
public string PhoneNumber { get; set; }
public DateTime? HangUp { get; set; }
public DateTime? PickUp { get; set; }
}
public class WebSubmission : Submission
{
public string EmailAddress { get; set; }
public string PhoneNumber { get; set; }
public DateTime SubmissionDate { get; set; }
}
My goal is to retrieve all submissions within the past seven days using PickUp/SubmissionDate depending on the type of submission we're dealing with. Is it possible to achieve this with a single LINQ statement? Ideally, I'd like to avoid having to load two different data sets in-memory.
Statements I'm hoping to integrate
Users.Where(user => user.UserName == name)
.SelectMany(user => user.Submissions)
.OfType<CallSubmission>()
.Where(call => call.PickUp >= startDate)
Users.Where(user => user.UserName == name)
.SelectMany(user => user.Submissions)
.OfType<WebSubmission>()
.Where(web => web.SubmissionDate >= startDate)
Actually (and surprisingly to me), what you are asking is possible, at least in the latest EF 6.1.3, because the C# `is` and `as` operators are supported (they are basically what the OfType method uses).
var query = db.Users
.Where(user => user.UserName == name)
.SelectMany(user => user.Submissions)
.Where(subm => (subm as CallSubmission).PickUp >= startDate
|| (subm as WebSubmission).SubmissionDate >= startDate);
The important part is to use the `as` operator and not a cast, because a cast throws a not-supported exception. There is no need to check for null, because the generated SQL query handles NULLs naturally.
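The "no null check" remark applies to the SQL that EF generates. If the same filter is ever run in memory with LINQ to Objects, `(subm as CallSubmission).PickUp` throws a NullReferenceException for a WebSubmission, so an explicit type test is needed there. A minimal in-memory sketch, with trimmed-down classes and a hypothetical `Since` helper that are not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down stand-ins for the TPT hierarchy in the question.
public class Submission
{
    public int SubmissionId { get; set; }
}

public class CallSubmission : Submission
{
    public DateTime? PickUp { get; set; }
}

public class WebSubmission : Submission
{
    public DateTime SubmissionDate { get; set; }
}

public static class RecentSubmissionsDemo
{
    // In-memory counterpart of the EF query: each branch tests the runtime type
    // before reading the subtype-specific date, so no null dereference can occur.
    public static List<Submission> Since(IEnumerable<Submission> submissions, DateTime startDate) =>
        submissions
            .Where(s =>
                (s is CallSubmission call && call.PickUp >= startDate) ||
                (s is WebSubmission web && web.SubmissionDate >= startDate))
            .ToList();

    public static void Main()
    {
        var startDate = new DateTime(2024, 1, 8);
        var submissions = new List<Submission>
        {
            new CallSubmission { SubmissionId = 1, PickUp = new DateTime(2024, 1, 10) },
            new CallSubmission { SubmissionId = 2, PickUp = null },
            new WebSubmission { SubmissionId = 3, SubmissionDate = new DateTime(2024, 1, 1) },
            new WebSubmission { SubmissionId = 4, SubmissionDate = new DateTime(2024, 1, 9) }
        };

        var recent = Since(submissions, startDate);
        Console.WriteLine(string.Join(",", recent.Select(s => s.SubmissionId))); // prints 1,4
    }
}
```

A call with a null `PickUp` is dropped because a comparison against a null `DateTime?` evaluates to false, which mirrors how the SQL comparison treats NULL.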

Is this the right way of using Dapper or am I doing it all wrong?

I am trying to get away from the Entity Framework since I have to support HANA Databases aside from SQL server Databases in our solution.
I am doing some research with dapper so I created a quick test environment with some fictitious scenario.
I have the following POCOs that resemble my Database schema (I have more but I limited to showing these for simplicity):
public class Adopter
{
public int Id { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public string Address { get; set; }
public string Address2 { get; set; }
public string City { get; set; }
public State State { get; set; }
public int StateId { get; set; }
public string Zip { get; set; }
public string Email { get; set; }
public string Phone { get; set; }
public string Fax { get; set; }
public IEnumerable<Pet> Pets { get; set; }
}
public class State
{
public int Id { get; set; }
public string Name { get; set; }
public string Abreviation { get; set; }
}
public class Pet
{
public int Id { get; set; }
public string IdTag { get; set; }
public string Name { get; set; }
public DateTime AdmitionDate { get; set; }
public Status Status { get; set; }
public int StatusId { get; set; }
public string Notes { get; set; }
public DateTime AdoptionDate { get; set; }
public bool IsAdopted { get; set; }
public int? AdopterId { get; set; }
public int Age { get; set; }
public decimal Weight { get; set; }
public string Color { get; set; }
public Breed Breed { get; set; }
public int BreedId { get; set; }
public Gender Gender { get; set; }
public int GenderId { get; set; }
public IEnumerable<PetImage> PetImages { get; set; }
}
public class Status
{
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
}
public class Gender
{
public int Id { get; set; }
public string Name { get; set; }
}
I am using the following in a repository to return a list of all the adopters:
using (SqlConnection connection = new SqlConnection(_connectionString))
{
var adopters = connection.Query<Adopter>("SELECT a.* FROM Adopters a");
foreach (var adopter in adopters)
{
adopter.State = connection.QueryFirst<State>("Select s.* FROM States s WHERE s.Id = @Id", new { Id = adopter.StateId });
adopter.Pets = connection.Query<Pet>("Select p.* FROM Pets p WHERE p.AdopterId = @Id", new { Id = adopter.Id });
foreach (var pet in adopter.Pets)
{
pet.Status = connection.QueryFirst<Status>("Select s.* FROM Status s WHERE s.Id = @Id", new { Id = pet.StatusId });
pet.Gender = connection.QueryFirst<Gender>("Select g.* FROM Genders g WHERE g.Id = @Id", new { Id = pet.GenderId });
}
}
return adopters;
}
As you can see, I am retrieving the data for each POCO individually based on the previous one and doing the Joins manually in code.
Is this the right way of doing it, or should I be doing one big query with multiple joins and mapping the result somehow through Dapper and LINQ?
A possible improvement to your actual solution is through the use of QueryMultiple extension like this:
using (SqlConnection connection = new SqlConnection(_connectionString))
{
string query = @"SELECT * FROM Adopters;
SELECT * FROM States;
SELECT * FROM Pets;
SELECT * FROM Status;
SELECT * FROM Genders;";
using (var multi = connection.QueryMultiple(query, null))
{
var adopters = multi.Read<Adopter>();
var states = multi.Read<State>();
var pets = multi.Read<Pet>();
var statuses = multi.Read<Status>();
var genders = multi.Read<Gender>();
foreach (Adopter adp in adopters)
{
adp.State = states.FirstOrDefault(x => x.Id == adp.StateId);
adp.Pets = pets.Where(x => x.IsAdopted &&
x.AdopterId.HasValue &&
x.AdopterId.Value == adp.Id)
.ToList();
foreach(Pet pet in adp.Pets)
{
pet.Status = statuses.FirstOrDefault(x => x.Id == pet.StatusId);
pet.Gender = genders.FirstOrDefault(x => x.Id == pet.GenderId);
}
}
}
}
The benefit here is that you reach the database just one time and then process everything in memory.
However, this could become a performance hit and a memory bottleneck if you have a really large amount of data to retrieve, especially from a remote location. It is better to look closely at this approach and also try some kind of async processing and/or pagination if possible.
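If the in-memory stitching is kept, calling `FirstOrDefault` and `Where` inside the loops rescans the full lists for every adopter and every pet. One possible refinement, not part of the answer above, is to index the lookup data once with `ToDictionary` and `ToLookup`. The sketch below uses trimmed-down POCOs and a hypothetical `Stitch` helper, and it works on plain collections, so it makes no assumption about how the rows were loaded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down stand-ins for the POCOs in the question.
public class State
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Pet
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int? AdopterId { get; set; }
}

public class Adopter
{
    public int Id { get; set; }
    public int StateId { get; set; }
    public State State { get; set; }
    public IEnumerable<Pet> Pets { get; set; }
}

public static class InMemoryStitchDemo
{
    // Wires up State and Pets for each adopter using lookups that are built once.
    public static void Stitch(IEnumerable<Adopter> adopters, IEnumerable<State> states, IEnumerable<Pet> pets)
    {
        // One pass over each list instead of one scan per adopter.
        var statesById = states.ToDictionary(s => s.Id);
        var petsByAdopter = pets.Where(p => p.AdopterId.HasValue)
                                .ToLookup(p => p.AdopterId.Value);

        foreach (var adopter in adopters)
        {
            // Leaves State null when no matching row exists.
            statesById.TryGetValue(adopter.StateId, out var state);
            adopter.State = state;

            // A lookup returns an empty sequence for a missing key, so Pets is never null.
            adopter.Pets = petsByAdopter[adopter.Id].ToList();
        }
    }

    public static void Main()
    {
        var states = new List<State>
        {
            new State { Id = 1, Name = "Ohio" },
            new State { Id = 2, Name = "Texas" }
        };
        var pets = new List<Pet>
        {
            new Pet { Id = 10, Name = "Rex", AdopterId = 1 },
            new Pet { Id = 11, Name = "Tom", AdopterId = 1 },
            new Pet { Id = 12, Name = "Ava", AdopterId = null }
        };
        var adopters = new List<Adopter>
        {
            new Adopter { Id = 1, StateId = 1 },
            new Adopter { Id = 2, StateId = 2 }
        };

        Stitch(adopters, states, pets);

        foreach (var adopter in adopters)
        {
            Console.WriteLine($"{adopter.Id}: {adopter.State?.Name}, pets: {adopter.Pets.Count()}");
        }
    }
}
```

This only changes how the in-memory matching is done. It does not reduce the amount of data pulled from the database, so the caveat about large tables still applies.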
I don't like to be negative, but... don't do this! Don't even think like this. You want to dump EF, but you're walking into the trap of wanting to emulate EF. The bridge between your app and your DB is not something to be built once for all time, for every conceivable purpose. Concretely, you should never really bring back a whole table, and certainly not in order to loop over every row and emit more queries. You may feel unjustly criticised, since you were just testing the tools! If so, perhaps tell us which aspect of the tool you're examining, and we'll focus on that.
Dapper or QueryFirst greatly simplify running queries, and consuming the results, so bring back just what you need, just when you need it. Then denormalize a little, for the specific job in hand. Why are there no joins in your queries? RDBMSs are amazing, and amazingly good at doing joins. If you're joining data outside the DB, crazy is the only word, even if Linq gives you a super (sql-like) syntax for doing it. The unthinking assumption that 1 table corresponds to 1 class is the start of a lot of problems.

Entity Framework Adding Child Objects to DB

I am currently stuck on adding a new child entity to my database using lambda queries.
The structure of my database is that Area has a one-to-many relationship with Shifts.
In my seeding database I populate the Shifts while creating the Areas:
new Area()
{
AreaDesc = "Area 1",
AreaActive = true,
AreaCreatedDate = DateTime.Now,
SHFID = new Shift()
{
StartTime = new TimeSpan (5,30,00),
EndTime = new TimeSpan (11, 00, 00),
RequiredResources = 2,
ShiftDesc = "AM Shift",
ShiftDayID = 1
}
}
That works fine. Where I am struggling, probably due to a simple lack of understanding of Entity Framework's abilities, is adding a new Shift to an existing Area.
So far I have the following
var AreaVal = _context.Areas.Where(a => a.AreaID == AreaID).ToList();
var Shift = new Shift
{
Area = AreaVal,
StartTime = StartTime,
EndTime = EndTime,
ShiftDayID = model.ShiftDayID,
ShiftDesc = model.ShiftDesc
};
My thinking was that once I had the correct Area (the ID comes from the model), I could load the Area and pass it as the Area property of the Shift, and Entity Framework would know what to do.
The Error I get in the parser is:
Cannot implicitly convert type Generic.List to Models.Area.
I have also considered going from the other direction using _context.Areas.Update() but have been unable to work that one out very well.
Extra Info, Model Structures
Shift.cs
public class Shift
{
[Key]
public int SHFID { get; set; }
public TimeSpan StartTime { get; set; }
public TimeSpan EndTime { get; set; }
public int RequiredResources { get; set; }
public string ShiftDesc { get; set; }
public int ShiftDayID { get; set; }
public DateTime ShiftExDateStart { get; set; }
public DateTime ShiftExDateEnd { get; set; }
public int ShiftExLevel { get; set; }
public TimeSpan ShiftExStartTime { get; set; }
public TimeSpan ShiftExEndTime { get; set; }
public Area Area { get; set; }
}
Area.cs
public class Area
{
[Key]
public int AreaID { get; set; }
public string AreaDesc { get; set; }
public Boolean AreaActive { get; set; }
public DateTime AreaCreatedDate { get; set; }
public List<Shift> SHFID { get; set; }
public Company Company { get; set;}
}
You are on the right track.
AreaVal needs to be a single entity (Area), not a list of entities (List<Area>). Then it should work as expected.
Change the line:
var AreaVal = _context.Areas.Where(a => a.AreaID == AreaID).ToList();
to
var AreaVal = _context.Areas.Where(a => a.AreaID == AreaID).Single();
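The difference between the two lines is the result type: `ToList()` produces a `List<Area>`, while `Single()` produces one `Area` and throws if the filter does not match exactly one element. The sketch below shows that behaviour with LINQ to Objects on a plain list, not through the `_context` from the question. `FindArea` and `FindAreaOrNull` are hypothetical helpers, and `SingleOrDefault` is shown only as an alternative that the answer does not mention.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down stand-in for the Area model in the question.
public class Area
{
    public int AreaID { get; set; }
    public string AreaDesc { get; set; }
}

public static class SingleDemo
{
    // Returns exactly one Area; throws InvalidOperationException when
    // zero or more than one element matches the id.
    public static Area FindArea(IEnumerable<Area> areas, int areaId) =>
        areas.Where(a => a.AreaID == areaId).Single();

    // Returns null instead of throwing when no element matches;
    // still throws when more than one element matches.
    public static Area FindAreaOrNull(IEnumerable<Area> areas, int areaId) =>
        areas.SingleOrDefault(a => a.AreaID == areaId);

    public static void Main()
    {
        var areas = new List<Area>
        {
            new Area { AreaID = 1, AreaDesc = "Area 1" },
            new Area { AreaID = 2, AreaDesc = "Area 2" }
        };

        Console.WriteLine(FindArea(areas, 2).AreaDesc);        // prints Area 2
        Console.WriteLine(FindAreaOrNull(areas, 99) == null);  // prints True

        try
        {
            FindArea(areas, 99);
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("No area with that id");
        }
    }
}
```

Which variant fits depends on whether a missing Area is an expected case that the calling code should handle or a genuine error.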

How to do nested group by in RavenDB multi map index

I have two different document collections in my RavenDB database - Teams and Matches. The documents look like this:
public class Team {
public string Id { get; set; }
public string Name { get; set; }
public int LeaguePosition { get; set; }
}
public class Match {
public string Id { get; set; }
public string HomeTeamName { get; set; }
public string AwayTeamName { get; set; }
public DateTime StartTime { get; set; }
}
So basically I have teams and matches between these teams. However, for certain operations I need to get an entity from the database that looks something like the following:
public class MatchWithExtraData {
public string Id { get; set; } // Id from the match document.
public string HomeTeamId { get; set; }
public string HomeTeamName { get; set; }
public int HomeTeamPosition { get; set; }
public string AwayTeamId { get; set; }
public string AwayTeamName { get; set; }
public int AwayTeamPosition { get; set; }
public DateTime? StartTime { get; set; }
}
What I want is really the match document but with extra fields for the home and away teams' ids and league positions. Basically join the match document on home and away team name with two team documents, one for the home team and one for the away team. I figured that a multi map/reduce index should do the trick so I have started with the following index:
public class MatchWithExtraDataIndex: AbstractMultiMapIndexCreationTask<MatchWithExtraData> {
public MatchWithExtraDataIndex() {
AddMap<Team>(
teams => from team in teams
select new {
Id = (string)null,
HomeTeamId = team.Id,
HomeTeamName = team.Name,
HomeTeamPosition = team.LeaguePosition,
AwayTeamId = team.Id,
AwayTeamName = team.Name,
AwayTeamPosition = team.LeaguePosition,
StartTime = (DateTime?)null
}
);
AddMap<Match>(
matches => from match in matches
select new {
Id = match.Id,
HomeTeamId = (string)null,
HomeTeamName = match.HomeTeamName,
HomeTeamPosition = 0,
AwayTeamId = (string)null,
AwayTeamName = match.AwayTeamName,
AwayTeamPosition = 0,
StartTime = match.StartTime
}
);
Reduce = results => from result in results
// NOW WHAT?
}
}
The reduce part is the one I can't figure out since there are two teams in each match. I think I need to do a nested group by, first on the HomeTeamName, and then on the AwayTeamName but I can't figure out how to do that.
Maybe this is more a LINQ problem than a RavenDB problem. But how would such a nested group by statement look? Or could it be done in another way?
You are better off using Transform Results for that, or includes.
See the docs here: http://ravendb.net/docs/client-api/querying/handling-document-relationships
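For orientation only, the shape of the result the question describes can be expressed as a client-side LINQ to Objects join over documents that have already been loaded. This is not a RavenDB index and uses no RavenDB API. `Combine` is a hypothetical helper, and the sketch assumes that team names are unique and non-null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Team
{
    public string Id { get; set; }
    public string Name { get; set; }
    public int LeaguePosition { get; set; }
}

public class Match
{
    public string Id { get; set; }
    public string HomeTeamName { get; set; }
    public string AwayTeamName { get; set; }
    public DateTime StartTime { get; set; }
}

public class MatchWithExtraData
{
    public string Id { get; set; }
    public string HomeTeamId { get; set; }
    public string HomeTeamName { get; set; }
    public int HomeTeamPosition { get; set; }
    public string AwayTeamId { get; set; }
    public string AwayTeamName { get; set; }
    public int AwayTeamPosition { get; set; }
    public DateTime? StartTime { get; set; }
}

public static class MatchJoinDemo
{
    // Joins each match to its home and away team by name.
    // Assumption: team names are unique (ToDictionary throws on duplicates).
    public static List<MatchWithExtraData> Combine(IEnumerable<Match> matches, IEnumerable<Team> teams)
    {
        var teamsByName = teams.ToDictionary(t => t.Name);

        return matches.Select(m =>
        {
            // A team that cannot be found leaves the id null and the position at 0.
            teamsByName.TryGetValue(m.HomeTeamName, out var home);
            teamsByName.TryGetValue(m.AwayTeamName, out var away);

            return new MatchWithExtraData
            {
                Id = m.Id,
                HomeTeamId = home?.Id,
                HomeTeamName = m.HomeTeamName,
                HomeTeamPosition = home?.LeaguePosition ?? 0,
                AwayTeamId = away?.Id,
                AwayTeamName = m.AwayTeamName,
                AwayTeamPosition = away?.LeaguePosition ?? 0,
                StartTime = m.StartTime
            };
        }).ToList();
    }

    public static void Main()
    {
        var teams = new List<Team>
        {
            new Team { Id = "teams/1", Name = "Lions", LeaguePosition = 1 },
            new Team { Id = "teams/2", Name = "Tigers", LeaguePosition = 4 }
        };
        var matches = new List<Match>
        {
            new Match { Id = "matches/1", HomeTeamName = "Lions", AwayTeamName = "Tigers", StartTime = new DateTime(2024, 3, 1) }
        };

        foreach (var row in Combine(matches, teams))
        {
            Console.WriteLine($"{row.Id}: {row.HomeTeamId} ({row.HomeTeamPosition}) vs {row.AwayTeamId} ({row.AwayTeamPosition})");
        }
    }
}
```

There is no nested group by here: each match is looked up twice in the same dictionary, once per side.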
