Send multiple queries to EF / SQL database - c#

.NET 4.7.6 and Entity Framework 3.1.16
Is it possible to send multiple queries via EF or any other method when querying the database?
The problem I have is that I have to extract a lot of data from multiple tables with multiple queries, for example (I have omitted unnecessary info):
var companyValues = GetCompanyValues(companyId); // DB query
var allMemberScores = GetAllMemberScores(companyId); // DB query
var currentPeriodMemberScores = GetAllMemberScores(companyId, timespan); // DB query
var previousPeriodMemberScores = GetAllMemberScores(companyId, previousTimespan); // DB query

var currentDepartments = allMemberScores.Select(x => x.Department).Distinct();
if (currentDepartments?.Any() == true)
{
    foreach (var department in currentDepartments)
    {
        var current = currentPeriodMemberScores.Where(x => x.Department.Equals(department));
        var previous = previousPeriodMemberScores.Where(x => x.Department.Equals(department));
        var combinedCompanyValueScore = allMemberScores
            .Where(x => x.Department.Equals(department))
            .SelectMany(x => x.CompanyValueScores)
            .Average(x => x.Score);

        foreach (var companyValue in companyValues)
        {
            int? value = GetCompanyValueDifference(current, previous, companyValue); // DB query
        }

        int responseRate = GetResponseRate(surveysQuery, current); // DB query
    }
}
but this will send a lot of requests, depending on the number of departments and company values, both of which are unbounded in length: there could be 10, there could be 20+, etc.
This is an inherited project, so there is existing data, and redesigning the database is on the back burner.
I have previously used Elasticsearch in another project, where I could define a search request that takes in multiple queries, give each query an id, and then pull the response for each of those ids from the returned object.
Is there anything like this that I could utilise? Could I build up a single query and then process all the results?
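For reference, here is a rough sketch of the kind of batching I mean, in plain ADO.NET terms (table and column names are made up for illustration):

using System.Data.SqlClient;

public async Task LoadDashboardDataAsync(string connectionString, int companyId)
{
    // Sketch only: two SELECTs sent in one command, read back as two
    // result sets from a single round trip. Table/column names are hypothetical.
    const string sql = @"
        SELECT Id, Name FROM CompanyValues WHERE CompanyId = @companyId;
        SELECT Department, Score FROM MemberScores WHERE CompanyId = @companyId;";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(sql, connection))
    {
        command.Parameters.AddWithValue("@companyId", companyId);
        await connection.OpenAsync();

        using (var reader = await command.ExecuteReaderAsync())
        {
            // First result set: company values
            while (await reader.ReadAsync())
            {
                var name = reader.GetString(1);
                // ... collect company values
            }

            // Advance to the second result set: member scores
            await reader.NextResultAsync();
            while (await reader.ReadAsync())
            {
                var department = reader.GetString(0);
                // ... collect member scores
            }
        }
    }
}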
Thanks in advance :)

Large database linq query to sql server takes forever

Background
So, I am using a React frontend and a .NET Core 3.1 backend for a webapp where I display a view with a list of data. The list is often several thousand rows long; in this case it's around 7,500. We virtualize it to prevent sluggishness. Along with the data, every row has a column showing the latest log change someone made on that data row. The logs and the rest of the data for every row come from two different applications, each with its own database. The log data, which consists of the name and the date the log entry was made, is also supposed to render for every row.
The problem
When you route to the page, a useEffect fires that fetches the rows from one of the databases. When I get the response, I extract all of the ids from the data and then post that list to the other endpoint to request the latest log for every id. This endpoint queries the logging database. The number of ids I am passing to the endpoint is about 7,200+. It won't always be this many, but sometimes it is.
Troubleshooting
This is the query that is giving me trouble in the log endpoint
public async Task<IActionResult> GetLatestLog(ODataActionParameters parameters)
{
    var logIds = (LogIds)parameters["LogIds"];

    var results = await context.Set<LogEvent>()
        .Where(x => logIds.Ids.Contains(x.Id))
        .ToListAsync(); // 55,600 entities

    results = results
        .GroupBy(x => x.ContextId)
        .Select(x => x.OrderByDescending(p => p.CreationDate).First())
        .ToList(); // 7,500 entities

    var transformed = results.Select(MapEntityToLogEvent).ToList();
    return Ok(transformed);
}
The first db query takes around 25 seconds (!) and returns around 56,000 entities.
The second LINQ statement takes about 2 seconds and returns around 7,500 entities, and the mapping takes around 1 second.
The database is SQL Server, and there are three indexes; one of them is Id, the other two are irrelevant for this assignment.
I have tried different queries and AsNoTracking, but to no avail.
Obviously this is horrible. Do you know of a way to optimize this query?
There are two ways to improve your query:
Pure EF Core
We can rewrite the LINQ query so that it is translatable and avoids pulling unnecessary records to the client side. Note that your GroupBy will work as written in EF Core 6:
public async Task<IActionResult> GetLatestLog(ODataActionParameters parameters)
{
    var logIds = (LogIds)parameters["LogIds"];

    var results = context.Set<LogEvent>()
        .Where(x => logIds.Ids.Contains(x.Id));

    results =
        from d in results.Select(d => new { d.ContextId }).Distinct()
        from r in results
            .Where(r => r.ContextId == d.ContextId)
            .OrderByDescending(r => r.CreationDate)
            .Take(1)
        select r;

    var transformed = await results.Select(MapEntityToLogEvent).ToListAsync();
    return Ok(transformed);
}
Using a third party extension
With linq2db.EntityFrameworkCore we can use the full power of SQL and build the most efficient query for this case.
A big list of ids can be copied quickly into a temporary table and used in the resulting query.
Retrieving only the latest record per ContextId can be done efficiently with the window function ROW_NUMBER.
Disclaimer: I'm the maintainer of this library.
// helper class for creating the temporary table
class IdsTable
{
    public int Id { get; set; }
}

public async Task<IActionResult> GetLatestLog(ODataActionParameters parameters)
{
    var logIds = (LogIds)parameters["LogIds"];

    using var db = context.CreateLinqToDBConnection();

    TempTable<IdsTable>? idsTable = null;
    var results = context.Set<LogEvent>().AsQueryable();

    try
    {
        // avoid using a temporary table for a small number of Ids
        if (logIds.Ids.Count() < 20)
        {
            results = results.Where(x => logIds.Ids.Contains(x.Id));
        }
        else
        {
            // initialize the temporary table
            idsTable = await db.CreateTempTableAsync(
                logIds.Ids.Select(id => new IdsTable { Id = id }),
                tableName: "temporaryIds");

            // filter via join
            results =
                from t in idsTable
                join r in results on t.Id equals r.Id
                select r;
        }

        // select the latest log per ContextId
        results =
            from r in results
            select new
            {
                r,
                rn = Sql.Ext.RowNumber().Over()
                    .PartitionBy(r.ContextId)
                    .OrderByDesc(r.CreationDate)
                    .ToValue()
            } into s
            where s.rn == 1
            select s.r;

        var transformed = await results
            .Select(MapEntityToLogEvent)
            .ToListAsyncLinqToDB(); // we have to use this extension because of a name collision with EF Core's ToListAsync

        return Ok(transformed);
    }
    finally
    {
        // drop the temporary table if it was used
        idsTable?.Dispose();
    }
}
Warning
Note that the number of logs will grow over time, so you should limit the result set by date, and probably also cap the number of retrieved records.
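For example, a minimal sketch of such a limit on top of the EF Core version, assuming a 30-day cutoff and a hypothetical maxRecords cap:

// Sketch only: cutoff and maxRecords are illustrative values;
// adjust them to your actual retention and paging policy.
var cutoff = DateTime.UtcNow.AddDays(-30);
const int maxRecords = 10000;

var limited = context.Set<LogEvent>()
    .Where(x => logIds.Ids.Contains(x.Id) && x.CreationDate >= cutoff)
    .OrderByDescending(x => x.CreationDate)
    .Take(maxRecords);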

C# Azure CosmosDb and Mongo - how to know if Find is hitting an index, and which are the best indexing recommendations for this scenario?

I have an ASP.Net Core 3.1 API that saves documents in Azure CosmosDb using Mongo Driver nuget package v2.11.
First of all, my document's class:
public class Customer
{
    public Guid CustomerId { get; set; }
    public string Email { get; set; }
    public int Channel { get; set; }

    public string PartitionKey
    {
        get { return GetPartitionKey(CustomerId); }
        set { }
    }

    public static string GetPartitionKey(Guid id)
    {
        return id.ToString().Substring(0, 2);
    }
}
Before sharing my repository class, I'd like to share a few details about the situations I'm struggling with here. I have a partitioned collection (using the PartitionKey property of my Customer class) and I have two requirements for Find operations:
To be able to find by CustomerId and Channel (the same CustomerId can exist for different Channels)
To be able to check if a customer exists. The customer exists if the CustomerId or Email exists for the same Channel (again, the same CustomerId or Email can exist for different Channels)
My question is about the appropriate indexes, in order to take advantage of them when I find by another thing than the partition key. Let's move to the repository class, then to the indexes:
public class MyRepository
{
    private IMongoCollection<Customer> Collection;

    public MyRepository()
    {
        MongoClientSettings settings = MongoClientSettings.FromUrl(new MongoUrl("The connection string"));
        settings.SslSettings = new SslSettings() { EnabledSslProtocols = SslProtocols.Tls12 };
        var mongoClient = new MongoClient(settings);
        var database = mongoClient.GetDatabase("db-customer");
        this.Collection = database.GetCollection<Customer>("col-customer");
        // What indexes here ?!?
    }

    public Customer GetByKey(Guid customerId, int channel)
    {
        var channelFilter = Builders<Customer>.Filter.Eq(x => x.Channel, channel);
        var idFilter = Builders<Customer>.Filter.Eq(x => x.CustomerId, customerId);
        var filter = channelFilter & idFilter;
        Customer result = this.Collection.Find(filter).FirstOrDefault();
        return result;
    }

    public bool Exists(Customer customer)
    {
        var channelFilter = Builders<Customer>.Filter.Eq(x => x.Channel, customer.Channel);
        var emailFilter = Builders<Customer>.Filter.Eq(x => x.Email, customer.Email);
        var idFilter = Builders<Customer>.Filter.Eq(x => x.CustomerId, customer.CustomerId);
        var filter = channelFilter & (emailFilter | idFilter);
        bool found = this.Collection.Find(filter).FirstOrDefault() != null;
        return found;
    }
}
So, my question is, which is the best indexing setup for this repository? Should I create one index for each field I'm searching, like this:
this.Collection.Indexes.CreateOne(new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.CustomerId)));
this.Collection.Indexes.CreateOne(new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.Channel)));
this.Collection.Indexes.CreateOne(new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.Email)));
Or should I create compound indexes, depending on the searches I'm trying to attempt, like this?
this.Collection.Indexes.CreateOne(new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.CustomerId).Ascending(i => i.Channel)));
this.Collection.Indexes.CreateOne(new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.CustomerId).Ascending(i => i.Email).Ascending(i => i.Channel)));
Checking metrics with Azure Monitor, I always get low RU consumption and overall low response times, but my collection has only a few records at this stage. I'm afraid that as the number of records scales (this will hold millions of records), the RU consumption will become too large or the response times too high, or in the worst case, both.
Can I have your two cents on this subject?
Thanks.
You should create a compound index only if your query needs to sort efficiently on multiple fields at once. For queries with multiple filters that don't need to sort, create multiple single field indexes instead of a single compound index. One query can use multiple single field indexes where available.
So, in your case, I see you have multiple filters that don't need to sort. Hence, create multiple single field indexes.
Refer to Manage indexing in Azure Cosmos DB's API for MongoDB for details.
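For example, the three single field indexes from the question can be created together (a sketch using the driver's CreateMany with the same field names):

// Sketch: the three single field indexes suggested above, created in one call.
var indexModels = new[]
{
    new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.CustomerId)),
    new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.Channel)),
    new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.Email))
};
this.Collection.Indexes.CreateMany(indexModels);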

How do I get a single read action using Contains in Entity Framework

I was trying to replace this code
var sql = string.Format("SELECT * FROM AgreementTexts WHERE IsSelected = 1 AND HeadLineID IN ({0}) ORDER BY consNumber", headLineIDs);
var exampelTexts = await _db.Database.SqlQuery<AgreementExampelTextViewModel>(sql).ToListAsync();
with the following LINQ statements:
var query = _db.AgreementTexts
    .Where(aet => aet.IsSelected && listOfHeadLineIDs.Contains(aet.HeadLineID))
    .OrderBy(aet => aet.consNumber);

var exampelTexts = (await query.ToListAsync()).ToAgreementExampelTextViewModel();
However, there was a big performance drop. Logging the calls shows that the first example generates a single request to the database, while the second example results in multiple requests, closing and opening the connection in between.
Is there a way to make the second example perform a single request?
You can use the Distinct method to optimize your query, as below:
var query = _db.AgreementTexts
    .Where(aet => aet.IsSelected && listOfHeadLineIDs.Distinct().Contains(aet.HeadLineID))
    .OrderBy(aet => aet.consNumber);

var exampelTexts = (await query.ToListAsync()).ToAgreementExampelTextViewModel();
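A variation on the same idea is to de-duplicate the ids once, in memory, before building the query, so Distinct is not evaluated as part of the query expression (a sketch, assuming listOfHeadLineIDs is an in-memory collection):

// Sketch: de-duplicate the ids up front, then pass the smaller list to Contains.
var distinctIds = listOfHeadLineIDs.Distinct().ToList();

var query = _db.AgreementTexts
    .Where(aet => aet.IsSelected && distinctIds.Contains(aet.HeadLineID))
    .OrderBy(aet => aet.consNumber);

var exampelTexts = (await query.ToListAsync()).ToAgreementExampelTextViewModel();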

Linq RemoveAll with ContainsList instead of AddArrayParameters

How do I convert this SQL command query to Entity Framework LINQ? I'm trying to remove rows matching a contains list.
I'm trying to utilize the resource linked below.
public async Task<int> PurgeInventory(IEnumerable<int> ids)
{
    var command = new SqlCommand("Update [dbo].[Inventory] set status = 0 where InventoryId in ({ids})");
    command.AddArrayParameters("ids", ids);
    return await UnitOfWork.ExecuteSqlCommandAsync(command);
}
My attempt (looking for the correct syntax):
IEnumerable<int> ids = new IEnumerable<int>();
UnitOfWork.DBSet<Inventory>().Remove(x => Where(x => InventoryId.Contains(ids));
Resource: How to use parameters in Entity Framework in a "in" clause?
For large bulk delete operations it's generally better to do that as a raw SQL statement; however, if you have a manageable number of rows to delete:
(fast)
IEnumerable<int> ids = getIdsToDelete();
var idRecords = ids.Select(x => new Inventory { InventoryId = x }).ToList();

using (var context = new MyDbContext())
{
    foreach (var idRecord in idRecords)
    {
        context.Inventory.Attach(idRecord);
        context.Inventory.Remove(idRecord);
    }
    context.SaveChanges();
}
This requires a fresh Context instance which cannot already have any of those Inventory records loaded, and it requires that the Inventory IDs selected for removal contain no duplicates. We convert the IDs into dummy Inventory entities, attach them to the DbContext, and tell EF to remove them.
(thorough)
IEnumerable<int> ids = getIdsToDelete();

using (var context = new MyDbContext())
{
    var idRecords = context.Inventory.Where(x => ids.Contains(x.InventoryId)).ToList();
    foreach (var idRecord in idRecords)
        context.Inventory.Remove(idRecord);
    context.SaveChanges();
}
This one retrieves the entities from the context and then removes them. In cases where you want to update related records or otherwise manage relationships, this approach, along with remembering to .Include() those relationships, would be the way to go.
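For instance, a sketch of the thorough variant with a related collection eagerly loaded (SomeRelatedRecords is a hypothetical navigation property):

IEnumerable<int> ids = getIdsToDelete();

using (var context = new MyDbContext())
{
    // Sketch: SomeRelatedRecords is a made-up navigation property; eagerly
    // loading it lets EF manage the relationship when the rows are removed.
    var idRecords = context.Inventory
        .Include(x => x.SomeRelatedRecords)
        .Where(x => ids.Contains(x.InventoryId))
        .ToList();

    foreach (var idRecord in idRecords)
        context.Inventory.Remove(idRecord);

    context.SaveChanges();
}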

How to optimize this query without making excessive calls to db using LINQ to SQL and Entity Framework?

I have this anonymous type that I'm building through repository calls from Entity Framework, but I am getting this error: "The specified LINQ expression contains references to queries that are associated with different contexts." The code below only pulls from one database, so I do not understand why this is being raised.
// listOfReportIDs is a list of ints
var reports = BusinessLogic.Repository.Read<Report>().Where(r => listOfReportIDs.Contains(r.ReportID));
var huForm = BusinessLogic.Repository.Read<HumanCase>().Where(h => listOfReportIDs.Contains(h.ReportID));
var anForm = BusinessLogic.Repository.Read<AnimalCase>().Where(a => listOfReportIDs.Contains(a.ReportID));

var reportSummaryData = from r in reports
                        from h in huForm.Where(h => h.ReportID == r.ReportID)
                        from a in anForm.Where(a => a.ReportID == r.ReportID)
                        select new
                        {
                            CDC_ReportID = r.CDCReportID,
                            StateReportID = r.StateReportID,
                            r.ReportDate,
                            ReportStatus = r.LookupReportStatus.LookupReportStatusName,
                            r.AuthorID,
                            h.HumanComment,
                            a.AnimalComment
                        };

var reportData = reportSummaryData.ToList();
When I call the ToList() method at the end (in order to defer calls to the db until the end), I get the error mentioned above about multiple contexts. They all come from the same single database, just different tables; why is this still being thrown, and how can I fix it so that only one call is made to the db?
EDIT:
Read method:
public IQueryable<T> Read<T>() where T : EntityObject, new()
{
    var objectSet = Context.CreateObjectSet<T>();
    objectSet.MergeOption = MergeOption.PreserveChanges;
    return objectSet;
}
The multiple contexts refers to reports, huForm and anForm. You need to move these to the same context, or use separate queries to get the data from the database and then join the results in memory.
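For example, a sketch of the second option: materialize each query separately, then join the lists with LINQ to Objects, so no cross-context expression is ever composed:

// Each ToList() runs its own query; the join below happens in memory.
var reports = BusinessLogic.Repository.Read<Report>()
    .Where(r => listOfReportIDs.Contains(r.ReportID)).ToList();
var huForm = BusinessLogic.Repository.Read<HumanCase>()
    .Where(h => listOfReportIDs.Contains(h.ReportID)).ToList();
var anForm = BusinessLogic.Repository.Read<AnimalCase>()
    .Where(a => listOfReportIDs.Contains(a.ReportID)).ToList();

var reportSummaryData = (from r in reports
                         join h in huForm on r.ReportID equals h.ReportID
                         join a in anForm on r.ReportID equals a.ReportID
                         select new
                         {
                             CDC_ReportID = r.CDCReportID,
                             h.HumanComment,
                             a.AnimalComment
                         }).ToList();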
Each of those reads is giving you a separate context. You need to abstract your db connection and then pass it down to each of the models.
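For example, a rough sketch of abstracting the connection so that every Read<T>() composes against one shared context (MyObjectContext is an illustrative name for your generated context type):

// Sketch: one ObjectContext instance shared by all repository reads, so
// queries composed from them belong to the same context.
public class SharedContextRepository
{
    private readonly MyObjectContext _context;

    public SharedContextRepository(MyObjectContext context)
    {
        _context = context;
    }

    public IQueryable<T> Read<T>() where T : EntityObject, new()
    {
        var objectSet = _context.CreateObjectSet<T>();
        objectSet.MergeOption = MergeOption.PreserveChanges;
        return objectSet;
    }
}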
