I am taking input from a client to build up an elasticsearch query using NEST. I start out with the basics, like so:
var search = esClient.Search<MyData>(s => s
    .From(pageNum * pageSize)
    .Take(pageSize)
    .QueryRaw(@"{""match_all"": {} }")
);
I then parse the request and see if an optional sorting parameter was passed in. If it was, I create a new SearchDescriptor<MyData>() which performs that requested sort, and I want to add it to my original search criteria. Obviously .Search() actually performs the HTTP call, so this can't work as it's written today, but how can I chain a series of SearchDescriptor calls together and then perform the search at the end?
You can build the SearchDescriptor incrementally as shown below. I've used aggregations instead of facets (which are deprecated now), but I hope you get the idea.
var sd = new SearchDescriptor<MyData>();
sd = sd.QueryRaw(<raw query string>);

if (<should sort>)
{
    string fieldToBeSortedOn; // input from user
    bool sortInAscendingOrder; // input from user

    if (sortInAscendingOrder)
    {
        sd = sd.Sort(f => f
            .Ascending()
            .OnField(fieldToBeSortedOn));
    }
    else
    {
        sd = sd.Sort(f => f
            .Descending()
            .OnField(fieldToBeSortedOn));
    }
}

if (<should compute aggregations>)
{
    sd = sd.Aggregations(a => a
        .Terms(
            "term_aggs",
            t => t
                .Field(<name of field to compute terms aggregation on>)));
}
var search = esClient.Search<MyData>(s => sd);
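To tie this back to the paging in your question, those calls can be chained onto the same descriptor before the single Search call at the end. A rough sketch (assuming the same NEST version as the question, where From/Take/QueryRaw are available on the descriptor):
// Sketch: fold the paging from the question into the incrementally built descriptor.
// pageNum and pageSize are the variables from the original snippet.
var sd = new SearchDescriptor<MyData>()
    .From(pageNum * pageSize)
    .Take(pageSize)
    .QueryRaw(@"{""match_all"": {} }");

// ... conditionally apply .Sort() / .Aggregations() as shown above ...

// The HTTP call only happens here, once the descriptor is fully built.
var search = esClient.Search<MyData>(s => sd);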
Background
So, I am using a React frontend and a .NET Core 3.1 backend for a web app where I display a view with a list of data. The list is often several thousand rows long; in this case it's around 7500. We virtualize it to prevent sluggishness. Along with the data, every row has a column showing the latest log change someone made on that data row. The logs and the rest of the data for every row come from two different applications with their own databases. The log data, which consists of the name and the date the log entry was made, is also supposed to be rendered for every row.
The problem
When you route to the page, a useEffect fires that fetches the rows from one of the databases. When I get the response, I filter out all of the ids from the data and then I post that list to the other endpoint to request the latest log for every id. This endpoint queries the logging database. The number of ids I am passing to the endpoint is about 7200+. It won't always be this many, but sometimes it is.
Troubleshooting
This is the query that is giving me trouble in the log endpoint
public async Task<IActionResult> GetLatestLog(ODataActionParameters parameters)
{
    var LogIds = (LogIds)parameters["LogIds"];

    var results = await context.Set<LogEvent>()
        .Where(x => LogIds.Ids.Contains(x.Id)).ToListAsync(); // 55 600 entities

    results = results
        .GroupBy(x => x.ContextId)
        .Select(x => x.OrderByDescending(p => p.CreationDate).First()).ToList(); // 7 500 entities

    var transformed = results.Select(MapEntityToLogEvent).ToList();

    return Ok(transformed);
}
The first db query takes around 25 seconds (!) and returns around 56000 entities.
The second LINQ statement takes about 2 seconds and returns around 7500 entities, and the mapping takes around 1 second.
The database is SQL Server, and there are three indexes; one of them is on Id, and the other two are irrelevant for this assignment.
I have tried different queries, AsNoTracking, but to no avail.
Obviously this is horrible. Do you know of a way to optimize this query?
There are two ways to improve your query:
Pure EF Core
We can rewrite the LINQ query so that it is translatable and avoids bringing unnecessary records to the client side. Note that your GroupBy will work with EF Core 6:
public async Task<IActionResult> GetLatestLog(ODataActionParameters parameters)
{
    var LogIds = (LogIds)parameters["LogIds"];

    var results = context.Set<LogEvent>()
        .Where(x => LogIds.Ids.Contains(x.Id));

    results =
        from d in results.Select(d => new { d.ContextId }).Distinct()
        from r in results
            .Where(r => r.ContextId == d.ContextId)
            .OrderByDescending(r => r.CreationDate)
            .Take(1)
        select r;

    // materialize first, then run the in-memory mapping method
    var entities = await results.ToListAsync();
    var transformed = entities.Select(MapEntityToLogEvent).ToList();

    return Ok(transformed);
}
Using a third-party extension
With linq2db.EntityFrameworkCore we can use the full power of SQL and build the most efficient query in this case.
A big list of ids can quickly be copied to a temporary table and used in the resulting query.
Retrieving only the latest records per ContextId can be done efficiently with the window function ROW_NUMBER.
Disclaimer: I'm the maintainer of this library.
// helper class for creating temporary table
class IdsTable
{
    public int Id { get; set; }
}

public async Task<IActionResult> GetLatestLog(ODataActionParameters parameters)
{
    var LogIds = (LogIds)parameters["LogIds"];

    using var db = context.CreateLinqToDBConnection();

    TempTable<IdsTable>? idsTable = null;

    var results = context.Set<LogEvent>().AsQueryable();

    try
    {
        // avoid using temporary table for small amount of Ids
        if (LogIds.Ids.Count() < 20)
        {
            results = results.Where(x => LogIds.Ids.Contains(x.Id));
        }
        else
        {
            // initializing temporary table
            idsTable = await db.CreateTempTableAsync(
                LogIds.Ids.Select(id => new IdsTable { Id = id }),
                tableName: "temporaryIds");

            // filter via join
            results =
                from t in idsTable
                join r in results on t.Id equals r.Id
                select r;
        }

        // selecting last log
        results =
            from r in results
            select new
            {
                r,
                rn = Sql.Ext.RowNumber().Over()
                    .PartitionBy(r.ContextId)
                    .OrderByDesc(r.CreationDate)
                    .ToValue()
            } into s
            where s.rn == 1
            select s.r;

        // materialize first (we have to use our extension because of the name
        // collision with EF Core extensions), then run the in-memory mapping
        var entities = await results.ToListAsyncLinqToDB();
        var transformed = entities.Select(MapEntityToLogEvent).ToList();

        return Ok(transformed);
    }
    finally
    {
        // dropping temporary table if it was used
        idsTable?.Dispose();
    }
}
Warning
Also note that the log count will grow over time, so you should limit the result set by date and probably also by the number of retrieved records.
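For example, a cutoff date could be applied before the grouping/window step. A minimal sketch (the 30-day window and the cutoff variable are placeholder assumptions, not something from the question):
// Hypothetical cutoff to keep the result set bounded as the log table grows.
var cutoff = DateTime.UtcNow.AddDays(-30);

var results = context.Set<LogEvent>()
    .Where(x => LogIds.Ids.Contains(x.Id) && x.CreationDate >= cutoff);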
I am trying to search for record(s) from a table by applying multiple search parameters,
as per the screenshot below.
Using the various parameters from the screenshot above, I want to filter the records.
The user could enter any combination of parameter(s) to search for records.
I tried something like the code below, which works for a single condition but fails for any combination of search parameters.
public List<students> SearchStudents(students search)
{
    var result = new List<students>();

    var records = from stud in db.students
                  where stud.enrollmentNumber == search.enrollmentNumber
                     || stud.enrollmentDate == search.enrollmentDate
                     || stud.enrollmentType == search.enrollmentType
                     || stud.className == search.className
                  select new Search()
                  {
                      enrollmentNumber = stud.enrollmentNumber,
                      enrollmentDate = stud.enrollmentDate,
                      enrollmentType = stud.enrollmentType,
                      Name = stud.Name,
                      className = stud.className,
                      Description = stud.Description
                  };

    result = records.ToList();
    return result;
}
But this is not working properly; it returns the same result whatever parameters I pass.
For example, the table has 20 records and the enrollment number is a unique field in the DB, so when I pass an enrollment number like "2018-0001" it returns all records when it should return only a single record.
Can someone guide me with this?
Without further explanation in your question about how this isn't working, the best we can do is guess. However, one very plausible reason is that you're including parameters you don't want to be filtering on.
Because you're using ORs in your statement, if any of those other properties are defaulted in the database, you're going to be returning those records. What you need to do is conditionally include the pieces of your WHERE clause for only the properties that you want to search on. Unfortunately, that is not possible with the query ("SQL-like") syntax of LINQ, so you will need to convert your query to method syntax. (Good news: it's arguably slightly more performant as well, since the query syntax usually has to be converted to method syntax anyway.)
Because of deferred execution, your query will not be sent to the database until you call a .ToList() or something to actually start processing the results. This allows you to chain method calls together, even if they are completely different C# statements. This is what you'll want to do:
public List<students> SearchStudents(students search)
{
    // AsQueryable() so the composed query can be reassigned to the same variable
    var query = db.students.AsQueryable();

    if (!string.IsNullOrWhiteSpace(search.enrollmentNumber))
    {
        query = query.Where(s => s.enrollmentNumber == search.enrollmentNumber);
    }

    if (search.enrollmentDate != DateTime.MinValue)
    {
        query = query.Where(s => s.enrollmentDate == search.enrollmentDate);
    }

    if (!string.IsNullOrWhiteSpace(search.enrollmentType))
    {
        query = query.Where(s => s.enrollmentType == search.enrollmentType);
    }

    if (!string.IsNullOrWhiteSpace(search.className))
    {
        query = query.Where(s => s.className == search.className);
    }

    return query.Select(stud => new Search
        {
            enrollmentNumber = stud.enrollmentNumber,
            enrollmentDate = stud.enrollmentDate,
            enrollmentType = stud.enrollmentType,
            Name = stud.Name,
            className = stud.className,
            Description = stud.Description
        })
        .ToList();
}
You may need to adjust the if statements in there to accommodate different data types than what is intuitive from the names, but this will only add the filter if a value has been provided.
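For instance, if the date on the search model is a nullable DateTime? rather than DateTime (an assumption, since the model isn't shown), the corresponding check would become:
// Assumes search.enrollmentDate is declared as DateTime? on the search model.
if (search.enrollmentDate.HasValue)
{
    query = query.Where(s => s.enrollmentDate == search.enrollmentDate.Value);
}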
I have a LINQ query like the following:
HashSet<Guid> temp1; // Getting this value through another method

var Ids = temp1.Except((from temp2 in prodContext.Documents
                        where temp1.Contains(temp2.id)
                        select temp2.id)).ToList();
Here, temp1 has around 40k values. I'm sometimes getting a timeout error. How can I divide this query using a while loop (or any other loop) so that it won't give me a timeout error?
I tried to set Connect Timeout in the connection string and on the database context, but nothing works.
Any suggestions, please?
This is one of those unusual operations that is likely to be performed more effectively in memory by the application than by the database. Instead of sending all of the id values in your set to the DB, having it find all of the items with those IDs, and then sending them all back to you, it's very plausibly better to just get all of the document ids and filter them on the application side of things.
var documentIds = prodContext.Documents.Select(doc => doc.id);
var Ids = temp1.Except(documentIds).ToList();
Now, depending on how many documents you have, even that could theoretically time out. If it would, then you'll need to paginate the fetching of all of the document IDs. You can use the following method to paginate any query to avoid fetching the entire result set all at once:
public static IEnumerable<IEnumerable<T>> Paginate<T>(
    this IQueryable<T> query,
    int pageSize)
{
    int page = 0;
    while (true)
    {
        var nextPage = query.Skip(page * pageSize)
            .Take(pageSize)
            .ToList();
        if (nextPage.Any())
            yield return nextPage;
        else
            yield break;
        page++;
    }
}
This allows you to write:
var documentIds = prodContext.Documents.Select(doc => doc.id)
    // play around with different batch sizes to see what works best
    .Paginate(someBatchSize)
    .SelectMany(x => x);

temp1.ExceptWith(documentIds);
Here's a way to do it that combines both pagination and caching.
This way it only caches one page at a time, to prevent memory overload and avoid the timeout. I hope this works.
int pageSize = 1000;
HashSet<Guid> temp1;
List<Guid> idsFromTable = new List<Guid>();
var Ids = temp1.ToList();

for (int i = 0; true; i++)
{
    // Cache the page locally to prevent logic running while selecting on page size.
    // Note: Skip/Take needs a stable ordering to page reliably, hence the OrderBy.
    idsFromTable.AddRange(prodContext.Documents
        .OrderBy(x => x.id)
        .Skip(pageSize * i)
        .Take(pageSize)
        .Select(x => x.id));

    if (idsFromTable.Any())
    {
        // Then use the cached list instead of the datacontext
        Ids = Ids.Except(idsFromTable).ToList();
        idsFromTable.Clear();
    }
    else
    {
        break;
    }
}
Similarly to Servy's answer, you might want to try batching the queries as opposed to just pulling everything in. The efficiency of this depends on the DB you are using; I certainly got some mileage out of it on Informix. In this case the logic would look like this:
HashSet<Guid> ids... // Got from another method
List<Guid> validIds = new List<Guid>();

const Int32 BUFFERSIZE = 1000;
var validationBuffer = new List<Guid>(BUFFERSIZE);

foreach (var g in ids)
{
    validationBuffer.Add(g);
    if (validationBuffer.Count == BUFFERSIZE)
    {
        validIds.AddRange(
            prodContext.Documents
                .Select(t => t.id)
                .Where(id => validationBuffer.Contains(id)));
        validationBuffer.Clear();
    }
}

// Do last query
validIds.AddRange(
    prodContext.Documents
        .Select(t => t.id)
        .Where(id => validationBuffer.Contains(id)));

var missingIds = ids.Except(validIds);
I have a database where I'm wanting to return a list of Clients.
These clients have a list of FamilyNames.
I started with this
var query = DbContext.Clients.Include(c => c.FamilyNames).ToList(); // returns all clients, including their FamilyNames... Great.
But I want somebody to be able to search for a FamilyName; if any results are returned, then show the clients to the user.
so I did this...
var query = DbContext.Clients.Include(c => c.FamilyNames.Where(fn => fn.familyName == textEnteredByUser)).ToList();
I tried...
var query = DbContext.Clients.Include(c => c.FamilyNames.Any(fn => fn.familyName == textEnteredByUser)).ToList();
and...
var query = DbContext.FamilyNames.Include(c => c.Clients).where(fn => fn.familyname == textEnteredByUser.Select(c => c.Clients)).ToList();
What I would like to know (obviously!) is how I could get this to work, but I would like it if at all possible to be done in one query to the database. Even if somebody can point me in the correct direction.
Kind regards
In Linq to Entities you can navigate on properties and they will be transformed to join statements.
This will return a list of clients.
var query = DbContext.Clients.Where(c => c.FamilyNames.Any(fn => fn.familyName == textEnteredByUser)).ToList();
If you want to include all their family names with eager loading, this should work:
var query = DbContext.Clients.Where(c => c.FamilyNames.Any(fn => fn.familyName == textEnteredByUser)).Include(c => c.FamilyNames).ToList();
Here is some reference about loading related entities if something doesn't work as expected.
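If eager loading ever gives you trouble here, explicit loading from that reference is another option. A rough sketch (assuming DbContext is a regular EF context instance, as in the snippets above):
// Explicitly load the FamilyNames collection for an already-fetched client.
var client = DbContext.Clients
    .First(c => c.FamilyNames.Any(fn => fn.familyName == textEnteredByUser));
DbContext.Entry(client).Collection(c => c.FamilyNames).Load();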
You can use 'Projection': basically, you select just the fields you want from any level into a new object, possibly anonymous.
var query = DbContext.Clients
    .Where(c => c.FamilyNames.Any(fn => fn.familyName == textEnteredByUser))
    // only calls that can be converted to SQL safely here
    .Select(c => new {
        ClientName = c.Name,
        FamilyNames = c.FamilyNames.Select(fn => fn.familyName)
    })
    // force the query to be materialized so we can safely do other transforms
    .ToList()
    // convert the anon class to what we need
    .Select(anon => new ClientViewModel() {
        ClientName = anon.ClientName,
        // convert IEnumerable<string> to List<string>
        FamilyNames = anon.FamilyNames.ToList()
    });
That creates an anonymous class with just those two properties, then forces the query to run, then performs a 2nd projection into a ViewModel class.
Usually I would be selecting into a ViewModel for passing to the UI, limiting it to just the bare minimum number of fields that the UI needs. Your needs may vary.
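For reference, the ClientViewModel used above isn't shown in the question, so something like this minimal shape is assumed:
// Hypothetical view model matching the projection above.
public class ClientViewModel
{
    public string ClientName { get; set; }
    public List<string> FamilyNames { get; set; }
}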
I know there are probably 100 much easier ways to do this, but until I see it I can't comprehend how to go about it. I'm using LINQPad for this. Before I go to phase 2 I need to get this part to work!
I've connected to an SQL database.
I'm running a query to retrieve some desired records.
var DesiredSym = (from r in Symptoms
                  where r.Status.Equals(1) && r.Create_Date < TimespanSecs
                  select r).Take(5);
So, in this example, I retrieve 5 'records' essentially in my DesiredSym variable as IQueryable (LINQPad tells me this).
DesiredSym contains a large number of fields, including a number of fields that hold an int: Month1_Used, Month2_Used, Month3_Used ... Month12_Used.
So I want to loop through DesiredSym and basically get the sum of all the MonthX_Used fields.
foreach (var MonthUse in DesiredSym)
{
// get sum of all fields where they start with MonthX_Used;
}
This is where I'm not clear on how to proceed or even if I'm on the right track. Thanks for getting me on the right track.
Since you've got a static number of fields, I'd recommend this:
var DesiredSym =
    (from r in Symptoms
     where r.Status.Equals(1) && r.Create_Date < TimespanSecs
     select r)
    .Take(5);

var sum = DesiredSym.Sum(s => s.Month1_Use + s.Month2_Use + ... + s.Month12_Use);
You could use reflection, but that would be significantly slower and require more resources, since you'd need to pull the whole result set into memory first. But just for the sake of argument, it would look something like this:
var t = DesiredSym.GetType().GenericTypeArguments[0];
var props = t.GetProperties().Where(p => p.Name.StartsWith("Month"));

var sum = DesiredSym.AsEnumerable()
    .Sum(s => props.Sum(p => (int)p.GetValue(s, null)));
Or this, which is a more complicated use of reflection, but it has the benefit of still being executed on the database:
var t = DesiredSym.GetType().GenericTypeArguments[0];
var param = Expression.Parameter(t);
var exp = t.GetProperties()
    .Where(p => p.Name.StartsWith("Month"))
    .Select(p => (Expression)Expression.Property(param, p))
    .Aggregate((x, y) => Expression.Add(x, y));
var lambda = Expression.Lambda(exp, param);

var sum = DesiredSym.Sum(lambda);
Now, to adapt these methods (except the third) to calculate the sum in batches of 5, you can use MoreLINQ's Batch method (also available on NuGet):
var DesiredSym =
    from r in Symptoms
    where r.Status.Equals(1) && r.Create_Date < TimespanSecs
    select r;

// first method
var batchSums = DesiredSym.Batch(5, b => b.Sum(s => s.Month1_Use ...));

// second method
var t = DesiredSym.GetType().GenericTypeArguments[0];
var props = t.GetProperties().Where(p => p.Name.StartsWith("Month"));
var batchSums = DesiredSym.Batch(5, b => b.Sum(s => props.Sum(p => (int)p.GetValue(s, null))));
Both of these methods will be a bit slower and use more resources, since all the processing has to be done in memory. For the same reason the third method will not work, since MoreLINQ does not support the IQueryable interface.