Querying external data source with LINQ - c#

I'm storing what basically amounts to log data in CSV files. Each line has the format <datetime>,<val1>,<val2>, etc. However, the log files are stored by account ID and month, so a query that spans months or account IDs has to read multiple files.
I'd like to be able to query it with LINQ, so that I could call logFiles.Where(o => o.Date > 1-1-17 && o.Date < 4-1-17). I suppose I'll need something that examines the date range in that query, notices that it spans 4 months, and then only reads the files in that date range.
Is there any way to do this that does not involve getting my hands very dirty with a custom IQueryable LINQ provider? I can go down that rabbit hole if necessary, but I want to make sure it's the right rabbit hole first.

If you want to filter both on the log file name and on the log file contents in the same Where expression, I don't see a solution without a custom IQueryable LINQ provider, because that's exactly their use case: to access data in a smart way based on the expressions used in the LINQ query.
That said, it might be worth using a multi-step approach as a compromise:
Use LINQ to restrict the log files to be searched,
read the files and
use LINQ for further searching.
Example:
IEnumerable<LogFile> files = LogFiles.Where(f => f.Date > new DateTime(2017, 1, 1) && f.AccountID == 4711);
IEnumerable<LogData> data = ParseLogFiles(files);
IEnumerable<LogData> filteredData = data.Where(d => d.val1 == 42 && d.val2 > 17);
LogData firstMatch = filteredData.FirstOrDefault();
If you implement ParseLogFiles (a) with deferred execution and (b) as an extension method on IEnumerable<LogFile>, the resulting code will look-and-feel very similar to pure LINQ:
var filteredData = LogFiles
    .Where(f => f.Date > new DateTime(2017, 1, 1) && f.AccountID == 4711)
    .ParseLogFiles()
    .Where(d => d.val1 == 42 && d.val2 > 17);
// If ParseLogFiles uses deferred execution, the following line won't read
// more log files than required to get the first matching row:
var firstMatch = filteredData.First();
It's a bit more work than having it all in one single LINQ query, but it saves you from having to implement your own LINQ provider.
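A minimal sketch of such a deferred ParseLogFiles extension method follows. It assumes hypothetical LogFile and LogData records (the question doesn't show their shape) and the <datetime>,<val1>,<val2> layout with integer values. File.ReadLines streams lines lazily, and the yield return iterator only opens the next file when the caller asks for more rows.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical shapes: one file per account and month, one row per CSV line.
public record LogFile(string FilePath, int AccountId, DateTime Month);
public record LogData(DateTime Date, int Val1, int Val2);

public static class LogFileExtensions
{
    // Deferred execution: nothing is read until the result is enumerated,
    // and each file is opened only after the previous one is exhausted.
    public static IEnumerable<LogData> ParseLogFiles(this IEnumerable<LogFile> files)
    {
        foreach (var file in files)
        {
            // File.ReadLines streams the file line by line instead of loading it whole.
            foreach (var line in File.ReadLines(file.FilePath))
            {
                // Assumed layout: <datetime>,<val1>,<val2>
                var parts = line.Split(',');
                yield return new LogData(
                    DateTime.Parse(parts[0], CultureInfo.InvariantCulture),
                    int.Parse(parts[1], CultureInfo.InvariantCulture),
                    int.Parse(parts[2], CultureInfo.InvariantCulture));
            }
        }
    }
}
```

Because the iterator is lazy, something like logFiles.Where(f => f.AccountId == 4711).ParseLogFiles().First(d => d.Val1 == 42) stops reading as soon as the first match is found; files after it are never opened.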

Related

My ForEach loop with a LINQ expression cannot be translated?

I need to go through all my chemicals and send out a warning to users if the expiration date is either 2 days away or 15 days away.
I have it set up like that so the warning is sent out twice: once two weeks before it expires, and once two days before it expires.
So I have this loop:
foreach (var chemical in _context.Chemical.Where(c => (c.DateOfExpiration - DateTime.Now).TotalDays == 15 || (c.DateOfExpiration - DateTime.Now).TotalDays == 2))
{
// send out notices
var baseItem = _context.Base.Find(chemical.BaseId); // "base" is a C# keyword and can't be a variable name
var catalyst = _context.Catalyst.Find(chemical.CatalystId);
_warningService.SendSafetyWarning(chemical, baseItem, catalyst.Name);
}
But whenever I run it, I get this error:
System.InvalidOperationException: 'The LINQ expression 'DbSet<Chemical>
.Where(p => (p.DateOfExpiration - DateTime.Now).TotalDays == 15 || (p.DateOfExpiration - DateTime.Now).TotalDays == 2)'
could not be translated.
I'm not sure why it's giving me this error when I run it.
Is there a better way of doing what I'm trying to do?
Thanks!
TimeSpan.TotalDays (the result of subtracting two DateTime values) returns a double.
You might have more luck changing your code to:
foreach (var chemical in _context.Chemical.Where(c => ((int)(c.DateOfExpiration - DateTime.Now).TotalDays == 15) || ((int)(c.DateOfExpiration - DateTime.Now).TotalDays == 2)))
{
// send out notices
var baseItem = _context.Base.Find(chemical.BaseId);
var catalyst = _context.Catalyst.Find(chemical.CatalystId);
_warningService.SendSafetyWarning(chemical, baseItem, catalyst.Name);
}
There are two solutions to this problem.
I'll first explain why it is throwing an error. EF converts your code to SQL. When working with expression trees, you cannot pass in arbitrary C# code and expect it to work; EF needs to be able to translate it to SQL. That is why you are getting the exception saying that it can't translate your code.
When you call Where directly on the DbSet, you are actually calling the Where method for IQueryable<T>, which takes a parameter of type Expression<Func<T, bool>>, where T is the entity type of your DbSet<T>.
You either need to simplify your query so it can be translated to SQL, or convert your IQueryable to an IEnumerable.
When you want to run the query on the SQL server, you need to use the first approach; when you can run it on the client, you can use the second.
The first approach, as mentioned by vivek nuna, is to use EntityFunctions. It might not be enough for your use case, because its functionality is limited.
The claim that DateTime.Now is different on each iteration is not true. The query is translated once; it isn't re-run on every iteration. EF simply doesn't know how to translate your query to SQL.
https://learn.microsoft.com/en-us/dotnet/api/system.data.objects.entityfunctions?view=netframework-4.8
The second approach means adding AsEnumerable after your DbSet. This will query all your data from your SQL server, and evaluate in C#. This is usually not recommended if you have a large data set. Example:
var chemicals = _context.Chemical.AsEnumerable(); // this will fetch the complete table
var matches = chemicals.Where(i => i.Value == true); // evaluated in C#, so any C# code works now
As for a "better" approach, this makes the code a bit cleaner:
If you have navigation properties for Base and Catalyst, you can also include them in your query.
Note that you can still filter server side before pulling everything to the client by adding a Where on _context.Chemical before the Include and AsEnumerable calls.
var chemicals = _context.Chemical.Include(i => i.Base).Include(i => i.Catalyst).AsEnumerable();
foreach (var chemical in chemicals.Where(i => true)) // set correct where clause
{
_warningService.SendSafetyWarning(chemical, chemical.Base, chemical.Catalyst.Name);
}
You can compute the dates beforehand and then pass the variables into your expression, e.g. DateOfExpiration == nextFifteenDays …
var nextFifteenDays = DateTime.Now.AddDays(15);
var nextTwoDays = DateTime.Now.AddDays(2);
Or you can use EntityFunctions.DiffDays.
The reason for the error is well explained by @neil: DateTime.Now keeps changing, so it can't be used.
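A minimal sketch of the "compute the dates first" idea, assuming a hypothetical Chemical class with a DateTime DateOfExpiration and assuming "2 days away" means "expires on the calendar day two days from today". All date arithmetic happens in C# before the expression is built, so the predicate only compares the column with captured values, which is the kind of expression EF can translate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity; only DateOfExpiration matters for the filter.
public class Chemical
{
    public int Id { get; set; }
    public DateTime DateOfExpiration { get; set; }
}

public static class ExpiryFilters
{
    // All DateTime arithmetic runs here in C#, before the expression is built,
    // so the predicate only compares the column with captured constant values.
    public static Expression<Func<Chemical, bool>> ExpiresInTwoOrFifteenDays(DateTime now)
    {
        var twoDaysStart = now.Date.AddDays(2);
        var twoDaysEnd = twoDaysStart.AddDays(1);          // exclusive upper bound
        var fifteenDaysStart = now.Date.AddDays(15);
        var fifteenDaysEnd = fifteenDaysStart.AddDays(1);  // exclusive upper bound

        return c => (c.DateOfExpiration >= twoDaysStart && c.DateOfExpiration < twoDaysEnd)
                 || (c.DateOfExpiration >= fifteenDaysStart && c.DateOfExpiration < fifteenDaysEnd);
    }
}
```

It would be used as foreach (var chemical in _context.Chemical.Where(ExpiryFilters.ExpiresInTwoOrFifteenDays(DateTime.Now))) { ... }. Using half-open day ranges instead of exact equality also avoids the problem that TotalDays is almost never exactly 2.0 or 15.0.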

How do I control the priority of nested queries in Sitecore ContentSearch with the Solr Provider?

Version Details: I am working with Sitecore 7.5 build 141003, using Solr v4.7 as the search engine/indexing server. I am also using the standard Sitecore Solr provider with no custom indexers.
Target Goal:
I am using Sitecore ContentSearch LINQ with PredicateBuilder to compile some flexible and nested queries. Currently, I need to search within a specific "Root item", while excluding templates with "folder" in their name and also excluding items with "/testing" in their path. At some point the "Root item" could be more than one item, and so could the path exclusions (currently just "/testing"). In those cases, the idea is to use PredicateBuilder to build an outer "AND" predicate with inner "OR"s for the multiple "Root item"s and path exclusions.
Problem:
At the moment, I am dealing with an issue regarding the order of nesting and priorities for these predicates/conditions. I have been testing several approaches and combinations, but the issue I keep running into is the !TemplateName.Contains and Item["_fullpath"].Contains being prioritized over the Paths.Contains, which ends up resulting in 0 results each time.
I am using the Search.log to check the query output, and I have been manually testing against the Solr admin, running queries against it to compare results. Below, you will find examples of the combinations I have tried using Sitecore Linq, and the queries they produce for Solr.
Original Code Sample:
Original test with List for root items
// sometimes will be 1, sometimes will be multiple
var rootItems = new List<ID> { pathID }; // simplified to 1 item for now
var query = context.GetQueryable<SearchResultItem>();
var folderFilter = PredicateBuilder.True<SearchResultItem>().And(i => !i.TemplateName.Contains("folder") && !i["_fullpath"].Contains("/testing"));
var pathFilter = PredicateBuilder.False<SearchResultItem>();
pathFilter = rootItems.Aggregate(pathFilter, (current, id) => current.Or(i => i.Paths.Contains(id)));
folderFilter = folderFilter.And(pathFilter);
query.Filter(folderFilter).GetResults();
Query output: (-_templatename:(*folder*) AND -_fullpath:(*/testing*)) AND _path:(730c169987a44ca7a9ce294ad7151f13)
As you can see in the above output, there is an inner set of parentheses around the two "not contains" filters, which takes precedence over the Path one. When I run this exact query in the Solr admin, it returns 0 results. However, if I remove the inner parentheses so it's all a single "AND" set, it returns the expected results.
I tested this further with different combinations and approaches to the PredicateBuilder, and each combination results in the same query. I even tried adding two individual filters ("query.Filter(pred1).Filter(pred2)") to my main query object, and it results in the same output.
Additional Code Samples:
Alt. 1 - Adding "Paths.Contains" to folder filter directly
var query = context.GetQueryable<SearchResultItem>();
var folderFilter = PredicateBuilder.True<SearchResultItem>().And(i => !i.TemplateName.Contains("folder") && !i["_fullpath"].Contains("/testing"));
folderFilter = folderFilter.And(i => i.Paths.Contains(pathID));
query.Filter(folderFilter).GetResults();
Query output: (-_templatename:(*folder*) AND -_fullpath:(*/testing*)) AND _path:(730c169987a44ca7a9ce294ad7151f13)
Alt 2 - Two predicates joined to first
var query = context.GetQueryable<SearchResultItem>();
var folderFilter = PredicateBuilder.True<SearchResultItem>().And(i => !i.TemplateName.Contains("folder") && !i["_fullpath"].Contains("/testing"));
var pathFilter = PredicateBuilder.False<SearchResultItem>().Or(i => i.Paths.Contains(pathID));
folderFilter = folderFilter.And(pathFilter);
query.Filter(folderFilter).GetResults();
Query output: (-_templatename:(*folder*) AND -_fullpath:(*/testing*)) AND _path:(730c169987a44ca7a9ce294ad7151f13)
Alt 3 - Two "inner" predicates, one for "Not"s and one for "Paths" joined to an outer predicate
var query = context.GetQueryable<SearchResultItem>();
var folderFilter = PredicateBuilder.True<SearchResultItem>().And(i => !i.TemplateName.Contains("folder") && !i["_fullpath"].Contains("/testing"));
var pathFilter = PredicateBuilder.False<SearchResultItem>().Or(i => i.Paths.Contains(pathID));
var finalPredicate = PredicateBuilder.True<SearchResultItem>().And(folderFilter).And(pathFilter);
query.Filter(finalPredicate).GetResults();
Query output: (-_templatename:(*folder*) AND -_fullpath:(*/testing*)) AND _path:(730c169987a44ca7a9ce294ad7151f13)
Conclusion:
Ultimately, what I am looking for is a way to control the prioritization of these nested queries/conditions, or how I can build them to put the paths first, and the "Not" filters after. As mentioned, there are conditions where we will have multiple "Root items" and multiple path exclusions where I need to query something more like:
(-_templatename:(*folder*) AND -_fullpath:(*/testing*) AND
(_path:(730c169987a44ca7a9ce294ad7151f13) OR
_path:(12c1aa7f60fa4e8d9f0a983bbbb40d8b)))
OR
(-_templatename:(*folder*) AND -_fullpath:(*/testing*) AND
(_path:(730c169987a44ca7a9ce294ad7151f13)))
Both of these queries return the results I expect/need when I run them directly in the Solr admin. However, I cannot seem to come up with an approach or order of operations using Sitecore ContentSearch Linq to output a query this way.
Does anyone else have experience with how I can accomplish this? Depending on the suggestion, I am also willing to assemble this piece of the query without Sitecore Linq, if I can marry it back to the IQueryable for calling "GetFacets" and "GetResults".
Update:
I didn't include all the revisions I have done because SO would probably kill me for how long this would get. That said, I did try one other slight variation on my original example (top) with a similar result as the others:
var folderFilter = PredicateBuilder.True<SearchResultItem>().And(i => !i.TemplateName.Contains("folder")).And(i => !i["_fullpath"].Contains("/testing"));
var rootItems = new List<ID> { pathID, path2 };
// or paths separately
var pathFilter = PredicateBuilder.False<SearchResultItem>();
pathFilter = rootItems.Aggregate(pathFilter, (current, id) => current.Or(i => i.Paths.Contains(id)));
var finalPredicate = folderFilter.And(pathFilter);
var query = context.GetQueryable<SearchResultItem>();
query.Filter(finalPredicate).GetResults();
Query Output: ((-_templatename:(*folder*) AND -_fullpath:(*/testing*)) AND (_path:(730c169987a44ca7a9ce294ad7151f13) OR _path:(12c1aa7f60fa4e8d9f0a983bbbb40d8b)))
And it's still those inner parentheses around the "_templatename" and "_fullpath" conditions that cause problems.
Thanks.
Alright, I raised this question here and posted the situation to Sitecore support as well, and I just received a response and some additional information.
According to the Solr wiki (http://wiki.apache.org/solr/FAQ), in the "Searching" section, the question Why does 'foo AND -baz' match docs, but 'foo AND (-bar)' doesn't ? answers why the results are coming back 0.
Boolean queries must have at least one "positive" expression (ie; MUST or SHOULD) in order to match. Solr tries to help with this, and if asked to execute a BooleanQuery that contains only negated clauses at the topmost level, it adds a match all docs query (ie: *:*)
If the top level BooleanQuery contains somewhere inside of it a nested BooleanQuery which contains only negated clauses, that nested query will not be modified, and it (by definition) can't match any documents -- if it is required, that means the outer query will not match.
I am not sure what exactly the Sitecore Solr provider does to construct the query, or why it groups the negatives together in a nested query, but the nested query containing only negatives returns 0 results, as expected according to the Solr docs. The trick, then, is to add a "match all" query (*:*) to the sub-query.
Instead of having to do this manually for any query that I think might encounter this situation, the support rep provided a patch DLL to replace the provider, that will automatically modify the nested query to remedy this.
They also logged this as a bug and provided reference number 398622 for the issue.
Now, the resulting query looks like this:
((-_templatename:(*folder*) AND -_fullpath:(*/testing*) AND *:*) AND _path:(730c169987a44ca7a9ce294ad7151f13))
or, for multiple queries:
((-_templatename:(*folder*) AND -_fullpath:(*/testing*) AND *:*) AND (_path:(730c169987a44ca7a9ce294ad7151f13) OR _path:(12c1aa7f60fa4e8d9f0a983bbbb40d8b)))
And the results return as expected. If anyone else comes across this, I would use the reference number with Sitecore support and see if they can provide the patch. You will also have to update the provider used in your Solr.Index and Solr.Indexes.Analytics config files.
If the 2 working samples at the end are correct, then you need to AND together the parts of your query separately, instead of including 2 conditions in a single call, which is what causes the nesting of the initial part of your statement:
// the path part of the query. OR together all the locations
var pathFilter = PredicateBuilder.False<SearchResultItem>();
pathFilter = pathFilter.Or(i => i.Paths.Contains(pathID));
pathFilter = pathFilter.Or(i => i.Paths.Contains(pathID2));
...
// the exclusions, build them up separately
var query = PredicateBuilder.True<SearchResultItem>();
query = query.And(i => !i.TemplateName.Contains("folder"));
query = query.And(i => !i["_fullpath"].Contains("/testing"));
// join both parts together
query = query.And(pathFilter);
This should give you (pseudo):
!templateName.Contains("folder")
AND !_fullpath.Contains("/testing")
AND (path.Contains(pathID) || path.Contains(pathID2))
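To see how that composition behaves logically, here is a minimal in-memory sketch. MiniPredicateBuilder is a hypothetical stand-in modelled on the classic Expression.Invoke-based PredicateBuilder pattern, not Sitecore's implementation, and Item is a made-up, simplified search item. It only shows the logical result (!folder AND !testing AND (root1 OR root2)), not the Solr query string the Sitecore provider emits.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for a PredicateBuilder (classic Expression.Invoke pattern).
// This is NOT Sitecore's implementation; it only shows how the predicates compose.
public static class MiniPredicateBuilder
{
    public static Expression<Func<T, bool>> True<T>() => x => true;
    public static Expression<Func<T, bool>> False<T>() => x => false;

    // left AND right, with right invoked on left's parameter.
    public static Expression<Func<T, bool>> And<T>(this Expression<Func<T, bool>> left,
                                                   Expression<Func<T, bool>> right) =>
        Expression.Lambda<Func<T, bool>>(
            Expression.AndAlso(left.Body, Expression.Invoke(right, left.Parameters)),
            left.Parameters);

    // left OR right, with right invoked on left's parameter.
    public static Expression<Func<T, bool>> Or<T>(this Expression<Func<T, bool>> left,
                                                  Expression<Func<T, bool>> right) =>
        Expression.Lambda<Func<T, bool>>(
            Expression.OrElse(left.Body, Expression.Invoke(right, left.Parameters)),
            left.Parameters);
}

// Hypothetical, simplified search item.
public record Item(string TemplateName, string FullPath, string[] Paths);
```

Seeding the OR chain with False() and the AND chain with True() means the seeds never change the result: false OR a OR b is a OR b, and true AND c AND d is c AND d.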
If you are trying to exclude certain templates, you could exclude them from your index in the first place by updating the ExcludeTemplate settings in Sitecore.ContentSearch.Solr.DefaultIndexConfiguration.config. Then you won't need to exclude them specifically in the query:
<exclude hint="list:ExcludeTemplate">
<MyTemplateId>{11111111-1111-1111-1111-111111111111}</MyTemplateId>
<MyTemplateId>{22222222-2222-2222-2222-222222222222}</MyTemplateId>
</exclude>
I tried the following code and it did produce the query you need. The trick was to use PredicateBuilder.True() when creating the path filter. I'm not sure whether that's normal behavior of the Content Search API or a bug.
var query = context.GetQueryable<Sitecore.ContentSearch.SearchTypes.SearchResultItem>();
var folderFilter = PredicateBuilder.True<SearchResultItem>().And(i => !i.TemplateName.Contains("folder") && !i["_fullpath"].Contains("/testing"));
var pathFilter = PredicateBuilder.True<SearchResultItem>();
pathFilter = pathFilter.Or(i => i.Paths.Contains(Path1) || i.Paths.Contains(Path2));
folderFilter = folderFilter.And(pathFilter);
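One caveat about seeding an OR chain with True(): under ordinary LINQ semantics, true OR x is true for every item, so True().Or(...) matches everything, while False().Or(...) behaves like the OR of its terms. The sketch below uses a hypothetical OrWith helper (Expression.OrElse plus Expression.Invoke) against in-memory integers to show the difference; whether Sitecore's Solr translator drops the constant before emitting the query is a separate question that this snippet cannot answer.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class SeedDemo
{
    // Hypothetical helper: combines two predicates as (left OR right),
    // invoking right on left's parameter.
    public static Expression<Func<T, bool>> OrWith<T>(Expression<Func<T, bool>> left,
                                                      Expression<Func<T, bool>> right) =>
        Expression.Lambda<Func<T, bool>>(
            Expression.OrElse(left.Body, Expression.Invoke(right, left.Parameters)),
            left.Parameters);
}
```

With an isEven predicate over 1..6, OrWith(n => true, isEven) matches all six numbers, while OrWith(n => false, isEven) matches only the three even ones.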

The linq query taking too much time. Need to reduce the Time

Here I am using the query below, and it takes around 14 to 15 seconds to retrieve a large amount of data.
In the query below, CreatedDate is of type DateTimeOffset.
var naId = UnitOfWork.SalesPhases.FirstOrDefault(p => p.PhaseName =="NA").SalesPhaseId;
var rejectedId = UnitOfWork.SalesPhases.FirstOrDefault(p => p.PhaseName =="Rejected").SalesPhaseId;
var data = UnitOfWork.Leads.Query().AsEnumerable()
.Where(p =>(p.SalesPhaseId == naId || p.SalesPhaseId == rejectedId) &&
p.CreatedDate.Date >= fromDate && p.CreatedDate.Date <= toDate).Select(m =>
new
{
m.LeadId,
m.LeadOwnerId,
m.SalesPhaseId,
m.LeadActivities,
m.Employee,
m.SalesPhase,
m.CompanyName,
m.CreatedDate,
m.LeadHistories,
m.LeadAddresses
}).ToList();
I tried using the AsQueryable instead of the AsEnumerable but it gives the below error:
"The specified type member 'Date' is not supported in LINQ to Entities. Only initializers, entity members, and entity navigation properties are supported."
Can you help me out to reduce the execution time of the query?
Your use of AsEnumerable is forcing the filtering to be done locally. It's pulling in all the data, then filtering it in your app. That's clearly very inefficient. Now, it seems that part of your query can't be directly expressed in LINQ to SQL. I see two options here.
Firstly you could do most of your filtering in SQL, but then do the date filtering locally:
var data = UnitOfWork.Leads.Query()
// Do this part of the query in SQL
.Where(p => p.SalesPhaseId == naId ||
p.SalesPhaseId == rejectedId)
.AsEnumerable()
// Do the rest of the query in-process
.Where(p => p.CreatedDate.Date >= fromDate &&
p.CreatedDate.Date <= toDate)
.Select(...)
That's suitable if the first part will filter it down massively, and then you only need to do local processing of a small set of data.
Alternatively, you could work out what your date filtering means in terms of DateTime. It looks like you could do:
// This may not be required, depending on the source.
fromDate = fromDate.Date;
// This will be required, although you may be able to drop the ".Date" part.
toDate = toDate.Date.AddDays(1);
var data = UnitOfWork.Leads.Query()
// Do this part of the query in SQL
.Where(p => (p.SalesPhaseId == naId ||
p.SalesPhaseId == rejectedId) &&
p.CreatedDate >= fromDate &&
p.CreatedDate < toDate)
.Select(...)
That's created an equivalent query, but without using the Date property in the query itself.
Everything after AsEnumerable() is executed locally rather than on the server. See also
https://stackoverflow.com/a/2013876/141172
This means that all rows in the table are returned from the database, and then filtered in your C# code.
Remove that call so that the filtering happens server-side.
EDIT
Noticed Jon's comment and it reminded me that he reimplemented LINQ to Objects as a learning exercise. His comments about the AsEnumerable() reimplementation are worth reading
I can describe its behaviour pretty easily: it returns source.
That's all it does. There's no argument validation, it doesn't create another iterator. It just returns source.
You may well be wondering what the point is... and it's all about changing the compile-time type of the expression. I'm going to talk about IQueryable in another post (although probably not implement anything related to it) but hopefully you're aware that it's usually used for "out of process" queries - most commonly in databases.
Now it's not entirely uncommon to want to perform some aspects of the query in the database, and then a bit more manipulation in .NET - particularly if there are aspects you basically can't implement in LINQ to SQL (or whatever provider you're using). For example, you may want to build a particular in-memory representation which isn't really amenable to the provider's model.
https://msmvps.com/blogs/jon_skeet/archive/2011/01/14/reimplementing-linq-to-objects-part-36-asenumerable.aspx
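The behaviour described above can be sketched in a few lines. MyAsEnumerable is a hypothetical reimplementation (not the framework's source) that returns its argument unchanged; its only effect is the static type, which makes a following Where bind to Enumerable.Where (in-process) instead of Queryable.Where (translated by the provider). An in-memory AsQueryable() source stands in for the database here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AsEnumerableSketch
{
    // Hypothetical reimplementation: returns the source untouched. The only
    // effect is that the static type becomes IEnumerable<T>, so later operators
    // bind to Enumerable (in-process) rather than Queryable (provider-translated).
    public static IEnumerable<T> MyAsEnumerable<T>(this IEnumerable<T> source) => source;

    // Example split: the first Where is part of the provider's query,
    // the second runs in memory because it follows MyAsEnumerable().
    public static List<int> FilterInTwoSteps(IQueryable<int> source) =>
        source.Where(x => x > 1)        // Queryable.Where
              .MyAsEnumerable()
              .Where(x => x % 2 == 0)   // Enumerable.Where
              .ToList();
}
```

With a real database provider, only the first Where would end up in the SQL; everything after the switch runs over the rows that come back.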
Your code should look like this:
var naId = UnitOfWork.SalesPhases.FirstOrDefault(p => p.PhaseName =="NA").SalesPhaseId;
var rejectedId = UnitOfWork.SalesPhases.FirstOrDefault(p => p.PhaseName =="Rejected").SalesPhaseId;
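// Compute the bounds outside the query; the exclusive upper bound keeps all of toDate.
var start = fromDate.Date;
var endExclusive = toDate.Date.AddDays(1);
var data = UnitOfWork.Leads.Query().AsQueryable()
.Where(p =>(p.SalesPhaseId == naId || p.SalesPhaseId == rejectedId) &&
p.CreatedDate >= start && p.CreatedDate < endExclusive).Select(m =>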
new
{
m.LeadId,
m.LeadOwnerId,
m.SalesPhaseId,
m.LeadActivities,
m.Employee,
m.SalesPhase,
m.CompanyName,
m.CreatedDate,
m.LeadHistories,
m.LeadAddresses
}).ToList();
Firstly, you need to keep the query as an IQueryable (AsQueryable() instead of AsEnumerable()) so that the filtering runs in the database.
Secondly, you cannot use .Date on DateTime properties inside a LINQ to Entities query. That only works for in-memory collections like lists and arrays.

Optimize Entity Framework query with multiple contain statements

I'm trying to optimize a query which is taking around 6 seconds to execute.
string[] filters = ...;
var data =
(from n in ctx.People
.Where(np => np.IsActive)
let isFilterMatch = filters.All(f => n.FirstName.ToLower().Contains(f) ||
n.Prefix.ToLower().Contains(f) ||
n.MiddleName.ToLower().Contains(f) ||
n.LastName.ToLower().Contains(f) ||
n.Information.Email.ToLower().Contains(f) ||
(n.Address!= null &&
(SqlFunctions.StringConvert((double)n.Address.Number).
Contains(f) ||
n.Address.Street.ToLower().Contains(f) ||
n.Address.ZipCode.ToLower().Contains(f) ||
n.Address.City.ToLower().Contains(f))))
where isFilterMatch
orderby n.LastName
select n
).Take(numberOfItems).ToList();
This is a query for a search dialog. The user can type in any text and it will then search for a person that matches the input. We split the user input into a string array and then do a Contains on the Person fields. The query cannot be precompiled because of the filter array.
How can I optimize this function? I heard about things like FullTextSearch on Sql Server or stored procedures. Could that help?
We are using Sql Server 2008, Entity Framework 4.0 (Model First) and C#.
I would not use a SQL query / Linq query for this search query. Normal queries for text searching can be slow and they only return exact results; they don't correct spelling/grammar errors etc.
You might consider using the 'Full Text Search' functionality of SQL Server; but the resulting performance might be still poor. Please refer to http://www.sql-server-performance.com/2010/full-text-search-2008/.
I would suggest using a search indexer like Apache Lucene (available for .NET as Lucene.NET). Another option is to write your own Windows service that indexes all the records.
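To illustrate what an indexer does differently from a Contains scan, here is a toy in-memory inverted index. TinyIndex is hypothetical and is not Lucene.NET's API: it maps each lower-cased token to the IDs of the records containing it, and a search intersects the ID sets of all terms (the same AND semantics as filters.All(...) in the question). Unlike the original Contains, it matches whole tokens only, not substrings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy inverted index: token -> IDs of the records that contain it.
// Hypothetical illustration only; whole-token matching, no stemming or ranking.
public class TinyIndex
{
    private readonly Dictionary<string, HashSet<int>> _postings = new(StringComparer.Ordinal);

    private static IEnumerable<string> Tokenize(string text) =>
        text.ToLowerInvariant()
            .Split(new[] { ' ', ',', '.', '@', '-' }, StringSplitOptions.RemoveEmptyEntries);

    // Index every non-null field of a record under the record's ID.
    public void Add(int id, params string[] fields)
    {
        foreach (var field in fields.Where(f => f != null))
            foreach (var token in Tokenize(field))
            {
                if (!_postings.TryGetValue(token, out var ids))
                    _postings[token] = ids = new HashSet<int>();
                ids.Add(id);
            }
    }

    // Returns the IDs that contain every search term (AND semantics).
    public IReadOnlyCollection<int> Search(string input)
    {
        HashSet<int> result = null;
        foreach (var term in Tokenize(input))
        {
            if (!_postings.TryGetValue(term, out var ids)) return Array.Empty<int>();
            if (result == null) result = new HashSet<int>(ids);
            else result.IntersectWith(ids);
        }
        return (IReadOnlyCollection<int>)result ?? Array.Empty<int>();
    }
}
```

The index is built once (or updated as records change), so each search only touches the posting sets for the typed terms instead of scanning every person's fields.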

How to Use Dates in Where Clause in EF Core?

I need to filter my queries by dates but I don't care in this case about time portion of it that is stored in SQL Database.
I first tried something like
var now = DateTime.Now.Date;
Where(x => DateTime.Compare(x.CreatedDate.Date, now) == 0)
but this seems to get evaluated locally, making the query slow. How can I do this without the check happening locally?
I am basically just trying to find all results that happened, say, today (2020-01-06).
There are a limited number of methods you can use on translatable types when constructing your Lambda / Linq expressions. This is because each method needs additional code so that it can be translated into a SQL store expression, which means you must check that any method you expect to be translated is actually supported.
In this case the DateTime.Compare is not supported.
The easiest thing to do here is a simple range comparison because the time is included in your persisted value.
var start = DateTime.Now.Date;
var end = start.AddDays(1);
Where(x => x.CreatedDate >= start && x.CreatedDate < end)
This will result in a sargable query.
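The range comparison can be wrapped in a small helper. This sketch uses a hypothetical Entry entity: it turns an inclusive range of calendar days into a half-open [start, end) range, so the column is compared directly and never wrapped in a function call. The same helper covers a single day if you pass the same date twice.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity with a DateTime column that includes a time portion.
public class Entry
{
    public int Id { get; set; }
    public DateTime CreatedDate { get; set; }
}

public static class DateFilters
{
    // Inclusive days [fromDay, toDay] become the half-open range [start, end).
    // Only plain comparisons against captured values end up in the expression.
    public static Expression<Func<Entry, bool>> CreatedBetweenDays(DateTime fromDay, DateTime toDay)
    {
        var start = fromDay.Date;
        var end = toDay.Date.AddDays(1);
        return x => x.CreatedDate >= start && x.CreatedDate < end;
    }
}
```

For "today" this would be used as .Where(DateFilters.CreatedBetweenDays(DateTime.Today, DateTime.Today)) on the relevant DbSet.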
Use
var now = DateTime.Now.Date;
...Where(x => x.CreatedDate.Date == now)
I just checked that above translates to the following SQL query:
WHERE ((CONVERT(date, [x].[CreatedDate]) = '2019-01-07T00:00:00.000')
I used this method (link) to see what the LINQ query translates to.
