I'm getting the "Possible Multiple Enumeration of IEnumerable" warning from ReSharper. How to handle it in general has already been asked in another SO question. My question is slightly more specific, though, and concerns the various places the warning pops up.
What I'm wondering is whether or not ReSharper is correct in giving me this warning. My main concern is that the warning appears on every use of the users variable below, marked in the code with "//Warn".
My code gathers information to be displayed in a grid on a web page. I'm using server-side paging, since the full data set can be tens or hundreds of thousands of rows long. I've commented the code as clearly as possible.
Again, please let me know whether or not this code is susceptible to multiple enumeration. My goal is to perform my filtering and sorting of the data before calling ToList(). Is that the correct way to do this?
private List<UserRow> GetUserRows(UserFilter filter, int start, int limit,
string sort, SortDirection dir, out int count)
{
count = 0;
// LINQ applies filter to Users object
var users = (
from u in _userManager.Users
where filter.Check(u)
select new UserRow
{
UserID = u.UserID,
FirstName = u.FirstName,
LastName = u.LastName,
// etc.
}
);
// LINQ orders by given sort
if (!String.IsNullOrEmpty(sort))
{
if (sort == "UserID" && dir == SortDirection.ASC)
users = users.OrderBy(u => u.UserID); //Warn
else if (sort == "UserID" && dir == SortDirection.DESC)
users = users.OrderByDescending(u => u.UserID); //Warn
else if (sort == "FirstName" && dir == SortDirection.ASC)
users = users.OrderBy(u => u.FirstName); //Warn
else if (sort == "FirstName" && dir == SortDirection.DESC)
users = users.OrderByDescending(u => u.FirstName); //Warn
else if (sort == "LastName" && dir == SortDirection.ASC)
users = users.OrderBy(u => u.LastName); //Warn
else if (sort == "LastName" && dir == SortDirection.DESC)
users = users.OrderByDescending(u => u.LastName); //Warn
// etc.
}
else
{
users = users.Reverse(); //Warn
}
// Output variable
count = users.Count(); //Warn
// Guard case - shouldn't trigger
if (limit == -1 || start == -1)
return users.ToList(); //Warn
// Pagination and ToList()
return users.Skip((start / limit) * limit).Take(limit).ToList(); //Warn
}
Yes, ReSharper is right: count = users.Count(); enumerates unconditionally, and then, if limit or start is not -1, the ToList() enumerates users again.
It appears that once ReSharper decides something is at risk of being enumerated multiple times, it flags every reference to that item with the multiple-enumeration warning, even lines that aren't themselves responsible for the extra enumeration. That's why you see the warning in so many places.
A better approach would add a separate call to set the count. You can do it upfront in a separate statement, like this:
count = _userManager.Users.Count(u => filter.Check(u));
This way you would be able to leave users in its pre-enumerated state until the final call of ToList.
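Putting the pieces together, a minimal sketch of that shape (reusing the question's names, and assuming the same sort logic is applied in between) might look like this:
// Hypothetical rearrangement: the count is its own query, and the projected
// query is only enumerated once, by whichever ToList() call runs.
count = _userManager.Users.Count(u => filter.Check(u));

var users = _userManager.Users
    .Where(u => filter.Check(u))
    .Select(u => new UserRow
    {
        UserID = u.UserID,
        FirstName = u.FirstName,
        LastName = u.LastName
        // etc.
    });

// ... apply the same sort / Reverse logic as in the question ...

if (limit == -1 || start == -1)
    return users.ToList();

return users.Skip((start / limit) * limit).Take(limit).ToList();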
Your warning is presumably generated by the call to Count, which does run an extra query.
In the case where limit == -1 || start == -1 you could make the ToList call and then get the count from that, but in the general case there's nothing you can do - you are making two queries, one for the full count and one for a subset of the items.
I'd be curious to see whether fixing the special case causes the warning to go away.
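For that special case, a small sketch (same variables as the question):
// Sketch: materialize once, then both the count and the return value
// come from the already-built list.
if (limit == -1 || start == -1)
{
    var all = users.ToList();   // single enumeration
    count = all.Count;          // no second query
    return all;
}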
Edit: As this is LINQ-to-objects, you can replace the last return line with a foreach loop that goes through your whole collection counting them, but also builds up your restricted skip/take sublist dynamically, and thus only iterates once.
You could also benefit from only projecting (select new UserRow) in this foreach loop and just before your special-case ToList, rather than projecting your whole collection and then potentially discarding the majority of your objects.
var users = _userManager.Users.Where(u => filter.Check(u));
// Sort as above
List<UserRow> rtn;
if (limit == -1 || start == -1)
{
rtn = users.Select(u => new UserRow { UserID = u.UserID, ... }).ToList();
count = rtn.Count; // List<T> exposes Count, not Length
}
else
{
int takeFrom = (start / limit) * limit;
int forgetFrom = takeFrom + limit;
count = 0;
rtn = new List<UserRow>();
foreach(var u in users)
{
if (count >= takeFrom && count < forgetFrom)
rtn.Add(new UserRow { UserID = u.UserID, ... });
count++;
}
}
return rtn;
I am writing a small program that takes a .csv file with about 45k rows as input. I am trying to compare the contents of this file with the contents of a table in a database (SQL Server through Dynamics CRM using Xrm.Sdk, if it makes a difference).
In my current program (which takes about 25 minutes to run the comparison - the file and database are exactly the same here, both 45k rows with no differences), I have all the existing records from the database in a DataCollection<Entity>, which inherits from Collection<T> and implements IEnumerable<T>.
In my code below I am filtering using the Where method and then doing logic based on the count of matches. The Where seems to be the bottleneck here. Is there a more efficient approach? I am by no means a LINQ expert.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age);
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
EDIT: I can confirm that all existingRecords are in memory before this code is executed. There is no IO or DB access in the above loop.
Himbrombeere is right: you should execute the query first and put the result into a collection before you use Any, Count, AddRange or whatever other method would execute the query again. In your code it's possible that the query is executed five times in every loop iteration.
Watch out for the term deferred execution in the documentation. If a method is implemented that way, it means the method can be used to construct a LINQ query (so you can chain it with other methods and end up with a query). Only methods that don't use deferred execution, like Count, Any and ToList (or a plain foreach), will actually execute it. If you don't want the whole query to be executed every time you access it, it's better to store the result in a collection (e.g. with ToList).
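A small self-contained illustration of the difference (the names here exist only for this example):
var numbers = new List<int> { 1, 2, 3, 4 };

// Deferred: 'query' is only a description of the work to do.
IEnumerable<int> query = numbers.Where(n => n > 2);

bool any = query.Any();     // runs the filter
int count = query.Count();  // runs the filter again

// Materialized: the filter runs once and the results are stored.
List<int> list = numbers.Where(n => n > 2).ToList();
int countAgain = list.Count;  // just reads a property, no re-enumeration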
However, you could use a different approach which should be much more efficient: a Lookup<TKey, TElement>, which is similar to a dictionary and can be used with an anonymous type as the key:
var lookup = existingRecords.Entities.ToLookup(r => new
{
fund = r["field_1"].ToString(),
bps = Convert.ToDecimal(r["field_2"]),
withdrawalPct = Convert.ToDecimal(r["field_3"]),
percentile = Convert.ToInt32(r["field_4"]), // int, to match the key built in the loop below
age = Convert.ToInt32(r["field_5"])         // otherwise the anonymous-type keys would never compare equal
});
Now you can access this lookup in the loop very efficiently.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = lookup[new {fund, bps, withdrawalPct, percentile, age}].ToList();
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
Note that this will work even if the key does not exist(an empty list is returned).
Add a ToList after the Convert.ToDecimal(r["field_5"]) == age) line to force immediate execution of the query.
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age)
.ToList();
The Where doesn't actually execute your query, it just prepares it. The actual execution happens later, in a deferred way. In your case it happens when calling Count, which itself iterates the entire collection of items. And if the first condition fails, the second one is checked, leading to a second iteration of the complete collection when Count is called again. In that case you actually execute the query a third time when calling matchingRows.First().
By forcing immediate execution you execute the query only once, and thus iterate the entire collection only once, which will decrease your overall time.
Another option, which is basically along the same lines as the other answers, is to prepare your data first, so that you're not repeatedly calling things like r["field_2"] (which are relatively slow to look up).
This is a (1) clean your data, (2) query/join your data, (3) process your data approach.
Do this:
(1)
var inputs =
inputDataLines
.Select(record =>
{
var fields = record.Split(',');
return new
{
fund = fields[0],
bps = Convert.ToDecimal(fields[1]),
withdrawalPct = Convert.ToDecimal(fields[2]),
percentile = Convert.ToInt32(fields[3]),
age = Convert.ToInt32(fields[4]),
bombOutTerm = Convert.ToDecimal(fields[5]),
record
};
})
.ToArray();
var entities =
existingRecords
.Entities
.Select(entity => new
{
fund = entity["field_1"].ToString(),
bps = Convert.ToDecimal(entity["field_2"]),
withdrawalPct = Convert.ToDecimal(entity["field_3"]),
percentile = Convert.ToInt32(entity["field_4"]),
age = Convert.ToInt32(entity["field_5"]),
bombOutTerm = Convert.ToDecimal(entity["field_6"]),
entity
})
.ToArray()
.GroupBy(x => new
{
x.fund,
x.bps,
x.withdrawalPct,
x.percentile,
x.age
}, x => new
{
x.bombOutTerm,
x.entity,
});
(2)
var query =
from i in inputs
join e in entities on new { i.fund, i.bps, i.withdrawalPct, i.percentile, i.age } equals e.Key
select new { input = i, matchingRows = e };
(3)
foreach (var x in query)
{
entitiesFound.AddRange(x.matchingRows.Select(y => y.entity));
if (x.matchingRows.Count() == 0)
{
rowsToAdd.Add(x.input.record);
}
else if (x.matchingRows.Count() == 1)
{
if (x.matchingRows.First().bombOutTerm != x.input.bombOutTerm)
{
rowsToUpdate.Add(x.input.record);
entitiesToUpdate.Add(x.matchingRows.First().entity);
}
}
else
{
entitiesToDelete.AddRange(x.matchingRows.Select(y => y.entity));
rowsToAdd.Add(x.input.record);
}
}
I would suspect that this will be among the fastest approaches presented.
I have faced a strange problem. When a user comes to any page of my web app, I check whether the user has permission to access it, and provide a trial period if it's their first visit.
Here is my piece of code:
List<string> temp_workers_id = new List<string>();
...
if (temp_workers_id.Count > 6)
{
System.Data.SqlTypes.SqlDateTime sqlDate = new System.Data.SqlTypes.SqlDateTime(DateTime.Now.Date);
var rusers = dbctx.tblMappings.Where(tm => temp_workers_id.Any(c => c == tm.ModelID));
var permissions = dbctx.UserPermissions
.Where(p => rusers
.Any(ap => ap.UserID == p.UserID)
&& p.DateStart != null
&& p.DateEnd != null
&& p.DateStart <= sqlDate.Value
&& p.DateEnd >= sqlDate.Value);
if (permissions.Count() < 1)
{
permissions = dbctx.UserPermissions
.Where(p => rusers
.Any(ap => ap.UserID == p.UserID)
&& p.DateStart == null
&& p.DateEnd == null);
var used = dbctx.UserPermissions
.Where(p => rusers
.Any(ap => ap.UserID == p.UserID)
&& p.DateStart != null
&& p.DateEnd != null);
if (permissions.Count() > 0 && used.Count() < 1)
{
var p = permissions.First();
using (Models.TTTDbContext tdbctx = new Models.TTTDbContext())
{
var tp = tdbctx.UserPermissions.SingleOrDefault(tup => tup.UserID == p.UserID);
tp.DateStart = DateTime.Now.Date;
tp.DateEnd = DateTime.Now.Date.AddDays(60);
tdbctx.SaveChanges();
}
Here the First() method throws an exception:
Sequence contains no elements
How could that even be?
EDIT:
I don't think the user opens two browsers and navigates here at the same time, but could this be a concurrency issue?
You claim you only found this in the server logs and didn't encounter it during debugging. That means that between these lines:
if (permissions.Count() > 0)
{
var p = permissions.First();
Some other process or thread changed your database, so that the query didn't match any documents anymore.
This is caused by permissions holding a lazily evaluated query, meaning that the query is only executed when you iterate it (which Count() and First() both do).
So in the Count(), the query is executed:
SELECT COUNT(*) ... WHERE ...
Which returns, at that moment, one row. Then the data is modified externally, causing the next query (at First()):
SELECT n1, n2, ... WHERE ...
To return zero rows, causing First() to throw.
Now, how to solve that is up to you, and depends entirely on how you want to model this scenario. It means the second query was actually correct: at that moment, there were no more rows that fulfilled the query criteria. You could materialize the query once:
permissions = query.Where(...).ToList()
But that would mean your logic operates on stale data. The same would happen if you'd use FirstOrDefault():
var permissionToApply = permissions.FirstOrDefault();
if (permissionToApply != null)
{
// rest of your logic
}
So it's basically a lose-lose scenario. There's always the chance that you're operating on stale data, which means that the next piece of code:
tdbctx.UserPermissions.SingleOrDefault(tup => tup.UserID == p.UserID);
Would throw as well. So every time you query the database, you'll have to write the code in such a way that it can handle the records not being present anymore.
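A minimal sketch of that defensive style, reusing the question's names (the retry or abort policy is up to you):
// Sketch: tolerate the permission row disappearing between the two queries.
var permissionToApply = permissions.FirstOrDefault();
if (permissionToApply != null)
{
    using (var tdbctx = new Models.TTTDbContext())
    {
        var tp = tdbctx.UserPermissions
            .FirstOrDefault(tup => tup.UserID == permissionToApply.UserID);
        if (tp != null)  // another request may have removed or changed it meanwhile
        {
            tp.DateStart = DateTime.Now.Date;
            tp.DateEnd = DateTime.Now.Date.AddDays(60);
            tdbctx.SaveChanges();
        }
    }
}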
Using Entity Framework 6.0.2 and .NET 4.5.1 in Visual Studio 2013 Update 1 with a DbContext connected to SQL Server:
I have a long chain of filters I am applying to a query based on the caller's desired results. Everything was fine until I needed to add paging. Here's a glimpse:
IQueryable<ProviderWithDistance> results = (from pl in db.ProviderLocations
let distance = pl.Location.Geocode.Distance(_geo)
where pl.Location.Geocode.IsEmpty == false
where distance <= radius * 1609.344
orderby distance
select new ProviderWithDistance() { Provider = pl.Provider, Distance = Math.Round((double)(distance / 1609.344), 1) }).Distinct();
if (gender != null)
{
results = results.Where(p => p.Provider.Gender == (gender.ToUpper() == "M" ? Gender.Male : Gender.Female));
}
if (type != null)
{
int providerType;
if (int.TryParse(type, out providerType))
results = results.Where(p => p.Provider.ProviderType.Id == providerType);
}
if (newpatients != null && newpatients == true)
{
results = results.Where(p => p.Provider.ProviderLocations.Any(pl => pl.AcceptingNewPatients == null || pl.AcceptingNewPatients == AcceptingNewPatients.Yes));
}
if (string.IsNullOrEmpty(specialties) == false)
{
List<int> _ids = specialties.Split(',').Select(int.Parse).ToList();
results = results.Where(p => p.Provider.Specialties.Any(x => _ids.Contains(x.Id)));
}
if (string.IsNullOrEmpty(degrees) == false)
{
List<int> _ids = specialties.Split(',').Select(int.Parse).ToList();
results = results.Where(p => p.Provider.Degrees.Any(x => _ids.Contains(x.Id)));
}
if (string.IsNullOrEmpty(languages) == false)
{
List<int> _ids = specialties.Split(',').Select(int.Parse).ToList();
results = results.Where(p => p.Provider.Languages.Any(x => _ids.Contains(x.Id)));
}
if (string.IsNullOrEmpty(keyword) == false)
{
results = results.Where(p =>
(p.Provider.FirstName + " " + p.Provider.LastName).Contains(keyword));
}
Here's the paging I added to the bottom (skip and max are just int parameters):
if (skip > 0)
results = results.Skip(skip);
results = results.Take(max);
return new ProviderWithDistanceDto { Locations = results.AsEnumerable() };
Now for my question(s):
As you can see, I am doing an orderby in the initial LINQ query, so why is it complaining that I need to do an OrderBy before doing a Skip (I thought I was?)...
I was under the assumption that it won't be turned into a SQL query and executed until I enumerate the results, which is why I wait until the last line to return the results AsEnumerable(). Is that the correct approach?
If I have to enumerate the results before doing Skip and Take how will that affect performance? Obviously I'd like to have SQL Server do the heavy lifting and return only the requested results. Or does it not matter (or have I got it wrong)?
I am doing an orderby in the initial LINQ query, so why is it complaining that I need to do an OrderBy before doing a Skip (I thought I was?)
Your result starts off correctly as an ordered queryable: the type returned from the query on the first line is IOrderedQueryable<ProviderWithDistance>, because you have an order by clause. However, adding a Where on top of it makes your query an ordinary IQueryable<ProviderWithDistance> again, causing the problem that you see down the road. Logically, that's the same thing, but the structure of the query definition in memory implies otherwise.
To fix this, remove the order by in the original query, and add it right before you are ready for the paging, like this:
...
if (string.IsNullOrEmpty(languages) == false)
...
if (string.IsNullOrEmpty(keyword) == false)
...
results = results.OrderBy(r => r.Distance);
As long as ordering is the last operation, this should fix the runtime problem.
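With that change, the tail of the method (sketched with the question's variables) becomes: order, then page, all while still an IQueryable so SQL Server does the work.
// Sketch: ordering applied last, immediately before paging.
results = results.OrderBy(r => r.Distance);

if (skip > 0)
    results = results.Skip(skip);

results = results.Take(max);

return new ProviderWithDistanceDto { Locations = results.AsEnumerable() };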
I was under the assumption that it won't be turned into a SQL query and executed until I enumerate the results, which is why I wait until the last line to return the results AsEnumerable(). Is that the correct approach?
Yes, that is the correct approach. You want your RDBMS to do as much work as possible, because doing paging in memory defeats the purpose of paging in the first place.
If I have to enumerate the results before doing Skip and Take how will that affect performance?
It would kill the performance, because your system would need to move around a lot more data than it did before you added paging.
I have a problem I can't wrap my head around.
I have a SharePoint list of items, which have categories. I want to read all categories and count how often they occur.
In another method, I want to take the categoryCount, divide it by the total number of tickets and multiply by 100 to get a percentage.
The problem is the Count.
This is my query so far:
public IEnumerable<KategorieVM> GetAllCategories()
{
int counter = 0;
var result = (from t in Tickets
where t.Kategorie != Kategorie.Invalid && t.Kategorie != Kategorie.None && t.Kategorie != null
select new KategorieVM() { _name = t.Kategorie.ToString(), _val = counter++ });
return result;
}
The problem is that I can't use counter++ there. Is there a clean workaround? Building a separate query just to count each category is not a valid option: the list has 15000 items and is growing, and iterating through every category and calling a count query for each takes about 3 minutes.
So counting the categories in one query is mandatory.
Any help is highly appreciated.
/edit: for the sake of clarity:
the counter++ as a count was just a brain fart - I don't know why I tried it; it would only have produced an index. I needed a way to count how often each category occurred in those 15k entries.
You can use GroupBy to perform the Count within the query itself:
return Tickets
.Where(t => t.Kategorie != Kategorie.Invalid && t.Kategorie != Kategorie.None && t.Kategorie != null)
.GroupBy(t => t.Kategorie.ToString())
.Select(g => new KategorieVM() { _name = g.Key, _val = g.Count() });
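For the percentage mentioned in the question, one possible follow-up (a sketch; the local names below are hypothetical, and the total here is the sum of the counted categories - adjust it if "total tickets" should include the filtered-out ones):
// Sketch: turn the grouped counts into percentages.
var categories = GetAllCategories().ToList();   // materialize once
int total = categories.Sum(c => c._val);

var percentages = categories
    .Select(c => new { c._name, Percent = total == 0 ? 0 : (double)c._val * 100 / total })
    .ToList();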
I have a DB that contains billions of rows.
I created a function that receives a number of parameters from the user and filters the DB by those parameters.
This works well for me with a small DB (30,000 rows), but when I try to use the function on the big DB I get a TimeoutException from SQL Server.
Here is my code:
public static IQueryable<LogViewer.EF.InternetEF.Log> ExecuteInternetGetLogsQuery(FilterCriteria p_Criteria, ref GridView p_Datagrid)
{
IQueryable<LogViewer.EF.InternetEF.Log> internetQuery = null;
using (InternetDBConnectionString context = new InternetDBConnectionString())
{
internetQuery = context.Logs;
if ((p_Criteria.DateTo != null && p_Criteria.DateFrom != null))
{
internetQuery = internetQuery.Where(c => c.Timestamp >= p_Criteria.DateFrom && c.Timestamp < p_Criteria.DateTo);
}
else if (p_Criteria.DateFrom != null && p_Criteria.DateFrom > DateTime.MinValue)
{
internetQuery = internetQuery.Where(c => c.Timestamp >= p_Criteria.DateFrom);
}
else if (p_Criteria.DateTo != null && p_Criteria.DateTo > DateTime.MinValue)
{
internetQuery = internetQuery.Where(c => c.Timestamp < p_Criteria.DateTo);
}
if (!string.IsNullOrEmpty(p_Criteria.FreeText))
{
internetQuery = internetQuery.Where(c => c.FormattedMessage.Contains(p_Criteria.FreeText));
}
if (p_Criteria.Titles.Count > 0)
{
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Titles.Contains(c.Title)).AsQueryable();
}
if (p_Criteria.MachineNames.Count > 0)
{
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.MachineNames.Contains(c.MachineName)).AsQueryable();
}
if (p_Criteria.Severities.Count > 0)
{
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Severities.Contains(c.Severity)).AsQueryable();
}
internetQuery= internetQuery.OrderByDescending(c=>c.LogID);
if (internetQuery.Count() > p_Criteria.TopValue)
{
internetQuery = internetQuery.Take(p_Criteria.TopValue);
}
p_Datagrid.DataSource = internetQuery;
p_Datagrid.DataBind();
return internetQuery;
}
}
My version of SQL is 2005.
I got the exception on the p_Datagrid.DataBind(); row.
Any suggestions?
Thanks
From what I can see, you have these options:
Increase the timeout (a bad idea - it just moves the problem into the future).
Instead of a LINQ query, get the data via a stored procedure.
Make the grid paged, so you only retrieve the data for the target page.
Look at the query plan and see whether you can add indexes on the columns you filter and order by.
Ask whether you really need billions of rows in a datagrid. What are the requirements? Maybe you can just show the top 1000 or top 10000, because from a user perspective I can't see any benefit in a grid with a billion rows.
That was just off the top of my head.
EDIT
And if I would have this function I would start looking at this section of the code:
if (p_Criteria.Titles.Count > 0)
{
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Titles.Contains(c.Title)).AsQueryable();
}
if (p_Criteria.MachineNames.Count > 0)
{
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.MachineNames.Contains(c.MachineName)).AsQueryable();
}
if (p_Criteria.Severities.Count > 0)
{
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Severities.Contains(c.Severity)).AsQueryable();
}
This actually turns the result into an IEnumerable, and then the Where filtering runs in memory while still triggering database calls. You might also run into trouble because accessing the related tables hits the database again. Maybe you can do the Contains on an id while the query is still an IQueryable instead. All the benefits of having an IQueryable disappear when you do it this way.
In general, 'swiss army knife' specification or criteria patterns like this are hard to optimise (i.e. index at the SQL level), because of the large number of permutations of filter combinations a client/user can specify. So if you can somehow force the user to specify at least one mandatory criterion which reduces the rowcount significantly, e.g. by making the date range mandatory and no longer than one month, I would start there, because then we at least have something to work from when we look at indexing.
Due to the potentially large number of rows, I would assert or validate that the value of p_Criteria.TopValue used to limit the rows is always present, and is a sensible number, e.g. Take(1000). You can always warn the user to narrow his / her search range if this threshold is reached.
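A small sketch of such a guard (the 1000 cap is just an example value):
// Hypothetical guard: always apply a sensible upper bound on the row count.
const int maxRows = 1000;
int top = p_Criteria.TopValue > 0 ? Math.Min(p_Criteria.TopValue, maxRows) : maxRows;
internetQuery = internetQuery.Take(top);
// If exactly 'top' rows come back, warn the user to narrow the search.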
The major problem is likely to be that filtering on Titles, MachineNames and Severities each calls AsEnumerable(), which materializes the query thus far, and thus you evaluate these 3 filters in memory, not in SQL, potentially with large numbers of records. All recent versions of EF are able to convert predicates of the form Where(c => IEnumerable<X>.Contains(c.Column)) into the SQL WHERE c.Column IN (X1, X2, X3).
i.e. you should remove the AsEnumerable() on these three filters (and then you don't need to convert back with AsQueryable()):
if (p_Criteria.Titles.Any())
{
internetQuery = internetQuery
.Where(c => p_Criteria.Titles.Contains(c.Title));
}
if (p_Criteria.MachineNames.Any())
{
internetQuery = internetQuery
.Where(c => p_Criteria.MachineNames.Contains(c.MachineName));
}
if (p_Criteria.Severities.Any())
{
internetQuery = internetQuery
.Where(c => p_Criteria.Severities.Contains(c.Severity));
}
Another issue is in the Take check: by running .Count() in the check, you materialize the query (if you haven't already done so). You should instead just call Take() directly - there's no need to check whether we've exceeded the row count. If there are fewer than p_Criteria.TopValue rows, it will simply return as many rows as are present. In other words, remove the if check and just leave this:
internetQuery = internetQuery.Take(p_Criteria.TopValue);
Another thing I would look at for performance reasons is whether you can change the FreeText string check to use StartsWith instead of Contains. Indexing on SQL char columns is only effective at the start of the string.
If the %filter% wildcard is not needed, then this obviously differs from the OP's code, but it will be able to use an index on the FreeText column:
if (!string.IsNullOrEmpty(p_Criteria.FreeText))
{
internetQuery = internetQuery
.Where(c => c.FormattedMessage.StartsWith(p_Criteria.FreeText));
}
A minor quibble, and it won't affect database performance, but you can reduce the number of branches in your date filtering to just the following:
if (p_Criteria.DateFrom != null && p_Criteria.DateFrom > DateTime.MinValue)
{
internetQuery = internetQuery.Where(c => c.Timestamp >= p_Criteria.DateFrom);
}
if (p_Criteria.DateTo != null && p_Criteria.DateTo > DateTime.MinValue)
{
internetQuery = internetQuery.Where(c => c.Timestamp < p_Criteria.DateTo);
}
From a naming standards point of view, I would also change the name of your Object/DbContext from *ConnectionString to *Context.
Since the concrete schema is not available, you can try the following things.
Write a stored procedure with all your filter criteria and pass the parameters from code, then execute the stored procedure and check whether you still get timeouts (a rough sketch follows below). To see how to call SPs from Entity Framework, read this.
If you do not succeed with step 1, you might want to review your table design and add indexes and/or extra filters. For guidelines on how to index a SQL Server database, read this.
You may also want to create a "shadow" copy of your tables to keep archived DB rows. By archived I mean rows which are of no use right now but cannot be permanently deleted.
EDIT: I agree with @Arion about having a paged grid instead of fetching all the rows.
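For option 1, a rough sketch of calling a stored procedure from code (this assumes a DbContext-based model; an ObjectContext would use ExecuteStoreQuery instead, and the procedure name and parameters here are made up for illustration):
using System.Data.SqlClient;

// Hypothetical stored procedure call; "GetFilteredLogs" does not exist in the question.
var logs = context.Database.SqlQuery<LogViewer.EF.InternetEF.Log>(
        "EXEC GetFilteredLogs @DateFrom, @DateTo, @Top",
        new SqlParameter("@DateFrom", (object)p_Criteria.DateFrom ?? DBNull.Value),
        new SqlParameter("@DateTo", (object)p_Criteria.DateTo ?? DBNull.Value),
        new SqlParameter("@Top", p_Criteria.TopValue))
    .ToList();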
After a week of searching for a solution I found this post. It works great against an indexed DB with more than a billion rows.
Here is my code solution:
public static IQueryable<LogViewer.EF.InternetEF.Log> ExecuteInternetGetLogsQuery(FilterCriteria p_Criteria, ref GridView p_Datagrid)
{
IQueryable<LogViewer.EF.InternetEF.Log> internetQuery = null;
List<LogViewer.EF.InternetEF.Log> executedList = null;
using (InternetDBConnectionString context = new InternetDBConnectionString())
{
internetQuery = context.Logs;
if ((p_Criteria.DateTo != null && p_Criteria.DateFrom != null))
{
internetQuery = internetQuery.Where(c => c.Timestamp >= p_Criteria.DateFrom.Value && c.Timestamp < p_Criteria.DateTo.Value);
}
else if (p_Criteria.DateFrom != null && p_Criteria.DateFrom > DateTime.MinValue)
{
internetQuery = internetQuery.Where(c => c.Timestamp >= p_Criteria.DateFrom);
}
else if (p_Criteria.DateTo != null && p_Criteria.DateTo > DateTime.MinValue)
{
internetQuery = internetQuery.Where(c => c.Timestamp < p_Criteria.DateTo);
}
if (!string.IsNullOrEmpty(p_Criteria.FreeText))
{
internetQuery = internetQuery.Where(c => c.FormattedMessage.Contains(p_Criteria.FreeText));
}
if (p_Criteria.Titles.Count > 0)
{
internetQuery = internetQuery.Where(BuildOrExpression<LogViewer.EF.InternetEF.Log, string>(p => p.Title, p_Criteria.Titles));
}
if (p_Criteria.MachineNames.Count > 0)
{
internetQuery = internetQuery.Where(BuildOrExpression<LogViewer.EF.InternetEF.Log, string>(p => p.MachineName, p_Criteria.MachineNames));
}
if (p_Criteria.Severities.Count > 0)
{
internetQuery = internetQuery.Where(BuildOrExpression<LogViewer.EF.InternetEF.Log, string>(p => p.Severity, p_Criteria.Severities));
}
internetQuery = internetQuery.Take(p_Criteria.TopValue);
executedList = internetQuery.ToList<LogViewer.EF.InternetEF.Log>();
executedList = executedList.OrderByDescending(c => c.LogID).ToList<LogViewer.EF.InternetEF.Log>();
p_Datagrid.DataSource = executedList;
p_Datagrid.DataBind();
return internetQuery;
}
}
public static Expression<Func<TElement, bool>> BuildOrExpression<TElement, TValue>(
Expression<Func<TElement, TValue>> valueSelector,
IEnumerable<TValue> values )
{
if (null == valueSelector)
throw new ArgumentNullException("valueSelector");
if (null == values)
throw new ArgumentNullException("values");
ParameterExpression p = valueSelector.Parameters.Single();
if (!values.Any())
return e => false;
var equals = values.Select(value =>
(Expression)Expression.Equal(
valueSelector.Body,
Expression.Constant(
value,
typeof(TValue)
)
)
);
var body = equals.Aggregate<Expression>(
(accumulate, equal) => Expression.Or(accumulate, equal)
);
return Expression.Lambda<Func<TElement, bool>>(body, p);
}
I hope this will be useful for the community.
Thanks
The problem is that you are trying to load all the results into your data grid at once. Is that really necessary? Can you use something like paging to read only the first, say, 100 rows and fetch the rest only on demand?
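A minimal paging sketch, reusing the question's names (page and pageSize here are hypothetical parameters supplied by the grid):
// Sketch: let SQL Server return only the requested page of rows.
int page = 0;        // zero-based page index from the grid
int pageSize = 100;  // rows per page

var pageOfLogs = internetQuery
    .OrderByDescending(c => c.LogID)   // Skip/Take need a stable order
    .Skip(page * pageSize)
    .Take(pageSize)
    .ToList();

p_Datagrid.DataSource = pageOfLogs;
p_Datagrid.DataBind();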
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Severities.Contains(c.Severity))
Is EF smart enough to do queries like this as a join or does it cause a SELECT N + 1 problem?
Otherwise, maybe you're missing an index on something.
I would check the generated SQL first.
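One way to inspect it (a sketch using the question's names; this assumes an EF6 DbContext, while older ObjectContext-based models expose ToTraceString() on the query instead):
// EF6: log every command EF sends to the database.
context.Database.Log = s => System.Diagnostics.Debug.WriteLine(s);

// For a single LINQ-to-Entities query, ToString() usually shows the SELECT it will run.
string sql = internetQuery.ToString();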