I have this code that loops through all Contacts and counts the emails that have been sent to each of them; if a contact hasn't opened/clicked the last X emails, it is added to the list that gets returned.
At the moment the code takes about 10 minutes to run. Is there anything I can do to improve this?
I know I could limit the number of contacts returned, but that's still slow.
var contactList =
(from c in db.Contacts
where c.Accounts_CustomerID == Account.AccountID && !c.Deleted && !c.EmailOptOut
select c).ToList();
foreach (var person in contactList)
{
var SentEmails =
(from c in db.Comms_Emails_EmailsSents where c.ContactID == person.ID select c).OrderByDescending(
x => x.DateSent).Take(Last).ToList(); // newest first, so Take(Last) yields the most recent emails
if (SentEmails.Count == Last)
{
if (!Clicks)
{
if (SentEmails.Count(x => x.Opens == 0) == Last)
{
ReturnContacts.Add(person);
}
}
else
{
if (SentEmails.Count(x => x.Clicks == 0) == Last)
{
ReturnContacts.Add(person);
}
}
}
}
return ReturnContacts;
Remove the .ToList() calls and work with IQueryable instead. ToList() retrieves every entity and stores it in memory, which you don't want here; with IQueryable the query is composed first and only executed when the results are enumerated, which also reduces memory use.
Run the logic on the db - rewrite a query using joins etc., so that it returns a result set that already contains relevant data.
What you're doing now is performing a db query for each initial query result. That can mean A LOT of queries.
If you offload that to the RDBMS, you can always try to optimize it there (by introducing indexes etc.).
EDIT: I rewrote the code in notepad:
foreach (var record in (from c in db.Contacts
                        join es in db.Comms_Emails_EmailsSents
                            on c.ID equals es.ContactID
                        where c.Accounts_CustomerID == Account.AccountID && !c.Deleted && !c.EmailOptOut
                        orderby c.ID, es.DateSent descending
                        select new { opens = es.Opens, clicks = es.Clicks, person = c })
                       .GroupBy(r => r.person))
{
    // Untested: relies on the newest-first ordering surviving the GroupBy.
    var mails = record.Take(Last).ToList();
    if (mails.Count == Last)
    {
        if (!Clicks)
        {
            if (mails.Count(x => x.opens == 0) == Last)
            {
                ReturnContacts.Add(record.Key);
            }
        }
        else
        {
            if (mails.Count(x => x.clicks == 0) == Last)
            {
                ReturnContacts.Add(record.Key);
            }
        }
    }
}
I don't have time at hand to mock up a db and test it. Also, this approach performs a join between contacts and emails, and if you have 100k emails per person, this might be a very bad idea. You could optimize it by using a rank function, but I'd say that if performance is still bad, you could start thinking about db-side optimizations, as this data structure is (at least to my non-DBA eyes) not perfectly suited to this kind of querying.
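To make the intended "last X emails per contact" logic concrete, here is a minimal in-memory sketch. The SentEmail class and the InactiveContacts.Find helper are hypothetical stand-ins, not part of the original schema, and the code runs against LINQ to Objects only; whether a given provider translates the same shape into efficient SQL is an assumption that would need checking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of Comms_Emails_EmailsSents.
public class SentEmail
{
    public int ContactID { get; set; }
    public DateTime DateSent { get; set; }
    public int Opens { get; set; }
    public int Clicks { get; set; }
}

public static class InactiveContacts
{
    // IDs of contacts whose `last` most recent emails all show zero opens
    // (or zero clicks when `clicks` is true).
    public static List<int> Find(IEnumerable<SentEmail> emails, int last, bool clicks)
    {
        return emails
            .GroupBy(e => e.ContactID)
            .Select(g => new
            {
                ContactID = g.Key,
                // Order inside the group so "most recent" is well defined.
                Recent = g.OrderByDescending(e => e.DateSent).Take(last).ToList()
            })
            .Where(x => x.Recent.Count == last
                     && x.Recent.All(e => (clicks ? e.Clicks : e.Opens) == 0))
            .Select(x => x.ContactID)
            .ToList();
    }
}
```

Ordering inside each group, rather than before the GroupBy, avoids depending on whether the grouping preserves the earlier sort.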
I have the following dynamic linq
var results = (from a in alist where a.id == id select a);
if(...something)
{
    results = (from a in results where a.amount > input1 && a.typeId == 1 select a);
}
if(...something else)
{
    results = (from a in results where a.amount > input2 && a.typeId == 2 select a);
}
if(...something else again)
{
    results = (from a in results where a.amount > input3 && a.typeId == 3 select a);
}
However this produces an AND statement which means all the statements need to be true for anything to be returned.
I need the last 3 statements to be ORed together.
eg I want
Where (a.id==id) && ((a.amount>input1 && a.typeId==1) ||
(a.amount>input2 && a.typeId==2) || (a.amount>input3 && a.typeId==3))
How do I do this?
Check the PredicateBuilder class. It is a well-known set of extension methods for LINQ that makes it easy to combine predicates dynamically with OR and AND.
Assuming your list holds elements of a type TypeA, for example, you could try this:
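// Build the OR chain separately, starting from "false" (the neutral element for OR).
var anyOf = PredicateBuilder.False<TypeA>();
if(...something)
{
    anyOf = anyOf.Or(a => a...);
}
if(...something)
{
    anyOf = anyOf.Or(a => a...);
}
if(...something)
{
    anyOf = anyOf.Or(a => a...);
}
// Then AND it with the id check, giving: id && (A || B || C).
Expression<Func<TypeA, bool>> filter = a => a.id == id;
filter = filter.And(anyOf);
var results = alist.Where(filter).ToList();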
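For readers without the PredicateBuilder source at hand, the following is a minimal sketch of how Or and And combinators of this kind can be written. It is not the PredicateBuilder implementation itself: the Item class, the PredicateSketch class and its Filter helper are hypothetical names introduced for illustration. The sketch keeps the OR chain separate from the id check, because OR-ing conditions directly onto the id predicate would produce id || A || B rather than id && (A || B).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for the element type of the list.
public class Item
{
    public int Id { get; set; }
    public int TypeId { get; set; }
    public decimal Amount { get; set; }
}

public static class PredicateSketch
{
    // Rewrites one lambda parameter into another so two bodies can be merged.
    private sealed class ReplaceParameter : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly ParameterExpression _to;

        public ReplaceParameter(ParameterExpression from, ParameterExpression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == _from ? _to : base.VisitParameter(node);
        }
    }

    public static Expression<Func<T, bool>> Or<T>(
        this Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
    {
        var body = new ReplaceParameter(right.Parameters[0], left.Parameters[0]).Visit(right.Body);
        return Expression.Lambda<Func<T, bool>>(Expression.OrElse(left.Body, body), left.Parameters);
    }

    public static Expression<Func<T, bool>> And<T>(
        this Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
    {
        var body = new ReplaceParameter(right.Parameters[0], left.Parameters[0]).Visit(right.Body);
        return Expression.Lambda<Func<T, bool>>(Expression.AndAlso(left.Body, body), left.Parameters);
    }

    // Builds id && (cond1 || cond2); a null input means "condition not requested".
    public static List<Item> Filter(IEnumerable<Item> items, int id, decimal? input1, decimal? input2)
    {
        Expression<Func<Item, bool>> anyOf = a => false; // neutral element for OR
        if (input1.HasValue)
            anyOf = anyOf.Or(a => a.Amount > input1.Value && a.TypeId == 1);
        if (input2.HasValue)
            anyOf = anyOf.Or(a => a.Amount > input2.Value && a.TypeId == 2);

        Expression<Func<Item, bool>> byId = a => a.Id == id;
        return items.AsQueryable().Where(byId.And(anyOf)).ToList();
    }
}
```

One design consequence worth noting: if no condition is requested, the OR chain stays false and nothing is returned. If "no conditions" should instead mean "everything with this id", skip the And step in that case.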
Use .Concat()
I am not absolutely sure I understood your question correctly, but this code creates a result set that is appended to whenever one of your if conditions is true, rather than replacing the original result set.
var results = (from a in alist where a.id == id select a);
if(...something)
{
    results = results.Concat((from a in alist where a.amount > input1 && a.typeId == 1 select a));
}
if(...something else)
{
    results = results.Concat((from a in alist where a.amount > input2 && a.typeId == 2 select a));
}
//....
Edited as per Peter B's comment.
If multiple lists may contain the same element and you only wish to have every element at most once, use .Union instead of .Concat. This has some performance penalty of course (having to compare the elements).
After your edit
Your edit clarified things a bit. You have two options:
Move your a.id == id check into the inner queries:
var results = Enumerable.Empty<typeofa>();
if(...something)
{
    results = results.Concat((from a in alist where a.id == id && a.amount > input1 && a.typeId == 1 select a));
}
if(...something else)
{
    results = results.Concat((from a in alist where a.id == id && a.amount > input2 && a.typeId == 2 select a));
}
//....
First filter the set using the id, materialize that, then further narrow that down using the method I showed above.
var results = Enumerable.Empty<typeofa>();
var filteredList = (from a in alist where a.id == id select a).ToList();
if(...something)
{
    results = results.Concat((from a in filteredList where a.amount > input1 && a.typeId == 1 select a));
}
if(...something else)
{
    results = results.Concat((from a in filteredList where a.amount > input2 && a.typeId == 2 select a));
}
//....
Whichever works better depends on your situation. General advice is that prefiltering is more efficient if it narrows down the list considerably and/or the original source is relatively expensive to query (sql for example), but as always, you should profile your concrete example yourself.
I have two tables (tbPerson and tbDataLog), and I need to return the Id from one of them (tbPerson) after checking certain conditions on both. The result should then be passed to another query. My first query returns the Ids (the primary key of the table) successfully, and I need to pass these Ids to a second query so that it returns data based on them. I also have a base object of type IQueryable that is used to check certain conditions when fetching data:
IQueryable<tbPerson> dataset
(I cannot change this from IQueryable to another type, as it would break other parts of the code.)
My first linq statement:
public static IQueryable<LogResults> GetResultsForYes()
{
Databasename ents = new Databasename();
var ids = (from f in ents.tbPerson
join g in ents.tbDataLog
on f.InfoID equals g.RefId
where g.Tag == "subscribed" && g.OldValue == "No" && g.Action == "Modified"
select new LogResults { _LogID = f.Id }).OrderBy(x => x._LogID);
return ids;
}
public class LogResults
{
public int _LogID { get; set; }
}
I access my result something like this where I can see in debugger all the Ids.
IQueryable<LogResults> log = GetResultsForYes();
The problem comes when I try to get records from tbPerson based on these returned Ids.
dataset = log.Where(x => x._LogID != 0);
I get this error:
Cannot implicitly convert type 'System.Linq.IQueryable<LogResults>' to 'System.Linq.IQueryable<tbPerson>'. An explicit conversion exists (are you missing a cast?)
Any suggestions or some other good approach is welcome.
I love this thing about Stack Overflow: writing a question forces our brain to think more deeply, and 30 minutes after posting this one I solved it in a simple way. Sometimes we overcomplicate things!
var ids = (from f in ents.tbPerson
join g in ents.tbDataLog
on f.InfoID equals g.RefId
where g.Tag == "subscribed" && g.OldValue == "No" && g.Action == "Modified"
select new { f.Id }).ToArray();
var allId = ids.Select(x => x.Id).ToArray();
dataset = dataset.Where(x => allId.Contains(x.Id));
@ankit_sharma: I have not tested yours, but I will give it a try and come back to you. Thanks for your time and effort.
IQueryable<tbPerson> dataset = log.Where(x => x._LogID != 0);
The result of log.Where(x => x._LogID != 0) is an IQueryable<LogResults>, and you are trying to assign it to dataset, which is of type IQueryable<tbPerson>: two different types.
EDIT:
I see you do a join to get the tbPerson Ids, and then run a second query to get the persons. You could get the persons directly from the first join.
I just modified your code:
IQueryable<tbPerson> persons = from person in ents.tbPerson
join g in ents.tbDataLog
on person.InfoID equals g.RefId
where g.Tag == "subscribed" && g.OldValue == "No" && g.Action == "Modified"
select person;
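As a rough illustration of why selecting the entity from the join resolves the type mismatch, here is a self-contained sketch over in-memory data. Person, DataLog and SubscribedQuery are hypothetical stand-ins for tbPerson, tbDataLog and the calling code. The Distinct() call is an addition of this sketch, on the assumption that a person with several matching log rows should appear only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the tbPerson and tbDataLog entities.
public class Person
{
    public int Id { get; set; }
    public int InfoID { get; set; }
}

public class DataLog
{
    public int RefId { get; set; }
    public string Tag { get; set; }
    public string OldValue { get; set; }
    public string Action { get; set; }
}

public static class SubscribedQuery
{
    // The query selects the person itself, so the result is IQueryable<Person>
    // and can be assigned to a variable of that type without any conversion.
    public static IQueryable<Person> Build(IQueryable<Person> people, IQueryable<DataLog> logs)
    {
        return (from p in people
                join g in logs on p.InfoID equals g.RefId
                where g.Tag == "subscribed" && g.OldValue == "No" && g.Action == "Modified"
                select p).Distinct(); // assumption: one row per person is wanted
    }
}
```

Because the return type is still IQueryable<Person>, further Where clauses can be chained onto it before anything is executed.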
Using Entity Framework 6.0.2 and .NET 4.5.1 in Visual Studio 2013 Update 1 with a DbContext connected to SQL Server:
I have a long chain of filters I am applying to a query based on the caller's desired results. Everything was fine until I needed to add paging. Here's a glimpse:
IQueryable<ProviderWithDistance> results = (from pl in db.ProviderLocations
let distance = pl.Location.Geocode.Distance(_geo)
where pl.Location.Geocode.IsEmpty == false
where distance <= radius * 1609.344
orderby distance
select new ProviderWithDistance() { Provider = pl.Provider, Distance = Math.Round((double)(distance / 1609.344), 1) }).Distinct();
if (gender != null)
{
results = results.Where(p => p.Provider.Gender == (gender.ToUpper() == "M" ? Gender.Male : Gender.Female));
}
if (type != null)
{
int providerType;
if (int.TryParse(type, out providerType))
results = results.Where(p => p.Provider.ProviderType.Id == providerType);
}
if (newpatients != null && newpatients == true)
{
results = results.Where(p => p.Provider.ProviderLocations.Any(pl => pl.AcceptingNewPatients == null || pl.AcceptingNewPatients == AcceptingNewPatients.Yes));
}
if (string.IsNullOrEmpty(specialties) == false)
{
List<int> _ids = specialties.Split(',').Select(int.Parse).ToList();
results = results.Where(p => p.Provider.Specialties.Any(x => _ids.Contains(x.Id)));
}
if (string.IsNullOrEmpty(degrees) == false)
{
List<int> _ids = degrees.Split(',').Select(int.Parse).ToList();
results = results.Where(p => p.Provider.Degrees.Any(x => _ids.Contains(x.Id)));
}
if (string.IsNullOrEmpty(languages) == false)
{
List<int> _ids = languages.Split(',').Select(int.Parse).ToList();
results = results.Where(p => p.Provider.Languages.Any(x => _ids.Contains(x.Id)));
}
if (string.IsNullOrEmpty(keyword) == false)
{
results = results.Where(p =>
(p.Provider.FirstName + " " + p.Provider.LastName).Contains(keyword));
}
Here's the paging I added to the bottom (skip and max are just int parameters):
if (skip > 0)
results = results.Skip(skip);
results = results.Take(max);
return new ProviderWithDistanceDto { Locations = results.AsEnumerable() };
Now for my question(s):
As you can see, I am doing an orderby in the initial LINQ query, so why is it complaining that I need to do an OrderBy before doing a Skip (I thought I was?)...
I was under the assumption that it won't be turned into a SQL query and executed until I enumerate the results, which is why I wait until the last line to return the results AsEnumerable(). Is that the correct approach?
If I have to enumerate the results before doing Skip and Take how will that affect performance? Obviously I'd like to have SQL Server do the heavy lifting and return only the requested results. Or does it not matter (or have I got it wrong)?
I am doing an orderby in the initial LINQ query, so why is it complaining that I need to do an OrderBy before doing a Skip (I thought I was?)
Your result starts off correctly as an ordered queryable: the type returned from the query on the first line is IOrderedQueryable<ProviderWithDistance>, because you have an order by clause. However, adding a Where on top of it makes your query an ordinary IQueryable<ProviderWithDistance> again, causing the problem that you see down the road. Logically, that's the same thing, but the structure of the query definition in memory implies otherwise.
To fix this, remove the order by in the original query, and add it right before you are ready for the paging, like this:
...
if (string.IsNullOrEmpty(languages) == false)
...
if (string.IsNullOrEmpty(keyword) == false)
...
results = results.OrderBy(r => r.Distance);
As long as ordering is the last operation before Skip and Take, this should fix the runtime problem.
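The "filter first, order last, then page" sequence can be sketched as a small generic helper. PagingSketch.Page is a hypothetical name, and the example runs against an in-memory queryable, so it only demonstrates the order of operations, not the SQL a provider would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class PagingSketch
{
    // Applies ordering as the final step before Skip/Take, after all filters
    // have already been composed onto `filtered`.
    public static List<T> Page<T, TKey>(
        IQueryable<T> filtered, Expression<Func<T, TKey>> orderKey, int skip, int max)
    {
        IQueryable<T> page = filtered.OrderBy(orderKey);
        if (skip > 0)
            page = page.Skip(skip);
        return page.Take(max).ToList(); // the query is enumerated only here
    }
}
```

A call such as PagingSketch.Page(results, r => r.Distance, skip, max) would be the equivalent for the query in the question, assuming results still holds the filtered IQueryable at that point.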
I was under the assumption that it won't be turned into a SQL query and executed until I enumerate the results, which is why I wait until the last line to return the results AsEnumerable(). Is that the correct approach?
Yes, that is the correct approach. You want your RDBMS to do as much work as possible, because doing paging in memory defeats the purpose of paging in the first place.
If I have to enumerate the results before doing Skip and Take how will that affect performance?
It would kill the performance, because your system would need to move around a lot more data than it did before you added paging.
I am trying to create an application that will extract some data out of an automatically generated excel file. This can be very easily done with Access but the file is in Excel and the solution must be a one button sort of thing.
For some reason, simply looping through the data without doing any actions is slow. The code below is my attempt at optimizing it from something that was far slower. I have arrived at using Linq to SQL after a few attempts at this with the Interop classes directly and through different wrappers.
I also have read the answers to a few questions on here and Google. In an attempt to see what is causing the slowness, I have removed all instructions but kept "i++" from the relevant section. It is still very slow. I also tried to optimize it by limiting the number of records retrieved in the where clause in the third line but that didn't work. Your help would be appreciated.
Thank you.
Dictionary<string,double> instructors = new Dictionary<string,double>();
var t = from c in excel.Worksheet("Course_201410_M1")
// where c["COURSE CODE"].ToString().Substring(0,4) == "COSC" || c["COURSE CODE"].ToString().Substring(0,3) == "COEN" || c["COURSE CODE"].ToString().Substring(0,3) == "GEIT" || c["COURSE CODE"].ToString().Substring(0,3) == "ITAP" || c["COURSE CODE"] == "PRPL 0012" || c["COURSE CODE"] == "ASSE 4311" || c["COURSE CODE"] == "GEEN 2312" || c["COURSE CODE"] == "ITLB 1311"
select c;
HashSet<string> uniqueForce = new HashSet<string>();
foreach (var c in t)
{
if(uniqueForce.Add(c["Instructor"]))
instructors.Add(c["Instructor"],0.0);
}
foreach (string name in instructors.Keys)
{
var y = from d in t
where d["Instructor"] == name
select d;
int i = 1;
foreach(var z in y)
{
// This is the really slow part. It takes a couple of minutes to finish. The
// file has fewer than 1000 records.
i++;
}
}
Put the query that forms var t into brackets and then call ToList() on it.
var t = (from c in excel.Worksheet("Course_201410_M1")
select c).ToList();
Because of LINQ's lazy (deferred) execution model, the query re-reads the data source every time you iterate over it, unless you materialize it into a List first.
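The effect can be observed with a small counter. In this sketch, Rows() is a hypothetical stand-in for an expensive source such as the worksheet query, and CountReads mimics the nested loop from the question, reporting how many times the source was read from the start.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static int Enumerations;

    // Hypothetical stand-in for an expensive data source.
    public static IEnumerable<string> Rows()
    {
        Enumerations++; // runs every time an enumeration of the source starts
        yield return "Smith";
        yield return "Jones";
        yield return "Smith";
    }

    // Returns how many times the source was enumerated from the beginning.
    public static int CountReads(bool materialize)
    {
        Enumerations = 0;
        IEnumerable<string> t = Rows().Where(r => r.Length > 0);
        if (materialize)
            t = t.ToList(); // one read; later loops use the in-memory list

        int total = 0;
        foreach (var name in t.Distinct().ToList())
        {
            // Without ToList() above, each Count re-runs the whole source.
            total += t.Count(r => r == name);
        }
        return Enumerations;
    }
}
```

With two distinct names, the deferred version reads the source three times (once for Distinct, once per name), while the materialized version reads it once. The gap grows with the number of distinct instructors.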
Can anyone help me figure this out?
The code below works fine and gets inside the if statement:
foreach (var m in msg)
{
if (string.IsNullOrEmpty(m.PhoneNumber))
{
m.PhoneNumber = (from c in db.Customers
where c.CustomerID == m.CustomerID
select c.PhoneNumber).Single();
}
}
However, in the code below PhoneNumber is never set:
foreach (var m in msg.Where(z => (z.PhoneNumber == null || z.PhoneNumber == "")))
{
m.PhoneNumber = (from c in db.Customers
where c.CustomerID == m.CustomerID
select c.PhoneNumber).Single();
}
I'm presuming it's because the top code actually evaluates the expression whereas the bottom code doesn't. If that is the case, how can you check for null on an unevaluated LINQ query?
EDIT: Just to avoid confusion, here is how msg is populated in both cases:
var msg = from m in db.Messages
where (m.StatusID == (int)MessageStatus.Submitted && m.MessageBoxTypeID == (int)MessageBoxType.Outbox)
select m;
I’m somewhat baffled by this one, but I have a wild guess. If the msg sequence is an IQueryable<T> which translates to an SQL query, then the behavior of the two snippets may vary. Suppose you have:
var msg =
from m in dataContext.MyTable
select m;
Your first snippet would cause the entire msg sequence to be enumerated, thereby issuing an unfiltered SELECT…FROM command to the database and fetching all the rows within your table.
foreach (var m in msg)
On the other hand, your second snippet applies a filter to your sequence before it is enumerated. Thus, the command issued to the database is a SELECT…FROM…WHERE.
foreach (var m in msg.Where(z => (z.PhoneNumber == null || z.PhoneNumber == "")))
There are various cases where the behavior of a filter applied in .NET would differ from its translation to Transact-SQL. For one, case-sensitivity. In your case, I assume that the mismatch is caused by entries whose PhoneNumber consists of whitespace, as these may match the empty string in SQL Server.
To test this possibility, check what happens if you change your second snippet to:
foreach (var m in msg.ToList().Where(z => (z.PhoneNumber == null || z.PhoneNumber == "")))
Edit: Your issue might be that your query is being executed again during subsequent access (when you check whether PhoneNumber was set).
If you execute:
foreach (var m in msg.Where(z => (z.PhoneNumber == null || z.PhoneNumber == "")))
{
m.PhoneNumber = …
}
bool stillHasNulls = msg.Any(z => z.PhoneNumber == null || z.PhoneNumber == "");
You will find that stillHasNulls might still evaluate to true, since your assignment to m.PhoneNumber is being lost when you re-evaluate the msg sequence (in the above case, when you execute msg.Any, which issues an EXISTS command to the database).
For your m.PhoneNumber assignments to be preserved, you need to either persist them to the database (if that’s what you want), or else make sure that you’re accessing the same sequence elements each time. One way to do this would be to pre-populate the sequence as a collection, using ToList.
msg = msg.Where(z => (z.PhoneNumber == null || z.PhoneNumber == "")).ToList();
foreach (var m in msg)
{
m.PhoneNumber = …
}
In the above code, the filter still gets issued to the database as a SELECT…FROM…WHERE, but the result is evaluated eagerly, and then stored as a list within msg. Any subsequent queries on msg would be evaluated against the pre-populated in-memory collection (which would contain any new values you assign to its elements).
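The difference between re-evaluating a query and working on a materialized list can be imitated in memory. This is only an analogy under stated assumptions: Message and RequeryDemo are hypothetical, and Query() builds fresh objects on every enumeration, whereas a real data context may hand back tracked entity instances while still evaluating its filters against the stored rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the message entity.
public class Message
{
    public int Id { get; set; }
    public string PhoneNumber { get; set; }
}

public static class RequeryDemo
{
    // Deferred query that produces new objects each time it is enumerated.
    public static IEnumerable<Message> Query()
    {
        return new[] { 1, 2 }.Select(id => new Message { Id = id, PhoneNumber = null });
    }

    public static bool StillHasNulls(bool materialize)
    {
        IEnumerable<Message> msg = Query();
        if (materialize)
            msg = msg.ToList(); // fix the set of objects once

        foreach (var m in msg)
            m.PhoneNumber = "555-0100"; // placeholder value for the sketch

        // Without ToList(), this enumerates the query again and sees fresh,
        // unmodified objects; with ToList(), it sees the updated ones.
        return msg.Any(m => string.IsNullOrEmpty(m.PhoneNumber));
    }
}
```

Here StillHasNulls(false) reports that empty phone numbers remain, while StillHasNulls(true) does not, which mirrors the behaviour described above.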