I have the following LINQ query:
var q = from bal in data.balanceDetails
where bal.userName == userName && bal.AccountID == accountNumber
select new
{
date = bal.month + "/" + bal.year,
commission = bal.commission,
rebate = bal.rebateBeforeService,
income = bal.commission - bal.rebateBeforeService
};
I remember seeing a lambda shorthand for summing the commission field for each row of q.
What would be the best way of summing this, aside from manually looping through the results?
It's easy - no need to loop within your code:
var totalCommission = q.Sum(result => result.commission);
Note that if you're going to use the results of q for various different calculations (which seems a reasonable assumption; if you only wanted the total commission I doubt you'd be selecting the other bits), you may want to materialize the query once so that it doesn't have to do all the filtering and projecting multiple times. One way of doing this would be to use:
var results = q.ToList();
That will create a List<T> of your anonymous type, and you can still use the Sum code above on results here.
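For example, once the query is materialized you can compute several aggregates from the same list without re-running the filtering and projection. A minimal sketch using the fields of the anonymous type from q above:
var results = q.ToList();                             // hits the data source once
var totalCommission = results.Sum(r => r.commission);
var totalIncome = results.Sum(r => r.income);         // commission minus rebate, per the projection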
Related
I'm currently trying to write some code that will run a query on two separate databases and return the results to an anonymous object. Once I have the two collections of anonymous objects, I need to perform a comparison on them: I need to retrieve all of the records that are in webOrders but not in foamOrders. Currently I'm making the comparison with LINQ. My major problem is that both of the original queries return about 30,000 records, and as my code stands it takes way too long to complete. I'm new to LINQ, so I'm trying to understand whether using LINQ to compare the two collections of anonymous objects will actually cause the database queries to run over and over again, due to deferred execution. This may have an obvious answer, but I don't yet have a very firm understanding of how LINQ and anonymous objects work with deferred execution. I'm hoping someone may be able to enlighten me. Below is the code that I have...
private DataTable GetData()
{
using (var foam = Databases.Foam(false))
{
using (MySqlConnection web = new MySqlConnection(Databases.ConnectionStrings.Web(true)))
{
var foamOrders = foam.DataTableEnumerable(@"
SELECT order_id
FROM Orders
WHERE order_id NOT LIKE 'R35%'
AND originpartner_code = 'VN000011'
AND orderDate > Getdate() - 7 ")
.Select(o => new
{
order = o[0].ToString().Trim()
}).ToList();
var webOrders = web.DataTableEnumerable(@"
SELECT ORDER_NUMBER FROM TRANSACTIONS AS T WHERE
(Str_to_date(T.ORDER_DATE, '%Y%m%d %k:%i:%s') >= DATE_SUB(Now(), INTERVAL 7 DAY))
AND (STR_TO_DATE(T.ORDER_DATE, '%Y%m%d %k:%i:%s') <= DATE_SUB(NOW(), INTERVAL 1 HOUR))")
.Select(o => new
{
order = o[0].ToString().Trim()
}).ToList();
return (from w in webOrders
where !(from f in foamOrders
select f.order).Contains(w.order)
select w
).ToDataTable();
}
}
}
Your LINQ ceases to be deferred when you call
ToDataTable();
At that point it is snapshotted, done and dusted forever.
The same is true of foamOrders and webOrders when you convert them with
ToList();
You could do it as one query, but I don't have MySQL to check it out on.
Regarding deferred execution:
The .ToList() method iterates over the IEnumerable, retrieves all of the values, and fills a new List<T> object with those values, so there is definitely no deferred execution at that point.
It's most likely the same with .ToDataTable().
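A tiny LINQ-to-Objects illustration of that point (the values are purely illustrative):
var numbers = new List<int> { 1, 2, 3 };
var query = numbers.Where(n => n > 1);   // nothing runs yet - this is just a description of work
numbers.Add(4);
var snapshot = query.ToList();           // the query executes here; snapshot contains 2, 3, 4
numbers.Add(5);                          // snapshot is unaffected from this point on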
P.S.
I'd recommend that you:
Use custom types rather than anonymous types.
Avoid using LINQ to compare the collections, because it isn't particularly efficient (LINQ does extra work here).
Create a custom MyComparer class (which might implement the IComparer interface) with a method such as Compare<T1, T2> that compares two entities. Then add another method that compares two sets of entities, for example T1[] CompareRange<T1, T2>(T1[] entities1, T2[] entities2), which reuses your Compare method in a loop and returns the result of the operation (see the sketch below).
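One way to realize that recommendation (the names Order, OrderIdComparer and FindMissing are illustrative, not from the original answer): a small custom type plus an equality comparer, fed into a HashSet so each lookup is constant-time rather than a nested scan:
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public string OrderId { get; set; }
}

// Matches two orders purely by their id
public class OrderIdComparer : IEqualityComparer<Order>
{
    public bool Equals(Order x, Order y) =>
        string.Equals(x?.OrderId, y?.OrderId, StringComparison.Ordinal);

    public int GetHashCode(Order obj) => obj?.OrderId?.GetHashCode() ?? 0;
}

public static class OrderDiff
{
    // Returns every order in 'left' that has no matching id in 'right'
    public static List<Order> FindMissing(IEnumerable<Order> left, IEnumerable<Order> right)
    {
        var known = new HashSet<Order>(right, new OrderIdComparer()); // built once
        return left.Where(o => !known.Contains(o)).ToList();          // single pass over 'left'
    }
}
With roughly 30,000 rows on each side, this keeps the comparison close to linear instead of the 30,000 x 30,000 scans a nested Contains produces.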
P.S.
Some other resource-intensive operations that may potentially lead to significant performance losses (in cases where you need to perform thousands of operations):
Use of an enumerator object (a foreach loop or some of the LINQ methods).
Possible solution: use a for loop where possible.
Extensive use of anonymous methods (the compiler needs noticeable time to compile a lambda expression or operator).
Possible solution: store lambdas in delegates (such as Func<T1, T2>).
In case it helps anyone in the future, my new code is pasted below. It runs much faster now. Thanks to everyone's help, I've learned that even though the deferred execution of my database queries was cut off and the results became static once I used .ToList(), using LINQ to compare the resulting collections was very inefficient. I went with a loop instead.
private DataTable GetData()
{
//Needed to have both connections open in order to preserve the scope of var foamOrders and var webOrders, which are both needed in order to perform the comparison.
using (var foam = Databases.Foam(isDebug))
{
using (MySqlConnection web = new MySqlConnection(Databases.ConnectionStrings.Web(isDebug)))
{
var foamOrders = foam.DataTableEnumerable(@"
SELECT foreignID
FROM Orders
WHERE order_id NOT LIKE 'R35%'
AND originpartner_code = 'VN000011'
AND orderDate > Getdate() - 7 ")
.Select(o => new
{
order = o[0].ToString()
.Trim()
}).ToList();
var webOrders = web.DataTableEnumerable(@"
SELECT ORDER_NUMBER FROM transactions AS T WHERE
(Str_to_date(T.ORDER_DATE, '%Y%m%d %k:%i:%s') >= DATE_SUB(Now(), INTERVAL 7 DAY))
AND (STR_TO_DATE(T.ORDER_DATE, '%Y%m%d %k:%i:%s') <= DATE_SUB(NOW(), INTERVAL 1 HOUR))
", 300)
.Select(o => new
{
order = o[0].ToString()
.Trim()
}).ToList();
List<OrderNumber> on = new List<OrderNumber>();
foreach (var w in webOrders)
{
if (!foamOrders.Contains(w))
{
OrderNumber o = new OrderNumber();
o.orderNumber = w.order;
on.Add(o);
}
}
return on.ToDataTable();
}
}
}
public class OrderNumber
{
public string orderNumber { get; set; }
}
}
I need to identify items from one list that are not present in another list. The two lists are of different entities (ToDo and WorkshopItem). I consider a workshop item to be in the todo list if the Name is matched in any of the todo list items.
The following does what I'm after, but I find it awkward and hard to understand each time I revisit it. I use NHibernate QueryOver syntax to get the two lists, and then a LINQ statement to filter down to just the WorkshopItems that meet the requirement (DateDue is in the next two weeks and the Name is not present in the list of ToDo items).
var allTodos = Session.QueryOver<ToDo>().List();
var twoWeeksTime = DateTime.Now.AddDays(14);
var workshopItemsDueSoon = Session.QueryOver<WorkshopItem>()
.Where(w => w.DateDue <= twoWeeksTime).List();
var matches = from wsi in workshopItemsDueSoon
where !(from todo in allTodos
select todo.TaskName)
.Contains(wsi.Name)
select wsi;
Ideally I'd like to have just one NHibernate query that returns a list of WorkshopItems that match my requirement.
I think I've managed to put together a LINQ version of the answer put forward by @CSL, and will mark that as the accepted answer as it pointed me in the direction of the following.
var twoWeeksTime = DateTime.Now.AddDays(14);
var subquery = NHibernate.Criterion.QueryOver.Of<ToDo>().Select(t => t.TaskName);
var matchingItems = Session.QueryOver<WorkshopItem>()
.Where(w => w.DateDue <= twoWeeksTime &&
w.IsWorkshopItemInProgress == true)
.WithSubquery.WhereProperty(x => x.Name).NotIn(subquery)
.Future<WorkshopItem>();
It returns the results I'm expecting and doesn't rely on magic strings. I'm hesitant because I don't fully understand the WithSubquery (and whether inlining it would be a good thing). It seems to equate to
WHERE WorkshopItem.Name NOT IN (subquery)
Also, I don't understand using Future instead of List. If anyone could shed some light on those, that would help.
I am not 100% sure how to achieve what you need using LINQ, so to give you an option I am putting up an alternative solution using NHibernate Criteria (this will execute in one database hit):
// Create a query
ICriteria query = Session.CreateCriteria<WorkshopItem>("wsi");
// Restrict to items due within the next 14 days
query.Add(Restrictions.Le("DateDue", DateTime.Now.AddDays(14)));
// Return all TaskNames from Todo's
DetachedCriteria allTodos = DetachedCriteria.For(typeof(ToDo)).SetProjection(Projections.Property("TaskName"));
// Filter Work Shop Items for any that do not have a To-do item
query.Add(Subqueries.PropertyNotIn("Name", allTodos));
// Return results
var matchingItems = query.Future<WorkshopItem>().ToList();
I'd recommend
var workshopItemsDueSoon = Session.QueryOver<WorkshopItem>()
    .Where(w => w.DateDue <= twoWeeksTime);
var allTodos = Session.QueryOver<ToDo>();
Instead of
var allTodos = Session.QueryOver<ToDo>().List();
var workshopItemsDueSoon = Session.QueryOver<WorkshopItem>()
.Where(w => w.DateDue <= twoWeeksTime).List();
So that the collection isn't iterated until you need it to be.
I've found that it's helpful to use LINQ extension methods to make subqueries more readable and less awkward.
For example:
var matches = from wsi in workshopItemsDueSoon
where !allTodos.Select(it=>it.TaskName).Contains(wsi.Name)
select wsi;
Personally, since the query is fairly simple, I'd prefer to do it like so:
var matches = workshopItemsDueSoon.Where(wsi => !allTodos.Select(it => it.TaskName).Contains(wsi.Name));
The latter seems less verbose to me.
I return a List from a LINQ query, and afterwards I have to fill in the values with a for loop.
The problem is that it is too slow.
var formentries = (from f in db.bNetFormEntries
join s in db.bNetFormStatus on f.StatusID.Value equals s.StatusID into entryStatus
join s2 in db.bNetFormStatus on f.ExternalStatusID.Value equals s2.StatusID into entryStatus2
where f.FormID == formID
orderby f.FormEntryID descending
select new FormEntry
{
FormEntryID = f.FormEntryID,
FormID = f.FormID,
IPAddress = f.IpAddress,
UserAgent = f.UserAgent,
CreatedBy = f.CreatedBy,
CreatedDate = f.CreatedDate,
UpdatedBy = f.UpdatedBy,
UpdatedDate = f.UpdatedDate,
StatusID = f.StatusID,
StatusText = entryStatus.FirstOrDefault().Status,
ExternalStatusID = f.ExternalStatusID,
ExternalStatusText = entryStatus2.FirstOrDefault().Status
}).ToList();
and then I use the for loop like this:
for(var x=0; x<formentries.Count(); x++)
{
var values = (from e in entryvalues
where e.FormEntryID.Equals(formentries.ElementAt(x).FormEntryID)
select e).ToList<FormEntryValue>();
formentries.ElementAt(x).Values = values;
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);
But it is definitely too slow.
Is there a way to make it faster?
it is definitely too slow. Is there a way to make it faster?
Maybe. Maybe not. But that's not the right question to ask. The right question is:
Why is it so slow?
It is a lot easier to figure out the answer to the first question if you have an answer to the second question! If the answer to the second question is "because the database is in Tokyo and I'm in Rome, and the fact that the packets move no faster than speed of light is the cause of my unacceptable slowdown", then the way you make it faster is you move to Japan; no amount of fixing the query is going to change the speed of light.
To figure out why it is so slow, get a profiler. Run the code through the profiler and use that to identify where you are spending most of your time. Then see if you can speed up that part.
From what I can see, you are iterating through formentries two more times without reason: once when you populate the values, and once when you convert to a dictionary.
If entryvalues is database-driven - i.e. you get them from the database - then put the value population in the first query.
If it's not, then you don't need to invoke ToList() on the first query; just do the loop and then the dictionary creation.
var formentries = from f in db.bNetFormEntries
join s in db.bNetFormStatus on f.StatusID.Value equals s.StatusID into entryStatus
join s2 in db.bNetFormStatus on f.ExternalStatusID.Value equals s2.StatusID into entryStatus2
where f.FormID == formID
orderby f.FormEntryID descending
select new FormEntry
{
FormEntryID = f.FormEntryID,
FormID = f.FormID,
IPAddress = f.IpAddress,
UserAgent = f.UserAgent,
CreatedBy = f.CreatedBy,
CreatedDate = f.CreatedDate,
UpdatedBy = f.UpdatedBy,
UpdatedDate = f.UpdatedDate,
StatusID = f.StatusID,
StatusText = entryStatus.FirstOrDefault().Status,
ExternalStatusID = f.ExternalStatusID,
ExternalStatusText = entryStatus2.FirstOrDefault().Status
};
var formEntryDictionary = new Dictionary<int, FormEntry>();
foreach (var formEntry in formentries)
{
    formEntry.Values = GetValuesForFormEntry(formEntry, entryvalues);
    formEntryDictionary.Add(formEntry.FormEntryID, formEntry);
}
return formEntryDictionary;
And the values preparation:
private IList<FormEntryValue> GetValuesForFormEntry(FormEntry formEntry, IEnumerable<FormEntryValue> entryValues)
{
return (from e in entryValues
where e.FormEntryID.Equals(formEntry.FormEntryID)
select e).ToList<FormEntryValue>();
}
You can change the private method to accept only the entry ID instead of the whole formEntry if you wish.
It's slow because you're O(N*M), where N is formentries.Count and M is entryvalues.Count. Even in a simple test it was more than 20 times slower with only 1000 elements, and my type only had an int id field; with 10000 elements in the list it was over 1600 times slower than the code below!
Assuming your entryvalues is a local list and not hitting a database (just .ToList() it to a new variable somewhere if that's the case), and assuming your FormEntryID is unique (which it seems to be from the .ToDictionary call), then try this instead:
var entryvaluesDictionary = entryvalues.GroupBy(e => e.FormEntryID)
                                       .ToDictionary(g => g.Key, g => g.ToList());
for (var x = 0; x < formentries.Count; x++)
{
    List<FormEntryValue> values;
    if (entryvaluesDictionary.TryGetValue(formentries[x].FormEntryID, out values))
        formentries[x].Values = values;
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);
It should go a long way to making it at least scale better.
Changes: .Count instead of .Count(), simply because it's better not to call an extension method when you don't need to; and using a dictionary to look up the values rather than doing a Where for every x value in the for loop, which effectively removes the M from the big O.
If this isn't entirely correct I'm sure you can change whatever is missing to suit your use case. As an aside, you should really consider using camelCase for your variable names: formEntries rather than formentries is just that little bit easier to read.
There are some reasons why this might be slow, related to the way you use formentries.
The formentries List<T> from above has a Count property, but you are calling the enumerable Count() extension method instead. That extension may or may not be optimized to detect that you're operating on a collection type with a Count property it can defer to instead of walking the enumeration to compute the count.
Similarly, the formentries.ElementAt(x) expression is used twice; if ElementAt has not been optimized to detect that it is working with a collection like a list that can jump to an item by its index, LINQ will have to redundantly walk the list to get to the xth item.
The above evaluation may miss the real problem, which you'll only really know if you profile. However, you can avoid both issues, while making your code significantly easier to read, if you switch to iterating the collection of formentries as follows:
foreach(var fe in formentries)
{
fe.Values = entryvalues
.Where(e => e.FormEntryID.Equals(fe.FormEntryID))
.ToList<FormEntryValue>();
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);
You may have resorted to the for(var x=...) ...ElementAt(x) approach because you thought you could not modify properties on the object referenced by the foreach loop variable fe.
That said, another point that could be an issue is if formentries has multiple items with the same FormEntryID, which would result in the same work being done multiple times inside the loop. While the top query appears to be against a database, you can still do joins with data in LINQ-to-Objects land (see the sketch below). Happy optimizing/profiling/coding - let us know what works for you.
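As a sketch of that last point (assuming both formentries and entryvalues are already in memory), a group join pairs each entry with its values in a single pass and builds the lookup over entryvalues only once:
var entriesWithValues =
    from entry in formentries
    join value in entryvalues on entry.FormEntryID equals value.FormEntryID into valueGroup
    select new { entry, valueGroup };

foreach (var pair in entriesWithValues)
{
    pair.entry.Values = pair.valueGroup.ToList();
}
return formentries.ToDictionary(entry => entry.FormEntryID, entry => entry);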
I need to add a literal value to a query. My attempt:
var aa = new List<long>();
aa.Add(0);
var a = Products.Select(p => p.sku).Distinct().Union(aa);
a.ToList().Dump(); // LinqPad's way of showing the values
In the above example, I get an error:
"Local sequence cannot be used in LINQ to SQL implementation
of query operators except the Contains() operator."
If I am using Entity Framework 4 for example, what could I add to the Union statement to always include the "seed" ID?
I am trying to produce SQL code like the following:
select distinct ID
from product
union
select 0 as ID
So that later I can join the list to itself to find all values where the next highest value is not present (finding the lowest available ID in the set).
Edit: Original LINQ query to find the lowest available ID
var skuQuery = Context.Products
.Where(p => p.sku > skuSeedStart &&
p.sku < skuSeedEnd)
.Select(p => p.sku).Distinct();
var lowestSkuAvailableList =
(from p1 in skuQuery
from p2 in skuQuery.Where(a => a == p1 + 1).DefaultIfEmpty()
where p2 == 0 // zero is default for long where it would be null
select p1).ToList();
var Answer = (lowestSkuAvailableList.Count == 0
? skuSeedStart :
lowestSkuAvailableList.Min()) + 1;
This code creates two SKU sets offset by one, then selects the SKU where the next highest doesn't exist. Afterward, it selects the minimum of that (lowest SKU where next highest is available).
For this to work, the seed must be in the set joined together.
Your problem is that your query is being turned entirely into a LINQ-to-SQL query, when what you need is a LINQ-to-SQL query with local manipulation on top of it.
The solution is to tell the compiler that you want to use LINQ-to-Objects after processing the query (in other words, change the extension method resolution to look at IEnumerable<T>, not IQueryable<T>). The easiest way to do this is to tack AsEnumerable() onto the end of your query, like so:
var aa = new List<long>();
aa.Add(0);
var a = Products.Select(p => p.sku).Distinct().AsEnumerable().Union(aa);
a.ToList().Dump(); // LinqPad's way of showing the values
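Applied to the original SKU query, that would look something like this (a sketch; the Distinct still runs in SQL, and only the Union of the seed value happens locally, so the later gap-finding join would also run in memory):
var skuQuery = Context.Products
    .Where(p => p.sku > skuSeedStart && p.sku < skuSeedEnd)
    .Select(p => p.sku)
    .Distinct()
    .AsEnumerable()              // LINQ to Objects from here on
    .Union(new long[] { 0 });    // append the seed value locally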
Up front: not answering exactly the question you asked, but solving your problem in a different way.
How about this:
var a = Products.Select(p => p.sku).Distinct().ToList();
a.Add(0);
a.Dump(); // LinqPad's way of showing the values
You could also create a database table for storing constant values and pass a query over that table to the Union operator.
For example, imagine a table "Defaults" with the fields "Name" and "Value" and only one record ("SKU", 0).
Then you can rewrite your expression like this:
var zero = context.Defaults.Where(_=>_.Name == "SKU").Select(_=>_.Value);
var result = context.Products.Select(p => p.sku).Distinct().Union(zero).ToList();
I have implemented a paging routine using skip and take. It works great, but I need the total number of records in the table prior to calling Take and Skip.
I know I can submit 2 separate queries.
Get Count
Skip and Take
But I would prefer not to issue 2 calls to LINQ.
How can I return it in the same query (e.g. using a nested select statement)?
Previously, I used a paging technique in a stored procedure. I returned the items by using a temporary table, and I passed the count to an output parameter.
I'm sorry, but you can't. At least, not in a pretty way.
You can do it in an unpretty way, but I don't think you'll like it:
var query = from e in db.Entities where etc etc etc;
var pagedQuery =
from e in query.Skip(pageSize * pageNumber).Take(pageSize)
select new
{
Count = query.Count(),
Entity = e
};
You see? Not pretty at all.
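If you do go that route, one round trip still gives you both pieces; pulling them back apart might look like this (a sketch reusing the anonymous shape above):
var page = pagedQuery.ToList();                       // single query execution
var totalCount = page.Count > 0 ? page[0].Count : 0;  // the count is repeated on every row
var entities = page.Select(x => x.Entity).ToList();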
There is no reason to do two separate queries, or even a stored procedure. Use a let binding to denote a sub-query; when you are done you can have an anonymous type that contains both your selected item and your total count. A single query to the database, one LINQ expression, and you're done. To get the values it would be jobQuery.Select(x => x.item) or jobQuery.FirstOrDefault().Count.
Let expressions are an amazing thing.
var jobQuery = (
    from job in jc.Jobs
    let jobCount = (
        from j in jc.Jobs
        where j.CustomerNumber.Equals(CustomerNumber)
        select j
    ).Count()
    where job.CustomerNumber.Equals(CustomerNumber)
    orderby job.FieldName
    select new
    {
        item = job,
        Count = jobCount
    }
).Skip(0).Take(100);
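To pull the values back out along the lines described above, something like this (a sketch):
var page = jobQuery.ToList();                    // one database round trip
var items = page.Select(x => x.item).ToList();
var total = page.Count > 0 ? page[0].Count : 0;  // total rows for the customer, not the page size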