I'm not sure I fully understand the n+1 problem. Does this case also relate to the n+1 problem?
Using EF Core: for example, there are 10,000 members and 1,000,000 transactions.
public class ReportService
{
...
public IEnumerable<ReportItem> GetResult()
{
var reportItems = new List<ReportItem>();
var members = _context.Users.Where(x => x.IsMember);
foreach(var member in members)
{
var calculationResult = _calculationService.Calculate(member.Id);
reportItems.Add(calculationResult);
}
return reportItems;
}
}
public class CalculationService
{
...
public CalculationResult Calculate(int memberId)
{
var memberTransactions = _context.Transactions.Where(x => x.UserId == memberId);
var result = new CalculationResult(memberTransactions.Sum(x => x.Amount));
return result;
}
}
Should I move responsibility to get data from the CalculationService (to avoid many queries)? What is the best way to avoid situations like this one?
Finding the users and then looping over them to query each user's transaction data individually is a lot of extra work and may kill your performance.
This kind of code will make your API quite slow and it may throw timeout exceptions. A better way is to join both tables and get the result from the joined query. Something like this will be better:
context.Users.Join(
    context.Transactions,
    x => x.Id,            // user key
    xm => xm.UserId,      // transaction foreign key
    (x, xm) => new { User = x, Transaction = xm }
).Select(p => p.Transaction.Amount).Sum()
This will make things easier for your app, and you don't need to make a separate query for each member.
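Note that the join above produces one grand total across all members, while the original GetResult needs a result per member. If that is the case, a grouped query still avoids the query-per-member pattern. A rough sketch, assuming Transactions has a UserId foreign key as in the question, and that ReportItem can be built from a member id and a total (that constructor is illustrative):

var totalsByMember = _context.Transactions
    .GroupBy(t => t.UserId)
    .Select(g => new { UserId = g.Key, Total = g.Sum(t => t.Amount) })
    .ToDictionary(x => x.UserId, x => x.Total);   // one aggregated query

var reportItems = _context.Users
    .Where(u => u.IsMember)
    .AsEnumerable()                               // second query; the rest runs in memory
    .Select(u => new ReportItem(u.Id,             // illustrative constructor
        totalsByMember.TryGetValue(u.Id, out var total) ? total : 0))
    .ToList();

This keeps it at two round trips regardless of how many members there are, and members without transactions simply get a total of 0.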
I am writing a C# .NET 5 application that will act as a backend for an Angular frontend, providing CRUD APIs. The purpose of the app is managing a flight school.
I am writing the API methods that will return the list of pilots and the single pilot, that is https://myapp/api/pilots and https://myapp/api/pilots/{id}.
I have three methods here:
GetPilots() for the complete list
GetPilot(long idPilot) for the detail
an auxiliary method GetFlightTime(long idPilot) to return the total flight time of each pilot by summing the flight durations.
My problem: the detail method works because I first call the auxiliary function and then I use the result in the returned viewmodel. But GetPilots() doesn't work and returns the error System.InvalidOperationException: The client projection contains a reference to a constant expression of 'QTBWeb.Models.PilotsRepository' through the instance method 'GetFlightTime'. This could potentially cause a memory leak; consider making the method static so that it does not capture constant in the instance.
Is this because I am calling the GetFlightTime method inside the LINQ expression? I don't understand the "make the method static" suggestion. How can I reformat the code to make it work?
Thanks!
public IEnumerable<PilotViewModel> GetPilots() // THIS METHOD RETURNS ERROR
{
return _context.Pilots
.Select(pilot => new PilotViewModel
{
Id = pilot.Id,
Name = pilot.Name,
FlightMinutes = GetFlightTime(pilot.Id) // THE PROBLEM IS HERE
})
.ToList();
}
public PilotViewModel GetPilot(long idPilot) // THIS METHOD WORKS
{
var _flightMinutes = GetFlightTime(idPilot);
return _context.Pilots
.Select(pilot => new PilotViewModel
{
Id = pilot.Id,
Name = pilot.Name,
FlightMinutes = _flightMinutes
})
.Where(pilot => pilot.Id == idPilot)
.FirstOrDefault();
}
public int GetFlightTime(long idPilot)
{
return _context.Flights
.Where(flight => flight.pilot == idPilot)
.Select(flight => flight.Duration).Sum();
}
A good way to solve this would be to make sure that your Pilot class has a collection of Flights, serving as the other side of the one-to-many map you have as Flight.Pilot.
You can then use this collection to calculate the sum, without having to query the database for every looped instance of Pilot.
Your code would look something like this:
public IEnumerable<PilotViewModel> GetPilots()
{
return _context.Pilots
.Include(pilot => pilot.Flights) // Include Flights to join data
.Select(pilot => new PilotViewModel
{
Id = pilot.Id,
Name = pilot.Name,
FlightMinutes = pilot.Flights.Sum(flight => flight.Duration)
});
}
public PilotViewModel GetPilot(long idPilot)
{
return _context.Pilots
.Include(pilot => pilot.Flights) // Include Flights to join data
.Where(pilot => pilot.Id == idPilot) // Notice how we filter first
.Select(pilot => new PilotViewModel
{
Id = pilot.Id,
Name = pilot.Name,
FlightMinutes = pilot.Flights.Sum(flight => flight.Duration)
})
.FirstOrDefault();
}
I've got realm implemented in a PCL for Xamarin. This works fine, and as it should (data is being stored and retrieved).
Now that I'm building more and more features I'm running into the situation that I can't find a way to query empty collections.
I need to return an IRealmCollection<Customer> because of model binding, so I can't enumerate and THEN filter out items that have no blog entries.
Any idea how I can make this happen on an IQueryable?
I tried
var realm = Realm.GetInstance();
var customers = realm.All<Customer>();
// errors out - only Realm-managed props can be used
customers = customers.Where(x => x.BlogEntries.Count > 0);
// errors out - Any() is not supported
customers = customers.Where(x => x.BlogEntries.Any());
// errors out - Datatype mismatch in comparison
customers = customers.Where(x => x.BlogEntries != null);
// errors out - Datatype mismatch in comparison
customers = customers.Where(x => x.BlogEntries == default(IList<BlogEntries>));
Unfortunately that is not supported as of Realm Xamarin 1.2.0. What you could do is implement a poor man's version of collection notifications to work around the issue:
public class MyViewModel
{
private IRealmCollection<BlogEntry> _blogEntries;
private IEnumerable<Customer> _customers;
public IEnumerable<Customer> Customers
{
get { return _customers; }
set { Set(ref _customers, value); }
}
public MyViewModel()
{
var realm = Realm.GetInstance();
Customers = realm.All<Customer>()
.AsEnumerable()
.Where(c => !c.BlogEntries.Any())
.ToArray();
_blogEntries = realm.All<BlogEntry>().AsRealmCollection();
_blogEntries.CollectionChanged += (s, e) =>
{
var updatedCustomers = realm.All<Customer>()
.AsEnumerable()
.Where(c => !c.BlogEntries.Any())
.ToArray();
if (!IsEquivalent(updatedCustomers, Customers))
{
Customers = updatedCustomers;
}
};
}
private bool IsEquivalent(Customer[] a, Customer[] b)
{
if (a.Length != b.Length)
{
return false;
}
for (var i = 0; i < a.Length; i++)
{
if (!a[i].Equals(b[i]))
{
return false;
}
}
return true;
}
}
Since calling ToArray() only loses collection change notifications, we implement them naively by observing the blog entries collection in Realm and doing a simplistic check to see if anything has been updated. If you feel like it, you could extend this solution and wrap it in a custom implementation of INotifyCollectionChanged and bind to that. Then you could even apply some semantics to raise the correct collection change events (or opt for a simple Reset for each change).
To address any performance concerns, calling ToArray on a Realm collection will not materialize the objects' properties, so it's relatively cheap. The only somewhat expensive operation is iterating over all Customer objects and checking their BlogEntries lists. My advice would be to give it a try and see if it performs satisfactorily for your use case.
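A rough, untested sketch of that "wrap it yourself" idea, reusing the Realm types shown above (CustomersSource is a hypothetical class name, and it needs using System.Collections.ObjectModel, System.Linq and Realms): expose an ObservableCollection<Customer> and rebuild it whenever Realm reports a change, so the binding engine receives standard collection change notifications.

public class CustomersSource
{
    private readonly Realm _realm;
    private readonly IRealmCollection<BlogEntry> _blogEntries;

    // Bind the view to this; ObservableCollection raises INotifyCollectionChanged itself.
    public ObservableCollection<Customer> Customers { get; } = new ObservableCollection<Customer>();

    public CustomersSource()
    {
        _realm = Realm.GetInstance();
        _blogEntries = _realm.All<BlogEntry>().AsRealmCollection();
        _blogEntries.CollectionChanged += (s, e) => Rebuild();
        Rebuild();
    }

    private void Rebuild()
    {
        // Clear raises a Reset notification; each Add raises an Add notification.
        Customers.Clear();
        foreach (var customer in _realm.All<Customer>()
                                       .AsEnumerable()
                                       .Where(c => !c.BlogEntries.Any()))
        {
            Customers.Add(customer);
        }
    }
}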
I'm currently working on a web application in ASP.NET. In certain API calls it is necessary to compare ListA with a ListB of Lists to determine if ListA has the same elements as any List in ListB. In other words: if ListA is included in ListB.
Both collections are queried with LINQ from an EF Code First db. ListB has either one matching List or none, never more than one. In the worst case ListB has millions of elements, so the comparison needs to be scalable.
Instead of doing nested foreach loops, I'm looking for a pure LINQ query which will let the db do the work (before I consider a multi-column index).
To illustrate the structure:
//In reality Lists are queried of EF
var ListA = new List<Element>();
var ListB = new List<List<Element>>();
List<Element> solution;
bool flag = false;
foreach (List<Element> e1 in ListB) {
foreach(Element e2 in ListA) {
if (e1.Any(e => e.id == e2.id)) flag = true;
else {
flag = false;
break;
}
}
if(flag) {
solution = e1;
break;
}
}
Update Structure
Since it's an EF database, I'll provide the relevant object structure. I'm not sure if I'm allowed to post real code, so this example is still generic.
//List B
class Result {
...
public int Id;
public virtual ICollection<Curve> curves;
...
}
class Curve {
...
public int Id;
public virtual Result result;
public int resultId;
public virtual ICollection<Point> points;
...
}
public class Point{
...
public int Id;
...
}
The controller (for the api-call) wants to serve the right Curve-Object. To identify the right Object, a filter (ListA) is provided (which is in fact a Curve Object)
Now the filter (ListA) needs to be compared to the List of Curves in Result (ListB)
The only way to compare the Curves is by comparing the Points both have.
(So infact comparing Lists)
Curves have around 1 - 50 Points.
Result can have around 500.000.000 Curves
It's possible to compare by object identity here, because all objects (even the filter) are re-queried from the db.
I'm looking for a way to implement this mechanism, not how to get around this situation. (e.g. by using multi column index (altering the table))
(for illustration purposes):
class controller {
...
public Response serveRequest(Curve filter) {
foreach(Curve c in db.Result.curves) {
if(compare(filter.points , c.points)) return c;
}
}
}
Use Except:
public static bool ContainsAllItems<T>(IList<T> listA, IList<T> listB)
{
return !listB.Except(listA).Any();
}
The above method will tell you whether listA contains all the elements of listB or not, and it is much faster than the O(n*m) nested-loop approach.
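Applied to the original problem it could look like this (illustrative; note that Except uses the default equality comparer, so the elements either need value equality or must be the same instances, which the question says they are when re-queried from the same context):

// Find the inner list whose elements match ListA (assumes no duplicate elements).
var solution = ListB.FirstOrDefault(inner =>
    inner.Count == ListA.Count && ContainsAllItems(ListA, inner));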
Try this:
bool isIn = ListB.Any(x => x.Count == ListA.Count && ListA.All(y => x.Contains(y)));
or, if you want the element
var solution = ListB.FirstOrDefault(x => x.Count == ListA.Count && ListA.All(y => x.Contains(y)));
I have something for you:
var db = new MyContext();
var a = db.LoadList(); // or whatever
var b = new List<IQueryable<Entities>>(db.LoadListOfLists()/*or whatever*/);
b.Any(x => x.Count() == a.Count() && x.All(y => a.Any(z => z.Id == y.Id)));
Because performance is a concern, I would suggest converting your listA to a lookup/dictionary before comparing, e.g.:
var listALookup = listA.ToLookup(item => item.Id);
var result = listB.FirstOrDefault(childList => childList.Count == listA.Count && childList.All(childListItem => listALookup.Contains(childListItem.Id)));
Lookup.Contains is O(1) while List.Contains is O(n).
A better option is to perform this comparison at the db level, to reduce loading unnecessary data.
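A rough, untested sketch of what that db-level comparison could look like, assuming the Result/Curve/Point model shown in the question and a DbSet<Curve> named Curves (how well All/Contains translate depends on your EF version):

// Only curves with the same point count and whose point ids all appear in the
// filter's id set are candidates; assumes point ids are unique within a curve.
var filterPointIds = filter.points.Select(p => p.Id).ToList();

var match = db.Curves
    .Where(c => c.points.Count() == filterPointIds.Count &&
                c.points.All(p => filterPointIds.Contains(p.Id)))
    .FirstOrDefault();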
I have a list of transactions and I need to find out if there is more than one account.
I did:
var MultipleAccounts = list.GroupBy(t => t.AccountId).Count() > 1;
Is there a better way?
If you're willing to lose the single-liner, I prefer the use of !.All(item => bool) or .Any(item => bool), as I think it's the most semantic and easiest to read, as well as being a good candidate for the fastest.
var accountId = accounts[0].AccountId;
var hasMultipleAccounts = !accounts.All(account => account.AccountId == accountId);
Alternatively, and perhaps even more semantically, you could use .Any(item => bool) instead of .All(item => bool).
var accountId = accounts[0].AccountId;
var hasMultipleAccounts = accounts.Any(account => account.AccountId != accountId);
Things to watch out for are making sure you have at least one item (so that accounts[0] doesn't fail) and not doing a multiple enumeration of your IEnumerable. You say you're working with a List, so multiple enumeration shouldn't cause you any trouble, but when you just have an unknown IEnumerable it's important to be careful.
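For example, a small guard (purely illustrative) covers the empty-list case before indexing:

// Returns false for an empty list instead of throwing on accounts[0].
var hasMultipleAccounts = accounts.Count > 0 &&
    accounts.Any(account => account.AccountId != accounts[0].AccountId);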
I prefer:
var MultipleAccounts = list.Select(t => t.AccountId).Distinct().Skip(1).Any();
This should be exceedingly fast as it will stop iterating the source list as soon as it finds a second AccountId.
Anytime you execute a full .Count() it has to iterate the full source list.
You can test this with the following code:
void Main()
{
Console.WriteLine(Data().Select(t => t).Distinct().Skip(1).Any());
}
private Random __random = new Random();
public IEnumerable<int> Data()
{
while (true)
{
var @return = __random.Next(0, 10);
Console.WriteLine(@return);
yield return @return;
}
}
A typical run looks like this:
7
9
True
OK, here is what I found to be the quickest:
public bool HasMultipleAccounts(List<Account> list)
{
foreach (var account in list)
if (account.AccountId != list[0].AccountId)
return true;
return false;
}
usage: var MultipleAccounts = HasMultipleAccounts(list);
Credits: @hvd
I know it's more code, but if you think about what the CPU needs to do, it's the quickest.
Does LINQ have a way to "memorize" its previous query results while querying?
Consider the following case:
public class Foo {
public int Id { get; set; }
public ICollection<Bar> Bars { get; set; }
}
public class Bar {
public int Id { get; set; }
}
Now, if two or more Foos have the same collection of Bars (no matter what the order is), they are considered similar Foos.
Example:
foo1.Bars = new List<Bar>() { bar1, bar2 };
foo2.Bars = new List<Bar>() { bar2, bar1 };
foo3.Bars = new List<Bar>() { bar3, bar1, bar2 };
In the above case, foo1 is similar to foo2, but foo1 and foo2 are both not similar to foo3.
Given a query result consisting of an IEnumerable or IOrderedEnumerable of Foo, we are to find the first N Foos which are not similar.
This task seems to require a memory of the collection of bars which have been chosen before.
With partial LINQ we could do it like this:
private bool areBarsSimilar(ICollection<Bar> bars1, ICollection<Bar> bars2) {
return bars1.Count == bars2.Count && //have the same amount of bars
!bars1.Select(x => x.Id)
.Except(bars2.Select(y => y.Id))
.Any(); // and Except returns no elements, meaning the bars are the same
}
public void somewhereWithQueryResult(){
.
.
List<Foo> topNFoos = new List<Foo>(); //this serves as a memory for the previous query
int N = 50; //can be any number
foreach (var q in query) { //query is IOrderedEnumerable or IEnumerable
if (topNFoos.Count == 0 || !topNFoos.Any(foo => areBarsSimilar(foo.Bars, q.Bars)))
topNFoos.Add(q);
if (topNFoos.Count >= N) //We have had enough Foo
break;
}
}
The topNFoos List will serve as a memory of the previous query results, and we can skip any Foo q in the foreach loop whose Bars are already identical to those of any Foo in topNFoos.
My question is, is there any way to do that in LINQ (fully LINQ)?
var topNFoos = from q in query
//put something
select q;
If the "memory" required comes from a particular query item q or a variable outside of the query, then we could use a let variable to cache it:
int index = 0;
var topNFoos = from q in query
let qc = index++ + q.Id //depends on q or variable outside like index, then it is OK
select q;
But if it must come from the earlier iterations of the query itself, then things start to get more troublesome.
Is there any way to do that?
Edit:
(I am currently creating a test case (github link) for the answers. Still figuring out how I can test all the answers fairly.)
(Most of the answers below are aimed at solving my particular question and are good in themselves (Rob's, spender's, and David B's answers, which use IEqualityComparer, are particularly awesome). Nevertheless, if anyone can give an answer to my more general question "does LINQ have a way to "memorize" its previous query results while querying", I would also be glad.)
(Apart from the significant difference in performance for the particular case I presented above when using full/partial LINQ, one answer aimed at my general question about LINQ memory is Ivan Stoev's. Another one with a good combination is Rob's. To make myself clearer, I am looking for a general and efficient solution using LINQ, if there is any.)
I'm not going to answer your question directly, but rather propose a method that will be close to optimally efficient for filtering the first N non-similar items.
First, consider writing an IEqualityComparer<Foo> that uses the Bars collection to measure equality. Here, I'm assuming that the lists might contain duplicate entries, so have quite a strict definition of similarity:
public class FooSimilarityComparer:IEqualityComparer<Foo>
{
public bool Equals(Foo a, Foo b)
{
//called infrequently
return a.Bars.OrderBy(bar => bar.Id).SequenceEqual(b.Bars.OrderBy(bar => bar.Id));
}
public int GetHashCode(Foo foo)
{
//called frequently
unchecked
{
return foo.Bars.Sum(b => b.GetHashCode());
}
}
}
You can really efficiently get the top N non-similar items by using a HashSet with the IEqualityComparer above:
IEnumerable<Foo> someFoos; //= some list of Foo
var hs = new HashSet<Foo>(new FooSimilarityComparer());
foreach(var f in someFoos)
{
hs.Add(f); //hashsets don't add duplicates, as measured by the FooSimilarityComparer
if(hs.Count >= 50)
{
break;
}
}
@Rob's approach above is broadly similar, and shows how you can use the comparer directly in LINQ, but pay attention to the comments I made to his answer.
So, it's ... possible. But this is far from performant code.
var res = query.Select(q => new {
original = q,
matches = query.Where(innerQ => areBarsSimilar(q.Bars, innerQ.Bars))
}).Select(g => new { original = g, joinKey = string.Join(",", g.matches.Select(m => m.Id)) })
.GroupBy (g => g.joinKey)
.Select(g => g.First().original.original)
.Take(N);
This assumes that the Ids are unique for each Foo (you could also use their GetHashCode(), I suppose).
A much better solution is to either keep what you've done, or implement a custom comparer, as follows:
Note: As pointed out in the comments by @spender, the below Equals and GetHashCode will not work for collections with duplicates. Refer to their answer for a better implementation - however, the usage code would remain the same.
class MyComparer : IEqualityComparer<Foo>
{
public bool Equals(Foo left, Foo right)
{
return left.Bars.Count() == right.Bars.Count() && //have the same amount of bars
left.Bars.Select(x => x.Id)
.Except(right.Bars.Select(y => y.Id))
.ToList().Count == 0; // and Except returns 0 elements, meaning the bars are the same
}
public int GetHashCode(Foo foo)
{
unchecked {
int hc = 0;
if (foo.Bars != null)
foreach (var p in foo.Bars)
hc ^= p.GetHashCode();
return hc;
}
}
}
And then your query becomes simply:
var res = query
.GroupBy (q => q, new MyComparer())
.Select(g => g.First())
.Take(N);
IEnumerable<Foo> dissimilarFoos =
from foo in query
let key = string.Join("|",
from bar in foo.Bars
orderby bar.Id
select bar.Id.ToString())
group foo by key into g
select g.First();
IEnumerable<Foo> firstDissimilarFoos =
dissimilarFoos.Take(50);
Sometimes, you may not like the behavior of groupby in the above queries. At the time the query is enumerated, groupby will enumerate the entire source. If you only want partial enumeration, then you should switch to Distinct and a Comparer:
class FooComparer : IEqualityComparer<Foo>
{
private string keyGen(Foo foo)
{
return string.Join("|",
from bar in foo.Bars
orderby bar.Id
select bar.Id.ToString());
}
public bool Equals(Foo left, Foo right)
{
if (left == null || right == null) return false;
return keyGen(left) == keyGen(right);
}
public int GetHashCode(Foo foo)
{
return keyGen(foo).GetHashCode();
}
}
then write:
IEnumerable<Foo> dissimilarFoos = query.Distinct(new FooComparer());
IEnumerable<Foo> firstDissimilarFoos = dissimilarFoos.Take(50);
Idea. You might be able to hack something by devising your own fluent interface of mutators over a cache that you'd capture in "let x = ..." clauses, along the lines of,
from q in query
let qc = ... // your cache mechanism here
select ...
but I suspect you'll have to be careful to limit the updates to your cache to those "let ..." clauses only, as I doubt the implementations of the standard LINQ operators and extension methods will be happy if you allow such side effects to happen behind their back through predicates applied in the "where", "join", "group by", etc. clauses.
HTH
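A minimal sketch of that idea, assuming a small hypothetical helper (BarSetCache) whose TryAdd mutates the captured cache from inside the let clause. It relies on side effects and only behaves predictably because the query is enumerated lazily, once, in order:

// Hypothetical helper: remembers the bar sets seen so far during enumeration.
class BarSetCache
{
    private readonly List<ICollection<Bar>> _seen = new List<ICollection<Bar>>();

    // Returns true (and records the set) only if no similar set was seen before.
    public bool TryAdd(ICollection<Bar> bars)
    {
        if (_seen.Any(s => s.Count == bars.Count &&
                           !s.Select(b => b.Id).Except(bars.Select(b => b.Id)).Any()))
            return false;
        _seen.Add(bars);
        return true;
    }
}

var cache = new BarSetCache();
var topNFoos = (from q in query
                let isNew = cache.TryAdd(q.Bars) // side effect: updates the cache
                where isNew
                select q)
               .Take(50);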
I guess by "full LINQ" you mean standard LINQ operators/Enumerable extension methods.
I don't think this can be done with LINQ query syntax. Of the standard methods, the only one that supports mutable processing state is Enumerable.Aggregate, but it gives you nothing more than a LINQ flavor over the plain foreach:
var result = query.Aggregate(new List<Foo>(), (list, next) =>
{
if (list.Count < 50 && !list.Any(item => areBarsSimilar(item.Bars, next.Bars)))
list.Add(next);
return list;
});
Since it looks like we are allowed to use helper methods (like areBarsSimilar), the best we can do is make it at least look more LINQ-ish by defining and using a custom extension method
var result = query.Aggregate(new List<Foo>(), (list, next) => list.Count < 50 &&
!list.Any(item => areBarsSimilar(item.Bars, next.Bars)) ? list.Concat(next) : list);
where the custom method is
public static class Utils
{
public static List<T> Concat<T>(this List<T> list, T item) { list.Add(item); return list; }
}
But note that compared to a vanilla foreach, Aggregate has the additional drawback of not being able to exit early, so it will consume the whole input sequence (which, besides the performance cost, also means it doesn't work with infinite sequences).
Conclusion: while this should answer your original question, i.e. it's technically possible to do what you are asking for, LINQ (like standard SQL) is not well suited for this type of processing.