Entity Framework with LINQ to Entities performance - C#

If I have a static method like this
private static bool TicArticleExists(string supplierIdent)
{
    using (TicDatabaseEntities db = new TicDatabaseEntities())
    {
        if ((from a in db.Articles where a.SupplierArticleID.Equals(supplierIdent) select a).Count() > 0)
            return true;
    }
    return false;
}
and use this method in various places, in foreach loops or just plain call it numerous times, does it create and open a new connection every time?
If so, how can I tackle this? Should I cache the results somewhere? In this case, for example, I would cache the entire Classifications table in MemoryCache and then run queries against that cached object.
Or should I make the TicDatabaseEntities variable static and initialize it at class level?
Should my class be static if it contains only static methods? Right now it is not.
Also, I've noticed that if I return result.First() instead of result.FirstOrDefault() and the query does not find a match, it throws an exception (with FirstOrDefault() there is no exception; it returns null).
Thank you for the clarification.

New connections are inexpensive thanks to connection pooling. Basically, it grabs an already-open connection (I think they are kept open for 2 minutes for reuse).
Still, caching may be better. I really do not like the FirstOrDefault. Think about whether you can actually pull in more in ONE statement, then work from that.
For the rest, I can not say anything - too much depends on what you actually do there logically. What IS TicDatabaseEntities? CAN it be cached? For how long? Same with (3) - we do not know, because we do not know what else is in there.
If this is something like getting just some lookup strings for later use, I would say:
Build a key out of class I, class II, class III
Load all classifications (I assume there are only a couple of hundred)
Put them into a static / cached dictionary, assuming they normally do not change (and I think I have that idea here - is this a financial tick-stream database?)
Without business knowledge this can not be answered.
4: yes, that is as documented. First returns the first element or throws an exception; FirstOrDefault returns default(T) - a zero-initialized value for structs, null for classes.
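A quick LINQ-to-Objects sketch of that difference (standalone, no database needed):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class FirstDemo
{
    static void Main()
    {
        var empty = new List<int>();

        // FirstOrDefault returns default(T) when nothing matches:
        // 0 for value types, null for reference types.
        int noInt = empty.FirstOrDefault();                    // 0
        string noString = new List<string>().FirstOrDefault(); // null

        // First throws on an empty (or fully filtered-out) sequence.
        try
        {
            empty.First();
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("First() threw, as documented");
        }
    }
}
```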

Thanks Dan and TomTom, I've come up with this. Could you please comment on it if you see anything out of order?
public static IEnumerable<Article> TicArticles
{
    get
    {
        ObjectCache cache = MemoryCache.Default;
        if (cache["TicArticles"] == null)
        {
            CacheItemPolicy policy = new CacheItemPolicy();
            using (TicDatabaseEntities db = new TicDatabaseEntities())
            {
                IEnumerable<Article> articles = (from a in db.Articles select a).ToList();
                cache.Set("TicArticles", articles, policy);
            }
        }
        return (IEnumerable<Article>)MemoryCache.Default["TicArticles"];
    }
}
private static bool TicArticleExists(string supplierIdent)
{
    if (TicArticles.Count(p => p.SupplierArticleID.Equals(supplierIdent)) > 0)
        return true;
    return false;
}
If this is OK, I'm going to make all my methods follow this pattern.

does it create and open new connection every time?
No. Connections are cached.
Should I cache the results somewhere
No. Do not cache entire tables.
should I make TicDatabaseEntities variable static and initialize it at class level?
No. Do not retain a DataContext instance longer than a UnitOfWork.
Should my class be static if it contains only static methods?
Sure... doing so will prevent anyone from creating useless instances of the class.
Also I've noticed that if I return result.First() instead of FirstOrDefault() and the query does not find a match, it will issue an exception
That is the behavior of First. As such, I typically restrict the use of First to IGroupings or to collections previously checked with .Any().
I'd rewrite your existing method as:
using (TicDatabaseEntities db = new TicDatabaseEntities())
{
    bool result = db.Articles
        .Any(a => a.SupplierArticleID.Equals(supplierIdent));
    return result;
}
If you are calling the method in a loop, I'd rewrite to:
private static Dictionary<string, bool> TicArticleExists(List<string> supplierIdents)
{
    using (TicDatabaseEntities db = new TicDatabaseEntities())
    {
        HashSet<string> queryResult = new HashSet<string>(db.Articles
            .Where(a => supplierIdents.Contains(a.SupplierArticleID))
            .Select(a => a.SupplierArticleID));
        Dictionary<string, bool> result = supplierIdents
            .ToDictionary(s => s, s => queryResult.Contains(s));
        return result;
    }
}

I'm trying to find the article where I read this, but I think it's better to do the following (if you're just looking for a count):
from a in db.Articles where a.SupplierArticleID.Equals(supplierIdent) select 1
Also, use Any instead of Count > 0.
Will update when I can cite a source.
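For what it's worth, the Any-versus-Count difference is easy to demonstrate in LINQ to Objects, where Count(predicate) must walk the whole sequence while Any stops at the first match. The visited counter below is just instrumentation for the example; against a database both translate to SQL, where EXISTS can likewise stop early:

```csharp
using System;
using System.Linq;

class AnyVsCount
{
    static void Main()
    {
        int visited = 0;
        var source = Enumerable.Range(1, 1000)
                               .Select(x => { visited++; return x; });

        // Any short-circuits: only the first element is pulled.
        bool any = source.Any(x => x >= 1);
        Console.WriteLine(visited);   // 1

        // Count(predicate) > 0 walks all 1000 elements first.
        visited = 0;
        bool viaCount = source.Count(x => x >= 1) > 0;
        Console.WriteLine(visited);   // 1000
    }
}
```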


How to speed up query binary search in LINQ

I use a LINQ select statement to get a file as bytes from the database. It takes a long time to get the byte array from my Files table.
I'm using this code:
Func<Files, bool> Filter = delegate(Files x)
{
    return x.FileID == 10;
};
Files File = DAL.FilesDAL.GetFile(Filter).First();

public static List<Files> GetFile(Func<Files, bool> lambda)
{
    return db.Where(lambda).ToList();
}
For a 1 MB file it takes up to a minute. That is too long for my clients.
How I can improve the speed of that query?
Looks like you're missing the fact that using Func<> makes your query execute as LINQ to Objects. That means the entire table is fetched into application memory and filtering is then performed by the application, not by the database. To make the filter execute at the database, it should be passed as Expression<Func<>> instead:
public static List<Files> GetFile(Expression<Func<Files, bool>> lambda)
{
    return db.Where(lambda).ToList();
}
I assumed that db here is IQueryable<Files>.
To call it use:
Files File = DAL.FilesDAL.GetFile(x => x.FileID == 10).First();
To make it even more descriptive, you should probably change your method to return only one Files item. The method name is GetFile, so I would expect it to return one file, not a collection of files:
public static Files GetFile(Expression<Func<Files, bool>> lambda)
{
    return db.FirstOrDefault(lambda);
}
usage:
var File = DAL.FilesDAL.GetFile(x => x.FileID == 10);
Or to increase semantics, you can refactor the method to be GetFileById and take Id, not an expression:
public static Files GetFileById(int id)
{
    return db.FirstOrDefault(x => x.FileID == id);
}
usage
var File = DAL.FilesDAL.GetFileById(10);
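The Func-versus-Expression distinction above can be observed without any database at all: overload resolution sends a plain Func to Enumerable.Where (in-memory filtering), while an Expression goes to Queryable.Where, which a provider can translate. A minimal sketch:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

class FuncVsExpression
{
    static void Main()
    {
        IQueryable<int> data = Enumerable.Range(1, 100).AsQueryable();

        Func<int, bool> asFunc = x => x > 95;
        Expression<Func<int, bool>> asExpr = x => x > 95;

        // A Func argument binds to Enumerable.Where, so the query
        // degrades to LINQ to Objects (full table fetch in EF).
        var viaFunc = data.Where(asFunc);   // IEnumerable<int>, not IQueryable

        // An Expression argument binds to Queryable.Where, which the
        // provider can translate to SQL.
        var viaExpr = data.Where(asExpr);   // IQueryable<int>

        Console.WriteLine(viaFunc is IQueryable<int>); // False
        Console.WriteLine(viaExpr is IQueryable<int>); // True
    }
}
```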
How I can improve the speed of that query?
Faster server. Faster network. Do not fetch binary content you do not need.
Simple as that. Nothing you can do at the LINQ level will magically fix a bad design.
Generally:
By the way, a bad mistake: you return ToList in GetFiles. Never do that. Either leave it as a queryable, so the caller can add conditions, ordering and grouping, or return an enumerable (which is far less powerful). ToList means everything is loaded into memory, even if the caller does not want a list.
And the way the lambda is handed in is just redundant and limiting. Bad design.

Building an Observable Repository with Rx

I'm working in the common scenario whereby I'd like to access a subset of a repository and not worry about having to keep it updated e.g. 'get all orders whose price is greater than 10'. I have implemented a solution but have two issues with it (listed at the end).
A subset of a repository can be achieved with something equivalent to
var expensiveOrders = Repository.GetOrders().Where(o => o.Price > 10);
But this is an IEnumerable and will not be updated when the original collection is updated. I could add handlers for CollectionChanged, but what if we want to access a further subset?
var expensiveOrdersFromBob = expensiveOrders.Where(o => o.Name == "Bob");
We'd have to wire up a collection-changed for this one as well. The concept of live updates led me to thinking of Rx, so I set about to build an ObservableCache which contains both the ObservableCollection of items that auto-updates itself, and an RX stream for notification. (The stream is also what updates the cache under the hood.)
class ObservableCache<T> : IObservableCache<T>
{
    private readonly ObservableCollection<T> _cache;
    private readonly IObservable<Tuple<T, CRUDOperationType>> _updates;

    public ObservableCache(IEnumerable<T> initialCache,
        IObservable<Tuple<T, CRUDOperationType>> currentStream, Func<T, bool> filter)
    {
        _cache = new ObservableCollection<T>(initialCache.Where(filter));
        _updates = currentStream.Where(tuple => filter(tuple.Item1));
        _updates.Subscribe(ProcessUpdate);
    }

    private void ProcessUpdate(Tuple<T, CRUDOperationType> update)
    {
        var item = update.Item1;
        lock (_cache)
        {
            switch (update.Item2)
            {
                case CRUDOperationType.Create:
                    _cache.Add(item);
                    break;
                case CRUDOperationType.Delete:
                    _cache.Remove(item);
                    break;
                case CRUDOperationType.Replace:
                case CRUDOperationType.Update:
                    _cache.Remove(item); // ToDo: implement some key-based equality
                    _cache.Add(item);
                    break;
            }
        }
    }

    public ObservableCollection<T> Cache
    {
        get { return _cache; }
    }

    public IObservable<T> Updates
    {
        get { return _updates.Select(tuple => tuple.Item1); }
    }

    public IObservableCache<T> Where(Func<T, bool> predicate)
    {
        return new ObservableCache<T>(_cache, _updates, predicate);
    }
}
You can then use it like this:
var expensiveOrders = new ObservableCache<Order>(_orders, updateStream, o => o.Price > 10);
expensiveOrders.Updates.Subscribe(o => Console.WriteLine("Got new expensive order: " + o));
_observableBoundToSomeCtrl = expensiveOrders.Cache;

var expensiveOrdersFromBob = expensiveOrders.Where(o => o.Name == "Bob");
expensiveOrdersFromBob.Updates.Subscribe(o => Console.WriteLine("Got new expensive order from Bob: " + o));
_observableBoundToSomeOtherCtrl = expensiveOrdersFromBob.Cache;
And so forth, the idea being that you can keep projecting the cache into narrower and narrower subsets and never have to worry about it being out of sync. So what is my problem then?
I'm wondering whether I can do away with the CRUD stuff by having Rx intrinsically update the collections. Maybe 'project' the updates with a Select, or something like that?
There is a race condition intrinsic to the repository-with-update pattern, in that I might miss some updates while I'm constructing the new cache. I think I need some sort of sequencing, but that would mean having all my T objects implement an ISequenceableItem interface. Is there any better way to do this? Rx is great because it handles all the threading for you. I'd like to leverage that.
The OLinq project at http://github.com/wasabii/OLinq is designed for this kind of reactive updating, and the ObservableView is, I think, what you are after.
Have a look at these two projects which achieve what you want albeit by different means:
https://github.com/RolandPheasant/DynamicData
https://bitbucket.org/mendelmonteiro/reactivetables [disclaimer: this is my project]
Suppose you have a definition like this:
class SetOp<T>
{
    public T Value { get; private set; }
    public bool Include { get; private set; }

    public SetOp(T value, bool include)
    {
        Value = value;
        Include = include;
    }
}
Using Observable.Scan and System.Collections.Immutable you can do something like this:
IObservable<SetOp<int>> ops = ...;
IImmutableSet<int> empty = ImmutableSortedSet<int>.Empty;
var observableSet = ops
    .Scan(empty, (s, op) => op.Include ? s.Add(op.Value) : s.Remove(op.Value))
    .StartWith(empty);
Using the immutable collection type is the key trick here: any observer of the observableSet can do whatever it wants with the values pushed at it, because they are immutable. And it is efficient, because consecutive values share the majority of the set data structure.
Here is an example of an ops stream and the corresponding observableSet.
ops observableSet
-------- ------------------
{}
Add 7 {7}
Add 4 {4,7}
Add 5 {4,5,7}
Add 6 {4,5,6,7}
Remove 5 {4,6,7}
Add 8 {4,6,7,8}
Remove 4 {6,7,8}
You should not need to lock _cache within ProcessUpdate. If your source observable currentStream honors the Rx guidelines, you are guaranteed to be within only a single call to OnNext at a time. In other words, you will not receive another value from the stream while you are still processing the previous value.
The only reliable way to solve your race condition is to make sure you create the cache before the updateStream starts producing data.
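If you cannot guarantee that ordering, one common workaround is to subscribe first and buffer: queue any updates that arrive while the snapshot query is running, then drain the queue into the cache. A plain-C# sketch of the idea (names are illustrative, and a real version would deduplicate by key in case a buffered update is already contained in the snapshot):

```csharp
using System;
using System.Collections.Generic;

// Sketch: subscribe-then-snapshot, so no update is lost.
class BufferingCache<T>
{
    private readonly List<T> _cache = new List<T>();
    private readonly Queue<T> _pending = new Queue<T>();
    private bool _loaded;
    private readonly object _gate = new object();

    // Called by the update stream; safe to call before LoadSnapshot.
    public void OnUpdate(T item)
    {
        lock (_gate)
        {
            if (_loaded) _cache.Add(item);
            else _pending.Enqueue(item);   // buffer until the snapshot is in
        }
    }

    // Called once the initial repository query completes.
    public void LoadSnapshot(IEnumerable<T> snapshot)
    {
        lock (_gate)
        {
            _cache.AddRange(snapshot);
            while (_pending.Count > 0) _cache.Add(_pending.Dequeue());
            _loaded = true;
        }
    }

    public IReadOnlyList<T> Items
    {
        get { lock (_gate) return _cache.ToArray(); }
    }
}
```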
You may want to take a look at Extensions for Reactive Extensions (Rxx). I believe Dave has built a number of utilities for binding UI controls to observable data. Documentation is sparse, so I don't know whether there is anything there for what you are doing.

SQL "not in" syntax for Entity Framework 4.1

I have a simple issue with Entity Framework syntax for the "not in" SQL equivalent. Essentially, I want to convert the following SQL syntax into Entity Framework syntax:
select ID
from dbo.List
where ID not in (list of IDs)
Here is a method that I use for looking up a single record:
public static List GetLists(int id)
{
    using (dbInstance db = new dbInstance())
    {
        return db.Lists.Where(m => m.ID == id).FirstOrDefault();
    }
}
Here is a pseudo-method that I want to use for this:
public static List<List> GetLists(List<int> listIDs)
{
    using (dbInstance db = new dbInstance())
    {
        return db.Lists.Where(**** What Goes Here ****).ToList();
    }
}
Can anyone give me pointers as to what goes in the Where clause area? I read some forums about this and saw mention of using .Contains() or .Any(), but none of the examples were a close enough fit.
Give this a go...
public static List<List> GetLists(List<int> listIDs)
{
    using (dbInstance db = new dbInstance())
    {
        // Use this one to return Lists whose ID IS NOT IN the provided listIDs:
        return db.Lists.Where(x => !listIDs.Contains(x.ID)).ToList();

        // Or use this one to return Lists whose ID IS IN the provided listIDs:
        // return db.Lists.Where(x => listIDs.Contains(x.ID)).ToList();
    }
}
These will turn into approximately the following database queries:
SELECT [Extent1].*
FROM [dbo].[List] AS [Extent1]
WHERE NOT ([Extent1].[ID] IN (<your,list,of,ids>))
or
SELECT [Extent1].*
FROM [dbo].[List] AS [Extent1]
WHERE [Extent1].[ID] IN (<your,list,of,ids>)
respectively.
This one requires you to think backwards a little bit. Instead of asking whether the value is not in some list of IDs, you have to ask whether some list of IDs does not contain the value. Like this:
int[] list = new int[] { 1, 2, 3 };
var result = (from x in db.List where list.Contains(x.id) == false select x);
Try this for starters ...
m => !listIDs.Contains(m.ID)
This might be a way to do what you want:
// From the method you provided, with changes...
public static List<List> GetLists(int[] ids) // Could be List<int> or other =)
{
    using (dbInstance db = new dbInstance())
    {
        return db.Lists.Where(m => !ids.Contains(m.ID)).ToList();
    }
}
However, I've found that doing so might raise errors in some scenarios, especially when the list is too big and the connection is somewhat slow.
Remember to apply all other filters BEFORE this one, so it has fewer values to check.
Also remember that LINQ does not populate the results when you build your filter/query; execution is deferred. If you're going to iterate over the records more than once, remember to call ToList() or ToArray() first, unless each record is 500 MB or more...

How to optimise this short and simple code (includes a best practice question)

Here is a question I frequently ask myself. Here is a code with no repeated code:
private static void delete(Guid teamID, Guid repID)
{
    using (var context = AccesDataRépart.GetNewContext())
    {
        Team_Representant teamRepresentant = getTeamRep(teamID, repID);
        if (teamRepresentant != null)
            context.Team_Representant.DeleteOnSubmit(teamRepresentant);
        context.SubmitChanges();
    }
}

private static Team_Representant getTeamRep(Guid teamID, Guid repID)
{
    using (var context = AccesDataRépart.GetNewContext())
    {
        return (from c in context.Team_Representant
                where c.RepID == repID &&
                      c.TeamID == teamID
                select c).FirstOrDefault();
    }
}
It is normal to have a getTeamRep function; it is used very often. On the other hand, calling it from the delete function instead of repeating its query inline generates extra steps and is generally slower.
What do you do in such a case? Do you repeat the getTeamRep LINQ query in the delete function, or do you accept this extra workload?
Thanks!
I never do anything twice :). Make a variable to hold the result of getTeamRep.
If you've never tried it before, get rid of your static stuff and make these all instance methods. Have a Team class and a Rep class, and let the Teams contain their Reps. It might be more fun this way, and it tends to prevent the whole problem of looking up the same object twice.
It's a matter of preference. I find instance methods more elegant in most cases, because there are fewer parameters:
Team team = new Team(teamID);
team.Delete(repID);
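Either way, the point is to run the lookup once and act on the stored result. A minimal sketch of that shape, with an illustrative in-memory store standing in for the data context:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative stand-ins; the real code would use the LINQ-to-SQL context.
class TeamRep
{
    public Guid TeamID;
    public Guid RepID;
}

class Store
{
    public List<TeamRep> TeamReps = new List<TeamRep>();

    // The lookup query lives in exactly one place...
    public TeamRep GetTeamRep(Guid teamID, Guid repID)
    {
        return TeamReps.FirstOrDefault(c => c.RepID == repID && c.TeamID == teamID);
    }

    // ...and Delete reuses its result via a variable instead of re-querying.
    public void Delete(Guid teamID, Guid repID)
    {
        var rep = GetTeamRep(teamID, repID);
        if (rep != null)
            TeamReps.Remove(rep);
    }
}
```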

Linq-to-Sql: recursively get children

I have a Comment table which has a CommentID and a ParentCommentID. I am trying to get a list of all children of a comment. This is what I have so far; I haven't tested it yet.
private List<int> searchedCommentIDs = new List<int>();
// searchedCommentIDs is a list of already-yielded comments, stored
// so that malformed data does not result in an infinite loop.

public IEnumerable<Comment> GetReplies(int commentID)
{
    var db = new DataClassesDataContext();
    var replies = db.Comments
        .Where(c => c.ParentCommentID == commentID
                    && !searchedCommentIDs.Contains(c.CommentID));
    foreach (Comment reply in replies)
    {
        searchedCommentIDs.Add(reply.CommentID);
        yield return reply;
        // yield return GetReplies(reply.CommentID); // type mismatch.
        foreach (Comment replyReply in GetReplies(reply.CommentID))
        {
            yield return replyReply;
        }
    }
}
A few questions:
Is there any obvious way to improve this? (Besides maybe creating a view in SQL with a CTE.)
How come I can't yield an IEnumerable<Comment> to an IEnumerable<Comment>, only a Comment itself?
Is there any way to use SelectMany in this situation?
I'd probably use either a UDF/CTE, or (for very deep structures) a stored procedure that does the same manually.
Note that if you can change the schema, you can pre-index such recursive structures into an indexed/ranged tree that lets you do a single BETWEEN query - but the maintenance of the tree is expensive (i.e. query becomes cheap, but insert/update/delete become expensive, or you need a delayed scheduled task).
Re 2 - you can only yield the type specified in the enumeration (the T in IEnumerable<T> / IEnumerator<T>).
You could yield an IEnumerable<Comment> if the method returned IEnumerable<IEnumerable<Comment>> - does that make sense?
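On the SelectMany question: once the comments are in memory, the recursion flattens naturally, since each direct reply can be concatenated with its own (recursive) replies. A LINQ-to-Objects sketch with an illustrative Comment type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Comment
{
    public int CommentID;
    public int? ParentCommentID;
}

static class Replies
{
    // Each direct reply is paired with its own (recursive) replies,
    // and SelectMany flattens everything into one sequence.
    public static IEnumerable<Comment> GetReplies(
        ILookup<int?, Comment> byParent, int commentID)
    {
        return byParent[commentID].SelectMany(
            reply => new[] { reply }.Concat(GetReplies(byParent, reply.CommentID)));
    }

    static void Main()
    {
        var comments = new[]
        {
            new Comment { CommentID = 1, ParentCommentID = null },
            new Comment { CommentID = 2, ParentCommentID = 1 },
            new Comment { CommentID = 3, ParentCommentID = 2 },
            new Comment { CommentID = 4, ParentCommentID = 1 },
        };
        var byParent = comments.ToLookup(c => c.ParentCommentID);

        var ids = GetReplies(byParent, 1).Select(c => c.CommentID);
        Console.WriteLine(string.Join(",", ids));   // 2,3,4
    }
}
```

Note that this fetches everything once and recurses in memory, rather than issuing one database query per node as the original method does.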
Improvements:
perhaps a udf (to keep composability, rather than a stored procedure) that uses the CTE recursion approach
use using, since DataContext is IDisposable...
so:
using(var db = new MyDataContext() ) { /* existing code */ }
LoadWith is worth a try, but I'm not sure I'd be hopeful...
the list of searched IDs is risky as a field - I guess you're OK as long as you don't call it twice... Personally, I'd use an argument on a private backing method (i.e. pass the list between recursive calls, but not on the public API)
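That last suggestion might look something like this sketch (a LINQ-to-Objects stand-in with illustrative types; the visited set travels down the recursion as a parameter, so the public API stays stateless and even cyclic data terminates):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Node
{
    public int ID;
    public int ParentID;
}

static class SafeReplies
{
    // Public surface: no mutable field, safe to call repeatedly.
    public static IEnumerable<Node> GetReplies(IList<Node> all, int id)
    {
        return GetReplies(all, id, new HashSet<int>());
    }

    // The visited set is passed between recursive calls, so malformed
    // (cyclic) data cannot cause an infinite loop.
    private static IEnumerable<Node> GetReplies(
        IList<Node> all, int id, HashSet<int> visited)
    {
        foreach (var reply in all.Where(n => n.ParentID == id))
        {
            if (!visited.Add(reply.ID)) continue;  // already yielded: cycle
            yield return reply;
            foreach (var nested in GetReplies(all, reply.ID, visited))
                yield return nested;
        }
    }

    static void Main()
    {
        // A deliberate cycle: 1 -> 2 -> 1
        var nodes = new List<Node>
        {
            new Node { ID = 2, ParentID = 1 },
            new Node { ID = 1, ParentID = 2 },
        };
        // Each node is yielded exactly once; no infinite loop.
        Console.WriteLine(GetReplies(nodes, 1).Count());   // 2
    }
}
```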
