LINQ optimization - C#

Here is a piece of code:
void MyFunc(List<MyObj> objects)
{
    MyFunc1(objects);
    foreach (MyObj obj in objects.Where(obj1 => obj1.Good))
    {
        // Do Action With Good Object
    }
}
void MyFunc1(List<MyObj> objects)
{
    int iGoodCount = objects.Where(obj1 => obj1.Good).Count();
    BeHappy(iGoodCount);
    // do other stuff with 'objects' collection
}
Here the collection is traversed twice, and each time the 'Good' property is checked for every member: first when counting the good objects, and again when iterating over them.
I would like to optimize this, and here is a straightforward solution:
before the call to MyFunc1, create an additional temporary collection containing only the good objects (goodObjects; it can be an IEnumerable);
get the count of these objects and pass it as an additional parameter to MyFunc1;
in the 'MyFunc' method, iterate over the 'goodObjects' collection instead of 'objects.Where(...)'.
Not a bad approach (as far as I can see), but it requires an additional variable in the 'MyFunc' method and an additional parameter to be passed.
Question: does LINQ have any out-of-the-box functionality that caches the result of the first Where().Count(), remembering the filtered collection and reusing it in the next iteration?
Any thoughts are welcome.
Thanks.

No, LINQ queries are not optimized in this way (what you describe is similar to the way SQL Server reuses a query execution plan). LINQ does not (and, for practical purposes, cannot) know enough about your objects in order to optimize this way. As far as it knows, your collection has changed (or is entirely different) between the two calls.
You're obviously aware of the ability to persist your query into a new List<T>, but apart from that there's really nothing that I can recommend without knowing more about your class and where else MyFunc is used.
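For reference, the "persist your query into a new List<T>" option could be sketched roughly as below. This is only an illustration under assumptions: the MyObj class with its read counter, the extra goodObjects parameter on MyFunc1, and the BeHappy stand-in are hypothetical, because the question does not show them.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's MyObj; it counts how often Good is read.
public class MyObj
{
    private readonly bool _good;
    public MyObj(bool good) { _good = good; }
    public int GoodReads { get; private set; }
    public bool Good { get { GoodReads++; return _good; } }
}

public static class FilterOnceSketch
{
    public static int LastHappyCount;   // records what BeHappy received
    public static int Processed;        // counts objects handled in the loop

    public static void MyFunc(List<MyObj> objects)
    {
        // Run the filter exactly once and keep the result in memory.
        List<MyObj> goodObjects = objects.Where(o => o.Good).ToList();

        MyFunc1(objects, goodObjects);

        foreach (MyObj obj in goodObjects)
        {
            Processed++; // do action with good object
        }
    }

    private static void MyFunc1(List<MyObj> objects, IReadOnlyCollection<MyObj> goodObjects)
    {
        // Count on a materialized list is a stored value, not another traversal.
        BeHappy(goodObjects.Count);
        // do other stuff with the 'objects' collection
    }

    private static void BeHappy(int goodCount) { LastHappyCount = goodCount; }
}
```

Passing the materialized list itself means no separate count parameter is needed, at the cost of holding the filtered items in memory for the duration of the call.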

As long as MyFunc1 doesn't need to modify the list by adding/removing objects, this will work.
void MyFunc(List<MyObj> objects)
{
    ILookup<bool, MyObj> objLookup = objects.ToLookup(obj1 => obj1.Good);
    MyFunc1(objLookup[true]);
    foreach (MyObj obj in objLookup[true])
    {
        //..
    }
}
void MyFunc1(IEnumerable<MyObj> objects)
{
    //..
}

Related

Is it Possible to Lazy Load & Type Cast the Object in Single Iteration?

I am new to C# programming and have the following scenario.
I am using an API which returns IEnumerable, which I want to iterate based on some object properties:
IEnumerable<objects> listOfObjects = filter.getItems(id);
List<CustomObject> sortedList = new List<CustomObject>();
foreach (CustomObject obj in listOfObjects)
{
    obj.Load(Load.Expanded);
    sortedList.Add(obj);
}
foreach (CustomObject custObj in sortedList.OrderByDescending(c => c.RevisionDate))
{
    // business logic
}
I need to do all of the above because I am not able to cast the objects returned by the filter query. Also, the objects returned by the filter query are not loaded, which means that if I don't execute the first foreach loop, the RevisionDate value in the second foreach will be null.
I am wondering whether there is a better way to handle this scenario. Can this be reduced to a single loop?
You can do it in one linq statement like this:
foreach (var custObj in listOfObjects
    .Cast<CustomObject>()
    .Select(obj => { obj.Load(Load.Expanded); return obj; })
    .OrderByDescending(c => c.RevisionDate))
{
    // business logic
}
Note that putting something with side effects, such as obj.Load, inside Select is usually discouraged and is not good practice.
You should be able to make use of the IEnumerable method Select.
See documentation here.
In short, you'll want to do something like this to eliminate the first loop:
List<CustomObject> sortedList = filter.getItems(id)
    .Select<object, CustomObject>(x =>
    {
        var obj = (CustomObject)x;
        obj.Load(Load.Expanded);
        return obj;
    })
    .ToList(); // Select returns IEnumerable<CustomObject>, so materialize it into a list
There are several constructs in the .NET libraries that were designed to handle lazy initialization. Each has its own peculiarities, which may fit your needs better or worse.
The first is the System.Lazy<T> class. Then there is System.Threading.LazyInitializer, a set of static methods that performs better because it avoids the overhead of a wrapper class, and System.Threading.ThreadLocal<T>, which provides thread-specific data.
Usage of Lazy<T> is simple:
// Initialize by using the default Lazy<T> constructor. The
// Orders array itself is not created yet.
Lazy<Orders> _orders = new Lazy<Orders>();
// Alternatively, initialize by passing a delegate that invokes a specific
// constructor on Orders; it runs when the Value property is first accessed.
Lazy<Orders> _orders = new Lazy<Orders>(() => new Orders(100));
// Lazy<Orders> creates the array only if displayOrders is true,
// because only that path accesses _orders.Value.
if (displayOrders == true)
{
    DisplayOrders(_orders.Value.OrderData);
}
else
{
    // Don't waste resources getting order data.
}
Instead of passing a lambda that calls a constructor, you can pass a lambda that casts your object to CustomObject. If you strive for the best performance, avoid LINQ and do it procedurally.
The examples are from the Microsoft Docs article on lazy initialization, with comments changed by me for clarity; they can be applied almost directly to your code.
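To make the Lazy<T> behaviour concrete, here is a small sketch. ExpensiveData and its LoadCount counter are hypothetical names used only for illustration; the point it tries to show is that the factory lambda runs on the first access to Value and is not run again afterwards.

```csharp
using System;

// Hypothetical type whose construction stands in for an expensive load.
public sealed class ExpensiveData
{
    public static int LoadCount;              // how many instances have been built
    public DateTime RevisionDate { get; }

    public ExpensiveData()
    {
        LoadCount++;
        RevisionDate = new DateTime(2020, 1, 1);
    }
}

public static class LazySketch
{
    public static Lazy<ExpensiveData> Create()
    {
        // Nothing is constructed here; the lambda is only stored.
        return new Lazy<ExpensiveData>(() => new ExpensiveData());
    }
}
```

A caller would hold the Lazy<ExpensiveData> and read lazy.Value.RevisionDate wherever the date is needed; IsValueCreated reports whether the load has already happened.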

Which is more efficient in conditional looping?

Suppose I have the following collection:
IEnumerable<car> cars = new List<car>();
Now I need to loop over this collection.
I need to run a different function depending on the car type, so I can do it in one of the following ways:
Method A:
foreach (var item in cars)
{
    if (item.color == white)
    {
        doSomething();
    }
    else
    {
        doSomeOtherThing();
    }
}
or the other way:
Method B:
foreach (var item in cars.Where(c => c.color == white))
{
    doSomething();
}
foreach (var item in cars.Where(c => c.color != white))
{
    doSomeOtherThing();
}
To me, method A seems better because I loop over the collection only once,
while method B seems enticing because the framework loops over and filters the collection for you.
So which method is better and faster?
Well, it depends on how complicated the filtering process is. It may be so insanely efficient that it's irrelevant, especially in light of the fact that you're no longer having to do your own filtering with the if statement.
I'll say one thing: unless your collections are massive, it probably won't make enough of a difference to care. And, sometimes, it's better to optimise for readability rather than speed :-)
But, if you really want to know, you measure! Time the operations in your environment with suitable production-like test data. That's the only way to be certain.
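A rough way to do that measurement is sketched below with System.Diagnostics.Stopwatch. The LoopTiming class, the use of plain color strings instead of car objects, and the iteration count are all assumptions made for the sake of a self-contained example; a serious comparison would use your real types and data, or a benchmarking library.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class LoopTiming
{
    // Method A: one pass with an if/else.
    public static (int White, int Other) SinglePass(IEnumerable<string> colors)
    {
        int white = 0, other = 0;
        foreach (string color in colors)
        {
            if (color == "white") white++;
            else other++;
        }
        return (white, other);
    }

    // Method B: two filtered passes over the same source.
    public static (int White, int Other) TwoPasses(IEnumerable<string> colors)
    {
        int white = 0, other = 0;
        foreach (string color in colors.Where(c => c == "white")) white++;
        foreach (string color in colors.Where(c => c != "white")) other++;
        return (white, other);
    }

    // Runs the action repeatedly and returns the total elapsed time.
    public static TimeSpan Time(Action action, int iterations)
    {
        action(); // warm-up run so JIT compilation is not part of the measurement
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, LoopTiming.Time(() => LoopTiming.SinglePass(data), 1000) and the same call with TwoPasses can be compared on production-like data; the numbers will vary by machine and runtime.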
Method A is more readable than method B. Just one question, is it car.color or item.color?

Memory management / caching for costly objects in C#

Assume that I have the following object
public class MyClass
{
    public ReadOnlyDictionary<T, V> Dict
    {
        get
        {
            return createDictionary();
        }
    }
}
Assume that ReadOnlyDictionary is a read-only wrapper around Dictionary<T, V>.
The createDictionary method takes significant time to complete, and the returned dictionary is relatively large.
Obviously, I want to implement some sort of caching so I can reuse the result of createDictionary, but I also do not want to abuse the garbage collector or use too much memory.
I thought of using WeakReference for the dictionary, but I am not sure if this is the best approach.
What would you recommend? How to properly handle result of a costly method that might be called multiple times?
UPDATE:
I am interested in advice for a C# 2.0 library (single DLL, non-visual). The library might be used in a desktop or a web application.
UPDATE 2:
The question is relevant for read-only objects as well. I changed the type of the property from Dictionary to ReadOnlyDictionary.
UPDATE 3:
T is a relatively simple type (string, for example). V is a custom class. You may assume that an instance of V is costly to create. The dictionary might contain from 0 to a couple of thousand elements.
The code is assumed to be accessed from a single thread, or from multiple threads with an external synchronization mechanism.
I am fine if the dictionary is GC-ed when no one uses it. I am trying to find a balance between time (I want to somehow cache the result of createDictionary) and memory expenses (I do not want to keep memory occupied longer than necessary).
WeakReference is not a good solution for a cache, since your object won't survive the next GC if nobody else is referencing your dictionary. You can make a simple cache by storing the created value in a member variable and reusing it if it is not null.
This is not thread-safe, and under heavy concurrent access you could end up creating the dictionary several times. You can use the double-checked locking pattern to guard against this with minimal performance impact.
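A minimal sketch of the member-variable cache with double-checked locking might look like the following. The class name, the Dictionary<string, int> payload, and the CreateCalls counter are assumptions for illustration; the question's own ReadOnlyDictionary wrapper and createDictionary body are not shown, so they are replaced with stand-ins here.

```csharp
using System.Collections.Generic;

public class CachedDictionaryHolder
{
    private readonly object _sync = new object();
    // volatile so that a fully constructed dictionary is what other threads observe
    private volatile Dictionary<string, int> _dict;

    public int CreateCalls; // illustration only: counts how often the costly method ran

    public Dictionary<string, int> Dict
    {
        get
        {
            Dictionary<string, int> local = _dict;
            if (local == null)                 // first check, without taking the lock
            {
                lock (_sync)
                {
                    local = _dict;
                    if (local == null)         // second check, now under the lock
                    {
                        local = CreateDictionary();
                        _dict = local;
                    }
                }
            }
            return local;
        }
    }

    // Stand-in for the question's costly createDictionary method.
    private Dictionary<string, int> CreateDictionary()
    {
        CreateCalls++;
        return new Dictionary<string, int> { { "one", 1 }, { "two", 2 } };
    }
}
```

Note that this keeps the dictionary alive for as long as the holder is alive, so it trades memory for time; it does not address when the cached value should be released.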
To help you further, you would need to specify whether concurrent access is an issue for you, how much memory your dictionary consumes, and how it is created. If, for example, the dictionary is the result of an expensive query, it might help to simply serialize the dictionary to disk and reuse it until you need to recreate it (this depends on your specific needs).
Caching is another word for memory leak if you have no clear policy for when your object should be removed from the cache. Since you are trying WeakReference, I assume you do not know exactly when a good time to clear the cache would be.
Another option is to compress the dictionary into a less memory-hungry structure. How many keys does your dictionary have, and what are the values?
There are four major mechanisms available to you (Lazy<T> only arrives in .NET 4.0, so it is not an option):
lazy initialization
virtual proxy
ghost
value holder
Each has its own advantages.
I suggest a value holder, which populates the dictionary on the first call of the holder's GetValue method. You can then use that value for as long as you want, it is created only once, and it is created only when needed.
For more information, see Martin Fowler's page.
Are you sure you need to cache the entire dictionary?
From what you say, it might be better to keep a Most-Recently-Used list of key-value pairs.
If the key is found in the list, just return the value.
If it is not, create the one value (which is supposedly faster than creating all of them, and using less memory too) and store it in the list, thereby removing the key-value pair that hasn't been used the longest.
Here's a very simple MRU list implementation, it might serve as inspiration:
using System.Collections.Generic;
using System.Linq;
internal sealed class MostRecentlyUsedList<T> : IEnumerable<T>
{
    private readonly List<T> items;
    private readonly int maxCount;
    public MostRecentlyUsedList(int maxCount, IEnumerable<T> initialData)
        : this(maxCount)
    {
        this.items.AddRange(initialData.Take(maxCount));
    }
    public MostRecentlyUsedList(int maxCount)
    {
        this.maxCount = maxCount;
        this.items = new List<T>(maxCount);
    }
    /// <summary>
    /// Adds an item to the top of the most recently used list.
    /// </summary>
    /// <param name="item">The item to add.</param>
    /// <returns><c>true</c> if the list was updated, <c>false</c> otherwise.</returns>
    public bool Add(T item)
    {
        int index = this.items.IndexOf(item);
        if (index != 0)
        {
            // item is not already the first in the list
            if (index > 0)
            {
                // item is in the list, but not in the first position
                this.items.RemoveAt(index);
            }
            else if (this.items.Count >= this.maxCount)
            {
                // item is not in the list, and the list is full already
                this.items.RemoveAt(this.items.Count - 1);
            }
            this.items.Insert(0, item);
            return true;
        }
        else
        {
            return false;
        }
    }
    public IEnumerator<T> GetEnumerator()
    {
        return this.items.GetEnumerator();
    }
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}
In your case, T is a key-value pair. Keep maxCount small enough that searching stays fast and memory usage does not become excessive. Call Add each time you use an item.
An application should use WeakReference as a caching mechanism if the useful lifetime of an object's presence in the cache will be comparable to reference lifetime of the object. Suppose, for example, that you have a method which will create a ReadOnlyDictionary based on deserializing a String. If a common usage pattern would be to read a string, create a dictionary, do some stuff with it, abandon it, and start again with another string, WeakReference is probably not ideal. On the other hand, if your objective is to deserialize many strings (quite a few of which will be equal) into ReadOnlyDictionary instances, it may be very useful if repeated attempts to deserialize the same string yield the same instance. Note that the savings would not just come from the fact that one only had to do the work of building the instance once, but also from the facts that (1) it would not be necessary to keep multiple instances in memory, and (2) if ReadOnlyDictionary variables refer to the same instance, they can be known to be equivalent without having to examine the instances themselves. By contrast, determining whether two distinct ReadOnlyDictionary instances were equivalent might require examining all the items in each. Code which would have to do many such comparisons could benefit from using a WeakReference cache so that variables which hold equivalent instances would usually hold the same instance.
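The "same string yields the same instance" idea could be sketched roughly as below. WeakInternCache, its factory delegate, and the string key are hypothetical names chosen for this illustration. The sketch uses the generic WeakReference<T> (available from .NET 4.5, so not in a C# 2.0 library, where the non-generic WeakReference would be needed). It is not thread-safe and never prunes entries whose targets have been collected.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache: hands out the same instance for equal keys
// for as long as something else keeps that instance alive.
public sealed class WeakInternCache<TValue> where TValue : class
{
    private readonly Dictionary<string, WeakReference<TValue>> _entries =
        new Dictionary<string, WeakReference<TValue>>();
    private readonly Func<string, TValue> _factory;

    public WeakInternCache(Func<string, TValue> factory)
    {
        if (factory == null) throw new ArgumentNullException(nameof(factory));
        _factory = factory;
    }

    public TValue GetOrCreate(string key)
    {
        WeakReference<TValue> weak;
        TValue existing;
        if (_entries.TryGetValue(key, out weak) && weak.TryGetTarget(out existing))
        {
            return existing; // still alive somewhere, so reuse it
        }

        // Either never created or already collected: build it again.
        TValue created = _factory(key);
        _entries[key] = new WeakReference<TValue>(created);
        return created;
    }
}
```

Because the cache holds only weak references, it does not by itself extend the lifetime of any value; it merely lets callers that overlap in time share one instance.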
I think you have two mechanisms you can rely on for caching, instead of developing your own. The first, as you yourself suggested, is to use a WeakReference and let the garbage collector decide when to free this memory.
You have a second mechanism - memory paging. If the dictionary is created in one swoop, it'll probably be stored in a more or less contiguous part of the heap. Just keep the dictionary alive, and let Windows page it out to the swap file if you don't need it. Depending on your usage (how random your dictionary access is), you may end up with better performance than with the WeakReference.
This second approach is problematic if you're close to your address space limits (this happens only in 32-bit processes).

C# laziness question

What's the common approach to designing applications that rely heavily on lazy evaluation in C# (LINQ, IEnumerable, IQueryable, ...)?
Right now I usually try to make every query as lazy as possible, using yield return and LINQ queries, but at runtime this often leads to "too lazy" behavior, where every query is rebuilt from the beginning each time it is used, resulting in severe visible performance degradation.
What I usually do is put ToList() calls in places to cache the data, but I suspect this approach might be incorrect.
What are the appropriate / common ways to design this sort of application from the very beginning?
I find it useful to classify each IEnumerable into one of three categories.
1) fast ones - e.g. lists and arrays
2) slow ones - e.g. database queries or heavy calculations
3) non-deterministic ones - e.g. list.Select(x => new { ... })
For category 1, I tend to keep the concrete type when appropriate: arrays, IList, etc.
For category 3, those are best to keep local within a method, to avoid hard-to find bugs.
Then we have category 2, and as always when optimizing performance, measure first to find the bottlenecks.
A few random thoughts - as the question itself is loosely defined:
Lazy is good only when the result might not be used, so that it is loaded only when needed. Most operations, however, need the data to be loaded, so laziness does not help there.
Laziness can cause difficult bugs. We have seen it all with data contexts in ORMs.
Lazy is good when it comes to MEF
Pretty broad question and unfortunately you're going to hear this a lot: It depends. Lazy-loading is great until it's not.
In general, if you're using the same IEnumerables over and over it might be best to cache them as lists.
But rarely does it make sense for your callers to know this either way. That is, if you're getting IEnumerables from a repository or something, it is best to let the repository do its job. It might cache it as a list internally or it might build it up every time. If your callers try to get too clever they might miss changes in the data, etc.
I would suggest doing a ToList in your DAL before returning the DTO
public IList<UserDTO> GetUsers()
{
    using (var db = new DbContext())
    {
        return (from u in db.tblUsers
                select new UserDTO()
                {
                    Name = u.Name
                }).ToList();
    }
}
In the example above you have to do a ToList() before the DbContext scope ends.
If you need a certain sequence of data to be cached, call one of the conversion operators (ToList, ToArray, etc.) on that sequence. Otherwise just use lazy evaluation.
Build your code around your data. What data is volatile and needs to be pulled fresh each time? Use lazy evaluation and don't cache. What data is relatively static and only needs to be pulled once? Cache that data in memory so you don't pull it unnecessarily.
Deferred execution and caching all items with .ToList() are not the only options. The third option is to cache the items while you are iterating by using a lazy List.
The execution is still deferred, but each item is yielded by the source only once. An example of how this works:
public class LazyListTest
{
    private int _count = 0;
    public void Test()
    {
        var numbers = Enumerable.Range(1, 40);
        var numbersQuery = numbers.Select(GetElement).ToLazyList(); // Cache lazy
        var total = numbersQuery.Take(3)
            .Concat(numbersQuery.Take(10))
            .Concat(numbersQuery.Take(3))
            .Sum();
        Console.WriteLine(_count);
    }
    private int GetElement(int value)
    {
        _count++;
        // Some slow stuff here...
        return value * 100;
    }
}
If you run the Test() method, the _count is only 10. Without caching it would be 16 and with .ToList() it would be 40!
An example of the implementation of LazyList can be found here.
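The linked implementation is not reproduced here. As a rough idea of how such a ToLazyList extension could work, here is a minimal sketch; the LazyList<T> and LazyListExtensions names are assumptions, this is not the implementation the answer links to, and it is not thread-safe.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Caches items as they are pulled from the source, so the source is
// enumerated at most once no matter how many times this list is iterated.
public sealed class LazyList<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> _source;
    private readonly List<T> _cache = new List<T>();
    private bool _exhausted;

    public LazyList(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        _source = source.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true)
        {
            if (index < _cache.Count)
            {
                yield return _cache[index++];   // already fetched: serve from the cache
            }
            else if (_exhausted)
            {
                yield break;                    // source has no more items
            }
            else if (_source.MoveNext())
            {
                _cache.Add(_source.Current);    // fetch exactly one more item
            }
            else
            {
                _exhausted = true;
                _source.Dispose();
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class LazyListExtensions
{
    public static LazyList<T> ToLazyList<T>(this IEnumerable<T> source)
    {
        return new LazyList<T>(source);
    }
}
```

Each enumerator keeps its own position but shares the cache, so a later Take(10) only pulls the items that an earlier Take(3) has not already fetched.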

Partially thread-safe dictionary

I have a class that maintains a private Dictionary instance that caches some data.
The class writes to the dictionary from multiple threads using a ReaderWriterLockSlim.
I want to expose the dictionary's values outside the class.
What is a thread-safe way of doing that?
Right now, I have the following:
public ReadOnlyCollection<MyClass> Values() {
    using (sync.ReadLock())
        return new ReadOnlyCollection<MyClass>(cache.Values.ToArray());
}
Is there a way to do this without copying the collection many times?
I'm using .Net 3.5 (not 4.0)
I want to expose the dictionary's values outside the class.
What is a thread-safe way of doing that?
You have three choices.
1) Make a copy of the data, hand out the copy. Pros: no worries about thread safe access to the data. Cons: Client gets a copy of out-of-date data, not fresh up-to-date data. Also, copying is expensive.
2) Hand out an object that locks the underlying collection when it is read from. You'll have to write your own read-only collection that has a reference to the lock of the "parent" collection. Design both objects carefully so that deadlocks are impossible. Pros: "just works" from the client's perspective; they get up-to-date data without having to worry about locking. Cons: More work for you.
3) Punt the problem to the client. Expose the lock, and make it a requirement that clients lock all views on the data themselves before using it. Pros: No work for you. Cons: Way more work for the client, work they might not be willing or able to do. Risk of deadlocks, etc, now become the client's problem, not your problem.
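As an illustration of choice 1, here is a minimal sketch of handing out a copy taken under a read lock. The SnapshotCache class and its Set method are hypothetical; it calls ReaderWriterLockSlim's EnterReadLock/ExitReadLock directly instead of the ReadLock() helper from the question, whose definition is not shown.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Threading;

public sealed class SnapshotCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _cache = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim _sync = new ReaderWriterLockSlim();

    public void Set(TKey key, TValue value)
    {
        _sync.EnterWriteLock();
        try
        {
            _cache[key] = value;
        }
        finally
        {
            _sync.ExitWriteLock();
        }
    }

    // Returns a copy of the values as they were at the time of the call.
    public ReadOnlyCollection<TValue> Values()
    {
        _sync.EnterReadLock();
        try
        {
            TValue[] copy = new TValue[_cache.Count];
            _cache.Values.CopyTo(copy, 0);
            return new ReadOnlyCollection<TValue>(copy);
        }
        finally
        {
            _sync.ExitReadLock();
        }
    }
}
```

The lock is held only while copying, so callers can enumerate the returned collection for as long as they like without blocking writers; the price is the copy and the fact that the data may be stale by the time it is read.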
If you want a snapshot of the current state of the dictionary, there's really nothing else you can do with this collection type. This is the same technique used by the ConcurrentDictionary<TKey, TValue>.Values property.
If you don't mind throwing an InvalidOperationException if the collection is modified while you are enumerating it, you could just return cache.Values since it's readonly (and thus can't corrupt the dictionary data).
EDIT: I personally believe the below code is technically answering your question correctly (as in, it provides a way to enumerate over the values in a collection without creating a copy). Some developers far more reputable than I strongly advise against this approach, for reasons they have explained in their edits/comments. In short: This is apparently a bad idea. Therefore I'm leaving the answer but suggesting you not use it.
Unless I'm missing something, I believe you could expose your values as an IEnumerable<MyClass> without needing to copy values by using the yield keyword:
public IEnumerable<MyClass> Values {
    get {
        using (sync.ReadLock()) {
            foreach (MyClass value in cache.Values)
                yield return value;
        }
    }
}
Be aware, however (and I'm guessing you already knew this), that this approach provides lazy evaluation, which means that the Values property as implemented above can not be treated as providing a snapshot.
In other words... well, take a look at this code (I am of course guessing as to some of the details of this class of yours):
var d = new ThreadSafeDictionary<string, string>();
// d is empty right now
IEnumerable<string> values = d.Values;
d.Add("someKey", "someValue");
// if values were a snapshot, this would output nothing...
// but in FACT, since it is lazily evaluated, it will now have
// what is CURRENTLY in d.Values ("someValue")
foreach (string s in values) {
    Console.WriteLine(s);
}
So if it's a requirement that this Values property be equivalent to a snapshot of what is in cache at the time the property is accessed, then you're going to have to make a copy.
(begin 280Z28): The following is an example of how someone unfamiliar with the "C# way of doing things" could leave the lock held:
IEnumerator<MyClass> enumerator = obj.Values.GetEnumerator();
MyClass first = null;
if (enumerator.MoveNext())
    first = enumerator.Current;
// The enumerator is never disposed, so the read lock taken inside Values is never released.
(end 280Z28)
Consider another possibility: expose only the ICollection interface, so that Values() can return your own implementation. That implementation would hold only a reference to Dictionary.Values and would always take the read lock when accessing items.
