What, exactly, do MemoryCache's memory limits mean? - c#

System.Runtime.Caching.MemoryCache is a class in the .NET Framework (version 4+) that caches objects in-memory, using strings as keys. Beyond what a plain System.Collections.Generic.Dictionary<string, object> gives you, this class has all kinds of bells and whistles that let you configure how big the cache can grow (in either absolute or relative terms), set different expiration policies for different cache items, and so much more.
My questions relate to the memory limits. None of the docs on MSDN seem to explain this satisfactorily, and the code on Reference Source is fairly opaque. Sorry about piling all of this into one SO "question", but I can't figure out how to split some of these out into their own questions, because they're really just different views of one overall question: "how do you reconcile idiomatic C#/.NET with the notion of a generally useful in-memory cache that has configurable memory limits and is implemented nearly entirely in managed code?"
Do key sizes count towards the space that the MemoryCache is considered to take up? What about keys in the intern pool, each of which should only add the size of an object reference to the size of the cache?
Does MemoryCache consider more than just the size of the object references that it stores when determining the size of the object being stored in the pool? I mean... it has to, right? Otherwise, the configuration options are extremely misleading for the common-case... for the remaining questions, I'm going to assume that it does.
Given that MemoryCache almost certainly considers more than the size of the object references of the values stored in the cache, how deep does it go?
If I were implementing something like this, I would find it very difficult to consider the memory usage of the "child" members of individual objects, without also pulling in "parent" reference properties.
e.g., imagine a class in a game application, Player. Player has some player-specific state that's encapsulated in a public PlayerStateData PlayerState { get; } property that encapsulates what direction the player is looking, how many sprockets they're holding, etc., as well as a reference to the entire game's state public GameStateData GameState { get; } that can be used to get back to the game's (much larger) state from a method that only knows about a player.
Does MemoryCache consider both PlayerState and GameState when considering the size of the contribution to the cache?
Maybe it's more like "what's the total size on the managed heap taken up by the objects directly stored in the cache, and everything that's reachable through members of those objects"?
It seems like it would be silly to multiply the size of GameState's contribution to the limit by 5 just because 5 players are cached... but then again, a likely implementation might do just that, and it's difficult to count PlayerState without counting GameState.
If an object is stored multiple times in the MemoryCache, does each entry count separately towards the limit?
Related to the previous one, if an object is stored directly in the MemoryCache, but also indirectly through another object's members, what impact does either one have on the memory limit?
If an object is stored in the MemoryCache, but also referenced by some other live objects completely disconnected from the MemoryCache, which objects count against the memory limit? What about if it's an array of objects, some (but not all) of which have incoming external references?
My own research led me to SRef.cs, which I gave up on trying to understand after getting here, which later leads here. I'm guessing the answers to all these questions revolve around finding and meditating on the code that ultimately populates the Int64 stored in that handle.

I know this is late, but I've done a lot of digging in the source code to try to understand what is going on, and I have a fairly good idea now. I will say that MemoryCache is the worst-documented class on MSDN, which kind of baffles me for something intended to be used by people trying to optimize their applications.
MemoryCache uses a special "sized reference" to measure the size of objects. It all looks like a giant hack in the memory cache source code involving reflection to wrap an internal type called "System.SizedReference", which from what I can tell causes the GC to set the size of the object graph it points to during gen 2 collections.
From my testing, this WILL include the size of parent objects, and thus all child objects referenced by the parent, etc. BUT I've found that if you make references to parent objects weak references (i.e. via WeakReference or WeakReference<>), they are no longer counted as part of the object graph, so that is what I do for all cache objects now.
I believe cache objects need to be completely self-contained or use weak references to other objects for the memory limit to work at all.
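For example, here is a minimal sketch of that pattern, reusing the Player/GameState types from the question (the type names and members are illustrative, not anything MemoryCache requires):

// Illustrative stub types taken from the question above.
class PlayerStateData { /* facing direction, sprocket count, ... */ }
class GameStateData { /* the entire game's state */ }

// The cached item holds its own state strongly, but the back-reference to the
// (much larger) game state is a WeakReference, so that graph should not be
// charged against this cache entry when the sized reference is measured.
class CachedPlayer
{
    public PlayerStateData PlayerState { get; set; }
    public WeakReference<GameStateData> GameState { get; set; }
}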
If you want to play with it yourself, just copy the code from SRef.cs, create an object graph and point a new SRef instance to it, and then call GC.Collect. After the collection the approximate size will be set to the size of the object graph.
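Something along these lines works as a rough sketch (the internal type name "System.SizedReference" and its "ApproximateSize" property are what SRef.cs wraps via reflection, but since they are internal, the details could change between framework versions, and this needs full trust):

using System;
using System.Reflection;

// Build an object graph to measure.
var graph = new byte[10 * 1024 * 1024];

// Wrap it in the internal System.SizedReference type, the same way SRef.cs does.
Type sizedRefType = Type.GetType("System.SizedReference", throwOnError: true);
object sizedRef = Activator.CreateInstance(
    sizedRefType,
    BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.CreateInstance,
    null, new object[] { graph }, null);

// The size is computed by the GC during a gen 2 collection.
GC.Collect();

long approximateSize = (long)sizedRefType.InvokeMember(
    "ApproximateSize",
    BindingFlags.GetProperty | BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic,
    null, sizedRef, null);
Console.WriteLine(approximateSize);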

Related

Best practise referencing objects

The question itself is not directly related to XNA, but the issue I'm having is. I am interested in the performance effect of passing objects to methods/functions. For instance, in XNA I often see code passing the complete Game1 object when only a specific value or object is needed, like the GraphicsDevice, or just the Viewport lying much deeper in the hierarchy. I always pass the specific value or object because I think this is best practice, though it often means I have to go back and add another parameter when it's needed.
So what is best practice for passing values and objects to methods/functions? Does it matter much, or is it just some kind of pointer either way, so that a pointer to a simple int and a pointer to a huge object would cost the same? Things get more obvious when this gets stored in another class as a property; then another memory block needs to be reserved for the property, right?
'Referencing' an object for the sake of accessing only a small portion of its properties/methods is not good practice. It is much better to expose the required functionality, make it static, and then do something like:
// Calling a static method
Game1.DoSomething();
// Accessing a static property
int test = Game1.MyInt;
I think the reason you are seeing XNA code samples referencing the whole object is that a number of XNA programmers do not have object-oriented development backgrounds, compared to straight-up C# developers, and are thus unaware of best practices for using the language efficiently.
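If you would rather not make things static, the alternative the question leans toward is simply to pass only what the method needs. A rough sketch using the XNA types mentioned above (the method and variable names are made up):

using Microsoft.Xna.Framework.Graphics;

// Takes only the viewport it needs rather than the whole Game1 object.
static void DrawHud(Viewport viewport)
{
    int width = viewport.Width;   // use just the values required
    // ...
}

// Call site, somewhere that already has a GraphicsDevice:
// DrawHud(graphicsDevice.Viewport);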

Why doesn't the .NET framework provide a method to deep copy objects? [closed]

I was using a custom method to deep clone objects the other day, and I know you can deep clone in different ways (reflection, binary serialization, etc), so I was just wondering:
What is/are the reason(s) that Microsoft does not include a deep copy method in the framework?
The problem is substantially harder than you seem to realize, at least in the general case.
For starters, a copy isn't just deep or shallow, it's a spectrum.
Let's imagine for a second that we have a list of arrays of strings, and we want to make a copy of it.
We start out at the shallowest level: we just copy the reference of the whole thing to another variable. Any changes to the list referenced from either variable are seen by the other.
So now we go and create a brand new list to give to the second variable. For each item in the first list, we add it to the second list. Now we can modify the list referenced from either variable without it being seen by the other one. But what if we grab the first array from one of the lists and replace one of the strings inside it? That change will be seen through both lists!
Now we're going through and creating a new list, and for each array in the first list we're creating a new array, adding each of the strings in the underlying array to the new array, and adding each of those new arrays to the new list. Now we can mutate any of the arrays in either list without seeing the changes. But wait, both lists are still referencing the same strings (which are reference types after all; internally they hold a character array for their data). What if some mean person were to come along and mutate one of the strings (using unsafe code you could actually do this)! So now you're copying all of the strings with a deep copy. But what if we don't need to do that? What if we know that nobody is so mean that they would mutate the string? Or, for that matter, what if we know that none of the arrays will be mutated (or that if they will be, the changes are supposed to be reflected in both lists)?
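To make the spectrum concrete, here is a small sketch of those levels for a List<string[]> (assuming System.Collections.Generic and System.Linq are imported; the variable names are illustrative):

var original = new List<string[]> { new[] { "a", "b" }, new[] { "c" } };

// Shallowest: copy only the reference; both variables see every change.
var alias = original;

// Copy the list but share the arrays: adding/removing items is now independent,
// but mutating an element of an array shows up through both lists.
var listCopy = new List<string[]>(original);

// Copy the list and the arrays: element mutations are independent too
// (the strings themselves are still shared).
var deepestCopy = original.Select(arr => (string[])arr.Clone()).ToList();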
Then of course there are problems such as circular references, or fields in a class that don't really represent its state (i.e. possibly cached values or derived data that could just be re-calculated as needed by a clone).
Realistically you'd need to have every type implement ICloneable or some equivalent, and have its own custom code for how to clone itself. This would be a lot of work to maintain for a language, especially since there are so many ways that complex objects could possibly be cloned. The cost would be quite high, and the benefits (outside of a handful of objects for which it is deemed worthwhile to implement clone methods) are generally not worth it. You, as the programmer, can write your own logic for cloning a type based on how deep you know you need to go.
It's similar to how it works (or doesn't work) in C and C++:
To do a deep copy, you actually have to know how different data is interpreted. In trivial cases, a shallow copy (which is provided) is the same as a deep copy. But once this is no longer true, it really depends on the implementation and interpretation. There's no general rule of thumb.
Let's use a game as a simple example:
An NPC object has two integers as members. One integer represents its health points, the other one is its unique ID.
If you clone the NPC, you have to keep the amount of health, while changing the unique ID. This is something the compiler/runtime can't determine on their own. You have to code this, essentially telling the program "how to copy".
I can think of two possible solutions:
Add a keyword to denote things that can't be copied. While this sounds like a good idea, it doesn't really solve the issue. You can tell the compiler that UniqueID must not be copied, but at the same time you can't define how the copy should happen instead. And even if you could, you could just...
Create a copy constructor (C++) or a method to copy/clone the object (C#, e.g. CopyTo()).
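A minimal sketch of that second option for the NPC example above (the names and the ID-generation scheme are assumptions, just to show where the hand-written knowledge lives):

class Npc
{
    private static int _nextId;

    public int UniqueId { get; private set; }
    public int HealthPoints { get; set; }

    public Npc(int healthPoints)
    {
        HealthPoints = healthPoints;
        UniqueId = ++_nextId;   // every NPC, cloned or not, gets a fresh ID
    }

    // Only the author of the type knows that health is copied but the ID is not;
    // no general-purpose deep copy could infer that.
    public Npc Clone()
    {
        return new Npc(HealthPoints);
    }
}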
Hmm.. My view is that:
A) because you very rarely want the copy to be truly deep
B) because the framework cannot guarantee to know how to truly and meaningfully CLONE an object
C) because implementing deep-cloning in a naive way is simple and takes one method and several lines of code using reflection and recursion
but I'll try to find an old MSDN article that covered that
edit: I haven't found it :( I'm still sure that I saw it somewhere, but I cannot google it out now.. However, here are some useful links about the related ICloneable and such:
http://blogs.msdn.com/b/brada/archive/2004/05/03/125427.aspx
http://blogs.msdn.com/b/mrtechnocal/archive/2009/10/19/why-not-icloneable-t.aspx
https://stackoverflow.com/a/3712564/717732
So, as I've not found the author's words, let me expand the points:
A: because you very rarely want the copy to be truly deep
You see, how can the framework guess, in general, how deep the copy should be? Let's assume "completely deep" and assume it has been implemented. Now we have memberwise-clone and total-clone methods. Still, there are cases when people will need clone-me-but-not-the-root-base. So they post more questions asking why total-clone has no way of cutting off the raw base. Or the second-to-raw. Etc. Providing deep-clone solves almost nothing from the .NET team's point of view, as we, the users, will still rant about it just because we see some partial tools, are lazy, and want to have everything :)
B) because the framework cannot guarantee to know how to truly and meaningfully CLONE an object
Especially with special objects holding handles or native-like IDs, like those from Entity Framework, .NET Remoting proxies, COM wrappers, etc.: you might successfully read and clone the upper layers of the class hierarchy, but eventually, somewhere below, you find some arcane thingies like IntPtrs that you just know you should not copy. Most of the time. But sometimes you can. But the framework's code must be universal. Deep-cloning would either have to be heavily complicated with many sanity checks against special-looking class members, or it would produce dangerous results if the programmer invoked it on something whose base classes the programmer did not care to analyze.
B+) Also, please note that the more base classes you have in your tree, the more probable it is that they will have some parameterized constructors, which might indicate that direct copying is not a good idea. Directly copiable classes usually have parameterless constructors and all the copiable data accessible through properties.
B++) From the framework designers' point of view, considering memory and speed, shallow copying is almost always very fast, while deep copying is just the opposite. It is beneficial to the framework's and platform's reputation NOT to let developers freely deep-copy huge objects. Anyway, would you need a deep copy if your object were lightweight and simple, huh? :) Not providing a deep copy encourages developers to think around the need for one, which usually makes the application lighter and faster.
C) because implementing deep-cloning in a naive way is simple and takes one method and several lines of code using reflection and recursion
Given a shallow copy, how hard is it to actually write a deep copy? Not so hard! Just implement a method that is given an object 'obj':
pseudocode:
object deepcopier(object obj)
    newobject = obj.shallowcopy()
    foreach (field in newobject.fields)
        newobject.field = deepcopier(newobject.field)
    return newobject
and, well, that's all. Of course the field enumeration must be performed via reflection, and so must the reading and writing of the fields.
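In C#, that naive version might look roughly like this (a sketch only: it treats strings and value types as leaves, ignores array elements, and, as discussed next, handles neither shared references nor cycles):

using System;
using System.Reflection;

static object DeepCopier(object obj)
{
    if (obj == null || obj is string || obj.GetType().IsValueType)
        return obj;   // treat null, strings and value types as leaves

    // MemberwiseClone is protected, so invoke it via reflection to get the shallow copy.
    MethodInfo memberwiseClone = typeof(object).GetMethod(
        "MemberwiseClone", BindingFlags.Instance | BindingFlags.NonPublic);
    object copy = memberwiseClone.Invoke(obj, null);

    // Recurse into every field and replace it with a deep copy.
    foreach (FieldInfo field in obj.GetType().GetFields(
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
    {
        field.SetValue(copy, DeepCopier(field.GetValue(obj)));
    }
    return copy;
}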
However, this approach is very naive and has a serious flaw: what if some object has two fields that point to the same object? We should detect that, do the cloning once, and then assign both fields to that single clone. Also, if an object pointed to by one field has a reference to an object that is also pointed to by another object (...), that may also need to be tracked and cloned only once. And how about cycles? If, somewhere deep in the tree, an object has a reference back to the root, an algorithm like the one above would happily descend, re-copy everything again and again, and eventually choke with a StackOverflowException.
This makes cloning quite hard to track, and it starts to look more like serialization. In fact, if your class is a DataContract or Serializable, you can simply serialize it and deserialize it to get a perfect deep copy :)
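For instance, a sketch of that serialize-and-deserialize round trip using DataContractSerializer (assuming T is a [DataContract] or [Serializable] type that really is safe to copy this way):

using System.IO;
using System.Runtime.Serialization;

static T DeepCopyViaSerialization<T>(T source)
{
    var serializer = new DataContractSerializer(typeof(T));
    using (var stream = new MemoryStream())
    {
        serializer.WriteObject(stream, source);   // serialize the whole graph
        stream.Position = 0;
        return (T)serializer.ReadObject(stream);  // deserialize into a fresh copy
    }
}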
Deep-cloning is hard to do in a universal way unless you know what the object means, what all its fields mean, and which ones should really be cloned versus shared. If you, as the developer, know that this is just a data object that is perfectly safe to deep-clone, then why not just make it Serializable? If you can't make it Serializable, then you probably can't deep-clone it either!

Pass a big existing object or create and pass a small object

Suppose I have a big object that contains the properties I require and additionally several methods and a few sizeable collections.
I would like to know which would cost more: passing this big object, which already exists, as an argument, or creating a small object containing only the handful of properties I require and passing that?
If you're just passing it as an argument to a method, passing the "big" object is "cheaper" - you'll only be passing a copy of the object reference, which is just the size of a pointer (unless the object is of a struct type, in which case the whole "object" is copied onto the stack). If you were to create a new object, even if it's small, you'd be paying the price of the allocation and of copying the properties onto it, which you don't pay if you pass the "large" object.
If you're passing it in a way that it needs to be serialized (i.e., across applications, in a call to a web service, etc.), then having the smaller object (essentially a DTO, Data Transfer Object) is preferred.
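A quick sketch of the two situations (BigObject and CustomerDto are hypothetical names, assuming System.Collections.Generic is imported):

using System.Collections.Generic;

// Hypothetical "big" type with a few sizeable collections and several methods.
class BigObject
{
    public string Name { get; set; }
    public string Email { get; set; }
    public List<object> SizeableCollection { get; set; }
    // ... more members and methods
}

// In-process call: passing the big object copies only a pointer-sized reference.
static void ProcessLocally(BigObject source)
{
    // read the one or two properties you actually need
}

// Across a serialization boundary: copy just what's needed into a small DTO.
class CustomerDto
{
    public string Name { get; set; }
    public string Email { get; set; }
}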
As long as you pass by reference it doesn't matter. Therefore you shouldn't introduce additional complexity. In fact, it would cost the machine more, since it then would have to create that container object as well.
Since you seem concerned with performance, I'd recommend learning how pointers and memory management work in C. Once you understand that, you will have a much easier time understanding how their abstracted versions in higher-level languages impact performance.

Does it "save memory" to put methods in one Big class? (Many objects)

I'm working on a game for mobile platforms where memory is always a concern.
I'm using an abstract base class, Enemy. All other enemies will be variants of this class.
My initial intent is to store methods for updating the enemy in the base class, and then store the specific behaviors in each child enemy class.
My question is this: with possibly hundreds of enemies per load (or level), will I save memory by writing all of my behaviors in one big class which each enemy refers to? In other words, does having hundreds of enemies, each of which has a boatload of behavior code, require much more memory than storing everything in a single reference class?
Initial Idea:
enemy.Update()
Memory saving idea:
//Static class is named EnemyBehavior
EnemyBehavior.UpdateEnemy(enemy)
Although each instance of your class has its own private, memory-consuming copies of its variables, it does not have separate copies of its method definitions. Methods consume memory per-class, not per-instance. That only makes sense, since it's not possible for a class instance to change the definition of a method. So, sorry, but your idea won't save any memory.
That's not quite how memory works. The heap will need to store data and references, not the actual code. As such, you don't save anything by moving 'common code' into a static class - having the code on your object instances doesn't mean it is physically copied into every instance.
What you're suggesting goes against the whole reason for object oriented programming, which is encapsulation of behaviour and separation of concerns.
tl;dr - keep the code with the class itself.

C# reference collection for storing reference types

I'd like to implement a collection (something like List<T>) which would hold all the objects that I have created in the entire life span of my application, as if it were an array of pointers in C++. The idea is that when my process starts I can use a central factory to create all objects and then periodically validate/invalidate their state. Basically I want to make sure that my process only deals with valid instances and that I don't re-fetch information I already fetched from the database. So all my objects would basically be in one place - my collection. A cool thing I can do with this is avoid database calls to get data from the database if I already got it (even if I updated it after retrieval, it's still up-to-date, assuming some other process didn't update it, but that's a different concern). I don't want to be calling new Customer("James Thomas"); again if I initted James Thomas already sometime in the past. Currently I will end up with multiple copies of the same object across the appdomain - some out of sync, others in sync - and even though I deal with this using a timestamp field on the MSSQL server, I'd like to keep only one copy per customer in my appdomain (per process would be even better, if possible).
I can't use regular collections like List or ArrayList, for example, because I cannot pass parameters by their real local reference to their existing Add() methods (where I'm creating them) using ref, so that's not too good, I think. So how can this be implemented, and can it be implemented at all? A 'linked list' type of class with all methods working with ref and out params is what I'm thinking of now, but it may get ugly pretty quickly. Is there another way to implement such a collection, like RefList<T>.Add(ref T obj)?
So the bottom line is: I don't want to re-create an object if I've already created it before during the application's lifetime, unless I decide to re-create it explicitly (maybe it's out of date or something, so I have to fetch it again from the db). Are there alternatives, maybe?
The easiest way to do what you're trying to accomplish is to create a wrapper that holds on to the list. This wrapper will have an Add method which takes in a ref. In the Add, it looks up the value in the list and creates it when it can't find the value. Or use a cache.
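A rough sketch of that kind of wrapper (Customer is a placeholder type here; instead of ref parameters it just returns the single shared instance per key):

using System;
using System.Collections.Generic;

class Customer { public string Name { get; set; } }   // placeholder type

class CustomerRegistry
{
    private readonly Dictionary<string, Customer> _items = new Dictionary<string, Customer>();

    // Returns the existing instance for the key, creating it once if missing,
    // so the rest of the appdomain keeps working with the same object.
    public Customer GetOrCreate(string key, Func<string, Customer> factory)
    {
        Customer existing;
        if (!_items.TryGetValue(key, out existing))
        {
            existing = factory(key);
            _items[key] = existing;
        }
        return existing;
    }
}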
But... this statement would make me worry.
I don't want re-create an object if
I've already created it before during
the entire application life
But, as Raymond Chen points out, "a cache with a bad policy is another name for a memory leak." What you've described is a cache with no policy.
To fix this, for a non-web app you should consider using either System.Runtime.Caching (for .NET 4.0) or, for 3.5 and earlier, the Enterprise Library Caching Block. If this is a web app, then you can use System.Web.Caching. Or, if you must roll your own, at least get a sensible policy in place.
All of this of course assumes that your database's caching is insufficient.
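For example, a minimal sketch using System.Runtime.Caching's MemoryCache with a sliding-expiration policy (Customer and LoadCustomerFromDb are hypothetical placeholders):

using System;
using System.Runtime.Caching;

static Customer GetCustomer(string customerId)
{
    var cache = MemoryCache.Default;
    var customer = cache.Get(customerId) as Customer;
    if (customer == null)
    {
        customer = LoadCustomerFromDb(customerId);   // hypothetical database call
        cache.Set(customerId, customer,
            new CacheItemPolicy { SlidingExpiration = TimeSpan.FromMinutes(10) });
    }
    return customer;
}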
Using IoC will save you many, many bugs, and it will make your application easier to test and your modules less coupled.
IoC performance is pretty good.
I recommend using the Castle Project's implementation:
http://stw.castleproject.org/Windsor.MainPage.ashx
maybe you'll need a day to learn it, but it's great.
