Realm: how to close and read - C#

I am running into the following error: Realms.Exceptions.RealmClosedException
when I try to read my list of Realm objects after retrieving them. Closing the realm is my desired behavior: my logs showed ~2K crash reports with Realm OOM exceptions, so I decided to experiment with wrapping my call to Realm in a using statement, like so:
List<RealmDomain.User> users;
using (var realm = Realm.GetInstance(config))
{
    users = realm.All<RealmDomain.User>().ToList();
}
return users;
I then try to work with this data as follows (and that is when the exception is raised):
allUsers.FirstOrDefault(x => x.PortalUserId == id.ToString());
allUsers here is the variable holding the data returned by the block above. So I am wondering: what is the correct way to dispose of the realm to ensure we don't run into OOM exceptions, and how do I read the data from that source even after it has been closed?
Edit: The block which returns the users lives in my UserRepository; the application implements the Unit of Work pattern (to access our Realm DB), which is then accessed via DI.
Edit 2: A follow-up question: should I wrap my Realm calls in a using statement only for CUD operations (Create, Update, Delete), while reads are left unwrapped, letting the GC dispose of the Realm instance later as needed?

There are actually multiple points in your question.
Disposing
The way that you dispose your realm really depends on whether you are on the main thread or on a background thread.
If you are on a main thread then you should not dispose your realm, and it would make sense to either call Realm.GetInstance() when you need it, or to initialize the realm as a singleton, like in the following snippet:
public class MyViewModel
{
    private readonly Realm realm;

    public MyViewModel()
    {
        this.realm = Realm.GetInstance(); //or this.realm = RealmProvider.Singleton
    }
}
If you are on a background thread then you definitely need to dispose of the realm. In this case the using statement is probably the easiest thing to do:
public async Task OnBackgroundThread()
{
    using (var realm = Realm.GetInstance()) //From C# 8 you don't even need the braces
    {
        //do work with realm
    }
}
You need to dispose of realms on background threads, otherwise the realm file on disk will keep growing. You can read more about why in this answer.
ToList()
Using ToList() on realm.All() is probably not a good idea, as you will load all the objects into memory, losing the "laziness" of access. If what you need to do is find a certain object, you can use a query like:
realm.All<User>().Where(x => x.PortalUserId == id.ToString()).First();
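If the caller genuinely needs the data after the realm is closed (as in the question), one option is to copy the fields you need into plain objects before disposing. A minimal sketch, assuming a hypothetical UserDto class and a Name property on the Realm object:

```csharp
public class UserDto
{
    public string PortalUserId { get; set; }
    public string Name { get; set; } // assumed property, for illustration
}

public List<UserDto> GetUsers(RealmConfiguration config)
{
    using (var realm = Realm.GetInstance(config))
    {
        // Materialize detached copies while the realm is still open;
        // these plain objects stay valid after Dispose.
        return realm.All<RealmDomain.User>()
                    .ToList()
                    .Select(u => new UserDto
                    {
                        PortalUserId = u.PortalUserId,
                        Name = u.Name
                    })
                    .ToList();
    }
}
```

Note this deliberately trades away Realm's laziness and live objects, so it only makes sense for small result sets.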
Repository
Realm does not really work well with the repository pattern. If you really want to use the repository pattern, then you will lose some of the advantages of working with Realm, like the fact that the objects and collections are live and auto-updating. Besides, this is all made even more complicated by the use of background threads, because of the threading considerations described above.

Related

C# lock based on class property

I've seen many examples of lock usage, and it's usually something like this:
private static readonly object obj = new object();

lock (obj)
{
    // code here
}
Is it possible to lock based on a property of a class? I don't want to lock globally for every call to the method containing the lock statement; I'd like to lock only if the object passed as an argument has the same property value as another object that is currently being processed.
Is that possible? Does that make sense at all?
This is what I had in mind:
public class GmailController : Controller
{
    private static readonly ConcurrentQueue<PushRequest> queue = new ConcurrentQueue<PushRequest>();

    [HttpPost]
    public IActionResult ProcessPushNotification(PushRequest push)
    {
        var existingPush = queue.FirstOrDefault(q => q.Matches(push));
        if (existingPush == null)
        {
            queue.Enqueue(push);
            existingPush = push;
        }
        try
        {
            // lock if there is an existing push in the
            // queue that matches the requested one
            lock (existingPush)
            {
                // process the push notification
            }
        }
        finally
        {
            queue.TryDequeue(out existingPush);
        }
    }
}
Background: I have an API where I receive push notifications from Gmail's API when our users send/receive emails. However, if someone sends a message to two users at the same time, I get two push notifications. My first idea was querying the database before inserting (based on subject, sender, etc). In some rare cases, the query of the second call is made before the SaveChanges of the previous call, so I end up having duplicates.
I know that if I ever wanted to scale out, lock would become useless. I also know I could just create a job to check recent entries and eliminate duplicates, but I was trying something different. Any suggestions are welcome.
Let me first make sure I understand the proposal. The problem given is that we have some resource shared to multiple threads, call it database, and it admits two operations: Read(Context) and Write(Context). The proposal is to have lock granularity based on a property of the context. That is:
void MyRead(Context c)
{
    lock (c.P) { database.Read(c); }
}

void MyWrite(Context c)
{
    lock (c.P) { database.Write(c); }
}
So now if we have a call to MyRead where the context property has value X, and a call to MyWrite where the context property has value Y, and the two calls are racing on two different threads, they are not serialized. However, if we have, say, two calls to MyWrite and a call to MyRead, and in all of them the context property has value Z, those calls are serialized.
Is this possible? Yes. That doesn't make it a good idea. As implemented above, this is a bad idea and you shouldn't do it.
It is instructive to learn why it is a bad idea.
First, this simply fails if the property is a value type, like an integer. You might think, well, my context is an ID number, that's an integer, and I want to serialize all accesses to the database using ID number 123, and serialize all accesses using ID number 345, but not serialize those accesses with respect to each other. Locks only work on reference types, and boxing a value type always gives you a freshly allocated box, so the lock would never be contested even if the ids were the same. It would be completely broken.
Second, it fails badly if the property is a string. Locks are logically "compared" by reference, not by value. With boxed integers, you always get different references. With strings, you sometimes get different references! (Because of interning being applied inconsistently.) You could be in a situation where you are locking on "ABC" and sometimes another lock on "ABC" waits, and sometimes it does not!
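The reference-identity behaviour described above can be seen in a few lines (a standalone sketch, independent of the locking code):

```csharp
using System;

class LockIdentityDemo
{
    static void Main()
    {
        // Boxing the same int twice produces two distinct objects, so a
        // lock on one box would never block a lock on the other.
        object boxA = 123;
        object boxB = 123;
        Console.WriteLine(ReferenceEquals(boxA, boxB)); // False

        // String literals are interned: same reference...
        string s1 = "ABC";
        string s2 = "ABC";
        Console.WriteLine(ReferenceEquals(s1, s2)); // True

        // ...but an equal string built at runtime is a different object,
        // so locking on it would not contend with a lock on the literal.
        string s3 = string.Concat("AB", "C");
        Console.WriteLine(ReferenceEquals(s1, s3)); // False
    }
}
```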
But the fundamental rule that is broken is: you must never lock on an object unless that object has been specifically designed to be a lock object, and the same code which controls access to the locked resource controls access to the lock object.
The problem here is not "local" to the lock but rather global. Suppose your property is a Frob where Frob is a reference type. You don't know if any other code in your process is also locking on that same Frob, and therefore you don't know what lock ordering constraints are necessary to prevent deadlocks. Whether a program deadlocks or not is a global property of a program. Just like you can build a hollow house out of solid bricks, you can build a deadlocking program out of a collection of locks that are individually correct. By ensuring that every lock is only taken out on a private object that you control, you ensure that no one else is ever locking on one of your objects, and therefore the analysis of whether your program contains a deadlock becomes simpler.
Note that I said "simpler" and not "simple". It reduces it to almost impossible to get correct, from literally impossible to get correct.
So if you were hell bent on doing this, what would be the right way to do it?
The right way would be to implement a new service: a lock object provider. LockProvider<T> needs to be able to hash and compare for equality two Ts. The service it provides is: you tell it that you want a lock object for a particular value of T, and it gives you back the canonical lock object for that T. When you're done, you say you're done. The provider keeps a reference count of how many times it has handed out a lock object and how many times it got it back, and deletes it from its dictionary when the count goes to zero, so that we don't have a memory leak.
Obviously the lock provider needs to be threadsafe and needs to be extremely low contention, because it is a mechanism designed to prevent contention, so it had better not cause any! If this is the road you intend to go down, you need to get an expert on C# threading to design and implement this object. It is very easy to get this wrong. As I have noted in comments to your post, you are attempting to use a concurrent queue as a sort of poor lock provider and it is a mass of race condition bugs.
This is some of the hardest code to get correct in all of .NET programming. I have been a .NET programmer for almost 20 years and implemented parts of the compiler and I do not consider myself competent to get this stuff right. Seek the help of an actual expert.
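For illustration only, here is a rough sketch of the lock-provider idea described above: one canonical lock object per key, reference-counted so entries are removed when unused. Per the answer's warning, treat this as a starting point for expert review, not production code:

```csharp
using System.Collections.Generic;

public sealed class LockProvider<T>
{
    private sealed class Entry
    {
        public readonly object LockObject = new object();
        public int RefCount;
    }

    private readonly object _sync = new object();
    private readonly Dictionary<T, Entry> _entries = new Dictionary<T, Entry>();

    // Returns the canonical lock object for this key and bumps its refcount.
    public object Acquire(T key)
    {
        lock (_sync)
        {
            if (!_entries.TryGetValue(key, out var entry))
            {
                entry = new Entry();
                _entries.Add(key, entry);
            }
            entry.RefCount++;
            return entry.LockObject;
        }
    }

    // Declares the caller done with the key; the entry is deleted when the
    // count reaches zero, so the dictionary does not leak.
    public void Release(T key)
    {
        lock (_sync)
        {
            if (_entries.TryGetValue(key, out var entry) && --entry.RefCount == 0)
            {
                _entries.Remove(key);
            }
        }
    }
}
```

A caller would pair these in a try/finally: acquire, lock on the returned object, do the work, then release in the finally block.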
Although I find Eric Lippert's answer fantastic and marked it as the correct one (and I won't change that), his thoughts made me think and I wanted to share an alternative solution I found to this problem (and I'd appreciate any feedbacks), even though I'm not going to use it as I ended up using Azure functions with my code (so this wouldn't make sense), and a cron job to detected and eliminate possible duplicates.
public class UserScopeLocker : IDisposable
{
    private static readonly object _obj = new object();
    private static ICollection<string> UserQueue = new HashSet<string>();
    private readonly string _userId;

    protected UserScopeLocker(string userId)
    {
        this._userId = userId;
    }

    public static UserScopeLocker Acquire(string userId)
    {
        while (true)
        {
            lock (_obj)
            {
                if (UserQueue.Contains(userId))
                {
                    continue;
                }
                UserQueue.Add(userId);
                return new UserScopeLocker(userId);
            }
        }
    }

    public void Dispose()
    {
        lock (_obj)
        {
            UserQueue.Remove(this._userId);
        }
    }
}
...then you would use it like this:
[HttpPost]
public IActionResult ProcessPushNotification(PushRequest push)
{
    using (var scope = UserScopeLocker.Acquire(push.UserId))
    {
        // process the push notification
        // two threads can't enter here for the same UserId
        // the second one will be blocked until the first disposes
    }
}
The idea is:
UserScopeLocker has a protected constructor, ensuring you call Acquire.
_obj is private static readonly, only the UserScopeLocker can lock this object.
_userId is a private readonly field, ensuring even its own class can't change its value.
lock is done when checking, adding and removing, so two threads can't compete on these actions.
Possible flaws I detected:
Since UserScopeLocker relies on IDisposable to release some UserId, I can't guarantee the caller will properly use using statement (or manually dispose the scope object).
I can't guarantee the scope won't be used in a recursive function (thus possibly causing a deadlock).
I can't guarantee the code inside the using statement won't call another function which also tries to acquire a scope to the user (this would also cause a deadlock).
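Another sketch of the same idea that avoids the busy-wait in Acquire: keep one SemaphoreSlim per user id in a ConcurrentDictionary and block on it instead of spinning. (Entries are never removed here, which is only acceptable if the set of user ids is bounded; this shares the same disposal and re-entrancy caveats listed above.)

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class UserLocks
{
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> Semaphores =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public static IDisposable Acquire(string userId)
    {
        var sem = Semaphores.GetOrAdd(userId, _ => new SemaphoreSlim(1, 1));
        sem.Wait(); // blocks the second caller instead of spinning
        return new Releaser(sem);
    }

    private sealed class Releaser : IDisposable
    {
        private readonly SemaphoreSlim _sem;
        public Releaser(SemaphoreSlim sem) { _sem = sem; }
        public void Dispose() { _sem.Release(); }
    }
}
```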

Entity Framework, DBContext and using() + async?

There is a thing that's been bugging me for a long time about Entity Framework.
Last year I wrote a big application for a client using EF. And during the development everything worked great.
We shipped the system in August. But after some weeks I started to see weird memory leaks on the production server. My ASP.NET MVC 4 process was taking up all the resources of the machine after a couple of days running (8 GB). This was not good. I searched around on the net and saw that you should surround all your EF queries and operations in a using() block so that the context can be disposed.
In a day I refactored all my code to use using() and this solved my problems, since then the process sits on a steady memory usage.
The reason I didn't surround my queries in the first place, however, is that I started my first controllers and repositories from Microsoft's own scaffolds included in Visual Studio; these did not surround their queries with using, instead they had the DbContext as an instance variable of the controller itself.
First of all: if it's really important to dispose of the context (which wouldn't be strange, since the DbConnection needs to be closed and so on), Microsoft should maybe show this in all their examples!
Now, I have started working on a new big project with all my learnings in the back of my head, and I've been trying out the new features of .NET 4.5 and EF 6: async and await. EF 6.0 has all these async methods (e.g. SaveChangesAsync, ToListAsync, and so on).
public Task<tblLanguage> Post(tblLanguage language)
{
    using (var langRepo = new TblLanguageRepository(new Entities()))
    {
        return langRepo.Add(RequestOrganizationTypeEnum, language);
    }
}
In the TblLanguageRepository class:
public async Task<tblLanguage> Add(OrganizationTypeEnum requestOrganizationTypeEnum, tblLanguage language)
{
    ...
    await Context.SaveChangesAsync();
    return languageDb;
}
However, when I now surround my statements in a using() block, I get a "DbContext was disposed" exception before the query has been able to return. This is expected behaviour: the query runs async and the using block finishes ahead of the query. But how should I dispose of my context properly while using the async and await features of EF 6?
Please point me in the right direction.
Is using() needed in EF 6? Why do Microsoft's own examples never feature it? How do you use the async features and dispose of your context properly?
Your code:
public Task<tblLanguage> Post(tblLanguage language)
{
    using (var langRepo = new TblLanguageRepository(new Entities()))
    {
        return langRepo.Add(RequestOrganizationTypeEnum, language);
    }
}
is disposing the repository before returning a Task. If you make the code async:
public async Task<tblLanguage> Post(tblLanguage language)
{
    using (var langRepo = new TblLanguageRepository(new Entities()))
    {
        return await langRepo.Add(RequestOrganizationTypeEnum, language);
    }
}
then it will dispose the repository just before the Task completes. What actually happens is when you hit the await, the method returns an incomplete Task (note that the using block is still "active" at this point). Then, when the langRepo.Add task completes, the Post method resumes executing and disposes the langRepo. When the Post method completes, the returned Task is completed.
For more information, see my async intro.
I would go for the 'one DbContext per request' way, and reuse the DbContext within the request. As all tasks should be completed at the end of the request anyway, you can safely dispose it again.
See e.g.: One DbContext per request in ASP.NET MVC (without IOC container)
Some other advantages:
some entities might already be materialized in the DbContext from previous queries, saving some extra queries.
you don't have all those extra using statements cluttering your code.
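For what it's worth, the question is about EF 6 on MVC 4, but in today's ASP.NET Core the "one context per request" lifetime is exactly what AddDbContext gives you out of the box: a scoped registration, disposed when the request scope ends. A sketch, with MyDbContext as a placeholder:

```csharp
// Program.cs: register the context with a scoped (per-request) lifetime.
builder.Services.AddDbContext<MyDbContext>(options =>
    options.UseSqlServer(connectionString));

// Controllers receive it via constructor injection; no using needed,
// the container disposes it at the end of the request.
public class LanguagesController : Controller
{
    private readonly MyDbContext _db;
    public LanguagesController(MyDbContext db) { _db = db; }
}
```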
If you are using proper n-tiered programming patterns, your controller should never even know that a database request is being made. That should all happen in your service layer.
There are a couple of ways to do this. One is to create 2 constructors per class, one that creates a context and one that accepts an already existing context. This way, you can pass the context around if you're already in the service layer, or create a new one if it's the controller/model calling the service layer.
The other is to create an internal overload of each method and accept the context there.
But, yes, you should be wrapping these in a using.
In theory, the garbage collector SHOULD clean these up even without wrapping, but I don't entirely trust the GC.
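The two-constructor idea can be sketched like this (names are illustrative): the service either owns its context and must dispose it, or borrows the caller's and must not.

```csharp
public class LanguageService : IDisposable
{
    private readonly Entities _context;
    private readonly bool _ownsContext;

    // Called from a controller: create and own the context.
    public LanguageService() : this(new Entities())
    {
        _ownsContext = true;
    }

    // Called from another service: reuse the caller's context.
    public LanguageService(Entities context)
    {
        _context = context;
    }

    public void Dispose()
    {
        if (_ownsContext) _context.Dispose(); // never dispose a borrowed context
    }
}
```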
I agree with #Dirk Boer that the best way to manage DbContext lifetime is with an IoC container that disposes of the context when the http request completes. However if that is not an option, you could also do something like this:
var dbContext = new MyDbContext();
var results = await dbContext.Set<MyEntity>().ToArrayAsync();
dbContext.Dispose();
The using statement is just syntactic sugar for disposing of an object at the end of a code block. You can achieve the same effect without a using block by simply calling .Dispose yourself.
Come to think of it, you shouldn't get object disposed exceptions if you use the await keyword within the using block:
public async Task<tblLanguage> Post(tblLanguage language)
{
    using (var langRepo = new TblLanguageRepository(new Entities()))
    {
        var returnValue = await langRepo.Add(RequestOrganizationTypeEnum, language);
        await langRepo.SaveChangesAsync();
        return returnValue;
    }
}
If you want to keep your method synchronous but save to the DB asynchronously, don't use the using statement. Like #danludwig said, it is just syntactic sugar. You can call the SaveChangesAsync() method and then dispose of the context after the task is completed. One way to do it is this:
//Save asynchronously then dispose the context after
context.SaveChangesAsync().ContinueWith(c => context.Dispose());
Take note that the lambda you pass to ContinueWith() will also be executed asynchronously.
IMHO, it's again an issue caused by the use of lazy loading. After you dispose your context, you can't lazy-load a property any more because disposing the context closes the underlying connection to the database server.
If you do have lazy-loading activated and the exception occurs after the using scope, then please see https://stackoverflow.com/a/21406579/870604

EF (entity framework) usage of "using" statement

I have a project on MVC. We chose EF for our DB transactions, and created some managers for the BLL layer. I found a lot of examples where a "using" statement is used, e.g.:
public Item GetItem(long itemId)
{
    using (var db = new MyEntities())
    {
        return db.Items.Where(it => it.ItemId == itemId && !it.IsDeleted).FirstOrDefault();
    }
}
Here we create a new instance of the DbContext MyEntities().
We use "using" in order to "ensure the correct use of IDisposable objects."
That's only one method in my manager, but I have more than ten of them.
Every time I call any method on the manager, I'll use a "using" statement and create another DbContext in memory. When will the garbage collector (GC) dispose of them? Does anyone know?
But there is alternative usage of the manager methods.
We create a global variable:
private readonly MyEntities db = new MyEntities();
and use DBcontext in every method without "using" statement. And method looks like this:
public Item GetItem(long itemId)
{
    return db.Items.Where(it => it.ItemId == itemId && !it.IsDeleted).FirstOrDefault();
}
Questions:
What is the proper way of using a DbContext variable?
What if we don't use the "using" statement (because it affects performance)? Will the GC take care of everything?
I'm a rookie in EF usage and still haven't found an unequivocal answer to this question.
I think you will find many suggesting this style of pattern, not just me or Henk.
DBContext handling
Yes, ideally use using statements for DbContext subtypes.
Even better, use Unit of Work patterns that are managed with using, that hold a context and dispose of it. Just 1 of many UoW examples, this one from Tom Dykstra.
The Unit of Work manager should be new for each HTTP request.
The context is NOT thread safe, so make sure each thread has its own context.
Let EF cache things behind the scenes.
Test context creation times after several HTTP requests. Do you still have a concern?
Expect problems if you store the context in a static: any sort of concurrent access will hurt, and if you are using parallel AJAX calls, assume a 90+% chance of problems with a static single context.
For some performance tips, well worth a read
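A minimal Unit of Work sketch along the lines suggested above (ItemRepository is illustrative): one context shared by the repositories, created per request and disposed with using.

```csharp
public class UnitOfWork : IDisposable
{
    private readonly MyEntities _context = new MyEntities();
    private ItemRepository _items;

    // Repositories are created lazily and all share the one context.
    public ItemRepository Items
    {
        get { return _items ?? (_items = new ItemRepository(_context)); }
    }

    public void Save() { _context.SaveChanges(); }
    public void Dispose() { _context.Dispose(); }
}

// One per HTTP request:
// using (var uow = new UnitOfWork())
// {
//     var item = uow.Items.GetItem(42);
//     uow.Save();
// }
```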
The proper, best-practice way of using a DbContext variable is with using:
using (var db = new MyEntities())
{
    return db.Items.Where(it => it.ItemId == itemId && !it.IsDeleted).FirstOrDefault();
}
The benefit is that many things are done automatically for us. For example, once the block of code is completed, Dispose is called.
Per MSDN, EF Working with DbContext:
The lifetime of the context begins when the instance is created and ends when the instance is either disposed or garbage-collected. Use using if you want all the resources that the context controls to be disposed at the end of the block. When you use using, the compiler automatically creates a try/finally block and calls dispose in the finally block.
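For the GetItem example above, the compiler's expansion looks roughly like this:

```csharp
// Approximately what the compiler emits for the using block:
var db = new MyEntities();
try
{
    return db.Items.Where(it => it.ItemId == itemId && !it.IsDeleted)
                   .FirstOrDefault();
}
finally
{
    if (db != null) ((IDisposable)db).Dispose();
}
```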

Threading/Ambient Context in CRM 2011 plugins

We have recently had a few occasions where the question came up whether in Dynamics CRM 2011, one plugin execution (i.e. a pass of the Execute() method) is guaranteed to stay on the same thread.
I'd like to implement tracing using the Ambient Context pattern, to avoid passing the tracing service to every class that might want to trace. The problem is that, as we know, the plugin is only instantiated once per registered step and then serves all subsequent operations from the same instance. That means I can't just have some static property like Tracing.Current to which I assign the current ITracingService instance and be done: if I did that, the operation started last would overwrite the instance for all other operations that might still be running (and this sort of concurrency is not uncommon).
Now if I could be sure everything under the Execute() method remains in the same thread, I could still use an Ambient Context utilizing the [ThreadStatic] attribute for static fields:
public static class Tracing
{
    [ThreadStatic]
    private static ITracingService _current;

    public static ITracingService Current
    {
        get
        {
            if (null == _current)
            {
                _current = new NullTracingService();
            }
            return _current;
        }
        set { _current = value; }
    }
}
I would set this upon entering the Execute() method and clear it at the end so the reference to the tracing service instance will be removed.
The only thing I could find out about threading in the context of MSCRM plugins is that the individual threads apparently come from the ThreadPool - whatever consequences that might have for my issue.
Does anyone have a deeper insight into how threading is handled with MSCRM plugins - or any other ideas on how the cross-cutting concern of tracing could be handled elegantly with SOLID code in this special case (AOP/dynamic interception are no options here)?
Thanks for any help and pointers.
The simple and smart-ass answer: if it hurts when you do that, then don't do that. :)
Your self-imposed requirement of using the Ambient Context pattern is conflicting with CRM's design pattern. Think of how CRM works - it passes you a IServiceProvider, with everything you need including the Tracing Service. It handles all the complicated multithreading and optimizations for you, and only asks that you don't try to outsmart it with fancy patterns or static variables or threading tricks.
My recommendation is to use the same pattern - pass the IServiceProvider to any classes or methods that need it. Much simpler - plus later when you have a weird bug, you'll not question whether you successfully outsmarted Microsoft's engineers or not. :)
CRM creates a single plugin object, and then uses threads as needed to process the requests. So the only thing you can be sure of is that you may have multiple threads running at the same time for a single plugin object.
The threads are managed through IIS, and will get reused if possible. So if you want to ensure that each time Execute is called it has a new ITracingService, you'll have to set it. If you just want to ensure that each time Execute is called it has one, you just need an if statement to check for it.
Since your backing variable is [ThreadStatic], you won't need to worry about threading issues, but since IIS tries to reuse threads, it will not be empty each time Execute is called.
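The set-and-clear pattern that follows from this can be sketched as below; clearing in finally matters precisely because the pooled thread will be reused:

```csharp
public void Execute(IServiceProvider serviceProvider)
{
    // Assign the [ThreadStatic] slot for this operation.
    Tracing.Current = (ITracingService)serviceProvider
        .GetService(typeof(ITracingService));
    try
    {
        // Anything called from here can use Tracing.Current.
        DoWork();
    }
    finally
    {
        // Clear so the reused thread doesn't see a stale tracing service.
        Tracing.Current = null;
    }
}
```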
I'm afraid I can only guess at the whole plugin/thread/static issue here, what you propose does seem a little complicated however. So as an alternative, have you considered using Trace Listeners?
If you use Trace.Writeline across your application then a single Trace Listener would capture all those messages. That way you don't have to pass a trace object around.
For example:
Execute(...)
{
    // TraceListenerCollection is non-generic, so use OfType<T>()
    if (!System.Diagnostics.Trace.Listeners
            .OfType<MyCustomTraceListener>().Any())
    {
        System.Diagnostics.Trace.Listeners.Add(new MyCustomTraceListener());
    }
    DoWork();
}

DoWork()
{
    System.Diagnostics.Trace.WriteLine("I'm doing work!");
}
Relevant links:
Trace Listeners and Walkthrough: Creating a Custom Trace Listener

.NET C# lock statement in a data access layer

I saw some code where the data access layer looks like this:
public class CustomerDA
{
    private static readonly object _sync = new object();
    private static readonly CustomerDA _mutex = new CustomerDA();

    private CustomerDA()
    {
    }

    public CustomerDA GetInstance()
    {
        lock (_sync)
        {
            return _mutex;
        }
    }

    public DataSet GetCustomers()
    {
        //database SELECT
        //return a DataSet
    }

    public int UpdateCustomer(some parameters)
    {
        //update some user
    }
}

public class CustomerBO
{
    public DataSet GetCustomers()
    {
        //some business logic
        return CustomerDA.GetInstance().GetCustomers();
    }
}
I was using it, but started thinking: what if I had to build a Facebook-like application with hundreds of thousands of concurrent users? Would I be blocking each user from doing their work until the previous user finished their database operation? And for the Update method: is it useful to LOCK THREADS in the app when database engines already manage concurrency at the database-server level?
Then I thought about moving the lock into the GetCustomers and UpdateCustomer methods, but then wondered: is it useful at all?
Edit on January 03:
You're all right, I missed the "static" keyword on the "GetInstance" method.
Another thing: I was under the impression that no thread could access the _mutex variable while another thread was working in the same data access class. I mean, I thought that since the _mutex variable is returned from inside the lock statement, no thread could access _mutex until the ";" was reached in the following statement:
return CustomerDA.GetInstance().GetCustomer();
After doing some tracing, I realized I was making the wrong assumption. Could you please confirm that I was making the wrong assumption?
So... Can I say for sure that my Data Access layer does not need any lock statement (even on INSERT, UPDATE, DELETE) and that it does not matter if methods in my DataAccess are static or instance methods?
Thanks again... your comments are so useful to me
The lock in that code is completely pointless. It locks around code that returns a value that never changes, so there is no reason to have a lock there. The purpose of the lock in the code is to make the object a singleton, but as it's not using lazy initialisation, the lock is not needed at all.
Making the data access layer a singleton is a really bad idea, that means that only one thread at a time can access the database. It also means that the methods in the class have to use locks to make sure that only one thread at a time accesses the database, or the code won't work properly.
Instead, each thread should get its own instance of the data access layer, with its own connection to the database. That way the database takes care of the concurrency issues, and the threads don't have to do any locking at all.
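A per-call sketch of that (table name and connection-string handling are illustrative): no singleton, no lock; connection pooling makes the per-call connection cheap.

```csharp
using System.Data;
using System.Data.SqlClient;

public class CustomerDA
{
    private readonly string _connectionString;

    public CustomerDA(string connectionString)
    {
        _connectionString = connectionString;
    }

    public DataSet GetCustomers()
    {
        // Each call uses its own pooled connection; no shared state to lock.
        using (var connection = new SqlConnection(_connectionString))
        using (var adapter = new SqlDataAdapter("SELECT * FROM Customer", connection))
        {
            var ds = new DataSet();
            adapter.Fill(ds); // Fill opens and closes the connection itself
            return ds;
        }
    }
}
```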
Set your lock where it is needed, so where concurrent accesses happen. Put in only as much code inside lock/critical section as much really need.
Shouldn't that GetInstance be static?
The following pseudo code explains how GetInstance operates:
    LOCK
    rval = _mutex
    UNLOCK
    return rval
_mutex is readonly and refers to a non-null object, so it can't be changed - why lock?
Your database provides concurrency management, but if your program creates two threads writing the same data at the same time within your own domain, how could the database help with that?
