Recently I've been thinking about the performance difference between class fields and method-local variables. What exactly I mean is shown in the example below.
Let's say we have a DataContext object for LINQ to SQL:
class DataLayer
{
    ProductDataContext context = new ProductDataContext();

    public IQueryable<Product> GetData()
    {
        // Query the Products table through the shared context instance.
        return context.Products.Where(t => t.ProductId == 2);
    }
}
In the example above, the context object is stored on the heap for as long as the DataLayer instance lives, while GetData's local variables are popped off the stack once the method has executed.
So let's examine the following example to make the distinction:
class DataLayer
{
    public IQueryable<Product> GetData()
    {
        // A fresh context per call; it becomes eligible for garbage
        // collection once nothing references it any more.
        ProductDataContext context = new ProductDataContext();
        return context.Products.Where(t => t.ProductId == 2);
    }
}
(*1) So, okay, the first thing we know is that if we define the ProductDataContext instance as a field, we can reach it from everywhere in the class, which means we don't have to create the same object instance every time.
But let's say we are talking about ASP.NET: once the user presses the submit button, the posted data is sent to the server, the events are executed, and the posted data is stored in a database via the method above, so it is probable that the same user will send different data one request after another. If I understand correctly, after the page is executed the garbage collector comes into play and clears things from memory (from the heap), and that means we lose our instance variables from memory as well, so after another post the DataContext has to be created once again for the new page cycle.
So it seems the only benefit of declaring it at class level is just point number one above.
Or is there something else?
Thanks in advance...
(If I said something incorrect, please correct me.)
When it comes to the performance difference between creating an object per method call or per class instance, I wouldn't worry too much about it. However, what you seem to be missing here are some important principles around the DataContext class and the unit of work pattern in general.
The DataContext class operates as a single unit of work. Thus, you create a DataContext, you create, update and delete objects, you submit all changes, and you dispose of the DataContext after that. You may create multiple DataContext instances per request, one per (business) transaction. But in ASP.NET you should never create a DataContext that survives a web request. All the DataContexts that are created during a request should be disposed of when, or before, that request is over. There are two reasons for this.
First of all, the DataContext has an internal cache of all the objects it has fetched from the database. Using a DataContext for a long period of time will make its cache grow indefinitely, which can cause memory problems when you've got a big database. The DataContext will also favor returning an object from its cache when it can, making your objects go stale quickly. Any update or delete operation made on another DataContext, or directly on the database, can go unnoticed because of this staleness.
The second reason for not caching DataContexts is that they are not thread-safe. It's best to see a DataContext as a unit of work, or as a (business) transaction. You create a bunch of new objects, add them to the DataContext, change some others, remove some objects, and when you're done, you call SubmitChanges. If another request calls SubmitChanges on that same instance during that operation, you lose the idea of the transaction. In the most fortunate situation, your new objects will be persisted and your transaction is split up into two separate transactions. At worst, you leave your DataContext, or the objects it persists, in an invalid state, which could mean other requests fail or invalid data enters your database. And this is not an unlikely scenario; I've seen strange things happen on projects where developers created a single (static) DataContext per web site.
So with this in mind, let's get back to your question. While defining a DataContext as an instance field is not a problem in itself, it is important to know how you are using the DataLayer class. When you create one DataLayer per request or one per method call, you'll probably be safe, but in that case you shouldn't store that DataLayer in a static field. If you want to do that, you should create a DataContext per method call instead.
It is also important to know what the design of the DataLayer class is. In your code you only show us a query method, no CUD methods. Is every method meant to be a single transaction, or do you want to call multiple methods and call a SaveChanges on the DataLayer afterwards? If you want this last option, you need to store the DataContext as an instance field, and in that case you should implement IDisposable on the DataLayer. When every method is its own transaction, you can create a DataContext per method, and you should wrap each DataContext in a using statement. Note, however, that disposing the DataContext can cause problems when you return objects with lazy-loading properties from a method: those properties can no longer be loaded once the DataContext is disposed. Here is more interesting information about this.
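To make the instance-field option (shared DataContext plus a SaveChanges call, with IDisposable on the DataLayer) concrete, here is a minimal sketch. It reuses the ProductDataContext and Product types from the question; the Products table name and the AddProduct method are my assumptions for illustration:

using System;
using System.Linq;

class DataLayer : IDisposable
{
    // One DataContext per DataLayer instance: create one DataLayer per
    // request / business transaction, and never store it in a static field.
    private readonly ProductDataContext context = new ProductDataContext();

    public IQueryable<Product> GetData()
    {
        return context.Products.Where(t => t.ProductId == 2);
    }

    public void AddProduct(Product product)
    {
        context.Products.InsertOnSubmit(product);
    }

    // The caller decides when the unit of work is complete.
    public void SaveChanges()
    {
        context.SubmitChanges();
    }

    public void Dispose()
    {
        context.Dispose();
    }
}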
As you see, I haven’t even talked about which of your two options would be better for performance, because performance is of no importance when the solution gives inconsistent and incorrect results.
I'm sorry for my long answer :-)
You don't ever want to store a DataContext in a class-level field. If you do, you will have to implement the IDisposable interface on your class and call the Dispose method when you know you are done with it.
It's better to just create a new DataContext in your method and use a using statement to automatically dispose of it when you are done.
Even though the implementation of IDisposable on DataContext does nothing, that is an implementation detail, whereas exposing an IDisposable interface is a contract which you should always abide by.
It would be especially handy if you upgrade to LINQ to Entities and use the ObjectContext class, where you must call Dispose on the instance when you are done with it; otherwise, resources will leak until the next garbage collection.
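For example, a minimal sketch of that per-method pattern (assuming the ProductDataContext and Products table from the question at the top):

public IList<Product> GetData()
{
    // The using statement guarantees Dispose is called even if the query throws.
    using (var context = new ProductDataContext())
    {
        // ToList() forces the query to run while the context is still alive,
        // so the caller never touches a disposed context.
        return context.Products.Where(t => t.ProductId == 2).ToList();
    }
}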
"So it seems the only benefit of declaring it at class level is just point number one above."
Yes, the point of declaring a class-level variable is to allow the entire class to access the same variable. It should not be used to try to deliberately prevent a garbage collection from occurring. The access modifier on properties, methods, etc. determines which objects, external or internal to your class, can access/modify/monkey with that piece of code.
In ASP.NET, once the response is sent to the browser, the objects created for that page request will get garbage-collected at some point in the future, regardless of whether or not the variable is public. If you want information to persist between requests, you either need to create a singleton instance of the object, or serialize the object to either session or application state.
See this for example - "Linq to SQL DataContext Lifetime Management": http://www.west-wind.com/weblog/posts/246222.aspx This approach makes life simpler.
Related
I have a program that displays several charts on a TV monitor and these charts get cycled through every 30 seconds or so.
I'm using global objects for each chart that is displayed (so each object contains some bar, area and line series, and some methods). The reason I have them global is that after the initial SQL query is executed (every day at 7am) the data for the charts won't change. I just need to cycle through 20 different charts throughout the day.
Would using local objects be a waste in this scenario? Every time it is time to switch to a new chart, the program would have to create a new object and populate the different series with data points that are always the same.
What can I do to avoid using global objects in this scenario? The reason I'm asking is that I've read that you should keep away from global objects in your programs.
Global vs. local objects: it's really a matter of encapsulation, and a matter of scope. Objects can be application-global or class-global; best practice is to initialize and assign an object as close, scope-wise, to where it is needed, and scope can be application (namespace, actually), class, method, or even block (like using {} or foreach {}). Instead of creating a lot of application-global members, you should encapsulate them into classes and initialize those classes. First of all you don't clutter your main loop, and secondly you get all the advantages of classes: subclassing, polymorphism, etc. Keep references to those classes as long as you need them; in your case, as far as I can tell, keep them until the data changes. That would be 24 hours? So what?
At some point your data will need to be accessible from the main application object anyhow. If you encapsulate your data objects nicely, i.e. create some classes that hold all the information you need to display, you will still have to initialize those classes in your main application loop.
I do not think it is necessary to recreate the objects every 30 seconds. Once you have populated the class members, they don't change, except if you change or dispose of them, of course. It also seems unlikely to me that they hog your computer's memory so much that each chart should really be disposed of once it has been displayed, only to be recreated a few seconds or possibly minutes later from data which had to be stored in memory somewhere anyhow.
The way I understand how your app should work, I would create a base chart class, create as many chart subclasses as there are charts (if they differ from each other in object design; otherwise just create one class), possibly put the instances in a List, fill each chart at 7am, and loop through the collection to display them, roughly as sketched below.
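Something along these lines (a rough sketch; all names are illustrative, not taken from your code):

using System.Collections.Generic;

// One query at 7am fills every chart; the 30-second loop only displays them.
public abstract class ChartBase
{
    public abstract void Populate();   // run the SQL query, fill the series
    public abstract void Display();    // draw the already-filled series
}

public class SalesChart : ChartBase
{
    public override void Populate() { /* fill bar/area/line series */ }
    public override void Display() { /* render to the TV monitor */ }
}

// In the main loop:
// var charts = new List<ChartBase> { new SalesChart() /*, ...19 more */ };
// foreach (var chart in charts) chart.Populate();        // once, at 7am
// every 30 seconds: charts[index++ % charts.Count].Display();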
Global objects are to be avoided in large projects because they generally reduce the flexibility of the system.
For example, when the business object caches results for 24 hours, other parts of the program that may wish to fetch the latest results will end up getting cached values, or the code will have to be rewritten.
This problem can be avoided by using instance members instead of static (class) members, e.g.:
public class ResultsCache
{
    public IEnumerable<Visitors> TotalVisitors { get; private set; }
    public IEnumerable<Purchases> TotalPurchases { get; private set; }
    public IEnumerable<Refunds> TotalRefunds { get; private set; }

    public void FetchData()
    {
        // Query the database once and keep the results on this instance;
        // code that needs fresher data simply builds its own ResultsCache.
        // ...
    }
}
var cache24hours = new ResultsCache();
cache24hours.FetchData();
var currentView = new RefundsView(cache24hours);
I have a web service that is quite heavy on database access. It works fine in test, but as soon as I put it into production and ramp up the load, it starts churning out errors that are raised when something calls a method on the DataContext. The error is normally one of these:
Object reference not set to an instance of an object
Cannot access a disposed object. Object name: 'DataContext accessed after Dispose.'.
but not always.
Any single web service request can result in as many as 10 or 15 database queries, and 1 or 2 updates.
I've designed my application with a data access layer, which is a bunch of objects that represent the tables in my database and hold all the business logic. It is a separate project from my web service, as it's shared with a web GUI.
The data access objects derive from a base class which has a GetDataContext() method to initiate an instance of the data context whenever it's needed.
All throughout my data access objects I've written this:
using (db = GetDataContext())
{
// do some stuff
}
which happily creates/uses/disposes my DataContext object (generated by sqlmetal.exe) for each and every database interaction.
After many hours of head scratching, I think I've decided that the cause of my errors is that under load the datacontext object is being created and disposed way too much, and I need to change things to share the same datacontext for the duration of the web service request.
I found this article on the internet which has a DataContextFactory that seems to do exactly what I need.
However, now that I've implemented this, and the DataContext is saved as an item in the HttpContext, I get...
Cannot access a disposed object.
Object name: 'DataContext accessed after Dispose.'
...whenever my datacontext is used more than once. This is because my using (...) {} code is disposing my datacontext after its first use.
So, my question is... before I go through my entire data access layer and remove loads of usings, what is the correct way to do this? I don't want to cause a memory leak by taking out the usings, but at the same time I want to share my datacontext across different data access objects.
Should I just remove the usings and manually call the Dispose method just before I return from the web service request? If so, how do I make sure I capture everything, bearing in mind I have several try-catch blocks that could get messy?
Is there another better way to do this? Should I just forget about disposing and hope everything is implicitly cleaned up?
UPDATE
The problem doesn't appear to be a performance issue... requests are handled very quickly, no more than about 200ms. In fact I have load tested it by generating lots of fake requests with no problems.
As far as I can see, it is load related for one of two reasons:
A high number of requests causes concurrent requests to affect each other
The problem happens more frequently simply because there are a lot of requests.
When the problem does occur, the application pool goes into a bad state, and requires a recycle to get it working again.
Although I would prefer the unit-of-work approach using using, sometimes it doesn't fit your design. Ideally you'd want to ensure that you free up your SqlConnection when you're done with it, so that another request has a chance of grabbing a connection from the pool. If that is not possible, what you need is some assurance that the context is disposed of after each request. This could be done in a couple of ways:
If you're using WebForms, you can tie the disposal of the DataContext to the end of the page lifecycle. Check the HttpContext.Items collection to determine whether the request created a DataContext, and if so, dispose of it.
Create a dedicated IHttpModule which attaches an event to the end of the request and does the same thing there; a sketch follows.
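A minimal sketch of such a module (the "DataContext" item key, and the assumption that your factory stores an IDisposable under it, are mine; match whatever your DataContextFactory actually uses):

using System;
using System.Web;

public class DataContextDisposalModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        application.EndRequest += (sender, e) =>
        {
            // Key name is an assumption; use the key your factory uses.
            var context = HttpContext.Current.Items["DataContext"] as IDisposable;
            if (context != null)
            {
                context.Dispose();
                HttpContext.Current.Items.Remove("DataContext");
            }
        };
    }

    public void Dispose() { }
}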
The problem with both of the above solutions, is that if you are under heavy load, you'll find that a lot of requests hang about waiting for a connection to be made available, likely timing out. You'll have to weigh up the risks.
All in all, the unit-of-work approach would still be favoured, as you are releasing the resource as soon as it is no longer required.
I managed to fix this myself...
I had a base class that had a method that would create the DataContext instance, like this:
public abstract class MyBase
{
    protected static DataContext db = null;

    protected static DataContext GetDataContext()
    {
        return new DataContext("My Connection String");
    }

    // rest of class
}
And then, in the classes that inherited MyBase where I wanted to do my queries, I had statements like this:
using (db = GetDataContext()) { ... }
The thing is, I wanted to access the database from both static and non-static methods, and so in my base class I'd declared the db variable as static... Big mistake!
If the DataContext variable is declared as static, then under heavy load, when lots of things are happening at the same time, the DataContext is shared among requests. If something happens on the DataContext at exactly the same time in two requests, it corrupts the DataContext instance, and the database connection behind it, for all subsequent requests until the application pool is recycled and the database connection is refreshed.
So, the simple fix is to change this:
protected static DataContext db = null;
to this:
protected DataContext db = null;
...which will break all of the using statements in the static methods. But this can easily be fixed by declaring the DataContext variable in the using instead, like this:
using (DataContext db = GetDataContext()) { ... }
This happens if you have, for example, an object that references another object (i.e. a join between two tables) and you try to access the referenced object after the context has been disposed of. Something like this:
IEnumerable<Father> fathers;
using (var db = GetDataContext())
{
    // Assume Father has a property called Sons of type IEnumerable<Son>.
    fathers = db.Fathers.ToList();
}

foreach (var father in fathers)
{
    // This will throw the exception you got.
    Console.WriteLine(father.Sons.FirstOrDefault());
}
This can be avoided by forcing it to load all the referenced objects like this:
IEnumerable<Father> fathers;
using (var db = GetDataContext())
{
    var options = new System.Data.Linq.DataLoadOptions();
    options.LoadWith<Father>(f => f.Sons);
    db.LoadOptions = options;
    fathers = db.Fathers.ToList();
}

foreach (var father in fathers)
{
    // This will no longer throw.
    Console.WriteLine(father.Sons.FirstOrDefault());
}
In a web application that I have run across, I found the following code for dealing with the DataContext when working with LINQ to SQL:
public partial class DbDataContext
{
    public static DbDataContext DB
    {
        get
        {
            if (HttpContext.Current.Items["DB"] == null)
                HttpContext.Current.Items["DB"] = new DbDataContext();
            return (DbDataContext)HttpContext.Current.Items["DB"];
        }
    }
}
Then referencing it later like this:
DbDataContext.DB.Accounts.Single(a => a.accountId == accountId).guid = newGuid;
DbDataContext.DB.SubmitChanges();
I have been looking into best practices when dealing with LinqToSQL.
I am unsure about the approach this one has taken, given that the DataContext is not thread-safe, and about keeping what looks like a static copy of it around.
Is this a good approach to take in a web application?
#Longhorn213 - Based on what you said, and the more I have read into HttpContext because of it, I think you are right. But in the application that I have inherited, this is confusing, because at the beginning of each method they re-query the database to get the information, then modify that instance of the DataContext and submit changes on it.
From this, I think this approach should be discouraged, because it gives the false impression that the DataContext is static and persists between requests. If a future developer skips re-querying the data at the beginning of a method because they think it is already there, they could run into problems and not understand why.
So I guess my question is, should this method be discouraged in future development?
This is not a static copy. Note that the property retrieves it from Context.Items, which is per-request. This is a per-request copy of the DataContext, accessed through a static property.
On the other hand, this property is assuming only a single thread per request, which may not be true forever.
A DataContext is cheap to make and you won't gain much by caching it in this way.
I have done many LINQ to SQL web apps, and I am not sure that what you have would work.
The DataContext is supposed to track the changes you make to your objects, and it will not do that in this instance.
So when you call SubmitChanges, it will not know that any of your objects were updated, and thus will not update the database.
You have to do some extra work with the DataContext in a disconnected environment like a web application. It is hardest with an update, but not really that bad. I would not cache it; I would just recreate it.
Also, the context itself is not transactional, so it is theoretically possible that an update could occur on another request and your update could fail.
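For example, a disconnected update is usually done by re-attaching the entity to a fresh context rather than by holding one across requests (a sketch; the ProductDataContext and Products table are borrowed from the first question on this page for illustration):

public void UpdateProduct(Product modified, Product original)
{
    using (var db = new ProductDataContext())
    {
        // Attach tells the fresh context to treat 'modified' as an update
        // of 'original', so SubmitChanges generates the UPDATE statement.
        db.Products.Attach(modified, original);
        db.SubmitChanges();
    }
}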
I prefer to create a Page base class (inheriting from System.Web.UI.Page) and expose a DataContext property. This ensures that there is one instance of the DataContext per page request.
This has worked well for me, it's a good balance IMHO. You can just call DataContext.SubmitChanges() at the end of the page and be assured that everything is updated. You also ensure that all the changes are for one user at a time.
Doing this via a static will cause pain: I fear the DataContext will lose track of changes if it has to track changes for many users concurrently. I don't think it was designed for that.
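A minimal sketch of that base class (DbDataContext is borrowed from the question above; the lazy getter and the disposal in OnUnload are my assumptions about the implementation):

using System;

public class DataPage : System.Web.UI.Page
{
    private DbDataContext db;

    // One lazily created DataContext per page request.
    protected DbDataContext DataContext
    {
        get { return db ?? (db = new DbDataContext()); }
    }

    protected override void OnUnload(EventArgs e)
    {
        // The page is done: dispose of the request's DataContext.
        if (db != null) db.Dispose();
        base.OnUnload(e);
    }
}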
I am currently playing around with the ASP.NET MVC framework and loving it compared to the classic ASP.NET way. One thing I am mooting is whether or not it is acceptable for a view to (indirectly) cause access to the database.
For example, I am using the controller to populate a custom data class with all the information I think the view needs to do its job; however, as I am passing objects to the view, it can also cause database reads.
A quick pseudo example.
public interface IProduct
{
    /* Some members */
    /* Some methods */
    decimal GetDiscount();
}

public class Product : IProduct
{
    public decimal GetDiscount() { ... /* causes database access */ }
}
If the View has access to the Product class (it gets passed an IProduct object), it can call GetDiscount() and cause database access.
I am thinking of ways to prevent this. Currently the only one I am coming up with is multiple interface inheritance for the Product class: instead of implementing just IProduct, it would implement both IProduct and IProductView. IProductView would list the data members of the class; IProduct would contain the method calls that could cause database access.
The view will only know about the IProductView interface onto the class and will be unable to call methods which cause data access.
I have other vague thoughts about 'locking' an object before it is passed to the view, but I can foresee huge scope for side effects with such a method.
So, My questions:
Are there any best practices regarding this issue?
How do other people using MVC stop the view from being naughty and doing more to objects than it should?
Your view isn't really causing data access. The view is simply calling the GetDiscount() method in a model interface. It's the model which is causing data access. Indeed, you could create another implementation of IProduct which wouldn't cause data access, yet there would be no change to the view.
Model objects that do lazy loading invariably cause data access when the view tries to extract data for display.
Whether it's OK is down to personal taste and preference.
However, unless you've got a good reason for lazy loading, I'd prefer to load the data into the model object and then pass that "ready-baked" for the view to display.
"One thing I am mooting is whether or not it is acceptable for a view to (indirectly) cause access to the database."
I've often asked the same question. So many things we access on the Model in Stack Overflow Views can cause implicit database access. It's almost unavoidable. Would love to hear others' thoughts on this.
If you keep your domain objects "persistence ignorant", then you don't have this problem. That is, instead of having GetDiscount inside your Product class, why not just have a simple property called Discount? This would then be set by your ORM when it loads the instance of the Product class from the database.
The model should not have methods ("actions") that perform data access; that's the DAL's concern. You could store a discount percentage property on the product class and have the GetDiscount method return a simple calculation such as Price * (100 - discountPercent) / 100, or something like this.
I disconnect my business entities (Product, in your example) from data access. That's the repository's concern (in my case).
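A sketch of what the last two answers describe (the property names are illustrative):

// The ORM (or repository) sets DiscountPercent when it materializes the
// Product; the entity itself never touches the database.
public class Product
{
    public decimal Price { get; set; }
    public decimal DiscountPercent { get; set; }

    // Pure calculation, no data access, safe to call from a view.
    public decimal GetDiscountedPrice()
    {
        return Price * (100 - DiscountPercent) / 100;
    }
}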
I've built a site in MonoRail before that sometimes had methods that triggered data access from the view. I try to avoid it, because when it fails it can fail in unusual and unfixable ways (I can't really try/catch in an NVelocity template, for example). It's totally not the end of the world: I wrote well-abstracted PHP sites for years that accessed the database from the view, and they still work well enough, because most of the time if something blows up you're just redirecting to a "Something didn't work"-type error page anyway.
But yeah, I try to avoid it. In a larger sense, my domain model usually doesn't trickle all the way down into the view. Instead, the view renders Document objects that are unashamedly just strongly-typed data dumps, with everything pre-formatted, whipped, crushed, and puréed to the point where the view just has to spit out some strings with some loops and if/elses, transform the number "4" into 4 star images, etc. This document is usually returned by a web service that sits in front of the beautiful domain model, or it's just a simple struct that is constructed in the controller and passed along as part of the ViewData. If a domain object is used directly, then it usually doesn't do anything to explicitly trigger data access; that's handled by a collection-like repository that the view doesn't have access to, and that the domain objects usually don't have access to either.
But you don't have to do it that way. You could just be disciplined enough not to call those methods that touch the database from the view.
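For illustration, such a "Document" object might be nothing more than this (the names are made up):

// Everything the view needs, pre-formatted; no lazy loading, no repository.
public class ProductDocument
{
    public string Name { get; set; }
    public string FormattedPrice { get; set; }  // e.g. "$4.99", already formatted
    public int StarRating { get; set; }         // the view just renders N star images
}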
The current system that I am working on makes use of Castle ActiveRecord to provide ORM (object-relational mapping) between the domain objects and the database. This is all well and good, and most of the time it actually works well!
The problem comes about with Castle ActiveRecord's support for asynchronous execution; more specifically, with the SessionScope that manages the session that objects belong to. Long story short: bad stuff happens!
We are therefore looking for a way to easily convert (think automagically) from the domain objects (which know that a DB exists and care) to the DTO objects (which know nothing about the DB and care not for sessions, mapping attributes, or all things ORM).
Does anyone have suggestions on how to do this? For a start I am looking for a basic one-to-one mapping of objects: the domain object Person would be mapped to, say, PersonDTO. I do not want to do this manually, since it is a waste.
Obviously reflection comes to mind, but I am hoping, with some of the better IT knowledge floating around this site, that something "cooler" will be suggested.
Oh, I am working in C#; the ORM objects, as said before, are mapped with Castle ActiveRecord.
Example code:
By @ajmastrean's request I have linked to an example that I have (badly) mocked together. The example has a capture form, a capture form controller, domain objects, an ActiveRecord repository, and an async helper. It is slightly big (3MB) because I included the ActiveRecord DLLs needed to get it running. You will need to create a database called ActiveRecordAsync on your local machine or just change the .config file.
Basic details of example:
The Capture Form
The capture form has a reference to the controller:
private CompanyCaptureController MyController { get; set; }
On initialisation, the form calls MyController.Load():
private void InitForm()
{
    MyController = new CompanyCaptureController(this);
    MyController.Load();
}
This will call back into a method called LoadCompleted():
public void LoadCompleted(Company loadCompany)
{
    _context.Post(delegate
    {
        CurrentItem = loadCompany;
        bindingSource.DataSource = CurrentItem;
        bindingSource.ResetCurrentItem();
        // TODO: This line will throw the exception, since the session scope
        // used to fetch loadCompany is now gone.
        grdEmployees.DataSource = loadCompany.Employees;
    }, null);
}
This is where the "bad stuff" occurs, since we are using the child list of Company, which is set to lazy load.
The Controller
The controller has a Load method that is called from the form; it then calls the async helper to asynchronously call the LoadCompany method and return to the capture form's LoadCompleted method:
public void Load()
{
    new AsyncListLoad<Company>().BeginLoad(LoadCompany, Form.LoadCompleted);
}
The LoadCompany() method simply uses the repository to find a known company:
public Company LoadCompany()
{
    return ActiveRecordRepository<Company>.Find(Setup.company.Identifier);
}
The rest of the example is rather generic: it has two domain classes which inherit from a base class, a setup file to insert some data, and the repository to provide the ActiveRecordMediator abilities.
I solved a problem very similar to this, where I copied the data out of a lot of older web service contracts into WCF data contracts. I created a number of methods with signatures like this:
public static T ChangeType<S, T>(this S source) where T : class, new()
The first time this method (or any of the other overloads) executes for a pair of types, it looks at the properties of each type and decides which ones exist in both, based on name and type. It takes this "member intersection" and uses the DynamicMethod class to emit the IL that copies the source type to the target type, then caches the resulting delegate in a thread-safe static dictionary.
Once the delegate is created, it's obscenely fast and I have provided other overloads to pass in a delegate to copy over properties that don't match the intersection criteria:
public static T ChangeType<S, T>(this S source, Action<S, T> additionalOperations) where T : class, new()
... so you could do this for your Person to PersonDTO example:
Person p = new Person( /* set whatever */ );
PersonDTO dto = p.ChangeType<Person, PersonDTO>();
And any properties on both Person and PersonDTO (again, that have the same name and type) would be copied by a runtime-emitted method, and any subsequent calls would not have to emit the code but would reuse the same emitted code for those types in that order (i.e. copying PersonDTO to Person would incur a separate one-time hit to emit the code).
It's too much code to post, but if you are interested I will make the effort to upload a sample to SkyDrive and post the link here.
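In the meantime, a much-simplified, reflection-based sketch of the member-intersection idea looks roughly like this (the real version replaces the reflection calls with DynamicMethod-emitted IL and caches the resulting delegate):

using System.Linq;
using System.Reflection;

public static class TypeCopy
{
    public static T ChangeType<S, T>(this S source) where T : class, new()
    {
        var target = new T();
        var targetProps = typeof(T).GetProperties()
            .Where(p => p.CanWrite)
            .ToDictionary(p => p.Name);

        foreach (var sourceProp in typeof(S).GetProperties().Where(p => p.CanRead))
        {
            PropertyInfo targetProp;
            // Copy only the 'member intersection': same name, same type.
            if (targetProps.TryGetValue(sourceProp.Name, out targetProp)
                && targetProp.PropertyType == sourceProp.PropertyType)
            {
                targetProp.SetValue(target, sourceProp.GetValue(source, null), null);
            }
        }
        return target;
    }
}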
Richard
Use ValueInjecter; with it you can map anything to anything, e.g.:
object <-> object
object <-> Form/WebForm
DataReader -> object
It also has cool features like flattening and unflattening. The download contains lots of samples.
You should try AutoMapper, which I've blogged about here:
http://januszstabik.blogspot.com/2010/04/automatically-map-your-heavyweight-orm.html#links
As long as the properties are named the same on both your objects, AutoMapper will handle it.
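Roughly like this, using the static Mapper API AutoMapper had at the time (newer versions configure a MapperConfiguration instead):

// Once, at application startup:
Mapper.CreateMap<Person, PersonDTO>();

// Wherever the conversion is needed:
PersonDTO dto = Mapper.Map<Person, PersonDTO>(person);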
My apologies for not really putting the details in here, but a basic OO approach would be to make the DTO a member of the ActiveRecord class and have the ActiveRecord class delegate its accessors and mutators to the DTO. You could use code generation or refactoring tools to build the DTO classes pretty quickly from the ActiveRecord classes.
Actually, I am totally confused now, because you are saying: "We are therefore looking for a way to easily convert (think automagically) from the domain objects (which know that a DB exists and care) to the DTO objects (which know nothing about the DB and care not for sessions, mapping attributes, or all things ORM)."
Domain objects know and care about the DB? Isn't the whole point of domain objects to contain business logic ONLY and be totally unaware of the DB and ORM? ... You HAVE to have these objects? You just need to FIX them if they contain all that stuff... that's why I am a bit confused about how DTOs come into the picture.
Could you provide more details on what problems you're facing with lazy loading?