I do something like:
public class MyDbContext : DbContext {
    public MyDbContext(bool readOnlyFlag) {
        // Monitor.Enter(theLock); // needed ??
        this.readOnlyFlag = readOnlyFlag;
        // Database.EnsureCreated(); // needed ??
    }

    public DbSet<MyData> MyData { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder) {
        string connectionString = "Data Source=C:\\mydb.db;";
        if (readOnlyFlag) connectionString += "Mode=ReadOnly;";
        optionsBuilder.UseSqlite(connectionString);
    }

    public override void Dispose() {
        // Database.CloseConnection(); // needed ??
        base.Dispose();
        // Monitor.Exit(theLock); // needed ??
    }

    readonly bool readOnlyFlag;
    // readonly object theLock = new object(); // needed ??
}
and then:
using (var dbc = new MyDbContext(true)) {
    dbc.MyData.Where( ... code
}
I call code like this from multiple concurrent threads to run different queries (in a .NET Core 3.0 console app).
Questions:
If I understand correctly, the database file will be opened when the using block starts and closed when it ends. Closing and opening a file on each query seems really inefficient, but I could not find any reference on whether it's OK to keep a singleton MyDbContext (e.g. in the Program class) and reuse it.
If I can reuse MyDbContext, should I then use a lock around queries?
In general, do I need something like the Monitor commented out above to make sure queries don't run concurrently? I've seen posts saying SQLite needs this.
Do I need to call Database.CloseConnection()? It seems to work fine without it, but I've seen posts where it was called as in the commented-out line above.
Is Database.EnsureCreated() needed for SQLite?
Thanks!
Are you sure that you are the only user of the data? In other words, are you sure that the data does not change between two usages of your dbContext?
Furthermore: are you sure that your dbContext will always be used this way, or might it be that in future this dbContext might be connected to a real database?
If your thread will be the one and only user, now and in the future, there is not much harm in reusing the DbContext. However, keep in mind that it is not guaranteed that data is really written before you Dispose the dbContext. Furthermore, your dbContext will keep all fetched data in local memory, so after a while you will have your complete database in local memory.
Consider using a repository pattern, where you hide how the data is persisted. The repository knows a bit more about what it is used for, and can make smarter decisions about what data to keep in memory and what data to query from the database via a fresh dbContext.
For instance, if you have a database with Schools, Students, and Teachers, and you frequently query their data, but seldom query the data of retired Teachers and graduated Students, your repository could keep all fetched non-retired/graduated Teachers and Students in memory, and only create a fresh dbContext to fetch unknown data, fetch retired/graduated data, or update the database.
interface IRepositorySet<Tentity> : IEnumerable<Tentity>
    where Tentity : class, new()
{
    Tentity Add(Tentity entity);
    Tentity Update(Tentity entity);
    Tentity Delete(Tentity entity);
}
interface ISchoolsRepository
{
    // for simple queries / add / update / remove only
    IRepositorySet<School> Schools { get; }
    IRepositorySet<Teacher> Teachers { get; }
    IRepositorySet<Student> Students { get; }
}
The RepositorySet knows which dbContext to create when it needs data. All frequently fetched items are kept in memory in a Dictionary keyed by primary key.
Upon creation the Dictionary is filled with all primary keys, with value null, indicating that the item has not been fetched yet.
When data is requested, the RepositorySet first fetches it from the dictionary. All items that still have a null value are fetched from a fresh dbContext and put in the dictionary.
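As an illustration, here is a minimal sketch of such a RepositorySet for the Students of the example; SchoolDbContext and a Student entity with an int Id key are assumed names, not part of the original:
using System.Collections.Generic;
using System.Linq;

class StudentRepositorySet
{
    // Key: primary key; value: the fetched Student, or null if not fetched yet.
    private readonly Dictionary<int, Student> cache = new Dictionary<int, Student>();

    public StudentRepositorySet()
    {
        // Upon creation, fill the dictionary with all primary keys and value null.
        using (var db = new SchoolDbContext())
        {
            foreach (var id in db.Students.Select(s => s.Id))
                cache[id] = null;
        }
    }

    public Student Get(int id)
    {
        if (!cache.TryGetValue(id, out var student))
            return null; // key unknown to the database at creation time

        if (student == null)
        {
            // Still null: fetch from a fresh dbContext and remember the instance.
            using (var db = new SchoolDbContext())
                cache[id] = student = db.Students.Find(id);
        }
        return student;
    }
}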
Note that this won't work for huge amounts of data. Only consider this solution if you think you can keep all fetched data in memory. But then again: keeping your dbContext open will also keep all fetched data in memory.
You can use DbContext with SQLite from multiple threads. Normally, you should use one DbContext instance per request, because DbContext is not thread safe and one commit should not affect the others.
As mentioned on SQLite's site, it supports multithreading:
SQLite supports three different threading modes:

Single-thread. In this mode, all mutexes are disabled and SQLite is unsafe to use in more than a single thread at once.

Multi-thread. In this mode, SQLite can be safely used by multiple threads provided that no single database connection is used simultaneously in two or more threads.

Serialized. In serialized mode, SQLite can be safely used by multiple threads with no restriction.

The threading mode can be selected at compile-time (when the SQLite library is being compiled from source code) or at start-time (when the application that intends to use SQLite is initializing) or at run-time (when a new SQLite database connection is being created). Generally speaking, run-time overrides start-time and start-time overrides compile-time. Except, single-thread mode cannot be overridden once selected.

The default mode is serialized.
https://www.sqlite.org/threadsafe.html
I also suggest you take a look at SQLite Concurrent Access and Can I read and write to a SQLite database concurrently from multiple connections?.
According to the above posts, a SQLite write locks the entire file, even against reads. And on the internet some users suggest taking explicit locks in code for writes.
But newer versions of SQLite have a feature called WAL (write-ahead logging).
The second advantage of WAL-mode is that writers do not block readers and readers do not block writers. This is mostly true. But there are some obscure cases where a query against a WAL-mode database can return SQLITE_BUSY, so applications should be prepared for that happenstance.
SQLite itself says that concurrent access, even from multiple processes, can be handled by SQLite.
And according to the sqlite.org/faq:
If your application has a need for a lot of concurrency, then you should consider using a client/server database. But experience suggests that most applications need much less concurrency than their designers imagine.

When SQLite tries to access a file that is locked by another process, the default behavior is to return SQLITE_BUSY.
SQLITE_BUSY might need to be handled in the application itself.
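For example, with Microsoft.Data.Sqlite (the provider underneath UseSqlite) you could switch the database file to WAL mode once. A minimal sketch, reusing the connection string from the question:
using Microsoft.Data.Sqlite;

// Switch the journal mode to write-ahead logging.
// The WAL setting is persisted in the database file itself.
using (var connection = new SqliteConnection("Data Source=C:\\mydb.db"))
{
    connection.Open();
    using (var command = connection.CreateCommand())
    {
        command.CommandText = "PRAGMA journal_mode=WAL;";
        command.ExecuteNonQuery();
    }
}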
I'm using .NET Core 2.1 and more than one DbContext, and I'm creating a DbContextFactory for every context. But I want to do this in a generic way: I want to create only one DbContextFactory. How can I achieve this?
MyDbContextFactory.cs
public interface IDbContextFactory<TContext> where TContext : DbContext
{
    TContext Create();
}

public class MyDbContextFactory : IDbContextFactory<MyDbContext>
{
    public IJwtHelper JwtHelper { get; set; }

    public MyDbContext Create()
    {
        return new MyDbContext(this.JwtHelper);
    }
}
UnitOfWork.cs
public class UnitOfWork<TContext> : IUnitOfWork<TContext> where TContext : DbContext
{
    public static Func<TContext> CreateDbContextFunction { get; set; }

    protected readonly DbContext DataContext;

    public UnitOfWork()
    {
        DataContext = CreateDbContextFunction();
    }
}
MyDbContext.cs
public class MyDbContext : DbContext
{
    private readonly IJwtHelper jwtHelper;

    public MyDbContext(IJwtHelper jwtHelper) : base()
    {
        this.jwtHelper = jwtHelper;
    }
}
So you have a database, and a class that represents this database: your DbContext. It should represent the tables and the relations between the tables that are in your database, nothing more.
You decided to separate the operations on your database from the database itself. That is a good thing, because if several users of your database want to do the same thing, they can re-use the code to do it.
For instance, if you want to create "an Order for a Customer with several OrderLines, containing ordered Products, agreed Prices, amount, etc", you'll need to do several things with your database: check if the customer already exists, check if all products already exist, check if there are enough items, etc.
These things are typically things that you should not implement in your DbContext, but in a separate class.
If you add a function like CreateOrder, then several users can re-use this function. You'll have to test it only once, and if you decide to change something in your order model, there is only one place where you'll have to change the creation of an Order.
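A rough sketch of such a separate class, reusing the Customer/Order example; OrderingDbContext and the entity shapes are assumed for illustration:
using System;
using System.Collections.Generic;

public class OrderService
{
    public Order CreateOrder(int customerId, List<OrderLine> lines)
    {
        using (var db = new OrderingDbContext())
        {
            // The checks described above live here, not in the DbContext.
            var customer = db.Customers.Find(customerId);
            if (customer == null)
                throw new InvalidOperationException("Unknown customer");

            var order = new Order { Customer = customer, Lines = lines };
            db.Orders.Add(order);
            db.SaveChanges();
            return order;
        }
    }
}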
Another advantage of separating the class that represents your database (DbContext) from the class that handles this data is that it will be easier to change the internal structure without having to change the users of your database. You can even decide to change from Dapper to Entity Framework without having to change usage. This also makes it easier to mock the database for test purposes.
Functions like CreateOrder, QueryOrder, UpdateOrder already indicate that they are not generic database actions, they are meant for an Ordering database, not for a School database.
This might mean that unit-of-work is not a proper name for the functionality you want in the separate class. A few years ago, unit-of-work was mainly meant to represent actions on a database in general, not a specific database. I'm not really sure about this, because I saw fairly soon that a real unit-of-work class would not enhance the functionality of my DbContext.
Quite often you see the following:
A DbContext class that represents your Database: the database that you created, not any generic idea of databases
A Repository class that represents the idea of storing your data somewhere; how this is stored could be a DbContext, but also a CSV-file, or a collection of Dictionary classes created for test purposes. This Repository has an IQueryable, and functions to Add / Update / Remove (as far as needed).
A class that represents your problem, the ordering model: Add Order / Update Order / Get Order of a Customer. This class really knows everything about an Order, for instance that it has an OrderTotal, which is probably nowhere to be found in your Ordering database.
Outside the DbContext you may sometimes need SQL, for instance to improve the efficiency of a call. Outside the Repository it should not be visible that you are using SQL.
Consider separating the concerns: how to save your data (DbContext), how to CRUD (create, fetch, update, etc.) the data (Repository), and how to use the data, i.e. combine the tables (the process class).
I think what you want to do in your unit-of-work should be done inside the repository. Your Ordering class should create the Repository (which creates the DbContext), query several items to check the data it has to Add / Update, do the Add and Update and save the changes. After that your ordering class should Dispose the Repository, which in turn will Dispose the DbContext.
The Repository class will look very similar to the DbContext class. It has several sets that represent the tables. Every set will implement IQueryable<...> and allow Add / Update / Remove, whatever is needed.
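A sketch of what that could look like; OrderingDbContext and the entities are assumed names from the running example:
using System;
using System.Linq;

public class OrderingRepository : IDisposable
{
    private readonly OrderingDbContext dbContext = new OrderingDbContext();

    // Every set is exposed as an IQueryable, plus only the modifiers
    // this repository is allowed to use.
    public IQueryable<Order> Orders => dbContext.Orders;
    public IQueryable<Customer> Customers => dbContext.Customers;

    public Order AddOrder(Order order)
    {
        dbContext.Orders.Add(order);
        return order;
    }

    public void SaveChanges() => dbContext.SaveChanges();

    public void Dispose() => dbContext.Dispose();
}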
Because of this similarity in functions you could omit the Repository class and let your Ordering class use the DbContext directly. However, keep in mind that changes will be bigger if in the future you decide that you don't want to use Entity Framework anymore but some newer concept, or maybe to return to Dapper, or something even more low-level: SQL will seep through into your Ordering class.
What to choose
I think you should answer several questions for yourself:
Is there really only one database that should be represented by your DbContext, or could it be that the same DbContext should be used for a second database with the same layout? Think of a test database, or a development database. Wouldn't it be easier / better testable / better changeable to let your program create the DbContext that is to be used?
Is there really one group of users of your DbContext: should everyone have the possibility to Delete? To Create? Could it be that some programs only want to query data (the program that e-mails the orders), while order programs need to add Customers, and maybe another program needs to Add and Update Products, and the amount of Products in the warehouse? Consider creating different Repository classes for them. Each Repository gets the same DbContext, because they are all accessing the same database.
Similarly: is there really only one data processing class (the above-mentioned ordering class)? Should the process that handles Orders be able to change product prices and add items to the stock?
The reason that you need the factories, is because you don't want to let your "main" program decide what items it should create for the purpose it is running right now. Code would be much easier if you created the items yourself:
Creation sequence for an Ordering Process:
IJwtHelper jwtHelper = ...;

// The product database: all functionality to do everything with Products and Orders
ProductDbContext dbContext = new ProductDbContext(...)
{
    JwtHelper = jwtHelper,
    ...
};

// The Ordering repository: everything to place Orders.
// It can't change ProductPrices, nor can it stock the warehouse,
// so no AddProduct, nor AddProductCount;
// of course it has TakeNrOfProducts, to decrease the Stock of ordered Products.
OrderingRepository repository = new OrderingRepository(...) { DbContext = dbContext };

// The ordering process: Create Order, find Order, ...
// When an order is created, it checks if items are in stock,
// the prices of the items, if the Customer exists, etc.
using (OrderingProcess process = new OrderingProcess(...) { Repository = repository })
{
    ... // check if Customer exists, check if all items in stock, create the Order
    process.SaveChanges();
}
When the process is Disposed, the Repository is Disposed, which in turn Disposes the DbContext.
Something similar for the process that e-mails the Orders: It does not have to check the products, nor create customers, it only has to fetch data, and maybe update that an order has been e-mailed, or that e-mailing failed.
IJwtHelper jwtHelper = ...;

// The product database: all functionality to do everything with Products and Orders
ProductDbContext dbContext = new ProductDbContext(...) { JwtHelper = jwtHelper };

// The e-mail order repository: everything to fetch order data.
// It can't change ProductPrices, nor can it stock the warehouse.
// It can't add Customers, nor create Orders.
// However it can query a lot: customers, orders, ...
EmailOrderRepository repository = new EmailOrderRepository(...) { DbContext = dbContext };

// The e-mail order process: fetch non-emailed orders,
// e-mail them and update the e-mail result
using (EmailOrderProcess process = new EmailOrderProcess(...) { Repository = repository })
{
    ... // fetch the orders that are not e-mailed yet
        // e-mail the orders
        // warn about orders that can't be e-mailed
        // update the orders that were e-mailed successfully
    repository.SaveChanges();
}
See how much easier this makes the creation process, and how much more versatile: give the DbContext a different JwtHelper, and the data is logged differently; give the Repository a different DbContext, and the data is saved in a different database; give the Process a different Repository, and you'll use Dapper to execute your queries.
Testing will be easier: create a Repository that uses Lists to hold the tables, and testing your process with test data will be easy.
Changes in databases will be easier. If for instance you later decide to separate your databases into one for your stock and stock prices and one for Customers and Orders, only one Repository needs to change. None of the Processes will notice this change.
Conclusion
Don't let the classes decide which objects they need. Let the creator of the class say: "Hey, you need a DbContext? Use this one!" This removes the need for factories and such.
Separate your actual database (DbContext) from the concept of storing and retrieving data (Repository), use a separate class that handles the data without knowing how this data is stored or retrieved (The process class)
Create several Repositories that can only access the data they need to perform their task (plus data that can be foreseen in the future after expected changes). Don't make too many Repositories, but also not one that can do everything.
Create process classes that do only what they are meant to do. Don't create one process class with 20 different tasks: it will only make it more difficult to describe what it should do, more difficult to test, and more difficult to change the task.
If you want to reuse an existing implementation, EF Core 5 provides a DbContextFactory out of the box now:
https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-5.0/whatsnew#dbcontextfactory
Make sure you dispose it correctly, as its instances are not managed by the application's service provider.
See the Microsoft documentation: Using a DbContext factory.
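A minimal sketch of registering and consuming the built-in factory, assuming MyDbContext has the usual DbContextOptions constructor; the Orders set is made up for illustration:
// Startup.ConfigureServices (EF Core 5+): register the factory once.
services.AddDbContextFactory<MyDbContext>(options =>
    options.UseSqlite("Data Source=C:\\mydb.db"));

// Consumer: create a short-lived context and dispose it yourself,
// since factory-created contexts are not managed by the service provider.
public class OrderReader
{
    private readonly IDbContextFactory<MyDbContext> contextFactory;

    public OrderReader(IDbContextFactory<MyDbContext> contextFactory)
    {
        this.contextFactory = contextFactory;
    }

    public int CountOrders()
    {
        using (var context = contextFactory.CreateDbContext())
        {
            return context.Orders.Count();
        }
    }
}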
In case of this method:
public void Delete(int id)
{
    using (var connection = GetOpenConnection())
    {
        connection.Execute($"DELETE FROM MyTable WHERE Id = {id}");
    }
}
Or just:
GetOpenConnection().Execute($"DELETE FROM MyTable WHERE Id = {id}");
I wonder if the second is the better option to ease maintenance and simplify the code.
First option gives you predictability: connection object returned from GetOpenConnection() will be disposed as soon as connection.Execute finishes.
On the other hand, if you use the second approach, you can hope that the connection will be closed at some time in the future, but you have absolutely no certainty of when, or even if, that is going to happen.
Therefore one should prefer the first approach.
Note: Consider parameterizing your query. Even though in your situation insertion of the id into the query is non-threatening because id's type is int, it is a good idea to use parameters consistently throughout your code.
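With Dapper's parameter support, the first variant could look like this (a sketch reusing GetOpenConnection from the question):
public void Delete(int id)
{
    using (var connection = GetOpenConnection())
    {
        // Dapper maps the anonymous object's Id property to the @Id parameter.
        connection.Execute("DELETE FROM MyTable WHERE Id = @Id", new { Id = id });
    }
}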
Answering this requires an understanding of how Sql Server (and other databases) use connections, and how ADO.Net uses connection pooling.
Database servers tend to only be able to handle a limited number of active connections at a time. This has partly to do with the limited available ephemeral ports on a system, but other factors can come into play as well. This means it's important to make sure connections are either always closed promptly, or that we carefully limit connection use. If we want a database to scale to a large number of users, we have to do both.
.Net addresses this situation in two ways. First, the ADO.Net library you use for database access (System.Data and company) includes a feature called Connection Pooling. This feature pools and caches connections for you, to make it efficient to quickly open and close connections as needed. The feature means you should not try to keep a shared connection object active for the life of an application or session. Let the connection pool handle this, and create a brand new connection object for most trips to the database.
The other way it addresses the issue is with the IDisposable pattern. IDisposable provides an interface with direct support in the runtime via the using keyword, such that you can be sure unmanaged resources for an object — like that ephemeral port on the database server your connection was holding onto — are cleaned up promptly and in a deterministic way, even if an exception is thrown. This feature makes sure all those short-lived connections you create because of the connection pooling feature really are as short-lived as they need to be.
In other words, the using block in the first sample serves an important function. It's a mistake to omit it. On a busy system it can even lead to a denial of service situation for your database.
You get a sense of this in the question title itself, which asks, "Which is better to dispose the object?" Only one of those two samples disposes the object at all.
You could approach the design in this manner.
using (var context = new CustomerFactory().Create())
    return context.RetrieveAll();
Then inside your CustomerContext you would have the dispose logic, the database connection, and your query. You could also inherit from a DbConnectionManager class which deals with the connection; when the entire class is disposed, that disposes the connection manager as well.
public interface ICustomerRepository : IDisposable
{
    IEnumerable<Customer> RetrieveAll();
}

public interface ICustomerFactory
{
    ICustomerRepository Create();
}

public class CustomerFactory : ICustomerFactory
{
    public ICustomerRepository Create() => new CustomerContext();
}

public class CustomerContext : ICustomerRepository
{
    private readonly IDbConnection dbConnection;

    public CustomerContext()
    {
        // Instantiate your connection manager here.
    }

    public IEnumerable<Customer> RetrieveAll() => dbConnection.Query<Customer>(...);

    // Disposing the repository also releases the underlying connection.
    public void Dispose() => dbConnection?.Dispose();
}
That would be the approach if you want to stub out an expressive call, representing the fluent syntax of option two without the negative impact.
I have a somewhat complex permission system that uses six database tables in total, and in order to speed it up, I would like to cache these tables in memory instead of hitting the database on every page load.
However, I'll need to update this cache when a new user is added or a permission is changed. I'm not sure how to go about keeping this cache in memory, or how to update it safely without causing problems if it's accessed at the same time as it is updated.
Does anyone have an example of how to do something like this or can point me in the right direction for research?
Without knowing more about the structure of the application, there are lots of possible options. One such option might be to abstract the data access behind a repository interface and handle in-memory caching within that repository. Something as simple as a private IEnumerable<T> on the repository object.
So, for example, say you have a User object which contains information about the user (name, permissions, etc.). You'd have a UserRepository with some basic fetch/save methods on it. Inside that repository, you could maintain a private static HashSet<User> which holds User objects which have already been retrieved from the database.
When you fetch a User from the repository, it first checks the HashSet for an object to return; if it doesn't find one, it gets it from the database, adds it to the HashSet, then returns it. When you save a User, it updates both the HashSet and the database.
Again, without knowing the specifics of the codebase and overall design, it's hard to give a more specific answer. This should be a generic enough solution to work in any application, though.
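As an illustration, here is a minimal sketch. Since the question involves concurrent access, it uses a ConcurrentDictionary keyed by id rather than a plain HashSet; User, LoadUserFromDb, and SaveUserToDb are assumed placeholders:
using System.Collections.Concurrent;

public class UserRepository
{
    // Shared across requests; ConcurrentDictionary keeps access thread safe.
    private static readonly ConcurrentDictionary<int, User> cache =
        new ConcurrentDictionary<int, User>();

    public User Fetch(int id)
    {
        // Only hits the database when the user is not cached yet.
        return cache.GetOrAdd(id, key => LoadUserFromDb(key));
    }

    public void Save(User user)
    {
        SaveUserToDb(user);    // update the database first...
        cache[user.Id] = user; // ...then refresh the cache.
    }

    private User LoadUserFromDb(int id) => null;   // placeholder: query the permission tables
    private void SaveUserToDb(User user) { }       // placeholder: write to the database
}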
I would cache items as you use them: in your data layer, when you are getting data back, you check whether it is available in your cache; otherwise you go to the database and cache the result afterwards.
public AccessModel GetAccess(string accessCode)
{
    // "cache" stands for whatever cache you use (ASP.NET Cache, memcached, ...).
    var cached = cache.Get<AccessModel>(accessCode);
    if (cached != null)
        return cached;

    var fromDb = GetFromDatabase(accessCode);
    cache.Set(accessCode, fromDb); // cache the result for subsequent calls
    return fromDb;
}
Then I would think about my cache invalidation strategy. You can follow two ways:
One would be to set the data to expire after 1 hour, so you hit the database at most once an hour.
The other is to invalidate the cache whenever you update the data. That is for sure the best, but it is a bit more complex.
Hope it helps.
Note: you can use either the ASP.NET Cache or another solution like memcached, depending on your infrastructure.
Is it hitting the database every page load that's the problem or is it joining six tables that's the problem?
If it's just that the join is slow, why not create a database table that summarizes the data in a way that is much easier and faster to query?
This way, you just have to update your summary table each time you add a user or update a permission. If you group all of this into a single transaction, you shouldn't have issues with out-of-sync data.
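For instance, with plain ADO.NET the permission update and the summary refresh could share one transaction; the table and column names here (and connectionString, roleId, userId) are made up for illustration:
using System.Data.SqlClient;

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    {
        using (var updatePermission = new SqlCommand(
            "UPDATE UserPermission SET RoleId = @RoleId WHERE UserId = @UserId",
            connection, transaction))
        {
            updatePermission.Parameters.AddWithValue("@RoleId", roleId);
            updatePermission.Parameters.AddWithValue("@UserId", userId);
            updatePermission.ExecuteNonQuery();
        }

        using (var refreshSummary = new SqlCommand(
            "UPDATE PermissionSummary SET CanEdit = @CanEdit WHERE UserId = @UserId",
            connection, transaction))
        {
            refreshSummary.Parameters.AddWithValue("@CanEdit", true);
            refreshSummary.Parameters.AddWithValue("@UserId", userId);
            refreshSummary.ExecuteNonQuery();
        }

        // Both statements commit together, so the summary never goes out of sync.
        transaction.Commit();
    }
}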
You can take advantage of ASP.NET caching and the SqlCacheDependency class. There is an article on MSDN.
You can use the Cache object built into ASP.NET. Here is an article that explains how.
I can suggest caching such data in the Application state object. For thread-safe usage, consider using the lock operator. Your code would look something like this:
// Use a shared lock object: HttpContext.Current is a different instance per
// request, so locking on it would not synchronize anything across requests.
private static readonly object cacheLock = new object();

public void ClearTableCache(string tableName)
{
    lock (cacheLock)
    {
        System.Web.HttpContext.Current.Application[tableName] = null;
    }
}

public SomeDataType GetTableData(string tableName)
{
    lock (cacheLock)
    {
        if (System.Web.HttpContext.Current.Application[tableName] == null)
        {
            // Get data from the DB, then put it into application state.
            SomeDataType dataFromDb = LoadTableFromDb(tableName); // placeholder for your DB access
            System.Web.HttpContext.Current.Application[tableName] = dataFromDb;
            return dataFromDb;
        }
        return (SomeDataType)System.Web.HttpContext.Current.Application[tableName];
    }
}
In my DALs I currently use a new DataContext instance for each method, i.e. I create the context for each data call and then dispose it (with using). I remember reading that this was sort of a best practice.
Now I think I would probably be better off using one common DataContext per DAL, which would require fewer lines of code and would allow updating changes in the database without attaching the entities to a newly created context.
But I am not sure whether this will impact the performance of the application. Are there downsides to this new approach, like maybe "each context reserves a connection to the database" or "there are only a limited number of contexts available per application"?
From what I have read and concluded myself, the basic rule is: use a single DataContext instance for each short-lived set of operations. This means:
Use a new (separate) instance of DataContext for each operation (transaction) in long-living parent objects, such as DALs. For example, the main form has a DAL which uses a DataContext; the main form is the longest-living object in a desktop application, so having a single instance of a DataContext serve all the main form's data operations is not a good solution, due to the growing cache and the risk of the data becoming stale.
Use a single (common) instance of DataContext for all operations in short-lived parent objects. For example, if we have a class which executes a set of data operations in a short amount of time (takes data from a database, operates on it, updates it, saves the changes to the database, and gets disposed), we had better create one single instance of the DataContext and use it in all the DAL methods. This applies to web applications and services as well, since they are stateless and executed per request.
Example of when I see a requirement of a common DataContext:
DAL:
// Common DAL DataContext field.
DataContext Context = new DataContext();

public IEnumerable<Record> GetRecords()
{
    var records = Context.Records;
    foreach (var record in records)
    {
        yield return record;
    }
}

public void UpdateData()
{
    Context.SaveChanges();
}
BLL:
public void ManageData()
{
    foreach (var record in DAL.GetRecords())
    {
        record.IsUpdated = true;
        DAL.UpdateData();
    }
}
With this approach you will end up with a lot of objects created in memory (potentially the whole DB) and, which can be even more important, those objects will not correspond to the current values in the DB (if the DB gets updated outside of your application/machine). So, in order to use memory efficiently and to have up-to-date values for your entities, it is really better to create a data context per transaction.
I have a number of static methods that perform simple operations like insert or delete a record. All these methods follow this template of using:
public static UserDataModel FromEmail(string email)
{
    using (var db = new MyWebAppDataContext())
    {
        db.ObjectTrackingEnabled = false;
        return (from u in db.UserDataModels
                where u.Email == email
                select u).Single();
    }
}
I also have a few methods that need to perform multiple operations that use a DataContext:
public static UserPreferencesViewModel Preferences(string email)
{
    return UserDataModel.Preferences(UserDataModel.FromEmail(email));
}

private static UserPreferencesViewModel Preferences(UserDataModel user)
{
    using (var db = new MyWebAppDataContext())
    {
        var preferences = (from u in db.UserDataModels
                           where u == user
                           select u.Preferences).Single();
        return new UserPreferencesViewModel(preferences);
    }
}
I like that I can divide simple operations into faux stored procedures in my data models with static methods like FromEmail(), but I'm concerned about the cost of Preferences() opening two connections (right?) via the two using DataContext statements.
Do I need to be? Is what I'm doing less efficient than using a single using(var db = new MyWebAppDataContext()) statement?
If you examine those "two" operations, you might see that they could be performed in one database roundtrip. Minimizing database roundtrips is a major performance objective (second to minimizing database IO).
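For example, the two methods from the question could collapse into one query against a single DataContext (a sketch reusing the question's names):
public static UserPreferencesViewModel Preferences(string email)
{
    using (var db = new MyWebAppDataContext())
    {
        // One roundtrip: filter by email and project to Preferences directly.
        var preferences = (from u in db.UserDataModels
                           where u.Email == email
                           select u.Preferences).Single();
        return new UserPreferencesViewModel(preferences);
    }
}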
If you have multiple DataContexts, they view the same record differently. Normally, object tracking requires that the same instance is always used to represent a single record; if you have two DataContexts, each does its own object tracking on its own instances.
Suppose the record changes between DC1 observing it and DC2 observing it. In that case, the record will not only have two different instances, but those instances will have different values. It can be very challenging to express business logic against such a moving target.
You should definitely retire the DataContext after the unit of work, to protect yourself from stale instances of records.
Normally you should use one context for one logical unit of work. Have a look at the unit of work pattern, e.g. http://dotnet.dzone.com/news/using-unit-work-pattern-entity
Of course there is some overhead in creating a new DataContext each time, but it's good practice to do as Ludwig stated: one context per unit of work.
Connections are pooled, so it's not too expensive an operation.
I also think creating a new DataContext each time is the correct way, but this link explains different approaches for handling the data context: Linq to SQL DataContext Lifetime Management.
I developed a wrapper component that uses an interface like:
public interface IContextCacher
{
    DataContext GetFromCache();
    void SaveToCache(DataContext ctx);
}
A wrapper is used to instantiate the context: if one exists in the cache, it is pulled from there; otherwise a new instance is created and pushed to the Save method, and all future calls get the value from the getter.
The actual caching mechanism would depend on the type of application. In an ASP.NET web application, for instance, it could store the context in the Items collection, so it is alive for the request only. A Windows app could pull it from some singleton collection. It could be whatever you want under the scenes.
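A sketch of the ASP.NET variant described above, storing the context in HttpContext.Current.Items so it lives for the current request only (the key name is arbitrary):
using System.Web;

public class RequestContextCacher : IContextCacher
{
    private const string Key = "RequestDataContext";

    public DataContext GetFromCache()
    {
        return HttpContext.Current.Items[Key] as DataContext;
    }

    public void SaveToCache(DataContext ctx)
    {
        HttpContext.Current.Items[Key] = ctx;
    }
}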