I have a very peculiar performance problem with Entity Framework. I use version 7 of the framework with the SQLite provider (both from NuGet). The database has around 10 million records, but in the future there will be around 100 million. The structure of the database is very simple:
public class Sample
{
public int SampleID { get; set; }
public long Time { get; set; }
public short Channel { get; set; } /* values from 0 to 8191, in the presented test 0-15 */
public byte Events { get; set; } /* 1-255 */
}
public class Channel
{
public int ChannelID { get; set; }
public short Ch { get; set; }
public int Es { get; set; }
}
public class MyContext : DbContext
{
// This property defines the table
public DbSet<Sample> Samples { get; set; }
public DbSet<Channel> Spectrum { get; set; }
// This method connects the context with the database
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
var connectionStringBuilder = new SqliteConnectionStringBuilder { DataSource = "E://database.db" };
var connectionString = connectionStringBuilder.ToString();
var connection = new SqliteConnection(connectionString);
optionsBuilder.UseSqlite(connection);
}
}
I try to group events by channel and sum them up into something like a spectrum. When I use a LINQ query, performance is very poor: for 10 million records the query runs for about 15 minutes, uses around 1 GB of RAM, and then throws an OutOfMemoryException. I think that Entity Framework is loading all records as objects into memory - but why? On the other hand, plain SQL needs about 3 seconds and uses no significant amount of RAM.
using (var db = new MyContext())
{
var res1 = from sample in db.Samples
group sample by sample.Channel into g
select new { Channel=g.Key, Events = g.Sum(s => s.Events) };
res1.ToArray();
var res2 = db.Spectrum.FromSql("SELECT Channel as ChannelID, Channel as Ch, SUM(Events) as Es FROM Sample GROUP BY Channel");
var data = res2.ToArray();
}
Any suggestions? Thanks for your help ;)
Suggestion? IGNORE ENTITY FRAMEWORK.
As in: this is so totally not an EF issue it is not even funny.
Look at the SQL that EF sends out, then optimize from that level. Oh, you have little influence on the SQL; but for a trivial statement like this the SQL will be optimal.
What will not be optimal - and there is a hint you never looked at the SQL - is the database. Are the indices there? Code first is amazing in that it is ignorant to the intricacies of the database and you need to look at it FIRST from a "is my database optimal". Indices. And - sadly - hardware. If you hit 100 million rows, you need to have the power in the database to handle this.
I think that Entity Framework is loading all records as objects into memory -
but why?
Rule 1 in performance debugging: DO NOT THINK - CHECK. Look at the SQL generated (log, the res1 variable can show you) and see what gets submitted to the database.
It is possible that you just have that much data. You say nothing about how many channels exist - this may well require a bigger machine.
Check it.
Also: it is not really smart to pull the results into an array unless you need that. Arrays are memory problematic in this scenario (reallocations to get the size) and a LIST may be better (uses more memory but requires no reallocation). In general, though, you want to AVOID materializing the result sets - i.e. work from the enumerable. Not always, but your test may simply show problems on that side. The resulting array may be huge. And require one contiguous piece of memory.
And seriously, question your selection of database technology. SQLite is nice - it is small, it is lightweight. It runs inside your process. It is NOT suitable for huge amounts of data, it is not a full scale database server. You may be much better off using SQL Server Express (if anything: SQL Server Express will use memory for caching that is NOT in your process but separate). I personally would not use SQLite for something that may use hundreds of millions of records.
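The point about working from the enumerable can be illustrated with a minimal sketch. It assumes the rows arrive as a lazy sequence; the ReadSamples generator below is a hypothetical stand-in for a data reader or a streamed query, not part of the question's code. The fold keeps one running total per channel, so memory grows with the number of channels rather than with the number of rows. Letting the database do the GROUP BY is still preferable when it works.

```csharp
using System;
using System.Collections.Generic;

public static class StreamingSpectrum
{
    // Hypothetical stand-in for rows streamed from the database:
    // yields (channel, events) pairs one at a time, never as a full list.
    public static IEnumerable<(short Channel, byte Events)> ReadSamples(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return ((short)(i % 16), (byte)(1 + i % 255));
        }
    }

    // Folds the stream into one running total per channel.
    public static Dictionary<short, long> SumByChannel(
        IEnumerable<(short Channel, byte Events)> samples)
    {
        var totals = new Dictionary<short, long>();
        foreach (var (channel, events) in samples)
        {
            // 'current' is 0 when the channel has not been seen yet.
            totals.TryGetValue(channel, out long current);
            totals[channel] = current + events;
        }
        return totals;
    }
}
```

Calling StreamingSpectrum.SumByChannel(StreamingSpectrum.ReadSamples(n)) never holds more than one row plus the totals dictionary, whereas ToArray() on the raw rows would need room for all n of them at once.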
Also: Note your SQL is different. The EF Part has an OrderBy (which is not needed), the SQL not. Ordering may well be expensive. Which puts us back to "get the SQL generated by entity framework".
The problem was connected with the SQLite provider. After changing to SQL Server Compact everything works fine ;)
I am trying to retrieve a table with several columns and I have created a class that will be able to represent each row as an object with properties.
E.G.
class TableA {
int prop1;
int prop2;
....
}
I am using SqlDataReader to read the value for each row and then assigning it to the object that I have created
TableA tab = new TableA()
tab.prop1 = sqlreader.GetValue(prop1_ordinal).toString();
At the moment I need to explicitly state:
tab.prop1 = etc2..
tab.prop2 = etc2...
This can be quite troublesome when I have quite a few properties (20+ or so).
What other alternatives should I be using?
I am thinking of using a Dictionary or something of the sort but am not sure how to start. That way, I can just use a foreach loop to go through a list of all the properties and set the values.
Essentially, I don't want to put in too much redundant code just to set values.
After all the data has been put into the object, I will essentially write it to a CSV file after the values have been manipulated and changed.
Any thoughts will be appreciated?
Have you tried using linq to generate your table & schema:
[Table(Name = "Test")]
public class TableA
{
[Column(IsPrimaryKey = true)]
public int ID { get; set; }
[Column]
public int prop1 { get; set; }
[Column]
public int prop2 { get; set; }
}
static void Main()
{
var constr = @"Data Source=NOTEBOOK\SQLEXPRESS;Initial Catalog=DemoDataContext;Integrated Security=True";
var context = new DataContext(constr) { Log = Console.Out };
var metaTable = context.Mapping.GetTable(typeof(TableA));
// SqlBuilder is an internal class, so it is reached through reflection
var typeName = "System.Data.Linq.SqlClient.SqlBuilder";
var type = typeof(DataContext).Assembly.GetType(typeName);
var bf = BindingFlags.Static | BindingFlags.NonPublic | BindingFlags.InvokeMethod;
var sql = type.InvokeMember("GetCreateTableCommand", bf, null, null, new[] { metaTable });
Console.WriteLine(sql);
// Execute SQL Command
}
Make sure to include System.Data.Linq and:
using System.Data.Linq.Mapping;
using System.Data.Linq;
using System.Reflection;
You can find more information at:
https://msdn.microsoft.com/en-us/library/bb384396.aspx
Once you have everything mapped, you can import the data by using LINQ functionality to fill in the objects for you!
And an example:
https://social.msdn.microsoft.com/Forums/en-US/2bdfdde6-596e-4880-a3b3-3cb3ec365245/could-i-use-linq-to-sql-create-table-in-my-database?forum=linqtosql
EntityFramework is an ORM (object relational mapper) designed for such a thing. It's perfect for an enterprise-level product, but it's pretty big and bulky if you have a smaller project. Dapper is a super-light-weight ORM that will essentially do the
tab.prop1 = sqlreader.GetValue(prop1_ordinal).toString();
part for you if you name the properties the same as the column names. You pull Dapper in with NuGet. The following code will give you an IEnumerable of your object (TModel).
IEnumerable<TModel> result;
using (MySqlConnection conn = new MySqlConnection(_mysqlConnString))
{
// "Query" is a Dapper extension method that stuffs the datareader into objects based on the column names
result = conn.Query<TModel>("Select * from YourTable");
}
// do stuff with result
This links to a full example, instead of just the piece I pulled out of my current project. http://www.tritac.com/bp-24-dapper-net-by-example
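To make the idea of name-based mapping concrete, here is a minimal reflection-based sketch that uses only the base class library. It is not how Dapper is implemented (Dapper generates and caches its mapping code); NameBasedMapper and TableARow are hypothetical names introduced for illustration. It matches column names to public settable properties, ignoring case, and does not handle nullable property types.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

public static class NameBasedMapper
{
    // Maps every row of the reader to a new T, matching column names to
    // public instance properties (case-insensitive). Illustration only.
    public static List<T> MapAll<T>(IDataReader reader) where T : new()
    {
        var results = new List<T>();

        // Resolve the target property for each column once, before reading rows.
        var properties = new PropertyInfo[reader.FieldCount];
        for (int i = 0; i < reader.FieldCount; i++)
        {
            properties[i] = typeof(T).GetProperty(
                reader.GetName(i),
                BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase);
        }

        while (reader.Read())
        {
            var item = new T();
            for (int i = 0; i < properties.Length; i++)
            {
                var prop = properties[i];
                // Skip columns without a writable property and NULL values.
                if (prop == null || !prop.CanWrite || reader.IsDBNull(i)) continue;
                // Convert.ChangeType covers simple cases such as long -> int.
                prop.SetValue(item, Convert.ChangeType(reader.GetValue(i), prop.PropertyType));
            }
            results.Add(item);
        }
        return results;
    }
}

// Hypothetical row type mirroring the TableA class from the question.
public class TableARow
{
    public int Prop1 { get; set; }
    public int Prop2 { get; set; }
}
```

Any IDataReader works as input, including a SqlDataReader or the reader returned by DataTable.CreateDataReader(), so adding a 21st property only means adding it to the class.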
The problem you are describing is one of the main reasons we have Object Relational Mappers (ORM's). The easiest to use for Microsoft SQL Server is probably Linq2Sql, but Entity Framework allows you to use other database engines and will allow you to define more complicated relationships. Another solid ORM is NHibernate.
If you find EntityFramework to be big and bulky, you could try Trinity Framework. It's a small database-first ORM framework with T4 templates and smart mapping to your database.
NuGet: PM> Install-Package TrinityFramework
I've written this code to project one to many relation but it's not working:
using (var connection = new SqlConnection(connectionString))
{
connection.Open();
IEnumerable<Store> stores = connection.Query<Store, IEnumerable<Employee>, Store>
(@"Select Stores.Id as StoreId, Stores.Name,
Employees.Id as EmployeeId, Employees.FirstName,
Employees.LastName, Employees.StoreId
from Store Stores
INNER JOIN Employee Employees ON Stores.Id = Employees.StoreId",
(a, s) => { a.Employees = s; return a; },
splitOn: "EmployeeId");
foreach (var store in stores)
{
Console.WriteLine(store.Name);
}
}
Can anybody spot the mistake?
EDIT:
These are my entities:
public class Product
{
public int Id { get; set; }
public string Name { get; set; }
public double Price { get; set; }
public IList<Store> Stores { get; set; }
public Product()
{
Stores = new List<Store>();
}
}
public class Store
{
public int Id { get; set; }
public string Name { get; set; }
public IEnumerable<Product> Products { get; set; }
public IEnumerable<Employee> Employees { get; set; }
public Store()
{
Products = new List<Product>();
Employees = new List<Employee>();
}
}
EDIT:
I change the query to:
IEnumerable<Store> stores = connection.Query<Store, List<Employee>, Store>
(@"Select Stores.Id as StoreId ,Stores.Name,Employees.Id as EmployeeId,
Employees.FirstName,Employees.LastName,Employees.StoreId
from Store Stores INNER JOIN Employee Employees
ON Stores.Id = Employees.StoreId",
(a, s) => { a.Employees = s; return a; }, splitOn: "EmployeeId");
and I got rid of the exceptions! However, Employees are not mapped at all. I am still not sure what problem it had with IEnumerable<Employee> in the first query.
This post shows how to query a highly normalised SQL database, and map the result into a set of highly nested C# POCO objects.
Ingredients:
8 lines of C#.
Some reasonably simple SQL that uses some joins.
Two awesome libraries.
The insight that allowed me to solve this problem is to separate the MicroORM from mapping the result back to the POCO Entities. Thus, we use two separate libraries:
Dapper as the MicroORM.
Slapper.Automapper for mapping.
Essentially, we use Dapper to query the database, then use Slapper.Automapper to map the result straight into our POCOs.
Advantages
Simplicity. It's fewer than 8 lines of code. I find this a lot easier to understand, debug, and change.
Less code. A few lines of code is all Slapper.Automapper needs to handle anything you throw at it, even if we have a complex nested POCO (i.e. POCO contains List<MyClass1> which in turn contains List<MySubClass2>, etc).
Speed. Both of these libraries have an extraordinary amount of optimization and caching to make them run almost as fast as hand tuned ADO.NET queries.
Separation of concerns. We can change the MicroORM for a different one, and the mapping still works, and vice-versa.
Flexibility. Slapper.Automapper handles arbitrarily nested hierarchies, it isn't limited to a couple of levels of nesting. We can easily make rapid changes, and everything will still work.
Debugging. We can first see that the SQL query is working properly, then we can check that the SQL query result is properly mapped back to the target POCO Entities.
Ease of development in SQL. I find that creating flattened queries with inner joins to return flat results is much easier than creating multiple select statements, with stitching on the client side.
Optimized queries in SQL. In a highly normalized database, creating a flat query allows the SQL engine to apply advanced optimizations to the whole which would not normally be possible if many small individual queries were constructed and run.
Trust. Dapper is the back end for StackOverflow, and, well, Randy Burden is a bit of a superstar. Need I say any more?
Speed of development. I was able to do some extraordinarily complex queries, with many levels of nesting, and the dev time was quite low.
Fewer bugs. I wrote it once, it just worked, and this technique is now helping to power a FTSE company. There was so little code that there was no unexpected behavior.
Disadvantages
Scaling beyond 1,000,000 rows returned. Works well when returning < 100,000 rows. However, if we are bringing back >1,000,000 rows, in order to reduce the traffic between us and SQL server, we should not flatten it out using inner join (which brings back duplicates), we should instead use multiple select statements and stitch everything back together on the client side (see the other answers on this page).
This technique is query oriented. I haven't used this technique to write to the database, but I'm sure that Dapper is more than capable of doing this with some more extra work, as StackOverflow itself uses Dapper as its Data Access Layer (DAL).
Performance Testing
In my tests, Slapper.Automapper added a small overhead to the results returned by Dapper, which meant that it was still 10x faster than Entity Framework, and the combination is still pretty darn close to the theoretical maximum speed SQL + C# is capable of.
In most practical cases, most of the overhead would be in a less-than-optimum SQL query, and not with some mapping of the results on the C# side.
Performance Testing Results
Total number of iterations: 1000
Dapper by itself: 1.889 milliseconds per query, using 3 lines of code to return the dynamic.
Dapper + Slapper.Automapper: 2.463 milliseconds per query, using an additional 3 lines of code for the query + mapping from dynamic to POCO Entities.
Worked Example
In this example, we have list of Contacts, and each Contact can have one or more phone numbers.
POCO Entities
public class TestContact
{
public int ContactID { get; set; }
public string ContactName { get; set; }
public List<TestPhone> TestPhones { get; set; }
}
public class TestPhone
{
public int PhoneId { get; set; }
public int ContactID { get; set; } // foreign key
public string Number { get; set; }
}
SQL Table TestContact
SQL Table TestPhone
Note that this table has a foreign key ContactID which refers to the TestContact table (this corresponds to the List<TestPhone> in the POCO above).
SQL Which Produces Flat Result
In our SQL query, we use as many JOIN statements as we need to get all of the data we need, in a flat, denormalized form. Yes, this might produce duplicates in the output, but these duplicates will be eliminated automatically when we use Slapper.Automapper to automatically map the result of this query straight into our POCO object map.
USE [MyDatabase];
SELECT tc.[ContactID] as ContactID
,tc.[ContactName] as ContactName
,tp.[PhoneId] AS TestPhones_PhoneId
,tp.[ContactId] AS TestPhones_ContactId
,tp.[Number] AS TestPhones_Number
FROM TestContact tc
INNER JOIN TestPhone tp ON tc.ContactId = tp.ContactId
C# code
const string sql = @"SELECT tc.[ContactID] as ContactID
,tc.[ContactName] as ContactName
,tp.[PhoneId] AS TestPhones_PhoneId
,tp.[ContactId] AS TestPhones_ContactId
,tp.[Number] AS TestPhones_Number
FROM TestContact tc
INNER JOIN TestPhone tp ON tc.ContactId = tp.ContactId";
string connectionString = // -- Insert SQL connection string here.
using (var conn = new SqlConnection(connectionString))
{
conn.Open();
// Can set default database here with conn.ChangeDatabase(...)
{
// Step 1: Use Dapper to return the flat result as a Dynamic.
dynamic test = conn.Query<dynamic>(sql);
// Step 2: Use Slapper.Automapper for mapping to the POCO Entities.
// - IMPORTANT: Let Slapper.Automapper know how to do the mapping;
// let it know the primary key for each POCO.
// - Must also use underscore notation ("_") to name parameters in the SQL query;
// see Slapper.Automapper docs.
Slapper.AutoMapper.Configuration.AddIdentifiers(typeof(TestContact), new List<string> { "ContactID" });
Slapper.AutoMapper.Configuration.AddIdentifiers(typeof(TestPhone), new List<string> { "PhoneID" });
var testContact = (Slapper.AutoMapper.MapDynamic<TestContact>(test) as IEnumerable<TestContact>).ToList();
foreach (var c in testContact)
{
foreach (var p in c.TestPhones)
{
Console.Write("ContactName: {0}: Phone: {1}\n", c.ContactName, p.Number);
}
}
}
}
Output
POCO Entity Hierarchy
Looking in Visual Studio, we can see that Slapper.Automapper has properly populated our POCO Entities, i.e. we have a List<TestContact>, and each TestContact has a List<TestPhone>.
Notes
Both Dapper and Slapper.Automapper cache everything internally for speed. If you run into memory issues (very unlikely), ensure that you occasionally clear the cache for both of them.
Ensure that you name the columns coming back, using the underscore (_) notation to give Slapper.Automapper clues on how to map the result into the POCO Entities.
Ensure that you give Slapper.Automapper clues on the primary key for each POCO Entity (see the lines Slapper.AutoMapper.Configuration.AddIdentifiers). You can also use Attributes on the POCO for this. If you skip this step, then it could go wrong (in theory), as Slapper.Automapper would not know how to do the mapping properly.
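Why the identifiers matter can be seen in a hand-written version of the same idea. The sketch below is not Slapper.Automapper's implementation; Contact, Phone, and FlatRowStitcher are hypothetical names, and the flat JOIN rows are modelled as plain tuples. The parent's primary key decides whether a row starts a new parent or adds a child to one already seen, which is how the duplicated parent columns collapse.

```csharp
using System;
using System.Collections.Generic;

public class Contact
{
    public int ContactID { get; set; }
    public string ContactName { get; set; }
    public List<Phone> Phones { get; } = new List<Phone>();
}

public class Phone
{
    public int PhoneId { get; set; }
    public string Number { get; set; }
}

public static class FlatRowStitcher
{
    // Each tuple is one row of the flat JOIN result: the contact columns
    // repeat once per phone number.
    public static List<Contact> Stitch(
        IEnumerable<(int ContactID, string ContactName, int PhoneId, string Number)> rows)
    {
        var byId = new Dictionary<int, Contact>();
        var ordered = new List<Contact>(); // preserves first-seen order

        foreach (var row in rows)
        {
            // The primary key identifies the parent; without it, every row
            // would look like a brand-new contact.
            if (!byId.TryGetValue(row.ContactID, out var contact))
            {
                contact = new Contact { ContactID = row.ContactID, ContactName = row.ContactName };
                byId.Add(row.ContactID, contact);
                ordered.Add(contact);
            }
            contact.Phones.Add(new Phone { PhoneId = row.PhoneId, Number = row.Number });
        }
        return ordered;
    }
}
```

With more levels of nesting the same lookup has to be repeated per level, which is the bookkeeping the mapping library takes over.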
Update 2015-06-14
Successfully applied this technique to a huge production database with over 40 normalized tables. It worked perfectly to map an advanced SQL query with over 16 inner join and left join into the proper POCO hierarchy (with 4 levels of nesting). The queries are blindingly fast, almost as fast as hand coding it in ADO.NET (it was typically 52 milliseconds for the query, and 50 milliseconds for the mapping from the flat result into the POCO hierarchy). This is really nothing revolutionary, but it sure beats Entity Framework for speed and ease of use, especially if all we are doing is running queries.
Update 2016-02-19
Code has been running flawlessly in production for 9 months. The latest version of Slapper.Automapper has all of the changes that I applied to fix the issue related to nulls being returned in the SQL query.
Update 2017-02-20
Code has been running flawlessly in production for 21 months, and has handled continuous queries from hundreds of users in a FTSE 250 company.
Slapper.Automapper is also great for mapping a .csv file straight into a list of POCOs. Read the .csv file into a list of IDictionary, then map it straight into the target list of POCOs. The only trick is that you have to add a property int Id { get; set; }, and make sure it's unique for every row (or else the automapper won't be able to distinguish between the rows).
Update 2019-01-29
Minor update to add more code comments.
See: https://github.com/SlapperAutoMapper/Slapper.AutoMapper
I wanted to keep it as simple as possible, my solution:
public List<ForumMessage> GetForumMessagesByParentId(int parentId)
{
var sql = @"
select d.id_data as Id, d.cd_group As GroupId, d.cd_user as UserId, d.tx_login As Login,
d.tx_title As Title, d.tx_message As [Message], d.tx_signature As [Signature], d.nm_views As Views, d.nm_replies As Replies,
d.dt_created As CreatedDate, d.dt_lastreply As LastReplyDate, d.dt_edited As EditedDate, d.tx_key As [Key]
from
t_data d
where d.cd_data = @DataId order by id_data asc;
select d.id_data As DataId, di.id_data_image As DataImageId, di.cd_image As ImageId, i.fl_local As IsLocal
from
t_data d
inner join T_data_image di on d.id_data = di.cd_data
inner join T_image i on di.cd_image = i.id_image
where d.id_data = @DataId and di.fl_deleted = 0 order by d.id_data asc;";
var mapper = _conn.QueryMultiple(sql, new { DataId = parentId });
var messages = mapper.Read<ForumMessage>().ToDictionary(k => k.Id, v => v);
var images = mapper.Read<ForumMessageImage>().ToList();
foreach(var imageGroup in images.GroupBy(g => g.DataId))
{
messages[imageGroup.Key].Images = imageGroup.ToList();
}
return messages.Values.ToList();
}
I still do one call to the database, and while I now execute 2 queries instead of one, the second query uses an INNER join instead of a less optimal LEFT join.
A slight modification of Andrew's answer that utilizes a Func to select the parent key instead of GetHashCode.
public static IEnumerable<TParent> QueryParentChild<TParent, TChild, TParentKey>(
this IDbConnection connection,
string sql,
Func<TParent, TParentKey> parentKeySelector,
Func<TParent, IList<TChild>> childSelector,
dynamic param = null, IDbTransaction transaction = null, bool buffered = true, string splitOn = "Id", int? commandTimeout = null, CommandType? commandType = null)
{
Dictionary<TParentKey, TParent> cache = new Dictionary<TParentKey, TParent>();
connection.Query<TParent, TChild, TParent>(
sql,
(parent, child) =>
{
if (!cache.ContainsKey(parentKeySelector(parent)))
{
cache.Add(parentKeySelector(parent), parent);
}
TParent cachedParent = cache[parentKeySelector(parent)];
IList<TChild> children = childSelector(cachedParent);
children.Add(child);
return cachedParent;
},
param as object, transaction, buffered, splitOn, commandTimeout, commandType);
return cache.Values;
}
Example usage
conn.QueryParentChild<Product, Store, int>("sql here", prod => prod.Id, prod => prod.Stores)
According to this answer there is no one to many mapping support built into Dapper.Net. Queries will always return one object per database row. There is an alternative solution included, though.
Here is another method:
Order (one) - OrderDetail (many)
using (var connection = new SqlCeConnection(connectionString))
{
var orderDictionary = new Dictionary<int, Order>();
var list = connection.Query<Order, OrderDetail, Order>(
sql,
(order, orderDetail) =>
{
Order orderEntry;
if (!orderDictionary.TryGetValue(order.OrderID, out orderEntry))
{
orderEntry = order;
orderEntry.OrderDetails = new List<OrderDetail>();
orderDictionary.Add(orderEntry.OrderID, orderEntry);
}
orderEntry.OrderDetails.Add(orderDetail);
return orderEntry;
},
splitOn: "OrderDetailID")
.Distinct()
.ToList();
}
Source: http://dapper-tutorial.net/result-multi-mapping#example---query-multi-mapping-one-to-many
Here is a crude workaround
public static IEnumerable<TOne> Query<TOne, TMany>(this IDbConnection cnn, string sql, Func<TOne, IList<TMany>> property, dynamic param = null, IDbTransaction transaction = null, bool buffered = true, string splitOn = "Id", int? commandTimeout = null, CommandType? commandType = null)
{
var cache = new Dictionary<int, TOne>();
cnn.Query<TOne, TMany, TOne>(sql, (one, many) =>
{
if (!cache.ContainsKey(one.GetHashCode()))
cache.Add(one.GetHashCode(), one);
var localOne = cache[one.GetHashCode()];
var list = property(localOne);
list.Add(many);
return localOne;
}, param as object, transaction, buffered, splitOn, commandTimeout, commandType);
return cache.Values;
}
It's by no means the most efficient way, but it will get you up and running. I'll try and optimise this when I get a chance.
Use it like this:
conn.Query<Product, Store>("sql here", prod => prod.Stores);
Bear in mind your objects need to implement GetHashCode, perhaps like this:
public override int GetHashCode()
{
return this.Id.GetHashCode();
}
We're using Entity Framework 4.1 for our data access, and while building up objects we started asking ourselves how chatty the application was going to be with the database. One item that we really started looking at is below:
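public class MasterPreAward
{
public int ID { get; set; }
public int MemberID { get; set; }
public int CycleID { get; set; }
public virtual Cycle Cycle { get; set; }
public virtual Member Member { get; set; }
public virtual Status Status { get; set; }
public virtual ICollection<DataTracking> DataTrackings { get; set; }
public virtual ICollection<ReviewerAssignment> Reviewers { get; set; }
}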
The MasterPreAward is an entity generated from the database and has the navigation properties Cycle, Member and Status, along with two collections, DataTrackings and Reviewers. What we were wondering was: how does Entity Framework load the child objects based on these items and bring back the data we use in the following model?
As you can see, we're passing in MasterPreAward object and then accessing children properties which are loaded based on the MasterPreAward.
public ViewHeaderSummary(MasterPreAward masterPreAward)
{
MasterPreAwardId = masterPreAward.ID;
ClientId = masterPreAward.Cycle.Project.Program.ClientID;
ApplicationId = masterPreAward.MemberID;
ProgramId = masterPreAward.Cycle.Project.ProgramID;
ProjectId = masterPreAward.Cycle.ProjectID;
EventTypeId = masterPreAward.DataTrackings.FirstOrDefault(x=>x.Finished==true
&& x.EventTypeID==(int)FormEvents.Application).EventTypeID;
CycleId = masterPreAward.CycleID;
FormId = masterPreAward.Cycle.CycleForms.FirstOrDefault().FormID;
}
What we'd like to know: is this the best way to access these properties, or should we really be thinking about doing this type of work in a different way?
I believe the default settings would be to lazy load each nested collection independently, which could cause a lot of database traffic.
The best way to verify the generated SQL is to start a SQL profiler and confirm the number of queries.
You can force EF to eagerly load related entities by calling .Include method. See here for more details.
You don't seem to query for full entities but only for a bunch of scalar values. In my opinion this would be a good candidate for a projection which collects all the needed values in a single database roundtrip:
var result = dbContext.MasterPreAwards
.Where(m => m.ID == masterPreAward.ID)
.Select(m => new
{
ClientId = m.Cycle.Project.Program.ClientID,
ProgramId = m.Cycle.Project.ProgramID,
ProjectId = m.Cycle.ProjectID,
EventTypeId = m.DataTrackings.Where(d => d.Finished
&& d.EventTypeID==(int)FormEvents.Application)
.Select(d => d.EventTypeID).FirstOrDefault(),
FormId = m.Cycle.CycleForms.Select(c => c.FormID).FirstOrDefault()
})
.Single();
MasterPreAwardId = masterPreAward.ID;
ClientId = result.ClientId;
ApplicationId = masterPreAward.MemberID;
ProgramId = result.ProgramId;
ProjectId = result.ProjectId;
EventTypeId = result.EventTypeId;
CycleId = masterPreAward.CycleID;
FormId = result.FormId;
As you can see, you need the DbContext to run such a query.
Your original way to lazily load all related entities will lead to 5 database queries as far as I can see (for Cycle, Project, Program, DataTrackings and CycleForms). Worst of all are the queries for DataTrackings.FirstOrDefault and CycleForms.FirstOrDefault which will actually load the full collections first from the database into memory and then execute FirstOrDefault in memory on the loaded collections to return only one single element from which you then only use one single property.
(Edit: Query for ApplicationId and CycleId not necessary, Code changed.)
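The cost difference between calling FirstOrDefault on a lazily loaded collection and projecting in the query can be shown with a toy model. No Entity Framework is involved here: FakeFormTable is a hypothetical in-memory stand-in that only counts round trips and rows transferred, so the numbers describe the model, not measured EF behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy stand-in for a database table; it only counts what crosses the wire.
public class FakeFormTable
{
    private readonly List<int> _formIds;

    public int RoundTrips { get; private set; }
    public int RowsTransferred { get; private set; }

    public FakeFormTable(int rowCount)
    {
        _formIds = Enumerable.Range(1, rowCount).ToList();
    }

    // Models a lazily loaded navigation collection on first access:
    // every row is fetched, whatever the caller does with it afterwards.
    public List<int> LoadWholeCollection()
    {
        RoundTrips++;
        RowsTransferred += _formIds.Count;
        return new List<int>(_formIds);
    }

    // Models a projection that ends in FirstOrDefault inside the query:
    // the filtering happens on the "server" and one value comes back.
    public int LoadFirstFormIdOnly()
    {
        RoundTrips++;
        RowsTransferred += Math.Min(1, _formIds.Count);
        return _formIds.FirstOrDefault();
    }
}
```

Both LoadWholeCollection().FirstOrDefault() and LoadFirstFormIdOnly() return the same value in one round trip, but the first transfers every row in the table while the second transfers a single value.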