I currently have 2 tables:
Table1, with the following columns:
Id, TypeId, VersionId, AnotherColumn1, AnotherColumn2, AnotherColumn3
Table2, with the following columns:
Id, TypeId, VersionId, DifferentColumn1, DifferentColumn2
The only thing these 2 tables have in common are TypeId and VersionId.
I am trying to get only the TypeId and VersionId from Table1, AS LONG AS that specific **TypeId + VersionId combination** is not in Table2.
I have tried the following:
var result1 = this.Table2
.Select(k => new { TypeId = k.TypeId, VersionId = k.VersionId })
.ToArray(); // Trying to first select all possible TypeId + VersionId combinations from Table2
var finalResult = this.Table1
. // This is where I am lost, should I use a `.Except`?, some kind of `.Where`?
This should be possible with a grouped left outer join using DefaultIfEmpty:
var results = context.Table1
.GroupJoin(context.Table2,
t1 => new {t1.TypeId, t1.VersionId},
t2 => new {t2.TypeId, t2.VersionId},
(t1, t2) => new { Values = new {t1.TypeId, t1.VersionId}, Table2s = t2.DefaultIfEmpty().Where(x => x != null) })
.Where(g => !g.Table2s.Any())
.Select(g => g.Values)
.ToList();
This may look a bit complicated, but we essentially do an outer join between the tables on TypeId and VersionId. When fetching the grouped result of that join, we have to tell EF to exclude any cases where the Table2 entry would be null; otherwise we would get a collection with one null element. (There may be a more optimal way to get this to work, but the above does work.) The result is grouped by the combination of values requested (TypeId and VersionId from Table1). From there it is just a matter of keeping only the results that have no Table2 records and selecting the "Key" of each group, which gives the desired values.
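To see what the grouped join does without a database, here is a minimal in-memory sketch using LINQ to Objects. It assumes both tables have already been projected to (TypeId, VersionId) value tuples, and the AntiJoinSketch and NotInTable2 names are hypothetical. In memory, an unmatched Table1 row simply receives an empty group, so checking !Any() on the group is enough.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AntiJoinSketch
{
    // Returns the (TypeId, VersionId) pairs from table1 that have no matching pair in table2.
    // GroupJoin pairs every table1 key with the (possibly empty) group of matching table2 keys;
    // keeping only the empty groups turns the outer join into an anti-join.
    public static List<(int TypeId, int VersionId)> NotInTable2(
        IEnumerable<(int TypeId, int VersionId)> table1,
        IEnumerable<(int TypeId, int VersionId)> table2)
    {
        return table1
            .GroupJoin(table2,
                t1 => t1,   // value tuples compare by value, like the anonymous types above
                t2 => t2,
                (t1, matches) => new { Key = t1, Matches = matches })
            .Where(g => !g.Matches.Any())
            .Select(g => g.Key)
            .ToList();
    }
}
```

GroupJoin preserves the order of the outer sequence, so the result comes back in Table1 order.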
You can also do this with the help of **LINQ Except**.
Here is an example:
public class Table1
{
public int Id { get; set; }
public int TypeId { get; set; }
public int VersionId { get; set; }
public string AnotherColumn1 { get; set; }
}
public class Table2
{
public int Id { get; set; }
public int TypeId { get; set; }
public int VersionId { get; set; }
public string DifferentColumn1 { get; set; }
}
private void GetValuesUsingExcept()
{
List<Table1> lstTable1 = new List<Table1>()
{
new Table1{Id=1, TypeId=101, VersionId=201, AnotherColumn1="Test1"},
new Table1{Id=2, TypeId=102, VersionId=202, AnotherColumn1="Test2"},
new Table1{Id=3, TypeId=103, VersionId=203, AnotherColumn1="Test3"}
};
List<Table2> lstTable2 = new List<Table2>()
{
new Table2{Id=1, TypeId=101, VersionId=201, DifferentColumn1="DiffVal1"},
new Table2{Id=2, TypeId=102, VersionId=202, DifferentColumn1="DiffVal2"},
new Table2{Id=4, TypeId=104, VersionId=204, DifferentColumn1="DiffVal3"}
};
var output = lstTable1.Select(s1 => new { s1.TypeId, s1.VersionId }).Except(lstTable2.Select(s2 => new { s2.TypeId, s2.VersionId })).ToList();
}
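One detail worth knowing: Except is a set operation, so it also removes duplicate pairs from the first sequence, which suits this case because only the distinct combinations are wanted. The query above compiles because two anonymous types with the same property names and types, in the same order and assembly, are the same type; value tuples give the same value equality. A minimal sketch with hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExceptSketch
{
    // Set difference on (TypeId, VersionId) pairs: yields each distinct pair from table1Keys
    // that does not appear in table2Keys, in first-seen order.
    public static List<(int TypeId, int VersionId)> MissingFromTable2(
        IEnumerable<(int TypeId, int VersionId)> table1Keys,
        IEnumerable<(int TypeId, int VersionId)> table2Keys)
        => table1Keys.Except(table2Keys).ToList();
}
```

If duplicate rows from Table1 must be kept, a Where(... !Any(...)) filter is the alternative, since it does not de-duplicate.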
I have implemented a TVP + stored procedure insert strategy because I need to insert large numbers of rows (probably concurrently) while still getting some information back, such as the generated Ids. I initially used the EF code-first approach to generate the DB structure. My entities:
FacilityGroup
public class FacilityGroup
{
public int Id { get; set; }
[Required]
public string Name { get; set; }
public string InternalNotes { get; set; }
public virtual List<FacilityInstance> Facilities { get; set; } = new List<FacilityInstance>();
}
FacilityInstance
public class FacilityInstance
{
public int Id { get; set; }
[Required]
[Index("IX_FacilityName")]
[StringLength(450)]
public string Name { get; set; }
[Required]
public string FacilityCode { get; set; }
//[Required]
public virtual FacilityGroup FacilityGroup { get; set; }
[ForeignKey(nameof(FacilityGroup))]
[Index("IX_FacilityGroupId")]
public int FacilityGroupId { get; set; }
public virtual List<DataBatch> RelatedBatches { get; set; } = new List<DataBatch>();
public virtual HashSet<BatchRecord> BatchRecords { get; set; } = new HashSet<BatchRecord>();
}
BatchRecord
public class BatchRecord
{
public long Id { get; set; }
//todo index?
public string ItemName { get; set; }
[Index("IX_Supplier")]
[StringLength(450)]
public string Supplier { get; set; }
public decimal Quantity { get; set; }
public string ItemUnit { get; set; }
public string EntityUnit { get; set; }
public decimal ItemSize { get; set; }
public decimal PackageSize { get; set; }
[Index("IX_FamilyCode")]
[Required]
[StringLength(4)]
public string FamilyCode { get; set; }
[Required]
public string Family { get; set; }
[Index("IX_CategoryCode")]
[Required]
[StringLength(16)]
public string CategoryCode { get; set; }
[Required]
public string Category { get; set; }
[Index("IX_SubCategoryCode")]
[Required]
[StringLength(16)]
public string SubCategoryCode { get; set; }
[Required]
public string SubCategory { get; set; }
public string ItemGroupCode { get; set; }
public string ItemGroup { get; set; }
public decimal PurchaseValue { get; set; }
public decimal UnitPurchaseValue { get; set; }
public decimal PackagePurchaseValue { get; set; }
[Required]
public virtual DataBatch DataBatch { get; set; }
[ForeignKey(nameof(DataBatch))]
public int DataBatchId { get; set; }
[Required]
public virtual FacilityInstance FacilityInstance { get; set; }
[ForeignKey(nameof(FacilityInstance))]
[Index("IX_FacilityInstance")]
public int FacilityInstanceId { get; set; }
[Required]
public virtual Currency Currency { get; set; }
[ForeignKey(nameof(Currency))]
public int CurrencyId { get; set; }
}
DataBatch
public class DataBatch
{
public int Id { get; set; }
[Required]
public string Name { get; set; }
public DateTime DateCreated { get; set; }
public BatchStatus BatchStatus { get; set; }
public virtual List<FacilityInstance> RelatedFacilities { get; set; } = new List<FacilityInstance>();
public virtual HashSet<BatchRecord> BatchRecords { get; set; } = new HashSet<BatchRecord>();
}
Then my SQL Server side code. The TVP structure:
CREATE TYPE dbo.RecordImportStructure
AS TABLE (
ItemName VARCHAR(MAX),
Supplier VARCHAR(MAX),
Quantity DECIMAL(18, 2),
ItemUnit VARCHAR(MAX),
EntityUnit VARCHAR(MAX),
ItemSize DECIMAL(18, 2),
PackageSize DECIMAL(18, 2),
FamilyCode VARCHAR(4),
Family VARCHAR(MAX),
CategoryCode VARCHAR(MAX),
Category VARCHAR(MAX),
SubCategoryCode VARCHAR(MAX),
SubCategory VARCHAR(MAX),
ItemGroupCode VARCHAR(MAX),
ItemGroup VARCHAR(MAX),
PurchaseValue DECIMAL(18, 2),
UnitPurchaseValue DECIMAL(18, 2),
PackagePurchaseValue DECIMAL(18, 2),
FacilityCode VARCHAR(MAX),
CurrencyCode VARCHAR(MAX)
);
Insert stored procedure:
CREATE PROCEDURE dbo.ImportBatchRecords (
@BatchId INT,
@ImportTable dbo.RecordImportStructure READONLY
)
AS
SET NOCOUNT ON;
DECLARE @ErrorCode int
DECLARE @Step varchar(200)
--Clear old stuff?
--TRUNCATE TABLE dbo.BatchRecords;
INSERT INTO dbo.BatchRecords (
ItemName,
Supplier,
Quantity,
ItemUnit,
EntityUnit,
ItemSize,
PackageSize,
FamilyCode,
Family,
CategoryCode,
Category,
SubCategoryCode,
SubCategory,
ItemGroupCode,
ItemGroup,
PurchaseValue,
UnitPurchaseValue,
PackagePurchaseValue,
DataBatchId,
FacilityInstanceId,
CurrencyId
)
OUTPUT INSERTED.Id
SELECT
ItemName,
Supplier,
Quantity,
ItemUnit,
EntityUnit,
ItemSize,
PackageSize,
FamilyCode,
Family,
CategoryCode,
Category,
SubCategoryCode,
SubCategory,
ItemGroupCode,
ItemGroup,
PurchaseValue,
UnitPurchaseValue,
PackagePurchaseValue,
@BatchId,
--FacilityInstanceId,
--CurrencyId
(SELECT TOP 1 f.Id FROM dbo.FacilityInstances f WHERE f.FacilityCode = IT.FacilityCode),
(SELECT TOP 1 c.Id FROM dbo.Currencies c WHERE c.CurrencyCode = IT.CurrencyCode)
FROM @ImportTable IT;
And finally, my quick, test-only code to execute this on the .NET side:
public class BatchRecordDataHandler : IBulkDataHandler<BatchRecordImportItem>
{
public async Task<int> ImportAsync(SqlConnection conn, SqlTransaction transaction, IEnumerable<BatchRecordImportItem> src)
{
using (var cmd = new SqlCommand())
{
cmd.CommandText = "ImportBatchRecords";
cmd.Connection = conn;
cmd.Transaction = transaction;
cmd.CommandType = CommandType.StoredProcedure;
cmd.CommandTimeout = 600;
var batchIdParam = new SqlParameter
{
ParameterName = "@BatchId",
SqlDbType = SqlDbType.Int,
Value = 1
};
var tableParam = new SqlParameter
{
ParameterName = "@ImportTable",
TypeName = "dbo.RecordImportStructure",
SqlDbType = SqlDbType.Structured,
Value = DataToSqlRecords(src)
};
cmd.Parameters.Add(batchIdParam);
cmd.Parameters.Add(tableParam);
using (var res = await cmd.ExecuteReaderAsync())
{
var resultTable = new DataTable();
resultTable.Load(res);
var cnt = resultTable.AsEnumerable().Count();
return cnt;
}
}
}
private IEnumerable<SqlDataRecord> DataToSqlRecords(IEnumerable<BatchRecordImportItem> src)
{
var tvpSchema = new[] {
new SqlMetaData("ItemName", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("Supplier", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("Quantity", SqlDbType.Decimal, 18, 2), // match DECIMAL(18, 2); without precision/scale the scale defaults to 0
new SqlMetaData("ItemUnit", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("EntityUnit", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("ItemSize", SqlDbType.Decimal, 18, 2),
new SqlMetaData("PackageSize", SqlDbType.Decimal, 18, 2),
new SqlMetaData("FamilyCode", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("Family", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("CategoryCode", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("Category", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("SubCategoryCode", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("SubCategory", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("ItemGroupCode", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("ItemGroup", SqlDbType.VarChar, SqlMetaData.Max),
new SqlMetaData("PurchaseValue", SqlDbType.Decimal, 18, 2),
new SqlMetaData("UnitPurchaseValue", SqlDbType.Decimal, 18, 2),
new SqlMetaData("PackagePurchaseValue", SqlDbType.Decimal, 18, 2),
new SqlMetaData("FacilityCode", SqlDbType.VarChar, SqlMetaData.Max), // carries the code, matching the TVP column
new SqlMetaData("CurrencyCode", SqlDbType.VarChar, SqlMetaData.Max),
};
var dataRecord = new SqlDataRecord(tvpSchema);
foreach (var importItem in src)
{
dataRecord.SetValues(importItem.ItemName,
importItem.Supplier,
importItem.Quantity,
importItem.ItemUnit,
importItem.EntityUnit,
importItem.ItemSize,
importItem.PackageSize,
importItem.FamilyCode,
importItem.Family,
importItem.CategoryCode,
importItem.Category,
importItem.SubCategoryCode,
importItem.SubCategory,
importItem.ItemGroupCode,
importItem.ItemGroup,
importItem.PurchaseValue,
importItem.UnitPurchaseValue,
importItem.PackagePurchaseValue,
importItem.FacilityCode,
importItem.CurrencyCode);
yield return dataRecord;
}
}
}
Import entity structure:
public class BatchRecordImportItem
{
public string ItemName { get; set; }
public string Supplier { get; set; }
public decimal Quantity { get; set; }
public string ItemUnit { get; set; }
public string EntityUnit { get; set; }
public decimal ItemSize { get; set; }
public decimal PackageSize { get; set; }
public string FamilyCode { get; set; }
public string Family { get; set; }
public string CategoryCode { get; set; }
public string Category { get; set; }
public string SubCategoryCode { get; set; }
public string SubCategory { get; set; }
public string ItemGroupCode { get; set; }
public string ItemGroup { get; set; }
public decimal PurchaseValue { get; set; }
public decimal UnitPurchaseValue { get; set; }
public decimal PackagePurchaseValue { get; set; }
public int DataBatchId { get; set; }
public string FacilityCode { get; set; }
public string CurrencyCode { get; set; }
}
Please ignore the useless reader at the end; it doesn't really do much. Even without the reader, inserting 2.5 million rows took around 26 minutes, while SqlBulkCopy took roughly 6 minutes. Is there something I'm doing fundamentally wrong? I'm using IsolationLevel.Snapshot, if that matters. I'm on SQL Server 2014 and am free to change the DB structure and indexes.
UPD 1
I made a couple of the adjustments/improvement attempts suggested by @Xedni, specifically:
Limited all string fields that didn't have a max length to some fixed length
Changed all TVP members from VARCHAR(MAX) to VARCHAR(*SomeValue*)
Added a unique index to FacilityInstance->FacilityCode
Added a unique index to Currency->CurrencyCode
Tried adding WITH RECOMPILE to my SP
Tried using DataTable instead of IEnumerable<SqlDataRecord>
Tried batching data into smaller buckets, 50k and 100k rows per SP execution instead of 2.5 million
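The bucketing attempt can be expressed as a small helper, sketched here under the assumption that each bucket is passed to one ImportAsync call; InBatches is a hypothetical name. On .NET 6 or later, the built-in Enumerable.Chunk does the same job.

```csharp
using System;
using System.Collections.Generic;

public static class BatchingSketch
{
    // Streams the source once and yields lists of at most batchSize items.
    // Because this is an iterator, the argument check runs on first enumeration.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var bucket = new List<T>(batchSize);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == batchSize)
            {
                yield return bucket;               // hand a full bucket to the caller
                bucket = new List<T>(batchSize);   // start a fresh one; never reuse the yielded list
            }
        }
        if (bucket.Count > 0) yield return bucket; // trailing partial bucket
    }
}
```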
My structure is now like this:
CREATE TYPE dbo.RecordImportStructure
AS TABLE (
ItemName VARCHAR(4096),
Supplier VARCHAR(450),
Quantity DECIMAL(18, 2),
ItemUnit VARCHAR(2048),
EntityUnit VARCHAR(2048),
ItemSize DECIMAL(18, 2),
PackageSize DECIMAL(18, 2),
FamilyCode VARCHAR(16),
Family VARCHAR(512),
CategoryCode VARCHAR(16),
Category VARCHAR(512),
SubCategoryCode VARCHAR(16),
SubCategory VARCHAR(512),
ItemGroupCode VARCHAR(16),
ItemGroup VARCHAR(512),
PurchaseValue DECIMAL(18, 2),
UnitPurchaseValue DECIMAL(18, 2),
PackagePurchaseValue DECIMAL(18, 2),
FacilityCode VARCHAR(450),
CurrencyCode VARCHAR(4)
);
Unfortunately, no noticeable performance gains so far: 26-28 minutes, as before.
UPD 2
I checked the execution plan. Are the indexes my bane?
UPD 3
I added OPTION (RECOMPILE) at the end of my SP and gained a minor boost; it now takes ~25 minutes for 2.5 million rows.
You could set trace flag 2453:
FIX: Poor performance when you use table variables in SQL Server 2012 or SQL Server 2014
When you use a table variable in a batch or procedure, the query is compiled and optimized for the initial empty state of table variable. If this table variable is populated with many rows at runtime, the pre-compiled query plan may no longer be optimal. For example, the query may be joining a table variable with nested loop since it is usually more efficient for small number of rows. This query plan can be inefficient if the table variable has millions of rows. A hash join may be a better choice under such condition. To get a new query plan, it needs to be recompiled. Unlike other user or temporary tables, however, row count change in a table variable does not trigger a query recompile. Typically, you can work around this with OPTION (RECOMPILE), which has its own overhead cost.
The trace flag 2453 allows the benefit of query recompile without OPTION (RECOMPILE). This trace flag differs from OPTION (RECOMPILE) in two main aspects.
(1) It uses the same row count threshold as other tables. The query does not need to be compiled for every execution unlike OPTION (RECOMPILE). It would trigger recompile only when the row count change exceeds the predefined threshold.
(2) OPTION (RECOMPILE) forces the query to peek parameters and optimize the query for them. This trace flag does not force parameter peeking.
You can turn on trace flag 2453 to allow a table variable to trigger a recompile when a sufficient number of rows change. This may allow the query optimizer to choose a more efficient plan.
Try with the following stored procedure:
CREATE PROCEDURE dbo.ImportBatchRecords (
@BatchId INT,
@ImportTable dbo.RecordImportStructure READONLY
)
AS
SET NOCOUNT ON;
DECLARE @ErrorCode int
DECLARE @Step varchar(200)
CREATE TABLE #FacilityInstances
(
Id int NOT NULL,
FacilityCode varchar(512) NOT NULL UNIQUE WITH (IGNORE_DUP_KEY=ON)
);
CREATE TABLE #Currencies
(
Id int NOT NULL,
CurrencyCode varchar(512) NOT NULL UNIQUE WITH (IGNORE_DUP_KEY = ON)
)
INSERT INTO #FacilityInstances(Id, FacilityCode)
SELECT Id, FacilityCode FROM dbo.FacilityInstances
WHERE FacilityCode IS NOT NULL AND Id IS NOT NULL;
INSERT INTO #Currencies(Id, CurrencyCode)
SELECT Id, CurrencyCode FROM dbo.Currencies
WHERE CurrencyCode IS NOT NULL AND Id IS NOT NULL
INSERT INTO dbo.BatchRecords (
ItemName,
Supplier,
Quantity,
ItemUnit,
EntityUnit,
ItemSize,
PackageSize,
FamilyCode,
Family,
CategoryCode,
Category,
SubCategoryCode,
SubCategory,
ItemGroupCode,
ItemGroup,
PurchaseValue,
UnitPurchaseValue,
PackagePurchaseValue,
DataBatchId,
FacilityInstanceId,
CurrencyId
)
OUTPUT INSERTED.Id
SELECT
ItemName,
Supplier,
Quantity,
ItemUnit,
EntityUnit,
ItemSize,
PackageSize,
FamilyCode,
Family,
CategoryCode,
Category,
SubCategoryCode,
SubCategory,
ItemGroupCode,
ItemGroup,
PurchaseValue,
UnitPurchaseValue,
PackagePurchaseValue,
@BatchId,
F.Id,
C.Id
FROM
#FacilityInstances F RIGHT OUTER HASH JOIN
(
#Currencies C
RIGHT OUTER HASH JOIN @ImportTable IT
ON C.CurrencyCode = IT.CurrencyCode
)
ON F.FacilityCode = IT.FacilityCode
This forces the execution plan to use hash match joins instead of nested loops. I think the culprit of the bad performance is the first nested loop, which performs an index scan for each row in @ImportTable.
I don't know if CurrencyCode is unique in the Currencies table, so I create the temporary table #Currencies with unique currency codes.
I don't know if FacilityCode is unique in the FacilityInstances table, so I create the temporary table #FacilityInstances with unique facility codes.
If they are unique, you don't need the temporary tables; you can use the permanent tables directly.
Assuming CurrencyCode and FacilityCode are unique the following stored procedure would be better because it doesn't create unnecessary temporary tables:
CREATE PROCEDURE dbo.ImportBatchRecords (
@BatchId INT,
@ImportTable dbo.RecordImportStructure READONLY
)
AS
SET NOCOUNT ON;
DECLARE @ErrorCode int
DECLARE @Step varchar(200)
INSERT INTO dbo.BatchRecords (
ItemName,
Supplier,
Quantity,
ItemUnit,
EntityUnit,
ItemSize,
PackageSize,
FamilyCode,
Family,
CategoryCode,
Category,
SubCategoryCode,
SubCategory,
ItemGroupCode,
ItemGroup,
PurchaseValue,
UnitPurchaseValue,
PackagePurchaseValue,
DataBatchId,
FacilityInstanceId,
CurrencyId
)
OUTPUT INSERTED.Id
SELECT
ItemName,
Supplier,
Quantity,
ItemUnit,
EntityUnit,
ItemSize,
PackageSize,
FamilyCode,
Family,
CategoryCode,
Category,
SubCategoryCode,
SubCategory,
ItemGroupCode,
ItemGroup,
PurchaseValue,
UnitPurchaseValue,
PackagePurchaseValue,
@BatchId,
F.Id,
C.Id
FROM
dbo.FacilityInstances F RIGHT OUTER HASH JOIN
(
dbo.Currencies C
RIGHT OUTER HASH JOIN @ImportTable IT
ON C.CurrencyCode = IT.CurrencyCode
)
ON F.FacilityCode = IT.FacilityCode
I would guess your proc could use some love. Without seeing an execution plan it's hard to say for sure, but here are some thoughts.
A table variable (which is essentially what a table-valued parameter is) is always assumed by SQL Server to contain exactly 1 row (even if it doesn't). This is irrelevant in many cases, but you have two correlated subqueries in your insert list, which is where I'd focus my attention. That poor table variable is more than likely being hammered by a bunch of nested loop joins because of the cardinality estimate. I would consider putting the rows from your TVP into a temp table, updating the temp table with the IDs from FacilityInstances and Currencies, and then doing your final insert from that.
Well... why not just use SqlBulkCopy?
There are plenty of solutions out there that help you convert a collection of entities into an IDataReader object that can be handed directly to SqlBulkCopy.
This is a good start...
https://github.com/matthewschrager/Repository/blob/master/Repository.EntityFramework/EntityDataReader.cs
Then it becomes as simple as...
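using (var bulkCopy = new SqlBulkCopy(connection))
{
bulkCopy.DestinationTableName = "dbo.BatchRecords"; // required before WriteToServer; set to your target table
IDataReader dataReader = storeEntities.AsDataReader();
bulkCopy.WriteToServer(dataReader);
}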
I've used this code, the one caveat is that you need to be quite careful about the definition of your entity. The order of the properties in the entity determines the order of the columns exposed by the IDataReader and this needs to correlate with the order of the columns in the table that you are bulk copying to.
Alternatively, there is other code here:
https://www.codeproject.com/Tips/1114089/Entity-Framework-Performance-Tuning-Using-SqlBulkC
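One way to sidestep the column-order caveat is to build a DataTable whose column names match the property names, then map columns by name with bulkCopy.ColumnMappings.Add(name, name) before calling WriteToServer(DataTable). The helper below is a hypothetical sketch for flat DTOs such as BatchRecordImportItem; it is not the EntityDataReader code linked above, and it does not handle navigation properties.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Reflection;

public static class DataTableSketch
{
    // Copies the public readable properties of T into a DataTable with one column per property.
    // Type.GetProperties() does not guarantee declaration order, so pair this with
    // name-based SqlBulkCopy.ColumnMappings rather than relying on ordinal positions.
    public static DataTable ToDataTable<T>(IEnumerable<T> items)
    {
        PropertyInfo[] props = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .ToArray();

        var table = new DataTable(typeof(T).Name);
        foreach (var p in props)
        {
            // DataColumn cannot be typed as Nullable<T>: use the underlying type and store DBNull.
            var columnType = Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType;
            table.Columns.Add(p.Name, columnType);
        }

        foreach (var item in items)
        {
            var row = table.NewRow();
            foreach (var p in props)
                row[p.Name] = p.GetValue(item) ?? DBNull.Value;
            table.Rows.Add(row);
        }
        return table;
    }
}
```

A DataTable materializes every row in memory, so for millions of rows a streaming IDataReader (as in the linked code) remains the lighter option.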
I know there is an accepted answer, but I can't resist. I believe you can improve the performance 20-50% over the accepted answer.
The key is to SqlBulkCopy to the final table dbo.BatchRecords directly.
To make this happen, you need FacilityInstanceId and CurrencyId before the SqlBulkCopy. To get them, load SELECT Id, FacilityCode FROM FacilityInstances and SELECT Id, CurrencyCode FROM Currencies into collections, then build dictionaries:
var facilityIdByFacilityCode = facilitiesCollection.ToDictionary(x => x.FacilityCode, x => x.Id);
var currencyIdByCurrencyCode = currenciesCollection.ToDictionary(x => x.CurrencyCode, x => x.Id);
Once you have the dictionaries, looking up an Id from its code is a constant-time operation. This is very similar to a HASH MATCH join in SQL Server, but on the client side.
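A minimal sketch of the dictionary step, with two details that are easy to miss: ToDictionary throws ArgumentException on a duplicate key, so it relies on the codes being unique (for example via the unique indexes mentioned in the question), and a default Dictionary<string, int> compares strings case-sensitively, whereas SQL Server columns often use a case-insensitive collation. The OrdinalIgnoreCase comparer is an assumption about your collation, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CodeLookupSketch
{
    // Builds the code -> Id map. Throws ArgumentException if a code appears twice.
    // OrdinalIgnoreCase roughly mimics a case-insensitive collation (an assumption).
    public static Dictionary<string, int> BuildIdByCode(IEnumerable<(int Id, string Code)> rows)
        => rows.ToDictionary(r => r.Code, r => r.Id, StringComparer.OrdinalIgnoreCase);

    // Constant-time lookup; an unknown or null code yields null,
    // much like the NULL produced by the RIGHT OUTER JOIN version of the procedure.
    public static int? ResolveId(IReadOnlyDictionary<string, int> idByCode, string code)
        => code != null && idByCode.TryGetValue(code, out var id) ? id : (int?)null;
}
```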
The other barrier you need to tear down is getting the Id column of the newly inserted rows in the dbo.BatchRecords table. Actually, you can get the Ids before inserting the rows.
Make the Id column "sequence driven":
CREATE SEQUENCE BatchRecords_Id_Seq START WITH 1;
CREATE TABLE BatchRecords
(
Id int NOT NULL CONSTRAINT DF_BatchRecords_Id DEFAULT (NEXT VALUE FOR BatchRecords_Id_Seq),
.....
CONSTRAINT PK_BatchRecords PRIMARY KEY (Id)
)
Once you have the BatchRecords collection, you know how many records are in it. You can then reserve a contiguous range of sequence values. Execute the following T-SQL:
DECLARE @BatchCollectionCount int = 2500 -- Replace with the actual number of records
DECLARE @range_first_value sql_variant
DECLARE @range_last_value sql_variant
EXEC sp_sequence_get_range
@sequence_name = N'BatchRecords_Id_Seq',
@range_size = @BatchCollectionCount,
@range_first_value = @range_first_value OUTPUT,
@range_last_value = @range_last_value OUTPUT
SELECT
CAST(@range_first_value AS int) AS range_first_value,
CAST(@range_last_value AS int) AS range_last_value
This returns range_first_value and range_last_value. You can now assign BatchRecord.Id to each record:
int id = range_first_value;
foreach (var record in batchRecords)
{
record.Id = id++;
}
Next, you can SqlBulkCopy the batch record collection directly into the final table dbo.BatchRecords.
To get a DataReader from an IEnumerable<T> to feed SqlBulkCopy.WriteToServer you can use code like this which is part of EntityLite, a micro ORM I developed.
You can make it even faster if you cache facilityIdByFacilityCode and currencyIdByCurrencyCode. To make sure these dictionaries stay up to date, you can use SqlDependency or techniques like this one.
I have a Customer class with the following properties:
public int Id { get; set; }
public string Name { get; set; }
public int AddressId { get; set; }
public Address Address { get; set; }
My goal is to write a Dapper query that will use an Inner Join to populate the entire Address property within each Customer that is returned.
Here is what I have and it is working but I am wondering if this is the cleanest/simplest way to do it:
StringBuilder sql = new StringBuilder();
using (var conn = GetOpenConnection())
{
sql.AppendLine("SELECT c.Id, c.Name, c.AddressId, a.Address1, a.Address2, a.City, a.State, a.ZipCode ");
sql.AppendLine("FROM Customer c ");
sql.AppendLine("INNER JOIN Address a ON c.AddressId = a.Id ");
return conn.Query<Customer, Address, Customer>(
sql.ToString(),
(customer, address) => {
customer.Address= address;
return customer;
},
splitOn: "AddressId"
).ToList();
}
I have some concern about adding another property such as:
public Contact Contact { get; set; }
I am not sure how I would switch the syntax above to populate both Address and Contact.
I have written queries like the one below using Dapper version 1.40. I haven't had any issues populating more than one object, but I have run into a limit of 8 different classes that I can map in a single query.
public class Customer {
public int Id { get; set; }
public string Name { get; set; }
public int AddressId { get; set; }
public int ContactId { get; set; }
public Address Address { get; set; }
public Contact Contact { get; set; }
}
public class Address {
public int Id { get; set; }
public string Address1 {get;set;}
public string Address2 {get;set;}
public string City {get;set;}
public string State {get;set;}
public int ZipCode {get;set;}
public IEnumerable<Customer> Customer {get;set;}
}
public class Contact {
public int Id { get; set; }
public string Name { get; set; }
public IEnumerable<Customer> Customer {get;set;}
}
using (var conn = GetOpenConnection())
{
var query = conn
.Query<Customer, Address, Contact, Customer>(@"
SELECT c.Id, c.Name,
c.AddressId, a.Id, a.Address1, a.Address2, a.City, a.State, a.ZipCode,
c.ContactId, ct.Id, ct.Name
FROM Customer c
INNER JOIN Address a ON a.Id = c.AddressId
INNER JOIN Contact ct ON ct.Id = c.ContactId",
(c, a, ct) =>
{
c.Address = a;
c.Contact = ct;
return c;
}, splitOn: "AddressId, ContactId");
return query.ToList();
}
Take a look at my example with a bigger query; note that each line of the SELECT list maps to a different object.
public List<Appointment> GetList(int id)
{
List<Appointment> ret;
using (var db = new SqlConnection(connstring))
{
const string sql = @"SELECT AP.[Id], AP.Diagnostics, AP.Sintomns, AP.Prescription, AP.DoctorReport, AP.AddressId,
AD.Id, AD.Street, AD.City, AD.State, AD.Country, AD.ZIP, AD.Complement,
D.Id, D.Bio, D.CRMNumber, D.CRMState,
P.Id,
S.Id, S.Name,
MR.Id, MR.Alergies, MR.BloodType, MR.DtRegister, MR.HealthyProblems, MR.Height, MR.MedicalInsuranceNumber, MR.MedicalInsuranceUserName, MR.Medications, MR.Weight,
MI.Id, MI.Name
from Appointment AP
inner join [Address] AD on AD.Id = AP.AddressId
inner join Doctor D on D.Id = AP.DoctorId
inner join Patient P on P.Id = AP.PatientId
left join Speciality S on S.Id = D.IDEspeciality
left join MedicalRecord MR on MR.Id = P.MedicalRecordId
left join MedicalInsurance MI on MI.Id = MR.MedicalInsuranceId
where AP.Id = @Id
order by AP.Id desc";
ret = db.Query<Appointment, Address, Doctor, Patient, Speciality, MedicalRecord, MedicalInsurance, Appointment>(sql,
(appointment, address, doctor, patient, speciality, medicalrecord, medicalinsurance) =>
{
appointment.Address = address;
appointment.Doctor = doctor;
appointment.Patient = patient;
appointment.Doctor.Speciality = speciality;
// MedicalRecord comes from a LEFT JOIN, so it can be null; guard before dereferencing it
if (medicalrecord != null) medicalrecord.MedicalInsurance = medicalinsurance;
appointment.Patient.MedicalRecord = medicalrecord;
return appointment;
}, new { Id = id }, splitOn: "Id, Id, Id, Id, Id, Id").ToList();
}
return ret;
}
I've only just started looking at Dapper.NET and have been experimenting with some different queries, one of which is producing weird results that I wouldn't expect.
I have 2 tables, Photos and PhotoCategories, which are related on CategoryId.
Photos Table
PhotoId (PK - int)
CategoryId (FK - smallint)
UserId (int)
PhotoCategories Table
CategoryId (PK - smallint)
CategoryName (nvarchar(50))
My 2 classes:
public class Photo
{
public int PhotoId { get; set; }
public short CategoryId { get; set; }
public int UserId { get; set; }
public PhotoCategory PhotoCategory { get; set; }
}
public class PhotoCategory
{
public short CategoryId { get; set; }
public string CategoryName { get; set; }
}
I want to use multi-mapping to return an instance of Photo, with a populated instance of the related PhotoCategory.
var sql = @"select p.*, c.* from Photos p inner
join PhotoCategories c
on p.CategoryID = c.CategoryID where p.PhotoID = @pid";
cn.Open();
var myPhoto = cn.Query<Photo, PhotoCategory, Photo>(sql,
(photo, photoCategory) => { photo.PhotoCategory = photoCategory;
return photo; },
new { pid = photoID }, null, true, splitOn: "CategoryID").Single();
When this is executed, not all of the properties get populated (despite the names being the same in the DB table and in my objects).
I noticed that if, instead of 'select p.* etc.' in my SQL, I explicitly list the fields I want to return, EXCLUDING p.CategoryId, then everything gets populated (except, obviously, CategoryId on the Photo object, which I've excluded from the select statement).
But I would expect to be able to include that field in the query and have it, as well as all the other fields queried in the SQL, populated.
I could just exclude the CategoryId property from my Photo class and always use Photo.PhotoCategory.CategoryId when I need the ID.
But in some cases I might not want to populate the PhotoCategory object when I get an instance of the Photo object.
Does anyone know why the above behavior is happening? Is this normal for Dapper?
I just committed a fix for this:
class Foo1
{
public int Id;
public int BarId { get; set; }
}
class Bar1
{
public int BarId;
public string Name { get; set; }
}
public void TestMultiMapperIsNotConfusedWithUnorderedCols()
{
var result = connection.Query<Foo1,Bar1,
Tuple<Foo1,Bar1>>(
"select 1 as Id, 2 as BarId, 3 as BarId, 'a' as Name",
(f,b) => Tuple.Create(f,b), splitOn: "BarId")
.First();
result.Item1.Id.IsEqualTo(1);
result.Item1.BarId.IsEqualTo(2);
result.Item2.BarId.IsEqualTo(3);
result.Item2.Name.IsEqualTo("a");
}
The multi-mapper was getting confused if there was a field in the first type, that also happened to be in the second type ... AND ... was used as a split point.
To overcome this, Dapper now allows the split column to also show up anywhere in the first type. To illustrate:
Say we have:
classes: A{Id,FooId} B{FooId,Name}
splitOn: "FooId"
data: Id, FooId, FooId, Name
The old method of splitting was taking no account of the actual underlying type it was mapping. So ... it mapped Id => A and FooId, FooId, Name => B
The new method is aware of the props and fields in A. When it first encounters FooId in the stream it does not start a split, since it knows that A has a property called FooId which needs to be mapped, next time it sees FooId it will split, resulting in the expected results.
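The difference between the old and new splitting can be modelled in a few lines. This is a simplified illustration of the behaviour described in this answer, not Dapper's actual implementation; FindSplit and its parameters are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SplitOnSketch
{
    // Returns the index of the first column that belongs to the second type, or -1.
    // typeAware = false: split at the first occurrence of splitOn after column 0 (old behaviour).
    // typeAware = true: if the first type has a member named splitOn, its first occurrence is
    // kept for the first type and the split happens at the next occurrence (new behaviour).
    public static int FindSplit(IReadOnlyList<string> columns, string splitOn,
                                ISet<string> firstTypeMembers, bool typeAware)
    {
        bool skippedOwnMember = false;
        for (int i = 1; i < columns.Count; i++)
        {
            if (!string.Equals(columns[i], splitOn, StringComparison.OrdinalIgnoreCase))
                continue;
            if (typeAware && !skippedOwnMember && firstTypeMembers.Contains(columns[i]))
            {
                skippedOwnMember = true; // this occurrence maps to the first type
                continue;
            }
            return i;
        }
        return -1;
    }
}
```

For the columns Id, FooId, FooId, Name with A = {Id, FooId}, typeAware false splits at index 1 ({Id} | {FooId, FooId, Name}), while typeAware true splits at index 2 ({Id, FooId} | {FooId, Name}).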
I'm having a similar problem. It's to do with the fact that both the child and the parent have the same name for the field that is being split on. The following for example works:
class Program
{
static void Main(string[] args)
{
var createSql = @"
create table #Users (UserId int, Name varchar(20))
create table #Posts (Id int, OwnerId int, Content varchar(20))
insert #Users values(99, 'Sam')
insert #Users values(2, 'I am')
insert #Posts values(1, 99, 'Sams Post1')
insert #Posts values(2, 99, 'Sams Post2')
insert #Posts values(3, null, 'no ones post')
";
var sql =
@"select * from #Posts p
left join #Users u on u.UserId = p.OwnerId
Order by p.Id";
using (var connection = new SqlConnection(@"CONNECTION STRING HERE"))
{
connection.Open();
connection.Execute(createSql);
var data = connection.Query<Post, User, Post>(sql, (post, user) => { post.Owner = user; return post; }, splitOn: "UserId");
var apost = data.First();
apost.Content = apost.Content;
connection.Execute("drop table #Users drop table #Posts");
}
}
}
class User
{
public int UserId { get; set; }
public string Name { get; set; }
}
class Post
{
public int Id { get; set; }
public int OwnerId { get; set; }
public User Owner { get; set; }
public string Content { get; set; }
}
But the following does not because "UserId" is used in both tables and both objects.
class Program
{
static void Main(string[] args)
{
var createSql = @"
create table #Users (UserId int, Name varchar(20))
create table #Posts (Id int, UserId int, Content varchar(20))
insert #Users values(99, 'Sam')
insert #Users values(2, 'I am')
insert #Posts values(1, 99, 'Sams Post1')
insert #Posts values(2, 99, 'Sams Post2')
insert #Posts values(3, null, 'no ones post')
";
var sql =
@"select * from #Posts p
left join #Users u on u.UserId = p.UserId
Order by p.Id";
using (var connection = new SqlConnection(@"CONNECTION STRING HERE"))
{
connection.Open();
connection.Execute(createSql);
var data = connection.Query<Post, User, Post>(sql, (post, user) => { post.Owner = user; return post; }, splitOn: "UserId");
var apost = data.First();
apost.Content = apost.Content;
connection.Execute("drop table #Users drop table #Posts");
}
}
}
class User
{
public int UserId { get; set; }
public string Name { get; set; }
}
class Post
{
public int Id { get; set; }
public int UserId { get; set; }
public User Owner { get; set; }
public string Content { get; set; }
}
Dapper's mapping seems to get very confused in this scenario. I think this describes the issue, but is there a solution or workaround we can employ (OO design decisions aside)?
I know this question is old, but I thought I would save someone 2 minutes with the obvious answer: just alias the id column from one of the tables.
i.e.:
SELECT
user.Name, user.Email, user.AddressId As id, address.*
FROM
User user
Join Address address
ON user.AddressId = address.AddressId