C# Datatable String or binary data truncated [duplicate] - c#

I have C# code that runs a large batch of INSERT statements. While executing them I got a "String or binary data would be truncated" error and the transaction rolled back.
To find out which insert statement caused this, I would have to run them one by one against SQL Server until I hit the error.
Is there a cleverer way to find out which statement and which field caused this issue using exception handling (SqlException)?

In general, there isn't a way to determine which particular statement caused the error. If you're running several, you could watch profiler and look at the last completed statement and see what the statement after that might be, though I have no idea if that approach is feasible for you.
In any event, one of your parameter variables (and the data inside it) is too large for the field it's trying to store data in. Check your parameter sizes against column sizes and the field(s) in question should be evident pretty quickly.
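To make that check concrete, here is a minimal sketch (the table, column name and size are hypothetical) that sizes the parameter explicitly and verifies the value in C# before the batch runs:
using System;
using System.Data;
using System.Data.SqlClient;

static class SizedParameterInsert
{
    // Hypothetical column size taken from the target table definition (e.g. CustomerName varchar(50)).
    const int CustomerNameMax = 50;

    public static void Insert(string connectionString, string customerName)
    {
        // Fail fast in C# instead of waiting for SQL Server to raise the truncation error.
        if (customerName.Length > CustomerNameMax)
            throw new ArgumentException(
                $"CustomerName is {customerName.Length} chars; the column allows {CustomerNameMax}.");

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "INSERT INTO dbo.Customers (CustomerName) VALUES (@CustomerName)", conn))
        {
            // Declaring the parameter size explicitly keeps the column width visible in the code.
            cmd.Parameters.Add("@CustomerName", SqlDbType.VarChar, CustomerNameMax).Value = customerName;
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}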

This type of error occurs when the datatype of the SQL Server column has a length which is less than the length of the data entered into the entry form.

This type of error generally occurs when you try to store more characters than you have allowed for in the database table. In this case, the column is defined as
transaction_status varchar(10)
but you are actually trying to store
_transaction_status
which contains 19 characters. That is why you get this type of error.

Generally it means you are inserting a value that is longer than the maximum allowed. For example, if a column can only hold up to 200 characters and you insert a 201-character string.

BEGIN TRY
    INSERT INTO YourTable (col1, col2) VALUES (@val1, @val2)
END TRY
BEGIN CATCH
    --print or insert into error log or return param or etc...
    PRINT '@val1='+ISNULL(CONVERT(varchar,@val1),'')
    PRINT '@val2='+ISNULL(CONVERT(varchar,@val2),'')
END CATCH

For SQL 2016 SP2 or higher follow this link
For older versions of SQL do this:
Get the query that is causing the problems (you can also use SQL Profiler if you don't have the source)
Remove all WHERE clauses and other unimportant parts until you are basically just left with the SELECT and FROM parts
Add WHERE 0 = 1 (this will select only table structure)
Add INTO [MyTempTable] just before the FROM clause
You should end up with something like
SELECT
Col1, Col2, ..., [ColN]
INTO [MyTempTable]
FROM
[Tables etc.]
WHERE 0 = 1
This will create a table called MyTempTable in your DB that you can compare to your target table structure i.e. you can compare the columns on both tables to see where they differ. It is a bit of a workaround but it is the quickest method I have found.

It depends on how you are making the Insert Calls. All as one call, or as individual calls within a transaction? If individual calls, then yes (as you iterate through the calls, catch the one that fails). If one large call, then no. SQL is processing the whole statement, so it's out of the hands of the code.
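If the inserts are issued as individual commands, a minimal sketch (connection string, table and column names are placeholders) of catching the failing one inside the transaction:
using System;
using System.Data.SqlClient;

static class RowByRowInsert
{
    // Inserts each value individually so the failing row can be reported.
    public static void InsertRows(string connectionString, string[] values)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var tran = conn.BeginTransaction())
            {
                for (int i = 0; i < values.Length; i++)
                {
                    try
                    {
                        using (var cmd = new SqlCommand(
                            "INSERT INTO dbo.MyTable (Col1) VALUES (@Col1)", conn, tran))
                        {
                            cmd.Parameters.AddWithValue("@Col1", values[i]);
                            cmd.ExecuteNonQuery();
                        }
                    }
                    catch (SqlException ex) when (ex.Number == 8152 || ex.Number == 2628)
                    {
                        // 8152 = "String or binary data would be truncated";
                        // 2628 = the newer message that also names the column and value.
                        tran.Rollback();
                        throw new InvalidOperationException(
                            $"Row {i} failed with value '{values[i]}'", ex);
                    }
                }
                tran.Commit();
            }
        }
    }
}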

I have created a simple way of finding offending fields by:
Getting the column width of all the columns of a table where we're trying to make this insert/ update. (I'm getting this info directly from the database.)
Comparing the column widths to the width of the values we're trying to insert/ update.
Assumptions/ Limitations:
The column names of the table in the database match the C# entity properties. For example, if you have a column named SourceData in the database, your entity needs a property with the same name:
public class SomeTable
{
// Other fields
public string SourceData { get; set; }
}
You're inserting/ updating 1 entity at a time. It'll be clearer in the demo code below. (If you're doing bulk inserts/ updates, you might want to either modify it or use some other solution.)
Step 1:
Get the column width of all the columns directly from the database:
// For this, I took help from Microsoft docs website:
// https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlconnection.getschema?view=netframework-4.7.2#System_Data_SqlClient_SqlConnection_GetSchema_System_String_System_String___
private static Dictionary<string, int> GetColumnSizesOfTableFromDatabase(string tableName, string connectionString)
{
var columnSizes = new Dictionary<string, int>();
using (var connection = new SqlConnection(connectionString))
{
// Connect to the database then retrieve the schema information.
connection.Open();
// You can specify the Catalog, Schema, Table Name, Column Name to get the specified column(s).
// You can use four restrictions for Column, so you should create a 4 members array.
String[] columnRestrictions = new String[4];
// For the array, 0-member represents Catalog; 1-member represents Schema;
// 2-member represents Table Name; 3-member represents Column Name.
// Now we specify the Table_Name and Column_Name of the columns what we want to get schema information.
columnRestrictions[2] = tableName;
DataTable allColumnsSchemaTable = connection.GetSchema("Columns", columnRestrictions);
foreach (DataRow row in allColumnsSchemaTable.Rows)
{
var columnName = row.Field<string>("COLUMN_NAME");
//var dataType = row.Field<string>("DATA_TYPE");
var characterMaxLength = row.Field<int?>("CHARACTER_MAXIMUM_LENGTH");
// I'm only capturing columns whose Datatype is "varchar" or "char", i.e. their CHARACTER_MAXIMUM_LENGTH won't be null.
if(characterMaxLength != null)
{
columnSizes.Add(columnName, characterMaxLength.Value);
}
}
connection.Close();
}
return columnSizes;
}
Step 2:
Compare the column widths with the width of the values we're trying to insert/ update:
public static Dictionary<string, string> FindLongBinaryOrStringFields<T>(T entity, string connectionString)
{
var tableName = typeof(T).Name;
Dictionary<string, string> longFields = new Dictionary<string, string>();
var objectProperties = GetProperties(entity);
//var fieldNames = objectProperties.Select(p => p.Name).ToList();
var actualDatabaseColumnSizes = GetColumnSizesOfTableFromDatabase(tableName, connectionString);
foreach (var dbColumn in actualDatabaseColumnSizes)
{
var maxLengthOfThisColumn = dbColumn.Value;
var currentValueOfThisField = objectProperties.Where(f => f.Name == dbColumn.Key).FirstOrDefault()?.GetValue(entity, null)?.ToString();
if (!string.IsNullOrEmpty(currentValueOfThisField) && currentValueOfThisField.Length > maxLengthOfThisColumn)
{
longFields.Add(dbColumn.Key, $"'{dbColumn.Key}' column cannot take the value of '{currentValueOfThisField}' because the max length it can take is {maxLengthOfThisColumn}.");
}
}
return longFields;
}
public static List<PropertyInfo> GetProperties<T>(T entity)
{
//The DeclaredOnly flag makes sure you only get properties of the object, not from the classes it derives from.
var properties = entity.GetType()
.GetProperties(System.Reflection.BindingFlags.Public
| System.Reflection.BindingFlags.Instance
| System.Reflection.BindingFlags.DeclaredOnly)
.ToList();
return properties;
}
Demo:
Let's say we're trying to insert someTableEntity of SomeTable class that is modeled in our app like so:
public class SomeTable
{
[Key]
public long TicketID { get; set; }
public string SourceData { get; set; }
}
And it's inside our SomeDbContext like so:
public class SomeDbContext : DbContext
{
public DbSet<SomeTable> SomeTables { get; set; }
}
This table in the Db has the SourceData field defined as varchar(16).
Now we'll try to insert a value that is longer than 16 characters into this field and capture the information:
public void SaveSomeTableEntity()
{
var connectionString = "server=SERVER_NAME;database=DB_NAME;User ID=SOME_ID;Password=SOME_PASSWORD;Connection Timeout=200";
using (var context = new SomeDbContext(connectionString))
{
var someTableEntity = new SomeTable()
{
SourceData = "Blah-Blah-Blah-Blah-Blah-Blah"
};
context.SomeTables.Add(someTableEntity);
try
{
context.SaveChanges();
}
catch (Exception ex)
{
if (ex.GetBaseException().Message == "String or binary data would be truncated.\r\nThe statement has been terminated.")
{
var badFieldsReport = "";
List<string> badFields = new List<string>();
// YOU GOT YOUR FIELDS RIGHT HERE:
var longFields = FindLongBinaryOrStringFields(someTableEntity, connectionString);
foreach (var longField in longFields)
{
badFields.Add(longField.Key);
badFieldsReport += longField.Value + "\n";
}
}
else
throw;
}
}
}
The badFieldsReport will have this value:
'SourceData' column cannot take the value of
'Blah-Blah-Blah-Blah-Blah-Blah' because the max length it can take is
16.

It could also be because you're trying to put a null value back into the database, so one of your transactions could have nulls in it.

Most of the answers here are to do the obvious check, that the length of the column as defined in the database isn't smaller than the data you are trying to pass into it.
Several times I have been bitten by going to SQL Management Studio, doing a quick:
sp_help 'mytable'
and being confused for a few minutes until I realize the column in question is an nvarchar: the length reported by sp_help is in bytes, which is double the number of characters it can hold, because it's a double-byte (Unicode) datatype.
i.e. if sp_help reports nvarchar Length 40, you can store 20 characters max.
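If you'd rather see the length in characters directly, a small C# sketch (connection string and table name are whatever applies to you) that reads CHARACTER_MAXIMUM_LENGTH from INFORMATION_SCHEMA, which for nvarchar is already expressed in characters:
using System;
using System.Data.SqlClient;

static class ColumnLengths
{
    // Prints name, type and length (in characters) for every string column of the table.
    public static void Print(string connectionString, string tableName)
    {
        const string sql = @"
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @table AND CHARACTER_MAXIMUM_LENGTH IS NOT NULL";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@table", tableName);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // -1 means (max); for nvarchar this is a character count, not the byte count sp_help shows.
                    Console.WriteLine($"{reader.GetString(0)} {reader.GetString(1)}({reader.GetInt32(2)})");
                }
            }
        }
    }
}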

Check out this gist:
https://gist.github.com/mrameezraja/9f15ad624e2cba8ac24066cdf271453b
public Dictionary<string, string> GetEvilFields(string tableName, object instance)
{
Dictionary<string, string> result = new Dictionary<string, string>();
var tableType = this.Model.GetEntityTypes().FirstOrDefault(c => c.GetTableName().Contains(tableName));
if (tableType != null)
{
int i = 0;
foreach (var property in tableType.GetProperties())
{
var maxlength = property.GetMaxLength();
var prop = instance.GetType().GetProperties().FirstOrDefault(_ => _.Name == property.Name);
if (prop != null)
{
var length = prop.GetValue(instance)?.ToString()?.Length;
if (length > maxlength)
{
result.Add($"{i}.Evil.Property", prop.Name);
result.Add($"{i}.Evil.Value", prop.GetValue(instance)?.ToString());
result.Add($"{i}.Evil.Value.Length", length?.ToString());
result.Add($"{i}.Evil.Db.MaxLength", maxlength?.ToString());
i++;
}
}
}
}
return result;
}

With LINQ to SQL I debugged by logging the context, e.g. Context.Log = Console.Out
Then I scanned the SQL to check for any obvious errors; there were two:
-- @p46: Input Char (Size = -1; Prec = 0; Scale = 0) [some long text value1]
-- @p8: Input Char (Size = -1; Prec = 0; Scale = 0) [some long text value2]
The last one I found by scanning the table schema against the values; the field was nvarchar(20) but the value was 22 chars:
-- @p41: Input NVarChar (Size = 4000; Prec = 0; Scale = 0) [1234567890123456789012]

In our own case I increased the SQL table's allowable field size, which was less than the total number of characters posted from the front end. That resolved the issue.

Simply use this:
MessageBox.Show(cmd4.CommandText.ToString());
in C#.NET; it will show you the main query. Copy it and run it in the database.
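CommandText alone won't show parameter values, so a slightly extended sketch (the helper name is made up; cmd4 is the command from the snippet above) that also dumps the parameters can be more useful:
using System.Data.SqlClient;
using System.Text;

static class CommandDumper
{
    public static string Describe(SqlCommand cmd)
    {
        var sb = new StringBuilder(cmd.CommandText);
        foreach (SqlParameter p in cmd.Parameters)
        {
            // Size helps spot values longer than the declared parameter/column width.
            sb.AppendLine();
            sb.Append($"{p.ParameterName} (size {p.Size}) = '{p.Value}'");
        }
        return sb.ToString();
    }
}

// e.g. MessageBox.Show(CommandDumper.Describe(cmd4));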

Related

String or binary data would be truncated exception when inserting data [duplicate]

I am running data.bat file with the following lines:
Rem Tis batch file will populate tables
cd\program files\Microsoft SQL Server\MSSQL
osql -U sa -P Password -d MyBusiness -i c:\data.sql
The contents of the data.sql file is:
insert Customers
(CustomerID, CompanyName, Phone)
Values('101','Southwinds','19126602729')
There are 8 more similar lines for adding records.
When I run this with start > run > cmd > c:\data.bat, I get this error message:
1>2>3>4>5>....<1 row affected>
Msg 8152, Level 16, State 4, Server SP1001, Line 1
string or binary data would be truncated.
<1 row affected>
<1 row affected>
<1 row affected>
<1 row affected>
<1 row affected>
<1 row affected>
Also, I am a newbie obviously, but what do Level #, and state # mean, and how do I look up error messages such as the one above: 8152?
From @gmmastros's answer
Whenever you see the message....
string or binary data would be truncated
Think to yourself... The field is NOT big enough to hold my data.
Check the table structure for the customers table. I think you'll find that the length of one or more fields is NOT big enough to hold the data you are trying to insert. For example, if the Phone field is a varchar(8) field, and you try to put 11 characters in to it, you will get this error.
I had this issue although data length was shorter than the field length.
It turned out that the problem was having another log table (for audit trail), filled by a trigger on the main table, where the column size also had to be changed.
In one of the INSERT statements you are attempting to insert a too long string into a string (varchar or nvarchar) column.
If it's not obvious which INSERT is the offender by a mere look at the script, you could count the <1 row affected> lines that occur before the error message. The obtained number plus one gives you the statement number. In your case it seems to be the second INSERT that produces the error.
Just want to contribute some additional information: I had the same issue and it was because the field wasn't big enough for the incoming data, and this thread helped me solve it (the top answer clarifies it all).
BUT it is very important to know the possible reasons that may cause it.
In my case I was creating the table with a field like this:
Select '' as Period, * Into #NewTable From Transactions
Therefore the field "Period" had a length of zero and caused the Insert operations to fail. I changed it to "XXXXXX", which is the length of the incoming data, and it now works properly (because the field now has a length of 6).
I hope this helps anyone with the same issue :)
Some of your data cannot fit into your database column (it is too small). It is not easy to find what is wrong. If you use C# and LINQ to SQL, you can list the fields that would be truncated:
First create helper class:
public class SqlTruncationExceptionWithDetails : ArgumentOutOfRangeException
{
public SqlTruncationExceptionWithDetails(System.Data.SqlClient.SqlException inner, DataContext context)
: base(inner.Message + " " + GetSqlTruncationExceptionWithDetailsString(context))
{
}
/// <summary>
/// Part of code from the following link
/// http://stackoverflow.com/questions/3666954/string-or-binary-data-would-be-truncated-linq-exception-cant-find-which-fiel
/// </summary>
/// <param name="context"></param>
/// <returns></returns>
static string GetSqlTruncationExceptionWithDetailsString(DataContext context)
{
StringBuilder sb = new StringBuilder();
foreach (object update in context.GetChangeSet().Updates)
{
FindLongStrings(update, sb);
}
foreach (object insert in context.GetChangeSet().Inserts)
{
FindLongStrings(insert, sb);
}
return sb.ToString();
}
public static void FindLongStrings(object testObject, StringBuilder sb)
{
foreach (var propInfo in testObject.GetType().GetProperties())
{
foreach (System.Data.Linq.Mapping.ColumnAttribute attribute in propInfo.GetCustomAttributes(typeof(System.Data.Linq.Mapping.ColumnAttribute), true))
{
if (attribute.DbType.ToLower().Contains("varchar"))
{
string dbType = attribute.DbType.ToLower();
int numberStartIndex = dbType.IndexOf("varchar(") + 8;
int numberEndIndex = dbType.IndexOf(")", numberStartIndex);
string lengthString = dbType.Substring(numberStartIndex, (numberEndIndex - numberStartIndex));
int maxLength = 0;
int.TryParse(lengthString, out maxLength);
string currentValue = (string)propInfo.GetValue(testObject, null);
if (!string.IsNullOrEmpty(currentValue) && maxLength != 0 && currentValue.Length > maxLength)
{
//string is too long
sb.AppendLine(testObject.GetType().Name + "." + propInfo.Name + " " + currentValue + " Max: " + maxLength);
}
}
}
}
}
}
Then prepare the wrapper for SubmitChanges:
public static class DataContextExtensions
{
public static void SubmitChangesWithDetailException(this DataContext dataContext)
{
//http://stackoverflow.com/questions/3666954/string-or-binary-data-would-be-truncated-linq-exception-cant-find-which-fiel
try
{
//this can fail on data truncation
dataContext.SubmitChanges();
}
catch (SqlException sqlException) //when (sqlException.Message == "String or binary data would be truncated.")
{
if (sqlException.Message == "String or binary data would be truncated.") //only for EN windows - if you are running different window language, invoke the sqlException.getMessage on thread with EN culture
throw new SqlTruncationExceptionWithDetails(sqlException, dataContext);
else
throw;
}
}
}
Prepare global exception handler and log truncation details:
protected void Application_Error(object sender, EventArgs e)
{
Exception ex = Server.GetLastError();
string message = ex.Message;
//TODO - log to file
}
Finally use the code:
Datamodel.SubmitChangesWithDetailException();
Another situation in which you can get this error is the following:
I had the same error and the reason was that in an INSERT statement that received data from a UNION, the order of the columns was different from the original table. If you change the order in #table3 to a, b, c, you will fix the error.
select a, b, c into #table1
from #table0
insert into #table1
select a, b, c from #table2
union
select a, c, b from #table3
On SQL Server you can use SET ANSI_WARNINGS OFF like this (note that this suppresses the error by silently truncating the data):
using (SqlConnection conn = new SqlConnection("Data Source=XRAYGOAT\\SQLEXPRESS;Initial Catalog='Healthy Care';Integrated Security=True"))
{
conn.Open();
using (var trans = conn.BeginTransaction())
{
try
{
using (var cmd = new SqlCommand("", conn, trans))
{
cmd.CommandText = "SET ANSI_WARNINGS OFF";
cmd.ExecuteNonQuery();
cmd.CommandText = "YOUR INSERT HERE";
cmd.ExecuteNonQuery();
cmd.Parameters.Clear();
cmd.CommandText = "SET ANSI_WARNINGS ON";
cmd.ExecuteNonQuery();
trans.Commit();
}
}
catch (Exception)
{
trans.Rollback();
}
}
conn.Close();
}
I had the same issue. The length of my column was too short.
What you can do is either increase the length or shorten the text you want to put in the database.
I also had this problem occur at the web application level.
Eventually I found out that the same error message was coming from the SQL UPDATE statement on a specific table.
Finally I figured out that the column definitions in the related history table(s) did not match the original table's nvarchar column lengths in some specific cases.
I had the same problem, even after increasing the size of the problematic columns in the table.
tl;dr: The length of the matching columns in corresponding Table Types may also need to be increased.
In my case, the error was coming from the Data Export service in Microsoft Dynamics CRM, which allows CRM data to be synced to an SQL Server DB or Azure SQL DB.
After a lengthy investigation, I concluded that the Data Export service must be using Table-Valued Parameters:
You can use table-valued parameters to send multiple rows of data to a Transact-SQL statement or a routine, such as a stored procedure or function, without creating a temporary table or many parameters.
As you can see in the documentation above, Table Types are used to create the data ingestion procedure:
CREATE TYPE LocationTableType AS TABLE (...);
CREATE PROCEDURE dbo.usp_InsertProductionLocation
@TVP LocationTableType READONLY
Unfortunately, there is no way to alter a Table Type, so it has to be dropped & recreated entirely. Since my table has over 300 fields, I created a query to facilitate the creation of the corresponding Table Type based on the table's column definitions (just replace [table_name] with your table's name):
SELECT 'CREATE TYPE [table_name]Type AS TABLE (' + STRING_AGG(CAST(field AS VARCHAR(max)), ',' + CHAR(10)) + ');' AS create_type
FROM (
SELECT TOP 5000 COLUMN_NAME + ' ' + DATA_TYPE
+ IIF(CHARACTER_MAXIMUM_LENGTH IS NULL, '', CONCAT('(', IIF(CHARACTER_MAXIMUM_LENGTH = -1, 'max', CONCAT(CHARACTER_MAXIMUM_LENGTH,'')), ')'))
+ IIF(DATA_TYPE = 'decimal', CONCAT('(', NUMERIC_PRECISION, ',', NUMERIC_SCALE, ')'), '')
AS field
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = '[table_name]'
ORDER BY ORDINAL_POSITION) AS T;
After updating the Table Type, the Data Export service started functioning properly once again! :)
When I tried to execute my stored procedure I had the same problem, because the column I needed to add data to was shorter than the data I wanted to add.
You can increase the size of the column's data type or reduce the length of your data.
A 2016/2017 update will show you the bad value and column.
A new trace flag will swap the old error for a new 2628 error and will print out the column and offending value. Traceflag 460 is available in the latest cumulative update for 2016 and 2017:
https://support.microsoft.com/en-sg/help/4468101/optional-replacement-for-string-or-binary-data-would-be-truncated
Just make sure that after you've installed the CU you enable the trace flag, either globally/permanently on the server (for example as a -T460 startup parameter) or with DBCC TRACEON:
https://learn.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-traceon-trace-flags-transact-sql?view=sql-server-ver15
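From C#, a hedged sketch of what that looks like (the connection string and table are placeholders, and turning the flag on requires sysadmin rights): the exception then carries error number 2628 with the table, column and truncated value in the message.
using System;
using System.Data.SqlClient;

static class TraceFlag460Demo
{
    public static void Run(string connectionString) // connection string is a placeholder
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // Requires the CU mentioned above and sysadmin rights; -1 makes the flag global.
            using (var on = new SqlCommand("DBCC TRACEON (460, -1);", conn))
                on.ExecuteNonQuery();

            try
            {
                using (var insert = new SqlCommand(
                    "INSERT INTO dbo.MyTable (Col1) VALUES (@v)", conn)) // placeholder table
                {
                    insert.Parameters.AddWithValue("@v", new string('x', 500));
                    insert.ExecuteNonQuery();
                }
            }
            catch (SqlException ex) when (ex.Number == 2628)
            {
                // e.g. "String or binary data would be truncated in table '...', column '...'. Truncated value: '...'."
                Console.WriteLine(ex.Message);
            }
        }
    }
}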
Another situation in which this error may occur is in SQL Server Management Studio. If you have "text" or "ntext" fields in your table, it can happen no matter what kind of field you are updating (for example bit or integer).
It seems that the Studio does not load entire "ntext" fields and also updates ALL fields instead of just the modified one.
To solve the problem, exclude the "text" or "ntext" fields from the query in Management Studio.
This error comes only when one of your field values is longer than the field length specified in the SQL Server table structure.
To overcome this issue you can either reduce the length of the field value,
or increase the length of the database table field.
Kevin Pope's comment under the accepted answer was what I needed.
The problem, in my case, was that I had triggers defined on my table that would insert update/insert transactions into an audit table, but the audit table had a data type mismatch where a column with VARCHAR(MAX) in the original table was stored as VARCHAR(1) in the audit table, so my triggers were failing when I would insert anything greater than VARCHAR(1) in the original table column and I would get this error message.
I used a different tactic for fields that are allocated 8K in some places but where only about 50-100 characters are actually used.
declare @NVPN_list as table (
nvpn varchar(50)
,nvpn_revision varchar(5)
,nvpn_iteration INT
,mpn_lifecycle varchar(30)
,mfr varchar(100)
,mpn varchar(50)
,mpn_revision varchar(5)
,mpn_iteration INT
-- ...
) INSERT INTO @NVPN_LIST
SELECT left(nvpn ,50) as nvpn
,left(nvpn_revision ,5) as nvpn_revision
,nvpn_iteration
,left(mpn_lifecycle ,30)
,left(mfr ,100)
,left(mpn ,50)
,left(mpn_revision ,5)
,mpn_iteration
,left(mfr_order_num ,50)
FROM [DASHBOARD].[dbo].[mpnAttributes] (NOLOCK) mpna
I wanted speed, since I have 1M total records, and load 28K of them.
This error may occur when the field size is smaller than the data you entered.
For example, if the data type is nvarchar(7) and your value is 'aaaaddddf' (9 characters), the error shown is:
string or binary data would be truncated
You simply can't beat SQL Server on this.
You can insert into a new table like this:
select foo, bar
into tmp_new_table_to_dispose_later
from my_table
and compare the table definition with the real table you want to insert the data into.
Sometimes it's helpful, sometimes it's not.
If you try inserting into the final/real table from that temporary table it may just work (due to data conversion working differently than in SSMS, for example).
Another alternative is to insert the data in chunks: instead of inserting everything at once, insert with TOP 1000 and repeat the process until you find a chunk with an error. At least you then have better visibility into what's not fitting into the table.
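A rough C# sketch of that chunking idea (the insert delegate stands in for whatever batch insert you already use):
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

static class ChunkedInsert
{
    // insertChunk is whatever you already use to insert a batch (e.g. SqlBulkCopy or a multi-row INSERT).
    public static void FindBadChunk<T>(IList<T> rows, Action<IList<T>> insertChunk, int chunkSize = 1000)
    {
        for (int start = 0; start < rows.Count; start += chunkSize)
        {
            var chunk = rows.Skip(start).Take(chunkSize).ToList();
            try
            {
                insertChunk(chunk);
            }
            catch (SqlException)
            {
                Console.WriteLine($"Truncation happens somewhere in rows {start}..{start + chunk.Count - 1}");
                // Re-run this range with chunkSize = 1 to pin down the exact row.
                return;
            }
        }
    }
}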

Getting empty answer when I should get nothing

I'm working on a C# program that uses a SQL Server Compact database. I have a query where I want to select the highest number in a specific field that looks like this:
SELECT MAX(nr2) FROM TABLE WHERE nr1 = '10'
This works as intended when there is a row where nr1 is 10, but I would expect to get no answer when that row doesn't exist; instead I get an empty field. So in my C# code I have:
text = result[0].ToString();
When I get a value from my SQL query the string contains a number and when the specified row doesn't exist I get an empty string.
This isn't really a big problem but I would be able to do the following check:
if (result.Count > 0)
Instead of:
if (result[0].ToString() == "")
which I have to do at the moment since count is always larger than 0.
Talk about using a sledgehammer to crack a nut, but...
I don't test it with C# code, but in SQL Server Management Studio, if you run...
SELECT MAX(nr2) FROM TABLE WHERE nr1 = '10' HAVING MAX(nr2) IS NOT NULL
, the result is an empty collection, not a collection with one null (or empty) element.
NOTE: My answer is based on this SO Answer. It seems that the MAX and COUNT SQL functions always return a single-row collection.
That SQL statement will always return a result... if the base query returns no result then the value of MAX() is null!
If you are using ADO.NET, you could use ExecuteScalar; here is an example:
private int GetIDNum()
{
SqlConnection connection = new SqlConnection("connectionstring");
using(SqlCommand command = new SqlCommand("SELECT MAX(nr2) FROM TABLE WHERE nr1 = '10'", connection))
{
try
{
connection.Open();
object result = command.ExecuteScalar();
if( result != null && result != DBNull.Value )
{
return Convert.ToInt32( result );
}
else
{
return 0;
}
}
finally
{
connection.Close();
}
}
}

Insert Data into MySQL in multiple Tables in C# efficiently

I need to insert a huge CSV file into 2 tables with a 1:n relationship within a MySQL database.
The CSV file comes weekly and has about 1 GB, which needs to be appended to the existing data.
Each of the 2 tables has an auto-increment primary key.
I've tried:
Entity Framework (takes most time of all approaches)
Datasets (same)
Bulk Upload (doesn't support multiple tables)
MySqlCommand with Parameters (needs to be nested, my current approach)
MySqlCommand with StoredProcedure including a Transaction
Any further suggestions?
Let's say simplified this is my datastructure:
public class User
{
public string FirstName { get; set; }
public string LastName { get; set; }
public List<string> Codes { get; set; }
}
I need to insert from the csv into this database:
User (1-n) Code
+---+-----+-----+ +---+---+-----+
|PID|FName|LName| |CID|PID|Code |
+---+-----+-----+ +---+---+-----+
| 1 |Jon | Foo | | 1 | 1 | ed3 |
| 2 |Max | Foo | | 2 | 1 | wst |
| 3 |Paul | Foo | | 3 | 2 | xsd |
+---+-----+-----+ +---+---+-----+
Here a sample line of the CSV-file
Jon;Foo;ed3,wst
A bulk load like LOAD DATA LOCAL INFILE is not possible because I have restricted write permissions.
Referring to your answer, I would replace
using (MySqlCommand myCmdNested = new MySqlCommand(cCommand, mConnection))
{
foreach (string Code in item.Codes)
{
myCmdNested.Parameters.Add(new MySqlParameter("@UserID", UID));
myCmdNested.Parameters.Add(new MySqlParameter("@Code", Code));
myCmdNested.ExecuteNonQuery();
}
}
with
List<string> lCodes = new List<string>();
foreach (string code in item.Codes)
{
lCodes.Add(String.Format("('{0}','{1}')", UID, MySqlHelper.EscapeString(code)));
}
string cCommand = "INSERT INTO Code (UserID, Code) VALUES " + string.Join(",", lCodes);
using (MySqlCommand myCmdNested = new MySqlCommand(cCommand, mConnection))
{
myCmdNested.ExecuteNonQuery();
}
That generates one insert statement instead of item.Codes.Count separate ones.
Given the great size of data, the best approach (performance wise) is to leave as much data processing to the database and not the application.
Create a temporary table that the data from the .csv file will be temporarily saved.
CREATE TABLE `imported` (
`id` int(11) NOT NULL,
`firstname` varchar(45) DEFAULT NULL,
`lastname` varchar(45) DEFAULT NULL,
`codes` varchar(450) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Loading the data from the .csv to this table is pretty straightforward. I would suggest the use of MySqlCommand (which is also your current approach). Also, using the same MySqlConnection object for all INSERT statements will reduce the total execution time.
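A hedged sketch of that loading step in C# (the connection string and file path are placeholders, and it assumes the semicolon-separated layout from the question; the id is just a running counter, since the imported table's primary key is not auto-increment):
using System.IO;
using MySql.Data.MySqlClient;

class CsvToImportedTable
{
    static void LoadCsv(string connectionString, string csvPath)
    {
        using (var connection = new MySqlConnection(connectionString))
        {
            connection.Open();
            // One connection and one command, reused for every row.
            using (var cmd = new MySqlCommand(
                "INSERT INTO imported (id, firstname, lastname, codes) VALUES (@id, @fn, @ln, @codes)",
                connection))
            {
                cmd.Parameters.Add("@id", MySqlDbType.Int32);
                cmd.Parameters.Add("@fn", MySqlDbType.VarChar);
                cmd.Parameters.Add("@ln", MySqlDbType.VarChar);
                cmd.Parameters.Add("@codes", MySqlDbType.VarChar);

                int id = 0;
                foreach (var line in File.ReadLines(csvPath))
                {
                    // "Jon;Foo;ed3,wst" -> firstname;lastname;comma-separated codes
                    var parts = line.Split(';');
                    cmd.Parameters["@id"].Value = ++id;
                    cmd.Parameters["@fn"].Value = parts[0];
                    cmd.Parameters["@ln"].Value = parts[1];
                    cmd.Parameters["@codes"].Value = parts.Length > 2 ? parts[2] : "";
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}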
Then, to process the data further, you can create a stored procedure that will handle it.
Assuming these two tables (taken from your simplified example):
CREATE TABLE `users` (
`PID` int(11) NOT NULL AUTO_INCREMENT,
`FName` varchar(45) DEFAULT NULL,
`LName` varchar(45) DEFAULT NULL,
PRIMARY KEY (`PID`)
) ENGINE=InnoDB AUTO_INCREMENT=3737 DEFAULT CHARSET=utf8;
and
CREATE TABLE `codes` (
`CID` int(11) NOT NULL AUTO_INCREMENT,
`PID` int(11) DEFAULT NULL,
`code` varchar(45) DEFAULT NULL,
PRIMARY KEY (`CID`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8;
you can have the following stored procedure.
CREATE DEFINER=`root`@`localhost` PROCEDURE `import_data`()
BEGIN
DECLARE fname VARCHAR(255);
DECLARE lname VARCHAR(255);
DECLARE codesstr VARCHAR(255);
DECLARE splitted_value VARCHAR(255);
DECLARE done INT DEFAULT 0;
DECLARE newid INT DEFAULT 0;
DECLARE occurance INT DEFAULT 0;
DECLARE i INT DEFAULT 0;
DECLARE cur CURSOR FOR SELECT firstname,lastname,codes FROM imported;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN cur;
import_loop: LOOP
FETCH cur INTO fname, lname, codesstr;
IF done = 1 THEN
LEAVE import_loop;
END IF;
INSERT INTO users (FName,LName) VALUES (fname, lname);
SET newid = LAST_INSERT_ID();
SET i=1;
SET occurance = (SELECT LENGTH(codesstr) - LENGTH(REPLACE(codesstr, ',', '')) + 1);
WHILE i <= occurance DO
SET splitted_value =
(SELECT REPLACE(SUBSTRING(SUBSTRING_INDEX(codesstr, ',', i),
LENGTH(SUBSTRING_INDEX(codesstr, ',', i - 1)) + 1), ',', ''));
INSERT INTO codes (PID, code) VALUES (newid, splitted_value);
SET i = i + 1;
END WHILE;
END LOOP;
CLOSE cur;
END
For every row in the source data, it makes an INSERT statement for the user table. Then there is a WHILE loop to split the comma separated codes and make for each one an INSERT statement for the codes table.
Regarding the use of LAST_INSERT_ID(), it is reliable on a PER CONNECTION basis (see doc here). If the MySQL connection used to run this stored procedure is not used by other transactions, the use of LAST_INSERT_ID() is safe.
The ID that was generated is maintained in the server on a per-connection basis. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for most recent statement affecting an AUTO_INCREMENT column by that client. This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.
Edit: Here is the OP's variant that omits the temp-table imported. Instead of inserting the data from the .csv to the imported table, you call the SP to directly store them to your database.
CREATE DEFINER=`root`@`localhost` PROCEDURE `import_data`(IN fname VARCHAR(255), IN lname VARCHAR(255), IN codesstr VARCHAR(255))
BEGIN
DECLARE splitted_value VARCHAR(255);
DECLARE done INT DEFAULT 0;
DECLARE newid INT DEFAULT 0;
DECLARE occurance INT DEFAULT 0;
DECLARE i INT DEFAULT 0;
INSERT INTO users (FName,LName) VALUES (fname, lname);
SET newid = LAST_INSERT_ID();
SET i=1;
SET occurance = (SELECT LENGTH(codesstr) - LENGTH(REPLACE(codesstr, ',', '')) + 1);
WHILE i <= occurance DO
SET splitted_value =
(SELECT REPLACE(SUBSTRING(SUBSTRING_INDEX(codesstr, ',', i),
LENGTH(SUBSTRING_INDEX(codesstr, ',', i - 1)) + 1), ',', ''));
INSERT INTO codes (PID, code) VALUES (newid, splitted_value);
SET i = i + 1;
END WHILE;
END
Note: The code to split the codes is taken from here (MySQL does not provide a split function for strings).
I developed my WPF application using Entity Framework with a SQL Server database and needed to read data from an Excel file and insert that data into 2 related tables. For roughly 15000 rows in Excel it used to take around 4 hours. Then I used a block of 500 rows per insert, which sped up the insertion unbelievably; it now takes a mere 3-5 seconds to import the same data.
So I would suggest you add your rows to the context 100/200/500 at a time and then call SaveChanges (if you really want to use EF). There are other helpful tips to speed up EF performance as well; please read this for reference.
var totalRecords = TestPacksData.Rows.Count;
var totalPages = (totalRecords / ImportRecordsPerPage) + 1;
while (count <= totalPages)
{
var pageWiseRecords = TestPacksData.Rows.Cast<DataRow>().Skip(count * ImportRecordsPerPage).Take(ImportRecordsPerPage);
count++;
Project.CreateNewSheet(pageWiseRecords.ToList());
Project.CreateNewSpool(pageWiseRecords.ToList());
}
And here is the CreateNewSheet method
/// <summary>
/// Creates a new Sheet record in the database
/// </summary>
/// <param name="row">DataRow containing the Sheet record</param>
public void CreateNewSheet(List<DataRow> rows)
{
var tempSheetsList = new List<Sheet>();
foreach (var row in rows)
{
var sheetNo = row[SheetFields.Sheet_No.ToString()].ToString();
if (string.IsNullOrWhiteSpace(sheetNo))
continue;
var testPackNo = row[SheetFields.Test_Pack_No.ToString()].ToString();
TestPack testPack = null;
if (!string.IsNullOrWhiteSpace(testPackNo))
testPack = GetTestPackByTestPackNo(testPackNo);
var existingSheet = GetSheetBySheetNo(sheetNo);
if (existingSheet != null)
{
UpdateSheet(existingSheet, row);
continue;
}
var isometricNo = GetIsometricNoFromSheetNo(sheetNo);
var newSheet = new Sheet
{
sheet_no = sheetNo,
isometric_no = isometricNo,
ped_rev = row[SheetFields.PED_Rev.ToString()].ToString(),
gpc_rev = row[SheetFields.GPC_Rev.ToString()].ToString()
};
if (testPack != null)
{
newSheet.test_pack_id = testPack.id;
newSheet.test_pack_no = testPack.test_pack_no;
}
if (!tempSheetsList.Any(l => l.sheet_no == newSheet.sheet_no))
{
DataStore.Context.Sheets.Add(newSheet);
tempSheetsList.Add(newSheet);
}
}
try
{
DataStore.Context.SaveChanges();
DataStore.Dispose(); // This is very important - dispose the context
}
catch (DbEntityValidationException ex)
{
// Create log for the exception here
}
}
CreateNewSpool is the same method, except for the field names and table name, because it updates a child table. But the idea is the same.
1 - Add a column VirtualId to User table & class.
EDITED
2 - Assign numbers in a loop to the VirtualId field of each User object (use negative numbers starting at -1 to avoid collisions in the last step). For each Code object c belonging to a User object u, set c.UserId = u.VirtualId.
3 - Bulk load Users into User table, Bulk load Codes into Code table.
4- UPDATE CODE C,USER U SET C.UserId = U.Id WHERE C.UserId = U.VirtualId.
NOTE : If you have a FK Constraint on Code.UserId you can drop it and re-add it after the Insert.
public class User
{
public int Id { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public int VirtualId { get; set; }
}
public class Code
{
public int Id { get; set; }
public string Code { get; set; }
public int UserId { get; set; }
}
Can you break the CSV into two files?
E.g. Suppose your file has the following columns:
... A ... | ... B ...
a0 | b0
a0 | b1
a0 | b2 <-- data
a1 | b3
a1 | b4
So one set of A might have multiple B entries. After you break it apart, you get:
... A ...
a0
a1
... B ...
b0
b1
b2
b3
b4
Then you bulk insert them separately.
Edit: Pseudo code
Based on the conversation, something like:
DataTable tableA = ...; // query schema for TableA
DataTable tableB = ...; // query schmea for TableB
List<String> usernames = select distinct username from TableA;
Hashtable htUsername = new Hashtable(StringComparer.InvariantCultureIgnoreCase);
foreach (String username in usernames)
htUsername[username] = "";
int colUsername = ...;
foreach (String[] row in CSVFile) {
String un = row[colUsername] as String;
if (htUsername[un] == null) {
// add new row to tableA
DataRow newRow = tableA.NewRow();
newRow["Username"] = un;
// etc.
tableA.Rows.Add(newRow);
htUsername[un] = "";
}
}
// bulk insert TableA
select userid, username from TableA
Hashtable htUserId = new Hashtable(StringComparer.InvariantCultureIgnoreCase);
// htUserId[username] = userid;
int colUserId = ...;
foreach (String[] row in CSVFile) {
String un = row[colUsername] as String;
int userid = (int) htUserId[un];
DataRow newRow = tableB.NewRow();
newRow[colUserId] = userid;
// fill in other values
tableB.Rows.Add(newRow);
if (tableB.Rows.Count == 65000) {
// bulk insert TableB
var t = tableB.Clone();
tableB.Dispose();
tableB = t;
}
}
if (tableB.Rows.Count > 0)
// bulk insert TableB
AFAIK the insertions into a single table are sequential, while insertions into different tables can be done in parallel. Open two separate new connections to the same database and then insert in parallel, maybe by using the Task Parallel Library.
However, if there are integrity constraints about 1:n relationship between the tables, then:
Insertions might fail and thus any parallel insert approach would be wrong. Clearly then your best bet would be to do sequential inserts only, one table after the other.
You can try to sort the data of both tables and write the InsertInto method below such that the insert into the second table happens only after you are done inserting the data into the first one.
Edit: Since you have requested, if there is a possibility for you to perform the inserts in parallel, following is the code template you can use.
private void ParallelInserts()
{
..
//Other code in the method
..
//Read first csv into memory. It's just a GB so should be fine
ReadFirstCSV();
//Read second csv into memory...
ReadSecondCSV();
//Because the inserts will last more than a few CPU cycles...
var taskFactory = new TaskFactory(TaskCreationOptions.LongRunning, TaskContinuationOptions.None);
//An array to hold the two parallel inserts
var insertTasks = new Task[2];
//Begin insert into first table...
insertTasks[0] = taskFactory.StartNew(() => InsertInto(commandStringFirst, connectionStringFirst));
//Begin insert into second table...
insertTasks[1] = taskFactory.StartNew(() => InsertInto(commandStringSecond, connectionStringSecond));
//Let them be done...
Task.WaitAll(insertTasks);
Console.WriteLine("Parallel insert finished.");
}
//Defining the InsertInto method which we are passing to the tasks in the method above
private static void InsertInto(string commandString, string connectionString)
{
using (/*open a new connection using the connectionString passed*/)
{
//In a while loop, iterate until you have 100/200/500 rows
while (fileIsNotExhausted)
{
using (/*commandString*/)
{
//Execute command to insert in bulk
}
}
}
}
When you say "efficiently" are you talking memory, or time?
In terms of improving the speed of the inserts, if you can do multiple value blocks per insert statement, you can get 500% improvement in speed. I did some benchmarks on this over in this question: Which is faster: multiple single INSERTs or one multiple-row INSERT?
My approach is described in the answer, but simply put, reading in up to say 50 "rows" (to be inserted) at once and bundling them into a single INSERT INTO(...), VALUES(...),(...),(...)...(...),(...) type statement seems to really speed things up. At least, if you're restricted from not being able to bulk load.
Another approach, by the way, if you have live data you can't drop indexes on during the upload, is to create a memory table on the MySQL server without indexes, dump the data there, and then do an INSERT INTO live SELECT * FROM mem. Though that uses more memory on the server, hence the question at the start of this answer about "what do you mean by 'efficiently'?" :)
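A rough C# sketch of that staging idea (table names are placeholders; note that MEMORY tables don't support TEXT/BLOB columns, and CREATE TABLE ... LIKE copies indexes, so you may prefer an explicit index-free CREATE TABLE):
using MySql.Data.MySqlClient;

static class MemoryTableStaging
{
    static void Exec(MySqlConnection conn, string sql)
    {
        using (var cmd = new MySqlCommand(sql, conn))
            cmd.ExecuteNonQuery();
    }

    public static void LoadViaMemoryTable(MySqlConnection conn)
    {
        Exec(conn, "CREATE TABLE mem_users LIKE users");
        Exec(conn, "ALTER TABLE mem_users ENGINE=MEMORY");

        // ... run the multi-value INSERTs against mem_users here ...

        // One pass into the live table, then drop the staging copy.
        Exec(conn, "INSERT INTO users SELECT * FROM mem_users");
        Exec(conn, "DROP TABLE mem_users");
    }
}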
Oh, and there's probably nothing wrong with iterating through the file and doing all the first table inserts first, and then doing the second table ones. Unless the data is being used live, I guess. In that case you could definitely still use the bundled approach, but the application logic to do that is a lot more complex.
UPDATE: OP requested example C# code for multivalue insert blocks.
Note: this code assumes you have a number of structures already configured:
tables List<string> - table names to insert into
fieldslist Dictionary<string, List<String>> - list of field names for each table
typeslist Dictionary<string, List<MySqlDbType>> - list of MySqlDbTypes for each table, same order as the field names.
nullslist Dictionary<string, List<Boolean>> - list of flags to tell if a field is nullable or not, for each table (same order as field names).
prikey Dictionary<string, string> - list of primary key field name, per table (note: this doesn't support multiple field primary keys, though if you needed it you could probably hack it in - I think somewhere I have a version that does support this, but... meh).
theData Dictionary<string, List<Dictionary<int, object>>> - the actual data, as a list of fieldnum-value dictionaries, per table.
Oh yeah, and localcmd is a MySqlCommand created by calling CreateCommand() on the local MySqlConnection object.
Further note: I wrote this quite a while back when I was kind of starting. If this causes your eyes or brain to bleed, I apologise in advance :)
const int perinsert = 50;
foreach (string table in tables)
{
string[] fields = fieldslist[table].ToArray();
MySqlDbType[] types = typeslist[table].ToArray();
bool[] nulls = nullslist[table].ToArray();
int thisblock = perinsert;
int rowstotal = theData[table].Count;
int rowsremainder = rowstotal % perinsert;
int rowscopied = 0;
// Do the bulk (multi-VALUES block) INSERTs, but only if we have more rows than there are in a single bulk insert to perform:
while (rowscopied < rowstotal)
{
if (rowstotal - rowscopied < perinsert)
thisblock = rowstotal - rowscopied;
// Generate a 'perquery' multi-VALUES prepared INSERT statement:
List<string> extravals = new List<string>();
for (int j = 0; j < thisblock; j++)
extravals.Add(String.Format("(@{0}_{1})", j, String.Join(String.Format(", @{0}_", j), fields)));
localcmd.CommandText = String.Format("INSERT INTO {0} VALUES{1}", table, String.Join(",", extravals.ToArray()));
// Now create the parameters to match these:
for (int j = 0; j < thisblock; j++)
for (int i = 0; i < fields.Length; i++)
localcmd.Parameters.Add(String.Format("{0}_{1}", j, fields[i]), types[i]).IsNullable = nulls[i];
// Keep doing bulk INSERTs until there's less rows left than we need for another one:
while (rowstotal - rowscopied >= thisblock)
{
// Queue up all the VALUES for this block INSERT:
for (int j = 0; j < thisblock; j++)
{
Dictionary<int, object> row = theData[table][rowscopied++];
for (int i = 0; i < fields.Length; i++)
localcmd.Parameters[String.Format("{0}_{1}", j, fields[i])].Value = row[i];
}
// Run the query:
localcmd.ExecuteNonQuery();
}
// Clear all the paramters - we're done here:
localcmd.Parameters.Clear();
}
}

Inserting to SQL performance issues

I wrote a program some time ago that delimits and reads in pretty big text files. The program works, but the problem is it basically freezes the computer and takes a long time to finish. On average each text file has around 10K to 15K lines, and each line represents a new row in a SQL table.
The way my program works is that I first read all of the lines (this is where the delimiting happens) and store them in an array; after that I go through each array element and insert it into the SQL table. This is all done at once, and I suspect it is eating up too much memory, which is causing the program to freeze the computer.
Here is my code for reading file:
private void readFile()
{
//String that will hold each line read from the file
String line;
//Instantiate new stream reader
System.IO.StreamReader file = new System.IO.StreamReader(txtFilePath.Text);
try
{
while (!file.EndOfStream)
{
line = file.ReadLine();
if (!string.IsNullOrWhiteSpace(line))
{
if (this.meetsCondition(line))
{
badLines++;
continue;
} // end if
else
{
collection.readIn(line);
counter++;
} // end else
} // end if
} // end while
file.Close();
} // end try
catch (Exception exceptionError)
{
//Placeholder
}
} // end readFile method
Code for inserting:
for (int i = 0; i < counter; i++)
{
//Iterates through the collection array starting at first index and going through until the end
//and inserting each element into our SQL Table
//if (!idS.Contains(collection.getIdItems(i)))
//{
da.InsertCommand.Parameters["#Id"].Value = collection.getIdItems(i);
da.InsertCommand.Parameters["#Date"].Value = collection.getDateItems(i);
da.InsertCommand.Parameters["#Time"].Value = collection.getTimeItems(i);
da.InsertCommand.Parameters["#Question"].Value = collection.getQuestionItems(i);
da.InsertCommand.Parameters["#Details"].Value = collection.getDetailsItems(i);
da.InsertCommand.Parameters["#Answer"].Value = collection.getAnswerItems(i);
da.InsertCommand.Parameters["#Notes"].Value = collection.getNotesItems(i);
da.InsertCommand.Parameters["#EnteredBy"].Value = collection.getEnteredByItems(i);
da.InsertCommand.Parameters["#WhereReceived"].Value = collection.getWhereItems(i);
da.InsertCommand.Parameters["#QuestionType"].Value = collection.getQuestionTypeItems(i);
da.InsertCommand.Parameters["#AnswerMethod"].Value = collection.getAnswerMethodItems(i);
da.InsertCommand.Parameters["#TransactionDuration"].Value = collection.getTransactionItems(i);
da.InsertCommand.ExecuteNonQuery();
//}
//Updates the progress bar using the i in addition to 1
_worker.ReportProgress(i + 1);
} // end for
If you can map your collection to a DataTable then you could use an SqlBulkCopy to import your data. SqlBulkCopy is the fastest way to import data from .Net into SqlServer.
Use SqlBulkCopy class for bulk inserts.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
You will cut down the time to mere seconds.
+1 for SqlBulkCopy as others have stated, but be aware that it requires INSERT permission. If you work in a strictly controlled environment, as I do, where you aren't allowed to use dynamic SQL an alternative approach is to have your stored proc use Table-Valued parameters. That way you can still pass in chunks of records and have the proc do the actual inserting.
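A minimal sketch of that TVP route (the table type, procedure and column names are hypothetical and must already exist in the database):
using System.Data;
using System.Data.SqlClient;

static class TvpInsertExample
{
    // Assumes, for example:
    //   CREATE TYPE dbo.QuestionRowType AS TABLE (Id INT, Question NVARCHAR(500));
    //   CREATE PROCEDURE dbo.usp_InsertQuestions @Rows dbo.QuestionRowType READONLY AS
    //     INSERT INTO dbo.Questions (Id, Question) SELECT Id, Question FROM @Rows;
    public static void InsertViaTvp(string connectionString, DataTable rows)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.usp_InsertQuestions", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            var p = cmd.Parameters.AddWithValue("@Rows", rows);
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.QuestionRowType";   // must match the table type name

            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}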
As an example of how to use the functionality of the SqlBulkCopy class (it is just pseudocode to render the idea), first change your collection class to host an internal DataTable, and in the constructor define the schema used by your readIn method.
public class MyCollection
{
private DataTable loadedData = null;
public MyCollection()
{
loadedData = new DataTable();
loadedData.Columns.Add("Column1", typeof(string));
.... and so on for every field expected
}
// A property to return the collected data
public DataTable GetData
{
get{return loadedData;}
}
public void readIn(string line)
{
// split the line into fields (the delimiter is an assumption - adjust to your file format)
var splittedLine = line.Split(';');
DataRow r = loadedData.NewRow();
r["Column1"] = splittedLine[0];
.... and so on
loadedData.Rows.Add(r);
}
}
Finally the code that upload the data to your server
using (SqlConnection connection = new SqlConnection(connectionString))
{
connection.Open();
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
{
bulkCopy.DestinationTableName = "destinationTable";
try
{
bulkCopy.WriteToServer(collection.GetData());
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
}
As mentioned, using SqlBulkCopy will be faster than inserting one-by-one, but there are other things that you could look at:
Is there a clustered index on the table? If so, will you be inserting rows with values in the middle of that index? It's much more efficient to add values at the end of a clustered index, since otherwise it has to rearrange data to insert in the middle (this is only for CLUSTERED indexes). One example I've seen is using SSN as a clustered primary key. Since SSNs are distributed randomly, you are rearranging the physical structure on virtually every insert. Having a date as part of the clustered key may be OK if you are MOSTLY inserting data at the end (e.g. adding daily records).
Are there a lot of indexes on that table? it may be more efficient to drop the indexes, add the data, and re-add the indexes after the inserts. (or just drop indexes you don't need)
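A hedged sketch of that disable/rebuild pattern (index and table names are placeholders; don't disable the clustered index, as that makes the table inaccessible):
using System.Data.SqlClient;

static class IndexToggle
{
    static void Exec(SqlConnection conn, string sql)
    {
        using (var cmd = new SqlCommand(sql, conn)) cmd.ExecuteNonQuery();
    }

    public static void BulkLoadWithIndexesDisabled(SqlConnection conn)
    {
        // Disable nonclustered indexes only; disabling the clustered index makes the table unreadable.
        Exec(conn, "ALTER INDEX IX_MyTable_SomeColumn ON dbo.MyTable DISABLE");

        // ... do the inserts / SqlBulkCopy here ...

        // REBUILD re-enables the index and recreates it in one pass.
        Exec(conn, "ALTER INDEX IX_MyTable_SomeColumn ON dbo.MyTable REBUILD");
    }
}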

Avoiding the 2100 parameter limit in LINQ to SQL

In a project I am currently working on, I need to access 2 databases in LINQ in the following manner:
I get a list of all trip numbers between a specified date range from DB1, and store this as a list of 'long' values
I perform an extensive query with a lot of joins on DB2, but only looking at trips that have their trip number included in the above list.
Problem is, the trip list from DB1 often returns over 2100 items - and I of course hit the 2100 parameter limit in SQL, which causes my second query to fail. I've been looking at ways around this, such as described here, but this has the effect of essentially changing my query to LINQ-to-Objects, which causes a lot of issues with my joins
Are there any other workarounds I can do?
As LINQ to SQL can call stored procs, you could:
have a stored proc that takes an array as an input and then puts the values in a temp table to join on
likewise, take a string that the stored proc splits
Or upload all the values to a temp table yourself and join on that table.
However maybe you should rethink the problem:
SQL Server can be configured to allow queries against tables in other databases (including Oracle); if you are allowed to do this, it may be an option for you.
Could you use some replication system to keep a table of trip numbers updated in DB2?
Not sure whether this will help, but I had a similar issue for a one-off query I was writing in LinqPad and ended up defining and using a temporary table like this.
[Table(Name="#TmpTable1")]
public class TmpRecord
{
[Column(DbType="Int", IsPrimaryKey=true, UpdateCheck=UpdateCheck.Never)]
public int? Value { get; set; }
}
public Table<TmpRecord> TmpRecords
{
get { return base.GetTable<TmpRecord>(); }
}
public void DropTable<T>()
{
ExecuteCommand( "DROP TABLE " + Mapping.GetTable(typeof(T)).TableName );
}
public void CreateTable<T>()
{
ExecuteCommand(
typeof(DataContext)
.Assembly
.GetType("System.Data.Linq.SqlClient.SqlBuilder")
.InvokeMember("GetCreateTableCommand",
BindingFlags.Static | BindingFlags.NonPublic | BindingFlags.InvokeMethod
, null, null, new[] { Mapping.GetTable(typeof(T)) } ) as string
);
}
Usage is something like
void Main()
{
List<int> ids = ....
this.Connection.Open();
// Note, if the connection is not opened here, the temporary table
// will be created but then dropped immediately.
CreateTable<TmpRecord>();
foreach(var id in ids)
TmpRecords.InsertOnSubmit( new TmpRecord() { Value = id}) ;
SubmitChanges();
var list1 = (from r in CustomerTransaction
join tt in TmpRecords on r.CustomerID equals tt.Value
where ....
select r).ToList();
DropTable<TmpRecord>();
this.Connection.Close();
}
In my case the temporary table only had one int column, but you should be able to define whatever column(s) type you want, (as long as you have a primary key).
You may split your query or use a temporary table in database2 filled with results from database1.
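For the splitting option, a rough sketch (the db.Trips/TripNumber names in the usage comment are hypothetical): run the DB2 query once per batch of at most ~2000 ids and concatenate the results. The obvious trade-off is that any cross-batch ordering or grouping has to be done in memory afterwards.
using System;
using System.Collections.Generic;
using System.Linq;

static class ChunkedQuery
{
    // Runs queryBatch once per group of ids, keeping each call under the 2100-parameter limit.
    public static List<TResult> InChunks<TResult>(
        IEnumerable<long> ids,
        Func<List<long>, IEnumerable<TResult>> queryBatch,
        int batchSize = 2000)
    {
        var idList = ids.ToList();
        var results = new List<TResult>();
        for (int i = 0; i < idList.Count; i += batchSize)
        {
            var batch = idList.Skip(i).Take(batchSize).ToList();
            results.AddRange(queryBatch(batch));
        }
        return results;
    }
}

// Usage (hypothetical context/table names):
// var trips = ChunkedQuery.InChunks(tripNumbers,
//     batch => db.Trips.Where(t => batch.Contains(t.TripNumber)).ToList());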
