I need to insert a huge CSV file into two tables with a 1:n relationship in a MySQL database.
The CSV file arrives weekly, is about 1 GB, and needs to be appended to the existing data.
Each of the two tables has an auto-increment primary key.
I've tried:
Entity Framework (takes the most time of all approaches)
Datasets (same)
Bulk Upload (doesn't support multiple tables)
MySqlCommand with Parameters (needs to be nested, my current approach)
MySqlCommand with StoredProcedure including a Transaction
Any further suggestions?
Simplified, this is my data structure:
public class User
{
public string FirstName { get; set; }
public string LastName { get; set; }
public List<string> Codes { get; set; }
}
I need to insert the CSV data into this database structure:
User (1-n) Code
+---+-----+-----+ +---+---+-----+
|PID|FName|LName| |CID|PID|Code |
+---+-----+-----+ +---+---+-----+
| 1 |Jon | Foo | | 1 | 1 | ed3 |
| 2 |Max | Foo | | 2 | 1 | wst |
| 3 |Paul | Foo | | 3 | 2 | xsd |
+---+-----+-----+ +---+---+-----+
Here is a sample line of the CSV file:
Jon;Foo;ed3,wst
A bulk load like LOAD DATA LOCAL INFILE is not possible because I have restricted write permissions.
Referring to your answer, I would replace
using (MySqlCommand myCmdNested = new MySqlCommand(cCommand, mConnection))
{
foreach (string Code in item.Codes)
{
myCmdNested.Parameters.Add(new MySqlParameter("#UserID", UID));
myCmdNested.Parameters.Add(new MySqlParameter("#Code", Code));
myCmdNested.ExecuteNonQuery();
}
}
with
List<string> lCodes = new List<string>();
foreach (string code in item.Codes)
{
lCodes.Add(String.Format("('{0}','{1}')", UID, MySqlHelper.EscapeString(code)));
}
string cCommand = "INSERT INTO Code (UserID, Code) VALUES " + string.Join(",", lCodes);
using (MySqlCommand myCmdNested = new MySqlCommand(cCommand, mConnection))
{
myCmdNested.ExecuteNonQuery();
}
This generates a single INSERT statement instead of item.Codes.Count separate ones.
Given the great size of the data, the best approach performance-wise is to leave as much of the data processing as possible to the database rather than the application.
Create a table in which the data from the .csv file will be temporarily saved:
CREATE TABLE `imported` (
`id` int(11) NOT NULL,
`firstname` varchar(45) DEFAULT NULL,
`lastname` varchar(45) DEFAULT NULL,
`codes` varchar(450) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Loading the data from the .csv file into this table is pretty straightforward. I would suggest using MySqlCommand (which is also your current approach). Also, reusing the same MySqlConnection object for all INSERT statements will reduce the total execution time. A rough loader sketch follows.
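For reference, a loader along these lines might do. This is a sketch only: the semicolon-separated layout is taken from the sample line in the question, while connectionString, csvPath, and the batch size of 500 are assumptions, not something from your code.

```csharp
using System.Collections.Generic;
using System.IO;
using MySql.Data.MySqlClient;

static class StagingLoader
{
    // Sketch only: loads FirstName;LastName;code1,code2 lines into the `imported` table above.
    public static void LoadCsvIntoStaging(string csvPath, string connectionString)
    {
        using (var connection = new MySqlConnection(connectionString))
        {
            connection.Open();
            var batch = new List<string>();
            int id = 0;

            foreach (string line in File.ReadLines(csvPath))
            {
                string[] parts = line.Split(';');
                batch.Add(string.Format("({0},'{1}','{2}','{3}')",
                    ++id,
                    MySqlHelper.EscapeString(parts[0]),
                    MySqlHelper.EscapeString(parts[1]),
                    MySqlHelper.EscapeString(parts[2])));

                if (batch.Count == 500)      // flush every 500 rows (arbitrary batch size)
                    Flush(connection, batch);
            }
            if (batch.Count > 0)
                Flush(connection, batch);
        }
    }

    // Builds one multi-row INSERT per batch and clears the batch afterwards.
    private static void Flush(MySqlConnection connection, List<string> values)
    {
        string sql = "INSERT INTO imported (id, firstname, lastname, codes) VALUES "
                     + string.Join(",", values);
        using (var cmd = new MySqlCommand(sql, connection))
            cmd.ExecuteNonQuery();
        values.Clear();
    }
}
```

Flushing every few hundred rows keeps each statement comfortably under max_allowed_packet while still cutting the number of round-trips drastically compared to one INSERT per row.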
Then, to further process the data, you can create a stored procedure that handles it.
Assuming these two tables (taken from your simplified example):
CREATE TABLE `users` (
`PID` int(11) NOT NULL AUTO_INCREMENT,
`FName` varchar(45) DEFAULT NULL,
`LName` varchar(45) DEFAULT NULL,
PRIMARY KEY (`PID`)
) ENGINE=InnoDB AUTO_INCREMENT=3737 DEFAULT CHARSET=utf8;
and
CREATE TABLE `codes` (
`CID` int(11) NOT NULL AUTO_INCREMENT,
`PID` int(11) DEFAULT NULL,
`code` varchar(45) DEFAULT NULL,
PRIMARY KEY (`CID`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8;
you can have the following stored procedure.
CREATE DEFINER=`root`@`localhost` PROCEDURE `import_data`()
BEGIN
DECLARE fname VARCHAR(255);
DECLARE lname VARCHAR(255);
DECLARE codesstr VARCHAR(255);
DECLARE splitted_value VARCHAR(255);
DECLARE done INT DEFAULT 0;
DECLARE newid INT DEFAULT 0;
DECLARE occurance INT DEFAULT 0;
DECLARE i INT DEFAULT 0;
DECLARE cur CURSOR FOR SELECT firstname,lastname,codes FROM imported;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN cur;
import_loop:
LOOP FETCH cur INTO fname, lname, codesstr;
IF done = 1 THEN
LEAVE import_loop;
END IF;
INSERT INTO users (FName,LName) VALUES (fname, lname);
SET newid = LAST_INSERT_ID();
SET i=1;
SET occurance = (SELECT LENGTH(codesstr) - LENGTH(REPLACE(codesstr, ',', '')) + 1);
WHILE i <= occurance DO
SET splitted_value =
(SELECT REPLACE(SUBSTRING(SUBSTRING_INDEX(codesstr, ',', i),
LENGTH(SUBSTRING_INDEX(codesstr, ',', i - 1)) + 1), ',', ''));
INSERT INTO codes (PID, code) VALUES (newid, splitted_value);
SET i = i + 1;
END WHILE;
END LOOP;
CLOSE cur;
END
For every row in the source data, it issues an INSERT statement into the users table. Then a WHILE loop splits the comma-separated codes and issues an INSERT into the codes table for each one.
Regarding the use of LAST_INSERT_ID(), it is reliable on a PER CONNECTION basis (see doc here). If the MySQL connection used to run this stored procedure is not used by other transactions, the use of LAST_INSERT_ID() is safe.
The ID that was generated is maintained in the server on a per-connection basis. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for the most recent statement affecting an AUTO_INCREMENT column by that client. This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.
Edit: Here is the OP's variant that omits the temporary table imported. Instead of inserting the data from the .csv into the imported table, you call the SP directly to store it in your database.
CREATE DEFINER=`root`@`localhost` PROCEDURE `import_data`(IN fname VARCHAR(255), IN lname VARCHAR(255), IN codesstr VARCHAR(255))
BEGIN
DECLARE splitted_value VARCHAR(255);
DECLARE done INT DEFAULT 0;
DECLARE newid INT DEFAULT 0;
DECLARE occurance INT DEFAULT 0;
DECLARE i INT DEFAULT 0;
INSERT INTO users (FName,LName) VALUES (fname, lname);
SET newid = LAST_INSERT_ID();
SET i=1;
SET occurance = (SELECT LENGTH(codesstr) - LENGTH(REPLACE(codesstr, ',', '')) + 1);
WHILE i <= occurance DO
SET splitted_value =
(SELECT REPLACE(SUBSTRING(SUBSTRING_INDEX(codesstr, ',', i),
LENGTH(SUBSTRING_INDEX(codesstr, ',', i - 1)) + 1), ',', ''));
INSERT INTO codes (PID, code) VALUES (newid, splitted_value);
SET i = i + 1;
END WHILE;
END
Note: The code to split the codes is taken from here (MySQL does not provide a split function for strings).
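If you go with this variant, the call from C# might look roughly like this. A sketch only: connectionString and csvPath are assumed names, the line layout is the semicolon-separated sample from the question, and the parameter binding is shown the way MySql.Data usually accepts it for stored procedures.

```csharp
using System.Data;
using System.IO;
using MySql.Data.MySqlClient;

static class DirectImport
{
    // Sketch: call the parameterized import_data procedure once per CSV line.
    public static void Run(string csvPath, string connectionString)
    {
        using (var connection = new MySqlConnection(connectionString))
        {
            connection.Open();
            using (var cmd = new MySqlCommand("import_data", connection))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add("@fname", MySqlDbType.VarChar);
                cmd.Parameters.Add("@lname", MySqlDbType.VarChar);
                cmd.Parameters.Add("@codesstr", MySqlDbType.VarChar);

                foreach (string line in File.ReadLines(csvPath))
                {
                    string[] parts = line.Split(';');   // FirstName;LastName;code1,code2,...
                    cmd.Parameters["@fname"].Value = parts[0];
                    cmd.Parameters["@lname"].Value = parts[1];
                    cmd.Parameters["@codesstr"].Value = parts[2];
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```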
I developed my WPF application using Entity Framework with a SQL Server database, and I needed to read data from an Excel file and insert that data into two related tables. For roughly 15,000 rows in Excel it used to take around 4 hours. Then I switched to inserting a block of 500 rows per insert, which sped up the insertion unbelievably; now it takes a mere 3-5 seconds to import the same data.
So I would suggest you add your rows to the context 100/200/500 at a time and then call the SaveChanges method (if you really want to use EF). There are other helpful tips to speed up EF performance as well. Please read this for reference.
var totalRecords = TestPacksData.Rows.Count;
var totalPages = (totalRecords / ImportRecordsPerPage) + 1;
var count = 0; // page counter
while (count <= totalPages)
{
var pageWiseRecords = TestPacksData.Rows.Cast<DataRow>().Skip(count * ImportRecordsPerPage).Take(ImportRecordsPerPage);
count++;
Project.CreateNewSheet(pageWiseRecords.ToList());
Project.CreateNewSpool(pageWiseRecords.ToList());
}
And here is the CreateNewSheet method
/// <summary>
/// Creates a new Sheet record in the database
/// </summary>
/// <param name="row">DataRow containing the Sheet record</param>
public void CreateNewSheet(List<DataRow> rows)
{
var tempSheetsList = new List<Sheet>();
foreach (var row in rows)
{
var sheetNo = row[SheetFields.Sheet_No.ToString()].ToString();
if (string.IsNullOrWhiteSpace(sheetNo))
continue;
var testPackNo = row[SheetFields.Test_Pack_No.ToString()].ToString();
TestPack testPack = null;
if (!string.IsNullOrWhiteSpace(testPackNo))
testPack = GetTestPackByTestPackNo(testPackNo);
var existingSheet = GetSheetBySheetNo(sheetNo);
if (existingSheet != null)
{
UpdateSheet(existingSheet, row);
continue;
}
var isometricNo = GetIsometricNoFromSheetNo(sheetNo);
var newSheet = new Sheet
{
sheet_no = sheetNo,
isometric_no = isometricNo,
ped_rev = row[SheetFields.PED_Rev.ToString()].ToString(),
gpc_rev = row[SheetFields.GPC_Rev.ToString()].ToString()
};
if (testPack != null)
{
newSheet.test_pack_id = testPack.id;
newSheet.test_pack_no = testPack.test_pack_no;
}
if (!tempSheetsList.Any(l => l.sheet_no == newSheet.sheet_no))
{
DataStore.Context.Sheets.Add(newSheet);
tempSheetsList.Add(newSheet);
}
}
try
{
DataStore.Context.SaveChanges();
DataStore.Dispose(); // This is very important: dispose the context.
}
catch (DbEntityValidationException ex)
{
// Create log for the exception here
}
}
CreateNewSpool is essentially the same method except for the field names and table name, because it updates a child table. But the idea is the same.
1 - Add a column VirtualId to User table & class.
EDITED
2 - In a loop, assign numbers to the VirtualId field of each User object (use negative numbers starting at -1 to avoid collisions in the last step). For each Code object c belonging to a User object u, set c.UserId = u.VirtualId.
3 - Bulk load Users into User table, Bulk load Codes into Code table.
4 - UPDATE Code C, User U SET C.UserId = U.Id WHERE C.UserId = U.VirtualId (see the sketch after the classes below).
NOTE: If you have an FK constraint on Code.UserId, you can drop it and re-add it after the insert.
public class User
{
public int Id { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public int VirtualId { get; set; }
}
public class Code
{
public int Id { get; set; }
public string Code { get; set; }
public int UserId { get; set; }
}
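For what it's worth, a sketch of how step 4 might be issued from C#, assuming the bulk loads have already filled both tables, that mConnection is the open MySqlConnection from the question, and that Code.UserId temporarily holds the negative VirtualId values:

```csharp
using MySql.Data.MySqlClient;

static class VirtualIdFixup
{
    // Sketch: rewrite the temporary negative keys with the real auto-increment ids (step 4).
    public static void Run(MySqlConnection mConnection)
    {
        const string fixupSql =
            "UPDATE Code C JOIN User U ON C.UserId = U.VirtualId " +
            "SET C.UserId = U.Id " +
            "WHERE C.UserId < 0;";

        using (var cmd = new MySqlCommand(fixupSql, mConnection))
        {
            cmd.ExecuteNonQuery();
        }
    }
}
```

If the FK constraint on Code.UserId was dropped beforehand, this is also the point to re-add it.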
Can you break the CSV into two files?
E.g. Suppose your file has the following columns:
... A ... | ... B ...
a0 | b0
a0 | b1
a0 | b2 <-- data
a1 | b3
a1 | b4
So one set of A might have multiple B entries. After you break it apart, you get:
... A ...
a0
a1
... B ...
b0
b1
b2
b3
b4
Then you bulk insert them separately.
Edit: Pseudo code
Based on the conversation, something like:
DataTable tableA = ...; // query schema for TableA
DataTable tableB = ...; // query schema for TableB
List<String> usernames = select distinct username from TableA;
Hashtable htUsername = new Hashtable(StringComparer.InvariantCultureIgnoreCase);
foreach (String username in usernames)
htUsername[username] = "";
int colUsername = ...;
foreach (String[] row in CSVFile) {
String un = row[colUsername] as String;
if (htUsername[un] == null) {
// add new row to tableA
DataRow newRow = tableA.NewRow();
newRow["Username"] = un;
// etc.
tableA.Rows.Add(newRow);
htUsername[un] = "";
}
}
// bulk insert TableA
select userid, username from TableA
Hashtable htUserId = new Hashtable(StringComparer.InvariantCultureIgnoreCase);
// htUserId[username] = userid;
int colUserId = ...;
foreach (String[] row in CSVFile) {
String un = row[colUsername] as String;
int userid = (int) htUserId[un];
DataRow newRow = tableB.NewRow();
newRow[colUserId] = userid;
// fill in other values
tableB.Rows.Add(newRow);
if (tableB.Rows.Count == 65000) {
// bulk insert TableB
var t = tableB.Clone();
tableB.Dispose();
tableB = t;
}
}
if (tableB.Rows.Count > 0)
// bulk insert TableB
AFAIK, insertions into a single table are sequential, while insertions into different tables can be done in parallel. Open two separate new connections to the same database and then insert in parallel, for example by using the Task Parallel Library.
However, if there are integrity constraints enforcing the 1:n relationship between the tables, then:
Insertions might fail, and any parallel insert approach would then be wrong. In that case your best bet is to do sequential inserts only, one table after the other.
You can try to sort the data of both tables and write the InsertInto method shown below such that the insert into the second table happens only after you are done inserting the data into the first one.
Edit: Since you asked, if it is possible for you to perform the inserts in parallel, the following is a code template you can use.
private void ParallelInserts()
{
..
//Other code in the method
..
//Read first csv into memory. It's just a GB so should be fine
ReadFirstCSV();
//Read second csv into memory...
ReadSecondCSV();
//Because the inserts will last more than a few CPU cycles...
var taskFactory = new TaskFactory(TaskCreationOptions.LongRunning, TaskContinuationOptions.None);
//An array to hold the two parallel inserts
var insertTasks = new Task[2];
//Begin insert into first table...
insertTasks[0] = taskFactory.StartNew(() => InsertInto(commandStringFirst, connectionStringFirst));
//Begin insert into second table...
insertTasks[1] = taskFactory.StartNew(() => InsertInto(commandStringSecond, connectionStringSecond));
//Let them be done...
Task.WaitAll(insertTasks);
Console.WriteLine("Parallel insert finished.");
}
//Defining the InsertInto method which we are passing to the tasks in the method above
private static void InsertInto(string commandString, string connectionString)
{
using (/*open a new connection using the connectionString passed*/)
{
//In a while loop, iterate until you have 100/200/500 rows
while (fileIsNotExhausted)
{
using (/*commandString*/)
{
//Execute command to insert in bulk
}
}
}
}
When you say "efficiently" are you talking memory, or time?
In terms of improving the speed of the inserts, if you can do multiple value blocks per INSERT statement, you can get a 500% improvement in speed. I did some benchmarks on this over in this question: Which is faster: multiple single INSERTs or one multiple-row INSERT?
My approach is described in that answer, but simply put, reading in up to, say, 50 "rows" (to be inserted) at once and bundling them into a single INSERT INTO ... VALUES (...),(...),(...),...,(...) type statement seems to really speed things up. At least if you're prevented from using a bulk load.
Another approach, by the way, if you have live data you can't drop indexes on during the upload, is to create a MEMORY table on the MySQL server without indexes, dump the data there, and then do an INSERT INTO live SELECT * FROM mem. Though that uses more memory on the server, hence the question at the start of this answer about what you mean by "efficiently". :)
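For illustration, the memory-table detour might look something like this. A sketch only: it reuses the Code table and mConnection from the question, while the mem_codes staging name is made up.

```csharp
using MySql.Data.MySqlClient;

static class MemoryStaging
{
    // Sketch: stage rows in an index-free MEMORY table, then copy into the live table in one statement.
    public static void Run(MySqlConnection mConnection)
    {
        Execute(mConnection, "CREATE TABLE mem_codes (UserID INT, Code VARCHAR(45)) ENGINE=MEMORY;");

        // ... dump the CSV rows into mem_codes here, e.g. with multi-VALUES INSERTs ...

        Execute(mConnection, "INSERT INTO Code (UserID, Code) SELECT UserID, Code FROM mem_codes;");
        Execute(mConnection, "DROP TABLE mem_codes;");
    }

    private static void Execute(MySqlConnection connection, string sql)
    {
        using (var cmd = new MySqlCommand(sql, connection))
            cmd.ExecuteNonQuery();
    }
}
```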
Oh, and there's probably nothing wrong with iterating through the file and doing all the first table inserts first, and then doing the second table ones. Unless the data is being used live, I guess. In that case you could definitely still use the bundled approach, but the application logic to do that is a lot more complex.
UPDATE: OP requested example C# code for multivalue insert blocks.
Note: this code assumes you have a number of structures already configured:
tables List<string> - table names to insert into
fieldslist Dictionary<string, List<String>> - list of field names for each table
typeslist Dictionary<string, List<MySqlDbType>> - list of MySqlDbTypes for each table, same order as the field names.
nullslist Dictionary<string, List<Boolean>> - list of flags to tell if a field is nullable or not, for each table (same order as field names).
prikey Dictionary<string, string> - list of primary key field name, per table (note: this doesn't support multiple field primary keys, though if you needed it you could probably hack it in - I think somewhere I have a version that does support this, but... meh).
theData Dictionary<string, List<Dictionary<int, object>>> - the actual data, as a list of fieldnum-value dictionaries, per table.
Oh yeah, and localcmd is a MySqlCommand created by calling CreateCommand() on the local MySqlConnection object.
Further note: I wrote this quite a while back when I was kind of starting. If this causes your eyes or brain to bleed, I apologise in advance :)
const int perinsert = 50;
foreach (string table in tables)
{
string[] fields = fieldslist[table].ToArray();
MySqlDbType[] types = typeslist[table].ToArray();
bool[] nulls = nullslist[table].ToArray();
int thisblock = perinsert;
int rowstotal = theData[table].Count;
int rowsremainder = rowstotal % perinsert;
int rowscopied = 0;
// Do the bulk (multi-VALUES block) INSERTs, but only if we have more rows than there are in a single bulk insert to perform:
while (rowscopied < rowstotal)
{
if (rowstotal - rowscopied < perinsert)
thisblock = rowstotal - rowscopied;
// Generate a 'perquery' multi-VALUES prepared INSERT statement:
List<string> extravals = new List<string>();
for (int j = 0; j < thisblock; j++)
extravals.Add(String.Format("(#{0}_{1})", j, String.Join(String.Format(", #{0}_", j), fields)));
localcmd.CommandText = String.Format("INSERT INTO {0} VALUES{1}", tmptable, String.Join(",", extravals.ToArray()));
// Now create the parameters to match these:
for (int j = 0; j < thisblock; j++)
for (int i = 0; i < fields.Length; i++)
localcmd.Parameters.Add(String.Format("{0}_{1}", j, fields[i]), types[i]).IsNullable = nulls[i];
// Keep doing bulk INSERTs until there's less rows left than we need for another one:
while (rowstotal - rowscopied >= thisblock)
{
// Queue up all the VALUES for this block INSERT:
for (int j = 0; j < thisblock; j++)
{
Dictionary<int, object> row = theData[table][rowscopied++];
for (int i = 0; i < fields.Length; i++)
localcmd.Parameters[String.Format("{0}_{1}", j, fields[i])].Value = row[i];
}
// Run the query:
localcmd.ExecuteNonQuery();
}
// Clear all the parameters - we're done here:
localcmd.Parameters.Clear();
}
}
Related
I have C# code which executes a lot of INSERT statements in a batch. While executing these statements I got a "String or binary data would be truncated" error and the transaction rolled back.
To find out which INSERT statement caused this, I would have to run them one by one against SQL Server until I hit the error.
Is there a clever way to find out which statement and which field caused this issue using exception handling? (SqlException)
In general, there isn't a way to determine which particular statement caused the error. If you're running several, you could watch profiler and look at the last completed statement and see what the statement after that might be, though I have no idea if that approach is feasible for you.
In any event, one of your parameter variables (and the data inside it) is too large for the field it's trying to store data in. Check your parameter sizes against column sizes and the field(s) in question should be evident pretty quickly.
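One cheap guard along those lines: keep the column sizes next to the parameters and check each value before executing, so the offending statement and field are reported instead of a bare SqlException. A sketch, with made-up table/column names and sizes:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static class TruncationGuard
{
    // Sketch: validate parameter values against known column sizes before running the INSERT.
    private static readonly Dictionary<string, int> ColumnSizes = new Dictionary<string, int>
    {
        { "@CompanyName", 40 },   // hypothetical: CompanyName varchar(40)
        { "@Phone", 24 }          // hypothetical: Phone varchar(24)
    };

    public static void Insert(SqlConnection connection, string companyName, string phone)
    {
        using (var cmd = new SqlCommand(
            "INSERT INTO Customers (CompanyName, Phone) VALUES (@CompanyName, @Phone)", connection))
        {
            cmd.Parameters.Add("@CompanyName", SqlDbType.VarChar, 40).Value = companyName;
            cmd.Parameters.Add("@Phone", SqlDbType.VarChar, 24).Value = phone;

            // Report exactly which value is too long for its column before SQL Server rejects the row.
            foreach (SqlParameter p in cmd.Parameters)
            {
                var s = p.Value as string;
                if (s != null && s.Length > ColumnSizes[p.ParameterName])
                    throw new ArgumentException(string.Format(
                        "{0} is {1} chars but the column only allows {2}: '{3}'",
                        p.ParameterName, s.Length, ColumnSizes[p.ParameterName], s));
            }
            cmd.ExecuteNonQuery();
        }
    }
}
```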
This type of error occurs when the datatype of the SQL Server column has a length which is less than the length of the data entered into the entry form.
this type of error generally occurs when you have to put characters or values more than that you have specified in Database table like in that case: you specify
transaction_status varchar(10)
but you actually trying to store
_transaction_status
which contain 19 characters. that's why you faced this type of error in this code
Generally it is that you are inserting a value that is greater than the maximum allowed value. Ex, data column can only hold up to 200 characters, but you are inserting 201-character string
BEGIN TRY
INSERT INTO YourTable (col1, col2) VALUES (#val1, #val2)
END TRY
BEGIN CATCH
--print or insert into error log or return param or etc...
PRINT '#val1='+ISNULL(CONVERT(varchar,#val1),'')
PRINT '#val2='+ISNULL(CONVERT(varchar,#val2),'')
END CATCH
For SQL 2016 SP2 or higher follow this link
For older versions of SQL do this:
Get the query that is causing the problems (you can also use SQL Profiler if you dont have the source)
Remove all WHERE clauses and other unimportant parts until you are basically just left with the SELECT and FROM parts
Add WHERE 0 = 1 (this will select only table structure)
Add INTO [MyTempTable] just before the FROM clause
You should end up with something like
SELECT
Col1, Col2, ..., [ColN]
INTO [MyTempTable]
FROM
[Tables etc.]
WHERE 0 = 1
This will create a table called MyTempTable in your DB that you can compare to your target table structure i.e. you can compare the columns on both tables to see where they differ. It is a bit of a workaround but it is the quickest method I have found.
It depends on how you are making the Insert Calls. All as one call, or as individual calls within a transaction? If individual calls, then yes (as you iterate through the calls, catch the one that fails). If one large call, then no. SQL is processing the whole statement, so it's out of the hands of the code.
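If they are individual calls, something like this sketch narrows it down without changing the data flow much. Error number 8152 is the truncation error shown further down in this thread; the insertCommands list is an assumed name for your prepared statements.

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

static class BatchRunner
{
    // Sketch: execute the prepared statements one at a time and report which one hits error 8152.
    public static void Run(IList<SqlCommand> insertCommands)
    {
        for (int i = 0; i < insertCommands.Count; i++)
        {
            try
            {
                insertCommands[i].ExecuteNonQuery();
            }
            catch (SqlException ex)
            {
                if (ex.Number == 8152)   // "String or binary data would be truncated."
                    Console.WriteLine("Statement #{0} caused the truncation: {1}",
                        i, insertCommands[i].CommandText);
                throw;
            }
        }
    }
}
```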
I have created a simple way of finding offending fields by:
Getting the column width of all the columns of a table where we're trying to make this insert/ update. (I'm getting this info directly from the database.)
Comparing the column widths to the width of the values we're trying to insert/ update.
Assumptions/ Limitations:
The column names of the table in the database match the C# entity fields. For example, if you have a column like this in the database:
You need to have your Entity with the same column name:
public class SomeTable
{
// Other fields
public string SourceData { get; set; }
}
You're inserting/ updating 1 entity at a time. It'll be clearer in the demo code below. (If you're doing bulk inserts/ updates, you might want to either modify it or use some other solution.)
Step 1:
Get the column width of all the columns directly from the database:
// For this, I took help from Microsoft docs website:
// https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlconnection.getschema?view=netframework-4.7.2#System_Data_SqlClient_SqlConnection_GetSchema_System_String_System_String___
private static Dictionary<string, int> GetColumnSizesOfTableFromDatabase(string tableName, string connectionString)
{
var columnSizes = new Dictionary<string, int>();
using (var connection = new SqlConnection(connectionString))
{
// Connect to the database then retrieve the schema information.
connection.Open();
// You can specify the Catalog, Schema, Table Name, Column Name to get the specified column(s).
// You can use four restrictions for Column, so you should create a 4 members array.
String[] columnRestrictions = new String[4];
// For the array, 0-member represents Catalog; 1-member represents Schema;
// 2-member represents Table Name; 3-member represents Column Name.
// Now we specify the Table_Name and Column_Name of the columns that we want schema information for.
columnRestrictions[2] = tableName;
DataTable allColumnsSchemaTable = connection.GetSchema("Columns", columnRestrictions);
foreach (DataRow row in allColumnsSchemaTable.Rows)
{
var columnName = row.Field<string>("COLUMN_NAME");
//var dataType = row.Field<string>("DATA_TYPE");
var characterMaxLength = row.Field<int?>("CHARACTER_MAXIMUM_LENGTH");
// I'm only capturing columns whose Datatype is "varchar" or "char", i.e. their CHARACTER_MAXIMUM_LENGTH won't be null.
if(characterMaxLength != null)
{
columnSizes.Add(columnName, characterMaxLength.Value);
}
}
connection.Close();
}
return columnSizes;
}
Step 2:
Compare the column widths with the width of the values we're trying to insert/ update:
public static Dictionary<string, string> FindLongBinaryOrStringFields<T>(T entity, string connectionString)
{
var tableName = typeof(T).Name;
Dictionary<string, string> longFields = new Dictionary<string, string>();
var objectProperties = GetProperties(entity);
//var fieldNames = objectProperties.Select(p => p.Name).ToList();
var actualDatabaseColumnSizes = GetColumnSizesOfTableFromDatabase(tableName, connectionString);
foreach (var dbColumn in actualDatabaseColumnSizes)
{
var maxLengthOfThisColumn = dbColumn.Value;
var currentValueOfThisField = objectProperties.Where(f => f.Name == dbColumn.Key).First()?.GetValue(entity, null)?.ToString();
if (!string.IsNullOrEmpty(currentValueOfThisField) && currentValueOfThisField.Length > maxLengthOfThisColumn)
{
longFields.Add(dbColumn.Key, $"'{dbColumn.Key}' column cannot take the value of '{currentValueOfThisField}' because the max length it can take is {maxLengthOfThisColumn}.");
}
}
return longFields;
}
public static List<PropertyInfo> GetProperties<T>(T entity)
{
//The DeclaredOnly flag makes sure you only get properties of the object, not from the classes it derives from.
var properties = entity.GetType()
.GetProperties(System.Reflection.BindingFlags.Public
| System.Reflection.BindingFlags.Instance
| System.Reflection.BindingFlags.DeclaredOnly)
.ToList();
return properties;
}
Demo:
Let's say we're trying to insert someTableEntity of SomeTable class that is modeled in our app like so:
public class SomeTable
{
[Key]
public long TicketID { get; set; }
public string SourceData { get; set; }
}
And it's inside our SomeDbContext like so:
public class SomeDbContext : DbContext
{
public DbSet<SomeTable> SomeTables { get; set; }
}
This table in Db has SourceData field as varchar(16) like so:
Now we'll try to insert a value that is longer than 16 characters into this field and capture this information:
public void SaveSomeTableEntity()
{
var connectionString = "server=SERVER_NAME;database=DB_NAME;User ID=SOME_ID;Password=SOME_PASSWORD;Connection Timeout=200";
using (var context = new SomeDbContext(connectionString))
{
var someTableEntity = new SomeTable()
{
SourceData = "Blah-Blah-Blah-Blah-Blah-Blah"
};
context.SomeTables.Add(someTableEntity);
try
{
context.SaveChanges();
}
catch (Exception ex)
{
if (ex.GetBaseException().Message == "String or binary data would be truncated.\r\nThe statement has been terminated.")
{
var badFieldsReport = "";
List<string> badFields = new List<string>();
// YOU GOT YOUR FIELDS RIGHT HERE:
var longFields = FindLongBinaryOrStringFields(someTableEntity, connectionString);
foreach (var longField in longFields)
{
badFields.Add(longField.Key);
badFieldsReport += longField.Value + "\n";
}
}
else
throw;
}
}
}
The badFieldsReport will have this value:
'SourceData' column cannot take the value of
'Blah-Blah-Blah-Blah-Blah-Blah' because the max length it can take is
16.
It could also be because you're trying to put a null value back into the database, so one of your transactions could have nulls in it.
Most of the answers here say to do the obvious check: that the length of the column as defined in the database isn't smaller than the data you are trying to pass into it.
Several times I have been bitten by going to SQL Management Studio, doing a quick:
sp_help 'mytable'
and been confused for a few minutes until I realized the column in question is an nvarchar, which means the length reported by sp_help is really double the real supported length, because it's a double-byte (Unicode) datatype.
i.e. if sp_help reports nvarchar Length 40, you can store 20 characters max.
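If you'd rather not do the halving in your head, CHARACTER_MAXIMUM_LENGTH in INFORMATION_SCHEMA.COLUMNS is already reported in characters, even for nvarchar. A quick sketch of reading it from C# (connection and table name supplied by the caller):

```csharp
using System;
using System.Data.SqlClient;

static class ColumnLengths
{
    // Sketch: list character (not byte) lengths for a table's string columns.
    public static void Print(SqlConnection connection, string tableName)
    {
        using (var cmd = new SqlCommand(
            @"SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
              FROM INFORMATION_SCHEMA.COLUMNS
              WHERE TABLE_NAME = @table AND CHARACTER_MAXIMUM_LENGTH IS NOT NULL", connection))
        {
            cmd.Parameters.AddWithValue("@table", tableName);
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0} {1}({2})",
                        reader.GetString(0), reader.GetString(1), reader.GetInt32(2));
            }
        }
    }
}
```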
Check out this gist:
https://gist.github.com/mrameezraja/9f15ad624e2cba8ac24066cdf271453b.
public Dictionary<string, string> GetEvilFields(string tableName, object instance)
{
Dictionary<string, string> result = new Dictionary<string, string>();
var tableType = this.Model.GetEntityTypes().FirstOrDefault(c => c.GetTableName().Contains(tableName));
if (tableType != null)
{
int i = 0;
foreach (var property in tableType.GetProperties())
{
var maxlength = property.GetMaxLength();
var prop = instance.GetType().GetProperties().FirstOrDefault(_ => _.Name == property.Name);
if (prop != null)
{
var length = prop.GetValue(instance)?.ToString()?.Length;
if (length > maxlength)
{
result.Add($"{i}.Evil.Property", prop.Name);
result.Add($"{i}.Evil.Value", prop.GetValue(instance)?.ToString());
result.Add($"{i}.Evil.Value.Length", length?.ToString());
result.Add($"{i}.Evil.Db.MaxLength", maxlength?.ToString());
i++;
}
}
}
}
return result;
}
With LINQ to SQL I debugged by logging the context, e.g. Context.Log = Console.Out
Then scanned the SQL to check for any obvious errors, there were two:
-- @p46: Input Char (Size = -1; Prec = 0; Scale = 0) [some long text value1]
-- @p8: Input Char (Size = -1; Prec = 0; Scale = 0) [some long text value2]
The last one I found by scanning the table schema against the values; the field was nvarchar(20) but the value was 22 characters:
-- @p41: Input NVarChar (Size = 4000; Prec = 0; Scale = 0) [1234567890123456789012]
In our own case, I increased the allowable field size in the SQL table, which was less than the total number of characters posted from the front end. That resolved the issue.
Simply use this:
MessageBox.Show(cmd4.CommandText.ToString());
in C#.NET; this will show you the main query. Copy it and run it in the database.
I am running a data.bat file with the following lines:
Rem This batch file will populate tables
cd\program files\Microsoft SQL Server\MSSQL
osql -U sa -P Password -d MyBusiness -i c:\data.sql
The contents of the data.sql file are:
insert Customers
(CustomerID, CompanyName, Phone)
Values('101','Southwinds','19126602729')
There are 8 more similar lines for adding records.
When I run this with start > run > cmd > c:\data.bat, I get this error message:
1>2>3>4>5>....<1 row affected>
Msg 8152, Level 16, State 4, Server SP1001, Line 1
string or binary data would be truncated.
<1 row affected>
<1 row affected>
<1 row affected>
<1 row affected>
<1 row affected>
<1 row affected>
Also, I am a newbie obviously, but what do Level # and State # mean, and how do I look up error messages such as the one above (8152)?
From @gmmastros's answer:
Whenever you see the message....
string or binary data would be truncated
Think to yourself... The field is NOT big enough to hold my data.
Check the table structure for the Customers table. I think you'll find that the length of one or more fields is NOT big enough to hold the data you are trying to insert. For example, if the Phone field is a varchar(8) field and you try to put 11 characters into it, you will get this error.
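And once you've found the short column, widening it is usually all that's needed. A sketch using the Phone example (the size of 20 is just an illustration; pick one that fits your data and keep the column's existing nullability):

```csharp
using System.Data.SqlClient;

static class ColumnWidener
{
    // Sketch: widen the too-small column; varchar(20) is just an example size.
    public static void WidenPhoneColumn(SqlConnection connection)
    {
        using (var cmd = new SqlCommand(
            "ALTER TABLE Customers ALTER COLUMN Phone varchar(20) NULL", connection))
        {
            cmd.ExecuteNonQuery();
        }
    }
}
```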
I had this issue although the data length was shorter than the field length.
It turned out that the problem was another log table (for the audit trail), filled by a trigger on the main table, where the column size also had to be changed.
In one of the INSERT statements you are attempting to insert a too long string into a string (varchar or nvarchar) column.
If it's not obvious which INSERT is the offender by a mere look at the script, you could count the <1 row affected> lines that occur before the error message. The obtained number plus one gives you the statement number. In your case it seems to be the second INSERT that produces the error.
Just to contribute additional information: I had the same issue, and it was because the field wasn't big enough for the incoming data; this thread helped me solve it (the top answer clarifies it all).
BUT it is very important to know what the possible causes are.
In my case I was creating the table with a field like this:
SELECT '' AS Period, * INTO #NewTable FROM Transactions
Therefore the field "Period" had a length of zero, causing the INSERT operations to fail. I changed it to 'XXXXXX', which is the length of the incoming data, and it then worked properly (because the field now had a length of 6).
I hope this helps anyone with the same issue :)
Some of your data cannot fit into your (too small) database column. It is not easy to find what is wrong. If you use C# and Linq2Sql, you can list the fields that would be truncated.
First create a helper class:
public class SqlTruncationExceptionWithDetails : ArgumentOutOfRangeException
{
public SqlTruncationExceptionWithDetails(System.Data.SqlClient.SqlException inner, DataContext context)
: base(inner.Message + " " + GetSqlTruncationExceptionWithDetailsString(context))
{
}
/// <summary>
/// Part of the code is from the following link:
/// http://stackoverflow.com/questions/3666954/string-or-binary-data-would-be-truncated-linq-exception-cant-find-which-fiel
/// </summary>
/// <param name="context"></param>
/// <returns></returns>
static string GetSqlTruncationExceptionWithDetailsString(DataContext context)
{
StringBuilder sb = new StringBuilder();
foreach (object update in context.GetChangeSet().Updates)
{
FindLongStrings(update, sb);
}
foreach (object insert in context.GetChangeSet().Inserts)
{
FindLongStrings(insert, sb);
}
return sb.ToString();
}
public static void FindLongStrings(object testObject, StringBuilder sb)
{
foreach (var propInfo in testObject.GetType().GetProperties())
{
foreach (System.Data.Linq.Mapping.ColumnAttribute attribute in propInfo.GetCustomAttributes(typeof(System.Data.Linq.Mapping.ColumnAttribute), true))
{
if (attribute.DbType.ToLower().Contains("varchar"))
{
string dbType = attribute.DbType.ToLower();
int numberStartIndex = dbType.IndexOf("varchar(") + 8;
int numberEndIndex = dbType.IndexOf(")", numberStartIndex);
string lengthString = dbType.Substring(numberStartIndex, (numberEndIndex - numberStartIndex));
int maxLength = 0;
int.TryParse(lengthString, out maxLength);
string currentValue = (string)propInfo.GetValue(testObject, null);
if (!string.IsNullOrEmpty(currentValue) && maxLength != 0 && currentValue.Length > maxLength)
{
//string is too long
sb.AppendLine(testObject.GetType().Name + "." + propInfo.Name + " " + currentValue + " Max: " + maxLength);
}
}
}
}
}
}
Then prepare the wrapper for SubmitChanges:
public static class DataContextExtensions
{
public static void SubmitChangesWithDetailException(this DataContext dataContext)
{
//http://stackoverflow.com/questions/3666954/string-or-binary-data-would-be-truncated-linq-exception-cant-find-which-fiel
try
{
//this can failed on data truncation
dataContext.SubmitChanges();
}
catch (SqlException sqlException) //when (sqlException.Message == "String or binary data would be truncated.")
{
if (sqlException.Message == "String or binary data would be truncated.") //only for EN windows - if you are running different window language, invoke the sqlException.getMessage on thread with EN culture
throw new SqlTruncationExceptionWithDetails(sqlException, dataContext);
else
throw;
}
}
}
Prepare a global exception handler and log the truncation details:
protected void Application_Error(object sender, EventArgs e)
{
Exception ex = Server.GetLastError();
string message = ex.Message;
//TODO - log to file
}
Finally use the code:
Datamodel.SubmitChangesWithDetailException();
Another situation in which you can get this error is the following:
I had the same error, and the reason was that in an INSERT statement that received data from a UNION, the order of the columns was different from the original table. If you change the order in #table3 to a, b, c, the error will be fixed.
select a, b, c into #table1
from #table0
insert into #table1
select a, b, c from #table2
union
select a, c, b from #table3
On SQL Server you can use SET ANSI_WARNINGS OFF like this:
using (SqlConnection conn = new SqlConnection("Data Source=XRAYGOAT\\SQLEXPRESS;Initial Catalog='Healthy Care';Integrated Security=True"))
{
conn.Open();
using (var trans = conn.BeginTransaction())
{
try
{
using (var cmd = new SqlCommand("", conn, trans))
{
cmd.CommandText = "SET ANSI_WARNINGS OFF";
cmd.ExecuteNonQuery();
cmd.CommandText = "YOUR INSERT HERE";
cmd.ExecuteNonQuery();
cmd.Parameters.Clear();
cmd.CommandText = "SET ANSI_WARNINGS ON";
cmd.ExecuteNonQuery();
trans.Commit();
}
}
catch (Exception)
{
trans.Rollback();
}
}
conn.Close();
}
I had the same issue. The length of my column was too short.
What you can do is either increase the length or shorten the text you want to put in the database.
I also had this problem occurring on the web application side.
Eventually I found out that the same error message was coming from the SQL UPDATE statement on a specific table.
Finally I figured out that the column definition in the related history table(s) did not match the original table's column length for nvarchar types in some specific cases.
I had the same problem, even after increasing the size of the problematic columns in the table.
tl;dr: The length of the matching columns in corresponding Table Types may also need to be increased.
In my case, the error was coming from the Data Export service in Microsoft Dynamics CRM, which allows CRM data to be synced to an SQL Server DB or Azure SQL DB.
After a lengthy investigation, I concluded that the Data Export service must be using Table-Valued Parameters:
You can use table-valued parameters to send multiple rows of data to a Transact-SQL statement or a routine, such as a stored procedure or function, without creating a temporary table or many parameters.
As you can see in the documentation above, Table Types are used to create the data ingestion procedure:
CREATE TYPE LocationTableType AS TABLE (...);
CREATE PROCEDURE dbo.usp_InsertProductionLocation
@TVP LocationTableType READONLY
Unfortunately, there is no way to alter a Table Type, so it has to be dropped & recreated entirely. Since my table has over 300 fields (😱), I created a query to facilitate the creation of the corresponding Table Type based on the table's columns definition (just replace [table_name] with your table's name):
SELECT 'CREATE TYPE [table_name]Type AS TABLE (' + STRING_AGG(CAST(field AS VARCHAR(max)), ',' + CHAR(10)) + ');' AS create_type
FROM (
SELECT TOP 5000 COLUMN_NAME + ' ' + DATA_TYPE
+ IIF(CHARACTER_MAXIMUM_LENGTH IS NULL, '', CONCAT('(', IIF(CHARACTER_MAXIMUM_LENGTH = -1, 'max', CONCAT(CHARACTER_MAXIMUM_LENGTH,'')), ')'))
+ IIF(DATA_TYPE = 'decimal', CONCAT('(', NUMERIC_PRECISION, ',', NUMERIC_SCALE, ')'), '')
AS field
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = '[table_name]'
ORDER BY ORDINAL_POSITION) AS T;
After updating the Table Type, the Data Export service started functioning properly once again! :)
When I tried to execute my stored procedure I had the same problem, because the size of the column that I needed to add data to was shorter than the data I wanted to add.
You can increase the size of the column's data type or reduce the length of your data.
A 2016/2017 update will show you the bad value and column.
A new trace flag will swap the old error for a new 2628 error and will print out the column and offending value. Traceflag 460 is available in the latest cumulative update for 2016 and 2017:
https://support.microsoft.com/en-sg/help/4468101/optional-replacement-for-string-or-binary-data-would-be-truncated
Just make sure that after you've installed the CU you enable the trace flag, either globally/permanently on the server:
...or with DBCC TRACEON:
https://learn.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-traceon-trace-flags-transact-sql?view=sql-server-ver15
Another situation in which this error may occur is in SQL Server Management Studio, if you have "text" or "ntext" fields in your table, no matter what kind of field you are updating (for example bit or integer).
It seems that the Studio does not load entire "ntext" fields and also updates ALL fields instead of just the modified one.
To solve the problem, exclude "text" or "ntext" fields from the query in Management Studio.
This error occurs only when the length of one of your field values is greater than the length specified in the SQL Server table structure.
To overcome this issue you either have to reduce the length of the field value or increase the length of the database table field.
Kevin Pope's comment under the accepted answer was what I needed.
The problem, in my case, was that I had triggers defined on my table that would insert update/insert transactions into an audit table, but the audit table had a data type mismatch: a column that was VARCHAR(MAX) in the original table was stored as VARCHAR(1) in the audit table. So my triggers were failing whenever I inserted anything longer than one character into that column of the original table, and I would get this error message.
I used a different tactic: some fields are allocated 8K in places, but only about 50-100 characters of them are actually used, so I truncate the values to the declared sizes on the way in.
declare @NVPN_list as table (
nvpn varchar(50)
,nvpn_revision varchar(5)
,nvpn_iteration INT
,mpn_lifecycle varchar(30)
,mfr varchar(100)
,mpn varchar(50)
,mpn_revision varchar(5)
,mpn_iteration INT
-- ...
) INSERT INTO @NVPN_list
SELECT left(nvpn ,50) as nvpn
,left(nvpn_revision ,10) as nvpn_revision
,nvpn_iteration
,left(mpn_lifecycle ,30)
,left(mfr ,100)
,left(mpn ,50)
,left(mpn_revision ,5)
,mpn_iteration
,left(mfr_order_num ,50)
FROM [DASHBOARD].[dbo].[mpnAttributes] (NOLOCK) mpna
I wanted speed, since I have 1M total records, and load 28K of them.
This error may occur because the field size is smaller than the data you entered.
For example, if you have the data type nvarchar(7) and your value is 'aaaaddddf', then the error shown is:
string or binary data would be truncated
You simply can't beat SQL Server on this.
You can insert into a new table like this:
select foo, bar
into tmp_new_table_to_dispose_later
from my_table
and compare the table definition with the real table you want to insert the data into.
Sometimes it's helpful, sometimes it's not.
If you try inserting into the final/real table from that temporary table it may just work (due to data conversion working differently than in SSMS, for example).
Another alternative is to insert the data in chunks: instead of inserting everything at once, insert with TOP 1000 and repeat the process until you find a chunk with an error. At least you get better visibility into what's not fitting into the table. A sketch follows.
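A sketch of that chunked copy, assuming an integer key to page on; the table and column names are placeholders. A failure then points at one chunk of at most 1000 source rows.

```csharp
using System;
using System.Data.SqlClient;

static class ChunkedCopy
{
    // Sketch: copy rows in chunks of 1000 so a truncation error narrows down the offending rows.
    public static void Run(SqlConnection connection)
    {
        const int chunkSize = 1000;
        long lastId = 0;

        while (true)
        {
            int copied;
            using (var cmd = new SqlCommand(
                @"INSERT INTO real_table (id, foo, bar)
                  SELECT TOP (@chunk) id, foo, bar
                  FROM source_table
                  WHERE id > @lastId
                  ORDER BY id;", connection))
            {
                cmd.Parameters.AddWithValue("@chunk", chunkSize);
                cmd.Parameters.AddWithValue("@lastId", lastId);
                copied = cmd.ExecuteNonQuery();   // a truncation error here means the bad row is in this chunk
            }
            if (copied == 0)
                break;

            // Advance the paging key to the last row copied so far.
            using (var max = new SqlCommand("SELECT MAX(id) FROM real_table;", connection))
                lastId = Convert.ToInt64(max.ExecuteScalar());
        }
    }
}
```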
I have a table as shown below. It has accounts of type Fixed and Savings. I need to update the status of all accounts of user 1. There are 10,000 accounts for this user. Essentially the logic would be as shown in the following SQL stored procedure script. The script takes less than 1 second to execute (83 milliseconds).
But when I converted it to an ORM approach using LINQ to SQL, it takes more than 3 minutes (204,814 milliseconds). It is at least 240,000% slower.
Is there a pattern in LINQ to SQL (or another ORM) that will help to overcome this performance hit?
What can force it to do the update in one go against the database?
Note: I am aware of calling stored procedures from LINQ. I don't consider that to be ORM, and it is not an option for me.
Manual Stored Procedure Script
DECLARE @UserID INT
DECLARE @StatusForFixed VARCHAR(50)
DECLARE @StatusForSavings VARCHAR(50)
SET @UserID = 1
SET @StatusForFixed = 'FrozenFA11'
SET @StatusForSavings = 'FrozenSB22'
UPDATE BankAccount
SET Status =
CASE
WHEN BankAccount.AccountType='Fixed' THEN @StatusForFixed
WHEN BankAccount.AccountType='Savings' THEN @StatusForSavings
END
WHERE AccountOwnerID=@UserID
LINQ Generated Code Sample
Note: This type of statements happen 10000 times
UPDATE [dbo].[BankAccount]
SET [Status] = @p3
WHERE [BankAccountID] = @p0
-- @p0: Input Int (Size = -1; Prec = 0; Scale = 0) [3585]
-- @p3: Input NChar (Size = 10; Prec = 0; Scale = 0) [FrozenSB]
CODE after applying ORM
public class BankAccountAppService
{
public RepositoryLayer.ILijosBankRepository AccountRepository { get; set; }
public void FreezeAllAccountsForUser(int userId)
{
IEnumerable<DBML_Project.BankAccount> accounts = AccountRepository.GetAllAccountsForUser(userId);
foreach (DBML_Project.BankAccount acc in accounts)
{
acc.Freeze();
}
AccountRepository.UpdateAccount();
}
}
public class LijosSimpleBankRepository : ILijosBankRepository
{
public System.Data.Linq.DataContext Context
{
get;
set;
}
public List<DBML_Project.BankAccount> GetAllAccountsForUser(int userID)
{
IQueryable<DBML_Project.BankAccount> queryResultEntities = Context.GetTable<DBML_Project.BankAccount>().Where(p => p.AccountOwnerID == userID);
return queryResultEntities.ToList();
}
public List<T> GetAllAccountsofType<T>() where T : DBML_Project.BankAccount
{
var query = from p in Context.GetTable<DBML_Project.BankAccount>().OfType<T>()
select p;
List<T> typeList = query.ToList();
return typeList;
}
public virtual void UpdateAccount()
{
Context.SubmitChanges();
}
}
namespace DBML_Project
{
public partial class BankAccount
{
//Define the domain behaviors
public virtual void Freeze()
{
//Do nothing
}
}
public class FixedBankAccount : BankAccount
{
public override void Freeze()
{
this.Status = "FrozenFA";
}
}
public class SavingsBankAccount : BankAccount
{
public override void Freeze()
{
this.Status = "FrozenSB";
}
}
}
REFERENCE
Pass List as XElement to be used as XML Datatype parameter
You are comparing two wildly different scenarios:
1: running a script locally on the SQL server, a single set-based UPDATE
2: fetching 10,000 records over the network, updating each, submitting each individually
You can improve 2 a bit by deferring the SubmitChanges() into one single batch of 10,000 rather than 10,000 batches of 1 (just: don't call SubmitChanges() until the end), but that still involves sending the details of 10,000 records in two directions, plus all the overheads (for example, SubmitChanges() might still choose to do that via 10,000 individual calls).
Basically, object-based tools are not intended for bulk updates against records. If the SP works, use the SP. Maybe call the SP via a data-context, just for convenience of it adding the method/parameters/etc.
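For example, the set-based statement can go through the existing DataContext without loading anything at all, via ExecuteCommand, which turns the {0}-style placeholders into parameters. A sketch only; the SQL mirrors the stored procedure script above, and something like this could live in LijosSimpleBankRepository:

```csharp
using System.Data.Linq;

public class BulkFreezeRepository
{
    public DataContext Context { get; set; }

    // Sketch: one set-based round-trip instead of 10,000 individual UPDATEs.
    public void FreezeAllAccountsForUser(int userId, string statusForFixed, string statusForSavings)
    {
        Context.ExecuteCommand(
            @"UPDATE BankAccount
              SET Status = CASE
                             WHEN AccountType = 'Fixed'   THEN {1}
                             WHEN AccountType = 'Savings' THEN {2}
                           END
              WHERE AccountOwnerID = {0}",
            userId, statusForFixed, statusForSavings);
    }
}
```

Called with "FrozenFA11" / "FrozenSB22" this is the same statement the manual script runs, just issued from the application.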
You can still execute your stored procedure / custom SQL script from your application. You can even map the procedure in your LINQ to SQL model so that you don't need to open a connection and create a command manually.
I'm not exactly sure whether LINQ to SQL always executes each modification command in a separate roundtrip to the database, but I guess it does (at least in most cases). EF always does. NHibernate has better support for such operations because it has command batching.
What you showed here is not a batch update (a single command updating a lot of records) - most ORMs will always update each record separately; that is how these tools work. If you load records and modify each of them in a loop, the relation to the original query used to load the records is lost. You now have 10,000 loaded records in your application which must be updated. A bulk update is not possible because you must move 10,000 changes from your application to the database.
If you want to do a bulk update you should either use direct SQL or implement some logic that issues the update from LINQ to SQL instead of loading records and updating them in the application. Check this article or simply search for bulk/batch updates in LINQ to SQL.
This is because LINQ to SQL first loads the data from the server and then updates each record individually, which involves transferring the data to the client and an update request for each record. In the SP case there is just one call to the SP, which executes the UPDATE query on the server directly; it does not fetch and update each record individually. It updates the records in bulk.
Another approach I tried is passing the object values to the stored procedure as an XML datatype. But a timeout exception occurs (after some 25 seconds) when the record count is more than 1000. Is it due to the huge XML payload?
Note: It takes around 5 seconds for 1000 records.
public virtual void UpdateBankAccountUsingParseXML_SP(System.Xml.Linq.XElement inputXML)
{
string connectionstring = "Data Source=.;Initial Catalog=LibraryReservationSystem;Integrated Security=True;Connect Timeout=600";
var myDataContext = new DBML_Project.MyDataClassesDataContext(connectionstring);
myDataContext.ParseXML(inputXML);
}
public void FreezeAllAccountsForUser(int userId)
{
List<DTOLayer.BankAccountDTOForStatus> bankAccountDTOList = new List<DTOLayer.BankAccountDTOForStatus>();
IEnumerable<DBML_Project.BankAccount> accounts = AccountRepository.GetAllAccountsForUser(userId);
foreach (DBML_Project.BankAccount acc in accounts)
{
string typeResult = Convert.ToString(acc.GetType());
string baseValue = Convert.ToString(typeof(DBML_Project.BankAccount));
if (String.Equals(typeResult, baseValue))
{
throw new Exception("Not correct derived type");
}
acc.Freeze();
DTOLayer.BankAccountDTOForStatus presentAccount = new DTOLayer.BankAccountDTOForStatus();
presentAccount.BankAccountID = acc.BankAccountID;
presentAccount.Status = acc.Status;
bankAccountDTOList.Add(presentAccount);
}
IEnumerable<System.Xml.Linq.XElement> el = bankAccountDTOList.Select(x =>
new System.Xml.Linq.XElement("BankAccountDTOForStatus",
new System.Xml.Linq.XElement("BankAccountID", x.BankAccountID),
new System.Xml.Linq.XElement("Status", x.Status)
));
System.Xml.Linq.XElement root = new System.Xml.Linq.XElement("root", el);
AccountRepository.UpdateBankAccountUsingParseXML_SP(root);
//AccountRepository.Update();
}
Stored Procedure
ALTER PROCEDURE [dbo].[ParseXML] (@InputXML xml)
AS
BEGIN
DECLARE @MyTable TABLE (RowNumber int, BankAccountID int, StatusVal varchar(max))
INSERT INTO @MyTable(RowNumber, BankAccountID, StatusVal)
SELECT ROW_NUMBER() OVER(ORDER BY c.value('BankAccountID[1]','int') ASC) AS Row,
c.value('BankAccountID[1]','int'),
c.value('Status[1]','varchar(32)')
FROM
@InputXML.nodes('//BankAccountDTOForStatus') T(c);
DECLARE @Count INT
SET @Count = 0
DECLARE @NumberOfRows INT
SELECT @NumberOfRows = COUNT(*) FROM @MyTable
WHILE @Count < @NumberOfRows
BEGIN
SET @Count = @Count + 1
DECLARE @BankAccID INT
DECLARE @Status VARCHAR(MAX)
SELECT @BankAccID = BankAccountID
FROM @MyTable
WHERE RowNumber = @Count
SELECT @Status = StatusVal
FROM @MyTable
WHERE RowNumber = @Count
UPDATE BankAccount
SET Status= @Status
WHERE BankAccountID = @BankAccID
END
END
GO
I'm not great at .NET but am learning (at least trying to! ;) ). However, this bit of code I'm working on has me baffled. What I want to do is insert a row into a SQL Server 2008 database table called Comment, then use the id of this inserted row to populate a second table (CommentOtherAuthor) with new rows of data. Basically, a comment can have multiple authors.
Here's the code:
public static Comment MakeNew(int parentNodeId, string firstname, string surname, string occupation, string affiliation, string title, string email, bool publishemail, bool competinginterests, string competingintereststext, string[] otherfirstname, string[] othersurname, string[] otheroccupation, string[] otheraffiliation, string[] otheremail, bool approved, bool spam, DateTime created, string commentText, int statusId)
{
var c = new Comment
{
ParentNodeId = parentNodeId,
FirstName = firstname,
Surname = surname,
Occupation = occupation,
Affiliation = affiliation,
Title = title,
Email = email,
PublishEmail = publishemail,
CompetingInterests = competinginterests,
CompetingInterestsText = competingintereststext,
OtherFirstName = otherfirstname,
OtherSurname = othersurname,
OtherOccupation = otheroccupation,
OtherAffiliation = otheraffiliation,
OtherEmail = otheremail,
Approved = approved,
Spam = spam,
Created = created,
CommenText = commentText,
StatusId = statusId
};
var sqlHelper = DataLayerHelper.CreateSqlHelper(umbraco.GlobalSettings.DbDSN);
c.Id = sqlHelper.ExecuteScalar<int>(
#"insert into Comment(mainid,nodeid,firstname,surname,occupation,affiliation,title,email,publishemail,competinginterests,competingintereststext,comment,approved,spam,created,statusid)
values(#mainid,#nodeid,#firstname,#surname,#occupation,#affiliation,#title,#email,#publishemail,#competinginterests,#competingintereststext,#comment,#approved,#spam,#created,#statusid)",
sqlHelper.CreateParameter("#mainid", -1),
sqlHelper.CreateParameter("#nodeid", c.ParentNodeId),
sqlHelper.CreateParameter("#firstname", c.FirstName),
sqlHelper.CreateParameter("#surname", c.Surname),
sqlHelper.CreateParameter("#occupation", c.Occupation),
sqlHelper.CreateParameter("#affiliation", c.Affiliation),
sqlHelper.CreateParameter("#title", c.Title),
sqlHelper.CreateParameter("#email", c.Email),
sqlHelper.CreateParameter("#publishemail", c.PublishEmail),
sqlHelper.CreateParameter("#competinginterests", c.CompetingInterests),
sqlHelper.CreateParameter("#competingintereststext", c.CompetingInterestsText),
sqlHelper.CreateParameter("#comment", c.CommenText),
sqlHelper.CreateParameter("#approved", c.Approved),
sqlHelper.CreateParameter("#spam", c.Spam),
sqlHelper.CreateParameter("#created", c.Created),
sqlHelper.CreateParameter("#statusid", c.StatusId));
c.OnCommentCreated(EventArgs.Empty);
for (int x = 0; x < otherfirstname.Length; x++)
{
sqlHelper.ExecuteScalar<int>(
#"insert into CommentOtherAuthor(firstname,surname,occupation,affiliation,email,commentid) values(#firstname,#surname,#occupation,#affiliation,#email,#commentid)",
sqlHelper.CreateParameter("#firstname", otherfirstname[x]),
sqlHelper.CreateParameter("#surname", othersurname[x]),
sqlHelper.CreateParameter("#occupation", otheroccupation[x]),
sqlHelper.CreateParameter("#affiliation", otheraffiliation[x]),
sqlHelper.CreateParameter("#email", otheremail[x]),
sqlHelper.CreateParameter("#commentid", 123)
);
}
if (c.Spam)
{
c.OnCommentSpam(EventArgs.Empty);
}
if (c.Approved)
{
c.OnCommentApproved(EventArgs.Empty);
}
return c;
}
The key line is:
sqlHelper.CreateParameter("#commentid", 123)
At the moment, I'm just hard-coding the id for the comment as 123, but really I need it to be the id of the record just inserted into the comment table.
I just don't really understand how to grab the id of the row just inserted into Comment without running a separate
SELECT TOP 1 id FROM Comment ORDER BY id DESC
which doesn't strike me as the best way to do this.
Can anyone suggest how to get this working?
Many thanks!
That SELECT TOP 1 id ... query most likely wouldn't give you the proper results anyway in a system under load. If you have 20 or 50 clients inserting comments at the same time, by the time you query the table again, chances are very high you would be getting someone else's id ...
The best way I see to do this would be:
add an OUTPUT clause to your original insert and capture the newly inserted ID
use that ID for your second insert
Something along the lines of:
c.Id = sqlHelper.ExecuteScalar<int>(
#"insert into Comment(......)
output Inserted.ID
values(.............)",
Using this approach, your c.Id value should now be the newly inserted ID - use that in your next insert statement! (note: right now, you're probably always getting a 1 back - the number of rows affected by your statement ...)
This approach assumes your table Comment has a column of type INT IDENTITY that will be automatically set when you insert a new row into it.
for (int x = 0; x < otherfirstname.Length; x++)
{
sqlHelper.ExecuteScalar<int>(
#"insert into CommentOtherAuthor(.....) values(.....)",
sqlHelper.CreateParameter("#firstname", otherfirstname[x]),
sqlHelper.CreateParameter("#surname", othersurname[x]),
sqlHelper.CreateParameter("#occupation", otheroccupation[x]),
sqlHelper.CreateParameter("#affiliation", otheraffiliation[x]),
sqlHelper.CreateParameter("#email", otheremail[x]),
sqlHelper.CreateParameter("#commentid", c.Id) <<=== use that value you got back!
);
}
Assuming you are using Microsoft SQL Server, you could design your table Comment so the column Id has the property Identity set to true. This way the database will generate and auto-increment the id each time a row is inserted into the table.
You would then have to add the following clause to your INSERT statement:
OUTPUT INSERTED.Id
so that this Id is returned to your C# code when the statement is executed.
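For illustration only, here is what that looks like on a stripped-down, hypothetical version of the table (the real Comment table has many more columns):
-- simplified sketch: id is an IDENTITY column that SQL Server fills in
CREATE TABLE Comment (id int IDENTITY(1,1) PRIMARY KEY, comment varchar(max));
-- the OUTPUT clause returns the generated id as a one-row result set,
-- which is what an ExecuteScalar-style call picks up as its return value
INSERT INTO Comment (comment)
OUTPUT INSERTED.id
VALUES ('test');
-- SELECT SCOPE_IDENTITY(); is an alternative way to read the last identity
-- value generated in the current scope
Either variant avoids the race-prone SELECT TOP 1 id ... ORDER BY id DESC approach.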
I have some C# code that creates several Oracle Temporary Tables with the "ON COMMIT DELETE ROWS" option inside of a transaction.
Inside of the transaction, I will insert a bunch of rows into various temp tables. In some scenarios I need to TRUNCATE a particular temp table so that I can start fresh with that table, but leave the other temp tables alone.
I found that Oracle must be doing an implicit COMMIT when you perform the TRUNCATE, because not only was the particular temp table truncated, but all of my temp tables were cleared.
OK, I've read elsewhere that TRUNCATE is considered a DDL command and that is why the commit happens, which results in my "ON COMMIT DELETE ROWS" temp tables being cleared.
If that is true, wouldn't creating a new temp table also be a DDL command that trips the same commit and clears all of the other temp tables? If so, I haven't seen that behavior. I have created new temp tables in my code and found that the previously created temp tables still have their rows intact after the new temp table has been created.
Here's some C# code that demonstrates the problem (helper routines not included here):
private void RunTest()
{
if (_oc == null)
_oc = new OracleConnection("data source=myserver;user id=myid;password=mypassword");
_oc.Open();
_tran = _oc.BeginTransaction();
string tt1 = "DMTEST1";
AddTempTable(tt1, false);
int TempTableRowCount0 = GetTempTableRowCount(tt1);
AddRows(tt1, 5);
int TempTableRowCount10 = GetTempTableRowCount(tt1);
string tt2 = "DMTEST2";
AddTempTable(tt2, false);
int TempTableRowCount12 = GetTempTableRowCount(tt1); // This will have the same value as TempTableRowCount10
AddRows(tt2, 6);
int TempTableRowCount13 = GetTempTableRowCount(tt2); // This will be 6, the rows just added to tt2
string tt3 = "DMTEST3";
AddTempTable(tt3, true); // The TRUE argument which does a TRUNCATE against the DMTEST3 table is the problem
int TempTableRowCount14 = GetTempTableRowCount(tt1); // This will return 0, it should be = TempTableRowCount10
int TempTableRowCount15 = GetTempTableRowCount(tt2); // This will return 0, it should be = TempTableRowCount13
_tran.Commit();
_tran = null;
int TempTableRowCount20 = GetTempTableRowCount(tt1); // This should be 0 because the transaction was committed
int TempTableRowCount21 = GetTempTableRowCount(tt2); // and the temp tables are defined as "ON COMMIT DELETE ROWS"
}
private void AddTempTable(string TableName, bool Truncate)
{
IDbCommand ocmd = new OracleCommand();
ocmd.Connection = _oc;
if (!TableExists(TableName))
{
ocmd.CommandText = string.Format("CREATE GLOBAL TEMPORARY TABLE {0} ({1}) ON COMMIT DELETE ROWS", TableName, "FIELD1 Float");
int rc = ocmd.ExecuteNonQuery();
}
if (Truncate)
{
ocmd.CommandText = "TRUNCATE TABLE " + TableName;
int rc = ocmd.ExecuteNonQuery();
}
}
In Oracle, you don't create global temporary tables at runtime. You create them once when you deploy the system. Each session gets its own "copy" of the temp table automatically.
Also, if you can avoid the TRUNCATE I'd recommend it - i.e. if you can rely on the ON COMMIT DELETE ROWS which causes the data to disappear when you commit, then that's the most efficient way.
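To make that concrete with the table names from the question, a minimal sketch: create the global temporary tables once as part of the schema deployment, and inside the transaction use DML rather than DDL to clear a single table:
-- one-time deployment DDL (not executed per session or per transaction):
CREATE GLOBAL TEMPORARY TABLE DMTEST1 (FIELD1 FLOAT) ON COMMIT DELETE ROWS;
CREATE GLOBAL TEMPORARY TABLE DMTEST2 (FIELD1 FLOAT) ON COMMIT DELETE ROWS;
CREATE GLOBAL TEMPORARY TABLE DMTEST3 (FIELD1 FLOAT) ON COMMIT DELETE ROWS;
-- at runtime, to start fresh with just one table mid-transaction:
DELETE FROM DMTEST3; -- DML, so no implicit commit; the other temp tables keep their rows
DELETE is slower than TRUNCATE for large volumes, but it stays inside the transaction, which is what the ON COMMIT DELETE ROWS tables require here.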
To answer your other question ("CREATE GLOBAL TEMPORARY doesn't seem to commit") - I tried it myself, and it seems to me that CREATE GLOBAL TEMPORARY does indeed commit. My test case:
create global temporary table test1 (n number) on commit delete rows;
insert into test1 values (1);
--Expected: 1
select count(*) from test1;
commit;
--Expected: 0
select count(*) from test1;
insert into test1 values (2);
--Expected: 1
select count(*) from test1;
create global temporary table test2 (n number) on commit delete rows;
--Expected: 0
select count(*) from test1;
commit;
--Expected: 0
select count(*) from test1;
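The same style of test shows the TRUNCATE behaviour the question describes (a sketch, reusing test1 from above):
insert into test1 values (3);
--Expected: 1
select count(*) from test1;
truncate table test1;
--Expected: 0 (TRUNCATE is DDL, so its implicit commit also clears every other ON COMMIT DELETE ROWS table)
select count(*) from test1;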