I am building an application with an online database. The database is not on my computer; it is on a myasp.net server.
I've got two questions:
This application collects data, and after it has collected everything, the data needs to be sent to this online database. I am open to any solution, including frameworks etc., but I must say that Entity Framework is very slow in my case. My data collection application creates a file with insert values, e.g.:
(4880775 , 18196 , 9 , 1),
(4880775 , 9026 , 8.49 , 2),
(4880775 , 4009 , 9.99 , 3),
This file could (in the future) have at least 10 million rows. I have done two tests. The first one inserts 990 rows ten times using a pure SQL query (in VS 2013: right-click the database -> New Query), something like this:
declare @i int
set @i = 0
while @i < 10
begin
    set @i += 1
    INSERT INTO daneProduktu VALUES
    (4880775 , 18196 , 9 , 1),
    (4880775 , 9026 , 8.49 , 2),
    (4880775 , 4009 , 9.99 , 3),
    ...
    ...
end
The second option was doing the same thing from C# code. I used Entity Framework:
using (var context = new sklepyEntities())
{
    context.Database.ExecuteSqlCommand(sb.ToString());
}
and a SqlCommand object:
using (SqlCommand cmd = new SqlCommand(sb.ToString(), connection))
{
    cmd.ExecuteNonQuery();
}
The full C# code for the SqlCommand version is:
Stopwatch st = new Stopwatch();
st.Start();
for (int i = 0; i < 10; i++)
{
    sb.Clear();
    sb.Append("INSERT INTO daneProduktu VALUES ");
    r = new StreamReader(sciezka);
    while ((line = r.ReadLine()) != null)
    {
        licznik++;
        sb.Append(Environment.NewLine);
        sb.Append(line);
    }
    sb.Length--;
    sb.Append(";");
    using (SqlCommand cmd = new SqlCommand(sb.ToString(), connection))
    {
        cmd.ExecuteNonQuery();
    }
}
sb.Length--;
sb.Append(";");
using (SqlCommand cmd = new SqlCommand(sb.ToString(), connection))
{
    cmd.ExecuteNonQuery();
}
st.Stop();
Both work, but they are very slow.
To compare timings:
Pure SQL query - ~3s
C# SQL using SqlCommand - ~13s
I prepared a special file with 990 insert values and used the same values in both cases.
Why is the C# code option so slow? Is there any way to make it faster? Of course, pure inserts are not the only option; I can do anything else. I can prepare an XML file, a CSV or anything else if that would be faster.
Every time before I do the inserts from the first point, I need to clear the table. I read that shrinking is not good, so I chose to drop and recreate the table. After this, space usage does not go down, but when I fill the table with inserts again, the space used stays the same. Also, I will not need to roll back anything in this table. Is this a good way? Or would TRUNCATE TABLE be better?
What I've heard is that INSERT INTO ... VALUES for many rows is not always the fastest. Have you tried:
INSERT INTO yourTable
SELECT 'Value1'
UNION ALL
SELECT 'Value2'
UNION ALL
SELECT 'Value3'
etc...
I've tried a few ways to do this insert, and SqlBulkCopy was the fastest. My code is:
using (var sqlBulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, transaction))
{
    sqlBulkCopy.BulkCopyTimeout = 600;
    sqlBulkCopy.DestinationTableName = "daneProduktu";
    sqlBulkCopy.ColumnMappings.Add("numerProduktu", "numerProduktu");
    sqlBulkCopy.ColumnMappings.Add("numerSklepu", "numerSklepu");
    sqlBulkCopy.ColumnMappings.Add("cena", "cena");
    sqlBulkCopy.ColumnMappings.Add("pozycjaCeneo", "pozycjaCeneo");
    sqlBulkCopy.WriteToServer(dt);
}
I'm also using a SQL transaction here.
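The `dt` passed to `WriteToServer` above is not shown. A minimal sketch of filling it from the same "(a , b , c , d)," lines the collector writes, assuming the column types are int, int, decimal, int (the question does not state them) and that the file always uses `.` as the decimal separator. `DaneProduktuLoader` is a made-up name:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

public static class DaneProduktuLoader
{
    // Column names match the ColumnMappings above; the types are assumptions.
    public static DataTable CreateTable()
    {
        var dt = new DataTable("daneProduktu");
        dt.Columns.Add("numerProduktu", typeof(int));
        dt.Columns.Add("numerSklepu", typeof(int));
        dt.Columns.Add("cena", typeof(decimal));
        dt.Columns.Add("pozycjaCeneo", typeof(int));
        return dt;
    }

    // Parses one line such as "(4880775 , 9026 , 8.49 , 2)," into a new row.
    public static void AddLine(DataTable dt, string line)
    {
        string trimmed = line.Trim().TrimEnd(',', ';').Trim();
        if (trimmed.Length == 0) return; // skip blank lines
        if (!trimmed.StartsWith("(") || !trimmed.EndsWith(")"))
            throw new FormatException("Unexpected line: " + line);

        string[] parts = trimmed.Substring(1, trimmed.Length - 2).Split(',');
        if (parts.Length != 4)
            throw new FormatException("Expected 4 values: " + line);

        // InvariantCulture so "8.49" parses the same on a Polish-locale machine.
        dt.Rows.Add(
            int.Parse(parts[0].Trim(), CultureInfo.InvariantCulture),
            int.Parse(parts[1].Trim(), CultureInfo.InvariantCulture),
            decimal.Parse(parts[2].Trim(), CultureInfo.InvariantCulture),
            int.Parse(parts[3].Trim(), CultureInfo.InvariantCulture));
    }

    public static DataTable Load(IEnumerable<string> lines)
    {
        DataTable dt = CreateTable();
        foreach (string line in lines)
            AddLine(dt, line);
        return dt;
    }
}
```

For example, `DataTable dt = DaneProduktuLoader.Load(File.ReadLines(sciezka));`. For the 10 million rows mentioned in the question, a DataTable keeps everything in memory; `WriteToServer` also has an `IDataReader` overload that can stream rows instead.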
Some of you could say "Use BatchSize". I tried, but this option made the insert slower. I must say that I have a slow upload connection. I made some time measurements:
BatchSize   Time (ms)
0           278768
500         1207129
1000        817399
1500        629146
2000        531632
2500        480200
3000        451510
3500        446899
4000        407875
4500        405808
5000        387078
5500        360508
10000       327231
20000       305282
30000       305936
40000       304494
50000       303541
60000       303723
80000       310058
100000      297835
As you can see, a BatchSize of 0 is the fastest. (0 is the default and means that each WriteToServer call is sent as a single batch.)
To answer my question about INSERT INTO ... VALUES(): it was probably all about sending those inserts over the internet connection. In the first case I sent ONE command, and the loop ran on the SQL server. In the second case I probably sent 10 commands to the server.
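If you stay with plain INSERT statements, the same effect can be had from C# by putting all the statements into one command text, so there is one round trip instead of ten. A rough sketch (`InsertBatchBuilder` is a made-up name); it splits the rows into INSERT statements of at most 1000 rows each, because SQL Server rejects a single VALUES list with more than 1000 rows:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class InsertBatchBuilder
{
    // SQL Server's limit on row value expressions in one INSERT ... VALUES.
    public const int MaxRowsPerInsert = 1000;

    // Builds ONE command text holding as many INSERT statements as needed,
    // so everything reaches the server in a single round trip.
    // valueTuples are lines like "(4880775 , 18196 , 9 , 1)" without a trailing comma.
    public static string Build(string table, IReadOnlyList<string> valueTuples)
    {
        var sb = new StringBuilder();
        for (int start = 0; start < valueTuples.Count; start += MaxRowsPerInsert)
        {
            int end = Math.Min(start + MaxRowsPerInsert, valueTuples.Count);
            sb.Append("INSERT INTO ").Append(table).Append(" VALUES").AppendLine();
            for (int i = start; i < end; i++)
            {
                sb.Append(valueTuples[i]);
                // Comma between rows, semicolon after the last row of this statement.
                sb.Append(i < end - 1 ? "," : ";").AppendLine();
            }
        }
        return sb.ToString();
    }
}
```

The result would go into a single `SqlCommand` and one `ExecuteNonQuery()` call. It builds literal SQL text, so it only suits trusted input such as this generated file; for millions of rows, SqlBulkCopy is the better fit.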
I have a piece of C# code which updates two specific columns for ~1000x20 records in a database on localhost. As far as I know (though I am really far from being a database expert), it should not take long, but it takes more than 5 minutes.
I tried SQL transactions, with no luck. SqlBulkCopy seems a bit of overkill, since it's a large table with dozens of columns and I only have to update one or two columns for a set of records, so I would like to keep it simple. Is there a better approach to improve efficiency?
The code itself:
public static bool UpdatePlayers(List<Match> matches)
{
    using (var connection = new SqlConnection(Database.myConnectionString))
    {
        connection.Open();
        SqlCommand cmd = connection.CreateCommand();
        foreach (Match m in matches)
        {
            cmd.CommandText = "";
            foreach (Player p in m.Players)
            {
                // Some player specific calculation, which takes almost no time.
                p.Morale = SomeSpecificCalculationWhichMilisecond();
                p.Condition = SomeSpecificCalculationWhichMilisecond();
                cmd.CommandText += "UPDATE [Players] SET [Morale] = @morale, [Condition] = @condition WHERE [ID] = @id;";
                cmd.Parameters.AddWithValue("@morale", p.Morale);
                cmd.Parameters.AddWithValue("@condition", p.Condition);
                cmd.Parameters.AddWithValue("@id", p.ID);
            }
            cmd.ExecuteNonQuery();
        }
    }
    return true;
}
Updating 20,000 records one at a time is a slow process, so taking over 5 minutes is to be expected.
From your query, I would suggest putting the data into a temp table, then joining the temp table in the update. This way it only has to scan the table being updated once, and it updates all values in a single statement.
Note: it could still take a while to do the update if you have indexes on the fields you are updating and/or there is a large amount of data in the table.
Example update query:
UPDATE P
SET [Morale] = TT.[Morale], [Condition] = TT.[Condition]
FROM [Players] AS P
INNER JOIN #TempTable AS TT ON TT.[ID] = P.[ID];
Populating the temp table
How to get the data into the temp table is up to you. I suspect you could use SqlBulkCopy but you might have to put it into an actual table, then delete the table once you are done.
If possible, I recommend putting a Primary Key on the ID column in the temp table. This may speed up the update process by making it faster to find the related ID in the temp table.
Minor improvements:
use a StringBuilder for the command text
ensure your parameter names are actually unique
clear your parameters before the next use
depending on how many players are in each match, batch the commands for N matches together rather than sending one match at a time.
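The minor improvements above can be sketched roughly like this. `PlayerRow` is a hypothetical stand-in for the question's `Player` class, and the int types of `Morale` and `Condition` are assumptions. It builds one command text per batch with a unique parameter name per player and collects the matching name/value pairs:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the question's Player (only what the UPDATE needs).
public sealed class PlayerRow
{
    public int ID;
    public int Morale;
    public int Condition;

    public PlayerRow(int id, int morale, int condition)
    {
        ID = id; Morale = morale; Condition = condition;
    }
}

public static class PlayerUpdateBatch
{
    // Returns one batched command text using @morale0, @condition0, @id0, @morale1, ...
    // and fills `parameters` with the matching name/value pairs.
    public static string Build(IReadOnlyList<PlayerRow> players, List<KeyValuePair<string, object>> parameters)
    {
        parameters.Clear(); // every batch starts with a fresh parameter list
        var sb = new StringBuilder();
        for (int i = 0; i < players.Count; i++)
        {
            sb.Append("UPDATE [Players] SET [Morale] = @morale").Append(i)
              .Append(", [Condition] = @condition").Append(i)
              .Append(" WHERE [ID] = @id").Append(i).Append(';').AppendLine();
            parameters.Add(new KeyValuePair<string, object>("@morale" + i, players[i].Morale));
            parameters.Add(new KeyValuePair<string, object>("@condition" + i, players[i].Condition));
            parameters.Add(new KeyValuePair<string, object>("@id" + i, players[i].ID));
        }
        return sb.ToString();
    }
}
```

Usage would be along the lines of `cmd.CommandText = PlayerUpdateBatch.Build(batch, ps); cmd.Parameters.Clear(); foreach (var kv in ps) cmd.Parameters.AddWithValue(kv.Key, kv.Value);`. With three parameters per player, keep each batch well below SQL Server's limit of 2100 parameters per request (a few hundred players).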
Bigger improvement:
use a table-valued parameter and a MERGE SQL statement, which should look something like this (untested):
CREATE TYPE [MoraleUpdate] AS TABLE (
    [Id] ...,
    [Condition] ...,
    [Morale] ...
)
GO
MERGE [dbo].[Players] AS [Target]
USING @Updates AS [Source]
ON [Target].[Id] = [Source].[Id]
WHEN MATCHED THEN
    UPDATE SET [Morale] = [Source].[Morale],
               [Condition] = [Source].[Condition];
DataTable dt = new DataTable();
// Columns are matched to the table type by position, so keep the CREATE TYPE order (Id, Condition, Morale).
dt.Columns.Add("Id", typeof(...));
dt.Columns.Add("Condition", typeof(...));
dt.Columns.Add("Morale", typeof(...));
foreach(...){
    dt.Rows.Add(p.Id, p.Condition, p.Morale);
}
SqlParameter sqlParam = cmd.Parameters.AddWithValue("@Updates", dt);
sqlParam.SqlDbType = SqlDbType.Structured;
sqlParam.TypeName = "dbo.[MoraleUpdate]";
cmd.ExecuteNonQuery();
You could also implement a DbDataReader to stream the values to the server while you are calculating them.
I am working on an application which gets data from two different databases (i.e. Database1.Table1 and Database2.Table2), then compares these two tables (the comparison is done only on the primary key, i.e. ID) and inserts rows from Database1.Table1 into Database2.Table2 if they do not exist in Database2.Table2.
The problem is that there is a huge amount of data (about 0.8 million rows in both tables) and the comparison takes a lot of time. Is there any way to do this faster?
NOTE: I am using DataTables in C# to compare these tables. The code is given below:
DataTable Database1_Table1;// = method to get all data from Database1.Table1
DataTable Database2_Table2;// = method to get all data from Database2.Table2
foreach (DataRow row in Database1_Table1.Rows) //(var GoodClass in Staging_distinct2)
{
    if (Database2_Table2.Select("ID=" + row["ID"]).Count() < 1)
    {
        sqlComm = new SqlCommand("Delete from Database1.Table1 where Id=" + row["ID"], conn);
        sqlComm.ExecuteNonQuery();
        sqlComm = new SqlCommand("INSERT INTO Database2.Table2 Values (@ID,@EmpName,@Email,@UserName)", conn);
        sqlComm.Parameters.Add("@ID", SqlDbType.Int).Value = row["ID"];
        sqlComm.Parameters.Add("@EmpName", SqlDbType.VarChar).Value = row["EmpName"];
        sqlComm.Parameters.Add("@Email", SqlDbType.VarChar).Value = row["Email"];
        sqlComm.Parameters.Add("@UserName", SqlDbType.VarChar).Value = row["UserName"];
        sqlComm.ExecuteNonQuery();
        totalCount++;
        added++;
    }
    else
    {
        deleted++;
        totalCount++;
    }
}
Submit this SQL from your application to the database:
INSERT INTO Database1..Table1 ([Key], Column1, Column2)
SELECT T2.[Key], T2.Column1, T2.Column2
FROM Database2..Table2 AS T2
WHERE NOT EXISTS (
    SELECT * FROM Database1..Table1 AS T1
    WHERE T1.[Key] = T2.[Key]
)
It will copy all rows that don't match on column Key from Database2..Table2 to Database1..Table1 (swap the database names to copy in the direction described in the question).
It will do it on the database server. No needless round trip of data. No RBAR (Row By Agonising Row). The only downside is you can't get a progress bar - do it asynchronously.
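If the comparison has to stay in C#, most of the time in the question's loop probably goes into calling `DataTable.Select` once per row, which can scan the second table every time. A sketch that builds a `HashSet` of the target IDs once and then checks each source row in constant time, assuming `ID` is an int column (as the `SqlDbType.Int` parameter suggests). `TableDiff.MissingRows` is a made-up helper:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class TableDiff
{
    // Returns the rows of `source` whose ID does not appear in `target`.
    // One pass over `target` to build the set, one pass over `source` to probe it.
    public static List<DataRow> MissingRows(DataTable source, DataTable target, string idColumn = "ID")
    {
        var targetIds = new HashSet<int>();
        foreach (DataRow row in target.Rows)
            targetIds.Add(Convert.ToInt32(row[idColumn]));

        var missing = new List<DataRow>();
        foreach (DataRow row in source.Rows)
        {
            if (!targetIds.Contains(Convert.ToInt32(row[idColumn])))
                missing.Add(row);
        }
        return missing;
    }
}
```

The rows it returns could then be copied into one DataTable and written with SqlBulkCopy, instead of issuing one INSERT per row.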
Bulk update/insert is the fastest way (SqlBulkCopy).
http://www.jarloo.com/c-bulk-upsert-to-sql-server-tutorial/
The best way to handle this is to bulk insert into a temp table and then issue a MERGE statement from that temp table into your production table. I do this with millions of rows a day without issue. I have an example of the technique on my blog: C# Sql Server Bulk Upsert
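A minimal sketch of the MERGE half of that technique, assuming the staging table has already been filled (for example with SqlBulkCopy), has the same key and column names as the target, and that the key is not an IDENTITY column. `UpsertSql.BuildMerge` is a hypothetical helper, and the names passed to it must be trusted identifiers, not user input:

```csharp
using System;
using System.Linq;

public static class UpsertSql
{
    // Builds: update rows whose key exists in the target, insert the rest.
    public static string BuildMerge(string target, string staging, string key, params string[] columns)
    {
        string set = string.Join(", ", columns.Select(c => $"T.[{c}] = S.[{c}]"));
        string[] all = new[] { key }.Concat(columns).ToArray();
        string insertCols = string.Join(", ", all.Select(c => $"[{c}]"));
        string insertVals = string.Join(", ", all.Select(c => $"S.[{c}]"));

        return
            $"MERGE {target} AS T\n" +
            $"USING {staging} AS S ON T.[{key}] = S.[{key}]\n" +
            $"WHEN MATCHED THEN UPDATE SET {set}\n" +
            // MERGE must be terminated with a semicolon.
            $"WHEN NOT MATCHED BY TARGET THEN INSERT ({insertCols}) VALUES ({insertVals});";
    }
}
```

The generated text would be executed on the same open connection that created and filled the temp table, since a `#temp` table is only visible to that session.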
Is it possible to use SqlBulkcopy with Sql Compact Edition e.g. (*.sdf) files?
I know it works with SQL Server 200 Up, but I wanted to check CE compatibility.
If it doesn't, does anyone know the fastest way of getting a CSV-type file into SQL Server CE without using DataSets (puke here)?
BULKCOPY is not supported in SQL CE. Here is the fastest way if you have a huge number of rows in your table; insert is too slow!
using (SqlCeConnection cn = new SqlCeConnection(yourConnectionString))
{
    if (cn.State == ConnectionState.Closed)
        cn.Open();
    using (SqlCeCommand cmd = new SqlCeCommand())
    {
        cmd.Connection = cn;
        cmd.CommandText = "YourTableName";
        cmd.CommandType = CommandType.TableDirect;
        using (SqlCeResultSet rs = cmd.ExecuteResultSet(ResultSetOptions.Updatable | ResultSetOptions.Scrollable))
        {
            SqlCeUpdatableRecord record = rs.CreateRecord();
            using (var sr = new System.IO.StreamReader(yourTextFilePath))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    int index = 0;
                    string[] values = line.Split('\t');
                    //write these lines as many times as the number of columns in the table...
                    record.SetValue(index, values[index++] == "NULL" ? null : values[index - 1]);
                    record.SetValue(index, values[index++] == "NULL" ? null : values[index - 1]);
                    record.SetValue(index, values[index++] == "NULL" ? null : values[index - 1]);
                    rs.Insert(record);
                }
            }
        }
    }
}
Benchmark: table with 34370 rows
with inserts: 38 rows written per second
this way: 260 rows written per second
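The three repeated `record.SetValue` lines can be replaced with a loop over the fields. A sketch of the parsing part, which maps the literal text NULL to `DBNull.Value` (the usual ADO.NET representation of a database NULL) instead of `null`. `TabLineParser` is a made-up name, and it assumes each line has exactly one field per table column:

```csharp
using System;

public static class TabLineParser
{
    // Splits one tab-separated line; "NULL" becomes DBNull.Value.
    // Intended use inside the read loop:
    //   object[] fields = TabLineParser.Parse(line);
    //   for (int i = 0; i < fields.Length; i++) record.SetValue(i, fields[i]);
    //   rs.Insert(record);
    public static object[] Parse(string line)
    {
        string[] values = line.Split('\t');
        var fields = new object[values.Length];
        for (int i = 0; i < values.Length; i++)
            fields[i] = values[i] == "NULL" ? (object)DBNull.Value : values[i];
        return fields;
    }
}
```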
I have a SqlCeBulkCopy library here: http://sqlcebulkcopy.codeplex.com - it even supports IEnumerable.
It is possible to speed this kind of operation up a lot.
To make this operation useful (I mean fast and reasonably safe), you can use the CE DataAdapter.
For example, without caring about keys, the steps listed below can help you:
Make sure that the source and target tables have the same field structure;
Clone a virtual DataTable from a DataTable of the source database (your select);
Create a CE command with the table name as CommandText (TableDirect as CommandType);
Create a CE DataAdapter from the CE command;
Create a CE CommandBuilder from the CE DataAdapter;
Pass the Insert command from the CE CommandBuilder to the CE DataAdapter;
Copy "n" rows per batch from your source DataTable to the target DataTable (the clone), doing something like this:
'... previous code
For Each currentRow As DataRow In sourceTable.Rows
    'you can do RaiseEvent Processing(currentRow, totalRows) here with DoEvents
    targetTable.ImportRow(currentRow)
    targetTable.Rows(targetTable.Rows.Count - 1).SetAdded()
    If targetTable.Rows.Count >= 100 Then
        '...Send this batch with the CE DataAdapter's Update method
        da.Update(targetTable)
        '...then clone the targetTable again, erasing all previous rows.
        '...Do a clone again, don't just do a "Clear" on the Rows collection:
        '...if you have an Autoincrement column it will break all Foreign Keys.
        targetTable = sourceTable.Clone()
    End If
Next
'...send the last, partial batch
If targetTable.Rows.Count > 0 Then da.Update(targetTable)
'... next code
This way you can write many rows in little time.
I have some applications using this method, and the average rate is about 1500 rows per second in a table with 5 NTEXT fields (slow) and 800000 rows.
Of course, it all depends on your table's structure. IMAGE and NTEXT are both slow data types.
P.S.: As I said, this method doesn't care much about keys, so be careful.
No, I don't think that SqlBulkCopy is supported (see MSDN). Perhaps throw the data in as XML and take it apart at the server? SQL/XML is pretty good in 2005/2008.
You might also want to look at table-valued parameters, but I doubt that CE supports these.
The company that I work for has large databases, millions of records in a single table. I have written a C# program that migrates tables between remote servers.
I first create all the tables using SMO without copying data and then the data insertion is done after all the tables have been created.
During the record insertion, since there are so many records, the console window remains blank until all the rows have been inserted. Due to the sheer volume of data this takes a long time.
What I want now is a way to print "n rows updated", like the MSSQL Import and Export Data wizard does.
The insert part is just a simple INSERT INTO ... SELECT * query.
It sounds like you might be using SqlCommands; if so, here is a sample:
using (SqlConnection connection = new SqlConnection(Connection.ConnectionString))
{
    using (SqlCommand command = new SqlCommand("insert into OldCustomers select * from customers", connection))
    {
        connection.Open();
        var numRows = command.ExecuteNonQuery();
        Console.WriteLine("Affected Rows: {0}", numRows);
    }
}
You definitely need to look at the OUTPUT clause. There are useful examples on MSDN.
using (SqlConnection conn = new SqlConnection(connectionStr))
{
    // Verbatim string (@"...") so the SQL can span several lines.
    var sqlCmd = @"
CREATE TABLE #tmp (
    InsertedId BIGINT
);
INSERT INTO TestTable
OUTPUT Inserted.Id INTO #tmp
VALUES ....
SELECT COUNT(*) FROM #tmp";
    using (SqlCommand cmd = new SqlCommand(sqlCmd, conn))
    {
        conn.Open();
        // ExecuteScalar returns the first column of the first row, i.e. the COUNT(*) result.
        var numRows = (int)cmd.ExecuteScalar();
        Console.WriteLine("Affected Rows: {0}", numRows);
    }
}
I also suggest using a stored procedure for such purposes.
I am using C# to import a CSV with 6-8 million rows.
My table looks like this:
CREATE TABLE [Data] ([ID] VARCHAR(100) NULL,[Raw] VARCHAR(200) NULL)
CREATE INDEX IDLookup ON Data(ID ASC)
I am using System.Data.SQLite to do the import.
Currently, importing 6 million rows takes 2 min 55 s on a Windows 7 32-bit machine with a Core 2 Duo 2.8 GHz and 4 GB RAM. That's not too bad, but I was just wondering if anyone could see a way of importing it quicker.
Here is my code:
public class Data
{
    public string IDData { get; set; }
    public string RawData { get; set; }
}
string connectionString = @"Data Source=" + Path.GetFullPath(AppDomain.CurrentDomain.BaseDirectory + "\\dbimport");
System.Data.SQLite.SQLiteConnection conn = new System.Data.SQLite.SQLiteConnection(connectionString);
conn.Open();
//Dropping and recreating the table seems to be the quickest way to get old data removed
System.Data.SQLite.SQLiteCommand command = new System.Data.SQLite.SQLiteCommand(conn);
command.CommandText = "DROP TABLE Data";
command.ExecuteNonQuery();
command.CommandText = @"CREATE TABLE [Data] ([ID] VARCHAR(100) NULL,[Raw] VARCHAR(200) NULL)";
command.ExecuteNonQuery();
command.CommandText = "CREATE INDEX IDLookup ON Data(ID ASC)";
command.ExecuteNonQuery();
string insertText = "INSERT INTO Data (ID,RAW) VALUES(@P0,@P1)";
SQLiteTransaction trans = conn.BeginTransaction();
command.Transaction = trans;
command.CommandText = insertText;
Stopwatch sw = new Stopwatch();
sw.Start();
using (CsvReader csv = new CsvReader(new StreamReader(@"C:\Data.txt"), false))
{
    var f = csv.Select(x => new Data() { IDData = x[27], RawData = String.Join(",", x.Take(24)) });
    foreach (var item in f)
    {
        command.Parameters.AddWithValue("@P0", item.IDData);
        command.Parameters.AddWithValue("@P1", item.RawData);
        command.ExecuteNonQuery();
    }
}
trans.Commit();
sw.Stop();
Debug.WriteLine(sw.Elapsed.Minutes + "Min(s) " + sw.Elapsed.Seconds + "Sec(s)");
conn.Close();
This is quite fast for 6 million records.
It seems that you are doing it the right way. Some time ago I read on sqlite.org that when inserting records you need to put the inserts inside a transaction; if you don't, your inserts will be limited to only about 60 per second! That is because each insert is treated as a separate transaction, and each transaction must wait for the disk to rotate fully. You can read the full explanation here:
http://www.sqlite.org/faq.html#q19
Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. Transaction speed is limited by the rotational speed of your disk drive. A transaction normally requires two complete rotations of the disk platter, which on a 7200RPM disk drive limits you to about 60 transactions per second.
Comparing your time with the average stated above: at 50,000 inserts per second, 6 million rows should take 2 min 00 s, which is only a little faster than your time.
Transaction speed is limited by disk drive speed because (by default) SQLite actually waits until the data really is safely stored on the disk surface before the transaction is complete. That way, if you suddenly lose power or if your OS crashes, your data is still safe. For details, read about atomic commit in SQLite.
By default, each INSERT statement is its own transaction. But if you surround multiple INSERT statements with BEGIN...COMMIT then all the inserts are grouped into a single transaction. The time needed to commit the transaction is amortized over all the enclosed insert statements and so the time per insert statement is greatly reduced.
The next paragraph has a hint you could try to speed up the inserts:
Another option is to run PRAGMA synchronous=OFF. This command will cause SQLite to not wait on data to reach the disk surface, which will make write operations appear to be much faster. But if you lose power in the middle of a transaction, your database file might go corrupt.
I always thought that SQLite was designed for "simple things"; 6 million records seems to me like a job for a real database server like MySQL.
Just for your information: counting records in a SQLite table with so many records can take a long time. Instead of SELECT COUNT(*), you can use SELECT MAX(rowid), which is very fast but not accurate if you have deleted records from that table.
EDIT.
As Mike Woodhouse stated, creating the index after inserting the records should speed up the whole thing. That is common advice for other databases, but I can't say for sure how it works in SQLite.
One thing you might try is to create the index after the data has been inserted - typically it's much faster for databases to build indexes in a single operation than to update it after each insert (or transaction).
I can't say that it'll definitely work with SQLite, but since it only needs two lines to move it's worth trying.
I'm also wondering if a 6 million row transaction might be going too far - could you change the code to try different transaction sizes? Say 100, 1000, 10000, 100000? Is there a "sweet spot"?
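One way to try different transaction sizes is to commit every N rows and start a new transaction. A small sketch of that loop; `BatchedCommit.Run` is a hypothetical helper, kept free of SQLite types so the callbacks decide what "begin", "insert" and "commit" mean:

```csharp
using System;
using System.Collections.Generic;

public static class BatchedCommit
{
    // Calls begin() before each batch, insert(item) for every item, and commit()
    // after every batchSize items, plus once for a final partial batch.
    // Returns the number of batches committed.
    public static int Run<T>(IEnumerable<T> items, int batchSize, Action begin, Action<T> insert, Action commit)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        int inBatch = 0, batches = 0;
        foreach (T item in items)
        {
            if (inBatch == 0) begin();
            insert(item);
            if (++inBatch == batchSize)
            {
                commit();
                batches++;
                inBatch = 0;
            }
        }
        if (inBatch > 0)
        {
            commit();
            batches++;
        }
        return batches;
    }
}
```

With System.Data.SQLite, the begin callback would create a transaction with `conn.BeginTransaction()` and assign it to `command.Transaction`, and the commit callback would call `Commit()` on it; timing runs with 100, 1000, 10000 and 100000 would show whether there is a sweet spot.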
You can gain quite some time when you bind your parameters in the following way:
...
string insertText = "INSERT INTO Data (ID,RAW) VALUES( ? , ? )"; // (1)
SQLiteTransaction trans = conn.BeginTransaction();
command.Transaction = trans;
command.CommandText = insertText;
//(2)------
SQLiteParameter p0 = new SQLiteParameter();
SQLiteParameter p1 = new SQLiteParameter();
command.Parameters.Add(p0);
command.Parameters.Add(p1);
//---------
Stopwatch sw = new Stopwatch();
sw.Start();
using (CsvReader csv = new CsvReader(new StreamReader(@"C:\Data.txt"), false))
{
    var f = csv.Select(x => new Data() { IDData = x[27], RawData = String.Join(",", x.Take(24)) });
    foreach (var item in f)
    {
        //(3)--------
        p0.Value = item.IDData;
        p1.Value = item.RawData;
        //-----------
        command.ExecuteNonQuery();
    }
}
trans.Commit();
...
Make the changes in sections 1, 2 and 3.
In this way parameter binding seems to be quite a bit faster.
Especially when you have a lot of parameters, this method can save quite some time.
I did a similar import, but I let my C# code just write the data to a CSV first and then ran the sqlite import utility. I was able to import over 300 million records in maybe 10 minutes this way.
Not sure if this can be done directly from C# or not, though.
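Writing the CSV from C# needs proper quoting, because `RawData` in the question is itself a comma-joined string. A sketch of RFC 4180-style quoting (`CsvWriterSketch` is a made-up name):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvWriterSketch
{
    // Wraps a field in double quotes when it contains a comma, quote, CR or LF,
    // and doubles any embedded quotes. Null becomes an empty field.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Writes one CSV line per row.
    public static void WriteRows(TextWriter writer, IEnumerable<string[]> rows)
    {
        foreach (string[] row in rows)
            writer.WriteLine(string.Join(",", row.Select(Escape)));
    }
}
```

The resulting file can then be loaded in the sqlite3 shell with `.mode csv` followed by `.import FILE TABLE`. Launching that shell from C# through `System.Diagnostics.Process` is one possibility, not something the answer above tested.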