We are currently developing an application that generates upwards of 5,000-10,000 rows of data in a particular table for each user session. Currently we are using SQL text commands to insert each row of data one at a time, so a save operation can take up to a minute. We are experimenting with SqlBulkCopy and have seen the time drop to less than 500 ms. Does anyone have any objection to the use of SqlBulkCopy in a production application where many users will be using the system?
I have never run into an issue with SqlBulkCopy with the TableLock option set and another user being blocked because of it. The TableLock option increases the efficiency of the insert, from what many people have reported and from what plain use of it has shown me.
My typical method:
public void Bulk(String connectionString, DataTable data, String destinationTable)
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(
            connection,
            SqlBulkCopyOptions.TableLock |
            SqlBulkCopyOptions.FireTriggers |
            SqlBulkCopyOptions.UseInternalTransaction,
            null))
        {
            bulkCopy.BatchSize = data.Rows.Count;
            bulkCopy.DestinationTableName = String.Format("[{0}]", destinationTable);
            connection.Open();
            bulkCopy.WriteToServer(data);
        }
    }
}
Before implementing SqlBulkCopy, try creating your INSERT query dynamically to look like this:
insert into MyTable (Column1, Column2)
select 123, 'abc'
union all
select 124, 'def'
union all
select 125, 'yyy'
union all
select 126, 'zzz'
This will be only one database call, which should run much more quickly. For the SQL string concatenation, make sure you use the StringBuilder class.
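For illustration, here is a minimal sketch of that approach, reusing the data and connectionString names from the Bulk method above; the table and column names are placeholders, and real code should prefer parameters or SqlBulkCopy over raw concatenation:
// Build one multi-row INSERT with StringBuilder instead of one call per row.
var sql = new StringBuilder("insert into MyTable (Column1, Column2)\n");
bool first = true;
foreach (DataRow row in data.Rows)
{
    if (!first)
        sql.AppendLine("union all");
    // Assumes Column1 is numeric and Column2 is text; doubling single
    // quotes is a crude guard, not real injection protection.
    sql.AppendFormat("select {0}, '{1}'\n",
        row["Column1"], row["Column2"].ToString().Replace("'", "''"));
    first = false;
}
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(sql.ToString(), connection))
{
    connection.Open();
    command.ExecuteNonQuery(); // one round trip for all rows
}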
I think it's the right way to go, if your application really needs to produce that many records per session.
Related
I am building an application in which I will be producing some reports based on the results of SQL queries executed against a number of different databases and servers. Since I am unable to create stored procedures on each server, I have my SQL scripts saved locally; I load them into my C# application and execute them against each server using ADO.NET. All of the SQL scripts are selects that return tables; however, some of them are more complicated than others and involve multiple selects into table variables that get joined on, like the super basic example below.
My question is, using ADO.NET, is it possible to assign a string of multiple SQL queries that ultimately only returns a single data table to a SqlCommand object - e.g. the two SELECT statements below comprising my complete script? Or would I have to create a transaction and execute each individual query separately as its own command?
-- First Select
SELECT *
INTO #temp
FROM Table1;
--Second Select
SELECT *
FROM Table1
JOIN #temp
ON Table1.Id = #temp.Id;
Additionally, some of my scripts have comments embedded in them like the rudimentary example above - would these need to be removed or are they effectively ignored within the string? This seems to be working with single queries, in other words the "--This is a comment" is effectively ignored.
private void button1_Click(object sender, EventArgs e)
{
    string ConnectionString = "Server=server1;Database=test1;Trusted_Connection=True";
    using (SqlConnection conn = new SqlConnection(ConnectionString))
    {
        SqlCommand cmd = new SqlCommand("--This is a comment \n SELECT TOP 10 * FROM dbo.Table1;", conn);
        DataTable dt = new DataTable();
        SqlDataAdapter sqlAdapt = new SqlDataAdapter(cmd);
        sqlAdapt.Fill(dt);
        MessageBox.Show(dt.Rows.Count.ToString());
    }
}
Yes, that is absolutely fine. Comments are ignored, and it should work fine. The only thing to watch is the scoping of temporary tables: if you are used to working with stored procedures, the scope is temporary (they are removed when the stored procedure ends); with direct commands it isn't - they are connection-specific but survive between multiple operations. If that is a problem, take a look at "table variables".
Note: technically this is up to the backend provider; assuming you are using a standard database engine, you'll be OK. If you are using something exotic, then it might be a genuine question. For example, it might not work on "Bob's homemade OneNote ADO.NET provider".
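As a minimal sketch of what that looks like in practice (connectionString is assumed; the table names are from the question, and the temp table is dropped explicitly so it doesn't linger on the pooled connection):
string sql =
    "SELECT * INTO #temp FROM Table1; " +
    "SELECT * FROM Table1 JOIN #temp ON Table1.Id = #temp.Id; " +
    "DROP TABLE #temp;";
DataTable dt = new DataTable();
using (var conn = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter(sql, conn))
{
    // The batch runs as one command; SELECT ... INTO produces no result set,
    // so Fill loads the single table returned by the join.
    adapter.Fill(dt);
}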
Yes, you can certainly do it. You can build the script with different types of collections or with a StringBuilder, or even just assign the whole query to a string variable. Whether the intermediate results go into a temp table or a CTE is entirely up to you; either way, you can load the final result into a DataTable. And if you want the entire set of inserts, updates, or deletes to succeed or fail together, you can go for a transaction; that won't be any issue.
I don't use ADO.NET, I use Entity Framework, but I think this is more a SQL question than an ADO.NET question; forgive me if I'm wrong. Provided you are selecting from Table1 in both queries, I think you should use this query instead:
select *
from Table1 tbl1
join Table1 tbl2
on tbl1.id = tbl2.id
Actually, I don't really ever see a reason you would have to move things into temp tables with options like Common Table Expressions available to you.
Look up CTEs if you don't already know about them:
https://www.simple-talk.com/sql/t-sql-programming/sql-server-cte-basics/
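For example, the question's two-statement script collapses into a single CTE-based SELECT that can run as one command (a sketch; table names are from the question, connectionString is assumed):
string sql =
    @"WITH temp AS (SELECT * FROM Table1)
      SELECT t1.* FROM Table1 t1
      JOIN temp ON t1.Id = temp.Id;";
DataTable dt = new DataTable();
using (var conn = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter(sql, conn))
{
    // One statement, one round trip, no temp table to clean up.
    adapter.Fill(dt);
}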
I have a SQL connection that has to hit the database anywhere from 500 to 10,000 times a second. After about 250 requests per second, things start to slow down and then the app gets so far behind that it crashes.
I was thinking about loading the database into a dictionary. I need the fastest performance I can get. Currently the ADO.NET call takes about 1 to 2 milliseconds, but something happens that causes a bottleneck.
Is there anything wrong with the syntax below for 10k queries per second? Is a dictionary going to work? We are talking about 12 million records, and I need to be able to search them within 1 to 5 milliseconds. I also have another collection in the database that has 50 million records, so I'm not sure how to store it. Any suggestions would be great.
The SQL box has 128 GB of memory and 80 processors, and the app is on the same server as the SQL Server 2012 instance.
using (SqlConnection sqlconn = new SqlConnection(sqlConnection.SqlConnectionString()))
{
    using (SqlCommand sqlcmd = new SqlCommand("", sqlconn))
    {
        sqlcmd.CommandType = System.Data.CommandType.StoredProcedure;
        sqlcmd.Parameters.Clear();
        sqlcmd.CommandTimeout = 1;
        sqlconn.Open();
        using (SqlDataReader sqlDR = sqlcmd.ExecuteReader(CommandBehavior.CloseConnection))
        {
            // reader loop shown below
        }
    }
}
public static string SqlConnectionString()
{
    return string.Format("Data Source={0},{1};Initial Catalog={2};User ID={3};Password={4};Application Name={5};Asynchronous Processing=true;MultipleActiveResultSets=true;Max Pool Size=524;Pooling=true;",
        DataIP, port, Database, username, password, InstanceID);
}
The code below the data reader is:
r.CustomerInfo = new CustomerVariable();
r.GatewayRoute = new List<RoutingGateway>();
while (sqlDR.Read())
{
    if (sqlDR["RateTableID"] != DBNull.Value)
        r.CustomerInfo.RateTable = sqlDR["RateTableID"].ToString();
    if (sqlDR["EndUserCost"] != DBNull.Value)
        r.CustomerInfo.IngressCost = sqlDR["EndUserCost"].ToString();
    if (sqlDR["Jurisdiction"] != DBNull.Value)
        r.CustomerInfo.Jurisdiction = sqlDR["Jurisdiction"].ToString();
    if (sqlDR["MinTime"] != DBNull.Value)
        r.CustomerInfo.MinTime = sqlDR["MinTime"].ToString();
    if (sqlDR["interval"] != DBNull.Value)
        r.CustomerInfo.interval = sqlDR["interval"].ToString();
    if (sqlDR["code"] != DBNull.Value)
        r.CustomerInfo.code = sqlDR["code"].ToString();
    if (sqlDR["BillBy"] != DBNull.Value)
        r.CustomerInfo.BillBy = sqlDR["BillBy"].ToString();
    if (sqlDR["RoundBill"] != DBNull.Value)
        r.CustomerInfo.RoundBill = sqlDR["RoundBill"].ToString();
}
sqlDR.NextResult();
Don't close and re-open the connection, you can keep it open between requests. Even if you have connection pooling turned on, there is certain overhead, including a brief critical section to prevent concurrency issues when seizing a connection from the pool. May as well avoid that.
Ensure your stored procedure has SET NOCOUNT ON to reduce chattiness.
Ensure you are using the minimum transaction isolation level you can get away with, e.g. dirty reads a.k.a. NOLOCK. You can set this at the client end at the connection level or within the stored procedure itself, whichever you're more comfortable with.
Profile these transactions to ensure the bottleneck is on the client. Could be on the DB server or on the network.
If this is a multithreaded application (e.g. on the web), check your connection pool settings and ensure it's large enough. There's a PerfMon counter for this.
Access your fields by ordinal using strongly typed getters, e.g. GetString(0) or GetInt32(3).
Tweak the bejesus out of your stored procedure and indexes. Could write a book on this.
Reindex your tables during down periods, and fill up the index pages if this is a fairly static table.
If the purpose of the stored procedure is to retrieve a single row, try adding TOP 1 to the query so that it will stop looking after the first row is found. Also, consider using output parameters instead of a resultset, which incurs a little less overhead.
A dictionary could potentially work but it depends on the nature of the data, how you are searching it, and how wide the rows are. If you update your question with more information I'll edit my answer.
If you're going to be accessing the DataReader in a loop, then you should find the column indexes outside the loop and use them inside the loop. You might also do better to use the strongly-typed accessors.
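A minimal sketch of that pattern, using two of the column names from the question's loop (it assumes those columns come back as strings; adjust the typed getters to the actual schema):
// Resolve the ordinals once, outside the loop.
int rateTableOrd = sqlDR.GetOrdinal("RateTableID");
int endUserCostOrd = sqlDR.GetOrdinal("EndUserCost");
while (sqlDR.Read())
{
    // IsDBNull plus typed getters avoids repeated name lookups and boxing.
    if (!sqlDR.IsDBNull(rateTableOrd))
        r.CustomerInfo.RateTable = sqlDR.GetString(rateTableOrd);
    if (!sqlDR.IsDBNull(endUserCostOrd))
        r.CustomerInfo.IngressCost = sqlDR.GetString(endUserCostOrd);
}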
Well, if you have already measured that the ADO command takes only a couple of milliseconds, the other possible cause of delay is the string.Format used to build the connection string.
I would try to remove the string.Format that is called for every
using(SqlConnection cn = new SqlConnection(sqlConnection.SqlConnectionString()))
Instead, supposing the SqlConnectionString method is in a separate class, you could write:
private static string conString = string.Empty;
public static string SqlConnectionString()
{
    if (conString == "")
        conString = string.Format("............");
    return conString;
}
Of course, a benchmark could rule this out, but I am pretty sure that string operations like that are costly.
Seeing your comments below, another very important thing to add is the correct declaration of your parameters. Instead of using AddWithValue (convenient, but with tricky side effects), declare your parameters with the correct size:
using (SqlCommand sqlcmd = new SqlCommand("", sqlconn))
{
    sqlcmd.CommandType = System.Data.CommandType.StoredProcedure;
    sqlcmd.CommandText = mySql.GetLCR();
    SqlParameter p1 = new SqlParameter("@GatewayID", SqlDbType.NVarChar, 20) { Value = GatewayID };
    SqlParameter p2 = new SqlParameter("@DialNumber", SqlDbType.NVarChar, 20) { Value = dialnumber };
    sqlcmd.Parameters.AddRange(new SqlParameter[] { p1, p2 });
    sqlcmd.CommandTimeout = 1;
    sqlconn.Open();
    .....
}
AddWithValue is not recommended when you need to squeeze every millisecond of performance out. This very useful article explains why passing a string with AddWithValue defeats the work done by the SQL Server optimizer. (In short, the optimizer calculates and stores a query plan for your command and, if it receives another identical command, it reuses the calculated plan. But if you pass a string with AddWithValue, the size of the parameter is calculated every time, based on the actual length of the string passed. The optimizer cannot reuse the query plan and recalculates and stores a new one.)
"I need the fastest performance I can get."
If you haven't done so already, review your business requirements, and how your application interacts with your data warehouse. If you have done this already, then please disregard this posting.
It has been my experience that:
The fact that you are even executing a SQL query against a database means that you have an expense - queries cost time/cpu/memory.
Queries are even more expensive if they include write operations.
The easiest way to save money is not to spend it! So look for ways to:
avoid querying the database in the first place
ensure that queries execute as quickly as possible
STRATEGIES
Make sure you are using the database indexes properly.
Avoid SQL queries that result in a full table scan.
Use connection pooling.
If you are inserting data into the database, then use bulk uploads.
Use caching where appropriate. Options include:
caching results in memory (i.e. RAM)
caching results to disk
pre-render results ahead of time and read them instead of executing a new query
instead of mining raw data with each query, consider generating summary data that could be queried instead.
Partition your data. This can occur on several levels:
most enterprise databases support partitioning strategies
by reviewing your business model, you can partition your data across several databases (e.g. read operations against one DB, write operations against another)
Review your application's design, and then measure response times to confirm that the bottleneck is in fact where you believe it is.
CACHING TECHNOLOGIES
Asp.net - caching
memcached
redis
etc.
DISCLAIMER: I am not a database administrator (DBA).
I don't think the issue is the string.Format.
Result is:
108 ms for the format
1416 ms for the open
5176 ms for the execute
and the whole thing is 6891 ms
Run this VERY simple test:
namespace ConsoleApplication1
{
    class Program
    {
        private static string DataIP;
        private static string Database;
        private static string InstanceID;

        static void Main(string[] args)
        {
            DataIP = @"FREDOU-PC\SQLEXPRESS"; Database = "testing"; InstanceID = "123";
            int count = 0;
            System.Diagnostics.Stopwatch swWholeThing = System.Diagnostics.Stopwatch.StartNew();
            System.Diagnostics.Stopwatch swFormat = new System.Diagnostics.Stopwatch();
            System.Diagnostics.Stopwatch swOpen = new System.Diagnostics.Stopwatch();
            System.Diagnostics.Stopwatch swExecute = new System.Diagnostics.Stopwatch();
            for (int i = 0; i < 100000; ++i)
            {
                using (System.Data.SqlClient.SqlConnection sqlconn = new System.Data.SqlClient.SqlConnection(SqlConnectionString(ref swFormat)))
                {
                    using (System.Data.SqlClient.SqlCommand sqlcmd = new System.Data.SqlClient.SqlCommand("dbo.counttable1", sqlconn))
                    {
                        sqlcmd.CommandType = System.Data.CommandType.StoredProcedure;
                        sqlcmd.Parameters.Clear();
                        swOpen.Start();
                        sqlconn.Open();
                        swOpen.Stop();
                        swExecute.Start();
                        using (System.Data.SqlClient.SqlDataReader sqlDR = sqlcmd.ExecuteReader(System.Data.CommandBehavior.CloseConnection))
                        {
                            if (sqlDR.Read())
                                count += sqlDR.GetInt32(0);
                        }
                        swExecute.Stop();
                    }
                }
            }
            swWholeThing.Stop();
            System.Console.WriteLine("swFormat: " + swFormat.ElapsedMilliseconds);
            System.Console.WriteLine("swOpen: " + swOpen.ElapsedMilliseconds);
            System.Console.WriteLine("swExecute: " + swExecute.ElapsedMilliseconds);
            System.Console.WriteLine("swWholeThing: " + swWholeThing.ElapsedMilliseconds + " " + count);
            System.Console.ReadKey();
        }

        public static string SqlConnectionString(ref System.Diagnostics.Stopwatch swFormat)
        {
            swFormat.Start();
            var str = string.Format("Data Source={0};Initial Catalog={1};Integrated Security=True;Application Name={2};Asynchronous Processing=true;MultipleActiveResultSets=true;Max Pool Size=524;Pooling=true;",
                DataIP, Database, InstanceID);
            swFormat.Stop();
            return str;
        }
    }
}
dbo.counttable1 stored procedure:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
create PROCEDURE dbo.counttable1
AS
BEGIN
SET NOCOUNT ON;
SELECT count(*) as cnt from dbo.Table_1
END
GO
dbo.table_1
USE [testing]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Table_1](
[id] [int] NOT NULL,
CONSTRAINT [PK_Table_1] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
content:
insert into dbo.Table_1 (id) values (1)
insert into dbo.Table_1 (id) values (2)
insert into dbo.Table_1 (id) values (3)
insert into dbo.Table_1 (id) values (4)
insert into dbo.Table_1 (id) values (5)
insert into dbo.Table_1 (id) values (6)
insert into dbo.Table_1 (id) values (7)
insert into dbo.Table_1 (id) values (8)
insert into dbo.Table_1 (id) values (9)
insert into dbo.Table_1 (id) values (10)
If you are handling millions of records and hitting the database anywhere from 500 to 10,000 times a second, I recommend creating a handler file (API) for data retrieval; you can then use load-testing tools to test the API's performance.
By using memcache, performance can be increased. These are the steps to implement memcache:
Create a Windows service that will retrieve data from the database and store it in memcache in JSON format (as key-value pairs).
For the website, create a handler file as an API that will retrieve data from memcache and display the result.
I have implemented this in one of my projects; it retrieves thousands of records in milliseconds.
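If an out-of-process cache like memcache is more than you need, a minimal in-process stand-in is a ConcurrentDictionary keyed the same way. This is only a sketch under assumptions: the Rate type, table, and column names are hypothetical, not from the question:
public class Rate
{
    public string Code;
    public string EndUserCost;
}

// Hypothetical in-process cache: load the table once, then serve lookups from memory.
private static readonly ConcurrentDictionary<string, Rate> rateCache =
    new ConcurrentDictionary<string, Rate>();

public static void WarmCache(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("SELECT code, EndUserCost FROM dbo.Rates", conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                var rate = new Rate
                {
                    Code = reader.GetString(0),
                    EndUserCost = reader.GetValue(1).ToString()
                };
                rateCache[rate.Code] = rate;
            }
        }
    }
}

// A lookup is then an in-memory dictionary hit instead of a database round trip.
public static Rate FindRate(string code)
{
    Rate rate;
    return rateCache.TryGetValue(code, out rate) ? rate : null;
}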
I am doing a loop insert as seen below (Method A); it seems that calling the database in every loop iteration isn't a good idea. The alternative I found is to loop over a comma-delimited string in my SProc instead, so the insert needs only one call to the DB. Will there be any significant improvement in terms of performance?
Method A:
foreach (DataRow row in dt.Rows)
{
    userBll = new UserBLL();
    UserId = (Guid)row["UserId"];
    // Call the UserBLL method to insert into SQL Server with UserId as one of the parameters.
}
Method B:
string UserIds = "Tom, Jerry, 007"; // Assuming we have already concatenated the strings, so no loop here.
userBll = new UserBLL();
// Call userBll method to insert into SQL Server with 'UserIds' as parameter.
Method B SProc (perform a loop insert in the SProc):
if right(rtrim(@UserIds), 1) <> ','
    SELECT @string = @UserIds + ','
SELECT @pos = patindex('%,%', @UserIds)
while @pos <> 0
begin
    SELECT @piece = left(@string, (@pos - 1))
    -- Perform the insert here
    SELECT @UserIds = stuff(@string, 1, @pos, '')
    SELECT @pos = patindex('%,%', @UserIds)
end
Fewer queries usually mean faster processing. That said, a co-worker of mine had some success with the .NET Framework's wrapper for TSQL BULK INSERT, which the Framework provides as SqlBulkCopy.
This MSDN blog entry shows how to use it.
The main "API" sample is this (taken from the linked article as-is, it writes the contents of a DataTable to SQL):
private void WriteToDatabase()
{
    // get your connection string
    string connString = "";
    // connect to SQL
    using (SqlConnection connection = new SqlConnection(connString))
    {
        // make sure to enable triggers
        // more on triggers in next post
        SqlBulkCopy bulkCopy = new SqlBulkCopy(
            connection,
            SqlBulkCopyOptions.TableLock |
            SqlBulkCopyOptions.FireTriggers |
            SqlBulkCopyOptions.UseInternalTransaction,
            null);
        // set the destination table name
        bulkCopy.DestinationTableName = this.tableName;
        connection.Open();
        // write the data in the "dataTable"
        bulkCopy.WriteToServer(dataTable);
        connection.Close();
    }
    // reset
    this.dataTable.Clear();
    this.recordCount = 0;
}
The linked article explains what needs to be done to leverage this mechanism.
In my experience, there are three things you don't want to have to do for each record:
Open/close a sql connection per row. This concern is handled by ADO.NET connection pooling. You shouldn't have to worry about it unless you have disabled the pooling.
Database roundtrip per row. This tends to be less about the network bandwidth or network latency and more about the client side thread sleeping. You want a substantial amount of work on the client side each time it wakes up or you are wasting your time slice.
Open/close the sql transaction log per row. Opening and closing the log is not free, but you don't want to hold it open too long either. Do many inserts in a single transaction, but not too many.
On any of these, you'll probably see a lot of improvement going from 1 row per request to 10 rows per request. You can achieve this by building up 10 insert statements on the client side before transmitting the batch.
Your approach of sending a list into a proc has been written about in extreme depth by Sommarskog.
If you are looking for better insert performance with multiple input values of a given type, I would recommend you look at table valued parameters.
And a sample can be found here, showing some example code that uses them.
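As a rough sketch of the table-valued-parameter approach (the type name, procedure name, and target table are illustrative, not from the question):
// Server-side, created once (illustrative names):
//   CREATE TYPE dbo.GuidList AS TABLE (UserId UNIQUEIDENTIFIER NOT NULL);
//   CREATE PROCEDURE dbo.InsertUsers @Ids dbo.GuidList READONLY
//   AS INSERT INTO dbo.Users (UserId) SELECT UserId FROM @Ids;

DataTable ids = new DataTable();
ids.Columns.Add("UserId", typeof(Guid));
foreach (DataRow row in dt.Rows) // dt is the table from Method A
    ids.Rows.Add((Guid)row["UserId"]);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.InsertUsers", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    SqlParameter p = cmd.Parameters.AddWithValue("@Ids", ids);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.GuidList"; // must match the server-side type
    conn.Open();
    cmd.ExecuteNonQuery(); // one round trip for all rows
}
All rows travel in a single parameter, and the server can insert them set-based instead of parsing a delimited string.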
You can use bulk insert functionality for this.
See this blog for details: http://blogs.msdn.com/b/nikhilsi/archive/2008/06/11/bulk-insert-into-sql-from-c-app.aspx
In C# I want to execute a query that uses 2 different databases (one is a local Access database, the other is a remote MySQL database).
I'm able to do it in VBA Access, but how can I do the same thing in C#?
This is how I did it in Access:
Link the two tables/databases as linked tables.
In VBA:
sSQL = "INSERT INTO DB1tblClient SELECT * FROM DB2tblClient"
CurrentDb.Execute sSQL
How can I execute this SQL in C#? (What object should I use, etc.? Example code if you can.)
Thanks!
There are two ways to do this. One is to set up linked tables in Access and run a single query. The other is to run both queries from C# and join them with LINQ.
The first way is better. If you really have to do it with LINQ, here is some sample code:
dWConnection.Open();
dWDataAdaptor.SelectCommand = dWCommand1;
dWDataAdaptor.Fill(queryResults1);
dWDataAdaptor.SelectCommand = dWCommand2;
dWDataAdaptor.Fill(queryResults2);
dWConnection.Close();

IEnumerable<DataRow> results1 = (from events in queryResults1.AsEnumerable()
                                 where events.Field<string>("event_code") == "A01"
                                    || events.Field<string>("event_code") == "ST"
                                 select events) as IEnumerable<DataRow>;

var results2 = from events1 in queryResults1.AsEnumerable()
               join events2 in queryResults2.AsEnumerable()
                   on (string)events1["event_code"] equals (string)events2["event_code"]
               select new
               {
                   f1 = (string)events1["event_code"],
                   f2 = (string)events2["event_name"]
               };

DataTable newDataTable = results1.CopyToDataTable<DataRow>();
See why I said linked tables is better?
You should be able to run the same SQL command from any app, really. This is assuming:
You're connecting to Access from your C# app
DB1tblClient is a local Access table
DB2tblClient is a link table in Access
Given these, you might try the following:
using (OleDbConnection conn = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Stuff\MyAccessdb.mdb"))
{
    conn.Open();
    using (OleDbCommand cmd = conn.CreateCommand())
    {
        cmd.CommandText = "INSERT INTO DB1tblClient SELECT * FROM DB2tblClient";
        cmd.ExecuteNonQuery();
    }
}
You might want to check connectionstrings.com if you can't get the connection string right, and you may need to install some components (MDAC or ACE) for connections that use those providers.
Well, it is not possible to run such a complex query as a single statement. Each query-execution object is initialized with the information for one particular database, so the first thing to note is that you need two different objects, one for each database, each initialized with its own connection object. Then fetch the data with the first object and insert it into the other database using the second connection object.
You need to keep the following points in mind before trying this type of query:
Both databases are accessible from your code.
There is inter-connectivity between both databases.
Both databases are available to the user you are using to execute this query.
You need to specify the query in the following format:
DATABASE_NAME.SCHEMA_NAME.TABLE_NAME instead of just TABLE_NAME
EDIT
If you don't have inter-connectivity between the databases, you can follow these steps:
Connect to the source database using one connection.
Read the data from the source database into a DataSet or DataTable using a SELECT query.
Connect to the target database using a second connection.
Insert all the records one by one in a loop into the target database using a standard INSERT query, as sketched below.
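A rough sketch of those steps, assuming the MySql.Data provider for the MySQL side and OLE DB for Access; the connection strings, table, and column names are placeholders:
DataTable clients = new DataTable();

// 1-2. Read from the source (MySQL) database.
using (var srcConn = new MySqlConnection("Server=remotehost;Database=db2;Uid=user;Pwd=pass;"))
using (var adapter = new MySqlDataAdapter("SELECT * FROM tblClient", srcConn))
{
    adapter.Fill(clients);
}

// 3-4. Insert row by row into the target (Access) database.
using (var dstConn = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Stuff\MyAccessdb.mdb"))
{
    dstConn.Open();
    foreach (DataRow row in clients.Rows)
    {
        using (var cmd = dstConn.CreateCommand())
        {
            // OLE DB parameters are positional; two columns assumed for illustration.
            cmd.CommandText = "INSERT INTO tblClient (Id, Name) VALUES (?, ?)";
            cmd.Parameters.AddWithValue("?", row["Id"]);
            cmd.Parameters.AddWithValue("?", row["Name"]);
            cmd.ExecuteNonQuery();
        }
    }
}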
I am a newbie to db programming and need help optimizing this query:
Given tables A, B and C and I am interested in one column from each of them, how to write a query such that I can get one column from each table into 3 different arrays/lists in my C# code?
I am currently running three different queries to the DB but want to accomplish the same in one query (to save 2 trips to the DB).
@patmortech Use UNION ALL instead of UNION if you don't care about duplicate values or if you can only get unique values (because you are querying via primary or unique keys). Much faster performance with UNION ALL.
There is no sense of "arrays" in SQL. There are tables, rows, and columns. Resultsets return a SET of rows and columns. Can you provide an example of what you are looking for? (DDL of source tables and sample data would be helpful.)
As others have said, you can send up multiple queries to the server within a single execute statement and return multiple resultsets via ADO.NET. You use the DataReader .NextResult() command to return the next resultset.
See here for more information: MSDN
Section: Retrieving Multiple Result Sets using NextResult
Here is some sample code:
static void RetrieveMultipleResults(SqlConnection connection)
{
    using (connection)
    {
        SqlCommand command = new SqlCommand(
            "SELECT CategoryID, CategoryName FROM dbo.Categories;" +
            "SELECT EmployeeID, LastName FROM dbo.Employees",
            connection);
        connection.Open();
        SqlDataReader reader = command.ExecuteReader();
        while (reader.HasRows)
        {
            Console.WriteLine("\t{0}\t{1}", reader.GetName(0), reader.GetName(1));
            while (reader.Read())
            {
                Console.WriteLine("\t{0}\t{1}", reader.GetInt32(0), reader.GetString(1));
            }
            reader.NextResult();
        }
    }
}
With a stored procedure you can return more than one result set from the database and have a dataset filled with more than one table, you can then access these tables and fill your arrays/lists.
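For instance, a minimal sketch of that approach (the procedure name is hypothetical and would contain the three SELECT statements; the first column is assumed to be an int):
DataSet ds = new DataSet();
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.GetThreeColumns", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    using (var adapter = new SqlDataAdapter(cmd))
    {
        // Each result set lands in its own table: ds.Tables[0], [1], [2].
        adapter.Fill(ds);
    }
}
List<int> firstColumn = ds.Tables[0].AsEnumerable()
    .Select(r => r.Field<int>(0))
    .ToList();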
You can do 3 different SELECT statements and execute in 1 call. You will get 3 results sets back. How you leverage those results depends on what data technology you are using. LINQ? Datasets? Data Adapter? Data Reader? If you can provide that information (perhaps even sample code) I can tell you exactly how to get what you need.
Not sure if this is exactly what you had in mind, but you could do something like this (as long as all three columns are the same data type):
select field1, 'TableA' as TableName from tableA
UNION
select field2, 'TableB' from tableB
UNION
select field3, 'TableC' from tableC
This would give you one big resultset with all the records. Then you could use a data reader to read the results, keep track of what the previous record's TableName value was, and whenever it changes you could start putting the column values into another array.
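A sketch of that reading loop; grouping rows into per-table lists with a dictionary avoids depending on the order the UNION returns them in (command is assumed to hold the UNION query above, and the values are assumed to be strings):
var lists = new Dictionary<string, List<string>>();
using (var reader = command.ExecuteReader())
{
    while (reader.Read())
    {
        string table = reader.GetString(1); // the TableName discriminator column
        List<string> values;
        if (!lists.TryGetValue(table, out values))
        {
            values = new List<string>();
            lists[table] = values;
        }
        values.Add(reader.GetString(0)); // the field value
    }
}
// lists["TableA"], lists["TableB"], lists["TableC"] now hold the three columns.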
Take the three trips. The answers so far suggest how far you would need to advance from "new to db programming" to do what you want. Master the simplest ways first.
If they are three huge results, then I suspect you're trying to do something in C# that would better be done in SQL on the database without bringing back the data. Without more detail, this sounds suspiciously like an antipattern.