From a C# application I am remotely selecting results from a custom production database, using a criterion that spans around three minutes of data.
Every time the select command is executed, the CPU on the PC running my application jumps to around 50%. Surely the load should be on the database server I am connecting to?
Why would the C# application spike to 50% CPU until the data is retrieved for reading?
Some background
I have worked out from debugging that the select statement on the remote database takes around 30-40 seconds, bearing in mind I am selecting with a criterion that uses the indexed column.
While selecting data from the remote DB, I have monitored Task Manager and the CPU sits at 50% until the select is complete; this can last around 30-40 seconds each loop.
If I run the select in the native SQL engine for the remote DB, there is no lag: the data (if any) is returned immediately.
I know it's not the parsing of the result set that is taking up the CPU load, as some selects return nothing.
Here is some code I am using.
OdbcConnection remoteConn = new OdbcConnection(ConfigurationManager.ConnectionStrings["remoteConnectionString"].ToString());
remoteConn.Open();
OdbcCommand remoteCommand = new OdbcCommand();
remoteCommand.Connection = remoteConn;
using (remoteConn)
{
    string localSql = "";
    string remoteSql = "select * from tracking where last_update > 212316247440000000"; // Julian No = 2015-07-12 11:24:00
    remoteCommand.CommandText = remoteSql;
    OdbcDataReader remoteReader;
    remoteReader = remoteCommand.ExecuteReader();
    while (remoteReader.Read())
    {
        for (int i = 0; i < 68; i++)
        {
            localSql += ",'" + remoteReader[i].ToString() + "'";
        }
    }
}
I ran a performance and diagnostic test on the application and it yielded this result.
How, if at all, can I reduce this CPU load or even eliminate it completely? It is completely out of the ordinary and I have no clue how to go about it.
Thanks
Thanks for the information, Dan. Here are my thoughts...
I believe the primary reason your app is consuming so much CPU is the drivers you are using.
Since you are connecting to a SQL Server database, you should use the SQL Server drivers, which know how to optimize the transfer of data between client and server.
To use the appropriate drivers, make sure you use the SqlConnection, SqlCommand, etc.
This will allow SQL Server to stream the results to your client as you query the data reader.
Secondly, do not use the blocking ExecuteReader() method; use the ExecuteReaderAsync() method instead, which the SQL Server provider implements with true non-blocking I/O.
Since this command is an IO-bound operation (not compute-bound), there is no need to block the calling thread. When the results come back they will arrive on an IO completion thread.
Here is a code sample of what your code might look like after the change.
using (var remoteConn = new SqlConnection(ConfigurationManager.ConnectionStrings["remoteConnectionString"].ToString()))
{
    await remoteConn.OpenAsync();
    using (var remoteCommand = new SqlCommand())
    {
        remoteCommand.Connection = remoteConn;
        string localSql = "";
        string remoteSql = "select * from tracking where last_update > 212316247440000000"; // Julian No = 2015-07-12 11:24:00
        remoteCommand.CommandText = remoteSql;
        // ExecuteReaderAsync frees the calling thread while SQL Server streams the results back
        using (var remoteReader = await remoteCommand.ExecuteReaderAsync())
        {
            while (await remoteReader.ReadAsync())
            {
                for (int i = 0; i < 68; i++)
                {
                    localSql += ",'" + remoteReader[i].ToString() + "'";
                }
            }
        }
    }
}
I am assuming that the select command is run from a different computer than the one hosting the SQL DB.
Frankly it is a bit baffling. The only thing that comes to mind is that it might be taking time passing and processing the metadata. Have you tried changing your query to return, say, only one piece of data?
For example:
select top 1 last_update from tracking where last_update > 212316247440000000
select count (*) from tracking where last_update > 212316247440000000
This will clarify if the data+metadata is causing the issue.
Could it be parsing of the result? Are you using some sort of data layer that automatically parses result sets into domain objects? That could be very CPU intensive.
The best way to handle your IO-bound operation is to use a Task, because it does not require changing your existing code:
var task = Task.Factory.StartNew(() => {
    // your code, or @Randy's code with a small change, goes here:
    using (var remoteConn = new SqlConnection(ConfigurationManager.ConnectionStrings["remoteConnectionString"].ToString()))
    {
        remoteConn.Open();
        using (var remoteCommand = new SqlCommand())
        {
            remoteCommand.Connection = remoteConn;
            string localSql = "";
            string remoteSql = "select * from tracking where last_update > 212316247440000000"; // Julian No = 2015-07-12 11:24:00
            remoteCommand.CommandText = remoteSql;
            var remoteReader = remoteCommand.ExecuteReader();
            while (remoteReader.Read())
            {
                for (int i = 0; i < 68; i++)
                {
                    localSql += ",'" + remoteReader[i].ToString() + "'";
                }
            }
        }
    }
});
// You can use 'task.Wait();' to wait for the task to complete.
Related
So, I have a program I'm working on where more than one client is connected to a MySQL database and I want to keep them all current: when one client updates the database, the information in all clients should update. I'm new and still studying in college, so the only way I could think of to do this was to add a column that holds each record's update time and then run this check every second:
if (sqlHandler.userLastUpdate < sqlHandler.LastUpdate())
{
    // Loads the products from the database
    sqlHandler.ReadDB();
    // Sets this client's last update time to now so it doesn't keep refreshing
    sqlHandler.userLastUpdate = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
}

public double LastUpdate()
{
    using var con = new MySqlConnection(CONNECTION_STRING);
    con.Open();
    string sql = "SELECT MAX(LastUpDate) FROM products";
    using var cmd = new MySqlCommand(sql, con);
    using MySqlDataReader rdr = cmd.ExecuteReader();
    rdr.Read();
    double time = rdr.GetDouble(0);
    return time;
}
This seems horribly inefficient. Is there a better way to do this? I've had a couple of clients running on the same MySQL server and it seemed to be fine, but it just seems like there should be a better way.
Consider the following example code:
using System.Data.SqlClient;

namespace ReportLoadTest
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var con = new SqlConnection("...your connection string here..."))
            {
                con.Open();
                var trans = con.BeginTransaction();
                var cmd = con.CreateCommand();
                cmd.Transaction = trans;
                cmd.CommandText = @"insert SomeTable(...columns...) values (...); select scope_identity()";
                var rows = cmd.ExecuteScalar();

                var rs = new SSRS.ReportExecutionService();
                rs.Credentials = System.Net.CredentialCache.DefaultCredentials;
                rs.Url = "http://localhost/ReportServer/ReportExecution2005.asmx";
                var ei = rs.LoadReport("/Folder/Folder/Some report", null);
            }
        }
    }
}
Under what conditions would this program "get stuck" at the call to ReportExecutionService.LoadReport?
By stuck, I mean 0 CPU, 0 I/O - no progress being made at all by the calling program, Reporting Services or SQL Server.
This program will get stuck if the report that's being loaded contains a dataset that's used to populate the available values for a parameter and that dataset is based on a query that reads rows from SomeTable.
LoadReport will eventually time out and there will be zero helpful information left lying around to help you figure out what happened.
Possible solutions:
Change the report to do "dirty reads" on SomeTable
Change the database to snapshot isolation mode to avoid the lock on the table.
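For the first option, the report's dataset query would add a WITH (NOLOCK) hint (or run under READ UNCOMMITTED). For the second, here is a minimal sketch; the database name is a placeholder, and READ_COMMITTED_SNAPSHOT is one reasonable reading of "snapshot isolation mode" here, since it lets the report's default read-committed queries see the last committed rows instead of blocking on the open transaction:
using (var con = new SqlConnection("...your connection string here..."))
{
    con.Open();
    using (var cmd = con.CreateCommand())
    {
        // WITH ROLLBACK IMMEDIATE rolls back other open transactions so the option can be applied.
        cmd.CommandText = "ALTER DATABASE [YourDatabase] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE";
        cmd.ExecuteNonQuery();
    }
}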
I ran into this as an actual production issue with a system that runs reports on a schedule, and the report being run was the "scheduled reports history" report.
The subtlety is that LoadReport runs queries - in retrospect, it's obvious that it must run queries since the available values for parameters are contained in the ExecutionInfo that's returned by LoadReport.
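For example, once LoadReport returns you can see that work reflected in the parameter metadata (assuming the generated ReportExecution2005 proxy exposes the usual Parameters and ValidValues members):
// ei is the ExecutionInfo returned by rs.LoadReport(...) in the example above.
foreach (var parameter in ei.Parameters)
{
    Console.WriteLine(parameter.Name);
    if (parameter.ValidValues != null)
    {
        // These labels/values are what SSRS had to query SomeTable to build,
        // which is why LoadReport blocks behind the uncommitted insert.
        foreach (var validValue in parameter.ValidValues)
        {
            Console.WriteLine("    {0} = {1}", validValue.Label, validValue.Value);
        }
    }
}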
I am trying to insert 200,000 documents from a folder into a varbinary column in a SQL Server database. I get a timeout expiration message after inserting 80,000 documents. The average file size is about 250 KB and the maximum file size is 50 MB. I am running this C# program on the server where the database is located.
Please suggest.
The error:
The timeout period elapsed prior to completion of the operation or the server is not responding.
The code:
string spath = @"c:\documents";
string[] files = Directory.GetFiles(spath, "*.*", SearchOption.AllDirectories);
Console.Write("Files Count:" + files.Length);
using (SqlConnection con = new SqlConnection(connectionString))
{
    con.Open();
    string insertSQL = "INSERT INTO table_Temp(doc_content, doc_path) values(@File, @path)";
    SqlCommand cmd = new SqlCommand(insertSQL, con);
    var pFile = cmd.Parameters.Add("@File", SqlDbType.VarBinary, -1);
    var pPath = cmd.Parameters.Add("@path", SqlDbType.Text);
    var tran = con.BeginTransaction();
    var fn = 0;
    foreach (string docPath in files)
    {
        string newPath = docPath.Remove(0, spath.Length);
        string archive = new DirectoryInfo(docPath).Parent.Name;
        fn += 1;
        using (var stream = new FileStream(docPath, FileMode.Open, FileAccess.Read))
        {
            pFile.Value = stream;
            pPath.Value = newPath;
            cmd.Transaction = tran;
            cmd.ExecuteNonQuery();
            if (fn % 10 == 0)
            {
                tran.Commit();
                tran = con.BeginTransaction();
                Console.Write("|");
            }
            Console.Write(".");
        }
    }
    tran.Commit();
}
For this, I would suggest using SqlBulkCopy, since it should be able to handle the data insertion much more easily. Further, as others have pointed out, you might want to increase the timeout on your command.
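A minimal sketch of the SqlBulkCopy approach, reusing the files, spath, and connectionString variables and the table_Temp columns from the question; the batch size of 200 rows is just an illustrative choice to keep memory bounded, since individual files can be up to 50 MB:
var table = new DataTable();
table.Columns.Add("doc_content", typeof(byte[]));
table.Columns.Add("doc_path", typeof(string));

using (var con = new SqlConnection(connectionString))
{
    con.Open();
    using (var bulk = new SqlBulkCopy(con))
    {
        bulk.DestinationTableName = "table_Temp";
        bulk.BulkCopyTimeout = 0; // no timeout
        bulk.ColumnMappings.Add("doc_content", "doc_content");
        bulk.ColumnMappings.Add("doc_path", "doc_path");

        foreach (string docPath in files)
        {
            table.Rows.Add(File.ReadAllBytes(docPath), docPath.Remove(0, spath.Length));
            if (table.Rows.Count == 200)
            {
                bulk.WriteToServer(table); // push one batch to the server
                table.Clear();
            }
        }
        if (table.Rows.Count > 0)
        {
            bulk.WriteToServer(table); // flush the final partial batch
        }
    }
}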
While I would agree a bulk copy of some sort may be best, if you must do this in the C# program, your only option is probably to increase the timeout value. You can do this after your SqlCommand object has been created via cmd.CommandTimeout = <new timeout>;. The CommandTimeout property is an integer representing the number of seconds before the timeout, or zero if you never want it to time out.
See the MSDN docs for details
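For instance, applied to the cmd object from the question (the 600-second value is just an illustrative choice):
cmd.CommandTimeout = 600; // give each insert up to 10 minutes; 0 would mean no timeout at all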
You should be able to set the timeout for the transaction directly on that object in your application code; this way you are not changing SQL Server settings.
Secondly, you can also build batches in your application. You say you can get 80k docs before a timeout: set your batch size at 50k, process them, commit them, and grab the next batch. Having your application manage batching also allows you to catch SQL errors, such as timeouts, and then dynamically adjust the batch size and retry without ever crashing. This is the entire reason for writing your application in the first place; otherwise you could just use the wizard in Management Studio and manually insert your files.
I highly recommend batching over other options.
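A rough sketch of that batch-and-retry loop, reusing the con, cmd, pFile, pPath, spath, and files objects from the question (it also needs System.Linq for Skip/Take; the 50,000 starting batch size and the halve-on-error policy are just illustrative choices):
int batchSize = 50000;
int index = 0;
while (index < files.Length)
{
    var batch = files.Skip(index).Take(batchSize).ToArray();
    var tran = con.BeginTransaction();
    try
    {
        foreach (string docPath in batch)
        {
            using (var stream = new FileStream(docPath, FileMode.Open, FileAccess.Read))
            {
                pFile.Value = stream;
                pPath.Value = docPath.Remove(0, spath.Length);
                cmd.Transaction = tran;
                cmd.ExecuteNonQuery();
            }
        }
        tran.Commit();
        index += batch.Length; // only advance once the whole batch is committed
    }
    catch (SqlException)
    {
        tran.Rollback();
        batchSize = Math.Max(1000, batchSize / 2); // shrink the batch and retry the same files
    }
}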
#Shubham Pandey also provides a great link to SQL bulk copy info, which also has links to more info. You should definitely experiment with the bulk copy class and see if you can get additional gains with it as well.
We already have a running system that handles all connection strings (DB2, Oracle, MS SQL Server).
Currently, we are using ExecuteNonQuery() to do some inserts.
We want to improve performance by using SqlBulkCopy() instead of ExecuteNonQuery(). We have some clients with more than 50 million records.
We don't want to use SSIS, because our system supports multiple databases.
I created a sample project to test the performance of SqlBulkCopy(), with a simple read-and-insert function for MS SQL Server.
Here's the small function:
public void insertIntoSQLServer()
{
    using (SqlConnection SourceConnection = new SqlConnection(_sourceConnectionString))
    {
        //Open the connection to get the data from the source table
        SourceConnection.Open();
        using (SqlCommand command = new SqlCommand("select * from " + _sourceSchemaName + "." + _sourceTableName + ";", SourceConnection))
        {
            //Read from the source table
            command.CommandTimeout = 2400;
            SqlDataReader reader = command.ExecuteReader();
            using (SqlConnection DestinationConnection = new SqlConnection(_destinationConnectionString))
            {
                DestinationConnection.Open();
                //Clean the destination table
                new SqlCommand("delete from " + _destinationSchemaName + "." + _destinationTableName + ";", DestinationConnection).ExecuteNonQuery();
                using (SqlBulkCopy bc = new SqlBulkCopy(DestinationConnection))
                {
                    bc.DestinationTableName = string.Format("[{0}].[{1}]", _destinationSchemaName, _destinationTableName);
                    bc.NotifyAfter = 10000;
                    //bc.SqlRowsCopied += bc_SqlRowsCopied;
                    bc.WriteToServer(reader);
                }
            }
        }
    }
}
When I have fewer than 200,000 records in my dummyTable the bulk copy works fine. But when it's over 200,000 records, I get one of the following errors:
Attempt to invoke bulk copy on an object that has a pending operation.
OR
The wait operation timed out (for the IDataReader)
I increased the CommandTimeout for the reader. It seems that it has solved the timeout issue related to IDataReader.
Am I doing something wrong in the code?
Can you try adding the following before the call to WriteToServer ...
bc.BatchSize = 10000;
bc.BulkCopyTimeout = 0;
I don't know what the default batch size or timeout is, but I suspect this might be your issue.
Hope that helps
Also, you can try playing with different Batch Sizes for optimal performance.
You can try this
bc.BatchSize = 100000; // how many rows you want to insert at a time
bc.BulkCopyTimeout = 60; // time in seconds; assign 0 for an infinite wait
I want to know whether Multiple Active Result Sets (MARS) exist for Microsoft's Access database. I am aware this exists for SQL Server. I tried using it with Access but it didn't work for me. I want to know how to use MARS with Access.
In short, Microsoft Access does not support Multiple Active Result Sets (MARS). MARS is not supported by the ODBC provider, and the reason should be obvious if you think about what MARS actually offers you from a performance standpoint.
The most important reason for MARS to exist is stored procedures executed on SQL Server that produce multiple result sets; if you have such queries, you need to be able to somehow access those multiple result sets.
But in Access there is no such thing as a stored procedure. If you have multiple queries, you can just execute each one separately and get the result set for each. Hence, no need for MARS.
NOTE
In light of the comments, here's an example of how to have two data readers open at the same time:
using(var connection1 = new OdbcConnection("your connection string here"))
{
    connection1.Open();
    using(var connection2 = new OdbcConnection("your connection string here"))
    {
        connection2.Open();
        using(var cmd1 = connection1.CreateCommand())
        {
            cmd1.CommandText = "YOUR FIRST QUERY HERE";
            using(var dataReader1 = cmd1.ExecuteReader())
            {
                while(dataReader1.Read())
                {
                    // keep reading data from dataReader1 / connection 1
                    // .. at some point you may need to execute a second query
                    using(var cmd2 = connection2.CreateCommand())
                    {
                        cmd2.CommandText = "YOUR SECOND QUERY HERE";
                        // you can now execute the second query here
                        using(var dataReader2 = cmd2.ExecuteReader())
                        {
                            while(dataReader2.Read())
                            {
                            }
                        }
                    }
                }
            }
        }
        connection2.Close();
    }
    connection1.Close();
}