Really odd DataReader performance issue - C#

I have a SQL Server database and I'm using ADO.NET ExecuteReader to get a datareader. My stored procedure returns around 35,000 records.
The call to ExecuteReader is taking roughly 3 seconds to return the DataReader.
I'm using code very similar to this to get my items.
var items = new List<Item>();
using (var conn = new SqlConnection(MySQLHelper.ConnectionString))
{
    conn.Open();
    var sqlCommand = SqlHelper.CreateCommand(conn, "spGetItems");
    using (var dr = sqlCommand.ExecuteReader())
    {
        while (dr.Read())
        {
            var item = new Item { ID = dr.GetInt32(0), ItemName = dr.GetString(1) };
            items.Add(item);
        }
    }
}
The majority of the reads take 0 milliseconds. However, intermittently I get a Read that takes about 5.5 seconds (5,000+ milliseconds). I've looked at the data and could find nothing out of the ordinary. I then started looking at how frequently the slow reads occurred.
This was interesting. While not completely consistent, the counts were close. The numbers of records read between the long pauses were as follows...
Record #s: 29, 26, 26, 27, 27, 29, 30, 28, 27, 27, 30, 30, 26, 27
So it looks like 26 to 30 records would read in 0 to a few milliseconds each, then a read would take about 5 seconds, then the next 26 to 30 records would again read as expected.
I'm at a complete loss here. I can post more code, but there isn't much to it. It's pretty simple code.
EDIT
None of my fields are varchar(max), or even close. My largest field is a numeric(28,12).
After modifying my stored procedure, I'm no longer having issues. I first modified it to SELECT TOP 100, then raised that to TOP 1000, then 10,000, and then 100,000. I never had the issue with those. Then I removed the TOP clause entirely, and now I'm not having the issue I was having earlier.

SQL Server buffers the rows it sends to the client into network packets, and SqlDataReader consumes them packet by packet. See this page on MSDN for details:
When the results are sent back to the client, SQL Server puts as many result set rows as it can into each packet, minimizing the number of packets sent to the client.
I suspect that you're getting 26-30 records per packet. As you iterate through the records, you get a delay each time a fresh packet of rows has to arrive from the server.
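If the per-packet wait turns out to be the bottleneck, one thing worth trying is a larger network packet so that more rows arrive per round trip. A minimal sketch, assuming the default 8 KB packet size is in effect (Packet Size is a standard SqlClient connection-string keyword; the 32 KB value is just an example):
// Hypothetical: rebuild the connection string with a larger packet size,
// trading fewer (but longer) waits inside dr.Read().
var builder = new SqlConnectionStringBuilder(MySQLHelper.ConnectionString)
{
    PacketSize = 32768 // default is 8000 bytes
};
using (var conn = new SqlConnection(builder.ConnectionString))
{
    conn.Open();
    // ... execute spGetItems and read as before ...
}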

I had a similar problem. The answer was to cast all the text fields to nvarchar(max); after that, the .NET ExecuteReader call returned in about the same time as an EXEC of the sproc in Management Studio. Note that the sproc didn't contain transactions, but the .NET call was wrapped in a transaction.
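As a sketch of what that cast looks like in the sproc's SELECT (the table and column names here are hypothetical), expressed as the query text you'd find inside the procedure:
// Hypothetical query text illustrating the nvarchar(max) cast.
const string query = @"
    SELECT ID,
           CAST(ItemName AS nvarchar(max)) AS ItemName
    FROM   dbo.Items;";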

Related

How to run multiple queries against SQL Server?

I have read and implemented several different versions of Microsoft's suggested methods for querying a SQL Server database. In all that I have read, each query is surrounded by a using statement, e.g., in some method DoQuery:
List<List<string>> DoQuery(string cStr, string query)
{
    var rows = new List<List<string>>();
    using (SqlConnection c = new SqlConnection(cStr))
    {
        c.Open();
        using (SqlCommand cmd = new SqlCommand(query, c))
        {
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    ...
                    // read columns and put into a list to return
                }
                // close all of the using blocks
            }
        }
    }
    // return the list of rows containing the list of column values.
    return rows;
}
I need to run this code several hundreds of times for different query strings against the same database. It seems that creating a new connection each time would be inefficient and dropping it each time wasteful.
How should I structure this so that it is efficient? When I tried not using a using block and passing the connection into the DoQuery method, I got messages saying the connection had not been closed. If I closed it after the query, I got messages saying it wasn't open.
I'm also trying to improve this because I keep getting somewhat random
IOException: Unable to read data from the transport connection: Operation on non-blocking socket would block.
I'm the only user of the database at this time and I'm not doing anything in multiple threads or async, etc. Just looping through query strings and running DoQuery on them.
Could my structure be part of that problem, i.e. not releasing the resources fast enough and thereby seeing the connection blocked?
I'm stuck here on efficiency and this blocking problem. Thanks in advance.
As it turns out, the query structure was fine and the queries were fine. The problem was that I had an ‘order by X desc’ on each query and that column was not indexed. This caused a full table scan to order the rows, even when only 2 were returned. The table has about 3 million rows and I thought it could handle that better than it does. It timed out even with a 360-second connection timeout! I indexed the column and no more ‘blocking’ nonsense, which, BTW, is a horrible message to return when it was actually a timeout. The queries now run fine if I index every column that appears in a where clause.
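On the structural question itself: ADO.NET pools connections by default, so the open/close in a using block per query is cheaper than it looks; Dispose returns the physical connection to the pool rather than tearing it down. If you'd still rather open once and reuse, a minimal sketch (the queries list is hypothetical):
// Open one connection, run many queries over it, dispose at the end.
using (var c = new SqlConnection(cStr))
{
    c.Open();
    foreach (string query in queries) // hypothetical list of query strings
    {
        using (var cmd = new SqlCommand(query, c))
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // read columns into your row list
            }
        } // reader and command are disposed here; the connection stays open
    }
}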

C# - Query still running in SQL Server Database even though CommandTimeout was set to terminate query after a set time

Within my C# code I am using the CommandTimeout property to ensure that any query that executes for longer than 30s is terminated, both on the server and in the database. However, when listing the currently running queries on the database, the query that was set to cancel after 30s runs well beyond 30s.
DataTable response = new DataTable();
using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (SqlCommand sqlCommand = new SqlCommand(query, connection))
    {
        // Set timeout to 30s
        sqlCommand.CommandTimeout = 30;
        using (SqlDataAdapter da = new SqlDataAdapter(sqlCommand))
        {
            da.Fill(response);
        }
    }
}
Why is the query still running in the DB? Is my only option right now to send another query from the server to kill the query (KILL [session_id]) after 30s?
EDIT: 300 MB of data is being returned by this query.
There are a number of posts on StackOverflow indicating that SqlCommand.CommandTimeout won't affect the behavior of SqlDataAdapter.Fill. Instead, you supposedly have to set the SqlDataAdapter's SelectCommand.CommandTimeout property.
However, there are other posts which seem to indicate that even this doesn't work. This one in particular makes me think that the query will only be canceled if the timeout occurs before the query starts yielding results. Once results start coming in, it appears to ignore all timeouts.
My recommendation would be to reconsider using SqlDataAdapter. Depending on your use case, maybe a library like Dapper would work better for you?
You may also want to consider reporting this as a defect to the .NET team. I've had mixed success in the past reporting such errors; it depends on whether the team wants to prioritize fixing the issue.
Update
It looks like this may be the intended, documented behavior, as Marc Gravell points out here.
lol: from the documentation (https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.commandtimeout(v=vs.110).aspx):
For example, with a 30 second time out, if Read requires two network packets, then it has 30 seconds to read both network packets. If you call Read again, it will have another 30 seconds to read any data that it requires.
So: this timeout resets itself on every Read. So: the only way it'll trip is if any single Read operation takes longer than 30 seconds. As long as SQL Server manages to get at least one row onto the pipe in that time, it won't time out via either API.
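Given that behavior, a hard overall cap has to be enforced by the caller. A minimal sketch (the 30-second budget mirrors the question; SqlCommand.Cancel is a real API, everything else is hypothetical):
// Enforce a wall-clock budget across the whole read loop, since
// CommandTimeout only bounds each individual network wait.
var budget = TimeSpan.FromSeconds(30);
var sw = System.Diagnostics.Stopwatch.StartNew();
using (var reader = sqlCommand.ExecuteReader())
{
    while (reader.Read())
    {
        if (sw.Elapsed > budget)
        {
            sqlCommand.Cancel(); // asks SQL Server to stop the running statement
            break;
        }
        // ... consume the row ...
    }
}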

Why are 700k database rows taking over 15 seconds to load into memory?

Using C# and .NET 3.5, and with either an ADO.NET connection or an OLEDB connection, filling a DataTable or DataSet with 700k rows of 3 columns each takes over 15 seconds.
Executing the actual SELECT statement on the DB takes less than a second. The DB is on a different machine from the one querying it and processing the data. (Perhaps this adds time?)
The data looks like this:
public class Node
{
    public DateTime Timestamp;
    public float Value;
    public string NodeName;
}
Doing it with a SqlDataReader, calling reader.Read(), manually putting the data into a new instance of the above class, and adding it to a List<Node> also takes over 15 seconds.
Code looks like this:
List<Node> data = new List<Node>();
while (reader.Read())
{
    Node n = new Node();
    n.Timestamp = (DateTime)reader["Timestamp"];
    n.Value = (float)reader["Value"];
    n.NodeName = (string)reader["NodeName"];
    data.Add(n);
}
I measured this using the Stopwatch class, in Release mode with optimizations turned on in the project properties.
I get that it has to iterate each record, but I would have expected any machine today to be able to iterate over 700k records in a few seconds, not more.
What could be the reasons this takes over 15 seconds? Am I unreasonable to expect that this should be much faster?
EDIT: Doing SqlDataReader.Read() by itself also takes over 15 seconds.
I think the problem lies in the container you're using. The List<> is being dynamically resized a lot. Try the following procedure instead:
1. Run a query with a COUNT clause to get the number of records, selecting only a single column.
2. List<Node> data = new List<Node>(count from above);
3. Run the normal query.
4. Fill the List<> as above.
This will prevent the list from constantly resizing.
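A minimal sketch of the pre-sizing idea (the table name and the open connection conn are hypothetical):
// Hypothetical: count the rows first so the list never has to grow.
int count;
using (var countCmd = new SqlCommand("SELECT COUNT(*) FROM dbo.Nodes", conn))
{
    count = (int)countCmd.ExecuteScalar();
}
var data = new List<Node>(count); // capacity reserved up front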
Alternatively, to see whether this is the problem, replace the List<> with a LinkedList<>, which doesn't have the resizing issue that List<> does.
It could also be the network speed between the database server and the machine executing the code.
Something else that happens in your loop is that the values from your query are unboxed. It might be worth trying the typed GetString, GetFloat, etc. methods, since you have so many records.
List<Node> data = new List<Node>();
while (reader.Read())
{
    Node n = new Node();
    n.Timestamp = reader.GetDateTime(0); // TODO: Check column numbers
    n.Value = reader.GetFloat(1);
    n.NodeName = reader.GetString(2);
    data.Add(n);
}
No conversions are performed in these methods. From the Remarks in the documentation:
No conversions are performed; therefore, the data retrieved must already be a string, or an exception is generated.
I'm reading a lot of guesses, which could be right, but they are still guesses.
If you run it under the debugger and manually pause it a few times, and each time display the stack, you will be using the random pausing method.
It will tell you exactly what's taking the time and why - no guesswork.
If you want to use a profiler, you need one that samples on wall-clock time.
Otherwise you will have to choose between a) a sampler that gives you line-level inclusive percentages but no visibility into IO time, or b) an instrumenter that only gives you function-level inclusive percentages.
Neither one tells you why the time is being spent, only how much.
Whatever you choose, resist the temptation to look at self time, which is misleading at best in any app that spends most of its time in subfunctions, and which totally ignores IO.
If it is not a code issue, then suspect your query plan.
Make sure you are setting the right options before executing the query, and that they are in the same state on .NET and MSSQL.
One interesting option that has been found to cause a performance hit before is ARITHABORT being enabled on SQL and off on .NET.
Try adding SET ARITHABORT ON before your query in the command.
Refer to:
Slow performance of SqlDataReader
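A minimal sketch of that suggestion (assuming an already-open SqlConnection named conn; the option only needs to be set once per connection):
// Hypothetical: align the session setting with SSMS before the real query,
// so SQL Server can reuse the plan it compiled for Management Studio.
using (var setCmd = new SqlCommand("SET ARITHABORT ON;", conn))
{
    setCmd.ExecuteNonQuery();
}
// ... now execute the slow query on the same connection ...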

Microsoft SQL Server insert data into large table at every second

I have a data acquisition system that reads values from some industrial devices and records the values in a Microsoft SQL Server 2008 R2 database. Data is recorded at approximately 20-second intervals, and each record contains approximately 600 bytes of data.
Now I need to insert data from a new piece of hardware, but this time the record interval has to be 1 second. In other words, I insert one 600-byte record into the SQL Server database every second.
I have two questions:
Is there any possible problem that I may run into while inserting data every second? I think Microsoft SQL Server is quite OK with this insertion frequency, but I am not sure over a long period.
The program is a long-running application. I clear the data table approximately every week. Recording data every second, I will have 3,600 rows in the table every hour, 86,400 rows every day, and approximately 600K rows at the end of the week. Is this OK for reading the data back at a reasonable speed? Or should I change my approach so as not to have that many rows in the table?
By the way, I use LinqToSQL for all my database operations and C# for programming.
Is there any possible problem that I may run into while inserting data every second? I think Microsoft SQL Server is quite OK with this insertion frequency, but I am not sure over a long period.
If the database is properly designed, then you should not run into any problems. We save GIS data at a much greater rate without any issues.
Is this OK for reading the data back at a reasonable speed? Or should I change my approach so as not to have that many rows in the table?
It depends. If you need all the data, then how can you change the approach? And if you don't need it, why save it?
First of all, you must think about the existing indexes on the tables you insert into, because indexes slow down the insert process. Second, if you have the FULL recovery model, then every insert will be written to the transaction log, and your log file will grow rapidly.
Think about changing your recovery model to SIMPLE, and about disabling your indexes.
Of course, selecting rows from that table will then be slower, but I don't know what your query requirements are.
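A sketch of both suggestions issued from C# (the database, table, and index names are hypothetical; run these once as maintenance steps, not per insert):
using (var cmd = connection.CreateCommand())
{
    // Switch to SIMPLE recovery so inserts don't bloat the transaction log.
    cmd.CommandText = "ALTER DATABASE AcquisitionDb SET RECOVERY SIMPLE;";
    cmd.ExecuteNonQuery();

    // Disable a nonclustered index that would otherwise slow every insert.
    cmd.CommandText = "ALTER INDEX IX_Readings_Timestamp ON dbo.Readings DISABLE;";
    cmd.ExecuteNonQuery();
}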
Based on my thesis experience in college: if your system is stable and doesn't crash or overflow, you can use SqlBulkCopy to avoid one I/O operation per record.
This is sample code of a bulk copy from a DataTable; the method is meant to be called every hour:
private void SaveNewData()
{
    // Close the connection used for per-record work before bulk copying.
    if (cmdThesis.Connection.State == ConnectionState.Open)
    {
        cmdThesis.Connection.Close();
    }
    // "Result" is a DataTable field that buffers the rows collected since the last flush.
    using (var bulkCopy = new SqlBulkCopy(@"Data Source=.;Initial Catalog=YourDb;Integrated Security=True"))
    {
        bulkCopy.BatchSize = 3000;
        bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Col1", "Col1"));
        bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Col2", "Col2"));
        bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Col3", "Col3"));
        bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Col4", "Col4"));
        bulkCopy.DestinationTableName = "DestinationTable";
        bulkCopy.WriteToServer(Result);
    }
    Result.Rows.Clear();
}
Although I think you should be OK, since you are apparently on the .NET platform you could also check out StreamInsight: http://technet.microsoft.com/en-us/library/ee391416.aspx

ADO.Net DataReader timeout issue

I am using C# with ADO.NET in VSTS 2008 to connect to SQL Server 2008 Enterprise. I am using almost the same pattern/sample as mentioned here -- using an ADO.NET DataReader to retrieve data one entry (row) at a time.
http://msdn.microsoft.com/en-us/library/haa3afyz.aspx
My question is: if I set the SqlCommand timeout in this sample,
1. I think the timeout applies to the maximum time allowed to retrieve one specific row, not the total time for the whole entry-by-entry loop?
BTW, by loop I mean:
while (reader.Read())
{
    Console.WriteLine("{0}\t{1}", reader.GetInt32(0),
        reader.GetString(1));
}
2. And this timeout only concerns how much time it takes to retrieve a data entry from the database; it has nothing to do with how much time we spend processing each entry (e.g. if we set the timeout to 20 seconds, it takes 1 second to retrieve one data entry from the database, and it takes 30 seconds for my application logic to manipulate that entry, the timeout will never happen).
Is my understanding correct?
The command timeout that you can set applies to how long you give ADO.NET to do its job.
If you call cmdQuery.ExecuteNonQuery(), which returns nothing but performs a SQL statement, it's the time needed to perform that statement.
If you call cmdQuery.ExecuteReader(), which returns a data reader, it's the time needed for ADO.NET to set up / construct that data reader so that you can then use it.
If you call cmdQuery.ExecuteScalar() which returns a single scalar value, it's the time needed to execute the query and grab that single result.
If you use the dataAdapter.Fill() to fill a data table or data set, it's the time needed for ADO.NET to retrieve the data and then fill the data table or data set.
So overall : the timeout applies to the portion of the job that ADO.NET can do - execute the statement, fill a data set, return a scalar value.
Of course it does NOT apply to the time it takes YOU to iterate through the results (in case of a data reader). That wouldn't make sense at all...
Marc
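To make that distinction concrete, a minimal sketch (ProcessRow and the 30-second sleep stand in for slow application logic and are hypothetical):
using (var reader = cmdQuery.ExecuteReader())
{
    while (reader.Read()) // CommandTimeout bounds each of these network waits...
    {
        ProcessRow(reader);                      // hypothetical application logic
        Thread.Sleep(TimeSpan.FromSeconds(30));  // ...but time spent here never trips it
    }
}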
Yes, you are right. The CommandTimeout is the time the database is given to execute the command (any command).
