I'm still a bit of a newbie, and I have been assigned the task of maintaining previously written code.
I have a web app that simulates SQL Server Management Studio, limiting delete options, for example, so basic users don't wreck our servers.
Well, we have a function that expects a query or queries. It works fine, but complex queries blow up our server's RAM; maybe it's not that much data, but the queries cast XML and do other things that I still don't fully understand in SQL.
This is the actual function:
public DataSet ExecuteMultipleQueries(string queries)
{
    var results = new DataSet();
    using (var myConnection = new SqlConnection(_connectionString))
    {
        myConnection.Open();
        var sqlCommand = myConnection.CreateCommand();
        sqlCommand.Transaction = myConnection.BeginTransaction(IsolationLevel.ReadUncommitted);
        sqlCommand.CommandTimeout = AppSettings.SqlTimeout;
        sqlCommand.CommandText = queries.Trim();
        var dataAdapter = new SqlDataAdapter { SelectCommand = sqlCommand };
        dataAdapter.Fill(results);
        return results;
    }
}
I'm a bit lost; I've read many different answers, but either I don't understand them properly or they don't solve my problem in any way.
I know I could use LINQ to SQL or Entity Framework. I tried them, but I really don't know how to use them with an "unknown" query. I could research more anyway, so if you think they will help me approach a solution, by all means, I will try to learn them.
So, to the point:
The function seems to stop at dataAdapter.Fill(results) when debugging; that is the point where the server tries to answer the query, consumes all its RAM and locks up. How can I solve this? I thought maybe I could make SQL return a certain amount of data, store it in some collection, then continue returning data, and keep going until there is no more data to return from SQL, but I really don't know how to detect whether SQL has any data left to return.
I also thought about reading and storing in two different threads, but I don't know how data read on one thread can be stored by another thread asynchronously (and even less whether that would solve the issue).
So, yes, nothing is clear to me at all, so any guidance or tip would be highly appreciated.
Thanks in advance and sorry for the long post.
You can use pagination to fetch only part of the data at a time.
Your code would look something like this:
dataAdapter.Fill(results, 0, pageSize, "QueryResults");
pageSize can be whatever size you want (100 or 250, for example), and the last argument is the name of the table to fill inside the DataSet.
You can get more information in this msdn article.
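A minimal sketch of paging through the results with that overload, reusing the dataAdapter and results objects from the question (the source-table name and page size are illustrative). Note that this limits how many rows land in the DataSet per call; the query itself still runs on the server, so it mainly protects client memory:

const int pageSize = 250;   // illustrative page size
var page = 0;
int rowsFetched;
do
{
    // Fill(DataSet, startRecord, maxRecords, srcTable) adds at most pageSize rows per call.
    rowsFetched = dataAdapter.Fill(results, page * pageSize, pageSize, "QueryResults");
    // Process or display results.Tables["QueryResults"] here, and clear it before the
    // next page if you want to keep memory flat.
    page++;
} while (rowsFetched == pageSize);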
In order to investigate, try the following:
Start SQL Profiler (it is usually installed along with SSMS and can be started from Management Studio, via the Tools menu).
Make sure you set up some filters (either the NT username or at least the database you are profiling). This is to catch queries as specific (i.e. only yours) as possible.
Include starting events so you can see when your query starts (e.g. RPC:Starting).
Start your application.
Start the profiler before issuing the query (i.e. before filling the adapter).
Issue the query -> you should see the query start in the profiler.
Stop the profiler so it does not catch other queries (it puts overhead on SQL Server).
Stop the application (no reason to mess with the server until the analysis is done).
Take the query into SQL Server Management Studio. I expect a SELECT that returns a lot of data. Do not run it as it is, but add a TOP to limit its results, e.g. SELECT TOP 1000 <some columns> from ....
If the TOPed SELECT runs slowly, you are returning too much data.
This may be due to returning some large fields such as N/VARCHAR(MAX) or VARBINARY(MAX). One possible solution is to exclude these fields from the initial SELECT and lazy-load that data as needed.
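A minimal sketch of that lazy-loading idea (the table, column and variable names are purely illustrative, and an open connection is assumed):

// First pass: fetch only the lightweight columns for display.
using (var listCommand = new SqlCommand("SELECT Id, Name FROM dbo.Documents", myConnection)) // hypothetical table
using (var reader = listCommand.ExecuteReader())
{
    while (reader.Read())
    {
        // bind Id/Name to the grid; do not touch the large columns yet
    }
}

// Second pass, only when a row is actually opened: fetch the heavy column for that one row.
using (var detailCommand = new SqlCommand("SELECT Body FROM dbo.Documents WHERE Id = @Id", myConnection))
{
    detailCommand.Parameters.AddWithValue("@Id", selectedId);
    var body = (string)detailCommand.ExecuteScalar();
}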
Check these steps and come back with your actual query, if needed.
I am currently facing an issue where it takes quite a while to process information from a server, and I would like to provide active feedback so the user knows what is going on while the application appears to just be sitting there.
A little about the application: it allows the user to pull all databases from a specified server along with all content within those databases. This can sometimes take quite a while, since some of our databases can reach 2 TB in size. Each server can contain hundreds of databases, so if I try to load a server with 100 databases and 30% of those databases are over 100 GB in size, it takes a good couple of minutes before the application is responsive again.
Currently I just have a simple loading message that says: "Please wait, this could take a while...". However, in my opinion, this is not really sufficient for something that can take a few minutes.
So I am wondering: is there a way to track the progress of the SqlConnection object as it executes the specified query? If so, what kind of details would I be able to provide, and are there any readily available resources to look over to better understand the solution?
I am hopeful that there is a way to do this without having to recreate the SqlConnection object altogether.
Thank you all for your help!
EDIT
Also, as another note: I am NOT looking for handouts of code here. I am looking for resources that will help me in this situation, if any are available. I have been looking into this issue for a few days already; I figure the best place to ask for help at this point is here.
Extra
A more thorough explanation: I am wondering if there is a way to provide the user with the names of the databases, tables, views, etc. that are currently being received.
For example:
Loading Table X From Database Y On Server Z
Loading Table A From Database B On Server Z
The SqlConnection has an InfoMessage event to which you can assign a handler.
Furthermore, you have to set the FireInfoMessageEventOnUserErrors property to true.
Now you can do something like:
private void OnInfoMessage(object sender, SqlInfoMessageEventArgs e)
{
    for (var index = 0; index < e.Errors.Count; index++)
    {
        var message = e.Errors[index];
        //use the message object
    }
}
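Wiring the event up might look like this (a minimal sketch; the connection string and the command are placeholders, and the handler above is assumed to live in the same class):

using (var connection = new SqlConnection(connectionString))
{
    connection.FireInfoMessageEventOnUserErrors = true;
    connection.InfoMessage += OnInfoMessage;
    connection.Open();

    using (var command = new SqlCommand("EXEC dbo.LoadAllDatabases", connection)) // hypothetical procedure
    {
        command.ExecuteNonQuery(); // OnInfoMessage fires as messages arrive
    }
}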
Note that you should only evaluate messages with a severity (SqlError.Class) lower than 11; everything above that is a 'real' error.
Depending on what command you are running, the server sometimes already generates such info messages itself (for example, when verifying a backup).
Anyway, you can also use this to report progress from a stored procedure or a query.
Pseudo-code (I'm not an SQL guy):
RAISERROR('START', 1, 1) WITH NOWAIT; -- will invoke your OnInfoMessage method
WHILE <condition>
BEGIN
    -- Execute something
    RAISERROR('Done something', 1, 1) WITH NOWAIT; -- will invoke your OnInfoMessage method
    -- etc.
END
Have fun with the parsing, and remember: such messages can also be generated by the server itself, so it is probably a good idea to start your own, relevant "errors" with a static marker sequence that cannot occur under normal circumstances.
I'm working on a project where we're receiving data from multiple sources that needs to be saved into various tables in our database.
Fast.
I've played with various methods, and the fastest I've found so far is using a collection of TableValue parameters, filling them up and periodically sending them to the database via a corresponding collection of stored procedures.
The results are quite satisfying. However, looking at disk usage (% Idle Time in Perfmon), I can see that the disk is getting periodically 'thrashed' (a spike down to 0% every 13-18 seconds), whilst in between the % Idle Time is around 90%. I've tried varying the batch size, but it doesn't have an enormous influence.
Should I be able to get better throughput by (somehow) avoiding the spikes while decreasing the overall idle time?
What are some things I should be looking at to work out where the spiking is happening? (The database is in Simple recovery mode and pre-sized to 'big', so it's not the log file growing.)
Bonus: I've seen other questions referring to 'streaming' data into the database, but this seems to involve having a Stream from another database (last section here). Is there any way I could shoehorn 'pushed' data into that?
A very easy way of inserting loads of data into SQL Server is, as mentioned, the bulk insert method. ADO.NET offers a very easy way of doing this without the need for external files. Here's the code:
var bulkCopy = new SqlBulkCopy(myConnection);
bulkCopy.DestinationTableName = "MyTable";
bulkCopy.WriteToServer(myDataTable);
That's easy.
But: myDataTable needs to have exactly the same structure as MyTable, i.e. names, field types and order of fields must be exactly the same. If not, well, there's a solution to that: column mapping. And it is even easier to do:
bulkCopy.ColumnMappings.Add("ColumnNameOfDataSet", "ColumnNameOfTable");
That's still easy.
But: myDataTable needs to fit into memory. If it doesn't, things become a bit trickier, as we then need an IDataReader implementation that we can instantiate over an IEnumerable.
You might get all the information you need in this article.
Building on the code referred to in alzaimar's answer, I've got a proof of concept working with IObservable (just to see if I can). It seems to work OK. I just need to put together some tidier code to see whether this is actually any faster than what I already have.
(The following code only really makes sense in the context of the test program in the code download from the aforementioned article.)
Warning: NSFW, copy/paste at your peril!
private static void InsertDataUsingObservableBulkCopy(IEnumerable<Person> people,
                                                      SqlConnection connection)
{
    var sub = new Subject<Person>();
    var bulkCopy = new SqlBulkCopy(connection);
    bulkCopy.DestinationTableName = "Person";
    bulkCopy.ColumnMappings.Add("Name", "Name");
    bulkCopy.ColumnMappings.Add("DateOfBirth", "DateOfBirth");

    using (var dataReader = new ObjectDataReader<Person>(people))
    {
        var task = Task.Factory.StartNew(() =>
        {
            bulkCopy.WriteToServer(dataReader);
        });

        var stopwatch = Stopwatch.StartNew();
        foreach (var person in people) sub.OnNext(person);
        sub.OnCompleted();
        task.Wait();

        Console.WriteLine("Observable Bulk copy: {0}ms",
            stopwatch.ElapsedMilliseconds);
    }
}
It's difficult to comment without knowing the specifics, but one of the fastest ways to get data into SQL Server is Bulk Insert from a file.
You could write the incoming data to a temp file and periodically bulk insert it.
Streaming data into a SQL Server table-valued parameter also looks like a good solution for fast inserts, as they are held in memory. In answer to your question: yes, you could use this; you just need to turn your data into an IDataReader. There are various ways to do this, from a DataTable for example, see here.
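For example, a minimal sketch of passing a DataTable as a table-valued parameter. This assumes a user-defined table type dbo.PersonType and a procedure dbo.InsertPeople with a @People parameter of that type already exist on the server (both names are illustrative):

var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("DateOfBirth", typeof(DateTime));
table.Rows.Add("Alice", new DateTime(1980, 1, 1));

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.InsertPeople", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@People", table);
    parameter.SqlDbType = SqlDbType.Structured;   // marks it as a TVP
    parameter.TypeName = "dbo.PersonType";        // the user-defined table type
    connection.Open();
    command.ExecuteNonQuery();
}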
If your disk is a bottleneck, you could always optimise your infrastructure: put the database on a RAM disk or SSD, for example.
I have a query (about 1,600 lines, stored as a stored procedure) that takes about 3 seconds to execute (after optimizing it by adding the correct indexes) when executed inside SQL Server Management Studio.
I wrote a wrapper for this in C# and exposed it so the query can be executed via a URI. However, this takes more than 30 seconds to execute, and because of this, when I run the query as part of a loop, the browser stalls due to too many pending requests. I wrote the wrapper like this:
try
{
    string ConString = Constants.connString;
    using (con = new SqlConnection(ConString))
    {
        cmd = new SqlCommand(sql, con);
        con.Open();
        dr = cmd.ExecuteReader();
        while (dr.Read())
        {
            ...
        }
    }
}
My connection string is this:
Data Source={0};Initial Catalog={1};Integrated Security=True;MultipleActiveResultSets=true
I know the query itself is good because I've run it multiple times inside SSMS and it worked fine (under 5 seconds on average). I'd be happy to provide more debug information, except that I don't know what to provide.
To troubleshoot such problems, where would I start?
EDIT:
I ran SQL Profiler and collected some stats. This is what I am observing. Very strange, since it is the exact same query being executed. Let me know if there is anything else I can do at this point.
OK; finally found the answer here and here. The answer is replicated here for convenience. Thanks go to the original poster, Jacques Bosch, who in turn took it from here. Hard to believe this problem was solved back in 2004!
The problem seems to be caused by SQL Server's Parameter Sniffing.
To prevent it, just assign your incoming parameter values to other variables declared right at the top of your SP.
See this nice article about it.
Example:
CREATE PROCEDURE dbo.MyProcedure
(
    @Param1 INT
)
AS
    DECLARE @MyParam1 INT
    SET @MyParam1 = @Param1

    SELECT * FROM dbo.MyTable WHERE ColumnName = @MyParam1
GO
I copied this information from eggheadcafe.com.
Queries often run faster within SQL Server management studio due to caching of query plans. Try running sp_recompile on your stored procedure before benchmarking. This will clear the query plan.
More details can be found here: http://www.sommarskog.se/query-plan-mysteries.html
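For instance, a minimal sketch of doing that from C# before a benchmark run (the procedure name is illustrative and an open connection is assumed):

using (var command = new SqlCommand("sp_recompile", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.AddWithValue("@objname", "dbo.MyLongProcedure"); // hypothetical procedure name
    command.ExecuteNonQuery(); // the next execution will compile a fresh plan
}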
Why don't you use XML instead of a result set?
As far as I know, using XML is much faster than reading a result set,
so in this case you could use something like this:
SELECT *
FROM [Table Name]
FOR XML PATH('[Your Path]'), ELEMENTS XSINIL, ROOT('Your Root')
and after that I think you can deserialize it in your project.
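On the client side, a minimal sketch of reading that XML back (assuming the SELECT above is used as the command text; XDocument needs a reference to System.Xml.Linq):

using (var command = new SqlCommand(selectForXmlQuery, connection)) // the FOR XML query from above
using (var xmlReader = command.ExecuteXmlReader())
{
    var document = XDocument.Load(xmlReader);
    // query or deserialize the document as needed
}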
I had the same problem once, and it turned out that the difference was caused by the ARITHABORT setting, which is different when you connect via SQL Server Management Studio. The .NET connection sets ARITHABORT to OFF, while Management Studio sets it to ON.
You can check whether you have the same problem by executing SET ARITHABORT OFF in Management Studio and then running your query.
Here is a thread that explains why this setting can cause such dramatic performance differences.
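If that turns out to be the cause, one possible workaround (a sketch, not a recommendation over fixing the underlying plan; the procedure name is illustrative) is to issue the setting on the connection before running the query:

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    using (var setCommand = new SqlCommand("SET ARITHABORT ON;", connection))
    {
        setCommand.ExecuteNonQuery(); // match the session setting SSMS uses
    }

    using (var command = new SqlCommand("dbo.MyLongProcedure", connection)) // hypothetical procedure
    {
        command.CommandType = CommandType.StoredProcedure;
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // process rows
            }
        }
    }
}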
I'm working on a .NET component that gets a set of data from the database, performs some business logic on that set of data, and then updates single records in the database via a stored procedure that looks something like spUpdateOrderDetailDiscountedItem.
For small sets of data this isn't a problem, but when I had a very large set of data that required 368 separate stored proc calls to update the records in the database, I realized I had a problem. A senior dev looked at my stored proc code and said it looked fine, but now I'd like to explore a better method for sending batch data to the database.
What options do I have for updating the database in batch? Is this possible with stored procs? What other options do I have?
I won't have the option of installing a full-fledged ORM, but any advice is appreciated.
Additional Background Info:
Our current data access model was built 5 years ago and all calls to the db currently get executed via modular/static functions with names like ExecQuery and GetDataTable. I'm not certain that I'm required to stay within that model, but I'd have to provide a very good justification for going outside of our current DAL to get to the DB.
Also worth noting, I'm fairly new when it comes to CRUD operations and the database. I much prefer to play/work in the .NET side of code, but the data has to be stored somewhere, right?
Stored Proc contents:
ALTER PROCEDURE [dbo].[spUpdateOrderDetailDiscountedItem]
    -- Add the parameters for the stored procedure here
    @OrderDetailID decimal = 0,
    @Discount money = 0,
    @ExtPrice money = 0,
    @LineDiscountTypeID int = 0,
    @OrdersID decimal = 0,
    @QuantityDiscounted money = 0,
    @UpdateOrderHeader int = 0,
    @PromoCode varchar(6) = '',
    @TotalDiscount money = 0
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- Insert statements for procedure here
    Update OrderDetail
    Set Discount = @Discount, ExtPrice = @ExtPrice, LineDiscountTypeID = @LineDiscountTypeID, LineDiscountPercent = @QuantityDiscounted
    From OrderDetail with (nolock)
    Where OrderDetailID = @OrderDetailID

    if @UpdateOrderHeader = -1
    Begin
        -- This should only run the last time this query is executed, but only then.
        exec spUpdateOrdersHeaderForSkuGroupSourceCode @OrdersID, 7, 0, @PromoCode, @TotalDiscount
    End
END
If you are using SQL 2008, then you can use a table-valued parameter to push all of the updates in one s'proc call.
Update:
Incidentally, we are using this in combination with the merge statement. That way sql server takes care of figuring out if we are inserting new records or updating existing ones. This mechanism is used at several major locations in our web app and handles hundreds of changes at a time. During regular load we will see this proc get called around 50 times a second and it is MUCH faster than any other way we've found... and certainly a LOT cheaper than buying bigger DB servers.
An easy alternative I've seen in use is to build a single SQL statement consisting of EXEC calls to the sproc with the parameters embedded in the string. Not sure whether this is advisable or not, but from the .NET perspective, you are only populating one SqlCommand and calling ExecuteNonQuery once...
Note: if you choose this, then please, please use a StringBuilder! :-)
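A minimal sketch of that approach (the collection and property names are illustrative; in real code the values should be sanitized, or better, passed as real parameters):

var builder = new StringBuilder();
foreach (var detail in orderDetails) // hypothetical collection of pending updates
{
    builder.AppendFormat(
        "EXEC spUpdateOrderDetailDiscountedItem @OrderDetailID = {0}, @Discount = {1};",
        detail.OrderDetailId, detail.Discount);
    builder.AppendLine();
}

using (var command = new SqlCommand(builder.ToString(), connection))
{
    command.CommandType = CommandType.Text;
    command.ExecuteNonQuery(); // one round trip instead of 368
}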
Update: I much prefer Chris Lively's answer, didn't know about table-valued parameters until now... unfortunately the OP is using 2005.
You can send the full set of data as XML input to the stored procedure, then perform set-based operations inside the procedure to modify the database. Set-based will beat RBAR on performance almost every single time.
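A minimal sketch of that idea (the element layout and procedure name are illustrative, and the proc would shred the XML with nodes()/value() into a set-based UPDATE; XElement needs a reference to System.Xml.Linq):

var xml = new XElement("Updates",
    orderDetails.Select(d => new XElement("Detail",
        new XAttribute("OrderDetailID", d.OrderDetailId),
        new XAttribute("Discount", d.Discount)))).ToString();

using (var command = new SqlCommand("dbo.spUpdateOrderDetailsFromXml", connection)) // hypothetical procedure
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.Add("@Updates", SqlDbType.Xml).Value = xml;
    command.ExecuteNonQuery();
}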
If you are using a version of SQL Server prior to 2008, you can move your code entirely into the stored procedure itself.
There are good and "bad" things about this.
Good
No need to pull the data across a network wire.
Faster if your logic is set based
Scales up
Bad
If you have rules against any logic in the database, this would break your design.
If the logic cannot be set based then you might end up with a different set of performance problems
If you have outside dependencies, this might increase difficulty.
Without details on exactly what operations you are performing on the data it's hard to give a solid recommendation.
UPDATE
Ben asked what I meant in one of my comments about the CLR and SQL Server. Read Using CLR Integration in SQL Server 2005. The basic idea is that you can write .NET code to do your data manipulation and have that code live inside SQL Server itself. This saves you from having to read all of the data across the network and send updates back that way.
The code is callable by your existing procs and gives you the entire power of .NET, so you don't have to do things like cursors. The SQL stays set-based while the .NET code can perform operations on individual records.
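A minimal sketch of what such a CLR procedure looks like (this assumes CLR integration is enabled and the assembly is deployed; the class name and the UPDATE are placeholders):

using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public static class DiscountProcedures
{
    [SqlProcedure]
    public static void ApplyDiscounts()
    {
        // "context connection=true" runs on the same session that called the proc,
        // so no data has to leave the server.
        using (var connection = new SqlConnection("context connection=true"))
        {
            connection.Open();
            using (var command = new SqlCommand(
                "UPDATE OrderDetail SET Discount = Discount", connection)) // placeholder logic
            {
                command.ExecuteNonQuery();
            }
        }
    }
}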
Incidentally, this is how things like hierarchyid were implemented in SQL Server 2008.
The only real downside is that some DBA's don't like to introduce developer code like this into the database server. So depending on your environment, this may not be an option. However, if it is, then it is a very powerful way to take care of your problem while leaving the data and processing within your database server.
Could you create a batched statement with 368 calls to your proc? Then at least you would not have 368 round trips. Pseudo-code:
var lotsOfCommands = "spUpdateOrderDetailDiscountedItem 1; spUpdateOrderDetailDiscountedItem 2; ... spUpdateOrderDetailDiscountedItem 368";
var command = new SqlCommand(lotsOfCommands, connection);
command.CommandType = CommandType.Text;
// execute command
I had issues when trying to do the same thing (via inserts, updates, whatever). While using an OleDbCommand with parameters, it took a lot of time to constantly re-create the object and parameters each time I called it. So I made a property on my object to hold the command and added the appropriate parameters to it up front. Then, when I needed to actually call/execute it, I would loop through each parameter in the object, set it to whatever I needed it to be, and then execute it. This created a SIGNIFICANT performance improvement. Pseudo-code of my operation:
protected OleDbCommand oSQLInsert = new OleDbCommand();

// the "?" are place-holders for parameters... they can be named parameters,
// this is just for visual purposes
oSQLInsert.CommandText = "insert into MyTable ( fld1, fld2, fld3 ) values ( ?, ?, ? )";

// Now, add the parameters
OleDbParameter NewParm = new OleDbParameter("parmFld1", 0);
oSQLInsert.Parameters.Add(NewParm);

NewParm = new OleDbParameter("parmFld2", "something");
oSQLInsert.Parameters.Add(NewParm);

NewParm = new OleDbParameter("parmFld3", 0);
oSQLInsert.Parameters.Add(NewParm);
Now the SQL command and the place-holders for the call are all ready to go. When I'm ready to actually call it, I do something like:
oSQLInsert.Parameters[0].Value = 123;
oSQLInsert.Parameters[1].Value = "New Value";
oSQLInsert.Parameters[2].Value = 3;
Then just execute it. Hundreds of repeated calls can be killed, time-wise, by re-creating your commands over and over...
Good luck.
Is this a one-time action (like "just import those 368 new customers once") or do you regularly have to do 368 sproc calls?
If it's a one-time action, just go with the 368 calls.
(if the sproc does much more than just updates and is likely to drag down the performance, run it in the evening or at night or whenever no one's working).
IMO, premature optimization of database calls for one-time actions is not worth the time you spend with it.
Bulk CSV Import
Build the data output as CSV via a StringBuilder, then do a bulk CSV import:
http://msdn.microsoft.com/en-us/library/ms188365.aspx
Table-valued parameters would be best, but since you're on SQL 05, you can use the SqlBulkCopy class to insert batches of records. In my experience, this is very fast.
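One way to apply that here, sketched under the assumption of a staging table (which is not part of the original answer): bulk-copy the pending changes into a hypothetical staging table and then let a follow-up proc apply them as one set-based UPDATE.

using (var bulkCopy = new SqlBulkCopy(connection))
{
    bulkCopy.DestinationTableName = "dbo.OrderDetailStaging"; // hypothetical staging table
    bulkCopy.BatchSize = 500;                                 // rows sent per round trip
    bulkCopy.WriteToServer(updatesTable);                     // a DataTable of pending updates
}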
Update: Looks like the query does not throw any timeout. The connection is timing out.
This is sample code for executing a query. Sometimes, while executing time-consuming queries, it throws a timeout exception.
I cannot use any of these techniques:
1) Increase timeout.
2) Run it asynchronously with a callback. This needs to run in a synchronous manner.
Please suggest any other techniques to keep the connection alive while executing a time-consuming query.
private static void CreateCommand(string queryString, string connectionString)
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        SqlCommand command = new SqlCommand(queryString, connection);
        command.Connection.Open();
        command.ExecuteNonQuery();
    }
}
Since you are using ExecuteNonQuery, which does not return any rows, you can try this polling-based approach. It executes the query in an async manner (without a callback),
but the application will wait (inside a while loop) until the query is complete. From MSDN. This should solve the timeout problem. Please try it out.
But I agree with the others that you should think more about optimizing the query to run in under 30 seconds.
IAsyncResult result = command.BeginExecuteNonQuery();
int count = 0;
while (!result.IsCompleted)
{
    Console.WriteLine("Waiting ({0})", count++);
    System.Threading.Thread.Sleep(1000);
}
Console.WriteLine("Command complete. Affected {0} rows.",
    command.EndExecuteNonQuery(result));
You should first check your query to see whether it's optimized and isn't suffering from missing indexes. 30 seconds is a lot for most queries, even on large databases, if they are properly tuned. If you have solid proof from the query plan that the query can't be executed any faster than that, then you should increase the timeout; there's no other way to keep the connection open, since the purpose of the timeout is to terminate the connection if the query doesn't complete in that time frame.
I have to agree with Terrapin.
You have a few options on how to get your time down. First, if your company employs DBAs, I'd recommend asking them for suggestions.
If that's not an option, or if you want to try some other things first here are your three major options:
Break up the query into components that run under the timeout. This is probably the easiest.
Change the query to optimize the access path through the database (generally: hitting an index as closely as you can)
Change or add indices to affect your query's access path.
If you are constrained from simply changing the timeout value, you will most likely have to do a lot more work. The following options come to mind:
Validate with your DBAs and with another code review that you have truly optimized the query as well as possible.
Work on the underlying DB structure to see if there is any gain to be had on the DB side, e.g. by creating or modifying index(es).
Divide it into multiple parts, even if this means running procedures with multiple return parameters that simply call another proc. (This option is not elegant, and honestly, if your code REALLY is going to take this much time, I would go to management and re-discuss the 30-second timeout.)
We recently had a similar issue on a SQL Server 2000 database.
During your query, run this query on your master database on the db server and see if there are any locks you should troubleshoot:
select
spid,
db_name(sp.dbid) as DBname,
blocked as BlockedBy,
waittime as WaitInMs,
lastwaittype,
waitresource,
cpu,
physical_io,
memusage,
loginame,
login_time,
last_batch,
hostname,
sql_handle
from sysprocesses sp
where (waittype > 0 and spid > 49) or spid in (select blocked from sysprocesses where blocked > 0)
SQL Server Management Studio 2008 also contains a very cool activity monitor which lets you see the health of your database during your query.
In our case, it was a NETWORKIO wait that kept the database busy. It was some legacy VB code which didn't disconnect its result set quickly enough.
If you are prohibited from using the features of the data access API to allow a query to last more than 30 seconds, then we need to see the SQL.
The performance gains to be made by optimizing the use of ADO.NET are slight in comparison to the gains of optimizing the SQL.
And you are already using the most efficient method of executing SQL. Other techniques would be mind-numbingly slower (although, if you did a quick retrieval of your rows and some really slow client-side processing using DataSets, you might be able to get the initial retrieval down to less than 30 seconds, but I doubt it).
If we knew you were doing inserts, then maybe you should be using bulk insert. But we don't know the content of your SQL.
This is an UGLY hack, but it might help solve your problem temporarily until you can fix the real problem:
private static void CreateCommand(string queryString, string connectionString)
{
    int maxRetries = 3;
    int retries = 0;
    while (true)
    {
        try
        {
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                SqlCommand command = new SqlCommand(queryString, connection);
                command.Connection.Open();
                command.ExecuteNonQuery();
            }
            break;
        }
        catch (SqlException se)
        {
            if (se.Message.IndexOf("Timeout", StringComparison.InvariantCultureIgnoreCase) == -1)
                throw; // not a timeout

            if (retries >= maxRetries)
                throw new Exception(String.Format("Timed out {0} times", retries), se);
                // or break to throw no error

            retries++;
        }
    }
}
command.CommandTimeout *= 2;
That will double the default time-out, which is 30 seconds.
Or, put the value for CommandTimeout in a configuration file, so you can adjust it as needed without recompiling.
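For example, a minimal sketch reading the value from App.config (the key name is illustrative; ConfigurationManager needs a reference to System.Configuration):

// App.config: <appSettings> <add key="SqlCommandTimeoutSeconds" value="120" /> </appSettings>
var configured = ConfigurationManager.AppSettings["SqlCommandTimeoutSeconds"];
int seconds;
command.CommandTimeout = int.TryParse(configured, out seconds) ? seconds : 30; // fall back to the default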
You should break your query up into multiple chunks that each execute within the timeout period.
If you absolutely cannot increase the timeout, your only option is to reduce the time of the query to execute within the default 30 second timeout.
I tend to dislike increasing the connection/command timeout, since to my mind that would be treating the symptom rather than the problem.
Have you thought about breaking the query down into several smaller chunks?
Also, have you run your query against the Database Engine Tuning Advisor? It lives in:
Management Studio > Tools > Database Engine Tuning Advisor
Lastly, could we get a look at the query itself?
cheers
Have you tried wrapping your SQL inside a stored procedure? They seem to have better memory management. I have seen timeouts like this before with plain SQL statements containing internal queries, using classic ADO, i.e. select * from (select ....) t inner join somethingTable, where the internal query was returning a very large number of results.
Other tips:
1. Perform reads with the WITH (NOLOCK) table hint; it's dirty and I don't recommend it, but it tends to be faster.
2. Look at the execution plan of the SQL you're trying to run; reduce the row scanning and review the order in which you join tables.
3. Look at adding some indexes to your tables for faster reads.
4. I've also found that deleting rows is very expensive; you could try to limit the number of rows per call.
5. Swapping @table variables for #temporary tables has also worked for me in the past.
6. You may also have a bad execution plan cached (heard of it, never seen it).
Hope this helps
Update: Looks like the query does not throw any timeout. The connection is timing out.
In other words, even if you don't execute a query, the connection times out? There are two time-outs: connection and query. Everybody seems to focus on the query, but if you are getting connection timeouts, it's a network problem that has nothing to do with the query: the connection first has to be established before a query can be run at all.
It might be worth trying paging the results back.
Just set the SqlCommand's CommandTimeout property to 0; this will cause the command to wait until the query finishes...
For example:
SqlCommand cmd = new SqlCommand(spName,conn);
cmd.CommandType = CommandType.StoredProcedure;
cmd.CommandTimeout = 0;