I am noticing the parse_calls are equal to the number of executions in our Oracle 11g database.
select parse_calls, executions
from v$sql order by parse_calls desc;
Running the above query gives the following result.
"PARSE_CALLS" "EXECUTIONS"
87480 87480
87475 87476
87044 87044
26662 26662
21870 21870
21870 21870
As far as I'm aware, this is a major performance drawback. All of these SQL statements are either stored procedures or use bind variables. I'm also reusing the command objects that call the stored procedures from C#.
How do I reduce the number of parse calls in this?
Also, is there some method I can distinguish between hard parses and soft parses?
EDIT:
As @DCookie suggested, I ran the following query on the database.
SELECT s2.name, SUM(s1.value)
FROM v$sesstat s1 join v$statname s2 on s1.statistic# = s2.statistic#
WHERE s2.name LIKE '%parse count%'
GROUP BY s2.name
ORDER BY 1,2;
The result is as below
"NAME" "SUM(S1.VALUE)"
"parse count (describe)" 0
"parse count (failures)" 29
"parse count (hard)" 258
"parse count (total)" 11471
So the number of hard parses seems to be very low compared to the total number of parses. Thanks to everyone for their responses :)
FINAL UPDATE:
The main cause of the parsing was that we had connection pooling turned off in the connection string. After turning on connection pooling, the parsing problem was completely resolved.
Start with this:
SELECT name, SUM(value)
FROM v$sesstat s1 join v$statname s2 on s1.statistic# = s2.statistic#
WHERE s2.name LIKE '%parse count%'
GROUP BY name
ORDER BY 1,2;
This will give you the number of hard parses and total parses. The parse_calls values in your query are total parses, hard and soft.
What does your SQL do? Not much cursor processing, mostly single statements? You are getting pretty much a 1-1 ratio of executions to parses, which, if they are soft parses, means you are not doing much cursor processing.
EDIT:
Unless you can modify your code to open and hang on to a cursor for each of your SQL statements, and reuse them as much as possible within a session, I don't think you can avoid the parses.
A parse call has to occur any time a new cursor is created, even if the statement is in the library cache. It is the parse call that checks the library cache. If the statement is found in the library cache, it is a soft parse.
@DCookie has given an answer to your question about checking hard vs. soft parse count. I expect you will find most parse calls are soft parses. Note that you shouldn't expect the counts returned from v$sysstat to be very close to total parse calls from v$sql, since the former is a count since instance startup and the latter just shows the statements that are currently in the library cache.
The only way to avoid parse calls entirely is to keep a handle to an existing cursor, and execute it when needed, binding new values in when appropriate. This will occur sometimes by caching of cursors -- it's out of your explicit control, although I believe there are parameters (such as SESSION_CACHED_CURSORS) you can change to affect it. In PL/SQL code you can explicitly hold onto cursors that you create and manage using the DBMS_SQL package. I would expect that C# has corresponding capabilities.
In any case, what you're looking at is probably not a problem. Just because counts seem high in no way implies that parsing is a bottleneck on your system.
First of all, you should check whether the SQL statements with those high parse counts are even in your control. When I did a modified version of your query on one of my systems:
select parse_calls, executions, parsing_schema_name,sql_text
FROM v$sql
ORDER BY parse_calls DESC;
I found that the statements with the highest number of parse calls were all recursive SQL parsed by SYS. This may not be the case for you depending on your usage, but it is something to check.
There are two data centers with 3 nodes each. I'm doing two simple inserts (very fast back to back) to the same table with a consistency level of local quorum. The table has one partitioning key and no clustering columns.
Sometimes the first insert wins over the second one. The data produced by the first insert statement is what gets saved in the database even though I do an insert right after that.
C# Code
var statement = new SimpleStatement("INSERT INTO customer (id, name) VALUES (1, 'foo')");
statement.SetConsistencyLevel(ConsistencyLevel.LocalQuorum);
session.Execute(statement);
Set the timestamp on the client. In most new drivers this is done automatically to better preserve ordering. However, with older drivers or Cassandra versions before 2.1 it's not supported and needs to go in the query. I don't know what driver or version you are using, but you can also put it in the CQL. It's supported at the protocol level, though, so the driver should have a better mechanism.
Something like: var statement = "INSERT INTO customer (id,name) VALUES (1, 'foo') USING TIMESTAMP {microsecond timestamp}";
The best approach is to use a monotonic timestamp so that each call is always higher than the last (i.e. use current milliseconds and add a counter). I don't know C# well enough to tell you how best to approach that. Look at https://docs.datastax.com/en/developer/csharp-driver/3.3/features/query-timestamps/#using-a-timestamp-generator
If you don't set a timestamp on the mutation, the coordinator will assign it after it parses the query. Since networks and netty queues can do funny things, order is not a sure thing, especially as the requests end up on different nodes that may have some clock drift.
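The "current milliseconds plus a counter" idea above can be sketched in C# roughly as follows. This is only an illustration under stated assumptions: the class name MonotonicMicros is made up for this example, it is not part of the DataStax driver, and the driver's own timestamp generator (see the link above) is probably the better choice when it is available.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not a driver API: hands out microsecond timestamps
// that are strictly increasing within one process.
public sealed class MonotonicMicros
{
    private long _last;

    public long Next()
    {
        while (true)
        {
            // Wall clock in milliseconds, scaled to microseconds.
            long now = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds() * 1000;
            long last = Interlocked.Read(ref _last);
            // If the clock has not advanced (or went backwards), bump by one.
            long candidate = now > last ? now : last + 1;
            if (Interlocked.CompareExchange(ref _last, candidate, last) == last)
            {
                return candidate;
            }
        }
    }
}
```

The value returned by Next() could then be placed after USING TIMESTAMP in the CQL shown earlier. Note that this only orders writes issued from a single client process; two clients with drifting clocks can still disagree.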
If I run this command in SSMS:
set showplan_xml on
GO
exec some_procedure 'arg1', 'arg2','arg3'
GO
set showplan_xml off
GO
I get XML output of the full call stack involved in the query execution, as well as any suggested indexes etc.
How might one read this from C#?
(One use case might be to periodically enable this and log these results in a production environment to keep an eye on index suggestions.)
This is, for the most part, two separate (though related) questions.
Is it possible to capture or somehow get the Missing Index information?
If you want only the Suggested Indexes (and don't care about the rest of the execution plan), then you would probably be better off using the DMVs associated with missing indexes. You just need to write some queries instead of app code. Of course, DMV info is reset whenever the service restarts, but you can capture query results into a table if you want/need to keep a history. Please see the following MSDN pages for full details:
sys.dm_db_missing_index_groups
sys.dm_db_missing_index_group_stats
sys.dm_db_missing_index_details
sys.dm_db_missing_index_columns
The only benefit that I can see to capturing the Execution Plan to get this info is that it would include the query text that resulted in the suggestion, which obviously is great for doing that research for determining which indexes to implement, but will also potentially explode the number of rows of data if many variations of a query or queries result in the same suggested index. Just something to keep in mind.
Do not implement suggested indexes programmatically. They are for review and consideration. They are assessed per each query at that moment, and do not take into account:
how many indexes are already on the table
what other queries might benefit from a similar index (meaning, there could be a combination of fields that is not apparent to any individual query, but helps 3 or more queries, and hence only adds one index instead of 3 or more to the table).
Is it possible to programmatically capture execution plans?
Yes, this is definitely doable and I have done it myself. You can do it in .NET whether it is a Console App, Windows Form, Web App, SQLCLR, etc.
Here are the details of what you need to know if you want to capture XML plans:
XML Execution plans are:
sent as separate result sets
sent as datatype of NVARCHAR / string
of two types: Estimated and Actual
ESTIMATED plans:
are just that: estimated
are returned if you execute: SET SHOWPLAN_XML ON;
return only 1 plan that will contain multiple queries if there was more than 1 query in the batch
will return plans for simple queries such as SELECT 1 and DECLARE @Bob INT; SET @Bob = 52;
do not execute any of the queries. Hence, this method will return a single result set being the execution plan
ACTUAL plans:
are the real deal, yo!
are returned if you execute: SET STATISTICS XML ON;
return 1 plan per query as a separate result set
will not return plans for simple queries such as SELECT 1 and DECLARE @Bob INT; SET @Bob = 52;
execute all queries in the batch. Hence, per query, this method will return one or two result sets: if the query returns data, then the query results will be the first result set, and the execution plan will be either the only result set (if the query doesn't return data) or the second result set
For multiple queries, the execution plans will be interspersed with any query results. But, since some queries do not return any results, you cannot simply capture every other result set. I test for a single field in the result set, of type NVARCHAR, with a field name of Microsoft SQL Server 2005 XML Showplan (which has been consistent, at least up through SQL Server 2014; I haven't yet tested SQL Server 2016).
for testing purposes you might want to wrap these queries in a BEGIN TRAN; / ROLLBACK TRAN; so that no actual data modifications occur.
SET commands need to be in their own batch, so get plans via something like:
SqlConnection _Connection = new SqlConnection(_ConnectionStringFromSomewhere);
SqlCommand _Command = _Connection.CreateCommand();
SqlDataReader _Reader = null;
try
{
_Connection.Open();
// SET command needs to be in its own batch
_Command.CommandText = "SET something ON";
_Command.ExecuteNonQuery();
// Now we can run the desired query
_Command.CommandText = _QueryToTest;
_Reader = _Command.ExecuteReader();
// ...get you some execution plans!
}
finally
{
if (_Reader != null)
{
_Reader.Dispose();
}
_Command.Dispose();
_Connection.Dispose();
}
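The "get you some execution plans" step could look something like the sketch below, which applies the single-column test described above while walking every result set. The class and method names (ShowplanReader, IsPlanResultSet, CollectPlans) are made up for this example; it is written against the IDataReader interface so it is not tied to a particular provider, and it assumes the plan column keeps the name quoted above.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: pulls XML plans out of a reader that mixes
// ordinary query results with plan result sets.
public static class ShowplanReader
{
    // Column name reported above as stable through SQL Server 2014.
    public const string PlanColumnName = "Microsoft SQL Server 2005 XML Showplan";

    // A plan result set has exactly one string column with the known name.
    public static bool IsPlanResultSet(IDataRecord record)
    {
        return record.FieldCount == 1
            && record.GetName(0) == PlanColumnName
            && record.GetFieldType(0) == typeof(string);
    }

    public static List<string> CollectPlans(IDataReader reader)
    {
        var plans = new List<string>();
        do
        {
            bool isPlan = IsPlanResultSet(reader);
            // Rows of non-plan result sets are read and skipped.
            while (reader.Read())
            {
                if (isPlan)
                {
                    plans.Add(reader.GetString(0));
                }
            }
        } while (reader.NextResult());
        return plans;
    }
}
```

With the code above, this would be called as ShowplanReader.CollectPlans(_Reader) right after ExecuteReader().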
As a final note I will mention that for anyone interested in capturing execution plans but not interested in writing any code to get them, I have already implemented this as a SQLCLR stored procedure. The procedure gets not only the XML Execution Plan(s), but also the output from STATISTICS TIME and STATISTICS IO, both of which are harder to capture as they are returned as messages (just like PRINT statements). And, the results of all 3 types of output can be captured into tables for further analysis across multiple executions (handy for doing A / B comparisons of current and revised code). This is available in the SQL# SQLCLR library (which again, I am the author of). Please note that while there is a Free version of SQL#, this particular stored procedure, DB_GetQueryInfo, is only available in the Full version, not the Free version.
UPDATE:
Interestingly enough, I just ran across the following MSDN article that describes how to use SQLCLR to grab the estimated plan, extract the estimated cost, pass it back as an OUTPUT parameter of the SQLCLR Stored Procedure, and then make a decision based on that. I don't think I would use it for such a purpose, but very interesting given that the article was written in 2005:
Processing XML Showplans Using SQLCLR in SQL Server 2005
I am working on a C# application, which loads data from a MS SQL 2008 or 2008 R2 database. The table looks something like this:
ID | binary_data | Timestamp
I need to get only the last entry and only the binary data. Entries are added to this table irregularly by another program, so I have no way of knowing if there is a new entry.
Which version is better (performance etc.) and why?
//Always a query, which might not be needed
public void ProcessData()
{
byte[] data = "query code get latest binary data from db"
}
vs
//Always a smaller check-query, and sometimes two queries
public void ProcessData()
{
DateTime timestamp = "query code get latest timestamp from db"
if(timestamp > old_timestamp)
data = "query code get latest binary data from db"
}
The binary_data field size will be around 30kB. The function "ProcessData" will be called several times per minute, but sometimes can be called every 1-2 seconds. This is only a small part of a bigger program with lots of threading/database access, so I want the "lightest" solution. Thanks.
Luckily, you can have both:
SELECT TOP 1 binary_data
FROM myTable
WHERE Timestamp > @last_timestamp
ORDER BY Timestamp DESC
If there is no record newer than @last_timestamp, no record will be returned and, thus, no data transmission takes place (= fast). If there are new records, the binary data of the newest is returned immediately (= no need for a second query).
I would suggest you perform tests using both methods as the answer would depend on your usages. Simulate some expected behaviour.
I would say, though, that you are probably okay to just do the first query. Do what works. Don't prematurely optimise; if the single query is too slow, try your second two-query approach.
A two-step approach is more efficient in terms of the overall workload on the system:
Get informed that you need to query new data
Query new data
There are several ways to implement this approach. Here are two of them.
Using Query Notifications, which is built-in functionality of SQL Server supported in .NET.
Using an indirect method of being informed of a database table update, e.g. the one described in this article on the SQL Authority blog
I think that the better path is a stored procedure that keeps the logic inside the database: something with an output parameter for the required data and a return value like TRUE/FALSE to signal the presence of new data
I have to write the code for the following method:
public IEnumerable<Product> GetProducts(int pageNumber, int pageSize, string sortKey, string sortDirection, string locale, string filterKey, string filterValue)
The method will be used by a web UI and must support pagination, sorting and filtering. The database (SQL Server 2008) has ~250,000 products. My question is the following: where do I implement the pagination, sorting and filtering logic? Should I do it in a T-SQL stored procedure or in the C# code?
I think that it is better if I do it in T-SQL but I will end up with a very complex query. On the other hand, doing that in C# implies that I have to load the entire list of products, which is also bad...
Any idea what is the best option here? Am I missing an option?
You would definitely want to have the DB do this for you. Moving ~250K records up from the database for each request will be a huge overhead. If you are using LINQ-to-SQL, the Skip and Take methods will do this (here is an example), but I don't know exactly how efficient they are.
I think another (and potentially the best) option is to use some higher-level framework that shields you from the complexity of query writing. Entity Framework, NHibernate and LINQ to SQL help you a lot. That said, the database is typically the best place to do it in your case.
I implemented pagination for my website just today. I did it with a stored procedure even though I am using Entity Framework. I found that executing a complex query is better than fetching all records and doing pagination in code. So do it with a stored procedure.
The method signature you attached is also how I implemented mine.
I would definitely do it in a stored procedure, something along the lines of:
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY Quantity) AS row, *
FROM Products
) AS a WHERE row BETWEEN 11 AND 20
If you are using linq then the Take and Skip methods will take care of this for you.
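To connect the BETWEEN 11 AND 20 range above with the pageNumber and pageSize parameters from the question, here is a small sketch. The Paging class and its method names are made up for this example, and it assumes 1-based page numbers. The Page method takes an IQueryable so that, with a LINQ provider such as LINQ to SQL, Skip and Take can be translated into SQL rather than run in memory.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only.
public static class Paging
{
    // 1-based inclusive bounds for "WHERE row BETWEEN first AND last".
    public static (int First, int Last) RowRange(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        int first = (pageNumber - 1) * pageSize + 1;
        return (first, first + pageSize - 1);
    }

    // The same page expressed with Skip/Take; the source must already be ordered.
    public static IQueryable<T> Page<T>(IQueryable<T> ordered, int pageNumber, int pageSize)
    {
        var (first, _) = RowRange(pageNumber, pageSize);
        return ordered.Skip(first - 1).Take(pageSize);
    }
}
```

Page 2 with a page size of 10 gives the range 11 to 20, matching the query above. Either way the rows must be sorted before paging, otherwise the pages are not stable between requests.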
Definitely in the DB for preference, if at all possible.
Sometimes you can mix things up a bit such as if you have the results returned from a database function (not a stored procedure, functions can be parts of larger queries in ways that stored procedures cannot), then you can have another function order and paginate, or perhaps have Linq2SQL or similar call for a page of results from said function, producing the correct SQL as needed.
If you can at least get the ordering done in the database, and will usually only want the first few pages (quite often happens in real use), then you can at least have reasonable performance for those cases, as only enough rows to skip to, and then take, the wanted rows need be loaded from the db. You of course still need to test that performance is reasonable in those rare cases where someone really does look for page 1,2312!
Still, that's only a compromise for cases where paging is very difficult indeed. As a rule, always page in the DB unless it's either extremely difficult for some reason, or the total number of rows is guaranteed to be low.
I have a situation where I have to dynamically create my SQL strings and I'm trying to use parameters and sp_executesql where possible so I can reuse query plans. In doing lots of reading online and personal experience I have found "NOT IN"s and "INNER/LEFT JOIN"s to be slow performers and expensive when the base (left-most) table is large (1.5M rows with like 50 columns). I also have read that using any type of function should be avoided as it slows down queries, so I'm wondering which is worse?
I have used this workaround in the past, although I'm not sure it's the best thing to do, to avoid using a "NOT IN" with a list of items when, for example I'm passing in a list of 3 character strings with, for example a pipe delimiter (only between elements):
LEN(@param1) = LEN(REPLACE(@param1, [col], ''))
instead of:
[col] NOT IN('ABD', 'RDF', 'TRM', 'HYP', 'UOE')
...imagine the list of strings being 1 to about 80 possible values long, and this method doesn't lend itself to parameterization either.
In this example I can use "=" for a NOT IN and I would use a traditional list technique for my IN, or != if that is faster, although I doubt it. Is this faster than using the NOT IN?
As a possible third alternative, what if I knew all the other possibilities (the IN possibilities, which could potentially be an 80-95x longer list) and passed those instead; this would be done in the application's Business Layer so as to take the workload off of the SQL Server. Not a very good possibility for query plan reuse but if it shaves a sec or two off a big nasty query, why the hell not.
I'm also adept at SQL CLR function creation. Since the above is string manipulation would a CLR function be best?
Thoughts?
Thanks in advance for any and all help/advice/etc.
As Donald Knuth is often (mis)quoted, "premature optimization is the root of all evil".
So, first of all, are you sure that if you write your code in the most clear and simple way (to both write and read), it performs slowly? If not, check it, before starting to use any "clever" optimization tricks.
If the code is slow, check the query plans thoroughly. Most of the time query execution takes much longer than query compilation, so usually you do not have to worry about query plan reuse. Hence, building optimal indexes and/or table structures usually gives significantly better results than tweaking the ways the query is built.
For instance, I seriously doubt that your query with LEN and REPLACE has better performance than NOT IN - in either case all the rows will be scanned and checked for a match. For a long enough list MSSQL optimizer would automatically create a temp table to optimize equality comparison.
Even more, tricks like this tend to introduce bugs: say, your example would work incorrectly if [col] = 'AB'.
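The 'AB' case can be reproduced outside the database. The sketch below mirrors the LEN/REPLACE comparison with plain C# string operations; the class and method names are made up for this example, and C# string handling (ordinal, case-sensitive, no trailing-space trimming) only approximates T-SQL, whose behaviour depends on collation.

```csharp
using System;

// Hypothetical illustration of why the REPLACE trick matches substrings.
public static class DelimitedListCheck
{
    // Mirrors LEN(@param1) = LEN(REPLACE(@param1, [col], '')).
    // true means the trick treats col as NOT in the list.
    public static bool ReplaceTrickSaysNotInList(string delimitedList, string col)
    {
        // string.Replace rejects an empty search value, so guard it here.
        if (string.IsNullOrEmpty(col)) return true;
        return delimitedList.Length == delimitedList.Replace(col, "").Length;
    }

    // Exact membership test against the pipe-delimited elements.
    public static bool ExactNotInList(string delimitedList, string col)
    {
        return Array.IndexOf(delimitedList.Split('|'), col) < 0;
    }
}
```

For the list ABD|RDF|TRM|HYP|UOE and a column value of AB, the exact test reports that AB is not in the list, while the replace trick reports that it is, because AB is a substring of ABD. Such a row would be wrongly filtered out.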
IN queries are often faster than NOT IN, because for IN queries only some of the rows need to be checked. The efficiency of the method depends on whether you can get a correct list for IN quickly enough.
Speaking of passing a variable-length list to the server, there're many discussions here on SO and elsewhere. Generally, your options are:
table-valued parameters (MSSQL 2008+ only),
dynamically constructed SQL (error prone and/or unsafe),
temp tables (good for long lists, probably too much overhead in writing and execution time for short ones),
delimited strings (good for short lists of 'well-behaved' values - like a handful of integers),
XML parameters (somewhat complex, but works well - if you use a good XML library and do not construct complex XML text 'by hand').
Here is an article with a good overview of these techniques and a few more.
I have found "NOT IN"s and "INNER/LEFT JOIN"s to be slow performers and expensive when the base (left-most) table is large
It shouldn't be slow if you indexed your table correctly. Something that can make the query slow is if you have a dependent subquery. That is, the query must be re-evaluated for each row in the table because the subquery references values from the outer query.
I also have read that using any type of function should be avoided as it slows down queries
It depends. SELECT function(x) FROM ... probably won't make a huge difference to the performance. The problems are when you use function of a column in other places in the query such as JOIN conditions, WHERE clause, or ORDER BY as it may mean that an index cannot be used. A function of a constant value is not a problem though.
Regarding your query, I'd try using [col] NOT IN ('ABD', 'RDF', 'TRM', 'HYP', 'UOE') first. If this is slow, make sure that you have indexed the table appropriately.
First off, since you are only filtering out a small percentage of the records, chances are the index on col isn't being used at all so SARG-ability is moot.
So that leaves query plan reuse.
If you are on SQL Server 2008, replace @param1 with a table-valued parameter, and have your application pass that instead of a delimited list. This solves your problem completely.
If you are on SQL Server 2005, I don't think it matters. You could split the delimited list and use NOT IN/NOT EXISTS against the table, but what's the point if you won't get an index seek on col?
Can anyone speak to the last point? Would splitting the list to a table var and then anti-joining it save enough CPU cycles to offset the setup cost?
EDIT, third method for SQL Server 2005 using XML, inspired by OMG Ponies' link:
DECLARE @not_in_xml XML
SET @not_in_xml = N'<values><value>ABD</value><value>RDF</value></values>'
SELECT * FROM Table1
WHERE @not_in_xml.exist('/values/value[text()=sql:column("col")]') = 0
I have no idea how well this performs compared to a delimited list or TVP.