My code works: the function returns the correct SELECT COUNT(*) value, but it still throws an ORA-25191 exception ("Cannot reference overflow table of an index-organized table") at
retVal = Convert.ToInt32(cmd.ExecuteScalar());
Since I call the function very often, the exceptions slow my program down tremendously.
private int getSelectCountQueryOracle(string Sqlquery)
{
    try
    {
        int retVal = 0;
        using (DataTable dataCount = new DataTable())
        {
            using (OracleCommand cmd = new OracleCommand(Sqlquery))
            {
                cmd.CommandType = CommandType.Text;
                cmd.Connection = oraCon;
                using (OracleDataAdapter dataAdapter = new OracleDataAdapter())
                {
                    retVal = Convert.ToInt32(cmd.ExecuteScalar());
                }
            }
        }
        return retVal;
    }
    catch (Exception ex)
    {
        exceptionProtocol("Count Function", ex.ToString());
        return 1;
    }
}
This function is called in a foreach loop
// function call in foreach loop which goes through tablenames
foreach (DataRow row in dataTbl.Rows)
{
    ...
    tableNameFromRow = row["TABLE_NAME"].ToString();
    tableRows = getSelectCountQueryOracle("select count(*) as 'count' from " + tableNameFromRow);
    tableColumns = getSelectCountQueryOracle("SELECT COUNT(*) as 'count' FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name='" + tableNameFromRow + "'");
    ...
}
dataTbl.Rows in this outer loop, in turn, comes from the query
SELECT * FROM USER_TABLES ORDER BY TABLE_NAME
If you're using a database-agnostic API like ADO.Net, you should almost always use the API's own facilities to fetch metadata rather than writing custom queries against each database's data dictionary. The providers' data dictionary queries are far more likely to handle all the various corner cases, and to be optimized, than anything you write yourself. So rather than writing your own query to populate the dataTbl data table, you'd want to use the GetSchema method
DataTable dataTbl = connection.GetSchema("Tables");
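For example, you could drive the existing counting loop off GetSchema instead of a hand-written dictionary query. This is only a sketch: the exact column names (and whether you need to pass restrictions to limit the result to your own schema) vary by provider, so check your provider's documentation.
DataTable dataTbl = oraCon.GetSchema("Tables");   // oraCon is the existing open OracleConnection
foreach (DataRow row in dataTbl.Rows)
{
    // "TABLE_NAME" is the usual column name for the Oracle providers; verify for yours.
    string tableNameFromRow = row["TABLE_NAME"].ToString();
    int tableRows = getSelectCountQueryOracle("select count(*) from " + tableNameFromRow);
}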
If you want to keep your custom-coded data dictionary query for some reason, you'd need to filter out the IOT overflow tables since you can't query those directly.
select *
from user_tables
where iot_type IS NULL
or iot_type != 'IOT_OVERFLOW'
Be aware, however, that there are likely to be other tables that you don't want to count. For example, the dropped column indicates whether a table has been dropped; presumably you don't want to count the rows of an object sitting in the recycle bin, so you'd want a dropped = 'NO' predicate as well. And you can't do a count(*) on a nested table, so you'd also want a nested = 'NO' predicate if your schema happens to contain nested tables. There are probably other corner cases, depending on the exact set of features your particular schema uses, that the provider's developers have already coded for and that you'd otherwise have to handle yourself.
So I'd start with
select *
from user_tables
where ( iot_type IS NULL
or iot_type != 'IOT_OVERFLOW')
and dropped = 'NO'
and nested = 'NO'
but know that you'll probably need or want to add additional filters depending on the specific features your schemas use. I'd certainly much rather let the fine folks who develop the ADO.Net provider worry about all those corner cases than track them all down myself.
Taking a step back, though, I'd question why you're regularly doing a count(*) on every table in a schema and why you need an exact answer. In most cases, you're either doing a one-off where you don't much care how long it takes (e.g. a validation step after a migration), or approximate counts would be sufficient (e.g. getting a list of the biggest tables in the system in order to triage some effort, or tracking growth over time for projections). In the latter case you could just use the counts already stored in the data dictionary (user_tables.num_rows) from the last time statistics were gathered.
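If approximate counts are good enough, a rough sketch of that approach, reusing the oraCon connection from the question (num_rows can be NULL or stale if statistics have never been gathered):
// One dictionary query instead of one COUNT(*) per table.
using (OracleCommand cmd = new OracleCommand(
    "SELECT table_name, num_rows FROM user_tables WHERE dropped = 'NO'", oraCon))
using (OracleDataReader rdr = cmd.ExecuteReader())
{
    while (rdr.Read())
    {
        string tableName = rdr.GetString(0);
        long approxRows = rdr.IsDBNull(1) ? 0 : Convert.ToInt64(rdr.GetValue(1));
        // use tableName / approxRows ...
    }
}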
This article helped me to solve my problem.
I've changed my query to this:
SELECT * FROM user_tables
WHERE iot_type IS NULL OR iot_type != 'IOT_OVERFLOW'
ORDER BY TABLE_NAME
I have a piece of C# code which updates two specific columns for ~1000x20 records in a database on the localhost. As far as I know (though I'm really far from being a database expert), it should not take long, but it takes more than 5 minutes.
I tried SQL transactions, with no luck. SqlBulkCopy seems a bit of an overkill, since it's a large table with dozens of columns and I only have to update one or two columns for a set of records, so I would like to keep it simple. Is there a better approach to improve efficiency?
The code itself:
public static bool UpdatePlayers(List<Match> matches)
{
    using (var connection = new SqlConnection(Database.myConnectionString))
    {
        connection.Open();
        SqlCommand cmd = connection.CreateCommand();
        foreach (Match m in matches)
        {
            cmd.CommandText = "";
            foreach (Player p in m.Players)
            {
                // Some player specific calculation, which takes almost no time.
                p.Morale = SomeSpecificCalculationWhichMilisecond();
                p.Condition = SomeSpecificCalculationWhichMilisecond();
                cmd.CommandText += "UPDATE [Players] SET [Morale] = @morale, [Condition] = @condition WHERE [ID] = @id;";
                cmd.Parameters.AddWithValue("@morale", p.Morale);
                cmd.Parameters.AddWithValue("@condition", p.Condition);
                cmd.Parameters.AddWithValue("@id", p.ID);
            }
            cmd.ExecuteNonQuery();
        }
    }
    return true;
}
Updating 20,000 records one at a time is a slow process, so taking over 5 minutes is to be expected.
From your query, I would suggest putting the data into a temp table, then joining the temp table to the update. This way it only has to scan the table to update once, and update all values.
Note: it could still take a while to do the update if you have indexes on the fields you are updating and/or there is a large amount of data in the table.
Example update query:
UPDATE P
SET [Morale] = TT.[Morale], [Condition] = TT.[Condition]
FROM [Players] AS P
INNER JOIN #TempTable AS TT ON TT.[ID] = P.[ID];
Populating the temp table
How to get the data into the temp table is up to you. I suspect you could use SqlBulkCopy but you might have to put it into an actual table, then delete the table once you are done.
If possible, I recommend putting a Primary Key on the ID column in the temp table. This may speed up the update process by making it faster to find the related ID in the temp table.
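A rough sketch of the whole approach, assuming the calculated values are already in a DataTable named updatesTable with ID, Morale and Condition columns (the INT column types are a guess; use whatever Players actually stores). The temp table has to be created on the same connection that does the bulk copy and the update:
using (var connection = new SqlConnection(Database.myConnectionString))
{
    connection.Open();

    // 1. Create the temp table on this connection/session.
    using (var create = new SqlCommand(
        "CREATE TABLE #TempTable ([ID] INT PRIMARY KEY, [Morale] INT, [Condition] INT);", connection))
    {
        create.ExecuteNonQuery();
    }

    // 2. Bulk copy the calculated values into it.
    using (var bulk = new SqlBulkCopy(connection))
    {
        bulk.DestinationTableName = "#TempTable";
        bulk.WriteToServer(updatesTable); // DataTable with ID, Morale, Condition
    }

    // 3. Single set-based update.
    using (var update = new SqlCommand(
        @"UPDATE P
          SET [Morale] = TT.[Morale], [Condition] = TT.[Condition]
          FROM [Players] AS P
          INNER JOIN #TempTable AS TT ON TT.[ID] = P.[ID];", connection))
    {
        update.ExecuteNonQuery();
    }
}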
Minor improvements:
- use a StringBuilder for the command text
- ensure your parameter names are actually unique
- clear your parameters for the next use
- depending on how many players are in each match, batch N commands together rather than one match at a time (a sketch combining these follows below)
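A sketch of those minor improvements applied to the original loop (assumes connection is the open SqlConnection from the question; the parameter naming scheme is arbitrary):
using (SqlCommand cmd = connection.CreateCommand())
{
    foreach (Match m in matches)
    {
        var sql = new StringBuilder();
        cmd.Parameters.Clear();              // reset parameters from the previous batch
        int i = 0;
        foreach (Player p in m.Players)
        {
            // Unique parameter names so every row in the batch gets its own values.
            sql.AppendFormat("UPDATE [Players] SET [Morale] = @morale{0}, [Condition] = @condition{0} WHERE [ID] = @id{0};", i);
            cmd.Parameters.AddWithValue("@morale" + i, p.Morale);
            cmd.Parameters.AddWithValue("@condition" + i, p.Condition);
            cmd.Parameters.AddWithValue("@id" + i, p.ID);
            i++;
        }
        if (i == 0) continue;                // nothing to update for this match

        cmd.CommandText = sql.ToString();
        cmd.ExecuteNonQuery();               // one round trip per match instead of one per player
    }
}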
Bigger improvement:
- use a table-valued parameter and a MERGE statement, which should look something like this (untested):
CREATE TYPE [MoraleUpdate] AS TABLE (
[Id] ...,
[Condition] ...,
[Morale] ...
)
GO
MERGE [dbo].[Players] AS [Target]
USING @Updates AS [Source]
ON [Target].[Id] = [Source].[Id]
WHEN MATCHED THEN
    UPDATE SET [Morale] = [Source].[Morale],
               [Condition] = [Source].[Condition];
DataTable dt = new DataTable();
dt.Columns.Add("Id", typeof(...));
dt.Columns.Add("Morale", typeof(...));
dt.Columns.Add("Condition", typeof(...));
foreach(...)
{
    dt.Rows.Add(p.Id, p.Morale, p.Condition);
}
SqlParameter sqlParam = cmd.Parameters.AddWithValue("@Updates", dt);
sqlParam.SqlDbType = SqlDbType.Structured;
sqlParam.TypeName = "dbo.[MoraleUpdate]";
cmd.ExecuteNonQuery();
You could also implement a DbDataReader to stream the values to the server while you are calculating them.
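Rather than implementing a full DbDataReader, a simpler streaming option SqlClient supports is passing an IEnumerable<SqlDataRecord> as the value of the structured parameter, so rows are generated as they are sent. Sketch only: the column order and types must match whatever you actually declare in the MoraleUpdate type (int is assumed here), and SqlDataRecord/SqlMetaData live in Microsoft.SqlServer.Server.
static IEnumerable<SqlDataRecord> ToRecords(IEnumerable<Player> players)
{
    var meta = new[]
    {
        new SqlMetaData("Id", SqlDbType.Int),
        new SqlMetaData("Condition", SqlDbType.Int),
        new SqlMetaData("Morale", SqlDbType.Int)
    };
    foreach (Player p in players)
    {
        var record = new SqlDataRecord(meta);
        record.SetInt32(0, p.ID);
        record.SetInt32(1, p.Condition);   // values can be calculated just-in-time here
        record.SetInt32(2, p.Morale);
        yield return record;
    }
}

// usage: sqlParam.Value = ToRecords(allPlayers);   // instead of the DataTable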
A recent bug report states that a method being called is crashing the service, causing it to restart. After troubleshooting, the cause was found to be an obnoxious Oracle SQL call with thousands of strings passed in. A collection of strings is passed to the method from an external service, and it often contains more than 10,000 records. The original code used a WHERE clause over the passed collection using the LIKE keyword, which I think is really, really bad.
public IList<ContainerState> GetContainerStates(IList<string> containerNumbers)
{
    string sql =
        String.Format(@"Select CTNR_NO, CNTR_STATE FROM CONTAINERS WHERE CTRN_SEQ = 0 AND ({0})",
            string.Join("OR", containerNumbers
                .Select(item => string.Concat(" cntr_no LIKE '", item.SliceLeft(10), "%' ")))
        );
    return DataBase.SelectQuery(sql, MapRecordToContainerState, new { }).ToList();
}
Clarification of in house methods used which may be confusing:
DataBase.SelectQuery is an internal library method using generics which is passed the SQL string, a function to map the records to .NET objects, and the parameters, and returns an IEnumerable of objects of the type returned by the mapping function.
SliceLeft is an extension method from another internal helper library that just returns the first part of a string up to the number of characters specified by the parameter.
The reason that the LIKE statement was apparently used, is that the strings being passed and the strings in the database only are guaranteed to match the first 10 characters. Example ("XXXX000000-1" in the strings being passed should match a database record like "XXXX000000-8").
I believed that an IN clause using SUBSTR would be more efficient than multiple LIKE clauses, and replaced the code with:
public IList<ContainerRecord> GetContainerStates(IList<string> containerNumbers)
{
    string sql =
        String.Format(@"Select CTNR_NO, CNTR_STATE FROM CONTAINERS WHERE CTRN_SEQ = 0 AND ({0})",
            string.Format("SUBSTR(CNTR_NO, 1, 10) IN ({0}) ",
                string.Join(",", containerNumbers.Select(item => string.Format("\'{0}\'", item.SliceLeft(10))))
            )
        );
    return DataBase.SelectQuery(sql, MapRecordToContainerState, new { }).ToList();
}
This helped slightly, and there were fewer issues in my tests, but when huge numbers of records are passed, an exception is still thrown and core dumps occur, because the SQL statement is longer than the server can parse. The DBA suggests saving all the passed strings to a temporary table and then joining against that temp table.
Given that advice, I changed the function to:
public IList<ContainerRecord> GetContainerStates(IList<string> containerNumbers)
{
    string sql =
        @"
        CREATE TABLE T1(cntr_num VARCHAR2(10));
        DECLARE GLOBAL TEMPORARY TABLE SESSION.T1 NOT LOGGED;
        INSERT INTO SESSION.T1 VALUES (:containerNumbers);
        SELECT
            DISTINCT cntr_no,
            '_IT' cntr_state
        FROM
            tb_master
        WHERE
            cntr_seq = 0
            AND cntr_state IN ({0})
            AND adjustment <> :adjustment
            AND SUBSTR(CTNR_NO, 1, 10) IN (SELECT CNTR_NUM FROM SESSION.T1);
        ";
    var parameters = new
    {
        containerNumbers = containerNumbers.Select(item => item.SliceLeft(10)).ToList()
    };
    return DataBase.SelectQuery(sql, MapRecordToContainerState, parameters).ToList();
}
Now I'm getting an "ORA-00900: invalid SQL statement" error. This is really frustrating; how can I properly write a SQL statement that will put this list of strings into a temporary table and then use it in a SELECT statement to return the list I need?
There are a couple of places that could cause this error. It seems that "DECLARE GLOBAL TEMPORARY" is a Java API; I don't think .NET has this function. Try "CREATE GLOBAL TEMPORARY TABLE" instead. Also, I don't know whether your internal API can handle multiple SQL statements in one call; as far as I know, the ODP.net Command class can only execute one SQL statement per call. Moreover, "CREATE TABLE" is DDL, so it has its own transaction; I can't see any reason to put these statements in the same SQL string. The following is sample code for ODP.net:
using (OracleConnection conn = new OracleConnection(BD_CONN_STRING))
{
    conn.Open();
    using (OracleCommand cmd = new OracleCommand("create global temporary table t1(id number(9))", conn))
    {
        // actually this should execute once only
        cmd.ExecuteNonQuery();
    }
    using (OracleCommand cmd = new OracleCommand("insert into t1 values (1)", conn))
    {
        cmd.ExecuteNonQuery();
    }
    // customer table is a permanent table
    using (OracleCommand cmd = new OracleCommand("select c.id from customer c, t1 tmp1 where c.id=tmp1.id", conn))
    using (OracleDataReader rdr = cmd.ExecuteReader())
    {
        while (rdr.Read())
        {
            // consume the joined rows
        }
    }
}
I have a list called ListTypes that holds 10 product types. The code below loops through it and calls a stored procedure for each type, collecting every matching record into the list ListIds. This is killing my SQL box, since I have over 200 users executing this constantly all day.
I know it's not good architecture to run a SQL statement in a loop, but this is the only way I got it to work. Any ideas how I can do this without looping? Maybe a LINQ statement? I've never used LINQ at this magnitude. Thank you.
protected void GetIds(string Type, string Sub)
{
    LinkedIds.Clear();
    using (SqlConnection cs = new SqlConnection(connstr))
    {
        for (int x = 0; x < ListTypes.Count; x++)
        {
            cs.Open();
            SqlCommand select = new SqlCommand("spUI_LinkedIds", cs);
            select.CommandType = System.Data.CommandType.StoredProcedure;
            select.Parameters.AddWithValue("@Type", Type);
            select.Parameters.AddWithValue("@Sub", Sub);
            select.Parameters.AddWithValue("@TransId", ListTypes[x]);
            SqlDataReader dr = select.ExecuteReader();
            while (dr.Read())
            {
                ListIds.Add(Convert.ToInt32(dr["LinkedId"]));
            }
            cs.Close();
        }
    }
}
Not a full answer, but this wouldn't fit in a comment. You can at least update your existing code to be more efficient like this:
protected List<int> GetIds(string Type, string Sub, IEnumerable<int> types)
{
    var result = new List<int>();
    using (SqlConnection cs = new SqlConnection(connstr))
    using (SqlCommand select = new SqlCommand("spUI_LinkedIds", cs))
    {
        select.CommandType = System.Data.CommandType.StoredProcedure;
        // Don't use AddWithValue! Be explicit about your DB types.
        // I had to guess here. Replace with the actual types from your database.
        select.Parameters.Add("@Type", SqlDbType.VarChar, 10).Value = Type;
        select.Parameters.Add("@Sub", SqlDbType.VarChar, 10).Value = Sub;
        var TransID = select.Parameters.Add("@TransId", SqlDbType.Int);
        cs.Open();
        foreach (int type in types)
        {
            TransID.Value = type;
            using (SqlDataReader dr = select.ExecuteReader())
            {
                while (dr.Read())
                {
                    result.Add((int)dr["LinkedId"]);
                }
            }
        }
    }
    return result;
}
Note that this way you only open and close the connection once. Normally in ADO.Net it's better to use a new connection and re-open it for each query. The exception is in a tight loop like this. Also, the only thing that changes inside the loop this way is the one parameter value. Finally, it's better to design methods that don't rely on other class state. This method no longer needs to know about the ListTypes and ListIds class variables, which makes it possible to (among other things) do better unit testing on the method.
Again, this isn't a full answer; it's just an incremental improvement. What you really need to do is write another stored procedure that accepts a table valued parameter, and build on the query from your existing stored procedure to JOIN with the table valued parameter, so that all of this will fit into a single SQL statement. But until you share your stored procedure code, this is about as much help as I can give you.
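Without the procedure's source, only a hypothetical sketch is possible, but the shape would be roughly this (the type name IntList, the procedure name spUI_LinkedIds_Set, and the column types are all invented):
// Hypothetical shape only - the real procedure would reuse the query inside
// spUI_LinkedIds and JOIN it to the table-valued parameter:
//
//   CREATE TYPE dbo.IntList AS TABLE (TransId INT PRIMARY KEY);
//   CREATE PROCEDURE dbo.spUI_LinkedIds_Set
//       @Type VARCHAR(10), @Sub VARCHAR(10), @TransIds dbo.IntList READONLY
//   AS
//       SELECT ... FROM ... JOIN @TransIds t ON t.TransId = src.TransId ...
//
var transIds = new DataTable();
transIds.Columns.Add("TransId", typeof(int));
foreach (int type in types) transIds.Rows.Add(type);

using (var select = new SqlCommand("spUI_LinkedIds_Set", cs))
{
    select.CommandType = CommandType.StoredProcedure;
    select.Parameters.Add("@Type", SqlDbType.VarChar, 10).Value = Type;
    select.Parameters.Add("@Sub", SqlDbType.VarChar, 10).Value = Sub;
    var tvp = select.Parameters.Add("@TransIds", SqlDbType.Structured);
    tvp.TypeName = "dbo.IntList";
    tvp.Value = transIds;
    // open cs, ExecuteReader once, and read every LinkedId from a single result set
}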
Besides the improvements others have suggested:
You could insert your IDs into a temp table and then run a single
SELECT * FROM WhatEverTable WHERE transid IN (SELECT transid FROM #tempTable)
On MS SQL Server this is really fast.
If you're not using MS SQL Server, it's possible that one big SELECT with joins is faster than a SELECT ... IN. You'll have to test these cases yourself on your DBMS.
According to your comment:
The idea is lets say I have a table and I have to get all records from the table that has this 10 types of products. How can I get all of this products? But this number is dynamic.
So... why use a stored procedure at all? Why not query the table?
//If [Type] and [Sub] arguments are external inputs - as in, they come from a user request or something - they should be sanitized. (remove or escape '\' and apostrophe signs)
//create connection
string queryTmpl = "SELECT LinkedId FROM [yourTable] WHERE [TYPE] = '{0}' AND [SUB] = '{1}' AND [TRANSID] IN ({2})";
string query = string.Format(queryTmpl, Type, Sub, string.Join(", ", ListTypes));
SqlCommand select = new SqlCommand(query, cs);
//and so forth
To use Linq-to-SQL you would need to map the table to a class. This would make the query simpler to perform.
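If you do query the table directly, a parameterized version avoids the sanitizing concern; a rough sketch (column types are guesses, and ListTypes is assumed to hold ints since it feeds @TransId above):
// Build "@t0, @t1, ..." for the IN list so nothing is concatenated into the SQL.
var inParams = ListTypes.Select((t, i) => "@t" + i).ToArray();
string query =
    "SELECT LinkedId FROM [yourTable] WHERE [TYPE] = @type AND [SUB] = @sub " +
    "AND [TRANSID] IN (" + string.Join(", ", inParams) + ")";

using (var select = new SqlCommand(query, cs))
{
    select.Parameters.Add("@type", SqlDbType.VarChar, 10).Value = Type;
    select.Parameters.Add("@sub", SqlDbType.VarChar, 10).Value = Sub;
    for (int i = 0; i < ListTypes.Count; i++)
        select.Parameters.Add("@t" + i, SqlDbType.Int).Value = ListTypes[i];
    // open cs, ExecuteReader, and read the LinkedId values as before
}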
I have a single method that I use for all database operations.
If the procedure being executed is a SELECT that returns multiple rows, how can I get those rows and store them in an array?
This is the method code:
public void ExcuteProcedure(string procName, List<SqlParameter> procparams)
{
    try
    {
        SqlConnection mycon = new SqlConnection(connectionString);
        mycon.Open();
        SqlCommand mycom = new SqlCommand();
        mycom.Connection = mycon;
        mycom.CommandText = procName;
        mycom.CommandType = CommandType.StoredProcedure;
        foreach (var item in procparams)
        {
            SqlParameter myparm = new SqlParameter();
            myparm.ParameterName = item.ParameterName;
            // myparm.SqlDbType = item.SqlDbType;
            myparm.Value = item.Value;
            mycom.Parameters.Add(myparm);
        }
        var n = mycom.ExecuteScalar();
        mycon.Close();
    }
    catch (SqlException e)
    {
        Console.WriteLine("Error Number is : " + e.Number);
        Console.WriteLine("Error Message is : " + e.Message);
    }
}
You need to call mycom.ExecuteReader(), which will give you a SqlDataReader which can read through the results.
Call Read() to advance through the rows.
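A minimal sketch of what that looks like inside your method, replacing the ExecuteScalar call (List<object[]> is just one possible way to hold the rows):
var rows = new List<object[]>();
using (SqlDataReader reader = mycom.ExecuteReader())
{
    while (reader.Read())
    {
        var values = new object[reader.FieldCount];
        reader.GetValues(values);   // copies every column of the current row
        rows.Add(values);
    }
}
// 'rows' now holds one object[] per returned record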
It never ceases to amaze me how often devs try to abstract away simple database connectivity, and the myriad ways they inevitably screw it up.
The following may sound mean, but it needs to be said:
Clean up your code; it leaks like a sieve. using blocks around the connection and command objects are pretty much mandatory. As it stands, if you forget a single parameter or pass a bad value you will leak connections, and once the connection pool fills up your app will crash in all sorts of interesting, and usually hard to debug, ways.
Next, if you aren't sure how to properly get records back from a database, then you probably shouldn't try to abstract the code calling your procedures. Either use a lightweight ORM like Dapper, or accept that what you are doing will ultimately involve a lot of extraneous code that the next developer on your project will want to rip out.
/rant over.
Getting back to the question: ExecuteScalar returns a single value. You need to use ExecuteReader. I'd suggest that you simply take the results of the reader, stuff them into a DataTable, and pass that back to the calling code.
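For example (sketch only; the method's return type would have to change from void):
DataTable results = new DataTable();
using (SqlDataReader reader = mycom.ExecuteReader())
{
    results.Load(reader);   // pulls all rows and columns into the DataTable
}
return results;             // let the caller decide how to consume it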
var n = mycom.ExecuteScalar();
Scalar: an atomic quantity that can hold only one value at a time. Instead, either:
Return a DataReader and iterate through its rows, or
Fill a DataSet by using a DataAdapter (this is more appropriate if you have multiple tables in the result set).
I have a sequence of SQL queries that produce very large result sets which I have to run against a database and write to files. I have about 80 queries, and each one returns somewhere between 1,000 and 10,000,000 records. I cannot change the queries themselves. What I'm trying to do is read 500,000 records at a time for each query and write them to a file. Here's what I have so far:
void WriteXml(string tableName, string queryString)
{
    int pageSize = 500000;
    int currentIndex = 0;
    using (SqlConnection connection = new SqlConnection(CONNECTION_STRING))
    {
        using (SqlCommand command = new SqlCommand(queryString, connection))
        {
            try
            {
                connection.Open();
                SqlDataAdapter dataAdapter = new SqlDataAdapter(command);
                int rowsRead = 0, count = 0, index = 0;
                do
                {
                    DataSet dataSet = new DataSet("SomeDatasetName");
                    rowsRead = dataAdapter.Fill(dataSet, currentIndex, pageSize, tableName);
                    currentIndex += rowsRead;
                    if (dataSet.Tables.Count > 0 && rowsRead > 0)
                    {
                        dataSet.Tables[0].WriteXml(string.Format(@"OutputXml\{0}_{1}.xml", tableName, index++),
                            XmlWriteMode.WriteSchema);
                    }
                }
                while (rowsRead > 0);
            }
            catch (Exception e)
            {
                Log(e);
            }
        }
    }
}
This works, but it's very, very slow. I'm pretty sure I'm doing something wrong here, because when I run it the application hogs most of my memory (I have 6GB) and it takes forever to run. I started it last night and it is still running. I understand I'm dealing with a lot of records, but I don't think it's something that should take so many hours.
Is this the right way to do paged/segmented data read from a database? Is there any way this method could be optimized or is there any other way I can approach this?
Do let me know if I'm not clear on anything and I'll try to provide clarification.
The paging overloads for DataAdapter.Fill still get the entire result set beneath the covers. Read here:
http://msdn.microsoft.com/en-us/library/tx1c9c2f%28vs.71%29.aspx
the part that pertains to your question:
The DataAdapter provides a facility for returning only a page of data, through overloads of the Fill method. However, this might not be the best choice for paging through large query results because, while the DataAdapter fills the target DataTable or DataSet with only the requested records, the resources to return the entire query are still used. To return a page of data from a data source without using the resources required to return the entire query, specify additional criteria for your query that reduces the rows returned to only those required.
In Linq2Sql, there are convenient methods Skip and Take for paging through data. You could roll your own by using a parameterized query constructed to do the same thing. Here is an example to skip 100, and take 20 rows:
SELECT TOP 20 [t0].[CustomerID], [t0].[CompanyName]
FROM [Customers] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT TOP 100 [t1].[CustomerID]
FROM [Customers] AS [t1]
WHERE [t1].[City] = #p0
ORDER BY [t1].[CustomerID]
) AS [t2]
WHERE [t0].[CustomerID] = [t2].[CustomerID]
))) AND ([t0].[City] = #p1)
ORDER BY [t0].[CustomerID]
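If you want to roll your own paging inside your WriteXml method, here is a hedged sketch. It assumes SQL Server 2012 or later (for OFFSET/FETCH) and that the result set has a unique key column you can order by; the wrapper breaks if the original query already contains an ORDER BY or can't be used as a derived table.
void WriteXmlPaged(string tableName, string queryString, string keyColumn)
{
    const int pageSize = 500000;
    using (var connection = new SqlConnection(CONNECTION_STRING))
    {
        connection.Open();
        // Wrap the original query and page through it with OFFSET/FETCH so only
        // one page at a time is materialized on the client.
        string pagedSql =
            "SELECT * FROM (" + queryString + ") AS q " +
            "ORDER BY q." + keyColumn + " " +
            "OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY";

        int index = 0;
        while (true)
        {
            var page = new DataTable(tableName);
            using (var command = new SqlCommand(pagedSql, connection))
            {
                command.CommandTimeout = 0; // large pages can exceed the default 30 seconds
                command.Parameters.Add("@offset", SqlDbType.Int).Value = index * pageSize;
                command.Parameters.Add("@pageSize", SqlDbType.Int).Value = pageSize;
                using (var reader = command.ExecuteReader())
                {
                    page.Load(reader);
                }
            }
            if (page.Rows.Count == 0) break;
            page.WriteXml(string.Format(@"OutputXml\{0}_{1}.xml", tableName, index++),
                          XmlWriteMode.WriteSchema);
        }
    }
}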