C#/ODP.NET: large IN clause workaround

We have a C# component that handles attaching arbitrary-sized element lists into IN clauses for semi-arbitrary SQL SELECT queries. Essentially this boils down to receiving something like:
SELECT COUNT(*) FROM a WHERE b IN (...)
...where the "..." is the only portion of the query the component is allowed to modify.
Currently the component will insert a comma-separated set of named bind parameters, then attach the corresponding IDbDataParameter objects to the command and execute; the component is made aware of the types for the parameters it has to bind. This works well, until the calling code supplies a parameter set larger than the database is willing to accept. The objective here is to get such large sets working with queries against Oracle 11gR2 via ODP.NET.
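For context, the expansion step described above might look roughly like this (a hedged sketch only; the template placeholder, the parameter naming, and the method shape are illustrative, not the actual component):
// Sketch: expand the "..." placeholder into named binds :p0, :p1, ...
// Assumes Oracle.ManagedDataAccess.Client or Oracle.DataAccess.Client.
static void BindInList(OracleCommand command, string template, IList<string> values)
{
    var names = new List<string>();
    for (int i = 0; i < values.Count; i++)
    {
        names.Add(":p" + i);
        command.Parameters.Add(new OracleParameter("p" + i, values[i]));
    }
    command.CommandText = template.Replace("...", string.Join(",", names));
}
This works until the list outgrows what the database will accept in a single expression list.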
This task is complicated somewhat by the following approaches being deemed unacceptable by those setting the requirements:
Global Temporary Tables
Stored procedures
Anything requiring CREATE TYPE to have been executed
The solution is not required to be a single query; multiple statements and round-trips are acceptable.
I'm trying to make this work by binding the clause as an array, using code sourced from elsewhere:
IList<string> values;
//...
OracleParameter parameter = new OracleParameter();
parameter.ParameterName = "parm";
parameter.DbType = DbType.String;
parameter.Value = values.ToArray();   // bind the whole list as a single array value
int[] sizes = new int[values.Count];
for (int index = 0; index < values.Count; index++)
{
    sizes[index] = values[index].Length;
}
parameter.ArrayBindSize = sizes;      // per-element buffer sizes for the array bind
//...
The command subsequently executes without throwing an exception, but the value returned for COUNT is zero (compared to the expected value obtained by running the query in SQL Developer with a nested SELECT returning the same parameter set). Going through the ODP.NET docs hasn't brought any joy so far.
The questions for this are:
Is there a way to make the above parameter attachment work as expected?
Is there another viable way to achieve this without using one of the vetoed approaches?
(I'm aware this is similar to this (unanswered) question, but that scenario does not mention having the same restrictions on approaches.)

Well, since you are not allowed to use Global Temporary Tables, are you at least allowed to create normal tables? If so, here is a way:
Create an OracleCommand object for the table creation with the following command text (DDL cannot run inside a plain PL/SQL block and must execute only once, so keep it separate from the array-bound insert):
CREATE TABLE {inListTableName}
(
    inValue {dbDataType}
)
Execute it once with ExecuteNonQuery. Then create a second OracleCommand for the insert, with the command text:
INSERT INTO {inListTableName} (inValue) VALUES (:inValue)
Set the ArrayBindCount on the insert command object to the number of items you need in your IN list.
Replace {inListTableName} with a unique name derived from Guid.NewGuid() (strip the hyphens and prefix a letter so it forms a valid Oracle identifier, truncated to Oracle 11g's 30-character identifier limit).
Replace the {dbDataType} with the correct Oracle data type (e.g. VARCHAR2(100) or NUMBER) for the list of values that you want to use in your IN clause.
Add an OracleParameter named "inValue" to the insert OracleCommand and set the value of the parameter to an array containing the values that you want in your IN clause. If you have a HashSet<T> (which I recommend using to avoid sending unnecessary duplicates), use .ToArray() on it to get an array.
Execute the insert command. Together with the CREATE TABLE, this is your prep step.
Then use the following sql snippet as the value portion of the in clause in your select sql statement:
(SELECT {inListTableName}.inValue FROM {inListTableName})
For example:
SELECT FirstName, LastName FROM Users WHERE UserId IN (SELECT {inListTableName}.inValue FROM {inListTableName});
Execute this command to get a reader.
Lastly, one more command with the following command text:
DROP TABLE {inListTableName};
This is your cleanup command. Execute this command.
You might want to create an alternate schema/user to create the inListTable so that you can grant appropriate permissions to your user to only create tables in that schema.
All of this can be encapsulated in a reusable class with the following interface:
public interface IInListOperation
{
void TransmitValueList(OracleConnection connection);
string GetInListSQLSnippet();
void RemoveValueList();
}
TransmitValueList would create your prep command, add the parameter and execute the prep command.
GetInListSQLSnippet would simply return (SELECT {inListTableName}.inValue FROM {inListTableName});
RemoveValueList cleans up.
The constructor for this class would take the value list and the Oracle db data type, and generate the inListTableName; a sketch of the whole class follows.
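A minimal sketch of such a class, assuming the Oracle.ManagedDataAccess.Client provider; beyond the interface itself, all member names and details here are illustrative rather than taken from the answer:
public class InListOperation : IInListOperation
{
    private readonly string tableName;
    private readonly string dbDataType;   // e.g. "VARCHAR2(100)"
    private readonly Array values;
    private OracleConnection connection;

    public InListOperation(Array values, string dbDataType)
    {
        this.values = values;
        this.dbDataType = dbDataType;
        // Letter prefix plus hyphen-free GUID, truncated to Oracle's 30-char limit.
        tableName = ("T" + Guid.NewGuid().ToString("N")).Substring(0, 30);
    }

    public void TransmitValueList(OracleConnection connection)
    {
        this.connection = connection;
        using (var ddl = new OracleCommand($"CREATE TABLE {tableName} (inValue {dbDataType})", connection))
        {
            ddl.ExecuteNonQuery();
        }
        using (var insert = new OracleCommand($"INSERT INTO {tableName} (inValue) VALUES (:inValue)", connection))
        {
            insert.ArrayBindCount = values.Length;   // one insert execution per element
            insert.Parameters.Add(new OracleParameter("inValue", values));
            insert.ExecuteNonQuery();
        }
    }

    public string GetInListSQLSnippet()
    {
        return $"(SELECT {tableName}.inValue FROM {tableName})";
    }

    public void RemoveValueList()
    {
        using (var drop = new OracleCommand($"DROP TABLE {tableName}", connection))
        {
            drop.ExecuteNonQuery();
        }
    }
}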
If you can use a Global Temporary Table, I would recommend that over creating and dropping tables.
Edit:
I'd like to add that this approach works well if you have clauses involving NOT IN lists or other inequality operators. Take the following for example:
SELECT FirstName, LastName FROM Users WHERE Status = 'ACTIVE' OR UserID NOT IN (1,2,3,4,5,6,7,8,9,10);
If you split the NOT IN list up, you will get invalid results: the following division of the previous example returns all users instead of all but those with UserIds 1-10, because any user excluded by one branch is still returned by the other (e.g. a user with UserId 3 fails the first filter but passes the second).
SELECT FirstName, LastName FROM Users WHERE UserID NOT IN (1,2,3,4,5)
UNION
SELECT FirstName, LastName FROM Users WHERE UserID NOT IN (6,7,8,9,10);

Maybe this is too simplistic for the kind of query you're doing, but is there any reason why you couldn't split this into several queries and combine the results together in code?
i.e. Let's imagine 5 elements are too many for the query...
select COUNT(*) from A where B in (1,2,3,4,5)
you'd separately perform
select COUNT(*) from A where B in (1,2,3)
select COUNT(*) from A where B in (4,5)
and then add those results together. Of course, you'd have to make sure that in-clause list is distinct so you don't double up on your counts.
If you can do it this way, there is an added opportunity for parallelism if you're allowed more than one connection.
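A hedged sketch of that approach (assuming ODP.NET and the table/column names from the question; Oracle raises ORA-01795 for more than 1000 expressions in one IN list, so 1000 is a natural chunk size):
// Chunk the (distinct) IN list and sum the per-chunk counts.
static long CountInChunks(OracleConnection connection, IList<string> values, int chunkSize = 1000)
{
    long total = 0;
    for (int offset = 0; offset < values.Count; offset += chunkSize)
    {
        var chunk = values.Skip(offset).Take(chunkSize).ToList();
        var binds = string.Join(",", Enumerable.Range(0, chunk.Count).Select(i => ":p" + i));
        using (var command = new OracleCommand($"SELECT COUNT(*) FROM a WHERE b IN ({binds})", connection))
        {
            for (int i = 0; i < chunk.Count; i++)
                command.Parameters.Add(new OracleParameter("p" + i, chunk[i]));
            total += Convert.ToInt64(command.ExecuteScalar());
        }
    }
    return total;
}
The values must be distinct across the whole list, or the summed counts will double-count.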

Related

How to generate NHibernate raw SQL for Insert/Update without execution?

Is it possible to generate a raw SQL statement for insert/update operations using NHibernate without actually executing them? Assuming of course that everything (mappings, connectionStrings, etc.) is properly configured?
The closest thing I've found is to call:
Session.SessionFactory.GetClassMetadata(typeof(Client))
This returns an object of type SingleTableEntityPersister containing SQLIdentityInsertString, which looks like this:
INSERT INTO Client (FirstName, LastName) values (?, ?)
But it would still require me to bind all of the properties manually, and on top of that SQLIdentityInsertString is a protected property. Are there any proper ways of doing that?
Okay, the closest thing I've found is to construct your own SQL query with a StringBuilder. First you need to extract your class metadata:
var metaData = Session.SessionFactory.GetClassMetadata(typeof(Client)) as SingleTableEntityPersister;
Then you can retrieve other information, such as:
var propertyNames = metaData.PropertyNames;
var tableName = metaData.TableName;
var firstPropertyValue = metaData.GetPropertyValue(client, propertyNames[0], EntityMode.Poco);
Once you have that information you can construct your own query manually. Not exactly the solution I wanted, but it's as close as it gets, I think.
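For instance, a hedged sketch of assembling the INSERT text from that metadata (illustrative only: PropertyNames holds mapped property names, which are not guaranteed to match the column names; System.Linq and System.Text are assumed):
var metaData = Session.SessionFactory.GetClassMetadata(typeof(Client)) as SingleTableEntityPersister;
// Assumes property names equal column names, which simple mappings often satisfy.
var columns = string.Join(", ", metaData.PropertyNames);
var binds = string.Join(", ", metaData.PropertyNames.Select(p => ":" + p));
var sql = new StringBuilder("INSERT INTO ")
    .Append(metaData.TableName)
    .Append(" (").Append(columns).Append(")")
    .Append(" VALUES (").Append(binds).Append(")")
    .ToString();
// Parameter values can then be pulled per property with
// metaData.GetPropertyValue(client, propertyName, EntityMode.Poco).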
However, one thing to note is that the Session.CreateSQLQuery(string) method is currently bugged: as a result, SetParameter doesn't work with more than 10 named parameters. A bug report for this already exists on NHibernate's Jira.

Is there a way to improve performance when sending a large number of IDs to SQL Server for filtering?

I use the following C# code to send a list of IDs to SQL Server 2012. It filters the ID column of mytable and returns the first 50 matching IDs.
It currently takes around 180 ms to execute the query, against a local database. I am wondering if there are ways to improve performance; I have noticed that performance is directly related to the number of IDs sent to SQL Server rather than the actual number of records in the table. If I send only one thousand IDs it is very fast (< 1 ms). Maybe there is another, more efficient way to send those IDs.
The user-defined table type int_list_type and mytable are defined like this:
CREATE TABLE mytable (Id int NOT NULL PRIMARY KEY CLUSTERED)
CREATE TYPE int_list_type AS TABLE(Id int NOT NULL PRIMARY KEY CLUSTERED)
C# code:
static void Main()
{
    List<int> idsToSend = Enumerable.Range(0, 200000).ToList();
    List<int> idsResult = new List<int>();
    Stopwatch sw = Stopwatch.StartNew();
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        connection.Open();
        SqlCommand command = new SqlCommand(
            @"SELECT TOP 50 t.Id FROM MyTable t
              INNER JOIN @ids lt ON t.Id = lt.Id",
            connection);
        command.Parameters.Add(new SqlParameter("@ids", SqlDbType.Structured)
        {
            TypeName = "int_list_type",
            Direction = ParameterDirection.Input,
            Value = GetSqlDataRecords(idsToSend)
        });
        SqlDataReader reader = command.ExecuteReader();
        while (reader.Read())
        {
            idsResult.Add(reader.GetInt32(0));
        }
    }
    Console.WriteLine(sw.Elapsed);
}
private static IEnumerable<SqlDataRecord> GetSqlDataRecords(IEnumerable<int> values)
{
    // One SqlDataRecord per ID, matching the single-column int_list_type.
    SqlMetaData[] metaData = { new SqlMetaData("Id", SqlDbType.Int) };
    foreach (int value in values)
    {
        SqlDataRecord rec = new SqlDataRecord(metaData);
        rec.SetInt32(0, value);
        yield return rec;
    }
}
EDIT: as suggested by Fabio, I took a look at the GetSqlDataRecords() method, and this is what takes most of the time. I tested it separately this way:
Stopwatch sw = Stopwatch.StartNew();
GetSqlDataRecords(listOfIds).ToList();
Console.WriteLine(sw.Elapsed);
You could try passing in the list of IDs as a comma-separated string, then within SQL finding all rows WHERE ID IN (list of IDs).
I haven't had a chance to test this, but it solved a similar issue for me in the past. Please let me know if it makes any sort of difference (good or bad).
+1 for using a table value type and passing that in as a parameter. This is a textbook example of how to use table value types.
Unfortunately, as you have identified, you will still experience performance issues when passing in very large arrays of data.
You could try using XML to pass the values in (see: xml parsing with sql query). The XML parser might be more performant in your environment when dealing with larger arrays. One note of warning: keep the namespaces simple or omit them entirely, or performance becomes much worse than the table value type. For smaller arrays (100s to 1000s) the table value type wins; with larger arrays you may see better performance from XML. A sketch of the XML variant follows below.
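A hedged sketch of the XML variant, reusing the question's variables (idsToSend, connection) and table names; the shredding query shown is one common pattern, not necessarily the fastest:
// Pass the IDs as a single XML parameter and shred it server-side.
// Requires System.Xml.Linq for XElement.
string idsXml = new XElement("ids",
    idsToSend.Select(id => new XElement("id", id))).ToString();

SqlCommand command = new SqlCommand(@"
    SELECT TOP 50 t.Id
    FROM MyTable t
    INNER JOIN (
        SELECT x.n.value('.', 'int') AS Id
        FROM @ids.nodes('/ids/id') AS x(n)
    ) ids ON t.Id = ids.Id", connection);
command.Parameters.Add(new SqlParameter("@ids", SqlDbType.Xml) { Value = idsXml });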
Question: can you re-architect this solution so that the list of IDs is already in the database? Or offload the ingestion of the list of IDs so that it happens first, in preparation for your query?
I do this in my applications by allowing the user to manually 'tag' rows, run scripts, or select some pre-compiled logic for selecting the IDs.
I store these IDs in a Tag table (the user can save the tag list to reuse in other sessions).
Now that the list of IDs is already in the DB, we simply join our selection list on the IDs. The execution of the query is no faster than with table value types or any variant of parsing XML, JSON or strings, but we have bypassed the parse step, which is generally the most costly; and now that the data is in the DB, it is easier for SQL Server to optimise and cache query execution plans.
Note: when sending lists of data to be used in a query, whether as table value types, table variables or temporary tables, work must be done to hydrate that list in tempdb.
You might find some fancy solutions that involve configuring the environment to support this scenario, but if you can change the process so that the selection list is already in the DB, most of the heavy lifting is done for you. Then you can use indexing and other traditional DBA maintenance to optimise your query performance even more.

SQL generated from LINQ not consistent

I am using the Telerik Open/Data Access ORM against an Oracle database.
Why do these two statements result in different SQL commands?
Statement #1
IQueryable<WITransmits> query = from wiTransmits in uow.DbContext.StatusMessages
select wiTransmits;
query = query.Where(e=>e.MessageID == id);
Results in the following SQL
SELECT
a."MESSAGE_ID" COL1,
-- additional fields
FROM "XFE_REP"."WI_TRANSMITS" a
WHERE
a."MESSAGE_ID" = :p0
Statement #2
IQueryable<WITransmits> query = from wiTransmits in uow.DbContext.StatusMessages
select new WITransmits
{
MessageID = wiTransmits.MessageID,
Name = wiTransmits.Name
};
query = query.Where(e=>e.MessageID == id);
Results in the following SQL
SELECT
a."MESSAGE_ID" COL1,
-- additional fields
FROM "XFE_REP"."WI_TRANSMITS" a
The query generated by the second statement obviously returns EVERY record in the table when I only want the one. Millions of records make this prohibitive.
Telerik Data Access will try to split each query into a database-side part and a client-side part (or in-memory LINQ, if you prefer).
Having a projection with select new is a sure trigger that will make everything in your LINQ expression tree after the projection go to the client side.
Meaning in your second case you have an inefficient LINQ query, as the filtering is applied in-memory after you have already transported a lot of unnecessary data.
If you want to compose LINQ expressions the way it is done in case 2, you can append the Select clause last, or explicitly convert the result to IEnumerable<T> to make it obvious that any further processing will be done in-memory. For example, filter first and project last, as sketched below.
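A sketch of the filter-first composition, reusing the question's names (method syntax only for brevity):
// Filter before projecting, so the Where is translated into the SQL WHERE clause.
IQueryable<WITransmits> query = uow.DbContext.StatusMessages
    .Where(e => e.MessageID == id)
    .Select(w => new WITransmits
    {
        MessageID = w.MessageID,
        Name = w.Name
    });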
The first query returns the full object defined, so any additional limitations (like Where) can be appended to it before it is actually being run. Therefore the query can be combined as you showed.
The second one returns a new object, which can be whatever type and contain whatever information. Therefore the query is sent to the database as "return everything" and after the objects have been created all but the ones that match the Where clause are discarded.
Even though the types were the same in both of them, think of this situation:
var query = from wiTransmits in uow.DbContext.StatusMessages
            select new WITransmits
            {
                MessageID = wiTransmits.MessageID * 4 - 2,
                Name = wiTransmits.Name
            };
How would you combine the Where clause now? Sure, you could go through the code inside the object initializer and try to move it outside, but since it can contain anything, that is not feasible in general. What if the computation involves some lookup function? What if it's not deterministic?
Therefore, if you create new objects based on the database objects, there will be a boundary where the objects are retrieved and any further queries are done in memory.

Simple query in LINQ

Using LINQ, I want to check if a row exists in the DB. I just need a true/false return, no data.
I can't use the ExecuteQuery method because I don't have an entity (and I don't even need one).
I thought of doing something like this:
string command = "select * from myTable where X=Y";
var result = db.ExecuteCommand(command);
(db is my DataContext)
and expected the result to contain the number of affected rows; if it were different than -1, that would mean the record I'm looking for exists. But I always get -1. I imagine the ExecuteCommand method should only be used to run inserts, updates or deletes.
How can I run this simple check using LINQ?
You can use the Any() operator. It will return true if the IEnumerable or IQueryable it is called on has at least one item (i.e. does it have any items).
If db is your data context, you should just do:
bool rowExists = db.GetTable<myTable>().Any(row => row.X == row.Y);
In general, with LINQ to SQL (and Entity Framework), you rarely want to write SQL code directly.
Replace
select *
with
select count(*)
You're probably better off running SELECT COUNT(*) FROM myTable WHERE X=Y and checking whether the single value returned is zero or not.

Collecting metadata into a table

I have tabular data that passes through a C# program, and I need to collect some metadata about it before finishing. The metadata is always counts based on fields of the data, and I need them all grouped by one field in the data. Periodically, I need to add new counts to this collection of metadata.
I've been researching it for a little while, and I think what makes sense is to rework my program to store the data as a DataTable, then run LINQ queries on the table. The problem I'm having is being able to put the different counts into one table-like structure and then write that out.
I might run a query like this:
var query01 =
    from record in records.AsEnumerable()
    group record by record.Field<String>("Association Key") into associationsGroup
    select new { AssociationKey = associationsGroup.Key, Count = associationsGroup.Count<DataRow>() };
That gets a count of all of the records grouped by the field Association Key. I'm going to want another count, grouped in the same way:
var query02 =
    from record in records.AsEnumerable()
    where record.Field<String>("Number 9") == "yes"
    group record by record.Field<String>("Association Key") into associationsGroup
    select new { AssociationKey = associationsGroup.Key, Number9Count = associationsGroup.Count<DataRow>() };
And so on.
I thought about trying to Union-chain the queries, but I was having trouble getting them to union since I'm projecting into anonymous types of different shapes. I couldn't figure out how to restructure them to make the union work.
So, how can I collect my metadata into one table-like structure?
They're not going to union because you have different types. Add both Number9Count and Count to both anonymous types and try the union again; see the sketch below.
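A hedged sketch of that suggestion, using the question's field names (the same property names, types and order make the two anonymous types identical, so Union compiles; a final group-by on AssociationKey would still be needed to merge the two rows per key):
var query01 =
    from record in records.AsEnumerable()
    group record by record.Field<String>("Association Key") into g
    select new { AssociationKey = g.Key, Count = g.Count(), Number9Count = 0 };

var query02 =
    from record in records.AsEnumerable()
    where record.Field<String>("Number 9") == "yes"
    group record by record.Field<String>("Association Key") into g
    select new { AssociationKey = g.Key, Count = 0, Number9Count = g.Count() };

// Same anonymous type on both sides, so this now compiles.
var combined = query01.Union(query02);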
I ended up solving the problem by creating a class that holds the set of records I need as a DataTable. A user can add queries through a method taking a Func<DataRow, bool> argument; the method constructs the query, supplying that argument as the where clause while maintaining the same grouping and properties in the resulting anonymous-typed objects.
When retrieving the results, the class iterates over each stored query and enters the results into a new DataTable, roughly as sketched below.
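A hedged sketch of such a collector class; every name below is illustrative, not the author's actual code:
// Requires System, System.Data, System.Data.DataSetExtensions and System.Linq.
public class MetadataCollector
{
    private readonly DataTable records;
    private readonly List<KeyValuePair<string, Func<DataRow, bool>>> queries =
        new List<KeyValuePair<string, Func<DataRow, bool>>>();

    public MetadataCollector(DataTable records)
    {
        this.records = records;
    }

    // Each registered predicate becomes one count column in the output table.
    public void AddQuery(string name, Func<DataRow, bool> where)
    {
        queries.Add(new KeyValuePair<string, Func<DataRow, bool>>(name, where));
    }

    public DataTable GetResults()
    {
        var result = new DataTable();
        result.Columns.Add("AssociationKey", typeof(string));
        foreach (var query in queries)
            result.Columns.Add(query.Key, typeof(int));

        // One output row per association key, one count per registered query.
        var keys = records.AsEnumerable()
            .Select(r => r.Field<string>("Association Key"))
            .Distinct();
        foreach (var key in keys)
        {
            var row = result.NewRow();
            row["AssociationKey"] = key;
            foreach (var query in queries)
                row[query.Key] = records.AsEnumerable().Count(
                    r => r.Field<string>("Association Key") == key && query.Value(r));
            result.Rows.Add(row);
        }
        return result;
    }
}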
