What I need to do is this:
I have an array with, let's say, 100,000 values. For each value I need to run the same query, just changing that specific value.
Now, I am thinking that if I loop over all these values in my C#/Java code and issue a query for each one, it will take a lot of time.
My other option is to do all the work in my DB: populate a temp table and then read back from that temp table in my code.
What is the fastest way of doing such a thing?
private void GetValues(List<Element> _Elements)
{
    foreach (Element e in _Elements)
    {
        using (OracleCommand cmd = new OracleCommand())
        {
            cmd.Connection = _conn;
            cmd.CommandText = "select value from table where something = " + e.Indicator;
            using (OracleDataReader r = cmd.ExecuteReader())
            {
                while (r.Read())
                {
                    e.setvalue = r.GetString(0);
                }
            }
        }
    }
}
[Editor note: question was originally unclear as to whether it was C# or Java -- but the languages are largely equivalent, and answers should be applicable to both.]
Do one query,
select value from table where something between min and max;
or
select value from table where something in (a, b, c, ... );
or
select value from table where something in
(select things from tempTable);
or
select value from table where something in
(select things from #tablevariable);
whichever of these approaches is most applicable to your database and problem.
Repeating all the processing over and over, then amalgamating the results, must be slower than taking a set-based approach in the first place.
It all rather depends on the type and distribution of the Indicator property of the Element list.
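For illustration, here is a minimal sketch of the set-based version of the question's loop, assuming the indicators fit in a single IN list (see the chunking approach in the next answer if they don't); the class, field, and column names are taken from the question:
private void GetValues(List<Element> elements)
{
    using (OracleCommand cmd = new OracleCommand())
    {
        cmd.Connection = _conn;
        cmd.BindByName = true; // ODP.NET binds by position unless told otherwise

        // Build ":p0, :p1, ..." placeholders so the values stay bound
        // instead of being concatenated into the SQL text.
        string placeholders = string.Join(", ",
            elements.Select((e, i) => ":p" + i));
        cmd.CommandText =
            "select something, value from table where something in ("
            + placeholders + ")";
        for (int i = 0; i < elements.Count; i++)
            cmd.Parameters.Add("p" + i, elements[i].Indicator);

        // One round trip: read all rows, then match them back to elements.
        var values = new Dictionary<string, string>();
        using (OracleDataReader r = cmd.ExecuteReader())
        {
            while (r.Read())
                values[r.GetValue(0).ToString()] = r.GetString(1);
        }
        foreach (Element e in elements)
        {
            string v;
            if (values.TryGetValue(e.Indicator.ToString(), out v))
                e.setvalue = v;
        }
    }
}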
The faster way is to use a dynamic query: in a loop, build up a statement that uses several values at once.
For the sample you gave, that means building up statements like these:
query1: select value from table where something in (n1, n2,... ,n500)
query2: select value from table where something in (n501, n502,... ,n1000)
etc.
You may not have to make several queries at all, depending on the limits you face (for example, the maximum statement length, or the database's cap on the number of items in an IN list).
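A rough sketch of that batching loop, assuming a list called indicators (Oracle, for instance, caps an IN list at 1,000 literals, so 500 per statement leaves headroom; production code would bind the values rather than concatenate them):
const int batchSize = 500;

// One statement per batch of indicators instead of one per value.
for (int offset = 0; offset < indicators.Count; offset += batchSize)
{
    var batch = indicators.Skip(offset).Take(batchSize);
    string sql = "select value from table where something in ("
               + string.Join(", ", batch) + ")";
    // ... execute sql and merge its rows into the overall result ...
}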
There are a lot of optimization tips, but for your specific case, here are two:
Try to do everything in one go; @Jodrell suggested very good ideas for that.
Retrieve as little data as possible from the DB; select only the fields you need.
Related
I use the following C# code to send a list of IDs to SQL Server 2012. It filters the ID column of mytable and returns the first 50 matching IDs.
Currently it takes around 180 ms to execute the query. The database is local. I am wondering if there is some way to improve performance. I have noticed that performance is directly related to the number of IDs sent to SQL Server rather than to the actual number of records in the table. If I send only one thousand IDs it is very fast (< 1 ms). Maybe there is another, more efficient way to send those IDs.
The user-defined table type int_list_type and mytable are defined like this:
CREATE TABLE mytable (Id int NOT NULL PRIMARY KEY CLUSTERED)
CREATE TYPE int_list_type AS TABLE(Id int NOT NULL PRIMARY KEY CLUSTERED)
C# code:
static void Main()
{
    List<int> idsToSend = Enumerable.Range(0, 200000).ToList();
    List<int> idsResult = new List<int>();
    Stopwatch sw = Stopwatch.StartNew();
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        connection.Open();
        SqlCommand command = new SqlCommand(
            @"SELECT TOP 50 t.Id FROM MyTable t
              INNER JOIN @ids lt ON t.Id = lt.Id",
            connection);
        command.Parameters.Add(new SqlParameter("@ids", SqlDbType.Structured)
        {
            TypeName = "int_list_type",
            Direction = ParameterDirection.Input,
            Value = GetSqlDataRecords(idsToSend)
        });
        SqlDataReader reader = command.ExecuteReader();
        while (reader.Read())
        {
            idsResult.Add(reader.GetInt32(0));
        }
    }
    Console.WriteLine(sw.Elapsed);
}
private static IEnumerable<SqlDataRecord> GetSqlDataRecords(IEnumerable<int> values)
{
    SqlMetaData[] metaData = { new SqlMetaData("Id", SqlDbType.Int) };
    foreach (int value in values)
    {
        SqlDataRecord rec = new SqlDataRecord(metaData);
        rec.SetInt32(0, value);
        yield return rec;
    }
}
EDIT: as suggested by Fabio, I took a look at the GetSqlDataRecords() method, and this is what takes most of the time. I tested it separately this way:
Stopwatch sw = Stopwatch.StartNew();
GetSqlDataRecords(idsToSend).ToList();
Console.WriteLine(sw.Elapsed);
You could try passing in the list of IDs as a single comma-separated string, then within SQL finding all rows where ID IN (ListOfIds).
Have not had a chance to test this but it solved a similar issue for me in the past. Please let me know if it makes any sort of difference (good or bad).
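If it helps, here is a minimal sketch of that idea. Note that the built-in STRING_SPLIT function only exists from SQL Server 2016 onwards, so on the 2012 instance in the question you would need your own split function (or the XML approach in the next answer); idsToSend and connection are the question's variables.
// Pass the IDs as one comma-separated NVARCHAR(MAX) parameter and
// split them server-side (STRING_SPLIT: SQL Server 2016+).
string idCsv = string.Join(",", idsToSend);

using (SqlCommand command = new SqlCommand(
    @"SELECT TOP 50 t.Id
      FROM MyTable t
      INNER JOIN STRING_SPLIT(@ids, ',') s ON t.Id = CAST(s.value AS int)",
    connection))
{
    command.Parameters.Add(
        new SqlParameter("@ids", SqlDbType.NVarChar, -1) { Value = idCsv });
    // ... ExecuteReader as in the question ...
}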
+1 for using a table value type and passing that in as a parameter. This is a textbook example of how to use table value types.
Unfortunately as you have identified, you will still experience performance issues when passing in very large arrays of data.
You could try using XML to pass the values in (see: xml parsing with sql query). The XML parser might be more performant in your environment when dealing with larger arrays. A note of warning: keep the namespaces simple or omit them entirely, or performance is much worse than the table value type. For smaller arrays (100s to 1000s) the table value type wins; with larger arrays you may see better performance from XML.
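A sketch of what the XML variant might look like, reusing the question's variables (note the deliberately namespace-free document, per the warning above):
// Serialize the IDs as bare <id> nodes and shred them server-side.
string idXml = "<ids>" + string.Concat(
    idsToSend.Select(id => "<id>" + id + "</id>")) + "</ids>";

using (SqlCommand command = new SqlCommand(
    @"SELECT TOP 50 t.Id
      FROM MyTable t
      INNER JOIN (SELECT n.value('.', 'int') AS Id
                  FROM @ids.nodes('/ids/id') AS x(n)) lt ON t.Id = lt.Id",
    connection))
{
    command.Parameters.Add(
        new SqlParameter("@ids", SqlDbType.Xml) { Value = idXml });
    // ... ExecuteReader as before ...
}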
Question: can you re-architect this solution so that the list of IDs is already in the database? Or offload the ingestion of the list of IDs so that it happens first, in preparation for your query?
I do this in my applications by allowing the user to manually 'tag' rows, or run scripts or select some pre-compiled logic for selecting the ids.
I store these ids in a Tag table (the user can save the tag list to reuse in other sessions)
Now that the list of IDs is already in the DB, we simply join the IDs to our selection list. The execution of the query is no faster than with table value types or any of the variants of parsing XML, JSON, or strings, but we have bypassed the parse step, which is generally the most costly. Also, now that the data is in the DB, it is easier for SQL Server to optimise and cache query execution plans.
Note: when sending lists of data to be used in a query, whether as table value types, table variables, or temporary tables, work must be done to hydrate that data in SQL Server's temp DB.
You might find some fancy solutions that involve configuring the environment to support this scenario, but if you can change the process to ensure the selection list is already in the DB, most of the heavy lifting work is done for you. Then you can use indexing and other traditional DBA maintenance to optimise your query performance even more.
We have a C# component that handles attaching arbitrary-sized element lists into IN clauses for semi-arbitrary SQL SELECT queries. Essentially this boils down to receiving something like:
SELECT COUNT(*) FROM a WHERE b IN (...)
...where the "..." is the only portion of the query the component is allowed to modify.
Currently the component will insert a comma-separated set of named bind parameters, then attach the corresponding IDbDataParameter objects to the command and execute; the component is made aware of the types for the parameters it has to bind. This works well, until the calling code supplies a parameter set larger than the database is willing to accept. The objective here is to get such large sets working with queries against Oracle 11gR2 via ODP.NET.
This task is complicated somewhat by the following approaches being deemed unacceptable by those setting the requirements:
Global Temporary Tables
Stored procedures
Anything requiring CREATE TYPE to have been executed
The solution to this is not required to execute only one query.
I'm trying to make this work by binding the clause as an array, using code sourced from elsewhere:
IList<string> values;
//...
OracleParameter parameter = new OracleParameter();
parameter.ParameterName = "parm";
parameter.DbType = DbType.String;
parameter.Value = values.ToArray();

int[] sizes = new int[values.Count];
for (int index = 0; index < values.Count; index++)
{
    sizes[index] = values[index].Length;
}
parameter.ArrayBindSize = sizes;
//...
The command subsequently executes without throwing an exception, but the value returned for COUNT is zero (compared to the expected value from running the query in SQL Developer with a nested SELECT returning the same parameter set). Going through the ODP.NET docs hasn't brought any joy thus far.
The questions for this are:
Is there a way to make the above parameter attachment work as expected?
Is there another viable way to achieve this without using one of the vetoed approaches?
(I'm aware this is similar to this (unanswered) question, but that scenario does not mention having the same restrictions on approaches.)
Well, since you are not allowed to use Global Temporary Tables, are you at least allowed to create normal tables? If so, here is a way:
Create an OracleCommand object for the table creation (plain DDL, so it runs as its own command):
CREATE TABLE {inListTableName}
(
    inValue {dbDataType}
)
Then create a second OracleCommand for loading the values:
INSERT INTO {inListTableName} (inValue) VALUES (:inValue)
Set the ArrayBindCount on this insert command to the number of items you need in your in list.
Replace {inListTableName} with a unique name, e.g. one generated from Guid.NewGuid() (strip the hyphens and add an alphabetic prefix so it remains a valid Oracle identifier).
Replace the {dbDataType} with the correct Oracle data type for the list of values that you want to use in your in clause.
Add an OracleParameter to the insert command named "inValue" and set the value of the parameter to an array containing the values that you want in your in clause. If you have a HashSet (which I recommend, to avoid sending unnecessary duplicates), call .ToArray() on it to get an array.
Execute both commands. Together they are your prep step.
Then use the following sql snippet as the value portion of the in clause in your select sql statement:
(SELECT {inListTableName}.inValue FROM {inListTableName})
For example:
SELECT FirstName, LastName FROM Users WHERE UserId IN (SELECT {inListTableName}.inValue FROM {inListTableName});
Execute this command to get a reader.
Lastly, one more command with the following command text:
DROP TABLE {inListTableName};
This is your cleanup command. Execute this command.
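Put together, the whole sequence might look roughly like this sketch (the INLIST_ prefix, the NUMBER/Int32 types, and valueSet are illustrative assumptions):
// Generate a unique, identifier-safe table name (sketch).
string tableName = "INLIST_" + Guid.NewGuid().ToString("N").Substring(0, 20);
int[] values = valueSet.ToArray(); // e.g. from a HashSet<int>

// Prep command 1: create the holding table (plain DDL).
using (var create = new OracleCommand(
    "CREATE TABLE " + tableName + " (inValue NUMBER)", connection))
{
    create.ExecuteNonQuery();
}

// Prep command 2: array-bind the insert, one execution per element.
using (var insert = new OracleCommand(
    "INSERT INTO " + tableName + " (inValue) VALUES (:inValue)", connection))
{
    insert.ArrayBindCount = values.Length;
    insert.Parameters.Add("inValue", OracleDbType.Int32);
    insert.Parameters[0].Value = values;
    insert.ExecuteNonQuery();
}

// Main query: ... WHERE UserId IN (SELECT inValue FROM <tableName>) ...

// Cleanup command.
using (var drop = new OracleCommand("DROP TABLE " + tableName, connection))
{
    drop.ExecuteNonQuery();
}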
You might want to create an alternate schema/user to create the inListTable so that you can grant appropriate permissions to your user to only create tables in that schema.
All of this can be encapsulated in a reusable class with the following interface:
public interface IInListOperation
{
void TransmitValueList(OracleConnection connection);
string GetInListSQLSnippet();
void RemoveValueList();
}
TransmitValueList would create your prep command, add the parameter and execute the prep command.
GetInListSQLSnippet would simply return (SELECT {inListTableName}.inValue FROM {inListTableName});
RemoveValueList cleans up.
The constructor for this class would take the value list and oracle db data type, and generate the inListTableName.
If you can use a Global Temporary Table, I would recommend that over creating and dropping tables.
Edit:
I'd like to add that this approach works well if you have clauses involving NOT IN lists or other inequality operators. Take the following for example:
SELECT FirstName, LastName FROM Users WHERE Status = 'ACTIVE' OR UserID NOT IN (1,2,3,4,5,6,7,8,9,10);
If you use the approach of splitting the NOT IN list up, you will end up with invalid results. The following division of the previous example will return all users, instead of all but those with UserIDs 1-10 (every UserID falls outside at least one of the two shorter lists, so the UNION re-includes everyone).
SELECT FirstName, LastName FROM Users WHERE UserID NOT IN (1,2,3,4,5)
UNION
SELECT FirstName, LastName FROM Users WHERE UserID NOT IN (6,7,8,9,10);
Maybe this is too simplistic for the kind of query you're doing, but is there any reason why you couldn't split this into several queries and combine the results together in code?
i.e. Let's imagine 5 elements are too many for the query...
select COUNT(*) from A where B in (1,2,3,4,5)
you'd separately perform
select COUNT(*) from A where B in (1,2,3)
select COUNT(*) from A where B in (4,5)
and then add those results together. Of course, you'd have to make sure the in-clause lists are distinct so you don't double up on your counts.
If you can do it this way, there is an added opportunity for parallelism if you're allowed more than one connection.
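A sketch of that chunk-and-sum idea against Oracle (the chunk size, values, and connection names are illustrative; production code would bind the values rather than concatenate them):
// Sum the COUNT of each chunk; Distinct() guards against double counting.
var distinct = values.Distinct().ToList();
const int chunkSize = 1000;
int total = 0;

for (int i = 0; i < distinct.Count; i += chunkSize)
{
    string sql = "select COUNT(*) from A where B in ("
               + string.Join(",", distinct.Skip(i).Take(chunkSize)) + ")";
    using (var cmd = new OracleCommand(sql, connection))
    {
        total += Convert.ToInt32(cmd.ExecuteScalar());
    }
}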
Basically, I need to get the email column from each table in the DataSet (there could be 0 tables in there, or there could be 100) and slap them together into a big List for processing later.
I was about to write the 2x nested loop to do it, but is there an easier way to type this?
My first attempt at loops didn't work so well, since DataTable doesn't define GetEnumerator:
foreach (DataTable item in emailTables.Tables)
{
    foreach (var email in item)
    {
        // This doesn't work
    }
}
Like L.B. said, it's opinion based, but LINQ would be my first choice. It's a bit less readable, but it's less typing overall and definitely less nesting.
var listEmail = (from DataTable table in emailTables.Tables
from DataRow row in table.Rows
select row["yourColumnNameHere"].ToString()).ToList();
If any of the tables do not (or may not) have an email column, then you will have to do some more validation.
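For example, one hedged way to add that validation is to skip tables that lack the column before projecting:
var listEmail = (from DataTable table in emailTables.Tables
                 where table.Columns.Contains("yourColumnNameHere")
                 from DataRow row in table.Rows
                 select row["yourColumnNameHere"].ToString()).ToList();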
I'm using a separate answer to address the follow-up question posed in a comment after my initial answer was accepted.
The question was:
I'm considering making this into an extension method, would it be possible to extend this code to obtain multiple items, like Email, Name and Address? Maybe into something other than List<>?
The answer:
Create anonymous types in the select statement and assign the values from the columns to their properties:
var selectedItems = from DataTable table in emailTables.Tables
from DataRow row in table.Rows
select
new
{
EMail = row["email"].ToString(),
Address = row["address"].ToString(),
Name = row["name"].ToString()
};
Then loop through the results in selectedItems and do whatever you would like to the fields. Not sure what type you want to store the results in, but this should give you a pretty good idea.
foreach (var item in selectedItems)
{
//Do whatever you want by accessing the fields EMail, Address, and Name using dot notation like
var myVar = item.EMail;
var myVar2 = item.Address;
//Etc... Not sure what the end result you need is going to be, but you should have a good starting point now.
}
Or you could just return the selectedItems collection. Its type is IEnumerable<T>.
Using LINQ, I want to check whether a row exists in the DB. I just need a true/false return, no data.
I can't use the ExecuteQuery method because I don't have an entity (and I don't even need one).
I thought of doing something like this:
string command = "select * from myTable where X=Y";
var result = db.ExecuteCommand(command);
(db is my DataContext)
and expected the result to contain the number of affected rows. If different than -1, that would mean the record I'm looking for exists. But I always get -1. I imagine the ExecuteCommand method should only be used to run inserts, updates, or deletes.
How can I run this simple check using LINQ?
You can use the Any() operator. It will return true if the IEnumerable or IQueryable it is called on has at least one item (i.e. does it have any items).
If db is your data context, you should just do:
bool rowExists = db.GetTable<myTable>().Any(row => row.X == row.Y);
In general, with LINQ to SQL (and Entity Framework), you rarely want to write SQL code directly.
Replace
select *
with
select count(*)
You're probably better off running SELECT COUNT(*) FROM myTable WHERE X=Y and checking whether the single value returned is zero or not.
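If you want to stay with raw SQL on the DataContext, a sketch of that check (DataContext.ExecuteQuery can materialize a single int column, so no entity class is needed):
// One scalar round trip; true if at least one matching row exists.
bool exists = db.ExecuteQuery<int>(
    "SELECT COUNT(*) FROM myTable WHERE X = Y").Single() > 0;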
There are probably 10 duplicates of this question but I would like to know if there is a better way than I am currently doing this. This is a small example that I'm using to show how I'm determining differences:
//let t1 be a representation of the ID's in the database.
List<int> t1 = new List<int>() { 5, 6, 7, 8 };
//let t2 be the list of ID's that are in memory.
//these changes need to be reflected to the database.
List<int> t2 = new List<int>() { 6, 8, 9, 10 };
var hash = new HashSet<int>(t1);
var hash2 = new HashSet<int>(t2);
//determines which ID's need to be removed from the database
hash.ExceptWith(t2);
//determines which ID's need to be added to the database.
hash2.ExceptWith(t1);
//remove contents of hash from database
//add contents of hash2 to database
I want to know if I can determine what to add and remove in ONE operation instead of the two I currently do. Is there any way to increase the performance of this operation? Keep in mind that in the actual database situation there are hundreds of thousands of IDs.
EDIT: or, as a second question, is there a LINQ query I can run directly against the database, so I can just supply the new list of IDs and have it automatically remove/add itself? (using MySQL)
CLARIFICATION: I know I need two SQL queries (or a stored procedure). The question is whether I can determine the differences between the lists in one action, and whether it can be done faster than this.
EDIT2
This operation from SPFiredrake appears to be faster than my HashSet version; however, I have no idea how to determine which IDs to add and which to remove from the database. Is there a way to include that information in the operation?
t1.Union(t2).Except(t1.Intersect(t2))
EDIT3
Never mind, I forgot that this statement in fact has the problem of deferred execution, although in case anyone is wondering, I solved my prior problem with it by using a custom comparer and an added variable determining which list it was from.
Ultimately, you're going to use a full outer join (which, in LINQ terms, is two GroupJoins). However, we ONLY care about values that don't have a matching record on the other side: a null right value (left outer join) indicates a removal, and a null left value (right outer join) indicates an addition. So we perform two left outer joins (switching the inputs for the second one to emulate the right outer join) and concat them together (we could use Union, but that's unnecessary since we'll be getting rid of any duplicates anyway).
List<int> t1 = new List<int>() { 5, 6, 7, 8 };
List<int> t2 = new List<int>() { 6, 8, 9, 10 };
var operations =
    t1.GroupJoin(
            t2,
            t1i => t1i,
            t2i => t2i,
            (t1i, t2join) => new { Id = t1i, Action = !t2join.Any() ? "Remove" : null })
        .Concat(
            t2.GroupJoin(
                t1,
                t2i => t2i,
                t1i => t1i,
                (t2i, t1join) => new { Id = t2i, Action = !t1join.Any() ? "Insert" : null }))
        .Where(tr => tr.Action != null);
This will give you the combined list of operations. Then you can feed this data into a stored procedure that removes the values that already exist in the table and adds the rest (or into two lists to run removals and additions against). Either way, it's still not the cleanest way to do it, but at least this gets you thinking.
Edit: My original solution was to separate out the two lists based on what action was needed, which is why it's so ghastly. The same can be done using a one-liner (though it doesn't tell you which action to take), although I think you'll still suffer from the same issues (LINQ enumeration as opposed to HashSets' hashed lookups).
// XOR of sets = (A | B) - (A & B), - being set difference (Except)
t1.Union(t2).Except(t1.Intersect(t2))
I'm sure it'll still be slower than using the HashSets, but give it a shot anyway.
Edit: Yes, it is faster, because it doesn't actually do anything with the collection until you enumerate over it (either in a foreach or by getting it into a concrete data type, i.e. List<>, array, etc.). It's still going to take extra time to sort out which ones to add/remove, and that's ultimately the problem. I was able to get comparable speed by breaking down the two queries, but getting it into the in-memory world (via ToList()) made it slower than the HashSet version:
t1.Except(t2); // .ToList() slows these down
t2.Except(t1);
Honestly, I would handle it on the SQL side. In the stored proc, store all the values in a table variable with another column indicating addition or removal (based on whether the value already exists in the table). Then you can just do a bulk deletion/insertion by joining back to this table variable.
Edit: Thought I'd expand on what I meant by sending the full list to the database and having it handled in the sproc:
var toModify = t1.Union(t2).Except(t1.Intersect(t2));
string mods = string.Join(",", toModify.ToArray());
// Pass mods (comma-separated list) to your sproc.
Then, in the stored procedure, you would do this:
-- @delimitedIDs is some unbounded text type, in case you have a LOT of records.
-- I use XQuery to build the table (found it's faster than some other methods).
DECLARE @idTable TABLE (ID int, AddRecord bit)
DECLARE @xmlString XML
SET @xmlString = CAST('<NODES><NODE>' + REPLACE(@delimitedIDs, ',', '</NODE><NODE>') + '</NODE></NODES>' as XML)

INSERT INTO @idTable (ID)
SELECT node.value('.', 'int')
FROM @xmlString.nodes('//NODE') as xs(node)

UPDATE id
SET AddRecord = CASE WHEN someTable.ID IS NULL THEN 1 ELSE 0 END
FROM @idTable id LEFT OUTER JOIN [SomeTable] someTable ON someTable.ID = id.ID

DELETE a
FROM [SomeTable] a JOIN @idTable b ON b.ID = a.ID AND b.AddRecord = 0

INSERT INTO [SomeTable] (ID)
SELECT ID FROM @idTable WHERE AddRecord = 1
Admittedly, this just inserts the IDs; it doesn't actually add any other information. However, you can still pass XML data to the sproc and use XQuery in a similar fashion to get the information you'd need to add.
Even if you replace it with a LINQ version, you still need two operations.
Let's assume you are doing this using pure SQL.
You would probably need two queries:
one for removing the records
another one for adding them
Using LINQ, the code would be much more complicated and less readable than your solution.
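For concreteness, a sketch of those two statements against MySQL (the table and column names are illustrative, and real code should bind the values and guard against empty sets):
// One DELETE for the removals, one INSERT for the additions;
// toRemove and toAdd are the two sets computed with ExceptWith above.
using (var remove = new MySqlCommand(
    "DELETE FROM someTable WHERE id IN ("
    + string.Join(",", toRemove) + ")", connection))
{
    remove.ExecuteNonQuery();
}

using (var add = new MySqlCommand(
    "INSERT INTO someTable (id) VALUES "
    + string.Join(",", toAdd.Select(id => "(" + id + ")")), connection))
{
    add.ExecuteNonQuery();
}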