Multithread SQL select statements - c#

I am a multithreading novice and a SQL novice, so please excuse any rookie mistakes.
I am trying to execute many SQL queries asynchronously. The queries are all select statements from the same table in the same database. I can run them synchronously and everything works fine, but testing a small subset leads me to believe that to run all the queries synchronously would take approximately 150 hours, which is far too long. As such, I'm trying to figure out how to run them in parallel.
I have tried to model the code after the answer at run a method multiple times simultaneously in c#, but my code is not executing correctly (it's erroring, though I do not know specifically how. The code just says an error occurs).
Here is what I have (A much smaller and simpler version of what I am actually doing):
class Program
{
static void Main(string[] args)
{
List<string> EmployeeIDs = File.ReadAllLines(/* Filepath */);
List<Tuple<string, string>> NamesByID = new List<Tuple<string, string>>();
//What I do not want to do (because it takes too long) ...
using (SqlConnection conn = new SqlConnection(/* connection string */))
{
foreach (string id in EmployeeIDs)
{
using (SqlCommand cmd = new SqlCommand("SELECT FirstName FROM Employees WITH (NOLOCK) WHERE EmployeeID = " + id, conn))
{
try
{
conn.Open();
NamesByID.Add(new Tuple<string, string> (id, cmd.ExecuteScalar().ToString()));
}
finally
{
conn.Close();
}
}
}
}
//What I do want to do (but it errors) ...
var tasks = EmployeeIDs.Select(id => Task<Tuple<string, string>>.Factory.StartNew(() => RunQuery(id))).ToArray();
Task.WaitAll(tasks);
NamesByID = tasks.Select(task => task.Result).ToList();
}
private static Tuple<string, string> RunQuery(string id)
{
using (SqlConnection conn = new SqlConnection(/* connection string */))
{
using (SqlCommand cmd = new SqlCommand("SELECT FirstName FROM Employees WITH (NOLOCK) WHERE EmployeeID = " + id, conn))
{
try
{
conn.Open();
return new Tuple<string, string> (id, cmd.ExecuteScalar().ToString());
}
finally
{
conn.Close();
}
}
}
}
}
Note: I do not care exactly how this is multithreaded (tasks, parallel.foreach, backgroundworker, etc). This is going to be used to run ~30,000 select queries exactly 1 time, so I just need it to run fast (I'm hoping for ~8 hrs = one work day, but I'll take what I can get) one time. It doesn't have to really be pretty.
Thank you in advance!

This is just plain wrong. You should build one query that selects all the FirstNames you need. If you need to pass a bunch of IDs to the server, that is no problem: just use a table-valued parameter (aka TVP); a comma-separated list of values really does not scale well. If the query is correctly written and the tables are indexed, it should be quite fast. A table with 100k rows is a small table.
The query then may look like this
SELECT DollarAmount, comp.CompanyID
FROM Transactions
JOIN (SELECT MIN(TransactionID) as minTransactionID, CompanyID
FROM CompanyTransactions
GROUP BY CompanyID
) AS comp
ON Transactions.TransactionID = comp.minTransactionID
JOIN @IDList ON id = comp.CompanyID
You may use IN instead of JOIN if the ids in TVP are not unique.
By the way, do you know what NOLOCK means? If you are the only user of the database and use it single-threaded, or do not modify any data, then you are safe. Otherwise it means that you are okay with a small chance that:
- some records are missing from the result
- there are duplicate records in the result
- there are rows in the result that have never been committed and were never accepted as valid data
- if you use varchar(max), you get text that has never been stored

You want to do one query to get all of the ID/Name combinations, then put them into a dictionary (for quick access). This will remove the very slow process of running 30,000 queries as well as reduce the complexity of your code.
I could get you something more concrete if you posted the actual SQL query (you can change the column and table names if you need) but this should be close:
;WITH CompTransCTE AS (
SELECT CompanyID, MIN(TransactionID) AS TransactionID
FROM CompanyTransactions
WHERE CompanyID IN (/* Comma-separated list of values */)
GROUP BY CompanyID
)
SELECT CT.CompanyID, T.DollarAmount, T.TransactionID
FROM Transactions AS T
INNER JOIN CompTransCTE AS CT ON CT.TransactionID = T.TransactionID;
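The "put them into a dictionary" step could look roughly like the following. This is only a sketch: the class and method names are made up for illustration, and it assumes the first two columns of the result are an int key (CompanyID) and a decimal value (DollarAmount), which may not match your actual column types.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class LookupBuilder
{
    // Reads (key, value) pairs from any IDataReader into a dictionary.
    // Assumption: column 0 is an int key and column 1 is a decimal value.
    public static Dictionary<int, decimal> ToDictionary(IDataReader reader)
    {
        var result = new Dictionary<int, decimal>();
        while (reader.Read())
        {
            // The indexer overwrites on a duplicate key instead of throwing.
            result[reader.GetInt32(0)] = reader.GetDecimal(1);
        }
        return result;
    }
}
```

You would call it as `LookupBuilder.ToDictionary(cmd.ExecuteReader())` after running the single query, and then look up each ID in memory instead of querying the server again.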

Without creating a User-Defined Table Type in the database, you can use SqlBulkCopy to load the IDs into a temp table, and reference that in the query.
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
namespace ConsoleApp11
{
class Program
{
static void Main(string[] args)
{
//var EmployeeIDs = File.ReadAllLines(""/* Filepath */);
var EmployeeIDs = Enumerable.Range(1, 30 * 1000).ToList();
var dt = new DataTable();
dt.Columns.Add("id", typeof(int));
dt.BeginLoadData();
foreach (var id in EmployeeIDs)
{
var row = dt.NewRow();
row[0] = id;
dt.Rows.Add(row);
}
dt.EndLoadData();
using (SqlConnection conn = new SqlConnection("server=.;database=tempdb;integrated security=true"))
{
conn.Open();
var cmdCreateTemptable = new SqlCommand("create table #ids(id int primary key)",conn);
cmdCreateTemptable.ExecuteNonQuery();
//var cmdCreateEmpable = new SqlCommand("create table Employees(EmployeeId int primary key, FirstName varchar(2000))", conn);
//cmdCreateEmpable.ExecuteNonQuery();
var bc = new SqlBulkCopy(conn);
bc.DestinationTableName = "#ids";
bc.ColumnMappings.Add("id", "id");
bc.WriteToServer(dt);
var names = new List<string>();
var cmd = new SqlCommand("SELECT FirstName, EmployeeId FROM Employees WHERE EmployeeID in (select id from #ids)", conn);
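using (var rdr = cmd.ExecuteReader())
{
// Read() must be called to advance to each row before its columns can be accessed
while (rdr.Read())
{
var firstName = rdr.GetString(0);
var id = rdr.GetInt32(1);
names.Add(firstName);
}
}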
Console.WriteLine("Hit any key to continue");
Console.ReadKey();
}
}
}
}

Related

ADO.NET query returning nothing even if the item available

I am using ADO.NET to query a SQL Server database. I am trying to get the items that are present in the table.
My query executes but returns nothing, even when matching rows exist.
Here is my code:
public List<string> GetRecords(List<string> itemList)
{
itemList.Add("100");
string list = string.Join(",", itemList.Select(x => string.Format("'{0}'", x)));
string query = @"SELECT Id FROM Employees WHERE Id In (@list)";
using (SqlCommand sqlCommand = new SqlCommand(query,connection))
{
sqlCommand.Parameters.AddWithValue("@list", list);
sqlDataReader = sqlCommand.ExecuteReader();
while (sqlDataReader.Read())
{
employeeList.Add(Convert.ToString(database.Sanitise(sqlDataReader, "Id")));
}
}
}
There are three items in the list. The employee with ID=100 exists in the table, but the other two do not. Still, the query returns nothing.
SQL profiler showing me this query:
exec sp_executesql N'SELECT
Id
FROM
Employees
WHERE
Id In (@list)',N'@list nvarchar(29)',@list=N'''50'',''23'',''100'''
SQL Server will not interpret your concatenated list as actual code. It remains data always, so it's just one big text string of numbers. That is never going to match a single row.
Instead, use a Table Valued Parameter.
First, create a table type in your database. I usually keep a few useful ones around.
CREATE TYPE dbo.IdList AS TABLE (Id int PRIMARY KEY);
Then create a DataTable and pass it as a parameter.
public List<string> GetRecords(List<string> itemList)
{
    var table = new DataTable { Columns = {
        { "Id", typeof(int) },
    } };
    // The table type's column is int, so parse the incoming string IDs explicitly
    foreach (var id in itemList)
        table.Rows.Add(int.Parse(id));
    const string query = @"
SELECT e.Id
FROM Employees e
WHERE e.Id IN (SELECT l.Id FROM @list l);
";
    var employeeList = new List<string>();
    using (var connection = new SqlConnection(YourConnString)) // always create and dispose a new connection
    using (var sqlCommand = new SqlCommand(query, connection))
    {
        sqlCommand.Parameters.Add(new SqlParameter("@list", SqlDbType.Structured) {
            Value = table,
            TypeName = "dbo.IdList",
        });
        connection.Open();
        using (var sqlDataReader = sqlCommand.ExecuteReader())
        {
            while (sqlDataReader.Read())
            {
                // Id is an int column, so convert it rather than casting to string
                employeeList.Add(sqlDataReader["Id"].ToString());
            }
        }
    }
    return employeeList;
}
Note also:
- Put using on all SQL objects.
- Do not cache a connection object. Create it when you need it, and dispose of it with using.
- I don't know what your Sanitise function does, but it probably doesn't work. Sanitizing database values correctly is hard; you should always use parameterization.
- AddWithValue is a bad idea. Instead, specify the parameter types (and lengths/precision) explicitly.
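To illustrate the last point, here is a minimal sketch of declaring a parameter with an explicit type instead of calling AddWithValue. The class and method names are hypothetical, and it assumes Employees.Id is an int column.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class EmployeeCommands
{
    // Builds a command whose parameter type is stated explicitly,
    // so the provider does not have to infer it from the .NET value.
    // Assumption: Employees.Id is an int column.
    public static SqlCommand BuildLookup(SqlConnection connection, int employeeId)
    {
        var command = new SqlCommand("SELECT Id FROM Employees WHERE Id = @id", connection);
        command.Parameters.Add("@id", SqlDbType.Int).Value = employeeId;
        return command;
    }
}
```

For string columns there is an overload that also takes the length, for example `Parameters.Add("@name", SqlDbType.NVarChar, 50)`, so the declared size can match the column definition.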

Get many rows from a list of IDs

I'm using C# and SQL Server. I have a list of IDs for documents which corresponds to the primary key for a table in SQL Server that has a row for each document and the row contains (among other things) the ID and the document for that ID. I want to get the document in the row for each of the IDs. Currently, I execute a query for each ID, but since there are 10,000s of them, this runs a ton of queries and takes a very long time. It ends up being faster to simply load everything from the table into memory and then filter by the ids I have, but that seems inefficient and won't scale over time. If that doesn't make sense, hopefully the following code that takes a long time to run shows what I'm trying to do.
private static Dictionary<Guid, string> foo(IEnumerable<Guid> guids, SqlConnection conn)
{
using (SqlCommand command = new SqlCommand(null, conn))
{
command.CommandText = "select document from Documents where id = @id";
SqlParameter idParam = new SqlParameter("@id", SqlDbType.UniqueIdentifier);
command.Parameters.Add(idParam);
command.Prepare();
var documents = new Dictionary<Guid, string>();
foreach (var guid in guids)
{
idParam.Value = guid;
object obj = command.ExecuteScalar();
if (obj != null)
{
documents[guid] = (string)obj;
}
}
return documents;
}
}
I could programmatically construct query strings to use where clause like this: ".... where id in (ID1, ID2, ID3, ..., ID100)" to get 100 documents at a time or something like that, but this feels janky and it seems to me like there's got to be a better way.
I'm sure I'm not the only one to run into this. Is there an accepted way to go about this?
You can use table-valued parameters, which place no limit on the number of GUIDs.
In the code you create a single SqlParameter that carries all the IDs you need.
First you need to create the parameter's table type in SQL Server:
CREATE TYPE IdTableType AS TABLE
(
Id uniqueidentifier
);
Then in the code
private static Dictionary<Guid, string> foo(IEnumerable<Guid> guids, SqlConnection conn)
{
    using (SqlCommand command = new SqlCommand(null, conn))
    {
        // Use the parameter as a normal table in the query.
        // The id is selected as well so that each document can be keyed by it.
        command.CommandText =
            "select d.id, d.document from Documents d inner join @AllIds a ON d.id = a.Id";
        // A DataTable is used as the value of the table-valued parameter
        DataTable allIds = new DataTable();
        allIds.Columns.Add("Id", typeof(Guid)); // Column name should be the same as in the created type
        foreach (var id in guids)
            allIds.Rows.Add(id);
        SqlParameter idParam = new SqlParameter
        {
            ParameterName = "@AllIds",
            SqlDbType = SqlDbType.Structured, // Important for table-valued parameters
            TypeName = "IdTableType", // Important! Name of the type must be provided
            Value = allIds
        };
        command.Parameters.Add(idParam);
        var documents = new Dictionary<Guid, string>();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                documents[reader.GetGuid(0)] = reader[1].ToString();
            }
        }
        return documents;
    }
}
You don't need to prepare the command any more. Besides, after the first execution, subsequent queries will reuse the same compiled query plan, because the query text remains the same.
You can split the IDs into batches and pass each batch to the query as a list parameter. With Dapper this looks a bit like:
connection.Query("select document from Documents where id in @ids", new { ids = guids });
Beware, though: SQL Server allows at most 2,100 parameters per command, so you will need to batch your reads.
By the way, I'd highly recommend looking at Dapper or another micro-ORM for this type of data access.
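Batching the reads could be sketched as below. The helper name InBatches is made up for illustration and is not part of Dapper; it simply splits any sequence into lists of at most batchSize items.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Splits a sequence into consecutive lists of at most batchSize items.
    // The last batch may be smaller than batchSize.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch;
    }
}
```

Each batch would then be passed as the `ids` value of the query above, for example `foreach (var batch in Batching.InBatches(guids, 2000)) { ... }`, with the results of every call merged into one dictionary. The batch size of 2000 is only an example chosen to stay below the parameter limit.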

Transferring data between two Access databases

I am writing a simple reporting tool that will need to move data from a table in one Access database to a table in another Access database (the table structure is identical). However, I am new to C# and am finding it hard to come up with a reliable solution.
Any pointers would be greatly appreciated.
Access SQL supports using an IN clause to specify that a table resides in a different database. The following C# code SELECTs rows from a table named [YourTable] in Database1.accdb and INSERTs them into an existing table named [YourTable] (with the identical structure) in Database2.accdb:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.OleDb;
namespace oleDbTest
{
class Program
{
static void Main(string[] args)
{
string myConnectionString;
myConnectionString =
@"Provider=Microsoft.ACE.OLEDB.12.0;" +
@"Data Source=C:\Users\Public\Database1.accdb;";
using (var con = new OleDbConnection())
{
con.ConnectionString = myConnectionString;
con.Open();
using (var cmd = new OleDbCommand())
{
cmd.Connection = con;
cmd.CommandType = System.Data.CommandType.Text;
cmd.CommandText =
@"INSERT INTO YourTable IN 'C:\Users\Public\Database2.accdb' " +
@"SELECT * FROM YourTable WHERE ID < 103";
cmd.ExecuteNonQuery();
}
con.Close();
}
Console.WriteLine("Done.");
}
}
}
Many ways.
0) If it's only once, copy and paste the table.
1) If you want to do this inside Access, the easiest way is to create a linked table in the new database, and then a make table query in the new database.
2) You can reference the second table directly.
SELECT *
FROM TableInDbX IN 'C:\SomeFolder\DB X';
3) In a macro, you can use the TransferDatabase method of the DoCmd object to link relevant tables and then run suitable append and update queries to synchronize.
4) VBA
http://www.techonthenet.com/access/questions/new_mdb.php
Given column names Col1, Col2, and Col3:
private static void Migrate(string dbConn1, string dbConn2) {
// DataTable to store your info into
var table = new DataTable();
// Modify your SELECT command as needed
string sqlSelect = "SELECT Col1, Col2, Col3 FROM aTableInOneAccessDatabase ";
// Notice this uses the connection string to DB1
using (var cmd = new OleDbCommand(sqlSelect, new OleDbConnection(dbConn1))) {
cmd.Connection.Open();
table.Load(cmd.ExecuteReader());
cmd.Connection.Close();
}
// Modify your INSERT command as needed
string sqlInsert = "INSERT INTO aTableInAnotherAccessDatabase " +
"(Col1, Col2, Col3) VALUES (@Col1, @Col2, @Col3) ";
// Notice this uses the connection string to DB2
using (var cmd = new OleDbCommand(sqlInsert, new OleDbConnection(dbConn2))) {
// Modify these database parameters to match the signatures in the new table
// (OleDbCommand parameters take OleDbType values, not DbType)
cmd.Parameters.Add("@Col1", OleDbType.Integer);
cmd.Parameters.Add("@Col2", OleDbType.VarWChar, 50);
cmd.Parameters.Add("@Col3", OleDbType.Date);
cmd.Connection.Open();
foreach (DataRow row in table.Rows) {
// Fill in each parameter with data from your table's row
cmd.Parameters["@Col1"].Value = row["Col1"];
cmd.Parameters["@Col2"].Value = row["Col2"];
cmd.Parameters["@Col3"].Value = row["Col3"];
// Insert that data
cmd.ExecuteNonQuery();
}
cmd.Connection.Close();
}
}
Now, I do not work with Access databases very often, so you may need to tweak something up there.
That should get you well on your way, though.
Worth noting:
If I remember correctly, Access does NOT pay attention to your OleDbParameter names! You could call them whatever you want, and in fact most people just use a question mark ? for the parameter fields.
So, you have to add and update these parameters in the same order that your statement calls them.
So, why did I name the parameters @Col1, @Col2, @Col3? Here, it is just to help you and me understand where each parameter is intended to map to. It is also a good practice to get into. If you ever migrate to a better database, hopefully it will pay attention to what the parameters are named.

C# Excel result comparison

I have never learned this aspect of programming, but is there a way to get each separate result of an Excel query (using OleDB) or the like?
The only way I can think of doing this is to use the INTO keyword in the SQL statement, but this does not work for me (SELECT attribute INTO variable FROM table).
An example would be to use the select statement to retrieve the ID of Clients, and then compare these ID's to clientID's in a client ListArray, and if they match, then the clientTotal orders should be compared.
Could someone provide some reading material and/or some example code for this problem?
Thank you.
This code fetches rows from a SQL stored procedure. It will probably work for you too with some modifications.
using (var Conn = new SqlConnection(ConnectString))
{
    Conn.Open();
    using (var cmd = new SqlCommand("THEPROCEDUREQUERY", Conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            // Find the ordinal of each column only once at the start
            // (replace "ColumnName" with your own column names)
            var Col1IdOrd = reader.GetOrdinal("ColumnName");
            var Col2IdOrd = reader.GetOrdinal("ColumnName");
            // loop through all the rows
            while (reader.Read())
            {
                // get data for each row
                var Col1 = reader.GetInt32(Col1IdOrd);
                var Col2 = reader.GetDouble(Col2IdOrd);
                // Do something with data from one row for both columns here
            }
        }
    }
} // disposing the connection also closes it, so no explicit Close() is needed
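The comparison step described in the question (match the queried IDs against an in-memory client list, then compare the totals) might be sketched like this. The Client class, its property names, and the tolerance parameter are assumptions made for illustration; the queried totals are assumed to have been collected into a dictionary while looping over the reader.

```csharp
using System;
using System.Collections.Generic;

public sealed class Client
{
    public int Id { get; set; }
    public double TotalOrders { get; set; }
}

public static class ClientComparer
{
    // Returns the IDs of clients that appear in the query results
    // but whose in-memory total differs from the queried total.
    // Clients that are missing from the query results are skipped.
    public static List<int> FindMismatches(
        IEnumerable<Client> clients,
        IDictionary<int, double> queriedTotals,
        double tolerance = 1e-9)
    {
        var mismatches = new List<int>();
        foreach (var client in clients)
        {
            if (queriedTotals.TryGetValue(client.Id, out double total)
                && Math.Abs(total - client.TotalOrders) > tolerance)
            {
                mismatches.Add(client.Id);
            }
        }
        return mismatches;
    }
}
```

Inside the read loop you would fill the dictionary with something like `queriedTotals[Col1] = Col2;` and call FindMismatches once the reader is exhausted.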

c# loop within a loop

I'm currently using DataSets to bring back results in a C# service which I now need to change to loop through the initial set of data and bring back a subset of data from this result.
So I need to loop through these results using an identifier and then show another set of results nested below each of them. With my limited C# knowledge, I can see no way of making this happen using DataSets.
EG> Loop through DB, for each result in DB loop through another table.
[WebMethod(BufferResponse=true,Description="Viewing Things")]
public DataSet MyFunctionIs (int IDtoQuery)
{
MySqlConnection dbConnection = new MySqlConnection("server=na;uid=na;pwd=na;database=na;");
MySqlDataAdapter objCommand = new MySqlDataAdapter("SELECT STATEMENT HERE;", dbConnection);
DataSet DS = new DataSet();
objCommand.Fill(DS,"MyFunctionIs");
}
But even using joins isn't going to suffice. I need to run a query for each row returned here and return a child set of data for the XML response.
Often you can solve this kind of problem by using a SQL query that joins several tables and returns only one set of rows. This gives you a general idea on how you can try to do it:
string ConnectionString = "server=myserver;uid=sa;pwd=secret;database=mydatabase";
using (var con = new SqlConnection(ConnectionString)) {
string CommandText = "SELECT p.firstname, p.lastname, o.orderdate " +
"FROM persons p LEFT JOIN orders o ON p.person_id = o.person_id";
using (var cmd = new SqlCommand(CommandText, con)) {
con.Open();
using (var reader = cmd.ExecuteReader()) {
while (reader.Read()) {
Console.WriteLine("{0} {1}, {2}", reader["firstname"], reader["lastname"], reader["orderdate"]);
}
}
}
}
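If you need the child rows nested under each parent (as for the XML response in the question), the flat rows returned by such a join can be regrouped in memory. The following is only a sketch: the OrderRow and PersonWithOrders classes are hypothetical, and it assumes the query is extended to also select p.person_id so rows can be grouped by it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One flat row as returned by the LEFT JOIN (OrderDate is null when a person has no orders).
public sealed class OrderRow
{
    public int PersonId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime? OrderDate { get; set; }
}

// One parent with its nested child values.
public sealed class PersonWithOrders
{
    public int PersonId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public List<DateTime> OrderDates { get; set; }
}

public static class Nesting
{
    // Groups the flat join rows by person and collects each person's order dates.
    public static List<PersonWithOrders> Nest(IEnumerable<OrderRow> rows)
    {
        return rows
            .GroupBy(r => r.PersonId)
            .Select(g => new PersonWithOrders
            {
                PersonId = g.Key,
                FirstName = g.First().FirstName,
                LastName = g.First().LastName,
                // Skip the null produced by the LEFT JOIN for persons without orders
                OrderDates = g.Where(r => r.OrderDate.HasValue)
                              .Select(r => r.OrderDate.Value)
                              .ToList()
            })
            .ToList();
    }
}
```

This keeps it to a single round trip to the database while still producing one parent object per person with its children attached.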
