How to optimize this SQL Query (from C#)

I am a newbie to db programming and need help optimizing this query:
Given tables A, B, and C, where I am interested in one column from each, how do I write a query such that I can get one column from each table into 3 different arrays/lists in my C# code?
I am currently running three separate queries against the DB, but I want to accomplish the same thing in one query (to save 2 trips to the DB).

@patmortech Use UNION ALL instead of UNION if you don't care about duplicate values, or if you know you can only get unique values (because you are querying via primary or unique keys). UNION ALL performs much faster.
There is no notion of "arrays" in SQL. There are tables, rows, and columns; result sets return a SET of rows and columns. Can you provide an example of what you are looking for? (DDL of the source tables and sample data would be helpful.)
As others have said, you can send multiple queries to the server within a single execute statement and get multiple result sets back via ADO.NET. You use the DataReader's .NextResult() method to advance to the next result set.
See here for more information: MSDN, section "Retrieving Multiple Result Sets using NextResult".
Here is some sample code:
static void RetrieveMultipleResults(SqlConnection connection)
{
    using (connection)
    {
        SqlCommand command = new SqlCommand(
            "SELECT CategoryID, CategoryName FROM dbo.Categories;" +
            "SELECT EmployeeID, LastName FROM dbo.Employees",
            connection);
        connection.Open();
        SqlDataReader reader = command.ExecuteReader();
        while (reader.HasRows)
        {
            Console.WriteLine("\t{0}\t{1}", reader.GetName(0),
                reader.GetName(1));
            while (reader.Read())
            {
                Console.WriteLine("\t{0}\t{1}", reader.GetInt32(0),
                    reader.GetString(1));
            }
            reader.NextResult();
        }
    }
}
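Applied to the original question (one column from each of three tables into three lists), a minimal sketch might look like the following; the table names, column names, and element types are placeholders for whatever your schema actually holds:

// Assumes: using System.Collections.Generic; using System.Data.SqlClient;
static void LoadThreeLists(SqlConnection connection)
{
    var listA = new List<int>();
    var listB = new List<string>();
    var listC = new List<string>();
    using (connection)
    {
        var command = new SqlCommand(
            "SELECT ColA FROM dbo.TableA;" +
            "SELECT ColB FROM dbo.TableB;" +
            "SELECT ColC FROM dbo.TableC;", connection);
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read()) listA.Add(reader.GetInt32(0));
            reader.NextResult();   // advance to TableB's result set
            while (reader.Read()) listB.Add(reader.GetString(0));
            reader.NextResult();   // advance to TableC's result set
            while (reader.Read()) listC.Add(reader.GetString(0));
        }
    }
}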

With a stored procedure you can return more than one result set from the database and have a dataset filled with more than one table; you can then access those tables and fill your arrays/lists.
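For instance, a minimal sketch of that approach, assuming a hypothetical procedure dbo.GetThreeTables whose body is just three SELECT statements (connectionString and the column name are placeholders):

// Assumes: using System.Collections.Generic; using System.Data; using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.GetThreeTables", conn) { CommandType = CommandType.StoredProcedure })
using (var da = new SqlDataAdapter(cmd))
{
    var ds = new DataSet();
    da.Fill(ds);   // ds.Tables[0], [1], [2] correspond to the three SELECTs, in order

    var listA = new List<int>();
    foreach (DataRow row in ds.Tables[0].Rows)
        listA.Add((int)row["ColA"]);   // repeat for ds.Tables[1] and ds.Tables[2]
}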

You can put 3 different SELECT statements into 1 call and execute it; you will get 3 result sets back. How you consume those results depends on what data technology you are using: LINQ? DataSets? a DataAdapter? a DataReader? If you can provide that information (perhaps even sample code), I can tell you exactly how to get what you need.

Not sure if this is exactly what you had in mind, but you could do something like this (as long as all three columns are the same data type):
select field1, 'TableA' as TableName from tableA
UNION ALL
select field2, 'TableB' from tableB
UNION ALL
select field3, 'TableC' from tableC
ORDER BY TableName
This gives you one big result set with all the records (UNION ALL skips the duplicate-elimination pass a plain UNION would do, and the ORDER BY groups the rows by source table). You could then use a data reader to read the results, keep track of the previous record's TableName value, and whenever it changes start putting the column values into another array.
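A sketch of the reading side, assuming command is a SqlCommand wrapping the query above (value in column 0, discriminator in column 1):

// Assumes: using System.Collections.Generic; using System.Data.SqlClient;
var results = new Dictionary<string, List<object>>();
using (var reader = command.ExecuteReader())
{
    while (reader.Read())
    {
        string table = reader.GetString(1);   // 'TableA', 'TableB' or 'TableC'
        if (!results.TryGetValue(table, out var list))
            results[table] = list = new List<object>();
        list.Add(reader.GetValue(0));         // the selected column's value
    }
}

With a dictionary like this you don't strictly need the ORDER BY or the previous-value bookkeeping, since rows are bucketed by table name as they arrive.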

Take the three trips. The answers so far suggest how far you would need to advance from "new to db programming" to do what you want. Master the simplest ways first.
If they are three huge results, then I suspect you're trying to do something in C# that would better be done in SQL on the database without bringing back the data. Without more detail, this sounds suspiciously like an antipattern.

Related

Create Strongly Typed Dataset for a stored procedure that returns more than one table

I have a stored procedure that performs 3 selects. How do I create a strongly typed dataset that can access all 3 tables and read the data? By default, Visual Studio generates a dataset with the 1st table only.
I have already tried using the Visual Studio typed dataset designer, dragging and dropping the stored procedure.
Stored procedure is something like this:
Create Procedure GetData
As
Begin
    Select ColA, ColB, ColC from TableA
    Select ColD, ColE, ColF from TableB
    Select ColG, ColH, ColI from TableC
End
If you're desperate to do this, I don't think you'll succeed with a pure strongly-typed, designer-generated solution; tableadapters are intended to mediate between a local datatable representation of the db data (your strongly typed datatable) and a database query that returns rows. The "table" in tableadapter relates to the datatable, not the database table.
A single tableadapter is not intended to function as a mediator between 3 local datatables and a remote procedure that delivers the output of 3 database queries. Primarily it cannot do this because there is nothing the client side code can use to identify, for your sql of...
Select ColA, ColB, ColC from TableA
Select ColD, ColE, ColF from TableB
Select ColG, ColH, ColI from TableC
...that the results from select * from TableA are supposed to go in TableADataTable in your dataset etc. The fact that the data came out of tableA is lost in the transmission over the wire because it is completely irrelevant and might not even be true.
When a tableadapter is correctly used with a single select, it knows implicitly the datatable the results should be put in. TableADataTable has a corresponding TableATableAdapter, and TableATableAdapter selects data from somewhere in the db and stashes it in the TableADataTable - there is no other table in the dataset that TableATableAdapter is designed to manipulate, so it doesn't need any hints about where the block of data goes after it runs the query. TableATableAdapter can even be loaded with a query that doesn't return any data from the database TableA at all; so long as the query it runs produces a set of columns of the right number and type, that data will go in TableADataTable, because that's what TableATableAdapter is hardcoded to do. It serves no other datatable, and has no interest in any other datatable.
Because your stored procedure has no way of indicating to the tableadapter which of the result sets should be stored in which table, the solution you're envisaging cannot work.
The simple rule is: "one dog, one bone" - "one db query result set, one tableadapter, one strongly typed datatable".
As such, I firmly recommend you use these things as they were intended:
Create 3 tableadapters and corresponding datatables for TableA, TableB and TableC
Fill each independently in your code:
var ds = new StronglyTypedDataSet();
var ata = new TableATableAdapter();
ata.Fill(ds.TableA);
var bta = new TableBTableAdapter();
bta.Fill(ds.TableB);
var cta = new TableCTableAdapter();
cta.Fill(ds.TableC);
"We would like to avoid multiple db calls for a single page" doesn't really make sense - it sounds like a solution to a problem you've imagined rather than on that will really happen. There's so little performance advantage to be gained by trying to execute these things in one hit rather than 3. You might disagree, but test it - don't just go off a hunch. Connections are pooled, statements are cached and prepared, 3 statements could be executed concurrently if you really think it will help vastly.
At the end of the day, if you have 9 megabytes of data to pull out of a database, the difference between doing it as 3 pulls of 3 MB each versus 1 pull of 9 MB is going to be minuscule; you aren't waiting 30 seconds for a connection to open, reading 3 MB in a second, waiting another 30 s for it to close, and then doing it all over again twice more (total time 183 seconds) with the bottleneck all attributable to connection management. Even if you did have a super-high-latency connection that takes 30 s to transmit the SELECT and another 30 s to start reading data, you could launch your 3 requests simultaneously, and by definition it takes the same time to send 3 SELECTs as it does to invoke 1 procedure call (both take 61 seconds).
If you cannot agree that the reason for trying to do it all in one is spurious then you may wish to keep trying to do it via your chosen method, in which case I think you're going to have to choose:
Use a standard dataadapter and dataset
and then move the data into the typed set to work with it
SqlConnection con = new SqlConnection("YourConnectionString");
DataSet ds = new DataSet();
SqlCommand cmd = new SqlCommand("storedprocedure", con);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.AddWithValue("@p1", whatever); // if you have parameters
SqlDataAdapter da = new SqlDataAdapter(cmd);
da.Fill(ds);
con.Close();
Now you have a dataset with 3 tables in it; it's your problem to figure out which table is which. Let's assume ds.Tables[0] is for TableA:
foreach (DataRow ro in ds.Tables[0].Rows)
    typedDs.TableA.LoadDataRow(ro.ItemArray, true); // the typed AddTableARow overloads take typed values, not an object[]
Repeat for the B and C tables.
Convert your 3 queries to 1
Your example seems to indicate all 3 of your tables have the same number of columns. If the columns are the same type too, then you can union the queries, run them through a single tableadapter, and fill the results into a single strongly typed datatable. You might then have some more work to do splitting them out into separate datatables. Perhaps modify the query to return an extra column so you can track where the data came from:
Select 'tableA' as wherefrom, ColA, ColB, ColC from TableA
UNION ALL
Select 'tableB' as wherefrom, ColD, ColE, ColF from TableB
UNION ALL
Select 'tableC' as wherefrom, ColG, ColH, ColI from TableC
It's a mess, a hassle, a hack
Why is this so hard? Well... to quote another old saying: if it's hard, you're doing it wrong. TableAdapters were designed to work one way and you're trying to use them another way. Take a step back and examine the reasons behind why you're doing it - that's where the real problem lies.

C# ADO.NET Loading SQL Script with Multiple Statements and Comments

I am building an application in which I will be producing some reports based on the results of some SQL queries executed against a number of different databases and servers. Since I am unable to create stored procedures on each server, I have my SQL scripts saved locally; I load them into my C# application and execute them against each server using ADO.NET. All of the SQL scripts are selects that return tables; however, some of them are more complicated than others and involve multiple selects into temp tables that get joined on, like the super basic example below.
My question is, using ADO.NET, is it possible to assign a string of multiple SQL queries that ultimately only returns a single data table to a SqlCommand object - e.g. the two SELECT statements below comprising my complete script? Or would I have to create a transaction and execute each individual query separately as its own command?
-- First Select
SELECT *
INTO #temp
FROM Table1;
--Second Select
SELECT *
FROM Table1
JOIN #temp
ON Table1.Id = #temp.Id;
Additionally, some of my scripts have comments embedded in them, like the rudimentary example above - would these need to be removed, or are they effectively ignored within the string? This seems to work with single queries; in other words, the "--This is a comment" is effectively ignored.
private void button1_Click(object sender, EventArgs e)
{
    string ConnectionString = "Server=server1;Database=test1;Trusted_Connection=True";
    using (SqlConnection conn = new SqlConnection(ConnectionString))
    {
        SqlCommand cmd = new SqlCommand("--This is a comment \n SELECT TOP 10 * FROM dbo.Table1;", conn);
        DataTable dt = new DataTable();
        SqlDataAdapter sqlAdapt = new SqlDataAdapter(cmd);
        sqlAdapt.Fill(dt);
        MessageBox.Show(dt.Rows.Count.ToString());
    }
}
Yes, that is absolutely fine; comments are ignored, so it should work. The only thing to watch is the scoping of temporary tables: if you are used to working with stored procedures, the scope is temporary (they are removed when the stored procedure ends); with direct commands it isn't - they are connection-specific, but survive between multiple operations. If that is a problem, take a look at "table variables".
Note: technically this is up to the backend provider; assuming you are using a standard database engine, you'll be OK. If you are using something exotic, then it might be a genuine question. For example, it might not work on "Bob's homemade OneNote ADO.NET provider".
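As a concrete sketch, the asker's two-statement script (comments included) can be sent as a single command, reusing the ConnectionString from the question; only the final SELECT produces a result set, so one Fill captures it:

// Assumes: using System.Data; using System.Data.SqlClient;
const string script = @"
-- First Select
SELECT * INTO #temp FROM Table1;
--Second Select
SELECT * FROM Table1 JOIN #temp ON Table1.Id = #temp.Id;";

using (var conn = new SqlConnection(ConnectionString))
using (var da = new SqlDataAdapter(script, conn))
{
    var dt = new DataTable();
    da.Fill(dt);   // comments are ignored; #temp lives until the connection closes
}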
Yes, you can definitely do it.
You can build the batch from different kinds of collections or with a StringBuilder, or simply assign the whole script to a string variable.
Whether intermediate results go into a temp table or a CTE is entirely up to you; either way, you can load the final result into a DataTable.
And if you want a batch of inserts, updates, or deletes to succeed or fail as a unit, you can wrap it in a transaction without any issue.
I don't use ADO.NET, I use Entity Framework, but I think this is more a SQL question than an ADO.NET question; forgive me if I'm wrong. Provided you are selecting from Table1 in both queries, I think you should use this query instead:
select *
from Table1 tbl1
join Table1 tbl2
on tbl1.id = tbl2.id
Actually, I really don't ever see a reason you would have to move things into temp tables, with options like Common Table Expressions available to you.
Look up CTEs if you don't already know about them:
https://www.simple-talk.com/sql/t-sql-programming/sql-server-cte-basics/
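As a sketch, the asker's temp-table script rewritten with a CTE is a single SELECT as far as ADO.NET is concerned (reusing the question's ConnectionString; adjust names to your schema):

// Assumes: using System.Data.SqlClient;
const string cteSql = @"
WITH t AS (SELECT * FROM Table1)
SELECT tbl1.*
FROM Table1 tbl1
JOIN t ON tbl1.Id = t.Id;";

using (var conn = new SqlConnection(ConnectionString))
using (var cmd = new SqlCommand(cteSql, conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // consume rows here
        }
    }
}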

Select count from datatable against dynamic query

I have a DataSet in memory which has two DataTables in it:
Cont
Ins
Secondly, I have a separate DataTable (in memory) whose rows are in the following format. Suppose it has two rows; in reality it may have millions:

ID  TableName  ColumnName  Operator  Value
1   Cont       ID          =         1
2   Ins        app_ID      =         558877
As-is: currently I take the information from those rows, construct a query, and execute it on the database server, like:
Select count(*) from Cont.Id= 1 and Ins.app_id = 558877;
I implement my business logic on the basis of the above query's result.
Objective: in order to increase performance, I want to execute the query in the application server instead, since I now have the complete tables in memory. How can I do this?
Note: table names may vary according to the data.
Do you really want to keep tables with millions of rows in memory?
In terms of counting an in-memory table, how is it kept? If it's a DataTable, as per your tag, you can use the DataTable.Rows.Count property.
If your table names aren't known in advance, you can loop over the tables in DataSet.Tables and call Rows.Count on each of them.
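If each criteria row targets a single table, a sketch of evaluating it in memory with the built-in DataTable.Select and DataTable.Compute APIs (ds is assumed to be the DataSet from the question; table and column names come from the question):

DataTable cont = ds.Tables["Cont"];
DataTable ins = ds.Tables["Ins"];

// Build a filter expression from a criteria row, e.g. ColumnName + Operator + Value,
// and count the matching rows without touching the database:
int contMatches = cont.Select("ID = 1").Length;
int insMatches = ins.Select("app_ID = 558877").Length;

// Alternatively, count without materializing the matching rows:
object insCount = ins.Compute("COUNT(app_ID)", "app_ID = 558877");

Note this counts each table independently; whatever cross-table semantics the original count(*) query intended (e.g. a join on some key) would still need to be defined.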
You have to create an index on the app_ID field. After that, your count will run with good performance.
Your query references the undefined alias "Ins". You must specify the other table in a FROM/JOIN clause and change "count(*)" to "count(Cont.*)".

Possible to insert with a Table Parameter, and also retrieve identity values?

I'm trying to insert records using a high performance table parameter method ( http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/ ), and I'm curious if it's possible to retrieve back the identity values for each record I insert.
At the moment, the answer appears to be no - I insert the data, then retrieve back the identity values, and they don't match. Specifically, they don't match about 75% of the time, and they don't match in unpredictable ways. Here's some code that replicates this issue:
// Create a datatable with 100k rows
DataTable dt = new DataTable();
dt.Columns.Add(new DataColumn("item_id", typeof(int)));
dt.Columns.Add(new DataColumn("comment", typeof(string)));
for (int i = 0; i < 100000; i++) {
    dt.Rows.Add(new object[] { 0, i.ToString() });
}

// Insert these records and retrieve back the identity
using (SqlConnection conn = new SqlConnection("Data Source=localhost;Initial Catalog=testdb;Integrated Security=True")) {
    conn.Open();
    using (SqlCommand cmd = new SqlCommand("proc_bulk_insert_test", conn)) {
        cmd.CommandType = CommandType.StoredProcedure;

        // Adding a "structured" parameter allows you to insert tons of data with low overhead
        SqlParameter param = new SqlParameter("@mytable", SqlDbType.Structured);
        param.Value = dt;
        cmd.Parameters.Add(param);
        SqlDataReader dr = cmd.ExecuteReader();

        // Set all the records' identity values
        int i = 0;
        while (dr.Read()) {
            dt.Rows[i].ItemArray = new object[] { dr.GetInt32(0), dt.Rows[i].ItemArray[1] };
            i++;
        }
        dr.Close();
    }

    // Do all the records' ID numbers match what I received back from the database?
    using (SqlCommand cmd = new SqlCommand("SELECT * FROM bulk_insert_test WHERE item_id >= @base_identity ORDER BY item_id ASC", conn)) {
        cmd.Parameters.AddWithValue("@base_identity", (int)dt.Rows[0].ItemArray[0]);
        SqlDataReader dr = cmd.ExecuteReader();
        DataTable dtresult = new DataTable();
        dtresult.Load(dr);
    }
}
The database is defined using this SQL Server script:
CREATE TABLE bulk_insert_test (
    item_id int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    comment varchar(20)
)
GO
CREATE TYPE bulk_insert_table_type AS TABLE ( item_id int, comment varchar(20) )
GO
CREATE PROCEDURE proc_bulk_insert_test
    @mytable bulk_insert_table_type READONLY
AS
    DECLARE @TableOfIdentities TABLE (IdentValue INT)

    INSERT INTO bulk_insert_test (comment)
    OUTPUT Inserted.item_id INTO @TableOfIdentities(IdentValue)
    SELECT comment FROM @mytable

    SELECT * FROM @TableOfIdentities
Here's the problem: the values returned from proc_bulk_insert_test are not in the same order as the original records were inserted. Therefore, I can't programmatically assign each record the item_id value I received back from the OUTPUT statement.
It seems like the only valid solution is to SELECT back the entire list of records I just inserted, but frankly I'd prefer any solution that would reduce the amount of data piped across my SQL Server's network card. Does anyone have better solutions for large inserts while still retrieving identity values?
EDIT: Let me try clarifying the question a bit more. The problem is that I would like my C# program to learn what identity values SQL Server assigned to the data I just inserted. The order isn't essential, but I would like to be able to take an arbitrary set of records within C#, insert them using the fast table parameter method, and then assign their auto-generated ID numbers in C# without having to requery the entire table back into memory.
Given that this is an artificial test set, I attempted to condense it into as small of a readable bit of code as possible. Let me describe what methods I have used to resolve this issue:
In my original code, in the application this example came from, I would insert about 15 million rows using 15 million individual insert statements, retrieving back the identity value after each insert. This worked but was slow.
I revised the code to use high-performance table parameters for insertion. I would then dispose of all the objects in C# and read the entire records back from the database. However, the original records had dozens of columns with lots of varchar and decimal values, so this method was very network-traffic intensive, although it was fast and it worked.
I now began research to figure out whether it was possible to use the table parameter insert, while asking SQL Server to just report back the identity values. I tried scope_identity() and OUTPUT but haven't been successful so far on either.
Basically, this problem would be solved if SQL Server would always insert the records in exactly the order I provided them. Is it possible to make SQL server insert records in exactly the order they are provided in a table value parameter insert?
EDIT2: This approach seems very similar to what Cade Roux cites below:
http://www.sqlteam.com/article/using-the-output-clause-to-capture-identity-values-on-multi-row-inserts
However, in the article, the author uses a magic unique value, "ProductNumber", to connect the inserted information from the "output" value to the original table value parameter. I'm trying to figure out how to do this if my table doesn't have a magic unique value.
Your TVP is an unordered set, just like a regular table. It only has order when you specify as such. Not only do you not have any way to indicate actual order here, you're also just doing a SELECT * at the end with no ORDER BY. What order do you expect here? You've told SQL Server, effectively, that you don't care. That said, I implemented your code and had no problems getting the rows back in the right order. I modified the procedure slightly so that you can actually tell which identity value belongs to which comment:
DECLARE @TableOfIdentities TABLE (IdentValue INT, comment varchar(20))

INSERT INTO bulk_insert_test (comment)
OUTPUT Inserted.item_id, Inserted.comment
    INTO @TableOfIdentities(IdentValue, comment)
SELECT comment FROM @mytable

SELECT * FROM @TableOfIdentities
Then I called it using this code (we don't need all the C# for this):
DECLARE @t bulk_insert_table_type;
INSERT @t VALUES (5,'foo'),(2,'bar'),(3,'zzz');
SELECT * FROM @t;
EXEC dbo.proc_bulk_insert_test @t;
Results:
1 foo
2 bar
3 zzz
If you want to make sure the output is in the order of identity assignment (which isn't necessarily the same "order" that your unordered TVP has), you can add ORDER BY item_id to the last select in your procedure.
If you want to insert into the destination table so that your identity values are in an order that is important to you, then you have a couple of options:
add a column to your TVP and insert the order into that column, then use a cursor to iterate over the rows in that order, and insert one at a time. Still more efficient than calling the entire procedure for each row, IMHO.
add a column to your TVP that indicates order, and use an ORDER BY on the insert. This isn't guaranteed, but is relatively reliable, particularly if you eliminate parallelism issues using MAXDOP 1.
In any case, you seem to be placing a lot of relevance on ORDER. What does your order actually mean? If you want to place some meaning on order, you shouldn't be doing so using an IDENTITY column.
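Going back to the second option above, a sketch with hypothetical names: the table type gains a sort_order column, the client fills it, and the procedure orders the insert by it (with the caveat already given that INSERT ... ORDER BY influences but does not strictly guarantee identity order):

// Client side: carry an explicit ordering column in the TVP.
// (The table type and procedure must be redefined to include sort_order.)
DataTable dt = new DataTable();
dt.Columns.Add("item_id", typeof(int));
dt.Columns.Add("comment", typeof(string));
dt.Columns.Add("sort_order", typeof(int));
for (int i = 0; i < 100000; i++)
    dt.Rows.Add(0, i.ToString(), i);

// Server side, the modified insert inside the procedure body:
const string insertSql = @"
INSERT INTO bulk_insert_test (comment)
OUTPUT Inserted.item_id, Inserted.comment
    INTO @TableOfIdentities (IdentValue, comment)
SELECT comment FROM @mytable ORDER BY sort_order;";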
You specify no ORDER BY on this: SELECT * FROM @TableOfIdentities, so there's no guarantee of order. If you want them in the same order they were sent, do an INNER JOIN from that back to the data that was inserted, with an ORDER BY that matches the order the rows were sent in.

How to read the result of SELECT * from joined tables with duplicate column names in .NET

I am a PHP/MySQL developer slowly venturing into the realm of C#/SQL Server, and I am having a problem in C# when it comes to reading the result of a SQL Server query that joins two tables.
Given the two tables:
TableA:
int:id
VARCHAR(50):name
int:b_id
TableB:
int:id
VARCHAR(50):name
And given the query
SELECT * FROM TableA,TableB WHERE TableA.b_id = TableB.id;
Now in C# I normally read query data in the following fashion:
SqlDataReader data_reader = sql_command.ExecuteReader();
data_reader["Field"];
Except in this case I need to differentiate between TableA's name column and TableB's name column.
In PHP I would simply ask for the field "TableA.name" or "TableB.name" accordingly but when I try something like
data_reader["TableB.name"];
in C#, my code errors out.
How can I fix this? And how can I read a query over multiple tables in C#?
The result set only sees the returned data/column names, not the underlying table. Change your query to something like
SELECT TableA.Name as Name_TA, TableB.Name as Name_TB from ...
Then you can refer to the fields like this:
data_reader["Name_TA"];
To those posting that it is wrong to use "SELECT *": I strongly disagree with you. There are many real-world cases where a SELECT * is necessary. Your absolute statements about its "wrong" use may be leading someone astray from what is a legitimate solution.
The problem here does not lie with the use of SELECT *, but with a constraint in ADO.NET.
As the OP points out, in PHP you can index a data row via the "TABLE.COLUMN" syntax, which is also how raw SQL handles column name conflicts:
SELECT table1.ID, table2.ID FROM table1, table2;
Why DataReader is not implemented this way I do not know...
That said, a workable solution could build your SQL statement dynamically by:
querying the schema of the tables you're selecting from
build your SELECT clause by iterating through the column names in the schema
In this way you could build a query like the following without having to know what columns currently exist in the schema for the tables you're selecting from
SELECT TableA.Name as Name_TA, TableB.Name as Name_TB from ...
You could try reading the values by index (a number) rather than by key.
name = data_reader[4];
You will have to experiment to see how the numbers correspond.
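Rather than experimenting, you can have the reader tell you the mapping with the built-in GetName and GetOrdinal methods:

// Print each column's ordinal and name once to see how they line up.
for (int i = 0; i < data_reader.FieldCount; i++)
    Console.WriteLine("{0}: {1}", i, data_reader.GetName(i));

// GetOrdinal returns the ordinal of the first column with the given name.
int firstNameOrdinal = data_reader.GetOrdinal("name");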
Welcome to the real world. In the real world, we don't use "SELECT *". Specify which columns you want, from which tables, and with which alias, if required.
Although it is better to use a column list to remove duplicate columns, if for any reason you want SELECT *, then just use
rdr["duplicate_column_name"]
This returns the value of the first column with that name. Since the inner join gives the two join-key columns identical values, this accomplishes the task for those columns (note it will not help with columns like name that merely share a name but hold different data).
Ideally, you should never have duplicate column names across a database schema. So, if you can, rename your schema to avoid conflicting names.
That rule exists for this very situation. Once you've done your join, it is just a new recordset, and generally the table names don't go with it.
