If I wish to add some information to my SQL Server database, must I do it through a DataSet and a DataAdapter?
The idea is that if my database has 1-2 million rows, won't my memory be occupied unnecessarily by those 1-2 million rows in the DataSet when I only want to add one row? Is there an alternative?
If you're only inserting a row, that needn't fetch anything into the DataSet/DataAdapter. You add the row, submit the changes, and the relevant INSERT command will be executed.
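For illustration, here's a minimal sketch of that flow, assuming a dbo.YourTable with a col1 column (table, column, and connection-string names are placeholders); nothing is fetched because the SELECT returns no rows:

// Sketch: insert-only DataAdapter usage - no existing rows are loaded.
// "dbo.YourTable", "col1" and connectionString are placeholders.
using (SqlConnection con = new SqlConnection(connectionString))
using (SqlDataAdapter da = new SqlDataAdapter(
           "SELECT col1 FROM dbo.YourTable WHERE 1 = 0", con)) // schema only, zero rows
{
    var table = new DataTable();
    da.Fill(table); // fills column definitions, no data

    var builder = new SqlCommandBuilder(da); // generates the INSERT command

    DataRow row = table.NewRow();
    row["col1"] = 42;
    table.Rows.Add(row);

    da.Update(table); // executes only the INSERT for the new row
}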
You could always create a plain old ADO.NET parameterized SqlCommand holding a simple SQL INSERT statement, supply the parameters, and insert the data that way (nothing needs to be loaded; it doesn't matter how many rows you already have - it will just work):
string insertStmt = "INSERT INTO dbo.YourTable(col1, col2, ...., colN) " +
                    "VALUES(@Value1, @Value2, ...., @ValueN)";

using (SqlConnection _con = new SqlConnection("-your-connection-string-here-"))
using (SqlCommand _cmdInsert = new SqlCommand(insertStmt, _con))
{
    // define the parameters for your query
    _cmdInsert.Parameters.Add("@Value1", SqlDbType.Int);
    .......

    // set the values
    _cmdInsert.Parameters["@Value1"].Value = 4711;
    .....

    _con.Open();
    int rowsInserted = _cmdInsert.ExecuteNonQuery();
    _con.Close();
}
If you have multiple rows to insert, you could loop over e.g. a list of objects, set the values on _cmdInsert for each object, and call _cmdInsert.ExecuteNonQuery() for each row, as in the sketch below.
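A minimal sketch of that loop, continuing the block above and assuming a hypothetical Item class and items list; the command and its parameters are reused, only the values change:

// Sketch: reuse the same prepared command for many inserts.
// "Item" and "items" are hypothetical placeholders.
_con.Open();
foreach (Item item in items)
{
    _cmdInsert.Parameters["@Value1"].Value = item.Value1;
    // ... set the remaining parameter values from the object ...
    _cmdInsert.ExecuteNonQuery();
}
_con.Close();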
Of course, if you use something like an ORM (NHibernate, Linq-to-SQL, Entity Framework), that work might get infinitely easier - just add new objects to your collection and save them. The ORM deals with all the nitty-gritty details, essentially generating and executing code much like the sample above.
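For instance, a hedged Linq-to-SQL sketch (YourDataContext and YourEntity stand in for designer-generated types):

// Sketch: inserting via Linq-to-SQL; the ORM builds and executes the INSERT.
// YourDataContext / YourEntity are hypothetical generated types.
using (var db = new YourDataContext(connectionString))
{
    var entity = new YourEntity { Col1 = 4711, Col2 = "hello" };
    db.YourEntities.InsertOnSubmit(entity); // queued, nothing sent yet
    db.SubmitChanges();                     // executes the INSERT
}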
I have a stored procedure that performs 3 selects. How do I create a strongly typed dataset that can access all 3 tables and read the data? Visual Studio by default generates a dataset with the 1st table only.
I have already tried using a Visual Studio typed dataset and dragging and dropping the stored procedure onto it.
Stored procedure is something like this:
Create Procedure GetData
As
Begin
Select ColA, ColB, ColC from TableA
Select ColD, ColE, ColF from TableB
Select ColG, ColH, ColI from TableC
End
If you're desperate to do this, I don't think you'll succeed with a pure strongly-typed designer-generated solution; tableadapters are intended to mediate between a local datatable representation of the db data (your strongly typed datatable) and a database query that returns rows. The "table" in tableadapter relates to the datatable, not the database table.
A single tableadapter is not intended to function as a mediator between 3 local datatables and a remote procedure that delivers the output of 3 database queries. Primarily it cannot do this because there is nothing the client side code can use to identify, for your sql of...
Select ColA, ColB, ColC from TableA
Select ColD, ColE, ColF from TableB
Select ColG, ColH, ColI from TableC
...that the results from the first SELECT (the one from TableA) are supposed to go in TableADataTable in your dataset, etc. The fact that the data came out of TableA is lost in transmission over the wire, because it is completely irrelevant and might not even be true.
When a tableadapter is correctly used with a single select, it knows implicitly the datatable the results should be put in. TableADataTable has a corresponding TableATableAdapter, and TableATableAdapter selects data from somewhere in the db and stashes it in the TableADataTable - there is no other table in the dataset that TableATableAdapter is designed to manipulate, so it doesn't need any hints about where the block of data goes after it runs the query. TableATableAdapter can even be loaded with a query that doesn't return any data from the database TableA at all; so long as the query it runs produces a set of columns of the right number and type, that data will go in TableADataTable, because that's what TableATableAdapter is hardcoded to do. It serves no other datatable, and has no interest in any other datatable.
Because your stored procedure has no way of indicating to the tableadapter which of the result sets should be stored in which table, the solution you're envisaging cannot work.
The simple rule is: "one dog, one bone" - "one db query result set, one tableadapter, one strongly typed datatable".
As such, I firmly recommend you use these things as they were intended:
Create 3 tableadapters and corresponding datatables for TableA, TableB and TableC
Fill each independently in your code:
var ds = new StronglyTypedDataSet();
var ata = new TableATableAdapter();
ata.Fill(ds.TableA);
var bta = new TableBTableAdapter();
bta.Fill(ds.TableB);
var cta = new TableCTableAdapter();
cta.Fill(ds.TableC);
"We would like to avoid multiple db calls for a single page" doesn't really make sense - it sounds like a solution to a problem you've imagined rather than on that will really happen. There's so little performance advantage to be gained by trying to execute these things in one hit rather than 3. You might disagree, but test it - don't just go off a hunch. Connections are pooled, statements are cached and prepared, 3 statements could be executed concurrently if you really think it will help vastly.
At the end of the day, if you have 9 megabytes of data to pull out of a database, the difference between doing it as 3 pulls of 3 MB each versus 1 pull of 9 MB is going to be minuscule; you aren't waiting 30 seconds for a connection to open, reading 3 MB in a second, waiting another 30 seconds to close it and having to do it all over again (total time 183 seconds), with the whole bottleneck attributable to connection management. Even if you did have a super-high-latency connection that takes 30s to transmit the SELECT and another 30s to start reading data, you could launch your 3 requests simultaneously, and by definition it will take the same time to send the 3 SELECTs as it does to invoke 1 procedure call (both take 61 seconds).
If you cannot agree that the reason for trying to do it all in one call is spurious, then you may wish to keep trying via your chosen method, in which case I think you're going to have to choose one of the following:
Use a standard dataadapter and dataset, and then move the data into the typed set to work with it:
SqlConnection con = new SqlConnection("YourConnectionString");
SqlCommand cmd = new SqlCommand("storedprocedure", con);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.AddWithValue("@p1", whatever); // if you have parameters
SqlDataAdapter da = new SqlDataAdapter(cmd);
DataSet ds = new DataSet();
da.Fill(ds); // Fill opens and closes the connection itself
Now you have a dataset with 3 tables in it; it's your problem to figure out which table is which. Let's assume ds.Tables[0] is for TableA:
foreach (DataRow ro in ds.Tables[0].Rows)
    typedDs.TableA.Rows.Add(ro.ItemArray);
Repeat for the B and C tables.
Convert your 3 queries to 1
Your example seems to indicate that all 3 of your tables have the same number of columns. If the columns are the same types too, then you can UNION the queries, load them through a single tableadapter and fill them into a single strongly typed datatable. You might then have some more work to do splitting them out into separate datatables. Perhaps modify the query to return an extra column so you can track where the data came from:
Select 'tableA' as wherefrom, ColA, ColB, ColC from TableA
UNION ALL
Select 'tableB' as wherefrom, ColD, ColE, ColF from TableB
UNION ALL
Select 'tableC' as wherefrom, ColG, ColH, ColI from TableC
It's a mess, a hassle, a hack.
Why is this so hard? Well, to quote another old saying: if it's hard, you're doing it wrong. TableAdapters were designed to work one way, and you're trying to use them another way. Take a step back and examine the reasons behind why you're doing it - that's where the real problem lies.
I am trying to copy a large datatable (several columns, more than 1000 rows) created dynamically in the application to a MySQL table using C# and WPF.
I have searched for various ways to do this but was unsuccessful in implementing them. I think the MySqlDataAdapter class is something I should use, but I can't make it work. This is what I tried to do:
MySqlConnection con = new MySqlConnection(MyConString);
MySqlCommand comm = new MySqlCommand("Select * From kinectdata", con);
MySqlDataAdapter test1 = new MySqlDataAdapter(comm);
test1.Update(skelData);
The speed of this transfer is also important, so I would prefer not to call an INSERT or UPDATE statement 1000 times.
Many thanks for your feedback!
M
You can build a single INSERT statement which inserts all 1000 rows.
INSERT INTO table VALUES (1,2,3), (4,5,6), (7,8,9);
1000 rows is not that much - in database terms it's nothing - and a single insert should be very fast; no more than 2 seconds.
Note that in your example you still have to declare the command type and set the query text appropriately. A sketch of building such a multi-row INSERT from your DataTable follows.
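For illustration, here's a minimal sketch of building that statement in C# from the DataTable in the question. It assumes the MySql.Data client, a kinectdata table with three columns, and the con and skelData variables from the question; the column names are placeholders, and a real implementation should also watch the max_allowed_packet limit:

// Sketch: build one parameterized multi-row INSERT from a DataTable.
// Assumes: using MySql.Data.MySqlClient; using System.Text;
// "c1, c2, c3" are placeholder column names for the kinectdata table.
var sql = new StringBuilder("INSERT INTO kinectdata (c1, c2, c3) VALUES ");
var cmd = new MySqlCommand { Connection = con };
for (int i = 0; i < skelData.Rows.Count; i++)
{
    if (i > 0) sql.Append(',');
    sql.AppendFormat("(@a{0},@b{0},@c{0})", i);
    cmd.Parameters.AddWithValue("@a" + i, skelData.Rows[i][0]);
    cmd.Parameters.AddWithValue("@b" + i, skelData.Rows[i][1]);
    cmd.Parameters.AddWithValue("@c" + i, skelData.Rows[i][2]);
}
cmd.CommandText = sql.ToString();
con.Open();
cmd.ExecuteNonQuery(); // one round trip for all 1000 rows
con.Close();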
I'm trying to insert records using the high-performance table-valued parameter method ( http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/ ), and I'm curious whether it's possible to retrieve back the identity values for each record I insert.
At the moment, the answer appears to be no - I insert the data, then retrieve back the identity values, and they don't match. Specifically, they don't match about 75% of the time, and they don't match in unpredictable ways. Here's some code that replicates this issue:
// Create a datatable with 100k rows
DataTable dt = new DataTable();
dt.Columns.Add(new DataColumn("item_id", typeof(int)));
dt.Columns.Add(new DataColumn("comment", typeof(string)));
for (int i = 0; i < 100000; i++) {
dt.Rows.Add(new object[] { 0, i.ToString() });
}
// Insert these records and retrieve back the identity
using (SqlConnection conn = new SqlConnection("Data Source=localhost;Initial Catalog=testdb;Integrated Security=True")) {
conn.Open();
using (SqlCommand cmd = new SqlCommand("proc_bulk_insert_test", conn)) {
cmd.CommandType = CommandType.StoredProcedure;
// Adding a "structured" parameter allows you to insert tons of data with low overhead
SqlParameter param = new SqlParameter("@mytable", SqlDbType.Structured);
param.Value = dt;
cmd.Parameters.Add(param);
SqlDataReader dr = cmd.ExecuteReader();
// Set all the records' identity values
int i = 0;
while (dr.Read()) {
dt.Rows[i].ItemArray = new object[] { dr.GetInt32(0), dt.Rows[i].ItemArray[1] };
i++;
}
dr.Close();
}
// Do all the records' ID numbers match what I received back from the database?
using (SqlCommand cmd = new SqlCommand("SELECT * FROM bulk_insert_test WHERE item_id >= @base_identity ORDER BY item_id ASC", conn)) {
cmd.Parameters.AddWithValue("@base_identity", (int)dt.Rows[0].ItemArray[0]);
SqlDataReader dr = cmd.ExecuteReader();
DataTable dtresult = new DataTable();
dtresult.Load(dr);
}
}
The database is defined using this SQL server script:
CREATE TABLE bulk_insert_test (
item_id int IDENTITY (1, 1) NOT NULL PRIMARY KEY,
comment varchar(20)
)
GO
CREATE TYPE bulk_insert_table_type AS TABLE ( item_id int, comment varchar(20) )
GO
CREATE PROCEDURE proc_bulk_insert_test
@mytable bulk_insert_table_type READONLY
AS
DECLARE @TableOfIdentities TABLE (IdentValue INT)
INSERT INTO bulk_insert_test (comment)
OUTPUT Inserted.item_id INTO @TableOfIdentities(IdentValue)
SELECT comment FROM @mytable
SELECT * FROM @TableOfIdentities
Here's the problem: the values returned from proc_bulk_insert_test are not in the same order as the original records were inserted. Therefore, I can't programmatically assign each record the item_id value I received back from the OUTPUT statement.
It seems like the only valid solution is to SELECT back the entire list of records I just inserted, but frankly I'd prefer any solution that would reduce the amount of data piped across my SQL Server's network card. Does anyone have better solutions for large inserts while still retrieving identity values?
EDIT: Let me try clarifying the question a bit more. The problem is that I would like my C# program to learn what identity values SQL Server assigned to the data I just inserted. The order isn't essential, but I would like to be able to take an arbitrary set of records within C#, insert them using the fast table-valued parameter method, and then assign their auto-generated ID numbers in C# without having to requery the entire table back into memory.
Given that this is an artificial test set, I attempted to condense it into as small a readable bit of code as possible. Let me describe the methods I have used to try to resolve this issue:
In my original code, in the application this example came from, I would insert about 15 million rows using 15 million individual insert statements, retrieving back the identity value after each insert. This worked but was slow.
I revised the code to use high-performance table-valued parameters for insertion. I would then dispose of all of the objects in C# and read the entire objects back from the database. However, the original records had dozens of columns with lots of varchar and decimal values, so this method was very network-traffic intensive, although it was fast and it worked.
I then began researching whether it is possible to use the table-valued parameter insert while asking SQL Server to simply report back the identity values. I tried SCOPE_IDENTITY() and OUTPUT but haven't been successful so far with either.
Basically, this problem would be solved if SQL Server would always insert the records in exactly the order I provided them. Is it possible to make SQL Server insert records in exactly the order they are provided in a table-valued parameter insert?
EDIT2: This approach seems very similar to what Cade Roux cites below:
http://www.sqlteam.com/article/using-the-output-clause-to-capture-identity-values-on-multi-row-inserts
However, in the article, the author uses a magic unique value, "ProductNumber", to connect the inserted information from the "output" value to the original table value parameter. I'm trying to figure out how to do this if my table doesn't have a magic unique value.
Your TVP is an unordered set, just like a regular table. It only has order when you specify as such. Not only do you not have any way to indicate actual order here, you're also just doing a SELECT * at the end with no ORDER BY. What order do you expect here? You've told SQL Server, effectively, that you don't care. That said, I implemented your code and had no problems getting the rows back in the right order. I modified the procedure slightly so that you can actually tell which identity value belongs to which comment:
DECLARE @TableOfIdentities TABLE (IdentValue INT, comment varchar(20))
INSERT INTO bulk_insert_test (comment)
OUTPUT Inserted.item_id, Inserted.comment
INTO @TableOfIdentities(IdentValue, comment)
SELECT comment FROM @mytable
SELECT * FROM @TableOfIdentities
Then I called it using this code (we don't need all the C# for this):
DECLARE @t bulk_insert_table_type;
INSERT @t VALUES(5,'foo'),(2,'bar'),(3,'zzz');
SELECT * FROM @t;
EXEC dbo.proc_bulk_insert_test @t;
Results:
1 foo
2 bar
3 zzz
If you want to make sure the output is in the order of identity assignment (which isn't necessarily the same "order" that your unordered TVP has), you can add ORDER BY item_id to the last select in your procedure.
If you want to insert into the destination table so that your identity values are in an order that is important to you, then you have a couple of options:
add a column to your TVP and insert the order into that column, then use a cursor to iterate over the rows in that order, and insert one at a time. Still more efficient than calling the entire procedure for each row, IMHO.
add a column to your TVP that indicates order, and use an ORDER BY on the insert. This isn't guaranteed, but is relatively reliable, particularly if you eliminate parallelism issues using MAXDOP 1.
In any case, you seem to be placing a lot of relevance on ORDER. What does your order actually mean? If you want to place some meaning on order, you shouldn't be doing so using an IDENTITY column.
You specify no ORDER BY on this: SELECT * FROM @TableOfIdentities, so there's no guarantee of order. If you want them in the same order they were sent, do an INNER JOIN from that table to the data that was inserted, with an ORDER BY that matches the order the rows were sent in.
I'm currently working on a C# application that grabs a bunch of data from a user-specified Access (.mdb) database and does a bunch of stuff with that data. A problem I've recently come across is that one of the databases is missing a column that has existed in all of the others.
How can I do a SELECT on such a database but fail gracefully (return null for that column, or something similar) when a column doesn't exist?
Currently, my code looks something like this:
OleDbConnection aConnection = new
    OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + FileName);
string sqlQuery = @"SELECT [Table1].[Index], [Table1].[Optional Info],
    [Table2].[Other Info], ....
    FROM [Table1] INNER JOIN [Table2] ON [Table1].[Index]=[Table2].[Index]
    ORDER BY [Table1].[Index]";
OleDbCommand aCommand = new OleDbCommand(sqlQuery, aConnection);
OleDbDataReader aReader = aCommand.ExecuteReader();
(proceed to read the data row by row, using fabulous magic numbers)
I think it's obvious that this is one of my first experiences with databases. I'm not overly concerned as long as it works, but it's stopped working for a database that does not contain the [Table1].[Optional Info] column. It's throwing an OleDbException: "No value given for one or more required parameters."
Any help would be appreciated.
I might be missing something but...
SELECT Table1.*, Table2.otherInfo
FROM ...
Should do the trick, and let the client process the result set, with an important caveat: there is no way to exclude a column from Table1 in the above.
(I am not aware of any method to "dynamically shape" -- with the viewpoint of the caller -- a SELECT except with a * in the column list as above.)
Happy coding.
The way to do that is to not use magic numbers, but to fetch the field names from the reader and use them - for example via GetName, as sketched below.
Alternatively, use a mapper like "dapper" that will do this for you.
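A minimal sketch of the field-name approach, assuming a query (such as a SELECT *) that succeeded; the HasColumn helper and column names are illustrative:

// Sketch: discover which columns the reader actually returned, then read by name.
static bool HasColumn(IDataReader reader, string name)
{
    for (int i = 0; i < reader.FieldCount; i++)
        if (string.Equals(reader.GetName(i), name, StringComparison.OrdinalIgnoreCase))
            return true; // the column exists in this result set
    return false;
}

// Usage while reading:
bool hasOptional = HasColumn(aReader, "Optional Info");
while (aReader.Read())
{
    int index = Convert.ToInt32(aReader["Index"]);
    string optionalInfo = hasOptional ? aReader["Optional Info"] as string : null;
    // ...
}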
There is no way to do this in a single query: you cannot run a query that includes columns that don't exist in the source tables. When the server tries to compile the query, it will simply fail.
If you absolutely need to support different schemas, you will need different queries for each of them.
To make things even more awesome, there is no documented way to check via SQL whether an Access table has a particular column. In SQL Server, you could query the system schema, like sys.objects or sys.columns. In Access, the MSysObjects table has the information you need, but its schema is liable to change on you without notice.
Probably the safest way to go about this is to do a single, up front check where you execute a command such as
SELECT * FROM Table1
then scan the resulting column names to see if your optional column exists; your C# code would then become:
string sqlQuery = string.Empty;
if (optionalColumnExists)
{
sqlQuery = "SELECT [Table1].[Index], [Table1].[Optional Info], -- etc."
}
else
{
sqlQuery = "SELECT [Table1].[Index], '' AS [Optional Info], -- etc."
}
There is a way to extract the table schema using OleDbDataReader.GetSchemaTable, and that can be used:
OleDbConnection aConnection = new
    OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + FileName);
OleDbCommand aCommand = new OleDbCommand("Table1", aConnection);
aCommand.CommandType = CommandType.TableDirect;
aConnection.Open();
OleDbDataReader aReader = aCommand.ExecuteReader(CommandBehavior.SchemaOnly);
DataTable schemaTable = aReader.GetSchemaTable();
aReader.Close();
aConnection.Close();
// GetSchemaTable returns one row per column; the name is in the "ColumnName" field
// (requires using System.Linq)
bool optionalInfoColumnExists = schemaTable.Rows.Cast<DataRow>()
    .Any(r => string.Equals((string)r["ColumnName"], "Optional Info",
                            StringComparison.OrdinalIgnoreCase));
Then, later in the code:
string sqlQuery = @"SELECT [Table1].[Index], {0}
    [Table2].[Other Info], ....
    FROM [Table1] INNER JOIN [Table2] ON [Table1].[Index]=[Table2].[Index]
    ORDER BY [Table1].[Index]";
if (optionalInfoColumnExists)
{
    sqlQuery = string.Format(sqlQuery, "[Table1].[Optional Info],");
}
else
{
    sqlQuery = string.Format(sqlQuery, "");
}
and apply similar logic while reading.
I don't know what kind of application this is, but optionalInfoColumnExists should be populated at application or session start and reused throughout the life of the app, i.e. don't execute GetSchemaTable every time a query is run on this table (assuming the mdb won't change while the app is active).
Either way, it seems the code will need "if/else" branches just to take care of the presence or absence of a column in a table.
I am a newbie to db programming and need help optimizing this query:
Given tables A, B and C, each with one column I am interested in, how do I write a query such that I can get one column from each table into 3 different arrays/lists in my C# code?
I am currently running three different queries to the DB but want to accomplish the same in one query (to save 2 trips to the DB).
@patmortech Use UNION ALL instead of UNION if you don't care about duplicate values, or if you can only get unique values (because you are querying via primary or unique keys). Much faster performance with UNION ALL.
There is no sense of "arrays" in SQL. There are tables, rows, and columns. Resultsets return a SET of rows and columns. Can you provide an example of what you are looking for? (DDL of source tables and sample data would be helpful.)
As others have said, you can send up multiple queries to the server within a single execute statement and return multiple resultsets via ADO.NET. You use the DataReader .NextResult() command to return the next resultset.
See here for more information: MSDN
Section: Retrieving Multiple Result Sets using NextResult
Here is some sample code:
static void RetrieveMultipleResults(SqlConnection connection)
{
using (connection)
{
SqlCommand command = new SqlCommand(
"SELECT CategoryID, CategoryName FROM dbo.Categories;" +
"SELECT EmployeeID, LastName FROM dbo.Employees",
connection);
connection.Open();
SqlDataReader reader = command.ExecuteReader();
while (reader.HasRows)
{
Console.WriteLine("\t{0}\t{1}", reader.GetName(0),
reader.GetName(1));
while (reader.Read())
{
Console.WriteLine("\t{0}\t{1}", reader.GetInt32(0),
reader.GetString(1));
}
reader.NextResult();
}
}
}
With a stored procedure you can return more than one result set from the database and have a dataset filled with more than one table, you can then access these tables and fill your arrays/lists.
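A minimal sketch of that, with a hypothetical GetThreeColumns procedure standing in for your stored procedure (SqlDataAdapter names the result sets Table, Table1, Table2 by default):

// Sketch: fill one DataSet with all result sets from a multi-select stored procedure.
// "GetThreeColumns" and connectionString are placeholders.
using (var con = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("GetThreeColumns", con) { CommandType = CommandType.StoredProcedure })
using (var da = new SqlDataAdapter(cmd))
{
    var ds = new DataSet();
    da.Fill(ds); // one round trip; three tables land in ds.Tables[0..2]

    DataTable a = ds.Tables[0]; // first SELECT
    DataTable b = ds.Tables[1]; // second SELECT
    DataTable c = ds.Tables[2]; // third SELECT
}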
You can do 3 different SELECT statements and execute them in 1 call; you will get 3 result sets back. How you leverage those results depends on what data technology you are using: LINQ? DataSets? DataAdapter? DataReader? If you can provide that information (perhaps even sample code), I can tell you exactly how to get what you need.
Not sure if this is exactly what you had in mind, but you could do something like this (as long as all three columns are the same data type):
select field1, 'TableA' as TableName from tableA
UNION
select field2, 'TableB' from tableB
UNION
select field3, 'TableC' from tableC
This would give you one big resultset with all the records. Then you could use a data reader to read the results, keep track of the previous record's TableName value, and whenever it changes start putting the column values into another array, as sketched below.
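A minimal sketch of that split, assuming all three columns are strings (as the UNION requires compatible types) and a SqlCommand cmd built from the query above; keying on the marker column also works if the rows come back interleaved:

// Sketch: split the UNION resultset into three lists using the TableName marker column.
var results = new Dictionary<string, List<string>>
{
    ["TableA"] = new List<string>(),
    ["TableB"] = new List<string>(),
    ["TableC"] = new List<string>(),
};
using (SqlDataReader reader = cmd.ExecuteReader())
{
    while (reader.Read())
    {
        string marker = reader.GetString(1);      // 'TableA' / 'TableB' / 'TableC'
        results[marker].Add(reader.GetString(0)); // the field1/field2/field3 value
    }
}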
Take the three trips. The answers so far suggest how far you would need to advance from "new to db programming" to do what you want. Master the simplest ways first.
If they are three huge results, then I suspect you're trying to do something in C# that would better be done in SQL on the database without bringing back the data. Without more detail, this sounds suspiciously like an antipattern.