I'm in a pickle here. I'm reading an Excel file that has three worksheets using ExcelDataReader's DataSet function. After reading the file I have a DataSet that contains three tables, one for each worksheet.
Now I need to export this data row by row into two different SQL tables. These tables have an auto-incrementing primary key, and the PK of Table A makes up the FK in Table B. I'm using a stored procedure with SCOPE_IDENTITY() to achieve that.
Each column in the Excel file is a parameter in the stored procedure, so as I iterate the sheets row by row I can assign these parameters and then send them through the stored procedure.
Now the question is: how do I iterate through this DataSet and assign each row[col] to a parameter for my stored procedure?
Thanks for the help.
Update, more info:
1. Sheet 1 goes to Table 1 in SQL.
2. Sheet 2 goes to Table 2 in SQL.
3. Sheet 3 also goes to Table 2 in SQL.
4. Table 1 has a one-to-many relationship to Table 2.
Here is some half-pseudocode boilerplate to start from:
var datatable = dataset.Tables[0];
using (var proc = connection.CreateCommand())
{
    proc.CommandType = CommandType.StoredProcedure;
    proc.CommandText = "dbo.YourInsertProcedure";   // name of your stored procedure
    proc.Parameters.Add("@firstcolumnname", SqlDbType.Int);

    foreach (DataRow dr in datatable.Rows)
    {
        // check for DBNull.Value if the source column is nullable!
        proc.Parameters["@firstcolumnname"].Value = dr["FirstColumnName"];
        proc.ExecuteNonQuery();
    }
}
But this only makes sense if you're doing some data processing inside the stored procedure. Otherwise this calls for a bulk insert, especially if you have lots (thousands) of rows in the Excel sheets.
The one-to-many relation (foreign key) can be tricky. If you can control the order of inserts, simply start with the "one" table and load the "many" table afterwards. Let the stored procedure match the key values if they are not part of the Excel source data.
It's a whole different story if you are in a concurrent multi-user environment where the tables are read and written to while you load. Then you'd have to add transactions to preserve referential integrity throughout the process.
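To wire the parent and child inserts together, one option is to have the Table 1 procedure hand back the new identity through an output parameter set from SCOPE_IDENTITY() and feed it into the Table 2 calls. A minimal sketch, assuming hypothetical procedures dbo.InsertParent / dbo.InsertChild, hypothetical column and parameter names, and a hypothetical MatchingChildRows helper that picks the sheet 2/3 rows belonging to each parent row:

// Sketch only: dbo.InsertParent / dbo.InsertChild and the column names are assumptions.
// dbo.InsertParent is expected to end with SET @NewId = SCOPE_IDENTITY().
using (var parentCmd = connection.CreateCommand())
using (var childCmd = connection.CreateCommand())
{
    parentCmd.CommandType = CommandType.StoredProcedure;
    parentCmd.CommandText = "dbo.InsertParent";
    parentCmd.Parameters.Add("@Name", SqlDbType.NVarChar, 100);
    var newId = parentCmd.Parameters.Add("@NewId", SqlDbType.Int);
    newId.Direction = ParameterDirection.Output;

    childCmd.CommandType = CommandType.StoredProcedure;
    childCmd.CommandText = "dbo.InsertChild";
    childCmd.Parameters.Add("@ParentId", SqlDbType.Int);
    childCmd.Parameters.Add("@Value", SqlDbType.NVarChar, 100);

    foreach (DataRow parentRow in dataset.Tables[0].Rows)
    {
        parentCmd.Parameters["@Name"].Value = parentRow["Name"];
        parentCmd.ExecuteNonQuery();
        int parentId = (int)newId.Value;   // PK just generated in Table 1

        // However you match sheet 2/3 rows to this parent, pass parentId as the FK
        foreach (DataRow childRow in MatchingChildRows(dataset, parentRow))   // hypothetical helper
        {
            childCmd.Parameters["@ParentId"].Value = parentId;
            childCmd.Parameters["@Value"].Value = childRow["Value"];
            childCmd.ExecuteNonQuery();
        }
    }
}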
I have a DataSet object in C# (SSIS package) containing about 40,000 rows and a SQL table containing about 50,000,000 rows. I want to join these tables on their IDs.
I can't load the SQL table in C# as it's too big, and I don't have permission on that server to create a table (for cloning the object from C#).
Is there any way I can join the object and the table?
Does C# or an SSIS package support this kind of solution?
It is possible to do in SSIS.
Below are some scenarios for doing it. The key question is whether you have a 1-to-many or a many-to-many match.
Alternative 1 - you need to match all rows of the SQL table against the C# table (1 SQL table row matches 0 or 1 C# table rows).
High-level view of the approach:
Create a DataSet object with the data and store it in an SSIS Object-type variable. A Script Task will do it.
In a Data Flow, a Script Source reads rows from the variable and writes them to a Cache Destination, persisted to a cache file.
In the next Data Flow, read the SQL table with an OLE DB Source and perform the join with a Lookup transformation, where the Lookup uses the cache file created in step 2 as its reference. You can add columns from the cache table as you wish.
The destination of the last Data Flow is up to you.
Comments and samples:
Before entering code in the Script Source, add an Output and specify the output columns with their names and data types.
Script code for reading data from DataSet variable:
#region Namespaces
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
#endregion

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
        // Pull the DataTable out of the DataSet stored in the SSIS Object variable
        DataTable dt = ((DataSet)Variables.vResults).Tables["dtName"];

        // Since we know the column metadata at design time, we simply iterate over each row in
        // the DataTable, creating a new row in our Data Flow buffer for each
        foreach (DataRow dr in dt.Rows)
        {
            // Create a new, empty row in the output buffer
            SalesOutputBuffer.AddRow();

            // Now populate the columns - these are sample names;
            // they have to be defined beforehand as columns of the Script Source output
            SalesOutputBuffer.PurchOrderID = int.Parse(dr["PurchOrderID"].ToString());
            SalesOutputBuffer.RevisionNumber = int.Parse(dr["RevisionNumber"].ToString());
            SalesOutputBuffer.CreateDate = DateTime.Parse(dr["CreateDate"].ToString());
            SalesOutputBuffer.TotalDue = decimal.Parse(dr["TotalDue"].ToString());
        }
    }
}
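For reference, the Script Task from step 1 only has to park the DataSet in the Object variable. A minimal sketch, assuming the variable is named User::vResults (listed under the task's ReadWriteVariables) and a hypothetical BuildResultDataSet() helper that produces the C# data:

// Script Task (control flow), inside the generated ScriptMain class
public void Main()
{
    // BuildResultDataSet() is a placeholder for however you build the DataSet
    DataSet results = BuildResultDataSet();

    // Store it in the Object-typed SSIS variable for the downstream Script Source
    Dts.Variables["User::vResults"].Value = results;

    Dts.TaskResult = (int)ScriptResults.Success;
}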
Alternative 2. You want to match all rows of the C# DataSet to the SQL table (1 C# table row matches 0 or 1 SQL table rows).
High-level view of the approach:
Create a DataSet object with the data and store it in an SSIS Object-type variable. A Script Task will do it.
In a Data Flow, a Script Source reads rows from the variable.
Then create a Lookup with Partial Cache and define a SQL query against your table. You can use a No Cache Lookup if the IDs in the C# table are unique. Define the match condition and the columns needed from the SQL table.
Save the result at some destination.
Bad alternative - 1-to-many match with row multiplication.
Example: a row from the C# table can match several SQL table rows, and you have to output several rows in that case.
High-level view of the approach:
Create a DataSet object with the data and store it in an SSIS Object-type variable. A Script Task will do it.
In a Data Flow, a Script Source reads rows from the variable. Sort them by ID.
Add another data source reading the SQL table, ordered by ID in the same direction.
Do an SSIS Merge Join.
Save the results to some destination.
The bad thing about this scenario is that it may require a lot of RAM to do the Sort and Merge Join transformations.
Ferdipux's approach is a good one and less complicated than what lies below. The tradeoff between their solution and this one is performance versus complexity. In the approach outlined by Ferdipux, you'll have to pull all 50 million rows from your source table into the data flow just to identify whether you have a match.
The approach I propose is to:
Load your dataset into a temporary table. You might not be able to create a permanent table, but temporary objects should not be an issue.
Rewrite your source query to incorporate the temporary table.
Now the database engine can efficiently extract the source data with minimal impact.
Technical bits
Execute SQL Task (create temp table)
-> Data Flow Task (populate temp table)
-> Data Flow Task (extract from big table)
In your connection manager for Source1, change the RetainSameConnection property to True. This ensures our temporary table does not go out of scope during execution.
Execute SQL Task
Create a global temporary table when the package begins.
During development, you will need to open a connection and run the supplied code and KEEP THE CONNECTION OPEN. This is as simple as running the query in SSMS and not closing the application.
IF OBJECT_ID('tempdb..##SO_59281633') IS NOT NULL
BEGIN
DROP TABLE ##SO_59281633;
END
-- Create global temporary table
CREATE TABLE ##SO_59281633
(
SomeKey int NOT NULL
, AValue varchar(50) NOT NULL
);
Data Flow Task (populate temp table)
Use the approach outlined by Ferdipux, but you also need to set DelayValidation = True in the Data Flow's properties.
Validation happens when the package is opened for editing and when it begins execution. During normal execution, the temporary table won't exist until the previous task has run, so setting Delay Validation means this task will not validate until it is about to start, rather than when the package does.
If you're comfortable with .NET, you can replace this step with a Script Task that uses ADO.NET/OLE DB connection and command objects to load the temporary table, as sketched below.
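A minimal sketch of that Script Task route, assuming the DataSet sits in an Object variable named User::vResults, the ADO.NET connection manager is named "Source1 ADO" (with RetainSameConnection = True), and the target is the ##SO_59281633 table from above:

// Needs: using System.Data; using System.Data.SqlClient; at the top of the Script Task file
public void Main()
{
    // Grab the first table of the DataSet stored in the Object variable
    var dataTable = ((DataSet)Dts.Variables["User::vResults"].Value).Tables[0];

    // Acquire the shared ADO.NET connection (RetainSameConnection keeps the temp table alive)
    var connection = (SqlConnection)Dts.Connections["Source1 ADO"].AcquireConnection(Dts.Transaction);

    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "##SO_59281633";
        bulkCopy.WriteToServer(dataTable);   // column names/order must line up with the temp table
    }

    Dts.Connections["Source1 ADO"].ReleaseConnection(connection);
    Dts.TaskResult = (int)ScriptResults.Success;
}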
Data Flow Task (extract from big table)
You will again need to specify DelayValidation = True here, as the source query relies on a temporary object.
The source for your OLE DB Source will be changed from Table to Query; then specify your query:
SELECT * FROM dbo.BigTable AS BT INNER JOIN ##SO_59281633 AS SO ON SO.SomeKey = BT.SomeKey;
I'm reading (ASP.NET MVC C# site) an Excel file with multiple worksheets into a SQL database, where the first sheet goes into Table A and an auto-increment column generates a unique ID (the PK).
Now the second worksheet goes into Table B, but it has a composite key made up of an auto-increment column for Table B and the value from Table A.
My question is: how do I get Table A's PK into Table B while reading the Excel file?
I'm not sure if this question is better suited for database design or C#.
Depending on how you're doing your insert, you could run a query like
SELECT scope_identity()
which will get you the last inserted PK.
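Run from C#, the pattern is to append SELECT SCOPE_IDENTITY() to the Table A insert and read the value back for the Table B rows. A minimal sketch, assuming an open SqlConnection named connection, hypothetical table/column names, and DataRows rowFromSheet1 / rowFromSheet2 from the two worksheets:

// Insert the Table A row and read back its generated key in the same batch
int tableAId;
using (var cmd = new SqlCommand(
    "INSERT INTO dbo.TableA (Name) VALUES (@Name); SELECT CAST(SCOPE_IDENTITY() AS int);",
    connection))
{
    cmd.Parameters.AddWithValue("@Name", rowFromSheet1["Name"]);
    tableAId = (int)cmd.ExecuteScalar();
}

// Use that key as the FK value when inserting the Table B row
using (var cmd = new SqlCommand(
    "INSERT INTO dbo.TableB (TableAId, Value) VALUES (@TableAId, @Value);",
    connection))
{
    cmd.Parameters.AddWithValue("@TableAId", tableAId);
    cmd.Parameters.AddWithValue("@Value", rowFromSheet2["Value"]);
    cmd.ExecuteNonQuery();
}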
I'm working on an import function. I have an Excel file which contains some data that will later be edited by the user. I managed to import the Excel file with SmartXLS in C# and update all the data in the SQL Server database. However, what I did was fetch all the data in the Excel file and update every row in the SQL table, which hurts performance, and I also updated unedited rows.
I would like to ask: is there any way I can get only the modified cells/rows in Excel and update the corresponding data in the SQL table?
var workbook = new WorkBook();
workbook.read(filePath);
var dataTable = workbook.ExportDataTable();
Just a scenario, maybe it helps you understand what Gordatron and I were talking about.
The situation: there is a table "Products", which is the central storage place for product information, and a table "UpdatedProducts", whose structure looks exactly like the "Products" table but whose data may be different. Think of the following scenario: you export the Products table to Excel in the morning; the whole day you delete, add, and update products in your Excel table; at the end of the day you want to re-import your Excel data into the "Products" table. What you need:
delete all records from "UpdatedProducts"
insert the data from Excel into "UpdatedProducts" (bulk insert if possible; see the sketch right below)
update the "Products" table
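For the bulk insert step, a minimal C# sketch, assuming the dataTable produced by workbook.ExportDataTable() above, a staging table dbo.UpdatedProducts with ProductID/ProductName/Rate columns, and a connectionString you supply:

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    // Step 1: empty the staging table
    using (var truncate = new SqlCommand("TRUNCATE TABLE dbo.UpdatedProducts;", connection))
    {
        truncate.ExecuteNonQuery();
    }

    // Step 2: bulk copy the Excel data into the staging table
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.UpdatedProducts";
        bulkCopy.ColumnMappings.Add("ProductID", "ProductID");
        bulkCopy.ColumnMappings.Add("ProductName", "ProductName");
        bulkCopy.ColumnMappings.Add("Rate", "Rate");
        bulkCopy.WriteToServer(dataTable);
    }
}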
Then a Merge-Statement could look like this:
MERGE Products AS TARGET
USING UpdatedProducts AS SOURCE
ON TARGET.ProductID = SOURCE.ProductID
WHEN MATCHED AND (TARGET.ProductName <> SOURCE.ProductName OR TARGET.Rate <> SOURCE.Rate)
    THEN UPDATE SET TARGET.ProductName = SOURCE.ProductName,
                    TARGET.Rate = SOURCE.Rate
WHEN NOT MATCHED BY TARGET
    THEN INSERT (ProductID, ProductName, Rate)
         VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;
What this statement does:
WHEN MATCHED:
The data exists in both tables; we update the data in "Products" if ProductName or Rate differs.
WHEN NOT MATCHED BY TARGET:
The data exists in the staging table but not in your original table; we add it to "Products".
WHEN NOT MATCHED BY SOURCE:
The data exists in your original table but not in the staging table; it will be deleted from "Products".
Thanks a lot to http://www.mssqltips.com/sqlservertip/1704/using-merge-in-sql-server-to-insert-update-and-delete-at-the-same-time/ for this perfect example!
I have a .Net DataTable that contains records, all of which are "added" records. The corresponding table in the database may contain millions of rows. If I attempt to simply call the "Update" method on my SqlDataAdapter, any existing records cause an exception to be raised due to a violation of the primary key constraint. I considered loading all of the physical table's records into a second DataTable instance, merging the two, and then calling the Update method on the second DataTable. This actually works exactly like I want. However, my concern is that if there are 30 billion records in the physical table, loading all of that data into a DataTable in memory could be an issue.
I considered selecting a sub-set of data from the physical table and proceeding as described above, but the construction of the sub-query has proved to be very involved and very tedious. You see, I am not working with a single known table. I am working with a DataSet that contains several hundred DataTables. Each of the DataTables maps to its own physical table. The name and schema of the tables are not known at compile time. This has to all be done at run time.
I have played with the SqlBulkCopy class but have the same issue - duplicate records raise an exception.
I don't want to have to dynamically construct queries for each table at run time. If that is the only way, so be it, but I can't help thinking there must be a simpler solution using what ADO.NET provides.
You could create your InsertCommand like this:
declare #pk int = 1
declare #txt nvarchar(100) = 'nothing'
insert into #temp (id, txt)
select distinct #pk, #txt
where not exists (select id from #temp x where x.id = #pk)
assuming that your table #temp (a temporary table used for this example) is created like this, with a primary key on id:
create table #temp (id int not null primary key, txt nvarchar(100))
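Hooked into ADO.NET, that pattern becomes the adapter's InsertCommand, so existing rows are silently skipped instead of violating the primary key. A minimal sketch with hypothetical table/column names (in your case, the SQL text would be generated per DataTable at run time from its schema):

var adapter = new SqlDataAdapter();
adapter.InsertCommand = new SqlCommand(@"
    INSERT INTO dbo.SomeTable (id, txt)
    SELECT @id, @txt
    WHERE NOT EXISTS (SELECT 1 FROM dbo.SomeTable x WHERE x.id = @id);", connection);
adapter.InsertCommand.Parameters.Add("@id", SqlDbType.Int, 0, "id");
adapter.InsertCommand.Parameters.Add("@txt", SqlDbType.NVarChar, 100, "txt");

// A skipped duplicate reports 0 rows affected, which Update() would otherwise
// treat as a concurrency error, so let it continue past those rows
adapter.ContinueUpdateOnError = true;

adapter.Update(dataTable);   // only the "added" rows that aren't already present get inserted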
I created two related tables (FIRSTtable and SECONDtable) in a MySQL database.
The FIRST table has the columns (product_id (PK), product_name).
The SECOND table has the columns (machine_id, production_date, product_id (FK), product_quantity, operator_id).
The relation between the two tables uses the product_id column with ON UPDATE CASCADE and ON DELETE CASCADE. Both relationships function normally when I try them with an SQL script: if I delete all product_id rows in the FIRST table, all related data in the SECOND table is deleted as well.
Both of these tables are displayed in DataGridViews. When I delete all the data in the FIRST table, all rows in the FIRST table's DataGridView are removed, and the data in the MySQL FIRST table is deleted too.
When I open the MySQL database, the data in the SECOND table has also been deleted. The problem is that the view in the second DataGridView does not reflect the delete and still shows the previous data. How do I refresh the DataGridView binding in VB.NET or C#? Thanks.
With Me.SECOND_DataGridView
    .DataSource = Nothing ' tried this, but it failed
    .DataSource = MyDataset.Tables("SECOND_table")
End With
I believe what you are running into is the fact that the MySQL engine is actually performing the cascading deletes for you.
When you query the MySQL data into a local C# DataTable (a table within a DataSet), that data is now in memory and no longer directly linked to what is on disk. When you delete the rows in the "in-memory" version of the first data table, the deletions occur at the SERVER for the second-level table, and your in-memory version of data table two is NOT updated.
That being said, you will probably have to do one of two things: requery the entire dataset (tables one and two) to get a full refresh of what is STILL in the actual database, or, as you delete from table one of the dataset, perform the delete handling in the local DataTable two as well to keep it in sync.
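A minimal C# sketch of the requery-and-rebind route, assuming MySqlDataAdapter from MySql.Data.MySqlClient, an open connection named mySqlConnection, and a DataSet myDataset corresponding to MyDataset in the question:

// Clear the stale in-memory copy of the second table and refill it from the server
myDataset.Tables["SECOND_table"].Clear();
using (var adapter = new MySqlDataAdapter("SELECT * FROM SECONDtable;", mySqlConnection))
{
    adapter.Fill(myDataset, "SECOND_table");
}

// Rebind the grid so it picks up the refreshed table
SECOND_DataGridView.DataSource = null;
SECOND_DataGridView.DataSource = myDataset.Tables["SECOND_table"];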