I'm trying to create an object variable that will hold a collection from an Execute SQL Task. This collection will be used in multiple Script Tasks throughout the ETL package.
The problem is that after the first Fill in the first Script Task, the object variable becomes empty. Here's the code I use to load the variable into a DataTable:
try
{
DataTable dt = new DataTable();
OleDbDataAdapter da = new OleDbDataAdapter();
da.Fill(dt, Dts.Variables["reportMetrics"].Value);
Dts.TaskResult = (int)ScriptResults.Success;
}
catch (Exception Ex)
{
MessageBox.Show(Ex.Message);
Dts.TaskResult = (int)ScriptResults.Failure;
}
Throughout the ETL package, Script Task components will have this piece of code. Since the variable becomes empty after the first Fill, I can't reuse the object variable.
I'm guessing that the Fill method has something to do with this.
Thanks!
It looks like your Dts.Variables["reportMetrics"].Value object holds a DataReader object. This object allows forward-only, read-only access to the data, so you cannot fill a DataTable twice using a DataReader. To accomplish your task, you need to create another Script Task that performs exactly what you described here: it reads the reader into a DataTable object and stores that DataTable in another Dts.Variable of type Object.
Dts.Variables["reportMetricsTable"].Value = dt;
After that, all your subsequent Script Tasks should either create a copy of this table if they modify the data, or use it directly if they do not modify it:
DataTable dtCopy = (Dts.Variables["reportMetricsTable"].Value as DataTable).Copy();
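For completeness, here is a minimal sketch of that one-time loader Script Task, assuming the variable names from the question and that reportMetrics/reportMetricsTable are listed in the task's ReadOnlyVariables/ReadWriteVariables:
using System.Data;
using System.Data.OleDb;

public void Main()
{
    // Drain the forward-only result set into a DataTable exactly once
    DataTable dt = new DataTable();
    OleDbDataAdapter da = new OleDbDataAdapter();
    da.Fill(dt, Dts.Variables["reportMetrics"].Value);

    // Cache the materialized table in a second Object variable for all later Script Tasks
    Dts.Variables["reportMetricsTable"].Value = dt;

    Dts.TaskResult = (int)ScriptResults.Success;
}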
I had a similar situation. While I think you could do a SQL Task with a SELECT COUNT(*) query and assign the result to an SSIS variable, what I did was create an int SSIS variable called totalCount with an initial value of 0. I expect the total count to be > 0 (otherwise I won't have anything to iterate over), so I created an if statement within my Script Task. If the value is zero, I assume totalCount has not been initialized, so I use the same code you are using (with the Fill method). Otherwise (i.e., in later iterations), I skip that part and keep using the totalCount variable. Here's the block of code, hope it helps:
if ((int)Dts.Variables["User::totalCount"].Value == 0) // if the total count variable has not been initialized...
{
System.Data.OleDb.OleDbDataAdapter da = new System.Data.OleDb.OleDbDataAdapter();
DataTable stagingTablesQryResult = new DataTable();
da.Fill(stagingTablesQryResult, Dts.Variables["User::stagingTablesQryResultSet"].Value); // used for logging how many files we are iterating over. It may be more efficient to do a COUNT(*) outside this script and save the total number of rows, but I made this as a proof of concept for future development.
Dts.Variables["User::totalCount"].Value = stagingTablesQryResult.Rows.Count;
}
Console.WriteLine("{0}. Looking for data file {0} of {1} using search string '{2}'.", counter, Dts.Variables["User::totalCount"].Value, fileNameSearchString);
Excellent! This has helped me around an issue in building my ETL platform.
Essentially I execute a SQL Task to build a dataset of tasks; there are some inline transformations and rules which pull the relevant tasks to the fore, and for obvious reasons I only want to execute that once per package run.
I then need to get the unique ProcessIDs from the dataset (to use in a For Each Loop).
Within the FEL, I then fetch the relevant records from the original dataset to push through a further FEL process.
I was facing the same "empty data set" problem on the second execution against the dataset.
I thought I'd share my solution to assist others.
You'll need to add the namespace
using System.Data.OleDb;
to the Script Tasks.
Screenshot of the solution
Get dataset
Execute SQL Task - get your data and pass it into an Object variable
Pull Ds
Declare the Variable Objects
public void Main()
{
DataTable dt = new DataTable();
OleDbDataAdapter da = new OleDbDataAdapter();
//Read the original table
da.Fill(dt, Dts.Variables["Tbl"].Value);
//Push to a replica
Dts.Variables["TblClone"].Value = dt;
Dts.TaskResult = (int)ScriptResults.Success;
}
Build Proc List
This gets a list of ProcessIDs (and Names) by filtering on a Rank field in the dataset
Declare the Variable Objects
public void Main()
{
//Take a copy of the cloned dataset
DataTable dtRead = (Dts.Variables["TblClone"].Value as DataTable).Copy();
//Lock the output object variable
Dts.VariableDispenser.LockForWrite("User::ProcTbl");
//Create a data table to place the results into which we can write to the output object once finished
DataTable dtWrite = new DataTable();
//Create the DataTable columns programmatically
//dtWrite.Clear();
dtWrite.Columns.Add("ID", typeof(Int64));
dtWrite.Columns.Add("Nm");
//Start reading input rows
foreach (DataRow dr in dtRead.Rows)
{
//Keep only rows where the rank column (index 9) = 1
if (Int64.Parse(dr[9].ToString()) == 1) //P_Rnk = 1
{
DataRow newDR = dtWrite.NewRow();
newDR[0] = Int64.Parse(dr[0].ToString());
newDR[1] = dr[4].ToString();
//Write the row
dtWrite.Rows.Add(newDR);
}
}
//Write the dataset back to the object variable
Dts.Variables["User::ProcTbl"].Value = dtWrite;
Dts.Variables.Unlock();
Dts.TaskResult = (int)ScriptResults.Success;
}
Build TaskList from ProcList
Cycle round ProcessID in a For Each Loop
Build TL Collection
..and map Vars
Build TL Var Mappings
Build TL Script
This will dynamically build the output for you (NB: this works for me, although I haven't extensively tested it, so if it doesn't work... have a fiddle with it).
You'll see I've commented out some debug stuff.
public void Main()
{
//Clone the copied table
DataTable dtRead = (Dts.Variables["TblClone"].Value as DataTable).Copy();
//Read the var to filter the records by
var ID = Int64.Parse(Dts.Variables["User::ProcID"].Value.ToString());
//Lock the output object variable
Dts.VariableDispenser.LockForWrite("User::SubTbl");
//Debug Test the ProcID being passed
//MessageBox.Show("Start ProcID = " + ID.ToString());
//MessageBox.Show("TblCols = " + dtRead.Columns.Count);
//Create a data table to place the results into which we can write to the output object once finished
DataTable dtWrite = new DataTable();
//Create the DataTable columns programmatically
//dtWrite.Clear();
foreach (DataColumn dc in dtRead.Columns)
{
dtWrite.Columns.Add(dc.ColumnName, dc.DataType);
}
MessageBox.Show("TblRows = " + dtRead.Rows.Count); //Debug
//Start reading input rows
foreach (DataRow dr in dtRead.Rows)
{
//If 1st col from Read object = ID var
if (ID == Int64.Parse(dr[0].ToString()))
{
DataRow newDR = dtWrite.NewRow();
//Dynamically create data for each column
foreach (DataColumn dc in dtRead.Columns)
{
newDR[dc.ColumnName] = dr[dc.ColumnName];
}
//Write the row
dtWrite.Rows.Add(newDR);
//Debug
//MessageBox.Show("ProcID = " + newDR[0].ToString() + ", TaskID = " + newDR[1].ToString() + ", Name = " + newDR[4].ToString());
}
}
//Write the dataset back to the object variable
Dts.Variables["User::SubTbl"].Value = dtWrite;
Dts.Variables.Unlock();
Dts.TaskResult = (int)ScriptResults.Success;
}
For Each Loop Container
FEL Cont Collection
N.B. Don't forget to map the items in the Variable Mappings.
Now you can consume the records and do stuff with that data.
I included the Msg Loop script as an easy data check... in reality this will go off and trigger other processes, but I thought I'd include it to aid you in data checks.
Msg Loop
Msg Loop Script
public void Main()
{
// TODO: Add your code here
MessageBox.Show("ID = " + Dts.Variables["User::ProcID"].Value + ", and val = " + Dts.Variables["User::TaskID"].Value, "Name = Result");
Dts.TaskResult = (int)ScriptResults.Success;
}
Hope that helps somebody solve their issue (I've been trying to resolve this for a working day or so :/).
Related
Currently, I have a program that uses OleDb to take data from an Excel sheet and imports it into a SQL Server database using Entity Framework.
In bridging the gap between OleDb and EF, I push the data from the Excel sheet into a DataTable, go into the DataTable to grab all the data in a single row, put that into a StringBuilder separated by commas, and then turn that StringBuilder object into a string array separated by commas. With that, I then call the Add function to import the data into the database using EF.
In the code shown below, you can see that I have to call
Name = data[0], Message = data[1]
etc. to push the data into the database. Is there a way I can instead pass the whole string array into the class, rather than each value separately, and deal with the data there?
public static void Insert(string Sheet, OleDbConnection conn)
{
OleDbCommand command = new OleDbCommand("SELECT Name, Message, [Message type] FROM [" + Sheet + "$]", conn); // Selects the Name, Message and Message type columns from the Excel sheet
DataTable Data = new DataTable();
OleDbDataAdapter adapter = new OleDbDataAdapter(command);
adapter.Fill(Data); // Puts all data into the DataTable
StringBuilder sb = new StringBuilder();
var context = new DataWarehouseContext();
// Imports each DataTable row into the database
foreach (DataRow dataRow in Data.Rows)
{
sb.Clear(); // reset for each row; otherwise values from previous rows accumulate
foreach (var item in dataRow.ItemArray)
{
sb.Append(item);
sb.Append(",");
}
string[] data = sb.ToString().Split(','); //Gets data for each item in vHealth Insert method
context.RvtoolsVHealth.Add(new RvtoolsVHealth { Name = data[0], Message = data[1], MessageType = data[2] });
context.SaveChanges();
}
}
Any pointers would be great, thanks!
Not in a built-in way. You could create a constructor for RvtoolsVHealth that takes a string array and sets the properties, but it's a poor API in my opinion because there's no way to ensure that the properties are mapped to the proper array values. What if the array contains the message type, message, and name in that order?
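For illustration only, such a constructor might look like the sketch below (the property names come from your snippet; the assumed order Name, Message, MessageType is exactly the fragile part):
public class RvtoolsVHealth
{
    public string Name { get; set; }
    public string Message { get; set; }
    public string MessageType { get; set; }

    public RvtoolsVHealth() { }

    // Hypothetical convenience constructor: nothing enforces that callers
    // pass the values in the order Name, Message, MessageType
    public RvtoolsVHealth(string[] data)
    {
        Name = data[0];
        Message = data[1];
        MessageType = data[2];
    }
}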
What you're doing is the canonical method of instantiating objects.
I will say that you seem to waste some energy by concatenating a string only to split it back out. Why not just pull the values from the dataRow directly:
foreach (DataRow dataRow in Data.Rows)
{
context.RvtoolsVHealth.Add(new RvtoolsVHealth {
Name = dataRow[0].ToString(),
Message = dataRow[1].ToString(),
MessageType = dataRow[2].ToString()
});
context.SaveChanges();
}
You could even go one step further and use an OleDbDataReader to read the values rather than taking the time to fill a DataTable.
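A rough sketch of that approach, reusing the command and context from your Insert method and assuming the connection is already open and the same three columns are selected:
using (OleDbDataReader reader = command.ExecuteReader())
{
    while (reader.Read())
    {
        context.RvtoolsVHealth.Add(new RvtoolsVHealth
        {
            Name = reader[0].ToString(),
            Message = reader[1].ToString(),
            MessageType = reader[2].ToString()
        });
    }
}
// Save once after the loop rather than once per row
context.SaveChanges();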
I have an Execute SQL Task that returns a table of data using Full Result Set used to drive other child processes in parallel. The problem is needing locks on the Object so each is processed only once. How can I access the Object variable in a C# Script Task and convert into a Queue datatype in order to lock?
The Execute SQL Task stores the results in User::CoreTables, then I want to get the value in the first row each time:
// The line that fails to do the conversion, not sure how to do this
System.Collections.Generic.Queue<string> tablesQueue = (System.Collections.Generic.Queue<string>)Dts.Variables["User::CoreTables"].Value;
lock (tablesQueue)
{
//If number of rows in queue is greater than 0 then dequeue row for processing
//and set DoWork to true. Otherwise set DoWork to false as there is nothing to process
if (tablesQueue.Count > 0)
{
Dts.Variables["User::TableToProcess"].Value = tablesQueue.Dequeue();
Dts.Variables["User::DoWork"].Value = true;
}
else
{
Dts.Variables["User::DoWork"].Value = false;
}
}
The Execute SQL Task stores the full result set as an OleDB Recordset. You cannot cast it to a Queue of strings, even if the OleDB recordset contains only one column. So you have to process the OleDB object inside a Script Task, filling in your Queue.
The following code fragment shows how to process the OleDB Object variable:
using System.Data.OleDb;
DataTable dt = new DataTable();
OleDbDataAdapter adapter = new OleDbDataAdapter();
adapter.Fill(dt, Dts.Variables["User::CoreTables"].Value);
foreach (DataRow row in dt.Rows)
{
//process datatable row here
}
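For example, a minimal sketch that builds the queue and stores it back into a second Object variable could look like this (it assumes using System.Collections.Generic and a ReadWrite Object variable, named here User::TablesQueue purely for illustration):
// Build the queue from the first column of each row
Queue<string> tablesQueue = new Queue<string>();
foreach (DataRow row in dt.Rows)
{
    tablesQueue.Enqueue(row[0].ToString());
}

// Store the queue so later Script Tasks can lock it and Dequeue() from it safely
Dts.Variables["User::TablesQueue"].Value = tablesQueue;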
The problem domain: I have a db file with ~90,000 rows and 6 columns. I have a SELECT query that gets all the rows and columns that are necessary for me, and that works fine. The issue is that I then fill a DataTable with those records. I do this with the SQLiteDataAdapter Fill method, which takes about ~1.3 seconds, and after this I fill my ObservableCollection (which is bound to a DataGrid) with this data, which also takes about ~1.3 seconds. So here is my code:
private void GetSelectedMaterial()
{
DataTable dtMaterial = new DataTable();
materialColl.Clear(); // Clearing ObservableCollection
Trace.WriteLine("GetSelectedMaterial TS " + DateTime.Now + DateTime.Now.Millisecond);
using (SQLiteConnection connection = new SQLiteConnection(dbConnection))
using (SQLiteCommand cmd = connection.CreateCommand())
{
connection.Open();
query = "SELECT * FROM Tbl_Materialliste LEFT JOIN(SELECT * FROM Tbl_Besitzt k WHERE k.TechnikID = '" + teTechnikID + "') as k ON k.MaterialID = Tbl_Materialliste.MaterialID";
dataAdapter = new SQLiteDataAdapter(query, connection);
Trace.WriteLine("query: " + DateTime.Now + DateTime.Now.Millisecond);
dtMaterial.Columns.Add("Checked", typeof(bool));
Trace.WriteLine("here comes the fill: " + DateTime.Now + DateTime.Now.Millisecond);
dataAdapter.Fill(dtMaterial);
Trace.WriteLine("Checkbox: " + DateTime.Now + DateTime.Now.Millisecond);
DetermineCheckBox(dtMaterial, teTechnikID, 8);
Trace.WriteLine("SQL TS: " + DateTime.Now + DateTime.Now.Millisecond);
}
FillMaterialColl(dtMaterial);
}
private void FillMaterialColl(DataTable dtMaterial)
{
foreach (DataRow dr in dtMaterial.Rows)
{
Material mat = new Material();
mat.isChecked = (bool)dr.ItemArray[0];
mat.materialID = (string)dr.ItemArray[1];
mat.materialkurztext = (string)dr.ItemArray[2];
mat.herstellername = (string)dr.ItemArray[3];
mat.herArtikenummer = (string)dr.ItemArray[4];
mat.dokument = (string)dr.ItemArray[5];
mat.substMaterial = (string)dr.ItemArray[6];
materialColl.Add(mat);
}
}
I know ObservableCollections drain performance, but is there some way to do this differently? Some say to use a DataReader instead of a DataAdapter, but the DataAdapter uses a DataReader internally, so I don't think there would be an improvement in performance. The main problem is that the process takes too long, and the user experience is not good if showing new material takes about 3-4 seconds.
EDIT
So here comes my DB design:
It is a many-to-many relationship between Tbl_Material and Tbl_Technik
And my SELECT query gives me ALL entries from Tbl_Material (~90k) and, in addition, those columns from Tbl_Besitzt where I can find the TechnikID,
so that I can filter (for a checkbox) which entries belong to my MaterialID.
In my DB file, MaterialID from Tbl_Materialliste is a PK and so is TechnikID from Tbl_Technik - just so you aren't left wondering about the design image, I didn't get them into the model.
Thanks a lot!
It's hard to investigate the performance issues of a database without knowing its schema and design. In your SQL query, there is a join expression. You need to ensure that the corresponding data fields are indexed in order to make the join operation fast. This depends also on the data size of both tables.
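As a sketch (the index name and column choice are assumptions based on your query), the join columns could be indexed once, e.g. at startup:
using (SQLiteConnection connection = new SQLiteConnection(dbConnection))
using (SQLiteCommand cmd = connection.CreateCommand())
{
    connection.Open();
    // Index the columns used in the join/filter; adjust to your actual schema
    cmd.CommandText = "CREATE INDEX IF NOT EXISTS IX_Besitzt_TechnikID_MaterialID ON Tbl_Besitzt (TechnikID, MaterialID);";
    cmd.ExecuteNonQuery();
}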
To speed up the displaying of the search results, you should avoid adding them item-by-item in your ObservableCollection<T>. This is because each time you add a new item, the Binding Engine transfers this item to the DataGrid causing the grid to perform all the actions it needs to display a record.
If you don't really need the collection to be observable (e.g. you won't add or remove any items in the view), then just make it to an IEnumerable<T>:
public IEnumerable<Material> Materials
{
get { return this.materials; }
private set
{
// Using the PRISM-like implementation of INotifyPropertyChanged here
// Change this to yours
this.SetProperty(ref this.materials, value);
}
}
In your method, create a local List<Material>, fill it, and then expose it to the view:
List<Material> materials = new List<Material>();
// fill the list here
// ...
// finally, assign the result to your property causing the binding to do the job once
this.Materials = materials;
If you need the ObservableCollection<T> though, you can do the same trick - create a local copy, fill it, and finally expose it:
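A minimal sketch of that variant (it assumes using System.Collections.ObjectModel and that you expose the collection through a bound property, named here MaterialColl, that raises PropertyChanged like the Materials property above):
// Build the collection off to the side...
var localColl = new ObservableCollection<Material>();
foreach (DataRow dr in dtMaterial.Rows)
{
    Material mat = new Material();
    mat.isChecked = (bool)dr.ItemArray[0];
    mat.materialID = (string)dr.ItemArray[1];
    // ... map the remaining columns as in FillMaterialColl ...
    localColl.Add(mat);
}
// ...then assign once, so the binding transfers the whole set to the DataGrid in one go
this.MaterialColl = localColl;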
If this doesn't help, you should try using UI virtualization. This is a rather big topic, but there is a lot of information on the net.
I basically have a listbox that contains postcode areas, e.g. AE, CW, GU, etc.
The user selects one and then a postback occurs - an SQL statement is built, a database query is performed, and the results are returned to a DataTable called tempdata.
So far so good. I then need to loop through this DataTable and copy the records to my main ViewState DataTable, which is the data source for the Google Maps API.
DataTable tempstore = GetData(querystring, "");
//check tempstore has rows otherwise add defaultcust as default otherwise map will be blank
if (tempstore.Rows.Count == 0)
{
tempstore = GetData("WHERE CUSTCODE=='CD344'", "");
infoalert.Visible = true;
infoalert.InnerHtml = "No Results Returned For Selection";
}
foreach (DataRow row in tempstore.Rows)
{
dtpc.ImportRow(row);
dtpc.AcceptChanges();
}
//database command
using (OleDbConnection con = new OleDbConnection(conString))
{
using (OleDbCommand cmd = new OleDbCommand(query))
{
using (OleDbDataAdapter sda = new OleDbDataAdapter())
{
cmd.Connection = con;
sda.SelectCommand = cmd;
sda.Fill(dt5);
}
}
}
So my main DataTable can grow and grow as users add more postcodes. However, when it gets to around 500 rows or so, I get a huge memory spike (only on postback) and then it settles back down. My RAM usage goes from 2 GB to 3 GB, and if even more postcodes are selected it maxes out the memory and crashes my PC.
If I remove the line
dtpc.ImportRow(row);
the memory spike goes away completely, obviously because the main DataTable has no rows. I thought you only ran into memory issues with thousands of rows?
Any help would be much appreciated.
thank you
Do you really need all the rows at once?
A DataReader will access a single row at a time and keep your memory usage to a minimum.
DataReader class
If you need all your data at once, create a class or struct for the data and hold it in a collection such as a List<T>; DataTable is a heavy object.
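A rough sketch of that combination - a lightweight class filled straight from a data reader, reusing the conString and query from the question (the class and column names are placeholders, and using System.Collections.Generic / System.Data.OleDb are assumed):
public class PostcodePoint
{
    public string CustCode { get; set; }
    // add the other columns your map actually needs
}

var points = new List<PostcodePoint>();
using (OleDbConnection con = new OleDbConnection(conString))
using (OleDbCommand cmd = new OleDbCommand(query, con))
{
    con.Open();
    using (OleDbDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Only one row is held by the reader at a time
            points.Add(new PostcodePoint { CustCode = reader["CUSTCODE"].ToString() });
        }
    }
}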
And if you are measuring memory via Task Manager, be aware that it is not very accurate.
First off, make sure you're wrapping any SQL execution in the appropriate "using" clauses. This is most likely the cause of your problem.
using (var command = new SqlCommand())
{
// Some code here
}
Like Blam said, DataTable is too heavy for your purposes.
You can convert your data rows into class objects quite easily:
var datasourceList = new List<YourCustomObject>();
foreach (DataRow row in tempstore.Rows)
{
var newMapsObject = new YourCustomObject
{
Value1 = row.Field<String>("Value1ColumnName"),
Value2 = row.Field<String>("Value2ColumnName")
};
datasourceList.Add(newMapsObject);
}
viewStateList.AddRange(datasourceList);
To bind a custom collection to a data display (such as a repeater) you assign the list to the .DataSource property of said display, then call .DataBind(). This will work for most all ASP.NET data display objects.
repeater1.DataSource = viewStateList;
repeater1.DataBind();
As there are so many examples online, I was researching, but they do not show the use of SqlBulkCopy in the following scenario:
I have used a query to fetch existing data from SQL Server (2008) into a DataTable,
so I could sort the data locally and avoid hitting the database while processing.
So now, at that stage, I already have the option to clone the source DataTable schema using
localDataTable = DataTableFromOnlineSqlServer.Clone();
By doing that Clone(), I now have all the columns and each column's data type.
Then, in the next stage of the program, I fill that cloned-from-DB, local (yet empty) DataTable with some new data.
So by now I have a populated DataTable, and it is ready to be stored in SQL Server.
Using the code below yields no results:
public string UpdateDBWithNewDtUsingSQLBulkCopy(DataTable TheLocalDtToPush, string TheOnlineSQLTableName)
{
// Open a connection to the destination database.
using (SqlConnection connection = new SqlConnection(RCLDBCONString))
{
connection.Open();
// Perform an initial count on the destination table.
SqlCommand commandRowCount = new SqlCommand("SELECT COUNT(*) FROM "+TheOnlineSQLTableName +";", connection);
long countStart = System.Convert.ToInt32(commandRowCount.ExecuteScalar());
var nl = "\r\n";
string retStrReport = "";
retStrReport = string.Concat(string.Format("Starting row count = {0}", countStart), nl);
retStrReport += string.Concat("==================================================", nl);
// The local DataTable that will be pushed to the server.
DataTable newCustomers = TheLocalDtToPush;
// Create the SqlBulkCopy object.
// Note that the column positions in the source DataTable
// match the column positions in the destination table so
// there is no need to map columns.
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
{
bulkCopy.DestinationTableName = TheOnlineSQLTableName;
try
{
// Write from the source to the destination.
bulkCopy.WriteToServer(newCustomers);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
// Perform a final count on the destination
// table to see how many rows were added.
long countEnd = System.Convert.ToInt32(
commandRowCount.ExecuteScalar());
retStrReport += string.Concat(string.Format("Ending row count = {0}", countEnd), nl);
retStrReport += string.Concat("==================================================", nl);
retStrReport += string.Concat(string.Format("{0} rows were added.", countEnd - countStart),nl);
retStrReport += string.Concat("New Customers Was updated successfully", nl, "END OF PROCESS !");
Console.ReadLine();
return retStrReport;
}
}
Now the problem is that no data was inserted at all.
I have done some research and found no solution.
I also checked to make sure that:
- all the columns of source and destination are aligned (although it is a clone, so no worries)
- there is a PK set as an IDENTITY column on the SQL Server table
What am I missing here?
... the "report" I have made in order to count inserted rows says
"0 rows were added"
and that's it - no errors or exceptions reported/thrown.