In SSIS, I have a Script Task that takes in 10 rows of data, containing 6 values
- Row.WorkOrderID
- Row.WorkOrderProductID
- Row.BatchNo
- Row.OldQty
- Row.NewQty
I need to extract all records that have the same Row.WorkOrderID and Row.WorkOrderProductID, but a different Row.NewQty compared to Row.OldQty.
I'm aware that SSIS Script Task ProcessInputRow method processes row by row. In my case, can this only be done in PostExecute method?
Consider using a Conditional Split to do the job, and merge again if needed. A Script Task should only be a last resort for things that can't be done with the transforms SSIS already has; because a script processes data row by row, it is slow for large data sets.
The condition to use in the Conditional Split is a simple per-row comparison of the old and new quantity columns.
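A minimal sketch of that expression, assuming the column names from the question:

```
OldQty != NewQty
```

Note that a Conditional Split evaluates one row at a time, so if you also need to match rows on WorkOrderID and WorkOrderProductID across the data set, you would add an Aggregate (or a self Merge Join) upstream of the split.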
I have an SSIS package that assembles a dynamic SQL statement and executes it on a different server, with the results needing to be written back to the first server.
Because the SQL is created and passed in as a variable, a Foreach loop is used to run each instance. The results are put into an Object variable, and this works fine. If I put my Script Task inside the Foreach loop itself, I can write the results back to the original server. However, for performance reasons, I would really like to get the insert out of the Foreach loop and read the result set / object variable so I can open one connection and write all the data in one go. But when I pull the object that reads the results and writes them to the database out of the loop, it only writes the last row of data, not all of them.
How can I get to all the rows in the result set outside of the Foreach loop? Is there a pointer to the first row or something? I can't imagine I'm the first person to need to do this, but my search for answers has come up empty. Or maybe I'm just under-caffeinated.
Well, it can be simplified if some conditions are met. Generally, SSIS is metadata-centric, i.e. built around a fixed set of columns and their types. So, if every SQL query you run returns the same set of columns (the same column names and data types), you can try the following approach:
In the ForEach loop, run the SQL commands and store their results in an Object variable.
Then create a Data Flow Task with a Script Component source that fetches its rows from the Object variable from step 1. If needed, you can add other data, such as the SQL query text. The resulting rows can then be written to a table by a regular DFT destination.
How to use an Object variable as a data source - there are already good answers to this question.
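As an illustrative sketch of that Script Component source (the variable name User::ResultSet and the output column are assumptions to be adapted to your package):

```csharp
using System.Data;
using System.Data.OleDb;

public override void CreateNewOutputRows()
{
    // Shred the ADO recordset stored in the Object variable into a DataTable.
    var table = new DataTable();
    var adapter = new OleDbDataAdapter();
    adapter.Fill(table, Variables.ResultSet);

    // Push each row into the component's output; the output columns must
    // have been declared on the Script Component beforehand.
    foreach (DataRow row in table.Rows)
    {
        Output0Buffer.AddRow();
        Output0Buffer.WorkOrderID = (int)row["WorkOrderID"];
        // ... assign the remaining declared output columns the same way
    }
}
```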
I'm new to SSIS; your ideas or solutions are greatly appreciated.
I have a flat file where the first row contains the file details (it is not a header row). The second row onwards is the actual data.
Data description
First-row format= Supplier_name, Date, number of records in the file
eg:
Supplier_name^06022017^3
ID1^Member1^NEW YORK^050117^50.00^GENERAL^ANC
ID2^Member2^FLORIDA^050517^50.00^MOBILE^ANC
ID3^Member3^SEATTLE^050517^80.00^MOBILE^ANC
EOF
Problem
Using SSIS, I want to split the first row into output 1 and the second row onwards into output 2.
I thought I could do this with the help of a Conditional Split, but I'm not sure what condition to give in order to split the rows. Should I try a Multicast instead?
Thanks
I would handle this by using a Script Task (BEFORE the data flow) to read the first row and do whatever you want with it.
Then, in the Data Flow Task, I would set the flat file source to skip the first row and import from the second row on as data.
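A hedged sketch of that pre-dataflow Script Task; the variable names (User::FilePath, User::SupplierName, User::FileDate, User::RecordCount) are placeholders to rename for your package:

```csharp
using System;
using System.IO;
using System.Linq;

public void Main()
{
    string path = Dts.Variables["User::FilePath"].Value.ToString();

    // Read only the first line - no need to load the whole file.
    string firstLine = File.ReadLines(path).First();

    // First-row format: Supplier_name^Date^RecordCount
    string[] parts = firstLine.Split('^');
    Dts.Variables["User::SupplierName"].Value = parts[0];
    Dts.Variables["User::FileDate"].Value = parts[1];
    Dts.Variables["User::RecordCount"].Value = int.Parse(parts[2]);

    Dts.TaskResult = (int)ScriptResults.Success;
}
```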
Thank you all. Here is an alternative solution.
I used a Script Component in SSIS to do this.
Step 1: Create a variable called RowNumber.
Step 2: Add a Script Component that adds an additional column and increments the row number.
SSIS Script component
private int m_rowNumber;

public override void PreExecute()
{
    base.PreExecute();
    m_rowNumber = 0;
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    m_rowNumber++;
    Row.RowNumber = m_rowNumber;
}
Step 3: Use the output of the Script Component as the input of a Conditional Split and create a condition RowNumber == 1.
The Conditional Split will then route the header row and the data rows to separate outputs.
I would first make sure that you have the correct number of columns in your Flat File Connection:
Edit the Flat File Connection -> Advanced tab and press the New button to add columns. In your example you should have 7: Column 0 to Column 6.
Now add a Conditional Split and add two case statements:
Output Name Condition
HeaderRow [Column 0] == "Supplier_name"
DetailRow [Column 0] != "Supplier_name"
Now route these to Output 1 and Output 2.
Expanding on Tab Allerman's answer.
For our project, we used a PowerShell script inside an Execute Process Task, which runs a simple PowerShell command to grab the first line of the file.
See this MSDN blog on how to run a PowerShell script.
PowerShell script to get the first line:
Get-Content C:\foo\yourfolderpath\yourfilename.txt -First 1
This only helps directly in a case like yours, but more generally it helps avoid processing large files (GBs and upwards) that have an incorrect header. This simple PowerShell command executes in milliseconds, as opposed to most processes/scripts, which need to load the full file into memory and slow things down.
I'm new to VS and SSIS. I'm trying to consolidate data in Visual Studio using the Script Component Transformation Editor. The data currently looks like this:
date type seconds
2016/07/07 personal 400
2016/07/07 business 300
2016/07/07 business 600
and transform it to look like this:
date type avgSeconds totalRows
2016/07/07 personal 400 1
2016/07/07 business 450 2
Basically, counting the rows and averaging the seconds per type and date.
I tried both the VB.NET and C# options and am open to either (I'm new to both). Does anyone have ideas on how I can do this?
I'm using VS 2012. I'm thinking I need to create a temp or buffer table to keep count of everything as I go through Input0_ProcessInputRow and then write it to the output at the end; I just can't figure out how to do this. Any help is much appreciated!
Based on your comments this might work.
Set up a data flow task that will do a merge join to combine the data
https://www.simple-talk.com/sql/ssis/ssis-basics-using-the-merge-join-transformation/
Send the data to a staging table. This can be created automatically by the oledb destination if you click 'New' next to table drop down. If you plan to rerun this package then you will need to add an execute sql task before the data flow task to delete from or truncate this staging table.
Create an execute sql task with your aggregation query. Depending on your situation an insert could work. Something like:
Insert ProductionTable
Select date, type, AVG(Seconds) avgSeconds, Count(*) totalRows
From StagingTable
Group By date, type
You can also use the above query, minus the insert production table, as a source in a data flow task in case you need to apply more transforms after the aggregation.
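If you do want the aggregation inside the Script Component itself, as the question asked, one way is an asynchronous Script Component that accumulates per (date, type) group and emits the summary rows at end of input. This is only a sketch: the output columns (Date, Type, AvgSeconds, TotalRows) must be declared on the component, and the date/type columns are assumed to be strings here:

```csharp
using System.Collections.Generic;

private class Acc { public int Rows; public long Seconds; }
private readonly Dictionary<string, Acc> groups = new Dictionary<string, Acc>();

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Accumulate count and sum of seconds per (date, type) key.
    string key = Row.Date + "|" + Row.Type;
    Acc acc;
    if (!groups.TryGetValue(key, out acc))
        groups[key] = acc = new Acc();
    acc.Rows++;
    acc.Seconds += Row.Seconds;
}

public override void Input0_ProcessInput(Input0Buffer Buffer)
{
    base.Input0_ProcessInput(Buffer);   // drives ProcessInputRow per row
    if (Buffer.EndOfRowset())
    {
        // All input seen - emit one summary row per group.
        foreach (var kv in groups)
        {
            string[] parts = kv.Key.Split('|');
            Output0Buffer.AddRow();
            Output0Buffer.Date = parts[0];
            Output0Buffer.Type = parts[1];
            Output0Buffer.AvgSeconds = (int)(kv.Value.Seconds / kv.Value.Rows);
            Output0Buffer.TotalRows = kv.Value.Rows;
        }
        Output0Buffer.SetEndOfRowset();
    }
}
```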
I have an SSIS package which can:
1) read multiple input files
2) store the data from the files to a DB
3) archive the input files
I have to write functional tests using SpecFlow.
One of my test cases is:
Check that the row count in the DB table equals the sum of the line counts of all the input files read.
I am not sure how I can achieve this. Can anyone help me with:
how to get the sum of the lines in each file?
To check the row count in the table in dB
Add a variable to your SSIS package, named something like iRowCount.
In the Data Flow Task where the DB is the source, add a Row Count transformation and assign its value to the variable iRowCount.
to be equal the summation of all lines in each input file read.
Same concept: add another two variables, named something like iFilesRowCount and iFilesRowCountTotal.
Then in each data pump you'll have to pull off a Row Count transformation, assigning the value to iFilesRowCount. Then outside of each data pump add a Script Task that does iFilesRowCountTotal = iFilesRowCountTotal + iFilesRowCount.
Then somewhere towards the bottom add a Script Task to perform the comparison between iRowCount (DB) and iFilesRowCountTotal, and on the arrows exiting that task create precedence constraints to pull off a 'True' path and a 'False' path.
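A sketch of the running-total Script Task described above (iFilesRowCountTotal must be listed in the task's ReadWriteVariables, iFilesRowCount in its ReadOnlyVariables):

```csharp
public void Main()
{
    // Add this file's row count to the running total.
    int fileRows = (int)Dts.Variables["User::iFilesRowCount"].Value;
    int total = (int)Dts.Variables["User::iFilesRowCountTotal"].Value;
    Dts.Variables["User::iFilesRowCountTotal"].Value = total + fileRows;
    Dts.TaskResult = (int)ScriptResults.Success;
}
```

The True/False precedence constraints can then use an expression such as @[User::iRowCount] == @[User::iFilesRowCountTotal].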
I'm developing an ASP.NET app that analyzes Excel files uploaded by user. The files contain various data about customers (one row = one customer), the key field is CustomerCode. Basically the data comes in form of DataTable object.
At some point I need to get information about the specified customers from SQL and compare it to what user uploaded. I'm doing it the following way:
Make a comma-separated list of the customer codes from the CustomerCode column: 'Customer1','Customer2',...,'CustomerN'.
Pass this string into the query's IN (...) clause and execute it.
This was working okay until I ran into a "The query processor ran out of internal resources and could not produce a query plan" exception when trying to pass ~40,000 items inside the IN (...) clause.
The trivial workaround seems to be:
Replace IN (...) with = 'SomeCustomerCode' in the query template.
Execute this query 40,000 times, once for each CustomerCode.
Do DataTable.Merge 40,000 times.
Is there any better way to work this problem around?
Note: I can't do IN (SELECT CustomerCode FROM ... WHERE SomeConditions) because the data comes from Excel files and thus cannot be queried from DB.
"Table-valued parameters" would be worth investigating; they let you pass in multiple rows (usually via a DataTable on the C# side). The downside is that you need to formally declare and name the data shape on the SQL Server side first.
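A hedged sketch of the table-valued-parameter route; the type, table, and variable names here are illustrative, and the table type must be created once on the server:

```csharp
// One-time setup on SQL Server (illustrative name):
//   CREATE TYPE dbo.CustomerCodeList AS TABLE (CustomerCode nvarchar(50));
using System.Data;
using System.Data.SqlClient;

DataTable codes = new DataTable();
codes.Columns.Add("CustomerCode", typeof(string));
foreach (DataRow row in uploadedData.Rows)      // uploadedData = your Excel DataTable
    codes.Rows.Add(row["CustomerCode"]);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT c.* FROM dbo.Customers c JOIN @codes t ON t.CustomerCode = c.CustomerCode",
    conn))
{
    SqlParameter p = cmd.Parameters.AddWithValue("@codes", codes);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.CustomerCodeList";

    conn.Open();
    var fromDb = new DataTable();
    new SqlDataAdapter(cmd).Fill(fromDb);       // DB-side data to compare against
}
```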
Alternatively, though: you could use SqlBulkCopy to throw the rows into a staging table, and then just JOIN to that table. If you have parallel callers, you will need some kind of session identifier on the row to distinguish between concurrent uses (and: don't forget to remove your session's data afterwards).
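A minimal sketch of the SqlBulkCopy route, with an assumed staging table dbo.CustomerCodeStaging(CustomerCode, SessionId):

```csharp
using System.Data.SqlClient;

// codes: a DataTable with CustomerCode and SessionId columns.
using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.CustomerCodeStaging";
    bulk.WriteToServer(codes);
}

// Then join against this session's staging rows only, e.g.:
//   SELECT c.* FROM dbo.Customers c
//   JOIN dbo.CustomerCodeStaging s ON s.CustomerCode = c.CustomerCode
//   WHERE s.SessionId = @session;
// and afterwards:
//   DELETE FROM dbo.CustomerCodeStaging WHERE SessionId = @session;
```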
You shouldn't process too many records at once, both because of errors like the one you mentioned and because such a big batch takes too long to run and leaves nothing to do in parallel. You shouldn't process only one record at a time either, because then the overhead of the SQL Server round-trips becomes too big. Choose something in the middle and process, e.g., 10,000 records at a time. You can even parallelize the work: start running the SQL for the next 10,000 in the background while you are processing the previous batch of 10,000.
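A sketch of that batching idea (RunQuery stands in for whatever data-access helper you already have; allCodes is the list of customer codes from the Excel file):

```csharp
using System.Data;
using System.Linq;

const int BatchSize = 10000;
var merged = new DataTable();

for (int i = 0; i < allCodes.Count; i += BatchSize)
{
    var batch = allCodes.Skip(i).Take(BatchSize);

    // Escape quotes and build the IN (...) list for this batch only.
    string inList = string.Join(",",
        batch.Select(c => "'" + c.Replace("'", "''") + "'"));

    DataTable part = RunQuery(
        "SELECT * FROM Customers WHERE CustomerCode IN (" + inList + ")");
    merged.Merge(part);
}
```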