I get a monthly submission of files from various companies that get loaded into SQL Server via an SSIS job. When the SSIS job runs I want to get a list of companies that did not submit a file. The files will have the date appended to the end so I'm assuming it will need to do some sort of wild card search through a list of names. If I'm expecting:
AlphaCO_File_yyyymmdd
BetaCO_File_yyyymmdd
DeltaCO_File_yyyymmdd
ZetaCO_File_yyyymmdd
and the file from ZetaCO is missing, I want to write ZetaCO to a table, or save it in a variable I can use in an email task.
I am using Visual Studio 2019 and SQL Server 2019. I have the Task Factory add-on for SSIS.
Note: this answer uses pseudocode that needs to be tuned for your specific values.
My guess is that you already have a foreach loop set up where you are reading the file name to a parameter.
What you need is a table in SQL Server to compare against.
CompanyName, FileSubmitted (bit)
AlphaCO
BetaCO
DeltaCO
ZetaCO
The first step is a SQL command: UPDATE table SET FileSubmitted = 0.
Then, inside the foreach loop, have one path that updates the table based on the file name:
Use a token or use a C# script task. In C# it would be company = fileName.Split('_')[0];
And then update the table:
UPDATE table
SET FileSubmitted = 1
WHERE CompanyName = company
Now you can use that table for emails.
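Put together, a minimal sketch of the three SQL steps, assuming the tracking table is named dbo.CompanySubmissions (adjust the names to your schema):
-- Execute SQL task before the loop: reset the flags
UPDATE dbo.CompanySubmissions SET FileSubmitted = 0;
-- Execute SQL task inside the loop: the ? parameter is the company name parsed from the file name
UPDATE dbo.CompanySubmissions SET FileSubmitted = 1 WHERE CompanyName = ?;
-- After the loop: companies that did not submit a file, ready for the email task
SELECT CompanyName FROM dbo.CompanySubmissions WHERE FileSubmitted = 0;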
I need to bring a number of .CSV files into uniquely keyed SQL tables (table names and column names match from source to target). I started looking at libs like Cinchoo-ETL, but I need to do an "upsert", meaning update if the record is present, insert if it's not. I'm not sure if Cinchoo-ETL or some other lib has this feature built in.
For example, let's say the SQL Server Customer table has some records in it; Cust# is a primary key:
Cust# Name
1 Bob
2 Jack
The CSV file looks something like this:
Cust#,Name
2,Jill
3,Roger
When the ETL program runs it needs to update Cust# 2 from Jack to Jill and insert a new cust# 3 record for Roger.
Speed and reusability are important, as there will be 80 or so different tables and some of the tables can have several million records in them.
Any ideas for a fast easy way to do this? Keep in mind I'm not a daily developer so examples would be awesome.
Thanks!
You are describing something that can be done with a tool I developed. It's called Fuzible (www.fuzible-app.com): in Synchronization mode, it allows you to choose the behavior of the target table (allow INSERT, UPDATE, DELETE); your source can be any CSV file and your target any database.
You can contact me from the website if you need a how-to.
The software is free :)
What you have to do is to create a Job with your CSV path as a Source connection, then your Database as the Target connection.
Choose the "Synchronization" mode, which, by opposition with the "Replication" mode will compare Source and Target data.
Then, you can write as many queries as you want (one for each CSV file) like this :
MyOutputTable:SELECT * FROM MyCSVFile.CSV
No need to write more complex queries if the CSV file and the database table share the same schema (same columns).
The software should be able to do the rest :) It updates rows that need to be updated and creates new rows if required.
This is what I did in a recent SSIS job. I load the data into a temp table and then use a regular SQL query to perform the comparison. This may be cumbersome on tables with lots of fields.
-- SEE DIFFERENCES FOR YOUR AMUSEMENT
SELECT *
FROM Accounts a
INNER JOIN
DI_Accounts da
ON a.CustomerNumber = da.CustomerNumber AND (
a.FirstName <> da.FirstName
)
-- UPDATE BASED ON DIFFERENCES
UPDATE a
SET
a.FirstName = da.FirstName
FROM Accounts a
INNER JOIN
DI_Accounts da
ON a.CustomerNumber = da.CustomerNumber AND (
a.FirstName <> da.FirstName
)
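Note that the query above covers only the update half of the upsert. A sketch of the insert half under the same assumptions (CustomerNumber is the key and DI_Accounts is the staging table) might look like this:
-- INSERT ROWS PRESENT IN STAGING BUT MISSING FROM THE TARGET
INSERT INTO Accounts (CustomerNumber, FirstName)
SELECT da.CustomerNumber, da.FirstName
FROM DI_Accounts da
LEFT JOIN Accounts a
ON a.CustomerNumber = da.CustomerNumber
WHERE a.CustomerNumber IS NULL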
I would recommend that you take a look at the nuget package ETLBox and the necessary extension packages for Csv & Sql Server (ETLBox.Csv + ETLBox.SqlServer).
This would allow you to write a code like this:
//Create the components
CsvSource source = new CsvSource("file1.csv");
SqlConnectionManager conn = new SqlConnectionManager("..connection_string_here..");
DbMerge dest = new DbMerge(conn, "DestinationTableName");
dest.MergeProperties.IdColumns.Add(new IdColumn() { IdPropertyName = "Cust#" });
dest.MergeMode = MergeMode.Full; //To create the deletes
dest.CacheMode = CacheMode.Partial; //Enable for bigger data sets
//Linking
source.LinkTo(dest);
//Execute the data flow
Network.Execute(source);
This code snippet would do the corresponding inserts, updates & deletes in the database table for one file. Make sure that the header names match the column names in your database table exactly (case-sensitive). For bigger data sets you need to enable the partial cache to avoid having all data loaded into memory.
It will use dynamic objects under the hood (ExpandoObject). You can find more information about the merge and the tool on the website (www.etlbox.net)
The only downside is that ETLBox is not open source. But the package allows you to work with data sets of up to 10,000 rows to check if it suits your needs.
I need to develop an SSIS package, and the need is to check two things before the data gets inserted.
There is a text file containing a date and a text value (anydate,text), and there is a table in SQL Server that will hold the same pair of data (only one row, which I might update every time; I will insert data there for the first load so that it can be compared against the date and text coming from the text file).
My question is: how can I compare the data coming from the text file with the data coming from SQL Server and do my transformation on the basis of true or false (if the date matches, do something; if it doesn't match, update the present date in SQL with this new date and do something else)?
You are looking to perform an upsert operation. To do that, you first need a Lookup transformation to check whether the row exists; if the lookup matches (the row exists), use an OLE DB Command to update the row, else use an OLE DB Destination to insert it. You can refer to the following link for a step-by-step guide:
SSIS: Perform upsert (Update/Insert) using SSIS Package
Note that the Lookup transformation is case-sensitive.
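As a rough sketch, the parameterized statement behind the OLE DB Command on the matched path could look like the following (the table and column names are assumptions; the ? placeholders are mapped to the date and text columns coming from the file in the OLE DB Command editor):
UPDATE dbo.ControlValues
SET ControlDate = ?, ControlText = ?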
Another method is to use Merge Join and conditional split:
SSIS insert and update rows in a table based on the contents of a Excel file
I'm new to VS and SSIS. I'm trying to consolidate data that I have in Visual Studio using the Script Component Transformation Editor; it currently looks like this:
date type seconds
2016/07/07 personal 400
2016/07/07 business 300
2016/07/07 business 600
and transform it to look like this:
date type avgSeconds totalRows
2016/07/07 personal 400 1
2016/07/07 business 450 2
Basically, counting and taking the average for type and date.
I tried both the VB.NET and C# options and am open to either (new to both). Does anyone have any ideas on how I can do this?
I'm using VS 2012. I'm thinking I need to create a temp or buffer table to keep count of everything as I go through Input0_ProcessInputRow and then write it to the output at the end; I just can't figure out how to do this. Any help is much appreciated!!
Based on your comments this might work.
Set up a data flow task that will do a merge join to combine the data
https://www.simple-talk.com/sql/ssis/ssis-basics-using-the-merge-join-transformation/
Send the data to a staging table. This can be created automatically by the OLE DB destination if you click 'New' next to the table drop-down. If you plan to rerun this package, you will need to add an Execute SQL task before the data flow task to delete from or truncate this staging table.
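That cleanup task can be as simple as the following (the staging table name matches the query further down and is otherwise an assumption):
TRUNCATE TABLE StagingTable;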
Create an Execute SQL task with your aggregation query. Depending on your situation, an insert could work, something like:
Insert Into ProductionTable
Select date, type, AVG(Seconds) avgSeconds, Count(*) totalRows
From StagingTable
Group By date, type
You can also use the above query, minus the Insert Into ProductionTable line, as a source in a data flow task in case you need to apply more transforms after the aggregation.
CONTEXT: I am writing an application in Visual C# / .Net, where the user can:
Create a new document-file "Untitled.abc"
Add some data/content, e.g., images, text, etc.
Save the document-file as "MyDoc7.abc", where abc would be my app-specific extension.
QUESTION: Is the following method good to implement the above, or is there a simpler/cleaner approach?
I start with a SQL-Server database called MyDb.
Each time a user creates a new document-file, I programmatically add a new DB table to MyDb
The newly created DB table stores the data/content from just that particular corresponding document-file created by the user.
When the user re-opens the saved document-file in the future, I programmatically read the data/content from the attached corresponding DB table.
UPDATE:
Solved. I will be using this Alternative approach:
I start with a SQL-Server database called MyDb with a table called MyTable, including a column called FileId.
Each time a user creates a new document-file, I assign the file a unique id, e.g., FileId = 5.
When the user adds a piece of data/content to the file, I store the piece of data to MyTable, but also store FileId = 5 for that piece of data.
In the future, when the user re-opens the file, I fetch all pieces of data from MyTable where FileId = 5.
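A minimal sketch of that design (everything beyond MyDb, MyTable and FileId is an assumed name):
CREATE TABLE MyTable (
    ContentId INT IDENTITY(1,1) PRIMARY KEY,
    FileId    INT NOT NULL,
    Content   VARBINARY(MAX) NULL  -- or NVARCHAR(MAX) for text pieces
);
-- Re-opening the document that was assigned FileId = 5
SELECT ContentId, Content FROM MyTable WHERE FileId = 5;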
That sounds like a particularly bad idea.
You don't want to create tables on the fly
If all the data is in the database, what is the file for?
I would rather choose the approach that Office uses:
The file really is a zip archive which contains:
A folder for the assets (images and the like)
The actual file with the content and relative links to the assets.
I have a text file that contains a list of files to load into database.
The list contains two columns:
FilePath,Type
c:\f1.txt,A
c:\f2.txt,B
c:\f3.txt,B
I want to provide this file as the source to SSIS. I then want it to go through it line by line. For each line, I want it to read the file in the FilePath column and check the Type.
If type is A then I want it to ignore the first 4 lines of the file that is located at the FilePath column of the current line and then load rest of the data inside that file in a table.
If type is B then I want it to open the file and copy first column of the file into table 1 and second column into table 2 for all of the lines.
I would really appreciate if someone can please provide me a high level list of steps I need to follow.
Any help is appreciated.
Here is one way of doing it within SSIS. The steps below are with respect to SSIS 2008 R2.
Create an SSIS package and create three package variables, namely FileName, FilesToRead and Type. The FilesToRead variable will hold the list of files and their type information. We will have a loop that will go through each of those records and store the information in the FileName and Type variables every time it loops through.
On the control flow tab, place a Data Flow task followed by a ForEach Loop container. The data flow task would read the file containing the list of files that have to be processed. The loop would then go through each file. Your control flow tab would finally look something like this. For now, there will be errors because nothing is configured. We will get to that shortly.
On the connection manager section, you need four connections.
First, you need an OLE DB connection to connect to the database. Name this as SQLServer.
Second, a flat file connection manager to read the file that contains the list of files and types. This flat file connection manager will have two columns configured, namely FileName and Type. Name this as Files.
Third, another flat file connection manager to read all files of type A. Name this as Type_A. In this flat file connection manager, enter the value 4 in the text box Header rows to skip so that the first four rows are always skipped.
Fourth, one more flat file connection manager to read all files of type B. Name this as Type_B.
Let's get back to the control flow. Double-click on the first data flow task. Inside the data flow task, place a flat file source that reads the list of files using the connection manager Files, and then place a Recordset Destination. Configure the variable FilesToRead in the recordset destination. Your first data flow task would look as shown below.
Now, let's go back to the control flow tab again. Configure the ForEach loop as shown below. This loop will go through the recordset stored in the variable FilesToRead. Since the recordset contains two columns, each time a record is looped through, the variables FileName and Type will be assigned the values of the current record.
Inside the ForEach loop container, there are two data flow tasks, namely Type A files and Type B files. You can configure each of these data flow tasks according to your requirements to read the files from the connection managers. However, we need to disable the tasks based on the type of file that is being read.
Type A files data flow task should be enabled only when A type files are being processed.
Similarly, Type B files data flow task should be enabled only when B type files are being processed.
To achieve this, click on the Type A files data flow task and press F4 to bring up the properties. Click on the ellipsis button available on the Expressions property.
On the Property Expressions Editor, select the Disable property and enter the expression !(@[User::Type] == "A")
Similarly, click on the Type B files data flow task and press F4 to bring up the properties. Click on the ellipsis button available on the Expressions property.
On the Property Expressions Editor, select the Disable property and enter the expression !(@[User::Type] == "B")
Here is a sample Files.txt containing only A type files in the list. When the package is executed to read this file, you will notice that only the Type A files data flow task executes.
Here is another sample Files.txt containing only B type files in the list. When the package is executed to read this file, you will notice that only the Type B files data flow task executes.
If Files.txt contains both A and B type files, the loop will execute the appropriate data flow task based on the type of file that is being processed.
Configuring Data Flow task Type A files
Let's assume that your flat files of type A have a three-column layout like the one shown below, with comma-separated values. The file data here is shown using Notepad++ with all special characters visible. CR LF denotes that the lines end with a carriage return and line feed. This file is stored in the path C:\f1.txt
We need a table in the database to import the data. Let's create a table named dbo.Table_A in the SQL Server database as shown here.
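As a sketch, such a table could be defined like this (the column names and types are assumptions):
CREATE TABLE dbo.Table_A
(
    Column1 NVARCHAR(50) NULL,
    Column2 NVARCHAR(50) NULL,
    Column3 NVARCHAR(50) NULL
);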
Now, go to the SSIS package. Here are the details to configure the flat file connection manager named Type_A. Give a name to the connection manager. You need to specify the value 4 in the Header rows to skip textbox. Your flat file connection manager should look something like this.
On the Advanced tab, you can rename the column names if you would like to.
Now that the connection manager is configured, we need to configure data flow task Type A files to process the corresponding files. Double-click on the data flow task Type A files. Place a Flat file source and OLE DB Destination inside the task.
The flat file source has to be configured to read the files from flat file connection manager.
The data flow task doesn't do anything special. It simply reads the flat files of type A and inserts the data into the table dbo.Table_A. Now, we need to configure the OLE DB Destination to insert the data into the database. The column names configured in the flat file connection manager and the table are not the same, so they have to be mapped manually.
Now that the data flow task is configured, we have to make sure that the file path being read from Files.txt is passed correctly. To do this, click on the Type_A flat file connection manager and press F4 to bring up the properties. Set the DelayValidation property to True. Click on the ellipsis button on the Expressions property.
On the Property Expression Builder, select the ConnectionString property and set it to the expression @[User::FileName]
Here is a sample Files.txt file containing Type A files only.
Here are the sample type A files f01.txt and f02.txt
After the package execution, following data will be found in the table Table_A
The above-mentioned configuration steps have to be followed for Type B files. However, the data flow task would look slightly different since the file processing logic is different. The data flow task Type B files would look something like this. Since you have to insert the two columns in type B files into different tables, you have to use a Multicast transformation, which creates clones of the input data. You can use each of the multicast outputs to feed a different transformation or destination.
Hope that helps you to achieve your task.
I would recommend that you create an SSIS package for each different type of file load you're going to do. You can execute those packages from another program; see here: How to execute an SSIS package from .NET?
Given this information, you can write a quick program to execute the relevant packages:
// Requires: using System.IO; and using System.Linq;
// Read the control file, skip the header row, and split each line into path and type.
var jobs = File.ReadLines("C:\\temp\\order.txt")
    .Skip(1)
    .Select(line => line.Split(','))
    .Select(tokens => new { File = tokens[0], Category = tokens[1] });

foreach (var job in jobs)
{
    // execute the relevant package for job.Category using job.File
}
My solution would look like N + 1 flat file connection managers to handle the source files. CM A would address the skip-the-first-4-rows file format, B sounds like it's just a 2-column file, etc. The last CM would be used to parse the command file you've illustrated.
Now that you have all of those Connection Managers defined, you can go about the processing logic.
Create 3 variables: 2 of type String (CurrentPath, CurrentType) and 1 of type Object, which I called Recordset.
The first data flow reads all the rows from the flat file source using "CM Control" and stores them in the Recordset variable via a Recordset Destination. This is the data you supplied in your example.
We will then use that Recordset object as the source for a ForEach Loop Container in what is commonly referred to as shredding. Bingle the term "shred recordset ssis" and you're bound to hit a number of articles describing how to do it. The net result is that for each row in that source CM Control file, you will assign those values into the CurrentPath and CurrentType variables.
Inside that loop container, create a central point for control to radiate out from. I find a Script Task works wonderfully for this. Drag it onto the canvas, give it a strong name to indicate it's not used for anything, and then create a data flow to handle each processing permutation.
The magic comes from using Expressions. Dang near everything in SSIS can have expressions set on its properties, which is what separates the professionals from the poseurs. Here, we will double-click on the line connecting to a given data flow and change the constraint type from "Constraint" to "Expression and Constraint". The expression you would then use is something like @[User::CurrentType] == "A". This will ensure that path is only taken when both the parent task succeeded and the condition is true.
The second bit of expression magic will be applied to the connection managers themselves. They will need to have their ConnectionString property driven by the value of the @[User::CurrentPath] variable. This will allow a design-time value of C:\filea.txt but would allow a runtime value, from the control file, of \\network\share\ClientFileA.txt. Unless all the files have the same structure, you'll most likely need to set DelayValidation to True in the properties. Otherwise, SSIS will fail pre-validation, as all the "CM A" to "CM N" connection managers would be using that CurrentPath variable, which may or may not be a valid connection string for that file layout.