How to validate a column before importing into a database - C#

I am a complete newbie to SSIS.
I have a c#/sql server background.
I would like to know whether it is possible to validate data before it goes into the database. I am grabbing text from a pipe (|) delimited text file.
For example, if a certain data point is null, then change it to 0, or if a certain data point's length is 0, then change it to "nada".
I don't know if this is even possible with SSIS, but it would be most helpful if you could point me in the right direction.

Anything is possible with SSIS!
After your flat file data source, use a Derived Column transformation, deriving a new column with an expression something like the following. For the null-to-0 case (on a numeric column):
ISNULL(ColumnName) ? 0 : ColumnName
and for the empty-to-"nada" case (on a string column):
LEN(ColumnName) == 0 ? "nada" : ColumnName
Then use these new columns in place of the originals in your destination.
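If the rules get more involved than the expression language comfortably handles, a Script Component (configured as a transformation) lets you write the same checks in C#. A minimal sketch, assuming a hypothetical string input column named ColumnName, selected as ReadWrite in the script's input columns:

// SSIS generates Input0Buffer with one property per selected input column,
// plus a ColumnName_IsNull flag for nullable columns.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (Row.ColumnName_IsNull)
    {
        // Null -> default value, mirroring the Derived Column expression above.
        Row.ColumnName = "0";
    }
    else if (Row.ColumnName.Length == 0)
    {
        // Empty -> placeholder text.
        Row.ColumnName = "nada";
    }
}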
Hope it helps.

I don't know if you're dead set on using SSIS, but the basic method I've generally used to import text-file data into a database takes two stages:
Use BULK INSERT to load the file into a temporary staging table on the database server; each of the columns in this staging table is something reasonably tolerant of the data it contains, like varchar(max).
Write validation routines to update the data in the staging table and double-check that it's well-formed according to your needs, then convert the columns to their final formats and push the rows into the destination table.
I like this method mostly because BULK INSERT can be a bit cryptic about the errors it spits out; with a staging table, it's a lot easier to look through your dataset and fix errors on the fly than to root through a text file. A sketch of the whole round trip is below.
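In C#, the two stages might look like the following. This is only a sketch: the table names (dbo.Staging, dbo.Destination), column names (Amount, Name), and file path are all made up, and note that BULK INSERT resolves the path on the database server, not the client.

using System.Data.SqlClient;

using (SqlConnection conn = new SqlConnection("...your connection string..."))
{
    conn.Open();

    // Stage 1: bulk load everything into permissive varchar(max) columns
    // so no row is rejected up front.
    using (SqlCommand load = new SqlCommand(@"
        BULK INSERT dbo.Staging
        FROM 'C:\data\input.txt'
        WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n');", conn))
        load.ExecuteNonQuery();

    // Stage 2a: apply the validation rules from the question.
    using (SqlCommand clean = new SqlCommand(@"
        UPDATE dbo.Staging SET Amount = '0'    WHERE Amount IS NULL;
        UPDATE dbo.Staging SET Name   = 'nada' WHERE LEN(Name) = 0;", conn))
        clean.ExecuteNonQuery();

    // Stage 2b: convert to final types and push into the real table.
    using (SqlCommand move = new SqlCommand(@"
        INSERT INTO dbo.Destination (Amount, Name)
        SELECT CAST(Amount AS decimal(18, 2)), Name
        FROM dbo.Staging;", conn))
        move.ExecuteNonQuery();
}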

Related

Copy values from pivot table to sheet in specific cell

I have some data displayed in a pivot table. Each day I copy values from that pivot table and paste them into another table under the corresponding day; with so many operation numbers, this takes about an hour to complete daily. I'm trying to write a macro that will make this process faster.
This is the pivot table from which I copy the data,
and here is where I paste it (I use this table to create some graphs).
Both tables contain many different operation numbers; as you can imagine, this is very tedious.
Any help or ideas that can help me solve this are appreciated. I'm willing to try other alternatives like formulas, VBA (preferred), or even ClosedXML for C#.
I don't know if I'm explaining myself well, so here is my logic:
Look for the operation number.
Copy Sum of "Sx".
Look in the other sheet for the same operation number.
Paste the value under the corresponding day.
Look for the same operation number with the next "Sx".
Repeat until all "Sx" values are done.
End
Right now I'm just focusing on Operation1, since for Operation2 I use another sheet where I do the same thing (I will probably reuse the Operation1 code).
To clarify, I'm not asking for someone to just hand me the solution (although that is appreciated), but any guidance on this is helpful.
You do not need VBA to do this. Use GETPIVOTDATA.
Consider the replicated data below:
In the sample above, I copied some of the data from your screenshot, made a pivot table out of it, put it in the same sheet, and also put the table you need to populate below it. Then I used GETPIVOTDATA. I feel I need to show you how it is done, even though it is well explained in the link I posted.
So we use this formula:
=GETPIVOTDATA("Sum of "&$I18,$I$2,"Date",J$17,"Opt",$I$16,"Id",$J$16)
in cell J14. Take note that instead of using Day 1, Day 2, etc. on your destination table, we used the actual date. Why? Because we need it in our GETPIVOTDATA formula. Then copy the formula to the rest of the cells.
Result:
Now, GETPIVOTDATA errors out if it does not find anything that matches the criteria you supplied, so you might want to incorporate error handling using an IFERROR statement.
Final Formula:
=IFERROR(GETPIVOTDATA("Sum of "&$I18,$I$2,"Date",J$17,"Opt",$I$16,"Id",$J$16),0)
Although you prefer VBA, I don't think anything is more suitable than the built-in Excel functionality in your case.
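That said, since ClosedXML for C# was mentioned as an option: if this ever needs to run unattended, the copy loop could be scripted along these lines. This is only a sketch under made-up assumptions: sheet names "Pivot" and "Daily", operation numbers in column A of both sheets, the summed value in column B of the pivot sheet, and today's values going into one fixed day column.

using System.Linq;
using ClosedXML.Excel;

using (var wb = new XLWorkbook(@"C:\data\operations.xlsx"))
{
    var pivot = wb.Worksheet("Pivot");   // hypothetical sheet names
    var daily = wb.Worksheet("Daily");
    int dayColumn = 2;                   // e.g. column B = the current day

    foreach (var row in pivot.RangeUsed().Rows())
    {
        string operation = row.Cell(1).GetString();

        // Find the matching operation number on the destination sheet.
        var match = daily.Column(1)
                         .CellsUsed(c => c.GetString() == operation)
                         .FirstOrDefault();
        if (match != null)
            daily.Cell(match.Address.RowNumber, dayColumn).Value = row.Cell(2).Value;
    }
    wb.Save();
}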

SQLXML Bulk Load or manual iteration?

I am looking to insert a 20-25 MB XML file into a database on a daily basis. The issue is that each entry needs an extra column holding a calculated value. So I am wondering which would be more efficient: running through the XML file first to add the new column and then loading each item with the SQLXML Bulk Load tools, or doing the Bulk Load first and then going through the database adding the new column values.
Comments = answer:
There is no need to store this value separately. Since it's calculated from data already on each record, you can compute it on the fly instead of storing it as its own column. A mix of WHERE and/or HAVING clauses will allow filtering (searching) of results based on that calculated value.
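To make the on-the-fly idea concrete, here is a minimal sketch; the table (dbo.XmlImport), the columns (Price, Quantity), and the formula are all placeholders for whatever your calculation actually is:

using System;
using System.Data.SqlClient;

const string sql = @"
    SELECT ItemId, Price, Quantity,
           Price * Quantity AS ExtendedValue  -- computed at query time
    FROM dbo.XmlImport
    WHERE Price * Quantity > @minValue;";     // filter on the computed value

using (SqlConnection conn = new SqlConnection("...connection string..."))
using (SqlCommand cmd = new SqlCommand(sql, conn))
{
    cmd.Parameters.AddWithValue("@minValue", 100m);
    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
        while (reader.Read())
            Console.WriteLine("{0}: {1}", reader["ItemId"], reader["ExtendedValue"]);
}

If you'd rather have the value look like a real column without storing it, a non-persisted computed column (ALTER TABLE dbo.XmlImport ADD ExtendedValue AS (Price * Quantity)) gives the same effect.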

Updating millions of Row after Calculation

I am looking for advice on how I should do the following:
I have a table in SQL Server with about 3-6 million records and 51 columns.
Only one column needs to be updated, after a value is calculated from the data in 45 of the columns.
I already have the math done in C#, and I am able to create a DataTable out of it [with millions of records, yes].
Now I want to push the updates to the database in the most efficient manner. The options I know are:
Run an update query for every record, as I loop over the data reader to do the math and build the DataTable.
Create a temporary table, use SqlBulkCopy to copy the data, and then use a MERGE statement (sketched below).
Though it would be very hard to do, try to write a function within SQL to do all the math and just run one simple unconditional update to cover everything at once.
I am not sure which method is faster or better. Any ideas?
EDIT: why I am afraid of using a stored procedure:
First, I have no idea how to write one; I am pretty new to this. Though maybe it is time to learn.
My formula is: take one column and apply a formula to it, along with an additional constant value [which is also part of the column name]; then take all 45 columns and apply another formula.
The result will be stored in the 46th column.
Thanks.
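Option 2 is usually the fastest of the three at this volume, because millions of round trips collapse into one bulk copy plus one set-based statement. A minimal sketch, assuming a DataTable named dataTable with hypothetical columns Id (the key) and ComputedValue (the 46th column), updating a made-up dbo.BigTable:

using System.Data;
using System.Data.SqlClient;

using (SqlConnection conn = new SqlConnection("...connection string..."))
{
    conn.Open();

    // 1. Create the staging table on this connection; a local temp table
    //    (#Staging) is only visible to the session that created it.
    using (SqlCommand create = new SqlCommand(
        "CREATE TABLE #Staging (Id int PRIMARY KEY, ComputedValue float);", conn))
        create.ExecuteNonQuery();

    // 2. Bulk copy the in-memory results into the staging table.
    using (SqlBulkCopy bulk = new SqlBulkCopy(conn) { DestinationTableName = "#Staging" })
        bulk.WriteToServer(dataTable);

    // 3. One set-based MERGE (an UPDATE ... JOIN would work just as well,
    //    since every staged row should already exist in the target).
    using (SqlCommand merge = new SqlCommand(@"
        MERGE dbo.BigTable AS t
        USING #Staging AS s ON t.Id = s.Id
        WHEN MATCHED THEN UPDATE SET t.ComputedValue = s.ComputedValue;", conn))
        merge.ExecuteNonQuery();
}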
If you have a field that contains a calculation from other fields in the database, it is best to make it a computed column or to maintain it through a trigger, so that any time the data is changed from any source, the calculation is maintained.
You can create a .NET function which can be called directly from SQL; here is a link on how to create one: http://msdn.microsoft.com/en-us/library/w2kae45k%28v=vs.90%29.aspx. After you have created the function, run the simple update statement.
Can't you create a scalar-valued function in C# and call it as part of a computed column?
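A minimal sketch of such a SQLCLR scalar function; the two-parameter signature and the formula are placeholders (the real one would take your 45 inputs, or be applied per column):

using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public static class Calculations
{
    // Marked deterministic so SQL Server can use it in a persisted
    // computed column.
    [SqlFunction(IsDeterministic = true, IsPrecise = false)]
    public static SqlDouble Calc(SqlDouble col1, SqlDouble col2)
    {
        if (col1.IsNull || col2.IsNull)
            return SqlDouble.Null;

        // Placeholder formula - substitute the real 45-column calculation.
        return col1.Value * 2.0 + col2.Value;
    }
}

After deploying the assembly (CREATE ASSEMBLY, then CREATE FUNCTION ... EXTERNAL NAME), the update becomes one set-based statement, or the 46th column can be declared as a computed column over the function so it never goes stale.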

How to implement an Oracle -> Oracle conversion/refresher program in C# / ADO.NET 2.0

When the program runs the first time, it just gets some fields from a source database table, say:
SELECT NUMBER, COLOR, USETYPE, ROOFMATERIALCODE FROM HOUSE; -- NUMBER is the unique key
It does some in-memory processing, say converting USETYPE and ROOFMATERIALCODE to the destination database format (by using a cross-reference table).
Then the program inserts ALL THE ROWS into the destination database:
INSERT INTO BUILDING (BUILDINGID, BUILDINGNUMBER, COLOR, BUILDINGTYPE, ROOFMAT)
VALUES (PROGRAM_GENERATED_ID, NUMBER_FROM_HOUSE, COLOR_FROM_HOUSE,
CONVERTED_USETYPE_FROM_HOUSE, CONVERTED_ROOFMATERIALCODE_FROM_HOUSE);
The above is naturally not real SQL, but you get the idea (the values with underscores just describe the data inserted).
On subsequent runs the program should do the same, except:
insert only the rows not found in the target database;
update only the rows whose color, usetype, or roofmaterialcode has changed.
My question is:
How do I implement this efficiently?
-Do I first populate a DataSet and convert the fields to the destination format?
-If I use only one DataSet, how do I give the destination database its BUILDING_IDs (can I add columns to a populated DataSet)?
-How do I efficiently check whether destination rows need a refresh (if I select them one at a time by BUILDING_NUMBER and check all fields, it's going to be slow)?
Thanks for your answers!
-matti
If you are using Oracle, have you looked at the MERGE statement? You give MERGE a matching condition: if a record matches, it performs an UPDATE; if it doesn't match (it isn't already in the table), it performs an INSERT. That might be helpful for what you are trying to do.
Here is the spec/example of merge.
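For the tables in the question, that MERGE might look like the sketch below, run through ADO.NET 2.0. It assumes the converted HOUSE rows have already been staged in the destination database (in a made-up HOUSE_STAGE table, populated over a database link or by a bulk insert) and that a sequence BUILDING_SEQ generates the ids; every name here is an assumption.

using System.Data.OracleClient; // the ADO.NET 2.0 Oracle provider

const string mergeSql = @"
    MERGE INTO building b
    USING house_stage h
    ON (b.buildingnumber = h.housenumber)
    WHEN MATCHED THEN UPDATE SET
        b.color        = h.color,
        b.buildingtype = h.usetype_converted,
        b.roofmat      = h.roofmaterial_converted
    WHEN NOT MATCHED THEN INSERT
        (buildingid, buildingnumber, color, buildingtype, roofmat)
    VALUES
        (building_seq.NEXTVAL, h.housenumber, h.color,
         h.usetype_converted, h.roofmaterial_converted)";

using (OracleConnection conn = new OracleConnection("...connection string..."))
using (OracleCommand cmd = new OracleCommand(mergeSql, conn))
{
    conn.Open();
    cmd.ExecuteNonQuery();
}

If you only want to touch rows that actually changed, Oracle 10g and later also allow a WHERE clause on the WHEN MATCHED branch, which avoids rewriting unchanged rows.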

Knowing if a string will be truncated when updating the database

I'm working on a piece of software that takes a CSV file and puts the data into SQL Server. I'm testing it with bad data now, and when I make a data string (in a line) too long to be imported into the database, I get the error: "String or binary data would be truncated. The statement has been terminated." That's normal, and it's what I should expect. Now I want to detect those errors before the update to the database. Is there any clever way to detect this?
The way my software works is that I import every line into a DataSet, then show the user the data that will be imported. The user can then click a button to do the actual update; I then call dataAdapter.Update(dataSet, "something") to push the update to the database.
The problem is that the error row terminates the whole update and reports the error. So I want to detect the error before I send the update to the server, so that the other rows will be inserted.
thanks
You will have to check the columns of each row: see if one exceeds the maximum specified in the database and, if so, exclude it from being inserted.
A different solution would be to explicitly truncate the data and insert the truncated content, which could be done using Substring, for example:
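This is only a sketch; the 50-character limit, the table name, and the Description column are assumptions standing in for your real schema:

using System.Data;

const int maxLen = 50; // whatever the database column actually allows
foreach (DataRow row in dataSet.Tables["something"].Rows)
{
    string value = row["Description"] as string; // hypothetical column
    if (value != null && value.Length > maxLen)
        row["Description"] = value.Substring(0, maxLen);
}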
The only way that I know of is to pre-check the information schema for the character limit and validate against it before calling Update:
SELECT Column_Name, Character_Maximum_Length
FROM Information_Schema.Columns
WHERE Table_Name = 'YourTableName'
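A sketch of wiring that query up to the DataSet before calling dataAdapter.Update (the connection string and table names are assumptions; note Character_Maximum_Length is -1 for varchar(max)):

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// 1. Read the per-column limits once.
Dictionary<string, int> limits = new Dictionary<string, int>();
using (SqlConnection conn = new SqlConnection("...connection string..."))
using (SqlCommand cmd = new SqlCommand(@"
    SELECT Column_Name, Character_Maximum_Length
    FROM Information_Schema.Columns
    WHERE Table_Name = 'YourTableName'
      AND Character_Maximum_Length IS NOT NULL;", conn))
{
    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
        while (reader.Read())
            limits[reader.GetString(0)] = reader.GetInt32(1);
}

// 2. Flag offending rows so they can be fixed or skipped before Update.
foreach (DataRow row in dataSet.Tables["something"].Rows)
    foreach (DataColumn col in dataSet.Tables["something"].Columns)
    {
        int max;
        string s = row[col] as string;
        if (limits.TryGetValue(col.ColumnName, out max) && max > 0 &&
            s != null && s.Length > max)
            row.RowError = col.ColumnName + " exceeds " + max + " characters.";
    }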
What you need is the column metadata.
MSDN: SqlConnection.GetSchema Method
Or, if you have opened a recordset on your database, another solution would be to browse the field object and use its DefinedSize to truncate the string. For example, with an ADO recordset in VB, you could have code like this:
myRecordset.Fields(myField).Value = Left(myString, myRecordset.Fields(myField).DefinedSize)
