I have a comma-delimited CSV file being loaded into a String value, generated as part of an XSLT transformation in C# (a CLR assembly console application).
I need to get this C# app to output the CSV data into a database table, one line at a time, and as I am rather inexperienced with C#, I have no idea how best to achieve this!
In SQL I managed it with the following statement:
INSERT INTO CsvData
    (ID, sFilename, iLineCount, sData, dDate)
SELECT @ID, @Filename, id, val, CAST(GETDATE() AS smalldatetime)
FROM dbo.split(@CSVFile, char(13))
The dbo.split function takes @CSVFile and its delimiter (char(13) in this case) and returns a table with one row per line in the CSV file (id = identity, val = line data), which is then used to populate the CsvData table.
I cannot pass @CSVFile as a parameter to a SQL stored proc because it can get VERY large, so I want to keep it all inside the C# code.
How would I best achieve this in .NET?
You can use the SqlBulkCopy class from the .NET SqlClient provider. Check this MSDN article for its usage. It provides good performance for bulk inserts. However, you must first read your data from the CSV into a structured format, since SqlBulkCopy requires a DataTable or an IDataReader to work with. You have two options:
Load your entire CSV file into a DataTable object, which may not be the best solution if your CSV file is very big.
Create a CsvDataReader as a read-only, forward-only cursor over your CSV files. You can find some implementations on the web.
Since you're saying that your CSV is being loaded into a String value, it wouldn't be a problem to fill a DataTable with the data. Use this DataTable as the argument to the SqlBulkCopy.WriteToServer method.
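A minimal sketch of that approach, assuming the CsvData table and column names from the question and a placeholder connection string (adjust both to your environment):

using System;
using System.Data;
using System.Data.SqlClient;

static void BulkInsertCsvString(string csvFile, int id, string filename, string connectionString)
{
    // Build a DataTable whose columns mirror the CsvData table in the question.
    DataTable table = new DataTable();
    table.Columns.Add("ID", typeof(int));
    table.Columns.Add("sFilename", typeof(string));
    table.Columns.Add("iLineCount", typeof(int));
    table.Columns.Add("sData", typeof(string));
    table.Columns.Add("dDate", typeof(DateTime));

    // One row per line, roughly what dbo.split(@CSVFile, char(13)) produced.
    string[] lines = csvFile.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < lines.Length; i++)
    {
        table.Rows.Add(id, filename, i + 1, lines[i], DateTime.Now);
    }

    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
    {
        connection.Open();
        bulkCopy.DestinationTableName = "CsvData";
        // Map by name so the column order in the destination doesn't matter.
        foreach (DataColumn col in table.Columns)
            bulkCopy.ColumnMappings.Add(col.ColumnName, col.ColumnName);
        bulkCopy.WriteToServer(table);
    }
}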
Try LINQ to CSV.
http://blogs.msdn.com/b/ericwhite/archive/2008/09/30/linq-to-text-and-linq-to-csv.aspx
This approach would mean iterating row by row rather than doing a bulk insert, though.
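For reference, a rough sketch of that iterating route without the library, using plain LINQ over the lines and one parameterized INSERT per line (the CsvData column names are taken from the question; the connection string and varchar sizes are placeholders):

using System;
using System.Data;
using System.Data.SqlClient;
using System.Linq;

static void InsertLineByLine(string csvFile, int id, string filename, string connectionString)
{
    // Project each line to its 1-based line number, LINQ-to-text style.
    var lines = csvFile
        .Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries)
        .Select((val, index) => new { LineNumber = index + 1, Value = val });

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "INSERT INTO CsvData (ID, sFilename, iLineCount, sData, dDate) " +
        "VALUES (@ID, @Filename, @LineCount, @Data, CAST(GETDATE() AS smalldatetime))",
        connection))
    {
        command.Parameters.Add("@ID", SqlDbType.Int).Value = id;
        command.Parameters.Add("@Filename", SqlDbType.VarChar, 255).Value = filename;
        SqlParameter lineCount = command.Parameters.Add("@LineCount", SqlDbType.Int);
        SqlParameter data = command.Parameters.Add("@Data", SqlDbType.VarChar, -1);

        connection.Open();
        foreach (var line in lines)
        {
            lineCount.Value = line.LineNumber;
            data.Value = line.Value;
            command.ExecuteNonQuery();   // one round trip per line
        }
    }
}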
Related
I have to import delimited files via C# into a SQL Server database using SqlBulkCopy. I use a StreamReader to read the data from the flat file and a DataTable to write it to the database with SqlBulkCopy.
Unfortunately, the fields delimited by the ; do not always contain correct data, so we need to correct the values we read, such as removing single quotes, replacing non-existent dates with real dates, or replacing , with . in floating-point numbers:
string[] line = reader.ReadLine().Split(';');
line[0] = line[0].Trim().Replace("'", "");
...
...
line[n] = line[n].Trim().Replace(",", ".");
dataTable.Rows.Add(line);
I'm noticing, though, that doing the Replace and other operations on string arrays makes the performance really bad.
Can anyone suggest the most efficient way to do this kind of load?
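For what it's worth, here is a sketch of that kind of load loop with the clean-up applied while the DataTable is filled, flushing to SqlBulkCopy in batches so the table never holds the whole file; the destination table, column list, and connection string are placeholders:

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

static void LoadDelimitedFile(string path, string connectionString)
{
    // Placeholder columns; these must match the destination table and the field count per line.
    DataTable table = new DataTable();
    table.Columns.Add("Field0", typeof(string));
    table.Columns.Add("Field1", typeof(string));

    using (StreamReader reader = new StreamReader(path))
    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
    {
        connection.Open();
        bulkCopy.DestinationTableName = "dbo.Destination";   // placeholder
        bulkCopy.BatchSize = 10000;

        string rawLine;
        while ((rawLine = reader.ReadLine()) != null)
        {
            string[] line = rawLine.Split(';');

            // Clean-up rules from the question: strip quotes, fix decimal separators, ...
            line[0] = line[0].Trim().Replace("'", "");
            line[line.Length - 1] = line[line.Length - 1].Trim().Replace(",", ".");

            table.Rows.Add(line);

            // Flush periodically so the DataTable doesn't grow without bound.
            if (table.Rows.Count >= 10000)
            {
                bulkCopy.WriteToServer(table);
                table.Rows.Clear();
            }
        }

        if (table.Rows.Count > 0)
            bulkCopy.WriteToServer(table);
    }
}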
I'm importing geocode data to our database from a csv file.
I've used the following library, A Fast CSV Reader, to read the csv, and then SqlBulkCopy to write it to the database.
Here's an example of the data I'm importing
"AB10","1BH","L",0,0,,,20
"AB10","1BR","L",39320,80570,57.14214,-2.11400,21
It works OK on good data, but the top line will throw an exception because the database is set up not to accept null values.
Is there a way to tell SqlBulkCopy to ignore bad data? I've tried to get the csv reader to ignore bad lines using the built-in properties of the library, like so, but they don't appear to work:
csv.SkipEmptyLines = true;
csv.MissingFieldAction = MissingFieldAction.ParseError;
csv.DefaultParseErrorAction = ParseErrorAction.AdvanceToNextLine;
I guess another option would be to pre-parse the csv and remove all the offending rows. Perhaps there's a better csv library out there for .NET?
If you could post your csv reader code then we could help more. But looking at the code on your linked page, you could do something like this:
while (csv.ReadNextRecord())
{
    for (int i = 0; i < fieldCount; i++)
        Console.Write(string.Format("{0} = {1};",
            headers[i],
            csv[i] ?? "null"));
    Console.WriteLine();
}
See where I have added that null-coalescing operator? This should change your output from:
"AB10","1BH","L",0,0,,,20
to
"AB10","1BH","L",0,0,null,null,20
I used the Microsoft Text Driver to import CSV data for a project. It worked pretty well. I defined a Schema.ini file to specify the column headers, data types, and number of rows to scan (MaxScanRows=0 will scan the whole file).
I haven't tried this, but since the Microsoft Text Driver has you issue a SELECT query to pull the data out of the csv file, I wonder if you could add criteria to filter out the null records.
How to populate IDataReader from .csv for use with SqlBulkCopy.WriteToServer(IDataReader)
http://msdn.microsoft.com/en-us/library/windows/desktop/ms709353(v=vs.85).aspx
http://www.connectionstrings.com/textfile
Hope this helps.
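A rough sketch of that idea, assuming the ODBC connection string for the Microsoft Text Driver from the link above, a Schema.ini in the same folder, and a hypothetical column name (Latitude) for the field that must not be null:

using System.Data;
using System.Data.Odbc;

static DataTable ReadCsvWithoutNulls(string folder, string fileName)
{
    // Schema.ini in 'folder' defines the column names/types and MaxScanRows=0.
    string connectionString =
        @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + folder +
        ";Extensions=asc,csv,tab,txt;";

    using (OdbcConnection connection = new OdbcConnection(connectionString))
    using (OdbcDataAdapter adapter = new OdbcDataAdapter(
        "SELECT * FROM [" + fileName + "] WHERE Latitude IS NOT NULL",   // filter out the bad rows
        connection))
    {
        DataTable table = new DataTable();
        adapter.Fill(table);
        return table;   // hand this to SqlBulkCopy.WriteToServer as before
    }
}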
To deal with the null entries I ended up parsing the csv into a DataTable object 1,000 entries at a time and importing them as I went.
I am a complete newbie to SSIS.
I have a C#/SQL Server background.
I would like to know whether it is possible to validate data before it goes into a database. I am grabbing text from a | (pipe) delimited text file.
For example, if a certain data point is null, change it to 0, or if a certain data point's length is 0, change it to "nada".
I don't know if this is even possible with SSIS, but it would be most helpful if you could point me in the right direction.
Anything is possible with SSIS!
After your flat file data source, use a Derived Column transformation, deriving a new column with an expression something like the following:
ISNULL(ColumnName) ? "nada" : ColumnName
Then use this new column in your data flow destination.
Hope it helps.
I don't know if you're dead set on using SSIS, but the basic method I've generally used to import text-file data into a database takes two stages:
Use BULK INSERT to load the file into a temporary staging table on the database server; each of the columns in this staging table is something reasonably tolerant of the data it contains, like varchar(max).
Write validation routines to update the data in the temporary table and double-check that it's well-formed according to your needs, then convert the columns into their final formats and push the rows into the destination table.
I like this method mostly because BULK INSERT can be a bit cryptic about the errors it spits out; with a temporary staging table, it's a lot easier to look through your dataset and fix errors on the fly, as opposed to rooting through a text file.
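A hedged sketch of that two-stage approach driven from C#; the staging and destination table names, the column conversions, and the | field terminator are placeholders to adapt, and the file path must be visible to the SQL Server instance for BULK INSERT to read it:

using System.Data.SqlClient;

static void StageAndLoad(string serverSideFilePath, string connectionString)
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        connection.Open();

        // Stage 1: pull the raw file into a tolerant staging table (all varchar(max) columns).
        using (SqlCommand bulkInsert = new SqlCommand(
            "BULK INSERT dbo.StagingTable FROM '" + serverSideFilePath + "' " +
            "WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\\n')", connection))
        {
            bulkInsert.CommandTimeout = 0;
            bulkInsert.ExecuteNonQuery();
        }

        // Stage 2: validate/convert in SQL, then push the cleaned rows to the real table.
        using (SqlCommand load = new SqlCommand(
            "INSERT INTO dbo.Destination (SomeNumber, SomeText) " +
            "SELECT CAST(ISNULL(NULLIF(Col1, ''), '0') AS int), " +
            "       ISNULL(NULLIF(Col2, ''), 'nada') " +
            "FROM dbo.StagingTable", connection))
        {
            load.ExecuteNonQuery();
        }
    }
}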
C# with .NET 2.0 and a SQL Server 2005 DB backend.
I have a bunch of XML files which contain data along the lines of the following; the structure varies a little but is more or less as follows:
<TankAdvisory>
<WarningType name="Tank Overflow">
<ValidIn>All current tanks</ValidIn>
<Warning>Tank is close to capacity</Warning>
<IssueTime Issue-time="2011-02-11T10:00:00" />
<ValidFrom ValidFrom-time="2011-01-11T13:00:00" />
<ValidTo ValidTo-time="2011-01-11T14:00:00" />
</WarningType>
</TankAdvisory>
I have a single DB table that has all the above fields ready to be filled.
When I use the following method of reading the data from the XML file:
DataSet reportData = new DataSet();
reportData.ReadXml("../File.xml");
It successfully populates the DataSet, but with multiple tables. So when I come to use SqlBulkCopy I can either save just one table this way:
sbc.WriteToServer(reportData.Tables[0]);
Or, if I loop through all the tables in the DataSet and write each of them, a new row is added to the database for each table, when in actuality they should all be stored in the one row.
Then of course there's also the issue of column mappings; I'm thinking that maybe SqlBulkCopy is the wrong way of doing this.
What I need is a quick way of getting the data from that XML file into the database under the relevant columns.
OK, so the original question is a little old, but I have just come across a way to resolve this issue.
All you need to do is loop through all the DataTables in your DataSet and add their columns to the one DataTable that has all the columns of the table in your DB, like so...
DataTable dataTable = reportData.Tables[0];

// SqlBulkCopy needs every column mapped, so map the first table's columns too.
foreach (DataColumn myCol in dataTable.Columns)
{
    sbc.ColumnMappings.Add(myCol.ColumnName, myCol.ColumnName);
}

// Second DataTable: fold its columns (and the first row's values) into the main table.
DataTable dtSecond = reportData.Tables[1];
foreach (DataColumn myCol in dtSecond.Columns)
{
    sbc.ColumnMappings.Add(myCol.ColumnName, myCol.ColumnName);
    dataTable.Columns.Add(myCol.ColumnName);
    dataTable.Rows[0][myCol.ColumnName] = dtSecond.Rows[0][myCol];
}

// Finally, perform the bulk copy.
sbc.WriteToServer(dataTable);
If the secondary tables can contain more than one row, the same idea copies every row across instead of just the first:

foreach (DataColumn myCol in dtSecond.Columns)
{
    sbc.ColumnMappings.Add(myCol.ColumnName, myCol.ColumnName);
    dataTable.Columns.Add(myCol.ColumnName);

    // Copy the value for every row, not just row 0 (this assumes dataTable
    // already has at least as many rows as dtSecond).
    for (int intRowcnt = 0; intRowcnt <= dtSecond.Rows.Count - 1; intRowcnt++)
    {
        dataTable.Rows[intRowcnt][myCol.ColumnName] = dtSecond.Rows[intRowcnt][myCol];
    }
}
SqlBulkCopy is for many inserts. It's perfect for those cases where you would otherwise generate a lot of INSERT statements and juggle the limit on the total number of parameters per batch. The thing about the SqlBulkCopy class, though, is that it's cranky: unless you fully specify all column mappings for the data set it will throw an exception.
I'm assuming that your data is quite manageable since you're reading it into a DataSet. If you were to have even larger data sets, you could lift chunks into memory and then flush them to the database piece by piece. But if everything fits in one go, it's as simple as that.
SqlBulkCopy is the fastest way to put data into the database. Just set up column mappings for all the columns, otherwise it won't work.
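For example, a minimal sketch where dataTable is the flattened table built in the answer above and the destination table name is a placeholder:

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.TankAdvisory";   // placeholder name

    // Map every column explicitly, by name, so SqlBulkCopy doesn't fall back to
    // ordinal matching and complain about unmapped columns.
    foreach (DataColumn column in dataTable.Columns)
        bulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);

    bulkCopy.WriteToServer(dataTable);
}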
Why reinvent the wheel? Use SSIS. Read with an XML Source, transform with one of the many transformations, then load it with an OLE DB Destination into the SQL Server table. You will never beat SSIS in terms of runtime, speed of deploying the solution, maintenance, error handling, etc.
I'm working on a piece of software that takes a csv file and puts the data into SQL Server. I'm testing it with bad data now, and when I make a data string too long (in a line) to be imported into the database I get the error: "String or binary data would be truncated. The statement has been terminated." That's normal and what I should expect. Now I want to detect those errors before the update to the database. Is there any clever way to detect this?
The way my software works is that I import every line into a DataSet and then show the user the data that will be imported. He can then click a button to do the actual update; I then do a dataAdapter.Update(dataSet, "something") to make the update to the database.
The problem is that the error row terminates the whole update and reports the error. So I want to detect the error before I do the update to the server, so the other rows will be inserted.
Thanks
You will have to check the columns of each row, see whether any exceed the maximum length specified in the database, and if so, exclude that row from being inserted.
A different solution would be to explicitly truncate the data and insert the truncated content, which can be done using Substring.
The only way that I know of is to pre-check the information schema for the character limit:
Select
Column_Name,
Character_Maximum_Length
From
Information_Schema.Columns
Where
Table_Name = 'YourTableName'
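Putting those two ideas together, a rough sketch that reads the limits once and truncates any over-long string values in the DataTable before calling dataAdapter.Update; the table name and the decision to truncate (rather than exclude the row) are assumptions:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void TruncateToColumnLimits(DataTable table, string tableName, string connectionString)
{
    // Read the character limits for the destination table from the information schema.
    Dictionary<string, int> limits = new Dictionary<string, int>();
    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command = new SqlCommand(
        "SELECT Column_Name, Character_Maximum_Length " +
        "FROM Information_Schema.Columns WHERE Table_Name = @TableName", connection))
    {
        command.Parameters.AddWithValue("@TableName", tableName);
        connection.Open();
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                if (!reader.IsDBNull(1))
                    limits[reader.GetString(0)] = reader.GetInt32(1);
            }
        }
    }

    // Truncate any string value that would not fit (-1 means varchar(max), so skip those).
    foreach (DataRow row in table.Rows)
    {
        foreach (DataColumn column in table.Columns)
        {
            string value = row[column] as string;
            int max;
            if (value != null && limits.TryGetValue(column.ColumnName, out max) &&
                max > 0 && value.Length > max)
            {
                row[column] = value.Substring(0, max);
            }
        }
    }
}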
What you need is the column metadata.
MSDN: SqlConnection.GetSchema Method
Or, if you have opened a recordset on your database, another solution would be to browse the field object and use its defined length to truncate the string. For example, with an ADO recordset in VB, you could have code like this:
myRecordset.fields(myField) = left(myString, myRecordset.fields(myField).DefinedSize)