How to bulk copy data from a CSV file to SQL Server? - C#

I tried like this:
public DataTable GetDatatable()
{
    string[] csv = {
        "1,ABC,Arun,12/12/2017",
        "2,BCD,Sam,10/12/2017",
        "3,XYZ,Ammy,11/12/2017",
        "4,PQR,Varun,9/12/2017"
    };
    DataTable table = new DataTable();
    table.Columns.Add("Dosage", typeof(int));
    table.Columns.Add("Drug", typeof(string));
    table.Columns.Add("Patient", typeof(string));
    table.Columns.Add("Date", typeof(DateTime));
    foreach (var line in csv)
    {
        string[] l = line.Split(',');
        table.Rows.Add(l[0]);
        table.Rows.Add(l[1]);
        table.Rows.Add(l[2]);
        table.Rows.Add(l[3]);
    }
    return table;
}
The CSV looks like this:
1,ABC,Arun,12/12/2017
2,BCD,Sam,10/12/2017
3,XYZ,Ammy,11/12/2017
4,PQR,Varun,9/12/2017
This is done using foreach. I want the same result without using a for loop
or foreach, because the CSV contains over 3 million lines. Please give some ideas or suggestions.

You could use OLEDB to read the CSV file with SQL-like queries.
Check this answer for an example: Most efficient way to process a large csv in .NET
You could also use an existing library such as FileHelpers: http://www.filehelpers.net/ It lets you parse large CSV files easily and very quickly.
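For illustration, here is a minimal sketch of the FileHelpers approach, assuming a record class that mirrors the four columns in the question (the class name, file path, and date format are made up for this example):

using FileHelpers;

[DelimitedRecord(",")]
public class DosageRecord
{
    public int Dosage;
    public string Drug;
    public string Patient;
    [FieldConverter(ConverterKind.Date, "d/M/yyyy")] // adjust the format to match your file
    public DateTime Date;
}

// FileHelpers does the parsing loop internally and returns typed records.
var engine = new FileHelperEngine<DosageRecord>();
DosageRecord[] records = engine.ReadFile(@"d:\db.csv");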

One way or another, a loop will always be executed to read the file, split each line, and insert it into the DataTable.
There are many free libraries for working with CSV files, and a simple search will turn them up easily.
However, your code could be improved (and fixed) by splitting each line only once instead of four times per iteration, and adding the row with a single call.
(Your current code actually adds 4 rows for each line, which seems wrong.)
foreach (var line in csv)
{
    string[] parts = line.Split(',');
    table.Rows.Add(parts);
}
return table;
Whether this is enough for your requirements is up to you to evaluate. With so many rows to read, I would do an extensive comparison of the features and performance of the different approaches (libraries versus hand-written code).
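As one point of comparison, here is a minimal sketch that streams the file lazily instead of holding every line in memory first (the file path is illustrative):

// File.ReadLines yields one line at a time, unlike File.ReadAllLines,
// which loads all 3 million lines into memory before the loop starts.
foreach (var line in File.ReadLines(@"d:\db.csv"))
{
    table.Rows.Add(line.Split(','));
}
return table;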

This works without any third-party library. Use SqlBulkCopy instead:
public static void BulkInsert()
{
    var destination = @"Data Source = (localdb)\MSSQLLocalDB; Initial Catalog = SampleDatabase; Integrated Security = True; Connect Timeout = 30; Encrypt = False; TrustServerCertificate = False; "; // your connection string
    var filename = @"d:\db.csv"; // your source file
    var connString = string.Format(@"Provider=Microsoft.Jet.OleDb.4.0; Data Source={0};Extended Properties=""Text;HDR=No;FMT=Delimited""", Path.GetDirectoryName(filename));
    string query = string.Format("Select * from [{0}]", Path.GetFileName(filename));
    using (var conn = new OleDbConnection(connString))
    {
        conn.Open();
        OleDbCommand command = new OleDbCommand(query, conn);
        var reader = command.ExecuteReader();
        using (SqlConnection destConnection = new SqlConnection(destination))
        {
            try
            {
                destConnection.Open();
                using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destConnection))
                {
                    // map columns by ordinal
                    bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(0, "Id"));
                    bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(1, "name"));
                    bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(2, "visit"));
                    bulkCopy.DestinationTableName = "dbo.Patients";
                    try
                    {
                        bulkCopy.WriteToServer(reader);
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine(ex.Message);
                    }
                    finally
                    {
                        reader.Close();
                    }
                }
            }
            catch (Exception)
            {
            }
        }
    }
}

You can use the Cinchoo ETL library if you want to convert a CSV to a DataTable or bulk import a CSV into SQL Server easily.
To convert a CSV to a DataTable, the code below shows how to do it:
using (var p = new ChoCSVReader("emp.csv").WithFirstLineHeader())
{
    DataTable dt = p.AsDataTable();
}
If you want to bulk import the CSV into SQL Server, you can refer to this CodeProject article on how to do it:
Cinchoo ETL - Bulk Insert CSV File into SQLServer
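In rough outline it looks like the sketch below, assuming ChoCSVReader exposes AsDataReader() as described in the article (the connection string and destination table name are placeholders):

using (var reader = new ChoCSVReader("emp.csv").WithFirstLineHeader())
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var bulkCopy = new SqlBulkCopy(conn))
    {
        bulkCopy.DestinationTableName = "dbo.Patients";
        // Streams rows straight from the CSV reader; no intermediate DataTable is built.
        bulkCopy.WriteToServer(reader.AsDataReader());
    }
}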
Hope this helps.
Disclaimer: I'm the author of this library.

Related

Better way to get json data from relational databases?

I don't know if my title is understandable, but I want to know which approach is better:
1 - Creating an object class and filling it from the MSSQL DB with a loop
2 - Getting the data from the SQL DB in JSON format
3 - Something else..
I think a loop can be slow when working with large data sets. However, maybe using JSON PATH is even slower than the loop.
Example for 1 (CREATING OBJECTS IN A LOOP)
List<objExample> retVal = new List<objExample>();
objExample item;
SqlConnection con = new SqlConnection("CONNECTION STRING");
SqlDataAdapter da;
SqlCommand cmd;
da = new SqlDataAdapter("Select a,b from table", con);
con.Open();
DataTable dt = new DataTable();
da.Fill(dt);
con.Close();
foreach (DataRow itemdr in dt.Rows)
{
    item = new objExample();
    item.A = itemdr["a"].ToString();
    item.B = itemdr["b"].ToString();
    item.HasError = false;
    retVal.Add(item);
}
return retVal;
Example for 2 (FOR JSON PATH)
List<objExample> retVal;
SqlConnection con = new SqlConnection("CONNECTION STRING");
SqlDataAdapter da;
SqlCommand cmd;
da = new SqlDataAdapter("Select a,b from table for json path", con);
con.Open();
DataTable dt = new DataTable();
da.Fill(dt);
con.Close();
string _json = dt.Rows[0][0].ToString();
retVal = JsonConvert.DeserializeObject<List<objExample>>(_json);
return retVal;
I tried both of them with small data sets, but neither satisfied me.
PS: I wrote this code from memory, so sorry about any mistakes in the code and my English.
Please guide me. Thanks.
Just to be clear, are you reading the complete table?
If the goal is to get data that is stored as JSON, and maybe perform some queries on it, a NoSQL database seems more appropriate than a SQL one.
Count the number of operations that depend on the number of rows, similar to how we reason about Big O complexity.
I would do something like:
using (var sqlCommand = new SqlCommand("Select a,b from table", con))
{
    using (var reader = sqlCommand.ExecuteReader())
    {
        while (reader.Read())
        {
            item = new objExample();
            item.A = reader["a"].ToString();
            item.B = reader["b"].ToString();
            item.HasError = false;
            retVal.Add(item);
        }
    }
}
Keep in mind that the SQL connection will remain open while you are doing this, so it's a judgment call whether you do your processing after the loop or inside it.
da.Fill and the reader loop above are the same internally; both still iterate over all rows.
As far as JSON is concerned, I wouldn't use it unless you are asking whether storing the data as a JSON file is a better option than SQL, which is a whole different question.
You can use PLINQ to process the data in parallel once you have it in the list; you can even try PLINQ for the deserialization above.
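A minimal PLINQ sketch of what that might look like (the filter and projection are just illustrative placeholders):

// Requires System.Linq. Processes the already-loaded list in parallel;
// result order is not preserved unless you add AsOrdered().
var processed = retVal
    .AsParallel()
    .Where(x => !x.HasError)
    .Select(x => new { x.A, x.B })
    .ToList();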

SQL Script Has to Be Changed to Reduce SQL Server Load

A couple of years ago this script was written to import persons into a temporary table. Now the script takes 10 minutes to load, which sometimes causes problems.
So I decided to check the code for optimization potential, but I have no clue how to change it.
The current code looks like this:
// Getting attributes from the config file
string filePath = getAppSetting("filepath");
string fileName = getAppSetting("filename");
string fileBP = filePath + fileName;
if (File.Exists(fileBP))
{
    // Truncate temp table
    SqlCommand command = new SqlCommand("TRUNCATE TABLE [dbo].[temp_person];", connection);
    command.ExecuteNonQuery();
    FileStream logFileStream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
    StreamReader logFileReader = new StreamReader(logFileStream, System.Text.Encoding.Default);
    while (!logFileReader.EndOfStream)
    {
        string line = logFileReader.ReadLine();
        strActLine = line;
        string sql = "INSERT INTO temp_person(per_nummer, per_pid, per_name) "
                   + "VALUES(@per_nummer, @per_pid, @per_name)";
        SqlCommand cmd = new SqlCommand(sql, connection);
        cmd.Parameters.AddWithValue("@per_nummer", isNull(line.Substring(0, 7)));
        cmd.Parameters.AddWithValue("@per_pid", isNull(line.Substring(7, 7)));
        cmd.Parameters.AddWithValue("@per_name", isNull(line.Substring(14, 20)));
        cmd.ExecuteNonQuery();
    }
    // Clean up
    connection.Close();
    logFileReader.Close();
    logFileStream.Close();
}
In this code I open a new connection for each person, and it makes no sense to do that. Is it possible to change this to a bulk insert or something similar? The file does not have any separators such as ";".
I'm using:
MSSQL 2008 R2
.NET 4.0 (higher is currently not possible on this server)
Presuming your logFileStream file ("fileName") contains all the users you are loading, you are NOT opening a new connection as you think. The code currently uses one connection to TRUNCATE the table and then to load all entries from the file fileName.
The only way to make this run any faster would be to use the BULK INSERT SQL Server statement, details of which you can find here: https://msdn.microsoft.com/en-us/library/ms188365.aspx
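As a rough sketch (not the poster's code), you could execute that statement from C# once per file instead of running one INSERT per line. This assumes the file is readable by the SQL Server service account and that a format file describes the fixed-width columns, since the file has no field separators; the paths are hypothetical:

// BULK INSERT runs on the server, so the paths must be valid
// from the SQL Server machine's point of view.
string bulkSql = @"BULK INSERT [dbo].[temp_person]
                   FROM 'C:\import\persons.txt'
                   WITH (FORMATFILE = 'C:\import\person.fmt', TABLOCK);";
using (SqlCommand bulkCmd = new SqlCommand(bulkSql, connection))
{
    bulkCmd.CommandTimeout = 0; // large loads can exceed the default 30-second timeout
    bulkCmd.ExecuteNonQuery();
}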
Have you investigated the SqlBulkCopy class? It provides several different ways to bulk-copy data from .NET code into SQL Server. Here's an example using a DataTable to buffer the records:
if (File.Exists(fileName))
{
    TruncateTempTable(connection);
    DataTable newRecs = new DataTable();
    newRecs.Columns.Add("per_nummer", typeof(string));
    newRecs.Columns.Add("per_pid", typeof(string));
    newRecs.Columns.Add("per_name", typeof(string));
    using (TextReader tr = File.OpenText(fileName))
    {
        while (tr.Peek() >= 0)
        {
            string theLine = tr.ReadLine();
            DataRow newRow = newRecs.NewRow();
            newRow["per_nummer"] = theLine.Substring(0, 7);
            newRow["per_pid"] = theLine.Substring(7, 7);
            newRow["per_name"] = theLine.Substring(14, 20);
            newRecs.Rows.Add(newRow);
        }
    }
    SqlBulkCopy bulkCopy = new SqlBulkCopy(connection);
    bulkCopy.DestinationTableName = "dbo.temp_person"; // destination table must be set before WriteToServer
    bulkCopy.WriteToServer(newRecs);
}
Much more detail is available in the MSDN documentation for SqlBulkCopy.
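If the file grows further, a couple of SqlBulkCopy properties may be worth experimenting with (the values below are illustrative, not recommendations):

bulkCopy.BatchSize = 10000;   // send rows in chunks instead of one huge batch
bulkCopy.BulkCopyTimeout = 0; // no timeout; large loads can exceed the default 30 seconds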
I hope this will work:
BULK INSERT Reports FROM @ReportFile WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n')
if the file data looks like
M1009|20130502|E400969|ARACIL ALONDRA A|2013050220131202201404022014040220140408
M1009|20130502|N1000533|BARRY PATRICIA| 2013050220131202201404022014040220140408
M1009|20130502|N1001263|GRAYSON JOSEPH| 2013050220131202201404022014040220140408
M1009|20130502|N1026710|GANZI LOUIS R.| 2013050220131202201404022014040220140408
this should work with T-SQL

Logging Data in Excel File using C#

I have a requirement to log a range of data from multiple interfaces into an Excel file.
I want to open an Excel workbook and keep writing data from multiple interfaces into different worksheets, with an iteration period of somewhere between 40 ms and 100 ms depending on the interface.
I have tried using the EPPlus library and am able to push the data, but only by collating it first and then pushing it into the Excel sheet. I am not finding any way to keep writing the data into multiple worksheets in parallel.
Another approach I am trying uses Interop, but I am not sure it will work when very fast data is coming from multiple interfaces and needs to be written into one or more worksheets.
Can anyone advise the best approach?
You are actually describing two kinds of functionality here: logging and reporting.
An Excel file is by no means real-time data storage. It's suitable for reporting, but not for logging.
I would suggest accumulating the data somewhere else, for example in a relational database or just in CSV files, depending on your reliability and scalability needs, and generating the Excel files for closed time periods, for example daily, hourly, or every minute.
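As an illustration of the accumulation side, a minimal thread-safe CSV logger could look like the sketch below (the class name, line format, and file handling are made up for this example):

using System;
using System.IO;

public class CsvLogger
{
    private readonly object _sync = new object();
    private readonly string _path;

    public CsvLogger(string path) { _path = path; }

    // Multiple interfaces can call this concurrently; the lock serializes the appends.
    public void Log(string interfaceName, double value)
    {
        string line = string.Format("{0:o},{1},{2}", DateTime.UtcNow, interfaceName, value);
        lock (_sync)
        {
            File.AppendAllText(_path, line + Environment.NewLine);
        }
    }
}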
If you absolutely need to add many items to different Excel sheets at runtime, you can use the Microsoft OLE DB driver:
const string connectionString =
    @"Provider=Microsoft.ACE.OLEDB.12.0;Extended Properties=Excel 12.0 XML;Data Source=C:\source\MyExcel.xlsx;";
using (var conn = new OleDbConnection(connectionString))
{
    conn.Open();
    foreach (var sheet in new[] { "sheet1", "sheet2", "sheet3" })
    {
        using (var cmd = new OleDbCommand())
        {
            cmd.Connection = conn;
            try
            {
                cmd.CommandText = "CREATE TABLE [" + sheet + "] (id INT, datecol DATE );";
                cmd.ExecuteNonQuery();
            }
            catch (Exception) // TODO: find better way to determine existing sheet
            {
                Console.WriteLine("Can't create {0}", sheet);
            }
        }
        for (var i = 0; i < 1000; i++)
        {
            using (var cmd = new OleDbCommand())
            {
                cmd.Connection = conn;
                var datecol = DateTime.Now;
                var id = i;
                cmd.CommandText = "INSERT INTO [" + sheet + "](id, datecol) VALUES(@id,@datecol);";
                cmd.Parameters.Add("@id", OleDbType.Integer).Value = id;
                cmd.Parameters.Add("@datecol", OleDbType.Date).Value = datecol;
                cmd.ExecuteNonQuery();
            }
        }
    }
    conn.Close();
}

Reading an Excel file in C#.NET cell by cell

I'm new to C#.NET.
I have an Excel sheet and I want to import it into a database.
I want to read it cell by cell and insert the values into the database.
this.openFileDialog1.FileName = "*.xls";
DialogResult dr = this.openFileDialog1.ShowDialog();
if (dr == System.Windows.Forms.DialogResult.OK)
{
    string path = openFileDialog1.FileName;
    string connectionString = String.Format(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=""Excel 8.0;HDR=no;IMEX=1;""", openFileDialog1.FileName);
    string query = String.Format("select * from [{0}$]", "Sheet3");
    OleDbDataAdapter dataAdapter = new OleDbDataAdapter(query, connectionString);
    DataSet dataSet = new DataSet();
    dataAdapter.Fill(dataSet);
    dataGridView1.DataSource = dataSet.Tables[0];
}
I assume that after you execute the code in your question, you can see the values within dataGridView1.
The actual reading from the Excel sheet is done when calling dataAdapter.Fill. So, in your case, reading the cells comes down to indexing the columns and rows of dataSet.Tables[0].
For example:
for (int row = 0; row < dataSet.Tables[0].Rows.Count; row++)
{
    DataRow r = dataSet.Tables[0].Rows[row];
}
Accessing the cells in row r is trivial (like the sample above, just per cell).
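For instance (the column index is illustrative):

// By ordinal position:
object firstCell = r[0];
// Or by column name; with HDR=no the Jet provider names the columns F1, F2, ...
string value = r["F1"].ToString();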
EDIT
I forgot to describe the "insert the values into a database" part. I presume that the database is SQL Server (maybe the Express edition).
First, create a database connection. Instead of manually composing the connection string, use the SqlConnectionStringBuilder:
SqlConnectionStringBuilder csb = new SqlConnectionStringBuilder();
csb.DataSource = <your server instance, e.g. "localhost\sqlexpress">;
csb.InitialCatalog = <name of your database>;
csb.IntegratedSecurity = <true if you use integrated security, false otherwise>;
if (!csb.IntegratedSecurity)
{
    csb.UserID = <user name>;
    csb.Password = <password>;
}
Then, create and open a new SqlConnection with the connection string:
using (SqlConnection conn = new SqlConnection(csb.ConnectionString))
{
    conn.Open();
Iterate over all the values you want to insert and execute a respective insert command:
    for (...)
    {
        SqlCommand cmd = new SqlCommand("INSERT INTO ... VALUES (@param1, ..., @paramn)", conn);
        cmd.Parameters.AddWithValue("@param1", value1);
        ...
        cmd.Parameters.AddWithValue("@paramn", valuen);
        cmd.ExecuteNonQuery();
    }
This closes the connection, as the using block ends:
}
And there you go. Alternatively, you could use a data adapter with a dedicated insert command. Then inserting the values would come down to a one-liner; however, your database table must have the same structure as the Excel sheet (that is, as the data table you obtained in the code you posted).
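A rough sketch of that adapter-based alternative, assuming a destination table MyTable whose column names and types match the data table from the sheet (the table name is a placeholder):

using (SqlDataAdapter insertAdapter = new SqlDataAdapter("SELECT * FROM MyTable", conn))
using (SqlCommandBuilder builder = new SqlCommandBuilder(insertAdapter))
{
    // Rows loaded by Fill are in the Unchanged state; mark them as Added
    // so Update() generates INSERT statements for them.
    foreach (DataRow row in dataSet.Tables[0].Rows)
        row.SetAdded();
    insertAdapter.Update(dataSet.Tables[0]);
}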
Check out NPOI
http://npoi.codeplex.com/
It's the .NET version of Apache's POI Excel implementation. It'll easily do what you need it to do, and will help you avoid some of the problems (i.e. a local copy of Excel, or worse, a copy of Excel on the server) that you'll face when using the Jet provider.

SqlBulkCopy Not Working

I have a DataSet populated from an Excel sheet. I wanted to use SqlBulkCopy to insert records into the Lead_Hdr table, where LeadId is the PK.
I get the following error while executing the code below:
The given ColumnMapping does not match up with any column in the source or destination
string ConStr = ConfigurationManager.ConnectionStrings["ConStr"].ToString();
using (SqlBulkCopy s = new SqlBulkCopy(ConStr, SqlBulkCopyOptions.KeepIdentity))
{
    if (MySql.State == ConnectionState.Closed)
    {
        MySql.Open();
    }
    s.DestinationTableName = "PCRM_Lead_Hdr";
    s.NotifyAfter = 10000;
    #region Comment
    s.ColumnMappings.Clear();
    #region ColumnMapping
    s.ColumnMappings.Add("ClientID", "ClientID");
    s.ColumnMappings.Add("LeadID", "LeadID");
    s.ColumnMappings.Add("Company_Name", "Company_Name");
    s.ColumnMappings.Add("Website", "Website");
    s.ColumnMappings.Add("EmployeeCount", "EmployeeCount");
    s.ColumnMappings.Add("Revenue", "Revenue");
    s.ColumnMappings.Add("Address", "Address");
    s.ColumnMappings.Add("City", "City");
    s.ColumnMappings.Add("State", "State");
    s.ColumnMappings.Add("ZipCode", "ZipCode");
    s.ColumnMappings.Add("CountryId", "CountryId");
    s.ColumnMappings.Add("Phone", "Phone");
    s.ColumnMappings.Add("Fax", "Fax");
    s.ColumnMappings.Add("TimeZone", "TimeZone");
    s.ColumnMappings.Add("SicNo", "SicNo");
    s.ColumnMappings.Add("SicDesc", "SicDesc");
    s.ColumnMappings.Add("SourceID", "SourceID");
    s.ColumnMappings.Add("ResearchAnalysis", "ResearchAnalysis");
    s.ColumnMappings.Add("BasketID", "BasketID");
    s.ColumnMappings.Add("PipeLineStatusId", "PipeLineStatusId");
    s.ColumnMappings.Add("SurveyId", "SurveyId");
    s.ColumnMappings.Add("NextCallDate", "NextCallDate");
    s.ColumnMappings.Add("CurrentRecStatus", "CurrentRecStatus");
    s.ColumnMappings.Add("AssignedUserId", "AssignedUserId");
    s.ColumnMappings.Add("AssignedDate", "AssignedDate");
    s.ColumnMappings.Add("ToValueAmt", "ToValueAmt");
    s.ColumnMappings.Add("Remove", "Remove");
    s.ColumnMappings.Add("Release", "Release");
    s.ColumnMappings.Add("Insert_Date", "Insert_Date");
    s.ColumnMappings.Add("Insert_By", "Insert_By");
    s.ColumnMappings.Add("Updated_Date", "Updated_Date");
    s.ColumnMappings.Add("Updated_By", "Updated_By");
    #endregion
    #endregion
    s.WriteToServer(sourceTable);
    s.Close();
    MySql.Close();
}
I've encountered the same problem while copying data from Access to SQL Server 2005, and I found that the column mappings are case sensitive on both data sources, regardless of the databases' case sensitivity.
Well, is it right? Do the column names exist on both sides?
To be honest, I've never bothered with mappings. I like to keep things simple - I tend to have a staging table that looks like the input on the server, then I SqlBulkCopy into the staging table, and finally run a stored procedure to move the data from the staging table into the actual table; advantages:
no issues with live data corruption if the import fails at any point
I can put a transaction just around the SPROC
I can have the bcp work without logging, safe in the knowledge that the SPROC will be logged
it is simple ;-p (no messing with mappings)
As a final thought - if you are dealing with bulk data, you can get better throughput using an IDataReader (since this is a streaming API, whereas DataTable is a buffered API). For example, I tend to hook CSV imports up using CsvReader as the source for a SqlBulkCopy. Alternatively, I have written shims around XmlReader to present each first-level element as a row in an IDataReader - very fast.
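A minimal sketch of that pattern, assuming the LumenWorks CsvReader (the connection string, staging table name, and file path are placeholders):

using (var csv = new CsvReader(new StreamReader(@"C:\data\leads.csv"), true)) // true = file has a header row
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var bulkCopy = new SqlBulkCopy(conn))
    {
        bulkCopy.DestinationTableName = "dbo.Staging_Lead_Hdr";
        // CsvReader implements IDataReader, so rows stream straight to the server.
        bulkCopy.WriteToServer(csv);
    }
    // Then call a stored procedure to merge the staging table into the real table.
}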
The answer by Marc would be my recommendation (use a staging table). It ensures that if your source doesn't change, you'll have fewer issues importing in the future.
However, in my experience, you can check for the following issues:
Column names match in the source and the table
The column types match
If you think you have done this and still have no success, you can try the following:
1 - Allow nulls in all columns in your table
2 - Comment out all column mappings
3 - Rerun, adding one column at a time, until you find where your issue is
That should bring out the bug.
One of the reasons is that SqlBulkCopy is case sensitive. Follow these steps:
First, find your column in the source table by using the "Contains" method in C#.
Once your destination column is matched with a source column, get the index of
that column and pass the source column's name to SqlBulkCopy.
For example:
// Get columns from the source table
string sourceTableQuery = "Select top 1 * from sourceTable";
DataTable dtSource = SQLHelper.SqlHelper.ExecuteDataset(transaction, CommandType.Text, sourceTableQuery).Tables[0]; // I use a SQL helper to execute the query; you can use your own code
for (int i = 0; i < destinationTable.Columns.Count; i++)
{
    // check if the destination column exists in the source table
    if (dtSource.Columns.Contains(destinationTable.Columns[i].ToString())) // the Contains method is not case sensitive
    {
        int sourceColumnIndex = dtSource.Columns.IndexOf(destinationTable.Columns[i].ToString()); // once the column is matched, get its index
        bulkCopy.ColumnMappings.Add(dtSource.Columns[sourceColumnIndex].ToString(), dtSource.Columns[sourceColumnIndex].ToString()); // use the source table's column name rather than the destination's to avoid case-sensitivity issues
    }
}
bulkCopy.WriteToServer(destinationTable);
bulkCopy.Close();
I would go with the staging idea; however, here is my approach to handling the case-sensitivity issue. Happy to have my LINQ critiqued.
using (SqlConnection connection = new SqlConnection(conn_str))
{
    connection.Open();
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = string.Format("[{0}].[{1}].[{2}]", targetDatabase, targetSchema, targetTable);
        var targetColumsAvailable = GetSchema(conn_str, targetTable).ToArray();
        foreach (var column in dt.Columns)
        {
            if (targetColumsAvailable.Select(x => x.ToUpper()).Contains(column.ToString().ToUpper()))
            {
                var tc = targetColumsAvailable.Single(x => String.Equals(x, column.ToString(), StringComparison.CurrentCultureIgnoreCase));
                bulkCopy.ColumnMappings.Add(column.ToString(), tc);
            }
        }
        // Write from the source to the destination.
        bulkCopy.WriteToServer(dt);
        bulkCopy.Close();
    }
}
and the helper method
private static IEnumerable<string> GetSchema(string connectionString, string tableName)
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command = connection.CreateCommand())
    {
        command.CommandText = "sp_Columns";
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.Add("@table_name", SqlDbType.NVarChar, 384).Value = tableName;
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                yield return (string)reader["column_name"];
            }
        }
    }
}
What I have found is that the columns in the table and the columns in the input must at least match. You can have more columns in the table and the input will still load; if you have fewer, you'll receive the error.
I thought a long time about answering...
Even if the column names match exactly in case, you get the same error if a data type differs. So check the column names and their data types.
P.S.: staging tables are definitely the way to import.
