I'm trying to use SqlBulkCopy to import a bunch of data to our website. In most other areas we're using the Entity model, which uses byte arrays to represent binary data in SQL. However, SqlBulkCopy seems to be confusing byte[] with string. Everything seems to work fine except for this one binary column, which throws an exception: "The given value of type String from the data source cannot be converted to type binary of the specified target column."
I've created a small test case to illustrate the problem:
using System.Data;
using System.Data.SqlClient;
namespace SqlBulkCopyTest
{
class Program
{
static void Main(string[] args)
{
DataTable table = new DataTable("BinaryData");
table.Columns.Add("Data");
for (int i = 0; i < 10; i++)
{
var row = table.NewRow();
row["Data"] = new byte[5] { 1, 2, 3, 4, 5 };
table.Rows.Add(row);
}
using (var connection =
new SqlConnection("Data Source=localhost\\sqlexpress;Initial Catalog=TestBulkCopy;Integrated Security=True"))
{
connection.Open();
using (var copier = new SqlBulkCopy(connection))
{
copier.DestinationTableName = table.TableName;
/* EXCEPTION HERE: */ copier.WriteToServer(table);
}
}
}
}
}
This uses a test database with a BinaryData table which has a single binary(5) column named Data.
Any help would be greatly appreciated
Instead of:
table.Columns.Add("Data");
Add the "Data" column as a binary:
table.Columns.Add("Data", typeof(Byte[]));
I'm using NPOI to manipulate Excel (.xlsx) file data and formatting. I was wondering if there is a way to format a cell range as a table.
// something like.
ITable table = worksheet.FormatAsTable("A1:C4");
Have done some research on the internet but no luck yet. Any help would be much appreciated!
[2021/05/28 Update]:
Thanks for the reminder. I found that without setting ctTable's id, name and displayName, Excel reports this error: "Removed Part: /xl/tables/table1.xml part with XML error. (Table) Load error. Line 1, column 247." (The sample code below has been fixed.)
Based on the comment and link offered by @Gian Paolo, the C# way to achieve 'format as table' with NPOI is as follows:
Install-Package NPOI -Version 2.5.3
// NPOI dependencies
using NPOI.OpenXmlFormats.Spreadsheet;
using NPOI.SS.UserModel;
using NPOI.SS.Util;
using NPOI.XSSF.UserModel;
// Also needed by the sample below
using System.Collections.Generic;
using System.IO;
IWorkbook workbook = new XSSFWorkbook();
XSSFSheet worksheet = workbook.CreateSheet("Grades") as XSSFSheet;
InsertTestData(worksheet);
// Format Cell Range As Table
XSSFTable xssfTable = worksheet.CreateTable();
CT_Table ctTable = xssfTable.GetCTTable();
AreaReference myDataRange = new AreaReference(new CellReference(0, 0), new CellReference(3, 2));
ctTable.@ref = myDataRange.FormatAsString();
ctTable.id = 1;
ctTable.name = "Table1";
ctTable.displayName = "Table1";
ctTable.tableStyleInfo = new CT_TableStyleInfo();
ctTable.tableStyleInfo.name = "TableStyleMedium2"; // TableStyleMedium2 is one of XSSFBuiltinTableStyle
ctTable.tableStyleInfo.showRowStripes = true;
ctTable.tableColumns = new CT_TableColumns();
ctTable.tableColumns.tableColumn = new List<CT_TableColumn>();
ctTable.tableColumns.tableColumn.Add(new CT_TableColumn() { id = 1, name = "ID" });
ctTable.tableColumns.tableColumn.Add(new CT_TableColumn() { id = 2, name = "Name" });
ctTable.tableColumns.tableColumn.Add(new CT_TableColumn() { id = 3, name = "Score" });
using (FileStream file = new FileStream(@"test.xlsx", FileMode.Create))
{
workbook.Write(file);
}
// Function to Populate Test Data
private void InsertTestData(XSSFSheet worksheet)
{
worksheet.CreateRow(0);
worksheet.GetRow(0).CreateCell(0).SetCellValue("ID");
worksheet.GetRow(0).CreateCell(1).SetCellValue("Name");
worksheet.GetRow(0).CreateCell(2).SetCellValue("Score");
worksheet.CreateRow(1);
worksheet.GetRow(1).CreateCell(0).SetCellValue(1);
worksheet.GetRow(1).CreateCell(1).SetCellValue("John");
worksheet.GetRow(1).CreateCell(2).SetCellValue(82);
worksheet.CreateRow(2);
worksheet.GetRow(2).CreateCell(0).SetCellValue(2);
worksheet.GetRow(2).CreateCell(1).SetCellValue("Sam");
worksheet.GetRow(2).CreateCell(2).SetCellValue(90);
worksheet.CreateRow(3);
worksheet.GetRow(3).CreateCell(0).SetCellValue(3);
worksheet.GetRow(3).CreateCell(1).SetCellValue("Amy");
worksheet.GetRow(3).CreateCell(2).SetCellValue(88);
}
Result: the A1:C4 range in the generated test.xlsx is formatted as a table named Table1 with the TableStyleMedium2 style and banded rows.
Not enough rep to add a comment, but if anyone is getting strange Excel errors when everything looks fine: I had an issue when iterating over an object list to build the tableColumn entries. Building an array first fixed all my issues:
var headerNames = _headers.Select(x => x.Name).ToArray();
for (uint i = 0; i < headerNames.Length; i++)
{
ctTable.tableColumns.tableColumn.Add(new CT_TableColumn() { id = i + 1, name = headerNames[i] });
}
The CSVHelper .NET library seems fantastic so far, but the documentation is a little lacking for a pseudo-beginner like myself.
I need to read a csv file and write the results to our SQL Server database. For the table I'm writing to, I need to map to its columns from the CSV columns, including some concatenation of multiple fields to one.
This is what I have for reading the csv file:
public static void Main(string[] args)
{
using (var reader = new StreamReader(@"C:\Users\me\Documents\file.csv"))
using (var csv = new CsvReader(reader))
{
csv.Configuration.PrepareHeaderForMatch = (string header, int index) =>
header.Replace(" ", "_").Replace("(", "").Replace(")", "").Replace(".", "");
var records = csv.GetRecords<EntityCsv>().ToList();
}
}
My EntityCsv class contains property names for all columns of the csv file.
Then, I also have a class called TaskEntity which contains the property names and types for the destination database table (although I'm unclear as to whether I need this).
Finally, per advice from a colleague, I have a method set up to make use of SqlBulkCopy, like this:
public void AddBulk(List<TaskEntity> entities)
{
using (var con = GetConnection())
{
SqlBulkCopy bulk = new SqlBulkCopy(con);
bulk.BatchSize = 2000;
bulk.BulkCopyTimeout = 0;
bulk.DestinationTableName = "dbo.CsvExports";
bulk.WriteToServer(entities.AsDataTable());
bulk.Close();
}
}
I borrowed that code block from him and would theoretically run that method as the final step.
But I know I'm missing a step in between, and that is mapping the fields from the csv to the SQL server field. I'm scratching my head at how to implement this step.
So let's say for simplicity's sake I have 3 columns in the csv file, and I want to map them to 2 columns of the SQL table as follows:
CsvColumn1 -> SQLtableColumn1
CsvColumn2 + CsvColumn3 -> SQLtableColumn2
How would I go about accomplishing this with CsvReader and C#? I have explored the Mapping section of the CSVReader documentation but everything I'm seeing in there seems to refer to mapping column names from an input file to names in an output file. I don't see anything there (nor anywhere on the Google) that speaks specifically to taking the input file and exporting its rows to a SQL database.
You can use a ClassMap to map the csv columns to the SQL table columns and skip the EntityCsv class.
public static void Main(string[] args)
{
using (var reader = new StreamReader(@"C:\Users\me\Documents\file.csv"))
using (var csv = new CsvReader(reader))
{
csv.Configuration.PrepareHeaderForMatch = (string header, int index) =>
header.Replace(" ", "_").Replace("(", "").Replace(")", "").Replace(".", "");
csv.Configuration.RegisterClassMap<TaskEntityMap>();
var records = csv.GetRecords<TaskEntity>().ToList();
}
}
public class TaskEntity
{
public int Id { get; set; }
public string SqlTableColumn1 { get; set; }
public string SqlTableColumn2 { get; set; }
}
public sealed class TaskEntityMap : ClassMap<TaskEntity>
{
public TaskEntityMap()
{
Map(m => m.SqlTableColumn1).Name("CsvColumn1");
Map(m => m.SqlTableColumn2).ConvertUsing(row => row.GetField<string>("CsvColumn2") + " " + row.GetField<string>("CsvColumn3"));
}
}
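To feed the mapped TaskEntity records into the AddBulk method from the question, the entities.AsDataTable() call needs an extension method that turns the list into a DataTable. A minimal reflection-based sketch, assuming the property names match the dbo.CsvExports column names (FastMember's ObjectReader, shown in a later answer, is a faster alternative):
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

public static class DataTableExtensions
{
    // Builds a DataTable whose columns mirror the public properties of T.
    public static DataTable AsDataTable<T>(this IEnumerable<T> items)
    {
        var table = new DataTable(typeof(T).Name);
        PropertyInfo[] props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);

        foreach (var prop in props)
        {
            // Unwrap Nullable<T> so the DataColumn gets the underlying type.
            var type = Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType;
            table.Columns.Add(prop.Name, type);
        }

        foreach (var item in items)
        {
            var values = new object[props.Length];
            for (int i = 0; i < props.Length; i++)
            {
                values[i] = props[i].GetValue(item) ?? DBNull.Value;
            }
            table.Rows.Add(values);
        }

        return table;
    }
}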
I have used SqlBulkCopy along with CsvHelper to dump data into SQL Server.
SqlBulkCopy is an awesome utility that writes data to SQL Server from nearly any data source that can be loaded into a DataTable instance.
var lines = File.ReadAllLines(file);
if (lines.Length == 0)
return;
var tableName = GetTableName(file);
var columns = lines[0].Split(',').ToList();
var table = new DataTable();
sqlBulk.ColumnMappings.Clear();
foreach (var c in columns)
{
table.Columns.Add(c);
sqlBulk.ColumnMappings.Add(c, c);
}
for (int i = 1; i < lines.Length; i++)
{
var line = lines[i];
// Skip blank lines, e.g. a trailing empty line at the end of the file
if (string.IsNullOrWhiteSpace(line)) continue;
// Explicitly mark empty values as null for SQL import to work
var row = line.Split(',')
.Select(a => string.IsNullOrEmpty(a) ? null : a).ToArray();
table.Rows.Add(row);
}
sqlBulk.DestinationTableName = tableName;
sqlBulk.WriteToServer(table);
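Note that sqlBulk in the snippet above is assumed to be an existing SqlBulkCopy instance and GetTableName a helper that derives the table name from the file name. A rough sketch of how the surrounding setup could look (the connection string is a placeholder, and this is only one way to wire it up):
// Hypothetical wrapper around the snippet above (requires System.IO and System.Data.SqlClient).
private static void ImportCsv(string file, string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var sqlBulk = new SqlBulkCopy(connection))
        {
            // ...build the DataTable and column mappings as shown above...
            // sqlBulk.DestinationTableName = tableName;
            // sqlBulk.WriteToServer(table);
        }
    }
}

// GetTableName could simply derive the table name from the file name, e.g. "Orders.csv" -> "Orders".
private static string GetTableName(string file)
{
    return Path.GetFileNameWithoutExtension(file);
}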
I am inputting a text file into a DataTable and then using SqlBulkCopy to copy to a Database. While BulkCopy is fast, inserting 50000+ lines into DataTable is not (around 5 mins). How do I make it efficient?
Can I insert data into the DataTable quickly?
If not, is there a way to save the inserted data permanently into the DataTable so I don't have to insert it every time I run the program?
for (; i < fares.Length; )
{
k = i;
Console.WriteLine("Inserting " + k + " out of " + (fares.Length));
for (; i <= (k + 3); i++)
{
if (i % 4 == 0)
{
for (int j = 0; j < fares.Length - 1; j++)
{
{
int space = fares[i].IndexOf(" ");
startStation = fares[i].Substring(0, space);
endStation = fares[i].Substring(space + 1, fares[i].Length - space - 1);
}
}
}
else if (i % 4 == 1)
{
valueFare = fares[i];
}
else if (i % 4 == 2)
{
standardFare = fares[i];
}
else if (i % 4 == 3)
{
time = int.Parse(fares[i]);
}
}
faresDT.Rows.Add(startStation, endStation, valueFare, standardFare, time);
If what you want is to optimize your load to the database, I suggest that you get rid of the DataTable completely. By making use of Marc Gravell's FastMember (and anyone who's using SqlBulkCopy should be using FastMember IMHO) you can get a DataReader directly from any IEnumerable.
I would use some variation of the code below whenever writing from a file directly to a database. It streams the contents of the file to the SqlBulkCopy operation through the use of yield return and the lazy evaluation of IEnumerable.
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.IO;
using System.Text;
using FastMember;
namespace BulkCopyTest
{
public class Program
{
public static void Main(string[] args)
{
const string filePath = "SOME FILE THAT YOU WANT TO LOAD TO A DB";
WriteData(GetData<dynamic>(filePath));
}
private static void WriteData<T>(IEnumerable<T> data)
{
using (var bcp = new SqlBulkCopy(GetConnection(), SqlBulkCopyOptions.TableLock, null))
using (var reader = ObjectReader.Create(data))
{
SetColumnMappings<T>(bcp.ColumnMappings);
bcp.BulkCopyTimeout = 300;
bcp.BatchSize = 150000;
bcp.DestinationTableName = ""; //TODO: Set correct TableName
bcp.WriteToServer(reader);
}
}
private static void SetColumnMappings<T>(SqlBulkCopyColumnMappingCollection mappings)
{
//Setup your column mappings
}
private static IEnumerable<T> GetData<T>(string filePath)
{
using (var fileStream = File.OpenRead(filePath))
using (var reader = new StreamReader(fileStream, Encoding.UTF8))
{
string line;
while ((line = reader.ReadLine()) != null)
{
//TODO: Add actual parsing logic and whatever else is needed to create an instance of T
yield return Activator.CreateInstance<T>();
}
}
}
private static SqlConnection GetConnection()
{
return new SqlConnection(new SqlConnectionStringBuilder
{
//TODO: Set Connection information here
}.ConnectionString);
}
}
}
In this case I think you should take advantage of the BeginLoadData, LoadDataRow and EndLoadData methods provided by the DataTable class. You could use them like this:
try
{
faresDT.BeginLoadData();
// Your for loop...
{
// Logic defining the value of startStation, endStation, valueFare, standardFare and time removed for briefness.
faresDT.LoadDataRow(new object[] {startStation, endStation, valueFare, standardFare, time}, true);
}
}
finally
{
faresDT.EndLoadData();
}
BeginLoadData() turns off the notifications, index maintenance and constraint checking that normally happen every time you add a row; that work is then done once, when you finish loading data by calling EndLoadData().
You can find more details about these APIs here:
https://learn.microsoft.com/en-us/dotnet/api/system.data.datatable.loaddatarow?view=netframework-4.7.2
I have data in sets like this:
SET 1:
Time = 2017-11-01 13:18:10
Param1 = 42.42
Param2 = 47.11
Param3 = 12.34
.... (up to 100 parameters)
SET 2:
Time = 2017-11-01 13:18:20
Param1 = 45.17
Param2 = 46.11
Param3 = 12.35
.... (up to 100 parameters)
I get a new set of data every 10 seconds. I need to save the data in SQL Server (I am free to define the table).
Later I need to fetch the data from the database so that I can draw an XY graph with time on the X-axis and the parameters on the Y-axis.
I was thinking of saving my data as JSON, either just as a string in a table (where the string is JSON) or using the JSON support in SQL Server 2016.
What is the recommended way of doing this?
I am thinking a lot about performance.
I did some tests:
Simple String: This is just for reference. My data column simply contains a string with a 5-digit number.
XML String with attributes: The data column is a string (nvarchar(MAX)) containing XML with 35 nodes like this:
<data>
<Regtime value='2017-08-21 13:56:05'/>
<MachineId value='Somefactory.SomeSite.DeviceId' />
<Values>
<B_T_SP value='181.23' unit='1234' />
<B_H_SP_Tdp value='87.34' unit='801' />
<B_A_SP_v_air value='42.42' unit='500' />
<S_T_SP value='175' unit='801' />
<S_A_SP_v_air value='57.23' unit='500' />
...
XML String with nodes: Same as above but not using attributes:
<data>
<Regtime>'2017-11-01T12:59:02.2792518+01:00'</Regtime>
<MachineId>'Somefactory.SomeSite.DeviceId'</MachineId>
<Values>
<B_T_SP>
<value>666,50</value>
<unit>801</unit>
</B_T_SP>
<B_H_SP_Tdp>
<value>414,21</value>
<unit>801</unit>
</B_H_SP_Tdp>
<B_A_SP_v_air>
<value>41,83</value>
<unit>801</unit>
</B_A_SP_v_air>
<S_T_SP>
<value>20,70</value>
<unit>801</unit>
</S_T_SP>
...
JSON string: Data column is a string (nvarchar(MAX)) containing JSON with 35 nodes like this:
{
"data": {
"Regtime": "2017-11-02T12:57:00.3745960+01:00",
"MachineId": "Somefactory.SomeSite.DeviceId",
"Values": {
"B_T_SP": {
"value": "703,81",
"unit": "801"
},
"B_H_SP_Tdp": {
"value": "485,90",
"unit": "801"
},
"B_A_SP_v_air": {
"value": "3,65",
"unit": "801"
},
"S_T_SP": {
"value": "130,44",
"unit": "801"
},
...
Distributed: As CodeCaster suggested, using two tables, with 35 SetParameters rows per Set.
When inserting data I do this:
startTime = DateTime.Now;
using (ConnectionScope cs = new ConnectionScope())
{
for (int i = 0; i < counts; i++)
{
sql = GetSqlAddData(DataType.XmlAttributeString);
using (IDbCommand c = cs.CreateCommand(sql))
{
c.ExecuteScalar();
}
}
}
logText.AppendText(string.Format("{0}x XML attribute string Insert took \t{1}\r\n", counts, DateTime.Now.Subtract(startTime)));
Except for Distributed, where I do this:
using (ConnectionScope cs = new ConnectionScope())
{
for (int i = 0; i < counts; i++)
{
sql = GetSqlAddData(DataType.Distributed);
using (IDbCommand c = cs.CreateCommand(sql))
{
id = (int)c.ExecuteScalar();
}
for (int j = 0; j < 35; j++)
{
using (IDbCommand c = cs.CreateCommand($"INSERT into test_datalog_distr_detail (setid, name, value) VALUES ({id}, '{"param"+j}', '{j*100 + i}')"))
{
c.ExecuteScalar();
}
}
}
}
logText.AppendText(string.Format("{0}x Distributed Insert \t{1}\r\n", counts, DateTime.Now.Subtract(startTime)));
When I read data I do this:
Simple String:
var data = new List<Tuple<DateTime, string>>();
DateTime time;
string point;
string sql = GetSqlGetData(DataType.SimpleString);
var startTime = DateTime.Now;
using (ConnectionScope cs = new ConnectionScope())
{
using (IDbCommand cmd = cs.CreateCommand(sql))
{
using (IDataReader reader = cmd.ExecuteReader())
{
while (reader.Read())
{
time = DateTime.Parse(reader[5].ToString());
point = reader[12].ToString();
data.Add(new Tuple<DateTime, string>(time, point));
}
}
}
}
logText.AppendText(string.Format("{0}x Simple Select {1}\r\n", counts, DateTime.Now.Subtract(startTime)));
XML, both with attributes and with nodes:
sql = GetSqlGetData(DataType.XmlAttributeString);
startTime = DateTime.Now;
var doc = new XmlDocument();
using (ConnectionScope cs = new ConnectionScope())
{
using (IDbCommand cmd = cs.CreateCommand(sql))
{
using (IDataReader reader = cmd.ExecuteReader())
{
while (reader.Read())
{
time = DateTime.Parse(reader[5].ToString());
doc = new XmlDocument();
doc.LoadXml(reader[12].ToString());
point = doc.SelectSingleNode("/data/Values/B_T_SP").Attributes["value"].Value;
data.Add(new Tuple<DateTime, string>(time, point));
}
}
}
}
logText.AppendText(string.Format("{0}x Select using XmlDoc and Attribute String {1}\r\n", counts, DateTime.Now.Subtract(startTime)));
JSON:
JObject jobj;
using (ConnectionScope cs = new ConnectionScope())
{
using (IDbCommand cmd = cs.CreateCommand(sql))
{
using (IDataReader reader = cmd.ExecuteReader())
{
while (reader.Read())
{
time = DateTime.Parse(reader[5].ToString());
jobj = JObject.Parse(reader[12].ToString());
point = jobj["data"]["Values"]["B_T_SP"].ToString();
data.Add(new Tuple<DateTime, string>(time, point));
}
}
}
}
logText.AppendText(string.Format("{0}x Select using JSON String {1}\r\n", counts, DateTime.Now.Subtract(startTime)));
Distributed tables (CodeCaster's recommendation):
sql = GetSqlGetData(DataType.Distributed);
startTime = DateTime.Now;
using (ConnectionScope cs = new ConnectionScope())
{
using (IDbCommand cmd = cs.CreateCommand(sql))
{
using (IDataReader reader = cmd.ExecuteReader())
{
while (reader.Read())
{
time = DateTime.Parse(reader[5].ToString());
point = reader[15].ToString();
data.Add(new Tuple<DateTime, string>(time, point));
}
}
}
}
logText.AppendText(string.Format("{0}x Select on distributed tables {1}\r\n", counts, DateTime.Now.Subtract(startTime)));
I did a test run where I was inserting 100000 rows to the database and measured the time used:
Simple string value: 18 seconds
XML string (attributes): 36 seconds
XML string (nodes only): 38 seconds
JSON string: 37 seconds
Distributed (CodeCaster): 8 MINUTES!
Reading 100000 rows and fetch one value in each:
Simple string value: 0.4 seconds
XML string (attributes): 5.8 seconds
XML string (nodes only): 7.4 seconds
JSON string: 9.4 seconds
Distributed (CodeCaster): 0.5 seconds
So far my conclusion is:
I am surprised that XML seems faster than JSON. I expected the distributed select to be faster than XML, especially because selecting one parameter is done in SQL and not afterwards as with JSON and XML. But the insert into the distributed tables worries me. What I still need to test is storing the XML as a native xml column in the database instead of a string, so that I can select the parameter in SQL rather than afterwards in an XmlDocument.
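As a follow-up to that last point, here is a minimal sketch of what selecting the parameter inside SQL could look like, assuming the data column were changed to the native xml type (the table name test_datalog_xml is a placeholder in the spirit of the test code above):
// Hypothetical: assumes a table test_datalog_xml(Regtime datetime2, Data xml).
// SQL Server's xml type supports .value() with an XQuery path, so the parameter
// can be extracted server-side instead of in an XmlDocument afterwards.
string sql = @"
    SELECT Regtime,
           Data.value('(/data/Values/B_T_SP/@value)[1]', 'nvarchar(50)') AS B_T_SP
    FROM test_datalog_xml";

using (ConnectionScope cs = new ConnectionScope())
using (IDbCommand cmd = cs.CreateCommand(sql))
using (IDataReader reader = cmd.ExecuteReader())
{
    while (reader.Read())
    {
        time = reader.GetDateTime(0);
        point = reader.GetString(1);
        data.Add(new Tuple<DateTime, string>(time, point));
    }
}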
Let's not store JSON in a database column, unless you want to create the inner-platform effect or a database within a database.
Especially not if you want to meaningfully query the data stored in it. Sure, there are database systems that support this, but if you have the ability to inspect and transform the JSON beforehand, you should definitely go for that option.
Simply normalize the params into a junction table called SetParameters, with a foreign key to the Sets table.
So you'll end up with two tables:
Sets
    Id
    Time
SetParameters
    Id
    SetId
    Name
    Value
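Since the row-by-row inserts were what made the distributed approach slow in the tests above, the child rows can also be batched and sent with SqlBulkCopy instead of one INSERT per parameter. A rough sketch, assuming the two tables above; sets, set.Parameters, InsertSetAndGetId and connectionString stand in for the caller's own data and helpers:
// Rough sketch: bulk-load the child rows instead of inserting them one at a time.
// Assumes Sets(Id, Time) and SetParameters(Id, SetId, Name, Value) as described above.
var parameters = new DataTable();
parameters.Columns.Add("SetId", typeof(int));
parameters.Columns.Add("Name", typeof(string));
parameters.Columns.Add("Value", typeof(string));

foreach (var set in sets)                       // 'sets' = the parsed data sets
{
    int setId = InsertSetAndGetId(set.Time);    // hypothetical helper: INSERT into Sets, return Id
    foreach (var p in set.Parameters)
        parameters.Rows.Add(setId, p.Name, p.Value);
}

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulk = new SqlBulkCopy(connection))
    {
        bulk.DestinationTableName = "SetParameters";
        bulk.ColumnMappings.Add("SetId", "SetId");
        bulk.ColumnMappings.Add("Name", "Name");
        bulk.ColumnMappings.Add("Value", "Value");
        bulk.WriteToServer(parameters);
    }
}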
Just wanted to share what I ended up doing:
JSON simply used up too much database space, so I went with a three-level database structure, storing the parameter names in one table and the parameter values in another to reduce the footprint.
I then used table-valued parameters (TVPs) to insert the data in an efficient way.
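For reference, a minimal sketch of what an insert via a table-valued parameter can look like; the user-defined table type, stored procedure, column names and connectionString below are made up for illustration and must exist in the database first:
// Hypothetical table type and procedure, created up front in SQL Server:
//   CREATE TYPE dbo.ParameterValueType AS TABLE (SetId int, NameId int, Value float);
//   CREATE PROCEDURE dbo.InsertParameterValues @values dbo.ParameterValueType READONLY AS
//     INSERT INTO ParameterValues (SetId, NameId, Value)
//     SELECT SetId, NameId, Value FROM @values;

var values = new DataTable();
values.Columns.Add("SetId", typeof(int));
values.Columns.Add("NameId", typeof(int));
values.Columns.Add("Value", typeof(double));
values.Rows.Add(1, 1, 42.42);   // sample rows
values.Rows.Add(1, 2, 47.11);

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.InsertParameterValues", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@values", values);
    parameter.SqlDbType = SqlDbType.Structured;
    parameter.TypeName = "dbo.ParameterValueType";

    connection.Open();
    command.ExecuteNonQuery();
}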
I am building a Web API to generate JSON objects in .NET Core.
The thing is, the data sets are generated in SQL stored procedures (using dynamic SQL) and I don't know the type of the objects that are returned, so I can't map them to a concrete model, since the output columns change depending on the parameters.
Does anyone know how to retrieve the data set from the DB in .NET Core 1.0, with or without using EF?
I've browsed a lot and can only find answers that use models.
Thanks in advance
You can add the following dependencies to your project in the project.json file:
System.Data.Common
System.Data.SqlClient
Rebuild your project and you can code something like this:
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Dynamic;
namespace ConsoleApp1
{
public class Program
{
public static IEnumerable<dynamic> GetData(String cmdText)
{
using (var connection = new SqlConnection("server=(local);database=Northwind;integrated security=yes;"))
{
connection.Open();
using (var command = new SqlCommand(cmdText, connection))
{
using (var dataReader = command.ExecuteReader())
{
var fields = new List<String>();
for (var i = 0; i < dataReader.FieldCount; i++)
{
fields.Add(dataReader.GetName(i));
}
while (dataReader.Read())
{
var item = new ExpandoObject() as IDictionary<String, Object>;
for (var i = 0; i < fields.Count; i++)
{
item.Add(fields[i], dataReader[fields[i]]);
}
yield return item;
}
}
}
}
}
public static void Main(String[] args)
{
foreach (dynamic row in GetData("select * from Shippers"))
{
Console.WriteLine("Company name: {0}", row.CompanyName);
Console.WriteLine();
}
Console.ReadKey();
}
}
}
Please let me know if this is useful.
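Since the original question is about exposing this from a Web API, here is a short sketch of how the GetData results could be returned as JSON from an ASP.NET Core controller. The controller, route and query are made up for illustration, and GetData is assumed to be reachable from the controller (referenced here via the Program class above):
using System.Linq;
using Microsoft.AspNetCore.Mvc;
using ConsoleApp1;   // namespace of the GetData example above

[Route("api/[controller]")]
public class DataController : Controller
{
    [HttpGet]
    public IActionResult Get()
    {
        // Each ExpandoObject row serializes to a JSON object keyed by column name.
        var rows = Program.GetData("exec dbo.SomeDynamicProcedure").ToList();
        return Ok(rows);
    }
}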