Bulk insert the whole contents of a DataTable into a PostgreSQL table - C#

In SQL Server we do something like this to bulk insert a DataTable:
SqlBulkCopy copy = new SqlBulkCopy(sqlCon);
copy.DestinationTableName = strDestinationTable;
copy.WriteToServer(dtFrom);
but how do we do this operation in PostgreSQL?

Simple Insert Using Parameters
Your project will need to reference the following assembly: Npgsql. If this reference is not visible within Visual Studio, then:
browse to the connector's installation folder
Execute: GACInstall.exe
Restart Visual Studio.
Sample Table
CREATE TABLE "OrderHistory"
(
"OrderId" bigint NOT NULL,
"TotalAmount" bigint,
CONSTRAINT "OrderIdPk" PRIMARY KEY ("OrderId")
)
WITH (
OIDS=FALSE
);
ALTER TABLE "OrderHistory"
OWNER TO postgres;
GRANT ALL ON TABLE "OrderHistory" TO postgres;
GRANT ALL ON TABLE "OrderHistory" TO public;
ALTER TABLE "OrderHistory" ALTER COLUMN "OrderId" SET (n_distinct=1);
GRANT SELECT("OrderId"), UPDATE("OrderId"), INSERT("OrderId"), REFERENCES("OrderId") ON "OrderHistory" TO public;
GRANT SELECT("TotalAmount"), UPDATE("TotalAmount"), INSERT("TotalAmount"), REFERENCES("TotalAmount") ON "OrderHistory" TO public;
Sample Code
Be sure to use the following directives:
using Npgsql;
using NpgsqlTypes;
Enter the following source code into your method:
// Make sure that the user has the INSERT privilege for the OrderHistory table.
NpgsqlConnection connection = new NpgsqlConnection("PORT=5432;TIMEOUT=15;POOLING=True;MINPOOLSIZE=1;MAXPOOLSIZE=20;COMMANDTIMEOUT=20;COMPATIBLE=2.2.4.3;DATABASE=test;HOST=127.0.0.1;PASSWORD=test;USER ID=test");
connection.Open();
DataSet dataSet = new DataSet();
// OrderId=-1 matches no rows on purpose: the adapter only fetches the table's schema.
NpgsqlDataAdapter dataAdapter = new NpgsqlDataAdapter("select * from OrderHistory where OrderId=-1", connection);
dataAdapter.InsertCommand = new NpgsqlCommand("insert into OrderHistory(OrderId, TotalAmount) " +
" values (:a, :b)", connection);
dataAdapter.InsertCommand.Parameters.Add(new NpgsqlParameter("a", NpgsqlDbType.Bigint));
dataAdapter.InsertCommand.Parameters.Add(new NpgsqlParameter("b", NpgsqlDbType.Bigint));
dataAdapter.InsertCommand.Parameters[0].Direction = ParameterDirection.Input;
dataAdapter.InsertCommand.Parameters[1].Direction = ParameterDirection.Input;
dataAdapter.InsertCommand.Parameters[0].SourceColumn = "OrderId";
dataAdapter.InsertCommand.Parameters[1].SourceColumn = "TotalAmount";
dataAdapter.Fill(dataSet);
DataTable newOrders = dataSet.Tables[0];
DataRow newOrder = newOrders.NewRow();
newOrder["OrderId"] = 20;
newOrder["TotalAmount"] = 20.0;
newOrders.Rows.Add(newOrder);
DataSet ds2 = dataSet.GetChanges();
dataAdapter.Update(ds2);
dataSet.Merge(ds2);
dataSet.AcceptChanges();
connection.Close();
Thoughts On Performance
The original posting made no mention of performance requirements. It was requested that the solution must:
insert using a DataTable
insert data without using a loop
If you are inserting significant amounts of data, then I would suggest that you take a look at your performance options. The Postgres documentation suggests that you:
Disable Autocommit
Use the COPY command
Remove indexes
Remove Foreign Key Constraints
etc.
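On the first point, in ADO.NET "disabling autocommit" amounts to wrapping all of the inserts in one explicit transaction. A sketch against the dataAdapter and connection from the sample above:
// Sketch: a single transaction (and a single commit) around all inserts,
// instead of PostgreSQL committing each row individually.
using (NpgsqlTransaction tx = connection.BeginTransaction())
{
    dataAdapter.InsertCommand.Transaction = tx;
    dataAdapter.Update(ds2); // every row goes through the same transaction
    tx.Commit();             // one commit at the end
}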
For more information about optimizing Postgres inserts, please take a look at:
PostgreSQL.org: Inserting Data
PostgreSQL.org: Insert + Performance Tips
StackOverflow: How to speed up insertion performance in PostgreSQL
Also, there are a lot of other factors that can impact a system's performance. For a high level introduction, take a look at:
ADO.NET SQL Server Performance bottleneck
This posting outlines general (i.e. non-SqlServer) strategies for optimizing performance.
Other Options
Does the .NET connector support the Postgres Copy command?
If not, you can download the source code for the Npgsql connector and add your own BulkCopy() method. Be sure to review the source code's licensing agreement first.
Check to see if Postgres supports Table Value Parameters.
This approach allows you to pass in a table into a Postgres function which can then insert the data directly into the destination.
Purchase a Postgres .NET connector from a vendor which includes the required feature.
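On the first option: recent versions of Npgsql expose the COPY protocol directly, so no hand-written BulkCopy() method is needed. A minimal sketch, assuming Npgsql 4.x or later and the OrderHistory table above, with dtFrom being the source DataTable from the question:
// Sketch: stream a DataTable into PostgreSQL via the binary COPY protocol.
using (var connection = new NpgsqlConnection(connectionString))
{
    connection.Open();
    using (var importer = connection.BeginBinaryImport(
        "COPY \"OrderHistory\" (\"OrderId\", \"TotalAmount\") FROM STDIN (FORMAT BINARY)"))
    {
        foreach (DataRow row in dtFrom.Rows)
        {
            importer.StartRow();
            importer.Write(Convert.ToInt64(row["OrderId"]), NpgsqlDbType.Bigint);
            importer.Write(Convert.ToInt64(row["TotalAmount"]), NpgsqlDbType.Bigint);
        }
        importer.Complete(); // without Complete(), Dispose() rolls the import back
    }
}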
Additional References
Postgres .NET Connector - free & open source

I ran into the same problem a while ago. It seems there is no "ready to use" solution yet.
I read this post and built a similar solution at that time, which has been in productive use ever since. It is based on text queries that read data from STDIN, using the ADO.NET PostgreSQL data provider Npgsql. You can create a large string (or a temporary file, to keep memory usage down) from your DataTable and use it as a text query with the COPY command. In our case this was much faster than inserting each row.
Maybe this isn't a complete solution, but it may be a good starting point, and it's everything I know about the topic. :)
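To make the approach concrete, here is a sketch only, assuming a reasonably recent Npgsql, which exposes COPY ... FROM STDIN as a TextWriter:
// Sketch: write the DataTable as tab-separated text through COPY FROM STDIN.
// Column order must match the COPY column list; dtFrom is the source DataTable.
using (var writer = connection.BeginTextImport(
    "COPY \"OrderHistory\" (\"OrderId\", \"TotalAmount\") FROM STDIN"))
{
    foreach (DataRow row in dtFrom.Rows)
    {
        // Default COPY text format: tab-delimited columns, one row per line.
        writer.WriteLine(row["OrderId"] + "\t" + row["TotalAmount"]);
    }
}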

I have also found that there is no "ready to use" solution yet. You can check my other answer, in which I describe a little helper I created for this problem that makes using another helper really easy: https://stackoverflow.com/a/46063313/6654362
I think that's currently the best solution.
I posted the solution from the link below in case the post dies.
Edit:
I recently ran into a similar problem, but we were using PostgreSQL. I wanted to use an effective bulk insert, which turned out to be pretty difficult. I haven't found any proper free library to do it on this DB. I have only found this helper:
https://bytefish.de/blog/postgresql_bulk_insert/
which is also on NuGet. I have written a small mapper that auto-maps properties the way Entity Framework does:
public static PostgreSQLCopyHelper<T> CreateHelper<T>(string schemaName, string tableName)
{
    var helper = new PostgreSQLCopyHelper<T>(schemaName, "\"" + tableName + "\"");
    var properties = typeof(T).GetProperties();
    foreach (var prop in properties)
    {
        var type = prop.PropertyType;
        // Skip key and foreign-key properties; they are not bulk-copied.
        if (Attribute.IsDefined(prop, typeof(KeyAttribute)) || Attribute.IsDefined(prop, typeof(ForeignKeyAttribute)))
            continue;
        switch (type)
        {
            case Type intType when intType == typeof(int) || intType == typeof(int?):
            {
                helper = helper.MapInteger("\"" + prop.Name + "\"", x => (int?)prop.GetValue(x, null));
                break;
            }
            case Type stringType when stringType == typeof(string):
            {
                helper = helper.MapText("\"" + prop.Name + "\"", x => (string)prop.GetValue(x, null));
                break;
            }
            case Type dateType when dateType == typeof(DateTime) || dateType == typeof(DateTime?):
            {
                helper = helper.MapTimeStamp("\"" + prop.Name + "\"", x => (DateTime?)prop.GetValue(x, null));
                break;
            }
            case Type decimalType when decimalType == typeof(decimal) || decimalType == typeof(decimal?):
            {
                helper = helper.MapMoney("\"" + prop.Name + "\"", x => (decimal?)prop.GetValue(x, null));
                break;
            }
            case Type doubleType when doubleType == typeof(double) || doubleType == typeof(double?):
            {
                helper = helper.MapDouble("\"" + prop.Name + "\"", x => (double?)prop.GetValue(x, null));
                break;
            }
            case Type floatType when floatType == typeof(float) || floatType == typeof(float?):
            {
                helper = helper.MapReal("\"" + prop.Name + "\"", x => (float?)prop.GetValue(x, null));
                break;
            }
            case Type guidType when guidType == typeof(Guid):
            {
                helper = helper.MapUUID("\"" + prop.Name + "\"", x => (Guid)prop.GetValue(x, null));
                break;
            }
        }
    }
    return helper;
}
I use it the following way (I had an entity named Undertaking):
var undertakingHelper = BulkMapper.CreateHelper<Model.Undertaking>("dbo", nameof(Model.Undertaking));
undertakingHelper.SaveAll(transaction.UnderlyingTransaction.Connection as Npgsql.NpgsqlConnection, undertakingsToAdd);
I showed an example with a transaction, but it can also be done with a normal connection retrieved from the context. undertakingsToAdd is an enumerable of ordinary entity records that I want to bulk insert into the DB.
This solution, which I arrived at after a few hours of research and experimenting, is, as you would expect, much faster, and in the end easy to use and free! I really advise you to use it, not only for the reasons mentioned above, but also because it's the only one that gave me no problems with PostgreSQL itself; many other solutions work flawlessly only with, for example, SQL Server.

Related

cmd.executescalar() works but throws ORA-25191 Exception

My code works and the function gives me the correct select count(*) value, but it still throws an ORA-25191 exception ("Cannot reference overflow table of an index-organized table")
at retVal = Convert.ToInt32(cmd.ExecuteScalar());
Since I use the function very often, the exceptions slow down my program tremendously.
private int getSelectCountQueryOracle(string Sqlquery)
{
    try
    {
        int retVal = 0;
        using (DataTable dataCount = new DataTable())
        {
            using (OracleCommand cmd = new OracleCommand(Sqlquery))
            {
                cmd.CommandType = CommandType.Text;
                cmd.Connection = oraCon;
                using (OracleDataAdapter dataAdapter = new OracleDataAdapter())
                {
                    retVal = Convert.ToInt32(cmd.ExecuteScalar());
                }
            }
        }
        return retVal;
    }
    catch (Exception ex)
    {
        exceptionProtocol("Count Function", ex.ToString());
        return 1;
    }
}
This function is called in a foreach loop
// function call in a foreach loop that goes through the table names
foreach (DataRow row in dataTbl.Rows)
{
    ...
    tableNameFromRow = row["TABLE_NAME"].ToString();
    tableRows = getSelectCountQueryOracle("select count(*) as 'count' from " + tableNameFromRow);
    tableColumns = getSelectCountQueryOracle("SELECT COUNT(*) as 'count' FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name='" + tableNameFromRow + "'");
    ...
}
dataTbl.Rows in this outer loop, in turn, comes from the query
SELECT * FROM USER_TABLES ORDER BY TABLE_NAME
If you're using a database-agnostic API like ADO.NET, you would almost always want to use the API's own facilities to fetch metadata rather than writing custom queries against each database's metadata tables. The various ADO.NET providers are much more likely to write data dictionary queries that handle all the corner cases, and those queries are much more likely to be optimized than the ones you would write yourself. So rather than writing your own query to populate the dataTbl data table, you'd want to use the GetSchema method:
DataTable dataTbl = connection.GetSchema("Tables");
If you want to keep your custom-coded data dictionary query for some reason, you'd need to filter out the IOT overflow tables since you can't query those directly.
select *
from user_tables
where iot_type IS NULL
or iot_type != 'IOT_OVERFLOW'
Be aware, however, that there are likely to be other tables that you don't want to count. For example, the dropped column indicates whether a table has been dropped; presumably, you don't want to count the rows of an object sitting in the recycle bin, so you'd want a dropped = 'NO' predicate as well. And you can't do a count(*) on a nested table, so you'd want a nested = 'NO' predicate too if your schema happens to contain nested tables. There are probably other corner cases, depending on the exact set of features your particular schema uses, that the provider's developers have already handled and that you'd otherwise have to deal with yourself.
So I'd start with
select *
from user_tables
where ( iot_type IS NULL
or iot_type != 'IOT_OVERFLOW')
and dropped = 'NO'
and nested = 'NO'
but know that you'll probably need or want to add some additional filters depending on the specific features your users make use of. I'd certainly much rather let the fine folks who develop the ADO.NET provider worry about all those corner cases than track all of them down myself.
Taking a step back, though, I'd question why you're regularly doing a count(*) on every table in a schema, and why you need an exact answer. In most cases you're either doing a one-off count where you don't much care how long it takes (e.g. a validation step after a migration), or an approximate count would be sufficient (e.g. listing the biggest tables in the system to triage some effort, or tracking growth over time for projections). In the latter case you could just use the counts that are already stored in the data dictionary (user_tables.num_rows) from the last time statistics were gathered.
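If approximate counts are acceptable, here is a sketch of reading them from the data dictionary instead, reusing the oraCon connection from the question (num_rows is only as fresh as the last statistics gathering, and is NULL if statistics have never been gathered):
// Sketch: read optimizer statistics instead of running count(*) per table.
using (OracleCommand cmd = new OracleCommand(
    "SELECT table_name, num_rows FROM user_tables WHERE dropped = 'NO'", oraCon))
using (var reader = cmd.ExecuteReader())
{
    while (reader.Read())
    {
        string tableName = reader.GetString(0);
        string approxRows = reader.IsDBNull(1)
            ? "no stats"
            : Convert.ToInt64(reader.GetValue(1)).ToString();
        Console.WriteLine(tableName + ": " + approxRows);
    }
}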
This article helped me to solve my problem.
I've changed my query to this:
SELECT * FROM user_tables
WHERE iot_type IS NULL OR iot_type != 'IOT_OVERFLOW'
ORDER BY TABLE_NAME

Is this dynamic query vulnerable to SQL injection?

I am rather new to SQL in C# and I need some advice on SQL injection.
public System.Linq.IQueryable findBy(List<String> lWhere)
{
    string sWhere = "";
    foreach (var (sQueryPart, i) in lWhere.Select((Value, i) => (Value, i)))
    {
        if (i == 0)
        {
            sWhere = sQueryPart;
        }
        else if (i == 1)
        {
            sWhere += " = " + sQueryPart;
        }
        else if (i % 2 == 0)
        {
            sWhere += " and " + sQueryPart;
        }
        else
        {
            sWhere += " = " + sQueryPart;
        }
    }
    return this.TABLE.FromSqlRaw("SELECT * FROM TABLE WHERE {0}", sWhere);
}
This method gets a list with entries like {"COLUMN1", "VALUE1", "COLUMN2", "VALUE2", ...}.
After that, I build my WHERE clause from this list and put it into the select statement.
First of all, the list might get replaced by a dictionary; actually, I am pretty sure of that.
Secondly, my question: is this safe against SQL injection? There shouldn't be any user input other than calls to this method from program code, with no manual entries after that.
EDIT: It is important that I do not know the number of where clauses in advance; it could range from 1 to 4.
If you are building your SQL query manually by concatenating strings, you are vulnerable to SQL injection. Full stop.
I don't understand why you are even doing this, as your code implies you are using Entity Framework, which adds methods on your database entities that let you dynamically chain as many .Where() clauses as you require, precisely to remove the need to write SQL yourself. For example:
var results = dbContext.Table
.Where(t => t.Column1 == "foo")
.Where(t => t.Column2 == 42);
which will generate and execute SQL along the lines of:
select *
from Table
where Column1 = 'foo'
and Column2 = 42;
If you are using Entity Framework properly, you should almost never have to write any SQL yourself. EF will generate it for you, in a way that is not susceptible to SQL injection.
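If the number of conditions is only known at runtime, the chaining can be driven by the original column/value list. A sketch, assuming EF Core (which FromSqlRaw implies) and string-typed columns; TableEntity and dbContext.Table stand in for the question's entity and DbSet, and EF.Property resolves a mapped property by name, throwing on unknown names instead of injecting them:
// Sketch: one parameterized Where per column/value pair; nothing is
// concatenated into the SQL text.
IQueryable<TableEntity> query = dbContext.Table;
for (int i = 0; i + 1 < lWhere.Count; i += 2)
{
    string column = lWhere[i];
    string value = lWhere[i + 1];
    query = query.Where(t => EF.Property<string>(t, column) == value);
}
var results = query.ToList();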
As long as the list of strings is not formed from user input, you are safe from SQL injection; however, this is still bad code:
You should not do select * from. If the table gets a new column that you don't need (and it may very well be huge binary data), you will end up fetching it anyway, slowing your app considerably.
It's better to create a stored procedure with parameters for filtering the data. That way you protect your code from SQL injection, and you can still branch your SQL statement to perform the filtering based on the parameters passed; see the sketch below.
If at any point in time your list of strings is gathered from user input, your code will be vulnerable.
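For the stored-procedure route, a minimal sketch; the procedure name and parameters here are hypothetical, purely for illustration:
// Sketch: filtering via a parameterized stored procedure. Values travel as
// typed parameters, never as SQL text. "usp_FindRows" is a made-up name.
using (var command = new SqlCommand("usp_FindRows", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.AddWithValue("@Column1", value1);
    command.Parameters.AddWithValue("@Column2", value2);
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // map the row here
        }
    }
}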

SQL DBGeography second insert fails

This one is a strange one. I am trying to save a polygon from Google maps into MS SQL, via an MVC controller. The problem is that the first time I do it, it works, the second time it gives me the error:
The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 3 ("@2"): The supplied value is not a valid instance of data type geography. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision.
I am using EntityFramework 6.1.3, code first. The error appears on the commit line below:
var newPoly = new GenericPolygon()
{
    Name = webShape.Name,
    PolyShape = shapePolygon,
    IsEnabled = true,
    IsDeleted = false
};
_unitOfWork.PolygonRepository.Add(newPoly);
_unitOfWork.Commit();
The SQL table structure is the same as the class except that it has an int ID identity column as well, and the name is a varchar(255). The PolyShape column is of type geography.
The shapePolygon variable is defined like this, with the class adding a read-only property called "LongLat", which is used to switch from the Google LatLong to the MS LongLat format:
var shapePolygon = DbGeography.PolygonFromText("POLYGON((" + webShape.LongLat + "))", 4326);
The commit line itself calls the db context save method (I'm using UoW pattern to cut down on code):
this.context.SaveChanges();
I can't for the life of me figure out why it works once, and then not again, unless I restart my VS (running VS 2013 with IIS Express - SQL 2008 R2 Enterprise on a server).
Any help or pointers would be appreciated :-)
I seem to have narrowed down the issue, and whilst this is more of a workaround than an answer, it may help someone else.
The issue is the version of SQL Server, namely SQL 2008 R2 10.50.4000. I migrated my database to SQL Server 2012 build 11.0.5058, after which the code worked, every time.
Hope this helps someone!
I just had this and solved it by reversing the points in the polygon. SQL Server's geography type is particular about ring orientation (the interior must lie to the left as you traverse the ring), so a ring wound the other way round can be rejected.
So instead of having a string concatenation like strGeog += string.Format("{0} {1}, ", latlong[0], latlong[1]); I changed it to:
foreach (XmlNode xnPoly in xmlPolyList)
{
    strGeog = "";
    firstlatlong = null;
    if (xnPoly["coordinates"] != null)
    {
        latlongpairs = xnPoly["coordinates"].InnerText.Replace("\n", "").Split(' ');
        foreach (string ll in latlongpairs)
        {
            latlong = ll.Split(',');
            if (firstlatlong == null) firstlatlong = latlong;
            // Prepend instead of append, which reverses the ring's point order.
            strGeog = string.Format("{0} {1}, ", latlong[0], latlong[1]) + strGeog;
        }
    }
    if (strGeog.Length > 0)
    {
        strGeog = strGeog.Substring(0, strGeog.Length - 2); // trim off the last comma and space
        strGeog = "POLYGON((" + string.Format("{0} {1} ", firstlatlong[0], firstlatlong[1]) + strGeog + "))"; // conversion from WKT needs it to come back to the first point
    }
    i++;
    dbPCPoly = new PostCodePolygon();
    dbPCPoly.geog = DbGeography.PolygonFromText(strGeog, 4326);
    LocDB.PostCodePolygons.Add(dbPCPoly);
    LocDB.SaveChanges();
    Console.WriteLine(string.Format("Added Polygon {0} for Postcode ({1})", dbPCPoly.PCPolyID, dbPC.PostCodeName));
}

Best way to build a query with conditions in code-behind?

How can I write this code properly? I am not satisfied with it, and I'm a bit lost.
I'll give you a simple example, but the real query is more complex.
Thanks in advance.
string aValue;
string queryA;
string queryB;
string finalQuery;

queryA = @"SELECT column1 FROM table1 WHERE column1=";
queryA += aValue;

queryB = @"SELECT column1, column2,";
if (aValue == "all")
{
    queryB += @" column3";
}
queryB += @" FROM table1 WHERE column1=";
queryB += @"'" + aValue + "'";

private void exportExcel()
{
    // change the value with a dropdownlist
    if (ddlType.SelectedIndex == 1)
        aValue = "typeA";
    else if (ddlType.SelectedIndex == 2)
        aValue = "typeB";
    else
        aValue = "all";
    // select the query
    if (aValue == "typeA")
        finalQuery = queryA;
    else if (aValue == "typeB")
        finalQuery = queryB;
    ExecQUery(finalQuery);
}
In both Java and C# (and on pretty much any other platform) you should definitely not include the values directly in the SQL. That opens the way to SQL injection attacks, and also makes formatting of dates, times and numbers tricky.
Instead, you should use parameterized SQL, specifying the values in the parameters. How you do that varies between Java and C#, but the principle is the same.
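For instance, a minimal C# sketch of the idea, reusing the illustrative table1/column1 names from the question:
// Sketch: the value is sent as a typed parameter, not spliced into the SQL.
string sql = "SELECT column1, column2 FROM table1 WHERE column1 = @value";
using (var command = new SqlCommand(sql, connection))
{
    command.Parameters.AddWithValue("@value", aValue);
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            Console.WriteLine(reader.GetString(0));
        }
    }
}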
Another approach on both platforms is to use an ORM of some description rather than building queries by hand. For example, in .NET you might want to use a LINQ provider of some description, and in Java you might want to use something like Hibernate. Either way you get to express your queries at a higher level of abstraction than just the raw SQL.
It's hard to give much more concrete advice without knowing what platform you're really using (or database) and without a real query to look at.
One small change you can make is to set the value attribute of the dropdown list to typeA, typeB, etc., and get rid of the initial if conditions and variables.
e.g.:
if (ddlType.SelectedValue.ToString() == "typeA")
    finalQuery = queryA;
if (ddlType.SelectedValue.ToString() == "typeB")
    finalQuery = queryB;
I usually load the SQL from a resource file. This gives you some freedom to change the queries (in case you don't need to generate them dynamically with if blocks). In source code I format it by ending each line with a comment, to stop my IDE from concatenating it all onto one line, like:
String sql = "select " + //
" * " + //
"from "+ //
" employee " + //
"where " + //
" salary > :minSal " + //
" and startDate > :minStartDate";
And for the conditional part I just add it with an if block. For the where clause I first add a default "1=1", so that additional limitations can simply be appended; if there are no additional limitations, the query is still valid. Suppose both where conditions from the SQL above had to be added conditionally; you would start from this base:
String sql = "select " + //
" * " + //
"from "+ //
" employee " + //
"where 1 = 1 ";
Up to here you have your base SQL, which is valid; if no condition is added, the query still runs.
Suppose you add the salary limitation only when it is provided:
if (salary != null) {
sql += "and salary > :minSalary";
parameters.put("minSalary", salary);
}
As you can see, in the same if block I add a new expression to my SQL and a parameter to a map that is used later, at execution time, to set the parameters on the query; that saves you from writing a second if statement just to set the parameter.
Another approach you could take is to build the entire SQL and, before execution, ask the prepared statement which parameters it needs as input, then provide them. In Java you can do that with:
http://download.oracle.com/javase/1.4.2/docs/api/java/sql/PreparedStatement.html#getParameterMetaData%28%29
I know that is not the case here, but if an ORM is used, it is common to have builders for queries, which makes this task much easier. For example, in Hibernate you could have something like:
List cats = sess.createCriteria(Cat.class)
    .add( Restrictions.like("name", "F%") )
    .addOrder( Order.asc("name") )
    .addOrder( Order.desc("age") )
    .setMaxResults(50)
    .list();
As it is documented at:
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/querycriteria.html
That means you could do this:
Criteria c = sess.createCriteria(Cat.class)
    .addOrder( Order.asc("name") )
    .addOrder( Order.desc("age") )
    .setMaxResults(50);
if (name != null) {
    c.add( Restrictions.like("name", name) );
}
List cats = c.list();

Check if table exists in c#

I want to read data from a table whose name is supplied by a user, so before actually starting to read data, I want to check whether that table exists.
I have seen several pieces of code on the net which claim to do this. However, they all seem to work only for SQL Server, or for MySQL, or some other implementation. Is there not a generic way to do this?
(I am already separately checking that I can connect to the supplied database, so I'm fairly certain a connection can be opened.)
You cannot do this in a cross-database way. Generally DDL (that is, the code for creating tables, indexes and so on) is completely different from database to database and so the logic for checking whether tables exist is also different.
I would say the simplest answer, though, would simply be something like:
SELECT * FROM <table> WHERE 1 = 0
If that query gives an error, then the table doesn't exist. If it works (though it'll return 0 rows) then the table exists.
Be very careful with what you let the user input, though. What's to stop them from specifying "sysusers" as the table name (which, in SQL Server, is the list of all database users)?
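A sketch of that probe wrapped up in C#, provider-agnostic via DbConnection (and remember the warning above about validating tableName):
// Sketch: probe for a table with a query guaranteed to return no rows.
static bool TableExists(DbConnection connection, string tableName)
{
    try
    {
        using (var cmd = connection.CreateCommand())
        {
            // Validate/whitelist tableName first; it is concatenated here.
            cmd.CommandText = "SELECT * FROM " + tableName + " WHERE 1 = 0";
            using (var reader = cmd.ExecuteReader()) { }
            return true;
        }
    }
    catch (DbException)
    {
        return false; // the query failed, so the table (most likely) doesn't exist
    }
}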
You can use the DbConnection.GetSchema family of methods to retrieve metadata about the database. It returns a DataTable of schema objects. The exact object types and restriction values may vary from vendor to vendor, but I'm sure you can set up your check for a specific table in a way that works with most databases.
Here's an example of using GetSchema that prints the name and owner of every table owned by "schema name" and called "table name". It has been tested against Oracle.
static void Main(string[] args)
{
    string providerName = @"System.Data.OracleClient";
    string connectionString = @"...";
    DbProviderFactory factory = DbProviderFactories.GetFactory(providerName);
    using (DbConnection connection = factory.CreateConnection())
    {
        connection.ConnectionString = connectionString;
        connection.Open();
        DataTable schemaDataTable = connection.GetSchema("Tables", new string[] { "schema name", "table name" });
        foreach (DataColumn column in schemaDataTable.Columns)
        {
            Console.Write(column.ColumnName + "\t");
        }
        Console.WriteLine();
        foreach (DataRow row in schemaDataTable.Rows)
        {
            foreach (object value in row.ItemArray)
            {
                Console.Write(value.ToString() + "\t");
            }
            Console.WriteLine();
        }
    }
}
That's like asking "is there a generic way to get related data" across databases. The answer is of course no; the only "generic way" is to have a data layer that hides the implementation details of your particular data source and queries it appropriately.
If you are really supporting and accessing many different types of databases without a Strategy design pattern or a similar approach, I would be quite surprised.
That being said, the best approach is something like this bit of code:
bool exists;
try
{
    // ANSI SQL way. Works in PostgreSQL, MSSQL, MySQL.
    // Note: tableName is concatenated into the SQL here; validate it first
    // (or pass it as a parameter) if it can come from user input.
    var cmd = new OdbcCommand(
        "select case when exists((select * from information_schema.tables where table_name = '" + tableName + "')) then 1 else 0 end");
    exists = (int)cmd.ExecuteScalar() == 1;
}
catch
{
    try
    {
        // Other RDBMS. Graceful degradation.
        exists = true;
        var cmdOthers = new OdbcCommand("select 1 from " + tableName + " where 1 = 0");
        cmdOthers.ExecuteNonQuery();
    }
    catch
    {
        exists = false;
    }
}
Source: Check if a SQL table exists
You can do something like this:
string strCheck = "SHOW TABLES LIKE 'tableName'";
cmd = new MySqlCommand(strCheck, connection);
if (connection.State == ConnectionState.Closed)
{
    connection.Open();
}
cmd.Prepare();
var reader = cmd.ExecuteReader();
if (reader.HasRows)
{
    Console.WriteLine("Table exists!");
}
else
{
    Console.WriteLine("Table does not exist!");
}
