I'm using SQL CE 4.0 and am running into a StackOverflowException when the SqlCeDataAdapter attempts to Fill the DataTable.
The code is pretty standard:
using (var cmd = new SqlCeCommand(sql, conn, tran))
using (var dt = new DataTable())
{
    using (var da = new SqlCeDataAdapter(cmd))
        da.Fill(dt);

    IList<DataRow> rows = dt.Rows.OfType<DataRow>().ToList();
    foreach (DataRow row in rows)
    {
        // Processing ...
    }
}
The SQL string is of the form:
SELECT * FROM [Table] WHERE Id IN ( ... )
The total character count is 1,068,369 because the IN list contains 27,393 uniqueidentifier values.
Executing that query should return one row per listed Id. Instead, the StackOverflowException occurs. I feel as though I am running up against some sort of SQL CE 4.0 limitation.
I've read Understanding Databases (SQL Server Compact) > Database Objects, and under the Queries section it reads:
Characters in an SQL statement | Unlimited
1,068,369 characters is less than unlimited; PASS.
I've read the DataTable Class documentation, and under the Remarks section it reads:
The maximum number of rows that a DataTable can store is 16,777,216
27,393 rows is less than 16,777,216; PASS.
Is there something else that I'm missing?
"The code is pretty standard" (!) - but the SQL statement is not!
It is obviously an engine bug/doc error, but do not expect a fix from Microsoft.
You will have to change the code to work with smaller batches of ids, so that the statement is no longer roughly 1 MB of SQL containing 27,000 values.
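For example, a minimal batching sketch (the ids list, the 500-value batch size, and accumulating everything into one DataTable are illustrative assumptions, not from the original post):

// Split the 27,393 ids into smaller IN (...) batches and let Fill append
// each batch's rows into the same DataTable. Assumes "ids" is the in-memory
// List<Guid> the IN list was built from, and that conn/tran are the same
// SqlCeConnection/SqlCeTransaction as in the question.
const int batchSize = 500;   // illustrative size, tune as needed
var dt = new DataTable();

for (int offset = 0; offset < ids.Count; offset += batchSize)
{
    string inList = string.Join(",",
        ids.Skip(offset).Take(batchSize).Select(id => "'" + id.ToString() + "'"));

    string batchSql = "SELECT * FROM [Table] WHERE Id IN (" + inList + ")";

    using (var cmd = new SqlCeCommand(batchSql, conn, tran))
    using (var da = new SqlCeDataAdapter(cmd))
    {
        da.Fill(dt);   // Fill appends the rows from each batch
    }
}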
This sounds like your SQL server is running out of temp space when running this query. You can test this by changing the query to return only one row, without the IN clause. If that works, then as mentioned before, check (or ask your DBA to check) that the system has been allocated the resources to run the query and build the result set before returning it to your application. P.S. If multiple people use this application at the same time, this problem will scale horribly.
I have read and implemented several different versions of Microsoft's suggested methods for querying a SQL Server database. In all that I have read, each query is surrounded by a using statement, e.g., in some method DoQuery:
List<List<string>> DoQuery(string cStr, string query)
{
    var rows = new List<List<string>>();

    using (SqlConnection c = new SqlConnection(cStr))
    {
        c.Open();
        using (SqlCommand cmd = new SqlCommand(query, c))
        {
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // read columns and put into list to return
                    var cols = new List<string>();
                    for (int i = 0; i < reader.FieldCount; i++)
                        cols.Add(reader.IsDBNull(i) ? null : reader[i].ToString());
                    rows.Add(cols);
                }
                // close all of the using blocks
            }
        }
    }

    // return the list of rows containing the list of column values.
    return rows;
}
I need to run this code several hundred times for different query strings against the same database. It seems that creating a new connection each time would be inefficient and dropping it each time wasteful.
How should I structure this so that it is efficient? When I tried not using a using block and passing the connection into the DoQuery method, I got errors saying the connection had not been closed. If I closed it after the query, I got errors saying it wasn't open.
I'm also trying to improve this because I keep getting somewhat random
IOException: Unable to read data from the transport connection: Operation on non-blocking socket would block.
I'm the only user of the database at this time and I'm not doing anything in multiple threads or async, etc. Just looping through query strings and running DoQuery on them.
Could my structure be part of that problem, i.e. not releasing the resources fast enough and thereby seeing the connection blocked?
I'm stuck here on efficiency and this blocking problem. Thanks in advance.
As it turns out, the query structure was fine and the queries were fine. The problem was that I had an 'ORDER BY X DESC' on each query and that column was not indexed. This caused a full table scan to order the rows, even when only 2 were returned. The table has about 3 million rows, and I thought it could handle that better than it does. It timed out with a 360-second connection timeout! I indexed the column and there is no more 'blocking' nonsense, which, BTW, is a horrible message to return when it was actually a timeout. The queries now run fine as long as I index every column that appears in a WHERE clause.
SqlCommand query = new SqlCommand();
query.Connection = dbconn;
try {
    query.CommandText = "SELECT email,LastSync FROM users WHERE email IN ('alexander@contoso.com')";
    sdr = query.ExecuteReader();
    while (sdr.Read())
        if (!sdr.IsDBNull(1)) syncedUsers.Add(sdr.GetString(0));
    sdr.Close();
} catch (Exception e) {
    logger.Log(e.Message);
    List<string> fields = new List<string>();
    if (sdr != null)
        for (int i = 0; i < sdr.FieldCount; i++)
        {
            fields.Add(sdr.GetName(i));
        }
    logger.Log(String.Join(",", fields));
}
This sometimes (not always) throws the error:
Unable to cast object of type 'System.Int32' to type 'System.String'.
and the one time that I looked, the logging then contained a list of all fields of the table, not only the two requested fields:
id,email,LastSync,...
although the query is about two columns only. id is an int and email is a string, so I assume the error is thrown in GetString(0). The error is not reliably reproducible; I have not yet gotten it while VS debugging was enabled.
Now, as long as that reader is open, the field list cannot change, or so I think. Is it possible under certain circumstances that the SQL Server returns more fields than I requested?
It is a SQL Server Express 2012.
I know that this can be "solved" by using column names instead of indexes, but before I do, I want to make sure that this is really the issue. I cannot reproduce the issue across all machines I deployed the code to, only certain machines seem to be affected (it doesn't seem to depend on the # of cores available to the machine, nor on the operating system or the .NET version)
EDIT: I now have the same problem when using ExecuteScalar(), on the same table. Doesn't happen with other tables, as far as I can see. I try to get a string using the query
SELECT displayName FROM users WHERE email=@email
using ExecuteScalar(), and sometimes the result is 4, which is the id of the row I requested. So it seems to get the right row, but not the right columns.
This is really weird!
I'm at the end of my rope on this issue. Trying to connect to a FoxPro directory hosted locally, using the Microsoft OLE DB Provider for Visual FoxPro 9.0 with the following code:
using (var con = new OleDbConnection(@"Data Source=C:\FoxDB;Provider=VFPOLEDB.1;"))
{
    con.Open();
    using (var cmd = new OleDbCommand("select * from order", con))
    {
        var dr = cmd.ExecuteReader();
        while (dr.Read())
        {
            Debug.WriteLine(dr["ord_id"]);
        }
    }
}
Executing the code does not throw an exception, but there are zero rows returned. The schema is discovered and returned, as it has 72 fields present when I examine the data reader (I've done the same with data tables, data sets, data adapters, etc. All return the schema, but zero results).
Building an SSIS package to access the same table and pull it into a MSSQL database results in 3,828 records being pulled in. order.dbf on disk is 884 KB, which seems to jibe with the SSIS results I've pulled in.
I've tried adding Collation Sequence to the connection string, both Machine and General to no effect.
Please tell me there's something obvious I'm missing!
UPDATE: So apparently there's something that I just don't understand about FoxPro. If I change the command type to CommandType.TableDirect and switch the command text to just be order, it returns all the rows. Any insight would be appreciated.
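Roughly, the TableDirect variant that returns rows looks like this (a minimal sketch based on the code above, not a full solution):

using (var con = new OleDbConnection(@"Data Source=C:\FoxDB;Provider=VFPOLEDB.1;"))
{
    con.Open();
    using (var cmd = con.CreateCommand())
    {
        // With the OLE DB provider, TableDirect means the command text
        // is just the table name, no SELECT statement.
        cmd.CommandType = CommandType.TableDirect;
        cmd.CommandText = "order";

        using (var dr = cmd.ExecuteReader())
        {
            while (dr.Read())
            {
                Debug.WriteLine(dr["ord_id"]);
            }
        }
    }
}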
I think the problem is not with FoxPro; testing with the exact same code, I can get the result (I created a free test table in c:\FoxDB). Ensure that you are using the latest VFPOLEDB driver. It looks like the problem lies within your table.
BTW, order is a keyword, so you'd better write it as:
"select * from [order]"
although it would work as you did (VFP is forgiving in that regard). The problem might also lie in the collation sequence you have used (I never use collation sequences other than machine; they are problematic in Turkish, and I expect the same in some other languages).
I have a DataTable that comes from a SQL query. While I am really working against a table using OLE DB, I have the same problem even if I get the table from my SQL Server.
If I fill the DataTable and then inspect the DataColumns, they all say AllowDBNull == true and AllowNull == true. But if I look at the table in SSMS, it states otherwise.
string selectStmt = "Select * from foobar;";
DataSet NewData = new DataSet();

using (SqlConnection DataConn = new SqlConnection(MyConnectionString))
{
    SqlDataAdapter DataAdapter = new SqlDataAdapter(selectStmt, DataConn);
    var Results = DataAdapter.Fill(NewData, tableName);
}

DataColumn Col = NewData.Tables[0].Columns[0];
// Col.AllowDBNull is always true, as is Col.AllowNull
I also can't seem to figure out where to get the length of a string field.
This makes it a little difficult to implement some simple client side error checking before I try to upload data.
If I were only dealing with SQL server based tables, I could use Microsoft.SqlServer.Management.Sdk and Microsoft.SqlServer.Management.Smo. Since I am not, that's out.
Try
var Results = DataAdapter.FillSchema(NewData, SchemaType.Source, tableName);
See if that gives you the level of schema detail you need.
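For instance, a fuller sketch reusing the Fill code from the question (the properties read at the end are what FillSchema populates for the source columns):

// FillSchema first, then Fill, so the DataColumns carry the source constraints.
DataSet NewData = new DataSet();

using (SqlConnection DataConn = new SqlConnection(MyConnectionString))
{
    SqlDataAdapter DataAdapter = new SqlDataAdapter(selectStmt, DataConn);
    DataAdapter.FillSchema(NewData, SchemaType.Source, tableName);
    DataAdapter.Fill(NewData, tableName);
}

foreach (DataColumn col in NewData.Tables[0].Columns)
{
    // AllowDBNull now reflects the source column's nullability, and
    // MaxLength gives the declared length of string columns (-1 otherwise).
    Console.WriteLine("{0}: AllowDBNull={1}, MaxLength={2}",
        col.ColumnName, col.AllowDBNull, col.MaxLength);
}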
A result set isn't going to know column schema data like that; it would be too intensive an operation to do per command execution. Instead, the runtime creates schema information on the fly using only the data it gets back in the result set. For a full-blown schema you'd have to use something like EF or code the schema yourself. The only thing you can rely on in a runtime schema is the data type (unless the data columns were specifically coded with their attributes).
To properly test for DBNull you do this:
if (dataRow[colNameOrIndex] == DBNull.Value)
{
    // null
}
I am using this code to insert 1 million records into an empty table in the database. OK, so without much code, I will start from the point where I have already interacted with the data and read the schema into a DataTable:
So:
DataTable returnedDtViaLocalDbV11 = DtSqlLocalDb.GetDtViaConName(strConnName, queryStr, strReturnedDtName);
And now that we have returnedDtViaLocalDbV11, let's create a new DataTable to be a clone of the source database table:
DataTable NewDtForBlkInsert = returnedDtViaLocalDbV11.Clone();
Stopwatch SwSqlMdfLocalDb11 = Stopwatch.StartNew();

NewDtForBlkInsert.BeginLoadData();
for (int i = 0; i < 1000000; i++)
{
    NewDtForBlkInsert.LoadDataRow(new object[] { null, "NewShipperCompanyName" + i.ToString(), "NewShipperPhone" }, false);
}
NewDtForBlkInsert.EndLoadData();

DBRCL_SET.UpdateDBWithNewDtUsingSQLBulkCopy(NewDtForBlkInsert, tblClients._TblName, strConnName);

SwSqlMdfLocalDb11.Stop();
var ResSqlMdfLocalDbv11_0 = SwSqlMdfLocalDb11.ElapsedMilliseconds;
This code populates 1 million records into an embedded SQL database (localDb) in 5200 ms. The rest of the code just implements the bulk copy, but I will post it anyway.
public string UpdateDBWithNewDtUsingSQLBulkCopy(DataTable TheLocalDtToPush, string TheOnlineSQLTableName, string WebConfigConName)
{
    // Open a connection to the database.
    using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings[WebConfigConName].ConnectionString))
    {
        connection.Open();

        // Perform an initial count on the destination table.
        SqlCommand commandRowCount = new SqlCommand("SELECT COUNT(*) FROM " + TheOnlineSQLTableName + ";", connection);
        long countStart = System.Convert.ToInt32(commandRowCount.ExecuteScalar());

        var nl = "\r\n";
        string retStrReport = "";
        retStrReport = string.Concat(string.Format("Starting row count = {0}", countStart), nl);
        retStrReport += string.Concat("==================================================", nl);

        // Create a table with some rows.
        //DataTable newCustomers = TheLocalDtToPush;

        // Create the SqlBulkCopy object.
        // Note that the column positions in the source DataTable
        // match the column positions in the destination table so
        // there is no need to map columns.
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = TheOnlineSQLTableName;
            try
            {
                // Write from the source to the destination.
                for (int colIndex = 0; colIndex < TheLocalDtToPush.Columns.Count; colIndex++)
                {
                    bulkCopy.ColumnMappings.Add(colIndex, colIndex);
                }
                bulkCopy.WriteToServer(TheLocalDtToPush);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }

        // Perform a final count on the destination
        // table to see how many rows were added.
        long countEnd = System.Convert.ToInt32(commandRowCount.ExecuteScalar());

        retStrReport += string.Concat("Ending row count = ", countEnd, nl);
        retStrReport += string.Concat("==================================================", nl);
        retStrReport += string.Concat((countEnd - countStart), " rows were added.", nl);
        retStrReport += string.Concat("New Customers Was updated successfully", nl, "END OF PROCESS !");
        //Console.ReadLine();
        return retStrReport;
    }
}
Trying it via a connection to SQL Server took around 7000 ms at best, and ~7700 ms on average. Going via a random key-value NoSQL database took around 40 seconds (really, I did not even keep records of it once it took more than twice as long as the SQL variants). So... is there a faster way than what I was testing in my code?
Edit
I am using Win7 x64 with 8 GB RAM and, most importantly I should think, an i5 at 3 GHz, which is not so great by now.
The 3x 500 GB WD drives in RAID-0 do the job even better.
But I am just saying: if you check this on your PC, just compare it to any other method in your configuration.
Have you tried SSIS? I have never written an SSIS package with a localdb connection, but this is the sort of activity SSIS should be well suited for.
If your data source is a SQL Server, another idea would be setting up a linked server. Not sure if this would work with localdb. If you can set up a linked server, you could bypass the C# altogether and load your data with an INSERT .. SELECT ... FROM ... SQL statement.
You can use Dapper.NET.
Dapper is a micro-ORM; it executes a query and maps the results to a strongly typed list.
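A minimal sketch of what that looks like (the Shipper class, Shippers table, and connectionString are illustrative assumptions, not from the question):

// Dapper adds Query<T>() / Execute() extension methods to IDbConnection
// (install the Dapper NuGet package and add "using Dapper;").
public class Shipper
{
    public int ShipperId { get; set; }
    public string CompanyName { get; set; }
    public string Phone { get; set; }
}

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    // Execute the query and map the rows straight to a strongly typed list.
    List<Shipper> shippers = connection
        .Query<Shipper>("SELECT ShipperId, CompanyName, Phone FROM Shippers")
        .ToList();
}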
Object-relational mapping (ORM, O/RM, and O/R mapping) in computer software is a programming technique for converting data between incompatible type systems in object-oriented programming languages. This creates, in effect, a “virtual object database” that can be used from within the programming language
For more info:
check out https://code.google.com/p/dapper-dot-net/
GitHub Repository: https://github.com/SamSaffron/dapper-dot-net
Hope it helps.
Remove the looping... In SQL, try to make a table with 1 million rows... and left join it; use this for the insert/select of the data.
Try sending it without storing it in a datatable.
See the example at the end of this post, which allows you to do it with an enumerator: http://www.developerfusion.com/article/122498/using-sqlbulkcopy-for-high-performance-inserts/
If you are just creating nonsense data, create a stored procedure and just call that through .NET.
If you are passing real data, again, passing it to a stored proc would be quicker, but you would be best off dropping the table and recreating it with the data.
If you insert one row at a time, it will take longer than inserting it all at once. It will take even longer if you have indexes to write.
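A rough sketch of the reader-based streaming approach mentioned above (the connection strings and table names are illustrative assumptions):

using (var source = new SqlConnection(sourceConnectionString))
using (var destination = new SqlConnection(destinationConnectionString))
{
    source.Open();
    destination.Open();

    using (var cmd = new SqlCommand("SELECT * FROM dbo.SourceTable", source))
    using (SqlDataReader reader = cmd.ExecuteReader())
    using (var bulkCopy = new SqlBulkCopy(destination))
    {
        bulkCopy.DestinationTableName = "dbo.DestinationTable";
        bulkCopy.BatchSize = 10000;     // send rows in batches instead of one huge buffer
        bulkCopy.BulkCopyTimeout = 0;   // no timeout for a large load

        // WriteToServer(IDataReader) streams rows as they are read,
        // so the whole set never has to sit in a DataTable in memory.
        bulkCopy.WriteToServer(reader);
    }
}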
Create a single XML document for all the rows you want to save into the database. Pass this XML to a SQL stored procedure and save all the records in one call.
But your stored procedure must be written so that it can read all the rows and then insert them into the table.
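On the client side that could look roughly like this (usp_InsertClientsFromXml is a hypothetical procedure name; the procedure itself would have to shred the XML, e.g. with nodes()/value(), and do the INSERT):

// Serialize the DataTable to XML and hand the whole document to one proc call.
// Assumes System.IO and System.Data.SqlClient are referenced.
string xml;
using (var writer = new StringWriter())
{
    NewDtForBlkInsert.WriteXml(writer);
    xml = writer.ToString();
}

using (var connection = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("usp_InsertClientsFromXml", connection))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@clientsXml", SqlDbType.Xml).Value = xml;

    connection.Open();
    cmd.ExecuteNonQuery();   // one round trip for all rows
}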
If this is a new project, I recommend you use Entity Framework. In this case you can create a List<> of objects with all the data you need and then simply add it entirely to the corresponding table.
This way you quickly get the needed data and then send it to the database all at once.
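A sketch of that approach (the Shipper entity and ShippersContext are hypothetical names; with EF6, disabling change tracking during the bulk add helps, as shown):

public class Shipper
{
    public int Id { get; set; }
    public string CompanyName { get; set; }
    public string Phone { get; set; }
}

public class ShippersContext : DbContext
{
    public DbSet<Shipper> Shippers { get; set; }
}

// Build the list in memory, then push it to the table with one SaveChanges call.
var shippers = new List<Shipper>();
for (int i = 0; i < 1000000; i++)
{
    shippers.Add(new Shipper { CompanyName = "NewShipperCompanyName" + i, Phone = "NewShipperPhone" });
}

using (var context = new ShippersContext())
{
    context.Configuration.AutoDetectChangesEnabled = false;   // speeds up bulk adds in EF6
    context.Shippers.AddRange(shippers);
    context.SaveChanges();
}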
I agree with Mike on SSIS, but it may not suit your environment; however, for ETL processes that involve cross-server calls and general data flow, it is a great built-in and highly integrated tool.
With 1 million rows you will likely have to do a bulk insert. Depending on the row size, you would not really be able to use a stored procedure unless you did this in batches. A DataTable will fill memory pretty quickly, again depending on the row size. You could make a stored procedure that takes a table type and call it every X number of rows, but why would we do this when you already have a better, more scalable solution? That million rows could be 50 million next year.
I have used SSIS a bit, and if it is an organizational fit I would suggest looking at it, but it wouldn't be a one-time answer and wouldn't be worth the dependencies.