Selecting million records from SQL Server

We need to index (in ASP.NET) all our records stored in a SQL Server table. That table has around 2M records with text (nvarchar) data too in each row.
Is it okay to fetch all records in one go as we need to index them (for search)? What is the other option (I want to avoid pagination)?
Note: I am not displaying these records, just need all of them in one go so that I can index them via a background thread.
Do I need to set a longer timeout for my query? If so, what is the most effective way to set it when I am running the query from an ASP.NET page?

If I needed something like this, just thinking about it from the database side, I'd probably export it to a file. Then that file can get moved around pretty easily. Moving around data sets that large is a huge pain to all involved. You can use SSIS, sqlcmd or even bcp in a batch command to get it done.
Then, you just have to worry about what you're doing with it on the app side, no worries about locking & everything on the database side once you've exported it.

I don't think a page is a good place for this regardless. There should be a different process or program that does this. On a related note maybe something like http://incubator.apache.org/lucene.net/ would help you?

"Is it okay to fetch all records in one go as we need to index them (for search)? What is the other option (I want to avoid pagination)?"
Memory management issue / performance issue:
You may hit a System.OutOfMemoryException if you bring in 2 million records at once, because all of those records will be held in a DataSet, and the DataSet lives in RAM.
"Do I need to set any long time outs for my query? If yes, what is the most effective method for setting longer time outs if I am running the query from ASP.NET page?"
using (System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand())
{
    // A CommandTimeout of 0 means no limit: the command waits indefinitely.
    cmd.CommandTimeout = 0;
}
Suggestion:
It's better to filter the records at the database level.
Fetch all records from the database and save them to a file, then access that file for any intermediate operations.
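As a minimal sketch of the save-to-a-file suggestion, rows can be streamed from a data reader to a writer one at a time, so that only the current row is in memory rather than a whole DataSet. The RecordExporter class and its tab-separated format are assumptions made for illustration, not part of the original answer; a real export would also need to escape tabs and line breaks inside nvarchar values.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.IO;

// Hypothetical helper (not from the original answer).
public static class RecordExporter
{
    // Writes one tab-separated line per row and returns the number of rows written.
    // Only the current row is held in memory while the reader advances.
    public static int Export(IDataReader reader, TextWriter output)
    {
        int rows = 0;
        var fields = new string[reader.FieldCount];
        while (reader.Read())
        {
            for (int i = 0; i < fields.Length; i++)
            {
                // NULL columns become empty strings in this simplified format.
                fields[i] = reader.IsDBNull(i)
                    ? string.Empty
                    : Convert.ToString(reader.GetValue(i), CultureInfo.InvariantCulture);
            }
            output.WriteLine(string.Join("\t", fields));
            rows++;
        }
        return rows;
    }
}
```

Against SQL Server, the reader would come from SqlCommand.ExecuteReader() and the writer from a StreamWriter over the target file. Both are passed in here so the method works with any IDataReader.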

What you describe is Extract, Transform, Load (ETL). There are 2 options I'm aware of:
SSIS, which is part of SQL Server
Rhino.ETL
I prefer Rhino.Etl as it's completely written in C#, you can create scripts in Boo, and it's much easier to test and compose ETL processes. The library is built to handle large sets of data, so memory management is built in.
One final note: while ASP.NET might be the entry point to start the indexing process, I wouldn't run the process within ASP.NET, as it could take minutes or hours depending on the number of records and the processing involved.
Instead, have ASP.NET be the entry point that fires off a background task to process the records. Ideally, that task is completely independent of ASP.NET, so you avoid any timeout or shutdown issues.

Process your records in batches. You are going to have two main issues: (1) you need to index all of the existing records, and (2) you will want to update the index with records that were added, updated or deleted. It might sound easier just to drop the index and recreate it, but that should be avoided if possible. Below is an example of processing [Production].[TransactionHistory] from the AdventureWorks2008R2 database in batches of 10,000 records. It does not load all of the records into memory. On my local computer the output is: Processed 113443 records in 00:00:00.2282294. Obviously, this doesn't take into account a remote database server or the processing time for each record.
using System;
using System.Collections.Generic;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;
class Program
{
    private static string ConnectionString
    {
        get { return ConfigurationManager.ConnectionStrings["db"].ConnectionString; }
    }
    static void Main(string[] args)
    {
        int recordCount = 0;
        int lastId = -1;
        bool done = false;
        Stopwatch timer = Stopwatch.StartNew();
        do
        {
            // Assume this is the last batch; any row returned resets the flag.
            done = true;
            IEnumerable<TransactionHistory> transactionDataRecords = GetTransactions(lastId, 10000);
            foreach (TransactionHistory transactionHistory in transactionDataRecords)
            {
                // Remember the last ID seen so the next batch starts after it.
                lastId = transactionHistory.TransactionId;
                done = false;
                recordCount++;
            }
        } while (!done);
        timer.Stop();
        Console.WriteLine("Processed {0} records in {1}", recordCount, timer.Elapsed);
    }
    /// Get a new open connection
    private static SqlConnection GetOpenConnection()
    {
        SqlConnection connection = new SqlConnection(ConnectionString);
        connection.Open();
        return connection;
    }
    private static IEnumerable<TransactionHistory> GetTransactions(int lastTransactionId, int count)
    {
        const string sql = "SELECT TOP(@count) [TransactionID],[TransactionDate],[TransactionType] FROM [Production].[TransactionHistory] WHERE [TransactionID] > @LastTransactionId ORDER BY [TransactionID]";
        return GetData<TransactionHistory>((connection) =>
        {
            SqlCommand command = new SqlCommand(sql, connection);
            command.Parameters.AddWithValue("@count", count);
            command.Parameters.AddWithValue("@LastTransactionId", lastTransactionId);
            return command;
        }, DataRecordToTransactionHistory);
    }
    // Function to convert a data record to the TransactionHistory object
    private static TransactionHistory DataRecordToTransactionHistory(IDataRecord record)
    {
        TransactionHistory transactionHistory = new TransactionHistory();
        transactionHistory.TransactionId = record.GetInt32(0);
        transactionHistory.TransactionDate = record.GetDateTime(1);
        transactionHistory.TransactionType = record.GetString(2);
        return transactionHistory;
    }
    // Streams rows lazily: each record is yielded as it is read, so a batch is never fully buffered here.
    private static IEnumerable<T> GetData<T>(Func<SqlConnection, SqlCommand> commandBuilder, Func<IDataRecord, T> dataFunc)
    {
        using (SqlConnection connection = GetOpenConnection())
        {
            using (SqlCommand command = commandBuilder(connection))
            {
                using (IDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        T record = dataFunc(reader);
                        yield return record;
                    }
                }
            }
        }
    }
}
public class TransactionHistory
{
    public int TransactionId { get; set; }
    public DateTime TransactionDate { get; set; }
    public string TransactionType { get; set; }
}

Related

Field length is > maximum

I'm working on some middleware in C# using the System.Data.Odbc library to interact with a v10 PSQL database. I have a set of working select and insert queries I run in sequence where occasionally the full sequence will execute without issue but most of the time for each INSERT query in the sequence my error handling catches the exception:
ERROR [HY000][Pervasive][ODBC Client Interface][LNA][Pervasive][ODBC Engine Interface][Data Record Manager]Field length is > maximum
I'm trying to understand what this means and how to resolve it.
This is on a Windows 2008 R2 server. I'm using C# in Visual Studio Community 2015 to take information from a web system (no issues here) and add sales orders to the other system sitting on the server, which uses a Pervasive SQL v10 database.
The PSQL tables are massive with 80-160 columns, so for the 3 tables I need to write to I run a select query first to fetch the excess values, then bind them as parameters for the insert query. There are 4 SELECT/INSERT routines run in sequence, with the last one running n times in a foreach loop.
I've been able to run this ODBC SELECT/INSERT sequence on this system in the past using MS Access and PHP. I've tried cleaning the solution, rebooting the server and rebuilding, as well as adding additional Dispose() calls on the commands, but I still get these errors.
class PSQLOrderConnector
{
private OdbcConnection Odbc { get; }
public PSQLOrderConnector()
{
Odbc = new OdbcConnection(Constants.ODBCSTRING);
Odbc.Open();
}
/*
...
*/
public void CreateOrderAddressBillTo(string CustomerCode, string OrderNumber, string AddDate, int AddTime)
{
OdbcCommand cmdSelectAddressB = new OdbcCommand();
OdbcCommand cmdInsertAddressB = new OdbcCommand();
cmdSelectAddressB = this.Odbc.CreateCommand();
cmdSelectAddressB.CommandText = OrderQueries.PQSL_SELECT_ADDRESS_BY_CUSTOMER;
cmdSelectAddressB.Parameters.Add("@CEV_NO", OdbcType.Text).Value = CustomerCode;
OdbcDataReader reader = cmdSelectAddressB.ExecuteReader();
reader.Read();
var NAME = reader.GetString(0);
/* ...repeat for the next 80 columns */
cmdSelectAddressB.Dispose();
cmdInsertAddressB = this.Odbc.CreateCommand();
cmdInsertAddressB.CommandText = OrderQueries.PSQL_INSERT_ORDER_ADDRESS;
cmdInsertAddressB.Parameters.Add("NAME", OdbcType.Text).Value = NAME;
/* ...repeat for the next 80 variables */
try
{
int result = cmdInsertAddressB.ExecuteNonQuery();
}
catch (OdbcException odbce)
{
//Exception error gets thrown here
}
cmdInsertAddressB.Dispose();
}
/*
...
*/
}
class Order
{
private PSQLOrderConnector PSQLOrder { get; }
public Order()
{
PSQLOrder = new PSQLOrderConnector();
}
/*
...
*/
public void AddOrders(List<businessEvent> Events )
{
/*
...
*/
/* These 4 calls either pass in sequence or fail in sequence on the Try/Catch in the above class*/
PSQLOrder.CreateOrderHeader(OrderNumber, CustomerCode, PONumber, SubTotal, CurrentCost, AverageCost, AddDate, AddTime);
/* This is the method detailed above */
PSQLOrder.CreateOrderAddressBillTo(CustomerCode, OrderNumber, AddDate, AddTime);
PSQLOrder.CreateOrderAddressShipTo(CustomerCode, ShipToCode, OrderNumber, AddDate, AddTime);
int recNo = 1;
foreach (ItemLine line in itemLines)
{
PSQLOrder.CreateOrderDetail( OrderNumber, recNo, line.ItemCode, line.Quantity, line.Price, AddDate, AddTime);
recNo++;
}
}
}
(I edited the code for cleaner posting here; hopefully there are no typos.)
When the last lines run the function calls, either the error triggers for each insert in sequence, or the entire sequence completes successfully. With the same or different inputs, this occurs randomly, with roughly an 80/20 failure/success rate.

I've resolved my issue. The error message was exactly right and was pointing to my timestamp. I set it once in a variable, AddTime, and passed that same value to each of the 4 function calls, which is why the calls would fail in sequence but sometimes work, probably during the first 9 seconds of the system minute.

Deadlock: Insert into a filetable statements appears to block one another

We are having trouble inserting into a FileTable. This is our current setup, which I will try to explain in as much detail as possible:
Basically we have three tables:
T_Document (main metadata of a document)
T_Version (versioned metadata of a document)
T_Content (the binary content of a document, FileTable)
Our WCF service saves the documents and is being used by multiple persons. The service will open a transaction and will call the method SaveDocument which saves the documents:
//This is the point where the transaction starts
using (IDBTransaction tran = db.GetTransaction())
{
try
{
m_commandFassade.SaveDocument(document, m_loginName, db, options, lastVersion);
tran.Commit();
return document;
}
catch
{
tran.Rollback();
throw;
}
}
The SaveDocument method looks like this:
public void SaveDocument(E2TDocument document, string login, IDBConnection db, DocumentUploadOptions options, int lastVersion)
{
document.GuardNotNull();
options.GuardNotNull();
if (lastVersion == -1)
{
//inserting new T_Document
SaveDocument(document, db);
}
else
{
//updating the existing T_Document
UpdateDocument(document, db); //document already exists, updating it
}
Guid contentID = Guid.NewGuid();
//inserting the content
SaveDocumentContent(document, contentID, db);
//inserting the new / initial version
SaveDocumentVersion(document, contentID, db);
}
Basically, all the methods you see either insert into or update those three tables. The insert-content query, which appears to be causing the trouble, looks like this:
INSERT INTO T_Content
    (stream_id
    ,file_stream
    ,name)
VALUES
    (@ContentID
    ,@Content
    ,@Title)
And the method (please take this as pseudo code):
private void SaveDocumentContent(E2TDocument e2TDokument, Guid contentID, IDBConnection db)
{
using (m_log.CreateScope<MethodScope>(GlobalDefinitions.TracePriorityForData))
{
Command cmd = CommandFactory.CreateCommand("InsertContents");
cmd.SetParameter("ContentID", contentID);
cmd.SetParameter("Content", e2TDokument.Content);
string title = string.Concat(e2TDokument.Titel.RemoveIllegalPathCharacters(), GlobalDefinitions.UNTERSTRICH,
contentID).TrimToMaxLength(MaxLength_T_Contents_Col_Name, SuffixLength_T_Contents_Col_Name);
cmd.SetParameter("Title", title);
db.Execute(cmd);
}
}
I have no experience in deadlock analysis, but the deadlock graphs show me that the insert of the content into the FileTable is deadlocked with another process that is writing to the same table at the same time.
(The other side shows the same statement; my application log confirms two concurrent attempts to save documents.)
The same deadlock appears 30 times a day. I have already shrunk the transaction to a minimum and removed all unnecessary selects, but so far I have had no luck solving this issue.
What I'm most curious about is how it's possible to deadlock on an insert into a FileTable. Are there internal things being executed that I'm not aware of? I saw some strange statements in the profiler trace on that table that we don't use anywhere in our code, e.g.:
set @pathlocator = convert(hierarchyid, @path_locator__bin)
and things like:
if exists (
    select 1
    from [LGOL_Content01].[dbo].[T_Contents]
    where parent_path_locator = @path_locator
)
If you need any more details, please let me know. Any tips on how to proceed would be awesome.
Edit 1: execution plan for the T_Content insert (image not included).

So, after hours and hours of research and consulting with Microsoft, it turns out the deadlock is actually a FileTable / SQL Server related bug, which will be hotfixed by Microsoft.

Very slow foreach loop

I am working on an existing application. This application reads data from a huge file and then, after doing some calculations, it stores the data in another table.
But the loop doing this (see below) is taking a really long time. Since the file sometimes contains 1,000s of records, the entire process takes days.
Can I replace this foreach loop with something else? I tried using Parallel.ForEach and it did help. I am new to this, so will appreciate your help.
foreach (record someRecord in someReport.r)
{
try
{
using (var command = new SqlCommand("[procname]", sqlConn))
{
command.CommandTimeout = 0;
command.CommandType = CommandType.StoredProcedure;
command.Parameters.Add(…);
IAsyncResult result = command.BeginExecuteReader();
while (!result.IsCompleted)
{
System.Threading.Thread.Sleep(10);
}
command.EndExecuteReader(result);
}
}
catch (Exception e)
{
…
}
}
After reviewing the answers, I removed the async calls and edited the code as below. But this did not improve performance.
using (var command = new SqlCommand("[sp]", sqlConn))
{
    command.CommandTimeout = 0;
    command.CommandType = CommandType.StoredProcedure;
    foreach (record someRecord in someReport.)
    {
        command.Parameters.Clear();
        command.Parameters.Add(....);
        command.Prepare();
        using (var dr = command.ExecuteReader())
        {
            while (dr.Read())
            {
                if ()
                {
                }
                else if ()
                {
                }
            }
        }
    }
}

Instead of hitting the SQL connection so many times in a loop, have you considered extracting the whole set of data from SQL Server and processing it via a DataSet?
Edit: I decided to explain further what I meant.
You can do the following (pseudocode):
Use a SELECT * to get all the information from the database and store it in a list of a class, or in a dictionary.
Run your foreach (record someRecord in someReport) and do the condition matching as usual.

Step 1: Ditch the attempt at async. It isn't implemented properly and you're blocking anyway. So just execute the procedure synchronously and see if that helps.
Step 2: Move the SqlCommand outside of the loop and reuse it for each iteration. That way you don't incur the cost of creating and destroying it for every item in your loop.
Warning: Make sure you reset/clear/remove parameters you don't need from the previous iteration. We did something like this with optional parameters and had 'bleed-thru' from the previous iteration because we didn't clean up parameters we didn't need!

Your biggest problem is that you're looping over this:
IAsyncResult result = command.BeginExecuteReader();
while (!result.IsCompleted)
{
System.Threading.Thread.Sleep(10);
}
command.EndExecuteReader(result);
The entire idea of the asynchronous model is that the calling thread (the one doing this loop) should be spinning up ALL of the asynchronous tasks using the Begin method before starting to work with the results with the End method. If you are using Thread.Sleep() within your main calling thread to wait for an asynchronous operation to complete (as you are here), you're doing it wrong, and what ends up happening is that each command, one at a time, is being called and then waited for before the next one starts.
Instead, try something like this:
public void BeginExecutingCommands(Report someReport)
{
foreach (record someRecord in someReport.r)
{
var command = new SqlCommand("[procname]", sqlConn);
command.CommandTimeout = 0;
command.CommandType = CommandType.StoredProcedure;
command.Parameters.Add(…);
command.BeginExecuteReader(ReaderExecuted,
new object[] { command, someReport, someRecord });
}
}
void ReaderExecuted(IAsyncResult result)
{
var state = (object[])result.AsyncState;
var command = state[0] as SqlCommand;
var someReport = state[1] as Report;
var someRecord = state[2] as Record;
try
{
using (SqlDataReader reader = command.EndExecuteReader(result))
{
// work with reader, command, someReport and someRecord to do what you need.
}
}
catch (Exception ex)
{
// handle exceptions that occurred during the async operation here
}
}

In SQL, on the other end of a write is a (one) disk. You can rarely write faster in parallel. In fact, writing in parallel often slows things down due to index fragmentation. If you can, sort the data by the primary (clustered) key prior to loading. For a big load, you can even disable the other keys, load the data, and then rebuild the keys.
I'm not really sure what you are doing with the async code, but it certainly was not doing what you expected, as it was waiting on itself.
try
{
    using (var command = new SqlCommand("[procname]", sqlConn))
    {
        command.CommandTimeout = 0;
        command.CommandType = CommandType.StoredProcedure;
        foreach (record someRecord in someReport.r)
        {
            // Clear the previous iteration's parameters before adding new ones.
            command.Parameters.Clear();
            command.Parameters.Add(…);
            using (var rdr = command.ExecuteReader())
            {
                while (rdr.Read())
                {
                    …
                }
            }
        }
    }
}
catch (…)
{
    …
}
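The advice to sort by the clustered key before loading can be sketched as below. LoadOrdering.SortedBatches is a hypothetical helper, not part of any library. It assumes the records fit in memory for sorting, and it returns them in key order in fixed-size batches that a single writer can load one after another.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class LoadOrdering
{
    // Sorts the source by key, then yields it in batches of at most batchSize items.
    // OrderBy buffers the whole source, so this assumes the data fits in memory.
    public static IEnumerable<List<T>> SortedBatches<T, TKey>(
        IEnumerable<T> source, Func<T, TKey> keySelector, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");
        var batch = new List<T>(batchSize);
        foreach (T item in source.OrderBy(keySelector))
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0) yield return batch;
    }
}
```

Each batch could then be handed to the stored procedure loop above, or to a bulk-load API, from a single thread.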

As we were talking about in the comments, storing this data in memory and working with it there may be a more efficient approach.
So one easy way to do that is to start with Entity Framework. Entity Framework will automatically generate the classes for you based on your database schema. Then you can import a stored procedure which holds your SELECT statement. The reason I suggest importing a stored proc into EF is that this approach is generally more efficient than doing your queries in LINQ against EF.
Then run the stored proc and store the data in a List like this...
var data = db.MyStoredProc().ToList();
Then you can do anything you want with that data. Or as I mentioned, if you're doing a lot of lookups on primary keys then use ToDictionary() something like this...
var data = db.MyStoredProc().ToDictionary(k => k.MyPrimaryKey);
Either way, you'll be working with your data in memory at this point.
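To make the dictionary variant concrete, here is a small sketch that uses plain LINQ to Objects instead of a real Entity Framework context. ProductRow and InMemoryLookup are hypothetical stand-ins for the EF-generated class and your own code. Note that ToDictionary throws an ArgumentException if two rows share the same key.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a class generated from the database schema.
public class ProductRow
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class InMemoryLookup
{
    // Builds a primary-key index once, so each later lookup is a hash lookup
    // instead of a database round trip.
    public static Dictionary<int, ProductRow> IndexById(IEnumerable<ProductRow> rows)
    {
        return rows.ToDictionary(r => r.Id);
    }

    // Returns the row's name, or the fallback when the key is not present.
    public static string NameOrDefault(Dictionary<int, ProductRow> index, int id, string fallback)
    {
        ProductRow row;
        return index.TryGetValue(id, out row) ? row.Name : fallback;
    }
}
```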

It seems that executing your SQL command puts a lock on some required resources, and that is what pushed you to use the async methods (my guess).
If the database is not otherwise in use, try getting exclusive access to it. If there are still internal transactions caused by data-model complexity, consider consulting a database designer.

C# OutOfMemory, Mapped Memory File or Temp Database

Seeking some advice, best practice etc...
Technology: C# .NET4.0, Winforms, 32 bit
I am seeking some advice on how I can best tackle large data processing in my C# Winforms application which experiences high memory usage (working set) and the occasional OutOfMemory exception.
The problem is that we perform a large amount of data processing "in-memory" when a "shopping-basket" is opened. In simplistic terms when a "shopping-basket" is loaded we perform the following calculations;
1) For each item in the "shopping-basket", retrieve its historical price going all the way back to the date the item first appeared in-stock (could be two months, two years or two decades of data). Historical price data is retrieved from text files, over the internet, or in any format which is supported by a price plugin.
2) For each item, for each day since it first appeared in-stock, calculate various metrics which build a historical profile for each item in the shopping-basket.
The result is that we can potentially perform hundreds, thousands and/or millions of calculations depending upon the number of items in the "shopping-basket". If the basket contains too many items we run the risk of hitting an "OutOfMemory" exception.
A couple of caveats;
This data needs to be calculated for each item in the "shopping-basket" and the data is kept until the "shopping-basket" is closed.
Even though we perform steps 1 and 2 in a background thread, speed is important, as the number of items in the "shopping-basket" can greatly affect overall calculation speed.
Memory is salvaged by the .NET garbage collector when a "shopping-basket" is closed. We have profiled our application and ensure that all references are correctly disposed and closed when a basket is closed.
After all the calculations are completed the resultant data is stored in an IDictionary. "CalculatedData" is a class whose properties are the individual metrics calculated by the above process.
Some ideas I've thought about;
Obviously my main concern is to reduce the amount of memory being used by the calculations however the volume of memory used can only be reduced if I
1) reduce the number of metrics being calculated for each day or
2) reduce the number of days used for the calculation.
Both of these options are not viable if we wish to fulfill our business requirements.
Memory Mapped Files
One idea has been to use memory mapped files which will store the data dictionary. Would this be possible/feasible and how can we put this into place?
Use a temporary database
The idea is to use a separate (not in-memory) database which can be created for the life-cycle of the application. As "shopping-baskets" are opened we can persist the calculated data to the database for repeated use, alleviating the requirement to recalculate for the same "shopping-basket".
Are there any other alternatives that we should consider? What is best practice when it comes to calculations on large data and performing them outside of RAM?
Any advice is appreciated....

The easiest solution is a database, perhaps SQLite. Memory mapped files don't automatically become dictionaries. You would have to code all the memory management yourself, and thereby fight with the .NET GC system itself for ownership of the data.
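To give a feel for the manual memory management that implies, here is a minimal sketch of storing values in a memory-mapped file at hand-computed offsets. MappedDoubleStore is a hypothetical class, and the fixed-size slot layout is an assumption for illustration. A dictionary-like structure would need its own key-to-offset index on top of this. Mapped views also still occupy the process's virtual address space, which matters in a 32-bit process.

```csharp
using System;
using System.IO.MemoryMappedFiles;

// Hypothetical fixed-slot store: slot i lives at byte offset i * sizeof(double).
public sealed class MappedDoubleStore : IDisposable
{
    private readonly MemoryMappedFile _file;
    private readonly MemoryMappedViewAccessor _view;
    private readonly int _capacity;

    public MappedDoubleStore(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        _capacity = capacity;
        // A null map name creates a mapping that is not shared with other processes.
        _file = MemoryMappedFile.CreateNew(null, (long)capacity * sizeof(double));
        _view = _file.CreateViewAccessor();
    }

    public void Set(int index, double value)
    {
        CheckIndex(index);
        _view.Write((long)index * sizeof(double), value);
    }

    public double Get(int index)
    {
        CheckIndex(index);
        return _view.ReadDouble((long)index * sizeof(double));
    }

    private void CheckIndex(int index)
    {
        if (index < 0 || index >= _capacity) throw new ArgumentOutOfRangeException("index");
    }

    public void Dispose()
    {
        _view.Dispose();
        _file.Dispose();
    }
}
```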

If you're interested in trying the memory mapped file approach, you can try it now. I wrote a small native .NET package called MemMapCache that in essence creates a key/val database backed by MemMappedFiles. It's a bit of a hacky concept, but the program MemMapCache.exe keeps all references to the memory mapped files so that if your application crashes, you don't have to worry about losing the state of your cache.
It's very simple to use and you should be able to drop it in your code without too many modifications. Here is an example using it: https://github.com/jprichardson/MemMapCache/blob/master/TestMemMapCache/MemMapCacheTest.cs
Maybe it'd be of some use to you to at least further figure out what you need to do for an actual solution.
Please let me know if you do end up using it. I'd be interested in your results.
However, long-term, I'd recommend Redis.

As an update for those stumbling upon this thread...
We ended up using SQLite as our caching solution. The SQLite database we employ exists separately from the main data store used by the application. We persist calculated data to the SQLite database (diskCache) as it's required and have code controlling cache invalidation etc. This was a suitable solution for us as we were able to achieve write speeds of around 100,000 records per second.
For those interested, this is the code that controls inserts into the diskCache. Full credit for this code goes to JP Richardson (shown answering a question here) for his excellent blog post.
internal class SQLiteBulkInsert
{
#region Class Declarations
private SQLiteCommand m_cmd;
private SQLiteTransaction m_trans;
private readonly SQLiteConnection m_dbCon;
private readonly Dictionary<string, SQLiteParameter> m_parameters = new Dictionary<string, SQLiteParameter>();
private uint m_counter;
private readonly string m_beginInsertText;
#endregion
#region Constructor
public SQLiteBulkInsert(SQLiteConnection dbConnection, string tableName)
{
m_dbCon = dbConnection;
m_tableName = tableName;
var query = new StringBuilder(255);
query.Append("INSERT INTO ["); query.Append(tableName); query.Append("] (");
m_beginInsertText = query.ToString();
}
#endregion
#region Allow Bulk Insert
private bool m_allowBulkInsert = true;
public bool AllowBulkInsert { get { return m_allowBulkInsert; } set { m_allowBulkInsert = value; } }
#endregion
#region CommandText
public string CommandText
{
get
{
if(m_parameters.Count < 1) throw new SQLiteException("You must add at least one parameter.");
var sb = new StringBuilder(255);
sb.Append(m_beginInsertText);
foreach(var param in m_parameters.Keys)
{
sb.Append('[');
sb.Append(param);
sb.Append(']');
sb.Append(", ");
}
sb.Remove(sb.Length - 2, 2);
sb.Append(") VALUES (");
foreach(var param in m_parameters.Keys)
{
sb.Append(m_paramDelim);
sb.Append(param);
sb.Append(", ");
}
sb.Remove(sb.Length - 2, 2);
sb.Append(")");
return sb.ToString();
}
}
#endregion
#region Commit Max
private uint m_commitMax = 25000;
public uint CommitMax { get { return m_commitMax; } set { m_commitMax = value; } }
#endregion
#region Table Name
private readonly string m_tableName;
public string TableName { get { return m_tableName; } }
#endregion
#region Parameter Delimiter
private const string m_paramDelim = ":";
public string ParamDelimiter { get { return m_paramDelim; } }
#endregion
#region AddParameter
public void AddParameter(string name, DbType dbType)
{
var param = new SQLiteParameter(m_paramDelim + name, dbType);
m_parameters.Add(name, param);
}
#endregion
#region Flush
public void Flush()
{
try
{
if (m_trans != null) m_trans.Commit();
}
catch (Exception ex)
{
throw new Exception("Could not commit transaction. See InnerException for more details", ex);
}
finally
{
if (m_trans != null) m_trans.Dispose();
m_trans = null;
m_counter = 0;
}
}
#endregion
#region Insert
public void Insert(object[] paramValues)
{
if (paramValues.Length != m_parameters.Count)
throw new Exception("The values array count must be equal to the count of the number of parameters.");
m_counter++;
if (m_counter == 1)
{
if (m_allowBulkInsert) m_trans = m_dbCon.BeginTransaction();
m_cmd = m_dbCon.CreateCommand();
foreach (var par in m_parameters.Values)
m_cmd.Parameters.Add(par);
m_cmd.CommandText = CommandText;
}
var i = 0;
foreach (var par in m_parameters.Values)
{
par.Value = paramValues[i];
i++;
}
m_cmd.ExecuteNonQuery();
if(m_counter != m_commitMax)
{
// Do nothing
}
else
{
try
{
if(m_trans != null) m_trans.Commit();
}
catch(Exception)
{ }
finally
{
if(m_trans != null)
{
m_trans.Dispose();
m_trans = null;
}
m_counter = 0;
}
}
}
#endregion
}

Ideas on logic/algorithm and how to prevent race in threaded writes to SqlServer

I have the following logic:
public void InQueueTable(DataTable Table)
{
int incomingRows = Table.Rows.Count;
if (incomingRows >= RowsThreshold)
{
// asyncWriteRows(Table)
return;
}
if ((RowsInMemory + incomingRows) >= RowsThreshold)
{
// copy and clear internal table
// asyncWriteRows(copyTable)
}
internalTable.Merge(Table);
}
There is one problem with this algorithm:
Given RowsThreshold = 10000:
If incomingRows puts RowsInMemory over RowsThreshold: (1) asynchronously write out the data, (2) merge the incoming data.
If incomingRows is over RowsThreshold, asynchronously write the incoming data.
But what if a second thread spins up and calls asyncWriteRows(xxxTable)? Assume also that each thread owning the asynchronous method will be writing to the same table in SQL Server. Does SQL Server handle this sort of multi-threaded write to the same table?
Follow up
Based on Greg D's suggestion:
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString,
    SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.UseInternalTransaction))
{
// perform bulkcopy
}
Regardless, I still have the issue of signaling the asyncWriteRows(copyTable). The algorithm needs to determine when to go ahead and copy internalTable, clear internalTable, and call asyncWriteRows(copyTable). I think that what I need to do is move the internalTable.Copy() call to its own method:
private DataTable CopyTable (DataTable srcTable)
{
lock (key)
{
return srcTable.Copy();
}
}
...and then the following changes to the InQueue method:
public void InQueueTable(DataTable Table)
{
int incomingRows = Table.Rows.Count;
if (incomingRows >= RowsThreshold)
{
// asyncWriteRows(Table)
return;
}
if ((RowsInMemory + incomingRows) >= RowsThreshold)
{
// copy and clear internal table
// asyncWriteRows(CopyTable(Table))
}
internalTable.Merge(Table);
}
...finally, add a callback method:
private void WriteCallback(IAsyncResult iaSyncResult)
{
    // AsyncState is defined on IAsyncResult, not on Object.
    int rowCount = (int)iaSyncResult.AsyncState;
    if (RowsInMemory >= rowCount)
    {
        asyncWriteRows(CopyTable(internalTable));
    }
}
This is what I have determined as a solution. Any feedback?
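One way to reason about the copy-and-clear race is to make the merge, the threshold check and the buffer swap a single step under one lock, and to run the write outside the lock. The sketch below is a simplified illustration under that assumption, not the code from the question. RowBuffer is a hypothetical class, it uses a generic list instead of a DataTable, and the flush delegate stands in for asyncWriteRows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical thread-safe buffer for illustration only.
public sealed class RowBuffer<T>
{
    private readonly object _gate = new object();
    private readonly int _threshold;
    private readonly Action<List<T>> _flush;
    private List<T> _rows = new List<T>();

    public RowBuffer(int threshold, Action<List<T>> flush)
    {
        if (threshold <= 0) throw new ArgumentOutOfRangeException("threshold");
        if (flush == null) throw new ArgumentNullException("flush");
        _threshold = threshold;
        _flush = flush;
    }

    public void Enqueue(IEnumerable<T> incoming)
    {
        List<T> toFlush = null;
        lock (_gate)
        {
            // Merge, check and swap happen atomically, so no other thread
            // can see (or flush) a half-updated buffer.
            _rows.AddRange(incoming);
            if (_rows.Count >= _threshold)
            {
                toFlush = _rows;
                _rows = new List<T>();
            }
        }
        // The write runs outside the lock so other producers are not blocked by it.
        if (toFlush != null) _flush(toFlush);
    }
}
```

Because each flushed list is owned by exactly one caller after the swap, two threads can write their batches to the same table concurrently without sharing in-memory state.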

Is there some reason you can't use transactions?

I'll admit now that I'm not an expert in this field.
With transactions and cursors you will get lock escalation if your operation is large. E.g. your operation will start locking a row, then a page then a table if it needs to, preventing other operations from functioning.
The idiot that I was assumed that SQL Server would just queue these blocked operations up and wait for locks to be released, but it just returns errors and it's up to the API programmer to keep retrying (someone correct me if I'm wrong, or if it's fixed in a later version).
If you are happy to be reading possibly old data that you then copy over, like we were, we changed our isolation mode to stop the server blocking operations unnecessarily.
ALTER DATABASE [dbname] SET READ_COMMITTED_SNAPSHOT ON;
You may also alter your insert statements to use NOLOCK. But please read up on this.
