I need to run 4 stored procedures and create queries from the data they return. There are about 5k items in the q container, so that is 20k stored procedure executions. I use LINQ to connect to the DB and execute them, and it works just fine with a normal foreach loop, but there is one problem: the code takes about one hour to complete. That is way too long, so I tried to use Parallel.ForEach instead of a normal foreach loop. The code crashes after a few iterations; I guess the LINQ connection just doesn't get along with Parallel. Any ideas how to run LINQ stored procedures in multiple threads?
var dataCollector = new EpmDataCollector();
Parallel.ForEach(q, history =>
{
    try
    {
        var queriesBefore = dataCollector.GetQueries().Count;
        var weight = dataCollector.CreateProjectQuery(history); //function executes stored procedure and creates queries from data received, then adds them to container (ConcurrentBag) in dataCollector
        dataCollector.CreateHoursQuery(history); //like above
        dataCollector.CreateCostQuery(history); //same
        dataCollector.CreateIncomeQuery(history); //same
        var log = ...
        Global.log.Info(log);
        //i++;
        Interlocked.Increment(ref i);
        if (i % 10 == 0)
        {
            //calculate and log estimation time
        }
    }
    catch (Exception ex)
    {
        //catch code
    }
});
The System.Data.Linq.DataContext class is not thread-safe.
Reference: https://msdn.microsoft.com/en-us/library/system.data.linq.datacontext(v=vs.110).aspx
Any public static members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
That's why you have to create a new instance of DataContext inside the ForEach loop.
Also, I'd rather look into SqlBulkCopy (https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy(v=vs.110).aspx), which is specifically designed to handle thousands of inserts.
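A variation on creating the context inside the loop is to create one context per worker task with the localInit/localFinally overload of Parallel.ForEach: each task reuses its own context for every item it processes and disposes it at the end. A minimal sketch follows; FakeDataContext and RunStoredProcedure are hypothetical stand-ins for a real DataContext and a stored-procedure call, and the results go into a shared ConcurrentBag because each context is private to one task.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for a non-thread-safe DataContext.
public sealed class FakeDataContext : IDisposable
{
    // Simulates running a stored procedure and building a query from its result.
    public string RunStoredProcedure(int item) => $"query-for-{item}";
    public void Dispose() { }
}

public static class PerThreadContextSketch
{
    public static List<string> Run(IEnumerable<int> items)
    {
        // Shared, thread-safe container for the results of every iteration.
        var results = new ConcurrentBag<string>();

        Parallel.ForEach(
            items,
            // localInit: runs once per worker task, so each task owns its context.
            () => new FakeDataContext(),
            // body: only touches the context owned by the current task.
            (item, loopState, context) =>
            {
                results.Add(context.RunStoredProcedure(item));
                return context;
            },
            // localFinally: dispose the context when the task has no more items.
            context => context.Dispose());

        return results.OrderBy(r => r, StringComparer.Ordinal).ToList();
    }
}
```

Compared with a new context per item, this creates only as many contexts as there are worker tasks, which may matter if creating one is expensive.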
Move the creation of your EpmDataCollector into the loop like so:
Parallel.ForEach(q, history =>
{
    try
    {
        var dataCollector = new EpmDataCollector();
        var queriesBefore = dataCollector.GetQueries().Count;
        var weight = dataCollector.CreateProjectQuery(history); //function executes stored procedure and creates queries from data received, then adds them to container (ConcurrentBag) in dataCollector
        dataCollector.CreateHoursQuery(history); //like above
        dataCollector.CreateCostQuery(history); //same
        dataCollector.CreateIncomeQuery(history); //same
        var log = ...
        Global.log.Info(log);
        //i++;
        //use the value returned by Interlocked.Increment; re-reading i could see another thread's increment
        var done = Interlocked.Increment(ref i);
        if (done % 10 == 0)
        {
            //calculate and log estimation time
        }
    }
    catch (Exception ex)
    {
        //catch code
    }
});
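Two details are worth watching with a new EpmDataCollector per iteration. First, assuming its ConcurrentBag is an instance field (as the comments suggest), whatever one instance collects is lost when the iteration ends unless it is copied into a shared container. Second, in the question's loop, `if (i % 10 == 0)` re-reads i after Interlocked.Increment, and another thread may already have incremented it, so a progress report can be skipped or repeated; the value returned by Interlocked.Increment is the safe one to test. A sketch of both points, where FakeCollector is a hypothetical stand-in for EpmDataCollector:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for EpmDataCollector: each instance keeps its own list,
// so it must not be shared across iterations.
public sealed class FakeCollector
{
    private readonly List<string> _ownQueries = new List<string>();
    public void CreateQueries(int history) => _ownQueries.Add($"project-{history}");
    public IReadOnlyList<string> Queries => _ownQueries;
}

public static class PerIterationSketch
{
    public static (int Processed, int ProgressReports, int Queries) Run(IReadOnlyList<int> q)
    {
        var allQueries = new ConcurrentBag<string>();
        int processed = 0;
        int progressReports = 0;

        Parallel.ForEach(q, history =>
        {
            var collector = new FakeCollector();      // one instance per iteration
            collector.CreateQueries(history);
            foreach (var query in collector.Queries)
                allQueries.Add(query);                // publish to the shared, thread-safe bag

            // Each thread gets a unique count back, so exactly every tenth item reports.
            int done = Interlocked.Increment(ref processed);
            if (done % 10 == 0)
                Interlocked.Increment(ref progressReports);  // e.g. log an ETA here
        });

        return (processed, progressReports, allQueries.Count);
    }
}
```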
Related
I have a table in my SQL Server DB with close to 100,000 records. Using C#, I need to loop through the rows, make an external API call for each one, and write the result of the call back into the SQL table. I am new to the multi-threading concept. How can I achieve this?
Here is the code I have - just sequential processing.
public void MainProcess()
{
    try
    {
        // Retrieve rows from table
        List<rowResult> rowResults = (List<rowResult>)GetRowsFromTable();
        foreach (var row in rowResults)
        {
            callExternalAPI(row);
        }
    }
    catch (Exception)
    {
        throw;
    }
}
How can I modify this to enable Multi-threading? Please help
One way to do this is to use Parallel.ForEach. Replace your foreach with the following, which passes each rowResult directly to callExternalAPI:
Parallel.ForEach(rowResults, callExternalAPI);
Or, if you need to do additional processing with each row, you can use:
Parallel.ForEach(rowResults, row =>
{
    //Additional processing
    callExternalAPI(row);
    //Additional processing
});
You can also use ParallelOptions to cap the number of concurrent operations with MaxDegreeOfParallelism. It defaults to -1, meaning no explicit limit, in which case the scheduler decides how many threads to use; that is usually sensible.
ParallelOptions po = new ParallelOptions()
{
    MaxDegreeOfParallelism = 4
};
Parallel.ForEach(rowResults, po, callExternalAPI);
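Since callExternalAPI is I/O-bound, an alternative on .NET 6 or later is Parallel.ForEachAsync, which honors MaxDegreeOfParallelism while releasing threads during each awaited call (on older frameworks, SemaphoreSlim with Task.WhenAll gives similar throttling). A sketch, assuming a hypothetical async CallExternalApiAsync; Task.Delay stands in for the HTTP call, and the sketch records the highest number of calls in flight so the limit can be checked.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ApiCallSketch
{
    private static int _inFlight;
    private static int _maxInFlight;

    // Hypothetical async version of callExternalAPI; Task.Delay simulates the HTTP call.
    private static async ValueTask CallExternalApiAsync(int rowId, CancellationToken ct)
    {
        int now = Interlocked.Increment(ref _inFlight);
        try
        {
            // Record the highest number of calls seen in flight at the same time.
            int seen;
            while (now > (seen = Volatile.Read(ref _maxInFlight)) &&
                   Interlocked.CompareExchange(ref _maxInFlight, now, seen) != seen)
            {
            }
            await Task.Delay(10, ct);
        }
        finally
        {
            Interlocked.Decrement(ref _inFlight);
        }
    }

    public static async Task<int> RunAsync(IEnumerable<int> rowIds, int maxParallel)
    {
        _inFlight = 0;
        _maxInFlight = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        // Parallel.ForEachAsync requires .NET 6 or later.
        await Parallel.ForEachAsync(rowIds, options, (rowId, ct) => CallExternalApiAsync(rowId, ct));
        return _maxInFlight;
    }
}
```

Writing each result back to the SQL table would happen inside CallExternalApiAsync, with its own connection per call.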
I am trying to get data from SQL Server using Dapper. I have a requirement to export 460K records stored in an Azure SQL database. I decided to fetch the data in batches of 10k records each. I planned to fetch the batches in parallel, so I added the async method calls to a list of tasks and used Task.WhenAll. The code works fine when I run it locally, but after deploying to a k8s cluster, I get data read errors for some records. I am new to multi-threading and I don't know how to handle this issue. I tried using a lock inside the method, but the system crashes. Below is my code; it might be clumsy because I was trying many solutions to fix the issue.
for (int i = 0; i < numberOfPages; i++)
{
    tableviewWithCondition.startRow = startRow;
    resultData.Add(_tableviewRepository.GetTableviewRowsByPagination(tableviewExportCondition.TableviewName, modelMappingGroups, tableviewWithCondition.startRow, builder, pageSize, appName, i));
    startRow += tableviewWithCondition.pageSize;
}
foreach (var task in resultData)
{
    if (task != null)
    {
        dataToExport.AddRange(task.Result);
    }
}
This is the method I implemented to get data from the Azure SQL database using Dapper.
public async Task<(IEnumerable<int> unprocessedData, IEnumerable<dynamic> rowData)> GetTableviewRowsByPagination(string tableName, IEnumerable<MappingGroup> tableviewAttributeDetails,
    int startRow, SqlBuilder builder, int pageSize = 100, AppNameEnum appName = AppNameEnum.OptiSoil, int taskNumber = 1)
{
    var _unitOfWork = _unitOfWorkServices.Build(appName.ToString());
    List<int> unprocessedData = new List<int>();
    try
    {
        var columns = tableviewAttributeDetails.Select(c => { return $"{c.mapping_group_value} [{c.attribute}]"; });
        var joinedColumn = string.Join(",", columns);
        builder.Select(joinedColumn);
        var selector = builder.AddTemplate($"SELECT /**select**/ FROM {tableName} with (nolock) /**innerjoin**/ /**where**/ /**orderby**/ OFFSET {startRow} ROWS FETCH NEXT {(pageSize == 0 ? 100 : pageSize)} ROWS ONLY");
        using (var connection = _unitOfWork.Connection)
        {
            connection.Open();
            var data = await connection.QueryAsync(selector.RawSql, selector.Parameters);
            Console.WriteLine($"data completed for task{taskNumber}");
            return (unprocessedData, data);
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception: {ex.Message}");
        if (ex.InnerException != null)
            Console.WriteLine($"InnerException: {ex.InnerException.Message}");
        Console.WriteLine($"Error in fetching from row {startRow}");
        unprocessedData.Add(startRow);
        return (unprocessedData, null);
    }
    finally
    {
        _unitOfWork.Dispose();
    }
}
The above code works fine locally, but on the server I get the error below.
Exception: A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 35 - An internal exception was caught).
InnerException: The WriteAsync method cannot be called when another write operation is pending.
How to avoid this issue when fetch data in parallel tasks?
You're using the same connection and trying to execute multiple commands over it at the same time (I'm assuming this because of the naming). Also, should you be disposing the unit of work?
Rather than:
using (var connection = _unitOfWork.Connection)
{
    connection.Open();
    var data = await connection.QueryAsync(selector.RawSql, selector.Parameters);
    Console.WriteLine($"data completed for task{taskNumber}");
    return (unprocessedData, data);
}
Create a new connection for each item, if this is what you truly want to do. My educated guess is that it works locally because of timing.
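A minimal sketch of one connection per page. FakeConnection is a hypothetical stand-in that, like SqlClient, rejects a second operation while one is still pending; GetPageAsync opens and disposes its own instance, so concurrent pages never share one. SharedConnectionFailsAsync reproduces the shared-connection failure for comparison. In real code the connection would be a new SqlConnection per page (drawn from the connection pool) rather than _unitOfWork.Connection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a database connection that, like a SqlConnection,
// does not allow a second operation while one is still pending.
public sealed class FakeConnection : IDisposable
{
    private int _busy;

    public async Task<IReadOnlyList<int>> QueryPageAsync(int startRow, int pageSize)
    {
        if (Interlocked.Exchange(ref _busy, 1) == 1)
            throw new InvalidOperationException("Another operation is already pending on this connection.");
        try
        {
            await Task.Delay(10);  // simulated network round trip
            return Enumerable.Range(startRow, pageSize).ToList();
        }
        finally
        {
            Volatile.Write(ref _busy, 0);
        }
    }

    public void Dispose() { }
}

public static class ConnectionPerTaskSketch
{
    // Each page opens and disposes its own connection.
    private static async Task<IReadOnlyList<int>> GetPageAsync(int startRow, int pageSize)
    {
        using (var connection = new FakeConnection())
        {
            return await connection.QueryPageAsync(startRow, pageSize);
        }
    }

    public static async Task<int> CountRowsAsync(int pages, int pageSize)
    {
        var tasks = Enumerable.Range(0, pages)
            .Select(i => GetPageAsync(i * pageSize, pageSize))
            .ToList();
        var results = await Task.WhenAll(tasks);
        return results.Sum(page => page.Count);
    }

    // Sharing one connection across concurrent pages reproduces the failure.
    public static async Task<bool> SharedConnectionFailsAsync()
    {
        using (var shared = new FakeConnection())
        {
            try
            {
                await Task.WhenAll(shared.QueryPageAsync(0, 10), shared.QueryPageAsync(10, 10));
                return false;
            }
            catch (InvalidOperationException)
            {
                return true;
            }
        }
    }
}
```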
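Also, look into Task.WhenAll; it's a better way to collect all the results. Rather than:
foreach (var task in resultData)
{
    if (task != null)
    {
        dataToExport.AddRange(task.Result);
    }
}
Calling .Result on a task is usually bad practice: it blocks the calling thread, can deadlock when a synchronization context is present, and wraps exceptions in an AggregateException.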
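Collecting the pages with await Task.WhenAll instead of .Result might look like the sketch below. A SemaphoreSlim caps how many pages are fetched at once, which may also ease pressure on the database. FetchPageAsync is a hypothetical stand-in for GetTableviewRowsByPagination, and the limit of 4 in the usage is an arbitrary example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledExportSketch
{
    // Hypothetical stand-in for GetTableviewRowsByPagination: returns the row
    // numbers of one page after a simulated delay.
    private static async Task<List<int>> FetchPageAsync(int startRow, int pageSize)
    {
        await Task.Delay(5);
        return Enumerable.Range(startRow, pageSize).ToList();
    }

    public static async Task<List<int>> ExportAsync(int totalRows, int pageSize, int maxConcurrentPages)
    {
        using (var throttle = new SemaphoreSlim(maxConcurrentPages))
        {
            int pages = (totalRows + pageSize - 1) / pageSize;
            var tasks = Enumerable.Range(0, pages).Select(async page =>
            {
                await throttle.WaitAsync();
                try
                {
                    int start = page * pageSize;
                    return await FetchPageAsync(start, Math.Min(pageSize, totalRows - start));
                }
                finally
                {
                    throttle.Release();
                }
            }).ToList();

            // WhenAll awaits every page without blocking a thread; the result array
            // is in the same order as the tasks, i.e. page order.
            List<int>[] pagesData = await Task.WhenAll(tasks);
            return pagesData.SelectMany(p => p).ToList();
        }
    }
}
```

For example, `await ThrottledExportSketch.ExportAsync(460, 100, 4)` fetches five pages with at most four in flight and returns the rows in order.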
I had some issues when adding to an Entity Framework DbSet from multiple threads inside a ConcurrentDictionary ValueFactory method. I tried to eliminate that issue by introducing a lock statement. This seems to have some strange side effects, though. In some rare and random cases, my code throws a KeyNotFoundException, even though the program logic should prevent that from happening. I guess I am overlooking something.
using (ESBClient client = new ESBClient()) { // WCF SERVICE
    client.Open();
    // Limit the maximum number of parallel requests
    var esbLimiter = new SemaphoreSlim(4);
    ConcurrentDictionary<string, DataEntry> dataEntryDict = new ConcurrentDictionary<string, DataEntry>(
        await db.DataEntries
            .Where(de => allObjIDs.Contains(de.PAObjID))
            .IncludeOptimized(de => de.WorkSchedules)
            .ToDictionaryAsync(a => a.PAObjID, a => a)
    );
    // Get WorkOrderDataSet02 for each data entry number
    await Task.WhenAll(allDataEntryNumbers.Batch(20).Select(async workOrderBatch => {
        await esbLimiter.WaitAsync();
        Debug.WriteLine($"Starting for new batch after {s.ElapsedMilliseconds} with parallel {esbLimiter.CurrentCount}");
        try {
            int retryCounter = 0;
            getWorkOrderDataSet02Response gwoResp;
        retryCurrentWorkOrderDataSetResp:
            try {
                gwoResp = await client.getWorkOrderDataSet02Async(
                    new getWorkOrderDataSet02Request(
                        "?",
                        companyGroup.Key,
                        string.Join(",", workOrderBatch.Select(wob => wob.DataEntryNumber)),
                        "WNTREIB",
                        "?",
                        "act,sales",
                        "D"
                    )
                );
            } catch (System.ServiceModel.CommunicationException ex) {
                // Retry up to 3 times before finally crashing
                if (retryCounter++ < 3) {
                    await HandleServiceRetryError("getWorkOrderDataSet02Async", retryCounter, s.ElapsedMilliseconds, ex);
                    goto retryCurrentWorkOrderDataSetResp;
                } else
                    throw;
            }
            // Iterate over all work orders returned by the ESB
            foreach (dsyWorkOrder01TtyWorkOrder currDetail in gwoResp.dsyWorkOrder01) { // dsyWorkOrder01 IS AN ARRAY OF OBJECTS. IT COMES FROM A WCF CALL. PAObjID IS UNIQUE.
                // Get or create element
                DataEntry currentEntry = dataEntryDict.GetOrAdd(
                    currDetail.Obj,
                    key => {
                        DataEntry newDe = new DataEntry();
                        lock (db.DataEntries) { // I INTRODUCED THOSE LOCK STATEMENTS
                            db.DataEntries.Add(newDe); // THIS IS THE LINE THAT WAS PROBLEMATIC IN THE FIRST PLACE
                        }
                        return newDe;
                    }
                );
                // Set regular fields
                currentEntry.ApplyTtyWorkOrder(currDetail, resourceDict); // THIS METHOD APPLIES THE PAObjID PROPERTY
            }
            // Delete all elements, that were not provided by the service anymore
            lock(db.DataEntries) {
                workOrderBatch
                    .Where(wob => !gwoResp.dsyWorkOrder01
                        .Where(wo => wo.DataEntryNumber.HasValue)
                        .Select(wo => wo.DataEntryNumber.Value)
                        .Contains(wob.DataEntryNumber)
                    )
                    .ToArray()
                    .ForEach(dataEntry => {
                        try {
                            db.DataEntries.Remove(dataEntryDict[dataEntry.ObjID]); // THIS LINE THROWS THE KeyNotFoundException
                        } catch (Exception ex) {
                            throw new Exception($"Key {dataEntry.ObjID} not in list.", ex);
                        }
                    });
            }
            // Update progress
            progress.Report(.1f + totalSteps * Interlocked.Increment(ref currentStep) * .8f);
        } finally {
            Debug.WriteLine($"Finished for batch after {s.ElapsedMilliseconds} with parallel {esbLimiter.CurrentCount}");
            esbLimiter.Release();
        }
    }));
}
// HERE'S THE APPLY METHOD
public void ApplyTtyWorkOrder(dsyWorkOrder01TtyWorkOrder src, Dictionary<(string Name, byte ResourceType), int> resourceDict) {
    Deleted = false;
    DataEntryNumber = src.DataEntryNumber.Value;
    PAObjID = src.Obj; // PAObjID IS APPLIED HERE
    IsHeader = src.IsHeader;
    Pieces = Convert.ToInt16(src.ProductionQty);
    PartNo = src.Article;
    JobNo = src.WorkOrder;
    StartDate = src.StartDate;
    FinishDate = src.EndDate;
    FinishedPA = src.WorkOrderStatus == "R";
    // Update methods
    UpdateFromTtyCustomer(src.ttyCustomer?.FirstOrDefault());
    UpdateFromPart(src.ttyPart?.FirstOrDefault());
    UpdateFromSalesDocHeader(src.ttySalesDocHeader?.FirstOrDefault());
    UpdateWorkSchedules(src.ttyWorkOrderActivity, resourceDict);
}
I added an UPPERCASE comment to every line I would consider relevant.
I have no idea why this error happens. As I understand it, I only look up dataEntry.ObjID keys in the dataEntryDict dictionary that I added earlier in the same iteration of the loop.
Before I introduced the two lock statements, the line marked "THIS IS THE LINE THAT WAS PROBLEMATIC IN THE FIRST PLACE" sporadically threw an exception: "Collection was modified; enumeration operation may not execute." After digging into the EF source code, I realized that this must have something to do with how the DbSet.Add method is implemented.
Are there any known side effects when using a lock statement inside the ValueFactory?
lock (db.DataEntries) { // I INTRODUCED THOSE LOCK STATEMENTS
    db.DataEntries.Add(newDe); // THIS IS THE LINE THAT WAS PROBLEMATIC IN THE FIRST PLACE
}
The issue is that db.DataEntries is not a thread-safe collection, but it is being accessed concurrently by multiple threads. EF objects in general (DbContext, DbSet) are not thread-safe.
Using locking seems like a good solution here. Make sure that you cover every place where the DbSet is accessed.
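On the side-effects question: ConcurrentDictionary.GetOrAdd runs valueFactory outside its internal lock, and when two threads race on the same key the factory can run more than once, with only one result stored. Any side effect in the factory, such as db.DataEntries.Add, may therefore run for an entry that never ends up in the dictionary. A common workaround is to store Lazy<T> values, so that only the stored wrapper's factory ever executes. A sketch, where Entry and a List<Entry> guarded by a private lock object are hypothetical stand-ins for DataEntry and db.DataEntries:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the DataEntry entity.
public sealed class Entry
{
    public string Key { get; set; } = "";
}

public static class LazyGetOrAddSketch
{
    public static (int Created, int Tracked, int Distinct) Run(IReadOnlyList<string> keys)
    {
        var dict = new ConcurrentDictionary<string, Lazy<Entry>>();
        var trackedEntries = new List<Entry>();   // stands in for db.DataEntries
        var trackedLock = new object();           // private lock for the non-thread-safe list
        int created = 0;

        Parallel.ForEach(keys, key =>
        {
            // Under contention several Lazy wrappers may be built for one key, but
            // only the wrapper stored in the dictionary is ever evaluated, and
            // Lazy<T> runs its factory at most once (ExecutionAndPublication).
            _ = dict.GetOrAdd(key, k => new Lazy<Entry>(() =>
            {
                Interlocked.Increment(ref created);
                var e = new Entry { Key = k };
                lock (trackedLock) { trackedEntries.Add(e); }
                return e;
            })).Value;
        });

        return (created, trackedEntries.Count, dict.Count);
    }
}
```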
It is often better to split the concurrent part off from the sequential part. Make only the client.getWorkOrderDataSet02Async call concurrent and collect the results in a collection. Then, process the results sequentially.
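The split suggested above (concurrent service calls, then sequential processing) can be sketched as follows. FetchBatchAsync is a hypothetical stand-in for getWorkOrderDataSet02Async that pretends multiples of 7 were deleted on the service side, and a plain Dictionary plays the role of the DbContext; because phase 2 runs on a single thread, no locks are needed around it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FetchThenProcessSketch
{
    // Hypothetical stand-in for client.getWorkOrderDataSet02Async: returns the
    // IDs the service still knows about for one batch.
    private static async Task<int[]> FetchBatchAsync(int[] batch)
    {
        await Task.Delay(5);
        return batch.Where(id => id % 7 != 0).ToArray();
    }

    public static async Task<Dictionary<int, string>> RunAsync(int[] allIds, int batchSize, int maxParallel)
    {
        // Phase 1 (concurrent): only the service calls run in parallel.
        using var limiter = new SemaphoreSlim(maxParallel);
        var batches = allIds
            .Select((id, index) => (id, index))
            .GroupBy(x => x.index / batchSize, x => x.id)
            .Select(g => g.ToArray())
            .ToList();
        var responses = await Task.WhenAll(batches.Select(async batch =>
        {
            await limiter.WaitAsync();
            try { return (Batch: batch, Returned: await FetchBatchAsync(batch)); }
            finally { limiter.Release(); }
        }));

        // Phase 2 (sequential): the non-thread-safe store is only touched here.
        var entries = allIds.ToDictionary(id => id, id => "existing");
        foreach (var (batch, returned) in responses)
        {
            foreach (var id in returned)
                entries[id] = "updated";
            foreach (var id in batch.Except(returned))
                entries.Remove(id);
        }
        return entries;
    }
}
```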
I need to make the code below atomic, so that it fails or succeeds as a single unit. How could I go about achieving that?
void Processor(Input input)
{
    var mapper = new Mapper(recordDetails);
    int remainingRecords = GetCountForRemainingRecords(recordDetails);
    try
    {
        while (remainingRecords > 0)
        {
            mapper.CreateRecords(dataset);
            Validate(dataset);
            //the Save(dataset) uses SqlBulkCopy maps tables, transaction, and saves it..
            Save(dataset);
            //I cannot perform the operation below on the dataset directly because dataset doesn't have the records that is in the database
            //the method below eventually calls a stored proc that sends a list of users that was recently created
            OutdateDuplicateUsers(dataset.userTable);
            remainingRecords = MethodToGetUpdatedCount();
        }
    }
    catch (Exception exception)
    {
        //exception handler..
    }
}
Now, if my OutdateDuplicateUsers throws an exception, I still end up with the accounts that the Save method persisted. I do not want that to happen.
I want the Save and OutdateDuplicateUsers methods to be atomic together. I read a great article about TransactionScope, and it seemed to be exactly what I want. The implementation looks straightforward from the article, but I couldn't get it to work myself.
What I tried:
void Processor(Input input)
{
    var mapper = new Mapper(recordDetails);
    int remainingRecords = GetCountForRemainingRecords(recordDetails);
    try
    {
        while (remainingRecords > 0)
        {
            using (var scope = new TransactionScope())
            {
                try
                {
                    mapper.CreateRecords(dataset);
                    Validate(dataset);
                    //the method Save(dataset) is using SqlBulkCopy; maps tables, uses transaction, and saves it..
                    Save(dataset);
                    //I cannot perform this opertaion on the dataset directly because dataset doesn't have the records that is in the database
                    //the method below eventually calls a stored proc that sends a list of users that was recently created
                    OutdateDuplicateUsers(dataset.userTable);
                    remainingRecords = MethodToGetUpdatedCount();
                    scope.Complete();
                }
                catch (Exception)
                {
                    //not both at the same time. I tried using both, one at a time though.
                    TransactionScope.Dispose();
                    TransactionScope.Current.Rollback();
                    //exception handler
                }
            }
        }
    }
}
update:
The dataset is a strongly typed dataset and is schema only. The CreateRecords and Validate methods populate the data based on the business logic. The 'mapper' takes in recordDetails, which is, for instance, a list of Users (updated the snippet).
By "doesn't work" I mean that if the OutdateDuplicateUsers() method throws an exception and cannot complete the outdating operation, I can still see the records persisted in the database by the Save(dataset) method, which is what I am trying to prevent.
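A sketch of the usual TransactionScope shape, with a hypothetical InMemoryStore that enlists in the ambient transaction in place of a database. It shows that no explicit Rollback call is needed: disposing the scope without calling Complete() rolls back everything enlisted in it. With real SQL Server connections, keep in mind that a connection only auto-enlists when it is opened inside the scope, so if Save opens its connection before the scope exists, its work may not take part in the rollback.

```csharp
using System;
using System.Collections.Generic;
using System.Transactions;

// Hypothetical in-memory store whose writes only become visible if the
// ambient transaction commits; it stands in for the database.
public sealed class InMemoryStore
{
    private readonly List<string> _committed = new List<string>();
    private readonly object _gate = new object();

    public IReadOnlyList<string> Committed
    {
        get { lock (_gate) { return _committed.ToArray(); } }
    }

    public void Write(string row)
    {
        Transaction tx = Transaction.Current
            ?? throw new InvalidOperationException("No ambient transaction.");
        // Each write enlists a small resource manager in the ambient transaction.
        tx.EnlistVolatile(new PendingWrite(this, row), EnlistmentOptions.None);
    }

    private void Apply(string row)
    {
        lock (_gate) { _committed.Add(row); }
    }

    private sealed class PendingWrite : IEnlistmentNotification
    {
        private readonly InMemoryStore _store;
        private readonly string _row;

        public PendingWrite(InMemoryStore store, string row) { _store = store; _row = row; }

        public void Prepare(PreparingEnlistment preparingEnlistment) => preparingEnlistment.Prepared();
        public void Commit(Enlistment enlistment) { _store.Apply(_row); enlistment.Done(); }
        public void Rollback(Enlistment enlistment) => enlistment.Done();
        public void InDoubt(Enlistment enlistment) => enlistment.Done();
    }
}

public static class AtomicBatchSketch
{
    public static void ProcessBatch(InMemoryStore store, bool failOutdating)
    {
        using (var scope = new TransactionScope())
        {
            store.Write("saved user");           // plays the role of Save(dataset)
            if (failOutdating)
                throw new InvalidOperationException("OutdateDuplicateUsers failed");
            store.Write("outdated duplicate");   // plays the role of OutdateDuplicateUsers
            scope.Complete();                    // reached only if both steps succeed
        }
        // Leaving the using block without Complete() rolls the transaction back;
        // there is no need to call Dispose or Rollback by hand.
    }
}
```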
I have an application that, before it creates a thread, calls the database to pull X records. When the records are retrieved from the database, a locked flag is set so those records are not pulled again.
Once a thread has completed, it pulls more records from the database. When I call the database from a thread, should I put a lock around that section of code so that only that thread calls it at that time? Here is an example of my code (I commented the area where I have the lock):
private void CreateThreads()
{
    for (var i = 1; i <= _threadCount; i++)
    {
        var adapter = new Dystopia.DataAdapter();
        var records = adapter.FindAllWithLocking(_recordsPerThread, _validationId, _validationDateTime);
        if (records != null && records.Count > 0)
        {
            var paramss = new ArrayList { i, records };
            ThreadPool.QueueUserWorkItem(ThreadWorker, paramss);
        }
        this.Update();
    }
}
private void ThreadWorker(object paramList)
{
    try
    {
        var parms = (ArrayList) paramList;
        var stopThread = false;
        var threadCount = (int) parms[0];
        var records = (List<Candidates>) parms[1];
        var runOnce = false;
        var adapter = new Dystopia.DataAdapter();
        var lastCount = records.Count;
        var runningCount = 0;
        while (_stopThreads == false)
        {
            if (records.Count > 0)
            {
                foreach (var record in records)
                {
                    var proc = new ProcRecords();
                    var rec = record; // a foreach iteration variable cannot be passed by ref
                    proc.Validate(ref rec);
                    adapter.Update(rec);
                    if (_stopThreads)
                    {
                        break;
                    }
                }
                //This is where I think I may need to sync the threads.
                //Is this correct?
                lock(this){
                    records = adapter.FindAllWithLocking(_recordsPerThread, _validationId, _validationDateTime);
                }
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
SQL to pull records:
WITH cte AS (
    SELECT TOP (#topCount) *
    FROM Candidates WITH (READPAST)
    WHERE
        isLocked = 0 and
        isTested = 0 and
        validated = 0
)
UPDATE cte
SET
    isLocked = 1,
    validationID = #validationId,
    validationDateTime = #validationDateTime
OUTPUT INSERTED.*;
You shouldn't need to lock your threads as the database should be doing this on the request for you.
I see a few issues.
First, you are testing _stopThreads == false, but you have not revealed whether this is a volatile read. Read the second half of this answer for a good description of what I am talking about.
Second, the lock is pointless because adapter is a local reference to a non-shared object, and records is a local reference that is just being replaced. I am assuming that the adapter makes a separate connection to the database; if it shares an existing connection, then some type of synchronization may be needed, since ADO.NET connection objects are not typically thread-safe.
Now, you probably will need locking somewhere to publish the results from the work item. I do not see where the results are being published to the main thread so I cannot offer any guidance here.
By the way, I would avoid showing a message box from a ThreadPool thread, because it blocks that thread until the message box is closed.
You shouldn't use lock(this), since it makes it really easy to create deadlocks: any outside code with a reference to the object can lock on it too. Create a separate lock object instead. If you search for "lock(this)", you can find numerous articles on why.
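The volatile-read point can be illustrated with a minimal sketch: a worker loops until a stop flag is set from another thread. Marking the field volatile means each loop iteration re-reads it, so the JIT cannot hoist the read out of the loop; a CancellationToken is an alternative way to signal the stop. StoppableWorker is a hypothetical class, not part of the original code.

```csharp
using System;
using System.Threading;

// Hypothetical worker, not part of the original code.
public sealed class StoppableWorker
{
    // volatile: every loop iteration re-reads the field, so the JIT cannot
    // hoist the read out of the loop and spin forever on a stale value.
    private volatile bool _stopThreads;
    private int _iterations;

    public int Iterations => Volatile.Read(ref _iterations);

    public void Stop() => _stopThreads = true;

    public void Work()
    {
        while (!_stopThreads)
        {
            Interlocked.Increment(ref _iterations);   // stands in for processing a batch
            Thread.Sleep(1);
        }
    }
}
```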
Here's an SO question on lock(this)
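A sketch of the separate-lock-object advice that also covers publishing results from pool threads: a private readonly object guards a shared List<int>, so no outside code can take the same lock the way it can with lock(this). ResultPublisher and RunOnThreadPool are hypothetical names; Task.Run queues each work item on the ThreadPool.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical collector for results produced by worker threads.
public sealed class ResultPublisher
{
    // Private lock object: outside code cannot lock on it, unlike 'this'.
    private readonly object _resultsLock = new object();
    private readonly List<int> _results = new List<int>();

    public void Publish(int result)
    {
        lock (_resultsLock)
        {
            _results.Add(result);
        }
    }

    public int[] Snapshot()
    {
        lock (_resultsLock)
        {
            return _results.ToArray();
        }
    }

    // Runs 'workItems' ThreadPool work items that each publish one result.
    public static int[] RunOnThreadPool(int workItems)
    {
        var publisher = new ResultPublisher();
        Task[] tasks = Enumerable.Range(0, workItems)
            .Select(i => Task.Run(() => publisher.Publish(i * i)))
            .ToArray();
        Task.WaitAll(tasks);
        return publisher.Snapshot();
    }
}
```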