I'm using this to spin up threads that either insert or delete documents from a DocumentDB collection. It works, but I am not exactly sure how I'm supposed to know how many threads I can spin up.
Sometimes it works with maxThreads at 7; above that I'll quickly get the "Request rate is large" error. But sometimes, even at 3 threads, I'll get the same error. So this is obviously not very scientific.
I guess I would have to monitor how many RUs I've used after each call and perhaps throttle the logic for a couple of milliseconds (a sketch of that idea follows the code below).
Any ideas?
public class MultiThreadOperations<T> where T : IDocumentModel
{
    List<T> Documents = new List<T>();
    CollectionDB<T> Collection;
    OperationType OperationType;
    List<Task> AllTasks = new List<Task>();

    public MultiThreadOperations(List<T> documents, CollectionDB<T> Collection, OperationType opType)
    {
        this.Collection = Collection;
        Documents = documents;
        OperationType = opType;
    }

    public async Task Start()
    {
        var maxThreads = 2;
        using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxThreads))
        {
            foreach (T doc in Documents)
            {
                concurrencySemaphore.Wait();
                var t = Task.Run(async () =>
                {
                    try
                    {
                        switch (OperationType)
                        {
                            case OperationType.Create:
                                await InsertDocument(doc);
                                break;
                            case OperationType.Delete:
                                await DeleteDocument(doc);
                                break;
                        }
                    }
                    finally
                    {
                        concurrencySemaphore.Release();
                    }
                });
                AllTasks.Add(t);
            }
            await Task.WhenAll(AllTasks.ToArray());
        }
    }

    private async Task InsertDocument(T item)
    {
        await Collection.CreateAsync(item);
    }

    private async Task DeleteDocument(T item)
    {
        await Collection.DeleteFromId(item.Id);
    }
}
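As a side note on the RU-monitoring idea above: the DocumentDB SDK reports the request charge of every call, and a throttled call surfaces as a DocumentClientException carrying a server-suggested retry delay. A minimal sketch (the database/collection names and the surrounding method are made up for illustration):

using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// ...

private async Task<double> InsertWithBackoffAsync(DocumentClient client, object doc)
{
    while (true)
    {
        try
        {
            // "myDb"/"myColl" are placeholders for your own names
            var response = await client.CreateDocumentAsync(
                UriFactory.CreateDocumentCollectionUri("myDb", "myColl"), doc);
            return response.RequestCharge; // RUs this call actually consumed
        }
        catch (DocumentClientException ex) when (ex.StatusCode == (HttpStatusCode)429)
        {
            // throttled ("Request rate is large"): wait the server-suggested
            // interval, then retry
            await Task.Delay(ex.RetryAfter);
        }
    }
}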
It depends on the following factors:
Let's say the number of request units consumed by a single create/delete operation is X RUs.
The latency/duration per request is N. Within the same region, this is ~5ms, but across the network it could be RTT (round trip time) + 5ms.
Then each thread can consume X * (1/N) RUs per second: it completes 1/N requests per second, each costing X RUs.
If your collection is provisioned with T RU/s, then the number of threads you need = T / (X * (1/N)).
For example, within the same Azure region, if you had 10,000 RU/s, each create or delete took 5 RUs, and the network latency were 5ms, then each thread could perform 1000/5 = 200 writes/second, i.e. 200 * 5 = 1,000 RU/s. Therefore you would need 10 threads to reach 10,000 RU/s.
Let's say you're running the same test from a VM in Europe accessing an account in East US. The network lag is ~100ms, so each thread can perform ~10 requests/sec = 50 RU/s. Therefore, you need 200 threads to reach the same 10,000 RU/s.
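The arithmetic above is easy to wrap in a small helper. A sketch (just the formula, nothing SDK-specific):

// threads needed = provisioned RU/s divided by the RU/s one thread can push
static int ThreadsNeeded(double provisionedRuPerSec, double ruPerOp, double latencyMs)
{
    double requestsPerThreadPerSec = 1000.0 / latencyMs;          // 1/N
    double ruPerThreadPerSec = requestsPerThreadPerSec * ruPerOp; // X * (1/N)
    return (int)Math.Ceiling(provisionedRuPerSec / ruPerThreadPerSec);
}

// ThreadsNeeded(10000, 5, 5)   -> 10  (same region)
// ThreadsNeeded(10000, 5, 100) -> 200 (cross-region, ~100ms round trip)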
I am trying to create a producer/consumer TPL Dataflow process. As part of it, I will be creating multiple producer tasks for different id ranges which will generate the data records for processing. So I am planning to throttle the number of threads that connect to the database to get the data and are active at the same time.
However, the method which extracts data and sends it to the BufferBlock has a return type, so that I can get the count of records extracted. So I am unable to figure out where to call the Release method from the SemaphoreSlim class while still getting a return value. Below is the sample code that I am using. Can someone please suggest a workaround for this?
private Task<KeyRange>[] DataExtractProducer(ITargetBlock<DataRow[]> targetBuffer, ExtractQueryConfiguration QueryConf)
{
    CancellationToken cancelToken = cancelTokenSrc.Token;
    var tasks = new List<Task<KeyRange>>();
    int taskCount = 0;
    int maxThreads = QueryConf.MaxThreadsLimit > 0 ? QueryConf.MaxThreadsLimit : DataExtConstants.DefaultMaxThreads;

    using (SemaphoreSlim concurrency = new SemaphoreSlim(maxThreads))
    {
        foreach (KeyRange range in keyRangeList)
        {
            concurrency.WaitAsync();
            var task = Task.Run(() =>
            {
                Console.WriteLine(MsgConstants.StartKeyRangeExtract, QueryConf.KeyColumn, range.StartValue, range.EndValue);
                //concurrency.Release();
                return GetDataTask(targetBuffer, range, QueryConf.ExtractQuery);
            }, cancelToken);
            tasks.Add(task);
            taskCount++;
        }
    }
    Console.WriteLine(MsgConstants.TaskCountMessage, taskCount);
    return tasks.ToArray<Task<KeyRange>>();
}
Edit: I tried this variant also, but it does not seem to work. I tried with a limit of 20, but I see more than 50 DB connections going out. Eventually, I am hitting high memory consumption because of the unthrottled connections.
using (SemaphoreSlim concurrency = new SemaphoreSlim(maxThreads))
{
    foreach (KeyRange range in keyRangeList)
    {
        concurrency.WaitAsync();
        var task = Task.Run(async () =>
        {
            Console.WriteLine(MsgConstants.StartKeyRangeExtract, QueryConf.KeyColumn, range.StartValue, range.EndValue);
            var temptask = await GetDataTask(targetBuffer, range, QueryConf.ExtractQuery);
            concurrency.Release();
            return temptask;
        }, cancelToken);
        tasks.Add(task);
        taskCount++;
    }
}
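For what it's worth, the usual shape of the fix is an async lambda that actually awaits WaitAsync and releases in a finally block, which preserves the Task<KeyRange> return type; the semaphore also has to outlive the tasks, so it cannot live in a using block that exits before the tasks complete. A sketch against the code above (untested, same types and helpers as the question):

var semaphore = new SemaphoreSlim(maxThreads); // must not be disposed while tasks run

foreach (KeyRange range in keyRangeList)
{
    var task = Task.Run(async () =>
    {
        await semaphore.WaitAsync(cancelToken); // actually await the slot
        try
        {
            return await GetDataTask(targetBuffer, range, QueryConf.ExtractQuery);
        }
        finally
        {
            semaphore.Release(); // free the slot only after the work completes
        }
    }, cancelToken);
    tasks.Add(task);
    taskCount++;
}

The earlier variants throttled nothing because the Task returned by WaitAsync was never awaited.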
I am working on improving some of my code to increase efficiency. In the original code I limited the number of threads to 5, and if 5 threads were already active I would wait until one finished before starting another. Now I want to modify this code to allow any number of threads, but I want to make sure that only 5 threads get started every second. For example:
Second 0 - 5 new threads
Second 1 - 5 new threads
Second 2 - 5 new threads ...
Original code (cleanseDictionary usually contains thousands of items):
ConcurrentDictionary<long, APIResponse> cleanseDictionary = new ConcurrentDictionary<long, APIResponse>();
ConcurrentBag<int> itemsinsec = new ConcurrentBag<int>();
ConcurrentDictionary<long, string> resourceDictionary = new ConcurrentDictionary<long, string>();
DateTime start = DateTime.Now;

Parallel.ForEach(resourceDictionary, new ParallelOptions { MaxDegreeOfParallelism = 5 }, row =>
{
    lock (itemsinsec)
    {
        ThrottleAPIRequests(itemsinsec, start);
        itemsinsec.Add(1);
    }
    cleanseDictionary.TryAdd(row.Key, _helper.MakeAPIRequest(string.Format("/endpoint?{0}", row.Value)));
});

private static void ThrottleAPIRequests(ConcurrentBag<int> itemsinsec, DateTime start)
{
    if ((start - DateTime.Now).Milliseconds < 10001 && itemsinsec.Count > 4)
    {
        System.Threading.Thread.Sleep(1000 - (start - DateTime.Now).Milliseconds);
        start = DateTime.Now;
        itemsinsec = new ConcurrentBag<int>();
    }
}
My first thought was to increase MaxDegreeOfParallelism to something much higher and then have a helper method that limits it to only 5 threads started per second, but I am not sure if that is the best way to do it and, if it is, I would probably need a lock around that step?
Thanks in advance!
EDIT
I am actually looking for a way to throttle the API requests rather than the actual threads. I was thinking they were one and the same.
Edit 2: My requirement is to send over 5 API requests every second.
"Parallel.ForEach" from the MS website
may run in parallel
If you want any degree of fine control over how the threads are managed, this is not the way to go.
How about creating your own helper class where you can queue jobs with a group id, wait for all jobs of group id X to complete, and have it spawn extra threads as and when required? A sketch of that idea follows.
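That helper-class suggestion might look roughly like this minimal sketch (all names here are invented, and Task.Run stands in for hand-managed threads):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class GroupedJobQueue
{
    // tasks bucketed by group id, so a whole group can be awaited at once
    private readonly ConcurrentDictionary<string, ConcurrentBag<Task>> _groups =
        new ConcurrentDictionary<string, ConcurrentBag<Task>>();

    public Task Enqueue(string groupId, Func<Task> job)
    {
        var task = Task.Run(job); // the thread pool adds threads as and when required
        _groups.GetOrAdd(groupId, _ => new ConcurrentBag<Task>()).Add(task);
        return task;
    }

    // wait for all jobs of group id X to complete
    public Task WhenGroupCompletes(string groupId) =>
        _groups.TryGetValue(groupId, out var tasks) ? Task.WhenAll(tasks) : Task.CompletedTask;
}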
For me the best solution is:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

namespace SomeNamespace
{
    public class RequestLimiter : IRequestLimiter
    {
        private readonly ConcurrentQueue<DateTime> _requestTimes;
        private readonly TimeSpan _timeSpan;
        private readonly object _locker = new object();

        public RequestLimiter()
        {
            _timeSpan = TimeSpan.FromSeconds(1);
            _requestTimes = new ConcurrentQueue<DateTime>();
        }

        public TResult Run<TResult>(int requestsOnSecond, Func<TResult> function)
        {
            WaitUntilRequestCanBeMade(requestsOnSecond).Wait();
            return function();
        }

        private Task WaitUntilRequestCanBeMade(int requestsOnSecond)
        {
            return Task.Factory.StartNew(() =>
            {
                while (!TryEnqueueRequest(requestsOnSecond).Result) ;
            });
        }

        private Task SynchronizeQueue()
        {
            return Task.Factory.StartNew(() =>
            {
                _requestTimes.TryPeek(out var first);
                while (_requestTimes.Count > 0 && (first.Add(_timeSpan) < DateTime.UtcNow))
                    _requestTimes.TryDequeue(out _);
            });
        }

        private Task<bool> TryEnqueueRequest(int requestsOnSecond)
        {
            lock (_locker)
            {
                SynchronizeQueue().Wait();
                if (_requestTimes.Count < requestsOnSecond)
                {
                    _requestTimes.Enqueue(DateTime.UtcNow);
                    return Task.FromResult(true);
                }
                return Task.FromResult(false);
            }
        }
    }
}
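Hypothetical usage against the question's code (reusing the asker's _helper and dictionaries; note that Run blocks the calling thread until a slot in the rolling second frees up):

var limiter = new RequestLimiter();
foreach (var row in resourceDictionary)
{
    // every call is funneled through the limiter: at most 5 per rolling second
    var response = limiter.Run(5, () => _helper.MakeAPIRequest(string.Format("/endpoint?{0}", row.Value)));
    cleanseDictionary.TryAdd(row.Key, response);
}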
I want to be able to send over 5 API requests every second
That's really easy:
while (true)
{
    await Task.Delay(TimeSpan.FromSeconds(1));
    await Task.WhenAll(Enumerable.Range(0, 5).Select(_ => RunRequestAsync()));
}
This may not be the best approach, since the requests go out in a burst each second rather than continuously. There is also timing skew: each iteration takes slightly more than one second, so the schedule drifts. This can be solved with a few lines of time logic, as sketched below.
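A sketch of that time logic, assuming the same RunRequestAsync as above: schedule each batch against a fixed origin rather than delaying relative to now, so the drift doesn't accumulate:

var origin = DateTime.UtcNow;
for (int tick = 1; ; tick++)
{
    // target the next whole second since the origin, regardless of how long
    // the previous batch took
    var wait = origin.AddSeconds(tick) - DateTime.UtcNow;
    if (wait > TimeSpan.Zero)
        await Task.Delay(wait);
    await Task.WhenAll(Enumerable.Range(0, 5).Select(_ => RunRequestAsync()));
}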
I'm implementing a Redis caching layer using the StackExchange.Redis client, and the performance right now is bordering on unusable.
I have a local environment where the web application and the Redis server are running on the same machine. I ran the Redis benchmark test against my Redis server and the results were actually really good (I'm just including SET and GET operations in my write-up):
C:\Program Files\Redis>redis-benchmark -n 100000
====== PING_INLINE ======
100000 requests completed in 0.88 seconds
50 parallel clients
3 bytes payload
keep alive: 1
====== SET ======
100000 requests completed in 0.89 seconds
50 parallel clients
3 bytes payload
keep alive: 1
99.70% <= 1 milliseconds
99.90% <= 2 milliseconds
100.00% <= 3 milliseconds
111982.08 requests per second
====== GET ======
100000 requests completed in 0.81 seconds
50 parallel clients
3 bytes payload
keep alive: 1
99.87% <= 1 milliseconds
99.98% <= 2 milliseconds
100.00% <= 2 milliseconds
124069.48 requests per second
So according to the benchmarks I am looking at over 100,000 SETs and 100,000 GETs per second. I wrote a unit test to do 300,000 sets/gets:
private string redisCacheConn = "localhost:6379,allowAdmin=true,abortConnect=false,ssl=false";

[Fact]
public void PerfTestWriteShortString()
{
    CacheManager cm = new CacheManager(redisCacheConn);
    string svalue = "t";
    string skey = "testtesttest";
    for (int i = 0; i < 300000; i++)
    {
        cm.SaveCache(skey + i, svalue);
        string valRead = cm.ObtainItemFromCacheString(skey + i);
    }
}
This uses the following class to perform the Redis operations via the Stackexchange client:
using StackExchange.Redis;

namespace Caching
{
    public class CacheManager : ICacheManager, ICacheManagerReports
    {
        private static string cs;
        private static ConfigurationOptions options;
        private int pageSize = 5000;
        public ICacheSerializer serializer { get; set; }

        public CacheManager(string connectionString)
        {
            serializer = new SerializeJSON();
            cs = connectionString;
            options = ConfigurationOptions.Parse(connectionString);
            options.SyncTimeout = 60000;
        }

        private static readonly Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(options));
        private static ConnectionMultiplexer Connection => lazyConnection.Value;
        private static IDatabase cache => Connection.GetDatabase();

        public string ObtainItemFromCacheString(string cacheId)
        {
            return cache.StringGet(cacheId);
        }

        public void SaveCache<T>(string cacheId, T cacheEntry, TimeSpan? expiry = null)
        {
            if (IsValueType<T>())
            {
                cache.StringSet(cacheId, cacheEntry.ToString(), expiry);
            }
            else
            {
                cache.StringSet(cacheId, serializer.SerializeObject(cacheEntry), expiry);
            }
        }

        public bool IsValueType<T>()
        {
            return typeof(T).IsValueType || typeof(T) == typeof(string);
        }
    }
}
My JSON serializer is just using Newtonsoft.JSON:
using System.Collections.Generic;
using Newtonsoft.Json;

namespace Caching
{
    public class SerializeJSON : ICacheSerializer
    {
        public string SerializeObject<T>(T cacheEntry)
        {
            return JsonConvert.SerializeObject(cacheEntry, Formatting.None,
                new JsonSerializerSettings()
                {
                    ReferenceLoopHandling = ReferenceLoopHandling.Ignore
                });
        }

        public T DeserializeObject<T>(string data)
        {
            return JsonConvert.DeserializeObject<T>(data, new JsonSerializerSettings()
            {
                ReferenceLoopHandling = ReferenceLoopHandling.Ignore
            });
        }
    }
}
My test times are around 21 seconds (for 300,000 sets and 300,000 gets). This gives me around 28,500 operations per second (at least 3 times slower than I would expect based on the benchmarks). The application I am converting to use Redis is pretty chatty, and certain heavy requests can amount to roughly 200,000 total operations against Redis. Obviously I wasn't expecting anything like the same times I was getting when using the system runtime cache, but the delays after this change are significant. Am I doing something wrong with my implementation, and does anyone know why my benchmarked figures are so much faster than my StackExchange test figures?
Thanks,
Paul
My results from the code below:
Connecting to server...
Connected
PING (sync per op)
1709ms for 1000000 ops on 50 threads took 1.709594 seconds
585137 ops/s
SET (sync per op)
759ms for 500000 ops on 50 threads took 0.7592914 seconds
658761 ops/s
GET (sync per op)
780ms for 500000 ops on 50 threads took 0.7806102 seconds
641025 ops/s
PING (pipelined per thread)
3751ms for 1000000 ops on 50 threads took 3.7510956 seconds
266595 ops/s
SET (pipelined per thread)
1781ms for 500000 ops on 50 threads took 1.7819831 seconds
280741 ops/s
GET (pipelined per thread)
1977ms for 500000 ops on 50 threads took 1.9772623 seconds
252908 ops/s
===
Server configuration: make sure persistence is disabled, etc
The first thing you should do in a benchmark is: benchmark one thing. At the moment you're including a lot of serialization overhead, which won't help get a clear picture. Ideally, for a like-for-like benchmark, you should be using a 3-byte fixed payload, because:
3 bytes payload
Next, you'd need to look at parallelism:
50 parallel clients
It isn't clear whether your test is parallel, but if it isn't, we should absolutely expect to see less raw throughput. Conveniently, SE.Redis is designed to be easy to parallelize: you can just spin up multiple threads talking to the same connection (this actually also has the advantage of avoiding packet fragmentation, as you can end up with multiple messages per packet, whereas a single-thread sync approach is guaranteed to use at most one message per packet).
Finally, we need to understand what the listed benchmark is doing. Is it doing:
(send, receive) x n
or is it doing:
send x n, then receive separately until all n are received?
Both options are possible. Your sync API usage is the first one, but the second test is equally well-defined, and for all I know, that's what it is measuring. There are two ways of simulating this second setup:
send the first (n-1) messages with the "fire and forget" flag, so you only actually wait for the last one
use the *Async API for all messages, and only Wait() or await the last Task
Here's a benchmark that I used in the above, that shows both "sync per op" (via the sync API) and "pipelined per thread" (using the *Async API and just waiting for the last task per thread), both using 50 threads:
using StackExchange.Redis;
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class P
{
    static void Main()
    {
        Console.WriteLine("Connecting to server...");
        using (var muxer = ConnectionMultiplexer.Connect("127.0.0.1"))
        {
            Console.WriteLine("Connected");
            var db = muxer.GetDatabase();

            RedisKey key = "some key";
            byte[] payload = new byte[3];
            new Random(12345).NextBytes(payload);
            RedisValue value = payload;

            DoWork("PING (sync per op)", db, 1000000, 50, x => { x.Ping(); return null; });
            DoWork("SET (sync per op)", db, 500000, 50, x => { x.StringSet(key, value); return null; });
            DoWork("GET (sync per op)", db, 500000, 50, x => { x.StringGet(key); return null; });

            DoWork("PING (pipelined per thread)", db, 1000000, 50, x => x.PingAsync());
            DoWork("SET (pipelined per thread)", db, 500000, 50, x => x.StringSetAsync(key, value));
            DoWork("GET (pipelined per thread)", db, 500000, 50, x => x.StringGetAsync(key));
        }
    }

    static void DoWork(string action, IDatabase db, int count, int threads, Func<IDatabase, Task> op)
    {
        object startup = new object(), shutdown = new object();
        int activeThreads = 0, outstandingOps = count;
        Stopwatch sw = default(Stopwatch);
        var threadStart = new ThreadStart(() =>
        {
            lock (startup)
            {
                if (++activeThreads == threads)
                {
                    sw = Stopwatch.StartNew();
                    Monitor.PulseAll(startup);
                }
                else
                {
                    Monitor.Wait(startup);
                }
            }
            Task final = null;
            while (Interlocked.Decrement(ref outstandingOps) >= 0)
            {
                final = op(db);
            }
            if (final != null) final.Wait();
            lock (shutdown)
            {
                if (--activeThreads == 0)
                {
                    sw.Stop();
                    Monitor.PulseAll(shutdown);
                }
            }
        });
        lock (shutdown)
        {
            for (int i = 0; i < threads; i++)
            {
                new Thread(threadStart).Start();
            }
            Monitor.Wait(shutdown);
            Console.WriteLine($@"{action}
    {sw.ElapsedMilliseconds}ms for {count} ops on {threads} threads took {sw.Elapsed.TotalSeconds} seconds
    {(count * 1000) / sw.ElapsedMilliseconds} ops/s");
        }
    }
}
You are fetching data in a synchronous way (50 clients in parallel, but each client's requests are made synchronously instead of asynchronously).
One option would be to use the async/await methods (StackExchange.Redis supports that).
If you need to get multiple keys at once (for example, to build a daily graph of visitors to your website, assuming you save a visitor counter under a per-day key), then you should try fetching data from Redis asynchronously using Redis pipelining; this should give you much better performance. A sketch follows.
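A sketch of that pipelined fetch with StackExchange.Redis, reusing the cache IDatabase from the question (the per-day key names are invented): issue all the GETs before awaiting any of them, so many commands share each round trip:

// hypothetical per-day visitor-counter keys
var keys = Enumerable.Range(0, 30)
    .Select(day => (RedisKey)("visitors:" + day))
    .ToArray();

// fire all GETs onto the shared connection, then await them together
var pending = keys.Select(k => cache.StringGetAsync(k)).ToArray();
RedisValue[] counters = await Task.WhenAll(pending);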
Old versions of the StackExchange.Redis client have performance issues.
Upgrade to the newest version. Read more here:
https://www.gitmemory.com/issue/mgravell/Pipelines.Sockets.Unofficial/28/479932064
and in this article:
https://blog.marcgravell.com/2019/02/fun-with-spiral-of-death.html
This is the issue in the repo:
https://github.com/StackExchange/StackExchange.Redis/issues/1003
I am using the HttpClient in System.Net.Http to make requests against an API. The API is limited to 10 requests per second.
My code is roughly like so:
List<Task> tasks = items.Select(i => ProcessItem(i)).ToList<Task>();

try
{
    await Task.WhenAll(tasks);
}
catch (Exception ex)
{
}
The ProcessItem method does a few things, but it always calls the API using await SendRequestAsync(...), which looks like:
private async Task<Response> SendRequestAsync(HttpRequestMessage request, CancellationToken token)
{
    token.ThrowIfCancellationRequested();
    var response = await HttpClient
        .SendAsync(request: request, cancellationToken: token)
        .ConfigureAwait(continueOnCapturedContext: false);
    token.ThrowIfCancellationRequested();
    return await Response.BuildResponse(response);
}
Originally the code worked fine, but when I started using Task.WhenAll I started getting 'Rate Limit Exceeded' messages from the API. How can I limit the rate at which requests are made?
It's worth noting that ProcessItem can make between 1 and 4 API calls depending on the item.
The API is limited to 10 requests per second.
Then just have your code do a batch of 10 requests, ensuring they take at least one second:
Items[] items = ...;
int index = 0;
while (index < items.Length)
{
    var timer = Task.Delay(TimeSpan.FromSeconds(1.2)); // ".2" to make sure
    var tasks = items.Skip(index).Take(10).Select(i => ProcessItemsAsync(i));
    var tasksAndTimer = tasks.Concat(new[] { timer });
    await Task.WhenAll(tasksAndTimer);
    index += 10;
}
Update
My ProcessItems method makes 1-4 API calls depending on the item.
In this case, batching is not an appropriate solution. You need to limit the number of concurrent calls to an asynchronous method, which implies a SemaphoreSlim. The tricky part is that you want to allow more calls over time.
I haven't tried this code, but the general idea I would go with is to have a periodic function that releases the semaphore up to 10 times. So, something like this:
private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(10);

private async Task<Response> ThrottledSendRequestAsync(HttpRequestMessage request, CancellationToken token)
{
    await _semaphore.WaitAsync(token);
    return await SendRequestAsync(request, token);
}

private async Task PeriodicallyReleaseAsync(Task stop)
{
    while (true)
    {
        var timer = Task.Delay(TimeSpan.FromSeconds(1.2));
        if (await Task.WhenAny(timer, stop) == stop)
            return;

        // Release the semaphore at most 10 times.
        for (int i = 0; i != 10; ++i)
        {
            try
            {
                _semaphore.Release();
            }
            catch (SemaphoreFullException)
            {
                break;
            }
        }
    }
}
Usage:
// Start the periodic task, with a signal that we can use to stop it.
var stop = new TaskCompletionSource<object>();
var periodicTask = PeriodicallyReleaseAsync(stop.Task);
// Wait for all item processing.
await Task.WhenAll(taskList);
// Stop the periodic task.
stop.SetResult(null);
await periodicTask;
The answer is similar to this one.
Instead of using a list of tasks and WhenAll, use Parallel.ForEach and use ParallelOptions to limit the number of concurrent tasks to 10, and make sure each one takes at least 1 second:
Parallel.ForEach(
    items,
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    async item =>
    {
        ProcessItems(item);
        await Task.Delay(1000);
    }
);
Or if you want to make sure each item takes as close to 1 second as possible:
Parallel.ForEach(
    searches,
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    async item =>
    {
        var watch = new Stopwatch();
        watch.Start();
        ProcessItems(item);
        watch.Stop();
        if (watch.ElapsedMilliseconds < 1000) await Task.Delay((int)(1000 - watch.ElapsedMilliseconds));
    }
);
Or:
Parallel.ForEach(
    searches,
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    async item =>
    {
        await Task.WhenAll(
            Task.Delay(1000),
            Task.Run(() => { ProcessItems(item); })
        );
    }
);
UPDATED ANSWER
My ProcessItems method makes 1-4 API calls depending on the item. So with a batch size of 10 I still exceed the rate limit.
You need to implement a rolling window in SendRequestAsync. A queue containing the timestamp of each request is a suitable data structure: you dequeue entries with a timestamp older than one second, and allow a new request only while fewer than 10 timestamps remain in the queue. As it so happens, there is an implementation as an answer to a similar question on SO. A sketch follows.
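A sketch of that rolling window (the field and method names are invented; it would be awaited at the top of SendRequestAsync):

private readonly Queue<DateTime> _requestLog = new Queue<DateTime>();
private readonly SemaphoreSlim _logLock = new SemaphoreSlim(1, 1);

private async Task WaitForRollingWindowAsync()
{
    while (true)
    {
        await _logLock.WaitAsync();
        try
        {
            // drop entries that have left the one-second window
            var cutoff = DateTime.UtcNow - TimeSpan.FromSeconds(1);
            while (_requestLog.Count > 0 && _requestLog.Peek() < cutoff)
                _requestLog.Dequeue();

            if (_requestLog.Count < 10)
            {
                _requestLog.Enqueue(DateTime.UtcNow); // claim a slot and proceed
                return;
            }
        }
        finally
        {
            _logLock.Release();
        }
        await Task.Delay(50); // window is full: back off briefly and re-check
    }
}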
ORIGINAL ANSWER
May still be useful to others
One straightforward way to handle this is to batch your requests in groups of 10, run those concurrently, and then wait until a full second has elapsed (if it hasn't already). This will bring you in right at the rate limit if a batch of requests can complete within a second, but is less than optimal if a batch takes longer. Have a look at the .Batch() extension method in MoreLinq. The code would look approximately like:
foreach (var taskList in tasks.Batch(10))
{
    Stopwatch sw = Stopwatch.StartNew(); // From System.Diagnostics
    await Task.WhenAll(taskList.ToArray());
    if (sw.Elapsed.TotalSeconds < 1.0)
    {
        // Wait out the rest of the second. Padding the window slightly
        // (e.g. 1.1 seconds) guards against imperfect rate limiting on
        // the other side.
        await Task.Delay(TimeSpan.FromSeconds(1.1) - sw.Elapsed);
    }
}
I've written a library to help with this sort of logic: https://github.com/thomhurst/EnumerableAsyncProcessor
Usage would be:
var responses = await AsyncProcessorBuilder.WithItems(items) // Or extension method: items.ToAsyncProcessorBuilder()
    .SelectAsync(item => ProcessItem(item), CancellationToken.None)
    .ProcessInParallel(levelOfParallelism: 10, TimeSpan.FromSeconds(1));
I have a simulation that generates data which must be saved to a database.
ParallelLoopResult res = Parallel.For(0, 1000000, options, (r, state) =>
{
    ComplexDataSet cds = GenerateData(r);
    SaveDataToDatabase(cds);
});
The simulation generates a whole lot of data, so it wouldn't be practical to first generate it all and then save it to the database (up to 1 GB of data), and it also wouldn't make sense to save it one row at a time (the transactions would be too small to be practical). I want to insert the data into the database as batch inserts of a controlled size (say 100, with one commit).
However, I think my knowledge of parallel computing is barely more than theoretical. I came up with this (which, as you can see, is very flawed):
DataBuffer buffer = new DataBuffer(...);

ParallelLoopResult res = Parallel.For(0, 10000000, options, (r, state) =>
{
    ComplexDataSet cds = GenerateData(r);
    buffer.SaveDataToBuffer(cds, r == 10000000 - 1);
});

public class DataBuffer
{
    int count = 0;
    int limit = 100;
    object _locker = new object();
    ConcurrentBag<ComplexDataSet> _lastItemRef;
    ConcurrentQueue<ConcurrentBag<ComplexDataSet>> ComplexDataSetsQueue { get; set; }

    public void SaveDataToBuffer(ComplexDataSet data, bool isfinalcycle)
    {
        lock (_locker)
        {
            if (count >= limit)
            {
                ConcurrentBag<ComplexDataSet> dequeueRef;
                if (ComplexDataSetsQueue.TryDequeue(out dequeueRef))
                {
                    Commit(dequeueRef);
                }
                _lastItemRef = new ConcurrentBag<ComplexDataSet> { data };
                ComplexDataSetsQueue.Enqueue(_lastItemRef);
                count = 1;
            }
            else
            {
                // First time
                if (_lastItemRef == null)
                {
                    _lastItemRef = new ConcurrentBag<ComplexDataSet> { data };
                    ComplexDataSetsQueue.Enqueue(_lastItemRef);
                    count = 1;
                }
                // If buffer isn't full
                else
                {
                    _lastItemRef.Add(data);
                    count++;
                }
            }

            if (isfinalcycle)
            {
                // Commit everything that hasn't been committed yet
                ConcurrentBag<ComplexDataSet> dequeueRef;
                while (ComplexDataSetsQueue.TryDequeue(out dequeueRef))
                {
                    Commit(dequeueRef);
                }
            }
        }
    }

    public void Commit(ConcurrentBag<ComplexDataSet> data)
    {
        // Commit data to database.. should this be somehow in another thread or something?
    }
}
As you can see, I'm using a queue to create a buffer and then manually deciding when to commit. However, I have a strong feeling that this isn't a very performant solution to my problem. First, I'm unsure whether I'm doing the locking right. Second, I'm not sure whether this is even fully thread-safe (or thread-safe at all).
Can you please take a look for a moment and comment on what I should do differently? Or if there is a completely better way of doing this (using some kind of producer-consumer technique or something)?
Thanks and best wishes,
D.
There is no need to use locks or expensive concurrency-safe data structures. The data is all independent, so introducing locking and sharing will only hurt performance and scalability.
Parallel.For has an overload that lets you specify per-thread data. In this you can store a private queue and private database connection.
Also: Parallel.For internally partitions your range into smaller chunks. It's perfectly efficient to pass it a huge range, so nothing to change there.
Parallel.For(0, 10000000, () => new ThreadState(),
    (i, loopstate, threadstate) =>
    {
        ComplexDataSet data = GenerateData(i);
        threadstate.Add(data);
        return threadstate;
    }, threadstate => threadstate.Dispose());
sealed class ThreadState : IDisposable
{
    readonly IDisposable db;
    readonly Queue<ComplexDataSet> queue = new Queue<ComplexDataSet>();

    public ThreadState()
    {
        // initialize db with a private MongoDb connection.
    }

    public void Add(ComplexDataSet cds)
    {
        queue.Enqueue(cds);
        if (queue.Count == 100)
        {
            Commit();
        }
    }

    void Commit()
    {
        db.Write(queue);
        queue.Clear();
    }

    public void Dispose()
    {
        try
        {
            if (queue.Count > 0)
            {
                Commit();
            }
        }
        finally
        {
            db.Dispose();
        }
    }
}
Now, MongoDb currently doesn't support truly concurrent inserts -- it holds some expensive locks in the server, so parallel commits won't gain you much (if any) speed. They want to fix this in the future, so you might get a free speed-up one day.
If you need to limit the number of database connections held, a producer/consumer setup is a good alternative. You can use a BlockingCollection queue to do this efficiently without using any locks:
// Specify a maximum of 1000 items in the collection so that we don't
// run out of memory if we get data faster than we can commit it.
// Add() will wait if it is full.
BlockingCollection<ComplexDataSet> commits =
    new BlockingCollection<ComplexDataSet>(1000);

Task consumer = Task.Factory.StartNew(() =>
{
    // This is the consumer. It processes the
    // "commits" queue until it signals completion.
    while (!commits.IsCompleted)
    {
        ComplexDataSet cds;
        // Timeout of -1 will wait for an item or IsCompleted == true.
        if (commits.TryTake(out cds, -1))
        {
            // Got at least one item, write it.
            db.Write(cds);
            // Continue dequeuing until the queue is empty, where it will
            // timeout instantly and return false, or until we've dequeued
            // 100 items.
            for (int i = 1; i < 100 && commits.TryTake(out cds, 0); ++i)
            {
                db.Write(cds);
            }
            // Now that we're waiting for more items or have dequeued 100
            // of them, commit. More items can continue to be added to the
            // queue by other threads while this commit is processing.
            db.Commit();
        }
    }
}, TaskCreationOptions.LongRunning);

try
{
    // This is the producer.
    Parallel.For(0, 1000000, i =>
    {
        ComplexDataSet data = GenerateData(i);
        commits.Add(data);
    });
}
finally // put in a finally to ensure the task closes down.
{
    commits.CompleteAdding(); // signal IsAddingCompleted; IsCompleted becomes true once drained.
    consumer.Wait(); // wait for the task to finish committing all the items.
}
In your example you have 10,000,000 packages of work, each of which needs to be distributed to a thread. Assuming you don't have a really large number of CPU cores, this is not optimal. You also have to synchronize your threads on every buffer.SaveDataToBuffer call (by using locks). Additionally, you should be aware that the variable r isn't necessarily increased by one in chronological order (example: Thread1 executes r with 1,2,3 and Thread2 with 4,5,6; chronologically this would lead to roughly the following sequence of r being passed to SaveDataToBuffer: 1,4,2,5,3,6).
I would make the packages of work larger and then commit each package as a whole. This also has the benefit that you don't have to lock/synchronize too often.
Here's an example:
int total = 10000000;
int step = 1000;

Parallel.For(0, total / step, (r, state) =>
{
    int start = r * step;
    int end = start + step;

    ComplexDataSet[] result = new ComplexDataSet[step];
    for (int i = start; i < end; i++)
    {
        result[i - start] = GenerateData(i);
    }

    Commit(result);
});
In this example the whole work is split into 10,000 packages (which are executed in parallel), and every package generates 1,000 data items and commits them to the database.
With this solution, the Commit method might be a bottleneck if not wisely designed. It would be best to make it thread-safe without using any locks, which can be accomplished if you don't share objects between threads that need synchronization.
E.g. for a SQL Server backend, that would mean creating your own SQL connection in the context of every Commit() call:
private void Commit(ComplexDataSet[] data)
{
    using (var connection = new SqlConnection("connection string..."))
    {
        connection.Open();
        // insert your data here...
    }
}
Instead of increasing the complexity of the software, consider simplification. You can refactor the code into three parts:
Workers that enqueue [1]
This is concurrent GenerateData in Parallel.For that does some heavy computation and produces ComplexDataSet instances.
The actual queue [2]
A concurrent queue that stores the results from [1], that is, many ComplexDataSet instances. Here I assumed that one instance of ComplexDataSet is actually not really resource-consuming and fairly light. As long as the queue is concurrent, it will support parallel "inserts" and "deletes".
Workers that dequeue [3]
Code that takes one instance of ComplexDataSet from the processing queue [2] and puts it into a concurrent bag (or other storage). Once the bag has N items, you block, stop dequeuing, flush the contents of the bag into the database, and clear it. Finally, you unblock and resume dequeuing.
Here is some metacode (it still compiles, but needs improvements):
[1]
// [1] - Class is responsible for generating complex data sets and
// adding them to the processing queue
class EnqueueWorker
{
    //generate data and add to queue
    internal void ParrallelEnqueue(ConcurrentQueue<ComplexDataSet> resultQueue)
    {
        Parallel.For(1, 10000, (i) =>
        {
            ComplexDataSet cds = GenerateData(i);
            resultQueue.Enqueue(cds);
        });
    }

    //generate data
    ComplexDataSet GenerateData(int i)
    {
        return new ComplexDataSet();
    }
}
[3]
//[3] This guy takes sets from the processing queue and flushes results when
// N items have been generated
class DequeueWorker
{
    //buffer that holds processed dequeued data
    private static ConcurrentBag<ComplexDataSet> buffer;
    //lock to flush the data to the db once in a while
    private static object syncRoot = new object();

    //take item from processing queue and add it to internal buffer storage
    //once buffer is full - flush it to the database
    internal void ParrallelDequeue(ConcurrentQueue<ComplexDataSet> resultQueue)
    {
        buffer = new ConcurrentBag<ComplexDataSet>();
        int N = 100;

        Parallel.For(1, 10000, (i) =>
        {
            //try dequeue
            ComplexDataSet cds = null;
            var spinWait = new SpinWait();
            while (cds == null)
            {
                resultQueue.TryDequeue(out cds);
                spinWait.SpinOnce();
            }

            //add to buffer
            buffer.Add(cds);

            //flush to database if needed
            if (buffer.Count == N)
            {
                lock (syncRoot)
                {
                    IEnumerable<ComplexDataSet> data = buffer.ToArray();
                    // flush data to database
                    buffer = new ConcurrentBag<ComplexDataSet>();
                }
            }
        });
    }
}
[2] and usage
class ComplexDataSet { }

class Program
{
    //processing queue - [2]
    private static ConcurrentQueue<ComplexDataSet> processingQueue;

    static void Main(string[] args)
    {
        // create new processing queue - single instance for whole app
        processingQueue = new ConcurrentQueue<ComplexDataSet>();

        //enqueue worker
        Task enqueueTask = Task.Factory.StartNew(() =>
        {
            EnqueueWorker enqueueWorker = new EnqueueWorker();
            enqueueWorker.ParrallelEnqueue(processingQueue);
        });

        //dequeue worker
        Task dequeueTask = Task.Factory.StartNew(() =>
        {
            DequeueWorker dequeueWorker = new DequeueWorker();
            dequeueWorker.ParrallelDequeue(processingQueue);
        });

        //wait for both workers, otherwise Main exits before they finish
        Task.WaitAll(enqueueTask, dequeueTask);
    }
}
}