StackExchange redis client very slow compared to benchmark tests - c#

I'm implementing a Redis caching layer using the Stackexchange Redis client and the performance right now is bordering on unusable.
I have a local environment where the web application and the redis server are running on the same machine. I ran the Redis benchmark test against my Redis server and the results were actually really good (I'm just including set and get operations in my write up):
C:\Program Files\Redis>redis-benchmark -n 100000
====== PING_INLINE ======
100000 requests completed in 0.88 seconds
50 parallel clients
3 bytes payload
keep alive: 1
====== SET ======
100000 requests completed in 0.89 seconds
50 parallel clients
3 bytes payload
keep alive: 1
99.70% <= 1 milliseconds
99.90% <= 2 milliseconds
100.00% <= 3 milliseconds
111982.08 requests per second
====== GET ======
100000 requests completed in 0.81 seconds
50 parallel clients
3 bytes payload
keep alive: 1
99.87% <= 1 milliseconds
99.98% <= 2 milliseconds
100.00% <= 2 milliseconds
124069.48 requests per second
So according to the benchmarks I am looking at over 100,000 sets and 100,000 gets, per second. I wrote a unit test to do 300,000 set/gets:
private string redisCacheConn = "localhost:6379,allowAdmin=true,abortConnect=false,ssl=false";
[Fact]
public void PerfTestWriteShortString()
{
CacheManager cm = new CacheManager(redisCacheConn);
string svalue = "t";
string skey = "testtesttest";
for (int i = 0; i < 300000; i++)
{
cm.SaveCache(skey + i, svalue);
string valRead = cm.ObtainItemFromCacheString(skey + i);
}
}
This uses the following class to perform the Redis operations via the Stackexchange client:
using StackExchange.Redis;
namespace Caching
{
public class CacheManager:ICacheManager, ICacheManagerReports
{
private static string cs;
private static ConfigurationOptions options;
private int pageSize = 5000;
public ICacheSerializer serializer { get; set; }
public CacheManager(string connectionString)
{
serializer = new SerializeJSON();
cs = connectionString;
options = ConfigurationOptions.Parse(connectionString);
options.SyncTimeout = 60000;
}
private static readonly Lazy<ConnectionMultiplexer> lazyConnection = new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(options));
private static ConnectionMultiplexer Connection => lazyConnection.Value;
private static IDatabase cache => Connection.GetDatabase();
public string ObtainItemFromCacheString(string cacheId)
{
return cache.StringGet(cacheId);
}
public void SaveCache<T>(string cacheId, T cacheEntry, TimeSpan? expiry = null)
{
if (IsValueType<T>())
{
cache.StringSet(cacheId, cacheEntry.ToString(), expiry);
}
else
{
cache.StringSet(cacheId, serializer.SerializeObject(cacheEntry), expiry);
}
}
public bool IsValueType<T>()
{
return typeof(T).IsValueType || typeof(T) == typeof(string);
}
}
}
My JSON serializer is just using Newtonsoft.JSON:
using System.Collections.Generic;
using Newtonsoft.Json;
namespace Caching
{
public class SerializeJSON:ICacheSerializer
{
public string SerializeObject<T>(T cacheEntry)
{
return JsonConvert.SerializeObject(cacheEntry, Formatting.None,
new JsonSerializerSettings()
{
ReferenceLoopHandling = ReferenceLoopHandling.Ignore
});
}
public T DeserializeObject<T>(string data)
{
return JsonConvert.DeserializeObject<T>(data, new JsonSerializerSettings()
{
ReferenceLoopHandling = ReferenceLoopHandling.Ignore
});
}
}
}
My test times are around 21 seconds (for 300,000 sets and 300,000 gets). This gives me around 28,500 operations per second (at least 3 times slower than I would expect using the benchmarks). The application I am converting to use Redis is pretty chatty and certain heavy requests can approximate 200,000 total operations against Redis. Obviously I wasn't expecting anything like the same times I was getting when using the system runtime cache, but the delays after this change are significant. Am I doing something wrong with my implementation and does anyone know why my benchmarked figures are so much faster than my Stackechange test figures?
Thanks,
Paul

My results from the code below:
Connecting to server...
Connected
PING (sync per op)
1709ms for 1000000 ops on 50 threads took 1.709594 seconds
585137 ops/s
SET (sync per op)
759ms for 500000 ops on 50 threads took 0.7592914 seconds
658761 ops/s
GET (sync per op)
780ms for 500000 ops on 50 threads took 0.7806102 seconds
641025 ops/s
PING (pipelined per thread)
3751ms for 1000000 ops on 50 threads took 3.7510956 seconds
266595 ops/s
SET (pipelined per thread)
1781ms for 500000 ops on 50 threads took 1.7819831 seconds
280741 ops/s
GET (pipelined per thread)
1977ms for 500000 ops on 50 threads took 1.9772623 seconds
252908 ops/s
===
Server configuration: make sure persistence is disabled, etc
The first thing you should do in a benchmark is: benchmark one thing. At the moment you're including a lot of serialization overhead, which won't help get a clear picture. Ideally, for a like-for-like benchmark, you should be using a 3-byte fixed payload, because:
3 bytes payload
Next, you'd need to look at parallelism:
50 parallel clients
It isn't clear whether your test is parallel, but if it isn't we should absolutely expect to see less raw throughput. Conveniently, SE.Redis is designed to be easy to parallelize: you can just spin up multiple threads talking to the same connection (this actually also has the advantage of avoiding packet fragmentation, as you can end up with multiple messages per packet, where-as a single-thread sync approach is guaranteed to use at most one message per packet).
Finally, we need to understand what the listed benchmark is doing. Is it doing:
(send, receive) x n
or is it doing
send x n, receive separately until all n are received
? Both options are possible. Your sync API usage is the first one, but the second test is equally well-defined, and for all I know: that's what it is measuring. There are two ways of simulating this second setup:
send the first (n-1) messages with the "fire and forget" flag, so you only actually wait for the last one
use the *Async API for all messages, and only Wait() or await the last Task
Here's a benchmark that I used in the above, that shows both "sync per op" (via the sync API) and "pipeline per thread" (using the *Async API and just waiting for the last task per thread), both using 50 threads:
using StackExchange.Redis;
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
static class P
{
static void Main()
{
Console.WriteLine("Connecting to server...");
using (var muxer = ConnectionMultiplexer.Connect("127.0.0.1"))
{
Console.WriteLine("Connected");
var db = muxer.GetDatabase();
RedisKey key = "some key";
byte[] payload = new byte[3];
new Random(12345).NextBytes(payload);
RedisValue value = payload;
DoWork("PING (sync per op)", db, 1000000, 50, x => { x.Ping(); return null; });
DoWork("SET (sync per op)", db, 500000, 50, x => { x.StringSet(key, value); return null; });
DoWork("GET (sync per op)", db, 500000, 50, x => { x.StringGet(key); return null; });
DoWork("PING (pipelined per thread)", db, 1000000, 50, x => x.PingAsync());
DoWork("SET (pipelined per thread)", db, 500000, 50, x => x.StringSetAsync(key, value));
DoWork("GET (pipelined per thread)", db, 500000, 50, x => x.StringGetAsync(key));
}
}
static void DoWork(string action, IDatabase db, int count, int threads, Func<IDatabase, Task> op)
{
object startup = new object(), shutdown = new object();
int activeThreads = 0, outstandingOps = count;
Stopwatch sw = default(Stopwatch);
var threadStart = new ThreadStart(() =>
{
lock(startup)
{
if(++activeThreads == threads)
{
sw = Stopwatch.StartNew();
Monitor.PulseAll(startup);
}
else
{
Monitor.Wait(startup);
}
}
Task final = null;
while (Interlocked.Decrement(ref outstandingOps) >= 0)
{
final = op(db);
}
if (final != null) final.Wait();
lock(shutdown)
{
if (--activeThreads == 0)
{
sw.Stop();
Monitor.PulseAll(shutdown);
}
}
});
lock (shutdown)
{
for (int i = 0; i < threads; i++)
{
new Thread(threadStart).Start();
}
Monitor.Wait(shutdown);
Console.WriteLine($#"{action}
{sw.ElapsedMilliseconds}ms for {count} ops on {threads} threads took {sw.Elapsed.TotalSeconds} seconds
{(count * 1000) / sw.ElapsedMilliseconds} ops/s");
}
}
}

You are fetching data in synchronous way (50 clients in parallel but each client's requests are made synchronously instead of asynchronously)
One option would be to use the async/await methods (StackExchange.Redis support that).
If you need to get multiple keys at once (for example to build a daily graph of visitors to your website assuming you save visitors counter per day keys) then you should try fetching data from redis in asynchronous manner using redis pipelining, this should give you much better performance.

StackExchange redis client old versions have performance issues.
Upgrade to the newest version. Read more here:
https://www.gitmemory.com/issue/mgravell/Pipelines.Sockets.Unofficial/28/479932064
and in this article:
https://blog.marcgravell.com/2019/02/fun-with-spiral-of-death.html
this is the issue in the repo:
https://github.com/StackExchange/StackExchange.Redis/issues/1003

Related

Redis Get String appears to be slow

I have written some test code to retrieve 1000 strings from my Redis cache. Obviously it is getting the same string in this test but it was written to see how long it would take to get these 1000 items.
The test completes in 23 seconds, so that is only around 43 strings per second that seems quite slow.
I am running this locally against the Redis instance that is in Azure, so I’m assuming there will be some latency. Have I missed out something or is there a way to reduce the time to get these 1000 items?
In my production environment, there could be several thousand items that need to be retrieved.
class Program
{
static async Task Main(string[] args)
{
var connectionString = #"testserver-rc.redis.cache.windows.net:6380,password=password,ssl=True,abortConnect=False,defaultDatabase=2";
var redisClient = new StackExchangeRedisCacheClient(new NewtonsoftSerializer(), connectionString, 2);
await TestGets(redisClient);
Console.ReadLine();
}
private static async Task TestGets(StackExchangeRedisCacheClient redisClient)
{
Console.WriteLine("Running...");
var sw = new Stopwatch();
sw.Start();
for (var i = 0; i < 1000; i++)
{
await redisClient.Database.StringGetAsync("test_T-0004");
}
Console.WriteLine($"{sw.Elapsed.Seconds} seconds");
}
}
43 per second? That sounds pretty fast. That means including overhead and latency you are spending 23ms per query.
I think you want to parallelize the query.
Try replacing your query line with
await Task.WhenAll(Enumerable.Range(0, 1000).Select(I => redisClient.Database.StringGetAsync("test_T-0004"));
The problem is that you are latency bound. You are waiting for each request to complete before firing the next one off.

How to know how many threads I can spin up?

I'm using this to spin up threads that either insert or delete document from a DocumentDB collection.
It works, but I am not exactly sure how I'm supposed to know how many threads I can spin.
Sometimes, it works with maxThreads at 7, above that I'll quickly get the Request rate is large error. But sometimes, even at 3 threads I'll get the same error.
So this is obviously not very scientific.
I guess I would have to monitor how many RUs I've used after each calls and perhaps throttle the logic for a couple of miliseconds.
Any ideas ?
public class MultiThreadOperations<T> where T : IDocumentModel
{
List<T> Documents = new List<T>();
CollectionDB<T> Collection;
OperationType OperationType;
List<Task> AllTasks = new List<Task>();
public MultiThreadOperations(List<T> documents, CollectionDB<T> Collection, OperationType opType)
{
this.Collection = Collection;
Documents = documents;
OperationType = opType;
}
public async Task Start()
{
var maxThreads = 2;
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxThreads))
{
foreach (T doc in Documents)
{
concurrencySemaphore.Wait();
var t = Task.Run(async () =>
{
try
{
switch (OperationType)
{
case OperationType.Create:
await InsertDocument(doc);
break;
case OperationType.Delete:
await DeleteDocument(doc);
break;
}
}
finally
{
concurrencySemaphore.Release();
}
});
AllTasks.Add(t);
}
await Task.WhenAll(AllTasks.ToArray());
}
}
private async Task InsertDocument(T item)
{
await Collection.CreateAsync(item);
}
private async Task DeleteDocument(T item)
{
await Collection.DeleteFromId(item.Id);
}
}
It depends on the following factors:
Let's say the number of request units per single create/delete operation (RUs) is X RUs
The latency/duration per request is N. Within the same region, this is ~5ms, but across the network, it could be RTT (round trip time) + 5ms.
Then each thread can perform X * 1/N RUs per second
If your collection is provisioned with T RU/s, then you need the number of threads = T / (X * 1/N)
For example, within the same Azure region, if you had 10,000 RU/s, say each create or delete takes 5 RUs, and the network latency is 5ms. This means each thread can perform 1000/5 = 200 writes/second = 200 * 5 RU/s = 1000 RU/s. Therefore you need 10 threads to reach 10,000 RU/s.
Let's say, you're running the same test from a VM in Europe accessing an account in East US. The network lag is ~100ms. This means that each thread can perform ~10 requests/sec = 50 RU/s. Therefore, you need 200 threads to reach the same 10,000 RU/s.

Limiting number of API requests started per second using Parallel.ForEach

I am working on improving some of my code to increase efficiency. In the original code I was limiting the number of threads allowed to be 5, and if I had already 5 active threads I would wait until one finished before starting another one. Now I want to modify this code to allow any number of threads, but I want to be able to make sure that only 5 threads get started every second. For example:
Second 0 - 5 new threads
Second 1 - 5 new threads
Second 2 - 5 new threads ...
Original Code (cleanseDictionary contains usually thousands of items):
ConcurrentDictionary<long, APIResponse> cleanseDictionary = new ConcurrentDictionary<long, APIResponse>();
ConcurrentBag<int> itemsinsec = new ConcurrentBag<int>();
ConcurrentDictionary<long, string> resourceDictionary = new ConcurrentDictionary<long, string>();
DateTime start = DateTime.Now;
Parallel.ForEach(resourceDictionary, new ParallelOptions { MaxDegreeOfParallelism = 5 }, row =>
{
lock (itemsinsec)
{
ThrottleAPIRequests(itemsinsec, start);
itemsinsec.Add(1);
}
cleanseDictionary.TryAdd(row.Key, _helper.MakeAPIRequest(string.Format("/endpoint?{0}", row.Value)));
});
private static void ThrottleAPIRequests(ConcurrentBag<int> itemsinsec, DateTime start)
{
if ((start - DateTime.Now).Milliseconds < 10001 && itemsinsec.Count > 4)
{
System.Threading.Thread.Sleep(1000 - (start - DateTime.Now).Milliseconds);
start = DateTime.Now;
itemsinsec = new ConcurrentBag<int>();
}
}
My first thought was increase the MaxDegreeofParallelism to something much higher and then have a helper method that will limit only 5 threads in a second, but I am not sure if that is the best way to do it and if it is, I would probably need a lock around that step?
Thanks in advance!
EDIT
I am actually looking for a way to throttle the API Requests rather than the actual threads. I was thinking they were one in the same.
Edit 2: My requirements are to send over 5 API requests every second
"Parallel.ForEach" from the MS website
may run in parallel
If you want any degree of fine control over how the threads are managed, this is not the way.
How about creating your own helper class where you can queue jobs with a group id, allows you to wait for all jobs of group id X to complete, and it spawns extra threads as and when required?
For me the best solution is:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
namespace SomeNamespace
{
public class RequestLimiter : IRequestLimiter
{
private readonly ConcurrentQueue<DateTime> _requestTimes;
private readonly TimeSpan _timeSpan;
private readonly object _locker = new object();
public RequestLimiter()
{
_timeSpan = TimeSpan.FromSeconds(1);
_requestTimes = new ConcurrentQueue<DateTime>();
}
public TResult Run<TResult>(int requestsOnSecond, Func<TResult> function)
{
WaitUntilRequestCanBeMade(requestsOnSecond).Wait();
return function();
}
private Task WaitUntilRequestCanBeMade(int requestsOnSecond)
{
return Task.Factory.StartNew(() =>
{
while (!TryEnqueueRequest(requestsOnSecond).Result) ;
});
}
private Task SynchronizeQueue()
{
return Task.Factory.StartNew(() =>
{
_requestTimes.TryPeek(out var first);
while (_requestTimes.Count > 0 && (first.Add(_timeSpan) < DateTime.UtcNow))
_requestTimes.TryDequeue(out _);
});
}
private Task<bool> TryEnqueueRequest(int requestsOnSecond)
{
lock (_locker)
{
SynchronizeQueue().Wait();
if (_requestTimes.Count < requestsOnSecond)
{
_requestTimes.Enqueue(DateTime.UtcNow);
return Task.FromResult(true);
}
return Task.FromResult(false);
}
}
}
}
I want to be able to send over 5 API request every second
That's really easy:
while (true) {
await Task.Delay(TimeSpan.FromSeconds(1));
await Task.WhenAll(Enumerable.Range(0, 5).Select(_ => RunRequestAsync()));
}
Maybe not the best approach since there will be a burst of requests. This is not continuous.
Also, there is timing skew. One iteration takes more than 1 second. This can be solved with a few lines of time logic.

Handle multiple threads, one out one in, in a timed loop

I need to process a large number of files overnight, with a defined start and end time to avoid disrupting users. I've been investigating but there are so many ways of handling threading now that I'm not sure which way to go. The files come into an Exchange inbox as attachments.
My current attempt, based on some examples from here and a bit of experimentation, is:
while (DateTime.Now < dtEndTime.Value)
{
var finished = new CountdownEvent(1);
for (int i = 0; i < numThreads; i++)
{
object state = offset;
finished.AddCount();
ThreadPool.QueueUserWorkItem(delegate
{
try
{
StartProcessing(state);
}
finally
{
finished.Signal();
}
});
offset += numberOfFilesPerPoll;
}
finished.Signal();
finished.Wait();
}
It's running in a winforms app at the moment for ease, but the core processing is in a dll so I can spawn the class I need from a windows service, from a console running under a scheduler, however is easiest. I do have a Windows Service set up with a Timer object that kicks off the processing at a time set in the config file.
So my question is - in the above code, I initialise a bunch of threads (currently 10), then wait for them all to process. My ideal would be a static number of threads, where as one finishes I fire off another, and then when I get to the end time I just wait for all threads to complete.
The reason for this is that the files I'm processing are variable sizes - some might take seconds to process and some might take hours, so I don't want the whole application to wait while one thread completes if I can have it ticking along in the background.
(edit)As it stands, each thread instantiates a class and passes it an offset. The class then gets the next x emails from the inbox, starting at the offset (using the Exchange Web Services paging functionality). As each file is processed, it's moved to a separate folder. From some of the replies so far, I'm wondering if actually I should grab the e-mails in the outer loop, and spawn threads as needed.
To cloud the issue, I currently have a backlog of e-mails that I'm trying to process through. Once the backlog has been cleared, it's likely that the nightly run will have a significantly lower load.
On average there are around 1000 files to process each night.
Update
I've rewritten large chunks of my code so that I can use the Parallel.Foreach and I've come up against an issue with thread safety. The calling code now looks like this:
public bool StartProcessing()
{
FindItemsResults<Item> emails = GetEmails();
var source = new CancellationTokenSource(TimeSpan.FromHours(10));
// Process files in parallel, with a maximum thread count.
var opts = new ParallelOptions { MaxDegreeOfParallelism = 8, CancellationToken = source.Token };
try
{
Parallel.ForEach(emails, opts, processAttachment);
}
catch (OperationCanceledException)
{
Console.WriteLine("Loop was cancelled.");
}
catch (Exception err)
{
WriteToLogFile(err.Message + "\r\n");
WriteToLogFile(err.StackTrace + "r\n");
}
return true;
}
So far so good (excuse temporary error handling). I have a new issue now with the fact that the properties of the "Item" object, which is an email, not being threadsafe. So for example when I start processing an e-mail, I move it to a "processing" folder so that another process can't grab it - but it turns out that several of the threads might be trying to process the same e-mail at a time. How do I guarantee that this doesn't happen? I know I need to add a lock, can I add this in the ForEach or should it be in the processAttachments method?
Use the TPL:
Parallel.ForEach( EnumerateFiles(),
new ParallelOptions { MaxDegreeOfParallelism = 10 },
file => ProcessFile( file ) );
Make EnumerateFiles stop enumerating when your end time is reached, trivially like this:
IEnumerable<string> EnumerateFiles()
{
foreach (var file in Directory.EnumerateFiles( "*.txt" ))
if (DateTime.Now < _endTime)
yield return file;
else
yield break;
}
You can use a combination of Parallel.ForEach() along with a cancellation token source which will cancel the operation after a set time:
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
static class Program
{
static Random rng = new Random();
static void Main()
{
// Simulate having a list of files.
var fileList = Enumerable.Range(1, 100000).Select(i => i.ToString());
// For demo purposes, cancel after a few seconds.
var source = new CancellationTokenSource(TimeSpan.FromSeconds(10));
// Process files in parallel, with a maximum thread count.
var opts = new ParallelOptions {MaxDegreeOfParallelism = 8, CancellationToken = source .Token};
try
{
Parallel.ForEach(fileList, opts, processFile);
}
catch (OperationCanceledException)
{
Console.WriteLine("Loop was cancelled.");
}
}
static void processFile(string file)
{
Console.WriteLine("Processing file: " + file);
// Simulate taking a varying amount of time per file.
int delay;
lock (rng)
{
delay = rng.Next(200, 2000);
}
Thread.Sleep(delay);
Console.WriteLine("Processed file: " + file);
}
}
}
As an alternative to using a cancellation token, you can write a method that returns IEnumerable<string> which returns the list of filenames, and stop returning them when time is up, for example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
static class Program
{
static Random rng = new Random();
static void Main()
{
// Process files in parallel, with a maximum thread count.
var opts = new ParallelOptions {MaxDegreeOfParallelism = 8};
Parallel.ForEach(fileList(), opts, processFile);
}
static IEnumerable<string> fileList()
{
// Simulate having a list of files.
var fileList = Enumerable.Range(1, 100000).Select(x => x.ToString()).ToArray();
// Simulate finishing after a few seconds.
DateTime endTime = DateTime.Now + TimeSpan.FromSeconds(10);
int i = 0;
while (DateTime.Now <= endTime)
yield return fileList[i++];
}
static void processFile(string file)
{
Console.WriteLine("Processing file: " + file);
// Simulate taking a varying amount of time per file.
int delay;
lock (rng)
{
delay = rng.Next(200, 2000);
}
Thread.Sleep(delay);
Console.WriteLine("Processed file: " + file);
}
}
}
Note that you don't need the try/catch with this approach.
You should consider using Microsoft's Reactive Framework. It lets you use LINQ queries to process multithreaded asynchronous processing in a very simple way.
Something like this:
var query =
from file in filesToProcess.ToObservable()
where DateTime.Now < stopTime
from result in Observable.Start(() => StartProcessing(file))
select new { file, result };
var subscription =
query.Subscribe(x =>
{
/* handle result */
});
Truly, that's all the code you need if StartProcessing is already defined.
Just NuGet "Rx-Main".
Oh, and to stop processing at any time just call subscription.Dispose().
This was a truly fascinating task, and it took me a while to get the code to a level that I was happy with it.
I ended up with a combination of the above.
The first thing worth noting is that I added the following lines to my web service call, as the operation timeout I was experiencing, and which I thought was because I'd exceeded some limit set on the endpoint, was actually due to a limit set by microsoft way back in .Net 2.0:
ServicePointManager.DefaultConnectionLimit = int.MaxValue;
ServicePointManager.Expect100Continue = false;
See here for more information:
What to set ServicePointManager.DefaultConnectionLimit to
As soon as I added those lines of code, my processing increased from 10/minute to around 100/minute.
But I still wasn't happy with the looping, and partitioning etc. My service moved onto a physical server to minimise CPU contention, and I wanted to allow the operating system to dictate how fast it ran, rather than my code throttling it.
After some research, this is what I ended up with - arguably not the most elegant code I've written, but it's extremely fast and reliable.
List<XElement> elements = new List<XElement>();
while (XMLDoc.ReadToFollowing("ElementName"))
{
using (XmlReader r = XMLDoc.ReadSubtree())
{
r.Read();
XElement node = XElement.Load(r);
//do some processing of the node here...
elements.Add(node);
}
}
//And now pass the list of elements through PLinQ to the actual web service call, allowing the OS/framework to handle the parallelism
int failCount=0; //the method call below sets this per request; we log and continue
failCount = elements.AsParallel()
.Sum(element => IntegrationClass.DoRequest(element.ToString()));
It ended up fiendishly simple and lightning fast.
I hope this helps someone else trying to do the same thing!

Thread-safe buffer of data to make batch inserts of controlled size

I have a simulation that generates data which must be saved to database.
ParallelLoopResult res = Parallel.For(0, 1000000, options, (r, state) =>
{
ComplexDataSet cds = GenerateData(r);
SaveDataToDatabase(cds);
});
The simulation generates a whole lot of data, so it wouldn't be practical to first generate it and then save it to database (up to 1 GB of data) and it also wouldn't make sense to save it to database one by one (too small transanctions to be practical). I want to insert them to database as a batch insert of controlled size (say 100 with one commit).
However, I think my knowledge of parallel computing is less that theoretical. I came up with this (which as you can see is very flawed):
DataBuffer buffer = new DataBuffer(...);
ParallelLoopResult res = Parallel.For(0, 10000000, options, (r, state) =>
{
ComplexDataSet cds = GenerateData(r);
buffer.SaveDataToBuffer(cds, i == r - 1);
});
public class DataBuffer
{
int count = 0;
int limit = 100
object _locker = new object();
ConcurrentQueue<ConcurrentBag<ComplexDataSet>> ComplexDataBagQueue{ get; set; }
public void SaveDataToBuffer(ComplexDataSet data, bool isfinalcycle)
{
lock (_locker)
{
if(count >= limit)
{
ConcurrentBag<ComplexDataSet> dequeueRef;
if(ComplexDataBagQueue.TryDequeue(out dequeueRef))
{
Commit(dequeueRef);
}
_lastItemRef = new ConcurrentBag<ComplexDataSet>{data};
ComplexDataSetsQueue.Enqueue(_lastItemRef);
count = 1;
}
else
{
// First time
if(_lastItemRef == null)
{
_lastItemRef = new ConcurrentBag<ComplexDataSet>{data};
ComplexDataSetsQueue.Enqueue(_lastItemRef);
count = 1;
}
// If buffer isn't full
else
{
_lastItemRef.Add(data);
count++;
}
}
if(isfinalcycle)
{
// Commit everything that hasn't been committed yet
ConcurrentBag<ComplexDataSet> dequeueRef;
while (ComplexDataSetsQueue.TryDequeue(out dequeueRef))
{
Commit(dequeueRef);
}
}
}
}
public void Commit(ConcurrentBag<ComplexDataSet> data)
{
// Commit data to database..should this be somehow in another thread or something ?
}
}
As you can see, I'm using queue to create a buffer and then manually decide when to commit. However I have a strong feeling that this isn't very performing solution to my problem. First, I'm unsure whether I'm doing locking right. Second, I'm not sure even if this is fully thread-safe (or at all).
Can you please take a look for a moment and comment what should I do differently ? Or if there is a complitely better way of doing this (using somekind of Producer-Consumer technique or something) ?
Thanks and best wishes,
D.
There is no need to use locks or expensive concurrency-safe data structures. The data is all independent, so introducing locking and sharing will only hurt performance and scalability.
Parallel.For has an overload that lets you specify per-thread data. In this you can store a private queue and private database connection.
Also: Parallel.For internally partitions your range into smaller chunks. It's perfectly efficient to pass it a huge range, so nothing to change there.
Parallel.For(0, 10000000, () => new ThreadState(),
(i, loopstate, threadstate) =>
{
ComplexDataSet data = GenerateData(i);
threadstate.Add(data);
return threadstate;
}, threadstate => threadstate.Dispose());
sealed class ThreadState : IDisposable
{
readonly IDisposable db;
readonly Queue<ComplexDataSet> queue = new Queue<ComplexDataSet>();
public ThreadState()
{
// initialize db with a private MongoDb connection.
}
public void Add(ComplexDataSet cds)
{
queue.Enqueue(cds);
if(queue.Count == 100)
{
Commit();
}
}
void Commit()
{
db.Write(queue);
queue.Clear();
}
public void Dispose()
{
try
{
if(queue.Count > 0)
{
Commit();
}
}
finally
{
db.Dispose();
}
}
}
Now, MongoDb currently doesn't support truly concurrent inserts -- it holds some expensive locks in the server, so parallel commits won't gain you much (if any) speed. They want to fix this in the future, so you might get a free speed-up one day.
If you need to limit the number of database connections held, a producer/consumer setup is a good alternative. You can use a BlockingCollection queue to do this efficiently without using any locks:
// Specify a maximum of 1000 items in the collection so that we don't
// run out of memory if we get data faster than we can commit it.
// Add() will wait if it is full.
BlockingCollection<ComplexDataSet> commits =
new BlockingCollection<ComplexDataSet>(1000);
Task consumer = Task.Factory.StartNew(() =>
{
// This is the consumer. It processes the
// "commits" queue until it signals completion.
while(!commits.IsCompleted)
{
ComplexDataSet cds;
// Timeout of -1 will wait for an item or IsCompleted == true.
if(commits.TryTake(out cds, -1))
{
// Got at least one item, write it.
db.Write(cds);
// Continue dequeuing until the queue is empty, where it will
// timeout instantly and return false, or until we've dequeued
// 100 items.
for(int i = 1; i < 100 && commits.TryTake(out cds, 0); ++i)
{
db.Write(cds);
}
// Now that we're waiting for more items or have dequeued 100
// of them, commit. More can be continue to be added to the
// queue by other threads while this commit is processing.
db.Commit();
}
}
}, TaskCreationOptions.LongRunning);
try
{
// This is the producer.
Parallel.For(0, 1000000, i =>
{
ComplexDataSet data = GenerateData(i);
commits.Add(data);
});
}
finally // put in a finally to ensure the task closes down.
{
commits.CompleteAdding(); // set commits.IsFinished = true.
consumer.Wait(); // wait for task to finish committing all the items.
}
In your example you have 10 000 000 packages of work. Each of this needs to be distributed to a thread.
Assuming you don't have a really large number of cpu cores this is not optimal. You also have to synchronize your threads on every buffer.SaveDataToBuffer call (by using locks). Additionally you should be aware that the variable r isn't necessarly increased by one in a chronology view (example: Thread1 executes r with 1,2,3 and Thread2 with 4,5,6. Chronological this would lead to the following sequence of r passed to SaveDataToBuffer 1,4,2,5,3,6 (approximately)).
I would make the packages of work larger and then commit each package at once. This has also the benefit that you don't have to lock/synchronize all to often.
Here's an example:
int total = 10000000;
int step = 1000;
Parallel.For(0, total / step, (r, state) =>
{
int start = r * start;
int end = start + step;
ComplexDataSet[] result = new ComplexDataSet[step];
for (int i = start; i < end; i++)
{
result[i - start] = GenerateData(i);
}
Commit(result);
});
In this example the whole work is split into 10 000 packages (which are executed in parallel) and every package generates 1000 data items and commits them to the database.
With this solution the Commit method might be a bottleneck, if not wisely designed. Best would be to make it thread safe without using any locks. This can be accomplished, if you don't use common objects between threads which need synchronization.
E.g. for a sql server backend that would mean creating an own sql connection in the context of every Commit() call:
private void Commit(ComplexDataSet[] data)
{
using (var connection = new SqlConnection("connection string..."))
{
connection.Open();
// insert your data here...
}
}
Instead of increasing complexity of software, rather consider simplification. You can refactor the code into three parts:
Workers that enqueue
This is concurrent GenerateData in Parallel.For that does some heavy computation and produce ComplexDataSet.
Actual queue
A concurrent queue that stores the results from [1] - so many ComplexDataSet. Here I assumed that one instance of ComplexDataSet is actually not really resource consuming and fairly light. As long as the queue is concurrent it will support parallel "inserts" and "deletes".
Workers that dequeue
Code that takes one instance of the ComplexDataSet from processing queue [2] and puts it into the concurrent bag (or other storage). Once the bag has N number of items you block, stop dequeueing, flush the content of the bag into the database and clear it. Finally, you unblock and resume dequeueing.
Here is some metacode (it still compiles, but needs improvements)
[1]
// [1] - Class is responsible for generating complex data sets and
// adding them to processing queue
class EnqueueWorker
{
//generate data and add to queue
internal void ParrallelEnqueue(ConcurrentQueue<ComplexDataSet> resultQueue)
{
Parallel.For(1, 10000, (i) =>
{
ComplexDataSet cds = GenerateData(i);
resultQueue.Enqueue(cds);
});
}
//generate data
ComplexDataSet GenerateData(int i)
{
return new ComplexDataSet();
}
}
[3]
//[3] This guy takes sets from the processing queue and flush results when
// N items have been generated
class DequeueWorker
{
//buffer that holds processed dequeued data
private static ConcurrentBag<ComplexDataSet> buffer;
//lock to flush the data to the db once in a while
private static object syncRoot = new object();
//take item from processing queue and add it to internal buffer storage
//once buffer is full - flush it to the database
internal void ParrallelDequeue(ConcurrentQueue<ComplexDataSet> resultQueue)
{
buffer = new ConcurrentBag<ComplexDataSet>();
int N = 100;
Parallel.For(1, 10000, (i) =>
{
//try dequeue
ComplexDataSet cds = null;
var spinWait = new SpinWait();
while (cds == null)
{
resultQueue.TryDequeue(out cds);
spinWait.SpinOnce();
}
//add to buffer
buffer.Add(cds);
//flush to database if needed
if (buffer.Count == N)
{
lock (syncRoot)
{
IEnumerable<ComplexDataSet> data = buffer.ToArray();
// flush data to database
buffer = new ConcurrentBag<ComplexDataSet>();
}
}
});
}
}
[2] and usage
class ComplexDataSet { }
class Program
{
//processing queueu - [2]
private static ConcurrentQueue<ComplexDataSet> processingQueue;
static void Main(string[] args)
{
// create new processing queue - single instance for whole app
processingQueue = new ConcurrentQueue<ComplexDataSet>();
//enqueue worker
Task enqueueTask = Task.Factory.StartNew(() =>
{
EnqueueWorker enqueueWorker = new EnqueueWorker();
enqueueWorker.ParrallelEnqueue(processingQueue);
});
//dequeue worker
Task dequeueTask = Task.Factory.StartNew(() =>
{
DequeueWorker dequeueWorker = new DequeueWorker();
dequeueWorker.ParrallelDequeue(processingQueue);
});
}
}

Categories