I am downloading some JSON periodically, say every 10 seconds... When the data arrives, an event is fired. The event handler simply adds the JSON to a BlockingCollection<string> (to be processed).
I'm trying to process the JSON as fast as possible (as soon as it arrives...):
public class Engine
{
private BlockingCollection<string> Queue = new BlockingCollection<string>();
private DataDownloader dataDownloader;
public void Init(string url, int interval)
{
dataDownloader = new DataDownloader(url, interval);
dataDownloader.StartCollecting();
dataDownloader.DataReceivedEvent += DataArrived;
//Kick off a new task to process the incoming JSON
Task.Factory.StartNew(Process, TaskCreationOptions.LongRunning);
}
/// <summary>
/// Processes the JSON in parallel
/// </summary>
private void Process()
{
Parallel.ForEach(Queue.GetConsumingEnumerable(), ProcessJson);
}
/// <summary>
/// Deserializes JSON and adds result to database
/// </summary>
/// <param name="json"></param>
private void ProcessJson(string json)
{
using (var db = new MyDataContext())
{
var items = Extensions.DeserializeData(json);
foreach (var item in items)
{
db.Items.Add(item);
db.SaveChanges();
}
}
}
private void DataArrived(object sender, string json)
{
Queue.Add(json);
Console.WriteLine("Queue length: " + Queue.Count);
}
}
When I run the program, it works and data gets added to the database, but if I watch the output from Console.WriteLine("Queue length: " + Queue.Count);, I get something like this:
1
1
1
1
1
1
1
1
2
3
4
5
6
7
...
I've tried modifying my Process to look like this:
/// <summary>
/// Processes the JSON in parallel
/// </summary>
private void Process()
{
foreach (var json in Queue.GetConsumingEnumerable())
{
ProcessJson(json);
}
}
I then start multiple Task.Factory.StartNew(Process, TaskCreationOptions.LongRunning); tasks, but I get the same problem...
Does anyone have any idea of what is going wrong here?
The queue initially fills up before processing starts, probably because the Entity Framework model has to be loaded and a database connection established; that takes a while.
Then GetConsumingEnumerable() catches up with the download process and depletes the queue while downloading continues. The collection is empty, MoveNext() returns false, Parallel.ForEach() exits, and Process() finishes.
From then on you see the queue filling up again, because nothing consumes it anymore.
You need to keep trying to read from the BlockingCollection until the download process finishes.
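A minimal sketch of that idea (downloadFinished is a hypothetical flag that your download code would set when it is done; it is not part of the original code):
private volatile bool downloadFinished; // hypothetical: set to true when the downloader stops
private void Process()
{
// Keep trying to read until the downloader signals it is finished,
// instead of letting the consumer stop when the queue happens to be empty.
while (!downloadFinished)
{
string json;
// Wait up to one second for the next item before checking the flag again.
if (Queue.TryTake(out json, TimeSpan.FromSeconds(1)))
{
ProcessJson(json);
}
}
}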
Related
I'm trying to replace the RateGate logic with a Polly policy. However, there is no status code or anything, and I'm not sure if it's possible to achieve the same idea with Polly.
Usage
// Binance allows 5 messages per second, but we still get rate limited if we send a lot of messages at that rate
// By sending 3 messages per second, evenly spaced out, we can keep sending messages without being limited
private readonly RateGate _webSocketRateLimiter = new RateGate(1, TimeSpan.FromMilliseconds(330));
private void Send(IWebSocket webSocket, object obj)
{
var json = JsonConvert.SerializeObject(obj);
_webSocketRateLimiter.WaitToProceed();
Log.Trace("Send: " + json);
webSocket.Send(json);
}
RateGate class
public class RateGate : IDisposable
{
// Semaphore used to count and limit the number of occurrences per
// unit time.
private readonly SemaphoreSlim _semaphore;
// Times (in millisecond ticks) at which the semaphore should be exited.
private readonly ConcurrentQueue<int> _exitTimes;
// Timer used to trigger exiting the semaphore.
private readonly Timer _exitTimer;
// Whether this instance is disposed.
private bool _isDisposed;
/// <summary>
/// Number of occurrences allowed per unit of time.
/// </summary>
public int Occurrences
{
get; private set;
}
/// <summary>
/// The length of the time unit, in milliseconds.
/// </summary>
public int TimeUnitMilliseconds
{
get; private set;
}
/// <summary>
/// Flag indicating we are currently being rate limited
/// </summary>
public bool IsRateLimited
{
get { return !WaitToProceed(0); }
}
/// <summary>
/// Initializes a <see cref="RateGate"/> with a rate of <paramref name="occurrences"/>
/// per <paramref name="timeUnit"/>.
/// </summary>
/// <param name="occurrences">Number of occurrences allowed per unit of time.</param>
/// <param name="timeUnit">Length of the time unit.</param>
/// <exception cref="ArgumentOutOfRangeException">
/// If <paramref name="occurrences"/> or <paramref name="timeUnit"/> is negative.
/// </exception>
public RateGate(int occurrences, TimeSpan timeUnit)
{
// Check the arguments.
if (occurrences <= 0)
throw new ArgumentOutOfRangeException(nameof(occurrences), "Number of occurrences must be a positive integer");
if (timeUnit != timeUnit.Duration())
throw new ArgumentOutOfRangeException(nameof(timeUnit), "Time unit must be a positive span of time");
if (timeUnit >= TimeSpan.FromMilliseconds(UInt32.MaxValue))
throw new ArgumentOutOfRangeException(nameof(timeUnit), "Time unit must be less than 2^32 milliseconds");
Occurrences = occurrences;
TimeUnitMilliseconds = (int)timeUnit.TotalMilliseconds;
// Create the semaphore, with the number of occurrences as the maximum count.
_semaphore = new SemaphoreSlim(Occurrences, Occurrences);
// Create a queue to hold the semaphore exit times.
_exitTimes = new ConcurrentQueue<int>();
// Create a timer to exit the semaphore. Use the time unit as the original
// interval length because that's the earliest we will need to exit the semaphore.
_exitTimer = new Timer(ExitTimerCallback, null, TimeUnitMilliseconds, -1);
}
// Callback for the exit timer that exits the semaphore based on exit times
// in the queue and then sets the timer for the next exit time.
// Credit to Jim: http://www.jackleitch.net/2010/10/better-rate-limiting-with-dot-net/#comment-3620
// for providing the code below, fixing issue #3499 - https://github.com/QuantConnect/Lean/issues/3499
private void ExitTimerCallback(object state)
{
try
{
// While there are exit times in the queue that are past due,
// exit the semaphore and dequeue the exit time.
var exitTime = 0;
var exitTimeValid = _exitTimes.TryPeek(out exitTime);
while (exitTimeValid)
{
if (unchecked(exitTime - Environment.TickCount) > 0)
{
break;
}
_semaphore.Release();
_exitTimes.TryDequeue(out exitTime);
exitTimeValid = _exitTimes.TryPeek(out exitTime);
}
// we are already holding the next item from the queue, do not peek again
// although this exit time may have already passed by this statement.
var timeUntilNextCheck = exitTimeValid
? Math.Min(TimeUnitMilliseconds, Math.Max(0, exitTime - Environment.TickCount))
: TimeUnitMilliseconds;
_exitTimer.Change(timeUntilNextCheck, -1);
}
catch (Exception)
{
// can throw if called when disposing
}
}
/// <summary>
/// Blocks the current thread until allowed to proceed or until the
/// specified timeout elapses.
/// </summary>
/// <param name="millisecondsTimeout">Number of milliseconds to wait, or -1 to wait indefinitely.</param>
/// <returns>true if the thread is allowed to proceed, or false if timed out</returns>
public bool WaitToProceed(int millisecondsTimeout)
{
// Check the arguments.
if (millisecondsTimeout < -1)
throw new ArgumentOutOfRangeException(nameof(millisecondsTimeout));
CheckDisposed();
// Block until we can enter the semaphore or until the timeout expires.
var entered = _semaphore.Wait(millisecondsTimeout);
// If we entered the semaphore, compute the corresponding exit time
// and add it to the queue.
if (entered)
{
var timeToExit = unchecked(Environment.TickCount + TimeUnitMilliseconds);
_exitTimes.Enqueue(timeToExit);
}
return entered;
}
/// <summary>
/// Blocks the current thread until allowed to proceed or until the
/// specified timeout elapses.
/// </summary>
/// <param name="timeout"></param>
/// <returns>true if the thread is allowed to proceed, or false if timed out</returns>
public bool WaitToProceed(TimeSpan timeout)
{
return WaitToProceed((int)timeout.TotalMilliseconds);
}
/// <summary>
/// Blocks the current thread indefinitely until allowed to proceed.
/// </summary>
public void WaitToProceed()
{
WaitToProceed(Timeout.Infinite);
}
// Throws an ObjectDisposedException if this object is disposed.
private void CheckDisposed()
{
if (_isDisposed)
throw new ObjectDisposedException("RateGate is already disposed");
}
/// <summary>
/// Releases unmanaged resources held by an instance of this class.
/// </summary>
public void Dispose()
{
Dispose(true);
GC.SuppressFinalize(this);
}
/// <summary>
/// Releases unmanaged resources held by an instance of this class.
/// </summary>
/// <param name="isDisposing">Whether this object is being disposed.</param>
protected virtual void Dispose(bool isDisposing)
{
if (!_isDisposed)
{
if (isDisposing)
{
// The semaphore and timer both implement IDisposable and
// therefore must be disposed.
_semaphore.Dispose();
_exitTimer.Dispose();
_isDisposed = true;
}
}
}
}
GitHub source code
Rate gate
Disclaimer: I haven't used this component so what I describe here is what I understand from the code.
It is an intrusive policy: it alters the execution/data flow in order to slow down a fast producer or smooth out bursts. It blocks the flow to avoid resource abuse.
Here you specify the "sleep duration" between subsequent calls, which is enforced by the gate itself.
Polly's Rate limiter
This policy is designed to avoid resource abuse as well. If the consumer issues too many requests against the resource within a predefined time, it simply short-circuits the execution by throwing a RateLimitRejectedException.
So, if you want to allow 20 executions per second:
RateLimitPolicy rateLimiter = Policy
.RateLimit(20, TimeSpan.FromSeconds(1));
and you do not want to exceed the limit, you have to do the waiting yourself:
rateLimiter.Execute(() =>
{
//Your Action delegate which runs <1ms
Thread.Sleep(50);
});
So, the executions should be spread evenly over the allowed period. If your manually injected delay is shorter, say 10 ms, the policy will throw an exception.
Conclusion
According to my understanding, both work like a proxy object: they sit between the producer and the consumer to control the consumption rate.
The rate gate does that by injecting artificial delays, whereas the rate limiter short-circuits the execution if abuse is detected.
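If the goal is to keep the blocking semantics of RateGate while using Polly, one option (a sketch only, not from the original post; it assumes RateLimitRejectedException exposes a RetryAfter TimeSpan, and the retry loop is my own addition) is to catch the rejection and sleep for the suggested interval before trying again:
// Roughly 3 messages per second, mirroring the RateGate(1, 330 ms) configuration above.
private static readonly RateLimitPolicy WebSocketRateLimiter =
Policy.RateLimit(3, TimeSpan.FromSeconds(1));
private void Send(IWebSocket webSocket, object obj)
{
var json = JsonConvert.SerializeObject(obj);
while (true)
{
try
{
// Executes immediately if we are under the limit.
WebSocketRateLimiter.Execute(() => webSocket.Send(json));
return;
}
catch (RateLimitRejectedException ex)
{
// Wait the amount of time the limiter reports, then try again.
// This approximates RateGate.WaitToProceed()'s blocking behaviour.
Thread.Sleep(ex.RetryAfter);
}
}
}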
Hi I have a question I have a simple timed event that looks like:
public override async Task Execute(uint timedIntervalInMs = 1)
{
timer.Interval = timedIntervalInMs;
timer.Elapsed += OnTimedEvent;
timer.AutoReset = true;
timer.Enabled = true;
}
protected override void OnTimedEvent(object source, ElapsedEventArgs eventArgs)
{
Task.Run(async () =>
{
var message = await BuildFrame();
await sender.Send(message, null);
});
}
What it does is build a simple byte array of about 27 bytes and send it via UDP, and I want to send that message every 1 ms. But as I checked, sending 1000 requests with the timer takes about 2-3 seconds (so about 330 frames per second), and that is not what I am aiming for. I suspect the timer is waiting for the event handler to finish its work. Is this true, and can it be avoided so that I can start sending a frame each millisecond no matter whether the previous event has finished or not?
Something like this might be quite useful: the PeriodicYield<T> function returns a sequence of results from a generator function.
Each result is delivered at the end of the next whole period after it becomes available.
Alter SimpleGenerator to mimic whatever generation delay you would like.
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
namespace AsynchronouslyDelayedEnumerable
{
internal class Program
{
private static int Counter;
private static async Task Main(string[] args)
{
await foreach (var value in PeriodicYield(SimpleGenerator, 1000))
{
Console.WriteLine(
$"Time\"{DateTimeOffset.UtcNow}\", Value:{value}");
}
}
private static async Task<int> SimpleGenerator()
{
await Task.Delay(1500);
return Interlocked.Increment(ref Counter);
}
/// <summary>
/// Yield a result periodically.
/// </summary>
/// <param name="generatorAsync">Some generator delegate.</param>
/// <param name="periodMilliseconds">
/// The period in milliseconds at which results should be yielded.
/// </param>
/// <param name="token">A cancellation token.</param>
/// <typeparam name="T">The type of the value to yield.</typeparam>
/// <returns>A sequence of values.</returns>
private static async IAsyncEnumerable<T> PeriodicYield<T>(
Func<Task<T>> generatorAsync,
int periodMilliseconds,
CancellationToken token = default)
{
// Set up a starting point.
var last = DateTimeOffset.UtcNow;
// Continue until cancelled.
while (!token.IsCancellationRequested)
{
// Get the next value.
var nextValue = await generatorAsync();
// Work out the end of the next whole period.
var now = DateTimeOffset.UtcNow;
var gap = (int)(now - last).TotalMilliseconds;
var head = gap % periodMilliseconds;
var tail = periodMilliseconds - head;
var next = now.AddMilliseconds(tail);
// Wait for the end of the next whole period with
// logarithmically shorter delays.
while (next >= DateTimeOffset.UtcNow)
{
var delay = (int)(next - DateTimeOffset.UtcNow).TotalMilliseconds;
delay = (int)Math.Max(1.0, delay * 0.1);
await Task.Delay(delay, token);
}
// Check if cancelled.
if (token.IsCancellationRequested)
{
continue;
}
// return the value and update the last time.
yield return nextValue;
last = DateTimeOffset.UtcNow;
}
}
}
}
As @harol said, Timer doesn't have such a high resolution, because Windows and Linux are not real-time operating systems. It is not possible to trigger an event at a precise time; you can only trigger it at approximately the requested time.
Also, the operating system or your network card driver may decide to wait until the network buffer is full or reaches a certain size before actually sending.
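To see this for yourself, a small sketch like the following (my own illustration, not part of the original answer) measures the real spacing of System.Timers.Timer callbacks that were requested at 1 ms; on a typical desktop OS the observed gap is much closer to the scheduler's ~15 ms granularity:
using System;
using System.Diagnostics;
using System.Timers;
class TimerResolutionDemo
{
static void Main()
{
var stopwatch = Stopwatch.StartNew();
long lastMs = 0;
var timer = new Timer(1); // ask for a 1 ms interval
timer.Elapsed += (s, e) =>
{
var now = stopwatch.ElapsedMilliseconds;
// Prints the real gap between callbacks, typically well above 1 ms.
Console.WriteLine("Elapsed since previous callback: " + (now - lastMs) + " ms");
lastMs = now;
};
timer.AutoReset = true;
timer.Start();
Console.ReadLine(); // let the timer run until Enter is pressed
}
}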
While working on an answer to this question, I wrote this snippet:
var buffer = new BufferBlock<object>();
var producer = Task.Run(async () =>
{
while (true)
{
await Task.Delay(TimeSpan.FromMilliseconds(100));
buffer.Post(null);
Console.WriteLine("Post " + buffer.Count);
}
});
var consumer = Task.Run(async () =>
{
while (await buffer.OutputAvailableAsync())
{
IList<object> items;
buffer.TryReceiveAll(out items);
Console.WriteLine("TryReceiveAll " + buffer.Count);
}
});
await Task.WhenAll(consumer, producer);
The producer should post items to the buffer every 100 ms and the consumer should clear all items out of the buffer and asynchronously wait for more items to show up.
What actually happens is that the consumer clears all items once, and then never again moves beyond OutputAvailableAsync. If I switch the consumer to remove items one by one, it works as expected:
while (await buffer.OutputAvailableAsync())
{
object item;
while (buffer.TryReceive(out item)) ;
}
Am I misunderstanding something? If not, what is the problem?
This is a bug in SourceCore being used internally by BufferBlock. Its TryReceiveAll method doesn't turn on the _enableOffering boolean data member while TryReceive does. That results in the task returned from OutputAvailableAsync never completing.
Here's a minimal reproduction:
var buffer = new BufferBlock<object>();
buffer.Post(null);
IList<object> items;
buffer.TryReceiveAll(out items);
var outputAvailableAsync = buffer.OutputAvailableAsync();
buffer.Post(null);
await outputAvailableAsync; // Never completes
I've just fixed it in the .NET Core repository with this pull request. Hopefully the fix makes it into the NuGet package soon.
Alas, it is the end of September 2015, and although i3arnon fixed the error, it is not fixed in the version that was released two days later: Microsoft TPL Dataflow version 4.5.24.
However, IReceivableSourceBlock<T>.TryReceive(...) works correctly, so an extension method can work around the problem.
Once a new release of TPL Dataflow is out, it will be easy to change the extension method.
/// <summary>
/// This extension method returns all available items in the IReceivableSourceBlock
/// or an empty sequence if nothing is available. The function does not wait.
/// </summary>
/// <typeparam name="T">The type of items stored in the IReceivableSourceBlock</typeparam>
/// <param name="buffer">the source where the items should be extracted from </param>
/// <returns>The IList with the received items. Empty if no items were available</returns>
public static IList<T> TryReceiveAllEx<T>(this IReceivableSourceBlock<T> buffer)
{
/* Microsoft TPL Dataflow version 4.5.24 contains a bug in TryReceiveAll
* Hence this function uses TryReceive until nothing is available anymore
* */
IList<T> receivedItems = new List<T>();
T receivedItem = default(T);
while (buffer.TryReceive<T>(out receivedItem))
{
receivedItems.Add(receivedItem);
}
return receivedItems;
}
usage:
while (await this.bufferBlock.OutputAvailableAsync())
{
// some data available
var receivedItems = this.bufferBlock.TryReceiveAllEx();
if (receivedItems.Any())
{
ProcessReceivedItems(receivedItems);
}
}
I've tried this MANY ways; here is the current iteration. I think I've just implemented this all wrong. What I'm trying to accomplish is to treat this async result in such a way that until it returns AND I finish my add-thumbnail call, I will not request another call to imageProvider.BeginGetImage.
To clarify, my question is two-fold: why does what I'm doing never seem to halt at my Mutex.WaitOne() call, and what is the proper way to handle this scenario?
/// <summary>
/// re-creates a list of thumbnails from a list of TreeElementViewModels (directories)
/// </summary>
/// <param name="list">the list of TreeElementViewModels to process</param>
public void BeginLayout(List<AiTreeElementViewModel> list)
{
// *removed code for canceling and cleanup from previous calls*
// Starts the processing of all folders in parallel.
Task.Factory.StartNew(() =>
{
thumbnailRequests = Parallel.ForEach<AiTreeElementViewModel>(list, options, ProcessFolder);
});
}
/// <summary>
/// Processes a folder for all of its image paths and loads them from disk.
/// </summary>
/// <param name="element">the tree element to process</param>
private void ProcessFolder(AiTreeElementViewModel element)
{
try
{
var images = ImageCrawler.GetImagePaths(element.Path);
AsyncCallback callback = AddThumbnail;
foreach (var image in images)
{
Console.WriteLine("Attempting Enter");
synchMutex.WaitOne();
Console.WriteLine("Entered");
var result = imageProvider.BeginGetImage(callback, image);
}
}
catch (Exception exc)
{
Console.WriteLine(exc.ToString());
// TODO: Do Something here.
}
}
/// <summary>
/// Adds a thumbnail to the Browser
/// </summary>
/// <param name="result">an async result used for retrieving state data from the load task.</param>
private void AddThumbnail(IAsyncResult result)
{
lock (Thumbnails)
{
try
{
Stream image = imageProvider.EndGetImage(result);
string filename = imageProvider.GetImageName(result);
string imagePath = imageProvider.GetImagePath(result);
var imageviewmodel = new AiImageThumbnailViewModel(image, filename, imagePath);
thumbnailHash[imagePath] = imageviewmodel;
HostInvoke(() => Thumbnails.Add(imageviewmodel));
UpdateChildZoom();
//synchMutex.ReleaseMutex();
Console.WriteLine("Exited");
}
catch (Exception exc)
{
Console.WriteLine(exc.ToString());
// TODO: Do Something here.
}
}
}
To start with,
you create a Task to run a Parallel.ForEach that calls a method which invokes a delegate:
three levels of parallelism where one would be enough.
And if I read this right, inside the delegate you then use a Mutex so that only one instance runs at a time.
Could you indicate which actions you actually want to happen in parallel?
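For illustration only (my own sketch, not the asker's code): if the intent is that images within a folder are loaded and added one at a time, keeping just one level of parallelism, the Parallel.ForEach over folders, removes the need for the Mutex entirely. GetImage is assumed here as a hypothetical synchronous counterpart of BeginGetImage/EndGetImage, and BeginLayout blocks here for simplicity (the original wrapped it in a Task to keep the UI responsive):
// Sketch: one level of parallelism - folders in parallel, images sequential.
public void BeginLayout(List<AiTreeElementViewModel> list)
{
Parallel.ForEach(list, ProcessFolderSequentially);
}
private void ProcessFolderSequentially(AiTreeElementViewModel element)
{
foreach (var image in ImageCrawler.GetImagePaths(element.Path))
{
// Hypothetical synchronous load; with the original APM pair this could be
// imageProvider.EndGetImage(imageProvider.BeginGetImage(null, image)).
Stream stream = imageProvider.GetImage(image);
var thumbnail = new AiImageThumbnailViewModel(stream, Path.GetFileName(image), image);
thumbnailHash[image] = thumbnail;
HostInvoke(() => Thumbnails.Add(thumbnail));
}
}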
Here is the context: I am writing an interpreter in C# for a small programming language called Heron, and it has some primitive list operations which can be executed in parallel.
One of the biggest challenges I am facing is distributing the work done by the evaluator across the different cores effectively whenever a parallelizable operation is encountered. The operation can be short or long; it is hard to determine in advance.
One thing that I don't have to worry about is synchronizing data: the parallel operations are explicitly not allowed to modify data.
So the primary questions I have is:
What is the most effective way to distribute the work across threads, so that I can guarantee that the computer will distribute the work across two cores?
I am also interested in a related question:
Roughly how long should an operation take before we can start overcoming the overhead of separating the work onto another thread?
If you want to do a lot with parallel operations, you're going to want to start with .NET 4.0. Here's the Parallel Programming for .NET documentation; you'll want to start there. .NET 4.0 adds a LOT in terms of multi-core utilization. Here's a quick example:
Current 3.5 Serial method:
for(int i = 0; i < 30000; i++)
{
doSomething(i);
}
New .Net 4.0 Parallel method:
Parallel.For(0, 30000, (i) => doSomething(i));
The Parallel.For method automatically scales across the number of cores available, so you can see how quickly you could start taking advantage of this. There are dozens of new libraries in the framework supporting full thread/task management like your example as well (including all the plumbing for synchronization, cancellation, etc.).
There are libraries for Parallel LINQ (PLINQ), task factories, task schedulers and a few others. In short, for the specific task you laid out, .NET 4.0 has huge benefits for you, and I'd go ahead and grab the free Beta 2 (RC coming soon) and get started. (No, I don't work for Microsoft... but rarely do I see an upcoming release fulfill a need so perfectly, so I highly recommend .NET 4.0 for you.)
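For comparison, a PLINQ version of the same loop might look like this (a sketch only; it reuses the doSomething method from the example above and needs a using System.Linq; directive):
// PLINQ version: the query partitions the range across the available cores
// and runs doSomething on each element in parallel.
Enumerable.Range(0, 30000)
.AsParallel()
.ForAll(i => doSomething(i));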
Because I didn't want to develop using VS 2010, and I found that ThreadPool didn't have optimal performance for distributing work across cores (I think because it started/stopped too many threads), I ended up rolling my own. I hope others find it useful:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
namespace HeronEngine
{
/// <summary>
/// Represents a work item.
/// </summary>
public delegate void Task();
/// <summary>
/// This class is intended to efficiently distribute work
/// across the number of cores.
/// </summary>
public static class Parallelizer
{
/// <summary>
/// List of tasks that haven't been yet acquired by a thread
/// </summary>
static List<Task> allTasks = new List<Task>();
/// <summary>
/// List of threads. Should be one per core.
/// </summary>
static List<Thread> threads = new List<Thread>();
/// <summary>
/// When set signals that there is more work to be done
/// </summary>
static ManualResetEvent signal = new ManualResetEvent(false);
/// <summary>
/// Used to tell threads to stop working.
/// </summary>
static bool shuttingDown = false;
/// <summary>
/// Creates a number of high-priority threads for performing
/// work. The hope is that the OS will assign each thread to
/// a separate core.
/// </summary>
/// <param name="cores"></param>
public static void Initialize(int cores)
{
for (int i = 0; i < cores; ++i)
{
Thread t = new Thread(ThreadMain);
// This system is not designed to play well with others
t.Priority = ThreadPriority.Highest;
threads.Add(t);
t.Start();
}
}
/// <summary>
/// Indicates to all threads that there is work
/// to be done.
/// </summary>
public static void ReleaseThreads()
{
signal.Set();
}
/// <summary>
/// Used to indicate that there is no more work
/// to be done, by unsetting the signal. Note:
/// will not work if shutting down.
/// </summary>
public static void BlockThreads()
{
if (!shuttingDown)
signal.Reset();
}
/// <summary>
/// Returns any tasks queued up to perform,
/// or NULL if there is no work. It will reset
/// the global signal effectively blocking all threads
/// if there is no more work to be done.
/// </summary>
/// <returns></returns>
public static Task GetTask()
{
lock (allTasks)
{
if (allTasks.Count == 0)
{
BlockThreads();
return null;
}
Task t = allTasks[allTasks.Count - 1];
allTasks.RemoveAt(allTasks.Count - 1);
return t;
}
}
/// <summary>
/// Primary function for each thread
/// </summary>
public static void ThreadMain()
{
while (!shuttingDown)
{
// Wait until work is available
signal.WaitOne();
// Get an available task
Task task = GetTask();
// Note a task might still be null because
// another thread might have gotten to it first
while (task != null)
{
// Do the work
task();
// Get the next task
task = GetTask();
}
}
}
/// <summary>
/// Distributes work across a number of threads equivalent to the number
/// of cores. All tasks will be run on the available cores.
/// </summary>
/// <param name="localTasks"></param>
public static void DistributeWork(List<Task> localTasks)
{
// Create a list of handles indicating what the main thread should wait for
WaitHandle[] handles = new WaitHandle[localTasks.Count];
lock (allTasks)
{
// Iterate over the list of localTasks, creating a new task that
// will signal when it is done.
for (int i = 0; i < localTasks.Count; ++i)
{
Task t = localTasks[i];
// Create an event used to signal that the task is complete
ManualResetEvent e = new ManualResetEvent(false);
// Create a new signaling task and add it to the list
Task signalingTask = () => { t(); e.Set(); };
allTasks.Add(signalingTask);
// Set the corresponding wait handler
handles[i] = e;
}
}
// Signal to waiting threads that there is work
ReleaseThreads();
// Wait until all of the designated work items are completed.
WaitHandle.WaitAll(handles);
}
/// <summary>
/// Indicate to the system that the threads should terminate
/// and unblock them.
/// </summary>
public static void CleanUp()
{
shuttingDown = true;
ReleaseThreads();
}
}
}
I would go with the thread pool even though it has its problems. MS is investing in improving it, and it seems like .NET 4 will have an improved one. At this point, I think the best thing would be to use the thread pool wrapped in your own object, and hold off on deciding about your own implementation.
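A minimal sketch of that "thread pool wrapped in your own object" idea (my own illustration, not from the original answer; it reuses the Task delegate declared in the Parallelizer code above and lives in the same namespace): queue the work items onto the ThreadPool and block until they have all completed, so the call site looks just like DistributeWork above.
public static class PooledWorkDistributor
{
// Queues all tasks on the ThreadPool and blocks until every one has finished.
// Task here is the delegate declared above: public delegate void Task();
public static void DistributeWork(List<Task> localTasks)
{
if (localTasks.Count == 0)
return;
int remaining = localTasks.Count;
ManualResetEvent allDone = new ManualResetEvent(false); // disposal omitted for brevity
foreach (Task task in localTasks)
{
Task local = task; // copy so the callback does not capture the loop variable
ThreadPool.QueueUserWorkItem(state =>
{
local();
// The last task to finish wakes up the waiting caller.
if (Interlocked.Decrement(ref remaining) == 0)
allDone.Set();
});
}
allDone.WaitOne();
}
}
Existing Parallelizer.DistributeWork call sites could then switch to PooledWorkDistributor.DistributeWork without other changes.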