How to speed up Task&lt;T&gt; with HttpClient - C#

I have a process where I need to make ~100 HTTP API calls to a server and process the results. I've put together this CommandExecutor, which builds a list of commands and then runs them asynchronously. Making about 100 calls and parsing the results takes over 1 minute. A single request from a browser gives me a response in ~100 ms, so you would think ~100 calls would take around 10 seconds. I believe I am doing something wrong and that this should go much faster.
public static class CommandExecutor
{
    private static readonly ThreadLocal<List<Command>> CommandsToExecute =
        new ThreadLocal<List<Command>>(() => new List<Command>());

    private static readonly ThreadLocal<List<Task<List<Candidate>>>> Tasks =
        new ThreadLocal<List<Task<List<Candidate>>>>(() => new List<Task<List<Candidate>>>());

    public static void ExecuteLater(Command command)
    {
        CommandsToExecute.Value.Add(command);
    }

    public static void StartExecuting()
    {
        foreach (var command in CommandsToExecute.Value)
        {
            Tasks.Value.Add(Task.Factory.StartNew<List<Candidate>>(command.GetResult));
        }
        Task.WaitAll(Tasks.Value.ToArray());
    }

    public static List<Candidate> Result()
    {
        return Tasks.Value.Where(x => x.Result != null)
                          .SelectMany(x => x.Result)
                          .ToList();
    }
}
The Command that I am passing into this list creates a new HttpClient, calls GetAsync on that client with a URL, converts the string response to an object, then hydrates a field.
protected void Initialize()
{
    _httpClient = new HttpClient();
    _httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("text/plain"));
}

protected override void Execute()
{
    Initialize();
    var task = _httpClient.GetAsync(string.Format(Url, Input));
    Result = ConvertResponseToObjectAsync(task).Result;
    Result.ForEach(x => x.prop = value);
}

private static Task<Model> ConvertResponseToObjectAsync(Task<HttpResponseMessage> task)
{
    return task.Result.Content.ReadAsAsync<Model>(
        new MediaTypeFormatter[]
        {
            new Formatter()
        });
}
Can you spot my bottleneck, or do you have any suggestions on how to speed this up?
EDIT
Making these changes brought it down to 4 seconds.
protected override void Execute()
{
    Initialize();
    _httpClient.GetAsync(string.Format(Url, Input))
        .ContinueWith(httpResponse => ConvertResponseToObjectAsync(httpResponse)
            .ContinueWith(ProcessResult));
}

protected void ProcessResult(Task<Model> model)
{
    Result = model.Result;
    Result.ForEach(x => x.prop = value);
}

Stop creating new HttpClient instances. Every time you dispose an HttpClient instance it closes the TCP/IP connection. Create one HttpClient instance and re-use it for every request; HttpClient can make multiple requests on multiple different threads at the same time.
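A minimal sketch of that idea, assuming the commands can reach one shared client (the class and field names below are mine, not from the question):
using System.Net.Http;
using System.Net.Http.Headers;

public static class SharedHttp
{
    // One client for the whole process; HttpClient is safe for concurrent use.
    public static readonly HttpClient Client = new HttpClient();

    static SharedHttp()
    {
        Client.DefaultRequestHeaders.Accept.Add(
            new MediaTypeWithQualityHeaderValue("text/plain"));
    }
}
Each Command would then call SharedHttp.Client.GetAsync(...) instead of newing up its own client.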

Avoid the use of task.Result in ConvertResponseToObjectAsync and then again in Execute. Instead, chain these onto the original GetAsync task with ContinueWith.
As it stands today, Result blocks execution of the current thread until the other task finishes, so your thread pool quickly gets backed up with tasks that are waiting on other tasks which have nowhere to run. Eventually (after waiting for about a second) the thread pool adds an additional thread, so this does finish, but it's hardly efficient.
As a general principle, you should avoid ever accessing Task.Result except in a task continuation.
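Putting both points together, an async version of Execute might look roughly like this (a sketch only: it assumes the Command base class can expose an awaitable method, and it reuses the Url, Input, Model and Formatter members from the question):
protected async Task<Model> ExecuteAsync()
{
    // _httpClient is assumed to be the single shared instance recommended above.
    var response = await _httpClient.GetAsync(string.Format(Url, Input));
    var model = await response.Content.ReadAsAsync<Model>(
        new MediaTypeFormatter[] { new Formatter() });
    model.ForEach(x => x.prop = value);
    return model;
}
With no .Result calls, no thread-pool thread is blocked while the requests are in flight.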
As a bonus, you probably don't want to be using thread-local storage. ThreadLocal stores a separate instance of the item on each thread where it is accessed. In this case, it looks like you want thread-safe but shared storage. I would recommend ConcurrentQueue for this sort of thing.
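For example, a hedged sketch of CommandExecutor using shared, thread-safe storage and Task.WhenAll (merging StartExecuting and Result into one awaitable method is my change, not the question's design; needs System.Collections.Concurrent, System.Linq and System.Threading.Tasks):
public static class CommandExecutor
{
    // Shared across all threads instead of one list per thread.
    private static readonly ConcurrentQueue<Command> CommandsToExecute =
        new ConcurrentQueue<Command>();

    public static void ExecuteLater(Command command)
    {
        CommandsToExecute.Enqueue(command);
    }

    public static async Task<List<Candidate>> ExecuteAllAsync()
    {
        var tasks = new List<Task<List<Candidate>>>();
        while (CommandsToExecute.TryDequeue(out var command))
        {
            tasks.Add(Task.Run(command.GetResult));
        }
        var results = await Task.WhenAll(tasks);
        return results.Where(r => r != null).SelectMany(r => r).ToList();
    }
}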

Related

Forcing certain code to always run on the same thread

We have an old 3rd-party system (let's call it Junksoft® 95) that we interface with via PowerShell (it exposes a COM object), and I'm in the process of wrapping it in a REST API (ASP.NET Framework 4.8 and Web API 2). I use the System.Management.Automation NuGet package to create a PowerShell instance in which I instantiate Junksoft's COM API as a dynamic object that I then use:
//I'm omitting some exception handling and maintenance code for brevity
powerShell = System.Management.Automation.PowerShell.Create();
powerShell.AddScript("Add-Type -Path C:\Path\To\Junksoft\Scripting.dll");
powerShell.AddScript("New-Object Com.Junksoft.Scripting.ScriptingObject");
dynamic junksoftAPI = powerShell.Invoke()[0];
//Now we issue commands to junksoftAPI like this:
junksoftAPI.Login(user,pass);
int age = junksoftAPI.GetAgeByCustomerId(custId);
List<string> names = junksoftAPI.GetNames();
This works fine when I run all of this on the same thread (e.g. in a console application). However, for some reason it usually doesn't work when I put junksoftAPI into a System.Web.Caching.Cache and use it from different controllers in my web app. I say usually because it actually works when ASP.NET happens to hand the incoming call to the thread that junksoftAPI was created on. If it doesn't, Junksoft 95 gives me an error.
Is there any way for me to make sure that all interactions with junksoftAPI happen on the same thread?
Note that I don't want to turn the whole web application into a single-threaded application! The logic in the controllers and elsewhere should run as normal on different threads. Only the Junksoft interactions should happen on the Junksoft-specific thread, something like this:
[HttpGet]
public async Task<IHttpActionResult> GetAge(...)
{
    //finding customer ID in database...
    ...
    int custAge = await Task.Run(() => {
        //this should happen on the Junksoft-specific thread and not the next available thread
        var cache = new System.Web.Caching.Cache();
        var junksoftAPI = cache.Get(...); //This has previously been added to cache on the Junksoft-specific thread
        return junksoftAPI.GetAgeByCustomerId(custId);
    });
    //prepare a response using custAge...
}
You can create your own singleton worker thread to achieve this. Here is code you can plug into your web application.
public class JunkSoftRunner
{
    private static JunkSoftRunner _instance;

    //singleton pattern to restrict all the actions to be executed on a single thread only.
    public static JunkSoftRunner Instance => _instance ?? (_instance = new JunkSoftRunner());

    private readonly SemaphoreSlim _semaphore;
    private readonly AutoResetEvent _newTaskRunSignal;

    private TaskCompletionSource<object> _taskCompletionSource;
    private Func<object> _func;

    private JunkSoftRunner()
    {
        _semaphore = new SemaphoreSlim(1, 1);
        _newTaskRunSignal = new AutoResetEvent(false);
        var contextThread = new Thread(ThreadLooper)
        {
            Priority = ThreadPriority.Highest
        };
        contextThread.Start();
    }

    private void ThreadLooper()
    {
        while (true)
        {
            //wait till the next task signal is received.
            _newTaskRunSignal.WaitOne();

            //next task execution signal is received.
            try
            {
                //try execute the task and get the result
                var result = _func.Invoke();

                //task executed successfully, set the result
                _taskCompletionSource.SetResult(result);
            }
            catch (Exception ex)
            {
                //task execution threw an exception, set the exception and continue with the looper
                _taskCompletionSource.SetException(ex);
            }
        }
    }

    public async Task<TResult> Run<TResult>(Func<TResult> func, CancellationToken cancellationToken = default(CancellationToken))
    {
        //allows only one thread to run at a time.
        await _semaphore.WaitAsync(cancellationToken);

        //thread has acquired the semaphore and entered
        try
        {
            //create new task completion source to wait for func to get executed on the context thread
            _taskCompletionSource = new TaskCompletionSource<object>();

            //set the function to be executed by the context thread
            _func = () => func();

            //signal the waiting context thread that it is time to execute the task
            _newTaskRunSignal.Set();

            //wait and return the result till the task execution is finished on the context/looper thread.
            return (TResult)await _taskCompletionSource.Task;
        }
        finally
        {
            //release the semaphore to allow other threads to acquire it.
            _semaphore.Release();
        }
    }
}
Console Main Method for testing:
public class Program
{
    //testing the junk soft runner
    public static void Main()
    {
        //get the singleton instance
        var softRunner = JunkSoftRunner.Instance;

        //simulate web requests on different threads
        for (var i = 0; i < 10; i++)
        {
            var taskIndex = i;

            //launch a web request on a new thread.
            Task.Run(async () =>
            {
                Console.WriteLine($"Task{taskIndex} (ThreadID:'{Thread.CurrentThread.ManagedThreadId}') Launched");
                return await softRunner.Run(() =>
                {
                    Console.WriteLine($"->Task{taskIndex} Completed On '{Thread.CurrentThread.ManagedThreadId}' thread.");
                    return taskIndex;
                });
            });
        }
    }
}
Output:
Notice that, though the functions were launched from different threads, part of the code was always executed on the same context thread, with ID '5'.
But beware: though all the web requests run on independent threads, they all end up waiting for work to be executed on the singleton worker thread. This will eventually create a bottleneck in your web application; that is an inherent limitation of this design.
Here is how you could issue commands to the Junksoft API from a dedicated STA thread, using a BlockingCollection class:
public class JunksoftSTA : IDisposable
{
    private readonly BlockingCollection<Action<Lazy<dynamic>>> _pump;
    private readonly Thread _thread;

    public JunksoftSTA()
    {
        _pump = new BlockingCollection<Action<Lazy<dynamic>>>();
        _thread = new Thread(() =>
        {
            var lazyApi = new Lazy<dynamic>(() =>
            {
                var powerShell = System.Management.Automation.PowerShell.Create();
                powerShell.AddScript(@"Add-Type -Path C:\Path\To\Junksoft.dll");
                powerShell.AddScript("New-Object Com.Junksoft.ScriptingObject");
                dynamic junksoftAPI = powerShell.Invoke()[0];
                return junksoftAPI;
            });
            foreach (var action in _pump.GetConsumingEnumerable())
            {
                action(lazyApi);
            }
        });
        _thread.SetApartmentState(ApartmentState.STA);
        _thread.IsBackground = true;
        _thread.Start();
    }

    public Task<T> CallAsync<T>(Func<dynamic, T> function)
    {
        var tcs = new TaskCompletionSource<T>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        _pump.Add(lazyApi =>
        {
            try
            {
                var result = function(lazyApi.Value);
                tcs.SetResult(result);
            }
            catch (Exception ex)
            {
                tcs.SetException(ex);
            }
        });
        return tcs.Task;
    }

    public Task CallAsync(Action<dynamic> action)
    {
        return CallAsync<object>(api => { action(api); return null; });
    }

    public void Dispose() => _pump.CompleteAdding();

    public void Join() => _thread.Join();
}
The purpose of using the Lazy class is to surface a possible exception during the construction of the dynamic object by propagating it to the callers.
...exceptions are cached. That is, if the factory method throws an exception the first time a thread tries to access the Value property of the Lazy<T> object, the same exception is thrown on every subsequent attempt.
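A tiny sketch of that caching behaviour (not from the answer, just an illustration):
var lazy = new Lazy<int>(() => { throw new InvalidOperationException("init failed"); });

try { Console.WriteLine(lazy.Value); } catch (InvalidOperationException) { /* first failure */ }

// The factory is not run again; the same cached exception is rethrown.
try { Console.WriteLine(lazy.Value); } catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message); // "init failed"
}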
Usage example:
// A static field stored somewhere
public static readonly JunksoftSTA JunksoftStatic = new JunksoftSTA();
await JunksoftStatic.CallAsync(api => { api.Login("x", "y"); });
int age = await JunksoftStatic.CallAsync(api => api.GetAgeByCustomerId(custId));
In case you find that a single STA thread is not enough to serve all the requests in a timely manner, you could add more STA threads, all of them running the same code (private readonly Thread[] _threads; etc). The BlockingCollection class is thread-safe and can be consumed concurrently by any number of threads.
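A hedged sketch of how the constructor might change for a small pool of consumers (CreateJunksoftApi is a hypothetical helper wrapping the PowerShell code above, and the default of 2 threads is arbitrary):
private readonly Thread[] _threads;

public JunksoftSTA(int threadCount = 2)
{
    _pump = new BlockingCollection<Action<Lazy<dynamic>>>();
    _threads = Enumerable.Range(0, threadCount).Select(n => new Thread(() =>
    {
        // Each thread gets its own Lazy<dynamic>, so each holds its own COM instance.
        var lazyApi = new Lazy<dynamic>(() => CreateJunksoftApi());
        foreach (var action in _pump.GetConsumingEnumerable())
        {
            action(lazyApi);
        }
    })).ToArray();

    foreach (var thread in _threads)
    {
        thread.SetApartmentState(ApartmentState.STA);
        thread.IsBackground = true;
        thread.Start();
    }
}
Join() would then need to join every thread in the array rather than the single _thread field.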
If you hadn't said this was a 3rd-party tool, I would have assumed it was a GUI class. For practical reasons it is a very bad idea to have multiple threads write to those; .NET has enforced a strict "only the creating thread shall write" rule from 2.0 onward.
Web servers in general, and ASP.NET in particular, use a pretty big thread pool. We are talking tens to hundreds of threads per core. That means it is really hard to pin any request down to a specific thread, so you might as well not try.
Again, looking at the GUI classes might be your best bet. You could basically create a single thread whose sole purpose is to imitate a GUI's event queue. The main/UI thread of your average Windows Forms application is responsible for creating every GUI class instance. It is kept alive by polling and processing the event queue, and it ends only when it receives a cancel command via that queue. Dispatching just puts orders into the queue, so cross-threading issues are avoided.

Buffer tasks data with timeout

I have a rather tricky problem to solve. I have multiple (up to a hundred or more) tasks, each of which produces a piece of data, say a string. These tasks can be spawned at any moment; sometimes there is a huge number of them at once, at other times none at all. Each task must receive a bool indicating whether it completed correctly or not (that's important).
I want to implement some kind of buffer to aggregate data from the tasks and flush it to an external service, returning the operation state (ok or fail). The buffer must also be flushed on a timeout (to prevent waiting too long for new tasks to generate data).
So far I have tried a shared list of items. Tasks can add items to the list, and another task checks a timer and the count of items in the list and flushes them. But with this approach I can't report the status of the flush operation back to the task, which is very bad for me.
I'll be grateful for any approach to solving my problem.
As I understand it, you need to save the result of each task to a database/service, but you don't want to do it immediately.
There can be more than one solution to your problem, and it's difficult to come up with the best one, so I'll quickly describe how I would have done it.
A container for data you need to save/send.
public class TaskResultEventArgs : EventArgs
{
    public bool Result { get; set; }
}
A notifier which also runs the task for you. I assumed you can delay execution of tasks.
public class NotifyingTaskRunner
{
    public event EventHandler<TaskResultEventArgs> TaskCompleted;

    public void RunAndNotify(Task<bool> task)
    {
        task.ContinueWith(t =>
        {
            OnTaskCompleted(this, new TaskResultEventArgs { Result = t.Result });
        }, TaskContinuationOptions.OnlyOnRanToCompletion);
        task.Start();
    }

    protected virtual void OnTaskCompleted(object sender, TaskResultEventArgs e)
    {
        var h = TaskCompleted;
        if (h != null)
        {
            h.Invoke(sender, e);
        }
    }
}
A listener which can buffer and/or flush results (or you might want to delegate this to another class).
public class Listener
{
    private ConcurrentQueue<bool> _queue = new ConcurrentQueue<bool>();

    public Listener(NotifyingTaskRunner runner)
    {
        runner.TaskCompleted += Flush;
    }

    public async void Flush(object sender, TaskResultEventArgs e)
    {
        // Enqueue status to flush everything later (or flush it immediately)
        _queue.Enqueue(e.Result);
    }
}
And this is how you can use everything together.
var runner = new NotifyingTaskRunner();
var listener = new Listener(runner);
var t1 = new Task<bool>(() => { return true; });
var t2 = new Task<bool>(() => { return false; });
runner.RunAndNotify(t1);
runner.RunAndNotify(t2);
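The question also asks for a timeout-based flush and for reporting the flush status back to each producer. One hedged way to get both is to buffer each result together with a TaskCompletionSource and drain the buffer on a timer; FlushToExternalService below is a placeholder for the real service call, the interval is arbitrary, and the Timer is System.Threading.Timer:
public class TimedFlusher : IDisposable
{
    private readonly ConcurrentQueue<(bool Result, TaskCompletionSource<bool> Ack)> _buffer =
        new ConcurrentQueue<(bool, TaskCompletionSource<bool>)>();
    private readonly Timer _timer;

    public TimedFlusher(TimeSpan flushInterval)
    {
        _timer = new Timer(_ => Flush(), null, flushInterval, flushInterval);
    }

    // The returned task completes with the outcome of the flush that included this item.
    public Task<bool> AddAsync(bool taskResult)
    {
        var ack = new TaskCompletionSource<bool>();
        _buffer.Enqueue((taskResult, ack));
        return ack.Task;
    }

    private void Flush()
    {
        var batch = new List<(bool Result, TaskCompletionSource<bool> Ack)>();
        while (_buffer.TryDequeue(out var item))
        {
            batch.Add(item);
        }
        if (batch.Count == 0) return;

        // Send the whole batch, then report the same ok/fail status to every producer.
        bool ok = FlushToExternalService(batch.Select(x => x.Result).ToList());
        foreach (var item in batch)
        {
            item.Ack.TrySetResult(ok);
        }
    }

    private bool FlushToExternalService(List<bool> results) => true; // placeholder

    public void Dispose() => _timer.Dispose();
}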

Optimising asynchronous HttpClient requests

I have made a class to handle multiple HTTP GET requests. It looks something like this:
public partial class MyHttpClass : IDisposable
{
    private HttpClient theClient;
    private string ApiBaseUrl = "https://example.com/";

    public MyHttpClass()
    {
        this.theClient = new HttpClient();
        this.theClient.BaseAddress = new Uri(ApiBaseUrl);
        this.theClient.DefaultRequestHeaders.Accept.Clear();
        this.theClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    }

    public async Task<JObject> GetAsync(string reqUrl)
    {
        var returnObj = new JObject();
        var response = await this.theClient.GetAsync(reqUrl);

        if (response.IsSuccessStatusCode)
        {
            returnObj = await response.Content.ReadAsAsync<JObject>();
            Console.WriteLine("GET successful");
        }
        else
        {
            Console.WriteLine("GET failed");
        }

        return returnObj;
    }

    public void Dispose()
    {
        theClient.Dispose();
    }
}
I am then queueing multiple requests by using a loop over Task.Run() and then waiting with Task.WaitAll(), in the manner of:
public async Task Start()
{
    foreach (var item in list)
    {
        taskList.Add(Task.Run(() => this.GetThing(item)));
    }
    Task.WaitAll(taskList.ToArray());
}

public async Task GetThing(string url)
{
    var response = await this.theClient.GetAsync(url);
    // some code to process and save response
}
It definitely works faster than synchronous operation, but it is not as fast as I expected. Based on other advice I think the local thread pool is slowing me down. MSDN suggests I should specify it as a long-running task, but I can't see a way to do that when calling it like this.
Right now I haven't got into limiting threads, I am just doing batches and testing speed to discover the right approach.
Can anyone suggest some areas for me to look at to increase the speed?
So, after you've set your DefaultConnectionLimit to a nice high number, or just the ConnectionLimit of the ServicePoint that manages connections to the host you are hitting:
ServicePointManager
    .FindServicePoint(new Uri("https://example.com/"))
    .ConnectionLimit = 1000;
the only suspect bit of code is where you start everything...
public async Task Start()
{
    foreach (var item in list)
    {
        taskList.Add(Task.Run(() => this.GetThing(item)));
    }
    Task.WaitAll(taskList.ToArray());
}
This can be reduced to
var tasks = list.Select(this.GetThing);
to create the tasks (your async methods return hot (running) tasks... no need to double wrap with Task.Run)
Then, rather than blocking while waiting for them to complete, wait asynchronously instead:
await Task.WhenAll(tasks);
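Put together, the Start method might end up looking something like this (a sketch; list and GetThing are the members from the question):
public async Task Start()
{
    // GetThing already returns a hot task, so there's no need for Task.Run,
    // and awaiting WhenAll avoids blocking a thread-pool thread.
    var tasks = list.Select(item => this.GetThing(item));
    await Task.WhenAll(tasks);
}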
You are probably also hitting overhead from creating an HttpClient instance per object rather than using a static instance. Your implementation will not scale. Using a shared HttpClient is actually recommended.
See my answer why - What is the overhead of creating a new HttpClient per call in a WebAPI client?
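In terms of the class in the question, that could mean something like this (a sketch; the Dispose implementation would go away, since a shared client should live for the lifetime of the application):
public partial class MyHttpClass
{
    // One client shared by every instance; HttpClient handles concurrent requests.
    private static readonly HttpClient theClient = new HttpClient
    {
        BaseAddress = new Uri("https://example.com/")
    };

    static MyHttpClass()
    {
        theClient.DefaultRequestHeaders.Accept.Clear();
        theClient.DefaultRequestHeaders.Accept.Add(
            new MediaTypeWithQualityHeaderValue("application/json"));
    }
}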

Async Producer/Consumer

I have an instance of a class that is accessed from several threads. This class takes these calls and adds a tuple into a database. I need this to be done in a serial manner because, due to some DB constraints, parallel inserts could result in an inconsistent database.
As I am new to parallelism and concurrency in C#, I did this:
private BlockingCollection<Task> _tasks = new BlockingCollection<Task>();

public void AddDData(string info)
{
    Task t = new Task(() => { InsertDataIntoBase(info); });
    _tasks.Add(t);
}

private void InsertWorker()
{
    Task.Factory.StartNew(() =>
    {
        while (!_tasks.IsCompleted)
        {
            Task t;
            if (_tasks.TryTake(out t))
            {
                t.Start();
                t.Wait();
            }
        }
    });
}
AddDData is the method called by multiple threads, and InsertDataIntoBase is a very simple insert that should take a few milliseconds.
The problem is that, for some reason my lack of knowledge doesn't let me figure out, sometimes a task gets called twice! It always goes like this:
T1
T2
T3
T1 <- PK error.
T4
...
Did I understand .Take() completely wrong, am I missing something, or is my producer/consumer implementation really bad?
Best Regards,
Rafael
UPDATE:
As suggested, I made a quick sandbox test implementation with this architecture and, as I suspected, it does not guarantee that a task will not be fired before the previous one finishes.
So the question remains: how to properly queue tasks and fire them sequentially?
UPDATE 2:
I simplified the code:
private BlockingCollection<Data> _tasks = new BlockingCollection<Data>();

public void AddDData(Data info)
{
    _tasks.Add(info);
}

private void InsertWorker()
{
    Task.Factory.StartNew(() =>
    {
        while (!_tasks.IsCompleted)
        {
            Data info;
            if (_tasks.TryTake(out info))
            {
                InsertIntoDB(info);
            }
        }
    });
}
Note that I got rid of Tasks, as I'm relying on a synchronous InsertIntoDB call (it is inside the loop), but still no luck... The generation is fine and I'm absolutely sure that only unique instances go into the queue. But no matter what I try, sometimes the same object is used twice.
I think this should work:
private static BlockingCollection<string> _itemsToProcess = new BlockingCollection<string>();

static void Main(string[] args)
{
    InsertWorker();
    GenerateItems(10, 1000);
    _itemsToProcess.CompleteAdding();
}

private static void InsertWorker()
{
    Task.Factory.StartNew(() =>
    {
        while (!_itemsToProcess.IsCompleted)
        {
            string t;
            if (_itemsToProcess.TryTake(out t))
            {
                // Do whatever needs doing here
                // Order should be guaranteed since BlockingCollection
                // uses a ConcurrentQueue as a backing store by default.
                // http://msdn.microsoft.com/en-us/library/dd287184.aspx#remarksToggle
                Console.WriteLine(t);
            }
        }
    });
}

private static void GenerateItems(int count, int maxDelayInMs)
{
    Random r = new Random();
    string[] items = new string[count];
    for (int i = 0; i < count; i++)
    {
        items[i] = i.ToString();
    }

    // Simulate many threads adding items to the collection
    items
        .AsParallel()
        .WithDegreeOfParallelism(4)
        .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
        .Select((x) =>
        {
            Thread.Sleep(r.Next(maxDelayInMs));
            _itemsToProcess.Add(x);
            return x;
        }).ToList();
}
This does mean that the consumer is single threaded, but allows for multiple producer threads.
From your comment
"I simplified the code shown here, as the data is not a string"
I assume that the info parameter passed into AddDData is a mutable reference type. Make sure that the caller is not using the same info instance for multiple calls, since that reference is captured in the Task lambda.
Based on the trace that you provided the only logical possibility is that you have called InsertWorker twice (or more). There are thus two background threads waiting for items to appear in the collection and occasionally they both manage to grab an item and begin executing it.
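If that is what's happening, one hedged way to make a second call harmless is to guard the start-up, for example with Interlocked (a sketch against the simplified code from the question):
private int _workerStarted; // 0 = not started, 1 = started

private void InsertWorker()
{
    // Only the first caller actually starts the consuming loop.
    if (Interlocked.Exchange(ref _workerStarted, 1) == 1)
    {
        return;
    }

    Task.Factory.StartNew(() =>
    {
        while (!_tasks.IsCompleted)
        {
            if (_tasks.TryTake(out Data info))
            {
                InsertIntoDB(info);
            }
        }
    }, TaskCreationOptions.LongRunning);
}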

Dispose RX Threads on application exit

I am stuck on correctly disposing of threads created with Rx when the application exits. I can see in Process Explorer that after the application has closed, the threads are still running, causing IO exceptions.
class Program
{
    static void Main(string[] args)
    {
        CompositeDisposable subscriptions = new CompositeDisposable();

        subscriptions.Add(Observable.Interval(TimeSpan.FromSeconds(15))
            .Subscribe(_ =>
            {
                getData();
            }));

        Thread.Sleep(TimeSpan.FromSeconds(20));

        subscriptions.Dispose();
    }
}
As you can see, if I uncomment the subscriptions.Dispose() call, the thread terminates without getting any data. Any help would be appreciated. Thanks.
You need some sort of delay between subscriptions.Add(...) and subscriptions.Dispose(). Without a delay there, your app simply subscribes and disposes immediately, with no time for the threads to do their work. (And the Thread.Sleep(1000) doesn't work, since it is inside the subscription function, not part of the main function.)
The pattern you are looking for is similar to this
class Program
{
    public static string GetData()
    {
        return "Hello";
    }

    public static async Task<string> GetDataAsync()
    {
        return await Observable
            .Interval(TimeSpan.FromSeconds(15))
            .Take(1)
            .Select(_ => GetData());
    }

    static void Main(string[] args)
    {
        var task = GetDataAsync();
        task.Wait();
        var s = task.Result;
    }
}
The reason for Wait is that an entry point, Main, in this case cannot be
marked as async. Wait blocks the current thread until the Task returned
by GetDataAsync produces a value.
Note also that IObservable is compatible with async/await and will return the last
value produced by the sequence. That is why I add Take(1) as it will produce
only 1 tick.
Another alternative is just to call Wait directly on the IObservable as in
class Program
{
    public static string GetData()
    {
        return "Hello";
    }

    public static IObservable<string> GetDataObservable()
    {
        return Observable
            .Interval(TimeSpan.FromSeconds(15))
            .Take(1)
            .Select(_ => GetData());
    }

    static void Main(string[] args)
    {
        var s = GetDataObservable().Wait();
    }
}
You can subscribe to your observable with a CancellationToken that will cancel the underlying subscription:
static void Main(string[] args)
{
    var cts = new CancellationTokenSource();

    Observable
        .Interval(TimeSpan.FromSeconds(15))
        .Subscribe(_ => getData(), cts.Token);

    cts.CancelAfter(TimeSpan.FromSeconds(20));
}
