Dilemma, dilemma...
I've been working up a solution to a problem that uses async calls to the HttpClient library (GetAsync=>ConfigureAwait(false) etc). IIn a console app, my dll is very responsive and the mixture of using the async await calls and the Parallel.ForEach(=>) really makes me glow.
Now for the issue. After moving from this test harness to the target app, things have become problematic. I'm using asp.net mvc 4 and have hit a few issues. The main issue really is that calling my process on a controller action actually blocks the main thread until the async actions are complete. I've tried using an async controller pattern, I've tried using Task.Factory, I've tried using new Threads. You name it, I've tried all the flavours - and then some!.
Now, I appreciate that the nature of http is not designed to facilitate long processes like this and there are a number of articles here on SO that say don't do it. However, there are mitigating reasons why i NEED to use this approach. The main reason that I need to run this in mvc is due to the fact that I actually update the live data cache (on the mvc app) in realtime via raising an event in my dll's code. This means that fragments of the 50-60 data feeds can be pushed out live before the entire async action is complete. Therefore, client apps can receive partial updates within seconds of the async action being instigated. If I were to delegate the process out to a console app that ran the entire process in the background, I'd no longer be able to harness those fragment partial updates and this is the raison d'etre behind the entire choice of this architecture.
Can anyone shed light on a solution that would allow me to mitigate the blocking of the thread, whilst at the same time, allow each async fragment to be consumed by my object model and fed out to the client apps (I'm using signalr to make these client updates). A kind of nirvanna would be a scenario where an out-of-process cache object could be shared between numerous processes - the cache update could then be triggered and consumed by my mvc process (aka - http://devproconnections.com/aspnet-mvc/out-process-caching-aspnet). And so back to reality...
I have also considered using a secondary webservice to achieve this, but would welcome other options before once again over engineering my solution (there are already many moving parts and a multitude of async Actions going on).
Sorry not to have added any code, I'm hoping for practical philosophy/insights, rather than code help on this, tho would of course welcome coded examples that illustrate a solution to my problem.
I'll update the question as we move in time, as my thinking process is still maturing on this.
[edit] - for the sake of clarity, the snippet below is my brothers grimm code collision (extracted from a larger body of work):
Parallel.ForEach(scrapeDataBases, new ParallelOptions()
{
MaxDegreeOfParallelism = Environment.ProcessorCount * 15
},
async dataBase =>
{
await dataBase.ScrapeUrlAsync().ConfigureAwait(false);
await UpdateData(dataType, (DataCheckerScrape)dataBase);
});
async and Parallel.ForEach do not mix naturally, so I'm not sure what your console solution looks like. Furthermore, Parallel should almost never be used on ASP.NET at all.
It sounds like what you would want is to just use Task.WhenAll.
On a side note, I think your reasoning around background processing on ASP.NET is incorrect. It is perfectly possible to have a separate process that updates the clients via SignalR.
Being that your question is pretty high level without a lot of code. You could try Reactive Extensions.
Something like
private IEnumerable<Task<Scraper>> ScrappedUrls()
{
// Return the 50 to 60 task for each website here.
// I assume they all return the same type.
// return .ScrapeUrlAsync().ConfigureAwait(false);
throw new NotImplementedException();
}
public async Task<IEnumerable<ScrapeOdds>> GetOdds()
{
var results = new Collection<ScrapeOdds>();
var urlRequest = ScrappedUrls();
var observerableUrls = urlRequest.Select(u => u.ToObservable()).Merge();
var publisher = observerableUrls.Publish();
var hubContext = GlobalHost.ConnectionManager.GetHubContext<OddsHub>();
publisher.Subscribe(scraper =>
{
// Whatever you do do convert to the result set
var scrapedOdds = scraper.GetOdds();
results.Add(scrapedOdds);
// update anything else you want when it arrives.
// Update SingalR here
hubContext.Clients.All.UpdatedOdds(scrapedOdds);
});
// Will fire off subscriptions and not continue until they are done.
await publisher;
return results;
}
The merge option will process the results as they come in. You can then update the signalR hubs plus whatever else you need to update as they come in. The controller action will have to wait for them all to come in. That's why there is an await on the publisher.
I don't really know if httpClient is going to like to have 50 - 60 web calls all at once or not. If it doesn't you can just take the IEnumerable to an array and break it down into a smaller chunks. And also there should be some error checking in there. With Rx you can also tell it to SubscribeOn and ObserverOn different threads but I think with everything being pretty much async that wouldn't be necessary.
Related
I am trying to create a unit test to simulate my API being called by many people at the same time.
I've got this code in my unit test:
var tasks = new List<Task>();
for (int i = 0; i < 10; i++)
{
var id = i; // must assign to new variable inside for loop
var t = Task.Run(async () =>
{
response = await Client.GetAsync("/api/test2?id=" + id);
Assert.AreEqual(HttpStatusCode.OK, response.StatusCode);
});
tasks.Add(t);
}
await Task.WhenAll(tasks);
Then in my controller I am putting in a Thread.Sleep.
But when I do this, the total time for all tests to complete is 10 x the sleep time.
I expected all the calls to be made and to have ended up at the Thread.Sleep call at more or less the same time.
But it seems the API calls are actually made one after the other.
The reason I am testing the parallel API call is because I want to test a deadlock issue with my data repository when using SQLite which has only happened when more than 1 user uses my website at the same time.
And I have never been able to simulate this and I thought I'd create a unit test, but the code I have now seems to not be executing the calls in parallel.
My plan with the Thread.Sleep calls was to put a couple in the Controller method to make sure all requests end up between certain code blocks at the same time.
Do I need to set a a max number of parallel requests on the Web Server or something or am I doing something obviously wrong?
Thanks in advance.
Update 1:
I forgot to mention I get the same results with await Task.Delay(1000); and many similar alternatives.
Not sure if it's clear but this is all running within a unit test using NUnit.
And the "Web Server" and Client is created like this:
var builder = new WebHostBuilder().UseStartup<TStartup>();
Server = new TestServer(builder);
Client = Server.CreateClient();
You can use Task.Delay(time in milliseconds). Thread.Sleep will not release a thread and it can't process other tasks while waiting to result.
The HttpClient class in .NET has a limit of two concurrent requests to the same server by default, which I believe might be causing the issue in this case. Usually, this limit can be overridden by creating a new HttpClientHandler and using it as an argument in the constructor:
new HttpClient(new HttpClientHandler
{
MaxConnectionsPerServer = 100
})
But because the clients are created using the TestServer method, that gets a little more complicated. You could try changing the ServicePointManager.DefaultConnectionLimit property like below, but I'm not sure if that will work with the TestServer:
System.Net.ServicePointManager.DefaultConnectionLimit = 100;
That being said, I believe using Unit Tests for doing load testing is not a good approach and recommend looking into tools specific for load testing.
Reference for ServicePointManagerClass
This blog post also has more in-depth information about the subject
I found the problem with my test.
It was not the TestServer or the client code, it was the Database code.
In my controller I was starting an NHibernate Transaction, and that was blocking the requests because it would put a lock on the table being updated.
This is correct, so I had to change my code a bit to not automatically start a transaction. But rather leave that up to the calling code to manage.
I am working on a MVC 5 based report generating web application (Excel files). On one "GenerateReports" page, on button click, I am calling StartMonthly function. This takes control to a void method "GenerateReportMainMonthly" in the controller. This method calls another void method "GenerateReportMain". In GenerateReportMain, there are 5 other void functions that are being called.
I do not want the control to get stuck at back-end until the report generation is completed. On button click, an alert box should show "Report Generation started." and the control should come back to the "GenerateReports" page.
I have tried ajax but have not been able to get the control back to UI. How can I get the control back to the same page without waiting for the back-end process to complete?
$('#btnStart').on('click', StartMonthly);
function StartMonthly() {
var url = '/GenerateReport/GenerateReportMainMonthly';
window.location.href = url;
}
public void GenerateReportMainMonthly()
{
_isDaily = false;
GenerateReportMain();
}
It seems you are after running background tasks in your controllers. This is generally a bad idea (see this SO answer) as you might find that your server process has been killed mid-way and your client will have to handle it somehow.
If you absolutely must run long-ish processes in your controller and cannot extract it into a background worker of some sort, you can opt for something like this SO answer suggests. Implementation will vary depending on your setup and how fancy you are willing/able to go, but the basics will ramain the same:
you make an instantaneous call to initiate your long action and return back a job id to refer back to
your backend will process the task and update the status accordingly
your client will periodically check for status and do your desired behaviour when the job is reported complete.
If I were to tackle this problem I'd try to avoid periodic polling and rather opt for SignalR updates as describled in this blog post (this is not mine, I just googled it up for example).
I am currently working on a project to build an integration between an existing ASP.Net MVC website and a file hosting service my company is using. The typical use case is:
A user requests one or more files
The controller makes one call per file to the file host API
The file host returns the file data to the controller
The controller returns a file result
The hosting service can handle concurrent calls, and I've found that executing each API call within a task (see example below) leads to fairly drastic improvements.
private void RetrieveDocuments(DocumentIdentifier[] identifiers, List<FileHostResult> results)
{
var tasks = identifiers.Select(x => RetrieveDocument(results, x)).ToArray();
Task.WaitAll(tasks);
}
private Task RetrieveDocument(List<FileHostResult> results, DocumentIdentifier x)
{
return Task.Run(() =>
{
var result = GetFileHostResultFromFileHost(x.ExternalIdentifier);
lock (results)
{
results.Add(result);
}
});
}
My question is whether or not there is a better way of doing this, or if there are any potential pitfalls I might run into? (eg. locking server resources, etc).
EDIT 1: I didn't post the code for GetFileHostResultFromFileHost because I don't really have any access to change it. Its basically a method call implemented in a library I cant change.
EDIT 2: To clarify. My main concern is to avoid harming the current user experience on the site. To that end I want to make sure that running tasks concurrently out of an ASP.net mvc isn't going to lock up the site.
You should use Microsoft's Reactive Framework for this. It is ideally suited to this kind of processing.
Here's the code:
IObservable<FileHostResult> query =
from i in identifiers.ToObservable()
from r in Observable.Start(() => GetFileHostResultFromFileHost(i.ExternalIdentifier))
select r;
IList<FileHostResult> results = query.ToList().Wait();
That's it. It properly schedules the code on the optimum number of threads.
If you want awaitable code then you can do this:
IObservable<FileHostResult> query =
from i in identifiers.ToObservable()
from r in Observable.Start(() => GetFileHostResultFromFileHost(i.ExternalIdentifier))
select r;
IList<FileHostResult> results = await query.ToList();
It's really very simple and easy to code.
NuGet "System.Reactive" and then add using System.Reactive.Linq; to your code.
It is hard to give great advice without seeing the rest of the source code. But based on what I can see I'd suggest an approach like:
private void RetrieveDocuments(DocumentIdentifier[] identifiers, List<FileHostResult> results)
{
results.AddRange(identifiers.AsParallel().Select(x => RetrieveDocument(x)));
}
private FileHostResult RetrieveDocument(DocumentIdentifier x)
{
var result = GetFileHostResultFromFileHost(x.ExternalIdentifier);
return result;
}
The advantages of this approach:
No explicit use of Task.Run - let AsParallel take care of that for you.
No need for locking the results list - let AsParallel and Select take care of that for you
You may also wish to increase the maximum number of connections you have access to.
Being honest though, I think you should look at approaches that don't require new Tasks at all - likely by using Async http download calls which you can run in parallel without the overhead of a thread.
I have the goal of uploading a Products CSV of ~3000 records to my e-commerce site. I want to utilise the REST API that my e-comm platform provides so I have something I can re-use and build upon for future sites that I may create.
My main issue that I am having trouble working through is:
- System.Threading.ThreadAbortException
Which I can only attribute to how long it takes to process through all 3K records via a POST request. My code:
public ActionResult WriteProductsFromFile()
{
string fileNameIN = "19107.txt";
string fileNameOUT = "19107_output.txt";
string jsonUrl = $"/api/products";
List<string> ls = new List<string>();
var engine = new FileHelperAsyncEngine<Prod1>();
using (engine.BeginReadFile(fileNameIN))
{
foreach (Prod1 prod in engine)
{
outputProduct output = new outputProduct();
if (!string.IsNullOrEmpty(prod.name))
{
output.product.name = prod.name;
string productJson = JsonConvert.SerializeObject(output);
ls.Add(productJson);
}
}
}
foreach (String s in ls)
nopApiClient.Post(jsonUrl, s);
return RedirectToAction("GetProducts");
}
}
Since I'm new to web-coding, am I going about this the wrong way? Is there a preferred way to bulk-upload that I haven't come across?
I've attempted to use the TaskCreationOptions.LongRunning flag, which helps the cause slightly but doesn't get me anywhere near my goal.
Web and api controller actions are not meant to do long running tasks - besides locking up the UI/thread, you will be introducing a series of opportunities for failure that you will have little recourse in recovering from.
But it's not all bad you have a lot of options here, there is a lot of literature on async/cloud architecture - which explains how to deal with files and these sorts of scenarios.
What you want to do is disconnect the processing of your file from the API request (in your application not the 3rd party)
It will take a little more work but will ultimately create a more reliable application.
Step 1:
Drop the file immediately to disk - I see you have the file on DISK already not sure how it gets there but either way it will work out the same.
Step 2:
Use a process running as
- a console app (easiest)
- a service (requires some sort of install/uninstall of the service)
- or even a thread in your web app (but you will struggle to know when it fails)
Which ever way you choose, the process will watch a directory for file changes, when there is a change it will kick off your method to happily process the file as you like.
Check out the FileSystemWatchers here is a basic example: https://www.dotnetperls.com/filesystemwatcher
Additionally:
If you are interested in running a thread in your Api/Web app, take a look at https://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx for some options.
You don't have to use a FileSystemWatcher of course, you could trigger via a flag in a DB - that is being checked periodically, or a system event.
Long post.. sorry
I've been reading up on this and tried back and forth with different solutions for a couple of days now but I can't find the most obvious choice for my predicament.
About my situation; I am presenting to the user a page that will contain a couple of different repeaters showing some info based on the result from a couple of webservice calls. I'd like to have the data brought in with an updatepanel (that would be querying the result table once per every two or three seconds until it found results) so I'd actually like to render the page and then when the data is "ready" it gets shown.
The page asks a controller for the info to render and the controller checks in a result table to see if there's anything to be found. If the specific data is not found it calls a method GetData() in WebServiceName.cs. GetData does not return anything but is supposed to start an async operation that gets the data from the webservice. The controller returns null and UpdatePanel waits for the next query.
When that operation is complete it'll store the data in it's relevant place in the db where the controller will find it the next time the page asks for it.
The solution I have in place now is to fire up another thread. I will host the page on a shared webserver and I don't know if this will cause any problems..
So the current code which resides on page.aspx:
Thread t = new Thread(new ThreadStart(CreateService));
t.Start();
}
void CreateService()
{
ServiceName serviceName = new ServiceName(user, "12345", "MOVING", "Apartment", "5100", "0", "72", "Bill", "rate_total", "1", "103", "serviceHost", "password");
}
At first I thought the solution was to use Begin[Method] and End[Method] but these don't seem to have been generated. I thought this seemed like a good solution so I was a little frustrated when they didn't show up.. is there a chance I might have missed a checkbox or something when adding the web references?
I do not want to use the [Method]Async since this stops the page from rendering until [Method]AsyncCompleted gets called from what I've understood.
The call I'm going to do is not CPU-intensive, I'm just waiting on a webService sitting on a slow server, so what I understood from this article: http://msdn.microsoft.com/en-us/magazine/cc164128.aspx making the threadpool bigger is not a choice as this will actually impair the performance instead (since I can't throw in a mountain of hardware).
What do you think is the best solution for my current situation? I don't really like the current one (only by gut feeling but anyway)
Thanks for reading this awfully long post..
Interesting. Until your question, I wasn't aware that VS changed from using Begin/End to Async/Completed when adding web references. I assumed that they would also include Begin/End, but apparently they did not.
You state "GetData does not return anything but is supposed to start an async operation that gets the data from the webservice," so I'm assuming that GetData actually blocks until the "async operation" completes. Otherwise, you could just call it synchronously.
Anyway, there are easy ways to get this working (asynchronous delegates, etc), but they consume a thread for each async operation, which doesn't scale.
You are correct that Async/Completed will block an asynchronous page. (side note: I believe that they will not block a synchronous page - but I've never tried that - so if you're using a non-async page, then you could try that). The method by which they "block" the asynchronous page is wrapped up in SynchronizationContext; in particular, each asynchronous page has a pending operation count which is incremented by Async and decremented after Completed.
You should be able to fake out this count (note: I haven't tried this either ;) ). Just substitute the default SynchronizationContext, which ignores the count:
var oldSyncContext = SynchronizationContext.Current;
try
{
SynchronizationContext.SetSynchronizationContext(new SynchronizationContext());
var serviceName = new ServiceName(..);
// Note: MyMethodCompleted will be invoked in a ThreadPool thread
// but WITHOUT an associated ASP.NET page, so some global state
// might be missing. Be careful with what code goes in there...
serviceName.MethodCompleted += MyMethodCompleted;
serviceName.MethodAsync(..);
}
finally
{
SynchronizationContext.SetSynchronizationContext(oldSyncContext);
}
I wrote a class that handles the temporary replacement of SynchronizationContext.Current as part of the Nito.Async library. Using that class simplifies the code to:
using (new ScopedSynchronizationContext(new SynchronizationContext()))
{
var serviceName = new ServiceName(..);
// Note: MyMethodCompleted will be invoked in a ThreadPool thread
// but WITHOUT an associated ASP.NET page, so some global state
// might be missing. Be careful with what code goes in there...
serviceName.MethodCompleted += MyMethodCompleted;
serviceName.MethodAsync(..);
}
This solution does not consume a thread that just waits for the operation to complete. It just registers a callback and keeps the connection open until the response arrives.
You can do this:
var action = new Action(CreateService);
action.BeginInvoke(action.EndInvoke, action);
or use ThreadPool.QueueUserWorkItem.
If using a Thread, make sure to set IsBackground=true.
There's a great post about fire and forget threads at http://consultingblogs.emc.com/jonathangeorge/archive/2009/09/10/make-methods-fire-and-forget-with-postsharp.aspx
try using below settings
[WebMethod]
[SoapDocumentMethod(OneWay = true)]
void MyAsyncMethod(parameters)
{
}
in your web service
but be careful if you use impersonation, we had problems on our side.
I'd encourage a different approach - one that doesn't use update panels. Update panels require an entire page to be loaded, and transferred over the wire - you only want the contents for a single control.
Consider doing a slightly more customized & optimized approach, using the MVC platform. Your data flow could look like:
Have the original request to your web page spawn a thread that goes out and warms your data.
Have a "skeleton" page returned to your client
In said page, have a javascript thread that calls your server asking for the data.
Using MVC, have a controller action that returns a partial view, which is limited to just the control you're interested in.
This will reduce your server load (can have a backoff algorithm), reduce the amount of info sent over the wire, and still give a great experience to the client.