I am currently working on a project to build an integration between an existing ASP.Net MVC website and a file hosting service my company is using. The typical use case is:
A user requests one or more files
The controller makes one call per file to the file host API
The file host returns the file data to the controller
The controller returns a file result
The hosting service can handle concurrent calls, and I've found that executing each API call within a task (see example below) leads to fairly drastic improvements.
private void RetrieveDocuments(DocumentIdentifier[] identifiers, List<FileHostResult> results)
{
    var tasks = identifiers.Select(x => RetrieveDocument(results, x)).ToArray();
    Task.WaitAll(tasks);
}

private Task RetrieveDocument(List<FileHostResult> results, DocumentIdentifier x)
{
    return Task.Run(() =>
    {
        var result = GetFileHostResultFromFileHost(x.ExternalIdentifier);
        lock (results)
        {
            results.Add(result);
        }
    });
}
My question is whether there is a better way of doing this, or whether there are any potential pitfalls I might run into (e.g. locking server resources, etc.).
EDIT 1: I didn't post the code for GetFileHostResultFromFileHost because I don't really have any access to change it. It's basically a method call implemented in a library I can't change.
EDIT 2: To clarify, my main concern is to avoid harming the current user experience on the site. To that end I want to make sure that running tasks concurrently from an ASP.NET MVC controller isn't going to lock up the site.
You should use Microsoft's Reactive Framework for this. It is ideally suited to this kind of processing.
Here's the code:
IObservable<FileHostResult> query =
    from i in identifiers.ToObservable()
    from r in Observable.Start(() => GetFileHostResultFromFileHost(i.ExternalIdentifier))
    select r;

IList<FileHostResult> results = query.ToList().Wait();
That's it. It properly schedules the code on the optimum number of threads.
If you want awaitable code then you can do this:
IObservable<FileHostResult> query =
    from i in identifiers.ToObservable()
    from r in Observable.Start(() => GetFileHostResultFromFileHost(i.ExternalIdentifier))
    select r;

IList<FileHostResult> results = await query.ToList();
It's really very simple and easy to code.
NuGet "System.Reactive" and then add using System.Reactive.Linq; to your code.
It is hard to give great advice without seeing the rest of the source code. But based on what I can see I'd suggest an approach like:
private void RetrieveDocuments(DocumentIdentifier[] identifiers, List<FileHostResult> results)
{
    results.AddRange(identifiers.AsParallel().Select(x => RetrieveDocument(x)));
}

private FileHostResult RetrieveDocument(DocumentIdentifier x)
{
    var result = GetFileHostResultFromFileHost(x.ExternalIdentifier);
    return result;
}
The advantages of this approach:
No explicit use of Task.Run - let AsParallel take care of that for you.
No need for locking the results list - let AsParallel and Select take care of that for you
You may also wish to increase the maximum number of connections you have access to.
Being honest though, I think you should look at approaches that don't require new tasks at all, most likely by using async HTTP download calls, which you can run in parallel without the overhead of a thread per request.
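For illustration only, a rough sketch of what an async version could look like, assuming the file host could be reached over HTTP directly. The HttpClient instance, the URL format and the FileHostResult construction are hypothetical here, since the real call is hidden inside the library you can't change:

// Sketch only: assumes the file host exposes an HTTP endpoint per document.
private static readonly HttpClient FileHostClient = new HttpClient();

private async Task<List<FileHostResult>> RetrieveDocumentsAsync(DocumentIdentifier[] identifiers)
{
    // Each download is awaited rather than run on its own thread-pool thread.
    var tasks = identifiers.Select(x => RetrieveDocumentAsync(x));
    var results = await Task.WhenAll(tasks);
    return results.ToList();
}

private async Task<FileHostResult> RetrieveDocumentAsync(DocumentIdentifier x)
{
    // Placeholder URL; the real endpoint is whatever the hosting service exposes.
    var data = await FileHostClient.GetByteArrayAsync(
        "https://filehost.example/documents/" + x.ExternalIdentifier);
    return new FileHostResult { Content = data }; // hypothetical property
}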
Related
I am trying to create a unit test to simulate my API being called by many people at the same time.
I've got this code in my unit test:
var tasks = new List<Task>();
for (int i = 0; i < 10; i++)
{
    var id = i; // must assign to new variable inside for loop
    var t = Task.Run(async () =>
    {
        response = await Client.GetAsync("/api/test2?id=" + id);
        Assert.AreEqual(HttpStatusCode.OK, response.StatusCode);
    });
    tasks.Add(t);
}
await Task.WhenAll(tasks);
Then in my controller I am putting in a Thread.Sleep.
But when I do this, the total time for all tests to complete is 10 x the sleep time.
I expected all the calls to be made and to have ended up at the Thread.Sleep call at more or less the same time.
But it seems the API calls are actually made one after the other.
The reason I am testing the parallel API call is because I want to test a deadlock issue with my data repository when using SQLite which has only happened when more than 1 user uses my website at the same time.
And I have never been able to simulate this and I thought I'd create a unit test, but the code I have now seems to not be executing the calls in parallel.
My plan with the Thread.Sleep calls was to put a couple in the Controller method to make sure all requests end up between certain code blocks at the same time.
Do I need to set a max number of parallel requests on the web server or something, or am I doing something obviously wrong?
Thanks in advance.
Update 1:
I forgot to mention I get the same results with await Task.Delay(1000); and many similar alternatives.
Not sure if it's clear but this is all running within a unit test using NUnit.
And the "Web Server" and Client is created like this:
var builder = new WebHostBuilder().UseStartup<TStartup>();
Server = new TestServer(builder);
Client = Server.CreateClient();
You can use Task.Delay(time in milliseconds) instead. Thread.Sleep will not release the thread, so it can't process other tasks while it waits for the result.
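For example, in the controller action (a sketch; it assumes the action can be made async and corresponds to the /api/test2 route used in the test):

public async Task<IActionResult> Test2(int id)
{
    // Task.Delay yields the request thread back to the pool while waiting,
    // unlike Thread.Sleep, so other requests can be served during the pause.
    await Task.Delay(1000);
    return Ok(id);
}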
The HttpClient class in .NET has a limit of two concurrent requests to the same server by default, which I believe might be causing the issue in this case. Usually, this limit can be overridden by creating a new HttpClientHandler and using it as an argument in the constructor:
new HttpClient(new HttpClientHandler
{
    MaxConnectionsPerServer = 100
})
But because the clients are created using the TestServer method, that gets a little more complicated. You could try changing the ServicePointManager.DefaultConnectionLimit property like below, but I'm not sure if that will work with the TestServer:
System.Net.ServicePointManager.DefaultConnectionLimit = 100;
That being said, I believe using Unit Tests for doing load testing is not a good approach and recommend looking into tools specific for load testing.
Reference for the ServicePointManager class
This blog post also has more in-depth information about the subject.
I found the problem with my test.
It was not the TestServer or the client code, it was the Database code.
In my controller I was starting an NHibernate Transaction, and that was blocking the requests because it would put a lock on the table being updated.
That behaviour is correct, so I had to change my code a bit to not automatically start a transaction, but rather leave that up to the calling code to manage.
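For example, the transaction can be owned by the calling code instead of being started automatically inside the repository (a sketch; sessionFactory and entity are placeholder names for whatever your data layer actually exposes):

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    // The calling code decides where the transaction starts and ends;
    // the repository code just uses the session it is given.
    session.SaveOrUpdate(entity);
    tx.Commit();
}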
I have the goal of uploading a Products CSV of ~3000 records to my e-commerce site. I want to utilise the REST API that my e-comm platform provides so I have something I can re-use and build upon for future sites that I may create.
My main issue that I am having trouble working through is:
- System.Threading.ThreadAbortException
Which I can only attribute to how long it takes to process through all 3K records via a POST request. My code:
public ActionResult WriteProductsFromFile()
{
    string fileNameIN = "19107.txt";
    string fileNameOUT = "19107_output.txt";
    string jsonUrl = $"/api/products";
    List<string> ls = new List<string>();

    var engine = new FileHelperAsyncEngine<Prod1>();
    using (engine.BeginReadFile(fileNameIN))
    {
        foreach (Prod1 prod in engine)
        {
            outputProduct output = new outputProduct();
            if (!string.IsNullOrEmpty(prod.name))
            {
                output.product.name = prod.name;
                string productJson = JsonConvert.SerializeObject(output);
                ls.Add(productJson);
            }
        }
    }

    foreach (String s in ls)
        nopApiClient.Post(jsonUrl, s);

    return RedirectToAction("GetProducts");
}
Since I'm new to web-coding, am I going about this the wrong way? Is there a preferred way to bulk-upload that I haven't come across?
I've attempted to use the TaskCreationOptions.LongRunning flag, which helps the cause slightly but doesn't get me anywhere near my goal.
Web and API controller actions are not meant to do long-running work - besides tying up the request thread, you will be introducing a series of failure opportunities that you will have little ability to recover from.
But it's not all bad: you have a lot of options here, and there is a lot of literature on async/cloud architecture that explains how to deal with files and these sorts of scenarios.
What you want to do is disconnect the processing of your file from the API request (in your application, not the 3rd-party one).
It will take a little more work but will ultimately create a more reliable application.
Step 1:
Drop the file to disk immediately. I see you have the file on disk already; not sure how it gets there, but either way it will work out the same.
Step 2:
Use a process running as
- a console app (easiest)
- a service (requires some sort of install/uninstall of the service)
- or even a thread in your web app (but you will struggle to know when it fails)
Whichever way you choose, the process will watch a directory for file changes; when there is a change it will kick off your method to process the file however you like.
Check out FileSystemWatcher; here is a basic example: https://www.dotnetperls.com/filesystemwatcher
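A minimal sketch of that watcher process as a console app (the drop-folder path, the file filter and the ProcessFile call are placeholders for your own import logic):

using System;
using System.IO;

class ProductFileWatcher
{
    static void Main()
    {
        // Watch the drop folder for newly created product files.
        var watcher = new FileSystemWatcher(@"C:\ProductDrops", "*.txt");
        watcher.Created += (sender, e) =>
        {
            Console.WriteLine("Processing " + e.FullPath);
            // ProcessFile(e.FullPath); // call the existing import logic here
        };
        watcher.EnableRaisingEvents = true;

        Console.WriteLine("Watching for files. Press Enter to exit.");
        Console.ReadLine();
    }
}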
Additionally:
If you are interested in running a thread in your Api/Web app, take a look at https://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx for some options.
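One of the options covered there is HostingEnvironment.QueueBackgroundWorkItem (.NET 4.5.2+), which at least tells ASP.NET about the background work so it isn't torn down with the request. A sketch, where filePath and ProcessFile are placeholders for your own code:

using System.Web.Hosting;

// Queue the import so the controller action can return immediately.
HostingEnvironment.QueueBackgroundWorkItem(cancellationToken =>
{
    // ProcessFile should honour the token in case the app domain shuts down.
    ProcessFile(filePath, cancellationToken);
});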
You don't have to use a FileSystemWatcher, of course; you could trigger via a flag in a DB that is checked periodically, or via a system event.
I have to make a C# application which uses the REST API to fetch JIRA issues. After I run the tool I get the correct output, but it takes a lot of time to display it. Below is the part of the code that takes the most time:
var client = new WebClient();
foreach (dynamic i in jira_keys)
{
    issue_id = i.key;
    string rest_api_url = "some valid url" + issue_id;
    var jira_response = client.DownloadString(rest_api_url);
    // rest of the processing
}
jira_keys is a JArray. After this there is the processing of the JSON inside the foreach loop, and it takes a lot of time as the number of jira_keys increases. I cannot apply multi-threading to this since there are shared-variable issues, so please suggest some way to optimise it.
If the issues are tied to a specific project or some other grouping, you can instead search for issues with a JQL string. This way you get them in bulk and paginated.
https://docs.atlassian.com/jira/REST/cloud/#api/2/search-search
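For example, something along these lines pulls the issues in pages with a JQL query instead of making one request per key. The base URL and the JQL string are placeholders; the search resource accepts jql, startAt and maxResults parameters and returns a total count you can page against:

var client = new WebClient();
var allIssues = new JArray();
int startAt = 0;
const int pageSize = 50;

while (true)
{
    // One request returns up to pageSize issues instead of one request per issue.
    string url = "https://yourjira.example.com/rest/api/2/search"
               + "?jql=" + Uri.EscapeDataString("project = MYPROJ")
               + "&startAt=" + startAt
               + "&maxResults=" + pageSize;
    var page = JObject.Parse(client.DownloadString(url));

    foreach (var issue in (JArray)page["issues"])
        allIssues.Add(issue);

    if (startAt + pageSize >= (int)page["total"])
        break;
    startAt += pageSize;
}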
Also, like cubrr said in his comment, async calls should work fine if you want to make api calls with multiple threads. Awaiting the call will block until the shared resources are ready.
(Would have posted as a comment if I had enough rep)
Here is an example of how you can fetch the responses from JIRA asynchronously.
var taskList = new List<Task<string>>();
foreach (dynamic i in jira_keys)
{
    issue_id = i.key;
    string rest_api_url = "some valid url" + issue_id;
    var jiraDownloadTask = Task.Factory.StartNew(() =>
    {
        // Use one WebClient per task: a single WebClient instance
        // does not support concurrent DownloadString calls.
        using (var taskClient = new WebClient())
        {
            return taskClient.DownloadString(rest_api_url);
        }
    });
    taskList.Add(jiraDownloadTask);
}
Task.WaitAll(taskList.ToArray());

// Access the results.
foreach (var task in taskList)
{
    Console.WriteLine(task.Result);
}
Dilemma, dilemma...
I've been working up a solution to a problem that uses async calls to the HttpClient library (GetAsync => ConfigureAwait(false), etc.). In a console app, my DLL is very responsive, and the mixture of the async/await calls and Parallel.ForEach really makes me glow.
Now for the issue. After moving from this test harness to the target app, things have become problematic. I'm using ASP.NET MVC 4 and have hit a few issues. The main issue really is that calling my process on a controller action actually blocks the main thread until the async actions are complete. I've tried using an async controller pattern, I've tried using Task.Factory, I've tried using new Threads. You name it, I've tried all the flavours - and then some!
Now, I appreciate that the nature of http is not designed to facilitate long processes like this and there are a number of articles here on SO that say don't do it. However, there are mitigating reasons why i NEED to use this approach. The main reason that I need to run this in mvc is due to the fact that I actually update the live data cache (on the mvc app) in realtime via raising an event in my dll's code. This means that fragments of the 50-60 data feeds can be pushed out live before the entire async action is complete. Therefore, client apps can receive partial updates within seconds of the async action being instigated. If I were to delegate the process out to a console app that ran the entire process in the background, I'd no longer be able to harness those fragment partial updates and this is the raison d'etre behind the entire choice of this architecture.
Can anyone shed light on a solution that would allow me to mitigate the blocking of the thread, whilst at the same time allowing each async fragment to be consumed by my object model and fed out to the client apps (I'm using SignalR to make these client updates)? A kind of nirvana would be a scenario where an out-of-process cache object could be shared between numerous processes - the cache update could then be triggered and consumed by my MVC process (aka - http://devproconnections.com/aspnet-mvc/out-process-caching-aspnet). And so back to reality...
I have also considered using a secondary webservice to achieve this, but would welcome other options before once again over engineering my solution (there are already many moving parts and a multitude of async Actions going on).
Sorry not to have added any code, I'm hoping for practical philosophy/insights, rather than code help on this, tho would of course welcome coded examples that illustrate a solution to my problem.
I'll update the question as we move in time, as my thinking process is still maturing on this.
[edit] - for the sake of clarity, the snippet below is my Brothers Grimm code collision (extracted from a larger body of work):
Parallel.ForEach(scrapeDataBases, new ParallelOptions()
{
    MaxDegreeOfParallelism = Environment.ProcessorCount * 15
},
async dataBase =>
{
    await dataBase.ScrapeUrlAsync().ConfigureAwait(false);
    await UpdateData(dataType, (DataCheckerScrape)dataBase);
});
async and Parallel.ForEach do not mix naturally, so I'm not sure what your console solution looks like. Furthermore, Parallel should almost never be used on ASP.NET at all.
It sounds like what you would want is to just use Task.WhenAll.
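A rough sketch of that shape, reusing the names from the question (it assumes the surrounding method is async and that scrapeDataBases is an enumerable of the scraper objects):

// One async operation per database; no Parallel.ForEach and no extra threads.
var scrapeTasks = scrapeDataBases.Select(async dataBase =>
{
    await dataBase.ScrapeUrlAsync().ConfigureAwait(false);
    await UpdateData(dataType, (DataCheckerScrape)dataBase);
});

await Task.WhenAll(scrapeTasks);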
On a side note, I think your reasoning around background processing on ASP.NET is incorrect. It is perfectly possible to have a separate process that updates the clients via SignalR.
Since your question is pretty high level without a lot of code, you could try Reactive Extensions.
Something like:
private IEnumerable<Task<Scraper>> ScrappedUrls()
{
    // Return the 50 to 60 tasks, one for each website, here.
    // I assume they all return the same type.
    // return .ScrapeUrlAsync().ConfigureAwait(false);
    throw new NotImplementedException();
}

public async Task<IEnumerable<ScrapeOdds>> GetOdds()
{
    var results = new Collection<ScrapeOdds>();
    var urlRequest = ScrappedUrls();
    var observerableUrls = urlRequest.Select(u => u.ToObservable()).Merge();
    var publisher = observerableUrls.Publish();
    var hubContext = GlobalHost.ConnectionManager.GetHubContext<OddsHub>();

    publisher.Subscribe(scraper =>
    {
        // Whatever you do to convert to the result set
        var scrapedOdds = scraper.GetOdds();
        results.Add(scrapedOdds);

        // Update anything else you want when it arrives.
        // Update SignalR here
        hubContext.Clients.All.UpdatedOdds(scrapedOdds);
    });

    // Will fire off subscriptions and not continue until they are done.
    await publisher;

    return results;
}
The Merge operator will process the results as they come in, so you can update the SignalR hubs, plus whatever else you need, as each one arrives. The controller action still has to wait for them all to come in; that's why there is an await on the publisher.
I don't really know whether HttpClient will be happy with 50 to 60 web calls all at once. If it isn't, you can take the IEnumerable to an array and break it down into smaller chunks. There should also be some error checking in there. With Rx you can also tell it to SubscribeOn and ObserveOn different threads, but with everything being pretty much async that shouldn't be necessary.
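If the simultaneous calls do turn out to be too many, Rx can also cap the concurrency directly instead of chunking the array by hand, for example (the value 8 is arbitrary):

// Run at most 8 of the scrape observables at any one time.
var observerableUrls = urlRequest
    .Select(u => u.ToObservable())
    .Merge(8);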
I am using a third party API to query data asynchronously. Here is an example of my code:
private void AsyncDataLoad()
{
    Task[] tasks = new Task[6]
    {
        Task.Factory.StartNew(() => FetchSomeStuff1()),
        Task.Factory.StartNew(() => FetchSomeStuff2()),
        Task.Factory.StartNew(() => FetchSomeStuff3()),
        Task.Factory.StartNew(() => FetchSomeStuff4()),
        Task.Factory.StartNew(() => FetchSomeStuff5()),
        Task.Factory.StartNew(() => FetchSomeStuff6())
    };

    Task.WaitAll(tasks);
}
How can I view how many requests I have open at one time? By default I think there is a limitation to the number of concurrent requests I can have open on one domain and I would like to change this. But, I want to be able to prove this is the fact before trying to make any changes.
My development box is on Windows 7 Enterprise, if that helps for any tool suggestions. I tried using PerfMon, but it didn't seem like any of the counters available in that tool were picking up the HTTP requests made by the server.
IME, Fiddler is the best tool for viewing HTTP call behavior. Its timeline lets you view concurrency nicely.
The setting for changing the limit is in connectionManagement:
http://msdn.microsoft.com/en-us/library/fb6y0fyc.aspx
If you just need to manually watch it yourself (not inspect it programmatically), then you could use TCPView from Sysinternals.