How to optimize C# code requesting JSON using a REST API - C#

I have to make a C# application which uses a REST API to fetch JIRA issues. After I run the tool I get the correct output, but it takes a lot of time to display it. Below is the part of the code which takes the most time:
var client = new WebClient();
foreach (dynamic i in jira_keys)
{
    issue_id = i.key;
    string rest_api_url = "some valid url" + issue_id;
    var jira_response = client.DownloadString(rest_api_url);
    // rest of the processing
}
jira_keys is a JArray. After this there is the processing part of the JSON inside the foreach loop. This takes a lot of time as the number of jira_keys increases. I cannot apply multi-threading to this since there are shared-variable issues. So please can someone suggest some way to optimise this.

If the issues are tied to a specific project or some other grouping, you can instead search for issues with a JQL string. This way you get them in bulk and paginated.
https://docs.atlassian.com/jira/REST/cloud/#api/2/search-search
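For illustration, here is a minimal sketch of such a paginated bulk fetch; the base URL, JQL string, and page size are placeholders you would adapt to your instance:

// Sketch only: fetch issues in pages with JQL instead of one request per key.
// "https://yourinstance.atlassian.net" and "project = MYPROJ" are assumptions.
using (var client = new WebClient())
{
    var allIssues = new JArray();
    int startAt = 0;
    const int pageSize = 50;
    while (true)
    {
        string url = "https://yourinstance.atlassian.net/rest/api/2/search"
                   + "?jql=" + Uri.EscapeDataString("project = MYPROJ")
                   + "&startAt=" + startAt + "&maxResults=" + pageSize;
        var page = JObject.Parse(client.DownloadString(url));
        foreach (var issue in (JArray)page["issues"])
            allIssues.Add(issue);
        startAt += pageSize;
        if (startAt >= (int)page["total"])
            break;
    }
    // allIssues now holds every matching issue, fetched in batches of 50
}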
Also, like cubrr said in his comment, async calls should work fine if you want to make the API calls from multiple threads - awaiting a call won't touch your shared variables until its result is ready.
(Would have posted as a comment if I had enough rep)

Here is an example of how you can fetch the responses from JIRA asynchronously.
var taskList = new List<Task<string>>();
foreach (dynamic i in jira_keys)
{
    string issue_id = i.key;
    string rest_api_url = "some valid url" + issue_id;
    // WebClient does not support concurrent operations, so create one per task
    var jiraDownloadTask = Task.Run(() =>
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(rest_api_url);
        }
    });
    taskList.Add(jiraDownloadTask);
}
Task.WaitAll(taskList.ToArray());

// access the results
foreach (var task in taskList)
{
    Console.WriteLine(task.Result);
}
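If you would rather avoid tying up a thread-pool thread per download, here is a sketch of the same loop using WebClient's built-in async support (DownloadStringTaskAsync, available since .NET 4.5); the URL is the same placeholder as above:

private static async Task<string> DownloadIssueAsync(string url)
{
    using (var client = new WebClient())
    {
        return await client.DownloadStringTaskAsync(url);
    }
}

// usage: kick off all downloads, then await them together
var tasks = new List<Task<string>>();
foreach (dynamic i in jira_keys)
{
    tasks.Add(DownloadIssueAsync("some valid url" + (string)i.key));
}
string[] responses = await Task.WhenAll(tasks);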

Related

Concurrent calls to external services

I am currently working on a project to build an integration between an existing ASP.NET MVC website and a file hosting service my company is using. The typical use case is:
A user requests one or more files
The controller makes one call per file to the file host API
The file host returns the file data to the controller
The controller returns a file result
The hosting service can handle concurrent calls, and I've found that executing each API call within a task (see example below) leads to fairly drastic improvements.
private void RetrieveDocuments(DocumentIdentifier[] identifiers, List<FileHostResult> results)
{
    var tasks = identifiers.Select(x => RetrieveDocument(results, x)).ToArray();
    Task.WaitAll(tasks);
}

private Task RetrieveDocument(List<FileHostResult> results, DocumentIdentifier x)
{
    return Task.Run(() =>
    {
        var result = GetFileHostResultFromFileHost(x.ExternalIdentifier);
        lock (results)
        {
            results.Add(result);
        }
    });
}
My question is whether there is a better way of doing this, or whether there are any potential pitfalls I might run into (e.g. locking up server resources)?
EDIT 1: I didn't post the code for GetFileHostResultFromFileHost because I don't really have any access to change it. It's basically a method call implemented in a library I can't change.
EDIT 2: To clarify: my main concern is to avoid harming the current user experience on the site. To that end I want to make sure that running tasks concurrently out of an ASP.NET MVC site isn't going to lock it up.
You should use Microsoft's Reactive Framework for this. It is ideally suited to this kind of processing.
Here's the code:
IObservable<FileHostResult> query =
    from i in identifiers.ToObservable()
    from r in Observable.Start(() => GetFileHostResultFromFileHost(i.ExternalIdentifier))
    select r;

IList<FileHostResult> results = query.ToList().Wait();
That's it. It properly schedules the code on the optimum number of threads.
If you want awaitable code then you can do this:
IObservable<FileHostResult> query =
    from i in identifiers.ToObservable()
    from r in Observable.Start(() => GetFileHostResultFromFileHost(i.ExternalIdentifier))
    select r;

IList<FileHostResult> results = await query.ToList();
It's really very simple and easy to code.
NuGet "System.Reactive" and then add using System.Reactive.Linq; to your code.
It is hard to give great advice without seeing the rest of the source code. But based on what I can see I'd suggest an approach like:
private void RetrieveDocuments(DocumentIdentifier[] identifiers, List<FileHostResult> results)
{
    results.AddRange(identifiers.AsParallel().Select(x => RetrieveDocument(x)));
}

private FileHostResult RetrieveDocument(DocumentIdentifier x)
{
    var result = GetFileHostResultFromFileHost(x.ExternalIdentifier);
    return result;
}
The advantages of this approach:
No explicit use of Task.Run - let AsParallel take care of that for you.
No need for locking the results list - let AsParallel and Select take care of that for you.
You may also wish to increase the maximum number of connections you have access to.
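For instance, on the full .NET Framework the per-host connection limit can be raised via ServicePointManager (the value below is just an illustrative guess):

// By default the framework caps concurrent connections per host, which can
// silently serialize parallel downloads. Raise it early, e.g. in Application_Start.
ServicePointManager.DefaultConnectionLimit = 20; // assumed value - tune for your workload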
Being honest though, I think you should look at approaches that don't require new tasks at all - likely by using async HTTP download calls, which you can run in parallel without the overhead of a thread.
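A sketch of that fully async shape, assuming an async variant of the file host call exists or can be wrapped (GetFileHostResultFromFileHostAsync below is hypothetical):

private async Task<FileHostResult[]> RetrieveDocumentsAsync(DocumentIdentifier[] identifiers)
{
    // start every download, then await them all; no Task.Run and no lock needed,
    // because Task.WhenAll collects the results into an array for us
    var tasks = identifiers.Select(x => GetFileHostResultFromFileHostAsync(x.ExternalIdentifier));
    return await Task.WhenAll(tasks);
}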

C# batch processing of async web responses hangs just before finishing

Here is the scenario.
I want to call 2 versions of an API (hosted on different servers), then cast their responses (they come as JSON) to C# objects and compare them.
An important note here is that I need to query the APIs a lot of times, ~3000. The reason for this is that I query an endpoint that takes an id and returns the corresponding object from the DB. So my queries are like http://myapi/v1/endpoint/id, and I basically use a loop to go through all of the ids.
Here is the issue
I start querying the API, and for the first 90% of all requests it is blazing fast (I get the response and I process it), all within 5 seconds.
Then, however, I start to slow down. The next 50-100 requests take between 1 and 5 seconds each to process, and after that I come to a stop. No CPU usage, network activity is low (and I am pretty sure that activity is from other apps). My app just hangs.
UPDATE: Around 50% of the times I tested this, it did finally resume after quite a bit of time. But the other 50% it still just hangs.
Here is what I am doing in-code
I have a list of Ids that I iterate to query the endpoint.
This is the main piece of code that queries the APIs and processes the responses.
var endPointIds = await GetIds(); // this queries a different endpoint to get all ids; there are no issues with it

var tasks = endPointIds.Select(async id =>
{
    var response1 = await _data.GetData($"{Consts.ApiEndpoint1}/{id}");
    var response2 = await _data.GetData($"{Consts.ApiEndpoint2}/{id}");
    return ProcessResponces(response1, response2);
});

var res = await Task.WhenAll(tasks);
var result = res.Where(r => r != null).ToList();
return result; // I never get to return the result, the app hangs before this is reached
This is the GetData() method (its implementation is the GetAsync method below):
private async Task<string> GetAsync(string serviceUri)
{
    try
    {
        var request = WebRequest.CreateHttp(serviceUri);
        request.ContentType = "application/json";
        request.Method = WebRequestMethods.Http.Get;

        using (var response = await request.GetResponseAsync())
        using (var responseStream = response.GetResponseStream())
        using (var streamReader = new StreamReader(responseStream, Encoding.UTF8))
        {
            return await streamReader.ReadToEndAsync();
        }
    }
    catch
    {
        return string.Empty;
    }
}
I would include the ProcessResponces method as well; however, I tried mocking it to return a string like so:
private string ProcessResponces(string responseJson1, string responseJson2)
{
    // usually I would have 2 lines here that deserialize responseJson1 and responseJson2
    // using Newtonsoft.Json's DeserializeObject<>
    return "Fake success";
}
And even with this implementation my issue did not go away (the only difference it made is that I managed to have fast requests for like 97% of my requests, but my code still ended up stopping at the last few), so I am guessing the main issue is not related to that method. What it more or less does is deserialize both responses to C# objects, compare them and return information about their equality.
Here are my observations after 4 hours of debugging
If I manually reduce the number of queries to my API (I used the .Take() method on the list of ids) the issue still persists. For example, on 1000 total requests I start hanging around the 900th, for 1500 on the 1400th, and so on. I believe the issue goes away at around 100-200 requests, but I am not sure since it might just be too fast for me to notice.
Since this is currently a console app, I tried adding WriteLines() in some of my methods, and the issue seemed to go away (I am guessing the delay that writing to the console creates gives some time between requests, and that helps).
Lastly, I did a concurrency profiling of my app and it reported that there were a lot of contentions happening at the point where my app hangs. Opening the contention tab showed that they are mainly happening in System.IO.StreamReader.ReadToEndAsync().
Thoughts and Questions
Obviously, what can I do to resolve the issue?
Is my GetAsync() method wrong, should I be using something else instead of responseStream and streamReader?
I am not super proficient in asynchronous operations, maybe my flow of async/await operations is wrong.
Lastly, could it be something with the API controllers themselves? They are standard ASP.NET MVC 5 WebAPI controllers (version 5.2.3.0)
After long hours of tracking my requests with Fiddler, and finally mocking my DataProvider (_data) to retrieve locally from disk, it turned out that I had responses that were taking 30s+ to arrive (or even not arriving at all).
Since my .Select() is async, the quick responses were always displayed first, and then everything came to a halt while waiting for the slow ones. This gave the illusion that I was somehow loading the first X requests quickly and then stopping, when in reality I was simply shown the fastest X requests and then left waiting for the slow ones.
And to kind of answer my questions...
What can I do to resolve the issue - set a timeout that allows a maximum number of milliseconds/seconds for a request to finish (see the sketch after this list).
The GetAsync() method is alright.
Async/await operations are also correct; just keep in mind that with an async Select the responses complete in an arbitrary order (anything printed as they complete appears fastest-first, even though Task.WhenAll still returns the results in input order).
The ASP.NET Framework controllers are perfectly fine and do not contribute to the issue.
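As a hedged sketch of that timeout fix: HttpWebRequest.Timeout only applies to the synchronous GetResponse(), so for the async path it is simpler to switch to HttpClient, whose Timeout does apply (the 10-second limit below is an arbitrary assumption):

private static readonly HttpClient _client = new HttpClient
{
    Timeout = TimeSpan.FromSeconds(10) // assumed limit; a timed-out call throws TaskCanceledException
};

private async Task<string> GetAsync(string serviceUri)
{
    try
    {
        return await _client.GetStringAsync(serviceUri);
    }
    catch
    {
        return string.Empty; // same fallback as the original method
    }
}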

TPL DataFlow confusion around pipelines - should I create a new pipeline for each data call? How can I track data that's flowing through?

I'm struggling with how to apply TPL DataFlow to my application.
I've got a bunch of parallel data operations I want to track and manage. Previously I was just using Tasks, but I'm trying to implement Dataflow to give me more control.
I'm composing a pipeline of blocks that, say, get the data and process it. Here's an example of a pipeline that gets data, processes it, and logs it as complete:
TransformBlock<string, string> loadDataFromFile = new TransformBlock<string, string>(filename =>
{
    // read the data file (takes a long time!)
    Console.WriteLine("Loading from " + filename);
    Thread.Sleep(2000);
    // return our result, for now just use the filename
    return filename + "_data";
});

TransformBlock<string, string> processData = new TransformBlock<string, string>(data =>
{
    // process the data
    Console.WriteLine("Processing data " + data);
    Thread.Sleep(2000);
    // return our result, for now just use the data string
    return data + "_processed";
});

TransformBlock<string, string> logProcessComplete = new TransformBlock<string, string>(data =>
{
    // doesn't do anything to the data, just performs an 'action'
    // (but still passes the data along, unlike ActionBlock)
    Console.WriteLine("Result " + data + " complete");
    return data;
});
I'm linking them together like this:
// create a pipeline
loadDataFromFile.LinkTo(processData);
processData.LinkTo(logProcessComplete);
I've been trying to follow this tutorial.
My confusion is that in the tutorial this pipeline seems to be a 'fire once' operation. It creates the pipeline and fires it off once, and it completes. This seems counter to how the Dataflow library seems designed, I've read:
The usual way of using TPL Dataflow is to create all the blocks, link
them together, and then start putting data in one end.
From "Concurrency in C# Cookbook" by Stephen Cleary.
But I'm not sure how to track the data after I've put it 'in one end'. I need to be able to get the processed data from multiple parts of the program. Say the user presses two buttons, one to get the data from "File1" and do something with it, and one to get the data from "File2"; I'd need something like this, I think:
public async Task loadFile1ButtonPress()
{
    loadDataFromFile.Post("File1");
    var data = await logProcessComplete.ReceiveAsync();
    Console.WriteLine($"Got data1: {data}");
}

public async Task loadFile2ButtonPress()
{
    loadDataFromFile.Post("File2");
    var data = await logProcessComplete.ReceiveAsync();
    Console.WriteLine($"Got data2: {data}");
}
If these are performed 'synchronously' it works just fine, as there's only one piece of information flowing through the pipeline:
Console.WriteLine("waiting for File 1");
await loadFile1ButtonPress();
Console.WriteLine("waiting for File 2");
await loadFile2ButtonPress();
Console.WriteLine("Done");
Produces the expected output:
waiting for File 1
Loading from File1
Processing data File1_data
Result File1_data_processed complete
Got data1: File1_data_processed
waiting for File 2
Loading from File2
Processing data File2_data
Result File2_data_processed complete
Got data2: File2_data_processed
Done
This makes sense to me; it's just doing them one at a time.
However, the point is I want to run these operations in parallel and asynchronously. If I simulate this (say, the user pressing both 'buttons' in quick succession) with:
Console.WriteLine("waiting");
await Task.WhenAll(loadFile1ButtonPress(), loadFile2ButtonPress());
Console.WriteLine("Done");
Does this work if the second operation takes longer than the first? I was expecting both to return the first data; originally this didn't work, but that was a bug I've fixed - it does return the correct items now.
I was thinking I could link an ActionBlock<string> to perform the action with the data, something like:
public async Task loadFile1ButtonPress()
{
    loadDataFromFile.Post("File1");
    // instead of var data = await logProcessComplete.ReceiveAsync();
    logProcessComplete.LinkTo(new ActionBlock<string>(data =>
    {
        Console.WriteLine($"Got data1: {data}");
    }));
}
But this is changing the pipeline completely; now loadFile2ButtonPress won't work at all, as it's using that pipeline.
Can I create multiple pipelines with the same blocks? Or should I be creating a whole new pipeline (and new blocks) for each 'operation'? That seems to defeat the point of using the Dataflow library at all.
Not sure if Stack Overflow is the best place for this or somewhere like Code Review - it might be a bit subjective.
If you need some events to happen after some data has been processed, you should expose your last block with AsObservable and add some small code with Rx.Net:

var observable = logProcessComplete.AsObservable();
var subscription = observable.Subscribe(i => Console.WriteLine(i));
As said in the comments, you can link your blocks to more than one block with a predicate. Note that in that case a message will be delivered only to the first matching block. You may also create a BroadcastBlock, which delivers a copy of the message to each linked block.
Make sure that messages unwanted by every other block are linked to NullTarget, as otherwise they will stay in your pipeline forever and will stop your completion.
Check that your pipeline correctly handles completion: unlike messages, completion propagates along every link created with PropagateCompletion = true, so make sure it ends up where you expect.
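To make that concrete, here is a sketch of the two routing styles (file1Handler and file2Handler are hypothetical ActionBlocks, and the StartsWith predicates assume the file name survives in the payload):

// Route by predicate: a message goes to the FIRST link whose predicate matches.
logProcessComplete.LinkTo(file1Handler, data => data.StartsWith("File1"));
logProcessComplete.LinkTo(file2Handler, data => data.StartsWith("File2"));
// Catch-all, so unmatched messages don't sit in the pipeline forever:
logProcessComplete.LinkTo(DataflowBlock.NullTarget<string>());

// Or broadcast: every linked target gets its own copy of each message.
var broadcast = new BroadcastBlock<string>(data => data);
processData.LinkTo(broadcast, new DataflowLinkOptions { PropagateCompletion = true });
broadcast.LinkTo(file1Handler);
broadcast.LinkTo(file2Handler);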

Is it good to call Thread.Sleep while polling Google BigQuery results in ASP.NET? Alternatives?

I am using ASP.NET MVC 5, which gets data from Google BigQuery. Due to the way Google BigQuery is designed, I need to poll for results if the job is not finished. Here is my code:
var qr = new QueryRequest
{
    Query = string.Format(myQuery, param1, param2)
}; // all params are mine

var jobs = _bigqueryService.Jobs;
var response = await jobs.Query(qr, _settings.GoogleCloudServiceProjectId).ExecuteAsync();

var jobId = response.JobReference.JobId;
var isCompleted = response.JobComplete == true;
IList<TableRow> rows = response.Rows;

while (!isCompleted)
{
    var r = await jobs.GetQueryResults(_settings.GoogleCloudServiceProjectId, jobId).ExecuteAsync();
    isCompleted = r.JobComplete == true;
    if (!isCompleted)
    {
        Thread.Sleep(100);
    }
    else
    {
        rows = r.Rows;
    }
}
Looking at this code, can someone tell me whether it's good to call Thread.Sleep in this context, or whether I should continuously burn CPU cycles instead?
I wouldn't do this on the server side, as one has to be careful which waiting calls to use to avoid high resource consumption under load.
Your users also don't get any feedback from the page. You can improve this situation by displaying a spinning wheel, but it might be better to show actual progress to the users.
A better way of doing this would be AJAX calls to your web site. The call may return something like status, time elapsed and percentage complete (have a look at the BigQuery API). In this case you don't need to do any Thread.Sleep or Task.Delay kung fu.
Edit:
Oh, you're already using AJAX! Just tear out any Thread.Sleep and return the result immediately to users. In the browser, when the AJAX call completes, update the UI with information from the call. Job done.
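If the server-side polling loop has to stay for some reason, a minimal change (sketched below on the loop from the question) is to swap Thread.Sleep for await Task.Delay, which yields the thread back to the pool while waiting:

while (!isCompleted)
{
    var r = await jobs.GetQueryResults(_settings.GoogleCloudServiceProjectId, jobId).ExecuteAsync();
    isCompleted = r.JobComplete == true;
    if (!isCompleted)
    {
        await Task.Delay(100); // unlike Thread.Sleep, this doesn't pin a thread pool thread
    }
    else
    {
        rows = r.Rows;
    }
}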

Storing Facebook likes locally - Fetching number of likes performance issue

I am building an app using ASP.NET 4.0.
I have a table called entries. Entries can be liked via Facebook. I want to implement the ability to sort by likes, so I am taking the approach of storing the number of likes for each entry and using that column to order. The problem is the overhead involved in getting the number of likes: right now fetching data for only 13 entries takes 4 seconds, which is way too long. I think the method I am using could be improved.
I am using the FB Graph API and JSON.NET to parse the response. In the following code I have a List of type Entry, and I am getting the like URL for each entry by combining an app setting with the entry's id.
This is what I am doing:
foreach (Entry entry in entries)
{
    int likes;
    try
    {
        // the url that is tied to the entry
        string url = "http://graph.facebook.com/?ids=" + Properties.Settings.Default.likeUrl + "?id=" + entry.EntryId;

        // open a WebClient and get the results of the url
        WebClient client = new WebClient();
        Stream data = client.OpenRead(url);
        StreamReader reader = new StreamReader(data);
        string s = reader.ReadToEnd();

        // parse out the response
        var json = JObject.Parse(s);

        // shares are how many likes the entry has
        likes = Convert.ToInt32(json.First.First.SelectToken("shares").ToString());
    }
    catch (Exception ex)
    {
        likes = 0;
    }
}
As I said this method is very expensive. If anyone could suggest a better way to do what I am attempting here I would really appreciate the help. Thanks much!
Method,
You are not disposing of your stream or stream reader. This may not help individual request performance, but you could see a slowdown later... Also try to use the parallel extensions, which would require a little more care in the handling of variables. This is just an example:
EDITED: I forgot that WebClient is disposable too. That needs to be disposed of each time or it will hang onto a connection for a while. That actually might help a bit.
private readonly object locker = new object(); // must be initialized, or lock(locker) throws

private int _likes = 0;
private int Likes
{
    get
    {
        lock (locker)
        {
            return _likes;
        }
    }
    set
    {
        lock (locker)
        {
            _likes = value;
        }
    }
}

void MyMethod()
{
    Parallel.ForEach(entries, entry =>
    {
        // build the url from the entry as in the question, then:
        using (WebClient client = new WebClient())
        using (Stream data = client.OpenRead(url))
        using (StreamReader reader = new StreamReader(data))
        {
            ....
        }
    });
}
Doing a separate API call for each item in the loop is going to be slow due to the overhead of making network requests. Have you looked into batching the query for the likes of all 13 items into a single API call? I don't know specifically whether it will work for the query you are running, but I know that the Facebook API supports methods of batching queries. You can run the batches such that the output of one goes into other queries in the same batch. You may have to switch to making FQL queries via the Graph API.
You might also consider moving the API calls onto the client and implementing them using the JavaScript API. This will offload the API work to the users' browsers, which will let your application scale better. If you don't do this you should at least consider Robert's suggestion of making the calls asynchronously.
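As a sketch of the batching idea above (assuming the Graph API's comma-separated ids parameter covers your case, and glossing over URL encoding of the ids):

// One request for all entries instead of one per entry.
var ids = string.Join(",", entries.Select(e =>
    Properties.Settings.Default.likeUrl + "?id=" + e.EntryId));
string batchUrl = "http://graph.facebook.com/?ids=" + ids;

using (var client = new WebClient())
{
    var json = JObject.Parse(client.DownloadString(batchUrl));
    foreach (Entry entry in entries)
    {
        // the response is keyed by the ids that were requested
        string key = Properties.Settings.Default.likeUrl + "?id=" + entry.EntryId;
        int likes = 0;
        JToken item = json[key];
        if (item != null && item.SelectToken("shares") != null)
        {
            likes = (int)item.SelectToken("shares");
        }
        // ...store likes against the entry for the sort column
    }
}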
