I am building an app using ASP.NET 4.0.
I have a table called entries. Entries can be liked via Facebook. I want to implement the ability to sort by likes, so I am taking the approach of storing the number of likes for each entry and using that column to order. The problem is the overhead involved in getting the number of likes. I think the method I am using could be improved, because right now fetching data for only 13 entries takes 4 seconds, which is way too long.
I am using the FB Graph API and JSON.NET to parse the response. In the following code I have a List of type Entry, and I build the like URL for each entry from an app setting combined with the entry's id.
This is what I am doing:
foreach (Entry entry in entries)
{
    int likes;
    try
    {
        // the url that is tied to the entry
        string url = "http://graph.facebook.com/?ids=" + Properties.Settings.Default.likeUrl + "?id=" + entry.EntryId;

        // open a WebClient and get the results of the url
        WebClient client = new WebClient();
        Stream data = client.OpenRead(url);
        StreamReader reader = new StreamReader(data);
        string s = reader.ReadToEnd();

        // parse out the response
        var json = JObject.Parse(s);

        // shares are how many likes the entry has
        likes = Convert.ToInt32(json.First.First.SelectToken("shares").ToString());
    }
    catch (Exception ex)
    {
        likes = 0;
    }
}
As I said this method is very expensive. If anyone could suggest a better way to do what I am attempting here I would really appreciate the help. Thanks much!
You are not disposing of your stream or stream reader. This may not help the performance of an individual request, but you could see a slowdown later... Also try the Parallel extensions, which require a little more care in handling variables. This is just an example:
EDITED: I forgot that WebClient is disposable too. It needs to be disposed of each time or it will hang onto a connection for a while. That actually might help a bit.
private readonly object locker = new object();
private int _likes = 0;

private int Likes
{
    get
    {
        lock (locker)
        {
            return _likes;
        }
    }
    set
    {
        lock (locker)
        {
            _likes = value;
        }
    }
}

void MyMethod()
{
    Parallel.ForEach(entries, entry =>
    {
        // build the url for this entry, as in the original loop
        string url = "http://graph.facebook.com/?ids=" + Properties.Settings.Default.likeUrl + "?id=" + entry.EntryId;

        using (WebClient client = new WebClient())
        using (Stream data = client.OpenRead(url))
        using (StreamReader reader = new StreamReader(data))
        {
            // ... read, parse and update Likes here
        }
    });
}
Doing a separate API call for each item in the loop is going to be slow due to the overhead of the network round trips. Have you looked into batching the query for the likes for all 13 items into a single API call? I don't know specifically if it will work for the query you are running, but I know that the Facebook API supports ways of batching queries; you can run the batches such that the output of one feeds into other queries in the same batch. You may have to switch to making FQL queries via the Graph API.
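As a rough sketch of the batching idea (not a drop-in solution): it assumes the Graph API's ids parameter accepts a comma-separated list of the same like URLs built above, and it keeps the parsing close to the original code (JSON.NET, System.Linq).
var ids = string.Join(",",
    entries.Select(e => Properties.Settings.Default.likeUrl + "?id=" + e.EntryId).ToArray());

string batchUrl = "http://graph.facebook.com/?ids=" + ids;

using (WebClient client = new WebClient())
{
    // One request for all entries instead of one request per entry.
    JObject json = JObject.Parse(client.DownloadString(batchUrl));

    // The response is keyed by each requested URL; "shares" holds the count.
    foreach (JProperty prop in json.Properties())
    {
        JToken shares = prop.Value.SelectToken("shares");
        int likes = shares != null ? (int)shares : 0;
        // ... map 'likes' back to the matching entry via prop.Name here
    }
}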
You might also consider moving the API calls onto the client and implementing them with the JavaScript API. This offloads the API work to the users' browsers, which will let your application scale better. If you don't do that, you should at least consider Robert's suggestion of making the calls asynchronously.
Here is the scenario.
I want to call 2 versions of an API (hosted on different servers), then cast their responses (they come as JSON) to C# objects and compare them.
An important note here is that I need to query the APIs a lot of times, ~3000. The reason is that I query an endpoint that takes an id and returns a specific object from the DB. So my queries are like http://myapi/v1/endpoint/id, and I basically use a loop to go through all of the ids.
Here is the issue
I start querying the API, and for the first 90% of all requests it is blazing fast (I get the response and process it), and all of that happens in under 5 seconds.
Then, however, things start to come to a stop. The next 50-100 requests can take between 1 and 5 seconds to process, and after that I come to a stop. No CPU usage, network activity is low (and I am pretty sure that activity is from other apps). My app just hangs.
UPDATE: Around 50% of the times I tested this, it does finally resume after quite a bit of time. But the other 50% it still just hangs.
Here is what I am doing in-code
I have a list of Ids that I iterate to query the endpoint.
This is the main piece of code that queries the APIs and processes the responses.
var endPointIds = await GetIds(); // this queries a different endpoint to get all ids, however there are no issues with it
var tasks = endPointIds.Select(async id =>
{
    var response1 = await _data.GetData($"{Consts.ApiEndpoint1}/{id}");
    var response2 = await _data.GetData($"{Consts.ApiEndpoint2}/{id}");
    return ProcessResponces(response1, response2);
});

var res = await Task.WhenAll(tasks);
var result = res.Where(r => r != null).ToList();
return result; // I never get to return the result, the app hangs before this is reached
This is the GetData() method
private async Task<string> GetAsync(string serviceUri)
{
    try
    {
        var request = WebRequest.CreateHttp(serviceUri);
        request.ContentType = "application/json";
        request.Method = WebRequestMethods.Http.Get;

        using (var response = await request.GetResponseAsync())
        using (var responseStream = response.GetResponseStream())
        using (var streamReader = new StreamReader(responseStream, Encoding.UTF8))
        {
            return await streamReader.ReadToEndAsync();
        }
    }
    catch
    {
        return string.Empty;
    }
}
I would include the ProcessResponces method as well; however, I tried mocking it to return a string like so:
private string ProcessResponces(string responseJson1, string responseJson2)
{
    // usually I would deserialize responseJson1 and responseJson2 here using Newtonsoft.Json's DeserializeObject<>
    return "Fake success";
}
And even with this implementation my issue did not go away (the only difference it made is that I managed to have fast requests for about 97% of my requests, but my code still ended up stopping at the last few), so I am guessing the main issue is not related to that method. What it more or less does is deserialize both responses to C# objects, compare them and return information about their equality.
Here are my observations after 4 hours of debugging
If I manually reduce the number of queries to my API (I used the .Take() method on the list of ids) the issue still persists. For example, on 1000 total requests I start hanging around the 900th, for 1500 around the 1400th, and so on. I believe the issue goes away at around 100-200 requests, but I am not sure since it might just be too fast for me to notice.
Since this is currently a console app, I tried adding WriteLine() calls in some of my methods, and the issue seemed to go away (I am guessing the delay that writing to the console introduces gives some time between requests, and that helps).
Lastly, I did concurrency profiling of my app and it reported that a lot of contentions were happening at the point where my app hangs. Opening the contention tab showed that they mainly occur in System.IO.StreamReader.ReadToEndAsync().
Thoughts and Questions
Obviously, what can I do to resolve the issue?
Is my GetAsync() method wrong, should I be using something else instead of responseStream and streamReader?
I am not super proficient in asynchronous operations, maybe my flow of async/await operations is wrong.
Lastly, could it be something with the API controllers themselves? They are standard ASP.NET MVC 5 WebAPI controllers (version 5.2.3.0)
After long hours of tracking my requests with Fiddler and finally mocking my DataProvider (_data) to read locally from disk, it turns out that I had responses that were taking 30s+ to arrive (or even not arriving at all).
Since my .Select() is async, it always displayed info for the quick responses first and then came to a halt while waiting for the slow ones. This gave the illusion that I was somehow loading the first X requests quickly and then stopping, when in reality I was simply seeing the fastest X requests and then waiting on the slow ones.
And to kind of answer my questions...
What can I do to resolve the issue - set a timeout that allows a maximum number of milliseconds/seconds for a request to finish (a minimal sketch follows after these answers).
The GetAsync() method is alright.
Async/await operations are also correct; just keep in mind that the tasks started by an async Select complete in whatever order their requests finish, not the order they were issued.
The ASP.NET Framework controllers are perfectly fine and do not contribute to the issue.
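To make the timeout idea concrete, here is a minimal sketch that wraps the GetAsync method shown above. Since HttpWebRequest.Timeout is not applied to GetResponseAsync, the wait is bounded with Task.WhenAny instead; the method name and the "empty string on timeout" behaviour are my own choices, mirroring how GetAsync already handles failures.
// Minimal sketch: bound the wait for a single request.
private async Task<string> GetWithTimeoutAsync(string serviceUri, TimeSpan timeout)
{
    Task<string> requestTask = GetAsync(serviceUri);
    Task finished = await Task.WhenAny(requestTask, Task.Delay(timeout));

    if (finished == requestTask)
    {
        return await requestTask; // completed in time
    }

    // Treat a slow response the same way GetAsync treats a failed one.
    return string.Empty;
}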
I have to make a C# application which uses the REST API to fetch JIRA issues. After I run the tool I get the correct output, but it takes a lot of time to display it. Below is the part of the code that takes the most time:
var client = new WebClient();
foreach (dynamic i in jira_keys)
{
    issue_id = i.key;
    string rest_api_url = "some valid url" + issue_id;
    var jira_response = client.DownloadString(rest_api_url);
    // rest of the processing
}
jira_keys is a JArray. After this, the JSON is processed inside the foreach loop. This takes a lot of time as the number of jira_keys increases. I cannot apply multi-threading to this since there are shared-variable issues, so please suggest some way to optimise this.
If the issues are tied to a specific project or some other grouping, you can instead search for issues with a JQL string. This way you get them in bulk and paginated.
https://docs.atlassian.com/jira/REST/cloud/#api/2/search-search
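As a rough illustration of the paginated search (the host and the JQL string below are placeholders, and the parsing stays with JSON.NET as in the question), this keeps the number of HTTP round trips small:
// Sketch: fetch issues in bulk with a paginated JQL search instead of one request per key.
var client = new WebClient();
string baseUrl = "https://your-jira-host";           // placeholder host
string jql = Uri.EscapeDataString("project = ABC");  // placeholder JQL
int startAt = 0;
const int maxResults = 100;
int total;

do
{
    string url = baseUrl + "/rest/api/2/search?jql=" + jql +
                 "&startAt=" + startAt + "&maxResults=" + maxResults;
    JObject page = JObject.Parse(client.DownloadString(url));

    foreach (JToken issue in page["issues"])
    {
        // process each issue here, as in the original foreach over jira_keys
    }

    total = (int)page["total"];
    startAt += maxResults;
} while (startAt < total);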
Also, as cubrr said in his comment, async calls should work fine if you want to make the API calls from multiple threads. Awaiting the calls will wait until the shared resources are ready.
(Would have posted as a comment if I had enough rep)
Here is an example of how you can fetch the responses from JIRA asynchronously.
var taskList = new List<Task<string>>();

foreach (dynamic i in jira_keys)
{
    string issue_id = i.key;
    string rest_api_url = "some valid url" + issue_id;

    // Use a separate WebClient per task: a single WebClient instance
    // is not safe to share across concurrent downloads.
    var jiraDownloadTask = Task.Factory.StartNew(() =>
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(rest_api_url);
        }
    });
    taskList.Add(jiraDownloadTask);
}

Task.WaitAll(taskList.ToArray());

// access the results
foreach (var task in taskList)
{
    Console.WriteLine(task.Result);
}
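If you would rather avoid tying up thread-pool threads while the downloads run, a fully asynchronous variant is possible on .NET 4.5+ with DownloadStringTaskAsync. This is only a sketch under that assumption, reusing the same placeholder URL; the helper method name is my own.
// Sketch: non-blocking async downloads (requires .NET 4.5+ and an async caller).
private static async Task<string> DownloadIssueAsync(string url)
{
    using (var client = new WebClient())
    {
        return await client.DownloadStringTaskAsync(url);
    }
}

// usage inside an async method:
var downloadTasks = new List<Task<string>>();
foreach (dynamic i in jira_keys)
{
    string issueId = i.key;
    downloadTasks.Add(DownloadIssueAsync("some valid url" + issueId));
}

string[] responses = await Task.WhenAll(downloadTasks);
foreach (string response in responses)
{
    Console.WriteLine(response);
}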
I am developing an application using twitter api and that involves writing a method to check if a user exists. Here is my code:
public static bool checkUserExists(string user)
{
    //string URL = "https://twitter.com/" + user.Trim();
    //string URL = "http://api.twitter.com/1/users/show.xml?screen_name=" + user.Trim();
    //string URL = "http://google.com/#hl=en&sclient=psy-ab&q=" + user.Trim();
    string URL = "http://api.twitter.com/1/users/show.json?screen_name=" + user.Trim();
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(URL);
    try
    {
        var webResponse = (HttpWebResponse)webRequest.GetResponse();
        return true;
    }
    //this part onwards does not matter
    catch (WebException ex)
    {
        if (ex.Status == WebExceptionStatus.ProtocolError && ex.Response != null)
        {
            var resp = (HttpWebResponse)ex.Response;
            if (resp.StatusCode == HttpStatusCode.NotFound)
            {
                return false;
            }
            else
            {
                throw new Exception("Unknown level 1 Exception", ex);
            }
        }
        else
        {
            throw new Exception("Unknown level 2 Exception", ex);
        }
    }
}
The problem is, calling the method does not work (it doesn't get a response) more than 2 or 3 times, using any of the URLs that have been commented out, including the Google search query (I thought it might be due to the Twitter API limit). On debug, it shows that it's stuck at:
var webResponse = (HttpWebResponse)webRequest.GetResponse();
Here's how I am calling it:
Console.WriteLine(TwitterFollowers.checkUserExists("handle1"));
Console.WriteLine(TwitterFollowers.checkUserExists("handle2"));
Console.WriteLine(TwitterFollowers.checkUserExists("handle3"));
Console.WriteLine(TwitterFollowers.checkUserExists("handle4"));
Console.WriteLine(TwitterFollowers.checkUserExists("handle5"));
Console.WriteLine(TwitterFollowers.checkUserExists("handle6"));
At most I get 2-3 lines of output. Could someone please point out what's wrong?
Update 1:
I sent 1 request every 15 seconds (well within the limit) and it still causes an error. On the other hand, sending a request, closing the app and running it again works very well (which on average amounts to about 1 request every 5 seconds). The rate limit is 150 calls per hour (Twitter FAQ).
Also, I did wait for a while, and got this exception at level 2:
http://pastie.org/3897499
Update 2:
Might sound surprising, but if I run Fiddler, it works perfectly, regardless of whether I target this process or not!
The effect you're seeing is almost certainly due to rate-limit type policies on the Twitter API (multiple requests in quick succession). They keep a tight watch on how you're using their API: the first step is to check their terms of use and policies on rate limiting, and make sure you're in compliance.
Two things jump out at me:
You're hitting the API with multiple requests in rapid succession. Most REST APIs, including Google search, are not going to allow you to do that. These APIs are very visible targets, and it makes sense that they'd be pro-active about preventing denial-of-service attacks.
You don't have a User Agent specified in your request. Most APIs require you to send them a meaningful UA, as a way of helping them identify you.
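As a rough illustration of the second point, the request in checkUserExists could identify itself before GetResponse() is called; the UA string here is only an example value.
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(URL);
// Identify the application to the API; the exact string is an example only.
webRequest.UserAgent = "MyTwitterChecker/1.0 (+http://example.com/contact)";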
Note that you're dealing with unmanaged resources underneath your HttpWebResponse. So calling Dispose() in a timely fashion or wrapping the object in a using statement is not only wise, but important to avoid blocking.
Also, var is great for dealing with anonymous types, LINQ query results, and such, but it should not become a crutch. Why use var when you're well aware of the type? (i.e. you're already performing a cast to HttpWebResponse.)
Finally, services like this often limit the rate of connections per second and/or the number of simultaneous connections allowed to prevent abuse. By not disposing of your HttpWebResponse objects, you may be violating the permitted number of simultaneous connections. By querying too often you'd break the rate limit.
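A minimal sketch of the disposal point, keeping the original method's logic but wrapping the responses in using blocks (the explicit HttpWebResponse type also addresses the var comment above):
try
{
    using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
    {
        return true;
    }
}
catch (WebException ex)
{
    if (ex.Status == WebExceptionStatus.ProtocolError && ex.Response != null)
    {
        using (HttpWebResponse resp = (HttpWebResponse)ex.Response)
        {
            if (resp.StatusCode == HttpStatusCode.NotFound)
            {
                return false;
            }
        }
    }
    throw; // unexpected failures, as in the original "unknown exception" branches
}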
I have a WPF app that processes a lot of URLs (thousands); each one is sent off to its own thread, does some processing, and stores a result in the database.
The URLs can be anything, but some seem to be massively big pages, which shoots the memory usage up a lot and makes performance really bad. I set a timeout on the web request, so if it takes longer than, say, 20 seconds it doesn't bother with that URL, but that seems to make little difference.
Here's the code section:
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(urlAddress.Address);
req.Timeout = 20000;
req.ReadWriteTimeout = 20000;
req.Method = "GET";
req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;

using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
    pageSource = reader.ReadToEnd();
    req = null;
}
It also seems to stall/ramp up memory on reader.ReadToEnd();
I would have thought having a cut-off of 20 seconds would help. Is there a better method? I assume there's not much advantage to using the async web methods, as each URL download is on its own thread anyway.
Thanks
In general, it's recommended that you use asynchronous HttpWebRequests instead of creating your own threads. The article I've linked above also includes some benchmarking results.
I don't know what you're doing with the page source after you read the stream to end, but using string can be an issue:
The System.String type is used in any .NET application. We have strings as: names, addresses, descriptions, error messages, warnings or even application settings. Each application has to create, compare or format string data. Considering the immutability and the fact that any object can be converted to a string, all the available memory can be swallowed by a huge amount of unwanted string duplicates or unclaimed string objects.
Some other suggestions:
Do you have any firewall restrictions? I've seen a lot of issues at work where the firewall enables rate limiting and fetching pages grinds down to a halt (happens to me all the time)!
I presume that you're going to use the string to parse HTML, so I would recommend that you initialize your parser with the Stream instead of passing in a string containing the page source (if that's an option).
If you're storing the page source in the database, then there isn't much you can do.
Try to eliminate the reading of the page source as a potential contributor to the memory/performance problem by commenting it out.
Use a streaming HTML parser such as Majestic-12 - it avoids the need to load the entire page source into memory (again, if you need to parse)!
Limit the size of the pages you're going to download, say, only download the first 150KB. The average page size is about 100KB-130KB. (A minimal sketch of capping the read follows this list.)
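Here is that sketch, under a couple of assumptions of mine (a 150 KB cap and UTF-8 pages); it reuses the same request setup as the question but stops reading once the cap is reached instead of calling ReadToEnd:
// Sketch: cap how much of the page is read instead of calling ReadToEnd.
const int maxBytes = 150 * 1024; // assumed cap

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(urlAddress.Address);
req.Timeout = 20000;
req.ReadWriteTimeout = 20000;

using (WebResponse response = req.GetResponse())
using (Stream stream = response.GetResponseStream())
using (MemoryStream buffer = new MemoryStream())
{
    byte[] chunk = new byte[8192];
    int read;
    while (buffer.Length < maxBytes && (read = stream.Read(chunk, 0, chunk.Length)) > 0)
    {
        buffer.Write(chunk, 0, read);
    }

    string pageSource = Encoding.UTF8.GetString(buffer.ToArray());
    // hand pageSource to the parser / database code as before
}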
Additionally, can you tell us what your initial rate of fetching pages is and what it drops to? Are you seeing any errors/exceptions from the web request as you're fetching pages?
Update
In the comment section I noticed that you're creating thousands of threads, and I would say that you don't need to do that. Start with a small number of threads and keep increasing them until you reach peak performance on your system. Once you start adding threads and the performance looks like it has tapered off, stop adding threads. I can't imagine that you will need more than 128 threads (even that seems high). Create a fixed number of threads, e.g. 64, let each thread take a URL from your queue, fetch the page, process it and then go back to getting pages from the queue again.
You could enumerate with a buffer instead of calling ReadToEnd, and if it is taking too long, then you could log and abandon - something like:
static void Main(string[] args)
{
    Uri largeUri = new Uri("http://www.rfkbau.de/index.php?option=com_easybook&Itemid=22&startpage=7096");
    DateTime start = DateTime.Now;
    int timeoutSeconds = 10;
    foreach (var s in ReadLargePage(largeUri))
    {
        if ((DateTime.Now - start).TotalSeconds > timeoutSeconds)
        {
            Console.WriteLine("Stopping - this is taking too long.");
            break;
        }
    }
}

static IEnumerable<string> ReadLargePage(Uri uri)
{
    int bufferSize = 8192;
    int readCount;
    Char[] readBuffer = new Char[bufferSize];
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader stream = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        readCount = stream.Read(readBuffer, 0, bufferSize);
        while (readCount > 0)
        {
            // only yield the characters actually read, not the whole buffer
            yield return new string(readBuffer, 0, readCount);
            readCount = stream.Read(readBuffer, 0, bufferSize);
        }
    }
}
Lirik has a really good summary.
I would add that if I were implementing this, I would make a separate process that reads the pages, so it would be a pipeline. The first stage would download the URL and write it to a disk location, and then queue that file to the next stage. The next stage reads from the disk and does the parsing and DB updates. That way you will get maximum throughput on both the download and the parsing. You can also tune your thread pools so that you have more workers parsing, etc. This architecture also lends itself very well to distributed processing, where you can have one machine downloading and another host parsing, etc.
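A rough sketch of that pipeline, using BlockingCollection as the queue between the stages (assuming .NET 4.5+ for Task.Run; on 4.0 Task.Factory.StartNew works the same way). DownloadToDisk, ParseAndStore and urls are placeholders for your own code and data.
// Two-stage pipeline: download to disk, then parse/update the DB.
var downloadedFiles = new BlockingCollection<string>(boundedCapacity: 100);

// Stage 1: download pages and write each one to disk.
Task producer = Task.Run(() =>
{
    foreach (string url in urls)
    {
        string path = DownloadToDisk(url);      // placeholder
        downloadedFiles.Add(path);
    }
    downloadedFiles.CompleteAdding();
});

// Stage 2: a fixed pool of workers parses the files and updates the database.
Task[] consumers = Enumerable.Range(0, 4)
    .Select(_ => Task.Run(() =>
    {
        foreach (string path in downloadedFiles.GetConsumingEnumerable())
        {
            ParseAndStore(path);                // placeholder
        }
    }))
    .ToArray();

Task.WaitAll(consumers);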
Another thing to note is that if you are hitting the same server from multiple threads (even if you are using async), you will run up against the maximum outgoing connection limit. You can throttle yourself to stay below it, or increase the connection limit on the ServicePointManager class.
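For example (32 is just an illustrative value; tune it for your workload):
// Raise the default limit of two concurrent connections per host.
ServicePointManager.DefaultConnectionLimit = 32;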
I've run into a bit of a confusing issue, and I'm not sure if the problem is that I don't fully understand the WebClient OpenReadCompleted event and its delegate, or if there's a problem with my chosen solution on the server-side scripts the app interacts with.
Here's my issue:
I have a class that defines a video game title. I use a WebClient to asynchronously open an RSS feed for reading and, when that completes, continue fetching user-submitted information about each title using the same method. For this, I loop through each video game title parsed from the RSS feed (GameStop.com's RSS feed for upcoming games). Here's where I'm running into problems: there's no way for me to keep all of these OpenReadCompleted delegates synchronized, or none that I'm aware of.
Right now my code is becoming embarrassing and convoluted, and I believe it's incorrect:
Note: games is a List of Game objects.
List<Thread> threads = new List<Thread>();
for (int i = 0; i < games.Count; i++)
{
    threads.Add(new Thread(downloadHype));
    threads[i].Start(i);
}

public void downloadHype(object data)
{
    int index = (int)data;
    String tempUrl = String.Format("http://slyduck.com/hypemachine/frontend.php?intent=2&guid={0}", games[index].GuidString);
    WebClient client = new WebClient();
    client.OpenReadAsync(new Uri(tempUrl));
    client.OpenReadCompleted += new OpenReadCompletedEventHandler(
        delegate(object sender, OpenReadCompletedEventArgs e)
        {
            if (e.Error == null)
            {
                XDocument xdoc = XDocument.Load(e.Result);
                games[index].Hype = (from item in xdoc.Descendants("hype")
                                     select new Hype()
                                     {
                                         Id = uint.Parse(item.Element("id").Value),
                                         GameId = uint.Parse(item.Element("game_id").Value),
                                         UserId = uint.Parse(item.Element("user_id").Value),
                                         Score = (uint.Parse(item.Element("score").Value) == 1)
                                     }).ToList();
            }
        });
}
Is there an easier way for me to organize this? I considered the possibility of sending an array of the game guids as a GET or POST parameter to alleviate some of the garbage generated by creating so many WebClients, but I'm unsure if that's the right solution.
I've looked into the Synchronization and Parallel classes; however, they aren't available in Silverlight's .NET implementation.
Any help would be greatly appreciated. Thanks!
You spawn too many threads. Remember, a new thread consumes 1MB of virtual address space straight away.
If you have one user who has an ID (GUID), get the data by the GUID (as you do now), but your XML should contain a list of Hype entries, not just a single one.
In other words, use a different structure of XML. Then you just need one background thread and one WebClient to fetch the whole list.
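A minimal sketch of that idea, assuming the endpoint can take all the GUIDs in one request (the guids parameter name is hypothetical) and return every matching hype entry in a single XML document; the parsing stays the same as in the question.
// One request for all games instead of one WebClient per game.
string guids = string.Join(",", games.Select(g => g.GuidString).ToArray());
string url = String.Format(
    "http://slyduck.com/hypemachine/frontend.php?intent=2&guids={0}", guids);

WebClient client = new WebClient();
client.OpenReadCompleted += (sender, e) =>
{
    if (e.Error != null)
        return;

    XDocument xdoc = XDocument.Load(e.Result);
    var allHype = (from item in xdoc.Descendants("hype")
                   select new Hype()
                   {
                       Id = uint.Parse(item.Element("id").Value),
                       GameId = uint.Parse(item.Element("game_id").Value),
                       UserId = uint.Parse(item.Element("user_id").Value),
                       Score = (uint.Parse(item.Element("score").Value) == 1)
                   }).ToList();

    // Distribute the results back onto the Game objects; how game_id maps to
    // a Game depends on your data model, so that part is left as a comment.
    // foreach (var group in allHype.GroupBy(h => h.GameId)) { ... }
};
client.OpenReadAsync(new Uri(url));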