I've run into a confusing issue, and I'm not sure whether the problem is that I'm unaware of how the WebClient OpenReadCompleted event works, or whether there's a problem with the server-side scripts the app interacts with.
Here's my issue:
I have a class that defines a video game title. I use a WebClient to asynchronously open an RSS feed for reading (GameStop.com's RSS feed for upcoming games); when that completes, I fetch user-submitted information about each title using the same method, looping through every title parsed from the feed. Here's where I'm running into problems: there's no way for me to keep all of these OpenReadCompleted handlers synchronized, or none that I'm aware of.
Right now my code is becoming embarrassing and convoluted, and I believe it's incorrect:
Note: games is a List of Game objects.
List<Thread> threads = new List<Thread>();
for (int i = 0; i < games.Count; i++)
{
    threads.Add(new Thread(downloadHype));
    threads[i].Start(i);
}
public void downloadHype(object data)
{
    int index = (int)data;
    String tempUrl = String.Format("http://slyduck.com/hypemachine/frontend.php?intent=2&guid={0}", games[index].GuidString);
    WebClient client = new WebClient();
    // Subscribe before starting the request so the completion can't be missed.
    client.OpenReadCompleted += delegate(object sender, OpenReadCompletedEventArgs e)
    {
        if (e.Error == null)
        {
            XDocument xdoc = XDocument.Load(e.Result);
            games[index].Hype = (from item in xdoc.Descendants("hype")
                                 select new Hype()
                                 {
                                     Id = uint.Parse(item.Element("id").Value),
                                     GameId = uint.Parse(item.Element("game_id").Value),
                                     UserId = uint.Parse(item.Element("user_id").Value),
                                     Score = (uint.Parse(item.Element("score").Value) == 1)
                                 }).ToList();
        }
    };
    client.OpenReadAsync(new Uri(tempUrl));
}
Is there an easier way for me to organize this? I considered sending an array of the game GUIDs as a GET or POST parameter to avoid the garbage generated by creating so many WebClients, but I'm unsure whether that's the right solution.
I've looked into the synchronization and parallel classes; however, they aren't available in Silverlight's .NET implementation.
Any help would be greatly appreciated. Thanks!
You spawn too many threads. Remember, a new thread consumes 1 MB of virtual address space straight away.
If you have one user with an ID (GUID), fetch the data by that GUID (as you already do), but your XML should contain a list of Hype elements, not just a single one.
In other words, use a different XML structure. Then you need just one background thread and one WebClient to fetch the whole list.
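For example, something along these lines. This is a sketch only: it assumes a hypothetical guids parameter on the PHP side that accepts several GUIDs at once, and that Game exposes a numeric Id matching the game_id in the XML.
// Sketch: one request for all games' hype, assuming a hypothetical
// comma-separated "guids" parameter on the server and a numeric Game.Id.
string guids = String.Join(",", games.Select(g => g.GuidString).ToArray());
string url = String.Format("http://slyduck.com/hypemachine/frontend.php?intent=2&guids={0}", guids);

WebClient client = new WebClient();
client.OpenReadCompleted += delegate(object sender, OpenReadCompletedEventArgs e)
{
    if (e.Error != null)
        return;

    XDocument xdoc = XDocument.Load(e.Result);

    // One pass over the document: group the hype entries by game_id,
    // then hand each group to its game.
    foreach (var group in xdoc.Descendants("hype")
                              .GroupBy(item => uint.Parse(item.Element("game_id").Value)))
    {
        Game game = games.FirstOrDefault(g => g.Id == group.Key); // assumes a numeric Id
        if (game == null)
            continue;

        game.Hype = group.Select(item => new Hype()
        {
            Id = uint.Parse(item.Element("id").Value),
            GameId = uint.Parse(item.Element("game_id").Value),
            UserId = uint.Parse(item.Element("user_id").Value),
            Score = (uint.Parse(item.Element("score").Value) == 1)
        }).ToList();
    }
};
client.OpenReadAsync(new Uri(url));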
Related
I have to make a C# application which uses the REST API to fetch JIRA issues. After I run the tool I get the correct output, but it takes a lot of time to display it. Below is the part of the code that takes the most time:
var client = new WebClient();
foreach (dynamic i in jira_keys)
{
    issue_id = i.key;
    string rest_api_url = "some valid url" + issue_id;
    var jira_response = client.DownloadString(rest_api_url);
    //rest of the processing
}
jira_keys is a JArray. After this comes the JSON-processing part inside the foreach loop, which takes more and more time as the number of jira_keys increases. I cannot apply multi-threading to this since there are shared-variable issues, so please suggest some way to optimise it.
If the issues are tied to a specific project or some other grouping, you can instead search for issues with a JQL string. This way you get them in bulk and paginated.
https://docs.atlassian.com/jira/REST/cloud/#api/2/search-search
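For illustration, a paginated JQL search could look something like this; the host, JQL string, and page size below are placeholders, not values from your setup:
// Sketch: pull issues in pages via JQL instead of one request per key.
var client = new WebClient();
int startAt = 0;
const int maxResults = 50;
while (true)
{
    string searchUrl = string.Format(
        "https://your-jira-host/rest/api/2/search?jql={0}&startAt={1}&maxResults={2}",
        Uri.EscapeDataString("project = MYPROJ"), startAt, maxResults);

    dynamic page = Newtonsoft.Json.JsonConvert.DeserializeObject(client.DownloadString(searchUrl));
    foreach (dynamic issue in page.issues)
    {
        // rest of the processing, exactly as in the original loop
    }

    startAt += maxResults;
    if (startAt >= (int)page.total)
        break;
}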
Also, like cubrr said in his comment, async calls should work fine if you want to make the API calls from multiple threads; awaiting the calls holds further processing until the results are ready, without tying up threads.
(Would have posted as a comment if I had enough rep)
Here is an example of how you can fetch the responses from JIRA asynchronously.
var taskList = new List<Task<string>>();
foreach (dynamic i in jira_keys)
{
    issue_id = i.key;
    string rest_api_url = "some valid url" + issue_id;
    // Use one WebClient per task: a single WebClient instance is not safe
    // for concurrent requests.
    var jiraDownloadTask = Task.Factory.StartNew(() =>
    {
        using (var taskClient = new WebClient())
        {
            return taskClient.DownloadString(rest_api_url);
        }
    });
    taskList.Add(jiraDownloadTask);
}
Task.WaitAll(taskList.ToArray());
//access the results
foreach (var task in taskList)
{
    Console.WriteLine(task.Result);
}
I am writing a WCF service that has source data from multiple sources. These are large files in various formats.
I have implemented Caching and set-up a polling interval so these files are kept up to date with fresh data.
I have constructed a manager class that basically is responsible for returning XDocument objects back to the caller. The manager class first checks the cache for existence. If it doesn't exist - it makes the call to retrieve fresh data. Nothing big here.
What I would like to do to keep the response snappy is serialize the previously downloaded file and pass that back to the caller - again, nothing new. However, I want to spawn a new thread as soon as the serialization is complete to retrieve fresh data and overwrite the old file. This is my problem...
Admittedly an intermediate programmer, I came across a few examples on multi-threading (here, for that matter). The problem is they introduced the concept of delegates, and I am really struggling with this.
Here is some of my code:
//this method invokes another object that is responsible for making the
//http call, decompressing the file and persisting to the hard drive.
private static void downloadFile(string url, string LocationToSave)
{
    using (WeatherFactory wf = new WeatherFactory())
    {
        wf.getWeatherDataSource(url, LocationToSave);
    }
}

//A new thread variable
private static Thread backgroundDownload;

//the delegate...but I am so confused on how to use this...
delegate void FileDownloader(string url, string LocationToSave);

//The method that should be called in the new thread....
//right now the compiler is complaining that I don't have the arguments from
//the delegate (Url and LocationToSave)...
//the problem is I don't pass URL and LocationToSave here...
static void Init(FileDownloader download)
{
    backgroundDownload = new Thread(new ThreadStart(download));
    backgroundDownload.Start();
}
I'd like to implement this the correct way...so a bit of education on how to make this work would be appreciated.
I would use the Task Parallel Library to do this:
//this method invokes another object that is responsible for making the
//http call, decompressing the file and persisting to the hard drive.
private static void downloadFile(string url, string LocationToSave)
{
    using (WeatherFactory wf = new WeatherFactory())
    {
        wf.getWeatherDataSource(url, LocationToSave);
    }
    //Update cache here?
}

private void StartBackgroundDownload(string url, string LocationToSave)
{
    //Things to consider:
    // 1. what if we are already downloading, start new anyway?
    // 2. when/how to update your cache
    var task = Task.Factory.StartNew(() => downloadFile(url, LocationToSave));
}
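If the cache should be refreshed as soon as the download finishes, a continuation is one way to sequence it. The same method could be extended like this; UpdateCache is a hypothetical stand-in for however your manager class refreshes its cache:
private void StartBackgroundDownload(string url, string LocationToSave)
{
    Task.Factory.StartNew(() => downloadFile(url, LocationToSave))
        .ContinueWith(t =>
        {
            // Only touch the cache if the download succeeded; reading
            // t.Exception also marks any fault as observed.
            if (t.Exception == null)
                UpdateCache(LocationToSave); // hypothetical cache-refresh method
        });
}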
I am building an app using ASP.NET 4.0.
I have a table called entries. Entries can be liked via Facebook. I want to be able to sort by likes, so I am storing the number of likes for each entry and ordering on that column. The problem is the overhead involved in getting the number of likes: fetching data for only 13 entries currently takes 4 seconds, which is way too long, so I think my method can be improved.
I am using the FB Graph API and JSON.NET to parse the response. In the following code I have a List of type Entry, and I build the like URL for each entry from an app setting combined with the entry's id.
This is what I am doing:
foreach (Entry entry in entries)
{
    int likes;
    try
    {
        // the url that is tied to the entry
        string url = "http://graph.facebook.com/?ids=" + Properties.Settings.Default.likeUrl + "?id=" + entry.EntryId;
        //open a WebClient and get the results of the url
        WebClient client = new WebClient();
        Stream data = client.OpenRead(url);
        StreamReader reader = new StreamReader(data);
        string s = reader.ReadToEnd();
        //parse out the response
        var json = JObject.Parse(s);
        //shares are how many likes the entry has
        likes = Convert.ToInt32(json.First.First.SelectToken("shares").ToString());
    }
    catch (Exception ex)
    {
        likes = 0;
    }
}
As I said, this method is very expensive. If anyone can suggest a better way to do what I'm attempting here, I'd really appreciate the help. Thanks much!
Method,
You are not disposing of your stream or stream reader. This may not help individual request performance much, but you could see a slowdown later. Also, try the Parallel Extensions, which require a little more care in the handling of shared variables. This is just an example:
EDITED: I forgot that WebClient is disposable too. It needs to be disposed of each time, or it will hang onto a connection for a while. That actually might help a bit.
private object locker = new object();
private int _likes = 0;

private int Likes
{
    get
    {
        lock (locker)
        {
            return _likes;
        }
    }
    set
    {
        lock (locker)
        {
            _likes = value;
        }
    }
}

void MyMethod()
{
    Parallel.ForEach(entries, entry =>
    {
        // build the url from the entry, as in the original loop
        string url = "http://graph.facebook.com/?ids=" + Properties.Settings.Default.likeUrl + "?id=" + entry.EntryId;
        using (WebClient client = new WebClient())
        using (Stream data = client.OpenRead(url))
        using (StreamReader reader = new StreamReader(data))
        {
            ....
        }
    });
}
Doing a separate API call for each item in the loop is going to be slow due to the overhead of making network requests. Have you looked into batching the query for the likes for all 13 items into a single API call? I don't know specifically if it will work for the query you are running, but I know that the facebook API supports methods of batching queries. You can run the batches such that the output of one goes into other queries in the same batch. You may have to switch to making FQL queries via the Graph API.
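For example, the Graph API's ids parameter accepts a comma-separated list, so the thirteen lookups can collapse into one request. A sketch only; whether shares comes back per-id in this form is something to verify against the API docs:
// Sketch: one Graph API request for every entry, using the comma-separated
// "ids" form. The response is keyed by each id value.
string ids = string.Join(",", entries
    .Select(en => Properties.Settings.Default.likeUrl + "?id=" + en.EntryId)
    .ToArray());

using (WebClient client = new WebClient())
{
    string s = client.DownloadString("http://graph.facebook.com/?ids=" + Uri.EscapeDataString(ids));
    JObject json = JObject.Parse(s);

    foreach (Entry entry in entries)
    {
        string key = Properties.Settings.Default.likeUrl + "?id=" + entry.EntryId;
        JToken shares = json[key] == null ? null : json[key].SelectToken("shares");
        int likes = shares == null ? 0 : (int)shares;
        // use likes for this entry
    }
}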
You might also consider moving the API calls onto the client and implementing them using the JavaScript API. This offloads the work to your users' browsers, which will let your application scale better. If you don't do this, you should at least consider Robert's suggestion of making the calls asynchronously.
I am getting an XML feed and pushing it to my MQ server; then I have a service that listens to the MQ server and reads all of its messages.
I have a foreach loop that starts a new thread on each iteration, in order to make the parsing faster, because there are around 500 messages in the MQ (meaning there are 500 XMLs):
foreach (System.Messaging.Message m in msgs)
{
    byte[] bytes = new byte[m.BodyStream.Length];
    m.BodyStream.Read(bytes, 0, (int)m.BodyStream.Length);
    System.Text.ASCIIEncoding ascii = new System.Text.ASCIIEncoding();
    ParserClass tst = new ParserClass(ascii.GetString(bytes, 0, (int)m.BodyStream.Length));
    new Thread(new ThreadStart(tst.ProcessXML)).Start();
}
In the ParserClass I have this code:
private static object thLockMe = new object();

public string xmlString { get; set; }

public ParserClass(string xmlStringObj)
{
    this.xmlString = xmlStringObj;
}

public void ProcessXML()
{
    lock (thLockMe)
    {
        XDocument reader = XDocument.Parse(xmlString);
        //Some more code...
    }
}
The problem is, when I run this foreach loop with only one thread, it works perfectly, but slowly.
When I run it with more than one thread, I get an error: "Object reference not set to an instance of an object".
I guess there is something wrong with my locking, since I am not very experienced with threading.
I am kind of hopeless; hope you can help!
Cheers!
I note that you are running a bunch of threads, each with its entire code wrapped inside a lock statement. You might as well run the methods in sequence this way, because you are not getting any parallelism.
Since you are creating a new ParserClass instance on every iteration of your loop, and also creating and starting a new thread on every iteration, you do not need a lock in your ProcessXML method.
The object you lock on is static, so it is not instance-bound, which means that once one thread is inside your ProcessXML method, no other will be able to do anything until the first has finished.
You are not sharing any data (from the code I can see) in your parser class amongst threads, so you don't need a lock inside your ProcessXML function.
If you are using data that is shared between threads, then you should have a lock.
If you're going to be using lots of threads, then you're better off using a ThreadPool: take a finite number of threads (4, perhaps) from the pool, assign them some work, and recycle them for the next 4 tasks.
Creating a thread is an expensive operation that requires a call into the OS kernel, so you do not want to do it 500 times. It is too costly. Also, the minimum reserved memory for a thread stack in Windows is 1 MB, so that is 500 MB in stack space alone for your threads.
The optimal number of threads is usually equal to the number of cores in your machine; since that is rarely achievable in practice, you can double or triple it, but then you're better off with a thread pool, where threads are recycled instead of created anew every time.
Even though this probably won't solve your problem, instead of creating 500 simultaneous threads you should just use the ThreadPool, which manages threads in a much more efficient way:
foreach (System.Messaging.Message m in msgs)
{
    byte[] bytes = new byte[m.BodyStream.Length];
    m.BodyStream.Read(bytes, 0, (int)m.BodyStream.Length);
    System.Text.ASCIIEncoding ascii = new System.Text.ASCIIEncoding();
    ParserClass tst = new ParserClass(ascii.GetString(bytes, 0, (int)m.BodyStream.Length));
    ThreadPool.QueueUserWorkItem(x => tst.ProcessXML());
}
And to make sure they run as simultaneously as possible change your code in the ParserClass like this (assuming you indeed have resources you share between threads - if you don't have any, you don't have to lock at all):
private static object thLockMe = new object();

public string XmlString { get; set; }

public ParserClass(string xmlString)
{
    XmlString = xmlString;
}

public void ProcessXML()
{
    XDocument reader = XDocument.Parse(XmlString);
    // some more code which doesn't need to access the shared resource
    lock (thLockMe)
    {
        // the necessary code to access the shared resource (and only that)
    }
    // more code
}
Regarding your actual question:
Instead of calling OddService.InsertEvent(...) multiple times with the same parameters (that method reeks of remote calls and side effects...), you should call it once, store the result in a variable, and do all subsequent operations on that variable. That way you can also conveniently check whether it's that method which sometimes returns null (when accessed concurrently?).
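In code, that refactoring is just the following; the arguments stay whatever they are in your real call:
// Call once, keep the result, and check it before using it.
var insertedEvent = OddService.InsertEvent(/* same arguments as before */);
if (insertedEvent == null)
{
    // this is where a sporadic null under concurrency would surface
    return;
}
// ...all subsequent operations work on insertedEvent...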
Edit:
Does it work if you put all calls to OddService.* in lock blocks?
I'm writing a downloader in C# and I'm stuck on the following problem: what kind of method should I use to parallelize my downloads and update my GUI?
In my first attempt, I used 4 threads and started another one at the completion of each; the main problem was that my CPU went to 100% at each new thread start.
Googling around, I found BackgroundWorker and ThreadPool. Given that I want to update my GUI with the progress of each link I'm downloading, what is the best solution?
1) Create 4 different BackgroundWorkers, attaching to each ProgressChanged event a delegate to a function in my GUI that updates the progress?
2) Use the ThreadPool, setting the max and min number of threads to the same value?
If I choose #2, when there are no more items in the queue, does it stop the 4 working threads? Does it suspend them? Since I have to download different lists of links (20 links each) and move from one to the next when one is completed, does the ThreadPool start and stop threads between each list?
If I decide to use the ThreadPool and want to change the number of working threads on the fly, say from 10 threads to 6, does it throw an exception and stop 4 random threads?
This is the only part that is giving me a headache.
I thank each of you in advance for your answers.
I would suggest using WebClient.DownloadFileAsync for this. You can have multiple downloads going, each raising the DownloadProgressChanged event as it goes along, and DownloadFileCompleted when done.
You can control the concurrency by using a queue with a semaphore or, if you're using .NET 4.0, a BlockingCollection. For example:
// Information used in callbacks.
class DownloadArgs
{
    public readonly string Url;
    public readonly string Filename;
    public readonly WebClient Client;

    public DownloadArgs(string u, string f, WebClient c)
    {
        Url = u;
        Filename = f;
        Client = c;
    }
}

const int MaxClients = 4;

// create a queue that allows the max items
BlockingCollection<WebClient> ClientQueue = new BlockingCollection<WebClient>(MaxClients);

// queue of urls to be downloaded (unbounded)
Queue<string> UrlQueue = new Queue<string>();

// create four WebClient instances and put them into the queue
for (int i = 0; i < MaxClients; ++i)
{
    var cli = new WebClient();
    cli.DownloadProgressChanged += DownloadProgressChanged;
    cli.DownloadFileCompleted += DownloadFileCompleted;
    ClientQueue.Add(cli);
}

// Fill the UrlQueue here

// Now go until the UrlQueue is empty
while (UrlQueue.Count > 0)
{
    WebClient cli = ClientQueue.Take(); // blocks if there is no client available
    string url = UrlQueue.Dequeue();
    string fname = CreateOutputFilename(url); // or however you get the output file name
    cli.DownloadFileAsync(new Uri(url), fname, new DownloadArgs(url, fname, cli));
}

void DownloadProgressChanged(object sender, DownloadProgressChangedEventArgs e)
{
    DownloadArgs args = (DownloadArgs)e.UserState;
    // Do status updates for this download
}

void DownloadFileCompleted(object sender, AsyncCompletedEventArgs e)
{
    DownloadArgs args = (DownloadArgs)e.UserState;
    // do whatever UI updates
    // now put this client back into the queue
    ClientQueue.Add(args.Client);
}
There's no need for explicitly managing threads or going to the TPL.
I think you should look into using the Task Parallel Library, which is new in .NET 4 and is designed for solving these types of problems.
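A rough sketch of that approach; the urls collection here is a placeholder for your list of links, and the degree of parallelism is just an example:
// Sketch: cap download parallelism at 4 with the TPL.
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(urls, options, url =>
{
    using (var client = new WebClient())
    {
        byte[] data = client.DownloadData(url);
        // save data to disk here, then marshal progress back to the GUI
        // thread (e.g. Control.Invoke / Dispatcher.BeginInvoke), since
        // this lambda runs on a worker thread
    }
});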
Having 100% CPU load has nothing to do with the download itself (your network is practically always the bottleneck). I would say you have to check your logic for how you wait for a download to complete.
Can you post some of the code that the threads you start multiple times actually run?
By creating 4 different BackgroundWorkers you create separate threads that no longer interfere with your GUI. BackgroundWorkers are simple to implement and, from what I understand, will do exactly what you need.
Personally I would do this, and simply not let the next one start until the previous one has finished. (Or maybe use just one, and let it execute one download at a time, in the correct order.)
FYI - BackgroundWorker
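A minimal sketch of option 1; progressBar and the URL are placeholders, and a real downloader would read the response in chunks so it can report intermediate progress:
// Sketch: one BackgroundWorker per download. ProgressChanged and
// RunWorkerCompleted are raised on the UI thread, so the handlers can
// touch controls directly.
var worker = new BackgroundWorker { WorkerReportsProgress = true };

worker.DoWork += (s, e) =>
{
    var w = (BackgroundWorker)s;
    string url = (string)e.Argument;
    using (var client = new WebClient())
    {
        // DownloadData is used for brevity; read in chunks and call
        // w.ReportProgress(percent) to get real intermediate progress.
        e.Result = client.DownloadData(url);
    }
    w.ReportProgress(100);
};

worker.ProgressChanged += (s, e) =>
    progressBar.Value = e.ProgressPercentage; // progressBar is a placeholder control

worker.RunWorkerCompleted += (s, e) =>
{
    // start the next download from the list here
};

worker.RunWorkerAsync("http://example.com/file.zip");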