Multiple parallel calls to a WCF service take longer than a single call - c#

I'm testing WCF concurrency and instancing.
Here is the WCF service:
public class Service1 : IService1
{
    public string GetData(int value)
    {
        Thread.Sleep(1000);
        return string.Format("You entered: {0}", value);
    }
}
From my Windows Forms application I call this service method. When I make a single call, it takes approximately 1 second, as expected.
private void single_Click(object sender, EventArgs e)
{
    using (var service = new Service1Client())
    {
        var sw = new Stopwatch();
        sw.Start();
        service.GetData(1);
        sw.Stop();
        Debug.WriteLine(sw.Elapsed);
    }
}
But when I call it multiple times with Tasks, it takes approximately call count * 1 second.
private void mult_Click(object sender, EventArgs e)
{
    using (var service = new Service1Client())
    {
        var tasks = new List<Task<string>>();
        for (var i = 0; i < 5; i++)
        {
            int p = i;
            tasks.Add(Task.Factory.StartNew(() => service.GetData(p)));
        }

        var sw = new Stopwatch();
        sw.Start();
        Task.WaitAll(tasks.ToArray());
        sw.Stop();
        Debug.WriteLine(sw.Elapsed);

        foreach (var task in tasks)
        {
            Debug.WriteLine(task.Result);
        }
    }
}
I've tried all 9 combinations of instancing and concurrency (e.g. InstanceContextMode = PerCall with ConcurrencyMode = Single).
The interesting thing is that if I create a new ServiceClient object for each Task, it works fine, but I don't think that is the right approach. I feel there must be something I missed. If so, can you tell me what exactly?

The issue is on the client side.
You have to explicitly call Open() on the Service1Client object before making any calls to the service. Otherwise your WCF client proxy will internally make a call to EnsureOpened(). The problem is specifically that EnsureOpened() causes each request to wait until the previous request has completed before executing, which is why only one request is sent out at a time rather than in parallel as desired.
Change your code like this:
using (var service = new Service1Client())
{
    service.Open();
    // Do stuff...
}
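Applied to the test from the question, the multi-call handler would look something like this (the only change is the explicit Open() before the tasks are started):

private void mult_Click(object sender, EventArgs e)
{
    using (var service = new Service1Client())
    {
        // Open the channel explicitly so the requests below are not
        // serialized by WCF's auto-open machinery.
        service.Open();

        var tasks = new List<Task<string>>();
        for (var i = 0; i < 5; i++)
        {
            int p = i;
            tasks.Add(Task.Factory.StartNew(() => service.GetData(p)));
        }
        Task.WaitAll(tasks.ToArray());
    }
}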
From Wenlong Dong's excellent blog post on the subject:
If you don’t call the “Open” method first, the proxy would be opened internally when the first call is made on the proxy. This is called auto-open. Why? When the first message is sent through the auto-opened proxy, it will cause the proxy to be opened automatically. You can use .NET Reflector to open the method System.ServiceModel.Channels.ServiceChannel.Call and see the following code:
if (!this.explicitlyOpened)
{
    this.EnsureDisplayUI();
    this.EnsureOpened(rpc.TimeoutHelper.RemainingTime());
}
When you drill down into EnsureOpened, you will see that it calls CallOnceManager.CallOnce. For non-first calls, you would hit SyncWait.Wait, which waits for the first request to complete. This mechanism ensures that all requests wait for the proxy to be opened and it also ensures the correct execution order. Thus all requests are serialized into a single execution sequence until all requests are drained out from the queue. This is not a desired behavior in most cases.


Why calls from the same client are not executed at the same time

I have an Azure Function that contains only two lines of code: the first is await Task.Delay(5000) and the second returns status OK to the client. In host.json, maxOutstandingRequests and maxConcurrentRequests are set. The function was executed locally, and I also tried the deployed version. The problem occurs when I send multiple requests from the same HttpClient: the function calls are not concurrent, and the execution times per call are 5, 10, 15, 20 and 25 s. When I run the same client code against a Web API instead (the WebApi controller has the same function body as the AF), each call takes 5 s. I want to know how I can get the same behavior with the Azure Function as I have with the Web API.
The client code is provided below.
class Program
{
    private static System.Net.Http.HttpClient httpClient = new System.Net.Http.HttpClient();

    static void Main(string[] args)
    {
        IEnumerable<TimeSpan> result = Test(10);
        Console.WriteLine(string.Join("\n", result));
        Console.ReadLine();
    }

    private static IEnumerable<TimeSpan> Test(int taskCount)
    {
        Task<TimeSpan>[] tasks = new Task<TimeSpan>[taskCount];
        for (int i = 0; i < taskCount; i++)
            tasks[i] = Send();
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result);
    }

    private static async Task<TimeSpan> Send()
    {
        using (var request = new HttpRequestMessage(HttpMethod.Post, ConfigurationManager.AppSettings["AF"]))
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            using (var response = await httpClient.SendAsync(request))
            {
                sw.Stop();
                return sw.Elapsed;
            }
        }
    }
}
I did some research on this issue. First, I added a line to your client code: Console.WriteLine("--" + sw.Elapsed);
Then I ran the code and saw that the 10 stopwatch start times are printed at the same moment.
So all 10 requests start together, and the staggered results of 5 s, 10 s, 15 s, 20 s and 25 s mean the responses are coming back one after another.
After that, to test the concurrency of the Azure Function, we must set the relevant properties in host.json (I think you have already done this).
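For reference, a minimal host.json sketch with those throttles set; the values here are placeholders, and on the v1 runtime the http block sits at the top level rather than under extensions:

{
  "version": "2.0",
  "extensions": {
    "http": {
      "maxOutstandingRequests": 200,
      "maxConcurrentRequests": 100
    }
  }
}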
Then I ran the client code in Visual Studio and got the result shown in the screenshot below.
I'm not sure if you have the same problem. If you do, you can install Fiddler4 (according to some research, the behavior may be caused by the proxy). So I installed Fiddler4 and then ran the client code in Visual Studio again.
It shows that the code sends the 10 requests successfully and the sw.Elapsed times returned are what we expected.
The 10 lines of 6.xx seconds prove that the Azure Function runs the tasks concurrently.

How to use tasks, background threads or another method to solve a hanging issue in Windows Service

I was wondering about the best way to get around this issue.
I have created a Windows Service that connects to a mailbox, processes the emails, then cleans up after itself, waits a certain amount of time and repeats.
protected override void OnStart(string[] args)
{
    this._mainTask = new Task(this.Poll, this._cancellationToken.Token, TaskCreationOptions.LongRunning);
    this._mainTask.Start();
}

private void Poll()
{
    CancellationToken cancellation = this._cancellationToken.Token;
    TimeSpan interval = TimeSpan.Zero;

    while (!cancellation.WaitHandle.WaitOne(interval))
    {
        using (IImapClient emailClient = new S22ImapClient())
        {
            ImapClientSettings chatSettings = ...;
            emailClient.Connect(chatSettings); // CAN SOMETIMES HANG HERE

            // SOME WORK DONE HERE
        }

        interval = this._waitAfterSuccessInterval;

        // check the cancellation state.
        if (cancellation.IsCancellationRequested)
        {
            break;
        }
    }
}
Now I am using a 3rd party IMAP client, "S22.Imap". On occasion, the email client object will hang during creation as it attempts to log in. This in turn hangs my Windows Service indefinitely.
public class S22ImapClient : IImapClient
{
    private ImapClient _client;

    public void Connect(ImapClientSettings imapClientSettings)
    {
        this._client = new ImapClient(
            imapClientSettings.Host,
            imapClientSettings.Port,
            imapClientSettings.EmailAddress,
            imapClientSettings.Password,
            AuthMethod.Login,
            true);
    }
}
How would I change the S22ImapClient.Connect() call so that, behind the covers, it attempts to connect for a set amount of time and then aborts if it has not been able to?
The solution to this will also be used for anything else I need to do with the mail client, for example GetMessage(), DeleteMessage(), etc.
You could use a cancellation token source and give it a time after which to cancel, in the event that the connect hangs too long. Otherwise you would have to extend the third-party class and implement an async version of the Connect method. This is untested, but it should give you the basic idea.
private void Poll()
{
    CancellationToken cancellation = this._cancellationToken.Token;
    TimeSpan interval = TimeSpan.Zero;

    while (!cancellation.WaitHandle.WaitOne(interval))
    {
        using (IImapClient emailClient = new S22ImapClient())
        {
            ImapClientSettings chatSettings = ...;

            // A separate source per attempt, cancelled automatically if the
            // connect hangs for too long.
            using (var timeout = new CancellationTokenSource(TimeSpan.FromSeconds(5)))
            {
                var task = Task.Run(() =>
                {
                    emailClient.Connect(chatSettings); // CAN SOMETIMES HANG HERE
                }, timeout.Token);

                try
                {
                    task.Wait(timeout.Token); // give up once the timeout fires
                }
                catch (OperationCanceledException)
                {
                    continue; // the connect attempt hung; try again next iteration
                }

                // SOME WORK DONE HERE
            }
        }

        interval = this._waitAfterSuccessInterval;

        // check the cancellation state.
        if (cancellation.IsCancellationRequested)
        {
            break;
        }
    }
}
I decided to stop using the S22.Imap email client for this particular problem and switched to another 3rd party component, ActiveUp MailSystem, since it includes async calls out of the box.
This way I can write code like this:
IAsyncResult connectResult = this._client.BeginConnectSsl(imapClientSettings.Host, imapClientSettings.Port, null);
if (!connectResult.AsyncWaitHandle.WaitOne(this._connectionTimeout))
{
    throw new EmailTimeoutException(this._connectionTimeout);
}
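To cover the other operations mentioned above (GetMessage(), DeleteMessage(), etc.), the same idea can be generalized. This is an untested sketch; WithTimeout and _operationTimeout are names invented here, while EmailTimeoutException comes from the snippet above:

private T WithTimeout<T>(Func<T> operation)
{
    // Run the blocking call on the thread pool and give up after the timeout.
    // Note the abandoned call keeps running in the background; it cannot be aborted.
    var task = Task.Run(operation);
    if (!task.Wait(this._operationTimeout))
    {
        throw new EmailTimeoutException(this._operationTimeout);
    }
    return task.Result;
}

// Usage, for example:
// var message = this.WithTimeout(() => this._client.GetMessage(uid));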

Processing Messages in Parallel in Azure Service Bus

Problem: I've got tons of emails to send; presently, there are an average of 10 emails in the queue at any point in time. The code I have processes the queue one message at a time; that is, it receives the message, processes it and eventually sends the email. This causes a considerable delay in sending emails to users when they sign up for the service.
I've begun to think of modifying the code to process the messages in parallel, say 5 asynchronously. I'm imagining writing a method and using the CTP to call this method in parallel, say, 5 times.
I'm a little bit lost on how to implement this. The cost of making a mistake is exceedingly great, as users will get disappointed if things go wrong.
Request: I need help writing code that processes messages from an Azure Service Bus queue in parallel.
Thanks.
My code in a nutshell.
Public .. Run()
{
    _myQueueClient.BeginReceive(ProcessUrgentEmails, _myQueueClient);
}

void ProcessUrgentEmails(IAsyncResult result)
{
    //casted the `result` as a QueueClient
    //Used EndReceive on an object of BrokeredMessage
    //I processed the message, then called
    sendEmail.BeginComplete(ProcessEndComplete, sendEmail);
}

//This method is never called despite having it as callback function above.
void ProcessEndComplete(IAsyncResult result)
{
    Trace.WriteLine("ENTERED ProcessEndComplete method...");
    var bm = result.AsyncState as BrokeredMessage;
    bm.EndComplete(result);
}
This page gives you performance tips for using Windows Azure Service Bus.
About parallel processing, you could have a pool of threads for processing, and every time you get a message, you grab one from that pool and assign it the message; you need to manage that pool. A minimal sketch of that idea follows below.
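This sketch assumes the older Microsoft.ServiceBus.Messaging API used in the sample further down; stopping, queueClient, ProcessMessage and the limit of 5 are all illustrative:

// Cap concurrent processing with a semaphore: each received message takes
// a slot, is handed to the thread pool, and releases the slot when done.
var slots = new SemaphoreSlim(5);
while (!stopping)
{
    BrokeredMessage message = queueClient.Receive(TimeSpan.FromSeconds(10));
    if (message == null) continue; // receive timed out, try again

    slots.Wait(); // wait for a free worker slot
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try
        {
            ProcessMessage(message); // your email-sending logic
            message.Complete();
        }
        finally
        {
            slots.Release(); // free the slot for the next message
        }
    });
}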
OR, you could retrieve multiple messages at once and process them using the TPL... for example, the BeginReceiveBatch/EndReceiveBatch methods allow you to retrieve multiple "items" from the queue (async) and then use AsParallel to convert the returned IEnumerable and process the messages in multiple threads.
VERY simple and BARE BONES sample:
var messages = await Task.Factory.FromAsync<IEnumerable<BrokeredMessage>>(Client.BeginReceiveBatch(3, null, null), Client.EndReceiveBatch);
messages.AsParallel().WithDegreeOfParallelism(3).ForAll(item =>
{
    ProcessMessage(item);
});
That code retrieves 3 messages from the queue and processes them in "3 threads" (note: it is not guaranteed to use 3 threads; .NET will analyze the system resources and use up to 3 threads if necessary).
You could also remove the "WithDegreeOfParallelism" part and .NET will use whatever threads it needs.
At the end of the day there are multiple ways to do it, you have to decide which one works better for you.
UPDATE: Sample without using ASYNC/AWAIT
This is a basic (without error checking) sample using regular Begin/End Async pattern.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Net;
using System.Threading;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;

namespace WorkerRoleWithSBQueue1
{
    public class WorkerRole : RoleEntryPoint
    {
        // The name of your queue
        const string QueueName = "QUEUE_NAME";
        const int MaxThreads = 3;

        // QueueClient is thread-safe. Recommended that you cache
        // rather than recreating it on every request
        QueueClient Client;
        bool IsStopped;
        int dequeueRequests = 0;

        public override void Run()
        {
            while (!IsStopped)
            {
                // Increment Request Counter
                Interlocked.Increment(ref dequeueRequests);
                Trace.WriteLine(dequeueRequests + " request(s) in progress");
                Client.BeginReceive(new TimeSpan(0, 0, 10), ProcessUrgentEmails, Client);

                // If we have made too many requests, wait for them to finish before requesting again.
                while (dequeueRequests >= MaxThreads && !IsStopped)
                {
                    System.Diagnostics.Trace.WriteLine(dequeueRequests + " requests in progress, waiting before requesting more work");
                    Thread.Sleep(2000);
                }
            }
        }
        void ProcessUrgentEmails(IAsyncResult result)
        {
            var qc = result.AsyncState as QueueClient;
            var sendEmail = qc.EndReceive(result);

            // We have received a message or timed out... either way we decrease our counter
            Interlocked.Decrement(ref dequeueRequests);

            // If we have a message, process it
            if (sendEmail != null)
            {
                var r = new Random();

                // Process the message
                Trace.WriteLine("Processing message: " + sendEmail.MessageId);
                System.Threading.Thread.Sleep(r.Next(10000));

                // Mark it as completed
                sendEmail.BeginComplete(ProcessEndComplete, sendEmail);
            }
        }

        void ProcessEndComplete(IAsyncResult result)
        {
            var bm = result.AsyncState as BrokeredMessage;
            bm.EndComplete(result);
            Trace.WriteLine("Completed message: " + bm.MessageId);
        }
        public override bool OnStart()
        {
            // Set the maximum number of concurrent connections
            ServicePointManager.DefaultConnectionLimit = 12;

            // Create the queue if it does not exist already
            string connectionString = CloudConfigurationManager.GetSetting("Microsoft.ServiceBus.ConnectionString");
            var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
            if (!namespaceManager.QueueExists(QueueName))
            {
                namespaceManager.CreateQueue(QueueName);
            }

            // Initialize the connection to Service Bus Queue
            Client = QueueClient.CreateFromConnectionString(connectionString, QueueName);
            IsStopped = false;
            return base.OnStart();
        }

        public override void OnStop()
        {
            // Wait for all requests to finish (or time out) before closing
            while (dequeueRequests > 0)
            {
                System.Diagnostics.Trace.WriteLine(dequeueRequests + " request(s), waiting before stopping");
                Thread.Sleep(2000);
            }

            // Close the connection to Service Bus Queue
            IsStopped = true;
            Client.Close();
            base.OnStop();
        }
    }
}
Hope it helps.

Multi-threaded async web service call in c# .net 3.5

I have 2 ASP.NET 3.5 asmx web services, ws2 and ws3. They contain operations op21 and op31 respectively; op21 sleeps for 2 seconds and op31 sleeps for 3 seconds. I want to call both op21 and op31 asynchronously from op11 in a web service, ws1, such that when I call op11 from a client synchronously, the total time taken will be 3 seconds. I currently get 5 seconds with this code:
WS2SoapClient ws2 = new WS2SoapClient();
WS3SoapClient ws3 = new WS3SoapClient();

//capture time
DateTime now = DateTime.Now;

//make calls
IAsyncResult result1 = ws3.BeginOP31(null, null);
IAsyncResult result2 = ws2.BeginOP21(null, null);
WaitHandle[] handles = { result1.AsyncWaitHandle, result2.AsyncWaitHandle };
WaitHandle.WaitAll(handles);

//calculate time difference
TimeSpan ts = DateTime.Now.Subtract(now);
return "Asynchronous Execution Time (h:m:s:ms): " + String.Format("{0}:{1}:{2}:{3}",
    ts.Hours,
    ts.Minutes,
    ts.Seconds,
    ts.Milliseconds);
The expected result is that the total time for both requests should equal the time taken by the slower request.
Note that this works as expected when I debug it with Visual Studio; however, when running on IIS, the time is 5 seconds, which seems to show the requests are not processed concurrently.
My question is, is there a specific configuration with IIS and the ASMX web services that might need to be set up properly for this to work as expected?
Original Answer:
I tried this with google.com and bing.com and am getting the same thing: linear execution. The problem is that you are starting the BeginOP() calls on the same thread, and the IAsyncResult (for whatever reason) is not returned until the call is completed. Kind of useless.
My pre-TPL multi-threading is a bit rusty, but I tested the code at the end of this answer and it executes asynchronously. This is a .NET 3.5 console app. Note I obviously abstracted some of your code but made the classes look the same.
Update:
I started second-guessing myself because my execution times were so close to each other that it was confusing. So I rewrote the test a little bit to include both your original code and my suggested code using Thread.Start(). Additionally, I added Thread.Sleep(N) in the WebRequest methods so that they simulate vastly different execution times for the requests.
The test results do show that the code you posted was sequentially executed as I stated above in my original answer.
Note the total time is much longer in both cases than the actual web request time because of the Thread.Sleep(). I also added the Thread.Sleep() to offset the fact that the first web request to any site takes a long time to spin up (9 seconds), as can be seen above. Either way you slice it, it's clear that the times are sequential in the "old" case and truly "asynchronous" in the new case.
The updated program for testing this out:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading;

namespace MultiThreadedTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Test both ways of executing IAsyncResult web calls
            ExecuteUsingWaitHandles();
            Console.WriteLine();
            ExecuteUsingThreadStart();
            Console.ReadKey();
        }
        private static void ExecuteUsingWaitHandles()
        {
            Console.WriteLine("Starting to execute using wait handles (old way) ");
            WS2SoapClient ws2 = new WS2SoapClient();
            WS3SoapClient ws3 = new WS3SoapClient();
            IAsyncResult result1 = null;
            IAsyncResult result2 = null;

            // Time the threads
            var stopWatchBoth = System.Diagnostics.Stopwatch.StartNew();
            result1 = ws3.BeginOP31();
            result2 = ws2.BeginOP21();
            WaitHandle[] handles = { result1.AsyncWaitHandle, result2.AsyncWaitHandle };
            WaitHandle.WaitAll(handles);
            stopWatchBoth.Stop();

            // Display execution time of individual calls
            Console.WriteLine((result1.AsyncState as StateObject));
            Console.WriteLine((result2.AsyncState as StateObject));

            // Display time for both calls together
            Console.WriteLine("Asynchronous Execution Time for both is {0}", stopWatchBoth.Elapsed.TotalSeconds);
        }
        private static void ExecuteUsingThreadStart()
        {
            Console.WriteLine("Starting to execute using thread start (new way) ");
            WS2SoapClient ws2 = new WS2SoapClient();
            WS3SoapClient ws3 = new WS3SoapClient();
            IAsyncResult result1 = null;
            IAsyncResult result2 = null;

            // Create threads to execute the methods asynchronously
            Thread startOp3 = new Thread(() => result1 = ws3.BeginOP31());
            Thread startOp2 = new Thread(() => result2 = ws2.BeginOP21());

            // Time the threads
            var stopWatchBoth = System.Diagnostics.Stopwatch.StartNew();

            // Start the threads
            startOp2.Start();
            startOp3.Start();

            // Make this thread wait until both of those threads are complete
            startOp2.Join();
            startOp3.Join();
            stopWatchBoth.Stop();

            // Display execution time of individual calls
            Console.WriteLine((result1.AsyncState as StateObject));
            Console.WriteLine((result2.AsyncState as StateObject));

            // Display time for both calls together
            Console.WriteLine("Asynchronous Execution Time for both is {0}", stopWatchBoth.Elapsed.TotalSeconds);
        }
    }
    // Class representing your WS2 client
    internal class WS2SoapClient : TestWebRequestAsyncBase
    {
        public WS2SoapClient() : base("http://www.msn.com/") { }

        public IAsyncResult BeginOP21()
        {
            Thread.Sleep(TimeSpan.FromSeconds(10D));
            return BeginWebRequest();
        }
    }

    // Class representing your WS3 client
    internal class WS3SoapClient : TestWebRequestAsyncBase
    {
        public WS3SoapClient() : base("http://www.google.com/") { }

        public IAsyncResult BeginOP31()
        {
            // Added sleep here to simulate a much longer request, which should make it obvious if the times are overlapping or sequential
            Thread.Sleep(TimeSpan.FromSeconds(20D));
            return BeginWebRequest();
        }
    }
    // Base class that makes the web request
    internal abstract class TestWebRequestAsyncBase
    {
        public StateObject AsyncStateObject;
        protected string UriToCall;

        public TestWebRequestAsyncBase(string uri)
        {
            AsyncStateObject = new StateObject()
            {
                UriToCall = uri
            };
            this.UriToCall = uri;
        }

        protected IAsyncResult BeginWebRequest()
        {
            WebRequest request = WebRequest.Create(this.UriToCall);
            AsyncCallback callBack = new AsyncCallback(onCompleted);
            AsyncStateObject.WebRequest = request;
            AsyncStateObject.Stopwatch = System.Diagnostics.Stopwatch.StartNew();
            return request.BeginGetResponse(callBack, AsyncStateObject);
        }

        void onCompleted(IAsyncResult result)
        {
            this.AsyncStateObject = (StateObject)result.AsyncState;
            this.AsyncStateObject.Stopwatch.Stop();
            var webResponse = this.AsyncStateObject.WebRequest.EndGetResponse(result);
            Console.WriteLine("{0} {1}", webResponse.ContentType, webResponse.ResponseUri);
        }
    }
    // Keep stopwatch on state object for illustration of individual execution time
    internal class StateObject
    {
        public System.Diagnostics.Stopwatch Stopwatch { get; set; }
        public WebRequest WebRequest { get; set; }
        public string UriToCall;

        public override string ToString()
        {
            return string.Format("Request to {0} executed in {1} seconds", this.UriToCall, Stopwatch.Elapsed.TotalSeconds);
        }
    }
}
There is some throttling in your system. Probably the service is configured for only one concurrent caller, which is a common reason (WCF ConcurrencyMode). There might also be HTTP-level connection limits (ServicePointManager.DefaultConnectionLimit) or WCF throttling on the server.
Use Fiddler to determine whether both requests are being sent simultaneously. Use the debugger to break on the server and see whether both calls are running simultaneously.
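If the client-side connection limit turns out to be the culprit, a one-line sketch of the fix is to raise it before any proxies are created (the classic default is 2 concurrent HTTP connections per host for non-browser clients; 12 here is arbitrary):

// Raise the per-host limit on outgoing HTTP connections.
ServicePointManager.DefaultConnectionLimit = 12;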

Mass Downloading of Webpages C#

My application requires that I download a large amount of webpages into memory for further parsing and processing. What is the fastest way to do it? My current method (shown below) seems to be too slow and occasionally results in timeouts.
for (int i = 1; i <= pages; i++)
{
    string page_specific_link = baseurl + "&page=" + i.ToString();

    try
    {
        WebClient client = new WebClient();
        var pagesource = client.DownloadString(page_specific_link);
        client.Dispose();
        sourcelist.Add(pagesource);
    }
    catch (Exception)
    {
    }
}
The way you approach this problem is going to depend very much on how many pages you want to download, and how many sites you're referencing.
I'll use a good round number like 1,000. If you want to download that many pages from a single site, it's going to take a lot longer than if you want to download 1,000 pages that are spread out across dozens or hundreds of sites. The reason is that if you hit a single site with a whole bunch of concurrent requests, you'll probably end up getting blocked.
So you have to implement a type of "politeness policy" that issues a delay between multiple requests to a single site. The length of that delay depends on a number of things. If the site's robots.txt file has a crawl-delay entry, you should respect that: if they don't want you accessing more than one page per minute, then that's as fast as you should crawl. If there's no crawl-delay, you should base your delay on how long it takes the site to respond. For example, if you can download a page from the site in 500 milliseconds, you set your delay to X; if it takes a full second, set your delay to 2X. You can probably cap your delay at 60 seconds (unless crawl-delay is longer), and I would recommend that you set a minimum delay of 5 to 10 seconds.
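As a rough sketch of that policy (ComputeDelay is a name made up here, and the multiplier X from the paragraph above is left as a parameter):

static TimeSpan ComputeDelay(TimeSpan lastResponseTime, TimeSpan? crawlDelay, double x)
{
    // Respect a robots.txt crawl-delay when present, even if it exceeds the cap.
    if (crawlDelay.HasValue)
        return crawlDelay.Value;

    // Otherwise scale the delay with how long the site took to respond.
    var delay = TimeSpan.FromMilliseconds(lastResponseTime.TotalMilliseconds * x);

    var min = TimeSpan.FromSeconds(5);   // suggested minimum of 5-10 seconds
    var max = TimeSpan.FromSeconds(60);  // suggested cap
    if (delay < min) return min;
    if (delay > max) return max;
    return delay;
}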
I wouldn't recommend using Parallel.ForEach for this. My testing has shown that it doesn't do a good job: sometimes it over-taxes the connection, and often it doesn't allow enough concurrent connections. I would instead create a queue of WebClient instances and then write something like:
// Create queue of WebClient instances
BlockingCollection<WebClient> ClientQueue = new BlockingCollection<WebClient>();
// Initialize queue with some number of WebClient instances

// now process urls
foreach (var url in urls_to_download)
{
    var worker = ClientQueue.Take();
    worker.DownloadStringAsync(url, ...);
}
When you initialize the WebClient instances that go into the queue, set their DownloadStringCompleted event handlers to point to a completed event handler. That handler should save the string to a file (or perhaps you should just use DownloadFileAsync), and then the client adds itself back to the ClientQueue; see the sketch below.
In my testing, I've been able to support 10 to 15 concurrent connections with this method. Any more than that and I run into problems with DNS resolution (DownloadStringAsync doesn't do the DNS resolution asynchronously). You can get more connections, but doing so is a lot of work.
That's the approach I've taken in the past, and it's worked very well for downloading thousands of pages quickly. It's definitely not the approach I took with my high performance Web crawler, though.
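A hypothetical completed handler for the queued-WebClient approach described above (GetFileNameFor is a placeholder for however you name output files, and the URL is assumed to have been passed as the user token of DownloadStringAsync):

void OnDownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
    var client = (WebClient)sender;
    if (e.Error == null && !e.Cancelled)
    {
        // Save the page; the Uri was passed as the userToken argument.
        File.WriteAllText(GetFileNameFor((Uri)e.UserState), e.Result);
    }

    // Return the worker to the queue so the dispatch loop can reuse it.
    ClientQueue.Add(client);
}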
I should also note that there is a huge difference in resource usage between these two blocks of code:
WebClient MyWebClient = new WebClient();
foreach (var url in urls_to_download)
{
    MyWebClient.DownloadString(url);
}

---------------

foreach (var url in urls_to_download)
{
    WebClient MyWebClient = new WebClient();
    MyWebClient.DownloadString(url);
}
The first allocates a single WebClient instance that is used for all requests. The second allocates one WebClient for each request. The difference is huge. WebClient uses a lot of system resources, and allocating thousands of them in a relatively short time is going to impact performance. Believe me ... I've run into this. You're better off allocating just 10 or 20 WebClients (as many as you need for concurrent processing), rather than allocating one per request.
Why not just use a web crawling framework? It can handle all the stuff for you (multithreading, HTTP requests, parsing links, scheduling, politeness, etc.).
Abot (https://code.google.com/p/abot/) handles all that stuff for you and is written in C#.
In addition to @David's perfectly valid answer, I want to add a slightly cleaner "version" of his approach.
var pages = new List<string> { "http://bing.com", "http://stackoverflow.com" };
var sources = new BlockingCollection<string>();

Parallel.ForEach(pages, x =>
{
    using (var client = new WebClient())
    {
        var pagesource = client.DownloadString(x);
        sources.Add(pagesource);
    }
});
Yet another approach, that uses async:
static IEnumerable<string> GetSources(List<string> pages)
{
    var sources = new BlockingCollection<string>();
    var latch = new CountdownEvent(pages.Count);

    foreach (var p in pages)
    {
        // Note: no using block here; the client must stay alive until its
        // DownloadStringCompleted event fires, so it disposes itself there.
        var wc = new WebClient();
        wc.DownloadStringCompleted += (x, e) =>
        {
            sources.Add(e.Result);
            latch.Signal();
            ((WebClient)x).Dispose();
        };
        wc.DownloadStringAsync(new Uri(p));
    }

    latch.Wait();
    return sources;
}
You should use parallel programming for this purpose.
There are a lot of ways to achieve what you want; the easiest would be something like this:
var pageList = new List<string>();
for (int i = 1; i <= pages; i++)
{
    pageList.Add(baseurl + "&page=" + i.ToString());
}

// pageList is a list of urls
Parallel.ForEach<string>(pageList, (page) =>
{
    try
    {
        WebClient client = new WebClient();
        var pagesource = client.DownloadString(page);
        client.Dispose();
        lock (sourcelist)
            sourcelist.Add(pagesource);
    }
    catch (Exception) { }
});
I had a similar case, and this is how I solved it:
using System;
using System.Threading;
using System.Collections.Generic;
using System.Net;
using System.IO;

namespace WebClientApp
{
    class MainClassApp
    {
        private static int requests = 0;
        private static object requests_lock = new object();

        public static void Main()
        {
            List<string> urls = new List<string> { "http://www.google.com", "http://www.slashdot.org" };
            foreach (var url in urls)
            {
                ThreadPool.QueueUserWorkItem(GetUrl, url);
            }

            int cur_req = 0;
            while (cur_req < urls.Count)
            {
                lock (requests_lock)
                {
                    cur_req = requests;
                }
                Thread.Sleep(1000);
            }
            Console.WriteLine("Done");
        }

        private static void GetUrl(Object the_url)
        {
            string url = (string)the_url;
            WebClient client = new WebClient();
            Stream data = client.OpenRead(url);
            StreamReader reader = new StreamReader(data);
            string html = reader.ReadToEnd();

            /// Do something with html
            Console.WriteLine(html);

            lock (requests_lock)
            {
                //Maybe you could add here the HTML to SourceList
                requests++;
            }
        }
    }
}
You should think about using parallelism, because the slowness comes from your software waiting for I/O; while one thread is waiting for I/O, another one can get started.
While the other answers are perfectly valid, all of them (at the time of this writing) are neglecting something very important: calls to the web are IO bound, and having a thread block on such an operation strains system resources for no benefit.
What you really want to do is take advantage of the async methods on the WebClient class (as some have pointed out) as well as the Task Parallel Library's ability to handle the Event-Based Asynchronous Pattern.
First, you would get the urls that you want to download:
IEnumerable<Uri> urls = pages.Select(i => new Uri(baseurl +
"&page=" + i.ToString(CultureInfo.InvariantCulture)));
Then, you would create a new WebClient instance for each url, using the TaskCompletionSource<T> class to handle the calls asynchronously (this won't burn a thread):
IEnumerable<Task<Tuple<Uri, string>>> tasks = urls.Select(url => {
    // Create the task completion source.
    var tcs = new TaskCompletionSource<Tuple<Uri, string>>();

    // The web client.
    var wc = new WebClient();

    // Attach to the DownloadStringCompleted event.
    wc.DownloadStringCompleted += (s, e) => {
        // Dispose of the client when done.
        using (wc)
        {
            // If there is an error, set it.
            if (e.Error != null)
            {
                tcs.SetException(e.Error);
            }
            // Otherwise, set cancelled if cancelled.
            else if (e.Cancelled)
            {
                tcs.SetCanceled();
            }
            else
            {
                // Set the result.
                tcs.SetResult(new Tuple<Uri, string>(url, e.Result));
            }
        }
    };

    // Start the process asynchronously, don't burn a thread.
    wc.DownloadStringAsync(url);

    // Return the task.
    return tcs.Task;
});
Now you have an IEnumerable<T> which you can convert to an array and wait on all of the results using Task.WaitAll:
// Materialize the tasks.
Task<Tuple<Uri, string>>[] materializedTasks = tasks.ToArray();

// Wait for all to complete.
Task.WaitAll(materializedTasks);
Then, you can just use the Result property on the Task<T> instances to get the pair of the url and the content:
// Cycle through each of the results.
foreach (Tuple<Uri, string> pair in materializedTasks.Select(t => t.Result))
{
    // pair.Item1 will contain the Uri.
    // pair.Item2 will contain the content.
}
Note that the above code has the caveat of not having any error handling.
If you wanted to get even more throughput, instead of waiting for the entire list to be finished, you could process the content of a single page right after it finishes downloading; Task<T> is meant to be used like a pipeline: when you've completed your unit of work, have it continue to the next one instead of waiting for all of the items to be done (when they can be done in an asynchronous manner).
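A hedged sketch of that pipelining idea, building on the tasks created above (ProcessPage is a placeholder for your parsing step):

// Attach a continuation to each download so a page is parsed as soon as it
// finishes, rather than after the whole batch completes.
Task[] pipeline = tasks
    .Select(t => t.ContinueWith(downloaded => ProcessPage(downloaded.Result.Item2)))
    .ToArray();

Task.WaitAll(pipeline);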
I am using an active thread count and an arbitrary limit:
private static volatile int activeThreads = 0;

public static void RecordData()
{
    var groupSize = 10;
    var source = db.ListOfUrls; // Thousands of urls
    var iterations = source.Length / groupSize;

    for (int i = 0; i < iterations; i++)
    {
        var subList = source.Skip(groupSize * i).Take(groupSize);
        Parallel.ForEach(subList, (item) => RecordUri(item));

        // Wait here before processing further data, to avoid overload
        while (activeThreads > 30) Thread.Sleep(100);
    }
}

private static async Task RecordUri(Uri uri)
{
    using (WebClient wc = new WebClient())
    {
        Interlocked.Increment(ref activeThreads);
        var jsonData = await wc.DownloadStringTaskAsync(uri);
        var root = JsonConvert.DeserializeObject<RootObject>(jsonData);
        RecordData(root);
        Interlocked.Decrement(ref activeThreads);
    }
}
