What is the fastest way to download web pages'es sources? - c#

I need to download numerous web pages' sources. So I need to do that as fast as possible. Here is my codes.
private static async Task<string> downloadsource(string link)
{
ServicePointManager.Expect100Continue = false;
WebRequest req = WebRequest.Create(link);
req.Proxy = null;
req.Method = "GET";
WebResponse res = await siteyeBaglantiTalebi.GetResponseAsync();
StreamReader read = new StreamReader(res.GetResponseStream());
return read.ReadToEnd();
}
List<string> links = new List<string>(){... including some web page links};
private static List<string> source_list(List<string> links)
{
List<string> sources = new List<string>();
for (int i = 0; i < links.Count; i++)
{
Task<string> _task = downloadsource(links[i]);
Console.WriteLine("Downloaded : " + i);
sources.Add(_task.Result);
}
return sources;
}
I was wondering if this code is the fastest way or it can be enhanced.
Can u pls help me with that ?

You are performing a _task.Result call inside each loop. Your code will run as fast as if you downloaded each page one after another if you code it like that.
Try this instead:
private async static Task<List<string>> source_list(List<string> links)
{
List<Task<string>> sources = new List<Task<string>>();
for (int i = 0; i < links.Count; i++)
{
Task<string> _task = downloadsource(links[i]);
Console.WriteLine("Downloading : " + i);
sources.Add(_task);
}
return (await Task.WhenAll(sources)).ToList();
}
This would be even better:
private async static Task<string[]> source_list(List<string> links)
{
return await Task.WhenAll(links.Select(l => downloadsource(l)));
}
Also I cleaned up your downloadsource method:
private static async Task<string> downloadsource(string link)
{
ServicePointManager.Expect100Continue = false;
WebRequest req = WebRequest.Create(link);
req.Proxy = null;
req.Method = "GET";
using (WebResponse res = await req.GetResponseAsync())
{
using (StreamReader read = new StreamReader(res.GetResponseStream()))
{
return read.ReadToEnd();
}
}
}

Related

Return multiple Json responses

I have built this API which returns JSON responses from Google place API to store/save it to the database and as this code sample shows it's List of PlaceId so I've written a For loop to loop every PlaceId and return them all then to the next point which is to post them to the database,
public class portal_teilnehmerController : ControllerBase
{
private readonly _0046696KContext _context;
private const string apiKey = #"apiKey";
private const string fields = "&fields=address_component,rating,reviews,user_ratings_total,website";
WebRequest request;
WebResponse response;
Stream data;
StreamReader reader;
private Task<string> responseFromServer;
private string[] JsonResponses = { };
public portal_teilnehmerController(_0046696KContext context)
{
_context = context;
}
[HttpGet]
[Produces("application/json")]
public async Task<JsonResult> Getportal_teilnehmerByPlaceId()
{
var PlaceId = await _context.portal_teilnehmer.Select(x => x.PlaceId).ToListAsync();
if (PlaceId == null)
{
throw new InvalidOperationException("This Portal ID not found, please be assure of your portal ID");
}
for (int i = 0; i < PlaceId.Count(); i++)
{
string url = #"https://maps.googleapis.com/maps/api/place/details/json?place_id=" + (PlaceId[i]) + (fields) + (apiKey);
request = WebRequest.Create(url);
response = await request.GetResponseAsync();
data = response.GetResponseStream();
reader = new StreamReader(data);
string timeStamp = GetTimestamp(DateTime.Now);
responseFromServer = reader.ReadToEndAsync();
}
return new JsonResult(await responseFromServer);
}
but what happens when I test, it just returns me the last PlaceId response of the For loop.
Any ideas on how to return them all to maybe array and save them?
Any ideas on how to return them all to maybe array and save them?
It's probably easier to use a List:
var responsesFromServer = new List<string>();
//^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
for (int i = 0; i < PlaceId.Count(); i++)
{
string url = #"https://maps.googleapis.com/maps/api/place/details/json?place_id=" + (PlaceId[i]) + (fields) + (apiKey);
request = WebRequest.Create(url);
response = await request.GetResponseAsync();
data = response.GetResponseStream();
reader = new StreamReader(data);
string timeStamp = GetTimestamp(DateTime.Now);
responsesFromServer.Add(await reader.ReadToEndAsync());
// ^ ^^^^^^^^^^ ^
}
return new JsonResult(responsesFromServer);
// ^
New bits underlined with a caret ^
But you could use an array, I suppose.. After all you do know how many places you're going to download..
var responsesFromServer = new string[PlaceId.Count()];
for (int i = 0; i < PlaceId.Count(); i++)
{
...
responsesFromServer[i] = await reader.ReadToEndAsync();
}
return new JsonResult(responsesFromServer);

Task.WaitAll - Results are Overridden

Inside a loop I am creating a task using Task.Run(//...). Each task contains a WebRequest. At Task.WaitAll Result of one task getting overridden by Result of another Task. Whats wrong I am doing ?
When tried with debugger it works fine. Is it because of concurrency ? How to resolve this ?
Below is my code snippet:
SomeMethod()
{
//someItemList.Count() == 5
int r = 0;
Task<MyModel>[] myTaskList= new Task<MyModel>[5];
foreach(var item in someItemList){
Task<MyModel> t = Task<MyModel>.Run(() => { return
SomeOperationWithWebRequest(item); });
myTaskList[r] = t;
r++;
}
Task.WaitAll(myTaskList); //myTaskList[0].Result...myTaskList[4].Result all are having same output.
}
MyModel SomeOperationwithWebRequest(Item){
string URL = "SomeURLFromItem";
string DATA = "DATAfromItem"
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
request.Method = "POST";
request.ContentType = "application/json";
request.ContentLength = DATA.Length;
using (Stream webStream = request.GetRequestStream())
using (StreamWriter requestWriter = new StreamWriter(webStream, System.Text.Encoding.ASCII))
{
requestWriter.Write(DATA);
}
try
{
WebResponse webResponse = request.GetResponseAsync();
using (Stream webStream = webResponse.GetResponseStream() ?? Stream.Null)
using (StreamReader responseReader = new StreamReader(webStream))
{
//response
}
catch (Exception ex)
{
}
return new MyModel() { // properties
};
}
I think it works in debugging because you dont await the async WebRequest. Try this:
private readonly List<string> _someItemList = new List<string> { "t1", "t2", "t3" };
private async Task SomeMethodAsync()
{
var myTaskList = new List<Task<MyModel>>();
int counter = 0;
foreach (var item in _someItemList)
{
var t = Task.Run(() => SomeOperationWithWebRequestAsync(counter++));
myTaskList.Add(t);
}
await Task.WhenAll(myTaskList);
}
public async Task<MyModel> SomeOperationWithWebRequestAsync(int counter)
{
//do your async request, for simplicity I just do a delay
await Task.Delay(counter * 1000);
return new MyModel {Counter = counter };
}
public class MyModel
{
public int Counter { get; set; }
}
and use it like:
await SomeMethodAsync();
Also notice Task.WhenAll vs Task.WaitAll see e.g
WaitAll vs WhenAll

Get first Http response from Parallel

I have 5 URL and I want to make a Http request for each one, And waiting for the first response that has the conditions.
List<string> urls; // url1, url2, ......
ParallelLoopResult result = Parallel.ForEach(urls, url=> GetTimeSlot(url));
private string GetTimeSlot(string url)
{
HttpWebRequest wr = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)wr.GetResponse();
string responseString = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding(response.CharacterSet)).ReadToEnd();
if (responseString.Length < 6)
return ""; //PARALEL RESUME
else
return responseString; //PARALEL ENDS
}
I need only the first response. Is it possible with Parallel or is there any better way? Thanks.
Parallel.ForEach will be fine to use especially for your use case.Just use a cancellation token to stop all other running tasks.
static void Main(string[] args)
{
var cts = new CancellationTokenSource();
var _lock = new Object();
var po = new ParallelOptions();
po.CancellationToken = cts.Token;
po.MaxDegreeOfParallelism = System.Environment.ProcessorCount;
var listOfUrls = new List<string>() { "url1", "url2" };
var responsResult = "";
try
{
Parallel.ForEach(listOfUrls, po, (url) =>
{
po.CancellationToken.ThrowIfCancellationRequested();
HttpWebRequest wr = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)wr.GetResponse();
string responseString = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding(response.CharacterSet)).ReadToEnd();
lock (_lock)
{
if (responseString.Length > 6)
{
responsResult = responseString;
cts.Cancel();
}
}
});
}
catch (OperationCanceledException e)
{
//cancellation was requested
}
finally
{
cts.Dispose();
}
}
You can use PLinq:
string firstResponse = urls
.AsParallel()
.Select(url => GetTimeSlot(url))
.FirstOrDefault(r => ! string.IsNullOrEmpty(r))
;

Multiple HttpWebRequests in c#

I have a windows form application that reads a bunch of urls(~800) from a text file to List. The application then shows the status code of all the urls.
The problem is If I run a normal for loop from 0 to list count, it takes a lot of time. I need to speed up the process to the maximum possible without blocking the UI. Below is my code
private async void button1_Click(object sender, EventArgs e)
{
System.IO.StreamReader file = new System.IO.StreamReader("urls.txt");
while ((line = file.ReadLine()) != null)
{
pages.Add(line);
}
file.Close();
for(int i=0; i<pages.Count; i++)
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(pages[i]);
int code = 0;
try
{
WebResponse response = await request.GetResponseAsync();
HttpWebResponse r = (HttpWebResponse)response;
code = (int)r.StatusCode;
}
catch (WebException we)
{
var r = ((HttpWebResponse)we.Response).StatusCode;
code = (int)r;
}
}
//add the url and status code to a datagridview
}
One approach would be to use tasks, so that you are not waiting for the last request to complete, before starting the next.
Task<int> tasks;
for (int i = 0; i < 10; i++)
{
tasks = Task.Run<int>(() =>
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(pages[i]);
int code = 0;
try
{
WebResponse response = request.GetResponse();
HttpWebResponse r = (HttpWebResponse)response;
code = (int)r.StatusCode;
}
catch (WebException we)
{
var r = ((HttpWebResponse)we.Response).StatusCode;
code = (int)r;
}
return code;
}
);
}
await tasks;

How to Convert Synchronous 4.0 to ASynchronous in C# 4.5

I am having a hard time convert the below code which i have created in 4.0 to 4.5 using HttpClient.
According to my understand i guess if i create multiple web requests in the GUI thread itself without blocking the GUI if i got with asynchronous requeest.
how to convert the below code to Asynchronous using HttpClient in 4.5
// This is what called when button is clicked
Task t3 = new Task(SpawnTask);
t3.Start();
//if noofthreads are less 50 then GUI is woking fine.. if number increases then takes much time for repaint..
//where as other softwares are working without any problem even if the threads are more than 500!! in the same system
public void SpawnTask()
{
try
{
ParallelOptions po = new ParallelOptions();
po.CancellationToken = cts.Token;
po.MaxDegreeOfParallelism = noofthreads;
Parallel.ForEach(
urls,
po,
url => checkpl(url));
}
catch (Exception ex)
{
}
}
public void checkpl(string url)
{
try
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Timeout = 60*1000;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string stext = "";
using (BufferedStream buffer = new BufferedStream(response.GetResponseStream()))
{
using (StreamReader reader = new StreamReader(buffer))
{
stext = reader.ReadToEnd();
}
}
response.Close();
if (stext .IndexOf("domainname.com") != -1)
{
tfound = tfound + 1;
string lext = "Total Found : "+tfound.ToString();
label3.BeginInvoke(new InvokeDelegate(UpdateLabel), ltext);
slist.Add(url);
textBox2.BeginInvoke(new InvokeDelegate4(UpdateText), "Working Url " + url);
}
}
catch (Exception ex)
{
}
}
Since you are using .NET 4.5 you can use the new async and await keywords. Here is what it might look like.
private async void YourButton_Click(object sender, EventArgs args)
{
YourButton.Enabled = false;
try
{
var tasks = new List<Task>();
foreach (string url in Urls)
{
tasks.Add(CheckAsync(url));
}
await TaskEx.WhenAll(tasks);
}
finally
{
YourButton.Enabled = true;
}
}
private async Task CheckAsync(string url)
{
bool found = await UrlResponseContainsAsync(url, "domainname.com");
if (found)
{
slist.Add(url);
label3.Text = "Total Found: " + slist.Count.ToString();
textbox2.Text = "Working Url " + url;
}
}
private async Task<bool> UrlResponseContainsAsync(string url, string find)
{
var request = WebRequest.Create(url);
request.Timeout = 60 * 1000;
using (WebResponse response = await request.GetResponseAsync())
{
using (var buffer = new BufferedStream(response.GetResponseStream()))
using (var reader = new StreamReader(buffer))
{
string text = reader.ReadToEnd();
return text.Contains(find);
}
}
}

Categories