No results with C# web scraping - c#

I'm a beginner and I want to try to do some web scraping with C#, but with this code, it does not return any results, even though it should return a full list of items.
static void Main(string[] args)
{
GetHtmlAsync();
Console.ReadLine();
}
private static async void GetHtmlAsync()
{
var url = "https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=playstation+5&_sacat=0";
var httpClient = new HttpClient();
var html = await httpClient.GetStringAsync(url);
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html);
var ProductList = htmlDocument.DocumentNode.Descendants("ul").Where(node => node.GetAttributeValue("class", "").Equals("ListViewInner")).ToList();
}

When you use async in C# you need to be async all the way down (and up).
So, to wait for GetHtmlAsync() to finish, the caller needs to be async as well.
An async void method cannot be awaited, so the method should return a Task instead. A Task basically represents the "potential to return a value" and can be handed around irrespective of whether that potential has been reached, so you can have a Task<int> that will get you a number at some point, if you wait for it to do so.
If you want to have the result before your read line, you also need to await the result.
static async Task Main(string[] args)
{
await GetHtmlAsync();
Console.ReadLine();
}
Full Example
I don't know which implementation of HtmlDocument you are using, so one using directive below needs to be replaced (using SOURCE.OF.HTMLDOCUMENT;).
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using SOURCE.OF.HTMLDOCUMENT;
namespace ConsoleApp1
{
class Program
{
static async Task Main(string[] args)
{
await GetHtmlAsync();
Console.ReadLine();
}
private static async Task GetHtmlAsync()
{
var url = "https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1313&_nkw=playstation+5&_sacat=0";
var httpClient = new HttpClient();
var html = await httpClient.GetStringAsync(url);
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html);
var ProductList = htmlDocument.DocumentNode.Descendants("ul").Where(node => node.GetAttributeValue("class", "").Equals("ListViewInner")).ToList();
Console.WriteLine(ProductList.Count);
}
}
}
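To make the "potential to return a value" idea concrete, here is a minimal sketch. ComputeAsync and its delay are hypothetical stand-ins for the HTTP call, and nothing here touches the network.

```csharp
using System;
using System.Threading.Tasks;

static class TaskDemo
{
    // Hypothetical stand-in for a slow operation such as an HTTP request.
    public static async Task<int> ComputeAsync()
    {
        await Task.Delay(100);
        return 42;
    }

    public static async Task Main()
    {
        // The call hands back a Task<int> right away; the work continues in the background.
        Task<int> pending = ComputeAsync();
        Console.WriteLine("Started; the result may not be ready yet.");

        // Awaiting the task yields the value once it is available.
        int value = await pending;
        Console.WriteLine(value);
    }
}
```

Because ComputeAsync returns a Task<int> rather than void, the caller has something to await, which is exactly what the original async void method was missing.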

Related

Async parsing with AngleSharp

So I want to parse some data from website and I found a tutorial, here is code:
public static async void Test()
{
var config = Configuration.Default.WithDefaultLoader();
using var context = BrowsingContext.New(config);
var url = "http://webcode.me";
using var doc = await context.OpenAsync(url);
// var title = doc.QuerySelector("title").InnerHtml;
var title = doc.Title;
Console.WriteLine(title);
var pars = doc.QuerySelectorAll("p");
foreach (var par in pars)
{
Console.WriteLine(par.Text().Trim());
}
}
static void Main(string[] args)
{
Test();
}
And the program quits right after it reaches the:
using var doc = await context.OpenAsync(url);
Nothing is waiting for your asynchronous method to complete, so the program quits. You can fix this by amending to use an async main method:
static Task Main(string[] args)
{
return Test();
}
Or, if you're using a version older than C# 7.1 (where async Main is not supported):
static void Main(string[] args)
{
Test().GetAwaiter().GetResult();
}
You'll also need to change Test so that it returns a Task instead of void:
public static async Task Test()
{
// ...
}
You might find the C# 7.1 docs on async main helpful.

Consume a Web Service in C#: Remove Task.Wait()

I am trying to consume a web service. It's an XML based service. I mean response in XML format. The code is working fine. However, I do not want to use task.Wait(). Please let me know how I can replace it with async/await.
Below is my code :
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Serialization;
namespace ConsoleApp6
{
class Program
{
static void Main(string[] args)
{
Program obj = new Program();
var result = obj.GetData().Result;
}
public async Task<string> GetData()
{
string url =
"https://test.net/info.php?akey=abcd&skey=xyz";
HttpClient client = new HttpClient();
HttpResponseMessage response = client.GetAsync(url).Result;
var responseValue = string.Empty;
if (response != null)
{
Task task = response.Content.ReadAsStreamAsync().ContinueWith(t =>
{
var stream = t.Result;
using (var reader = new StreamReader(stream))
{
responseValue = reader.ReadToEnd();
}
});
task.Wait(); // How I can replace it and use await
}
return responseValue;
}
}
[XmlRoot(ElementName = "Info")]
public class Test
{
[XmlAttribute(AttributeName = "att")]
public string SomeAttribute{ get; set; }
[XmlText]
public string SomeText{ get; set; }
}
}
You already are in an async context, so just use await:
var stream = await response.Content.ReadAsStreamAsync();
using (var reader = new StreamReader(stream))
{
responseValue = reader.ReadToEnd();
}
That said, you should check all your calls:
HttpResponseMessage response = await client.GetAsync(url);
and make your Main async, too. While we are at it, make the method static:
public static async Task Main()
{
var result = await GetData();
}
where your method signature is:
public static async Task<string> GetData()
The static isn't required, but you will find that parallel and/or asynchronous programming is a lot easier if you have as few side effects as possible.
You can make the Main method async as well and await GetData:
static async Task Main(string[] args)
{
Program obj = new Program();
var result = await obj.GetData();
}
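Putting the two answers together, the read can be sketched without any Wait() or Result. This is only an illustration: ReadBodyAsync is a hypothetical helper, and an in-memory StringContent stands in for the real service response so the sample runs offline.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

static class ReadDemo
{
    // Reads the body of any HttpContent without blocking on .Result or .Wait().
    public static async Task<string> ReadBodyAsync(HttpContent content)
    {
        using (Stream stream = await content.ReadAsStreamAsync())
        using (var reader = new StreamReader(stream))
        {
            return await reader.ReadToEndAsync();
        }
    }

    public static async Task Main()
    {
        // Assumption: this payload replaces the response of client.GetAsync(url).
        using (var content = new StringContent("<Info att=\"a\">text</Info>"))
        {
            string body = await ReadBodyAsync(content);
            Console.WriteLine(body);
        }
    }
}
```

In the real program, response.Content from `await client.GetAsync(url)` would be passed to the helper instead of the in-memory content.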

Simple.OData.Client not returning results, no error [duplicate]

public class test
{
public async Task Go()
{
await PrintAnswerToLife();
Console.WriteLine("done");
}
public async Task PrintAnswerToLife()
{
int answer = await GetAnswerToLife();
Console.WriteLine(answer);
}
public async Task<int> GetAnswerToLife()
{
await Task.Delay(5000);
int answer = 21 * 2;
return answer;
}
}
if I want to call Go in main() method, how can I do that?
I am trying out c# new features, I know i can hook the async method to a event and by triggering that event, async method can be called.
But what if I want to call it directly in main method? How can i do that?
I did something like
class Program
{
static void Main(string[] args)
{
test t = new test();
t.Go().GetAwaiter().OnCompleted(() =>
{
Console.WriteLine("finished");
});
Console.ReadKey();
}
}
But it seems to be a deadlock, and nothing is printed on the screen.
Your Main method can be simplified. For C# 7.1 and newer:
static async Task Main(string[] args)
{
test t = new test();
await t.Go();
Console.WriteLine("finished");
Console.ReadKey();
}
For earlier versions of C#:
static void Main(string[] args)
{
test t = new test();
t.Go().Wait();
Console.WriteLine("finished");
Console.ReadKey();
}
This is part of the beauty of the async keyword (and related functionality): the use and confusing nature of callbacks is greatly reduced or eliminated.
Instead of Wait, you're better off using
new test().Go().GetAwaiter().GetResult()
since this will avoid exceptions being wrapped into AggregateExceptions, so you can just surround your Go() method with a try catch(Exception ex) block as usual.
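The difference can be checked with a small sketch. FailAsync is a hypothetical method that always throws; the two helpers report which exception type the caller actually sees.

```csharp
using System;
using System.Threading.Tasks;

static class ExceptionDemo
{
    // Hypothetical failing method, used only to compare the two blocking styles.
    public static async Task FailAsync()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("boom");
    }

    // Blocks with Wait() and returns the name of the exception type that surfaces.
    public static string CatchWithWait()
    {
        try { FailAsync().Wait(); return "none"; }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    // Blocks with GetAwaiter().GetResult() and returns the exception type name.
    public static string CatchWithGetResult()
    {
        try { FailAsync().GetAwaiter().GetResult(); return "none"; }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    public static void Main()
    {
        Console.WriteLine(CatchWithWait());      // AggregateException
        Console.WriteLine(CatchWithGetResult()); // InvalidOperationException
    }
}
```

With Wait(), the original exception is still available through InnerException, but a plain catch for the specific type will not match.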
Since the release of C# v7.1 async main methods have become available to use which avoids the need for the workarounds in the answers already posted. The following signatures have been added:
public static Task Main();
public static Task<int> Main();
public static Task Main(string[] args);
public static Task<int> Main(string[] args);
This allows you to write your code like this:
static async Task Main(string[] args)
{
await DoSomethingAsync();
}
static async Task DoSomethingAsync()
{
//...
}
class Program
{
static void Main(string[] args)
{
test t = new test();
Task.Run(async () => await t.Go());
}
}
If you only need the result of the returned task, you can read its Result property directly; there is no need to use GetAwaiter at all.
static async Task<String> sayHelloAsync(){
await Task.Delay(1000);
return "hello world";
}
static void Main(string[] args){
var data = sayHelloAsync();
//Result blocks until the task has completed, making the call synchronous.
//no need for Console.ReadKey()
Console.Write(data.Result);
//synchronous call .. same as previous one
Console.Write(sayHelloAsync().GetAwaiter().GetResult());
}
If you want to wait for a task to be done and then do some further processing:
sayHelloAsync().GetAwaiter().OnCompleted(() => {
Console.Write("done" );
});
Console.ReadLine();
If you are interested in getting the results from sayHelloAsync and do further processing on it:
sayHelloAsync().ContinueWith(prev => {
//prev.Result should have "hello world"
Console.Write("done do further processing here .. here is the result from sayHelloAsync" + prev.Result);
});
Console.ReadLine();
One last simple way to wait for the function:
static void Main(string[] args){
sayHelloAsync().Wait();
Console.Read();
}
static async Task sayHelloAsync(){
await Task.Delay(1000);
Console.Write( "hello world");
}
public static void Main(string[] args)
{
var t = new test();
Task.Run(async () => { await t.Go();}).Wait();
}
Use .Wait()
static void Main(string[] args){
SomeTaskManager someTaskManager = new SomeTaskManager();
Task<List<String>> task = Task.Run(() => someTaskManager.Execute());
task.Wait();
List<String> r = task.Result;
}
public class SomeTaskManager
{
public async Task<List<String>> Execute() {
// jsonEnvellope is assumed to be defined elsewhere; it holds the JSON request body.
var summaries = new List<String>();
HttpClient client = new HttpClient();
client.BaseAddress = new Uri("http://localhost:4000/");
client.DefaultRequestHeaders.Accept.Clear();
HttpContent httpContent = new StringContent(jsonEnvellope, Encoding.UTF8, "application/json");
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
HttpResponseMessage httpResponse = await client.PostAsync("", httpContent);
if (httpResponse.Content != null)
{
string responseContent = await httpResponse.Content.ReadAsStringAsync();
dynamic answer = JsonConvert.DeserializeObject(responseContent);
summaries = answer[0].ToObject<List<String>>();
}
return summaries;
}
}
Try the "Result" property. It exists only on Task<T>, so with Go() returning a plain Task, call Wait() instead:
class Program
{
static void Main(string[] args)
{
test t = new test();
t.Go().Wait();
Console.ReadKey();
}
}
C# 9 Top-level statements simplified things even more, now you don't even have to do anything extra to call async methods from your Main, you can just do this:
using System;
using System.Threading.Tasks;
await Task.Delay(1000);
Console.WriteLine("Hello World!");
For more information see What's new in C# 9.0, Top-level statements:
The top-level statements may contain async expressions. In that case, the synthesized entry point returns a Task, or Task<int>.

Async method sending response back to Main

I have implemented a soap client using a Async method. I want this method to return a string value that I get from the API server to my main Thread or to another method (whichever method is calling). How do I do this:
MAIN THREAD
static void Main(string[] args)
{
TEXT().GetAwaiter().OnCompleted(() => { Console.WriteLine("finished"); });
Console.ReadKey();
// if I do it like this
// var test = TEXT().GetAwaiter().OnCompleted(() => { Console.WriteLine("finished"); });
// it gives me error: Cannot assign void to an implicitly-typed local variable
}
ASYNC METHOD
public static async Task<string> TEXT()
{
Uri uri = new Uri("http://myaddress");
HttpClient hc = new HttpClient();
hc.DefaultRequestHeaders.Add("SOAPAction", "Some Action");
var xmlStr = "SoapContent"; //not displayed here for simplicity
var content = new StringContent(xmlStr, Encoding.UTF8, "text/xml");
using (HttpResponseMessage response = await hc.PostAsync(uri, content))
{
var soapResponse = await response.Content.ReadAsStringAsync();
string value = await response.Content.ReadAsStringAsync();
return value; //how do I get this back to the main thread or any other method
}
}
In a console application written before C# 7.1, it can be done as simply as this:
public static void Main()
{
string result = TEXT().Result;
Console.WriteLine(result);
}
In this case TEXT can be considered a usual method, which returns Task<string>, so its result is available in Result property. You don't need to mess with awaiter, results etc.
At the same time, you cannot do this in most types of applications (WinForms, WPF, ASP.NET etc.) and in this case you will have to use async/await across all your application:
public async Task SomeMethod()
{
string result = await TEXT();
// ... do something with result
}
If you plan to do a lot of async in a console application, I recommend using this sort of MainAsync pattern:
static public void Main(string[] args) //Entry point
{
MainAsync(args).GetAwaiter().GetResult();
}
static public async Task MainAsync(string[] args) //Async entry point
{
await TEXT();
Console.WriteLine("finished");
}
If you upgrade to C# 7.1 or later, you can then remove the Main method and use async main.
Or if you ever migrate this code to an ASP.NET or WinForms application, you can ignore Main and migrate the code in MainAsync (otherwise you will run afoul of the synchronization model and get deadlocked).
In C# 7.1+, you can use async Task Main:
static async Task Main(string[] args)
{
var result = await TEXT().ConfigureAwait(false);
Console.ReadKey();
}
For older versions of C#:
public static void Main(string[] args)
{
try
{
TEXT().GetAwaiter().GetResult();
}
catch (Exception ex)
{
Console.WriteLine($"There was an exception: {ex.ToString()}");
}
}

Run time error and Program exits when using - Async and await using C#

I am trying to use the concept of async and await in my program. The program abruptly exits. I am trying to get the content length from few random urls and process it and display the size in bytes of each url.
Code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Net;
namespace TestProgram
{
public class asyncclass
{
public async void MainCall() {
await SumPageSizes();
}
public async Task SumPageSizes(){
List<string> urllist = GetUrlList();
foreach (var url in urllist)
{
byte[] content = await GetContent(url);
Displayurl(content, url);
}
}
private void Displayurl(byte[] content, string url)
{
var length = content.Length;
Console.WriteLine("The bytes length for the url response " + url + " is of :" +length );
}
private async Task<byte[]> GetContent(string url)
{
var content = new MemoryStream();
try
{
var obj = (HttpWebRequest)WebRequest.Create(url);
WebResponse response = obj.GetResponse();
using (Stream stream = response.GetResponseStream())
{
await stream.CopyToAsync(content);
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace);
}
return content.ToArray();
}
private List<string> GetUrlList()
{
var urllist = new List<string>(){
"http://msdn.microsoft.com/library/windows/apps/br211380.aspx",
"http://msdn.microsoft.com",
"http://msdn.microsoft.com/en-us/library/hh290136.aspx",
"http://msdn.microsoft.com/en-us/library/ee256749.aspx",
"http://msdn.microsoft.com/en-us/library/hh290138.aspx",
"http://msdn.microsoft.com/en-us/library/hh290140.aspx",
"http://msdn.microsoft.com/en-us/library/dd470362.aspx",
"http://msdn.microsoft.com/en-us/library/aa578028.aspx",
"http://msdn.microsoft.com/en-us/library/ms404677.aspx",
"http://msdn.microsoft.com/en-us/library/ff730837.aspx"
};
return urllist;
}
}
}
Main
public static void Main(string[] args)
{
asyncclass asyncdemo = new asyncclass();
asyncdemo.MainCall();
}
MainCall is async void, so it returns to Main at its first await, before the work has completed. No other line of code follows the call, so your program ends.
To wait for it, use:
asyncdemo.MainCall().Wait();
For this to compile, you need to avoid async void and change MainCall to async Task, so that the caller can wait for it.
Since this seems to be a console application, note that Main itself could not use async and await with the compiler that was current when this was written; async Main was added later, in C# 7.1.
The problem is that you don't await an asynchronous method, and therefore your application exits before the method has ended.
In C# 7.1 and later you can create an async entry point, which lets you use the await keyword.
public static async Task Main(string[] args)
{
asyncclass asyncdemo = new asyncclass();
await asyncdemo.MainCall();
}
If you want to bubble your exceptions from MainCall you need to change the return type to Task.
public async Task MainCall()
{
await SumPageSizes();
}
If you wanted to run your code async before C# 7.1, you could do the following.
public static void Main(string[] args)
{
asyncclass asyncdemo = new asyncclass();
asyncdemo.MainCall().Wait();
// or, if `MainCall` stays `async void`, wait on the Task-returning method it wraps
//asyncdemo.SumPageSizes().Wait();
}
You have to be very careful when using async void methods. They cannot be awaited. One common, legitimate example of async void is an event handler, such as a button click that calls an awaitable method:
private async void Button_Click(object sender, RoutedEventArgs e)
{
// run task here
}
This way the UI won't be stuck waiting for the button click to complete.
On most custom methods you will almost always want to return a Task so that you are able to know when your method is finished.
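As a minimal sketch of that last point: WorkAsync and the Finished flag below are hypothetical, and the delay stands in for real work. Because the method returns a Task, the caller can tell when it is done.

```csharp
using System;
using System.Threading.Tasks;

static class CompletionDemo
{
    // Set by WorkAsync once its (simulated) work is done.
    public static bool Finished;

    // Returning Task instead of void lets callers observe completion.
    public static async Task WorkAsync()
    {
        await Task.Delay(50);
        Finished = true;
    }

    public static async Task Main()
    {
        Task work = WorkAsync();
        await work;                  // resumes only after WorkAsync has completed
        Console.WriteLine(Finished); // True
    }
}
```

Had WorkAsync been declared async void, there would be no Task to await, and Main could reach the end before Finished was set.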
