I want the logic to get all page URLs from a website: if I provide a website URL, I should get back all of that site's page URLs in a collection. How can I implement this using C#?
While this is not a trivial task, the Html Agility Pack is the best place to start.
It allows you to search for HTML tags even when the markup is invalid, and it is far superior to parsing the responses manually.
As Save noted, the following answer provides a great example:
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load(url); // url is the address you were given
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    // link.GetAttributeValue("href", "") yields each URL; add it to your collection
}
Source: https://stackoverflow.com/a/2248422/548020
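A slightly fuller sketch that gathers every href on one page into a collection; crawling a whole site would mean repeating this for each newly discovered internal URL (the example address is a placeholder):
using System.Collections.Generic;
using HtmlAgilityPack;

var web = new HtmlWeb();
var doc = web.Load("http://example.com"); // the site URL you were given
var urls = new List<string>();

var linkNodes = doc.DocumentNode.SelectNodes("//a[@href]");
if (linkNodes != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode link in linkNodes)
    {
        urls.Add(link.GetAttributeValue("href", ""));
    }
}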
You can use WebClient or WebRequest:
// Download the raw HTML of the page
WebRequest request = WebRequest.Create("http://www.yahoo.com");
WebResponse response = request.GetResponse();
Stream data = response.GetResponseStream();
string html = String.Empty;
using (StreamReader sr = new StreamReader(data))
{
    html = sr.ReadToEnd();
}
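The WebClient route mentioned above is even shorter; a minimal sketch:
// DownloadString fetches the page HTML in one call
using (var client = new System.Net.WebClient())
{
    string html = client.DownloadString("http://www.yahoo.com");
    // parse html for links, e.g. with the Html Agility Pack approach shown above
}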
I have some code already, but let's say I have a PictureBox in a Windows Forms app which updates to a random image whenever you click it.
How do I extract an image from a website (e.g. https://prnt.sc/hello1)?
The image link is located in the src attribute:
<img class="no-click screenshot-image" src="https://image.prntscr.com/image/HENolz07Ty_AA4RwYdZVGg.png" crossorigin="anonymous" alt="Lightshot screenshot" id="screenshot-image" image-id="hello1">
The code I have already is:
var image = "";
pictureBox1.ImageLocation = image;
pictureBox1.Update();
How could I (with only the page URL) find the image on the page and assign it to 'image' (preferably using C#)?
Use AngleSharp to get the page HTML and select the desired element using CSS selectors. Then you can use HttpClient to download the file:
using AngleSharp;
using System;
using System.IO;
using System.Linq;
using System.Net.Http;

// Load the page and let AngleSharp build the DOM
var config = Configuration.Default.WithDefaultLoader();
var address = "https://prnt.sc/hello1";
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(address);

// Select the <img> element by its id and read its src attribute
var imgSelector = "#screenshot-image";
var cells = document.QuerySelectorAll(imgSelector);
var imageAddress = cells.First().GetAttribute("src");

// Download the image and save it next to the executable
var client = new HttpClient();
var stream = await client.GetStreamAsync(imageAddress);
using (var fileStream = File.Create(AppDomain.CurrentDomain.BaseDirectory + @"\img.png"))
{
    stream.CopyTo(fileStream);
}
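If the goal from the question is just to show the picture, saving to disk may not be necessary; assigning the resolved address to the PictureBox from the question should be enough:
// Reuses the imageAddress variable resolved by the AngleSharp code above
pictureBox1.ImageLocation = imageAddress;
pictureBox1.Update();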
I have a desktop application that has some classes, which I want to serialize and send to a webpage when a user clicks a button in the desktop C# application.
The data is too long to pass as a URL argument. What I want to achieve is this: how do I POST the data and then open the website on the client's PC showing the dynamic changes made by the sent data?
I need some suggestions or guidance to proceed in the right direction.
You can use HttpClient
For example:
using (var client = new HttpClient())
{
    client.BaseAddress = new Uri("http://myUrl");
    var gizmo = new Product() { Name = "Gizmo", Price = 100, Category = "Widget" };
    // PostAsJsonAsync is an extension method from the Microsoft.AspNet.WebApi.Client package
    var response = await client.PostAsJsonAsync("api/products", gizmo);
    if (response.IsSuccessStatusCode)
    {
        // do something with the response
    }
}
One option would be to generate a local page with a form that contains the data and an action="POST" pointing to your site. Then add a script that automatically submits this form; as a result, the data is sent by the browser and the browser continues as if it were a normal POST request.
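A minimal sketch of that idea, assuming a classic .NET Framework desktop app; the target URL, form field name, and payload below are placeholders:
using System.Diagnostics;
using System.IO;
using System.Net;

class AutoPostPage
{
    static void Main()
    {
        string payload = "{\"Name\":\"Gizmo\",\"Price\":100}"; // your serialized data
        string html =
            "<html><body onload=\"document.forms[0].submit()\">" +
            "<form method=\"POST\" action=\"http://myUrl/receive\">" +
            "<input type=\"hidden\" name=\"data\" value=\"" + WebUtility.HtmlEncode(payload) + "\"/>" +
            "</form></body></html>";

        // Write the page to a temp file and open it with the default browser;
        // the browser immediately POSTs the form and then shows the server's response.
        string path = Path.Combine(Path.GetTempPath(), "autopost.html");
        File.WriteAllText(path, html);
        Process.Start(path); // on .NET Framework this launches the associated browser
    }
}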
If you don't like HttpClient, you can also use WebClient, which is a convenience wrapper for exactly this scenario.
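A rough equivalent with WebClient; the endpoint and JSON body are placeholders:
using (var client = new System.Net.WebClient())
{
    client.Headers[System.Net.HttpRequestHeader.ContentType] = "application/json";
    // UploadString issues an HTTP POST with the given body and returns the server's response
    string response = client.UploadString("http://myUrl/api/products", "{\"Name\":\"Gizmo\",\"Price\":100}");
}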
I'm having problems parsing information from a forum.
It would be really easy to get the information if it were already present in the div with id = "poe-popup-container".
The problem is that this div is only populated when the browser decides you need to see the information. You can reproduce this by making the browser window very small and inspecting that div in the HTML: at first it is empty, but as soon as you scroll down far enough to see the item, its content appears.
I'm trying to read the nodes inside that div with HtmlAgilityPack. The problem is that, as explained, it only contains data once the browser has requested it.
So, when you try to download the HTML, the div is empty.
I've tried downloading the page with the web browser too, but the same thing happens.
I'm trying to use the following code:
string page = System.Text.Encoding.UTF8.GetString(new WebClient().DownloadData("http://www.pathofexile.com/forum/view-thread/966384"));
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(page);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@id='poe-popup-container']");
MessageBox.Show(node.InnerHtml);
You're trying to do the impossible: JavaScript is executed in the browser, while HtmlAgilityPack is a library for parsing static HTML only; it cannot execute JavaScript.
So why don't you look into browser automation instead? Try, for example, http://watin.org/
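A rough sketch of the WatiN route; the class and property names here are quoted from memory, so treat them as an assumption rather than verified API:
using System;
using WatiN.Core;

class Program
{
    [STAThread] // WatiN drives Internet Explorer and requires an STA thread
    static void Main()
    {
        using (var browser = new IE("http://www.pathofexile.com/forum/view-thread/966384"))
        {
            // The DOM after the page's JavaScript has run, so the popup container
            // should be populated; feed this string into HtmlAgilityPack as before.
            string renderedHtml = browser.Html;
        }
    }
}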
I'm currently working with Unity; this might be a super basic question, but here goes.
I need to call a URL from my app in C#. This is done for analytics purposes, so I don't want to open a web browser or anything, just call the URL and that's it. I know about Application.OpenURL() to open the browser, but how do I achieve this without opening the browser?
You can try it like this:
var client = new WebClient();
var x = client.DownloadString("http://example.com");
or
HttpWebRequest request = WebRequest.Create("http://example.com") as HttpWebRequest;
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
Stream stream = response.GetResponseStream();
Use the WebClient class in the System.Net namespace.
It's a high-level HTTP client that is really easy to use.
It has a method called .DownloadString() which does exactly what you want: it calls a URL with an HTTP GET and returns the response as a string.
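If you don't care about the response at all, WebClient's asynchronous variant fires the request without blocking; a minimal sketch, with the URL standing in for your analytics endpoint:
var client = new System.Net.WebClient();
// DownloadStringAsync returns immediately; the completed handler just cleans up
client.DownloadStringCompleted += (s, e) => client.Dispose();
client.DownloadStringAsync(new System.Uri("http://example.com/analytics?event=click"));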
I am working on an ASP.NET MVC application. I am outputting some HTML onto a view, which is supplied by another application. I want to take this HTML and append a CDN domain to all image tags. I am not sure how to do this and would like some suggestions.
An easy way would be to use the Html Agility Pack in your controller.
For example:
using HtmlAgilityPack;
using System.IO;
...
private string AppendCdnToImgSrc(string htmlString)
{
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(htmlString); // or htmlDoc.Load(htmlFileName) if loading from a file

    // Rewrite the src attribute of every <img> element
    foreach (HtmlNode img in htmlDoc.DocumentNode.SelectNodes("//img[@src]"))
    {
        HtmlAttribute attribute = img.Attributes["src"];
        attribute.Value = attribute.Value + ".cdn";
    }

    // Return the modified document as a string
    MemoryStream memStream = new MemoryStream();
    htmlDoc.Save(memStream);
    memStream.Seek(0, SeekOrigin.Begin);
    StreamReader reader = new StreamReader(memStream);
    return reader.ReadToEnd();
}
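In the controller you would then run the external HTML through this helper before handing it to the view; GetExternalHtml() below is a hypothetical stand-in for however you already receive the other application's markup:
public ActionResult Show()
{
    string externalHtml = GetExternalHtml(); // hypothetical: your existing source of HTML
    ViewBag.Html = AppendCdnToImgSrc(externalHtml); // rewrite the img src values
    return View();
}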