I have a string that has HTML formatted content.
Now I want to convert that string to HTML. Can I use HtmlElementCollection?
Is it possible? If yes, then how?
Kindly explain. Thanks!
Take a look at the HtmlAgilityPack. More information can be found in the answers to other similar questions.
A string will be handled as HTML when you push this string into an environment that will render the content as HTML.
When you push the content of the string to an environment that doesn't handle HTML, or you explicitly say that you don't want it rendered as HTML, it will be rendered as plain text.
Use HtmlAgilityPack with the following:
var st1 = stringdata; // your html formatted string
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(st1); // this is now html doc
If someone is still looking for this:
Just put the code below where you want to show the HTML:
@((MarkupString)htmlString)
// htmlString => string that has HTML-formatted content
// Tested with Blazor Server on the latest version of .NET (6.0 at the time of writing)
Please check the code below. I am trying to grab a text value from this HTML document. I want to grab the text Quick Kill 32 oz. Mosquito Yard Spray. I already tried SelectSingleNode as shown below, but it can't grab this text value. Any idea how to fix it?
string html = @"<div class='pod-plp__description js-podclick-analytics' data-podaction='product name'>
<a class='' data-pos='0' data-request-type='sr' data-pod-type='pr' href='/p/AMDRO-Quick-Kill-32-oz-Mosquito-Yard-Spray-100530440/304755303'>
<span class='pod-plp__brand-name'>AMDRO</span>
Quick Kill 32 oz. Mosquito Yard Spray
</a>
</div>";
var doc = new HtmlDocument();
doc.Load(html);
string title = doc.DocumentNode
.SelectSingleNode("//div[@class='pod-plp__description js-podclick-analytics']span[@class='pod-plp__brand-name']")
.InnerText;
You are targeting only span[@class='pod-plp__brand-name'], which returns only the text inside the span. To grab the text after the span you need following-sibling::text(). Please see my example code below. You can also learn more from the official html-agility-pack site.
var titleNode = htmlDoc.DocumentNode.SelectSingleNode("//span[@class='pod-plp__brand-name']/following-sibling::text()[1]");
string title = titleNode.InnerText.Trim();
Found solution from here
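For anyone who wants to try this end to end, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The class and method names (ProductTitleDemo, GetTitle) are made up for illustration, and the markup is a shortened version of the one in the question. Note that it uses LoadHtml, which parses a string, rather than Load, which expects a file path or stream.

```csharp
using System;
using HtmlAgilityPack;

public static class ProductTitleDemo
{
    // Hypothetical helper: returns the text that follows the brand-name span,
    // or null when the span (or the text after it) is not found.
    public static string GetTitle(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string, not from a file

        HtmlNode textNode = doc.DocumentNode.SelectSingleNode(
            "//span[@class='pod-plp__brand-name']/following-sibling::text()[1]");

        return textNode == null ? null : textNode.InnerText.Trim();
    }

    public static void Main()
    {
        string html = @"<div class='pod-plp__description'>
<a href='#'>
<span class='pod-plp__brand-name'>AMDRO</span>
Quick Kill 32 oz. Mosquito Yard Spray
</a>
</div>";
        Console.WriteLine(GetTitle(html));
    }
}
```

The null check matters because SelectSingleNode returns null when nothing matches, and calling InnerText on that would throw.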
How can I get the text in a div tag from a webpage to my .cs file (C#)?
I tested the HTML Agility Pack but it did not work. I got various errors, probably because this is a Windows Phone 7 project. Does anyone else have an idea how to solve this?
Silverlight C# Code
string text = HtmlPage.Window.Invoke("getDivText").ToString();
HTML
function getDivText() {
return YourDivText;
}
HtmlAgilityPack should be what you need. Make sure you get it from NuGet, rather than directly from the project page, as the NuGet version includes a WP7 build.
Update
Windows Phone does not support synchronous networking APIs, so HtmlAgilityPack can't support synchronous loads. You need to pass a callback to LoadAsync to use it.
If you want to create the document from a string rather than an actual file, you should use:
doc.LoadHtml(htmlString); // htmlString holds the HTML markup
EDIT
This is how I use HtmlAgilityPack to parse a web page (but this is in WinForms):
string page;
using(WebClient client = new WebClient())
{
page = client.DownloadString(url);
}
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(page);
string result;
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[@class='obf']");
result = node.InnerText;
I am using the HtmlAgilityPack from codeplex.
When I pass a simple html string into it and then get the resulting html back,
it cuts off tags.
Example:
string html = "<select><option>test</option></select>";
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
var result = document.DocumentNode.OuterHtml;
// result gives me:
// <select><option>test</select>
So the closing tag for the option is missing. Am I missing a setting or using this wrong?
I fixed this by commenting out line 92 of HtmlNode.cs in the source, compiled and it worked like a charm.
ElementsFlags.Add("option", HtmlElementFlag.Empty); // comment this out
Found the answer on this question
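If rebuilding the library is not an option, the same entry can be removed at runtime, since ElementsFlags is a public static dictionary on HtmlNode. The following is a sketch under that assumption, not a tested fix for every version of the library; OptionTagDemo and RoundTrip are hypothetical names. Because the dictionary is static, the change affects every document parsed afterwards in the same process.

```csharp
using System;
using HtmlAgilityPack;

public static class OptionTagDemo
{
    // Hypothetical helper: parses the markup and returns it as serialized HTML.
    public static string RoundTrip(string html)
    {
        // Stop treating <option> as an empty element before parsing.
        // Remove is harmless if the key is not present.
        HtmlNode.ElementsFlags.Remove("option");

        var document = new HtmlDocument();
        document.LoadHtml(html);
        return document.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("<select><option>test</option></select>"));
    }
}
```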
In HTML the <option> tag has no end tag.
In XHTML the <option> tag must be properly closed.
http://www.w3schools.com/tags/tag_option.asp
"There is also no adherence to XHTML or XML" - HTML Agility Pack.
This could be why? My guess is that if the tag is optional, the Agility Pack will leave it off. Hope this helps!
I am working on a scraping app and ran into a problem while trying to get it to work. I have replaced the original scraping destination in the code below with Google's web page, just for testing. It seems that my download doesn't get everything: I notice that the body and html tags are missing their closing tags. How do I get it to download everything? What's wrong with my sample code?
string filename = "test.html";
WebClient client = new WebClient();
string searchTerm = HttpUtility.UrlEncode(textBox2.Text);
client.QueryString.Add("q", searchTerm);
client.QueryString.Add("hl", "en");
string data = client.DownloadString("http://www.google.com/search");
StreamWriter writer = new StreamWriter(filename, false, Encoding.Unicode);
writer.Write(data);
writer.Flush();
writer.Close();
Google's web pages are now in HTML5, meaning the closing BODY and HTML tags can be omitted, which is why Google leaves them out (believe it or not, it saves them bandwidth).
See this article.
You can write HTML5 in either "HTML/SGML" mode (which allows the omitting of closing tags like HTML did prior to XHTML) or in "XHTML" which follows the rules of XML, requiring all tags to be closed.
Which syntax the browser uses to parse the page depends on whether you send a Content-type header of text/html for HTML/SGML syntax or application/xhtml+xml for XHTML syntax. (Source: HTML5 syntax - HTML vs XHTML)
...Google's page doesn't have the closing tags for <body> and <html>. Talk about crazy optimization...
http://www.google.com/search doesn't have closing tags.
I want to download all images stored in html (a web page). I don't know how many images will be downloaded, and I don't want to use "HTML AGILITY PACK".
I searched on Google, but every site left me more confused.
I tried regex but got only one result...
People are giving you the right answer - you can't be picky and lazy, too. ;-)
If you use a half-baked solution, you'll deal with a lot of edge cases. Here's a working sample that gets all links in an HTML document using HTML Agility Pack (it's included in the HTML Agility Pack download).
And here's a blog post that shows how to grab all images in an HTML document with HTML Agility Pack and LINQ:
// Bing Image Result for Cat, First Page
string url = "http://www.bing.com/images/search?q=cat&go=&form=QB&qs=n";
// For speed of dev, I use a WebClient
WebClient client = new WebClient();
string html = client.DownloadString(url);
// Load the Html into the agility pack
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// Now, using LINQ to get all Images
List<HtmlNode> imageNodes = null;
imageNodes = (from HtmlNode node in doc.DocumentNode.SelectNodes("//img")
where node.Name == "img"
&& node.Attributes["class"] != null
&& node.Attributes["class"].Value.StartsWith("img_")
select node).ToList();
foreach(HtmlNode node in imageNodes)
{
Console.WriteLine(node.Attributes["src"].Value);
}
First of all I just can't leave this phrase alone:
images stored in html
That phrase is probably a big part of the reason your question was down-voted twice. Images are not stored in html. Html pages have references to images that web browsers download separately.
This means you need to do this in three steps: first download the html, then find the image references inside the html, and finally use those references to download the images themselves.
To accomplish this, look at the System.Net.WebClient class. It has a .DownloadString() method you can use to get the html. Then you need to find all the <img /> tags. You're on your own here, but it's straightforward enough. Finally, you use WebClient's .DownloadData() or .DownloadFile() methods to retrieve the images.
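As a rough illustration of the middle step, here is a sketch that pulls src values out of <img> tags with a regular expression and resolves them against the page URL. The names ImageLinks and Extract, the sample markup, and the example.com URL are all made up for this example. A regex like this will miss edge cases (for instance, text inside another attribute that happens to look like a src), so treat it as a starting point rather than a robust parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImageLinks
{
    // Matches the src attribute of an <img> tag: double-quoted, single-quoted, or unquoted.
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>]+))",
        RegexOptions.IgnoreCase);

    // Step 2: find the image references and resolve them against the page URL.
    public static List<Uri> Extract(string html, Uri pageUrl)
    {
        var result = new List<Uri>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            // At most one of the three capture groups is non-empty per match.
            string src = m.Groups[1].Value + m.Groups[2].Value + m.Groups[3].Value;
            Uri absolute;
            if (src.Length > 0 && Uri.TryCreate(pageUrl, src, out absolute))
            {
                result.Add(absolute);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Step 1 would be WebClient.DownloadString(pageUrl); a literal stands in here.
        string html = "<p><img src=\"/logo.png\"><img alt='x' src='pics/cat.jpg'></p>";
        var pageUrl = new Uri("http://example.com/gallery/index.html");

        foreach (Uri image in Extract(html, pageUrl))
        {
            // Step 3 would be WebClient.DownloadFile(image, localFileName).
            Console.WriteLine(image);
        }
    }
}
```

Resolving against the page URL is needed because src values are often relative, and WebClient needs an absolute address to download from.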
You can use a WebBrowser control and extract the HTML from that e.g.
System.Windows.Forms.WebBrowser objWebBrowser = new System.Windows.Forms.WebBrowser();
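objWebBrowser.Navigate(new Uri("your url of html document"));
// Navigate is asynchronous: read Document only after the DocumentCompleted event has fired
System.Windows.Forms.HtmlDocument objDoc = objWebBrowser.Document;
// GetElementsByTagName selects by tag; GetElementsByName would match the name attribute instead
System.Windows.Forms.HtmlElementCollection aColl = objDoc.GetElementsByTagName("IMG");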
...
or directly invoke the IHTMLDocument family of COM interfaces
In general terms:
1. Fetch the HTML page.
2. Search for img tags and extract the src="..." portion from each.
3. Keep a list of all the extracted image URLs.
4. Download them one by one.
Maybe this question about C# HTML parser will help you a little bit more.