WebBrowser Control c#: finding particular link and "clicking" it programmatically? - c#

I am working on a CSV downloader project and need to download the CSV files generated on a web page. Using HTML Agility Pack, I found the exact element that contains the link to the CSV file:
Download file in csv format
Now I want the application, without any action on my side, to detect this link in the web page (I can do that with HTML Agility Pack) and download the file once the page has fully loaded in the WebBrowser control in my app. I tried an example from another SO post, but I get:
Error: Object reference not set to an instance of an object.
HtmlElementCollection links = webBrowser.Document.GetElementsByTagName("A");
foreach (HtmlElement link in links) // this example is from another SO post
{
    if (link.InnerText.Equals("My Assigned"))
        link.InvokeMember("Click");
}
Can anybody suggest how to do it?
Solved:
I changed HtmlElementCollection links = webBrowser.Document.GetElementsByTagName("A"); to HtmlElementCollection links = webBrowser1.Document.Links and used
if (link.InnerText.Contains("My Assigned"))
{
    link.InvokeMember("Click");
}
Does anyone have a better solution?

InnerText might be null so build in a safeguard, to check for null:
if ((link.InnerText != null) && (link.InnerText.Equals("My Assigned")) )
link.InvokeMember("Click");

Actually, I would get rid of HTML Agility Pack (it's pretty bad) and just loop through the links yourself. Also, don't use InnerText, because based on your examples, at least one of the links doesn't seem to have any inner text. Use the href attribute and check for the .csv extension.
link.GetAttribute("href").EndsWith(".csv")
And if there is more than one .csv link on a page, look for some URL substring or InnerText value to refine it.
Also, GetElementsByTagName("A") was not the problem in itself: it matches elements by tag type, not by their name attribute (that is what GetElementsByName does). The NullReferenceException more likely came from InnerText being null for a link without text, or from Document being null because the page had not finished loading.
Also, how are you downloading the .csv file? Is a "download dialog" box coming up or are you just showing people in the webbrowser control? (curious how you've handled that part).
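To make the href check above concrete, here is a minimal sketch of picking the CSV link out of a list of href values. The class and method names (CsvLinkPicker, FirstCsvLink) are made up for illustration, and it assumes the hrefs are already absolute URLs. Comparing against the URL path rather than the whole string means a link like report.csv?x=1 still matches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of WinForms or any library.
public static class CsvLinkPicker
{
    // Returns the first href whose path ends in ".csv", or null if none does.
    public static string FirstCsvLink(IEnumerable<string> hrefs)
    {
        foreach (string href in hrefs)
        {
            Uri uri;
            // Skip anchors with a missing, empty, or non-absolute href.
            if (string.IsNullOrEmpty(href) || !Uri.TryCreate(href, UriKind.Absolute, out uri))
                continue;
            // AbsolutePath excludes the query string, so "?x=1" does not hide the extension.
            if (uri.AbsolutePath.EndsWith(".csv", StringComparison.OrdinalIgnoreCase))
                return href;
        }
        return null;
    }
}
```

In a DocumentCompleted handler you could collect link.GetAttribute("href") for each element of webBrowser1.Document.Links, pass the list to this helper, and hand the result to WebClient.DownloadFile. Note that a separate WebClient does not share the WebBrowser control's cookies, so this may not work for pages that require a login.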

Related

CefSharp possible to load html content?

I need to create an application that loads an HTML "template" file and fills it in with current data values. So far no problem, but does anyone know how to load the resulting HTML into the CefSharp browser?
I found some old topics here that mention a "loadHtml()" function, but this function isn't there anymore.
Thanks in advance
You need to add a using CefSharp; statement to your code to access the LoadHtml extension methods.
chromiumWebBrowser.LoadHtml(html);
Alternatively, you can encode the HTML as a base64 data URI and load that:
const string html = "<html><head><title>Test</title></head><body><h1>Html Encoded in URL!</h1></body></html>";
var base64EncodedHtml = Convert.ToBase64String(Encoding.UTF8.GetBytes(html));
browser.Load("data:text/html;base64," + base64EncodedHtml);
From the project wiki on github: Loading HTML/CSS/JavaScript/etc from disk/database/embedded resource/stream

Get Captcha Image from Web Browser control without using SRC

I know this question might sound familiar and there are plenty of posts out there on google with the same title BUT trust me this is different.
Editor : VS2008 (cannot upgrade it due to some technical difficulties)
Question
How to get Captcha Image from a Web Browser without using SRC?
Why wouldn't you use SRC?
Here is the site from which I am trying to get the captcha image:
https://services.gst.gov.in/services/login
(The captcha image appears once you type anything in User Name.)
Now if you right-click on the captcha image and choose Inspect Element, you will see that the SRC of the captcha is:
https://services.gst.gov.in/services/captcha?rnd=0.5313315062651027
Whenever you request that link, it returns a captcha that is different from the previous one. That is why I can't use the code below: it shows a different captcha from the one currently displayed in the WebBrowser.
HtmlElement element = webBrowser1.Document.GetElementById("imgCaptcha");
string src = element.GetAttribute("src");
pictureBox1.Load(element.GetAttribute("src"));
You can use createControlRange to create a controlRange of non-text elements. Then find the image tag, for example by id, add it to the control range, call the range's execCommand method to execute the Copy command, and finally get the image from the clipboard:
.NET 3.5
Add a reference to MSHTML. You can find it by Microsoft HTML Object Library under COM references and then add using mshtml;. Then:
IHTMLElement2 body = (IHTMLElement2)webBrowser1.Document.Body.DomElement;
IHTMLControlRange controlRange = (IHTMLControlRange)body.createControlRange();
IHTMLControlElement element = (IHTMLControlElement)webBrowser1.Document
.GetElementById("imgCaptcha").DomElement;
controlRange.add(element);
controlRange.execCommand("Copy", false, null);
pictureBox1.Image = (Bitmap)Clipboard.GetDataObject().GetData(DataFormats.Bitmap);
.NET >= 4.0
You don't need to add a reference, you can take advantage of dynamic:
dynamic body = webBrowser1.Document.Body.DomElement;
dynamic controlRange = body.createControlRange();
dynamic element = webBrowser1.Document.GetElementById("imgCaptcha").DomElement;
controlRange.add(element);
controlRange.execCommand("Copy", false, null);
pictureBox1.Image = (Bitmap)Clipboard.GetDataObject().GetData(DataFormats.Bitmap);
Note:
Run the code when the document is completed, for example in DocumentCompleted event.
Also you may want to add null checking to the code.
I used above code to get the google logo from https://www.google.com by id hplogo.
I also tested the above code by browsing https://demos.captcha.com/demos/features/captcha-demo.aspx and finding the captcha image by its id, c_captchademo_samplecaptcha_CaptchaImage.

Getting nodes from html page using HtmlAgilityPack

My program collects info about Steam users' profiles (such as games, badges, etc.). I use HtmlAgilityPack to collect data from the HTML page, and so far it has worked well for me.
The problem is that it works on some pages, but on others it returns null nodes or throws an exception:
object reference not set to an instance of an object
Here's an example.
This part works well (when I'm getting badges):
WebClient client = new WebClient();
string html = client.DownloadString("http://steamcommunity.com/profiles/*id*/badges/");
var doc = new HtmlDocument();
doc.LoadHtml(html);
HtmlNodeCollection div = doc.DocumentNode.SelectNodes("//div[@class=\"badge_row is_link\"]");
This returns the exact number of badges, and then I can do whatever I want with them.
But here I do exactly the same thing (this time getting games), and somehow it keeps throwing the error I mentioned above:
WebClient client = new WebClient();
string html = client.DownloadString("http://steamcommunity.com/profiles/*id*/games/?tab=all");
var doc = new HtmlDocument();
doc.LoadHtml(html);
HtmlNodeCollection div = doc.DocumentNode.SelectNodes("//*[@id='game_33120']");
I know that there is the node on the page (checked via google chrome code view) and I don't know why in 1st case it works, but in the 2nd it doesn't.
When you right-click on the page and choose View Source do you still see an element with id='game_33120'? My guess is you won't. My guess is that the page is being built dynamically, client-side. Therefore, the HTML that comes down in the request doesn't contain the element you're looking for. Instead that element appears once the Javascript code has run in the browser.
It appears that the original request will have a section of Javascript that contains a variable called rgGames which is a Javascript array of the games that will be rendered on the screen. You should be able to extract the information from that.
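As a rough sketch of that extraction, the snippet below pulls the rgGames array literal out of the downloaded HTML with a regular expression and then reads the app ids from it. The class and method names are invented for illustration, and the "appid" field name is an assumption about what the array entries look like; check the actual page source. The lazy match stops at the first "];", so it can break if a game name happens to contain that sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class SteamGamesScraper
{
    // Returns the raw array literal assigned to "var rgGames", or null if it is not present.
    public static string ExtractRgGamesJson(string html)
    {
        Match m = Regex.Match(html, @"var\s+rgGames\s*=\s*(\[.*?\])\s*;", RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Reads every "appid": <number> pair from the array text (field name is an assumption).
    public static List<int> ExtractAppIds(string rgGamesJson)
    {
        List<int> ids = new List<int>();
        foreach (Match m in Regex.Matches(rgGamesJson, "\"appid\"\\s*:\\s*(\\d+)"))
            ids.Add(int.Parse(m.Groups[1].Value));
        return ids;
    }
}
```

For anything beyond ids, it is probably cleaner to hand the string returned by ExtractRgGamesJson to a real JSON parser instead of adding more regular expressions.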
I don't understand the SelectNodes call with the parameter "//*[@id='game_33120']"; maybe that is where the problem lies. But you can also check this:
The real URL of a Steam profile's badges page is:
http://steamcommunity.com/id/id/badges/
and not
http://steamcommunity.com/profiles/id/badges/
After I visited a badges page, the URL stayed the same in the browser, but with the games link you are redirected to
http://steamcommunity.com
Maybe this can help you.

Listing extended tags in files with TagLib

I am working with .flac audio files that use extended tags for a bit of magic. There is a tag called ReleaseGuid. I want to be able to list the contents or create the tag if it doesn't exist. I have done the prerequisite beating of my head against the wall for three days now. I have found a way to add a usertextinformation frame...although I don't see the value just the Owner. Please help me figure this out.
The following are lines of code that at least compile and seem to do something.
I need to get this to the point where I can add the needed tag.
File objFile = TagLib.File.Create(path);
TagLib.Id3v2.Tag id3v2tag = (TagLib.Id3v2.Tag)objFile.GetTag(TagLib.TagTypes.Id3v2, true);
if (id3v2tag != null)
{
    // Get the private frame, create if necessary.
    PrivateFrame frame = PrivateFrame.Get(id3v2tag, "Mytag", true);
    frame.PrivateData = System.Text.Encoding.Unicode.GetBytes("MyInfo");
    id3v2tag.AddFrame(frame);
}
I have used mp3tag to see the tags I am needing by clicking on "extended tags".
Which type of tags would these be if I can add them using mp3tag? How do I read/write them using taglib?
To find the tag type, you can open the (.flac) audio file in a text editor like Notepad++ and search for your 'ReleaseGuid'. In front of this ID you will see the type, such as TXXX, PRIV or COMM.
Or you can look into the documentation (or source code) of the program that writes this 'ReleaseGuid' into your audio files.

c# find image in html and download them

I want to download all the images in an HTML web page. I don't know how many images there will be, and I don't want to use "HTML AGILITY PACK".
I searched on Google, but every site left me more confused.
I tried regex but got only one result.
People are giving you the right answer - you can't be picky and lazy, too. ;-)
If you use a half-baked solution, you'll deal with a lot of edge cases. Here's a working sample that gets all links in an HTML document using HTML Agility Pack (it's included in the HTML Agility Pack download).
And here's a blog post that shows how to grab all images in an HTML document with HTML Agility Pack and LINQ
// Bing Image Result for Cat, First Page
string url = "http://www.bing.com/images/search?q=cat&go=&form=QB&qs=n";
// For speed of dev, I use a WebClient
WebClient client = new WebClient();
string html = client.DownloadString(url);
// Load the Html into the agility pack
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// Now, using LINQ to get all Images
List<HtmlNode> imageNodes = null;
imageNodes = (from HtmlNode node in doc.DocumentNode.SelectNodes("//img")
where node.Name == "img"
&& node.Attributes["class"] != null
&& node.Attributes["class"].Value.StartsWith("img_")
select node).ToList();
foreach(HtmlNode node in imageNodes)
{
Console.WriteLine(node.Attributes["src"].Value);
}
First of all I just can't leave this phrase alone:
images stored in html
That phrase is probably a big part of the reason your question was down-voted twice. Images are not stored in html. Html pages have references to images that web browsers download separately.
This means you need to do this in three steps: first download the html, then find the image references inside the html, and finally use those references to download the images themselves.
To accomplish this, look at the System.Net.WebClient class. It has a .DownloadString() method you can use to get the html. Then you need to find all the <img /> tags. You're on your own here, but it's straightforward enough. Finally, you use WebClient's .DownloadData() or .DownloadFile() methods to retrieve the images.
You can use a WebBrowser control and extract the HTML from that, e.g.
System.Windows.Forms.WebBrowser objWebBrowser = new System.Windows.Forms.WebBrowser();
objWebBrowser.Navigate(new Uri("your url of html document"));
// Navigate is asynchronous: read Document only after DocumentCompleted has fired.
System.Windows.Forms.HtmlDocument objDoc = objWebBrowser.Document;
// GetElementsByTagName matches the tag itself; GetElementsByName would match the name attribute.
System.Windows.Forms.HtmlElementCollection aColl = objDoc.GetElementsByTagName("IMG");
...
or directly invoke the IHTMLDocument family of COM interfaces
In general terms
You need to fetch the html page
Search for img tags and extract the src="..." portion out of them
Keep a list of all these extracted image urls.
Download them one by one.
Maybe this question about C# HTML parser will help you a little bit more.
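For the "find the img tags" step without HTML Agility Pack, a regex-based sketch could look like the one below. The class name ImageUrlExtractor is made up for illustration. As other answers warn, regular expressions are fragile against real-world HTML (unquoted attributes, markup inside scripts, and so on), so treat this as a starting point only. Getting a single result from a regex attempt is often a sign that Regex.Match was used; Regex.Matches returns every hit.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ImageUrlExtractor
{
    // Finds quoted src values of <img> tags and resolves them against the page URL.
    public static List<Uri> Extract(string html, Uri pageUrl)
    {
        List<Uri> result = new List<Uri>();
        // \s before "src" keeps attributes such as data-src from matching.
        Regex imgSrc = new Regex(@"<img\b[^>]*?\ssrc\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);
        foreach (Match m in imgSrc.Matches(html))
        {
            // Attribute values are HTML-encoded, e.g. &amp; stands for &.
            string src = WebUtility.HtmlDecode(m.Groups[1].Value);
            Uri absolute;
            // Resolve relative paths against the page URL and skip duplicates.
            if (Uri.TryCreate(pageUrl, src, out absolute) && !result.Contains(absolute))
                result.Add(absolute);
        }
        return result;
    }
}
```

Each returned Uri can then be passed to WebClient.DownloadFile together with a local file name of your choice, which covers the last step of the list above.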
