I need to check if a element exists basically and if it does I want to open a url then back to the original page and then continue writing as it was. I tried a few approaches but they kept giving throwing exceptions. I added comments to the lines in question. I just cant figure out how to implement it.
foreach (string line in File.ReadLines(#"C:\\tumblrextract\\in7.txt"))
{
if (line.Contains("#"))
{
searchEmail.SendKeys(line);
submitButton.Click();
var result = driver.FindElement(By.ClassName("invite_someone_success")).Text;
if (driver.FindElements(By.ClassName("invite_someone_failure")).Count != 0)
// If invite_someone_failure exists open this url
driver.Url = "https://www.tumblr.com/lookup";
else
// Then back to following page and continue searchEmail.SendKeys(line); submitButton.Click(); write loop
driver.Url = "https://www.tumblr.com/following";
using (StreamWriter writer = File.AppendText("C:\\tumblrextract\\out7.txt"))
{
writer.WriteLine(result + ":" + line);
}
}
}
What is the exception you are getting? probably it may be Null reference exception. Please consider adding Null check in your code for the following
if(By.ClassName("invite_someone_success") != null){
var result = driver.FindElement(By.ClassName("invite_someone_success")).Text;
}
Above is not verified/exact code, just a pseudo code
you are using selenium and your might throw exceptions in some lines of code you have there - also take in consideration that i don't know tumblr website and it's html structure.
But first:
You're in a foreach loop and everytime you load at least once a page, all your elements will Stale, so this lines:
var searchEmail = driver.FindElement(By.Name("follow_this"));
var submitButton = driver.FindElement(By.Name("submit"));
will probably Stale in the next execution. (ElementStaleException).
Paste them too after:
driver.Url = "https://www.tumblr.com/following";
Second:
when using FindElement method you have to make sure the element exists or an ElementNotFoundException will also be thrown.
var result = driver.FindElement(By.ClassName("invite_someone_success")).Text;
var isThere = driver.FindElements(By.ClassName("invite_someone_failure"));
the dotNet selenium client have a static (i believe) class to help with that it's the ExpectedCondition
that you can use to check if an element is present before trying to read it's text..
I Invite you to understand how selenium works, specially StaleElementReferenceException.
Have fun.
Related
What my goal with my piece of code is not to save duplicate domains to a .txt file if a checkbox is ticked.
Code:
// save to file here
if (footprints.Any(externalUrl.Contains))
{
// Load all URLs into an array ...
var hash = new List<string>(File.ReadAllLines(#"Links\" + lblFootprintsUsed.Text));
// Find the domain root url e.g. site.com ...
var root = Helpers.GetRootUrl(externalUrl);
if (chkBoxDoNotSaveDuplicateDomains.Checked == true)
{
if (!hash.Contains(Helpers.GetRootUrl(externalUrl)))
{
using (var sr = new StreamWriter(#"Links\" + lblFootprintsUsed.Text, true))
{
// before saving make & to & and get rid of #038; altogether ...
var newURL = externalUrl.Replace("&", "&").Replace("#038;", " ");
sr.WriteLine(newURL);
footprintsCount++;
}
}
}
if (chkBoxDoNotSaveDuplicateDomains.Checked == false)
{
if (!hash.Contains(externalUrl))
{
using (var sr = new StreamWriter(#"Links\" + lblFootprintsUsed.Text, true))
{
// before saving make & to & and get rid of #038; altogether ...
var newURL = externalUrl.Replace("&", "&").Replace("#038;", " ");
sr.WriteLine(newURL);
footprintsCount++;
}
}
}
}
The code above starts off by checking if a certain footprint pattern is found in a URL structure, if it does we load all URLs into a List the way !hash.Contains(externalUrl) should work is NOT to add duplicate URLs to the .txt file, but i can see from testing it does add complete duplicate URLs (the first issue) i never noticed this before, then i tried to add !hash.Contains(Helpers.GetRootUrl(externalUrl)) which should not add duplicate domains to the .txt file.
So unchecked, the code should not add duplicate URLs to file.
And checked the code should not add duplicate domains to file.
Both seem to fail, i cannot see any issue in the code as such, is there anyhting i am missing or could do better? any help is appreciated.
Here you are adding the full URL to the file, but while checking you are comparing only with the root URL
Modify the condition
if (!hash.Contains(Helpers.GetRootUrl(externalUrl)))
to
if (!hash.Any(x => x.Contains(Helpers.GetRootUrl(externalUrl))))
I'm having trouble with datascraping on this web address: http://patorjk.com/software/taag/#p=display&f=Graffiti&t=Type%20Something%20.
The problem is: I've written a code that is supposed to grab the contents of a certain node and display it on console. However, the contents withing the node and the specific node itself seem to be unreachable, but I know they exists for the fact that I've created a condition within my code in order to let me know if nodes withing a certain body are being found and it is indeed being found but not displayed for some reason:
private static void getTextArt(string font, string word)
{
HtmlWeb web = new HtmlWeb();
//cureHtml method is just meant to return the http address
HtmlDocument htmlDoc = web.Load(cureHtml(font, word));
if(web.Load(cureHtml(font, word)) != null)
Console.WriteLine("Connection Established");
else
Console.WriteLine("Connection Failed!");
var nodes = htmlDoc.DocumentNode.SelectSingleNode(nodeXpath).ChildNodes;
foreach(HtmlNode node in nodes)
{
if(node != null)
Console.WriteLine("Node Found.");
else
Console.WriteLine("Node not found!");
Console.WriteLine(node.OuterHtml);
}
}
private const string nodeXpath = "//div[#id='maincontent']";
}
The Html displayed by the website looks like this:
The Html code within the website. Arrows point at the node I'm trying to reach and the content within it I'm trying to display on the console
When I run my code on console to check for the node and its contents and try to display the OuterHtml string of the Xpath, this is how console will display it to me:
Console Window Display
I hope some of you are able to explain to me why is it behaving this way. I've tried all kinds of searches on google for two days trying to figure out the problem for no use. Thank you all in advance.
The content you desire is loaded dynamically.
Use the HtmlWeb.LoadFromBrowser() method instead. Also, check htmlDoc for null, instead of calling it twice. Your current logic doesn't guarantee your state.
HtmlDocument htmlDoc = web.LoadFromBrowser(cureHtml(font, word));
if (htmlDoc != null)
Console.WriteLine("Connection Established");
else
Console.WriteLine("Connection Failed!");
Also, you'll need to decode the result.
Console.WriteLine(WebUtility.HtmlDecode(node.OuterHtml));
If this doesn't work, then your cureHtml() method is broken, or you're targeting .NET Core :)
I am working on web scraping, to get values from yello pages and while iterating through pages the loop function isnt getting the page count increment. I have added a loop its keep on showing data from same page. i am attaching my code below.
static void Main(string[] args)
{
string webUrl = "https://www.yellowpages.com";
bool Loop = true;
HtmlWeb Web = new HtmlWeb();
//First Url
HtmlDocument doc = Web.Load(webUrl + "/search?search_terms=software&geo_location_terms=Los+Angeles%2C+CA");
var HeaderName = doc.DocumentNode.SelectNodes("//a[#class='business-name']").ToList();
foreach (var abc in HeaderName)
{
Console.WriteLine(abc.InnerText);
}
//Loop through different pages from the paging of that first url and then keep on doing it until Next button returns nothing
while (Loop == true)
{
var NextPageCheck = doc.DocumentNode.SelectNodes("//a[text()='Next']/#href").ToList();
if (NextPageCheck.Count != 0)
{
string link = webUrl + NextPageCheck[0].Attributes["href"].Value;
doc = Web.Load(link);
HeaderName = doc.DocumentNode.SelectNodes("//a[#class='business-name']").ToList();
foreach (var abc in HeaderName)
{
Console.WriteLine(abc.InnerText);
}
}
else
{
Loop = false;
}
}
}
So the issue i am facing is, it keeps on showing the result from 2nd page. i want it to iterate that page and till there is no page number left like if it has 400 pages(in total), it should take that page url to 400
https://www.yellowpages.com/search?search_terms=software&geo_location_terms=Los%20Angeles%2C%20CA&page=2
page=2
Whilst debugging your code it seems I was getting a null error on the line in which you looking for the business names the second time around, in the version of HtmlAgilityPack that had installed it was encoding the urls so I simply added a decoding to the url
string link = webUrl + NextPageCheck[0].Attributes["href"].Value;
var urlDecode = HttpUtility.HtmlDecode(link);
doc = Web.Load(urlDecode);
And it seemed to work fine - as the comment says next time you post it would be helpful to post the error you are getting and what line so it's easier and faster to track down the actual bug
Hope this helps.
I have some code that reads in an xml file. However it is triggering an error at the 3rd IF statement:
if (xdoc.Root.Descendants("HOST").Descendants("Default")
.FirstOrDefault().Descendants("HostID")
.FirstOrDefault().Descendants("Deployment").Any())
Error:
System.NullReferenceException: Object reference not set to an instance of an object.
That is because in this particular file there is no [HOST] section.
I was assuming that on the first IF statement, if it didn't find any [HOST]section it would not go into the statement and therefore i should not get this error. Is there a way to check if a section exists first?
XDocument xdoc = XDocument.Load(myXmlFile);
if (xdoc.Root.Descendants("HOST").Any())
{
if (xdoc.Root.Descendants("HOST").Descendants("Default").Any())
{
if (xdoc.Root.Descendants("HOST").Descendants("Default").FirstOrDefault().Descendants("HostID").FirstOrDefault().Descendants("Deployment").Any())
{
if (xdoc.Root.Descendants("HOST").Descendants("Default").FirstOrDefault().Descendants("HostID").Any())
{
var hopsTempplateDeployment = xdoc.Root.Descendants("HOST").Descendants("Default").FirstOrDefault().Descendants("HostID").FirstOrDefault().Descendants("Deployment").FirstOrDefault();
deploymentKind = hopsTempplateDeployment.Attribute("DeploymentKind");
host = hopsTempplateDeployment.Attribute("HostName");
}
}
}
}
Within the body of this if block...
if (xdoc.Root.Descendants("HOST").Descendants("Default").Any())
{
if (xdoc.Root.Descendants("HOST").Descendants("Default").FirstOrDefault().Descendants("HostID").FirstOrDefault().Descendants("Deployment").Any())
{
if (xdoc.Root.Descendants("HOST").Descendants("Default").FirstOrDefault().Descendants("HostID").Any())
{
var hopsTempplateDeployment = xdoc.Root.Descendants("HOST").Descendants("Default").FirstOrDefault().Descendants("HostID").FirstOrDefault().Descendants("Deployment").FirstOrDefault();
deploymentKind = hopsTempplateDeployment.Attribute("DeploymentKind");
host = hopsTempplateDeployment.Attribute("HostName");
}
}
}
...you have established that the element <Root>/HOST/Default exists. You however don't know whether <Root>/HOST/Default/HostId/Deployment exists. If it doesn't you will get a NullReferenceException like the one you're experiencing due to the use of FirstOrDefault. It is generally recommended to use First in cases where you expect the elements to be present, which will give you at least a better error message.
If you expect the elements to be not present, a simple solution is to use the ?. along the respective LINQ2XML axis:
var hopsTemplateDeployment =
xdoc.Root.Descendants("HOST").Descendants("Default").FirstOrDefault()
?.Descendants("HostID").FirstOrDefault()
?.Descendants("Deployment").FirstOrDefault();
if (hopsTemplateDeployment != null)
{
deploymentKind = hopsTemplateDeployment.Attribute("DeploymentKind");
host = hopsTemplateDeployment.Attribute("HostName");
}
It will also save you the chain of nested if clauses.
I'm trying to get all languages from Google Translate. When I Open Developer Tools and click one of the language when all languages are popped (when arrow clicked), It gives //*[#id=':7']/div/text() for Arabic, but it returns null when I try to get node:
async Task AddLanguages()
{
try
{
// //*[#id=":6"]/div/text()
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
for (int i = 6; i <= 9; i++)
{
//*[#id=":6"]/div/text() //*[#id=":6"]/div/div
Debug.WriteLine(i);
var element = document.DocumentNode.SelectSingleNode("//*[#id=':7']/div/text()");
Trace.WriteLine(element == null, "Element is null");
}
}
catch (Exception e)
{
this.ShowMessageAsync("Hata!", "Dilleri yüklerken hata ortaya çıktı.");
}
}
Element is null: True outputs all the times ( I was trying to use for loop to loop through languages but, it doesnt even work for single one!)
I guess your xpath is wrong. You can try something like:
string Url = "https://translate.google.com/";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(Url);
var arabic = doc.DocumentNode.Descendants("div").FirstOrDefault(_ => _.ChildNodes.Any(node => node.Name.Equals("#text") && node.InnerText.Equals("Arabic")));
Since I can't comment yet...Have you tried clicking on the dropdwon first before looking for the elements?
Clicking on //*[#id='gt-sl-gms'] or it's inner div would make the elements visible..
That should work..
Anyway, I can't make $x work for the console in google chrome. I'm getting an Uncaught Type Error currently. Not sure if that has to do with anything..
Edit: Oh wait i think I know your problem..upon closer inspection of the element, it seems that the element (div) has another div before the text. so try /*[#id=':7']/div/text()[2]