C# WebClient parent page

I need to obtain some data from a website. Manually, I would open the search form, enter a document ID, click search, get the search result (page 1), click a specific icon next to the only hit (the document ID is unique), and read the desired string on another page (page 2).
The search query URL contains the document ID explicitly, so loading page 1 is no problem. The URL of page 2 can be found in the source code of page 1. But the server seems to check the referrer before serving page 2, and the code execution terminates with error 500.
The C# code is:
using (WebClient client = new WebClient())
{
    string htmlCode0 = client.DownloadString("page_1_url");

    // The page 2 URL consists of a common part plus a 6-digit ID.
    string toSearch = "page_2_url_common";
    int index = htmlCode0.IndexOf(toSearch);
    string toFind = htmlCode0.Substring(index, toSearch.Length + 6);
    string htmlCode1 = client.DownloadString(toFind);
}
Firebug showed this while loading page 2:
if (jQuery) {
    jQuery(window).bind('beforeunload', function () {
        if (window.doNotLock2 === undefined) {
            if (window.doNotLock) {
                window.doNotLock = false;
            } else {
                showLoading();
            }
        }
        try {
            if (window.childPopup) {
                window.childPopup.close();
                unBlockWindow();
                window.childPopup = null;
            }
            var win = window.opener;
            while (win.parent && win.parent != win) win = win.parent;
            win.unBlockWindow();
            win = null;
        } catch (error) {
            // if the window was opened from another context, window.opener access is denied; do nothing
        }
        return;
    });
}
Is there any way to skip or override this check and get page 2 content?
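A common cause of such a 500 is a missing Referer header rather than the JavaScript above (which runs in the browser, not against WebClient). A minimal sketch, assuming the server checks only the Referer; the page_1_url and page_2_url_common values are placeholders from the question:
using System;
using System.Net;

class Program
{
    static void Main()
    {
        using (WebClient client = new WebClient())
        {
            string page1Url = "page_1_url";        // placeholder from the question
            string htmlCode0 = client.DownloadString(page1Url);

            // Locate the page 2 URL (common part + 6-digit ID) in page 1's source.
            string toSearch = "page_2_url_common"; // placeholder from the question
            int index = htmlCode0.IndexOf(toSearch, StringComparison.Ordinal);
            string page2Url = htmlCode0.Substring(index, toSearch.Length + 6);

            // Present page 1 as the referring page before requesting page 2;
            // WebClient maps this header onto the underlying HttpWebRequest.Referer.
            client.Headers[HttpRequestHeader.Referer] = page1Url;
            string htmlCode1 = client.DownloadString(page2Url);
            Console.WriteLine(htmlCode1);
        }
    }
}
Note that WebClient does not persist cookies between requests; if the server also sets a session cookie on page 1, an HttpWebRequest with a shared CookieContainer would be needed instead.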

Related

Getting through multiple pages in web scraping

I am working on web scraping, getting values from Yellow Pages, and while iterating through the pages the loop isn't incrementing the page count. I added a loop, but it keeps showing data from the same page. My code is below.
static void Main(string[] args)
{
    string webUrl = "https://www.yellowpages.com";
    bool Loop = true;
    HtmlWeb Web = new HtmlWeb();

    // First URL
    HtmlDocument doc = Web.Load(webUrl + "/search?search_terms=software&geo_location_terms=Los+Angeles%2C+CA");
    var HeaderName = doc.DocumentNode.SelectNodes("//a[@class='business-name']").ToList();
    foreach (var abc in HeaderName)
    {
        Console.WriteLine(abc.InnerText);
    }

    // Loop through the paging of that first URL, and keep going until the Next button returns nothing
    while (Loop == true)
    {
        var NextPageCheck = doc.DocumentNode.SelectNodes("//a[text()='Next']/@href").ToList();
        if (NextPageCheck.Count != 0)
        {
            string link = webUrl + NextPageCheck[0].Attributes["href"].Value;
            doc = Web.Load(link);
            HeaderName = doc.DocumentNode.SelectNodes("//a[@class='business-name']").ToList();
            foreach (var abc in HeaderName)
            {
                Console.WriteLine(abc.InnerText);
            }
        }
        else
        {
            Loop = false;
        }
    }
}
So the issue I am facing is that it keeps showing the results from the 2nd page. I want it to iterate through the pages until there are none left; if there are 400 pages in total, it should follow the page URL all the way to 400:
https://www.yellowpages.com/search?search_terms=software&geo_location_terms=Los%20Angeles%2C%20CA&page=2
(note the page=2 parameter)
While debugging your code, I was getting a null error on the line where you look for the business names the second time around. In the version of HtmlAgilityPack I had installed, the URLs were being HTML-encoded, so I simply added a decode of the URL:
string link = webUrl + NextPageCheck[0].Attributes["href"].Value;
var urlDecode = HttpUtility.HtmlDecode(link);
doc = Web.Load(urlDecode);
And it seemed to work fine. As the comment says, next time you post it would be helpful to include the error you are getting and the line it occurs on, so it's easier and faster to track down the actual bug.
Hope this helps.
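For completeness, a minimal sketch of the corrected loop under the same assumptions (HtmlAgilityPack, and HttpUtility.HtmlDecode from a System.Web reference), with a null guard in case SelectNodes finds no matches:
using System;
using System.Web;            // HttpUtility
using HtmlAgilityPack;

class Scraper
{
    static void Main()
    {
        string webUrl = "https://www.yellowpages.com";
        var web = new HtmlWeb();
        var doc = web.Load(webUrl + "/search?search_terms=software&geo_location_terms=Los+Angeles%2C+CA");

        while (true)
        {
            // Print the business names on the current page. SelectNodes
            // returns null when nothing matches, so guard against that.
            var names = doc.DocumentNode.SelectNodes("//a[@class='business-name']");
            if (names != null)
                foreach (var a in names)
                    Console.WriteLine(a.InnerText);

            // Follow the Next link; stop when there is none (last page reached).
            var next = doc.DocumentNode.SelectSingleNode("//a[text()='Next']");
            if (next == null)
                break;

            // The href comes back HTML-encoded in some HtmlAgilityPack
            // versions, so decode it before requesting the page.
            string link = HttpUtility.HtmlDecode(webUrl + next.Attributes["href"].Value);
            doc = web.Load(link);
        }
    }
}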

Aspose PDF - get text from page that has a matching string

I'm working with an existing library. The goal of the library is to pull text out of PDFs and verify it against expected values, quality-checking recorded data against the data in the PDF.
I'm looking for a way to succinctly pull a specific page's worth of text, given a string that should only fall on that specific page.
var pdfDocument = new Document(file.PdfFilePath);
var textAbsorber = new TextAbsorber
{
    ExtractionOptions =
    {
        FormattingMode = TextExtractionOptions.TextFormattingMode.Pure
    }
};
pdfDocument.Pages.Accept(textAbsorber);
foreach (var page in pdfDocument.Pages)
{
}
I'm stuck inside the foreach (var page in pdfDocument.Pages) portion... or is that the right area to be looking at?
Answer: the TextAbsorber must be recreated for each page, inside the foreach loop.
If the absorber isn't recreated, it keeps accumulating text from the previous pages.
public List<string> ProcessPage(MyInfoClass file, string find)
{
    var pdfDocument = new Document(file.PdfFilePath);
    foreach (Page page in pdfDocument.Pages)
    {
        // Recreate the absorber so it holds only the current page's text.
        var textAbsorber = new TextAbsorber
        {
            ExtractionOptions =
            {
                FormattingMode = TextExtractionOptions.TextFormattingMode.Pure
            }
        };
        page.Accept(textAbsorber);

        var ext = textAbsorber.Text;
        var exts = ext.Replace("\n", "").Split('\r').ToList();
        if (ext.Contains(find))
            return exts;
    }
    return null;
}
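A hypothetical call site (the file path and search string are stand-ins; MyInfoClass comes from the question):
var file = new MyInfoClass { PdfFilePath = @"C:\docs\report.pdf" }; // hypothetical path
var pageLines = ProcessPage(file, "Invoice Total");                 // hypothetical search string
if (pageLines != null)
    pageLines.ForEach(Console.WriteLine); // lines of the first page containing the string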

Visual Studio Load Test redirect URL from Siteminder

I have a security application called Siteminder. It creates unique URLs for every authentication: HTTPS://SITE/idp/**RANDOMURLSTRING**/resumeSAML20/idp/startSSO.ping
How can I capture the unique URL and have the test continue to log in?
A web test assumes the next URL in the process. It does not support (or I don't know how to configure) a redirect to a unique, random URL. Does anyone know of a way to handle this case?
EDIT:
My solution: replace the session ID with {{SessionID}} in all the URLs and use this extraction rule:
public class ExtractSiteMinderCustomUrl : ExtractionRule
{
    public string SiteMinderSessionID { get; private set; }

    // The Extract method. The parameter e contains the web performance test context.
    public override void Extract(object sender, ExtractionEventArgs e)
    {
        // Look for anchor tags with URLs.
        Regex regex = new Regex("<a\\s+(?:[^>]*?\\s+)?href=\"([^\"]+\\?[^\"]+)\"");
        MatchCollection match = regex.Matches(e.Response.BodyString);
        if (match.Count > 0)
        {
            foreach (Match ItemMatch in match)
            {
                if (ItemMatch.ToString().Contains("/idp/"))
                {
                    // The Siteminder session is the part of the link between "/idp/" and "/resume".
                    e.WebTest.Context.Add(this.ContextParameterName, GetStringBetween(ItemMatch.ToString(), "/idp/", "/resume"));
                    e.Success = true;
                    return;
                }
            }
            e.Success = false;
            e.Message = String.Format(CultureInfo.CurrentCulture, "Not Found in Link : /idp/");
        }
        else
        {
            e.Success = false;
            e.Message = String.Format(CultureInfo.CurrentCulture, "No href tags found");
        }
    }

    public static string GetStringBetween(string token, string first, string second)
    {
        if (!token.Contains(first)) return "";
        var afterFirst = token.Split(new[] { first }, StringSplitOptions.None)[1];
        if (!afterFirst.Contains(second)) return "";
        var result = afterFirst.Split(new[] { second }, StringSplitOptions.None)[0];
        return result;
    }
}
The simple answer is to use an extraction rule that gets the **RANDOMURLSTRING**, then change the URLs in the requests to be, for example, HTTPS://SITE/idp/{{TheRandomString}}/resumeSAML20/idp/startSSO.ping, where TheRandomString is the context parameter that holds the extracted value. Note the doubled curly braces ({{ and }}) around the context parameter.
Suppose a value returned by the first redirection needs to be captured, but a normal web test would redirect again, so the response is never seen by the extraction rules. In this case you need to handle the redirect explicitly: set the Follow redirects property of the initial request to false, then add extraction rule(s) to gather the wanted values. Add a new request after the initial request and use the extracted values in it as necessary. It is possible to extract the entire redirected URL and set the Url field to the extracted value.
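A minimal sketch of such a rule, assuming the redirect target arrives in the standard Location response header (the built-in Extract HTTP Header rule can often do the same job); the ExtractRedirectUrl name is made up here:
using System;
using Microsoft.VisualStudio.TestTools.WebTesting;

public class ExtractRedirectUrl : ExtractionRule
{
    // Runs against the response of the request whose Follow redirects is false.
    public override void Extract(object sender, ExtractionEventArgs e)
    {
        string location = e.Response.Headers["Location"];
        if (!String.IsNullOrEmpty(location))
        {
            // Later requests can reference this as {{...}} using whatever name
            // is entered for the rule's Context Parameter Name.
            e.WebTest.Context.Add(this.ContextParameterName, location);
            e.Success = true;
        }
        else
        {
            e.Success = false;
            e.Message = "No Location header found in the redirect response.";
        }
    }
}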

Get the Control on a page by passing URL? Also get the current page URL?

My requirement: I want to know which page I am currently on, so that if any test fails I can pass the current page's URL to a method and get the home button link, ultimately navigating to the home link in case of an exception.
Is there a way to achieve this?
The URL should be in the address bar of the browser; just read it out of there.
One way of reading the value is to record an assertion on the value in the address bar, then copy the part of the code in the recorded assertion method that accesses the value.
Another way is to use the cross-hairs tool to select the address area, then (after clicking the double-chevron icon to open the left-hand pane) add the UI control for the selected area, and access the value through that control.
This will return the top Browser:
BrowserWindow bw = null;
try
{
    Playback.PlaybackSettings.WaitForReadyLevel = WaitForReadyLevel.AllThreads;
    var browser = new BrowserWindow() /*{ TechnologyName = "MSAA" }*/;
    PropertyExpressionCollection f = new PropertyExpressionCollection();
    f.Add("TechnologyName", "MSAA");
    f.Add("ClassName", "IEFrame");
    f.Add("ControlType", "Window");
    browser.SearchProperties.AddRange(f);
    UITestControlCollection coll = browser.FindMatchingControls();

    // Get the top of the browser stack.
    foreach (BrowserWindow win in coll)
    {
        bw = win;
        break;
    }
    String url = bw.Uri.ToString(); // this is the value you want to save
}
catch (Exception e)
{
    throw new Exception("Exception getting active (top) browser: - ------" + e.Message);
}
finally
{
    Playback.PlaybackSettings.WaitForReadyLevel = WaitForReadyLevel.UIThreadOnly;
}
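Tying this back to the requirement in the question, a sketch of how the saved URL might be used to navigate home after a failure; the GetHomeLink helper and its site-root assumption are hypothetical:
using System;
using Microsoft.VisualStudio.TestTools.UITesting;

static class NavigationHelper
{
    // Hypothetical helper: derive the home link from the current page URL.
    // Here we simply assume the site root is the home page.
    static string GetHomeLink(string currentUrl)
    {
        return new Uri(currentUrl).GetLeftPart(UriPartial.Authority);
    }

    // On a failed test: read the browser's current URL and navigate home.
    public static void GoHome(BrowserWindow bw)
    {
        string url = bw.Uri.ToString(); // the value saved by the code above
        bw.NavigateToUrl(new Uri(GetHomeLink(url)));
    }
}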

JavaScript works in IE but not in Firefox, giving the error: cprofiledetailscollapse is not defined

I use C#.NET.
I wrote JavaScript to show and hide expand/collapse divs accordingly. It works well in IE but not in Firefox, which doesn't even call the JavaScript function and gives this error: ctl00_cpContents_dlSearchList_ctl08_profiledetailscollapse is not defined.
My JavaScript is as follows:
function displayDiv(divCompact, divExpand) {
    //alert('1');
    var str = "ctl00_cpContents_";
    // alert("ibtnShowHide" + ibtnShowHide);
    var divstyle = divCompact.style.display;
    if (divstyle.toLowerCase() == "block" || divstyle == "") {
        divCompact.style.display = "none";
        divExpand.style.display = "block";
        // ibtnShowHide.ImageUrl = "images/expand_img.GIF";
    }
    else {
        // ibtnShowHide.ImageUrl = "images/restore_img.GIF";
        divCompact.style.display = "block";
        divExpand.style.display = "none";
    }
    return false;
}
ctl00_cpContents_dlSearchList_ctl08_profiledetailscollapse is an element ID generated by ASP.NET: it is a profiledetailscollapse control inside dlSearchList.
The JavaScript variable of that name is not defined in Firefox because, unlike IE, Firefox does not automatically create a global variable for each element with an id, named after that id and holding a reference to the element. Use document.getElementById to look the element up instead.
You might want to consider using jQuery to make sure that your DOM manipulation is cross-browser compatible.
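One way to avoid relying on IE's implicit id-named globals is to wire the call up from the C# code-behind, passing real element references via document.getElementById and the controls' generated ClientIDs. A sketch, assuming the divs and image button from the question's commented-out code live inside the dlSearchList DataList (the control names and the event used are assumptions):
using System.Web.UI.WebControls;

// In the page's code-behind, e.g. the DataList's ItemDataBound handler.
protected void dlSearchList_ItemDataBound(object sender, DataListItemEventArgs e)
{
    var divCompact = e.Item.FindControl("divCompact");                 // assumed control IDs
    var divExpand = e.Item.FindControl("divExpand");
    var ibtnShowHide = (ImageButton)e.Item.FindControl("ibtnShowHide");

    if (divCompact == null || divExpand == null || ibtnShowHide == null)
        return;

    // Pass element references instead of relying on id-named globals,
    // which only IE creates automatically.
    ibtnShowHide.OnClientClick = string.Format(
        "return displayDiv(document.getElementById('{0}'), document.getElementById('{1}'));",
        divCompact.ClientID, divExpand.ClientID);
}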
