I am working on a web-scraping project to get values from Yellow Pages, and while iterating through the result pages the loop never advances the page count. I added a loop, but it keeps showing data from the same page. I am attaching my code below.
static void Main(string[] args)
{
    string webUrl = "https://www.yellowpages.com";
    bool Loop = true;
    HtmlWeb Web = new HtmlWeb();
    // First URL
    HtmlDocument doc = Web.Load(webUrl + "/search?search_terms=software&geo_location_terms=Los+Angeles%2C+CA");
    var HeaderName = doc.DocumentNode.SelectNodes("//a[@class='business-name']").ToList();
    foreach (var abc in HeaderName)
    {
        Console.WriteLine(abc.InnerText);
    }
    // Loop through the pages from the paging of that first URL,
    // and keep going until the Next button returns nothing
    while (Loop == true)
    {
        var NextPageCheck = doc.DocumentNode.SelectNodes("//a[text()='Next']/@href").ToList();
        if (NextPageCheck.Count != 0)
        {
            string link = webUrl + NextPageCheck[0].Attributes["href"].Value;
            doc = Web.Load(link);
            HeaderName = doc.DocumentNode.SelectNodes("//a[@class='business-name']").ToList();
            foreach (var abc in HeaderName)
            {
                Console.WriteLine(abc.InnerText);
            }
        }
        else
        {
            Loop = false;
        }
    }
}
So the issue I am facing is that it keeps showing the results from the 2nd page. I want it to keep iterating until there is no page left; if there are 400 pages in total, it should follow the page URL all the way to page 400:
https://www.yellowpages.com/search?search_terms=software&geo_location_terms=Los%20Angeles%2C%20CA&page=2
page=2
While debugging your code, I was getting a null error on the line where you look for the business names the second time around. The version of HtmlAgilityPack I had installed was HTML-encoding the URLs (e.g. &amp; instead of &), so I simply added a decode of the URL:
string link = webUrl + NextPageCheck[0].Attributes["href"].Value;
var urlDecode = HttpUtility.HtmlDecode(link);
doc = Web.Load(urlDecode);
And it seemed to work fine. As the comment says, next time you post it would be helpful to include the error you are getting and the line it occurs on, so it's easier and faster to track down the actual bug.
Hope this helps.
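Putting it together, a minimal sketch of the corrected loop (same HtmlAgilityPack setup and selectors as in the question):

using System;
using System.Web; // HttpUtility
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        string webUrl = "https://www.yellowpages.com";
        var web = new HtmlWeb();
        var doc = web.Load(webUrl + "/search?search_terms=software&geo_location_terms=Los+Angeles%2C+CA");

        while (doc != null)
        {
            // Print the business names on the current page
            var names = doc.DocumentNode.SelectNodes("//a[@class='business-name']");
            if (names != null)
            {
                foreach (var name in names)
                    Console.WriteLine(name.InnerText);
            }

            // Follow the Next link; HtmlDecode turns &amp; back into &
            // so the page=N part of the query string survives
            var next = doc.DocumentNode.SelectNodes("//a[text()='Next']/@href");
            doc = next == null
                ? null
                : web.Load(HttpUtility.HtmlDecode(webUrl + next[0].Attributes["href"].Value));
        }
    }
}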
I am working in ASP.NET and had to rewrite some URLs. The rewriting is working fine; for example, I had to change the URL mywebsite.com/search.aspx?cat=1 to mywebsite.com/search/cameras, and that works. Now I have to change the page meta tags, and when I try to get the URL by using
HttpContext.Current.Request.Url.PathAndQuery
I get search.aspx?cat=1, while what I want is the address written in the address bar, which is search/cameras.
If that's not possible, is there any way to set meta tags for specific pages?
Here is the code for the URL rewrite:
m_boolIsCustomPage = true;
m_strPageBaseUrl = "search.aspx";
if (m_intIDSearch > -1)
{
    l_strQueryContents = m_intIDSearch.ToString();
    m_intIDSearch = -1;
}
else
{
    l_strQueryContents = "-1";
    m_intIDSearch = -1;
}
Use HttpContext.Current.Request.RawUrl. That is the URL as received by IIS, prior to any manipulation (see Request.RawUrl vs. Request.Url).
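A minimal sketch of using that to set per-URL meta tags in Page_Load (assuming WebForms with <head runat="server">; the URL check and tag values are just placeholders):

using System;
using System.Web;
using System.Web.UI.HtmlControls;

public partial class Search : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // RawUrl is the URL as the client requested it, before rewriting:
        // "/search/cameras" rather than "/search.aspx?cat=1"
        string rawUrl = HttpContext.Current.Request.RawUrl;

        if (rawUrl.StartsWith("/search/cameras", StringComparison.OrdinalIgnoreCase))
        {
            Page.Title = "Cameras";
            var meta = new HtmlMeta { Name = "description", Content = "Browse our camera range." };
            Page.Header.Controls.Add(meta);
        }
    }
}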
I want to check whether a Facebook user has liked my Facebook page or not. I found many solutions using JavaScript, but I want to implement this requirement in ASP.NET.
I copied the code from the link below:
http://duanedawnrae.com/Blog/post/2012/02/29/Determine-if-a-Facebook-user-Likes-your-page-with-ASPNET.aspx
I ended up with the ASP.NET code below.
ASP.NET code:
public class WebService : System.Web.Services.WebService
{
    [WebMethod()]
    public string GetFacebookLikeStatus(string fbpageid, string fbappid, string fbtoken, string fburl)
    {
        string strReturn = null;
        // Placeholder for the Facebook "like" API call
        string strURL = null;
        strURL = "https://graph.facebook.com/me/likes?access_token=" + fbtoken;
        // Placeholder for the Facebook GET response
        WebRequest objGETURL = null;
        objGETURL = WebRequest.Create(strURL);
        // Declare response stream
        Stream objStream = null;
        // Declare the Facebook response
        string strLine = null;
        // Declare a count on the search term
        int intStr = 0;
        try
        {
            // Create an instance of the StreamReader
            StreamReader objReader = new StreamReader(objStream);
            // Get the response from the Facebook API as a JSON string.
            // If access_token is not correct for the logged
            // on user Facebook returns (400) bad request error
            objStream = objGETURL.GetResponse().GetResponseStream();
            // If all is well
            try
            {
                // Execute the StreamReader
                strLine = objReader.ReadToEnd().ToString();
                // Check if the Facebook page Id exists or not
                intStr = strLine.IndexOf(fbpageid); // if valid return a value
                if (intStr > 0)
                {
                    strReturn = "1";
                }
                else
                {
                    // if not valid return a value
                    strReturn = "0";
                }
                objStream.Dispose();
            }
            catch (Exception ex)
            {
                // For testing; comment out for production
                strReturn = ex.ToString();
                // Uncomment below for production
                //strReturn = "Some friendly error message"
            }
        }
        catch (Exception ex)
        {
            // For testing; comment out for production
            strReturn = ex.ToString();
            // Uncomment below for production
            //strReturn = "Some friendly error message"
        }
        return strReturn;
    }
}
The above code is a web service with a single function. The function takes four input parameters and returns a single output string.
But when I run this web service I get the error “Value cannot be null. Parameter name: stream”. This error occurs because the “objStream” variable is still null when the StreamReader is constructed. Please help me fix the issue so that I can get the correct output, as I don't know how else to implement my requirement.
Like-gating is not allowed on Facebook, and neither is incentivizing users to like your Page. Users must like something only because they really want to; you can't reward them in any way.
That being said, you would need the user_likes permission to use /me/likes, and you would need to get it approved by Facebook, which will not happen just for checking whether the user liked your Page.
Btw, that article is from 2012. A lot has changed since then.
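For what it's worth, the “Value cannot be null” error itself comes from constructing the StreamReader before objStream has been assigned. A minimal sketch of the corrected order (this only fixes the crash; the permission problem above still applies):

// Get the response stream first, then wrap it in the reader
objStream = objGETURL.GetResponse().GetResponseStream();
using (StreamReader objReader = new StreamReader(objStream))
{
    strLine = objReader.ReadToEnd();
}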
I am using the Mailgun API. There is a section where I need to provide a URL to them; they are then going to HTTP POST some data to it.
I provided this URL (http://test.com/MailGun/Webhook.aspx) to Mailgun so they can POST data. I have a list of the parameter names they send (recipient, domain, ip, ...).
I am not sure how to get that posted data in my page.
In the Webhook.aspx page I tried the following, but all of the values come back empty:
lblrecipient.Text = Request.Form["recipient"];
lblip.Text = Request.Params["ip"];
lbldomain.Text = Request.QueryString["domain"];
What else can I try to get the posted data?
This code will list all of the form variables being sent in a POST, so you can check whether you have the proper names for the post values.
string[] keys = Request.Form.AllKeys;
for (int i = 0; i < keys.Length; i++)
{
    Response.Write(keys[i] + ": " + Request.Form[keys[i]] + "<br>");
}
This code reads the raw input stream from the HTTP request. Use it if the data isn't available in Request.Form or other model bindings, or if you need access to the bytes/text exactly as they arrive.
string content;
using (var reader = new StreamReader(Request.InputStream))
{
    content = reader.ReadToEnd();
}
You can simply use Request["recipient"] to "read the HTTP values sent by a client during a Web request":
"To access data from the QueryString, Form, Cookies, or ServerVariables collections, you can write Request["key"]."
Source: MSDN
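For example, in the webhook page (using the label names from the question):

protected void Page_Load(object sender, EventArgs e)
{
    // Request["..."] searches QueryString, Form, Cookies, then ServerVariables
    lblrecipient.Text = Request["recipient"];
    lblip.Text = Request["ip"];
    lbldomain.Text = Request["domain"];
}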
Update: summarizing the conversation.
In order to view the values that Mailgun is POSTing to your site, you will need to read them from the web request that Mailgun makes, record them somewhere, and then display them on your page.
You should have one endpoint where Mailgun sends the POST values and another page that you use to view the recorded values.
It appears that right now you have one page. So when you view this page and read the Request values, you are reading the values from YOUR request, not Mailgun's.
You are missing a step. You need to log/store the values on your server (Mailgun is a client). Then you need to retrieve those values on your server (your PC with your web browser will be a client). These will be two totally different aspx files (or the same one with different parameters).
aspx page 1 (the one that Mailgun posts to):
var val = Request.Form["recipient"];
System.IO.File.AppendAllText(filename, val + Environment.NewLine);
aspx page 2 (the one you view):
var contents = "";
if (System.IO.File.Exists(filename))
{
    contents = System.IO.File.ReadAllText(filename);
}
Response.Write(contents);
Use this:
public void ShowAllPostBackData()
{
if (IsPostBack)
{
string[] keys = Request.Form.AllKeys;
Literal ctlAllPostbackData = new Literal();
ctlAllPostbackData.Text = "<div class='well well-lg' style='border:1px solid black;z-index:99999;position:absolute;'><h3>All postback data:</h3><br />";
for (int i = 0; i < keys.Length; i++)
{
ctlAllPostbackData.Text += "<b>" + keys[i] + "</b>: " + Request[keys[i]] + "<br />";
}
ctlAllPostbackData.Text += "</div>";
this.Controls.Add(ctlAllPostbackData);
}
}
In the web browser, open the developer console (F12 in Chrome and IE), then open the Network tab and watch the request and response data. Another option is to use Fiddler (http://fiddler2.com/).
When you can see the POST request as it is being sent to your page, look into the query string and headers. You will see whether your data comes in the query string or as a form, or maybe it is not being sent to your page at all.
UPDATE: sorry, I had to look at the Mailgun APIs first; their requests do not go through your browser, they come directly from Mailgun's server. You'll have to debug and examine all members of Request.Params when you get the POST from Mailgun.
Try this:
string[] keys = Request.Form.AllKeys;
var value = "";
for (int i = 0; i < keys.Length; i++)
{
    // keys[i] is the field name, e.g. test[0].quantity
    // and this is how you get the corresponding value:
    value = Request.Form[keys[i]];
}
In my case, because I assigned the POST data to a header, this is how I read it back:
protected void Page_Load(object sender, EventArgs e)
{
    ...
    postValue = Request.Headers["Key"];
}
And this is how I attached the key and value to the POST (in Xamarin.iOS):
var request = new NSMutableUrlRequest(url)
{
    HttpMethod = "POST",
    Headers = NSDictionary.FromObjectAndKey(FromObject(value), FromObject("key"))
};
webView.LoadRequest(request);
You can try checking Request.Form.Keys. If that does not work well, you can read Request.InputStream to get the raw request body (e.g. a SOAP string), which will show you all the request keys.
How to screen scrape HTTPS using C#?
You can use System.Net.WebClient to start an HTTPS connection, and pull down the page to scrape with that.
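A minimal sketch (the target URL is just a placeholder):

using System;
using System.Net;

class Scraper
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            // Some servers reject requests that have no User-Agent header
            client.Headers.Add("User-Agent", "Mozilla/5.0");
            string html = client.DownloadString("https://example.com/");
            Console.WriteLine(html.Length);
        }
    }
}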
Look into the Html Agility Pack.
You can use System.Net.WebClient to grab web pages. Here is an example: http://www.codersource.net/csharp_screen_scraping.html
If for some reason you're having trouble accessing the page as a web client, or you want the request to look like it comes from a browser, you could use the web-browser control in an app: load the page in it and read the source of the loaded content from the control.
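A rough sketch of that approach with the WinForms WebBrowser control (the URL is a placeholder):

using System;
using System.Windows.Forms;

public class ScrapeForm : Form
{
    private readonly WebBrowser browser = new WebBrowser();

    public ScrapeForm()
    {
        browser.ScriptErrorsSuppressed = true;
        Controls.Add(browser);
        // DocumentCompleted fires once the page has finished loading
        browser.DocumentCompleted += (s, e) =>
        {
            // DocumentText holds the source of the loaded content
            Console.WriteLine(browser.DocumentText.Length);
        };
        browser.Navigate("https://example.com/");
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new ScrapeForm());
    }
}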
Here's a concrete (albeit trivial) example. You can pass a ship name to VesselFinder in the query string, but even if it finds only one ship with that name, it still shows the search results screen with that one ship. This example detects that case and takes the user straight to the tracking map for the ship.
string strName = "SAFMARINE MAFADI";
string strURL = "https://www.vesselfinder.com/vessels?name=" + HttpUtility.UrlEncode(strName);
string strReturnURL = strURL;
string strToSearch = "/?imo=";
string strPage = string.Empty;
byte[] aReqtHTML;
WebClient objWebClient = new WebClient();
objWebClient.Headers.Add("User-Agent: Other"); // You must set a User-Agent or HTTPS won't work
aReqtHTML = objWebClient.DownloadData(strURL); // Do the name search
UTF8Encoding utf8 = new UTF8Encoding();
strPage = utf8.GetString(aReqtHTML); // get the string from the bytes
if (strPage.IndexOf(strToSearch) != strPage.LastIndexOf(strToSearch))
{
    // more than one instance found, so leave the return URL as the name search
}
else if (strPage.Contains(strToSearch))
{
    // find the ship's IMO
    strPage = strPage.Substring(strPage.IndexOf(strToSearch)); // cut off the stuff before
    strPage = strPage.Substring(0, strPage.IndexOf("\"")); // cut off the stuff after
    strReturnURL = "https://www.vesselfinder.com" + strPage; // only overwrite when exactly one match was found
}