I am trying to scrape part of a website. Here is an image below.
The pagination is marked with a red box, and I need to extract its last value, which is 151 in the image above. The pagination is generated dynamically, which makes it hard to extract: when I check with View Page Source, only <div class="jsx-46358917 pagination-wrapper text-center"></div> is shown and the values inside it are missing. I understand that it is dynamic, but I still need the last value from the pagination (151 in this example).
Here is the code I have written so far to scrape it:
public void parseItem(HtmlDocument doc, string zipCode)
{
    // Getting JSON data
    if (doc.DocumentNode.LastChild.HasChildNodes)
    {
        var siteScripts = doc.DocumentNode.SelectSingleNode("//script[@id='__NEXT_DATA__']").InnerText;
        var result = JsonConvert.DeserializeObject<RealtorModel>(siteScripts);
        if (result != null)
        {
            foreach (var realtor in result.Props.CriteriaData.SrpShell.LoadedData.SearchResults.HomeSearch.Results)
            {
                string propertyId = "M" + realtor.PropertyId;
                string address = realtor.Location.Address.Line + ", " + realtor.Location.Address.City + ", " + realtor.Location.Address.StateCode + " " + realtor.Location.Address.PostalCode;
                string listingURL = hostName + "/realestateandhomes-detail/" + realtor.Permalink;
                var url = realtor.PrimaryPhoto;
                listings.Add(new Listings { PropertyID = propertyId, Address = address, Price = realtor.ListPrice, ImageURL = realtor.PrimaryPhoto.Href.AbsoluteUri, ListingURL = listingURL });
            }
        }
        pageNumber = pageNumber + 1;
        string nextUrl = "https://www.realtor.com/realestateandhomes-search/" + zipCode + "/type-single-family-home" + "/pg-" + pageNumber;
        AddTask(nextUrl, this.parseItem, zipCode);
    }
    else
    {
        // Empty page: wait a minute, then retry the same page number
        System.Threading.Thread.Sleep(60000);
        string nextUrl = "https://www.realtor.com/realestateandhomes-search/" + zipCode + "/type-single-family-home" + "/pg-" + pageNumber;
        AddTask(nextUrl, this.parseItem, zipCode);
    }
}
Scraping gives me the complete page; the only thing I cannot extract is the last pagination value, because it is rendered dynamically. How can I get it? Any hint would be helpful.
I've been looking for this but I cannot seem to find the answer.
What I want to accomplish is the following:
Right now, when I reply with my embed, it shows for example:
football,baseball
But what I want is the following:
football,
baseball
spread over two different lines.
Does anyone know how to do this in code?
Thank you in advance.
Here is the code:
var value = "";
int price = 0;
foreach (var Item in content)
{
    value += Item.Item1 + ": " + Item.Item2.ToString();
    price += Item.Item2;
}
return new EmbedFieldBuilder()
{
    Name = category + " - " + price,
    Value = value
};
This worked for me with a simple "\n" or Environment.NewLine:
var embed = new EmbedBuilder
{
    Author = new EmbedAuthorBuilder() { Name = "AuthorNameHere" },
    Title = "Sports",
    Color = Color.Orange,
    Description = "Football" + "\n\n" + "Baseball"
}.Build();
//var channel = GetYourNeededChannel();
await channel.SendMessageAsync("", false, embed);
It also works with fields in an embed:
Fields = new List<EmbedFieldBuilder>()
{
    new EmbedFieldBuilder()
    {
        Name = "TestField1",
        Value = "FieldValue1" + "\n\n" + "FieldValue2"
    }
}
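Applied to the loop in the question, the same idea might look like the sketch below. It only builds the text and does not touch Discord.Net; the EmbedText class and the (Name, Price) tuple shape are assumptions made for illustration, mirroring Item1 and Item2 from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EmbedText
{
    // Builds the field text with one "name: price" entry per line.
    // The separator ",\n" keeps the comma and moves the next entry to a new line.
    public static string BuildValue(IEnumerable<(string Name, int Price)> content)
    {
        return string.Join(",\n", content.Select(item => item.Name + ": " + item.Price));
    }

    // Sums the prices, as the original loop does with "price += Item.Item2".
    public static int TotalPrice(IEnumerable<(string Name, int Price)> content)
    {
        return content.Sum(item => item.Price);
    }
}
```

The result of BuildValue can then be assigned to the Value property of the EmbedFieldBuilder. Using string.Join also avoids a trailing separator after the last entry.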
I have a page with a button that generates a link like this:
private string GenerateLINK(string NameID)
{
    string NameID= ds.Tables[0].Rows[0]["FName"] + " " + ds.Tables[0].Rows[0]["LName"];
    string sQS = ID+ "|" + ClientName;
    var xCrypto = new CryptoServer();
    string Vector= null;
    string sEncrypted = null;
    xCrypto.Encrypt3DES(sQS, ref sEncrypted, ref Vector);
    string sURL = sEncrypted + "#######" + Vector;
    sURL = Server.UrlEncode(sURL);
    sURL = "https://www.Page.aspx?s=" + sURL;
    return sURL;
}
The link is then sent to a user, who clicks on it and lands on a page.
On that page I take the link and decode it like this:
private void DecryptQuerystring()
{
    var sQS = Request.QueryString["s"];
    sQS = Server.UrlDecode(sQS);
    var idelim = sQS.IndexOf("###X####", StringComparison.Ordinal);
    var sIv = sQS.Substring(idelim + 8);
    sQS = sQS.Substring(0, idelim);
    var xCrypto = new ICECrypto.CryptoServer();
    sQS = xCrypto.Decrypt3DES(sQS, sIv);
    string sID = sQS.Substring(0, sQS.IndexOf("|"));
    studentID = sID;
    Name = sQS.Substring(sQS.IndexOf("|") + 1);
    Welcome.InnerText = "Welcome " + sQS.Substring(sQS.IndexOf("|") + 1);
}
The problem is that when the user gets there and changes anything in the link, the whole page breaks and shows a Server Error. I want the user NOT to be able to edit the link or insert anything into it. Any clue? Thanks in advance!
This is funny, but I am answering my own question; maybe someone else can use it.
Where I decrypt the query string, I wrapped the code in a try/catch:
try {
    // Do the decryption here
}
catch (Exception ex) {
    // If anything goes wrong in the try block, execution ends up here and I show a 404 error
}
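A user can always edit the URL in the address bar, so the page can only detect a tampered link and reject it. In addition to the try/catch, the string handling itself can be made defensive. The sketch below is one possible way to do that: LinkToken, TrySplit and TryParsePayload are hypothetical helper names, and the delimiter is the one used in the decryption code of the question. The decryption call still belongs inside a try/catch, since this sketch does not cover it.

```csharp
using System;

public static class LinkToken
{
    // Delimiter taken from the decryption code in the question.
    private const string Delimiter = "###X####";

    // Splits "cipherText + delimiter + iv" without throwing on tampered input.
    public static bool TrySplit(string token, out string cipherText, out string iv)
    {
        cipherText = null;
        iv = null;
        if (string.IsNullOrEmpty(token)) return false;

        int index = token.IndexOf(Delimiter, StringComparison.Ordinal);
        if (index <= 0) return false;            // delimiter missing, or nothing before it

        string tail = token.Substring(index + Delimiter.Length);
        if (tail.Length == 0) return false;      // nothing after the delimiter

        cipherText = token.Substring(0, index);
        iv = tail;
        return true;
    }

    // Splits the decrypted "id|name" payload the same defensive way.
    public static bool TryParsePayload(string payload, out string id, out string name)
    {
        id = null;
        name = null;
        if (string.IsNullOrEmpty(payload)) return false;

        int bar = payload.IndexOf('|');
        if (bar <= 0) return false;              // separator missing, or empty id

        id = payload.Substring(0, bar);
        name = payload.Substring(bar + 1);
        return true;
    }
}
```

If either method returns false, the page can show the same 404 response as in the catch block instead of letting Substring throw.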
I was trying to retrieve data from Amazon SimpleDB and currently it only displays data in text like domainName: {attribute1, value2} {attribute1, value2}.
How can I show the data in a DataGridView? My code is as follows:
public static List<String> GetItemByQuery(IAmazonSimpleDB simpleDBClient, string domainName)
{
    List<String> Results = new List<String>();
    SelectResponse response = simpleDBClient.Select(new SelectRequest()
    {
        SelectExpression = "Select * from " + domainName
    });
    String res = domainName + " has: ";
    foreach (Item item in response.Items)
    {
        res = item.Name + ": ";
        foreach (Amazon.SimpleDB.Model.Attribute attribute in item.Attributes)
        {
            res += "{" + attribute.Name + ", " + attribute.Value + "}, ";
        }
        // Drop the trailing ", "
        res = res.Remove(res.Length - 2);
        Results.Add(res);
    }
    return Results;
}
As you can read here:
http://docs.aws.amazon.com/sdkfornet1/latest/apidocs/html/P_Amazon_SimpleDB_Model_SelectResult_Item.htm
your response.Items is
public List<Item> Item { get; set; }
so you can assign it directly to the DataSource of your grid. Set AutoGenerateColumns to true on the grid to start viewing the result.
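Because the attributes of each item live in a nested list, binding the item list directly will most likely show only the item name. One possible alternative is to flatten the items into a DataTable with one column per attribute name and bind that instead. The sketch below assumes stand-in types (SketchItem, SketchAttribute) so that it runs without the AWS SDK; with the real SDK the same loop would run over response.Items.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Stand-in types for illustration only; the real SDK has its own Item and Attribute classes.
public sealed class SketchAttribute
{
    public string Name { get; set; }
    public string Value { get; set; }
}

public sealed class SketchItem
{
    public string Name { get; set; }
    public List<SketchAttribute> Attributes { get; set; } = new List<SketchAttribute>();
}

public static class GridData
{
    // Flattens items into a table: one row per item, one column per attribute name.
    // Assumes no attribute is itself named "ItemName".
    public static DataTable ToDataTable(IEnumerable<SketchItem> items)
    {
        var table = new DataTable();
        table.Columns.Add("ItemName", typeof(string));

        foreach (var item in items)
        {
            // Make sure a column exists for every attribute before creating the row.
            foreach (var attribute in item.Attributes)
            {
                if (!table.Columns.Contains(attribute.Name))
                    table.Columns.Add(attribute.Name, typeof(string));
            }

            DataRow row = table.NewRow();
            row["ItemName"] = item.Name;
            foreach (var attribute in item.Attributes)
            {
                // An attribute name can repeat within one item, so append repeated values.
                row[attribute.Name] = row.IsNull(attribute.Name)
                    ? attribute.Value
                    : row[attribute.Name] + ", " + attribute.Value;
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```

The returned table can be assigned to the DataSource of the grid in the same way as the list.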
My problem is that the following script generates an email address when the page is loaded, and I want to parse that email address. How can I do that?
<tr>
<td align='right' class='generalinfo_left' >Email Address:</td>
<td class='generalinfo_right'><script type="text/javascript">
//<![CDATA[
var o3752aaa9bb29d904adeb88838117fd7c = String.fromCharCode(109);var f03de7e643c296e211edddbc3197b33f6 = String.fromCharCode(97);var k7c3bf82468602c0f8dff4950e4b6ff1e = String.fromCharCode(105);var b3eaa633e44451be8df1fa47d75149934 = 'l';var ma2fa16c3a3f532b780aaf0fa5a5b75c6 = 't';var re0c13fc69c03925782867a0540f8c084 = 'o';var j335f1365672123d1fcaf9a83b76f1b7b = String.fromCharCode(58);var f32820e1c54cbc3fa0d418cd1c195eaec = String.fromCharCode(105);var y8c24ea00a7a1edf1c01f794d487697e3 = String.fromCharCode(110);var bcc0ad4f628e703f9ff6e25b87b77ec34 = 'f';var c985c961c7ee85fe6a25d5a66fb421745 = String.fromCharCode(111);var z5ab4e3bdc353d621cea5babcc5dca417 = String.fromCharCode(64);var s4e087167cd0bac466344e72016511172 = String.fromCharCode(97);var re26f6ae180723793af62bc36d5ab2530 = String.fromCharCode(108);var ye1b53d01de118079a38de5e951586731 = 'c';var g9fc5710c9266ce08afbe4da24702dfdd = String.fromCharCode(105);var k5cd5ea1bac40fdbb8b133b7e356809c6 = String.fromCharCode(118);var fcd6e4771e956e270c6897d24ca51c256 = String.fromCharCode(97);var y9d7854a5921fa2be88c8cd72c7e2884e = String.fromCharCode(114);var xa58bea1ecad6fe7d2c736aab1df2df44 = '.';var e4569f6c98804675f7117a84abb0b8d5c = 'c';var o4d2081e2344020922dcb924690c9972e = 'o';var af150185e5eef8ecd8dc1b0a4977c7d55 = String.fromCharCode(109);document.write("<a href='" + o3752aaa9bb29d904adeb88838117fd7c + f03de7e643c296e211edddbc3197b33f6 + k7c3bf82468602c0f8dff4950e4b6ff1e + b3eaa633e44451be8df1fa47d75149934 + ma2fa16c3a3f532b780aaf0fa5a5b75c6 + re0c13fc69c03925782867a0540f8c084 + j335f1365672123d1fcaf9a83b76f1b7b + f32820e1c54cbc3fa0d418cd1c195eaec + y8c24ea00a7a1edf1c01f794d487697e3 + bcc0ad4f628e703f9ff6e25b87b77ec34 + c985c961c7ee85fe6a25d5a66fb421745 + z5ab4e3bdc353d621cea5babcc5dca417 + s4e087167cd0bac466344e72016511172 + re26f6ae180723793af62bc36d5ab2530 + ye1b53d01de118079a38de5e951586731 + g9fc5710c9266ce08afbe4da24702dfdd + k5cd5ea1bac40fdbb8b133b7e356809c6 + fcd6e4771e956e270c6897d24ca51c256 + 
y9d7854a5921fa2be88c8cd72c7e2884e + xa58bea1ecad6fe7d2c736aab1df2df44 + e4569f6c98804675f7117a84abb0b8d5c + o4d2081e2344020922dcb924690c9972e + af150185e5eef8ecd8dc1b0a4977c7d55 + "'>" + f32820e1c54cbc3fa0d418cd1c195eaec + y8c24ea00a7a1edf1c01f794d487697e3 + bcc0ad4f628e703f9ff6e25b87b77ec34 + c985c961c7ee85fe6a25d5a66fb421745 + z5ab4e3bdc353d621cea5babcc5dca417 + s4e087167cd0bac466344e72016511172 + re26f6ae180723793af62bc36d5ab2530 + ye1b53d01de118079a38de5e951586731 + g9fc5710c9266ce08afbe4da24702dfdd + k5cd5ea1bac40fdbb8b133b7e356809c6 + fcd6e4771e956e270c6897d24ca51c256 + y9d7854a5921fa2be88c8cd72c7e2884e + xa58bea1ecad6fe7d2c736aab1df2df44 + e4569f6c98804675f7117a84abb0b8d5c + o4d2081e2344020922dcb924690c9972e + af150185e5eef8ecd8dc1b0a4977c7d55 + "</a>")
//]]>;
</script></td>
The output looks like this:
<td class="generalinfo_right">
<script type="text/javascript">
(same script as above, plus the following line)
</script>someID@email.com</td>
I wrote my own custom parser that reads the script and extracts the email address from it. Here is the code.
If it can be optimized or written more neatly, please let me know.
private string ReadEmail(string EmailScript)
{
    string EncriptedEmail = "";
    string dataPart = "";
    dataPart = EmailScript.Substring(0, EmailScript.IndexOf("document.write")).Replace("//<![CDATA[\r", "").Replace("\"", "").Replace("\r\n","");
    EncriptedEmail = EmailScript.Replace("\"","");
    EncriptedEmail = EncriptedEmail.Substring(EncriptedEmail.IndexOf("'> + "), EncriptedEmail.IndexOf(" + </a>") - EncriptedEmail.IndexOf("'> +")).Replace("'> +", "").Trim();
    string[] requiredVariables = EncriptedEmail.Split('+');
    List<string> ExtractedDataFromRaw = new List<string>();
    string email = "";
    foreach (string variable in requiredVariables)
    {
        string temp = dataPart.Substring(dataPart.IndexOf(variable),dataPart.Length-dataPart.IndexOf(variable)).Replace(" ","");
        string tempValueofVariable = temp.Substring(0, temp.IndexOf(";"));
        tempValueofVariable = tempValueofVariable.Substring(tempValueofVariable.IndexOf("="), tempValueofVariable.Length - temp.IndexOf("=")).Replace("=","");
        if (tempValueofVariable.Contains("String.fromCharCode"))
        {
            tempValueofVariable = GetCharacterFromASCII(tempValueofVariable.Replace("String.fromCharCode(", "").Replace(")", ""));
        }
        ExtractedDataFromRaw.Add(tempValueofVariable.Replace("'",""));
        email += tempValueofVariable.Replace("'", "");
    }
    return email;
}
private string GetCharacterFromASCII(string value)
{
    int result = 0;
    int.TryParse(value, out result);
    return char.ConvertFromUtf32(result);
}
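As a possible tidier variant of the parser above, the same decoding can be sketched with regular expressions: collect every variable assignment into a dictionary, then resolve the variables that make up the link text. EmailScriptDecoder and Decode are hypothetical names, and the sketch assumes the script keeps the shape shown in the question (var assignments followed by a single document.write call).

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class EmailScriptDecoder
{
    // Matches: var name = String.fromCharCode(109);   or   var name = 'l';
    private static readonly Regex Assignment = new Regex(
        @"var\s+(?<name>\w+)\s*=\s*(?:String\.fromCharCode\((?<code>\d+)\)|'(?<literal>[^']*)')\s*;");

    // Captures the expression between "'>" and "</a>" inside document.write(...).
    private static readonly Regex LinkText = new Regex(
        @"""'>""\s*\+(?<expr>.*?)\+\s*""</a>""", RegexOptions.Singleline);

    public static string Decode(string script)
    {
        // Map each variable name to the character or literal it holds.
        var values = new Dictionary<string, string>();
        foreach (Match m in Assignment.Matches(script))
        {
            values[m.Groups["name"].Value] = m.Groups["code"].Success
                ? char.ConvertFromUtf32(int.Parse(m.Groups["code"].Value))
                : m.Groups["literal"].Value;
        }

        Match link = LinkText.Match(script);
        if (!link.Success) return string.Empty;

        // Resolve the variables of the link text in order; unknown names are skipped.
        var email = new StringBuilder();
        foreach (string part in link.Groups["expr"].Value.Split('+'))
        {
            if (values.TryGetValue(part.Trim(), out string value))
                email.Append(value);
        }
        return email.ToString();
    }
}
```

Passing the inner text of the script element to Decode returns the visible link text, or an empty string when the script does not match the expected shape.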
That code is building the email address one character at a time from character codepoints and then assembling it later. I suppose this is an attempt to prevent email spam. Depending on what you need to do, it might be easiest to just pull the email address from the link using jQuery or something. $('a[href^=mailto]').attr('href').substring(7) or something ought to do it.
Using the C# Facebook SDK 5.0.3, everything works fine with client.Get("/me").
But when retrieving the statuses, I should get an array "data" with all the status messages according to the Facebook Graph API. Instead, my data array is empty and I get an 'Index out of bounds' exception.
Does anyone have an idea what my problem could be?
if (Request.Params["code"] != null)
{
    var client = new FacebookClient(GetAccessToken());
    dynamic me = client.Get("/me");
    imgUser.ImageUrl = "https://graph.facebook.com/" + me.id + "/picture";
    lblUsername.Text = me.name;
    lblHometown.Text = me.hometown.name;
    lblBirthday.Text = me.birthday;
    lblCurrenttown.Text = me.location.name;
    lblEmail.Text = me.email;
    lblOpleidingen.Text = "";
    lblOpleidingen.Text += me.education[1].type + ": " + me.education[1].school.name + ", " + me.education[1].year.name + "<br />"
        + me.education[0].type + ": " + me.education[0].school.name + ", " + me.education[0].year.name;
    lblSex.Text = me.gender;
    dynamic status = client.get("/me/statuses");
    txtStatus.Text = status.data[0].message;
}
It requires the read_stream permission. Ensure you have it.
Your permission array should look as follows:
string[] extendedPermissions = new[] { "user_about_me", "read_stream" };
if (extendedPermissions != null && extendedPermissions.Length > 0)
{
    var scope = new StringBuilder();
    scope.Append(string.Join(",", extendedPermissions));
    parameters["scope"] = scope.ToString();
}
Furthermore, the second get() call should be capitalized as Get(). Change
dynamic status = client.get("/me/statuses");
to
dynamic status = client.Get("/me/statuses");