Hello, I would like to find a method that accepts any kind of location data (I can tailor it to just about anything), such as a city/state, a ZIP code, or a street address, and returns the local time for that location.
Is that functionality built in somewhere, or is there a good existing resource/class I can use?
Thanks.
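For the conversion step itself, .NET has built-in support: once a location has been mapped to a time zone, `TimeZoneInfo.ConvertTimeFromUtc` returns the local time there. Mapping an arbitrary city, ZIP code, or address to a zone is the part that needs an external data source. The sketch below therefore assumes a hypothetical hard-coded table (`LocalTimeLookup`) that contains a single fixed-offset zone and ignores daylight saving time. It is purely illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class LocalTimeLookup
{
    // Hypothetical location-to-zone table. A real application would need a
    // geocoding / time-zone data source to fill this in for arbitrary input.
    private static readonly Dictionary<string, TimeZoneInfo> Zones =
        new Dictionary<string, TimeZoneInfo>(StringComparer.OrdinalIgnoreCase)
        {
            // Fixed UTC-5 offset for illustration only: it ignores daylight saving time.
            ["Cincinnati, OH"] = TimeZoneInfo.CreateCustomTimeZone(
                "Example/Eastern-Fixed", TimeSpan.FromHours(-5),
                "Eastern (fixed)", "Eastern (fixed)")
        };

    // Returns the local time at the given location for a UTC instant,
    // or null when the location is not in the table.
    public static DateTime? LocalTimeFor(string location, DateTime utc)
    {
        if (!Zones.TryGetValue(location, out TimeZoneInfo zone))
        {
            return null;
        }
        return TimeZoneInfo.ConvertTimeFromUtc(utc, zone);
    }
}
```

With real data, `TimeZoneInfo.FindSystemTimeZoneById` can load an actual zone (for example "Eastern Standard Time" on Windows), and that zone does apply daylight saving rules.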
I ended up retrieving the result of a Google search, since I did not have the latitude/longitude.
I used the HTML Agility Pack to extract the page contents and filtered for the nodes that contained "Time". Simply enough, the first item was the needed result.
If you google "time cincinnati oh", you get back "1:41pm Friday (EDT) - Time in Cincinnati, OH" at the top of the page, and this code block extracts that. The safety check covers the case where the time cannot be determined. In that case the search page only shows the results, so the first item in the array is something like "Showing results for "yourSearch"", and the code maps that to "Undetermined".
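public void timeZoneUpdate()
{
try
{
arrayToParse.Clear();
string URL = @"https://www.google.com/search?q=time+" + rowCity + "%2C+" + rowState;
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(URL);
myRequest.Method = "GET";
string result;
// Dispose the response and reader even if reading fails
using (WebResponse myResponse = myRequest.GetResponse())
using (StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8))
{
result = sr.ReadToEnd();
}
HtmlAgilityPack.HtmlDocument htmlSnippet = new HtmlAgilityPack.HtmlDocument();
htmlSnippet.LoadHtml(result);
bool foundSection = false;
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlAgilityPack.HtmlNodeCollection tables = htmlSnippet.DocumentNode.SelectNodes("//table");
if (tables != null)
{
foreach (HtmlAgilityPack.HtmlNode table in tables)
{
HtmlAgilityPack.HtmlNodeCollection rows = table.SelectNodes("tr");
if (rows == null) continue;
foreach (HtmlAgilityPack.HtmlNode row in rows)
{
HtmlAgilityPack.HtmlNodeCollection cells = row.SelectNodes("td");
if (cells == null) continue;
foreach (HtmlAgilityPack.HtmlNode cell in cells)
{
// Start collecting at the first cell that mentions "Time"
if (cell.InnerText.Contains("Time"))
{
foundSection = true;
}
if (foundSection)
{
arrayToParse.Add(cell.InnerText);
}
}
}
}
}
// "1:41pm Friday (EDT) - Time in Cincinnati, OH" -> "1:41pm Friday (EDT)"
retrievedTimeZone = arrayToParse.Count > 0
? arrayToParse[0].ToString().Split('-')[0].Trim()
: "Undetermined";
// No time box: the first cell is something like "Showing results for ..."
if (retrievedTimeZone.Contains("Showing"))
{
retrievedTimeZone = "Undetermined";
}
}
catch (WebException)
{
// Network or HTTP error: treat the time as unknown
retrievedTimeZone = "Undetermined";
}
}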
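The `Split('-')` plus `"Showing"` check works for the snippet format described above. A slightly stricter alternative is to match the snippet's shape with a regular expression and fall back to "Undetermined" for anything else. This sketch assumes the snippet always starts with the time, the weekday, and a parenthesized zone abbreviation. `TimeSnippetParser` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class TimeSnippetParser
{
    // Matches text such as "1:41pm Friday (EDT)" at the start of the snippet.
    private static readonly Regex TimePattern =
        new Regex(@"^\s*(\d{1,2}:\d{2}\s*[ap]m)\s+(\w+)\s+\((\w+)\)", RegexOptions.IgnoreCase);

    // Returns "time weekday (zone)", or "Undetermined" when the snippet
    // does not look like a time result (e.g. "Showing results for ...").
    public static string Parse(string snippet)
    {
        if (string.IsNullOrEmpty(snippet))
        {
            return "Undetermined";
        }
        Match m = TimePattern.Match(snippet);
        if (!m.Success)
        {
            return "Undetermined";
        }
        return m.Groups[1].Value + " " + m.Groups[2].Value + " (" + m.Groups[3].Value + ")";
    }
}
```

Because the pattern only accepts a leading time, no separate "Showing" check is needed.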
Related
I am working on web scraping to get values from Yellow Pages, and while iterating through the pages the loop isn't picking up the page-count increment. I have added a loop, but it keeps showing data from the same page. My code is below.
static void Main(string[] args)
{
string webUrl = "https://www.yellowpages.com";
bool Loop = true;
HtmlWeb Web = new HtmlWeb();
//First Url
HtmlDocument doc = Web.Load(webUrl + "/search?search_terms=software&geo_location_terms=Los+Angeles%2C+CA");
var HeaderName = doc.DocumentNode.SelectNodes("//a[@class='business-name']").ToList();
foreach (var abc in HeaderName)
{
Console.WriteLine(abc.InnerText);
}
//Loop through different pages from the paging of that first url and then keep on doing it until Next button returns nothing
while (Loop == true)
{
var NextPageCheck = doc.DocumentNode.SelectNodes("//a[text()='Next']/@href").ToList();
if (NextPageCheck.Count != 0)
{
string link = webUrl + NextPageCheck[0].Attributes["href"].Value;
doc = Web.Load(link);
HeaderName = doc.DocumentNode.SelectNodes("//a[@class='business-name']").ToList();
foreach (var abc in HeaderName)
{
Console.WriteLine(abc.InnerText);
}
}
else
{
Loop = false;
}
}
}
So the issue I am facing is that it keeps showing the results from the 2nd page. I want it to keep iterating until there are no page numbers left. For example, if there are 400 pages in total, it should take the page URL all the way to 400:
https://www.yellowpages.com/search?search_terms=software&geo_location_terms=Los%20Angeles%2C%20CA&page=2
(i.e. increment page=2 up to page=400)
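The page URLs above differ only in the `page` query parameter. An alternative to following the "Next" link is to build each page URL directly. This sketch assumes that URL pattern holds for every page. `YellowPagesUrls.BuildPageUrl` is a hypothetical helper.

```csharp
using System;

public static class YellowPagesUrls
{
    // Assumed URL pattern, taken from the page=2 example above.
    private const string SearchUrl =
        "https://www.yellowpages.com/search?search_terms=software&geo_location_terms=Los%20Angeles%2C%20CA";

    // Builds the URL for a 1-based page number.
    public static string BuildPageUrl(int page)
    {
        if (page < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(page));
        }
        return SearchUrl + "&page=" + page;
    }
}
```

A loop would call it with page = 1, 2, 3, ... and stop at the first page whose business-name query returns no nodes. By default, HtmlAgilityPack's `SelectNodes` returns null in that case.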
While debugging your code, I got a null error on the line that looks up the business names the second time around. The version of HtmlAgilityPack I had installed returned the href values HTML-encoded, so I simply added a decoding step for the URL:
string link = webUrl + NextPageCheck[0].Attributes["href"].Value;
var urlDecode = HttpUtility.HtmlDecode(link);
doc = Web.Load(urlDecode);
And it seemed to work fine. As the comment says, next time you post it would be helpful to include the error you are getting and the line it occurs on, so it's easier and faster to track down the actual bug.
Hope this helps.
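The same fix can be written without the System.Web dependency. `System.Net.WebUtility.HtmlDecode` decodes entities such as `&amp;` in the href, and `Uri` joins the result to the site root. This sketch assumes the href is either relative or absolute. `NextLinkResolver` is a hypothetical name.

```csharp
using System;
using System.Net;

public static class NextLinkResolver
{
    // Turns a raw href attribute value into an absolute URL.
    // Hrefs taken from HTML may still contain entities such as "&amp;",
    // so decode them before requesting the page.
    public static string Resolve(string baseUrl, string rawHref)
    {
        string decoded = WebUtility.HtmlDecode(rawHref);
        // Uri handles both relative ("/search?...") and absolute hrefs.
        return new Uri(new Uri(baseUrl), decoded).AbsoluteUri;
    }
}
```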
I'm working with an existing library. Its goal is to pull text out of PDFs and verify it against expected values, so that recorded data can be quality-checked against the data in the PDF.
I'm looking for a way to succinctly pull a specific page's worth of text, given a string that should only appear on that page.
var pdfDocument = new Document(file.PdfFilePath);
var textAbsorber = new TextAbsorber{
ExtractionOptions = {
FormattingMode = TextExtractionOptions.TextFormattingMode.Pure
}
};
pdfDocument.Pages.Accept(textAbsorber);
foreach (var page in pdfDocument.Pages)
{
}
I'm stuck inside the foreach (var page in pdfDocument.Pages) portion... or is that even the right place to be looking?
Answer: recreate the TextAbsorber for each page, inside the foreach loop.
If the absorber isn't recreated, it keeps the text from previous iterations.
public List<string> ProcessPage(MyInfoClass file, string find)
{
var pdfDocument = new Document(file.PdfFilePath);
foreach (Page page in pdfDocument.Pages)
{
var textAbsorber = new TextAbsorber {
ExtractionOptions = {
FormattingMode = TextExtractionOptions.TextFormattingMode.Pure
}
};
page.Accept(textAbsorber);
var ext = textAbsorber.Text;
var exts = ext.Replace("\n", "").Split('\r').ToList();
if (ext.Contains(find))
return exts;
}
return null;
}
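Separated from the PDF library, the answer's per-page logic amounts to "return the lines of the first page whose text contains the search string." This minimal stdlib sketch assumes the text of each page has already been extracted, for example by a fresh `TextAbsorber` per page as in the answer. `PageFinder` is a hypothetical name. Unlike the answer's `Replace("\n", "").Split('\r')`, it also handles `\n`-only line endings, which that expression would merge into a single line.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PageFinder
{
    // pageTexts stands in for the text extracted from each PDF page,
    // in page order. Returns the lines of the first matching page, or null.
    public static List<string> FindPageLines(IEnumerable<string> pageTexts, string find)
    {
        foreach (string text in pageTexts)
        {
            if (text.Contains(find))
            {
                // Normalize "\r\n" and lone "\r" to "\n" before splitting.
                return text.Replace("\r\n", "\n").Replace('\r', '\n').Split('\n').ToList();
            }
        }
        return null;
    }
}
```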
I'm using Google Finance to convert one currency to another.
The code I'm using, shown below, was working fine. Today, however, I'm getting an IndexOutOfRange exception, and the indexes searched below come back as -1. That means the response does not contain the converted value, which logging the response confirmed.
I then sent the same web request from a web browser with the same parameters, inspected the page source, and the value was there.
What might the issue be? From a web browser I get the whole result, but in my app the result is missing the converted value field.
private static string CurrencyConvert(decimal amount, string fromCurrency, string toCurrency)
{
try
{
//Grab your values and build your Web Request to the API
string apiURL = String.Format("https://www.google.com/finance/converter?a={0}&from={1}&to={2}&meta={3}", amount, fromCurrency, toCurrency, Guid.NewGuid().ToString());
//Make your Web Request and grab the results
var request = WebRequest.Create(apiURL);
//Get the Response
StreamReader streamReader = new StreamReader(request.GetResponse().GetResponseStream(), System.Text.Encoding.ASCII);
//Grab your converted value (ie 2.45 USD)
String temp = streamReader.ReadToEnd().ToString();
int pFrom = temp.IndexOf("<span class=bld>") + ("<span class=bld>").Length;
int pTo = temp.LastIndexOf("</span>");
System.Windows.MessageBox.Show(pFrom.ToString() + " " + pTo.ToString());
String result = temp.Substring(pFrom, pTo - pFrom);
// string result = Regex.Matches(streamReader.ReadToEnd(), "<span class=\"?bld\"?>([^<]+)</span>")[0].Groups[1].Value;
//Get the Result
return result;
}
catch(Exception ex )
{
return "";
}
}
The problem is with the URL. Use this one instead: https://finance.google.com/finance/converter?a={0}&from={1}&to={2}&meta={3}
The meta parameter is unnecessary, so https://finance.google.com/finance/converter?a={0}&from={1}&to={2} works fine as well.
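Independently of the URL change, the question's IndexOf/Substring pair throws whenever the marker is missing from the response. This sketch uses the regular expression from the question's commented-out line and returns null instead. `ConverterResponse` is a hypothetical name, and the response markup (`<span class=bld>`) is assumed from the question.

```csharp
using System.Text.RegularExpressions;

public static class ConverterResponse
{
    // Same pattern as the commented-out line in the question.
    private static readonly Regex BoldSpan =
        new Regex("<span class=\"?bld\"?>([^<]+)</span>");

    // Returns the converted value (e.g. "2.45 USD"), or null when the
    // response does not contain it, instead of letting Substring throw.
    public static string ExtractConvertedValue(string html)
    {
        Match m = BoldSpan.Match(html ?? string.Empty);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }
}
```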
I am still learning to work with XML and C#.
I have looked in many places for how to get this working, but I have not been able to solve it yet. Can anyone see where I am going wrong?
I am trying to get lists of the distance and duration node values for two separate cases. The first should be just one pair, the total distance/duration: /DirectionsResponse/route/leg/distance/value. The second list should contain the per-step values: /DirectionsResponse/route/leg/steps/distance/value. If I can get the second one working, I can figure out the first.
Many Thanks
Jaie
public class MyNode
{
public string Distance { get; set; }
public string Duration { get; set; }
}
public class Program
{
static void Main(string[] args)
{
//The full URI
//http://maps.googleapis.com/maps/api/directions/xml?origin=Sydney+australia&destination=Melbourne+Australia&sensor=false
//refer: https://developers.google.com/maps/documentation/webservices/
string originAddress = "Canberra+Australia";
string destinationAddress = "sydney+Australia";
StringBuilder url = new StringBuilder();
//http://maps.googleapis.com/maps/api/directions/xml?
//different request format to distance API
url.Append("http://maps.googleapis.com/maps/api/directions/xml?");
url.Append(string.Format("origin={0}&", originAddress));
url.Append(string.Format("destination={0}", destinationAddress));
url.Append("&sensor=false&departure_time=1343605500&mode=driving");
WebRequest request = HttpWebRequest.Create(url.ToString());
var response = request.GetResponse();
var stream = response.GetResponseStream();
XDocument xdoc = XDocument.Load(stream);
List<MyNode> routes =
(from route in xdoc.Descendants("steps")
select new MyNode
{
Duration = route.Element("duration").Value,
Distance = route.Element("distance").Value,
}).ToList<MyNode>();
foreach (MyNode route in routes)
{
Console.WriteLine("Duration = {0}", route.Duration);
Console.WriteLine("Distance = {0}", route.Distance);
}
stream.Dispose();
}
}
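I reckon you are pretty close to where you want to be; a little debugging should get you over the line. Note that each step element in the response is named step (not steps), and the number you want sits in a nested value element under distance and duration.
Here is a small snippet I threw together in LINQPad, which I use to sketch things out before I commit them to code.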
var origin = "Canberra+Australia";
var dest = "sydney+Australia";
var baseUrl = "http://maps.googleapis.com/maps/api/directions/xml?origin={0}&destination={1}&sensor=false&departure_time=1343605500&mode=driving";
var req = string.Format(baseUrl, origin, dest);
var resp = new System.Net.WebClient().DownloadString(req);
var doc = XDocument.Parse(resp);
var total = doc.Root.Element("route").Element("leg").Element("distance").Element("value").Value;
total.Dump();
var steps = (from row in doc.Root.Element("route").Element("leg").Elements("step")
select new
{
Duration = row.Element("duration").Element("value").Value,
Distance = row.Element("distance").Element("value").Value
}).ToList();
steps.Dump();
The Dump method writes the result out to the LINQPad results pane. I got a list of 16 items in my steps results, and the total distance was 286372.
Hope this helps.
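Here is the same extraction as a reusable sketch. It returns both results the question asks for: the total distance/duration pair and one pair per step. It assumes the response layout used in the answer (`DirectionsResponse/route/leg` with `step`, `distance/value`, and `duration/value` children). `DirectionsParser` is a hypothetical name. The explicit `(int)` conversion of an `XElement` parses the element's text as an integer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DirectionsParser
{
    // Total (distance, duration) pair from /DirectionsResponse/route/leg.
    // Element() only looks at direct children, so the step values are not picked up here.
    public static (int Distance, int Duration) ParseTotal(XDocument doc)
    {
        XElement leg = doc.Root.Element("route").Element("leg");
        return ((int)leg.Element("distance").Element("value"),
                (int)leg.Element("duration").Element("value"));
    }

    // One (distance, duration) pair per /DirectionsResponse/route/leg/step.
    public static List<(int Distance, int Duration)> ParseSteps(XDocument doc)
    {
        return doc.Root.Element("route").Element("leg").Elements("step")
            .Select(step => (Distance: (int)step.Element("distance").Element("value"),
                             Duration: (int)step.Element("duration").Element("value")))
            .ToList();
    }
}
```

Pass it `XDocument.Parse(resp)` from the snippet above. Distances are in metres and durations in seconds, as in the API's value fields.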
I've been trying to get either an <object> or an <embed> tag using:
HtmlNode videoObjectNode = doc.DocumentNode.SelectSingleNode("//object");
HtmlNode videoEmbedNode = doc.DocumentNode.SelectSingleNode("//embed");
This doesn't seem to work.
Can anyone please tell me how to get these tags and their InnerHtml?
A YouTube embedded video looks like this:
<embed height="385" width="640" type="application/x-shockwave-flash"
src="http://s.ytimg.com/yt/swf/watch-vfl184368.swf" id="movie_player" flashvars="..."
allowscriptaccess="always" allowfullscreen="true" bgcolor="#000000">
I have a feeling the JavaScript might stop the SWF player from working; I hope not...
Cheers
Update 2010-08-26 (in response to OP's comment):
I think you're thinking about it the wrong way, Alex. Suppose I wrote some C# code that looked like this:
string codeBlock = "if (x == 1) Console.WriteLine(\"Hello, World!\");";
Now, if I wrote a C# parser, should it recognize the contents of the string literal above as C# code and highlight it (or whatever) as such? No, because in the context of a well-formed C# file, that text represents a string to which the codeBlock variable is being assigned.
Similarly, in the HTML on YouTube's pages, the <object> and <embed> elements are not really elements at all in the context of the current HTML document. They are the contents of string values residing within JavaScript code.
In fact, if HtmlAgilityPack did ignore this fact and attempted to recognize all portions of text that could be HTML, it still wouldn't succeed with these elements because, being inside JavaScript, they're heavily escaped with \ characters (notice the precarious Unescape method in the code I posted to get around this issue).
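To make the escaping concrete: inside the JavaScript, a closing tag such as `</object>` appears as `<\/object>`, and attribute quotes appear as `\"`. This minimal sketch undoes that with a single `Regex.Replace` and the `$1` substitution. It has the same intent as the `Unescape`/`UnescapeFromBeginning` pair in the scraper code. `JsHtmlUnescaper` is a hypothetical name. Like the original, it simply drops every backslash, so escapes such as `\n` or `\uXXXX` are not translated.

```csharp
using System.Text.RegularExpressions;

public static class JsHtmlUnescaper
{
    // Drops the backslash from every "\x" pair, so "<\/object>" becomes
    // "</object>" and "\"" becomes a plain quote. "$1" is the character
    // that followed the backslash.
    public static string Unescape(string htmlFromJavascript)
    {
        return Regex.Replace(htmlFromJavascript, @"\\(.)", "$1");
    }
}
```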
I'm not saying my hacky solution below is the right way to approach this problem; I'm just explaining why obtaining these elements isn't as straightforward as grabbing them with HtmlAgilityPack.
YouTubeScraper
OK, Alex: you asked for it, so here it is. Some truly hacky code to extract your precious <object> and <embed> elements out from that sea of JavaScript.
class YouTubeScraper
{
public HtmlNode FindObjectElement(string url)
{
HtmlNodeCollection scriptNodes = FindScriptNodes(url);
for (int i = 0; i < scriptNodes.Count; ++i)
{
HtmlNode scriptNode = scriptNodes[i];
string javascript = scriptNode.InnerHtml;
int objectNodeLocation = javascript.IndexOf("<object");
if (objectNodeLocation != -1)
{
string htmlStart = javascript.Substring(objectNodeLocation);
int objectNodeEndLocation = htmlStart.IndexOf(">\" :");
if (objectNodeEndLocation != -1)
{
string finalEscapedHtml = htmlStart.Substring(0, objectNodeEndLocation + 1);
string unescaped = Unescape(finalEscapedHtml);
var objectDoc = new HtmlDocument();
objectDoc.LoadHtml(unescaped);
HtmlNode objectNode = objectDoc.GetElementbyId("movie_player");
return objectNode;
}
}
}
return null;
}
public HtmlNode FindEmbedElement(string url)
{
HtmlNodeCollection scriptNodes = FindScriptNodes(url);
for (int i = 0; i < scriptNodes.Count; ++i)
{
HtmlNode scriptNode = scriptNodes[i];
string javascript = scriptNode.InnerHtml;
int approxEmbedNodeLocation = javascript.IndexOf("<\\/object>\" : \"<embed");
if (approxEmbedNodeLocation != -1)
{
string htmlStart = javascript.Substring(approxEmbedNodeLocation + 15);
int embedNodeEndLocation = htmlStart.IndexOf(">\";");
if (embedNodeEndLocation != -1)
{
string finalEscapedHtml = htmlStart.Substring(0, embedNodeEndLocation + 1);
string unescaped = Unescape(finalEscapedHtml);
var embedDoc = new HtmlDocument();
embedDoc.LoadHtml(unescaped);
HtmlNode videoEmbedNode = embedDoc.GetElementbyId("movie_player");
return videoEmbedNode;
}
}
}
return null;
}
protected HtmlNodeCollection FindScriptNodes(string url)
{
var doc = new HtmlDocument();
WebRequest request = WebRequest.Create(url);
using (var response = request.GetResponse())
using (var stream = response.GetResponseStream())
{
doc.Load(stream);
}
HtmlNode root = doc.DocumentNode;
HtmlNodeCollection scriptNodes = root.SelectNodes("//script");
return scriptNodes;
}
static string Unescape(string htmlFromJavascript)
{
// The JavaScript has escaped all of its HTML using backslashes. We need
// to reverse this.
// DISCLAIMER: I am a TOTAL Regex n00b; I make no claims as to the robustness
// of this code. If you could improve it, please, I beg of you to do so. Personally,
// I tested it on a grand total of three inputs. It worked for those, at least.
return Regex.Replace(htmlFromJavascript, @"\\(.)", UnescapeFromBeginning);
}
static string UnescapeFromBeginning(Match match)
{
string text = match.ToString();
if (text.StartsWith("\\"))
{
return text.Substring(1);
}
return text;
}
}
And in case you're interested, here's a little demo I threw together (super fancy, I know):
class Program
{
static void Main(string[] args)
{
var scraper = new YouTubeScraper();
HtmlNode davidAfterDentistEmbedNode = scraper.FindEmbedElement("http://www.youtube.com/watch?v=txqiwrbYGrs");
Console.WriteLine("David After Dentist:");
Console.WriteLine(davidAfterDentistEmbedNode.OuterHtml);
Console.WriteLine();
HtmlNode drunkHistoryObjectNode = scraper.FindObjectElement("http://www.youtube.com/watch?v=jL68NyCSi8o");
Console.WriteLine("Drunk History:");
Console.WriteLine(drunkHistoryObjectNode.OuterHtml);
Console.WriteLine();
HtmlNode jessicaDailyAffirmationEmbedNode = scraper.FindEmbedElement("http://www.youtube.com/watch?v=qR3rK0kZFkg");
Console.WriteLine("Jessica's Daily Affirmation:");
Console.WriteLine(jessicaDailyAffirmationEmbedNode.OuterHtml);
Console.WriteLine();
HtmlNode jazzerciseObjectNode = scraper.FindObjectElement("http://www.youtube.com/watch?v=VGOO8ZhWFR4");
Console.WriteLine("Jazzercise - Move your Boogie Body:");
Console.WriteLine(jazzerciseObjectNode.OuterHtml);
Console.WriteLine();
Console.Write("Finished! Hit Enter to quit.");
Console.ReadLine();
}
}
Original Answer
Why not try using the element's Id instead?
HtmlNode videoEmbedNode = doc.GetElementbyId("movie_player");
Update: Oh man, you're searching for HTML tags that are themselves within JavaScript? That's definitely why this isn't working. (They aren't really tags to be parsed from the perspective of HtmlAgilityPack; all of that JavaScript is really one big string inside a <script> tag.) Maybe there's some way you can parse the <script> tag's inner text itself as HTML and go from there.