I'm trying to recreate an old C# application of mine that streams from an online radio station. The problem with the old one is that it loaded an entire web page just to display a certain area of it, which takes more resources than I deem necessary. So now I'm rewriting the entire application and am looking for a way to retrieve text from the following markup on the website:
<div id="now" style="visibility: visible; display: block;">
<div class="scroll" style="margin-left: 0.000px;">
<div id="title">SONG_NAME</div>
<div id="artist">SONG_ARTIST</div>
</div>
</div>
This piece is constantly updated on the page, with the name and artist of the current song.
id="title" is the name of the song and id="artist" is the artist of the song.
I would like to retrieve the name and artist every 10 seconds or so.
Any idea what code to use for this?
You'll probably want to pull the entire page back. The main considerations are:
You could request the HTML uncompressed, open the stream using HttpWebResponse.GetResponseStream, read up to the end of the block you need (you'll need to analyse the text as you go), and finally call HttpWebResponse.Close to close the stream and release the connection.
If the entire response is compressed it may be more efficient to get the whole thing anyway before decompressing.
You need to test which is more efficient for the specific page you are scraping.
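The partial-read option above could be sketched as follows. This is a minimal sketch, assuming the markup arrives line by line and that the line containing id="artist" is the last one needed; PartialReader and ReadThrough are hypothetical names, not part of any library.

```csharp
using System.IO;
using System.Text;

public static class PartialReader
{
    // Reads lines until one contains endMarker (inclusive), then stops.
    // The rest of the reader is left unconsumed, so the caller can close
    // the underlying response without downloading the whole page.
    public static string ReadThrough(TextReader reader, string endMarker)
    {
        var buffer = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            buffer.AppendLine(line);
            if (line.Contains(endMarker))
            {
                break; // everything we need has been read
            }
        }
        return buffer.ToString();
    }
}
```

With HttpWebResponse, the reader would typically be a StreamReader wrapped around response.GetResponseStream(), and the marker something like "id=\"artist\"". Whether this actually saves anything depends on where the block sits in the page, so it is worth measuring against a full download.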
So the usual way is to retrieve the whole html stream, then use regex to find the block you need, and just keep your code simple.
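As a rough illustration of the regex approach, here is a minimal sketch that pulls the text of a div by its id out of an HTML string. It assumes the div contains plain text with no nested tags, as in the question's markup; NowPlayingParser and TryExtractDivText are hypothetical names.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class NowPlayingParser
{
    // Finds <div id="ID" ...>text</div> and returns the decoded inner text.
    // Assumes the div holds plain text only (no nested elements).
    public static bool TryExtractDivText(string html, string id, out string text)
    {
        string pattern = "<div\\s+id=\"" + Regex.Escape(id) + "\"[^>]*>(.*?)</div>";
        Match match = Regex.Match(
            html, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // HtmlDecode turns entities such as &amp; back into characters.
        text = match.Success
            ? WebUtility.HtmlDecode(match.Groups[1].Value.Trim())
            : string.Empty;
        return match.Success;
    }
}
```

Calling TryExtractDivText(html, "title", out var title) and TryExtractDivText(html, "artist", out var artist) on the downloaded page, from a timer that fires every 10 seconds, would cover the question's use case. A regex like this breaks easily if the site changes its markup.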
Recommendation
If you want to keep it really simple then look at HtmlAgilityPack, which is even on NuGet to use with Visual Studio 2012. It makes working with html scraping very simple.
Related
I need to find the input box in this HTML:
<div id="employeesDataTable_filter" class="dataTables_filter">
<label>
<input type="search" class="form-control input-sm"
placeholder="Filter..." aria-controls="employeesDataTable">
</label>
</div>
But for the life of me I cannot. Please help.
I have successfully written bags of tests and found many page elements of different types, but this one has stumped me.
I am very new to this and have tried:
By ExecutiveSearchBox = By.XPath("//input[@type='search' and class='dataTables_filter']");
You have encountered problems because you are testing the class attribute on the input node instead of on the div. Try the following selector:
//div[@class='dataTables_filter']//input[@type='search']
Also, as @Marco Forberg mentioned, it is good to use the contains() XPath function in case the element has multiple classes:
//div[contains(@class, 'dataTables_filter')]//input[@type='search']
I hope it'll help to resolve your issue :)
To find the input element in your html snippet, you simply use
FindElement( By.CssSelector( "input" ) )
But note:
The input box is not always editable as soon as the page load completes; it may take some time. If you want to send data to it, it is wise to wait until the box becomes editable.
The input box does not always appear in the DOM immediately. With modern UI frameworks like Angular, it might not be there at first, or it might be something else for a while and only later become an input field. Here too, Selenium's wait functionality is a good idea.
I always wait for the DOM state I expect, and throw only if that state has not been reached after a timeout.
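The wait-then-throw pattern described above can be sketched without any Selenium dependency. This is only an illustration: Poll.Until is a hypothetical helper, and Selenium's support library ships its own WebDriverWait class for the same purpose.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Poll
{
    // Calls probe repeatedly until it returns a non-null value.
    // Throws TimeoutException if the timeout elapses first.
    public static T Until<T>(Func<T> probe, TimeSpan timeout, TimeSpan interval)
        where T : class
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            T result = probe();
            if (result != null)
            {
                return result; // expected state reached
            }
            if (clock.Elapsed >= timeout)
            {
                throw new TimeoutException(
                    "Expected state was not reached within " + timeout + ".");
            }
            Thread.Sleep(interval); // back off before probing again
        }
    }
}
```

With Selenium, the probe could be something like () => driver.FindElements(By.XPath("//div[contains(@class, 'dataTables_filter')]//input[@type='search']")).FirstOrDefault(e => e.Displayed && e.Enabled), which needs System.Linq. FindElements returns an empty collection rather than throwing when nothing matches, so the probe returns null until the box exists and is usable.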
I'm generating reports from a .docx document using HtmlToOpenXml.
I need to ensure that a particular html block will be displayed on the same page, for example:
<p>Video provides a powerful way to help [...]</p>
<br />
<br />
<p>To make your document look professionally [...]</p>
I took a look around the web:
Open XML Table Header Same page
Create page break using OpenXml
<w:pPr><w:keepNext/></w:pPr> caught my attention, but I'm not sure that I can put two paragraphs inside a larger one.
I'm aware that the result will depend on the font, size and so on, but those will not change.
Use the "page-break-inside" style on a block surrounding the content so that the content moves to a new page as a whole. Then try to keep the other blocks small enough to fit on a page (no matter how hard you try, content that is too big won't fit on one page). For example:
<div style="page-break-inside: avoid">
<p>Video provides a powerful way to help [...]</p>
<br />
<br />
<p>To make your document look professionally [...]</p>
</div>
Take a look at documentation here: CSS page-break-before Property
I am beginning to study the very nice CMS Orchard and, after reading the basic documentation, I've stumbled on a little hurdle.
I've created a new DataType, 'SpecialOffer', which has some dataparts and some text datafields:
ProductName
PhotoURL
Price
Description
I've made a list, made a widget, customized the position.info file and the Views\Fields\Common.Text.cshtml file to change the position and the way the fields are rendered (a img for the photo, prepending € to the price and so on) but this doesn't give me the right amount of customization over the generated html.
I've installed the developer shape tracing module and created an alternate Content-SpecialOffer.cshtml file.
This gives me the opportunity to easily customize the HTML around the content, but I have no idea how to get to the single DataItem fields to display them the way I want.
I mean that the whole SpecialOffer object is displayed through
@Display(Model.Content)
and, exploring the model, I've not found a way to write something as, say (pseudocode)
<div>
the
<span class="name">@Model.ContentItem.Fields["ProductName"]</span>
camera costs
@Model.ContentItem.Fields["Price"]
euros
</div>
I've read this post on SO
Custom View For RecentBlogPosts in Orchard
but it does not solve my problem, since it uses the standard properties of a blog post.
Another little question: other than the Documentation page of the Orchard project and B. LeRoy's blog at http://weblogs.asp.net/bleroy/, where can I study Orchard?
Thanks!
Edit
I've found a way to do it:
@{
dynamic offer = Model.ContentItem.SpecialOffer;
}
<div>
the
<span class="name">@offer.ProductName.Value</span>
camera costs
@offer.Price.Value
euros
</div>
Is this the right way?
Yes, that's fine.
After the docs and the RSS feed on the home page of orchardproject.net, a major source of information (better than the previous 2) is the source code for the app and modules.
I currently have 2 JavaScript variables from which I need to retrieve values. The HTML consists of a series of nested DIVs with no id/name attributes. Is it possible to retrieve the data from these variables using HtmlAgilityPack? If so, how would I go about doing so? If not, what would be required: regular expressions? If the latter, please help me create a regular expression that would allow me to do this. Thank you.
<div style="margin: 12px 0px;" align="left">
<script type="text/javascript">
variable1 = "var1";
variable2 = "var2";
</script>
</div>
I'm assuming you are trying to scrape this information from a website, most likely one you don't have direct control over. There are several ways to do this; I'll go from easy to hard (at least as I see them):
Ask the owner of the site. Most of the time they can give you direct access to the information, and if you ask nicely, they might just let you have it for free.
You can use the WebBrowser control, run the JavaScript and then parse values from the DOM afterwards. As opposed to HttpWebRequest, this allows all the proper values to be loaded on the page and scraped.
Steal the source with Firebug. Inspect the website with Firebug to see which URLs are called in the background. Most likely, it's using an asynchronous request to retrieve the updated information from a web service. Using Firebug, you can view this under NET -> XHR. Look at the request and the values returned; you can then retrieve the values yourself and parse the contents from the source rather than scrape the page.
I think this might be the information you were looking for, but if not, let me know and I can clarify or fix the answer.
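Since the question also asks for a regular expression, here is a minimal sketch of one. It assumes every assignment has the form name = "value"; with double quotes and no escaped quotes inside the value, as in the snippet above. ScriptVariables and Parse are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScriptVariables
{
    // Matches: identifier = "value";
    // The trailing semicolon keeps HTML attributes such as type="..." from matching.
    private static readonly Regex Assignment = new Regex(
        @"\b(?<name>[A-Za-z_]\w*)\s*=\s*""(?<value>[^""]*)""\s*;");

    // Returns variable names mapped to their string values.
    // If a name is assigned more than once, the last value wins.
    public static Dictionary<string, string> Parse(string html)
    {
        var values = new Dictionary<string, string>();
        foreach (Match match in Assignment.Matches(html))
        {
            values[match.Groups["name"].Value] = match.Groups["value"].Value;
        }
        return values;
    }
}
```

HtmlAgilityPack could still be used first to isolate the script element's text, with the regex then applied only to that text. Doing so narrows the chance of a false match elsewhere in the page.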
I am writing some C# code that connects to a website, reads the HTML into my application using System.IO, and then parses it.
There is a drop-down (combo box) on this site that has 2 static values. I want my code to pick the 2nd option in the combo box and then parse the resulting HTML from the postback.
Any Ideas?
Ya the 2 selects are always the same.
Spamming software? Uh... No. It parses a video game website for player stats and I have full permission from the vendor to do so.
Yes, I agree about web services, but they don't exist. I have already written the HTML parser and it works great. However, I need to select from this drop-down to get more data.
I'd use HtmlAgilityPack and the HtmlAgilityPack.AddOns.FormProcessor for that.
Say the code looks like this:
What color is your favorite?: <br/>
<form method="post" action="post.php">
<select name="color">
<option>AliceBlue</option>
<option>AntiqueWhite</option>
<option>Aqua</option>
</select><br/>
<input type="submit" value="Submit"/>
</form>
You would want to POST to post.php the argument "color" with the value "Aqua" (or whatever select value you want).
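For illustration, the POST itself can be sketched with HttpClient instead of the FormProcessor add-on. This is a sketch under assumptions: ColorForm, BuildBody and SubmitAsync are hypothetical names, and the form's absolute URL must be supplied by the caller because the example only gives the relative action "post.php".

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class ColorForm
{
    // Builds the application/x-www-form-urlencoded body, e.g. "color=Aqua".
    public static FormUrlEncodedContent BuildBody(string color)
    {
        return new FormUrlEncodedContent(
            new Dictionary<string, string> { { "color", color } });
    }

    // Posts the selected colour to the form's action URL and returns
    // the HTML of the resulting page, ready to be parsed.
    public static async Task<string> SubmitAsync(
        HttpClient client, Uri formAction, string color)
    {
        using (var body = BuildBody(color))
        using (var response = await client.PostAsync(formAction, body))
        {
            response.EnsureSuccessStatusCode(); // throws on 4xx/5xx
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Because the option elements have no value attribute, the browser submits the option's text, so "Aqua" is the value to send. ASP.NET WebForms pages that rely on postbacks usually also expect hidden fields such as __VIEWSTATE to be echoed back, which this sketch does not handle.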