Anyone have any pointers on an easy way to consume a search.twitter.com feed with ASP.NET? I tried using the RSSToolkit, but it doesn't provide anything for parsing the twitter: namespaced elements and other tags in the feed.
For example, I want to parse this feed: http://search.twitter.com/search.atom?q=c%23 and make it appear on a page just like it does in the Twitter search results (links and all).
Depending on what sort of hammer you prefer, you could use:
- An XSL transform: easy if you know XSL, painful if you've never used it
- Load it into an XmlDocument or XPathDocument, then iterate the nodes you want (see the sketch below)
- Put it into an XmlDataSource and then bind that to a Repeater
There are many other options too; these are just some of my preferred hammers.
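For the XmlDocument route, here's a minimal sketch. The search.twitter.com endpoint is long gone, so treat the URL as illustrative; the same pattern works for any Atom feed:
using System;
using System.Xml;
XmlDocument doc = new XmlDocument();
doc.Load("http://search.twitter.com/search.atom?q=c%23"); // Load accepts a URL directly
// Atom elements live in a namespace, so register a prefix for XPath.
XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
ns.AddNamespace("atom", "http://www.w3.org/2005/Atom");
foreach (XmlNode entry in doc.SelectNodes("//atom:entry", ns))
{
    string title = entry.SelectSingleNode("atom:title", ns).InnerText;
    XmlNode link = entry.SelectSingleNode("atom:link[@rel='alternate']", ns);
    Console.WriteLine("{0} -> {1}", title, link.Attributes["href"].Value);
}
From there you could just as easily bind the entries to a Repeater instead of writing to the console.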
I want to extract some data from a website, e.g. https://www.chefkoch.de/rezepte/drucken/512261146932016/Annas-Rouladen-mit-Seidenkloessen.html: the text on the left side and the ingredients table on the right.
I tried several approaches, such as fetching the page with a WebClient and pulling the parts out with a regex, but the problem there was that if the page has more than one ingredients list, as in my example, I can't split them apart.
I also tried it with an HtmlDocument and getting the elements, but the elements don't have an id, only a class.
So is there any way to get these two things out of the website? I'm pretty new to HTML and that kind of stuff.
You should consider using a web scraping library such as https://ironsoftware.com/csharp/webscraper/ or Selenium. That way you'll be able to target HTML elements and CSS classes to extract the data.
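For instance, with the Html Agility Pack (one such library) you can select by class via XPath even when the elements have no id. A rough sketch; the class name "ingredients" is an assumption, so inspect the page source for the real one:
using System;
using HtmlAgilityPack;
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("https://www.chefkoch.de/rezepte/drucken/512261146932016/Annas-Rouladen-mit-Seidenkloessen.html");
// Elements without ids can still be targeted by class.
HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table[contains(@class, 'ingredients')]//tr");
if (rows != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode row in rows)
        Console.WriteLine(row.InnerText.Trim());
}
Since each table node is selected separately, handling more than one ingredients list is just a matter of iterating the matching tables one by one instead of regexing the whole page.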
I am trying to automate the testing of web forms. To that end I need to know how to use C# to dynamically locate input tags within the HTML page then assign values to them. I don't want to use XPath, because each time I will be using a different web form. I want to pass the web form's URL to Selenium and then automatically populate the fields. I've heard of HTMLAgilityPack. Would that help me? If so, how can I use it?
I appreciate your help.
I may have missed a crucial part of your question, however, have you looked at Selenium WebDriver?
If you write a test that handles a generic web form, you can back your test with dynamic data; that way you can cater for changes in the page by using data-driven tests. I've written tests for many pages and there are always common actions, but I handle each page differently, as there are different things on each page!
[EDIT]
Following on from your comments, I think looking into Selenium would be a good idea. The way to handle different pages is to have these element definitions ready in a 'definitions' class for each page. That way once you know what the page is, you just use the correct class for your definitions. It is best to know what elements you are going to be interacting with in your tests before the tests run. The point of automated UI testing is for a known set of actions to be performed and a correct result achieved.
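As a starting point, here is a minimal Selenium WebDriver sketch that locates every input tag on a page without hard-coded XPath and fills it from generic rules; the URL and values are placeholders:
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using (IWebDriver driver = new ChromeDriver())
{
    driver.Navigate().GoToUrl("http://example.com/form"); // placeholder URL
    // Enumerate all input tags dynamically, as the question asks.
    foreach (IWebElement input in driver.FindElements(By.TagName("input")))
    {
        string type = input.GetAttribute("type");
        if (type == "text")
            input.SendKeys("test value");
        else if (type == "checkbox" && !input.Selected)
            input.Click();
    }
}
In a data-driven setup, the values ("test value" here) would come from your test data rather than being hard-coded.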
I would suggest you look up some tutorials (such as this one), and you can see my blog, though I wrote that when I was initially learning WatiN and then replaced it with Selenium (I like it better :P).
Html Agility Pack
This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually HAVE to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to what System.Xml proposes, but for HTML documents (or streams).
// Requires the Html Agility Pack; HtmlDocument here is HtmlAgilityPack.HtmlDocument.
HtmlDocument doc = new HtmlDocument();
doc.Load(path); // path to a local HTML file; use HtmlWeb to load from a URL
foreach (HtmlNode input in doc.DocumentNode.SelectNodes("//input"))
{
    // Your code: e.g., read input.Attributes["name"].Value
}
I wish to read the two Combobox lists from this site:
http://coinmill.com/
Then I wish to recreate them in a ComboBox in C# 2010.
Thank you.
Use the Html Agility Pack to download and parse the page at coinmill. Write an XPath expression, or traverse the DOM, to find the select tag you're looking for in the Html Agility Pack document. Get the list of options and load them into your ComboBox.
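A rough sketch of that in a WinForms app; the XPath below just grabs the first select on the page, so inspect coinmill's HTML for the real element, and comboBox1 stands in for your ComboBox:
using HtmlAgilityPack;
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://coinmill.com/");
// Hypothetical selector: the first <select> element on the page.
HtmlNodeCollection options = doc.DocumentNode.SelectNodes("(//select)[1]/option");
if (options != null)
{
    foreach (HtmlNode option in options)
        comboBox1.Items.Add(option.InnerText.Trim());
}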
Mind you: this code can (and probably will) break every time coinmill updates their website. Your XPath or DOM traversal relies on the current layout of that page, and that will change. You would be better off looking for a web service that offers the data you are after.
Menno
First of all, I hope my question doesn't bother you. I really need to get an idea of how I can accomplish this, but unfortunately I'm really a beginner; I'm crawling when it comes to programming. I'm struggling to learn it the best way I can, and I'll be thankful for any help you give me.
Here's the task: I was asked to find a way to collect some data from a website using a C# application. This will be done every day, in order to update the data which we'll use to calculate a financial index.
I know my question might sound vague; even telling me how I can be more precise will help me. I know I seem desperate, but setting the personal issues apart, my scholarship kind of depends on it.
Thanks in advance! (Please don't mind the bad English; I'm Brazilian and my English might not be that good yet.)
First, your English is fine. In fact, I thought you were a native speaker until you said otherwise.
The term you're looking for is 'site scraping'. See this question: Options for HTML scraping?. The second answer points to the Html Agility Pack, a library you can use.
Now, there are two possibilities here. The first is you have to parse the HTML and scrape your data out of it. This is more computationally intensive and depends on the layout of the page. If they change the way the site looks, it could break the scraper.
The second possibility is they provide some XML or JSON web service you can consume. In this case you aren't scraping anything, but are rather using a true data feed. If the layout of the site changes, you will not break. Whether your target site supports this form of data feed is up to the site.
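For illustration, consuming a JSON feed might look like the following rough sketch using Json.NET; the URL and field name are entirely hypothetical:
using System;
using System.Net;
using Newtonsoft.Json.Linq;
using (WebClient client = new WebClient())
{
    string json = client.DownloadString("http://example.com/api/index.json"); // hypothetical feed
    JObject data = JObject.Parse(json);
    decimal value = (decimal)data["indexValue"]; // hypothetical field name
    Console.WriteLine(value);
}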
If I understand your question, you're being asked to do some Web Scraping, where you 1) download the contents of a web page and 2) try to parse data from that content.
For step #1, you should look into using a WebClient object in C# to download the HTML from the web page. You can give a WebClient object the URL you want to download the content from and obtain a String containing the content (probably HTML) of the URL.
How you go about doing step #2 depends on what content is present at the web site. If you know of certain patterns you're looking for in the HTML, you can search the HTML string using various methods. A more general solution for parsing HTML data is the Html Agility Pack, which lets you handle the HTML as a tree structure (a DOM).
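Putting the two steps together, a minimal sketch; the URL and XPath are placeholders:
using System;
using System.Net;
using HtmlAgilityPack;
// Step 1: download the page.
string html;
using (WebClient client = new WebClient())
{
    html = client.DownloadString("http://example.com"); // placeholder URL
}
// Step 2: parse the HTML into a tree and query it.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//title");
Console.WriteLine(node != null ? node.InnerText : "not found");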
1) Use the WebClient class to get the page.
2) Turn the HTML into XML (e.g., with the SgmlReader library).
3) Use XPath to select the data you are interested in.
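A minimal sketch of that pipeline; the HTML-to-XML step here uses SgmlReader, which is one option rather than the only one, and the URL is a placeholder:
using System;
using System.IO;
using System.Net;
using System.Xml;
using Sgml;
WebClient client = new WebClient();
string html = client.DownloadString("http://example.com"); // placeholder URL
// SgmlReader turns loose HTML into well-formed XML.
SgmlReader sgmlReader = new SgmlReader();
sgmlReader.DocType = "HTML";
sgmlReader.CaseFolding = CaseFolding.ToLower; // normalize tag names for XPath
sgmlReader.InputStream = new StringReader(html);
XmlDocument doc = new XmlDocument();
doc.Load(sgmlReader);
XmlNode node = doc.SelectSingleNode("//title");
Console.WriteLine(node != null ? node.InnerText : "not found");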
OK, this is a pretty straightforward app design, and a lot of the code you can reuse already exists. Since you're a beginner, I'll break what you need to do into steps and recommend approaches.
1) You will use classes from System.Net to pull the web pages (WebClient being the easiest to use). You will want this part of the program to run on a timer if you can (using the scheduled-jobs feature of the OS) and have it just pull the pages and drop them in a folder.
2) You have a second job which runs separately, pulling unread files from that folder, parsing them (the Html Agility Pack library is best for this) and then storing them in an index of some kind (Lucene is best for that).
3) You have a front-end application of some sort (web or desktop) which queries that index for the information you're looking for.
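Part 1 might look like this rough sketch; the URL and folder are placeholders, and the OS scheduler (e.g., Windows Task Scheduler) would run it on your daily timer:
using System;
using System.IO;
using System.Net;
string url = "http://example.com/data";   // placeholder: the page you need
string folder = @"C:\scrape\inbox";       // placeholder: where the parser job looks
Directory.CreateDirectory(folder);
using (WebClient client = new WebClient())
{
    string html = client.DownloadString(url);
    // Timestamped file name so the second job can process files in order.
    string file = Path.Combine(folder, DateTime.Now.ToString("yyyyMMdd-HHmmss") + ".html");
    File.WriteAllText(file, html);
}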
I'm trying to make an extended version of a WebBrowser with features like highlighting text and getting properties or attributes of elements, for a web scraper. The WebBrowser's functions don't help much at all, so if I could just find a way to go from an HtmlElement to a JavaScript element (like the one returned by document.getElementById), and back, and then add JavaScript functions to the HTML from my application, it would make the job a lot easier.
Right now I'm modifying the HTML programmatically from C# and it's very messy. I was thinking about setting some unique id on each HTML element from my program and then calling the JavaScript document.getElementById to retrieve it. But that won't work: they might already have an id assigned, and I would mess up their HTML code. I don't know if I can give them some made-up attribute like my_very_own_that_i_hope_no_web_page_on_the_world_ever_uses_attribute and then figure out if there is some JavaScript function getElementByWhateverAttributeIWant, but I'm not sure if this would work.
I read something about expando (extended) attributes in the DOM documentation on MSDN, but I'm not sure what that is about. Maybe some of you guys have a better way.
It would be much easier to use a rendering engine like Trident (MSHTML) to get the data from the HTML document. Here is the link for Trident/MSHTML; you can Google around for samples in C#.
This is not nearly as hard as you imagine. You don't have to modify the document at all.
Once the WebBrowser has loaded a page, it's kept internally as a tree with the document node at the root. This node is available to your program, and you can find any element you want (or just enumerate them all) by walking the tree.
If you can give a concrete example, I can supply some code.
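In the generic case, the walk looks something like this sketch (WinForms, assuming a WebBrowser named webBrowser1; run it after the DocumentCompleted event fires):
using System.Windows.Forms;
HtmlDocument doc = webBrowser1.Document;
foreach (HtmlElement element in doc.All) // Document.All enumerates every element in the tree
{
    if (element.TagName == "INPUT") // TagName comes back upper-case
    {
        string name = element.GetAttribute("name");
        // Inspect or modify the element here; no changes to the page's HTML needed.
    }
}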