Read data from website, parse and display data in textviews [duplicate] - c#

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What is the best way to parse html in C#?
I'm trying to write some code that uses an HttpWebRequest with the GET method (or any suggested faster function), finds a keyword on the page, and then displays what comes after it in various textviews.
The homepage it looks up will always be the same, and it will always find the same lines but with different data.
I've read a lot about something called HtmlAgilityPack, but I cannot figure out whether I can use it for this, or how.
Are there any faster functions to use to just get and find data within the source?
Can I use HtmlAgilityPack, and if so, how (example please)?
Is there any easier way this can be done?
cheersnox

Yes, you can use HtmlAgilityPack if you want to extract text from tags.
HtmlAgilityPack is an HTML parser that builds a read/write DOM from "real world" HTML files. It supports XPath and XSLT and is tolerant of malformed "real world" HTML.
In short, it uses XPath queries, which really help in extracting data quickly.
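Since the question asks for an example, here is a minimal sketch of that idea. It assumes the HtmlAgilityPack NuGet package is referenced; the class name `PageScraper`, the method `ExtractText`, and the sample markup and XPath are hypothetical and would need to be adapted to the real page.

```csharp
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class PageScraper
{
    // Returns the trimmed text of the first node matching the XPath,
    // or null when nothing on the page matches.
    public static string ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
        {
            return null;
        }

        // InnerText still contains entities such as &amp;, so decode them.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

For a page containing `<span id='temp'>21 C</span>`, calling `PageScraper.ExtractText(html, "//span[@id='temp']")` should give `21 C`, which can then be assigned to a text view. To fetch the page directly instead of passing in a string, `new HtmlWeb().Load(url)` returns an `HtmlDocument` that can be queried the same way.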

Related

Fetching images with c# htmlagilitypack [duplicate]

This question already has answers here:
Fetching google images using htmlagilitypack
(3 answers)
Closed 8 years ago.
I am trying to fetch images from sites like Google Images and Yandex (a Russian search engine).
I use XPath expressions for this. On Yandex, I am able to fetch the image thumbnails (i.e., their URLs), but I am not able to fetch the bigger image (which is possibly generated by JavaScript when one clicks on the image).
On Google Images, I am not able to fetch even the thumbnails. The XPath that I use for Google Images is:
@"//div[@class='rg_di']/img"
Can anybody help me with this?
Try this XPath instead: @"//div[@class='rg_di']//img". Note that the img tag is not directly under the div tag, as your XPath expression assumes, so you need the descendant step // instead of the child step /.
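The difference between the two steps can be checked with a small sketch. It uses `XmlDocument` from the .NET base class library as a stand-in for HtmlAgilityPack, since both evaluate XPath 1.0 expressions; the class name `XPathAxesDemo` and the sample markup are hypothetical, and the real Google Images markup may well look different.

```csharp
using System.Xml;

public static class XPathAxesDemo
{
    // Counts the nodes in the given XML that match the XPath expression.
    public static int Count(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes(xpath).Count;
    }
}
```

With the sample `<root><div class='rg_di'><a><img src='t.png'/></a></div></root>`, the expression `//div[@class='rg_di']/img` matches 0 nodes because the img is wrapped in an a element, while `//div[@class='rg_di']//img` matches 1.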

looking for a way to retrieve the information from an <a href> tags. E.g. <a href="www.facebook.com"> [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to use HTML Agility pack
I am looking for a way to retrieve the information from an <a href> tag. E.g., given <a href="www.facebook.com">, is there any way to retrieve www.facebook.com? I am using C# and I tried using HtmlAgilityPack, but I can't seem to find a method to retrieve it.
Would appreciate it very much =)
I would ultimately use JavaScript in this case: you can target the DOM object and read its "href" attribute. In code-behind, however, you will probably need to use a regular expression to extract the href.
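A minimal sketch of the regular-expression route for code-behind follows. The class name `HrefExtractor` and the pattern are illustrative assumptions; the pattern only handles quoted href values and can be fooled by unusual markup, so an HTML parser is usually the more robust choice (with HtmlAgilityPack, selecting `//a[@href]` and calling `GetAttributeValue("href", "")` on each node does the same job).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // Matches <a ... href="..."> or <a ... href='...'>, case-insensitively.
    private static readonly Regex AnchorHref = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*(?:""([^""]*)""|'([^']*)')",
        RegexOptions.IgnoreCase);

    // Returns the href value of every anchor tag found in the HTML.
    public static List<string> GetHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match m in AnchorHref.Matches(html))
        {
            // Group 1 holds double-quoted values, group 2 single-quoted ones.
            hrefs.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return hrefs;
    }
}
```

For the input `<a href="www.facebook.com">Facebook</a>`, `HrefExtractor.GetHrefs` returns a list containing the single entry `www.facebook.com`.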

C# Clean Rss Description [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How do you convert Html to plain text?
I want to show some Rss feeds in my application. But I do not want to show whole description.
I just want to show first couple of sentence of rss description.
Since lots of Rss feeds are coming as html, I want to convert to plain text and get sub string out of it.
Is there any way to html -> plain text in C#?
Thanks.
Please see filby's answer on How do you convert Html to plain text? for how to do this. You should also see Judah Himango's answer on the same question. He basically says that you should use the HtmlAgilityPack, which will do it for you.
Hope this helps!
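As a rough sketch of the HtmlAgilityPack approach, assuming the HtmlAgilityPack NuGet package is referenced: the class name `RssText` and both method names are hypothetical, and the summary is cut by character count rather than by sentence.

```csharp
using System.Text.RegularExpressions;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class RssText
{
    // Strips tags, decodes entities, and collapses runs of whitespace.
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    // Returns at most maxLength characters of plain text, with "..." appended
    // when the description had to be shortened.
    public static string Summarize(string html, int maxLength)
    {
        string text = ToPlainText(html);
        if (text.Length <= maxLength)
        {
            return text;
        }
        return text.Substring(0, maxLength).TrimEnd() + "...";
    }
}
```

For example, `RssText.Summarize(item.Description, 200)` would give a short teaser for a feed item, where `item.Description` stands for whatever property holds the HTML description in your feed model.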

Silverlight: Is it possible to syntax-colour XML in a textBox? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Silverlight XML editor / syntax highlighting
Hello,
I have some XML in my Silverlight Application that I store in a String and wish to output to the user. The xml is already "pretty printed" in the sense that it is formatted with indentations, but it would make it much clearer to read if I could also add syntax colouring to it.
Can this be done? How do I go about doing it? (please suggest a library or something)
Come to think of it, I'm not even sure if it's at all possible to output coloured text in a .NET interface...
Thank you for any insight!
(PS: I don't care which version of Silverlight)
I looked and did not find a control that would do XML syntax highlighting for a WinForms RichTextBox. This was for an XPath evaluator tool I built. The WinForms RichTextBox has the capability to display colors of course, but I couldn't find one smart enough to highlight XML syntax.
I ended up building one. The approach I used would probably work for WPF as well.
This is the explanation for how I got there:
WinForms RichTextBox : how to reformat asynchronously, without firing TextChanged event

Convert Rtf to HTML [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
We have a crystal report that we need to send out as an e-mail, but the HTML generated from the crystal report is pretty much just plain ugly and causes issues with some e-mail clients. I wanted to export it as rich text and convert that to HTML if it's possible.
Any suggestions?
I would check out this tool on CodeProject RTFConverter. This guy gives a great breakdown of how the program works along with details of the conversion.
Writing Your Own RTF Converter
There is also a sample on the MSDN Code Samples gallery called Converting between RTF and HTML which allows you to convert between HTML, RTF and XAML.
Mike Stall posted the code for one he wrote in c# here :
https://learn.microsoft.com/en-us/archive/blogs/jmstall/writing-an-rtf-to-html-converter-posting-code-in-blogs
UPDATED:
I got home and tried the code below, and it does not work. For anyone wondering, the clipboard does not just magically convert stuff like I'd hoped. Rather, it allows an application to sort of "upload" a data object with a variety of paste formats, and then when you paste (which in my metaphor would be the "download"), the program being pasted into specifies its preferred format. I personally ended up using this code, which has been recommended previously, and it was enormously easy to use and very effective. After you have imported the code (in Visual Studio, Project -> Add Existing Files), you then just go from HTML to RTF like this:
return HtmlToRtfConverter.ConvertHtmlToRtf(myHtmlString);
or the opposite direction:
return RtfToHtmlConverter.ConvertRtfToHtml(myRtfString);
(below is my previous incorrect answer, in case anyone is interested in the chronology of this answer haha)
Most if not all of the above answers provide comprehensive, often Library-based solutions to the problem at hand.
I am away from my computer and thus cannot test the idea, but one alternative, cheap and vaguely hack-y method would be the following.
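// Known not to work: the clipboard does not convert between formats (see the update above).
private string HTMLFromRtf(string rtfString)
{
    Clipboard.SetData(DataFormats.Rtf, rtfString);
    // GetData returns object, so a cast is needed for this to compile.
    return Clipboard.GetData(DataFormats.Html) as string;
}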
Again, not totally sure if this would work, but just messing around with some html on my iPhone I suspect it would. Documentation is here. More in depth explanation/docs RE the getting and setting of data models in the clipboard can be found here.
(Yes I am fully aware I'm here years later, but I assume this question is one which some people still want answered).
If you don't mind getting your hands dirty, it isn't that difficult to write an RTF to HTML converter.
Writing a general purpose RTF->HTML converter would be somewhat complicated because you would need to deal with hundreds of RTF verbs. However, in your case you are only dealing with those verbs used specifically by Crystal Reports. I'll bet the standard RTF coding generated by Crystal doesn't vary much from report to report.
I wrote an RTF to HTML converter in C++, but it only deals with basic formatting like fonts, paragraph alignments, etc. My translator basically strips out any specialized formatting that it isn't prepared to deal with. It took about 400 lines of C++. It basically scans the text for RTF tags and replaces them with equivalent HTML tags. RTF tags that aren't in my list are simply stripped out. A regex function is really helpful when writing such a converter.
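The scan-and-replace idea described above can be sketched in a few lines of C#. This is not the C++ converter mentioned in the answer, only an illustration under strong assumptions: the class name `TinyRtfToHtml` and the five-entry mapping are made up for the example, and the sketch ignores font tables, escaped characters, nested groups, and HTML escaping of the text.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TinyRtfToHtml
{
    // Control words this sketch understands; everything else is stripped.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>
        {
            { "b", "<b>" }, { "b0", "</b>" },
            { "i", "<i>" }, { "i0", "</i>" },
            { "par", "<br/>" }
        };

    public static string Convert(string rtf)
    {
        // A control word is a backslash, lowercase letters, an optional
        // numeric parameter, and an optional single space delimiter.
        string html = Regex.Replace(rtf, @"\\([a-z]+)(-?\d+)? ?", m =>
        {
            string key = m.Groups[1].Value + m.Groups[2].Value;
            string tag;
            return Map.TryGetValue(key, out tag) ? tag : string.Empty;
        });

        // Drop group braces; a real converter would track group scope instead.
        return html.Replace("{", string.Empty).Replace("}", string.Empty);
    }
}
```

For the input `{\rtf1\ansi Hello \b bold\b0\par}`, `TinyRtfToHtml.Convert` returns `Hello <b>bold</b><br/>`. Extending the dictionary with the control words Crystal Reports actually emits would be the next step.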
I think you can load it in a Word document object by using .NET office programmability support and Visual Studio tools for office.
And then use the document instance to re-save as an HTML document.
I am not sure how but I believe it is possible entirely in .NET without any 3rd party library.
I am not aware of any libraries to do this (but I am sure there are many that can) but if you can already create HTML from the crystal report why not use XSLT to clean up the markup?
You can try uploading it to Google Docs and downloading it as HTML.
