How to include all resources into one html file? - c#

Is there any C# library or free tool that can convert an HTML file with many referenced resources into a single "all-in-one" HTML file?
The main goal is to end up with only one file, which means I need to include:
JavaScript external files - this probably means replacing every 'script' tag that has a 'src' attribute with a 'script' tag whose content is read from the referenced file.
Images - replace src="picture.png" with a data URI, something like src="data:image/png;base64,encodedContent..."
CSS files
Maybe I forgot something :)
This HTML file must be readable in all browsers, that's why I cannot use MHT file format (unreadable on Safari, iPad...)

You can use HTML Agility Pack to read and write the HTML document. HTML Agility Pack supports XPath, so you can get a list of the nodes you want to modify.
Using this, changing the src attribute of image tags should be easy. You can also get a list of external JS references, read the referenced files, and then update each script tag accordingly.
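As a rough illustration of the approach described above, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package and that every reference is a relative path to a file on disk. The class and method names (HtmlInliner, InlineResources) are made up for this example. It does not follow @import or url(...) references inside CSS files, and it skips remote URLs.

```csharp
using System;
using System.IO;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

// Hypothetical helper, not part of HTML Agility Pack itself.
public static class HtmlInliner
{
    // Loads the HTML file at htmlPath, embeds local scripts, stylesheets and
    // images into the markup, and returns the resulting HTML as a string.
    public static string InlineResources(string htmlPath)
    {
        string baseDir = Path.GetDirectoryName(Path.GetFullPath(htmlPath));
        var doc = new HtmlDocument();
        doc.Load(htmlPath);

        // <script src="a.js"></script>  ->  <script>...contents of a.js...</script>
        foreach (HtmlNode script in Select(doc, "//script[@src]"))
        {
            string file = ResolveLocal(baseDir, script.GetAttributeValue("src", ""));
            if (file == null) continue;
            script.Attributes.Remove("src");
            script.RemoveAllChildren();
            script.AppendChild(doc.CreateTextNode(File.ReadAllText(file)));
        }

        // <link rel="stylesheet" href="s.css">  ->  <style>...contents of s.css...</style>
        foreach (HtmlNode link in Select(doc, "//link[@rel='stylesheet'][@href]"))
        {
            string file = ResolveLocal(baseDir, link.GetAttributeValue("href", ""));
            if (file == null) continue;
            HtmlNode style = doc.CreateElement("style");
            style.AppendChild(doc.CreateTextNode(File.ReadAllText(file)));
            link.ParentNode.ReplaceChild(style, link);
        }

        // <img src="p.png">  ->  <img src="data:image/png;base64,...">
        foreach (HtmlNode img in Select(doc, "//img[@src]"))
        {
            string file = ResolveLocal(baseDir, img.GetAttributeValue("src", ""));
            if (file == null) continue;
            string base64 = Convert.ToBase64String(File.ReadAllBytes(file));
            img.SetAttributeValue("src", "data:" + MimeType(file) + ";base64," + base64);
        }

        return doc.DocumentNode.OuterHtml;
    }

    // SelectNodes returns null when nothing matches; copy the matches into an
    // array so the document can be modified while iterating.
    private static HtmlNode[] Select(HtmlDocument doc, string xpath)
    {
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? new HtmlNode[0] : nodes.ToArray();
    }

    // Returns the full path of a local file, or null for remote URLs,
    // existing data URIs and files that cannot be found.
    private static string ResolveLocal(string baseDir, string reference)
    {
        if (reference.Length == 0 || reference.Contains("://") || reference.StartsWith("data:"))
            return null;
        string file = Path.Combine(baseDir, reference);
        return File.Exists(file) ? file : null;
    }

    // Very small extension-to-MIME-type table; extend as needed.
    private static string MimeType(string file)
    {
        switch (Path.GetExtension(file).ToLowerInvariant())
        {
            case ".png": return "image/png";
            case ".jpg":
            case ".jpeg": return "image/jpeg";
            case ".gif": return "image/gif";
            case ".svg": return "image/svg+xml";
            default: return "application/octet-stream";
        }
    }
}
```

The returned string can be written out with File.WriteAllText. A script file that itself contains the text "</script>" would still break the page, so treat this as a starting point rather than a complete tool.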

Related

View Xml in Awesomium

In Google Chrome, when you open an XML file, you get a formatted (pretty) view of the XML if there is no stylesheet referenced in the XML file itself.
I simply want to do this in my application, which uses Awesomium.
I am using the Awesomium.Windows.Forms.WebControl
I don't want to roll my own if I can avoid it.
Thanks!
I'm doing this in an internal tool for my development team. I format the XML with an XSL stylesheet that colors and indents everything, then update the web control with the resulting HTML.
Check out the link below for a stylesheet that formats XML. The CSS styles are built in, so you can change the colors and styles as you wish.
See the "XML to HTML Verbatim Formatter with Syntax Highlighting" project on this page:
http://www2.informatik.hu-berlin.de/~obecker/XSLT/
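The transform step described in this answer can be sketched with the XslCompiledTransform class from the .NET base class library. The class and method names (XmlPrettyView, ToHtml) are invented for this example, and it has not been tried against the stylesheet linked above. Handing the resulting HTML to the Awesomium control is left out.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper for turning XML into displayable HTML via XSLT.
public static class XmlPrettyView
{
    // Applies the stylesheet read from xsltSource to the given XML text
    // and returns the transformation output as a string.
    public static string ToHtml(string xml, TextReader xsltSource)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(xsltSource))
        {
            transform.Load(xsltReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            // Writing to a TextWriter lets the stylesheet's own <xsl:output>
            // settings (for example method="html") control serialization.
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

For a stylesheet stored on disk, File.OpenText(path) can supply the TextReader. The HTML string that comes back is what you would load into the web control.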

HTML Parser for local HTML files

I need to be able to parse an HTML template file (with the intention of injecting an SVG element into a html file, then converting it to pdf via wkhtmltopdf).
I know about the HTML Agility Pack, but it seems incapable of parsing local files (attempts to use file:// URIs have caused it to throw exceptions).
So, can anyone recommend a C# HTML parser for local HTML files?
HTML Agility Pack works fine with local files; check out the example in the docs.
Alternatively, load the content from the file into a string using something like File.ReadAllText then pass it into HtmlDocument.LoadHtml(string html).
How about using the HtmlDocument.LoadHtml function of HTML Agility Pack?
You could use the File.ReadAllText to read the text into memory and pass it to the LoadHtml function.
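Both answers boil down to the same few lines. Here is a minimal sketch, assuming the HtmlAgilityPack NuGet package. The names TemplateFiller and InjectSvg are made up for the example, and appending the SVG to the end of the body is an arbitrary choice. How case-sensitive SVG attribute names such as viewBox survive the round trip was not checked.

```csharp
using System.IO;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

// Hypothetical helper for the template scenario in the question.
public static class TemplateFiller
{
    // Reads the template from disk, appends svgMarkup to the end of <body>,
    // and returns the modified HTML.
    public static string InjectSvg(string templatePath, string svgMarkup)
    {
        var doc = new HtmlDocument();
        // LoadHtml takes markup rather than a URI, so no file:// handling is
        // involved. doc.Load(templatePath) with a plain path also works.
        doc.LoadHtml(File.ReadAllText(templatePath));

        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        if (body == null)
            throw new InvalidDataException("Template has no <body> element.");

        body.AppendChild(HtmlNode.CreateNode(svgMarkup));
        return doc.DocumentNode.OuterHtml;
    }
}
```

The returned string can be saved with File.WriteAllText and the resulting file passed to wkhtmltopdf.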

Parse HTML Page, include all css styles

I want to send a complete HTML page as an email and want to include all the CSS styles in the email. Is there any library that creates a single HTML page with all the CSS styles correctly included? (Consider that CSS files can import other CSS files, which also have to be opened and included.)
Any help is appreciated.
The closest I came to this was having to send a control without includes. I built it as a server control that reads the CSS files (and JS files) and writes them out. For an entire page, however, you might have more difficulty.
I do not believe that there is anything that does this. If you can read the entire page code, find-and-replace may be your easiest answer: find the CSS tag and replace its inner contents with the contents of the file it references.
You could try using this: http://martinnormark.com/move-css-inline-premailer-net
I'm in the process of testing it myself in order to generate some Word documents with inline styles. I will add the results later.
Update:
It works, although Word 2010 applies some of the inline styles as it likes. I didn't try previous versions of Word.
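For reference, a minimal sketch of calling the library from that link, assuming the PreMailer.Net NuGet package and that the page's CSS already sits in style blocks inside the HTML. How it treats linked or imported CSS files is not covered here. The wrapper class EmailHtml is made up for the example.

```csharp
using PreMailer.Net; // third-party NuGet package "PreMailer.Net"

// Hypothetical wrapper around PreMailer.Net for email-ready HTML.
public static class EmailHtml
{
    // Moves the rules found in <style> blocks onto the matching elements
    // as style="..." attributes and returns the rewritten HTML.
    public static string InlineCss(string html)
    {
        // Fully qualified because the namespace and the class share the name PreMailer.
        InlineResult result = PreMailer.Net.PreMailer.MoveCssInline(html);
        return result.Html;
    }
}
```

The InlineResult object also exposes a Warnings list, which is worth checking when a rule could not be applied.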

Using C# how do I get a list/array of all script tags (and their contents) on a webpage?

I am using HttpWebRequest to put a remote web page into a string and I want to make a list of all its script tags (and their contents) for parsing.
What is the best method to do this?
The best method is to use an HTML parser such as the HTML Agility Pack.
From the site:
It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).
Sample applications:
Page fixing or generation. You can fix a page the way you want, modify the DOM, add nodes, copy nodes, well... you name it.
Web scanners. You can easily get to img/src or a/hrefs with a bunch XPATH queries.
Web scrapers. You can easily scrap any existing web page into an RSS feed for example, with just an XSLT file serving as the binding. An example of this is provided.
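Applied to the question, that could look like the sketch below. It assumes the HtmlAgilityPack NuGet package and that the page has already been downloaded into a string. The names ScriptLister and GetScripts are made up for the example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

// Hypothetical helper that lists the <script> elements of a page.
public static class ScriptLister
{
    // Returns one pair per <script> element: Key is the src attribute
    // (empty for inline scripts), Value is the text inside the tag.
    public static List<KeyValuePair<string, string>> GetScripts(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//script");
        if (nodes == null) return result;

        foreach (HtmlNode node in nodes)
        {
            string src = node.GetAttributeValue("src", "");
            result.Add(new KeyValuePair<string, string>(src, node.InnerHtml));
        }
        return result;
    }
}
```

External scripts only carry their src here. Their contents would need a separate download per URL.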
Use an XML parser to get all the script tags with their content.
Like this one: simple xml

Parse and extract required text from text files using C#

I have some text files with some useful data wrapped in HTML tags like <td>, <span>, etc. I want to write a program that extracts the data between the tags.
The text files contain other junk data too. I would also like to store the extracted data in a SQL table. Can anyone guide me in the right direction?
Don't mention Regex and HTML in the same question on this site -- it's a sin!! ;-)
You likely want the HTML Agility Pack.
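A minimal sketch of the extraction step, assuming the HtmlAgilityPack NuGet package. The names CellTextExtractor and Extract are made up for the example, and storing the values in a SQL table is left out. A span nested inside a td is reported twice, once as part of the cell and once on its own.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

// Hypothetical helper that pulls the text out of <td> and <span> elements.
public static class CellTextExtractor
{
    // Returns the trimmed, entity-decoded text of every <td> and <span>
    // in the input, in document order, skipping empty values.
    public static List<string> Extract(string text)
    {
        var values = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(text); // tolerant parser, so surrounding junk is not a problem

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//td | //span");
        if (nodes == null) return values; // null means no match

        foreach (HtmlNode node in nodes)
        {
            string value = HtmlEntity.DeEntitize(node.InnerText).Trim();
            if (value.Length > 0) values.Add(value);
        }
        return values;
    }
}
```

Calling Extract(File.ReadAllText(path)) for each text file gives a list of strings that can then be inserted into the database with a parameterized INSERT command.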
