In Google Chrome, when you open an XML file, you get a formatted (pretty) view of the XML if the file itself references no stylesheet.
I simply want to do the same in my application, which uses Awesomium.
I am using the Awesomium.Windows.Forms.WebControl.
I don't want to roll my own if I can avoid it.
Thanks!
I'm doing this in an internal tool for my development team. I format the XML with an xsl that colors and indents everything, then update the web control with the resulting HTML.
Check out this link for formatting XML. The CSS styles are built in, so you can change the styles and colors as you wish.
See the "XML to HTML Verbatim Formatter with Syntax Highlighting" project on this page.
http://www2.informatik.hu-berlin.de/~obecker/XSLT/
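For readers who only need the indented view and not the colors, here is a minimal sketch of a simpler variant. It is not the XSLT approach described above: it re-indents the XML with Python's standard library and HTML-escapes it into a `<pre>` block. The function name `xml_to_html_view` is hypothetical, and the resulting string would still have to be handed to the web control by the host application.

```python
import html
from xml.dom import minidom


def xml_to_html_view(xml_text: str) -> str:
    """Return an HTML page that shows xml_text indented and escaped (no coloring)."""
    # Parse and re-serialize with two-space indentation.
    pretty = minidom.parseString(xml_text).toprettyxml(indent="  ")
    # Drop blank lines that can appear when the input already contained whitespace.
    lines = [line for line in pretty.splitlines() if line.strip()]
    # Escape the markup so the browser displays the tags instead of interpreting them.
    escaped = html.escape("\n".join(lines))
    return "<html><body><pre>" + escaped + "</pre></body></html>"


page = xml_to_html_view("<cars><car id='1'>Volvo</car></cars>")
```

Syntax highlighting would need extra markup around tags and attributes, which is what the XSLT stylesheet mentioned above provides.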
Context
What we need is to capture some user input (formatted text) from a WPF application and output a PDF with some stored images AND the user input on the last page.
What we've tried
We created the WPF app, added the iTextSharp library, retrieved the images from the DB and added them to the PDF. That works. For the user input, we added a RichTextBox control from the Extended WPF Toolkit, mainly because of its binding properties and formatters: we can bind the rich content of the control to a property. That binding works, and we already have the content in RTF format, for example:
"{\rtf1\ansi\ansicpg1252\uc1\htmautsp\deff2{\fonttbl{\f0\fcharset0 Times New Roman;}{\f2\fcharset0 Segoe UI;}}{\colortbl\red0\green0\blue0;\red255\green255\blue255;}\loch\hich\dbch\pard\plain\ltrpar\itap0{\lang1033\fs18\f2\cf0 \cf0\ql{\f2 {\ltrch This is the }{\b\ltrch RichTextBox}\li0\ri0\sa0\sb0\fi0\ql\par}}}"
Problem
The problem is that the PDF contains exactly the raw RTF shown above, whereas the expected output (for the example) is:
"This is the **RichTextBox**\r\n"
This obviously happens because we insert the bound RTF from the control into the PDF as-is. The question is: how can we add that content and have it interpreted as RTF?
PS. Any other working idea or solution (without a RichTextBox, or something like that) is welcome. Thanks in advance.
Unfortunately, iTextSharp no longer supports the RTF format directly. I would suggest converting the RTF fragment to XHTML first and then importing the resulting XHTML into the final document (official HTML support seems to be gone, so XHTML is the only alternative in this case).
In short, I would suggest that you:
convert the RTF fragment into XHTML;
place the XHTML stream into a new iTextSharp document (or directly into the final document, if you wish);
add the content of the aforementioned document into the target document you are going to export as PDF.
UPDATE
There is no built-in mechanism to convert from RTF to XHTML, but many open source projects exist; I would start by coupling this RTF to HTML converter with the HTML Agility Pack (which will in turn convert your HTML to XHTML).
Frankly, however, the whole flow is a bit complex to follow and I would perhaps opt for a simpler solution, maybe by using an HTML editor (alternative) directly in your project or by reverting to the FlowDocument as others have suggested.
WPF already has a good FlowDocument, and it renders well. So we created a XAML to PDF converter. It is in beta, but most strings, tables and images are converted to PDF successfully. It is an open source project available at http://xamltopdf.codeplex.com/. RTF can give you a FlowDocument, which you can convert to XAML and pass on to the XamlToPDF converter.
Is there any C# library or free tool that can convert an HTML file with many referenced resources into a single "all-in-one" HTML file?
The main task is to end up with only one file, which means I need to include:
JavaScript external files - this will probably mean replacing every 'script' tag that has a 'src' attribute with a 'script' tag whose content is read from the referenced file.
Images - replace src="picture.png" with a data URI, something like src="data:image/png;base64,encodedContent..."
CSS files
Maybe I forgot something :)
This HTML file must be readable in all browsers, which is why I cannot use the MHT file format (unreadable on Safari, iPad...)
You can use the HTML Agility Pack to read and write the HTML document. It supports XPath, so you can get a list of the nodes you want to modify.
Using this, changing the attribute value of image tags should be easy. You can also get a list of external JS references, read them, and then update each script tag accordingly.
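The image step can be sketched independently of any particular HTML library. The following is a minimal, language-neutral illustration in Python, not HTML Agility Pack code: `to_data_uri` and `inline_images` are hypothetical names, the `load` callback stands in for however the referenced files are read, and the regular expression only handles double-quoted `src` attributes. A real parser is the more robust choice for arbitrary HTML.

```python
import base64
import mimetypes
import re
from typing import Callable


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw file bytes as a data URI, guessing the MIME type from the file name."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Simplistic pattern: an <img> tag with a double-quoted src attribute.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def inline_images(html_text: str, load: Callable[[str], bytes]) -> str:
    """Replace each external <img src="..."> with an embedded data URI."""
    def replace(match: "re.Match[str]") -> str:
        src = match.group(2)
        if src.startswith("data:"):
            return match.group(0)  # already inlined, leave untouched
        return match.group(1) + to_data_uri(load(src), src) + match.group(3)

    return IMG_SRC.sub(replace, html_text)


# Usage with a fake loader that returns fixed bytes instead of reading a file.
result = inline_images('<img alt="x" src="picture.png">', lambda path: b"abc")
```

Script and stylesheet references can be handled the same way, except that the file content is placed inside the `script` or `style` element rather than base64-encoded.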
Microsoft Word interoperability classes will let you get at a property called WordOpenXML. This represents a package that will be stored - zipped up - in a .docx file and can be opened by Microsoft Word. However, is there a way to convert this Package to other formats, notably HTML?
I read in an answer to an old question that "Word 2007 has an API that you can use to convert to HTML. [...] You can find documentation around the API, but I remember that there is a convert to HTML function in the API." I'm not 100% sure which API that guy is talking about but perhaps it's System.IO.Packaging.Package or something similar. I can't seem to find any "convert to HTML function"; does anyone know how you can convert a Package format Word document into HTML?
The API in question is probably the Save method on the document; when a file type of HTML is chosen, Word transforms the document into HTML, and applies the appropriate styling.
Chances are, given that the docx format is XML, there is an XSLT transformation of some sort going on; this is just speculation, but it's not far-fetched, as XSLT is commonly used to create HTML from XML.
That said, what you are looking for probably does not reside in the Package class, nor should it. The Package class is used for creating packages of content, not for transforming that content.
However, there's nothing stopping you from providing the transformation of that content; you can get the XML that is the basis of the Word document and then apply your own XSLT which would produce the HTML that you want.
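As a rough illustration of "providing the transformation yourself", here is a minimal sketch in Python. It assumes the document is available as a zipped .docx whose main part is `word/document.xml`, and instead of XSLT it walks the XML directly and emits one `<p>` per Word paragraph. The function name `docx_to_simple_html` is hypothetical, and formatting, tables and images are ignored.

```python
import html
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_simple_html(docx_bytes: bytes) -> str:
    """Turn the paragraphs of a .docx package into bare HTML paragraphs."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W + "p"):
        # A paragraph's text is spread over one or more w:t elements inside runs.
        text = "".join(node.text or "" for node in paragraph.iter(W + "t"))
        paragraphs.append("<p>" + html.escape(text) + "</p>")
    return "<html><body>" + "".join(paragraphs) + "</body></html>"
```

An XSLT stylesheet applied to the same `word/document.xml` part would follow the same idea: match `w:p` and `w:t`, then emit the HTML you want.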
I have a .doc document. In this document I have some blanks for data. For example:
"car_id" is the best car in "car_country".
I need to open this .doc file and replace these blanks ("car_id", "car_country") with data from some object.
How can I do this?
I would use DocumentFormat.OpenXml.dll. You can find it here. OpenXMLSDKv2.msi will install the assembly; you then just need to add a reference to DocumentFormat.OpenXml. OpenXMLSDKTool.msi installs a useful tool that displays the XML structure of a document (a .docx, for example).
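To show what the replacement amounts to at the file level, here is a minimal sketch in Python that does not use the Open XML SDK. It assumes the document has first been saved as .docx (the binary .doc format is not a zip package), that the main part is the UTF-8 file `word/document.xml`, and that Word has stored each placeholder in one piece rather than splitting it across several runs. The function name `fill_placeholders` is hypothetical.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_placeholders(docx_bytes: bytes, values: dict) -> bytes:
    """Return a copy of the .docx with each placeholder name replaced by its value."""
    source = zipfile.ZipFile(io.BytesIO(docx_bytes))
    output = io.BytesIO()
    with source, zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")
                for name, value in values.items():
                    # Escape the value so characters like & or < keep the XML valid.
                    xml = xml.replace(name, escape(str(value)))
                data = xml.encode("utf-8")
            # Every other part of the package is copied unchanged.
            target.writestr(item, data)
    return output.getvalue()
```

If a placeholder was typed in several editing steps, Word may split it across runs, and a plain text replacement will miss it. Inspecting the XML with the tool mentioned above shows whether that is the case.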
This web page has some very good samples of Word automation using C#. Specifically, look at section 6 in the web page for a Mail Merge example.
http://www.c-sharpcorner.com/UploadFile/amrish_deep/WordAutomation05102007223934PM/WordAutomation.aspx
I am creating a C# application that has to create a Word document.
I'm using Microsoft.Office.Interop.Word to do this, and I've successfully managed to output some Word documents, but creating the content through code is very time-consuming.
I noticed that Word is able to open HTML pages and show them as normal content, so I created a simple test table in HTML and inserted it into the Word document. But when I output the document the obvious happened: the tags were still there! Word did not interpret the tags as HTML. It just output exactly what I put in there.
How can I tell Word to reformat the text as HTML?
edit: (through the C# code of course)
edit 2: Please note that I'm parsing through some data to make this, so I will end up with about 4 pages of the same table/HTML, and I need to be able to tell Word to start on the next page each time I've finished a loop. So an HTML-only method will probably not work.
If you're only wanting to output simple HTML content as a Word document, you could always cheat and write out the HTML content with a .doc extension.
Word will open that just fine.
If you need to add a page break, you can use a CSS page-break-before, like so:
<br style="page-break-before: always;"/>
If you're set on using Interop, having read up a little, this post states that you need a converter to insert HTML, and the converters are only accessible when you:
paste HTML from the Clipboard
open/insert HTML from a file
So, this answer looks like it provides a clipboard-based solution: Adding html text to Word using Interop
However, if there's any money to spend on the project, I can heartily recommend Aspose.Words which will do all of this for you.
As requested by the OP, and to make it easier for others to find this solution, here is the answer I posted as a comment (plus extra results from testing):
When opening an HTML file, MS Word honors the CSS properties page-break-before and page-break-after. There is a caveat, however:
In "Web design" view, page breaks are never shown (this doesn't mean that they aren't there), just as browsers don't "show" them. And Word opens HTML files in Web design view by default (which makes sense). You need to print the document or switch to some other view (typically "Print design") to see your breaks in all their glory.
So, saving an HTML file with a .doc extension is a viable solution (also tested: Word opens it properly despite the extension).
Note: all the testing was done on MS Word 2003 using this snippet: <html>asdf<br style="page-break-before: always;">new page!</html>
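For the looped-table case from the question, the HTML can be generated with a page break between tables and then written out under a .doc name. The following is a minimal sketch in Python. The names `build_word_html` and `save_as_doc` and the input shape (a list of tables, each a list of rows of cell values) are assumptions made for illustration.

```python
import html
from pathlib import Path


def build_word_html(tables: list) -> str:
    """Build one HTML document with a page break before every table except the first."""
    parts = []
    for index, rows in enumerate(tables):
        if index > 0:
            # The same break element as in the tested snippet above.
            parts.append('<br style="page-break-before: always;">')
        parts.append('<table border="1">')
        for row in rows:
            cells = "".join("<td>" + html.escape(str(cell)) + "</td>" for cell in row)
            parts.append("<tr>" + cells + "</tr>")
        parts.append("</table>")
    return "<html><body>" + "".join(parts) + "</body></html>"


def save_as_doc(path: str, tables: list) -> None:
    """Write the HTML under a .doc file name so that Word opens it directly."""
    Path(path).write_text(build_word_html(tables), encoding="utf-8")


document = build_word_html([[["a", "b"]], [["c", "d"]]])
```

As noted above, the breaks only become visible in Word after switching away from the Web view or printing.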
Don't build the document in code. Create it in Word as a template or mail merge template, and then use code to merge or replace the field data.
See this answer here
MS Word Office Automation - Filling Text Form Fields And Check Box Form Fields And Mail Merge
And see this from the mothership:
http://msdn.microsoft.com/en-us/library/ff433638.aspx
If you don't want to use an external library, Interop is too slow for you, and neither pure HTML nor a mail merge template is flexible enough, you could write your content as text or HTML into one or more files (using C#). Then create a VBA macro in a Word document that creates a second Word document, reads the content files, and applies any formatting you want afterwards.
You can run this macro programmatically by starting Word using the command line switch /m.
Another possible approach: if your HTML is XHTML (i.e. XML compliant), you could use XSLT to convert it to a Word XML format. But this would take a LOOOOOOOOOOONG time to code.
If you don't have to use HTML as the starting point you could simply build the Word XML document yourself rather than using XSLT, which would be easier. Time consuming but possible - it's something I do quite a lot in my work.
If a third party component is an option I would recommend the stuff from Aspose.
I have been pretty happy with their tools so far. The API is a little messy but everything works as one would expect.