How to parse HTML text and add it to MigraDoc Document


I need to get a text that's being written by a user (in CKEditor HTML), and then add that text to a MigraDoc document, as a paragraph or whatever I need it to be.
My idea was to convert the text to an MDDL document (in memory) and add it to the document. But I don't know if there are any DLLs that permit that behaviour.
So, my question is, can someone give me pointers or advice on how I could make this happen? Should I parse the HTML text? If so, what should I parse it into? How can I add it afterwards?

Neither PDFsharp nor MigraDoc can parse HTML, so either write your own code or try to find a third-party library (which may not exist yet).
I would probably convert the HTML directly to MigraDoc document objects in memory.

MigraDoc / PDFsharp can't do this.
But you could use the HtmlAgilityPack NuGet package: use its htmlDoc.DocumentNode.Descendants() to pull the pieces of text out of the HTML as a flat list, and node.ParentNode.Name to figure out which tag each piece of text is wrapped in. Then insert the text into your MigraDoc document with something like .AddFormattedText() and apply custom MigraDoc styles to it, i.e. if the parent tag is "strong" then apply a MigraDoc style where Font.Bold = true, if it is "em" then one where Font.Italic = true, etc.
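A minimal sketch of that approach, assuming the HtmlAgilityPack and MigraDoc NuGet packages are referenced. The class name `HtmlToMigraDoc` and the method `AddHtml` are hypothetical names chosen for illustration. It checks all enclosing tags rather than only the direct parent, so nested markup such as `<strong><u>...</u></strong>` keeps both formats, and it only handles a handful of inline tags.

```csharp
using System.Linq;
using HtmlAgilityPack;                  // NuGet package: HtmlAgilityPack
using MigraDoc.DocumentObjectModel;     // NuGet package: MigraDoc (assumed to be referenced)

public static class HtmlToMigraDoc     // hypothetical helper, not part of MigraDoc
{
    // Appends the text of an HTML fragment to a single new MigraDoc paragraph,
    // mapping a few inline tags to character formatting.
    public static Paragraph AddHtml(Section section, string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        Paragraph paragraph = section.AddParagraph();

        // Descendants() walks every node in document order; keep text nodes only.
        foreach (HtmlNode node in htmlDoc.DocumentNode.Descendants())
        {
            if (node.NodeType != HtmlNodeType.Text)
                continue;

            string text = HtmlEntity.DeEntitize(node.InnerText);
            if (string.IsNullOrEmpty(text))
                continue;

            // Names of all enclosing tags (HtmlAgilityPack reports them in lower case).
            var tags = node.Ancestors().Select(a => a.Name).ToList();

            FormattedText run = paragraph.AddFormattedText(text);
            if (tags.Contains("strong") || tags.Contains("b")) run.Bold = true;
            if (tags.Contains("em") || tags.Contains("i")) run.Italic = true;
            if (tags.Contains("u")) run.Underline = Underline.Single;
        }

        return paragraph;
    }
}
```

Calling `HtmlToMigraDoc.AddHtml(document.AddSection(), editorHtml)` would add one paragraph per call. Block-level tags (`<p>`, `<ul>`, `<br>`), inline `style` attributes, and whitespace between blocks are not handled here and would need their own mapping.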

Related

MigraDoc to display RichText String

I am converting our current PDF export from iTextSharp over to MigraDoc. We currently render rich text with iTextSharp from a string stored in the database, for example:
<p><strong style=\"color: rgb(230, 0, 0);\"><u>test</u></strong></p>
iTextSharp is able to pick out the elements of this and render them appropriately using (I think) Cell.AddElement(ElementListItem).
Ideally I am looking for something identical to this for MigraDoc but any help on RTF in MigraDoc would be greatly appreciated.

MigraDoc does not parse HTML.
You can use the AddFormattedText method of the Paragraph class to mix various rich formats within one paragraph, but parsing the HTML is up to you.
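As a rough illustration, here is how the sample string from the question (bold, underlined, red "test") might be built by hand once the HTML has been parsed. This is only a sketch: the class name `RichTextSample` and the surrounding plain text are made up for the example, and it assumes the MigraDoc package is referenced.

```csharp
using MigraDoc.DocumentObjectModel;     // NuGet package: MigraDoc (assumed to be referenced)

public static class RichTextSample      // hypothetical name, for illustration only
{
    // Builds by hand what
    //   <p><strong style="color: rgb(230, 0, 0);"><u>test</u></strong></p>
    // could look like as MigraDoc objects, mixed with plain text in one paragraph.
    public static Document Build()
    {
        var document = new Document();
        Section section = document.AddSection();
        Paragraph paragraph = section.AddParagraph();

        paragraph.AddText("Plain text, then ");

        // TextFormat is a flags enum, so formats can be combined.
        FormattedText run = paragraph.AddFormattedText("test", TextFormat.Bold | TextFormat.Underline);
        run.Color = Color.FromRgb(230, 0, 0);

        paragraph.AddText(" and plain text again.");
        return document;
    }
}
```

Extracting the tag names and the `rgb(...)` value from the stored string is still up to your own parsing code.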
See also:
http://pdfsharp.net/wiki/HelloMigraDoc-sample.ashx

Manipulate word document in C# and ASP.net

I want a way to manipulate a word document.
The document is a word template .DOTX file, and I need to hide and/or show specific paragraphs of the document based on conditions on my ASP.NET/C# application.
For example, if I enter both the first and last names in my form, the generated Word document will show both fields; otherwise, if the last name is not entered, it will not be shown in the generated document. I already know how to generate the document, but I don't know how to hide and/or show specific elements.

I am using Docentric Toolkit for this kind of scenario. The toolkit itself is using OpenXML in the background, so no Word installation is necessary. I suggest you take a look at the example for conditional content here.
In your ASP.NET application you prepare a data model. This model should also include a boolean field that you set to true or false depending on whether you want to show the conditional content in the document. The document template contains a conditional tag, which is used as a placeholder for the conditional content. Other tags (placeholders) can be nested in the conditional content.

How do I insert an image in a word document as footer

I need to create and insert a QR code into existing word documents using .NET.
I've done the QR generation part. The 2 things I need to accomplish are:
Inserting the QR code in the footer of an existing word document (preferably using Open XML).
Each page of the Word document has a unique QR code. This means that each footer would have to be different. (I could eliminate the footer and place the QR code as part of the body, but that would make the flow of text complicated.)
Is it possible to accomplish this?

I haven't done this, but I believe that what you will need to do is:
1. Put each page in a separate Word section (which means, in effect, that you will need to decide what your page size and layout is).
2. Create a footer containing one QR code to find out what XML Word expects, and what type of image data you need to store in the .docx (assuming that you are not attempting to store your image data externally in separate files).
3. Create a footer for each section (and ensure that the footers are not "linked to previous"), replicating the format you discovered in point (2).
4. Create a part for each QR code image, and a relationship to that part.
What I am even less sure about is whether Word will insist that you also store each image in another format (e.g. Windows Metafile or Extended metafile format). My guess is that Word will generate what it needs from your .jpg (or whatever). Or maybe you can use "AltChunks" in some useful way here.
The background to this is that if it were a .doc format document, you could have created a single footer containing a set of nested field codes that used the { PAGE } page number field to link to the correct image for each page - e.g.
{ INCLUDETEXT "c:\\myqrcodes\\qr{ PAGE }.jpg" }
or more likely, the slightly more complicated
{ PAGE \#"'{ INCLUDETEXT "c:\\myqrcodes\\qr{ PAGE }.jpg" }'" }
But if you try to save that as .docx format, even in compatibility mode, when you close and re-open, I think you will just see one image on all pages. Further, even though that approach works with .doc format, it only works if the external image files are actually there and located at absolute addresses in the file system. If they are located at relative addresses (there is a way to do that), you or the end user will probably have to update the footer field codes to get the correct results.

How to build custom sentences with OpenXML

I would like to know if there is a way to "play" with sentences in a .docx.
Here's what I need to do:
I have a paragraph in a document.
Example of my paragraph:
This is a paragraph that I need to format based on some conditions and I can't figure how to do this with openxml sdk.
End of example.
So, based on a condition that I evaluate in C#, I would like to add or remove the text. Another thing you should know is that I would like the product owner to be able to change the text of the document.
Basically, what I want to achieve is a template document that my product owner can edit at will, but in which the text might change depending on whom the document is produced for.
Thanks

You can create a template document in Word and create a content control in each place where you want text to be inserted dynamically. For each content control you create, set a unique Tag property value.
In your c# application code, you can then find a content control by its tag quite easily. After you find it, you can save its parent node, remove the content control and insert the text you want as a child paragraph element for the parent node saved earlier.
You can add content controls by using Developer tab. If you can't see it, you can use steps described here in order to show it: http://msdn.microsoft.com/en-us/library/bb608625.aspx .
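The find-and-replace step described above could look roughly like this with the Open XML SDK (NuGet package DocumentFormat.OpenXml). It is a sketch under assumptions: `ContentControlFiller` and `ReplaceBlockControl` are hypothetical names, and only block-level content controls (`SdtBlock`) in the main document body are considered.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;        // NuGet package: DocumentFormat.OpenXml
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlFiller      // hypothetical helper, for illustration only
{
    // Finds the first block-level content control whose Tag equals `tag`,
    // removes it, and puts a plain paragraph containing `text` in its place.
    // Returns false if no matching content control exists.
    public static bool ReplaceBlockControl(string path, string tag, string text)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            SdtBlock control = body.Descendants<SdtBlock>()
                .FirstOrDefault(sdt => sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value == tag);
            if (control == null)
                return false;

            // Swap the content control for a new paragraph under the same parent node.
            var paragraph = new Paragraph(new Run(new Text(text)));
            control.Parent.ReplaceChild(paragraph, control);

            doc.MainDocumentPart.Document.Save();
            return true;
        }
    }
}
```

Inline content controls (`SdtRun`) and controls located in headers or footers would need the same treatment on their respective element types and parts.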

There are some complexities involved in replacing text because text can be broken into multiple runs. If you want your owner to be able to edit the document, this is a problem you must solve. Luckily, there is sample code that you can use as part of the PowerTools for Open XML. PowerTools were written as cmdlets for PowerShell, but you can take the core C# code for your own programs. The TextReplacer.cs module should give you a good starting point. You will need some of the other modules that it depends on, like PtOpenXmlDocument.cs, PtOpenXmlUtil.cs and PtUtil.cs. I hope that helps.

read word file with text and picture

I have a Word (Office) file that contains text and pictures.
How can I read this file and show it in a <textarea> </textarea>?

The best way to display rich content such as a Word document in a UI is through HTML. You can export your Word document to HTML and render it in ASP.NET UI controls. If you prefer a textarea, you will have to implement a custom textarea that supports the images from the Word-generated HTML file.
Alternatively, you can use a WebBrowser control to display the Word-generated HTML file instead of a textarea.

I don't believe you can read this into a <textarea> as you ask. (I will watch to see if somebody else shows how, because I want to see that too...) I believe the closest result you will get is to open the document in an <iframe> with the application/msword content type. If you are looking for some flexibility in this, wrap the space in a <div> and swap out the <textarea> for an <iframe> at the server when appropriate.
