How to put text with headings on clipboard? - c#

If I copy text (using the cursor etc. I don't mean programmatically) from a web page or Word document, and paste it in a Word document - Word knows what text is a heading, and what is simple text. I want to do the same thing (programmatically) - put text on the clipboard and specify that part of it is heading1, part heading2... and part is simple text.
I found this class to put html text (which can have headings) on the clipboard, but was wondering:
a) That's from January 2007. Perhaps there's a simpler way now.
b) HTML only allows up to 6 heading levels. (I actually tried h7 but Word didn't recognize it.) Perhaps there's some way to have unlimited heading levels like Word does.

I don't think clipboard handling has been updated in recent versions of the .NET Framework.
For more complex changes, such as adding content to a Word document, OLE Automation or the Open XML SDK may be a better fit.
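For reference, the HTML clipboard format that the question refers to is plain text with a small header of byte offsets in front of the markup. Below is a minimal sketch of building that payload in C#. The class and method names (CfHtml, Build) are made up for illustration, and the offsets are computed on the assumption that the clipboard data is written as UTF-8.

```csharp
using System;
using System.Text;

public static class CfHtml
{
    // Each offset is padded to 10 digits so the header length is fixed
    // before the real values are known.
    private const string Header =
        "Version:0.9\r\n" +
        "StartHTML:{0:D10}\r\n" +
        "EndHTML:{1:D10}\r\n" +
        "StartFragment:{2:D10}\r\n" +
        "EndFragment:{3:D10}\r\n";

    private const string Prefix = "<html><body><!--StartFragment-->";
    private const string Suffix = "<!--EndFragment--></body></html>";

    // Wraps an HTML fragment (for example "<h1>Title</h1><p>Text</p>")
    // in the header and markers of the HTML clipboard format.
    public static string Build(string fragment)
    {
        Encoding utf8 = Encoding.UTF8;

        // Offsets are byte positions counted from the start of the payload.
        int startHtml = utf8.GetByteCount(string.Format(Header, 0, 0, 0, 0));
        int startFragment = startHtml + utf8.GetByteCount(Prefix);
        int endFragment = startFragment + utf8.GetByteCount(fragment);
        int endHtml = endFragment + utf8.GetByteCount(Suffix);

        return string.Format(Header, startHtml, endHtml, startFragment, endFragment)
               + Prefix + fragment + Suffix;
    }
}
```

On an STA thread the result could then be passed to System.Windows.Forms.Clipboard.SetText(payload, TextDataFormat.Html). Headings are still limited to h1 through h6, as noted in the question.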

Related

Can I add some hidden text with paragraphs in MSWord

I am developing a MS Word add-in using c#. The code processes some very large input file and converts it to a word document. I am able to generate data and store it in heading 1, 2, ... normal etc styles.
Now I also want to store some hidden text (containing some self-generated information) with each of these paragraphs. This hidden text should by no means be visible to someone who opens this file in MSWord. But this hidden text must be retrievable to convert this word file back to the original file. Is it possible to insert some hidden fields with each paragraph (and I want to have multiple such fields - say about 4 to 5).
I am using OpenXml to create the word file.
I am very anxiously looking for some solution. Or if it is not possible to do so, then I must look into some altogether different solution.

Embed word document into another WITHOUT icon

How can I embed a Word document into another Word document via the Open XML SDK so that it shows the content rather than a Word icon? That is, the same as doing it manually in Word: Insert object from file, WITHOUT checking "Display as icon".
I've found this article, but it uses an icon. I've also tried the Open XML SDK Productivity Tool, but it shows only generated binary data.
EDITED:
I use the following code:
DrawAspect = OleDrawAspectValues.Content
and then I add the image part:
var imagePart = mainDocumentPart.AddNewPart<ImagePart>("image/x-emf", imagePartId);
GenerateImagePart(imagePart);
But my image part is just the byte array of Word's icon.
So the following happens: when I open the generated document, it shows the embedded document as an icon, but when I double-click the embedded document, edit it and save the changes, the embedded document is shown as content. So maybe it's possible to show this content without editing the embedded document? Should I use a byte array of a screenshot of the document instead of the byte array of Word's icon?
I'm not sure I described it clearly, so please ask.
I'm afraid what you are asking for is almost impossible.
The only difference as far as the word file is concerned between the icon and the embedded file, is the image.
When you don't use an icon, Word pretty much just takes a screenshot of the document you are embedding and inserts that in place of the icon graphic.
I've uploaded an example I grabbed from a Word file I made. Found this little gem in the /media folder inside the .docx file.
So basically, if you can't live with the icon, your only choice is to somehow grab a picture of the Word file you want to embed and insert that instead of the icon image.
How you'd go about that can't be pretty. First of all, the Open XML SDK contains no such functionality. I tried playing around a bit with Office interop as well, but no luck.
I only see two possible ways to achieve this.
The first one is via Interop. You'll need to install a "pretend printer" like the ones that print to PDF instead of sending the job to a printer. This one, however, needs to print to an image format. The format of the file in the media folder was .emf, but I'm not positive that's a requirement.
Anyway, should the above somehow be possible, you could embed that picture, pretty much using the example you link from Microsoft, and just change the size of the "icon", which now would be an image of the document.
The second possibility would be to open the Word document as a process, set the zoom to 72% (or whatever makes the document the only thing on screen on your desktop), then grab a print screen, crop it down to just the document, and then use that as your image for the embedding.
For the record, I don't recommend you do any of the above, but those are the only options I see.
Should someone have a better solution to this I'm all ears.
Finally, should you decide that you want to push on with this, I'll be happy to code up an example of option number 2 if you reply and tell me you'd like that.
Kaspar
There is a nice wrapper API (Document Builder 2.2) around open xml specially designed to merge documents, with flexibility of choosing the paragraphs to merge etc. You can download it from here.
Using this tool you can embed a paragraph of another word document or entire word document as per your requirement.
The documentation and screen casts on how to use it are here.
Hope this helps.

Use iTextsharp to edit pdf template without Acrofields

I have a PDF template without AcroFields and I need to replace text in it. The text is formatted like this: ((aFieldToReplace)), but there are also tables that need to be filled with a variable number of rows.
Is there any good tutorial, resource or sample to find?
Is there a way to replace a text in a PDF file with itextsharp? has more or less the same question but the answer ignores the "no Acrofield" part of the question.
EDIT:
To make it even harder, I have multiple templates that I can use. The templates all have their own formatting style (font, color, ...).
EDIT 2:
The purpose is to create a report with some data in a database. The data in a database is coming from several forms in a ASP.NET MVC application.
The report could have several layouts depending on the chosen template.
Templates should be addable dynamically, so i can't create the layout from scratch. I really need to get the layout from a template.
Quoting the excellent iText in Action:
In a PDF document, every character or glyph on a PDF page has its fixed position, regardless of the application that’s used to view the document.
[…]
Suppose you want to replace the word “edit” with the word “manipulate” in a sentence, you’d have to reflow the text. You’d have to reposition all the characters that follow that word. Maybe you’d even have to move a portion of the text to the next page. That’s not trivial, if not impossible.
[…]
Don’t expect any tool to be able to edit a PDF file the same way you’d edit a Word document.
PDF is a document display format. If you want templating you'll probably have to use something else.
@Frederiek:
If you can spend a bit of money, this will do exactly what you want. Check out the demo, it's quite cool. It can reflow the text, replace images, etc. Quite nice.
http://www.iceni.com/infixServer.htm
Let me know if that works for you.

How to create a word document using html written in C#

I'm creating a C# application that has to create a Word document.
I'm using Microsoft.Office.Interop.Word to do this, and I've successfully managed to output some Word documents, but creating the content through code is very time-consuming work.
I noticed that Word is able to open HTML pages and show them as normal content, so I created a simple test table in HTML and inserted it into the Word document. But when I output the document, the obvious happened: the tags were still there! Word did not interpret the tags as HTML. It just output exactly what I put in there.
How can I tell Word to reformat the text as HTML?
edit: (through the C# code, of course)
edit 2: Please note that I'm parsing through some data to make this, so I will end up with about 4 pages of the same table/HTML, and I will need to be able to tell Word to start a new page each time I've finished a loop. So an HTML-only method will probably not work.
If you're only wanting to output simple HTML content as a Word document, you could always cheat and write out the HTML content with a .doc extension.
Word will open that just fine.
If you need to add a page break, you can use a CSS page-break-before, like so:
<br style="page-break-before: always;"/>
If you're set on using Interop, having read up a little bit, this post states that you need a converter to insert HTML, and the converters are only accessible when:
you paste HTML from the Clipboard
open/insert HTML from a file
So, this answer looks like it provides a clipboard-based solution : Adding html text to Word using Interop
However, if there's any money to spend on the project, I can heartily recommend Aspose.Words which will do all of this for you.
As requested by the OP, and to make easier for others to find this solution, here it goes the answer I posted as a comment (plus extra results from testing):
When opening an HTML file, MS Word honors the CSS properties page-break-before and page-break-after. There is a caveat, however:
In "Web design" view, page breaks are never shown (this doesn't mean that they aren't there), just like browsers don't "show" them. And Word opens HTML files in Web design view by default (which makes sense). You need to print the document or switch to some other view (typically "Print design") to see your breaks in all their glory.
So, saving an HTML file with a .doc extension is a viable solution (also tested: Word opens it properly despite the extension).
Note: all the testing was done on MS Word 2003 using this snippet: <html>asdf<br style="page-break-before: always;">new page!</html>
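As a rough sketch of the .doc-extension approach described above, the following C# helper joins several HTML fragments with the page-break element from the answer. The class and method names (HtmlDocWriter, BuildHtml) are made up for illustration, and behaviour in Word versions other than the one tested above is an assumption.

```csharp
using System.Collections.Generic;
using System.Text;

public static class HtmlDocWriter
{
    // The page-break element quoted in the answer above.
    private const string PageBreak = "<br style=\"page-break-before: always;\"/>";

    // Joins HTML fragments (for example one table per loop iteration)
    // into a single document, starting each fragment on a new page.
    public static string BuildHtml(IEnumerable<string> pages)
    {
        var sb = new StringBuilder("<html><body>");
        bool first = true;
        foreach (string page in pages)
        {
            if (!first)
            {
                sb.Append(PageBreak);
            }
            sb.Append(page);
            first = false;
        }
        sb.Append("</body></html>");
        return sb.ToString();
    }
}
```

The result can be saved with System.IO.File.WriteAllText("report.doc", html, Encoding.UTF8), where report.doc is only a placeholder file name.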
Don't build the document in code. Create it in Word as a template or mail merge template, and then use code to merge or replace the field data.
See this answer here
MS Word Office Automation - Filling Text Form Fields And Check Box Form Fields And Mail Merge
And See this from the mothership:
http://msdn.microsoft.com/en-us/library/ff433638.aspx
If you don't want to use an external lib, Interop is too slow for you and neither pure HTML nor mail merge template are flexible enough, you could write your content as text or HTML into one or more files (using C#), create a VBA macro in a Word document which by itself creates a second Word document, reads the content files and does any formatting you want afterwards.
You can run this macro programmatically by starting Word using the command line switch /m.
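A minimal sketch of launching Word with the /m switch from C# might look like the following. The helper name (WordMacroLauncher), the document path and the macro name are hypothetical, and it assumes winword.exe can be resolved by the shell on the machine in question.

```csharp
using System.Diagnostics;

public static class WordMacroLauncher
{
    // Builds the start info for: winword.exe "<document>" /m<MacroName>
    // The macro is expected to live in the document (or a loaded template).
    public static ProcessStartInfo Build(string documentPath, string macroName)
    {
        return new ProcessStartInfo
        {
            FileName = "winword.exe",
            Arguments = "\"" + documentPath + "\" /m" + macroName,
            UseShellExecute = true
        };
    }
}
```

Passing the result to Process.Start then starts Word, for example Process.Start(WordMacroLauncher.Build(@"C:\docs\builder.docm", "BuildReport")).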
Another possible approach, if your html is xhtml (i.e. XML compliant), you could use XSLT to convert it to a Word XML format. But this would take a LOOOOOOOOOOONG time to code.
If you don't have to use HTML as the starting point you could simply build the Word XML document yourself rather than using XSLT, which would be easier. Time consuming but possible - it's something I do quite a lot in my work.
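To give an idea of what building the Word XML yourself can look like, here is a small sketch using LINQ to XML. It assumes the Word 2003 XML format (the wordml namespace and the mso-application processing instruction), which may not be the format the answer had in mind. The class name WordMlBuilder is made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WordMlBuilder
{
    // Namespace of the Word 2003 XML format (an assumption, see above).
    private static readonly XNamespace W =
        "http://schemas.microsoft.com/office/word/2003/wordml";

    // Creates a document with one plain paragraph per input string.
    public static XDocument Build(IEnumerable<string> paragraphs)
    {
        return new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            // Tells Windows to open the .xml file with Word.
            new XProcessingInstruction("mso-application", "progid=\"Word.Document\""),
            new XElement(W + "wordDocument",
                new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                new XElement(W + "body",
                    paragraphs.Select(text =>
                        new XElement(W + "p",
                            new XElement(W + "r",
                                new XElement(W + "t", text)))))));
    }
}
```

Calling Build(new[] { "First", "Second" }).Save("report.xml") writes a file with two paragraphs. Tables, styles and page breaks need additional elements that are not shown here.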
If a third party component is an option I would recommend the stuff from Aspose.
I have been pretty happy with their tools so far. The API is a little messy but everything works as one would expect.

Word Automation Multiple Paste Problem

Is there a better way to paste HTML fragments into a Word document than via the clipboard from C#?
using Word = Microsoft.Office.Interop.Word;
I'm using some code that puts HTML into the clipboard:
HtmlFragment.CopyToClipboard(changedText);
I have a selection in word (from a formfield) and I do:
word.Selection.Paste();
But sometimes it just throws a COM exception. If I add
Thread.Sleep(100);
I can get it to work, but that's not ideal.
The Insert methods look like a better option but there is no Insert from HTML.
So what's the best way to insert lots of HTML fragments into Word quickly using the automation interfaces?
Edit
Some good advice in the responses but the issue turned out to be a simple <br> tag causing word to fail on paste.
For interop, instead of Selection.Paste you'll want to use Selection.PasteSpecial with a WdPasteDataType of wdPasteHTML.
If you're using the new formats of Word (i.e. 2007/2010), you could give up interop altogether and just go with WordprocessingML (using the Open XML SDK, or just free-hand it with LINQ and System.IO.Packaging). Or you could use it in conjunction with Interop if that was a need.
If you're using Open XML, you could just use altChunk to import HTML. Here's an example (which includes an example for HTML) at How to Use altChunk for Document Assembly. And another (fresh off the presses - it was released today): Importing HTML that contains Numbering using altChunk.
+1 to Otaku's comments, though, generally speaking, I've found it best to use the various Range.* functions for pasting in data rather than the Selection object or pasting through the clipboard. The main reason is that if you paste through the clipboard, you scramble whatever was on the clipboard (which might not be what the user wants to happen).
The Selection object applies across all open Word documents, which can get you in trouble in some cases. Unfortunately there are a few things that you just about can't do any other way.
And, there are some things (Like altering text at the current cursor position) that you MUST use the selection object for.
+1 to DarinH comments. Also something to note is that you can paste on any place in the document using Range without having to change the selection of the document (the cursor in the document).
Sometimes PasteAndFormat throws an Exception on freshly created Documents, check my reply here if that happens: https://stackoverflow.com/a/65796482/15001063
