Reading Rich Text from Excel Range (cells) with Office Interop

(This question was formerly titled "C# / WPF: Going from Excel Interop "Range" to WPF "FlowDocument"". I've since made progress on that front, which lets me narrow the question. I'm leaving the original question below so that existing answers still make sense.)
I'm using Office Interop to read the contents of cells in an Excel worksheet. Some of those cells contain rich text (for example, some words are italicized but not the whole cell), and I would like to capture them as RTF so I can then display them in WPF controls.
I have been able to obtain the RTF contents of cells using the clipboard API, where I use Excel Interop to copy a Range of one cell to the clipboard, and then read the clipboard, like so:
// Step 1 : retrieve the RTF from the clipboard as a string
string txt = Clipboard.GetText(TextDataFormat.Rtf);
// Step 2 : create a FlowDocument object and a TextRange object:
FlowDocument doc = new FlowDocument();
TextRange tr = new TextRange(doc.ContentStart, doc.ContentEnd);
// Step 3 : convert the clipboard string to a stream
byte[] byteArray = Encoding.ASCII.GetBytes(txt);
MemoryStream stream = new MemoryStream(byteArray);
// Step 4 : load that stream into TextRange
tr.Load(stream, DataFormats.Rtf);
If I then assign "doc" to the Document property of, say, a RichTextBox control, it'll display the content of the Excel cell with the exact same formatting as Excel does, down to colored words and font sizes.
However, this is extremely slow. It may take minutes to load a thousand cells that way, even if most are empty.
So here's my updated question: clearly Excel has a mechanism for returning the RTF content of a cell, otherwise my clipboard code couldn't work. Is there a more efficient way than the clipboard to exploit that mechanism, ideally through Interop?
Original question:
This may be an unusual question, but as I'm quite new to C#, WPF and Interop, I might be going about things the wrong way, so don't hesitate to offer a better approach. Here's what I'm trying to do:
I'm coding a WPF application that uses Office Interop to grab the contents of cells from an Excel worksheet. That content is text which may contain some formatting (for example some words are in bold, others are in italics). The application then displays that content in a "FlowDocumentScrollViewer" control on its GUI.
I want this "FlowDocumentScrollViewer" control to render the content from the Excel cell exactly as it appears in Excel, with formatting and everything.
The best I've managed so far is to display the cell's content without any formatting. Here's how this works: I use Office Interop to read a Range of cells from the worksheet and take their Value2 property, which is of type "object". Then I create a FlowDocument object out of it, like so:
FlowDocument doc = new FlowDocument();
Paragraph p = new Paragraph(new Run(Variable_containing_a_Value2.ToString()));
doc.Blocks.Add(p);
And then I store this FlowDocument into the "FlowDocumentScrollViewer" Document property.
Now since I'm using "ToString()" on the Value2 I'm not surprised that any formatting information this object might contain disappears past this point.
My problem is, I haven't been able to find a way to create that FlowDocument, from that Value2 object, that preserves formatting.
Now, I know there has to be a way to get that information through, because when I copy my Excel cell and paste it in Word, for example, then the formatting is carried through. I just don't know how.
Help me Obiwans, you're my only hope, as even Google has failed me.

It seems to me that you have at least a couple of options that will work better than just copying the cell contents as text. The Range object has Copy() and CopyPicture() methods, which you can use to have Excel copy the contents of the range to the clipboard.
The basic Copy() method should (I haven't tested it) put the contents of the cell into the clipboard in a variety of formats, including RTF. And you should be able to get the RTF and put that into the FlowDocument element.
Using RTF, you may still not get exactly the representation seen in Excel. The only way to get that is to have Excel do the rendering. In that case, you'll want the CopyPicture() method, which puts a picture of the range on the clipboard. This will be either a bitmap or a metafile, depending on the options you pass to the method call. You can then retrieve the picture from the clipboard and put it into your FlowDocument.
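The CopyPicture() route described above could look roughly like the sketch below. It is untested against a live Excel instance and rests on some assumptions: the class and method names (RangePictureSketch, ToFlowDocument) are made up for illustration, the project references the Excel interop assembly, and the code runs on an STA thread such as the WPF UI thread. Some bitmaps that Excel places on the clipboard may not come through the WPF clipboard API cleanly, so treat this as a starting point only.

```csharp
using System.Windows;
using System.Windows.Controls;
using System.Windows.Documents;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using Excel = Microsoft.Office.Interop.Excel;

static class RangePictureSketch
{
    // Hypothetical helper: asks Excel to render the range, then wraps the
    // resulting clipboard bitmap in a FlowDocument.
    // Must be called on an STA thread because it uses the clipboard.
    public static FlowDocument ToFlowDocument(Excel.Range range)
    {
        // Excel renders the range as shown on screen and copies it as a bitmap.
        range.CopyPicture(Excel.XlPictureAppearance.xlScreen,
                          Excel.XlCopyPictureFormat.xlBitmap);

        var doc = new FlowDocument();
        if (Clipboard.ContainsImage())
        {
            BitmapSource picture = Clipboard.GetImage();
            // Stretch.None keeps the picture at the size Excel rendered it.
            var image = new Image { Source = picture, Stretch = Stretch.None };
            doc.Blocks.Add(new BlockUIContainer(image));
        }
        return doc; // empty document if no image was on the clipboard
    }
}
```

The returned document can be assigned to the Document property of a FlowDocumentScrollViewer. Note that this still goes through the clipboard, so it does not remove the per-cell cost mentioned in the question.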
Depending on which applications you're looking at, e.g. Word, there's yet another, more complicated approach that they use, though I doubt it would work with FlowDocument: they present the Excel range as an OLE object. This is harder to implement, but it has the advantage of being a live representation of the original Excel document, and the user can edit the range in place in the host application.
The above should be enough to get you pointed in the right direction, so at least you know what you're looking for when you do your web searches. As stated, your question is very broad, and so the above is necessarily vague as well. Once you've decided on a particular method, have done some research and made an attempt into implementing that method, if you still have problems you can post a new question, with a good Minimal, Complete, and Verifiable code example that shows clearly what you've tried, with a detailed explanation of what specifically you're still having trouble with.

Related

MS Word C# AddIn - how to edit xml of an open word document

Thanks for coming by :)
I need to modify the XML of an MS Word Document directly, because the Word Interop's capabilities are insufficient for what I need to do.
The trick is that I have to do it from a Word Add-In and apply it to the currently open document, so I can't open/save packages (right?). In short, several dozen articles like the one below are not applicable here:
https://msdn.microsoft.com/en-us/library/aa982683%28v=office.12%29.aspx
Any help would be appreciated :)
Example problem -- Remove custom cell margins from a really, really big table in word (think 200x10) and check "Same as whole table" for each.
A lead on a solution (currenttable is the currently selected word table):
using System.Xml.Linq; // plus all the standard Word Add-In references
...
XDocument currentablexdocument = XDocument.Parse(currenttable.Range.WordOpenXML);
currentablexdocument.Descendants().Where(e =>e.Name.LocalName.Equals("tcMar")).Remove();
currenttable.Range.Delete();
currentselection.InsertXML(currentablexdocument.ToString());
Explanation:
currenttable.Range.WordOpenXML provides me with well-formed XML representation of the table, which I then interpret as an XDocument
tcMar = table cell margins. These XML elements exist only if a cell has custom margins. Deleting all such elements does exactly what I need.
currenttable.Range.Delete() deletes the old table
currentselection.InsertXML(...) inserts the modified table XML into the document with margins fixed. Pretty much instantaneous. Yay!
Problem:
Deleting and inserting the table is flaky and yields undesired results. It would be much better if I could MODIFY the xml directly. Is it possible?
Disclaimer:
Any other ideas for fixing this particular issue are welcome, but I have already tried a myriad of possible solutions:
applying a table style: rejected by the client,
looping "SendKeys" commands to automate use of the Word interface: too unreliable,
changing Table.XXXPadding, Row.XXXPadding, Column.XXXPadding: doesn't affect custom cell margins (among other issues),
looping through cells to change their Cell.XXXPadding: too slow (freezes Word for several minutes on a 200x10 table). Note that it's accessing the padding that's slow; the loop itself takes 3 seconds to traverse the whole table when implemented correctly.
Of course, I tried all of this with ScreenRefreshing = false and AllowAutoFit = false.
Somebody please help :)
Cheers!

Copy multiple (e.g. image and text) things to the Clipboard for pasting into MS office C# Winform

I want to copy multiple things to the clipboard:
Clipboard.SetImage(Image);
Clipboard.SetText(ImageCaption);
However I notice all the Clipboard.Set* functions clear the clipboard first. So in the above example you only get the text. Is there any "add" equivalent?
At the very least I would like to be able to add a bitmap and some text onto the clipboard so they can be pasted together into an application like Microsoft Office.
I cheekily tried:
[Serializable]
public class ImageAndText
{
public readonly string text;
public readonly Bitmap image;
public ImageAndText(string text, Bitmap image)
{
this.text = text;
this.image = image;
}
}
private void CopyToClipBoard()
{
Clipboard.SetData(DataFormats.Rtf , new ImageAndText(NewLayer.RecursionLog, rootLayer.GetBitmapTrueRes()));
}
But I just got the "to string" of ImageAndText, which makes sense. There must be a data format I can use to tell MS Office that it's getting an image and text? Is it secret?

Currently I have this working using RTF, but it does seem overkill for what I wanted to achieve...
I grabbed this nice little RTF library from CodeProject. Thank you, Dima, for sharing this with the world :)
I modified it by adding one extra function that writes to the clipboard (I'm sure the extra function could have been added more cleanly, but it works, if a bit hackily).
The code to copy my image and text to the clipboard became:
private void CopyDetailToClipBoard(string caption, Image image)
{
//using rtf data format, put a bunch of data onto the clipboard
RtfDocument rtf = new RtfDocument();
RtfParagraph rtfTextBlock = new RtfParagraph();
const int captionFontSize = 22;
//add main image to rtf text
RtfImage rtfImage = new RtfImage(image, RtfImageFormat.Png);
rtfTextBlock.AppendParagraph(rtfImage);
//add caption
RtfFormattedText Caption = new RtfFormattedText(caption,
RtfCharacterFormatting.Underline | RtfCharacterFormatting.Bold,
1,
captionFontSize);
rtfTextBlock.AppendParagraph(Caption);
rtf.Contents.Add(rtfTextBlock);
//write the rtf to the clipboard
RtfWriter rtfWriter = new RtfWriter();
rtfWriter.WriteToClipBoard(rtf);
}
The extra function I added to the RTF library is "WriteToClipBoard", which I added to RtfWriter.cs. It is simply a copy and paste of the Write function, but instead of writing the string to a file it writes it to the clipboard (it would be nicer if I had modified RtfDocument to simply have a "ToString" that could be used in either scenario).
//Where rtfWriter.WriteToClipBoard is a modified version of the "Write" function:
public void WriteToClipBoard(RtfDocument document)
{
_encoding = Encoding.GetEncoding((int)document.CodePage);
sb = new StringBuilder();
Reflect(document);
Clipboard.SetData(DataFormats.Rtf, sb.ToString());
}
I am still hoping there is a "lighter" solution than this, but it works for copying from my application to Word or any other RTF-aware application, and it has the nice bonus that I can do fancy formatting etc.
P.S.
A word of warning: if you're adding text to an image (or vice versa) just for the hell of it, don't. A plain bitmap on the clipboard is recognized by more applications (e.g. Paint), and RTF pasted into, say, Notepad isn't very clean :p.
So you're actually limiting the potential cross-over (see below for update) of your application's data by going the RTF route. For me, the text is more than just a "caption": it's a whole host of data explaining how the image was created (which is why the fancy formatting is quite useful, though I have simplified all that here for brevity). I ended up adding this as a "Ctrl+Shift+C" option and keeping "Ctrl+C" for putting a good old bitmap on the clipboard.
--UPDATE---
It turns out you can copy and paste RTF files into Word. So you can skip the extra write-to-clipboard part, create the file, and then add the file to the clipboard. This actually allows you to have multiple image+text items. The drawback is that it gets pasted into Word as an embedded document, but this was actually better for what I wanted.
It also turns out you can work around the application-limit issue, because you can add multiple types to the clipboard in parallel. So I currently add the data to the clipboard as both an RTF file and a bitmap. In my case, doing the "setimage" does not overwrite the "setfile".
This way, when I paste into Word, I get the RTF, with the image and the bitmap.
When I paste into Paint, I paste just the image :) (in fact, when I add the bitmap to the clipboard I use GDI to write the caption onto it).
It's actually very cool how it works for various programs. Interestingly, different Office programs have different default "choices": Word seems to pick the RTF, whereas PowerPoint picks the bitmap.
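One way to offer several representations at once in WinForms is to fill a single DataObject and hand it to Clipboard.SetDataObject. The sketch below shows that idea; it is not the code used in the answer above. The class and method names and the tiny RTF string are made up for illustration, and the caller is assumed to be on an STA thread.

```csharp
using System;
using System.Drawing;
using System.Windows.Forms;

static class ClipboardMultiFormat
{
    // Hypothetical helper: publishes one logical item in three formats so the
    // pasting application can pick whichever it understands best.
    public static void CopyImageWithCaption(Image image, string caption, string rtf)
    {
        var data = new DataObject();
        data.SetData(DataFormats.Rtf, rtf);              // rich version
        data.SetData(DataFormats.Bitmap, image);         // picture only
        data.SetData(DataFormats.UnicodeText, caption);  // plain-text fallback

        // copy: true asks for the data to stay on the clipboard after exit.
        Clipboard.SetDataObject(data, true);
    }

    [STAThread] // the WinForms clipboard requires a single-threaded apartment
    static void Main()
    {
        using (var bmp = new Bitmap(16, 16))
        {
            // Placeholder RTF; a real caller would pass the generated document.
            CopyImageWithCaption(bmp, "Caption", @"{\rtf1\ansi Caption}");
        }
    }
}
```

Because all formats live in one DataObject, nothing is overwritten by a later call, and which representation ends up in the target document is decided by the pasting application.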

Paste data from clipboard to excel in right format

In the performance analyzer, I copy data from a table to the clipboard.
Then I paste it into an Excel file.
The result is:
But when I paste it into a text editor, it simply looks like this:
Function Name                                        Inclusive Samples  Exclusive Samples  Inclusive Samples %  Exclusive Samples %
[clr.dll]                                            26                 26                 39.39                39.39
Bee.Client.Common.BeeRight.CheckRightsForBeeUser()   10                 0                  15.15                0.00
Bee.Client.Common.BeeRight.get_Invoke()              6                  0                  9.09                 0.00
Bee.Client.Common.BeeRight.Method(string,string)     13                 0                  19.70                0.00
Bee.Client.Common.Custom.FmCustom..ctor()            9                  0                  13.64                0.00
So can you tell me how I can achieve this effect?
Thanks!
Update
I'll try to explain.
I have a DataGridView in my WinForms application. I wrote a function that copies data from the table to the clipboard (the result looks like the text in my example). If I paste this text from the clipboard into Excel, I get the data, but with no formatting at all, so the sheet is hard to read.
I wonder how they prepare the data from the table (pic 1) in such a way that it keeps its formatting when pasted into Excel (pic 2), yet shows up as raw text when pasted into a text editor.

The reason copying from your data grid into Excel works well lies in how the DataGridView component implements its Copy operation and in the behavior of the application you paste the content into. The clipboard data can contain special codes, which Notepad ignores.
EDIT
So, now I understand your question better. I don't know how it works in C#, but in Java it looks like this.
Whenever there is content on the clipboard, it is offered in many variants that other applications can choose from.
Suppose I want to get the content from the clipboard. I do it like this:
Clipboard clipboard = Toolkit.getDefaultToolkit().getSystemClipboard();
Transferable contents = clipboard.getContents(null);
Now I have to decide which representation my application should use, and this is where your question begins.
If I have a picture on the clipboard, there is only one possible representation of it:
[mimetype=image/x-java-image;representationclass=java.awt.Image]
If I have some text from Notepad, there are already 27 variants:
[mimetype=application/x-java-text-encoding;representationclass=[B]
[mimetype=application/x-java-serialized-object;representationclass=java.lang.String]
[mimetype=text/plain;representationclass=java.io.Reader]
[mimetype=text/plain;representationclass=java.lang.String]
[mimetype=text/plain;representationclass=java.nio.CharBuffer]
and so on...
If I have some cells from an Excel sheet, there are 56 variants:
[mimetype=application/x-java-text-encoding;representationclass=[B]
[mimetype=text/html;representationclass=java.io.Reader]
[mimetype=text/html;representationclass=java.lang.String]
[mimetype=text/html;representationclass=java.nio.CharBuffer]
[mimetype=text/html;representationclass=[C]
and so on...
There is even an image variant for Excel cells!
[mimetype=image/x-java-image;representationclass=java.awt.Image]
That is why it is possible to copy some cells from Excel and paste them into Paint as a bitmap! It is not possible for Notepad, of course, because its developers did not want to work with this representation.
So the clipboard is not as primitive as it may seem. Each application can inspect the available variants and take the one that suits it best.
Now you can try to find the equivalent information for C# development. I'm sure you'll get it!
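For the C# side, a rough counterpart of the Java listing above is to ask the clipboard's data object for its format names. This is only a sketch: the class name is made up, and which formats appear depends entirely on the application that filled the clipboard.

```csharp
using System;
using System.Windows.Forms;

static class ClipboardFormats
{
    [STAThread] // the WinForms clipboard requires a single-threaded apartment
    static void Main()
    {
        IDataObject data = Clipboard.GetDataObject();
        if (data == null)
        {
            Console.WriteLine("The clipboard is empty or unavailable.");
            return;
        }

        // Each entry is one representation the source application offers.
        foreach (string format in data.GetFormats())
        {
            Console.WriteLine(format);
        }
    }
}
```

Running this after copying from the source application shows which formats a pasting application has to choose from, and therefore which ones your own Copy implementation would need to provide.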

Paste from Excel into C# app, retaining full precision

I have data in an Excel spreadsheet with values like this:
0.69491375
0.31220394
The cells are formatted as Percentage, and set to display two decimal places. So they appear in Excel as:
69.49%
31.22%
I have a C# program that parses this data off the Clipboard.
var dataObj = Clipboard.GetDataObject();
var format = DataFormats.CommaSeparatedValue;
if (dataObj != null && dataObj.GetDataPresent(format))
{
var csvData = dataObj.GetData(format);
// do something
}
The problem is that csvData contains the display values from Excel, i.e. '69.49%' and '31.22%'. It does not contain the full precision of the extra decimal places.
I have tried using the various different DataFormats values, but the data only ever contains the display value from Excel, e.g.:
DataFormats.Dif
DataFormats.Rtf
DataFormats.UnicodeText
etc.
As a test, I installed LibreOffice Calc and copy/pasted the same cells from Excel into Calc. Calc retains the full precision of the raw data.
So clearly Excel puts this data somewhere that other programs can access. How can I access it from my C# application?
Edit - Next steps.
I've downloaded the LibreOffice Calc source code and will have a poke around to see if I can find out how they get the full context of the copied data from Excel.
I also did a GetFormats() call on the data object returned from the clipboard and got a list of 24 different data formats, some of which are not in the DataFormats enum. These include formats like Biff12, Biff8, Biff5, Format129 among other formats that are unfamiliar to me, so I'll investigate these and respond if I make any discoveries...

This is not a complete answer either, but here are some further insights into the problem:
When you copy a single Excel cell then what will end up in the clipboard is a complete Excel workbook which contains a single spreadsheet which in turn contains a single cell:
var dataObject = Clipboard.GetDataObject();
var mstream = (MemoryStream)dataObject.GetData("XML Spreadsheet");
// Note: For some reason we need to ignore the last byte otherwise
// an exception will occur...
mstream.SetLength(mstream.Length - 1);
var xml = XElement.Load(mstream);
Now, when you dump the content of the XElement to the console you can see that you indeed get a complete Excel Workbook. Also the "XML Spreadsheet" format contains the internal representation of the numbers stored in the cell. So I guess you could use Linq-To-Xml or similar to fetch the data you need:
XNamespace ssNs = "urn:schemas-microsoft-com:office:spreadsheet";
var numbers = xml.Descendants(ssNs + "Data").
Where(e => (string)e.Attribute(ssNs + "Type") == "Number").
Select(e => (double)e);
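To try the LINQ-to-XML query without Excel or a clipboard, the sketch below runs it against a hand-written fragment in the same namespace. The sample markup is an assumption that only mimics the general shape of the "XML Spreadsheet" format (real clipboard content contains much more), and the class and member names are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

static class XmlSpreadsheetDemo
{
    // Hand-written stand-in for the clipboard content (assumption, not real Excel output).
    public const string Sample =
@"<Workbook xmlns=""urn:schemas-microsoft-com:office:spreadsheet""
          xmlns:ss=""urn:schemas-microsoft-com:office:spreadsheet"">
  <Worksheet ss:Name=""Sheet1"">
    <Table>
      <Row><Cell><Data ss:Type=""Number"">0.69491375</Data></Cell></Row>
      <Row><Cell><Data ss:Type=""Number"">0.31220394</Data></Cell></Row>
      <Row><Cell><Data ss:Type=""String"">label</Data></Cell></Row>
    </Table>
  </Worksheet>
</Workbook>";

    // Returns the raw values of all numeric cells, skipping strings.
    public static double[] ReadNumbers(string xml)
    {
        XNamespace ssNs = "urn:schemas-microsoft-com:office:spreadsheet";
        return XElement.Parse(xml)
            .Descendants(ssNs + "Data")
            .Where(e => (string)e.Attribute(ssNs + "Type") == "Number")
            .Select(e => (double)e) // the explicit cast parses culture-independently
            .ToArray();
    }

    static void Main()
    {
        foreach (double value in ReadNumbers(Sample))
        {
            Console.WriteLine(value.ToString("R", CultureInfo.InvariantCulture));
        }
    }
}
```

The point of the exercise is that the values come from the Data elements rather than from the formatted display text, so a percentage format applied to the cell does not round them.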
I've also tried to read the Biff formats using the Excel Data Reader however the resulting DataSets always came out empty...

The BIFF formats are an open specification by Microsoft (note that I say specification, not standard). Have a read of this to get an idea of what is going on.
The BIFF entries you see correspond to Excel file formats: BIFF5 is XLS from Excel 5.0 and 95, BIFF8 is XLS from Excel 97 to 2003, and BIFF12 is XLSB, introduced with Excel 2007 (Excel 2010 produces it as well). There is some documentation here and also here (from OpenOffice) that may help you make sense of the binary data.
Anyway, some work has been done in the past to parse these documents in C++, Java, VB and, to your taste, C#. For example this BIFF12 Reader, the NExcel project, and ExcelLibrary, to cite a few.
In particular, NExcel lets you pass in a stream, which you can create from the clipboard data, and then query it for the data. If you are going to read the source code, I think ExcelLibrary is much more readable.
You can get the stream like this:
var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
var stream = (System.IO.Stream)dataobject.GetData(format);
Reading from the stream with NExcel would then be something like this:
var wb = getWorkbook(stream);
var sheet = wb.Sheets[0];
var somedata = sheet.getCell(0, 0).Contents;
I guess the actual Office libraries from Microsoft would work too.
I know this is not the whole story, so please share how it goes. I will try it myself if I get a chance.

Use PDFBox to fill out a PDF Form

I have a pdf with a form in it. I am trying to write a class that will take data from my database and automatically populate the fields in the form.
I have already tried ITextSharp and their pricing is out of my budget, even though it works perfectly fine with my pdf. I need a free pdf parser that will let me import the pdf, set the data, and save the PDF out, preferably to a stream so that I can return a Stream object from my class rather than saving the pdf to the server.
I found this pdf reader and it doesn't work. Null reference errors are abundant and when I tried to "fix" them, it still couldn't find my fields.
So, I have moved on to PdfBox, as the documentation says it can manipulate a PDF, however, I cannot find any examples. Here is the code I have so far.
var document = PDDocument.load(inputPdf);
var catalog = document.getDocumentCatalog();
var form = catalog.getAcroForm();
form.getField("MY_FIELD").setValue("Test Value");
document.save("some location on my hard drive");
document.close();
The problem is that catalog.getAcroForm() is returning a null, so I can't access the fields. Does anyone know how I can use PdfBox to alter the field values and save the thing back out?
EDIT:
I did find this example, which is pretty much what I am doing. It's just that my acroform is null in pdfbox. I know there is one there because itextsharp can pull it out just fine.

Have you tried with the 1.2.1 version?
http://pdfbox.apache.org/apidocs/overview-summary.html
