I have data in an Excel spreadsheet with values like this:
0.69491375
0.31220394
The cells are formatted as Percentage, and set to display two decimal places. So they appear in Excel as:
69.49%
31.22%
I have a C# program that parses this data off the Clipboard.
var dataObj = Clipboard.GetDataObject();
var format = DataFormats.CommaSeparatedValue;
if (dataObj != null && dataObj.GetDataPresent(format))
{
var csvData = dataObj.GetData(format);
// do something
}
The problem is that csvData contains the display values from Excel, i.e. '69.49%' and '31.22%'. It does not contain the full precision of the extra decimal places.
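To see why the copied text can't give the raw values back, here is a small sketch. It uses .NET's custom "0.00%" format as a stand-in for Excel's two-decimal Percentage format. That is an analogy, not Excel's actual formatting code.

```csharp
using System;
using System.Globalization;

// Raw cell value taken from the question.
double raw = 0.69491375;

// .NET's "0.00%" custom format multiplies by 100 and keeps two decimals,
// roughly like Excel's Percentage format with two decimal places.
string display = raw.ToString("0.00%", CultureInfo.InvariantCulture);

// Parsing the displayed text back only recovers the rounded value.
double recovered = double.Parse(display.TrimEnd('%'), CultureInfo.InvariantCulture) / 100;

Console.WriteLine(display);
Console.WriteLine(recovered == raw);
```

Whatever clipboard format carries only the display string, the digits beyond the second decimal are simply gone, so the raw value has to come from a format that stores the cell value itself.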
I have tried using the various different DataFormats values, but the data only ever contains the display value from Excel, e.g.:
DataFormats.Dif
DataFormats.Rtf
DataFormats.UnicodeText
etc.
As a test, I installed LibreOffice Calc and copy/pasted the same cells from Excel into Calc. Calc retains the full precision of the raw data.
So clearly Excel puts this data somewhere that other programs can access. How can I access it from my C# application?
Edit - Next steps.
I've downloaded the LibreOffice Calc source code and will have a poke around to see if I can find out how they get the full contents of the copied data from Excel.
I also called GetFormats() on the data object returned from the clipboard and got a list of 24 different data formats, some of which are not in the DataFormats class. These include formats like Biff12, Biff8, Biff5 and Format129, among others that are unfamiliar to me, so I'll investigate these and respond if I make any discoveries...
Not a complete answer either, but some further insight into the problem:
When you copy a single Excel cell, what ends up on the clipboard is a complete Excel workbook containing a single worksheet, which in turn contains a single cell:
var dataObject = Clipboard.GetDataObject();
var mstream = (MemoryStream)dataObject.GetData("XML Spreadsheet");
// Note: the data appears to end with an extra byte (most likely a NUL
// terminator); strip it, otherwise XElement.Load throws an exception...
mstream.SetLength(mstream.Length - 1);
var xml = XElement.Load(mstream);
If you dump the content of the XElement to the console, you can see that you do indeed get a complete Excel workbook. The "XML Spreadsheet" format also contains the internal, full-precision representation of the numbers stored in the cells, so you could use LINQ to XML or similar to fetch the data you need:
XNamespace ssNs = "urn:schemas-microsoft-com:office:spreadsheet";
var numbers = xml.Descendants(ssNs + "Data").
Where(e => (string)e.Attribute(ssNs + "Type") == "Number").
Select(e => (double)e);
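Putting the two snippets together, here is a self-contained sketch that works on the raw bytes instead of the live clipboard, so it can be tried anywhere. The helper name ReadNumbers and the trimmed sample XML are hypothetical; the sample only mimics the shape of the "XML Spreadsheet" payload and is not real Excel output. Trimming every trailing zero byte is an assumption based on the "ignore the last byte" workaround above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper: extracts the full-precision numbers from raw
// "XML Spreadsheet" clipboard bytes.
List<double> ReadNumbers(byte[] clipboardBytes)
{
    XNamespace ssNs = "urn:schemas-microsoft-com:office:spreadsheet";

    // Ignore trailing NUL bytes (assumed to be a string terminator),
    // which would otherwise make XElement.Load throw.
    int length = clipboardBytes.Length;
    while (length > 0 && clipboardBytes[length - 1] == 0)
        length--;

    using var stream = new MemoryStream(clipboardBytes, 0, length);
    XElement xml = XElement.Load(stream);

    return xml.Descendants(ssNs + "Data")
              .Where(e => (string)e.Attribute(ssNs + "Type") == "Number")
              .Select(e => (double)e) // converts via XmlConvert, so culture-invariant
              .ToList();
}

// Hand-written, heavily trimmed sample in the same shape (not real Excel output).
string sample = @"<Workbook xmlns='urn:schemas-microsoft-com:office:spreadsheet'
          xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet'>
  <Worksheet ss:Name='Sheet1'>
    <Table>
      <Row><Cell><Data ss:Type='Number'>0.69491375</Data></Cell></Row>
      <Row><Cell><Data ss:Type='Number'>0.31220394</Data></Cell></Row>
      <Row><Cell><Data ss:Type='String'>label</Data></Cell></Row>
    </Table>
  </Worksheet>
</Workbook>";

// Simulate the clipboard bytes, including a trailing NUL.
byte[] bytes = Encoding.UTF8.GetBytes(sample + "\0");
List<double> numbers = ReadNumbers(bytes);
Console.WriteLine(string.Join(" | ", numbers));
```

With the real clipboard you would pass ((MemoryStream)dataObject.GetData("XML Spreadsheet")).ToArray() instead of the simulated bytes. Only cells typed as Number are returned, so text cells are skipped.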
I've also tried reading the Biff formats with Excel Data Reader, but the resulting DataSets always came out empty...
The BIFF formats are covered by an open specification from Microsoft (note that I say specification, not standard). Give this a read to get an idea of what is going on.
The BIFF formats you see correspond to specific Excel file formats: BIFF5 is the XLS format of Excel 5.0 and 95, BIFF8 is the XLS format of Excel 97 to 2003, and BIFF12 is the XLSB format introduced with Excel 2007 (later versions produce it too). There is some documentation here and also here (from OpenOffice) that may help you make sense of the binary data...
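To give a feel for the binary layout, here is a minimal sketch that walks the record headers of a raw BIFF5/BIFF8 record stream: each record is a 2-byte record type and a 2-byte body length (both little-endian), followed by the body. It does not apply to BIFF12, which encodes record type and size with variable-length integers. I haven't verified what the clipboard's Biff8 payload looks like; it may be wrapped in an OLE compound file, in which case the records would have to be pulled out of its "Workbook" stream first. The helper name ReadRecordHeaders is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: lists (record type, body length) pairs from a raw
// BIFF5/BIFF8 record stream.
List<(ushort Type, ushort Length)> ReadRecordHeaders(byte[] biff)
{
    var records = new List<(ushort Type, ushort Length)>();
    int pos = 0;
    while (pos + 4 <= biff.Length)
    {
        // Both header fields are little-endian 16-bit values.
        ushort type = (ushort)(biff[pos] | (biff[pos + 1] << 8));
        ushort length = (ushort)(biff[pos + 2] | (biff[pos + 3] << 8));
        records.Add((type, length));
        pos += 4 + length; // jump over the header and the body
    }
    return records;
}

// Synthetic bytes (not a valid workbook): a 0x0809 (BOF) header with a
// 2-byte body, followed by a 0x000A (EOF) record with an empty body.
byte[] sample = { 0x09, 0x08, 0x02, 0x00, 0x00, 0x06, 0x0A, 0x00, 0x00, 0x00 };
var headers = ReadRecordHeaders(sample);
foreach (var (recType, recLength) in headers)
    Console.WriteLine($"0x{recType:X4}, {recLength} bytes");
```

Interpreting the record bodies (cell values, shared strings, and so on) is where the specification and libraries mentioned below come in.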
Anyway, some work has been done in the past to parse these documents in C++, Java, VB and, to suit your taste, C#: for example this BIFF12 Reader, the NExcel project, and ExcelLibrary, to name a few.
In particular, NExcel lets you pass in a stream, which you can create from the clipboard data, and then query NExcel for the data. If you are going to work from the source code, I think ExcelLibrary is much more readable.
You can get the stream like this:
var format = "Biff8"; // one of the names reported by GetFormats(), e.g. "Biff5" or "Biff8"
var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
var stream = (System.IO.Stream)dataobject.GetData(format);
Reading from the stream with NExcel would then look something like this:
var wb = Workbook.getWorkbook(stream);
var sheet = wb.Sheets[0];
var somedata = sheet.getCell(0, 0).Contents;
I guess the actual Office libraries from Microsoft would work too.
I know this is not the whole story; please share how it goes. I'll try it myself if I get a chance.
Related
In my current project, Aspose is used to work with an Excel file (XLXS). This Excel file has 4 worksheets. The first two sheets are empty except for the first row, which contains column names; these tabs are populated with data through code, and the other two contain tons of complex formulas based on these inputs. Think of the first two tabs as inputs, the third tab as complex calculations, and the last tab as output. The file size typically ranges from 26 MB to 48 MB. The piece of code below does most of the work. After this method runs, the file is saved to a physical location and the output data is saved to the DB. This process has worked fine so far within that size range, but once the file size exceeded 100 MB it started throwing an out-of-memory exception. Only once or twice has it managed to complete the process, taking around 80-100 minutes.
public void CaclulateM(DataSet dataModel)
{
var workbook = this.ExcelModel.Workbook;
var ranges = ExcelModel.GetExcelModelRanges;
base.ImportInputsTo(workbook, ranges, dataModel);
workbook.CalculateFormula(false);
base.ExportOutputsTo(workbook, ranges, dataModel);
}
I tried some of the solutions suggested by Aspose, without success. I also tried other DLLs, including Interop, ExcelLibrary and NPOI, with the same result.
https://forum.aspose.com/t/aspose-cell-dll-issue-for-xslb-file/164440
Please help, or let me know if you need any other input to suggest anything. I cannot share the Excel file due to confidentiality.
(This question was formerly titled "C# / WPF : Going from Excel Interop "Range" to WPF "FlowDocument"" however I've made progress on that front that allows me to restrict my question. I'm leaving the original question below so existing answers will still make sense.)
I'm using Office Interop to read the contents of cells in an Excel worksheet. Some of those cells contain Rich Text (for example some words are italicized but not the whole cell) and I would like to capture them as RTF so I can then display them into WPF controls.
I have been able to obtain the RTF contents of cells using the clipboard API, where I use Excel Interop to copy a Range of one cell to the clipboard, and then read the clipboard, like so:
// Step 1 : retrieve the RTF from the clipboard as a string
string txt = Clipboard.GetText(TextDataFormat.Rtf);
// Step 2 : create a FlowDocument object and a TextRange object:
FlowDocument doc = new FlowDocument();
TextRange tr = new TextRange(doc.ContentStart, doc.ContentEnd);
// Step 3 : convert the clipboard string to a stream
byte[] byteArray = Encoding.ASCII.GetBytes(txt);
MemoryStream stream = new MemoryStream(byteArray);
// Step 4 : load that stream into TextRange
tr.Load(stream, DataFormats.Rtf);
If I then assign "doc" to the Document property of, say, a RichTextBox control, it'll display the content of the Excel cell with the exact same formatting as Excel does, down to colored words and font sizes.
However, this is extremely slow. It may take minutes to load a thousand cells that way, even if most are empty.
So here's my updated question: clearly Excel has a mechanism for returning the RTF content of an Excel cell, otherwise my clipboard code couldn't work. But is there a more efficient way than the clipboard to exploit that mechanism? Ideally through Interop?
Original question :
This may be an unusual question, but as I'm quite new to C#, WPF and Interop, I might be going about things the wrong way, so don't hesitate to offer a better approach. Here's what I'm trying to do:
I'm coding a WPF application that uses Office Interop to grab the contents of cells from an Excel worksheet. That content is text which may contain some formatting (for example some words are in bold, others are in italics). The application then displays that content in a "FlowDocumentScrollViewer" control on its GUI.
I want this "FlowDocumentScrollViewer" control to render the content from the Excel cell exactly as it appears in Excel, with formatting and everything.
The best I've managed so far is to display the cell's content without any formatting. Here's how this works: I use Office Interop to read a Range of cells from the worksheet and take their Value2 property. Value2 is of type "object". Then I create a FlowDocument object out of it, like so:
FlowDocument doc = new FlowDocument();
Paragraph p = new Paragraph(new Run(Variable_containing_a_Value2.ToString()));
doc.Blocks.Add(p);
And then I store this FlowDocument into the "FlowDocumentScrollViewer" Document property.
Now since I'm using "ToString()" on the Value2 I'm not surprised that any formatting information this object might contain disappears past this point.
My problem is, I haven't been able to find a way to create that FlowDocument, from that Value2 object, that preserves formatting.
Now, I know there has to be a way to get that information through, because when I copy my Excel cell and paste it in Word, for example, then the formatting is carried through. I just don't know how.
Help me, Obi-Wans, you're my only hope, as even Google has failed me.
It seems to me that you have at least a couple of options that will work better than just copying the cell contents as text. The Range object has Copy() and CopyPicture() methods, which you can use to have Excel copy the contents of the range to the clipboard.
The basic Copy() method should (I haven't tested it) put the contents of the cell into the clipboard in a variety of formats, including RTF. And you should be able to get the RTF and put that into the FlowDocument element.
Using RTF, you may still not get exactly the representation seen in Excel. The only way to get that is to have Excel do the rendering. In that case, you'll want the CopyPicture() method, which puts a picture of the range on the clipboard. This will be either a bitmap or a metafile, depending on the options you pass to the method. You can then retrieve it from the clipboard and put it into your FlowDocument.
Some applications, e.g. Word, use yet another, more complicated approach, which I doubt would work with FlowDocument: they present the Excel range as an OLE object. This is harder to implement, but has the advantage that it's a live representation of the original Excel document, and the user can edit the range in place in the host application.
The above should be enough to get you pointed in the right direction, so at least you know what you're looking for when you do your web searches. As stated, your question is very broad, and so the above is necessarily vague as well. Once you've decided on a particular method, have done some research and made an attempt into implementing that method, if you still have problems you can post a new question, with a good Minimal, Complete, and Verifiable code example that shows clearly what you've tried, with a detailed explanation of what specifically you're still having trouble with.
The current version of EPPlus supports the creation of Excel formulas but NOT Excel array formulas, despite having the CreateArrayFormula() method.
When using the CreateArrayFormula() method, the correct formula string appears in the Excel formula editor. However, the formula does not actually execute on the sheet.
I was wondering if anyone knows of a clever workaround for this that doesn't require Microsoft.Office.Interop.
My code is:
using (ExcelPackage pck = new ExcelPackage(newFile))
{
pck.Workbook.Worksheets.Add("Summary");
pck.Workbook.Worksheets.MoveToStart("Summary");
var summaryWS = pck.Workbook.Worksheets[1];
summaryWS.Cells["C2"].Value = 2;
summaryWS.Cells["C3"].Value = 3;
summaryWS.Cells["C4"].Value = 8;
summaryWS.Cells["A1"].CreateArrayFormula("STDEV.P($C$2:$C$4)*SQRT(8*260)");
pck.Save(); // write the package to newFile
}
My output in Excel is #NAME?
The formula editor shows {=STDEV.P($C$2:$C$4)*SQRT(8*260)}
It seems Excel is misinterpreting the function name STDEV.P, which is the newer version of STDEVP. If you open the workbook EPPlus generates in Excel, save it, and then look at the XML, you will see _xludf.STDEV.P, which means Excel treats it as a user-defined function.
You can do one of two things. You could use the old version of the function:
summaryWS.Cells["A1"].CreateArrayFormula("STDEVP($C$2:$C$4)*SQRT(8*260)");
which is probably less than ideal, since you generally want to stick with the latest version. Alternatively, force Excel to recognize the newer function by adding the _xlfn. prefix, like this:
summaryWS.Cells["A1"].CreateArrayFormula("_xlfn.STDEV.P($C$2:$C$4)*SQRT(8*260)");
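If you build many formulas, you could apply the prefix automatically. Below is a minimal sketch of a hypothetical AddXlfnPrefix helper; the function list is illustrative only (a few functions added in Excel 2010), not a complete list of names that need _xlfn., and the regex does not skip text inside quoted string literals.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: prefixes the listed function names with "_xlfn."
// when they are used as function calls and not already prefixed.
string AddXlfnPrefix(string formula, string[] functionNames)
{
    string alternatives = string.Join("|", functionNames.Select(s => Regex.Escape(s)));
    // (?<![\w.]) : not part of a longer name and not already "_xlfn."-prefixed
    // (?=\s*\()  : followed by an opening parenthesis, i.e. a function call
    var pattern = new Regex($@"(?<![\w.])({alternatives})(?=\s*\()", RegexOptions.IgnoreCase);
    return pattern.Replace(formula, "_xlfn.$1");
}

// Illustrative, NOT exhaustive: a few functions introduced in Excel 2010.
string[] newerFunctions = { "STDEV.P", "STDEV.S", "VAR.P", "VAR.S" };

string fixedFormula = AddXlfnPrefix("STDEV.P($C$2:$C$4)*SQRT(8*260)", newerFunctions);
Console.WriteLine(fixedFormula); // _xlfn.STDEV.P($C$2:$C$4)*SQRT(8*260)
```

The result would then be passed to CreateArrayFormula as before. Because already-prefixed names are skipped, running the helper twice on the same formula leaves it unchanged.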
I have a datagrid that I want to be able to copy and paste to/from Excel. Pretty common scenario. I have the copy and paste functions implemented. However, this application has several datagrids, and I'd like to prevent the user from trying to copy data from one grid to another, since the data is different.
I can serialize the objects in these grids to any format I want, so adding some kind of metadata that says "This data only goes in that grid" is trivial. But I can't add the metadata, because then it would show up in Excel. Is there some solution to this problem that allows me to paste data in one format in my application, but that Excel will still handle correctly?
If you look at the Clipboard class, you can set the text, but there is quite a bit more you can do with it. Most of the advanced things you will want to do with the clipboard revolve around a pair of methods, SetDataObject and GetDataObject. To put data on the clipboard in multiple formats, you can do this:
// MyObject is assumed to be [Serializable] so it can be stored on the clipboard.
var serializableObject = new MyObject();
var clipData = new DataObject();
clipData.SetData(DataFormats.Text, "abcdefg");
clipData.SetData("CustomFormat", serializableObject);
Clipboard.SetDataObject(clipData);
Once you have done this, you can get the data back by reversing the process and requesting the data in the custom format. Briefly, the reverse call looks like this:
var clipData = Clipboard.GetDataObject();
// null when the clipboard holds no "CustomFormat" data (e.g. cells copied from Excel)
var myObject = clipData?.GetData("CustomFormat") as MyObject;
For a more complete example from Microsoft, see this page: http://msdn.microsoft.com/en-us/library/637ys738(v=vs.110).aspx. Just look at the bottom where it explains the use of multiple formats.
Hope this helps. Best of luck!
I have a pdf with a form in it. I am trying to write a class that will take data from my database and automatically populate the fields in the form.
I have already tried iTextSharp; it works perfectly fine with my PDF, but its pricing is out of my budget. I need a free PDF parser that will let me import the PDF, set the data, and save the PDF out, preferably to a stream so that I can return a Stream object from my class rather than saving the PDF to the server.
I found this PDF reader, and it doesn't work. Null reference errors are abundant, and when I tried to "fix" them, it still couldn't find my fields.
So I have moved on to PDFBox, since the documentation says it can manipulate a PDF; however, I cannot find any examples. Here is the code I have so far.
var document = PDDocument.load(inputPdf);
var catalog = document.getDocumentCatalog();
var form = catalog.getAcroForm();
form.getField("MY_FIELD").setValue("Test Value");
document.save("some location on my hard drive");
document.close();
The problem is that catalog.getAcroForm() is returning a null, so I can't access the fields. Does anyone know how I can use PdfBox to alter the field values and save the thing back out?
EDIT:
I did find this example, which is pretty much what I am doing. It's just that my AcroForm is null in PDFBox. I know there is one there, because iTextSharp can pull it out just fine.
Have you tried version 1.2.1?
http://pdfbox.apache.org/apidocs/overview-summary.html