openxml sdk excel how to parse and calculate formula - c#

I have a formula cell in an Excel file that contains the formula =SUM(C2:C3).
From a web application hosted on a remote web server in the cloud, which does not have Excel installed, I would pass in values for C2 and C3.
I can also determine the exact formula in Excel. How do I parse this formula programmatically in C# so that I get the result 6 if the input values of C2 and C3 are 2 and 4 respectively?
What if the formula is very complex? What is the best way to parse and calculate it in C# on the server side in an ASP.NET MVC application?
A code sample would really help me in this case.

If you provide a tool that opens an Excel file and translates its content to HTML, you must deal with calculation.
If the file is "well created", for example manually with Excel, you don't need to compute the formulas yourself, because Excel stores both the formula (in the cell's CellFormula child element) and the result (in its CellValue child element); see the method GetValue_D11(). So basically you just need to show the result, which will always be a String.
Unfortunately, you have to deal with styles and data types if you want to maintain the behaviour.
In effect, you would have to build a complex web-based spreadsheet viewer/editor.
Here is a "fixed" sample (not dynamic at all) for retrieving String values and formula values. If you want to run the test, be sure to download this file (http://www.devnmore.com/share/Test.xlsx), otherwise it won't work.
ShowValuesSample svs = new ShowValuesSample("yourPath\\Test.xlsx");
String[] test = svs.GetDescriptions_A2A10();
Double grandTotal = svs.GetValue_D11();
ShowValuesSample class:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using DocumentFormat.OpenXml.Packaging;
using Ap = DocumentFormat.OpenXml.ExtendedProperties;
using Vt = DocumentFormat.OpenXml.VariantTypes;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Spreadsheet;
using A = DocumentFormat.OpenXml.Drawing;
using System.Globalization;
namespace TesterApp
{
public class ShowValuesSample
{
public String FileName { get; private set; }
private SpreadsheetDocument _ExcelDocument = null;
public SpreadsheetDocument ExcelDocument
{
get
{
if (_ExcelDocument == null)
{
_ExcelDocument = SpreadsheetDocument.Open(FileName, true);
}
return _ExcelDocument;
}
}
private SheetData _SheetDataOfTheFirstSheet = null;
public SheetData SheetDataOfTheFirstSheet
{
get
{
if (_SheetDataOfTheFirstSheet == null)
{
WorksheetPart shPart = ExcelDocument.WorkbookPart.WorksheetParts.ElementAt(0);
Worksheet wsh = shPart.Worksheet;
_SheetDataOfTheFirstSheet = wsh.Elements<SheetData>().ElementAt(0);
}
return _SheetDataOfTheFirstSheet;
}
}
private SharedStringTable _SharedStrings = null;
public SharedStringTable SharedStrings
{
get
{
if (_SharedStrings == null)
{
SharedStringTablePart shsPart = ExcelDocument.WorkbookPart.SharedStringTablePart;
_SharedStrings = shsPart.SharedStringTable;
}
return _SharedStrings;
}
}
public ShowValuesSample(String fileName)
{
FileName = fileName;
}
//In the file, descriptions are stored as shared strings,
//so the CellValue is the zero-based index into the SharedStringTable.
//In my example I saved 9 different values.
//Shared strings are a trick to reduce the size of a file by writing
//each repeated string just once.
public String[] GetDescriptions_A2A10()
{
String[] retVal = new String[9];
for (int i = 0; i < retVal.Length; i++)
{
Row r = SheetDataOfTheFirstSheet.Elements<Row>().ElementAt(i + 1);
Cell c = r.Elements<Cell>().ElementAt(0);
Int32 shsIndex = Convert.ToInt32(c.CellValue.Text);
SharedStringItem shsItem = SharedStrings.Elements<SharedStringItem>().ElementAt(shsIndex);
retVal[i] = shsItem.Text.Text;
}
return retVal;
}
//The value is stored because Excel computed it when the file was saved.
//To be sure it's correct you would have to perform all the calculations yourself.
//In this case I'm sure Excel didn't store the wrong value, so...
public Double GetValue_D11()
{
Double retVal = 0.0d;
Int32 cellIndex = 0;
//cellIndex is 0 and not 3, because A11, B11 and C11 are empty cells and are not stored in the row
//Another issue to deal with ;-)
Cell c = SheetDataOfTheFirstSheet.Elements<Row>().ElementAt(10).Elements<Cell>().ElementAt(cellIndex);
//as example take a look at the value of storedFormula
String storedFormula = c.CellFormula.Text;
String storedValue = c.CellValue.Text;
NumberFormatInfo provider = new NumberFormatInfo();
provider.NumberDecimalSeparator = ".";
provider.NumberGroupSeparator = ",";
provider.NumberGroupSizes = new Int32[] { 3 };
retVal = Convert.ToDouble(storedValue, provider);
return retVal;
}
}
}
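The positional lookup in GetValue_D11() only works because the empty cells A11, B11 and C11 happen to be absent from the row. A less fragile approach is to look the cell up by its reference attribute. The following is a minimal sketch, assuming every stored cell carries a CellReference (Excel writes it, but the attribute is optional in the file format); `CellLookup` and `FindCell` are hypothetical names introduced here for illustration.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Spreadsheet;

public static class CellLookup
{
    // Returns the cell whose reference (for example "D11") matches,
    // or null when that cell is not stored in the sheet at all.
    public static Cell FindCell(SheetData sheetData, string cellReference)
    {
        return sheetData.Elements<Row>()
            .SelectMany(row => row.Elements<Cell>())
            // Assumption: cells without a reference attribute are skipped.
            .FirstOrDefault(cell => cell.CellReference != null
                                    && cell.CellReference.Value == cellReference);
    }
}
```

With this helper, the lookup in GetValue_D11() could be written as CellLookup.FindCell(SheetDataOfTheFirstSheet, "D11"), followed by a null check before reading CellFormula or CellValue.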

spreadSheet.WorkbookPart.Workbook.CalculationProperties.ForceFullCalculation = true;
spreadSheet.WorkbookPart.Workbook.CalculationProperties.FullCalculationOnLoad = true;
worked for me.

I'm afraid it's not possible. In Open XML you can read or change the formula, but you cannot process the formula and get results through Open XML.
Change the values of C2 and C3 used by the formula and save the file with Open XML; then open the document in the Excel application. The values will be calculated and displayed.
Refer to this SO post related to the issue: open xml sdk excel formula recalculate cache issue
Refer to this post too: http://openxmldeveloper.org/discussions/formats/f/14/p/1806/158153.aspx
Hope this helps!
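For the simple case in the question, where the formula is a single SUM over a range and the input values are supplied by the caller, the result can be computed in plain C# without Excel. The following is a minimal sketch under that assumption only; `SimpleSum` and `Evaluate` are hypothetical names and are not part of the Open XML SDK. Anything other than a formula of the form =SUM(A1:B2) is rejected rather than parsed, so complex formulas would still need a full calculation engine.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SimpleSum
{
    // Converts column letters such as "C" or "AA" to a 1-based column number.
    private static int ColumnNumber(string letters)
    {
        int number = 0;
        foreach (char ch in letters)
        {
            number = number * 26 + (ch - 'A' + 1);
        }
        return number;
    }

    // Converts a 1-based column number back to column letters.
    private static string ColumnLetters(int number)
    {
        string letters = "";
        while (number > 0)
        {
            int remainder = (number - 1) % 26;
            letters = (char)('A' + remainder) + letters;
            number = (number - 1) / 26;
        }
        return letters;
    }

    // Evaluates only formulas of the form =SUM(X1:Y2) with upper-case references.
    // Cells missing from the dictionary are treated as empty and contribute 0.
    public static double Evaluate(string formula, IDictionary<string, double> cells)
    {
        Match m = Regex.Match(formula.Trim(), @"^=SUM\(([A-Z]+)(\d+):([A-Z]+)(\d+)\)$");
        if (!m.Success)
        {
            throw new NotSupportedException("Only =SUM(range) is handled by this sketch.");
        }

        int col1 = ColumnNumber(m.Groups[1].Value);
        int row1 = int.Parse(m.Groups[2].Value);
        int col2 = ColumnNumber(m.Groups[3].Value);
        int row2 = int.Parse(m.Groups[4].Value);

        double total = 0.0;
        for (int col = Math.Min(col1, col2); col <= Math.Max(col1, col2); col++)
        {
            for (int row = Math.Min(row1, row2); row <= Math.Max(row1, row2); row++)
            {
                double value;
                if (cells.TryGetValue(ColumnLetters(col) + row, out value))
                {
                    total += value;
                }
            }
        }
        return total;
    }
}
```

With a dictionary holding C2 = 2 and C3 = 4, SimpleSum.Evaluate("=SUM(C2:C3)", cells) returns 6.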

Related

Excel file Modified using EPPlus is not working in Ms Excel 2007

Okay, so I have an Excel file (.xlsx) which will be downloaded when the user clicks a button. The file is stored in a folder and is processed, by adding data validation and such, before it is sent to the user. In my case I'm only adding dropdowns which are filled from another sheet.
Here is my code that generates the file:
public IActionResult DownloadTemplateExcelFile(string type)
{
string fullFilePath = "../TemplateFiles/";
string fileName = "";
if (type != "")
{
fileName = type + ".xlsx";
fullFilePath += fileName;
}
var fileData = ExcelHelper.CreateExcelFile(type, fullFilePath);
return this.File(fileData, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",fileName);
}
And this code below is the file processing:
public static class ExcelHelper{
public static byte[] CreateExcelFile(string type, string fullFilePath){
using (var package = new ExcelPackage(new FileInfo(fullFilePath))){
// Getting list of values and set it for dropdown values
// List<string> dropdown1 = .. list string is loaded from database
ExcelWorksheet sheetDropdown1 = package.Workbook.Worksheets.Add("Dropdowns1");
var mainSheet= package.Workbook.Worksheets["Sheet1"];
sheetDropdown1.Cells["A1"].LoadFromCollection(dropdown1);
var dropdown1Addr = mainSheet.Cells[3,5,300,5].Address;
var dropdown1Formula = "='Dropdowns1'!$A:$A";
var validation = mainSheet.DataValidations.AddListValidation(dropdown1Addr);
validation.ShowErrorMessage = true;
validation.ErrorStyle = OfficeOpenXml.DataValidation.ExcelDataValidationWarningStyle.stop;
validation.ErrorTitle = "An invalid value was entered";
validation.Error = "Select a value from the list";
validation.AllowBlank = true;
validation.Formula.ExcelFormula = dropdown1Formula;
validation.Validate();
var excelFile = package.GetAsByteArray();
package.Dispose();
return excelFile;
}
}
}
When I opened the file in Excel 2010+ it worked just fine: the dropdown loaded nicely and the cells with data validation applied worked. However, when I tried to open it in Excel 2007, it showed an error saying the file needs to be repaired before it can be opened. If I let it repair the file, the dropdown function is lost and unusable.
I've racked my brain but haven't found a solution yet. How can I fix this? For reference, I'm using EPPlus 6.

How to write data to an existing Excel doc in C#

I have a document that I need to update on a monthly basis and I'm writing an automation to do so. This is my first time attempting to update a document with C# as opposed to simply creating a new one. I have researched and tried implementing a few libraries that I've found online and here on StackOverflow, for example, ClosedXML, but so far I've had no luck. I understand this question has been asked here before, so my actual question is: Is my implementation incorrect/am I doing something wrong?
public void WriteToReport(List<BrandData> brandData, string reportFilePath)
{
using (var workbook = new XLWorkbook(reportFilePath))
{
var worksheet = workbook(1);
worksheet.Cell(26, 2).Value = "Hello World!";
workbook.SaveAs(reportFilePath);
}
}
Above is how I've tried to test ClosedXML so far. The GitHub docs imply that it should be this simple, but I don't see any changes made to the doc when the automation is finished. I've also tried using Streamwriter. If anyone can help me with ClosedXML or suggest another library that worked for them, it would be greatly appreciated.
Edit: Following explanations on other similar questions on here, I have tried this:
public void WriteToReport(List<BrandData> brandData, string reportFilePath)
{
var workbook = new XLWorkbook(reportFilePath);
var worksheet = workbook.Worksheet(1);
int numberOfLastColumn =
worksheet.LastColumnUsed().ColumnNumber();
IXLCell newCell = worksheet.Cell(numberOfLastColumn + 1, 1);
newCell.SetValue("Hello World");
workbook.SaveAs(reportFilePath);
}
Here is a simple example to write a string value to the first WorkSheet.
public void WriteToCell(string fileName, int row, int col, string value)
{
using var workbook = new XLWorkbook(fileName);
var worksheet = workbook.Worksheets.Worksheet(1);
worksheet.Cell(row, col).Value = value;
workbook.SaveAs(fileName);
}

Load an excel file that contains charts and insert new column using Infragistics.Documents.Excel

I would like to insert a new column in an existing file that contains charts.
It doesn't work: Visual Studio keeps running forever. I noticed that if I delete the charts in the loaded file, it works just fine and a new column with data is inserted. I just don't know if I can conclude that the existing charts are what prevents new columns from being inserted.
Here is what I did :
private static void Main()
{
string outputFile = "metrics.xlsx";
Workbook workbook = Workbook.Load(outputFile);
Workbook temporary = SetIndicatorsWorkbook();
var values = new List<int>();
for(int j=0; j<12; j++)
{
values.Add((int)temporary.Worksheets["Unit & Integration Tests"].Rows[j].Cells[0].Value);
}
var worksheet = workbook.Worksheets["Unit Testing"];
var k = 9;
var count = worksheet.Rows[14].Cells.Count(cell => cell.Value!=null);
worksheet.Columns.Insert(count+1);
foreach (var value in values)
{
worksheet.Rows[k].Cells[count+1].Value = value;
k++;
}
workbook.Save(outputFile);
}
Your code seems fine. I used a random Excel file that had a chart on the sheet, and the code executed without errors. I will be able to assist further if you provide the metrics.xlsx file.

Importing a File with Dynamic Columns

I am new to SSIS and C#. In SQL Server 2008 I am importing data from a .csv file. The number of columns is dynamic: usually around 22, sometimes more or fewer. I created a staging table with 25 columns and import the data into it. In essence, each flat file that I import has a different number of columns, but all files are properly formatted. My task is to import all the rows from a .csv flat file, including the headers. I want to put this in a job so I can import multiple files into the table daily.
So inside a for each loop I have a data flow task, within which I have a script component. I came up with the C# code below (from research online), but I get this error:
Index was outside the bounds of the array.
I tried to find the cause using MessageBox and found that it reads the first line, and the index goes outside the bounds of the array after the first line.
1.) I need your help with fixing the code.
2.) My File1Conn is the flat file connection; instead, I want to read the file path directly from a variable User::FileName that my foreach loop keeps updating. Please help with modifying the code below.
Thanks in advance.
This is my flat file:
https://drive.google.com/file/d/0B418ObdiVnEIRnlsZFdwYTRfTFU/view?usp=sharing
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Windows.Forms;
using System.IO;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
private StreamReader SR;
private string File1;
public override void AcquireConnections(object Transaction)
{
// Get the connection for File1
IDTSConnectionManager100 CM = this.Connections.File1Conn;
File1 = (string)CM.AcquireConnection(null);
}
public override void PreExecute()
{
base.PreExecute();
SR = new StreamReader(File1);
}
public override void PostExecute()
{
base.PostExecute();
SR.Close();
}
public override void CreateNewOutputRows()
{
// Declare variables
string nextLine;
string[] columns;
char[] delimiters;
int Col4Count;
String[] Col4Value = new string[50];
// Set the delimiter
delimiters = ";".ToCharArray();
// Read the first line (header)
nextLine = SR.ReadLine();
// Split the line into columns
columns = nextLine.Split(delimiters);
// Find out how many Col3 there are in the file
Col4Count = columns.Length - 3;
//MessageBox.Show(Col4Count.ToString());
// Read the second line and loop until the end of the file
nextLine = SR.ReadLine();
while (nextLine != null)
{
// Split the line into columns
columns = nextLine.Split(delimiters);
{
// Add a row
File1OutputBuffer.AddRow();
// Set the values of the Script Component output according to the file content
File1OutputBuffer.SampleID = columns[0];
File1OutputBuffer.RepNumber = columns[1];
File1OutputBuffer.Product = columns[2];
File1OutputBuffer.Col1 = columns[3];
File1OutputBuffer.Col2 = columns[4];
File1OutputBuffer.Col3 = columns[5];
File1OutputBuffer.Col4 = columns[6];
File1OutputBuffer.Col5 = columns[7];
File1OutputBuffer.Col6 = columns[8];
File1OutputBuffer.Col7 = columns[9];
File1OutputBuffer.Col8 = columns[10];
File1OutputBuffer.Col9 = columns[11];
File1OutputBuffer.Col10 = columns[12];
File1OutputBuffer.Col11 = columns[13];
File1OutputBuffer.Col12 = columns[14];
File1OutputBuffer.Col13 = columns[15];
File1OutputBuffer.Col14 = columns[16];
File1OutputBuffer.Col15 = columns[17];
File1OutputBuffer.Col16 = columns[18];
}
// Read the next line
nextLine = SR.ReadLine();
}
}
}
As you mentioned, the file has a dynamic number of columns, so in your script component you need to count the columns by splitting on the delimiter, then redirect the rows to different outputs.
For your second question, you can assign your variable to the ConnectionString property of the flat file connection manager. Then you can read the variable value in your script directly.
As an alternative to the script component, you can create a "one column" flat file source by using a dummy delimiter; then, in the data flow task, you can read the number of columns into a variable, apply a conditional split to the data flow, and redirect the outputs to different destinations. An example can be found at http://sqlcodespace.blogspot.com.au/2015/03/ssis-design-pattern-handling-flat-file.html
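The "Index was outside the bounds of the array" error is consistent with the script indexing columns[18] on a row that has fewer fields. One way to count columns by delimiter and still keep fixed assignments is to normalise every row to a known width first. The following is a minimal sketch in plain C#; `CsvColumns` and `SplitFixed` are hypothetical names, and padding missing fields with an empty string is an assumption about what the staging table should receive.

```csharp
using System;

public static class CsvColumns
{
    // Splits a line on the delimiter and returns exactly `width` fields.
    // Missing trailing fields become empty strings; extra fields are dropped.
    public static string[] SplitFixed(string line, char delimiter, int width)
    {
        string[] parts = line.Split(delimiter);
        string[] result = new string[width];
        for (int i = 0; i < width; i++)
        {
            result[i] = i < parts.Length ? parts[i] : string.Empty;
        }
        return result;
    }
}
```

In the script from the question, replacing nextLine.Split(delimiters) with CsvColumns.SplitFixed(nextLine, ';', 19) would keep columns[0] through columns[18] valid for every row.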

Getting bibliographic data from text in a PDF and exporting to a window form

I use iText5 for .NET to extract text from a PDF, using the code below.
private void button1_Click(object sender, EventArgs e)
{
PdfReader reader2 = new PdfReader("Scharfetter1969.pdf");
int pagen = reader2.NumberOfPages;
reader2.Close();
ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
for (int i = 1; i < 2; i++)
{
textBox1.Text = "";
PdfReader reader = new PdfReader("Scharfetter1969.pdf");
String s = PdfTextExtractor.GetTextFromPage(reader, i, its);
s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
textBox1.Text = s;
reader.Close();
}
}
But I want to get bibliographic data from a research paper PDF.
Here is an example of the data extracted from this PDF (in EndNote format):
%0 Journal Article
%T Repeated temperature modulation epitaxy for p-type doping and light-emitting diode based on ZnO
%A Tsukazaki, A.
%A Ohtomo, A.
%A Onuma, T.
%A Ohtani, M.
%A Makino, T.
%A Sumiya, M.
%A Ohtani, K.
%A Chichibu, S.F.
%A Fuke, S.
%A Segawa, Y.
%J Nature Materials
%V 4
%N 1
%P 42-46
%# 1476-1122
%D 2004
%I Nature Publishing Group
Note that this bibliographic information is not available in the metadata of the PDF. I want to extract the article type (%0), title (%T), authors (%A), date (%D) and publisher (%I), and show each in its assigned text box on a Windows Form.
I am using C#. If anyone has code for this, or can guide me on how to do it, please share.
PDF is a one-way format. You put data in so that it renders consistently on all devices (monitors, printers, etc.), but the format was never intended for pulling data back out. Any and all attempts to do that will be pure guesswork. iText's PdfTextExtractor works, but you are going to have to piece things together based on your own arbitrary set of rules, and these rules will probably change from PDF to PDF. The supplied PDF was created by InDesign, which does such a great job of making text look good that it actually makes it even harder to parse the data back out.
That said, if your PDFs are all visually consistent, you could try to pull the data out while retaining formatting and use the formatting rules to guess what is what. That post will get you some HTML formatting that you could guess at. (If this actually works I'd recommend returning something more specific than HTML but I'll leave that up to you.)
Running it against your supplied PDF shows that the title is using the font HelveticaNeue-LightExt at about 17pts so you could write a rule to look for all lines that use that font at that size and combine them together. Authors are done in HelveticaNeue-Condensed at about 10pts so that's another rule.
The code below is a modified version of the one linked to above. It's a fully working C# 2010 WinForms app targeting iTextSharp 5.1.1.0. It pulls out the title and authors for the supplied PDF, but you'll need to tweak it for other PDFs and metadata. See the comments in the code for specific implementation details.
using System;
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text.pdf.parser;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
PdfReader reader = new PdfReader(System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "nmat4-42.pdf"));
TextWithFontExtractionStategy S = new TextWithFontExtractionStategy();
string F = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, 1, S);
//Buffers to hold various parts from the PDF
List<string> titles = new List<string>();
List<string> authors = new List<string>();
//Array of lines of text
string[] lines = F.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
//Temporary string
string t;
//Loop through each line in the array
foreach (string line in lines)
{
//See if the line looks like a "title"
if (line.Contains("HelveticaNeue-LightExt") && line.Contains("font-size:17.28003"))
{
//Remove the HTML tags
titles.Add(System.Text.RegularExpressions.Regex.Replace(line, "</?span.*?>", "").Trim());
}
//See if the line looks like an "author"
else if (line.Contains("HelveticaNeue-Condensed") && line.Contains("font-size:9.995972"))
{
//Remove the HTML tags and trim extra characters
t = System.Text.RegularExpressions.Regex.Replace(line, "</?span.*?>", "").Trim(new char[] { ' ', ',', '*' });
//Make sure we have a valid name, probably need some more exceptions here, too
if (!string.IsNullOrWhiteSpace(t) && t != "AND")
{
authors.Add(t);
}
}
}
//Write out the title to the console
Console.WriteLine("Title : {0}", string.Join(" ", titles.ToArray()));
//Write out each author
foreach (string author in authors)
{
Console.WriteLine("Author : {0}", author);
}
Console.WriteLine(F);
this.Close();
}
public class TextWithFontExtractionStategy : iTextSharp.text.pdf.parser.ITextExtractionStrategy
{
//HTML buffer
private StringBuilder result = new StringBuilder();
//Store last used properties
private Vector lastBaseLine;
private string lastFont;
private float lastFontSize;
//http://api.itextpdf.com/itext/com/itextpdf/text/pdf/parser/TextRenderInfo.html
private enum TextRenderMode
{
FillText = 0,
StrokeText = 1,
FillThenStrokeText = 2,
Invisible = 3,
FillTextAndAddToPathForClipping = 4,
StrokeTextAndAddToPathForClipping = 5,
FillThenStrokeTextAndAddToPathForClipping = 6,
AddTextToPaddForClipping = 7
}
public void RenderText(iTextSharp.text.pdf.parser.TextRenderInfo renderInfo)
{
string curFont = renderInfo.GetFont().PostscriptFontName;
//Check if faux bold is used
if ((renderInfo.GetTextRenderMode() == (int)TextRenderMode.FillThenStrokeText))
{
curFont += "-Bold";
}
//This code assumes that if the baseline changes then we're on a newline
Vector curBaseline = renderInfo.GetBaseline().GetStartPoint();
Vector topRight = renderInfo.GetAscentLine().GetEndPoint();
iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(curBaseline[Vector.I1], curBaseline[Vector.I2], topRight[Vector.I1], topRight[Vector.I2]);
Single curFontSize = rect.Height;
//See if something has changed, either the baseline, the font or the font size
if ((this.lastBaseLine == null) || (curBaseline[Vector.I2] != lastBaseLine[Vector.I2]) || (curFontSize != lastFontSize) || (curFont != lastFont))
{
//if we've put down at least one span tag close it
if ((this.lastBaseLine != null))
{
this.result.AppendLine("</span>");
}
//If the baseline has changed then insert a line break
if ((this.lastBaseLine != null) && curBaseline[Vector.I2] != lastBaseLine[Vector.I2])
{
this.result.AppendLine("<br />");
}
//Create an HTML tag with appropriate styles
this.result.AppendFormat("<span style=\"font-family:{0};font-size:{1}\">", curFont, curFontSize);
}
//Append the current text
this.result.Append(renderInfo.GetText());
//Set currently used properties
this.lastBaseLine = curBaseline;
this.lastFontSize = curFontSize;
this.lastFont = curFont;
}
public string GetResultantText()
{
//If we wrote anything then we'll always have a missing closing tag so close it here
if (result.Length > 0)
{
result.Append("</span>");
}
return result.ToString();
}
//Not needed
public void BeginTextBlock() { }
public void EndTextBlock() { }
public void RenderImage(ImageRenderInfo renderInfo) { }
}
}
}
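The rules in Form1_Load match the font size as an exact string (for example "font-size:17.28003"), which is likely to break on other PDFs where the computed height differs slightly. A more tolerant variant parses the style out of each generated span and compares the size within a tolerance. The following is a minimal sketch; `SpanStyle`, `TryParse` and `Matches` are hypothetical names, and the regex assumes the span format produced by the strategy above with a "." decimal separator.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class SpanStyle
{
    // Matches the opening tag emitted by the extraction strategy.
    private static readonly Regex StyleRegex = new Regex(
        "<span style=\"font-family:(?<font>[^;\"]+);font-size:(?<size>[0-9.]+)\">");

    // Extracts the font name and font size from a line; returns false if absent.
    public static bool TryParse(string line, out string font, out float size)
    {
        font = null;
        size = 0f;
        Match m = StyleRegex.Match(line);
        if (!m.Success)
        {
            return false;
        }
        font = m.Groups["font"].Value;
        return float.TryParse(m.Groups["size"].Value, NumberStyles.Float,
                              CultureInfo.InvariantCulture, out size);
    }

    // True when the line uses the given font at roughly the expected size.
    public static bool Matches(string line, string fontName, float expectedSize, float tolerance)
    {
        string font;
        float size;
        return TryParse(line, out font, out size)
            && font.Contains(fontName)
            && Math.Abs(size - expectedSize) <= tolerance;
    }
}
```

The title rule could then read SpanStyle.Matches(line, "HelveticaNeue-LightExt", 17.28f, 0.05f), with the tolerance chosen by inspecting a few sample PDFs.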
