Parse Excel file best practice - C#

I am facing an issue parsing an Excel file. My file has more than 5000 rows, and parsing it takes ages. Is there a better way to do this?
public static List<List<List<string>>> ExtractData(string filePath)
{
List<List<List<string>>> Allwork = new List<List<List<string>>>();
Microsoft.Office.Interop.Excel.Application excelApp = new Microsoft.Office.Interop.Excel.Application();
Microsoft.Office.Interop.Excel.Workbook workBook = excelApp.Workbooks.Open(filePath);
foreach (Microsoft.Office.Interop.Excel.Worksheet sheet in workBook.Worksheets)
{
List<List<string>> Sheet = new List<List<string>>();
Microsoft.Office.Interop.Excel.Range usedRange = sheet.UsedRange;
//Iterate the rows in the used range
foreach (Microsoft.Office.Interop.Excel.Range row in usedRange.Rows)
{
List<string> Rows = new List<string>();
String[] Data = new String[row.Columns.Count];
for (int i = 0; i < row.Columns.Count; i++)
{
try
{
Data[i] = row.Cells[1, i + 1].Value2.ToString();
Rows.Add(row.Cells[1, i + 1].Value2.ToString());
}
catch
{
Rows.Add(" ");
}
}
Sheet.Add(Rows);
}
Allwork.Add(Sheet);
}
excelApp.Quit();
return Allwork;
}
This is my code.

Your issue is that you are reading one cell at a time, which is very costly and inefficient. Try reading a whole range of cells in one call instead.
Simple example below:
Excel.Range range = worksheet.get_Range("A"+i.ToString(), "J" + i.ToString());
System.Array myvalues = (System.Array)range.Cells.Value;
string[] strArray = ConvertToStringArray(myvalues);
A link to a basic example: Read all the cell values from a given range in Excel
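The same idea can be applied to a whole sheet: ask for the values of the used range once, then walk the returned array in memory. The helper below is only a sketch of the in-memory half. `RangeValues` and `ToRows` are made-up names, and the interop call site in the trailing comment is an assumption about how it would plug into the question's loop. It reads the array bounds instead of assuming 0, because the array Excel returns for a multi-cell range is 1-based.

```csharp
using System;
using System.Collections.Generic;

public static class RangeValues
{
    // Converts the object[,] returned for a multi-cell range into rows of strings.
    // The bounds are taken from the array itself, so 0-based and 1-based arrays both work.
    public static List<List<string>> ToRows(object[,] values)
    {
        var rows = new List<List<string>>();
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
        {
            var row = new List<string>();
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                object cell = values[r, c];
                // Empty cells come back as null; keep the question's " " placeholder.
                row.Add(cell == null ? " " : cell.ToString());
            }
            rows.Add(row);
        }
        return rows;
    }
}

// Hypothetical call site inside the question's sheet loop (needs Excel interop):
//   object raw = sheet.UsedRange.Value2;   // one COM call per sheet
//   var values = raw as object[,];         // a single-cell range yields a scalar, not an array
//   if (values != null) Sheet.AddRange(RangeValues.ToRows(values));
```

This replaces thousands of per-cell COM round trips with one call per sheet, which is usually where the time goes.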

I suggest not using interop, but an ODBC connection to get the Excel data. This lets you treat the Excel file as a database and use SQL statements to read the data you need.

If that's an option, and if your tables have a simple structure, I would suggest exporting the file to .csv and applying simple string processing logic.
You might also want to try Igos's suggestion.

One approach is to use something like the ClosedXML library to directly read the .xlsx file, not going through the Excel interop.

Related

EPPlus Table @ Column reference

I'm trying to use EPPlus to create a table on a worksheet. I can create the table, but all my @ column references become #REF! when I open the file. If I paste the exact same formula into Excel, it accepts it without any problem. What am I missing here? Do I need to apply the table somehow after creating it?
Thanks,
Lee
private void ProcessVehicleData(BorrowingBase bbData, ExcelWorksheet ew, int colStart, int rowStart) {
int origFirstRow = rowStart;
foreach (DailyCAPS data in bbData.DailyCAPS) {
FillRow(ew, data, colStart, rowStart);
++rowStart;
}
try {
ExcelAddressBase eab = new ExcelAddressBase(origFirstRow - 1, ExcelColumnNameToNumber("A"), rowStart - 1, ExcelColumnNameToNumber("Y"));
ExcelTable et = ew.Tables.Add(eab, "VehicleData");
if (origFirstRow != rowStart) {
ew.Cells[origFirstRow, ExcelColumnNameToNumber("Y")].Formula = "=IF([@Inventory Days]>210,\"H\",IF([@TitleApp]+[@UtahTitleReceived]=0,\"B\",\"\"))";
}
}
catch { }
}

See comments for answer...github.com/JanKallman/EPPlus/issues/521

No, EPPlus can't do it.
EPPlus's Tables.Add only fills in plain data and does not create a workbook query, so =[@XXX] does not work.

Read from Excel file — specific sheet

Simply put: I need to read from an xlsx file (rows), in the simplest way. That is, preferably without using third-party tools, or at least things that aren't available as nuget packages.
I've been trying for a while with IExcelDataReader, but I cannot figure out how to get data from a specific sheet.
This simple code snippet works, but it just reads the first worksheet:
FileStream stream = File.Open("C:\\test\\test.xlsx", FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
excelReader.IsFirstRowAsColumnNames = true;
while (excelReader.Read()) {
Console.WriteLine(excelReader.GetString(0));
}
This prints the rows in the first worksheet, but ignores the others. Of course, there is nothing to suggest otherwise, but I cannot seem to find out how to specify the sheet name.
It strikes me that this should be quite easy?
Sorry for asking something which has been asked several times before, but the answers (here and elsewhere on the net) are a jungle of bad, plain wrong and outdated half-answers that's a nightmare to try and make sense of. Especially since almost everyone answering assumes that you know some specific details that are not always easy to find.
UPDATE:
As per daniell89's suggestion below, I've tried this:
FileStream stream = File.Open("C:\\test\\test.xlsx", FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
excelReader.IsFirstRowAsColumnNames = true;
// Select the first or second sheet - this works:
DataTable specificWorkSheet = excelReader.AsDataSet().Tables[1];
// This works: Printing the first value in each column
foreach (var col in specificWorkSheet.Columns)
Console.WriteLine(col.ToString());
// This does NOT work: Printing the first value in each row
foreach (var row in specificWorkSheet.Rows)
Console.WriteLine(row.ToString());
Printing each column heading with col.ToString() works fine.
Printing the first cell of each row with row.ToString() results in this output:
System.Data.DataRow
System.Data.DataRow
System.Data.DataRow
...
One per row, so it's obviously getting the rows. But how to get the contents, and why does ToString() work for the columns and not for the rows?

Maybe look at this answer: https://stackoverflow.com/a/32522041/5358389
DataSet workSheets= reader.AsDataSet();
And then specific sheet:
DataTable specificWorkSheet = workSheets.Tables[yourValue]; // reuse the DataSet instead of calling AsDataSet() again
Enumerating rows:
foreach (var row in specificWorkSheet.Rows)
Console.WriteLine(((DataRow)row)[0]); // column identifier in square brackets
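As for why ToString() works for the columns and not for the rows: DataColumn overrides ToString() to return the column name, while DataRow does not override it, so you get the type name. Also, DataTable.Rows only implements the non-generic IEnumerable, so `var row` is typed as object, which is why the cast above is needed. A small sketch that needs only System.Data (`DataTableDump` and `RowsAsText` are names made up for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DataTableDump
{
    // Returns one tab-separated line per row. The cell values are read
    // through ItemArray because DataRow.ToString() only yields the type name.
    public static List<string> RowsAsText(DataTable table)
    {
        var lines = new List<string>();
        foreach (DataRow row in table.Rows)   // explicit DataRow instead of var
        {
            lines.Add(string.Join("\t", row.ItemArray));
        }
        return lines;
    }
}
```

With the reader from the question this would be called as `DataTableDump.RowsAsText(specificWorkSheet)`, printing each returned line.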

You need to get the Worksheet for the sheet you want to read data from. To get range A1 from Cars, for example:
var app = new Application();
Workbooks workbooks = app.Workbooks;
Workbook workbook = workbooks.Open(@"C:\MSFT Site Account Updates_May 2015.xlsx");
Worksheet sheet = workbook.Sheets["Cars"];
Range range = sheet.Range["A1"];

It is a late reply, but I hope it will help someone.
The script retrieves the data from the first sheet and also collects the headers from the first row.
if (upload != null && upload.ContentLength > 0)
{
// ExcelDataReader works with the binary Excel file, so it needs a FileStream
// to get started. This is how we avoid dependencies on ACE or Interop:
Stream stream = upload.InputStream;
// We return the interface, so that
IExcelDataReader reader = null;
if (upload.FileName.EndsWith(".xls"))
{
reader = ExcelReaderFactory.CreateBinaryReader(stream);
}
else if (upload.FileName.EndsWith(".xlsx"))
{
reader = ExcelReaderFactory.CreateOpenXmlReader(stream);
}
else
{
ModelState.AddModelError("File", "This file format is not supported");
return View();
}
var result = reader.AsDataSet(new ExcelDataSetConfiguration()
{
ConfigureDataTable = (_) => new ExcelDataTableConfiguration()
{
UseHeaderRow = true
}
}).Tables[0];// get the first sheet data with index 0
var tables = result.Rows[0].Table.Columns;//we have to get a list of table headers here "first row" from 1 row
foreach(var rue in tables)// iterate through the header list and add it to variable 'Headers'
{
Headers.Add(rue.ToString());//Headers has been treated as a global variable "private List<string> Headers = new List<string>();"
}
var count = Headers.Count();// test if the headers have been added using count
reader.Close();
return View(result);
}
else
{
ModelState.AddModelError("File", "Please Upload Your file");
}

How to find duplicate values in Excel cells between multiple sheets programmatically

For example, I have a sheet called EmployeeSheet, which is just a single column of every employee's name first and last in a company. And let's assume this list is perfectly formatted and has no duplicates so every cell is unique in this sheet.
Now I have a sheet for each department in the company, such as FinanceSheet, ITSheet, and SalesSheet. Each sheet has in it somewhere (as in each sheet doesn't have the same layout) a list of employees in each department. However any 1 employee name should only appear once between all of the department sheets (this excludes the EmployeeSheet).
The solution I can think of, but can't figure out how to implement, is to make a multidimensional array (I learned a little about them in school, but only vaguely remember how to use them).
Pseudocode something like:
arrEmployees = {"Tom Hanks", "Burt Reynolds", "Your Mom"}
arrFinance = {"Tom Hanks"}
arrIT = {"Burt Reynolds"}
arrSales = {"Your Mom"}
arrSheets = {arrEmployees, arrFinance, arrIT, arrSales}
So far I've been able to get single cell values and ranges as strings by using
Sheets shts = app.Worksheets;
Worksheet ws = (Worksheet)shts.get_Item("EmployeeSheet");
Excel.Range empRange = (Excel.Range)ws.get_Range("B2");
string empVal = empRange.Value2.ToString();
But with that process for getting a single cell value into a string, I don't know how I would put that into an element of my array, let alone a range of values.
I'm sure my method is not the most efficient, and it might not even be possible, but that's why I'm here for help, so any tips are appreciated.
EDIT: This is the solution that ended up working for me, thanks to Ian Edwards's answer.
Dictionary<string, List<Point>> fields = new Dictionary<string, List<Point>>();
fields["Finance"] = new List<Point>() { new Point(2,20)};
fields["Sales"] = new List<Point>();
for (int row = 5; row <= 185; row += 20) {fields["Sales"].Add(new Point(2,row));}
List<string> names = new List<string>();
List<string> duplicates = new List<string>();
foreach (KeyValuePair<string, List<Point>> kp in fields)
{
Excel.Worksheet xlSheet = (Excel.Worksheet)workbook.Worksheets[kp.Key];
foreach (Point p in kp.Value)
{
if ((xlSheet.Cells[p.Y, p.X] as Excel.Range).Value != null)
{
string cellVal = ((xlSheet.Cells[p.Y, p.X] as Excel.Range).Value).ToString();
if (!names.Contains(cellVal))
{ names.Add(cellVal); }
else { duplicates.Add(cellVal); } } } }

Here's a little example I knocked together - the comments should explain what's going on line by line.
You can declare the name of the worksheets you want to check for names, as well as where to start looking for names in the 'worksheets' dictionary.
I assume you don't know how many names are in each list - it will keep going down each list until it encounters a blank cell.
// Load the Excel app
Microsoft.Office.Interop.Excel.Application xlApp = new Microsoft.Office.Interop.Excel.Application();
// Open the workbook
var xlWorkbook = xlApp.Workbooks.Open("XLTEST.xlsx");
// Declare the sheets and locations to look for names
Dictionary<string, Tuple<int, int>> worksheets = new Dictionary<string, Tuple<int, int>>()
{
// Declare the names of the sheets to look in and the 1-based row, column index of where to start looking for names on each sheet (i.e. 1,1 = A1)
{ "Sheet1", new Tuple<int, int>(1, 1) },
{ "Sheet2", new Tuple<int, int>(2, 3) },
{ "Sheet3", new Tuple<int, int>(4, 5) },
{ "Sheet4", new Tuple<int, int>(2, 3) },
};
// List to keep track of all names in all sheets
List<string> names = new List<string>();
// Iterate over every sheet we need to look at
foreach(var worksheet in worksheets)
{
string workSheetName = worksheet.Key;
// Get this excel worksheet object
var xlWorksheet = (Microsoft.Office.Interop.Excel.Worksheet)xlWorkbook.Worksheets[workSheetName];
// Get the 1-based row, column cell index
int row = worksheet.Value.Item1;
int column = worksheet.Value.Item2;
// Get the string contained in this cell
string name = (string)(xlWorksheet.Cells[row, column] as Microsoft.Office.Interop.Excel.Range).Value;
// name is null when the cell is empty - stop looking in this sheet and move on to the next one
while(name != null)
{
// Add the current name to the list
names.Add(name);
// Get the next name in the cell below this one
name = (string)(xlWorksheet.Cells[++row, column] as Microsoft.Office.Interop.Excel.Range).Value;
}
}
// Compare the number of names to the number of unique names
if (names.Count() != names.Distinct().Count())
{
// You have duplicate names!
}
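The Count/Distinct check above tells you that duplicates exist, but not which names they are. One way to get the actual names is a HashSet, whose Add method returns false when the value is already present. This is a sketch, not part of the answer's code: `DuplicateFinder` is a made-up name, and trimming plus case-insensitive comparison are assumptions about how the names should be matched.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateFinder
{
    // Returns each name that occurs more than once, in the order the first repeat is seen.
    // Assumption: names match ignoring case and surrounding whitespace.
    public static List<string> FindDuplicates(IEnumerable<string> names)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var reported = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var duplicates = new List<string>();
        foreach (string name in names)
        {
            string key = name.Trim();
            // seen.Add fails for a repeat; reported.Add keeps each duplicate listed once.
            if (!seen.Add(key) && reported.Add(key))
            {
                duplicates.Add(key);
            }
        }
        return duplicates;
    }
}
```

With the `names` list built by the loop above, `DuplicateFinder.FindDuplicates(names)` returns the offending names. It also avoids the repeated linear List.Contains scans, which matters once the lists get long.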

You can use .Range to define multiple cells (ie, .Range["A1", "F500"])
https://msdn.microsoft.com/en-us/library/microsoft.office.tools.excel.worksheet.range.aspx
You can then use .get_Value to get the contents/values of all cells in that Range. According to dotnetperls.com, get_Value() is much faster than get_Range() (see the 'Performance' section). Combining a few multi-cell ranges with get_Value will definitely perform better than lots of single-cell calls using get_Range.
https://msdn.microsoft.com/en-us/library/microsoft.office.tools.excel.namedrange.get_value(v=vs.120).aspx
I store them in an object array:
(object[,])yourexcelRange.get_Value(Excel.XlRangeValueDataType.xlRangeValueDefault);
From there you can write your own comparison method to compare multiple arrays. One quirk is that doing this returns a 1-indexed array, instead of a standard 0-based index.

List<custom> to Excel c#

can anyone help me?
I have a structure
public struct Data
{
public string aaaAAA;
public string bbbBBB;
public string cccCCC;
...
...
}
then some code to bring the data into a List, creating a new list, etc.
I then want to transfer this to Excel, which I have done like this:
for (int r = 0; r < newlist.Count; r++)
{
ws.Cells[row,1] = newlist[r].aaaAAA;
ws.Cells[row,2] = newlist[r].bbbBBB;
ws.Cells[row,3] = newlist[r].cccCCC;
}
This works, but it is painfully slow. I am inputting over 12,000 rows and my structure has 85 elements (so each row has 85 columns of data).
Can anyone help make this quicker??
Thanks,
Timujin

If, as @juharr mentioned, you are able to use OpenXML, look at the ClosedXML library for creating Excel documents, found here.
Using your example above you could then use the following code:
var wb = new XLWorkbook();
var ws = wb.Worksheets.Add("Data_Test_Worksheet");
ws.Cell(1, 1).InsertData(newList);
wb.SaveAs(@"c:\temp\Data_Test.xlsx");
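If you require a header row, you have to add it manually, using something like the code below (the data rows above would then be inserted starting from row 2). Note that GetProperties() only returns properties; for a type that exposes public fields, like the struct in the question, GetFields() is the call that returns the members.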
PropertyInfo[] properties = newList.First().GetType().GetProperties();
List<string> headerNames = properties.Select(prop => prop.Name).ToList();
for (int i = 0; i < headerNames.Count; i++)
{
ws.Cell(1, i + 1).Value = headerNames[i];
}
On the performance requirement, this seems to be more performant than iterating through the array. I have done some basic testing on my side and to insert 20 000 rows for sample object containing 2 properties, it took a total of 1 second.
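If you have to stay with interop, another commonly used option is to build the whole block of values in memory and assign it to a range in a single call, instead of setting 12,000 x 85 cells one by one. The sketch below only covers building the array. `GridBuilder`, `ToGrid` and `SampleRow` are made-up names, and the interop lines in the trailing comment are an assumption about the call site. It reads public fields via reflection because the struct in the question uses fields; reflection does not formally guarantee the field order, so check it if the column order matters.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for the question's Data struct.
public struct SampleRow
{
    public string First;
    public string Second;
}

public static class GridBuilder
{
    // Builds an object[rows, columns] grid from the public instance fields of T,
    // one row per item.
    public static object[,] ToGrid<T>(IList<T> items)
    {
        FieldInfo[] fields = typeof(T).GetFields(BindingFlags.Public | BindingFlags.Instance);
        var grid = new object[items.Count, fields.Length];
        for (int r = 0; r < items.Count; r++)
        {
            object boxed = items[r];   // box once per row so GetValue can read a struct
            for (int c = 0; c < fields.Length; c++)
            {
                grid[r, c] = fields[c].GetValue(boxed);
            }
        }
        return grid;
    }
}

// Hypothetical interop call site (skip it when the list is empty):
//   object[,] grid = GridBuilder.ToGrid(newlist);
//   Excel.Range target = ws.Range[ws.Cells[1, 1], ws.Cells[grid.GetLength(0), grid.GetLength(1)]];
//   target.Value2 = grid;   // one COM call for the whole block
```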

Export a large data query (60k+ rows) to Excel

I created a reporting tool as part of an internal web application. The report displays all results in a GridView, and I used JavaScript to read the contents of the GridView row-by-row into an Excel object. The JavaScript goes on to create a PivotTable on a different worksheet.
Unfortunately I didn't expect that the size of the GridView would cause overloading problems with the browser if more than a few days are returned. The application has a few thousand records per day, let's say 60k per month, and ideally I'd like to be able to return all results for up to a year. The number of rows is causing the browser to hang or crash.
We're using ASP.NET 3.5 on Visual Studio 2010 with SQL Server and the expected browser is IE8. The report consists of a gridview that gets data from one out of a handful of stored procedures depending on which population the user chooses. The gridview is in an UpdatePanel:
<asp:UpdatePanel ID="update_ResultSet" runat="server">
<Triggers>
<asp:AsyncPostBackTrigger ControlID="btn_Submit" />
</Triggers>
<ContentTemplate>
<asp:Panel ID="pnl_ResultSet" runat="server" Visible="False">
<div runat="server" id="div_ResultSummary">
<p>This Summary Section is Automatically Completed from Code-Behind</p>
</div>
<asp:GridView ID="gv_Results" runat="server"
HeaderStyle-BackColor="LightSkyBlue"
AlternatingRowStyle-BackColor="LightCyan"
Width="100%">
</asp:GridView>
</div>
</asp:Panel>
</ContentTemplate>
</asp:UpdatePanel>
I was relatively new to my team, so I followed their typical practice of returning the sproc to a DataTable and using that as the DataSource in the code behind:
List<USP_Report_AreaResult> areaResults = new List<USP_Report_AreaResult>();
areaResults = db.USP_Report_Area(ddl_Line.Text, ddl_Unit.Text, ddl_Status.Text, ddl_Type.Text, ddl_Subject.Text, minDate, maxDate).ToList();
dtResults = Common.LINQToDataTable(areaResults);
if (dtResults.Rows.Count > 0)
{
PopulateSummary(ref dtResults);
gv_Results.DataSource = dtResults;
gv_Results.DataBind();
(I know what you're thinking! But yes, I have learned much more about parameterization since then.)
The LINQToDataTable function isn't anything special, just converts a list to a datatable.
With a few thousand records (up to a few days), this works fine. The GridView displays the results, and there's a button for the user to click which launches the JScript exporter. The external JavaScript function reads each row into an Excel sheet, and then uses that to create a PivotTable. The PivotTable is important!
function exportToExcel(sMyGridViewName, sTitleOfReport, sHiddenCols) {
//sMyGridViewName = the name of the grid view, supplied as a text
//sTitleOfReport = Will be used as the page header if the spreadsheet is printed
//sHiddenCols = The columns you want hidden when sent to Excel, separated by semicolon (i.e. 1;3;5).
// Supply an empty string if all columns are visible.
var oMyGridView = document.getElementById(sMyGridViewName);
//If no data is on the GridView, display alert.
if (oMyGridView == null)
alert('No data for report');
else {
var oHid = sHiddenCols.split(";"); //Contains an array of columns to hide, based on the sHiddenCols function parameter
var oExcel = new ActiveXObject("Excel.Application");
var oBook = oExcel.Workbooks.Add;
var oSheet = oBook.Worksheets(1);
var iRow = 0;
for (var y = 0; y < oMyGridView.rows.length; y++)
//Export all non-hidden rows of the HTML table to excel.
{
if (oMyGridView.rows[y].style.display == '') {
var iCol = 0;
for (var x = 0; x < oMyGridView.rows(y).cells.length; x++) {
var bHid = false;
for (iHidCol = 0; iHidCol < oHid.length; iHidCol++) {
if (oHid[iHidCol].length !=0 && oHid[iHidCol] == x) {
bHid = true;
break;
}
}
if (!bHid) {
oSheet.Cells(iRow + 1, iCol + 1) = oMyGridView.rows(y).cells(x).innerText;
iCol++;
}
}
iRow++;
}
}
What I'm trying to do: Create a solution (probably client-side) that can handle this data and process it into Excel. Someone might suggest using the HtmlTextWriter, but afaik that doesn't allow for automatically generating a PivotTable and creates an obnoxious pop-up warning....
What I've tried:
Populating a JSON object -- I still think this has potential but I haven't found a way of making it work.
Using a SQLDataSource -- I can't seem to use it to get any data back out.
Paginating and looping through the pages -- Mixed progress. Generally ugly though, and I still have the problem that the entire dataset is queried and returned for each page displayed.
Update:
I'm still very open to alternate solutions, but I've been pursuing the JSON theory. I have a working server-side method that generates the JSON object from a DataTable. I can't figure out how to pass that JSON into the (external) exportToExcel JavaScript function....
protected static string ConstructReportJSON(ref DataTable dtResults)
{
StringBuilder sb = new StringBuilder();
sb.Append("var sJSON = [");
for (int r = 0; r < dtResults.Rows.Count; r++)
{
sb.Append("{");
for (int c = 0; c < dtResults.Columns.Count; c++)
{
sb.AppendFormat("\"{0}\":\"{1}\",", dtResults.Columns[c].ColumnName, dtResults.Rows[r][c].ToString());
}
sb.Remove(sb.Length - 1, 1); //Truncate the trailing comma
sb.Append("},");
}
sb.Remove(sb.Length - 1, 1);
sb.Append("];");
return sb.ToString();
}
Can anybody show an example of how to carry this JSON object into an external JS function? Or any other solution for the export to Excel.

It's easy and efficient to write CSV files. However, if you need Excel, it can also be done in a reasonably efficient way that can handle 60,000+ rows, by using the Microsoft Open XML SDK's OpenXmlWriter.
Install the Microsoft Open XML SDK if you don't have it already (google "download microsoft open xml sdk")
Create a Console App
Add Reference to DocumentFormat.OpenXml
Add Reference to WindowsBase
Try running some test code like below (will need a few using's)
Just Check out Vincent Tan's solution at http://polymathprogrammer.com/2012/08/06/how-to-properly-use-openxmlwriter-to-write-large-excel-files/ ( Below, I cleaned up his example slightly to help new users. )
In my own use I found this pretty straight forward with regular data, but I did have to strip out "\0" characters from my real data.
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
...
using (var workbook = SpreadsheetDocument.Create("SomeLargeFile.xlsx", SpreadsheetDocumentType.Workbook))
{
List<OpenXmlAttribute> attributeList;
OpenXmlWriter writer;
workbook.AddWorkbookPart();
WorksheetPart workSheetPart = workbook.WorkbookPart.AddNewPart<WorksheetPart>();
writer = OpenXmlWriter.Create(workSheetPart);
writer.WriteStartElement(new Worksheet());
writer.WriteStartElement(new SheetData());
for (int i = 1; i <= 50000; ++i)
{
attributeList = new List<OpenXmlAttribute>();
// this is the row index
attributeList.Add(new OpenXmlAttribute("r", null, i.ToString()));
writer.WriteStartElement(new Row(), attributeList);
for (int j = 1; j <= 100; ++j)
{
attributeList = new List<OpenXmlAttribute>();
// this is the data type ("t"), with CellValues.String ("str")
attributeList.Add(new OpenXmlAttribute("t", null, "str"));
// it's suggested you also have the cell reference, but
// you'll have to calculate the correct cell reference yourself.
// Here's an example:
//attributeList.Add(new OpenXmlAttribute("r", null, "A1"));
writer.WriteStartElement(new Cell(), attributeList);
writer.WriteElement(new CellValue(string.Format("R{0}C{1}", i, j)));
// this is for Cell
writer.WriteEndElement();
}
// this is for Row
writer.WriteEndElement();
}
// this is for SheetData
writer.WriteEndElement();
// this is for Worksheet
writer.WriteEndElement();
writer.Close();
writer = OpenXmlWriter.Create(workbook.WorkbookPart);
writer.WriteStartElement(new Workbook());
writer.WriteStartElement(new Sheets());
// you can use object initialisers like this only when the properties
// are actual properties. SDK classes sometimes have property-like properties
// but are actually classes. For example, the Cell class has the CellValue
// "property" but is actually a child class internally.
// If the properties correspond to actual XML attributes, then you're fine.
writer.WriteElement(new Sheet()
{
Name = "Sheet1",
SheetId = 1,
Id = workbook.WorkbookPart.GetIdOfPart(workSheetPart)
});
writer.WriteEndElement(); // Write end for Sheets Element
writer.WriteEndElement(); // Write end for WorkBook Element
writer.Close();
workbook.Close();
}
If you review that code you'll notice two major writes, first the Sheet, and then later the workbook that contains the sheet. The workbook part is the boring part at the end, the earlier sheet part contains all the rows and columns.
In your own adaptation, you could write real string values into the cells from your own data. Instead, above, we're just using the row and column numbering.
writer.WriteElement(new CellValue("SomeValue"));
Worth noting, the row numbering in Excel starts at 1 and not 0. Starting rows numbered from an index of zero will lead to "Corrupt file" error messages.
Lastly, if you're working with very large sets of data, never call ToList(). Use a data reader style methodology of streaming the data. For example, you could have an IQueryable and iterate it with a foreach. You never really want to rely on having all the data in memory at the same time, or you'll hit an out of memory limitation and/or high memory utilization.
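The code comments above mention that you have to calculate the cell reference ("A1", "B7", ...) yourself if you want to write the "r" attribute on each cell. A minimal sketch of that calculation, with 1-based row and column numbers like the loop counters i and j above (`CellRef` is a made-up helper name, not part of the SDK):

```csharp
using System;

public static class CellRef
{
    // Converts a 1-based column number to Excel letters: 1 -> A, 26 -> Z, 27 -> AA.
    public static string ColumnName(int column)
    {
        if (column < 1) throw new ArgumentOutOfRangeException("column");
        string name = "";
        while (column > 0)
        {
            int remainder = (column - 1) % 26;   // letters form a base-26 system without a zero digit
            name = (char)('A' + remainder) + name;
            column = (column - 1) / 26;
        }
        return name;
    }

    // Builds a reference such as "C12" from a 1-based row and column.
    public static string For(int row, int column)
    {
        return ColumnName(column) + row;
    }
}
```

Inside the inner loop this would replace the commented-out line with `attributeList.Add(new OpenXmlAttribute("r", null, CellRef.For(i, j)));`.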

I would try to use displaytag to display the results. You could set it up to display a certain number of rows per page, which should solve your overloading issue. Then you can set displaytag to allow for an Excel export.

We typically handle this with an "Export" command button which is wired up to a server side method to grab the dataset and convert it to CSV. Then we adjust the response headers and the browser will treat it as a download. I know this is a server side solution, but you may want to consider it since you'll continue having timeout and browser issues until you implement server side record paging.
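The conversion itself can be small. Here is a rough sketch of the CSV part. `CsvExport` is a made-up name, the quoting follows the usual RFC 4180 convention, and the response lines in the trailing comment are an assumption about how it would be wired into a Web Forms button handler. It writes row by row to a TextWriter so the full result set never has to be built as one string.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvExport
{
    // Quotes a field when it contains a comma, a quote or a line break,
    // and doubles any embedded quotes.
    public static string Escape(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Writes one CSV line per row, streaming to the given writer.
    public static void Write(TextWriter output, IEnumerable<IEnumerable<string>> rows)
    {
        foreach (IEnumerable<string> row in rows)
        {
            bool first = true;
            foreach (string field in row)
            {
                if (!first) output.Write(',');
                output.Write(Escape(field));
                first = false;
            }
            output.Write("\r\n");
        }
    }
}

// Hypothetical use in the Export button handler:
//   Response.ContentType = "text/csv";
//   Response.AddHeader("Content-Disposition", "attachment; filename=report.csv");
//   CsvExport.Write(Response.Output, rows);
```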

Almost a week and a half after I began working on this problem, I've finally managed to get it all working to some extent. I'll hold off on marking an answer for now to see if anybody else has a more efficient, better 'best practices' method.
By generating a JSON string, I've divorced the JavaScript from the GridView. The JSON is generated in code behind when the data is populated:
protected static string ConstructReportJSON(ref DataTable dtResults)
{
StringBuilder sb = new StringBuilder();
for (int r = 0; r < dtResults.Rows.Count; r++)
{
sb.Append("{");
for (int c = 0; c < dtResults.Columns.Count; c++)
{
sb.AppendFormat("\"{0}\":\"{1}\",", dtResults.Columns[c].ColumnName, dtResults.Rows[r][c].ToString());
}
sb.Remove(sb.Length - 1, 1); //Truncate the trailing comma
sb.Append("},");
}
sb.Remove(sb.Length - 1, 1);
return String.Format("[{0}]", sb.ToString());
}
Returns a string of data such as
[ {"Caller":"John Doe", "Office":"5555","Type":"Incoming", etc},
{"Caller":"Jane Doe", "Office":"7777", "Type":"Outgoing", etc}, {etc} ]
I've hidden this string by assigning the text to a Literal in the UpdatePanel using:
<div id="div_JSON" style="display: none;">
<asp:Literal id="lit_JSON" runat="server" />
</div>
And the JavaScript parses that output by reading the contents of the div:
function exportToExcel_Pivot(sMyJSON, sTitleOfReport, sReportPop) {
//sMyJSON = the name, supplied as a text, of the hidden element that houses the JSON array.
//sTitleOfReport = Will be used as the page header if the spreadsheet is printed.
//sReportPop = Determines which business logic to create a pivot table for.
var sJSON = document.getElementById(sMyJSON).innerHTML;
var oJSON = eval("(" + sJSON + ")");
// DEBUG Example Test Code
// for (x = 0; x < oJSON.length; x++) {
// for (y in oJSON[x])
// alert(oJSON[x][y]); //DEBUG, returns field value
// alert(y); //DEBUG, returns column name
// }
//If no data is in the JSON object array, display alert.
if (oJSON == null)
alert('No data for report');
else {
var oExcel = new ActiveXObject("Excel.Application");
var oBook = oExcel.Workbooks.Add;
var oSheet = oBook.Worksheets(1);
var oSheet2 = oBook.Worksheets(2);
var iRow = 0;
var iCol = 0;
//Take the column names of the JSON object and prepare them in Excel
for (header in oJSON[0])
{
oSheet.Cells(iRow + 1, iCol + 1) = header;
iCol++;
}
iRow++;
//Export all rows of the JSON object to excel
for (var r = 0; r < oJSON.length; r++)
{
iCol = 0;
for (c in oJSON[r])
{
oSheet.Cells(iRow + 1, iCol + 1) = oJSON[r][c];
iCol++;
} //End column loop
iRow++;
} //End row
The string output and the JavaScript 'eval' parsing both work surprisingly fast, but looping through the JSON object is a little slower than I'd like.
I believe that this method would be limited to around 1 billion characters of data -- maybe less depending how memory testing works out. (I've calculated that I'll probably be looking at a maximum of 1 million characters per day, so that should be fine, within one year of reporting.)
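Two caveats about ConstructReportJSON as written: the values are not escaped, so a quote or backslash in the data produces invalid JSON, and with an empty DataTable the final sb.Remove(sb.Length - 1, 1) throws because the builder is empty. A possible escaping helper is sketched below. `JsonText` is a made-up name, and escaping <, > and & as \u sequences is an assumption made because the string is round-tripped through innerHTML, which would otherwise HTML-encode those characters.

```csharp
using System;
using System.Text;

public static class JsonText
{
    // Returns the value as a quoted JSON string literal.
    public static string Quote(string value)
    {
        var sb = new StringBuilder("\"");
        foreach (char ch in value ?? "")
        {
            switch (ch)
            {
                case '"': sb.Append("\\\""); break;
                case '\\': sb.Append("\\\\"); break;
                case '\n': sb.Append("\\n"); break;
                case '\r': sb.Append("\\r"); break;
                case '\t': sb.Append("\\t"); break;
                default:
                    // Control characters must be escaped in JSON; <, > and & are
                    // escaped too so the text survives being embedded in HTML.
                    if (ch < ' ' || ch == '<' || ch == '>' || ch == '&')
                        sb.AppendFormat("\\u{0:x4}", (int)ch);
                    else
                        sb.Append(ch);
                    break;
            }
        }
        return sb.Append('"').ToString();
    }
}
```

In the loop above, the AppendFormat call would then become `sb.AppendFormat("{0}:{1},", JsonText.Quote(dtResults.Columns[c].ColumnName), JsonText.Quote(dtResults.Rows[r][c].ToString()));`, and the trailing-comma removal would need a guard for the zero-row case.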
