Reading large XLSX files - c#

I have an application that has to read an Excel file and convert it to an array. So far so good: everything works fine until I try to convert a larger file. I tried the OpenXML SDK with the SAX approach:
using (SpreadsheetDocument xlsx = SpreadsheetDocument.Open(filePath, false))
{
    WorkbookPart workbookPart = xlsx.WorkbookPart;
    List<List<string>> parsedContent = new List<List<string>>();
    foreach (WorksheetPart worksheet in workbookPart.WorksheetParts)
    {
        OpenXmlReader xlsxReader = OpenXmlReader.Create(worksheet);
        while (xlsxReader.Read())
        {
        }
    }
}
This works well for files in the 1-10 MB range. My problem is when I try to load a file larger than 10 MB: the result is an OutOfMemoryException. How do I properly read that much data? How do I do it in a memory-efficient way?
P.S. I have tried libraries like ClosedXML, EPPlus and a few others.
Any solution will be appreciated. Thank you in advance.

If you only plan to read the Excel file's content, I suggest you use the ExcelDataReader library instead, which extracts the worksheet data into a DataSet object.
IExcelDataReader reader = null;
string FilePath = "PathToExcelFile";
// Load the file into a stream; the using block closes it when done
using (FileStream stream = File.Open(FilePath, FileMode.Open, FileAccess.Read))
{
    // Check the file extension (ignoring case) to pick the reader for the Excel file type
    string extension = Path.GetExtension(FilePath);
    if (extension.Equals(".xls", StringComparison.OrdinalIgnoreCase))
        reader = ExcelReaderFactory.CreateBinaryReader(stream);
    else if (extension.Equals(".xlsx", StringComparison.OrdinalIgnoreCase))
        reader = ExcelReaderFactory.CreateOpenXmlReader(stream);
    if (reader != null)
    {
        // Fill the DataSet
        DataSet content = reader.AsDataSet();
        // Read....
    }
}

Use ExcelDataReader. It is easy to install through NuGet and should only require a few lines of code:
NuGet:
Install-Package ExcelDataReader
Usage:
using (FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
{
    using (IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream))
    {
        DataSet result = excelReader.AsDataSet();
        // Iterate over the rows of the first worksheet
        foreach (DataRow dr in result.Tables[0].Rows)
        {
            //Do stuff
        }
    }
}
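Note that AsDataSet() builds every sheet in memory, which may not help with the OutOfMemoryException from the question. A minimal sketch of reading row by row instead, using the reader's Read() and NextResult() methods, could look like the following. The class name RowByRowReader and the processRow callback are hypothetical names for illustration; on .NET Core the library's documentation asks for the code-pages encoding provider to be registered first, which is shown only as a comment here.

```csharp
using System;
using System.IO;
using ExcelDataReader; // NuGet package: ExcelDataReader

public static class RowByRowReader // hypothetical helper name
{
    // Calls processRow once per row; the caller decides what, if anything, to keep.
    public static void ReadRows(string filePath, Action<object[]> processRow)
    {
        // On .NET Core, ExcelDataReader's documentation asks for this first:
        // System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
        using (FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
        using (IExcelDataReader reader = ExcelReaderFactory.CreateReader(stream))
        {
            do
            {
                while (reader.Read())
                {
                    // Copy the current row's cell values into a fresh array.
                    object[] row = new object[reader.FieldCount];
                    for (int i = 0; i < reader.FieldCount; i++)
                    {
                        row[i] = reader.GetValue(i);
                    }
                    processRow(row);
                }
            } while (reader.NextResult()); // move on to the next sheet
        }
    }
}
```

A call such as RowByRowReader.ReadRows(filePath, row => Console.WriteLine(row.Length)) processes each row as it is read. Whether this avoids the exception for a particular file is something to measure, not a guarantee.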

Related

Using OpenXML read the existing file and append some rows with data, then return it as bytes so as to download the form

public static byte[] AddDataToPredifinedFormat(string path, string sheetName = "")
{
    byte[] fileBytes = null;
    var employee = new List<Employee>
    {
        new Employee{Name = "XYZ", Number = "12345"},
        new Employee{Name = "ABC", Number = "12345"}
    };
    try
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (SpreadsheetDocument document = SpreadsheetDocument.Open(ms, false))
            {
                WorkbookPart wbPart = document.WorkbookPart;
                IEnumerable<Sheet> sheets = wbPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
                string sheetId = sheetName != "" ? sheets.Where(q => q.Name == sheetName).First().Id.Value : sheets.First().Id.Value;
                WorksheetPart wsPart = (WorksheetPart)wbPart.GetPartById(sheetId);
                SheetData sheetData = wsPart.Worksheet.GetFirstChild<SheetData>();
                foreach (var x in employee)
                {
                    Row newRow = new Row();
                    Cell cell = new Cell();
                    cell.CellValue = new CellValue(x.Name.ToString());
                    cell.StyleIndex = 0;
                    newRow.AppendChild(cell);
                    sheetData.AppendChild(newRow);
                }
                wbPart.Workbook.Save(ms);
                fileBytes = ms.ToArray();
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    return fileBytes;
}
The code reads the existing file and appends rows, but when the file is downloaded, Excel reports that it is corrupted.
I think you need to refactor the code a little in order to return the sheet. Expecting the memory stream to write directly to an array for download won't work.
What I did in my code:
Write the data to a stream and read it back, so you need to return the updated stream.
Write the stream data to an ExcelPackage.
Something like this:
using OfficeOpenXml;
Stream stream = new MemoryStream(<process and return your stream>);
using (ExcelPackage package = new ExcelPackage(stream))
{
    //write any more extra data to sheet
    return package.GetAsByteArray();
}
The important bit to notice is package.GetAsByteArray().
Can you try moving fileBytes = ms.ToArray(); to outside of the inner using block?
I have very similar code to this for modifying WordprocessingDocuments with OpenXML but I don't use MemoryStream. I File.Copy the original to a temporary file, modify, save and then load the byte[] from the temporary file for streaming.
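One way to combine the suggestions above, as a minimal sketch and not a verified fix for the asker's project: copy the existing file into an expandable MemoryStream, open the document as editable, append the rows, and call ToArray() only after the SpreadsheetDocument has been disposed so that the package is fully written. The names TemplateFiller and AppendNames are hypothetical. The sketch assumes the first worksheet part is the target sheet and writes each name as a plain string cell.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class TemplateFiller // hypothetical name
{
    public static byte[] AppendNames(string path, IEnumerable<string> names)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            // Copy the existing file into an expandable in-memory stream.
            byte[] original = File.ReadAllBytes(path);
            ms.Write(original, 0, original.Length);

            // isEditable must be true, otherwise the changes cannot be saved.
            using (SpreadsheetDocument document = SpreadsheetDocument.Open(ms, true))
            {
                // Assumption: the first worksheet part is the one to fill.
                WorksheetPart wsPart = document.WorkbookPart.WorksheetParts.First();
                SheetData sheetData = wsPart.Worksheet.GetFirstChild<SheetData>();

                foreach (string name in names)
                {
                    Cell cell = new Cell
                    {
                        DataType = CellValues.String, // plain text, not a shared-string index
                        CellValue = new CellValue(name)
                    };
                    Row row = new Row();
                    row.AppendChild(cell);
                    sheetData.AppendChild(row);
                }

                wsPart.Worksheet.Save();
            } // disposing the document flushes the package into ms

            // Read the bytes only after the document has been closed.
            return ms.ToArray();
        }
    }
}
```

The key differences from the code in the question are that the stream actually contains the file, the document is opened for editing, and the byte array is taken after the inner using block ends.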

Get Active or Default Sheet using ExcelDataReader

How can I get content of the Active sheet or the default sheet of an .xlsx workbook? I have to read many workbooks and I do not know the name of the Active sheets.
FileStream stream = File.Open("C:\\test\\test.xlsx", FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
excelReader.IsFirstRowAsColumnNames = true;
while (excelReader.Read()) {
    Console.WriteLine(excelReader.GetString(0));
}
I know how to do it using OfficeOpenXml, but ExcelDataReader is lightweight and AsDataSet makes it very easy to turn Excel data into JSON.
using (var pck = new OfficeOpenXml.ExcelPackage()) {
    using (var stream = System.IO.File.OpenRead(path))
    {
        pck.Load(stream);
    }
    var ws = pck.Workbook.Worksheets.FirstOrDefault(f => f.View.TabSelected);
    string ActiveSheetName = ws.Name;
}
Any help here is highly appreciated!

Reading the contents of the XLS file using ExcelDataReader

I use the following code to read the contents of the xls file using ExcelDataReader.
List<string> excelFiles = GetExcelFileNamesInDirectory(Application.persistentDataPath);
for (int i = 0; i < excelFiles.Count; i++)
{
    using (var stream = File.Open(excelFiles[i], FileMode.Open, FileAccess.Read))
    {
        // Auto-detect format, supports:
        // - Binary Excel files (2.0-2003 format; *.xls)
        // - OpenXml Excel files (2007 format; *.xlsx)
        using (var reader = ExcelReaderFactory.CreateReader(stream))
        {
            // Choose one of either 1 or 2:
            fileLog.text = reader.Name;
            // 1. Use the reader methods
            do
            {
                while (reader.Read())
                {
                    // reader.GetDouble(0);
                }
            } while (reader.NextResult());
            // 2. Use the AsDataSet extension method
            var result = reader.AsDataSet();
            // The result of each spreadsheet is in result.Tables
        }
    }
}
There is one Excel file, so the code enters the for loop. Unfortunately, the code does not get past the using (var stream.. line; it stops right before the reader is created with ExcelReaderFactory. Am I missing a reference?

Take contents of an excel file (.xls or .xlsx) in to Dataset

I have an Excel file named test.xls and I want to get the contents of the Excel sheet into a DataSet. Is it possible?
I tried the following code, but it throws an exception:
string FilePath = Server.MapPath("portals\\_default") + "\\" + upprice.FileName;
upprice.PostedFile.SaveAs(FilePath);
FileStream stream = File.Open(FilePath, FileMode.Open, FileAccess.Read);
if (upprice.FileName.Contains(".xlsx"))
{
    IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
    DataSet result = excelReader.AsDataSet();
}
I'm going to assume you're using this http://exceldatareader.codeplex.com/
From your code:
if (upprice.FileName.Contains(".xlsx"))
{
    IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
    DataSet result = excelReader.AsDataSet();
}
else if (upprice.FileName.Contains(".xls"))
{
    IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
    DataSet result = excelReader.AsDataSet();
}
These tests are backwards: ".xlsx" files are zipped XML documents, while ".xls" files are the older binary format. Also consider using System.IO.Path.GetExtension() to get the file extension, since Contains(".xls") is true for both file types.
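As an illustration of the Path.GetExtension() suggestion, here is a small sketch that maps a file name to the kind of reader it needs. The enum ExcelFormat and the class ExcelFormatDetector are hypothetical names, not part of ExcelDataReader; the comparison ignores case so that ".XLSX" is treated like ".xlsx".

```csharp
using System;
using System.IO;

public enum ExcelFormat { Unknown, Binary, OpenXml } // hypothetical names

public static class ExcelFormatDetector
{
    // Maps a file name to the reader type it needs, based on its extension only.
    public static ExcelFormat FromFileName(string fileName)
    {
        string extension = Path.GetExtension(fileName);

        if (string.Equals(extension, ".xlsx", StringComparison.OrdinalIgnoreCase))
            return ExcelFormat.OpenXml;   // zipped XML: use CreateOpenXmlReader
        if (string.Equals(extension, ".xls", StringComparison.OrdinalIgnoreCase))
            return ExcelFormat.Binary;    // older binary format: use CreateBinaryReader

        return ExcelFormat.Unknown;
    }
}
```

Because the whole extension is compared, "prices.xlsx" no longer matches the ".xls" branch the way Contains(".xls") does.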

ExcelPackage: Office Open XML

I am new to Office Open XML and I was wondering what file extension the XLPackage takes.
For example, I assumed I just needed to input the file location of a CSV file I am using, but it does not work. Do I have to convert the file to .xlsx, or is there something other than the XLPackage that I should use?
The problem is that once it gets to the using statement, a new OpenDialog is initiated and I can't find my file. I am probably just missing something obvious. I get a FileFormatException saying "File Contains Corrupt data"; I assume I need to convert the file before use?
I appreciate any feedback.
Some code:
FileInfo existingFile = new FileInfo(eFilePath);
using (ExcelPackage xlPackage = new ExcelPackage(existingFile)) // I think the issue is here.
{
    ExcelWorksheet exeedSheet = xlPackage.Workbook.Worksheets[1];
    //Total rows
    for (int row = 1; row > 0; )
If you are working with a CSV, ExcelPackage is overkill for what you are doing.
CSV:
using (var Sr = new StreamReader("\\SomeCoolFile.CSV"))
{
    var text = Sr.ReadToEnd();
    Sr.Close();
    text = text.Replace("\n", string.Empty);
    var lines = text.Split('\r');
    var info = lines.Select(line => line.Split(',')).ToList();
    ......
}
ExcelPackage:
using (var fs = new FileStream("\\SomeCoolFile.xlsx", FileMode.Open))
{
    using (var package = new ExcelPackage(fs))
    {
        var workBook = package.Workbook;
        .....
    }
}
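For large CSV files, ReadToEnd() loads the whole file into one string. A minimal sketch of a lazier variant, using File.ReadLines to yield one row at a time, is shown below. The class name SimpleCsv is hypothetical, and the sketch assumes simple data with no quoted fields, embedded commas or embedded newlines.

```csharp
using System.Collections.Generic;
using System.IO;

public static class SimpleCsv // hypothetical name
{
    // Lazily yields one split row at a time instead of loading the whole file.
    // Assumption: no quoted fields, embedded commas or embedded newlines.
    public static IEnumerable<string[]> ReadRows(string path)
    {
        foreach (string line in File.ReadLines(path))
        {
            if (line.Length == 0) continue; // skip blank lines
            yield return line.Split(',');
        }
    }
}
```

A loop such as foreach (string[] row in SimpleCsv.ReadRows(path)) then handles each row as it is read; data with quoted fields would need a real CSV parser instead.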
