Using ExcelDataReader to read Excel data starting from a particular cell - c#

I am using ExcelDataReader to read data from my Excel workbook in C#.
However, the data in my Excel sheet can start at any cell, not necessarily A1.
Can anyone suggest how this can be achieved using ExcelDataReader?

If you are using ExcelDataReader 3+, you will find that the reader object has no AsDataSet() method. You also need to install the ExcelDataReader.DataSet package; then you can use AsDataSet().
There is also no IsFirstRowAsColumnNames property any more; instead, set UseHeaderRow inside ExcelDataSetConfiguration.
Example:
using (var stream = File.Open(originalFileName, FileMode.Open, FileAccess.Read))
{
IExcelDataReader reader;
// Create Reader - the old way, before 3.4
////var file = new FileInfo(originalFileName);
////if (file.Extension.Equals(".xls"))
//// reader = ExcelDataReader.ExcelReaderFactory.CreateBinaryReader(stream);
////else if (file.Extension.Equals(".xlsx"))
//// reader = ExcelDataReader.ExcelReaderFactory.CreateOpenXmlReader(stream);
////else
//// throw new Exception("Invalid FileName");
// Or in 3.4+ you only need to call this:
reader = ExcelDataReader.ExcelReaderFactory.CreateReader(stream);
//// reader.IsFirstRowAsColumnNames
var conf = new ExcelDataSetConfiguration
{
ConfigureDataTable = _ => new ExcelDataTableConfiguration
{
UseHeaderRow = true
}
};
var dataSet = reader.AsDataSet(conf);
// Now you can get data from each sheet by its index or its "name"
var dataTable = dataSet.Tables[0];
//...
}
You can find the row number and column number of a cell reference like this:
var cellStr = "AB2"; // var cellStr = "A1";
var match = Regex.Match(cellStr, @"(?<col>[A-Z]+)(?<row>\d+)");
var colStr = match.Groups["col"].Value;
// Base-26 letters to a zero-based column index: A -> 0, AB -> 27
var col = colStr.Aggregate(0, (acc, c) => acc * 26 + (c - 'A' + 1)) - 1;
// Zero-based sheet row: "2" -> 1
var row = int.Parse(match.Groups["row"].Value) - 1;
Now you can use some loops to read data starting from that cell like this:
// With UseHeaderRow = true the header row is not part of Rows, so shift up by one
var firstRow = Math.Max(row - 1, 0);
for (var i = firstRow; i < dataTable.Rows.Count; i++)
{
for (var j = col; j < dataTable.Columns.Count; j++)
{
var data = dataTable.Rows[i][j];
}
}
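As a self-contained sketch of the same idea, the helper below parses an A1-style reference into zero-based indices and copies everything from that cell onward into a new DataTable. The names CellReference, Parse and SliceFrom are made up for this illustration and are not part of ExcelDataReader; the table passed in is assumed to have been loaded without a header row, so sheet row 1 is Rows[0].

```csharp
using System;
using System.Data;
using System.Text.RegularExpressions;

public static class CellReference
{
    // Parses an A1-style reference into zero-based (row, column) indices.
    public static (int Row, int Column) Parse(string cellReference)
    {
        var match = Regex.Match(cellReference.ToUpperInvariant(), @"^(?<col>[A-Z]+)(?<row>[1-9]\d*)$");
        if (!match.Success)
            throw new FormatException($"Not a valid cell reference: '{cellReference}'");

        var column = 0;
        foreach (var letter in match.Groups["col"].Value)
            column = column * 26 + (letter - 'A' + 1); // base 26 with digits 1..26

        var row = int.Parse(match.Groups["row"].Value);
        return (row - 1, column - 1);
    }

    // Copies the block of cells starting at (firstRow, firstColumn) into a new table.
    public static DataTable SliceFrom(DataTable source, int firstRow, int firstColumn)
    {
        var result = new DataTable(source.TableName);
        for (var c = firstColumn; c < source.Columns.Count; c++)
            result.Columns.Add(source.Columns[c].ColumnName, source.Columns[c].DataType);

        for (var r = firstRow; r < source.Rows.Count; r++)
        {
            var target = result.NewRow();
            for (var c = firstColumn; c < source.Columns.Count; c++)
                target[c - firstColumn] = source.Rows[r][c];
            result.Rows.Add(target);
        }
        return result;
    }
}
```

Typical use would be something like var (r, c) = CellReference.Parse("B3"); followed by CellReference.SliceFrom(dataTable, r, c). Copying keeps the original table untouched, at the cost of holding the selected cells in memory twice.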
Update:
You can filter rows and columns of your Excel sheet at read time with this config (fromRow and fromCol are zero-based indices that you define yourself):
var i = 0;
var conf = new ExcelDataSetConfiguration
{
UseColumnDataType = true,
ConfigureDataTable = _ => new ExcelDataTableConfiguration
{
// i counts the rows passed to the filter; a row is kept once the count reaches fromRow
FilterRow = rowReader => fromRow <= i++,
FilterColumn = (rowReader, colIndex) => fromCol <= colIndex,
UseHeaderRow = true
}
};
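If the header row itself sits at the start cell, a variation on the configuration above is to use the ReadHeaderRow callback of ExcelDataTableConfiguration to advance the reader to that row. This is only a sketch: it assumes a recent 3.x release of the ExcelDataReader.DataSet package, and the class name StartCellConfig and the method StartingAt are invented for the example.

```csharp
using ExcelDataReader;

public static class StartCellConfig
{
    // fromRow and fromCol are zero-based: cell B3 is fromRow = 2, fromCol = 1.
    public static ExcelDataSetConfiguration StartingAt(int fromRow, int fromCol)
    {
        return new ExcelDataSetConfiguration
        {
            ConfigureDataTable = _ => new ExcelDataTableConfiguration
            {
                UseHeaderRow = true,
                // Called once per sheet while the reader is on the first row:
                // advance it so the row at fromRow becomes the header row.
                ReadHeaderRow = rowReader =>
                {
                    for (var skipped = 0; skipped < fromRow; skipped++)
                        rowReader.Read();
                },
                // Drop every column to the left of the start cell.
                FilterColumn = (rowReader, columnIndex) => columnIndex >= fromCol
            }
        };
    }
}
```

It would be passed to the reader as reader.AsDataSet(StartCellConfig.StartingAt(2, 1)). Because the skipping happens inside the callback, no counter is shared between sheets.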

To be more clear, I will begin at the beginning.
I will rely on the sample code found in https://github.com/ExcelDataReader/ExcelDataReader, but with some modifications to avoid inconveniences.
The following code detects the file format, either xls or xlsx.
FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader;
//1. Reading Excel file
if (Path.GetExtension(filePath).ToUpper() == ".XLS")
{
//1.1 Reading from a binary Excel file ('97-2003 format; *.xls)
excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
}
else
{
//1.2 Reading from a OpenXml Excel file (2007 format; *.xlsx)
excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
}
//2. Decide whether the first row holds column names (must be set before calling AsDataSet)
excelReader.IsFirstRowAsColumnNames = false;
//3. DataSet - The result of each spreadsheet will be created in the result.Tables
DataSet result = excelReader.AsDataSet();
Now we can access the file contents in a more convenient way. I use DataTable for this. The following is an example to access a specific cell, and print its value in the console:
DataTable dt = result.Tables[0];
Console.WriteLine(dt.Rows[rowPosition][columnPosition]);
If you do not want a separate DataTable variable, you can do the same as follows:
Console.WriteLine(result.Tables[0].Rows[rowPosition][columnPosition]);
It is important not to read beyond the limits of the table. You can check the number of rows and columns as follows:
Console.WriteLine(result.Tables[0].Rows.Count);
Console.WriteLine(result.Tables[0].Columns.Count);
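A small helper can wrap that check so that an out-of-range position returns false instead of throwing. This is a minimal sketch using only System.Data; TryGetCell is a hypothetical extension method written for this example, not something the library provides.

```csharp
using System;
using System.Data;

public static class DataTableExtensions
{
    // Returns false instead of throwing when the position lies outside the table.
    public static bool TryGetCell(this DataTable table, int rowIndex, int columnIndex, out object value)
    {
        value = null;
        if (rowIndex < 0 || rowIndex >= table.Rows.Count) return false;
        if (columnIndex < 0 || columnIndex >= table.Columns.Count) return false;

        value = table.Rows[rowIndex][columnIndex];
        return true;
    }
}
```

With it, the lookup becomes if (result.Tables[0].TryGetCell(rowPosition, columnPosition, out var cell)) Console.WriteLine(cell);. Empty cells still come back as DBNull.Value, so that case needs its own check.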
Finally, when you're done, you should close the reader and free resources:
//5. Free resources (IExcelDataReader is IDisposable)
excelReader.Close();
I hope you find it useful.
(I understand that the question is old, but I make this contribution to enhance the knowledge base, because there is little material about particular implementations of this library).

For ExcelDataReader v3.6.0 and above.
I struggled a bit to iterate over the rows, so here is a little more added to the code above. I hope it helps at least a few people.
using (var stream = System.IO.File.Open(copyPath, FileMode.Open, FileAccess.Read))
{
IExcelDataReader excelDataReader = ExcelDataReader.ExcelReaderFactory.CreateReader(stream);
var conf = new ExcelDataSetConfiguration()
{
ConfigureDataTable = a => new ExcelDataTableConfiguration
{
UseHeaderRow = true
}
};
DataSet dataSet = excelDataReader.AsDataSet(conf);
//DataTable dataTable = dataSet.Tables["Sheet1"];
DataRowCollection row = dataSet.Tables["Sheet1"].Rows;
//DataColumnCollection col = dataSet.Tables["Sheet1"].Columns;
List<object> rowDataList = null;
List<object> allRowsList = new List<object>();
foreach (DataRow item in row)
{
rowDataList = item.ItemArray.ToList(); //list of each rows
allRowsList.Add(rowDataList); //adding the above list of each row to another list
}
}

One way to do it:
FileStream stream = File.Open(@"c:\working\test.xls", FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
excelReader.IsFirstRowAsColumnNames = true;
DataSet result = excelReader.AsDataSet();
result.Tables contains the sheets, and result.Tables[0].Rows contains the rows of the first sheet.

I found this useful to read from a specific column and row:
FileStream stream = File.Open(@"C:\Users\Desktop\ExcelDataReader.xlsx", FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
// Must be set before AsDataSet() to have any effect
excelReader.IsFirstRowAsColumnNames = true;
DataSet result = excelReader.AsDataSet();
DataTable dt = result.Tables[0];
string text = dt.Rows[1][0].ToString();

Very easy with ExcelDataReader 3.1 and up:
using (var openFileDialog1 = new OpenFileDialog { Filter = "Excel Workbook|*.xls;*.xlsx;*.xlsm", ValidateNames = true })
{
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
using (var fs = File.Open(openFileDialog1.FileName, FileMode.Open, FileAccess.Read))
// CreateReader detects the format; CreateBinaryReader only handles .xls
using (var reader = ExcelReaderFactory.CreateReader(fs))
{
var dataSet = reader.AsDataSet(new ExcelDataSetConfiguration
{
ConfigureDataTable = _ => new ExcelDataTableConfiguration
{
UseHeaderRow = true // Use the first row as column names
}
});
if (dataSet.Tables.Count > 0)
{
var dtData = dataSet.Tables[0];
// Do Something
}
}
}
}

public static DataTable ConvertExcelToDataTable(string filePath, bool isXlsx = false)
{
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
//open file and returns as Stream
using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
{
using (var reader = ExcelReaderFactory.CreateReader(stream))
{
var conf = new ExcelDataSetConfiguration
{
ConfigureDataTable = _ => new ExcelDataTableConfiguration
{
UseHeaderRow = true
}
};
var dataSet = reader.AsDataSet(conf);
// Now you can get data from each sheet by its index or its "name"
var dataTable = dataSet.Tables[0];
Console.WriteLine("Total no of rows " + dataTable.Rows.Count);
Console.WriteLine("Total no of Columns " + dataTable.Columns.Count);
return dataTable;
}
}
}

You could also use the OLE DB provider through System.Data.OleDb, which I believe is more straightforward.
string ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0; data source={path of your excel file}; Extended Properties=Excel 12.0;";
OleDbConnection objConn = null;
System.Data.DataTable dt = null;
//Create connection object by using the preceding connection string.
objConn = new OleDbConnection(ConnectionString);
objConn.Open();
//Get the data table containing the schema guid.
dt = objConn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
string sql = string.Format("select * from [{0}$]", sheetName);
var adapter = new System.Data.OleDb.OleDbDataAdapter(sql, ConnectionString);
var ds = new System.Data.DataSet();
string tableName = sheetName;
adapter.Fill(ds, tableName);
System.Data.DataTable data = ds.Tables[tableName];
Once the data is in the DataTable, you can access it as you normally would with the DataTable class.

Related

C# Problem opening excel file using `OpenXml` from `byte[]`

I have a problem when saving and opening a file with OpenXml library. Here is my code:
public static void SaveExcel(List<Dictionary<string, object>> listData, List<string> entityTypes, string appName)
{
using (var ms = new MemoryStream())
using (var excel = SpreadsheetDocument.Create(ms, SpreadsheetDocumentType.Workbook))
{
var workBookPart = excel.AddWorkbookPart();
workBookPart.Workbook = new Workbook();
var workSheetPart = workBookPart.AddNewPart<WorksheetPart>();
var workSheetData = new SheetData();
workSheetPart.Worksheet = new Worksheet(workSheetData);
var sheets = workBookPart.Workbook.AppendChild(new Sheets());
var index = 1;
foreach (var entityType in entityTypes)
{
var sheet = new Sheet
{
Id = excel.WorkbookPart.GetIdOfPart(workSheetPart),
SheetId = 1U,
Name = entityType
};
sheets.AppendChild(sheet);
}
workBookPart.Workbook.Save(ms);
File.WriteAllBytes("D:/nothing123.xlsx", ms.ToArray());
}
}
I am pretty sure I did the right thing, yet I get this error when opening the file:
Excel cannot open the file 'nothing123.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file.
Any idea what's going on with my code?
I don't know if this is relevant to the tag but I made up my mind to use ClosedXml library since it's much easier to use than OpenXml. I can easily create a DataTable and create an Excel file out of the DataTable which is very convenient. Here is my quick sample code:
Sample Data Table
public DataTable getData() {
DataTable dt = new DataTable();
dt.TableName = "SheetName1";
dt.Columns.Add("FirstName");
dt.Columns.Add("LastName");
var row = dt.NewRow();
row["FirstName"] = "Alvin";
row["LastName"] = "Quezon";
dt.Rows.Add(row);
return dt;
}
Sample code to convert the DataTable to an Excel byte array
public static byte[] GetExcelBytes(DataTable dataTable)
{
using (var ms = new MemoryStream())
using (var workBook = new XLWorkbook())
{
workBook.Worksheets.Add(dataTable);
workBook.SaveAs(ms);
return ms.ToArray();
}
}
I had no issues opening the resulting file, and the code is minimal.
Hopefully this will help anyone who would like to use this in the future.

ExcelDataReader is showing that multiple sheets exist in an excel file, even though the file has only one sheet

I am using ExcelDataReader to read the first sheet of an Excel file. However, ExcelDataReader is telling me that multiple sheets exist in the file, even though I only see one sheet when I open it in Excel.
This is the first time I have encountered this. Until now, ExcelDataReader always reported exactly the number of sheets present in an Excel file, no more, no less.
This is the code I'm using to import and read the Excel file:
public static bool CheckForHeader(string filePath)
{
DataSet ds;
try
{
var extension = Path.GetExtension(filePath).ToLower();
using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
IExcelDataReader reader = null;
if (extension == ".xls")
{
reader = ExcelReaderFactory.CreateBinaryReader(stream);
}
else if (extension == ".xlsx")
{
reader = ExcelReaderFactory.CreateOpenXmlReader(stream);
}
else if (extension == ".csv")
{
reader = ExcelReaderFactory.CreateCsvReader(stream);
}
if (reader == null)
throw new Exception();
// reader.IsFirstRowAsColumnNames = firstRowNamesCheckBox.Checked;
using (reader)
{
ds = reader.AsDataSet(new ExcelDataSetConfiguration()
{
UseColumnDataType = false,
ConfigureDataTable = (tableReader) => new ExcelDataTableConfiguration()
{
UseHeaderRow = false
}
});
}
}
}
catch
{ throw new Exception("Error while loading the Excel Spreadsheet. Please make sure that the file is not being used by another program."); }
System.Data.DataTable workSheet = ds.Tables[0];
string FRowFCol;
FRowFCol = workSheet.Rows[0].ItemArray[0].ToString();
if (FRowFCol.Contains('\\'))
return false;
return true;
}
The ds should only have 1 table inside it, as the excel file only contains 1 sheet. But instead, it is showing that it contains 5 tables.

C# Import data from specified Excel sheet into SQL Server

I have some C# code to import Excel into a database, but it can only import data from the first sheet (Sheet1) into the SQL Server database. I want to import data from the second sheet (Sheet2) of Excel file into the database. Please modify the code to teach me how to do that.
private void button1_Click(object sender, EventArgs e)
{
OpenFileDialog ope = new OpenFileDialog();
ope.Filter = "Excel Files|*.xls;*.xlsx;*.xlsm";
if (ope.ShowDialog() == DialogResult.Cancel)
return;
FileStream stream = new FileStream(ope.FileName, FileMode.Open);
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
DataSet result = excelReader.AsDataSet();
DataClasses1DataContext conn = new DataClasses1DataContext();
foreach(DataTable table in result.Tables)
{
//Default start value of foreach statement is 0
//Please tell me: How to count dr from 1 instead of 0???
foreach (DataRow dr in table.Rows)
{
//if index of dr==1, then continue
if (table.Rows.IndexOf(dr) == 0)
{
continue;
}
test addtable = new test()
{
id = Convert.ToInt32(dr[0]),
name = Convert.ToString(dr[1]),
surname = Convert.ToString(dr[2]),
age = Convert.ToInt32(dr[3])
};
conn.tests.InsertOnSubmit(addtable);
}
}
conn.SubmitChanges();
excelReader.Close();
stream.Close();
MessageBox.Show("Import table completed!");
}
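Assuming you want the data from the sheet named Sheet2, this should work. AsDataSet() loads every sheet into its own DataTable, named after the sheet, so pick that table instead of looping over result.Tables:
DataSet result = excelReader.AsDataSet();
DataTable table = result.Tables["Sheet2"]; // or result.Tables[1] for the second sheet by position
if (table != null)
{
foreach (DataRow dr in table.Rows)
{
// same row handling as in the question
}
}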

How to Read the Uploaded Excel File using NPOI with out storing in the server or in the Project

My sample code using NPOI.dll is below. I can read an Excel file that is stored on disk (e.g. D:/Jamal/Test.xls); the DLL reads its content easily. But I need to read the uploaded Excel file without storing it anywhere first. The HttpPostedFileBase excelfile holds the uploaded Excel file, but I need to know how to read it using the NPOI DLLs.
public List<string> SendInvitesExcelFile1(List<String> CorrectMailIDs,
ListInvites Invites, HttpPostedFileBase excelfile)
{
List<string> mailids = new List<string>();
//string filename = (excelfile.FileName).ToString();
HSSFWorkbook hssfwb;
// using (FileStream file = new FileStream(@"D:\test.xls", FileMode.Open, FileAccess.Read))
using (FileStream file = new FileStream(excelFile.FileName, FileMode.Open, FileAccess.Read))
{
hssfwb = new HSSFWorkbook(file);
}
Sheet sheet = hssfwb.GetSheet("sheet1");
for (int row = 0; row <= sheet.LastRowNum; row++)
{
if (sheet.GetRow(row) != null) //null is when the row only contains empty cells
{
mailids.Add(sheet.GetRow(row).GetCell(0).ToString());
}
}
return mailids;
}
I came across the same problem and solved it using InputStream. I am pasting the code for your reference.
[HttpPost]
public DataTable PostValues(HttpPostedFileBase file)
{
ISheet sheet;
string filename = Path.GetFileName(Server.MapPath(file.FileName));
var fileExt = Path.GetExtension(filename);
if (fileExt == ".xls")
{
HSSFWorkbook hssfwb = new HSSFWorkbook(file.InputStream);
sheet = hssfwb.GetSheetAt(0);
}
else
{
XSSFWorkbook hssfwb = new XSSFWorkbook(file.InputStream);
sheet = hssfwb.GetSheetAt(0);
}
DataTable table = new DataTable();
IRow headerRow = sheet.GetRow(0);
int cellCount = headerRow.LastCellNum;
for (int i = headerRow.FirstCellNum; i < cellCount; i++)
{
DataColumn column = new DataColumn(headerRow.GetCell(i).StringCellValue);
table.Columns.Add(column);
}
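// Start after the header row; LastRowNum is the zero-based index of the last row, so include it
for (int i = sheet.FirstRowNum + 1; i <= sheet.LastRowNum; i++)
{
IRow row = sheet.GetRow(i);
if (row == null) continue; // GetRow returns null for rows that were never created
DataRow dataRow = table.NewRow();
for (int j = headerRow.FirstCellNum; j < cellCount; j++)
{
if (row.GetCell(j) != null)
{
// Table columns start at the first header cell, so shift the index accordingly
dataRow[j - headerRow.FirstCellNum] = row.GetCell(j).ToString();
}
}
table.Rows.Add(dataRow);
}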
return table;
}
You can use a MemoryStream as well, so you should be able to take the byte array response and open the spreadsheet from it. I'm unsure whether WorkbookFactory will detect the file type from the MemoryStream, so you may need to instruct users to upload the format that you require.
Here is how I use a MemoryStream to return a xls from a dot net core controller without ever storing it as a file.
IWorkbook wb = new HSSFWorkbook();
//Edit workbook object here
MemoryStream m = new MemoryStream();
wb.Write(m);
var byteArray = m.ToArray();
return new FileContentResult(byteArray, "application/vnd.ms-excel");

Optimal way to Read an Excel file (.xls/.xlsx)

I know that there are different ways to read an Excel file:
Interop
OleDb
Open XML SDK
Compatibility is not a concern because the program will be executed in a controlled environment.
My requirements:
Read a file into a DataTable / custom entities (I don't know how to add dynamic properties/fields to an object; the column names will vary between Excel files)
Use the DataTable / custom entities to perform some operations on the data.
Update the DataTable with the results of the operations.
Write it back to the Excel file.
Which would be simpler?
Also, if possible, advise me on custom entities (adding properties/fields to an object dynamically).
Take a look at Linq-to-Excel. It's pretty neat.
var book = new LinqToExcel.ExcelQueryFactory(@"File.xlsx");
var query =
from row in book.Worksheet("Stock Entry")
let item = new
{
Code = row["Code"].Cast<string>(),
Supplier = row["Supplier"].Cast<string>(),
Ref = row["Ref"].Cast<string>(),
}
where item.Supplier == "Walmart"
select item;
It also allows strongly-typed row access.
I realize this question was asked nearly 7 years ago but it's still a top Google search result for certain keywords regarding importing excel data with C#, so I wanted to provide an alternative based on some recent tech developments.
Importing Excel data has become such a common task to my everyday duties, that I've streamlined the process and documented the method on my blog: best way to read excel file in c#.
I use NPOI because it can read/write Excel files without Microsoft Office installed and it doesn't use COM+ or any interops. That means it can work in the cloud!
But the real magic comes from pairing up with NPOI Mapper from Donny Tian because it allows me to map the Excel columns to properties in my C# classes without writing any code. It's beautiful.
Here is the basic idea:
I create a .net class that matches/maps the Excel columns I'm interested in:
class CustomExcelFormat
{
[Column("District")]
public int District { get; set; }
[Column("DM")]
public string FullName { get; set; }
[Column("Email Address")]
public string EmailAddress { get; set; }
[Column("Username")]
public string Username { get; set; }
public string FirstName
{
get
{
return Username.Split('.')[0];
}
}
public string LastName
{
get
{
return Username.Split('.')[1];
}
}
}
Notice, it allows me to map based on column name if I want to!
Then when I process the excel file all I need to do is something like this:
public void Execute(string localPath, int sheetIndex)
{
IWorkbook workbook;
using (FileStream file = new FileStream(localPath, FileMode.Open, FileAccess.Read))
{
workbook = WorkbookFactory.Create(file);
}
var importer = new Mapper(workbook);
var items = importer.Take<CustomExcelFormat>(sheetIndex);
foreach(var item in items)
{
var row = item.Value;
if (string.IsNullOrEmpty(row.EmailAddress))
continue;
UpdateUser(row);
}
DataContext.SaveChanges();
}
Now, admittedly, my code does not modify the Excel file itself. I am instead saving the data to a database using Entity Framework (that's why you see "UpdateUser" and "SaveChanges" in my example). But there is already a good discussion on SO about how to save/modify a file using NPOI.
Using OLE Query, it's quite simple (e.g. sheetName is Sheet1):
DataTable LoadWorksheetInDataTable(string fileName, string sheetName)
{
DataTable sheetData = new DataTable();
using (OleDbConnection conn = this.returnConnection(fileName))
{
conn.Open();
// retrieve the data using data adapter
OleDbDataAdapter sheetAdapter = new OleDbDataAdapter("select * from [" + sheetName + "$]", conn);
sheetAdapter.Fill(sheetData);
conn.Close();
}
return sheetData;
}
private OleDbConnection returnConnection(string fileName)
{
return new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fileName + "; Jet OLEDB:Engine Type=5;Extended Properties=\"Excel 8.0;\"");
}
For newer Excel versions:
return new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName + ";Extended Properties=Excel 12.0;");
You can also use Excel Data Reader, an open source project on CodePlex. It works really well for reading data from Excel sheets.
The sample code given at the linked page:
FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read);
//1. Reading from a binary Excel file ('97-2003 format; *.xls)
IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
//...
//2. Reading from a OpenXml Excel file (2007 format; *.xlsx)
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
//...
//3. DataSet - The result of each spreadsheet will be created in the result.Tables
DataSet result = excelReader.AsDataSet();
//...
//4. DataSet - Create column names from first row
excelReader.IsFirstRowAsColumnNames = true;
DataSet result = excelReader.AsDataSet();
//5. Data Reader methods
while (excelReader.Read())
{
//excelReader.GetInt32(0);
}
//6. Free resources (IExcelDataReader is IDisposable)
excelReader.Close();
Reference: How do I import from Excel to a DataSet using Microsoft.Office.Interop.Excel?
Try this free library: https://freenetexcel.codeplex.com
Workbook workbook = new Workbook();
workbook.LoadFromFile(@"..\..\parts.xls",ExcelVersion.Version97to2003);
//Initialize worksheet
Worksheet sheet = workbook.Worksheets[0];
DataTable dataTable = sheet.ExportDataTable();
If you can restrict it to just *.xlsx files (Office Open XML format), then probably the most popular library would be EPPlus.
A bonus is that there are no other dependencies. Just install it using NuGet:
Install-Package EPPlus
Try the Aspose.Cells library (not free, but the trial is enough for reading); it is quite good.
Install-package Aspose.cells
Here is some sample code:
using Aspose.Cells;
using System;
namespace ExcelReader
{
class Program
{
static void Main(string[] args)
{
// Replace path for your file
readXLS(@"C:\MyExcelFile.xls"); // or "*.xlsx"
Console.ReadKey();
}
public static void readXLS(string PathToMyExcel)
{
//Open your template file.
Workbook wb = new Workbook(PathToMyExcel);
//Get the first worksheet.
Worksheet worksheet = wb.Worksheets[0];
//Get cells
Cells cells = worksheet.Cells;
// Get row and column count
int rowCount = cells.MaxDataRow;
int columnCount = cells.MaxDataColumn;
// Current cell value
string strCell = "";
Console.WriteLine(String.Format("rowCount={0}, columnCount={1}", rowCount, columnCount));
for (int row = 0; row <= rowCount; row++) // Numeration starts from 0 to MaxDataRow
{
for (int column = 0; column <= columnCount; column++) // Numeration starts from 0 to MaxDataColumn
{
strCell = "";
strCell = Convert.ToString(cells[row, column].Value);
if (String.IsNullOrEmpty(strCell))
{
continue;
}
else
{
// Do your stuff here
Console.WriteLine(strCell);
}
}
}
}
}
}
Read from Excel, modify and write back:
/// <summary>
/// Reads an Excel file and converts it into a DataSet, with each sheet as one table of the DataSet
/// </summary>
/// <param name="filename"></param>
/// <param name="headers">If set to true the first row will be considered as headers</param>
/// <returns></returns>
public DataSet Import(string filename, bool headers = true)
{
var _xl = new Excel.Application();
var wb = _xl.Workbooks.Open(filename);
var sheets = wb.Sheets;
DataSet dataSet = null;
if (sheets != null && sheets.Count != 0)
{
dataSet = new DataSet();
foreach (var item in sheets)
{
var sheet = (Excel.Worksheet)item;
DataTable dt = null;
if (sheet != null)
{
dt = new DataTable();
var ColumnCount = ((Excel.Range)sheet.UsedRange.Rows[1, Type.Missing]).Columns.Count;
var rowCount = ((Excel.Range)sheet.UsedRange.Columns[1, Type.Missing]).Rows.Count;
for (int j = 0; j < ColumnCount; j++)
{
var cell = (Excel.Range)sheet.Cells[1, j + 1];
var column = new DataColumn(headers ? cell.Value : string.Empty);
dt.Columns.Add(column);
}
for (int i = 0; i < rowCount; i++)
{
var r = dt.NewRow();
for (int j = 0; j < ColumnCount; j++)
{
var cell = (Excel.Range)sheet.Cells[i + 1 + (headers ? 1 : 0), j + 1];
r[j] = cell.Value;
}
dt.Rows.Add(r);
}
}
dataSet.Tables.Add(dt);
}
}
_xl.Quit();
return dataSet;
}
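public string Export(DataTable dt, bool headers = false)
{
// Import() keeps its Excel.Application in a local variable, so create one here as well
var _xl = new Excel.Application();
var wb = _xl.Workbooks.Add();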
var sheet = (Excel.Worksheet)wb.ActiveSheet;
//process columns
for (int i = 0; i < dt.Columns.Count; i++)
{
var col = dt.Columns[i];
//added columns to the top of sheet
var currentCell = (Excel.Range)sheet.Cells[1, i + 1];
currentCell.Value = col.ToString();
currentCell.Font.Bold = true;
//process rows
for (int j = 0; j < dt.Rows.Count; j++)
{
var row = dt.Rows[j];
//added rows to sheet
var cell = (Excel.Range)sheet.Cells[j + 1 + 1, i + 1];
cell.Value = row[i];
}
currentCell.EntireColumn.AutoFit();
}
var fileName="{somepath/somefile.xlsx}";
wb.SaveCopyAs(fileName);
_xl.Quit();
return fileName;
}
I used Office's NuGet package, DocumentFormat.OpenXml, and pieced together the code from that component's doc site.
With the helper code below, it was similar in complexity to the CSV parsing elsewhere in that project.
public static async Task ImportXLSX(Stream stream, string sheetName)
{
// This was necessary for my Blazor project, which used a BrowserFileStream object
MemoryStream ms = new MemoryStream();
await stream.CopyToAsync(ms);
using (var document = SpreadsheetDocument.Open(ms, false))
{
// Retrieve a reference to the workbook part.
WorkbookPart wbPart = document.WorkbookPart;
// Find the sheet with the supplied name, and then use that
// Sheet object to retrieve a reference to the first worksheet.
Sheet theSheet = wbPart?.Workbook.Descendants<Sheet>().Where(s => s?.Name == sheetName).FirstOrDefault();
// Throw an exception if there is no sheet.
if (theSheet == null)
{
throw new ArgumentException("sheetName");
}
WorksheetPart wsPart = (WorksheetPart)(wbPart.GetPartById(theSheet.Id));
// For shared strings, look up the value in the
// shared strings table.
var stringTable =
wbPart.GetPartsOfType<SharedStringTablePart>()
.FirstOrDefault();
// I needed to grab 4 cells from each row
// Starting at row 11, until the cell in column A is blank
int row = 11;
while (true) {
var accountNameCell = GetCell(wsPart, "A" + row.ToString());
var accountName = GetValue(accountNameCell, stringTable);
if (string.IsNullOrEmpty(accountName)) {
break;
}
var investmentNameCell = GetCell(wsPart, "B" + row.ToString());
var investmentName = GetValue(investmentNameCell, stringTable);
var symbolCell = GetCell(wsPart, "D" + row.ToString());
var symbol = GetValue(symbolCell, stringTable);
var marketValue = GetCell(wsPart, "J" + row.ToString()).InnerText;
// DO STUFF with data
row++;
}
}
}
private static string? GetValue(Cell cell, SharedStringTablePart stringTable) {
try {
return stringTable.SharedStringTable.ElementAt(int.Parse(cell.InnerText)).InnerText;
} catch (Exception) {
return null;
}
}
private static Cell GetCell(WorksheetPart wsPart, string cellReference) {
return wsPart.Worksheet.Descendants<Cell>().Where(c => c.CellReference.Value == cellReference)?.FirstOrDefault();
}
