Reading from Excel File using ClosedXML - c#

I am trying to read from an Excel file whose data is not one single table; only some sections of the sheet are tabular.
I need to loop through rows 3 to 20, which are tabular, and read the data.
Here is part of my code:
string fileName = "C:\\Folder1\\Prev.xlsx";
var workbook = new XLWorkbook(fileName);
var ws1 = workbook.Worksheet(1);
How do I loop through rows 3 to 20 and read columns 3,4, 6, 7, 8?
Also, if a row is empty, how can I detect that and skip it without checking every column in that row for a value?

To access a row:
var row = ws1.Row(3);
To check if the row is empty:
bool empty = row.IsEmpty();
To access a cell (column) in a row:
var cell = row.Cell(3);
To get the value from a cell:
object value = cell.Value;
// or
string value = cell.GetValue<string>();
For more information see the documentation.
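Putting the calls above together, a minimal sketch of the loop the question asks for might look like the following. The class name SectionReader and the method ReadRows are made up for illustration; only Row, IsEmpty, Cell and GetValue<string> come from ClosedXML, and the row and column numbers are the ones given in the question.

```csharp
using System.Collections.Generic;
using ClosedXML.Excel;

// Hypothetical helper, not part of ClosedXML.
public static class SectionReader
{
    // Columns to read, as listed in the question.
    private static readonly int[] Columns = { 3, 4, 6, 7, 8 };

    public static List<string[]> ReadRows(IXLWorksheet ws, int firstRow = 3, int lastRow = 20)
    {
        var result = new List<string[]>();
        for (int r = firstRow; r <= lastRow; r++)
        {
            var row = ws.Row(r);
            if (row.IsEmpty())
            {
                continue; // skip rows that contain no values at all
            }

            var values = new string[Columns.Length];
            for (int i = 0; i < Columns.Length; i++)
            {
                values[i] = row.Cell(Columns[i]).GetValue<string>();
            }
            result.Add(values);
        }
        return result;
    }
}
```

With the question's variables this would be called as SectionReader.ReadRows(ws1). Each returned array holds the values of columns 3, 4, 6, 7 and 8 of one non-empty row, in that order.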

Here's my jam.
var rows = worksheet.RangeUsed().RowsUsed().Skip(1); // Skip header row
foreach (var row in rows)
{
    var rowNumber = row.RowNumber();
    // Process the row
}
If you just use .RowsUsed(), your range will contain a huge number of columns. Way more than are actually filled in!
So use .RangeUsed() first to limit the range. This will help you process the file faster!
You can also use .Skip(1) to skip over the column header row (if you have one).

I'm not sure this will solve the OP's problem, but I prefer the RowsUsed method. It returns only the rows that are non-empty or have been edited by the user, so I don't need an emptiness check while processing each row.
The snippet below processes rows 3 to 20 of the worksheet, skipping any that are empty, because the empty rows are filtered out before the foreach loop starts. Bear in mind that this filtering changes how many rows the loop visits, so be careful with any logic inside the loop that depends on the count of rows processed.
string fileName = "C:\\Folder1\\Prev.xlsx";
using (var excelWorkbook = new XLWorkbook(fileName))
{
    var nonEmptyDataRows = excelWorkbook.Worksheet(1).RowsUsed();
    foreach (var dataRow in nonEmptyDataRows)
    {
        // for row number check
        if (dataRow.RowNumber() >= 3 && dataRow.RowNumber() <= 20)
        {
            // to get column # 3's data
            var cell = dataRow.Cell(3).Value;
        }
    }
}
The RowsUsed method is helpful for many common tasks that involve processing the rows of an Excel sheet.

It works easily
XLWorkbook workbook = new XLWorkbook(FilePath);
var rowCount = workbook.Worksheet(1).LastRowUsed().RowNumber();
var columnCount = workbook.Worksheet(1).LastColumnUsed().ColumnNumber();
int column = 1;
int row = 1;
List<string> ll = new List<string>();
while (row <= rowCount)
{
    while (column <= columnCount)
    {
        string title = workbook.Worksheets.Worksheet(1).Cell(row, column).GetString();
        ll.Add(title);
        column++;
    }
    row++;
    column = 1;
}

Related

How to save reordered DataGridView columns in the updated order?

I have a datagridview in a winform that I read data into. I name each column after the count in the loop I use. Part of the function reading the data is below. The file I read from is a csv created from excel.
while (!parser.EndOfData)
{
    string[] fields = parser.ReadFields(); //read in the next row of data
    dgv_data.Rows.Add(); // add new row
    rowCount++;
    //put row number inside left margin
    dgv_data.Rows[rowCount - 1].HeaderCell.Value = rowCount.ToString();
    for (int i = 0; i < col; i++)
    {
        dgv_data.Rows[rowCount - 1].Cells[i].Value = fields[i]; //put the data into the cell
        //If the cell is true or a number greater than 0 then we colour it green
        if (fields[i].ToLower() == "true") dgv_data.Rows[rowCount - 1].Cells[i].Style.BackColor = Color.SpringGreen;
        if (int.TryParse(fields[i], out num))
        {
            if (int.Parse(fields[i]) > 0) dgv_data.Rows[rowCount - 1].Cells[i].Style.BackColor = Color.SpringGreen;
        }
        dgv_data.Rows[rowCount - 1].Cells[i].Tag = (rowCount - 1).ToString() + ":" + i.ToString(); //Unique cell tag
    }
}
I need to reorder the columns because I have to save them in a different order, but I also need to switch them back to the original order, so I flip-flop between the two orders. I do this with a simple function; I only show a few of the columns here, as there are 30 in total. This works well, even if it is a bit inefficient.
private void btn_reorder_Click(object sender, EventArgs e)
{
    if (flag)
    {
        flag = false;
        dgv_data.Columns[22].DisplayIndex = 0;
        dgv_data.Columns[20].DisplayIndex = 1;
        dgv_data.Columns[12].DisplayIndex = 2;
    }
    else
    {
        flag = true;
        dgv_data.Columns[0].DisplayIndex = 0;
        dgv_data.Columns[1].DisplayIndex = 1;
        dgv_data.Columns[2].DisplayIndex = 2;
    }
    dgv_data.Refresh();
}
The issue comes when I need to save the data to a csv file: the columns are not saved in the new order. Before I save, I need to manipulate a few columns, e.g. change seconds to milliseconds. With the following method I can do that, but the saved file always has the original layout.
var sb = new StringBuilder();
foreach (DataGridViewRow row in dgv_data.Rows)
{
    row.Cells[1].Value = (int.Parse(row.Cells[1].Value.ToString()) * 1000).ToString();
    var cells = row.Cells.Cast<DataGridViewCell>();
    sb.AppendLine(string.Join(",", cells.Select(cell => "\"" + cell.Value + "\"").ToArray()));
}
File.WriteAllText(saveFileDialog1.FileName, sb.ToString());
I found a different method on the internet. It does save the new layout, but I cannot manipulate the cells before I save them.
dgv_data.ClipboardCopyMode = DataGridViewClipboardCopyMode.EnableWithoutHeaderText;
// Select all the cells
dgv_data.SelectAll();
// Copy selected cells to DataObject
DataObject dataObject = dgv_data.GetClipboardContent();
// Get the text of the DataObject, and serialize it to a file
File.WriteAllText(saveFileDialog1.FileName,
dataObject.GetText(TextDataFormat.CommaSeparatedValue));
How can I make sure that, after I reorder the columns, they are saved in the same order as they are shown in the DataGridView, while I can still flip-flop between the two column orders?
A DataGridView column can be addressed in several ways; two of them are its name and its index in the DataGridView's column collection.
The user chooses the column name, but the index is assigned by the system when the column is created, and I cannot see a way to ever edit this number.
If you want to reorder the visual order of your columns in the GUI, you change the DisplayIndex. This does not change the Index of the column; it only changes how the DGV looks in the UI.
I created a small example which you can download from https://github.com/zizwiz/DataGridView_ReorderColumn_Example
When you save the reordered DGV (the left-hand one in the example) by copying to the clipboard, you get the view shown in the GUI, but if you save by parsing through the grid, you get the original Index order.
To get round this, if you want to reorder and then save by parsing through the DGV, you must copy the original DGV to a new DGV column by column, in the order you now want. There are many ways to do this and I show just one simple way. What you would probably want to do is put the columns in a temporary list, remove all columns, and add them again.
I cannot find a way of changing the Index property of a column after it has been created, so this copy method, although cumbersome, is what I have used.
As this is only a quick example it does not have all the bells and whistles one might want to use, it just illustrates how I got over a problem I encountered. The files you save are put in the same folder as the "exe" you run.
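A different technique from the copy approach described above, offered only as a sketch: since Index stays fixed and DisplayIndex holds the visual position, it may be enough to sort the columns by DisplayIndex when writing the file and look each cell up by the column's Index. The class GridCsvExporter and the method ToCsvInDisplayOrder are hypothetical names; the quoting style is copied from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows.Forms;

// Hypothetical helper, not part of the linked example project.
public static class GridCsvExporter
{
    public static string ToCsvInDisplayOrder(DataGridView grid)
    {
        // Index never changes; DisplayIndex reflects the order shown in the UI.
        List<int> order = grid.Columns
            .Cast<DataGridViewColumn>()
            .OrderBy(c => c.DisplayIndex)
            .Select(c => c.Index)
            .ToList();

        var sb = new StringBuilder();
        foreach (DataGridViewRow row in grid.Rows)
        {
            if (row.IsNewRow)
            {
                continue; // skip the blank placeholder row used for adding records
            }
            // Cells are still addressed by Index, just visited in display order.
            sb.AppendLine(string.Join(",", order.Select(i => "\"" + row.Cells[i].Value + "\"")));
        }
        return sb.ToString();
    }
}
```

Because cells are still addressed by their original Index, a manipulation such as the seconds-to-milliseconds change on row.Cells[1] can run before the export, and the reorder button can keep flipping DisplayIndex without affecting it.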

How to always skip specified number of Rows using DocumentFormat.OpenXml

I iterate through rows using DocumentFormat.OpenXml, and sometimes I need to start from the 4th, 8th, or 11th row. I define how many rows should be skipped with "skipRows", and the "if" below lets me skip the unnecessary rows:
var rows = sheet.Descendants<Row>();
foreach (Row row in rows)
{
    if (dataRowIndex < skipRows)
    {
        dataRowIndex++;
        continue;
    }
The problem is that sometimes when row is completely empty it automatically doesn't iterate through it. Sometimes when it's empty it iterates through it. It always iterates when there is any cell written in said row. Why is that? How can I ensure that it always skips for example 6 rows no matter if there is any data in cells in those rows?
"Sometimes when it's empty it iterates through it. It always iterates when there is any cell written in said row. Why is that?"
This is due to the way the XML schema is defined. A row is completely optional in the schema; if there's no data in a row then there's no requirement to write it to the XML (although there's nothing to stop it being written either). If there is a cell in a row then the row must be written to the XML as a cell is a child of a row; without the row there would be nowhere to write the cell.
"How can I ensure that it always skips for example 6 rows no matter if there is any data in cells in those rows?"
You can use the RowIndex property of the Row to find out the actual index of the Row being read.
The following example should do what you're after:
using (SpreadsheetDocument document = SpreadsheetDocument.Open(filePath, false))
{
    WorkbookPart workbookPart = document.WorkbookPart;
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
    SharedStringTablePart stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
    var rows = sheetData.Descendants<Row>();
    foreach (Row row in rows)
    {
        // RowIndex is the 1-based row number stored in the XML, so this skips
        // the first skipRows sheet rows whether or not they were written
        if (row.RowIndex <= skipRows)
        {
            continue;
        }
        //this is just to show that it's outputting from the first non-skipped row
        Cell cell = row.GetFirstChild<Cell>();
        if (cell == null)
        {
            continue; // an empty row can be written to the XML with no cells
        }
        string contents;
        if (cell.DataType == CellValues.SharedString)
        {
            int index = int.Parse(cell.CellValue.InnerText);
            contents = stringTable.SharedStringTable.ElementAt(index).InnerText;
        }
        else
        {
            contents = cell.InnerText;
        }
        Console.WriteLine(contents);
    }
}

OutOfMemoryException while trying to read big Excel file into DataTable

I'm using an SSIS package to clean and load data from an .xlsx file into a SQL Server table.
I also have to highlight the cells containing wrong data in the .xlsx file. For this I need to get the column and row indexes from the column name and row id (which I have in my data spreadsheet). To do that, I compare each column name from my first spreadsheet (Error_Sheet) with the rows of a column that I added in a second spreadsheet, and do the same for rows; when the cell values match, I take the column and row indexes from my data spreadsheet and highlight the cell at that position. The script worked fine, but after I tried to run it from a server I got an out-of-memory exception, and now I also get it on my workstation, where it was working fine before.
I've tried reducing the range I take data from, from AC1:AC10000 to AC1:AC100. It worked only the first time after compiling, and then it kept throwing the exception again.
string strSQLErrorColumns = "Select * From [" + Error_Sheet + "AC1:AC100]";
OleDbConnection cn = new OleDbConnection(strCn);
OleDbDataAdapter objAdapterErrorColumns = new OleDbDataAdapter(strSQLErrorColumns, cn);
System.Data.DataSet dsErrorColumns = new DataSet();
objAdapterErrorColumns.Fill(dsErrorColumns, Error_Sheet);
System.Data.DataTable dtErrorColumns = dsErrorColumns.Tables[Error_Sheet];
dsErrorColumns.Dispose();
objAdapterErrorColumns.Dispose();
foreach (DataColumn ColumnData in dtDataColumns.Columns){
    ColumnDataCellsValue = dtDataColumns.Columns[iCntD].ColumnName.ToString();
    iCntE = 0;
    foreach (DataRow ColumnError in dtErrorColumns.Rows){
        ColumnErrorCellsValue = dtErrorColumns.Rows[iCntE].ItemArray[0].ToString();
        if (ColumnDataCellsValue.Equals(ColumnErrorCellsValue)){
            ColumnIndex = ColumnData.Table.Columns[ColumnDataCellsValue].Ordinal;
            iCntE = iCntE + 1;
            break;
        }
    }
    iCntD = iCntD + 1;
}
ColumnIndexHCell = ColumnIndex + 1;
RowIndexHCell = RowIndex + 2;
Range rng = xlSheets.Cells[RowIndexHCell, ColumnIndexHCell] as Excel.Range;
rng.Interior.Color = System.Drawing.ColorTranslator.ToOle(System.Drawing.Color.Yellow);
Is there another way to load the data into a DataTable and get the column and row indexes without using a lot of memory, or a way to use Excel.Range.Cell instead of a DataSet and DataTable to get the cell value, column index, and row index from the .xlsx file?
I didn't show the whole code because it's long. Please let me know if more information is needed.
When reading data from an Excel file with a huge number of rows, it is better to read the data in chunks (with OleDbDataAdapter you can use the paging overload of Fill to achieve that).
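int result = 1;
int intPagingIndex = 0;
int intPagingInterval = 1000;
while (result > 0){
    // Fill returns the number of rows added for this page; 0 means no more data
    result = daGetDataFromSheet.Fill(dsErrorColumns, intPagingIndex, intPagingInterval, Error_Sheet);
    System.Data.DataTable dtErrorColumns = dsErrorColumns.Tables[Error_Sheet];
    //Implement your logic here
    // Fill appends to the existing table, so drop the processed rows
    // before loading the next page to keep memory use flat
    dtErrorColumns.Clear();
    intPagingIndex += intPagingInterval;
}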
This prevents the OutOfMemoryException, and there is no longer any need to specify a range like AC1:AC10000.
References
Paging Through a Query Result
Fill(DataSet, Int32, Int32, String)

How to read table of MS Word in c# winforms?

I have a table with 8 columns in MS Word. 7 columns are text based and one contains an image. I want to read all the values row by row and show them in controls on a form. I have tried the following code, but it gives me an error. Also, I think this code only handles text.
w = new Word.Application();
var document = w.Documents.Open(tbWordFile.Text.Trim());
for (int iCounter = 1; iCounter <= document.Tables.Count; iCounter++)
{
    foreach (Row in document.Tables[iCounter].Rows)
    {
        foreach (Cell aCell in aRow.Cells)
        {
            currLine = aCell.Range.Text;
            //Process Line
        }
    }
}
The error occurs on the "Row" variable: "Row is inaccessible due to its protection level".
foreach (Row in document.Tables[iCounter].Rows)
You are iterating over the rows, but what is the per-item variable? You declare the type Row with no variable name. This should be something like:
foreach (Row aRow in document.Tables[iCounter].Rows)

How can I get the actual used range for modified Excel files using EPPlus?

I am reading data from Excel into a DataTable using EPPlus.
After reading an Excel sheet with 10 rows of records, I modified the sheet by removing the existing data and keeping data in only one row.
But when I read the modified file, it still reads 10 rows into the DataTable (1 with values and the remaining ones as null fields).
How can I limit this?
I am using the following code to read the Excel file.
using (var pck = new OfficeOpenXml.ExcelPackage())
{
    using (var stream = File.OpenRead(FilePath))
    {
        pck.Load(stream);
    }
    var ws = pck.Workbook.Worksheets.First();
    bool hasHeader = true; // adjust it accordingly(this is a simple approach)
    foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
    {
        DSClientTransmittal.Tables[0].Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
    }
    var startRow = hasHeader ? 2 : 1;
    for (var rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
    {
        //var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
        var wsRow = ws.Cells[rowNum, 1, rowNum, DSClientTransmittal.Tables[0].Columns.Count];
        var row = DSClientTransmittal.Tables[0].NewRow();
        foreach (var cell in wsRow)
        {
            try
            {
                object cellValue = cell.Value;
                //row[cell.Start.Column - 1] = cell.Text;
                row[cell.Start.Column - 1] = cellValue.ToString().Trim();
                //cell.Style.Numberformat.Format = "#";
                //row[cell.Start.Column - 1] = cell.Text;
            }
            catch (Exception ex) { }
        }
        DSClientTransmittal.Tables[0].Rows.Add(row);
    }
    pck.Dispose();
}
When I was using Excel Interop to read the file, I overcame the same issue with the ClearFormats() method, like this:
ws.Columns.ClearFormats();
xlColCount = ws.UsedRange.Columns.Count;
Is there an equivalent for this in EPPlus?
How can I get the actual used range for modified Excel files?
There is no built-in way of indicating that a row shouldn't be accounted for when only deleting data in some cells.
Dimension is as close as you can get, but rows are included in the Dimension if any column contains data or if any row above or below contains data.
You could however try to find out if you should skip a row in the for loop.
For example if you always delete data in the first 4 columns only, then you could try:
if (!ws.Cells[rowNum, 1, rowNum, 4].All(c => c.Value == null))
{
    //Continue adding the row to the table
}
The question doesn't state the criteria for skipping a row, but you get the idea.
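Building on the same idea, here is a rough sketch that walks upward from the bottom of Dimension until it finds a row containing a real value, which can then serve as the upper bound of the for loop instead of ws.Dimension.End.Row. The class UsedRangeHelper and the method LastRowWithData are hypothetical names, and treating whitespace-only cells as empty is an assumption that may not match your data.

```csharp
using System.Linq;
using OfficeOpenXml;

// Hypothetical helper, not part of EPPlus.
public static class UsedRangeHelper
{
    // Returns the 1-based number of the last row holding a non-blank value,
    // or 0 when the sheet has none.
    public static int LastRowWithData(ExcelWorksheet ws)
    {
        if (ws.Dimension == null)
        {
            return 0; // the sheet has no cells at all
        }

        int lastColumn = ws.Dimension.End.Column;
        for (int row = ws.Dimension.End.Row; row >= 1; row--)
        {
            // Assumption: null or whitespace-only values count as empty.
            bool hasData = ws.Cells[row, 1, row, lastColumn]
                .Any(c => c.Value != null && c.Value.ToString().Trim().Length > 0);
            if (hasData)
            {
                return row;
            }
        }
        return 0;
    }
}
```

Note that this only trims empty rows at the bottom of the sheet. Empty rows in the middle of the data still need a per-row check like the one above.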
To start with, I am not a C# programmer, but I think I have a solution that works using an Excel VBA script. You may be able to run this Excel VBA code from C#, or get some insight into how to accomplish the same thing in C#.
The problem you are having is related to the way Excel handles the working size of a worksheet. If you enter data in the 1 millionth row and then delete that cell, Excel still shows the worksheet as having 1 million rows.
I tested out this Excel VBA code and it successfully deleted all rows that were completely empty, and then reset the worksheet size.
Sub DelEmptyRowsResizeWorksheet()
    Dim i As Long, iLimit As Long
    iLimit = ActiveSheet.UsedRange.Rows.Count
    For i = iLimit To 1 Step -1
        If Application.CountA(Cells(i, 1).EntireRow) = 0 Then
            Cells(i, 1).EntireRow.Delete
        End If
    Next i
    iLimit = ActiveSheet.UsedRange.Rows.Count ' resize the worksheet based on the last row with data
End Sub
To do this manually without a script, first delete all empty rows at the bottom (or columns on the right side) of a worksheet, save it, then close and reopen the workbook. I found that this also resets the Excel workbook size.
