Why does the last row never get read? - c#

Hello, I have one question: why does the last row never get read? It doesn't matter whether there is only one row in the Excel file or 100 rows; the last row never shows up in the List, and I have no clue why.
Here is my Excel File:
and this is my method:
public List<string> getListData(bool skipFirstRow, int numberOfColumns, string filepath)
{
int startpoint = 1;
int cell = 1;
int row = 1;
List<string> stringList = new List<string>();
//Open Excel (Application)
var excelApplication = openExcelApplication();
//Open Excel File
Excel.Workbook excelWorkbook = excelApplication.Workbooks.Open(filepath);
//Get the Worksheets from the file
Excel.Sheets excelSheets = excelWorkbook.Worksheets;
//Select the first Worksheet
Excel.Worksheet worksheet = (Excel.Worksheet)excelSheets.get_Item(1);
if (skipFirstRow == true)
{
startpoint = 2;
}
Excel.Range range = worksheet.get_Range("A" + Convert.ToString(startpoint), Missing.Value);
while ((range.Cells[startpoint, cell] as Excel.Range).Value2 != null)
{
for (int i = 1; i <= numberOfColumns + 1; i++)
{
string sValue = (range.Cells[row, cell] as Excel.Range).Value2.ToString();
stringList.Add(sValue);
cell++;
}
startpoint++;
cell = 1;
row++;
}
closeExcelApplication(excelApplication);
var result =
stringList
.Select((item, index) => new { Item = item, Index = index })
.GroupBy(x => x.Index / numberOfColumns)
.Select(g => string.Join(";", g.Select(x => x.Item)))
.ToList();
return result;
}
I tried it with the debugger and even Google. Then I tried the last-used-row approach, but that didn't work either:
Excel.Range last = worksheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell, Type.Missing);
Excel.Range range = worksheet.get_Range("A1", last);
int lastUsedRow = last.Row;
int lastUsedColumn = last.Column;
Any help or advice would be great. Thanks for your time and help.

Your algorithm is buggy.
Let's see what happens when skipFirstRow is true and your Excel sheet has three rows 1, 2 and 3. At the start of the while loop, we have the following situation:
startpoint = 2
row = 1
During the first iteration, your while loop reads the contents of row 1. After the iteration, we have the following situation:
startpoint = 3
row = 2
During the second iteration, your while loop reads the contents of row 2. After the iteration, we have the following situation:
startpoint = 4
row = 3
Since range.Cells[startpoint, cell] is empty, your code stops here. Rows 1 and 2 have been read, row 3 has been ignored.
As you can see, the reason for your problem is that you check the row in startpoint and read the row in row, and when those two differ, you have a problem. Suggested solution: Drop the startpoint variable and use row instead.
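A minimal sketch of the suggested fix, assuming the sheet is modelled as an in-memory array instead of a real Excel range. The ReadRows helper and the array layout are hypothetical and only illustrate the idea: the same row variable is used both to test for the end of the data and to read the cells.

```csharp
using System;
using System.Collections.Generic;

public static class RowReader
{
    // Hypothetical stand-in for the worksheet: cells[r][c] holds row r + 1, column c + 1.
    // A null first cell marks the end of the data, like an empty Value2 in Excel.
    public static List<string> ReadRows(string[][] cells, bool skipFirstRow, int numberOfColumns)
    {
        var result = new List<string>();
        // One variable both tests for the end of the data and selects the row to read.
        int row = skipFirstRow ? 1 : 0;
        while (row < cells.Length && cells[row][0] != null)
        {
            var values = new string[numberOfColumns];
            for (int col = 0; col < numberOfColumns; col++)
            {
                values[col] = cells[row][col];
            }
            result.Add(string.Join(";", values));
            row++;
        }
        return result;
    }
}
```

Because the loop condition and the read use the same index, the row that is tested is always the row that is read, so the last row cannot be skipped.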

The Excel sheet displays the first row number as 1, but internally indexing may start from 0, so it may be reading all the data. Try putting different test data in the last column.

Related

Export selected rows from datagridview to excel using spire.XLS & C#

I'm trying to export the manually selected rows of a DataGridView in a Windows Forms application to an Excel file using Spire.XLS, and to add the sum of the last column in the last row. The DataGridView data looks like this:
When I run the code after selecting the first and last rows, the Excel file should look like this:
Here is my code:
void ExportBtnXLSXClick(object sender, EventArgs e)
{
Workbook book = new Workbook();
Worksheet sheet = book.Worksheets[0];
sheet.Name = "Exported from gridview";
//Convert data from datagridview to datatable
DataTable dt=GetDgvToTable(dataGridView1);
//Export datatable to excel
sheet.InsertDataTable(dt, true, 1, 1, -1, -1);
//sheet.InsertDataTable(dt, true, 1, 1, true);
sheet.Range[1,1,sheet.LastRow,sheet.LastColumn].AutoFitColumns();
sheet.AllocatedRange.BorderAround(LineStyleType.Thin, borderColor:ExcelColors.Black);
sheet.AllocatedRange.BorderInside(LineStyleType.Thin, borderColor:ExcelColors.Black);
book.SaveToFile(@"C:\Users\Tamal\Desktop\Spire.XLS C#\Report.xlsx", ExcelVersion.Version2013);
book.Dispose();
MessageBox.Show("Export complete");
}
with the helper method
public DataTable GetDgvToTable(DataGridView dgv)
{
DataTable dt = new DataTable();
//Column
for (int count = 0; count < dgv.Columns.Count; count++)
{
DataColumn dc = new DataColumn(dgv.Columns[count].Name.ToString());
dt.Columns.Add(dc);
}
//Row
for (int count = 0; count < dgv.SelectedRows.Count; count++)
{
DataRow dr = dt.NewRow();
for (int countsub = 0; countsub < dgv.Columns.Count; countsub++)
//for (int countsub = 0; countsub < dgv.SelectedRows.Count; countsub++)
{
dr[countsub] = Convert.ToString(dgv.Rows[count].Cells[countsub].Value);
}
dt.Rows.Add(dr);
}
decimal total = dataGridView1.SelectedRows.OfType<DataGridViewRow>()
.Sum(t => Convert.ToDecimal(t.Cells[2].Value));
dt.Rows.Add(total);
return dt;
}
But it does not export the two rows that I selected, and the sum appears in the first column of the last row in Excel. How can I get the desired result?
Also, if I run the code multiple times, the data should not be overwritten; new data should be pasted after the last filled row, preferably with a blank row in between.
Here is what the DataGridView looks like:
Please help.
You have multiple questions here. For starters, if you want to add multiple selections to the same workbook, you will need to do a few things differently:
Check whether the file already exists. If it does not, create a new one; if it does, “open” it.
If the file already exists, then after opening it, check whether the worksheet "Exported from gridview" exists. If it does, find the last used row and add the additional rows after it.
Finally, save the file.
Currently, the code simply “creates” a new workbook and then overwrites the existing file if there is one. So, if you want to “add” additional selections to the “same” workbook, you will need code that checks whether the file exists and, if so, finds the next available row on the worksheet, as explained in the three steps above.
Below is a simple example that works; however, it has not been tested thoroughly, and you will probably need to adjust the code to fit your requirements.
First, in the GetDgvToTable method, it appears the code is grabbing the wrong rows when creating the table….
for (int count = 0; count < dgv.SelectedRows.Count; count++)
{
DataRow dr = dt.NewRow();
for (int countsub = 0; countsub < dgv.Columns.Count; countsub++)
//for (int countsub = 0; countsub < dgv.SelectedRows.Count; countsub++)
{
dr[countsub] = Convert.ToString(dgv.Rows[count].Cells[countsub].Value);
}
dt.Rows.Add(dr);
}
Above, the code is looping through the “number of selected” rows. The problem is on the line…
dr[countsub] = Convert.ToString(dgv.Rows[count].Cells[countsub].Value);
Specifically, at …
dgv.Rows[count].
Here the code is ignoring the “SELECTED” rows. In other words, if the “selected” rows were contiguous “selected” rows starting from the “first” row (0), then it will work. Unfortunately, the first “selected” row may not necessarily be the “first” row in the grid. The code is incorrectly making this assumption.
To fix this, I suggest you loop through the “selected” rows using a foreach loop over the grid's SelectedRows collection. This will ensure the code uses the “SELECTED” rows. The code change would look something like…
foreach (DataGridViewRow row in dgv.SelectedRows) {
DataRow dr = dt.NewRow();
for (int i = 0; i < row.Cells.Count; i++) {
dr[i] = Convert.ToString(row.Cells[i].Value); // Convert.ToString tolerates null cell values
}
dt.Rows.Add(dr);
}
Next, you state that you want additional selections to be added to the same worksheet with an empty row between the selections, as discussed above. The code below checks whether the Excel file already exists and, if it does, opens that file. It then checks whether the worksheet already exists and, if it does not, creates a new one.
Next, a check is made to see whether any “previous” selections have already been added to the worksheet. I am not that familiar with Spire; however, I noted that if you read the property…
sheet.LastRow…
It will return the last used row… EXCEPT if the worksheet is empty. When the worksheet is empty, this will return a value like 65,570. I will leave it to you to figure out a more elegant way to get the next empty row. In this case I simply checked whether sheet.LastRow + 2 was over 60000. If it is, I assume the sheet is empty and set the next available row to 1. I am confident there is a better way to do this.
Given all this, the changes below to both methods appear to do what you want by adding each additional selection to the same worksheet “below” the previous selections.
public DataTable GetDgvToTable(DataGridView dgv) {
DataTable dt = new DataTable();
//Column
for (int count = 0; count < dgv.Columns.Count; count++) {
DataColumn dc = new DataColumn(dgv.Columns[count].Name.ToString());
dt.Columns.Add(dc);
}
//Row
foreach (DataGridViewRow row in dgv.SelectedRows) {
DataRow dr = dt.NewRow();
for (int i = 0; i < row.Cells.Count; i++) {
dr[i] = Convert.ToString(row.Cells[i].Value); // Convert.ToString tolerates null cell values
}
dt.Rows.Add(dr);
}
decimal total = dgv.SelectedRows.OfType<DataGridViewRow>()
.Sum(t => Convert.ToDecimal(t.Cells[2].Value));
dt.Rows.Add("","Total",total);
return dt;
}
Then the changes to the Export method.
private void btnExportToExcel_Click(object sender, EventArgs e) {
Workbook book = new Workbook();
if (File.Exists(@"pathToFile\Report.xlsx")) {
book.LoadFromFile(@"pathToFile\Report.xlsx");
}
Worksheet sheet = book.Worksheets["Exported from gridview"];
if (sheet == null) {
sheet = book.CreateEmptySheet("Exported from gridview");
}
//Convert data from datagridview to datatable
DataTable dt = GetDgvToTable(dataGridView1);
//Export datatable to excel
int startRow = sheet.LastRow + 2;
if (startRow > 60000) {
startRow = 1;
}
sheet.InsertDataTable(dt, true, startRow, 1, -1, -1);
sheet.Range[1, 1, sheet.LastRow, sheet.LastColumn].AutoFitColumns();
sheet.AllocatedRange.BorderAround(LineStyleType.Thin, borderColor: ExcelColors.Black);
sheet.AllocatedRange.BorderInside(LineStyleType.Thin, borderColor: ExcelColors.Black);
book.SaveToFile(@"pathToFile\Report.xlsx", ExcelVersion.Version2013);
book.Dispose();
MessageBox.Show("Export complete");
}
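The empty-sheet workaround in the export method can be isolated in a small helper, which makes the magic number explicit and easy to test. This is only a sketch: NextStartRow, blankRows and emptySheetThreshold are hypothetical names, and the 60000 threshold comes from the observation in this answer, not from the Spire documentation.

```csharp
using System;

public static class ExportLayout
{
    // lastRow is the value reported by the worksheet (sheet.LastRow in the code above).
    public static int NextStartRow(int lastRow, int blankRows = 1, int emptySheetThreshold = 60000)
    {
        // Skip the requested number of blank rows after the last used row.
        int startRow = lastRow + blankRows + 1;
        // Assumption from the answer: an implausibly large value means the sheet is empty.
        return startRow > emptySheetThreshold ? 1 : startRow;
    }
}
```

With the defaults this reproduces the behaviour of the export method: a sheet whose last used row is 5 continues at row 7, and an "empty" reading starts again at row 1.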
I hope this makes sense and helps.

find the row number of a datatable column containing specific value C#

I have a DataTable in C# with a column called Point that contains various integer values. How do I find the row number of the first row that is equal to a specific value? For example, suppose I want to find the first time the number 52 appears in the Point column, and it first appears at row 10. How do I find the value 10?
Note that I want to find the row number and not the value of another column at this position, which is why this question is different from:
Find row in datatable with specific id
A for loop is probably the simplest way. This answer returns the index of the row (row number) in the DataTable which matches a specific value.
int firstRow = -1; // -1 means no matching row was found
for (int i = 0; i < dt.Rows.Count; i++)
{
var row = dt.Rows[i];
int point = Convert.ToInt32(row["Point"].ToString());
if (point == 52)
{
// i is the first row matching your condition
firstRow = i;
break;
}
}
The following may work for you:
DataTable table = new DataTable("SomeData");
table.Columns.Add("Point", typeof(int));
table.Rows.Add(5);
table.Rows.Add(7);
table.Rows.Add(52);
table.Rows.Add(2);
table.Rows.Add(1);
table.Rows.Add(4);
table.Rows.Add(9);
var row = table.AsEnumerable().Select((r, i) => new { Row = r, Index = i }).Where(x => (int)x.Row["Point"] == 52).FirstOrDefault();
int rowNumber = 0;
if (row != null)
rowNumber = row.Index + 1;
Note that in this example I give the row number, not the index that starts from zero.
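Both approaches can be wrapped in a reusable helper. The sketch below is one possible shape, not the only one: FindFirstRowIndex is a hypothetical name, it returns a zero-based index (add 1 for a row number), and it returns -1 when no row matches so that "not found" cannot be confused with the first row.

```csharp
using System;
using System.Data;

public static class DataTableSearch
{
    // Returns the zero-based index of the first row whose column equals value, or -1 if none.
    public static int FindFirstRowIndex(DataTable table, string columnName, int value)
    {
        for (int i = 0; i < table.Rows.Count; i++)
        {
            object cell = table.Rows[i][columnName];
            // Skip DBNull cells so that Convert.ToInt32 does not throw.
            if (cell != DBNull.Value && Convert.ToInt32(cell) == value)
            {
                return i;
            }
        }
        return -1;
    }
}
```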

Delete excel rows based on multiple criteria in different columns [duplicate]

I have a lot of Excel files that contain data along with empty rows and empty columns,
as shown below.
I am trying to remove the empty rows and columns from Excel using Interop.
I created a simple WinForms application with the following code, and it works fine.
Dim lstFiles As New List(Of String)
lstFiles.AddRange(IO.Directory.GetFiles(m_strFolderPath, "*.xls", IO.SearchOption.AllDirectories))
Dim m_XlApp = New Excel.Application
Dim m_xlWrkbs As Excel.Workbooks = m_XlApp.Workbooks
Dim m_xlWrkb As Excel.Workbook
For Each strFile As String In lstFiles
m_xlWrkb = m_xlWrkbs.Open(strFile)
Dim m_XlWrkSheet As Excel.Worksheet = m_xlWrkb.Worksheets(1)
Dim intRow As Integer = 1
While intRow <= m_XlWrkSheet.UsedRange.Rows.Count
If m_XlApp.WorksheetFunction.CountA(m_XlWrkSheet.Cells(intRow, 1).EntireRow) = 0 Then
m_XlWrkSheet.Cells(intRow, 1).EntireRow.Delete(Excel.XlDeleteShiftDirection.xlShiftUp)
Else
intRow += 1
End If
End While
Dim intCol As Integer = 1
While intCol <= m_XlWrkSheet.UsedRange.Columns.Count
If m_XlApp.WorksheetFunction.CountA(m_XlWrkSheet.Cells(1, intCol).EntireColumn) = 0 Then
m_XlWrkSheet.Cells(1, intCol).EntireColumn.Delete(Excel.XlDeleteShiftDirection.xlShiftToLeft)
Else
intCol += 1
End If
End While
Next
m_xlWrkb.Save()
m_xlWrkb.Close(SaveChanges:=True)
Marshal.ReleaseComObject(m_xlWrkb)
Marshal.ReleaseComObject(m_xlWrkbs)
m_XlApp.Quit()
Marshal.ReleaseComObject(m_XlApp)
But cleaning big Excel files takes a lot of time.
Any suggestions for optimizing this code, or another way to clean these Excel files faster? Is there a function that can delete empty rows in one click?
I have no problem with answers that use C#.
EDIT:
I uploaded a sample file (Sample File). But not all files have the same structure.
I found that looping through the excel worksheet can take some time if the worksheet is large. So my solution tried to avoid any looping in the worksheet. To avoid looping through the worksheet, I made a 2 dimensional object array from the cells returned from usedRange with:
Excel.Range targetCells = worksheet.UsedRange;
object[,] allValues = (object[,])targetCells.Cells.Value;
This is the array I loop through to get the indexes of the empty rows and columns. I make 2 int lists, one keeps the row indexes to delete the other keeps the column indexes to delete.
List<int> emptyRows = GetEmptyRows(allValues, totalRows, totalCols);
List<int> emptyCols = GetEmptyCols(allValues, totalRows, totalCols);
These lists will be sorted from high to low to simplify deleting rows from the bottom up and deleting columns from right to left. Then simply loop through each list and delete the appropriate row/col.
DeleteRows(emptyRows, worksheet);
DeleteCols(emptyCols, worksheet);
Finally after all the empty rows and columns have been deleted, I SaveAs the file to a new file name.
Hope this helps.
EDIT:
Addressed the UsedRange issue such that if there are empty rows at the top of the worksheet, those rows will now be removed. Also this will remove any empty columns to the left of the starting data. This allows for the indexing to work properly even if there are empty rows or columns before the data starts.
This was accomplished by taking the address of UsedRange, which has the form “$A$1:$D$4”. This allows the use of an offset if the empty rows at the top and empty columns to the left are to remain and not be deleted; in this case I am simply deleting them. The number of rows to delete from the top can be calculated from the first cell of the address: for “$A$4”, the “4” is the row where the first data appears, so we need to delete the top 3 rows. The column part of the address has the form “A”, “AB” or even “AAD”, which required some translation; thanks to How to convert a column number (eg. 127) into an excel column (eg. AA), I was able to determine how many columns on the left need to be deleted.
class Program {
static void Main(string[] args) {
Excel.Application excel = new Excel.Application();
string originalPath = @"H:\ExcelTestFolder\Book1_Test.xls";
Excel.Workbook workbook = excel.Workbooks.Open(originalPath);
Excel.Worksheet worksheet = workbook.Worksheets["Sheet1"];
Excel.Range usedRange = worksheet.UsedRange;
RemoveEmptyTopRowsAndLeftCols(worksheet, usedRange);
DeleteEmptyRowsCols(worksheet);
string newPath = @"H:\ExcelTestFolder\Book1_Test_Removed.xls";
workbook.SaveAs(newPath, Excel.XlSaveAsAccessMode.xlNoChange);
workbook.Close();
excel.Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(workbook);
System.Runtime.InteropServices.Marshal.ReleaseComObject(excel);
Console.WriteLine("Finished removing empty rows and columns - Press any key to exit");
Console.ReadKey();
}
private static void DeleteEmptyRowsCols(Excel.Worksheet worksheet) {
Excel.Range targetCells = worksheet.UsedRange;
object[,] allValues = (object[,])targetCells.Cells.Value;
int totalRows = targetCells.Rows.Count;
int totalCols = targetCells.Columns.Count;
List<int> emptyRows = GetEmptyRows(allValues, totalRows, totalCols);
List<int> emptyCols = GetEmptyCols(allValues, totalRows, totalCols);
// now we have a list of the empty rows and columns we need to delete
DeleteRows(emptyRows, worksheet);
DeleteCols(emptyCols, worksheet);
}
private static void DeleteRows(List<int> rowsToDelete, Excel.Worksheet worksheet) {
// the rows are sorted high to low - so index's wont shift
foreach (int rowIndex in rowsToDelete) {
worksheet.Rows[rowIndex].Delete();
}
}
private static void DeleteCols(List<int> colsToDelete, Excel.Worksheet worksheet) {
// the cols are sorted high to low - so index's wont shift
foreach (int colIndex in colsToDelete) {
worksheet.Columns[colIndex].Delete();
}
}
private static List<int> GetEmptyRows(object[,] allValues, int totalRows, int totalCols) {
List<int> emptyRows = new List<int>();
for (int i = 1; i <= totalRows; i++) { // the Interop array is 1-based, so include the last row
if (IsRowEmpty(allValues, i, totalCols)) {
emptyRows.Add(i);
}
}
// sort the list from high to low
return emptyRows.OrderByDescending(x => x).ToList();
}
private static List<int> GetEmptyCols(object[,] allValues, int totalRows, int totalCols) {
List<int> emptyCols = new List<int>();
for (int i = 1; i <= totalCols; i++) { // the Interop array is 1-based, so include the last column
if (IsColumnEmpty(allValues, i, totalRows)) {
emptyCols.Add(i);
}
}
// sort the list from high to low
return emptyCols.OrderByDescending(x => x).ToList();
}
private static bool IsColumnEmpty(object[,] allValues, int colIndex, int totalRows) {
for (int i = 1; i <= totalRows; i++) { // check every row, including the last one
if (allValues[i, colIndex] != null) {
return false;
}
}
return true;
}
private static bool IsRowEmpty(object[,] allValues, int rowIndex, int totalCols) {
for (int i = 1; i <= totalCols; i++) { // check every column, including the last one
if (allValues[rowIndex, i] != null) {
return false;
}
}
return true;
}
private static void RemoveEmptyTopRowsAndLeftCols(Excel.Worksheet worksheet, Excel.Range usedRange) {
string addressString = usedRange.Address.ToString();
int rowsToDelete = GetNumberOfTopRowsToDelete(addressString);
DeleteTopEmptyRows(worksheet, rowsToDelete);
int colsToDelete = GetNumberOfLeftColsToDelte(addressString);
DeleteLeftEmptyColumns(worksheet, colsToDelete);
}
private static void DeleteTopEmptyRows(Excel.Worksheet worksheet, int startRow) {
for (int i = 0; i < startRow - 1; i++) {
worksheet.Rows[1].Delete();
}
}
private static void DeleteLeftEmptyColumns(Excel.Worksheet worksheet, int colCount) {
for (int i = 0; i < colCount - 1; i++) {
worksheet.Columns[1].Delete();
}
}
private static int GetNumberOfTopRowsToDelete(string address) {
string[] splitArray = address.Split(':');
string firstIndex = splitArray[0];
splitArray = firstIndex.Split('$');
string value = splitArray[2];
int returnValue;
// TryParse leaves returnValue at 0 when the text is not a number, so nothing gets deleted
int.TryParse(value, out returnValue);
return returnValue;
}
private static int GetNumberOfLeftColsToDelte(string address) {
string[] splitArray = address.Split(':');
string firstindex = splitArray[0];
splitArray = firstindex.Split('$');
string value = splitArray[1];
return ParseColHeaderToIndex(value);
}
private static int ParseColHeaderToIndex(string colAdress) {
int[] digits = new int[colAdress.Length];
for (int i = 0; i < colAdress.Length; ++i) {
digits[i] = Convert.ToInt32(colAdress[i]) - 64;
}
int mul = 1; int res = 0;
for (int pos = digits.Length - 1; pos >= 0; --pos) {
res += digits[pos] * mul;
mul *= 26;
}
return res;
}
}
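The address handling above can also be tried in isolation, without Excel. The following is a minimal sketch, assuming an absolute A1-style address such as "$C$4:$F$10"; the AddressParser class and its method names are hypothetical, and whole-row or whole-column addresses such as "$1:$3" are not handled.

```csharp
using System;

public static class AddressParser
{
    // Converts column letters such as "A", "AB" or "AAD" to a 1-based column number.
    public static int ColumnLettersToNumber(string letters)
    {
        int number = 0;
        foreach (char c in letters.ToUpperInvariant())
        {
            number = number * 26 + (c - 'A' + 1);
        }
        return number;
    }

    // Returns the 1-based row and column of the top-left cell of the address.
    public static void ParseTopLeft(string address, out int row, out int column)
    {
        string firstCell = address.Split(':')[0];   // e.g. "$C$4"
        string[] parts = firstCell.Split('$');      // "", "C", "4"
        column = ColumnLettersToNumber(parts[1]);
        row = int.Parse(parts[2]);
    }
}
```

For a used range that starts at row r and column c, r - 1 rows at the top and c - 1 columns on the left are empty, which matches the startRow - 1 and colCount - 1 loop bounds used in the delete helpers.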
EDIT 2: For testing, I made a method that loops through the worksheet and compared it to my code that loops through an object array. It shows a significant difference.
Method to loop through the worksheet and delete empty rows and columns:
enum RowOrCol { Row, Column };
private static void ConventionalRemoveEmptyRowsCols(Excel.Worksheet worksheet) {
Excel.Range usedRange = worksheet.UsedRange;
int totalRows = usedRange.Rows.Count;
int totalCols = usedRange.Columns.Count;
RemoveEmpty(usedRange, RowOrCol.Row);
RemoveEmpty(usedRange, RowOrCol.Column);
}
private static void RemoveEmpty(Excel.Range usedRange, RowOrCol rowOrCol) {
int count;
Excel.Range curRange;
if (rowOrCol == RowOrCol.Column)
count = usedRange.Columns.Count;
else
count = usedRange.Rows.Count;
for (int i = count; i > 0; i--) {
bool isEmpty = true;
if (rowOrCol == RowOrCol.Column)
curRange = usedRange.Columns[i];
else
curRange = usedRange.Rows[i];
foreach (Excel.Range cell in curRange.Cells) {
if (cell.Value != null) {
isEmpty = false;
break; // we can exit this loop since the range is not empty
}
else {
// Cell value is null, continue checking
}
} // end loop thru each cell in this range (row or column)
if (isEmpty) {
curRange.Delete();
}
}
}
Then a Main for testing/timing the two methods.
enum RowOrCol { Row, Column };
static void Main(string[] args)
{
Excel.Application excel = new Excel.Application();
string originalPath = @"H:\ExcelTestFolder\Book1_Test.xls";
Excel.Workbook workbook = excel.Workbooks.Open(originalPath);
Excel.Worksheet worksheet = workbook.Worksheets["Sheet1"];
Excel.Range usedRange = worksheet.UsedRange;
// Start test for looping thru each excel worksheet
Stopwatch sw = new Stopwatch();
Console.WriteLine("Start stopwatch to loop thru WORKSHEET...");
sw.Start();
ConventionalRemoveEmptyRowsCols(worksheet);
sw.Stop();
Console.WriteLine("It took a total of: " + sw.ElapsedMilliseconds + " milliseconds to remove empty rows and columns...");
string newPath = @"H:\ExcelTestFolder\Book1_Test_RemovedLoopThruWorksheet.xls";
workbook.SaveAs(newPath, Excel.XlSaveAsAccessMode.xlNoChange);
workbook.Close();
Console.WriteLine("");
// Start test for looping thru object array
workbook = excel.Workbooks.Open(originalPath);
worksheet = workbook.Worksheets["Sheet1"];
usedRange = worksheet.UsedRange;
Console.WriteLine("Start stopwatch to loop thru object array...");
sw = new Stopwatch();
sw.Start();
DeleteEmptyRowsCols(worksheet);
sw.Stop();
// display results from second test
Console.WriteLine("It took a total of: " + sw.ElapsedMilliseconds + " milliseconds to remove empty rows and columns...");
string newPath2 = @"H:\ExcelTestFolder\Book1_Test_RemovedLoopThruArray.xls";
workbook.SaveAs(newPath2, Excel.XlSaveAsAccessMode.xlNoChange);
workbook.Close();
excel.Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(workbook);
System.Runtime.InteropServices.Marshal.ReleaseComObject(excel);
Console.WriteLine("");
Console.WriteLine("Finished testing methods - Press any key to exit");
Console.ReadKey();
}
EDIT 3: As per the OP's request...
I updated the code to match the OP's code and found some interesting results. See below.
I changed the code to use the functions you are using, i.e. EntireRow and CountA. I found that the code below performs terribly: in my tests its execution time was 800+ milliseconds. However, one subtle change made a huge difference.
On the line:
while (rowIndex <= worksheet.UsedRange.Rows.Count)
This slows things down a lot. Creating a range variable for UsedRange, instead of re-fetching it on every iteration of the while loop, makes a huge difference. So, when I change the while loop to…
Excel.Range usedRange = worksheet.UsedRange;
int rowIndex = 1;
while (rowIndex <= usedRange.Rows.Count)
and
while (colIndex <= usedRange.Columns.Count)
This performed very close to my object array solution. I did not post the results, since you can test this yourself with the code below by switching the while loops between fetching UsedRange on each iteration and using the usedRange variable.
private static void RemoveEmptyRowsCols3(Excel.Worksheet worksheet) {
//Excel.Range usedRange = worksheet.UsedRange; // <- using this variable makes the while loop much faster
int rowIndex = 1;
// delete empty rows
//while (rowIndex <= usedRange.Rows.Count) // <- changing this one line makes a huge difference - not grabbing the UsedRange with each iteration...
while (rowIndex <= worksheet.UsedRange.Rows.Count) {
if (worksheet.Application.WorksheetFunction.CountA(worksheet.Cells[rowIndex, 1].EntireRow) == 0) {
worksheet.Cells[rowIndex, 1].EntireRow.Delete(Excel.XlDeleteShiftDirection.xlShiftUp);
}
else {
rowIndex++;
}
}
// delete empty columns
int colIndex = 1;
// while (colIndex <= usedRange.Columns.Count) // <- change here also
while (colIndex <= worksheet.UsedRange.Columns.Count) {
if (worksheet.Application.WorksheetFunction.CountA(worksheet.Cells[1, colIndex].EntireColumn) == 0) {
worksheet.Cells[1, colIndex].EntireColumn.Delete(Excel.XlDeleteShiftDirection.xlShiftToLeft);
}
else {
colIndex++;
}
}
}
UPDATE by @Hadi
You can alter the DeleteCols and DeleteRows functions to get better performance if the Excel file contains extra blank rows and columns after the last used ones:
private static void DeleteRows(List<int> rowsToDelete, Microsoft.Office.Interop.Excel.Worksheet worksheet)
{
// the rows are sorted high to low - so index's wont shift
List<int> NonEmptyRows = Enumerable.Range(1, rowsToDelete.Max()).ToList().Except(rowsToDelete).ToList();
if (NonEmptyRows.Max() < rowsToDelete.Max())
{
// there are empty rows after the last non empty row
Microsoft.Office.Interop.Excel.Range cell1 = worksheet.Cells[NonEmptyRows.Max() + 1,1];
Microsoft.Office.Interop.Excel.Range cell2 = worksheet.Cells[rowsToDelete.Max(), 1];
//Delete all empty rows after the last used row
worksheet.Range[cell1, cell2].EntireRow.Delete(Microsoft.Office.Interop.Excel.XlDeleteShiftDirection.xlShiftUp);
} //else last non empty row = worksheet.Rows.Count
foreach (int rowIndex in rowsToDelete.Where(x => x < NonEmptyRows.Max()))
{
worksheet.Rows[rowIndex].Delete();
}
}
private static void DeleteCols(List<int> colsToDelete, Microsoft.Office.Interop.Excel.Worksheet worksheet)
{
// the cols are sorted high to low - so index's wont shift
//Get non Empty Cols
List<int> NonEmptyCols = Enumerable.Range(1, colsToDelete.Max()).ToList().Except(colsToDelete).ToList();
if (NonEmptyCols.Max() < colsToDelete.Max())
{
// there are empty columns after the last non empty column
Microsoft.Office.Interop.Excel.Range cell1 = worksheet.Cells[1,NonEmptyCols.Max() + 1];
Microsoft.Office.Interop.Excel.Range cell2 = worksheet.Cells[1,colsToDelete.Max()];
//Delete all empty columns after the last used column
worksheet.Range[cell1, cell2].EntireColumn.Delete(Microsoft.Office.Interop.Excel.XlDeleteShiftDirection.xlShiftToLeft);
} //else last non empty column = worksheet.Columns.Count
foreach (int colIndex in colsToDelete.Where(x => x < NonEmptyCols.Max()))
{
worksheet.Columns[colIndex].Delete();
}
}
check my answer at Get Last non empty column and row index from excel using Interop
Maybe something to consider:
Sub usedRangeDeleteRowsCols()
Dim LastRow As Long, LastCol As Long, i As Long
LastRow = Cells.Find(What:="*", SearchDirection:=xlPrevious, SearchOrder:=xlByRows).Row
LastCol = Cells.Find(What:="*", SearchDirection:=xlPrevious, SearchOrder:=xlByColumns).Column
For i = LastRow To 1 Step -1
If WorksheetFunction.CountA(Range(Cells(i, 1), Cells(i, LastCol))) = 0 Then
Cells(i, 1).EntireRow.Delete
End If
Next
For i = LastCol To 1 Step -1
If WorksheetFunction.CountA(Range(Cells(1, i), Cells(LastRow, i))) = 0 Then
Cells(1, i).EntireColumn.Delete
End If
Next
End Sub
I think there are two efficiencies compared to equivalent functions in the original code. Firstly, instead of using Excel's unreliable UsedRange property, we find the last value and only scan rows and columns within the genuine used range.
Secondly the worksheet count function again only works within the genuine used range - for example when searching for blank rows we only look in the range of used columns (rather than .EntireRow).
The For loops work backwards because every time a row is deleted, the row addresses of the data that follows change. Working backwards means the row addresses of the "data still to be worked on" don't change.
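The effect of the backwards loop can be seen without Excel. The sketch below uses a plain List in place of worksheet rows, with null standing in for an empty row; BackwardDelete and RemoveEmpty are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class BackwardDelete
{
    // Removes every null entry by walking from the last index down to the first.
    public static void RemoveEmpty(List<string> rows)
    {
        for (int i = rows.Count - 1; i >= 0; i--)
        {
            if (rows[i] == null)
            {
                // Only items after i shift down, and those have already been checked.
                rows.RemoveAt(i);
            }
        }
    }
}
```

A forward loop that increments i after every removal would skip the item that slides into the freed position, which is why two adjacent empty rows are a typical failure case for the forward version.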
In my opinion the most time consuming part could be enumerating and finding empty rows and columns.
What about:
http://www.howtogeek.com/206696/how-to-quickly-and-easily-delete-blank-rows-and-columns-in-excel-2013/
EDIT:
What about:
m_XlWrkSheet.Columns("A:A").SpecialCells(xlCellTypeBlanks).EntireRow.Delete
m_XlWrkSheet.Rows("1:1").SpecialCells(xlCellTypeBlanks).EntireColumn.Delete
Tested on the sample data: the result looks OK and performance is better (tested from VBA, but the difference is huge).
UPDATE:
Tested on a sample Excel file with 14k rows (made from the sample data): original code ~30 s, this version <1 s.
The easiest way that I know of is to hide non-blank cells and delete the visible ones:
var range = m_XlWrkSheet.UsedRange;
range.SpecialCells(XlCellType.xlCellTypeConstants).EntireRow.Hidden = true;
range.SpecialCells(XlCellType.xlCellTypeVisible).Delete(XlDeleteShiftDirection.xlShiftUp);
range.EntireRow.Hidden = false;
Faster methods are to not delete anything at all, but to move (cut+paste) the non-blank areas.
The fastest Interop way (there are faster more complicated methods without opening the file) is to get all values in array, move the values in the array, and put the values back:
object[,] values = m_XlWrkSheet.UsedRange.Value2 as object[,];
// some code here (the values start from values[1, 1] not values[0, 0])
m_XlWrkSheet.UsedRange.Value2 = values;
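One possible shape for the "some code here" step is sketched below, assuming that only entirely empty rows should be removed and that empty cells arrive as null. ArrayCompactor and CompactRows are hypothetical names; the method reads the array's own bounds, so it works with the 1-based arrays that Interop returns as well as with ordinary 0-based ones.

```csharp
using System;

public static class ArrayCompactor
{
    // Moves non-empty rows to the top of the array, preserving their order,
    // and blanks the rows left behind. Returns the number of rows kept.
    public static int CompactRows(object[,] values)
    {
        int firstRow = values.GetLowerBound(0), lastRow = values.GetUpperBound(0);
        int firstCol = values.GetLowerBound(1), lastCol = values.GetUpperBound(1);
        int target = firstRow; // next row slot to fill

        for (int r = firstRow; r <= lastRow; r++)
        {
            bool empty = true;
            for (int c = firstCol; c <= lastCol; c++)
            {
                if (values[r, c] != null) { empty = false; break; }
            }
            if (empty) continue;

            if (target != r)
            {
                // Copy the row up into the free slot and clear its old position.
                for (int c = firstCol; c <= lastCol; c++)
                {
                    values[target, c] = values[r, c];
                    values[r, c] = null;
                }
            }
            target++;
        }
        return target - firstRow;
    }
}
```

After the call, the array has the same size as before, so it can be written back to the same range; the rows that were moved leave null (blank) cells at the bottom. Empty columns would need the same treatment with the roles of rows and columns swapped.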
You could open an ADO connection to the worksheet, get a list of fields, issue an SQL statement which includes only known fields, and also exclude records with no values in the known fields.

Fastest method to remove Empty rows and Columns From Excel Files using Interop

I have a lot of Excel files that contain data along with empty rows and empty columns,
as shown below.
I am trying to remove the empty rows and columns from Excel using Interop.
I created a simple WinForms application with the following code, and it works fine.
Dim lstFiles As New List(Of String)
lstFiles.AddRange(IO.Directory.GetFiles(m_strFolderPath, "*.xls", IO.SearchOption.AllDirectories))
Dim m_XlApp = New Excel.Application
Dim m_xlWrkbs As Excel.Workbooks = m_XlApp.Workbooks
Dim m_xlWrkb As Excel.Workbook
For Each strFile As String In lstFiles
m_xlWrkb = m_xlWrkbs.Open(strFile)
Dim m_XlWrkSheet As Excel.Worksheet = m_xlWrkb.Worksheets(1)
Dim intRow As Integer = 1
While intRow <= m_XlWrkSheet.UsedRange.Rows.Count
If m_XlApp.WorksheetFunction.CountA(m_XlWrkSheet.Cells(intRow, 1).EntireRow) = 0 Then
m_XlWrkSheet.Cells(intRow, 1).EntireRow.Delete(Excel.XlDeleteShiftDirection.xlShiftUp)
Else
intRow += 1
End If
End While
Dim intCol As Integer = 1
While intCol <= m_XlWrkSheet.UsedRange.Columns.Count
If m_XlApp.WorksheetFunction.CountA(m_XlWrkSheet.Cells(1, intCol).EntireColumn) = 0 Then
m_XlWrkSheet.Cells(1, intCol).EntireColumn.Delete(Excel.XlDeleteShiftDirection.xlShiftToLeft)
Else
intCol += 1
End If
End While
Next
m_xlWrkb.Save()
m_xlWrkb.Close(SaveChanges:=True)
Marshal.ReleaseComObject(m_xlWrkb)
Marshal.ReleaseComObject(m_xlWrkbs)
m_XlApp.Quit()
Marshal.ReleaseComObject(m_XlApp)
But cleaning big Excel files takes a lot of time.
Any suggestions for optimizing this code, or another way to clean these Excel files faster? Is there a function that can delete empty rows in one click?
I have no problem with answers that use C#.
EDIT:
I uploaded a sample file (Sample File). But not all files have the same structure.
I found that looping through the excel worksheet can take some time if the worksheet is large. So my solution tried to avoid any looping in the worksheet. To avoid looping through the worksheet, I made a 2 dimensional object array from the cells returned from usedRange with:
Excel.Range targetCells = worksheet.UsedRange;
object[,] allValues = (object[,])targetCells.Cells.Value;
This is the array I loop through to get the indexes of the empty rows and columns. I build two int lists: one holds the row indexes to delete, the other holds the column indexes to delete.
List<int> emptyRows = GetEmptyRows(allValues, totalRows, totalCols);
List<int> emptyCols = GetEmptyCols(allValues, totalRows, totalCols);
These lists will be sorted from high to low to simplify deleting rows from the bottom up and deleting columns from right to left. Then simply loop through each list and delete the appropriate row/col.
DeleteRows(emptyRows, worksheet);
DeleteCols(emptyCols, worksheet);
Finally after all the empty rows and columns have been deleted, I SaveAs the file to a new file name.
Hope this helps.
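The scan described above can be tried without Excel. The sketch below builds a 1-based object[,] by hand, which is the shape Interop returns for a multi-cell range, and collects the empty row and column indexes from high to low. The class name EmptyScan and its method names are made up for this illustration; note that every loop runs up to and including the upper bound, because the array is 1-based.

```csharp
using System;
using System.Collections.Generic;

public static class EmptyScan
{
    // Builds a 1-based 2D array, like the one Interop returns for a multi-cell range.
    public static object[,] MakeOneBased(int rows, int cols)
    {
        return (object[,])Array.CreateInstance(
            typeof(object), new[] { rows, cols }, new[] { 1, 1 });
    }

    // Indexes of rows whose cells are all null, highest index first.
    public static List<int> EmptyRows(object[,] values)
    {
        var result = new List<int>();
        for (int r = values.GetUpperBound(0); r >= values.GetLowerBound(0); r--)
        {
            bool empty = true;
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                if (values[r, c] != null) { empty = false; break; }
            }
            if (empty) result.Add(r);
        }
        return result;
    }

    // Indexes of columns whose cells are all null, highest index first.
    public static List<int> EmptyCols(object[,] values)
    {
        var result = new List<int>();
        for (int c = values.GetUpperBound(1); c >= values.GetLowerBound(1); c--)
        {
            bool empty = true;
            for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
            {
                if (values[r, c] != null) { empty = false; break; }
            }
            if (empty) result.Add(c);
        }
        return result;
    }
}
```

For a 3x3 array with values only at [1, 1] and [3, 3], both methods return a list containing just 2. Walking from the highest index down also means the lists need no separate sort before deleting.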
EDIT:
Addressed the UsedRange issue: if there are empty rows at the top of the worksheet, those rows are now removed, as are any empty columns to the left of the data. This allows the indexing to work properly even if there are empty rows or columns before the data starts.
This was accomplished by taking the address of UsedRange, which has the form “$A$1:$D$4”, and reading its first cell. That also makes it possible to use an offset instead, if the empty rows at the top and the empty columns on the left are to be kept rather than deleted. In this case I am simply deleting them. The number of rows to delete from the top follows from the first cell's address: for “$A$4”, the “4” is the row where the first data appears, so the top 3 rows need to be deleted. The column part of the address has the form “A”, “AB” or even “AAD”, which requires some translation; thanks to How to convert a column number (eg. 127) into an excel column (eg. AA) I was able to determine how many columns on the left need to be deleted.
class Program {
static void Main(string[] args) {
Excel.Application excel = new Excel.Application();
string originalPath = @"H:\ExcelTestFolder\Book1_Test.xls";
Excel.Workbook workbook = excel.Workbooks.Open(originalPath);
Excel.Worksheet worksheet = workbook.Worksheets["Sheet1"];
Excel.Range usedRange = worksheet.UsedRange;
RemoveEmptyTopRowsAndLeftCols(worksheet, usedRange);
DeleteEmptyRowsCols(worksheet);
string newPath = @"H:\ExcelTestFolder\Book1_Test_Removed.xls";
workbook.SaveAs(newPath, Excel.XlSaveAsAccessMode.xlNoChange);
workbook.Close();
excel.Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(workbook);
System.Runtime.InteropServices.Marshal.ReleaseComObject(excel);
Console.WriteLine("Finished removing empty rows and columns - Press any key to exit");
Console.ReadKey();
}
private static void DeleteEmptyRowsCols(Excel.Worksheet worksheet) {
Excel.Range targetCells = worksheet.UsedRange;
object[,] allValues = (object[,])targetCells.Cells.Value;
int totalRows = targetCells.Rows.Count;
int totalCols = targetCells.Columns.Count;
List<int> emptyRows = GetEmptyRows(allValues, totalRows, totalCols);
List<int> emptyCols = GetEmptyCols(allValues, totalRows, totalCols);
// now we have a list of the empty rows and columns we need to delete
DeleteRows(emptyRows, worksheet);
DeleteCols(emptyCols, worksheet);
}
private static void DeleteRows(List<int> rowsToDelete, Excel.Worksheet worksheet) {
// the rows are sorted high to low - so index's wont shift
foreach (int rowIndex in rowsToDelete) {
worksheet.Rows[rowIndex].Delete();
}
}
private static void DeleteCols(List<int> colsToDelete, Excel.Worksheet worksheet) {
// the cols are sorted high to low - so index's wont shift
foreach (int colIndex in colsToDelete) {
worksheet.Columns[colIndex].Delete();
}
}
private static List<int> GetEmptyRows(object[,] allValues, int totalRows, int totalCols) {
List<int> emptyRows = new List<int>();
for (int i = 1; i <= totalRows; i++) { // the array is 1-based, so include the last row
if (IsRowEmpty(allValues, i, totalCols)) {
emptyRows.Add(i);
}
}
// sort the list from high to low
return emptyRows.OrderByDescending(x => x).ToList();
}
private static List<int> GetEmptyCols(object[,] allValues, int totalRows, int totalCols) {
List<int> emptyCols = new List<int>();
for (int i = 1; i <= totalCols; i++) { // the array is 1-based, so include the last column
if (IsColumnEmpty(allValues, i, totalRows)) {
emptyCols.Add(i);
}
}
// sort the list from high to low
return emptyCols.OrderByDescending(x => x).ToList();
}
private static bool IsColumnEmpty(object[,] allValues, int colIndex, int totalRows) {
for (int i = 1; i <= totalRows; i++) { // check every row, including the last one
if (allValues[i, colIndex] != null) {
return false;
}
}
return true;
}
private static bool IsRowEmpty(object[,] allValues, int rowIndex, int totalCols) {
for (int i = 1; i <= totalCols; i++) { // check every column, including the last one
if (allValues[rowIndex, i] != null) {
return false;
}
}
return true;
}
private static void RemoveEmptyTopRowsAndLeftCols(Excel.Worksheet worksheet, Excel.Range usedRange) {
string addressString = usedRange.Address.ToString();
int rowsToDelete = GetNumberOfTopRowsToDelete(addressString);
DeleteTopEmptyRows(worksheet, rowsToDelete);
int colsToDelete = GetNumberOfLeftColsToDelte(addressString);
DeleteLeftEmptyColumns(worksheet, colsToDelete);
}
private static void DeleteTopEmptyRows(Excel.Worksheet worksheet, int startRow) {
for (int i = 0; i < startRow - 1; i++) {
worksheet.Rows[1].Delete();
}
}
private static void DeleteLeftEmptyColumns(Excel.Worksheet worksheet, int colCount) {
for (int i = 0; i < colCount - 1; i++) {
worksheet.Columns[1].Delete();
}
}
private static int GetNumberOfTopRowsToDelete(string address) {
string[] splitArray = address.Split(':');
string firstIndex = splitArray[0];
splitArray = firstIndex.Split('$');
string value = splitArray[2];
int returnValue = -1;
if ((int.TryParse(value, out returnValue)) && (returnValue >= 0))
return returnValue;
return returnValue;
}
private static int GetNumberOfLeftColsToDelte(string address) {
string[] splitArray = address.Split(':');
string firstindex = splitArray[0];
splitArray = firstindex.Split('$');
string value = splitArray[1];
return ParseColHeaderToIndex(value);
}
private static int ParseColHeaderToIndex(string colAdress) {
int[] digits = new int[colAdress.Length];
for (int i = 0; i < colAdress.Length; ++i) {
digits[i] = Convert.ToInt32(colAdress[i]) - 64;
}
int mul = 1; int res = 0;
for (int pos = digits.Length - 1; pos >= 0; --pos) {
res += digits[pos] * mul;
mul *= 26;
}
return res;
}
}
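As a worked example of the address parsing used above, here is a minimal standalone sketch. The class AddressOffsets and its method names are hypothetical; it assumes an absolute A1-style address such as “$C$4:$F$9” or “$C$4”. Interop ranges also expose Row and Column properties that give the first row and column numbers directly, which may make the string parsing unnecessary.

```csharp
using System;

public static class AddressOffsets
{
    // Converts column letters such as "A", "AB" or "AAD" to a 1-based column number.
    public static int ColumnNumber(string letters)
    {
        int n = 0;
        foreach (char ch in letters.ToUpperInvariant())
        {
            n = n * 26 + (ch - 'A' + 1);
        }
        return n;
    }

    // Returns the row and column of the first cell in an absolute address.
    public static (int Row, int Col) FirstCell(string address)
    {
        string first = address.Split(':')[0];   // e.g. "$C$4"
        string[] parts = first.Split('$');      // "", "C", "4"
        return (int.Parse(parts[2]), ColumnNumber(parts[1]));
    }
}
```

For “$C$4:$F$9” this gives row 4 and column 3, so there are 3 empty rows above the data and 2 empty columns to its left. ColumnNumber("AB") is 28 and ColumnNumber("AAD") is 706.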
EDIT 2: For testing I wrote a method that loops through the worksheet and compared it to my code that loops through an object array. It shows a significant difference.
Method to loop through the worksheet and delete empty rows and columns:
enum RowOrCol { Row, Column };
private static void ConventionalRemoveEmptyRowsCols(Excel.Worksheet worksheet) {
Excel.Range usedRange = worksheet.UsedRange;
int totalRows = usedRange.Rows.Count;
int totalCols = usedRange.Columns.Count;
RemoveEmpty(usedRange, RowOrCol.Row);
RemoveEmpty(usedRange, RowOrCol.Column);
}
private static void RemoveEmpty(Excel.Range usedRange, RowOrCol rowOrCol) {
int count;
Excel.Range curRange;
if (rowOrCol == RowOrCol.Column)
count = usedRange.Columns.Count;
else
count = usedRange.Rows.Count;
for (int i = count; i > 0; i--) {
bool isEmpty = true;
if (rowOrCol == RowOrCol.Column)
curRange = usedRange.Columns[i];
else
curRange = usedRange.Rows[i];
foreach (Excel.Range cell in curRange.Cells) {
if (cell.Value != null) {
isEmpty = false;
break; // we can exit this loop since the range is not empty
}
else {
// Cell value is null, continue checking
}
} // end loop thru each cell in this range (row or column)
if (isEmpty) {
curRange.Delete();
}
}
}
Then a Main for testing/timing the two methods.
enum RowOrCol { Row, Column };
static void Main(string[] args)
{
Excel.Application excel = new Excel.Application();
string originalPath = @"H:\ExcelTestFolder\Book1_Test.xls";
Excel.Workbook workbook = excel.Workbooks.Open(originalPath);
Excel.Worksheet worksheet = workbook.Worksheets["Sheet1"];
Excel.Range usedRange = worksheet.UsedRange;
// Start test for looping thru each excel worksheet
Stopwatch sw = new Stopwatch();
Console.WriteLine("Start stopwatch to loop thru WORKSHEET...");
sw.Start();
ConventionalRemoveEmptyRowsCols(worksheet);
sw.Stop();
Console.WriteLine("It took a total of: " + sw.ElapsedMilliseconds + " milliseconds to remove empty rows and columns...");
string newPath = @"H:\ExcelTestFolder\Book1_Test_RemovedLoopThruWorksheet.xls";
workbook.SaveAs(newPath, Excel.XlSaveAsAccessMode.xlNoChange);
workbook.Close();
Console.WriteLine("");
// Start test for looping thru object array
workbook = excel.Workbooks.Open(originalPath);
worksheet = workbook.Worksheets["Sheet1"];
usedRange = worksheet.UsedRange;
Console.WriteLine("Start stopwatch to loop thru object array...");
sw = new Stopwatch();
sw.Start();
DeleteEmptyRowsCols(worksheet);
sw.Stop();
// display results from second test
Console.WriteLine("It took a total of: " + sw.ElapsedMilliseconds + " milliseconds to remove empty rows and columns...");
string newPath2 = @"H:\ExcelTestFolder\Book1_Test_RemovedLoopThruArray.xls";
workbook.SaveAs(newPath2, Excel.XlSaveAsAccessMode.xlNoChange);
workbook.Close();
excel.Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(workbook);
System.Runtime.InteropServices.Marshal.ReleaseComObject(excel);
Console.WriteLine("");
Console.WriteLine("Finished testing methods - Press any key to exit");
Console.ReadKey();
}
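One detail worth checking when reading Stopwatch results: TimeSpan.Milliseconds is only the milliseconds component of the elapsed time (0 to 999), whereas Stopwatch.ElapsedMilliseconds and TimeSpan.TotalMilliseconds report the whole duration. The small sketch below uses a fixed TimeSpan to show the difference; TimingDemo is a made-up name.

```csharp
using System;

public static class TimingDemo
{
    // Milliseconds component only (0..999); whole seconds are dropped.
    public static int Component(TimeSpan elapsed) => elapsed.Milliseconds;

    // The whole duration expressed in milliseconds.
    public static double Total(TimeSpan elapsed) => elapsed.TotalMilliseconds;
}
```

For TimeSpan.FromMilliseconds(2350), Component returns 350 while Total returns 2350, so a run that takes longer than a second looks much faster than it was if only the component is printed.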
EDIT 3: As per the OP's request...
I updated and changed the code to match the OP's code. With this I found some interesting results. See below.
I changed the code to match the functions you are using, i.e. EntireRow and CountA. I found that the code below performs terribly: in my tests its execution time was 800+ milliseconds. However, one subtle change made a huge difference.
On the line:
while (rowIndex <= worksheet.UsedRange.Rows.Count)
This slows things down a lot. Creating a range variable for UsedRange, instead of re-grabbing it on each iteration of the while loop, makes a huge difference. So when I change the while loops to:
Excel.Range usedRange = worksheet.UsedRange;
int rowIndex = 1;
while (rowIndex <= usedRange.Rows.Count)
and
while (colIndex <= usedRange.Columns.Count)
This performed very close to my object array solution. I did not post the results; you can take the code below and switch the while loops between grabbing UsedRange on each iteration and using the usedRange variable to test this yourself.
private static void RemoveEmptyRowsCols3(Excel.Worksheet worksheet) {
//Excel.Range usedRange = worksheet.UsedRange; // <- using this variable makes the while loop much faster
int rowIndex = 1;
// delete empty rows
//while (rowIndex <= usedRange.Rows.Count) // <- changing this one line makes a huge difference - not grabbing the UsedRange with each iteration...
while (rowIndex <= worksheet.UsedRange.Rows.Count) {
// the Excel application is reached through the worksheet, since no "excel" variable is in scope here
if (worksheet.Application.WorksheetFunction.CountA(worksheet.Cells[rowIndex, 1].EntireRow) == 0) {
worksheet.Cells[rowIndex, 1].EntireRow.Delete(Excel.XlDeleteShiftDirection.xlShiftUp);
}
else {
rowIndex++;
}
}
// delete empty columns
int colIndex = 1;
// while (colIndex <= usedRange.Columns.Count) // <- change here also
while (colIndex <= worksheet.UsedRange.Columns.Count) {
if (worksheet.Application.WorksheetFunction.CountA(worksheet.Cells[1, colIndex].EntireColumn) == 0) {
worksheet.Cells[1, colIndex].EntireColumn.Delete(Excel.XlDeleteShiftDirection.xlShiftToLeft);
}
else {
colIndex++;
}
}
}
UPDATE by @Hadi
You can alter the DeleteCols and DeleteRows functions to get better performance if the Excel file contains extra blank rows and columns after the last used ones:
private static void DeleteRows(List<int> rowsToDelete, Microsoft.Office.Interop.Excel.Worksheet worksheet)
{
// the rows are sorted high to low - so index's wont shift
List<int> NonEmptyRows = Enumerable.Range(1, rowsToDelete.Max()).ToList().Except(rowsToDelete).ToList();
if (NonEmptyRows.Max() < rowsToDelete.Max())
{
// there are empty rows after the last non empty row
Microsoft.Office.Interop.Excel.Range cell1 = worksheet.Cells[NonEmptyRows.Max() + 1,1];
Microsoft.Office.Interop.Excel.Range cell2 = worksheet.Cells[rowsToDelete.Max(), 1];
//Delete all empty rows after the last used row
worksheet.Range[cell1, cell2].EntireRow.Delete(Microsoft.Office.Interop.Excel.XlDeleteShiftDirection.xlShiftUp);
} //else last non empty row = worksheet.Rows.Count
foreach (int rowIndex in rowsToDelete.Where(x => x < NonEmptyRows.Max()))
{
worksheet.Rows[rowIndex].Delete();
}
}
private static void DeleteCols(List<int> colsToDelete, Microsoft.Office.Interop.Excel.Worksheet worksheet)
{
// the cols are sorted high to low - so index's wont shift
//Get non Empty Cols
List<int> NonEmptyCols = Enumerable.Range(1, colsToDelete.Max()).ToList().Except(colsToDelete).ToList();
if (NonEmptyCols.Max() < colsToDelete.Max())
{
// there are empty columns after the last non-empty column
Microsoft.Office.Interop.Excel.Range cell1 = worksheet.Cells[1,NonEmptyCols.Max() + 1];
Microsoft.Office.Interop.Excel.Range cell2 = worksheet.Cells[1,colsToDelete.Max()];
//Delete all empty columns after the last used column
worksheet.Range[cell1, cell2].EntireColumn.Delete(Microsoft.Office.Interop.Excel.XlDeleteShiftDirection.xlShiftToLeft);
} //else last non empty column = worksheet.Columns.Count
foreach (int colIndex in colsToDelete.Where(x => x < NonEmptyCols.Max()))
{
worksheet.Columns[colIndex].Delete();
}
}
See also my answer at "Get Last non empty column and row index from excel using Interop".
Maybe something to consider:
Sub usedRangeDeleteRowsCols()
Dim LastRow As Long, LastCol As Long, i As Long
LastRow = Cells.Find(What:="*", SearchDirection:=xlPrevious, SearchOrder:=xlByRows).Row
LastCol = Cells.Find(What:="*", SearchDirection:=xlPrevious, SearchOrder:=xlByColumns).Column
For i = LastRow To 1 Step -1
If WorksheetFunction.CountA(Range(Cells(i, 1), Cells(i, LastCol))) = 0 Then
Cells(i, 1).EntireRow.Delete
End If
Next
For i = LastCol To 1 Step -1
If WorksheetFunction.CountA(Range(Cells(1, i), Cells(LastRow, i))) = 0 Then
Cells(1, i).EntireColumn.Delete
End If
Next
End Sub
I think there are two efficiencies compared to equivalent functions in the original code. Firstly, instead of using Excel's unreliable UsedRange property, we find the last value and only scan rows and columns within the genuine used range.
Secondly the worksheet count function again only works within the genuine used range - for example when searching for blank rows we only look in the range of used columns (rather than .EntireRow).
The For loops work backwards because every time a row is deleted, the row addresses of the data that follows change. Working backwards means the row addresses of the data still to be processed don't change.
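The same effect can be seen on a plain list, without Excel. The sketch below (in C#, with made-up names) removes blank entries once by walking backwards and once by walking forwards with a naive index increment, to show what goes wrong in the forward direction.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BackwardDelete
{
    // Walks from the last index to the first, so a removal never shifts
    // an element that has not been visited yet.
    public static List<string> RemoveBlanksBackward(IEnumerable<string> rows)
    {
        var list = rows.ToList();
        for (int i = list.Count - 1; i >= 0; i--)
        {
            if (string.IsNullOrEmpty(list[i])) list.RemoveAt(i);
        }
        return list;
    }

    // Naive forward walk, for contrast: after a removal the next element
    // slides into position i and is skipped by the i++ that follows.
    public static List<string> RemoveBlanksForwardNaive(IEnumerable<string> rows)
    {
        var list = rows.ToList();
        for (int i = 0; i < list.Count; i++)
        {
            if (string.IsNullOrEmpty(list[i])) list.RemoveAt(i);
        }
        return list;
    }
}
```

With the input "a", "", "", "b", the backward version returns "a", "b", while the naive forward version leaves one blank behind and returns "a", "", "b".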
In my opinion the most time consuming part could be enumerating and finding empty rows and columns.
What about:
http://www.howtogeek.com/206696/how-to-quickly-and-easily-delete-blank-rows-and-columns-in-excel-2013/
EDIT:
What about:
m_XlWrkSheet.Columns("A:A").SpecialCells(Excel.XlCellType.xlCellTypeBlanks).EntireRow.Delete()
m_XlWrkSheet.Rows("1:1").SpecialCells(Excel.XlCellType.xlCellTypeBlanks).EntireColumn.Delete()
Tested on the sample data: the result looks OK and performance is better (tested from VBA, but the difference is huge).
UPDATE:
Tested on a sample Excel file with 14k rows (built from the sample data): the original code takes ~30 s, this version <1 s.
The easiest way that I know of is to hide non-blank cells and delete the visible ones:
var range = m_XlWrkSheet.UsedRange;
range.SpecialCells(XlCellType.xlCellTypeConstants).EntireRow.Hidden = true;
range.SpecialCells(XlCellType.xlCellTypeVisible).Delete(XlDeleteShiftDirection.xlShiftUp);
range.EntireRow.Hidden = false;
A faster approach is not to delete anything at all, but to move (cut and paste) the non-blank areas.
The fastest Interop way (there are faster, more complicated methods that do not open the file) is to read all values into an array, move the values within the array, and write the values back:
object[,] values = m_XlWrkSheet.UsedRange.Value2 as object[,];
// some code here (the values start from values[1, 1] not values[0, 0])
m_XlWrkSheet.UsedRange.Value2 = values;
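The middle step is left as a comment above. One possible way to fill it in, as a sketch only: copy the non-empty rows and columns to the top-left corner of a new array with the same size and bounds, so that writing it back also blanks the leftover cells. ArrayCompactor and Compact are hypothetical names. The sketch assumes the range has more than one cell (so Value2 really is an object[,]) and that only values matter; formulas and formatting are not carried along.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayCompactor
{
    // Returns an array with the same bounds as the input in which all
    // non-empty rows are moved up and all non-empty columns moved left.
    public static object[,] Compact(object[,] values)
    {
        int r0 = values.GetLowerBound(0), r1 = values.GetUpperBound(0);
        int c0 = values.GetLowerBound(1), c1 = values.GetUpperBound(1);

        // rows that contain at least one value, in their original order
        var keepRows = new List<int>();
        for (int r = r0; r <= r1; r++)
            for (int c = c0; c <= c1; c++)
                if (values[r, c] != null) { keepRows.Add(r); break; }

        // columns that contain at least one value, in their original order
        var keepCols = new List<int>();
        for (int c = c0; c <= c1; c++)
            for (int r = r0; r <= r1; r++)
                if (values[r, c] != null) { keepCols.Add(c); break; }

        // same size and lower bounds as the input; every cell starts as null
        var result = (object[,])Array.CreateInstance(
            typeof(object), new[] { r1 - r0 + 1, c1 - c0 + 1 }, new[] { r0, c0 });

        for (int i = 0; i < keepRows.Count; i++)
            for (int j = 0; j < keepCols.Count; j++)
                result[r0 + i, c0 + j] = values[keepRows[i], keepCols[j]];

        return result;
    }
}
```

With this helper the placeholder line would become values = ArrayCompactor.Compact(values); before the array is assigned back to UsedRange.Value2.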
You could open an ADO connection to the worksheet, get a list of fields, issue an SQL statement which includes only known fields, and also exclude records with no values in the known fields.

Storing Excel Rows in List<Range> for later insertion into different workbook

I am splitting one large spreadsheet into many (hundreds of) smaller spreadsheets. My approach is to store the rows of the source spreadsheet in a list:
List<Range> ranges = new List<Range>();
Workbook book = xl.Workbooks.Add("path to book");
Worksheet sheet = book.sheets[1];
for (int r = 1; r <= sheet.UsedRange.Rows.Count; r++)
{
ranges.Add((Range)sheet.Rows[r]);
}
book.Close();
......
Workbook book2 = xl.Workbooks.Add();
Worksheet sheet2 = book2.sheets[1];
for (int r2 = 0; r2 <= ranges.Count; r2++)
{
Range row = (Range)ranges[r2]; //
sheet2.rows[r2+1].Value2 = row; //fails;
//when queried in the debugger, the properties of row all throw an exception
//querying sheet.rows[r2+1] expands as expected
}
If you see where my error is please advise.
Thanks.
I think that after you close the source book, you can no longer use the range references you have taken from it, so move book.Close() to the end of your snippet. Also, the upper bound of the second for loop should be < rather than <=.
for (int r2 = 0; r2 < ranges.Count; r2++)
{
ranges[r2].Copy();
sheet2.Paste(sheet2.Rows[r2+1]);
}
