Split a large Excel file into multiple files based on row count - C#

I have a C# console application that needs to split a large Excel file into multiple Excel files based on row count. The source file in the code below has only 51 rows (including the header row), but the final source file will have 100,000+ rows.
The code should skip the first (header) row, then copy rows 2 through 11, then the next 10 rows, and so on. I have set the target files to only 10 rows each to make development faster.
Question: How do I copy rows 2 through 11, and each subsequent block of 10 rows, from the source Excel file and paste them into separate target Excel files so that each target file has 10 rows?
Here is my mostly rewritten code. It is loosely based on "copying of specific range of excel cells from one worksheet to another worksheet" and https://social.msdn.microsoft.com/Forums/vstudio/en-US/afd01976-63d0-4f96-9ba4-e3e2b6cf8d55/excel-with-c-how-to-specify-a-range-?forum=vsto
I can now write 5 Excel files. But the first file has 9 rows (starting from row 2), the second has only 3 rows starting with row 10, and the third has 13 rows, again starting with row 10; the last two files have progressively more rows, both starting with row 10.
So is something wrong with my for loop, or with the way I am selecting the ranges?
string startPath = System.IO.Path.GetDirectoryName(System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName);
string filePath_source = Path.Combine(startPath, @"Source_Files\Offers_Source_Temp.xlsx");
string filePath_copiedinto = Path.Combine(startPath, @"Source_Files\ToBeCopiedInto.xlsx");
app = new Excel.Application();
app.DisplayAlerts = false;
book = app.Workbooks.Open(filePath_source);
sheet = (Excel.Worksheet)book.Worksheets.get_Item((1));
int iRowCount = sheet.UsedRange.Rows.Count;
int maxrows = 10;//change this to something like 50,000 later. 01/16/18
int maxloops = iRowCount / maxrows;
int beginrow = 2; //skipping the header row.
Excel.Application destxlApp;
Excel.Workbook destworkBook;
Excel.Worksheet destworkSheet;
Excel.Range destrange;
string srcPath;
string destPath;
//Opening of first worksheet and copying
srcPath = filePath_source;
for (int i = 1; i <= maxloops; i++) {
Excel.Range rng = (Excel.Range)sheet.Range[sheet.Cells[beginrow, 1], sheet.Cells[maxrows, 3]];
rng.Copy(Type.Missing);
//opening of the second worksheet and pasting
destPath = filePath_copiedinto;
destxlApp = new Excel.Application();
destxlApp.DisplayAlerts = false;
destworkBook = destxlApp.Workbooks.Open(destPath, 0, false);
destworkSheet = destworkBook.Worksheets.get_Item(1);
destrange = destworkSheet.Cells[1, 1];
destrange.Select();
destworkSheet.Paste(Type.Missing, Type.Missing);
destworkBook.SaveAs(startPath + "\\Output_Files\\" + beginrow + ".xlsx");
destworkBook.Close(true, null, null);
destxlApp.Quit();
beginrow = beginrow + maxrows;
string blah = null;
}
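Tracing the loop above by hand explains the odd file sizes. A range built as Range[cell1, cell2] covers the rectangle between the two corner cells whatever their order, so once beginrow passes maxrows, the fixed end cell (row 10) becomes the top of the block. The sketch below replays only the row numbers, assuming 51 used rows and maxrows = 10 as in the question; the class and method names are hypothetical and no Excel calls are made.

```csharp
using System;
using System.Collections.Generic;

static class BuggyLoopTrace
{
    // Replays the question's loop using row numbers only: beginrow advances by
    // maxrows, but the end cell stays at row maxrows. Range[cell1, cell2] spans
    // rows min..max of the two corners, so the block flips once beginrow > maxrows.
    public static List<(int First, int Last)> Simulate(int rowCount, int maxrows)
    {
        var blocks = new List<(int First, int Last)>();
        int beginrow = 2;                  // header skipped, as in the question
        int maxloops = rowCount / maxrows; // same loop count as the question
        for (int i = 1; i <= maxloops; i++)
        {
            blocks.Add((Math.Min(beginrow, maxrows), Math.Max(beginrow, maxrows)));
            beginrow += maxrows;
        }
        return blocks;
    }

    static void Main()
    {
        // Prints 9, 3, 13, 23 and 33 rows: every block after the first starts at row 10.
        foreach (var (first, last) in Simulate(51, 10))
            Console.WriteLine($"rows {first}-{last}: {last - first + 1} rows");
    }
}
```

The first three blocks match the 9, 3 and 13 rows reported in the question, which points at the end row of the range rather than at the paste step.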

I would suggest using the Open XML SDK (DocumentFormat.OpenXml) for this task. It does not need Excel installed and supports the whole Open XML structure.
Here is a starting point for reading and writing rows:
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
// Open the document read-only (second argument: isEditable = false).
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
foreach (Row r in sheetData.Elements<Row>())
{
}
}
Now, writing is very similar:
using (SpreadsheetDocument spreadSheet = SpreadsheetDocument.Create(fileName, SpreadsheetDocumentType.Workbook))
{
    // create the workbook
    WorkbookPart workbookPart = spreadSheet.AddWorkbookPart();
    workbookPart.Workbook = new Workbook();
    // create the worksheet with empty sheet data
    WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
    SheetData sheetData = new SheetData();
    worksheetPart.Worksheet = new Worksheet(sheetData);
    // register the worksheet in the workbook, otherwise Excel cannot open the file
    Sheets sheets = workbookPart.Workbook.AppendChild(new Sheets());
    sheets.Append(new Sheet { Id = workbookPart.GetIdOfPart(worksheetPart), SheetId = 1, Name = "Sheet1" });
    // create a row
    sheetData.AppendChild(new Row());
}

Got it! My revised code in the question came close, but the for loop was wrong: the end cell of each range stayed at row maxrows instead of moving with beginrow. Here is the corrected, almost complete code. Thanks everyone for your help!
try
{
string startPath = System.IO.Path.GetDirectoryName(System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName);
string filePath_source = Path.Combine(startPath, @"Source_Files\Offers_Source_Temp.xlsx");
string filePath_copiedinto = Path.Combine(startPath, @"Source_Files\ToBeCopiedInto.xlsx");
app = new Excel.Application();
app.DisplayAlerts = false;
book = app.Workbooks.Open(filePath_source);
sheet = (Excel.Worksheet)book.Worksheets.get_Item((1));
int iRowCount = sheet.UsedRange.Rows.Count;
int countColumns = sheet.UsedRange.Columns.Count;
int maxrows = 10;//change this to something like 50,000 later. 01/16/18
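int maxloops = (iRowCount - 1 + maxrows - 1) / maxrows; // ceiling of (data rows, header excluded) / maxrows, so a partial last block is not dropped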
int beginrow = 2; //skipping the header row.
Excel.Application destxlApp;
Excel.Workbook destworkBook;
Excel.Worksheet destworkSheet;
Excel.Range destrange;
string srcPath;
string destPath;
//Opening of first worksheet and copying
srcPath = filePath_source;
for (int i = 1; i <= maxloops; i++) {
// Old version (bug: the end row stayed fixed at maxrows):
// Excel.Range rng = (Excel.Range)sheet.Range[sheet.Cells[beginrow, 1], sheet.Cells[maxrows, 3]];
Excel.Range startCell = sheet.Cells[beginrow, 1]; // Cells[row, column]; a single index would count cells, not rows
Excel.Range endCell = sheet.Cells[beginrow + maxrows - 1, 3]; // the end row now moves with beginrow
Excel.Range rng = sheet.Range[startCell, endCell];
rng = rng.EntireRow; // widen the block to whole rows; the column indices above only locate the corner cells
rng.Copy(Type.Missing);
//opening of the second worksheet and pasting
destPath = filePath_copiedinto;
destxlApp = new Excel.Application();
destxlApp.DisplayAlerts = false;
destworkBook = destxlApp.Workbooks.Open(destPath, 0, false);
destworkSheet = destworkBook.Worksheets.get_Item(1);
destrange = destworkSheet.Cells[1, 1];
destrange.Select();
destworkSheet.Paste(Type.Missing, Type.Missing);
destworkBook.SaveAs(startPath + "\\Output_Files\\" + beginrow + ".xlsx");
destworkBook.Close(true, null, null);
destxlApp.Quit();
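beginrow = beginrow + maxrows;
}//for loop
}
finally
{
    // Close the source workbook without saving and quit Excel so no EXCEL.EXE process lingers.
    book?.Close(false);
    app?.Quit();
}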
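The row arithmetic behind the corrected loop can be pulled out and checked without Excel. The sketch below (RowChunker and Chunks are hypothetical names) assumes the header is in row 1 and that UsedRange starts at row 1, so UsedRange.Rows.Count is also the last used row number. It clamps the final block to the last used row, so a row count that is not a multiple of maxrows yields a shorter last file instead of dropped rows.

```csharp
using System;
using System.Collections.Generic;

static class RowChunker
{
    // Splits data rows (header in row 1 is skipped) into blocks of at most
    // maxRows rows. Returns 1-based (First, Last) worksheet row numbers.
    public static List<(int First, int Last)> Chunks(int usedRowCount, int maxRows)
    {
        if (maxRows <= 0) throw new ArgumentOutOfRangeException(nameof(maxRows));
        var chunks = new List<(int First, int Last)>();
        for (int first = 2; first <= usedRowCount; first += maxRows)
        {
            // The last block may be shorter, so clamp to the last used row.
            int last = Math.Min(first + maxRows - 1, usedRowCount);
            chunks.Add((first, last));
        }
        return chunks;
    }

    static void Main()
    {
        // 51 used rows, 10 per file: 2-11, 12-21, 22-31, 32-41, 42-51
        foreach (var (first, last) in Chunks(51, 10))
            Console.WriteLine($"rows {first}-{last}");
        // 56 used rows leave a final partial block (52-56)
        Console.WriteLine(Chunks(56, 10).Count); // 6
    }
}
```

In the interop loop, each (First, Last) pair would map to sheet.Cells[First, 1] and sheet.Cells[Last, countColumns].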

Related

C# Excel: how to find the first cell in a used range

I can use XlCellType.xlCellTypeLastCell to find the last cell in the used range. How can I get the first cell in one line?
Code to get the last cell position:
Excel.Range mergeCells = (Excel.Range)mergeSheet.Cells[6,1].EntireRow;
var rowRng = mergeCells.SpecialCells(XlCellType.xlCellTypeLastCell, Type.Missing);
var colPosition = rowRng.Column;
One way is to read mergeCells.Value and loop through it, incrementing a counter until I hit a null/empty value, but I was hoping to do this in one line.
Any ideas ?
Test Cases:
(1)
Expected Result colPosition = 1
(2)
Expected Result colPosition = 5
Here is a solution using the Excel Interop library (as tagged in the question). The method below returns the 1-based column index of the first non-empty cell in a given row. It worked for me on your supplied test cases as well as a few of my own. Note that if you simply want the first row of the used range rather than a supplied row, you can find the first used row number with ActiveSheet.UsedRange.Rows[1].Row.
public static int FindFirstCellInExcelRow(string filePath, int rowNum)
{
Excel.Application xlApp = null;
Excel.Workbook wkBook = null;
Excel.Worksheet wkSheet = null;
Excel.Range range = null;
try
{
xlApp = new Excel.Application();
wkBook = xlApp.Workbooks.Open(filePath);
wkSheet = wkBook.ActiveSheet;
range = wkSheet.Cells[rowNum, 1].EntireRow;
if (range.Cells[1, 1].Value != null)
{
return range.Cells[1, 1].Column;
}
var result = range.Find(What: "*", After: range.Cells[1, 1], LookAt: Excel.XlLookAt.xlPart, LookIn: Excel.XlFindLookIn.xlValues, SearchOrder: Excel.XlSearchOrder.xlByColumns, SearchDirection: Excel.XlSearchDirection.xlNext, MatchByte: false, MatchCase: false);
int colIdx = result?.Column ?? 0; // return 0 if no cell in row contains value
return colIdx;
}
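finally
{
    // Close without saving and quit Excel; otherwise EXCEL.EXE keeps running.
    wkBook?.Close(false);
    xlApp?.Quit();
    // Release COM objects in reverse order of creation (Marshal is in System.Runtime.InteropServices).
    if (range != null) Marshal.ReleaseComObject(range);
    if (wkSheet != null) Marshal.ReleaseComObject(wkSheet);
    if (wkBook != null) Marshal.ReleaseComObject(wkBook);
    if (xlApp != null) Marshal.ReleaseComObject(xlApp);
}
}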
I highly (x10) recommend ClosedXML over Microsoft's Excel libraries (unless you are working with the old .xls format). With ClosedXML you would do the following (taken directly from their wiki):
Get it from NuGet: Install-Package ClosedXML -Version 0.93.1
https://github.com/ClosedXML/ClosedXML/wiki/Finding-and-extracting-the-data
var wb = new XLWorkbook(northwinddataXlsx);
var ws = wb.Worksheet("Data");
// Look for the first row used
var firstRowUsed = ws.FirstRowUsed();
// Narrow down the row so that it only includes the used part
var categoryRow = firstRowUsed.RowUsed();
// Move to the next row (it now has the titles)
categoryRow = categoryRow.RowBelow();
// Get all categories
while (!categoryRow.Cell(coCategoryId).IsEmpty())
{
String categoryName = categoryRow.Cell(coCategoryName).GetString();
categories.Add(categoryName);
categoryRow = categoryRow.RowBelow();
}
Try the snippet below; it gives the first row of an Excel used range (it runs inside a VSTO add-in, hence Globals.ThisAddIn):
Excel.Workbook xlWB = Globals.ThisAddIn.Application.ActiveWorkbook;
Excel.Worksheet xlWS = xlWB.ActiveSheet;
int firstRow = xlWS.UsedRange.Row;
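The question also mentions a loop-based fallback: read the row's values and scan for the first non-empty one. As a sketch of that idea (not a replacement for the Find approach above), the scan can run over the object[,] that Range.Value2 returns for a multi-cell range, which is 1-based in both dimensions. FirstNonEmptyColumn and the test helper OneBasedRow are hypothetical names; the returned position equals the worksheet column only when the range starts in column A, as an EntireRow range does.

```csharp
using System;

static class RowScan
{
    // Returns the 1-based position of the first non-empty value in the first row
    // of a values array (like the object[,] from Range.Value2), or 0 if all are empty.
    public static int FirstNonEmptyColumn(object[,] values)
    {
        int row = values.GetLowerBound(0);
        int lower = values.GetLowerBound(1);
        for (int col = lower; col <= values.GetUpperBound(1); col++)
        {
            object v = values[row, col];
            // Treat null and "" as empty, like a blank cell.
            if (v != null && !(v is string s && s.Length == 0))
                return col - lower + 1;
        }
        return 0;
    }

    // Builds a 1-based 1 x n array, mimicking the shape Excel hands back.
    public static object[,] OneBasedRow(params object[] cells)
    {
        var values = (object[,])Array.CreateInstance(
            typeof(object), new[] { 1, cells.Length }, new[] { 1, 1 });
        for (int i = 0; i < cells.Length; i++)
            values[1, i + 1] = cells[i];
        return values;
    }

    static void Main()
    {
        Console.WriteLine(FirstNonEmptyColumn(OneBasedRow("a", null, "c")));            // 1
        Console.WriteLine(FirstNonEmptyColumn(OneBasedRow(null, "", null, null, 7.0))); // 5
    }
}
```

With the question's variables this would be called as FirstNonEmptyColumn((object[,])mergeCells.Value2). An entire row in an .xlsx sheet has 16,384 columns, so restricting the read to the used part of the row keeps the array small.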

Open an .xls file with the SAX approach in .NET and add several rows (about 100K)

I need your support.
I need to open an .xls file with C# and add about 100K rows.
I tried several approaches, but all of them ran into memory errors except OpenXML with the SAX approach, which works for me.
My problem is that I can create an Excel file with 100K rows from scratch (and it works like a charm), but I can't open an existing file that, for example, has titles in the first row and formulas in the second, and then add 100K rows starting from the third row.
This is my code.
When I run it, the file is copied but no rows are added. Could you please help me?
Thanks
//copy of file
String str_IntermidiateXLSPath = "c:\\testXls3.xlsx";
File.Copy("c:\\testXls.xlsx", str_IntermidiateXLSPath, true);
using (SpreadsheetDocument SpreadsheetDocumentObj = SpreadsheetDocument.Open(str_IntermidiateXLSPath, true))
{
WorkbookPart workbookpart = null;
WorksheetPart worksheetpart = null;
OpenXmlWriter writer = null;
Cell cell = null;
Row row = null;
SpreadsheetDocumentObj.CompressionOption = CompressionOption.SuperFast;
workbookpart = SpreadsheetDocumentObj.WorkbookPart;
worksheetpart = workbookpart.WorksheetParts.First();
writer = OpenXmlWriter.Create(worksheetpart);
writer.WriteStartElement(new Worksheet());
writer.WriteStartElement(new SheetData());
//starting row
for (int i = 3; i < 80; i++)
{
row = new Row();
for (int j = 0; j < 10; j++)
{
cell = new Cell();
row.Append(cell);
cell.DataType = CellValues.InlineString;
cell.InlineString = new InlineString { Text = new Text { Text = "XXX" } };
}
writer.WriteElement(row);
}
writer.WriteEndElement();
writer.WriteEndElement();
writer.Close();
workbookpart.Workbook.Save();
SpreadsheetDocumentObj.Close();
}
Process.Start(str_IntermidiateXLSPath);

C# Excel Interop: Replace Isn't Replacing values

I'm using Excel interop to remove all commas from the values in an Excel workbook and then save the result as a CSV file. I'm doing this as follows:
var app = new Application();
app.DisplayAlerts = false;
var wb = app.Workbooks.Open(excelfilename);
Worksheet sheet = wb.Worksheets[1]; // Excel collections are 1-based
sheet.Activate();
Range last = sheet.Cells.SpecialCells(XlCellType.xlCellTypeLastCell, Type.Missing);
Range range = sheet.Range["A1", last];
int lastUsedRow = last.Row;
int lastUsedColumn = last.Column;
var startcell = sheet.Cells[1, 1];
var endcell = sheet.Cells[lastUsedRow, lastUsedColumn];
var subrange = sheet.Range[startcell, endcell];
subrange.Replace(@",", @"", XlLookAt.xlPart, XlSearchOrder.xlByColumns, false, Type.Missing, false, false);
wb.SaveAs(outputfilename, Microsoft.Office.Interop.Excel.XlFileFormat.xlCSVWindows);
wb.Close(false, "", true);
However, I found that the above code doesn't actually replace all the commas in the sheet, and I still have commas in the Worksheet before I export to CSV. What am I doing wrong?

copying of specific range of excel cells from one worksheet to another worksheet

I am writing a C# program that copies a range of cells from a worksheet in one workbook to a worksheet in another workbook. The problem is that I can only copy and paste the whole worksheet of the first workbook. How do I select only a specific range (rows 5 to 100, columns 1 to 10) and paste it into the second workbook's worksheet starting at row 2, column 8?
I also want to know how to fill a column, say C1 to C100, with a value directly instead of using a loop like the one below:
for (int row = 1; row <= 100; row++)
{
    worksheet.Cells[row, 3] = "Fixed"; // column 3 is C
}
Here is the code I have written so far:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Excel = Microsoft.Office.Interop.Excel;
namespace ConsoleApplication3
{
class Program
{
static void Main(string[] args)
{
Excel.Application srcxlApp;
Excel.Workbook srcworkBook;
Excel.Worksheet srcworkSheet;
Excel.Range srcrange;
Excel.Application destxlApp;
Excel.Workbook destworkBook;
Excel.Worksheet destworkSheet;
Excel.Range destrange;
string srcPath;
string destPath;
//Opening of first worksheet and copying
srcPath="C:\\Documents and Settings\\HARRY\\Desktop\\incident.csv";
srcxlApp = new Excel.Application();
srcworkBook = srcxlApp.Workbooks.Open(srcPath);
srcworkSheet = srcworkBook.Worksheets.get_Item(1);
srcrange = srcworkSheet.UsedRange;
srcrange.Copy(Type.Missing);
//opening of the second worksheet and pasting
destPath = "C:\\Documents and Settings\\HARRY\\Desktop\\FIXED Aging incident Report.xls";
destxlApp = new Excel.Application();
destworkBook = destxlApp.Workbooks.Open(destPath,0,false);
destworkSheet = destworkBook.Worksheets.get_Item(1);
destrange = destworkSheet.Cells[1, 1];
destrange.Select();
destworkSheet.Paste(Type.Missing, Type.Missing);
destworkBook.SaveAs("C:\\Documents and Settings\\HARRY\\Desktop\\FIXED Aging incident Report " + DateTime.Now.ToString("MM_dd_yyyy") + ".xls");
srcxlApp.Application.DisplayAlerts = false;
destxlApp.Application.DisplayAlerts = false;
destworkBook.Close(true, null, null);
destxlApp.Quit();
srcworkBook.Close(false, null, null);
srcxlApp.Quit();
}
}
}
You should be able to do this:
Excel.Range from = srcworkSheet.Range["C1:C100"];
Excel.Range to = destworkSheet.Range["C1:C100"];
from.Copy(to);
mrtig has a very elegant solution, but it won't work if the workbooks are in separate instances of Excel. The key is to open them in a single instance. I've modified your example to use this approach:
public void CopyRanges()
{
    // only one instance of Excel
    Excel.Application excelApplication = new Excel.Application();
    excelApplication.DisplayAlerts = false; // no overwrite prompt on SaveAs
    string srcPath = "C:\\Documents and Settings\\HARRY\\Desktop\\incident.csv";
    Excel.Workbook srcworkBook = excelApplication.Workbooks.Open(srcPath);
    Excel.Worksheet srcworkSheet = srcworkBook.Worksheets.get_Item(1);
    string destPath = "C:\\Documents and Settings\\HARRY\\Desktop\\FIXED Aging incident Report.xls";
    Excel.Workbook destworkBook = excelApplication.Workbooks.Open(destPath, 0, false);
    Excel.Worksheet destworkSheet = destworkBook.Worksheets.get_Item(1);
    // Range is an indexed property in C#, so use [] rather than ()
    Excel.Range from = srcworkSheet.Range["C1:C100"];
    Excel.Range to = destworkSheet.Range["C1:C100"];
    // if you use 2 instances of Excel, this will not work
    from.Copy(to);
    destworkBook.SaveAs("C:\\Documents and Settings\\HARRY\\Desktop\\FIXED Aging incident Report " + DateTime.Now.ToString("MM_dd_yyyy") + ".xls");
    destworkBook.Close(true, null, null);
    srcworkBook.Close(false, null, null);
    excelApplication.Quit();
}
For the first part (setting the same value for a whole range), the following works instead of a loop:
Excel.Range range1 = workSheet.get_Range("C1:C100");
range1.Value = "Fixed";
And for copying, you can try what @mrtig suggested.
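The question describes its range in row and column numbers (rows 5 to 100, columns 1 to 10, pasted at row 2, column 8), while the answers use A1 strings such as "C1:C100". A small converter can bridge the two; the sketch below is a hypothetical helper (A1Address is not part of the interop API) that only builds address strings. Alternatively, sheet.Range[sheet.Cells[5, 1], sheet.Cells[100, 10]] selects the same block without any letters.

```csharp
using System;

static class A1Address
{
    // Converts a 1-based column number to Excel letters: 1 -> A, 26 -> Z, 27 -> AA.
    public static string ColumnLetters(int column)
    {
        if (column < 1) throw new ArgumentOutOfRangeException(nameof(column));
        string letters = "";
        while (column > 0)
        {
            int rem = (column - 1) % 26;
            letters = (char)('A' + rem) + letters;
            column = (column - 1) / 26;
        }
        return letters;
    }

    // Builds an "A1:B2"-style address for a rectangular block.
    public static string RangeAddress(int firstRow, int firstCol, int lastRow, int lastCol) =>
        $"{ColumnLetters(firstCol)}{firstRow}:{ColumnLetters(lastCol)}{lastRow}";

    static void Main()
    {
        // Source block from the question: rows 5-100, columns 1-10.
        Console.WriteLine(RangeAddress(5, 1, 100, 10)); // A5:J100
        // Same-sized block with its top-left corner at row 2, column 8.
        int rows = 100 - 5 + 1, cols = 10;
        Console.WriteLine(RangeAddress(2, 8, 2 + rows - 1, 8 + cols - 1)); // H2:Q97
    }
}
```

Within a single Excel instance, these strings would plug in as srcworkSheet.Range[RangeAddress(5, 1, 100, 10)] and destworkSheet.Range[RangeAddress(2, 8, 97, 17)], giving a destination with the same 96 x 10 shape as the source.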

How to merge two Excel workbook into one workbook in C#?

Suppose I have two local Excel files (workbooks), each with 3 worksheets.
Let's say Workbook1 has Sheet1, Sheet2, Sheet3
and Workbook2 has Sheet1, Sheet2, Sheet3.
I need to merge these two workbooks into a new one, say Workbook3, which will have 6 worksheets in total (the sheets of Workbook1 and Workbook2).
How can I do this in C# without any third-party tool? A free third-party tool would also be fine.
An easier solution is to copy the worksheets themselves, and not their cells.
This method takes any number of Excel file paths and copies their worksheets into a new file:
private static void MergeWorkbooks(string destinationFilePath, params string[] sourceFilePaths)
{
var app = new Application();
app.DisplayAlerts = false; // No prompt when overriding
// Create a new workbook (index=1) and open source workbooks (index=2,3,...)
Workbook destinationWb = app.Workbooks.Add();
foreach (var sourceFilePath in sourceFilePaths)
{
app.Workbooks.Add(sourceFilePath);
}
// Copy all worksheets
Worksheet after = destinationWb.Worksheets[1];
for (int wbIndex = app.Workbooks.Count; wbIndex >= 2; wbIndex--)
{
Workbook wb = app.Workbooks[wbIndex];
for (int wsIndex = wb.Worksheets.Count; wsIndex >= 1; wsIndex--)
{
Worksheet ws = wb.Worksheets[wsIndex];
ws.Copy(After: after);
}
}
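// Close source documents before saving destination. Otherwise, save will fail.
// Iterate backwards: closing a workbook shifts the indices of the ones after it.
for (int wbIndex = app.Workbooks.Count; wbIndex >= 2; wbIndex--)
{
    Workbook wb = app.Workbooks[wbIndex];
    wb.Close(false); // template-based copies of the sources; discard, nothing to save
}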
// Delete default worksheet
after.Delete();
// Save new workbook
destinationWb.SaveAs(destinationFilePath);
destinationWb.Close();
app.Quit();
}
Edit: note that you might want to use the Move method instead of Copy if you have dependencies between the sheets, e.g. pivot tables, charts or formulas. Otherwise the data source will be disconnected, and changes in one sheet won't affect the other.
Here's a working sample that joins two workbooks into a new one; I hope it gives you an idea:
using System;
using Excel = Microsoft.Office.Interop.Excel;
using System.Reflection;
namespace MergeWorkBooks
{
class Program
{
static void Main(string[] args)
{
Excel.Application app = new Excel.Application();
app.Visible = true;
app.Workbooks.Add("");
app.Workbooks.Add(@"c:\MyWork\WorkBook1.xls");
app.Workbooks.Add(@"c:\MyWork\WorkBook2.xls");
for (int i = 2; i <= app.Workbooks.Count; i++)
{
int count = app.Workbooks[i].Worksheets.Count;
app.Workbooks[i].Activate();
for (int j=1; j <= count; j++)
{
Excel._Worksheet ws = (Excel._Worksheet)app.Workbooks[i].Worksheets[j];
ws.Select(Type.Missing);
ws.Cells.Select();
Excel.Range sel = (Excel.Range)app.Selection;
sel.Copy(Type.Missing);
Excel._Worksheet sheet = (Excel._Worksheet)app.Workbooks[1].Worksheets.Add(
Type.Missing, Type.Missing, Type.Missing, Type.Missing
);
sheet.Paste(Type.Missing, Type.Missing);
}
}
}
}
}
You're looking for the Office Automation libraries in C#.
Here is a sample code to help you get started.
System.Data.Odbc.OdbcDataAdapter Odbcda;
//CSV File
strConnString = "Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + SourceLocation + ";Extensions=asc,csv,tab,txt;Persist Security Info=False";
sqlSelect = "select * from [" + filename + "]";
System.Data.Odbc.OdbcConnection conn = new System.Data.Odbc.OdbcConnection(strConnString.Trim());
conn.Open();
Odbcda = new System.Data.Odbc.OdbcDataAdapter(sqlSelect, conn);
Odbcda.Fill(ds, DataTable);
conn.Close();
This reads the contents of a CSV file (through the ODBC text driver) into a DataSet.
Create multiple DataSets like this and then merge them.
Code taken directly from here.
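The final "merge" step can be sketched with ADO.NET's DataSet.Merge. In the example below, small in-memory tables stand in for the ones Odbcda.Fill would produce; the table name "Sheet", the "Name" column and the MergeDemo class are hypothetical. Because no primary key is defined, merging a table whose name already exists in the DataSet appends its rows. Note that this merges data only, not worksheets or formatting.

```csharp
using System;
using System.Data;

static class MergeDemo
{
    // Hypothetical stand-in for one Odbcda.Fill(...) result: a table named "Sheet"
    // with a single "Name" column. Inputs must share the table name and schema.
    static DataTable MakeTable(params string[] names)
    {
        var table = new DataTable("Sheet");
        table.Columns.Add("Name", typeof(string));
        foreach (string name in names)
            table.Rows.Add(name);
        return table;
    }

    public static DataSet BuildMerged()
    {
        var merged = new DataSet();
        // The first Merge adds the "Sheet" table; the second appends its rows,
        // since there is no primary key to match existing rows against.
        merged.Merge(MakeTable("alpha", "beta"));
        merged.Merge(MakeTable("gamma"));
        return merged;
    }

    static void Main()
    {
        DataTable sheet = BuildMerged().Tables["Sheet"];
        foreach (DataRow row in sheet.Rows)
            Console.WriteLine(row["Name"]); // alpha, beta, gamma
    }
}
```

If the tables did define a primary key, Merge would update matching rows instead of appending duplicates, which is worth considering when the same records can appear in more than one file.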
