Skip a sheet in an Excel workbook using OpenXML (C#)

I am iterating through all sheets in a workbook using OpenXML (C#), but I want to skip a specific sheet based on its name. Please suggest how it can be done.

The sheet names are stored in the WorkbookPart, in the Workbook's Sheets element, whose Sheet children correspond to the worksheets in the Excel file (in tab order).
See the answer given in this question.
Make your own loop over those Sheet elements and skip the sheet according to its name.
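A minimal sketch of such a loop, assuming the DocumentFormat.OpenXml NuGet package; `SheetIterator` and `SheetsExcept` are hypothetical names. It walks the Sheet elements in tab order, skips the one whose name matches, and resolves each remaining Sheet's Id to its WorksheetPart (chart sheets, which have no WorksheetPart, are dropped as well).

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class SheetIterator
{
    // Hypothetical helper: yields (name, part) for every worksheet in tab order,
    // except the one whose name equals skipName. Excel does not allow two sheet
    // names that differ only by case, so a case-insensitive match is used here.
    public static IEnumerable<(string Name, WorksheetPart Part)> SheetsExcept(
        WorkbookPart workbookPart, string skipName)
    {
        foreach (Sheet sheet in workbookPart.Workbook.Descendants<Sheet>())
        {
            string name = sheet.Name?.Value ?? string.Empty;
            if (string.Equals(name, skipName, StringComparison.OrdinalIgnoreCase))
                continue; // this is the sheet we want to skip

            // Sheet.Id is the relationship id of the part holding the sheet's data.
            // Chart sheets resolve to a ChartsheetPart, so the type test filters them out.
            if (workbookPart.GetPartById(sheet.Id.Value) is WorksheetPart part)
                yield return (name, part);
        }
    }
}
```

Call it inside the usual `using (SpreadsheetDocument doc = SpreadsheetDocument.Open(path, false))` block, e.g. `foreach (var (name, part) in SheetIterator.SheetsExcept(doc.WorkbookPart, "Sheet2")) { ... }`.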

Related

How to use the MATCH function (or similar) with OpenXML

I'm creating two worksheets with OpenXML, and I want to add a hyperlink from a cell in sheet 2 to a cell in sheet 1. I know that with Microsoft Excel Interop you can use the MATCH function to find the row number where a value is found. How could the same task be done with OpenXML instead?
With Microsoft Excel Interop, I did something like this:
var excelApp = new Application();
WorksheetFunction function = excelApp.WorksheetFunction;
var rowNumber = function.Match(currentItem.originID, sheet1.Range["B1","B" + rowCount], 0);
where sheet1 is the sheet I'm searching and rowCount is the number of rows in that sheet. This finds currentItem.originID in column B and returns the row number where the match was found.
How could I do something similar with OpenXML?
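The Open XML SDK does not evaluate worksheet functions, so there is no direct counterpart of WorksheetFunction.Match; one option is to scan the column yourself. Below is a minimal sketch, assuming the DocumentFormat.OpenXml package; `CellLookup` and `FindRow` are hypothetical names. It reads the sheet through the DOM, resolves shared-string indexes, and compares text case-insensitively (Excel's MATCH with match type 0 also ignores case). It assumes rows carry their `r` (RowIndex) attribute and cells their reference, as files saved by Excel do; numeric cells are compared by their raw stored text.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class CellLookup
{
    // Hypothetical MATCH(value, B:B, 0) substitute: returns the 1-based row
    // number of the first cell in `column` whose text equals `value`
    // (case-insensitive), or null when nothing matches.
    public static uint? FindRow(WorkbookPart workbookPart, WorksheetPart worksheetPart,
                                string column, string value)
    {
        SharedStringTablePart sst = workbookPart.SharedStringTablePart;
        foreach (Row row in worksheetPart.Worksheet.Descendants<Row>())
        {
            foreach (Cell cell in row.Elements<Cell>())
            {
                // CellReference looks like "B12"; keep only the column letters.
                string reference = cell.CellReference?.Value;
                if (reference == null) continue;
                string cellColumn = new string(reference.TakeWhile(char.IsLetter).ToArray());
                if (!string.Equals(cellColumn, column, StringComparison.OrdinalIgnoreCase)) continue;

                if (string.Equals(GetText(cell, sst), value, StringComparison.OrdinalIgnoreCase))
                    return row.RowIndex?.Value;
            }
        }
        return null;
    }

    // Text cells written by Excel usually hold an index into the shared string table.
    private static string GetText(Cell cell, SharedStringTablePart sst)
    {
        string raw = cell.CellValue?.Text ?? cell.InnerText;
        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString && sst != null)
            return sst.SharedStringTable.Elements<SharedStringItem>().ElementAt(int.Parse(raw)).InnerText;
        return raw;
    }
}
```

The returned row number can then be used to build the internal hyperlink target (for example a location such as 'Sheet1'!B5) for the cell in sheet 2.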

Getting the worksheet that a named range belongs to

In VSTO, while developing an Excel add-in, I have the name of an Excel range. Is it possible to get the worksheet it belongs to?
Is it also possible to get the worksheet that a cell belongs to?
If you have a Range object, just use .Parent to get the worksheet. From there you can get the worksheet's name or anything else.
If you only have the range name as a string, you need to loop through the names in the workbook and worksheets looking for that name. Then use the RefersTo property to get the address, and parse the worksheet name out of it.
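The parsing step can be sketched as a small pure-C# helper. This is a hypothetical function, not part of VSTO: it takes a RefersTo value such as `=Sheet1!$A$1:$B$5` or `='My Sheet'!$A$1` (a quoted sheet name doubles any apostrophe it contains) and returns the sheet name, or null when there is none. It only handles plain references; a RefersTo that contains a formula or an external workbook reference would need more work.

```csharp
using System;
using System.Text;

public static class RefersToParser
{
    // Hypothetical helper: extracts the worksheet name from a RefersTo string
    // such as "=Sheet1!$A$1:$B$5" or "='My Sheet'!$A$1".
    // Returns null when there is no sheet-qualified reference (e.g. "=42").
    public static string SheetName(string refersTo)
    {
        if (refersTo == null) return null;
        string s = refersTo.StartsWith("=", StringComparison.Ordinal) ? refersTo.Substring(1) : refersTo;

        if (s.StartsWith("'", StringComparison.Ordinal))
        {
            // Quoted name: an apostrophe inside the name is written as ''.
            var name = new StringBuilder();
            for (int i = 1; i < s.Length; i++)
            {
                if (s[i] != '\'') { name.Append(s[i]); continue; }
                if (i + 1 < s.Length && s[i + 1] == '\'') { name.Append('\''); i++; continue; }
                // Closing quote: a sheet reference must continue with '!'.
                return i + 1 < s.Length && s[i + 1] == '!' ? name.ToString() : null;
            }
            return null; // unterminated quote
        }

        // Unquoted name: everything before the first '!'.
        int bang = s.IndexOf('!');
        return bang > 0 ? s.Substring(0, bang) : null;
    }
}
```

In the add-in, the input string would come from the matching Name's RefersTo property, and the result can be used to look the sheet up by name in the workbook's Worksheets collection.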

Setting the active worksheet using GemBox

I'm currently working on a spreadsheet reader that needs to read the second worksheet in the ExcelFile. I've looked online but can't find anything that explains how to set the active worksheet.
Currently my active worksheet is set like this:
ExcelWorksheet worksheet = excelFile.Worksheets.ActiveWorksheet;
When debugging, I noticed that it's reading the first of the two worksheets, when I need it to read the second one.
How can I set the active worksheet to index 1 rather than index 0?
Thanks
UPDATE:
I fixed this by querying the ExcelFile's Worksheets with LINQ and selecting the worksheet by its index. Example code below:
ExcelWorksheet worksheet = excelFile.Worksheets.Where(x => x.Index == 1).SingleOrDefault();
First, note that the active worksheet is the one selected when the file is opened in an Excel application (such as MS Excel); see the ActiveWorksheet property's help page.
You can set any sheet to be the active one, and by default it is usually the first one in the workbook, which is why you're accessing the first sheet with it. To access any sheet you want, retrieve it from the Worksheets collection through its indexer, like the following:
ExcelWorksheet firstSheet = excelFile.Worksheets[0];
ExcelWorksheet secondSheet = excelFile.Worksheets[1];
Or by the sheet's name, like the following:
ExcelWorksheet firstSheet = excelFile.Worksheets["Sheet1"];
ExcelWorksheet secondSheet = excelFile.Worksheets["Sheet2"];
See the Worksheets property's help page.

OpenXML does not help read large Excel files, contrary to the documentation

The documentation says that:
The following code segment is used to read a very large Excel file using the DOM approach.
and then gives an example. I used it to read a relatively large file with 700K rows. This is my code so far:
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(path, false))
{
    WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
    // no other code
}
When I start my program, I see how quickly (in just five seconds) it runs out of memory (>1 GB). The debugger points to this line of code:
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
So, I need to know whether OpenXML really helps to read large files. And, if not, what are the alternatives (Interop does not help - I've already checked it).
EDIT
One more mysterious thing. This is the code I have now:
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
while (reader.Read())
{
    if (reader.ElementType == typeof(Row))
    {
        count++;
    }
}
gives me over a million rows in the count variable. However, I have only 14K rows on the first sheet and 700K on the second sheet, which is very strange. So, my extra question is how to parse only rows with data using SAX approach.
And one final mystery of reading large Excel files with OpenXML. Someone in this thread says that: "Turns out that the worksheets are enumerated backwards for some reason (so the first of my three sheets is actually index 3". So, my final extra question is how to get the sheet you want. At the moment I use this code:
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
But given what they say, I'm not sure that in my case I would actually get the first worksheet.
You seem to have a few questions, I'll try and tackle them one-by-one.
So, I need to know whether OpenXML really helps to read large files. And, if not, what are the alternatives (Interop does not help - I've already checked it).
Yes, the OpenXML SDK is great for reading large files, but you may need to use a SAX approach rather than a DOM approach. From the same documentation you cite:
However, the DOM approach requires loading entire Open XML parts into memory, which can cause an Out of Memory exception when you are working with really large files.... Consider using SAX when you need to handle very large files.
The DOM approach loads the whole sheet into memory, which for a large sheet can cause out-of-memory exceptions. With the SAX approach you read each element in turn, which reduces memory consumption considerably.
So, my extra question is how to parse only rows with data using SAX approach
You are only getting the rows that have data (or at least the rows that exist in the XML) using the SDK. You appear to have asked this as a separate question, which I've answered in more detail, but essentially the code in your question sees both the start and the end of each row element. See my answer to your "Why does OpenXML read rows twice" question for more details.
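A minimal sketch of the SAX-style loop with that fix, assuming the DocumentFormat.OpenXml package; `SaxRows` and `CountRows` are hypothetical names. OpenXmlReader.Read() stops at both the start and the end tag of each element, so checking IsStartElement counts every row once, and the sheet XML is streamed rather than loaded into a DOM.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class SaxRows
{
    // Counts <row> elements with the forward-only OpenXmlReader (SAX-style),
    // so memory use does not grow with the size of the sheet.
    public static int CountRows(WorksheetPart worksheetPart)
    {
        int count = 0;
        using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
        {
            while (reader.Read())
            {
                // Read() reports start and end tags; count only the starts,
                // otherwise every row is counted twice.
                if (reader.ElementType == typeof(Row) && reader.IsStartElement)
                    count++;
            }
        }
        return count;
    }
}
```

The WorksheetPart passed in can be the one found by sheet name, so only the sheet you actually want is streamed.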
So, my final extra question is how to get the sheet you want.
You need to find the Sheet by name; it is a descendant of the Workbook. Once you have it, you can use its Id to get the WorksheetPart:
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filename, false))
{
    WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
    // FirstOrDefault returns null (instead of throwing) when no sheet has that name
    Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().FirstOrDefault(s => s.Name == sheetName);
    if (sheet != null)
    {
        WorksheetPart worksheetPart = workbookPart.GetPartById(sheet.Id) as WorksheetPart;
        //read worksheetPart...
    }
}

How to get the 'first' sheet in OOXML with C# and the SDK?

SO! :) Simple question -- it's probably been asked, but I could not find it.
I am retrieving data from an XLSX using the Open XML SDK and C#.
I want to get the "first" sheet (as in the first one you would see in Excel), but when I use...
WorkbookPart wbPart = workBook.WorkbookPart;
//Now let's find the dimension of the first worksheet
string sheetArea = wbPart.WorksheetParts.First().Worksheet.SheetDimension.Reference.Value;
Unfortunately, in a brand-new XLSX this pulled "Sheet3" instead of "Sheet1". I do not know the sheet name ahead of time, nor can I force the user to submit a workbook with only one sheet or to specify the sheet name. My present requirement is to grab the first sheet.
Can someone please help? :)
EDIT: I figured it out! But I can't answer my own question for 7 hours, so...
I found this by digging through answers on this other SO question:
Open XML SDK 2.0 - how to update a cell in a spreadsheet?
In essence, a working example might be this:
(wbPart.GetPartById(wbPart.Workbook.Sheets.Elements<Sheet>().First().Id.Value) as WorksheetPart).Worksheet.SheetDimension.Reference.Value
As far as I know, something like:
Sheet firstSheet = wbPart.Workbook.Descendants<Sheet>().First();
Worksheet firstWorksheet = ((WorksheetPart)wbPart.GetPartById(firstSheet.Id)).Worksheet;
should return the first worksheet. The workbook's Sheet descendants should always be in the order the sheets appear in the workbook, at least in my experience.
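If you wish to get the first visible sheet, use the following (a Sheet without a state attribute is visible, so State can be null):
Sheet firstSheet = wbPart.Workbook.Descendants<Sheet>()
    .First(s => s.State == null || s.State.Value == SheetStateValues.Visible);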
