When I send data to Excel it ignores the merged "property" of some cells and just writes to the first cell it finds. So assuming I have column A and column B merged and I am sending data to column A and C, it actually splits the merged column so I am left with an empty column B.
Here is some code for context (some variables have been kept generic):
Range cells = this.Worksheet.Cells;
Range cell = (Range)cells[rowIndex, columnIndex];
Boolean merged = (Boolean)cell.MergeCells; // Here I am trying to determine if the cell is merged.
My problem is that .MergeCells always returns false. What am I doing wrong here? I know that in the Excel worksheet the cells are merged.
The problem is that you are casting to a boolean, and MergeCells is not guaranteed to give you back a boolean, as outlined in this more recent question: "How to detect merged cells in C# using MS Interop Excel". You also need to check for a null value; see the linked question for how to do that.
Hypothesis
So what's probably happening to your code is the null value casts back to false, even though what the null value actually indicates is that there are merged cells in the range.
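As a minimal sketch of that null check, the raw value could be inspected before any cast. The helper name `FromMergeCells` is hypothetical, and the sketch assumes the "mixed" case arrives through interop as `DBNull.Value` or `null`, which I have not verified against every Excel version:

```csharp
using System;

public static class MergeState
{
    // Interprets the raw object returned by Range.MergeCells.
    // Returns true (all cells merged), false (no cells merged),
    // or null (mixed or unknown).
    // Assumption: the mixed case is reported as DBNull.Value or null.
    public static bool? FromMergeCells(object raw)
    {
        if (raw == null || raw is DBNull)
            return null;
        if (raw is bool b)
            return b;
        // Any other type is unexpected; report "unknown" instead of throwing.
        return null;
    }
}
```

It would be called as `bool? merged = MergeState.FromMergeCells(cell.MergeCells);`, so a mixed range shows up as `null` instead of being forced into `false` or failing the cast.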
The answer is: Your code is correct.
Boolean merged = (Boolean)cell.MergeCells; //Cast from dynamic{bool} to bool
This works for me (Excel 2013 on Windows 7).
I have noticed both true and false values in my own tests.
So maybe your worksheet's cells just DO NOT CONTAIN a merged cell!?
This is my first post, so sorry if it doesn't look good or if the formatting is weird.
Anyways, I need to find a way to get the correct value of numeric (non-string) cells using OpenXML, but with some spreadsheets my current method doesn't seem to work.
There seems to be a difference in how I need to code for varying spreadsheets.
Once I've accessed an Excel file and have opened one of its Sheets I get the number of columns and rows for the current sheet and begin my loops. Within these loops is the code below that is used to add each cell to the datatable using the Cell Reference (made using the parent loops) as the guide. This is the looped through code:
// making the cell index/location of the cell (for example: B10, E17, AE14, ...) based on our row/column indexing so far
cellRef = ConvertColNumToLetter(startColNum + column) + (y + StartRow).ToString();
// getting the specific cell at the cell index of our cellRef
workingCell = cells.Where(c => c.CellReference == cellRef).FirstOrDefault();
// using a try-catch to solve the issue of the program failing whenever the cell has no inner text to take
try
{
    if (workingCell.InnerText == null)
        value = "";
    else
    {
        value = workingCell.InnerText;
    }
    // if the cell's data is a string
    if (workingCell.DataType != null && workingCell.DataType.Value == CellValues.SharedString)
    {
        var stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
        if (stringTable != null)
            value = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
        dataRow[colIterator] = value.ToString();
        colIterator++;
    }
    // if the cell's data is numeric
    else if (workingCell.CellValue != null)
    {
        value = workingCell.CellValue.InnerText.ToString();
        dataRow[colIterator] = value;
        colIterator++;
    }
    // if the cell doesn't have any data
    else
    {
        dataRow[colIterator] = "";
        colIterator++;
    }
}
// this will only fail if the cell didn't have any data/the data is null
catch
{
    dataRow[colIterator] = "";
    colIterator++;
}
My code works just fine for 99% of the spreadsheets it encounters. My only problem is with numeric values not being accessed correctly on one specific spreadsheet but it seems to get the correct numeric values on all other spreadsheets.
With the normal spreadsheets, whatever numeric value that is stored in the formula is returned. As an example, here is a cell's value from one such spreadsheet: "31313308.87". When my program reaches that cell, here is the workingCell's details (the workingCell is the cell at the current cell reference location):
which as you can see are the values that I would want and would go through the program and store the correct values into the datatable.
But the spreadsheet that's giving me issues is strange. When I get to one of the numeric cells I end up just getting the completely wrong values. Opening up the actual spreadsheet, there is a numeric cell with a value of "502028.6", but my program gets this in the workingCell's details:
I'm not sure where the "279907" comes from or how I should go about getting the correct values.
Here's another strange thing. The next three values of the spreadsheet are zeros which all return 0 in the CellValue and InnerText of the workingCell. The next value on the spreadsheet is "-160", and in the workingCell's details it shows:
1500 in the CellValue and InnerText?
Again, this is the only spreadsheet that this is happening on (or at least the only one that I know of). The program returns the correct string values as well, but on this one specific spreadsheet, none of the numeric values seem to be picked up with my program (despite it working and returning the correct numeric values for the hundreds of other spreadsheets I've used it on so far).
Is there some simple fix that I'm just not seeing? I would love any pointers or suggestions you may have for me, and I will be happy to clarify anything if you need some additional information.
Thank you for all of your help in advance!
--EDIT:
I forgot to specify when I asked the question, but the incorrect data (the incorrect CellValues and InnerTexts) is actually successfully processed by the else if branch and then added to the datatable. I don't have any problem as far as adding values to the datatable goes; I just can't seem to get the correct CellValues and InnerTexts for numeric cells on the one specific spreadsheet.
I have two sheets (Sheet1 and Sheet2) in my Excel workbook. I want to copy row data from Sheet2 to Sheet1.
The condition is:
If a row copied from Sheet2 doesn't already exist in Sheet1, paste it; otherwise don't paste the row.
Copying all rows except the first row of Sheet2:
Range dataWithoutFirstRow = xlAccrualSheet.Range[xlAccrualSheet.UsedRange.Cells[2, 1],
xlAccrualSheet.UsedRange.SpecialCells(XlCellType.xlCellTypeLastCell)];
dataWithoutFirstRow.Copy();
Pasting below the used range in Sheet1:
Range DataRange = xlAccrualWorkSheet.Cells[emptycell, 1];
DataRange.PasteSpecial(XlPasteType.xlPasteAllUsingSourceTheme);
Please tell me how to check whether a row already exists in Sheet1.
I look forward to your response.
Please tell me how to check whether a row already exists in Sheet1.
Use a Range to read out all the data from the target worksheet. You are already doing something similar with xlAccrualSheet. Then you can use range.Cells[i, j], range.Rows[r].Value, or even range.Rows[r].Cells[i, j] to get the data inside each specific row.
When pasting occurs, loop over all pasted rows, and for each pasted row compare it with rows from workSheet. You may do it directly by reading that on the fly (as mentioned above), or you may read all rows from workSheet and store them in a List and compare incoming rows against that list - it will work much faster that way.
Now, one of the most interesting questions is probably what it means to "compare rows". Either you compare all cells within a row against another row, or you compare just a specific set of "columns" such as "date, time, cause, origin, caseId", etc. But no one knows which, because you said nothing about that. If there's no information on that, then you should probably compare whole rows and assume that any difference in any cell means that the rows are different.
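The "store them in a List and compare" idea can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical, rows are assumed to have already been read out of the worksheets into `object[]` arrays, and whole rows are compared cell by cell after converting each cell to text.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class RowDeduplicator
{
    // Builds a comparison key from every cell in a row.
    // Null cells become empty strings; the unit separator keeps
    // { "ab", "c" } distinct from { "a", "bc" }.
    public static string RowKey(IEnumerable<object> cells)
    {
        return string.Join("\u001F",
            cells.Select(c => Convert.ToString(c, CultureInfo.InvariantCulture) ?? ""));
    }

    // Returns the incoming rows whose key is not among the existing rows.
    // Duplicates inside incomingRows are also dropped after their first occurrence.
    public static List<object[]> RowsNotIn(
        IEnumerable<object[]> existingRows, IEnumerable<object[]> incomingRows)
    {
        var seen = new HashSet<string>(existingRows.Select(r => RowKey(r)));
        var result = new List<object[]>();
        foreach (var row in incomingRows)
        {
            // HashSet.Add returns false when the key is already present.
            if (seen.Add(RowKey(row)))
                result.Add(row);
        }
        return result;
    }
}
```

Using a `HashSet` of keys means each incoming row costs one lookup instead of a scan over every row of Sheet1. If only some columns identify a row, `RowKey` would be given just those cells.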
I am writing a C# application that reads data from an Excel file. Everything was running smoothly until I attempted to read from a cell that used a formula.
I am pulling data from the sheet and trying to add the cumulative quantity, so in a loop, I'm using:
cntr = Cell(row, column);
NOTE: I'm paraphrasing rather than copy my actual code.
Anyways, if the actual cell value contains a number, this works, but if the cell contains a function, it returns the string
"=SUM(A1:A5)"
and I'm not sure how I can execute this in my C# code to retrieve the actual value of that cell.
Try
Cell(a,b).Value
instead of just Cell(a,b).
Also, the following approach should work
Excel.Range objRange = (Excel.Range)objSheet.Cells[rowN,colN];
variableName = objRange.get_Value(System.Type.Missing).ToString();
You may modify it for your data type.
This code is supposed to get all rows in the range that I specify, and delete ONLY the rows with no cell DATA in them. It's actually deleting every row in the range though. Why?
Range range = _sheet.get_Range("A25:A542", Type.Missing);
range = range.EntireRow;
range.Delete(Type.Missing);
Type.Missing doesn't mean what you think it means. Type.Missing is a COM artefact - it just tells the Excel object that you're not providing that particular parameter. It's the kind of thing that's normally taken care of for you in VB.NET and VBA. C# 4.0 has support for optional parameters, which makes things much easier.
You do not check whether any DATA exists, so the program deletes every row from row 25 through row 542.
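A sketch of the missing check, separated from the Excel calls so it can be reasoned about on its own. The names here are hypothetical, and it assumes the cell values of each row have already been read into `object[]` arrays (for example from the range's values); it only decides which row numbers are safe to delete.

```csharp
using System;
using System.Collections.Generic;

public static class EmptyRowFinder
{
    // A cell counts as empty when it is null or a zero-length string.
    // Assumption: whitespace-only strings are treated as data here.
    private static bool IsEmpty(object cell)
    {
        return cell == null || (cell is string s && s.Length == 0);
    }

    // rows[i] holds the cell values of worksheet row (firstRowNumber + i).
    // Returns the worksheet row numbers whose cells are all empty, highest
    // first, so deleting in that order does not shift rows still to be deleted.
    public static List<int> EmptyRowsBottomUp(IReadOnlyList<object[]> rows, int firstRowNumber)
    {
        var result = new List<int>();
        for (int i = rows.Count - 1; i >= 0; i--)
        {
            if (Array.TrueForAll(rows[i], IsEmpty))
                result.Add(firstRowNumber + i);
        }
        return result;
    }
}
```

With `firstRowNumber` set to 25, each returned number would then be deleted individually through the interop API, rather than calling Delete on the whole A25:A542 range.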
I'm reading an Excel document via the DocumentFormat.OpenXml library. Is there a good way to find out how many columns it has?
The current code, which I've just come across while investigating a bug, does this:
public string getMaxColumnName(SheetData aSheetData)
{
    string lLastCellReference = aSheetData.Descendants<Cell>().Last().CellReference.InnerText;
    // IndexOfAny returns the position of the first digit, i.e. where the row number starts
    int lRowNumberIndex = lLastCellReference.IndexOfAny(new char[] { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' });
    return lLastCellReference.Substring(0, lRowNumberIndex);
}
In English: find the last cell in the sheet, get its cell reference (like "CB99"), and retrieve everything before the first digit. The problem is that the last cell in the sheet is not necessarily in the rightmost column.
I have a test sheet that is a neat, rectangular table. It has 1000 rows filling columns A through M, so the function is supposed to return the string "M". But because there is an extraneous space character in cell C1522, that's counted as the last cell, so the function reports the max column as "C".
My initial impulse was to just replace that Last() call with some kind of Max(columnNumber). However, Cell apparently does not expose an actual column number, only this composite CellReference string. I don't think I want to be doing string-splitting inside a predicate there.
Is there a way to find the sheet's rightmost column, without having to parse the CellReference of every single cell?
As I understand the format, there are various cases:
If the file is not generated by Excel and the worksheet contains data with no blank rows and no blank columns within a row, but not necessarily the same number of columns in every row (which may be your case):
You are pretty much screwed. The format allows row and cell references to be omitted in this case, so you have to count the cells in each row and take the maximum.
If the file is not generated by Excel, but cells are populated sparsely (which apparently is not your case):
The last cell of each row holds its column reference in the "r" attribute. You will have to convert the reference, though.
If the file is generated by Excel:
Usually (I haven't found an Excel-generated file that doesn't), the worksheet part has a child element named dimension, whose "ref" attribute holds the cell range used by the worksheet, e.g. "A1:M1001". It is then only a matter of reading this to know the columns. Of course, this works only if the extraneous character does not sit in a column beyond the table.
Alternatively, every row usually has an attribute called "spans" (every Excel-generated file I have seen has it) that lists the columns the row uses. The "spans" format is numeric, so in your example it would have the value "1:13" for every row in the table. Maybe you only have to check the first row this way.
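To illustrate the two Excel-generated cases, here is a small sketch that works purely on the attribute strings. The class and method names are hypothetical, and obtaining the "ref" and "spans" text from the OpenXML objects is left out; the sketch only shows the conversion.

```csharp
using System;
using System.Globalization;

public static class SheetColumns
{
    // Converts column letters to a 1-based number: "A" -> 1, "Z" -> 26, "AA" -> 27.
    public static int ColumnLettersToNumber(string letters)
    {
        if (string.IsNullOrEmpty(letters))
            throw new ArgumentException("No column letters given.", nameof(letters));
        int number = 0;
        foreach (char ch in letters.ToUpperInvariant())
        {
            if (ch < 'A' || ch > 'Z')
                throw new ArgumentException("Not a column letter: " + ch, nameof(letters));
            number = number * 26 + (ch - 'A' + 1);
        }
        return number;
    }

    // Extracts the column letters from a reference such as "M1001" or "$M$1001".
    public static string ColumnLettersOf(string cellReference)
    {
        string trimmed = cellReference.Replace("$", "");
        int i = 0;
        while (i < trimmed.Length && char.IsLetter(trimmed[i]))
            i++;
        return trimmed.Substring(0, i);
    }

    // Dimension "ref" text: "A1:M1001" -> 13; a single cell "A1" -> 1.
    public static int LastColumnFromDimension(string dimensionRef)
    {
        string[] parts = dimensionRef.Split(':');
        return ColumnLettersToNumber(ColumnLettersOf(parts[parts.Length - 1]));
    }

    // Row "spans" text: "1:13" -> 13. Several space-separated spans may occur,
    // so the largest end column is returned.
    public static int LastColumnFromSpans(string spans)
    {
        int max = 0;
        foreach (string span in spans.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] bounds = span.Split(':');
            max = Math.Max(max, int.Parse(bounds[bounds.Length - 1], CultureInfo.InvariantCulture));
        }
        return max;
    }
}
```

For the example table, both `LastColumnFromDimension("A1:M1001")` and `LastColumnFromSpans("1:13")` give 13, i.e. column M. Note that the stray cell C1522 would enlarge the dimension's row range but not its column range.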
I have concluded that this is the wrong thing to do in the first place. The consuming code is never actually looking for the rightmost cell in the whole sheet. Generally, what it wants is the number of cells in a particular row-- either row 1, or a known table header location.
In fact, with the possible exception of rendering or printing, I can't come up with any situation where getting the whole sheet's max cell is necessary.
Therefore, I need to refactor slightly. I'm changing the function so it takes a sheet and a row index, and returns the column of the rightmost cell in that row. That is, it will now look like:
public string getMaxColumnIndex(SheetData aSheetData, int aRowIndex);
For the implementation of that, I can check the Row.Spans property when it exists, or else parse the cell reference of Row.ChildElements.Last().
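A rough sketch of that per-row logic, again on plain strings so it does not depend on how the row is fetched. Everything named here is hypothetical: `spans` stands for the text of the row's spans attribute (null when absent) and `lastCellReference` for the reference of the row's last child cell.

```csharp
using System;
using System.Globalization;

public static class RowColumns
{
    // Converts a 1-based column number to letters: 1 -> "A", 26 -> "Z", 27 -> "AA".
    public static string ColumnNumberToLetters(int number)
    {
        if (number < 1)
            throw new ArgumentOutOfRangeException(nameof(number));
        string letters = "";
        while (number > 0)
        {
            int remainder = (number - 1) % 26;
            letters = (char)('A' + remainder) + letters;
            number = (number - 1) / 26;
        }
        return letters;
    }

    // Prefers the spans text when present; otherwise falls back to the
    // letters in front of the first digit of the last cell's reference.
    // Returns an empty string when neither source is available.
    public static string MaxColumnName(string spans, string lastCellReference)
    {
        if (!string.IsNullOrWhiteSpace(spans))
        {
            int max = 0;
            foreach (string span in spans.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            {
                string end = span.Substring(span.IndexOf(':') + 1);
                max = Math.Max(max, int.Parse(end, CultureInfo.InvariantCulture));
            }
            return ColumnNumberToLetters(max);
        }
        if (string.IsNullOrEmpty(lastCellReference))
            return "";
        int digitIndex = lastCellReference.IndexOfAny("0123456789".ToCharArray());
        return digitIndex < 0 ? lastCellReference : lastCellReference.Substring(0, digitIndex);
    }
}
```

The fallback still trusts that the last child cell of a row is its rightmost one, which holds for the files described above but is an assumption rather than something this sketch checks.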