I need to color Excel cells quickly. I found a method for writing values to Excel cells that is really fast for me, so I tried applying the same approach to coloring the cells. Consider the following code:
xlRange = xlWorksheet.Range["A6", "AS" + dtSchedule.Rows.Count];
double[,] colorData = new double[dtSchedule.Rows.Count, dtSchedule.Columns.Count];
for (var row = 0; row < dtSchedule.Rows.Count; row++)
{
for (var column = 0; column < dtSchedule.Columns.Count; column++)
{
if (column <= 3)
{
colorData[row, column] = GetLightColor2("#ffffff");
continue;
}
if (dtSchedule.Rows[row][column].ToString() != "#000000" && !string.IsNullOrEmpty(dtSchedule.Rows[row][column].ToString()))
{
string[] schedule = dtSchedule.Rows[row][column].ToString().Split('/');
string color = schedule[0].Trim();
colorData[row, column] = GetLightColor2(color);
continue;
}
colorData[row, column] = GetLightColor2("#000000");
}
}
xlRange.Interior.Color = colorData;
This is the GetLightColor2 function:
private double GetLightColor2(string hex)
{
return ColorTranslator.ToOle(ColorTranslator.FromHtml(hex));
}
When I ran the code, an error was thrown at
xlRange.Interior.Color = colorData;
With the following error:
System.Runtime.InteropServices.COMException (0x80020005): Type
mismatch. (Exception from HRESULT: 0x80020005 (DISP_E_TYPEMISMATCH))
at System.RuntimeType.ForwardCallToInvokeMember(String memberName,
BindingFlags flags, Object target, Int32[] aWrapperTypes, MessageData&
msgData) at Microsoft.Office.Interop.Excel.Interior.set_Color(Object
value)
The only workaround I could find is coloring the cells one at a time in a loop, which is really slow. Or am I doing it the wrong way?
Thank you for your kind attention guys.
If your question is not about an Excel add-in, I would strongly recommend following Akhil R J's advice. This will not be the last big problem you run into with interop; the technology is full of problems and bugs. If for some reason you cannot, here is what I can tell you about your problem:
1) There is no way to do what you want using arrays. That is possible only for values and formulas.
2) Set Application.ScreenUpdating = false while you set colors or do anything else with Excel, and set it back to true afterwards. Excel then stops redrawing the window during the operation, so things go faster.
3) If many cells have the same color, use Application.Union to build one range from the separate cells of that color. It is only effective for combining up to about 50 cells at a time; with more, the union operation itself takes too long. After that, set the color once on the whole combined range. This is pretty effective, around 5-10 times faster in my case.
4) There is another way, a difficult one. I am going to try it myself for the same problem (I have an add-in, so I cannot just switch to OpenXML). Using interop, you can copy the target range to the Windows clipboard. In the clipboard it is stored in many formats, including something OpenXML-like. So you can edit it in the clipboard and paste it back, using interop again. I think it is the fastest way, but writing this code must be very time-consuming.
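Points 2 and 3 can be sketched roughly as follows. This is only a sketch under stated assumptions: `ColorBatcher`, `Batch` and `Apply` are hypothetical names, the batch size of 50 is the figure quoted above, and the Excel objects are passed as `dynamic` so the example compiles without the interop assembly (with the typed interop classes the same `Union`, `Interior.Color` and `ScreenUpdating` members apply).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColorBatcher
{
    // Groups (row, column) array positions by color and splits each group
    // into batches of at most batchSize cells. Pure C#, no Excel involved.
    public static List<KeyValuePair<int, List<Tuple<int, int>>>> Batch(int[,] oleColors, int batchSize = 50)
    {
        var byColor = new Dictionary<int, List<Tuple<int, int>>>();
        for (int r = 0; r < oleColors.GetLength(0); r++)
        {
            for (int c = 0; c < oleColors.GetLength(1); c++)
            {
                List<Tuple<int, int>> cells;
                if (!byColor.TryGetValue(oleColors[r, c], out cells))
                {
                    cells = new List<Tuple<int, int>>();
                    byColor[oleColors[r, c]] = cells;
                }
                cells.Add(Tuple.Create(r, c));
            }
        }

        var batches = new List<KeyValuePair<int, List<Tuple<int, int>>>>();
        foreach (var pair in byColor)
        {
            for (int start = 0; start < pair.Value.Count; start += batchSize)
            {
                var chunk = pair.Value.Skip(start).Take(batchSize).ToList();
                batches.Add(new KeyValuePair<int, List<Tuple<int, int>>>(pair.Key, chunk));
            }
        }
        return batches;
    }

    // Applies the batches to a worksheet. app and sheet are the Excel
    // Application and Worksheet COM objects; firstRow/firstCol are the
    // 1-based sheet coordinates of oleColors[0, 0] (an assumption).
    public static void Apply(dynamic app, dynamic sheet, int[,] oleColors, int firstRow, int firstCol)
    {
        app.ScreenUpdating = false;
        try
        {
            foreach (var batch in Batch(oleColors))
            {
                dynamic union = null;
                foreach (var cell in batch.Value)
                {
                    dynamic range = sheet.Cells[firstRow + cell.Item1, firstCol + cell.Item2];
                    union = union == null ? range : app.Union(union, range);
                }
                union.Interior.Color = batch.Key; // one color assignment per batch
            }
        }
        finally
        {
            app.ScreenUpdating = true; // always restore redrawing
        }
    }
}
```

For the question's data, `Apply(xlApp, xlWorksheet, colors, 6, 1)` would start at A6, where `xlApp` and `colors` (an `int[,]` of OLE colors) are assumed names. Whether 50 is the right batch size for a given workbook is worth measuring.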
My goal is to find all the cells in an Excel file that contain a specific text. The file is quite large (about 2 MB) and has about 22 sheets. Historically we had problems with Interop, so I found IronXL, and I love the way it operates.
The problem is that at some point memory usage rises above 2 GB, and of course it's very slow.
I'm aware of the materialization issue, so I'm trying to avoid ToList() or Count() when using LINQ.
The first "problem" I found with IronXL is that the Cell class doesn't have any field specifying the name of the sheet that contains it, so I divided the code into 2 sections:
The LINQ query to find all the cells containing the text
Then I iterate over the previous results to store the desired cell info plus the name of the sheet where it was found in my custom class MyCell
The custom class:
class MyCell
{
public int X;
public int Y;
public string Location;
public string SheetName;
public MyCell(int x, int y, string location, string sheetName)
{
X = x;
Y = y;
Location = location;
SheetName = sheetName;
}
}
Here is my code:
List<MyCell> FindInExcel(WorkBook wb, string textToFind)
{
List<MyCell> res = new List<MyCell>();
var cells = from sheet in wb.WorkSheets
from cell in sheet
where cell.IsText && cell.Text.Contains(textToFind)
select new { cell, sheet };
foreach (var cell in cells)
{
res.Add(new MyCell(cell.cell.ColumnIndex, cell.cell.RowIndex, cell.cell.Location, cell.sheet.Name));
}
return res;
}
To test my method, I call:
WorkBook excel = WorkBook.Load("myFile.xlsx");
var results = FindInExcel(excel, "myText");
What happens when I execute and debug the code is very weird. The LINQ query executes very fast, and in my case I get 2 results. Then the foreach starts iterating, and the first 2 times the values are added to the list, so everything is perfect. But the 3rd time, when it checks whether another item is available, memory reaches 2 GB and it takes about 10 seconds.
I observed the same behaviour when I do this:
int count = cells.Count();
I'm aware this materializes the results, but what I don't understand is why I get the first 2 results in the foreach so fast, and the memory only increases at the last step.
Seeing this behavior, it seems clear that the code somehow knows how many items it has found without having to call Count(); otherwise it would be slow the first time the foreach is entered.
Just to check that I wasn't going crazy, I put this small piece of code in the FindInExcel method:
int cnt = 0;
foreach (var cell in cells)
{
res.Add(new MyCell(cell.cell.ColumnIndex, cell.cell.RowIndex, cell.cell.Location, cell.sheet.Name));
cnt++;
if (cnt == 2)
break;
}
In this last case, I don't have the memory issue, and I end up with a List of 2 items containing the cells I want.
What am I missing? Is there any way to do what I'm trying to do without materializing the results? I even tried to move to the .NET Framework 4.8.1 to see if some bug was fixed, but I'm getting the same behavior.
Note: If I use this code in a small Excel, it runs very fast.
Thank you in advance!
I found the issue. One sheet had a hidden formula extended down to the last cell (M1048576), so the search was running over all of those cells. Once I removed it, the memory issue went away.
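The pattern seen in the question (two fast results, then a long pause) is what LINQ's deferred execution produces when the matches sit near the start of a very long sequence: each match is yielded as soon as it is found, but confirming that there are no more matches requires scanning everything that is left. A minimal sketch with plain LINQ, no IronXL involved; `LazyScanDemo`, `Cells` and `Trace` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyScanDemo
{
    // Number of source items the query has pulled so far.
    public static int Visited;

    // Stand-in for a sheet: 'total' cells, only the first two contain the text.
    static IEnumerable<string> Cells(int total)
    {
        for (int i = 0; i < total; i++)
        {
            Visited++;
            yield return i < 2 ? "myText" : "";
        }
    }

    // Records how many cells had been visited when each match was yielded,
    // followed by the count once the enumeration has finished.
    public static List<int> Trace(int total)
    {
        Visited = 0;
        var trace = new List<int>();
        var matches = Cells(total).Where(c => c.Contains("myText"));
        foreach (var match in matches)
            trace.Add(Visited);
        trace.Add(Visited);
        return trace;
    }
}
```

`LazyScanDemo.Trace(1000)` returns 1, 2, 1000: the two matches arrive after one and two cells, and the rest of the work happens in the final step. Breaking out of the loop after the second match, as in the question, skips that final scan.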
Thank you guys!
I'm working with .NET Framework 4.7.2 and Windows Forms.
I have a DataGridView and I have managed to generate a PowerPoint file (.pptx).
I made a first slide, and I'd like to put the DataGridView content on the second slide in a form that lets me edit the data inside PowerPoint.
Microsoft.Office.Interop.PowerPoint.Application pptApp = new Microsoft.Office.Interop.PowerPoint.Application();
pptApp.Visible = Microsoft.Office.Core.MsoTriState.msoTrue;
Microsoft.Office.Interop.PowerPoint.Slides slides;
Microsoft.Office.Interop.PowerPoint._Slide slide;
Microsoft.Office.Interop.PowerPoint._Slide slide2;
Microsoft.Office.Interop.PowerPoint.TextRange objText;
// Create File
Presentation pptPresentation = pptApp.Presentations.Add(Microsoft.Office.Core.MsoTriState.msoTrue);
CustomLayout customLayout = pptPresentation.SlideMaster.CustomLayouts[PpSlideLayout.ppLayoutText];
// new Slide
slides = pptPresentation.Slides;
slide = slides.AddSlide(1, customLayout);
slide2 = slides.AddSlide(2, customLayout); // index 2 so that this becomes the second slide
// title
objText = slide.Shapes[1].TextFrame.TextRange;
objText.Text = "Bonds Screener Report";
objText.Font.Name = "Haboro Contrast Ext Light";
objText.Font.Size = 32;
Shape shape1 = slide.Shapes[2];
slide.Shapes.AddPicture("C:\\mylogo.png", Microsoft.Office.Core.MsoTriState.msoFalse, Microsoft.Office.Core.MsoTriState.msoTrue, shape1.Left, shape1.Top, shape1.Width, shape1.Height);
slide.NotesPage.Shapes[2].TextFrame.TextRange.Text = "Disclaimer";
dataGridViewBonds.ClipboardCopyMode = DataGridViewClipboardCopyMode.EnableAlwaysIncludeHeaderText;
dataGridViewBonds.SelectAll();
DataObject obj = dataGridViewBonds.GetClipboardContent();
Clipboard.SetDataObject(obj, true);
Shape shapegrid = slide2.Shapes[2];
I know I'm not far off, but I'm missing something. Any help would be appreciated!
I am familiar with Excel interop and have used it many times and most likely have become numb to the awkward ways in which interop works. Using PowerPoint interop can be very frustrating for numerous reasons, however, the biggest I feel is the lack of documentation and the differences between the different MS versions.
In addition, I looked for a third-party PowerPoint library and “Aspose” looked like the only option, unfortunately it is not a “free” option. I will assume there is a free third-party option and I just did not look in the right place… Or there may be a totally different way to do this possibly with XML. I am confident I am preaching to the choir.
Therefore, what I have been able to put together may work for you. For starters, looking at your current posted code, there is one part missing that you need to get the “copied” grid cells into the slide…
slide.Shapes.Paste();
This will paste the “copied” cells from the grid into an “unformatted” table in the slide. It will also copy the “row header” if it is displayed in the grid, as well as the “new row” if the grid's AllowUserToAddRows is set to true. If this “unformatted paste” works for you, then you are good to go.
If you prefer to have at least a minimally formatted table and ignore the row headers and last empty row… It may be easier to simply “create” a new Table in the slide with the size we want along with the correct number of rows and columns. Granted, this may be more work, however, using the paste is going to require this anyway “IF” you want the table formatted.
The method (below) takes a PowerPoint _Slide and a DataGridView. The code “creates” a new Table in the slide based on the number of rows and columns in the given grid. With this approach, the table will be “formatted” using the default “Table Style” in the presentation. So, this may give you the formatting you want by simply “creating” the table as opposed to “pasting” the table.
I have tried to “apply” one of the existing “Table Styles” in PowerPoint, however, the examples I saw used something like…
table.ApplyStyle("{5C22544A-7EE6-4342-B048-85BDC9FD1C3A}");
Which uses a GUID id to identify “which” style to use. I am not sure why MS decided on this GUID approach… this is beyond me, and it worked for “some” styles but not all.
Also, more common-sense solutions that showed something like…
table.StylePreset = TableStylePreset.MediumStyle2Accent2;
Unfortunately using my 2019 version of Office PowerPoint, this property does not exist. I have abandoned further research on this as it appears to be version dependent. Very annoying!
Given this, it may be easier if we format the cells individually as we want. We will need to add the cells text from the grid into the individual cells anyway, so we could also format the individual cells at the same time. Again, I am confident there is a better way, however, I could not find one.
Below the InsertTableIntoSlide(_Slide slide, DataGridView dgv) method takes a slide and a grid as parameters. It will add a table to the slide with data from the given grid. A brief code trace is below.
First, the total number of rows in the grid (not including the headers) is stored in totRows. If the grid's AllowUserToAddRows is true, totRows is decremented by 1 to ignore the new row. Next, the number of columns in the grid is stored in totCols. The top-left point of the table is defined by topLeftX and topLeftY, which position the table in the slide, along with the table's width and height.
ADDED NOTE: Using the AllowUserToAddRows property to determine the number of rows … may NOT work as described above and will “miss” the last row… “IF” AllowUserToAddRows is true (default) AND the grid is data bound to a data source that does NOT allow new rows to be added. In that case you do NOT want to decrement the totRows variable.
Next a “Table” “Shape” is added to the slide using the previous variables to define the base table dimensions. Next are two loops. The first loop adds the header cells to the first row in the table. Then a second loop to add the data from the cells in the grid… to the table cells in the slide.
The commented-out code is left as an example in case you want to do some specific formatting for the individual cells. This was not needed in my case since the “default” table style was close to the formatting I wanted.
Also, note that “ForeColor” is the “background” color of the cell/shape. Strange!
I hope this helps. And again, I could not sympathize more about having to use PowerPoint interop.
private void InsertTableIntoSlide(_Slide slide, DataGridView dgv) {
try {
int totRows;
if (dgv.AllowUserToAddRows) {
totRows = dgv.Rows.Count - 1;
}
else {
totRows = dgv.Rows.Count;
}
int totCols = dgv.Columns.Count;
int topLeftX = 10;
int topLeftY = 10;
int width = 400;
int height = 100;
// add extra row for header row
Shape shape = slide.Shapes.AddTable(totRows + 1, totCols, topLeftX, topLeftY, width, height);
Table table = shape.Table;
for (int i = 0; i < dgv.Columns.Count; i++) {
table.Cell(1, i+1).Shape.TextFrame.TextRange.Text = dgv.Columns[i].HeaderText;
//table.Cell(1, i+1).Shape.Fill.ForeColor.RGB = ColorTranslator.ToOle(Color.Blue);
//table.Cell(1, i+1).Shape.TextFrame.TextRange.Font.Bold = Microsoft.Office.Core.MsoTriState.msoTrue;
//table.Cell(1, i+1).Shape.TextFrame.TextRange.Font.Color.RGB = ColorTranslator.ToOle(Color.White);
}
int curRow = 2;
for (int i = 0; i < totRows; i++) {
for (int j = 0; j < totCols; j++) {
if (dgv.Rows[i].Cells[j].Value != null) {
table.Cell(curRow, j + 1).Shape.TextFrame.TextRange.Text = dgv.Rows[i].Cells[j].Value.ToString();
//table.Cell(curRow, j + 1).Shape.Fill.ForeColor.RGB = ColorTranslator.ToOle(Color.LightGreen);
//table.Cell(curRow, j + 1).Shape.TextFrame.TextRange.Font.Bold = Microsoft.Office.Core.MsoTriState.msoTrue;
//table.Cell(curRow, j + 1).Shape.TextFrame.TextRange.Font.Color.RGB = ColorTranslator.ToOle(Color.Black);
}
}
curRow++;
}
}
catch (Exception ex) {
MessageBox.Show("Error: " + ex.Message);
}
}
I've been working on a quite complex C# VSTO project that does a lot of different things in Excel. However, I have recently stumbled upon a problem I have no idea how to solve. I'm afraid that putting the whole project here will overcomplicate my question and confuse everyone so this is the part with the problem:
// this is a simplified version of the Range declaration, which I am 100% confident in
Range range = worksheet.Range[firstCell, lastCell];
range.Formula = array;
// where array is an object[,] that basically contains only strings and also works perfectly fine
The last line, which is supposed to insert a [,] array into an Excel range, used to work for smaller Excel books, but now crashes for bigger books with a System.OutOfMemoryException: Insufficient memory to continue the execution of the program. I have no idea why: it used to work with arrays of 500+ elements in one dimension, whereas now it crashes for an array with under 400 elements. Furthermore, RAM usage is about 1.2 GB at the moment of the crash, and I know this project can run perfectly fine at ~3 GB.
I have tried the following things: inserting this array row by row, then inserting it cell by cell, calling GC.Collect() before each insertion of a row or a cell but it would nonetheless crash with a System.OutOfMemoryException.
So I would appreciate any help in solving this problem or identifying where the error could possibly be hiding, because I can't wrap my head around why it refuses to work for arrays with smaller length (but maybe with slightly bigger contents) at the RAM usage level of 1.2GBs which is like 1/3 of what it used to handle. Thank you!
EDIT
I've been told in the comments that the code above might be too sparse, so here is a more detailed version (I hope it's not too confusing):
List<object[][]> controlsList = new List<object[][]>();
// this list is filled with a quite long method calling a lot of other functions
// if other parts look fine, I guess I'll have to investigate it
int totalRows = 1;
foreach (var control in controlsList)
{
if (control.Length == 0)
continue;
var range = worksheet.GetRange(totalRows + 1, 1, totalRows += control.Length, 11);
//control is an object[n][11] so normally there are no index issues with inserting
range.Formula = control.To2dArray();
}
//GetRange and To2dArray are extension methods
public static Range GetRange(this Worksheet sheet, int firstRow, int firstColumn, int lastRow, int lastColumn)
{
var firstCell = sheet.GetRange(firstRow, firstColumn);
var lastCell = sheet.GetRange(lastRow, lastColumn);
return (Range)sheet.Range[firstCell, lastCell];
}
public static Range GetRange(this Worksheet sheet, int row, int col) => (Range)sheet.CheckIsPositive(row, col).Cells[row, col];
public static T CheckIsPositive<T>(this T returnedValue, params int[] vals)
{
if (vals.Any(x => x <= 0))
throw new ArgumentException("Values must be positive");
return returnedValue;
}
public static T[,] To2dArray<T>(this T[][] source)
{
if (source == null)
throw new ArgumentNullException(nameof(source));
int l1 = source.Length;
int l2 = source[0].Length; // Length is a property; all rows are assumed to be equally long
T[,] result = new T[l1, l2];
for (int i = 0; i < l1; ++i)
for (int j = 0; j < l2; ++j)
result[i, j] = source[i][j];
return result;
}
I am not 100% sure I have figured it out correctly, but the issue seems to lie in Interop.Excel/Excel limitations and the length of the formulas I'm trying to insert: whenever a formula approaches 8k characters, which is close to Excel's limit for formula contents (8,192 characters), the System.OutOfMemoryException pops up. When I left the lengthy formulas out, the program started working fine.
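One way to act on this is to look for over-long strings before the array is assigned to Range.Formula. The following is a minimal sketch, not the project's actual code: `FormulaGuard` and `FindTooLong` are hypothetical names, and 8,192 is Excel's documented limit on the length of formula contents.

```csharp
using System;
using System.Collections.Generic;

public static class FormulaGuard
{
    // Documented Excel limit on the length of formula contents.
    public const int MaxFormulaLength = 8192;

    // Returns the array indices (row, column) of every string entry longer
    // than the limit, so those cells can be shortened or skipped before
    // the array is written to the sheet.
    public static List<Tuple<int, int>> FindTooLong(object[,] cells, int limit = MaxFormulaLength)
    {
        var offenders = new List<Tuple<int, int>>();
        for (int r = cells.GetLowerBound(0); r <= cells.GetUpperBound(0); r++)
        {
            for (int c = cells.GetLowerBound(1); c <= cells.GetUpperBound(1); c++)
            {
                var text = cells[r, c] as string;
                if (text != null && text.Length > limit)
                    offenders.Add(Tuple.Create(r, c));
            }
        }
        return offenders;
    }
}
```

Calling `FormulaGuard.FindTooLong(control.To2dArray())` inside the loop from the question would show which rows carry the problematic formulas. A lower `limit` can be passed if the crash appears before the hard limit is reached.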
I have a C# Windows application that reads file contents. I want to extract values from the used rows only.
I am using this code:
int rows = ExcelWorksheet.UsedRange.Rows.Count;
Everything works fine, except that when there are empty rows at the top, the counting is incorrect.
- The file has no special characters, formulas, or the like. It contains just plain text.
- The application can read Excel xls and xlsx files with no issue if the file has no empty rows at the top.
Okay, now I've realized what I was doing wrong. Of course it did not read all of the UsedRange rows, because my for loop always started reading at the first row of the sheet. So I now take ((Microsoft.Office.Interop.Excel.Range)(ExcelWorksheet.UsedRange)).Row as the starting row.
This code works:
int rows = ExcelWorksheet.UsedRange.Rows.Count;
int fRowIndex = ((Microsoft.Office.Interop.Excel.Range)(ExcelWorksheet.UsedRange)).Row;
int rowCycle = 1;
for (int rowcounter = fRowIndex; rowCycle <= rows; rowcounter++, rowCycle++)
{
//code for reading
}
Instead of reading Excel row by row, it is better to pull the data into C# as a whole range, using
Sheet.UsedRange.get_Value()
for the whole UsedRange of the sheet. Whenever you want only part of the UsedRange, do it like this:
Excel.Range cell1 = Sheet.Cells[r0, c0];
Excel.Range cell2 = Sheet.Cells[r1, c1];
Excel.Range rng = Sheet.Range[cell1, cell2];
var v = rng.get_Value();
You know the size of v in C# from the corner coordinates: r1 - r0 + 1 rows by c1 - c0 + 1 columns.
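One detail to watch when walking such an array: for a multi-cell range, get_Value() returns an object[,] whose indices start at 1 rather than 0 (for a single cell it returns the bare value instead of an array). A minimal sketch that loops by bounds instead of assuming a base; `RangeValues`, `CountNonEmpty` and `CreateOneBased` are hypothetical names, and `CreateOneBased` only builds a stand-in array so the example runs without Excel:

```csharp
using System;

public static class RangeValues
{
    // Counts the non-empty entries of an array shaped like the result of
    // Range.get_Value() on a multi-cell range. Using the bounds makes the
    // loop correct for both 0-based and 1-based arrays.
    public static int CountNonEmpty(object[,] values)
    {
        int count = 0;
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
        {
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                if (values[r, c] != null)
                    count++;
            }
        }
        return count;
    }

    // Builds a 1-based object[,] without Excel, for experimenting.
    public static object[,] CreateOneBased(int rows, int cols)
    {
        return (object[,])Array.CreateInstance(
            typeof(object), new[] { rows, cols }, new[] { 1, 1 });
    }
}
```

With the interop objects from above, the call would be `RangeValues.CountNonEmpty((object[,])rng.get_Value())`.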
When I use the Excel Reader it reads in everything fine except for time stamps. It turns, for example, 15:59:35 into .67290509259259268
How do I stop this from happening?
object[,] valueArray = (object[,])excelRange.get_Value(XlRangeValueDataType.xlRangeValueDefault);
That is my array that is holding the values that are read in from the excel sheet. Not sure if that is the reason.
Try DateTime.FromOADate - however, the numeric value you mentioned in the question doesn't actually correspond to the time you mentioned.
The reason for this is that Excel stores all its DateTimes as floating-point numbers. The fractional part is the time component, while the integer part represents the date.
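A minimal sketch of the conversion with DateTime.FromOADate, which is part of .NET; `ExcelTime` and `ToTimeString` are hypothetical names used only for illustration:

```csharp
using System;
using System.Globalization;

public static class ExcelTime
{
    // Converts an Excel serial number (whole days plus a fraction for the
    // time of day) to an "HH:mm:ss" string.
    public static string ToTimeString(double serial)
    {
        return DateTime.FromOADate(serial)
            .ToString("HH:mm:ss", CultureInfo.InvariantCulture);
    }
}
```

`ExcelTime.ToTimeString(0.67290509259259268)` gives 16:08:59, while 15:59:35 corresponds to 57575.0 / 86400.0 (about 0.666377), which matches the remark that the number quoted in the question is not the time it mentions.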
You can use the Range.Text property to get the text, which should be formatted correctly. I don't think you can use it in quite the same way as above (I'm trying to do the same myself, so I don't have the actual approach yet). Also be wary that reading the text might be slow (reading number formats is very slow).
Alternatively, try using a library: FlexCel is a very good one that we use, or Aspose for a more complete solution.
Iterative approach (this is almost certainly considerably slower than get_Value returning an object[,]).
if (excelRange!= null)
{
int nRows = excelRange.Rows.Count;
int nCols = excelRange.Columns.Count;
for (int iRow = 1; iRow <= nRows; iRow++)
{
for (int iCount = 1; iCount <= nCols; iCount++)
{
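// use a separate variable so the range being iterated is not overwritten
Microsoft.Office.Interop.Excel.Range cell = (Microsoft.Office.Interop.Excel.Range)excelRange.Cells[iRow, iCount];
// Text is the formatted string as Excel displays it
string text = (string)cell.Text;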
}
}
}
(Edit: Removed other examples that were actually for Sharepoint.)