How to create a Dictionary for Excel in C#

I have an Excel sheet like this:
ProductID  SomeExplanation  AnotherColumn1  AnotherColumn2  AnotherColumn3
1          X                6               A               65465
2          Y                5               B               6556
3          Z                7               C               65465
I want to create a Dictionary whose keys are the column headers (ProductID, SomeExplanation, AnotherColumn1, AnotherColumn2, AnotherColumn3) and whose values are the lists of cell values under each header (for example, key ProductID with values 1, 2, 3, etc.). I think there must also be a List that contains all the dictionaries.
I am using the Aspose library for Excel and .NET Framework 4.5.
Aspose returns its cell values as object.
So my first question: how can I create a List of dictionaries where each dictionary holds a list of values (List<Dictionary<key, List of values>>), and how do I add values to this list of dictionaries?
My second question: how can I fill this list of dictionaries from an Aspose worksheet?

This is a method that accepts an Aspose Worksheet as a parameter; the worksheet can be any worksheet of an Excel file.
I want to iterate through all cells and assign each value to the dictionary entry for its header (row 0 of the same column).
For example: there is a list called myExcelContainer that represents the Excel columns. Each column is a dictionary entry whose key is the Excel header (for example ProductId) and whose value is the list of values under that header ([1, 2, 3]).
public List<Dictionary<string, List<object>>> GenerateExcelDictionary(Worksheet worksheet)
{
    var columnMax = worksheet.Cells.MaxDataColumn;
    var rowMax = worksheet.Cells.MaxDataRow;
    var myExcelContainer = new List<Dictionary<string, List<object>>>();
    var columnKeyWithValues = new Dictionary<string, List<object>>();
    for (int column = 0; column < columnMax; column++)
    {
        var columnName = worksheet.Cells[0, column].Value.ToString().Replace(" ", string.Empty);
        columnKeyWithValues.Add(columnName, new List<object>());
    }
    for (int column = 0; column < columnMax; column++)
    {
        var values = new List<object>();
        for (int row = 1; row < rowMax; row++)
        {
            values.Add(worksheet.Cells[row, column]);
        }
        columnKeyWithValues[worksheet.Cells[0, column].Value.ToString()] = values;
    }
    myExcelContainer.Add(columnKeyWithValues);
    return myExcelContainer;
}
This is the Excel container:
var myExcelContainer = new List<Dictionary<string, List<object>>>();
If you can improve the algorithm's performance, please share.
My English is not great :).
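A minimal sketch of the column-to-values mapping described above, written against a plain object[,] grid so that it runs without Aspose. The class name ExcelColumns and the method ByHeader are hypothetical. Two points about the posted method are worth checking: as far as I know, MaxDataRow and MaxDataColumn are zero-based indexes of the last row and column that contain data, so loops bounded with < would skip the last row and column; and the header key is normalized with Replace in the first loop but not in the second, so headers containing spaces would end up under two different keys. Obtaining the grid through Cells.ExportArray is an assumption mentioned only in a comment, not something tested here.

```csharp
using System;
using System.Collections.Generic;

public static class ExcelColumns
{
    // Builds header -> column values from a grid whose first row holds the headers.
    // The grid stands in for worksheet data. With Aspose.Cells it could presumably come from
    // worksheet.Cells.ExportArray(0, 0, MaxDataRow + 1, MaxDataColumn + 1) (assumption, untested).
    public static Dictionary<string, List<object>> ByHeader(object[,] grid)
    {
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);
        var result = new Dictionary<string, List<object>>(cols);
        if (rows == 0)
        {
            return result; // no header row, nothing to map
        }

        for (int c = 0; c < cols; c++)
        {
            // Normalize the header once and use the same key when storing the values.
            string header = Convert.ToString(grid[0, c]).Replace(" ", string.Empty);
            var values = new List<object>(rows - 1);
            for (int r = 1; r < rows; r++)
            {
                values.Add(grid[r, c]);
            }
            result[header] = values;
        }
        return result;
    }
}
```

With a grid holding the headers "Product ID" and "SomeExplanation" followed by three data rows, ByHeader(grid)["ProductID"] would contain the three ProductID values. Since a worksheet yields a single dictionary, wrapping it in a List is only needed if you collect one dictionary per worksheet.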

Related

Microsoft.Interop.Excel can't read cell value

I wrote a small method that will give me the headers of a table in excel:
private List<string> GetCurrentHeaders(int headerRow, Excel.Worksheet ws)
{
    //List where specific values get saved if they exist
    List<string> headers = new List<string>();
    //List of all the values that need to exist as headers
    var headerlist = columnName.GetAllValues();
    for (int i = 0; i < headerlist.Count; i++)
    {
        //GetData() is a Method that outputs the Data from a cell.
        //headerRow is defining one row under the row I actually need, therefore -1
        string header = GetData(i + 1, headerRow - 1, ws);
        if (headerlist.Contains(header) && !headers.Contains(header))
        {
            headers.Add(header);
        }
    }
    return headers;
}
Now I have an Excel table where the first value I need is in cell A11 (row 11, column 1).
When I set a breakpoint after string header = GetData(i + 1, headerRow - 1, ws);, where i + 1 = 1 and headerRow - 1 = 11, I can see that the value it read is empty, even though the cell is not.
The GetData-Method just does one simple thing:
public string GetData(int row, int col, Excel.Worksheet ws)
{
    string val = "";
    val = Convert.ToString(ws.Cells[row, col].Value) != null
        ? ws.Cells[row, col].Value.ToString() : "";
    val = val.Replace("\n", " ");
    return val;
}
I don't get why this can't get me the value I need, while it works on every other Excel table. The file itself is no different from the others: its extension is .xls, the data is in the same layout as in the other tables, etc.
There are a few steps to getting this right. You need to know the dimensions of your table to know where the headers are. Your method has two ways of knowing this: 1) passing the table Range to the method, or 2) giving the coordinates of a cell within the table (usually the top-left cell) and trusting the CurrentRegion property to do the job for you. The first is the most reliable, because you explicitly tell the method where to look, but it requires the consumer to figure out the address, which isn't always straightforward. The CurrentRegion approach works fine too, but note that if your table contains an empty column, the region only extends up to that empty column. Having said all that, you could have the following:
List<string> GetHeaders(Worksheet worksheet, int row, int column)
{
    Range currentRegion = worksheet.Cells[row, column].CurrentRegion;
    Range headersRow = currentRegion.Rows[1];
    var headers = headersRow.Cells // Enumerate the individual cells of the row, not the row as a whole
        .Cast<Range>() // We cast so we can use LINQ
        .Select(c => c.Text is DBNull ? null : c.Text as string) //The null value of c.Text is not null but DBNull
        .ToList();
    return headers;
}
Then you can simply test if you're missing headers. The following code assumes the ActiveCell is a cell within the table Range, but you can change that easily to address a specific cell.
List<string> GetMissingHeaders(List<string> expectedHeaders)
{
    var worksheet = App.ActiveSheet; //App is your Excel application
    Range activeCell = worksheet.ActiveCell;
    var headers = GetHeaders(worksheet, activeCell.Row, activeCell.Column);
    return expectedHeaders.Where(h => headers.Any(i => i == h) == false).ToList();
}

c# interop excel - copy filtered range to array/list

I want to copy a filtered Excel Range (a particular column) to an array or list. I can copy a normal range to an array easily, but once I apply a filter I can't copy it properly. I have tried multiple ways.
I have tried Range.Cells.Value and Range.Rows.Cast<Excel.Range>(). Both give me only the first two rows, but there are 15 rows in the Excel sheet when I filter on a criterion:
Excel.Range srcRange = sheet.UsedRange;
srcRange.AutoFilter(field, criteria, Excel.XlAutoFilterOperator.xlFilterValues, Type.Missing, true);
Excel.Range filteredRange = sheet.UsedRange.SpecialCells(Excel.XlCellType.xlCellTypeVisible, Type.Missing);
Excel.Range rn = filteredRange.Columns[columnNumber];
//var myVal = (System.Array)rn.Rows.Cast<Excel.Range>().SelectMany(x => x.ToString());
//var myVal = (System.Array)rn.Rows.Cast<object>().SelectMany(x => x.ToString()); //this gives exception - com object cannot be casted to string type
var myVl = (System.Array)rn.Cells.Value;
arr1 = myVl.OfType<object>().Select(o => o.ToString()).ToArray();
It skips the other rows and takes only the first contiguous block of rows! So if rows 1, 2, 3 match the filter criteria, the array is populated only with those first three rows, even though there are matching rows at other indexes. I know I can do something like this:
foreach (Excel.Range area in filteredRange.Areas)
{
    foreach (Excel.Range row in area.Rows)
    {
        int index = row.Row;
        string test = sheet.Cells[index, column].Value.ToString();
        tmpList.Add(test);
    }
}
But this is not a solution for me, as I can't write this when I want to copy values from multiple columns! So I was looking for a one-liner. I don't mind whether I store the values in an array or a list.
It would be really helpful if someone could point me in the right direction. Thanks!
You could try something like this:
Excel.Range filteredRange = sheet.UsedRange.SpecialCells(Excel.XlCellType.xlCellTypeVisible, Type.Missing);
foreach (Excel.Range area in filteredRange.Areas)
{
    foreach (Excel.Range row in area.Rows)
    {
        foreach (Excel.Range column in area.Columns)
        {
            list.Add(sheet.Cells[row.Row, column.Column].Value.ToString());
        }
    }
}

Best way to merge in list<object> where object contains list of values

I have an IList<Row> where Row contains a list of Cells IList<Cell>. These cells have a ToString, ToDouble etc.
I want to loop through this list of rows and check if there are rows with the same value for cell[index]. Let's say for cell 3.
If there are rows with the same value, I should merge these rows into one row. It is certain that all cells are - in case of the same key - the same except for the cell with an amount, let's say that this is cell 4. So this should be merged (so 1 deleted) with the only difference that the value is the sum of both.
I have tried to create a Dictionary<string, double>. I looped through all rows and checked whether the map contains the key; if not -> merge (I also did this with an extension method Merge, but it is the same idea).
After this loop, I created a new list, placed the dictionary in there, and looped through the old list for the other information.
Well, I think my way is far too long and that there should be a much easier way to do this, maybe with LINQ. Any ideas on how to do this properly? Or do you think my approach isn't that bad?
Try:
var mergedRows = rows.GroupBy(x => x.Cells[0].Value.ToString())
    .Select(x => new Row()
    {
        Cells = new List<Cell>
        {
            new Cell() { Value = x.Key },
            new Cell() { Value = x.Sum(y => int.Parse(y.Cells[1].Value.ToString())) }
        }
    });
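The snippet above builds each merged row from only two cells. A variant that keeps the other cells of the row, as the question asks, could look like the sketch below. The Cell and Row classes here are hypothetical stand-ins for the question's types (which are not shown), RowMerger.MergeByKey is an illustrative name, and the amounts are assumed to be values that Convert.ToDouble can handle.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's Row and Cell types.
public class Cell
{
    public object Value { get; set; }
}

public class Row
{
    public IList<Cell> Cells { get; set; }
}

public static class RowMerger
{
    // Merges rows that share the same key cell; the amount cell becomes the sum of the group.
    public static List<Row> MergeByKey(IEnumerable<Row> rows, int keyIndex, int amountIndex)
    {
        return rows
            .GroupBy(r => Convert.ToString(r.Cells[keyIndex].Value))
            .Select(g =>
            {
                // Copy the first row of the group so the input rows are not modified,
                // then overwrite the amount with the group total.
                var cells = g.First().Cells.Select(c => new Cell { Value = c.Value }).ToList();
                cells[amountIndex].Value = g.Sum(r => Convert.ToDouble(r.Cells[amountIndex].Value));
                return new Row { Cells = cells };
            })
            .ToList();
    }
}
```

For the layout in the question (key in cell 3, amount in cell 4), the call would be RowMerger.MergeByKey(rows, 3, 4). GroupBy yields groups in order of first appearance, so the merged list keeps the original row order.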

C# LinqToExcel GetColumnName nth row [duplicate]

I am using the LinqToExcel library to read Excel files in my MVC4 project. My problem is reading the headers when they are at row 4. How can I do this?
The library has a function that returns all the column names, but I suppose the columns need to be at row 0.
// Summary:
// Returns a list of columns names that a worksheet contains
//
// Parameters:
// worksheetName:
// Worksheet name to get the list of column names from
public IEnumerable<string> GetColumnNames(string worksheetName);
Thanks.
Unfortunately the GetColumnNames() method only works when the header row is on row 1.
However, it should be possible to get the column names by using the WorksheetRangeNoHeader() method.
It would look something like this
var excel = new ExcelQueryFactory("excelFileName");
// Only select the header row
var headerRow = from c in excel.WorksheetRangeNoHeader("A4", "Z4")
                select c;
var columnNames = new List<string>();
foreach (var headerCell in headerRow)
    columnNames.Add(headerCell.ToString());
An FYI for future googlers:
It appears that GetColumnNames() has changed since the above answer was accepted.
There is now an overload in which you can define the range of the header row as a string:
// This will return a List<string>
var colNames = ExcelFile
.GetColumnNames(SheetName, "A9:AF9")
.ToList();

How to find duplicate values in Excel cells between multiple sheets programmatically

For example, I have a sheet called EmployeeSheet, which is just a single column of every employee's name first and last in a company. And let's assume this list is perfectly formatted and has no duplicates so every cell is unique in this sheet.
Now I have a sheet for each department in the company, such as FinanceSheet, ITSheet, and SalesSheet. Each sheet has in it somewhere (the sheets don't share the same layout) a list of the employees in that department. However, any one employee name should appear only once across all of the department sheets (this excludes the EmployeeSheet).
The solution I can think of, but can't figure out how to implement, is to make a multidimensional array (I learned a little about them in school and only vaguely remember how to use them).
Pseudocode something like:
arrEmployees = {"Tom Hanks", "Burt Reynolds", "Your Mom"}
arrFinance = {"Tom Hanks"}
arrIT = {"Burt Reynolds"}
arrSales = {"Your Mom"}
arrSheets = {arrEmployees, arrFinance, arrIT, arrSales}
So far I've been able to get single cell values and ranges as strings by using
Excel.Sheets sheets = app.Worksheets;
Excel.Worksheet worksheet = (Excel.Worksheet)sheets.get_Item("EmployeeSheet");
Excel.Range empRange = (Excel.Range)worksheet.get_Range("B2");
string empVal = empRange.Value2.ToString();
But with that process for getting a single cell value as a string, I don't know how I would put it into an element of my array, let alone a range of values.
I'm sure my method is not the most efficient, and it might not even be possible, but that's why I'm here for help, so any tips are appreciated.
EDIT: This is the solution that ended up working for me, thanks to Ian Edwards's solution.
Dictionary<string, List<Point>> fields = new Dictionary<string, List<Point>>();
fields["Finance"] = new List<Point>() { new Point(2, 20) };
fields["Sales"] = new List<Point>();
for (int row = 5; row <= 185; row += 20) { fields["Sales"].Add(new Point(2, row)); }
List<string> names = new List<string>();
List<string> duplicates = new List<string>();
foreach (KeyValuePair<string, List<Point>> kp in fields)
{
    Excel.Worksheet xlSheet = (Excel.Worksheet)workbook.Worksheets[kp.Key];
    foreach (Point p in kp.Value)
    {
        if ((xlSheet.Cells[p.Y, p.X] as Excel.Range).Value != null)
        {
            string cellVal = ((xlSheet.Cells[p.Y, p.X] as Excel.Range).Value).ToString();
            if (!names.Contains(cellVal))
            {
                names.Add(cellVal);
            }
            else
            {
                duplicates.Add(cellVal);
            }
        }
    }
}
Here's a little example I knocked together - the comments should explain what's going on line by line.
You can declare the name of the worksheets you want to check for names, as well as where to start looking for names in the 'worksheets' dictionary.
I assume you don't know how many names are in each list - it will keep going down each list until it encounters a blank cell.
// Load the Excel app
Microsoft.Office.Interop.Excel.Application xlApp = new Microsoft.Office.Interop.Excel.Application();
// Open the workbook
var xlWorkbook = xlApp.Workbooks.Open("XLTEST.xlsx");
// Declare the sheets and locations to look for names
Dictionary<string, Tuple<int, int>> worksheets = new Dictionary<string, Tuple<int, int>>()
{
    // Declare the name of the sheets to look in and the 1-based row, column index of where to start looking for names on each sheet (i.e. 1, 1 = A1)
    { "Sheet1", new Tuple<int, int>(1, 1) },
    { "Sheet2", new Tuple<int, int>(2, 3) },
    { "Sheet3", new Tuple<int, int>(4, 5) },
    { "Sheet4", new Tuple<int, int>(2, 3) },
};
// List to keep track of all names in all sheets
List<string> names = new List<string>();
// Iterate over every sheet we need to look at
foreach (var worksheet in worksheets)
{
    string workSheetName = worksheet.Key;
    // Get this excel worksheet object
    var xlWorksheet = (Microsoft.Office.Interop.Excel.Worksheet)xlWorkbook.Worksheets[workSheetName];
    // Get the 1-based row and column of the starting cell
    int row = worksheet.Value.Item1;
    int column = worksheet.Value.Item2;
    // Get the string contained in this cell
    string name = (string)(xlWorksheet.Cells[row, column] as Microsoft.Office.Interop.Excel.Range).Value;
    // name is null when the cell is empty - stop looking in this sheet and move on to the next one
    while (name != null)
    {
        // Add the current name to the list
        names.Add(name);
        // Get the next name in the cell below this one
        name = (string)(xlWorksheet.Cells[++row, column] as Microsoft.Office.Interop.Excel.Range).Value;
    }
}
// Compare the number of names to the number of unique names
if (names.Count() != names.Distinct().Count())
{
    // You have duplicate names!
}
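The count comparison above only tells you that duplicates exist. If you also want to know which names are duplicated, a small helper along these lines might do. NameChecks.FindDuplicates is a hypothetical name, and trimming whitespace and ignoring case are assumptions about how names should be matched, not something stated in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameChecks
{
    // Returns each name that occurs more than once in the collected list.
    // Assumption: names are compared after trimming and without regard to case.
    public static List<string> FindDuplicates(IEnumerable<string> names)
    {
        return names
            .GroupBy(n => n.Trim(), StringComparer.OrdinalIgnoreCase)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```

Calling NameChecks.FindDuplicates(names) on the list built by the loop above would return an empty list when every employee appears only once.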
You can use .Range to define multiple cells (i.e., .Range["A1", "F500"])
https://msdn.microsoft.com/en-us/library/microsoft.office.tools.excel.worksheet.range.aspx
You can then use .get_Value to get the values of all cells in that Range. According to dotnetperls.com, get_Value() is much faster than get_Range() (see the 'Performance' section). Combining multi-cell ranges with get_Value should perform better than lots of single-cell get_Range calls.
https://msdn.microsoft.com/en-us/library/microsoft.office.tools.excel.namedrange.get_value(v=vs.120).aspx
I store them in an object array:
(object[,])yourexcelRange.get_Value(Excel.XlRangeValueDataType.xlRangeValueDefault);
From there you can write your own comparison method to compare multiple arrays. One quirk is that this returns a 1-indexed array instead of a standard 0-based one.
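Because of the 1-based indexing, loops over such an array are safest when they ask the array for its own bounds. The sketch below flattens a two-dimensional array into a list of strings that way. RangeValues, Flatten and OneBased are hypothetical names, and OneBased only simulates the shape of the array Excel returns (via Array.CreateInstance) so that the example runs without Interop.

```csharp
using System;
using System.Collections.Generic;

public static class RangeValues
{
    // Collects the non-empty cells of a 2D array as strings, whatever its lower bounds are.
    public static List<string> Flatten(object[,] values)
    {
        var result = new List<string>();
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
        {
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                if (values[r, c] != null) // empty cells come back as null
                {
                    result.Add(values[r, c].ToString());
                }
            }
        }
        return result;
    }

    // Stand-in for the array returned by get_Value: a 2D object array indexed from 1.
    public static object[,] OneBased(int rows, int cols)
    {
        return (object[,])Array.CreateInstance(typeof(object), new[] { rows, cols }, new[] { 1, 1 });
    }
}
```

The flattened lists from several ranges can then be compared with ordinary collection operations. Note that a range consisting of a single cell returns the cell value itself rather than an array, so the cast to object[,] is only suitable for multi-cell ranges.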
