I am trying to retrieve data from an Excel spreadsheet using C#. The data in the spreadsheet has the following characteristics:
no column names are assigned
the rows can have varying column lengths
some rows are metadata, and these rows label the content of the columns in the next row
Therefore, each object I need to construct always has its name in the very first column, and its parameters are contained in the following columns. It is important that the parameter names are retrieved from the row above. An example:
row1|---------|FirstName|Surname|
row2|---Person|Bob------|Bloggs-|
row3|---------|---------|-------|
row4|---------|Make-----|Model--|
row5|------Car|Toyota---|Prius--|
So unfortunately the data is heterogeneous, and the only way to determine which rows "belong together" is to check whether the first column in the row is empty. If it is not empty, the row describes an object: read all the data in the row and look up the parameter names in the row above.
At first I thought the straightforward approach would be to simply loop through
1) the dataset containing all sheets, then
2) the datatables (i.e. sheets) and
3) the row.
However, I found that trying to extract this data with nested loops and if statements results in horrible, unreadable and inflexible code.
Is there a way to do this in LINQ? I had a look at this article to start by filtering out the empty rows between the data, but didn't really get anywhere. Could someone point me in the right direction with a few code snippets, please?
Thanks in advance!
hiro
I see that you've already accepted an answer, but I think that a more generic solution is possible, using reflection.
Let's say you have your data as a List<string[]>, where each element in the list is an array of strings holding all the cells from the corresponding row.
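// Requires: using System; using System.Collections.Generic; using System.Linq;
List<string[]> data = LoadData();
var results = new List<object>();
// Must be initialized, otherwise the compiler rejects the read inside the loop.
string[] headerRow = new string[0];
foreach (var row in data)
{
    if (string.IsNullOrEmpty(row[0]))
    {
        // Empty first cell: this row holds the parameter names for the next object row.
        headerRow = row.Skip(1).ToArray();
    }
    else
    {
        // The first cell is the type name. Type.GetType returns null unless the name is
        // namespace-qualified (and assembly-qualified for types in other assemblies).
        Type objType = Type.GetType(row[0]);
        object newItem = Activator.CreateInstance(objType);
        for (int i = 0; i < headerRow.Length; i++)
        {
            // Assumes string properties named exactly like the header cells.
            objType.GetProperty(headerRow[i]).SetValue(newItem, row[i + 1]);
        }
        results.Add(newItem);
    }
}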
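If defining one class per object name is not practical, the same header-row/object-row logic can be sketched without reflection, collecting each object's parameters into a dictionary. This is only a sketch under assumptions: ParsedObject and SheetParser are hypothetical names, the rows are assumed to be already loaded as string arrays, and duplicate parameter names within one header row are not handled (ToDictionary would throw).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type: an object name plus its parameter name/value pairs.
public sealed class ParsedObject
{
    public string TypeName { get; set; }
    public Dictionary<string, string> Parameters { get; set; }
}

public static class SheetParser
{
    // Rows whose first cell is empty are treated as header (or blank) rows;
    // any other row is an object row described by the most recent header row.
    public static List<ParsedObject> Parse(IEnumerable<string[]> rows)
    {
        var results = new List<ParsedObject>();
        string[] header = new string[0];

        foreach (var row in rows)
        {
            if (row.Length == 0) continue; // nothing to read

            if (string.IsNullOrWhiteSpace(row[0]))
            {
                header = row.Skip(1).ToArray(); // remember the parameter names
                continue;
            }

            // Zip stops at the shorter sequence, so rows of varying length are safe.
            var parameters = header
                .Zip(row.Skip(1), (name, value) => new { name, value })
                .Where(p => !string.IsNullOrWhiteSpace(p.name))
                .ToDictionary(p => p.name, p => p.value);

            results.Add(new ParsedObject { TypeName = row[0], Parameters = parameters });
        }
        return results;
    }
}

public static class SheetParserDemo
{
    public static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "", "FirstName", "Surname" },
            new[] { "Person", "Bob", "Bloggs" },
            new[] { "", "", "" },
            new[] { "", "Make", "Model" },
            new[] { "Car", "Toyota", "Prius" },
        };
        foreach (var obj in SheetParser.Parse(rows))
        {
            Console.WriteLine(obj.TypeName + ": " +
                string.Join(", ", obj.Parameters.Select(p => p.Key + "=" + p.Value)));
        }
    }
}
```

A dictionary keeps the parser independent of the set of object types in the sheet; the reflection approach above is the better fit when strongly typed objects are needed.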
UPDATE
I think I have found what is causing the issue here https://stackoverflow.com/a/5665600/19393524
I believe my issue lies with my use of .DefaultView. The post suggests that sorting on it is technically a write operation on the DataTable object, and that changes might not be propagated properly or entirely. It is an interesting read and seems to answer my question of why passing valid data to a DataRow throws this exception AFTER I make changes to the DataTable.
UPDATE:
Let me be crystal clear: I have already solved my problem. I would just like to know why it is throwing an error. In my view the code should work, and it does... on the first run through.
AFTER I have already deleted the column and then added it back (that is, after this code has run once),
when I debug my code line by line in Visual Studio and stop at the line:
data.Rows[i].SetField(sortColumnNames[k], value);
the row exists
the column exists
value is not null
sortColumnNames[k] is not null and contains the correct column name
i is 0
Yet it still throws an exception. I would like to know why. What am I missing?
Sorry for the long explanation but this one needs some context unfortunately.
So my problem is this, I have code that sorts data in a DataTable object by column. The user picks the column they want to sort by and then my code sorts it.
I ran into an issue where I needed numbers to sort as numbers, not strings (all data in the table is stored as strings). For example, string sorting would put 1000 before 500.
So my solution was to create a temporary column with the correct data type, so that the rows are ordered numerically while the original string data of the number remains unchanged. This worked perfectly: I could sort numeric strings as numbers without changing the formatting or data type of the original column.
I delete the column I used to sort afterwards, because I use DefaultView to sort and copy the data to another DataTable object.
That part all works fine the first time.
The issue arises when the user needs to do a different sort on the same column. My code adds the column back (same name) and then tries to add values to it, but I get a null reference exception: "Object reference not set to an instance of an object".
Here is what I've tried:
I've tried using AcceptChanges() after deleting a column but this did nothing.
I've tried using column index, name, and column object returned by DataTable.Columns.Add() in the first parameter of SetField() in case it was somehow referencing the "old" column object I deleted (this is what I think the problem is more than likely)
I've tried changing the value of the .ItemArray[] directly but this does not work even the first time
Here is the code:
This is how the column names are passed:
private void SortByColumn()
{
    if (cbAscDesc.SelectedIndex != -1)//if the user has selected ASC or DESC order
    {
        //clears the datatable object that stores the sorted defaultview
        sortedData.Clear();
        //grabs column names the user has selected to sort by and copies them to a string[]
        string[] lbItems = new string[lbColumnsToSortBy.Items.Count];
        lbColumnsToSortBy.Items.CopyTo(lbItems, 0);
        //adds temp columns to data to sort numerical strings properly
        string[] itemsToSort = AddSortColumns(lbItems);
        //creates parameters for defaultview sort
        string columnsToSortBy = String.Join(",", itemsToSort);
        string sortDirection = cbAscDesc.SelectedItem.ToString();
        data.DefaultView.Sort = columnsToSortBy + " " + sortDirection;
        //copies the defaultview to the sorted table object
        sortedData = data.DefaultView.ToTable();
        RemoveSortColumns(itemsToSort);//removes temp sorting columns
    }
}
This is where the temp columns are added:
private string[] AddSortColumns(string[] items)//adds columns to data that will be used to sort
//(ensures numbers are sorted as numbers and strings are sorted as strings)
{
    string[] sortColumnNames = new string[items.Length];
    for (int k = 0; k < items.Length; k++)
    {
        int indexOfOrginialColumn = Array.IndexOf(columns, items[k]);
        Type datatype = CheckDataType(indexOfOrginialColumn);
        if (datatype == typeof(double))
        {
            sortColumnNames[k] = items[k] + "Sort";
            data.Columns.Add(sortColumnNames[k], typeof(double));
            for (int i = 0; i < data.Rows.Count; i++)
            {
                //these lines copy the values from the original column into the sort column, parsed to the proper datatype
                NumberStyles styles = NumberStyles.Any;
                double value = double.Parse(data.Rows[i].Field<string>(indexOfOrginialColumn), styles);
                bool test = data.Columns.Contains("QtySort");//debugging check: confirms the column exists
                data.Rows[i].SetField(sortColumnNames[k], value);//this is the line that throws a null ref exception
            }
        }
        else
        {
            sortColumnNames[k] = items[k];
        }
    }
    return sortColumnNames;
}
This is the code that deletes the columns afterward:
private void RemoveSortColumns(string[] columnsToRemove)
{
    for (int i = 0; i < columnsToRemove.Length; i++)
    {
        if (columnsToRemove[i].Contains("Sort"))
        {
            sortedData.Columns.Remove(columnsToRemove[i]);
        }
    }
}
NOTE:
I've been able to fix the problem by keeping the column in data and deleting it only from sortedData; since I use .Clear() on the sorted table, this seems to ensure the exception is not thrown.
I would still like an answer, though, as to why this throws an exception. If I use .Contains() on the line right before the one where the exception is thrown, it says the column exists and returns true. In case anyone is wondering, the parameters sortColumnNames[k] and value are never null either.
Your problem is probably here:
private void RemoveSortColumns()
{
    for (int i = 0; i < data.Columns.Count; i++)
    {
        if (data.Columns[i].ColumnName.Contains("Sort"))
        {
            data.Columns.RemoveAt(i);
            sortedData.Columns.RemoveAt(i);
        }
    }
}
If you have 2 columns and the first one matches the if, you will never look at the second.
This is because it will run:
i = 0
is i < Columns.Count, which is 2 => yes
does Columns[0].ColumnName contain "Sort" => yes
remove Columns[0]
i = 1
is i < Columns.Count, which is now 1 => no
The solution is to readjust i after the removal:
private void RemoveSortColumns()
{
    for (int i = 0; i < data.Columns.Count; i++)
    {
        if (data.Columns[i].ColumnName.Contains("Sort"))
        {
            data.Columns.RemoveAt(i);
            sortedData.Columns.RemoveAt(i);
            i--;//removed 1 element, go back 1
        }
    }
}
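A common alternative to adjusting i after each removal is to walk the column collection backwards, so that removing a column never shifts the indexes still to be visited. The sketch below is an illustration only: ColumnCleanup is a hypothetical helper, it works on a single DataTable, and it matches names that end in "Sort" (an assumption based on the question's items[k] + "Sort" naming) rather than names that merely contain it.

```csharp
using System;
using System.Data;

public static class ColumnCleanup
{
    // Removes every column whose name ends with "Sort", iterating from the last
    // column to the first so removals do not affect the indexes not yet visited.
    public static void RemoveSortColumns(DataTable table)
    {
        for (int i = table.Columns.Count - 1; i >= 0; i--)
        {
            if (table.Columns[i].ColumnName.EndsWith("Sort", StringComparison.Ordinal))
            {
                table.Columns.RemoveAt(i);
            }
        }
    }
}

public static class ColumnCleanupDemo
{
    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("Qty", typeof(string));
        table.Columns.Add("QtySort", typeof(double));
        table.Columns.Add("PriceSort", typeof(double));
        table.Columns.Add("Name", typeof(string));

        ColumnCleanup.RemoveSortColumns(table);

        foreach (DataColumn column in table.Columns)
        {
            Console.WriteLine(column.ColumnName);
        }
    }
}
```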
I fixed my original issue by changing a few lines of code in my SortByColumn() method:
private void SortByColumn()
{
    if (cbAscDesc.SelectedIndex != -1)//if the user has selected ASC or DESC order
    {
        //clears the datatable object that stores the sorted view
        sortedData.Clear();
        //grabs column names the user has selected to sort by and copies them to a string[]
        string[] lbItems = new string[lbColumnsToSortBy.Items.Count];
        lbColumnsToSortBy.Items.CopyTo(lbItems, 0);
        //adds temp columns to data to sort numerical strings properly
        string[] itemsToSort = AddSortColumns(lbItems);
        //creates parameters for the view sort
        string columnsToSortBy = String.Join(",", itemsToSort);
        string sortDirection = cbAscDesc.SelectedItem.ToString();
        //uses a separate DataView instead of data.DefaultView
        DataView userSelectedSort = data.AsDataView();
        userSelectedSort.Sort = columnsToSortBy + " " + sortDirection;
        //copies the sorted view to the sorted table object
        sortedData = userSelectedSort.ToTable();
        RemoveSortColumns(itemsToSort);//removes temp sorting columns
    }
}
Instead of sorting on data.DefaultView, I create a new DataView object from data.AsDataView() and sort on that. This completely gets rid of the issue in my original code. For anyone wondering, I still believe it is a bug with .DefaultView in the .NET Framework that Microsoft will probably never fix. I hope this will help someone with a similar issue in the future.
Here is the link again to the post where I figured out a solution to my problem:
https://stackoverflow.com/a/5665600
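For comparison, numeric strings can also be sorted without adding and removing a temporary column, by ordering the rows with LINQ to DataSet and copying the result into a new table. This is a different technique from the one used above, sketched under assumptions: NumericSort and SortByNumericString are hypothetical names, a single sort column is handled, every cell is assumed to be a non-null parseable number, and the invariant culture is used for parsing (the original code parses with the current culture). On .NET Framework it needs a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Linq;

public static class NumericSort
{
    // Returns a new table with the rows of 'source' ordered by the numeric value
    // of a string column. The source table itself is not modified.
    public static DataTable SortByNumericString(DataTable source, string columnName, bool descending)
    {
        Func<DataRow, double> key = row =>
            double.Parse(row.Field<string>(columnName), NumberStyles.Any, CultureInfo.InvariantCulture);

        var ordered = descending
            ? source.AsEnumerable().OrderByDescending(key)
            : source.AsEnumerable().OrderBy(key);

        // CopyToDataTable throws on an empty sequence, so return an empty clone instead.
        return source.Rows.Count == 0 ? source.Clone() : ordered.CopyToDataTable();
    }
}

public static class NumericSortDemo
{
    public static void Main()
    {
        var data = new DataTable();
        data.Columns.Add("Qty", typeof(string));
        data.Rows.Add("1000");
        data.Rows.Add("500");
        data.Rows.Add("75");

        DataTable sorted = NumericSort.SortByNumericString(data, "Qty", false);
        foreach (DataRow row in sorted.Rows)
        {
            Console.WriteLine(row.Field<string>("Qty"));
        }
    }
}
```

Because the source table is never altered, the add/remove cycle on its columns, and any interaction with DefaultView, does not arise.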
I am writing a parsing program in C#. One section of the code pulls in a list of the rows in an Excel file that do not have a particular value in a particular column. I was then going to use a foreach loop to go through those rows and delete them; however, it takes quite a long time to cycle through each of them, and there are multiple tabs that I need to run this on.
So my thought was to turn the list of Excel rows into a range and then just delete that range. Is it possible to convert that list of rows into an Excel range? Below is the code snippet:
XLWorkbook wb = new XLWorkbook(Path.Combine(Destination, fName) + ".xlsx");
IXLWorksheet ws = wb.Worksheet(SheetName);
var range = ws.RangeUsed();
var table = range.AsTable();
var cell = table.HeadersRow().CellsUsed(c => c.Value.ToString() == ColName).FirstOrDefault();
//Gets the column letter for use in next section
string colLetter = cell.WorksheetColumn().ColumnLetter();
//Create list of rows that DO NOT contain the inv number being searched
//This is the list I would like to convert to a range to speed up the delete
List<IXLRow> deleterows = ws
    .Column(colLetter)
    .CellsUsed(c => c.Value.ToString() != i)
    .Select(c => c.WorksheetRow()).ToList();
//Removes the header row from the list so that it isn't deleted
deleterows.RemoveAt(0);
foreach (IXLRow x in deleterows)
{
    x.Delete();
}
Right now each iteration checks the value of the cell by accessing that cell in the file. That takes a lot of time.
Read all the data on one sheet into an array and do all the iteration over the array. That will be an order of magnitude faster.
Since you want to parse data, I suggest you use that array for any other operation you want to do and only write it back to a file when you are done. (If your result is stored in Excel again, make sure to write the whole array at once as a Range.)
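The in-memory step could look roughly like the sketch below. It assumes the sheet has already been read into a list of string arrays with the header in the first row; RowFilter and KeepMatching are hypothetical names, and reading from and writing back to Excel is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowFilter
{
    // Keeps the header row plus every data row whose value in 'columnIndex'
    // equals 'wanted'. Rows too short to have that column are dropped.
    public static List<string[]> KeepMatching(IList<string[]> rows, int columnIndex, string wanted)
    {
        if (rows.Count == 0) return new List<string[]>();

        var kept = new List<string[]> { rows[0] }; // header row is always kept
        kept.AddRange(rows.Skip(1)
            .Where(r => columnIndex < r.Length && r[columnIndex] == wanted));
        return kept;
    }
}

public static class RowFilterDemo
{
    public static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "Invoice", "Amount" },
            new[] { "INV-1", "10" },
            new[] { "INV-2", "20" },
            new[] { "INV-1", "30" },
        };
        foreach (var row in RowFilter.KeepMatching(rows, 0, "INV-1"))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

Building the list of rows to keep replaces the per-row deletes entirely; the surviving rows can then be written out in one operation.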
I want to set the cell formula for columns K-P using a loop, and in C# I can only loop with integers. How do you get the column letter (e.g. K, L, M, N, O, P) from the index? Looping through rows is easy because they are just numbers, but for columns Excel uses letters.
I can't think of anything other than defining my own list of the letters K-P in C#.
You can use CellReference, if you already have the cell object.
var temp = new CellReference(cell);
var reference = temp.FormatAsString();
Hope this works for you!
You can also get it directly from the ICell object:
var address = cell.Address.FormatAsString();
Or, if you need only the column name, add an extension method like this (it must live in a static class and needs using System.Text.RegularExpressions;):
public static string GetColumnName(this ICell cell)
{
    return Regex.Match(cell.Address.FormatAsString(), @"[A-Z]+").Value;
}
And then just call it like:
var colName = cell.GetColumnName();
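When there is no cell object yet and only the integer loop index is available, the letters can also be computed directly. The helper below is a library-independent sketch; ColumnNames and ToColumnLetters are hypothetical names, and the index is assumed to be zero-based (A = 0, so K = 10 and P = 15).

```csharp
using System;
using System.Text;

public static class ColumnNames
{
    // Converts a zero-based column index to Excel-style letters:
    // 0 -> "A", 25 -> "Z", 26 -> "AA".
    public static string ToColumnLetters(int zeroBasedIndex)
    {
        if (zeroBasedIndex < 0)
            throw new ArgumentOutOfRangeException(nameof(zeroBasedIndex));

        var letters = new StringBuilder();
        int n = zeroBasedIndex + 1; // work with the 1-based column number
        while (n > 0)
        {
            int remainder = (n - 1) % 26;
            letters.Insert(0, (char)('A' + remainder));
            n = (n - 1) / 26;
        }
        return letters.ToString();
    }
}

public static class ColumnNamesDemo
{
    public static void Main()
    {
        // Columns K through P are the zero-based indexes 10 through 15.
        for (int col = 10; col <= 15; col++)
        {
            Console.WriteLine(ColumnNames.ToColumnLetters(col));
        }
    }
}
```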
Could anyone tell me please what the best/most efficient way is to get the index of the row in a datagrid view that has the smallest integer value for a particular column.
I did this using a foreach loop to traverse the collection of rows and compare the respective value of each row and store the smallest. Each time I find a smaller value I update my "smallest" variable with that value. This works but I'm pretty sure there must be some lambda expression that does a better job. Could anyone help please?
This is the column containing the value:
dgvItems.Rows[i].Cells["col1"].Value
Many thanks.
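The best way to do this is with a LINQ query.
LINQ is usually better looking and more readable than a for loop, although it is not necessarily faster: OrderBy sorts the whole table just to find one row.
You should query the DataGridView's data source rather than the grid's cells directly (this assumes the grid is bound to a DataTable):
DataTable dt = dgv.DataSource as DataTable;
// Index of the row with the smallest value. OrderBy compares the values as stored, so a string column is ordered alphabetically.
var result = dt.Rows.IndexOf(dt.AsEnumerable().OrderBy(x => x["col1"]).First());
// Index of the row with the largest value:
var result1 = dt.Rows.IndexOf(dt.AsEnumerable().OrderByDescending(x => x["col1"]).First());
I hope this helps.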
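Here is how to do it the "usual" way, with a plain loop.
First, declare a list for the values, plus a second list that remembers which grid row each value came from:
List<int> columnValue = new List<int>();
List<int> rowIndexes = new List<int>();
Then iterate through each row in the grid:
foreach (DataGridViewRow row in myDataGrid.Rows)
{
    if (null != row && null != row.Cells["col1"].Value)
    {
        //Add the value of the cell to the list, and remember its row index
        columnValue.Add(Convert.ToInt32(row.Cells["col1"].Value.ToString()));
        rowIndexes.Add(row.Index);
    }
}
Get the minimum value from the list (Min() needs using System.Linq; and throws if the list is empty):
int result = columnValue.Min();
Finally, get the grid index of the row that holds that value. Looking it up through rowIndexes keeps the result correct even when rows with empty cells were skipped:
int i = rowIndexes[columnValue.IndexOf(result)]; //row index of the first occurrence of "result"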
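The minimum and its position can also be found in a single pass without building an intermediate list. The sketch below is written against a plain sequence of cell values so that it stands alone; MinFinder and IndexOfMin are hypothetical names, and cells that are null or not parseable as integers are skipped, which is an assumption about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;

public static class MinFinder
{
    // Returns the position of the smallest integer value in the sequence,
    // or -1 if no value could be parsed. Ties resolve to the first occurrence.
    public static int IndexOfMin(IEnumerable<object> cellValues)
    {
        int bestIndex = -1;
        int best = int.MaxValue;
        int index = 0;

        foreach (object value in cellValues)
        {
            int parsed;
            if (value != null
                && int.TryParse(value.ToString(), out parsed)
                && (bestIndex == -1 || parsed < best))
            {
                best = parsed;
                bestIndex = index;
            }
            index++;
        }
        return bestIndex;
    }
}

public static class MinFinderDemo
{
    public static void Main()
    {
        var values = new object[] { 5, null, "2", 9, 2 };
        Console.WriteLine(MinFinder.IndexOfMin(values));
    }
}
```

With a grid, the sequence could be produced with something like dgvItems.Rows.Cast<DataGridViewRow>().Select(r => r.Cells["col1"].Value), so the returned position lines up with the grid's row index.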
I have a CSV file that I am going to read from disk. I do not know up front how many columns there are or what they are named.
Any thoughts on how I should represent the fields? Ideally I want to say something like
string Val = DataStructure.GetValue(i, ColumnName);
where i is the index of the row.
As an aside, I will be parsing with the TextFieldParser class:
http://msdn.microsoft.com/en-us/library/cakac7e6(v=vs.90).aspx
That sounds as if you would need a DataTable which has a Rows and Columns property.
So you can say:
string Val = table.Rows[i].Field<string>(ColumnName);
A DataTable is a table of in-memory data. It can be used in a strongly typed way (as suggested with the Field method), but it actually stores its data as objects internally.
You could use this parser to convert the csv to a DataTable.
Edit: I've only just seen that you want to use the TextFieldParser. Here's a possible simple approach to convert a csv to a DataTable:
// Requires: using System.Data; using System.IO; using Microsoft.VisualBasic.FileIO;
var table = new DataTable();
using (var parser = new TextFieldParser(File.OpenRead(path)))
{
    parser.Delimiters = new[] { "," };
    parser.HasFieldsEnclosedInQuotes = true;
    // load DataColumns from first line
    string[] headers = parser.ReadFields();
    foreach (var h in headers)
        table.Columns.Add(h);
    // load all other lines as data
    string[] fields;
    while ((fields = parser.ReadFields()) != null)
    {
        table.Rows.Add().ItemArray = fields;
    }
}
If the column names are in the first row read that and store in a Dictionary<string, int> that maps the column name to the column index.
You could then store the remaining rows in a simple structure like List<string[]>.
To get a column for a row you'd do csv[rowIndex][nameToIndex[ColumnName]];
nameToIndex[ColumnName] gets the column index from the name, csv[rowIndex] gets the row (string array) we want.
This could of course be wrapped in a class.
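A minimal sketch of such a wrapper is shown below. CsvTable and its members are hypothetical names; the header and data rows are assumed to have been parsed already (for example with TextFieldParser), column names are matched case-insensitively, and a row that is too short for the requested column yields null. All three behaviours are assumptions rather than requirements from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class CsvTable
{
    private readonly Dictionary<string, int> nameToIndex;
    private readonly List<string[]> rows;

    public CsvTable(string[] headers, IEnumerable<string[]> dataRows)
    {
        nameToIndex = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < headers.Length; i++)
        {
            nameToIndex[headers[i]] = i; // on duplicate names the last column wins
        }
        rows = dataRows.ToList();
    }

    public int RowCount
    {
        get { return rows.Count; }
    }

    // Throws KeyNotFoundException for an unknown column name and
    // ArgumentOutOfRangeException for an invalid row index.
    public string GetValue(int rowIndex, string columnName)
    {
        int column = nameToIndex[columnName];
        string[] row = rows[rowIndex];
        return column < row.Length ? row[column] : null;
    }
}

public static class CsvTableDemo
{
    public static void Main()
    {
        var table = new CsvTable(
            new[] { "Name", "City" },
            new[] { new[] { "Bob", "Leeds" }, new[] { "Ann" } });

        Console.WriteLine(table.GetValue(0, "City"));
        Console.WriteLine(table.GetValue(1, "City") ?? "(missing)");
    }
}
```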
Use the CSV parser if you want, but a text parser is very easy to write yourself if you need customization.
For your needs, I would use one (or more) Dictionary: at least one to map PropertyString --> column index, and maybe the reverse one, column index --> PropertyString, if needed.
When I parse a CSV file, I usually put the results in a List while parsing and then convert it to an array once parsing is complete, for speed reasons (List.ToArray()).