How do I convert a Value2 string to a DateTime (Excel Interop) - c#

I am importing data from an Excel file and converting it into a DataSet, which I have working (kind of). The problem that I am having is that two fields are Dates in Excel but when I do the import they get turned into numbers and I am not sure how to get them back to dates. The last two columns are the columns that are dates (HDate & CDate).
This is the code
private void FillDatagrid()
{
    Excel.Application xlApp = new Excel.Application();
    Excel.Workbook xlWorkbook = xlApp.Workbooks.Open(@"\\Server01\Ins\Eligible.xls");
    Excel._Worksheet xlWorksheet = xlWorkbook.Sheets[1];
    Excel.Range xlRange = xlWorksheet.UsedRange;
    DataTable table = new DataTable("Employees");
    table.Columns.Add("StoreID");
    table.Columns.Add("EmpID");
    table.Columns.Add("EmpName");
    table.Columns.Add("Position");
    table.Columns.Add("HDate");
    table.Columns.Add("CDate");
    int rowCount = xlRange.Rows.Count;
    for (int i = 2; i <= rowCount; i++)
    {
        table.Rows.Add(
            xlRange.Cells[i, 1].Value2.ToString(),
            xlRange.Cells[i, 2].Value2.ToString(),
            xlRange.Cells[i, 3].Value2.ToString(),
            xlRange.Cells[i, 4].Value2.ToString(),
            xlRange.Cells[i, 5].Value2.ToString(),
            xlRange.Cells[i, 5].Value2.ToString());
    }
    DataSet ds = new DataSet();
    ds.Tables.Add(table);
    dataGrid.ItemsSource = ds.Tables["Employees"].DefaultView;
}
As far as I can tell, Value2 will only convert to a string.
Note #1: Although I am putting this data into a DataGrid in this sample, I am not doing that in the actual code. I just wanted to mention that so it didn't seem like this could be fixed with formatting in XAML, etc.
Note #2: I realize I have xlRange.Cells[i, 5].Value2.ToString() twice. I am doing so to get around the problem of column 6 having null values, which my import didn't like. I plan to come back to that after I get this problem fixed.
Note #3: When I say I am getting the date as a string, I mean it is coming from Excel as a string but formatted as a number; for instance, the cell shows a date like 6/30/2015 but it comes over to my DataSet as 42185.

What about:
table.Columns.Add("HDate", typeof(DateTime));
table.Columns.Add("CDate", typeof(DateTime));

I was able to solve this by altering my loop:
for (int i = 2; i <= rowCount; i++)
{
    string storeId = xlRange.Cells[i, 1].Value2.ToString();
    string employeeId = xlRange.Cells[i, 2].Value2.ToString();
    string employeeName = xlRange.Cells[i, 3].Value2.ToString();
    string position = xlRange.Cells[i, 4].Value2.ToString();
    string hDate = Convert.ToString(xlRange.Cells[i, 5].Value);
    string cDate = Convert.ToString(xlRange.Cells[i, 6].Value);
    table.Rows.Add(storeId, employeeId, employeeName, position, hDate, cDate);
}
This also took care of another problem: Convert.ToString(xxx.Value) handles nulls, where Value2.ToString() would not.
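For what it's worth, this works because Value (unlike Value2) marshals date-formatted cells as DateTime, so you could also keep the typed value instead of a string. A small sketch, assuming the HDate/CDate columns are declared with typeof(DateTime) as suggested above:
object rawHDate = xlRange.Cells[i, 5].Value;   // DateTime for date cells, null when the cell is empty
object rawCDate = xlRange.Cells[i, 6].Value;
table.Rows.Add(storeId, employeeId, employeeName, position,
    rawHDate is DateTime h ? (object)h : DBNull.Value,
    rawCDate is DateTime c ? (object)c : DBNull.Value);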

Related

Excel export in normal text format

I am currently writing a system that includes reading in Excel files. I want to read in this Excel file and potentially spew it out into a CSV. However, the issue I'm running into is that it keeps the format that Excel uses. For example, I have a long number that Excel displays as 3.9151E+15, and it reads it in like this. When I highlight the cell in Excel it shows the real number '3915100000026840'; this is the number I want to receive. It also adds a timestamp to dates, which I do not want: it adds 00:00 00:00:000 or something similar to 17/05/2018, when the date alone is all I want. So basically, I want to retrieve the real text values from this Excel spreadsheet.
The code I have at the minute is:
public static DataTable READExcel(string path)
{
    Microsoft.Office.Interop.Excel.Application objXL = null;
    Microsoft.Office.Interop.Excel.Workbook objWB = null;
    objXL = new Microsoft.Office.Interop.Excel.Application();
    objWB = objXL.Workbooks.Open(path);
    Microsoft.Office.Interop.Excel.Worksheet objSHT = objWB.Worksheets[1];
    int rows = objSHT.UsedRange.Rows.Count;
    int cols = objSHT.UsedRange.Columns.Count;
    DataTable dt = new DataTable();
    int noofrow = 1;
    for (int c = 1; c <= cols; c++)
    {
        string colname = objSHT.Cells[1, c].Value.ToString();
        dt.Columns.Add(colname);
        noofrow = 2;
    }
    for (int r = noofrow; r <= rows; r++)
    {
        DataRow dr = dt.NewRow();
        for (int c = 1; c <= cols; c++)
        {
            dr[c - 1] = objSHT.Cells[r, c].Value.ToString();
        }
        dt.Rows.Add(dr);
    }
    objWB.Close();
    objXL.Quit();
    return dt;
}
(Also, another question which is slightly related: I had a CSV to start with which had the value '0003915100000026845'. When I turned this CSV into an Excel file it changed it to the value I referenced above. Will Excel have remembered these leading zeros anywhere, or not?)
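Part of what you are seeing is just how Value marshals cell contents: date-formatted cells come back as DateTime (hence the appended time) and large numbers come back as double, whose default ToString switches to scientific notation past 15 digits. A minimal sketch of formatting these explicitly inside the inner loop of READExcel:
object v = objSHT.Cells[r, c].Value;
string text;
if (v is DateTime dateValue)
    text = dateValue.ToString("dd/MM/yyyy");   // drop the 00:00:00 part, e.g. 17/05/2018
else if (v is double d)
    text = d.ToString("0");                    // 3915100000026840 instead of 3.9151E+15
else
    text = Convert.ToString(v);                // also tolerates empty (null) cells
dr[c - 1] = text;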

Apply format to many excel cells at once

I would like to format all added values in my Excel file, and I have a "small" and "fast" solution like this:
Item2 is a List<string>, Item3 is a List<List<string>>
if (chkWithValues.Checked && results.Item3.Any())
{
    var rows = results.Item3.Count;
    var cols = results.Item3.Max(x => x.Count);
    object[,] values = new object[rows, cols];
    object[,] format = new object[rows, cols];
    // All returned items are inserted into the Excel file
    // Item2 contains the database types, Item3 the values
    // pgMain shows the progress for the selected tables
    for (int j = 0; j < results.Item3.Count(); j++)
    {
        int tmpNbr = 1;
        foreach (string value in results.Item3[j])
        {
            values[j, tmpNbr - 1] = Converter.Convert(results.Item2[tmpNbr - 1], value).ToString().Replace("'", "");
            format[j, tmpNbr - 1] = ExcelColumnTypes.ConvertToExcelTypes(results.Item2[tmpNbr - 1]);
            tmpNbr++;
        }
        pgMain.Maximum = results.Item3.Count();
        pgMain.PerformStep();
    }
    Excel.Range range = xlWorksheet.Range["A3", GetExcelColumnName(cols) + (rows + 2)];
    range.Value = values;
    range.NumberFormat = format;
}
To apply the number formats efficiently with a single assignment, I've found a solution using a 2D array which contains all the number formats that should be set.
The problem is that I get the error message "Unable to set the NumberFormat property of the Range class" when I have more than (I think) 50,000 cells to format.
Does anyone know a solution that is fast and can handle a large number of cells without error?
update:
ExcelColumnTypes.ConvertToExcelTypes
public static string ConvertToExcelTypes(string databaseType)
{
    if (DatabaseColumnTypes.DOUBLE.Contains(databaseType))
        return DOUBLEPO1;
    if (DatabaseColumnTypes.DATE.Contains(databaseType))
        return DATE2;
    if (DatabaseColumnTypes.INTEGER.Contains(databaseType))
        return INT;
    return TEXT;
}
The DatabaseColumnTypes are Lists of const strings, or direct consts.
Sample:
public const string VARBINARY = "varbinary";
public static List<string> STRING = new List<string>()
{
    CHAR,
    VARCHAR,
    TEXT,
    NTEXT,
    NCHAR,
    NVARCHAR,
    BINARY,
    VARBINARY
};
Please change
range.NumberFormat = format;
to
range.NumberFormatLocal = format;
Then it should work.
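If the per-cell format array still fails on very large ranges, an alternative sketch is to assign one format per column instead of one per cell, reusing GetExcelColumnName, rows, cols and the database types from the question; the column count is small, so the loop stays fast:
for (int c = 0; c < cols; c++)
{
    // one NumberFormatLocal assignment per column instead of one array entry per cell
    Excel.Range colRange = xlWorksheet.Range[
        GetExcelColumnName(c + 1) + "3",
        GetExcelColumnName(c + 1) + (rows + 2)];
    colRange.NumberFormatLocal = ExcelColumnTypes.ConvertToExcelTypes(results.Item2[c]);
}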

Replacing an Excel connection range with an Add-In

I'm having a bit of a nightmare with an Excel Add-In I've written. The customer's workbook used to be populated from a SQL connection and has loads of formulas set up around named tables etc. I'm trying to populate some of the same tables that connection populated (using the existing headers and footers) with data from a WCF service, while maintaining formatting and formulas (i.e. not breaking anything).
Getting the data in is fine. The problem I'm hitting is this: the data being replaced may be more or less data than currently exists in the named range. I can't seem to find a way of removing the existing rows, replacing them with my new data, and having the named range resize to the new data.
Many thanks in advance.
Range range = activeWorksheet.get_Range("Name", MissingValue);
range.Clear();
object[,] data = new object[result.Length, 26];
range.get_Resize(result.Length, 26);
... fill data....
range.Value2 = data;
OK, I managed to solve it with the code below. I also removed the range.Clear() call, which stopped the formatting from being removed.
Range range = activeWorksheet.get_Range("Name", MissingValue);
int totalMissingRows = 0;
if (range.Rows.Count < result.Length)
{
    totalMissingRows = result.Length - range.Rows.Count;
    for (int i = 0, l = totalMissingRows; i < l; i++)
    {
        Excel.Range rng = range;
        rng = (Excel.Range)rng.Cells[rng.Rows.Count, 1];
        rng = rng.EntireRow;
        rng.Insert(Excel.XlInsertShiftDirection.xlShiftDown, MissingValue);
    }
}
// delete extra lines / remove left over data
for (int i = result.Length, l = range.Rows.Count; i < l; i++)
{
    range.Cells[range.Rows.Count, 1].EntireRow.Delete(null);
}
Since you are already getting an array of data, why not write it directly to Excel like this:
int startRow, startCol;
var startCell = (Range)worksheet.Cells[startRow, startCol];
var endCell = (Range)worksheet.Cells[startRow + result.Length, startCol + 26];
var writeRange = worksheet.get_Range(startCell, endCell);
writeRange.Value2 = data;
Here I have used the dimensions of your array as per your question, and data is the 2D array of data.
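As a side note, get_Resize does not resize the range in place; it returns a new Range, so its result has to be used (the snippet in the question discards it). A small sketch against the same names:
Range target = activeWorksheet.get_Range("Name", MissingValue).get_Resize(result.Length, 26);
target.Value2 = data;   // one assignment over the resized range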

Excel Workbook - Read from C# substantially slow?

I was experimenting with reading from an Excel workbook and noticed it takes a long time to read a sheet with 3560 rows and 7 columns: about 1 minute and 17 seconds. All I did was loop through the whole sheet and store the values in a list.
Is this normal, or am I doing something wrong?
static void Main(string[] args)
{
    List<string> testList = new List<string>();
    Excel.Application excelApp = new Excel.Application();
    Excel.Workbook workbook = excelApp.Workbooks.Open(@"C:\Users\rnewell\Desktop\FxData.xlsx");
    Excel.Worksheet worksheet = workbook.Sheets[1];
    Excel.Range range = worksheet.UsedRange;
    int rowCount = range.Rows.Count;
    int colCount = range.Columns.Count;
    int rowCounter = 1;
    int colCounter = 1;
    while (rowCounter < rowCount)
    {
        colCounter = 1;
        while (colCounter <= colCount)
        {
            //Console.Write(range.Cells[rowCounter, colCounter].Value2.ToString() + " ");
            testList.Add(range.Cells[rowCounter, colCounter].Value2.ToString());
            colCounter++;
        }
        Console.WriteLine();
        rowCounter++;
    }
    Console.ReadKey();
    excelApp.Workbooks.Close();
}
@TimWilliams' comment is the correct answer. Reading a single cell takes as long as reading a range of any size. This is the overhead of talking to the COM layer, and you are incurring it thousands of times. You should write the range to an object[,], and then access that array cell by cell.
int rowCount = range.Rows.Count;
int colCount = range.Columns.Count;
object[,] values = range.Value2;
int rowCounter = 1;
int colCounter = 1;
while (rowCounter <= rowCount)
{
    colCounter = 1;
    while (colCounter <= colCount)
    {
        // check for null?
        testList.Add(values[rowCounter, colCounter].ToString());
        colCounter++;
    }
    rowCounter++;
}
Note that the array will be one-based instead of zero-based like normal C# arrays. The data will go from 1 to rowCount and from 1 to colCount, but Rows and Columns properties will return rowCount and colCount, not 1 + rowCount and 1 + colCount. If you want to write data back, you can use a zero-based array of the right size (in fact you have to AFAIK since you can't create a one-based array) and it will work fine.
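For completeness, a small sketch of the write-back direction using the same rowCount, colCount, values and range from above: a zero-based object[,] of matching size, assigned in one call:
object[,] output = new object[rowCount, colCount];   // zero-based is fine for writing
for (int r = 0; r < rowCount; r++)
    for (int c = 0; c < colCount; c++)
        output[r, c] = values[r + 1, c + 1];          // shift down from the one-based read array
range.Value2 = output;                                // one call writes the whole block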
Since you are loading data from the Open XML (*.xlsx) file format, I would suggest you use Open XML SDK. It doesn't start Excel in the background which is always a good thing, in particular if you need to run your code non-interactively.
I've also written a blog post on different methods of accessing data in Excel which you might find useful.
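For reference, a minimal read sketch with the Open XML SDK (the DocumentFormat.OpenXml package, no Excel process involved); note that text cells store an index into the shared string table, so that lookup is needed to get the actual text:
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

static void DumpFirstSheet(string path)
{
    using (var doc = SpreadsheetDocument.Open(path, false))
    {
        WorkbookPart wbPart = doc.WorkbookPart;
        Sheet sheet = wbPart.Workbook.Descendants<Sheet>().First();
        var wsPart = (WorksheetPart)wbPart.GetPartById(sheet.Id.Value);
        SharedStringTablePart sst = wbPart.SharedStringTablePart;
        foreach (Row row in wsPart.Worksheet.Descendants<Row>())
        {
            foreach (Cell cell in row.Elements<Cell>())
            {
                string text = cell.CellValue?.Text ?? string.Empty;
                // text cells reference the shared string table by index
                if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString && sst != null)
                    text = sst.SharedStringTable.ElementAt(int.Parse(text)).InnerText;
                Console.Write(text + " ");
            }
            Console.WriteLine();
        }
    }
}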
In general, it should be a matter of seconds.
But as you are creating an instance of Excel itself, including its add-ins, it may take a long time to initialize everything in your instance.
For your purpose you can use any public-domain Excel sheet reading library which doesn't launch Excel.

How to speed up dumping a DataTable into an Excel worksheet?

I have the following routine that dumps a DataTable into an Excel worksheet.
private void RenderDataTableOnXlSheet(DataTable dt, Excel.Worksheet xlWk,
string [] columnNames, string [] fieldNames)
{
// render the column names (e.g. headers)
for (int i = 0; i < columnNames.Length; i++)
xlWk.Cells[1, i + 1] = columnNames[i];
// render the data
for (int i = 0; i < fieldNames.Length; i++)
{
for (int j = 0; j < dt.Rows.Count; j++)
{
xlWk.Cells[j + 2, i + 1] = dt.Rows[j][fieldNames[i]].ToString();
}
}
}
For whatever reason, dumping a DataTable of 25 columns and 400 rows takes about 10-15 seconds on my relatively modern PC. It takes even longer on testers' machines.
Is there anything I can do to speed up this code? Or is interop just inherently slow?
SOLUTION: Based on suggestions from Helen Toomik, I've modified the method and it should now work for several common data types (Int32, Double, DateTime, String). Feel free to extend it. The time for processing my dataset went from 15 seconds to under 1 second.
private void RenderDataTableOnXlSheet(DataTable dt, Excel.Worksheet xlWk, string[] columnNames, string[] fieldNames)
{
    Excel.Range rngExcel = null;
    Excel.Range headerRange = null;
    try
    {
        // render the column names (e.g. headers)
        for (int i = 0; i < columnNames.Length; i++)
            xlWk.Cells[1, i + 1] = columnNames[i];
        // for each column, create an array and set the array
        // to the excel range for that column.
        for (int i = 0; i < fieldNames.Length; i++)
        {
            string[,] clnDataString = new string[dt.Rows.Count, 1];
            int[,] clnDataInt = new int[dt.Rows.Count, 1];
            double[,] clnDataDouble = new double[dt.Rows.Count, 1];
            string columnLetter = char.ConvertFromUtf32("A".ToCharArray()[0] + i);
            rngExcel = xlWk.get_Range(columnLetter + "2", Missing.Value);
            rngExcel = rngExcel.get_Resize(dt.Rows.Count, 1);
            string dataTypeName = dt.Columns[fieldNames[i]].DataType.Name;
            for (int j = 0; j < dt.Rows.Count; j++)
            {
                if (fieldNames[i].Length > 0)
                {
                    switch (dataTypeName)
                    {
                        case "Int32":
                            clnDataInt[j, 0] = Convert.ToInt32(dt.Rows[j][fieldNames[i]]);
                            break;
                        case "Double":
                            clnDataDouble[j, 0] = Convert.ToDouble(dt.Rows[j][fieldNames[i]]);
                            break;
                        case "DateTime":
                            if (fieldNames[i].ToLower().Contains("time"))
                                clnDataString[j, 0] = Convert.ToDateTime(dt.Rows[j][fieldNames[i]]).ToShortTimeString();
                            else if (fieldNames[i].ToLower().Contains("date"))
                                clnDataString[j, 0] = Convert.ToDateTime(dt.Rows[j][fieldNames[i]]).ToShortDateString();
                            else
                                clnDataString[j, 0] = Convert.ToDateTime(dt.Rows[j][fieldNames[i]]).ToString();
                            break;
                        default:
                            clnDataString[j, 0] = dt.Rows[j][fieldNames[i]].ToString();
                            break;
                    }
                }
                else
                    clnDataString[j, 0] = string.Empty;
            }
            // set values in the sheet wholesale.
            if (dataTypeName == "Int32")
                rngExcel.set_Value(Missing.Value, clnDataInt);
            else if (dataTypeName == "Double")
                rngExcel.set_Value(Missing.Value, clnDataDouble);
            else
                rngExcel.set_Value(Missing.Value, clnDataString);
        }
        // figure out the letter of the last column (supports 1-letter column names)
        string lastColumn = char.ConvertFromUtf32("A".ToCharArray()[0] + columnNames.Length - 1);
        // make the header range bold
        headerRange = xlWk.get_Range("A1", lastColumn + "1");
        headerRange.Font.Bold = true;
        // autofit for better view
        xlWk.Columns.AutoFit();
    }
    finally
    {
        ReleaseObject(headerRange);
        ReleaseObject(rngExcel);
    }
}
private void ReleaseObject(object obj)
{
    try
    {
        System.Runtime.InteropServices.Marshal.ReleaseComObject(obj);
        obj = null;
    }
    catch
    {
        obj = null;
    }
    finally
    {
        GC.Collect();
    }
}
Instead of setting cell values one by one, do it in a batch.
Step 1. Transfer the data from your DataTable into an array with the same dimensions.
Step 2. Define an Excel Range object that spans the appropriate range.
Step 3. Set the Range.Value to the array.
This will be a lot faster because you will have a total two calls across the Interop boundary (one to get the Range object, one to set its value), instead of two per cell (get cell, set value).
There is some sample code at MSDN KB article 302096.
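A minimal sketch of those three steps, assuming the same DataTable dt and worksheet xlWk as in the question:
int rows = dt.Rows.Count, cols = dt.Columns.Count;
object[,] buffer = new object[rows, cols];                          // step 1: copy into an array
for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++)
        buffer[r, c] = dt.Rows[r][c] is DBNull ? null : dt.Rows[r][c];  // blank out DBNulls
Excel.Range target = xlWk.Range["A2"].Resize[rows, cols];           // step 2: a range of the same shape
target.Value2 = buffer;                                             // step 3: one assignment across the boundary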
Interop is inherently very slow.
There is a large overhead associated with each call.
To speed it up try writing back an object array of data to a range of cells in one assignment statement.
Or if this is a serious problem try using one of the Managed Code Excel extensions that can read/write data using managed code via the XLL interface. (Addin Express, Managed XLL etc.)
If you have a recordset, the fastest way to write to Excel is CopyFromRecordset.
Do you have a specific requirement to go the COM automation route? If not, you have a few other options.
Use the OLEDB provider to create/write to an Excel file http://support.microsoft.com/kb/316934
Use a third party library to write to Excel. Depending on your licensing requirements there are a few options.
Update: A good free library is NPOI http://npoi.codeplex.com/ (see the short writing sketch after this list)
Write the data to a csv file, and load that into Excel
Write the data as XML which can be loaded into Excel.
Use the Open XML SDK
http://www.microsoft.com/downloads/details.aspx?familyid=C6E744E5-36E9-45F5-8D8C-331DF206E0D0&displaylang=en
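A minimal NPOI writing sketch (the library referenced above), assuming the NPOI package and a DataTable as input; no Excel installation is involved:
using System.Data;
using System.IO;
using NPOI.SS.UserModel;
using NPOI.XSSF.UserModel;

static void WriteWithNpoi(DataTable dt, string path)
{
    IWorkbook wb = new XSSFWorkbook();                 // .xlsx; use HSSFWorkbook for .xls
    ISheet sheet = wb.CreateSheet("Data");
    for (int r = 0; r < dt.Rows.Count; r++)
    {
        IRow row = sheet.CreateRow(r);
        for (int c = 0; c < dt.Columns.Count; c++)
            row.CreateCell(c).SetCellValue(dt.Rows[r][c].ToString());
    }
    using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        wb.Write(fs);
}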
Interop has its fastest method, CopyFromRecordset, but the ADODB library has to be used.
It is definitely the fastest method of those I have tried. Perhaps not easy to use, but the speed is astonishing:
https://learn.microsoft.com/en-us/office/vba/api/excel.range.copyfromrecordset
A short sample:
using ADODB;
using Microsoft.Office.Interop;
//--- datatable --- already exists
DataTable dt_data = new DataTable();
//--- or your dt code is here ..........
//--- mine has 3 columns ------
//--- code to populate ADO rs with DataTable data --- nothing special
//--- create empty rs .....
ADODB.Recordset rs = new ADODB.Recordset();
rs.CursorType = CursorTypeEnum.adOpenKeyset;
rs.CursorLocation = CursorLocationEnum.adUseClient;
rs.LockType = LockTypeEnum.adLockOptimistic;
rs.Fields.Append("employee_id", DataTypeEnum.adBSTR, 255, FieldAttributeEnum.adFldIsNullable);
rs.Fields.Append("full_name", DataTypeEnum.adBSTR, 255, FieldAttributeEnum.adFldIsNullable);
rs.Fields.Append("start_date", DataTypeEnum.adBSTR, 10, FieldAttributeEnum.adFldIsNullable);
rs.Open();
//--- populate ADO rs with DataTable data ----
for (int i = 0; i < dt_data.Rows.Count; i++)
{
    rs.AddNew();
    rs.Fields["employee_id"].Value = dt_data.Rows[i]["employee_id"].ToString();
    rs.Fields["full_name"].Value = dt_data.Rows[i]["full_name"].ToString();
    //--- if date is empty......
    if (dt_data.Rows[i]["start_date"].ToString().Length > 0)
    {
        rs.Fields["start_date"].Value = dt_data.Rows[i]["start_date"].ToString();
    }
    rs.Update();
}
Microsoft.Office.Interop.Excel.Application xlexcel;
Microsoft.Office.Interop.Excel.Workbook xlWorkBook;
Microsoft.Office.Interop.Excel.Worksheet xlWorkSheet;
object misValue = System.Reflection.Missing.Value;
xlexcel = new Microsoft.Office.Interop.Excel.Application();
xlexcel.Visible = true;
xlWorkBook = xlexcel.Workbooks.Add(misValue);
xlWorkSheet = (Microsoft.Office.Interop.Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1);
//--- populate columns from rs --
for (int i = 0; i < rs.Fields.Count; i++)
{
    xlWorkSheet.Cells[1, i + 1] = rs.Fields[i].Name.ToString();
}
//----- .CopyFromRecordset method -- (rs object, MaxRows, MaxColumns) --- in this case 3 columns but it can be 1, 2, 3 etc ------
xlWorkSheet.Cells[2, 1].CopyFromRecordset(CloneFilteredRecordset(rs), rs.RecordCount, 3);
You could create an Excel add-in, with VBA code to do all your db heavy lifting. From .NET, all you'd need to do is instantiate Excel, add the add-in, and call the Excel VBA routine, passing any parameters to it that it needs to execute your SQL statements.
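A rough sketch of that approach; the add-in path and macro name here are hypothetical placeholders:
// Start Excel, load the add-in, then hand off to its VBA entry point via Application.Run.
var xlApp = new Microsoft.Office.Interop.Excel.Application();
xlApp.Workbooks.Open(@"C:\Addins\DbTools.xlam");          // hypothetical add-in workbook
xlApp.Visible = true;
xlApp.Run("RefreshFromDb", "SELECT * FROM employees");    // hypothetical macro name and parameter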
I agree with Charles. Interop is really slow. But try this:
private void RenderDataTableOnXlSheet(DataTable dt, Excel.Worksheet xlWk,
    string[] columnNames, string[] fieldNames)
{
    // render the column names (e.g. headers)
    int columnLength = columnNames.Length;
    for (int i = 0; i < columnLength; i++)
        xlWk.Cells[1, i + 1] = columnNames[i];
    // render the data
    int fieldLength = fieldNames.Length;
    int rowCount = dt.Rows.Count;
    for (int j = 0; j < rowCount; j++)
    {
        for (int i = 0; i < fieldLength; i++)
        {
            xlWk.Cells[j + 2, i + 1] = dt.Rows[j][fieldNames[i]].ToString();
        }
    }
}
HTH
