I am reading an xlsx file via OLE DB. In some rows a column (containing a date-string) returns null, while in other rows the same column (also containing a date-string) returns the date-string. In Excel the column type is set to "date".
Here is my connection-string:
$"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={PATH_TO_FILE};Extended Properties=\"Excel 12.0 Xml;HDR=NO\""
Here is the command-text to query the data:
$"SELECT * FROM [SHEET_NAME$A4:BC] WHERE F1 IS NOT NULL"
Here is how I read the data from the data-record:
var test = dataRecord.GetValue(dataRecord.GetOrdinal("F39"));
Here are some examples of what the inspector shows me when test contains the date-string:
{07.01.1975 00:00:00}
{03.08.1987 00:00:00}
{03.10.1988 00:00:00}
{01.05.1969 00:00:00}
{20.12.2016 00:00:00}
{18.07.2011 00:00:00}
In other cases the inspector only shows:
{}
Here is a screenshot from the xlsx document, where I have marked in red a line where the return value is empty and in green a line where the actual date-string is returned:
The date-strings are formatted like dd.mm.yyyy
Why do these rows return an empty value instead of the date-string?
As suggested by AndyG, I have checked whether the date-string values might fail depending on the format ("dd.mm.yyyy" vs. "mm.dd.yyyy"). But there are values that are invalid for "mm.dd.yyyy" and still don't fail.
I was not able to solve the problem, but was able to bypass it, by changing the column-type in Excel to text.
I had to copy the whole xls-file, delete the content of the copy, set the column-type to text, copy the content from the first file and paste it into the second file. Otherwise Excel was changing the date-strings to the numbers which are used to store the date.
Now I can read the cells correctly.
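Once the column comes back as text, the strings still have to be turned into dates in code. A minimal sketch of that step, assuming the cells hold text in the dd.MM.yyyy layout shown above; the class and method names are hypothetical and not part of the original code:

```csharp
using System;
using System.Globalization;

public static class DateCellParser
{
    // Hypothetical helper: parses a cell that was read as text in dd.MM.yyyy layout.
    // Returns null for empty, DBNull, or unparsable cells instead of throwing.
    public static DateTime? ParseDayMonthYear(object cellValue)
    {
        string text = Convert.ToString(cellValue, CultureInfo.InvariantCulture)?.Trim();
        if (string.IsNullOrEmpty(text))
        {
            return null;
        }

        bool ok = DateTime.TryParseExact(
            text,
            "dd.MM.yyyy",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out DateTime parsed);

        return ok ? parsed : (DateTime?)null;
    }
}
```

It could be called as DateCellParser.ParseDayMonthYear(dataRecord.GetValue(dataRecord.GetOrdinal("F39"))). Using an exact format avoids any ambiguity between day-first and month-first interpretation.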
Two years too late but after struggling with this for the better part of several hours I hope this might help someone:
It sounds likely the first row in your excel document contains column names and not actual data, which means they are of a different Excel data type (General/Text + DateTime).
The fix is pretty simple: adjust your connection string to reflect this, using the HDR property in Extended Properties:
$"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={PATH_TO_FILE};Extended Properties=\"Excel 12.0 Xml;HDR=YES\""
HDR=YES means the first row contains field names.
You can read more about it here:
https://www.connectionstrings.com/ace-oledb-12-0/
Additionally, if you are encountering this on odd lines in your document like the OP, ensure that the data type is identical for the entire column (except for your column titles if using HDR=YES).
Excel can sometimes flip DateTime fields to General fields, which would cause this behavior.
I'm using Microsoft Office Interop Excel to sort and manage a ".csv" file and create an Excel file.
When I copy one cell that contains a date for example :
"04/05/2018 18:55"
I substring that into
"04/05/2018"
and paste it into another cell, it transforms it into this weird number
43195
Why is this happening? How can I prevent or modify this? I'm currently passing the info. like this:
String date = worksheet.Cells[1, 1].Value2.ToString();
worksheet.Cells[1, 2].Value2 = date; //this shows up as 43195
It’s not being transformed really. Excel actually stores all dates as a number representing the number of days since 1/0/1900 (really!). What you are seeing in your cell is the raw numeric value representing the date. If you open your result spreadsheet, highlight your column, right click and select formatting, and select “Date”, you’ll see it display the date you expect.
So you are going to want to do this in your code. Assuming your date is in column “B” and your worksheet is “ws”:
ws.Range["B:B"].NumberFormat = "mm/dd/yyyy";
This assumes a US date format.
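If the number has already reached your code and you want the date back as a .NET value rather than as cell formatting, the serial can be converted directly. A small sketch using DateTime.FromOADate; the wrapper class and method name are hypothetical:

```csharp
using System;

public static class SerialDateDemo
{
    // Hypothetical wrapper: converts an Excel serial number (as seen in Value2)
    // into a DateTime by treating it as an OLE Automation date.
    public static DateTime FromSerial(double serial)
    {
        return DateTime.FromOADate(serial);
    }
}
```

SerialDateDemo.FromSerial(43195) yields 5 April 2018, the date from the example above without its time part.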
I have a program (actually SSIS script task, but I don't suppose that matters) that creates an OLE DB connection to an Excel workbook, and reads the cell values in each worksheet, storing them in a SQL Server table.
Each worksheet has several sections of rows, each section being for a separate product. The first two rows of each product section are a quarter row, and a year row. Here is a screen shot:
I use an OleDbDataReader with a "Select *" command to read the data in each sheet into a DataTable. I have a column called "YearQuarter" in my SQL database, where I store a concatenation of the year row value and the preceding quarter row value, with a hyphen between the two strings:
My code is like this:
OleDbConnection oleExcelConnection = new OleDbConnection(
"Provider=Microsoft.ACE.OLEDB.12.0;" +
"Data Source=" + strWkbkFilePath + ";" +
"Mode=Read;" +
"Extended Properties=\"Excel 8.0;HDR=No;IMEX=1\"");
oleExcelConnection.Open();
DataTable dtCurrSheet = new DataTable();
// Name of table is in strLoadTblNm.
OleDbCommand oleExcelCommand;
OleDbDataReader oleExcelReader;
oleExcelCommand = oleExcelConnection.CreateCommand();
oleExcelCommand.CommandText = "Select * From [" + strLoadTblNm + "]";
oleExcelCommand.CommandType = CommandType.Text;
oleExcelReader = oleExcelCommand.ExecuteReader();
// Load worksheet into data table
dtCurrSheet.Load(oleExcelReader);
oleExcelReader.Close();
Looking at the output data, I noticed that I was getting inconsistent results. Some rows would have a YearQuarter column value that would have only the Year row value in them, while others would have the cell values from both rows. For example, I'd have "2009 - Year End" followed by just "2010", with no " - 1st Qtr." appended to it.
This is because that quarter cell value is never loaded into the data reader, as the Dataset Visualizer shows:
Notice also that, in the Dataset, the column that is missing the Quarter cell value also has other numeric values missing their formatting (no commas).
If I save the file as a .csv, all cell values are preserved.
However, I noticed that it wasn't consistent. Sometimes I'd run my package and the same row would now have the full value. So, in the above example, I'd get "2010 - 1st Qtr."
I finally realized that it was working as expected only if I happened to have the workbook open in Excel at the same time that the program was running!
Why would this make a difference? Could it be that there is a macro or something in the workbook that is executed by Excel, but not when the workbook is accessed only via an OLE DB connection? Would the fact that it had been executed in Excel then affect the data obtained by OLE DB? If that's the case, how do I get around this? The spreadsheets are provided to me. So I can't modify them.
I think you're having issues with the auto-formatting thing Excel tries to apply. With an OLEDB connection, I can't see how having the sheet open fixes your problem (obviously very strange).
Try adding IMEX=1 to your connection options to treat the entire sheet as text and see whether this is your issue. This is pulled from "OLEDB connection does not read data from excel sheet". There is also another good post on an external site: "Tips for reading Excel spreadsheets using ADO.NET".
Also, you're pulling data from an excel sheet and writing it to another excel sheet... Same workbook? I have a couple more ideas for ya though depending on your situation.
This bug turns out to be a "feature", and it should come with a big warning sign.
This article (thanks, #vb4all) explains that "ADO.NET scans the first 8 rows of data, and based on that, guesses the datatype for each column. Then it attempts to coerce all data from that column to that datatype, returning NULL whenever the coercion fails!"
In other words, it is treating the worksheet as a relational table, in which all values in a given column are of the same type. Of course, worksheet data is not bound by this restriction.
This behavior can be gotten around by setting IMEX=1 in the connection string options and then modifying these registry settings:
Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/ImportMixedTypes
Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/Typ
(Note: registry keys vary depending on 32 vs. 64 bit. E.g., for 64-bit, the first one would be HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Jet 4.0).
I think this was a very risky design, inviting data transfer errors that could easily go unnoticed.
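For the connection-string half of that workaround, a small helper keeps the HDR and IMEX options in one place. This is only a sketch: the class and method names are hypothetical, the "Excel 12.0 Xml" value assumes an .xlsx workbook, and the registry settings mentioned above still have to be handled separately.

```csharp
using System;

public static class ExcelConnectionStrings
{
    // Hypothetical helper: builds an ACE OLE DB connection string for an .xlsx workbook.
    // hasHeader maps to HDR=YES/NO; importMixedAsText appends IMEX=1.
    public static string Build(string path, bool hasHeader, bool importMixedAsText)
    {
        string extended = "Excel 12.0 Xml;HDR=" + (hasHeader ? "YES" : "NO");
        if (importMixedAsText)
        {
            extended += ";IMEX=1";
        }

        return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
               ";Extended Properties=\"" + extended + "\"";
    }
}
```

The result can be passed straight to the OleDbConnection constructor.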
I have been working with Excel spreadsheets and so far I never had any problems with them. But the error "Not a legal OleAut date." showed up out of the blue when I tried to read an Excel file. Does anyone know how I can fix this? Here is the code I use to read the Excel file and put the data into a dataset. It worked fine previously, but after I made some changes (which don't involve dates) to the data source, this error showed up.
var fileName = string.Format("C:\\Drafts\\Excel 97-2003 formats\\All Data 09 26 2012_Edited.xls");
var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);
var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", connectionString);
DataSet Originalds = new DataSet();
adapter.Fill(Originalds, "Employees"); // this is where the error shows up
I sort of figured out a workaround to this problem: I changed the connection string to use the latest OLE DB provider.
var connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR=YES;'", fileName);
I also made sure that the empty space after the last row of my excel data is not being read as a value.
Look for different date formats. In my case one of the columns was in dd/mm/yyyy format and the other was in mm/dd/yyyy. Once the date formats were rectified, the data import worked.
In my case, a date had been inserted manually into my table in the wrong format: it was 01/01/0019 instead of 01/01/2019.
I had this issue while reading Excel via OleDb.
I found that in Excel many date columns were empty in the first records. I had 6000 records, and one of the date columns had roughly its first 4000 cells empty, so the column type could not be inferred from them. I added one dummy record filled with a date and then processed my file. Alternatively, you can move a few records that have a date value to the top.
How can you fix this? Hijack your data with a dummy row at row 1 and force the column(s) in question into a string (in this case only; it is a data type error, so apply the fix according to the type).
It is necessary to understand what the data adaptor does: by default it examines the first 8 rows of data (sans header if HDR=Yes is in the connection string) and decides on a data type for each column. There is an override in the connection string that raises this to 16 rows, but it is almost never very helpful.
Data adaptors can do other nasty things, like skip strings in columns of mixed data types, like string/double (which is really just a string column, but not to the adaptor if the first rows are all double). It won't even give you the courtesy of an error in this example.
This often occurs in data coming from ERP sources that contains "Free Form" columns. User defined columns are the usual suspects. It can be very difficult to find in other data type issues. I once spent quite a bit of time resolving an issue with a column that typed as a string with a max length of 255 chars. Deep in the data there were cells that exceeded that length and threw errors.
If you don't want to advance to the level of "Genetic Engineering" in working with data adaptors, the fastest way to resolve an issue like this is to hijack your data and force the column(s) in question to the correct type (or an incorrect one, which you can then correct in your own code if need be). Plan B is to give the data back to the customer/user and tell them to correct it. Good luck with Plan B. There is a reason it isn't Plan A.
More on manipulating this via the connection string, and on similar issues with the adaptor, can be found in "OleDB & mixed Excel datatypes : missing data". Be wary, though: results are not going to be 100% foolproof. I've tested changing IMEX and HDR settings extensively. If you want to get through the project quickly, hijack the data.
Here is another posting similar in context, "DateTime format mismatch on importing from Excel Sheet"; note all of the possible time-consuming solutions. I have yet to be convinced there is a better solution, and it simply defies the logic a programmer brings to the keyboard every morning. Too bad, you have a job to do, and sometimes you have to be a hack.
I'm trying to read CSV files through OleDb and C#. I'm able to read most of the files perfectly, but in some cases I get an empty cell value even though a value is there (even in such a file, some cell values come through, but not all). Have any of you faced such an issue with OleDb and CSV files? If yes, please tell me the solution.
OLEDB likes to guess at the data types based on the values found in the first few rows, and anything that doesn't fit that data type after it's guessed comes back null/empty.
So if you had a csv like the following...
1,A
2,B
3,C
4,D
5A,E
5B,F
6,G
7,H
Depending on registry settings (I'm more familiar with this issue for Excel and am not sure it's configured the same way for CSV), OLEDB might read the first 8 records and decide that the first column is numeric, because the majority of the data is numeric, and that the second is char. Once it sets those data types, if it reads a non-numeric value in that first column, it doesn't throw any error; it just returns the value as null.
If this is your issue, I believe you can work around it by using IMEX=1 in your connection string to force mixed data to be read as text, and then when you retrieve the values, I always use GetValue, as opposed to GetString or GetDouble, etc.
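To make the guessing behaviour described in this answer concrete, here is a toy model of it. It is deliberately simplified and is not the provider's actual algorithm: it assumes a plain majority vote over the first 8 values and a column that is either numeric or text. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class TypeGuessModel
{
    // Simplified model of the described behaviour; NOT the provider's real algorithm.
    // Guesses "numeric" if most of the first guessRows values parse as numbers,
    // then returns null for every value that does not fit the guessed type.
    public static List<object> ReadColumn(IReadOnlyList<string> cells, int guessRows = 8)
    {
        List<string> sample = cells.Take(guessRows).ToList();
        int numeric = sample.Count(IsNumber);
        bool treatAsNumber = numeric * 2 > sample.Count; // majority of the sample

        var result = new List<object>();
        foreach (string cell in cells)
        {
            if (!treatAsNumber)
            {
                result.Add(cell); // text column: everything is kept as a string
            }
            else if (IsNumber(cell))
            {
                result.Add(double.Parse(cell, NumberStyles.Float, CultureInfo.InvariantCulture));
            }
            else
            {
                result.Add(null); // does not fit the guessed type: silently lost
            }
        }

        return result;
    }

    private static bool IsNumber(string s)
    {
        return double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out _);
    }
}
```

Fed the first column of the CSV above ("1", "2", "3", "4", "5A", "5B", "6", "7"), the model returns numbers for the numeric cells and null for "5A" and "5B", with no error raised.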
I have a C#/.Net job that imports data from Excel and then processes it. Our client drops off the files and we process them. I don't have any control over the original file.
I use the OleDb library to fill up a dataset. The file contains some numbers like 30829300, 30071500, etc... The data type for those columns is "Text".
Those numbers are converted to scientific notation when I import the data. Is there any way to prevent this from happening?
One workaround to this issue is to change your select statement, instead of SELECT * do this:
"SELECT Format([F1], 'General Number') From [Sheet1$]"
-or-
"SELECT Format([F1], \"#####\") From [Sheet1$]"
However, doing so will blow up if your cells contain more than 255 characters with the following error:
"Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done."
Fortunately my customer didn't care about erroring out in this scenario.
This page has a bunch of good things to try as well:
http://www.dicks-blog.com/archives/2004/06/03/external-data-mixed-data-types/
The OleDb library will, more often than not, mess up your data in an Excel spreadsheet. This is largely because it forces everything into a fixed-type column layout, guessing at the type of each column from the values in the first 8 cells in each column. If it guesses wrong, you end up with digit strings converted to scientific-notation. Blech!
To avoid this you're better off skipping the OleDb and reading the sheet directly yourself. You can do this using the COM interface of Excel (also blech!), or a third-party .NET Excel-compatible reader. SpreadsheetGear is one such library that works reasonably well, and has an interface that's very similar to Excel's COM interface.
Using this connection string:
Provider=Microsoft.ACE.OLEDB.12.0; data source={0}; Extended Properties=\"Excel 12.0;HDR=NO;IMEX=1\"
with Excel 2010 I have noticed the following. If the Excel file is open when you run the OLEDB SELECT then you get the current version of the cells, not the saved file values. Furthermore the string values returned for a long number, decimal value and date look like this:
5.0130370071e+012
4.08
36808
If the file is not open then the returned values are:
5013037007084
£4.08
Monday, October 09, 2000
If you look at the actual .xlsx file using the Open XML SDK 2.0 Productivity Tool (or simply unzip the file and view the XML in Notepad), you will see that Excel 2007 actually stores the raw data in scientific format.
For example 0.00001 is stored as 1.0000000000000001E-5
<x:c r="C18" s="11" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<x:v>1.0000000000000001E-5</x:v>
</x:c>
Looking at the cell in Excel, it's displayed as 0.00001 in both the cell and the formula bar. So it is not always true that OleDB is causing the issue.
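If you do end up with the raw strings shown for the open-file case, they can be interpreted in code. A sketch, with hypothetical class and method names, assuming the strings use a period as the decimal separator:

```csharp
using System;
using System.Globalization;

public static class RawCellValues
{
    // Interprets a raw serial string such as "36808" as a date.
    public static DateTime SerialToDate(string raw)
    {
        double serial = double.Parse(raw, NumberStyles.Float, CultureInfo.InvariantCulture);
        return DateTime.FromOADate(serial);
    }

    // Expands scientific notation such as "5.0130370071e+012" into a decimal.
    // Digits that were already rounded away in the string cannot be recovered.
    public static decimal ExpandScientific(string raw)
    {
        return decimal.Parse(raw, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

SerialToDate("36808") gives 9 October 2000, matching the date above. ExpandScientific("5.0130370071e+012") gives 5013037007100, not the original 5013037007084, so for identifiers it is safer to avoid the scientific representation in the first place than to convert it back afterwards.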
I have found that the easiest way is to choose Zip format, rather than text format for columns with large 'numbers'.
Have you tried casting the value of the field to (int) or perhaps (Int64) as you are reading it?
Look up the IMEX=1 connection string option and TypeGuessRows registry setting on google.
In truth, there is no easy way round this because the reader infers column data types by looking at the first few rows (8 by default). If the rows contain all numbers then you're out of luck.
An unfortunate workaround which I've used in the past is to use the HDR=NO connection string option and set the TypeGuessRows registry setting value to 1, which forces it to read the first row as valid data to make its datatype determination, rather than a header.
It's a hack, but it works. The code reads the first row (containing the header) as text, and then sets the datatype accordingly.
Changing the registry is a pain (and not always possible) but I'd recommend restoring the original value afterwards.
If your import data doesn't have a header row, then an alternative option is to pre-process the file and insert a ' character before each of the numbers in the offending column. This causes the column data to be treated as text.
So all in all, there are a bunch of hacks to work around this, but nothing really foolproof.
I had this same problem, but was able to work around it without resorting to the Excel COM interface or 3rd party software. It involves a little processing overhead, but appears to be working for me.
First read in the data to get the column names.
Then create a new DataSet with each of these columns, setting each of their DataTypes to string.
Read the data in again into this new dataset. Voila - the scientific notation is now gone and everything is read in as a string.
Here's some code that illustrates this, and as an added bonus, it's even StyleCopped!
// Requires: using System.Data; using System.Data.OleDb; using System.Globalization;
public void ImportSpreadsheet(string path)
{
    string extendedProperties = "Excel 12.0;HDR=YES;IMEX=1";
    string connectionString = string.Format(
        CultureInfo.CurrentCulture,
        "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"{1}\"",
        path,
        extendedProperties);

    using (OleDbConnection connection = new OleDbConnection(connectionString))
    {
        using (OleDbCommand command = connection.CreateCommand())
        {
            command.CommandText = "SELECT * FROM [Worksheet1$]";
            connection.Open();

            using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
            using (DataSet columnDataSet = new DataSet())
            using (DataSet dataSet = new DataSet())
            {
                columnDataSet.Locale = CultureInfo.CurrentCulture;

                // First pass: read the sheet only to discover the column names.
                adapter.Fill(columnDataSet);

                if (columnDataSet.Tables.Count == 1)
                {
                    var worksheet = columnDataSet.Tables[0];

                    // Now that we have a valid worksheet read in, with column names, we can create a
                    // new DataSet with a table that has preset columns that are all of type string.
                    // This fixes a problem where the OLEDB provider is trying to guess the data types
                    // of the cells and strange data appears, such as scientific notation on some cells.
                    dataSet.Tables.Add("WorksheetData");
                    DataTable tempTable = dataSet.Tables[0];

                    foreach (DataColumn column in worksheet.Columns)
                    {
                        tempTable.Columns.Add(column.ColumnName, typeof(string));
                    }

                    // Second pass: fill the pre-typed table so every value arrives as a string.
                    adapter.Fill(dataSet, "WorksheetData");

                    if (dataSet.Tables.Count == 1)
                    {
                        worksheet = dataSet.Tables[0];

                        foreach (DataRow row in worksheet.Rows)
                        {
                            // TODO: Consume some data.
                        }
                    }
                }
            }
        }
    }
}
I got one solution from somewhere else, and it worked perfectly for me.
There is no need to make any code change: just format the Excel column cells as "General" instead of any other format like "Number" or "Text". Then even Select * from [Sheet1$] or Select Column_name from [Sheet1$] will read it perfectly, even with large numeric values of more than 9 digits.
I googled around this issue.
Here are my solution steps.
For the template Excel file:
1- Format the Excel column as Text.
2- Write a macro to disable the error warnings for the number-to-text conversion:
Private Sub Workbook_BeforeClose(Cancel As Boolean)
Application.ErrorCheckingOptions.BackgroundChecking = True
End Sub
Private Sub Workbook_Open()
Application.ErrorCheckingOptions.BackgroundChecking = False
End Sub
In the code-behind:
3- While reading the data to import, try to parse the incoming values to Int64 or Int32.
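Step 3 could look roughly like this. It is a sketch with hypothetical names, assuming the cells arrive as plain digit strings such as 30829300 once the column is formatted as Text:

```csharp
using System;
using System.Globalization;

public static class NumericTextCells
{
    // Hypothetical helper: converts a text cell to Int64.
    // Returns null for empty, DBNull, or non-integer cells instead of throwing.
    public static long? ToInt64(object cellValue)
    {
        string text = Convert.ToString(cellValue, CultureInfo.InvariantCulture)?.Trim();

        return long.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out long value)
            ? value
            : (long?)null;
    }
}
```

Because NumberStyles.Integer does not accept exponents, a value that still arrives in scientific notation comes back as null rather than being converted silently, which makes such rows easy to spot.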