I am importing an xls file into SQL Server 2008 using C#. The xls file contains 3 columns (ProductCode, with alphanumeric values; ProductName, with string values; CategoryIds, with alphanumeric values).
When I import the xls through my code it reads ProductName and CategoryIds, but for ProductCode it reads only the numeric values; it cannot read the codes that contain characters.
e.g. sample column values:
productcode
-30-sunscreen-250ml,
04 5056,
045714PC,
10-cam-bag-pouch-navy-dot,
100102
It reads 100102, but it cannot read [045714PC, 04 5056, -30-sunscreen-250ml, 10-cam-bag-pouch-navy-dot].
Please suggest any solutions.
Thanks
Excel's OLEDB driver makes assumptions about a column's data type based on the first 8 rows of data. If the majority of the first 8 rows for a given column are numeric, it assumes the entire column is numeric and then can't properly handle the alphanumeric values.
There are four solutions for this:
Sort your incoming data so the majority of the first 8 rows have alphanumeric values in that column (and in any other column with mixed numeric / alphanumeric data).
Add rows of fake data in, say, rows 2-9 that you ignore when you actually perform the import, and ensure those rows contain letters in any column that should not be strictly numeric.
Edit the REG_DWORD key called "TypeGuessRows" located at [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel] in your registry and change the 8 to a 0. This will force Excel to look through the entire sheet before guessing the column data types. However, this can hinder performance. (You can also change the value from 8 to anything between 1 and 16, but that just changes how many rows Excel looks at, and 16 may still not be enough for you.)
Add ";IMEX=1" to your connection string. This changes the logic to look for at least one non-numeric value instead of looking at the majority of the values. This may then be combined with solution (1) or (2) to ensure it "sees" an alphanumeric value in the appropriate columns within the first 8 rows.
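As a sketch of option (4), this is roughly what such a connection string looks like; the file path here is a placeholder, not taken from the question:

```csharp
using System;

// Hypothetical file path - adjust to your own .xls.
string path = @"C:\import\products.xls";

// IMEX=1 tells the driver to treat mixed-type columns as text.
// Note the inner quotes: Extended Properties must be quoted as a whole
// once it contains more than one setting.
string connectionString =
    "Provider=Microsoft.Jet.OLEDB.4.0;" +
    "Data Source=" + path + ";" +
    "Extended Properties=\"Excel 8.0;HDR=YES;IMEX=1\";";

Console.WriteLine(connectionString);
```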
Related
I've been using LinqToExcel to import data from .xlsx files successfully for a while. Recently, however, I was sent a .csv file that I'm unable to read the data of.
Let's say that the file contains the following data:
Col1 Col2 Col3
A B C
D E F
I've created a class for mapping the columns as such:
public class Test
{
    [ExcelColumn("Col1")]
    public string Col1 { get; set; }

    [ExcelColumn("Col2")]
    public string Col2 { get; set; }

    [ExcelColumn("Col3")]
    public string Col3 { get; set; }
}
Then I try to read the data like so:
var test = from c in excel.Worksheet<Test>()
           select c;
The query successfully returns two Test-objects, but all property values are null.
I even tried to read the data without a class and header:
var test = from c in excel.WorksheetNoHeader()
           select c;
In this case, the query also returns two rows, both with three cells/values. But again all of these values are null. What could be the issue here?
I should also note that the file opens and looks perfectly fine in Excel. Furthermore, using StreamReader, I'm able to read all of its rows and values.
What type of data is in each of those columns? (string, numeric, ...)
According to Initializing the Microsoft Excel driver
TypeGuessRows
The number of rows to be checked for the data type. The data type is
determined given the maximum number of kinds of data found. If there
is a tie, the data type is determined in the following order: Number,
Currency, Date, Text, Boolean. If data is encountered that does not
match the data type guessed for the column, it is returned as a Null
value. On import, if a column has mixed data types, the entire column
will be cast according to the ImportMixedTypes setting. The default
number of rows to be checked is 8. Values are of type REG_DWORD.
See post Can I specify the data type for a column rather than letting linq-to-excel decide?
The post Setting TypeGuessRows for excel ACE Driver states how to change the value for TypeGuessRows.
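A hedged sketch of changing that value from code rather than regedit; this needs administrator rights, and on 64-bit Windows with a 32-bit Jet provider the key may live under SOFTWARE\Wow6432Node\... instead:

```csharp
using Microsoft.Win32;

// Sketch only: requires admin rights; the key path is for the 32-bit
// Jet 4.0 provider on Excel 97-2003 files.
using (RegistryKey key = Registry.LocalMachine.OpenSubKey(
    @"SOFTWARE\Microsoft\Jet\4.0\Engines\Excel", writable: true))
{
    // 0 = scan the whole sheet before guessing column types
    // (slower, but avoids wrong guesses from the first 8 rows).
    key?.SetValue("TypeGuessRows", 0, RegistryValueKind.DWord);
}
```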
When the driver determines that an Excel column contains text data,
the driver selects the data type (string or memo) based on the longest
value that it samples. If the driver does not discover any values
longer than 255 characters in the rows that it samples, it treats the
column as a 255-character string column instead of a memo column.
Therefore, values longer than 255 characters may be truncated. To
import data from a memo column without truncation, you must make sure
that the memo column in at least one of the sampled rows contains a
value longer than 255 characters, or you must increase the number of
rows sampled by the driver to include such a row. You can increase the
number of rows sampled by increasing the value of TypeGuessRows under
the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
registry key.
One more thing to keep in mind is that the registry key
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
only applies to Excel 97-2003. For Excel 2007 and higher versions, Excel Open XML (.XLSX extension) actually uses the ACE OLE DB provider rather than the JET provider. If you want to keep the file extension as .XLSX, you need to modify the following registry key according to your Excel version:
Excel 2007: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
Excel 2010: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
Excel 2013: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\15.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
Did you try to materialize your query by calling ToList or ToArray at the end?
I tried to recreate your case and had no trouble reading the data from the Excel file using the following code snippet:
var excel = new ExcelQueryFactory(FilePath);
List<Test> tests = (
    from c in excel.Worksheet<Test>()
    select c
).ToList();
It returns two objects with all properties filled properly.
One minor thing: when I added ToList initially, I got the following exception:
The 'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine.
Which, according to the official docs, seems reasonable, since I was missing the Microsoft Access Database Engine 2010 Redistributable on my machine.
I have been working with Excel spreadsheets and so far I have never had any problems with them. But this error, "Not a legal OleAut date.", showed up out of the blue when I tried to read an Excel file. Does anyone know how I can fix this? Here is the code I use to read the Excel file and put the data into a DataSet. It worked fine previously, but after I made some changes to the data source (which don't involve dates) this error showed up.
var fileName = @"C:\Drafts\Excel 97-2003 formats\All Data 09 26 2012_Edited.xls";
var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);
var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", connectionString);
DataSet Originalds = new DataSet();
adapter.Fill(Originalds, "Employees"); // this is where the error shows up
I sort of figured out a workaround to this problem: I changed the connection string to the latest OLEDB provider.
var connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR=YES;'", fileName);
I also made sure that the empty space after the last row of my excel data is not being read as a value.
Look for different date formats; in my case one of the columns was in dd/mm/yyyy format and the other was in mm/dd/yyyy. Once the date formats were rectified, the data import worked.
In my case, I had a date in my table that had been inserted manually with the wrong format: it was 01/01/0019 instead of 01/01/2019.
I had this issue while reading Excel through OLEDB.
I found that in the Excel file many date columns were empty in the beginning records. I had 6000 records, and one of the date columns had around 4000 empty cells at the top, so Excel was not able to work out what to convert the cells to. I added one dummy record filled with a date and processed my file. Alternatively, you can move a few records that have a date value in them to the top.
How can you fix this? Hijack your data with a dummy row at row 1 and force the column(s) in question into a string (in this case only; it is a data type error, so apply the fix according to the type).
It is necessary to understand what the data adapter does: it interprets the data type of each column by examining, by default, the first 8 rows of data (sans header if HDR=Yes is in the connection string) and deciding on a data type. It can be overridden - yes, there is an override - in the connection string, to 16 rows, which is almost never very helpful.
Data adapters can do other nasty things, like skipping strings in columns of mixed data types, such as string/double (which is really just a string column, but not to the adapter if the first rows are all double). It won't even give you the courtesy of an error in this case.
This often occurs in data coming from ERP sources that contain "free-form" columns. User-defined columns are the usual suspects. Other data type issues can be very difficult to find. I once spent quite a bit of time resolving an issue with a column that was typed as a string with a max length of 255 chars. Deep in the data there were cells that exceeded that length and threw errors.
If you don't want to advance to the level of "genetic engineering" in working with data adapters, the fastest way to resolve an issue like this is to hijack your data and force the column(s) in question to the correct type (or an incorrect one, which you can then correct in your own code if need be). Plan B is to give the data back to the customer/user and tell them to correct it. Good luck with Plan B. There is a reason it isn't Plan A.
More on manipulating via the connection string and similar issues with the adapter - but be wary, the results are not going to be 100% foolproof. I've tested changing IMEX and HDR settings extensively. If you want to get through the project quickly, hijack the data. OleDB & mixed Excel datatypes : missing data
Here is another posting similar in context; note all of the possible time-consuming solutions. I have yet to be convinced there is a better solution, and it simply defies the logic a programmer brings to the keyboard every morning. Too bad; you have a job to do, and sometimes you have to be a hack. DateTime format mismatch on importing from Excel Sheet
I'm trying to read CSV files through OleDb and C#. I'm able to read most of the files perfectly, but in some cases I get an empty cell value even though a value is there (even in such a file, some cell values come through, but not all). Has anyone faced such an issue with OleDb and CSV files? If yes, please tell me the solution.
OLEDB likes to guess the data types based on the values found in the first few rows, and anything that doesn't fit the guessed data type comes back null/empty.
So if you had a csv like the following...
1,A
2,B
3,C
4,D
5A,E
5B,F
6,G
7,H
depending on registry settings (I'm more familiar with this issue for Excel; not sure if it's configured the same way for CSV), OLEDB might read the first 8 records and decide the first column is numeric, because the majority of the data is numeric, and the second is char. Once it sets those data types, if it reads a non-numeric value in the first column it doesn't throw any error; it just returns the value as null.
If this is your issue, I believe you can work around it by using IMEX=1 in your connection string to force mixed data to be read as text. And when you retrieve the values, I always use GetValue, as opposed to GetString or GetDouble, etc.
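As a hedged sketch of that connection string for the Jet text driver (the folder and file names below are placeholders): with this driver the Data Source is the folder, and the file name acts as the table name in the SELECT.

```csharp
using System;

// Hypothetical folder and file - with the Jet text driver, the folder is
// the Data Source and the file name is used as the table name.
string folder = @"C:\csv";
string file = "data.csv";

string connectionString =
    "Provider=Microsoft.Jet.OLEDB.4.0;" +
    "Data Source=" + folder + ";" +
    "Extended Properties=\"text;HDR=Yes;FMT=Delimited;IMEX=1\";";

string query = "SELECT * FROM [" + file + "]";

// When reading the results, prefer reader.GetValue(i) over
// GetString/GetDouble, so a wrongly guessed column type surfaces as a
// value you can inspect instead of a cast exception.
Console.WriteLine(connectionString);
Console.WriteLine(query);
```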
I recently started learning LINQ and SQL. As a small project I'm writing a dictionary application for Windows Phone. The project is split into two applications. One application (which currently runs on my PC) generates an SDF file on my PC. The second app runs on my Windows Phone and searches the database. However, I would like to optimize the data usage. The raw entries of the dictionary are written in a TXT file with a file size of around 39 MB. The file has the following layout:
germanWord \tab englishWord \tab group
germanWord \tab englishWord \tab group
The file is parsed into a SDF database with the following tables.
Table Word with columns _version (rowversion), Id (int IDENTITY), Word (nvarchar(250)), Language (int)
This table contains every single word in the file. The language is a flag from my code, in case I want to add more languages later. A word-language pair is unique.
Table Group with columns _version (rowversion), GroupId (int IDENTITY), Caption (nvarchar(250))
This table contains the different groups. Every group is present one time.
Table Entry with columns _version (rowversion), EntryId (int IDENTITY), WordOneId (int), WordTwoId(int), GroupId(int)
This table links translations together. WordOneId and WordTwoId are foreign keys to a row in the Word Table, they contain the id of a row. GroupId defines the group the words belong to.
I chose this layout to reduce the data footprint. The raw text file contains some German (or English) words multiple times. There are around 60 groups that repeat themselves. Programmatically I reduce the word count from around 1,800,000 to around 1,100,000. There are around 50 rows in the Group table. Despite the reduced number of words the SDF is around 80 MB in file size. That's more than twice the size of the raw data. Another thing is that, in order to speed up the searching of translations, I plan to index the Word column of the Word table. Adding this index makes the file grow to over 130 MB.
How can it be that the SDF with ~60% of the original data is twice as large?
Is there a way to optimize the filesize?
The database file must contain all of the data from your raw file, in addition to row metadata. It will also store the strings according to the data types specified; I believe your option here is NVARCHAR, which uses two bytes per letter. Combining these considerations, it would not surprise me that a database file is over twice as large as a text file of the same data using the ISO-Latin-1 character set.
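To see why NVARCHAR storage alone can roughly double the size, compare byte counts for a sample word (the word itself is just an illustration):

```csharp
using System;
using System.Text;

string word = "Wörterbuch";

// A raw text file typically stores this in about one byte per character
// (two for the umlaut in UTF-8).
int utf8Bytes = Encoding.UTF8.GetByteCount(word);

// NVARCHAR stores UCS-2: two bytes per character, regardless of content.
int nvarcharBytes = Encoding.Unicode.GetByteCount(word);

Console.WriteLine($"UTF-8: {utf8Bytes} bytes, NVARCHAR: {nvarcharBytes} bytes");
```

So before any row metadata or index pages are counted, the string data alone roughly doubles.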
I have a C# add-in for Excel and I need to put the data on a worksheet. I do it like so:
// Now build a string array of the results
string[,] resultArray = new string[objects.Length, length];
// Fill in the values
Excel.Range resultRange = worksheet.get_Range("A2").get_Resize(objects.Length, length);
resultRange.Value = resultArray;
I have left out some unimportant steps. Basically, I am passed an array of objects, I get the type from the first one and use the properties to build a list of column names. I put that in the row 1. That is why I start the data in row 2.
The issue I ran across is that I had an Excel 97-2003 workbook (with a maximum of around 65K rows) and I tried to bring in 105K objects. It choked on the resize. I would like to check whether my resize is even valid so I can tell the user, but I can't seem to find a "MaxRows" or similar property. Is there one?
worksheet.Rows.Count will give you the maximum number of rows in a given worksheet. If you check this property before accessing a large number of rows, you can make your program compatible with 2003 and 2007, and take advantage of increased numbers of rows in future versions of Excel.
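A sketch of that check; the hard-coded 65,536 here merely stands in for worksheet.Rows.Count, which would come from Excel interop at run time:

```csharp
using System;

// In real code this would be worksheet.Rows.Count:
// 65,536 for Excel 97-2003, 1,048,576 from Excel 2007 on.
int maxRows = 65536;

int headerRows = 1;                 // row 1 holds the column names
int available = maxRows - headerRows;
int needed = 105000;                // objects.Length in the question

if (needed > available)
{
    // Warn the user instead of letting get_Resize throw.
    Console.WriteLine($"Cannot fit {needed} rows; only {available} available.");
}
```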
Here are the hard limits for a worksheet in Excel 2003. There is no programmatic way to exceed these limits, although you could start spreading data across multiple sheets or various other bits of trickery.
Excel 2010 has expanded these limits significantly, so an option might be to get your customer to upgrade as part of the project.