I'm trying to read CSV files through OleDb in C#. I can read most files perfectly, but in some cases I get an empty cell value even though the cell contains data (and even within the same file, some cells come through while others don't). Has anyone faced this issue with OleDb and CSV files? If yes, please share the solution.
OLEDB likes to guess at the data types based on the values found in the first few rows, and anything that doesn't fit that data type after it's guessed comes back null/empty.
So if you had a csv like the following...
1,A
2,B
3,C
4,D
5A,E
5B,F
6,G
7,H
Depending on registry settings (I'm more familiar with this issue for Excel; I'm not sure if it's configured the same way for CSV), OLEDB might read the first 8 records and decide the first column is numeric, because the majority of the data is numeric, and the second is char. Once it sets those data types, if it reads a non-numeric value in the first column, it doesn't throw any error; it just returns the value as null.
If this is your issue, I believe you can work around it by adding IMEX=1 to your connection string to force mixed data to be read as text. Then, when retrieving the values, I always use GetValue, as opposed to GetString or GetDouble, etc.
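As a sketch, the connection string and retrieval might look like the following. IMEX=1 is documented for the Excel driver; the file path and sheet name are placeholders, and for plain text files the column types are normally governed by a schema.ini file instead, so treat this as an illustration rather than a guaranteed fix:

```csharp
// Sketch: IMEX=1 tells the driver to treat mixed-type columns as text.
using System;
using System.Data.OleDb;

class ImexSketch
{
    static void Main()
    {
        var connStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\Data\\book.xls;"
                    + "Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\";";
        using (var conn = new OleDbConnection(connStr))
        using (var cmd = new OleDbCommand("SELECT * FROM [Sheet1$]", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // GetValue returns whatever came back (string, double, DBNull),
                    // so a wrongly guessed column type doesn't throw.
                    object val = reader.GetValue(0);
                    Console.WriteLine(val);
                }
            }
        }
    }
}
```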
Related
I have a field in a sqlite database, we'll call it field1, over which I'm trying to iterate for each record (there are over a thousand records). The field type is string. The values of field1 in the first four rows are as follows:
DEPARTMENT
09:40:24
PARAM
350297
Here is some simple code I use to iterate over each row and display the value:
while (sqlite_datareader.Read())
{
    strVal = sqlite_datareader.GetString(0);
    Console.WriteLine(strVal);
}
The first 3 values display correctly. However, when it gets to the numerical entry 350297, it errors out with the following exception on the GetString() method:
An unhandled exception of type 'System.InvalidCastException' occurred in System.Data.SQLite.dll
I've tried casting to a string, and a bunch of other stuff, but I can't get to the bottom of why this is happening. For now, I'm forced to use GetValue, which returns an object, and then convert back to a string. But I'd like to figure out why GetString() isn't working here.
Any ideas?
EDIT: Here's how I currently deal with the problem:
object objVal; // This is declared before the loop starts...

objVal = sqlite_datareader.IsDBNull(i) ? "" : sqlite_datareader.GetValue(i);
if (!"".Equals(objVal))
{
    strVal = objVal.ToString(); // a direct (string) cast would throw again for numeric values
}
What the question should have included is
The table schema, preferably the CREATE TABLE statement used to define the table.
The SQL statement used in opening the sqlite_datareader.
Any time you're dealing with data type issues from a database, it is prudent to include such information. Otherwise there is much unnecessary guessing and floundering (as is apparent in the comments), when crucial information is explicitly defined in the schema DDL. The underlying query used to get the data is perhaps less critical, but it could very well be part of the issue if there are CASTs or other expressions that might affect the returned types. If I were debugging the issue on my own system, these are the first things I would have checked!
The comments contain good discussion, but the best solution comes from understanding how sqlite handles data types, straight from the official docs. The key takeaway is that sqlite defines type affinities on a column and then stores actual values according to a limited set of storage classes. A type affinity is the type to which data will be converted, if possible, before storing. But (from the docs) ...
The important idea here is that the type is recommended, not required. Any column can still store any type of data.
But now consider...
A column with TEXT affinity stores all data using storage classes NULL, TEXT or BLOB. If numerical data is inserted into a column with TEXT affinity it is converted into text form before being stored.
So even though values of any storage class can be stored in any column, the default behavior should have been to convert a numeric value like 350297 to a string before storing it... if the column had been properly declared as a TEXT type.
But if you read carefully enough, you'll eventually come to the following at the end of section 3.1.1. Affinity Name Examples:
And the declared type of "STRING" has an affinity of NUMERIC, not TEXT.
So if the question details are taken literally and field1 was defined like field1 STRING, then technically it has NUMERIC affinity, and so a value like 350297 would have been stored as an integer, not a string. The behavior described in the question is precisely what one would expect when retrieving that data into a strictly-typed data model like System.Data.SQLite.
It is very easy to cuss at such unintuitive design decisions, and I won't defend the behavior, but
at least the results of "STRING" type are clearly stated so that the column can be redefined to TEXT in order to fix the problem, and
"STRING" is actually not a standard SQL data type. SQL strings are instead defined with TEXT, NTEXT, CHAR, NCHAR, VARCHAR, NVARCHAR, etc.
The solution is either to use the code as currently implemented (get all values as objects and then convert to string values, which is universally possible with .Net objects since they all have a ToString() method defined),
Or, redefine the column to have TEXT affinity like
CREATE TABLE myTable (
...
field1 TEXT,
...
)
Exactly how to redefine an existing column filled with data is another question altogether. However, at least when doing the conversion from the original to the new column, remember to use a CAST(field1 AS TEXT) to ensure the storage class is changed for the existing data. (I'm not certain whether type affinity is "enforced" when simply copying/inserting data from an existing table into another or if the original storage class is preserved by default. That's why I suggest the cast to force it to a text value.)
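A migration along these lines might look like the following sketch (the table and column names are assumed from the question; the CAST forces the TEXT storage class for the existing rows):

```sql
-- Rebuild the table with a TEXT column, casting existing values on the way over.
CREATE TABLE myTable_new (field1 TEXT /* ... other columns ... */);
INSERT INTO myTable_new (field1)
    SELECT CAST(field1 AS TEXT) FROM myTable;
DROP TABLE myTable;
ALTER TABLE myTable_new RENAME TO myTable;
```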
I'm working with C# and MySQL now. I've searched around the internet for days to find out why I can't use the AddWithValue method to add unicode characters: when I insert them manually in MySQL, it works, but from the C# code with the MySQL connector for .NET it doesn't. Everything other than the unicode characters is fine.
cmd.CommandText = "INSERT INTO tb_osm VALUES (@id, @timestamp, @user)";
cmd.Parameters.AddWithValue("@id", osmobj.ID);
cmd.Parameters.AddWithValue("@timestamp", osmobj.TimeStamp);
cmd.Parameters.AddWithValue("@user", osmobj.User);
cmd.ExecuteNonQuery();
For example: osmobj.User = "ສະບາຍດີ" will be "???????" in the database.
Please T^T
does this link help you?
read/write unicode data in MySql
Basically it says you should append charset=utf8; to your connection string, like so:
server=my_server;user id=my_user;password=my_password;database=some_db123;charset=utf8;
You have to be sure that unicode characters are supported at every level of the process, all the way from the input into C# to the column stored in MySql.
The C# level is easy, because strings are already utf-16 by default. As long as you're not using some weird gui toolkit, reading from a bad file or network stream, or running in a weird console app environment with no unicode support, you'll be in good shape.
The next layer is the parameter definition. Here, you're better off avoiding the AddWithValue() method anyway. The link pertains to SQL Server, but the same reasoning applies to MySql, even if MySql is less strict with your data than it should be. You should use an Add() overload that lets you explicitly declare the type of your parameters as NVarChar, instead of making the ADO.Net provider guess.
Next up is the connection between your application and the database. Here, you want to make sure to include the charset=utf8 clause (or better) as part of the connection string.
Then we need to think about the collation of the database itself. You have to be sure that an NVarChar column in MySql will be able to support your data. One of the answers from the question at previous link also covers how to handle this.
Finally, make sure the column is defined with the NVarChar type, instead of just VarChar.
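Applied to the code from the question, a sketch of the typed-parameter approach might look like this. MySqlDbType.VarChar is used here as an example; the exact enum members available (and whether a national-varchar variant exists) depend on your connector version, so verify against the MySql.Data docs:

```csharp
// Sketch: explicitly typed parameters instead of AddWithValue().
using MySql.Data.MySqlClient;

// ... cmd is an open MySqlCommand, osmobj as in the question ...
cmd.CommandText = "INSERT INTO tb_osm VALUES (@id, @timestamp, @user)";
cmd.Parameters.Add("@id", MySqlDbType.Int64).Value = osmobj.ID;
cmd.Parameters.Add("@timestamp", MySqlDbType.DateTime).Value = osmobj.TimeStamp;
cmd.Parameters.Add("@user", MySqlDbType.VarChar).Value = osmobj.User;
cmd.ExecuteNonQuery();
```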
Yes, utf8 at all stages -- byte-encoding in client, conversion on the wire (charset=utf8), and on the column. I do not know whether C# converts from utf16 to utf8 before exposing the characters; if it does not, then charset=utf16 (or no setting) may be the correct step.
Because you got multiple ?, the likely cause is trying to transform non-latin1 characters into a CHARACTER SET latin1 column. Since latin1 has no codes for Lao, ? was substituted. Probably you said nothing about the column, but depended on the DEFAULT on the table and/or database, which happened to be latin1.
The ສະບາຍດີ is lost and cannot be recovered from ???????.
Once you have changed things, check that it is stored correctly by doing SELECT col, HEX(col) .... For the string ສະບາຍດີ, you should get hex E0BAAAE0BAB0E0BA9AE0BAB2E0BA8DE0BA94E0BAB5. Notice how that is groups of E0BAxx, which is the range of utf8 values for Lao.
If you still have troubles, please provide the HEX for further analysis.
I have been working with Excel spreadsheets and so far I never had any problems with them. But this error, "Not a legal OleAut date.", showed up out of the blue when I tried to read an Excel file. Does anyone know how I can fix this? Here is the code I use to read the Excel file and put the data into a dataset. It worked fine previously, but after I made some changes to the data source (none involving dates) this error showed up.
var fileName = string.Format("C:\\Drafts\\Excel 97-2003 formats\\All Data 09 26 2012_Edited.xls");
var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);
var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", connectionString);
DataSet Originalds = new DataSet();
adapter.Fill(Originalds, "Employees"); // this is where the error shows up
I sort of figured out a workaround to this problem: I changed the connection string to the latest OleDB provider.
var connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR=YES;'", fileName);
I also made sure that the empty space after the last row of my excel data is not being read as a value.
Look for different date formats, in my case one of the column was in dd/mm/yyyy format and the other was in mm/dd/yyyy. Once the date formats were rectified data import worked.
In my case, I had a date in my table inserted manually with the wrong format, it was 01/01/0019 instead of 01/01/2019.
I had this issue while reading an Excel file through OleDb. I found that many date cells were empty in the beginning records: I had 6000 records in the Excel file, and one of the date columns had around 4000 empty cells at the top, so Excel could not work out what type to convert the cell to. I added one dummy record with the date filled in and processed my file. Alternatively, you can move a few records that have a date value to the top.
How can you fix this? Hijack your data with a dummy row at row 1 and force the column(s) in question into a string (in this case only; it is a data type error, so apply the fix according to the type).
It is necessary to understand what the data adaptor does: it interprets the data type of each column by examining, by default, the first 8 rows of data (sans header if HDR=Yes is in the connect string) and deciding on a data type. Yes, there is an override: it can be raised via the connection string to 16 rows, which is almost never very helpful.
Data adaptors can do other nasty things, like skip strings in columns of mixed data types, like string/double (which is really just a string column, but not to the adaptor if the first rows are all double). It won't even give you the courtesy of an error in this example.
This often occurs in data coming from ERP sources that contain "Free Form" columns. User-defined columns are the usual suspects. Like other data type issues, it can be very difficult to find. I once spent quite a bit of time resolving an issue with a column that typed as a string with a max length of 255 chars. Deep in the data there were cells that exceeded that length and threw errors.
If you don't want to advance to the level of "Genetic Engineering" in working with data adaptors, the fastest way to resolve an issue like this is to hijack your data and force the column(s) in question to the correct type (or an incorrect one, which you can then correct in your own code if need be). Plan B is to give the data back to the customer/user and tell them to correct it. Good luck with Plan B. There is a reason it isn't Plan A.
More on manipulating via the connection string and similar issues with the adaptor; but be wary, the results are not going to be 100% foolproof. I've tested changing IMEX and HDR settings extensively. If you want to get through the project quickly, hijack the data. OleDB & mixed Excel datatypes : missing data
Here is another posting similar in context; note all of the possible time-consuming solutions. I have yet to be convinced there is a better solution, and it simply defies the logic a programmer brings to the keyboard every morning. Too bad; you have a job to do, and sometimes you have to be a hack. DateTime format mismatch on importing from Excel Sheet
I'm using an OleDbDataAdapter (Microsoft.ACE.OLEDB.12.0 to be precise) to retrieve data from an Excel workbook. For one table I'm using a typed dataset but for another table I can't do that since the number of columns is unknown (the Excel template may generate extra columns).
The problem was that if someone enters too many numeric values in a column, "JET" seems to assume it's a numeric column and the textual values are not loaded anymore. I know you can change the template and set the specific data type for that column but the template is already widely spread so I'd rather resolve it during import.
Now what I tried was first counting the number of used columns and preparing a new DataTable with a defined Columns collection and setting their DataType property to typeof(string). Sadly JET doesn't seem to be looking at this and still chooses its own way. I'm guessing that even if I could use a strongly typed dataset here, it wouldn't help either...
Does anyone know how to tell JET how to import the data so I don't have to face the burden of delivering a new template version?
Please *PLEASE*: don't come with an Excel automation solution...
If you have access to the registry, set TypeGuessRows=0 and/or ImportMixedTypes=Text. See here for more info: INITIALIZATION SETTINGS
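For reference, a sketch of where these values live for the ACE provider. The "12.0" segment of the path is an assumption here; it varies with the installed Office/ACE version, so verify the path on your machine before changing anything:

```shell
:: Sketch (Windows cmd, run as admin): set TypeGuessRows=0 and ImportMixedTypes=Text
:: for the ACE Excel engine. The "12.0" segment depends on the installed version.
reg add "HKLM\SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel" /v TypeGuessRows /t REG_DWORD /d 0 /f
reg add "HKLM\SOFTWARE\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel" /v ImportMixedTypes /t REG_SZ /d Text /f
```

TypeGuessRows=0 makes the engine scan every row before guessing a column type, instead of only the first 8.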
I have a formatted Excel file, over which I do not have control, and I need to read the information contained in it.
The problem with the file is that the first few rows contain formatted information; I can not modify the file, nor can I ask for a format change.
Is it possible then, to read such a file through ADO.Net?
Thanks in advance,
I think this article explains how to do this pretty well:
http://support.microsoft.com/kb/316934
As for the lack of a header row...
With Excel workbooks, the first row in a range is the header row (or field names) by default. If the first range does not contain headers, you can specify HDR=NO in the extended properties in your connection string. If you specify HDR=NO in the connection string, the Jet OLE DB provider automatically names the fields for you (F1 represents the first field, F2 represents the second field, and so on).
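Combining the two ideas, a sketch that skips the formatted header rows by selecting an explicit range and letting Jet name the columns F1, F2, ... (the file path, sheet name, and the A5:D100 range are placeholders you would adjust to your file):

```csharp
// Sketch: read an Excel sheet starting below formatted header rows.
using System.Data;
using System.Data.OleDb;

class RangeReadSketch
{
    static void Main()
    {
        var connStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\Data\\report.xls;"
                    + "Extended Properties=\"Excel 8.0;HDR=NO\";";
        // Restrict the query to the data region so the decorative rows are skipped.
        var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$A5:D100]", connStr);
        var table = new DataTable();
        adapter.Fill(table);
        // Columns arrive as F1..F4 because HDR=NO.
        foreach (DataRow row in table.Rows)
            System.Console.WriteLine(row["F1"]);
    }
}
```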