I maintain an ASP.NET application that contains some CSV reading code. The website has been running fine for 3 years, and the CSV reading code, which uses the Jet OLE DB 4.0 provider, does its job, except that once in a while a CSV file with more than 4K-5K records causes a problem. Usually the problem is that a record at a random position [random row] is missing its first character.
The CSV file is just a bunch of names and addresses, one per row, in ANSI format. It is comma separated, no field contains a comma, and no field is enclosed in single or double quotes. It doesn't happen often either: we use the same code to upload files of, say, 70K records and they work fine. We upload about one file a day, and in 3 years only 3-4 files have had this problem.
For those who want to see what I did:
using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection
("Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties='text;HDR=Yes;FMT=Delimited';Data Source=" + HttpContext.Current.Server.MapPath("/System/SaleList/")))
{
string sql_select = "select * from [" + this.FileName + "]";
System.Data.OleDb.OleDbDataAdapter da = new System.Data.OleDb.OleDbDataAdapter();
da.SelectCommand = new System.Data.OleDb.OleDbCommand(sql_select, conn);
DataSet ds = new DataSet();
// Read the First line of File to know the header
string[] lines = System.IO.File.ReadAllLines(HttpContext.Current.Server.MapPath("/System/SaleList/") + FileName);
string header = "";
if (lines.Length > 0)
header = lines[0];
string[] headers = header.Split(',');
CreateSchema(headers, FileName);
da.Fill(ds, "ListData");
DataTable dt = ds.Tables["ListData"];
}
This code works fine apart from the problem mentioned. I cut some unrelated parts, so it might not work if you copy and paste it.
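The CreateSchema helper is not shown in the question. As a hypothetical sketch only, assuming it writes a schema.ini file next to the CSV so that the text driver treats every column as Text instead of guessing, it might look something like this (the class name `SchemaIniWriter` and both method names are made up for illustration):

```csharp
using System;
using System.IO;
using System.Text;

public static class SchemaIniWriter
{
    // Builds the schema.ini section for one CSV file, declaring every column as Text.
    public static string BuildSection(string[] headers, string fileName)
    {
        var sb = new StringBuilder();
        sb.AppendLine("[" + fileName + "]");
        sb.AppendLine("Format=CSVDelimited");
        sb.AppendLine("ColNameHeader=True");
        sb.AppendLine("CharacterSet=ANSI");
        for (int i = 0; i < headers.Length; i++)
        {
            // Column names are quoted so that names containing spaces are accepted.
            sb.AppendLine("Col" + (i + 1) + "=\"" + headers[i].Trim() + "\" Text");
        }
        return sb.ToString();
    }

    // Writes schema.ini into the folder that holds the CSV.
    // Assumption: overwriting any existing schema.ini in that folder is acceptable.
    public static void Write(string folder, string[] headers, string fileName)
    {
        File.WriteAllText(Path.Combine(folder, "schema.ini"),
                          BuildSection(headers, fileName), Encoding.ASCII);
    }
}
```

This only illustrates the idea of pinning the column types; it is not claimed to cure the missing-character problem described above.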
EDIT: More information
I tried ODBC with the Microsoft Text Driver, and then the ACE driver with OleDb. The result is the same with all three drivers.
If I swap the problem record with the preceding row, those rows are read correctly, up to the next problem row [if the original file has more than one problem row]. If those were the only problem rows, the file reads fine.
So it looks like something is throwing off the character counter, but how to make it work reliably is still a puzzle.
EDIT 2: I have submitted it as a bug to Microsoft here: https://connect.microsoft.com/VisualStudio/feedback/details/811869/oledb-ace-driver-12-jet-4-0-or-odbc-text-driver-all-fail-to-read-data-properly-from-csv-text-file
I would suggest you examine a problem file with a hex editor - inspect the line that causes the problem and the line immediately preceding it.
In particular look at the line terminators (CR/LF? CR only? LF only?) and look for any non-printable characters.
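If a hex editor is not at hand, the same inspection can be roughed out in code. The following is a minimal sketch, assuming the file is single-byte ANSI text whose normal line ending is CR/LF; the class and method names are invented for this example. It reports bare CR, bare LF, and control bytes other than TAB:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvByteInspector
{
    // Returns one message per suspicious byte. Offsets are zero-based byte positions.
    public static List<string> FindSuspiciousBytes(byte[] data)
    {
        var findings = new List<string>();
        int line = 1;
        for (int i = 0; i < data.Length; i++)
        {
            byte b = data[i];
            if (b == 0x0D)
            {
                bool followedByLf = i + 1 < data.Length && data[i + 1] == 0x0A;
                if (followedByLf)
                    i++; // skip the LF of a normal CR/LF pair
                else
                    findings.Add("line " + line + ", offset " + i + ": CR without LF");
                line++;
            }
            else if (b == 0x0A)
            {
                findings.Add("line " + line + ", offset " + i + ": LF without CR");
                line++;
            }
            else if (b < 0x20 && b != 0x09)
            {
                findings.Add("line " + line + ", offset " + i + ": control byte 0x" + b.ToString("X2"));
            }
        }
        return findings;
    }

    // Convenience wrapper that reads the raw bytes of a file.
    public static List<string> FindSuspiciousBytesInFile(string path)
    {
        return FindSuspiciousBytes(File.ReadAllBytes(path));
    }
}
```

Running it over a problem file and comparing the reported line numbers with the rows that lose their first character should show quickly whether line terminators are involved.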
Try using the ACE driver instead of Jet (it's available for both x86 and x64 servers; Jet is x86 only!)
using (System.Data.OleDb.OleDbConnection conn
= new System.Data.OleDb.OleDbConnection
("Provider=Microsoft.ACE.OLEDB.12.0;Extended Properties='text;HDR=Yes;FMT=Delimited';Data Source=" + HttpContext.Current.Server.MapPath("/System/SaleList/")))
{
I had the same problem with OleDb dropping characters from the data; see here:
The characters go missing because the Microsoft.Jet.OLEDB.4.0 driver
tries to guess the column datatype. In my case it was treating the
data as hexadecimal, not alphanumeric.
Problematic oledbProviderString:
oledbProviderString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";Extended Properties=\"Text;HDR=No;FMT=Delimited\"";
To fix the problem I added TypeGuessRows=0:
oledbProviderString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";Extended Properties=\"Text;HDR=No;FMT=Delimited;TypeGuessRows=0\"";
Repro:
Create a Book1.csv file with this content:
KU88,G6,CC
KU88,F7,CC
Step through this code:
private void button1_Click(object sender, EventArgs e)
{
string folder = @"G:\Developers\Folder";
ReproProblem(folder);
}
static string oledbProviderString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";Extended Properties=\"Text;HDR=No;FMT=Delimited\"";
private void ReproProblem(string folderPath)
{
using (OleDbConnection oledbConnection = new OleDbConnection(string.Format(oledbProviderString, folderPath)))
{
string sqlStatement = "Select * from [Book1.csv]";
//open the connection
oledbConnection.Open();
//Create an OleDbDataAdapter for our connection
OleDbDataAdapter adapter = new OleDbDataAdapter(sqlStatement, oledbConnection);
//Create a DataTable and fill it with data
DataTable table = new DataTable();
adapter.Fill(table);
//close the connection
oledbConnection.Close();
}
}
Why don't you just use this:
string path = HttpContext.Current.Server.MapPath("/System/SaleList/") + FileName;
// Read the whole file and split it manually instead of going through the OLE DB text driver
string[] lines = System.IO.File.ReadAllLines(path);
DataTable mdt = new DataTable("ListData");
if (lines.Length > 0)
{
// The first line of the file holds the column names
foreach (string name in lines[0].Split(','))
{
mdt.Columns.Add(name.Trim());
}
// Add exactly one row per remaining line (the fields contain no commas or quotes)
for (int i = 1; i < lines.Length; i++)
{
mdt.Rows.Add(lines[i].Split(','));
}
}
DataSet ds = new DataSet();
ds.Tables.Add(mdt);
DataTable dt = mdt;
I haven't debugged it. I hope there are no problems, but if there are, I'm here to help.
Thank you very much.
Related
I am reading Excel data using an OleDbDataAdapter, with the code below. My Excel file has 80 rows and 19 columns. Each column represents a different language (e.g. English, Arabic, Chinese, etc.).
Each row holds certain strings.
public DataSet ReadExcelFile(string dataSource)
{
DataSet ds = new DataSet();
string connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + dataSource
+ " ; Extended Properties='Excel 12.0; IMEX=1'";
using (OleDbConnection conn = new OleDbConnection(connectionString))
{
conn.Open();
OleDbCommand cmd = new OleDbCommand();
cmd.Connection = conn;
// Get all Sheets in Excel File
DataTable dtSheet = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
// Loop through all Sheets to get data
foreach (DataRow dr in dtSheet.Rows)
{
string sheetName = dr["TABLE_NAME"].ToString();
if (!sheetName.EndsWith("$"))
continue;
// Get all rows from the Sheet
cmd.CommandText = "SELECT * FROM [" + sheetName + "]";
DataTable dt = new DataTable();
dt.TableName = sheetName;
OleDbDataAdapter da = new OleDbDataAdapter(cmd);
da.Fill(dt);
ds.Tables.Add(dt);
}
cmd = null;
conn.Close();
}
return ds;
}
It works perfectly fine except for a few cells. For those cells the table does not contain the complete string; this happens for the Chinese language.
For example, my string is:
“个性化喂养模式”允许您预置常用的喂养模式。一旦设定好 , 当按“模式”键时 , 它将自动出现在喂养模式列表中。
-----------------------------------------------
您可以创建 , 编辑或删除个性化喂养模式。
-----------------------------------------------
提示 : 个性化喂养模式可能会被默认喂养列表隐藏。
-----------------------------------------------
使用“>”键选择需要的喂养模式。"
But I am getting only:
“个性化喂养模式”允许您预置常用的喂养模式。一旦设定好 , 当按“模式”键时 , 它将自动出现在喂养模式列表中。
-----------------------------------------------
您可以创建 , 编辑或删除个性化喂养模式。
-----------------------------------------------
提示 : 个性化喂养模式可能会被默认喂养列表隐藏。
-----------------------------------------------
The last row is missing.
This happens for only 3 cells; the rest of the cells come through properly.
It seems that it is being truncated to 255 characters.
According to this, Microsoft OleDb truncates the data length to 255 characters:
When you use OLEDB providers, the data type is determined automatically by the provider based on the first 8 rows. If there are lengthy cells (more than 255 characters) in the first 8 rows, the data type will be set to memo; otherwise it will be text, which can hold only 255 characters. To overcome this issue, either change the registry setting as described in this KB article: http://support.microsoft.com/kb/281517 or use the Microsoft.Jet.OLEDB provider to read the data.
Or you may try the OpenXml approach: Parse and read a large spreadsheet document (Open XML SDK)
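To find out which cells were probably affected, one option is to scan the filled DataTable for strings whose length is exactly 255. This is a rough sketch under that assumption (a cell that genuinely has 255 characters would be a false positive); the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class TruncationCheck
{
    // Lists cells whose text length equals the limit, which suggests the value was cut off.
    public static List<string> FindSuspectCells(DataTable table, int limit = 255)
    {
        var hits = new List<string>();
        for (int r = 0; r < table.Rows.Count; r++)
        {
            foreach (DataColumn col in table.Columns)
            {
                // Non-string values and DBNull are skipped by the 'as' cast.
                string text = table.Rows[r][col] as string;
                if (text != null && text.Length == limit)
                    hits.Add("row " + r + ", column " + col.ColumnName);
            }
        }
        return hits;
    }
}
```

Calling `TruncationCheck.FindSuspectCells(dt)` on each table returned by ReadExcelFile would point at the cells worth comparing against the workbook.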
I am trying to read Excel data into C# using ODBC. Here is my code:
string lstrFileName = "Sheet1";
//string strConnString = "Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq="+path+ ";Extensions=asc,csv,tab,txt;Persist Security Info=False";
string strConnString = "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};Dbq=E:\\T1.xlsx;Extensions=xls/xlsx;Persist Security Info=False";
DataTable ds;
using (OdbcConnection oConn = new OdbcConnection(strConnString))
{
using (OdbcCommand oCmd = new OdbcCommand())
{
oCmd.Connection = oConn;
oCmd.CommandType = System.Data.CommandType.Text;
oCmd.CommandText = "select A from [" + lstrFileName + "$]";
OdbcDataAdapter oAdap = new OdbcDataAdapter();
oAdap.SelectCommand = oCmd;
ds = new DataTable();
oAdap.Fill(ds);
oAdap.Dispose();
// ds.Dispose();
}
}
My sample data:
A
1
2
3
AA
BB
The data table reads 1, 2, 3 and then two blank rows.
I understand that the data type is decided from the first rows, but how can I convert the column to String and read all the rows?
Any suggestions?
I already tried CStr, but it didn't help.
For a previous discussion of a similar problem, please check the following:
DBNull in non-empty cell when reading Excel file through OleDB
As a workaround, you may also format the column as "text" (i.e. in Excel, select the column, right-click, "Format Cells..."), though this might be impractical if you have to process a large number of files or must not touch the file.
This is partially speculation, but when reading an Excel document as a database, the adapter has to make a judgement on datatypes and usually does a pretty good job. However, because Excel allows mixed datatypes (and databases do not), it occasionally gets it wrong.
My recommendation would be to not use a data adapter, and just read every field in as an object. From there, you can easily convert them to strings (StringBuilder, ToString(), etc.) or even TryParse them into the types you suspect they should be, ignoring the ODBC datatype.
Something like this would be boilerplate for that:
using (OdbcCommand oCmd = new OdbcCommand())
{
oCmd.Connection = oConn;
oCmd.CommandType = System.Data.CommandType.Text;
oCmd.CommandText = "select A from [" + lstrFileName + "$]";
using (OdbcDataReader reader = oCmd.ExecuteReader())
{
object[] fields = new object[reader.FieldCount];
while (reader.Read())
{
reader.GetValues(fields);
// do something with fields
}
}
}
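The "do something with fields" step could be filled in along these lines. This is only a sketch with invented helper names; it converts whatever the driver hands back, so a cell the driver has already returned as DBNull stays empty:

```csharp
using System;
using System.Globalization;

public static class FieldConversion
{
    // Turns any field value into a string; null and DBNull become an empty string.
    public static string ToText(object value)
    {
        if (value == null || value == DBNull.Value)
            return string.Empty;
        return Convert.ToString(value, CultureInfo.InvariantCulture);
    }

    // Tries to read the field as an integer, regardless of the type the driver reported.
    public static bool TryToInt(object value, out int result)
    {
        return int.TryParse(ToText(value), NumberStyles.Integer,
                            CultureInfo.InvariantCulture, out result);
    }
}
```

Inside the `while (reader.Read())` loop, `FieldConversion.ToText(fields[0])` would then give the string form of column A for the current row.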
Additional information: The Microsoft Office Access database engine could not find the object 'C:\Users\username\Documents\sampleData.xls'. Make sure the object exists and that you spell its name and the path name correctly.
The Error is highlighted at
theDataAdapter.Fill(spreadSheetData);
Here's the sample data I used (tried in .csv , .xls , .xlsx )
Name Age Status Children
Johnny 34 Married 3
Joey 21 Single 1
Michael 16 Dating 0
Smith 42 Divorced 4
Here's the code associated:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.IO;
using System.Data.OleDb;
namespace uploadExcelFile
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void btnImport_Click(object sender, EventArgs e)
{
var frmDialog = new System.Windows.Forms.OpenFileDialog();
if (frmDialog.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
string strFileName = frmDialog.FileName;
System.IO.FileInfo spreadSheetFile = new System.IO.FileInfo(strFileName);
scheduleGridView.DataSource = spreadSheetFile.ToString();
System.Diagnostics.Debug.WriteLine(frmDialog.FileName);
System.Diagnostics.Debug.WriteLine(frmDialog.SafeFileName);
String name = frmDialog.SafeFileName;
String constr = String.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES""", frmDialog.FileName);
OleDbConnection myConnection = new OleDbConnection(constr);
OleDbCommand onlineConnection = new OleDbCommand("SELECT * FROM [" + frmDialog.FileName + "]", myConnection);
myConnection.Open();
OleDbDataAdapter theDataAdapter = new OleDbDataAdapter(onlineConnection);
DataTable spreadSheetData = myConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
theDataAdapter.Fill(spreadSheetData);
scheduleGridView.DataSource = spreadSheetData;
}
}
}
}
scheduleGridView is the DataGridView's name, and btnImport is the name of the import Button.
I've installed 2007 Office System Driver: Data Connectivity Components, which gave me AccessDatabaseEngine.exe, but from there I've been stuck without understanding how to get around this. It should go without saying that the file path is correct in its entirety. There are no odd characters in the path name either (spaces, underscores, etc.).
Mini Update :: (another dead end, it seems)
Although the initial error says, "could not find the object 'C:\Users\username\Documents\sampleData.xls'"
in the debugger, when I look at the details, the exception shows the path as "C:\Users\username\Documents\sampleData.xls"
So I thought the error was that it wasn't taking the path as a literal, but this article, C# verbatim string literal not working. Very Strange backslash always double,
shows very clearly that that is not my issue.
I am guessing you may be mistaken about what is returned from the following line of code…
DataTable spreadSheetData = myConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
The DataTable returned from this line will have nine (9) columns (TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, TABLE_GUID, DESCRIPTION, TABLE_PROPID, DATE_CREATED and DATE_MODIFIED). This ONE (1) DataTable simply “describes” the worksheet(s) and named range(s) in the entire selected Excel workbook. Each row in this DataTable represents either a worksheet OR a named range. To distinguish worksheets from named ranges, the “TABLE_NAME” column in this DataTable holds the name of the worksheet or range AND ends each “worksheet” name with a dollar sign ($). If the “TABLE_NAME” value in a row does NOT end in a dollar sign, then it is a range and not a worksheet.
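The worksheet/range distinction described above can be sketched as a small filter over that schema table. The helper name is made up, and the single-quote stripping is an assumption that mirrors what the code further down in this answer does with `Replace("'", "")`:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SheetNames
{
    // Returns only the TABLE_NAME values that denote worksheets (they end in '$').
    public static List<string> Worksheets(DataTable schema)
    {
        var names = new List<string>();
        foreach (DataRow row in schema.Rows)
        {
            string name = row["TABLE_NAME"].ToString();
            // Assumption: names may come back wrapped in single quotes, so strip them first.
            string unquoted = name.Trim('\'');
            if (unquoted.EndsWith("$", StringComparison.Ordinal))
                names.Add(unquoted);
        }
        return names;
    }
}
```

With an open connection, `SheetNames.Worksheets(myConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null))` would yield the names to use in the SELECT statement.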
Therefore, when the line
OleDbDataAdapter theDataAdapter = new OleDbDataAdapter(onlineConnection);
blows up and says it cannot find the “filename”, that is somewhat expected, because the query is looking for a “worksheet” name, not a filename. On the line creating the select command…
OleDbCommand onlineConnection = new OleDbCommand("SELECT * FROM [" + frmDialog.FileName + "]", myConnection);
This is incorrect; you have already selected the filename and opened the file with
String constr = String.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES""", frmDialog.FileName);
OleDbConnection myConnection = new OleDbConnection(constr);
myConnection.Open();
The correct OleDbCommand line should be…
OleDbCommand onlineConnection = new OleDbCommand("SELECT * FROM [" + sheetName + "]", myConnection);
The problem here is that the current code is not getting the worksheet names. Therefore, we cannot “select” the worksheet from the workbook then fill the adapter with the worksheet.
The other issue is setting the DataGridView’s DataSource to spreadSheetData… when you get the worksheet(s) from an Excel “Workbook”, you must assume there will be more than one sheet. Therefore a DataSet will work as a container to hold all the worksheets in the workbook. Each DataTable in the DataSet would be a single worksheet and it can be surmised that the DataGridView can only display ONE (1) of these tables at a time. Given this, below are the changes described along with an added button to display the “Next” worksheet in the DataGridView since there may be more than one worksheet in the workbook. Hope this makes sense.
int sheetIndex = 0;
DataSet ds = new DataSet();
public Form1() {
InitializeComponent();
}
private void btnImport_Click(object sender, EventArgs e) {
var frmDialog = new System.Windows.Forms.OpenFileDialog();
if (frmDialog.ShowDialog() == System.Windows.Forms.DialogResult.OK) {
String constr = String.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES""", frmDialog.FileName);
OleDbConnection myConnection = new OleDbConnection(constr);
myConnection.Open();
DataTable spreadSheetData = myConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
string sheetName = "";
DataTable dt;
OleDbCommand onlineConnection;
OleDbDataAdapter theDataAdapter;
// fill the "DataSet" each table in the set is a worksheet in the Excel file
foreach (DataRow dr in spreadSheetData.Rows) {
sheetName = dr["TABLE_NAME"].ToString();
sheetName = sheetName.Replace("'", "");
if (sheetName.EndsWith("$")) {
onlineConnection = new OleDbCommand("SELECT * FROM [" + sheetName + "]", myConnection);
theDataAdapter = new OleDbDataAdapter(onlineConnection);
dt = new DataTable();
dt.TableName = sheetName;
theDataAdapter.Fill(dt);
ds.Tables.Add(dt);
}
}
myConnection.Close();
scheduleGridView.DataSource = ds.Tables[0];
setLabel();
}
}
private void setLabel() {
label1.Text = "Showing worksheet " + sheetIndex + " Named: " + ds.Tables[sheetIndex].TableName + " out of a total of " + ds.Tables.Count + " worksheets";
}
private void btnNextSheet_Click(object sender, EventArgs e) {
if (sheetIndex == ds.Tables.Count - 1)
sheetIndex = 0;
else
sheetIndex++;
scheduleGridView.DataSource = ds.Tables[sheetIndex];
setLabel();
}
I solved it. Well, I found a workaround. I used the Excel Data Reader mentioned in this thread: How to Convert DataSet to DataTable
which led me to https://github.com/ExcelDataReader/ExcelDataReader
The readme was fantastic. Go to Solution Explorer, right-click References, choose Manage NuGet Packages, select Browse in the new window, and enter ExcelDataReader. Then in the .cs file be sure to include "using Excel;" at the top. The code mentioned in the first link was essentially enough, but here's my exact code for those wondering.
var frmDialog = new System.Windows.Forms.OpenFileDialog();
if (frmDialog.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
/*string strFileName = frmDialog.FileName;
//System.IO.FileInfo spreadSheetFile = new System.IO.FileInfo(strFileName);
System.IO.StreamReader reader = new System.IO.StreamReader(strFileName);
*/
string strFileName = frmDialog.FileName;
FileStream stream = File.Open(strFileName, FileMode.Open, FileAccess.Read);
//1. Reading from a binary Excel file ('97-2003 format; *.xls)
IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
//...
//2. Reading from a OpenXml Excel file (2007 format; *.xlsx)
//IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
//...
//3. DataSet - The result of each spreadsheet will be created in the result.Tables
//DataSet result = excelReader.AsDataSet();
//...
//4. DataSet - Create column names from first row
excelReader.IsFirstRowAsColumnNames = true;
DataSet result = excelReader.AsDataSet();
DataTable data = result.Tables[0];
//5. Data Reader methods
while (excelReader.Read())
{
//excelReader.GetInt32(0);
}
scheduleGridView.DataSource = data;
excelReader.Close();
}
I am using a function to open an .xls file with multiple worksheets and copy the entire content into a .csv file.
Everything works just fine on my local machine: no exceptions, no errors, etc.
But when I run it on Windows Server 2012R I get an exception when the connection is opened.
Here is the code where I open an OleDB connection and then query the file:
static void ConvertExcelToCsv(string excelFilePath, string csvOutputFile, int worksheetNumber)
{
// connection string
var cnnStr = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelFilePath + ";Extended Properties=\"Excel 8.0;HDR=no;Format=xls\"");
var cnn = new OleDbConnection(cnnStr);
// get schema, then data
var dt = new DataTable();
try
{
cnn.Open();
var schemaTable = cnn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (schemaTable.Rows.Count < worksheetNumber) throw new ArgumentException("The worksheet number provided cannot be found in the spreadsheet");
string worksheet = schemaTable.Rows[worksheetNumber - 1]["table_name"].ToString().Replace("'", "");
string sql = String.Format("select * from [{0}]", worksheet);
var da = new OleDbDataAdapter(sql, cnn);
da.Fill(dt);
....
The excelFilePath is my source Excel file (.xls) and csvOutputFile is the file the content is going to be written to.
Does anyone have any idea why I am getting this exception?
I wrote some methods which are supposed to fetch a DataTable for each worksheet in an Excel file:
Step 1 is to get the names of all sheets included in a .xlsx file:
private static List<string> GetSheetNames(string filePath)
{
List<string> sheetNames = new List<string>();
DataTable dt = null;
try
{
OleDbConnection connection = new OleDbConnection("provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + filePath + "';Extended Properties='Excel 12.0 Xml;HDR=YES;'");
connection.Open();
dt = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (dt == null)
{
return null;
}
// Add the sheet name to the string array.
foreach (DataRow row in dt.Rows)
{
sheetNames.Add(row["TABLE_NAME"].ToString());
}
}catch(Exception ex)
{
MessageBox.Show(ex.Message);
}
return sheetNames;
}
Step 2 is to read every sheet and return the corresponding DataTable:
private static DataTable ReadExcelSheet(string filePath,string sheetName)
{
DataTable table = new DataTable();
ValidateSheetName(ref sheetName);
try
{
OleDbConnection connection;
DataSet DtSet;
OleDbDataAdapter cmd;
connection = new OleDbConnection("provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + filePath + "';Extended Properties='Excel 12.0 Xml;HDR=YES;'");
cmd = new OleDbDataAdapter("select * from ["+sheetName+"]", connection);
cmd.TableMappings.Add("Table", sheetName.Replace("$",string.Empty));
DtSet = new DataSet();
cmd.Fill(DtSet);
table = DtSet.Tables[0];
connection.Close();
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
return table;
}
Both methods are called from this last method which returns a List<DataTable>:
private static List<DataTable> ConvertExcelToTables(string filePath)
{
List<string> sheetNames = GetSheetNames(filePath);
List<DataTable> tableList = new List<DataTable>();
foreach(string sheetName in sheetNames)
{
tableList.Add(ReadExcelSheet(filePath,sheetName));
}
return tableList;
}
There is also a little helper method which should be irrelevant for the question:
private static void ValidateSheetName(ref string sheetName)
{
sheetName = sheetName.EndsWith("$") ? sheetName : sheetName + "$";
}
If I take one sheet from an example file it looks like this:
Now, no matter whether I just look into the DataTable while debugging or bind it as the DataSource of a DataGridView, the result looks a little weird:
My guess is that this might have to do with Excel sheets starting to count at 1, not 0. But even if that is the case, I can't really think of a solution. Or did I miss something? It would be a pity, because this seems to be a clean solution to me.
No, the problem is caused by
HDR=YES;
in your connection string.
Change it to
HDR=NO;
HDR=YES means that the first row of your Excel sheet is assumed to contain the field names of your table. But that is not the case with the sheet shown as an example. Indeed, the OleDb provider cannot determine the name of the second column (it's blank) and thus assigns the default name (the letter F followed by the column's ordinal number).
You can find a lot of examples and explanations of connection strings for Excel at connectionstrings.com
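If the sheet is read with HDR=NO and some row further down does hold the real column names, one way to recover them is to rename the default F1, F2, ... columns after the fill. A minimal sketch, with a made-up helper name and the assumption that all rows up to and including the header row can be discarded:

```csharp
using System;
using System.Data;

public static class HeaderPromotion
{
    // Renames columns from the values in the given row, then removes that row
    // and every row above it.
    public static void PromoteRowToHeader(DataTable table, int headerRow)
    {
        DataRow header = table.Rows[headerRow];
        for (int c = 0; c < table.Columns.Count; c++)
        {
            string name = Convert.ToString(header[c]).Trim();
            // Keep the default name when the cell is blank or the name is already in use.
            if (name.Length > 0 && !table.Columns.Contains(name))
                table.Columns[c].ColumnName = name;
        }
        for (int i = headerRow; i >= 0; i--)
            table.Rows.RemoveAt(i);
    }
}
```

Columns whose header cell is blank simply keep their default name, which matches the F-plus-number behaviour described above.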