I have an Excel file with one column that contains mixed data (int and string).
After reading the data from the Excel file, I found that the cell containing AZ-965 comes back as null.
This is my connection string:
return new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source="
+ excel + ";Extended Properties='Excel 12.0;HDR=YES;IMEX=1;ImportMixedTypes=Text;TypeGuessRows=0;';");
And this is how I read the data from the Excel file:
private static List<T> GetImportedData<T>(string sheet, OleDbConnection myConnection,
List<string> column, ImportDataView<T> view) where T : IImportableData
{
var cols = string.Join( ",", column.Select(Cast));
var formattableString = $"select {cols} from [{sheet}$]";
using (var MyCommand = new
OleDbDataAdapter(formattableString, myConnection))
{
using (var DtSet = new DataSet())
{
int nbRow = 0;
MyCommand.Fill(DtSet);
var dt = DtSet.Tables[0];
var rows = dt.AsEnumerable();
var convertedList = rows
//.AsParallel()
.Select(x => GenerateImport<T>(x, column, ref nbRow, view))
.ToList();
return convertedList;
}
}
}
private static string Cast(string arg)
{
return $"IIf(IsNull([{arg}]), '', CStr([{arg}]))";
// return $"CStr([{arg}]) as {cleanName(arg)}";
//return $"[{arg}]";
}
I checked this link, but nothing works; I still get the same issue.
TypeGuessRows is not a connection string property, at least not that I'm aware of. It's a registry key:
https://learn.microsoft.com/en-us/office/client-developer/access/desktop-database-reference/initializing-the-microsoft-excel-driver?tabs=office-2016
So I don't think it's doing what you think it is... or anything for that matter.
If you can't control the registry or the content of the file, my only suggestion would be to change the header property in the connection string (HDR=YES) to "no" so that it reads the first record in as text... then you can effectively trash the first row.
It's not a great solution, but it should do what you seek.
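A minimal sketch of the HDR=NO idea, assuming the ACE provider from the question. The class and method names (`HeaderlessImport`, `BuildConnectionString`, `DropHeaderRow`) are hypothetical. Note that with HDR=NO the provider names the columns F1, F2, and so on, so a query that selects columns by their header text would need to use those names instead.

```csharp
using System;
using System.Data;

static class HeaderlessImport
{
    // Builds a connection string that treats the first row as data (HDR=NO),
    // so the header text becomes an ordinary first record.
    public static string BuildConnectionString(string excelPath)
    {
        return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelPath +
               ";Extended Properties='Excel 12.0;HDR=NO;IMEX=1;';";
    }

    // Removes the first row (the former header) once the table has been filled.
    public static void DropHeaderRow(DataTable table)
    {
        if (table.Rows.Count > 0)
        {
            table.Rows.RemoveAt(0);
        }
    }
}
```

After `Fill`, calling `HeaderlessImport.DropHeaderRow(dt)` leaves only the data rows.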
Alternatively, you can try reading the content using a DbDataReader instead of using a data adapter:
using (OleDbCommand cmd = new OleDbCommand(formattableString, conn))
{
using (OleDbDataReader reader = cmd.ExecuteReader())
{
string column = reader.IsDBNull(0) ? null : reader.GetValue(0).ToString();
}
}
Based on my inferences on what you are trying to do, the data reader may actually offer some other advantages as well.
Related
I have a huge Excel file with around 50k rows. I am reading it using the following connection string
string.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR=No;IMEX=1'", MyFilePath);
In this huge Excel file, the rows are grouped in a nested pattern. For example, say the first 500 rows are grouped under Group A; within that group, rows 300-400 form a subgroup, Group B, and within that, rows 350-400 form Group C. When I read the Excel file in my program I get all the rows, but I cannot distinguish the row grouping described above. Is there a smart way to identify the groups and group the rows accordingly?
Here's a sample of my code.
private List<List<string>> ReadSheetData(string _query, bool _HasHeaders = true)
{
string conn = "";
if (!_HasHeaders)
conn = string.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR=No;IMEX=1'", MyFilePath);
else
conn = this.conn;
List<List<string>> ret = new List<List<string>>();
using (OleDbConnection connection = new OleDbConnection(conn))
{
connection.Open();
try
{
OleDbCommand command = new OleDbCommand(_query, connection);
using (OleDbDataReader dr = command.ExecuteReader())
{
DataTable tbl = dr.GetSchemaTable();
while (dr.Read())
{
List<string> rowVals = new List<string>();
ret.Add(rowVals);
for (int i = 0; i < dr.FieldCount; i++)
{
dynamic cell = dr[i];
string value = cell != null ? cell.ToString() : "";
rowVals.Add(value);
}
}
}
}
catch (Exception ex)
{ }
}
ret.RemoveAll(a => a.All(b => b == "") == true);
return ret;
}
Here's my situation. I'm designing a program that takes Excel files (which may be in csv, xls, or xlsx format) from a remote network drive, processes the data, then outputs and stores the results of that process. The program provides a listbox of filenames that are obtained from the remote network drive folder using the method detailed in the accepted answer here. Once the user selects a filename from the listbox, I want the program to find the file and obtain the information from it to do the data processing. I have tried using this method to read the data from the Excel file while in a threaded security context, but that method just fails without giving any kind of error. It seems to not terminate. Am I going about this the wrong way?
Edit - (Final Notes: I have taken out the OleDbDataAdapter and replaced it with EPPlus handling.)
I was able to scrub sensitive data from the code, so here it is:
protected void GetFile(object principalObj)
{
if (principalObj == null)
{
throw new ArgumentNullException("principalObj");
}
IPrincipal principal = (IPrincipal)principalObj;
Thread.CurrentPrincipal = principal;
WindowsIdentity identity = principal.Identity as WindowsIdentity;
WindowsImpersonationContext impersonationContext = null;
if (identity != null)
{
impersonationContext = identity.Impersonate();
}
try
{
string fileName = string.Format("{0}\\" + Files.SelectedValue, @"RemoteDirectoryHere");
string connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.14.0; data source={0}; Extended Properties=Excel 14.0;", fileName);
OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM Sheet1", connectionString);
DataSet ds = new DataSet();
adapter.Fill(ds, "Sheet1");
dataTable = ds.Tables["Sheet1"];
}
finally
{
if (impersonationContext != null)
{
impersonationContext.Undo();
}
}
}
Additional Edit
Now xlsx files have been added to the mix.
Third Party
Third party solutions are not acceptable in this case (unless they allow unrestricted commercial use).
Attempts - (Final Notes: Ultimately I had to abandon OleDb connections.)
I have tried all of the different connection strings offered, and I have tried them with just one file type at a time. None of the connection strings worked with any of the file types.
Permissions
The User does have access to the file and its directory.
Your connection string might be the issue here. As far as I know, there isn't a single one that can read all of xls, csv, and xlsx. I think you're using the XLSX connection string.
When I read xls, I use the following connection string:
@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + sFilePath + ";Extended Properties='Excel 8.0;HDR=YES;IMEX=1;'"
Having said that, I recommend using a 3rd party file reader/parser to read XLS and CSV since, from my experience, OleDbDataAdapter is wonky depending on the types of data that's being read (and how mixed they are within each column).
For XLS, try NPOI https://code.google.com/p/npoi/
For CSV, try http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
For XLSX, try EPPlus http://epplus.codeplex.com/
I've had great success with the above libraries.
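To illustrate the point about needing a different connection string per file type, here is a rough sketch that picks one by extension. The helper name `ExcelConnectionStrings.For` is hypothetical, and the strings are the commonly used Jet/ACE forms rather than anything taken from the question; which providers are actually installed (and whether the process is 32-bit or 64-bit) still has to be checked on the target machine.

```csharp
using System;
using System.IO;

static class ExcelConnectionStrings
{
    // Returns an OLE DB connection string suited to the file's extension.
    public static string For(string filePath)
    {
        string ext = Path.GetExtension(filePath).ToLowerInvariant();
        switch (ext)
        {
            case ".xls":
                return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath +
                       ";Extended Properties='Excel 8.0;HDR=YES;IMEX=1;'";
            case ".xlsx":
                return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filePath +
                       ";Extended Properties='Excel 12.0 Xml;HDR=YES;IMEX=1;'";
            case ".csv":
                // For text files the data source is the containing folder;
                // the file name itself goes in the FROM clause of the query.
                return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" +
                       Path.GetDirectoryName(filePath) +
                       ";Extended Properties='text;HDR=YES;FMT=Delimited;'";
            default:
                throw new NotSupportedException("Unsupported file type: " + ext);
        }
    }
}
```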
Is it really important that you use an OleDb interface for this? I've always done it with Microsoft.Office.Interop.Excel, to wit:
using System;
using Microsoft.Office.Interop.Excel;
namespace StackOverflowExample
{
class Program
{
static void Main(string[] args)
{
var app = new Application();
var wkbk = app.Workbooks.Open(@"c:\data\foo.xls") as Workbook;
var wksht = wkbk.Sheets[1] as Worksheet; // not zero-based!
for (int row = 1; row <= 100; row++) // not zero-based!
{
Console.WriteLine("This is row #" + row.ToString());
for (int col = 1; col <= 100; col++)
{
Console.WriteLine("This is col #" + col.ToString());
var cell = wksht.Cells[row, col] as Range;
if (cell != null)
{
object val = cell.Value;
if (val != null)
{
Console.WriteLine("The value of the cell is " + val.ToString());
}
}
}
}
}
}
}
As you will be dealing with xlsx extension, you should rather opt for the new connection string.
public static string getConnectionString(string fileName, bool HDRValue, bool WriteExcel)
{
string hdrValue = HDRValue ? "YES" : "NO";
string writeExcel = WriteExcel ? string.Empty : "IMEX=1";
return "Provider=Microsoft.ACE.OLEDB.12.0;" + "Data Source=" + fileName + ";" + "Extended Properties=\"Excel 12.0 xml;HDR=" + hdrValue + ";" + writeExcel + "\"";
}
The code above builds the connection string. The first argument is the path to the file. The second decides whether the first row is treated as column headers. The third indicates whether the connection is opened to create and write data or simply to read it; to read data, pass false (which adds IMEX=1).
public static DataTable ReadData(string filePath, string sheetName, List<string> fieldsToRead, int startPoint, int endPoint)
{
    // Returned empty if the read fails.
    DataTable dt = new DataTable(sheetName);
    try
    {
        string connectionString = ProcessFile.getConnectionString(filePath, false, false);
        using (OleDbConnection cn = new OleDbConnection(connectionString))
        {
            cn.Open();
            DataTable dbSchema = cn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
            if (dbSchema == null || dbSchema.Rows.Count < 1)
            {
                throw new Exception("Error: Could not determine the name of the first worksheet.");
            }
            // Range query. This assumes sheetName already ends with '$', so that
            // fieldsToRead = { "A", "C" } yields something like [Sheet1$A1:C100].
            StringBuilder sb = new StringBuilder();
            sb.Append("SELECT *");
            sb.Append(" FROM [" + sheetName + fieldsToRead[0].ToUpper() + startPoint + ":" + fieldsToRead[1].ToUpper() + endPoint + "] ");
            using (OleDbDataAdapter da = new OleDbDataAdapter(sb.ToString(), cn))
            {
                da.Fill(dt);
            }
            foreach (DataRow row in dt.Rows)
            {
                string i = row[0].ToString(); // process each row here
            }
        }
    }
    catch (Exception)
    {
        // Errors are swallowed here; log or rethrow in real code.
    }
    return dt;
}
This reads an Excel 2007 file into a DataSet:
DataSet ds = new DataSet();
try
{
string myConnStr = "";
myConnStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=MyDataSource;Extended Properties=\"Excel 12.0;HDR=YES\"";
OleDbConnection myConn = new OleDbConnection(myConnStr);
OleDbCommand cmd = new OleDbCommand("select * from [Sheet1$] ", myConn);
OleDbDataAdapter adapter = new OleDbDataAdapter();
adapter.SelectCommand = cmd;
myConn.Open();
adapter.Fill(ds);
myConn.Close();
}
catch
{ }
return ds;
I have an xlsx file with the following data:
www.url.com
www.url.com
www.url.com
www.url.com
www.url.com
www.url.com
www.url.com
www.url.com
...
As you can see, only one column is used and there are a lot of rows.
I need to read that column from the xlsx file somehow and convert it to a List<string>.
Any help?
Thanks!
You can use EPPlus, it's simple, something like this :
var ep = new ExcelPackage(new FileInfo(excelFile));
var ws = ep.Workbook.Worksheets["Sheet1"];
var domains = new List<string>();
for (int rw = 1; rw <= ws.Dimension.End.Row; rw++)
{
if (ws.Cells[rw, 1].Value != null)
domains.Add(ws.Cells[rw, 1].Value.ToString());
}
The easiest way is to use OleDb. You can do something like this:
List<string> values = new List<string>();
string constr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\\your\\path\\file.xlsx;Extended Properties=\"Excel 12.0 Xml;HDR=NO;\"";
using (OleDbConnection conn = new OleDbConnection(constr))
{
conn.Open();
OleDbCommand command = new OleDbCommand("Select * from [SheetName$]", conn);
OleDbDataReader reader = command.ExecuteReader();
if (reader.HasRows)
{
while (reader.Read())
{
// this assumes just one column, and the value is text
string value = reader[0].ToString();
values.Add(value);
}
}
}
foreach (string value in values)
Console.WriteLine(value);
Have you tried using ClosedXML? Super simple to do.
var wb = new XLWorkbook(FileName);
var ws2 = wb.Worksheet(1);
List<string> myData = new List<string>();
foreach (var r in ws2.RangeUsed().RowsUsed())
{
myData.Add(r.Cell(1).GetString());
}
You can use OOXML to read the file, and this library simplifies the work: http://simpleooxml.codeplex.com.
In my website I am reading a CSV file and parsing it. The CSV does not have column names. It is simply a raw list of comma-separated values.
I take this file and use the ODBCDataReader class to read the rows.
The problem is that when I retrieve the first value it skips the first row of the CSV. This is probably because it considers first row as column header. But in my case there are no column headers. So every time my first row is skipped.
How can I retrieve the first row of my CSV?
Here is the screenshot of my CSV:
Here is the code that I am using to parse the CSV.
public string CsvParser()
{
int _nNrRowsProccessed = 0;
string connectionString = @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + ConfigurationManager.AppSettings["CSVFolder"] + ";";
OdbcConnection conn = new OdbcConnection(connectionString);
try
{
conn.Open();
string strFileName = ConfigurationManager.AppSettings["CSVFile"];
string strSQL = "Select * from " + strFileName;
OdbcCommand cmd = new OdbcCommand();
cmd.Connection = conn;
cmd.CommandText = strSQL;
cmd.CommandType = CommandType.Text;
OdbcDataReader reader = cmd.ExecuteReader();
string strLine = null;
// MasterCalendar_DB.OpenMySQLConnection();
while (reader.Read())
{
// insert data into mastercalendar
strLine = reader[0].ToString();
string strLine1 = reader[1].ToString();
string strLine2 = reader[2].ToString();
string strLine3 = reader[3].ToString();
string[] arLine = strLine.Split(';');
// string strAgencyPropertyID = arLine[0];
// DateTime dt = DateTime.Parse(arLine[1]);
// Int64 nDate = (Int64)Util.ConvertToUnixTimestamp(dt);
// String strAvailability = (arLine[2]);
_nNrRowsProccessed++;
// MasterCalendar_DB.Insert(strAgencyPropertyID, nDate, strAvailability);
}
}
catch (Exception ex)
{
throw; // rethrow without resetting the stack trace
}
finally
{
conn.Close();
// MasterCalendar_DB.CloseMySQLConnection();
}
return "Success";
}
You want to have a look at the page of the Text-Driver over at connectionstrings.org.
Basically, you create a schema.ini in the same directory, which holds various options. One of them is the ColNameHeader option, which takes a boolean.
Example from the site:
[customers.txt]
Format=TabDelimited
ColNameHeader=True
MaxScanRows=0
CharacterSet=ANSI
Check this: Schema.ini File (Text File Driver)
You may need to set ColNameHeader = false
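For a headerless CSV like the one in the question, the schema.ini might look like the fragment below. The section name must match the CSV file's name; `data.csv` is only a placeholder, and the file is assumed to be comma-delimited. The schema.ini goes in the same folder as the CSV.

```ini
[data.csv]
Format=CSVDelimited
ColNameHeader=False
```

With ColNameHeader=False the driver should return the first line as a data row instead of consuming it as column names.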
Reference Microsoft.VisualBasic and you can use TextFieldParser, which almost certainly has fewer dependencies than your proposed method.
using (var parser =
new TextFieldParser(@"c:\data.csv")
{
TextFieldType = FieldType.Delimited,
Delimiters = new[] { "," }
})
{
while (!parser.EndOfData)
{
string[] fields;
fields = parser.ReadFields();
//go go go!
}
}
Quick and dirty solution:
string strLine = reader.GetName(0);
string strLine1 = reader.GetName(1);
string strLine2 = reader.GetName(2);
string strLine3 = reader.GetName(3);
reader.GetName(int i); // Gets the name of specified column.
I use this code to load an Excel file into a datatable:
public static DataTable ImportExcelFile(string connectionString)
{
DbProviderFactory factory =
DbProviderFactories.GetFactory("System.Data.OleDb");
var listCustomers = new DataTable();
using (DbConnection connection = factory.CreateConnection())
{
if (connection != null)
{
connection.ConnectionString = connectionString;
using (DbCommand command = connection.CreateCommand())
{
// Sheet1_2$ comes from the name of the worksheet
command.CommandText = "SELECT * FROM [Sheet1_2$]";
connection.Open();
using (DbDataReader dr = command.ExecuteReader())
{
listCustomers.Load(dr);
}
}
}
}
return listCustomers;
}
The problem is that some columns in the Excel file, for example AccountID, contain both string data ('quanmv') and numbers (123456). When I use this code, it ignores the cells with numeric values and leaves them blank.
How can I fix that?
Thank you so much.
By default, the connection to Excel tries to guess the data type of each column. If it guesses wrong, it may leave nulls where values don't convert. You can add IMEX=1 to the Extended Properties of the connection string so that columns containing mixed types are read as text.
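As a rough sketch, a connection string with IMEX=1 for the `ImportExcelFile` method in the question could be built like this. The helper name `MixedTypeConnection.Build` and the use of the ACE provider are assumptions, since the question does not show its connection string. The provider still samples only the first rows of each column to decide whether it is mixed, so IMEX=1 may not help if the mixed values appear late in the sheet.

```csharp
using System;

static class MixedTypeConnection
{
    // Builds a connection string that asks the provider to read
    // mixed-type columns as text (IMEX=1).
    public static string Build(string excelPath)
    {
        return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelPath +
               ";Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=1\"";
    }
}
```

Usage would then be along the lines of `ImportExcelFile(MixedTypeConnection.Build(path))`, where `path` points at the workbook.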