DataTable ColumnName Limitations? - c#

I'm trying to read data from an Excel file into a DataSet, but the DataTable column names don't match what's in my Excel sheet. It only seems to affect columns with long names: the DataTable column name is cut off at some point.
Here's my attempt:
DataSet mainExcelDataToImport = new DataSet();
using (OleDbConnection mainExcelOleDbConnection = new OleDbConnection())
{
string theMainExcelConnectionString = this.ExcelConnectionString.Replace("{FullFilePath}", this.SelectedFilePath);
mainExcelOleDbConnection.ConnectionString = theMainExcelConnectionString;
// Open
mainExcelOleDbConnection.Open();
string mainExcelSQL = "SELECT * FROM [{ExcelSheet}$]";
mainExcelSQL = mainExcelSQL.Replace("{ExcelSheet}", selectedExcelSheet);
using (OleDbCommand mainExcelOleDbCommand = mainExcelOleDbConnection.CreateCommand())
{
mainExcelOleDbCommand.CommandText = mainExcelSQL;
mainExcelOleDbCommand.CommandType = CommandType.Text;
// Prepare
mainExcelOleDbCommand.Prepare();
using (OleDbDataAdapter mainOleDbDataAdapter = new OleDbDataAdapter(mainExcelOleDbCommand))
{
mainOleDbDataAdapter.Fill(mainExcelDataToImport);
// Close
mainExcelOleDbConnection.Close();
}
}
}
Here's a .xlsx I've tried:
A1234567890123456789012345678901234567890123456789012345678901234567890
TEST
'A12345.....' is the Column Name at A1
'TEST' is the Value at A2
When I check the `ColumnName` in the `DataTable` I get: 'A123456789012345678901234567890123456789012345678901234567890123' which is 64 characters.
How does the ColumnName MaxLength limitation work?
Can I get rid of it?
Is it maybe a limitation on the OleDbDataAdapter?
Any help / suggestions would be appreciated.

64 characters is unusually long for the name of a database column. I've never seen a column name come anywhere near that length.
Names have to be stored somewhere, so any database software is likely to impose a maximum; SQL Server, for example, allows at most 128 characters. It would not surprise me if intermediate software like SqlClient had a lower limit, because very long names are so unusual.
It's probably a limitation of the OLE DB provider rather than of DataTable: Access (Jet/ACE) field names are limited to 64 characters, which matches the truncation you see.
You could try some of the alternatives mentioned in this thread:
Best /Fastest way to read an Excel Sheet into a DataTable?
Or maybe you can work with the XML directly instead of a DataTable.
XML can handle any field size that Excel can.
https://learn.microsoft.com/en-us/office/open-xml/understanding-the-open-xml-file-formats
https://learn.microsoft.com/en-us/office/open-xml/how-to-parse-and-read-a-large-spreadsheet
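Another workaround that may be worth trying, sketched here as an assumption rather than a confirmed fix: open the workbook with `HDR=NO` (ideally together with `IMEX=1`) in the Extended Properties, so that the header row arrives as an ordinary first data row and the provider names the columns F1, F2, and so on. The header text can then be copied into `ColumnName` yourself, since the 64-character truncation appears to come from the provider and not from `DataTable`. The helper name `PromoteFirstRowToHeaders` is hypothetical.

```csharp
using System;
using System.Data;

public static class HeaderPromotion
{
    // Renames each column after the value in the first data row, then removes that row.
    // Assumes the table was filled with HDR=NO, so row 0 holds the header text.
    // Note: duplicate header values would make ColumnName throw DuplicateNameException.
    public static void PromoteFirstRowToHeaders(DataTable table)
    {
        if (table == null) throw new ArgumentNullException(nameof(table));
        if (table.Rows.Count == 0) return;

        DataRow headerRow = table.Rows[0];
        for (int i = 0; i < table.Columns.Count; i++)
        {
            // DBNull converts to an empty string here.
            string name = Convert.ToString(headerRow[i]);

            // Keep the provider's default name (F1, F2, ...) when the header cell is blank.
            if (!string.IsNullOrWhiteSpace(name))
                table.Columns[i].ColumnName = name.Trim();
        }

        // Drop the row that held the headers so only data remains.
        table.Rows.RemoveAt(0);
        table.AcceptChanges();
    }
}
```

After `Fill`, this would be called as `HeaderPromotion.PromoteFirstRowToHeaders(mainExcelDataToImport.Tables[0]);`.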

Related

Read Nth row as columns using OLEDB

I am trying to import data from an xlsx file into a DataTable. I want to use the 2nd row (or maybe the 4th row) as the column header row. Currently I am using the code below, which works, but I would like to know whether there is another way to read the Excel data starting from the 2nd or 4th row.
public static DataTable GetDataTableFromSecondRow(string filePath,string sheetName)
{
var oleDbConnection = new ExcelToDb(filePath).GetOleDbConnection();
using (OleDbCommand oleDbCommand = new OleDbCommand(String.Format("select * from [{0}${1}]", sheetName, "A2:end"), oleDbConnection))
{
oleDbCommand.ExecuteNonQuery();
using (OleDbDataReader reader = oleDbCommand.ExecuteReader())
{
DataTable dataTable = new DataTable();
dataTable.Load(reader);
return dataTable;
}
}
}
The issue with this code is that when I read to the end of the file using "A2:end", it drops the data that comes after a few blank rows, which is not correct.
Can we use something like "dt1.AsEnumerable.Skip(3)" that skips not just the rows but also the default column names taken from the first row?
Example picture - would like to read my data as second table by skipping first 2-3 lines.
Try using NPOI to read the Excel file. It lets you get a cell value directly by row number and column number; see this post:
sheet.GetRow(rowNumber).GetCell(colNumber).StringCellValue
NPOI can be added via NuGet.
Update:
Select * From [SheetName$] may also work. It selects all data on the sheet, including empty cells in the middle, and you can then use datatable.Rows[rowNum][colNum] to get any cell value (see the reference post).
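Building on the Select * From [SheetName$] idea, here is a minimal sketch of turning an arbitrary row into the header row once the whole sheet is in a DataTable. It assumes the sheet was read with `HDR=NO` so that every row, including the intended header row, arrives as data; the class and method names are hypothetical. Blank rows further down are kept, since nothing is cut off at the first gap.

```csharp
using System;
using System.Data;

public static class HeaderRowSelector
{
    // Returns a new table whose column names come from raw.Rows[headerRowIndex]
    // and whose data rows are everything below that row. Rows above it are dropped.
    public static DataTable UseRowAsHeader(DataTable raw, int headerRowIndex)
    {
        if (raw == null) throw new ArgumentNullException(nameof(raw));
        if (headerRowIndex < 0 || headerRowIndex >= raw.Rows.Count)
            throw new ArgumentOutOfRangeException(nameof(headerRowIndex));

        var result = new DataTable(raw.TableName);
        DataRow header = raw.Rows[headerRowIndex];

        for (int c = 0; c < raw.Columns.Count; c++)
        {
            string name = Convert.ToString(header[c]).Trim();

            // Fall back to a positional name for blank header cells.
            if (name.Length == 0) name = "Column" + (c + 1);

            // Make repeated header values unique by appending underscores.
            while (result.Columns.Contains(name)) name += "_";

            result.Columns.Add(name, typeof(object));
        }

        // Copy every row below the header row, blank rows included.
        for (int r = headerRowIndex + 1; r < raw.Rows.Count; r++)
            result.Rows.Add(raw.Rows[r].ItemArray);

        return result;
    }
}
```

To use the 2nd sheet row as headers, pass `headerRowIndex: 1`; for the 4th row, pass `3`.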

Adding a formula to Excel worksheet results in HRESULT: 0x800A03EC

I've searched for an appropriate solution online but couldn't find anything helpful...
In an Excel worksheet I need to write some values from a database table and then add a formula next to each value (depending on another worksheet in the same workbook). Adding the data works perfectly, but adding the formula results in an error.
I'm getting the data and adding it to the sheet like this:
using (SqlConnection conn = new SqlConnection("MyConnectionString"))
using (SqlCommand comm = new SqlCommand("SELECT DISTINCT [MyField] FROM [MyTable]", conn))
{
conn.Open();
using (SqlDataReader reader = comm.ExecuteReader())
{
myStringList.Add("MyField");
if (reader.HasRows)
while (reader.Read())
myStringList.Add(reader.GetString(reader.GetOrdinal("MyField")));
}
}
workbook.Worksheets.Add(After: workbook.Worksheets[workbook.Sheets.Count]);
for (int counter = 1; counter <= myStringList.Count(); counter++)
((Excel.Worksheet)workbook.ActiveSheet).Cells[counter, 1] = myStringList[counter-1];
So far so good. Now I get to my problem. I need to add a formula to the cells B2, B3, ... for every used cell in column A. The difficulty is that I want to do it with a for loop because the formula depends on the column A.
for (int counter = 2; counter <= myStringList.Count(); counter++)
((Excel.Worksheet)workbook.ActiveSheet).Range["B" + counter].Formula
= $"=VLOOKUP(A{counter};MyOtherWorksheet!$B$2:$B${numberOfRows};1;FALSE)";
numberOfRows is the number of rows in column B in MyOtherWorksheet (it returns the correct number in debugger, so that's not the problem).
But when I assign the formula like this, I'm getting the following exception without any helpful message:
HRESULT: 0x800A03EC
I tried changing .Range["B" + counter] to .Cells[counter, 2] and even tried using .FormulaR1C1 instead of .Formula but I got the same exception.
What am I missing?
I've found the problem. I had to change .Formula to .FormulaLocal.
MSDN description for .FormulaLocal:
Returns or sets the formula for the object, using A1-style references in the language of the user. Read/write Variant.
MSDN description for .Formula:
Returns or sets a Variant value that represents the object's formula in A1-style notation and in the macro language.
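To illustrate the difference: `.Formula` expects English function names and comma argument separators regardless of the user's locale, whereas `.FormulaLocal` accepts the separators of the local Excel language (semicolons in many European locales). So an alternative to switching properties is to keep `.Formula` and write the formula with commas. A minimal sketch, with a hypothetical helper name and the sheet name taken from the question:

```csharp
using System;
using System.Globalization;

public static class LookupFormula
{
    // Builds the VLOOKUP from the question in the locale-independent form
    // that Range.Formula expects: English function names, comma separators.
    public static string Build(int row, int numberOfRows)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "=VLOOKUP(A{0},MyOtherWorksheet!$B$2:$B${1},1,FALSE)",
            row,
            numberOfRows);
    }
}

// Usage with Excel interop (not executed here):
//   worksheet.Range["B" + counter].Formula = LookupFormula.Build(counter, numberOfRows);
```

This form should behave the same on machines with different regional settings, which `.FormulaLocal` with hard-coded semicolons would not.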

Load Excel sheet into DataTable but only the first 256 chars per cell are imported,

I have an Excel sheet that I want to load into a DataTable with OleDb.
The sheet contains a multiline text column with up to 1000 chars.
However, using the code below, I only get 256 chars per cell in my DataTable after the import.
Is this a limitation of the provider, or is it possible to tell it to read the whole column?
var connectionString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\file.xlsx;Extended Properties=""Excel 12.0 Xml;HDR=YES;IMEX=1"";";
var sheetName = "Sheet1";
using (var con = new OleDbConnection(connectionString))
{
con.Open();
var table = new DataTable(sheetName);
var query = "SELECT * FROM [" + sheetName + "]";
OleDbDataAdapter adapter = new OleDbDataAdapter(query, con);
adapter.Fill(table);
return table;
}
I found a solution.
The problem is that OleDb guesses which DbType to choose for each column.
If the first few rows only contain data shorter than 256 chars, that type is applied to all rows.
As a workaround, I moved one row with large data to the beginning of the sheet, and now all of the data gets imported.
Here is a link that describes the problem. There is also a workaround with a registry key, but I haven't tried that.
http://www.xtremevbtalk.com/showthread.php?t=206454
This registry fix worked for me.
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel]
"TypeGuessRows"=dword:00000000
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel]
"TypeGuessRows"=dword:00000000

How do you get the name of the first page of an excel workbook?

Suppose you don't know the name of the first worksheet in an excel workbook. And you want to find a way to read from the first page. This snippet sometimes works, but not always. Is it just me? Or is there a no brainer way to do this?
MyConnection = new System.Data.OleDb.OleDbConnection("provider=Microsoft.Jet.OLEDB.4.0;Data Source='" + inputFile + "';Extended Properties=Excel 8.0;");
String[] excelSheets = new String[tbl.Rows.Count];
int i = 0;
foreach (DataRow row in tbl.Rows)
{
excelSheets[i] = row["TABLE_NAME"].ToString();
i++;
}
string pageName = excelSheets[0];
OleDbDataAdapter myAdapter = new System.Data.OleDb.OleDbDataAdapter("SELECT * FROM [" + pageName + "]", MyConnection);
Note: I am looking for the name of the first worksheet.
If you have Office installed on the machine, why not just use Visual Studio Tools for Office (VSTO). Here is essentially the code to get the worksheet:
Microsoft.Office.Interop.Excel.Application app = new Microsoft.Office.Interop.Excel.Application();
Microsoft.Office.Interop.Excel.Workbook workbook = app.Workbooks.Open(fileName,otherarguments);
Microsoft.Office.Interop.Excel.Worksheet worksheet = workbook.Worksheets[1] as Microsoft.Office.Interop.Excel.Worksheet;
Your code seems to be missing the definition of tbl. I assume it is something like
DataTable tbl = MyConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
If so, you will probably get the sheet names, but in the wrong order.
I could not find a proper solution for this issue, so I approached it from another point of view. I decided to look for sheets that actually had information on them. You can probably do this by looking at the rows, but the method I used was to look at the columns from the schema information. (This will obviously fail if your used sheet has only one column, as unused sheets also have one column.) It worked in my case, and I also used it to check that I had the expected number of columns (in my case nine).
This uses the GetOleDbSchemaTable(OleDbSchemaGuid.Columns, null) method to return the column information.
The code is probably irrelevant/trivial, but as I happened to be learning LINQ when I came across this issue, I wrote it in LINQ style.
It does require a small class called LinqList which you can get here
DataTable columnDetails = objConn.GetOleDbSchemaTable(
System.Data.OleDb.OleDbSchemaGuid.Columns, null);
LinqList<DataRow> rows = new LinqList<DataRow>(columnDetails.Rows);
var query= (from r in rows
group r by r["Table_Name"] into results
select new { results.Key , count=results.Count() }
);
var activeSheets = (from sheet in query
where sheet.count == 9
select sheet.Key
).ToList();
if (activeSheets.Count != 1)
... display error
This is the same as this other question First sheet Excel
I think that the order of the returned table gets messed up. We would need to find a way to get the order of the tabs. If you check your code, sometimes the first sheet is at index 0, but the sheets can be returned in any order. I have tried deleting the other sheets, and with only one sheet you get the right name, but that wouldn't be practical.
Edit: after some research, it seems the tabs may be returned in order of their names; see Using Excel OleDb to get sheet names IN SHEET ORDER
SpreadsheetGear for .NET will let you load a workbook and get the names of sheets (with IWorkbook.Worksheets[sheetIndex].Name) and get the raw data or formatted text of each cell (it does more but that's probably what you are looking for if you are currently using OleDB).
You can download a free trial here.
Disclaimer: I own SpreadsheetGear LLC
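For the OleDb route discussed above, one practical detail is that the schema table can list more than worksheets: worksheet entries end in `$` (wrapped in single quotes when the name contains spaces), while other entries such as named ranges do not. The sketch below filters a schema table down to worksheet names. It does not restore tab order, which, as noted in this thread, the provider does not appear to preserve. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SheetNames
{
    // Extracts worksheet names from a schema table that has a TABLE_NAME column,
    // such as the one returned by GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null).
    // Worksheets appear as "Name$" or "'Name with spaces$'"; entries that do not
    // end in '$' once the quotes are stripped are skipped.
    public static List<string> FromSchema(DataTable schema)
    {
        if (schema == null) throw new ArgumentNullException(nameof(schema));

        var names = new List<string>();
        foreach (DataRow row in schema.Rows)
        {
            string raw = Convert.ToString(row["TABLE_NAME"]).Trim('\'');
            if (raw.EndsWith("$", StringComparison.Ordinal))
                names.Add(raw.Substring(0, raw.Length - 1)); // drop the trailing '$'
        }
        return names;
    }
}
```

The returned names have the `$` removed, so a query would be built as `"SELECT * FROM [" + name + "$]"`.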

Excel DateTime being returned as DBNull

I have some Excel file reading code that uses the OLEDB (Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=Excel 8.0;) which works well but I keep encountering an issue whereby certain dates are returned as DBNull.
In the original XLS document, the format of dates that work (en-GB locale) are:
"02/04/2009 17:00:00" // returned as a System.DateTime
And the following style fails:
"08/Jan/09 11:24 AM" // returned as DBNull
Excel knows they're both dates (although I can't force them to style correctly) as the following correctly shows a date:
=DATE(YEAR(c),MONTH(c),DAY(c)) // where c = cell reference.
Is there a way, without altering the auto-generated original, to get the data?
EDIT for reference, here is my read-data method (assuming a dbAdapter is set up already -- note the DBNull doesn't come from the catch which isn't fired at all):
public List<List<string>> GetData(string tableName, int maxColumns)
{
List<List<string>> rows = new List<List<string>>();
DataSet ExcelDataSet = new DataSet();
dbCommand.CommandText = @"SELECT * FROM [" + tableName + "]";
dbAdapter.Fill(ExcelDataSet);
DataTable table = ExcelDataSet.Tables[0];
foreach (DataRow row in table.Rows)
{
List<string> data = new List<string>();
for (int column = 0; column < maxColumns; column++)
{
try
{
data.Add(row[column].ToString());
}
catch (Exception)
{
data.Add(null);
}
}
// Stop processing at first blank row
if ( string.IsNullOrEmpty(data[0]) ) break;
rows.Add(data);
}
return rows;
}
I don't know if this will be helpful or not, but I have run into issues with Excel OLE DB code returning NULLs where I expected data, and it almost always came back to a data type inference issue. The driver determines the data type of a column from the first x rows of data (I think x=10, but I could be wrong; as far as I know the Jet default for the TypeGuessRows setting is 8). Values further down that don't fit the guessed type come back as NULL. I know you don't want to alter the file, but it might be worth putting the problem date style in the first few rows to see whether it changes the behavior of your application.
Obviously, even if that does fix it, it doesn't solve your problem. The only fixes I know of in that case are to alter the file (put something in the first few rows that forces the correct data type). Sorry I can't offer a better solution, but hopefully this at least helps you figure out what's causing the issue.
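One avenue that avoids altering the file, offered as an untested assumption: adding `IMEX=1` to the Extended Properties asks the driver to treat mixed-type columns as text, so the problem cells may arrive as strings instead of DBNull. If they do, the two date styles from the question can be parsed explicitly. A minimal sketch, with a hypothetical helper name and format strings inferred from the two samples in the question:

```csharp
using System;
using System.Globalization;

public static class ExcelDateText
{
    // Formats inferred from the two samples in the question (en-GB day-first order).
    private static readonly string[] Formats =
    {
        "dd/MM/yyyy HH:mm:ss",   // e.g. 02/04/2009 17:00:00
        "dd/MMM/yy hh:mm tt"     // e.g. 08/Jan/09 11:24 AM
    };

    // Returns the cell as a DateTime, or null when it is empty or in neither format.
    public static DateTime? Parse(object cell)
    {
        // Values the driver already typed as dates are passed through unchanged.
        if (cell is DateTime) return (DateTime)cell;

        string text = Convert.ToString(cell, CultureInfo.InvariantCulture);
        DateTime value;
        bool ok = DateTime.TryParseExact(
            text,
            Formats,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AllowWhiteSpaces,
            out value);

        return ok ? value : (DateTime?)null;
    }
}
```

In the read loop, `ExcelDateText.Parse(row[column])` could replace `row[column].ToString()` for the date column.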
