I have a huge Excel file with around 50k rows. I am reading it using the following connection string
string.Format(#"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR=No;IMEX=1'", MyFilePath);
In this huge Excel file, the rows are grouped in nesting pattern. Meaning, lets say first 500 rows are grouped under Group A there is a sub group in that group comprising of rows from 300-400 as Group B and then again from 350-400
in Group C. Now when I read the excel file in my program, I get all the rows, but I cannot distinguish between the row grouping I mentioned above. Is there any smart way to identify and group them accordingly?
Here's a sample of my code.
rivate List<List<string>> ReadSheetData(string _query, bool _HasHeaders = true)
{
string conn = "";
if (!_HasHeaders)
conn = string.Format(#"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR=No;IMEX=1'", MyFilePath);
else
conn = this.conn;
List<List<string>> ret = new List<List<string>>();
using (OleDbConnection connection = new OleDbConnection(conn))
{
connection.Open();
try
{
OleDbCommand command = new OleDbCommand(_query, connection);
using (OleDbDataReader dr = command.ExecuteReader())
{
DataTable tbl = dr.GetSchemaTable();
while (dr.Read())
{
List<string> rowVals = new List<string>();
ret.Add(rowVals);
for (int i = 0; i < dr.FieldCount; i++)
{
dynamic cell = dr[i];
string value = cell != null ? cell.ToString() : "";
rowVals.Add(value);
}
}
}
}
catch (Exception ex)
{ }
}
ret.RemoveAll(a => a.All(b => b == "") == true);
return ret;
}
Related
I am using oledb to read excel file and then i am trying to apply some conditions and then save in db, but i just want to read excel row one by one using loop.
var cmd = conn.CreateCommand();
cmd.CommandText = "select * from [الحيازات$]";// want to get this rows using loop so
//that only specific row is selected perform opertion blow then i will save in db,but
//its selecting all rows now
using (var rdr = cmd.ExecuteReader())
{
//LINQ query - when executed will create anonymous objects for each row
var query = (from DbDataRecord row in rdr
select row)
.Select(x =>
{
//dynamic item = new ExpandoObject();
Dictionary<string, string> item = new Dictionary<string, string>();
for (int i = 0; i < x.FieldCount; i++)
{
//code
}
});
}
Please check this
private void ReadExcel()
{
try
{
string connectionString = #"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filePath + #";Extended Properties=""Excel 12.0;HDR=YES;IMEX=1;""";
using (OleDbConnection connection = new OleDbConnection())
{
connection.ConnectionString = connectionString;
using (OleDbCommand command = connection.CreateCommand())
{
command.CommandText = "SELECT * FROM [Sheet1$]";
connection.Open();
OleDbDataAdapter dataAdapter = new OleDbDataAdapter(command);
DataSet dataSet = new DataSet();
dataAdapter.Fill(dataSet);
if (dataSet != null && dataSet.Tables[0] != null)
{
foreach (DataRow row in dataSet.Tables[0].Rows)
{
string value = row["EmailColumnName"].ToString();
}
}
}
}
}
catch (Exception ex)
{
//Handle Exception
}
}
I am looking for some assistance in obtaining the Columns from a specific worksheet using C#. I am currently able to connect to the Excel file and obtain a read of the Columns, but it is giving me the Columns for every worksheet in my Excel, not a specific one.
What can I do to my code to obtain only the Columns from the desired worksheet? Here is my code which currently fills my Checkbox List with all of the columns.
OleDbConnection excelConnection = new OleDbConnection(String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0\"", strFullPath));
using (OleDbCommand cmd = new OleDbCommand("SELECT * FROM [LogFile$]", excelConnection))
{
excelConnection.Open();
DataTable dt = excelConnection.GetSchema("Columns");
cbColumnList.DataSource = dt;
cbColumnList.DataTextField = "Column_name";
cbColumnList.DataValueField = "Column_name";
cbColumnList.DataBind();
}
I am fairely sure my issue has something to do with where I am creating hte DataTable, as i'm pulling the Scheme from excelConnection and not cmd, thus it's most likely bypassing my query where I have defined the Worksheet to get the columns from. If this is the case, how would I fix it?
First Solution
using System;
using System.Data;
using System.Data.OleDb;
namespace ConsoleApp35
{
class Program
{
static void Main(string[] args)
{
using (OleDbConnection excelConnection = new OleDbConnection(String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0\"", #"D:\Coverage.xlsx")))
{
excelConnection.Open();
var dt = new DataTable();
var da = new OleDbDataAdapter();
var _command = new OleDbCommand();
_command.Connection = excelConnection;
_command.CommandText = "SELECT * FROM [Sheet1$]";
da.SelectCommand = _command;
try
{
da.Fill(dt);
// printing columns names
foreach (DataColumn d in dt.Columns)
{
Console.WriteLine(d.ColumnName);
}
// dt has all data from Sheet1
foreach (DataRow r in dt.Rows)
{
Console.WriteLine(String.Join(", ", r.ItemArray));
}
}
catch (Exception e)
{
// process error here
}
Console.ReadLine();
}
}
}
}
Second solution
This solution is for those who have the same requirements but use Open XML.
Install both Open XML packages from Microsoft:
https://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=5124
In your solution add a references to DocumentFormat.OpenXml and WindowsBase assemblies. Add the following using directives:
using DocumentFormat.OpenXml.Spreadsheet;
using DocumentFormat.OpenXml.Packaging;
using System.Text.RegularExpressions;
Add the following methods to your class that processes your Excel file.
private static string GetColumnName(string cellName)
{
var regex = new Regex("[a-zA-Z]+");
var match = regex.Match(cellName);
return match.Value;
}
public static SharedStringItem GetSharedStringItemById(WorkbookPart
workbookPart, int id)
{
return workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem> ().ElementAt(id);
}
Use the following code to get columns names from the specified sheet. Also, additionally, you can process here all the data you need from the file. I have tested reading data from a simple Excel sheet that contains 2 columns and several rows where cells have integer and string values. For processing other data types you need to add additional blocks of code checking cell.DataType.
var columnsNames = new HashSet<string>();
var data = new List<List<string>>();
using (SpreadsheetDocument myDoc =
DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open(#"D:\FileName.xlsx",
false))
{
foreach (Sheet s in myDoc.WorkbookPart.Workbook.Sheets)
{
if (s.Name == "Sheet1") {
string relationshipId = s.Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)myDoc.WorkbookPart.GetPartById(relationshipId);
var sd = worksheetPart.Worksheet.Elements<SheetData>().First();
IEnumerable<Row> rows = sd.Elements<Row>();
foreach (Row row in rows)
{
var rowList = new List<string>();
foreach (Cell cell in row.Elements<Cell>())
{
// get columns names
var columnName = GetColumnName(cell.CellReference.Value);
columnsNames.Add(columnName);
// process data
string cellValue = string.Empty;
if (cell.DataType != null)
{
if (cell.DataType == CellValues.SharedString)
{
int id = -1;
if (Int32.TryParse(cell.InnerText, out id))
{
SharedStringItem item = GetSharedStringItemById(myDoc.WorkbookPart, id);
if (item.Text != null)
{
cellValue = item.Text.Text;
}
else if (item.InnerText != null)
{
cellValue = item.InnerText;
}
else if (item.InnerXml != null)
{
cellValue = item.InnerXml;
}
}
}
}
else
{
cellValue = cell.CellValue.Text;
}
rowList.Add(cellValue);
}
data.Add(rowList);
}
}
}
myDoc.Close();
}
How to properly get the Data Type of MS Access table
Below is my code:
var con = $"Provider=Microsoft.Jet.OLEDB.4.0;" +
$"Data Source={database};" +
"Persist Security Info=True;";
using (OleDbConnection myConnection = new OleDbConnection())
{
myConnection.ConnectionString = con;
myConnection.Open();
DataTable dt = myConnection.GetSchema("Tables");
var l = myConnection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, new object[] { null, null, null, "TABLE" });
var isTableExist = false;
foreach (DataRow item in l.Rows)
{
var tbleName = item[2].ToString();
if (tbleName == tableName)
{
isTableExist = true;
break;
}
}
if (isTableExist)
{
string sql = string.Format($"SELECT * FROM {tableName}");
using (OleDbCommand cmd = new OleDbCommand(sql, myConnection))
{
using (OleDbDataReader reader = cmd.ExecuteReader())
{
if (reader != null)
{
for (int i = 0; i < reader.FieldCount; i++)
{
Console.WriteLine(reader.GetName(i));
Console.WriteLine(reader.GetDataTypeName(i));
}
}
}
}
}
}
Here is the structure of the mdb file im reading
And this is what i'm receiving
id
DBTYPE_I4
main_id
DBTYPE_I4
Document_ID
DBTYPE_WVARCHAR
End_page
DBTYPE_WVARCHAR
No_pages
DBTYPE_I4
Why does the AutoNumber datatype show the same as the number data type?
What i'm doing here is merging tables. And i want to skip the insert statement for those data type AutoNumber but i can't make it work because the result is the same with number. Thank you
I am creating a winform application where every day, a user will select a xlsx file with the day's shipping information to be merged with our invoicing data.
The challenge I am having is when the user does not download the xlsx file with the specification that the winform data requires. (I wish I could eliminate this step with an API connection but sadly I cannot)
My first step is checking to see if the xlsx file has headers to that my file path is valid
Example
string connString = "provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + *path* + "';Extended Properties='Excel 12.0;HDR=YES;';";
Where path is returned from an OpenFileDialog box
If the file was chosen wasn't downloaded with headers the statement above throws an exception.
If change HDR=YES; to HDR=NO; then I have trouble identifying the columns I need and if the User bothered to include the correct ones.
My code then tries to load the data into a DataTable
private void loadRows()
{
for (int i = 0; i < deliveryTable.Rows.Count; i++)
{
DataRow dr = deliveryTable.Rows[i];
int deliveryId = 0;
bool result = int.TryParse(dr[0].ToString(), out deliveryId);
if (deliveryId > 1 && !Deliveries.ContainsKey(deliveryId))
{
var delivery = new Delivery(deliveryId)
{
SalesOrg = Convert.ToInt32(dr[8]),
SoldTo = Convert.ToInt32(dr[9]),
SoldName = dr[10].ToString(),
ShipTo = Convert.ToInt32(dr[11]),
ShipName = dr[12].ToString(),
};
Which all works only if the columns are in the right place.
If they are not in the right place my thought is to display a message to the user to get the right information
Does anyone have any suggestions?
(Sorry, first time posting a question and still learning to think through it)
I guess you're loading the spreadsheet into a Datatable? Hard to tell with one line of code. I would use the columns collection in the datatable and check to see if all the columns you want are there. Sample code to enumerate the columns below.
private void PrintValues(DataTable table)
{
foreach(DataRow row in table.Rows)
{
foreach(DataColumn column in table.Columns)
{
Console.WriteLine(row[column]);
}
}
}
private void GetExcelSheetForUpload(string PathName, string UploadExcelName)
{
string excelFile = "DateExcel/" + PathName;
OleDbConnection objConn = null;
System.Data.DataTable dt = null;
try
{
DataSet dss = new DataSet();
String connString = "Provider=Microsoft.ACE.OLEDB.12.0;Persist Security Info=True;Extended Properties=Excel 12.0 Xml;Data Source=" + PathName;
objConn = new OleDbConnection(connString);
objConn.Open();
dt = objConn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (dt == null)
{
return;
}
String[] excelSheets = new String[dt.Rows.Count];
int i = 0;
foreach (DataRow row in dt.Rows)
{
if (i == 0)
{
excelSheets[i] = row["TABLE_NAME"].ToString();
OleDbCommand cmd = new OleDbCommand("SELECT * FROM [" + excelSheets[i] + "]", objConn);
OleDbDataAdapter oleda = new OleDbDataAdapter();
oleda.SelectCommand = cmd;
oleda.Fill(dss, "TABLE");
}
i++;
}
grdExcel.DataSource = dss.Tables[0].DefaultView;
grdExcel.DataBind();
lblTotalRec.InnerText = Convert.ToString(grdExcel.Rows.Count);
}
catch (Exception ex)
{
ViewState["Fuletypeidlist"] = "0";
grdExcel.DataSource = null;
grdExcel.DataBind();
}
finally
{
if (objConn != null)
{
objConn.Close();
objConn.Dispose();
}
if (dt != null)
{
dt.Dispose();
}
}
}
if (grdExcel.HeaderRow.Cells[0].Text.ToString() == "CODE")
{
GetExcelSheetForEmpl(PathName);
}
else
{
divStatusMsg.Style.Add("display", "");
divStatusMsg.Attributes.Add("class", "alert alert-danger alert-dismissable");
divStatusMsg.InnerText = "ERROR !!... Upload Excel Sheet in header Defined Format ";
}
how to get how many worksheet in file
string connectionString = String.Format(#"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=""Excel 8.0;HDR=no;IMEX=1;""", openFileDialog1.FileName);
now i want know how many worksheet in side given ?
what code i should i write ?
how can i know ?
thanks
While the question pointed to by Haris Hasan will allow you to extract the sheet names, if you just want the number of worksheets in a workbook, the following will do the trick:
string connectionString = String.Format(#"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=""Excel 8.0;HDR=no;IMEX=1;""", openFileDialog1.FileName);
int numberOfSheets = 0;
using (OleDbConnection conn = new OleDbConnection(connectionString))
{
conn.Open();
DataTable dt = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (dt != null)
{
foreach (DataRow row in dt.Rows)
{
if (row["TABLE_NAME"].ToString().EndsWith("$"))
{
numberOfSheets++;
}
}
}
}
EDIT:
Or, for a shorter version, use the following (thanks for jb for helping me on this one):
string connectionString = String.Format(#"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=""Excel 8.0;HDR=no;IMEX=1;""", openFileDialog1.FileName);
int numberOfSheets = 0;
using (OleDbConnection conn = new OleDbConnection(connectionString))
{
conn.Open();
DataTable dt = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (dt != null)
{
numberOfSheets = dt.AsEnumerable().Cast<DataRow>().Where(row => row["TABLE_NAME"].ToString().EndsWith("$")).Count();
}
}