Read an xlsx file and convert to List - C#

I have an xlsx file with the following data:
www.url.com
www.url.com
www.url.com
www.url.com
www.url.com
www.url.com
www.url.com
www.url.com
...
As you can see, only one column is used, with a lot of rows.
I need to read that column from the xlsx file and convert it to a List<string>.
Any help?
Thanks!

You can use EPPlus. It's simple, something like this:
var ep = new ExcelPackage(new FileInfo(excelFile));
var ws = ep.Workbook.Worksheets["Sheet1"];
var domains = new List<string>();
for (int rw = 1; rw <= ws.Dimension.End.Row; rw++)
{
if (ws.Cells[rw, 1].Value != null)
domains.Add(ws.Cells[rw, 1].Value.ToString());
}

The easiest way is to use OleDb. You can do something like this:
List<string> values = new List<string>();
string constr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\\your\\path\\file.xlsx;Extended Properties=\"Excel 12.0 Xml;HDR=NO;\"";
using (OleDbConnection conn = new OleDbConnection(constr))
{
conn.Open();
OleDbCommand command = new OleDbCommand("Select * from [SheetName$]", conn);
OleDbDataReader reader = command.ExecuteReader();
if (reader.HasRows)
{
while (reader.Read())
{
// this assumes just one column, and the value is text
string value = reader[0].ToString();
values.Add(value);
}
}
}
foreach (string value in values)
Console.WriteLine(value);

Have you tried using ClosedXML? Super simple to do.
var wb = new XLWorkbook(FileName);
var ws2 = wb.Worksheet(1);
List<string> myData = new List<string>();
foreach (var r in ws2.RangeUsed().RowsUsed())
{
myData.Add(r.Cell(1).GetString());
}

You can use OOXML to read the file; this library simplifies the work: http://simpleooxml.codeplex.com

Related

Getting List of Columns within a specific Excel Worksheet

I am looking for some assistance in obtaining the Columns from a specific worksheet using C#. I am currently able to connect to the Excel file and obtain a read of the Columns, but it is giving me the Columns for every worksheet in my Excel, not a specific one.
What can I do to my code to obtain only the Columns from the desired worksheet? Here is my code which currently fills my Checkbox List with all of the columns.
OleDbConnection excelConnection = new OleDbConnection(String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0\"", strFullPath));
using (OleDbCommand cmd = new OleDbCommand("SELECT * FROM [LogFile$]", excelConnection))
{
excelConnection.Open();
DataTable dt = excelConnection.GetSchema("Columns");
cbColumnList.DataSource = dt;
cbColumnList.DataTextField = "Column_name";
cbColumnList.DataValueField = "Column_name";
cbColumnList.DataBind();
}
I am fairly sure my issue has something to do with where I am creating the DataTable: I'm pulling the schema from excelConnection and not cmd, so it's most likely bypassing the query in which I defined the worksheet to get the columns from. If this is the case, how would I fix it?
First Solution
using System;
using System.Data;
using System.Data.OleDb;
namespace ConsoleApp35
{
class Program
{
static void Main(string[] args)
{
using (OleDbConnection excelConnection = new OleDbConnection(String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0\"", @"D:\Coverage.xlsx")))
{
excelConnection.Open();
var dt = new DataTable();
var da = new OleDbDataAdapter();
var _command = new OleDbCommand();
_command.Connection = excelConnection;
_command.CommandText = "SELECT * FROM [Sheet1$]";
da.SelectCommand = _command;
try
{
da.Fill(dt);
// printing columns names
foreach (DataColumn d in dt.Columns)
{
Console.WriteLine(d.ColumnName);
}
// dt has all data from Sheet1
foreach (DataRow r in dt.Rows)
{
Console.WriteLine(String.Join(", ", r.ItemArray));
}
}
catch (Exception e)
{
// process error here
}
Console.ReadLine();
}
}
}
}
Second solution
This solution is for those who have the same requirements but use Open XML.
Install both Open XML packages from Microsoft:
https://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=5124
In your solution, add references to the DocumentFormat.OpenXml and WindowsBase assemblies. Add the following using directives:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
Add the following methods to your class that processes your Excel file.
private static string GetColumnName(string cellName)
{
var regex = new Regex("[a-zA-Z]+");
var match = regex.Match(cellName);
return match.Value;
}
public static SharedStringItem GetSharedStringItemById(WorkbookPart workbookPart, int id)
{
return workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(id);
}
Use the following code to get the column names from the specified sheet. You can also process any other data you need from the file in the same loop. I have tested reading data from a simple Excel sheet that contains 2 columns and several rows where cells have integer and string values. To process other data types, add further blocks of code that check cell.DataType.
var columnsNames = new HashSet<string>();
var data = new List<List<string>>();
using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open(@"D:\FileName.xlsx", false))
{
foreach (Sheet s in myDoc.WorkbookPart.Workbook.Sheets)
{
if (s.Name == "Sheet1") {
string relationshipId = s.Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)myDoc.WorkbookPart.GetPartById(relationshipId);
var sd = worksheetPart.Worksheet.Elements<SheetData>().First();
IEnumerable<Row> rows = sd.Elements<Row>();
foreach (Row row in rows)
{
var rowList = new List<string>();
foreach (Cell cell in row.Elements<Cell>())
{
// get columns names
var columnName = GetColumnName(cell.CellReference.Value);
columnsNames.Add(columnName);
// process data
string cellValue = string.Empty;
if (cell.DataType != null)
{
if (cell.DataType == CellValues.SharedString)
{
int id = -1;
if (Int32.TryParse(cell.InnerText, out id))
{
SharedStringItem item = GetSharedStringItemById(myDoc.WorkbookPart, id);
if (item.Text != null)
{
cellValue = item.Text.Text;
}
else if (item.InnerText != null)
{
cellValue = item.InnerText;
}
else if (item.InnerXml != null)
{
cellValue = item.InnerXml;
}
}
}
}
else if (cell.CellValue != null) // cells without a stored value have no CellValue
{
cellValue = cell.CellValue.Text;
}
rowList.Add(cellValue);
}
data.Add(rowList);
}
}
}
myDoc.Close();
}

Convert column with mixed data to string

I have an Excel file with one column that contains mixed data (int and string).
After reading the data from the Excel file, I saw that the cell containing AZ-965 comes back as null.
This is my connection string:
return new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source="
+ excel + ";Extended Properties='Excel 12.0;HDR=YES;IMEX=1;ImportMixedTypes=Text;TypeGuessRows=0;';");
And this is how I read data from the Excel file:
private static List<T> GetImportedData<T>(string sheet, OleDbConnection myConnection,
List<string> column, ImportDataView<T> view) where T : IImportableData
{
var cols = string.Join( ",", column.Select(Cast));
var formattableString = $"select {cols} from [{sheet}$]";
using (var MyCommand = new
OleDbDataAdapter(formattableString, myConnection))
{
using (var DtSet = new DataSet())
{
int nbRow = 0;
MyCommand.Fill(DtSet);
var dt = DtSet.Tables[0];
var rows = dt.AsEnumerable();
var convertedList = rows
//.AsParallel()
.Select(x => GenerateImport<T>(x, column, ref nbRow, view))
.ToList();
return convertedList;
}
}
}
private static string Cast(string arg)
{
return $"IIf(IsNull([{arg}]), '', CStr([{arg}]))";
// return $"CStr([{arg}]) as {cleanName(arg)}";
//return $"[{arg}]";
}
I checked this link, but nothing works; same issue.
TypeGuessRows in your connection string is not a connection string property, at least not that I'm aware of. It's a registry key:
https://learn.microsoft.com/en-us/office/client-developer/access/desktop-database-reference/initializing-the-microsoft-excel-driver?tabs=office-2016
So I don't think it's doing what you think it is... or anything for that matter.
If you can't control the registry or the content of the file, my only suggestion would be to change the header property in the connection string from HDR=YES to HDR=NO, so that the first record (the header) is read in as text. Then you can effectively trash the first row.
It's not a great solution, but it should do what you seek.
Alternatively, you can try reading the content using a DbDataReader instead of using a data adapter:
using (OleDbCommand cmd = new OleDbCommand(formattableString, conn))
{
using (OleDbDataReader reader = cmd.ExecuteReader())
{
// Read() must be called to advance to each row before accessing values
while (reader.Read())
{
string column = reader.IsDBNull(0) ? null : reader.GetValue(0).ToString();
}
}
}
Based on my inferences on what you are trying to do, the data reader may actually offer some other advantages as well.

SQL Server return Select statement with XML datatype and convert it into DataSet in C#, ASP.Net

In my database table, I have a column named 'SectionDatatable' of the XML data type. In my C# code, after connecting to the database, I query the UserDefinedSectionData table to get SectionDatatable. I need to convert the XML value of 'SectionDatatable' into a DataSet. How can I do it? I have been stuck for a day; the following is my code.
SqlConnectionStringBuilder csb = new SqlConnectionStringBuilder();
csb.DataSource = @"CPX-XSYPQKSA91D\SQLEXPRESS";
csb.InitialCatalog = "DNN_Database";
csb.IntegratedSecurity = true;
string connString = csb.ToString();
string queryString = "select * FROM UserDefinedSectionData WHERE SectionDataTabId = @tabId";
using (SqlConnection connection = new SqlConnection(connString))
using (SqlCommand command = connection.CreateCommand())
{
command.CommandText = queryString;
command.Parameters.Add(new SqlParameter("tabId", tabId));
connection.Open();
using (SqlDataReader reader = command.ExecuteReader())
{
while (reader.Read())
{
string sectionData= reader["SectionDatatable"].ToString();
int moduleId = Int32.Parse(reader["SectionDataModuleId"].ToString());
}
}
}
This is a simple example of converting an XML string to a DataSet. The sample also shows how to process all the tables in the DataSet.
Replace the XML string in this sample with the XML output from your database, and change the code as needed to access the data.
string RESULT_OF_SectionDatatable = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
var xmlReader = XmlReader.Create(new StringReader(RESULT_OF_SectionDatatable));
DataSet ds = new DataSet();
ds.ReadXml(xmlReader);
foreach (DataTable table in ds.Tables)
{
Console.WriteLine(table);
Console.WriteLine();
foreach (var row in table.AsEnumerable())
{
for (int i = 0; i < table.Columns.Count; ++i)
{
Console.WriteLine(table.Columns[i].ColumnName +"\t" + row[i]);
}
Console.WriteLine();
}
}

Identify and load the Row Grouping in Excel using C#

I have a huge Excel file with around 50k rows. I am reading it using the following connection string
string.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR=No;IMEX=1'", MyFilePath);
In this huge Excel file, the rows are grouped in a nested pattern. That is, say the first 500 rows are grouped under Group A; within that group there is a subgroup, Group B, comprising rows 300-400, and within that again Group C, comprising rows 350-400. When I read the Excel file in my program, I get all the rows, but I cannot distinguish between the row groupings mentioned above. Is there any smart way to identify them and group the rows accordingly?
Here's a sample of my code.
private List<List<string>> ReadSheetData(string _query, bool _HasHeaders = true)
{
string conn = "";
if (!_HasHeaders)
conn = string.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR=No;IMEX=1'", MyFilePath);
else
conn = this.conn;
List<List<string>> ret = new List<List<string>>();
using (OleDbConnection connection = new OleDbConnection(conn))
{
connection.Open();
try
{
OleDbCommand command = new OleDbCommand(_query, connection);
using (OleDbDataReader dr = command.ExecuteReader())
{
DataTable tbl = dr.GetSchemaTable();
while (dr.Read())
{
List<string> rowVals = new List<string>();
ret.Add(rowVals);
for (int i = 0; i < dr.FieldCount; i++)
{
dynamic cell = dr[i];
string value = cell != null ? cell.ToString() : "";
rowVals.Add(value);
}
}
}
}
catch (Exception ex)
{ }
}
ret.RemoveAll(a => a.All(b => b == "") == true);
return ret;
}
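As far as I know, the OLE DB provider only hands back cell values, so the grouping is not visible through that route. The grouping is stored in the sheet XML inside the .xlsx package, though: each row element can carry an outlineLevel attribute (0 or absent for ungrouped rows, 1 for the outermost group, 2 for a nested group, and so on). Here is a minimal sketch that reads those levels with the base class library only. The names RowOutlineReader, LoadSheetXml and ReadOutlineLevels are made up for illustration, and the default entry name xl/worksheets/sheet1.xml is an assumption that holds for many files but is not guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class RowOutlineReader
{
    // Main SpreadsheetML namespace used by worksheet parts.
    static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Loads one worksheet part from the .xlsx zip package.
    // The default entry name is an assumption; check the package if it differs.
    public static XDocument LoadSheetXml(string xlsxPath,
        string entryName = "xl/worksheets/sheet1.xml")
    {
        using (var zip = ZipFile.OpenRead(xlsxPath))
        {
            var entry = zip.GetEntry(entryName);
            if (entry == null)
                throw new FileNotFoundException("Worksheet part not found: " + entryName);
            using (var stream = entry.Open())
                return XDocument.Load(stream);
        }
    }

    // Returns (row number, outline level) for every <row> element.
    // A missing outlineLevel means the row is not grouped (level 0).
    public static List<(int Row, int Level)> ReadOutlineLevels(XDocument sheetXml)
    {
        var result = new List<(int Row, int Level)>();
        int previousRow = 0;
        foreach (var row in sheetXml.Descendants(Main + "row"))
        {
            // The r attribute is optional; fall back to "previous row + 1".
            int rowNumber = (int?)row.Attribute("r") ?? previousRow + 1;
            int level = (int?)row.Attribute("outlineLevel") ?? 0;
            result.Add((rowNumber, level));
            previousRow = rowNumber;
        }
        return result;
    }
}
```

With the levels in hand, a run of consecutive rows at level 1 or deeper forms a group, and a run at level 2 or deeper inside it forms a subgroup. Rows that are completely empty may not appear in the XML at all, so match on the row number rather than on the position in the list.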

Bulk copy to multiple tables

My application currently reads data from a CSV file into my GUI and then displays it. At the moment the code looks like this:
try
{
//Browse for file
OpenFileDialog ofd = new OpenFileDialog();
//Only show .csv files
ofd.Filter = "Microsoft Office Excel Comma Separated Values File|*.csv";
DialogResult result = ofd.ShowDialog();
if (result == DialogResult.OK)
{
SqlConnection con = new SqlConnection(@"Data Source=(local);Initial Catalog=ionCalc;Integrated Security=True");
string filepath = ofd.FileName;
StreamReader sr = new StreamReader(filepath);
string line = sr.ReadLine();
string[] value = line.Split(',');
DataTable dt = new DataTable();
DataRow row;
foreach (string dc in value)
{
dt.Columns.Add(new DataColumn(dc));
}
while (!sr.EndOfStream)
{
value = sr.ReadLine().Split(',');
if (value.Length == dt.Columns.Count)
{
row = dt.NewRow();
row.ItemArray = value;
dt.Rows.Add(row);
}
}
SqlBulkCopy bc = new SqlBulkCopy(con.ConnectionString, SqlBulkCopyOptions.TableLock);
bc.DestinationTableName = "Inventory";
bc.BatchSize = dt.Rows.Count;
bc.ColumnMappings.Add("Name", "Name");
bc.ColumnMappings.Add("IN", "IN");
bc.ColumnMappings.Add("Fund", "Fund");
bc.ColumnMappings.Add("Status", "Status");
bc.ColumnMappings.Add("ShareCurrency", "ShareCurrency");
bc.ColumnMappings.Add("PriceFrequency", "PriceFrequency");
bc.ColumnMappings.Add("ClassCode", "ClassCode");
bc.ColumnMappings.Add("Simulation", "Simulation");
bc.ColumnMappings.Add("Hedged", "Hedged");
bc.ColumnMappings.Add("FundCurrency", "FundCurrency");
bc.ColumnMappings.Add("Type", "Type");
con.Open();
bc.WriteToServer(dt);
bc.Close();
con.Close();
This correctly throws all the data into the datagrid for the inventory. However, I want to make sure that while the data is being placed in the Inventory table, the distinct Name and Fund values are passed to their respective tables. Any ideas?
Apologies for the starting line; for some reason it keeps being cut out.
// Distinct() must be applied to the sequence of values, not to each string
var names = dt.AsEnumerable().Select(o => o.Field<string>("Name")).Distinct();
var funds = dt.AsEnumerable().Select(o => o.Field<string>("Fund")).Distinct();
Once you have these two collections you can convert them to DataTables as you see fit (there are lots of suggestions about that on the site), then use further SqlBulkCopy calls to process them.
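One way to do that conversion, as a rough sketch: build a one-column DataTable from the distinct values of a source column. DistinctColumnTables and BuildDistinctTable are hypothetical names, and the target column name is whatever your Name or Fund table actually uses. It relies on the columns being string-typed, which is the default for the DataColumn objects created in the question.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DistinctColumnTables
{
    // Builds a one-column DataTable holding the distinct, non-empty values
    // found in sourceColumn. targetColumn is the column name expected by
    // the destination table (an assumption supplied by the caller).
    public static DataTable BuildDistinctTable(
        DataTable source, string sourceColumn, string targetColumn)
    {
        var result = new DataTable();
        result.Columns.Add(targetColumn, typeof(string));

        var distinctValues = source.AsEnumerable()
            .Select(r => r.Field<string>(sourceColumn))
            .Where(v => !string.IsNullOrEmpty(v))
            .Distinct();

        foreach (var value in distinctValues)
        {
            result.Rows.Add(value);
        }
        return result;
    }
}
```

You would call it once per lookup table, for example BuildDistinctTable(dt, "Fund", "Fund"), and pass each result to its own SqlBulkCopy with the matching DestinationTableName. Note that this does not check which values already exist in the destination table, so a unique constraint there would still reject duplicates.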
