Force EPPLUS to read as text - c#

I'm developing an application that reads xlsx files, does some validation and inserts the data into a database. Unfortunately, when I try to read columns marked as numeric (e.g. with EAN-13 codes) I get the minimum value of an int.
The user doesn't see this because Excel displays it properly.
How can I make it read the file as plain text? I know I can use OLEDB for it, but I also need to edit the file dynamically, so the EPPlus ExcelPackage is the best choice.
Here is the code I'm using:
FileInfo file = new FileInfo(path);
MainExcel = new OfficeOpenXml.ExcelPackage(file);
{
var ws = MainExcel.Workbook.Worksheets.First();
DataTable tbl = new DataTable();
for (var rowNum = 1; rowNum <= ws.Dimension.End.Row; rowNum++) //currently loading all file
{
var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
var row = tbl.NewRow();
foreach (var cell in wsRow)
{
row[cell.Start.Column - 1] = cell.Text;
}
tbl.Rows.Add(row);
}
}
And this is how I enumerate the columns:
foreach (var firstRowCell in ws.Cells[3, 1, 3, ws.Dimension.End.Column])
{
System.Type typeString = System.Type.GetType("System.String") ;
tbl.Columns.Add( firstRowCell.Text , typeString );
}
For people whom it might concern, here is the file (works also for non google users):
https://drive.google.com/open?id=0B3kIzUcpOx-iMC1iY0VoLS1kU3M&authuser=0
I noticed that the ExcelRange.Value property is an array which contains all of the objects unformatted. But once you iterate over the cells in an ExcelRange and request the cell.Text property, the value has already been processed. Trying to modify ConditionalFormatting and DataValidation on the ExcelRange (e.g. with AddContainsText()) does not help. EDIT: it does not help for an entire sheet either :-(
I'd prefer NOT to cast ExcelRange.Value to an Array; it's ugly and very conditional.

Apparently this is the solution (not complete code, though; you still have to add the columns to the DataTable). I couldn't find a format string that specifies 'no formatting' in EPPlus, but setting the format to "#" does the job.
var ws = MainExcel.Workbook.Worksheets.First();
DataTable tbl = new DataTable();
for (var rowNum = 1; rowNum <= ws.Dimension.End.Row; rowNum++)
{
var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
var row = tbl.NewRow();
foreach (var cell in wsRow)
{
// Override the cell's number format so that Text returns the plain digits
cell.Style.Numberformat.Format = "#";
row[cell.Start.Column - 1] = cell.Text;
}
tbl.Rows.Add(row);
}

The cells in your file are custom-formatted as a fraction. Have you done this on purpose?
Anyway, if you want to keep this format, you can alternatively use cell.Value or cell.RichText.Text to get your 13-digit number.
Hope this helps.
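As a rough illustration of the cell.Value route mentioned above: EPPlus typically hands numeric cells back as a boxed double, so what remains is turning that double into digits without an exponent or culture-specific separators. The helper below is a minimal sketch that uses only the base class library. The name CellText.ToPlainText is hypothetical (it is not part of EPPlus), and the assumption is that it would be called as CellText.ToPlainText(cell.Value).

```csharp
using System;
using System.Globalization;

public static class CellText
{
    // Hypothetical helper (not an EPPlus API): converts a raw cell value to plain text.
    public static string ToPlainText(object value)
    {
        if (value == null) return string.Empty;

        // Numeric cells usually arrive as double; format without exponent or grouping.
        if (value is double)
            return ((double)value).ToString("0.###############", CultureInfo.InvariantCulture);

        // Everything else (string, DateTime, bool, ...) falls back to invariant formatting.
        return Convert.ToString(value, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        Console.WriteLine(ToPlainText(4006381333931d)); // an EAN-13 stored as a number
        Console.WriteLine(ToPlainText("already text"));
    }
}
```

Unlike changing Numberformat.Format, this leaves the worksheet itself untouched, which may matter if the file is saved again afterwards.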

Related

How to retrieve each specific column's values by looping through rows from Excel using C#?

I am editing uploaded Excel workbooks using C# with the same logic I used in VBA. I am using Syncfusion to open the workbooks; however, the code below does not let me read a whole column to apply the logic. Why?
public void AppendID(string excelFilePath, HttpResponse response)
{
using (ExcelEngine excelEngine = new ExcelEngine())
{
IApplication application = excelEngine.Excel;
application.DefaultVersion = ExcelVersion.Excel2007;
IWorkbook workbook = application.Workbooks.Open(excelFilePath);
workbook.Version = ExcelVersion.Excel97to2003;
workbook.Allow3DRangesInDataValidation = true;
// Accessing the worksheet by index
IWorksheet worksheet = workbook.Worksheets[2];
When I try to define the range, I get the error "Two names not allowed".
var prismaID = worksheet.UsedRange["C15:C"].Value;
var type = worksheet.UsedRange["F15:F"].Value;
var placements = worksheet.UsedRange["I15:I"].Value;
if (!type.Contains("PKG"))
{
placements = placements + prismaID;
}
worksheet.Range["G7"].Text = "Testing";
workbook.SaveAs(excelFilePath);
workbook.Close();
}
}
Logic:
Let's say I have the three columns below, plus the output I expect. How can I use the following logic to manipulate the cells of the used range?
ID   Condition   Name      Output
1    Yes         Sarah     Sarah(1)
2    No          George    George
3    Yes         John(3)   John(3)
The logic to apply:
Append the value of the first column, 'ID', to the end of the value in column 'Name',
unless column 'Condition' contains 'No'
or the name already contains the same 'ID'.
Here is the VBA code:
With xlSheet
LastRow = xlSheet.UsedRange.Rows.Count
Set target = .Range(.Cells(15, 9), .Cells(LastRow, 9))
values = target.Value
Set ptype=.Range(.Cells(15,6),.Cells(LastRow,6))
pvalues=ptype.Value
For i = LBound(values, 1) To UBound(values, 1)
'if Statement for test keywords
If InStr(1,pvalues(i,1),"Package")= 0 AND InStr(1,pvalues(i,1),"Roadblock")= 0 Then
If Instr(values(I,1),.Cells(i + 15 - LBound(values, 1), 3)) = 0 Then
'If InStr(1,values(i,1),"(")=0 Then
values(i, 1) = values(i, 1) & "(" & .Cells(i + 15 - LBound(values, 1), 3) & ")"
End If
End If
Next
target.Value = values
End With
Your requirement can be achieved by appending the ID column to the Name column using XlsIO.
Please refer to the code snippet below.
Code Snippet:
for(int row = 1; row<= worksheet.Columns[1].Count; row++)
{
if (string.Equals(worksheet[row, 2].Value, "Yes", StringComparison.OrdinalIgnoreCase) && !worksheet[row, 3].Value.EndsWith(")"))
worksheet[row, 4].Value = worksheet[row, 3].Value + "(" + worksheet[row, 1].Value + ")";
else
worksheet[row, 4].Value = worksheet[row, 3].Value;
}
We have prepared a simple sample, which can be downloaded from the following link.
Sample Link: http://www.syncfusion.com/downloads/support/directtrac/general/ze/Sample859524528.zip
I work for Syncfusion.
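The rule from the question can also be written as a small function that is independent of any spreadsheet library, which makes it easy to check against the sample table. This is only a sketch: the names NameTagger and AppendId are hypothetical, and it assumes that "contains 'No'" means the Condition cell equals "No" (ignoring case).

```csharp
using System;

public static class NameTagger
{
    // Hypothetical helper: returns the name with "(id)" appended when the rules allow it.
    public static string AppendId(string id, string condition, string name)
    {
        // Assumption: a Condition of "No" (any casing) blocks the append.
        bool blocked = string.Equals(condition, "No", StringComparison.OrdinalIgnoreCase);
        // Skip names that already carry this exact ID in parentheses.
        bool alreadyTagged = name.Contains("(" + id + ")");
        return (blocked || alreadyTagged) ? name : name + "(" + id + ")";
    }

    public static void Main()
    {
        Console.WriteLine(AppendId("1", "Yes", "Sarah"));
        Console.WriteLine(AppendId("2", "No", "George"));
        Console.WriteLine(AppendId("3", "Yes", "John(3)"));
    }
}
```

Checking for "(id)" rather than for a trailing ")" avoids skipping names that merely end in a parenthesis for some other reason.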
I am working with templates in Excel, and I developed this logic.
I pair the column names from the first row with the values of each subsequent row, using the first cell of each row as the key, and bind the data in groups to a multi-value dictionary.
I use the function below, which can be adapted to skip rows before parsing so that you can target the proper row for binding. Book is the DataSet returned by ExcelDataReader's AsDataSet().
public static MultiValueDictionary<string, ILookup<string, string>> ParseTemplate(string Sheet, ref List<string> keys)
{
int xskip = 0;
MultiValueDictionary<string, ILookup<string, string>> mvd = new MultiValueDictionary<string, ILookup<string, string>>();
var sheetRows = Book.Tables[Sheet];
//Parse First row
var FirstRow = sheetRows.Rows[0];
for (var Columns = 0; Columns < sheetRows.Columns.Count; Columns++)
{
if (xskip == 0)
{
xskip = 1;
continue;
}
keys.Add(FirstRow[Columns].ToString());
}
//Skip First Row
xskip = 0;
//Create a binding of first row and all subsequent rows
foreach (var row in sheetRows.Select().Skip(1))
{
//Make the key the first cell of each row
var key = row[0];
List<string> rows = new List<string>();
foreach (var item in row.ItemArray)
{
if (xskip == 0)
{
xskip = 1;
continue;
}
rows.Add(item.ToString());
}
mvd.Add(key.ToString(), keys.Zip(rows, (m, n) => new { Key = m, Value = n }).ToLookup(x => x.Key, y => y.Value));
xskip = 0;
}
return mvd;
}
}
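// Example of how a caller could walk the parsed result.
foreach (var key in mvd.Keys)
{
// Each key maps to one lookup per data row that shares this first-cell value
foreach (var lookup in mvd[key])
{
foreach (var columnName in keys)
{
// All values stored under this column name for the current row
IEnumerable<string> values = lookup[columnName];
}
}
}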
Hope it helps.

Why would there be no data in the visualiser when there is valid data in the DataTable?

I'm trying to build a wrapper for SpreadsheetLight that returns a DataSet from any .xlsx document passed through it. However, I seem to be having a problem with DataRows not being added to a temporary DataTable.
Here's part of the code that parses a worksheet and generates a DataTable from it:
public DataSet ReadToDataSet(string fileName)
{
using (var wb = new SLDocument(fileName))
{
var set = new DataSet(GenerateTitle(wb.DocumentProperties.Title));
foreach (var wsName in wb.GetWorksheetNames())
{
var ws = wb.SelectWorksheet(wsName);
// Select worksheet returns a bool, so if it comes back false, try the next worksheet instead.
if (!ws) continue;
// Statistics gives indices of the first and last data cells
var stats = wb.GetWorksheetStatistics();
// Create a new DataTable for each worksheet
var dt = new DataTable(wsName);
//var addDataColumns = true;
for (var colIdx = stats.StartColumnIndex; colIdx < stats.EndColumnIndex; colIdx++)
dt.Columns.Add(colIdx.ToString(), typeof(string));
// Scan each row
for (var rowIdx = stats.StartRowIndex; rowIdx < stats.EndRowIndex; rowIdx++)
{
//dt.Rows.Add();
var newRow = dt.NewRow();
// And each column for data
for (var colIdx = stats.StartColumnIndex; colIdx < stats.EndColumnIndex; colIdx++)
{
//if (addDataColumns)
// dt.Columns.Add();
newRow[colIdx - 1] = wb.GetCellValueAsString(rowIdx, colIdx);
//if (colIdx >= stats.EndColumnIndex)
// addDataColumns = false;
}
dt.Rows.Add(newRow);
}
set.Tables.Add(dt);
}
// Debug output
foreach (DataRow row in set.Tables[0].Rows)
{
foreach (var output in row.ItemArray)
{
Console.WriteLine(output.ToString());
}
}
return set;
}
}
Note: SpreadsheetLight indices start from 1 instead of 0.
I've tried replacing dt.Rows.Add() with new object[stats.EndColumnIndex - 1], as well as using a temporary variable (var newRow = dt.NewRow();) and passing it to the DataTable afterwards, but I still get the same end result. The row objects are populated correctly, but they aren't transferred to the DataTable at the end.
When you explore the object during runtime, it shows the correct number of rows and columns in the relevant properties. But when you open it up in the DataVisualiser you can only see the columns, no rows.
I must be missing something obvious.
Update
I looped through the resulting table and output the values to the console as a test. All the correct values appear, but the visualiser remains empty.
I guess the question now is, why would there be no data in the visualiser when there is valid data in the DataTable?
Update 2
Added the full method for reference, including a simple set of for loops to loop through all rows and columns in the first DataTable. Note: I also experimented with pulling the column creation out of the loop and even setting the datatypes. Made no difference. Commented code shows the original.
OK, it turns out the problem most likely came from the columns being added. Either there were too many columns for the visualiser to handle (1024), which I find hard to believe, or there was a bug in Visual Studio that has randomly corrected itself.
There's also a bug in SpreadsheetLight that lists all columns as having data when you call GetWorksheetStatistics(), so I've used a workaround that takes the total number of cells available or stats.NumberOfColumns, whichever is smaller.
Either way, the below code now functions.
public DataSet ReadToDataSet(string fileName)
{
using (var wb = new SLDocument(fileName))
{
var set = new DataSet(GenerateTitle(wb.DocumentProperties.Title));
foreach (var wsName in wb.GetWorksheetNames())
{
var ws = wb.SelectWorksheet(wsName);
// Select worksheet returns a bool, so if it comes back false, try the next worksheet instead.
if (!ws) continue;
// Statistics gives indices of the first and last data cells
var stats = wb.GetWorksheetStatistics();
// There is a bug with the stats columns. Take the total number of elements available or the columns from the stats table, whichever is the smallest
var newColumnIndex = stats.NumberOfCells < stats.NumberOfColumns
? stats.NumberOfCells
: stats.NumberOfColumns;
// Create a new DataTable for each worksheet
var dt = new DataTable(wsName);
var addDataColumns = true;
// Scan each row
for (var rowIdx = stats.StartRowIndex; rowIdx < stats.EndRowIndex; rowIdx++)
{
var newRow = dt.NewRow();
// And each column for data
for (var colIdx = stats.StartColumnIndex; colIdx < newColumnIndex; colIdx++)
{
if (addDataColumns)
dt.Columns.Add();
newRow[colIdx - 1] = wb.GetCellValueAsString(rowIdx, colIdx);
}
addDataColumns = false;
dt.Rows.Add(newRow);
}
set.Tables.Add(dt);
}
return set;
}
}
Hopefully someone else finds this a useful reference in the future, either for SpreadsheetLight or for the DataVisualiser in Visual Studio. If anyone knows of any limits for the visualiser, I'm all ears!

Office Open XML - Target Cell by its content

How to target Cell if I know its content (there are no duplicates in the xlsx document) using Office Open XML?
I mean I have xlsx sheet (template) and somewhere in it placed my "variable". For example "<<_time>>". I want to find that element (by "variable" name) and change the cell value (current time in this case).
Basic code:
FileInfo newFile = new FileInfo(@"...");
FileInfo template = new FileInfo(@"...");
using (ExcelPackage xlPackage = new ExcelPackage(newFile, template))
{
ExcelWorksheet worksheet = xlPackage.Workbook.Worksheets.First();
// need to target the cell by its value (must I use a for loop?)
//worksheet.Cells[...].Value = "...";
xlPackage.Save();
}
OK, I solved it with a classic loop.
var start = worksheet.Dimension.Start;
var end = worksheet.Dimension.End;
for (int row = start.Row; row <= end.Row; row++)
{
for (int col = start.Column; col <= end.Column; col++)
{
string cellValue = worksheet.Cells[row, col].Text.ToString();
if (cellValue == "<<_time>>")
{
worksheet.Cells[row, col].Value = "..";
}
}
}

How to Import excel file data to data table in c#

I am moving data from an Excel file to a DataTable, where the 10th row holds the column names, so I used the following code with the EPPlus library (OfficeOpenXml). The columns I expect in the DataTable are Item, Description, Accountnumber, Tender, Levelnumbers, all taken from the 10th row of the Excel file. However, because the top-level columns are merged, they come out in this sequence: Item,Column1,Column2,Description,Column3,Column4,Column5,Tender,Column6,Levelnumbers.
I need logic that first skips the rows that have no data in the Levelnumbers column, then renames Column4 to Description and renames the current Description column to 'Edata', so that the column sequence becomes Item,Column1,Column2,Edata,Column3,Description,Column5,Tender,Column6,Levelnumbers.
Altogether, using the following code, I get values in the DataTable like this:
Item,Column1,Column2,Description,Column3,Column4,Column5,Tender,Column6,Levelnumbers
1,null,null,Efax,null,Edescription1,null,Tfirst,null,123353
2,null,null,Zfax,null,Zdescription1,null,Tsecond,null,null
3,null,null,Xfax,null,Xdescription1,null,Tthird,null,456546
But it should look like this (rows with a blank Levelnumbers value skipped). How can I achieve that?
Item,Column1,Column2,Edata,Column3,Description,Column5,Tender,Column6,Levelnumbers
1,null,null,Efax,null,Edescription1,null,Tfirst,null,123353
3,null,null,Xfax,null,Xdescription1,null,Tthird,null,456546
The code used is:
public static DataTable getDataTableFromExcel(string path)
{
using (var pck = new OfficeOpenXml.ExcelPackage())
{
DataTable tbl = new DataTable();
try
{
using (var stream = File.OpenRead(path))
{
pck.Load(stream);
}
var ws = pck.Workbook.Worksheets.First();
bool hasHeader = true; // adjust it accordingly( i've mentioned that this is a simple approach)
string ErrorMessage = string.Empty;
foreach (var firstRowCell in ws.Cells[10, 1, 17, ws.Dimension.End.Column])
{
tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
}
var startRow = hasHeader ? 11 : 1;
for (var rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
{
var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
var row = tbl.NewRow();
foreach (var cell in wsRow)
{
row[cell.Start.Column - 1] = cell.Text;
}
tbl.Rows.Add(row);
}
}
catch (Exception exp)
{
}
return tbl;
}
}
If I understand what you are asking (let me know if not), you just want to filter out rows that are missing a value in the last column? It is best to reference the cell explicitly rather than trying something like wsRow.Last(), because enumerating the wsRow range only returns cells that contain values, so Last() will never return the last column's cell when that cell is empty.
As for replacing column names, all you need is an if statement when populating the column list.
This should do it:
//foreach (var firstRowCell in ws.Cells[10, 1, 17, ws.Dimension.End.Column]) -- ASSUME YOU MEANT ONLY THE 10TH ROW?
foreach (var firstRowCell in ws.Cells[10, 1, 10, ws.Dimension.End.Column])
{
if (!hasHeader)
tbl.Columns.Add(string.Format("Column {0}", firstRowCell.Start.Column));
else if(firstRowCell.Text == "Description")
tbl.Columns.Add("Edata");
else if (firstRowCell.Text == "Column4")
tbl.Columns.Add("Description");
else
tbl.Columns.Add(firstRowCell.Text);
}
var startRow = hasHeader ? 11 : 1;
for (var rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
{
//Skip row if last column is null
if (ws.Cells[rowNum, ws.Dimension.End.Column].Value == null)
continue;
var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
var row = tbl.NewRow();
foreach (var cell in wsRow)
{
row[cell.Start.Column - 1] = cell.Text;
}
tbl.Rows.Add(row);
}
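The same two steps (renaming the columns and dropping rows with an empty Levelnumbers value) can alternatively be applied to the DataTable after it has been loaded, without touching the EPPlus loop. The following is a minimal sketch under that assumption; TableCleanup and RemoveRowsMissing are hypothetical names, and the sample rows are a shortened version of the data in the question.

```csharp
using System;
using System.Data;

public static class TableCleanup
{
    // Removes rows whose value in the given column is null, DBNull, or blank text.
    public static void RemoveRowsMissing(DataTable table, string columnName)
    {
        // Walk backwards so removals do not shift the rows still to be visited.
        for (int i = table.Rows.Count - 1; i >= 0; i--)
        {
            object value = table.Rows[i][columnName];
            if (value == null || value == DBNull.Value || string.IsNullOrWhiteSpace(value.ToString()))
                table.Rows.RemoveAt(i);
        }
    }

    public static void Main()
    {
        var tbl = new DataTable();
        tbl.Columns.Add("Item");
        tbl.Columns.Add("Description");
        tbl.Columns.Add("Column4");
        tbl.Columns.Add("Levelnumbers");
        tbl.Rows.Add("1", "Efax", "Edescription1", "123353");
        tbl.Rows.Add("2", "Zfax", "Zdescription1", null);
        tbl.Rows.Add("3", "Xfax", "Xdescription1", "456546");

        // Rename in this order so the two names never collide.
        tbl.Columns["Description"].ColumnName = "Edata";
        tbl.Columns["Column4"].ColumnName = "Description";

        RemoveRowsMissing(tbl, "Levelnumbers");
        Console.WriteLine(tbl.Rows.Count);
    }
}
```

Note that cell.Text yields an empty string rather than null for blank cells, which is why the check also treats blank text as missing.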

Don't split the string if it is enclosed in double quotes

I have a delimited text file that I need to convert into a DataTable. The text looks something like this:
Name,Contact,Email,Date Of Birth,Address
JOHN,01212121,hehe@yahoo.com,1/12/1987,"mawar rd, shah alam, selangor"
JACKSON,01223323,haha@yahoo.com,1/4/1967,"neelofa rd, sepang, selangor"
DAVID,0151212,hoho@yahoo.com,3/5/1956,"nora danish rd, klang, selangor"
And this is how I read the text file in C#:
DataTable table = new DataTable();
using (StreamReader sr = new StreamReader(path))
{
#region Text to csv
while (!sr.EndOfStream)
{
string[] line = sr.ReadLine().Split(',');
//table.Rows.Add(parts[0], parts[1], parts[2], parts[3], parts[4], parts[5]);
if (IsRowHeader)//Is user want to read first row as the header
{
foreach (string column in line)
{
table.Columns.Add(column);
}
totalColumn = line.Count();
IsRowHeader = false;
}
else
{
if (totalColumn == 0)
{
totalColumn = line.Count();
for (int j = 0; j < totalColumn; j++)
{
table.Columns.Add();
}
}
// create a DataRow using .NewRow()
DataRow row = table.NewRow();
// iterate over all columns to fill the row
for (int i = 0; i < line.Count(); i++)
{
row[i] = line[i];
}
// add the current row to the DataTable
table.Rows.Add(row);
}
}
The columns are dynamic: the user can add or remove columns in the text file. So I need to check how many columns there are and add them to the DataTable; after that I read each line, set the values on a DataRow and add the row to the table.
If I don't remove the commas inside the double quotes, I get the error "Cannot find column 5", because the first line has only 5 columns (indexes 0 to 4).
What is the best way to deal with delimited text?
Don't try and re-invent the CSV-parsing wheel. Use the parser built into .NET: Microsoft.VisualBasic.FileIO.TextFieldParser
See https://stackoverflow.com/a/3508572/7122.
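For reference, a minimal sketch of what loading the file with TextFieldParser might look like. The class and method names (CsvLoader, Load) are hypothetical. On .NET Framework this assumes a reference to the Microsoft.VisualBasic assembly, and the choice to add unnamed columns when a row is longer than the header is an assumption of this sketch.

```csharp
using System;
using System.Data;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvLoader
{
    // Reads comma-delimited text into a DataTable, using the first line as the header.
    public static DataTable Load(TextReader reader)
    {
        var table = new DataTable();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Commas inside double-quoted fields stay part of the field.
            parser.HasFieldsEnclosedInQuotes = true;

            if (parser.EndOfData) return table;
            foreach (string header in parser.ReadFields())
                table.Columns.Add(header);

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                // Assumption: grow the table if a row has more fields than the header.
                while (table.Columns.Count < fields.Length)
                    table.Columns.Add();
                table.Rows.Add(fields);
            }
        }
        return table;
    }

    public static void Main()
    {
        string csv = "Name,Address\nJOHN,\"mawar rd, shah alam, selangor\"\n";
        DataTable table = Load(new StringReader(csv));
        Console.WriteLine(table.Rows[0]["Address"]);
    }
}
```

For a file on disk, the same method can be called with a StreamReader, e.g. Load(new StreamReader(path)).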
No, just don't. Don't try and write your own CSV parser - there's no reason to do it.
This article explains the problem and recommends using FileHelpers - which are decent enough.
There is also the Lumenworks reader which is simpler and just as useful.
Finally apparently you can just use DataSets to link to your CSV as described here. I didn't try this one, but looks interesting, if probably outdated.
I usually go with something like this:
const char separator = ',';
using (var reader = new StreamReader("C:\\sample.txt"))
{
var fields = (reader.ReadLine() ?? "").Split(separator);
// Dynamically add the columns
var table = new DataTable();
table.Columns.AddRange(fields.Select(field => new DataColumn(field)).ToArray());
while (reader.Peek() >= 0)
{
var line = reader.ReadLine() ?? "";
// Split the values considering the quoted field values
var values = Regex.Split(line, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")
.Select((value, current) => value.Trim())
.ToArray()
;
// Add those values directly
table.Rows.Add(values);
}
// Demonstrate the results
foreach (DataRow row in table.Rows)
{
Console.WriteLine();
foreach (DataColumn col in table.Columns)
{
Console.WriteLine("{0}={1}", col.ColumnName, row[col]);
}
}
}
