I want to read an Excel file using GemBox, treating the first row as the header, to check whether the required columns are present.
You can find the reading example here.
Also, here is another example of how you can read just the first row:
var workbook = ExcelFile.Load("Sample.xlsx");
var worksheet = workbook.Worksheets[0];
var firstRow = worksheet.Rows[0];
foreach (var cell in firstRow.AllocatedCells)
{
    Console.WriteLine(cell.Name);
    Console.WriteLine(cell.Value);
    Console.WriteLine("---");
}
I hope this helps.
UPDATE
Another attempt to guess what exactly is requested here:
string[] expectedColumns = new string[] { "Column 1", "Column 2", "Column 3" };
var workbook = ExcelFile.Load("Sample.xlsx");
var worksheet = workbook.Worksheets[0];
var firstRow = worksheet.Rows[0];
string[] actualColumns = firstRow.AllocatedCells
    .Select(cell => cell.Value != null ? cell.Value.ToString() : string.Empty)
    .ToArray();
// Fail early if the header row is shorter than expected, so the loop below cannot index past the end.
if (actualColumns.Length < expectedColumns.Length)
    throw new Exception("Required columns are missing!");
for (int i = 0; i < expectedColumns.Length; i++)
    if (expectedColumns[i] != actualColumns[i])
        throw new Exception("Unexpected column name detected!");
Note that the Select and ToArray methods are provided by System.Linq.
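The comparison above is positional, so it also rejects files where the required columns are present but in a different order. If only presence matters, a set-based check is one alternative. This is a minimal sketch using only the base class library: FindMissingColumns is a hypothetical helper name, and the hard-coded actualColumns array stands in for the values that would be read from the first row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the expected column names that do not occur
// among the actual header values, ignoring case and surrounding whitespace.
static List<string> FindMissingColumns(IEnumerable<string> expected, IEnumerable<string> actual)
{
    var present = new HashSet<string>(
        actual.Select(name => (name ?? string.Empty).Trim()),
        StringComparer.OrdinalIgnoreCase);
    return expected.Where(name => !present.Contains(name.Trim())).ToList();
}

// Stand-in for the header values read from the worksheet (assumed data).
string[] actualColumns = { "column 2", " Column 1 ", "Extra" };
string[] expectedColumns = { "Column 1", "Column 2", "Column 3" };

List<string> missing = FindMissingColumns(expectedColumns, actualColumns);
Console.WriteLine(missing.Count == 0
    ? "All required columns are present."
    : "Missing columns: " + string.Join(", ", missing));
```

Whether case and whitespace should be ignored depends on how strict the import needs to be; swapping the comparer for StringComparer.Ordinal makes the check exact.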
Related
I have a simple document with one table in it. I would like to read its cells content. I found many tutorials for writing, but none for reading.
I suppose I should enumerate sections, but how to know which contains a table?
var document = DocX.Create(@"mydoc.docx");
var s = document.GetSections();
foreach (var item in s)
{
}
I'm using the following namespace aliases:
using excel = Microsoft.Office.Interop.Excel;
using word = Microsoft.Office.Interop.Word;
You can specifically grab the tables using this code:
private void WordRunButton_Click(object sender, EventArgs e)
{
var excelApp = new excel.Application();
excel.Workbooks workbooks = excelApp.Workbooks;
var wordApp = new word.Application();
word.Documents documents = wordApp.Documents;
wordApp.Visible = false;
excelApp.Visible = false;
// You don't want your computer to actually load each one visibly; would ruin performance.
string[] fileDirectories = Directory.GetFiles("Some Directory", "*.doc*",
SearchOption.AllDirectories);
int tableCount = 1; // running number used in the output file names
foreach (var item in fileDirectories)
{
word._Document document = documents.Open(item);
foreach (word.Table table in document.Tables)
{
string wordFile = item;
string appendName = Path.GetFileNameWithoutExtension(wordFile) + " Table " + tableCount + ".xlsx";
//Not needed if you're not going to save each table individually
var workbook = excelApp.Workbooks.Add(1);
excel._Worksheet worksheet = (excel.Worksheet)workbook.Sheets[1];
for (int row = 1; row <= table.Rows.Count; row++)
{
for (int col = 1; col <= table.Columns.Count; col++)
{
var cell = table.Cell(row, col);
var range = cell.Range;
var text = range.Text;
var cleaned = excelApp.WorksheetFunction.Clean(text);
worksheet.Cells[row, col] = cleaned;
}
}
workbook.SaveAs(Path.Combine("Some Directory", Path.GetFileName(appendName)), excel.XlFileFormat.xlWorkbookDefault);
//Last arg can be whatever file extension you want
//just make sure it matches what you set above.
workbook.Close();
Marshal.ReleaseComObject(workbook);
tableCount++;
}
document.Close();
Marshal.ReleaseComObject(document);
}
//Microsoft apps are picky with memory. Make sure you close and release each instance once you're done with it.
//Failure to do so will result in many lingering apps in the background
// Close the workbooks before quitting Excel, and quit each application only once.
workbooks.Close();
excelApp.Quit();
Marshal.ReleaseComObject(workbooks);
Marshal.ReleaseComObject(excelApp);
wordApp.Quit();
Marshal.ReleaseComObject(documents);
Marshal.ReleaseComObject(wordApp);
}
The document is the actual word document type (word.Document). Make sure you check for split cells if you have them!
Hope this helps!
If you only have one table in the document, it should be rather simple. Try this:
DocX doc = DocX.Load("C:\\Temp\\mydoc.docx");
Table t = doc.Tables[0];
//read cell content
string someText = t.Rows[0].Cells[0].Paragraphs[0].Text;
You can loop through the table rows and the cells inside each row, and also through the Paragraphs inside each Cells[i] if a cell contains more than one paragraph. You can do that with a simple for loop:
for (int i = 0; i < t.Rows.Count; i++)
{
someText = t.Rows[i].Cells[0].Paragraphs[0].Text;
}
Hope it helps.
I tried to use Microsoft.Office.Interop.Excel, but it's too slow when reading large Excel documents (it was taking over 5 minutes for me). I read that DocumentFormat.OpenXml is faster for large Excel documents, but from the documentation it doesn't appear that I can store the column and row indexes.
For now, I am also only interested in the first row to get the column headers and I will be reading the rest of the document after some logic. I haven't been able to find a way to read only a portion of the excel document. I want to do something similar to this:
int r = 1; //row index
int c = 1; //column index
while (xlRange.Cells[r, c] != null && xlRange.Cells[r, c].Value2 != null)
{
    TagListData.Add(new TagClass { IsTagSelected = false, TagName = xlRange.Cells[r, c].Value2.ToString(), rIndex = r, cIndex = c });
    c += 3;
}
Users will be picking Excel documents through an OpenFileDialog, so there's no fixed number of rows or columns I can use. Is there a way I could make this work?
Thank you
In OpenXML if a cell has no text it might or might not appear in the list of cells (depends on whether it ever had text or not). Therefore the while (...Value2 != null) type of approach isn't really a safe way to do things in OpenXML.
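Because empty cells may be omitted, the position of a cell within the row's list of cells does not necessarily match its column number; the cell reference (for example "C1") is what identifies the column. As a sketch of how that reference could be turned into a 1-based column index, using only the base class library: ColumnIndexFromReference is a hypothetical helper name, not part of the Open XML SDK, and it assumes a plain reference without "$" markers.

```csharp
using System;

// Hypothetical helper: converts the column letters of a cell reference such
// as "C1" or "AA10" into a 1-based column index (A = 1, Z = 26, AA = 27).
static int ColumnIndexFromReference(string cellReference)
{
    int index = 0;
    foreach (char ch in cellReference)
    {
        if (!char.IsLetter(ch))
            break; // the row number starts here
        index = index * 26 + (char.ToUpperInvariant(ch) - 'A' + 1);
    }
    return index;
}

Console.WriteLine(ColumnIndexFromReference("A1"));
Console.WriteLine(ColumnIndexFromReference("C1"));
Console.WriteLine(ColumnIndexFromReference("AA10"));
```

With an index like this, header cells can be stored by column number, and a gap between two consecutive indexes shows where an empty cell was left out of the file.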
Here is a very simple approach to reading the first row (written using LINQPad hence the Main and the Dump). Note the (simplified) use of the SharedStringTable to get the real text of the cell:
void Main()
{
var fileName = @"c:\temp\openxml-read-row.xlsx";
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fs, false))
{
// Get the necessary bits of the doc
WorkbookPart workbookPart = doc.WorkbookPart;
SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
SharedStringTable sst = sstpart.SharedStringTable;
WorkbookStylesPart ssp = workbookPart.GetPartsOfType<WorkbookStylesPart>().First();
Stylesheet ss = ssp.Stylesheet;
// Get the first worksheet
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
Worksheet sheet = worksheetPart.Worksheet;
var rows = sheet.Descendants<Row>();
var row = rows.First();
foreach (var cell in row.Descendants<Cell>())
{
var txt = GetCellText(cell, sst);
// LINQPad specific method .Dump()
$"{cell.CellReference} = {txt}".Dump();
}
}
}
}
// Very basic way to get the text of a cell
private string GetCellText(Cell cell, SharedStringTable sst)
{
if (cell == null)
return "";
if ((cell.DataType != null) && (cell.DataType == CellValues.SharedString))
{
int ssid = int.Parse(cell.CellValue.Text);
string str = sst.ChildElements[ssid].InnerText;
return str;
}
else if (cell.CellValue != null)
{
return cell.CellValue.Text;
}
return "";
}
However, there is potentially a lot of work involved with OpenXML, and you would be well advised to use something like ClosedXML or EPPlus instead.
For example, using ClosedXML:
using (var workbook = new XLWorkbook(fileName))
{
var worksheet = workbook.Worksheets.First();
var row = worksheet.Row(1);
foreach (var cell in row.CellsUsed())
{
var txt = cell.Value.ToString();
// LINQPad specific method .Dump()
$"{cell.Address.ToString()} = {txt}".Dump();
}
}
How do I set HeaderAddress to two cells of a clustered bar chart in EPPlus? I have the data below from my database output, where the first column, as you can see, consists of merged cells.
I am looking for the following layout of the data
Please take note that the images are from a replica of the data, but generated inside Excel.
What I've tried so far is basically
ExcelChartSerie s = chart.Series.Add(axis.Address, xAxis.Address);
s.HeaderAddress = new ExcelAddress(startRow + r, GetColumnNumberByName(startColumn), startRow + 1, GetColumnNumberByName(startColumn) + 1);
where I more or less select the current row and two columns. This gives me "Address must be a row, column or single cell", but in order to get this to work, I must select multiple cells, no?
EPPlus does not have the ability to set it. It is not so difficult, but it requires XML manipulation. Kind of ugly, but it gets the job done. Not knowing your code, I made up a quick unit test that demos it. It has to match the right chart type, so if it is not BarClustered let me know:
[TestMethod]
public void Chart_Meged_Header_Test()
{
//Throw in some data
var datatable = new DataTable("tblData");
datatable.Columns.AddRange(new[]
{
new DataColumn("Col1", typeof (string)),
new DataColumn("Col2", typeof (int))
});
for (var i = 0; i < 10; i++)
{
var row = datatable.NewRow();
row[0] = $"item {(i%2 == 0 ? "A" : "B")}";
row[1] = i * 10;
datatable.Rows.Add(row);
}
//Create a test file
var fileInfo = new FileInfo(@"c:\temp\Chart_Meged_Header_Test.xlsx");
if (fileInfo.Exists)
fileInfo.Delete();
using (var pck = new ExcelPackage(fileInfo))
{
var workbook = pck.Workbook;
var worksheet = workbook.Worksheets.Add("Sheet1");
worksheet.Cells["B1"].LoadFromDataTable(datatable, true);
worksheet.Column(4).Style.Numberformat.Format = "m/d/yyyy";
var chart = worksheet.Drawings.AddChart("chart test", eChartType.BarClustered);
var serie = chart.Series.Add(worksheet.Cells["C2:C11"], worksheet.Cells["B2:B11"]);
chart.SetPosition(0, 0, 3, 0);
chart.SetSize(120);
//Add merged headers
worksheet.Cells["A2"].Value = "Group 1";
worksheet.Cells["A2:A6"].Merge = true;
worksheet.Cells["A7"].Value = "Group 2";
worksheet.Cells["A7:A11"].Merge = true;
//Get reference to the worksheet xml for proper namespace
var chartXml = chart.ChartXml;
var nsm = new XmlNamespaceManager(chartXml.NameTable);
var nsuri = chartXml.DocumentElement.NamespaceURI;
nsm.AddNamespace("c", nsuri);
//Get the Series ref and its cat
var serNode = chartXml.SelectSingleNode("c:chartSpace/c:chart/c:plotArea/c:barChart/c:ser", nsm);
var catNode = serNode.SelectSingleNode("c:cat", nsm);
//Get Y axis reference to replace with multi level node
var numRefNode = catNode.SelectSingleNode("c:numRef", nsm);
var multiLvlStrRefNode = chartXml.CreateNode(XmlNodeType.Element, "c:multiLvlStrRef", nsuri);
//Set the proper cell reference and replace the node
var fNode = chartXml.CreateElement("c:f", nsuri);
fNode.InnerXml = numRefNode.SelectSingleNode("c:f", nsm).InnerXml;
fNode.InnerXml = fNode.InnerXml.Replace("$B$2", "$A$2");
multiLvlStrRefNode.AppendChild(fNode);
catNode.ReplaceChild(multiLvlStrRefNode, numRefNode);
//Set the multi level flag
var noMultiLvlLblNode = chartXml.CreateElement("c:noMultiLvlLbl", nsuri);
var att = chartXml.CreateAttribute("val");
att.Value = "0";
noMultiLvlLblNode.Attributes.Append(att);
var catAxNode = chartXml.SelectSingleNode("c:chartSpace/c:chart/c:plotArea/c:catAx", nsm);
catAxNode.AppendChild(noMultiLvlLblNode);
pck.Save();
}
}
Gives this as the output:
Edit
Based on the replies below, the error I am experiencing may or may not be causing my inability to read my excel file. That is, I am not getting data from the line worksheet.Cells[row,col].Value in my for loop given below.
Problem
I am trying to return a DataTable with information from an Excel file. Specifically, it is an xlsx file from Excel 2013, I believe. Please see the code below:
private DataTable ImportToDataTable(string Path)
{
DataTable dt = new DataTable();
FileInfo fi = new FileInfo(Path);
if(!fi.Exists)
{
throw new Exception("File " + Path + " Does not exist.");
}
using (ExcelPackage xlPackage = new ExcelPackage(fi))
{
//Get the worksheet in the workbook
ExcelWorksheet worksheet = xlPackage.Workbook.Worksheets.First();
//Obtain the worksheet size
ExcelCellAddress startCell = worksheet.Dimension.Start;
ExcelCellAddress endCell = worksheet.Dimension.End;
//Create the data column
for(int col = startCell.Column; col <= endCell.Column; col++)
{
dt.Columns.Add(col.ToString());
}
for(int row = startCell.Row; row <= endCell.Row; row++)
{
DataRow dr = dt.NewRow(); //Create a row
int i = 0;
for(int col = startCell.Column; col <= endCell.Column; col++)
{
dr[i++] = worksheet.Cells[row, col].Value.ToString();
}
dt.Rows.Add(dr);
}
}
return dt;
}
Error
This is where things get weird. I can see the proper value in startCell and endCell. However, when I look at worksheet I take a peek under Cells and I see something I don't understand:
'worksheet.Cells.Current' threw an exception of type 'System.NullReferenceException'
Attempts
Reformatting my excel with general fields.
Making sure no field in my excel was empty
RTFM'ed epplus documentation. Nothing suggestive of this error.
Looked at EPPlus errors on stackoverflow. My problem is unique.
Honestly, I am having trouble figuring out what this error is really saying. Is something wrong with my format? Is something wrong with EPPlus? I have read on here that people had no problems with 2013 xlsx files and EPPlus, and I am only trying to parse the Excel file row by row. If someone could help shed light on what this error means and how to rectify it, I would be most grateful. I've spent quite a long time trying to figure this out.
When we give:
dr[i++] = worksheet.Cells[row, col].Value.ToString();
it reads the value of that cell. If the cell is empty, Value is null, and calling ToString() on it throws a NullReferenceException.
Try this instead:
dr[i++] = worksheet.Cells[row, col].Text;
Hope this will help
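The failure itself can be reproduced without EPPlus, since a cell value is just an object reference that may be null. The following is a minimal sketch under that assumption: the cellValues array is made-up data standing in for one worksheet row, and CellToString is a hypothetical helper name.

```csharp
#nullable enable
using System;

// Stand-in for the values of one worksheet row; the null models an empty cell.
object?[] cellValues = { "Name", null, 42 };

// Hypothetical helper: converts a cell value to text without throwing on null.
static string CellToString(object? value)
{
    // The null-conditional call avoids the NullReferenceException that
    // value.ToString() would throw for an empty cell.
    return value?.ToString() ?? string.Empty;
}

foreach (object? value in cellValues)
{
    Console.WriteLine("[" + CellToString(value) + "]");
}
```

The same pattern can be applied to the Value of a cell when the raw value is needed rather than the formatted Text.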
Like @Thorians said, Current is really meant to be used while you are enumerating the cells. If you want to use it in its purest form and actually be able to call Current, you would need something like this:
using (var pck = new ExcelPackage(existingFile))
{
var worksheet = pck.Workbook.Worksheets.First();
//this is important to hold onto the range reference
var cells = worksheet.Cells;
//this is important to start the cellEnum object (the Enumerator)
cells.Reset();
//Can now loop the enumerator
while (cells.MoveNext())
{
//Current can now be used thanks to MoveNext
Console.WriteLine("Cell [{0}, {1}] = {2}"
, cells.Current.Start.Row
, cells.Current.Start.Column
, cells.Current.Value);
}
}
Note that you have to hold on to a local reference to the range (cells) for this to work properly. Otherwise Current will be null if you try worksheet.Cells.Current directly.
But it would be simpler to use a foreach and have the CLR do the work for you.
UPDATE: Based on comments, your code should work fine as is. Could it be your Excel file?
[TestMethod]
public void Current_Cell_Test()
{
//http://stackoverflow.com/questions/32516676/trying-to-read-excel-file-with-epplus-and-getting-system-nullexception-error
//Throw in some data
var datatable = new DataTable("tblData");
datatable.Columns.AddRange(new[] { new DataColumn("Col1", typeof (int)), new DataColumn("Col2", typeof (int)),new DataColumn("Col3", typeof (object)) });
for (var i = 0; i < 10; i++)
{
var row = datatable.NewRow(); row[0] = i; row[1] = i * 10; row[2] = Path.GetRandomFileName(); datatable.Rows.Add(row);
}
//Create a test file
var fi = new FileInfo(@"c:\temp\test1.xlsx");
if (fi.Exists)
fi.Delete();
using (var pck = new ExcelPackage(fi))
{
var worksheet = pck.Workbook.Worksheets.Add("Sheet1");
worksheet.Cells.LoadFromDataTable(datatable, true);
pck.Save();
}
var dt = new DataTable();
using (ExcelPackage xlPackage = new ExcelPackage(fi))
{
//Get the worksheet in the workbook
ExcelWorksheet worksheet = xlPackage.Workbook.Worksheets.First();
//Obtain the worksheet size
ExcelCellAddress startCell = worksheet.Dimension.Start;
ExcelCellAddress endCell = worksheet.Dimension.End;
//Create the data column
for (int col = startCell.Column; col <= endCell.Column; col++)
{
dt.Columns.Add(col.ToString());
}
for (int row = startCell.Row; row <= endCell.Row; row++)
{
DataRow dr = dt.NewRow(); //Create a row
int i = 0;
for (int col = startCell.Column; col <= endCell.Column; col++)
{
dr[i++] = worksheet.Cells[row, col].Value.ToString();
}
dt.Rows.Add(dr);
}
}
Console.Write("{{dt Rows: {0} Columns: {1}}}", dt.Rows.Count, dt.Columns.Count);
}
Gives this in the output:
{Rows: 11, Columns: 3}
Current is the current range when enumerating.
There is nothing wrong with it throwing an exception during debugger inspection when it is not being used within an enumerating scope.
Code sample:
var range = ws.Cells[1,1,1,100];
foreach (var cell in range)
{
var a = range.Current.Value; // a is same as b
var b = cell.Value;
}
I was also getting the same issue while reading an Excel file, and none of the solutions provided worked for me. Here is working code:
public void readXLS(string FilePath)
{
FileInfo existingFile = new FileInfo(FilePath);
using (ExcelPackage package = new ExcelPackage(existingFile))
{
//get the first worksheet in the workbook
ExcelWorksheet worksheet = package.Workbook.Worksheets[1];
int colCount = worksheet.Dimension.End.Column; //get Column Count
int rowCount = worksheet.Dimension.End.Row; //get row count
for (int row = 1; row <= rowCount; row++)
{
for (int col = 1; col <= colCount; col++)
{
// Text is used instead of Value.ToString() so that empty cells do not throw a NullReferenceException.
Console.WriteLine(" Row:" + row + " column:" + col + " Value:" + worksheet.Cells[row, col].Text.Trim());
}
}
}
}
This is starting to drive me mad. I have an asp:GridView with check boxes, and the user can check which information he/she wants to export to Excel. When they click the button, the code below is executed; as you can see, I loop over each row in the GridView.
If the check box for a row is checked, I go to the DB, run a query that returns a DataTable, and then try to add its values to the EPPlus spreadsheet. But inside the foreach (DataColumn) and foreach (DataRow) loops it doesn't allow me to use
ws.Cells[1, iColumnCount] = c.ColumnName; as it says it's read-only.
This one spreadsheet could contain 1 to 10 different bits of information, depending on how many check boxes are checked. Can someone please help me and put me out of my misery?
Here's my full code:
protected void BtnTest_Click(object sender, EventArgs e)
{
bool ReportGenerated = false;
FileInfo newFile = new FileInfo("C:\\Users\\Scott.Atkinson\\Desktop\\book1.xls");
ExcelPackage pck = new ExcelPackage(newFile);
foreach (GridViewRow row in gvPerformanceResult.Rows)
{
object misValue = System.Reflection.Missing.Value;
CheckBox chkExcel = (CheckBox)row.FindControl("chkExportToExcel");
if (chkExcel.Checked)
{
HyperLink HypCreatedBy = (HyperLink)row.FindControl("HyperCreatedBy"); //Find the name of Sales agent
string CreatedBy = HypCreatedBy.Text;
string Fname = HypCreatedBy.Text;
string[] names = Fname.Split();
CreatedBy = names[0];
CreatedBy = CreatedBy + "." + names[1];
WebUser objUser = new WebUser(CreatedBy, true);
DataTable DT = new DataTable();
LeadOpportunities objLeadOpportunities = new LeadOpportunities();
DT = objLeadOpportunities.LoadPRCDetail("PRC", objUser.ShortAbbr, objUser.CanViewAllLead, ReportCriteria); // Load the information to export to Excel.
if (DT.Rows.Count > 0)
{
ReportGenerated = true;
//Add the Content sheet
var ws = pck.Workbook.Worksheets.Add("Content");
ws.View.ShowGridLines = true;
int iRowCount = ws.Dimension.Start.Row; //Counts how many rows have been used in the Excel Spreadsheet
int iColumnCount = ws.Dimension.Start.Column; //Counts how many Columns have been used.
if (iRowCount > 1)
iRowCount = iRowCount + 2;
else
iRowCount = 1;
iColumnCount = 0;
foreach (DataColumn c in DT.Columns)
{
iColumnCount++;
if (iRowCount == 0)
ws.Cells[1, iColumnCount] = c.ColumnName;
else
ws.Cells[iRowCount, iColumnCount] = c.ColumnName;
}
foreach (DataRow r in DT.Rows)
{
iRowCount++;
iColumnCount = 0;
foreach (DataColumn c in DT.Columns)
{
iColumnCount++;
if (iRowCount == 1)
ws.Cells[iRowCount + 1, iColumnCount] = r[c.ColumnName].ToString();
else
ws.Cells[iRowCount, iColumnCount] = r[c.ColumnName].ToString();
WorkSheet.Columns.AutoFit(); //Correct the width of the columns
}
}
pck.Save();
System.Diagnostics.Process.Start("C:\\Users\\Scott.Atkinson\\Desktop\\book1.xls");
}
}
}
}
Any help would be highly appreciated.
it doesn't allow me to use
ws.Cells[1, iColumnCount] = c.ColumnName;
That line should have been:
ws.Cells[1, iColumnCount].Value = c.ColumnName;
But it now falls over on these lines:
int iRowCount = ws.Dimension.Start.Row; //Counts how many rows have been used in the Excel Spreadsheet
int iColumnCount = ws.Dimension.Start.Column; //Counts how many Columns have been used.
Can someone help me get the row/column count?
The .Dimension property gives the address for the range covering the top left cell to the bottom right cell, so to get the row count we can use:
var rowCount = ws.Dimension.End.Row - ws.Dimension.Start.Row + 1;
and similarly for the column count:
var colCount = ws.Dimension.End.Column - ws.Dimension.Start.Column + 1;
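One caveat worth checking: as far as I know, Dimension is null while a worksheet contains no cells, which would be the case right after Worksheets.Add in the question's code and would explain why reading ws.Dimension.Start.Row falls over. The arithmetic with a guard for that case can be sketched without EPPlus; here a nullable (StartRow, EndRow) tuple is a made-up stand-in for the used range, and RowCount and NextFreeRow are hypothetical helper names.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for a worksheet's used range: (first row, last row),
// or null when the sheet has no cells yet.
static int RowCount((int StartRow, int EndRow)? usedRange)
{
    // No used range means no rows; otherwise both ends are inclusive, hence the + 1.
    return usedRange is null ? 0 : usedRange.Value.EndRow - usedRange.Value.StartRow + 1;
}

static int NextFreeRow((int StartRow, int EndRow)? usedRange)
{
    // An empty sheet has no used range, so writing starts at row 1;
    // otherwise continue on the row after the last used one.
    return usedRange is null ? 1 : usedRange.Value.EndRow + 1;
}

Console.WriteLine(RowCount(null));
Console.WriteLine(RowCount((2, 11)));
Console.WriteLine(NextFreeRow((2, 11)));
```

In the real code the same guard would be a null check on ws.Dimension before reading its Start or End.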