I have an Excel file from which I have to extract the required data and save it to a database. I know that by using Range we can get a particular range of data, but the data I need to extract is fairly large. Can anyone suggest the best and simplest method to retrieve the data and store it in a database?
I would like to read the data from A10 down to an unknown last row. My data looks as follows:
After the part marked in red, the data should go into the database column by column. I will handle that myself if anyone can suggest the best method to read the remaining columns too.
You could use SQL Server Integration Services (SSIS) to import the Excel data into a table. An SSIS package can run at scheduled times or be invoked on demand. It uses the spreadsheet as a data source and allows you to map columns.
Well, you can use NPOI to read in the Excel file and parse it any way you want. We use it to import large Excel files into a SQL database as well. Using NPOI you have complete freedom on how to interpret the data.
The most important thing is that, if you want to do this regularly, either the format of the Excel file should not change, or you should have a generic description of the Excel file stored somewhere else that tells your code how to interpret the file. The latter is of course more difficult to do. Which approach is better depends on your particular use case.
In our case the Excel file has a fixed layout, so our implementation is based on that layout.
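If the layout may change, the "generic description" can be as simple as a list of column specifications that the import code applies to each row. The sketch below assumes the cells have already been read as strings (with NPOI or any other reader); `ColumnSpec` and `LayoutMapper` are hypothetical names, and in practice the specification list could be loaded from a configuration file instead of being hard-coded.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical layout description: which column holds which field and how to parse it.
public sealed record ColumnSpec(int ColumnIndex, string FieldName, Func<string, object> Parse);

public static class LayoutMapper
{
    // Applies the layout to one row of cell texts and returns field name -> parsed value.
    public static Dictionary<string, object> MapRow(IReadOnlyList<string> cells, IEnumerable<ColumnSpec> layout)
    {
        var result = new Dictionary<string, object>();
        foreach (var spec in layout)
        {
            // Missing trailing cells are treated as empty strings.
            string raw = spec.ColumnIndex < cells.Count ? cells[spec.ColumnIndex] : string.Empty;
            result[spec.FieldName] = spec.Parse(raw);
        }
        return result;
    }
}
```

With this split, a change in the file layout means editing the specification list rather than the import loop.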
If you still need to do it from code, here is one way. As you said your data starts at A10, first get the UsedRange of the worksheet:
Microsoft.Office.Interop.Excel.Range xlRange = worksheet.UsedRange;
As there are only two columns, get the row count and column count of the range as follows:
int iRows = xlRange.Rows.Count; // equals the last used row number only if the used range starts at row 1
int iCols = xlRange.Columns.Count;
Then start your loop as follows:
List<string> lstItems = new List<string>(); // Declare this once, before the loops
for (int iRow = 10; iRow <= iRows; iRow++)
{
    for (int iCol = 1; iCol <= iCols; iCol++)
    {
        var cell = (Microsoft.Office.Interop.Excel.Range)worksheet.Cells[iRow, iCol];
        Console.WriteLine(cell.Text); // From here, insert the required data into the database as needed.
        lstItems.Add(cell.Text.ToString());
        if (lstItems.Count == 10)
        {
            if (cell.Text.ToString().Contains("www") || lstItems[9] == string.Empty)
            {
                // Handle this case as your data requires.
            }
        }
    }
}
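Reading cell by cell makes one COM call per cell, which gets slow for large sheets. A common alternative is to fetch the whole range in one call: with Interop, `(object[,])worksheet.UsedRange.Value2` returns a 1-based two-dimensional array (for a single-cell range it returns a scalar instead). The sketch below is plain C# that needs no Excel to compile; it walks such an array from row 10 and stops at the first completely empty row, since the end of the data is unknown. It assumes the used range starts at A1, so array row 10 corresponds to sheet row 10; `ExcelBlockReader` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class ExcelBlockReader
{
    // Walks a 2D array (typically 1-based, as Range.Value2 returns it) from firstRow
    // and stops at the first row whose cells are all empty.
    public static List<string[]> ReadRows(object[,] values, int firstRow)
    {
        var rows = new List<string[]>();
        int lastRow = values.GetUpperBound(0);
        int firstCol = values.GetLowerBound(1);
        int lastCol = values.GetUpperBound(1);

        for (int r = firstRow; r <= lastRow; r++)
        {
            var row = new string[lastCol - firstCol + 1];
            bool allEmpty = true;
            for (int c = firstCol; c <= lastCol; c++)
            {
                // Convert.ToString(object) returns an empty string for null cells.
                string text = Convert.ToString(values[r, c]) ?? string.Empty;
                row[c - firstCol] = text;
                if (text.Length > 0) allEmpty = false;
            }
            if (allEmpty) break; // treat the first blank row as the end of the data
            rows.Add(row);
        }
        return rows;
    }
}
```

Each returned `string[]` is one sheet row, ready to be inserted into the database column by column.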
Related
I am trying to count the number of sheets in a workbook. The workbook is created using NPOI, and there doesn't seem to be a way to count the sheets using the C# version of NPOI.
This is a really tricky thing to both explain and show, but I will give it a try.
What I am trying to do is use an existing Excel file as a template for statistics. This file can contain a varying number of template sheets, and I need to be able to count them to know where to place my new sheets and edit their names.
The sender of the data only has to choose which template sheet should be filled with which data, and I will then remove the template sheets from the workbook after all data has been inserted.
What I have tried:
I have read the documentation and searched for information and have tried the following approaches:
getNumberOfSheets - How to know number of sheets in a workbook?
Problem with this approach: The C# version of NPOI doesn't seem to have getNumberOfSheets.
Convert found row-counters into sheet-counters - NPOI - Get excel row count to check if it is empty
Can't really recreate the code to work for sheets as the functionality for sheets and rows are too different.
var sheetIndex = 0;
foreach (var sheet in requestBody.Sheets)
{
    if (sheet.TemplateNumber == "")
    {
        sheetTemplate = templateWorkbook.CreateSheet(sheet.Name);
    }
    else
    {
        sheetTemplate = templateWorkbook.CloneSheet(Convert.ToInt32(sheet.SheetTemplate));
        if (!templates.Contains(Convert.ToInt32(sheet.SheetTemplate)))
        {
            templates.Add(Convert.ToInt32(sheet.SheetTemplate));
        }
        // Do the math to make sure we add the name to the newly created sheet further down the code (I need the actual index here)
    }
    // Insert statistics
    // After inserting statistics:
    workingCopy.SetSheetName(sheetIndex, sheet.Name);
    foreach (var template in templates)
    {
        workingCopy.RemoveSheetAt(template);
    }
}
You can get the number of sheets from the NumberOfSheets property of the XSSFWorkbook class. It is declared on the IWorkbook interface, so it works for HSSFWorkbook as well.
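In NPOI, `CloneSheet` appends the copy at the end of the workbook (as Apache POI's `cloneSheet` does), so right after cloning, the new sheet's index should be `NumberOfSheets - 1`, which is the index `SetSheetName` needs. When the template sheets are removed afterwards, remove them from the highest index to the lowest: every `RemoveSheetAt` shifts the sheets after it down by one. The sketch below shows that ordering on a plain list standing in for the workbook's sheets; `TemplateCleanup` is a hypothetical helper, not part of NPOI.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TemplateCleanup
{
    // Removes items at the given indices. Duplicates are ignored and indices are processed
    // from highest to lowest, so each removal leaves the remaining indices valid.
    public static void RemoveAtDescending<T>(IList<T> items, IEnumerable<int> indices)
    {
        foreach (int index in indices.Distinct().OrderByDescending(i => i))
        {
            items.RemoveAt(index);
        }
    }
}
```

With NPOI, the same loop would call `workingCopy.RemoveSheetAt(index)` for each index in `templates.Distinct().OrderByDescending(t => t)`, once, after the `foreach` over `requestBody.Sheets` has finished.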
I'm working on a card number masking process. We receive human-created Excel documents and need to mask a set of the digits, but it's not guaranteed that column D is the column with the card numbers. It could be column D only, or D and G, etc. I know these documents will always have at least 10 rows, not counting headers.
I want to scan the worksheets in an Excel workbook and detect which columns have data, then check the 3rd cell of each non-empty column. If it matches a numeric string at least 9 digits long, I record that column as a card column in an array, then iterate through that array of columns and mask the desired characters. Is this reasonably doable with some C# methods and the Excel properties exposed by the Interop library?
Yes, it is possible. There are several libraries that give you access to Excel documents and let you scan through worksheets, rows, columns and cell values.
Some libraries are based on Excel's COM Interop interface and start a background Excel process that does the real work of extracting the information.
Libraries like NPOI (for xls and xlsx) or the Open XML SDK (xlsx) access Excel files directly, without needing Excel installed. This is extremely valuable for server-side processing of Office documents. In NPOI, sweeping through an Excel file looks like this (just to give you an idea):
var workbook = new XSSFWorkbook(dataStream); // dataStream: the uploaded file as a Stream
var sheet = workbook.GetSheetAt(0);
var rowEnumerator = sheet.GetRowEnumerator();
while (rowEnumerator.MoveNext())
{
    IRow row = (XSSFRow)rowEnumerator.Current;
    int colCount = row.LastCellNum;
    var tableRow = new TableRow(colCount); // TableRow: the answerer's own type, not part of NPOI
    for (var c = 0; c < colCount; c++)
    {
        var cell = row.GetCell(c);
        if (cell != null)
        {
            if (IsCreditCardNumber(c)) // IsCreditCardNumber: your own detection logic
            {
                ...
            }
        }
    }
}
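The detection and masking steps from the question can be kept independent of the Excel library. The sketch below assumes the sheet has already been read into rows of strings (row 0 is the header), treats a column as a card column when its cell in the third row (index 2, the question's "3rd cell") consists of 9 or more digits, and masks all but the last four characters. The probe row, the "keep the last four" rule and the names `CardColumnMasker`, `FindCardColumns` and `MaskColumns` are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CardColumnMasker
{
    // A "card-like" value: nothing but ASCII digits, at least 9 of them.
    private static readonly Regex CardLike = new Regex("^[0-9]{9,}$");

    // Returns the indices of the columns whose cell in probeRow looks like a card number.
    // probeRow = 2 is the third row (0-based), assuming row 0 is the header.
    public static List<int> FindCardColumns(IReadOnlyList<string[]> rows, int probeRow = 2)
    {
        var result = new List<int>();
        if (probeRow >= rows.Count) return result;
        string[] probe = rows[probeRow];
        for (int c = 0; c < probe.Length; c++)
        {
            string value = (probe[c] ?? string.Empty).Trim();
            if (CardLike.IsMatch(value)) result.Add(c);
        }
        return result;
    }

    // Replaces every character except the last `keep` with '*'.
    public static string Mask(string value, int keep = 4)
    {
        if (value.Length <= keep) return value;
        return new string('*', value.Length - keep) + value.Substring(value.Length - keep);
    }

    // Masks the given columns in every data row; rows before firstDataRow (headers) are left alone.
    public static void MaskColumns(IList<string[]> rows, IEnumerable<int> columns, int firstDataRow = 1)
    {
        foreach (int c in columns)
        {
            for (int r = firstDataRow; r < rows.Count; r++)
            {
                string[] row = rows[r];
                if (c < row.Length && row[c] != null)
                    row[c] = Mask(row[c]);
            }
        }
    }
}
```

The masked rows can then be written back with whichever library was used to read them.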
I am currently using the EPPlus library to export large amounts of data to several worksheets, and to tables inside each of those worksheets.
I have been able to create list validation and have it working via a lookup worksheet named range perfectly fine. However, I have come across some strange behaviour which I have been unable to figure out.
To begin:
I download the file. I open the file. I select a worksheet with a table. There are multiple rows in the table, and there is a list validation column with the options Yes/No to select from a dropdown. Each row has this list validation.
Scenario 1:
I then create a new row in the excel table, by dragging from the bottom right corner of the excel table to create the new row. The formula was not copied to the new row. I have now lost the validation for a new row in my excel table.
Scenario 2:
I delete all existing rows in the excel table, except for the first row (which still contains list validation in the Yes/No column). I THEN create a new row in the excel table by dragging from the bottom right corner of the excel table to create the new row.
The formula IS copied to the new row, I can now insert new valid data into this row by using the provided validation.
The logic of my code:
Each cell has validation applied to it by a loop which gets the kind of validation the cell needs to have (i.e. number, date, list, greater than, less than, etc.). List validation is accessed via a named table lookup address. There is NO XML output error and the file opens fine; I can access the list validation from the cells without any problem.
Things I have tried to fix this issue:
1) Fill the range of cells, THEN create the excel table from this range.
- The idea behind this is to first create the data, then select the range and simply turn it into an Excel table. Default behaviour would be for new rows in a table to copy the formula from the row above, so this solution seems logical.
2) Create an Excel table on a range of non-filled cells, then fill this range.
- The idea behind this is that there could be a bug in the way EPPlus creates a table in the worksheet, or possibly an issue with the order of XML elements; it was really just an experimental change.
The code:
var strategy = Strategy.CreateTableFirst;
ExcelRange subRowDataRange = null;
ExcelTable table = null;
if (strategy == Strategy.CreateTableFirst)
{
    subRowDataRange = worksheet.Cells[headerRowIndex, worksheet.Dimension.Start.Column, ToRow: headerRowIndex + groupedRowData.Count(), ToCol: dataFields.Count()];
    table = worksheet.Tables.Add(subRowDataRange, Name: null); // Auto generate Excel table name
    table.TableStyle = TableStyles.Light13;
}
foreach (var field in dataFields)
{
    // Headers
    if (strategy == Strategy.CreateTableFirst)
    {
        table.Columns[dataFields.IndexOf(field)].Name = field.Name;
    }
    else
    {
        worksheet.Cells[headerRowIndex, columnIndex].Value = field.Name;
    }
    // Help Text
    if (field.HelpText.HasValue())
    {
        worksheet.Cells[headerRowIndex, columnIndex].AddComment(field.HelpText, Author: "System");
    }
    int dataRowIndex = headerRowIndex + 1; // First row in the datatable
    if (groupedRowData.None())
    {
        worksheet.Cells[dataRowIndex, columnIndex].Set(field, owner: owner, rowIndex: null, addValidation: true);
    }
    // Add SubRows
    foreach (var rowData in groupedRowData)
    {
        worksheet.Cells[dataRowIndex, columnIndex].Set(field, owner: owner, rowIndex: rowData.Key, addValidation: true);
        dataRowIndex++;
    }
    columnIndex++;
}
if (strategy == Strategy.CreateTableLast)
{
    subRowDataRange = worksheet.Cells[headerRowIndex, worksheet.Dimension.Start.Column, ToRow: worksheet.Dimension.End.Row + 1, ToCol: dataFields.Count()];
    table = worksheet.Tables.Add(subRowDataRange, Name: null);
    table.TableStyle = TableStyles.Light13;
}
}
This is the output table in excel after the code:
The funny thing is, the cell validation is copied down to the next row fine if I create the table manually and have the first row set with the data, then drag down to make a new row and it copies over fine. I'm not sure how I am going to be able to export multiple rows of data and be assured that when a user inserts a new row, validation is copied down.
I downloaded the Open XML SDK to compare the Excel table with 1 row (which I am then able to drag down to create a second row with the copied formula) and the original downloaded Excel file with many rows in the Excel table.
The results are almost identical with regards to the excel table in XML output.
Also nothing seems out of place after deleting the rows and saving the file for comparison.
Any EPPlus gurus have an idea?
Update: 30/04/2015. Client understands the issue and accepts it for what it is. No solution has been found.
I'm not familiar with EPPlus, but I've had this issue in VBA before and was able to force the table to fill by using VBA script that looks something like this:
LastRow = Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
Range(Cells(TopRowOfTable, ColumnOfTableRow1), Cells(LastRow, ColumnOfTableRow1)).FillDown
Basically just finding the last row, then using the filldown command to force the field to fill.
I wrote a parser that takes some information from Excel sheets using the Spire.xls library and then writes the information to another Excel file.
I'm running into a weird problem. For some reason the program is taking serial numbers such as
03-02281
03-02282
03-01975
And writing them into the Excel sheet as
3/1/2281
3/1/2282
3/1/1975
This only happens with some values.
Others such as
30-04761
03-00613
03-00614
are transcribed unchanged.
I checked the Excel file, and the fields are set to text format. So either they were stored that way originally, or Excel is interpreting the serial numbers as dates. The other possibility is that it doesn't happen in the original file, and the text is not automatically corrected/changed if I manually type in the correct values.
Does anyone know why this is happening and how I can tell Excel to just treat these as text and nothing else?
I thought about prepending a ' to each value, but these files then have to be read by other parsers, so it's not the most convenient option.
Edit:
Here's some of the code I use for this; hopefully it can give you an idea of where I'm going wrong.
This is the code that adds all the values:
Workbook workbook = new Workbook();
workbook.LoadFromFile(templateExcelFileUri);
Worksheet sheet = workbook.Worksheets[0];
int ColumnIndex = 0; //for the datatable columns iteration
int columnCounter = 1; //for the excel sheet columns iteration
int ColumnsToAdd = 6; //(Seccion, seccion desc, marca, marca desc, **IdArticulo**, articulo desc)
//get the data of the new column
DataColumn DescriptionsDataColumn;
//First, add the suggestions.
for (; ColumnIndex < ColumnsToAdd; ColumnIndex++, columnCounter++)
{
    sheet.InsertColumn(columnCounter);
    if (columnCounter == 5)
        sheet.Columns[5].NumberFormat = "#"; // the column with the serial numbers.
    DescriptionsDataColumn = AutomatController.DescriptionsTable.Columns[ColumnIndex];
    //insert the data into the new column
    sheet.InsertDataColumn(DescriptionsDataColumn, true, 2, columnCounter);
}
And for references, the table the values of which I add:
public static void SetDescriptionsTable()
{
    DescriptionsTable.Columns.Add("Seccion", typeof(string));
    DescriptionsTable.Columns.Add("SeccionDescripcion", typeof(string));
    DescriptionsTable.Columns.Add("Marca", typeof(string));
    DescriptionsTable.Columns.Add("MarcaDescripcion", typeof(string));
    DescriptionsTable.Columns.Add("IdArticulo", typeof(string)); //Serial numbers
    DescriptionsTable.Columns.Add("ArticuloDescripcion", typeof(string));
}
Thanks for the edits to the format of my question and the title. I'm still a little new here and I'm learning how to do that better.
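The reason some values do not map to dates is that they cannot be read as a month-year date. For example, there is no month 30 (30-04761), and the years 613 and 614 (03-00613, 03-00614) are earlier than 1900, the first year Excel's date system supports. Values such as 03-02281 are read as March 2281 and displayed as 3/1/2281.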
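The pattern in the question's six examples can be checked with a small heuristic: a value is converted when it reads as month-year, with a month from 1 to 12 and a year Excel can store (1900 to 9999). This is a sketch inferred from those examples, not Excel's actual parsing rules, and `DateLookalike` is a hypothetical name; it can help list which IDs in a file are at risk.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class DateLookalike
{
    // One or two digits, a hyphen, then up to five digits, e.g. "03-02281".
    private static readonly Regex Pattern = new Regex("^([0-9]{1,2})-([0-9]{1,5})$");

    // Heuristic inferred from the question's examples, not Excel's real parser:
    // the value is read as month-year when the month is 1-12 and the year is 1900-9999.
    public static bool LooksLikeMonthYear(string value)
    {
        Match m = Pattern.Match(value);
        if (!m.Success) return false;
        int month = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        int year = int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);
        return month >= 1 && month <= 12 && year >= 1900 && year <= 9999;
    }
}
```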
I think the only thing you need to do is set the format of the target column and cells to Text (number format "@") before setting their values through the API; "#" is a numeric format, not the Text format. Sometimes cloning a column or a cell defaults the formatting to "Auto", and Excel tries to be too smart.
If you can share a bit of your code the community may be able to more accurately diagnose the problem.
You should set the column's format to General before setting the value.
I need to be able to export some data that is received from a stored procedure in SQL Server 2008. Once the data is returned I need to be able to output it or export it to a new excel spreadsheet.
What is the easiest way of doing this? Can LINQ do this, or am I forced to use XSLT? I presume that I must first convert the returned data to XML and then apply XSLT, as XSLT works against XML documents.
XSLT 2 is not available in VS 2008, so we still have to use XSLT 1. But is this really the way to go, or the best option?
I would think that it would be possible using an alternative method, but maybe I am wrong.
I would really appreciate any advice, tutorials etc
Thanks
For outputting to CSV or XML you really don't need any functionality that is not in XPath 1.0; it's rare that I have run into a situation that required anything more complex.
You could select into an XElement with LINQ. However, doing this in one statement means you cannot validate your data, so I usually end up iterating over a collection of elements to handle the edge cases.
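Selecting into XML with LINQ to XML could look like the sketch below. `Customer` is a hypothetical stand-in for a row of the stored procedure's result set; the null check shows that a simple edge case, such as a missing value, can still be handled inside the query.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical row shape standing in for the stored procedure's result set.
public sealed record Customer(int Id, string Name, string Email);

public static class XmlExport
{
    public static XElement ToXml(IEnumerable<Customer> rows) =>
        new XElement("Customers",
            rows.Select(r => new XElement("Customer",
                new XAttribute("Id", r.Id),
                new XElement("Name", r.Name),
                // XElement ignores null content, so a missing e-mail simply produces no element.
                r.Email is null ? null : new XElement("Email", r.Email))));
}
```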
However, outputting as CSV is easier and takes less space than XML; I think XML is overused, tbh.
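A minimal CSV writer only needs to quote fields that contain a comma, a double quote or a line break, doubling any embedded quotes (the RFC 4180 convention). `CsvWriter` is a hypothetical name for this sketch.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CsvWriter
{
    // Quotes a field when it contains a comma, a double quote or a line break,
    // doubling any embedded quotes.
    public static string Escape(string field)
    {
        if (field == null) return string.Empty;
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Joins each row with commas and ends every record with CRLF.
    public static string ToCsv(IEnumerable<IEnumerable<string>> rows)
    {
        var sb = new StringBuilder();
        foreach (var row in rows)
            sb.Append(string.Join(",", row.Select(Escape))).Append("\r\n");
        return sb.ToString();
    }
}
```

Note that Excel uses the system list separator when opening a CSV file, so in locales where that is a semicolon the output may need `;` instead of `,`.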
An alternative (and I don't recommend it) would be to query SQL Server from inside the Excel document.
That way you can select your data directly into a spreadsheet. This is fairly old, and I don't much like it, tbh.
This is code that exports an array of objects (you can easily fill it with your data) to an Excel spreadsheet:
public static void SaveToExcel(object[,] data)
{
    int height = data.GetLength(0);
    int width = data.GetLength(1);
    if (height == 0 || width == 0)
        return; // Nothing to export; check before starting Excel
    // Late-bound Excel instance (requires Excel installed and a reference to Microsoft.VisualBasic)
    dynamic excel = Microsoft.VisualBasic.Interaction.CreateObject("Excel.Application", String.Empty);
    excel.ScreenUpdating = false;
    dynamic workbooks = excel.Workbooks;
    workbooks.Add();
    dynamic worksheet = excel.ActiveSheet;
    const int left = 1;
    const int top = 1;
    int bottom = top + height - 1;
    int right = left + width - 1;
    dynamic rg = worksheet.Range[worksheet.Cells[top, left], worksheet.Cells[bottom, right]];
    rg.Value = data;
    // Set borders
    for (var i = 1; i <= 4; i++)
        rg.Borders[i].LineStyle = 1;
    // Set header view
    dynamic rgHeader = worksheet.Range[worksheet.Cells[top, left], worksheet.Cells[top, right]];
    rgHeader.Font.Bold = true;
    // Excel colour values keep red in the low byte: this is RGB(78, 129, 189)
    rgHeader.Interior.Color = 189 * (int)Math.Pow(16, 4) + 129 * (int)Math.Pow(16, 2) + 78;
    rg.EntireColumn.AutoFit();
    // Show excel app
    excel.ScreenUpdating = true;
    excel.Visible = true;
}
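To feed `SaveToExcel` from the stored procedure, the result can be loaded into a `DataTable` (for example with `SqlDataAdapter.Fill`) and copied into an `object[,]` whose first row holds the column names, since the method formats the first row as a header. `DataTableConverter.ToArray` below is a hypothetical helper; `DBNull` values are turned into `null` so those cells stay empty.

```csharp
using System;
using System.Data;

public static class DataTableConverter
{
    // Builds a 0-based object[,]: row 0 holds the column names, then one row per DataRow.
    public static object[,] ToArray(DataTable table)
    {
        int rows = table.Rows.Count + 1;
        int cols = table.Columns.Count;
        var data = new object[rows, cols];
        for (int c = 0; c < cols; c++)
            data[0, c] = table.Columns[c].ColumnName;
        for (int r = 0; r < table.Rows.Count; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                object value = table.Rows[r][c];
                data[r + 1, c] = value == DBNull.Value ? null : value;
            }
        }
        return data;
    }
}
```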
It's possible to push it straight out to Excel from SQL Server:
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=D:\testing.xls;',
'SELECT * FROM [SheetName$]') select * from SQLServerTable
This and more examples are available from this source.
ADO.NET also has a driver for Excel. So if your data is naturally a database in "shape" then I'd probably use that.
You could use the Excel interop if you wanted to do formatting and to leverage Excel's spreadsheet capabilities, but this is probably too "messy" for simple data transfer.
Also, as dtb points out, if it is a simple one-table data file, you could use a CSV file. Although it is not a native Excel format, it can be readily imported and is usually the easiest way of getting external data into Excel.
If you need a .NET package for writing Excel files, try
NExcelAPI
for old Excel file format (<= 2003), or
ExcelPackage
for the newer Office Open XML format. For both libraries you don't need to have Excel installed.
EDIT: here is another one for the older (Excel 2002/2003) XML based file format
http://www.carlosag.net/Tools/ExcelXmlWriter/
If you've got a few dollars to spend, I've used xPort Tools for the last couple of years and have been pleased with it.