.NET Core: Reading data from CSV & Excel files

Using .NET Core and C# here.
I have a UI from which users can upload Excel or CSV files. Each upload goes to my Web API, which reads the data from the file and returns JSON.
My API code:
[HttpPost("upload")]
public async Task<IActionResult> FileUpload(IFormFile file)
{
JArray data = new JArray();
using (ExcelPackage package = new ExcelPackage(file.OpenReadStream()))
{
ExcelWorksheet worksheet = package.Workbook.Worksheets[1];
//Process, read from excel here and populate jarray
}
return Ok(data );
}
In the code above I am using EPPlus to read the Excel file. It works fine for Excel files, but it cannot read CSV files, which is a limitation of EPPlus.
I searched and found another library, CsvHelper: https://joshclose.github.io/CsvHelper/ The issue there is the reverse: it can read CSV but not Excel.
Is there a library that supports reading both?
Or would it be possible to use only EPPlus, converting the uploaded CSV to Excel on the fly and then reading it? (Please note that I am not storing the Excel file anywhere, so I can't use Save As to save it as Excel.)
Any input, please?
--Updated: added code for reading data from Excel--
int rowCount = worksheet.Dimension.End.Row;
int colCount = worksheet.Dimension.End.Column;
for (int row = 1; row <= rowCount; row++)
{
for (int col = 1; col <= colCount; col++)
{
var rowValue = worksheet.Cells[row, col].Value;
}
}
//With the code suggested in the answer, rowCount is always 1

You can use EPPlus and a StreamReader to load a CSV file into an ExcelPackage without writing anything to disk. Below is an example. You may have to change some of the parameters based on your CSV file specs.
[HttpPost("upload")]
public async Task<IActionResult> FileUpload(IFormFile file)
{
var result = string.Empty;
string worksheetsName = "data";
bool firstRowIsHeader = false;
var format = new ExcelTextFormat();
format.Delimiter = ',';
format.TextQualifier = '"';
using (var reader = new System.IO.StreamReader(file.OpenReadStream()))
using (ExcelPackage package = new ExcelPackage())
{
result = reader.ReadToEnd();
ExcelWorksheet worksheet =
package.Workbook.Worksheets.Add(worksheetsName);
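worksheet.Cells["A1"].LoadFromText(result, format, OfficeOpenXml.Table.TableStyles.Medium27, firstRowIsHeader);
// Read the cells from worksheet here, while the package is still open
}
// The original snippet had no return statement; replace this with the data you read
return Ok();
}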
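Regarding the comment in the question that the row count is always 1 with this approach: one possible cause (an assumption, not something confirmed in this thread) is a line-ending mismatch. ExcelTextFormat has an EOL property which, as far as I know, defaults to "\r\n", so a CSV that uses bare "\n" line endings may be loaded as a single row. A minimal sketch that normalizes the text first, using only the base class library; the class and method names are made up for illustration:

```csharp
using System;

public static class CsvText
{
    // Converts Windows ("\r\n") and lone "\r" line endings to "\n",
    // so the text has one known line terminator before it is parsed.
    public static string NormalizeNewlines(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return text.Replace("\r\n", "\n").Replace("\r", "\n");
    }
}
```

In the answer's code this would mean setting `format.EOL = "\n";` and passing `CsvText.NormalizeNewlines(result)` to `LoadFromText` instead of `result`.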

Here's a version using Aspose, which is unfortunately not free, but it works great. My API uses the streaming capability with Content-Type: multipart/form-data rather than the IFormFile implementation:
[HttpPut]
[DisableFormValueModelBinding]
public async Task<IActionResult> UploadSpreadsheet()
{
if (!MultipartRequestHelper.IsMultipartContentType(Request.ContentType))
{
return BadRequest($"Expected a multipart request, but got {Request.ContentType}");
}
var boundary = MultipartRequestHelper.GetBoundary(MediaTypeHeaderValue.Parse(Request.ContentType), _defaultFormOptions.MultipartBoundaryLengthLimit);
var reader = new MultipartReader(boundary, HttpContext.Request.Body);
var section = (await reader.ReadNextSectionAsync()).AsFileSection();
//If you're doing CSV, you add this line:
LoadOptions loadOptions = new LoadOptions(LoadFormat.CSV);
var workbook = new Workbook(section.FileStream, loadOptions);
Cells cells = workbook.Worksheets[0].Cells;
var rows = cells.Rows.Cast<Row>().Where(x => !x.IsBlank);
//Do whatever else you want here

Please try the code below:
private string uploadCSV(FileUpload fl)
{
string fileName = "";
serverLocation = Request.PhysicalApplicationPath + "ExcelFiles\\";
fileName = fl.PostedFile.FileName;
int FileSize = fl.PostedFile.ContentLength;
string contentType = fl.PostedFile.ContentType;
fl.PostedFile.SaveAs(serverLocation + fileName);
string rpath = string.Empty, dir = string.Empty;
HttpContext context = HttpContext.Current;
string baseUrl = context.Request.Url.Scheme + "://" + context.Request.Url.Authority + context.Request.ApplicationPath.TrimEnd('/') + '/';
try
{
rpath = serverLocation + fileName;//Server.MapPath(dir + fileName);
using (Stream InputStream = fl.PostedFile.InputStream)
{
Object o = new object();
lock (o)
{
byte[] buffer = new byte[InputStream.Length];
InputStream.Read(buffer, 0, (int)InputStream.Length);
lock (o)
{
File.WriteAllBytes(rpath, buffer);
buffer = null;
}
InputStream.Close();
}
}
}
catch (Exception ex)
{
lblSOTargetVal.Text = ex.Message.ToString();
}
return rpath;
}

Use the Open XML SDK package and build a working solution on top of it.

How to extract all pages and attachments from PDF to PNG

I am trying to create a process in .NET to convert a PDF, with all its pages and attachments, to PNGs. I am evaluating libraries and came across PDFiumSharp, but it is not working for me. Here is my code:
string Inputfile = "input.pdf";
string OutputFolder = "Output";
string fileName = Path.GetFileNameWithoutExtension(Inputfile);
using (PdfDocument doc = new PdfDocument(Inputfile))
{
for (int i = 0; i < doc.Pages.Count; i++)
{
var page = doc.Pages[i];
using (var bitmap = new PDFiumBitmap((int)page.Width, (int)page.Height, false))
{
page.Render(bitmap);
var targetFile = Path.Combine(OutputFolder, fileName + "_" + i + ".png");
bitmap.Save(targetFile);
}
}
}
When I run this code, I get this exception:
screenshot of exception
Does anyone know how to fix this? Also does PDFiumSharp support extracting PDF attachments? If not, does anyone have any other ideas on how to achieve my goal?
PDFium does not appear to support extracting PDF attachments. To achieve your goal, you could look at another library that supports both extracting PDF attachments and converting PDFs to PNGs.
Disclosure: I work for the vendor of the LEADTOOLS PDF SDK, which you can try out via these two NuGet packages:
https://www.nuget.org/packages/Leadtools.Pdf/
https://www.nuget.org/packages/Leadtools.Document.Sdk/
Here is some code that will convert a PDF + all attachments in the PDF to separate PNGs in an output directory:
SetLicense();
cache = new FileCache { CacheDirectory = "cache" };
List<LEADDocument> documents = new List<LEADDocument>();
if (!Directory.Exists(OutputDir))
Directory.CreateDirectory(OutputDir);
using var document = DocumentFactory.LoadFromFile("attachments.pdf", new LoadDocumentOptions { Cache = cache, LoadAttachmentsMode = DocumentLoadAttachmentsMode.AsAttachments });
if (document.Pages.Count > 0)
documents.Add(document);
foreach (var attachment in document.Attachments)
documents.Add(document.LoadDocumentAttachment(new LoadAttachmentOptions { AttachmentNumber = attachment.AttachmentNumber }));
ConvertDocuments(documents, RasterImageFormat.Png);
And the ConvertDocuments method:
static void ConvertDocuments(IEnumerable<LEADDocument> documents, RasterImageFormat imageFormat)
{
using var converter = new DocumentConverter();
using var ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
ocrEngine.Startup(null, null, null, null);
converter.SetOcrEngineInstance(ocrEngine, false);
converter.SetDocumentWriterInstance(new DocumentWriter());
foreach (var document in documents)
{
var name = string.IsNullOrEmpty(document.Name) ? "Attachment" : document.Name;
string outputFile = Path.Combine(OutputDir, $"{name}.{RasterCodecs.GetExtension(imageFormat)}");
int count = 1;
while (File.Exists(outputFile))
outputFile = Path.Combine(OutputDir, $"{name}({count++}).{RasterCodecs.GetExtension(imageFormat)}");
var jobData = new DocumentConverterJobData
{
Document = document,
Cache = cache,
DocumentFormat = DocumentFormat.User,
RasterImageFormat = imageFormat,
RasterImageBitsPerPixel = 0,
OutputDocumentFileName = outputFile,
};
var job = converter.Jobs.CreateJob(jobData);
converter.Jobs.RunJob(job);
}
}

Reading an xlsx file stream fails with "A disk error occurred during a write operation. (Exception from HRESULT: 0x8003001D (STG_E_WRITEFAULT))"

I am using OfficeOpenXml (EPPlus) to read Excel files. It works fine for other files, but for one particular file I get the error above.
public ReadExcelFile(Stream stream, string worksheet, List<ExcelColumnMapping> columnMapping, ITypeConvert typeConvert = null, int headerSize = 1)
{
_stream = stream;
_excelPackage = new ExcelPackage(stream);
_worksheet = _excelPackage.Workbook.Worksheets[worksheet];
_headerSize = headerSize;
_rowCount = headerSize + 1;
var mapping = ColumnMappingRowNumber(columnMapping);
_resultColumnNumbersAndTypes = mapping.Select(x => Tuple.Create(x.Item2, x.Item3, x.Item4)).ToList();
_converter = typeConvert ?? new ExcelTypeConvert();
ReadLineValues();
}
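One thing that may be worth checking (an assumption, not a confirmed diagnosis for this file) is whether the failing file is really a ZIP-based .xlsx. Legacy .xls files and password-protected workbooks are stored as OLE compound files instead, even when they carry an .xlsx extension. A minimal sketch that inspects the first bytes of a seekable stream; the type and member names are hypothetical:

```csharp
using System;
using System.IO;

public enum ContainerKind { Zip, OleCompound, Unknown }

public static class FileSniffer
{
    // Signature of a ZIP local file header (regular .xlsx files start with it).
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };
    // Signature of an OLE compound file (legacy .xls, encrypted workbooks).
    private static readonly byte[] OleMagic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public static ContainerKind Detect(Stream stream)
    {
        if (!stream.CanSeek) throw new ArgumentException("Stream must be seekable.", nameof(stream));
        long start = stream.Position;
        var header = new byte[8];
        int read = 0;
        // Read may return fewer bytes than requested, so loop until full or end of stream.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }
        stream.Position = start; // leave the stream where the caller had it

        if (StartsWith(header, read, ZipMagic)) return ContainerKind.Zip;
        if (StartsWith(header, read, OleMagic)) return ContainerKind.OleCompound;
        return ContainerKind.Unknown;
    }

    private static bool StartsWith(byte[] data, int length, byte[] magic)
    {
        if (length < magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }
}
```

Calling `FileSniffer.Detect(stream)` before `new ExcelPackage(stream)` would at least show whether the problem file differs in container format from the files that load correctly.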

Efficiently convert .xlsx to .csv in C#?

As input, I have a set of Excel files with several worksheets each. I need to export one CSV file per worksheet. Below is my code, which works but is very slow. It builds upon the solutions proposed in a previous post. Please consider that I have to run this on rather big .xlsx files (approx. 300 MB).
QUESTION: Is there any way to improve this?
void Main()
{
string folder = @"\\PATH_TO_FOLDER\";
var files = Directory.GetFiles(folder, "*.xlsx", SearchOption.TopDirectoryOnly);
foreach (string file in files)
{
ConvertToCsv(file, Directory.GetParent(file) + @"\\output\");
}
}
public static void ConvertToCsv(string file, string targetFolder)
{
FileInfo finfo = new FileInfo(file);
ExcelPackage package = new ExcelPackage(finfo);
// if targetFolder doesn't exist, create it
if (!Directory.Exists(targetFolder)) {
Directory.CreateDirectory(targetFolder);
}
var worksheets = package.Workbook.Worksheets;
int sheetcount = 0;
foreach (ExcelWorksheet worksheet in worksheets)
{
sheetcount++;
var maxColumnNumber = worksheet.Dimension.End.Column;
var currentRow = new List<string>(maxColumnNumber);
var totalRowCount = worksheet.Dimension.End.Row+1;
var currentRowNum = 1;
//No need for a memory buffer, writing directly to a file
//var memory = new MemoryStream();
string file_name = targetFolder + Path.GetFileNameWithoutExtension(file) + "_" + sheetcount + ".csv";
using (var writer = new StreamWriter(file_name, false, Encoding.UTF8))
{
//the rest of the code remains the same
for (int i = 1; i < totalRowCount; i++)
{
i.Dump();
// populate line with semicolon separators
string line = "";
for (int j = 1; j < worksheet.Dimension.End.Column+1; j++)
{
if (worksheet.Cells[i, j].Value != null)
{
string cell = worksheet.Cells[i, j].Value.ToString() + ";";
line += cell;
}
}
// write line
writer.WriteLine(line);
}
}
}
}
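A few things in the loop above are likely to cost time on large sheets: the per-row `i.Dump()` output, re-reading `worksheet.Dimension.End.Column` in the inner loop condition, and growing each line with `+=`. The following is only a sketch of how a row could be assembled in one pass with `string.Join`; the helper name is made up. Note that it deliberately behaves differently from the original: empty cells produce an empty field instead of being skipped (which keeps columns aligned), there is no trailing separator, and fields containing the separator, quotes, or line breaks are quoted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvLine
{
    // Joins the cell values of one row into a single separated line.
    public static string Build(IEnumerable<object> cells, char separator = ';')
    {
        return string.Join(separator.ToString(), cells.Select(c => Escape(c, separator)));
    }

    // Null becomes an empty field; fields with special characters are
    // wrapped in double quotes, with embedded quotes doubled.
    private static string Escape(object cell, char separator)
    {
        string text = cell?.ToString() ?? string.Empty;
        bool needsQuotes = text.IndexOfAny(new[] { separator, '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + text.Replace("\"", "\"\"") + "\"" : text;
    }
}
```

Inside the row loop, the values of `worksheet.Cells[i, j].Value` for one row would be collected into a list and passed to `CsvLine.Build`, and the result handed to `writer.WriteLine`.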

Convert an HTML table to Excel and send it via email

I'm developing an app that generates an Excel file from an HTML table. So far I have built the part that downloads the HTML table as an Excel file (this happens on the client side, in JavaScript). Now I need to email that Excel file as an attachment to a particular person's address. I'm confused about how to do this, because the file is generated on the client side. Do I need to copy the client-side Excel file to the server? If so, how?
Please point me in the right direction.
Update 1 (adding code)
This is the JavaScript I use to download the HTML table as an Excel file on the client side.
var tablesToExcel = (function () {
var uri = 'data:application/vnd.ms-excel;base64,'
, tmplWorkbookXML = '<?xml version="1.0"?><?mso-application progid="Excel.Sheet"?><Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">'
+ '<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office"><Author>Axel Richter</Author><Created>{created}</Created></DocumentProperties>'
+ '<Styles>'
+ '<Style ss:ID="Currency"><NumberFormat ss:Format="Currency"></NumberFormat></Style>'
+ '<Style ss:ID="Date"><NumberFormat ss:Format="Medium Date"></NumberFormat></Style>'
+ '</Styles>'
+ '{worksheets}</Workbook>'
, tmplWorksheetXML = '<Worksheet ss:Name="{nameWS}"><Table>{rows}</Table></Worksheet>'
, tmplCellXML = '<Cell{attributeStyleID}{attributeFormula}><Data ss:Type="{nameType}">{data}</Data></Cell>'
, base64 = function (s) { return window.btoa(unescape(encodeURIComponent(s))) }
, format = function (s, c) { return s.replace(/{(\w+)}/g, function (m, p) { return c[p]; }) }
return function (tables, wsnames, wbname, appname) {
var ctx = "";
var workbookXML = "";
var worksheetsXML = "";
var rowsXML = "";
for (var i = 0; i < tables.length; i++) {
if (!tables[i].nodeType) tables[i] = document.getElementById(tables[i]);
for (var j = 0; j < tables[i].rows.length; j++) {
rowsXML += '<Row>'
for (var k = 0; k < tables[i].rows[j].cells.length; k++) {
var dataType = tables[i].rows[j].cells[k].getAttribute("data-type");
var dataStyle = tables[i].rows[j].cells[k].getAttribute("data-style");
var dataValue = tables[i].rows[j].cells[k].getAttribute("data-value");
dataValue = (dataValue) ? dataValue : tables[i].rows[j].cells[k].innerHTML;
var dataFormula = tables[i].rows[j].cells[k].getAttribute("data-formula");
dataFormula = (dataFormula) ? dataFormula : (appname == 'Calc' && dataType == 'DateTime') ? dataValue : null;
ctx = {
attributeStyleID: (dataStyle == 'Currency' || dataStyle == 'Date') ? ' ss:StyleID="' + dataStyle + '"' : ''
, nameType: (dataType == 'Number' || dataType == 'DateTime' || dataType == 'Boolean' || dataType == 'Error') ? dataType : 'String'
, data: (dataFormula) ? '' : dataValue
, attributeFormula: (dataFormula) ? ' ss:Formula="' + dataFormula + '"' : ''
};
rowsXML += format(tmplCellXML, ctx);
}
rowsXML += '</Row>'
}
ctx = { rows: rowsXML, nameWS: wsnames[i] || 'Sheet' + i };
worksheetsXML += format(tmplWorksheetXML, ctx);
rowsXML = "";
}
ctx = { created: (new Date()).getTime(), worksheets: worksheetsXML };
workbookXML = format(tmplWorkbookXML, ctx);
var link = document.createElement("A");
link.href = uri + base64(workbookXML);
link.download = wbname || 'Workbook.xls';
link.target = '_blank';
document.body.appendChild(link);
link.click();
document.body.removeChild(link);
}
})();
I still have no idea how to save the generated Excel file to the server and send it by email.
As per our discussion:
1. You need to send the data from the client to the server.
You can use the code below to send the headers and values to the server via AJAX; you can also filter the columns as needed.
function SaveToServer() {
var gov = GetHeaders('tbl');
$.ajax({
url: '@Url.Content("~/Home/ReciveData")',
data: { headers: JSON.stringify(gov.heasers), data: JSON.stringify(gov.data) },
success: function (data) {
// Success
},
error: function (xhr) {
}
});
}
function GetHeaders(tableName) {
table = document.getElementById(tableName);
var tbl_Hdata = [];
var tbl_Data = [];
for (var i = 0, row; row = table.rows[i]; i++) {
var rowData = [];
for (var j = 0, col; col = row.cells[j]; j++) {
// add column filter
if (i == 0) {
tbl_Hdata.push(col.innerHTML);
}
else {
rowData.push(col.innerHTML);
}
}
if (i > 0) {
tbl_Data.push(rowData);
}
}
return { heasers: tbl_Hdata, data: tbl_Data };
}
Now we want to receive this data and convert it to a DataTable so that it can be saved as an Excel file on the server side using NPOI:
public void ReciveData(string headers, string data)
{
#region Read Data
List<string> tbl_Headers = new List<string>();
List<List<string>> tbl_Data = new List<List<string>>();
tbl_Headers = Newtonsoft.Json.JsonConvert.DeserializeObject<List<string>>(headers);
tbl_Data = Newtonsoft.Json.JsonConvert.DeserializeObject<List<List<string>>>(data);
#endregion
#region Create Data Table
DataTable dataTable = new DataTable("Data");
foreach (var prop in tbl_Headers)
{
dataTable.Columns.Add(prop);
}
DataRow row;
foreach (var rw in tbl_Data)
{
row = dataTable.NewRow();
for (int i = 0; i < rw.Count; i++)
{
row[tbl_Headers[i]] = rw[i];
}
dataTable.Rows.Add(row);
}
#endregion
#region Save To excel
string path = @"D:\";
string fileName = "";
GenerateExcelSheetWithoutDownload(dataTable, path, out fileName);
#endregion
}
public bool GenerateExcelSheetWithoutDownload(DataTable dataTable, string exportingSheetPath, out string exportingFileName)
{
#region Validate the parameters and Generate the excel sheet
bool returnValue = false;
exportingFileName = Guid.NewGuid().ToString() + ".xls";
if (dataTable != null && dataTable.Rows.Count > new int())
{
string excelSheetPath = string.Empty;
#region Check If The directory is exist
if (!Directory.Exists(exportingSheetPath))
{
Directory.CreateDirectory(exportingSheetPath);
}
excelSheetPath = exportingSheetPath + exportingFileName;
FileInfo fileInfo = new FileInfo(excelSheetPath);
#endregion
#region Write stream to the file
MemoryStream ms = DataToExcel(dataTable);
byte[] blob = ms.ToArray();
if (blob != null)
{
using (MemoryStream inStream = new MemoryStream(blob))
{
FileStream fs = new FileStream(excelSheetPath, FileMode.Create);
inStream.WriteTo(fs);
fs.Close();
}
}
ms.Close();
returnValue = true;
#endregion
}
return returnValue;
#endregion
}
private static MemoryStream DataToExcel(DataTable dt)
{
MemoryStream ms = new MemoryStream();
using (dt)
{
#region Create File
HSSFWorkbook workbook = new HSSFWorkbook();//Create an excel Workbook
ISheet sheet = workbook.CreateSheet("data");//Create a work table in the table
int RowHeaderIndex = new int();
#endregion
#region Table Headers
IRow headerTableRow = sheet.CreateRow(RowHeaderIndex);
if (dt != null)
{
foreach (DataColumn column in dt.Columns)
{
headerTableRow.CreateCell(column.Ordinal).SetCellValue(column.Caption);
}
RowHeaderIndex++;
}
#endregion
#region Data
foreach (DataRow row in dt.Rows)
{
IRow dataRow = sheet.CreateRow(RowHeaderIndex);
foreach (DataColumn column in dt.Columns)
{
dataRow.CreateCell(column.Ordinal).SetCellValue(row[column].ToString());
}
RowHeaderIndex++;
}
#endregion
workbook.Write(ms);
ms.Flush();
//ms.Position = 0;
}
return ms;
}
Now you can send this file as an attachment in an email.
You can't create Excel files with HTML tables. This is a hack that's used to fake actual Excel files. Excel isn't fooled: it recognizes the HTML file and tries to import the data using defaults. This will easily break for any number of reasons, e.g. different locale settings for decimals and dates.
Modern Excel (.xlsx) files are just zipped XML files. You can create them using XML manipulation, the Open XML SDK, or a library like EPPlus.
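To see the "zipped XML" structure for yourself, an .xlsx file can be opened with the standard ZipArchive class. This is only an illustrative sketch; the class and method names are made up, and it lists the package parts without interpreting them.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class XlsxInspector
{
    // Returns the names of all parts (ZIP entries) inside an .xlsx stream.
    // The stream is left open so the caller can keep using it.
    public static List<string> ListParts(Stream xlsx)
    {
        using (var archive = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries.Select(e => e.FullName).ToList();
        }
    }
}
```

For a typical workbook the list includes entries such as `[Content_Types].xml` and `xl/workbook.xml`, plus one XML part per worksheet.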
Creating an Excel file with EPPlus is as easy as calling the LoadFromCollection or LoadFromDataTable method. The package can be saved to any stream, including a FileStream or MemoryStream. A MemoryStream can be used to send the data to a web browser:
public ActionResult ExportData()
{
//Somehow, load data to a DataTable
using (ExcelPackage package = new ExcelPackage())
{
var ws = package.Workbook.Worksheets.Add("My Sheet");
//true generates headers
ws.Cells["A1"].LoadFromDataTable(dataTable, true);
var stream = new MemoryStream();
package.SaveAs(stream);
string fileName = "myfilename.xlsx";
string contentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
stream.Position = 0;
return File(stream, contentType, fileName);
}
}
Mail attachments can also be created from a MemoryStream. The Attachment(Stream, string, string) constructor accepts any stream as input. The example above could be modified to create an attachment instead of sending the data to the browser:
public void SendData(string server, string recipientList)
{
//Same as before
using (ExcelPackage package = new ExcelPackage())
{
var ws = package.Workbook.Worksheets.Add("My Sheet");
ws.Cells["A1"].LoadFromDataTable(dataTable, true);
var stream = new MemoryStream();
package.SaveAs(stream);
string fileName = "myfilename.xlsx";
string contentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
stream.Position = 0;
SendExcel(server, recipientList, stream, fileName, contentType);
}
}
void SendExcel(string server, string recipientList, Stream stream, string fileName, string contentType)
{
//Send the file
var message = new MailMessage("logMailer@contoso.com", recipientList);
message.Subject = "Some Data";
Attachment data = new Attachment(stream, fileName, contentType);
// Add the attachment to the message.
message.Attachments.Add(data);
// Send the message.
// Include credentials if the server requires them.
var client = new SmtpClient(server);
client.Credentials = CredentialCache.DefaultNetworkCredentials;
client.Send(message);
}
UPDATE
Generating an XLSX table on the client side becomes a lot easier if you use a library like js-xlsx. There's even a sample that generates an XLSX file from an HTML table.

NPOI return a .xls file with NancyFx

I am trying to create an export function that sends the created .xls file to the user.
I am using NancyFx for the requests and NPOI to create the Excel file.
I can't figure out what is wrong with this code: I get a 200 OK response, but no content or file comes back.
public class ExportService
{
private HSSFWorkbook HssfWorkbook { get; set; }
public ExportService()
{
HssfWorkbook = new HSSFWorkbook();
}
public Response Export()
{
string fileName = "test2.xls";
var response = new Response();
response.Headers.Add("Content-Disposition", string.Format("attachment;filename={0}", fileName));
InitializeWorkbook();
GenerateData();
response.Contents(WriteToStream());
return response.AsAttachment(fileName, "application/vnd.ms-excel");
}
private MemoryStream WriteToStream()
{
//Write the stream data of workbook to the root directory
MemoryStream file = new MemoryStream();
HssfWorkbook.Write(file);
return file;
}
private void GenerateData()
{
var sheet1 = HssfWorkbook.CreateSheet("Försäljning");
sheet1.CreateRow(0).CreateCell(0).SetCellValue("Detta är ett test");
int x = 1;
for (int i = 1; i <= 15; i++)
{
var row = sheet1.CreateRow(i);
for (int j = 0; j < 15; j++)
{
row.CreateCell(j).SetCellValue(x++);
}
}
}
private void InitializeWorkbook()
{
////create a entry of DocumentSummaryInformation
var documentSummaryInformation = PropertySetFactory.CreateDocumentSummaryInformation();
documentSummaryInformation.Company = "Test Company";
HssfWorkbook.DocumentSummaryInformation = documentSummaryInformation;
////create a entry of SummaryInformation
var summaryInformation = PropertySetFactory.CreateSummaryInformation();
summaryInformation.Subject = "Test Subject";
HssfWorkbook.SummaryInformation = summaryInformation;
}
}
The problem was that my request came from an AJAX call.
I need to save the file and then, in the AJAX response, redirect the user to the file that was created.
