I have a method which creates an Excel file (.xlsx) from a list of strings using DocumentFormat.OpenXml. The created file needs to be repaired when I try to open it with Excel 2016. When I click "Yes", Excel shows my file correctly.
Does anyone have any suggestions? Thanks in advance.
Here's my code:
private byte[] ExportDataXlsx(System.Data.Common.DbDataReader reader, string[] fields, string[] headers, string Culture) {
System.IO.MemoryStream sw = new System.IO.MemoryStream();
using (var workbook = Packaging.SpreadsheetDocument.Create(sw, SpreadsheetDocumentType.Workbook)) {
var sheetData = CreateSheet(workbook);
while (reader.Read()) {
Spreadsheet.Row newRow = new Spreadsheet.Row();
foreach (string column in fields) {
Spreadsheet.Cell cell = new Spreadsheet.Cell();
cell.DataType = Spreadsheet.CellValues.String;
object value = null;
try {
int index = reader.GetOrdinal(column);
cell.DataType = DbKymosDomainService.ToXlsType(reader.GetFieldType(index));
value = DbKymosDomainService.ToStringFromCulture(reader.GetValue(index), reader.GetFieldType(index), Culture);
if (cell.DataType == Spreadsheet.CellValues.Number){
value = value == null ? "" : value.ToString().Replace(",", ".");
}
}
catch { }
cell.CellValue = new Spreadsheet.CellValue(value == null ? null : value.ToString());
newRow.AppendChild(cell);
try { var x = newRow.InnerXml; } catch { newRow.RemoveChild(cell); }
}
sheetData.AppendChild(newRow);
}
workbook.Close();
}
byte[] data = sw.ToArray();
sw.Close();
sw.Dispose();
return data;
}
Function which creates the sheet:
private Spreadsheet.SheetData CreateSheet(Packaging.SpreadsheetDocument workbook)
{
var workbookPart = workbook.AddWorkbookPart();
workbook.WorkbookPart.Workbook = new Spreadsheet.Workbook();
workbook.WorkbookPart.Workbook.Sheets = new Spreadsheet.Sheets();
var sheetPart = workbook.WorkbookPart.AddNewPart<Packaging.WorksheetPart>();
var sheetData = new Spreadsheet.SheetData();
sheetPart.Worksheet = new Spreadsheet.Worksheet(sheetData);
Spreadsheet.Sheets sheets = workbook.WorkbookPart.Workbook.GetFirstChild<Spreadsheet.Sheets>();
string relationshipId = workbook.WorkbookPart.GetIdOfPart(sheetPart);
uint sheetId = 1;
if (sheets.Elements<Spreadsheet.Sheet>().Count() > 0) {
sheetId =
sheets.Elements<Spreadsheet.Sheet>().Select(s => s.SheetId.Value).Max() + 1;
}
Spreadsheet.Sheet sheet = new Spreadsheet.Sheet() { Id = relationshipId, SheetId = sheetId, Name = "Export" };
sheets.Append(sheet);
return sheetData;
}
In my experience, when a file needs to be repaired after creating it with OpenXML, it means that a crucial element is missing or that a crucial element is in the wrong place. I'm having difficulty following your code, so that in itself points to something being in the wrong place; code should be sequential and self-explanatory. A few pointers, however, to help with getting to the root cause of your issue.
I would suggest first using ClosedXML, as it takes so much strain out of the coding: https://github.com/closedxml/closedxml
Debug your code and step through it to see what's going on.
Open the created file in the OpenXML Productivity Tool (https://github.com/OfficeDev/Open-XML-SDK/releases/tag/v2.5) and have a look around.
Another tool that I couldn't be without is OpenXML FileViewer: https://github.com/davecra/OpenXmlFileViewer
Lastly, I always run this subroutine to validate documents I create using OpenXML:
public static List<string> ValidateWordDocument(FileInfo filepath, int maxerrors = 100)
{
try
{
using (WordprocessingDocument wDoc = WordprocessingDocument.Open(filepath.FullName, false))
{
OpenXmlValidator validator = new OpenXmlValidator();
int count = 0;
List<string> er = new List<string>()
{
$"Assessment of {filepath.Name} on {DateTime.Now} yielded the following result: {Constants.vbCrLf}"
};
// set at zero so that we can determine the total quantity of errors
validator.MaxNumberOfErrors = 0;
// String.Format("<strong> Warning : </strong>")
foreach (ValidationErrorInfo error in validator.Validate(wDoc))
{
count += 1;
if (count > maxerrors)
break;
er.Add($"Error {count}{Constants.vbCrLf}" + $"Description {error.Description}{Constants.vbCrLf}" + $"ErrorType: {error.ErrorType}{Constants.vbCrLf}" + $"Node {error.Node}{Constants.vbCrLf}" + $"Name {error.Node.LocalName}{Constants.vbCrLf}" + $"Path {error.Path.XPath}{Constants.vbCrLf}" + $"Part: {error.Part.Uri}{Constants.vbCrLf}" + $"-------------------------------------------{Constants.vbCrLf}" + $"Outer XML: {error.Node.OuterXml}" + $"-------------------------------------------{Constants.vbCrLf}");
}
int validatorcount = validator.Validate(wDoc).Count();
switch (validatorcount)
{
case object _ when validatorcount > maxerrors:
{
er.Add($"Returned {count - 1} as this is the Maximum Number set by the system. The actual number of errors in {filepath.Name} is {validatorcount}");
er.Add("A summary list of all error types encountered is given below");
List<string> expectedErrors = validator.Validate(wDoc).Select(_e => _e.Description).Distinct().ToList();
er.AddRange(expectedErrors);
break;
}
case object _ when 1 <= validatorcount && validatorcount <= maxerrors:
{
er.Add($"Returned all {validator} errors in {filepath.Name}");
break;
}
case object _ when validatorcount == 0:
{
er.Add($"No Errors found in document {filepath.Name}");
break;
}
}
wDoc.Close();
return er;
}
}
catch (Exception ex)
{
Information.Err.MessageElevate();
return null;
}
}
It helps greatly with tracking down any potential issues.
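In the same spirit, the spreadsheet produced by the code in the question can be validated before it ever reaches Excel. Below is a minimal sketch (not part of the original answer) that assumes the byte array returned by ExportDataXlsx and runs the same OpenXmlValidator against a SpreadsheetDocument:
using System;
using System.Collections.Generic;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

public static class SpreadsheetChecks
{
    // Validates an in-memory .xlsx (for example, the bytes returned by ExportDataXlsx).
    public static List<string> ValidateXlsx(byte[] xlsxBytes)
    {
        var errors = new List<string>();
        using (var stream = new MemoryStream(xlsxBytes))
        using (var doc = SpreadsheetDocument.Open(stream, false))
        {
            var validator = new OpenXmlValidator();
            foreach (ValidationErrorInfo error in validator.Validate(doc))
            {
                // Path and Part usually point straight at the offending element.
                errors.Add($"{error.ErrorType}: {error.Description} ({error.Path?.XPath}, {error.Part?.Uri})");
            }
        }
        return errors;
    }
}
Running this on the stream before returning it from ExportDataXlsx will usually name the exact element that makes Excel ask to repair the file.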
I have a program that parses a CSV file from the local filesystem into a specified SQL Server table.
Now, when I execute the program, I get this error:
System.IndexOutOfRangeException: 'Cannot find column 1' on the line where the program attempts to populate the DataTable.
On closer inspection, the error shows that it is coming from row number 3, as shown on this link:
CSV_ERROR
This is how I am reading and saving the CSV file:
static void Main(string[] args)
{
var absPath = @"C:\Users\user\Documents\Projects\MastercardSurveillance\fbc_mc_all_cards.csv";
ProcessFile();
void ProcessFile()
{
string realPath = @"C:\Users\user\Documents\CSV";
string appLog = "CSVERRORS";
var logPath = realPath + Convert.ToString(appLog) + DateTime.Today.ToString("dd -MM-yy") + ".txt";
if (!File.Exists(logPath))
{
File.Create(logPath).Dispose();
}
var dt = GetDATATable();
if (dt == null)
{
return;
}
if (dt.Rows.Count == 0)
{
using (StreamWriter sw = File.AppendText(logPath))
{
sw.WriteLine("No rows imported after reading file " + absPath);
sw.Flush();
sw.Close();
}
return;
}
ClearData();
InsertDATA();
}
DataTable GetDATATable()
{
var FilePath = absPath;
string TableName = "Cards";
string realPath = @"C:\Users\user\Documents\CSV";
string appLog = "CSVERRORS";
var logPath = realPath + Convert.ToString(appLog) + DateTime.Today.ToString("dd -MM-yy") + ".txt";
if (!File.Exists(logPath))
{
File.Create(logPath).Dispose();
}
var dt = new DataTable(TableName);
using (var csvReader = new TextFieldParser(FilePath))
{
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = true;
var readFields = csvReader.ReadFields();
if (readFields == null)
{
using (StreamWriter sw = File.AppendText(logPath))
{
sw.WriteLine("Could not read header fields for file " + FilePath);
sw.Flush();
sw.Close();
}
return null;
}
foreach (var dataColumn in readFields.Select(column => new DataColumn(column, typeof(string)) { AllowDBNull = true, DefaultValue = string.Empty }))
{
dt.Columns.Add(dataColumn);
}
while (!csvReader.EndOfData)
{
var data = csvReader.ReadFields();
if (data == null)
{
using (StreamWriter sw = File.AppendText(logPath))
{
sw.WriteLine(string.Format("Could not read fields on line {0} for file {1}", csvReader.LineNumber, FilePath));
sw.Flush();
sw.Close();
}
continue;
}
var dr = dt.NewRow();
for (var i = 0; i < data.Length; i++)
{
if (!string.IsNullOrEmpty(data[i]))
{
dr[i] = data[i];
}
}
dt.Rows.Add(dr);
}
}
return dt;
}
void ClearData()
{
string SqlSvrConn = @"Server=XXXXXX-5QFK4BL\MSDEVOPS;Database=McardSurveillance;Trusted_Connection=True;MultipleActiveResultSets=true;";
using (var sqlConnection = new SqlConnection(SqlSvrConn))
{
sqlConnection.Open();
// Truncate the live table
using (var sqlCommand = new SqlCommand(_truncateLiveTableCommandText, sqlConnection))
{
sqlCommand.ExecuteNonQuery();
}
}
}
void InsertDATA()
{
string SqlSvrConn = @"Server=XXXXXX-5QFK4BL\MSDEVOPS;Database=McardSurveillance;Trusted_Connection=True;MultipleActiveResultSets=true;";
DataTable table = GetDATATable();
using (var sqlBulkCopy = new SqlBulkCopy(SqlSvrConn))
{
sqlBulkCopy.DestinationTableName = "dbo.Cards";
for (var count = 0; count < table.Columns.Count; count++)
{
sqlBulkCopy.ColumnMappings.Add(count, count);
}
sqlBulkCopy.WriteToServer(table);
}
}
}
How can I identify and possibly exclude the extra data columns being returned from the CSV file?
It appears there is a mismatch between the number of columns in the DataTable and the number of columns being read from the CSV file.
I'm not sure, however, how I can account for this in my logic. For now I did not want to switch to a CSV parsing package; rather, I need insight on how I can remove the extra columns, or ensure that the splitting accounts for all possible dubious characters.
For clarity, I have a copy of the CSV file here:
CSV_FILE
I have 300 CSV files; each file contains 18000 rows and 27 columns.
Now, I want to make a Windows Forms application which imports them, shows them in a DataGridView, and does some mathematical operations later.
But the performance is very poor...
After searching for this problem on Google, I found a solution, "A Fast CSV Reader":
(http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader)
I followed the code step by step, but my DataGridView is still empty.
I don't know how to solve this problem.
Could anyone tell me how to do this, or give me another, better way to read CSV files efficiently?
Here is my code...
using System.IO;
using LumenWorks.Framework.IO.Csv;
private void Form1_Load(object sender, EventArgs e)
{
ReadCsv();
}
void ReadCsv()
{
// open the file "data.csv" which is a CSV file with headers
using (CachedCsvReader csv = new
CachedCsvReader(new StreamReader("data.csv"), true))
{
// Field headers will automatically be used as column names
dataGridView1.DataSource = csv;
}
}
Here is my input data:
https://dl.dropboxusercontent.com/u/28540219/20130102.csv
Thanks...
The data you provide contains no headers (the first line is a data line), so I got an ArgumentException (an item with the same key has already been added) when I tried to bind the CSV reader to the DataSource. Setting the hasHeaders parameter to false in the CachedCsvReader constructor did the trick, and it added the data to the DataGridView (very fast).
using (CachedCsvReader csv = new CachedCsvReader(new StreamReader("data.csv"), false))
{
dataGridView.DataSource = csv;
}
Hope this helps!
You can also do it like this:
private void ReadCsv()
{
string filePath = #"C:\..\20130102.csv";
FileStream fileStream = null;
try
{
fileStream = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
}
catch (Exception ex)
{
return;
}
DataTable table = new DataTable();
bool isColumnCreated = false;
using (StringReader reader = new StringReader(new StreamReader(fileStream, Encoding.Default).ReadToEnd()))
{
while (reader.Peek() != -1)
{
string line = reader.ReadLine();
if (line == null || line.Length == 0)
continue;
string[] values = line.Split(',');
if(!isColumnCreated)
{
for(int i=0; i < values.Count(); i++)
{
table.Columns.Add("Column" + i);
}
isColumnCreated = true;
}
DataRow row = table.NewRow();
for(int i=0; i < values.Count(); i++)
{
row[i] = values[i];
}
table.Rows.Add(row);
}
}
dataGridView1.DataSource = table;
}
Based on your performance requirements, this code can be improved; it is just a working sample for your reference.
I hope this will give you some idea.
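As one possible improvement (a sketch under the same assumptions as the sample above, i.e. a plain comma-delimited file with no quoted fields), reading the file line by line through the StreamReader avoids materialising the entire file as one string before parsing it:
// Sketch: same column/row logic as above, but streaming the file line by line.
DataTable table = new DataTable();
bool isColumnCreated = false;
using (var reader = new StreamReader(filePath, Encoding.Default))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Length == 0)
            continue;
        string[] values = line.Split(',');
        if (!isColumnCreated)
        {
            // Create one string column per field of the first line.
            for (int i = 0; i < values.Length; i++)
                table.Columns.Add("Column" + i);
            isColumnCreated = true;
        }
        DataRow row = table.NewRow();
        for (int i = 0; i < values.Length && i < table.Columns.Count; i++)
            row[i] = values[i];
        table.Rows.Add(row);
    }
}
dataGridView1.DataSource = table;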
Hi, I'm using CsvHelper to read in CSV files with a variable number of columns. The first row always contains a header row. The number of columns is unknown at first; sometimes there are three columns and sometimes there are 30+. The number of rows can be large.
I can read in the CSV file, but how do I address each column of data? I need to do some basic stats on the data (e.g. min, max, stddev), then write them out in a non-CSV format.
Here is my code so far...
try{
using (var fileReader = File.OpenText(inFile))
using (var csvResult = new CsvHelper.CsvReader(fileReader))
{
// read the header line
csvResult.Read();
// read the whole file
dynamic recs = csvResult.GetRecords<dynamic>().ToList();
/* now how do I get a whole column ???
* recs.getColumn ???
* recs.getColumn['hadername'] ???
*/
}
}
catch (Exception ex)
{
MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
}
Thanks
I don't think the library is capable of doing this directly. You have to read your column from the individual fields and add them to a List, but the process is usually fast because the readers do their job quickly. For example, if your desired column is of type string, the code would look like this:
List<string> myStringColumn= new List<string>();
using (var fileReader = File.OpenText(inFile))
using (var csvResult = new CsvHelper.CsvReader(fileReader))
{
while (csvResult.Read())
{
string stringField=csvResult.GetField<string>("Header Name");
myStringColumn.Add(stringField);
}
}
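The question also asks about basic statistics (min, max, stddev). Once a column has been collected into a List as above, something along these lines works; this is only a sketch that assumes every field in the column parses as a double and that System.Linq is in scope:
// Sketch: simple statistics over one numeric column read as above.
List<double> values = myStringColumn.Select(double.Parse).ToList();

double min = values.Min();
double max = values.Max();
double mean = values.Average();
// Population standard deviation; divide by (values.Count - 1) for the sample version.
double stdDev = Math.Sqrt(values.Sum(v => (v - mean) * (v - mean)) / values.Count);

Console.WriteLine($"min={min}, max={max}, stddev={stdDev}");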
using (System.IO.StreamReader file = new System.IO.StreamReader(Server.MapPath(filepath)))
{
//Csv reader reads the stream
CsvReader csvread = new CsvReader(file);
while (csvread.Read())
{
int count = csvread.FieldHeaders.Count();
if (count == 55)
{
DataRow dr = myExcelTable.NewRow();
if (csvread.GetField<string>("FirstName") != null)
{
dr["FirstName"] = csvread.GetField<string>("FirstName"); ;
}
else
{
dr["FirstName"] = "";
}
if (csvread.GetField<string>("LastName") != null)
{
dr["LastName"] = csvread.GetField<string>("LastName"); ;
}
else
{
dr["LastName"] = "";
}
}
else
{
lblMessage.Visible = true;
lblMessage.Text = "Columns are not in specified format.";
lblMessage.ForeColor = System.Drawing.Color.Red;
return;
}
}
}
I have a remote SQL connection in C# that needs to execute a query and save its results to the user's local hard disk. There is a fairly large amount of data this thing can return, so I need an efficient way of storing it. I've read before that first putting the whole result into memory and then writing it is not a good idea, so if someone could help, that would be great!
I am currently storing the SQL result data in a DataTable, although I am thinking it might be better to do something in a while (myReader.Read()) {...} loop.
Below is the code that gets the results:
DataTable t = new DataTable();
string myQuery = QueryLoader.ReadQueryFromFileWithBdateEdate(@"Resources\qrs\qryssysblo.q", newdate, newdate);
using (SqlDataAdapter a = new SqlDataAdapter(myQuery, sqlconn.myConnection))
{
a.Fill(t);
}
var result = string.Empty;
for(int i = 0; i < t.Rows.Count; i++)
{
for (int j = 0; j < t.Columns.Count; j++)
{
result += t.Rows[i][j] + ",";
}
result += "\r\n";
}
So now I have this huge result string, and I have the DataTable as well. Surely there has to be a much better way of doing this?
Thanks.
You are on the right track yourself. Use a loop with while (myReader.Read()) {...} and write each record to the text file inside the loop. The .NET Framework and operating system will take care of flushing the buffers to disk in an efficient way.
using(SqlConnection conn = new SqlConnection(connectionString))
using(SqlCommand cmd = conn.CreateCommand())
{
conn.Open();
cmd.CommandText = QueryLoader.ReadQueryFromFileWithBdateEdate(
#"Resources\qrs\qryssysblo.q", newdate, newdate);
using(SqlDataReader reader = cmd.ExecuteReader())
using(StreamWriter writer = new StreamWriter(@"c:\temp\file.txt"))
{
while(reader.Read())
{
// Using Name and Phone as example columns.
writer.WriteLine("Name: {0}, Phone : {1}",
reader["Name"], reader["Phone"]);
}
}
}
I came up with this; it's a better CSV writer than the ones in the other answers:
public static class DataReaderExtension
{
public static void ToCsv(this IDataReader dataReader, string fileName, bool includeHeaderAsFirstRow)
{
const string Separator = ",";
StreamWriter streamWriter = new StreamWriter(fileName);
StringBuilder sb = null;
if (includeHeaderAsFirstRow)
{
sb = new StringBuilder();
for (int index = 0; index < dataReader.FieldCount; index++)
{
if (dataReader.GetName(index) != null)
sb.Append(dataReader.GetName(index));
if (index < dataReader.FieldCount - 1)
sb.Append(Separator);
}
streamWriter.WriteLine(sb.ToString());
}
while (dataReader.Read())
{
sb = new StringBuilder();
for (int index = 0; index < dataReader.FieldCount; index++)
{
if (!dataReader.IsDBNull(index))
{
string value = dataReader.GetValue(index).ToString();
if (dataReader.GetFieldType(index) == typeof(String))
{
if (value.IndexOf("\"") >= 0)
value = value.Replace("\"", "\"\"");
if (value.IndexOf(Separator) >= 0)
value = "\"" + value + "\"";
}
sb.Append(value);
}
if (index < dataReader.FieldCount - 1)
sb.Append(Separator);
}
streamWriter.WriteLine(sb.ToString());
}
dataReader.Close();
streamWriter.Close();
}
}
usage: mydataReader.ToCsv("myfile.csv", true)
Rob Sedgwick's answer is more like it, but it can be improved and simplified. This is how I did it:
string separator = ";";
string fieldDelimiter = "";
bool useHeaders = true;
bool first;
string line;
// "response" below is assumed to be the ASP.NET HttpResponse (or any similar text writer) available in this context.
string connectionString = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
using (SqlConnection conn = new SqlConnection(connectionString))
{
using (SqlCommand cmd = conn.CreateCommand())
{
conn.Open();
string query = #"SELECT whatever";
cmd.CommandText = query;
using (SqlDataReader reader = cmd.ExecuteReader())
{
if (!reader.Read())
{
return;
}
List<string> columnNames = GetColumnNames(reader);
// Write headers if required
if (useHeaders)
{
first = true;
foreach (string columnName in columnNames)
{
response.Write(first ? string.Empty : separator);
line = string.Format("{0}{1}{2}", fieldDelimiter, columnName, fieldDelimiter);
response.Write(line);
first = false;
}
response.Write("\n");
}
// Write all records
do
{
first = true;
foreach (string columnName in columnNames)
{
response.Write(first ? string.Empty : separator);
string value = reader[columnName] == DBNull.Value ? string.Empty : reader[columnName].ToString();
line = string.Format("{0}{1}{2}", fieldDelimiter, value, fieldDelimiter);
response.Write(line);
first = false;
}
response.Write("\n");
}
while (reader.Read());
}
}
}
And you need the GetColumnNames helper function:
List<string> GetColumnNames(IDataReader reader)
{
List<string> columnNames = new List<string>();
for (int i = 0; i < reader.FieldCount; i++)
{
columnNames.Add(reader.GetName(i));
}
return columnNames;
}
I agree that your best bet here would be to use a SqlDataReader. Something like this:
StreamWriter YourWriter = new StreamWriter(#"c:\testfile.txt");
SqlCommand YourCommand = new SqlCommand();
SqlConnection YourConnection = new SqlConnection(YourConnectionString);
YourCommand.Connection = YourConnection;
YourCommand.CommandText = myQuery;
YourConnection.Open();
using (YourConnection)
{
using (SqlDataReader sdr = YourCommand.ExecuteReader())
using (YourWriter)
{
while (sdr.Read())
YourWriter.WriteLine(sdr[0].ToString() + sdr[1].ToString() + ",");
}
}
Mind you, in the while loop, you can write that line to the text file in any format you see fit with the column data from the SqlDataReader.
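For example, a variation of that loop that writes every column of each row, comma-separated, might look like the sketch below (no quoting or escaping is done here; see the ToCsv extension above for a fuller treatment):
while (sdr.Read())
{
    // Convert each column of the current row to text, using an empty string for NULLs.
    var fields = new string[sdr.FieldCount];
    for (int i = 0; i < sdr.FieldCount; i++)
        fields[i] = sdr.IsDBNull(i) ? string.Empty : sdr.GetValue(i).ToString();
    YourWriter.WriteLine(string.Join(",", fields));
}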
Keeping your original approach, here is a quick win:
Instead of using a string as a temporary buffer, use a StringBuilder. That lets you call .Append(string) for concatenation instead of using the += operator.
The += operator is especially inefficient: strings are immutable, so every += allocates a new string and copies the old contents, and if you place it in a loop that repeats (potentially) millions of times, performance suffers badly.
The .Append(string) method writes into the builder's internal buffer instead of creating a new string each time, so it's much faster.
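Applied to the loop in the question, a sketch of that change (same DataTable t, plus a using System.Text directive for StringBuilder) looks like this:
// Sketch: the question's row/column loop, accumulating into a StringBuilder instead of a string.
var sb = new StringBuilder();
for (int i = 0; i < t.Rows.Count; i++)
{
    for (int j = 0; j < t.Columns.Count; j++)
    {
        sb.Append(t.Rows[i][j]);
        sb.Append(',');
    }
    sb.Append("\r\n");
}
string result = sb.ToString();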
Using the Response object without a Response.Close() causes, at least in some instances, the HTML of the page writing out the data to be appended to the file. If you use Response.Close(), the connection can be closed prematurely and cause an error while producing the file.
Using HttpApplication.CompleteRequest() is the recommended alternative; however, this appears to always cause the HTML to be written to the end of the file.
I have tried a stream in conjunction with the Response object and have had success in the development environment. I have not tried it in production yet.
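For reference, the usual shape of streaming CSV through the Response in classic ASP.NET (Web Forms) is sketched below; treat it only as a sketch, since where you end the request depends on the trade-offs described above, and BuildCsvLine is a hypothetical helper that formats one record from the data reader:
// Sketch: stream CSV to the browser instead of writing a server-side file.
Response.Clear();
Response.ContentType = "text/csv";
Response.AddHeader("Content-Disposition", "attachment; filename=export.csv");

while (reader.Read())
{
    Response.Write(BuildCsvLine(reader)); // hypothetical helper that builds one CSV line
    Response.Write("\n");
}

// Ends the request without the ThreadAbortException thrown by Response.End(),
// but note the caveat above about trailing page HTML.
HttpContext.Current.ApplicationInstance.CompleteRequest();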
I used CSV to export data from the database via a DataReader. In my project I read the DataReader and create the .CSV file manually: in a loop I read the DataReader, and for every row I append the cell values to a result string, using "," to separate columns and "\n" to separate rows. Finally, I save the result string as result.csv.
I suggest this high-performance extension; I tested it and it quickly exported 600,000 rows as .CSV.
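A sketch of the approach described above (the separators follow the description, a StringBuilder is used instead of string concatenation for the reasons given earlier, and reader is assumed to be an open DataReader):
// Sketch: build the CSV text manually from a DataReader, "," between columns, "\n" between rows.
var sb = new StringBuilder();
while (reader.Read())
{
    for (int i = 0; i < reader.FieldCount; i++)
    {
        if (i > 0) sb.Append(',');
        if (!reader.IsDBNull(i))
            sb.Append(reader.GetValue(i));
    }
    sb.Append('\n');
}
File.WriteAllText("result.csv", sb.ToString());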
I use:
private void SaveData(string path)
{
DataTable tblResult = new DataTable();
using(SqlCommand cm = new SqlCommand("select something", objConnect))
{
tblResult.Load(cm.ExecuteReader());
}
if (tblResult != null)
{
using(FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
{
BinaryFormatter bin = new BinaryFormatter();
bin.Serialize(fs, tblResult);
}
}
}
It is easy to use, and easy to load back, with:
private DataTable LoadData(string path)
{
DataTable t = new DataTable();
using(FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
{
BinaryFormatter bin = new BinaryFormatter();
t = (DataTable)bin.Deserialize(fs);
}
return t;
}
You can also use this method to save a DataSet.
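A usage sketch under the same assumptions (objConnect and the path are placeholders, and the two methods above need a using System.Runtime.Serialization.Formatters.Binary directive); note that BinaryFormatter is considered insecure and is obsolete in current .NET, so this approach only suits trusted, local files:
// Sketch: round-trip the query result through a local binary file.
string cachePath = @"C:\temp\result.bin"; // placeholder path

SaveData(cachePath);                    // run the query and serialize the DataTable
DataTable cached = LoadData(cachePath); // later: read it back without hitting the database

Console.WriteLine($"Loaded {cached.Rows.Count} cached rows.");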