C# StreamRead in CSV file with a field containing "," - c#

I'm working in a closed environment where I cannot install additional packages and have limited ability to use .NET Framework classes. I also have no control over the format of the CSV file that I receive.
I receive a CSV file that must be pulled into our business system and updates the database.
I can pull the file in to a DataTable via the below code ...
CSV File Ex:
Order# Qty Description ...
12345 3 desc1, desc2, desc3, etc..
while (!sr.EndOfStream)
{
string[] rows = sr.ReadLine().Split(',');
DataRow dr = dt.NewRow();
for (int i = 0; i < rows.Length; i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
However, the problem is that one field in the CSV file is a description that contains multiple "," characters. Doing the above loads each comma separated word set in the description value into its own index in the rows array.
Currently there should be a total of 10 columns in the CSV file, but because of the description field issue the number of columns varies with the number of commas in the description field: 10, 15, 22 columns, etc.
I have no control over the format of the CSV file before it's sent. Is there any way to get around this. Even skipping over this field when creating the DataTable would be fine for my purposes.
Thanks

You can use a text qualifier (a quote character) to enclose every field so that commas or semicolons inside a field are not treated as delimiters. The following method, which uses the CsvHelper package, should fix the problem.
Install-Package CsvHelper
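// delimiter and textQualifier were undefined in the original snippet; they are assumed here to be parameters
public static DataTable ReadCSVToDataTable(string path, string delimiter, string textQualifier)
{
CsvHelper.Configuration.CsvConfiguration config = new CsvHelper.Configuration.CsvConfiguration();
config.Delimiter = delimiter;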
config.Encoding = new UTF8Encoding(false);
if (string.IsNullOrEmpty(textQualifier))
{
config.QuoteAllFields = false;
}
else
{
char qualifier = textQualifier.ToCharArray()[0];
config.Quote = qualifier;
config.QuoteAllFields = true;
}
DataTable dt = new DataTable();
using (var sr = new StreamReader(path))
{
using (var reader = new CsvReader(sr, config))
{
int j = 0;
while (reader.Read())
{
if (j == 0)
{
if (config.HasHeaderRecord)
{
foreach (string header in reader.FieldHeaders)
dt.Columns.Add(header);
}
else
{
for (int i = 0; i < reader.CurrentRecord.Length; i++)
dt.Columns.Add();
}
j++;
}
AddRow(dt, reader);
}
}
}
return dt;
}

Fstagger, this should work for you, assuming only one column contains internal commas and the CSV is properly formed (in particular, the Description field begins with ," and ends with ",). Replace my example INDEX_OF_DESCRIPTION with the actual value.
int iDescStart = 0;
int iDescEnd = 0;
string zLine = "";
const int INDEX_OF_DESCRIPTION = 3;
const char SEPARATOR = '\u001F'; //ASCII Unit Separator, decimal 31
while(!sr.EndOfStream){
zLine = sr.ReadLine();
iDescStart = zLine.IndexOf(",\"");
iDescEnd = zLine.IndexOf("\",");
zLine = zLine.Substring(0, iDescStart)
+ ","
+ zLine.Substring(iDescStart + 2, iDescEnd - iDescStart - 2).Replace(',', SEPARATOR)
+ ","
+ zLine.Substring(iDescEnd + 2);
string[] zaFields = zLine.Split(',');
zaFields[INDEX_OF_DESCRIPTION] = zaFields[INDEX_OF_DESCRIPTION].Replace(SEPARATOR, ',');
DataRow dr = dt.NewRow();
for (int i = 0; i < zaFields.Length; i++){
dr[i] = zaFields[i];
}
dt.Rows.Add(dr);
}
Let me know if this works for you : )

It looks like your CSV has fixed size columns padded with spaces. So I guess you'd be better off reading a fixed amount of characters for each column and trim the trailing spaces, instead of splitting with comma.

Try this class.
It deals with commas how you need.

My Solution that ended up working
while (!sr.EndOfStream)
{
string[] rows = sr.ReadLine().Split(',');
var fullrow = String.Empty;
foreach (var entry in rows)
{
fullrow += entry.ToString() + ",";
}
var startQuote = fullrow.IndexOf("\"");
var endQuote = fullrow.IndexOf("\"", startQuote + 1); //LastIndexOf("\"");
if (startQuote > -1 && endQuote > -1)
{
var substring = fullrow.Substring(startQuote, Math.Abs(startQuote - endQuote));
substring = substring.Replace(',', ' ');
fullrow = fullrow.Remove(startQuote, Math.Abs(startQuote - endQuote)).Insert(startQuote, substring);
}
rows = fullrow.Split(',');
DataRow dr = dt.NewRow();
for (int i = 0; i < rows.Length; i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
Thanks @Michael Gorsich for the alternate code!
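For anyone who needs to keep the commas in the description rather than replace them with spaces, here is a minimal sketch of a quote-aware split that uses only base string operations, so it should fit the no-extra-packages constraint. It assumes the description is wrapped in double quotes (as the solution above implies) and that no field spans more than one line. The class name CsvLine and the method name Split are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one CSV line on commas, ignoring commas that sit inside double quotes.
    // A doubled quote ("") inside a quoted field is read as one literal quote.
    public static string[] Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // escaped quote inside a quoted field
                        i++;
                    }
                    else
                    {
                        inQuotes = false; // closing quote
                    }
                }
                else
                {
                    current.Append(c); // commas are kept as data here
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field boundary
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // last field has no trailing comma
        return fields.ToArray();
    }
}
```

In the original loop this would replace sr.ReadLine().Split(',') with CsvLine.Split(sr.ReadLine()), so the description always lands in a single column.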

Related

How can I write a string to format row data to save as a CSV file?

I currently have a program which uses StreamReader to access a CSV file and store the values in a data grid, however when saving this data it is printing a new line for each column value of the data row.
The program currently prints the csv file as:
headerText, headerText, headerText, headerText
Column 1, Column 2, Column 1, Column 2, Column 3, Column 1, Column 2, Column 3, Column 4
What I need it to print is:
headerText, headerText, headerText, headerText
Column 1, Column 2, Column 3, Column 4
string CsvFpath = "C:/StockFile/stockfiletest.csv";
try
{
StreamWriter csvFileWriter = new StreamWriter(CsvFpath, false);
string columnHeaderText = "";
int countColumn = stockGridView.ColumnCount - 1;
if (countColumn >= 0)
{
columnHeaderText = stockGridView.Columns[0].HeaderText;
}
for (int i = 1; i <= countColumn; i++)
{
columnHeaderText = columnHeaderText + ',' + stockGridView.Columns[i].HeaderText;
}
csvFileWriter.WriteLine(columnHeaderText);
foreach (DataGridViewRow dataRowObject in stockGridView.Rows)
{
if (!dataRowObject.IsNewRow)
{
string dataFromGrid = "{0} += {1} += {2} += {3}";
dataFromGrid = dataRowObject.Cells[0].Value.ToString();
for (int i = 1; i <= countColumn; i++)
{
dataFromGrid = dataFromGrid + ',' + dataRowObject.Cells[i].Value.ToString();
csvFileWriter.Write(dataFromGrid);
}
csvFileWriter.WriteLine();
}
}
csvFileWriter.Dispose();
MessageBox.Show("Saved stockfile.csv");
}
catch (Exception exceptionObject)
{
MessageBox.Show(exceptionObject.ToString());
}
Can anyone tell me what I'm doing wrong with my String formation and how to achieve the required file output?
As mentioned in another answer, the issue is that you are writing to the file inside the loop as you process each column, instead of after the loop, when you have collected all the column information for the row.
Another way you could do this is to use string.Join and System.Linq to more concisely concatenate the column values for each row.
Also note that we can wrap the csvFileWriter in a using block, so that it automatically gets closed and disposed when the block execution completes:
using (var csvFileWriter = new StreamWriter(CsvFpath, false))
{
// Write all the column headers, joined with a ','
csvFileWriter.WriteLine(string.Join(",",
stockGridView.Columns.Cast<DataGridViewColumn>().Select(col => col.HeaderText)));
// Grab all the rows that aren't new and, for each one, join the cells with a ','
foreach (var row in stockGridView.Rows.Cast<DataGridViewRow>()
.Where(row => !row.IsNewRow))
{
csvFileWriter.WriteLine(string.Join(",",
row.Cells.Cast<DataGridViewCell>().Select(cell => cell.Value.ToString())));
}
}
Another thing: Instead of writing your own csv parser, there are existing tools that you can use to write csv files, such as CsvHelper, which will handle other sorts of edge cases that can cause problems, such as values that have commas in them.
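If a library is not an option, the usual convention for values that contain commas is to wrap the field in double quotes and double any quotes inside it. The helper below is only a sketch of that convention; the names CsvField and Escape are made up for illustration and are not part of any library.

```csharp
public static class CsvField
{
    // Returns the value unchanged unless it contains a comma, a quote or a line break.
    // In that case the value is wrapped in double quotes and inner quotes are doubled.
    public static string Escape(string value)
    {
        if (value == null)
        {
            return string.Empty; // treat a null cell as an empty field
        }

        bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes)
        {
            return value;
        }

        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

In the string.Join version above, the Select could call CsvField.Escape(Convert.ToString(cell.Value)) instead of cell.Value.ToString(), which would also avoid an exception on empty cells.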
Your problem is here:
for (int i = 1; i <= countColumn; i++)
{
dataFromGrid = dataFromGrid + ',' + dataRowObject.Cells[i].Value.ToString();
csvFileWriter.Write(dataFromGrid);
}
You're appending to the string on every pass through the loop and never clearing it, and you're also writing it out on every pass. So for the first row you write "col0,col1", then "col0,col1,col2", and so on.
There is a CSV writer class, but you can keep what you're doing: just move the write outside the loop and reset the string for each row. Alternatively, write each cell as you go:
for (int i = 0; i <= countColumn; i++)
{
if (i > 0) csvFileWriter.Write(",");
csvFileWriter.Write(dataRowObject.Cells[i].Value.ToString());
}
This writes a separator only between cells, so you don't get an extra "," at the start, and it writes each value as it goes.

Try...catch returning nothing but code is still breaking

UPDATE: Before this method runs, the code collects the results of a SQL query into a DataSet. This DataSet is then dropped into Excel in the corresponding tab at a specific cell address (which is loaded from the form). The code below is the export-to-Excel method. I am getting the following error:
An unhandled exception of type 'System.AccessViolationException' occurred in SQUiRE (Sql QUery REtriever) v1.exe
Additional information: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
I have been tracking this for a while and thought I fixed it, but my solution was a false positive. So I am using a try...catch block that is breaking but not returning anything. Let me know if you all see anything that I am missing. I usually break on this line (templateSheet = templateBook.Sheets[tabName];) and on the same tabName. The tab is not locked or restricted so It can be written to and works more than half of the time.
public void ExportToExcel(DataSet dataSet, Excel.Workbook templateBook, int i, int h, Excel.Application excelApp) //string filePath,
{
try
{
lock (this.GetType())
{
Excel.Worksheet templateSheet;
//check to see if the template is already open, if its not then open it,
//if it is then bind it to work with it
//if (!fileOpenTest)
//{ templateBook = excelApp.Workbooks.Open(filePath); }
//else
//{ templateBook = (Excel.Workbook)System.Runtime.InteropServices.Marshal.BindToMoniker(filePath); }
//Grabs the name of the tab to dump the data into from the "Query Dumps" Tab
string tabName = lstQueryDumpSheet.Items[i].ToString();
templateSheet = templateBook.Sheets[tabName];
// Copy DataTable
foreach (System.Data.DataTable dt in dataSet.Tables)
{
// Copy the DataTable to an object array
object[,] rawData = new object[dt.Rows.Count + 1, dt.Columns.Count];
// Copy the values to the object array
for (int col = 0; col < dt.Columns.Count; col++)
{
for (int row = 0; row < dt.Rows.Count; row++)
{ rawData[row, col] = dt.Rows[row].ItemArray[col]; }
}
// Calculate the final column letter
string finalColLetter = string.Empty;
string colCharset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
int colCharsetLen = 26;
if (dt.Columns.Count > colCharsetLen)
{ finalColLetter = colCharset.Substring((dt.Columns.Count - 1) / colCharsetLen - 1, 1); }
finalColLetter += colCharset.Substring((dt.Columns.Count - 1) % colCharsetLen, 1);
/*Grabs the full cell address from the "Query Dump" sheet, splits on the '=' and
*pulls out only the cell address (i.e., "address=a3" becomes "a3")*/
string dumpCellString = lstQueryDumpText.Items[i].ToString();
string dumpCell = dumpCellString.Split('=').Last();
/*Refers to the range in which we are dumping the DataSet. The upper left hand cell is
*defined by 'dumpCell' and the bottom right cell is defined by the final column letter
*and the count of rows.*/
string firstRef = "";
string baseRow = "";
//Determines if the column is one letter or two and handles them accordingly
if (char.IsLetter(dumpCell, 1))
{
char[] createCellRef = dumpCell.ToCharArray();
firstRef = createCellRef[0].ToString() + createCellRef[1].ToString();
for (int z = 2; z < createCellRef.Count(); z++)
{ baseRow = baseRow + createCellRef[z].ToString(); }
}
else
{
char[] createCellRef = dumpCell.ToCharArray();
firstRef = createCellRef[0].ToString();
for (int z = 1; z < createCellRef.Count(); z++)
{ baseRow = baseRow + createCellRef[z].ToString(); }
}
int baseRowInt = Convert.ToInt32(baseRow);
int startingCol = ColumnLetterToColumnIndex(firstRef);
int endingCol = ColumnLetterToColumnIndex(finalColLetter);
int finalCol = startingCol + endingCol;
string endCol = ColumnIndexToColumnLetter(finalCol - 1);
int endRow = (baseRowInt + (dt.Rows.Count - 1));
string cellCheck = endCol + endRow;
string excelRange;
if (dumpCell.ToUpper() == cellCheck.ToUpper())
{ excelRange = string.Format(dumpCell + ":" + dumpCell); }
else
{ excelRange = string.Format(dumpCell + ":{0}{1}", endCol, endRow); }
//Dumps the cells into the range on Excel as defined above
templateSheet.get_Range(excelRange, Type.Missing).Value2 = rawData;
/*Check to see if all the SQL queries have been run from
if (i == lstSqlAddress.Items.Count - 1)
{
//Turn Auto Calc back on
excelApp.Calculation = Excel.XlCalculation.xlCalculationAutomatic;
/*Run through the value save sheet array then grab the address from the corresponding list
*place in the address array. If the address reads "whole sheet" then save the whole page,
*else set the addresses range and value save that.
for (int y = 0; y < lstSaveSheet.Items.Count; y++)
{
MessageBox.Show("Save Sheet: " + lstSaveSheet.Items[y] + "\n" + "Save Address: " + lstSaveRange.Items[y]);
}*/
//run the macro to hide the unused columns
excelApp.Run("ReportMakerExecute");
//save excel file as hospital name and move onto the next
SaveTemplateAs(templateBook, h);
}
}
}
}
catch (Exception e)
{
MessageBox.Show(e.ToString());
}
}

Issue with .NET String.Split

I'm attempting to parse a text file containing data that is used on a remote FTP server. The data is delimited by an equals sign (=) and I'm attempting to load each row into two columns in a DataGridView. The code I have written works fine except when an equals character appears in the second column's value. When this happens, the row is not parsed correctly, even though I specify a maximum count. I'd prefer not to change the delimiter if possible.
Here is the code that is being problematic:
dataGrid_FileContents.Rows.Clear();
char delimiter = '=';
StreamReader fileReader = new StreamReader(fileLocation);
String fileData = fileReader.ReadToEnd();
String[] rows = fileData.Split("\n".ToCharArray());
for(int i = 0; i < rows.Length; i++)
{
String str = rows[i];
String[] items = str.Split(new char[] { delimiter }, 1, StringSplitOptions.RemoveEmptyEntries);
if (items.Length == 2)
{
dataGrid_FileContents.Rows.Add(items[0], items[1]);
}
}
fileReader.Close();
And an example of the file being loaded:
boats=123
cats=234-f
cars==1
It works as intended for the first two rows and then ignores the last row as it ends up creating a String[] with 1 element and two String[]s with zero elements.
Try the following. It will capture the value before and after the first '=', correctly parsing the cars==1 scenario.
String[] items = str.Split(new char[] { delimiter }, 2, StringSplitOptions.None);
A different solution, if you want everything after the first equals then you could approach this problem using string.IndexOf
for(int i = 0; i < rows.Length; i++)
{
String str = rows[i];
int pos = str.IndexOf(delimiter);
if (pos != -1)
{
string first = str.Substring(0, pos);
string second = str.Substring(pos + 1);
dataGrid_FileContents.Rows.Add(first, second);
}
}
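The IndexOf approach can be wrapped in a small helper, sketched below. The names KeyValueLine and TrySplit are made up for illustration. One extra assumption: because the question splits the file on "\n", rows from a file with Windows line endings may still end in "\r", so the helper trims it.

```csharp
public static class KeyValueLine
{
    // Splits "key=value" at the FIRST delimiter only; later delimiters stay in the value.
    // Returns false when the line is null or has no delimiter at all.
    public static bool TrySplit(string line, char delimiter, out string key, out string value)
    {
        key = null;
        value = null;
        if (line == null)
        {
            return false;
        }

        line = line.TrimEnd('\r'); // left over when a CRLF file is split on '\n'
        int pos = line.IndexOf(delimiter);
        if (pos < 0)
        {
            return false;
        }

        key = line.Substring(0, pos);    // everything before the first delimiter
        value = line.Substring(pos + 1); // everything after it, delimiters included
        return true;
    }
}
```

Inside the loop, a row would be added only when KeyValueLine.TrySplit(rows[i], delimiter, out first, out second) returns true.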
Just read all the items delimited by '=' in each row.
Then iterate over the items, keep the ones that are not empty, and use that prepared data to fill the grid.
Here is a snippet that illustrates this:
http://dotnetfiddle.net/msVho2
Your snippet can be transformed into something like the one below:
dataGrid_FileContents.Rows.Clear();
char delimiter = '=';
using(StreamReader fileReader = new StreamReader(fileLocation))
{
string[] data = new string[2];
while(true)
{
string row = fileReader.ReadLine();
if(row == null)
break;
string[] items = row.Split(delimiter);
int data_index = 0;
foreach(string item in items)
{
if(data_index >= data.Length)
{
//TODO: log warning
break;
}
if(!string.IsNullOrWhiteSpace(item))
{
data[data_index++] = item;
}
}
if(data_index < data.Length)
{
//TODO: log error, only 1 item in row
continue;
}
dataGrid_FileContents.Rows.Add(data[0], data[1]);
}
}

Writing rows from a DataTable is creating a run-on write: how do I preserve each line as it is written?

Given this code:
using (StreamWriter sw = File.CreateText(file))
{
for (int r = 0; r < originalDataTable.Rows.Count; r++)
{
for (int c = 0; c < originalDataTable.Columns.Count; c++)
{
var rowValueAtColumn = originalDataTable.Rows[r][c].ToString();
var valueToWrite = string.Format(@"{0}{1}", rowValueAtColumn, "\t");
if (c != originalDataTable.Columns.Count)
sw.Write(valueToWrite);
else
sw.Write(valueToWrite + @"\n");
}
}
}
I am trying to write a DataRow back to a file one row at a time; however, the file it is creating is creating a run-on sentence where all the data being written to the file is just in one line. There should be 590 individual lines not just one.
What do I need to add to the code above so that my lines are broken out as they are in the data table? My code just doesn't seem to be working.
sw.Write(valueToWrite + @"\n"); is wrong. Because of the @ the string is a verbatim literal, so it does not write a newline; it writes the character \ followed by the character n.
You want to do either sw.Write(valueToWrite + "\n"); or have the program put a new line in for you by doing sw.WriteLine(valueToWrite), however that will enter Environment.NewLine which is \r\n on windows.
However, you can make your code even simpler by writing the row separator outside of the column for loop. I also defined the two separators above the loop in case you ever want to change them (what will the program you are sending this to do when it hits data that has a \t or a \n in the text itself?), and made a few other small tweaks to make the code easier to read.
string colSeperator = "\t";
string rowSeperator = "\n";
using (StreamWriter sw = File.CreateText(file))
{
for (int r = 0; r < originalDataTable.Rows.Count; r++)
{
for (int c = 0; c < originalDataTable.Columns.Count; c++)
{
sw.Write(originalDataTable.Rows[r][c]);
sw.Write(colSeperator);
}
sw.Write(rowSeperator);
}
}
Here is another simplification, just to show another way to do it (now that you don't need to check originalDataTable.Columns.Count):
string colSeperator = "\t";
string rowSeperator = "\n";
using (StreamWriter sw = File.CreateText(file))
{
foreach (DataRow row in originalDataTable.Rows)
{
foreach (object value in row.ItemArray)
{
sw.Write(value);
sw.Write(colSeperator);
}
sw.Write(rowSeperator);
}
}
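Both versions above write a column separator after the last value of every row. If the receiving program minds the trailing tab, one option is to join the values of each row instead, as in this sketch. The names TableWriter and Write are made up for illustration, and it takes a TextWriter so it works with a StreamWriter or a StringWriter.

```csharp
using System;
using System.Data;
using System.IO;
using System.Linq;

public static class TableWriter
{
    // Writes each DataRow as one line: values joined by colSeparator, then rowSeparator.
    // Because the values are joined, no separator follows the last column.
    public static void Write(DataTable table, TextWriter writer, string colSeparator, string rowSeparator)
    {
        foreach (DataRow row in table.Rows)
        {
            // Convert.ToString turns DBNull into an empty string
            var values = row.ItemArray.Select(v => Convert.ToString(v));
            writer.Write(string.Join(colSeparator, values));
            writer.Write(rowSeparator);
        }
    }
}
```

With the code from the question this would be called as TableWriter.Write(originalDataTable, sw, "\t", "\n") inside the using block.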
Change sw.Write() to sw.WriteLine()
http://msdn.microsoft.com/en-us/library/system.io.streamwriter.writeline.aspx

Data truncation message when inserting data from one row to another row

Is there a way to allow data to be truncated if it's too long when inserting it from one row to another row? What's happening now is that it throws an error and doesn't add the row if one of the fields is too long.
Here is the piece of code that I have:
DataRow dr = dt.NewRow();
for (int j = 0; j < edi50.Columns.Count; j++)
dr[j] = dr50[j];
dt.Rows.Add(dr);
try
{
RemoveNulls(dt);
daEDI40050.Update(dt);
}
catch (Exception e)
{
string m = e.Message;
}
I have a description field 25 chars long but the data is 34 chars going into it. I want to be able to have it insert the 25 and truncate the rest and still add the row.
thank you
You could get the schema from database first(untested):
DataTable schema;
using (var con = new System.Data.SqlClient.SqlConnection(conStr))
{
var getSchemaSql = String.Format("SELECT * FROM {0}", tableName);
using (var schemaCommand = new System.Data.SqlClient.SqlCommand(getSchemaSql, con))
{
con.Open();
using (var reader = schemaCommand.ExecuteReader(CommandBehavior.SchemaOnly))
{
schema = reader.GetSchemaTable();
}
}
}
and then something similar to this:
for (int j = 0; j < schema.Rows.Count; j++)
{
DataRow schemaRow = schema.Rows[j];
Type dataType = schemaRow.Field<Type>("DataType");
int columnSize = schemaRow.Field<int>("ColumnSize");
if (dataType.FullName == "System.String")
{
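String value = dr50[j] as String;
// Substring throws if the value is shorter than columnSize, so check the length first
if (value != null && value.Length > columnSize)
dr[j] = value.Substring(0, columnSize); // store the truncated value in the row being added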
}
}
Again, totally untested and written from scratch, but it might give you an idea how to get the column size. Of course this works only for string, but i assume that this is what you want.
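If you know which column has the limit, and which field that column maps to, then simply truncate the field's value on all objects before calling Update(). Math.Min keeps Substring from throwing when the value is already shorter than 25 characters:
myObject.StringField = myObject.StringField.Substring(0, Math.Min(25, myObject.StringField.Length));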
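If several columns need the same treatment, the check can live in one place. The extension method below is just a sketch: the names StringExtensions and Truncate are made up and are not part of the framework. It also passes null through, which the one-liner above does not.

```csharp
public static class StringExtensions
{
    // Returns at most maxLength characters of value.
    // Null, empty and already-short strings are returned unchanged.
    public static string Truncate(this string value, int maxLength)
    {
        if (string.IsNullOrEmpty(value) || value.Length <= maxLength)
        {
            return value;
        }

        return value.Substring(0, maxLength);
    }
}
```

The assignment then becomes myObject.StringField = myObject.StringField.Truncate(25);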
