[C#]TextFieldParser not behaving as expected

I am trying to read the CSV that you can download from here: https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=planets . Just click on "Download Table" and select CSV, all columns, all rows.
The code has two problems:
How do I get the comments recognized? I expect the class to simply skip them and not put them in the fields variable, but it does.
Why is the number of columns wrong? There are 403, but it finds 405. According to pandas (Python 3) there are 403. In fact, when I try to use TextFieldParser for more complicated operations on this CSV I get index-out-of-range errors on the array (of course: there are 403 columns, but it thinks there are 405).
Code:
private void loadData(string fileName) {
int rows = 0;
int columns = 0;
using (TextFieldParser parser = new TextFieldParser(fileName, Encoding.UTF8))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
parser.CommentTokens = new []{"#"};
parser.TrimWhiteSpace = false;
parser.HasFieldsEnclosedInQuotes = false;
while (!parser.EndOfData)
{
//Process row
string[] fields = parser.ReadFields();
foreach (string field in fields)
{
//TODO: Process field
}
if (fields.Length == 0) {
//Should be a commment
printLine("Comment found on row " + rows);
}
if (fields.Length > columns)
columns = fields.Length;
rows++;
}
printLine ("Rows: " + rows);
printLine ("Columns: " + columns);
printLine ("Errors on line: " + parser.ErrorLineNumber);
}
}

To ignore the commented lines you need to change your parser.CommentTokens statement to use new string[] as below
parser.CommentTokens = new string []{"#"};
Once you change that, the comments will be ignored. Three lines in the file have a different number of columns than the 403 that all the others have.
I added the check below to find the lines with more than 403 fields (lines 159, 3310, and 3311 have 404 or 405 fields). Here lineNo is a counter that is incremented for every row read:
if (fields.Length > 403)
{
    Console.WriteLine($"Line:{lineNo} has {fields.Length}.");
}
With that in place you can at least do some kind of checking or cleanup on the lines that have more fields than expected.
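One possible explanation for the rows with 404 or 405 fields (an assumption; it has not been checked against the actual file) is that they contain a quoted value with commas inside it. The question sets HasFieldsEnclosedInQuotes = false, which makes the parser split on every comma. The sketch below uses made-up sample data and a hypothetical helper, CountFields, to compare the two settings. It also shows that lines starting with a comment token are skipped by ReadFields rather than returned as empty arrays.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FieldCountDemo
{
    // Hypothetical helper: returns the number of fields on each data line.
    public static List<int> CountFields(string csvText, bool quoted)
    {
        var counts = new List<int>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.CommentTokens = new string[] { "#" };
            parser.HasFieldsEnclosedInQuotes = quoted;
            while (!parser.EndOfData)
            {
                // Comment lines never show up here; ReadFields skips them.
                string[] fields = parser.ReadFields();
                counts.Add(fields.Length);
            }
        }
        return counts;
    }

    public static void Main()
    {
        // Made-up sample: a comment line, a plain row, and a row with a quoted comma.
        string sample = "# comment line\n1,Kepler-10 b,0.01\n2,\"Star, binary\",0.02\n";
        Console.WriteLine(string.Join(" ", CountFields(sample, quoted: true)));
        Console.WriteLine(string.Join(" ", CountFields(sample, quoted: false)));
    }
}
```

If the second call reports more fields than the first for some row, quoting is the likely culprit for that row, and setting HasFieldsEnclosedInQuotes = true on the real file would be worth a try.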

Related

TextFieldParser - retrieve line read by ReadFields

I am reading from text files in which each row is supposed to have at least 9 fields. Some rows have only 5 fields, so ReadFields() works and I then get an exception when accessing fields[8]. I would prefer to throw a custom exception showing the line that was incomplete.
TextFieldParser does not appear to have a property for retrieving the line that ReadFields() just processed.
int linenum = 0;
using (var parser = new TextFieldParser(filename))
{
    parser.HasFieldsEnclosedInQuotes = true;
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    while (!parser.EndOfData)
    {
        linenum++;
        var fields = parser.ReadFields(); // fields 0...N
        // want to add an exception here that reports the 'short' line
        if (fields.Length < 9) {
            rawline = ????
            throw new Exception("ERROR: " + filename
                + " not enough data at [" + rawline + "]"
            );
        }
        // normal processing
        string name = fields[0];
        double cost = Convert.ToDouble(fields[8]);
        // ... add info to a list
    }
}
One possibility would be to use a TextReader to read each line, and a new TextFieldParser over a MemoryStream for each line, but that seems like too much:
using (var reader = new StreamReader(filename))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // new Parser and Stream for every line, bleah!
        using (var parser = new TextFieldParser(
            new MemoryStream(Encoding.ASCII.GetBytes(line))))
        {
            parser.HasFieldsEnclosedInQuotes = true;
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            var fields = parser.ReadFields();
            if (fields.Length < 9)
            {
                throw new Exception("too few fields: " + line);
            }
        }
    }
}
Are there other, more reasonable, approaches ?

What about string.Join?
while (!parser.EndOfData)
{
    linenum++;
    var fields = parser.ReadFields(); // fields 0...N
    if (fields.Length < 9)
    {
        var rawline = string.Join(",", fields);
        throw new Exception("ERROR: " + filename
            + " not enough data at [" + rawline + "]");
    }
    // ... rest of the code here
}
Notes:
The joined string is a reconstruction, not the raw line: with HasFieldsEnclosedInQuotes = true the parser returns fields without their surrounding quotes. That may or may not matter to you.
Unless encountering such rows means you no longer need to process the rest of the file, I wouldn't throw an exception just yet. You might want to store a list of exceptions and throw a single aggregate exception at the end of the function if that list is not empty.
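The "collect first, throw once" idea from the last note could look roughly like this. It is a minimal sketch, not a drop-in replacement: CostLoader, Load and minFields are hypothetical names, and recordNo counts parsed records (blank lines are skipped by the parser), so it is not necessarily the physical line number in the file.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CostLoader
{
    // Hypothetical helper: reads (name, cost) pairs and reports all short rows at once.
    public static List<KeyValuePair<string, double>> Load(TextReader input, int minFields)
    {
        var items = new List<KeyValuePair<string, double>>();
        var errors = new List<Exception>();
        int recordNo = 0;

        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                recordNo++;
                string[] fields = parser.ReadFields();
                if (fields.Length < minFields)
                {
                    // Reconstructed text, not the raw line: quotes have been stripped.
                    string joined = string.Join(",", fields);
                    errors.Add(new FormatException(
                        "Record " + recordNo + " has only " + fields.Length
                        + " fields: [" + joined + "]"));
                    continue; // keep going so every short row is reported
                }
                // The cost is taken from the last required field.
                double cost = double.Parse(fields[minFields - 1], CultureInfo.InvariantCulture);
                items.Add(new KeyValuePair<string, double>(fields[0], cost));
            }
        }

        if (errors.Count > 0)
            throw new AggregateException("Some records were incomplete.", errors);
        return items;
    }
}
```

For the file in the question it would be called as CostLoader.Load(new StreamReader(filename), 9); the caller can then inspect InnerExceptions on the AggregateException to list every incomplete record.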

C#: Reading a variable structured CSV file into a datatable with a row counter

I am trying to develop a tool that will take a CSV file and import it into a DataTable, with the first column in the DataTable being a row counter.
The CSV files come from different customers and so have different structures. Some have a header line; some have several header lines; some have no header line. They also have varying numbers of columns.
So far, I have the code below.
public void Import_CSV()
{
OpenFileDialog dialog = new OpenFileDialog();
dialog.Filter = "CSV Files (*.csv)|*.csv";
bool? result = dialog.ShowDialog();
if (result ?? false)
{
string[] headers;
string CSVFilePathName = dialog.FileName;
string delimSelect = cboDelimiter.Items.GetItemAt(cboDelimiter.SelectedIndex).ToString();
// If user hasn't selected a delimiter, assume comma
if (delimSelect == "")
{
delimSelect = ",";
}
// Use delimSelect here so that the comma default set above actually takes effect
string[] delimiterType = new string[] { delimSelect };
DataTable dt = new DataTable();
// Read first line of file to get number of fields and create columns and column numbers in data table
using (StreamReader sr1 = new StreamReader(CSVFilePathName))
{
headers = sr1.ReadLine().Split(delimiterType, StringSplitOptions.None);
//dt.Columns.Add("ROW", typeof(int));
//dt.Columns["ROW"].AutoIncrement = true;
//dt.Columns["ROW"].AutoIncrementSeed = 1;
//dt.Columns["ROW"].AutoIncrementStep = 1;
int colCount = 1;
foreach (string header in headers)
{
dt.Columns.Add("C" + colCount.ToString());
colCount++;
}
}
using (StreamReader sr = new StreamReader(CSVFilePathName))
{
while (!sr.EndOfStream)
{
string[] rows = sr.ReadLine().Split(delimiterType, StringSplitOptions.None);
DataRow dr = dt.NewRow();
for (int i = 0; i < headers.Length; i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
}
dtGrid.ItemsSource = dt.DefaultView;
txtColCount.Text = dtGrid.Columns.Count.ToString();
txtRowCount.Text = dtGrid.Items.Count.ToString();
}
}
This works, inasmuch as it creates column headers (C1, C2, ... according to how many there are in the CSV file) and then writes the rows in, but I want to add a column at the far left with a row number as the rows are added. In the code you can see a commented-out section that creates an auto-number column, but I'm totally stuck on how the rows are then written into the DataTable. If I uncomment that section, I get errors because the first column of the CSV file is written into an int field. I know you can specify which field of each row goes into which column, but that won't help here because the columns are unknown at this point. I just need it to be able to read ANY file in, regardless of the structure, but with the row counter.
Hope that makes sense.

You write in your question that uncommenting the code that adds the first column leads to errors. This is because of your loop: it starts at 0, but column 0 is now the one you added manually. So you just need to skip it when writing, by starting at column 1. The source array, however, still has to be read from element 0.
So the solution is:
First, uncomment the code that adds the ROW column.
Then, in your loop, introduce an offset to leave the first column untouched:
for (int i = 0; i < headers.Length; i++)
{
    dr[i + 1] = rows[i];
}
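Putting the two steps together, a self-contained sketch might look like the following. RowCounterDemo and Build are hypothetical names, and the sketch goes one step beyond the answer by adding columns on demand when a later line has more cells than the first one; a plain Split is kept, as in the question, so quoted delimiters are not handled.

```csharp
using System;
using System.Data;

public static class RowCounterDemo
{
    // Hypothetical helper: builds a table with a leading auto-numbered ROW column.
    public static DataTable Build(string[] lines, char delimiter)
    {
        var dt = new DataTable();
        DataColumn rowCol = dt.Columns.Add("ROW", typeof(int));
        rowCol.AutoIncrement = true;
        rowCol.AutoIncrementSeed = 1;
        rowCol.AutoIncrementStep = 1;

        foreach (string line in lines)
        {
            string[] cells = line.Split(delimiter);

            // Grow the table when a line has more cells than any line before it.
            while (dt.Columns.Count - 1 < cells.Length)
                dt.Columns.Add("C" + dt.Columns.Count);

            DataRow dr = dt.NewRow();      // ROW is assigned automatically
            for (int i = 0; i < cells.Length; i++)
                dr[i + 1] = cells[i];      // offset by one to skip ROW
            dt.Rows.Add(dr);
        }
        return dt;
    }
}
```

Because ROW is never written by the loop, the int column only ever receives the generated counter, which avoids the type error described in the question.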

Trouble with parsing CSV files in C#

I'm trying to import a CSV file into my C# site and save it in the database. While doing research I learned about CSV parsing. I've tried to implement it, but I've run into some trouble. Here is a portion of my code so far:
string fileext = Path.GetExtension(fupcsv.PostedFile.FileName);
if (fileext == ".csv")
{
string csvPath = Server.MapPath("~/CSVFiles/") + Path.GetFileName(fupcsv.PostedFile.FileName);
fupcsv.SaveAs(csvPath);
// Add Columns to Datatable to bind data
DataTable dtCSV = new DataTable();
dtCSV.Columns.AddRange(new DataColumn[2] { new DataColumn("ModuleId", typeof(int)), new DataColumn("CourseId", typeof(int))});
// Read all the lines of the text file and close it.
string[] csvData = File.ReadAllLines(csvPath);
// iterate over each row and Split it to New line.
foreach (string row in csvData)
{
// Check for is null or empty row record
if (!string.IsNullOrEmpty(row))
{
using (TextFieldParser parser = new TextFieldParser(csvPath))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
//Process row
string[] fields = parser.ReadFields();
int i = 1;
foreach (char cell in row)
{
dtCSV.NewRow()[i] = cell;
i++;
}
}
}
}
}
}
I keep getting the error "There is no row at position -1" at " dtCSV.Rows[dtCSV.Rows.Count - 1][i] = cell;"
Any help would be greatly appreciated, thanks

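You are trying to index rows that you have not created. Instead of
dtCSV.Rows[dtCSV.Rows.Count - 1][i] = cell;
create the row first, fill it, and then add it to the table:
var newRow = dtCSV.NewRow();
newRow[i] = cell;
dtCSV.Rows.Add(newRow);
I also suggest you start indexing i from 0 and not from 1.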
All right so it turns out there were a bunch of errors with your code, so I made some edits.
string fileext = Path.GetExtension(fupcsv.PostedFile.FileName);
if (fileext == ".csv")
{
string csvPath = Server.MapPath("~/CSVFiles/") + Path.GetFileName(fupcsv.PostedFile.FileName);
fupcsv.SaveAs(csvPath);
DataTable dtCSV = new DataTable();
dtCSV.Columns.AddRange(new DataColumn[2] { new DataColumn("ModuleId", typeof(int)), new DataColumn("CourseId", typeof(int))});
var csvData = File.ReadAllLines(csvPath);
bool headersSkipped = false;
foreach (string line in csvData)
{
if (!headersSkipped)
{
headersSkipped = true;
continue;
}
// Check for is null or empty row record
if (!string.IsNullOrEmpty(line))
{
//Process row
int i = 0;
var row = dtCSV.NewRow();
foreach (var cell in line.Split(','))
{
row[i] = Int32.Parse(cell);
i++;
}
dtCSV.Rows.Add(row);
dtCSV.AcceptChanges();
}
}
}
I ditched the TextFieldParser solution solely because I'm not familiar with it, but if you want to stick with it, it shouldn't be hard to reintegrate.
Here are some of the things you got wrong:
Not calling NewRow() to create a new row, and not adding it to the table with Rows.Add(row)
Iterating through the characters in row instead of the fields you parsed
Not parsing the value of cell: it is a string and you are trying to put it into an int column
Some other things worth noting (just to improve your code's performance and readability :))
Consider using var when declaring local variables; it saves you from spelling out the exact type each time
As others said in the comments, use ReadAllLines(): it splits your text file into lines neatly, making it easier to iterate through
Most of the time when working with arrays or lists, you need to index from 0, not from 1
AcceptChanges() commits the changes you've made to the table. It is optional here; if you later save the table to the database through a DataAdapter, call it only after the update, because accepted rows are no longer flagged as new
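For anyone who would rather keep TextFieldParser, the same fix could be sketched as below. This is only an illustration under a few assumptions: ModuleCourseImport and Read are hypothetical names, the file is assumed to start with one header row, and rows that do not hold exactly two integers are silently skipped rather than reported.

```csharp
using System;
using System.Data;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class ModuleCourseImport
{
    // Hypothetical helper: reads "ModuleId,CourseId" rows, skipping one header line.
    public static DataTable Read(TextReader input)
    {
        var table = new DataTable();
        table.Columns.Add("ModuleId", typeof(int));
        table.Columns.Add("CourseId", typeof(int));

        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");

            if (!parser.EndOfData)
                parser.ReadFields();           // skip the header row (assumed present)

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                int moduleId, courseId;

                // Skip rows that do not hold exactly two integers.
                if (fields.Length != 2
                    || !int.TryParse(fields[0], out moduleId)
                    || !int.TryParse(fields[1], out courseId))
                    continue;

                table.Rows.Add(moduleId, courseId);
            }
        }
        return table;
    }
}
```

With the upload code from the question it would be called as ModuleCourseImport.Read(new StreamReader(csvPath)). The parser is opened once for the whole file, instead of once per line as in the original loop.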

Exporting table to CSV using LINQ

I'm having a hard time exporting a DB table to a CSV file using LINQ. I've tried a few things from related topics, but they were all way too long and I need the simplest solution. There has to be something.
The problem with this code is that the file is created, but empty. When I debug, the query is fine; everything I want to export is there. What am I doing wrong?
private void Save_Click(object sender, RoutedEventArgs e)
{
StreamWriter sw = new StreamWriter("test.csv");
DataDataContext db = new DataDataContext();
var query = from x in db.Zbozis
orderby x.Id
select x;
foreach (var something in query)
{
sw.WriteLine(something.ToString());
}
}
Edit: OK, I tried all your suggestions, sadly with the same result (the CSV was created, but it contained 10x Lekarna.Zbozi (name of project/DB + name of table)).
So I used a method that I found (why reinvent the wheel, huh).
public string ConvertToCSV(IQueryable query, string replacementDelimiter)
{
// Create the csv by looping through each row and then each field in each row
// seperating the columns by commas
// String builder for our header row
StringBuilder header = new StringBuilder();
// Get the properties (aka columns) to set in the header row
PropertyInfo[] rowPropertyInfos = null;
rowPropertyInfos = query.ElementType.GetProperties();
// Setup header row
foreach (PropertyInfo info in rowPropertyInfos)
{
if (info.CanRead)
{
header.Append(info.Name + ",");
}
}
// New row
header.Append("\r\n");
// String builder for our data rows
StringBuilder data = new StringBuilder();
// Setup data rows
foreach (var myObject in query)
{
// Loop through fields in each row seperating each by commas and replacing
// any commas in each field name with replacement delimiter
foreach (PropertyInfo info in rowPropertyInfos)
{
if (info.CanRead)
{
// Get the field's value and then replace any commas with the replacement delimiter
string tmp = Convert.ToString(info.GetValue(myObject, null));
if (!String.IsNullOrEmpty(tmp))
{
// Replace returns a new string, so the result must be assigned back
tmp = tmp.Replace(",", replacementDelimiter);
}
data.Append(tmp + ",");
}
}
// New row
data.Append("\r\n");
}
// Check the data results... if they are empty then return an empty string
// otherwise append the data to the header
string result = data.ToString();
if (string.IsNullOrEmpty(result) == false)
{
header.Append(result);
return header.ToString();
}
else
{
return string.Empty;
}
}
So I have a modified version of previous code:
StreamWriter sw = new StreamWriter("pokus.csv");
ExportToCSV ex = new ExportToCSV();
var query = from x in db.Zbozis
orderby x.Id
select x;
string s = ex.ConvertToCSV(query,"; ");
sw.WriteLine(s);
sw.Flush();
Everything is fine, except that it exports every line into one column and does not separate the values. See here -> http://i.stack.imgur.com/XSNK0.jpg
The question is obvious then: how do I divide it into columns like I have in my DB?
Thanks

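You are not closing the file, so the buffered output is never flushed to disk. Either use "using"
using (StreamWriter sw = new StreamWriter("test.csv"))
{
    ..............
}
or simply try this (WriteAllLines expects strings, so convert each row first):
File.WriteAllLines("test.csv", query.AsEnumerable().Select(x => x.ToString()));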
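On the follow-up about everything landing in one column: if the file is being opened in Excel, a likely cause (an assumption, since the screenshot is not available here) is that Excel splits on the regional list separator, which is ";" in many European locales, while the method above always writes ",". Below is a minimal sketch with a configurable separator; CsvExport, Escape and ToCsvLines are hypothetical names, and values are quoted instead of having their commas replaced.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Reflection;

public static class CsvExport
{
    // Quotes a value when it contains the separator, a quote or a line break.
    public static string Escape(string value, string separator)
    {
        if (value.Contains(separator) || value.Contains("\"")
            || value.Contains("\n") || value.Contains("\r"))
            return "\"" + value.Replace("\"", "\"\"") + "\"";
        return value;
    }

    // Hypothetical helper: one header line plus one line per object.
    public static IEnumerable<string> ToCsvLines<T>(IEnumerable<T> rows, string separator)
    {
        // Readable, non-indexer public properties become the columns.
        PropertyInfo[] props = typeof(T).GetProperties()
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .ToArray();

        yield return string.Join(separator, props.Select(p => Escape(p.Name, separator)));

        foreach (T row in rows)
        {
            yield return string.Join(separator, props.Select(p => Escape(
                Convert.ToString(p.GetValue(row, null), CultureInfo.InvariantCulture) ?? "",
                separator)));
        }
    }
}
```

Usage would be along the lines of File.WriteAllLines("test.csv", CsvExport.ToCsvLines(query, ";")). Note that properties holding related entities would still be written as type names, so projecting the query to just the columns you need first is probably the safer route.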

Is there any way to insert data from a text file into a DataSet?

I have a text file that looks like this:
1 \t a
2 \t b
3 \t c
4 \t d
I have a DataSet: DataSet ZX = new DataSet();
Is there any way to insert the text file's values into this DataSet?
Thanks in advance

You will have to parse the file manually. Maybe like this:
string data = System.IO.File.ReadAllText("myfile.txt");
DataRow row = null;
DataSet ds = new DataSet();
DataTable tab = new DataTable();
tab.Columns.Add("First");
tab.Columns.Add("Second");
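// Split on both '\r' and '\n' so that Windows line endings do not leave a stray '\r' on each row
string[] rows = data.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);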
foreach (string r in rows)
{
string[] columns = r.Split(new char[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
if (columns.Length <= tab.Columns.Count)
{
row = tab.NewRow();
for (int i = 0; i < columns.Length; i++)
row[i] = columns[i];
tab.Rows.Add(row);
}
}
ds.Tables.Add(tab);
UPDATE
If you don't know how many columns in the text file you can modify my original example as the following (assuming that the number of columns is constant for all rows):
// ...
string[] columns = r.Split(new char[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
if (tab.Columns.Count == 0)
{
for(int i = 0; i < columns.Length; i++)
tab.Columns.Add("Column" + (i + 1));
}
if (columns.Length <= tab.Columns.Count)
{
// ...
Also remove the initial creation of table columns:
// tab.Columns.Add("First");
// tab.Columns.Add("Second");
-- Pavel

Sure there is.
Define a DataTable, add a DataColumn for each value with the data type you want,
read the file with ReadLine, split each line on the tab character, and add the values to the DataTable as a DataRow created by calling NewRow.
There is nice sample code on MSDN; take a look and follow the steps.

Yes, create the data table on the fly; refer to this article for the how-to.
Read your file line by line and add those values to your data table; refer to this article on how to read a text file.

Try this
private DataTable GetTextToTable(string path)
{
try
{
DataTable dataTable = new DataTable
{
Columns = {
{"MyID", typeof(int)},
"MyData"
},
TableName="MyTable"
};
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader(path))
{
String line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
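// "\\t" matches a literal backslash followed by t, as in the question's sample; use "\t" if the file contains real tab characters
string[] words = line.Split(new string[] { "\\t" }, StringSplitOptions.RemoveEmptyEntries);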
dataTable.Rows.Add(words[0], words[1]);
}
}
return dataTable;
}
catch (Exception e)
{
// Let the caller know what went wrong, keeping the original exception as InnerException.
throw new Exception(e.Message, e);
}
}
Call it like
GetTextToTable(Path.Combine(Server.MapPath("."), "TextFile.txt"));
You could also check out CSV File Imports in .NET

I'd also like to add the following to "volpan"'s code:
String _source = System.IO.File.ReadAllText(FilePath, Encoding.GetEncoding(1253));
It's good to specify the encoding of your text file so that the data is read correctly and, as in my case, can be exported to another file after modification.
