TextFieldParser - retrieve line read by ReadFields

TextFieldParser - retrieve line read by ReadFields - c#

I am reading from text files and each row is supposed to have at least 9 fields. Some of the data has only 5 fields, so ReadFields() works and I get an exception when accessing fields[8]. I would prefer to throw a custom exception showing the line that was not complete.
TextFieldParse does not appear to have a property for the retrieving the line that ReadFields() just processed.
using (var parser = new TextFieldParser(filename))
{
parser.HasFieldsEnclosedInQuotes = true;
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
linenum++;
var fields = parser.ReadFields(); // fields 0...N
want to add exception here that messages back the 'short' line
if (fields.length < 10) {
rawline = ????
throw new Exception ("ERROR: " + filename
+ " not enough data at [" + rawline + "]"
);
}
normal processing
string name = fields[0];
double cost = Convert.ToDouble(fields[8]);
// ... add info to a list
}
}
One possibility would be to use a TextReader to read each line, and a new TextFieldParser for each line as a MemoryStream -- seems like too much
using (var reader = new StreamReader(filename))
{
var line = reader.ReadLine();
// new Parser and Stream for every line, bleah!
using (var parser = new TextFieldParser(
new MemoryStream(Encoding.ASCII.GetBytes(line))))
{
parser.HasFieldsEnclosedInQuotes = true;
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
var fields = parser.ReadFields();
if (fields.Length < 9)
{
throw new Exception("too few fields: " + line);
}
}
}
Are there other, more reasonable, approaches ?

What about string.Join?
while (!parser.EndOfData)
{
linenum++;
var fields = parser.ReadFields(); // fields 0...N
if (fields.length < 10)
{
var rawline = string.Join(",", fields);
throw new Exception ("ERROR: " + filename
+ " not enough data at [" + rawline + "]");
}
// ... rest of the code here
Notes:
I'm not sure how the parser returns fields surrounded with " - do they return with the " or without it, and do you even care about that?
Unless encountering such rows means you no longer need to process the rest of the file, I wouldn't throw an exception just yet. You might want to store a list of exceptions and throw a single aggregate exception at the end of the function if that list is not empty.

Related

Read a text file until a line contains some string file, then again keep reading next lines until another string is encountered

I have to read a text file and if line contains ".engineering $name" then look for line which contains ".default" and do some operation with this line. I need to keep reading lines until I find ".default" in a set of lines. (This set is like, until I hit next ".engineering"). Loop continue like this again for next ".engineering $name"
Note:
".engineering" keyword is fixed string, $name reading dynamically,
".default" is fixed string,
I am able to do the first part that is reading line which contains ".engineering $name"
I am unable to get logic for next part, finding ".default" until it hits next ".engineering"
Looking for logic or code for this logic in C#. Thank you
Code:
using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
using (var reader = new StreamReader(stream))
{
while (!reader.EndOfStream)
{
string[] def_arr = null;
var line1 = reader.ReadLine();
if (line1.Contains(".engineering " + name + " ") && !reader.EndOfStream)
{
var nextLine = reader.ReadLine(); // nextLine contains ".default"
def_arr = nextLine.Split(' ');
def_val = def_arr[1].Replace("\"", "");
port_DefaultValues.Add(name + ", " + def_val);
}
}
}
var nextLine is the line containing ".default". I have coded like immidiate next line of finding ".engineering" is having ".default".But it is not always the case. ".default" can be in any line before it hits next ."engineering".
I hope the problem statement is clear.

Try this code -
using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
using (var reader = new StreamReader(stream))
{
while (!reader.EndOfStream)
{
string[] def_arr = null;
var line1 = reader.ReadLine();
if (line1.Contains(".engineering " + name + " ") && !reader.EndOfStream)
{
var nextLine = reader.ReadLine(); // nextLine contains ".default"
while (!nextLine.Contains(".default") && !reader.EndOfStream)
{
nextLine = reader.ReadLine();
}
def_arr = nextLine.Split(' ');
def_val = def_arr[1].Replace("\"", "");
port_DefaultValues.Add(name + ", " + def_val);
}
}
}
I have just added a loop that will keep reading the next line until it encounters .default. Keep in mind it will throw exception if that is not found in rest of the file.

C# - Split CSV File by Removing Bad Rows

I have a csv file with 2 million rows and file size of 2 GB. But due to a couple of free text form columns, these contain redundant CRLF and cause the file to not load in the SQL Server table. I get an error that the last column does not end with ".
I have the following code, but it gives an OutOfMemoryException when reading from fileName. The line is:
var lines = File.ReadAllLines(fileName);
How can I fix it? Ideally, I would like to split the file into two good and bad rows. Or delete rows that do not end with "CRLF.
int goodRow = 0;
int badRow = 0;
String badRowFileName = fileName.Substring(0, fileName.Length - 4) + "BadRow.csv";
String goodRowFileName = fileName.Substring(0, fileName.Length - 4) + "GoodRow.csv";
var charGood = "\"\"";
String lineOut = string.Empty;
String str = string.Empty;
var lines = File.ReadAllLines(fileName);
StringBuilder sbGood = new StringBuilder();
StringBuilder sbBad = new StringBuilder();
foreach (string line in lines)
{
if (line.Contains(charGood))
{
goodRow++;
sbGood.AppendLine(line);
}
else
{
badRow++;
sbBad.AppendLine(line);
}
}
if (badRow > 0)
{
File.WriteAllText(badRowFileName, sbBad.ToString());
}
if (goodRow > 0)
{
File.WriteAllText(goodRowFileName, sbGood.ToString());
}
sbGood.Clear();
sbBad.Clear();
msg = msg + "Good Rows - " + goodRow.ToString() + " Bad Rows - " + badRow.ToString() + " Done.";

You can translate that code like this to be much more efficient:
int goodRow = 0, badRow = 0;
String badRowFileName = fileName.Substring(0, fileName.Length - 4) + "BadRow.csv";
String goodRowFileName = fileName.Substring(0, fileName.Length - 4) + "GoodRow.csv";
var charGood = "\"\"";
using (var lines = File.ReadLines(fileName))
using (var swGood = new StreamWriter(goodRowFileName))
using (var swBad = new StreamWriter(badRowFileName))
{
foreach (string line in lines)
{
if (line.Contains(charGood))
{
goodRow++;
swGood.WriteLine(line);
}
else
{
badRow++;
swBad.WriteLine(line);
}
}
}
msg += $"Good Rows: {goodRow,9} Bad Rows: {badRow,9} Done.";
But I'd also look at using a real csv parser for this. There are plenty on NuGet. That might even let you clean up the data on the fly.

I would not suggest reading the entire file into memory, then processing the file, then writing all modified contents out to the new file.
Instead using file streams:
using (var rdr = new StreamReader(fileName))
using (var wrtrGood = new StreamWriter(goodRowFileName))
using (var wrtrBad = new StreamWriter(badRowFileName))
{
string line = null;
while ((line = rdr.ReadLine()) != null)
{
if (line.Contains(charGood))
{
goodRow++;
wrtr.WriteLine(line);
}
else
{
badRow++;
wrtrBad.WriteLine(line);
}
}
}

[C#]TextFieldParser not behaving as expected

I am trying to read the csv you can download from here: https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=planets . Just click on "Download Table" and select CSV, all columns, all rows.
The code has some problems:
How to recognize the comment? I expect the class to simply skip them and do not put them in the fields variables. But they are.
Why the number of columns is wrong? They are 403 and instead it find 405. According to pandas (Python3) they are 403. In fact when I try to use TextFieldParser for more complicated operations on this csv I get some errors like OutOfBoundary related to the index of the array (of course, columns are 403 but it though they are 405).
Code:
private void loadData(string fileName) {
int rows = 0;
int columns = 0;
using (TextFieldParser parser = new TextFieldParser(fileName, Encoding.UTF8))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
parser.CommentTokens = new []{"#"};
parser.TrimWhiteSpace = false;
parser.HasFieldsEnclosedInQuotes = false;
while (!parser.EndOfData)
{
//Process row
string[] fields = parser.ReadFields();
foreach (string field in fields)
{
//TODO: Process field
}
if (fields.Length == 0) {
//Should be a commment
printLine("Comment found on row " + rows);
}
if (fields.Length > columns)
columns = fields.Length;
rows++;
}
printLine ("Rows: " + rows);
printLine ("Columns: " + columns);
printLine ("Errors on line: " + parser.ErrorLineNumber);
}
}

To ignore the commented lines you need to change your parser.CommentTokens statement to use new string[] as below
parser.CommentTokens = new string []{"#"};
Once you change that the comments will be ignored. There are 3 lines in the file that have a different number of columns then the 403 that all others have
I added the check below to determine when the number of fields is greater than 403(Line 159, 3310, and 3311 have 404 and 405 columns/fields)
if (fields.Length > 403)
{
Console.WriteLine($"Line:{lineNo} has {fields.Length}.");
}
With the above at least you can do some kind of checking/cleanup on those lines that have more than the number of expected fields

Using a CSV parser, I am trying to select multiple items from the CSV. It won't let me choose more than two however

Like I said, I'm using a CSV parser which can be found here.The CSV Parser
I am able to successfully run the code using this
using (var stream = File.OpenRead(fileSaveLocation))
using (var reader = new StreamReader(stream))
{
var data = CsvParser.ParseHeadAndTail(reader, ',', '"');
var header2 = data.Item1;
var lines = data.Item2;
foreach (var line in lines.Take(5))
{
for (var i = 0; i < header2.Count; i++)
if (!string.IsNullOrEmpty(line[i]))
{
sb.Append(header2[i] + "=" + line[i]);
sb.Append(Environment.NewLine);
}
}
}
But I want to be able to select about 10 items. So if I try to add a new variable like:
var test = data.Item3;
It won't work.
When I do try to run it, it tells me this:
Error 1 'System.Tuple,System.Collections.Generic.IEnumerable>>'
does not contain a definition for 'Item3' and no extension method
'Item3' accepting a first argument of type
'System.Tuple,System.Collections.Generic.IEnumerable>>'
could be found (are you missing a using directive or an assembly
reference?) C:\repo\Scriptalizer\default.aspx.cs 82 37 Scriptalizer(1)
It will throw an error before I ever try to run the program. It says it cannot resolve Item3. How can I get it to let me put as many columns as I want?
Also, is there a way to dynamically select items? Say the user can input "ignore the first 3 lines" for example, how could I declare these and get the correct columns?

Haven't used this CSV parsing library in particular, but it sounds like you need the following:
using (var stream = File.OpenRead(fileSaveLocation))
{
using (var reader = new StreamReader(stream))
{
// Get the header and rows as a two-item tuple
var data = CsvParser.ParseHeadAndTail(reader, ',', '"');
// Get header and and rows into separate variables
var header2 = data.Item1;
var lines = data.Item2;
// This is where you get the rows you want
// So in this example, we'll skip the first
// 3 lines and then get the next 10 lines
var filteredLines = lines.Skip(3).Take(10);
// Iterate through the lines and do whatever you need to do
foreach (var line in filteredLines)
{
for (var i = 0; i < header2.Count; i++)
{
if (!string.IsNullOrEmpty(line[i]))
{
sb.Append(header2[i] + "=" + line[i]);
sb.Append(Environment.NewLine);
}
}
}
}
}

This is what I ended up doing which worked.
using (var stream = File.OpenRead(fileSaveLocation))
using (var reader = new StreamReader(stream))
{
var data = CsvParser.ParseHeadAndTail(reader, ',', '"');
var header2 = data.Item1;
var lines = data.Item2;
for (int i = 4; i < header2.Count; i++)
sb.Append(("\"" + header2[i] + "\"" + ","));
sb.Append(sb.ToString().TrimEnd(','));
sb.Append(Environment.NewLine);
foreach (var line in lines)
{
for (var i = 4; i < header2.Count; i++)
{
sb.Append("\"" + line[i] + "\"");
if (i != header2.Count - 1)
{
sb.Append(",");
}
}
sb.Append(Environment.NewLine);
}
}

Copying CSV file while reordering/adding empty columns

Copying CSV file while reordering/adding empty columns.
For example if ever line of incoming file has values for 3 out of 10 columns in order different from output like (except first which is header with column names):
col2,col6,col4 // first line - column names
2, 5, 8 // subsequent lines - values for 3 columns
and output expected to have
col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
then output should be "" for col0,col1,col3,col5,col7,col8,col9,and values from col2,col4,col4 in the input file. So for the shown second line (2,5,8) expected output is ",,2,,5,,8,,,,,"
Below code I've tried and it is slower than I want.
I have two lists.
The first list filecolumnnames is created by splitting a delimited string (line) and this list gets recreated for every line in the file.
The second list list has the order in which the first list needs to be rearranged and re concatenated.
This works
string fileName = "F:\\temp.csv";
//file data has first row col3,col2,col1,col0;
//second row: 4,3,2,1
//so on
string fileName_recreated = "F:\\temp_1.csv";
int count = 0;
const Int32 BufferSize = 1028;
using (var fileStream = File.OpenRead(fileName))
using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize))
{
String line;
List<int> list = new List<int>();
string orderedcolumns = "\"\"";
string tableheader = "col0,col1,col2,col3,col4,col5,col6,col7,col8,col9,col10";
List<string> tablecolumnnames = new List<string>();
List<string> filecolumnnames = new List<string>();
while ((line = streamReader.ReadLine()) != null)
{
count = count + 1;
StringBuilder sb = new StringBuilder("");
tablecolumnnames = tableheader.Split(',').ToList();
if (count == 1)
{
string fileheader = line;
//fileheader=""col2,col1,col0"
filecolumnnames = fileheader.Split(',').ToList();
foreach (string col in tablecolumnnames)
{
int index = filecolumnnames.IndexOf(col);
if (index == -1)
{
sb.Append(",");
// orderedcolumns=orderedcolumns+"+\",\"";
list.Add(-1);
}
else
{
sb.Append(filecolumnnames[index] + ",");
//orderedcolumns = orderedcolumns+ "+filecolumnnames["+index+"]" + "+\",\"";
list.Add(index);
}
// MessageBox.Show(orderedcolumns);
}
}
else
{
filecolumnnames = line.Split(',').ToList();
foreach (int items in list)
{
//MessageBox.Show(items.ToString());
if (items == -1)
{
sb.Append(",");
}
else
{
sb.Append(filecolumnnames[items] + ",");
}
}
//expected format sb.Append(filecolumnnames[3] + "," + filecolumnnames[2] + "," + filecolumnnames[2] + ",");
//sb.Append(orderedcolumns);
var result = String.Join (", ", list.Select(index => filecolumnnames[index]));
}
using (FileStream fs = new FileStream(fileName_recreated, FileMode.Append, FileAccess.Write))
using (StreamWriter sw = new StreamWriter(fs))
{
sw.WriteLine(sb.ToString());
}
}
I am trying to make it faster by constructing a string orderedcolumns and remove the second for each loop which happens for every row and replace it with constructed string.
so if you uncomment the orderedcolumns string construction orderedcolumns = orderedcolumns+ "+filecolumnnames["+index+"]" + "+\",\""; and uncomment the append sb.Append(orderedcolumns); I am expecting the value inside the constructed string but when I append the orderedcolumns it is appending the text i.e.
""+","+filecolumnnames[3]+","+filecolumnnames[2]+","+filecolumnnames[1]+","+filecolumnnames[0]+","+","+","+","+","+","+","
i.e. I instead want it to take the value inside the filecolumnnames[3] list and not the filecolumnnames[3] name itself.
Expected value: if that line has 1,2,3,4
I want the output to be 4,3,2,1 as filecolumnnames[3] will have 4, filecolumnnames[2] will have 3..

String.Join is the way to construct comma/space delimited strings from sequence.
var result = String.Join (", ", list.Select(index => filecolumnnames[index]);
Since you are reading only subset of columns and orders in input and output don't match I'd use dictionary to hold each row of input.
var row = tablecolumnnames
.Zip(line.Split(','), (Name,Value)=> new {Name,Value})
.ToDictionary(x => x.Name, x.Value);
For output I'd fill sequence from defaults or input row:
var outputLine = String.Join(",",
filecolumnnames
.Select(name => row.ContainsKey(name) ? row[name] : ""));
Note code is typed in and not compiled.

orderedcolumns = orderedcolumns+ "+filecolumnnames["+index+"]" + "+\",\""; "
should be
orderedcolumns = orderedcolumns+ filecolumnnames[index] + ",";
you should however use join as others have pointed out. Or
orderedcolumns.AppendFormat("{0},", filecolumnnames[index]);
you will have to deal with the extra ',' on the end

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

TextFieldParser - retrieve line read by ReadFields - c#

Related

Read a text file until a line contains some string file, then again keep reading next lines until another string is encountered

C# - Split CSV File by Removing Bad Rows

[C#]TextFieldParser not behaving as expected

Using a CSV parser, I am trying to select multiple items from the CSV. It won't let me choose more than two however

Copying CSV file while reordering/adding empty columns

Categories

Resources