Get only first column values from CSV rows using CSVHelper - c#

I am trying to parse a CSV file and extract the first string in each line, or the first column as it is laid out in MS Excel.
I am using CsvParser with the Read() method returning a string[] to the row variable.
The problem is, it is returning every single value, so my output looks like this for each line:
20070330 00:00 // This is the value I want to reference
0.9312
0.9352
0.9298
0.9343
How can I reference only the values at these positions in the file without putting in a counter to skip the interim values?
using (TextReader reader = File.OpenText(folder))
{
    var datesInCsv = new List<string>();
    var parsedCsv = new CsvParser(reader);
    while (true)
    {
        var row = parsedCsv.Read();
        if (row.IsNullOrEmpty())
        {
            break;
        }
        foreach (var date in row)
        {
            Console.WriteLine(date);
        }
        Console.ReadLine();
    }
}

The CsvHelper library you are using lets you pass a CsvConfiguration instance when you build the CsvParser. This is important so the library understands how to parse each line correctly.
The most important thing to set is the Delimiter property, in other words the character used to separate one field from the next.
Look at your CSV file and put the appropriate value below.
(For this example I have used the semicolon, but change it to match your file.)
Looking at your code, I also think the SkipEmptyRecords configuration could be used to simplify it.
using (TextReader reader = File.OpenText(folder))
{
    var datesInCsv = new List<string>();
    CsvConfiguration config = new CsvConfiguration();
    config.Delimiter = ";";
    config.SkipEmptyRecords = true;
    var parsedCsv = new CsvParser(reader, config);
    string[] row = null;
    while ((row = parsedCsv.Read()) != null)
    {
        // This IsNullOrEmpty doesn't exist according
        // to Intellisense and the compiler.
        // if(row.IsNullOrEmpty)
        // At this point the row is an array of strings where the first
        // element is the value you are searching for. No need for loops.
        Console.WriteLine(row[0]);
        Console.ReadLine();
    }
}
The Read method of the CsvParser instance returns an array of strings after splitting the input line according to your delimiter. At this point you just need to reference the first element of the array, without any loop.

You are explicitly printing every value to the console here:
foreach (var date in row)
{
Console.WriteLine(date);
}
This foreach loop iterates over all elements of the current row and prints them.
Replace the loop with this, which will only print the first element (index 0):
Console.WriteLine(row[0]);
Of course, this can fail if the line was empty, so you need to check for that, too.
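For example, a minimal sketch of the loop with that guard in place (reusing the parsedCsv and row names from the question):
while (true)
{
    var row = parsedCsv.Read();
    if (row == null || row.Length == 0) // Read() returns null once the end of the input is reached
    {
        break;
    }
    Console.WriteLine(row[0]); // first column only
}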

You need to "read" the records and pass the argument of the "mapping" you want. Either by index or name.
See here at each section detailed above:
https://joshclose.github.io/CsvHelper/
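For example, with a CsvReader instead of a CsvParser you can pull just the first field of each record. A minimal sketch, assuming a file without a header row; the constructor arguments differ between CsvHelper versions, so check the version you have:
using (var reader = new StreamReader(folder))
using (var csv = new CsvReader(reader))
{
    while (csv.Read())
    {
        // GetField(0) returns the first column of the current record as a string
        Console.WriteLine(csv.GetField(0));
    }
}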

Related

Unable to split List content with space as delimiter - C#

I have just started to learn C# for one of my projects, in which I need to read all the data from a text file and then compare the imported data with the default data structure of the file. So far I have made some progress, but I am stuck at splitting the imported data into a list with space as the delimiter, so that I can compare it with the default data, which I plan to put in a default data list.
The structure of the file(File1) to be imported(or the user provided) is as follows:-
%emp_first_name% = xxxxxxxx %emp_middle_name% = xxxxxxxx %emp_last_name% = xxxxxxxx;
%emp_age% = nn;
%emp_dept.% = xxxxxxxx;
%emp_joining_date% = xx-xx-xxxx;
the default structure of the file(File2) is:-
%emp_first_name% = xxxxxxxx %emp_middle_name% = xxxxxxxx %emp_last_name% = xxxxxxxx;
%emp_age% = nn;
%emp_total_exp% = xx;
%emp_grade% = x;
%emp_dept.% = xxxxxxxx;
%emp_joining_date% = xx-xx-xxxx;
After reading File1 into a list, I am unable to split it using space as a delimiter. This is what I am doing to read File1 into a list:
public static void readFinL(string filename)
{
    string readAllLines = File.ReadAllText(filename);
    List<string> list = new List<string>();
    list.Add(readAllLines);
    foreach (string d in list)
    {
        var f = d.Split(',');
        Console.WriteLine(f.GetValue(0));
    }
}
What am I doing incorrectly in this method when reading the file into a list? I am putting the data in a list because I need to compare File1 with File2 to check which rows are missing from File1. Any pointer in the right direction will be helpful.
First of all, d.Split(',') splits on the comma. Use var f = d.Split(' ') instead.
If I'm not mistaken, File.ReadAllText returns a single string, so your list only has one element this way.
string[] lines = File.ReadAllLines("path to the file");
should do the work.
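Putting the two fixes together, a minimal sketch of the method (keeping the readFinL signature from the question):
public static void readFinL(string filename)
{
    // One list entry per line instead of one entry for the whole file
    string[] lines = File.ReadAllLines(filename);
    foreach (string line in lines)
    {
        // Split each line on spaces so the tokens can be compared individually
        var tokens = line.Split(' ');
        Console.WriteLine(tokens[0]);
    }
}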

Check if list contains a string that matches closely

I'm trying to figure out the most efficient way to implement the following scenario:
I have a list like this:
public static IEnumerable<string> ValidTags = new List<string> {
    "ABC.XYZ",
    "PQR.SUB.UID",
    "PQR.ALI.OBD",
};
I have a huge CSV with multiple columns. One of the columns is tags. This column either contains blank values or one of the above values. The problem is, the tag column may contain values like "ABC.XYZ?#", i.e. a valid tag plus some extraneous characters. I need to update such cells with the valid tag, since they "closely match" one of our valid tags.
Example:
If the CSV contains PQR.ALI.OBD?, update it with the valid tag PQR.ALI.OBD.
If the CSV contains PQR.ALI.OBA, this is invalid; just add the suffix and update it to PQR.ALI.OBA-invalid.
I'm trying to figure out the best possible way to do this.
My current approach is:
Iterate through the tag column of each CSV row and get the tagValue.
Check whether the tagValue contains any of the strings from the list.
If it contains one but is not exactly the same, update it with the value it contains.
If it doesn't "contain" any value from the list, add the -invalid suffix.
Is there any better/more efficient way to do this?
Update:
The list has only 5 items, I have shown three here.
The extra chars are only at the end, and that's happening because people are editing those CSVs in the Excel web version, which messes up some entries.
My current code (I'm sure there is a better way to do this; I'm also new to C#, so please tell me how I can improve it). I'm using CSVHelper to get CSV cells.
var record = csv.GetRecord<Record>();
string tag = csv.GetField(10); //tag column number in CSV is 10
/* Criteria for validation:
 * tag matches our list, but has extraneous chars - strip extraneous chars and update csv
 * tag doesn't match our list - add suffix invalid.*/
int listIndex = 0;
bool valid;
foreach (var validTags in ValidTags) //ValidTags is the list above
{
    if (validTags.Contains(tag.ToUpper()) && !string.Equals(validTags, subjectIdentifier.ToUpper()))
    {
        valid = true;
        continue; //move on to next csv row.
        //this means that tag is valid but has some extra characters appended to it because of web excel, strip extra chars
    }
    listIndex++;
    if (listIndex == 3 && !valid)
    {
        //means we have reached the end of the list but not found valid tag
        //add suffix invalid and move on to next csv row
    }
}
Since you say that the extra characters are only at the end, and assuming that the original tag is still present before the extra characters, you could just search the list for each tag to see if the tag contains an entry from the list. If it does, then update it to the correct entry if it's not an exact match, and if it doesn't, append the "-invalid" tag to it.
Before doing this, we may need to first sort the list in descending order so that when we're searching we find the closest (longest) match (in case one item in the list begins with another item in the list).
var csvPath = @"f:\public\temp\temp.csv";
var entriesUpdated = 0;
// Order the list so we match on the most similar match (ABC.DEF before ABC)
var orderedTags = ValidTags.OrderByDescending(t => t);
var newFileLines = new List<string>();
// Read each line in the file
foreach (var csvLine in File.ReadLines(csvPath))
{
    // Get the columns
    var columns = csvLine.Split(',');
    // Process each column
    for (int index = 0; index < columns.Length; index++)
    {
        var column = columns[index];
        switch (index)
        {
            case 0: // tag column
                var correctTag = orderedTags.FirstOrDefault(tag =>
                    column.IndexOf(tag, StringComparison.OrdinalIgnoreCase) > -1);
                if (correctTag != null)
                {
                    // This item contains a correct tag, so
                    // update it if it's not an exact match
                    if (column != correctTag)
                    {
                        columns[index] = correctTag;
                        entriesUpdated++;
                    }
                }
                else
                {
                    // This column does not contain a correct tag, so mark it as invalid
                    columns[index] += "-invalid";
                    entriesUpdated++;
                }
                break;
            // Other cases for other columns follow if needed
        }
    }
    newFileLines.Add(string.Join(",", columns));
}
// Write the new lines if any were changed
if (entriesUpdated > 0) File.WriteAllLines(csvPath, newFileLines);

C# so I need to split out a string, I think

So I have this application that I inherited from someone who is long gone. The gist of the application is that it reads in a .csv file with about 5800 lines, and copies it over to another .csv, which it creates new each time, after stripping out a few characters: #, ', &. Everything worked great until about a month ago. When I started checking into it, I found that about 131 items are missing from the spreadsheet. I read somewhere that the maximum amount of data a string can hold is over 1,000,000,000 chars, and my spreadsheet is way under that, around 800,000 chars, but the only thing I can think of that could be doing this is the string object.
So anyway, here is the code in question; this piece appears to both read in from the existing file and output to the new file:
StreamReader s = new StreamReader(File);
//Read the rest of the data in the file.
string AllData = s.ReadToEnd();
//Split off each row at the Carriage Return/Line Feed
//Default line ending in most windows exports.
//You may have to edit this to match your particular file.
//This will work for Excel, Access, etc. default exports.
string[] rows = AllData.Split("\r\n".ToCharArray(), System.StringSplitOptions.RemoveEmptyEntries);
//Now add each row to the DataSet
foreach (string r in rows)
{
    //Split the row at the delimiter.
    string[] items = r.Split(delimiter.ToCharArray());
    //Add the item
    result.Rows.Add(items);
}
If anyone can help me I would really appreciate it. I either need to figure out how to split the data better, or I need to figure out why it is cutting out the last 131 lines when copying from the existing Excel file to the new Excel file.
One easier way to do this, since you're using "\r\n" for lines, would be to just use the built-in line reading method: File.ReadLines(path)
foreach(var line in File.ReadLines(path))
{
    var items = line.Split(',');
    result.Rows.Add(items);
}
You may want to check out the TextFieldParser class, which is part of the Microsoft.VisualBasic.FileIO namespace (yes, you can use this with C# code)
Something along the lines of:
using(var reader = new TextFieldParser("c:\\path\\to\\file"))
{
    //configure for a delimited file
    reader.TextFieldType = FieldType.Delimited;
    //configure the delimiter character (comma)
    reader.Delimiters = new[] { "," };
    while(!reader.EndOfData)
    {
        string[] row = reader.ReadFields();
        //do stuff
    }
}
This class can help with some of the issues of splitting a line into its fields, when the field may contain the delimiter.
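If the fields can themselves be wrapped in quotes, it may also help to set the quote handling on the same reader explicitly (HasFieldsEnclosedInQuotes should already default to true, but being explicit documents the intent):
reader.HasFieldsEnclosedInQuotes = true; // a comma inside "quoted, text" is kept as part of the field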

How to format and read CSV file?

Here is just an example of the data I need to format.
The first column is simple; the problem is the second column.
What would be the best approach to formatting multiple data fields in one column?
How do I parse this data?
Important*: The second column needs to contain multiple values, like in the example below.
Name Details
Alex Age:25
Height:6
Hair:Brown
Eyes:Hazel
A CSV should probably look like this:
Name,Age,Height,Hair,Eyes
Alex,25,6,Brown,Hazel
Each cell should be separated by exactly one comma from its neighbor.
You can reformat it as such by using a simple regex which replaces certain newline and non-newline whitespace with commas (you can easily find each block because it has values in both columns).
A CSV file is normally defined using commas as field separators and a line break (CR/LF) as the row separator. You are using line breaks within your second column, and this will cause problems. You'll need to reformat your second column to use some other form of separator between multiple values. A common alternate separator is the | (pipe) character.
Your format would then look like:
Alex,Age:25|Height:6|Hair:Brown|Eyes:Hazel
In your parsing, you would first parse the comma separated fields (which would return two values), and then parse the second field as pipe separated.
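For example, a minimal parsing sketch for a record in that reworked format (the names and the sample line are just illustrative):
string line = "Alex,Age:25|Height:6|Hair:Brown|Eyes:Hazel";
string[] fields = line.Split(',');      // ["Alex", "Age:25|Height:6|Hair:Brown|Eyes:Hazel"]
string name = fields[0];
foreach (string detail in fields[1].Split('|'))
{
    string[] pair = detail.Split(':');  // e.g. ["Age", "25"]
    Console.WriteLine(name + ": " + pair[0] + " = " + pair[1]);
}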
This is an interesting one - it can be quite difficult to parse files with a specific format, which is why people often write specific classes to deal with them. More conventional file formats, like CSV or other delimited formats, are easier to read because they are formatted in a similar way.
A problem like the above can be addressed in the following way:
1) What should the output look like?
In your instance, and this is just a guess, but I believe you are aiming for the following:
Name, Age, Height, Hair, Eyes
Alex, 25, 6, Brown, Hazel
In which case, you have to parse out this information based on the structure above. If it's repeated blocks of text like the above then we can say the following:
a. Every person is in a block starting with Name Details
b. The name value is the first text after Details, with the other columns being delimited in the format Column:Value
However, you might also have sections with additional attributes, or attributes that are missing if the original input was optional, so tracking the column and ordinal would be useful too.
So one approach might look like the following:
public void ParseFile(){
String currentLine;
bool newSection = false;
//Store the column names and ordinal position here.
List<String> nameOrdinals = new List<String>();
nameOrdinals.Add("Name"); //IndexOf == 0
Dictionary<Int32, List<String>> nameValues = new Dictionary<Int32 ,List<string>>(); //Use this to store each person's details
Int32 rowNumber = 0;
using (TextReader reader = File.OpenText("D:\\temp\\test.txt"))
{
while ((currentLine = reader.ReadLine()) != null) //This will read the file one row at a time until there are no more rows to read
{
string[] lineSegments = currentLine.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
if (lineSegments.Length == 2 && String.Compare(lineSegments[0], "Name", StringComparison.InvariantCultureIgnoreCase) == 0
&& String.Compare(lineSegments[1], "Details", StringComparison.InvariantCultureIgnoreCase) == 0) //Looking for a Name Details Line - Start of a new section
{
rowNumber++;
newSection = true;
continue;
}
if (newSection && lineSegments.Length > 1) //We can start adding a new person's details - we know that
{
nameValues.Add(rowNumber, new List<String>());
nameValues[rowNumber].Insert(nameOrdinals.IndexOf("Name"), lineSegments[0]);
//Get the first column:value item
ParseColonSeparatedItem(lineSegments[1], nameOrdinals, nameValues, rowNumber);
newSection = false;
continue;
}
if (lineSegments.Length > 0 && lineSegments[0] != String.Empty) //Ignore empty lines
{
ParseColonSeparatedItem(lineSegments[0], nameOrdinals, nameValues, rowNumber);
}
}
}
//At this point we should have collected a big list of items. We can then write out the CSV. We can use a StringBuilder for now, although your requirements will
//be dependent upon how big the source files are.
//Write out the columns
StringBuilder builder = new StringBuilder();
for (int i = 0; i < nameOrdinals.Count; i++)
{
if(i == nameOrdinals.Count - 1)
{
builder.Append(nameOrdinals[i]);
}
else
{
builder.AppendFormat("{0},", nameOrdinals[i]);
}
}
builder.Append(Environment.NewLine);
foreach (int key in nameValues.Keys)
{
List<String> values = nameValues[key];
for (int i = 0; i < values.Count; i++)
{
if (i == values.Count - 1)
{
builder.Append(values[i]);
}
else
{
builder.AppendFormat("{0},", values[i]);
}
}
builder.Append(Environment.NewLine);
}
//At this point you now have a StringBuilder containing the CSV data you can write to a file or similar
}
private void ParseColonSeparatedItem(string textToSeparate, List<String> columns, Dictionary<Int32, List<String>> outputStorage, int outputKey)
{
if (String.IsNullOrWhiteSpace(textToSeparate)) { return; }
string[] colVals = textToSeparate.Split(new[] { ":" }, StringSplitOptions.RemoveEmptyEntries);
List<String> outputValues = outputStorage[outputKey];
if (!columns.Contains(colVals[0]))
{
//Add the column to the list of expected columns. The index of the column determines its index in the output
columns.Add(colVals[0]);
}
if (outputValues.Count < columns.Count)
{
outputValues.Add(colVals[1]);
}
else
{
outputStorage[outputKey].Insert(columns.IndexOf(colVals[0]), colVals[1]); //We append the value to the list at the place where the column index expects it to be. That way we can miss values in certain sections yet still have the expected output
}
}
After running this against your file, the string builder contains:
"Name,Age,Height,Hair,Eyes\r\nAlex,25,6,Brown,Hazel\r\n"
Which matches the above (\r\n is effectively the Windows new line marker)
This approach demonstrates how a custom parser might work - it's purposefully over-verbose, as there is plenty of refactoring that could take place here, and it is just an example.
Improvements would include:
1) This function assumes there are no spaces in the actual text items themselves. This is a pretty big assumption and, if wrong, would require a different approach to parsing out the line segments. However, this only needs to change in one place - as you read a line at a time, you could apply a regex, or just read in characters and assume that everything after the first "column:" section is a value, for example.
2) No exception handling
3) Text output is not quoted. You could test each value to see if it's a date or number - if not, wrap it in quotes, as then other programs (like Excel) will attempt to preserve the underlying data types more effectively (see the sketch after this list).
4) Assumes no column names are repeated. If they are, then you have to check if a column item has already been added, and then create an ColName2 column in the parsing section.
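As a rough sketch of point 3 (illustrative only; proper CSV quoting also has to escape embedded quotes, which is skipped here):
private static string QuoteIfText(string value)
{
    double number;
    DateTime date;
    // Leave numbers and dates unquoted; wrap everything else in quotes
    bool isNumberOrDate = double.TryParse(value, out number) || DateTime.TryParse(value, out date);
    return isNumberOrDate ? value : "\"" + value + "\"";
}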

Search for a keyword in C# and output the line as a string

How would it be possible to search for a string, e.g. #Test1, in a text file and then output the line below it as a string? For example:
Test.txt
#Test1
86/100
#Test2
99/100
#Test3
13/100
So if #Test2 was the search keyword, "99/100" would be turned into a string.
Parse the file once and store the results in a dictionary. Then look up entries in the dictionary.
var dictionary = new Dictionary<string, string>();
var lines = File.ReadLines("testScores.txt");
var e = lines.GetEnumerator();
while(e.MoveNext()) {
    if(e.Current.StartsWith("#Test")) {
        string test = e.Current;
        if(e.MoveNext()) {
            dictionary.Add(test, e.Current);
        }
        else {
            throw new Exception("File not in expected format.");
        }
    }
}
Now you can just say
Console.WriteLine(dictionary["#Test1"]);
etc.
Also, long-term, I recommend moving to a database.
Use ReadLine and search for the string (e.g. #Test1), then use the next line as input.
If the file format is exactly as shown above, then you can use this approach:
1. Read all lines until EOF into an array.
2. Run a loop and keep only the non-empty strings, holding them in another array or list.
The remaining items then alternate test number and score, so looping over them in pairs ([i] and [i+1]) gives you the test number and the score (see the sketch below).
Hope this might help.
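A rough sketch of that idea (the file name and keyword are just examples):
var items = new List<string>();
foreach (var line in File.ReadAllLines("Test.txt"))
{
    if (!string.IsNullOrWhiteSpace(line))
    {
        items.Add(line); // keep only non-empty lines
    }
}
// items now alternates: keyword, score, keyword, score, ...
for (int i = 0; i + 1 < items.Count; i += 2)
{
    if (items[i] == "#Test2")
    {
        Console.WriteLine(items[i + 1]); // prints "99/100"
    }
}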
How about RegularExpressions? Here's a good example.
This should do it for you:
int lineCounter = 0;
StreamReader strReader = new StreamReader(path);
while (!strReader.EndOfStream)
{
    string fileLine = strReader.ReadLine();
    if (Regex.IsMatch(fileLine, pattern))
    {
        Console.WriteLine(pattern + " found in line " + lineCounter.ToString());
    }
    lineCounter++;
}
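If you also need the line below the match (as the question asks), one small variation is to read the next line when the keyword is found (a sketch; it assumes the score line always directly follows the keyword):
if (Regex.IsMatch(fileLine, pattern))
{
    string scoreLine = strReader.ReadLine(); // the line directly below the keyword
    Console.WriteLine(scoreLine);
    lineCounter++; // account for the extra line consumed
}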
