As the title suggests, I am looking for guidance on how to turn a string (csvData) into a 2D string array by splitting it twice, with ';' and ',' respectively.
Currently I am at the stage where I can split it once into rows and turn that into an array, but I cannot figure out how to instead create a 2D array where the columns divided by ',' are also separated.
string[] Sep = csvData.Split(';').Select(csvData => csvData.Replace(" ","")).Where(csvData => !string.IsNullOrEmpty(csvData)).ToArray();
I have tried various things like :
string[,] Sep = csvData.Split(';',',').Select(csvData => csvData.Replace(" ","")).Where(csvData => !string.IsNullOrEmpty(csvData)).ToArray();
naively thinking that C# would understand what I was trying to achieve, but since I am here, it's obvious that I got the error "cannot implicitly convert type string[] to string[*,*]".
Note that I have not coded for a while, so if my thinking is completely wrong and you do not understand what I am trying to convey with this question, I apologize in advance.
Thanks!
In a strongly-typed language like C#, the compiler makes no assumptions about what you intend to do with your data. You must make your intent explicit through your code. Something like this should work:
string csvData = "A,B;C,D";
string[][] sep = csvData.Split(';')    // Returns string[] {"A,B","C,D"}
    .Select(str => str.Split(','))     // Returns IEnumerable<string[]> {{"A","B"},{"C","D"}}
    .ToArray();                        // Returns string[][] {{"A","B"},{"C","D"}}
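If you really want a rectangular string[,] rather than the jagged string[][] above, one option is to copy the jagged result over; a minimal sketch, assuming every row has the same number of columns:
// Assumes a uniform column count; ragged input would need padding or bounds checks.
string[,] grid = new string[sep.Length, sep[0].Length];
for (int i = 0; i < sep.Length; i++)
    for (int j = 0; j < sep[i].Length; j++)
        grid[i, j] = sep[i][j];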
Rows are separated by semicolon, columns by comma?
Splitting the whole string by ';' gives you an array of rows; splitting a row by ',' gives you an array of values.
If your data has a consistent schema, as in each CSV you process has the same columns, you could define a class to represent the entity, to make the data easier to work with.
Let's say it's customer data:
John,Smith,8675309,johnsmith@gmail.com;
You could make a class with those properties:
public class Customer
{
public string FirstName { get; set; }
public string LastName { get; set; }
public string Phone { get; set; }
public string Email { get; set; }
}
Then:
var rows = csvData.Split(';');
List<Customer> customers = new();
foreach (var row in rows)
{
    var fields = row.Split(',');
    customers.Add(new()
    {
        FirstName = fields[0],
        LastName = fields[1],
        Phone = fields[2],
        Email = fields[3]
    });
}
Now you have a list of customers to do whatever it is you do with customers.
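For example (a purely hypothetical query, just to show the list in use):
// Hypothetical usage: Gmail customers, sorted by last name.
var gmailCustomers = customers
    .Where(c => c.Email.EndsWith("@gmail.com"))
    .OrderBy(c => c.LastName)
    .ToList();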
Here is an answer to present a few alternative ideas and things you can do with C#, more for educational/academic purposes than anything else. These days, to consume a CSV, we'd use a CSV library.
If your data is definitely regularly formed, you can get away with just one Split. The following code splits on either char to make one long array. It then stands to reason that every 4 elements is a new customer, the customer's data being given by elements n+0, n+1, n+2 and n+3. Because we know how many data items each customer consumes, dividing the array length by 4 gives us the number of customers, so we can presize our 2D array:
var bits = data.Split(';', ',');
var twoD = new string[bits.Length / 4, 4];
for (int x = 0; x < bits.Length; x += 4)
{
    twoD[x / 4, 0] = bits[x + 0];
    twoD[x / 4, 1] = bits[x + 1];
    twoD[x / 4, 2] = bits[x + 2];
    twoD[x / 4, 3] = bits[x + 3];
}
I don't think I'd use 2D arrays though, and I commend the other answer's advice to create a class to hold the related data; you can use this same technique:
var custs = new List<Customer>();
for (int x = 0; x < bits.Length;)
{
    custs.Add(new()
    {
        FirstName = bits[x++],
        LastName = bits[x++],
        Phone = bits[x++],
        Email = bits[x++]
    });
}
Here we aren't incrementing x in the loop header; every time a piece of info is assigned, x is bumped up by 1 in the loop body. We could have kept the same approach as before, jumping it by 4; this is just to demo another approach that lends itself well here.
I mentioned that these days we probably wouldn't really read a CSV manually and split it ourselves. What if the data contains a comma or a semicolon? It wrecks the file structure.
There are a boatload of libraries that read CSV files, CsvHelper is a popular one, and you'd use it like:
using var reader = new StreamReader("path\\to\\file.csv");
using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
var custs = csv.GetRecords<Customer>().ToList();
...
Your file would have a header line with column names that match your property names in C#. If it doesn't, you can use attributes on the properties to tell CsvHelper which column should be mapped to which property - https://joshclose.github.io/CsvHelper/getting-started/
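For a headerless file, the mapping could look something like this; a sketch using CsvHelper's [Index] attribute (which is zero-based), not the only way to configure it:
using CsvHelper.Configuration.Attributes;

public class Customer
{
    [Index(0)] public string FirstName { get; set; }
    [Index(1)] public string LastName { get; set; }
    [Index(2)] public string Phone { get; set; }
    [Index(3)] public string Email { get; set; }
}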
Here's the simplest way I know to produce a 2D array by splitting a string.
string csvData = "A,B,C;D,E,F,G";
var temporary = csvData
    .Split(';')
    .SelectMany((xs, i) => xs.Split(',').Select((x, j) => new { x, i, j }))
    .ToArray();
int max_i = temporary.Max(x => x.i);
int max_j = temporary.Max(x => x.j);
string[,] array = new string[max_i + 1, max_j + 1];
foreach (var t in temporary)
{
    array[t.i, t.j] = t.x;
}
I purposely chose csvData to be missing a value.
temporary is this:
{ x = A, i = 0, j = 0 }
{ x = B, i = 0, j = 1 }
{ x = C, i = 0, j = 2 }
{ x = D, i = 1, j = 0 }
{ x = E, i = 1, j = 1 }
{ x = F, i = 1, j = 2 }
{ x = G, i = 1, j = 3 }
And the final array is this (note the null slot where the missing value would have been):
A B C null
D E F G
Hi, I have a text file that contains 3 columns, something like this:
contract1;pdf1;63
contract1;pdf2;5
contract1;pdf3;2
contract1;pdf4;00
contract2;pdf1;2
contract2;pdf2;30
contract2;pdf3;5
contract2;pdf4;80
Now I want to write this information into another text file, ordered so that, within each contract, the records whose last column is 2 or 5 come first (in that order), something like this:
contract1;pdf3;2
contract1;pdf2;5
contract1;pdf1;63
contract1;pdf4;00
contract2;pdf1;2
contract2;pdf3;5
contract2;pdf2;30
contract2;pdf4;80
How can I do this?
Thanks!
You can use LINQ to group and sort the lines after reading, then put them back together:
var output = File.ReadAllLines(@"path-to-file")
    .Select(s => s.Split(';'))
    .GroupBy(s => s[0])
    .SelectMany(g => g.OrderBy(s => s[2] == "2" ? 0 : s[2] == "5" ? 1 : 2)
                      .Select(s => String.Join(";", s)));
Then just write them to a file.
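For instance (a one-liner sketch, assuming a hypothetical output path):
File.WriteAllLines(@"path-to-output-file", output);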
I'm not going to write your program for you, but I would recommend this library for reading and writing delimited files:
https://joshclose.github.io/CsvHelper/getting-started/
When you new up the reader, make sure to specify your semicolon delimiter:
using (var reader = new StreamReader("path\\to\\input_file.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    csv.Configuration.Delimiter = ";";
    var records = csv.GetRecords<Row>();
    // manipulate the data as needed here
}
Your "Row" class (choose a more appropriate name for clarity) will specify the schema of the flat file. It sounds like you don't have headers? If not, you can specify the Order of each item.
public class Row
{
    // Note: CsvHelper's Index attribute is zero-based.
    [Index(0)]
    public string MyValue1 { get; set; }
    [Index(1)]
    public string MyValue2 { get; set; }
    [Index(2)]
    public string MyValue3 { get; set; }
}
After reading the data in, you can manipulate it as needed. If the output format is different from the input format, you should convert the input class into an output class. You could use the AutoMapper library if you like; however, for a simple project I would suggest just manually converting the input class into the output class.
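A manual conversion is usually just a projection; a minimal sketch, with a hypothetical OutputRow class standing in for whatever shape the output file needs:
// Hypothetical output shape; adjust to the real output format.
public class OutputRow
{
    public string Combined { get; set; }
}

// Enumerate the records while the reader is still open, since GetRecords is lazy.
var outputRecords = records
    .Select(r => new OutputRow { Combined = $"{r.MyValue1};{r.MyValue2};{r.MyValue3}" })
    .ToList();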
Lastly, write the data back out:
using (var writer = new StreamWriter("path\\to\\output_file.csv"))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    csv.WriteRecords(records);
}
I need to write a CSV parser, and I am now trying to separate the fields to manipulate them.
Sample CSV:
mitarbeiter^tagesdatum^lohnart^kostenstelle^kostentraeger^menge^betrag^belegnummer
11005^23.01.2018^1^^31810020^5,00^^
11081^23.01.2018^1^^31810020^5,00^^
As you can see, there are several empty cells.
I am doing the following:
using (CsvFileReader reader = new CsvFileReader(path))
{
    CsvRow row = new CsvRow();
    while (reader.ReadRow(row))
    {
        foreach (string s in row)
        {
            csvROW.Add(new aCSVROW());
            string[] items = s.Split(new char[] { '^' }, StringSplitOptions.None);
            csvROW[0].mitarbeiter = items[0];
            csvROW[0].tagesdatum = items[1];
            csvROW[0].lohnart = items[2];
            csvROW[0].kostenstelle = items[3];
            csvROW[0].kostentraeger = items[4];
            csvROW[0].menge = items[5];
            csvROW[0].betrag = items[6];
            csvROW[0].belegnummer = items[7];
        }
    }
}
Problem:
It seems that Split stops after the comma (5,00). The separator is ^ ... is there a reason why?
I tried several things without success...
Thank you so much!
CsvFileReader reads rows from a CSV file and then the strings within that row. What else would you expect the CsvFileReader to do than separate the row?
After reading the second line, row will have the contents
11005^23.01.2018^1^^31810020^5
and
00^^
When you split the first part by ^, the last entry of the resulting array will be "5". On top of that, your code will throw, because you are trying to access items beyond the bounds of the array.
I don't know CsvFileReader. Maybe you can pass ^ as a separator and spare yourself the splitting of the string. Alternatively, you could use a StreamReader, which will work much more like you expected:
using (StreamReader reader = new StreamReader(path))
{
    while (!reader.EndOfStream)
    {
        var csvLine = reader.ReadLine();
        string[] items = csvLine.Split(new char[] { '^' }, StringSplitOptions.None);
        // Fill a newly added row each time, not always the one at index 0.
        var newRow = new aCSVROW();
        csvROW.Add(newRow);
        newRow.mitarbeiter = items[0];
        newRow.tagesdatum = items[1];
        newRow.lohnart = items[2];
        newRow.kostenstelle = items[3];
        newRow.kostentraeger = items[4];
        newRow.menge = items[5];
        newRow.betrag = items[6];
        newRow.belegnummer = items[7];
    }
}
Is CsvRow meant to be the data of all rows, or of one row? As it is, you keep adding a new aCSVROW object to csvROW for each read line, but you keep replacing the data on just csvROW[0], the first inserted aCSVROW. This means that in the end you will have a lot of rows with no data in them, except for the one at index 0, which had its properties overwritten on each iteration and ends up containing the data of the last read row.
Also, despite using a CsvReader class, you are using plain normal String.Split to actually separate the fields. Surely that's what the CsvReader class is for?
Personally, I always use the TextFieldParser, from the Microsoft.VisualBasic.FileIO namespace. It has the advantage that it's completely native in the .NET Framework, and you can simply tell it which separator to use.
This function can get the data out of it as a simple List<String[]>; see: Using C# to search a CSV file and pull the value in the column next to it
Once you have your data, you can paste it into objects however you want.
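The linked answer has the full version; a minimal sketch of such a helper, built on TextFieldParser and matching the signature used below, could look like this:
using System.Collections.Generic;
using System.Text;
using Microsoft.VisualBasic.FileIO; // reference the Microsoft.VisualBasic assembly

public static List<string[]> SplitFile(string path, Encoding encoding, string delimiter)
{
    var rows = new List<string[]>();
    using (var parser = new TextFieldParser(path, encoding))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(delimiter);
        while (!parser.EndOfData)
            rows.Add(parser.ReadFields()); // one string[] per row, split on the delimiter
    }
    return rows;
}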
List<String[]> lines = SplitFile(path, textEncoding, "^");
// I assume "CsvRow" is some kind of container for multiple rows?
// Looks like pretty bad naming to me...
CsvRow allRows = new CsvRow();
foreach (String[] items in lines)
{
    // Create a new object, and add it to the list.
    aCSVROW row = new aCSVROW();
    allRows.Add(row);
    // Fill the actual newly created object, not the first object in allRows.
    // Consider adding index checks here, though, to avoid index out of range exceptions.
    row.mitarbeiter = items[0];
    row.tagesdatum = items[1];
    row.lohnart = items[2];
    row.kostenstelle = items[3];
    row.kostentraeger = items[4];
    row.menge = items[5];
    row.betrag = items[6];
    row.belegnummer = items[7];
}
// Done. All rows added to allRows.
CsvRow row = new CsvRow();
while (reader.ReadRow(row))
{
    foreach (string s in row)
    {
        csvROW.Add(new aCSVROW());
        string[] items = s.Split('^');
        csvROW[0].mitarbeiter = items[0];
        csvROW[0].tagesdatum = items[1];
        csvROW[0].lohnart = items[2];
        csvROW[0].kostenstelle = items[3];
        csvROW[0].kostentraeger = items[4];
        csvROW[0].menge = items[5];
        csvROW[0].betrag = items[6];
        csvROW[0].belegnummer = items[7];
    }
}
As of now, I am using this code to open a file, read it into a list, and parse that list into a string[]:
string CP4DataBase =
    "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
CP4DataBaseRTB.LoadFile(CP4DataBase, RichTextBoxStreamType.PlainText);
string[] splitCP4DataBaseLines = CP4DataBaseRTB.Text.Split('\n');
List<string> tempCP4List = new List<string>();
string[] line1CP4Components;
foreach (var line in splitCP4DataBaseLines)
    tempCP4List.Add(line + Environment.NewLine);
string concattedUnitPart = "";
foreach (var line in tempCP4List)
{
    concattedUnitPart = concattedUnitPart + line;
    line1CP4PartLines++;
}
line1CP4Components = new Regex("\"UNIT\",\"PARTS\"", RegexOptions.Multiline)
    .Split(concattedUnitPart)
    .Where(c => !string.IsNullOrEmpty(c))
    .ToArray();
I am wondering if there is a quicker way to do this. This is just one of the files I am opening, so this is repeated a minimum of 5 times to open and properly load the lists.
The minimum file size being imported right now is 257 KB. The largest file is 1,803 KB. These files will only get larger as time goes on as they are being used to simulate a database and the user will continually add to them.
So my question is, is there a quicker way to do all of the above code?
EDIT:
***CP4***
"UNIT","PARTS"
"BLOCK","HEADER-"
"NAME","106536"
"REVISION","0000"
"DATE","11/09/03"
"TIME","11:10:11"
"PMABAR",""
"COMMENT",""
"PTPNAME","R160805"
"CMPNAME","R160805"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",180
"BLOCK","PRTFORM-"
"PTPSZBX",1.60
"PTPSZBY",0.80
"PTPMNH",0.25
"NeedGlue",0
"BLOCK","TOLEINF-"
"PTPTLBX",0.50
"PTPTLBY",0.40
"PTPTLCL",10
"PTPTLPX",0.30
"PTPTLPY",0.30
"PTPTLPQ",30
"BLOCK","ELDT+" "PGDELSN","PGDELX","PGDELY","PGDELPP","PGDELQ","PGDELP","PGDELW","PGDELL","PGDELWT","PGDELLT","PGDELCT","PGDELR"
0,0.000,0.000,0,0,0.000,0.000,0.000,0.000,0.000,0.000,0
"BLOCK","VISION-"
"PTPVIPL",0
"PTPVILCA",0
"PTPVILB",0
"PTPVICVT",10
"PENVILIT",0
"BLOCK","ENVDT"
"ELEMENT","CP43ENVDT-"
"PENNMI",1.0
"PENNMA",1.0
"PENNZN",""
"PENNZT",1.0
"PENBLM",12
"PENCRTS",0
"PENSPD1",100
"PTPCRDCT",0
"PENVICT",1
"PCCCRFT",1
"BLOCK","CARRING-"
"PTPCRAPO",0
"PTPCRPCK",0
"PTPCRPUX",0.00
"PTPCRPUY",0.00
"PTPCRRCV",0
"BLOCK","PACKCLS-"
"FDRTYPE","Emboss"
"TAPEWIDTH","8mm"
"FEEDPITCH",4
"REELDIAMETER",0
"TAPEDEPTH",0.0
"DOADVVACUUM",0
"CHKBEFOREFEED",0
"TAPEARMLENGTH",0
"PPCFDPP",0
"PPCFDEC",4
"PPCMNPT",30
"UNIT","PARTS"
"BLOCK","HEADER-"
"NAME","106653"
"REVISION","0000"
"DATE","11/09/03"
"TIME","11:10:42"
"PMABAR",""
"COMMENT",""
"PTPNAME","0603R"
"CMPNAME","0603R"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",18
"BLOCK","PRTFORM-"
"PTPSZBX",1.60
"PTPSZBY",0.80
"PTPMNH",0.23
"NeedGlue",0
"BLOCK","TOLEINF-"
"PTPTLBX",0.50
"PTPTLBY",0.34
"PTPTLCL",0
"PTPTLPX",0.60
"PTPTLPY",0.40
"PTPTLPQ",30
"BLOCK","ELDT+" "PGDELSN","PGDELX","PGDELY","PGDELPP","PGDELQ","PGDELP","PGDELW","PGDELL","PGDELWT","PGDELLT","PGDELCT","PGDELR"
0,0.000,0.000,0,0,0.000,0.000,0.000,0.000,0.000,0.000,0
"BLOCK","VISION-"
"PTPVIPL",0
"PTPVILCA",0
"PTPVILB",0
"PTPVICVT",10
"PENVILIT",0
"BLOCK","ENVDT"
"ELEMENT","CP43ENVDT-"
"PENNMI",1.0
"PENNMA",1.0
"PENNZN",""
"PENNZT",1.0
"PENBLM",12
"PENCRTS",0
"PENSPD1",80
"PTPCRDCT",0
"PENVICT",1
"PCCCRFT",1
"BLOCK","CARRING-"
"PTPCRAPO",0
"PTPCRPCK",0
"PTPCRPUX",0.00
"PTPCRPUY",0.00
"PTPCRRCV",0
"BLOCK","PACKCLS-"
"FDRTYPE","Emboss"
"TAPEWIDTH","8mm"
"FEEDPITCH",4
"REELDIAMETER",0
"TAPEDEPTH",0.0
"DOADVVACUUM",0
"CHKBEFOREFEED",0
"TAPEARMLENGTH",0
"PPCFDPP",0
"PPCFDEC",4
"PPCMNPT",30
... the file goes on and on and on... and will only get larger.
The regex is placing each "UNIT","PARTS" and the following content, up until the NEXT "UNIT","PARTS", into a string[].
After this, I am checking each string[] to see if its "NAME" section exists in a different list. If it does, I am outputting that "UNIT","PARTS" block at the end of a text file.
This bit is a potential performance killer:
string concattedUnitPart = "";
foreach (var line in tempCP4List)
{
    concattedUnitPart = concattedUnitPart + line;
    line1CP4PartLines++;
}
(See this article for why.) Use a StringBuilder for repeated concatenation:
// No need to use tempCP4List at all
StringBuilder builder = new StringBuilder();
foreach (var line in splitCP4DataBaseLines)
{
concattedUnitPart.AppendLine(line);
line1CP4PartLines++;
}
Or even just:
string concattedUnitPart = string.Join(Environment.NewLine,
splitCP4DataBaseLines);
Now the regex part may well also be slow - I'm not sure. It's not obvious what you're trying to achieve, whether you need regular expressions at all, or whether you really need to do the whole thing in one go. Can you definitely not just process it line by line?
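For illustration, line-by-line processing might look like this; a sketch with hypothetical names, assuming each unit starts with a "UNIT","PARTS" line and collecting each unit's lines into its own list:
var units = new List<List<string>>();
List<string> currentUnit = null;
foreach (var line in File.ReadLines(CP4DataBase)) // streams the file rather than loading it all at once
{
    if (string.IsNullOrWhiteSpace(line))
        continue;
    if (line.StartsWith("\"UNIT\",\"PARTS\""))
    {
        // Start a new unit whenever the marker line appears.
        currentUnit = new List<string>();
        units.Add(currentUnit);
    }
    currentUnit?.Add(line);
}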
You could achieve the same output list 'line1CP4Components' using the following:
Regex StripEmptyLines = new Regex(@"^\s*$", RegexOptions.Multiline);
Regex UnitPartsMatch = new Regex(@"(?<=\n)""UNIT"",""PARTS"".*?(?=(?:\n""UNIT"",""PARTS"")|$)", RegexOptions.Singleline);
string CP4DataBase =
    "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
CP4DataBaseRTB.LoadFile(CP4DataBase, RichTextBoxStreamType.PlainText);
List<string> line1CP4Components = new List<string>(
    UnitPartsMatch.Matches(StripEmptyLines.Replace(CP4DataBaseRTB.Text, ""))
        .OfType<Match>()
        .Select(m => m.Value)
);
return line1CP4Components.ToArray();
You may be able to ignore the use of StripEmptyLines, but your original code is doing this via the Where(c => !string.IsNullOrEmpty(c)). Also your original code is causing the '\r' part of the "\r\n" newline/linefeed pair to be duplicated. I assumed this was an accident and not intentional?
Also you don't seem to be using the value in 'line1CP4PartLines' so I omitted the creation of the value. It was seemingly inconsistent with the omission of empty lines later so I guess you're not depending on it. If you need this value a simple regex can tell you how many new lines are in the string:
int linecount = new Regex("^", RegexOptions.Multiline).Matches(CP4DataBaseRTB.Text).Count;
// example of what your code will look like
string CP4DataBase = "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
List<string> Cp4DataList = new List<string>(File.ReadAllLines(CP4DataBase));
//or create a Dictionary<int,string[]> object
string strData = string.Empty; //holds the line item data, which is read in line by line
string[] strStockListRecord = null; //string array that holds information from the TFE_Stock.txt file
Dictionary<int, string[]> dctStockListRecords = null; //dictionary that will hold the KeyValuePairs of text file contents
List<string> lstStockListRecord = null; //generic list that will store all the lines from the .prn file being processed
if (File.Exists(strExtraLoadFileLoc + strFileName))
{
    try
    {
        lstStockListRecord = new List<string>();
        List<string> lstStrLinesStockRecord = new List<string>(File.ReadAllLines(strExtraLoadFileLoc + strFileName));
        dctStockListRecords = new Dictionary<int, string[]>(lstStrLinesStockRecord.Count());
        int intLineCount = 0;
        foreach (string strLineSplit in lstStrLinesStockRecord)
        {
            lstStockListRecord.Add(strLineSplit);
            dctStockListRecords.Add(intLineCount, lstStockListRecord.ToArray());
            lstStockListRecord.Clear();
            intLineCount++;
        } //foreach (string strLineSplit in lstStrLinesStockRecord)
        lstStrLinesStockRecord.Clear();
        lstStrLinesStockRecord = null;
        lstStockListRecord.Clear();
        lstStockListRecord = null;
        //Alter the code to fit what you are doing..
I am attempting to read in a CSV dynamically via FileHelpers and work with the CSV data as a DataTable. My CSV files will not all be the same: they will have different column headers and different numbers of columns. I am using the ReadStreamAsDT method, but it seems to still want a structured class to initialize the FileHelperEngine. Any ideas?
I had to use FileHelpers.RunTime and the DelimitedClassBuilder to create a DataTable from the file. Here is my method. If I get more time, I will explain this better.
private static DataTable CreateDataTableFromFile(byte[] importFile)
{
    var cb = new DelimitedClassBuilder("temp", ",") { IgnoreFirstLines = 0, IgnoreEmptyLines = true, Delimiter = "," };
    var ms = new MemoryStream(importFile);
    var sr = new StreamReader(ms);
    var headerArray = sr.ReadLine().Split(',');
    foreach (var header in headerArray)
    {
        cb.AddField(header, typeof(string));
        cb.LastField.FieldQuoted = true;
        cb.LastField.QuoteChar = '"';
    }
    var engine = new FileHelperEngine(cb.CreateRecordClass());
    return engine.ReadStreamAsDT(sr);
}
Obviously, there is a lot of validation along with other logic taking place around this method, but I do not have much time right now to dive into it. Hope this helps!
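Usage would then be something like this (with a hypothetical file path):
// Hypothetical call site: read the raw bytes and build the DataTable.
byte[] importFile = File.ReadAllBytes("path\\to\\import.csv");
DataTable table = CreateDataTableFromFile(importFile);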
Have you tried using http://www.codeproject.com/KB/database/CsvReader.aspx ? You could leverage this library's parsing. It's fast and reliable.