CSV quoted values with a line break inside the data - C#

This question is specific to the ChoETL CSV reader.
Take this example:
"Header1","Header2","Header3"
"Value1","Val
ue2","Value3"
(Notepad++ screenshot)
All headers and values are quoted, and there's a line break inside "Value2".
I've been playing with the ChoETL options, but I can't get it to work:
foreach (dynamic e in new ChoCSVReader(@"test.csv")
    .WithFirstLineHeader()
    .MayContainEOLInData(true)
    .MayHaveQuotedFields()
    // been playing with these too
    //.QuoteAllFields()
    //.ConfigureHeader(c => c.IgnoreColumnsWithEmptyHeader = true)
    //.AutoIncrementDuplicateColumnNames()
    //.ConfigureHeader(c => c.QuoteAllHeaders = true)
    //.IgnoreEmptyLine()
    )
{
    System.Console.WriteLine(e["Header1"]);
}
This fails with:
Missing 'Header2' field value in CSV file
The error varies depending on the reader configuration.
What is the correct configuration to read this text?

It is a bug in handling one of the cases (i.e. the header having quotes - the csv2 text). Applied a fix. Take the ChoETL.NETStandard.1.2.1.35-beta1 package and give it a try.
string csv1 = @"Header1,Header2,Header3
""Value1"",""Val
ue2"",""Value3""";

string csv2 = @"""Header1"",""Header2"",""Header3""
""Value1"",""Val
ue2"",""Value3""";

string csv3 = @"Header1,Header2,Header3
Value1,""Value2"",Value3";

using (var r = ChoCSVReader.LoadText(csv1)
    .WithFirstLineHeader()
    .MayContainEOLInData(true)
    .QuoteAllFields())
    r.Print();

using (var r = ChoCSVReader.LoadText(csv2)
    .WithFirstLineHeader()
    .MayContainEOLInData(true)
    .QuoteAllFields())
    r.Print();

using (var r = ChoCSVReader.LoadText(csv3)
    .WithFirstLineHeader()
    .MayContainEOLInData(true)
    .QuoteAllFields())
    r.Print();
Sample fiddle: https://dotnetfiddle.net/VubCDR


CsvHelper delimiter character same as end-of-line character

I've run into an issue while parsing some csv-like files. I know how to fix it, but I'd like to confirm whether that's the appropriate way to do it.
The file structure
The file I'm trying to parse has a structure similar to .csv in that its values are separated with a delimiter (in my case it's |), but unlike the ones I've previously seen, it also has a delimiter at the end of each line, e.g.:
Column1|Column2|Column3|
Row1Val1|Row1Val2|Row1Val3|
Row2Val1|Row2Val2|Row2Val3|
The issue
The problem arose when I wrote some unit tests to cover my service that wraps the CsvHelper library. Apparently there is some issue when I provide the following configuration:
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = "|",
    HasHeaderRecord = true,
    NewLine = "|\r\n"
};
With the above configuration, csvReader.GetRecords() returns no results. I believe that's because the parser first looks for columns, then for the end of line - so it tries to parse an empty column without realizing it's actually part of the delimiter.
(I can paste the code for the getRecords call as well, but it's basically generic code taken from examples - the only difference is I'm using System.IO.Abstractions library for easier unit testing)
The attempts to solve the problem
If I remove the NewLine configuration value, the parser works fine when reading the file (even if it has the end-of-line delimiter character at the end). Then, however, my "write CSV" tests break, since CsvHelper no longer adds the proper line endings to the file.
The question(s)
Is there any way I can configure CsvHelper to cover both cases with one configuration, or should I use two different configurations depending on whether I'm writing to CSV or reading from it? That seems a little counter-intuitive to me, since it's the same format I'm trying to follow, yet different configurations are expected.
You could manually write the empty column for each line and then you could keep the configuration the same for reading and writing.
void Main()
{
    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        Delimiter = "|"
    };
    var records = new List<MyClass>
    {
        new MyClass { Column1 = "Row1Val1", Column2 = "Row1Val2", Column3 = "Row1Val3" },
        new MyClass { Column1 = "Row2Val1", Column2 = "Row2Val2", Column3 = "Row2Val3" }
    };
    using (var writer = new StreamWriter("file.csv"))
    using (var csv = new CsvWriter(writer, config))
    {
        csv.WriteHeader<MyClass>();
        csv.WriteField(string.Empty); // trailing empty field produces the end-of-line delimiter
        foreach (var record in records)
        {
            csv.NextRecord();
            csv.WriteRecord(record);
            csv.WriteField(string.Empty);
        }
    }
    using (var reader = new StreamReader("file.csv"))
    using (var csv = new CsvReader(reader, config))
    {
        var importRecords = csv.GetRecords<MyClass>();
        importRecords.Dump();
    }
}

public class MyClass
{
    public string Column1 { get; set; }
    public string Column2 { get; set; }
    public string Column3 { get; set; }
}
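An alternative approach (my assumption, not a built-in CsvHelper feature; the `StripTrailing` helper below is hypothetical) is to strip the trailing delimiter from each line before the text reaches the reader, so a single configuration with `Delimiter = "|"` and no custom `NewLine` works for reading:

```csharp
using System;
using System.Linq;

static class TrailingDelimiter
{
    // Remove the run of trailing '|' characters from each line before the
    // text is handed to CsvReader. Caveat: this also eats a legitimately
    // empty last column, so only use it when the final field is never empty.
    public static string StripTrailing(string text, char delimiter = '|')
    {
        var lines = text
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Select(l => l.TrimEnd(delimiter));
        return string.Join(Environment.NewLine, lines);
    }
}
```

The cleaned text can then be wrapped in a StringReader and passed to CsvReader, e.g. `new CsvReader(new StringReader(TrailingDelimiter.StripTrailing(raw)), config)`.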

Read text from a text file with specific pattern

Hi there. I have a requirement where I need to read content from a text file. The sample content is below.
Name=Check_Amt
Public=Yes
DateName=pp
Name=DBO
I need to read the text and extract only the value that comes after Name= (whatever text that is).
So I am expecting the output: Check_Amt, DBO
I need to do this in C#
When querying data (e.g. file lines), LINQ is often a convenient tool; if the file has lines in
name=value
format, you can query it like this:
1. Read the file lines
2. Split each line into a name, value pair
3. Filter the pairs by their names
4. Extract the value from each pair
5. Materialize the values into a collection
Code:
using System.Linq;
...
// string[] {"Check_Amt", "DBO"}
var values = File
    .ReadLines(@"c:\MyFile.txt")
    .Select(line => line.Split(new char[] { '=' }, 2)) // split into name, value pairs
    .Where(items => items.Length == 2)                 // to be on the safe side
    .Where(items => items[0] == "Name")                // name == "Name" only
    .Select(items => items[1])                         // value from name=value
    .ToArray();                                        // let's have an array
Finally, if you want a comma-separated string, Join the values:
// "Check_Amt,DBO"
string result = string.Join(",", values);
Another way:
var str = @"Name=Check_Amt
Public=Yes
DateName=pp
Name=DBO";
var find = "Name=";
var result = new List<string>();
using (var reader = new StringReader(str)) // change to StreamReader to read from a file
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.StartsWith(find))
            result.Add(line.Substring(find.Length));
    }
}
You can use LINQ to select what you need:
var names = File.ReadLines("my file.txt")
    .Select(l => l.Split('='))
    .Where(t => t.Length == 2)
    .Where(t => t[0] == "Name")
    .Select(t => t[1]);
I think that the best case would be a regex.
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"(?<=Name=).*?(?=Public)";
        string input = @"Name=Check_Amt Public=Yes DateName=pp Name=DBO";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
EDIT: My answer was written before your question was corrected; while it still works, the LINQ answer would be better IMHO.

How to read multiple csv file in a folder using csvhelper

How can I read multiple csv files in a folder?
I have a program that maps a csv file into its correct format using the CsvHelper library. Here is my code:
static void Main()
{
    var filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.csv");
    var tempFilePath = Path.GetTempFileName();
    using (var reader = new StreamReader(filePath))
    using (var csvReader = new CsvReader(reader))
    using (var writer = new StreamWriter(tempFilePath))
    using (var csvWriter = new CsvWriter(writer))
    {
        csvReader.Configuration.RegisterClassMap<TestMapOld>();
        csvWriter.Configuration.RegisterClassMap<TestMapNew>();
        csvReader.Configuration.Delimiter = ",";
        var records = csvReader.GetRecords<Test>();
        csvReader.Configuration.PrepareHeaderForMatch = header =>
        {
            var newHeader = Regex.Replace(header, @"\s", string.Empty);
            newHeader = newHeader.Trim();
            newHeader = newHeader.ToLower();
            return newHeader;
        };
        csvWriter.WriteRecords(records);
    }
    File.Delete(filePath);
    File.Move(tempFilePath, filePath);
}
Since this is homework (and/or you are seemingly new to coding), I'll give you a very suitable answer that will give you many hours of fun and excitement as you go through the documentation and samples provided.
You need
GetFiles(String, String)
Returns the names of files (including their paths) that match the
specified search pattern in the specified directory.
searchPattern
String The search string to match against the names of files in path.
This parameter can contain a combination of valid literal path and
wildcard (* and ?) characters, but it doesn't support regular
expressions.
foreach, in (C# reference)
The foreach statement executes a statement or a block of statements
for each element in an instance of a type that implements the
System.Collections.IEnumerable or
System.Collections.Generic.IEnumerable&lt;T&gt; interface.
I'll leave the details up to you.
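Since the details are left as an exercise, here is only a minimal sketch of how those two pieces fit together (the folder and the `processFile` callback are placeholders for your own CsvHelper mapping logic from the question):

```csharp
using System;
using System.IO;

class BatchCsv
{
    // Enumerate every .csv file in a folder and hand each full path to a callback.
    public static int ProcessFolder(string folder, Action<string> processFile)
    {
        int count = 0;
        // GetFiles returns the full paths matching the wildcard pattern.
        foreach (var filePath in Directory.GetFiles(folder, "*.csv"))
        {
            processFile(filePath); // e.g. the CsvReader/CsvWriter code above
            count++;
        }
        return count;
    }

    static void Main()
    {
        var desktop = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
        ProcessFolder(desktop, path => Console.WriteLine($"Processing {path}"));
    }
}
```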

How to read a csv file having ',' inside the data

I have some csv data:
"data1", "data2", "data2_1", "data3"
I am using csvHelper to read the data.
When I read the data and split it using the separator ',', I get 4 records:
"data1",
"data2",
"data2_1",
"data3"
But I want 3 records, as I have 3 columns:
"data1",
"data2, data2_1",
"data3"
Below is the code I am trying:
var config = new CsvConfiguration() { HasHeaderRecord = false };
var stream = File.OpenRead(FilePath);
using (var csvReader = new CsvReader(new StreamReader(stream, Encoding.UTF8), config))
{
    while (csvReader.Read())
    {
        var parser = csvReader.Parser;
        var rawRecord = parser.RawRecord;
        var splitData = rawRecord.Split(',');
    }
}
It seems that you are using this library: CsvHelper.
Your file data is quoted, meaning that each field is enclosed in double quotes, for the exact reason that some fields can contain the field separator (the comma). In this case you should tell the library what character is used to quote your fields, so it can ignore the field separator when it appears inside a pair of quotes.
This library offers the Configuration class that you already initialize.
You just need to add this to the constructor:
var config = new CsvConfiguration()
{
    HasHeaderRecord = false,
    Quote = '"'
};
See the properties list at CsvHelper.Configuration
Don't split manually; use the library functions instead:
// By header name
var field = csv["HeaderName"];
// Gets field by position, returning string
var field = csv.GetField(0);
// Gets field by position, returning int
var field = csv.GetField<int>(0);
// Gets field by header name, returning bool
var field = csv.GetField<bool>("IsTrue");
// Gets field by header name, returning object
var field = csv.GetField(typeof(bool), "IsTrue");
Refer to the reading samples.
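To illustrate why the quote character matters, here is a minimal quote-aware splitter (an illustration of the principle, not CsvHelper's internals): when quoting is honored, "data2, data2_1" stays one field.

```csharp
using System.Collections.Generic;
using System.Text;

static class QuoteAwareSplit
{
    public static List<string> Split(string line, char delimiter = ',', char quote = '"')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (var ch in line)
        {
            if (ch == quote)
                inQuotes = !inQuotes;                  // toggle quoted state, drop the quote
            else if (ch == delimiter && !inQuotes)
            {
                fields.Add(current.ToString().Trim()); // delimiter outside quotes ends a field
                current.Clear();
            }
            else
                current.Append(ch);
        }
        fields.Add(current.ToString().Trim());         // last field has no trailing delimiter
        return fields;
    }
}
```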

Speedily Read and Parse Data

As of now, I am using this code to open a file, read it into a list, and parse that list into a string[]:
string CP4DataBase =
"C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
CP4DataBaseRTB.LoadFile(CP4DataBase, RichTextBoxStreamType.PlainText);
string[] splitCP4DataBaseLines = CP4DataBaseRTB.Text.Split('\n');
List<string> tempCP4List = new List<string>();
string[] line1CP4Components;
foreach (var line in splitCP4DataBaseLines)
tempCP4List.Add(line + Environment.NewLine);
string concattedUnitPart = "";
foreach (var line in tempCP4List)
{
concattedUnitPart = concattedUnitPart + line;
line1CP4PartLines++;
}
line1CP4Components = new Regex("\"UNIT\",\"PARTS\"", RegexOptions.Multiline)
.Split(concattedUnitPart)
.Where(c => !string.IsNullOrEmpty(c)).ToArray();
I am wondering if there is a quicker way to do this. This is just one of the files I am opening, so this process is repeated a minimum of 5 times to open and properly load the lists.
The minimum file size being imported right now is 257 KB. The largest file is 1,803 KB. These files will only get larger as time goes on as they are being used to simulate a database and the user will continually add to them.
So my question is, is there a quicker way to do all of the above code?
EDIT:
***CP4***
"UNIT","PARTS"
"BLOCK","HEADER-"
"NAME","106536"
"REVISION","0000"
"DATE","11/09/03"
"TIME","11:10:11"
"PMABAR",""
"COMMENT",""
"PTPNAME","R160805"
"CMPNAME","R160805"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",180
"BLOCK","PRTFORM-"
"PTPSZBX",1.60
"PTPSZBY",0.80
"PTPMNH",0.25
"NeedGlue",0
"BLOCK","TOLEINF-"
"PTPTLBX",0.50
"PTPTLBY",0.40
"PTPTLCL",10
"PTPTLPX",0.30
"PTPTLPY",0.30
"PTPTLPQ",30
"BLOCK","ELDT+" "PGDELSN","PGDELX","PGDELY","PGDELPP","PGDELQ","PGDELP","PGDELW","PGDELL","PGDELWT","PGDELLT","PGDELCT","PGDELR"
0,0.000,0.000,0,0,0.000,0.000,0.000,0.000,0.000,0.000,0
"BLOCK","VISION-"
"PTPVIPL",0
"PTPVILCA",0
"PTPVILB",0
"PTPVICVT",10
"PENVILIT",0
"BLOCK","ENVDT"
"ELEMENT","CP43ENVDT-"
"PENNMI",1.0
"PENNMA",1.0
"PENNZN",""
"PENNZT",1.0
"PENBLM",12
"PENCRTS",0
"PENSPD1",100
"PTPCRDCT",0
"PENVICT",1
"PCCCRFT",1
"BLOCK","CARRING-"
"PTPCRAPO",0
"PTPCRPCK",0
"PTPCRPUX",0.00
"PTPCRPUY",0.00
"PTPCRRCV",0
"BLOCK","PACKCLS-"
"FDRTYPE","Emboss"
"TAPEWIDTH","8mm"
"FEEDPITCH",4
"REELDIAMETER",0
"TAPEDEPTH",0.0
"DOADVVACUUM",0
"CHKBEFOREFEED",0
"TAPEARMLENGTH",0
"PPCFDPP",0
"PPCFDEC",4
"PPCMNPT",30
"UNIT","PARTS"
"BLOCK","HEADER-"
"NAME","106653"
"REVISION","0000"
"DATE","11/09/03"
"TIME","11:10:42"
"PMABAR",""
"COMMENT",""
"PTPNAME","0603R"
"CMPNAME","0603R"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",18
"BLOCK","PRTFORM-"
"PTPSZBX",1.60
"PTPSZBY",0.80
"PTPMNH",0.23
"NeedGlue",0
"BLOCK","TOLEINF-"
"PTPTLBX",0.50
"PTPTLBY",0.34
"PTPTLCL",0
"PTPTLPX",0.60
"PTPTLPY",0.40
"PTPTLPQ",30
"BLOCK","ELDT+" "PGDELSN","PGDELX","PGDELY","PGDELPP","PGDELQ","PGDELP","PGDELW","PGDELL","PGDELWT","PGDELLT","PGDELCT","PGDELR"
0,0.000,0.000,0,0,0.000,0.000,0.000,0.000,0.000,0.000,0
"BLOCK","VISION-"
"PTPVIPL",0
"PTPVILCA",0
"PTPVILB",0
"PTPVICVT",10
"PENVILIT",0
"BLOCK","ENVDT"
"ELEMENT","CP43ENVDT-"
"PENNMI",1.0
"PENNMA",1.0
"PENNZN",""
"PENNZT",1.0
"PENBLM",12
"PENCRTS",0
"PENSPD1",80
"PTPCRDCT",0
"PENVICT",1
"PCCCRFT",1
"BLOCK","CARRING-"
"PTPCRAPO",0
"PTPCRPCK",0
"PTPCRPUX",0.00
"PTPCRPUY",0.00
"PTPCRRCV",0
"BLOCK","PACKCLS-"
"FDRTYPE","Emboss"
"TAPEWIDTH","8mm"
"FEEDPITCH",4
"REELDIAMETER",0
"TAPEDEPTH",0.0
"DOADVVACUUM",0
"CHKBEFOREFEED",0
"TAPEARMLENGTH",0
"PPCFDPP",0
"PPCFDEC",4
"PPCMNPT",30
... the file goes on and on and on.. and will only get larger.
The regex places each "UNIT","PARTS" marker and the code that follows it, up to the NEXT "UNIT","PARTS" marker, into a string[].
After this, I check each string[] to see if its "NAME" section exists in a different list. If it does, I output that "UNIT","PARTS" block at the end of a text file.
This bit is a potential performance killer:
string concattedUnitPart = "";
foreach (var line in tempCP4List)
{
concattedUnitPart = concattedUnitPart + line;
line1CP4PartLines++;
}
(See this article for why.) Use a StringBuilder for repeated concatenation:
// No need to use tempCP4List at all
StringBuilder builder = new StringBuilder();
foreach (var line in splitCP4DataBaseLines)
{
    builder.AppendLine(line);
    line1CP4PartLines++;
}
Or even just:
string concattedUnitPart = string.Join(Environment.NewLine, splitCP4DataBaseLines);
Now the regex part may well also be slow - I'm not sure. It's not obvious what you're trying to achieve, whether you need regular expressions at all, or whether you really need to do the whole thing in one go. Can you definitely not just process it line by line?
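A line-by-line sketch of that idea (my assumption about the file layout, based on the sample above: each record starts with a "UNIT","PARTS" marker line):

```csharp
using System.Collections.Generic;

static class UnitSplitter
{
    // Split the file's lines into blocks, one per "UNIT","PARTS" record,
    // without ever concatenating the whole file into a single string.
    public static List<List<string>> SplitUnits(IEnumerable<string> lines)
    {
        var units = new List<List<string>>();
        List<string> current = null;

        foreach (var line in lines)
        {
            if (line.StartsWith("\"UNIT\",\"PARTS\""))
            {
                current = new List<string>(); // marker starts a new block
                units.Add(current);
            }
            else if (current != null && line.Trim().Length > 0)
            {
                current.Add(line);            // body line of the current block
            }
        }
        return units;
    }
}
```

Combined with File.ReadLines, this streams the file instead of loading it all through the RichTextBox.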
You could achieve the same output list 'line1CP4Components' using the following:
Regex StripEmptyLines = new Regex(@"^\s*$", RegexOptions.Multiline);
Regex UnitPartsMatch = new Regex(@"(?<=\n)""UNIT"",""PARTS"".*?(?=(?:\n""UNIT"",""PARTS"")|$)", RegexOptions.Singleline);
string CP4DataBase =
    "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
CP4DataBaseRTB.LoadFile(CP4DataBase, RichTextBoxStreamType.PlainText);
List<string> line1CP4Components = new List<string>(
    UnitPartsMatch.Matches(StripEmptyLines.Replace(CP4DataBaseRTB.Text, ""))
        .OfType<Match>()
        .Select(m => m.Value)
);
return line1CP4Components.ToArray();
You may be able to skip StripEmptyLines, but your original code is doing this via the Where(c => !string.IsNullOrEmpty(c)). Also, your original code causes the '\r' part of the "\r\n" newline pair to be duplicated. I assumed this was an accident and not intentional?
Also, you don't seem to be using the value in 'line1CP4PartLines', so I omitted its creation. It was seemingly inconsistent with the omission of empty lines later, so I guess you're not depending on it. If you need this value, a simple regex can tell you how many lines are in the string:
int linecount = new Regex("^", RegexOptions.Multiline).Matches(CP4DataBaseRTB.Text).Count;
// example of what your code will look like
string CP4DataBase = "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
List<string> Cp4DataList = new List<string>(File.ReadAllLines(CP4DataBase));
// or create a Dictionary<int, string[]> object
string strData = string.Empty; // holds the line item data, read in line by line
string[] strStockListRecord = null; // string array that holds information from the TFE_Stock.txt file
Dictionary<int, string[]> dctStockListRecords = null; // dictionary that will hold the KeyValuePairs of text file contents
List<string> lstStockListRecord = null; // generic list that will store all the lines from the .prn file being processed
if (File.Exists(strExtraLoadFileLoc + strFileName))
{
    try
    {
        lstStockListRecord = new List<string>();
        List<string> lstStrLinesStockRecord = new List<string>(File.ReadAllLines(strExtraLoadFileLoc + strFileName));
        dctStockListRecords = new Dictionary<int, string[]>(lstStrLinesStockRecord.Count());
        int intLineCount = 0;
        foreach (string strLineSplit in lstStrLinesStockRecord)
        {
            lstStockListRecord.Add(strLineSplit);
            dctStockListRecords.Add(intLineCount, lstStockListRecord.ToArray());
            lstStockListRecord.Clear();
            intLineCount++;
        } // foreach (string strLineSplit in lstStrLinesStockRecord)
        lstStrLinesStockRecord.Clear();
        lstStrLinesStockRecord = null;
        lstStockListRecord.Clear();
        lstStockListRecord = null;
    }
    catch (Exception)
    {
        // handle or rethrow as appropriate
    }
}
// Alter the code to fit what you are doing..
