Group list of strings by subparts of item - c#

Sorry for the ambiguous title...
I find my issue difficult to explain - let me know if you need more information.
I've got a list of strings that I'd like to group by a leading part of each item.
That leading part is itself also an item in the list.
This is the complete list; it's not static and will contain different values:
CookieMaker_TransportSettingsManual
CookieMaker_TransportSettingsParameters
Cookie_WrapperSettings
Cookie_WrapperSettingsManual
Cookie_WrapperSettingsParameters
Cookie_ProfileBendSettings
Cookie_ProfileBendSettingsParameters
Cookie_HopperSettings
Cookie_HopperSettingsManual
Cookie_HopperSettingsParameters
Cookie_CutterSettings
Cookie_CutterSettingsManual
Cookie_CutterSettingsParameters
General_SpeedSetting
General_SpeedSettingManual
General_SpeedSettingSettings
General_CalibrationSettings
General_CalibrationSettingsCalibration
Bonbon_Vertical
Bonbon_VerticalAligner
Bonbon_VerticalHopper
Bonbon_VerticalManual
Bonbon_VerticalTransporter
Bonbon_Horizontal
Bonbon_HorizontalHopper
Bonbon_HorizontalManual
Bonbon_HorizontalCookie
Bonbon_HorizontalTransporter
Bonbon_Bonbon
Bonbon_BonbonExhaust
Bonbon_BonbonManual
Bonbon_BonbonSection1
Bonbon_BonbonSection2
Bonbon_BonbonSection3
Bonbon_Compensator
Bonbon_CompensatorCarriage
Bonbon_CompensatorHopper
Bonbon_CompensatorManual
Bonbon_CollectingUnit
Bonbon_CollectingUnitManual
Bonbon_CollectingUnitTransporter
Bonbon_CollectingUnitTubeMaker
CookieMaker_TransportSettings
CookieMaker_TransportSettingsBonbon
CookieMaker_TransportSettingsPandora
The expected result would be groups like so:
General_SpeedSetting
==> General_SpeedSettingManual
==> General_SpeedSettingSettings
Cookie_WrapperSettings
==> Cookie_WrapperSettingsManual
==> Cookie_WrapperSettingsParameters
The resulting datatype does not matter, and I don't mind LINQ.
Code / fiddle to get up and running quickly:
using System;
public class Program
{
public static void Main()
{
var inputString = "CookieMaker_TransportSettingsManual|CookieMaker_TransportSettingsParameters|Cookie_WrapperSettings|Cookie_WrapperSettingsManual|Cookie_WrapperSettingsParameters|Cookie_ProfileBendSettings|Cookie_ProfileBendSettingsParameters|Cookie_HopperSettings|Cookie_HopperSettingsManual|Cookie_HopperSettingsParameters|Cookie_CutterSettings|Cookie_CutterSettingsManual|Cookie_CutterSettingsParameters|General_SpeedSetting|General_SpeedSettingManual|General_SpeedSettingSettings|General_CalibrationSettings|General_CalibrationSettingsCalibration|Bonbon_Vertical|Bonbon_VerticalAligner|Bonbon_VerticalHopper|Bonbon_VerticalManual|Bonbon_VerticalTransporter|Bonbon_Horizontal|Bonbon_HorizontalHopper|Bonbon_HorizontalManual|Bonbon_HorizontalCookie|Bonbon_HorizontalTransporter|Bonbon_Bonbon|Bonbon_BonbonExhaust|Bonbon_BonbonManual|Bonbon_BonbonSection1|Bonbon_BonbonSection2|Bonbon_BonbonSection3|Bonbon_Compensator|Bonbon_CompensatorCarriage|Bonbon_CompensatorHopper|Bonbon_CompensatorManual|Bonbon_CollectingUnit|Bonbon_CollectingUnitManual|Bonbon_CollectingUnitTransporter|Bonbon_CollectingUnitTubeMaker|CookieMaker_TransportSettings|CookieMaker_TransportSettingsBonbon|CookieMaker_TransportSettingsPandora";
var inputList = inputString.Split('|');
var result = inputList; // Code here ;)
foreach (var r in result)
    Console.WriteLine(r);
}
}
https://dotnetfiddle.net/neCUEL
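For illustration, a minimal LINQ sketch that slots into the // Code here spot above (not from the answers below; it assumes every group title itself appears as an entry in the list, which holds for the sample data):
using System;
using System.Linq;
public class Program
{
    public static void Main()
    {
        var inputList = "General_SpeedSetting|General_SpeedSettingManual|General_SpeedSettingSettings|Cookie_WrapperSettings|Cookie_WrapperSettingsManual".Split('|');
        // Entries that are a proper prefix of some other entry become group titles.
        var titles = inputList.Where(t => inputList.Any(s => s != t && s.StartsWith(t))).ToList();
        // Group each remaining entry under its longest matching title;
        // entries with no matching title become singleton groups.
        var groups = inputList
            .Where(s => !titles.Contains(s))
            .GroupBy(s => titles.Where(t => s.StartsWith(t))
                                .OrderByDescending(t => t.Length)
                                .FirstOrDefault() ?? s);
        foreach (var g in groups)
        {
            Console.WriteLine(g.Key);
            foreach (var item in g)
                Console.WriteLine($"  ==> {item}");
        }
    }
}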

What about something like this?
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
static List<string> myList = new List<string>(){
"CookieMaker_TransportSettingsManual",
"CookieMaker_TransportSettingsParameters",
"Cookie_WrapperSettings",
"Cookie_WrapperSettingsManual",
"Cookie_WrapperSettingsParameters",
"Cookie_ProfileBendSettings",
"Cookie_ProfileBendSettingsParameters",
"Cookie_HopperSettings",
"Cookie_HopperSettingsManual",
"Cookie_HopperSettingsParameters",
"Cookie_CutterSettings",
"Cookie_CutterSettingsManual",
"Cookie_CutterSettingsParameters",
"General_SpeedSetting",
"General_SpeedSettingManual",
"General_SpeedSettingSettings",
"General_CalibrationSettings",
"General_CalibrationSettingsCalibration",
"Bonbon_Vertical",
"Bonbon_VerticalAligner",
"Bonbon_VerticalHopper",
"Bonbon_VerticalManual",
"Bonbon_VerticalTransporter",
"Bonbon_Horizontal",
"Bonbon_HorizontalHopper",
"Bonbon_HorizontalManual",
"Bonbon_HorizontalCookie",
"Bonbon_HorizontalTransporter",
"Bonbon_Bonbon",
"Bonbon_BonbonExhaust",
"Bonbon_BonbonManual",
"Bonbon_BonbonSection1",
"Bonbon_BonbonSection2",
"Bonbon_BonbonSection3",
"Bonbon_Compensator",
"Bonbon_CompensatorCarriage",
"Bonbon_CompensatorHopper",
"Bonbon_CompensatorManual",
"Bonbon_CollectingUnit",
"Bonbon_CollectingUnitManual",
"Bonbon_CollectingUnitTransporter",
"Bonbon_CollectingUnitTubeMaker",
"CookieMaker_TransportSettings",
"CookieMaker_TransportSettingsBonbon",
"CookieMaker_TransportSettingsPandora"
};
static Dictionary<string, List<string>> results = new Dictionary<string, List<string>>();
//-------------------------------------------------------------------------//
public static void Main()
{
var orderedList = myList.OrderBy(i=>i).ToList();
int i = 0;
while(i < orderedList.Count){
var prefix = orderedList[i];
results[prefix] = new List<string>();
if(++i >= orderedList.Count) break;
while(orderedList[i].StartsWith(prefix)){
results[prefix].Add(orderedList[i]);
i++;
if(i >= orderedList.Count) {
Print();
return;
}
}//while
}//while
Print();
}//main
//-------------------------------------------------------------------------//
private static void Print(){
foreach (string prefix in results.Keys)
{
Console.WriteLine($"Prefix =>{prefix} - {results[prefix].Count}");
foreach (string result in results[prefix])
{
Console.WriteLine($" ======>{result}");
}//foreach;
}//foreach
}//Print
}//Cls
Fiddle:
https://dotnetfiddle.net/GTI4vV

I'm surprised you accepted a solution that pre-sorts the items - when I tried that, the Bonbon sections got terribly messed up.
My solution is a bit hacky: getting this to work the way I think you want took a lot of special cases (and fixing off-by-one issues).
The code takes care of this kind of pattern:
CookieMaker_TransportSettingsManual
CookieMaker_TransportSettingsParameters
extracting CookieMaker_TransportSettings and putting both entries under it. It also copes with the fact that you have CookieMaker_TransportSettings at the beginning and the end of the file.
It also handles this:
Bonbon_BonbonSection1
Bonbon_BonbonSection2
Bonbon_BonbonSection3
Figuring that you want the three of those to be part of the Bonbon_Bonbon section and not a new Bonbon_BonbonSection section with three entries (1, 2 and 3).
It also deals with all the Cookie** and Bonbon** sections.
Here's the main code:
//get all the strings from somewhere
var inputStrings = File.ReadAllLines("DataFile.txt");
string lastTitle = null;
var results = new Dictionary<string, List<string>>();
string veryLastItem = string.Empty;
var currentItems = new List<string>();
for (var i = 0; i < inputStrings.Length - 1; ++i)
{
var commonPrefix = FindLongestCommonPrefix(inputStrings[i], inputStrings[i + 1]);
if (string.IsNullOrEmpty(commonPrefix) || (!string.IsNullOrEmpty(lastTitle) && commonPrefix != lastTitle))
{
if (string.IsNullOrEmpty(lastTitle))
{
throw new Exception("This isn't going to work - you need to have at least two common things in a row");
}
if (inputStrings[i].StartsWith(lastTitle) && inputStrings[i] != lastTitle)
{
currentItems.Add(inputStrings[i]);
}
AddResultsToDictionary(results, lastTitle, currentItems);
currentItems = new List<string>();
}
if (commonPrefix != inputStrings[i] &&
((commonPrefix == lastTitle && commonPrefix != inputStrings[i]) ||
(!string.IsNullOrEmpty(commonPrefix) && inputStrings[i].StartsWith(commonPrefix))))
{
currentItems.Add(inputStrings[i]);
}
lastTitle = commonPrefix;
veryLastItem = inputStrings[i + 1];
}
//ok, we're out of the loop:
//add the last item to the current list
currentItems.Add(veryLastItem);
//and add the last set of items to the dictionary
if (lastTitle != null)
{
AddResultsToDictionary(results, lastTitle, currentItems);
}
foreach (var result in results)
{
Debug.WriteLine(result.Key);
foreach (var item in result.Value)
{
Debug.WriteLine($" ==> {item}");
}
}
void AddResultsToDictionary(Dictionary<string, List<string>> dictionary, string s, List<string> list)
{
if (dictionary.TryGetValue(s, out var existingList))
{
existingList.AddRange(list);
}
else
{
dictionary.Add(s, list);
}
}
And it calls this function to determine the section headings:
private string FindLongestCommonPrefix(string s1, string s2)
{
var minLen = Math.Min(s1.Length, s2.Length);
for (var i = 0; i < minLen; ++i)
{
if (s1[i] != s2[i])
{
if (i == 0)
{
return string.Empty;
}
else
{
//if the common part is not all of s1, back up to the last position where
// a lowercase letter is followed by an uppercase letter, so that the
// prefix ends on a word boundary
if (i == s1.Length)
{
return s1;
}
if (s1[i] == '_' || s1[i - 1] == '_' || s2[i] == '_' || s2[i - 1] == '_')
{
return string.Empty;
}
for (var j = i; j > 0; --j)
{
if (char.IsLower(s1[j-1]) && (char.IsUpper(s1[j]) /*|| s1[j] == '_'*/))
{
return s1.Substring(0, j);
}
}
//I shouldn't get here, but, if I do
return string.Empty;
}
}
}
//otherwise
return s1.Substring(0, minLen);
}
The result ends up looking like:
CookieMaker_TransportSettings
==> CookieMaker_TransportSettingsManual
==> CookieMaker_TransportSettingsParameters
==> CookieMaker_TransportSettingsBonbon
==> CookieMaker_TransportSettingsPandora
Cookie_WrapperSettings
==> Cookie_WrapperSettingsManual
==> Cookie_WrapperSettingsParameters
Cookie_ProfileBendSettings
==> Cookie_ProfileBendSettingsParameters
Cookie_HopperSettings
==> Cookie_HopperSettingsManual
==> Cookie_HopperSettingsParameters
Cookie_CutterSettings
==> Cookie_CutterSettingsManual
==> Cookie_CutterSettingsParameters
General_SpeedSetting
==> General_SpeedSettingManual
==> General_SpeedSettingSettings
General_CalibrationSettings
==> General_CalibrationSettingsCalibration
Bonbon_Vertical
==> Bonbon_VerticalAligner
==> Bonbon_VerticalHopper
==> Bonbon_VerticalManual
==> Bonbon_VerticalTransporter
Bonbon_Horizontal
==> Bonbon_HorizontalHopper
==> Bonbon_HorizontalManual
==> Bonbon_HorizontalCookie
==> Bonbon_HorizontalTransporter
Bonbon_Bonbon
==> Bonbon_BonbonExhaust
==> Bonbon_BonbonManual
==> Bonbon_BonbonSection1
==> Bonbon_BonbonSection2
==> Bonbon_BonbonSection3
Bonbon_Compensator
==> Bonbon_CompensatorCarriage
==> Bonbon_CompensatorHopper
==> Bonbon_CompensatorManual
Bonbon_CollectingUnit
==> Bonbon_CollectingUnitManual
==> Bonbon_CollectingUnitTransporter
==> Bonbon_CollectingUnitTubeMaker

Try the following:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"c:\temp\test.txt";
static void Main(string[] args)
{
List<string> lines = File.ReadLines(FILENAME).ToList();
lines = lines.OrderBy(x => x).ToList();
List<Group> groups = new List<Group>();
Group group = new Group();
groups.Add(group);
group.basename = lines[0].Trim();
for (int i = 1; i < lines.Count; i++)
{
string line = lines[i].Trim();
if (!line.StartsWith(group.basename))
{
group = new Group();
groups.Add(group);
group.basename = line;
}
else
{
if(group.values == null) group.values = new List<string>();
group.values.Add(line.Substring(group.basename.Length));
}
}
}
}
public class Group
{
public string basename { get; set; }
public List<string> values { get; set; }
}
}

Related

Compare text files in C# and remove duplicate lines

1.txt:
Origination,destination,datetime,price
YYZ,YTC,2016-04-01 12:30,$550
YYZ,YTC,2016-04-01 12:30,$550
LKC,LKP,2016-04-01 12:30,$550
2.txt:
Origination|destination|datetime|price
YYZ|YTC|2016-04-01 12:30|$550
AMV|YRk|2016-06-01 12:30|$630
LKC|LKP|2016-12-01 12:30|$990
I have two text files with ',' and '|' as separators, and I want to create a console app in C# which reads these two files when I pass an origination and destination location from the command prompt.
While searching, I want to ignore duplicate lines, and I want to display the results ordered by price.
The output should be {origination} -> {destination} -> datetime -> price.
I need help with how to do this.
Here's a simple solution that works for your example files. It doesn't include any error checking for badly formatted files.
using System;
using System.Collections.Generic;
class Program
{
class entry
{
public string origin;
public string destination;
public DateTime time;
public double price;
}
static void Main(string[] args)
{
List<entry> data = new List<entry>();
//parse the input files and add the data to a list
ParseFile(data, args[0], ',');
ParseFile(data, args[1], '|');
//sort the list (by price first)
data.Sort((a, b) =>
{
if (a.price != b.price)
return a.price > b.price ? 1 : -1;
else if (a.origin != b.origin)
return string.Compare(a.origin, b.origin);
else if (a.destination != b.destination)
return string.Compare(a.destination, b.destination);
else
return DateTime.Compare(a.time, b.time);
});
//remove duplicates (list must be sorted for this to work)
int i = 1;
while (i < data.Count)
{
if (data[i].origin == data[i - 1].origin
&& data[i].destination == data[i - 1].destination
&& data[i].time == data[i - 1].time
&& data[i].price == data[i - 1].price)
data.RemoveAt(i);
else
i++;
}
//print the results
for (i = 0; i < data.Count; i++)
Console.WriteLine("{0}->{1}->{2:yyyy-MM-dd HH:mm}->${3}",
data[i].origin, data[i].destination, data[i].time, data[i].price);
Console.ReadLine();
}
private static void ParseFile(List<entry> data, string filename, char separator)
{
using (System.IO.FileStream fs = System.IO.File.Open(filename, System.IO.FileMode.Open))
using (System.IO.StreamReader reader = new System.IO.StreamReader(fs))
while (!reader.EndOfStream)
{
string[] line = reader.ReadLine().Split(separator);
if (line.Length == 4)
{
entry newitem = new entry();
newitem.origin = line[0];
newitem.destination = line[1];
newitem.time = DateTime.Parse(line[2]);
newitem.price = double.Parse(line[3].Substring(line[3].IndexOf('$') + 1));
data.Add(newitem);
}
}
}
}
I'm not 100% clear on what the output of your program is supposed to be, so I'll leave that part of the implementation up to you. My strategy was to use a constructor that takes a string (read from a file) and a delimiter (since it varies), and to use it to create objects you can manipulate (e.g. add to hash sets, etc.).
PriceObject.cs
using System;
using System.Globalization;
namespace ConsoleApplication1
{
class PriceObject
{
public string origination { get; set; }
public string destination { get; set; }
public DateTime time { get; set; }
public decimal price { get; set; }
public PriceObject(string inputLine, char delimiter)
{
string[] parsed = inputLine.Split(new char[] { delimiter }, 4);
origination = parsed[0];
destination = parsed[1];
time = DateTime.ParseExact(parsed[2], "yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture);
price = Decimal.Parse(parsed[3], NumberStyles.Currency, new CultureInfo("en-US"));
}
public override bool Equals(object obj)
{
var item = obj as PriceObject;
return origination.Equals(item.origination) &&
destination.Equals(item.destination) &&
time.Equals(item.time) &&
price.Equals(item.price);
}
public override int GetHashCode()
{
unchecked
{
var result = 17;
result = (result * 23) + origination.GetHashCode();
result = (result * 23) + destination.GetHashCode();
result = (result * 23) + time.GetHashCode();
result = (result * 23) + price.GetHashCode();
return result;
}
}
}
}
Program.cs
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
HashSet<PriceObject> list1 = new HashSet<PriceObject>();
HashSet<PriceObject> list2 = new HashSet<PriceObject>();
using (StreamReader reader = File.OpenText(args[0]))
{
string line = reader.ReadLine(); // this will remove the header row
while (!reader.EndOfStream)
{
line = reader.ReadLine();
if (String.IsNullOrEmpty(line))
continue;
// add each line to our list
list1.Add(new PriceObject(line, ','));
}
}
using (StreamReader reader = File.OpenText(args[1]))
{
string line = reader.ReadLine(); // this will remove the header row
while (!reader.EndOfStream)
{
line = reader.ReadLine();
if (String.IsNullOrEmpty(line))
continue;
// add each line to our list
list2.Add(new PriceObject(line, '|'));
}
}
// merge the two hash sets, then order by price
list1.UnionWith(list2);
List<PriceObject> output = list1.OrderByDescending(x => x.price).ToList();
// display output here, e.g. define your own ToString method, etc
foreach (var item in output)
{
Console.WriteLine(item.ToString());
}
Console.ReadLine();
}
}
}

Parsing CSV File with double quotes [duplicate]

Is there a default/official/recommended way to parse CSV files in C#? I don't want to roll my own parser.
Also, I've seen instances of people using ODBC/OLE DB to read CSV via the Text driver, and a lot of people discourage this due to its "drawbacks." What are these drawbacks?
Ideally, I'm looking for a way through which I can read the CSV by column name, using the first record as the header / field names. Some of the answers given are correct but work to basically deserialize the file into classes.
A CSV parser is now part of the .NET Framework.
Add a reference to Microsoft.VisualBasic.dll (it works fine in C#; don't mind the name):
using (TextFieldParser parser = new TextFieldParser(@"c:\temp\test.csv"))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
//Process row
string[] fields = parser.ReadFields();
foreach (string field in fields)
{
//TODO: Process field
}
}
}
The docs are here - TextFieldParser Class
P.S. If you need a CSV exporter, try CsvExport (discl: I'm one of the contributors)
CsvHelper (a library I maintain) will read a CSV file into custom objects.
using (var reader = new StreamReader("path\\to\\file.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
var records = csv.GetRecords<Foo>();
}
Sometimes you don't own the objects you're trying to read into. In this case, you can use fluent mapping because you can't put attributes on the class.
public sealed class MyCustomObjectMap : CsvClassMap<MyCustomObject>
{
public MyCustomObjectMap()
{
Map( m => m.Property1 ).Name( "Column Name" );
Map( m => m.Property2 ).Index( 4 );
Map( m => m.Property3 ).Ignore();
Map( m => m.Property4 ).TypeConverter<MySpecialTypeConverter>();
}
}
Let a library handle all the nitty-gritty details for you! :-)
Check out FileHelpers and stay DRY - Don't Repeat Yourself - no need to re-invent the wheel for the gazillionth time...
You basically just need to define the shape of your data - the fields in each line of the CSV - by means of a public class (with well-thought-out attributes like default values, replacements for NULL values and so forth), point the FileHelpers engine at a file, and bingo - you get back all the entries from that file. One simple operation - great performance!
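A minimal sketch of that pattern (the attribute and engine names follow the FileHelpers documentation; the record layout and file name here are invented for illustration):
using System;
using FileHelpers;
[DelimitedRecord(",")]
public class CustomerRecord
{
    // FileHelpers maps delimited fields onto public fields, in order.
    public string Name;
    public string City;
    [FieldConverter(ConverterKind.Date, "yyyy-MM-dd")]
    public DateTime Joined;
}
public class Program
{
    public static void Main()
    {
        // Point the engine at a file and get strongly-typed records back.
        var engine = new FileHelperEngine<CustomerRecord>();
        CustomerRecord[] records = engine.ReadFile("customers.csv");
        foreach (var r in records)
            Console.WriteLine($"{r.Name} ({r.City}) joined {r.Joined:d}");
    }
}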
In a business application, I use the open source CSVReader project from codeproject.com.
It works well and has good performance; there is some benchmarking at the link I provided.
A simple example, copied from the project page:
using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
{
int fieldCount = csv.FieldCount;
string[] headers = csv.GetFieldHeaders();
while (csv.ReadNextRecord())
{
for (int i = 0; i < fieldCount; i++)
Console.Write(string.Format("{0} = {1};", headers[i], csv[i]));
Console.WriteLine();
}
}
As you can see, it's very easy to work with.
I know it's a bit late, but I just found the Microsoft.VisualBasic.FileIO library, which has a TextFieldParser class for processing CSV files.
Here is a helper class I use often, in case anyone ever comes back to this thread (I wanted to share it).
I use it for the simplicity of porting it into projects, ready to use:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
public class CSVHelper : List<string[]>
{
protected string csv = string.Empty;
protected string separator = ",";
public CSVHelper(string csv, string separator = "\",\"")
{
this.csv = csv;
this.separator = separator;
foreach (string line in Regex.Split(csv, System.Environment.NewLine).ToList().Where(s => !string.IsNullOrEmpty(s)))
{
string[] values = Regex.Split(line, separator);
for (int i = 0; i < values.Length; i++)
{
//Trim values
values[i] = values[i].Trim('\"');
}
this.Add(values);
}
}
}
And use it like:
public List<Person> GetPeople(string csvContent)
{
List<Person> people = new List<Person>();
CSVHelper csv = new CSVHelper(csvContent);
foreach(string[] line in csv)
{
Person person = new Person();
person.Name = line[0];
person.TelephoneNo = line[1];
people.Add(person);
}
return people;
}
[Updated csv helper: bug fixed where the last new line character created a new line]
If you only need to read CSV files, I recommend this library: A Fast CSV Reader.
If you also need to generate CSV files, use this one: FileHelpers.
Both of them are free and open source.
This solution uses the official Microsoft.VisualBasic assembly to parse CSV.
Advantages:
delimiter escaping
optional header skipping
trims spaces
ignores comments
Code:
using Microsoft.VisualBasic.FileIO;
public static List<List<string>> ParseCSV (string csv)
{
List<List<string>> result = new List<List<string>>();
// To use the TextFieldParser a reference to the Microsoft.VisualBasic assembly has to be added to the project.
using (TextFieldParser parser = new TextFieldParser(new StringReader(csv)))
{
parser.CommentTokens = new string[] { "#" };
parser.SetDelimiters(new string[] { ";" });
parser.HasFieldsEnclosedInQuotes = true;
// Skip over header line.
//parser.ReadLine();
while (!parser.EndOfData)
{
var values = new List<string>();
var readFields = parser.ReadFields();
if (readFields != null)
values.AddRange(readFields);
result.Add(values);
}
}
return result;
}
I have written TinyCsvParser for .NET, which is one of the fastest .NET parsers around and highly configurable to parse almost any CSV format.
It is released under MIT License:
https://github.com/bytefish/TinyCsvParser
You can use NuGet to install it. Run the following command in the Package Manager Console.
PM> Install-Package TinyCsvParser
Usage
Imagine we have a list of persons in a CSV file persons.csv, with their first name, last name and birthdate.
FirstName;LastName;BirthDate
Philipp;Wagner;1986/05/12
Max;Musterman;2014/01/02
The corresponding domain model in our system might look like this.
private class Person
{
public string FirstName { get; set; }
public string LastName { get; set; }
public DateTime BirthDate { get; set; }
}
When using TinyCsvParser you have to define the mapping between the columns in the CSV data and the properties in your domain model.
private class CsvPersonMapping : CsvMapping<Person>
{
public CsvPersonMapping()
: base()
{
MapProperty(0, x => x.FirstName);
MapProperty(1, x => x.LastName);
MapProperty(2, x => x.BirthDate);
}
}
And then we can use the mapping to parse the CSV data with a CsvParser.
namespace TinyCsvParser.Test
{
[TestFixture]
public class TinyCsvParserTest
{
[Test]
public void TinyCsvTest()
{
CsvParserOptions csvParserOptions = new CsvParserOptions(true, new[] { ';' });
CsvPersonMapping csvMapper = new CsvPersonMapping();
CsvParser<Person> csvParser = new CsvParser<Person>(csvParserOptions, csvMapper);
var result = csvParser
.ReadFromFile(@"persons.csv", Encoding.ASCII)
.ToList();
Assert.AreEqual(2, result.Count);
Assert.IsTrue(result.All(x => x.IsValid));
Assert.AreEqual("Philipp", result[0].Result.FirstName);
Assert.AreEqual("Wagner", result[0].Result.LastName);
Assert.AreEqual(1986, result[0].Result.BirthDate.Year);
Assert.AreEqual(5, result[0].Result.BirthDate.Month);
Assert.AreEqual(12, result[0].Result.BirthDate.Day);
Assert.AreEqual("Max", result[1].Result.FirstName);
Assert.AreEqual("Mustermann", result[1].Result.LastName);
Assert.AreEqual(2014, result[1].Result.BirthDate.Year);
Assert.AreEqual(1, result[1].Result.BirthDate.Month);
Assert.AreEqual(2, result[1].Result.BirthDate.Day);
}
}
}
User Guide
A full User Guide is available at:
http://bytefish.github.io/TinyCsvParser/
Here is a short and simple solution.
using (TextFieldParser parser = new TextFieldParser(outputLocation))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
string[] headers = parser.ReadLine().Split(',');
foreach (string header in headers)
{
dataTable.Columns.Add(header);
}
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
dataTable.Rows.Add(fields);
}
}
Here is my KISS implementation...
using System;
using System.Collections.Generic;
using System.Text;
class CsvParser
{
public static List<string> Parse(string line)
{
const char escapeChar = '"';
const char splitChar = ',';
bool inEscape = false;
bool priorEscape = false;
List<string> result = new List<string>();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < line.Length; i++)
{
char c = line[i];
switch (c)
{
case escapeChar:
if (!inEscape)
inEscape = true;
else
{
if (!priorEscape)
{
if (i + 1 < line.Length && line[i + 1] == escapeChar)
priorEscape = true;
else
inEscape = false;
}
else
{
sb.Append(c);
priorEscape = false;
}
}
break;
case splitChar:
if (inEscape) //if in escape
sb.Append(c);
else
{
result.Add(sb.ToString());
sb.Length = 0;
}
break;
default:
sb.Append(c);
break;
}
}
if (sb.Length > 0)
result.Add(sb.ToString());
return result;
}
}
Some time ago I wrote a simple class for CSV read/write based on the Microsoft.VisualBasic library. Using this simple class you can work with CSV like a two-dimensional array. You can find my class at the following link: https://github.com/ukushu/DataExporter
Simple example of usage:
Csv csv = new Csv("\t");//delimiter symbol
csv.FileOpen("c:\\file1.csv");
var row1Cell6Value = csv.Rows[0][5];
csv.AddRow("asdf","asdffffff","5");
csv.FileSave("c:\\file2.csv");
To read only the header, all you need is to read the csv.Rows[0] cells :)
This code reads csv to DataTable:
public static DataTable ReadCsv(string path)
{
DataTable result = new DataTable("SomeData");
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
bool isFirstRow = true;
//IList<string> headers = new List<string>();
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
if (isFirstRow)
{
foreach (string field in fields)
{
result.Columns.Add(new DataColumn(field, typeof(string)));
}
isFirstRow = false;
}
else
{
int i = 0;
DataRow row = result.NewRow();
foreach (string field in fields)
{
row[i++] = field;
}
result.Rows.Add(row);
}
}
}
return result;
}
A useful single-source-file solution for straightforward parsing needs. It deals with all the nasty edge cases, such as newline normalization and newlines inside quoted string literals. You're welcome!
If your CSV file has a header, you just read the column names (and compute column indexes) from the first row - simple as that. A short sketch of this follows the class below.
Note that Dump is a LINQPad method; you might want to remove it if you are not using LINQPad.
// Uses System, System.Collections.Generic, System.IO, System.Linq and System.Text
// (imported by default in LINQPad; add the using directives in a console app).
void Main()
{
var file1 = "a,b,c\r\nx,y,z";
CSV.ParseText(file1).Dump();
var file2 = "a,\"b\",c\r\nx,\"y,z\"";
CSV.ParseText(file2).Dump();
var file3 = "a,\"b\",c\r\nx,\"y\r\nz\"";
CSV.ParseText(file3).Dump();
var file4 = "\"\"\"\"";
CSV.ParseText(file4).Dump();
}
static class CSV
{
public struct Record
{
public readonly string[] Row;
public string this[int index] => Row[index];
public Record(string[] row)
{
Row = row;
}
}
public static List<Record> ParseText(string text)
{
return Parse(new StringReader(text));
}
public static List<Record> ParseFile(string fn)
{
using (var reader = File.OpenText(fn))
{
return Parse(reader);
}
}
public static List<Record> Parse(TextReader reader)
{
var data = new List<Record>();
var col = new StringBuilder();
var row = new List<string>();
for (; ; )
{
var ln = reader.ReadLine();
if (ln == null) break;
if (Tokenize(ln, col, row))
{
data.Add(new Record(row.ToArray()));
row.Clear();
}
}
return data;
}
public static bool Tokenize(string s, StringBuilder col, List<string> row)
{
int i = 0;
if (col.Length > 0)
{
col.AppendLine(); // continuation
if (!TokenizeQuote(s, ref i, col, row))
{
return false;
}
}
while (i < s.Length)
{
var ch = s[i];
if (ch == ',')
{
row.Add(col.ToString().Trim());
col.Length = 0;
i++;
}
else if (ch == '"')
{
i++;
if (!TokenizeQuote(s, ref i, col, row))
{
return false;
}
}
else
{
col.Append(ch);
i++;
}
}
if (col.Length > 0)
{
row.Add(col.ToString().Trim());
col.Length = 0;
}
return true;
}
public static bool TokenizeQuote(string s, ref int i, StringBuilder col, List<string> row)
{
while (i < s.Length)
{
var ch = s[i];
if (ch == '"')
{
// escape sequence
if (i + 1 < s.Length && s[i + 1] == '"')
{
col.Append('"');
i++;
i++;
continue;
}
i++;
return true;
}
else
{
col.Append(ch);
i++;
}
}
return false;
}
}
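As mentioned above, a minimal sketch of reading fields by column name with the CSV class just shown (the sample data and statements are illustrative and would go inside Main):
var records = CSV.ParseText("name,age\r\nalice,30\r\nbob,25");
// Build a column-name -> index map from the header record.
var header = records[0].Row
    .Select((name, index) => (name, index))
    .ToDictionary(p => p.name, p => p.index);
// Read fields of the remaining records by name.
foreach (var record in records.Skip(1))
    Console.WriteLine($"{record[header["name"]]} is {record[header["age"]]}");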
Another one for this list: Cinchoo ETL - an open source library to read and write multiple file formats (CSV, flat file, Xml, JSON, etc.).
The sample below shows how to read a CSV file quickly (no POCO object required):
string csv = @"Id, Name
1, Carl
2, Tom
3, Mark";
using (var p = ChoCSVReader.LoadText(csv)
.WithFirstLineHeader()
)
{
foreach (var rec in p)
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
}
The sample below shows how to read a CSV file using a POCO object:
public partial class EmployeeRec
{
public int Id { get; set; }
public string Name { get; set; }
}
static void CSVTest()
{
string csv = @"Id, Name
1, Carl
2, Tom
3, Mark";
using (var p = ChoCSVReader<EmployeeRec>.LoadText(csv)
.WithFirstLineHeader()
)
{
foreach (var rec in p)
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
}
}
Please check out the articles at CodeProject on how to use it.
This parser supports nested commas and quotes in a column:
static class CSVParser
{
public static string[] ParseLine(string line)
{
List<string> cols = new List<string>();
string value = null;
for(int i = 0; i < line.Length; i++)
{
switch(line[i])
{
case ',':
cols.Add(value);
value = null;
if(i == line.Length - 1)
{// It ends with comma
cols.Add(null);
}
break;
case '"':
cols.Add(ParseEnclosedColumn(line, ref i));
i++;
break;
default:
value += line[i];
if (i == line.Length - 1)
{// Last character
cols.Add(value);
}
break;
}
}
return cols.ToArray();
}//ParseLine
static string ParseEnclosedColumn(string line, ref int index)
{// Example: "b"",bb"
string value = null;
int numberQuotes = 1;
int index2 = index;
for (int i = index + 1; i < line.Length; i++)
{
index2 = i;
switch (line[i])
{
case '"':
numberQuotes++;
if (numberQuotes % 2 == 0)
{
if (i < line.Length - 1 && line[i + 1] == ',')
{
index = i;
return value;
}
}
else if (i > index + 1 && line[i - 1] == '"')
{
value += '"';
}
break;
default:
value += line[i];
break;
}
}
index = index2;
return value;
}//ParseEnclosedColumn
}//class CSVParser
Based on unlimit's post on How to properly split a CSV using C# split() function? :
string[] tokens = System.Text.RegularExpressions.Regex.Split(paramString, ",");
NOTE: this doesn't handle escaped / nested commas, etc., and therefore is only suitable for certain simple CSV lists.
If anyone wants a snippet they can plop into their code without having to bind a library or download a package, here is a version I wrote:
public static string FormatCSV(List<string> parts)
{
string result = "";
foreach (string s in parts)
{
if (result.Length > 0)
{
result += ",";
if (s.Length == 0)
continue;
}
if (s.Length > 0)
{
result += "\"" + s.Replace("\"", "\"\"") + "\"";
}
else
{
// cannot output double quotes since it's considered an escape for a quote
result += ",";
}
}
return result;
}
enum CSVMode
{
CLOSED = 0,
OPENED_RAW = 1,
OPENED_QUOTE = 2
}
public static List<string> ParseCSV(string input)
{
List<string> results;
CSVMode mode;
char[] letters;
string content;
mode = CSVMode.CLOSED;
content = "";
results = new List<string>();
letters = input.ToCharArray();
for (int i = 0; i < letters.Length; i++)
{
char letter = letters[i];
char nextLetter = '\0';
if (i < letters.Length - 1)
nextLetter = letters[i + 1];
// If it's a quote character
if (letter == '"')
{
// If that next letter is a quote
if (nextLetter == '"' && mode == CSVMode.OPENED_QUOTE)
{
// Then this quote is escaped and should be added to the content
content += letter;
// Skip the escape character
i++;
continue;
}
else
{
// otherwise it's not an escaped quote and is an opening or closing one
// Character is skipped
// If it was open, then close it
if (mode == CSVMode.OPENED_QUOTE)
{
results.Add(content);
// reset the content
content = "";
mode = CSVMode.CLOSED;
// If there is a next letter available
if (nextLetter != '\0')
{
// If it is a comma
if (nextLetter == ',')
{
i++;
continue;
}
else
{
throw new Exception("Expected comma. Found: " + nextLetter);
}
}
}
else if (mode == CSVMode.OPENED_RAW)
{
// If it was opened raw, then just add the quote
content += letter;
}
else if (mode == CSVMode.CLOSED)
{
// Otherwise open it as a quote
mode = CSVMode.OPENED_QUOTE;
}
}
}
// If it's a comma separator
else if (letter == ',')
{
// If in quote mode
if (mode == CSVMode.OPENED_QUOTE)
{
// Just read it
content += letter;
}
// If raw, then close the content
else if (mode == CSVMode.OPENED_RAW)
{
results.Add(content);
content = "";
mode = CSVMode.CLOSED;
}
// If it was closed, then open it raw
else if (mode == CSVMode.CLOSED)
{
mode = CSVMode.OPENED_RAW;
results.Add(content);
content = "";
}
}
else
{
// If opened quote, just read it
if (mode == CSVMode.OPENED_QUOTE)
{
content += letter;
}
// If opened raw, then read it
else if (mode == CSVMode.OPENED_RAW)
{
content += letter;
}
// If closed, then open raw
else if (mode == CSVMode.CLOSED)
{
mode = CSVMode.OPENED_RAW;
content += letter;
}
}
}
// If it was still reading when the buffer finished
if (mode != CSVMode.CLOSED)
{
results.Add(content);
}
return results;
}
For smaller CSV inputs, LINQ alone is enough.
For example, for the following CSV file content:
schema_name,description,utype
"IX_HE","High-Energy data","x"
"III_spectro","Spectrosopic data","d"
"VI_misc","Miscellaneous","f"
"vcds1","Catalogs only available in CDS","d"
"J_other","Publications from other journals","b"
when we read the whole content into a single string called data, then:
using System;
using System.IO;
using System.Linq;
var data = File.ReadAllText(Path2CSV);
// helper split characters
var newline = Environment.NewLine.ToCharArray();
var comma = ",".ToCharArray();
var quote = "\"".ToCharArray();
// split input string data to lines
var lines = data.Split(newline, StringSplitOptions.RemoveEmptyEntries);
// first line is header, take the header fields
foreach (var col in lines.First().Split(comma)) {
// do something with "col"
}
// we skip the first line, all the rest are real data lines/fields
foreach (var line in lines.Skip(1)) {
// first we split the data line by the comma character
// next we remove double quotes from each split element using Trim()
// finally we make an array
var fields = line.Split(comma)
.Select(f => f.Trim(quote))
.ToArray();
// do something with the "fields" array
}

How to search for multiple strings and keep counters for them

What I'm trying to do is the following: I have hundreds of log files that I need to search through and do some counting. The basic idea is this: take a .txt file and read every line; if search item 1 is found, increment the counter for search item 1; if search item 2 is found, increment the counter for search item 2; and so on. For example, if the file contained something like...
a b c
d e f
g h i
j k h
And if I specified the searchables to be e & h, the output should say
e : 1
h : 2
The number of search terms is expandable - the user can give either 1 search term or 10 - so I'm not sure how I can implement n counters based on the number of searchables.
The below is what I have so far; it's just a basic approach to see what works and what doesn't. Right now it only keeps the count for one of the search terms. At the moment I am writing the results to the console just to test; ultimately they will be written to a .txt or .xlsx. Any help will be appreciated!
string line;
int Scounter = 0;
int Mcounter = 0;
List<string> searchables = new List<string>();
private void search_Log(string p)
{
searchables.Add("S");
searchables.Add("M");
StreamReader reader = new StreamReader(p);
while ((line = reader.ReadLine()) != null)
{
for (int i = 0; i < searchables.Count(); i++)
{
if (line.Contains(searchables[i]))
{
Scounter++;
}
}
}
reader.Close();
Console.WriteLine("# of S: " + Scounter);
Console.WriteLine("# of M: " + Mcounter);
}
A common approach to this is to use a Dictionary<string, int> to track the values and counts:
// Initialise the dictionary:
Dictionary<string, int> counters = new Dictionary<string, int>();
Then later:
if (line.Contains(searchables[i]))
{
if (counters.ContainsKey(searchables[i]))
{
counters[searchables[i]] ++;
}
else
{
counters.Add(searchables[i], 1);
}
}
Then, when you are finished processing:
// Add in any searches which had no results:
foreach (var searchTerm in searchables)
{
if (counters.ContainsKey(searchTerm) == false)
{
counters.Add(searchTerm, 0);
}
}
foreach (var item in counters)
{
Console.WriteLine("Value {0} occurred {1} times", item.Key, item.Value);
}
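Putting those pieces together, a minimal sketch of the whole method (assuming the question's searchables list and file path p; the TryGetValue pattern replaces the ContainsKey checks above):
private void search_Log(string p)
{
    var counters = new Dictionary<string, int>();
    using (var reader = new StreamReader(p))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            foreach (var term in searchables)
                if (line.Contains(term))
                    counters[term] = counters.TryGetValue(term, out var n) ? n + 1 : 1;
    }
    foreach (var term in searchables)
        Console.WriteLine("# of {0}: {1}", term, counters.TryGetValue(term, out var c) ? c : 0);
}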
You could use a class for the searchables, like:
public class Searchable
{
public string searchTerm;
public int count;
}
then
while ((line = reader.ReadLine()) != null)
{
foreach (var searchable in searchables)
{
if (line.Contains(searchable.searchTerm))
{
searchable.count++;
}
}
}
This would be one of many ways to track multiple search terms and their counts.
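For completeness, a hedged sketch of wiring the class up (my own; the reader setup mirrors the question's code, and p is the file path):
var searchables = new List<Searchable>
{
    new Searchable { searchTerm = "S" },
    new Searchable { searchTerm = "M" }
};
using (var reader = new StreamReader(p))
{
    string line;
    while ((line = reader.ReadLine()) != null)
        foreach (var searchable in searchables)
            if (line.Contains(searchable.searchTerm))
                searchable.count++;
}
foreach (var searchable in searchables)
    Console.WriteLine("# of {0}: {1}", searchable.searchTerm, searchable.count);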
You can make use of linq here:
string lines = reader.ReadToEnd();
var result = lines.Split(new string[]{" ","\r\n"},StringSplitOptions.RemoveEmptyEntries)
.GroupBy(x=>x)
.Select(g=> new
{
Alphabet = g.Key ,
Count = g.Count()
}
);
Input:
a b c
d e f
Output:
a: 1
b: 1
c: 1
d: 1
e: 1
f: 1
This version counts any number of search terms, accounting for a term occurring more than once on a single line.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication5
{
class Program
{
static void Main(string[] args)
{
Func<string, string[], Dictionary<string, int>> searchForCounts = null;
searchForCounts = (filePathAndName, searchTerms) =>
{
Dictionary<string, int> results = new Dictionary<string, int>();
if (string.IsNullOrEmpty(filePathAndName) || !File.Exists(filePathAndName))
return results;
using (TextReader tr = File.OpenText(filePathAndName))
{
string line = null;
while ((line = tr.ReadLine()) != null)
{
for (int i = 0; i < searchTerms.Length; ++i)
{
var searchTerm = searchTerms[i].ToLower();
var index = 0;
while (index > -1)
{
index = line.IndexOf(searchTerm, index, StringComparison.OrdinalIgnoreCase);
if (index > -1)
{
if (results.ContainsKey(searchTerm))
results[searchTerm] += 1;
else
results[searchTerm] = 1;
index += searchTerm.Length; // advance past this match (Length - 1 loops forever on 1-char terms)
}
}
}
}
}
return results;
};
var counts = searchForCounts("D:\\Projects\\ConsoleApplication5\\ConsoleApplication5\\TestLog.txt", new string[] { "one", "two" });
Console.WriteLine("----Counts----");
foreach (var keyPair in counts)
{
Console.WriteLine("Term: " + keyPair.Key.PadRight(10, ' ') + " Count: " + keyPair.Value.ToString());
}
Console.ReadKey(true);
}
}
}
Input:
OnE, TwO
Output:
----Counts----
Term: one Count: 7
Term: two Count: 15

Regex to get all "cells" form csv file row [duplicate]

Is there a default/official/recommended way to parse CSV files in C#? I don't want to roll my own parser.
Also, I've seen instances of people using ODBC/OLE DB to read CSV via the Text driver, and a lot of people discourage this due to its "drawbacks." What are these drawbacks?
Ideally, I'm looking for a way through which I can read the CSV by column name, using the first record as the header / field names. Some of the answers given are correct but work to basically deserialize the file into classes.
A CSV parser is now a part of .NET Framework.
Add a reference to Microsoft.VisualBasic.dll (works fine in C#, don't mind the name)
using (TextFieldParser parser = new TextFieldParser(#"c:\temp\test.csv"))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
//Process row
string[] fields = parser.ReadFields();
foreach (string field in fields)
{
//TODO: Process field
}
}
}
The docs are here - TextFieldParser Class
P.S. If you need a CSV exporter, try CsvExport (discl: I'm one of the contributors)
CsvHelper (a library I maintain) will read a CSV file into custom objects.
using (var reader = new StreamReader("path\\to\\file.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
var records = csv.GetRecords<Foo>();
}
Sometimes you don't own the objects you're trying to read into. In this case, you can use fluent mapping because you can't put attributes on the class.
public sealed class MyCustomObjectMap : CsvClassMap<MyCustomObject>
{
public MyCustomObjectMap()
{
Map( m => m.Property1 ).Name( "Column Name" );
Map( m => m.Property2 ).Index( 4 );
Map( m => m.Property3 ).Ignore();
Map( m => m.Property4 ).TypeConverter<MySpecialTypeConverter>();
}
}
Let a library handle all the nitty-gritty details for you! :-)
Check out FileHelpers and stay DRY - Don't Repeat Yourself - no need to re-invent the wheel a gazillionth time....
You basically just need to define that shape of your data - the fields in your individual line in the CSV - by means of a public class (and so well-thought out attributes like default values, replacements for NULL values and so forth), point the FileHelpers engine at a file, and bingo - you get back all the entries from that file. One simple operation - great performance!
In a business application, i use the Open Source project on codeproject.com, CSVReader.
It works well, and has good performance. There is some benchmarking on the link i provided.
A simple example, copied from the project page:
using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
{
int fieldCount = csv.FieldCount;
string[] headers = csv.GetFieldHeaders();
while (csv.ReadNextRecord())
{
for (int i = 0; i < fieldCount; i++)
Console.Write(string.Format("{0} = {1};", headers[i], csv[i]));
Console.WriteLine();
}
}
As you can see, it's very easy to work with.
I know its a bit late but just found a library Microsoft.VisualBasic.FileIO which has TextFieldParser class to process csv files.
Here is a helper class I use often, in case any one ever comes back to this thread (I wanted to share it).
I use this for the simplicity of porting it into projects ready to use:
public class CSVHelper : List<string[]>
{
protected string csv = string.Empty;
protected string separator = ",";
public CSVHelper(string csv, string separator = "\",\"")
{
this.csv = csv;
this.separator = separator;
foreach (string line in Regex.Split(csv, System.Environment.NewLine).ToList().Where(s => !string.IsNullOrEmpty(s)))
{
string[] values = Regex.Split(line, separator);
for (int i = 0; i < values.Length; i++)
{
//Trim values
values[i] = values[i].Trim('\"');
}
this.Add(values);
}
}
}
And use it like:
public List<Person> GetPeople(string csvContent)
{
List<Person> people = new List<Person>();
CSVHelper csv = new CSVHelper(csvContent);
foreach(string[] line in csv)
{
Person person = new Person();
person.Name = line[0];
person.TelephoneNo = line[1];
people.Add(person);
}
return people;
}
[Updated csv helper: bug fixed where the last new line character created a new line]
If you need only reading csv files then I recommend this library: A Fast CSV Reader
If you also need to generate csv files then use this one: FileHelpers
Both of them are free and opensource.
This solution is using the official Microsoft.VisualBasic assembly to parse CSV.
Advantages:
delimiter escaping
ignores Header
trim spaces
ignore comments
Code:
using Microsoft.VisualBasic.FileIO;
public static List<List<string>> ParseCSV (string csv)
{
List<List<string>> result = new List<List<string>>();
// To use the TextFieldParser a reference to the Microsoft.VisualBasic assembly has to be added to the project.
using (TextFieldParser parser = new TextFieldParser(new StringReader(csv)))
{
parser.CommentTokens = new string[] { "#" };
parser.SetDelimiters(new string[] { ";" });
parser.HasFieldsEnclosedInQuotes = true;
// Skip over header line.
//parser.ReadLine();
while (!parser.EndOfData)
{
var values = new List<string>();
var readFields = parser.ReadFields();
if (readFields != null)
values.AddRange(readFields);
result.Add(values);
}
}
return result;
}
I have written TinyCsvParser for .NET, which is one of the fastest .NET parsers around and highly configurable to parse almost any CSV format.
It is released under MIT License:
https://github.com/bytefish/TinyCsvParser
You can use NuGet to install it. Run the following command in the Package Manager Console.
PM> Install-Package TinyCsvParser
Usage
Imagine we have list of Persons in a CSV file persons.csv with their first name, last name and birthdate.
FirstName;LastName;BirthDate
Philipp;Wagner;1986/05/12
Max;Musterman;2014/01/02
The corresponding domain model in our system might look like this.
private class Person
{
public string FirstName { get; set; }
public string LastName { get; set; }
public DateTime BirthDate { get; set; }
}
When using TinyCsvParser you have to define the mapping between the columns in the CSV data and the property in you domain model.
private class CsvPersonMapping : CsvMapping<Person>
{
public CsvPersonMapping()
: base()
{
MapProperty(0, x => x.FirstName);
MapProperty(1, x => x.LastName);
MapProperty(2, x => x.BirthDate);
}
}
And then we can use the mapping to parse the CSV data with a CsvParser.
namespace TinyCsvParser.Test
{
[TestFixture]
public class TinyCsvParserTest
{
[Test]
public void TinyCsvTest()
{
CsvParserOptions csvParserOptions = new CsvParserOptions(true, new[] { ';' });
CsvPersonMapping csvMapper = new CsvPersonMapping();
CsvParser<Person> csvParser = new CsvParser<Person>(csvParserOptions, csvMapper);
var result = csvParser
.ReadFromFile(#"persons.csv", Encoding.ASCII)
.ToList();
Assert.AreEqual(2, result.Count);
Assert.IsTrue(result.All(x => x.IsValid));
Assert.AreEqual("Philipp", result[0].Result.FirstName);
Assert.AreEqual("Wagner", result[0].Result.LastName);
Assert.AreEqual(1986, result[0].Result.BirthDate.Year);
Assert.AreEqual(5, result[0].Result.BirthDate.Month);
Assert.AreEqual(12, result[0].Result.BirthDate.Day);
Assert.AreEqual("Max", result[1].Result.FirstName);
Assert.AreEqual("Mustermann", result[1].Result.LastName);
Assert.AreEqual(2014, result[1].Result.BirthDate.Year);
Assert.AreEqual(1, result[1].Result.BirthDate.Month);
Assert.AreEqual(1, result[1].Result.BirthDate.Day);
}
}
}
User Guide
A full User Guide is available at:
http://bytefish.github.io/TinyCsvParser/
Here is a short and simple solution.
using (TextFieldParser parser = new TextFieldParser(outputLocation))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
string[] headers = parser.ReadLine().Split(',');
foreach (string header in headers)
{
dataTable.Columns.Add(header);
}
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
dataTable.Rows.Add(fields);
}
}
Here is my KISS implementation...
using System;
using System.Collections.Generic;
using System.Text;
class CsvParser
{
public static List<string> Parse(string line)
{
const char escapeChar = '"';
const char splitChar = ',';
bool inEscape = false;
bool priorEscape = false;
List<string> result = new List<string>();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < line.Length; i++)
{
char c = line[i];
switch (c)
{
case escapeChar:
if (!inEscape)
inEscape = true;
else
{
if (!priorEscape)
{
if (i + 1 < line.Length && line[i + 1] == escapeChar)
priorEscape = true;
else
inEscape = false;
}
else
{
sb.Append(c);
priorEscape = false;
}
}
break;
case splitChar:
if (inEscape) //if in escape
sb.Append(c);
else
{
result.Add(sb.ToString());
sb.Length = 0;
}
break;
default:
sb.Append(c);
break;
}
}
if (sb.Length > 0)
result.Add(sb.ToString());
return result;
}
}
Some time ago I had wrote simple class for CSV read/write based on Microsoft.VisualBasic library. Using this simple class you will be able to work with CSV like with 2 dimensions array. You can find my class by the following link: https://github.com/ukushu/DataExporter
Simple example of usage:
Csv csv = new Csv("\t");//delimiter symbol
csv.FileOpen("c:\\file1.csv");
var row1Cell6Value = csv.Rows[0][5];
csv.AddRow("asdf","asdffffff","5")
csv.FileSave("c:\\file2.csv");
For reading header only you need is to read csv.Rows[0] cells :)
This code reads csv to DataTable:
public static DataTable ReadCsv(string path)
{
DataTable result = new DataTable("SomeData");
using (TextFieldParser parser = new TextFieldParser(path))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
bool isFirstRow = true;
//IList<string> headers = new List<string>();
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
if (isFirstRow)
{
foreach (string field in fields)
{
result.Columns.Add(new DataColumn(field, typeof(string)));
}
isFirstRow = false;
}
else
{
int i = 0;
DataRow row = result.NewRow();
foreach (string field in fields)
{
row[i++] = field;
}
result.Rows.Add(row);
}
}
}
return result;
}
Single source file solution for straightforward parsing needs, useful. Deals with all the nasty edge cases. Such as new line normalization and handling new lines in quoted string literals. Your welcome!
If you CSV file has a header you just read out the column names (and compute column indexes) from the first row. Simple as that.
Note that Dump is a LINQPad method, you might want to remove that if you are not using LINQPad.
void Main()
{
var file1 = "a,b,c\r\nx,y,z";
CSV.ParseText(file1).Dump();
var file2 = "a,\"b\",c\r\nx,\"y,z\"";
CSV.ParseText(file2).Dump();
var file3 = "a,\"b\",c\r\nx,\"y\r\nz\"";
CSV.ParseText(file3).Dump();
var file4 = "\"\"\"\"";
CSV.ParseText(file4).Dump();
}
static class CSV
{
public struct Record
{
public readonly string[] Row;
public string this[int index] => Row[index];
public Record(string[] row)
{
Row = row;
}
}
public static List<Record> ParseText(string text)
{
return Parse(new StringReader(text));
}
public static List<Record> ParseFile(string fn)
{
using (var reader = File.OpenText(fn))
{
return Parse(reader);
}
}
public static List<Record> Parse(TextReader reader)
{
var data = new List<Record>();
var col = new StringBuilder();
var row = new List<string>();
for (; ; )
{
var ln = reader.ReadLine();
if (ln == null) break;
if (Tokenize(ln, col, row))
{
data.Add(new Record(row.ToArray()));
row.Clear();
}
}
return data;
}
public static bool Tokenize(string s, StringBuilder col, List<string> row)
{
int i = 0;
if (col.Length > 0)
{
col.AppendLine(); // continuation
if (!TokenizeQuote(s, ref i, col, row))
{
return false;
}
}
while (i < s.Length)
{
var ch = s[i];
if (ch == ',')
{
row.Add(col.ToString().Trim());
col.Length = 0;
i++;
}
else if (ch == '"')
{
i++;
if (!TokenizeQuote(s, ref i, col, row))
{
return false;
}
}
else
{
col.Append(ch);
i++;
}
}
if (col.Length > 0)
{
row.Add(col.ToString().Trim());
col.Length = 0;
}
return true;
}
public static bool TokenizeQuote(string s, ref int i, StringBuilder col, List<string> row)
{
while (i < s.Length)
{
var ch = s[i];
if (ch == '"')
{
// escape sequence
if (i + 1 < s.Length && s[i + 1] == '"')
{
col.Append('"');
i++;
i++;
continue;
}
i++;
return true;
}
else
{
col.Append(ch);
i++;
}
}
return false;
}
}
Another one to this list, Cinchoo ETL - an open source library to read and write multiple file formats (CSV, flat file, Xml, JSON etc)
Sample below shows how to read CSV file quickly (No POCO object required)
string csv = #"Id, Name
1, Carl
2, Tom
3, Mark";
using (var p = ChoCSVReader.LoadText(csv)
.WithFirstLineHeader()
)
{
foreach (var rec in p)
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
}
Sample below shows how to read CSV file using POCO object
public partial class EmployeeRec
{
public int Id { get; set; }
public string Name { get; set; }
}
static void CSVTest()
{
string csv = #"Id, Name
1, Carl
2, Tom
3, Mark";
using (var p = ChoCSVReader<EmployeeRec>.LoadText(csv)
.WithFirstLineHeader()
)
{
foreach (var rec in p)
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
}
}
Please check out articles at CodeProject on how to use it.
This parser supports nested commas and quotes in a column:
static class CSVParser
{
public static string[] ParseLine(string line)
{
List<string> cols = new List<string>();
string value = null;
for(int i = 0; i < line.Length; i++)
{
switch(line[i])
{
case ',':
cols.Add(value);
value = null;
if(i == line.Length - 1)
{// It ends with comma
cols.Add(null);
}
break;
case '"':
cols.Add(ParseEnclosedColumn(line, ref i));
i++;
break;
default:
value += line[i];
if (i == line.Length - 1)
{// Last character
cols.Add(value);
}
break;
}
}
return cols.ToArray();
}//ParseLine
static string ParseEnclosedColumn(string line, ref int index)
{// Example: "b"",bb"
string value = null;
int numberQuotes = 1;
int index2 = index;
for (int i = index + 1; i < line.Length; i++)
{
index2 = i;
switch (line[i])
{
case '"':
numberQuotes++;
if (numberQuotes % 2 == 0)
{
if (i < line.Length - 1 && line[i + 1] == ',')
{
index = i;
return value;
}
}
else if (i > index + 1 && line[i - 1] == '"')
{
value += '"';
}
break;
default:
value += line[i];
break;
}
}
index = index2;
return value;
}//ParseEnclosedColumn
}//class CSVParser
Based on unlimit's post on How to properly split a CSV using C# split() function? :
string[] tokens = System.Text.RegularExpressions.Regex.Split(paramString, ",");
NOTE: this doesn't handle escaped / nested commas, etc., and therefore is only suitable for certain simple CSV lists.
If anyone wants a snippet they can plop into their code without having to bind a library or download a package. Here is a version I wrote:
public static string FormatCSV(List<string> parts)
{
string result = "";
foreach (string s in parts)
{
if (result.Length > 0)
{
result += ",";
if (s.Length == 0)
continue;
}
if (s.Length > 0)
{
result += "\"" + s.Replace("\"", "\"\"") + "\"";
}
else
{
// cannot output double quotes since its considered an escape for a quote
result += ",";
}
}
return result;
}
enum CSVMode
{
CLOSED = 0,
OPENED_RAW = 1,
OPENED_QUOTE = 2
}
public static List<string> ParseCSV(string input)
{
List<string> results;
CSVMode mode;
char[] letters;
string content;
mode = CSVMode.CLOSED;
content = "";
results = new List<string>();
letters = input.ToCharArray();
for (int i = 0; i < letters.Length; i++)
{
char letter = letters[i];
char nextLetter = '\0';
if (i < letters.Length - 1)
nextLetter = letters[i + 1];
// If its a quote character
if (letter == '"')
{
// If that next letter is a quote
if (nextLetter == '"' && mode == CSVMode.OPENED_QUOTE)
{
// Then this quote is escaped and should be added to the content
content += letter;
// Skip the escape character
i++;
continue;
}
else
{
// otherwise its not an escaped quote and is an opening or closing one
// Character is skipped
// If it was open, then close it
if (mode == CSVMode.OPENED_QUOTE)
{
results.Add(content);
// reset the content
content = "";
mode = CSVMode.CLOSED;
// If there is a next letter available
if (nextLetter != '\0')
{
// If it is a comma
if (nextLetter == ',')
{
i++;
continue;
}
else
{
throw new Exception("Expected comma. Found: " + nextLetter);
}
}
}
else if (mode == CSVMode.OPENED_RAW)
{
// If it was opened raw, then just add the quote
content += letter;
}
else if (mode == CSVMode.CLOSED)
{
// Otherwise open it as a quote
mode = CSVMode.OPENED_QUOTE;
}
}
}
// If its a comma seperator
else if (letter == ',')
{
// If in quote mode
if (mode == CSVMode.OPENED_QUOTE)
{
// Just read it
content += letter;
}
// If raw, then close the content
else if (mode == CSVMode.OPENED_RAW)
{
results.Add(content);
content = "";
mode = CSVMode.CLOSED;
}
// If it was closed, then open it raw
else if (mode == CSVMode.CLOSED)
{
mode = CSVMode.OPENED_RAW;
results.Add(content);
content = "";
}
}
else
{
// If opened quote, just read it
if (mode == CSVMode.OPENED_QUOTE)
{
content += letter;
}
// If opened raw, then read it
else if (mode == CSVMode.OPENED_RAW)
{
content += letter;
}
// It closed, then open raw
else if (mode == CSVMode.CLOSED)
{
mode = CSVMode.OPENED_RAW;
content += letter;
}
}
}
// If it was still reading when the buffer finished
if (mode != CSVMode.CLOSED)
{
results.Add(content);
}
return results;
}
For small CSV inputs, LINQ is entirely sufficient, provided the fields contain no embedded commas.
For example, for the following CSV file content:
schema_name,description,utype
"IX_HE","High-Energy data","x"
"III_spectro","Spectroscopic data","d"
"VI_misc","Miscellaneous","f"
"vcds1","Catalogs only available in CDS","d"
"J_other","Publications from other journals","b"
when we read the whole content into a single string called data, then
using System;
using System.IO;
using System.Linq;

var data = File.ReadAllText(Path2CSV);

// helper split characters
var newline = Environment.NewLine.ToCharArray();
var comma = ",".ToCharArray();
var quote = "\"".ToCharArray();

// split the input data into lines (RemoveEmptyEntries is needed because
// splitting on the individual '\r' and '\n' characters leaves empty entries)
var lines = data.Split(newline, StringSplitOptions.RemoveEmptyEntries);

// the first line is the header; take the header fields
foreach (var col in lines.First().Split(comma)) {
    // do something with "col"
}

// skip the first line; all the rest are real data lines/fields
foreach (var line in lines.Skip(1)) {
    // first we split the data line on the comma character,
    // then we remove the double quotes from each element using Trim(),
    // and finally we make an array
    var fields = line.Split(comma)
        .Select(f => f.Trim(quote))
        .ToArray();
    // do something with the "fields" array
}
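If the file fits the same assumptions (header line first, no embedded commas), File.ReadLines (available from .NET 4) keeps this even shorter by avoiding the manual newline split. A sketch:
foreach (var line in File.ReadLines(Path2CSV).Skip(1))
{
    // split on commas, then strip the surrounding double quotes
    var fields = line.Split(',')
        .Select(f => f.Trim('"'))
        .ToArray();
    // do something with the "fields" array
}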

How do I create the List<float> functions in the options file?

My options file has two functions, GetKey and SetKey.
When I set a key, the settings_file.txt will look like:
text = hello, where text is the key, then =, and hello is the value for the current key.
Now I need to add another two functions: the first takes a key string and returns a List<float>,
and the other takes a key and a List<float>.
So these are the first two functions, GetKey and SetKey, already working:
/*----------------------------------------------------------------
 * Module Name : OptionsFile
 * Description : Saves and retrieves application options
 * Author      : Danny
 * Date        : 10/02/2010
 * Revision    : 1.00
 * --------------------------------------------------------------*/
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Net;
using System.IO;
using System.Configuration;

/*
 * Introduction:
 *
 * This module helps in saving application options.
 *
 * A typical file could look like this:
 * user_color=Red
 * time_left=30
 * */
namespace DannyGeneral
{
    class OptionsFile
    {
        /*----------------------------------------
         * P R I V A T E   V A R I A B L E S
         * ---------------------------------------*/
        string path_exe;
        string temp_settings_file;
        string temp_settings_dir;
        string Options_File;
        StreamWriter sw;
        StreamReader sr;

        /*---------------------------------
         * P U B L I C   M E T H O D S
         * -------------------------------*/

        /*----------------------------------------------------------
         * Function    : OptionsFile
         * Description : Constructor
         * Parameters  : settings is the path of the file to use
         * Return      : none
         * --------------------------------------------------------*/
        public OptionsFile(string settings)
        {
            if (!File.Exists(settings))
            {
                if (!Directory.Exists(Path.GetDirectoryName(settings)))
                {
                    Directory.CreateDirectory(Path.GetDirectoryName(settings));
                }
                File.Create(settings).Close();
            }
            path_exe = Path.GetDirectoryName(Application.LocalUserAppDataPath);
            Options_File = settings;
        }

        /*----------------------------------------------------------
         * Function    : GetKey
         * Description : gets the value of the key
         * Parameters  : key
         * Return      : value of the key if the key exists, null if not
         * --------------------------------------------------------*/
        public string GetKey(string key)
        {
            string key_of_each_line;
            string line;
            int index;
            string key_value;

            key_value = null;
            sr = new StreamReader(Options_File);
            while (null != (line = sr.ReadLine()))
            {
                index = line.IndexOf("=");
                if (index >= 1)
                {
                    key_of_each_line = line.Substring(0, index);
                    if (key_of_each_line == key)
                    {
                        key_value = line.Substring(key.Length + 1);
                    }
                }
            }
            sr.Close();
            return key_value;
        }

        /*----------------------------------------------------------
         * Function    : SetKey
         * Description : sets a value for the specified key
         * Parameters  : key and a value
         * Return      : none
         * --------------------------------------------------------*/
        public void SetKey(string key, string value)
        {
            bool key_was_found_inside_the_loop;
            string value_of_each_key;
            string key_of_each_line;
            string line;
            int index;

            key_was_found_inside_the_loop = false;
            temp_settings_file = "\\temp_settings_file.txt";
            temp_settings_dir = path_exe + @"\temp_settings";
            if (!Directory.Exists(temp_settings_dir))
            {
                Directory.CreateDirectory(temp_settings_dir);
            }
            sw = new StreamWriter(temp_settings_dir + temp_settings_file);
            sr = new StreamReader(Options_File);
            while (null != (line = sr.ReadLine()))
            {
                index = line.IndexOf("=");
                key_of_each_line = line.Substring(0, index);
                value_of_each_key = line.Substring(index + 1);
                if (key_of_each_line == key)
                {
                    sw.WriteLine(key + " = " + value);
                    key_was_found_inside_the_loop = true;
                }
                else
                {
                    sw.WriteLine(key_of_each_line + "=" + value_of_each_key);
                }
            }
            if (!key_was_found_inside_the_loop)
            {
                sw.WriteLine(key + "=" + value);
            }
            sr.Close();
            sw.Close();
            File.Delete(Options_File);
            File.Move(temp_settings_dir + temp_settings_file, Options_File);
            return;
        }
After these two functions I did:
public List<float> GetListFloatKey(string keys)
{
    int j;
    List<float> t;
    t = new List<float>();
    int i;
    for (i = 0; ; i++)
    {
        j = Convert.ToInt32(GetKey((keys + i).ToString()));
        if (j == 0)
        {
            break;
        }
        else
        {
            t.Add(j);
        }
    }
    if (t.Count == 0)
        return null;
    else
        return t;
}

public void SetListFloatKey(string key, List<float> Values)
{
    int i;
    for (i = 0; i < Values.Count; i++)
    {
        string indexed_key;
        indexed_key = string.Format("{0}{1}", key, i);
        SetKey(indexed_key, Values[i].ToString());
    }
}
But they are not good.
With the last one, SetListFloatKey, when I pass in a List, the result in the text file settings_file.txt is, for example:
coordinates01 = 123
coordinates02 = 144
coordinates03 = 145
For every cell/index in the List it gets, it makes a separate key. What I need is for the whole List to have one key; the format in the text file should be like this:
coordinates = 123,144,145 ...and so on: one key, then all the values from the List.
Then in GetListFloatKey I need to parse the values back according to the key (for example coordinates) and return a List with 123 at index 0, 144 at index 1, 145 at index 2, and so on.
The question is: are the functions, the way I'm writing them on top of GetKey and SetKey, any good? And how do I format and parse the values back?
At the moment you are calling SetKey within SetListFloatKey for every item in the list. Instead, you need to build a string and call it once, along the lines of (basic testing done):
public static void SetListFloatKey(string key, List<float> Values)
{
    StringBuilder sb = new StringBuilder();
    foreach (float value in Values)
    {
        sb.AppendFormat("{0},", value);
    }
    SetKey(key, sb.ToString());
}
Note I am getting lazy here - the last item will have a comma after it. Then when loading the list:
public static List<float> GetListFloatKey(string keys)
{
    List<float> result = new List<float>();
    string s = GetKey(keys);
    if (s == null)  // GetKey returns null when the key is missing
        return result;
    string[] items = s.Split(new char[] { ',' });
    float f;
    foreach (string item in items)
    {
        if (float.TryParse(item, out f))
            result.Add(f);
    }
    return result;
}
However, given you are reading and writing an options file, you might want to investigate options around serializing your objects to and from files.
EDIT There are a few ways you can get rid of the extra comma. One way is to not put it in in the first place...
string sep = "";
foreach (float value in Values)
{
    sb.AppendFormat("{0}{1}", sep, value);
    if (sep == "") sep = ",";
}
...and another is to exclude it in the call to SetKey...
foreach (float value in Values)
{
    sb.AppendFormat(",{0}", value);
}
SetKey(key, sb.ToString().Substring(1));
...note that in both of these cases I moved the comma to the start to make life easier. Alternatively, you could use String.Join, which only inserts the separator between items.
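For completeness, here is what the String.Join version of SetListFloatKey would look like (assuming .NET 4 or later, where Join accepts an IEnumerable<T>):
public static void SetListFloatKey(string key, List<float> Values)
{
    // Join inserts the separator only between items, so there is no trailing comma
    SetKey(key, string.Join(",", Values));
}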
I think you are spending too much time thinking about how to format the file each time you make a change, and you are also incurring a lot of file I/O overhead each time you check for a key.
Consider using a class like
public class Options
{
    public static string FILENAME = @"C:\Test\testfile.txt";

    public List<KeyValuePair<string, string>> OrderedKeys { get; set; }
    public Dictionary<string, KeyValuePair<string, string>> Pairs { get; set; }

    public string GetKey(string key)
    {
        return this.Pairs[key].Value;
    }

    public void SetKey(string key, string value)
    {
        KeyValuePair<string, string> pair = new KeyValuePair<string, string>(key, value);
        if (this.Pairs.ContainsKey(key))
        {
            // replace the old pair in place so the file keeps its original order
            this.OrderedKeys[this.OrderedKeys.IndexOf(this.Pairs[key])] = pair;
        }
        else
        {
            this.OrderedKeys.Add(pair);
        }
        this.Pairs[key] = pair;
    }

    public Options()
    {
        LoadFile();
    }

    ~Options()
    {
        // note: finalizers are not guaranteed to run at a predictable time;
        // in production code prefer an explicit Save() or IDisposable
        WriteFile();
    }

    private void LoadFile()
    {
        this.OrderedKeys = new List<KeyValuePair<string, string>>();
        this.Pairs = new Dictionary<string, KeyValuePair<string, string>>();
        if (!File.Exists(FILENAME))
            return;
        // the (\r\n|$) alternation also catches a last line without a trailing newline
        Regex regex = new Regex(@"(?<key>\S*?)\s*=\s*(?<val>\S*?)\s*(\r\n|$)");
        MatchCollection matches = regex.Matches(File.ReadAllText(FILENAME));
        foreach (Match match in matches)
        {
            KeyValuePair<string, string> pair =
                new KeyValuePair<string, string>(match.Groups["key"].Value, match.Groups["val"].Value);
            this.OrderedKeys.Add(pair);
            this.Pairs.Add(pair.Key, pair);
        }
    }

    private void WriteFile()
    {
        // StreamWriter truncates an existing file, so no explicit delete is needed
        using (System.IO.StreamWriter file = new System.IO.StreamWriter(FILENAME))
        {
            foreach (KeyValuePair<string, string> pair in this.OrderedKeys)
            {
                file.WriteLine(pair.Key + " = " + pair.Value);
            }
        }
    }
}
Notice that the Options object reads from the file once and writes out when it is destroyed; in the meantime it holds a local dictionary of the values in your file. You can then use GetKey() and SetKey() to get and set your options.
I modified my original post to use a list and a dictionary, because a Dictionary on its own does not maintain the order in which pairs are added; the list ensures that the options are always written back to the file in the correct order.
You will also notice I threw in a regular expression to parse your file, which makes things much easier and quicker and allows for things like extra whitespace in the options file.
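Typical usage would look like this (my own illustrative calls, with user_color taken from your example file):
var options = new Options();                  // reads C:\Test\testfile.txt once
options.SetKey("user_color", "Blue");         // changes only the in-memory copy
string color = options.GetKey("user_color");  // "Blue"
// the file is rewritten when the object is finalized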
Once you have done this, it is easy to add functions like
public List<float> GetListFloatKey(string keybase)
{
    List<float> ret = new List<float>();
    // walk OrderedKeys (not the dictionary) so the values come back in file order;
    // the ^ and $ anchors keep "coordinates1" from also matching e.g. "oldcoordinates1"
    foreach (KeyValuePair<string, string> pair in this.OrderedKeys)
    {
        if (Regex.IsMatch(pair.Key, "^" + keybase + "[0-9]+$"))
            ret.Add(float.Parse(pair.Value));
    }
    return ret;
}

public void SetListFloatKey(string keybase, List<float> values)
{
    List<string> oldkeys = new List<string>();
    int startindex = -1;
    // find any existing keys for this list, remembering where the first one sat
    foreach (string key in this.Pairs.Keys)
    {
        if (Regex.IsMatch(key, "^" + keybase + "[0-9]+$"))
        {
            if (startindex == -1)
                startindex = this.OrderedKeys.IndexOf(this.Pairs[key]);
            oldkeys.Add(key);
        }
    }
    // remove the old entries
    foreach (string key in oldkeys)
    {
        this.OrderedKeys.Remove(this.Pairs[key]);
        this.Pairs.Remove(key);
    }
    // insert the new entries at the old position (or append if the key is new)
    for (int i = 0; i < values.Count; i++)
    {
        KeyValuePair<string, string> pair =
            new KeyValuePair<string, string>(keybase + i.ToString(), values[i].ToString());
        if (startindex != -1)
            this.OrderedKeys.Insert(startindex + i, pair);
        else
            this.OrderedKeys.Add(pair);
        this.Pairs.Add(pair.Key, pair);
    }
}
It is easier to do this at this point because you have abstracted the actual file structure away and are now just dealing with a dictionary.
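And these could be used like so (again, my own example values):
var options = new Options();
options.SetListFloatKey("coordinates", new List<float> { 123f, 144f, 145f });
List<float> coords = options.GetListFloatKey("coordinates");  // 123, 144, 145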
