I have a txt file named fileA.txt that I am trying to validate.
Here is an example of fileA.txt:
123, joshua, employee
134, vernon, manager
382, lisa, HR
What I am trying to do is read the contents of fileA and validate each line. For example, the first field is supposed to be the employee ID (an int); if it contains a string instead, I want to skip that line and move on to the next one using try/catch. If everything is fine, I want to create an Employee from the values and add it to a new list. Any ideas on how I can do the validation part?
Here is what I have so far to read the file and add each line to a new list:
public static List<Employee> readlist(string path)
{
    var employees = new List<Employee>();
    var content = File.ReadAllText(path);
    var lines = content.Split('\n');
    foreach (var line in lines)
    {
        var info = line.Split(',');
        employees.Add(new Employee
        (
            int.Parse(info[0]),
            info[1],
            info[2]
        ));
    }
    return employees;
}
I hope what I have provided is sufficient. Thank you in advance for all the help!
There is no need to use try/catch; you can simply use the Int32.TryParse method to check whether the expected value is a number. If it is not a number, you just continue with the next line.
foreach (var line in lines)
{
    var info = line.Split(',');
    var isIdValid = Int32.TryParse(info[0], out int employeeId);
    if (!isIdValid)
    {
        Console.WriteLine($"'{info[0]}' could not be parsed as an Int32.");
        continue;
    }
    employees.Add(new Employee
    (
        employeeId,
        info[1],
        info[2]
    ));
}
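Putting that together, here is a minimal sketch of the whole reader. It assumes Employee takes (int id, string name, string role) as in the question; the record declaration and the EmployeeReader/ParseEmployees names are hypothetical. Besides the TryParse check, it trims each field (the sample has a space after every comma) and skips blank lines and lines with the wrong number of fields, which would otherwise throw IndexOutOfRangeException. Reading with File.ReadLines also avoids the stray '\r' that splitting on '\n' leaves behind with Windows line endings.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical shape of the Employee type from the question.
public record Employee(int Id, string Name, string Role);

public static class EmployeeReader
{
    // Reads the file line by line and hands the lines to the validator.
    public static List<Employee> ReadList(string path) =>
        ParseEmployees(File.ReadLines(path));

    // Validates each line; invalid lines are reported and skipped, valid ones are kept.
    public static List<Employee> ParseEmployees(IEnumerable<string> lines)
    {
        var employees = new List<Employee>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // e.g. a trailing empty line at the end of the file

            var info = line.Split(',');
            if (info.Length != 3)
            {
                Console.WriteLine($"Skipping '{line}': expected 3 fields, found {info.Length}.");
                continue;
            }

            if (!int.TryParse(info[0].Trim(), out int employeeId))
            {
                Console.WriteLine($"Skipping '{line}': '{info[0]}' is not a valid employee ID.");
                continue;
            }

            employees.Add(new Employee(employeeId, info[1].Trim(), info[2].Trim()));
        }
        return employees;
    }
}
```

Taking the lines as an IEnumerable<string> keeps the validation logic testable without a file on disk.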
I am looking for guidance on how to turn a string (csvData) into a 2D string array by splitting it twice, first on ';' and then on ','.
Currently I am at the stage where I can split it once into rows and turn it into an array, but I cannot figure out how to create a 2D array where the columns separated by ',' are also split out.
string[] Sep = csvData.Split(';').Select(csvData => csvData.Replace(" ","")).Where(csvData => !string.IsNullOrEmpty(csvData)).ToArray();
I have tried various things, such as:
string[,] Sep = csvData.Split(';',',').Select(csvData => csvData.Replace(" ","")).Where(csvData => !string.IsNullOrEmpty(csvData)).ToArray();
I was naively thinking that C# would understand what I was trying to achieve, but since I am here, it's obvious that it didn't: I got the error "Cannot implicitly convert type 'string[]' to 'string[*,*]'".
Note that I have not coded for a while, so if my thinking is completely wrong and you do not understand what I am trying to convey with this question, I apologize in advance.
Thanks!
In a strongly-typed language like C#, the compiler makes no assumptions about what you intend to do with your data. You must make your intent explicit through your code. Something like this should work:
string csvData = "A,B;C,D";
string[][] sep = csvData.Split(';') // Returns string[] {"A,B","C,D"}
.Select(str => str.Split(',')) // Returns IEnumerable<string[]> {{"A","B"},{"C","D"}}
.ToArray(); // Returns string[][] {{"A","B"},{"C","D"}}
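Note that this produces a jagged array (string[][]), not the rectangular string[,] the question asks for. If you really need string[,], one option is to size it from the longest row and copy the values across; cells for missing values stay null. ToRectangular and the CsvArrays class are hypothetical names used only for this sketch.

```csharp
using System.Linq;

public static class CsvArrays
{
    // Copies a jagged array into a rectangular one sized by the longest row.
    // Rows shorter than the longest one leave their trailing cells as null.
    public static string[,] ToRectangular(string[][] rows)
    {
        int columns = rows.Length == 0 ? 0 : rows.Max(r => r.Length);
        var result = new string[rows.Length, columns];
        for (int i = 0; i < rows.Length; i++)
            for (int j = 0; j < rows[i].Length; j++)
                result[i, j] = rows[i][j];
        return result;
    }
}
```

With sep from above, CsvArrays.ToRectangular(sep) gives a 2 by 2 array whose [1, 0] cell is "C".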
Rows are separated by semicolon, columns by comma?
Splitting by ';' gives you an array of rows. Split a row by ',' gives you an array of values.
If your data has a consistent schema (each CSV you process has the same columns), you could define a class to represent the entity, which makes the data easier to work with.
Let's say it's customer data:
John,Smith,8675309,johnsmith@gmail.com;
You could make a class with those properties:
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Phone { get; set; }
    public string Email { get; set; }
}
Then:
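// RemoveEmptyEntries drops the empty "row" produced by a trailing ';'
var rows = csvData.Split(';', StringSplitOptions.RemoveEmptyEntries);
List<Customer> customers = new();
foreach (var row in rows)
{
    // Index into the split fields, not into the row string itself
    var fields = row.Split(',');
    customers.Add(new()
    {
        FirstName = fields[0],
        LastName = fields[1],
        Phone = fields[2],
        Email = fields[3]
    });
}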
Now you have a list of customers to do whatever it is you do with customers.
Here is an answer presenting a few alternative ideas and things you can do with C#, more for educational/academic purposes than anything else. These days, to consume a CSV we'd use a CSV library.
If your data is definitely regularly formed, you can get away with just one Split. The following code splits on either char to make one long array. It then stands to reason that every 4 elements make up a new customer, the customer's data being given by n+0, n+1, n+2 and n+3. Because we know how many data items we will consume, dividing their count by 4 gives us the number of customers, so we can presize our 2D array.
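var bits = data.Split(';', ',');
var twoD = new string[bits.Length / 4, 4];
// x + 3 < bits.Length stops before an incomplete record,
// e.g. the empty element produced by a trailing ';'
for (int x = 0; x + 3 < bits.Length; x += 4)
{
    twoD[x / 4, 0] = bits[x + 0];
    twoD[x / 4, 1] = bits[x + 1];
    twoD[x / 4, 2] = bits[x + 2];
    twoD[x / 4, 3] = bits[x + 3];
}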
I don't think I'd use 2D arrays, though, and I commend the other answer's advice to create a class to hold the related data. You can use the same technique:
var custs = new List<Customer>();
// Same guard as above: only consume complete groups of 4
for (int x = 0; x + 3 < bits.Length;)
{
    custs.Add(new()
    {
        FirstName = bits[x++],
        LastName = bits[x++],
        Phone = bits[x++],
        Email = bits[x++]
    });
}
Here we aren't incrementing x in the loop header; every time a piece of info is assigned, x is bumped up by 1 in the loop body. We could have kept the same approach as before, jumping it by 4; this just demos another approach that lends itself well here.
I mentioned that these days we probably wouldn't read a CSV manually and split it ourselves. What if the data contains a comma or a semicolon? It wrecks the file structure.
There is a boatload of libraries that read CSV files; CsvHelper is a popular one, and you'd use it like this:
// needs: using CsvHelper; using System.Globalization; using System.IO; using System.Linq;
using var reader = new StreamReader("path\\to\\file.csv");
using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
var custs = csv.GetRecords<Customer>().ToList();
...
Your file would have a header line with column names that match your property names in C#. If it doesn't, you can use attributes on the properties to tell CsvHelper which column should be mapped to which property: https://joshclose.github.io/CsvHelper/getting-started/
Here's the simplest way I know to produce a 2d array by splitting a string.
string csvData = "A,B,C;D,E,F,G";

var temporary =
    csvData
        .Split(';')
        .SelectMany((xs, i) => xs.Split(',').Select((x, j) => new { x, i, j }))
        .ToArray();

int max_i = temporary.Max(x => x.i);
int max_j = temporary.Max(x => x.j);

string[,] array = new string[max_i + 1, max_j + 1];

foreach (var t in temporary)
{
    array[t.i, t.j] = t.x;
}
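I purposely chose csvData to be missing a value.
temporary holds one entry per value, tagged with its row index i and column index j: (A, 0, 0), (B, 0, 1), (C, 0, 2), (D, 1, 0), (E, 1, 1), (F, 1, 2), (G, 1, 3).
The final array is therefore 2 by 4, with array[0, 3] left as null because the first row has only three values.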
I need to write a CSV parser, and I am now trying to separate the fields so I can manipulate them.
Sample CSV:
mitarbeiter^tagesdatum^lohnart^kostenstelle^kostentraeger^menge^betrag^belegnummer
11005^23.01.2018^1^^31810020^5,00^^
11081^23.01.2018^1^^31810020^5,00^^
As you can see, there are several empty cells.
I am doing the following:
using (CsvFileReader reader = new CsvFileReader(path))
{
    CsvRow row = new CsvRow();
    while (reader.ReadRow(row))
    {
        foreach (string s in row)
        {
            csvROW.Add(new aCSVROW());
            string[] items = s.Split(new char[] { '^' }, StringSplitOptions.None);
            csvROW[0].mitarbeiter = items[0];
            csvROW[0].tagesdatum = items[1];
            csvROW[0].lohnart = items[2];
            csvROW[0].kostenstelle = items[3];
            csvROW[0].kostentraeger = items[4];
            csvROW[0].menge = items[5];
            csvROW[0].betrag = items[6];
            csvROW[0].belegnummer = items[7];
        }
    }
}
Problem:
It seems that Split stops at the comma in 5,00, even though the separator is ^. Is there a reason why?
I tried several things without success...
Thank you so much!
CsvFileReader reads rows from a CSV file and then the fields within each row, splitting at the commas. What else would you expect CsvFileReader to do than separate the row?
After reading the second line, row will have the contents
11005^23.01.2018^1^^31810020^5
and
00^^
When you split the first entry by ^, the last element of the resulting array will be "5". Either way, your code will throw, because you are trying to access indices beyond the bounds of the array.
I don't know CsvFileReader. Maybe you can pass ^ as the separator and skip splitting the string yourself. Alternatively, you could use a StreamReader, which will behave much more like you expected.
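To see the difference concretely, here is a small sketch that splits both the fragment CsvFileReader hands back and the complete line from the file. The CaretSplitDemo class and its member names are hypothetical; only String.Split is used.

```csharp
using System;

public static class CaretSplitDemo
{
    // The first entry CsvFileReader returns for the second line: it stops at the comma in "5,00".
    public const string FirstFieldFromCsvReader = "11005^23.01.2018^1^^31810020^5";

    // The complete line as it appears in the file.
    public const string FullLine = "11005^23.01.2018^1^^31810020^5,00^^";

    // Splits on ^ only, keeping empty fields so column positions stay stable.
    public static string[] SplitOnCaret(string text) =>
        text.Split(new[] { '^' }, StringSplitOptions.None);
}
```

SplitOnCaret(FirstFieldFromCsvReader) yields 6 items, so items[6] and items[7] are out of range, while SplitOnCaret(FullLine) yields all 8 fields with items[5] == "5,00".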
using (StreamReader reader = new StreamReader(path))
{
    while (!reader.EndOfStream)
    {
        var csvLine = reader.ReadLine();
        // Split the whole line on ^ only, so "5,00" stays a single field.
        string[] items = csvLine.Split(new char[] { '^' }, StringSplitOptions.None);
        // Fill a new object for each line instead of overwriting csvROW[0].
        var row = new aCSVROW();
        row.mitarbeiter = items[0];
        row.tagesdatum = items[1];
        row.lohnart = items[2];
        row.kostenstelle = items[3];
        row.kostentraeger = items[4];
        row.menge = items[5];
        row.betrag = items[6];
        row.belegnummer = items[7];
        csvROW.Add(row);
    }
}
Is CsvRow meant to be the data of all rows, or of one row? Because as it is, you keep adding a new aCSVROW object into csvROW for each read line, but you keep replacing the data on just csvROW[0], the first inserted aCSVROW. This means that in the end, you will have a lot of rows that all have no data in them, except for the one on index 0, that had its properties overwritten on each iteration, and ends up containing the data of the last read row.
Also, despite using a CsvFileReader class, you are using plain String.Split to actually separate the fields. Surely that's what the CsvFileReader class is for?
Personally, I always use the TextFieldParser from the Microsoft.VisualBasic.FileIO namespace. It has the advantage that it's completely native to the .NET Framework, and you can simply tell it which separator to use.
The SplitFile function from my answer to "Using C# to search a CSV file and pull the value in the column next to it" can get the data out of it as a simple List<String[]>.
Once you have your data, you can paste it into objects however you want.
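The linked SplitFile function isn't reproduced here. A minimal sketch of what such a function could look like, assuming the (path, encoding, delimiter) signature used in the call below and wrapped in a hypothetical CsvSplitter class, relies only on TextFieldParser's documented members (SetDelimiters, ReadFields, EndOfData):

```csharp
using System.Collections.Generic;
using System.Text;
using Microsoft.VisualBasic.FileIO; // TextFieldParser: .NET Framework, and .NET Core 3.0+

public static class CsvSplitter
{
    // Hypothetical reconstruction of SplitFile: returns every record of a delimited file.
    public static List<string[]> SplitFile(string path, Encoding encoding, string delimiter)
    {
        var result = new List<string[]>();
        using (var parser = new TextFieldParser(path, encoding))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(delimiter);        // only this delimiter splits fields
            parser.HasFieldsEnclosedInQuotes = true;
            parser.TrimWhiteSpace = false;          // keep field values exactly as written
            while (!parser.EndOfData)
            {
                // One record per call; with "^" as the delimiter, "5,00" stays one field.
                result.Add(parser.ReadFields());
            }
        }
        return result;
    }
}
```

Note that the header line (mitarbeiter^tagesdatum^...) comes back as the first record, so skip it if you don't want it as data.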
List<String[]> lines = SplitFile(path, textEncoding, "^");
// csvROW is the List<aCSVROW> from the question.
// (CsvRow, csvROW and aCSVROW look like pretty bad naming to me; consider renaming them.)
foreach (String[] items in lines)
{
    // Skip lines without all 8 fields to avoid index out of range exceptions.
    if (items.Length < 8)
        continue;
    // Create new object, and add it to list.
    aCSVROW row = new aCSVROW();
    csvROW.Add(row);
    // Fill the actual newly created object, not the first object in csvROW.
    row.mitarbeiter = items[0];
    row.tagesdatum = items[1];
    row.lohnart = items[2];
    row.kostenstelle = items[3];
    row.kostentraeger = items[4];
    row.menge = items[5];
    row.betrag = items[6];
    row.belegnummer = items[7];
}
// Done. All rows added to csvROW.
using (CsvFileReader reader = new CsvFileReader(path))
{
    CsvRow row = new CsvRow();
    while (reader.ReadRow(row))
    {
        foreach (string s in row)
        {
            csvROW.Add(new aCSVROW());
            string[] items = s.Split('^');
            csvROW[0].mitarbeiter = items[0];
            csvROW[0].tagesdatum = items[1];
            csvROW[0].lohnart = items[2];
            csvROW[0].kostenstelle = items[3];
            csvROW[0].kostentraeger = items[4];
            csvROW[0].menge = items[5];
            csvROW[0].betrag = items[6];
            csvROW[0].belegnummer = items[7];
        }
    }
}
As of now, I am using this code to open a file and read it into a list and parse that list into a string[]:
string CP4DataBase =
    "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
CP4DataBaseRTB.LoadFile(CP4DataBase, RichTextBoxStreamType.PlainText);
string[] splitCP4DataBaseLines = CP4DataBaseRTB.Text.Split('\n');
List<string> tempCP4List = new List<string>();
string[] line1CP4Components;

foreach (var line in splitCP4DataBaseLines)
    tempCP4List.Add(line + Environment.NewLine);

string concattedUnitPart = "";
foreach (var line in tempCP4List)
{
    concattedUnitPart = concattedUnitPart + line;
    line1CP4PartLines++;
}

line1CP4Components = new Regex("\"UNIT\",\"PARTS\"", RegexOptions.Multiline)
    .Split(concattedUnitPart)
    .Where(c => !string.IsNullOrEmpty(c)).ToArray();
I am wondering if there is a quicker way to do this. This is just one of the files I am opening, so this is repeated a minimum of 5 times to open and properly load the lists.
The minimum file size being imported right now is 257 KB. The largest file is 1,803 KB. These files will only get larger as time goes on as they are being used to simulate a database and the user will continually add to them.
So my question is, is there a quicker way to do all of the above code?
EDIT:
***CP4***
"UNIT","PARTS"
"BLOCK","HEADER-"
"NAME","106536"
"REVISION","0000"
"DATE","11/09/03"
"TIME","11:10:11"
"PMABAR",""
"COMMENT",""
"PTPNAME","R160805"
"CMPNAME","R160805"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",180
"BLOCK","PRTFORM-"
"PTPSZBX",1.60
"PTPSZBY",0.80
"PTPMNH",0.25
"NeedGlue",0
"BLOCK","TOLEINF-"
"PTPTLBX",0.50
"PTPTLBY",0.40
"PTPTLCL",10
"PTPTLPX",0.30
"PTPTLPY",0.30
"PTPTLPQ",30
"BLOCK","ELDT+"
"PGDELSN","PGDELX","PGDELY","PGDELPP","PGDELQ","PGDELP","PGDELW","PGDELL","PGDELWT","PGDELLT","PGDELCT","PGDELR"
0,0.000,0.000,0,0,0.000,0.000,0.000,0.000,0.000,0.000,0
"BLOCK","VISION-"
"PTPVIPL",0
"PTPVILCA",0
"PTPVILB",0
"PTPVICVT",10
"PENVILIT",0
"BLOCK","ENVDT"
"ELEMENT","CP43ENVDT-"
"PENNMI",1.0
"PENNMA",1.0
"PENNZN",""
"PENNZT",1.0
"PENBLM",12
"PENCRTS",0
"PENSPD1",100
"PTPCRDCT",0
"PENVICT",1
"PCCCRFT",1
"BLOCK","CARRING-"
"PTPCRAPO",0
"PTPCRPCK",0
"PTPCRPUX",0.00
"PTPCRPUY",0.00
"PTPCRRCV",0
"BLOCK","PACKCLS-"
"FDRTYPE","Emboss"
"TAPEWIDTH","8mm"
"FEEDPITCH",4
"REELDIAMETER",0
"TAPEDEPTH",0.0
"DOADVVACUUM",0
"CHKBEFOREFEED",0
"TAPEARMLENGTH",0
"PPCFDPP",0
"PPCFDEC",4
"PPCMNPT",30
"UNIT","PARTS"
"BLOCK","HEADER-"
"NAME","106653"
"REVISION","0000"
"DATE","11/09/03"
"TIME","11:10:42"
"PMABAR",""
"COMMENT",""
"PTPNAME","0603R"
"CMPNAME","0603R"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",18
"BLOCK","PRTFORM-"
"PTPSZBX",1.60
"PTPSZBY",0.80
"PTPMNH",0.23
"NeedGlue",0
"BLOCK","TOLEINF-"
"PTPTLBX",0.50
"PTPTLBY",0.34
"PTPTLCL",0
"PTPTLPX",0.60
"PTPTLPY",0.40
"PTPTLPQ",30
"BLOCK","ELDT+"
"PGDELSN","PGDELX","PGDELY","PGDELPP","PGDELQ","PGDELP","PGDELW","PGDELL","PGDELWT","PGDELLT","PGDELCT","PGDELR"
0,0.000,0.000,0,0,0.000,0.000,0.000,0.000,0.000,0.000,0
"BLOCK","VISION-"
"PTPVIPL",0
"PTPVILCA",0
"PTPVILB",0
"PTPVICVT",10
"PENVILIT",0
"BLOCK","ENVDT"
"ELEMENT","CP43ENVDT-"
"PENNMI",1.0
"PENNMA",1.0
"PENNZN",""
"PENNZT",1.0
"PENBLM",12
"PENCRTS",0
"PENSPD1",80
"PTPCRDCT",0
"PENVICT",1
"PCCCRFT",1
"BLOCK","CARRING-"
"PTPCRAPO",0
"PTPCRPCK",0
"PTPCRPUX",0.00
"PTPCRPUY",0.00
"PTPCRRCV",0
"BLOCK","PACKCLS-"
"FDRTYPE","Emboss"
"TAPEWIDTH","8mm"
"FEEDPITCH",4
"REELDIAMETER",0
"TAPEDEPTH",0.0
"DOADVVACUUM",0
"CHKBEFOREFEED",0
"TAPEARMLENGTH",0
"PPCFDPP",0
"PPCFDEC",4
"PPCMNPT",30
... the file goes on and on and on.. and will only get larger.
The regex splits the text at each "UNIT","PARTS" line, so each element of the string[] holds everything up to the next "UNIT","PARTS".
After this, I check each element to see whether its "NAME" value exists in a different list. If it does, I append that "UNIT","PARTS" block to the end of a text file.
This bit is a potential performance killer:
string concattedUnitPart = "";
foreach (var line in tempCP4List)
{
    concattedUnitPart = concattedUnitPart + line;
    line1CP4PartLines++;
}
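(See this article for why: strings are immutable, so each concatenation copies everything accumulated so far, which makes the loop quadratic in the total text length.) Use a StringBuilder for repeated concatenation: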
// No need to use tempCP4List at all
StringBuilder builder = new StringBuilder();
foreach (var line in splitCP4DataBaseLines)
{
    builder.AppendLine(line);
    line1CP4PartLines++;
}
string concattedUnitPart = builder.ToString();
Or even just:
string concattedUnitPart = string.Join(Environment.NewLine,
splitCP4DataBaseLines);
Now the regex part may well also be slow - I'm not sure. It's not obvious what you're trying to achieve, whether you need regular expressions at all, or whether you really need to do the whole thing in one go. Can you definitely not just process it line by line?
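If line-by-line processing is an option, a sketch along these lines streams the file with File.ReadLines and yields one block per "UNIT","PARTS" header, without building one big string or running a regex. UnitPartsReader and its methods are hypothetical names; the sketch assumes every block starts with a line that is exactly "UNIT","PARTS" after trimming. Unlike the Regex.Split version, each block keeps its header line, and blank lines plus anything before the first header are dropped.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class UnitPartsReader
{
    const string Header = "\"UNIT\",\"PARTS\"";

    // Streams the file one line at a time instead of loading it all into memory.
    public static IEnumerable<string> ReadUnitBlocks(string path) =>
        SplitIntoBlocks(File.ReadLines(path));

    // Groups lines into blocks, each one starting at a "UNIT","PARTS" header line.
    public static IEnumerable<string> SplitIntoBlocks(IEnumerable<string> lines)
    {
        StringBuilder current = null; // stays null until the first header is seen
        foreach (var line in lines)
        {
            if (line.Trim() == Header)
            {
                if (current != null)
                    yield return current.ToString(); // previous block is complete
                current = new StringBuilder();
            }
            if (current != null && line.Trim().Length > 0)
                current.AppendLine(line); // skip blank lines
        }
        if (current != null)
            yield return current.ToString(); // last block in the file
    }
}
```

Each yielded block can then be checked for its "NAME" line and appended to the output file before the next block is read.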
You could achieve the same output list 'line1CP4Components' using the following:
Regex StripEmptyLines = new Regex(@"^\s*$", RegexOptions.Multiline);
Regex UnitPartsMatch = new Regex(@"(?<=\n)""UNIT"",""PARTS"".*?(?=(?:\n""UNIT"",""PARTS"")|$)", RegexOptions.Singleline);

string CP4DataBase =
    "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
CP4DataBaseRTB.LoadFile(CP4DataBase, RichTextBoxStreamType.PlainText);

List<string> line1CP4Components = new List<string>(
    UnitPartsMatch.Matches(StripEmptyLines.Replace(CP4DataBaseRTB.Text, ""))
        .OfType<Match>()
        .Select(m => m.Value)
);

return line1CP4Components.ToArray();
You may be able to ignore the use of StripEmptyLines, but your original code is doing this via the Where(c => !string.IsNullOrEmpty(c)). Also your original code is causing the '\r' part of the "\r\n" newline/linefeed pair to be duplicated. I assumed this was an accident and not intentional?
Also you don't seem to be using the value in 'line1CP4PartLines' so I omitted the creation of the value. It was seemingly inconsistent with the omission of empty lines later so I guess you're not depending on it. If you need this value a simple regex can tell you how many new lines are in the string:
int linecount = new Regex("^", RegexOptions.Multiline).Matches(CP4DataBaseRTB.Text).Count;
// example of what your code will look like
string CP4DataBase = "C:\\Program\\Line Balancer\\FUJI DB\\KTS\\KTS - CP4 - Part Data Base.txt";
List<string> Cp4DataList = new List<string>(File.ReadAllLines(CP4DataBase));

//or create a Dictionary<int,string[]> object
string strData = string.Empty;//hold the line item data which is read in line by line
string[] strStockListRecord = null;//string array that holds information from the TFE_Stock.txt file
Dictionary<int, string[]> dctStockListRecords = null; //dictionary object that will hold the KeyValuePair of text file contents in a DictList
List<string> lstStockListRecord = null;//Generic list that will store all the lines from the .prnfile being processed

if (File.Exists(strExtraLoadFileLoc + strFileName))
{
    try
    {
        lstStockListRecord = new List<string>();
        List<string> lstStrLinesStockRecord = new List<string>(File.ReadAllLines(strExtraLoadFileLoc + strFileName));
        dctStockListRecords = new Dictionary<int, string[]>(lstStrLinesStockRecord.Count());
        int intLineCount = 0;
        foreach (string strLineSplit in lstStrLinesStockRecord)
        {
            lstStockListRecord.Add(strLineSplit);
            dctStockListRecords.Add(intLineCount, lstStockListRecord.ToArray());
            lstStockListRecord.Clear();
            intLineCount++;
        }//foreach (string strlineSplit in lstStrLinesStockRecord)
        lstStrLinesStockRecord.Clear();
        lstStrLinesStockRecord = null;
        lstStockListRecord.Clear();
        lstStockListRecord = null;
        //Alter the code to fit what you are doing..