Remove CSV row where text does not contain value - C#

I am processing data in a datagrid and want to filter out any rows which do not contain specific text. Is it possible to do this?
Code below is where I am reading the data. I don't want to read/process lines which do not contain the word "INTEREST"
while (fileReader.Peek() != -1)
{
    fileRow = fileReader.ReadLine();
    fileRow = fileRow.Replace("\"", "");
    // fileRow = fileRow.Replace("-", "");
    fileDataField = fileRow.Split(',');
    gridLGTCash.Rows.Add(fileDataField);
}
fileReader.Close();

If you only want to process lines that contain "INTEREST", check for that string using Contains:
while (fileReader.Peek() != -1)
{
    fileRow = fileReader.ReadLine();
    if (fileRow.Contains("INTEREST")) //<--- add a test here
    {
        fileRow = fileRow.Replace("\"", "");
        // fileRow = fileRow.Replace("-", "");
        fileDataField = fileRow.Split(',');
        gridLGTCash.Rows.Add(fileDataField);
    }
}
fileReader.Close();

Here is my fix for a missing or blank record on a row. A blank row in the CSV looks like this: [,,,,,,,..].
using System.Text.RegularExpressions;
:
:
do
{
    fileRow = fileReader.ReadLine();
    if (!Regex.IsMatch(fileRow, @"^,*$"))
    {
        fileRow = fileRow.Replace("\"", "");
        // fileRow = fileRow.Replace("-", "");
        fileDataField = fileRow.Split(',');
        gridLGTCash.Rows.Add(fileDataField);
    }
} while (fileReader.Peek() != -1);
fileReader.Close();

First of all, you'll be much better off with a dedicated CSV reader. string.Split() performs poorly and fails for all kinds of edge cases. There are three (at least) built into the framework, but you can easily get other (imo better) options via NuGet. That will likely side-step this whole issue for you.
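To see the kind of edge case that trips up string.Split(), consider a quoted field that contains the delimiter (the row below is a made-up example, not from the question's data):

```csharp
using System;

// A quoted CSV field containing a comma breaks naive splitting.
string row = "1,\"INTEREST, ACCRUED\",100.00";
string[] naive = row.Split(',');

// The quoted field is cut in half: 4 pieces instead of the 3 real fields.
Console.WriteLine(naive.Length); // 4
```

A dedicated CSV reader treats the quoted field as a single value; naive splitting cannot.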
Assuming for a moment you can't do that, I wouldn't use Peek() for this, either. Use the File.ReadLines() method (note: this is not the same as File.ReadAllLines(), which can be a memory hog):
var lines = File.ReadLines("filename here")
    .Select(line => line.Replace("\"", "").Split(','))
    .Where(fields => fields.Any(field => field.Trim().Length > 0));
foreach (var fields in lines)
    gridLGTCash.Rows.Add(fields);

Related

Read text from a text file with specific pattern

Hi there, I have a requirement where I need to read content from a text file. The sample text content is as below.
Name=Check_Amt
Public=Yes
DateName=pp
Name=DBO
I need to read the text and extract only the value that comes after Name='whatever text'.
So I am expecting the output: Check_Amt, DBO
I need to do this in C#.
When querying data (e.g. file lines), LINQ is often a convenient tool; if the file has lines in
name=value
format, you can query it like this
Read file lines
Split each line into name, value pair
Filter pairs by their names
Extract value from each pair
Materialize values into a collection
Code:
using System.Linq;
...
// string[] {"Check_Amt", "DBO"}
var values = File
.ReadLines(@"c:\MyFile.txt")
.Select(line => line.Split(new char[] { '=' }, 2)) // split into name, value pairs
.Where(items => items.Length == 2) // to be on the safe side
.Where(items => items[0] == "Name") // name == "Name" only
.Select(items => items[1]) // value from name=value
.ToArray(); // let's have an array
finally, if you want comma separated string, Join the values:
// "Check_Amt,DBO"
string result = string.Join(",", values);
Another way:
var str = @"Name=Check_Amt
Public=Yes
DateName=pp
Name=DBO";
var find = "Name=";
var result = new List<string>();
using (var reader = new StringReader(str)) //Change to StreamReader to read from file
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (line.StartsWith(find))
result.Add(line.Substring(find.Length));
}
}
You can use LINQ to select what you need:
var names = File.ReadLines("my file.txt").Select(l => l.Split('=')).Where(t => t.Length == 2).Where(t => t[0] == "Name").Select(t => t[1]);
I think that the best approach would be a regex.
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"(?<=Name=).*?(?=Public)";
string input = @"Name=Check_Amt Public=Yes DateName=pp Name=DBO";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
EDIT: My answer was written before your question was corrected; while it still works, the LINQ answer would be better IMHO.

Parse CSV file in C# - skip any row that does not match one of two IF conditions

I am using the following C# code to parse a csv file, after which the results will be exported to a SQL Server database.
What I am trying to do with the if statement is say "if the value in column 1 is 1, parse that row one way, if the value in column 1 is 2, parse that row another way..." what I then need to add, where I have the comment in the code below, is to say "otherwise, just skip that row of the csv file."
public List<Entity> ParseCsvFile(List<string> entries, string urlFile)
{
entries.RemoveAt(entries.Count - 1);
entries.RemoveAt(0);
List<Entity> entities = new List<Entity>();
foreach (string line in entries)
{
Entity CsvFile = new Entity();
string[] lineParts = line.Split(',');
if (lineParts[1] == "1")
{
CsvFile.Identifier = $"{lineParts[2]}";
CsvFile.SourceId = $"{lineParts[3]}";
CsvFile.Name = $"{lineParts[5]} {lineParts[6]} {lineParts[7]} {lineParts[8]} " +
$"{lineParts[9]} {lineParts[10]}";
entities.Add(CsvFile);
}
else if (lineParts[1] == "2")
{
CsvFile.Identifier = $"{lineParts[11]}";
CsvFile.SourceId = $"{lineParts[12]}";
CsvFile.Name = $"{lineParts[13]} {lineParts[14]} {lineParts[15]}";
entities.Add(CsvFile);
}
//Need to put code here that says "otherwise, skip this line of the CSV file."
}
return entities;
}
Based on this comment, I infer that at least part of your problem isn't the syntax of the if statements, but rather that the element you're looking for in the array simply doesn't exist (e.g. if the whole row is blank, or at least has no commas).
Assuming that's the case, then this approach would be more reliable (this will ignore lines that don't have a second field, as well as those where the field doesn't contain an integer value, in case that was yet another issue you might have run into at some point):
if (lineParts.Length < 2 || !int.TryParse(lineParts[1], out int recordType))
{
continue;
}
if (recordType == 1)
{
CsvFile.Identifier = $"{lineParts[2]}";
CsvFile.SourceId = $"{lineParts[3]}";
CsvFile.Name = $"{lineParts[5]} {lineParts[6]} {lineParts[7]} {lineParts[8]} " +
$"{lineParts[9]} {lineParts[10]}";
entities.Add(CsvFile);
}
else if (recordType == 2)
{
CsvFile.Identifier = $"{lineParts[11]}";
CsvFile.SourceId = $"{lineParts[12]}";
CsvFile.Name = $"{lineParts[13]} {lineParts[14]} {lineParts[15]}";
entities.Add(CsvFile);
}
For what it's worth, an expression like $"{lineParts[2]}", where lineParts is already a string[], is pointless and inefficient. And the string.Join() method is helpful if all you want to do is concatenate string values with a particular separator. So your code could be simplified a bit:
if (lineParts.Length < 2 || !int.TryParse(lineParts[1], out int recordType))
{
continue;
}
if (recordType == 1)
{
CsvFile.Identifier = lineParts[2];
CsvFile.SourceId = lineParts[3];
CsvFile.Name = string.Join(" ", lineParts.Skip(5).Take(6));
entities.Add(CsvFile);
}
else if (recordType == 2)
{
CsvFile.Identifier = lineParts[11];
CsvFile.SourceId = lineParts[12];
CsvFile.Name = string.Join(" ", lineParts.Skip(13).Take(3));
entities.Add(CsvFile);
}
Finally, consider not trying to parse CSV with your own code. The logic you have implemented will work only for the simplest examples of CSV. If you have complete control over the source and can ensure that the file will never have to do things like quote fields containing commas or quotation marks, then it may work okay. But most CSV data comes from sources outside one's control, and it's important to make sure you can handle all the variants found in CSV. See Parsing CSV files in C#, with header for good information on how to do that.

Split does not work as expected with commas

I need to write a CSV parser. I am now trying to separate the fields to manipulate them.
Sample CSV:
mitarbeiter^tagesdatum^lohnart^kostenstelle^kostentraeger^menge^betrag^belegnummer
11005^23.01.2018^1^^31810020^5,00^^
11081^23.01.2018^1^^31810020^5,00^^
As you can see, there a several empty cells.
I am doing the following:
using (CsvFileReader reader = new CsvFileReader(path))
{
CsvRow row = new CsvRow();
while (reader.ReadRow(row))
{
foreach (string s in row)
{
csvROW.Add(new aCSVROW());
string[] items = s.Split(new char[] { '^' }, StringSplitOptions.None);
csvROW[0].mitarbeiter = items[0];
csvROW[0].tagesdatum = items[1];
csvROW[0].lohnart = items[2];
csvROW[0].kostenstelle = items[3];
csvROW[0].kostentraeger = items[4];
csvROW[0].menge = items[5];
csvROW[0].betrag = items[6];
csvROW[0].belegnummer = items[7];
}
}
}
Problem:
It seems that Split stops after the comma (5,00). The separator is ^ ... is there a reason why?
I tried several things without success...
Thank you so much!
CsvFileReader reads rows from a CSV file and then the strings within that row. What else would you expect the CsvFileReader to do than separate the row?
After reading the second line, row will have the contents
11005^23.01.2018^1^^31810020^5
and
00^^
When you split the first row by ^, the last entry of the resulting array will be "5". Anyway, your code will throw, because you are trying to access items exceeding the bounds of the array.
I don't know CsvFileReader. Maybe you can pass ^ as a separator and spare the splitting of the string. Anyway, you could use a StreamReader, too. This will work much more like you expected.
using (StreamReader reader = new StreamReader(path))
{
while (!reader.EndOfStream)
{
var csvLine = reader.ReadLine();
csvROW.Add(new aCSVROW());
string[] items = csvLine.Split(new char[] { '^' }, StringSplitOptions.None);
csvROW[0].mitarbeiter = items[0];
csvROW[0].tagesdatum = items[1];
csvROW[0].lohnart = items[2];
csvROW[0].kostenstelle = items[3];
csvROW[0].kostentraeger = items[4];
csvROW[0].menge = items[5];
csvROW[0].betrag = items[6];
csvROW[0].belegnummer = items[7];
}
}
Is CsvRow meant to be the data of all rows, or of one row? Because as it is, you keep adding a new aCSVROW object into csvROW for each read line, but you keep replacing the data on just csvROW[0], the first inserted aCSVROW. This means that in the end, you will have a lot of rows that all have no data in them, except for the one on index 0, that had its properties overwritten on each iteration, and ends up containing the data of the last read row.
Also, despite using a CsvReader class, you are using plain normal String.Split to actually separate the fields. Surely that's what the CsvReader class is for?
Personally, I always use the TextFieldParser, from the Microsoft.VisualBasic.FileIO namespace. It has the advantage it's completely native in the .Net framework, and you can simply tell it which separator to use.
This function can get the data out of it as a simple List<String[]> (see the SplitFile function in the answer to "Using C# to search a CSV file and pull the value in the column next to it").
Once you have your data, you can put it into objects however you want.
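The SplitFile helper isn't shown in this answer; a minimal sketch using TextFieldParser might look like this (the name and signature are assumed from the call site below, not taken from a real library):

```csharp
using System.Collections.Generic;
using System.Text;
using Microsoft.VisualBasic.FileIO; // add a reference to Microsoft.VisualBasic

public static class CsvSplit
{
    // Assumed signature, inferred from the SplitFile(path, textEncoding, "^") call below.
    public static List<string[]> SplitFile(string path, Encoding encoding, string separator)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(path, encoding))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(separator);
            while (!parser.EndOfData)
                rows.Add(parser.ReadFields()); // handles quoted fields and empty cells
        }
        return rows;
    }
}
```

Because the delimiter is "^", the commas inside values like "5,00" are left alone.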
List<String[]> lines = SplitFile(path, textEncoding, "^");
// I assume "CsvRow" is some kind of container for multiple rows?
// Looks like pretty bad naming to me...
CsvRow allRows = new CsvRow();
foreach (String[] items in lines)
{
    // Create a new object, and add it to the list.
    aCSVROW row = new aCSVROW();
    allRows.Add(row);
    // Fill the actual newly created object, not the first object in allRows.
    // Consider adding index checks here, though, to avoid index out of range exceptions.
    row.mitarbeiter = items[0];
    row.tagesdatum = items[1];
    row.lohnart = items[2];
    row.kostenstelle = items[3];
    row.kostentraeger = items[4];
    row.menge = items[5];
    row.betrag = items[6];
    row.belegnummer = items[7];
}
// Done. All rows added to allRows.
CsvRow row = new CsvRow();
while (reader.ReadRow(row))
{
    foreach (string s in row)
    {
        csvROW.Add(new aCSVROW());
        string[] items = s.Split('^');
        csvROW[0].mitarbeiter = items[0];
        csvROW[0].tagesdatum = items[1];
        csvROW[0].lohnart = items[2];
        csvROW[0].kostenstelle = items[3];
        csvROW[0].kostentraeger = items[4];
        csvROW[0].menge = items[5];
        csvROW[0].betrag = items[6];
        csvROW[0].belegnummer = items[7];
    }
}

Unable to order a list using both OrderBy or Sort

So I am trying to sort the contents of a file.
The text file looks something like this:
%[TIMESTAMP=1441737006376][EVENT=agentStateEvent][queue=79651][agentID=61871][extension=22801][state=2][reason=0]%
%[TIMESTAMP=1441737006102][EVENT=agentStateEvent][queue=79654][agentID=62278][extension=22828][state=2][reason=0]%
%[TIMESTAMP=1441737006105][EVENT=CallControlTerminalConnectionTalking][callID=2619][ucid=10000026191441907765][deviceType=1][deviceName=21775][Queue=][Trunk=384:82][TrunkType=1][TrunkState=1][Cause=100][CalledDeviceID=07956679058][CallingDeviceID=21775][extension=21775]%
and basically I want the end result to only output unique values of the timestamp. I have used substring to get rid of the excess text, and it outputs fine as shown below:
[TIMESTAMP=1441737006376]
[TIMESTAMP=1441737006102]
[TIMESTAMP=1441737006105]
however I want it ordered as follows (numerically ascending):
[TIMESTAMP=1441737006102]
[TIMESTAMP=1441737006105]
[TIMESTAMP=1441737006376]
I have tried .Sort and .OrderBy but am not having any joy. I would have thought using this prior to doing any substring formatting would have sufficed, but clearly not.
Code is as follows:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace FedSorter
{
class Program
{
static void Main(string[] args)
{
int counter = 0;
string line;
string readIn = "C:\\Users\\xxx\\Desktop\\Files\\ex1.txt";
System.IO.TextWriter writeOut = new StreamWriter("C:\\Users\\xxx\\Desktop\\Files\\ex1_new.txt");
List<String> list = new List<String>();
// Read the file and display it line by line.
System.IO.StreamReader file = new System.IO.StreamReader(readIn);
string contents = "";
string checkValues = "";
while ((line = file.ReadLine()) != null)
{
string text = line;
text = text.Substring(1, 25);
if (!checkValues.Contains(text))
{
list.Add(text);
Console.WriteLine(text);
writeOut.WriteLine(text);
counter++;
}
contents = text;
checkValues += contents + ",";
}
list = list.OrderBy(x => x).ToList();
writeOut.Close();
file.Close();
orderingFile();
}
public static void orderingFile()
{
string line = "";
string readIn = "C:\\Users\\xxx\\Desktop\\Files\\ex1_new.txt";
System.IO.TextWriter writeOut = new StreamWriter("C:\\Users\\xxx\\Desktop\\Files\\ex1_new2.txt");
List<String> ordering = new List<String>();
// Read the file and display it line by line.
System.IO.StreamReader file = new System.IO.StreamReader(readIn);
while ((line = file.ReadLine()) != null)
{
ordering.OrderBy(x => x).ToList();
ordering.Add(line);
writeOut.WriteLine(line);
}
writeOut.Close();
file.Close();
}
}
}
You are creating a new list and you need to assign it to the variable
list = list.OrderBy(x => x).ToList();
However it doesn't look like you even use list after you create and sort it. Additionally you have the same issue in the orderingFile method with
ordering.OrderBy(x => x).ToList();
However, instead of sorting and creating a new list on each line, it would be better to use a SortedSet<string> that will keep the contents sorted (and distinct) as you add to it.
But again you are not actually using the ordering list after you finish adding to it in the foreach. If you are looking to read the values in a file, sort them and then output them to another file, then you need to do it in that order.
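As a sketch of that idea, a SortedSet<string> keeps the entries both distinct and ordered as they are added; the timestamp values below are taken from the question:

```csharp
using System;
using System.Collections.Generic;

var timestamps = new SortedSet<string>(StringComparer.Ordinal);
// Add in arbitrary order; Add returns false for duplicates, which are ignored.
timestamps.Add("[TIMESTAMP=1441737006376]");
timestamps.Add("[TIMESTAMP=1441737006102]");
timestamps.Add("[TIMESTAMP=1441737006105]");
timestamps.Add("[TIMESTAMP=1441737006102]"); // duplicate, not stored twice

foreach (var t in timestamps)
    Console.WriteLine(t); // prints the three distinct values in ascending order
```

Since the substrings are fixed-width, ordinal string order matches numeric order here.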
Aside from @juharr's correct answer, you would do well to take advantage of LINQ to simplify your code greatly.
string readIn = "C:\\Users\\xxx\\Desktop\\Files\\ex1.txt";
var timestamps = File.ReadAllLines(readIn)
.Select(l => l.Substring(1, 25))
.Distinct()
.OrderBy(t => t)
.ToArray();
To write out the values, you can either use a foreach on timestamps and write out each line to your TextWriter, or you can use the File class again:
string readOut = "C:\\Users\\xxx\\Desktop\\Files\\ex1_new.txt";
File.WriteAllLines(readOut, timestamps);
//notice I've changed it to ToArray in the first part instead of ToList.

File.ReadLines taking long time to process textfile

I have a text file contains following similar lines for example 500k lines.
ADD GTRX:TRXID=0, TRXNAME="M_RAK_JeerExch_G_1879_18791_A-0", FREQ=81, TRXNO=0, CELLID=639, IDTYPE=BYID, ISMAINBCCH=YES, ISTMPTRX=NO, GTRXGROUPID=2556;
ADD GTRX:TRXID=1, TRXNAME="M_RAK_JeerExch_G_1879_18791_A-1", FREQ=24, TRXNO=1, CELLID=639, IDTYPE=BYID, ISMAINBCCH=NO, ISTMPTRX=NO, GTRXGROUPID=2556;
ADD GTRX:TRXID=5, TRXNAME="M_RAK_JeerExch_G_1879_18791_A-2", FREQ=28, TRXNO=2, CELLID=639, IDTYPE=BYID, ISMAINBCCH=NO, ISTMPTRX=NO, GTRXGROUPID=2556;
ADD GTRX:TRXID=6, TRXNAME="M_RAK_JeerExch_G_1879_18791_A-3", FREQ=67, TRXNO=3, CELLID=639, IDTYPE=BYID, ISMAINBCCH=NO, ISTMPTRX=NO, GTRXGROUPID=2556;
My intention is first to get the value of FREQ where ISMAINBCCH=YES, which I did easily. But where ISMAINBCCH=NO I need to concatenate the FREQ values, which I have done using File.ReadLines, and that is taking a long time. Is there any better way to do this? After I take the FREQ value for ISMAINBCCH=YES, the ISMAINBCCH=NO values to concatenate come within a range of about 10 lines above and below, but I don't know how to make use of that. Probably I should track the current line where ISMAINBCCH=YES for FREQ. The following is the code I have so far:
using (StreamReader sr = File.OpenText(filename))
{
    while ((s = sr.ReadLine()) != null)
    {
        if (s.Contains("ADD GTRX:"))
        {
            var gtrx = new Gtrx
            {
                CellId = int.Parse(PullValue(s, "CELLID")),
                Freq = int.Parse(PullValue(s, "FREQ")),
                //TrxNo = int.Parse(PullValue(s, "TRXNO")),
                IsMainBcch = PullValue(s, "ISMAINBCCH").ToUpper() == "YES",
                Commabcch = new List<string> { PullValue(s, "ISMAINBCCH") },
                DEFINED_TCH_FRQ = null,
                TrxName = PullValue(s, "TRXNAME"),
            };
            var result = String.Join(",",
                from ss in File.ReadLines(filename)
                where ss.Contains("ADD GTRX:")
                where int.Parse(PullValue(ss, "CELLID")) == gtrx.CellId
                where PullValue(ss, "ISMAINBCCH").ToUpper() != "YES"
                select int.Parse(PullValue(ss, "FREQ")));
            gtrx.DEFINED_TCH_FRQ = result;
        }
    }
}
from ss in File.ReadLines(filename)
This reads the entire file and produces a sequence of lines, which you are then consuming inside a loop (itself reading the same file), so that sequence gets thrown away and produced again. You're reading the same file number_of_lines + 1 times when it hasn't changed in the meantime.
An obvious boost would therefore be to call File.ReadAllLines(filename) once, store the array, and then use that array both for the loop instead of while ((s = sr.ReadLine()) != null) and in the loop instead of that repeated call to ReadLines().
But there's a flaw in your logic in even looking at ReadLines() repeatedly; you're already scanning through the file so you're going to come across all the lines relevant to the same CELLID later anyway:
var gtrxDict = new Dictionary<int, Gtrx>();
using (StreamReader sr = File.OpenText(filename))
{
while ((s = sr.ReadLine()) != null)
{
if (s.Contains("ADD GTRX:"))
{
int cellID = int.Parse(PullValue(s, "CELLID"));
Gtrx gtrx;
if(gtrxDict.TryGetValue(cellID, out gtrx)) // Found previous one
gtrx.DEFINED_TCH_FRQ += "," + int.Parse(PullValue(s, "FREQ"));
else // First one for this ID, so create a new object
gtrxDict[cellID] = new Gtrx
{
CellId = cellID,
Freq = int.Parse(PullValue(s, "FREQ")),
IsMainBcch = PullValue(s, "ISMAINBCCH").ToUpper() == "YES",
Commabcch = new List<string> { PullValue(s, "ISMAINBCCH") },
DEFINED_TCH_FRQ = int.Parse(PullValue(s, "FREQ")).ToString(),
TrxName = PullValue(s, "TRXNAME"),
};
}
}
}
This way we don't need to keep more than one line from the file in memory at all, never mind doing so repeatedly. After this has run gtrxDict will contain a Gtrx object for each distinct CELLID in the file, with DEFINED_TCH_FRQ as a comma-separated list of the values from each matching line.
The following code snippet can be used to read the entire text file:
using System.IO;
/// Read Text Document specified by full path
private string ReadTextDocument(string TextFilePath)
{
string _text = String.Empty;
try
{
// open file if exists
if (File.Exists(TextFilePath))
{
using (StreamReader reader = new StreamReader(TextFilePath))
{
_text = reader.ReadToEnd();
reader.Close();
}
}
else
{
throw new FileNotFoundException();
}
return _text;
}
catch { throw; }
}
Get the in-memory string, then apply the Split() function to create a string[] and process the array elements the same way as lines in the original text file. In the case of a very large file, this method provides the option of reading it in chunks of data, processing them and then disposing of them upon completion (re: https://msdn.microsoft.com/en-us/library/system.io.streamreader%28v=vs.110%29.aspx).
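Splitting the in-memory string into lines, as described above, might look like this (the sample text is made up):

```csharp
using System;

string text = "line one\r\nline two\nline three";

// Handle both Windows (\r\n) and Unix (\n) line endings.
string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(lines.Length); // 3
```

Each element of lines can then be processed exactly like a line read from the file.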
As mentioned in the comments by @Michael Liu, there is another option of using File.ReadAllText(), which provides an even more compact solution and can be used instead of reader.ReadToEnd(). Other useful methods of the File class are detailed in: https://msdn.microsoft.com/en-us/library/system.io.file%28v=vs.110%29.aspx
And, finally, FileStream class can be used for both file read/write operations with various levels of granularity (re: https://msdn.microsoft.com/en-us/library/system.io.filestream%28v=vs.110%29.aspx).
SUMMARY
In response to the interesting comments thread, here is a brief summary.
The biggest bottleneck pertinent to the procedure described in the OP's question is disk I/O. Here are some numbers: the average seek time on a good-quality HDD is about 5 ms, plus the actual read time (per line). It could well be that the entire in-memory processing of the file data takes less time than a single HDD read (sometimes significantly less; btw, an SSD works better but is still no match for DDR3 RAM). The RAM size of a modern PC is rather significant (typically 4...8 GB of RAM is more than enough to handle most text files). Thus, the core idea of my solution is to minimize disk read operations and do the entire file data processing in memory. Implementations can differ, apparently.
Hope this may help. Best regards,
I think that this more-or-less gets you what you want.
First read in all the data:
var data =
(
from s in File.ReadLines(filename)
where s != null
where s.Contains("ADD GTRX:")
select new Gtrx
{
CellId = int.Parse(PullValue(s, "CELLID")),
Freq = int.Parse(PullValue(s, "FREQ")),
//TrxNo = int.Parse(PullValue(s, "TRXNO")),
IsMainBcch = PullValue(s, "ISMAINBCCH").ToUpper() == "YES",
Commabcch = new List<string> { PullValue(s, "ISMAINBCCH") },
DEFINED_TCH_FRQ = null,
TrxName = PullValue(s, "TRXNAME"),
}
).ToArray();
Based on the loaded data create a lookup to return the frequencies based on each cell id:
var lookup =
data
.Where(d => !d.IsMainBcch)
.ToLookup(d => d.CellId, d => d.Freq);
Now update the DEFINED_TCH_FRQ based on the lookup:
foreach (var d in data)
{
d.DEFINED_TCH_FRQ = String.Join(",", lookup[d.CellId]);
}
