How to parse a .txt file with uncommon delimiters in C# - c#

I am currently trying to parse a .txt file containing information listed like this: name/ID number/email/GPA. Here are a few lines showing what the text file looks like.
(LIST (LIST 'Doe 'Jane 'F ) '8888675309 'jfdoe@mail.university.edu 2.3073320999676614 )
(LIST (LIST 'Doe 'John 'F ) 'NONE 'johnfdoe@mail.university.edu 3.1915725161177115 )
(LIST (LIST 'Doe 'Jim 'F ) '8885551234 'jimdoe@mail.university.edu 3.448215586562192 )
In my current code all I am doing is printing the text file line by line to a console window.
static void Main(string[] args)
{
StreamReader inFile;
string inLine;
if (File.Exists("Students.txt"))
{
try
{
inFile = new StreamReader("Students.txt");
while ((inLine = inFile.ReadLine()) != null)
{
Console.WriteLine(inLine);
}
}
catch (System.IO.IOException exc)
{
Console.WriteLine("Error");
}
Console.ReadLine();
}
}
I need to be able to, for example, find all the students who have a GPA above 3.0 and print their names and GPAs to another text file. I understand how to write to another file; however, I am unsure how to access the individual columns, such as the GPA, since this file does not seem to have any common delimiter that would make using Split() practical. Any help or insight on how to accomplish this would be appreciated.

IMPORTANT
I assume that the strings provided in your question always have the fixed format shown.
IMPLEMENTATION
First, you need to create a class that is a blueprint of the information you are extracting from the string. It gives you a container that holds the data in a meaningful form.
public class StudentInfo
{
public string Name { get; set; }
public string Number { get; set; }
public string Email { get; set; }
public double GPA { get; set; }
}
The following example shows how to parse the strings from your question and convert them into the corresponding information. I assume that you can read and write files in C#.
This sample demonstrates parsing the information and storing it in a List. You can then use that list to write files.
In your code you are reading lines, which is why this sample reads lines from a string, so that you can relate it to your own code.
I created this sample as a C# console application.
static void Main(string[] args)
{
List<StudentInfo> studentInfo = new List<StudentInfo>();
string input = "(LIST(LIST 'Abbott 'Ashley 'J ) '8697387888 'ajabbott@mail.university.edu 2.3073320999676614 )" + Environment.NewLine +
"(LIST(LIST 'Abbott 'Bradley 'M ) 'NONE 'bmabbott@mail.university.edu 3.1915725161177115 )" + Environment.NewLine +
"(LIST(LIST 'Abbott 'Ryan 'T ) '8698689793 'rtabbott@mail.university.edu 3.448215586562192 )";
string[] lines = input.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
if (lines != null && lines.Count() > 0)
{
foreach (var line in lines)
{
var data = line.Replace("(LIST(LIST ", string.Empty)
.Replace(")", string.Empty)
.Replace("'", string.Empty)
.Trim()
.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
if (data != null && data.Count() > 0)
{
studentInfo.Add(
new StudentInfo()
{
Name = data[0] + " " + data[1] + " " + data[2],
Number = data[3],
Email = data[4],
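// parse with the invariant culture so "." is always the decimal separator
GPA = Convert.ToDouble(data[5], System.Globalization.CultureInfo.InvariantCulture)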
});
}
}
}
// GET STUDENTS WHO GOT GPA > 3 (LINQ QUERY)
if (studentInfo.Count > 0)
{
var gpaGreaterThan3 = studentInfo.Where(s => s.GPA > 3).ToList();
if (gpaGreaterThan3 != null && gpaGreaterThan3.Count > 0)
{
// LOOP gpaGreaterThan3 TO PRINT STUDENT DATA
foreach (var stud in gpaGreaterThan3)
{
Console.WriteLine("Name: " + stud.Name);
Console.WriteLine("Number: " + stud.Number);
Console.WriteLine("Email: " + stud.Email);
Console.WriteLine("GPA: " + stud.GPA);
Console.WriteLine(string.Empty);
}
}
}
Console.ReadLine();
}
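To tie the parsing above back to the original goal (writing students above a GPA threshold to another file), here is a minimal sketch. The names HonorRollWriter, ParseLine and WriteAbove are hypothetical, and it assumes every valid line yields exactly six fields. It strips "(LIST" rather than "(LIST(LIST " so that both spacings seen in this thread are tolerated.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class HonorRollWriter
{
    // Same Replace/Split idea as above. Returns null when the line
    // does not contain exactly six fields or the GPA is not a number.
    public static (string Name, double Gpa)? ParseLine(string line)
    {
        string[] data = line.Replace("(LIST", string.Empty)
            .Replace(")", string.Empty)
            .Replace("'", string.Empty)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        if (data.Length != 6)
            return null;

        if (!double.TryParse(data[5], NumberStyles.Float,
                CultureInfo.InvariantCulture, out double gpa))
            return null;

        return (data[0] + " " + data[1] + " " + data[2], gpa);
    }

    // Reads inputPath, keeps students whose GPA is strictly above threshold,
    // writes one "Name: GPA" line per student, and returns how many were written.
    public static int WriteAbove(string inputPath, string outputPath, double threshold)
    {
        List<string> result = File.ReadLines(inputPath)
            .Select(line => ParseLine(line))
            .Where(s => s.HasValue && s.Value.Gpa > threshold)
            .Select(s => s.Value.Name + ": " + s.Value.Gpa.ToString(CultureInfo.InvariantCulture))
            .ToList();

        File.WriteAllLines(outputPath, result);
        return result.Count;
    }
}
```

A call such as HonorRollWriter.WriteAbove("Students.txt", "Above3.txt", 3.0) would then produce the filtered file; the output file name here is only an example.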

Try this:
// removing "(LIST" (rather than "(LIST(LIST ") works with or without a space between the two
var data = inLine.Replace("(LIST", string.Empty)
.Replace(")", string.Empty)
.Replace("'", string.Empty)
.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

There are many ways to go about this, but most importantly you need to consider variations in the string format that might trip up any of the approaches:
Is the GPA field always present and at the end?
Will it have a defined, identifiable format?
Can there be more than one and, if so, which one would you pick?
Below are a couple of approaches with their assumptions. You would have to adjust the code to your own assumptions and to how critical this piece of code is.
// split on both space and closing bracket
// Assumption: GPA field is present and at the end
Console.WriteLine(line.Split(new[] { " ", ")" }, StringSplitOptions.RemoveEmptyEntries).LastOrDefault());
// regex for gpa defined as digit followed by literal . followed by one or more digits
// Assumption: GPA field is present once somewhere in the string.
// No other token conflicts with similar format
var gpaRegex = new Regex(@"\d\.\d+");
Console.WriteLine(gpaRegex.Matches(line)[0]);
See https://dotnetfiddle.net/6Xy0uW for working example
See https://regex101.com/r/P1D7zf/1 for the regex in action where you might try more strict variations
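As one possible stricter variation, a single regex could capture every field of a line at once. This is only a sketch: StudentLineParser and TryParse are hypothetical names, and the pattern assumes exactly three quoted name tokens, a quoted ID, a quoted email and an unquoted GPA, as in the sample lines from the question.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating a whole-line regex.
public static class StudentLineParser
{
    // Groups: 1-3 = name tokens, 4 = ID, 5 = email, 6 = GPA.
    private static readonly Regex LinePattern = new Regex(
        @"^\(LIST\s*\(LIST\s+'(\S+)\s+'(\S+)\s+'(\S+)\s*\)\s+'(\S+)\s+'(\S+)\s+([0-9.]+)\s*\)$");

    // Returns false when the line does not match the assumed format
    // or the GPA token is not a valid number.
    public static bool TryParse(string line, out string name, out double gpa)
    {
        name = string.Empty;
        gpa = 0;

        Match m = LinePattern.Match(line.Trim());
        if (!m.Success)
            return false;

        name = m.Groups[1].Value + " " + m.Groups[2].Value + " " + m.Groups[3].Value;
        return double.TryParse(m.Groups[6].Value, NumberStyles.Float,
            CultureInfo.InvariantCulture, out gpa);
    }
}
```

Compared with taking the last token, this rejects malformed lines outright instead of silently returning whatever happens to be at the end.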

Related

Applying grammar to strings

I'm trying to have the genitive of an inserted name decided automatically, so that I won't have to manually supply the proper genitive for each string (in this case, names).
As an example, the genitive for James is ' and the genitive for Kennedy is 's.
I guess what I'm trying to say is that I want a cleaner implementation that lets me skip writing a separate string Genetive_Friend1 (2..n) for each name.
using System;
namespace Prac
{
class Program
{
static void Main(string[] args)
{
string Friend = ""; string last_char_Friend = ""; string Genetive_Friend = "";
string Friend1 = "Kennedy"; string last_char_Friend1 = Friend1[^1..]; string Genetive_Friend1 = "\'s";
string Friend2 = "James"; string last_char_Friend2 = Friend2[^1..]; string Genetive_Friend2 = "\'";
string Friend3 = "Ngolo Kante"; string last_char_Friend3 = Friend3[^1..]; string Genetive_Friend3 = "\'s";
Console.WriteLine($"My friends are named {Friend1}, {Friend2} and {Friend3}");
Console.WriteLine($"{Friend1}{Genetive_Friend1} name has {Friend1.Length} letters");
Console.WriteLine($"{Friend2}{Genetive_Friend2} name has {Friend2.Length} letters");
Console.WriteLine($"{Friend3}{Genetive_Friend3} name has {Friend3.Length} letters");
for (int i = 1; i < 4; i++)
{
Console.WriteLine($"{Friend + i}{Genetive_Friend + i} name has {(Friend + i).Length} letters");
}
Console.ReadLine();
}
}
}
There simply must be a smarter way to ensure that proper grammar is applied to each name. I've got a feeling that I can make use of the last char of the Friend string, but how do I pick between ' and 's within Console.WriteLine?
I'd like the for loop to print the same output as the three individual Console.WriteLine lines.
Also, this is my first time asking a question on Stack Overflow, so please tell me if I've broken some unwritten rule about how questions should be formatted.
There are a few questions here. Firstly, to loop through all the names you should create an array (or a list, or really anything implementing IEnumerable);
then you can iterate over it.
Following your example, and not being particularly versed in grammar myself, you can also write a method that checks the last character of the input string and converts the name into the genitive case:
static void Main( string[] args )
{
string[] friends = new string[] { "Kennedy", "James", "Ngolo Kante" };
Console.WriteLine($"My friends are named {JoinEndingWithAnd(friends)}");
for ( int i = 0; i < friends.Length; i++ )
{
Console.WriteLine( $"{MakeGenitive(friends[i])} name has {friends[i].Length} letters" );
}
Console.ReadLine();
}
static string JoinEndingWithAnd(string[] friends)
{
string result = friends[0];
for ( int i = 1; i < friends.Length; i++ )
{
if ( i != friends.Length - 1)
{
result += $", {friends[i]}";
}
else
{
result += $" and {friends[i]}";
}
}
return result;
}
static string MakeGenitive(string friend)
{
char lastLetter = friend[^1];
if( lastLetter == 's' )
{
return friend + "'";
}
return friend + "'s";
}
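A slightly more defensive variant of the two helpers could look like the sketch below. It assumes that names ending in an upper-case S should also get a bare apostrophe and that an empty name should be returned unchanged; Grammar and JoinWithAnd are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers, sketched under the assumptions stated above.
public static class Grammar
{
    // Appends ' when the name ends in s or S, otherwise 's.
    public static string MakeGenitive(string name)
    {
        if (string.IsNullOrEmpty(name))
            return name;

        char last = char.ToLowerInvariant(name[name.Length - 1]);
        return last == 's' ? name + "'" : name + "'s";
    }

    // Joins items as "a, b and c"; handles zero and one item as well.
    public static string JoinWithAnd(IReadOnlyList<string> items)
    {
        if (items.Count == 0)
            return string.Empty;
        if (items.Count == 1)
            return items[0];

        return string.Join(", ", items.Take(items.Count - 1))
            + " and " + items[items.Count - 1];
    }
}
```

Real-world possessive rules have exceptions, so this only covers the simple last-letter rule described in the question.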

ArgumentOutOfRangeException: extracting string from string

I have a method that extracts a username from a string, using conditionals to check common conventions. It throws an ArgumentOutOfRangeException in the GetPart utility method, even though I explicitly check before calling it.
Here is the extraction method
public bool TryExtractUsernameFromString(string str, out string username)
{
if (str.Contains("un: "))
{
username = GetPart(str, "un: ", " ");
}
else if (str.Contains("un:"))
{
username = str.Split(" ").Where(x => x.StartsWith("un:")).First().Substring(3);
}
else if (str.Contains("un- "))
{
username = str.IndexOf(" ", str.IndexOf("un- ") + 1) > 0 ? GetPart(str, "un- ", " ") : str[str.IndexOf("un- ")..str.Length];
}
else if (str.Contains("un-"))
{
username = str.Split(" ").Where(x => x.StartsWith("un-")).First().Substring(3);
}
else
{
username = "";
}
return username.Length > 0;
}
I am passing this as the first argument to TryExtractUsernameFromString (without quotes)
"😊un- jennyfromtheblock"
So it happens here,
else if (str.Contains("un- "))
{
username = str.IndexOf(" ", (str.IndexOf("un- ") + 1)) > 0 ? GetPart(str, "un- ", " ") : str[str.IndexOf("un -")..str.Length];
}
But it shouldn't be calling GetPart() if the string doesn't contain a second space after the one matched by the str.Contains check.
GetPart method:
public static string GetPart(string s, string start, string end)
{
return s[(s.IndexOf(start) + start.Length)..s.IndexOf(end)];
}
str.IndexOf("un- ") + 1 returns the index of the START of that substring plus one, so the search for " " finds the space inside "un- " itself. Try using str.IndexOf("un- ") + 4 instead. That starts the search after the prefix and gets you the index of the second space you're looking for.
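Note that GetPart has the same weakness: s.IndexOf(end) searches from the beginning of the string, so it can land before the start marker (here, on the space inside "un- ") and produce a range whose end precedes its start. A minimal sketch of a variant that searches for end only after start, and falls back to the rest of the string when there is no terminator, might look like this. StringParts and TryGetPart are hypothetical names, and the fallback behaviour is an assumption about what is wanted.

```csharp
using System;

// Hypothetical replacement for GetPart, sketched under the assumptions above.
public static class StringParts
{
    // Extracts the text between start and the next occurrence of end.
    // If end does not occur after start, the rest of the string is used.
    public static bool TryGetPart(string s, string start, string end, out string part)
    {
        part = string.Empty;

        int startIndex = s.IndexOf(start, StringComparison.Ordinal);
        if (startIndex < 0)
            return false;

        int from = startIndex + start.Length;

        // Search for the terminator only after the start marker.
        int endIndex = s.IndexOf(end, from, StringComparison.Ordinal);
        if (endIndex < 0)
            endIndex = s.Length;

        part = s.Substring(from, endIndex - from);
        return part.Length > 0;
    }
}
```

With this shape, the separate "is there a second space?" check in the calling code is no longer needed.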
@DanRayson's answer looks correct, but I wanted to add that there is likely a much cleaner approach to this.
If statements can suck, case statements aren't really better. If you assume any name could have 0 or more matches:
public static void CleanName(string nameString, List<string> badPrefixes)
{
var matchedPrefixes = badPrefixes.Where(w => nameString.Contains(w)
&& nameString.IndexOf(w) == 0).ToList();
foreach(var prefix in matchedPrefixes)
{
Console.WriteLine(nameString.Replace(prefix, "").Trim());
}
if (!matchedPrefixes.Any())
{
Console.WriteLine(nameString);
}
}
Another option would be using .FirstOrDefault instead of selecting all of the matches. But essentially, just find the match(es) and then remove it, and finally trim spaces.
Example
public static void Main()
{
List<string> badPrefixes = new List<string>()
{
"un:",
"un-",
"un ",
"Un", //Fun example too
};
string longUserName1 = "un- Austin";
string riskyLongName = "un: theUndying";
CleanName(longUserName1, badPrefixes);
// output: Austin
CleanName(riskyLongName, badPrefixes);
// output: theUndying
}

Returning values from text file : C#

I have a .txt file that contains information about vehicles that I add via another project. I want to read the text file, retrieve each VIN number, and place just the actual number itself in a combo box when the form is loaded.
The info for each vehicle in the txt file looks like:
Model: 'model'
Manufacturer: 'manufacturer'
VIN Number: 'VIN number'
This is what I have:
using (StreamReader reader = new StreamReader(@"D:\carCenter\carCenter\bin\Debug\Vehicles.txt"))
{
string[] lines = File.ReadAllLines(@"D:\carCenter\carCenter\bin\Debug\Vehicles.txt");
foreach(string line in lines)
{
if (line.Contains("VIN"))
{
Char colon = ':';
string[] vins = line.Split(new string[] {"VIN Number: "}, StringSplitOptions.None);
for (int i = 0; i < 1; i++)
{
foreach(var vin in vins)
{
vinComboBox.Items.Add(vins[i]);
}
}
}
}
One solution is to have a general purpose function like this:
private String GetDataToRightOfLastColon(String line)
{
line = line.Trim();
var indexOfLastColon = line.LastIndexOf(':');
/* If line does not contain a ':' character,
or ':' is the last non-space character in line,
throw an exception. */
if ((indexOfLastColon == -1) || (indexOfLastColon == (line.Length - 1)))
throw new ArgumentException(
String.Format("The line '{0}' does not have the correct format.", line));
return line.Substring(indexOfLastColon + 1).Trim();
}
Next, apply that function via LINQ to process the text file and populate the combobox:
vinComboBox.Items.AddRange(
File
.ReadAllLines(@"D:\carCenter\carCenter\bin\Debug\Vehicles.txt")
.Where(line => line.Trim().StartsWith("VIN"))
.Select(line => GetDataToRightOfLastColon(line))
.ToArray()
);
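If throwing on a malformed line is not desirable, a non-throwing variant is sketched below. It assumes the value always follows a known label such as "VIN Number:" on the same line and simply skips lines with no value; LabeledLines and ValuesFor are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: collects the values that follow a given label.
public static class LabeledLines
{
    public static List<string> ValuesFor(IEnumerable<string> lines, string label)
    {
        return lines
            .Select(line => line.Trim())
            // keep only lines that begin with the label, ignoring case
            .Where(line => line.StartsWith(label, StringComparison.OrdinalIgnoreCase))
            // drop the label and surrounding whitespace
            .Select(line => line.Substring(label.Length).Trim())
            // skip lines where nothing follows the label
            .Where(value => value.Length > 0)
            .ToList();
    }
}
```

The result can be passed to vinComboBox.Items.AddRange(...) after calling ToArray() on it, with the lines coming from File.ReadLines on the vehicles file.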

Need to count incidents found multiple times within a text file

I'm trying to count the number of times a regex is found within a text file, but there are many regexes to look for in the file.
The problem is that my code only counts matches for the first regex; the subsequent if blocks that contain the other regexes do not count anything. Everything else works except the counting of the lines on which each error occurred :(
Could you please shed some light?
int counter = 1;
string liner;
string pattern = @"access-group\s+\w+\s+out\s+interface\s+\w+";
Boolean foundMatch;
int totalOUTgroups = Lines(ofd.FileName)
.Select(line => Regex.Matches(line, pattern).Count)
.Sum();
if (totalOUTgroups > 0)
{
richTextBox2.SelectionFont = new Font("Courier New", 8);
richTextBox2.AppendText(">>> ACls installed by using access-group using the keyword OUT are NOT supported: " + "\u2028");
richTextBox2.AppendText(">>> Total of incidences found: " + totalOUTgroups.ToString() + "\u2028");
System.IO.StreamReader file = new System.IO.StreamReader(ofd.FileName);
while ((liner = file.ReadLine()) != null)
{
foundMatch = performMatch(pattern, liner);
if (foundMatch)
{
richTextBox2.AppendText("Line: " + counter + " " + liner + "\r\n");
}
counter++;
}
}
//Will end 1
// 2 Mark echo-reply ICMP
int counter2 = 1;
string liner2;
string pattern2 = @"/^(?=.*\baccess-list\b)(?=.*\beq echo-reply\b).*$/gm";
Boolean foundMatch2;
int totalIntACLInt = Lines(ofd.FileName)
.Select(line => Regex.Matches(line, pattern2).Count)
.Sum();
if (totalIntACLInt > 0)
{
richTextBox2.SelectionFont = new Font("Courier New", 8);
richTextBox2.AppendText(" " + "\u2028");
richTextBox2.AppendText(">>> Echo-reply is not necessary: " + "\u2028");
richTextBox2.AppendText(">>> Total of incidences found: " + totalIntACLInt.ToString() + "\u2028");
System.IO.StreamReader file = new System.IO.StreamReader(ofd.FileName);
while ((liner2 = file.ReadLine()) != null)
{
foundMatch2 = performMatch(pattern2, liner2);
if (foundMatch2)
{
richTextBox2.AppendText("Line:" + counter2 + " " + liner2 + "\r\n");
}
counter2++;
}
}
If I understand your question, then the problem you're having is most likely tied to your implementation of performMatch(). Post the code for performMatch() if you want help debugging that.
As @Justin lurman pointed out, try printing out each line and line number while only iterating through the file once. If Regex.Matches(line, pattern) is already working for you, then just make use of that.
For example:
int counter = 1;
string pattern = @"access-group\s+\w+\s+out\s+interface\s+\w+";
var totalMatches = 0;
var output = new StringBuilder();
foreach(var line in Lines(ofd.FileName))
{
var matches = Regex.Matches(line, pattern).Count;
if (matches > 0)
{
totalMatches += matches;
output.AppendLine(string.Format("Line: {0} {1}", counter, line));
}
counter++;
}
if(totalMatches > 0)
{
richTextBox2.SelectionFont = new Font("Courier New", 8);
richTextBox2.AppendText(">>> ACls installed by using access-group using the keyword OUT are NOT supported: " + "\u2028");
richTextBox2.AppendText(">>> Total of incidences found: " + totalMatches.ToString() + "\u2028");
richTextBox2.AppendText(output.ToString());
}
As a warning I haven't compiled or tested the code above, so use it as a guideline. You can certainly improve upon the code further. To start you could refactor your repeated code into methods.
Update
OK, I still don't know that I'm clear on what exactly your problem is, but I wrote out some code that should achieve what it is that I think you're trying to accomplish. While writing my code I noticed some things about the code you posted that may be causing issues for you.
Your second regex, /^(?=.*\baccess-list\b)(?=.*\beq echo-reply\b).*$/gm, doesn't look like a valid .NET regex; it looks like a JavaScript regex literal. .NET treats the surrounding / delimiters and the trailing gm as literal characters to match.
You're appending text to a RichTextBox control, which has a max length property you may be exceeding. I doubt you're writing out that much text, but it's possible.
When this property is set to 0, the maximum length of the text that can be entered in the control is 64 KB of characters
- source
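To illustrate the first point, here is a small sketch of how the JavaScript-style literal might be expressed in .NET: the / delimiters are dropped and the m flag becomes RegexOptions.Multiline (the g flag has no counterpart because Regex.Matches already returns every match). EchoReplyCheck and CountMatches are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper showing the .NET form of the JavaScript-style pattern.
public static class EchoReplyCheck
{
    // Multiline makes ^ and $ match at the start and end of every line.
    private static readonly Regex EchoReply = new Regex(
        @"^(?=.*\baccess-list\b)(?=.*\beq echo-reply\b).*$",
        RegexOptions.Multiline);

    // Counts the lines in text that contain both phrases.
    public static int CountMatches(string text)
    {
        return EchoReply.Matches(text).Count;
    }
}
```

When the file is processed one line at a time, as in the question, RegexOptions.Multiline is not strictly required, but it does no harm.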
Here is the relevant snippet from the console app I wrote that reads a text file, line by line, and applies a collection of regexes to each line. If a match is found it stores the pertinent information about each match and then prints out its finding once all lines have been examined.
class CommonError
{
public Regex Pattern { get; private set; }
public string Message { get; private set; }
public List<KeyValuePair<int, IEnumerable<string>>> Details { get; private set; }
public CommonError(Regex pattern, string message)
{
Pattern = pattern;
Message = message;
Details = new List<KeyValuePair<int, IEnumerable<string>>>();
}
}
class Program
{
static void Main(string[] args)
{
//take a file read it once and while reading each line check if that line matches any of a slew of regexes.
//if it does match a regex then add the line number and the matching text into a collection of matches for that regex.
//at the end output all the matches by regex and the totals for each pattern. Along with printing each match also print the line it was found on.
var errorsToFind = new List<CommonError>()
{
new CommonError(new Regex(@"access-group\s+\w+\s+out\s+interface\s+\w+"), "ACls installed by using access-group using the keyword OUT are NOT supported"),
new CommonError(new Regex(@"^(?=.*\baccess-list\b)(?=.*\beq echo-reply\b).*$"), "Echo-reply is not necessary")
};
var errorsFound = FindCommonErrorsInFile(".\\test-file.txt", errorsToFind);
foreach (var error in errorsFound)
{
Console.WriteLine(error.Message);
Console.WriteLine("total incidences found: " + error.Details.Count);
error.Details.ForEach(d => Console.WriteLine(string.Format("Line {0} {1}", d.Key, string.Join(",", d.Value))));
}
}
static IEnumerable<CommonError> FindCommonErrorsInFile(string pathToFile, IEnumerable<CommonError> errorsToFind)
{
var lineNumber = 1;
foreach (var line in File.ReadLines(pathToFile))
{
foreach (var error in errorsToFind)
{
var matches = error.Pattern.Matches(line);
if(matches.Count == 0) continue;
var rawMatches = matches.Cast<Match>().Select(m => m.Value);
error.Details.Add(new KeyValuePair<int, IEnumerable<string>>(lineNumber, rawMatches));
}
lineNumber++;
}
return errorsToFind.Where(e => e.Details.Count > 0);
}
}
If you're still having issues give this code a try--this time I actually compiled it and tested it. Hope this helps.

How do I iterate "between" items in an array / collection / list?

This problem has bugged me for years, and I always feel like I'm coming up with a hack when there's a much better solution. The issue at hand occurs when you want to do something to all items in a list and then add something in between those items. In short, I want to:
Do something to every item in the list.
Do something else to all but the last item in the list (in effect, do something "inbetween" the items in the list).
For example, let's say I have a class called Equation:
public class Equation
{
public string LeftSide { get; set; }
public string Operator { get; set; }
public string RightSide { get; set; }
}
I want to iterate over a list of Equations and return a string that formats these items together; something like the following:
public string FormatEquationList(List<Equation> listEquations)
{
string output = string.Empty;
foreach (Equation e in listEquations)
{
//format the Equation
string equation = "(" + e.LeftSide + e.Operator + e.RightSide + ")";
//format the "inbetween" part
string inbetween = " and ";
//concatenate the Equation and "inbetween" part to the output
output += equation + inbetween;
}
return output;
}
The problem with the above code is that it is going to include a trailing " and " at the end of the returned string. I know that I could hack some code together, replace the foreach with a for loop, and add the inbetween element only if it's not the last item; but this seems like a hack.
Is there a standard methodology for how to deal with this type of problem?
You basically have a few different strategies for dealing with this kind of problem:
Process the first (or last) item outside of the loop.
Perform the work and then "undo" the extraneous step.
Detect that you're processing the first or last item inside the loop.
Use a higher-level abstraction that allows you to avoid the situation.
Any of these options can be a legitimate way to implement a "between the items" style of algorithm. Which one you choose depends on things like:
which style you like
how expensive "undoing work" is
how expensive each "join" step is
whether there are any side effects
Amongst other things. For the specific case of strings, I personally prefer using string.Join(), as I find it illustrates the intent most clearly. Also, in the case of strings, if you aren't using string.Join(), you should try to use StringBuilder to avoid creating too many temporary strings (a consequence of strings being immutable in .NET).
Using string concatenation as the example, the different options break down as follows. (For simplicity, assume Equation has ToString() as: "(" + LeftSide + Operator + RightSide + ")".) The first option looks like:
public string FormatEquation( IList<Equation> listEquations )
{
StringBuilder sb = new StringBuilder();
if( listEquations.Count > 0 )
sb.Append( listEquations[0].ToString() );
for( int i = 1; i < listEquations.Count; i++ )
sb.Append( " and " + listEquations[i].ToString() );
return sb.ToString();
}
The second option looks like:
public string FormatEquation( IEnumerable<Equation> listEquations )
{
StringBuilder sb = new StringBuilder();
const string separator = " and ";
foreach( var eq in listEquations )
sb.Append( eq.ToString() + separator );
// undo the extraneous trailing separator, if anything was appended
if( sb.Length > 0 )
sb.Remove( sb.Length - separator.Length, separator.Length );
return sb.ToString();
}
The third would look something like:
public string FormatEquation( IList<Equation> listEquations )
{
StringBuilder sb = new StringBuilder();
const string separator = " and ";
for( int index = 0; index < listEquations.Count; index++ )
{
sb.Append( listEquations[index].ToString() );
// stop before appending a separator after the last item
if( index == listEquations.Count - 1 )
break;
sb.Append( separator );
}
return sb.ToString();
}
The last option can take multiple forms in .NET, using either String.Join or Linq:
public string FormatEquation( IEnumerable<Equation> listEquations )
{
return string.Join( " and ", listEquations.Select( eq => eq.ToString() ).ToArray() );
}
or:
public string FormatEquation( IEnumerable<Equation> listEquations )
{
// note: Aggregate without a seed throws on an empty sequence
return listEquations.Select( eq => eq.ToString() ).Aggregate( (a, b) => a + " and " + b );
}
Personally, I avoid using Aggregate() for string concatenation because it results in many intermediate, discarded strings. It's also not the most obvious way to "join" a bunch of results together; it's primarily geared towards computing a "scalar" result from a collection in some arbitrary, caller-defined fashion.
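As a side note on the string.Join() option: since .NET Framework 4, string.Join has overloads that accept an IEnumerable directly and call ToString() on each element, so the Select and ToArray calls can be dropped when ToString() already produces the desired format. The sketch below assumes the ToString() override described earlier; EquationFormatter is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public class Equation
{
    public string LeftSide { get; set; }
    public string Operator { get; set; }
    public string RightSide { get; set; }

    // Assumed override, matching the format used in this answer.
    public override string ToString()
    {
        return "(" + LeftSide + Operator + RightSide + ")";
    }
}

// Hypothetical helper built on string.Join<T>(string, IEnumerable<T>).
public static class EquationFormatter
{
    // Returns an empty string for an empty sequence.
    public static string Format(IEnumerable<Equation> equations)
    {
        return string.Join(" and ", equations);
    }
}
```

Unlike the Aggregate() variant, this does not throw when the sequence is empty.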
You can use String.Join().
String.Join(" and ",listEquations.Select(e=>String.Format("({0}{1}{2})",e.LeftSide,e.Operator,e.RightSide)).ToArray());
You can do this with LINQ's Aggregate operator:
public string FormatEquationList(List<Equation> listEquations)
{
return listEquations
.Select(e => "(" + e.LeftSide + e.Operator + e.RightSide + ")")
.Aggregate((a, b) => a + " and " + b);
}
Using a for loop with counter is perfectly reasonable if you don't want a foreach loop. This is why there is more than one type of looping statement.
If you want to process items pairwise, look at LINQ's Aggregate operator.
I usually add the separator before the item, after checking whether it's the first item.
public string FormatEquationList(List<Equation> listEquations)
{
string output = string.Empty;
foreach (Equation e in listEquations)
{
//use conditional to insert your "between" data:
output += (output == String.Empty) ? string.Empty : " and ";
//format the Equation
output += "(" + e.LeftSide + e.Operator + e.RightSide + ")";
}
return output;
}
I have to say I would look at the string.Join() function as well, +1 for Linqiness on that. My example is a more of a traditional solution.
I generally try to prefix separators based on a condition rather than add them to the end.
string output = string.Empty;
for (int i = 0; i < 10; i++)
{
output += output == string.Empty ? i.ToString() : " and " + i.ToString();
}
0 and 1 and 2 and 3 and 4 and 5 and 6 and 7 and 8 and 9
I like the String.Join method already posted.
But when you're not using an Array this has normally been my solution to this problem:
public string FormatEquationList(List<Equation> listEquations)
{
string output = string.Empty;
foreach (Equation e in listEquations)
{
// only append " and " when there's something to append to
if (output != string.Empty)
output += " and ";
output += "(" + e.LeftSide + e.Operator + e.RightSide + ")";
}
return output;
}
Of course, it's usually faster to use a StringBuilder:
public string FormatEquationList(List<Equation> listEquations)
{
StringBuilder output = new StringBuilder();
foreach (Equation e in listEquations)
{
// only append " and " when there's something to append to
if (output.Length > 0)
output.Append(" and ");
output.Append("(");
output.Append(e.LeftSide);
output.Append(e.Operator);
output.Append(e.RightSide);
output.Append(")");
}
return output.ToString();
}
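The same "prefix the separator unless this is the first item" idea can be generalised beyond strings. The sketch below is a hypothetical extension method (Intersperse is not a built-in LINQ operator) that yields a separator between the items of any sequence.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical extension method; not part of System.Linq.
public static class BetweenExtensions
{
    // Yields the items of source with separator placed between
    // each pair of adjacent items (never before the first or after the last).
    public static IEnumerable<T> Intersperse<T>(this IEnumerable<T> source, T separator)
    {
        bool first = true;
        foreach (T item in source)
        {
            if (!first)
                yield return separator;

            first = false;
            yield return item;
        }
    }
}
```

For plain strings, string.Concat(items.Intersperse(" and ")) gives the same result as string.Join(" and ", items); the extension is mainly useful when the items are not strings.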
