Hi, I'm trying to go through a file and count the number of lines, the maximum number of spaces per line, and the length of the longest line.
How can I detect the "\n" character if I iterate char by char through the given file?
Thanks a lot.
Here is the code I used for this:
using (StreamReader sr = new StreamReader(p_FileName))
{
char currentChar;
int current_length=0,current_MaximumSpaces=0;
p_LongestLine=0;
p_NumOfLines=0;
p_MaximumSpaces=0;
while (!sr.EndOfStream){
currentChar=Convert.ToChar(sr.Read());
current_length++;
if(Char.IsWhiteSpace(currentChar) || currentChar==null){
current_MaximumSpaces++;
}
if(currentChar == '\n'){
p_NumOfLines++;
}
if(current_length>p_LongestLine){
p_LongestLine=current_length;
}
if(current_MaximumSpaces>p_MaximumSpaces){
p_MaximumSpaces=current_MaximumSpaces;
}
current_length=0;
current_MaximumSpaces=0;
}
sr.Close();
}
if(currentChar == '\n')
count++;
You do not need to go character by character: for your purposes, going line by line is sufficient, and you get .NET to deal with system-dependent line breaks for you as an added bonus.
int maxLen = -1, maxSpaces = -1;
foreach ( var line in File.ReadLines("c:\\data\\myfile.txt")) {
maxLen = Math.Max(maxLen, line.Length);
maxSpaces = Math.Max(maxSpaces, line.Count(c => c == ' '));
}
EDIT: Your program does not work because of an error unrelated to you checking the '\n': you are zeroing out the current_length and current_MaximumSpaces after each character, instead of clearing them only when you see a newline character.
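If the character-by-character approach is still wanted, a minimal sketch of the loop with that fix applied might look like the following. The names LineStats and Measure are made up for illustration. It assumes that only ' ' counts as a space and that '\r' should be skipped, so that \r\n files give the same line lengths as \n files.

```csharp
using System;
using System.IO;

public static class LineStats
{
    // Hypothetical helper: returns (line count, longest line length, max spaces on one line).
    public static (int Lines, int Longest, int MaxSpaces) Measure(TextReader reader)
    {
        int lines = 0, longest = 0, maxSpaces = 0;
        int currentLength = 0, currentSpaces = 0;
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (c == '\n')
            {
                // End of line: record the totals, then reset the per-line counters.
                lines++;
                longest = Math.Max(longest, currentLength);
                maxSpaces = Math.Max(maxSpaces, currentSpaces);
                currentLength = 0;
                currentSpaces = 0;
            }
            else if (c != '\r')
            {
                // Assumption: '\r' is part of the line break, not of the line itself.
                currentLength++;
                if (c == ' ') currentSpaces++;
            }
        }
        // A last line without a trailing '\n' still counts.
        if (currentLength > 0)
        {
            lines++;
            longest = Math.Max(longest, currentLength);
            maxSpaces = Math.Max(maxSpaces, currentSpaces);
        }
        return (lines, longest, maxSpaces);
    }
}
```

It can be called with the StreamReader from the question, e.g. var stats = LineStats.Measure(sr); inside the using block.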
Try comparing to Environment.NewLine. Note that a single character can only equal it on platforms where the newline is one character long (for example "\n" on Unix):
bool is_newline = currentChar.ToString().Equals(Environment.NewLine);
I'm guessing that your newline is actually a \r\n (non-Unix) ending. You'll need to keep track of the previous and current chars and look for either \r\n or Environment.NewLine.
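Keeping track of the previous character could be sketched roughly as below. The class and method names are hypothetical, and the choice to count \r\n as one break and a lone \r or \n as one break each is an assumption about what is wanted.

```csharp
using System.IO;

public static class NewlineCounter
{
    // Counts line breaks: "\r\n" counts once, a lone '\r' or '\n' counts once each.
    public static int CountLineBreaks(TextReader reader)
    {
        int count = 0;
        int previous = -1; // -1 means "no previous character yet"
        int next;
        while ((next = reader.Read()) != -1)
        {
            if (next == '\n')
            {
                // A '\n' directly after '\r' belongs to a break that was already counted.
                if (previous != '\r') count++;
            }
            else if (next == '\r')
            {
                count++;
            }
            previous = next;
        }
        return count;
    }
}
```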
I have a really bizarre problem with trim method. I'm trying to trim a string received from database. Here's my current method:
string debug = row["PLC_ADDR1_RESULT"].ToString();
SPCFileLog.WriteToLog(String.Format("Debug: ${0}${1}",debug,Environment.NewLine));
debug = debug.Trim();
SPCFileLog.WriteToLog(String.Format("Debug2: ${0}${1}", debug, Environment.NewLine));
debug = debug.Replace(" ", "");
SPCFileLog.WriteToLog(String.Format("Debug3: ${0}${1}", debug, Environment.NewLine));
Which produces file output as following:
Debug: $ $
Debug2: $ $
Debug3: $ $
Examining the hex codes in file revealed something interesting. The supposedly empty spaces aren't hex 20 (whitespace), but they are set as 00 (null?)
How our database contains such data is another mystery, but regardless, I need to trim those invalid (?) null characters. How can I do this?
If you just want to remove all null characters from a string, try this:
debug = debug.Replace("\0", string.Empty);
If you only want to remove them from the ends of the string:
debug = debug.Trim('\0');
There's nothing special about null characters, but they aren't considered white space.
String.Trim() just doesn't consider the NUL character (\0) to be whitespace. Ultimately, it relies on a whitespace check (Char.IsWhiteSpace in current .NET versions) that doesn't treat \0 as whitespace.
Frankly, I think that makes sense. Typically \0 is not whitespace.
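To make the difference concrete, here is a small sketch that wraps the two calls mentioned above. NulCleaner, TrimNul and StripNul are illustrative names, not part of .NET.

```csharp
using System;

public static class NulCleaner
{
    // Removes NUL characters from both ends only.
    public static string TrimNul(string s) => s.Trim('\0');

    // Removes every NUL character, wherever it appears.
    public static string StripNul(string s) => s.Replace("\0", string.Empty);
}
```

For the input "\0\0AB\0C\0\0", the parameterless Trim() leaves all 8 characters in place, TrimNul returns "AB\0C", and StripNul returns "ABC".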
@Will Vousden got me on the right track...
https://stackoverflow.com/a/32624301/12157575
--but instead of trying to rewrite or remove the line, I filtered out lines before hitting the StreamReader / StreamWriter that start with the control character in the linq statement:
string ctrlChar = "\0"; // "NUL" in notepad++
// linq statement: "where"
!line.StartsWith(ctrlChar)
// could also easily do "Contains" instead of "StartsWith"
for more context:
internal class Program
{
private static void Main(string[] args)
{
// dbl space writelines
Out.NewLine = "\r\n\r\n";
WriteLine("Starting Parse Mode...");
string inputFilePath = @"C:\_logs\_input";
string outputFilePath = @"C:\_logs\_output\";
string ouputFileName = @"consolidated_logs.txt";
// chars starting lines we don't want to parse
string hashtag = "#"; // logs notes
string whtSpace = " "; // white space char
string ctrlChar = "\0"; // "NUL" in notepad++
try
{
var files =
from file in Directory.EnumerateFiles(inputFilePath, "*.log", SearchOption.TopDirectoryOnly)
from line in File.ReadLines(file)
where !line.StartsWith(hashtag) &&
!line.StartsWith(whtSpace) &&
line != null &&
!string.IsNullOrWhiteSpace(line) &&
!line.StartsWith(ctrlChar) // CTRL CHAR FILTER
select new
{
File = file,
Line = line
};
using (StreamWriter writer = new StreamWriter(outputFilePath + ouputFileName, true))
{
foreach (var f in files)
{
writer.WriteLine($"{f.File},{f.Line}");
WriteLine($"{f.File},{f.Line}"); // see console
}
WriteLine($"{files.Count()} lines found.");
ReadLine(); // keep console open
}
}
catch (UnauthorizedAccessException uAEx)
{
Console.WriteLine(uAEx.Message);
}
catch (PathTooLongException pathEx)
{
Console.WriteLine(pathEx.Message);
}
}
}
Okay, so I'm trying to make a 'console'-like text box within a form. Once the text reaches the bottom, instead of letting you scroll up, it should just delete the top line. I'm having some difficulties.
So far, when it gets to the bottom it deletes the top line, but only once; after that it just carries on as normal. Here is my function:
StringBuilder sr = new StringBuilder();
public void writeLine(string input)
{
string firstline = "";
int numLines = Convert.ToString(sr).Split('\n').Length;
if (numLines > 15) //Max Lines
{
sr.Remove(0, Convert.ToString(sr).Split('\n').FirstOrDefault().Length);
}
sr.Append(input + "\r\n");
consoleTxtBox.Text = Convert.ToString(sr) + numLines;
}
Would be great if someone could fix this, thanks
Lucas
First, what's wrong with your solution: the reason it does not work is that it removes the content of the line, but it ignores the \n at the end. Adding 1 should fix that:
sr.Remove(0, Convert.ToString(sr).Split('\n').FirstOrDefault().Length+1);
// ^
// |
// This will take care of the trailing '\n' after the first line ---+
Now for a simpler way: all you need to do is find the first line break and take the substring after it, like this:
string RemoveFirstLine(string s) {
    int i = s.IndexOf(Environment.NewLine, StringComparison.Ordinal);
    // Skip the whole line break, which is two characters ("\r\n") on Windows.
    return i < 0 ? s : s.Substring(i + Environment.NewLine.Length);
}
Note that this code does not crash even when there are no line breaks in the string, i.e. when IndexOf returns -1 (in which case nothing is removed).
You can use the Lines property from the TextBox. This will get all the lines in the TextBox, as an array, then create a new array that doesn't include the first element (Skip(1)). It assigns this new array back to the textbox.
string[] lines = textBox.Lines;
textBox.Lines = lines.Skip(1).ToArray();
A simple alternative: you could split the string by Environment.NewLine and return all but the first:
public static string RemoveFirstLine(string input)
{
var lines = input.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
return string.Join(Environment.NewLine, lines.Skip(1));
}
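You can remove the first line like this (assuming lines is a StringBuilder that contains at least one line break):
lines.Remove(0, lines.ToString().IndexOf(Environment.NewLine) + Environment.NewLine.Length);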
Most solutions do not seem to take into account the fact that Environment.NewLine can consist of multiple characters (length > 1).
public void RemoveFirstStringFromStringBuilder()
{
var lines = new StringBuilder();
lines.AppendLine("abc");
var firstLine = lines.ToString().IndexOf(Environment.NewLine, StringComparison.Ordinal);
if (firstLine >= 0)
lines.Remove(0, firstLine + Environment.NewLine.Length);
Console.WriteLine(lines.Length);
Console.WriteLine(lines.ToString());
}
Prints out: 0 and ""
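Building on the same idea, a console-style buffer can be capped at a fixed number of lines by dropping as many leading lines as needed instead of only one. This is a sketch under stated assumptions: ConsoleBuffer and TrimToMaxLines are hypothetical names, and every line in the buffer is assumed to end with Environment.NewLine (for example because it was added with AppendLine).

```csharp
using System;
using System.Text;

public static class ConsoleBuffer
{
    // Drops whole lines from the front until at most maxLines terminated lines remain.
    public static void TrimToMaxLines(StringBuilder buffer, int maxLines)
    {
        string newLine = Environment.NewLine;
        string text = buffer.ToString();

        // Count how many line terminators the buffer currently holds.
        int lineCount = 0;
        int index = 0;
        while ((index = text.IndexOf(newLine, index, StringComparison.Ordinal)) >= 0)
        {
            lineCount++;
            index += newLine.Length;
        }

        // Advance past one terminator for every line that has to go.
        int cut = 0;
        for (int extra = lineCount - maxLines; extra > 0; extra--)
        {
            cut = text.IndexOf(newLine, cut, StringComparison.Ordinal) + newLine.Length;
        }
        buffer.Remove(0, cut);
    }
}
```

Calling it after each append (with 15 as the limit from the question) keeps the buffer bounded even if several lines were added at once.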
What worked for me is:
var strBuilder = new StringBuilder();
strBuilder.AppendLine("ABC");
strBuilder.AppendLine("54");
strBuilder.AppendLine("04");
strBuilder.Remove(0, strBuilder.ToString().IndexOf(Environment.NewLine) + 2);
Console.WriteLine(strBuilder);
The solution with +1 didn't work for me, probably because the line break in this context is 2 chars (\r\n).
I would like to check some strings for invalid characters. By invalid characters I mean characters that should not be there. Which characters are these? That varies, but I don't think it is important. What matters is how I should do it, and what the easiest and best-performing way is.
Let's say I only want strings that contain 'A-Z', 'empty' (space), '.', '$', '0-9'.
So a string like "HELLO STaCKOVERFLOW" is invalid because of the 'a'.
Now, how do I do that? I could make a List<char>, put every char in it that is not allowed, and check the string against this list. That is maybe not a good idea, because there would be a lot of chars. But I could make a list that contains all of the allowed chars, right? And then? Do I have to compare every char in the string against the List<char>? Is there any smart code for this? Another question: if I add A-Z to the List<char>, I have to add 26 chars manually, but as far as I know these chars are 65-90 in the ASCII table. Can I add them more easily? Any suggestions? Thank you.
You can use a regular expression for this:
Regex r = new Regex("[^A-Z0-9.$ ]"); // matches if any disallowed character occurs anywhere in the string
if (r.IsMatch(SomeString)) {
// validation failed
}
To create a list of characters from A-Z or 0-9 you would use a simple loop:
for (char c = 'A'; c <= 'Z'; c++) {
// c or c.ToString() depending on what you need
}
But you don't need that with the Regex - pretty much every regex engine understands the range syntax (A-Z).
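An alternative to searching for a disallowed character is to require that the whole string matches the allowed set. The sketch below assumes the allowed characters from the question (A-Z, 0-9, '.', '$' and space). AllowedChars and IsValid are illustrative names.

```csharp
using System.Text.RegularExpressions;

public static class AllowedChars
{
    // Anchored at both ends: the whole string must consist of A-Z, 0-9, '.', '$' or space.
    // '*' also accepts the empty string; use '+' to require at least one character.
    private static readonly Regex Valid = new Regex(@"\A[A-Z0-9.$ ]*\z");

    public static bool IsValid(string input) => Valid.IsMatch(input);
}
```

Inside a character class, '.' and '$' are literal characters, so they need no escaping. \A and \z are used instead of ^ and $ so that a trailing newline is not silently accepted.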
I have just written such a function, along with an extended version that restricts the first and last characters when needed. The original function merely checks whether the string consists only of valid characters. The extended function takes two additional integers: the number of characters at the beginning of the valid-character list to skip when checking the first and the last character, respectively. In practice it simply calls the original function up to three times. In the example below, it ensures that the string begins with a letter and doesn't end with an underscore.
StrChr(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
StrChrEx(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", 11, 1);
BOOL __cdecl StrChr(CHAR* str, CHAR* chars)
{
for (int s = 0; str[s] != 0; s++)
{
int c = 0;
while (true)
{
if (chars[c] == 0)
{
return false;
}
else if (str[s] == chars[c])
{
break;
}
else
{
c++;
}
}
}
return true;
}
BOOL __cdecl StrChrEx(CHAR* str, CHAR* chars, UINT excl_first, UINT excl_last)
{
char first[2] = {str[0], 0};
char last[2] = {str[strlen(str) - 1], 0};
if (!StrChr(str, chars))
{
return false;
}
if (excl_first != 0)
{
if (!StrChr(first, chars + excl_first))
{
return false;
}
}
if (excl_last != 0)
{
if (!StrChr(last, chars + excl_last))
{
return false;
}
}
return true;
}
If you are using C#, you can do this easily using a List and Contains. This works the same way for single characters (as strings) and for multi-character strings.
var pn = "The String To ChecK";
var badStrings = new List<string>()
{
" ","\t","\n","\r"
};
foreach(var badString in badStrings)
{
if(pn.Contains(badString))
{
//Do something
}
}
If you're not super good with regular expressions, then there is another way to go about this in C#. Here is a block of code I wrote to test a string variable named notifName:
var alphabet = "a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z";
var numbers = "0,1,2,3,4,5,6,7,8,9";
var specialChars = " ,(,),_,[,],!,*,-,.,+,-";
var validChars = (alphabet + "," + alphabet.ToUpper() + "," + numbers + "," + specialChars).Split(',');
for (int i = 0; i < notifName.Length; i++)
{
if (Array.IndexOf(validChars, notifName[i].ToString()) < 0) {
errorFound = $"Invalid character '{notifName[i]}' found in notification name.";
break;
}
}
You can change the characters added to the array as needed. The Array IndexOf method is the key to the whole thing. Of course if you want commas to be valid, then you would need to choose a different split character.
Not enough reps to comment directly, but I recommend the Regex approach. One small caveat: you probably need to anchor both ends of the input string, and you will want at least one character to match. So (with thanks to ThiefMaster), here's my regex to validate user input for a simple arithmetical calculator (plus, minus, multiply, divide):
Regex r = new Regex(@"^[0-9\.\-\+\*\/ ]+$");
I'd go with a regex, but still need to add my 2 cents here, because all the proposed non-regex solutions are O(MN) in the worst case (string is valid) which I find repulsive for religious reasons.
Even more so when LINQ offers a simpler and more efficient solution than nesting loops:
var isInvalid = "The String To Test".Intersect("ALL_INVALID_CHARS").Any();
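The same reasoning applies to a whitelist. As a sketch (assuming the allowed set from the question, with WhitelistCheck and IsValid as made-up names), a HashSet<char> gives constant-time lookups, so the check is linear in the length of the input:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class WhitelistCheck
{
    // Built once; each Contains call is an O(1) hash lookup.
    private static readonly HashSet<char> Allowed =
        new HashSet<char>("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.$ ");

    // True when every character of the input is in the allowed set.
    public static bool IsValid(string input) => input.All(c => Allowed.Contains(c));
}
```

All stops at the first character that is not allowed, so invalid input is rejected early.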
When reading a line on a stream (for me it's actually a stream on a COM port), the returned string contains no \n or \r characters (or \r\n combinations). For logging purposes, I would like to retain them. At present my loop looks like this:
while (newPort.BytesToRead > 0)
{
received = ReadLine(newPort);
response.Add(received);
}
So basically I'm reading a string and then adding it to a list of strings called response. What I want is for the returned string received to contain the \r or \n or \r\n that was in the original stream, as well as terminating a line of text.
Is this trivially possible? Or even non-trivially!
I'm guessing this is quite hard to do. I mean thinking about it, if I receive a \r, I have to get the next character to see if it's a \n. If there isn't a next character I'll timeout with an exception. If there is a next character and it isn't a \n, I have to make it the current character on the next iteration, and so on...!
You could append Environment.NewLine after adding received.
Update: If you need to keep the original whitespace verbatim, then there's no point using ReadLine. You could use ReadBlock in that case to read smaller chunks of a file, or ReadToEnd to just get the whole thing. If you need to mark new lines for processing the message, you can search through the raw string to normalize or tokenize or whatever it is you'd like to do.
Here is the OP's solution from the question post:
Ok, I had a crack at it. Here is what I think is right... :
{
int s = 0, e = 0;
for (; e < line.Length; e++)
{
if (line[e] == '\n')
{
// \n always terminates a line.
lines.Add(line.Substring(s, (e - s) + 1));
s = e + 1;
}
if (line[e] == '\r' && (e < line.Length - 1))
{
// \r only terminates a line if it isn't followed by \n.
if (line[e + 1] != '\n')
{
lines.Add(line.Substring(s, (e - s) + 1));
s = e + 1;
}
}
}
// Check for trailing characters not terminated by anything.
if (s < e)
{
lines.Add(line.Substring(s, (e - s)));
}
}
while (newPort.BytesToRead > 0)
{
received = ReadLine(newPort);
response.Add(string.Format("{0}{1}", received, System.Environment.NewLine));
}
I have a big string (let's call it a CSV file, though it isn't actually one, it'll just be easier for now) that I have to parse in C# code.
The first step of the parsing process splits the file into individual lines by just using a StreamReader object and calling ReadLine until it's through the file. However, any given line might contain a quoted (in single quotes) literal with embedded newlines. I need to find those newlines and convert them temporarily into some other kind of token or escape sequence until I've split the file into an array of lines..then I can change them back.
Example input data:
1,2,10,99,'Some text without a newline', true, false, 90
2,1,11,98,'This text has an embedded newline
and continues here', true, true, 90
I could write all of the C# code needed to do this by using string.IndexOf to find the quoted sections and look within them for newlines, but I'm thinking a Regex might be a better choice (i.e. now I have two problems)
Since this isn't a true CSV file, does it have any sort of schema?
From your example, it looks like you have:
int, int, int, int, string , bool, bool, int
With that making up your record / object.
Assuming that your data is well formed (I don't know enough about your source to know how valid this assumption is); you could:
Read your line.
Use a state machine to parse your data.
If your line ends, and you're parsing a string, read the next line..and keep parsing.
I'd avoid using a regex if possible.
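The steps above can be sketched as a very small state machine that only tracks whether the current position is inside a single-quoted literal. This is an illustration, not a full parser: QuotedRecordSplitter is a hypothetical name, and it assumes that quotes are balanced and never escaped, which the question does not confirm.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuotedRecordSplitter
{
    // Splits text into records on newlines that fall outside single-quoted literals.
    public static List<string> Split(string text)
    {
        var records = new List<string>();
        var current = new StringBuilder();
        bool insideQuotes = false;

        foreach (char ch in text)
        {
            if (ch == '\'')
            {
                // Assumption: a quote always toggles the state (no escaped quotes).
                insideQuotes = !insideQuotes;
                current.Append(ch);
            }
            else if (ch == '\n' && !insideQuotes)
            {
                // Record boundary: drop a preceding '\r' so CRLF input works too.
                if (current.Length > 0 && current[current.Length - 1] == '\r')
                    current.Length--;
                records.Add(current.ToString());
                current.Clear();
            }
            else
            {
                // Ordinary character, or a newline inside a quoted literal.
                current.Append(ch);
            }
        }

        // Keep a final record that has no trailing newline.
        if (current.Length > 0)
            records.Add(current.ToString());
        return records;
    }
}
```

Because embedded newlines stay inside their record, no temporary token is needed; each record can then be split into fields in a second pass.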
State machines for doing such a job are made easy using C# 2.0 iterators. Here's hopefully the last CSV parser I'll ever write. The whole file is treated as an enumerable bunch of enumerable strings, i.e. rows/columns. IEnumerable is great because it can then be processed by LINQ operators.
public class CsvParser
{
public char FieldDelimiter { get; set; }
public CsvParser()
: this(',')
{
}
public CsvParser(char fieldDelimiter)
{
FieldDelimiter = fieldDelimiter;
}
public IEnumerable<IEnumerable<string>> Parse(string text)
{
return Parse(new StringReader(text));
}
public IEnumerable<IEnumerable<string>> Parse(TextReader reader)
{
while (reader.Peek() != -1)
yield return parseLine(reader);
}
IEnumerable<string> parseLine(TextReader reader)
{
bool insideQuotes = false;
StringBuilder item = new StringBuilder();
while (reader.Peek() != -1)
{
char ch = (char)reader.Read();
char? nextCh = reader.Peek() > -1 ? (char)reader.Peek() : (char?)null;
if (!insideQuotes && ch == FieldDelimiter)
{
yield return item.ToString();
item.Length = 0;
}
else if (!insideQuotes && ch == '\r' && nextCh == '\n') //CRLF
{
reader.Read(); // skip LF
break;
}
else if (!insideQuotes && ch == '\n') //LF for *nix-style line endings
break;
else if (ch == '"' && nextCh == '"') // escaped quotes ""
{
item.Append('"');
reader.Read(); // skip next "
}
else if (ch == '"')
insideQuotes = !insideQuotes;
else
item.Append(ch);
}
// last one
yield return item.ToString();
}
}
Note that the file is read character by character with the code deciding when newlines are to be treated as row delimiters or part of a quoted string.
What if you got the whole file into a variable then split that based on non-quoted newlines?
EDIT: Sorry, I've misinterpreted your post. If you're looking for a regex, then here is one:
content = Regex.Replace(content, "'([^']*)\n([^']*)'", "'${1}TOKEN${2}'");
There might be edge cases, and there is the "now you have two problems" issue, but I think it should be OK most of the time. The regex finds any pair of single quotes that has a \n between them and replaces that \n with TOKEN, preserving the text on either side.
But still, I'd go with a state machine, like what @bryansh explained below.