Stream read line - c#

I read a stream line by line with sr.ReadLine(). My code treats both \r\n and a lone \n as a line ending.
StreamReader sr = new System.IO.StreamReader(sPath, enc);
while (!sr.EndOfStream)
{
// reading 1 line of datafile
string sLine = sr.ReadLine();
...
How can I tell the code (instead of using the universal sr.ReadLine()) to treat only a full \r\n as a new line, and not a lone \n?

It is not possible to do this using StreamReader.ReadLine.
As per msdn:
A line is defined as a sequence of characters followed by a line feed
("\n"), a carriage return ("\r"), or a carriage return immediately
followed by a line feed ("\r\n"). The string that is returned does not
contain the terminating carriage return or line feed. The returned
value is null if the end of the input stream is reached.
So you have to read this stream character by character and return a line only once you've captured \r\n.
EDIT
Here is a code sample:
private static IEnumerable<string> ReadLines(StreamReader stream)
{
    StringBuilder sb = new StringBuilder();
    int symbol;
    // Read() returns -1 at end of stream; testing it here keeps -1 out of the buffer.
    while ((symbol = stream.Read()) != -1)
    {
        // 13 is '\r', 10 is '\n'
        if (symbol == 13 && stream.Peek() == 10)
        {
            stream.Read(); // consume the '\n'
            string line = sb.ToString();
            sb.Clear();
            yield return line;
        }
        else
            sb.Append((char)symbol);
    }
    // Whatever follows the last \r\n (possibly an empty string)
    yield return sb.ToString();
}
You can use it like
foreach (string line in ReadLines(stream))
{
//do something
}
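To see the difference on concrete input, here is a minimal sketch that runs both splitting rules over an in-memory string. It uses a StringReader instead of a file, and the class name LineSplitDemo and both method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineSplitDemo
{
    // Splits on "\r\n" only; a lone '\r' or '\n' stays inside the line.
    public static List<string> ReadCrLfLines(TextReader reader)
    {
        var lines = new List<string>();
        var sb = new StringBuilder();
        int symbol;
        while ((symbol = reader.Read()) != -1)
        {
            if (symbol == '\r' && reader.Peek() == '\n')
            {
                reader.Read();            // consume the '\n'
                lines.Add(sb.ToString());
                sb.Clear();
            }
            else
            {
                sb.Append((char)symbol);
            }
        }
        lines.Add(sb.ToString());         // text after the last "\r\n" (may be empty)
        return lines;
    }

    // Splits the way TextReader.ReadLine does: on "\r", "\n" or "\r\n".
    public static List<string> ReadAnyLines(TextReader reader)
    {
        var lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
            lines.Add(line);
        return lines;
    }
}
```

For the input "a\nb\r\nc", ReadAnyLines returns three lines ("a", "b", "c"), while ReadCrLfLines returns two ("a\nb" and "c").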

You cannot do it with ReadLine, but you can do this instead:
stream.ReadToEnd().Split(new[] {"\r\n"}, StringSplitOptions.None)
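One thing to watch with Split: when the text ends with "\r\n", the resulting array has a trailing empty entry. A minimal sketch of a count that ignores it, assuming the whole text fits in memory (SplitDemo and CountCrLfLines are hypothetical names):

```csharp
using System;

public static class SplitDemo
{
    // Counts lines delimited by "\r\n" only, ignoring the empty entry that
    // Split produces when the text ends with a trailing "\r\n".
    public static int CountCrLfLines(string text)
    {
        if (text.Length == 0) return 0;
        string[] parts = text.Split(new[] { "\r\n" }, StringSplitOptions.None);
        bool endsWithDelimiter = text.EndsWith("\r\n", StringComparison.Ordinal);
        return endsWithDelimiter ? parts.Length - 1 : parts.Length;
    }
}
```

With this, "a\r\nb\r\n" counts as 2 lines and "a\nb\r\nc" also counts as 2, because the lone "\n" does not split.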

For simplification, let's work over a byte array:
static int NumberOfNewLines(byte[] data)
{
int count = 0;
for (int i = 0; i < data.Length - 1; i++)
{
if (data[i] == '\r' && data[i + 1] == '\n')
count++;
}
return count;
}
If you care about efficiency, optimize away, but this should work.
You can get the bytes of a file by using System.IO.File.ReadAllBytes(string path).
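If the file is too large to load at once, the same count can be done over a stream. The following is only a sketch: CrLfCounter, CountCrLf and the default buffer size are assumptions, and it presumes an encoding in which CR and LF are the single bytes 0x0D and 0x0A (ASCII, UTF-8, Windows-1252), so it would not be correct for UTF-16.

```csharp
using System.IO;

public static class CrLfCounter
{
    // Counts "\r\n" byte pairs in a stream without loading it all into memory.
    // Assumes CR is the byte 0x0D and LF is the byte 0x0A (not valid for UTF-16).
    public static int CountCrLf(Stream stream, int bufferSize = 4096)
    {
        var buffer = new byte[bufferSize];
        int count = 0;
        bool previousWasCr = false;   // carries state across buffer boundaries
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (previousWasCr && buffer[i] == (byte)'\n')
                    count++;
                previousWasCr = buffer[i] == (byte)'\r';
            }
        }
        return count;
    }
}
```

Typical use would be something like using (var fs = File.OpenRead(path)) { int n = CrLfCounter.CountCrLf(fs); }. The previousWasCr flag is what makes a "\r\n" pair that straddles two buffer reads still count.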

Related

Removing carriage return from specific line in c#

I have this type of data in a text file (csv) :
column1|column2|column3|column4|column5 (\r\n)
column1|column2|column3|column4|column5 (\r\n)
column1|column2 (\r\n)
column2 (\r\n)
column2|column3|column4|column5 (\r\n)
I would like to delete the \r\n that are line 3 and line 4 to have :
column1|column2|column3|column4|column5 (\r\n)
column1|column2|column3|column4|column5 (\r\n)
column1|column2/column2/column2|column3|column4|column5 (\r\n)
My idea is if the row doesn't have 4 column separators ("|") then delete the CRLF, and repeat the operation until you have only correct rows.
This is my code :
String path = "test.csv";
// Read file
string[] readText = File.ReadAllLines(path);
// Empty the file
File.WriteAllText(path, String.Empty);
int x = 0;
int countheaders = 0;
int countlines;
using (StreamWriter writer = new StreamWriter(path))
{
foreach (string s in readText)
{
if (x == 0)
{
countheaders = s.Where(c => c == '|').Count();
x = 1;
}
countlines = 0;
countlines = s.Where(d => d == '|').Count();
if (countlines == countheaders)
{
writer.WriteLine(s);
}
else
{
string s2 = s;
s2 = s2.ToString().TrimEnd('\r', '\n');
writer.Write(s2);
}
}
}
The problem is that I'm reading the file in a single pass, so the line break on line 4 is removed and lines 4 and 5 end up joined together...
You could probably do the following (can't test it now, but it should work):
IEnumerable<string> batchValuesIn(
IEnumerable<string> source,
string separator,
int size)
{
var counter = 0;
var buffer = new StringBuilder();
foreach (var line in source)
{
var values = line.Split(separator);
if (line.Length != 0)
{
foreach (var value in values)
{
buffer.Append(value);
counter++;
if (counter % size == 0)
{
yield return buffer.ToString();
buffer.Clear();
}
else
buffer.Append(separator);
}
}
}
if (buffer.Length != 0)
yield return buffer.ToString();
}
And you'd use it like:
var newLines = batchValuesIn(File.ReadLines(path), "|", 5);
The good thing about this solution is that you never load the entire original source into memory. You simply build the lines on the fly.
DISCLAIMER: this may behave weirdly with malformed input strings.
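A different approach, closer to the idea in the question, is to keep gluing physical lines together until the accumulated row has the expected number of separators. This is a sketch, not the answer's method: RowRepair, MergeBrokenRows and the joiner parameter are hypothetical, and it assumes the break never falls inside the last column (otherwise a row is emitted too early).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class RowRepair
{
    // Glues consecutive physical lines together until the accumulated row
    // contains the expected number of separators, then emits it.
    public static IEnumerable<string> MergeBrokenRows(
        IEnumerable<string> lines, char separator, int expectedSeparators,
        string joiner = "")
    {
        var row = new StringBuilder();
        int separators = 0;
        bool hasPending = false;
        foreach (string line in lines)
        {
            if (hasPending) row.Append(joiner);   // marks where a line break was removed
            row.Append(line);
            hasPending = true;
            separators += line.Count(c => c == separator);
            if (separators >= expectedSeparators)
            {
                yield return row.ToString();
                row.Clear();
                separators = 0;
                hasPending = false;
            }
        }
        // Leftover fragment that never reached the expected separator count.
        if (hasPending) yield return row.ToString();
    }
}
```

For example, RowRepair.MergeBrokenRows(File.ReadLines(path), '|', 4, "/") turns the fragments "column1|column2", "column2" and "column2|column3|column4|column5" into the single row "column1|column2/column2/column2|column3|column4|column5". Because File.ReadLines is lazy, write the result to a different file, or call ToList() first, rather than overwriting the file while it is still being read.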

Counting total characters of a file

Hi I'm pretty new to C# and trying to do some exercises to get up to speed with it. I'm trying to count the total number of characters in a file but it's stopping after the first word, would someone be able to tell me where I am going wrong? Thanks in advance
public void TotalCharacterCount()
{
string str;
int count, i, l;
count = i = 0;
StreamReader reader = File.OpenText("C:\\Users\\Lewis\\file.txt");
str = reader.ReadLine();
l = str.Length;
while (str != null && i < l)
{
count++;
i++;
str = reader.ReadLine();
}
reader.Close();
Console.Write("Number of characters in the file is : {0}\n", count);
}
If you want to know the size of a file:
long length = new System.IO.FileInfo("C:\\Users\\Lewis\\file.txt").Length;
Console.Write($"Number of characters in the file is : {length}");
If you want to count characters to play around with C#, then here is some sample code that might help you
int totalCharacters = 0;
// Using will do the reader.Close for you.
using (StreamReader reader = File.OpenText("C:\\Users\\Lewis\\file.txt"))
{
string str = reader.ReadLine();
while (str != null)
{
totalCharacters += str.Length;
str = reader.ReadLine();
}
}
// If you add the $ in front of the string, then you can interpolate expressions
Console.Write($"Number of characters in the file is : {totalCharacters}");
it's stopping after the first word
That happens because of the && i < l check in the loop: i is incremented on every iteration, but l is never updated after the first line, so the loop stops after l iterations (by the way, l is not a good name; I was sure it was 1, not l). Note also that count++ runs once per line read, so it counts lines, not characters.
If you need the total number of characters in the file, you could read the whole file into a string variable and take its Length:
var count = File.ReadAllText(path).Length;
The Length property of FileInfo gives the size of the file in bytes, which is not necessarily equal to the character count (depending on the encoding, a character may take more than one byte).
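A small sketch of that difference between bytes and characters, using UTF-8 as the example encoding (SizeDemo and its method names are made up for illustration):

```csharp
using System.Text;

public static class SizeDemo
{
    // Number of UTF-16 code units in the string, as reported by string.Length.
    public static int CharCount(string text) => text.Length;

    // Number of bytes the same text occupies when encoded as UTF-8.
    public static int Utf8ByteCount(string text) => Encoding.UTF8.GetByteCount(text);
}
```

For "h\u00e9llo" (the word "héllo"), CharCount returns 5 while Utf8ByteCount returns 6, because "é" takes two bytes in UTF-8. A file containing only that text, saved as UTF-8 without a byte order mark, would therefore report a FileInfo.Length of 6.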
And regarding the way you read - it also depends whether you want to count new line symbols and others or not.
Consider the following sample
static void Main(string[] args)
{
var sampleWithEndLine = "a\r\n";
var length1 = "a".Length;
var length2 = sampleWithEndLine.Length;
var length3 = @"a
".Length;
Console.WriteLine($"First sample: {length1}");
Console.WriteLine($"Second sample: {length2}");
Console.WriteLine($"Third sample: {length3}");
var totalCharacters = 0;
File.WriteAllText("sample.txt", sampleWithEndLine);
using(var reader = File.OpenText("sample.txt"))
{
string str = reader.ReadLine();
while (str != null)
{
totalCharacters += str.Length;
str = reader.ReadLine();
}
}
Console.WriteLine($"Second sample read with stream reader: {totalCharacters}");
Console.ReadKey();
}
For the second sample, Length returns 3 because the string actually contains three characters, while with the stream reader you get 1 because, as the documentation says, "The string that is returned does not contain the terminating carriage return or line feed. The returned value is null if the end of the input stream is reached."

Read lines with specific NewLine char sequence with StreamReader.ReadLine

Sometimes we need to read lines from a stream while treating only a specific character sequence as a newline (CRLF, but not a lone CR or LF).
StreamReader.ReadLine, as documented, treats CRLF, CR and LF all as newline sequences. That may be unacceptable if a line can contain a single CR ("\r") or a single LF ("\n") as meaningful data.
What is needed is the ability to read a stream line by line, delimited by a specific character sequence.
Here is a method that reads a line from a stream and returns it as a string:
public static string ReadLineWithFixedNewlineDelimeter(StreamReader reader, string delim)
{
if (reader.EndOfStream)
return null;
if (string.IsNullOrEmpty(delim))
{
return reader.ReadToEnd();
}
var sb = new StringBuilder();
var delimCandidatePosition = 0;
while (!reader.EndOfStream && delimCandidatePosition < delim.Length)
{
var c = (char)reader.Read();
if (c == delim[delimCandidatePosition])
{
delimCandidatePosition ++;
}
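else
{
// Restart matching; the mismatched character may itself begin the delimiter.
// (Sufficient for delimiters such as "\r\n"; not a general substring matcher.)
delimCandidatePosition = c == delim[0] ? 1 : 0;
}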
sb.Append(c);
}
return sb.ToString(0, sb.Length - (delimCandidatePosition == delim.Length ? delim.Length : 0));
}

C# Streamreader - Break on {CR}{LF} only

I am trying to count the number of rows in a text file (to compare to a control file) before performing a complex SSIS insert package.
Currently I am using a StreamReader and it is breaking a line with a {LF} embedded into a new line, whereas SSIS is using {CR}{LF} (correctly), so the counts are not tallying up.
Does anyone know an alternate method of doing this where I can count the number of lines in the file based on {CR}{LF} Line breaks only?
Thanks in advance
Iterate through the file and count number of CRLFs.
Pretty straightforward implementation:
public int CountLines(Stream stream, Encoding encoding)
{
int cur, prev = -1, lines = 0;
using (var sr = new StreamReader(stream, encoding, false, 4096, true))
{
while ((cur = sr.Read()) != -1)
{
if (prev == '\r' && cur == '\n')
lines++;
prev = cur;
}
}
//Empty stream will result in 0 lines, any content would result in at least one line
if (prev != -1)
lines++;
return lines;
}
Example usage:
using(var s = File.OpenRead(@"<your_file_path>"))
Console.WriteLine("Found {0} lines", CountLines(s, Encoding.Default));
Actually this is a "find a substring in a string" task, so more generic algorithms can be used.
{CR}{LF} is the desired one. Can't really say which is correct.
Since ReadLine strips off the end of the line, you don't know which terminator was there.
Use the StreamReader.Read() method and look for 13 followed by 10.
It returns an int.
Here's a pretty lazy way... this will read the entire file into memory.
var cnt = File.ReadAllText("yourfile.txt")
.Split(new[] { "\r\n" }, StringSplitOptions.None)
.Length;
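If the text is already in memory, the CRLF pairs can also be counted directly, without building the array of substrings that Split allocates. A minimal sketch (CrLfIndexCounter and CountCrLf are hypothetical names):

```csharp
using System;

public static class CrLfIndexCounter
{
    // Counts occurrences of "\r\n" in text by repeated ordinal IndexOf calls,
    // avoiding the array of substrings that Split would allocate.
    public static int CountCrLf(string text)
    {
        int count = 0;
        int index = 0;
        while ((index = text.IndexOf("\r\n", index, StringComparison.Ordinal)) != -1)
        {
            count++;
            index += 2;   // continue searching after this "\r\n"
        }
        return count;
    }
}
```

Note that this returns the number of "\r\n" terminators, whereas the Length of the Split result is always that number plus one.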
Here is an extension method that reads lines using {CR}{LF} as the only line separator, and not {LF}. You could do a count on it:
var count = new StreamReader(@"D:\Test.txt").ReadLinesCrLf().Count();
But you could also use it for reading files, which is sometimes useful since the normal StreamReader.ReadLine breaks on both {CR}{LF} and {LF}. It can be used on any TextReader and works in a streaming fashion (file size is not an issue).
public static IEnumerable<string> ReadLinesCrLf(this TextReader reader, int bufferSize = 4096)
{
StringBuilder lineBuffer = null;
//read buffer
char[] buffer = new char[bufferSize];
int charsRead;
// Despite its name, this flag is true when the previous character was a carriage return ('\r').
var previousIsLf = false;
while ((charsRead = reader.Read(buffer, 0, bufferSize)) != 0)
{
int bufferIndex = 0;
int writeIdx = 0;
do
{
var currentChar = buffer[bufferIndex];
switch (currentChar)
{
case '\n':
if (previousIsLf)
{
if (lineBuffer == null)
{
//return from current buffer writeIdx could be higher than 0 when multiple rows are in the buffer
yield return new string(buffer, writeIdx, bufferIndex - writeIdx - 1);
//shift write index to next character that will be read
writeIdx = bufferIndex + 1;
}
else
{
Debug.Assert(writeIdx == 0, $"Write index should be 0, when linebuffer != null");
lineBuffer.Append(buffer, writeIdx, bufferIndex - writeIdx);
Debug.Assert(lineBuffer.ToString().Last() == '\r',$"Last character in linebuffer should be a carriage return now");
lineBuffer.Length--;
//shift write index to next character that will be read
writeIdx = bufferIndex + 1;
yield return lineBuffer.ToString();
lineBuffer = null;
}
}
previousIsLf = false;
break;
case '\r':
previousIsLf = true;
break;
default:
previousIsLf = false;
break;
}
bufferIndex++;
} while (bufferIndex < charsRead);
if (writeIdx < bufferIndex)
{
if (lineBuffer == null) lineBuffer = new StringBuilder();
lineBuffer.Append(buffer, writeIdx, bufferIndex - writeIdx);
}
}
//return last row
if (lineBuffer != null && lineBuffer.Length > 0) yield return lineBuffer.ToString();
}

C# - Read External CSV File Character by Character

What is the easiest way to read a file character by character in C#?
Currently, I am reading line by line by calling System.io.file.ReadLine(). I see that there is a Read() function but it doesn't return a character...
I would also like to know how to detect the end of a line using such an approach... The input file in question is a CSV file.
Open a TextReader (e.g. by File.OpenText - note that File is a static class, so you can't create an instance of it) and repeatedly call Read. That returns int rather than char so it can also indicate end of file:
int readResult = reader.Read();
if (readResult != -1)
{
char nextChar = (char) readResult;
// ...
}
Or to loop:
int readResult;
while ((readResult = reader.Read()) != -1)
{
char nextChar = (char) readResult;
// ...
}
Or for more funky goodness:
public static IEnumerable<char> ReadCharacters(string filename)
{
using (var reader = File.OpenText(filename))
{
int readResult;
while ((readResult = reader.Read()) != -1)
{
yield return (char) readResult;
}
}
}
...
foreach (char c in ReadCharacters("foo.txt"))
{
...
}
Note that by default, File.OpenText uses UTF-8 encoding. Specify an encoding explicitly if that isn't what you want.
EDIT: To find the end of a line, you'd check whether the character is \n... you'd potentially want to handle \r specially too, if this is a Windows text file.
But if you want each line, why not just call ReadLine? You can always iterate over the characters in the line afterwards...
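If you do want to go character by character, end-of-line detection for a simple CSV could look roughly like this. It is only a sketch: CharByCharCsv and ReadRows are hypothetical names, and it assumes there are no quoted fields, so commas and newlines never appear inside data.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CharByCharCsv
{
    // Reads rows character by character. A ',' ends a field; '\n' ends a row.
    // A '\r' directly before '\n' is treated as part of the line ending.
    // Assumption: no quoted fields, so commas and newlines never appear in data.
    public static List<List<string>> ReadRows(TextReader reader)
    {
        var rows = new List<List<string>>();
        var row = new List<string>();
        var field = new StringBuilder();
        int readResult;
        while ((readResult = reader.Read()) != -1)
        {
            char c = (char)readResult;
            if (c == '\r' && reader.Peek() == '\n')
                continue;                      // skip the CR of a CRLF pair
            if (c == ',' || c == '\n')
            {
                row.Add(field.ToString());
                field.Clear();
                if (c == '\n')
                {
                    rows.Add(row);
                    row = new List<string>();
                }
            }
            else
            {
                field.Append(c);
            }
        }
        // Final row when the input does not end with a newline.
        if (field.Length > 0 || row.Count > 0)
        {
            row.Add(field.ToString());
            rows.Add(row);
        }
        return rows;
    }
}
```

Passing File.OpenText(path) or a StringReader works the same way; for "a,b\r\nc,d" it yields two rows, ["a", "b"] and ["c", "d"].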
Here is a snippet from msdn
using (StreamReader sr = new StreamReader(path))
{
char[] c = null;
while (sr.Peek() >= 0)
{
c = new char[1];
sr.Read(c, 0, c.Length);
// do something with c[0]
}
}
