What is the easiest way to read a file character by character in C#?
Currently, I am reading line by line by calling System.io.file.ReadLine(). I see that there is a Read() function, but it doesn't return a character.
I would also like to know how to detect the end of a line using such an approach. The input file in question is a CSV file.
Open a TextReader (e.g. by File.OpenText - note that File is a static class, so you can't create an instance of it) and repeatedly call Read. That returns int rather than char so it can also indicate end of file:
int readResult = reader.Read();
if (readResult != -1)
{
char nextChar = (char) readResult;
// ...
}
Or to loop:
int readResult;
while ((readResult = reader.Read()) != -1)
{
char nextChar = (char) readResult;
// ...
}
Or for more funky goodness:
public static IEnumerable<char> ReadCharacters(string filename)
{
using (var reader = File.OpenText(filename))
{
int readResult;
while ((readResult = reader.Read()) != -1)
{
yield return (char) readResult;
}
}
}
...
foreach (char c in ReadCharacters("foo.txt"))
{
...
}
Note that by default, File.OpenText will use an encoding of UTF-8. Specify an encoding explicitly if that isn't what you want.
EDIT: To find the end of a line, you'd check whether the character is \n... you'd potentially want to handle \r specially too, if this is a Windows text file.
But if you want each line, why not just call ReadLine? You can always iterate over the characters in the line afterwards...
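As a rough sketch of the end-of-line check described above, the helper below reads one character at a time and treats \r\n, a lone \r, and a lone \n as line breaks. The names CharScanner and ReadLinesCharByChar are made up for illustration. Note that it ignores CSV quoting, so a quoted field containing a line break would be split incorrectly.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CharScanner
{
    // Hypothetical helper: consumes the reader one character at a time
    // and yields each line without its terminating line break.
    public static IEnumerable<string> ReadLinesCharByChar(TextReader reader)
    {
        var current = new StringBuilder();
        int readResult;
        while ((readResult = reader.Read()) != -1)
        {
            char c = (char)readResult;
            if (c == '\r')
            {
                // Windows line ending: swallow the '\n' that follows, if any.
                if (reader.Peek() == '\n')
                {
                    reader.Read();
                }
                yield return current.ToString();
                current.Clear();
            }
            else if (c == '\n')
            {
                yield return current.ToString();
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        // Emit the final line when the input does not end with a line break.
        if (current.Length > 0)
        {
            yield return current.ToString();
        }
    }
}
```

It could be called with the reader returned by File.OpenText, or with a StringReader when experimenting.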
Here is a snippet from MSDN:
using (StreamReader sr = new StreamReader(path))
{
char[] c = null;
while (sr.Peek() >= 0)
{
c = new char[1];
sr.Read(c, 0, c.Length);
// do something with c[0]
}
}
Related
I am trying to write a little utility class for myself to do some formatting of text so that each line is as close as possible to 152 characters in length. I have written this code:
StreamReader sr = new StreamReader("C:\\Users\\Owner\\Videos\\XSplit\\Luke11\\Luke11fromweb.txt");
StreamWriter sw = new StreamWriter("C:\\Users\\Owner\\Videos\\XSplit\\Luke11\\Luke11raw.txt");
int count = 152;
char chunk;
do
{
for (int i = 0; i < count; i++)
{
chunk = (char)sr.Read();
sw.Write(chunk);
}
while (Char.IsWhiteSpace((char)sr.Peek()) == false && (char)sr.Peek() > -1)
{
chunk = (char)sr.Read();
sw.Write(chunk);
}
sw.WriteLine();
} while (sr.Peek() >= 0);
sr.Close();
sw.Close();
The for statement works fine. It reads and writes 152 characters without flaw. However, there is no guarantee that 152 characters will fall at the end of a word. So I wrote the nested while statement to check if the next character is a space, and if not, to read and write that character. The inner while statement is supposed to stop when it sees that the next character is a space, and then write in the line end statement.
After the reader and writer have gone through the entire document, I close them both and should have a new document where all the lines are approximately 152 characters long and end at the end of a word.
Obviously this isn't working as I anticipated and that is the reason for my question. Since the for statement works, there is something wrong in my nested while statement (perhaps the condition?) and I am not exiting the program without errors.
Any advice would be appreciated. Thanks in advance.
Your end-of-file test is incorrect:
while (Char.IsWhiteSpace((char)sr.Peek()) == false && (char)sr.Peek() > -1)
You mean:
while (Char.IsWhiteSpace((char)sr.Peek()) == false && sr.Peek() > -1)
As per the docs:
The Peek method returns an integer value in order to determine whether the end of the file, or another error has occurred. This allows a user to first check if the returned value is -1 before casting it to a Char type.
Note the words "before casting": a char is unsigned, so once the result has been cast, the comparison with -1 is always true and the loop never sees the end of the file.
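To make the casting point concrete, here is a small sketch. PeekDemo and its members are hypothetical names used only for illustration. It shows that -1 cast to char becomes '\uffff', which is why the end-of-input test has to be done on the raw int.

```csharp
using System.IO;

public static class PeekDemo
{
    // Tests for end of input on the raw int, before any cast.
    public static bool AtEnd(TextReader reader)
    {
        return reader.Peek() == -1;
    }

    // Shows what the cast does to the end-of-input marker:
    // -1 wraps around to the largest char value, '\uffff'.
    public static char CastEndMarker()
    {
        int end = -1;
        return unchecked((char)end);
    }
}
```

Because char values range from 0 to 65535, comparing a cast result with -1 can never detect the end of the input.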
Might I suggest something like the following.
using System;
using System.IO;
public class Program
{
public static void Main()
{
Console.WriteLine("Hello World");
int maxLength = 152;
string inputPath = @"c:\Users\Owner\Videos\XSplit\Luke11\Luke11fromweb.txt";
string outputPath = @"c:\Users\Owner\Videos\XSplit\Luke11\Luke11raw.txt";
try
{
if (File.Exists(outputPath))
{
File.Delete(outputPath);
}
using (StreamWriter sw = new StreamWriter(outputPath))
{
using (StreamReader sr = new StreamReader(inputPath))
{
do
{
WriteMaxPlus(sr, sw, maxLength);
}
while (sr.Peek() >= 0);
}
}
}
catch (Exception e)
{
Console.WriteLine("The process failed: {0}", e.ToString());
}
}
private static void WriteMaxPlus(StreamReader sr, StreamWriter sw, int maxLength)
{
for (int i = 0; i < maxLength; i++)
{
if (sr.Peek() >= 0)
{
sw.Write((char)sr.Read());
}
}
while (sr.Peek() >= 0 && !Char.IsWhiteSpace((char)sr.Peek()))
{
sw.Write((char)sr.Read());
}
sw.WriteLine();
}
}
Sometimes we need to read lines from a stream while treating only one specific character sequence as the newline (CRLF, but not a lone CR or LF).
StreamReader.ReadLine, as documented, treats CRLF, CR and LF all as newline sequences. That may be unacceptable if a line can contain a single CR ("\r") or a single LF ("\n") as meaningful data.
What is needed is the ability to read the stream line by line, delimited by a chosen character sequence.
Here is a method that reads one line from the stream and returns it as a string:
public static string ReadLineWithFixedNewlineDelimeter(StreamReader reader, string delim)
{
if (reader.EndOfStream)
return null;
if (string.IsNullOrEmpty(delim))
{
return reader.ReadToEnd();
}
var sb = new StringBuilder();
var delimCandidatePosition = 0;
while (!reader.EndOfStream && delimCandidatePosition < delim.Length)
{
var c = (char)reader.Read();
if (c == delim[delimCandidatePosition])
{
delimCandidatePosition ++;
}
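else
{
// Restart the match; the current character may itself begin the delimiter (e.g. "\r\r\n")
delimCandidatePosition = c == delim[0] ? 1 : 0;
}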
sb.Append(c);
}
return sb.ToString(0, sb.Length - (delimCandidatePosition == delim.Length ? delim.Length : 0));
}
How does StreamReader read all chars, including 0x0D 0x0A chars?
I have an old .txt file I am trying to convert. Many lines (but not all) end with "0x0D 0x0D 0x0A".
This code reads all of the lines.
StreamReader srFile = new StreamReader(gstPathFileName);
while (!srFile.EndOfStream) {
string stFileContents = srFile.ReadLine();
...
}
This results in extra "" strings between each .txt line. As there are some blank lines between the paragraphs, removing all "" strings removes those blank lines.
Is there a way to have StreamReader read all of the chars including the "0x0D 0x0D 0x0A"?
Edited two hours later ... the file is huge, 1.6MB.
Here is a very simple reimplementation of ReadLine. The main version returns an IEnumerable<string> because that is easier to write. Both methods are extension methods, hence the static class. The code is heavily commented, so it should be easy to read.
public static class StreamEx
{
public static string[] ReadAllLines(this TextReader tr, string separator)
{
return tr.ReadLines(separator).ToArray();
}
// StreamReader is based on TextReader
public static IEnumerable<string> ReadLines(this TextReader tr, string separator)
{
// Handling of empty file: old remains null
string old = null;
// Read buffer
var buffer = new char[128];
while (true)
{
// If we already read something
if (old != null)
{
// Look for the separator
int ix = old.IndexOf(separator, StringComparison.Ordinal);
// If found
if (ix != -1)
{
// Return the piece of line before the separator
yield return old.Remove(ix);
// Then remove the piece of line before the separator plus the separator
old = old.Substring(ix + separator.Length);
// And continue
continue;
}
}
// old doesn't contain any separator, let's read some more chars
int read = tr.ReadBlock(buffer, 0, buffer.Length);
// If there is no more chars to read, break the cycle
if (read == 0)
{
break;
}
// Add the just read chars to the old chars
// note that null + "somestring" == "somestring"
old += new string(buffer, 0, read);
// A new "round" of the while cycle will search for the separator
}
// Now we have to handle chars after the last separator
// If we read something
if (old != null)
{
// Return all the remaining characters
yield return old;
}
}
}
Note that, as written, it won't directly handle your problem :-) But it lets you select the separator you want to use. So you use "\r\n" and then you trim the excess '\r'.
Use it like this:
using (var sr = new StreamReader("somefile"))
{
// Little LINQ to strip excess \r and to make an array
// (note that by making an array you'll put all the file
// in memory)
string[] lines = sr.ReadLines("\r\n").Select(x => x.TrimEnd('\r')).ToArray();
}
or
using (var sr = new StreamReader("somefile"))
{
// Little LINQ to strip excess \r
// (note that the file will be read line by line, so only
// a line at a time is in memory (plus some remaining characters
// of the next line in the old buffer)
IEnumerable<string> lines = sr.ReadLines("\r\n").Select(x => x.TrimEnd('\r'));
foreach (string line in lines)
{
// Do something
}
}
You could always use a BinaryReader and manually read in lines a byte at a time. Keep hold of the bytes, then when you come across 0x0d 0x0d 0x0a, make a new string of the bytes for the current line.
Note:
I'm assuming that your encoding is Encoding.UTF8 but your case might be different. Accessing bytes directly, I don't know off-hand how to interpret the encoding.
If your file has extra information, e.g. a byte order mark, that will be returned too.
Here it is:
public static IEnumerable<string> ReadLinesFromStream(string fileName)
{
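using ( var fileStream = File.OpenRead(fileName) )
using ( BinaryReader binaryReader = new BinaryReader(fileStream) )
{
var bytes = new List<byte>();
// Compare stream positions rather than calling PeekChar(), which decodes characters and can throw on bytes that are not valid in the reader's encoding
while ( fileStream.Position < fileStream.Length )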
{
bytes.Add(binaryReader.ReadByte());
bool newLine = bytes.Count > 2
&& bytes[bytes.Count - 3] == 0x0d
&& bytes[bytes.Count - 2] == 0x0d
&& bytes[bytes.Count - 1] == 0x0a;
if ( newLine )
{
yield return Encoding.UTF8.GetString(bytes.Take(bytes.Count - 3).ToArray());
bytes.Clear();
}
}
if ( bytes.Count > 0 )
yield return Encoding.UTF8.GetString(bytes.ToArray());
}
}
A very easy solution (not optimized for memory consumption) could be:
var allLines = File.ReadAllText(gstPathFileName)
.Split('\n');
Then, if you need to remove trailing carriage return characters, do:
for(var i = 0; i < allLines.Length; ++i)
allLines[i] = allLines[i].TrimEnd('\r');
You can put the relevant processing into that for loop if you want. Or, if you do not want to keep the array, use this instead of the for loop:
foreach(var line in allLines.Select(x => x.TrimEnd('\r')))
{
// use 'line' here ...
}
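For a file that mixes "\r\r\n", "\r\n" and "\n" endings, one possible variation on the Split approach is to pass all three separators at once. This is only a sketch; MixedLineEndings and SplitLines are hypothetical names, and the whole text is assumed to fit in memory. String.Split tries the separators in array order at each position, so the longest one is listed first.

```csharp
using System;

public static class MixedLineEndings
{
    // Longest separator first, so "\r\r\n" is not consumed as a shorter match.
    private static readonly string[] Separators = { "\r\r\n", "\r\n", "\n" };

    // Splits text into lines, keeping empty entries so that blank lines
    // between paragraphs survive.
    public static string[] SplitLines(string text)
    {
        return text.Split(Separators, StringSplitOptions.None);
    }
}
```

The text could come from File.ReadAllText; blank lines in the input show up as empty strings in the result rather than being dropped.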
This code works well ... reads every char.
char[] acBuf = null;
int iReadLength = 100;
while (srFile.Peek() >= 0) {
acBuf = new char[iReadLength];
// Read may return fewer characters than requested, so keep the count
int iRead = srFile.Read(acBuf, 0, iReadLength);
string s = new string(acBuf, 0, iRead);
}
I am reading a file line by line with a StreamReader (sr.ReadLine()). My code treats both \r\n and a lone \n as line endings.
StreamReader sr = new System.IO.StreamReader(sPath, enc);
while (!sr.EndOfStream)
{
// reading 1 line of datafile
string sLine = sr.ReadLine();
...
How can I tell the code (instead of the universal sr.ReadLine()) to treat only a full \r\n as a new line, and not a lone \n?
It is not possible to do this using StreamReader.ReadLine.
As per msdn:
A line is defined as a sequence of characters followed by a line feed
("\n"), a carriage return ("\r"), or a carriage return immediately
followed by a line feed ("\r\n"). The string that is returned does not
contain the terminating carriage return or line feed. The returned
value is null if the end of the input stream is reached.
So you have to read the stream character by character and return a line only when you have captured \r\n.
EDIT
Here is a code sample:
private static IEnumerable<string> ReadLines(StreamReader stream)
{
StringBuilder sb = new StringBuilder();
int symbol;
// Read() returns -1 at the end of the stream, so test the result before casting it to char
while ((symbol = stream.Read()) != -1)
{
if (symbol == 13 && stream.Peek() == 10)
{
stream.Read();
string line = sb.ToString();
sb.Clear();
yield return line;
}
else
sb.Append((char)symbol);
}
yield return sb.ToString();
}
You can use it like
foreach (string line in ReadLines(stream))
{
//do something
}
You cannot do it with ReadLine, but you can do this instead:
stream.ReadToEnd().Split(new[] {"\r\n"}, StringSplitOptions.None)
For simplification, let's work over a byte array:
static int NumberOfNewLines(byte[] data)
{
int count = 0;
for (int i = 0; i < data.Length - 1; i++)
{
if (data[i] == '\r' && data[i + 1] == '\n')
count++;
}
return count;
}
If you care about efficiency, optimize away, but this should work.
You can get the bytes of a file by using System.IO.File.ReadAllBytes(string path).
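Putting the two pieces together might look like the sketch below. CrLfCounter and its method names are hypothetical, and the byte comparison assumes an encoding in which CR and LF are the single bytes 0x0D and 0x0A (ASCII or UTF-8, for example). It would not be correct for UTF-16.

```csharp
using System.IO;

public static class CrLfCounter
{
    // Counts occurrences of the byte pair 0x0D 0x0A.
    public static int CountCrLf(byte[] data)
    {
        int count = 0;
        for (int i = 0; i < data.Length - 1; i++)
        {
            if (data[i] == (byte)'\r' && data[i + 1] == (byte)'\n')
            {
                count++;
            }
        }
        return count;
    }

    // Loads the whole file into memory, then counts the pairs.
    public static int CountCrLfInFile(string path)
    {
        return CountCrLf(File.ReadAllBytes(path));
    }
}
```

Lone \n bytes are skipped, so only full \r\n pairs contribute to the count.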
What I am trying to do is read the file a.txt and output each character on its own line. I am having real difficulty solving this problem, and any help will be really appreciated. If you write code, please comment it so I can understand more clearly, as I am a beginner. Thanks.
namespace ConsoleApplication13
{
class Program
{
static void Main(string[] args)
{
using (StreamReader r = new StreamReader("a.txt"))
{
string @char;
while((@char = r.ReadBlock() != null))
foreach(char i in @char)
{
Console.WriteLine(i);
}
}
}
}
}
I want to read the file and output it character by character, each character on a new line.
OK; there's a lot of ways to do that; the simplest would be (for small files):
string body = File.ReadAllText("a.txt");
foreach (char c in body) Console.WriteLine(c);
To use ReadBlock to handle the file in chunks (not lines):
using (StreamReader r = new StreamReader("a.txt"))
{
char[] buffer = new char[1024];
int read;
while ((read = r.ReadBlock(buffer, 0, buffer.Length)) > 0)
{
for (int i = 0; i < read; i++)
Console.WriteLine(buffer[i]);
}
}
This reads blocks of up to 1024 characters at a time, then writes out whatever we read, each character on a new line. The variable read tells us how many characters we read on that iteration; the read > 0 test (hidden slightly, but it is there) asks "have we reached the end of the file?" - as ReadBlock will return 0 at the end.
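If the chunking should be hidden from the caller, the same loop can be wrapped in an iterator. The following is only a sketch; ChunkedChars and ReadChars are hypothetical names, and the default buffer size of 1024 is simply carried over from the example above.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ChunkedChars
{
    // Reads the input in blocks but hands the characters out one by one.
    public static IEnumerable<char> ReadChars(TextReader reader, int bufferSize = 1024)
    {
        var buffer = new char[bufferSize];
        int read;
        // ReadBlock returns 0 once the end of the input is reached.
        while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            // Only the first 'read' entries of the buffer are valid.
            for (int i = 0; i < read; i++)
            {
                yield return buffer[i];
            }
        }
    }
}
```

A caller can then write foreach (char c in ChunkedChars.ReadChars(r)) Console.WriteLine(c); without handling the buffer directly.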