Strange question mark, when setting StreamReader to beginning - c#

I am writing a program about a job interview. Everything is working properly, except one thing. When I use an outside method TotalLines (where I have a separate StreamReader), it works properly, but when I calculate totalLines inside the program, I get a question mark at the beginning of the first question. It looks like this:
?What is your name?
but the text file I am reading from contains just: What is your name?
I have no idea why that is. Maybe the problem is that I am returning the StreamReader to the beginning? I checked my encoding and everything else, but nothing worked. Thanks for your help :)
PotentialEmployee potentialEmployee = new PotentialEmployee();
using (StreamReader InterviewQuestions = new StreamReader(text, Encoding.Unicode))
{
int totalLines = 0;
while (InterviewQuestions.ReadLine() != null)
{
totalLines++;
}
InterviewQuestions.DiscardBufferedData();
InterviewQuestions.BaseStream.Seek(0, SeekOrigin.Begin);
for (int numberOfQuestions = 0; numberOfQuestions < totalLines; numberOfQuestions++)
{
string question = InterviewQuestions.ReadLine();
Console.WriteLine(question);
string response = Console.ReadLine();
potentialEmployee.Responses.Add(question, response);
}
}
But when I do the TotalLines calculation in the outside method, the question mark does not show. Any ideas, please?

It's very likely that the file starts with a byte order mark (BOM) which is being ignored by the reader initially, but then not when you "rewind" the stream.
While you could create a new reader, or even just replace it after reading it, I think it would be better to just avoid reading the file twice to start with:
foreach (var question in File.ReadLines(text, Encoding.Unicode))
{
Console.WriteLine(question);
string response = Console.ReadLine();
potentialEmployee.Responses.Add(question, response);
}
That's shorter, simpler, more efficient code that also won't display the problem you asked about.
If you want to make sure you can read the whole file before asking any questions, that's easy too:
string[] questions = File.ReadAllLines(text, Encoding.Unicode);
foreach (var question in questions)
{
Console.WriteLine(question);
string response = Console.ReadLine();
potentialEmployee.Responses.Add(question, response);
}

When you seek the stream back to the beginning, the byte order mark (BOM) is not detected and skipped again; that only happens on the first read after you create a StreamReader with an Encoding specified. On the second pass the BOM is decoded as an ordinary character, which is most likely what the console shows as a question mark.
For the BOM to be handled correctly again, you need to create a new StreamReader. You can reuse the stream if you instruct the StreamReader to leave the stream open after the reader is disposed, but be sure to seek before you create the new reader.
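A minimal sketch of that approach, assuming .NET 4.5 or later for the StreamReader constructor overload that takes leaveOpen. The class and method names (RewindDemo, ReadFirstLineTwice) are made up for illustration and are not from the question:

```csharp
using System;
using System.IO;
using System.Text;

static class RewindDemo
{
    // Hypothetical helper: reads the first line twice from the same stream,
    // using a fresh StreamReader for each pass.
    public static string[] ReadFirstLineTwice(Stream stream)
    {
        var results = new string[2];
        for (int pass = 0; pass < 2; pass++)
        {
            // Seek BEFORE creating the reader, so the new reader sees the BOM and skips it.
            stream.Seek(0, SeekOrigin.Begin);

            // leaveOpen: true keeps the underlying stream usable after the reader is disposed.
            using (var reader = new StreamReader(stream, Encoding.Unicode, true, 1024, leaveOpen: true))
            {
                results[pass] = reader.ReadLine();
            }
        }
        return results;
    }
}
```

Both passes should return the first line without a leading BOM character, because each reader performs its own preamble check.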

string s = "aasddd??dsfas?df";
s = s.Replace('?', '\0'); // strings are immutable, so the result must be assigned

Related

File Copy Program Doesn't Properly Copy File

Hello
I've been working on a terminal-like application to get better at programming in C#, just something to help me learn. I've decided to add a feature that will copy a file exactly as it is to a new file. It seems to work almost perfectly. When opened in Notepad++, the files are only a few lines apart in length, and very close to the same in actual file size. However, the duplicated copy of the file never runs; it says the file is corrupt. I have a feeling the problem is in the methods I created for reading and rewriting binary data. The code is as follows; thanks for the help. Sorry for the spaghetti code too, I get a bit sloppy when I'm messing around with new ideas.
Class that handles the file copying/writing
using System;
using System.IO;
//using System.Collections.Generic;
namespace ConsoleFileExplorer
{
class FileTransfer
{
private BinaryWriter writer;
private BinaryReader reader;
private FileStream fsc; // file to be duplicated
private FileStream fsn; // new location of file
int[] fileData;
private string _file;
public FileTransfer(String file)
{
_file = file;
fsc = new FileStream(file, FileMode.Open);
reader = new BinaryReader(fsc);
}
// Reads all the original files data to an array of bytes
public byte[] ReadAllDataToArray()
{
byte[] bytes = reader.ReadBytes((int)fsc.Length); // reading bytes from the original file
return bytes;
}
// writes the array of original byte data to a new file
public void WriteDataFromArray(byte[] fileData, string path) // got a feeling this is the problem :p
{
fsn = new FileStream(path, FileMode.Create);
writer = new BinaryWriter(fsn);
int i = 0;
while(i < fileData.Length)
{
writer.Write(fileData[i]);
i++;
}
}
}
}
Code that interacts with this class.
(The Sleep(5000) calls are there because I was expecting an error on the first attempt.)
case '3':
Console.Write("Enter source file: ");
string sourceFile = Console.ReadLine();
if (sourceFile == "")
{
Console.Clear();
Console.ForegroundColor = ConsoleColor.DarkRed;
Console.Error.WriteLine("Must input a proper file path.\n");
Console.ForegroundColor = ConsoleColor.White;
Menu();
} else {
Console.WriteLine("Copying Data"); System.Threading.Thread.Sleep(5000);
FileTransfer trans = new FileTransfer(sourceFile);
//copying the original files data
byte[] data = trans.ReadAllDataToArray();
Console.Write("Enter Location to store data: ");
string newPath = Console.ReadLine();
// Just for me to make sure it doesnt exit if i forget
if(newPath == "")
{
Console.Clear();
Console.ForegroundColor = ConsoleColor.DarkRed;
Console.Error.WriteLine("Cannot have empty path.");
Console.ForegroundColor = ConsoleColor.White;
Menu();
} else
{
Console.WriteLine("Writing data to file"); System.Threading.Thread.Sleep(5000);
trans.WriteDataFromArray(data, newPath);
Console.WriteLine("File stored.");
Console.ReadLine();
Console.Clear();
Menu();
}
}
break;
[Screenshots comparing the original file with the new file]
You're not properly disposing the file streams and the binary writer. Both tend to buffer data (which is a good thing, especially when you're writing one byte at a time). Use using, and your problem should disappear. Unless somebody is editing the file while you're reading it, of course.
BinaryReader and BinaryWriter do not just read and write "raw data". They also add metadata as needed: they're designed for serialization and deserialization, rather than plain byte I/O. Now, in the particular case of ReadBytes and Write(byte) or Write(byte[]), those really are just raw bytes; but there's not much point in using these classes just for that. Reading and writing bytes is something every Stream gives you, and that includes FileStream. There's no reason to use BinaryReader/BinaryWriter here whatsoever; the file streams give you everything you need.
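To illustrate the metadata point, here is a small sketch showing that BinaryWriter.Write(string) emits a length prefix before the encoded characters, while Write(byte[]) emits the bytes unchanged. The class and method names (BinaryWriterDemo, WriteString, WriteBytes) are hypothetical:

```csharp
using System;
using System.IO;

static class BinaryWriterDemo
{
    // Returns the bytes BinaryWriter produces for a string (length prefix + UTF-8 data).
    public static byte[] WriteString(string value)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new BinaryWriter(ms))
            {
                writer.Write(value);
            }
            // ToArray still works after the stream has been closed by the writer.
            return ms.ToArray();
        }
    }

    // Returns the bytes BinaryWriter produces for a byte array (the raw bytes, nothing added).
    public static byte[] WriteBytes(byte[] value)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new BinaryWriter(ms))
            {
                writer.Write(value);
            }
            return ms.ToArray();
        }
    }
}
```

For a short ASCII string such as "hi", the first method yields one more byte than the text itself, which is why these classes are not a neutral way to copy arbitrary data.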
A better approach would be to simply use
using (var fsn = ...)
{
fsn.Write(fileData, 0, fileData.Length);
}
or even just
File.WriteAllBytes(fileName, fileData);
Maybe you're thinking that writing a byte at a time is closer to "the metal", but that simply isn't the case. At no point during this does the CPU pass a byte at a time to the hard drive. Instead, the hard drive copies data directly from RAM, with no intervention from the CPU. And most hard drives still can't write (or read) arbitrary amounts of data from the physical media - instead, you're reading and writing whole sectors. If the system really did write a byte at a time, you'd just keep rewriting the same sector over and over again, just to write one more byte.
An even better approach would be to use the fact that you've got file streams open, and stream the files from source to destination rather than first reading everything into memory, and then writing it back to disk.
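As a rough sketch of the streaming approach, assuming .NET 4.0 or later (where Stream.CopyTo is available); the class and method names (StreamCopyDemo, CopyFile) are illustrative only:

```csharp
using System;
using System.IO;

static class StreamCopyDemo
{
    // Hypothetical helper: copies a file by streaming it in buffered chunks,
    // so the whole file never has to fit in memory.
    public static void CopyFile(string sourcePath, string destinationPath)
    {
        // The using statements guarantee both streams are flushed and closed,
        // even if an exception is thrown part-way through.
        using (var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read))
        using (var destination = new FileStream(destinationPath, FileMode.Create, FileAccess.Write))
        {
            source.CopyTo(destination);
        }
    }
}
```

Because the destination stream is disposed at the end of the using block, any buffered data is written out before the method returns.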
There is a File.Copy() method in C#; you can see it here: https://msdn.microsoft.com/ru-ru/library/c6cfw35a(v=vs.110).aspx
If you want to implement it yourself, try placing a breakpoint inside your methods and stepping through with the debugger. It's like the proverb about giving a fisherman a rod rather than a fish: the point is to learn how to catch it yourself.
Also, look at your int[] fileData field and the byte[] fileData parameter in the last method; maybe this is the problem.

In C#, How can I copy a file with arbitrary encoding, reading line by line, without adding or deleting a newline

I need to be able to take a text file with unknown encoding (e.g., UTF-8, UTF-16, ...) and copy it line by line, making specific changes as I go. In this example, I am changing the encoding, however there are other uses for this kind of processing.
What I can't figure out is how to determine if the last line has a newline! Some programs care about the difference between a file with these records:
Rec1<newline>
Rec2<newline>
And a file with these:
Rec1<newline>
Rec2
How can I tell the difference in my code so that I can take appropriate action?
using (StreamReader reader = new StreamReader(sourcePath))
using (StreamWriter writer = new StreamWriter(destinationPath, false, outputEncoding))
{
bool isFirstLine = true;
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
if (isFirstLine)
{
writer.Write(line);
isFirstLine = false;
}
else
{
writer.Write("\r\n" + line);
}
}
//if (LastLineHasNewline)
//{
// writer.Write("\n");
//}
writer.Flush();
}
The commented-out code is what I want to be able to do, but I can't figure out how to set the condition LastLineHasNewline! Remember, I have no a priori knowledge of the input file encoding.
You wrote: "Remember, I have no a priori knowledge of the input file encoding."
That's the fundamental problem to solve.
If the file could be using any encoding, then there is no concept of reading "line by line" as you can't possibly tell what the line ending is.
I suggest you first address this part, and the rest will be easy. Now, without knowing the context it's hard to say whether that means you should be asking the user for the encoding, or detecting it heuristically, or something else - but I wouldn't start trying to use the data before you can fully understand it.
As often happens, the moment you go to ask for help, the answer comes to the surface. The commented out code becomes:
if (LastLineHasNewline(reader))
{
writer.Write("\n");
}
And the function looks like this:
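// Call this only after the reader has consumed the whole file: seeking the
// base stream moves the position out from under the reader's buffer.
private static bool LastLineHasNewline(StreamReader reader)
{
// CurrentEncoding reflects a detected BOM only after the first read.
byte[] newlineBytes = reader.CurrentEncoding.GetBytes("\n");
int newlineByteCount = newlineBytes.Length;
// A file shorter than one encoded newline cannot end with a newline.
if (reader.BaseStream.Length < newlineByteCount)
return false;
// Compare the last bytes of the file with the encoded newline.
reader.BaseStream.Seek(-newlineByteCount, SeekOrigin.End);
byte[] inputBytes = new byte[newlineByteCount];
reader.BaseStream.Read(inputBytes, 0, newlineByteCount);
for (int i = 0; i < newlineByteCount; i++)
{
if (newlineBytes[i] != inputBytes[i])
return false;
}
return true;
}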

How do I read from a file?

I'm trying to get my program to read code from a .txt and then read it back to me, but for some reason, it crashes the program when I compile. Could someone let me know what I'm doing wrong? Thanks! :)
using System;
using System.IO;
public class Hello1
{
public static void Main()
{
string winDir=System.Environment.GetEnvironmentVariable("windir");
StreamReader reader=new StreamReader(winDir + "\\Name.txt");
try {
do {
Console.WriteLine(reader.ReadLine());
}
while(reader.Peek() != -1);
}
catch
{
Console.WriteLine("File is empty");
}
finally
{
reader.Close();
}
Console.ReadLine();
}
}
I don't like your solution for two simple reasons:
1) I don't like the "gotta catch 'em all" approach (a bare try/catch). To avoid it, check whether the file exists using System.IO.File.Exists("YourPath").
2) This code doesn't dispose the StreamReader with a using statement. It's better to use the using statement like this: using(StreamReader sr=new StreamReader(path)){ //Your code}
Usage example:
string path="filePath";
if (System.IO.File.Exists(path))
using (System.IO.StreamReader sr = new System.IO.StreamReader(path))
{
while (sr.Peek() > -1)
Console.WriteLine(sr.ReadLine());
}
else
Console.WriteLine("The file does not exist!");
If your file is located in the same folder as the .exe, all you need to do is StreamReader reader = new StreamReader("File.txt");
Otherwise, where File.txt is, put the full path to the file. Personally, I think it's easier if they are in the same location.
From there, it's as simple as Console.WriteLine(reader.ReadLine());
If you want to read all lines and display all at once, you could do a for loop:
for (int i = 0; i < lineAmount; i++)
{
Console.WriteLine(reader.ReadLine());
}
Use the code below if you want the result as a string instead of an array.
File.ReadAllText(Path.Combine(winDir, "Name.txt"));
Why not use System.IO.File.ReadAllLines(winDir + "\\Name.txt")?
If all you're trying to do is display this as output in the console, you could do that pretty compactly:
private static string winDir = Environment.GetEnvironmentVariable("windir");
static void Main(string[] args)
{
Console.Write(File.ReadAllText(Path.Combine(winDir, "Name.txt")));
Console.Read();
}
using(var fs = new FileStream(winDir + "\\Name.txt", FileMode.Open, FileAccess.Read))
{
using(var reader = new StreamReader(fs))
{
// your code
}
}
The .NET Framework has a variety of ways to read a text file. Each has pros and cons; let's go through two.
The first, is one that many of the other answers are recommending:
String allTxt = File.ReadAllText(Path.Combine(winDir, "Name.txt"));
This will read the entire file into a single String. It will be quick and painless. It comes with a risk though... If the file is large enough, you may run out of memory. Even if you can store the entire thing into memory, it may be large enough that you will have paging, and will make your software run quite slowly. The next option addresses this.
The second solution allows you to work with one line at a time and not load the entire file into memory:
foreach(String line in File.ReadLines(Path.Combine(winDir, "Name.txt")))
// Do Work with the single line.
Console.WriteLine(line);
This solution may take a little longer for files because it's going to do work MORE OFTEN with the contents of the file... however, it will prevent awkward memory errors.
I tend to go with the second solution, but only because I'm paranoid about loading huge Strings into memory.

Issues using StreamReader.EndOfStream?

So I'm doing a project where I am reading in a config file. The config file is just a list of strings like "D 1 1", "C 2 2", etc. Now, I haven't ever done file reading or writing in C#, so I looked it up online expecting to find some sort of rendition of C/C++ .eof(). I couldn't find one.
So what I have is...
TextReader tr = new StreamReader("/mypath");
Of all the examples I found online of how to read to the end of a file, the two that kept occurring were
while ((line = tr.ReadLine()) != null)
or
while (tr.Peek() >= 0)
I noticed that StreamReader has a bool EndOfStream but no one was suggesting it which led me to believe something was wrong with that solution. I ended up trying it like this...
while (!(tr as StreamReader).EndOfStream)
and it seems to work just fine.
So I guess my question is would I experience issues with casting a TextReader as a StreamReader and checking EndOfStream?
One obvious downside is that it makes your code StreamReader specific. Given that you can easily write the code using just TextReader, why not do so? That way if you need to use a StringReader (or something similar) for unit tests etc, there won't be any difficulties.
Personally I always use the "read a line until it's null" approach - sometimes via an extension method so that I can use
foreach (string line in reader.EnumerateLines())
{
}
EnumerateLines would then be an extension method on TextReader using an iterator block. (This means you can also use it for LINQ etc easily.)
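The answer does not show the extension method itself, so the following is only one plausible sketch of it; the class name TextReaderExtensions is an assumption:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class TextReaderExtensions
{
    // Iterator block: lazily yields one line at a time until ReadLine returns null.
    // Works with any TextReader (StreamReader, StringReader, ...).
    public static IEnumerable<string> EnumerateLines(this TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
```

Because it only depends on TextReader, the same loop can be exercised in a unit test with a StringReader instead of a file.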
Or you could use ReadAllLines, to simplify your code:
http://msdn.microsoft.com/en-us/library/s2tte0y1.aspx
This way, you let .NET take care of all the EOF/EOL management, and you focus on your content.
No, you won't experience any issues. If you look at the implementation of EndOfStream, you'll find that it just checks whether there is still data in the buffer and, if not, whether it can read more data from the underlying stream:
public bool EndOfStream
{
get
{
if (this.stream == null)
{
__Error.ReaderClosed();
}
if (this.charPos < this.charLen)
{
return false;
}
int num = this.ReadBuffer();
return num == 0;
}
}
Of course, casting in your code like that makes it dependent on StreamReader being the actual type of your reader, which isn't pretty to begin with.
Maybe read it all into a string and then parse it: StreamReader.ReadToEnd()
using (StreamReader sr = new StreamReader(path))
{
//This allows you to do one Read operation.
string contents = sr.ReadToEnd();
}
Well, StreamReader is a specialisation of TextReader, in the sense that StreamReader inherits from TextReader. So there shouldn't be a problem. :)
var arpStream = ExecuteCommandLine(cmd, arg);
arpStream.ReadLine(); // Read entries
while (!arpStream.EndOfStream)
{
var line1 = arpStream.ReadLine().Trim();
// TeststandInt.SendLogPrint(line, true);
}

Bytes consumed by StreamReader

Is there a way to know how many bytes of a stream have been used by StreamReader?
I have a project where we need to read a file that has a text header followed by the start of the binary data. My initial attempt to read this file was something like this:
private int _dataOffset;
void ReadHeader(string path)
{
using (FileStream stream = File.OpenRead(path))
{
StreamReader textReader = new StreamReader(stream);
string line; // declared outside the loop so the while condition can see it
do
{
line = textReader.ReadLine();
handleHeaderLine(line);
} while (line != "DATA"); // Yes, they used "DATA" to mark the end of the header
_dataOffset = (int)stream.Position;
}
}
private byte[] ReadDataFrame(string path, int frameNum)
{
using (FileStream stream = File.OpenRead(path))
{
stream.Seek(_dataOffset + frameNum * cbFrame, SeekOrigin.Begin);
byte[] data = new byte[cbFrame];
stream.Read(data, 0, cbFrame);
return data;
}
return null;
}
The problem is that when I set _dataOffset to stream.Position, I get the position that the StreamReader has read to, not the end of the header. As soon as I thought about it this made sense, but I still need to be able to know where the end of the header is and I'm not sure if there's a way to do it and still take advantage of StreamReader.
You can find out how many bytes the StreamReader has actually returned (as opposed to read from the stream) in a number of ways, none of them too straightforward I'm afraid.
1) Get the result of textReader.CurrentEncoding.GetByteCount(allTextReadSoFar) and then seek to this position in the stream. Note that GetByteCount takes the text itself rather than its length, and that ReadLine strips line terminators, so those have to be counted too.
2) Use some reflection hackery to retrieve the value of the private variable of the StreamReader object that corresponds to the current byte position within the internal buffer (different from the position of the stream: usually behind it, and never ahead of it). Judging by .NET Reflector, this variable seems to be named bytePos.
3) Don't bother using a StreamReader at all, but instead implement your own ReadLine function built on top of the Stream, or even BinaryReader (BinaryReader is guaranteed never to read further ahead than what you request). This custom function must read from the stream char by char, so you'd actually have to use the low-level Decoder object (unless the encoding is ASCII/ANSI, in which case things are a bit simpler because it is a single-byte encoding).
Option 1 is going to be the least efficient, I would imagine (since you're effectively re-encoding text you just decoded), and option 3 the hardest to implement, though perhaps the most elegant. I'd probably recommend against using the ugly reflection hack (option 2), even though it looks tempting, being the most direct solution and only taking a couple of lines. (To be quite honest, the StreamReader class really ought to expose this variable via a public property, but alas it does not.) So in the end it's up to you, but either option 1 or 3 should do the job nicely enough.
Hope that helps.
So the data is utf8 (the default encoding for StreamReader). This is a multibyte encoding, so IndexOf would be inadvisable. You could:
Encoding.UTF8.GetByteCount(string)
on your data so far, adding 1 or 2 bytes for the missing line ending.
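A hedged sketch of that byte-counting idea. It assumes the header is UTF-8 without a BOM, that every header line ends with the same known terminator (CRLF by default here), and that the header contains no bytes that fail to decode; none of this is confirmed by the question. The names HeaderOffset and FindDataOffset are hypothetical:

```csharp
using System;
using System.IO;
using System.Text;

static class HeaderOffset
{
    // Hypothetical helper: returns the byte offset just past the "DATA" line.
    // Assumption: UTF-8 without BOM, and a fixed, known line terminator.
    public static long FindDataOffset(Stream stream, string newline = "\r\n")
    {
        var encoding = new UTF8Encoding(false);
        int newlineBytes = encoding.GetByteCount(newline);
        long offset = 0;

        // leaveOpen: true so the caller can seek the stream afterwards.
        using (var reader = new StreamReader(stream, encoding, true, 1024, leaveOpen: true))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // ReadLine strips the terminator, so add its size back in.
                offset += encoding.GetByteCount(line) + newlineBytes;
                if (line == "DATA")
                {
                    return offset;
                }
            }
        }
        throw new InvalidDataException("No DATA marker found in the header.");
    }
}
```

The returned value can then be passed to stream.Seek in place of stream.Position, which reports how far the reader has buffered ahead rather than where the header ends. If the file mixes LF and CRLF terminators, this count will be off, which is the weakness of the approach.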
If you need to count bytes, I'd go with BinaryReader. You can take the results and cast them as needed, but I find its idea of its current position to be more reliable (since it reads in binary, it's immune to character-set problems).
So your last line contains 'DATA' + an unknown amount of data bytes. You could extract the position by using IndexOf() with your last read line. Then readjust the stream.Position.
But I am not sure if you should use ReadLine() at all in this case. Maybe it would be better to read byte by byte until you reach the 'DATA' mark.
The line breaks are easily identifiable without needing to decode the stream first (except for some encodings rarely used for text files like EBCDIC, UTF-16, UTF-32), so you can just read each line as bytes and then decode the entire line:
using (FileStream stream = File.OpenRead(path)) {
List<byte> buffer = new List<byte>();
bool hasCr = false;
bool done = false;
while (!done) {
int b = stream.ReadByte();
if (b == -1) throw new IOException("End of file reached in header.");
if (b == 13) {
hasCr = true;
} else if (b == 10 && hasCr) {
string line = Encoding.UTF8.GetString(buffer.ToArray(), 0, buffer.Count);
if (line == "DATA") {
done = true;
} else {
HandleHeaderLine(line);
}
buffer.Clear();
hasCr = false;
} else {
if (hasCr) buffer.Add(13);
hasCr = false;
buffer.Add((byte)b);
}
}
_dataOffset = stream.Position;
}
Instead of closing the stream and opening it again, you could of course just keep on reading the data.
