StreamReader read by char encoding - c#

I need to read a text file with ASCII encoding char by char. If the file contains an apostrophe (’) character, I get a question mark instead. I checked that the file is in ASCII, so that shouldn't be the problem.
StreamReader reader = new StreamReader(this.path, Encoding.ASCII);
while (!reader.EndOfStream)
{
    char chr = (char) reader.Read();
    // if I read the character ’ then the content of chr is: 63 '?'
    // but I need an apostrophe, not a question mark
}

Your source file must be saved in a different encoding: the curly apostrophe (’) is not an ASCII character, so the ASCII decoder substitutes a question mark. If you want to change the question mark (63) to an apostrophe (39), all you have to do is modify your code like so:
StreamReader reader = new StreamReader(this.path, Encoding.ASCII);
int c = 0;
while (!reader.EndOfStream)
{
    c = reader.Read();
    // Caution: this also turns genuine question marks into apostrophes.
    char chr = c == 63 ? (char)39 : (char)c;
}
Learn something new: ?: is the ternary (conditional) operator, a very nice shorthand for something like this.
(check condition) ? (result if true) : (result if false)
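As an alternative to patching characters after the fact, the reader can be given an encoding that actually contains the curly apostrophe. The sketch below is only an illustration: it assumes the file is UTF-8 (the real file may well use something else, such as Windows-1252), and `ApostropheDemo` and `ReadAllChars` are hypothetical names. It feeds the same bytes through a StreamReader twice to show the difference.

```csharp
using System;
using System.IO;
using System.Text;

public static class ApostropheDemo
{
    // Reads the data character by character using the given encoding.
    public static string ReadAllChars(byte[] data, Encoding encoding)
    {
        var sb = new StringBuilder();
        using (var reader = new StreamReader(new MemoryStream(data), encoding))
        {
            int c;
            while ((c = reader.Read()) != -1) // Read() returns -1 at end of stream
            {
                sb.Append((char)c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Assumption: the text was saved as UTF-8. \u2019 is the curly apostrophe.
        byte[] data = Encoding.UTF8.GetBytes("it\u2019s");

        // ASCII cannot represent the apostrophe's bytes, so '?' appears instead.
        Console.WriteLine(ReadAllChars(data, Encoding.ASCII));

        // With the matching encoding the apostrophe survives.
        Console.WriteLine(ReadAllChars(data, Encoding.UTF8));
    }
}
```

If the output of the second call still looks wrong for a real file, the file is probably in yet another encoding, and that encoding should be passed instead.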

Related

Get binary representation of ASCII symbol (C#)

Sorry for asking a question like that, but I'm really stuck.
I have this method for reading data from a file:
public void ReadFromFile()
{
    string fileName = @"my .txt file path";
    List<char> encoded = new List<char>();
    List<byte> converted = new List<byte>();
    using (StreamReader sr = new StreamReader(fileName))
    {
        string line = sr.ReadToEnd();
        string[] lines = line.Split('\n');
        foreach (var v in lines[2])
        {
            encoded.Add(v); // just get data I need
        }
    }
}
Now in encoded I have the F and @ symbols.
I want to get 01000110 (the representation of F) and 01000000 (the representation of @).
I tried to convert every item in List<char> encoded into bytes and then use Convert.ToString(value, 2).
But that doesn't work: it throws "Value was either too large or too small for an unsigned byte."
In the output file I have something like this:
s,01;w,000;e,1;t,001; // dictionary of character and its code
6 // number of zeros
F@ // encoded string
So what I want to do is to DECODE this thing into the input string (that is, 'sweet'). For this, I need to decode F@ into 0100011001000000.
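A minimal sketch of the conversion being asked for, assuming every character in the encoded string has a code of 255 or less. The characters with codes 0x46 and 0x40 are F and @. One plausible cause of the exception (an assumption, since the failing code is not shown) is a character above 255 reaching Convert.ToByte, for example when the file is read with an encoding that does not match how it was written. `BitStringDemo` and `ToBits` are hypothetical names.

```csharp
using System;
using System.Text;

public static class BitStringDemo
{
    // Returns the 8-bit binary representation of every character in the text.
    // Assumption: every character has a code of 255 or less.
    public static string ToBits(string text)
    {
        var sb = new StringBuilder();
        foreach (char ch in text)
        {
            if (ch > 255)
            {
                throw new ArgumentException("Character does not fit in one byte: " + (int)ch);
            }
            // Convert.ToString(value, 2) drops leading zeros, so pad back to 8 digits.
            sb.Append(Convert.ToString((int)ch, 2).PadLeft(8, '0'));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToBits("F@")); // prints 0100011001000000
    }
}
```

The padding matters here: without PadLeft, F would come out as 1000110 and the decoded bit stream would be misaligned.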

Read lines in C# without losing newline characters

At the moment, I am using C#'s inbuilt StreamReader to read lines from a file.
As is well known, if the last line is blank, the stream reader does not acknowledge this as a separate line. That is, a line must contain text, and the newline character at the end is optional for the last line.
This is having the effect on some of my files that I am losing (important, for reasons I don't want to get into) whitespace at the end of the file each time my program consumes and re-writes specific files.
Is there an implementation of TextReader available either as a part of the language or as a NuGet package which provides the ReadLine functionality, but retains the new line characters (whichever they may be) as a part of the line so that I can exactly reproduce the output? I would prefer not to have to roll my own method to consume line-based input.
Edit: it should be noted that I cannot read the whole file into memory.
You can combine ReadToEnd() with Split to get the content of your file as an array, including the empty lines.
I don't recommend using ReadToEnd() if your file is big.
For example:
string[] lines;
using (StreamReader sr = new StreamReader(path))
{
    var WholeFile = sr.ReadToEnd();
    lines = WholeFile.Split('\n');
}
private readonly char newLineMarker = Environment.NewLine.Last();
private readonly char[] newLine = Environment.NewLine.ToCharArray();
// Read() returns -1 at end of stream; cast to char that becomes '\uffff'.
private readonly char eof = '\uffff';
private IEnumerable<string> EnumerateLines(string path)
{
    using (var sr = new StreamReader(path))
    {
        char c;
        string line;
        var sb = new StringBuilder();
        while ((c = (char)sr.Read()) != eof)
        {
            sb.Append(c);
            if (c == newLineMarker &&
                (line = sb.ToString()).EndsWith(Environment.NewLine))
            {
                yield return line.Trim(newLine);
                sb.Clear();
                // Seed the buffer so that a trailing newline yields a final empty line.
                sb.Append(Environment.NewLine);
            }
        }
        if (sb.Length > 0)
            yield return sb.ToString().Trim(newLine);
    }
}
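For the original requirement (keep whichever newline characters each line ends with, without loading the whole file), a streaming reader can be sketched as follows. This is an illustration rather than a library API: `LineReaderDemo` and `ReadLinesKeepingEndings` are hypothetical names, and the sketch treats \r\n, \n and a lone \r as line endings.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineReaderDemo
{
    // Yields each line with its original line ending still attached,
    // so concatenating the results reproduces the input exactly.
    public static IEnumerable<string> ReadLinesKeepingEndings(TextReader reader)
    {
        var sb = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1)
        {
            sb.Append((char)c);
            // Treat "\r\n" as a single line ending.
            if (c == '\r' && reader.Peek() == '\n')
            {
                sb.Append((char)reader.Read());
            }
            if (c == '\r' || c == '\n')
            {
                yield return sb.ToString();
                sb.Clear();
            }
        }
        // Last line without a trailing newline.
        if (sb.Length > 0)
        {
            yield return sb.ToString();
        }
    }

    public static void Main()
    {
        using (var reader = new StringReader("first\r\nsecond\n\n"))
        {
            foreach (string line in ReadLinesKeepingEndings(reader))
            {
                // Show the line endings visibly.
                Console.WriteLine(line.Replace("\r", "\\r").Replace("\n", "\\n"));
            }
        }
    }
}
```

For a real file, a StreamReader can be passed in place of the StringReader. Only one line is held in memory at a time, and writing each yielded string back unchanged preserves trailing blank lines.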

How do I read chars from other countries such as ß ä?

How do I read chars from other countries such as ß ä?
The following code reads all chars, including chars such as 0x0D.
StreamReader srFile = new StreamReader(gstPathFileName);
char[] acBuf = null;
int iReadLength = 100;
while (srFile.Peek() >= 0) {
    acBuf = new char[iReadLength];
    // Read() may return fewer chars than requested, so keep only what was read.
    int iCharsRead = srFile.Read(acBuf, 0, iReadLength);
    string s = new string(acBuf, 0, iCharsRead);
}
But it does not correctly interpret chars such as ß ä.
I don't know what encoding the file uses. It is exported from code (into a .txt file) that was written 20 plus years ago from a C-Tree database.
The ß ä display fine with Notepad.
By default, the StreamReader constructor assumes the UTF-8 encoding (which is the de facto universal standard today). Since that's not decoding your file correctly, your characters (ß, ä) suggest that it's probably encoded using Windows-1252 (Western European):
var encoding = Encoding.GetEncoding("Windows-1252");
using (StreamReader srFile = new StreamReader(gstPathFileName, encoding))
{
    // ...
}
A closely-related encoding is ISO/IEC 8859-1. If the above gives some unexpected results, use Encoding.GetEncoding("ISO-8859-1") instead.
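The effect of the encoding choice can be seen without the original file. The sketch below is a hedged illustration: the sample bytes 0xDF and 0xE4 are what ß and ä look like in both Windows-1252 and ISO-8859-1, and `LegacyEncodingDemo` and `Decode` are hypothetical names. It uses ISO-8859-1 because that name resolves on every .NET runtime; on .NET Core and .NET 5+, requesting "Windows-1252" additionally requires calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.

```csharp
using System;
using System.IO;
using System.Text;

public static class LegacyEncodingDemo
{
    // Decodes the bytes through a StreamReader using the named encoding.
    public static string Decode(byte[] data, string encodingName)
    {
        Encoding encoding = Encoding.GetEncoding(encodingName);
        using (var reader = new StreamReader(new MemoryStream(data), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // 0xDF and 0xE4 are ß and ä in ISO-8859-1 (and in Windows-1252).
        byte[] data = { 0xDF, 0x20, 0xE4 };

        // Decoded with the matching single-byte encoding: the letters survive.
        Console.WriteLine(Decode(data, "ISO-8859-1"));

        // Decoded as UTF-8: the bytes are invalid sequences and become U+FFFD.
        Console.WriteLine(Decode(data, "utf-8"));
    }
}
```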

ReadLine is changing the text read?

I have a text file that I need to read and modify. This file comes from another program, so I cannot modify its format. I need to use it as a template and make a bunch of replacements for specific cases. One of the lines I am reading is delimited with 0xFF characters, but when I call ReadLine, the returned string is delimited with 0x3F characters instead. I have tried different encodings: with ASCII it comes back as 0x3F, and with UTF-8 it comes back as the 3 bytes 0xEF 0xBF 0xBD. The original text file seems to be in ANSI format, and the 0xFF character shows up as a "ÿ". How can I get my ReadLine (and the subsequent WriteLine) to keep this character intact?
var replacements = new Dictionary<string, string> { {"to_replace1", "replacement1"}, {"to_replace2", "replacement2"}, {"etc etc", "more replaces"} };
using (var writer = new StreamWriter(projectfileSpecific, false, Encoding.ASCII))
{
    foreach (var line in File.ReadLines(projectfileTemplate, Encoding.ASCII))
    {
        foreach (var replacement in replacements)
        {
            if (line.Contains(replacement.Key))
            {
                var replaceLine = line;
                writer.WriteLine(replaceLine.Replace(replacement.Key, replacement.Value));
            }
            else
            {
                writer.WriteLine(line);
            }
        }
    }
}
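One way to keep the 0xFF bytes intact is to read and write with the same single-byte encoding that has a character for every byte value. The sketch below assumes ISO-8859-1 is acceptable for this purpose (it maps each byte 0x00 to 0xFF to the code point with the same value, so a round trip is lossless); the file's real "ANSI" code page may be Windows-1252 instead. `RoundTripDemo` and `ReplaceKeepingBytes` are hypothetical names, and the sketch works on a byte array rather than on files so that it is self-contained. It also applies all replacements to the text once, instead of writing the line once per dictionary entry.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RoundTripDemo
{
    // Applies all replacements while leaving every other byte untouched.
    // Assumption: replacement strings contain only characters up to U+00FF.
    public static byte[] ReplaceKeepingBytes(byte[] input, IDictionary<string, string> replacements)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        string text = latin1.GetString(input);
        foreach (KeyValuePair<string, string> replacement in replacements)
        {
            text = text.Replace(replacement.Key, replacement.Value);
        }
        return latin1.GetBytes(text);
    }

    public static void Main()
    {
        byte[] input = { 0x61, 0xFF, 0x62 }; // "a", 0xFF delimiter, "b"
        var replacements = new Dictionary<string, string> { { "a", "x" } };
        byte[] output = ReplaceKeepingBytes(input, replacements);
        Console.WriteLine(BitConverter.ToString(output)); // prints 78-FF-62
    }
}
```

The same idea carries over to the file-based code: passing one such encoding object to both File.ReadLines and the StreamWriter constructor, instead of Encoding.ASCII, should keep the ÿ delimiter unchanged.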

How can I determine the index in codepage 850 for a char in C#?

I have a text file which is encoded with codepage 850. I am reading this file the following way:
using (var reader = new StreamReader(filePath, Encoding.GetEncoding(850)))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        //...
    }
    //...
}
Now, for every character in the string line in the loop above, I need the zero-based index that the character has in codepage 850, something like:
for (int i = 0; i < line.Length; i++)
{
    int indexInCodepage850 = GetIndexInCodepage850(line[i]); // ?
    //...
}
Is this possible, and what could int GetIndexInCodepage850(char c) look like?
Use Encoding.GetBytes() on the line. CP850 is an 8-bit encoding, so the byte array should have just as many elements as the string had characters, and each element is the value of the character.
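The GetBytes suggestion can be sketched as a small helper. This is an illustration under assumptions: on .NET Core and .NET 5+, code page 850 is only available after registering CodePagesEncodingProvider, so the sketch does that first (on .NET Framework that line is normally not needed and may not compile without the System.Text.Encoding.CodePages package). `Cp850Demo` is a hypothetical class name. Characters that do not exist in code page 850 are replaced by the encoder's fallback, typically '?'.

```csharp
using System;
using System.Text;

public static class Cp850Demo
{
    private static readonly Encoding Cp850 = CreateCp850();

    private static Encoding CreateCp850()
    {
        // Needed on .NET Core / .NET 5+, where code page 850 is not built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(850);
    }

    // Returns the byte value (0..255) that the character has in code page 850.
    public static int GetIndexInCodepage850(char c)
    {
        byte[] bytes = Cp850.GetBytes(new[] { c });
        return bytes[0];
    }

    public static void Main()
    {
        Console.WriteLine(GetIndexInCodepage850('A'));      // prints 65
        Console.WriteLine(GetIndexInCodepage850('\u00E4')); // ä, prints 132
    }
}
```

For whole lines, calling GetBytes once on the string is cheaper than calling the helper per character, and gives one byte per character as the answer above describes.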
Just read the file as bytes, and you have the codepage 850 character codes:
byte[] data = File.ReadAllBytes(filePath);
You don't get it separated into lines, though. The character codes for CR and LF that you need to look for in the data are 13 and 10.
You don't need to.
You are already specifying the encoding in the StreamReader constructor.
The string returned from reader.ReadLine() will already have been decoded using CP850.
