I have a log file that contains the half character, ½. I need to process this log file and write certain lines that contain that character out to a new file. However, when I open the output file in Notepad, the characters display incorrectly.
I know this is some kind of encoding issue, but I'm not sure whether the files I'm writing are simply missing the correct BOM or whether something else is wrong.
I've tried reading and writing the file with every option in the Encoding enumeration.
I'm using this code:
string line;
// Note: I've tried every member of the Encoding enumeration here
using (StreamReader sr = new StreamReader(file, Encoding.Unicode))
using (StreamWriter sw = new StreamWriter(newfile, false, Encoding.Unicode))
{
    while ((line = sr.ReadLine()) != null)
    {
        // Processing happens here; I do not alter the lines, they are
        // copied verbatim, but I do not write every line that I read.
        sw.WriteLine(line);
    }
}
When I view the original log in Notepad, the half character displays correctly. When I view the new file, it does not. This tells me the problem is not with Notepad's ability to display the character, because it works in the original.
Can anyone help me to solve this?
The solution was PEBKAC.
I was changing the encodings in a different part of the program, one that wasn't creating these files. Once I changed the correct code to use Encoding.Default, the character displays correctly.
Thanks Jon and others.
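For reference, a minimal sketch of the copy loop under that resolution. The paths and the ½ filter are placeholders, not the asker's actual code; the key point is that the same encoding is used for both the reader and the writer:

```csharp
using System.IO;
using System.Text;

string file = "input.log";      // placeholder input path
string newfile = "output.log";  // placeholder output path

// Encoding.Default is the system ANSI codepage on .NET Framework
// (e.g. Windows-1252), which is what matched the original log here.
using (StreamReader sr = new StreamReader(file, Encoding.Default))
using (StreamWriter sw = new StreamWriter(newfile, false, Encoding.Default))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        if (line.Contains("½"))   // keep only the lines of interest
            sw.WriteLine(line);
    }
}
```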
Related
I would like to parse a PostScript file, find the appropriate line number, and insert a PostScript command. So, I need to read the whole file and write it out as a new file along with the new commands I want to insert.
I'm using StreamReader and StreamWriter for this process.
StreamReader sr = new StreamReader("filename.ps", System.Text.Encoding.UTF8, true);
StreamWriter sw = new StreamWriter("updatedfilename.ps", false, System.Text.Encoding.UTF8); // second argument is append; false overwrites
When doing this, even though the commands are inserted in the appropriate location, some characters are getting lost due to encoding issues.
For example, please check the image below: in the After content, you can notice the yellow-highlighted characters, which were added during my write process.
In summary, I would like to know the process to read and write a PS file as it is without losing data because of encoding.
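The extra characters are very likely the UTF-8 byte-order mark that StreamWriter emits by default. One hedged sketch that avoids data loss entirely: decode with Latin-1 (iso-8859-1), which maps every byte value 0–255 to a character and so round-trips arbitrary PostScript bytes, and write with the same encoding (which has no BOM). The paths, insertion point, and inserted command are placeholders:

```csharp
using System.IO;
using System.Text;

// Latin-1 round-trips every byte 0–255 unchanged and writes no BOM,
// so nothing is added or lost.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

string[] lines = File.ReadAllLines("filename.ps", latin1);
using (StreamWriter sw = new StreamWriter("updatedfilename.ps", false, latin1))
{
    for (int i = 0; i < lines.Length; i++)
    {
        sw.WriteLine(lines[i]);
        if (i == 10)                          // hypothetical insertion point
            sw.WriteLine("%%MyInsertedCommand"); // hypothetical command
    }
}
```

Note this still normalizes line endings via WriteLine; if the PS file mixes CR and LF, byte-level copying is safer still.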
I'm facing a quite curious problem. I'm trying to initialize a StreamReader with a file name and an encoding parameter, but my code fails because the file has an empty line after every line of content.
What I'm trying to do is read the lines into a list. If the file does not contain empty lines, the code executes successfully.
I'm initializing the reader like this:
using (StreamReader reader = new StreamReader(filename, encoding))
{
//do stuff...
}
Any thoughts on how I could perform the operation mentioned above? This is for an automated process, so no manual tampering with the file is possible.
Thank you in advance.
I hope this helps:
using (StreamReader sr = new StreamReader(FileName))
{
    string line;
    while (!sr.EndOfStream)
    {
        // read every line
        line = sr.ReadLine();
        // Trim() also treats a line containing only spaces as empty
        if (string.IsNullOrEmpty(line.Trim()))
        {
            // ... do something
        }
    }
}
It seems that the encoding of the file was messed up when I saved it in Notepad++ (without altering anything in the file, simply clicking save), and the StreamReader did not like it. I can parse it correctly now (if I do not click "Save" accidentally), and I can handle the empty lines using
string.IsNullOrWhiteSpace
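Given that resolution, a compact way to read the file while dropping blank and whitespace-only lines in one pass (the path is a placeholder; string.IsNullOrWhiteSpace requires .NET 4 or later):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// File.ReadLines streams the file lazily; the Where clause discards
// empty lines and lines containing only whitespace.
List<string> nonEmpty = File.ReadLines("filename.txt")   // placeholder path
    .Where(l => !string.IsNullOrWhiteSpace(l))
    .ToList();
```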
Thank you all for your time.
I'm using this function to read text lines from a file:
string[] postFileLines = System.IO.File.ReadAllLines(pstPathTextBox.Text);
Inserting a few additional lines at strategic spots, then writing the text lines back to a file with:
TextWriter textW = new StreamWriter(filePath);
for (int i = 0; i < linesToWrite.Count; i++)
{
textW.WriteLine(linesToWrite[i]);
}
textW.Close();
This works perfectly well until the text file I am reading in contains an international or special character. When writing back to the file, I don't get the same character - it is a box.
Ex:
Before = W:\Contrat à faire aujourdhui\
After  = W:\Contrat � faire aujourdhui\
This webpage is portraying it as a question mark, but in the text file it's a rect white box.
Is there a way to include the correct encoding in my application to be able to handle these characters? Or, if not, throw a warning saying it was not able to properly write given line?
Add the encoding like this:
File.ReadAllLines(path, Encoding.UTF8);
and
new StreamWriter(filePath, false, Encoding.UTF8);
Hope it helps.
Use this, it works for me:
string txt = System.IO.File.ReadAllText(inpPath, Encoding.GetEncoding("iso-8859-1"));
You can try UTF-8 encoding when writing to the file as well. Note that the encoding is set on the StreamWriter constructor, not on WriteLine:
TextWriter textW = new StreamWriter(filePath, false, Encoding.UTF8);
You may need to write using a single-byte character set.
Using Encoding.GetEncodings() you can easily list all available encodings. (The "DOS" encodings are System.Text.SBCSCodePageEncoding.)
In your case you may need to use
File.ReadAllLines(path, Encoding.GetEncoding("IBM850"));
and
new StreamWriter(filePath, false, Encoding.GetEncoding("IBM850"));
Bonne journée! ;)
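To see which codepages your runtime actually offers, you can enumerate them; note that on .NET Core / .NET 5+, legacy codepages such as IBM850 additionally require registering the System.Text.Encoding.CodePages provider:

```csharp
using System;
using System.Text;

// Print every encoding the runtime knows: name, codepage number, description.
foreach (EncodingInfo info in Encoding.GetEncodings())
{
    Console.WriteLine("{0,-16} {1,6}  {2}", info.Name, info.CodePage, info.DisplayName);
}
```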
I am trying to write to a text file using a StreamWriter.
The text I am trying to write comes from a different text file.
I try:
string line = reader.ReadLine(); //reader is a streamReader I defined before
while (line != null)
{
sw.WriteLine(line); //sw is a streamWriter I defined before
line = reader.ReadLine();
}
I also tried:
while (!(reader.EndOfStream))
{
sw.WriteLine(reader.ReadLine()); //sw is a streamWriter I defined before
}
These two methods both succeed in copying text from one file to the other, but for some reason not all of the text is copied.
The text file I am copying from is very large, about 96,000 lines, and only roughly the first 95,000 lines are copied.
Therefore, I am asking whether there is a restriction on the amount of text I can write/read with a StreamWriter/StreamReader in C#.
I would also welcome suggestions on how to copy all of the text.
(I read that the Stream class has a copy method, but that is .NET 4 only, so it won't help.)
EDIT: I tried replacing the text at the end that wasn't copied with text from the start that was. I got the same problem, so it isn't a problem with the characters.
Hmm. Probably you are not flushing your stream. Try setting sw.AutoFlush = true; or, before you close sw, call sw.Flush();
I am going to guess that you are not calling flush on your output stream. This would cause the last few (sometimes a lot) of lines to not be written to the output file.
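Both answers point at the same fix. A sketch of the copy loop wrapped in using blocks, which guarantees the writer is flushed and closed even if an exception is thrown (the paths are placeholders):

```csharp
using System.IO;

// Disposing the StreamWriter at the end of the using block flushes its
// buffer; a missing Flush/Close is the classic cause of a truncated tail.
using (StreamReader reader = new StreamReader("input.txt"))    // placeholder
using (StreamWriter sw = new StreamWriter("output.txt"))       // placeholder
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        sw.WriteLine(line);
    }
}
```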
I am having an issue where I am unable to read a file that contains foreign characters. The file, I have been told, is encoded in UTF-8 format.
Here is the core of my code:
using (FileStream fileStream = fileInfo.OpenRead())
using (StreamReader reader = new StreamReader(fileStream, System.Text.Encoding.UTF8))
{
    string line;
    // test against null, not IsNullOrEmpty, or the loop stops at the first blank line
    while ((line = reader.ReadLine()) != null)
    {
        hashSet.Add(line);
    }
}
The file contains the word "achôcre" but when examining it during debugging it is adding it as "ach�cre".
(This is a profanity file so I apologize if you speak French. I for one, have no idea what that means)
The evidence clearly suggests that the file is not in UTF-8 format. Try System.Text.Encoding.Default and see if you get the correct text then; if you do, you know the file is in Windows-1252 (assuming that is your system default codepage). In that case, I recommend that you open the file in Notepad and re-save it as UTF-8, after which you can use Encoding.UTF8 normally.
Another way to check which encoding the file is actually in is to open it in your browser. If the accents display correctly, the browser has detected the correct character set, so look at the "View / Character set" menu to find out which one is selected. If the accents are not displaying correctly, change the character set via that menu until they do.
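A quick programmatic version of that check, assuming the two candidates are UTF-8 and a single-byte codepage. Latin-1 is used here because it agrees with Windows-1252 for accented letters such as ô, and Encoding.GetEncoding(1252) needs the code-pages package on .NET Core. The file path is a placeholder:

```csharp
using System.IO;
using System.Text;

// Decode the same raw bytes with both candidate encodings. If the UTF-8
// decode contains U+FFFD (the replacement character) where the single-byte
// decode shows an accented letter, the file is not UTF-8.
byte[] raw = File.ReadAllBytes("words.txt");                 // placeholder path
string asUtf8 = Encoding.UTF8.GetString(raw);
string asLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(raw);
bool notUtf8 = asUtf8.Contains("\uFFFD") && !asLatin1.Contains("\uFFFD");
```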