Handling Special Characters (¦) - c#

I'm a bit lost on how to read and write to/from text files in C# when special characters are present. I'm writing a simple script that does some cleanup on a .txt data file which contains the '¦' character as its delimiter.
foreach (string file in Directory.EnumerateFiles(@"path\raw txt", "*.txt"))
{
    string contents = File.ReadAllText(file);
    contents = contents.Replace("¦", ",");
    File.WriteAllText(file.Replace("raw txt", "txt"), contents);
}
However, when I open the txt file in Notepad++, the delimiter is now �. What exactly is going on? What is this character's (¦) encoding, and how would I determine that? I've tried adding things like:
string contents = File.ReadAllText(file, Encoding.UTF8);
File.WriteAllText(file.Replace("raw txt", "txt"), contents, Encoding.UTF8);

Everything is now working correctly by switching the encoding to 'default' when both reading/writing.
string contents = File.ReadAllText(file, Encoding.Default);
File.WriteAllText(file.Replace("raw txt", "txt"), contents, Encoding.Default);
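For anyone wondering why the � showed up in the first place, here is a minimal sketch. It assumes the raw file is a single-byte ANSI file (Latin-1 or Windows-1252), where ¦ is stored as the single byte 0xA6; the question does not confirm this, and the class and method names are made up for illustration. Decoded as UTF-8, that lone byte is invalid and becomes U+FFFD (�), so the Replace call never sees a ¦ at all.

```csharp
using System;
using System.Text;

static class DelimiterDemo
{
    // Assumed file content "a¦b" in a single-byte ANSI encoding:
    // in both Latin-1 and Windows-1252, '¦' is the byte 0xA6.
    public static readonly byte[] AnsiBytes = { 0x61, 0xA6, 0x62 };

    // What File.ReadAllText(file) effectively does when no BOM is present.
    public static string DecodeAsUtf8(byte[] bytes) =>
        Encoding.UTF8.GetString(bytes);

    // Latin-1 is used here because it is available on every .NET runtime.
    public static string DecodeAsLatin1(byte[] bytes) =>
        Encoding.GetEncoding("iso-8859-1").GetString(bytes);

    static void Main()
    {
        string wrong = DecodeAsUtf8(AnsiBytes);
        string right = DecodeAsLatin1(AnsiBytes);

        Console.WriteLine(wrong.Contains("\uFFFD"));      // True
        Console.WriteLine(right.Replace("\u00A6", ",")); // a,b
    }
}
```

Reading and writing with the same single-byte encoding round-trips the byte unchanged, which is why switching both calls to Encoding.Default worked on that machine.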

Try changing the encoding in Notepad++ to UTF-8.

Related

UTF-8 CSV file created with C# shows Â characters in Excel

When a CSV file is generated using C# and opened in Microsoft Excel, it displays Â characters before special symbols, e.g. £
In Notepad++ the hex value for Â is: C2
So before writing the £ symbol to file, I have tried the following...
var test = "£200.00";
var replaced = test.Replace("\xC2", " ");
StreamWriter outputFile = File.CreateText("testoutput.csv"); // default UTF-8
outputFile.WriteLine(replaced);
outputFile.Close();
When opening the CSV file in Excel, I still see the "Â" character before the £ symbol (hex equivalent \xC2 \xA3); It made no difference.
Do I need to use a different encoding? or am I missing something?
Thank you @Evk and @Mortalier, your suggestions led me in the right direction...
I needed to update my StreamWriter so it would explicitly include UTF-8 BOM at the beginning http://thinkinginsoftware.blogspot.co.uk/2017/12/correctly-generate-csv-that-excel-can.html
So my code has changed from:
StreamWriter outputFile = File.CreateText("testoutput.csv"); // default UTF-8
To:
StreamWriter outputFile = new StreamWriter("testoutput.csv", false, new UTF8Encoding(true));
Or: Another solution I found here was to use a different encoding if you're only expecting Latin characters...
http://theoldsewingfactory.com/2010/12/05/saving-csv-files-in-utf8-creates-a-characters-in-excel/
StreamWriter outputFile = new StreamWriter("testoutput.csv", false, Encoding.GetEncoding("Windows-1252"));
My system will most likely use Latin and non-Latin characters, so I'm using the UTF-8 BOM solution.
Final code
var test = "£200.00";
StreamWriter outputFile = new StreamWriter("testoutput.csv", false, new UTF8Encoding(true));
outputFile.WriteLine(test);
outputFile.Close();
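To see what the BOM changes on disk, here is a small sketch that writes the same text to memory with and without the UTF-8 preamble and prints the bytes. The BomDemo class and its Encode method are hypothetical helpers, not part of the original code. The £ itself is always the two bytes C2 A3; without the leading EF BB BF, a reader that assumes an ANSI code page shows C2 as Â.

```csharp
using System;
using System.IO;
using System.Text;

static class BomDemo
{
    // Encodes text through a StreamWriter, with or without the UTF-8 BOM.
    public static byte[] Encode(string text, bool withBom)
    {
        using (var buffer = new MemoryStream())
        {
            using (var writer = new StreamWriter(buffer, new UTF8Encoding(withBom)))
            {
                writer.Write(text);
            }
            // ToArray still works after the writer has closed the stream.
            return buffer.ToArray();
        }
    }

    static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Encode("\u00A3", false))); // C2-A3
        Console.WriteLine(BitConverter.ToString(Encode("\u00A3", true)));  // EF-BB-BF-C2-A3
    }
}
```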
I tried your code, and Excel does show Â£ in the cell.
Then I tried opening the CSV with LibreOffice Calc. At first it showed Â£ there too, but
on import the program asks you about the encoding.
Once I chose UTF-8, the £ symbol was displayed correctly.
My guess is that there is in fact an issue with your encoding.
This might help with Excel https://superuser.com/questions/280603/how-to-set-character-encoding-when-opening-excel

StreamWriter encoding string incorrectly

I'm trying to append a base64 string to an existing file. Here's my code:
StreamWriter output = new StreamWriter(file, true, Encoding.ASCII);
output.WriteLine(output.NewLine + str);
Here file is the file path.
For some reason, there is one particular file (a .cs file, if it matters) where the actual text that gets appended is a string of Chinese characters. It works as expected for all the other files I've tested.
Following the multiple suggestions in the comments, I replaced Encoding.ASCII with the file's encoding, which I look up using 2Toad's answer here. That solved the problem.
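One plausible explanation for the Chinese characters, sketched below: if that one file was saved as UTF-16 (an assumption; the question does not say which encoding it had), then each pair of appended ASCII bytes is read back as a single 16-bit code unit, and many of those land in the CJK ranges. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class AppendMismatchDemo
{
    // Simulates appending ASCII bytes to a file that an editor
    // later decodes as UTF-16 little-endian (Encoding.Unicode).
    public static string ReadBackAsUtf16(string appended)
    {
        byte[] asciiBytes = Encoding.ASCII.GetBytes(appended);
        return Encoding.Unicode.GetString(asciiBytes);
    }

    static void Main()
    {
        // "QUJD" is the base64 encoding of "ABC".
        foreach (char c in ReadBackAsUtf16("QUJD"))
        {
            Console.WriteLine(((int)c).ToString("X4")); // 5551, then 444A
        }
    }
}
```

Both code units fall in CJK blocks, which matches the symptom; writing with the file's own encoding keeps the appended text readable.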

Problems with strings in the CSV file

I have an application that reads information from a CSV file and writes it to the database. But some characters (for example: º ç) cause problems when they are saved to the database. Does anyone know how to fix this problem?
Thank you.
I'm using these lines of code to read the information from the CSV file:
string directory = @"C:\test.csv";
StreamReader stream = new StreamReader(directory);
string line = "";
line = stream.ReadLine();
string[] column = line.Split(';');
StreamReader defaults to UTF-8 encoding, and your file is in a different encoding. Try specifying it like this...
var encoding = Encoding.Unicode; // UTF-16 (little-endian) in .NET
StreamReader stream = new StreamReader(directory, encoding);
Note that you need to know what encoding the file is in to properly read it... I'm just guessing that it might be UTF-16, but obviously I can't know what it is.
You should specify the right encoding when reading the file. The default is UTF-8. Your file is probably encoded with a different encoding.
This is most likely related to the Encoding that is used when reading the file. By default, UTF8 is assumed as the Encoding. In order to read the file correctly, you need to specify the right encoding, e.g.:
string directory = @"C:\test.csv";
// Replace Encoding.ASCII with the file's actual encoding.
using (StreamReader stream = new StreamReader(directory, Encoding.ASCII))
{
    string line = stream.ReadLine();
    string[] column = line.Split(';');
}
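You can try the following encodings (see this link for a complete list):
Encoding.Default for the ANSI encoding based on the current Windows code page (on .NET Framework; on .NET Core and later it is UTF-8).
Encoding.ASCII for ASCII encoding (7-bit only, so characters such as º and ç will not survive).
Encoding.UTF* for the different Unicode encodings.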
Please note that I enclosed the StreamReader in a using block so that it is disposed when it is not needed anymore.

Irregular character/text encoding issue with writing back to file

I'm using this function to read text lines from a file:
string[] postFileLines = System.IO.File.ReadAllLines(pstPathTextBox.Text);
Inserting a few additional lines at strategic spots, then writing the text lines back to a file with:
TextWriter textW = new StreamWriter(filePath);
for (int i = 0; i < linesToWrite.Count; i++)
{
textW.WriteLine(linesToWrite[i]);
}
textW.Close();
This works perfectly well until the text file I am reading in contains an international or special character. When writing back to the file, I don't get the same character - it is a box.
Ex:
Before = W:\Contrat à faire aujourdhui\
After = W:\Contrat � faire aujourdhui\
This webpage is portraying it as a question mark, but in the text file it's a white rectangular box.
Is there a way to include the correct encoding in my application to be able to handle these characters? Or, if not, throw a warning saying it was not able to properly write given line?
Add the encoding like this:
File.ReadAllLines(path, Encoding.UTF8);
and
new StreamWriter(filePath, false, Encoding.UTF8);
Hope it helps.
Use this; it works for me:
string txt = System.IO.File.ReadAllText(inpPath, Encoding.GetEncoding("iso-8859-1"));
You can also use UTF-8 when writing to the file. Note that the encoding is set when the writer is created, not on each WriteLine call:
TextWriter textW = new StreamWriter(filePath, false, Encoding.UTF8);
You may need to write using a single-byte character set.
Using Encoding.GetEncodings() you can easily list all available encodings (the "DOS" encodings are System.Text.SBCSCodePageEncoding).
In your case you may need to use
File.ReadAllLines(path, Encoding.GetEncoding("IBM850"));
and
new StreamWriter(filePath, false, Encoding.GetEncoding("IBM850"));
Bonne journée! ;)
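On the "throw a warning" part of the question: one possible approach is to decode strictly, so that bytes that are not valid UTF-8 raise an exception instead of silently turning into �. This is only a sketch; StrictUtf8 and TryDecode are hypothetical names, and it checks raw bytes rather than reading the file for you.

```csharp
using System;
using System.Text;

static class StrictUtf8
{
    // Second argument (throwOnInvalidBytes: true) makes decoding fail
    // loudly instead of substituting U+FFFD.
    private static readonly Encoding Strict = new UTF8Encoding(false, true);

    public static bool TryDecode(byte[] bytes, out string text)
    {
        try
        {
            text = Strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = string.Empty;
            return false;
        }
    }

    static void Main()
    {
        // 0xE0 is 'à' in Latin-1 but an incomplete sequence in UTF-8.
        byte[] latin1 = { 0x57, 0xE0, 0x20 };
        string text;
        Console.WriteLine(TryDecode(latin1, out text)); // False
    }
}
```

In practice you could pass File.ReadAllBytes(path) to TryDecode and, on failure, warn the user or fall back to a code page such as the ones suggested above.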

How to correctly open UTF-8 files in RichTextBox?

I have a question. This code opens txt files with English text fine, but when I try to open txt files with Cyrillic text, the Cyrillic symbols are shown as "squares". Is it possible to resolve this problem?
string fileData = openFileDialog1.FileName;
StreamReader sr = new StreamReader(fileData);
richTextBox.Text = sr.ReadToEnd();
sr.Close();
SavedFile = saveFileDialog1.FileName;
dataTextBox.SaveFile(SavedFile, RichTextBoxStreamType.PlainText);
Solution:
string fileData = openFileDialog1.FileName;
StreamReader sr = new StreamReader(fileData, Encoding.Default);
richTextBox.Text = sr.ReadToEnd();
sr.Close();
And you are SURE the file is UTF-8, right? If you write string str = sr.ReadToEnd();, place a breakpoint on the next line and watch str in Visual Studio, do you see Cyrillic text? Try opening the file in Notepad, then File->Save As and select UTF-8 as the encoding.
The reason Notepad is able to "read" the file is that it uses the user code page, and in your case it's probably the Windows-1251 (Cyrillic) code page. StreamReader tries to read the file as UTF-8. If you want, you can force StreamReader to use a different code page. The second parameter is the Encoding you want to use. You pass Encoding.GetEncoding(1251) for Cyrillic. Sadly, you must know the encoding "a priori" (i.e. before reading the file).
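A short sketch of the code page 1251 route, assuming .NET Core or .NET 5+, where the Windows code pages are only available after registering CodePagesEncodingProvider (on .NET Framework, Encoding.GetEncoding(1251) works without that step). The Cp1251Demo class is a hypothetical helper.

```csharp
using System;
using System.Text;

static class Cp1251Demo
{
    public static string Decode(byte[] bytes)
    {
        // Makes code pages such as 1251 available on .NET Core / .NET 5+.
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1251).GetString(bytes);
    }

    static void Main()
    {
        // The word "Привет" as stored in a Windows-1251 file.
        byte[] bytes = { 0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2 };
        string text = Decode(bytes);
        Console.WriteLine(text == "\u041F\u0440\u0438\u0432\u0435\u0442"); // True
    }
}
```

The same Encoding object can be passed as the second argument to StreamReader, as described above.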
StreamReader reads files as UTF-8 by default unless an encoding is explicitly specified.
Try converting the text file to a Windows (ANSI) encoding and reading it again with the same code.
