I have a text file that contains strangely encoded characters; the original content of the file was Arabic text.
As a sample, the file contains the string ÝíæáÇ ãÍÝæÑ, which is equivalent to فيولا محفور.
Some other examples:
ÈÇÑíÜÜÜÜÜÒ = باريـــــز
ÏíäÇ ÔÇÌ = دينا شاج
ßíÑãÇäì ãÍÝæÑ = كيرمانى محفور
ÇäÌì ÈÇáÝæã ãßãáÇÊ = انجى بالفوم مكملات
ÓÈÔíÇá ÑæíÇá 35 ãáã = سبشيال رويال 35 ملم
Is there any way to convert the file content back to its original Arabic characters?
Note: I am using C# programming language.
I'm not too familiar with Arabic encodings, but I assume your text file is encoded with the Windows-1256 code page.
So you need to specify that code page when reading the file:
var text = File.ReadAllText(pathToFile, Encoding.GetEncoding(1256));
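If the garbled text has already been loaded into a string (i.e. the Windows-1256 bytes were decoded as Windows-1252, which is what produces exactly this kind of mojibake), the damage can usually be reversed in memory: re-encode the string with the wrong code page to get the raw bytes back, then decode with the right one. A minimal sketch, assuming the mis-read happened via Windows-1252:

```csharp
using System;
using System.Text;

class MojibakeFix
{
    static void Main()
    {
        // On .NET Core / .NET 5+, the legacy Windows code pages must be
        // registered first (they live in System.Text.Encoding.CodePages).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string garbled = "ÝíæáÇ ãÍÝæÑ";

        // Undo the wrong decode: string -> Windows-1252 bytes -> Windows-1256 text.
        byte[] rawBytes = Encoding.GetEncoding(1252).GetBytes(garbled);
        string arabic = Encoding.GetEncoding(1256).GetString(rawBytes);

        Console.WriteLine(arabic); // فيولا محفور
    }
}
```

This only works when the wrong decode was lossless (every byte mapped to some character), which happens to be true for the 1256-as-1252 case shown in the question.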
Related
When a CSV file is generated using C# and opened in Microsoft Excel, it displays Â characters before special symbols, e.g. £.
In Notepad++ the hex value for Â is: C2
So before writing the £ symbol to the file, I tried the following...
var test = "£200.00";
var replaced = test.Replace("\xC2", " ");
StreamWriter outputFile = File.CreateText("testoutput.csv"); // default UTF-8
outputFile.WriteLine(replaced);
outputFile.Close();
When opening the CSV file in Excel, I still see the "Â" character before the £ symbol (hex equivalent \xC2 \xA3); the replace made no difference.
Do I need to use a different encoding, or am I missing something?
Thank you @Evk and @Mortalier, your suggestions led me in the right direction...
I needed to update my StreamWriter so it would explicitly include the UTF-8 BOM at the beginning of the file: http://thinkinginsoftware.blogspot.co.uk/2017/12/correctly-generate-csv-that-excel-can.html
So my code has changed from:
StreamWriter outputFile = File.CreateText("testoutput.csv"); // default UTF-8
To:
StreamWriter outputFile = new StreamWriter("testoutput.csv", false, new UTF8Encoding(true));
Or: another solution I found here was to use a different encoding, if you're only expecting Latin characters...
http://theoldsewingfactory.com/2010/12/05/saving-csv-files-in-utf8-creates-a-characters-in-excel/
StreamWriter outputFile = new StreamWriter("testoutput.csv", false, Encoding.GetEncoding("Windows-1252"));
My system will most likely use latin & non-latin characters so I'm using the UTF-8 BOM solution.
Final code
var test = "£200.00";
StreamWriter outputFile = new StreamWriter("testoutput.csv", false, new UTF8Encoding(true));
outputFile.WriteLine(test);
outputFile.Close();
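To confirm that the BOM actually made it into the file, you can inspect the first three bytes: a UTF-8 BOM is EF BB BF, which is what Excel looks for. A small sketch (the file name is just an example):

```csharp
using System;
using System.IO;
using System.Text;

class BomCheck
{
    static void Main()
    {
        string path = "testoutput.csv"; // example path

        // UTF8Encoding(true) writes the EF BB BF preamble at the start of the file.
        using (var outputFile = new StreamWriter(path, false, new UTF8Encoding(true)))
        {
            outputFile.WriteLine("£200.00");
        }

        byte[] head = new byte[3];
        using (var fs = File.OpenRead(path))
        {
            fs.Read(head, 0, 3);
        }

        Console.WriteLine(head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF
            ? "BOM present"
            : "BOM missing");
    }
}
```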
I tried your code and Excel does show Â£ in the cell.
Then I tried to open the CSV with LibreOffice Calc. At first there too was Â£, but
on import the program asks you about the encoding.
Once I chose UTF-8, the £ symbol was displayed correctly.
My guess is that in fact there is an issue with your encoding.
This might help with Excel https://superuser.com/questions/280603/how-to-set-character-encoding-when-opening-excel
I'm using the code below to read a text file that contains foreign characters. The file is ANSI-encoded and looks fine in Notepad. The code below doesn't work: when the file's values are read and shown in the DataGrid, the characters appear as squares. Could there be another problem elsewhere?
StreamReader reader = new StreamReader(inputFilePath, System.Text.Encoding.ANSI);
using (reader = File.OpenText(inputFilePath))
Thanks
Update 1: I have tried all the encodings found under System.Text.Encoding, and all of them fail to show the file correctly.
Update 2: I've changed the file encoding (re-saved the file) to Unicode and used System.Text.Encoding.Unicode, and it worked just fine. So why did Notepad read it correctly? And why didn't System.Text.Encoding.Unicode read the ANSI file?
You may also try the Default encoding, which uses the current system's ANSI codepage.
StreamReader reader = new StreamReader(inputFilePath, Encoding.Default, true);
When you use Notepad's "Save As" menu with the original file, look at the encoding combo box. It will tell you which encoding Notepad guessed the file uses.
Also, if it is an ANSI file, the detectEncodingFromByteOrderMarks parameter will probably not help much.
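The reason the detectEncodingFromByteOrderMarks flag does not help here is that "ANSI" files carry no BOM at all; detection only works when the file starts with one of the known BOM byte sequences. A quick way to see whether a given file has a BOM is to look at its first bytes (a sketch; the file names are illustrative, and UTF-32 BOMs are omitted for brevity):

```csharp
using System;
using System.IO;
using System.Text;

class BomSniffer
{
    // Returns the encoding implied by the file's BOM, or null if there is none
    // (which is the case for every "ANSI" file).
    public static Encoding DetectBom(string path)
    {
        byte[] b = new byte[4];
        int n;
        using (var fs = File.OpenRead(path))
        {
            n = fs.Read(b, 0, 4);
        }

        if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return Encoding.UTF8;
        if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) return Encoding.Unicode;          // UTF-16 LE
        if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) return Encoding.BigEndianUnicode; // UTF-16 BE
        return null;
    }

    static void Main()
    {
        // Write one file with a UTF-8 BOM and one without, then sniff both.
        File.WriteAllText("bom-sample.txt", "hello", new UTF8Encoding(true));
        File.WriteAllText("nobom-sample.txt", "hello", new UTF8Encoding(false));

        Console.WriteLine(DetectBom("bom-sample.txt")?.EncodingName ?? "no BOM");
        Console.WriteLine(DetectBom("nobom-sample.txt")?.EncodingName ?? "no BOM");
    }
}
```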
I had the same problem, and my solution was simple: instead of
Encoding.ASCII
use
Encoding.GetEncoding("iso-8859-1")
The answer was found here.
Edit: more solutions. This may be the more accurate one:
Encoding.GetEncoding(1252);
Also, in some cases this will work for you too if your OS default encoding matches file encoding:
Encoding.Default;
Yes, it could be the actual encoding of the file, probably Unicode. Try UTF-8, as that is the most common Unicode encoding. Otherwise, if the file is ASCII, then the standard ASCII encoding should work.
Using Encoding.Unicode won't accurately decode an ANSI file in the same way that a JPEG decoder won't understand a GIF file.
I'm surprised that Encoding.Default didn't work for the ANSI file if it really was ANSI - if you ever find out exactly which code page Notepad was using, you could use Encoding.GetEncoding(int).
In general, where possible I'd recommend using UTF-8.
Try a different encoding such as Encoding.UTF8. You can also try letting StreamReader find the encoding itself:
StreamReader reader = new StreamReader(inputFilePath, System.Text.Encoding.UTF8, true);
Edit: Just saw your update. Try letting StreamReader do the guessing.
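Note that when StreamReader is allowed to guess, the guess is only committed after the first read, and it can only distinguish files that start with a BOM; everything else falls back to the encoding you passed in. You can inspect what it decided via CurrentEncoding (a sketch; the file name and contents are illustrative):

```csharp
using System;
using System.IO;
using System.Text;

class DetectDemo
{
    static void Main()
    {
        // Write a sample file with a UTF-8 BOM so detection has something to find.
        File.WriteAllText("sample.txt", "höwdy", new UTF8Encoding(true));

        using (var reader = new StreamReader("sample.txt", Encoding.ASCII, true))
        {
            string line = reader.ReadLine();                         // detection happens on the first read
            Console.WriteLine(reader.CurrentEncoding.EncodingName);  // now reports the detected encoding
            Console.WriteLine(line);
        }
    }
}
```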
For Swedish Å Ä Ö, the only one of the solutions above that worked was:
Encoding.GetEncoding("iso-8859-1")
Hopefully this will save someone time.
File.OpenText() always implicitly uses a UTF-8 StreamReader. Create your own StreamReader
instance instead and specify the desired encoding,
like:
using (StreamReader reader = new StreamReader(@"C:\test.txt", Encoding.Default))
{
// ...
}
I solved my problem of reading Portuguese characters by changing the encoding of the source file in Notepad++.
C#
var url = System.Web.HttpContext.Current.Server.MapPath(@"~/Content/data.json");
string s = string.Empty;
using (System.IO.StreamReader sr = new System.IO.StreamReader(url, System.Text.Encoding.UTF8,true))
{
s = sr.ReadToEnd();
}
I'm also reading an exported file which contains French and German text. I used new StreamReader(path, Encoding.GetEncoding("iso-8859-1"), true), which worked without any problems.
For Arabic, I used Encoding.GetEncoding(1256). It works well.
I had a similar problem with ProcessStartInfo and the StandardOutputEncoding property. For German-language console output I set it to code page 850. This way I could read output like ausführen instead of ausf�hren.
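For reference, that ProcessStartInfo setup looks roughly like this; cmd.exe and code page 850 are Windows-specific assumptions, so treat this as a sketch rather than portable code:

```csharp
using System;
using System.Diagnostics;
using System.Text;

class ConsoleOutputDemo
{
    static void Main()
    {
        // Code page 850 is a legacy code page; on .NET Core / .NET 5+ it must
        // be registered via System.Text.Encoding.CodePages first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        var psi = new ProcessStartInfo("cmd.exe", "/c dir")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
            // Decode the child's output as code page 850 (Western European DOS),
            // so characters like the ü in "ausführen" survive.
            StandardOutputEncoding = Encoding.GetEncoding(850)
        };

        using (var process = Process.Start(psi))
        {
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            Console.WriteLine(output);
        }
    }
}
```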
I am using a third-party OCR library to convert an image containing Japanese characters to a text file. The text file it creates looks alright when I open it by double-clicking, but when I load it into a TextBox using the code below, it becomes strange.
this.textBox1.Text = File.ReadAllText(Outpath);
The ReadAllText method can take the encoding as a parameter.
For a Japanese file you should probably use:
this.textBox1.Text = File.ReadAllText(Outpath, Encoding.Unicode);
If your file is encoded in UTF-8:
this.textBox1.Text = File.ReadAllText(Outpath, Encoding.UTF8);
Hope this helps,
Sylvain
I'm trying to decode Base64 data which contains a mixture of English and Arabic characters. I'm using the following code to decode it:
var bytes = Convert.FromBase64String(data); //data contains base64 data
string text = Encoding.UTF8.GetString(bytes);
After decoding, I'm displaying it on the ASP page. My problem here is that the English text is displayed properly, whereas in place of the Arabic text I'm getting empty boxes and question marks, like this: ����� ���
Please suggest where I'm going wrong.
After searching for a few days, I came up with this, and it is working:
byte[] plain = Convert.FromBase64String(data);
Encoding iso = Encoding.GetEncoding("ISO-8859-6");
string newData = iso.GetString(plain);
return newData;
You should run this under debugger and see whether you get the correct Arabic text in string text:
If text is incorrect, then The bytes (after Base64 decode) are not encoded as UTF-8, but some other encoding - UTF-16, Windows-1256, etc.
If text is correct, then it gets corrupted when displayed on the ASP.NET page. In that case, you should set the page's encoding to one that supports Arabic - best is UTF-8, as Shekhar suggests.
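A quick way to run that experiment is to decode the Base64 once and print the bytes through a few candidate encodings; whichever produces readable Arabic is the one the data was actually encoded with. A sketch (the sample string here is illustrative, built from a known UTF-8 input):

```csharp
using System;
using System.Text;

class Base64Probe
{
    static void Main()
    {
        // Legacy code pages (Windows-1256, ISO-8859-6) need registering on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Example: "مرحبا" encoded as UTF-8 and then Base64.
        string data = Convert.ToBase64String(Encoding.UTF8.GetBytes("مرحبا"));

        byte[] bytes = Convert.FromBase64String(data);

        // Try the likely candidates; only one of them will look right.
        foreach (var enc in new[]
        {
            Encoding.UTF8,
            Encoding.Unicode,                  // UTF-16 LE
            Encoding.GetEncoding(1256),        // Windows Arabic
            Encoding.GetEncoding("ISO-8859-6") // ISO Arabic
        })
        {
            Console.WriteLine($"{enc.WebName}: {enc.GetString(bytes)}");
        }
    }
}
```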
Try this (note: this snippet is Java-style, not C#):
byte[] dec1_byte = Base64.decodeBase64(data.getBytes());
String dec1 = new String(dec1_byte);
byte[] newBytes = Base64.encodeBase64(dec1_byte);
String newStr = new String(newBytes);
hope this will work
Try setting the encoding on the page on which you are displaying the Arabic characters:
<%@ Page RequestEncoding="utf-8" ResponseEncoding="utf-8" %>
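Alternatively, instead of setting it per page, the encoding can be configured application-wide; a sketch of the standard web.config globalization element:

```xml
<configuration>
  <system.web>
    <globalization requestEncoding="utf-8" responseEncoding="utf-8" fileEncoding="utf-8" />
  </system.web>
</configuration>
```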
I am reading a file (line by line) full of Swedish characters like äåö, but how can I read and save strings with Swedish characters? Here is my code; I am using UTF-8 encoding:
TextReader tr = new StreamReader(@"c:\testfile.txt", System.Text.Encoding.UTF8, true);
tr.ReadLine(); // returns a string, but the Swedish characters do not appear correctly...
You need to change System.Text.Encoding.UTF8 to System.Text.Encoding.GetEncoding(1252). See below:
System.IO.TextReader tr = new System.IO.StreamReader(@"c:\testfile.txt", System.Text.Encoding.GetEncoding(1252), true);
tr.ReadLine(); // the Swedish characters now appear correctly
I figured it out myself: System.Text.Encoding.Default supports the Swedish characters.
TextReader tr = new StreamReader(@"c:\testfile.txt", System.Text.Encoding.Default, true);
System.Text.Encoding.UTF8 should be enough and it is supported both on .NET Framework and .NET Core https://learn.microsoft.com/en-us/dotnet/api/system.text.encoding?redirectedfrom=MSDN&view=netframework-4.8
If you still have issues with ��� characters (instead of ÅÖÄ), then check the source file: what encoding does it have? Maybe it's ANSI; then you have to convert it to UTF-8.
You can do it in Notepad++. You can open text file and go to Encoding - Convert to UTF-8.
Alternatively in the source code (C#):
var myString = Encoding.UTF8.GetString(File.ReadAllBytes(pathToTheTextFile));