Problems with strings in the CSV file

I have an application that reads information from a CSV file and writes it to a database. But some characters (for example º and ç) cause problems when they are written to the database. Does anyone know how to fix this problem?
Thank you.
I'm using these lines of code to read the information from the CSV file:
string directory = @"C:\test.csv";
StreamReader stream = new StreamReader(directory);
string line = "";
line = stream.ReadLine();
string[] column = line.Split(';');

StreamReader defaults to UTF-8 encoding, and your file is in a different encoding. Try specifying it like this:
var encoding = Encoding.Unicode; // UTF-16, little-endian
StreamReader stream = new StreamReader(directory, encoding);
Note that you need to know what encoding the file is in to read it properly. I'm just guessing that it might be UTF-16; I can't know what it actually is.

You should specify the right encoding when reading the file. The default is UTF-8. Your file is probably encoded with a different encoding.

This is most likely related to the Encoding that is used when reading the file. By default, UTF8 is assumed as the Encoding. In order to read the file correctly, you need to specify the right encoding, e.g.:
string directory = @"C:\test.csv";
using (StreamReader stream = new StreamReader(directory, Encoding.ASCII))
{
    string line = stream.ReadLine();
    string[] column = line.Split(';');
}
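You can try the following encodings (the documentation of the Encoding class has a complete list):
Encoding.Default for ANSI encoding based on the current Windows code page (this holds on .NET Framework; on .NET Core and .NET 5+ Encoding.Default is UTF-8).
Encoding.ASCII for ASCII encoding (7-bit only, so characters such as º or ç cannot be represented).
Encoding.UTF* for different Unicode encodings.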
Please note that I enclosed the StreamReader in a using block so that it is disposed when it is not needed anymore.
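To see why the encoding matters, the sketch below decodes the same bytes twice. It is only an illustration: the class and method names are hypothetical, and it assumes the file was saved as ISO-8859-1 (Latin-1), which the question does not confirm.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Decodes the same bytes once as UTF-8 and once as ISO-8859-1.
    // Assumption: the bytes come from a Latin-1 file; the real file may differ.
    public static (string AsUtf8, string AsLatin1) DecodeBoth(byte[] bytes)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return (Encoding.UTF8.GetString(bytes), latin1.GetString(bytes));
    }
}
```

Calling DecodeDemo.DecodeBoth(new byte[] { 0x6E, 0xBA, 0x20, 0xE7 }) yields the text "nº ç" for the Latin-1 reading, while the UTF-8 reading contains replacement characters (U+FFFD) because 0xBA and 0xE7 are not valid UTF-8 on their own.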

Related

C# .csv-file in WinForm with Ä, Ö, Ü [duplicate]

I'm using the code below to read a text file that contains foreign characters, the file is encoded ANSI and looks fine in notepad. The code below doesn't work, when the file values are read and shown in the datagrid the characters appear as squares, could there be another problem elsewhere?
StreamReader reader = new StreamReader(inputFilePath, System.Text.Encoding.ANSI);
using (reader = File.OpenText(inputFilePath))
Thanks
Update 1: I have tried all encodings found under System.Text.Encoding. and all fail to show the file correctly.
Update 2: I've changed the file encoding (resaved the file) to unicode and used System.Text.Encoding.Unicode and it worked just fine. So why did notepad read it correctly? And why didn't System.Text.Encoding.Unicode read the ANSI file?

You may also try the Default encoding, which uses the current system's ANSI codepage.
StreamReader reader = new StreamReader(inputFilePath, Encoding.Default, true);
When you try using the Notepad "Save As" menu with the original file, look at the encoding combo box. It will tell you which encoding notepad guessed is used by the file.
Also, if it is an ANSI file, the detectEncodingFromByteOrderMarks parameter will probably not help much.

I had the same problem and my solution was simple: instead of
Encoding.ASCII
use
Encoding.GetEncoding("iso-8859-1")
The answer was found here.
Edit: more solutions. This may be a more accurate one:
Encoding.GetEncoding(1252);
Also, in some cases this will work for you too if your OS default encoding matches file encoding:
Encoding.Default;

Yes, it could be down to the actual encoding of the file, which is probably Unicode. Try UTF-8, as that is the most common Unicode encoding. Otherwise, if the file is ASCII, then standard ASCII encoding should work.

Using Encoding.Unicode won't accurately decode an ANSI file in the same way that a JPEG decoder won't understand a GIF file.
I'm surprised that Encoding.Default didn't work for the ANSI file if it really was ANSI - if you ever find out exactly which code page Notepad was using, you could use Encoding.GetEncoding(int).
In general, where possible I'd recommend using UTF-8.

Try a different encoding such as Encoding.UTF8. You can also try letting StreamReader find the encoding itself:
StreamReader reader = new StreamReader(inputFilePath, System.Text.Encoding.UTF8, true);
Edit: Just saw your update. Try letting StreamReader do the guessing.

For Swedish Å Ä Ö, the only solution from the ones above that worked was:
Encoding.GetEncoding("iso-8859-1")
Hopefully this will save someone time.

File.OpenText() always uses a UTF-8 StreamReader implicitly. Create your own StreamReader instance instead and specify the desired encoding, like this:
using (StreamReader reader = new StreamReader(@"C:\test.txt", Encoding.Default))
{
    // ...
}

I solved my problem of reading Portuguese characters by changing the source file in Notepad++.
C#
var url = System.Web.HttpContext.Current.Server.MapPath(@"~/Content/data.json");
string s = string.Empty;
using (System.IO.StreamReader sr = new System.IO.StreamReader(url, System.Text.Encoding.UTF8, true))
{
    s = sr.ReadToEnd();
}

I'm also reading an exported file that contains French and German text. I used Encoding.GetEncoding("iso-8859-1"), true, which worked without any problems.

For Arabic, I used Encoding.GetEncoding(1256). It works well.

I had a similar problem with ProcessStartInfo and the property StandardOutputEncoding. I set it for German language console output to code page 850. This way I could read the output like ausführen instead of ausf�hren.

Handling Special Characters (¦)

I'm a bit lost on how to read and write to/from text files in C# when special characters are present. I'm writing a simple script that does some cleanup on a .txt data file which contains the '¦' character as its delimiter.
foreach (string file in Directory.EnumerateFiles(@"path\raw txt", "*.txt"))
{
string contents = File.ReadAllText(file);
contents = contents.Replace("¦", ",");
File.WriteAllText(file.Replace("raw txt", "txt"), contents);
}
However, when I open the txt file in Notepad++, the delimiter is now �. What exactly is going on? What encoding is this character (¦) in, and how would I determine that? I've tried adding things like:
string contents = File.ReadAllText(file, Encoding.UTF8);
File.WriteAllText(file.Replace("raw txt", "txt"), contents, Encoding.UTF8);

Everything is now working correctly by switching the encoding to 'default' when both reading/writing.
string contents = File.ReadAllText(file, Encoding.Default);
File.WriteAllText(file.Replace("raw txt", "txt"), contents, Encoding.Default);
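One way to approach the "how would I determine that?" part is to look at the raw bytes. The sketch below is a hypothetical helper (the name HexOf is not from the original post) that shows how the same character is stored under different encodings; comparing its output with a hex view of the file hints at which encoding the file uses.

```csharp
using System;
using System.Text;

public static class ByteInspector
{
    // Returns the bytes of 'text' under 'encoding' as a hex string such as "C2-A6".
    public static string HexOf(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }
}
```

For the broken bar (U+00A6), ByteInspector.HexOf("\u00A6", Encoding.UTF8) gives "C2-A6", while the ISO-8859-1 encoding gives the single byte "A6". A file that stores the delimiter as a lone A6 byte is therefore not UTF-8, which fits the observation that an ANSI code page fixed the problem.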

Try changing the encoding in Notepad to UTF-8.

Irregular character/text encoding issue with writing back to file

I'm using this function to read text lines from a file:
string[] postFileLines = System.IO.File.ReadAllLines(pstPathTextBox.Text);
Inserting a few additional lines at strategic spots, then writing the text lines back to a file with:
TextWriter textW = new StreamWriter(filePath);
for (int i = 0; i < linesToWrite.Count; i++)
{
textW.WriteLine(linesToWrite[i]);
}
textW.Close();
This works perfectly well until the text file I am reading in contains an international or special character. When writing back to the file, I don't get the same character - it is a box.
Ex:
Before = W:\Contrat à faire aujourdhui\
After = W:\Contrat � faire aujourdhui\
This webpage is portraying it as a question mark, but in the text file it's a rect white box.
Is there a way to include the correct encoding in my application to be able to handle these characters? Or, if not, throw a warning saying it was not able to properly write given line?

Add the encoding like this:
File.ReadAllLines(path, Encoding.UTF8);
and
new StreamWriter(filePath, false, Encoding.UTF8);
Hope it helps.

Use this, it works for me:
string txt = System.IO.File.ReadAllText(inpPath, Encoding.GetEncoding("iso-8859-1"));

You can also try UTF-8 encoding when writing to the file. The encoding is set when the writer is created, not on each WriteLine call:
TextWriter textW = new StreamWriter(filePath, false, Encoding.UTF8);

You may need to use a single-byte character set.
Using Encoding.GetEncodings() you can easily list all available encodings (the "DOS" encodings are of type System.Text.SBCSCodePageEncoding).
In your case you may need to use
File.ReadAllLines(path, Encoding.GetEncoding("IBM850"));
and
new StreamWriter(filePath, false, Encoding.GetEncoding("IBM850"));
Bonne journée! ;)

How to correctly open UTF-8 files in RichTextBox?

I have a question. This code opens txt files with English text just fine, but when I try to open txt files with Cyrillic text, the Cyrillic symbols appear as "squares". Is it possible to resolve this problem?
string fileData = openFileDialog1.FileName;
StreamReader sr = new StreamReader(fileData);
richTextBox.Text = sr.ReadToEnd();
sr.Close();
SavedFile = saveFileDialog1.FileName;
dataTextBox.SaveFile(SavedFile, RichTextBoxStreamType.PlainText);
Solution:
string fileData = openFileDialog1.FileName;
StreamReader sr = new StreamReader(fileData, Encoding.Default);
richTextBox.Text = sr.ReadToEnd();
sr.Close();

And you are SURE the file is UTF8, right? If you write string str = sr.ReadToEnd();, place a breakpoint on the next line and watch str in Visual Studio, you see cyrillic text right? Try opening the file in notepad, File->Save As and select UTF8 as encoding.
The reason notepad is able to "read" the file is that it uses the user codepage, and in your case it's probably the Windows-1251 (Cyrillic) Codepage. StreamReader tries to read the file as UTF8. If you want you can force StreamReader to use a different codepage. The second parameter is the Encoding you want to use. You pass Encoding.GetEncoding(1251) for cyrillic. Sadly you must know the Encoding "a priori" (=before) reading the file.
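A minimal sketch of decoding with code page 1251 follows. It assumes a modern runtime (.NET Core or .NET 5+), where legacy code pages such as 1251 are only available after registering CodePagesEncodingProvider; on .NET Framework the registration is not needed. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class CyrillicReader
{
    // Decodes bytes as Windows-1251.
    // Assumption: running on .NET Core / .NET 5+, so the code page provider must be registered first.
    public static string DecodeWindows1251(byte[] bytes)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cyrillic = Encoding.GetEncoding(1251);
        return cyrillic.GetString(bytes);
    }
}
```

The same Encoding instance can be passed as the second argument of the StreamReader constructor when reading from a file instead of a byte array.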

StreamReader reads files as UTF-8 by default unless an encoding is explicitly specified.
Try converting the text to a Windows encoding and reading it again with the same code.

C# - Detecting encoding in a file, write change to file using the found encoding

I wrote a small program for iterating through a lot of files and applying some changes where a certain string match is found, the problem I have is that different files have different encodings. So what I would like to do is check the encoding, then overwrite the file in its original encoding.
What would be the prettiest way of doing that in C# .net 2.0?
My code looks very simple as of now;
String f1 = File.ReadAllText(fileList[i]).ToLower();
if (f1.Contains(oPath))
{
f1 = f1.Replace(oPath, nPath);
File.WriteAllText(fileList[i], f1, Encoding.Unicode);
}
I took a look at Auto encoding detect in C# which made me realize how I could detect encoding, but I am not sure how I could use that information to write in the same encoding.
Would greatly appreciate any help here.

Unfortunately, encoding is one of those subjects where there is not always a definitive answer. In many cases it's much closer to guessing the encoding than detecting it. Raymond Chen wrote an excellent blog post on this subject that is worth the read:
http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx
The gist of the article is:
If the BOM (byte order mark) exists, then you're golden.
Else it's guesswork and heuristics.
However, I still think the best approach is the one Darin mentioned in the question you linked: let StreamReader guess for you rather than reinventing the wheel. It only requires a very slight modification to your sample.
String f1;
Encoding encoding;
using (var reader = new StreamReader(fileList[i])) {
f1 = reader.ReadToEnd().ToLower();
encoding = reader.CurrentEncoding;
}
if (f1.Contains(oPath))
{
f1 = f1.Replace(oPath, nPath);
File.WriteAllText(fileList[i], f1, encoding);
}

By default, .NET uses UTF-8. It is hard to detect the character encoding because most of the time .NET will simply read the file as UTF-8. I always have problems with ANSI.
My trick is to read the file as a stream, force it to be read as UTF-8, and look for the usual characters that should be in the text. If they are found, the file is UTF-8; otherwise it is ANSI. I then tell the user that only two encodings are supported, ANSI or UTF-8. Auto-detection does not quite work for my language :p
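A related way to make the "UTF-8 or ANSI" decision is sketched below. Note that it swaps the technique: instead of looking for expected characters, it decodes strictly and treats any invalid byte sequence as a sign that the file is not UTF-8. The names are hypothetical, and the fallback encoding is an assumption supplied by the caller.

```csharp
using System;
using System.Text;

public static class EncodingGuess
{
    // Returns UTF-8 when 'bytes' is valid UTF-8, otherwise the caller's fallback encoding.
    // Caveat: pure ASCII content is also valid UTF-8, so it is reported as UTF-8.
    public static Encoding GuessUtf8OrFallback(byte[] bytes, Encoding fallback)
    {
        // throwOnInvalidBytes: true makes GetString throw instead of inserting U+FFFD.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            strictUtf8.GetString(bytes);
            return Encoding.UTF8;
        }
        catch (DecoderFallbackException)
        {
            return fallback;
        }
    }
}
```

For example, the bytes C3 A4 (the letter ä in UTF-8) are classified as UTF-8, while a lone E4 (ä in ISO-8859-1) falls back to the supplied encoding. This remains a heuristic: some legacy-encoded files happen to be valid UTF-8 as well.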

I am afraid you will have to know the encoding. For UTF-based encodings, though, you can use StreamReader's built-in functionality.
Taken from here:
With regard to encodings - you will need to have identified the encoding in order to use the StreamReader. However, the StreamReader itself can help if you create it with one of the constructor overloads that allows you to supply the flag detectEncodingFromByteOrderMarks as true (or you can use Encoding.GetPreamble and look at the byte preamble yourself).
Both these methods will only help auto-detect UTF based encodings though - so any ANSI encodings with a specified codepage will probably not be parsed correctly.
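Looking at the byte preamble yourself could be sketched roughly as follows. This is an illustrative helper with hypothetical names, not the quoted author's code; it covers only the common UTF-8, UTF-16 and UTF-32 (little-endian) marks and returns null when no mark is found.

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // Inspects the first bytes of a file and returns the matching Unicode encoding,
    // or null when no byte order mark is present (e.g. for ANSI files).
    public static Encoding DetectFromBom(byte[] head)
    {
        // UTF-32 LE must be checked before UTF-16 LE: both marks start with FF FE.
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return Encoding.UTF32;
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16, little-endian
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16, big-endian
        return null;
    }
}
```

A null result means the encoding still has to be known or guessed by other means, which is exactly the limitation the quoted passage describes.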

Probably a bit late, but I encountered the same problem myself. Using the previous answers, I found a solution that works for me. It reads in the text using a StreamReader with the default encoding, extracts the encoding used for that file, and uses a StreamWriter to write the changes back with the detected encoding. It also removes and re-adds the ReadOnly flag.
string file = "File to open";
string text;
Encoding encoding;
string oldValue = "string to be replaced";
string replacementValue = "New string";
var attributes = File.GetAttributes(file);
File.SetAttributes(file, attributes & ~FileAttributes.ReadOnly);
using (StreamReader reader = new StreamReader(file, Encoding.Default))
{
text = reader.ReadToEnd();
encoding = reader.CurrentEncoding;
reader.Close();
}
bool changedValue = false;
if (text.Contains(oldValue))
{
text = text.Replace(oldValue, replacementValue);
changedValue = true;
}
if (changedValue)
{
    using (StreamWriter write = new StreamWriter(file, false, encoding))
    {
        write.Write(text);
    }
}
// Restore the original attributes (including ReadOnly, if it was set).
File.SetAttributes(file, attributes);

The solution for all Germans => ÄÖÜäöüß
This function opens the file and determines the encoding from the BOM.
If the BOM is missing, the file is interpreted as ANSI, unless it contains UTF-8 encoded German umlauts, in which case it is detected as UTF-8.
https://stackoverflow.com/a/69312696/9134997
