SQL Server & C# UTF8 encoding not correct - c#

I am getting a very large file from a Linux box, which I import into SQL Server Express with the TOAD import wizard for testing.
The file is supposed to contain special characters like ÄäÖö... correctly, which the admin of the box confirms.
However, I see only misinterpreted characters (like Ä) everywhere: in PuTTY with less, in a text viewer on Windows, in TOAD's import wizard, inside the database, and when returning the values in .NET.
The only idea I have is to replace the characters in C#, but for that I would need a complete list of replacements.
Does anyone have such a list, a finished class, or any other idea?

I solved the problem by converting the file on the unix side:
iconv unicode unknown input format
I used iconv to upconvert the file from UTF-8 to UTF-16, which SQL Server can import correctly.
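For anyone who would rather do the same upconversion on the Windows side, here is a minimal C# sketch of what the iconv step does, assuming the input really is valid UTF-8. The class name `Utf8ToUtf16` and its method are made up for illustration. It streams the file in chunks because the file in question is very large.

```csharp
using System.IO;
using System.Text;

public static class Utf8ToUtf16
{
    // Streams a UTF-8 file into a UTF-16 LE file (with BOM) without loading it all into memory.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        // throwOnInvalidBytes: true makes an input that is not valid UTF-8 fail loudly
        // instead of silently producing replacement characters.
        var strictUtf8 = new UTF8Encoding(false, true);
        using (var reader = new StreamReader(inputPath, strictUtf8))
        using (var writer = new StreamWriter(outputPath, false, Encoding.Unicode))
        {
            var buffer = new char[64 * 1024];
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                writer.Write(buffer, 0, read);
            }
        }
    }
}
```

If the conversion throws a decoding exception, the file is probably not UTF-8 in the first place, which would also explain the garbled characters.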

Related

Unicode characters in Listview c#

Here is the thing:
I need to display Japanese characters in a ListView in a SQL-backed database manager I am currently building for a friendly company. I tried to google it, but none of the answers led anywhere. Instead of displaying the characters, it just shows "????".
Yet I am loading a properly displayed .csv file from a machine that has Japanese installed on it, and the file has been saved as UTF-8.
The font I am using is Meiryo UI. I tried Tahoma and the same thing happens. The file is loaded with an explicit encoding, and the loaded data is then put into the ListView.
I would really appreciate it if someone could help me. Thanks!
You are using a StreamReader to open the file, but you are not using that same StreamReader to read the data. Instead, you are instructing SQL Server to open the file itself with the BULK INSERT command. Prior to SQL Server 2014 SP2, BULK INSERT had no support for UTF-8.
If you are using SQL Server 2014 SP2 or above, you might consider Tom-K's answer here:
How to write UTF-8 characters using bulk insert in SQL Server?
Failing that, you must either convert the file to UTF-16 before doing the bulk insert, or use another method.
I managed to solve this. While using SQL Server 2014, I simply forgot to change the collation in the database settings. It was set to Latin instead of Japanese-Unicode BIN. Thanks to Ben for pointing me in the right direction.
Fixed

Opening a Unix file in Windows Notepad++?

I receive a file from a supplier that I download via SFTP. Our systems all run on Windows.
When I open the file in Notepad++, the status bar says "UNIX" and "UTF-8".
The special characters aren't displayed correctly.
I tried converting the file to the different formats Notepad++ offers, but none of them turned the char 'OSC' into the German letter 'ä'. Is this a known Unix-Windows thing? My google-fu obviously isn't good enough.
Which kind of conversion should I try to display the file correctly?
How can I achieve the same programmatically in C#?
It is common on Windows that a file's encoding doesn't match what the editor or even its XML header says it is. People are sloppy. Maybe it's really UTF-16, or the nonstandard Windows extended-ASCII encoding, which I think is probably cp-1252. (It's less common on *nix, since we usually just use UTF-8 and have no need for others... not that *nix users are much less sloppy.)
To figure out which encoding it is, I would make a copy of the file, delete the bits that are not a problem (leaving Mägenwil as the entire file), save it, and then run the Linux command "file", which will tell you the right encoding. (It is reliable only for small files, since it doesn't read the whole file; maybe Notepad++ does exactly the same thing.) The reason for deleting the other bits is that the file might be a mix of UTF-8, which the editor has used for detection, plus something else.
I would try the iconv command on Linux to test. For example:
iconv -f UTF-16 -t UTF-8 -o outfile infile
Any encoding conversion should be possible in C# or any other full-featured language, as long as you know how the text was mangled so you can reverse it. And if you find that the file is part UTF-8 and part something else, remember not to convert the whole file, only the parts that need it.
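As for doing this programmatically in C#, here is a rough sketch of the same "find out how it was mangled, then reverse it" approach. The class `EncodingProbe` and its method names are invented for this example. It decodes the same bytes under several candidate encodings so you can see which one yields 'ä', and then rewrites the file as UTF-8 once the real source encoding is known.

```csharp
using System.IO;
using System.Text;

public static class EncodingProbe
{
    // Decodes the same bytes under each candidate encoding, one labelled result per candidate.
    public static string[] DecodeWithCandidates(byte[] raw, params Encoding[] candidates)
    {
        var results = new string[candidates.Length];
        for (int i = 0; i < candidates.Length; i++)
        {
            results[i] = candidates[i].WebName + ": " + candidates[i].GetString(raw);
        }
        return results;
    }

    // Once the real source encoding is known, rewrite the file as UTF-8 without a BOM.
    // Note: File.ReadAllText still honours a byte order mark if the file has one.
    public static void ConvertToUtf8(string inputPath, string outputPath, Encoding source)
    {
        string text = File.ReadAllText(inputPath, source);
        File.WriteAllText(outputPath, text, new UTF8Encoding(false));
    }
}
```

A call might look like `EncodingProbe.DecodeWithCandidates(File.ReadAllBytes(path), Encoding.UTF8, Encoding.Unicode, Encoding.GetEncoding("iso-8859-1"))`, with `path` pointing at the trimmed-down copy of the file. On .NET Core and later, Windows code pages such as 1252 are only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`.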

How to import Turkish and Chinese characters into SQL database

I have a database table with translations in different languages. I'm trying to insert Chinese and Turkish characters with a C# program, but it doesn't seem to be working. I changed the collation of my database to Chinese_PRC_90_CI_AS; now the import works for Chinese, but not for Turkish.
But I don't want to change the collation every time I upload a new language. Is there a way to resolve this in my code? I'm reading an Excel file -> build insert query (with parameters (#col1,#col2,..)) -> Execute -> Result: b?lüm in database.
Can somebody please help me?
Phoenix
Fixed the problem.
I had a problem with my import routine.
I had NVarChar database fields, but when importing with parameters I used the Char database type instead of NChar.
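To illustrate the fix, here is a provider-neutral sketch that uses only the `System.Data` interfaces. The helper name `AddUnicodeText` is made up, and the original import code is not shown in this thread, so this is just one way the parameter typing could look. The point is to declare the parameter as a Unicode string type so the driver does not push the value through the database's ANSI code page.

```csharp
using System;
using System.Data;

public static class UnicodeParameters
{
    // Adds a text parameter typed as Unicode (DbType.String) rather than ANSI (DbType.AnsiString).
    public static IDbDataParameter AddUnicodeText(IDbCommand command, string name, string value)
    {
        IDbDataParameter parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.DbType = DbType.String; // with SqlClient this corresponds to nvarchar
        parameter.Value = (object)value ?? DBNull.Value;
        command.Parameters.Add(parameter);
        return parameter;
    }
}
```

With SqlClient specifically, the equivalent is to create the parameter with `SqlDbType.NVarChar` (or `SqlDbType.NChar`) instead of `SqlDbType.VarChar` or `SqlDbType.Char`.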

Asp.net renders string with wrong encoding, but PHP doesn't (MySQL)

I took over some old php application with MySQL as database. Inside the database, there are tables including content with localized strings (therefore containing special chars)
Currently there is a PHP application accessing that database. My job is to create an ASP.net (C# codebehind) application that accesses those strings as well. That works, except as far as encoding goes.
If I try to access these strings, I do get a kind of encoding problem, like 'Ändern' and 'Prüfzeichen', but only in the ASP.net application. The PHP app sets utf-8 as charset and the strings are perfectly rendered. In the ASP.net application it's gibberish, regardless of the page encoding.
In the MySQL database, the charset for the table 'translations' is set to 'latin1 -- cp1252 West European' and the collation to 'latin1_swedish_ci'.
I can't seem to figure out what PHP apparently does and ASP.net does not. I traced the PHP code and could not find any sign of special encoding handling when getting a string from the database.
The question is: how can I ensure correct encoding inside the ASP.net application without modifying the database, given that big changes to the PHP code are not possible?
Does anybody have a clue?
The best long-term solution would be to convert the table to use UTF-8 encoding:
ALTER TABLE translations CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
If the data is already in utf-8 format (even though the character set is latin1), you'll need to convert each column to the correct encoding.
This converts a column defined as being latin1 but containing utf8 to a column declared as and containing utf8:
ALTER TABLE translations CHANGE columnNameHere columnNameHere BLOB;
ALTER TABLE translations CHANGE columnNameHere columnNameHere TEXT CHARACTER SET utf8;
"I can't seem to figure out what PHP apparently does"
The PHP app sets utf-8 as the charset for the database connection, using a SET NAMES <encoding> query, where <encoding> is your page's encoding.
I finally managed to find a way to convert to UTF-8:
System.Text.Encoding.UTF8.GetString(System.Text.Encoding.Default.GetBytes("convert me"))
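A caveat on the one-liner above: it relies on `Encoding.Default`, which is the system ANSI code page on .NET Framework but UTF-8 on .NET Core and later, where the round trip would change nothing. Below is a small sketch of the same idea with the wrongly assumed encoding passed in explicitly. The class name `MojibakeRepair` is made up for this example.

```csharp
using System.Text;

public static class MojibakeRepair
{
    // Re-encodes the garbled text with the encoding it was wrongly decoded as, which
    // recovers the original bytes, then decodes those bytes as UTF-8.
    public static string Repair(string garbled, Encoding wronglyAssumed)
    {
        byte[] originalBytes = wronglyAssumed.GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }
}
```

For example, `MojibakeRepair.Repair("Pr\u00C3\u00BCfzeichen", Encoding.GetEncoding("iso-8859-1"))` gives back "Prüfzeichen". Strings that contain characters outside ISO-8859-1, such as the '„' in the garbled 'Ändern' from the question, need the Windows-1252 encoding instead. On .NET Core and later that requires registering `CodePagesEncodingProvider.Instance` first.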

C# Writing Hebrew to a db the text is written left to right e.g. olleh not hello

When writing Hebrew to a database, the text is being written left to right when it should be right to left, as Hebrew is written right to left. My app is writing "hello" and it should be writing "olleh" (in Hebrew of course).
To read the Hebrew into my app I use System.Text.Encoding.GetEncoding(1255);
The text displays correctly in my app but when written to the database it is written left to right. My question is what am I missing when writing the text to the db?
Many thanks
Jonathan
Codepage 1255 encodes the text in logical, not visual order. Since you said it displays correctly in your app but not in your database, the most likely explanation is that the database tool does not support bidirectional text when you query it interactively. That does not matter, since the users don't directly query the database. Your app does, and then properly displays the bidirectional text.
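To make "logical order" concrete, here is a tiny sketch. The class name is invented, and the word is the Hebrew "shalom" written with escapes so the source file's own encoding does not matter. It lists the characters in storage order.

```csharp
public static class LogicalOrderDemo
{
    // "shalom" in Hebrew, in logical order: shin, lamed, vav, final mem.
    public const string Shalom = "\u05E9\u05DC\u05D5\u05DD";

    // Returns the characters in storage (logical) order as code point labels.
    public static string DescribeStorageOrder(string text)
    {
        var parts = new string[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            parts[i] = "U+" + ((int)text[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }
}
```

`LogicalOrderDemo.DescribeStorageOrder(LogicalOrderDemo.Shalom)` returns "U+05E9 U+05DC U+05D5 U+05DD". The first letter of the word, shin, sits at index 0 even though a bidi-aware display draws it rightmost. A query tool without bidirectional support simply prints index 0 leftmost, which is why the stored text can look reversed there while being stored correctly.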
Is your database set up with a sort order/collation that is right-to-left? For example, SQL Server sort order 138 = Dictionary order, case-insensitive, for use with the 1255 (Hebrew) character set.
Try one of these encodings:
Encoding.UTF8;
Encoding.GetEncoding("iso-8859-8");
