Asp.net renders string with wrong encoding, but PHP doesn't (MySQL) - c#

I took over an old PHP application with MySQL as its database. The database has tables containing localized strings (and therefore special characters).
Currently a PHP application accesses that database. My job is to create an ASP.NET (C# code-behind) application that accesses those strings as well. That works, except where encoding is concerned.
When I read these strings, words like 'Ändern' and 'Prüfzeichen' come out garbled, but only in the ASP.NET application. The PHP app sets utf-8 as charset and the strings are rendered perfectly. In the ASP.NET application they are gibberish, regardless of the page encoding.
In the MySQL database, the charset for the table in question, 'translations', is set to 'latin1 -- cp1252 West European' and the collation to 'latin1_swedish_ci'.
I can't seem to figure out what PHP apparently does and ASP.NET does not. I traced the PHP code and could not find any sign of special encoding handling while getting a string from the database.
The question is: how can I ensure correct encoding inside the ASP.NET application without modifying the database, given that big changes to the PHP code are not possible?
Does anybody have a clue?

The best long-term solution would be to convert the table to use UTF-8 encoding:
ALTER TABLE translations CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
If the data is already in utf-8 format (even though the character set is latin1), you'll need to convert each column to the correct encoding.
This converts a column defined as being latin1 but containing utf8 to a column declared as and containing utf8:
ALTER TABLE translations CHANGE columnNameHere columnNameHere BLOB;
ALTER TABLE translations CHANGE columnNameHere columnNameHere TEXT CHARACTER SET utf8;

I can't seem to figure out what PHP apparently does,
The PHP app sets utf-8 as the charset for the database connection, with a SET NAMES <encoding> query, where <encoding> is your page's encoding.

I finally managed to find a way to convert the strings to UTF-8:
System.Text.Encoding.UTF8.GetString(System.Text.Encoding.Default.GetBytes("convert me"))
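The one-liner above relies on Encoding.Default, which is the machine's ANSI code page on .NET Framework but UTF-8 on .NET Core and later, so its result depends on where it runs. A minimal sketch that names the code page explicitly, assuming the column holds UTF-8 bytes that were decoded as cp1252 on the way out of MySQL. RepairMojibake is a hypothetical helper name, not part of any library, and the provider registration is only needed (and only available) on .NET Core 3.0 or later:

```csharp
using System;
using System.Text;

// Assumption: running on .NET Core 3.0+ / .NET 5+, where Windows code pages
// must be registered before GetEncoding(1252) works. On .NET Framework,
// drop this line; code page 1252 is available out of the box there.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Simulate what the application receives: the UTF-8 bytes of "Prüfzeichen"
// wrongly decoded as cp1252.
string original = "Pr\u00FCfzeichen";
string broken = Encoding.GetEncoding(1252).GetString(Encoding.UTF8.GetBytes(original));

string repaired = RepairMojibake(broken);
Console.WriteLine(repaired == original); // True when the round trip restored the text

// Hypothetical helper: turns the mis-decoded string back into the bytes it
// came from (cp1252), then decodes those bytes as what they really are (UTF-8).
static string RepairMojibake(string misdecoded)
{
    Encoding cp1252 = Encoding.GetEncoding(1252);
    byte[] originalBytes = cp1252.GetBytes(misdecoded);
    return Encoding.UTF8.GetString(originalBytes);
}
```

This only makes sense for strings that really were mis-decoded; applying it to text that is already correct would damage it, so it is a workaround for the case where the database cannot be changed.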

Related

How to store other language in LiteDB?

I am making a quiz game in C# in which I want to store questions in Korean. I'm not sure whether the Korean text is being stored correctly, but when I fetch the data it shows ??? instead of Korean characters. What can I do to display or use Korean characters in my program?
Your issue will be around the code page being used.
You can test the encoding with this pragma:
PRAGMA encoding;
Note that you can’t change the encoding for an existing database. You will need to create a new database with a specific encoding then open a SQLite connection to a new file.
Then:
PRAGMA encoding = "UTF-8"; // change as needed
You’ll need to recreate the schema / structure and import all the data.

My website is displaying Arabic text as question mark symbols

I am making a website that displays an Arabic news RSS feed. I insert the title, body (description) and link of each news item into a SQL Server database, but they are stored as (?) symbols, so when I request the data to display it on the web page it shows (?) symbols. How can I make it display the Arabic characters?
I tried
<globalization requestEncoding="utf-8" responseEncoding="utf-8" />
but that was not the solution. Any help, please?
Make sure the data type in your database allows insertion of special (e.g. Unicode) characters. In SQL Server, for example, you should use the nvarchar data type instead of varchar. What is your RDBMS?
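Why varchar produces question marks can be reproduced without a database. A rough sketch, assuming the varchar column uses a Latin1 collation (and therefore code page 1252) and that the code runs on .NET Core 3.0 or later; the sample string is just an illustration:

```csharp
using System;
using System.Text;

// Assumption: .NET Core 3.0+ / .NET 5+. On .NET Framework, drop this line.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// cp1252 with an explicit "?" replacement for characters it cannot represent,
// which mimics what happens when Unicode text is squeezed into such a column.
Encoding cp1252 = Encoding.GetEncoding(
    1252,
    new EncoderReplacementFallback("?"),
    new DecoderReplacementFallback("?"));

// Five Arabic letters (the word "marhaban", written with escapes so the
// source file's own encoding does not matter).
string arabic = "\u0645\u0631\u062D\u0628\u0627";

// Single-byte code page: every Arabic letter is lost.
string viaCodePage = cp1252.GetString(cp1252.GetBytes(arabic));

// UTF-16, which is what nvarchar stores: the text survives unchanged.
string viaUtf16 = Encoding.Unicode.GetString(Encoding.Unicode.GetBytes(arabic));

Console.WriteLine(viaCodePage);        // ?????
Console.WriteLine(viaUtf16 == arabic); // True
```

Once the characters have been replaced, the original text cannot be recovered from the stored value, so rows that already contain question marks need to be re-imported after the column type is fixed.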
A few suggestions:
Make sure that the database tables that will store the Arabic data have the proper collation.
You'll probably need Arabic_CI_AS instead of the default Latin1_General_CI_AS.
Make sure that the database columns are set to nvarchar.
Make sure that any JavaScripts that are used on your website are saved with UTF8 encoding.
I just bumped into this link in my Smashing Magazine newsletter, it might provide some useful additional info on UTF8 and common difficulties people have with it:
http://the-pastry-box-project.net/oli-studholme/2013-october-8/

SQL Server & C# UTF8 encoding not correct

I am getting a very big file from a Linux box, which I import into SQL Server Express with the TOAD import wizard for testing.
The file is supposed to use special characters like ÄäÖö... correctly, which the admin of the box confirms.
I see only misinterpreted characters (for example in place of Ä) via PuTTY and less, in a text viewer on Windows, in TOAD's import wizard, inside the database, and when returning the values in .NET.
The only idea I have is to replace the characters in C#, but for that I would need a complete list of replacements.
Does anyone have such a list, a finished class, or any other idea?
I solved the problem by converting the file on the Unix side (see: iconv unicode unknown input format).
Use iconv to upconvert UTF-8 to UTF-16, which SQL Server can import correctly.
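If iconv is not at hand, the same upconversion can be done on the .NET side. A minimal sketch, assuming the input really is valid UTF-8 and that UTF-16 little-endian with a byte order mark is what the import expects; TranscodeUtf8ToUtf16 and the file names are hypothetical:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical file names in the temp folder so the sketch runs anywhere.
string source = Path.Combine(Path.GetTempPath(), "export-utf8.txt");
string target = Path.Combine(Path.GetTempPath(), "export-utf16.txt");

// Small stand-in for the real export: "ÄäÖö" as UTF-8 without a BOM.
string sample = "\u00C4\u00E4\u00D6\u00F6\n";
File.WriteAllText(source, sample, new UTF8Encoding(false));

TranscodeUtf8ToUtf16(source, target);
Console.WriteLine(File.ReadAllText(target, Encoding.Unicode) == sample);

// Streams the file in chunks, so a very big file never has to fit in memory.
static void TranscodeUtf8ToUtf16(string sourcePath, string targetPath)
{
    // throwOnInvalidBytes: true makes a file that is not really UTF-8 fail
    // loudly instead of being converted into garbage.
    var strictUtf8 = new UTF8Encoding(false, true);
    using var reader = new StreamReader(sourcePath, strictUtf8);
    // Encoding.Unicode is UTF-16 little-endian; StreamWriter emits its BOM.
    using var writer = new StreamWriter(targetPath, false, Encoding.Unicode);

    char[] buffer = new char[64 * 1024];
    int read;
    while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        writer.Write(buffer, 0, read);
    }
}
```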

German Letters encoding problem

I get HTML from a web page that is in German and have to insert that HTML into a database, but when I insert it the German letters do not appear correctly.
E.g. Bundesstraße appears as Bundesstra&szlig;e. I am using C# and a MySQL database.
It seems like special characters are encoded as html entities (http://www.w3schools.com/tags/ref_entities.asp) on the website. When using UTF8 this isn't necessary, but many sites still do it.
If you want to have the exact html as it is on the website these encoded entities are correct.
To decode the entities you can use System.Net.WebUtility.HtmlDecode(yourString).
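A short sketch of that call, assuming the stored value contains the named entity for ß (the sample strings are illustrative, not taken from the asker's page):

```csharp
using System;
using System.Net;

// HTML as it might arrive from the page: ß written as a named entity.
string stored = "Bundesstra&szlig;e";

// HtmlDecode turns named and numeric entities back into characters.
string decoded = WebUtility.HtmlDecode(stored);
string decodedNumeric = WebUtility.HtmlDecode("Bundesstra&#223;e");

Console.WriteLine(decoded == "Bundesstra\u00DFe");        // True
Console.WriteLine(decodedNumeric == "Bundesstra\u00DFe"); // True
```

Decoding changes the markup, so it is only appropriate when the plain text is wanted rather than the page's HTML exactly as served.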
What encoding are you using?
Try switching to UTF-8 and ensure your database supports it. It looks as though your string is being HTML-encoded; this is fine for presentation, but you'll need the original format to store it in the database.
In HTML, ß is encoded as &szlig;.
You say "i have to insert its html in database", and what you're currently getting is correct.

Microsoft.Jet.OLEDB.4.0 Converting Characters

I'm working with a CSV that contains characters like:
” and •
I am reading the CSV via OleDb and the provider is Microsoft.Jet.OLEDB.4.0. When the data is loaded into the OleDbCommand, the characters are converted to the following respectively:
“ and •
I suspected there might be a collation setting in the connection string but I was unable to find anything about this.
I can confirm the following:
I can see the original character in the CSV when I open it.
If I run a select on the file through OleDb with WHERE [field] LIKE '%•%' I get 0 rows, but with WHERE [field] LIKE '%“%' rows are returned.
Any thoughts?
Finally! Thanks to @HABJAN I was able to get to the resolution, which is as simple as setting the CharacterSet in the Extended Properties of the connection string. For my situation it was UTF-8, commonly used by default in phpMyAdmin, which is where my data was retrieved from.
Resulting working connection string:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";Extended Properties=\"text;HDR=Yes;FMT=Delimited;CharacterSet=65001;\""
Key is CharacterSet=65001 (Code Page Identifier for UTF-8) which might have been obvious to some collation savvy individuals but I've somehow managed to avoid these issues over the years and never come across it in this respect.
I was also able to get HABJAN's solution to work by following the documentation found at http://msdn.microsoft.com/en-us/library/ms709353%28v=vs.85%29.aspx and setting the CharacterSet to the same value as above.
For my situation, this is the better method as it is a simpler/more maintainable solution, but +1 to HABJAN for helping me get there!
Thanks
You can create a schema.ini file and play with the Format and CharacterSet properties.
Take a look at this sample: How to read data from Unicode formatted text file and import to Data Table using .Net
And here is another sample that will show you how to read csv file with schema.ini: Importing CSV file into Database with Schema.ini
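For orientation, a schema.ini for a UTF-8 CSV might look roughly like the fragment below. The file name data.csv is a placeholder; the section name has to match the actual CSV file, and schema.ini has to sit in the same folder as that file. Treat the values as a starting point to adjust, not as a verified configuration for any particular file:

```ini
; schema.ini, placed next to the CSV it describes
[data.csv]
; comma-separated values
Format=CSVDelimited
; first row holds the column names
ColNameHeader=True
; 65001 is the code page identifier for UTF-8
CharacterSet=65001
```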
