I get HTML from a webpage that is in German, and I have to insert that HTML into a database, but when I insert it the German letters do not appear correctly.
E.g. Bundesstraße appears as Bundesstra&szlig;e. I am using C# and a MySQL database.
It seems like special characters are encoded as HTML entities (http://www.w3schools.com/tags/ref_entities.asp) on the website. When using UTF-8 this isn't necessary, but many sites still do it.
If you want to store the exact HTML as it is on the website, these encoded entities are correct.
To decode the entities you can use System.Net.WebUtility.HtmlDecode(yourString).
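For illustration only, here is the same decoding step sketched in Python rather than C#. html.unescape plays the role that HtmlDecode plays in .NET; the function name decode_entities is made up for this example.

```python
import html


def decode_entities(text: str) -> str:
    """Turn HTML character references (e.g. &szlig;) back into the characters they stand for."""
    return html.unescape(text)


# The entity-encoded form from the question becomes the plain German word again.
print(decode_entities("Bundesstra&szlig;e"))
```

Whether to decode before storing depends on what you need: the raw HTML (keep the entities) or the readable text (decode them).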
What encoding are you using?
Try switching to UTF-8 and ensure your database supports it. It looks as though your string is HTML-encoded; that is fine for presentation, but you'll need the original characters if you want to store the plain text in the database.
In HTML, ß is encoded as &szlig;.
You say "i have to insert its html in database", and what you're currently getting is correct.
Related
I am making a website that displays a news RSS feed in Arabic. I insert the title, body (description) and link of each news item into a SQL Server database, but they are stored as (?) symbols, so when I request the data from the database to display it on the webpage it shows (?) symbols. How can I make it display the Arabic characters?
I tried
<globalization requestEncoding="utf-8" responseEncoding="utf-8" />
but that was not the solution. Any help, please?
Make sure the data type in your database allows insertion of special (e.g. Unicode) characters. In SQL Server, for example, you should use the nvarchar data type instead of varchar. What is your RDBMS?
A few suggestions:
Make sure that the database tables that will store the Arabic data have the proper collation.
You'll probably need Arabic_CI_AS instead of the default Latin1_General_CI_AS.
Make sure that the database columns are set to nvarchar.
Make sure that any JavaScript files used on your website are saved with UTF-8 encoding.
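To see why the (?) symbols appear, here is a rough Python simulation (not SQL Server's actual code path) of text being squeezed into a single-byte Latin code page, which is roughly what a varchar column with a Latin1 collation does. The function name store_in_code_page and the choice of cp1252 are assumptions made for this sketch.

```python
def store_in_code_page(text: str, code_page: str = "cp1252") -> str:
    """Simulate a round trip through a single-byte, non-Unicode column."""
    # Characters the code page cannot represent are replaced with "?".
    raw = text.encode(code_page, errors="replace")
    return raw.decode(code_page)


print(store_in_code_page("مرحبا"))   # Arabic has no slot in cp1252
print(store_in_code_page("Straße"))  # German letters do fit in cp1252
```

Once a character has been replaced by "?" the original is gone, so no page encoding setting can bring it back. The fix has to happen at the column type and at insert time.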
I just came across this link in my Smashing Magazine newsletter. It might provide some useful additional info on UTF-8 and the common difficulties people have with it:
http://the-pastry-box-project.net/oli-studholme/2013-october-8/
I am getting a very big file from a Linux box, which I import with the Toad import wizard into SQL Server Express for testing.
The file is supposed to use special characters like ÄäÖö... correctly, which the admin of the box confirms.
I am seeing only misinterpreted characters (like Ã„) via PuTTY and less, a text viewer in Windows, Toad's import wizard, inside the DB, and when returning the values in .NET.
The only idea I have is to replace the characters in C#, but for that I would need a complete list of replacements.
Does anyone have such a list, a finished class, or any other idea?
I solved the problem by converting the file on the unix side:
iconv unicode unknown input format
Use iconv to convert the file from UTF-8 to UTF-16, which SQL Server can import correctly.
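On the command line this is typically something like iconv -f UTF-8 -t UTF-16 with the input and output files of your choice. If iconv is not available, the same re-encoding can be sketched in Python; the function name utf8_to_utf16 is made up for this example, and it assumes the input really is valid UTF-8.

```python
def utf8_to_utf16(src: str, dst: str) -> None:
    """Re-encode a UTF-8 text file as UTF-16 (Python writes a byte order mark first)."""
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-16", newline="") as fout:
        # Stream line by line so a very big file never has to fit in memory.
        for line in fin:
            fout.write(line)
```

If the input is not valid UTF-8, the read raises UnicodeDecodeError instead of silently writing garbage, which is a useful check on what the file actually contains.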
I'm storing some HTML-encoded data in a SQL Server database, and I've written a script to output the data in CSV format minus the HTML tags. I'm getting a weird issue when HTML-decoding the remaining data. For example, the data contains a curly quote character (’, stored as an HTML entity), but when I try to HTML-decode it the data comes out as a series of weird characters (â€™). Does anyone know how to solve this issue? The output encoding of the page is UTF-8, if that helps.
Any advice would be much appreciated!
Cheers
Tim
Those 3 weird characters are the UTF-8 encoding of ’, the character that the HTML entity decodes to. (They're actually the octets 0xE2 0x80 0x99, and those bytes render as "â€™" in your computer's default charset, windows-1252.) So I don't think you've got an issue with your encoding; the bytes are right and are just being displayed with the wrong charset.
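The byte-level claim is easy to check. A small Python sketch (used here purely for illustration, the variable names are arbitrary):

```python
quote = "\u2019"                        # RIGHT SINGLE QUOTATION MARK
utf8_bytes = quote.encode("utf-8")      # the three octets written to the file
print(utf8_bytes.hex(" "))              # e2 80 99

# Reading those same bytes as windows-1252 yields three separate characters.
misread = utf8_bytes.decode("cp1252")
print(misread)                          # â€™
```

So the decoded data is correct UTF-8; the "weird characters" only appear when a viewer assumes windows-1252.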
It's evidently a known issue that Excel 2000 has trouble with .csv files in UTF-8 encoding. The solution, bizarrely enough, is to switch the filename extension to .txt, at which point Excel 2000 will evidently import the file correctly.
If the data is read from CSV files, open the CSV file in Notepad, choose Save As in the File menu, and save the file with the Encoding option set to UTF-8.
I took over an old PHP application with MySQL as its database. Inside the database, there are tables containing localized strings (and therefore special characters).
Currently there is a PHP application accessing that database. My job is to create an ASP.NET (C# code-behind) application that accesses those strings as well. That works, except for the encoding.
If I try to access these strings, I get an encoding problem, like 'Ã„ndern' and 'PrÃ¼fzeichen', but only in the ASP.NET application. The PHP app sets utf-8 as charset and the strings are rendered perfectly. In the ASP.NET application it's gibberish, regardless of the page encoding.
In the MySQL database, the charset for the table in question, 'translations', is set to 'latin1 -- cp1252 West European' and the collation to 'latin1_swedish_ci'.
I can't seem to figure out what PHP apparently does, and ASP.net does not. I traced the php code and could not find any sign of special encoding while getting a string from the database.
The question is, how can I ensure correct encoding inside the ASP.net application without modifying the database, because big changes at the php code are not possible?
Does anybody have a clue?
The best long-term solution would be to convert the table to use UTF-8 encoding:
ALTER TABLE translations CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
If the data is already in utf-8 format (even though the character set is latin1), you'll need to convert each column to the correct encoding.
This converts a column defined as being latin1 but containing utf8 to a column declared as and containing utf8:
ALTER TABLE translations CHANGE columnNameHere columnNameHere BLOB;
ALTER TABLE translations CHANGE columnNameHere columnNameHere TEXT CHARACTER SET utf8;
I can't seem to figure out what PHP apparently does,
The PHP app sets utf-8 as the charset for the database connection, using a SET NAMES <encoding> query, where <encoding> is your page's encoding.
I finally managed to find a way to convert the strings into UTF-8:
System.Text.Encoding.UTF8.GetString(System.Text.Encoding.Default.GetBytes("convert me"))
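What that one-liner does is undo a wrong decoding step: it turns the garbled string back into the original bytes using the encoding that was wrongly assumed, then decodes those bytes as UTF-8. The same round trip sketched in Python, assuming the wrongly assumed encoding was windows-1252 (on many Western Windows machines that is what Encoding.Default resolves to, but that is an assumption). The name fix_mojibake is made up for this example.

```python
def fix_mojibake(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Recover text whose UTF-8 bytes were mistakenly decoded with wrong_encoding."""
    original_bytes = garbled.encode(wrong_encoding)  # back to the raw bytes
    return original_bytes.decode("utf-8")            # decode them correctly this time


print(fix_mojibake("Ã„ndern"))       # Ändern
print(fix_mojibake("PrÃ¼fzeichen"))  # Prüfzeichen
```

This only works on strings that really are garbled in this way; applying it to text that is already correct will usually raise an error or corrupt it, so it is a workaround rather than a replacement for fixing the connection charset.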
I am having a problem where users compose large chunks of text in MS Word, then paste them into the online form. The quotes get entered into the DB as an upside down ?. What are my options to replace these with standard quotes?
These smart quotes are ordinary Unicode code points. All you need is a simple String.Replace to sort them out.
-edit- Something like:
// Strings are immutable in C#, so assign the result back.
mystring = mystring.Replace("\u201C", "\"").Replace("\u201D", "\"");
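The snippet above only covers the double quotes; Word also produces curly single quotes (U+2018 and U+2019). A Python sketch of the same idea that handles all four in one pass (the names SMART_TO_ASCII and straighten_quotes are made up for this example):

```python
# Map each curly quote code point to its plain ASCII counterpart.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def straighten_quotes(text: str) -> str:
    """Replace Word-style smart quotes with plain ASCII quotes."""
    return text.translate(SMART_TO_ASCII)


print(straighten_quotes("\u201cHello\u201d, it\u2019s me"))
```

Other characters Word inserts (dashes, ellipses) are not covered here and would still be mangled by a non-Unicode column.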
What are my options to replace these with standard quotes?
The best approach is not to replace them. People want to use “smart quotes”, let them. They're not aberrations that only exist in MS Word, they're perfectly valid Unicode characters, and if your application isn't storing non-ASCII characters right then there's a whole lot more that will go wrong than just smart quotes.
Use UTF-8 encoding for all your web pages and store your content in a Unicode-capable database (eg. if you are using SQL Server, use NVARCHAR) and you'll not only support smart quotes but also accents and other alphabets.
You could run the input through the HtmlEncode method, which is meant to convert characters such as “ and ” to HTML entities, allowing you to save those and other higher characters in a format that can be stored without hassle.
Should I also mention Joel's post again?
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)