Displaying Swedish characters in aspx page - c#

In an aspx page, a combo box displays Swedish characters incorrectly: it shows "Réunion" instead of "Réunion". The value is retrieved from an Oracle database. Please suggest workarounds to fix this issue. Note: I have tried the Culture and UICulture attributes, but they did not help.

Either your HTML page uses an encoding other than the default UTF-8, or you are reading wrong values from the database. You can check the encoding headers and the document encoding with Firebug or IE Dev Tools. You can also check whether the database column is Unicode or ASCII; in the latter case you will need to convert the encoding. There are two simple tests you can do:
1. Add some Swedish text directly into a C# string and assign it to a label. See how it renders. If it is OK, then your page encoding is OK.
2. Put a breakpoint after you retrieve the value from the database and check whether it is displayed correctly in the debugger window.
If 1 does not display correctly but 2 does, you have an encoding problem with the page. If 1 is displayed correctly but 2 is not, you have a problem when reading or writing values to the database.

First of all, determine if you receive the string correctly from the Oracle database (in debugger, view the received string). If the string is already received wrong, it means you have not properly set the database charset on your connection. You should fix that; a nasty workaround would be to “ungarble” the garbled string by something like Encoding.UTF8.GetString(Encoding.GetEncoding(1252).GetBytes(garbledString)).
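To see why that workaround can work, here is a minimal sketch of the same round trip, written in Python rather than C# purely for illustration. It assumes the text was UTF-8 that got decoded as Windows-1252 exactly once; the function name ungarble is hypothetical.

```python
def ungarble(garbled: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252.

    Assumption: the damage is exactly one wrong cp1252 decode. If the
    string does not have that shape, return it unchanged rather than guess.
    """
    try:
        # Recover the original bytes, then decode them the way they were meant.
        return garbled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled


print(ungarble("RÃ©union"))
```

Like the C# one-liner, this only hides the symptom; fixing the connection charset is still the real repair.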

Related

Is it safe to encode an Html string before it's inserted in an SQL database and then send it back to the browser

I've been reading extensively for the last 2 days about securing ASP.NET applications. However there are contradictory opinions whether html form inputs should be encoded before they are inserted in a SQL database or just before they are sent back to the browser. I think the former opinion makes more sense to me and I will encode all form inputs once they are received by the server and then inserted in the database. However I'm a bit confused about what will happen next!
So let me follow a string from the time it's created until it is sent back to the browser.
Step 1
A user inputs the following string:
string userString = "It's important to know that 1 > 0";
Step 2
Once received by the server the string is encoded and inserted in the database.
string RenderedString = HttpUtility.HtmlEncode(userString);
As a result the string in the database is saved as follows:
"It&#39;s important to know that 1 &gt; 0"
Step 3
I want to send the rendered string back to the browser as it is so it will be written on the webpage as html, and not parsed and rendered by the browser. The result will be:
string StringResult = "It's important to know that 1 > 0";
My question is: should I add any extra "safety" step before sending the string back to the browser, or are the three steps above enough? Any help would be much appreciated.
I would always save the data without encoding because I might need to show it in some other form e.g: JSON. Saving it encoded to the database would mean to me that I'm putting the view logic inside my data model.
Your assumption (that strings stored in the database are only sent back to the browser) is incorrect.
You can do other things with strings, which do not expect them to be HTML-encoded:
Search for them
Send them to Web Services
Process them in C# or Visual Basic
Use them in PDF files and Excel spreadsheets
Most software assumes strings are unencoded.
MVC automatically HTML-encodes strings before sending them to the browser.
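The "store raw, encode on output" approach from these answers can be sketched as follows. This is an illustration in Python, not ASP.NET code: html.escape stands in for HttpUtility.HtmlEncode, and render_comment is a hypothetical helper.

```python
import html


def render_comment(stored_text: str) -> str:
    """Encode at output time; the stored value itself stays raw."""
    return "<p>" + html.escape(stored_text) + "</p>"


# What the database holds: the user's text, unencoded.
stored = "It's important to know that 1 > 0"

# Only the HTML view encodes it. A JSON API, a search index, or a PDF
# export would read the same raw value and apply its own escaping rules.
print(render_comment(stored))
```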

Unicode issue with insert ZERO WIDTH SPACE into database

I am using CKEditor and it seems that it is possible with the correct keypresses to get the following unicode character inserted into the textarea.
U+200B ​ \xe2\x80\x8b ZERO WIDTH SPACE
Now when I try to save this into a MySQL database I get the following error:-
MySql.Data.MySqlClient.MySqlException
Incorrect string value: '\xE2\x80\x8B </...' for column 'Content' at row 1
From what I can see I have several options:
Change the collation on my table; however, I am not entirely sure what impact this will have on my C# MVC4 application, which uses NHibernate as the ORM.
Strip the Unicode character out of the string before I insert it into the database; however, I am not entirely sure how to do this, or even whether it is the right thing to do.
This seems to be a bug in CKEditor for certain browsers, however I would like to future proof myself by not waiting for a fix.
So my question is simply what is my best option to get around this issue?
Apparently your column charset is Latin1.
You shouldn't try to store Unicode data in a Latin1 column. You will probably have to change that:
ALTER TABLE campaignemail MODIFY Content LONGTEXT CHARACTER SET utf8
Beware when doing so: if you have erroneously stored "unicode-pretending-to-be-latin1" data, the conversion might corrupt your table values.
BTW, the charset is the encoding used to map from a "letter" (strictly speaking: a codepoint) to "bytes".
The collation defines the relative order between the various "letters". It is used to search and sort columns.
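The charset point can be checked outside the database. The sketch below, in Python, shows that ZERO WIDTH SPACE has a UTF-8 byte form but no Latin-1 form, and also shows the "strip it" option from the question. Python's latin-1 codec is only an approximation of MySQL's latin1, and the helper names fits_latin1 and strip_zwsp are hypothetical.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

# The charset maps the code point to bytes: these are the bytes MySQL reported.
utf8_bytes = ZWSP.encode("utf-8")


def fits_latin1(text: str) -> bool:
    """Return True if every character has a Latin-1 byte representation."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def strip_zwsp(text: str) -> str:
    """Remove zero width spaces, which carry no visible content."""
    return text.replace(ZWSP, "")


sample = "Hello" + ZWSP + "world"
print(utf8_bytes, fits_latin1(sample), fits_latin1(strip_zwsp(sample)))
```

Stripping only helps for this one character; changing the column charset covers every other non-Latin-1 character as well.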

How to handle Japanese names in url?

The method described here URL Slugify algorithm in C#? returns "" for the input ブルノ.
Is it ok to include Japanese characters in URLs or does it hurt SEO? Will Google/Bing display as ブルノ or %E3%83%96%E3%83%AB%E3%83%8E?
What should I do for the user アウロン?
/user/1/auron (different field to set the name in url)
/user/1/アウロン (set his displayname in url)
ja.wikipedia.org does use Japanese characters in the URL, so can I just assume it is safe? Or does it need something else?
I suggest you do not use the second option, /user/1/アウロン.
The reason is that if I do not have the Japanese language installed on my machine (I use Windows 7) and you send me a link like www.abc.com/user/1/アウロン, it will be displayed as a link to www.abc.com/user/1/ instead of www.abc.com/user/1/アウロン.
I noticed this behavior when I sent a link containing Thai characters to my coworkers on Skype. The link pointed to the wrong address, as described in the previous paragraph.
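For reference, the percent-encoded form mentioned in the question is just the UTF-8 bytes of the name written as hex. A minimal Python sketch, where user_path is a hypothetical helper and the /user/<id>/<name> layout is taken from the question:

```python
from urllib.parse import quote, unquote


def user_path(user_id: int, display_name: str) -> str:
    """Build a URL path with the display name percent-encoded as UTF-8."""
    return f"/user/{user_id}/{quote(display_name, safe='')}"


path = user_path(1, "ブルノ")
print(path)            # the form that is safe to paste into any client
print(unquote(path))   # what a browser may show in its address bar
```

Sending the percent-encoded form in chat messages or emails avoids the truncated-link problem described above, at the cost of readability.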

MySqlException incorrect string value [duplicate]

After noticing that an application tended to discard random emails due to incorrect string value errors, I went through and switched many text columns to the utf8 column charset and the default column collation (utf8_general_ci) so that it would accept them. This fixed most of the errors, and stopped the application from getting SQL errors when it hit non-Latin emails, too.
Despite this, some of the emails are still causing the program to hit incorrect string value errors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)
The contents column is a MEDIUMTEXT datatype which uses the utf8 column charset and the utf8_general_ci column collation. There are no flags that I can toggle in this column.
Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:
What is causing that error? (yes, I know the emails are full of random garbage, but I thought utf8 would be pretty permissive)
How can I fix it?
What are the likely effects of such a fix?
One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL, and have no idea if such a fix makes sense.
UPDATE to the below answer:
At the time the question was asked, "UTF8" in MySQL meant utf8mb3. In the meantime utf8mb4 was added, but to my knowledge MySQL's "UTF8" was not switched to mean utf8mb4.
That means, you'd need to specifically put "utf8mb4", if you mean it (and you should use utf8mb4)
I'll keep this here instead of just editing the answer, to make clear there is still a difference when saying "UTF8"
Original
I would not suggest Richie's answer, because it corrupts the data inside the database. It does not fix your problem; it only hides it, and leaves you unable to perform essential database operations on the damaged data.
If you encounter this error, either the data you are sending is not UTF-8 encoded or your connection is not UTF-8. First, verify that the data source (a file, ...) really is UTF-8.
Then check your database connection. You should run this after connecting:
SET NAMES 'utf8mb4';
SET CHARACTER SET utf8mb4;
Next, verify that the tables where the data is stored have the utf8mb4 character set:
SELECT
    `tables`.`TABLE_NAME`,
    `collations`.`character_set_name`
FROM
    `information_schema`.`TABLES` AS `tables`,
    `information_schema`.`COLLATION_CHARACTER_SET_APPLICABILITY` AS `collations`
WHERE
    `tables`.`table_schema` = DATABASE()
    AND `collations`.`collation_name` = `tables`.`table_collation`
;
Last, check your database settings:
mysql> show variables like '%colla%';
mysql> show variables like '%charac%';
If source, transport and destination are utf8mb4, your problem is gone;)
MySQL’s utf-8 types are not actually proper utf-8 – it only uses up to three bytes per character and supports only the Basic Multilingual Plane (i.e. no Emoji, no astral plane, etc.).
If you need to store values from higher Unicode planes, you need the utf8mb4 encodings.
The table and fields have the wrong encoding; however, you can convert them to UTF-8.
ALTER TABLE logtest CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE logtest DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE logtest CHANGE title title VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci;
"\xE4\xC5\xCC\xC9\xD3\xD8" isn't valid UTF-8. Tested using Python:
>>> "\xE4\xC5\xCC\xC9\xD3\xD8".decode("utf-8")
...
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
If you're looking for a way to avoid decoding errors within the database, the cp1252 encoding (aka "Windows-1252" aka "Windows Western European") is the most permissive encoding there is - every byte value is a valid code point.
Of course it's not going to understand genuine UTF-8 any more, nor any other non-cp1252 encoding, but it sounds like you're not too concerned about that?
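The same check in Python 3 form, as a small sketch: it confirms the bytes from the error message are not valid UTF-8 and then tries a legacy encoding. The helper name is_valid_utf8 is hypothetical, and KOI8-R is only a guess at what the sender used, not something the question confirms.

```python
raw = b"\xE4\xC5\xCC\xC9\xD3\xD8"  # bytes quoted in the MySQL error


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(raw))

# Assumption: the mail came in a legacy single-byte encoding. Decoding with
# the right one and re-encoding as UTF-8 yields bytes a utf8 column accepts.
guess = raw.decode("koi8_r")
print(is_valid_utf8(guess.encode("utf-8")))
```

If the real encoding is declared in the email headers, using that value instead of a guess is the safer route.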
I solved this problem today by altering the column to 'LONGBLOB' type which stores raw bytes instead of UTF-8 characters.
The only disadvantage of doing this is that you have to take care of the encoding yourself. If one client of your application uses UTF-8 encoding and another uses CP1252, you may have your emails sent with incorrect characters. To avoid this, always use the same encoding (e.g. UTF-8) across all your applications.
Refer to this page http://dev.mysql.com/doc/refman/5.0/en/blob.html for more details of the differences between TEXT/LONGTEXT and BLOB/LONGBLOB. There are also many other arguments on the web discussing these two.
First check if your default_character_set_name is utf8.
SELECT default_character_set_name FROM information_schema.SCHEMATA S WHERE schema_name = "DBNAME";
If the result is not utf8, you must convert your database. Before you do, save a dump as a backup.
To change the character set encoding to UTF-8 for all of the tables in the specified database, type the following command at the command line. Replace DBNAME with the database name:
mysql --database=DBNAME -B -N -e "SHOW TABLES" | awk '{print "SET foreign_key_checks = 0; ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; SET foreign_key_checks = 1; "}' | mysql --database=DBNAME
To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. Replace DBNAME with the database name:
ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;
You can now retry writing utf8 characters into your database. This solution helped me when I tried to upload a 200000-row CSV file into my database.
Although your collation is set to utf8_general_ci, I suspect that the character encoding of the database, table or even column may be different.
ALTER TABLE table_name MODIFY COLUMN column_name VARCHAR(255)
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
In general, this happens when you insert strings to columns with incompatible encoding/collation.
I got this error when I had TRIGGERs, which inherit server's collation for some reason.
And mysql's default is (at least on Ubuntu) latin-1 with swedish collation.
Even though I had database and all tables set to UTF-8, I had yet to set my.cnf:
/etc/mysql/my.cnf :
[mysqld]
character-set-server=utf8
default-character-set=utf8
And this must list all triggers with utf8-*:
select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS
And some of variables listed by this should also have utf-8-* (no latin-1 or other encoding):
show variables like 'char%';
I got a similar error (Incorrect string value: '\xD0\xBE\xD0\xB2. ...' for 'content' at row 1). I tried changing the character set of the column to utf8mb4, and after that the error changed to 'Data too long for column 'content' at row 1'.
It turned out that MySQL had been showing me the wrong error. I switched the character set of the column back to utf8 and changed the type of the column to MEDIUMTEXT. After that the error disappeared.
I hope it helps someone.
By the way, MariaDB in the same case (I tested the same INSERT there) just truncated the text without an error.
That error means that either the string has the wrong encoding (e.g. you're trying to enter an ISO-8859-1 encoded string into a UTF-8 encoded column), or the column does not support the data you're trying to enter.
In practice, the latter problem is caused by MySQL's utf8 implementation, which only supports Unicode characters that need 1-3 bytes when represented in UTF-8. See "Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC? for details. The trick is to use the utf8mb4 character set for the column instead of utf8, which doesn't actually support all of UTF-8 despite the name. utf8mb4 is the correct choice for all UTF-8 strings.
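The 1-3 byte limit corresponds to the Basic Multilingual Plane, so a value can be tested before it is sent. A minimal Python sketch; the function name needs_utf8mb4 is hypothetical:

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character lies outside the Basic Multilingual Plane.

    Such characters take four bytes in UTF-8, which MySQL's three-byte
    utf8 (utf8mb3) character set cannot store.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


for sample in ("ö", "ブルノ", "\U0001F600"):
    print(len(sample.encode("utf-8")), needs_utf8mb4(sample))
```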
In my case, Incorrect string value: '\xCC\x88'..., the problem was that an o-umlaut was in its decomposed state. This question-and-answer helped me understand the difference between o¨ and ö. In PHP, the fix for me was to use PHP's Normalizer library. E.g., Normalizer::normalize('o¨', Normalizer::FORM_C).
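The equivalent of that PHP call in Python uses the standard unicodedata module. This sketch shows where the \xCC\x88 bytes come from (the combining diaeresis U+0308) and how NFC normalization folds it into a single precomposed character:

```python
import unicodedata

decomposed = "o\u0308"  # 'o' followed by COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)

print(decomposed.encode("utf-8"))  # contains the \xcc\x88 pair from the error
print(composed.encode("utf-8"))    # the single precomposed character
print(len(decomposed), len(composed))
```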
When I ran into this Incorrect string value: '\xF8' for column error using Scriptcase, the solution was to make sure that my database and my field collations were both set to utf8_general_ci. Then, for the data import, I loaded the CSV file into UE Studio and saved it as UTF-8. It works like a charm: 29000 records imported with no errors. Previously I was trying to import a CSV created by Excel.
I have tried all of the above solutions (which all bring valid points), but nothing was working for me.
Until I found that my MySQL table field mappings in C# was using an incorrect type: MySqlDbType.Blob . I changed it to MySqlDbType.Text and now I can write all the UTF8 symbols I want!
p.s. My MySQL table field is of the "LongText" type. However, when I autogenerated the field mappings using MyGeneration software, it automatically set the field type as MySqlDbType.Blob in C#.
Interestingly, I have been using the MySqlDbType.Blob type with UTF8 characters for many months with no trouble, until one day I tried writing a string with some specific characters in it.
Hope this helps someone who is struggling to find a reason for the error.
If you happen to process the value with some string function before saving, make sure the function can properly handle multibyte characters. String functions that cannot do that and are, say, attempting to truncate might split one of the single multibyte characters in the middle, and that can cause such string error situations.
In PHP for instance, you would need to switch from substr to mb_substr.
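The failure mode is easy to reproduce: cutting at a byte offset can leave half a character behind. The Python sketch below truncates to a byte budget without splitting a character; truncate_utf8 is a hypothetical helper and assumes the input is a valid string to begin with.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Truncate to at most max_bytes of UTF-8 without splitting a character."""
    cut = text.encode("utf-8")[:max_bytes]
    # The input was valid, so only the tail can hold an incomplete sequence;
    # errors="ignore" drops exactly that dangling fragment.
    return cut.decode("utf-8", errors="ignore")


word = "héllo"                       # 'é' takes two bytes in UTF-8
naive = word.encode("utf-8")[:2]     # b'h\xc3' ends in the middle of 'é'
print(naive, truncate_utf8(word, 2), truncate_utf8(word, 3))
```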
I added binary before the column name, and that solved the charset error.
insert into tableA values(binary stringcolname1);
I also got this error when using my online database on a GoDaddy server. I think it runs MySQL version 5.1 or later. When I ran the same thing against my localhost server (version 5.7) it was fine. After that I created the table on the local server and copied it to the online server using SQLyog. I think the problem is with the character set.
To fix this error I upgraded my MySQL database to utf8mb4 which supports the full Unicode character set by following this detailed tutorial. I suggest going through it carefully, because there are quite a few gotchas (e.g. the index keys can become too large due to the new encodings after which you have to modify field types).
There's good answers in here. I'm just adding mine since I ran into the same error but it turned out to be a completely different problem. (Maybe on the surface the same, but a different root cause.)
For me the error happened for the following field:
@Column(nullable = false, columnDefinition = "VARCHAR(255)")
private URI consulUri;
This ends up being stored in the database as a binary serialization of the URI class. This didn't raise any flags with unit testing (using H2) or CI/integration testing (using MariaDB4j), it blew up in our production-like setup. (Though, once the problem was understood, it was easy enough to see the wrong value in the MariaDB4j instance; it just didn't blow up the test.) The solution was to build a custom type mapper:
package redacted;

import javax.persistence.AttributeConverter;
import java.net.URI;
import java.net.URISyntaxException;

import static java.lang.String.format;

// Stores a URI as its plain string form instead of a serialized Java object.
public class UriConverter implements AttributeConverter<URI, String> {
    @Override
    public String convertToDatabaseColumn(URI attribute) {
        return attribute.toString();
    }

    @Override
    public URI convertToEntityAttribute(String field) {
        try {
            return new URI(field);
        }
        catch (URISyntaxException e) {
            // Keep the original exception as the cause so the stack trace is not lost.
            throw new RuntimeException(format("could not convert database field to URI: %s", field), e);
        }
    }
}
Used as follows:
@Column(nullable = false, columnDefinition = "VARCHAR(255)")
@Convert(converter = UriConverter.class)
private URI consulUri;
As far as Hibernate is involved, it seems it has a bunch of provided type mappers, including for java.net.URL, but not for java.net.URI (which is what we needed here).
In my case the problem was solved by changing the MySQL column encoding to 'binary' (the data type is changed automatically to VARBINARY). I will probably not be able to filter or search on that column, but I have no need for that.
In my case, I first saw '???' on my website. I checked MySQL's character set, which was latin, changed it to utf-8, and restarted my project. Then I got the same error as you. It turned out I had forgotten to change the database's charset; once I changed that to utf-8 as well, it worked.
I tried almost every step mentioned here. None worked. I downloaded MariaDB and it worked. I know this is not a solution, but it might help somebody identify the problem quickly or serve as a temporary fix.
Server version: 10.2.10-MariaDB - MariaDB Server
Protocol version: 10
Server charset: UTF-8 Unicode (utf8)
I had a table with a varbinary column that I wanted to convert to utf8mb4 varchar. Unfortunately some of the existing data was invalid UTF-8 and the ALTER query returned Incorrect string value for various rows.
I tried every suggestion I could find regarding cast / convert / char_length = length etc., but nothing in SQL detected the erroneous values, other than the ALTER query returning bad rows one by one. I would love a pure SQL solution to remove the bad values. Sadly, this solution is not pretty.
I ended up select *'ing the entire table into PHP, where the erroneous rows could be detected en-masse by:
if (empty(htmlspecialchars($row['whatever'])))
The problem can also be caused by the client if its charset is not set to utf8mb4. Even if every database, table and column is set to utf8mb4, you will still get an error, for instance when running from PyCharm.
For Python, set the charset of the connection in the MySQL Connector connect method:
import mysql.connector

mydb = mysql.connector.connect(
    host="IP or Host",
    user="<user>",
    passwd="<password>",
    database="<yourDB>",
    # set charset to utf8mb4 to support emojis
    charset='utf8mb4'
)
I know I'm late to the party, but someone else might come across the problem I had and be happy to read my workaround.
I came across this problem with French characters. It turned out that the text I was copying had encoded the accents on some characters as two characters and on others as a single character.
I couldn't find out how to set my table to accept the strings, so I ended up changing the diacritics in my text import.
Here is a list of them as double characters, so you can search for them in your texts.
ùòìàè
áéíóú
ûôêâî
ç
1 - You have to declare the UTF8 encoding property in your connection. http://php.net/manual/en/mysqli.set-charset.php.
2 - If you are using the mysql command line to execute a script, you have to use the charset flag, like:
Cmd: C:\wamp64\bin\mysql\mysql5.7.14\bin\mysql.exe -h localhost -u root -P 3306 --default-character-set=utf8 omega_empresa_parametros_336 < C:\wamp64\www\PontoEletronico\PE10002Corporacao\BancoDeDadosModelo\omega_empresa_parametros.sql

Error in Decompression?

I am writing a crawler for a website.
Its response is gzip encoded.
I am not able to parse correctly a particular field, though the decompression is successful.
I am also using htmlagilitypack to parse it,
the parsed value of the field is only a part of the original value
as an example :
I am getting only /wEWAwKc04vTCQKb86mzBwKln/PuCg==
whereas the firebug shows the actual value as much longer:
/wEWBgKj7IuJCgKb86mzBwKln/PuCgLT250qAtC0+8cMAvimiNYD
What does the '==' at the end mean?
I am assuming that it is an error on the decompressor's part?
The character = is added by the Base64 encoding.
Encoding the following sentence
Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.
you would get
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
The = character can only appear at the end of a Base64 string. If you see it, you have probably received the whole string. The converse does not hold: = is only a padding character, and not every Base64 implementation requires it.
You don't have a problem with decompression. The page has obviously been correctly decompressed. Otherwise your software would likely throw an error or you'd see just a bunch of strange characters.
However, what you get is an ASCII string that's obviously in Base 64 encoding. The equal signs at the end appear if the original binary data is not a multiple of 3 bytes. So that's all perfect Base 64 data.
As to why your crawler gets different data than Firefox with Firebug: I don't know, but I can imagine many reasons. These are two separate browsing sessions, and the web site might just assign them different session IDs or somehow record some history of the session.
Anyhow, at the end of the day I don't understand your problem. What exactly are you unable to parse? Do you get some kind of error? What do you mean by field? Are you talking about a field of an HTML form?
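The padding rule described in these answers can be checked with Python's standard base64 module. The sketch below shows how the padding depends on the input length and decodes the two values quoted in the question; both decode cleanly, which suggests they are simply different data rather than a truncated copy.

```python
import base64

# Padding depends only on the input length modulo 3.
for raw in (b"a", b"ab", b"abc"):
    print(len(raw), base64.b64encode(raw).decode("ascii"))

# The two values from the question.
crawler_value = "/wEWAwKc04vTCQKb86mzBwKln/PuCg=="
firebug_value = "/wEWBgKj7IuJCgKb86mzBwKln/PuCgLT250qAtC0+8cMAvimiNYD"

crawler_bytes = base64.b64decode(crawler_value)
firebug_bytes = base64.b64decode(firebug_value)
print(len(crawler_bytes), len(firebug_bytes))
```

A length that is not a multiple of 3 is what produces the trailing = signs in the crawler's value.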
