C# with MySql and Unicode characters - c#

My problem occurs when I use C# with a MySQL database to save records by sending parameters.
Although I have already set the charset to UTF-8 and can see Unicode characters correctly, when I try to insert Unicode characters only half of the string is saved.
The really weird thing is that this happens only with Unicode strings such as Greek words, and only when I send the query with parameters.
For example, if my query as seen in C# is:
string query = "INSERT INTO tablename VALUES (NULL, @somestring)";
and I set the @somestring parameter's value to "TESTING", it works just fine.
If I set the value to the Unicode string "ΤΕΣΤΙΝΓ", the query executes with no errors but saves only about half the characters in the database, i.e. only "ΤΕΣΤ".
On the other hand, if I remove the parameter and change the query to:
string somestring = "ΤΕΣΤΙΝΓ";
string query = "INSERT INTO tablename VALUES (NULL,'" + somestring + "')";
the query again works just fine AND saves the whole word/sentence in the database.
I hope I explained it correctly and you can understand my situation.
Thanks

The length you declared for the @somestring parameter in C# is too short.
UTF-8 takes up to 3 bytes per character (in MySQL's utf8 charset), so the length would need to be 21 rather than 7, for example, to fit "TESTING" and its variants.
That said, I've not used C# to call MySQL (only SQL Server), but I'm fairly sure this is the problem.
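To make the byte arithmetic in this answer concrete, here is a minimal sketch that only measures the two strings from the question. The class and method names are made up for illustration, and it does not touch MySQL at all, so it does not prove where the truncation happens; it just shows why a size of 7 can be too small once the text is encoded.

```csharp
using System;
using System.Text;

static class Utf8Sizes
{
    // Number of bytes the string occupies once encoded as UTF-8.
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);

    public static void Main()
    {
        string latin = "TESTING";
        // The Greek word from the question, written with escapes so the
        // encoding of the source file itself cannot interfere.
        string greek = "\u03A4\u0395\u03A3\u03A4\u0399\u039D\u0393";

        Console.WriteLine($"{latin.Length} chars, {Utf8Bytes(latin)} bytes"); // 7 chars, 7 bytes
        Console.WriteLine($"{greek.Length} chars, {Utf8Bytes(greek)} bytes"); // 7 chars, 14 bytes

        // Worst case assumed by the answer: 3 bytes per character.
        Console.WriteLine(greek.Length * 3); // 21
    }
}
```

Both strings have 7 characters, but the Greek one needs twice as many bytes, which is the mismatch the answer is pointing at.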

Related

C# encoding issue, Character '¤' result to '?'

I have a C# program where I stored a long SQL query in a resource file (sql.resx).
In my query I have a special currency character (¤). When my program gets the query from the resource file and executes it on SQL Server, the currency character (¤) appears as an unsupported character. On appending character ¤ to the result it appears as � instead of ¤.
For me it seems like an encoding issue in C#.
Here's an excerpt from the query, where things are correct:
and tb.beward = b.beward and tb.beroom = b.beroom and tb.beidnr = b.beidnr
and wp.wpspecialbranch = c.tunnus
and ((b.beward + '¤' + b.beroom + '¤' + b.beidnr) LIKE @SCHEDULE
and
((tb.beward + '¤' + tb.beroom + '¤' + tb.beidnr) LIKE @SCHEDULE))
And here's what I'm observing:
and tb.beward = b.beward and tb.beroom = b.beroom and tb.beidnr = b.beidnr
and wp.wpspecialbranch = c.tunnus
and ((b.beward + '�' + b.beroom + '�' + b.beidnr) LIKE @SCHEDULE
and
((tb.beward + '�' + tb.beroom + '�' + tb.beidnr) LIKE @SCHEDULE))
This happens when I copy the query from debug mode and paste in SQL Server Management Studio.
Note: It was working fine on my server a few days ago, but it isn't working now. Something must have changed on my server, but I'm not sure what.
I think there are two issues here.
First, in SQL Server, string literals with wide characters need the N prefix:
b.beward + N'¤'
Second, the character encoding for the *.resx file is probably wrong, or you'd at least see the character in the window, even if Sql Server didn't read it properly.
If this was working a few days ago, possibly someone opened and saved the file with a program that only knows how to do ASCII, and your special character was mangled. You'll need to fix the file.
If this came from the Visual Studio debug window — which is notorious for mangling values while trying to be "helpful" — you might not even be looking in the right place.
I also have three items for you separate from the question.
Looking at the SQL, this isn't gonna perform well. The concatenation going on here makes any indexes on those columns worthless. You will get much better performance, probably by orders of magnitude, if you structure the query so that it does not need to concatenate those columns. At the very least, a computed column with a FULL-TEXT index could make this query drastically faster.
Logically, the SQL is also doing extra work. If the beward, beroom, beidnr columns already match between the two tables, you only need to concatenate and test ONE set of them against the @SCHEDULE input. They have the same values, so if one matches (or not), the other must have the same result.
In the future, please PASTE THE CODE into your question. Images don't work as well here. It saves you work, too.
In SQL Server you need to use the nvarchar data type, which is able to store Unicode characters.
To declare a string literal as nvarchar you need to prefix it with N; without the prefix it is just a normal varchar. Varchars allow only the characters of the code page behind the collation in use.
b.beward + N'¤'

Arabic_CI_AS to utf8 in C#

I have a database in SQL Server with collation Arabic_CI_AS and I need to compare some string data with another Postgres database that uses the UTF-8 character set. I use C# to convert and compare. It is easy when the string contains just one word (in these cases I just replace 'ي' with 'ی'), but long strings, especially ones containing the '(' character, cause problems.
I can't get it to work! I tried some suggested solutions such as:
var enc = Encoding.GetEncoding(1256);
byte[] encBytes = enc.GetBytes(customer.name);
customer.name = Encoding.UTF8.GetString(encBytes, 0, encBytes.Length);
or:
SELECT cast (name as nvarchar) as NewName
from Customer
But they don't work! Can anyone help me?
Example of input and output, see tooltips on the right:
Maybe this can help you change the collation dynamically:
SELECT name collate SQL_Latin1_General_CP1_CI_AS
from Customer
or
SELECT name collate Persian_100_CI_AI
from Customer
or
or you can try this on the C# side:
string _Value=string.Empty;
byte[] enBuff= Encoding.GetEncoding("windows-1256").GetBytes(customer.name);
customer.name= Encoding.GetEncoding("windows-1252").GetString(enBuff);
You can choose other collations too.
You may have to try several collations and encoding numbers to get the result you want.
SQL Server did not support UTF-8 strings before SQL Server 2019 (which added UTF-8 collations). If you have to deal with characters other than plain Latin, it is strongly recommended to use NVARCHAR instead of VARCHAR with an Arabic collation.
Many people think that NVARCHAR is UTF-16 while VARCHAR is UTF-8. This is not true! The latter is extended ASCII and uses 1 byte per character in any case, while UTF-8 encodes some characters with more than one byte.
So - the most important question is: WHY?
SQL Server can take your string into a NVARCHAR variable, cast it to a chain of bytes and re-cast it to the former string:
DECLARE @str NVARCHAR(MAX)=N'(نماینده اراک)';
SELECT @str
,CAST(@str AS VARBINARY(MAX))
,CAST(CAST(@str AS VARBINARY(MAX)) AS NVARCHAR(MAX));
The problem with the ) is - quite probably! - that your arabic letters are right-to-left while the ) is left-to-right. I wanted to paste the result of the query above into this answer but did not manage to get the closing ) to the original place... You try to edit, delete, replace, but you get something else... Somehow funny, but not a question of bad encoding but one of buggy editors...
Anyway, SQL-Server is not your issue. You must read the string as NVARCHAR out of SQL-Server. C# is working with unicode strings and not a collated 1-byte string. Every conversion carries the chance to destroy your text.
If your target (or the tooltip you show us) is not capable of showing the string properly, the string itself might be perfectly okay while the editor is not...
If you pass such a UTF-8 string back to SQL Server, you'll get a mess...
The only place where UTF-8 makes sense is when text is written to a file or transmitted over a low-bandwidth connection. If a text contains mostly plain Latin characters and just a few other letters (as is very often the case with XML or HTML), you can save quite some disk space or bandwidth. With a Far Eastern text you'd even bloat your text, because some of these characters need 3 or even 4 bytes to be encoded.
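The size trade-off described above can be checked with a small sketch like the following. The sample strings and the class name are only illustrative assumptions; Encoding.Unicode is .NET's name for UTF-16 (little-endian), which is also what NVARCHAR uses for these characters.

```csharp
using System;
using System.Text;

static class EncodingSizes
{
    // Returns (UTF-8 byte count, UTF-16 byte count) for the same text.
    public static (int Utf8, int Utf16) Sizes(string text) =>
        (Encoding.UTF8.GetByteCount(text), Encoding.Unicode.GetByteCount(text));

    public static void Main()
    {
        string markup = "<name>Erd\u0151s</name>"; // mostly plain Latin, one accented letter
        string chinese = "\u5934\u7248";           // two Chinese characters

        Console.WriteLine(Sizes(markup));  // (19, 36): UTF-8 is about half the size
        Console.WriteLine(Sizes(chinese)); // (6, 4): UTF-8 is larger here
    }
}
```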
Within your database and application you should stick with unicode.

C# SQL Select Chinese characters returns weird characters

I'm trying to convert a piece of software into Chinese but I'm having some problems with the database. It returns weird strings of characters and my guess is that it's because of wrong encoding but I'm not sure about what to do.
If I set column data to 头版 it returns
>> 头版
If I set column data to 头版 it returns
>> ??
It works fine because if I insert '头版' into the database, it will get inserted as '头版' but I would like it to display the characters correctly, so searching through the database will be easier.
I've tried running this query when connected to the database
SET NAMES utf8;
Also tried this
SET NAMES utf8; SELECT * FROM `table` ORDER BY num;
But it doesn't change anything.
The culture is set zh-Hans.
The column should be nvarchar. This type supports Unicode and allows non-English characters (such as Mandarin, Arabic, etc.) to be used.
Update
The above was for SQL Server. For MySql the column should be VARCHAR(50) CHARACTER SET UCS2.
UCS2 is better than utf-8 for Chinese because most of its characters require 16-bit code points. If using utf-8, 3 bytes would be needed to store the code point.

C# chinese Encoding/Network

I have a Client/Server architecture where messages in text-format are exchanged.
For example:
12   2013/11/11   abcd   5
^    ^            ^      ^
int  date         text   int
Everything works fine with "normal" text.
Now this is a Chinese project, so they also want to send Chinese characters, encoded as GB18030 or GB2312.
I read the data this way:
char[] dataIn = binaryReader.ReadChars(length);
Then I create a new string from the char array and convert it to the right data type (int, float, string etc.).
How can I change/enable Chinese encoding, or convert the string values to Chinese?
And what would be a good and easy way to test this?
Thanks.
I tried using something like this
string stringData = new string(dataIn).Trim();
byte[] data = Encoding.Unicode.GetBytes(stringData);
stringData = Encoding.GetEncoding("GB18030").GetString(data);
Without success.
I also need to save some text values to MS SQL Server 2008. Is this possible, and do I need to configure anything special?
I also tried this example with storing to the database and printing to the console, but I just get ????????
string chinese = "123东北特钢大连新基地testtest";
byte[] utfBytes = Encoding.Unicode.GetBytes(chinese);
byte[] chineseBytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding("GB18030"), utfBytes);
string msg = Encoding.GetEncoding("GB18030").GetString(chineseBytes);
Edit
The problem was with the INSERT queries, which I send to the database. I fixed it with using N' before the string.
sqlCommand = string.Format("INSERT INTO uber_chinese (columnName) VALUES(N'{0}')", myChineseString);
Also the column dataType has to be nvarchar instead of varchar.
This answer is "promoted" (by request from the Original Poster) from comments by myself.
In the .NET Framework, strings are already Unicode strings.
(Don't test Unicode strings by writing to the console, though, since the terminal window and console typically won't display them correctly. However, since .NET version 4.5 there is some support for this.)
The thing to be aware of is the Encoding when you get text from an outside source. In this case, the constructor of BinaryReader offers an overload that takes in an Encoding:
using (var binaryReader = new BinaryReader(yourStream, Encoding.GetEncoding("GB18030")))
...
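As a way to test the BinaryReader overload without a real client and server, a round trip through a MemoryStream may be enough. This is only a sketch under assumptions: the names Gb18030Reader and ReadText are hypothetical, the sample text is shortened from the question, and the RegisterProvider call assumes .NET Core or .NET 5+, where legacy code pages such as GB18030 are not registered by default.

```csharp
using System;
using System.IO;
using System.Text;

static class Gb18030Reader
{
    // Decodes `charCount` characters from GB18030-encoded bytes via BinaryReader.
    public static string ReadText(byte[] payload, int charCount)
    {
        // Needed on .NET Core / .NET 5+; registering the provider twice is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gb = Encoding.GetEncoding("GB18030");

        using (var stream = new MemoryStream(payload))
        using (var reader = new BinaryReader(stream, gb))
        {
            // ReadChars counts decoded characters, not bytes.
            return new string(reader.ReadChars(charCount));
        }
    }

    public static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        string sent = "123\u4E1C\u5317test"; // digits, two Chinese characters, Latin text
        byte[] wire = Encoding.GetEncoding("GB18030").GetBytes(sent);

        // Compare in code rather than by looking at console output.
        Console.WriteLine(ReadText(wire, sent.Length) == sent);
    }
}
```

Comparing the decoded string with the original in code sidesteps the console display problem mentioned above.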
On the SQL Server, be sure that any column that needs to hold Chinese strings is of type nvarchar (or nchar), not just varchar (char). Otherwise, depending on the collation, the column may not be able to hold general Unicode characters (it may be represented internally by some 8-bit Microsoft code page).
Whenever you give an nchar literal in SQL, use the format N'my text', not just 'my text', to make sure the literal is interpreted as an nchar rather than just char. For example N'Erdős' is distinct from N'Erdos' while, in many collations, 'Erdős' and 'Erdos' might be (projected onto) the same value in the underlying code page.
Similarly N'东北特钢大连新基地' will work, while '东北特钢大连新基地' might result in a lot of question marks. From the update of your question:
sqlCommand = string.Format("INSERT INTO uber_chinese (columnName) VALUES(N'{0}')", myChineseString);
(Note the N before the opening quote. This is prone to SQL injection, of course.)
The default collation of your column will be that of your database (SQL_Latin1_General_CP1_CI_AS from your comment). Unless you ORDER BY that column, or similar, that will probably be fine. If you do order by this column, consider using some Chinese language collation for the column (or for the entire database).

Insert Russian Language data into database from an array

My query looks like:
string str = string.Format("Insert into [MyDB].[dbo].[tlb1] ([file_path],[CONTENT1],[CONTENT2]) values ('{0}','{1}','{2}');", fullpath, _val[0], _val[1]);
When the array _val[] contains data in English, it is inserted correctly, but when the array contains data in Russian, the database shows it as ???????????????????????
Is there a way to insert Russian-language data from an array?
According to this (Archived) Microsoft Support Issue:
You must precede all Unicode strings with a prefix N when you deal with Unicode string constants in SQL Server
First of all, you should use prepared statements and let the database driver insert the placeholders correctly (i.e. SqlCommand with parameters). Then the issue should go away (as well as any potential SQL injection problems).
As a quick fix in your case: Prefix the string literals you're inserting with N:
... values (N'{0}',N'{1}',N'{2}')
This makes the literals Unicode literals rather than arbitrary-legacy-codepage ones, which prevents the conversion from Unicode to the legacy codepage (the conversion that produces question marks for characters that cannot be represented).
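The parameterized approach recommended at the start of this answer could look roughly like the sketch below. It assumes the System.Data.SqlClient provider (Microsoft.Data.SqlClient exposes the same types) and that the target columns are nvarchar; the class name, method name and parameter names are made up for illustration.

```csharp
using System.Data;
using System.Data.SqlClient;

static class RussianInsert
{
    // Builds the INSERT with typed Unicode parameters.
    // The caller attaches an open connection and executes the command.
    public static SqlCommand BuildInsert(string fullpath, string content1, string content2)
    {
        var cmd = new SqlCommand(
            "INSERT INTO [MyDB].[dbo].[tlb1] ([file_path],[CONTENT1],[CONTENT2]) " +
            "VALUES (@path, @content1, @content2);");

        // NVarChar sends the value as Unicode, so no N'...' literal is needed.
        // Size -1 means nvarchar(max); that is an assumption, use the real column sizes.
        cmd.Parameters.Add("@path", SqlDbType.NVarChar, -1).Value = fullpath;
        cmd.Parameters.Add("@content1", SqlDbType.NVarChar, -1).Value = content1;
        cmd.Parameters.Add("@content2", SqlDbType.NVarChar, -1).Value = content2;
        return cmd;
    }
}
```

A caller would set cmd.Connection to an open SqlConnection and call ExecuteNonQuery(). Because the values never become part of the SQL text, the question-mark problem and the injection risk are handled together.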
It seems that the datatype of columns [Content1] and [Content2] is nchar. You should convert the columns to nvarchar which is used to store unicode data.
First of all, check the database code page on the server. The database may be using a non-Unicode code page while the data from your app arrives as Unicode.
