Insert Russian Language data into database from an array

My query looks like:
string str = string.Format("Insert into [MyDB].[dbo].[tlb1] ([file_path],[CONTENT1],[CONTENT2]) values ('{0}','{1}','{2}');", fullpath, _val[0], _val[1]);
When I insert data and the array _val[] contains English text, it is stored correctly, but when the array contains Russian text, the database shows it as ???????????????????????
Is there a way to insert Russian text from an array?

According to this (Archived) Microsoft Support Issue:
You must precede all Unicode strings with a prefix N when you deal with Unicode string constants in SQL Server

First of all, you should use prepared statements and let the database driver insert the placeholders correctly (i.e. SqlCommand with parameters). Then the issue should go away (as well as any potential SQL injection problems).
As a quick fix in your case: Prefix the string literals you're inserting with N:
... values (N'{0}',N'{1}',N'{2}')
This makes the literals Unicode literals rather than literals in an arbitrary legacy code page, which prevents the conversion from Unicode to the legacy code page (the conversion that produces question marks for characters that cannot be represented).
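A minimal sketch of the parameterized version of the question's INSERT, for illustration only. The table and column names come from the question; the class name, method name, parameter names and NVARCHAR sizes are assumptions, and the namespace assumes the classic System.Data.SqlClient provider (newer projects may use Microsoft.Data.SqlClient, which exposes the same types).

```csharp
using System.Data;
using System.Data.SqlClient; // assumption: classic provider; Microsoft.Data.SqlClient has the same types

public static class UnicodeInsert
{
    // Builds the INSERT from the question with parameters instead of string.Format.
    // The NVARCHAR sizes below are assumptions, not taken from the question.
    public static SqlCommand BuildInsert(string fullPath, string content1, string content2)
    {
        var cmd = new SqlCommand(
            "INSERT INTO [MyDB].[dbo].[tlb1] ([file_path],[CONTENT1],[CONTENT2]) " +
            "VALUES (@path, @c1, @c2);");

        // SqlDbType.NVarChar sends each value as Unicode, so no N prefix is needed.
        cmd.Parameters.Add("@path", SqlDbType.NVarChar, 260).Value = fullPath;
        cmd.Parameters.Add("@c1", SqlDbType.NVarChar, -1).Value = content1; // -1 means NVARCHAR(MAX)
        cmd.Parameters.Add("@c2", SqlDbType.NVarChar, -1).Value = content2;
        return cmd;
    }
}
```

To run it, assign an open SqlConnection to cmd.Connection and call cmd.ExecuteNonQuery(). The target columns still have to be nchar/nvarchar, otherwise the server converts the text to the column's code page on storage.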

It seems that the columns [Content1] and [Content2] have a non-Unicode type such as char or varchar. You should convert them to nvarchar, which is used to store Unicode data.

First of all, check the code page (collation) of the database on the server. The database may be using a non-Unicode code page while the data from your app arrives as Unicode.

Related

Arabic_CI_AS to utf8 in C#

I have a database in SQL Server with collation Arabic_CI_AS and I need to compare some string data with another Postgres database that uses the UTF8 character set. I use C# to convert and compare. It is easy when the string contains just one word (in those cases I just replace 'ي' with 'ی'), but long strings, especially ones with a '(' character, cause problems.
I can't get it to work! I tried some suggested solutions such as:
var enc = Encoding.GetEncoding(1256);
byte[] encBytes = enc.GetBytes(customer.name);
customer.name = Encoding.UTF8.GetString(encBytes, 0, encBytes.Length);
or:
SELECT cast (name as nvarchar) as NewName
from Customer
But they don't work! Can anyone help me?
Example of input and output, see tooltips on the right:

Maybe this can help: you can change the collation dynamically in the query:
SELECT name collate SQL_Latin1_General_CP1_CI_AS
from Customer
or
SELECT name collate Persian_100_CI_AI
from Customer
or you can try this on the C# side:
byte[] enBuff = Encoding.GetEncoding("windows-1256").GetBytes(customer.name);
customer.name = Encoding.GetEncoding("windows-1252").GetString(enBuff);
You can choose other collations too.
You may need to try several collations and encodings to get the result you want.

SQL Server does not support UTF-8 strings (UTF-8 collations only arrived with SQL Server 2019). If you have to deal with characters other than plain Latin, it is strongly recommended to use NVARCHAR instead of VARCHAR with an Arabic collation.
Many people think that NVARCHAR is UTF-16 while VARCHAR is UTF-8. This is not true! VARCHAR is extended ASCII and, with a single-byte code page like the Arabic one, uses 1 byte per character, while UTF-8 encodes some characters with more than one byte.
So - the most important question is: WHY?
SQL Server can take your string into a NVARCHAR variable, cast it to a chain of bytes and re-cast it to the former string:
DECLARE @str NVARCHAR(MAX)=N'(نماینده اراک)';
SELECT @str
,CAST(@str AS VARBINARY(MAX))
,CAST(CAST(@str AS VARBINARY(MAX)) AS NVARCHAR(MAX));
The problem with the ) is - quite probably! - that your arabic letters are right-to-left while the ) is left-to-right. I wanted to paste the result of the query above into this answer but did not manage to get the closing ) to the original place... You try to edit, delete, replace, but you get something else... Somehow funny, but not a question of bad encoding but one of buggy editors...
Anyway, SQL Server is not your issue. You must read the string out of SQL Server as NVARCHAR. C# works with Unicode strings, not collated 1-byte strings. Every conversion carries the chance of destroying your text.
If your target (or the tooltip you show us) is not capable of displaying the string properly, the string itself might be perfectly okay and only the editor is at fault...
If you pass such a UTF-8 string back to SQL Server, you'll get a mess...
The only places where UTF-8 makes sense are when text is written to a file or transmitted over a low-bandwidth connection. If a text contains mostly plain Latin characters and just a few other letters (as is very often the case with XML and HTML), you can save quite some disk space or bandwidth. With a Far Eastern text you would even bloat your text: some of these characters need 3 or even 4 bytes to be encoded.
Within your database and application you should stick with unicode.
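The size trade-off described above can be checked directly in C#. This is a small sketch; the class and method names are made up for illustration, and Encoding.Unicode is .NET's name for UTF-16 (little-endian), the in-memory format of C# strings.

```csharp
using System.Text;

public static class EncodingSize
{
    // Number of bytes the text occupies when encoded as UTF-8.
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);

    // Number of bytes the text occupies when encoded as UTF-16.
    public static int Utf16Bytes(string s) => Encoding.Unicode.GetByteCount(s);
}
```

For a mostly Latin string such as "<b>hello</b>" UTF-8 needs 12 bytes and UTF-16 needs 24, while for the two Chinese characters "头版" UTF-8 needs 6 bytes and UTF-16 only 4.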

Converting UTF-8 Encoded Data from Hashtable of ASP.NET Webform Before Inserting Into SQL Server Database

What I am working with:
Within my ASP.NET Web Forms application, I am getting form data from the user and then inserting that data into a SQL Server database. The data is held in a hashtable: each key is the identifier of a field in the form, and the value is the data received from the user.
My Issue:
My issue is that users are copying and pasting UTF-8 data from emails, etc. into the "notes" field. The SQL Server database does not recognize UTF-8 as valid character data. Instead, it uses the UCS-2 and ISO-8859-1 character sets. Thus, these characters are being inserted into the database as question marks (?). So, I would like to properly convert any UTF-8 characters to UCS-2 or ISO-8859-1.
Questions:
Should I convert the UTF-8 characters to UCS-2 or to ISO-8859-1?
Within the ASP.NET web form, what is the best means of determining the character sets used within the value for the "notes" key of my hashtable?
What is the best possible means for converting the characters that are UTF-8 into the acceptable character set?

Option 1: use nvarchar
You could just change your field from varchar to nvarchar so that your unicode characters are stored correctly. That's the point of that nvarchar data type. It's cool. Use it.
Option 2: Convert Intelligently.
If you have a legacy db where nvarchar simply won't work, then you can create a string extension that lets you store the ASCII version of the values from users. Below is one such extension (note that we do some initial replacements for "smart" quotes, etc. before ditching all characters that aren't ASCII).
If you're supporting international text (accents, etc.), then this is a little culturally insensitive ("bah - away with your crazy accent marks and strange non-english looking letters").
using System.Text.RegularExpressions;
public static class StringExt {
    public static string TryGetAsciiString(this string original) {
        // Replace MS Word "smart" characters with plain ASCII equivalents.
        string escaped = original
            .Replace('\u2013', '-').Replace('\u2014', '-').Replace('\u2015', '-')
            .Replace('\u2017', '_').Replace('\u2018', '\'').Replace('\u2019', '\'')
            .Replace('\u201a', ',').Replace('\u201b', '\'').Replace('\u201c', '\"')
            .Replace('\u201d', '\"').Replace('\u201e', '\"').Replace("\u2026", "...")
            .Replace('\u2032', '\'').Replace('\u2033', '\"');
        // Strip all remaining characters outside the allowed ASCII set
        // (applied to the already-replaced text, not to the original).
        escaped = Regex.Replace(escaped, "[^A-Za-z 0-9 \\.,\\?\'\"!@#\\$%\\^&\\*\\(\\)-_=\\+;:<>\\/\\\\\\|\\}\\{\\[\\]`~\\n\\r]+", "");
        return escaped;
    }
}
Option ... err... 2A? : Ditch the first 30 ascii codes (give or take)
I've noticed that when users copy/paste from Word for Mac (and a few other programs), the pasted data contains characters from the first 30 or so ASCII codes. Aside from 9, 10 and 13, you can probably ditch those (they're just NULs, ACKs, DCs and other garbage no user would actually type).
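A possible sketch of that last option, assuming "the first 30 codes, give or take" means everything below the space character (code 32). The class and method names are hypothetical.

```csharp
using System.Text;

public static class ControlCharFilter
{
    // Drops ASCII control characters (codes 0-31) except tab (9), LF (10) and CR (13).
    // The cut-off at 32 is an assumption based on "the first 30 codes, give or take".
    public static string StripControlChars(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c >= ' ' || c == '\t' || c == '\n' || c == '\r')
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Unlike the ASCII-only extension above, this leaves accented and non-Latin letters untouched, so it can be combined with an nvarchar column.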

C# SQL Select Chinese characters returns weird characters

I'm trying to convert a piece of software into Chinese but I'm having some problems with the database. It returns weird strings of characters and my guess is that it's because of wrong encoding but I'm not sure about what to do.
If I set column data to å¤´ç‰ˆ it returns
>> 头版
If I set column data to 头版 it returns
>> ??
It works fine because if I insert '头版' into the database, it will get inserted as 'å¤´ç‰ˆ' but I would like it to display the characters correctly, so searching through the database will be easier.
I've tried running this query when connected to the database
SET NAMES utf8;
Also tried this
SET NAMES utf8; SELECT * FROM `table` ORDER BY num;
But it doesn't change anything.
The culture is set zh-Hans.

The column should be nvarchar. This type supports Unicode and allows non-English characters (such as Mandarin, Arabic, etc.) to be used.
Update
The above was for SQL Server. For MySQL the column should be VARCHAR(50) CHARACTER SET UCS2.
UCS2 is more compact than UTF-8 for Chinese because most Chinese characters fit in a single 16-bit code unit (2 bytes), whereas UTF-8 needs 3 bytes to store each of them.
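The 'å¤´ç‰ˆ' in the question looks like the UTF-8 bytes of '头版' being read back in a single-byte code page. The sketch below reproduces that effect and reverses it. It uses ISO-8859-1 as a stand-in because it is built into every .NET runtime; the code page actually involved in the question is probably Windows-1252, which is an assumption, and the class and method names are made up for illustration.

```csharp
using System.Text;

public static class MojibakeDemo
{
    // ISO-8859-1 maps each byte 0x00-0xFF to exactly one char, so the round trip is lossless.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Simulates the bug: UTF-8 bytes interpreted as a single-byte code page.
    public static string MisdecodeUtf8(string text)
    {
        return Latin1.GetString(Encoding.UTF8.GetBytes(text));
    }

    // Reverses the bug: recover the bytes, then decode them as UTF-8.
    public static string Repair(string mojibake)
    {
        return Encoding.UTF8.GetString(Latin1.GetBytes(mojibake));
    }
}
```

MisdecodeUtf8("头版") yields six characters starting with 'å', and Repair turns them back into "头版". A repair like this only helps for reading old rows; the real fix is a Unicode column type and a connection that agrees on the character set.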

C# chinese Encoding/Network

I have a Client/Server architecture where messages in text-format are exchanged.
For example:
12 2013/11/11 abcd 5
^ ^ ^ ^
int date text int
Everything works fine with "normal" text.
Now this is a Chinese project, so they also want to send Chinese symbols, encoded as GB18030 or GB2312.
I read the data this way:
char[] dataIn = binaryReader.ReadChars(length);
Then I create a new string from the char array and convert it to the right data type (int, float, string, etc.).
How can I change/enable Chinese encoding, or convert the string values to Chinese?
And what would be a good and easy way to test this?
Thanks.
I tried using something like this
string stringData = new string(dataIn).Trim();
byte[] data = Encoding.Unicode.GetBytes(stringData);
stringData = Encoding.GetEncoding("GB18030").GetString(data);
Without success.
Also I need to save some text values to MS SQL Server 2008. Is this possible, and do I need to configure anything special?
I also tried this example with storing to the database and printing to the console, but I just get ????????
string chinese = "123东北特钢大连新基地testtest";
byte[] utfBytes = Encoding.Unicode.GetBytes(chinese);
byte[] chineseBytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding("GB18030"), utfBytes);
string msg = Encoding.GetEncoding("GB18030").GetString(chineseBytes);
Edit
The problem was with the INSERT queries that I send to the database. I fixed it by putting N before the string literal.
sqlCommand = string.Format("INSERT INTO uber_chinese (columnName) VALUES(N'{0}')", myChineseString);
Also the column dataType has to be nvarchar instead of varchar.

This answer is "promoted" (at the request of the original poster) from my own comments.
In the .NET Framework, strings are already Unicode strings.
(Don't test Unicode strings by writing to the console, though, since the terminal window and console typically won't display them correctly. However, since .NET version 4.5 there is some support for this.)
The thing to be aware of is the Encoding when you get text from an outside source. In this case, the constructor of BinaryReader offers an overload that takes in an Encoding:
using (var binaryReader = new BinaryReader(yourStream, Encoding.GetEncoding("GB18030")))
...
On the SQL Server, be sure that any column that needs to hold Chinese strings is of type nvarchar (or nchar), not just varchar (char). Otherwise, depending on the collation, the column may not be able to hold general Unicode characters (it may be represented internally by some 8-bit Microsoft code page).
Whenever you give an nchar literal in SQL, use the format N'my text', not just 'my text', to make sure the literal is interpreted as an nchar rather than just char. For example N'Erdős' is distinct from N'Erdos' while, in many collations, 'Erdős' and 'Erdos' might be (projected onto) the same value in the underlying code page.
Similarly N'东北特钢大连新基地' will work, while '东北特钢大连新基地' might result in a lot of question marks. From the update of your question:
sqlCommand = string.Format("INSERT INTO uber_chinese (columnName) VALUES(N'{0}')", myChineseString);
Note the N directly before the opening quote of the literal. (This is prone to SQL injection, of course.)
The default collation of your column will be that of your database (SQL_Latin1_General_CP1_CI_AS from your comment). Unless you ORDER BY that column, or similar, that will probably be fine. If you do order by this column, consider using some Chinese language collation for the column (or for the entire database).
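The BinaryReader overload mentioned above can be wrapped in a small helper, sketched here. The class, method and parameter names are hypothetical. For the real protocol you would pass Encoding.GetEncoding("GB18030"); on .NET Core and .NET 5+ that encoding is only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance), so any encoding can be passed in and the sketch itself runs with the built-in ones.

```csharp
using System.IO;
using System.Text;

public static class WireText
{
    // Decodes charCount characters from the raw payload using the encoding
    // the sender used (for example GB18030 in the question's scenario).
    public static string ReadText(byte[] payload, int charCount, Encoding wireEncoding)
    {
        using (var stream = new MemoryStream(payload))
        using (var reader = new BinaryReader(stream, wireEncoding))
        {
            return new string(reader.ReadChars(charCount));
        }
    }
}
```

A simple way to test without a server is to encode a known string, for example the question's "123东北特钢大连新基地testtest", with the same encoding, feed the bytes to ReadText and compare the result in code rather than by printing it to the console.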

MySQL comparing Japanese characters in a query as question marks

I have a MySQL database with some varchar fields that can contain Latin characters or Japanese characters. There are entries that contain Japanese characters, that is not a problem. However, from my C# code, using MySqlConnection, I have been unable to get the correct results using Japanese characters in my WHERE clauses. It seems to compare the Japanese characters as though they are question marks. For example a query with WHERE series_title LIKE '%未来警%' does not return values where series_title contains "未来警", but instead returns all entries where series_title contains "???".
Some details:
series_title is a varchar(150) with collation utf8_general_ci.
the ConnectionString for the MySqlConnection includes the kv pair CharSet=utf8_general_ci
the database does contain Japanese characters and is able to return them to the C# client - it only has problems when Japanese characters are being sent to it

Try adding charset=utf8 to your connection string:
server=server;uid=my_user;password=pass;database=db;charset=utf8;
EDIT:
Try executing this SQL after connecting:
SET NAMES utf8

I would ensure your data is stored using the right encoding. For Japanese, you might want to try eucjp, and you can find out more than you ever wanted to know about character encoding here. It looks like you may also need the BOM. Best of luck and let me know how you get on.
