I'm working with C# and MySQL. I've searched the internet for days to find out why I can't use the AddWithValue method to add Unicode characters: when I add them manually in MySQL, it works! But back in the C# code with the MySQL Connector for .NET it doesn't. Everything other than the Unicode characters is fine.
cmd.CommandText = "INSERT INTO tb_osm VALUES (@id, @timestamp, @user)";
cmd.Parameters.AddWithValue("@id", osmobj.ID);
cmd.Parameters.AddWithValue("@timestamp", osmobj.TimeStamp);
cmd.Parameters.AddWithValue("@user", osmobj.User);
cmd.ExecuteNonQuery();
For example: osmobj.User = "ສະບາຍດີ" will be "???????" in the database.
Please T^T
Does this link help you?
read/write unicode data in MySql
Basically it says you should append charset=utf8; to your connection string.
Like so:
id=my_user;password=my_password;database=some_db123;charset=utf8;
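As a minimal sketch of applying that from C# (server name and credentials are placeholders, assuming the MySql.Data connector):

using MySql.Data.MySqlClient;

// Placeholder connection details; the important part is charset=utf8.
var conn = new MySqlConnection(
    "server=localhost;uid=my_user;password=my_password;database=some_db123;charset=utf8;");
conn.Open();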
You have to be sure that unicode characters are supported at every level of the process, all the way from the input into C# to the column stored in MySql.
The C# level is easy, because strings are already utf-16 by default. As long as you're not using some weird gui toolkit, reading from a bad file or network stream, or running in a weird console app environment with no unicode support, you'll be in good shape.
The next layer is the parameter definition. Here, you're better off avoiding the AddWithValue() method anyway. The link pertains to SQL Server, but the same reasoning applies to MySQL, even if MySQL is less strict with your data than it should be. You should use an Add() override that lets you explicitly declare the type of your parameters as NVarChar, instead of making the ADO.Net provider try to guess.
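For example, a minimal sketch of the explicit approach with the MySQL connector (the MySqlDbType choices are assumptions about the table's column types, not taken from the question):

// Declare parameter types explicitly instead of letting AddWithValue() guess.
// Assumes MySql.Data; the type choices here are illustrative.
cmd.CommandText = "INSERT INTO tb_osm VALUES (@id, @timestamp, @user)";
cmd.Parameters.Add("@id", MySqlDbType.Int64).Value = osmobj.ID;
cmd.Parameters.Add("@timestamp", MySqlDbType.DateTime).Value = osmobj.TimeStamp;
cmd.Parameters.Add("@user", MySqlDbType.VarChar).Value = osmobj.User;
cmd.ExecuteNonQuery();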
Next up is the connection between your application and the database. Here, you want to make sure to include the charset=utf8 clause (or better) as part of the connection string.
Then we need to think about the collation of the database itself. You have to be sure that an NVarChar column in MySql will be able to support your data. One of the answers to the question at the previous link also covers how to handle this.
Finally, make sure the column is defined with the NVarChar type, instead of just VarChar.
Yes, utf8 at all stages: byte-encoding in the client, conversion on the wire (charset=utf8), and in the column. I do not know whether C# converts from utf16 to utf8 before exposing the characters; if it does not, then charset=utf16 (or no setting) may be the correct step.
Because you got multiple ?, the likely cause is trying to transform non-latin1 characters into a CHARACTER SET latin1 column. Since latin1 has no codes for Lao, ? was substituted. Probably you said nothing about the column, but depended on the DEFAULT on the table and/or database, which happened to be latin1.
The ສະບາຍດີ is lost and cannot be recovered from ???????.
Once you have changed things, check that it is stored correctly by doing SELECT col, HEX(col) .... For the string ສະບາຍດີ, you should get hex E0BAAAE0BAB0E0BA9AE0BAB2E0BA8DE0BA94E0BAB5. Notice how that is groups of E0BAxx, which is the range of utf8 values for Lao.
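A quick way to run that check from C# (a sketch; the table and column names match the question but are otherwise assumptions):

// Inspect the raw bytes stored in the column via HEX().
using (var check = new MySqlCommand(
    "SELECT `user`, HEX(`user`) FROM tb_osm WHERE id = @id", connection))
{
    check.Parameters.Add("@id", MySqlDbType.Int64).Value = osmobj.ID;
    using (var reader = check.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine(reader.GetString(0) + " -> " + reader.GetString(1));
    }
}

For a correctly stored ສະບາຍດີ, you should see the E0BAxx groups described above.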
If you still have troubles, please provide the HEX for further analysis.
Related
Background
I noticed that when saving data from my MVC website through Entity Framework, if I had something like the Greek "α" it would be converted to "a".
Actions Taken
I overrode OnModelCreating in the database context and added the following code.
modelBuilder.Properties<string>().Configure(x =>
{
    x.HasColumnType("NVARCHAR");
    x.IsUnicode(true);
});
This initially looked promising as the newly generated migration had this structure.
AlterColumn("dbo.Item", "Name", c => c.String(maxLength: 800, storeType: "nvarchar"));
And after running the migration I saw the relevant columns had collation utf8_general_ci.
Persisting Problems
This changed nothing when saving data through my application. When passing Greek characters down from the website, they are still downgraded to a basic equivalent.
If I try to add these letters directly through MySQL Workbench however, it stores them just fine and the website will display correctly when retrieving the data.
Other Information
Using the database logging code below, I was able to see the SQL Entity Framework is using.
dbContext.Database.Log = s => System.Diagnostics.Debug.WriteLine(s);
The seemingly okay SQL.
SET SESSION sql_mode='ANSI';INSERT INTO `Item`(
`Name`,
`Owner_Id`) VALUES (
@gp1,
@gp2);
-- @gp1: 'The_α_1' (Type = String, IsNullable = false, Size = 7)
-- @gp2: '7a897e05-cc87-410b-bc80-70c75abae95b' (Type = String, IsNullable = false, Size = 36)
Any ideas? Thanks for any help.
MySQL allows for configuring several aspects of the client-server communication (according to the 10.4 Connection Character Sets and Collations documentation):
Source (i.e. client) encoding: character_set_client
Destination (i.e. server) encoding: character_set_connection
Returned data and meta-data: character_set_results
I am guessing that it is assumed that the source encoding, coming from a Microsoft technology, is UTF-16 Little Endian.
As for the other two, the Connector/NET Connection-String Options Reference documentation states:
CharSet, Character Set
Specifies the character set that should be used to encode all queries sent to the server. Results are still returned in the character set of the result data.
The connection to MySQL needs to be told that the target encoding is UTF-8 (which is what your MySQL columns are using). MySQL is currently assuming that you are sending non-Unicode strings, effectively doing the same thing as converting to VARCHAR in SQL Server: it assumes the code page specified by the default collation of the current database, here 1252 (Windows Code Page 1252 is commonly referred to as "ANSI", even though that is a technically inaccurate name).
The following shows the behavior in SQL Server by not prefixing the string with an upper-case "N":
SELECT 'α'; -- Database's default Collation = Latin1_General_100_CI_AS_SC
-- a
SELECT 'α'; -- Database's default Collation = Hebrew_100_BIN2
-- ?
Try the following to fix this:
First attempt should be to add the following to your connection string to send character data as UTF-8 to MySQL (this should just set character_set_connection):
CharSet=utf8;
Full Connection String example here
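For instance, a hypothetical example (server, credentials, and database name are placeholders; only the CharSet clause matters here):

// Placeholder connection details for illustration.
var connectionString =
    "Server=my_server;Database=some_db123;Uid=my_user;Pwd=my_password;CharSet=utf8;";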
Second attempt should be to send a SQL command, upon initial connection, to set the session-level variable that controls the destination encoding:
SET character_set_connection = utf8;
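A sketch of issuing that command when the connection opens (assumes the MySql.Data connector; the surrounding code is illustrative):

connection.Open();
// Set the session-level destination encoding before any INSERTs run.
using (var init = new MySqlCommand(
    "SET character_set_connection = utf8;", connection))
{
    init.ExecuteNonQuery();
}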
For more information, please see the following:
MySQL Charset/Collate
According to the "utf8 Collations" section of that page, it would be far better to use utf8_unicode_ci for the Collation instead of utf8_general_ci (to be clear, this recommendation has nothing to do with the character conversion issue being dealt with here).
P.S. This question / answer has a companion Q & A on DBA.StackExchange:
Why do I get incorrect characters when decoding a Base64 string to NVARCHAR in SQL Server?
Overview
This question is a more specific version of this one:
sql server - performance hit when passing argument of C# type Int64 into T-SQL bigint stored procedure parameter
But I've noticed the same performance hit for other data types (and, in fact, in my case I'm not using any bigint types at all).
Here are some other questions that seem like they should cover the answer to this question, but I'm observing the opposite of what they indicate:
c# - When should "SqlDbType" and "size" be used when adding SqlCommand Parameters? - Stack Overflow
.net - What's the best method to pass parameters to SQLCommand? - Stack Overflow
Context
I've got some C# code for inserting data into a table. The code is itself data-driven in that some other data specifies the target table into which the data should be inserted. So, though I could use dynamic SQL in a stored procedure, I've opted to generate dynamic SQL in my C# application.
The command text is always the same for every row I insert, so I generate it once, before inserting any rows. The command text is of the form:
INSERT SomeSchema.TargetTable ( Column1, Column2, Column3, ... )
VALUES ( SomeConstant, @p0, @p1, ... );
For each insert, I create an array of SqlParameter objects.
For the 'nvarchar' behavior, I'm just using the SqlParameter(string parameterName, object value) constructor method, and not setting any other properties explicitly.
For the 'degenerate' behavior, I was using the SqlParameter(string parameterName, SqlDbType dbType) constructor method and also setting the Size, Precision, and Scale properties as appropriate.
For both versions of the code, the value either passed to the constructor method or separately assigned to the Value property has a type of object.
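To make the comparison concrete, here is a sketch of the two styles (the names and types are illustrative, not my actual schema):

// 'nvarchar' version: the type is inferred from the value (strings become nvarchar).
var inferred = new SqlParameter("@p0", (object)someValue);

// 'type-specific' version: type, size, precision, and scale declared up front.
var typed = new SqlParameter("@p1", SqlDbType.Decimal)
{
    Precision = 18,
    Scale = 2,
    Value = (object)someOtherValue
};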
The 'nvarchar' version of the code takes about 1-1.5 minutes. The 'degenerate' or 'type-specific' code takes longer than 9 minutes; so 6-9 times slower.
SQL Server Profiler doesn't reveal any obvious culprits. The type-specific code is generating what would seem like better SQL, i.e. a dynamic SQL command whose parameters contain the appropriate data type and type info.
Hypothesis
I suspect that, because I'm passing an object type value as the parameter value, the ADO.NET SQL Server client code is casting, converting, or otherwise validating the value before generating and sending the command to SQL Server. I'm surprised, though, that the conversion from nvarchar to each of the relevant target table column types that SQL Server must be performing is so much faster than whatever the client code is doing.
Notes
I'm aware that SqlBulkCopy is probably the best-performing option for inserting large numbers of rows but I'm more curious why the 'nvarchar' case out-performs the 'type-specific' case, and my current code is fast enough as-is given the amount of data it routinely handles.
The answer does depend on the database you are running, but it has to do with the character encoding process. SQL Server introduced the NVarChar and NText field types to handle UTF-16 encoded data. UTF-16 also happens to be the internal string representation for the .NET CLR. NVarChar and NText don't have to be converted to another character encoding, which takes a very short but measurable amount of time.
Other databases allow you to define character encoding at the database level, and others let you define it on a column by column basis. The performance differences really depend on the driver.
Also important to note:
Inserting using a prepared statement emphasizes inefficiencies in converting to the database's internal format
This has no bearing on how efficiently the database queries against a string; UTF-16 takes up more space than the default Windows-1252 encoding used for Text and VarChar.
Of course, in a global application, UTF support is necessary
They're Not (but They're Almost as Fast)
My original discrepancy was entirely my fault. The way I was creating the SqlParameter objects for the 'degenerate' or 'type-specific' version of the code used one more loop than the 'nvarchar' version of the code. Once I rewrote the type-specific code to use the same number of loops (one), the performance is almost the same. [About 1-2% slower now instead of 500-800% slower.]
A slightly modified version of the type-specific code is now a little faster, at least based on my (limited) testing: about 3-4% faster for ~37,000 command executions.
But it's still (a little) surprising that it's not even faster, as I'd expect SQL Server converting hundreds of nvarchar values to lots of other data types (for every execution) to be significantly slower than the C# code to add type info to the parameter objects. I'm guessing it's really hard to observe much difference because the time for SQL Server to convert the parameter values is fairly small relative to the time for all of the other code (including the SQL client code communicating with SQL Server).
One lesson I hope to remember is that it's very important to compare like with like.
Another seeming lesson is that SQL Server is pretty fast at converting text to its various other data types.
I am trying to fetch the records for 3rd June 2013 from my database, which is made in MS Access. Dates are stored in the format dd/MM/yyyy; below is my query:
AND (a.Date = #" + date + "#) ) order by e.E_ID asc
But the amazing thing is that I inserted a record on 03/06/2013, which is today's date, and it takes it as 6th March 2013. I have corrected my regional settings, but still the same issue. Also, in my query, when matching the date I am using dd/MM/yyyy. Is this a bug from Microsoft? Please help.
Dates are stored in the format of dd/MM/yyyy
I suspect they're not. I suspect they're stored in some native date/time format which is doubtless much more efficient than a 10 character string. (I'm assuming you're using an appropriate field type rather than varchar, for example.) It's important to differentiate between the inherent nature of the data and "how it gets displayed when converted to text".
But the amazing thing
I don't see this as amazing. I see it as a perfectly natural result of using string conversions unnecessarily. They almost always bite you in the end. You're not trying to represent a string - you're trying to represent a date. So use that type as far as you possibly can.
You should:
Use parameterized SQL for queries for many reasons - most importantly to avoid SQL injection attacks, but also to avoid unnecessary string conversions of this kind
Specify the parameter value as a DateTime, thus avoiding the string conversion
You haven't specified which provider type you're using - my guess is OleDbConnection etc. Generally if you look at the documentation for the Parameters property of the relevant command class, you'll find an appropriate example. For example, OleDbCommand.Parameters shows a parameterized query on an OleDbConnection. One thing worth noting from the docs:
The OLE DB .NET Provider does not support named parameters for passing parameters to an SQL statement or a stored procedure called by an OleDbCommand when CommandType is set to Text. In this case, the question mark (?) placeholder must be used. [...]
Therefore, the order in which OleDbParameter objects are added to the OleDbParameterCollection must directly correspond to the position of the question mark placeholder for the parameter in the command text.
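Putting that together, a minimal sketch for this case (assumes OleDb against Access; the table and column names are placeholders for the query in the question):

using System.Data.OleDb;

// Positional '?' placeholder; the DateTime value needs no string formatting.
using (var cmd = new OleDbCommand(
    "SELECT E_ID FROM SomeTable WHERE [Date] = ? ORDER BY E_ID ASC", connection))
{
    cmd.Parameters.Add("@date", OleDbType.Date).Value = new DateTime(2013, 6, 3);
    using (var reader = cmd.ExecuteReader())
    {
        // ... read results ...
    }
}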
I'm working on a C# form application that ties into an Access database. Part of this database is outside of my control, specifically a part that contains strings with ", ), and other such characters. Needless to say, this is mucking up some queries, as I need to use that column to select other pieces of data. This is just a desktop form application and the issue lies in an exporter function, so there's no concern over SQL injection or other such things. How do I tell this thing to ignore quotes and such in a query when I'm using a variable that may contain them, and match that to what is stored in the Access database?
Well, an example would be that I've extracted several columns from a single row. One of them might be something like:
large (3-1/16" dia)
You get the idea. The quotes are breaking the query. I'm currently using OleDb to dig into the database and didn't have an issue until now. I'd rather not gut what I've currently done if it can be helped, at least not until I'm ready for a proper refactor.
This is actually not as big a problem as you may think: just do NOT build SQL queries as plain strings. Use the SqlCommand class and use query parameters. This way, the SQL engine will escape everything properly for you, because it knows which part is code to be read directly and which is a parameter value to be escaped.
You are trying to protect against a SQL injection attack; see https://www.owasp.org/index.php/SQL_Injection.
The easiest way to prevent these attacks is to use query parameters; http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlparameter.aspx
var cmd = new SqlCommand("select * from someTable where id = @id");
cmd.Parameters.Add("@id", SqlDbType.Int).Value = theID;
At least for single quotes, adding another quote seems to work: '' becomes '.
Even though injection shouldn't be an issue, I would still look into using parameters. They are the simpler option at the end of the day as they avoid a number of unforeseen problems, injection being only one of them.
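For this particular query, a sketch of the parameterized version (OleDb with positional placeholders; the table and column names are my guesses, not from the question):

// The embedded quote in the value can no longer break the SQL text.
using (var cmd = new OleDbCommand(
    "SELECT * FROM Parts WHERE Size = ?", connection))
{
    cmd.Parameters.Add("@size", OleDbType.VarWChar).Value = "large (3-1/16\" dia)";
    using (var reader = cmd.ExecuteReader())
    {
        // ... read results ...
    }
}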
So as I read your question, you are building up a query as a string in C#, concatenating already-queried column values, and the resulting string either ceases to be valid SQL or won't match anything in the Access db.
If the problem is in C#, I guess you'll need some sort of escaping function like
stringvar += Escaped(columnvalue);
...
private static string Escaped(string cv)
{
    // double up quote characters so they survive inside a SQL literal
    return cv.Replace("'", "''").Replace("\"", "\"\"");
}
If the problem is in access, then
' escapes '
" escapes "
and you can put a column value containing " inside '...' and it should work.
However my real thought is that, the SQL you're trying to run might be better restructured to use subqueries to get the matched value(s) and then you're simply comparing column name with column name.
If you post some more information regarding exactly what query you're producing, and some hint of the table structures, I'll try to help further - or someone else is bound to be able to give you something constructive (though you may need to adjust it per Jet SQL syntax).
My question is very similar to this question - Addin parameter for Oracle - except that I'm using Oracle 11g. The database has two different character sets for the VARCHAR (Western European) and NVARCHAR (Unicode) datatypes.
db.AddInParameter(cmd, "nationalColumn", DbType.String, "高野")
The national character set in the database is Unicode, so NVARCHAR columns are able to hold these characters.
My question is: how do I tell the db.AddInParameter function that the parameter I'm adding is an NVARCHAR and not a VARCHAR, which it seems to assume by default?
Adding to this: I'm using System.Data.OracleClient to connect to the database.
You can't encode Chinese characters in the Western European encoding. This encoding has a limited number of characters defined, and they don't include Chinese.
What output did you expect? I'd expect either the data to be garbled or an error to be returned.
Are you specifying the parameters yourself, or are you letting Enterprise Library try to find out the parameter types?
If you are calling
command.AddInParameter("parameterName", value);
try calling the procedure and let Enterprise Library find out the parameters, like this example:
DB.ExecuteNonQuery("PKG_USER.DELETE", userId);
The procedure expects an INT parameter called P_ID, but the command only passes in the parameter value. EntLib uses the parameter order to send them.
Also take a look at
this other post that I wrote: http://devstuffs.wordpress.com/2012/03/13/enterprise-library-5-with-odp-net/
this sample code: https://github.com/stanleystl/EntLib5ODP.NET
and this stack overflow answer: C#/Oracle: Specify Encoding/Character Set of Query?
I'm answering my own question, as I spent a week figuring it out. The System.Data namespace doesn't understand that the column it's handling is a national column unless it is specified explicitly as OracleType.NVarChar.
To resolve the situation above, I had to override the AddInParameter function to add a switch case that checks whether the input DbType is a String and maps it to OracleType.NVarChar.
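A minimal sketch of that mapping (the surrounding command setup is illustrative; the key part is binding with OracleType.NVarChar from System.Data.OracleClient):

// Force national-character binding so Unicode survives the round trip.
var param = new OracleParameter("nationalColumn", OracleType.NVarChar);
param.Value = "高野";
cmd.Parameters.Add(param);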
HTH