I understand that collation can be set differently for different tables in a database; my understanding of collation comes from the question What does character set and collation mean exactly?
The query below performs a CAST on a character result. There are no tables involved. I assume the encoding applied will be based on the database-level collation. Is this assumption correct?
SELECT CAST ( SSS.id_encrypt ('E','0000000{0}') AS CHAR(100) FOR BIT DATA)
AS ENCRYPT_ID FROM FFGLOBAL.ONE_ROW FETCH FIRST 1 ROW ONLY
QUESTION
In the question Get Byte[] from Db2 without Encoding, the answer given by @AlexFilipovici (using .NET BlockCopy) provides a different result than the CAST result. Why is that, if there is no code page associated?
Based on National language support - Character conversion
Bit data (columns defined as FOR BIT DATA, or BLOBs, or binary strings) is not associated with any character set.
REFERENCE
Get Byte[] from Db2 without Encoding
Default code page for new databases is Unicode
National language support - Character conversion
To find out the collation at the database level in SQL Server, try this:
SELECT DATABASEPROPERTYEX('databasename', 'Collation');
More: DATABASEPROPERTYEX
To answer your questions:
#1: Specifying FOR BIT DATA on a character-based data type (in DB2) means that DB2 stores and returns the raw data with no code page associated (i.e., it is just a string of bytes and will not go through any code page conversion between client and server), so the bytes you read back, as in the sketch below, are exactly the bytes stored.
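For example, here is a minimal client-side sketch, assuming a generic ADO.NET reader (the method and column ordinal are placeholders); because the data has no code page, it should be read with GetBytes rather than GetString, so the provider performs no character conversion:

using System.Data;

static byte[] ReadBitDataColumn(IDataReader reader, int ordinal)
{
    // Passing a null buffer asks the provider for the total field length
    // (supported by most ADO.NET providers).
    long length = reader.GetBytes(ordinal, 0, null, 0, 0);
    var buffer = new byte[length];
    reader.GetBytes(ordinal, 0, buffer, 0, (int)length);
    return buffer;  // raw bytes, exactly as stored on the server
}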
#2: In DB2 for Linux, UNIX and Windows, you can determine the database's collation by querying SYSIBMADM.DBCFG
select name,value
from sysibmadm.dbcfg
where name in ('codepage','codeset');
#3: Per @Iswanto San:
SELECT DATABASEPROPERTYEX('databasename', 'Collation');
Related
Question: Is it possible to search for values that are in between (i.e., BETWEEN, greater than, and less than type math operators) each other when the data is stored in a VARBINARY data type?
Problem: I have a list of IP addresses (both IPv4 and IPv6) where I need to determine the geolocation of that IP address, which means I need to search between ranges.
Typically, this can be accomplished by converting the address to an integer and then using the BETWEEN operator. However, IPv6 addresses are 128-bit values that exceed every numeric, decimal, and integer-related data type available as of this posting, so it appears that I need to store the data in the VARBINARY data type.
I have not used this data type in the past, so I am not aware of how, or if it is even possible, to search between values. My searches online have not turned up any hits, so I am asking here.
Note: currently using SQL Server 2014, but will be migrating to SQL Server 2017 for this project.
Your approach is correct.
You can compare VARBINARY values directly with the standard comparison operators.
Here is an accepted answer from the MSDN forums. The link may break in the future, so I am also pasting the query below.
Questions about dealing with IPV6 varbinary and comparing hex values in a range?
Query:
DECLARE @b1 varbinary(16) = convert(varbinary(16), newid()),
        @b2 varbinary(16) = convert(varbinary(16), newid())
SELECT CASE WHEN @b1 > @b2 THEN '@b1 is bigger' ELSE '@b2 is bigger' END
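Applying that to the IP-geolocation case, here is a hedged sketch (the table dbo.IpRanges and its columns are hypothetical, not from the original post): convert each address to a fixed-width, big-endian byte array, store the range endpoints as VARBINARY(16), and let BETWEEN do the left-to-right byte comparison. Mapping IPv4 into the IPv6 space keeps every key the same length, which is what makes byte-wise comparison valid.

using System.Data;
using System.Data.SqlClient;
using System.Net;
using System.Net.Sockets;

static string LookupCountry(SqlConnection conn, string ipText)
{
    IPAddress ip = IPAddress.Parse(ipText);
    if (ip.AddressFamily == AddressFamily.InterNetwork)
        ip = ip.MapToIPv6();               // pad IPv4 to 16 bytes (::ffff:a.b.c.d)
    byte[] key = ip.GetAddressBytes();     // network (big-endian) byte order

    using (var cmd = new SqlCommand(
        "SELECT Country FROM dbo.IpRanges WHERE @ip BETWEEN RangeStart AND RangeEnd;",
        conn))
    {
        cmd.Parameters.Add("@ip", SqlDbType.VarBinary, 16).Value = key;
        return (string)cmd.ExecuteScalar();
    }
}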
Background
I noticed that when saving data from my MVC website through Entity Framework, if I had something like the Greek "α" it would be converted to "a".
Actions Taken
I overrode OnModelCreating in the database context and added the following code.
modelBuilder.Properties<string>().Configure(x =>
{
    x.HasColumnType("NVARCHAR");
    x.IsUnicode(true);
});
This initially looked promising as the newly generated migration had this structure.
AlterColumn("dbo.Item", "Name", c => c.String(maxLength: 800, storeType: "nvarchar"));
And after running the migration I saw the relevant columns had collation utf8_general_ci.
Persisting Problems
This changed nothing when saving data through my application. When passing Greek characters down from the website, they are still downgraded to basic Latin equivalents.
If I try to add these letters directly through MySQL Workbench however, it stores them just fine and the website will display correctly when retrieving the data.
Other Information
Using the database logging code below, I was able to see the SQL Entity Framework is using.
dbContext.Database.Log = s => System.Diagnostics.Debug.WriteLine(s);
The seemingly okay SQL.
SET SESSION sql_mode='ANSI';INSERT INTO `Item`(
`Name`,
`Owner_Id`) VALUES (
@gp1,
@gp2);
-- @gp1: 'The_α_1' (Type = String, IsNullable = false, Size = 7)
-- @gp2: '7a897e05-cc87-410b-bc80-70c75abae95b' (Type = String, IsNullable = false, Size = 36)
Any ideas? Thanks for any help.
MySQL allows for configuring several aspects of the client-server communication (according to the 10.4 Connection Character Sets and Collations documentation):
Source (i.e. client) encoding: character_set_client
Destination (i.e. server) encoding: character_set_connection
Returned data and meta-data: character_set_results
I am guessing that it is assumed that the source encoding, coming from a Microsoft technology, is UTF-16 Little Endian.
As for the other two, the Connector/NET Connection-String Options Reference documentation states:
CharSet , Character Set
Specifies the character set that should be used to encode all queries sent to the server. Results are still returned in the character set of the result data.
The connection to MySQL needs to be told that the target encoding is UTF-8 (which is what your MySQL columns are using). MySQL is currently assuming that you are sending non-Unicode strings, effectively doing the same thing as converting to VARCHAR in SQL Server when the code page specified by the current database's default collation is 1252 (Windows code page 1252 is commonly referred to as "ANSI", even though that name is technically inaccurate).
The following shows the behavior in SQL Server by not prefixing the string with an upper-case "N":
SELECT 'α'; -- Database's default Collation = Latin1_General_100_CI_AS_SC
-- a
SELECT 'α'; -- Database's default Collation = Hebrew_100_BIN2
-- ?
Try the following to fix this:
First attempt should be to add the following to your connection string to send character data as UTF-8 to MySQL (this should just set character_set_connection):
CharSet=utf8;
Full Connection String example here
Second attempt should be to send a SQL command, upon initial connection, to set the session-level variable that controls the destination encoding:
SET character_set_connection = utf8;
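If you issue that command from the application, a minimal sketch would look like the following (assuming Connector/NET; the connection string is a placeholder):

using MySql.Data.MySqlClient;

using (var conn = new MySqlConnection(connectionString))
{
    conn.Open();
    // Set the destination encoding for this session before any other work.
    using (var cmd = new MySqlCommand("SET character_set_connection = utf8;", conn))
    {
        cmd.ExecuteNonQuery();
    }
    // ... hand this open connection to Entity Framework / run your queries ...
}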
For more information, please see the following:
MySQL Charset/Collate
According to the "utf8 Collations" section of that page, it would be far better to use utf8_unicode_ci for the Collation instead of utf8_general_ci (to be clear, this recommendation has nothing to do with the character conversion issue being dealt with here).
P.S. This question / answer has a companion Q & A on DBA.StackExchange:
Why do I get incorrect characters when decoding a Base64 string to NVARCHAR in SQL Server?
I'm working with C# and MySQL now. I've searched around the internet for days trying to find out why I can't use the AddWithValue method to add Unicode characters: when I insert them manually in MySQL it works, but from the C# code with the MySQL connector for .NET it doesn't. Everything other than the Unicode characters is fine.
cmd.CommandText = "INSERT INTO tb_osm VALUES (@id, @timestamp, @user)";
cmd.Parameters.AddWithValue("@id", osmobj.ID);
cmd.Parameters.AddWithValue("@timestamp", osmobj.TimeStamp);
cmd.Parameters.AddWithValue("@user", osmobj.User);
cmd.ExecuteNonQuery();
For example: osmobj.User = "ສະບາຍດີ" will be stored as "???????" in the database.
Please T^T
Does this link help you?
read/write unicode data in MySql
Basically it says you should append charset=utf8; to your connection string.
Like so:
user id=my_user;password=my_password;database=some_db123;charset=utf8;
You have to be sure that unicode characters are supported at every level of the process, all the way from the input into C# to the column stored in MySql.
The C# level is easy, because strings are already utf-16 by default. As long as you're not using some weird gui toolkit, reading from a bad file or network stream, or running in a weird console app environment with no unicode support, you'll be in good shape.
The next layer is the parameter definition. Here, you're better off avoiding the AddWithValue() method anyway. The link pertains to SQL Server, but the same reasoning applies to MySQL, even if MySQL is less strict with your data than it should be. You should use an Add() override that lets you explicitly declare the type of your parameters as NVarChar, instead of making the ADO.Net provider try to guess (see the sketch below).
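A minimal sketch of that explicit typing, assuming Connector/NET (MySql.Data) and reusing the table and object names from the question:

using MySql.Data.MySqlClient;

using (var cmd = new MySqlCommand(
    "INSERT INTO tb_osm VALUES (@id, @timestamp, @user)", connection))
{
    cmd.Parameters.Add("@id", MySqlDbType.Int64).Value = osmobj.ID;
    cmd.Parameters.Add("@timestamp", MySqlDbType.DateTime).Value = osmobj.TimeStamp;
    // Declared explicitly, so the provider does not have to infer the type;
    // the connection's charset setting controls the encoding on the wire.
    cmd.Parameters.Add("@user", MySqlDbType.VarString).Value = osmobj.User;
    cmd.ExecuteNonQuery();
}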
Next up is the connection between your application and the database. Here, you want to make sure to include the charset=utf8 clause (or better) as part of the connection string.
Then we need to think about the collation of the database itself. You have to be sure that an NVarChar column in MySql will be able to support your data. One of the answers from the question at previous link also covers how to handle this.
Finally, make sure the column is defined with the NVarChar type, instead of just VarChar.
Yes, utf8 at all stages -- byte-encoding in client, conversion on the wire (charset=utf8), and on the column. I do not know whether C# converts from utf16 to utf8 before exposing the characters; if it does not, then charset=utf16 (or no setting) may be the correct step.
Because you got multiple ?, the likely cause is trying to transform non-latin1 characters into a CHARACTER SET latin1 column. Since latin1 has no codes for Lao, ? was substituted. Probably you said nothing about the column, but depended on the DEFAULT on the table and/or database, which happened to be latin1.
The ສະບາຍດີ is lost and cannot be recovered from ???????.
Once you have changed things, check that the value is stored correctly by doing SELECT col, HEX(col) ... (a client-side sketch follows below). For the string ສະບາຍດີ, you should get the hex E0BAAAE0BAB0E0BA9AE0BAB2E0BA8DE0BA94E0BAB5. Notice how that is groups of E0BAxx, which is the range of utf8 values for Lao.
If you still have troubles, please provide the HEX for further analysis.
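A hedged verification sketch (the table and column names reuse the question's; the id filter is a placeholder):

using System;
using MySql.Data.MySqlClient;

using (var cmd = new MySqlCommand(
    "SELECT `user`, HEX(`user`) FROM tb_osm WHERE id = @id", conn))
{
    cmd.Parameters.AddWithValue("@id", someId);
    using (var rdr = cmd.ExecuteReader())
    {
        // A correct utf8 round-trip prints the Lao text plus E0BAxx groups.
        while (rdr.Read())
            Console.WriteLine(rdr.GetString(0) + " -> " + rdr.GetString(1));
    }
}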
EDIT: I now strongly suspect this behavior is due to a bug in the OleDB.Oracle provider. In further testing, I was able to run SELECT statements against other CAST column values with negative scale that did not cause the 'Decimal byte constructor...' exception. I also note that the provider returns the absolute value of the scale when viewing the schema, e.g., a scale of -2 is returned as 2. Additionally, this same test query does not cause an exception when run through the ODP.NET driver (rather than the Oracle OLEDB provider). Changing the numeric delimiter as suggested by Lalit (in comments) did not affect the results (but I thank him for his time nonetheless). I continue to research this problem and will advise if more information is realized.
I have a 64-bit C# application that fetches data from an Oracle database via the Oracle 11g OLEDB provider. When Oracle returns a numeric type defined or cast with a negative scale (such as Select Cast(123.1 as Number(3,-1))), the mapped OLEDB schema (from GetSchemaTable) reports that column as a Decimal with a scale of 255. The documentation indicates 255 is intended to represent an N/A or irrelevant value.
When OleDbDataReader.GetValues() is later called on the row containing such a column, an ArgumentException is thrown, advising that a 'Decimal byte array constructor ... requires four valid decimal bytes', telling me that even though the OleDB provider thinks it is Decimal data, there is no valid Decimal data to read. I'm assuming that data is present, but not sure in exactly what form.
I have tried:
Explicitly getting the bytes from the column via calls to OleDbDataReader.GetBytes (even a call just to size a buffer throws), but doing so raises "Specified cast is not valid" ArgumentExceptions.
Writing a chunk of test code to try every possible supported return data type, e.g., GetInt16, GetInt32, etc., each of which throws the same invalid-cast exception.
Does the Oracle OleDB provider not even return data to the caller when fetching a column defined with a negative scale? Is there some other mechanism to at least get the bytes "across the pond" and manipulate them on the receiving end?
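One workaround consistent with the EDIT above is to re-cast the value server-side so the provider never sees the negative scale; a hedged sketch (the connection string and the doubled CAST are placeholders standing in for the real query):

using System;
using System.Data.OleDb;

using (var conn = new OleDbConnection(oracleOleDbConnectionString))
using (var cmd = new OleDbCommand(
    // The inner CAST reproduces the problem column; the outer CAST gives
    // it a non-negative scale that the provider can map to Decimal.
    "SELECT CAST(CAST(123.1 AS NUMBER(3,-1)) AS NUMBER(10)) FROM DUAL", conn))
{
    conn.Open();
    decimal value = Convert.ToDecimal(cmd.ExecuteScalar());   // 120
    Console.WriteLine(value);
}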
I need to do some in-memory merging in C# of two sorted streams of strings coming from one or more SQL Server 2000 databases into a single sorted stream. These streams of data can be huge, so I don't want to pull both streams into memory. Instead, I need to keep one item at a time from each stream in memory; at each step, compare the current item from each stream, push the minimum onto the final stream, and pull the next item from the appropriate source stream. To do this correctly, though, the in-memory comparison has to match the collation of the database. Consider the streams [A,B,C] and [A,B,C]: the correct merged sequence is [A,A,B,B,C,C], but if your in-memory comparison thinks C < B, your merge will yield A,A,B, at which point it is looking at a B and a C, and it will yield the C, resulting in an incorrectly sorted stream.
So, my question is: is there any way to mimic any of the collations in SQL Server 2000 with a System.StringComparison enum in C#, or vice versa? The closest I've come is to use System.StringComparison.Ordinal on the database strings converted to VARBINARY with the standard VARBINARY ordering, which works, but I'd rather just add an "order by name collate X" clause to my SQL queries, where X is some collation that works exactly like the VARBINARY ordering, rather than converting all strings to VARBINARY as they leave the database and then back to strings as they come into memory.
Have a look at the StringComparer class. This provides more robust character and string comparisons than you'll find with String.Compare. There are three sets of static instances (CurrentCulture, InvariantCulture, Ordinal) and case-insensitive versions of each. For more specialized cultures, you can use the StringComparer.Create() function to create a comparer tied to a particular culture.
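For instance (the culture name here is just a placeholder; pick whichever culture corresponds to the collation you need to match):

using System;
using System.Globalization;

StringComparer comparer = StringComparer.Create(
    new CultureInfo("en-US"), ignoreCase: true);

// Prints 0: equal under this case-insensitive, culture-specific comparer.
Console.WriteLine(comparer.Compare("apple", "Apple"));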
With SQL 2005, I know that the DB engine does not make OS calls to do the sorting; the ordering rules ship statically with the DB (they may be updated by a service pack, but they don't change with the OS). So I don't think you can safely say that a given set of application code will order the same way unless you have the same code as the DB server, unless you use a binary collation.
But if you use a binary collation in the DB and the client code, you should have no problem at all.
EDIT - any collation that ends in _BIN will give you binary sorting. The rest of the collation name will determine what code page is used for storing CHAR data, but will not affect the ordering. The _BIN means strictly binary sorting. See http://msdn.microsoft.com/en-us/library/ms143515(SQL.90).aspx
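Putting the two pieces together, here is a hedged sketch of the streaming merge itself, with the comparer pluggable. Under a binary collation on NVARCHAR data, StringComparer.Ordinal should agree with the server's code-point ordering (the specific collation name Latin1_General_BIN2 below is an assumption, not from the original post):

using System.Collections.Generic;

static IEnumerable<string> Merge(
    IEnumerable<string> left, IEnumerable<string> right, IComparer<string> cmp)
{
    using (var l = left.GetEnumerator())
    using (var r = right.GetEnumerator())
    {
        bool hasL = l.MoveNext(), hasR = r.MoveNext();
        while (hasL && hasR)
        {
            // Emit the smaller head and advance only that stream, so at most
            // one item per stream is held in memory at a time.
            if (cmp.Compare(l.Current, r.Current) <= 0)
            {
                yield return l.Current; hasL = l.MoveNext();
            }
            else
            {
                yield return r.Current; hasR = r.MoveNext();
            }
        }
        while (hasL) { yield return l.Current; hasL = l.MoveNext(); }
        while (hasR) { yield return r.Current; hasR = r.MoveNext(); }
    }
}

Called as Merge(streamA, streamB, StringComparer.Ordinal), this pairs with an "order by name collate Latin1_General_BIN2" clause on the SQL side.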