Incorrect string value: '\xC2\x9Fe 10...' for column


We have an old MySQL 5.1 server running on Server 2003. Recently we moved to a newer environment with MySQL 5.6 and Server 2008. On the new server we keep getting errors when inserting special characters like 'Ã'.
I have checked the source encoding and it is UTF-8. The old MySQL server, however, was configured as latin1 (server / tables / columns) with collation latin1_swedish_ci, and we did not receive any errors in the old environment.
Since we are not yet live on the new environment, I have done some testing. I have tried setting all tables / columns to utf8 as well as to latin1. In both cases I keep getting these errors.
What I noticed is that the old server's default character set is latin1, while on the new server it is utf8. Could that be the problem? I find this very strange because the source is UTF-8.
Is there perhaps some option to handle this that was turned on in the old environment? I'm not sure whether something like that exists. I compared the settings in the MySQL admin tool, and apart from the default character set they look the same.
EDIT:
SHOW VARIABLES LIKE 'char%';
Old server:
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8    | *
| character_set_connection | utf8    | *
| character_set_database   | latin1  |
| character_set_filesystem | binary  |
| character_set_results    | utf8    | *
| character_set_server     | latin1  |
| character_set_system     | utf8    |
+--------------------------+---------+
New Server:
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 | *
| character_set_connection | utf8mb4 | *
| character_set_database   | utf8    |
| character_set_filesystem | binary  |
| character_set_results    | utf8mb4 | *
| character_set_server     | utf8    |
| character_set_system     | utf8    |
+--------------------------+---------+
As far as I understand from the article on the MySQL site, utf8mb4 is a superset of utf8, so this should not create an encoding problem, since the two encode the characters they share identically. Right?

MySQL's old utf8 character set was not real UTF-8: it stores at most three bytes per character. If you try characters that need four bytes (emoji, or some of the rarer Chinese and Japanese characters), you'll probably end up with squares or question marks on your old server.
Your new server is now really using UTF-8 (utf8mb4; mb4 means multi-byte, up to four bytes per character). The server receives UTF-8 characters but evidently cannot store them because your tables are not using UTF-8. Convert the database and all the tables to UTF-8 and that should solve your problem.
You can do this with:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Don't forget to back up first.
Source : https://stackoverflow.com/a/6115705/1980659

First, since the old environment was working correctly, the first choice would be to use the same character set settings in the new environment. If you still have access to the 5.1 server, grab the output of SHOW VARIABLES;.
Your 5.1 server was set up with latin1, while the 5.6 server is set up with utf8. This is mostly visible in
mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8    | *
| character_set_connection | utf8    | *
| character_set_database   | latin1  |
| character_set_filesystem | binary  |
| character_set_results    | utf8    | *
| character_set_server     | latin1  |
| character_set_system     | utf8    |
+--------------------------+---------+
SET NAMES utf8; sets the three flagged lines.
Ã is hex C3 in latin1 and C383 in utf8. Do this to see what is currently in a table:
SELECT col, HEX(col) FROM table WHERE ...
Another possibility is that the "move" mangled the data. If you can do the same SELECT on both machines, and if they come out differently, then the migration was bad. Since there are many ways to move data, please provide the details of the migration so we can dissect what might have gone wrong.
In your title, you have C29F. That is a strange one -- it is a control code APPLICATION PROGRAM COMMAND, which I have never heard of. (Note: It is not related to the Ã you mentioned later.) Please provide more examples of the problems; neither of those clues is helpful.
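The hex values discussed above can be checked outside the database. The following is only a sketch in Python (not part of the original setup). The last step is an assumption made purely for illustration: it shows what happens if the UTF-8 bytes of 'ß' are read as latin1 and then encoded as UTF-8 a second time. Nothing in the question confirms that this is what happened to the data.

```python
# Bytes of 'Ã' (U+00C3) in the two encodings mentioned in the answer.
latin1_hex = "Ã".encode("latin-1").hex().upper()
utf8_hex = "Ã".encode("utf-8").hex().upper()

# The bytes C2 9F from the error message decode to the control character U+009F.
title_char = bytes.fromhex("C29F").decode("utf-8")
title_code_point = f"U+{ord(title_char):04X}"

# Assumption for illustration only: UTF-8 bytes of 'ß' misread as latin1,
# then encoded as UTF-8 again ("double encoding").
double_encoded = "ß".encode("utf-8").decode("latin-1").encode("utf-8")
double_encoded_hex = double_encoded.hex().upper()

print(latin1_hex, utf8_hex)   # hex of 'Ã' in latin1 and in utf8
print(title_code_point)       # code point behind C2 9F
print(double_encoded_hex)     # bytes produced by the hypothetical double encoding
```

Comparing such values with the HEX(col) output from both servers is one way to tell whether the stored bytes or the connection settings differ.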

The significant part of this is that your old server had:
| character_set_database | latin1
while your new server has
| character_set_database | utf8
It does not matter that the connection and client are using utf8: if the database is using latin1, the tables will default to latin1, so the data will be stored in latin1 and you will get your error. You can of course explicitly set the character set and collation of any table to something other than the database default.
I guess that when you migrated the database schema, you did not edit the character encoding of the database or the tables before running the migration script.
Now you can either change the database and each table manually, or edit the migration script and rerun it. Most migration scripts and database dumps include the specific character set for each table as well as for the database, even when they are all the same.

I had a similar experience when moving my application to a new environment. I got a strange error when inserting data into a table; in my case it complained that a date was empty and so could not be inserted. The source code had not changed, only the environment (MySQL server from 5.1 to 5.6, Tomcat 6 to Tomcat 7, a new SUSE server version).
I replaced the MySQL connector driver with a newer version for my application, and that resolved the issue.

Related

Extending BSON `$type` attribute for complex object?

I'm trying to store objects in MongoDB. These objects come from a third-party system and have a very specific format, i.e. all object properties are stored in a dictionary. Values in this dictionary can be of different types and in no particular order.
I believe that to search on these fields effectively I need to turn them into BSON properties. That is doable with a custom serializer / deserializer, until it comes to deserialization itself. If a property is a complex object represented as a BSON document, the custom deserializer doesn't know which type this document should be transformed into.
How are issues like that solved properly using MongoDB BSON?
I would add a new property $type to the complex document and store the destination type there during serialization, but that interferes with the built-in MongoDB $type property.
Is it possible to use standard and custom $type attributes side by side? What's the best-practice approach for implementing a custom deserializer in this case?

Not without extending the spec itself or including in the document some reference to how it should be (de)serialized.
The PHP driver has an ODM framework that does exactly what you're proposing. I suggest you look at http://php.net/manual/en/class.mongodb-bson-persistable.php
"During serialization, the driver will inject a __pclass property containing the PHP class name into the data"
So, it adds a specific key "__pclass" to the document to be stored. During deserialization, the driver reads that key to decide which deserialization steps to take, and strips the __pclass key/value before it returns the document (now deserialized into whatever PHP class is specified by the __pclass key) to the user.
This is incredibly dangerous if you have any reason to not trust the data held in mongodb. It's basically allowing data to dictate a call to executable PHP code.
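A minimal sketch of the same idea with the risk reduced: the type tag stored in the document is looked up in a fixed registry instead of being used directly as a class name. This is written in Python for brevity and is not the PHP driver's implementation. The key name `_t`, the two classes, and the registry are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, asdict

# Hypothetical discriminator key; it deliberately avoids a "$" prefix.
TYPE_KEY = "_t"


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Money:
    amount: int
    currency: str


# Whitelist of types that may be reconstructed from stored data.
REGISTRY = {"address": Address, "money": Money}
TAGS = {cls: tag for tag, cls in REGISTRY.items()}


def to_document(obj):
    """Turn a registered object into a plain dict with a type tag."""
    doc = asdict(obj)
    doc[TYPE_KEY] = TAGS[type(obj)]
    return doc


def from_document(doc):
    """Rebuild an object from a tagged dict; unknown tags are rejected."""
    fields = dict(doc)
    tag = fields.pop(TYPE_KEY)
    if tag not in REGISTRY:
        raise ValueError(f"unknown type tag: {tag!r}")
    return REGISTRY[tag](**fields)


stored = to_document(Address("Main St", "Oslo"))
restored = from_document(stored)
```

Because only tags present in the registry are accepted, stored data can select among known types but cannot name arbitrary code to run.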
About the spec itself.
http://bsonspec.org/spec.html
The types and their associated type indexes are hard-coded into the spec.
element ::= "\x01" e_name double            64-bit binary floating point
          | "\x02" e_name string            UTF-8 string
          | "\x03" e_name document          Embedded document
          | "\x04" e_name document          Array
          | "\x05" e_name binary            Binary data
          | "\x06" e_name                   Undefined (value) - Deprecated
          | "\x07" e_name (byte*12)         ObjectId
          | "\x08" e_name "\x00"            Boolean "false"
          | "\x08" e_name "\x01"            Boolean "true"
          | "\x09" e_name int64             UTC datetime
          | "\x0A" e_name                   Null value
          | "\x0B" e_name cstring cstring   Regular expression - The first cstring is the regex pattern, the second is the regex options string. Options are identified by characters, which must be stored in alphabetical order. Valid options are 'i' for case insensitive matching, 'm' for multiline matching, 'x' for verbose mode, 'l' to make \w, \W, etc. locale dependent, 's' for dotall mode ('.' matches everything), and 'u' to make \w, \W, etc. match unicode.
          | "\x0C" e_name string (byte*12)  DBPointer - Deprecated
          | "\x0D" e_name string            JavaScript code
          | "\x0E" e_name string            Symbol. Deprecated
          | "\x0F" e_name code_w_s          JavaScript code w/ scope
          | "\x10" e_name int32             32-bit integer
          | "\x11" e_name uint64            Timestamp
          | "\x12" e_name int64             64-bit integer
          | "\x13" e_name decimal128        128-bit decimal floating point
          | "\xFF" e_name                   Min key
          | "\x7F" e_name                   Max key
You could create your own user-defined binary subtype if you stored the blob in a binary block, using the user-defined subtype range.
binary  ::= int32 **subtype** (byte*)   Binary - The int32 is the number of bytes in the (byte*).
subtype ::= "\x00"                      Generic binary subtype
          | "\x01"                      Function
          | "\x02"                      Binary (Old)
          | "\x03"                      UUID (Old)
          | "\x04"                      UUID
          | "\x05"                      MD5
          | **"\x80"                    User defined**
The downside is that the object would be stored in the database as a binary blob, making it very difficult to query beyond subtype checking.
Anything beyond that would involve extending the specification itself.
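To make the byte layout from the grammar above concrete, here is a small hand-rolled sketch in Python that encodes a single binary element with the user-defined subtype "\x80" and wraps it in a document. It follows the quoted grammar directly and does not use any MongoDB driver; the function names and the field name `blob` are made up for this example.

```python
import struct

USER_DEFINED_SUBTYPE = 0x80


def encode_binary_element(name, payload, subtype=USER_DEFINED_SUBTYPE):
    # element ::= "\x05" e_name binary
    e_name = name.encode("utf-8") + b"\x00"  # e_name is a NUL-terminated cstring
    # binary ::= int32 subtype (byte*), with the int32 in little-endian order
    value = struct.pack("<i", len(payload)) + bytes([subtype]) + payload
    return b"\x05" + e_name + value


def encode_document(elements):
    # document ::= int32 e_list "\x00"; the int32 counts the whole document
    body = b"".join(elements) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


element = encode_binary_element("blob", b"\x01\x02")
document = encode_document([element])
print(document.hex())
```

The payload itself stays opaque to the server, which is the querying limitation described above.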

Run a string as an instruction to build an email

I have a table in which I save an ID and a rule like:
| ID | Rule |
|------|--------------------------------------|
| 1 | firstname[0]+'.'+lastname+'#'+domain |
| 2 | firstname+'_'+lastname+'#'+domain |
| 3 | lastname[0]+firstname+'#'+domain |
My problem is: How can I get and analyze/execute that rule in my code? Because the cell is taken as a string and I don't know how to apply that rule to my variables or my code.
I was thinking about String.Format, but I don't know how to split a string taking just the first character with it.
If you could give me an advice or any better way to do this, I'd appreciate that because I'm completely lost.

If that is C#, you could construct a LINQ expression from the parse tree produced by, for example, ANTLR, or, if the format is very simple, use a regex.
You have to take these steps:
Evaluate the incoming string using ANTLR. You could start off with the C# grammar;
Build an expression from it;
Run the expression giving the firstname, domain, etc. parameters.
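For the "very simple format" case, the steps above can be sketched without a parser generator. The following Python sketch is an illustration rather than the ANTLR/LINQ approach itself: it tokenizes a rule with a regular expression and supports only quoted literals, variable names, and a single `[index]`, joined by `+`. The function name `apply_rule` and the sample values are assumptions.

```python
import re

# One term: either a quoted literal, or a name with an optional [index].
TERM = re.compile(r"\s*(?:'([^']*)'|([A-Za-z_]\w*)(?:\[(\d+)\])?)\s*")


def apply_rule(rule, values):
    """Evaluate a rule such as firstname[0]+'.'+lastname against values."""
    parts = []
    pos = 0
    while True:
        match = TERM.match(rule, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}: {rule[pos:]!r}")
        literal, name, index = match.groups()
        if name is None:
            parts.append(literal)
        else:
            value = values[name]
            parts.append(value[int(index)] if index is not None else value)
        pos = match.end()
        if pos == len(rule):
            return "".join(parts)
        if rule[pos] != "+":
            raise ValueError(f"expected '+' at position {pos}")
        pos += 1


person = {"firstname": "john", "lastname": "doe", "domain": "example.com"}
address = apply_rule("firstname[0]+'.'+lastname+'@'+domain", person)
print(address)
```

Because the rule is never passed to a general-purpose evaluator, a malformed or malicious rule can only raise an error, not run arbitrary code.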

Not sure this would do the trick, but you might want to look at CSharpCodeProvider. I have never used it, but according to the examples it seems to be capable of compiling code entered in a textbox.
The catch is that this solution generates an exe file that is stored in your project folder. Even if you delete the file after a successful compilation, that might not be the best option.

Generate list string based on replacement possibility

I'm doing a basic CSV import/export in C#. Most of it is really simple; we just have one special requirement.
Among the values we import/export there are some special values that are not ASCII. To ease the work of our end users, the customer decided to convert some values into other values on export and do the opposite on import.
Some examples
Value in our application | Values that must be accepted on parsing
-------------------------+----------------------------------------
³                        | 3, ^3, **3
μ                        | u
₃                        | 3
⁹                        | 9
°                        | deg
φ                        | phi
Exporting is easy: we replace each matching character with the first value in the second column.
Parsing is more complicated, and I don't see an easy way to get all the possible values to import.
One example:
H³ 3° (asd)₃
Would be exported as
H3 3deg (asd)3
So to parse this correctly, I have to consider all the possibilities:
H3 3deg (asd)3 // This may be a real value
H³ 3deg (asd)3
H₃ 3deg (asd)3
H3 ³deg (asd)3
....
What would be the good way of doing this?

I doubt it's possible with such an encoding. All H3 values are equally likely unless there is a rule that differentiates them. This makes parsing more difficult, not less.
What you are trying to do though looks a lot like what has already been done with tools like Latex or even Word. You should probably use the encodings used by Latex since they've already done the work of encoding symbols to human readable and editable keywords that can be parsed easily, eg: use ^ for power, _ for indices, \degree for degrees, etc.
In fact, even Word allows these same keywords nowadays in the Math editor, allowing you to type \sum and get ∑, or \oint for ∮
You should probably tag the fields that contain substitutions, eg by surrounding them in multiple braces, so that users can use the keywords in their own text.

I think you need to exclude ambiguous mappings. E.g.:
³ | ^3, **3
₃ | 3
⁹ | ^9, **9
or
³ | 3, ^3, **3
₃ | _3
⁹ | 9
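A short sketch of how an unambiguous mapping makes the import a plain substitution. It is written in Python for brevity, and the spellings chosen here (`^3`, `**3`, `_3`, `^9`, `**9`) are just one possible combination of the suggestions above, picked so that no spelling is a bare digit. The function names are hypothetical.

```python
import re

# Each special character lists its accepted ASCII spellings;
# the first spelling is the one written on export.
MAPPING = {
    "³": ["^3", "**3"],
    "₃": ["_3"],
    "⁹": ["^9", "**9"],
}

# Reverse lookup: ASCII spelling -> special character.
REVERSE = {
    spelling: char
    for char, spellings in MAPPING.items()
    for spelling in spellings
}

# Try longer spellings first so "**3" is not cut short by a shorter one.
PATTERN = re.compile(
    "|".join(re.escape(s) for s in sorted(REVERSE, key=len, reverse=True))
)


def export_value(text):
    for char, spellings in MAPPING.items():
        text = text.replace(char, spellings[0])
    return text


def import_value(text):
    return PATTERN.sub(lambda m: REVERSE[m.group(0)], text)


exported = export_value("H³ 3 (asd)₃")
imported = import_value(exported)
print(exported, "->", imported)
```

A plain `3` in the input is left alone, so the round trip no longer has to guess. The price is that a literal `^3` or `_3` in user text would be converted, which is why escaping or tagging rules may still be needed.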

ASCII uses 7 bits per character. You want to use characters that lie outside that range and need a wider encoding (UTF-8, for example).
By converting a UTF-8 character to ASCII you lose information, yet you want to get the full information back.
To manage this, you need a mask that helps to recognize the right character.
You could use existing special-character notations as your mask. That way you don't reinvent the wheel, and others can find documentation for your interface all over the internet.
But if you map ³ => 3, you lose information (superscript 3 => 3; where did the superscript go, and how should you guess the right choice?).

SpecFlow: Scenario Outline Examples

I am just starting to work with SpecFlow and really like the tool. However, I am running into some issues with example data inputs in Scenario Outlines.
Just wondering if what I am facing is normal or whether there is a trick to it.
I am using C# Visual Studio 2013 and writing an MVC App using the underscore style of step definition. I have also tried the regular expression style but still get similar issues.
So the issue is I am providing username, password etc as parameters and including sample data in my Examples. It appears that the following occurs: -
I have to put "" around the parameters when 1st generating the scenario, otherwise it does not get picked up as a parameter at all. However when passing data in from the examples I get a "/" at the end of the data passed in. When I go back to the scenario I then remove the "" around the parameter. This is a little frustrating but if that is the best way to handle it I can live with that. Just wondering if anyone has any advice on this point.
The next issue is related to the data itself. It appears that if I have any characters such as @ or & in my data, the data is split at that point and the remainder is fed to the next parameter, so incorrect data is passed through.
I have included my code below. If anyone has any suggestions or resources to look at, that would be appreciated.
Feature File
Feature: AccountRegistration
    In order to use Mojito services in my organisation
    As a guest user
    I want to create an account with administration privileges

    Scenario Outline: Register with valid details
        Given I am on the registration page
        And I have completed the form with <email> <organisation> <password> and <passwordConfirmation>
        When I have clicked on the register button
        Then I will be logged in as <username>
        And my account will be assigned the role of <role>

        Examples:
            | email     | organisation | password  | passwordConfirmation | username  | role  |
            | usernamea | Bytes        | password1 | password1            | usernamea | Admin |
            | usernameb | Bytes        | password2 | password2            | usernameb | Admin |
            | usernamec | Bytes        | password3 | password3            | usernamec | Admin |
            | usernamed | Bytes        | password4 | password4            | usernamed | Admin |
            | usernamee | Bytes        | password5 | password5            | usernamee | Admin |

    Scenario Outline: Register with invalid details
        Given I am on the registration page
        And I have completed the form with <email> <organisation> <password> and <passwordConfirmation>
        When I have clicked on the register button
        Then I will get an error message

        Examples:
            | email             | organisation    | password   | passwordConfirmation |
            | Jonesa@mojito.com | Bytes           | 1LTIuta&Sc | wrongpassword        |
            | Jonesb@mojito.com | Bytes           | 1LTIuta&Sc | 1LTIuta&Sc           |
            | Jonesc@mojito.com | No Organisation | 1LTIuta&Sc | 1LTIuta&Sc           |
Steps Generated File
[Binding]
public class AccountRegistrationSteps
{
    [Given]
    public void Given_I_am_on_the_registration_page()
    {
        ScenarioContext.Current.Pending();
    }

    [Given]
    public void Given_I_have_completed_the_form_with_usernamea_Bytes_password_P0_and_password_P1(int p0, int p1)
    {
        ScenarioContext.Current.Pending();
    }

    [Given]
    public void Given_I_have_completed_the_form_with_Jonesa_mojito_com_Bytes_P0_LTIuta_Sc_and_wrongpassword(int p0)
    {
        ScenarioContext.Current.Pending();
    }

    [When]
    public void When_I_have_clicked_on_the_register_button()
    {
        ScenarioContext.Current.Pending();
    }

    [Then]
    public void Then_I_will_be_logged_in_as_usernamea()
    {
        ScenarioContext.Current.Pending();
    }

    [Then]
    public void Then_my_account_will_be_assigned_the_role_of_Admin()
    {
        ScenarioContext.Current.Pending();
    }

    [Then]
    public void Then_I_will_get_an_error_message()
    {
        ScenarioContext.Current.Pending();
    }
}

SpecFlow does handle string parameters by default; the problem is that you left it up to SpecFlow to determine at runtime what your values are.
When you ran "Generate Step Definitions," you selected "Method name - underscores" in the Style dropdown. This left interpreting the input parameters up to SpecFlow, which will create what are called 'greedy' regular expressions to identify the parameter values. This means that it would include the comma as part of the value.
Had you selected "Regular expressions in attributes," (or refactored your code a touch and decorated your attributes by hand) your step could look like this:
[Given(@"I have completed the form with (.*), (.*), (.*), and (.*)")]
public void Given_I_have_completed_the_form_with(string email, string org, string pwd, string conf)
{
    // do stuff here
}
This creates a more 'parsimonious' expression that tells SpecFlow to accept strings of any length, up to but not including any trailing commas. Single quotes around the regular expressions would make it even more explicit:
[Given(@"I have completed the form with '(.*)', '(.*)', '(.*)', and '(.*)'")]
Managing the regular expressions yourself can create headaches, but it really exposes the full power of SpecFlow if you do so.
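The effect of the two patterns can be tried outside SpecFlow. This sketch uses Python's `re` module, which treats these simple patterns the same way as far as I know. It assumes the whole step text must match the pattern, and the organisation value "Bytes, Ltd" is a made-up example containing a comma.

```python
import re

UNQUOTED = re.compile(r"I have completed the form with (.*), (.*), (.*), and (.*)")
QUOTED = re.compile(r"I have completed the form with '(.*)', '(.*)', '(.*)', and '(.*)'")

# Characters such as @ and & do not disturb the match.
plain_step = "I have completed the form with Jonesa@mojito.com, Bytes, 1LTIuta&Sc, and wrongpassword"
plain_groups = UNQUOTED.fullmatch(plain_step).groups()

# A comma inside a value shifts the unquoted groups...
comma_step = "I have completed the form with a@b.com, Bytes, Ltd, pw, and pw"
shifted_groups = UNQUOTED.fullmatch(comma_step).groups()

# ...while the quoted pattern keeps the value together.
quoted_step = "I have completed the form with 'a@b.com', 'Bytes, Ltd', 'pw', and 'pw'"
quoted_groups = QUOTED.fullmatch(quoted_step).groups()

print(plain_groups)
print(shifted_groups)
print(quoted_groups)
```

The quoted form costs a little readability in the feature file but makes the boundaries of each value explicit.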

RESOLVED - It was not an issue with the use of characters such as @ or &. The actual cause was the commas in my Given statement. I found that if I used 'and' instead, it works. To get it working, the statement had to be written as below:
SOLUTION
Write the statement as
Given I have completed the form with <email> and <organisation> and <password> and <passwordConfirmation>
Modify the statement to put single quotes around parameters that need to be strings
Given I have completed the form with '<email>' and '<organisation>' and '<password>' and '<passwordConfirmation>'
Generate the step definitions and then change the statement back to exclude the single quotes
Given I have completed the form with <email> and <organisation> and <password> and <passwordConfirmation>
A bit of mucking around, but it gets the correct results. Hopefully in the future SpecFlow will be updated to handle parameters as strings by default.

For future reference, if Cater's answer doesn't do the job: I had the following problem.
Given I have a <typeOfDay> day
When I'm asked how I am
Then I will say <feeling>
Examples:
| typeOfDay | feeeling |
| good      | happy    |
| bad       | sad      |
You'll notice that "feeling" in the Then statement won't be able to find a corresponding value because of a typo. This causes SpecFlow to throw the "Input string not in a correct format" error. Which in my case took an embarrassingly long time to find.
Something else to check :)
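A check like this can be automated. The following Python sketch is a hypothetical helper, not a SpecFlow feature: it collects the `<placeholder>` names used in the steps and reports any that have no matching column in the Examples header.

```python
import re


def missing_placeholders(steps, header_row):
    """Return placeholder names used in steps but absent from the header row."""
    headers = {cell.strip() for cell in header_row.strip().strip("|").split("|")}
    used = set()
    for step in steps:
        used.update(re.findall(r"<([^<>]+)>", step))
    return sorted(used - headers)


steps = [
    "Given I have a <typeOfDay> day",
    "When I'm asked how I am",
    "Then I will say <feeling>",
]
missing = missing_placeholders(steps, "|typeOfDay|feeeling|")
print(missing)
```

An empty result means every placeholder has a column; anything listed is a likely typo in either the step or the table header.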

How can I prevent ReSharper from inserting a blank line between my MSpec fields?

When I write an MSpec context like this:
[Subject(typeof(TheType), "Concern")]
internal class when_this_test_is_run
{
    Establish context = () =>
    {
        // some code...
    };
    Because of = () => Do.Something();
    It should_do_this;
    It should_do_that;
}
When I let ReSharper reformat the code, it always inserts a blank line beneath any delegate that is an anonymous method, i.e. has a { block } as its body. It doesn't insert blank lines after delegates that are simple expressions. So in the example above, the Establish context delegate gets a blank line, but the Because of and the It delegates do not.
This is driving me crazy as I don't want it to insert the blank lines, but I can't figure out what setting I need to change to stop it happening.
Any ideas?

Try this:
ReSharper | Options | Code Editing | C# | Formatting Style | Blank Lines | Preserve existing formatting | Keep max blank lines in declaration | Select '0'
ReSharper | Options | Code Editing | C# | Formatting Style | Blank Lines | Blank lines | Around field | Uncheck
