Random "Unable to connect" errors from Azure to on-premises MySQL - C#

We use several services in the Azure cloud (App Services and Function Apps) to connect to an on-premises MySQL database running on Ubuntu. This works well most of the time, but occasionally, and seemingly at random, we receive the following error.
Detail:
The connection works fine most of the time and everything seems to be configured correctly. Only occasionally, e.g. on every 27th request, does a request run into an error. I suspect the error message we receive may be misleading.
Example extract from our log:
2018-02-21 15:53:55.647 +00:00 [Error] Microsoft.EntityFrameworkCore.Database.Connection: An error occurred using the connection to database 'XXXXXXX' on server 'XXX.XXX.XXX.XXX'.
MySql.Data.MySqlClient.MySqlException (0x80004005): Unable to connect to any of the specified MySQL hosts.
at MySqlConnector.Core.ServerSession.<ConnectAsync>d__56.MoveNext() in C:\projects\mysqlconnector\src\MySqlConnector\Core\ServerSession.cs:line 212
We are connecting to this database from different sites, but this error only appears from the Azure Cloud.
We double-checked the connection string, firewall settings and outgoing IPs several times.
Connection String:
Persist Security Info=True;server=xxx.xxx.xxx.xxx;database=xxxxxx;uid=xxxxx;password=xxxxx;port=3306;
On some days we don't receive this error at all, but on other days it occurs many times, completely at random. It would be great to get help identifying and solving the problem soon. Thanks a lot!

It is hard to help without a code example.
You can only check the common things:
- Check that the connection string follows the usual format:
"Server=TheServerAddress; Port=YourPortNumber; Database=YourDatabase; Uid=YourUsername; Pwd=YourPassword;"
- Add the default port (e.g. 3306) to the connection string.
- Check whether the query code is async or not, and whether it is the same as in your other applications.
You can also try adding retry logic:
https://blogs.msdn.microsoft.com/bartr/2010/06/20/sql-azure-connection-retry-update/
http://peterkellner.net/2011/01/21/sqlazure-connection-retry-problem-using-best-practices/
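The retry suggestion above could be sketched roughly as follows. This is only a minimal sketch and not code from the linked posts: `RetrySketch`, `ExecuteAsync`, the attempt count and the delays are hypothetical names and values chosen for illustration, and the caller decides which exceptions count as transient.

```csharp
using System;
using System.Threading.Tasks;

public static class RetrySketch
{
    // Hypothetical helper: runs `operation` and retries it when `isTransient`
    // accepts the thrown exception, doubling the delay after each failure.
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,
        TimeSpan? initialDelay = null)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        // Assumed starting delay; tune it for your environment.
        var delay = initialDelay ?? TimeSpan.FromMilliseconds(200);

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation().ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Transient failure with attempts left: wait, then try again.
                await Task.Delay(delay).ConfigureAwait(false);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
            // On the last attempt, or for non-transient errors, the exception propagates.
        }
    }
}
```

In the scenario from the question, `isTransient` might be something like `ex => ex is MySqlException`, with the database call wrapped in `operation`. Each retry should open a fresh connection rather than reuse the failed one.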

Related

Fatal NServiceBus exception when starting endpoint, cannot open database

We just made some changes to our project and deployed it. We have 7 endpoint databases in our SQL Server database for NServiceBus. We recently added 2 more. One of the two services using one of the two new endpoints fails to start. When looking at our log file we see the following error:
FATAL NServiceBus.GenericHost [(null)] - Exception when starting endpoint.
System.Data.SqlClient.SqlException (0x80131904): Cannot open database "A_NSBEP_WebEventProcessor" requested by the login. The login failed. Login failed for user 'xxxx'
The connection string is
'connection.connection_string'='Data Source=localhost\DevSQLServer;Initial Catalog=A_NSBEP_WebEventProcessor;Integrated Security=true;Enlist=true'
All of the connection strings are configured in OctopusDeploy. They are all identical except obviously for the database name. All of the other services start and login to their respective endpoint with no problem. I even went through the process of creating a .udl file, connecting to the database and using that slightly different connection string. Still without success.
Any thoughts?

.NET/C# to MySql running on linux - exception on first command, but subsequent commands do work

Have a really crazy situation. I can't post specifics, so I'm just looking for general guidance. We have already opened a ticket with Oracle/MySql support. I'm just looking to see if anyone else has run into this situation or anything similar. Here is our scenario:
Windows 2012 R2 Server with .NET 4.7.1 running.
Simple Windows Forms .NET application.
We are trying to run a simple query against a Linux MySql Server. MySql is Enterprise Version 5.7.x.
On the first attempted connection, the Windows Forms app locks the UI, waits about 15 seconds, and then reports back that there is an error running the command. The error is shown below.
System.ApplicationException: An exception occurred on the following sql command:select * from tablename where compl_date >= '2019-12-17 04:44:34 PM' ---> MySql.Data.MySqlClient.MySqlException: Authentication to host 'ip address' for user 'userid' using method 'mysql_native_password' failed with message: Reading from the stream has failed. ---> MySql.Data.MySqlClient.MySqlException: Reading from the stream has failed. ---> System.IO.EndOfStreamException: Attempted to read past the end of the stream.
When this error pops up, if I click on the "Continue" button, subsequent calls to the database work as intended (at about a 95% rate).
On the server, the mysqld error logs are shown below for the first call. Subsequent calls do work.
2019-12-16T22:06:29.554171Z 3496 [Warning] IP address 'client ip address' could not be resolved: Name or service not known
2019-12-16T22:06:50.188443Z 3496 [Note] Aborted connection 3496 to db: 'drupaldb' user: 'userid' host: 'ip address' (Got an error reading communication packets)
2019-12-17T02:53:17.832725Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 11355ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)
2019-12-17T03:25:18.200855Z 3527 [Note] Got an error reading communication packets
2019-12-17T03:25:37.167395Z 3528 [Note] Got packets out of order
2019-12-17T03:25:37.382512Z 3529 [Note] Got packets out of order
2019-12-17T03:25:47.688836Z 3530 [Note] Bad handshake
2019-12-17T14:26:33.619967Z 4803 [Note] Got timeout reading communication packets
2019-12-17T19:34:34.741441Z 4851 [Note] Got timeout reading communication packets
2019-12-17T19:47:47.595426Z 4853 [Note] Got timeout reading communication packets
2019-12-17T19:48:45.586357Z 4854 [Note] Got timeout reading communication packets
If you have some general ideas, let me know.
FYI, we have some other linux/mysql instances, and this runs just fine.
At this point, we think we have solved the problem, at least for the short term. Both server and client are sitting on a private network. We think that the database server is trying to send a certificate to the windows client. The windows client is also on this private network. We think the Windows Client is not accepting the ssl certificate and that this is causing the failure on the first connection attempt. By adding the option "SslMode=None", this seems to resolve the issue.
Blog post we found that helped us: https://blog.csdn.net/fancyf/article/details/78295964
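As a rough illustration of the workaround described above, the option can be appended to an existing connection string with the framework's generic `DbConnectionStringBuilder`. This is a minimal sketch: `SslModeSketch` and `WithSslDisabled` are hypothetical names, and whether `SslMode=None` is accepted depends on the MySQL connector and version in use. Note that this turns off encryption, so it only makes sense on a trusted private network like the one described.

```csharp
using System;
using System.Data.Common;

public static class SslModeSketch
{
    // Returns a copy of the connection string with SslMode=None added,
    // overwriting any SslMode value that was already present.
    public static string WithSslDisabled(string connectionString)
    {
        var builder = new DbConnectionStringBuilder { ConnectionString = connectionString };
        builder["SslMode"] = "None";
        return builder.ConnectionString;
    }
}
```

Usage would be along the lines of `new MySqlConnection(SslModeSketch.WithSslDisabled(existingConnectionString))`, where `existingConnectionString` stands for whatever string the application already uses.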

IIS can't get access to database of context

I am having trouble with a site running under IIS. When I run my tests, or just connect to my DB in SSMS, everything is fine. But when I open my site I get the error:
provider: SQL Network Interfaces, error: 50 - Local Database Runtime error occurred.
(Values are replaced with ** for security.)
This is the connection string I found while debugging the code:
Data Source=(LocalDB)\MSSQLLocalDB;integrated security=false;Initial Catalog=**;User ID=**;Password=**;Connection Timeout=60;MultipleActiveResultSets=True;Max Pool Size=1024
Maybe someone can help me. I don't know how to fix this, because it is a really strange bug: all the tests work, and they use the same connection string.

MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server

I am using MongoDB on an Azure VM, and I have a website hosted on Azure Web Sites as a service.
My problem is that sometimes I get an error like this:
"Exception: MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond"
After I get this error, I simply access the endpoint again and it succeeds immediately.
Details:
ConnectionString: mongodb://xxx.aaa.net:1000, xxx.aaa.net:1001, xxx.aaa.net:1002/?readPreference=nearest
Before opening the connection to Mongo, I set the MaxConnectionIdleTime property like this: "MongoDefaults.MaxConnectionIdleTime = TimeSpan.FromSeconds(30);". I do this to keep the connection alive, because the Azure load balancer times out idle connections after 4 minutes by default.
So, I don't know what's going on.
Can anybody help me?
Yes, 4 minutes. Azure closes a connection if it stays idle for 4 minutes. But the MongoDB driver doesn't know this, so it will still take that connection from the connection pool.
You can either set MaxConnectionIdleTime (in the MongoDB driver settings) to less than 4 minutes, or increase the maximum session idle time of the Azure VM via PowerShell (30 minutes max).
BTW, if your website, which is also hosted on Azure, accesses the MongoDB server via the private IP (PIP), this problem won't occur.
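The first option could look roughly like the following with the MongoDB C# driver. This is a minimal sketch under assumptions: `MongoIdleSketch` and `Build` are hypothetical names, and the 2-minute value is only an example of something below the 4-minute Azure timeout. It sets the idle time per client through `MongoClientSettings` rather than through the global `MongoDefaults` used in the question.

```csharp
using System;
using MongoDB.Driver;

public static class MongoIdleSketch
{
    // Builds client settings whose pooled connections are discarded after
    // 2 minutes of idleness, i.e. before the 4-minute Azure idle timeout.
    public static MongoClientSettings Build(string connectionString)
    {
        var settings = MongoClientSettings.FromUrl(new MongoUrl(connectionString));
        settings.MaxConnectionIdleTime = TimeSpan.FromMinutes(2); // assumed example value
        return settings;
    }
}
```

The client would then be created with `new MongoClient(MongoIdleSketch.Build(connectionString))` and reused for the lifetime of the application.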
As I asked #apodemakels for some official documentation, I have found the MongoDB Production Notes, which state:
The TCP idle timeout on the Azure load balancer is 240 seconds by default, which can cause it to silently drop connections if the TCP keepalive on your Azure systems is greater than this value. You should set tcp_keepalive_time to 120 to ameliorate this problem.
In addition, there's a ticket on MongoDB Jira with regard to this issue.
I hope both these documents may help with this or in case someone faces a similar situation.

How do you deal with transport-level errors in SqlConnection?

Every now and then in a high volume .NET application, you might see this exception when you try to execute a query:
System.Data.SqlClient.SqlException: A transport-level error has occurred when sending the request to the server.
According to my research, this is something that "just happens" and not much can be done to prevent it. It does not happen as a result of a bad query, and generally cannot be duplicated. It just crops up maybe once every few days in a busy OLTP system when the TCP connection to the database goes bad for some reason.
I am forced to detect this error by parsing the exception message, and then retrying the entire operation from scratch, to include using a new connection. None of that is pretty.
Anybody have any alternate solutions?
I posted an answer on another question on another topic that might have some use here. That answer involved SMB connections, not SQL. However it was identical in that it involved a low-level transport error.
What we found was that in a heavy load situation, it was fairly easy for the remote server to time out connections at the TCP layer simply because the server was busy. Part of the reason was the defaults for how many times TCP will retransmit data on Windows weren't appropriate for our situation.
Take a look at the registry settings for tuning TCP/IP on Windows. In particular you want to look at TcpMaxDataRetransmissions and maybe TcpMaxConnectRetransmissions. These default to 5 and 2 respectively, try upping them a little bit on the client system and duplicate the load situation.
Don't go crazy! TCP doubles the timeout with each successive retransmission, so the timeout behavior for bad connections can go exponential on you if you increase these too much. As I recall upping TcpMaxDataRetransmissions to 6 or 7 solved our problem in the vast majority of cases.
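To see why small increases matter, the doubling described above can be worked through numerically. This is a simplified sketch: `RetransmitSketch` and `TotalTimeout` are hypothetical names, and the 1-second initial timeout used in the example is an assumption, since a real TCP stack derives its timeout from the measured round-trip time and may cap it.

```csharp
using System;

public static class RetransmitSketch
{
    // Total time spent before the connection is given up: the wait after the
    // original send plus one wait per retransmission, each twice the previous one.
    public static TimeSpan TotalTimeout(TimeSpan initialTimeout, int maxRetransmissions)
    {
        var total = TimeSpan.Zero;
        var wait = initialTimeout;
        for (var i = 0; i <= maxRetransmissions; i++)
        {
            total += wait;
            wait = TimeSpan.FromTicks(wait.Ticks * 2);
        }
        return total;
    }
}
```

With an assumed 1-second initial timeout, 5 retransmissions add up to 63 seconds (1 + 2 + 4 + 8 + 16 + 32), while 7 retransmissions add up to 255 seconds, so two extra retransmissions roughly quadruple the time a dead connection can hang.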
This blog post by Michael Aspengren explains the error message "A transport-level error has occurred when sending the request to the server."
To answer your original question:
A more elegant way to detect this particular error, without parsing the error message, is to inspect the Number property of the SqlException.
(This actually returns the error number from the first SqlError in the Errors collection, but in your case the transport error should be the only one in the collection.)
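A check on the error number might be sketched like this. The numbers below are an assumed, non-exhaustive list of codes commonly seen with transport-level failures, and `TransportErrorSketch` and `IsTransportError` are hypothetical names. Log the `Number` values that actually occur in your system and adjust the list accordingly.

```csharp
using System;
using System.Collections.Generic;

public static class TransportErrorSketch
{
    // Assumed list of SqlException.Number values for transport-level failures.
    private static readonly HashSet<int> TransportErrorNumbers = new HashSet<int>
    {
        10053, // connection aborted by software on the local host
        10054, // existing connection forcibly closed by the remote host
        10060, // connection attempt timed out
        121,   // the semaphore timeout period has expired
    };

    // True when the given error number is in the assumed transport-error list.
    public static bool IsTransportError(int sqlErrorNumber)
    {
        return TransportErrorNumbers.Contains(sqlErrorNumber);
    }
}
```

It could then be used as an exception filter, for example `catch (SqlException ex) when (TransportErrorSketch.IsTransportError(ex.Number))`, before retrying the operation on a new connection.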
I had the same problem albeit it was with service requests to a SQL DB.
This is what I had in my service error log:
System.Data.SqlClient.SqlException: A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
I have a C# test suite that tests a service. The service and DB were both on external servers so I thought that might be the issue. So I deployed the service and DB locally to no avail. The issue continued. The test suite isn't even a hard pressing performance test at all, so I had no idea what was happening. The same test was failing each time, but when I disabled that test, another one would fail continuously.
I tried other methods suggested on the Internet that didn't work either:
- Increase the registry values of TcpMaxDataRetransmissions and TcpMaxConnectRetransmissions.
- Disable the "Shared Memory" option within SQL Server Configuration Manager under "Client Protocols" and sort TCP/IP to 1st in the list.
- This might occur when you are testing scalability with a large number of client connection attempts. To resolve this issue, use the regedit.exe utility to add a new DWORD value named SynAttackProtect to the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ with value data of 00000000.
My last resort was the age-old saying "Try and try again". So I have nested try-catch statements to ensure that if the TCP/IP connection is lost in the lower communications protocol, the code doesn't just give up there but tries again. This is now working for me, though it's not a very elegant solution.
Use Enterprise Services with transactional components.
I have seen this happen in my own environment a number of times. The client application in this case is installed on many machines. Some of those machines happen to be laptops, and people were leaving the application open, disconnecting the laptop, then plugging it back in and attempting to use the application. This will then cause the error you have mentioned.
My first point would be to look at the network and ensure that servers aren't on DHCP and renewing IP addresses, causing this error. If that isn't the case, then you have to start trawling through your event logs looking for other network-related errors.
Unfortunately it is as stated above a network error. The main thing you can do is just monitor the connections using a tool like netmon and work back from there.
Good Luck.
You should also check hardware connectivity to the database.
Perhaps this thread will be helpful:
http://channel9.msdn.com/forums/TechOff/234271-Conenction-forcibly-closed-SQL-2005/
I'm using a reliability layer around my DB commands (abstracted away in the repository interface). Basically that's just code that intercepts any expected exception (DbException and also InvalidOperationException, which happens to get thrown on connectivity issues), logs it, captures statistics and retries everything again.
With that reliability layer present, the service has been able to survive stress-testing gracefully (constant dead-locks, network failures etc). Production is far less hostile than that.
PS: There is more on that here (along with a simple way to define reliability with the interception DSL)
I had the same problem. I asked my network geek friends, and all said what people have replied here: it's the connection between the computer and the database server. In my case it was my Internet Service Provider, or their router, that was the problem. After a router update, the problem went away. But do you have any other drop-outs of internet connection from your computer or server? I had...
I experienced the transport error this morning in SSMS while connected to SQL 2008 R2 Express.
I was trying to import a CSV with \r\n. I coded my row terminator for 0x0d0x0a. When I changed it to 0x0a, the error stopped. I can change it back and forth and watch it happen/not happen.
BULK INSERT #t1 FROM 'C:\123\Import123.csv' WITH
( FIRSTROW = 1, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0d0x0a' )
I suspect I am not writing my row terminator correctly, because SQL parses one character at a time while I'm trying to pass two characters.
Anyhow, this error is 4 years old now, but it may provide a bit of information for the next user.
I just wanted to post a fix here that worked for our company on new software we've installed. We were getting the following error since day 1 on the client log file: Server was unable to process request. ---> A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.) ---> The semaphore timeout period has expired.
What completely fixed the problem was to set up a link aggregate (LAG) on our switch. Our Dell FX1 server has redundant fiber lines coming out of the back of it. We did not realize that the switch they're plugged into needed to have a LAG configured on those two ports. See details here: https://docs.meraki.com/display/MS/Switch+Ports#SwitchPorts-LinkAggregation
