NServiceBus exception and subscriber not able to read queue - c#

I've been battling with this for a few days now with no luck. I've set up an NServiceBus project by following the guidelines in the getting started guide.
My MVC site successfully pushes messages onto my queue, but the subscriber never reads them off. There are no error messages until after about a minute, when the following pops up on the console:
2012-06-26 13:05:43,648 [1] FATAL NServiceBus.Hosting.GenericHost [(null)] <(null)> - Autofac.Core.DependencyResolutionException: An exception was thrown while invoking the constructor 'Void .ctor(Raven.Client.IDocumentStore)' on type 'RavenTimeoutPersistence'. ---> System.Net.WebException: The operation has timed out at System.Net.HttpWebRequest.GetResponse()
at Raven.Client.Connection.HttpJsonRequest.ReadStringInternal(Func`1 getResponse)
at Raven.Client.Connection.HttpJsonRequest.ReadResponseString()
at Raven.Client.Connection.HttpJsonRequest.ReadResponseJson()
I can successfully connect to Raven via the web portal, and I can see my queues listed there - so I am at a loss as to why NServiceBus cannot read the messages on the queue. I've reinstalled MSMQ, rebooted the machine, and reinstalled NServiceBus - nothing seems to work.
Does anyone have any idea what's going wrong here?

This turned out to be another victim of Kaspersky Anti-Virus. Even though the firewall was set to "Allow all network connections", turning it off solved the issue.

Related

.NET/C# to MySql running on linux - exception on first command, but subsequent commands do work

Have a really crazy situation. I can't post specifics, so I'm just looking for general guidance. We have already opened a ticket with Oracle/MySql support. I'm just looking to see if anyone else has run into this situation or anything similar. Here is our scenario:
Windows 2012 R2 Server with .NET 4.7.1 running.
Simple Windows Forms .NET application.
We are trying to run a simple query against a Linux MySql Server. MySql is Enterprise Version 5.7.x.
On the first attempted connection, the Windows Forms app locks the UI, waits about 15 seconds, and then reports back that there is an error running the command. The error is shown below.
System.ApplicationException: An exception occurred on the following sql command:select * from tablename where compl_date >= '2019-12-17 04:44:34 PM' ---> MySql.Data.MySqlClient.MySqlException: Authentication to host 'ip address' for user 'userid' using method 'mysql_native_password' failed with message: Reading from the stream has failed. ---> MySql.Data.MySqlClient.MySqlException: Reading from the stream has failed. ---> System.IO.EndOfStreamException: Attempted to read past the end of the stream.
When this error pops up, if I click on the "Continue" button, subsequent calls to the database work as intended (at about a 95% rate).
On the server, the mysqld error logs are shown below for the first call. Subsequent calls do work.
2019-12-16T22:06:29.554171Z 3496 [Warning] IP address 'client ip address' could not be resolved: Name or service not known
2019-12-16T22:06:50.188443Z 3496 [Note] Aborted connection 3496 to db: 'drupaldb' user: 'userid' host: 'ip address' (Got an error reading communication packets)
2019-12-17T02:53:17.832725Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 11355ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)
2019-12-17T03:25:18.200855Z 3527 [Note] Got an error reading communication packets
2019-12-17T03:25:37.167395Z 3528 [Note] Got packets out of order
2019-12-17T03:25:37.382512Z 3529 [Note] Got packets out of order
2019-12-17T03:25:47.688836Z 3530 [Note] Bad handshake
2019-12-17T14:26:33.619967Z 4803 [Note] Got timeout reading communication packets
2019-12-17T19:34:34.741441Z 4851 [Note] Got timeout reading communication packets
2019-12-17T19:47:47.595426Z 4853 [Note] Got timeout reading communication packets
2019-12-17T19:48:45.586357Z 4854 [Note] Got timeout reading communication packets
If you have some general ideas, let me know.
FYI, we have some other linux/mysql instances, and this runs just fine.
At this point, we think we have solved the problem, at least for the short term. Both server and client are sitting on a private network. We think that the database server is trying to send a certificate to the Windows client, that the client is not accepting the SSL certificate, and that this is causing the failure on the first connection attempt. Adding the option "SslMode=None" to the connection string seems to resolve the issue.
Blog post we found that helped us: https://blog.csdn.net/fancyf/article/details/78295964
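As a rough illustration of the workaround above, here is a minimal sketch of a connection string that carries "SslMode=None". The class name, method name, and the host, database, and credential values are placeholders and not taken from the post; only the SslMode option comes from the answer. The resulting string would be passed to the MySQL connector's connection object.

```csharp
using System;
using System.Data.Common;

public static class MySqlConnectionStringSketch
{
    // Builds a Connector/NET-style connection string with SSL turned off.
    // All argument values are supplied by the caller; nothing here is
    // specific to the setup described in the question.
    public static string Build(string server, string database, string user, string password)
    {
        var builder = new DbConnectionStringBuilder();
        builder["Server"] = server;
        builder["Database"] = database;
        builder["Uid"] = user;
        builder["Pwd"] = password;
        // The option named in the answer: skip SSL negotiation entirely.
        builder["SslMode"] = "None";
        return builder.ConnectionString;
    }
}
```

With SSL disabled the traffic is unencrypted, which fits the "short term" framing above: it may be acceptable on a private network, but it is a workaround and not a fix for the certificate problem.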

What would cause AmqpException in azure wcf relay?

I am logging the connection status events with the wcf relay, and I'm seeing something like this in the logs.
1/26 06:47:12 ERROR Service Bus ConnectionStatus: 'Reconnecting' Event. [(null)][42]
LastError: System.ServiceModel.CommunicationException: Exception of type 'System.ServiceModel.CommunicationException' was thrown. ---> Microsoft.ServiceBus.Messaging.Amqp.AmqpException: An AMQP error occurred (condition='amqp:unauthorized-access').
--- End of inner exception stack trace ---
This exception doesn't show up in the list on this microsoft page, and the only other post I can find anywhere related to this error message is here. However, that post does not have any recent comments or a resolution or workaround for the issue. Also, the exception doesn't have any stacktrace, so how am I supposed to troubleshoot this error?
I guess as a follow-up, I would ask whether this is anything to actually worry about if the wcf connection is never faulting.
Apparently, the token that the relay keeps refreshing to stay active requires the time on the server to match the azure service that it is connecting with, and if not, this type of error will show up. We were able to fix it by correcting the server time.
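One way to spot this kind of clock drift is to compare the local UTC clock with the Date header of an HTTP response from a server whose clock you trust. The following is only a sketch under that assumption: the class and method names are made up, the endpoint is whatever the caller chooses, and the Date header has one-second resolution, so it can reveal drift of minutes but not sub-second differences.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class ClockSkewSketch
{
    // Positive result: the local clock is ahead of the remote clock.
    public static TimeSpan Skew(DateTimeOffset localUtc, DateTimeOffset remoteUtc)
    {
        return localUtc - remoteUtc;
    }

    // Sends a HEAD request and compares the response's Date header with the
    // local clock. Returns null when the server sends no Date header.
    public static async Task<TimeSpan?> MeasureAsync(HttpClient client, Uri endpoint)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Head, endpoint))
        using (var response = await client.SendAsync(request))
        {
            DateTimeOffset? remote = response.Headers.Date;
            if (remote == null)
            {
                return null;
            }
            return Skew(DateTimeOffset.UtcNow, remote.Value);
        }
    }
}
```

If the measured skew is more than a few minutes, correcting the server time (as described above) is the first thing to try.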

Visual Studio '15 Web Deploy continuously failing

I'm trying to deploy an ASP.NET MVC 5 application to a shared hosting account (SmarterASP). It was working perfectly fine but recently it mysteriously stopped working for no reason. I tried contacting support, spent 2 days talking to them and they say "it's working fine for us, something's wrong at your end". I'm not sure whether it's my end. The task always fails with this:
3>Start Web Deploy Publish the Application/package to *url removed* ...
3>C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v14.0\Web\Microsoft.Web.Publishing.targets(4295,5): Error : Web deployment task failed. ((12-Nov-16 3:07:55 PM) An error occurred when the request was processed on the remote computer.)
3>
3>(12-Nov-16 3:07:55 PM) An error occurred when the request was processed on the remote computer.
3>The server experienced an issue processing the request. Contact the server administrator for more information.
3>Publish failed to deploy.
Nothing else at all. No further error messages. Is there any way I could get a detailed error message showing what EXACTLY is causing the process to fail? I'm already considering moving to a VPS if the issue persists.

Rebus stops retrieving messages from RabbitMQ

We have an issue in our Rebus/RabbitMQ setup where Rebus suddenly stops retrieving/handling messages from RabbitMQ. This has happened two times in the last month and we're not really sure how to proceed.
Our RabbitMQ setup has two nodes on different servers, and the Rebus side is a windows service.
We see no errors in Rebus or in the eventlog on the server where Rebus runs. We also do not see errors on the RabbitMQ servers.
Rebus (and the windows service) keeps running as we do see other log messages, like the DueTimeOutSchedular and timeoutreplies. However it seems the worker thread stops running, but without any errors being logged.
It results in a RabbitMQ input queue that keeps growing :(, we're adding logging to monitor this so we get notified if it happens again.
But I'm looking for advice on how to continue the "investigation" and ideas on how to prevent this. Maybe some of you have experienced this before?
UPDATE
It seems that we actually did have a node crashing, at least the last time it happened. The master RabbitMQ node crashed (the server crashed) and the slave was promoted to master. As far as I can see from the RabbitMQ logs on the nodes, everything went according to plan. There are no other errors in the RabbitMQ logs.
At the time this happened Rebus was configured to connect only to the node that was the slave (then promoted to master) so Rebus did not experience the rabbitmq failure and thus no Rebus connection errors. However, it seems that Rebus stopped handling messages when the failure occurred.
It seems we are actually experiencing this on a few queues, and some of them, but not all, seem to have ended up in an unsynchronized state.
UPDATE 2
I was able to reproduce the problem quite easily, so it might be a configuration issue in our setup. This is what we do to reproduce it:
Start two nodes in a cluster, ex. rabbit1 (master) and rabbit2 (slave)
Rebus connects to rabbit2, the slave
Close rabbit1, the master. rabbit2 is promoted to master
The queues are mirrored
We have two small test apps to reproduce this, a "sender" that sends a message every second and a "consumer" that handles the messages.
When rabbit1 is closed, the "consumer" stops handling messages, but the "sender" keeps sending the messages and the queue keeps growing.
Start rabbit1 again, it joins as slave
This has no effect and the "consumer" still does not handle messages.
Restart the "consumer" app
When the "consumer" is restarted it retrieves all the messages and handles them.
I think I have followed the setup guides correctly, but it might be a configuration issue on our part. I can't seem to find anything that would suggest what we have done wrong.
Rebus is still connected to RabbitMQ; we see that in the connections tab on the management site. The "consumer" send/received rate drops to about 2 B/s when it stops handling messages.
UPDATE 3
Ok, so I downloaded the Rebus source and attached to our process so I could see what happens in the "RabbitMqMessageQueue" class when it stops. When "rabbit1" is closed, the "BasicDeliverEventArgs" is null. This is the code:
BasicDeliverEventArgs ea;
if (!threadBoundSubscription.Next((int)BackoffTime.TotalMilliseconds, out ea))
{
    return null;
}
// wtf??
if (ea == null)
{
    return null;
}
See: https://github.com/rebus-org/Rebus/blob/master/src/Rebus.RabbitMQ/RabbitMqMessageQueue.cs#L178
I like the "wtf ??" comment :)
That sounds very weird!
Whenever Rebus' RabbitMQ transport experiences an error on the connection, it will throw out the connection, wait a few seconds, and ensure that the connection is re-established again when it can.
You can see the relevant place in the source here: https://github.com/rebus-org/Rebus/blob/master/src/Rebus.RabbitMQ/RabbitMqMessageQueue.cs#L205
So I guess the question is whether the RabbitMQ client library can somehow enter a faulted state, silently, without throwing an exception when Rebus attempts to get the next message...?
When you experienced the error, did you check out the 'connections' tab in RabbitMQ management UI and see if the client was still connected?
Update:
Thanks for your thorough investigation :)
The "wtf??" is in there because I once experienced a hiccup when ea had apparently been null, which was unexpected at the time, thus causing a NullReferenceException later on and the vomiting of exceptions all over my logs.
According to the docs, Next will return true and set the result to null when it reaches "end-of-stream", which is apparently what happens when the underlying model is closed.
The correct behavior in that case for Rebus would be to throw a proper exception and let the connection be re-established - I'll implement that right away!
Sit tight, I'll have a fix ready for you in a few minutes!
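The behavior described above (a "true" return with a null result meaning end-of-stream, which should raise an exception so the connection gets re-established) can be sketched roughly as follows. This is not the actual Rebus fix: the delegate, the exception type, and the method name are hypothetical stand-ins, and only the Next(timeout, out result) shape is taken from the quoted snippet.

```csharp
using System;

// Hypothetical stand-in with the same shape as the Next(timeout, out result)
// call in the quoted snippet.
public delegate bool NextFunc<T>(int millisecondsTimeout, out T result) where T : class;

// Hypothetical exception type used to signal that the stream has ended.
public sealed class EndOfStreamReachedException : Exception
{
    public EndOfStreamReachedException(string message) : base(message) { }
}

public static class ReceiveSketch
{
    // Returns the next delivery, null on timeout, and throws on end-of-stream
    // so that the caller can discard the connection and reconnect.
    public static T ReceiveOrThrow<T>(NextFunc<T> next, int timeoutMs) where T : class
    {
        T ea;
        if (!next(timeoutMs, out ea))
        {
            // Timed out: nothing to receive right now.
            return null;
        }
        if (ea == null)
        {
            // "true" with a null result: treated as end-of-stream, i.e. the
            // underlying model was closed. Do not swallow it silently.
            throw new EndOfStreamReachedException(
                "Subscription reached end-of-stream; the connection should be re-established.");
        }
        return ea;
    }
}
```

The point of throwing instead of returning null is that a null return is indistinguishable from an ordinary timeout, which is why the consumer in the reproduction kept polling a dead subscription without logging anything.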

Crashing the AzureDiagnostics agent on compute emulator

We've recently started running into a problem with the Windows Azure Compute Emulator in which the DiagnosticsAgent crashes on role startup; however, we still get all our traces in the compute emulator console. I'm not 100% sure I've even really got an issue, but I don't want there to be a lingering bug that causes me trouble when I deploy to the cloud. Has anyone seen error messages similar to the one below, and if so, what were you able to do about it? I've not been able to find any info on SO or the remainder of the interwebs.
Thanks in advance.
[Diagnostics]: Error starting diagnostics:
System.Net.WebException: The server committed a protocol violation. Section=ResponseStatusLine
at System.Net.HttpWebRequest.GetResponse()
at Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitor.ValidateEndpointValid(CloudStorageAccount acct, Action`1 error)
at Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitor.StartWithExplicitConfiguration(DiagnosticMonitorStartupInfo startupInfo, DiagnosticMonitorConfiguration initialConfiguration)
at Microsoft.WindowsAzure.Plugins.Diagnostics.DiagnosticsAgentManager.<StartAgent>b__0()
