DataStax C# driver 3.3.0 deadlocking on connect to cluster?

To Datastax C# driver engineers:
C# driver 3.3.0 deadlocks when calling Connect(). The following code snippet, run in a Windows Forms app, deadlocks while trying to connect:
public void SimpleConnectTest()
{
    const string ip = "127.0.0.1";
    const string keyspace = "somekeyspace";
    QueryOptions queryOptions = new QueryOptions();
    queryOptions.SetConsistencyLevel(ConsistencyLevel.One);
    Cluster cluster = Cluster.Builder()
        .AddContactPoints(ip)
        .WithQueryOptions(queryOptions)
        .Build();
    var cassandraSession = cluster.Connect(keyspace);
    Assert.AreNotEqual(null, cassandraSession);
    cluster.Dispose();
}
The deadlock happens here, in Cluster.cs:
private void Init()
{
    ...
    TaskHelper.WaitToComplete(_controlConnection.Init(), initialAbortTimeout);
    ...
}
I have tested this on Cassandra 3.9.0, CQL spec 3.4.2, on my local machine.
Everything deadlocks on the call to _controlConnection.Init(); the debugger shows the task as:
task = Id = 11, Status = WaitingForActivation, Method = "{null}", Result = "{Not yet computed}"
It then just waits for 30000 ms and throws:
throw new TimeoutException(
    "Cluster initialization was aborted after timing out. This mechanism is put in place to" +
    " avoid blocking the calling thread forever. This usually caused by a networking issue" +
    " between the client driver instance and the cluster.", ex);
Running the same test on 3.2.0 has no such problem. Can anyone else test this? Maybe it only happens to me.
Edit:
Here is the screenshot for the deadlock:

Thanks to the details in your comments, we were able to identify the underlying issue.
As Luke suggested, some ConfigureAwait(false) calls were missing.
This issue impacts users who call Cluster.Connect() in environments with a SynchronizationContext, which is not a common use case:
For Windows Forms, it's unlikely that the app communicates directly with a database (without a service in the middle). Furthermore, users should call Connect() before creating a form (where there is no SynchronizationContext) to share the same Session instance across all forms.
For ASP.NET, users should call Connect() outside of any endpoint action, before the HttpContext is created (where there is no SynchronizationContext).
Note that this issue affects only Connect() calls. Other blocking calls like Execute() don't have this issue.
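The mechanism can be reproduced without the driver. The following is a minimal, self-contained sketch, not the driver's code: BlockedUiContext is a hypothetical SynchronizationContext standing in for a Windows Forms or ASP.NET context whose only thread is stuck in a blocking wait, and InitAsync is a hypothetical stand-in for an internal async step such as _controlConnection.Init(). When the await captures the context, its continuation is queued to the blocked thread and never runs, so the blocking wait can only time out; with ConfigureAwait(false) the continuation runs on the thread pool and the wait returns.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a UI-style SynchronizationContext: callbacks posted
// to it are queued for the owning thread, which never drains them while blocked.
public sealed class BlockedUiContext : SynchronizationContext
{
    public readonly ConcurrentQueue<(SendOrPostCallback Callback, object? State)> Pending =
        new ConcurrentQueue<(SendOrPostCallback Callback, object? State)>();

    public override void Post(SendOrPostCallback d, object? state) => Pending.Enqueue((d, state));
}

public static class SyncOverAsyncDemo
{
    // Hypothetical internal init step (not the driver's real code).
    private static async Task<int> InitAsync(bool continueOnCapturedContext)
    {
        // true: the continuation is posted back to the captured context.
        // false: the continuation runs on a thread-pool thread instead.
        await Task.Delay(50).ConfigureAwait(continueOnCapturedContext);
        return 42;
    }

    // Mimics a synchronous Connect(): blocks the "UI" thread until init finishes
    // or the timeout expires. Returns true if init completed in time.
    public static bool BlockingInit(bool continueOnCapturedContext, int timeoutMs)
    {
        SynchronizationContext? previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(new BlockedUiContext());
        try
        {
            Task<int> init = InitAsync(continueOnCapturedContext);
            return init.Wait(timeoutMs);
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

BlockingInit(false, 5000) returns true, while BlockingInit(true, 500) times out and returns false, which mirrors the TimeoutException described in the question.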
In any case, this issue could be a showstopper for users getting started with the driver, for example, users creating a simple Windows Forms app to try out a concept.
I've submitted a pull request with the fix. It also contains a test that scans the source code for await usages without ConfigureAwait() calls, to keep this issue from coming back:
https://github.com/datastax/csharp-driver/pull/309
You can expect the fix to land in the next patch release.
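The PR's actual test is not reproduced here. As a rough illustration of the idea only, here is a hypothetical line-based check (ConfigureAwaitScanner is an invented name) that flags lines containing the await keyword but no ConfigureAwait( call. It is a heuristic: it ignores comments and string literals and misses awaits whose ConfigureAwait call sits on a different line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class ConfigureAwaitScanner
{
    // Matches "await" as a whole word (so "awaiter" or "ConfigureAwait" do not match).
    private static readonly Regex AwaitKeyword = new Regex(@"\bawait\b");

    // Returns 1-based line numbers that contain "await" but no "ConfigureAwait(" on the same line.
    public static List<int> FindUnconfiguredAwaits(string source)
    {
        var offending = new List<int>();
        string[] lines = source.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            string line = lines[i];
            if (AwaitKeyword.IsMatch(line) && !line.Contains("ConfigureAwait("))
            {
                offending.Add(i + 1);
            }
        }
        return offending;
    }

    // Scans every .cs file under a directory, e.g. from a unit test over the library sources.
    public static Dictionary<string, List<int>> ScanDirectory(string root)
    {
        var results = new Dictionary<string, List<int>>();
        foreach (string path in Directory.EnumerateFiles(root, "*.cs", SearchOption.AllDirectories))
        {
            List<int> hits = FindUnconfiguredAwaits(File.ReadAllText(path));
            if (hits.Count > 0) results[path] = hits;
        }
        return results;
    }
}
```

A unit test could then assert that ScanDirectory over the library's source folder returns an empty dictionary.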

I can't reproduce the problem, but I suspect it is related to a recent change that made the connection process asynchronous internally. I don't know for sure, but tracing through the Connect code, I suspect a missing ConfigureAwait(false). In particular, the Reconnect method (which could definitely get hit as part of that Init code path) appears to be missing one after that commit. I may not be able to reproduce it because I'm not hitting the Reconnect code path, while for some reason you are in your environment.
I'm not 100% sure that's the culprit, but I opened a PR to fix it. Stephen Cleary wrote a great explanation of why this can happen in Forms/Web apps. You could try building the driver from my fork to see whether that change fixes the problem, or wait and see what happens with the PR and a new release. If it's still happening, I'd suggest opening an issue on the JIRA.
Hope that helps!

An issue has been opened here, with a workaround:
https://datastax-oss.atlassian.net/projects/CSHARP/issues/CSHARP-579
For anyone experiencing the same problem: just wrap your connection code in a new task.
Task.Run(() =>
{
    SimpleConnectTest();
});
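Wrapping the call in Task.Run works because the delegate runs on a thread-pool thread, where SynchronizationContext.Current is null, so the driver's internal awaits have no UI context to capture. A minimal sketch, assuming hypothetical InitAsync/ConnectBlocking methods that reproduce the missing ConfigureAwait(false) and a hypothetical NeverPumpedContext standing in for a blocked UI thread; the sketch blocks on the outer task only to check the outcome.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a UI context whose thread is blocked: posted callbacks never run.
public sealed class NeverPumpedContext : SynchronizationContext
{
    public readonly ConcurrentQueue<SendOrPostCallback> Pending = new ConcurrentQueue<SendOrPostCallback>();

    public override void Post(SendOrPostCallback d, object? state) => Pending.Enqueue(d);
}

public static class TaskRunWorkaround
{
    // Hypothetical library code with the bug: no ConfigureAwait(false).
    private static async Task<string> InitAsync()
    {
        await Task.Delay(50);
        return "connected";
    }

    // Hypothetical synchronous wrapper, like a blocking Connect().
    public static string ConnectBlocking() => InitAsync().GetAwaiter().GetResult();

    // Even with a capturing context installed on the calling thread, the blocking call
    // runs on the thread pool, where there is no context for InitAsync to capture.
    public static bool ConnectViaTaskRun(int timeoutMs)
    {
        SynchronizationContext? previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(new NeverPumpedContext());
        try
        {
            Task<string> connect = Task.Run(() => ConnectBlocking());
            return connect.Wait(timeoutMs) && connect.Result == "connected";
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

In a real Windows Forms event handler you would typically write an async void handler that does await Task.Run(...), so the UI thread stays responsive. With a bare Task.Run(...) as in the workaround above, any exception thrown by the connection code is not observed by the caller.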

Related

gRPC client having lots of issues and not working

I started using gRPC with Visual Studio 2022 and have never seen so many issues.
When I tried to create an additional proto file, I got an error saying that I need to specify the language. That's odd, because I selected the option designed for C#. It never worked, so I simply copied the default greeter file.
I created a console app that uses this default greeter service, and it even worked. But when I added the additional proto and created another fairly simple service, it would not compile, complaining about some missing types. I can't remember the exact error message, but I resolved it only by downgrading the Grpc.* package version to 2.27. I found this answer by googling, and I find it odd that Microsoft releases something that does not work in the simplest scenario.
I decided to test my new gRPC service and created a client:
var channel = GrpcChannel.ForAddress("https://localhost:5001");
var client = new Greeter.GreeterClient(channel);
var reply = await client.MySimpleMethodAsync(new MyRequest { Id = 123 });
Console.WriteLine(reply.Message);
Console.ReadKey();
The MySimpleMethodAsync method is very simple: it just reads a record from the DB using Dapper, nothing special.
Surprisingly, there was no compilation error, but when I ran it (along with the server app), I got an exception on the line var reply = await client.MySimpleMethodAsync(...): Grpc.Core.RpcException: 'Status(StatusCode=Unimplemented, Detail="Service is unimplemented.")'
I don't understand why it says so. The service is implemented, and it compiles! Googling did not help, but I found that other people have the same issue.
I also found that if I modify the gRPC service in a way it does not like and then roll back the changes, it no longer compiles! I clean the solution and rebuild it; nothing helps! The only thing that helps is adding a brand-new project and copy-pasting the previous "stable" code.
I've never seen such a ... technology that never works!
Anyway, the most important issue for me now is the third one: why does it say the service is not implemented?

Using RunImpersonated for an HttpClient call fails for a NUnit test, but works in Console

I need my tests to run as a test account. To accomplish that, I set up the following code to get a token handle for the test account:
// LogonUser is a P/Invoke into advapi32.dll (declaration not shown).
// 2 = LOGON32_LOGON_INTERACTIVE, 0 = LOGON32_PROVIDER_DEFAULT
SafeAccessTokenHandle testAccountHandle;
bool returnValue = LogonUser("TestAccount", "myDom.net",
    "pass", 2, 0, out testAccountHandle);
I can then make a call to load a URL:
HttpResponseMessage response = null;
await WindowsIdentity.RunImpersonated<Task>(testAccountHandle, async () =>
{
    var url = "https://accounts.google.com/.well-known/openid-configuration";
    response = await httpClient.GetAsync(url);
});
testAccountHandle.Dispose();
When I run this in a console application, it works just fine. (Likewise in LinqPad.)
However when I run this code in an NUnit test, I get the following error:
System.Net.Sockets.SocketException : This is usually a temporary error during hostname resolution and means that the local server did not receive a response from an authoritative server.
It says it is usually a temporary error, but it happens every single time I run impersonated in the NUnit test, and never when I run in the console. It also never happens in the NUnit test if I am not running impersonated. (In short, it ONLY happens in an NUnit test while impersonated.)
I am not sure how to go about debugging this. It seems clear that NUnit does not like my impersonation, but I am not sure what to do about it.
How can I make a successful HttpClient.GetAsync call while using RunImpersonated in an NUnit test?
NOTE: Full repro code can be found here: https://github.com/nunit/nunit/issues/3672
This appears to be a bug with .NET Core. See the open issue and discussion here: https://github.com/dotnet/runtime/issues/29935
It seems that WindowsIdentity.RunImpersonated() works differently in .NET Core compared with .NET Framework. This is causing a variety of issues including one affecting ASP.NET Core, #29351.
There is some difference in how permissions are set on the impersonated token, which causes "access denied" errors in a variety of ways.
Workaround
Issue #29351 from ASP.NET Core references this exact error message and contains a workaround. Setting the environment variable DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER to 0 disables the SocketsHttpHandler and makes this problem go away. For example, the following works without the error:
Environment.SetEnvironmentVariable("DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER", "0");
HttpResponseMessage response = null;
await WindowsIdentity.RunImpersonated<Task>(testAccountHandle, async () =>
{
    var url = "https://accounts.google.com/.well-known/openid-configuration";
    response = await httpClient.GetAsync(url);
});
You may want to set this environment variable only for this specific test rather than for every test. I'm not sure what the impact of disabling SocketsHttpHandler is, so use this workaround at your own risk.
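One way to limit the workaround to a single test is a small disposable helper that sets the variable and restores the previous value afterwards. This is a sketch with a hypothetical ScopedEnvironmentVariable class, and it assumes the runtime reads the variable when the HttpClient handler is created. If the setting is read only once per process, a handler created earlier (for example by another test) may already have fixed the choice, so create the HttpClient inside the scope and treat the restore as best-effort.

```csharp
#nullable enable
using System;

// Hypothetical helper: sets a process environment variable and restores the
// previous value (or removes the variable) when disposed.
public sealed class ScopedEnvironmentVariable : IDisposable
{
    private readonly string _name;
    private readonly string? _previous;

    public ScopedEnvironmentVariable(string name, string value)
    {
        _name = name;
        _previous = Environment.GetEnvironmentVariable(name);
        Environment.SetEnvironmentVariable(name, value);
    }

    // Passing null to SetEnvironmentVariable deletes the variable.
    public void Dispose() => Environment.SetEnvironmentVariable(_name, _previous);
}

// Usage (in the test):
// using (new ScopedEnvironmentVariable("DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER", "0"))
// {
//     var httpClient = new HttpClient();
//     // ... RunImpersonated call from the question ...
// }
```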

MongoClient ignores the connection string

I just started using MongoDB (4.4) and its C# driver. I set up MongoDB with the default options, localhost:27017. Then I turned on authorization and created a superuser with the root role, like this:
db.createUser(
  {
    user: "superuser",
    pwd: "123",
    roles: [ "root" ]
  }
)
I tested it on both Mongo shell and Compass, it all worked as expected, connected with correct password, and denied with the wrong one.
Then I created a C# Windows Forms app and used NuGet to install the required packages: MongoDB.Driver (v2.11.0) and its related packages such as MongoDB.Bson.
After that, I used the following code to create a MongoClient:
MongoClient client = new MongoClient("mongodb://superuser:12@localhost:27017");
I expected it to throw an exception because I used the wrong password ("12" in this case), but it didn't. Then I tried to list the database names with:
client.ListDatabaseNames();
It threw a timeout exception: "A timeout occured after 30000ms selecting a server using CompositeServerSelector"
Even when I used the correct password, or turned off authorization and just used "mongodb://localhost:27017" or "mongodb://127.0.0.1:27017", it still threw the same timeout exception.
It feels like something is wrong with the client it created, which causes the timeout later on, but I couldn't figure out what I'm missing.
Thank you for your help!
Edit:
The same code works perfectly in a console app, just not a windows form app, which really confuses me.
After two days of trial and error, I finally found a workaround for this issue, though I still don't know why it works.
Basically, I have to separate the MongoClient creation from the calls that follow it; I can't do anything with the MongoClient right after creating it. For example, the following throws the timeout exception:
MongoClient client = new MongoClient( "mongodb://localhost:27017" ); //I turned off authorization
client.ListDatabaseNames(); //Throw time out exception here!!!
I have to split them into two separate calls, e.g. one in a "Connect" button event handler and another in a "ListDatabaseNames" button event handler.
Once I did that, everything works fine.
Try mongodb://superuser:12@localhost:27017/?authMechanism=SCRAM-SHA-1
because the .NET driver does not support SCRAM-SHA-256 yet.
Creating a client does not perform any network operations, such as connecting to your MongoDB deployment; that is done in the background. Hence incorrect credentials won't make client creation fail.
The mongo shell works differently, and Compass probably runs some queries that would fail if the credentials were wrong.
As for why you are getting a timeout error, my guess is an IPv4/IPv6 difference. Try 127.0.0.1 instead of localhost. If that doesn't help, enable debug information in the driver.

Mongo C# driver try reconnect on connection failure

Is there a common way to recover from a connection error in MongoDB with the C# driver?
Currently, my Windows service shuts down if MongoDB is turned off. At the start of the service, my app is structured like this:
//Set up connections for Mongo
var con = new MongoConnectionStringBuilder(ConfigurationManager.ConnectionStrings["MongoDB"].ConnectionString);
var client = new MongoClient(con.ToString());
var server = client.GetServer();
var db = server.GetDatabase(con.DatabaseName);
I then inject the db object into my repositories.
I'm looking for something like an event handler or a condition I could listen to across the whole application, to keep the entire service from crashing if Mongo goes down for some reason.
As the driver documentation suggests, MongoClient was added to manage replica sets, which you will need sooner or later. To avoid a massive refactoring then, make better use of it now.
MongoClient, which is thread-safe, already implements the failover logic among replica set nodes. It is meant to be a singleton for your application domain. So inject the MongoClient rather than db (which is not even thread-safe).
So always call GetServer() and GetDatabase() on the MongoClient, wrapping them in try/catch and retrying; eventually you will get a usable db object once MongoDB is back online.
The point is that MongoDB does not notify clients when it comes back online, so there is no such event to listen to. You'll have to keep retrying on the client side until it succeeds, and catch the exceptions so they don't bring down your service.
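Since there is no "back online" event, the retrying has to live in the client. Here is a minimal, driver-agnostic sketch of that idea; Retry.UntilSuccess is a hypothetical helper, and the doubling delay with a cap is a design choice not taken from the answer.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Calls `operation` until it succeeds or `maxAttempts` is reached, sleeping with a
    // doubling delay (capped at maxDelay) between failures. Exceptions are caught so a
    // temporarily unavailable server does not crash the service; the last exception
    // propagates once attempts run out.
    public static T UntilSuccess<T>(Func<T> operation, int maxAttempts, TimeSpan initialDelay, TimeSpan maxDelay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts)
            {
                Console.Error.WriteLine($"Attempt {attempt} failed: {ex.Message}; retrying in {delay.TotalMilliseconds} ms");
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
            }
        }
    }
}
```

With the legacy API from the question, the operation might be something like () => client.GetServer().GetDatabase(name); ideally the retried operation is one that actually talks to the server.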
EDIT: According to the documentation, I was wrong about thread safety. However, that doesn't change the fact that you shouldn't store the MongoDatabase if you plan to migrate to a replica set.
In addition to yaoxing's answer, here is a code snippet that solves this issue:
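// Uses the legacy MongoServer API (client.GetServer()), as in the question.
var client = new MongoClient(connString);
var server = client.GetServer();
// Keep retrying once per second until the server is reachable again.
while (server.State == MongoServerState.Disconnected)
{
    Thread.Sleep(1000);
    try
    {
        server.Reconnect();
    }
    catch (Exception ex)
    {
        // Log and keep looping instead of letting the exception crash the service.
        Debug.WriteLine("Failed to connect mongodb {0} Attempt Count: {1}",
            ex, server.ConnectionAttempt);
    }
}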

Connection Pooling with NEST ElasticSearch Library

I'm currently using the NEST ElasticSearch C# Library for interacting with ElasticSearch. My project is an MVC 4 WebAPI project that basically builds a RESTful webservice for accessing directory assistance information.
We've only just started working with NEST, and have been stumbling over the lack of documentation. What's there is useful, but it has some very large holes. Currently everything we need works; however, we're running into an issue with connections sometimes taking up to a full second. What we'd like to do is use some sort of connection pooling, similar to how you'd interact with SQL Server.
Here is the documentation on how to connect using NEST: http://mpdreamz.github.com/NEST/concepts/connecting.html
Here is the relevant code snippet from our project:
public class EOCategoryProvider : IProvider
{
    public DNList ExecuteQuery(Query query)
    {
        // Configure the elastic client and its settings
        ConnectionSettings elasticSettings = new ConnectionSettings(Config.server, Config.port).SetDefaultIndex(Config.index);
        ElasticClient client = new ElasticClient(elasticSettings);
        // Connect to Elastic
        ConnectionStatus connectionStatus;
        if (client.TryConnect(out connectionStatus))
        {
            // Elastic Search Code here ...
        } // end if
    } // end ExecuteQuery
} // end EOCategoryProvider
From looking at the documentation, I can't see any provisions for a connection pool. I've been thinking about implementing my own (say, storing 3 or 4 ElasticClient objects and selecting them round-robin), but I was wondering if anyone had a better solution. If not, does anyone have advice on the best way to implement a connection pool by hand? Any articles to point to?
Thanks for anything you guys come up with.
Update: This seems to have been related to calling TryConnect on every request, and to our particular network setup. The problem completely disappeared when using a machine on the same network as the Elastic box; my development machine (which averages 350 ms to the Elastic box) sometimes seemed to fail to make HTTP connections, which caused the long TryConnect times.
You don't have to call TryConnect() each time you do a call to Elasticsearch. It's basically a sanity check call for when your application starts.
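If you still want that sanity check, it only needs to run once. A minimal sketch, assuming a hypothetical StartupCheck wrapper around whatever check you use (for example the question's client.TryConnect(out status)); it uses Lazy<T> so concurrent first requests share a single run.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: runs an expensive connectivity check at most once per
// instance and caches the outcome, instead of repeating it on every request.
public sealed class StartupCheck
{
    private readonly Lazy<bool> _result;

    public StartupCheck(Func<bool> check)
    {
        // ExecutionAndPublication: concurrent first callers wait for a single run.
        _result = new Lazy<bool>(check, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public bool IsHealthy => _result.Value;
}

// Usage (NEST API as shown in the question):
// static readonly StartupCheck ElasticCheck = new StartupCheck(() => client.TryConnect(out _));
```

Note that in this mode Lazy<T> also caches an exception thrown by the check, and this sketch caches a false result as well; whether a failed check should be retried later is a design choice.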
NEST is the C# REST client for Elasticsearch and the default IConnection uses WebRequest.Create which already pools TCP connections.
Review the actual implementation: https://github.com/elastic/elasticsearch-net/blob/master/src/Elasticsearch.Net/Connection/HttpConnection.cs
Reusing ElasticClient won't offer any performance gains since each call already gets its own HttpWebRequest. The whole client is built stateless on purpose.
I am, however, very interested in why calls are taking 1 second for you. Could you post the actual NEST code, explain how you are measuring the calls, and describe your data?
Disclaimer: I'm the author of NEST.
