I've been playing with MongoDB recently (It's AMAZINGLY FAST) using the C# driver on GitHub. Everything is working just fine in my little single threaded console app that I'm testing with. I'm able to add 1,000,000 documents (yes, million) in under 8 seconds running single threaded. I only get this performance if I use the connection outside the scope of a for loop. In other words, I'm keeping the connection open for each insert rather than connecting for each insert. Obviously that's contrived.
I thought I'd crank it up a notch to see how it works with multiple threads. I'm doing this because I need to simulate a website with multiple concurrent requests. I'm spinning up between 15 and 50 threads, still inserting a total of 150,000 documents in all cases. If I just let the threads run, each creating a new connection for each insert operation, the performance grinds to a halt.
Obviously I need to find a way to share, lock, or pool the connection. Therein lies the question. What's the best practice in terms of connecting to MongoDB? Should the connection be kept open for the life of the app (there is substantial latency opening and closing the TCP connection for each operation)?
Does anyone have any real world or production experience with MongoDB, and specifically the underlying connection?
Here is my threading sample using a static connection that's locked for insert operations. Please offer suggestions that would maximize performance and reliability in a web context!
private static Mongo _mongo;
private static void RunMongoThreaded()
{
_mongo = new Mongo();
_mongo.Connect();
var threadFinishEvents = new List<EventWaitHandle>();
for(var i = 0; i < 50; i++)
{
var threadFinish = new EventWaitHandle(false, EventResetMode.ManualReset);
threadFinishEvents.Add(threadFinish);
var thread = new Thread(delegate()
{
RunMongoThread();
threadFinish.Set();
});
thread.Start();
}
WaitHandle.WaitAll(threadFinishEvents.ToArray());
_mongo.Disconnect();
}
private static void RunMongoThread()
{
for (var i = 0; i < 3000; i++)
{
var db = _mongo.getDB("Sample");
var collection = db.GetCollection("Users");
var user = GetUser(i);
var document = new Document();
document["FirstName"] = user.FirstName;
document["LastName"] = user.LastName;
lock (_mongo) // Lock the connection - not ideal for threading, but safe and seemingly fast
{
collection.Insert(document);
}
}
}
Most answers here are outdated and are no longer applicable as the .net driver has matured and had numberless features added.
Looking at the documentation of the new 2.0 driver found here:
http://mongodb.github.io/mongo-csharp-driver/2.0/reference/driver/connecting/
The .net driver is now thread safe and handles connection pooling. According to documentation
It is recommended to store a MongoClient instance in a global place, either as a static variable or in an IoC container with a singleton lifetime.
The thing to remember about a static connection is that it's shared among all your threads. What you want is one connection per thread.
When using mongodb-csharp you treat it like you would an ADO connection.
When you create a Mongo object it borrows a connection from the pool, which it owns until it is disposed. So after the using block the connection is back into the pool.
Creating Mongo objects are cheap and fast.
Example
for(var i=0;i<100;i++)
{
using(var mongo1 = new Mongo())
using(var mongo2 = new Mongo())
{
mongo1.Connect();
mongo2.Connect();
}
}
Database Log
Wed Jun 02 20:54:21 connection accepted from 127.0.0.1:58214 #1
Wed Jun 02 20:54:21 connection accepted from 127.0.0.1:58215 #2
Wed Jun 02 20:54:21 MessagingPort recv() errno:0 No error 127.0.0.1:58214
Wed Jun 02 20:54:21 end connection 127.0.0.1:58214
Wed Jun 02 20:54:21 MessagingPort recv() errno:0 No error 127.0.0.1:58215
Wed Jun 02 20:54:21 end connection 127.0.0.1:58215
Notice it only opened 2 connections.
I put this together using mongodb-csharp forum.
http://groups.google.com/group/mongodb-csharp/browse_thread/thread/867fa78d726b1d4
Somewhat but still of interest is CSMongo, a C# driver for MongoDB created by the developer of jLinq. Here's a sample:
//create a database instance
using (MongoDatabase database = new MongoDatabase(connectionString)) {
//create a new document to add
MongoDocument document = new MongoDocument(new {
name = "Hugo",
age = 30,
admin = false
});
//create entire objects with anonymous types
document += new {
admin = true,
website = "http://www.hugoware.net",
settings = new {
color = "orange",
highlight = "yellow",
background = "abstract.jpg"
}
};
//remove fields entirely
document -= "languages";
document -= new[] { "website", "settings.highlight" };
//or even attach other documents
MongoDocument stuff = new MongoDocument(new {
computers = new [] {
"Dell XPS",
"Sony VAIO",
"Macbook Pro"
}
});
document += stuff;
//insert the document immediately
database.Insert("users", document);
}
Connection Pool should be your answer.
The feature is being developed (please see http://jira.mongodb.org/browse/CSHARP-9 for more detail).
Right now, for web application, the best practice is to connect at the BeginRequest and release the connection at EndRequest. But to me, I think that operation is too expensive for each request without Connection Pool. So I decide to have the global Mongo object and using that as shared resource for every threads (If you get the latest C# driver from github right now, they also improve the performance for concurrency a bit).
I don't know the disadvantage for using Global Mongo object. So let's wait for another expert to comment on this.
But I think I can live with it until the feature(Connection pool) have been completed.
I am using csharp-mongodb driver and it doesn't help me with his connection pool :( I have about 10-20 request to mongodb per web request.(150 users online - average) And i can't even monitor statistics or connect to mongodb from shell it throw exception to me.
I have created repository, which open and dispose connection per request. I rely on such things as:
1) Driver has connection pool
2) After my research(i have posted some question in user groups about this) - i understood that creating mongo object and open connection doesn't heavy operation, so heavy operation.
But today my production go down :(
May be i have to save open connection per request...
here is link to user group http://groups.google.com/group/mongodb-user/browse_thread/thread/3d4a4e6c5eb48be3#
Related
I want to understand the behavior I'm seeing with the following code. Using the RabbitMQ.Client library version 6.2.2.
Expected behavior: Connections are created quickly and the process does not slow down.
Actual behavior: First 6 connections are created quickly, after that there is a significant slowdown and connections are created one by done (1s apart).
Note; starting the program multiple times shows similar behavior. That leads me to believe that the bottleneck is per-process rather than RabbitMQ or system resources.
Note 2; system resources are not the bottleneck (AFAIK).
Does anybody know what is causing the observed behavior? RabbitMQ installed on Windows 10 with default settings.
using RabbitMQ.Client;
using System;
namespace ConsoleApp1
{
internal class Program
{
static void Main(string[] args)
{
for (int i = 0; i < 50; i++)
{
var factory = new ConnectionFactory() { HostName = "localhost" };
var connection = factory.CreateConnection();
var channel1 = connection.CreateModel();
var channel2 = connection.CreateModel();
Console.WriteLine(i);
}
Console.ReadLine();
}
}
}
EDIT: I know this violates every best practice regarding "Single connection per process". I'm just curious what is limiting the connection creation and if there is any setting that can control this behavior.
The .NET client uses the ThreadPool which probably doesn't have enough threads out of the box. You need to increase the amount available:
https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/main/projects/TestApplications/MassPublish/Program.cs#L21
See issues and discussion here:
https://github.com/rabbitmq/rabbitmq-dotnet-client/search?q=threadpool
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
I am using Parallel.ForEach to do work on multiple threads, using a new EF5 DbContext for each iteration, all wrapped within a TransactionScope, as follows:
using (var transaction = new TransactionScope())
{
int[] supplierIds;
using (var appContext = new AppContext())
{
supplierIds = appContext.Suppliers.Select(s => s.Id).ToArray();
}
Parallel.ForEach(
supplierIds,
supplierId =>
{
using (var appContext = new AppContext())
{
Do some work...
appContext.SaveChanges();
}
});
transaction.Complete();
}
After running for a few minutes it is throwing an EntityException "The underlying provider failed on Open" with the following inner detail:
"The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions."
Does anyone know what's causing this or how it can be prevented? Thanks.
You could also try setting the maximum number of concurrent tasks in the Parallel.ForEach() method using new ParallelOptions { MaxDegreeOfParallelism = 8 } (replace 8 with the whatever you want to limit it to.
See MSDN for more details
You should also find out why your app is taking such huge amounts of locks? You have wrapped a TransactionScope around multiple db connections. This probably causes a distributed transaction which might have to do with it. It certainly causes locks to never be released until the very end. Change that.
You can only turn up locking limits so far. It does not scale to arbitrary amounts of supplier ids. You need to find the cause for the locks, not mitigate the symptoms.
You are running into the maximum number of locks allowed by sql server - which by default is set automatically and governed by available memory.
You can
Set it manually - I forget exactly how but google is your friend.
Add more memory to your sql server
Commit your transactions more frequently.
For the windows azure queues the scalability target per storage is supposed to be around 500 messages / second (http://msdn.microsoft.com/en-us/library/windowsazure/hh697709.aspx). I have the following simple program that just writes a few messages to a queue. The program takes 10 seconds to complete (4 messages / second). I am running the program from inside a virtual machine (on west-europe) and my storage account also is located in west-europe. I don't have setup geo replication for my storage. My connection string is setup to use the http protocol.
// http://blogs.msdn.com/b/windowsazurestorage/archive/2010/06/25/nagle-s-algorithm-is-not-friendly-towards-small-requests.aspx
ServicePointManager.UseNagleAlgorithm = false;
CloudStorageAccount storageAccount=CloudStorageAccount.Parse(ConfigurationManager.AppSettings["DataConnectionString"]);
var cloudQueueClient = storageAccount.CreateCloudQueueClient();
var queue = cloudQueueClient.GetQueueReference(Guid.NewGuid().ToString());
queue.CreateIfNotExist();
var w = new Stopwatch();
w.Start();
for (int i = 0; i < 50;i++ )
{
Console.WriteLine("nr {0}",i);
queue.AddMessage(new CloudQueueMessage("hello "+i));
}
w.Stop();
Console.WriteLine("elapsed: {0}", w.ElapsedMilliseconds);
queue.Delete();
Any idea how I can get better performance?
EDIT:
Based on Sandrino Di Mattia's answer I re-analyzed the code I've originally posted and found out that it was not complete enough to reproduce the error. In fact I had created a queue just before the call to ServicePointManager.UseNagleAlgorithm = false; The code to reproduce my problem looks more like this:
CloudStorageAccount storageAccount=CloudStorageAccount.Parse(ConfigurationManager.AppSettings["DataConnectionString"]);
var cloudQueueClient = storageAccount.CreateCloudQueueClient();
var queue = cloudQueueClient.GetQueueReference(Guid.NewGuid().ToString());
//ServicePointManager.UseNagleAlgorithm = false; // If you change the nagle algorithm here, the performance will be okay.
queue.CreateIfNotExist();
ServicePointManager.UseNagleAlgorithm = false; // TOO LATE, the queue is already created without 'nagle'
var w = new Stopwatch();
w.Start();
for (int i = 0; i < 50;i++ )
{
Console.WriteLine("nr {0}",i);
queue.AddMessage(new CloudQueueMessage("hello "+i));
}
w.Stop();
Console.WriteLine("elapsed: {0}", w.ElapsedMilliseconds);
queue.Delete();
The suggested solution from Sandrino to configure the ServicePointManager using the app.config file has the advantage that the ServicePointManager is initialized when the application starts up, so you don't have to worry about time dependencies.
I answered a similar question a few days ago: How to achive more 10 inserts per second with azure storage tables.
For adding 1000 items in table storage it took over 3 minutes, and with the changes I described in my answer it dropped to 4 seconds (250 requests/sec). In the end, table storage and storage queues aren't all that different. The backend is the same, data is simply stored in a different way. And both table storage and queues are exposed through a REST API, so if you improve the way you handle your requests, you'll get a better performance.
The most important changes:
expect100Continue: false
useNagleAlgorithm: false (you're already doing this)
Parallel requests combined with connectionManagement/maxconnection
Also, ServicePointManager.DefaultConnectionLimit should be increased before making a service point. Actually Sandrino's answer says the same thing but using config.
Turn off proxy detection even in the cloud. Auto detect in proxy config element. Slows initialisation.
Choose distributed partition keys.
Collocate your account near to compute, and customers.
Design to add more accounts as needed.
Microsoft set the SLA at 2,000 tps on queues and tables as of 07 2012.
I didn't read Sandrino's linked answer, sorry, just was on this question as I was watching Build 2012 session on exactly this.
When I initialize my client to connect to AppFabric's cache, it seems to inconsistently take up to 30 seconds to connect on the following line:
factory = new DataCacheFactory(configuration);
See full Init() code below - mostly taken from here.
I say inconsistently because sometimes it takes 1 second and other times 27, 28 , etc ... seconds. I have an asp.net site using the AppFabric cache - which lives on a different box (on the same domain). Everything is working great, except for the inconsistent connection time. When it connects, its all good - I just need to get it to consistently connect in ~1 second :) ... Thoughts?
public static void Init()
{
if (cache == null)
{
Stopwatch sw = new Stopwatch();
sw.Start();
try
{
//Define Array for 1 Cache Host
List<DataCacheServerEndpoint> servers = new List<DataCacheServerEndpoint>(1);
var appFabricHost = ConfigurationManager.AppSettings["AppFabricHost"];
var appFabricPort = ConfigurationManager.AppSettings["AppFabricPort"].ParseAs<int>();
//Specify Cache Host Details
// Parameter 1 = host name
// Parameter 2 = cache port number
servers.Add(new DataCacheServerEndpoint(appFabricHost, appFabricPort));
TraceHelper.TraceVerbose("Init", string.Format("Defined AppFabric - Host: {0}, Port: {1}", appFabricHost, appFabricPort));
//Create cache configuration
DataCacheFactoryConfiguration configuration = new DataCacheFactoryConfiguration();
//Set the cache host(s)
configuration.Servers = servers;
//Set default properties for local cache (local cache disabled)
configuration.LocalCacheProperties = new DataCacheLocalCacheProperties();
//Disable tracing to avoid informational/verbose messages on the web page
DataCacheClientLogManager.ChangeLogLevel(System.Diagnostics.TraceLevel.Off);
//Pass configuration settings to cacheFactory constructor
factory = new DataCacheFactory(configuration);
//Get reference to named cache
cache = factory.GetCache(cacheName);
TraceHelper.TraceVerbose("Init", "Defined AppFabric - CacheName: " + cacheName);
}
catch (Exception ex)
{
TraceHelper.TraceError("Init", ex);
}
finally
{
TraceHelper.TraceInfo("Init", string.Format("AppFabric init took {0} seconds", sw.Elapsed.Seconds));
}
if (cache == null)
{
TraceHelper.TraceError("Init", string.Format("First init cycle took {0} seconds and failed, retrying", sw.Elapsed.Seconds));
UrlShortener.Init(); // if at first you don't succeed, try try again ...
}
}
}
Is it any faster and/or more consistent if you keep all the configuration info in a .config file rather than creating your configuration programmatically? See here for details - I would always use this method as opposed to the programmatic configuration as it's much easier to update when something changes.
Otherwise I think the general advice is that DataCacheFactory is an expensive object to create due to what it does i.e. makes a network connection to each server in the cluster. You definitely don't want to be creating a DataCacheFactory every time you need to get something from the cache, instead you might want to think about creating it in Application_Start as perhaps a singleton and then reusing that one throughout your application (which, granted, doesn't solve the problem but it might serve to mitigate it).
Within ActiveMQ I've been told that the most optimal solution for increased throughput is to have multiple connections, each with their own session and consumer.
I've been trying to achieve this with NMS (Connecting via C#), but in the "Active Consumwers" screen of the MQ web console, I'm seeing all my connections & consumers listed as I'd expect to see them, but in the next column they all have a sessionId of "1". I would have expected there to be a separate session ID for each.
Is this right? And if there should be different session IDs for each connection/consumer, how would I go about ensuring these extra sessions are created?
Here's some example code I'm using to start a new connection (this is based on Remarks ActiveMQ transactional messaging introduction code):
public QueueConnection(IConnectionFactory connectionFactory, string queueName, AcknowledgementMode acknowledgementMode)
{
this.connection = connectionFactory.CreateConnection();
this.connection.Start();
this.session = this.connection.CreateSession(acknowledgementMode);
this.queue = new ActiveMQQueue(queueName);
}
... and this is being done each time for each of the connections I'm opening.
It looks this piece of code which is the root source of my problem:
public SimpleQueueListener CreateSimpleQueueListener(IMessageProcessor processor)
{
IMessageConsumer consumer = this.session.CreateConsumer(this.queue, "2 > 1");
return new SimpleQueueListener(consumer, processor, this.session);
}
As it's using a common session (this.session) for all the consumer. By creating a new session each time (and holding it in a collection or by other means), I have achieved the goal of operating each listener on its own session within the same connection. Eg:
public SimpleQueueListener CreateSimpleQueueListener(IMessageProcessor processor)
{
var listenerSession = this.connection.CreateSession();
IMessageConsumer consumer = listenerSession.CreateConsumer(this.queue, "2 > 1");
return new SimpleQueueListener(consumer, processor, listenerSession);
}