ActiveMQ - multiple connections per session? - c#

Within ActiveMQ I've been told that the most optimal solution for increased throughput is to have multiple connections, each with their own session and consumer.
I've been trying to achieve this with NMS (connecting via C#), but in the "Active Consumers" screen of the MQ web console I'm seeing all my connections and consumers listed as I'd expect to see them, yet in the next column they all have a session ID of "1". I would have expected a separate session ID for each.
Is this right? And if there should be different session IDs for each connection/consumer, how would I go about ensuring these extra sessions are created?
Here's some example code I'm using to start a new connection (this is based on Remark's ActiveMQ transactional messaging introduction code):
public QueueConnection(IConnectionFactory connectionFactory, string queueName, AcknowledgementMode acknowledgementMode)
{
    this.connection = connectionFactory.CreateConnection();
    this.connection.Start();
    this.session = this.connection.CreateSession(acknowledgementMode);
    this.queue = new ActiveMQQueue(queueName);
}
... and this is being done each time for each of the connections I'm opening.

It looks like this piece of code is the root cause of my problem:
public SimpleQueueListener CreateSimpleQueueListener(IMessageProcessor processor)
{
    IMessageConsumer consumer = this.session.CreateConsumer(this.queue, "2 > 1");
    return new SimpleQueueListener(consumer, processor, this.session);
}
As it's using a common session (this.session) for all the consumers. By creating a new session each time (and holding it in a collection or by other means), I have achieved the goal of operating each listener on its own session within the same connection. E.g.:
public SimpleQueueListener CreateSimpleQueueListener(IMessageProcessor processor)
{
    var listenerSession = this.connection.CreateSession();
    IMessageConsumer consumer = listenerSession.CreateConsumer(this.queue, "2 > 1");
    return new SimpleQueueListener(consumer, processor, listenerSession);
}

Related

ActiveMQ Queue Count Stops at 400

I am creating an application to connect to multiple ActiveMQ servers and get the total number of messages within their different queues.
I am using a slightly modified version of the code found in this link, ActiveMQ with C# and Apache NMS - Count messages in queue, to count the messages within the queue.
The problem I am having is that if the queue contains more than 400 messages this code stops counting at 400.
public static int GetMessageCount(string server, string user, string pw) {
    int messageCount = 0;
    var _server = $"activemq:ssl://{server}:61616?transport.acceptInvalidBrokerCert=true";
    IConnectionFactory factory = new NMSConnectionFactory(_server);
    using (IConnection connection = factory.CreateConnection(user, pw)) {
        connection.Start();
        using (ISession session = connection.CreateSession(AcknowledgementMode.AutoAcknowledge)) {
            IDestination requestDestination = session.GetQueue(QueueRequestUri);
            IQueueBrowser queueBrowser = session.CreateBrowser((IQueue)requestDestination);
            IEnumerator messages = queueBrowser.GetEnumerator();
            while (messages.MoveNext()) {
                IMessage message = (IMessage)messages.Current;
                messageCount++;
            }
            session.Close();
            connection.Close();
        }
    }
    return messageCount;
}
How do I get the actual number of messages in the queue?
Why does this happen?
Is this an issue with IEnumerator interface or is it an issue with the Apache.NMS.ActiveMQ API?
Normally there is no guarantee that the browser will return all messages from the queue. It provides a snapshot of the messages but may not return all of them. ActiveMQ has a limit for overhead reduction. You can increase the limits, see maxBrowsePageSize, however there is still no guarantee.
maxBrowsePageSize (default: 400) - the maximum number of messages to page in from the store at one time for a browser.
Those APIs are not designed for counting messages and you shouldn't do it. Just process the messages without counting them. If you want to get metrics then use some kind of admin libraries. JMX (yes I know you use C#) could be helpful as well.
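If you do need a queue depth without browsing, one alternative (my suggestion, not something the answer above prescribes) is the broker's statistics plugin: enable <statisticsBrokerPlugin/> in activemq.xml and the broker answers an empty message sent to ActiveMQ.Statistics.Destination.<queueName> with a MapMessage of metrics, including a "size" entry. A rough NMS sketch, with the destination prefix and field name taken from the plugin documentation (double-check them against your broker version):

// Query the ActiveMQ statistics plugin for the queue depth (requires the plugin to be enabled on the broker).
public static long GetQueueSize(ISession session, string queueName)
{
    IQueue statsQueue = session.GetQueue("ActiveMQ.Statistics.Destination." + queueName);
    ITemporaryQueue replyTo = session.CreateTemporaryQueue();
    using (IMessageProducer producer = session.CreateProducer(statsQueue))
    using (IMessageConsumer consumer = session.CreateConsumer(replyTo))
    {
        IMessage request = session.CreateMessage();
        request.NMSReplyTo = replyTo;
        producer.Send(request);
        var reply = consumer.Receive(TimeSpan.FromSeconds(5)) as IMapMessage;
        return reply != null ? reply.Body.GetLong("size") : -1; // -1 = no reply (plugin not enabled?)
    }
}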

StackExchange.Redis timeout on only 1 server

When a new box starts up (or presumably gets the app pool recycled), we're seeing a timeout error for every Redis request. What's interesting is that it happens on roughly 1 box in 30: that is, 30 boxes will boot up just fine and work (the actual call is a Redis lock call) for every 1 box that boots up in this faulty state. The example below shows 9k items in queue. The ConnectionMultiplexer is being initialized lazily, per the MS Azure recommendation (though we're not on Azure), and here's the call:
var db = m_dbFactory.GetDatabase();
bool gotLock = db.LockTake(key, value, m_redisLockConfig.RedisLockMaxAgeTimeSpan);
and we're using Ninject to get a singleton of that dbFactory injected in:
kernel.Bind<IRedisDatabaseFactory>().To<RedisDatabaseFactory>().InSingletonScope();
We've had to redeploy the code (recycling the app pool) to fix the issue, or kill the 1 bad box behind the load balancer. Has anyone come across this problem before? I see we have 9k items in queue that haven't been written to outbound network, following an azure troubleshooting link: https://azure.microsoft.com/en-us/blog/investigating-timeout-exceptions-in-stackexchange-redis-for-azure-redis-cache/
If the connection was not opened, however, I am specifically throwing an error from my redis db factory (which I'm not seeing in our logs). Here's the whole class to see the connectionmultiplexer initialization:
public class RedisDatabaseFactory : IRedisDatabaseFactory
{
    private readonly Lazy<IConnectionMultiplexer> m_lazyConnectionMultiplexer;

    public RedisDatabaseFactory(IRedisConfig redisConfig)
    {
        var endPoint = new DnsEndPoint(redisConfig.Host, redisConfig.Port);
        var configOptions = new ConfigurationOptions
        {
            EndPoints = { endPoint },
            Password = redisConfig.Password,
            ConnectTimeout = 5000,
            AbortOnConnectFail = false
        };
        m_lazyConnectionMultiplexer = new Lazy<IConnectionMultiplexer>(() =>
            ConnectionMultiplexer.Connect(configOptions));
    }

    private IConnectionMultiplexer Connection
    {
        get { return m_lazyConnectionMultiplexer.Value; }
    }

    /// <summary>
    /// Gets a connected redis database
    /// </summary>
    /// <exception cref="Exception"></exception>
    /// <returns>Connected redis database</returns>
    public IDatabase GetDatabase()
    {
        if (!Connection.IsConnected)
        {
            throw new Exception("Redis connection failure");
        }
        return Connection.GetDatabase();
    }
}
Here's the stack trace:
System.TimeoutException: Timeout performing SET mykey, inst: 0, mgr: ExecuteSelect, err: never, queue: 9058, qu: 9058, qs: 0, qc: 0, wr: 0, wq: 0, in: 0, ar: 0, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=1,Free=32766,Min=1,Max=32767), clientName: myclient
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisDatabase.StringSet(RedisKey key, RedisValue value, Nullable1 expiry, When when, CommandFlags flags)
at StackExchange.Redis.RedisDatabase.LockTake(RedisKey key, RedisValue value, TimeSpan expiry, CommandFlags flags)
I changed the name of my key, client name, and removed backticks.
This is really late, but we did eventually make a change that solved the problem. We upgraded to the latest StackExchange.Redis in case the issue was fixed by Marc Gravell and team, but we also made the following change:
m_lazyConnectionMultiplexer = new Lazy<IConnectionMultiplexer>(() =>
    ConnectionMultiplexer.Connect(configOptions), LazyThreadSafetyMode.PublicationOnly);
so that should the connection multiplexer initialize to a bad state, another would get initialized afterwards. After making those 2 changes, we never saw the issue again. I believe the issue was not actually in app pool recycle but in our process of tearing down and building up the boxes from Amazon Machine Image on a regular basis. When they were built back up, occasionally 1 was in a bad state. I wish I had pinpointed the fix, but that's what worked for us.
Two things jump out at me from your timeout error message.
Your "qu: 9058" number means that 9058 requests have been queued up locally but haven't yet been sent on the wire. This may mean that your system is taking too long to connect to Redis.
You should probably change your ThreadPool configuration as described here: https://gist.github.com/JonCole/e65411214030f0d823cb. You have 1 min thread for both IOCP and WORKER threads, which can cause problems during bursts of traffic, which is common for many apps during startup.
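As a rough illustration of that advice (the numbers are placeholders; see the gist for how to pick values for your workload):

// Run once at startup (e.g. Application_Start), before the app takes traffic.
int workerThreads, iocpThreads;
ThreadPool.GetMinThreads(out workerThreads, out iocpThreads);
// Raise the floor so traffic bursts don't wait ~500 ms for each new thread to be injected.
ThreadPool.SetMinThreads(Math.Max(workerThreads, 200), Math.Max(iocpThreads, 200));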
If that doesn't fix things for you, then you may want to monitor your client side CPU usage. If your client CPU is spiking up around 100%, then your system will just not have enough CPU to keep up with all the work you are trying to give it. Upgrade your client machine to something faster. The default Min Threads in the ThreadPool is 1 in your case, which often indicates that you have only 1 CPU Core, which may not be enough.

Rabbit MQ - Recovery of connection/channel/consumer

I am creating a consumer that runs in an infinite loop to read messages from the queue. I am looking for advice/sample code on how to recover and continue within my infinite loop even if there are network disruptions. The consumer has to stay running, as it will be installed as a Windows service.
1) Can someone please explain how to properly use these settings? What is the difference between them?
NetworkRecoveryInterval
AutomaticRecoveryEnabled
RequestedHeartbeat
2) Please see my current sample code for the consumer. I am using the .Net RabbitMQ Client v3.5.6.
How will the above settings do the "recovery" for me?
e.g. will consumer.Queue.Dequeue block until it is recovered?
That doesn't seem right, so...
Do I have to code for this manually? e.g. will consumer.Queue.Dequeue throw an exception for which I have to detect and manually re-create my connection, channel, and consumer? Or just the consumer, as "AutomaticRecovery" will recover the channel for me?
Does this mean I should move the consumer creation inside the while loop? What about the channel creation? And the connection creation?
3) Assuming I have to do some of this recovery code manually, are there event callbacks (and how do I register for them) to tell me that there are network problems?
Thanks!
public void StartConsumer(string queue)
{
    using (IModel channel = this.Connection.CreateModel())
    {
        var consumer = new QueueingBasicConsumer(channel);
        const bool noAck = false;
        channel.BasicConsume(queue, noAck, consumer);

        // do I need these conditions? or should I just do while(true)???
        while (channel.IsOpen &&
               Connection.IsOpen &&
               consumer.IsRunning)
        {
            try
            {
                BasicDeliverEventArgs item;
                if (consumer.Queue.Dequeue(Timeout, out item))
                {
                    string message = System.Text.Encoding.UTF8.GetString(item.Body);
                    DoSomethingMethod(message);
                    channel.BasicAck(item.DeliveryTag, false);
                }
            }
            catch (EndOfStreamException ex)
            {
                // this is likely due to some connection issue -- what am I to do?
            }
            catch (Exception ex)
            {
                // should never happen, but let's say my DoSomethingMethod(message); throws an exception
                // presumably, I'll just log the error and keep on going
            }
        }
    }
}
public IConnection Connection
{
    get
    {
        if (_connection == null) // _connection defined in class -- private static IConnection _connection;
        {
            _connection = CreateConnection();
        }
        return _connection;
    }
}
private IConnection CreateConnection()
{
    ConnectionFactory factory = new ConnectionFactory()
    {
        HostName = "RabbitMqHostName",
        UserName = "RabbitMqUserName",
        Password = "RabbitMqPassword",
    };

    // why do we need to set this explicitly? shouldn't this be the default?
    factory.AutomaticRecoveryEnabled = true;

    // what is a good value to use?
    factory.NetworkRecoveryInterval = TimeSpan.FromSeconds(5);

    // what is a good value to use? How is this different from NetworkRecoveryInterval?
    factory.RequestedHeartbeat = 5;

    IConnection connection = factory.CreateConnection();
    return connection;
}
RabbitMQ features
The documentation on RabbitMQ's site is actually really good. If you want to recover queues, exchanges and consumers, you're looking for topology recovery, which is enabled by default. Automatic recovery (also enabled by default) includes:
Reconnect
Restore connection listeners
Re-open channels
Restore channel listeners
Restore channel basic.qos setting, publisher confirms and transaction settings
The NetworkRecoveryInterval is the amount of time before a retry on an automatic recovery is performed (defaults to 5s).
Heartbeat has another purpose, namely to identify dead TCP connections. There is more to read about that on RabbitMQ's site.
Code sample
Writing reliable code for recovery is tricky. The EndOfStreamException is (as you suspect) most likely due to network problems. If you use the management plugin, you can reproduce this by closing the connection from there and see that the exception is triggered. For production-like applications, you might want to have a set of brokers that you alternate between in case of connection failure. If you have several RabbitMQ brokers, you might also want to guard yourself against long-term server failure on one or more of the servers. You might want to implement error strategies, like requeuing the message, or using a dead letter exchange.
I've been thinking a bit of these things and written a thin client, RawRabbit, that handles some of these things. Maybe it could be something for you? If not, I would suggest that you change the QueueingBasicConsumer to an EventingBasicConsumer. It is event driven, rather than thread blocking.
var eventConsumer = new EventingBasicConsumer(channel);
eventConsumer.Received += (sender, args) =>
{
    var body = args.Body;
    eventConsumer.Model.BasicAck(args.DeliveryTag, false);
};
channel.BasicConsume(queue, false, eventConsumer);
If you have topology recovery activated, the consumer will be restored by the RabbitMQ Client and start receiving messages again.
For more granular control, hook up event handlers for ConsumerCancelled and Shutdown to detect connectivity problems and Registered to know when the consumer can be used again.
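A minimal sketch of that last point (event names as exposed by the .NET client; the exact delegate signatures vary a little between client versions, and the connection variable is assumed to be your IConnection):

// Hook consumer- and connection-level events to observe connectivity problems.
eventConsumer.ConsumerCancelled += (sender, args) =>
    Console.WriteLine("Consumer cancelled (e.g. its queue was deleted)");
eventConsumer.Shutdown += (sender, args) =>
    Console.WriteLine("Consumer/channel shut down: {0}", args.ReplyText);
eventConsumer.Registered += (sender, args) =>
    Console.WriteLine("Consumer (re)registered - deliveries will resume");
connection.ConnectionShutdown += (sender, args) =>
    Console.WriteLine("Connection lost: {0}", args.ReplyText);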

C# console app to send email at scheduled times

I've got a C# console app running on Windows Server 2003 whose purpose is to read a table called Notifications and a field called "NotifyDateTime" and send an email when that time is reached. I have it scheduled via Task Scheduler to run hourly, check to see if the NotifyDateTime falls within that hour, and then send the notifications.
It seems like because I have the notification date/times in the database that there should be a better way than re-running this thing every hour.
Is there a lightweight process/console app I could leave running on the server that reads in the day's notifications from the table and issues them exactly when they're due?
I thought about a Windows service, but that seems like overkill.
My suggestion is to write a simple application that uses Quartz.NET.
Create 2 jobs:
The first fires once a day, reads all awaiting notification times planned for that day from the database, and creates triggers based on them.
The second, registered for those triggers (prepared by the first job), sends your notifications.
What's more,
I strongly advise you to create a Windows service for this purpose, rather than leaving a lone console application constantly running. It can be accidentally terminated by anyone who has access to the server under the same account. What's more, if the server is restarted you have to remember to start the application again manually, while a service can be configured to start automatically.
If you're using a web application you could host this logic within e.g. the IIS application pool process, although that is a bad idea. Such a process is periodically restarted by default, so you would have to change its default configuration to be sure it is still running in the middle of the night, when the application is not used. Otherwise your scheduled tasks will be terminated.
UPDATE (code samples):
Manager class, internal logic for scheduling and unscheduling jobs. For safety reasons implemented as a singleton:
internal class ScheduleManager
{
    private static readonly ScheduleManager _instance = new ScheduleManager();
    private readonly IScheduler _scheduler;

    private ScheduleManager()
    {
        var properties = new NameValueCollection();
        properties["quartz.scheduler.instanceName"] = "notifier";
        properties["quartz.threadPool.type"] = "Quartz.Simpl.SimpleThreadPool, Quartz";
        properties["quartz.threadPool.threadCount"] = "5";
        properties["quartz.threadPool.threadPriority"] = "Normal";

        var sf = new StdSchedulerFactory(properties);
        _scheduler = sf.GetScheduler();
        _scheduler.Start();
    }

    public static ScheduleManager Instance
    {
        get { return _instance; }
    }

    public void Schedule(IJobDetail job, ITrigger trigger)
    {
        _scheduler.ScheduleJob(job, trigger);
    }

    public void Unschedule(TriggerKey key)
    {
        _scheduler.UnscheduleJob(key);
    }
}
First job, for gathering required information from the database and scheduling notifications (second job):
internal class Setup : IJob
{
    public void Execute(IJobExecutionContext context)
    {
        try
        {
            foreach (var kvp in DbMock.ScheduleMap)
            {
                var email = kvp.Value;
                var notify = new JobDetailImpl(email, "emailgroup", typeof(Notify))
                {
                    JobDataMap = new JobDataMap { { "email", email } }
                };
                var time = new DateTimeOffset(DateTime.Parse(kvp.Key).ToUniversalTime());
                var trigger = new SimpleTriggerImpl(email, "emailtriggergroup", time);
                ScheduleManager.Instance.Schedule(notify, trigger);
            }
            Console.WriteLine("{0}: all jobs scheduled for today", DateTime.Now);
        }
        catch (Exception e) { /* log error */ }
    }
}
Second job, for sending emails:
internal class Notify : IJob
{
    public void Execute(IJobExecutionContext context)
    {
        try
        {
            var email = context.MergedJobDataMap.GetString("email");
            SendEmail(email);
            ScheduleManager.Instance.Unschedule(new TriggerKey(email));
        }
        catch (Exception e) { /* log error */ }
    }

    private void SendEmail(string email)
    {
        Console.WriteLine("{0}: sending email to {1}...", DateTime.Now, email);
    }
}
Database mock, just for purposes of this particular example:
internal class DbMock
{
    public static IDictionary<string, string> ScheduleMap =
        new Dictionary<string, string>
        {
            {"00:01", "foo#gmail.com"},
            {"00:02", "bar#yahoo.com"}
        };
}
Main entry of the application:
public class Program
{
    public static void Main()
    {
        FireStarter.Execute();
    }
}

public class FireStarter
{
    public static void Execute()
    {
        var setup = new JobDetailImpl("setup", "setupgroup", typeof(Setup));
        var midnight = new CronTriggerImpl("setuptrigger", "setuptriggergroup",
                                           "setup", "setupgroup",
                                           DateTime.UtcNow, null, "0 0 0 * * ?");
        ScheduleManager.Instance.Schedule(setup, midnight);
    }
}
If you're going to use a service, just put this main logic in the OnStart method (I advise starting the actual logic in a separate thread so the service doesn't block while starting, which also avoids possible timeouts - not an issue in this particular example obviously, but in general):
protected override void OnStart(string[] args)
{
    try
    {
        var thread = new Thread(x => WatchThread(new ThreadStart(FireStarter.Execute)));
        thread.Start();
    }
    catch (Exception e) { /* log error */ }
}
If so, encapsulate the logic in some wrapper, e.g. WatchThread, which will catch any errors from the thread:
private void WatchThread(object pointer)
{
    try
    {
        ((Delegate) pointer).DynamicInvoke();
    }
    catch (Exception e) { /* log error and stop service */ }
}
You are trying to implement a polling approach, where a job monitors a record in the database for any changes.
Since this hits the database periodically, if the one-hour delay is later reduced to, say, one minute, this solution turns into a performance bottleneck.
Approach 1
For this scenario, use a queue-based approach to avoid such issues; you can also scale up the number of instances if you are sending a lot of emails.
I understand there is a program that updates NotifyDateTime in a table; the same program can push a message to the queue saying there is a notification to handle.
A Windows service watches this queue for incoming messages and, when a message arrives, performs the required operation (i.e. sending the email).
Approach 2
http://msdn.microsoft.com/en-us/library/vstudio/zxsa8hkf(v=vs.100).aspx
You can also invoke C# code from a SQL Server stored procedure if you are using MS SQL Server, but in that case you are using your SQL Server process to send mail, which is not good practice.
However, you can invoke a web service or WCF service which can send the emails.
But Approach 1 is error free, scalable, trackable and asynchronous, and doesn't burden your database or app; you have a separate process to send the email.
Queues
Use MSMQ, which is part of Windows Server (a rough sketch follows below).
You can also try https://www.rabbitmq.com/dotnet.html
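A minimal MSMQ sketch of that flow (the queue path and payload format are assumptions; the sender runs wherever NotifyDateTime is written, the receiver runs inside the Windows service):

using System.Messaging; // reference System.Messaging.dll

public static class NotificationQueue
{
    private const string Path = @".\Private$\notifications"; // assumed private queue

    public static void Enqueue(int notificationId)
    {
        if (!MessageQueue.Exists(Path))
            MessageQueue.Create(Path);
        using (var queue = new MessageQueue(Path))
            queue.Send(notificationId.ToString()); // payload format is up to you
    }

    public static string DequeueBlocking()
    {
        using (var queue = new MessageQueue(Path))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            Message message = queue.Receive(); // blocks; use a timeout or async receive in real code
            return (string)message.Body;       // e.g. the notification id to send an email for
        }
    }
}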
Pre-scheduled tasks (at undefined times) are generally a pain to handle, as opposed to scheduled tasks where Quartz.NET seems well suited.
Furthermore, another distinction is to be made between fire-and-forget for tasks that shouldn't be interrupted/change (ex. retries, notifications) and tasks that need to be actively managed (ex. campaign or communications).
For the fire-and-forget type tasks a message queue is well suited. If the destination is unreliable, you will have to opt for retry levels (ex. try send (max twice), retry after 5 minutes, try send (max twice), retry after 15 minutes) that at least require specifying message-specific TTLs with a send and retry queue. Here's an explanation with a link to code to set up a retry level queue.
The managed pre-scheduled tasks will require that you use a database queue approach (click here for a CodeProject article on designing a database queue for scheduled tasks). This will allow you to update, remove or reschedule notifications, provided you keep track of ownership identifiers (ex. specify a user id so you can delete all pending notifications when the user should no longer receive them, such as being deceased/unsubscribed).
Scheduled e-mail tasks (including any communication tasks) require finer grained control (expiration, retry and time-out mechanisms). The best approach to take here is to build a state machine that is able to process the e-mail task through its steps (expiration, pre-validation, pre-mailing steps such as templating, inlining css, making links absolute, adding tracking objects for open tracking, shortening links for click tracking, post-validation and sending and retrying).
Hopefully you are aware that the .NET SmtpClient isn't fully compliant with the MIME specifications and that you should be using a SaaS e-mail provider such as Amazon SES, Mandrill, Mailgun, Customer.io or Sendgrid. I'd suggest you look at Mandrill or Mailgun. Also, if you have some time, take a look at MimeKit, which you can use to construct MIME messages for providers that allow sending raw e-mail but don't necessarily support things like attachments/custom headers/DKIM signing.
I hope this sets you on the right path.
Edit
You will have to use a service to poll at specific intervals (ex. 15 seconds or 1 minute). The database load can be somewhat reduced by checking out a certain number of due tasks at a time and keeping an internal pool of messages due for sending (with a time-out mechanism in place). When no messages are returned, just 'sleep' the polling for a while. I would advise against building such a system against a single table in a database - instead design an independent e-mail scheduling system that you can integrate with.
I would turn it into a service instead.
You can use a System.Threading.Timer callback for each of the scheduled times.
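A minimal sketch of that idea (SendNotification is hypothetical, and the returned timers must be kept referenced so they aren't garbage collected before they fire):

using System;
using System.Threading;

// One single-shot Timer per notification row read from the table.
static Timer ScheduleNotification(DateTime dueUtc, string recipient)
{
    TimeSpan delay = dueUtc - DateTime.UtcNow;
    if (delay < TimeSpan.Zero)
        delay = TimeSpan.Zero; // already overdue: fire immediately

    // Timeout.InfiniteTimeSpan = fire once, no repeat period.
    return new Timer(_ => SendNotification(recipient), null, delay, Timeout.InfiniteTimeSpan);
}

static void SendNotification(string recipient) { /* send the email */ }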
Scheduled tasks can be scheduled to run just once at a specific time (as opposed to hourly, daily, etc.), so one option would be to create the scheduled task when the specific field in your database changes.
You don't mention which database you use, but some databases support the notion of a trigger, e.g. in SQL: http://technet.microsoft.com/en-us/library/ms189799.aspx
If you know when the emails need to be sent ahead of time then I suggest that you use a wait on an event handle with the appropriate timeout. At midnight look at the table then wait on an event handle with the timeout set to expire when the next email needs to be sent. After sending the email wait again with the timeout set based on the next mail that should be sent.
Also, based on your description, this should probably be implemented as a service but it is not required.
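A rough sketch of that wait-with-timeout loop (GetNextDueTimeUtc and SendDueNotifications are placeholders for your table access; ScheduleChanged is signalled by whatever code inserts or changes notifications):

static readonly AutoResetEvent ScheduleChanged = new AutoResetEvent(false);

static void RunLoop()
{
    while (true)
    {
        DateTime? nextDueUtc = GetNextDueTimeUtc();   // hypothetical DB lookup
        TimeSpan wait = nextDueUtc.HasValue
            ? nextDueUtc.Value - DateTime.UtcNow
            : TimeSpan.FromMinutes(30);               // nothing scheduled: re-check later

        if (wait > TimeSpan.Zero)
            ScheduleChanged.WaitOne(wait);            // wakes early if Set() is called

        SendDueNotifications();                       // hypothetical: send anything that is now due
    }
}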
I was dealing with the same problem about three years ago. I changed the process several times before it was good enough; let me tell you why:
The first implementation used a special daemon from the web host which called an IIS website. The website checked the caller IP, then checked the database and sent the emails. This worked until one day I got a lot of very angry emails from users whose mailboxes I had totally spammed. The drawback of keeping the email in the database and sending it over SMTP is that there is NOTHING that ensures a DB-to-SMTP transaction. You are never sure whether the email was actually sent. Sending can succeed, fail, or be a false positive or a false negative (the SMTP client tells you the email was not sent, but it was). There was a problem with the SMTP server where it returned false (email not sent) even though the email was sent, and the daemon kept resending the email every hour for the whole day before the angry emails appeared.
Second implementation: to prevent spamming, I changed the algorithm so that the email is considered sent even if sending failed (my email notifications were not too important). My first piece of advice: "Don't launch the daemon too often, because this false-negative SMTP error makes users upset."
After several months there were some changes on the server and the daemon stopped working well. I got an idea from Stack Overflow: bind a .NET timer to the web application domain. It wasn't a good idea, because it turns out IIS can restart the application from time to time because of memory leaks, and the timer never fires if the restarts happen more often than the timer ticks.
The last implementation: every hour the Windows scheduler fires a Python batch script which reads a local website, which in turn fires the ASP.NET code. The advantage is that the Windows scheduler calls the local batch and the website reliably, and IIS doesn't hang - it can restart itself. The timer site is part of my website, so it is still one project (you could use a console app instead). Simple is better. It just works!
Your first choice is the correct option in my opinion. Task Scheduler is the MS-recommended way to perform periodic jobs. Moreover it's flexible, can report failures to ops, is optimized and amortized amongst all tasks in the system, ...
Creating any console-style app that runs all the time is fragile. It can be shut down by anyone, needs an open session, doesn't restart automatically, ...
The other option is creating some kind of service. It's guaranteed to be running all the time, so that would at least work. But what was your motivation?
"It seems like because I have the notification date/times in the database that there should be a better way than re-running this thing every hour."
Oh yeah, optimization... So you want to add a new permanently running service to your computer so that you avoid one potentially unneeded SQL query every hour? The cure looks worse than the disease to me.
And I didn't mention all the drawbacks of the service. On one hand, your task uses no resource when it doesn't run. It's very simple, lightweight and the query efficient (provided you have the right index).
On the other hand, if your service crashes it's probably gone for good. It needs a way to be notified of new e-mails that may need to be sent earlier than what's currently scheduled. It permanently uses computer resources, such as memory. Worse, it may contain memory leaks.
I think that the cost/benefit ratio is very low for any solution other than the trivial periodic task.

.NET best practices for MongoDB connections?

I've been playing with MongoDB recently (It's AMAZINGLY FAST) using the C# driver on GitHub. Everything is working just fine in my little single threaded console app that I'm testing with. I'm able to add 1,000,000 documents (yes, million) in under 8 seconds running single threaded. I only get this performance if I use the connection outside the scope of a for loop. In other words, I'm keeping the connection open for each insert rather than connecting for each insert. Obviously that's contrived.
I thought I'd crank it up a notch to see how it works with multiple threads. I'm doing this because I need to simulate a website with multiple concurrent requests. I'm spinning up between 15 and 50 threads, still inserting a total of 150,000 documents in all cases. If I just let the threads run, each creating a new connection for each insert operation, the performance grinds to a halt.
Obviously I need to find a way to share, lock, or pool the connection. Therein lies the question. What's the best practice in terms of connecting to MongoDB? Should the connection be kept open for the life of the app (there is substantial latency opening and closing the TCP connection for each operation)?
Does anyone have any real world or production experience with MongoDB, and specifically the underlying connection?
Here is my threading sample using a static connection that's locked for insert operations. Please offer suggestions that would maximize performance and reliability in a web context!
private static Mongo _mongo;

private static void RunMongoThreaded()
{
    _mongo = new Mongo();
    _mongo.Connect();
    var threadFinishEvents = new List<EventWaitHandle>();
    for (var i = 0; i < 50; i++)
    {
        var threadFinish = new EventWaitHandle(false, EventResetMode.ManualReset);
        threadFinishEvents.Add(threadFinish);
        var thread = new Thread(delegate()
        {
            RunMongoThread();
            threadFinish.Set();
        });
        thread.Start();
    }
    WaitHandle.WaitAll(threadFinishEvents.ToArray());
    _mongo.Disconnect();
}

private static void RunMongoThread()
{
    for (var i = 0; i < 3000; i++)
    {
        var db = _mongo.getDB("Sample");
        var collection = db.GetCollection("Users");
        var user = GetUser(i);
        var document = new Document();
        document["FirstName"] = user.FirstName;
        document["LastName"] = user.LastName;
        lock (_mongo) // Lock the connection - not ideal for threading, but safe and seemingly fast
        {
            collection.Insert(document);
        }
    }
}
Most answers here are outdated and no longer applicable, as the .NET driver has matured and had numerous features added.
Looking at the documentation of the new 2.0 driver found here:
http://mongodb.github.io/mongo-csharp-driver/2.0/reference/driver/connecting/
The .NET driver is now thread safe and handles connection pooling. According to the documentation:
It is recommended to store a MongoClient instance in a global place, either as a static variable or in an IoC container with a singleton lifetime.
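A minimal sketch of that recommendation with the 2.0 driver (connection string, database and collection names are placeholders):

using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class Mongo
{
    // One MongoClient per process; it is thread safe and pools connections internally.
    public static readonly MongoClient Client =
        new MongoClient("mongodb://localhost:27017"); // placeholder connection string
}

public static class UserStore
{
    public static Task InsertUserAsync(string first, string last)
    {
        var collection = Mongo.Client.GetDatabase("Sample").GetCollection<BsonDocument>("Users");
        return collection.InsertOneAsync(new BsonDocument { { "FirstName", first }, { "LastName", last } });
    }
}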
The thing to remember about a static connection is that it's shared among all your threads. What you want is one connection per thread.
When using mongodb-csharp you treat it like you would an ADO connection.
When you create a Mongo object it borrows a connection from the pool, which it owns until it is disposed. So after the using block the connection is returned to the pool.
Creating Mongo objects is cheap and fast.
Example
for (var i = 0; i < 100; i++)
{
    using (var mongo1 = new Mongo())
    using (var mongo2 = new Mongo())
    {
        mongo1.Connect();
        mongo2.Connect();
    }
}
Database Log
Wed Jun 02 20:54:21 connection accepted from 127.0.0.1:58214 #1
Wed Jun 02 20:54:21 connection accepted from 127.0.0.1:58215 #2
Wed Jun 02 20:54:21 MessagingPort recv() errno:0 No error 127.0.0.1:58214
Wed Jun 02 20:54:21 end connection 127.0.0.1:58214
Wed Jun 02 20:54:21 MessagingPort recv() errno:0 No error 127.0.0.1:58215
Wed Jun 02 20:54:21 end connection 127.0.0.1:58215
Notice it only opened 2 connections.
I put this together using the mongodb-csharp forum.
http://groups.google.com/group/mongodb-csharp/browse_thread/thread/867fa78d726b1d4
Somewhat dated, but still of interest, is CSMongo, a C# driver for MongoDB created by the developer of jLinq. Here's a sample:
//create a database instance
using (MongoDatabase database = new MongoDatabase(connectionString)) {

    //create a new document to add
    MongoDocument document = new MongoDocument(new {
        name = "Hugo",
        age = 30,
        admin = false
    });

    //create entire objects with anonymous types
    document += new {
        admin = true,
        website = "http://www.hugoware.net",
        settings = new {
            color = "orange",
            highlight = "yellow",
            background = "abstract.jpg"
        }
    };

    //remove fields entirely
    document -= "languages";
    document -= new[] { "website", "settings.highlight" };

    //or even attach other documents
    MongoDocument stuff = new MongoDocument(new {
        computers = new [] {
            "Dell XPS",
            "Sony VAIO",
            "Macbook Pro"
        }
    });
    document += stuff;

    //insert the document immediately
    database.Insert("users", document);
}
Connection Pool should be your answer.
The feature is being developed (please see http://jira.mongodb.org/browse/CSHARP-9 for more detail).
Right now, for a web application, the best practice is to connect in BeginRequest and release the connection in EndRequest. But to me that operation seems too expensive for each request without a connection pool, so I decided to have a global Mongo object and use it as a shared resource for all threads (if you get the latest C# driver from GitHub right now, they have also improved concurrency performance a bit).
I don't know the disadvantages of using a global Mongo object, so let's wait for another expert to comment on this.
But I think I can live with it until the feature (connection pooling) has been completed.
I am using the csharp-mongodb driver and its connection pool doesn't help me :( I have about 10-20 requests to MongoDB per web request (150 users online on average), and I can't even monitor statistics or connect to MongoDB from the shell - it throws an exception.
I have created a repository which opens and disposes a connection per request. I relied on the following:
1) The driver has a connection pool.
2) From my research (I have posted some questions in user groups about this), I understood that creating a Mongo object and opening a connection is not a heavy operation.
But today my production went down :(
Maybe I have to keep an open connection per request...
Here is a link to the user group thread: http://groups.google.com/group/mongodb-user/browse_thread/thread/3d4a4e6c5eb48be3#
