What is the best way to measure inserts per second in MongoDB? - c#

I have a multithreaded (using a threadpool) C# program that reads from a text file containing logs and batch inserts them into a MongoDB collection. I want a consistent and precise way to measure how long it takes to insert the whole file into the collection.
I can't really call Thread.Join (because it's a thread pool), and I can't use a single Stopwatch because the inserts run on separate threads.
What's the next best thing?
The current way I'm doing it is with the timer on my smartphone: I repeatedly call db.collection.stats() and wait until the count matches the number of logs in the file...

If you're using .NET 4.0+, I'd recommend the CountdownEvent class. You create an instance with the number of logs as the initial count:
var countdown = new CountdownEvent(numberOfLogs);
Then, each time you complete a write to MongoDB, you signal from the worker thread:
countdown.Signal(); // decrement counter
And then, in your main process (or another thread):
countdown.Wait(); // returns when the count is zero
// All writes complete
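
Putting those pieces together, here is a minimal sketch of timing the whole run this way; the actual MongoDB write is left as a placeholder comment, and numberOfLogs is assumed to be known from the file:

using System;
using System.Diagnostics;
using System.Threading;

class InsertTimer
{
    static void Main()
    {
        int numberOfLogs = 100000;            // assumed to be known up front
        var countdown = new CountdownEvent(numberOfLogs);
        var stopwatch = Stopwatch.StartNew(); // started before any work is queued

        for (int i = 0; i < numberOfLogs; i++)
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                // ... perform the MongoDB insert for this log entry here ...
                countdown.Signal();           // decrement once the write completes
            });
        }

        countdown.Wait();                     // blocks until every write has signaled
        stopwatch.Stop();

        Console.WriteLine("Inserted {0} logs in {1:F1} s ({2:F0} inserts/s)",
            numberOfLogs,
            stopwatch.Elapsed.TotalSeconds,
            numberOfLogs / stopwatch.Elapsed.TotalSeconds);
    }
}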

With mongostat (a command-line tool) you can see exactly what is going on in the MongoDB server. It reports inserts, queries, etc. per second. It won't automatically stop when the import is done, but it will definitely give you insight into performance; the "inserts" column drops to 0 once you're done importing.

Related

How to move a thread or process to another computer

I am trying to serialize a thread (or Process) to a file and execute the thread on a different machine at some other time.
Actually what I have is something like this:
for (BigInteger i = 0; i < ABigIntegerVariable; i++)
{
// My Calculation
}
I want to suspend the computation and save its state, and resume it later with the saved state, possibly on a different machine.
Note: I can't just save the data when the program closes, because the state contains objects, and it doesn't seem straightforward to serialize them.
Thank you
Can't you just save your current loop iterator value and whatever the calculation state is at the moment you want to "move" it? It depends, of course, on what exactly is happening inside that loop, but maybe even crude serialization after each iteration would be enough for you to resume at the new location.
Of course, your loop would have to start from the saved data rather than from i = 0, but as I said, you didn't share any details about what goes on in // My Calculation, so either add more detail to the question or work that part out on your own.
Also, as per comment from Sidewinder94, there's no problem with serialization of objects unless you are doing it wrong.
One additional thought: are those calculations dependent on your loop iterator or result of previous loop(s)? Because if not you could just split them into multiple threads/tasks and take advantage of parallel calculations.
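A rough sketch of the checkpoint idea, assuming the per-iteration state can be reduced to something easily serializable (here just the iterator and a running result written to a hypothetical state.txt; the real calculation would replace the result += i line):

using System;
using System.IO;
using System.Numerics;

class CheckpointedLoop
{
    const string StateFile = "state.txt";    // hypothetical checkpoint file

    static void Main()
    {
        BigInteger limit = BigInteger.Parse("1000000000");
        BigInteger i = 0, result = 0;

        // Resume from the last checkpoint if one exists.
        if (File.Exists(StateFile))
        {
            var parts = File.ReadAllText(StateFile).Split(';');
            i = BigInteger.Parse(parts[0]);
            result = BigInteger.Parse(parts[1]);
        }

        for (; i < limit; i++)
        {
            // Persist the loop state every so often so the run can be
            // stopped and resumed later, possibly on another machine.
            if (i % 100000 == 0)
                File.WriteAllText(StateFile, i + ";" + result);

            result += i;                     // stand-in for "My Calculation"
        }

        Console.WriteLine(result);
    }
}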

Which way is better, and how to pass parameters to windows.forms.timer

What I want to accomplish is the following:
I have a lot of checks (if / else if, etc.) inside a timer whose interval is 1000 ms. There is a text file that gets updated, and the timer reads it every 1000 ms and checks it for some specific changes.
When one of those conditions is true, I need to wait 10 seconds, then read another text file, and then continue with the rest of the timer code. In the meantime the timer should keep running during those 10 seconds and perform the checks every second for all the other conditions, and for this one as well.
What I thought of doing is this: if the condition I'm interested in is true, I start a new timer with a 10-second interval that continues with the code for that specific part. What I'm having a hard time with is how to pass parameters into that timer, something like
newTimer.Start(int parameterA, string parameterB, List<string> parametersC)
etc.
Or, if you have any other idea, I'd be glad to hear it.
To pass parameters, you can always use Tuple.
newTimer.Start(Tuple.Create(param1, param2, param3));
You might not need two timers at all if you structure the logic correctly (see the sketch after the pseudocode below):
Timer *run every second*
Check file
If file has flag Then save in variable the current date + 10 seconds
If current date > saved date Then Check the other file
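A sketch of that single-timer logic inside the form; FirstFileHasFlag and ReadSecondFile are hypothetical placeholders for the real file checks:

using System;
using System.Windows.Forms;

public class WatcherForm : Form
{
    // One timer ticking every second; readSecondFileAt is the "saved date" above.
    private readonly Timer timer = new Timer { Interval = 1000 };
    private DateTime? readSecondFileAt;

    public WatcherForm()
    {
        timer.Tick += (s, e) =>
        {
            // The usual checks against the first file run on every tick.
            if (FirstFileHasFlag() && readSecondFileAt == null)
                readSecondFileAt = DateTime.Now.AddSeconds(10);

            // Ten seconds after the flag was seen, read the second file and reset.
            if (readSecondFileAt != null && DateTime.Now >= readSecondFileAt)
            {
                ReadSecondFile();
                readSecondFileAt = null;
            }
        };
        timer.Start();
    }

    private bool FirstFileHasFlag() { /* read and check the first text file */ return false; }
    private void ReadSecondFile()   { /* read the second text file */ }

    [STAThread]
    static void Main() { Application.Run(new WatcherForm()); }
}

Because everything stays on one timer, there are no parameters to pass; the state lives in the readSecondFileAt field.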
You need to consider that a System.Windows.Forms.Timer runs on the GUI thread (the thread that runs the containing form's message loop / pump), as it uses window messages (WM_TIMER) that are processed by the GUI thread. Because of that, any code that runs under any System.Windows.Forms.Timer in your form executes synchronously. So, if you start a new System.Windows.Forms.Timer that blocks for 10 seconds, all your other timers will be blocked as well (since the message loop is blocked).
Consider using a System.Threading.Timer or System.Timers.Timer instead, as they run the timer callback on a thread-pool thread. This way, your code will run pretty much as you'd want: the method runs every second, regardless of whether the previous call has completed or is still blocking (waiting 10 seconds).
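For comparison, a minimal System.Timers.Timer sketch (the console output is just a stand-in for your checks); the Elapsed handler runs on a thread-pool thread, so one slow tick does not hold up the next:

using System;

class Program
{
    static void Main()
    {
        var timer = new System.Timers.Timer(1000);   // 1-second interval
        timer.Elapsed += (s, e) =>
        {
            // Runs on a thread-pool thread, not the GUI thread,
            // so later ticks keep firing even if this one blocks.
            Console.WriteLine("tick at {0:HH:mm:ss.fff}", e.SignalTime);
        };
        timer.AutoReset = true;
        timer.Start();

        Console.ReadLine();                          // keep the process alive
    }
}

If the handler needs to touch form controls, marshal back to the GUI thread (for example via Control.Invoke, or by setting the timer's SynchronizingObject), since the callback does not run on it.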

Count the threads ended per second

I have some code that spawns a few threads and invokes a Run() method on them.
What I want to do is embed a timer into the loop that starts the threads and count how many threads have completed.
The way I want to do that is by adding an entry to a List every time an IStuff is run, then counting how many elements are in the list every second, producing a per-second result.
Not sure if I'm along the correct lines, but please suggest ways of doing that.
All you need is a variable in a shared scope, say int completionsPerSecond = 0, and the last thing your Run() method should do is increment it by 1.
Then you'd have a timer that, every second, copies the value of completionsPerSecond to report it, and then resets completionsPerSecond to 0.
Now every second, you'll know how many finished in the previous second.
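A minimal sketch of that idea; Interlocked is used so increments from many threads don't race, and the Thread.Sleep is just a placeholder for the real Run() work:

using System;
using System.Threading;

class ThroughputCounter
{
    static int completionsThisSecond = 0;

    static void Run()
    {
        Thread.Sleep(100);                            // placeholder for the real work
        Interlocked.Increment(ref completionsThisSecond);
    }

    static void Main()
    {
        // Report and reset the counter once per second.
        var reporter = new Timer(_ =>
        {
            int finished = Interlocked.Exchange(ref completionsThisSecond, 0);
            Console.WriteLine("{0} threads finished in the last second", finished);
        }, null, 1000, 1000);

        for (int i = 0; i < 100; i++)
            new Thread(Run).Start();

        Console.ReadLine();
        reporter.Dispose();
    }
}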

.net section running real slow

Update: The answers from Andrew and Conrad were both equally helpful. The easy fix for the timing issue fixed the problem, and caching the bigger object references instead of re-building them every time removed the source of the problem. Thanks for the input, guys.
I'm working with a C# .NET API, and for some reason the following code executes, as far as I can tell, extremely slowly.
This is the handler for a System.Timers.Timer that raises its Elapsed event every 5 seconds.
private static void TimerGo(object source, System.Timers.ElapsedEventArgs e)
{
tagList = reader.GetData(); // This is a collection of 10 objects.
storeData(tagList); // This calls the 'storeData' method below
}
And the storeData method:
private static void storeData(List<obj> tagList)
{
TimeSpan t = (DateTime.UtcNow - new DateTime(1970, 1, 1));
long timestamp = (long)t.TotalSeconds;
foreach (var tag in tagList)
{
string file = @"path\to\file" + tag.name + ".rrd";
RRD dbase = RRD.load(file);
// Update rrd with current time timestamp and data.
dbase.update(timestamp, new object[1] { tag.data });
}
}
Am I missing some glaring resource sink? The RRD stuff you see is from the NHawk C# wrapper for rrdtool; in this case I update 10 different files with it, but I see no reason why it should take so long.
When I say 'so long', I mean the timer was triggering a second time before the first update was done, so eventually "update 2" would happen before "update 1", which breaks things because "update 1" has a timestamp that's earlier than "update 2".
I increased the timer length to 10 seconds, and it ran for longer, but still eventually out-raced itself and tried to update a file with an earlier timestamp. What can I do differently to make this more efficient, because obviously I'm doing something drastically wrong...
This doesn't really answer your perf question, but if you want to fix the re-entrancy issue, set timer.AutoReset to false and then call Start() at the end of the handler, e.g.
private static void TimerGo(object source, System.Timers.ElapsedEventArgs e)
{
tagList = reader.GetData(); // This is a collection of 10 objects.
storeData(tagList); // This calls the 'storeData' method below
timer.Start();
}
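This assumes the timer itself was created with AutoReset turned off, roughly like this (field and method names are assumptions):

private static System.Timers.Timer timer;

private static void SetupTimer()
{
    timer = new System.Timers.Timer(5000);   // 5-second interval
    timer.AutoReset = false;                 // fire once; TimerGo restarts it when done
    timer.Elapsed += TimerGo;
    timer.Start();
}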
Is there a different RRD file for each tag in your tagList? In your pseudo-code you open each file N times (you stated there are only 10 objects in the list, though), then you perform an update. I can only assume that you dispose of your RRD file after you have updated it; if you do not, you are keeping references to an open file.
If the RRD is the same but you are just putting different types of plot data into a single file then you only need to keep it open for as long as you want exclusive write access to it.
Without profiling the code you have a few options (I recommend profiling btw)
Keep the RRD files open
Cache the opened files so you don't have to open, write, and close each file every 5 seconds. Just cache the 10 opened file references and write to them every 5 seconds.
Separate the data collection from data writing
It appears you are taking metric samples from some object every 5 seconds. If you do not have something 'tailing' your files, separate the collection from the writing: take your data sample and throw it into a queue to be processed. The processor will dequeue each tagList and write it as fast as it can, going back to the queue for more lists.
This way you can always be sure you are getting ~5 second samples even if the writing mechanism is slowed down.
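A sketch of that split, assuming .NET 4.0 is available, using a BlockingCollection<T> as the queue; the timer handler only enqueues and returns, and a single long-running consumer does the slow RRD writes (WriteToRrd stands in for the loop in storeData):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class SampleWriter
{
    // Samples are produced by the timer and consumed by one background writer.
    static readonly BlockingCollection<List<object>> queue = new BlockingCollection<List<object>>();

    static void StartWriter()
    {
        Task.Factory.StartNew(() =>
        {
            foreach (var tagList in queue.GetConsumingEnumerable())
                WriteToRrd(tagList);          // the slow RRD update loop runs here
        }, TaskCreationOptions.LongRunning);
    }

    // Called from the 5-second timer: just enqueue the sample and return.
    static void OnSample(List<object> tagList)
    {
        queue.Add(tagList);
    }

    static void WriteToRrd(List<object> tagList) { /* open/update/close the RRD files */ }
}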
Use a profiler. JetBrains is my personal recommendation. Run the profiler with your program and look for the threads / methods taking the longest time to run. This sounds very much like an IO or data issue, but that's not immediately obvious from your example code.

Parallel programming in C#

I'm interested in learning about parallel programming in C#.NET (not everything there is to know, but the basics and maybe some good practices), so I've decided to reprogram an old program of mine called ImageSyncer. ImageSyncer is a really simple program; all it does is scan through a folder and find all files ending with .jpg, then it calculates the new position of the files based on the date they were taken (parsing of Exif data, or whatever it's called). After a location has been generated the program checks for any existing file at that location, and if one exists it looks at the last write time of both the file to copy and the file "in its way". If those are equal the file is skipped. If not, an MD5 checksum of both files is created and compared. If there is no match the file to be copied is given a new location to be copied to (for instance, if it was to be copied to "C:\test.jpg" it's copied to "C:\test(1).jpg" instead). The result of this operation is put into a queue of a struct type that contains two strings: the original file and the position to copy it to. Then that queue is iterated over until it is empty and the files are copied.
In other words there are 4 operations:
1. Scan directory for jpegs
2. Parse files for xif and generate copy-location
3. Check for file existence and if needed generate new path
4. Copy files
And so I want to rewrite this program to make it parallel and be able to perform several of the operations at the same time, and I was wondering what the best way to achieve that would be. I've come up with two different models, but neither of them might be any good at all. The first one is to parallelize the 4 steps of the old program, so that when step 1 is executed it's done on several threads, and when all of step 1 is finished step 2 begins. The other one (which I find more interesting, because I have no idea how to do it) is to create a sort of worker and consumer model, so that when a thread is finished with step 1 another one takes over and performs step 2 on that object (or something like that). But as I said, I don't know if either of these is a good solution. Also, I don't know much about parallel programming at all. I know how to make a thread and how to make it perform a function taking an object as its only parameter, and I've also used the BackgroundWorker class on one occasion, but I'm not that familiar with any of them.
Any input would be appreciated.
There are a few options:
Parallel LINQ: Running Queries On Multi-Core Processors
Task Parallel Library (TPL): Optimize Managed Code For Multi-Core Machines
If you are interested in basic threading primitives and concepts: Threading in C#
[But as @John Knoeller pointed out, the example you gave is likely to be bound by sequential I/O.]
This is the reference I use for C# thread: http://www.albahari.com/threading/
As a single PDF: http://www.albahari.com/threading/threading.pdf
For your second approach:
I've worked on some producer/consumer multithreaded apps where each task is some code that loops forever. An external "initializer" starts a separate thread for each task and initializes an EventWaitHandle for each task. For each task there is a global queue that can be used to produce/consume input.
In your case, your external program would add each directory to the queue for Task 1 and set the EventWaitHandle for Task 1. Task 1 would "wake up" from its EventWaitHandle, get the count of directories in its queue, and then, while the count is greater than 0, get a directory from the queue, scan it for all the .jpgs, add each .jpg location to a second queue, and set the EventWaitHandle for Task 2. Task 2 reads its input, processes it, and forwards it to a queue for Task 3...
It can be a bit of a pain getting all the locking to work right (I basically lock any access to the queue, even something as simple as getting its count). .NET 4.0 is supposed to have data structures that will automatically support a producer/consumer queue with no locks.
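On .NET 4.0 those structures are BlockingCollection<T> and ConcurrentQueue<T>; a rough sketch of the Task 1 → Task 2 hand-off using them (the directory and the console output are just placeholders):

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class Pipeline
{
    static void Main()
    {
        var directories = new BlockingCollection<string>();
        var jpegFiles   = new BlockingCollection<string>();

        // Task 1: scan each directory and forward the .jpg paths.
        var scanner = Task.Factory.StartNew(() =>
        {
            foreach (var dir in directories.GetConsumingEnumerable())
                foreach (var jpg in Directory.EnumerateFiles(dir, "*.jpg"))
                    jpegFiles.Add(jpg);
            jpegFiles.CompleteAdding();            // tell Task 2 there is no more input
        });

        // Task 2: process each file (parse Exif, compute the copy location, ...).
        var processor = Task.Factory.StartNew(() =>
        {
            foreach (var jpg in jpegFiles.GetConsumingEnumerable())
                Console.WriteLine("processing " + jpg);
        });

        directories.Add(@"C:\Photos");             // the "external initializer"
        directories.CompleteAdding();

        Task.WaitAll(scanner, processor);
    }
}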
Interesting problem.
I came up with two approaches. The first is based on PLINQ and the second is based on the Rx Framework.
The first one iterates through the files in parallel.
The second one generates asynchronously the files from the directory.
Here is what it looks like in a much simplified version (the first method requires .NET 4.0, since it uses PLINQ):
string directory = "Mydirectory";
var jpegFiles = System.IO.Directory.EnumerateFiles(directory, "*.jpg");
// -- PLinq --------------------------------------------
jpegFiles
.AsParallel()
.Select(imageFile => new {OldLocation = imageFile, NewLocation = GenerateCopyLocation(imageFile) })
.Do(fileInfo =>
{
if (!File.Exists(fileInfo.NewLocation) ||
File.GetCreationTime(fileInfo.OldLocation) != File.GetCreationTime(fileInfo.NewLocation))
File.Copy(fileInfo.OldLocation, fileInfo.NewLocation, true);
})
.Run();
// -----------------------------------------------------
//-- Rx Framework ---------------------------------------------
var resetEvent = new AutoResetEvent(false);
var doTheWork =
jpegFiles.ToObservable()
.Select(imageFile => new {OldLocation = imageFile, NewLocation = GenerateCopyLocation(imageFile) })
.Subscribe( fileInfo =>
{
if (!File.Exists(fileInfo.NewLocation) ||
File.GetCreationTime(fileInfo.OldLocation) != File.GetCreationTime(fileInfo.NewLocation))
File.Copy(fileInfo.OldLocation, fileInfo.NewLocation, true);
},() => resetEvent.Set());
resetEvent.WaitOne();
doTheWork.Dispose();
// -----------------------------------------------------
