MongoDB C# driver is slow even in simple queries

I need to retrieve all documents from a collection in mongodb. There's nothing special about the query, I just need all the documents to be returned.
The stats of my collection are:
{
    "ns" : "MyDb.MyCollection",
    "size" : 206553804,
    "count" : 123663,
    "avgObjSize" : 1670,
    "storageSize" : 30953472,
    "capped" : false,
    "nindexes" : 1,
    "totalIndexSize" : 1122304,
    "indexSizes" : {
        "_id_" : 1122304
    },
    "ok" : 1.0
}
In my C# function, I wrote the following code:
var client = new MongoClient("mongodb://localhost:27017");
var database = client.GetDatabase("MyDb");
var collection = database.GetCollection<BsonDocument>("MyCollection");
var documents = collection.Find(new BsonDocument()).ToList();
The problem is that the Find(...).ToList() line takes about 60 seconds to return all the documents, and this is hurting my application's performance. This simple query also takes longer than usual on other collections. The database is running locally.
I'm using:
MongoDB 3.6.10
MongoDB.Driver 2.7.0
.NET Framework 4.7
Windows 10
Also, my machine has a 7th-generation i7 with 16 GB of RAM and a 256 GB SSD. Queries take the same time when the application runs on the production server.
Is there anything I can do to improve this?
Thanks in advance
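For what it's worth, here is a minimal sketch (my own illustration, not code from the post) of iterating the result with a cursor instead of materializing everything through ToList(); it pulls the documents batch by batch rather than building one 200 MB list in memory:

// Sketch: stream the documents with a cursor instead of ToList().
// Assumes the same client/database/collection variables as above.
using (var cursor = collection.Find(new BsonDocument()).ToCursor())
{
    while (cursor.MoveNext()) // fetches one batch at a time from the server
    {
        foreach (var document in cursor.Current)
        {
            // process each BsonDocument here
        }
    }
}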

Related

MongoDB BulkWrite ExceededTimeLimit error in .Net

I'm trying to push about 150k updates into a Mongo database (v4.2.9 running on Windows, a staging replica set with two nodes) using BulkWrite on the C# driver (v2.11.6), and it looks like it is impossible. The project is .NET Framework 4.7.2.
The Mongo C# driver documentation is terrible, but from forums and a lot of googling I was finally able to find a way to run about 150k updates as a batch, something like this (a little simplified for SO):
client = new MongoClient(connString);
database = client.GetDatabase(db);

// Build all the updates
List<UpdateOneModel<GroupEntry>> updates = new List<UpdateOneModel<GroupEntry>>();
foreach (GroupEntry groupEntry in stats)
{
    FilterDefinition<GroupEntry> filter = Builders<GroupEntry>.Filter.Eq(e => e.Key, groupEntry.Key);
    UpdateDefinitionBuilder<GroupEntry> update = Builders<GroupEntry>.Update;
    var groupEntrySubUpdates = new List<UpdateDefinition<GroupEntry>>();
    if (groupEntry.Value.Clicks != 0)
        groupEntrySubUpdates.Add(update.Inc(u => u.Value.Clicks, groupEntry.Value.Clicks));
    if (groupEntry.Value.Position != 0)
        groupEntrySubUpdates.Add(update.Set(u => u.Value.Position, groupEntry.Value.Position));

    UpdateOneModel<GroupEntry> groupEntryUpdate = new UpdateOneModel<GroupEntry>(filter, update.Combine(groupEntrySubUpdates));
    groupEntryUpdate.IsUpsert = true;
    updates.Add(groupEntryUpdate);
}

// Now BulkWrite them in a transaction to make sure the data stays consistent
IClientSessionHandle session = client.StartSession();
session.StartTransaction();
IMongoCollection<GroupEntry> collection = database.GetCollection<GroupEntry>(collectionName);

// The following line FAILS after some time
BulkWriteResult<GroupEntry> bulkWriteResult = collection.BulkWrite(session, updates);
if (!bulkWriteResult.IsAcknowledged)
    throw new Exception("Mongo BulkWrite is not acknowledged!");
session.CommitTransaction();
The problem is that I keep getting the following exception:
{
    "operationTime" : Timestamp(1612737199, 1),
    "ok" : 0.0,
    "errmsg" : "Exec error resulting in state FAILURE :: caused by :: operation was interrupted",
    "code" : 262,
    "codeName" : "ExceededTimeLimit",
    "$clusterTime" : {
        "clusterTime" : Timestamp(1612737199, 1),
        "signature" : {
            "hash" : new BinData(0, "ljcwS5Gf2JBpEu/OgPFbvRqclLw="),
            "keyId" : NumberLong("6890288652832735234")
        }
    }
}
Does anyone have any clue? The Mongo C# driver docs are completely useless. It looks like I should somehow set the $maxTimeMS property, but that is not possible on BulkWrite. I have tried:
Restarts and rebuilds
Different versions of MongoDriver
Set much bigger timeouts for all "timeout" properties on MongoClient and session
Create smaller batches for BulkWrite (up to 1000 items per batch; roughly as in the sketch after this list). It fails after 50-100 updates.
Spent hours and hours in useless Mongo docs and Mongo JIRA
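For clarity, the "smaller batches" attempt looked roughly like this (a sketch under my own assumptions; the exact chunking code is not in the post):

// Sketch: split the prepared UpdateOneModel list into chunks and BulkWrite each chunk.
// Requires System.Linq for Skip/Take.
const int batchSize = 1000;
for (int i = 0; i < updates.Count; i += batchSize)
{
    var batch = updates.Skip(i).Take(batchSize).ToList();
    // Still failed with ExceededTimeLimit after 50-100 updates.
    BulkWriteResult<GroupEntry> result = collection.BulkWrite(session, batch);
}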
So far no luck. The funny thing is that the same approach works with C# driver 2.10.3 on .NET Core 3.1 (yes, I tried), even with bigger batches (about 300k updates).
What am I missing?
EDIT:
I tried setting maxCommitTime to 25 minutes based on dododo's comments, like this:
IClientSessionHandle session = client.StartSession(new ClientSessionOptions()
{
    DefaultTransactionOptions = new TransactionOptions(
        new Optional<ReadConcern>(ReadConcern.Default),
        new Optional<ReadPreference>(ReadPreference.Primary),
        new Optional<WriteConcern>(WriteConcern.Acknowledged),
        new Optional<TimeSpan?>(TimeSpan.FromMinutes(25)))
});
It now throws an exception while doing the commit: NoSuchTransaction - Transaction 1 has been aborted. We checked the MongoDB log file and found a new error in there:
Aborting transaction with txnNumber 1 on session
09ea7755-7148-43e8-83d8-8bf58c211bda because it has been running for
longer than 'transactionLifetimeLimitSeconds'
Based on docs, this is 60 seconds by default. So we set it to 5 minutes and now it works.
So, thank you dododo for pointing me in the right direction.
Anyway, it would be really great if the Mongo team described errors better and wrote documentation beyond the basic CRUD operations.
As dododo suggested, this error was a manifestation of the server closing the transaction because it ran longer than transactionLifetimeLimitSeconds, which is 60 seconds by default. So two things need to be done:
Set the server parameter transactionLifetimeLimitSeconds to more than 60 seconds.
Set maxCommitTime to a higher value. I'm unable to find the default value, so I set it to 10 minutes (the same as transactionLifetimeLimitSeconds). Set it when starting a session (see the question); a sketch of both steps follows this list.
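A minimal sketch of both steps, assuming the client has rights to run setParameter on the server (the parameter can just as well be set from the mongo shell or at mongod startup):

// 1) Raise the server-side transaction lifetime (here to 10 minutes).
//    Assumption: done via an admin command from the driver; it can also be set at startup
//    with: mongod --setParameter transactionLifetimeLimitSeconds=600
var admin = client.GetDatabase("admin");
admin.RunCommand<BsonDocument>(new BsonDocument
{
    { "setParameter", 1 },
    { "transactionLifetimeLimitSeconds", 600 }
});

// 2) Allow the commit itself more time by setting maxCommitTime when starting the session.
IClientSessionHandle session = client.StartSession(new ClientSessionOptions()
{
    DefaultTransactionOptions = new TransactionOptions(
        maxCommitTime: new Optional<TimeSpan?>(TimeSpan.FromMinutes(10)))
});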
Anyway, documentation for this is missing and the error itself was misleading, so I hope this helps anyone who has to deal with it.

Querying Firestore for Boolean property in Asp.net Core/Kestrel Crashes in Linux Docker

This is something that works locally in ASP.NET Core 2.0 with C# on IIS Express and the Firestore Emulator.
I am trying the same code base on GCP, with ASP.NET Core running on Kestrel inside a Linux container on Kubernetes.
I can confirm that this is specific to Firestore, and specifically to querying on a Boolean property, as other queries are working fine.
CollectionReference FoosRef = FirestoreDb.Collection(FooKind);
Query query = FoosRef.WhereEqualTo("IsGoodFoo", true);
// QuerySnapshot querySnapshot = await query.Offset(offset).GetSnapshotAsync();
// Query query = FoosRef.WhereEqualTo("FooName", "p,ezbnR33GU_");
QuerySnapshot querySnapshot = await query.GetSnapshotAsync();
DocumentSnapshot documentSnapshot = querySnapshot.Documents.FirstOrDefault();
As you can see, the commented-out code works fine in the same setup, while querying for "IsGoodFoo" equal to true fails. I can confirm that the data in Firestore is saved as a Boolean, as searching for the string "true" in the Firestore UI doesn't give me any results.
(Querying for true works on the Firestore Emulator.)
As this is a hard crash, I don't see any logs written to Stackdriver either. Any idea where to check the Kestrel logs or how to debug this issue properly?
The problem turned out not to be "querying using a Boolean property" but "trying to retrieve the results of a query with about 200,000 results".
There are a few options for doing this. The simplest one when the query still has a small enough number of results is to use the StreamAsync method. Using C# 8 (and version 2.x of the Google.Cloud.Firestore APIs, which support the newer version of IAsyncEnumerable<>) you can just use code like this:
var stream = collection.WhereEqualTo("IsGoodFoo", true).StreamAsync();
await foreach (var document in stream)
{
    // Do whatever with the document
}
In my testing (with a 900K+ result query), that timed out after a minute after fetching ~210K items. It's not clear to me yet whether that's expected or not.
A more robust alternative is to issue queries with a limit and a cursor, until the query doesn't retrieve that limit. Here's an example of that:
int limit = 1000;
var query = collection.WhereEqualTo("IsGoodFoo", true).Limit(limit);
// Used to specify a cursor
DocumentSnapshot lastDocument = null;
while (true)
{
    var queryWithCursor = lastDocument is null ? query : query.StartAfter(lastDocument);
    var querySnapshot = await queryWithCursor.GetSnapshotAsync();
    foreach (var document in querySnapshot)
    {
        // Use the document
        lastDocument = document; // remember the last document seen, to use as the cursor
    }
    if (querySnapshot.Count != limit)
    {
        break;
    }
}
Note that while you can specify an Offset rather than using a cursor, this becomes significantly less efficient when the offset is very large.
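For comparison, an Offset-based version might look like the sketch below (my own assumption, reusing the same collection and a page size of 1000); every page makes the server skip over all previously returned documents, which is why the cost grows with the offset:

int pageSize = 1000;
for (int offset = 0; ; offset += pageSize)
{
    // Each iteration re-runs the query and skips 'offset' documents server-side.
    var snapshot = await collection.WhereEqualTo("IsGoodFoo", true)
        .Offset(offset)
        .Limit(pageSize)
        .GetSnapshotAsync();
    foreach (var document in snapshot)
    {
        // Use the document
    }
    if (snapshot.Count < pageSize)
    {
        break;
    }
}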

Get Volume Guid of EFI partition on Windows 2012 R2

I am trying to extract the volume GUID of the EFI partition. I have been able to do it successfully on a Windows 10 machine using a WMI query, and via code using C# ManagementObjectSearcher. I created a VHD with a GPT partition table, and within it I created a recovery partition, an EFI system partition and a basic data partition. Below is the WMI query I run in PowerShell after mounting the VHD.
I am unable to do the same on a Windows 2012 R2 machine, although I can extract the volume GUIDs of the other partitions there.
Sample DiskPart script
CREATE PARTITION PRIMARY SIZE=450 OFFSET=1024 ID=de94bba4-06d1-4d40-a16a-bfd50179d6ac
FORMAT FS=NTFS LABEL="Recovery" UNIT=4096 QUICK
CREATE PARTITION PRIMARY SIZE=99 OFFSET=461824 ID=c12a7328-f81f-11d2-ba4b-00a0c93ec93b
FORMAT FS=FAT32 LABEL="" UNIT=512 QUICK
CREATE PARTITION PRIMARY SIZE=129481 OFFSET=579584 ID=ebd0a0a2-b9e5-4433-87c0-68b6b72699c7
FORMAT FS=NTFS LABEL="" UNIT=4096 QUICK
WMI Query
"Get-WmiObject -Query "SELECT * FROM Msft_Volume" -Namespace Root/Microsoft/Windows/Storage"
In PowerShell on Windows 10, I can see the output for the EFI partition as shown below.
__GENUS : 2
__CLASS : MSFT_Volume
__SUPERCLASS : MSFT_StorageObject
__DYNASTY : MSFT_StorageObject
__RELPATH : MSFT_Volume.ObjectId="{1}\\\\computer\\root/Microsoft/Windows/Storage/Providers_v2\\WSP_Volume
.ObjectId=\"{efe10384-2fc4-11e9-bb16-806e6f6e6963}:VO:\\\\?\\Volume{f2f37b30-47b8-4553-804d-9b14
f6b32e1b}\\\""
__PROPERTY_COUNT : 18
__DERIVATION : {MSFT_StorageObject}
__SERVER : computer
__NAMESPACE : Root\Microsoft\Windows\Storage
__PATH : \\computer\Root\Microsoft\Windows\Storage:MSFT_Volume.ObjectId="{1}\\\\computer\\root/Micros
oft/Windows/Storage/Providers_v2\\WSP_Volume.ObjectId=\"{efe10384-2fc4-11e9-bb16-806e6f6e6963}:V
O:\\\\?\\Volume{f2f37b30-47b8-4553-804d-9b14f6b32e1b}\\\""
AllocationUnitSize : 512
DedupMode : 4
DriveLetter :
DriveType : 3
FileSystem : FAT32
FileSystemLabel :
FileSystemType : 6
HealthStatus : 0
ObjectId : {1}\\computer\root/Microsoft/Windows/Storage/Providers_v2\WSP_Volume.ObjectId="{efe10384-2fc4-
11e9-bb16-806e6f6e6963}:VO:\\?\Volume{f2f37b30-47b8-4553-804d-9b14f6b32e1b}\"
OperationalStatus : {2}
PassThroughClass :
PassThroughIds :
PassThroughNamespace :
PassThroughServer :
Path : \\?\Volume{f2f37b30-47b8-4553-804d-9b14f6b32e1b}\
Size : 99614720
SizeRemaining : 99613696
UniqueId : \\?\Volume{f2f37b30-47b8-4553-804d-9b14f6b32e1b}\
PSComputerName : computer
However, the above WMI query does not return details for the EFI partition when run on Windows 2012 R2. Even the same query run from C# code (via ManagementObjectSearcher) doesn't work.
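For reference, the C# side would presumably look something like this sketch (an assumption on my part; the actual code is not shown in the post), using ManagementObjectSearcher against the same Root\Microsoft\Windows\Storage namespace:

using System;
using System.Management; // reference System.Management.dll

class VolumeLister
{
    static void Main()
    {
        // Same query as the PowerShell command above, issued through WMI from C#.
        var scope = new ManagementScope(@"\\.\Root\Microsoft\Windows\Storage");
        var query = new ObjectQuery("SELECT * FROM MSFT_Volume");
        using (var searcher = new ManagementObjectSearcher(scope, query))
        {
            foreach (ManagementObject volume in searcher.Get())
            {
                // Path holds the \\?\Volume{guid}\ form; FileSystem helps spot the FAT32 EFI volume.
                Console.WriteLine("{0}  {1}", volume["FileSystem"], volume["Path"]);
            }
        }
    }
}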
Is there any restriction on Windows 2012 R2 that prevents it from displaying the EFI partition details?
Is there any other way to extract the volume GUID of the EFI partition?
Currently I have to assign a drive letter to the EFI partition in order to read it. I would prefer to use the \\?\Volume{guid}\ syntax to open the volume and read it programmatically, as that avoids unnecessarily assigning a drive letter.
Kindly suggest.

Fastest way to retrieve filtered remote event logs

I need to retrieve a few events (with specific IDs) from the Security event log on a handful of servers.
I've parallelized the server loop, and it works fine and speeds things up, but a couple of the servers have huge retention policies (exported EVTX files for the full logs are over 10 GB).
I get the logs using a loop like this:
var eventIds = new[] { 1, 2, 3, 4 };
var eventIdQueryStr = string.Join(" or ", eventIds.Select(x => $"EventID={x}"));
var queryXPath = $"*[System[TimeCreated[@SystemTime >= '{dateFrom:s}' and @SystemTime < '{dateTo:s}']]] and *[System[{eventIdQueryStr}]]";

using var session = new EventLogSession(
    serverName,
    domain,
    user,
    password,
    SessionAuthentication.Default);

var eventsQuery = new EventLogQuery("Security", PathType.LogName, queryXPath) { Session = session };

using (var logReader = new EventLogReader(eventsQuery))
{
    for (var eventDetail = logReader.ReadEvent();
         eventDetail != null;
         eventDetail = logReader.ReadEvent())
    {
        // _eventList is a ConcurrentBag<EventRecord>
        _eventList.Add(eventDetail);
        /* other stuff for showing progress, not relevant to the speed of the process */
    }
}
/* Parsing of the eventList here... irrelevant to the question */
This works, but it's extremely slow. I get around 1 million records (filtered with the xpath query) per hour for each server (not counting the parsing, that's why it's not relevant here) over a VPN.
I understand the Windows event log is not an indexed database, so the query doesn't really speed things up (it just filters them), but this is the fastest I could get (with several different techniques, including dumping the whole log on the remote server, copying the file, and/or parsing it directly over the network).
Is there anything I'm missing which could speed it up?
I've tested this on both .NET Core 3.0 and .NET Framework 4.8 with no difference in results whatsoever. Using the wevtutil command line with the same XPath query (via /q:"<query>") gives similar performance, but I can't believe this is the fastest it can be done.
The servers with big retention in particular are dual xeon servers with LOTS of free CPU and RAM, so I'm quite positive this is not a hardware performance problem.
I'd be grateful for any tips.
PS: when I wrote that the "progress-showing" parts are irrelevant, it's because I've tried removing them (just in case outputting progress was the culprit) with no relevant difference in the process speed.

Cannot create a capped collection larger than 500 Megabytes

I'm using a MongoDB server on a 32-bit system and I need to create a large capped collection with a max size of 1 GB. Everything works fine on a 64-bit system, but on 32-bit I get the following error:
com.mongodb.CommandResult$CommandFailure: command failed [command failed [create] {
"serverUsed" : "localhost:27017" ,
"errmsg" : "exception: assertion db\\pdfile.cpp:437" ,
"code" : 0 ,
"ok" : 0.0}
The total storage size for the server is 2 GB on a 32-bit system, but even with this limit I can't create a collection larger than 500 MB. What does this magic number mean?
The MongoDB server version is 2.0.6.
Additional info:
I have a couple of database files with a total size of 34 MB. Before running MongoDB, I copy those files into the 'data' directory, start MongoDB, and then in the shell I see the same number for the total size: 35651584 (34 MB) (the command used is taken from the comments below). If I try to create a collection of size 500 MB, I see a new file added (512 MB). But if, for example, I try to create a collection of size 600 MB, I get the error described above (though the 512 MB file is still added).
The Mongo db server log
MongoDB is started with the following command-line options:
> db.adminCommand("getCmdLineOpts")
{
    "argv" : [
        "mongod.exe",
        "--dbpath",
        "..\\data",
        "-vvvvvv",
        "--logpath",
        "..\\log\\server.log"
    ],
    "parsed" : {
        "dbpath" : "..\\data",
        "logpath" : "..\\log\\server.log",
        "vvvvvv" : true
    },
    "ok" : 1
}
>
MongoDB runs much better on a 64-bit system; can you change to x64? As Stennie said, you're most likely hitting an mmap limit due to other data in your database.
Can you test this hypothesis by connecting with the mongo shell and trying to create a new collection that is 1 byte larger than 512 MB -
db.createCollection("mycoll6", {capped:true, size:536870913})
You should hopefully get the following error message -
"errmsg" : "exception: can't map file memory - mongo requires 64 bit build for larger datasets",
In the Mongo shell, connect to the admin database and view the size of your database to see how much data you have -
use admin
show dbs
Update: based on some additional testing (I used Ubuntu 12.04 32-bit), this seems like it could be a bug.
Ubuntu Testing
db.createCollection("my13", {capped:true, size:536608768})
{
"errmsg" : "exception: assertion db/pdfile.cpp:437",
"code" : 0,
"ok" : 0
}
db.createCollection("my13", {capped:true, size:536608767})
{ "ok" : 1 }`
536608767 bytes is a little under 512 MB, leaving room for some sort of header in the file.
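For what it's worth, the failing size of 536,608,768 bytes is exactly 536,870,912 − 536,608,768 = 262,144 bytes (256 KiB) below 512 MiB, which suggests roughly 256 KiB of per-file overhead.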
I thought it might be related to smallfiles, as all 32-bit installs run with that option; however, an x64 build with smallfiles does not display the same symptoms.
I have logged SERVER-6722 for this issue.
