Select all _id from MongoDB collection with C# driver

I have a large document collection in MongoDB and want to get only the list of _id values. The MongoDB query is db.getCollection('Documents').find({},{_id : 0, _id: 1}). But in C# the query
IMongoCollection<T> Collection { get; set; }
...
List<BsonDocument> mongoResult = this.Collection.FindAsync(FilterDefinition<T>.Empty, new FindOptions<T, BsonDocument>() { Projection = "{ _id: 0, _id: 1 }" }).Result.ToList();
throws InvalidOperationException: Duplicate element name '_id'.
I want to get only the list of _id values; the other fields are not needed. Documents may have different structures, so excluding every other field manually is difficult.
What C# query corresponds to the MongoDB query db.getCollection('Documents').find({},{_id : 0, _id: 1})?
UPDATE: Please do not offer solutions that pull large amounts of data from the server, for example
this.Collection.Find(d => true).Project(d => d.Id).ToListAsync().Result;

Since you're using the C# driver, I would recommend using AsQueryable and then LINQ instead.
In my opinion it is better, since you don't need magic strings and you benefit from your LINQ knowledge. It would look something like this:
database.GetCollection<T>("collectionname").AsQueryable().Select(x => x.Id);

Alexey is correct, solutions such as these
var result = (await this.Collection<Foos>
.Find(_ => true)
.ToListAsync())
.Select(foo => foo.Id);
Will pull the entire document collection over the wire, deserialize, and then map the Id out in Linq To Objects, which will be extremely inefficient.
The trick is to use .Project to return just the _id keys, before the query is executed with .ToListAsync().
You can specify the type as a raw BsonDocument if you don't want to use a strongly typed DTO to deserialize into.
var client = new MongoClient(new MongoUrl(connstring));
var database = client.GetDatabase(databaseName);
var collection = database.GetCollection<BsonDocument>(collectionName);
var allIds = (await collection
.Find(new BsonDocument()) // OR (x => true)
.Project(new BsonDocument { { "_id", 1 } })
.ToListAsync())
.Select(x => x["_id"]); // BsonValue; AsString would throw unless _id is stored as a string
Which executes a query similar to:
db.getCollection("SomeCollection").find({},{_id: 1})

Related

MongoDB .NET Fluent Aggregate Query

I'm trying to write the query below using the fluent syntax of MongoDB. I'm using the latest .NET driver. I don't like the strings for naming the columns and would prefer to not have to do the Bson Serialization as well.
var collection = _mongoDbClient.GetDocumentCollection<JobResult>();
var bsonDocuments = collection.Aggregate()
.Group<BsonDocument>(new BsonDocument{ { "_id", "$RunDateTime" }, { "Count", new BsonDocument("$sum", 1) } })
.Sort(new BsonDocument { { "Count", -1 } })
.Limit(20)
.ToList();
foreach (var bsonDocument in bsonDocuments)
{
jobResultRunDateTimes.Add(BsonSerializer.Deserialize<JobResultRunDateTime>(bsonDocument));
}
The C# driver includes a LINQ implementation that targets the MongoDB aggregation framework, so you should be able to write your query with standard LINQ operators.
The example below groups by an assumed property Id, takes the count of documents in each group, and then sorts. Here x is of type JobResult, i.e. the type you use when getting the collection.
var result = collection.AsQueryable()
.GroupBy(x => x.Id)
.Select(g => new { g.Key, count = g.Count() })
.OrderBy(a => a.Key)
.Take(1)
.ToList();
For a detailed reference and more examples, refer to the C# driver documentation.
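The operator chain that matches the original pipeline more closely (group by RunDateTime, count, sort by count descending, limit 20) could be sketched as below. This is only a sketch under assumptions: the JobResult and JobResultRunDateTime shapes are hypothetical stand-ins for the asker's classes, and the method takes any IQueryable<JobResult>, so it can be tried against an in-memory list. Whether the driver's LINQ provider translates this exact chain depends on the driver version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shape of JobResult; only RunDateTime matters here.
public class JobResult
{
    public DateTime RunDateTime { get; set; }
}

// Result type named in the question; its property names are assumed.
public class JobResultRunDateTime
{
    public DateTime RunDateTime { get; set; }
    public int Count { get; set; }
}

public static class RunDateTimeCounts
{
    // Groups by RunDateTime, counts each group, and returns the
    // 'limit' most frequent values, highest count first.
    public static List<JobResultRunDateTime> TopRunDateTimes(
        IQueryable<JobResult> jobs, int limit = 20)
    {
        return jobs
            .GroupBy(j => j.RunDateTime)
            .Select(g => new JobResultRunDateTime { RunDateTime = g.Key, Count = g.Count() })
            .OrderByDescending(r => r.Count)
            .Take(limit)
            .ToList();
    }
}
```

With the driver, the input would presumably be collection.AsQueryable(); with a plain list, call .AsQueryable() on it to try the same chain in memory.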

C# and .NET MongoDB Driver: Retrieving distinct elements while setting a limit

I'm trying to retrieve a single property from all the documents in a collection. I want a way to filter the results by a time period while setting a limit and getting distinct values.
This is the way my documents look in the collection:
{
_id: [ObjectId],
number: [string],
timestamp: [Datetime]
}
I'm using version 2.7 of the driver.
So I want to retrieve the distinct values of the number field for documents whose timestamp falls in a specific period, and whose _id is greater than or equal to a specific one.
The queries I have tried so far are:
var filter = Builders<Entity>.Filter.And(
Builders<Entity>.Filter.Gte(entity => entity.timestamp, startDate),
Builders<Entity>.Filter.Lte(entity => entity.timestamp, endDate),
Builders<Entity>.Filter.Gte(entity => entity.Id, objectId));
var query = collection
.DistinctAsync(item => item.number, filter);
No way to set a limit?
var filter = Builders<Entity>.Filter.And(
Builders<Entity>.Filter.Gte(entity => entity.timestamp, startDate),
Builders<Entity>.Filter.Lte(entity => entity.timestamp, endDate),
Builders<Entity>.Filter.Gte(entity => entity.Id, lastObjectId));
var query = collection
.Find(filter)
.Sort(Builders<Entity>.Sort.Ascending(entity => entity.number))
.Project(entity => new { entity.number})
.Limit(100);
No way to get distinct values?
Because of the size of the collection, I do not want to do any of these operations on the client.
Does anybody have a solution? Thanks in advance!

c# mongo 2.0 reduce traffic of FindAsync

I have to get a small amount of data from each document in the database, but I want to reduce traffic and avoid a "table scan" (just the term; I know they're not tables).
I have a collection of, let's say, "Books" (because everyone uses it for examples). My issue is that I want only the titles of the books by a given author.
var filter = Builders<Book>.Filter.Eq(n => n.Author, AuthorId);
List<string> books = new List<string>();
using (var cursor = await BooksCollection.FindAsync(filter))
{
while (await cursor.MoveNextAsync())
{
var batch = cursor.Current;
foreach (Book b in batch)
books.Add(b.Title);
}
}
But when I scan the entire result set I'm transferring big chunks of data, aren't I? Let's assume these are not books but entire grid networks, each document is around 5-10 MB, and I have thousands of them. How can I reduce the traffic here without storing the data I need in another collection?
Edit
I think it's called "Views" in SQL databases.
You can reduce the size of the returned documents via projection which you can set in the FindOptions parameter of FindAsync to only include the fields you need:
var filter = Builders<Book>.Filter.Eq(n => n.Author, AuthorId);
// Just project the Title and Author properties of each Book document
var projection = Builders<Book>.Projection
.Include(b => b.Title)
.Include(b => b.Author)
.Exclude("_id"); // _id is special and needs to be explicitly excluded if not needed
var options = new FindOptions<Book, BsonDocument> { Projection = projection };
List<string> books = new List<string>();
using (var cursor = await BooksCollection.FindAsync(filter, options))
{
while (await cursor.MoveNextAsync())
{
var batch = cursor.Current;
foreach (BsonDocument b in batch)
// Get the string value of the Title field of the BsonDocument
books.Add(b["Title"].AsString);
}
}
Note that the returned documents are BsonDocument objects instead of Book objects as they only contain the projected fields.
In addition to the accepted answer, you can also pass an expression to the projection for transformation purposes, which works similarly to LINQ's .Select() method:
var projection = Builders<Page>.Projection.Expression(x => new Page { Title = x.Title });

Order query by int from an ordered list LINQ C#

I am trying to order a query by an int value according to its position in an ordered list of the same int values; that is, the query must be sorted in the list's item order. Each data context belongs to a different database, which is why I first build a list of pet IDs ordered by pet name; only the pet ID is available in the second query's fields. The query looks like:
using (ListDataContext syndb = new ListDataContext())
{
using (QueryDataContext ledb = new QueryDataContext())
{
// Set the order of pets by name and make a list of the pet id's
var stp = syndb.StoredPets.OrderBy(x => x.Name).Select(x => x.PetID).ToList();
// Reorder the SoldPets query using the ordered list of pet id's
var slp = ledb.SoldPets.OrderBy(x => stp.IndexOf(x.petId)).Select(x => x);
// do something with the query
}
}
The second query is giving me a "Method 'Int32 IndexOf(Int32)' has no supported translation to SQL." error, is there a way to do what I need?
LINQ to SQL (EF) has to translate your LINQ queries into SQL that can be executed against a SQL server. The error is telling you that the .NET method IndexOf has no SQL equivalent. It may be best to fetch the data from your SoldPets table without the IndexOf part and then do the remaining ordering in memory, outside LINQ to SQL (EF).
Something like this should work:
List<int> storedPetIds;
List<SoldPet> soldPets;
using (ListDataContext listDataContext = new ListDataContext())
{
using (QueryDataContext queryDataContext = new QueryDataContext())
{
// Pet IDs ordered by pet name
storedPetIds =
listDataContext.StoredPets
.OrderBy(sp => sp.Name)
.Select(sp => sp.PetId)
.ToList();
// Materialize SoldPets so the ordering below runs in memory
soldPets =
queryDataContext.SoldPets
.ToList();
}
}
List<SoldPet> orderedSoldPets =
soldPets.OrderBy(sp => storedPetIds.IndexOf(sp.PetId))
.ToList();
Note: Your capitalisation of PetId changes in your example, so you may wish to look at that.
LINQ to SQL can't translate your LINQ statement into SQL because there is no equivalent of the IndexOf() method. You will have to execute the query first with ToList() and then do the sorting in memory.
using (ListDataContext syndb = new ListDataContext())
using (QueryDataContext ledb = new QueryDataContext())
{
var stp = syndb.StoredPets.OrderBy(x => x.Name).Select(x => x.PetID).ToList();
// Reorder the SoldPets query using the ordered list of pet id's
var slp = ledb.SoldPets.ToList().OrderBy(x => stp.IndexOf(x.petId));
}
You can use this, if the list size is acceptable:
using (ListDataContext syndb = new ListDataContext())
{
using (QueryDataContext ledb = new QueryDataContext())
{
var stp = syndb.StoredPets.OrderBy(x => x.Name).Select(x => x.PetID).ToList();
var slp = ledb.SoldPets.ToList().OrderBy(x => stp.IndexOf(x.petId));
// do something with the query
}
}
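If the lists are large, calling IndexOf for every row rescans the ID list each time. A possible alternative, sketched here with hypothetical names (SortByKeyOrder is not from any library), is to build an ID-to-position dictionary once and sort in memory by that. Note one deliberate difference from IndexOf: IDs missing from the ordered list are placed last here, whereas IndexOf returns -1 and would place them first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByListPosition
{
    // Sorts items so that their keys follow the order given in orderedKeys.
    // Items whose key is not in orderedKeys are placed last.
    public static List<T> SortByKeyOrder<T>(
        IEnumerable<T> items,
        IReadOnlyList<int> orderedKeys,
        Func<T, int> keySelector)
    {
        // Build key -> position once, so each lookup is O(1)
        // instead of a linear IndexOf scan per item.
        var position = new Dictionary<int, int>();
        for (int i = 0; i < orderedKeys.Count; i++)
        {
            // Keep the first position if a key appears more than once.
            if (!position.ContainsKey(orderedKeys[i]))
            {
                position[orderedKeys[i]] = i;
            }
        }

        return items
            .OrderBy(item => position.TryGetValue(keySelector(item), out var p) ? p : int.MaxValue)
            .ToList();
    }
}
```

With the names from the question, the call would look something like OrderByListPosition.SortByKeyOrder(ledb.SoldPets.ToList(), stp, x => x.petId).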

Using LINQ, How do I return objects from a GroupBy clause?

Thanks for looking!
Background
Within my C# code, I am calling a stored procedure from a MSSQL database that returns a history of orders that a user has made. I use the Entity Framework for this. Here is the code:
var db = new CustomerEntities();
return db.GetOrderHistoryByUserId(id);
The output of this stored procedure is a list of orders with multiple records having the same OrderNumber because there may have been multiple products in a given order. So I modified my code to look like this:
var db = new CustomerEntities();
return db.GetOrderHistoryByUserId(id).GroupBy(p => p.OrderNumber);
I was hoping to now have a list of Order objects with nested Product objects, but instead this code essentially produced the same response as before.
Ultimately, I just want to convert this query into a JSON response that looks something like this:
Orders : [
{
OrderNumber : 1,
OrderTotal: $500,
Products: [
{ProductSku : 11111, ProductPrice: $200},
{ProductSku : 22222, ProductPrice: $300}
]
}
]
I am using MVC 4 to aid in producing the JSON output, so I am already clear on that part, I just need to know how to consume the results of the stored procedure in a manner that produces an array of objects with the desired structure.
Question
Is there a way of producing this desired object structure with the original LINQ call to the stored procedure, or am I going to need to iterate the results of the stored procedure and construct a new object?
Thanks for your help!
var serializer = new JavaScriptSerializer();
var rows = db.GetOrderHistoryByUserId(id);
var json = serializer.Serialize(
new { Orders = rows.GroupBy(o => o.Number)
.Select(g =>
new
{
OrderNumber = g.Key,
OrderTotal = g.Sum(o => o.Price),
Products = g.Select(
o => new {SKU = o.Sku, ProductPrice = o.Price}
)
})
});
I actually did search for help on this earlier before posting, but I happened upon the answer just now. I needed a Select clause:
var db = new CustomerEntities();
return db.GetOrderHistoryByUserId(id).GroupBy(o => o.OrderNumber).Select(g => new {Order = g.Key, Items = g});
Click here to see the webpage that finally helped me.
Hope this helps someone.
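For reference, the same GroupBy plus Select idea with named types instead of anonymous ones might look like the sketch below. The OrderRow, ProductDto, and OrderDto classes are hypothetical stand-ins; the real row type comes from the stored procedure mapping and its property names may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row, standing in for one record returned by the stored procedure.
public class OrderRow
{
    public int OrderNumber { get; set; }
    public int ProductSku { get; set; }
    public decimal ProductPrice { get; set; }
}

public class ProductDto
{
    public int ProductSku { get; set; }
    public decimal ProductPrice { get; set; }
}

public class OrderDto
{
    public int OrderNumber { get; set; }
    public decimal OrderTotal { get; set; }
    public List<ProductDto> Products { get; set; }
}

public static class OrderGrouping
{
    // Collapses flat rows (one per product) into one object per order,
    // with the order's products nested inside it.
    public static List<OrderDto> ToOrders(IEnumerable<OrderRow> rows)
    {
        return rows
            .GroupBy(r => r.OrderNumber)
            .Select(g => new OrderDto
            {
                OrderNumber = g.Key,
                OrderTotal = g.Sum(r => r.ProductPrice),
                Products = g.Select(r => new ProductDto
                {
                    ProductSku = r.ProductSku,
                    ProductPrice = r.ProductPrice
                }).ToList()
            })
            .ToList();
    }
}
```

The resulting list can be wrapped as new { Orders = ... } and handed to whichever JSON serializer the MVC action already uses.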
