I need to get a small amount of data from each document in my database, but I still want to reduce traffic and avoid a "Table-Scan" (just the term; I know these aren't tables).
I have a collection of, let's say, "Books" (just because everyone uses it for examples). My issue is that I want only the titles of the books by a given author.
var filter = Builders<Book>.Filter.Eq(n => n.Author, AuthorId);
List<string> books = new List<string>();
using (var cursor = await BooksCollection.FindAsync(filter))
{
while (await cursor.MoveNextAsync())
{
var batch = cursor.Current;
foreach (Book b in batch)
books.Add(b.Title);
}
}
But when I scan the entire result, I'm transferring big chunks of data, aren't I? Let's assume these are not books but entire grid networks, where each document is around 5-10 MB and I have thousands of them. How can I reduce the traffic here without storing the data I need in another collection?
Edit
I think it's called a "view" in SQL databases.
You can reduce the size of the returned documents with a projection, which you can set in the FindOptions parameter of FindAsync, so that only the fields you need are included:
var filter = Builders<Book>.Filter.Eq(n => n.Author, AuthorId);
// Just project the Title and Author properties of each Book document
var projection = Builders<Book>.Projection
.Include(b => b.Title)
.Include(b => b.Author)
.Exclude("_id"); // _id is special and needs to be explicitly excluded if not needed
var options = new FindOptions<Book, BsonDocument> { Projection = projection };
List<string> books = new List<string>();
using (var cursor = await BooksCollection.FindAsync(filter, options))
{
while (await cursor.MoveNextAsync())
{
var batch = cursor.Current;
foreach (BsonDocument b in batch)
// Get the string value of the Title field of the BsonDocument
books.Add(b["Title"].AsString);
}
}
Note that the returned documents are BsonDocument objects instead of Book objects as they only contain the projected fields.
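The effect of a projection on traffic can be illustrated without a MongoDB server. This is only a rough sketch: it uses System.Text.Json as a stand-in for BSON, and the documents (including the Payload field) are hypothetical in-memory objects, so the sizes indicate the idea rather than what the driver actually sends.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical stand-ins for large "Book" documents: a small Title and Author
// plus a large Payload field that the caller does not need.
var fullDocuments = Enumerable.Range(1, 3)
    .Select(i => new
    {
        Title = $"Book {i}",
        Author = "author-42",
        Payload = new string('x', 1_000_000) // about one million characters
    })
    .ToList();

// Without a projection, every field of every document is serialized.
long fullChars = fullDocuments.Sum(d => (long)JsonSerializer.Serialize(d).Length);

// With a projection, only the requested field is serialized.
var projected = fullDocuments.Select(d => new { d.Title }).ToList();
long projectedChars = projected.Sum(d => (long)JsonSerializer.Serialize(d).Length);

Console.WriteLine($"full: {fullChars} chars, projected: {projectedChars} chars");
```

With a real collection the projection is evaluated by the server, so the large fields never leave the database at all.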
In addition to the accepted answer, you can also apply an expression to the projection for transformation purposes, which works similarly to LINQ's .Select() method:
var projection = Builders<Book>.Projection.Expression(x => new Book { Title = x.Title });
I seem to have written a very slow piece of code, and it gets even slower once EF Core is involved.
Basically, I have a list of items that store their attributes as a JSON string in the database, because I am storing many different kinds of items with different attributes.
I then have another table that contains the display order for each attribute, so when I send the items to the client I order the attributes based on that table.
It is rather slow: 700 records take about 18-30 seconds (measured from where I start my timer, not for the whole block of code).
var inventoryItemDtos = new List<InventoryItemDto>();
var inventoryItems = dbContext.InventoryItems.Where(x => x.InventoryCategoryId == categoryId);
var inventorySpecifications = dbContext.InventoryCategorySpecifications.Where(x => x.InventoryCategoryId == categoryId).Select(x => x.InventorySpecification);
Stopwatch a = new Stopwatch();
a.Start();
foreach (var item in inventoryItems)
{
var specs = JObject.Parse(item.Attributes);
var specDtos = new List<SpecDto>();
foreach (var inventorySpecification in inventorySpecifications.OrderBy(x => x.DisplayOrder))
{
if (specs.ContainsKey(inventorySpecification.JsonKey))
{
var value = specs.GetValue(inventorySpecification.JsonKey);
var newSpecDto = new SpecDto()
{
Key = inventorySpecification.JsonKey,
Value = value.ToString()
};
specDtos.Add(newSpecDto);
}
}
var dto = new InventoryItemDto()
{
// create dto
};
inventoryItemDtos.Add(dto);
}
Now it gets extremely slow when I have EF pull in some more columns that I need information from.
In the // create dto section, I access some information from other tables:
var dto = new InventoryItemDto()
{
// access brand columns
// access company columns
// access branch columns
// access country columns
// access state columns
};
Accessing these columns in the loop makes it take 6 minutes to process 700 rows.
I don't understand why it is so slow; it's the only real change I made, and I made sure to eager load everything.
It almost makes me think eager loading is not working, but I don't know how to verify whether it is. Here is how I load the items:
var inventoryItems = dbContext.InventoryItems.Include(x => x.Branch).ThenInclude(x => x.Company)
.Include(x => x.Branch).ThenInclude(x => x.Country)
.Include(x => x.Branch).ThenInclude(x => x.State)
.Include(x => x.Brand)
.Where(x => x.InventoryCategoryId == categoryId).ToList();
Because of this, I expected the speed not to differ much from the original 18-30 seconds.
I would like to speed up the original code too, but I am not sure how to get rid of the nested foreach loops that are probably slowing it down.
First, loops inside loops are expensive here, so you should refactor that into a single loop. This should not be a problem, because inventorySpecifications is declared outside the loop.
Second, the line
var inventorySpecifications = dbContext.InventoryCategorySpecifications.Where(x => x.InventoryCategoryId == categoryId).Select(x => x.InventorySpecification);
should end with ToList(), because it is enumerated inside the inner foreach, which means the query runs once for every item in inventoryItems.
That should save you a good amount of time.
I'm no expert, but this part of your second foreach raises a red flag: inventorySpecifications.OrderBy(x => x.DisplayOrder). Because it is called inside another foreach, the .OrderBy (and, since inventorySpecifications is an unmaterialized query, the database query behind it) runs every time you iterate over inventoryItems.
Before your first foreach loop, try this: var orderedInventorySpecs = inventorySpecifications.OrderBy(x => x.DisplayOrder).ToList(); then use foreach (var inventorySpec in orderedInventorySpecs) and see if it makes a difference. The ToList() matters: without it, orderedInventorySpecs is still a deferred query and would be re-executed on every iteration.
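To see why the placement of OrderBy and ToList() matters, here is a self-contained sketch using LINQ to Objects. The iterator Specifications() is a hypothetical stand-in for the unmaterialized inventorySpecifications query; it counts how often it is enumerated, and with EF each of those enumerations would be a separate database round trip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int sourceEnumerations = 0;

// Hypothetical stand-in for an unmaterialized EF query: every enumeration
// "runs the query" again, which we count.
IEnumerable<(string Key, int DisplayOrder)> Specifications()
{
    sourceEnumerations++;
    yield return ("Colour", 2);
    yield return ("Size", 1);
    yield return ("Weight", 3);
}

var items = Enumerable.Range(1, 700).ToList();

// 1. OrderBy inside the loop: the source is re-enumerated for every item.
sourceEnumerations = 0;
foreach (var item in items)
    foreach (var spec in Specifications().OrderBy(s => s.DisplayOrder)) { }
int insideLoop = sourceEnumerations;

// 2. OrderBy hoisted but not materialized: still deferred, still re-enumerated.
sourceEnumerations = 0;
var orderedDeferred = Specifications().OrderBy(s => s.DisplayOrder);
foreach (var item in items)
    foreach (var spec in orderedDeferred) { }
int hoistedDeferred = sourceEnumerations;

// 3. OrderBy hoisted and materialized with ToList(): enumerated exactly once.
sourceEnumerations = 0;
var orderedList = Specifications().OrderBy(s => s.DisplayOrder).ToList();
foreach (var item in items)
    foreach (var spec in orderedList) { }
int hoistedMaterialized = sourceEnumerations;

Console.WriteLine($"{insideLoop} / {hoistedDeferred} / {hoistedMaterialized}");
```

This prints 700 / 700 / 1: moving the OrderBy out of the loop alone does not help; materializing the result is what stops the repeated queries.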
To better understand what EF is running behind the scenes, add some logging that exposes the SQL being run; this can show you how and where your queries are going wrong, and in particular whether they are hitting the database too often. As a very general rule, you want to hit the database as few times as possible and retrieve only the information you need, using .Select() to limit what is returned. The docs for logging are here: http://learn.microsoft.com/en-us/ef/core/miscellaneous/logging
I obviously cannot test this, and I am a little unsure where your SpecDtos go once you have them, but I assume they become part of the InventoryItemDto:
var itemDtos = new List<ItemDto>();
var inventoryItems = dbContext.InventoryItems.Where(x => x.InventoryCategoryId == categoryId).Select(x => new InventoryItemDto() {
Attributes = x.Attributes,
//.....
// access brand columns
// access company columns
// access branch columns
// access country columns
// access state columns
}).ToList();
var inventorySpecifications = dbContext.InventoryCategorySpecifications
.Where(x => x.InventoryCategoryId == categoryId)
.OrderBy(x => x.DisplayOrder)
.Select(x => x.InventorySpecification).ToList();
foreach (var item in inventoryItems)
{
var specs = JObject.Parse(item.Attributes);
// Assuming the specs become part of an inventory item?
item.specs = inventorySpecifications.Where(x => specs.ContainsKey(x.JsonKey)).Select(x => new SpecDto() { Key = x.JsonKey, Value = specs.GetValue(x.JsonKey).ToString() }).ToList();
}
The first call to the database, for inventoryItems, should produce one SQL query that pulls all the information you need to construct each InventoryItemDto, so it only hits the database once. The second query pulls the specifications and applies OrderBy() before materialising, which means the ordering runs as part of the SQL query rather than in memory. Both results are materialised via .ToList(), which makes EF pull each result set into memory in one go.
Finally, the loop goes over your constructed inventoryItems, parses the JSON and then filters the specs based on it. I am unsure where you were using the SpecDtos, so I assumed they were part of the model. I would recommend checking the performance of the JSON work you are doing, as it could be contributing to your slowdown.
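Once both result sets are in memory, the remaining loop is cheap as long as each item's JSON is parsed only once and the specifications are sorted only once. The sketch below shows that shape with hypothetical data; it swaps in System.Text.Json (JsonDocument, TryGetProperty) for the Newtonsoft JObject used in the question, and uses tuples instead of the SpecDto class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical data: specification keys with display order, and items whose
// attributes are stored as a JSON string (as in the question).
var specifications = new List<(string JsonKey, int DisplayOrder)>
{
    ("weight", 3), ("colour", 1), ("size", 2)
};
var items = new List<(int Id, string Attributes)>
{
    (1, "{\"size\":\"L\",\"colour\":\"red\"}"),
    (2, "{\"weight\":12,\"colour\":\"blue\"}")
};

// Sort the specifications once, before the loop.
var orderedSpecs = specifications.OrderBy(s => s.DisplayOrder).ToList();

var result = new Dictionary<int, List<(string Key, string Value)>>();
foreach (var item in items)
{
    // Parse each item's JSON exactly once.
    using var doc = JsonDocument.Parse(item.Attributes);
    var root = doc.RootElement;

    var specDtos = new List<(string Key, string Value)>();
    foreach (var spec in orderedSpecs)
    {
        // TryGetProperty combines the ContainsKey and GetValue lookups.
        if (root.TryGetProperty(spec.JsonKey, out var value))
            specDtos.Add((spec.JsonKey, value.ToString()));
    }
    result[item.Id] = specDtos;
}

foreach (var (id, specs) in result)
    Console.WriteLine($"{id}: {string.Join(", ", specs.Select(s => $"{s.Key}={s.Value}"))}");
```

The inner loop still exists, but it only walks a small in-memory list, so it is not what made the original code slow.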
A more integrated approach to using JSON as part of your EF models can be seen in this answer: https://stackoverflow.com/a/51613611/621524. However, you will still be unable to use those properties to offload execution to SQL, because accessing properties that are defined in code will cause queries to fragment and run in several parts.
This issue is new to me in LINQ, and maybe I'm going about it the wrong way.
What I have is a list of objects in memory, which could number up to 100k, and I need to find which of these objects represent an existing customer in my database.
This search needs to be done across multiple object properties, and all I have to go on are the name and address of the person; there is no unique identifier, since this data comes from an outside source.
Is it possible to join my generic list of objects against my database context and then update the objects in the list with data from the context, based on whether they are found in the join?
I thought I was getting close to a working join with the code below, and I think the join works... maybe. But I can't even seem to loop through the records.
public void FindCustomerMatches(List<DocumentLine> lines)
{
IQueryable<DocumentLine> results = null;
var linesQuery = lines.AsQueryable();
using (var customerContext = new Entities())
{
customerContext.Configuration.LazyLoadingEnabled = false;
var dbCustomerQuery = customerContext.customers.Where(c => !c.customernumber.StartsWith("D"));
results = from c in dbCustomerQuery
from l in linesQuery
where c.firstname1 == l.CustomerFirstName
&& c.lastname1 == l.CustomerLastName
&& c.street_address1.Contains(l.CustomerAddress)
&& c.city == l.CustomerCity
&& c.state == l.CustomerState
&& c.zip == l.CustomerZip
select l;
foreach (var result in results)
{
// Do something with each record here, like update it.
}
}
}
It seems to me that you have two collections: a local collection of DocumentLines in variable lines, and a collection of Customers in a customerContext.Customers, probably in a database management system.
Every DocumentLine contains several properties that can also be found in a Customer. Alas, you didn't say whether all DocumentLine properties can be found in a Customer.
From lines (the local collection of DocumentLines) you want to keep only those DocumentLines for which there is at least one Customer in your queryable collection of Customers that matches all these properties.
So the result is a sequence of DocumentLines, a sub-collection of lines.
The problem is that you don't want to query a sub-collection of the database table Customers, but you want a sub-collection of your local lines.
Using AsQueryable doesn't transport your lines to your DBMS. I doubt whether the query you defined can be performed by the DBMS at all; I suspect that either all Customers will be transported to your local process to perform the query, or the query will fail because the DBMS cannot handle your local objects.
If all properties of a DocumentLine are in a Customer then it is possible to extract the DocumentLines properties from every Customer and use Queryable.Contains to keep only those extracted DocumentLines that are in your lines:
IQueryable<DocumentLine> customerDocumentLines = dbContext.Customers
.Select(customer => new DocumentLine()
{
FirstName = customer.FirstName,
LastName = customer.LastName,
...
// etc, fill all DocumentLine properties
});
Note: the query is not executed yet! No communication with the DBMS is performed.
Your requested result is all customerDocumentLines that are contained in lines, with duplicates removed.
var result = customerDocumentLines // extract the document lines from all Customers
.Distinct() // remove duplicates
.Where(line => lines.Contains(line)); // keep only those lines that are in lines
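This won't work if you can't extract a complete DocumentLine from a Customer. If lines contains duplicates, the result won't show these duplicates. Also be aware that some LINQ providers, Entity Framework 6 among them, can only translate Contains over a local collection of primitive values (such as strings or numbers), so this form may throw when the query is executed.
If you can't extract all DocumentLine properties from a Customer, you'll have to move the values to check to local memory: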
var valuesToCompare = dbContext.Customers
.Select(customer => new
{
FirstName = customer.FirstName,
LastName = customer.LastName,
...
// etc, fill all values you need to check
})
.Distinct() // remove duplicates
.ToList(); // execute the query once and keep the results in local memory
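Now you can use Enumerable.Contains to get the subset of lines. You'll need to compare by value, not by reference. Luckily, anonymous types compare for equality by value, and two anonymous objects with the same property names, types and order (in the same assembly) share the same type: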
var result = lines
// extract the values to compare
.Select(line => new
{
Line = line,
ValuesToCompare = new
{
FirstName = line.FirstName,
LastName = line.LastName,
...
}
})
// keep only those lines that match valuesToCompare
.Where(x => valuesToCompare.Contains(x.ValuesToCompare))
.Select(x => x.Line);
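The value-equality trick can be checked in isolation. This sketch uses hypothetical in-memory data with only three fields, models the lines as anonymous objects instead of DocumentLine, and swaps the list for ToHashSet() so that each of the (up to 100k) lookups is a hash lookup rather than a linear scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical customer rows, as they might come back from the database.
var customers = new[]
{
    new { FirstName = "Ann", LastName = "Lee", Zip = "10001" },
    new { FirstName = "Bob", LastName = "Ray", Zip = "94105" },
    new { FirstName = "Ann", LastName = "Lee", Zip = "10001" } // duplicate
};

// Hypothetical incoming document lines from the outside source.
var lines = new[]
{
    new { Id = 1, CustomerFirstName = "Ann", CustomerLastName = "Lee", CustomerZip = "10001" },
    new { Id = 2, CustomerFirstName = "Cat", CustomerLastName = "Fox", CustomerZip = "60601" }
};

// Materialize the comparison keys once; a HashSet makes each lookup O(1).
var valuesToCompare = customers
    .Select(c => new { c.FirstName, c.LastName, c.Zip })
    .ToHashSet();

// Build a key with the same property names, types and order, so both sides
// share one anonymous type and compare by value.
var matches = lines
    .Where(l => valuesToCompare.Contains(new
    {
        FirstName = l.CustomerFirstName,
        LastName = l.CustomerLastName,
        Zip = l.CustomerZip
    }))
    .ToList();

Console.WriteLine(string.Join(", ", matches.Select(m => m.Id)));
```

If a property name or its position differs between the two anonymous objects, they become different types and the code no longer compiles, which is a useful safety net.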
I hope this is not a duplicate, but I wasn't able to find an answer to this.
It seems to be either undesired behavior or missing knowledge on my part.
I have lists of Platform and Configuration objects. Both contain a string member, CodeName.
The lists of CodeNames look like this:
dbContext.Platforms.Select(x => x.CodeName) => {"test", "PC", "Nintendo"}
dbContext.Configurations.Select(x => x.CodeName) => {"debug", "release"}
They are obtained from a MySQL database, hence the dbContext object.
Here is some simple code that I wanted to translate into LINQ, because two nested foreach loops are a thing of the past:
var choiceList = new List<List<string>>();
foreach (Platform platform in dbContext.Platforms.ToList())
{
foreach (Configuration configuration in dbContext.Configurations.ToList())
{
choiceList.Add(new List<string>() { platform.CodeName, configuration.CodeName });
}
}
This code gives me exactly what I want, keeping the platform name first, which looks like this:
var results = new List<List<string>>() {
{"test", "debug"},
{"test", "release"},
{"PC", "debug"},
{"PC", "release"},
{"Nintendo", "debug"},
{"Nintendo", "release"}};
But if I translate it to this, my lists contain their items in a different order:
var choiceList = dbContext.Platforms.SelectMany(p => dbContext.Configurations.Select(t => new List<string>() { p.CodeName, t.CodeName })).ToList();
I will end up with this, where the platform name isn't always first, which is not what is desired:
var results = new List<List<string>>() {
{"debug", "test"},
{"release", "test"},
{"debug", "PC"},
{"PC", "release"},
{"debug", "Nintendo"},
{"Nintendo", "release"}};
My question is, is it possible to obtain the desired result using LINQ?
Let me know if I'm not clear or my question lacks certain details.
Thanks
EDIT: Ivan found the explanation, and I modified my code accordingly.
In fact, only the Enumerable in front of the SelectMany needed the .ToList().
I should also have mentioned that I was stuck needing a List<List<string>>.
Thanks, everyone, for the fast input; it was really appreciated.
When you use
var choiceList = dbContext.Platforms.SelectMany(p => dbContext.Configurations.Select(t => new List<string>() { p.CodeName, t.CodeName })).ToList();
it is really translated into a SQL query, and the order of the returned records is not defined unless you use ORDER BY.
To get the same results as your nested loops, execute and materialize both queries, and then do SelectMany in memory:
var platforms = dbContext.Platforms.ToList();
var configurations = dbContext.Configurations.ToList();
var choiceList = platforms.SelectMany(p => configurations,
(p, c) => new List<string>() { p.CodeName, c.CodeName })
.ToList();
Rather than projecting it out to an array, project it out to a new object with two fields (potentially an anonymous object), and then, if you really do need these values in an array, project that into a two-element array after you have retrieved the objects from the database.
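A minimal sketch of that two-step projection, with in-memory arrays standing in for the two queries. The point is that named fields, not the order in which values come back, decide which value is the platform; the order of the rows themselves would still be unspecified in SQL without an OrderBy.

```csharp
using System;
using System.Linq;

// Stand-ins for the two CodeName queries.
var platforms = new[] { "test", "PC", "Nintendo" };
var configurations = new[] { "debug", "release" };

// Step 1: project into an object with two named fields, so each value keeps
// its role regardless of how the rows are returned.
var pairs = platforms
    .SelectMany(p => configurations, (p, c) => new { Platform = p, Configuration = c })
    .ToList();

// Step 2: once the results are in memory, turn each pair into a two-element array.
var arrays = pairs
    .Select(x => new[] { x.Platform, x.Configuration })
    .ToList();

Console.WriteLine(string.Join(" | ", arrays.Select(a => string.Join(",", a))));
```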
Try this:
var platforms= dbContext.Platforms.Select(x=>x.CodeName);
var configurations=dbContext.Configurations.Select(x=>x.CodeName);
var mix=platforms.SelectMany(num => configurations, (n, a) => new { n, a });
For more detail, see the question "Difference between Select and SelectMany".
I am using MediatR to request a VisualizationDto:
public VisualizationResponse Handle(VisualizationQuery message)
{
return new VisualizationResponse
{
LoadTick = DateTime.Now.Ticks,
Visualization = new VisualizationDto
{
infeed = context.Unloaders.ProjectToList<InfeedDto>(),
Levels = context.Levels.ProjectToList<LevelDto>()
}
};
}
These get mapped directly from a DbContext. The problem is that ProjectToList<> maps recursively: each level has a list of buffers, and each buffer has a list of stacks. I only need to map the stacks that have a TimeOut value of null. I don't want to filter through everything after mapping, because this might slow things down. I tried:
var lq = context.Levels;
var stacks = lq
.SelectMany(l => l.Buffers)
.SelectMany(b => b.StackLocations)
.Where(s => s.TimeOut == null);
Levels = lq.ProjectTo<LevelDto>().Select(l => new {l, stacks}).ToList().Select(x => x.l).ToList()
But the values I receive aren't filtered; I still get the full dataset. Are there any other ways to filter in a ProjectToList?
Right now my output looks like this:
List<LevelDto>
-List<BufferDto>
-List<StackLocationDto>
-stack timeIn- TimeOut
-stack timeIn- TimeOut
-stack timeIn- null
-stack timeIn- null
I need to filter out the stacks that are already finished, i.e. those whose TimeOut is not null.
The Where condition is only applied to stacks, not to lq, and stacks is never used to filter what lq projects, so the projection still contains every stack. Try stacks.ProjectTo<LevelDto>(); that should do the trick.
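The general idea behind the fix is that the TimeOut filter has to be part of the projection of the hierarchy itself, not a separate query next to it. The sketch below shows that idea with plain LINQ to Objects, using hypothetical anonymous types in place of the Level, Buffer and StackLocation entities and their DTOs. Whether ProjectTo<LevelDto>() can be made to emit the same nested Where depends on how the AutoMapper mapping for the stack collection is configured, which is an assumption not covered here.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory hierarchy: levels -> buffers -> stacks.
var levels = new[]
{
    new
    {
        Name = "Level 1",
        Buffers = new[]
        {
            new
            {
                Name = "Buffer A",
                Stacks = new[]
                {
                    new { Id = 1, TimeOut = (DateTime?)new DateTime(2020, 1, 1) },
                    new { Id = 2, TimeOut = (DateTime?)null },
                    new { Id = 3, TimeOut = (DateTime?)null }
                }
            }
        }
    }
};

// Filter the innermost collection inside the projection itself, so finished
// stacks (TimeOut != null) never make it into the result.
var levelDtos = levels
    .Select(l => new
    {
        l.Name,
        Buffers = l.Buffers.Select(b => new
        {
            b.Name,
            Stacks = b.Stacks.Where(s => s.TimeOut == null).ToList()
        }).ToList()
    })
    .ToList();

var openIds = levelDtos.SelectMany(l => l.Buffers).SelectMany(b => b.Stacks).Select(s => s.Id);
Console.WriteLine(string.Join(", ", openIds));
```

This prints 2, 3: the level and buffer structure is kept, and only the unfinished stacks remain.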
I am taking input from a client to build up an Elasticsearch query using NEST. I start out with the basics, like so:
var search = esClient.Search<MyData>(s => s
.From(pageNum * pageSize)
.Take(pageSize)
.QueryRaw(@"{""match_all"": {} }"));
I then parse out the request and see if an optional sorting parameter was passed in. If it was, I create a new SearchDescriptor<MyData>() which performs that requested sort, and I want to add it to my original search criteria. Obviously .Search() will actually perform an HTTP call, so it can't happen as it is today, but how can I stick a series of SearchDescriptor calls together and then perform the search at the end?
You can build the SearchDescriptor incrementally, as shown below. I've used aggregations instead of facets (which are now deprecated), but I hope you get the idea.
var sd = new SearchDescriptor<MyData>();
sd = sd.QueryRaw(<raw query string>);
if (<should sort>)
{
string fieldToBeSortedOn; // input from user
bool sortInAscendingOrder; // input from user
if (sortInAscendingOrder)
{
sd = sd.Sort(f => f
.Ascending()
.OnField(fieldToBeSortedOn));
}
else
{
sd = sd.Sort(f => f
.Descending()
.OnField(fieldToBeSortedOn));
}
}
if (<should compute aggregations>)
{
sd = sd.Aggregations(a => a
.Terms(
"term_aggs",
t => t
.Field(<name of field to compute terms aggregation on>)));
}
var search = esClient.Search<MyData>(s => sd);
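The same build-then-execute pattern can be seen without NEST. This sketch applies it to LINQ to Objects over hypothetical data and user input: each optional step wraps the query built so far, and only the final ToList() runs it, in the same way that the SearchDescriptor is only sent when it is finally passed to Search.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical documents.
var data = new List<(string Name, int Score)>
{
    ("b", 2), ("a", 3), ("c", 1), ("d", 5)
};
string sortField = "Score";   // hypothetical user input
bool sortAscending = false;   // hypothetical user input
int pageNum = 0, pageSize = 2;

// Start from the unfiltered source; nothing is evaluated yet.
IEnumerable<(string Name, int Score)> query = data;

// Optionally add a sort step, just as the SearchDescriptor is extended.
if (sortField == "Score")
    query = sortAscending ? query.OrderBy(d => d.Score) : query.OrderByDescending(d => d.Score);
else if (sortField == "Name")
    query = sortAscending ? query.OrderBy(d => d.Name) : query.OrderByDescending(d => d.Name);

// Paging is added last; ToList() is the single point where the query runs.
var page = query.Skip(pageNum * pageSize).Take(pageSize).ToList();

Console.WriteLine(string.Join(", ", page.Select(p => $"{p.Name}:{p.Score}")));
```

This prints d:5, a:3 for a descending sort on Score.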