Background / Goal
I have a query in ElasticSearch where I'm using filters on several fields (a relatively small data set, and we know exactly what values should be in those fields at the time we query). The idea is that we'll perform a full-text query, but only after we've filtered on some selections made by the user.
I'm putting ElasticSearch behind a WebAPI controller and figured it made sense to use NEST to accomplish the query.
The query, in plain English
We have filters for several fields. Each inner filter is an OR filter, and the inner filters are combined with AND.
In SQL, the pseudo-code equivalent would be select * from table where foo in (1,2,3) AND bar in (4,5,6).
Questions
Can I simplify the way I'm thinking about this query, based on what you see below? Am I overlooking some basic approach? This seems heavy but I'm new to ES.
How would I properly represent the query below in NEST syntax?
Is NEST the best choice for this? Should I be using the ElasticSearch library instead and going lower level?
The Query Text
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"or": [
{ "term": { "foo": "something" } },
{ "term": { "foo": "somethingElse" } }
]
},
{
"or": [
{ "term": { "bar": "something" } },
{ "term": { "bar": "somethingElse" } }
]
}
]
}
}
}
},
"size": 100
}
This kind of task is simple and very common in ES.
You can represent it in NEST like the following:
var rs = es.Search<dynamic>(s => s
.Index("your_index").Type("your_type")
.From(0).Size(100)
.Query(q => q
.Filtered(fq => fq
.Filter(ff => ff
.Bool(b => b
.Must(
m1 => m1.Terms("foo", new string[] { "something", "somethingElse" }),
m2 => m2.Terms("bar", new string[] { "something", "somethingElse" })
)
)
)
.Query(qq => qq
.MatchAll()
)
)
)
);
Some notes:
I use a filtered query to filter first and search afterwards. In this case it filters for foo in ("something", "somethingElse") AND bar in ("something", "somethingElse"), then queries all filtered results (match_all). You can change match_all to whatever you need. A filtered query performs best because ES only needs to compute scores for the documents that pass the filter, not for all documents.
I use a terms filter, which is simpler and performs better than or. By default, terms matches a document if any of the input terms match (OR); see the documentation for the other available execution modes (AND, OR, PLAIN, ...).
NEST is the best choice for .NET in my opinion, as it is designed to be simple and easy to use. I only use the lower-level API when I need new features that NEST does not support yet, or when NEST has bugs in the functions I use.
You can refer here for a brief NEST tutorial: http://nest.azurewebsites.net/nest/writing-queries.html
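To make the semantics of the notes above concrete, here is a minimal in-memory sketch in plain C# (no NEST or ElasticSearch involved). The Doc record and the Apply helper are hypothetical names introduced only for illustration: each set of values behaves like one terms filter (OR within a field), and requiring both checks to pass behaves like bool/must (AND across fields).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermsFilterSketch
{
    // Hypothetical in-memory document; Foo and Bar stand in for the indexed fields.
    public record Doc(string Foo, string Bar);

    // Each set plays the role of one terms filter (any value may match).
    // Both checks must pass, which plays the role of bool/must.
    public static List<Doc> Apply(IEnumerable<Doc> docs, ISet<string> fooTerms, ISet<string> barTerms) =>
        docs.Where(d => fooTerms.Contains(d.Foo) && barTerms.Contains(d.Bar)).ToList();

    public static void Main()
    {
        var docs = new[]
        {
            new Doc("something", "somethingElse"),
            new Doc("something", "other"),
            new Doc("other", "something"),
        };
        var terms = new HashSet<string> { "something", "somethingElse" };

        // Only the first document has both fields inside the allowed sets.
        foreach (var d in Apply(docs, terms, terms))
            Console.WriteLine($"{d.Foo} / {d.Bar}");
    }
}
```

This is the same shape as the SQL pseudo-code in the question: foo in (...) AND bar in (...).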
Updated: Building bool filters dynamically:
var filters = new List<Nest.FilterContainer>();
filters.Add(Nest.Filter<dynamic>.Terms("foo", new string[] { "something", "somethingElse" }));
// ... more filters
Then replace .Bool(b => b.Must(...)) with .Bool(b => b.Must(filters.ToArray())).
Hope it helps.
Related
I have the following collection and I want to query based on Class and FullName from Students
{
"id" : "ABCD",
"Class" : "Math",
"Students" : [
{
"FullName" : "Dan Smith"
},
{
"FullName" : "Dave Jackson"
}
]
}
The following filter works based on class.
var filter = builder.Eq(x => x.Class, "Math");
var document = await collection.Find(filter).FirstOrDefaultAsync();
But I also want to query based on the student. I tried to add another filter, and it fails with the error "Cannot implicitly convert type string to bool":
filter &= builder.Eq(x => x.Students.Any(y => y.FullName,"Dan"));
Because you want to query a nested document inside an array, you need the $elemMatch operator. In MongoDB .NET Driver syntax, you can do this in either of these ways:
Solution 1: ElemMatch with LINQ Expression
filter &= builder.ElemMatch(x => x.Students, y => y.FullName == "Dan");
Solution 2: ElemMatch with FilterDefinition
filter &= builder.ElemMatch(x => x.Students,
Builders<Student>.Filter.Eq(y => y.FullName, "Dan"));
The above methods will return no document, because the filter looks for a FullName that is exactly "Dan", and the attached document only contains "Dan Smith" and "Dave Jackson".
If you want to match part of the name, you need the $regex operator.
Solution: With regex match
filter &= builder.ElemMatch(x => x.Students,
Builders<Student>.Filter.Regex(y => y.FullName, "Dan"));
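The difference between the exact match and the regex match can be sketched in memory with plain LINQ, without a MongoDB server. The Course and Student records and the two helper methods below are hypothetical stand-ins for illustration only; they imitate what the two ElemMatch filters test, not how the driver evaluates them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ElemMatchSketch
{
    // Hypothetical in-memory shapes mirroring the sample document.
    public record Student(string FullName);
    public record Course(string Class, List<Student> Students);

    // Imitates ElemMatch with an equality predicate: an element must equal the value exactly.
    public static bool AnyExact(Course c, string fullName) =>
        c.Students.Any(s => s.FullName == fullName);

    // Imitates ElemMatch with an unanchored regex: an element only needs to contain a match.
    public static bool AnyRegex(Course c, string pattern) =>
        c.Students.Any(s => Regex.IsMatch(s.FullName, pattern));

    public static void Main()
    {
        var math = new Course("Math", new List<Student>
        {
            new Student("Dan Smith"),
            new Student("Dave Jackson"),
        });

        Console.WriteLine(AnyExact(math, "Dan"));       // False: no FullName is exactly "Dan"
        Console.WriteLine(AnyExact(math, "Dan Smith")); // True
        Console.WriteLine(AnyRegex(math, "Dan"));       // True: "Dan Smith" contains "Dan"
    }
}
```

Note that an unanchored pattern such as "Dan" would also match a name like "Daniel"; anchoring the pattern (for example "^Dan ") narrows the match if that matters.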
I have a list of students; each student can enter one or more addresses.
I have to find out whether any address appears in more than one student's list, and then retrieve the objects that share that address.
Below is an example of the list of objects:
[
{
"ID" : 1,
"Addresses": ["SRJ", "SJ"]
},
{
"ID" : 2,
"Addresses": ["SJ"]
},
{
"ID" : 3,
"Addresses": ["FRT", "FRI"]
},
{
"ID" : 4,
"Addresses": ["NR", "SJ"]
}
]
In the above list, SJ is repeated in three of the objects, so I have to return those three objects (IDs 1, 2 and 4) from the list.
I have already done this using loops, but I am looking for the most efficient way of doing this task.
Assuming each list item in your question is mapped to a class like below:
public class AddressRecord
{
public int ID { get; set; }
public List<string> Addresses { get; set; }
}
You can then use a Linq expression like below to find duplicates and construct an object that tracks all duplicate addresses and IDs those addresses belong to:
var result = list.SelectMany(x => x.Addresses).GroupBy(x => x).Where(x => x.Count() > 1)
.Select(x => new { x.Key, IDs = list.Where(y => y.Addresses.Contains(x.Key)).Select(y => y.ID).ToList() })
.ToList();
The first line of the LINQ expression flattens the address lists, groups the addresses, and keeps every address that occurs more than once (across all "Addresses" lists).
The second line then selects each duplicated address together with the IDs of the records that contain it. (Note that x in x.Key is the IGrouping object created by the GroupBy expression, and x.Key is the address the grouping was created on.)
The result object should look something like this when serialized:
[{"Key":"SJ","IDs":[1,2,4]}]
In terms of runtime performance, a hand-written for loop would almost certainly beat this expression. From a maintainability perspective, however, this LINQ expression is likely much easier to manage. (This does depend on the team's comfort level with LINQ.)
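For comparison, a hand-written loop of the kind mentioned above could look like the sketch below. It is one possible approach rather than the asker's actual loop: it makes a single pass over the records and collects IDs per address in a dictionary, instead of rescanning the list for every duplicated address. The nested AddressRecord record and the FindOverlaps name are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AddressOverlapSketch
{
    // Stand-in for the AddressRecord class, declared here so the sketch is self-contained.
    public record AddressRecord(int ID, List<string> Addresses);

    // Single pass: map each address to the IDs that list it, then keep shared addresses.
    public static Dictionary<string, List<int>> FindOverlaps(IEnumerable<AddressRecord> records)
    {
        var idsByAddress = new Dictionary<string, List<int>>();
        foreach (var record in records)
        {
            // Distinct() keeps a record that repeats its own address from counting twice.
            foreach (var address in record.Addresses.Distinct())
            {
                if (!idsByAddress.TryGetValue(address, out var ids))
                {
                    ids = new List<int>();
                    idsByAddress[address] = ids;
                }
                ids.Add(record.ID);
            }
        }
        return idsByAddress.Where(kv => kv.Value.Count > 1)
                           .ToDictionary(kv => kv.Key, kv => kv.Value);
    }

    public static void Main()
    {
        var list = new List<AddressRecord>
        {
            new AddressRecord(1, new List<string> { "SRJ", "SJ" }),
            new AddressRecord(2, new List<string> { "SJ" }),
            new AddressRecord(3, new List<string> { "FRT", "FRI" }),
            new AddressRecord(4, new List<string> { "NR", "SJ" }),
        };

        foreach (var kv in FindOverlaps(list))
            Console.WriteLine($"{kv.Key}: {string.Join(", ", kv.Value)}"); // SJ: 1, 2, 4
    }
}
```

Unlike the LINQ expression, this version counts an address once per record, so a single record listing the same address twice is not reported as an overlap.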
I assume that you have a given address and want to check if that exists somewhere else:
string givenAddress = "SJ";
List<Student> overlappingAddressesStudents = students
.Where(s => s.Addresses.Contains(givenAddress))
.ToList();
This might not be more efficient than your loop approach, but maybe it's more readable.
If I've got a JSON array and I want to extract one field from each object, it's fairly simple:
Data:
{
"Values": [
{
"Name": "Bill",
"Age": "25",
"Address": "1234 Easy St."
},
{
"Name": "Bob",
"Age": "28",
"Address": "1600 Pennsylvania Ave."
},
{
"Name": "Joe",
"Age": "31",
"Address": "653 28th St NW"
}
]
}
Query:
data.SelectTokens("Values[*].Name")
This will give me an array of all the names. But what if I want more than one field? Is there any way to get an array of objects containing names and addresses?
The obvious way is to run SelectTokens twice and then Zip them, but will that work? Are the two result arrays guaranteed to preserve the ordering of the original source data? And is there a simpler way that can do it with just one query?
You can use the union operator ['Name','Address'] to select the values of multiple properties simultaneously. However, at some point you're going to need to generate new objects containing just the required properties, for instance by grouping them by parent object:
var query = data.SelectTokens("Values[*]['Name','Address']")
.Select(v => (JProperty)v.Parent) // Get parent JProperty (which encapsulates name as well as value)
.GroupBy(p => p.Parent) // Group by parent JObject
.Select(g => new JObject(g)); // Create a new object with the filtered properties
While this works and uses only one JSONPath query, it feels a little overly complex. I'd suggest just selecting for the objects, then using a nested query to get the required properties like so:
var query = data.SelectTokens("Values[*]")
.OfType<JObject>()
.Select(o => new JObject(o.Property("Name"), o.Property("Address")));
Or maybe
var query = data.SelectTokens("Values[*]")
.Select(o => new JObject(o.SelectTokens("['Name','Address']").Select(v => (JProperty)v.Parent)));
I'm trying to write the query below using the fluent syntax of MongoDB. I'm using the latest .NET driver. I don't like the strings for naming the columns and would prefer to not have to do the Bson Serialization as well.
var collection = _mongoDbClient.GetDocumentCollection<JobResult>();
var bsonDocuments = collection.Aggregate()
.Group<BsonDocument>(new BsonDocument{ { "_id", "$RunDateTime" }, { "Count", new BsonDocument("$sum", 1) } })
.Sort(new BsonDocument { { "Count", -1 } })
.Limit(20)
.ToList();
foreach (var bsonDocument in bsonDocuments)
{
jobResultRunDateTimes.Add(BsonSerializer.Deserialize<JobResultRunDateTime>(bsonDocument));
}
The C# driver has a LINQ implementation that targets the MongoDB aggregation framework, so you should be able to write your query using standard LINQ operators.
The example below groups by an assumed property Id, takes the count of documents in each group, and then sorts. Here x is of type JobResult, i.e. the type you use when getting the collection.
var result = collection.AsQueryable()
.GroupBy(x => x.Id)
.Select(g => new { g.Key, count = g.Count() })
.OrderBy(a => a.Key)
.Take(1)
.ToList();
For a detailed reference and more examples, refer to the C# driver documentation.
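Applied to the shape of the original question (group by RunDateTime, sort by count descending, keep the top 20), the operator chain might look like the sketch below. It runs against an in-memory list, so it only demonstrates the LINQ operators themselves; whether the driver's LINQ provider translates this exact chain should be checked against the driver documentation. The JobResult and JobResultRunDateTime records here are assumed shapes, since the question does not show them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunDateTimeCountSketch
{
    // Assumed shapes; the real classes are not shown in the question.
    public record JobResult(DateTime RunDateTime);
    public record JobResultRunDateTime(DateTime RunDateTime, int Count);

    // Group by run time, count each group, and keep the most frequent run times first.
    public static List<JobResultRunDateTime> TopRunDateTimes(IEnumerable<JobResult> jobs, int limit) =>
        jobs.GroupBy(j => j.RunDateTime)
            .Select(g => new JobResultRunDateTime(g.Key, g.Count()))
            .OrderByDescending(r => r.Count)
            .Take(limit)
            .ToList();

    public static void Main()
    {
        var first = new DateTime(2020, 1, 1);
        var second = new DateTime(2020, 1, 2);
        var jobs = new[] { new JobResult(first), new JobResult(second), new JobResult(second) };

        foreach (var r in TopRunDateTimes(jobs, 20))
            Console.WriteLine($"{r.RunDateTime:yyyy-MM-dd}: {r.Count}");
    }
}
```

With a typed projection like this, no column-name strings or manual BsonSerializer calls are needed on the in-memory side.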
I have a simple document.
{
Name: "Foo",
Tags: [
{ Name: "Type", Value: "One" },
{ Name: "Category", Value: "A" },
{ Name: "Source", Value: "Example" },
]
}
I would like to make a LINQ query that can find these documents by matching multiple Tags.
i.e. Not a SQL query, unless there is no other option.
e.g.
var tagsToMatch = new List<Tag>()
{
new Tag("Type", "One"),
new Tag("Category", "A")
};
var query = client
.CreateDocumentQuery<T>(documentCollectionUri)
.Where(d => tagsToMatch.All(tagToMatch => d.Tags.Any(tag => tag == tagToMatch)));
Which gives me the error "Method 'All' is not supported."
I have found examples where a single property on the child object is being matched: LINQ Query Issue with using Any on DocumentDB for child collection
var singleTagToMatch = tagsToMatch.First();
var query = client
.CreateDocumentQuery<T>(documentCollectionUri)
.SelectMany
(
d => d.Tags
.Where(t => t.Name == singleTagToMatch.Name && t.Value == singleTagToMatch.Value)
.Select(t => d)
);
But it's not obvious how that approach can be extended to support matching multiple child objects.
I found there's a function called ARRAY_CONTAINS which can be used: Azure DocumentDB ARRAY_CONTAINS on nested documents
But all the examples I came across are using SQL queries.
This thread indicates that LINQ support was "coming soon" in 2015, but it was never followed up so I assume it wasn't added.
I haven't come across any documentation for ARRAY_CONTAINS in LINQ, only in SQL.
I tried the following SQL query to see if it does what I want, and it didn't return any results:
SELECT Document
FROM Document
WHERE ARRAY_CONTAINS(Document.Tags, { Name: "Type", Value: "One" })
AND ARRAY_CONTAINS(Document.Tags, { Name: "Category", Value: "A" })
According to the comments on this answer, ARRAY_CONTAINS only works on arrays of primitives, not objects. So it appears not to be suited for what I want to achieve.
It seems the comments on that answer are wrong, and I had syntax errors in my query. I needed to add double quotes around the property names.
Running this query did return the results I wanted:
SELECT Document
FROM Document
WHERE ARRAY_CONTAINS(Document.Tags, { "Name": "Type", "Value": "One" })
AND ARRAY_CONTAINS(Document.Tags, { "Name": "Category", "Value": "A" })
So ARRAY_CONTAINS does appear to achieve what I want, so I'm looking for how to use it via the LINQ syntax.
Using .Contains in the LINQ query will generate SQL that uses ARRAY_CONTAINS.
So:
var tagsToMatch = new List<Tag>()
{
new Tag("Type", "One"),
new Tag("Category", "A")
};
var singleTagToMatch = tagsToMatch.First();
var query = client
.CreateDocumentQuery<T>(documentCollectionUri)
.Where(d => d.Tags.Contains(singleTagToMatch));
Will become:
SELECT * FROM root WHERE ARRAY_CONTAINS(root["Tags"], {"Name":"Type","Value":"One"})
You can chain .Where calls to create a chain of AND predicates.
So:
IQueryable<T> query = client.CreateDocumentQuery<T>(documentCollectionUri);
foreach (var tagToMatch in tagsToMatch)
{
query = query.Where(s => s.Tags.Contains(tagToMatch));
}
Will become:
SELECT * FROM root WHERE ARRAY_CONTAINS(root["Tags"], {"Name":"Type","Value":"One"}) AND ARRAY_CONTAINS(root["Tags"], {"Name":"Category","Value":"A"})
If you need to chain the predicates using OR then you'll need some expression predicate builder library.
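For the OR case, the core of such a predicate builder can be sketched with System.Linq.Expressions alone. This is a minimal illustration, not a replacement for a maintained library: the Or helper and ParameterReplacer class are hypothetical names, and the sketch is only exercised against an in-memory IQueryable. Whether the DocumentDB LINQ provider translates the combined expression is an assumption that would need testing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class OrPredicateSketch
{
    // Rewrites one lambda parameter into another so both bodies share a single parameter.
    private sealed class ParameterReplacer : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly ParameterExpression _to;

        public ParameterReplacer(ParameterExpression from, ParameterExpression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node) =>
            node == _from ? _to : base.VisitParameter(node);
    }

    // Combines two predicates into a single "left || right" expression tree.
    public static Expression<Func<T, bool>> Or<T>(
        Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
    {
        var rightBody = new ParameterReplacer(right.Parameters[0], left.Parameters[0]).Visit(right.Body);
        return Expression.Lambda<Func<T, bool>>(Expression.OrElse(left.Body, rightBody), left.Parameters);
    }

    public static void Main()
    {
        var wanted = new[] { "Type", "Category" };

        // One predicate per wanted value, folded together with Or.
        var predicates = wanted
            .Select(name => (Expression<Func<string, bool>>)(s => s == name))
            .ToList();
        var combined = predicates.Aggregate((a, b) => Or(a, b));

        var matches = new[] { "Type", "Source", "Category" }.AsQueryable().Where(combined).ToList();
        Console.WriteLine(string.Join(", ", matches)); // Type, Category
    }
}
```

Aggregate without a seed throws on an empty list, so a caller would need to decide what an empty set of predicates should mean before folding them.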