In my (C# + SQL Server) application, the user will be able to define rules over data such as:
ITEM_ID = 1
OR (ITEM_NAME LIKE 'something' AND ITEM_PRICE > 123
AND (ITEM_WEIGHT = 456 OR ITEM_HEIGHT < 789))
The set of items to validate will always be different but they are not a huge number. However, the number of rules is high (let's say, 100000).
How can I check, with performance in mind, which rules a given set of items satisfies?
It looks like your "rules" or conditions should be evaluated in C# instead.
If you are really going to feed 100,000 ORs and ANDs into the WHERE clause of your SQL statement, you are going to have a very hard time scaling your application. I can only imagine the mess of indexes you would need for every permutation of 100,000 arbitrary conditions to perform well against the data set.
Instead, I would run a basic select query, read each row, and filter it in C#. Then you can track which conditions/rules do and don't pass for each row by applying each rule individually and recording pass/fail.
Of course, if you are querying a very large table, then performance could become an issue, but you stated that "The set of items to validate ... are not a huge number" so I assume it would be relatively quick to bring back all the data for the table and perform your rules in code, or apply some fundamental filtering up front, then more specific filtering back in code.
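The per-row, per-rule tracking described above could look roughly like the following. This is a minimal sketch, not a definitive design: the Item, Rule and RuleEngine types are hypothetical names introduced here, and it assumes each user rule has already been translated into a C# predicate (how that translation happens is not covered).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item shape; the property names mirror the columns in the example rule.
public sealed class Item
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
    public decimal Weight { get; set; }
    public decimal Height { get; set; }
}

// A rule is an identifier plus a predicate evaluated against one item.
public sealed class Rule
{
    public Rule(int id, Func<Item, bool> predicate)
    {
        Id = id;
        Predicate = predicate;
    }

    public int Id { get; }
    public Func<Item, bool> Predicate { get; }
}

public static class RuleEngine
{
    // Returns, for each item id, the ids of the rules that the item satisfies.
    // Items are assumed to have unique ids; a repeated id overwrites the earlier entry.
    public static Dictionary<int, List<int>> Evaluate(
        IEnumerable<Item> items, IReadOnlyList<Rule> rules)
    {
        var passed = new Dictionary<int, List<int>>();
        foreach (var item in items)
        {
            passed[item.Id] = rules
                .Where(rule => rule.Predicate(item))
                .Select(rule => rule.Id)
                .ToList();
        }
        return passed;
    }
}
```

The example rule from the question would be expressed as new Rule(1, i => i.Id == 1 || (i.Name == "something" && i.Price > 123 && (i.Weight == 456 || i.Height < 789))). With a small item set, even 100,000 such predicates amount to a plain in-memory loop.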
Out of curiosity, how are the users entering these "rules", like:
ITEM_ID = 1
OR (ITEM_NAME LIKE 'something' AND ITEM_PRICE > 123
AND (ITEM_WEIGHT = 456 OR ITEM_HEIGHT < 789))
Please please please tell me they aren't entering actual SQL queries (in text form) and you are just appending them together, like:
var sql = "select * from myTable where ";
foreach(var rule in rules)
sql += rule;
Maybe some kind of rule-builder UI that builds up these SQL-looking statements?
You could use some of Microsoft's own parsing engine for T-SQL.
You can find them in the assemblies Microsoft.Data.Schema.ScriptDom.dll and Microsoft.Data.Schema.ScriptDom.Sql.dll.
TSql100Parser parser = new TSql100Parser(false);
IList<ParseError> errors;
Expression expr = parser.ParseBooleanExpression(
new StringReader(condition),
out errors
);
if (errors != null && errors.Count > 0)
{
// Error handling
return;
}
If you don't get any errors, the string is a valid filter expression. Though there might be some harmful expressions.
If you wish, you could run the expression through your own visitor to detect any unwanted constructs (such as sub-queries). But be aware that you would have to override almost all of the 650 overloads, for both Visit(...) and ExplicitVisit(...). Partial classes would be good here.
When you are satisfied, you could then build a complete SELECT statement, with all of the expressions:
var schemaObject = new SchemaObjectName();
schemaObject.Identifiers.Add(new Identifier {Value = "MyTable"});
var queryExpression = new QuerySpecification();
queryExpression.FromClauses.Add(
new SchemaObjectTableSource {SchemaObject = schemaObject});
// Add the expression from before (repeat as necessary)
Literal zeroLiteral = new Literal
{
LiteralType = LiteralType.Integer,
Value = "0",
};
Literal oneLiteral = new Literal
{
LiteralType = LiteralType.Integer,
Value = "1",
};
WhenClause whenClause = new WhenClause
{
WhenExpression = expr, // <-- here
ThenExpression = oneLiteral,
};
CaseExpression caseExpression = new CaseExpression
{
ElseExpression = zeroLiteral,
};
caseExpression.WhenClauses.Add(whenClause);
queryExpression.SelectElements.Add(caseExpression);
var selectStatement = new SelectStatement {QueryExpression = queryExpression};
... and turn it all back into a string:
var generator = new Sql100ScriptGenerator();
string query;
generator.GenerateScript(selectStatement, out query);
Console.WriteLine(query);
Output:
SELECT CASE WHEN ITEM_ID = 1
OR (ITEM_NAME LIKE 'something'
AND ITEM_PRICE > 123
AND (ITEM_WEIGHT = 456
OR ITEM_HEIGHT < 789)) THEN 1 ELSE 0 END
FROM MyTable
If this expression gets too large to handle, you could always split up the rules into chunks, to run a few at a time.
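Splitting the rules into chunks could be sketched as below. RuleBatcher is a hypothetical helper name, and the sketch only groups the rule strings; each batch would then go through the parse and CaseExpression steps shown earlier, one SELECT statement per batch.

```csharp
using System;
using System.Collections.Generic;

public static class RuleBatcher
{
    // Splits the rule expressions into consecutive batches of at most batchSize entries.
    public static List<List<string>> Split(IReadOnlyList<string> rules, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize), "Batch size must be positive.");

        var batches = new List<List<string>>();
        for (int start = 0; start < rules.Count; start += batchSize)
        {
            // The last batch may be shorter than batchSize.
            int count = Math.Min(batchSize, rules.Count - start);
            var batch = new List<string>(count);
            for (int i = start; i < start + count; i++)
                batch.Add(rules[i]);
            batches.Add(batch);
        }
        return batches;
    }
}
```

A suitable batch size is something to measure rather than guess; it depends on how large a statement the server handles comfortably.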
Though, to be allowed to redistribute the Microsoft.Data.Schema.ScriptDom.*.dll files, you have to own a licence of Visual Studio Team System (Is this included in at least VS Pro/Ultimate?)
Link: http://blogs.msdn.com/b/gertd/archive/2008/08/22/redist.aspx
I'm working with an application that features Kendo grid and a MongoDB database. The column sorting isn't working properly. The sorting is done on our backend. It's C# and we use the MongoDB C# driver.
I'm very new to MongoDB C# driver and SortDefinition. But I know that the code in our backend that handles the sorting is this:
public async Task<QueryResult> ODataQueryAsync(string filter, string orderBy, int? skip, int? top, string @select = null, string expand = null, CancellationToken cancellationToken = default)
{
var parser = oDataParserProvider.Get<TDto>(filter, orderBy, select, expand);
var filterClause = parser.ParseFilter();
var orderByClause = parser.ParseOrderBy();
var filterDefinition = filterClause.GetFilterDefinition<TDto>();
var sortDefinition = orderByClause.GetSortDefinition<TDto>();
var query = Collection
.Find(filterDefinition)
.Sort(sortDefinition);
...
}
The important lines are:
1) var parser = oDataParserProvider.Get<TDto>(filter, orderBy, select, expand), which creates a parser based on the query data: filter, orderBy, select, and expand.
2) var orderByClause = parser.ParseOrderBy(), which creates an orderByClause from the parser; it simply contains the information about which field to sort by and whether it's ascending or descending.
3) var sortDefinition = orderByClause.GetSortDefinition<TDto>(), which creates a SortDefinition out of orderByClause.
4) var query = Collection.Find(filterDefinition).Sort(sortDefinition), which uses the SortDefinition to create/run the query.
There's also the GetSortDefinition() method which looks like this:
public static SortDefinition<T> GetSortDefinition<T>(this OrderByClause sort)
{
var visitor = new MongoDbQueryJsonNodeVisitor();
var builder = new SortDefinitionBuilder<T>();
var current = sort;
var sortDefinitions = new List<SortDefinition<T>>();
while (current != null)
{
var field = current.Expression.Accept(visitor);
switch (current.Direction)
{
case OrderByDirection.Ascending:
sortDefinitions.Add(builder.Ascending(field));
break;
case OrderByDirection.Descending:
sortDefinitions.Add(builder.Descending(field));
break;
}
current = current.ThenBy;
}
return builder.Combine(sortDefinitions);
}
This seems to create SortDefinitions with only two pieces of information: the field to sort by and whether it's ascending or descending (just like the OrderByClause), and then combines them into one SortDefinition which is returned.
But there are columns in the grid that don't sort right. For example, we have a column whose data is numbers but its type is string, so the SortDefinitionBuilder sorts it alphabetically. We get the result:
1.08
1.09
12
13
2.01
^ For this case, is there a way to tell the SortDefinitionBuilder to sort numerically, or create a custom SortDefinition that sorts numerically?
There's also this:
006
011
AAA
BBB
CCC
#
(
+
-
0-001
0-002
aaa
bbb
ccc
^ This is the result of sorting a column alphabetically in ascending order. But the sorting seems to take numbers first, then upper case letter, then symbols (#, (, +, -), then numbers again (0-001, 0-002), and finally lower case letters. I have no idea what this sorting algorithm is doing. I'd like it to ignore case so the upper case and lower case letters aren't separated, and I'd like to move the symbols either to the beginning or end, not between upper case and lower case.
Things like this make me want to create my own sorting algorithm and give it to the SortDefinition or SortDefinitionBuilder. Is it possible to do this? Thanks.
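For illustration only, a custom ordering of the kind described above can be written as an in-memory comparer. This is a sketch under assumptions, not a SortDefinition: GridValueComparer is a hypothetical name, it only applies once the values are already loaded in C#, and it would not give correct results if combined with server-side paging (skip/top). On the server side, MongoDB's collation options (numericOrdering, strength) may be the closer fit; the sketch does not cover them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class GridValueComparer : IComparer<string>
{
    private const NumberStyles Styles =
        NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint;

    public int Compare(string x, string y)
    {
        x = x ?? string.Empty;
        y = y ?? string.Empty;

        bool xIsNumber = decimal.TryParse(x, Styles, CultureInfo.InvariantCulture, out decimal xNumber);
        bool yIsNumber = decimal.TryParse(y, Styles, CultureInfo.InvariantCulture, out decimal yNumber);

        // Numeric strings form their own group ahead of all other values,
        // which keeps the ordering consistent across mixed inputs.
        if (xIsNumber != yIsNumber)
            return xIsNumber ? -1 : 1;

        if (xIsNumber)
        {
            // Both values are numbers: compare by numeric value.
            int byValue = xNumber.CompareTo(yNumber);
            if (byValue != 0) return byValue;
        }
        else
        {
            // Neither value is a number: compare text without regard to case.
            int ignoringCase = string.Compare(x, y, StringComparison.OrdinalIgnoreCase);
            if (ignoringCase != 0) return ignoringCase;
        }

        // Tie-break on the raw text so equal-looking values sort deterministically.
        return string.CompareOrdinal(x, y);
    }
}
```

With this comparer, symbols sort ahead of letters and upper and lower case letters are no longer separated; whether numbers should come first or last is a design choice that can be flipped in the group check.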
I have the following query in Cypher:
Match(n1: Red)
Where n1.Id = "someId"
Call apoc.path.subgraphAll(n1,{ minLevel: 0,maxLevel: 100,relationshipFilter: "link",labelFilter: "+Red|Blue"})
Yield nodes, relationships
Return nodes, relationships
The graph I query has roughly a structure of "Red -> Blue -> Red" where all the edges are of the type "link".
The query yields exactly the expected result in the browser client.
My C# looks like this:
string subgraphAll = "apoc.path.subgraphAll";
object optionsObj = new {
minLevel = 0,
maxLevel = 100,
relationshipFilter = $"{link}",
labelFilter = $"+{Red}|{Blue}",
beginSequenceAtStart = "true",
bfs = true,
filterStartNode = false,
limit = -1,
//endNodes = null,
//terminatorNodes = null,
//whitelistNodes = null,
//blacklistNodes = null,
};
string options = JObject.FromObject(optionsObj).ToString();
var query = client.Cypher
.Match($"(n1:{"Red"})")
.Where((Red n1) => n1.Id == "someId")
.Call($"{subgraphAll}(n1, {options})")
.Yield($"nodes,relationships")
//FigureOut what to do
.Return<Object>("");
var result = query.ResultsAsync.Result;
My question is: How would I write that in C# with the Neo4J client and how do I get typesafe lists at the end (something like List<Red>, List<Blue>, List<Relationship>).
As Red and Blue are different types in C#, I don't see how I can deserialize the mixed "nodes" list from the query.
Note that my examples are a bit simplified. The Nodetypes are not strings but come from Enums in my application to have a safe way to know what node types exist and there are real models behind those types.
I tried to break out the whole parametrization of the stored proc, but the code is untested and I don't know if there is a better solution to do this yet. If there is a better way, please advise on that too.
I am new to cypher, so I need a little help here.
My idea was to split the nodes list into two lists (a Red list and a Blue list) and then output the three lists as properties of an anonymous object (as in the examples). Unfortunately, my Cypher isn't good enough to figure it out yet, and translating to the C# syntax at the same time doesn't help either.
My main concern is that once I deserialize into a list of untyped objects, it will be hell to parse them back into my models. So I want the query to do that sorting out for me.
In my view, if you want to go down the route of parsing the outputs into Red/Blue classes, it's going to be easier to do it in C# than in Cypher.
Unfortunately, in this case I also think it'll be easier to execute the query using the Neo4j.Driver driver instead of Neo4jClient, because at the moment Neo4jClient seems to remove the id (etc.) properties you'd need to be able to rebuild the graph properly.
With 4.0.3 of the Client you can access the Driver by doing this:
((BoltGraphClient)client).Driver
I have used a 'Movie/Person' example, as it's a dataset I had to hand, but the principles are the same, something like:
var queryStr = @"
Match(n1: Movie)
Where n1.title = 'The Matrix'
Call apoc.path.subgraphAll(n1,{ minLevel: 0,maxLevel: 2,relationshipFilter: 'ACTED_IN',labelFilter: '+Movie|Person'})
Yield nodes, relationships
Return nodes, relationships
";
var movies = new List<Movie>();
var people = new List<Person>();
var session = ((BoltGraphClient)client).Driver.AsyncSession();
var res = await session.RunAsync(queryStr);
await res.FetchAsync();
foreach (var node in res.Current.Values["nodes"].As<List<INode>>())
{
//Assumption of one label per node.
switch(node.Labels.Single().ToLowerInvariant()){
case "movie":
movies.Add(new Movie(node));
break;
case "person":
/* similar to above */
break;
default:
throw new ArgumentOutOfRangeException("node", node.Labels.Single(), "Unknown node type");
}
}
With Movie etc defined as:
public class Movie {
public long Id {get;set;}
public string Title {get;set;}
public Movie(){}
public Movie(INode node){
Id = node.Id;
Title = node.Properties["title"].As<string>();
}
}
The client not pulling back ids etc. is a problem I need to look at how to fix, but this is the quickest way short of that to get where you want to be.
I have done as much research as sanely possible and could not come to a conclusion that worked for me. I have a query I use in Mongo to grab only the _ids that end with a certain character. These are standard base-16 ObjectIds in MongoDB. The goal is to divide up work so that each process doing the query only works on ids that end in its assigned base-16 digit. So I would have 16 processes, each responsible for running its instructions against documents whose _id ends with one of 0-f.
db.Collection.find({ $where : "this._id.toString().match(/(SomeNumber)..$/i)"})
My question is what would be the C# driver equivalent of this I could use with other filters such as Builders.Filter.Lte("Somefield", someVal) and then .FindOne()?
My final code would look something like:
var filter1 = Builders<BsonDocument>.Filter.Where(Id matches regex);
var filter2 = Builders<BsonDocument>.Filter.Lte("Somefield", someVal);
var qfilter = filter1 & filter2;
var returndoc = Collection.FindOne(qfilter);
In hopes that this will return one document at a time where the last value of my object id matches the one I am interested in and also has Somefield less than someVal.
This is what I have tried thus far.
//This one doesn't work because objectId does not support regex comparison
var iDFilter = Builders<BsonDocument>.Filter.Regex("_id", _threadIndexB16);
//These two never returned any values
var iDFilter = Builders<BsonDocument>.Filter.Where("this._id.toString().match(/\"" + _myWantedIndex + "..$/i)\")";
var iDFilter = new BsonDocument("$where", "this._id.toString().match(/" + _myWantedIndex + "..$/i)");
Help converting this would be appreciated. Alternatives are also welcome as long as the result is the document I am after.
I have a document that looks essentially like this:
{
"Name": "John Smith",
"Value": "SomethingIneed",
"Tags": ["Tag1", "Tag2", "Tag3"]
}
My goal is to write a query that finds all documents in my database whose Tags property contains all of the tags in a filter.
For example, in the case above, my query might be ["Tag1", "Tag3"]. I want all documents whose tags collection contains Tag1 AND Tag3.
I have done the following:
1. Tried an All/Contains-type LINQ query
var tags = new List<string>() {"Test", "TestAccount"};
var req =
Client.CreateDocumentQuery<Contact>(UriFactory.CreateDocumentCollectionUri("db", "collection"))
.Where(x => x.Tags.All(y => tags.Contains(y)))
.ToList();
2. Created a user-defined function (I couldn't get this to work at all)
var tagString = "'Test', 'TestAccount'";
var req =
Client.CreateDocumentQuery<Contact>(UriFactory.CreateDocumentCollectionUri("db", "collection"),
$"Select c.Name, c.Email, c.id from c WHERE udf.containsAll([${tagString}] , c.Tags)").ToList();
with containsAll defined as:
function arrayContainsAnotherArray(needle, haystack){
for(var i = 0; i < needle.length; i++){
if(haystack.indexOf(needle[i]) === -1)
return false;
}
return true;
}
3. Used System.Linq.Dynamic to create a predicate from a string
var query = new StringBuilder("ItemType = \"MyType\"");
if (search.CollectionValues.Any())
{
foreach (var searchCollectionValue in search.CollectionValues)
{
query.Append($" and Collection.Contains(\"{searchCollectionValue}\")");
}
}
Option 3 actually worked for me, but the query was very expensive (more than 2000 RUs on a collection of 10K documents) and I am getting throttled like crazy. The first iteration of my application must be able to support result sets of 10K documents. How can I best query for a large number of results with an array of filters?
Thanks.
The UDF could be made to work but it would be a full table scan and so not recommended unless combined with other highly-selective criteria.
I believe the most performant (index-using) approach would be to split it into a series of AND statements. You could do this by programmatically building up your query string (being careful to fully escape any user-provided data for security reasons). The resulting query would look like:
SELECT *
FROM c
WHERE
ARRAY_CONTAINS(c.Tags, "Tag1") AND
ARRAY_CONTAINS(c.Tags, "Tag3")
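Building that query programmatically might look like the sketch below. It uses query parameters (@tag0, @tag1, ...) rather than escaping, so user-provided tags never become part of the query text. TagQueryBuilder is a hypothetical helper; it returns the text and the parameter values as plain .NET types, on the assumption that they are then handed to the SDK's parameterized query type (SqlQuerySpec with SqlParameter entries in the DocumentDB SDK, as far as I know).

```csharp
using System;
using System.Collections.Generic;

public static class TagQueryBuilder
{
    // Builds "SELECT * FROM c WHERE ARRAY_CONTAINS(c.Tags, @tag0) AND ..." together
    // with the parameter values keyed by parameter name.
    public static (string QueryText, Dictionary<string, string> Parameters) Build(
        IReadOnlyList<string> tags)
    {
        if (tags == null || tags.Count == 0)
            throw new ArgumentException("At least one tag is required.", nameof(tags));

        var parameters = new Dictionary<string, string>();
        var conditions = new List<string>();
        for (int i = 0; i < tags.Count; i++)
        {
            // One parameter and one ARRAY_CONTAINS condition per requested tag.
            string name = "@tag" + i;
            parameters[name] = tags[i];
            conditions.Add($"ARRAY_CONTAINS(c.Tags, {name})");
        }

        string queryText = "SELECT * FROM c WHERE " + string.Join(" AND ", conditions);
        return (queryText, parameters);
    }
}
```

For the tags "Tag1" and "Tag3" this produces the same shape as the query above, with @tag0 and @tag1 in place of the literals.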
I am trying to figure out how to give more weight to a description that contains the same word multiple times, so that it appears first in the results, using Lucene.NET in C#.
Example:
Pre-condition:
Let's say I have a list of items like this:
1. Restore Exchange
2. Backup exchange
3. exchange is a really great tool, exchange can have many mailboxes
Scenario:
I search for exchange.
The list would be returned in this order:
1. (it has the same weight as 2 and it was added to the index first)
2. (it has the same weight as 1 and it was added to the index second)
3. (has a reference of exchange in it, but its length is greater than 1 and 2)
So I am trying to get #3 to show up first as it has exchange in the description more than once.
Here is some code showing that I set the Similarity:
// set up lucene searcher
using (var searcher = new IndexSearcher(directory, false))
{
var hits_limit = 1000;
var analyzer = new StandardAnalyzer(Version.LUCENE_29);
searcher.Similarity = new test();
// search by single field
if (!string.IsNullOrEmpty(searchField))
{
var parser = new QueryParser(Version.LUCENE_29, searchField, analyzer);
var query = parseQuery(searchQuery, parser);
var hits = searcher.Search(query, hits_limit).ScoreDocs;
var results = mapLuceneToDataList(hits, searcher);
analyzer.Close();
searcher.Dispose();
return results;
}
// search by multiple fields (ordered by RELEVANCE)
else
{
var parser = new MultiFieldQueryParser
(Version.LUCENE_29, new[] { "Id", "Name", "Description" }, analyzer);
var query = parseQuery(searchQuery, parser);
var hits = searcher.Search
(query, null, hits_limit, Sort.RELEVANCE).ScoreDocs;
var results = mapLuceneToDataList(hits, searcher);
analyzer.Close();
searcher.Dispose();
return results;
}
}
Disclaimer: I can only speak about Lucene (and not Lucene.NET) but I believe they are built using the same principles.
The reason documents #1 and #2 come up first is that their field weights (1/2 for #1, 1/2 for #2) are higher than the 2/11 for #3 (assuming you are not using stop words). The point here is that the "exchange" term in the first two documents has far more weight than in the third, where it's more diluted. This is how the default similarity algorithm works. In practice this is a bit more complex, as you can observe in the given link.
So what you are asking for is an alternative similarity algorithm. There's a similar discussion here where MySim, I believe, attempts to achieve something close to what you want. Just don't forget to set this similarity instance to both index writer and searcher.
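To put numbers on the two points above, here is a small worked calculation. It assumes the classic TF-IDF formula used by Lucene's DefaultSimilarity, where tf = sqrt(freq) and lengthNorm = 1 / sqrt(numTerms); idf and the query norm are identical for all three documents and are left out. ClassicScoring is a hypothetical helper for the arithmetic, not part of Lucene.NET, and real scores are coarser because Lucene stores the norm in a single byte.

```csharp
using System;

public static class ClassicScoring
{
    // Term-frequency factor of the classic similarity: sqrt(freq).
    public static double Tf(int freq) => Math.Sqrt(freq);

    // Length normalization of the classic similarity: 1 / sqrt(number of terms in the field).
    public static double LengthNorm(int numTerms) => 1.0 / Math.Sqrt(numTerms);

    // Per-document part of the score for one term. Passing useLengthNorm = false
    // mimics an alternative similarity that ignores field length.
    public static double FieldWeight(int freq, int numTerms, bool useLengthNorm = true)
        => Tf(freq) * (useLengthNorm ? LengthNorm(numTerms) : 1.0);
}
```

With length normalization, documents #1 and #2 (1 occurrence in 2 terms) get about 0.707, while #3 (2 occurrences in 11 terms) gets about 0.426, so #3 ranks last. With the length norm neutralized, #1 and #2 get 1.0 and #3 gets about 1.414, which is the ordering the question asks for.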