Allow user to enter equation used to evaluate telemetry data - C#

I currently have sensor data being dumped into a database. This is raw data, and it needs an equation applied to it before it makes any sense to the end users. The problem is that I do not know most of the formulas yet, and I would also like the program to be flexible enough that, when a new sensor is added to the system, the user can enter the calibration equation that converts the raw data into something useful.
I have never worked with letting a user enter an equation to manipulate data, and I would appreciate any input that might help. What direction should I be looking in: lambda expression trees, compiling the equation using CodeDom, or something else? I have never done much with either lambda expression trees or CodeDom and, as always, I am on a fairly tight schedule, so the learning curve does count. I will have the opportunity to go back and make it better at a later date; they just need it up and running for now.
Thanks for any input.

I highly recommend FLEE for expression parsing/evaluation. It has a custom IL compiler that emits fast IL that doesn't have the memory problems that CodeDOM has.
It also has the desirable attribute of being easy to code with and extend.

I think you need to see what works for you. I also thought of those two options, only to find that you had already mentioned them. I think the other alternative is to store the parameters of a few major formula types (i.e. cubic, quadratic, exponential, log, ...) and select one of them as the formula to be used.
I would personally use expression trees because they are the cleanest option. One problem with CodeDom is the memory leak caused by compiling code, especially if the user changes the code and builds the formula multiple times. One solution would be to load the compiled code in a separate AppDomain and then unload the whole AppDomain.
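To illustrate the stored-parameters idea combined with expression trees, here is a minimal sketch. It assumes the calibration is a polynomial whose coefficients are stored per sensor; BuildPolynomial and the sample coefficients are hypothetical, not something from the question.

```csharp
using System;
using System.Linq.Expressions;

// Builds raw => c[n]*raw^n + ... + c[1]*raw + c[0] as an expression tree
// (in Horner form) and compiles it to a delegate. The coefficients are
// ordered from the constant term upward.
Func<double, double> BuildPolynomial(double[] coefficients)
{
    ParameterExpression raw = Expression.Parameter(typeof(double), "raw");
    Expression body = Expression.Constant(0.0);
    for (int i = coefficients.Length - 1; i >= 0; i--)
    {
        body = Expression.Add(
            Expression.Multiply(body, raw),
            Expression.Constant(coefficients[i]));
    }
    return Expression.Lambda<Func<double, double>>(body, raw).Compile();
}

// Hypothetical linear sensor: value = 0.5 * raw + 2
Func<double, double> calibrate = BuildPolynomial(new[] { 2.0, 0.5 });
Console.WriteLine(calibrate(10.0));   // 0.5 * 10 + 2 = 7
```

The delegate is compiled once per sensor and can then be applied to every raw reading, so only the coefficients need to live in the database.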

Related

Implement Language Auto-Completion based on ANTLR4 Grammar

I am wondering if there are any examples (I haven't found any by googling) of TAB auto-complete solutions for a command line interface (console) that use ANTLR4 grammars to predict the next term (as in a REPL model).
I've written a PL/SQL grammar for an open source database, and now I would like to implement a command line interface to the database that lets the user complete statements according to the grammar, or eventually discover the proper database object name to use (e.g. a table name, a trigger name, the name of a column, etc.).
Thanks for pointing me in the right direction.
Actually it is possible! (Of course, it depends on the complexity of your grammar.) The problem with auto-completion and ANTLR is that you want to parse an expression that is not yet complete. If you had a complete expression, it would be no big problem to know what kind of element is at what place and what can be used there. But you do not have a complete expression, and you cannot parse an incomplete one. So what you need to do is wrap the input in some wrapper/helper that completes the expression to create a parseable one. Notice that nothing that is added only to complete the expression matters to you: you will only ask for members up to the last character actually typed.
So:
A) Create the wrapper that will change this (excel formula) '=If(' into '=If()'
B) Parse the wrapped input
C) Realize that you are in the IF function at the first parameter
D) Return all that can go into that place.
It actually works; I have built IntelliSense editors for several simple languages. There is much more infrastructure than this, but the basic idea is as I wrote it. Be careful, though: writing the wrapper is hard, if not impossible, when the grammar is really complex. In that case look at the Papa Carlo project: http://lakhin.com/projects/papa-carlo/
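Step A can be sketched roughly as below, assuming a language in which only parentheses, brackets, and double-quoted strings need closing. CompleteExpression is a made-up helper for illustration, and a real grammar would need more cases than this.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Appends whatever closing delimiters are still missing so that the
// partial input becomes parseable. Text inside string literals is ignored.
string CompleteExpression(string partial)
{
    var closers = new Stack<char>();
    bool inString = false;
    foreach (char c in partial)
    {
        if (c == '"') { inString = !inString; continue; }
        if (inString) continue;
        switch (c)
        {
            case '(': closers.Push(')'); break;
            case '[': closers.Push(']'); break;
            case ')':
            case ']':
                if (closers.Count > 0 && closers.Peek() == c) closers.Pop();
                break;
        }
    }

    var completed = new StringBuilder(partial);
    if (inString) completed.Append('"');
    while (closers.Count > 0) completed.Append(closers.Pop());
    return completed.ToString();
}

Console.WriteLine(CompleteExpression("=If("));   // =If()
```

The position where the user stopped typing is still the length of the original input, so everything appended after it can be ignored when asking what is allowed at the caret.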
As already mentioned auto completion is based on the follow set at a given position, simply because this is what we defined in the grammar to be valid language. But that's only a small part of the task. What you need is context (as Sam Harwell wrote: it's a semantic process, not a syntactic one). And this information is independent of the parser. And since a parser is made to parse valid input (and during auto completion you have most of the time invalid input), it's not the right tool for this task.
Knowing what token can follow at a given position is useful to control the entire process (e.g. you don't want to show suggestions if only a string can appear), but is most of the time not what you actually want to suggest (except for keywords). If an ID is possible at the current position, it doesn't tell you what ID is actually allowed (a variable name? a namespace? etc.). So what you need is essentially 3 things:
1) A symbol table that provides you with all possible names sorted by scope. Creating this depends heavily on the parsed language. But this is a task where a parser is very helpful. You may want to cache this info as it is time consuming to run this analysis step.
2) Determine in which scope you are when invoking auto completion. You could use a parser as well here (maybe in conjunction with step 1).
3) Determine what type of symbol(s) you want to show. Many people think this is where a parser can give you all necessary information (the follow set). But as mentioned above that's not true (keywords aside).
In my blog post Universal Code Completion using ANTLR3 I especially addressed the 3rd step. There I don't use a parser, but simulate one, only that I don't stop when a parser would, but when the caret position is reached (so it is essential that the input must be valid syntax up to that point). After reaching the caret the collection process starts, which not only collects terminal nodes (for keywords) but looks at the rule names to learn what needs to be collected too. Using specific rule names is my way there to put context into the grammar, so when the collection code finds a rule table_ref it knows that it doesn't need to go further down the rule chain (to the ultimate ID token), but instead can use this information to provide a list of tables as suggestion.
With ANTLR4 things might become even simpler. I haven't used it myself yet, but the parser interpreter could be a big help here, as it is essentially doing what I do manually in my implementation (with the ANTLR3 backend).
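As a rough illustration of how a rule name such as table_ref can select what to suggest, here is a minimal sketch. The symbol table contents, the ruleToKind mapping, and the Suggest helper are all hypothetical; this is not the code from the blog post, and it leaves scope handling out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical symbol table: symbol kind -> names known in the current scope.
var symbols = new Dictionary<string, string[]>
{
    ["table"] = new[] { "customers", "contracts", "orders" },
    ["column"] = new[] { "customer_id", "created_at" },
};

// Hypothetical mapping from grammar rule names to symbol kinds. Finding one
// of these rules at the caret means there is no need to descend to the ID token.
var ruleToKind = new Dictionary<string, string>
{
    ["table_ref"] = "table",
    ["column_ref"] = "column",
};

// Returns the candidate names for the rule found at the caret, filtered by
// what the user has typed so far.
IEnumerable<string> Suggest(string ruleName, string typedPrefix)
{
    if (!ruleToKind.TryGetValue(ruleName, out var kind))
        return Enumerable.Empty<string>();
    return symbols[kind]
        .Where(name => name.StartsWith(typedPrefix, StringComparison.OrdinalIgnoreCase))
        .OrderBy(name => name, StringComparer.Ordinal);
}

Console.WriteLine(string.Join(", ", Suggest("table_ref", "c")));   // contracts, customers
```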
This is probably pretty hard to do.
Fundamentally you want to use some parser to predict "what comes next" to display as auto-completion. This has to at least predict what the FIRST token is at the point where the user's input stops.
For ANTLR, I think this will be very difficult. The reason is that ANTLR generates essentially procedural, recursive descent parsers. So at runtime, when you need to figure out what FIRST tokens are, you have to inspect the procedural source code of the generated parser. That way lies madness.
This blog entry claims to achieve autocompletion by collecting error reports rather than inspecting the parser code. It's sort of an interesting idea, but I do not understand how his method really works, and I cannot see how it would offer all possible FIRST tokens; it might acquire some of them. This SO answer confirms my intuition.
Sam Harwell discusses how he has tackled this; he is one of the ANTLR4 implementers and if anybody can make this work, he can. It wouldn't surprise me if he reached inside ANTLR to extract the information he needs; as an ANTLR implementer he would certainly know where to tap in. You are not likely to be so well positioned. Even so, he doesn't really describe what he did in detail. Good luck replicating. You might ask him what he really did.
What you want is a parsing engine for which that FIRST token information is either directly available (the parser generator could produce it) or computable based on the parser state. This is actually possible to do with bottom up parsers such as LALR(k); you can build an algorithm that walks the state tables and computes this information. (We do this with our DMS Software Reengineering Toolkit for its GLR parser precisely to produce syntax error reports that say "missing token, could be any of these [set]")

Creating mathematical function in runtime from string

I am asking this question because I haven't yet found any posts that are C# related, and there might be some built-in methods for this that I couldn't find. If there are, please tell me so and I can close this question.
Basically I have the common situation:
User types a function w.r.t. one or two variables into some TextBlock
I take this string and analyse it
As a return I would like to have a delegate to a method that will take one or two inputs (the variables) and return the function value according to what the user typed in.
Now, I could probably think up (and I would like to do this on my own, because I want to use my brain) an algorithm that analyses the string step by step to find out what has to be calculated first and in what way. E.g. first scan for parentheses, look for the expression within a group of parentheses and calculate that according to more general functions etc.
But in the end I would like to "create" a method of this analysis to be easily used as a normal delegate with a couple of arguments that will return the correct function value.
Are there any methods included in C# for that already, or would I have to go and program everything by myself?
As a remark: I don't want to use anybody else's library; only .NET libraries are acceptable for me.
Edit: After Matt pointed out expression trees, I found this thread which is a good example to my problem.
Edit2: The example pointed out only includes simple functions and will not be useful if I want to include more complex functions such as trigonometric ones or exponentials.
What you are describing is a parser. There are a number of different ways of implementing them, although generally speaking, for complex grammars, a "parser generator" is often used.
A parser generator will take a description of the grammar and convert it into code that will parse text that conforms to the grammar into some form of internal representation that can be manipulated by the program, e.g. a parse tree.
Since you indicate you want to avoid third-party libraries, I'll assume that the use of a parser generator is similarly excluded, which leaves you with implementing your own parser (which fortunately is quite an interesting exercise).
The Wikipedia page on Recursive descent parsers will be particularly useful. I suggest reading through it and perhaps adapting the example code therein to your particular use case. I have done this myself a number of times for different grammars with this as a starting point, so can attest to its usefulness.
The output from such a parser will be a "parse tree". And you then have a number of possibilities for how you convert this into an executable delegate. One option is to implement an Evaluate() method on your parse tree nodes, which will take a set of variables and return the result of evaluating the user's expression. As others have mentioned, your parse tree could leverage .NET's Expression trees, or you can go down the route of emitting IL directly (permitting you to produce a compiled .NET assembly from the user's expression for later use as required).
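A minimal sketch of that approach, using only .NET types: a recursive descent parser that builds a .NET Expression tree directly and compiles it to a delegate. It assumes a single variable named x and a small whitelist of functions; CompileFunction and the grammar in the comment are assumptions made for this example.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

// Assumed grammar for this sketch:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := '-' factor | '(' expr ')' | number | 'x' | name '(' expr ')'
Func<double, double> CompileFunction(string text)
{
    int pos = 0;
    ParameterExpression x = Expression.Parameter(typeof(double), "x");

    void SkipSpaces()
    {
        while (pos < text.Length && char.IsWhiteSpace(text[pos])) pos++;
    }

    // Consumes c if it is the next non-space character.
    bool Accept(char c)
    {
        SkipSpaces();
        if (pos < text.Length && text[pos] == c) { pos++; return true; }
        return false;
    }

    void Expect(char c)
    {
        if (!Accept(c)) throw new FormatException($"Expected '{c}' at position {pos}.");
    }

    Expression ParseExpr()
    {
        Expression left = ParseTerm();
        while (true)
        {
            if (Accept('+')) left = Expression.Add(left, ParseTerm());
            else if (Accept('-')) left = Expression.Subtract(left, ParseTerm());
            else return left;
        }
    }

    Expression ParseTerm()
    {
        Expression left = ParseFactor();
        while (true)
        {
            if (Accept('*')) left = Expression.Multiply(left, ParseFactor());
            else if (Accept('/')) left = Expression.Divide(left, ParseFactor());
            else return left;
        }
    }

    Expression ParseFactor()
    {
        if (Accept('-')) return Expression.Negate(ParseFactor());
        if (Accept('('))
        {
            Expression inner = ParseExpr();
            Expect(')');
            return inner;
        }
        int start = pos;   // Accept has already skipped leading spaces
        if (pos < text.Length && char.IsLetter(text[pos]))
        {
            while (pos < text.Length && char.IsLetter(text[pos])) pos++;
            string name = text.Substring(start, pos - start);
            if (name == "x") return x;
            // Whitelist of function names mapped onto System.Math methods.
            string mathName = name switch
            {
                "sin" => "Sin", "cos" => "Cos", "exp" => "Exp",
                "log" => "Log", "sqrt" => "Sqrt",
                _ => throw new FormatException($"Unknown name '{name}'."),
            };
            Expect('(');
            Expression argument = ParseExpr();
            Expect(')');
            var method = typeof(Math).GetMethod(mathName, new[] { typeof(double) })
                ?? throw new InvalidOperationException($"Math.{mathName} not found.");
            return Expression.Call(method, argument);
        }
        while (pos < text.Length && (char.IsDigit(text[pos]) || text[pos] == '.')) pos++;
        if (pos == start) throw new FormatException($"Unexpected input at position {pos}.");
        return Expression.Constant(
            double.Parse(text.Substring(start, pos - start), CultureInfo.InvariantCulture));
    }

    Expression body = ParseExpr();
    SkipSpaces();
    if (pos != text.Length) throw new FormatException($"Unexpected input at position {pos}.");
    return Expression.Lambda<Func<double, double>>(body, x).Compile();
}

Func<double, double> f = CompileFunction("2 * x + 1");
Console.WriteLine(f(3));   // 2 * 3 + 1 = 7
```

Supporting a second variable would mean adding another ParameterExpression and returning a Func<double, double, double> instead.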
You might want to look at expression trees.
Check out NCalc for some examples of how to do this. You don't need to use the library, but reading the source is pretty educational.
I found a very helpful pdf explaining the parsing in C# 2.0. This link leads to a very good tutorial on parsers used in C# and also applies that later on to an arithmetic expression.
As this directly helps and answers my question, I posted this as an answer, rather than as a comment or edit.

Dynamic user control over variables (embedded language?)

I'm creating a piece of software (written in C#, will be a windows application) and I ran into this problem-
I've got a set of variables, and I need to allow the user to define a wide range of mathematical functions on those variables.
But my users don't necessarily have to have any prior knowledge about programming.
The options I've considered are:
Create some sort of GUI for defining the mathematical "functions". But that is very limiting.
Implement a very simple embedded language, that will offer flexibility while remaining relatively easy to understand. I looked at Lua, but the problem with that is that you pretty much need to have prior knowledge in programming. I was thinking about something more readable (somewhat similar to SQL), for example "assign 3 to X;"
Other ideas are welcome.
I'm basically looking for the best way to go here, under the assumption that my users don't have any knowledge in programming.
However, note that this is not the main feature of my software, so I'm assuming that if a user wants/needs to use this feature, he will take the time to look at the manual for a few minutes and learn how to do so, as long as it's not too complicated.
Thanks, Malki :)
What you want is a domain specific language. I see you've tried Lua and didn't find that acceptable--I'll assume that most pre-built scripting languages are out then.
Depending on your expected function complexity, I would recommend that you give a shot at implementing a small recursive-descent parser so that you can exactly specify your language. This way you can realize something like:
assign 3 to X
show sin(X * 5)
If this is a bit beyond what you're willing to do, you can get some parsing assistance from a library such as Irony; this will let you focus on using the abstract syntax tree rather than playing with tokenizing/lexing for some time.
If you want, you can even look at FLEE, which will parse and evaluate some pretty complex expressions right out of the gate.
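To give a feel for how small the statement layer of such a language can be, here is a minimal sketch. It only handles "assign <number> to <name>" and "show <name>"; evaluating full expressions such as sin(X * 5) would need an expression parser on top, which is left out. Execute and the variables dictionary are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Variable store for the tiny language; names are case-insensitive here.
var variables = new Dictionary<string, double>(StringComparer.OrdinalIgnoreCase);

// Executes one statement and returns the text to display
// (an empty string when there is nothing to show).
string Execute(string statement)
{
    string[] words = statement.Trim().TrimEnd(';')
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    if (words.Length == 4 && words[0] == "assign" && words[2] == "to")
    {
        variables[words[3]] = double.Parse(words[1], CultureInfo.InvariantCulture);
        return "";
    }
    if (words.Length == 2 && words[0] == "show")
    {
        return variables.TryGetValue(words[1], out double value)
            ? value.ToString(CultureInfo.InvariantCulture)
            : throw new KeyNotFoundException($"Unknown variable '{words[1]}'.");
    }
    throw new FormatException($"Cannot understand: {statement}");
}

Execute("assign 3 to X;");
Console.WriteLine(Execute("show X"));   // 3
```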
ANTLR is a great parser generator if you want to make your own language.

Why can't you edit and continue debugging when there's a Lambda expression in the method?

I've seen it said in other questions that the Linq query syntax compiles to a Lambda.
So why can you not do edit-and-continue when there is a Lambda expression in the method, while with query notation you can?
What's most infuriating, and is seriously making me consider switching to using query notation everywhere, is that even if your code is not in the Lambda, but there's a Lambda somewhere else in the same method, you can't edit-and-continue! That's, like, gratuitous pain inflicted upon unwary developers!
Edit and continue is able to change method implementations "live", but not what fields are in types.
Lambda expressions (and anonymous methods) can end up creating their own private types when they capture variables. Changing the lambda expression can change the types involved, which would break edit and continue.
It sounds like it should be possible to make changes to the code which don't have this impact, but I suspect it's simply easier to prevent it entirely - which also means you don't start making changes and then find that you're prevented half way through your change.
(Personally I'm not a fan of E&C in the first place, so I've never noticed it.)
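The point about captured variables creating private types can be seen in a small sketch like this one. MakeCounter is just an illustrative name, and the exact name of the generated class is a compiler implementation detail.

```csharp
using System;

// The lambda captures the local 'count', so the compiler hoists 'count'
// into a field of a generated closure class instead of keeping it on the stack.
Func<int> MakeCounter()
{
    int count = 0;
    return () => ++count;
}

Func<int> next = MakeCounter();
next();
next();
Console.WriteLine(next());   // 3: the captured state lives on the closure object

// The delegate's target is an instance of that generated class. Editing the
// lambda so that it captures different locals would change this type's fields.
Console.WriteLine(next.Target?.GetType().Name);
```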
I don't know for sure, but my guess is the complexity around figuring out what needs to change when there are local variables involved that are lifted to classes. I'm guessing that figuring out what changes would be safe and what wouldn't was deemed too complex and error-prone to get right at this point. The tooling in 2010 focused around threading and the new UI -- maybe we'll get it in the next version.
I don't know it for sure, but I assume it has to do with the way the compiler converts lambda expressions forming closures into compiler generated classes. Probably there is no (easy) way to apply changes made to the compiled code and preserve the current state.

Programmatically checking code complexity, possibly via c#?

I'm interested in data mining projects, and have always wanted to create a classification algorithm that would determine which specific check-ins need code-reviews, and which may not.
I've developed many heuristics for my algorithm, although I've yet to figure out the killer...
How can I programmatically check the computational complexity of a chunk of code?
Furthermore, and even more interesting: how could I use not just the code but also the diff that the source control repository provides to obtain better data?
I.e., if I add complexity to the code I'm checking in, but it reduces complexity in the code that is left, shouldn't that be considered 'good' code?
Interested in your thoughts on this.
UPDATE
Apparently I wasn't clear. I want this
double codeValue = CodeChecker.CheckCode(someCodeFile);
I want a number to come out based on how good the code was. I'll start with numbers like VS2008 gives when you calculate complexity, but would like to move to further heuristics.
Anyone have any ideas? It would be much appreciated!
Have you taken a look at NDepend? This tool can be used to calculate code complexity and supports a query language by which you can get an incredible amount of data on your application.
The NDepend web site contains a list of definitions of various metrics. Deciding which are most important in your environment is largely up to you.
NDepend also has a command line version that can be integrated into your build process.
Also, Microsoft's Code Analysis (ships with VS Team Suite) includes metrics that check the cyclomatic complexity of code and raise a build error (or warning) if this number is over a certain threshold.
I don't know offhand, but it may be worth checking whether this threshold is configurable to your requirements. You could then modify your build process to run code analysis any time something is checked in.
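For a sense of what a cyclomatic complexity number measures, here is a deliberately crude sketch that counts branching keywords and short-circuit operators in source text. It is not how NDepend or Code Analysis compute their metrics; EstimateComplexity is a hypothetical helper, and it does not strip comments or string literals, so treat its result as a rough heuristic only.

```csharp
using System;
using System.Text.RegularExpressions;

// Rough approximation: 1 for the single entry path, plus one for every
// branching keyword or short-circuit operator found in the text.
int EstimateComplexity(string source)
{
    var branchPattern = new Regex(@"\b(if|for|foreach|while|case|catch)\b|&&|\|\|");
    return 1 + branchPattern.Matches(source).Count;
}

string sample = "if (a && b) { Run(); } else { while (c) { Step(); } }";
Console.WriteLine(EstimateComplexity(sample));   // 4: if, &&, while, plus 1
```

Running the same function over the file before and after a check-in and comparing the two numbers would be one simple way to feed the diff into the heuristic.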
See Semantic Designs C# Metrics Tool for a tool that computes a variety of standard metric values both over complete files and over all reasonable subdivisions (methods, classes, ...).
The output is an XML document, but extracting the value(s) you want from that should be trivial with an XML reader.
