I'm working on a application in C# in which I want to calculate an arithmetic expression that is given as a string.
So like I got a string:
string myExpr="4*(80+(5/2))+2";
And I want to calculate the outcome of the arithmetic expression.
While in a language such as Javascript, PHP etc. you could just use Eval to do the trick this doesnt seem to be an option in C#.
I suppose it is possible to write a code to devide it into countless simple expressions, calculate them and add them together but this would take quite some time and I'm likely to have lots of troubles in my attempt to do so.
So... my question, Is there any 'simple' way to do this?
There's a javascript library you can reference, then just do something like:
var engine = VsaEngine.CreateEngine();
Eval.JScriptEvaluate(mySum, engine);
Edit;
Library is Microsoft.JScript
You could just call the JScript.NET eval function. Any .NET language can call into any other.
Have you seen http://ncalc.codeplex.com ?
It's extensible, fast (e.g. has its own cache) enables you to provide custom functions and varaibles at run time by handling EvaluateFunction/EvaluateParameter events. Example expressions it can parse:
Expression e = new Expression("Round(Pow(Pi, 2) + Pow([Pi2], 2) + X, 2)");
e.Parameters["Pi2"] = new Expression("Pi * Pi");
e.Parameters["X"] = 10;
e.EvaluateParameter += delegate(string name, ParameterArgs args)
{
if (name == "Pi")
args.Result = 3.14;
};
Debug.Assert(117.07 == e.Evaluate());
It also handles unicode & many data type natively. It comes with an antler file if you want to change the grammer. There is also a fork which supports MEF to load new functions.
It also supports logical operators, date/time's strings and if statements.
I've used NCalc with great success. It's extremely flexible and allows for variables in your formulas. The formula you listed in your question could be evaluated this easily:
string myExpr = "4*(80+(5/2))+2";
decimal result = Convert.ToDecimal(new Expression(myExpr).Evaluate());
You need to implement an expression evaluator. It's fairly straightforward if you have the background, but it's not "simple". Eval in interpreted environments actually re-runs the language parser over the string; you need to emulate that operation, for the bits you care about, in your C# code.
Search for "expression evaluators" and "recursive descent parser" to get started.
If you have Bjarne Stroustrup's The C++ Programming Language, in Chapter 6 he explains step by step (in C++) how to do exactly what Chris Tavares suggests.
It's straightforward but a little heady if you're not familiar with the procedure.
I needed to do something similar for an undergrad projectand I found this
Reverse Polish Notation In C#
Tutorial and code to be extremely valuable.
It's pretty much just an implementation of converting the string to Reverse Polish Notation then evaluating it. It's extremely easy to use, understand and add new functions.
Source code is included.
Try something like this:
int mySum = 4*(80+(5/2))+2;
var myStringSum = mySum.toString();
Related
I am trying to make a basic translator, which changes values in the code, for example, a . may be ::, e.t.c, and I can do that by using
if(code.Contains("."))
{
code.Replace(".", "::");
}
But my problem is I don't know how to ignore it in the inside of speech, as if the sentence was "Hello.", it could be translated to "Hello::". How would I be able to stop this? (I know you can use Regex "\".+?\"" to find speech in text)
What you're trying to do here is vastly more complicated than you realise.
Programming languages differ in more than just appearance, they have different capabilities and syntax rules, even for two as similar as C# and C++. Aside from the fact that the C# . is equivalent to . -> and :: in C++. There's also different rules regarding pointers, and sometimes you get issues like having a pointer to a pointer, not to mention that the * and & symbols can be binary arithmetic/logic operations or pointer operations depending on their use. There's also issues involving keywords such as const, auto and sizeof.
In short, unless you're prepared to write a proper tokeniser, you aren't going to pull this off properly. To properly translate one programming language to another you would at least have to write a good chunk of a full compiler, which is a specialist subject.
I suggest you do some research into tokenisers and lexical analysis before you go any further.
As a hint though, you'll find it easier to split your code up into an array of characters and handle one character at a time whilst keeping track of the program's state (ie are you currently in the middle of a string, are you in the middle of two brackets, how many levels of nesting have you come across). By doing it that way you can at least manage to change the surface-level differences (as opposed to deeper ones like keywords and typing).
EDIT:
Some useful resources for writing tokenisers and compilers:
http://www.cs.man.ac.uk/~pjj/farrell/comp3.html
http://msdn.microsoft.com/en-us/library/vstudio/3yx2xe3h(v=vs.100).aspx
http://en.wikibooks.org/wiki/Compiler_Construction/Lexical_analysis
I speak from experience, as I did recently attempt to write my own compiler (in C#), but put the project on hold due to more important matters getting in the way.
You could probably use a regex to help you out here, but it would be fairly complex and perform poorly. You could also just do this:
var sb = new StringBuilder();
bool insideSpeech = false;
foreach(char c in code) {
if(c == '"') {
insideSpeech = !insideSpeech;
}
if(c == '.' && !insideSpeech) {
sb.Append("::");
} else {
sb.Append(c);
}
}
code = sb.ToString();
I want to evaluate a math expression which the user enters in a textbox. I have done this so far
string equation, finalString;
equation = textBox1.Text;
StringBuilder stringEvaluate = new StringBuilder(equation);
stringEvaluate.Replace("sin", "math.sin");
stringEvaluate.Replace("cos", "math.cos");
stringEvaluate.Replace("tan", "math.tan");
stringEvaluate.Replace("log", "math.log10");
stringEvaluate.Replace("e^", "math.exp");
finalString = stringEvaluate.ToString();
StringBuilder replaceI = new StringBuilder(finalString);
replaceI.Replace("x", "i");
double a;
for (int i = 0; i<5 ; i++)
{
a = double.Parse(finalStringI);
if(a<0)
break;
}
when I run this program it gives an error "Input string was not in a correct format." and highlights a=double.Parse(finalStringI);
I used a pre defined expression a=i*math.log10(i)-1.2 and it works, but when I enter the same thing in the textbox it doesn't.
I did some search and it came up with something to do with compiling the code at runtime.
any ideas how to do this?
i'm an absolute beginner.
thanks :)
The issue is within your stringEvaluate StringBuilder. When you're replacing "sin" with "math.sin", the content within stringEvaluate is still a string. You've got the right idea, but the error you're getting is because of that fact.
Math.sin is a method inside the Math class, thus it cannot be operated on as you are in your a = double.Parse(finalStringI); call.
It would be a pretty big undertaking to accomplish your goal, but I would go about it this way:
Create a class (perhaps call it Expression).
Members of the Expression class could include Lists of operators and operands, and perhaps a double called solution.
Pass this class the string at instantiation, and tear it apart using the StringBuilder class. For example, if you encounter a "sin", add Math.sin to the operator collection (of which I'd use type object).
Each operator and operand within said string should be placed within the two collections.
Create a method that evaluates the elements within the operator and operand collection accordingly. This could get sticky for complex calculations with more than 2 operators, as you would have to implement a PEMDAS-esque algorithm to re-order the collections to obey the order of operations (and thus achieve correct solutions).
Hope this helps :)
The .Parse methods (Int.Parse, double.Parse, etc) will only take a string such as "25" or "3.141" and convert it to the matching value type (int 25, or double 3.141). They will not evaluate math expressions!
You'll pretty much have to write your own text-parser and parse-tree evaluator, or explore run-time code-generation, or MSIL code-emission.
Neither topic can really be covered in the Q&A format of StackOverflow answers.
Take a look at this blog post:
http://www.c-sharpcorner.com/UploadFile/mgold/CodeDomCalculator08082005003253AM/CodeDomCalculator.aspx
It sounds like it does pretty much what you're trying to do. Evaluating math expressions is not as simple as just parsing a double (which is really only going to work for strings like "1.234", not "1 + 2.34"), but apparently it is possible.
You can use the eval function that the framework includes for JScript.NET code.
More details: http://odetocode.com/code/80.aspx
Or, if you're not scared to use classes marked "deprecated", it's really easy:
static string EvalExpression(string s)
{
return Microsoft.JScript.Eval.JScriptEvaluate(s, null, Microsoft.JScript.Vsa.VsaEngine.CreateEngine()).ToString();
}
For example, input "Math.cos(Math.PI / 3)" and the result is "0.5" (which is the correct cosine of 60 degrees)
I am looking for a parser that can operate on a query filter. However, I'm not quite sure of the terminology so it's proving hard work. I hope that someone can help me. I've read about 'Recursive descent parsers' but I wonder if these are for full-blown language parsers rather than the logical expression evaluation that I'm looking for.
Ideally, I am looking for .NET code (C#) but also a similar parser that works in T-SQL.
What I want is for something to parse e.g.:
((a=b)|(e=1))&(c<=d)
Ideally, the operators can be definable (e.g. '<' vs 'lt', '=' vs '==' vs 'eq', etc) and we can specify function-type labels (e.g. (left(x,1)='e')). The parser loads this, obeys order precedence (and ideally handles the lack of any brackets) and then calls-back to my code with expressions to evaluate to a boolean result - e.g. 'a=b'?). I wouldn't expect the parser to understand the custom functions in the expression (though some basic ones would be useful, like string splitting). Splitting the expression (into left- and right-hand parts) would be nice.
It is preferable that the parser asks the minimum number of questions to have to work out the final result - e.g. if one side of an AND is false, there is no point evaluating the other side, and to evaluate the easiest side first (i.e. in the above expression, 'c<=d' should be assumed to be quicker and thus evaluated first.
I can imagine that this is a lot of work to do, however, fairly common. Can anyone give me any pointers? If there aren't parsers that are as flexible as above, are there any basic parsers that I can use as a start?
Many Thanks
Lee
Take a look at this. ANTLR is a good parser generator and the linked-to article has working code which you may be able to adapt to your needs.
You could check out Irony. With it you define your grammar in C# code using a syntax which is not to far from bnf. They even have a simple example on their site (expression evaluator) which seems to be quite close to what you want to achieve.
Edit: There's been a talk about Irony at this year's Lang.Net symposium.
Hope this helps!
Try Vici.Parser: download it here (free) , it's the most flexible expression parser/evaluator I've found so far.
If it's possible for you, use .Net 3.5 expressions.
Compiler parses your expression for you and gives you expression tree that you can analyze and use as you need. Not very simple but doable (actually all implementations of IQueryable interface do exactly this).
You can use .NET expression trees for this. And the example is actually pretty simple.
Expression<Func<int, int, int, int, bool>> test = (int a, int b, int c, int d) => ((a == b) | (c == 1)) & (c <= d);
And then just look at "test" in the debugger. Everything is already parsed for you, you can just use it.
The only problem is that in .NET 3.5 you can have only up to 4 arguments in Func. So, I changed "e" to "c" in one place. In 4.0 this limit is changed to 16.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
We've used ANTLR to create a parser for a SQL-like grammar, and while the results are satisfactory in most cases, there are a few edge cases that we need to fix; and since we didn't write the parser ourselves we don't really understand it well enough to be able to make sensible changes.
So, we'd like to write our own parser. What's the best way to go about writing a parser by hand? What sort of parser should we use - recursive descent has been recommended; is that right? We'll be writing it in C#, so any tutorials for writing parsers in that language would be gratefully received.
UPDATE: I'd also be interested in answers that involve F# - I've been looking for a reason to use that in a project.
I would highly recommend the F# language as your language of choice for parsing on the .NET Platform. It's roots in the ML family of languages means it has excellent support for language-oriented programming.
Discriminated unions and pattern-matching allow for a very succinct and powerful specification of your AST. Higher-order functions allow for definition of parse operations and their composition. First-class support for monadic types allows for state management to be handled implicitly greatly simplifying the composition of parsers. Powerful type-inference greatly aides the definition of these (complex) types. And all of this can be specified and executed interactively allowing you to rapidly prototype.
Stephan Tolksdorf has put this into practice with his parser combinator library FParsec
From his examples we see how naturally an AST is specified:
type expr =
| Val of string
| Int of int
| Float of float
| Decr of expr
type stmt =
| Assign of string * expr
| While of expr * stmt
| Seq of stmt list
| IfThen of expr * stmt
| IfThenElse of expr * stmt * stmt
| Print of expr
type prog = Prog of stmt list
the implementation of the parser (partially elided) is just as succinct:
let stmt, stmtRef = createParserForwardedToRef()
let stmtList = sepBy1 stmt (ch ';')
let assign =
pipe2 id (str ":=" >>. expr) (fun id e -> Assign(id, e))
let print = str "print" >>. expr |>> Print
let pwhile =
pipe2 (str "while" >>. expr) (str "do" >>. stmt) (fun e s -> While(e, s))
let seq =
str "begin" >>. stmtList .>> str "end" |>> Seq
let ifthen =
pipe3 (str "if" >>. expr) (str "then" >>. stmt) (opt (str "else" >>. stmt))
(fun e s1 optS2 ->
match optS2 with
| None -> IfThen(e, s1)
| Some s2 -> IfThenElse(e, s1, s2))
do stmtRef:= choice [ifthen; pwhile; seq; print; assign]
let prog =
ws >>. stmtList .>> eof |>> Prog
On the second line, as an example, stmt and ch are parsers and sepBy1 is a monadic parser combinator that takes two simple parsers and returns a combination parser. In this case sepBy1 p sep returns a parser that parses one or more occurrences of p separated by sep. You can thus see how quickly a powerful parser can be combined from simple parsers. F#'s support for overridden operators also allow for concise infix notation e.g. the sequencing combinator and the choice combinator can be specified as >>. and <|>.
Best of luck,
Danny
The only kind of parser that can be handwritten by a sane human being is a recursive-descent. That said, writing bottom-up parser by hand is still possible but is very undesirable.
If you're up for RD parser you have to verify that SQL grammar is not left-recursive (and eliminate the recursion if necessary), and then basically write a function for each grammar rule. See this for further reference.
Adding my voice to the chorus in favor of recursive-descent (LL1). They are simple, fast, and IMO, not at all hard to maintain.
However, take a good look at your language to make sure it is LL1. If you have any syntax like C has, like ((((type))foo)[ ]) where you might have to descend multiple layers of parentheses before you even find out if you are looking at a type, variable, or expression, then LL1 will be very difficult, and bottom-up wins.
Recursive descent will give you the simplest way to go, but I would have to agree with mouviciel that flex and bison and definitely worth learning. When you find out you have a mistake in your grammar, fixing a definition of the language in flex /bison will be a hell of a lot easier then rewriting your recursive descent code.
FYI the C# parser is written recursive descent and it tends to be quite robust.
Recursive Descent parsers are indeed the best, maybe only, parsers that can be built by hand. You will still have to bone-up on what exactly a formal, context-free language is and put your language in a normal form. I would personally suggest that you remove left-recursion and put your language in Greibach Normal Form. When you do that, the parser just about writes itself.
For example, this production:
A => aC
A => bD
A => eF
becomes something simple like:
int A() {
chr = read();
switch char
case 'a': C();
case 'b': D();
case 'e': F();
default: throw_syntax_error('A', chr);
}
And there aren't any much more difficult cases here (what's more difficult is make sure your grammar is in exactly the right form but this allows you the control that you mentioned).
Anton's Link is also seems excellent.
If you want to write it by hand, recursive decent is the most sensible way to go.
You could use a table parser, but that will be extremely hard to maintain.
Example:
Data = Object | Value;
Value = Ident, '=', Literal;
Object = '{', DataList, '}';
DataList = Data | DataList, Data;
ParseData {
if PeekToken = '{' then
return ParseObject;
if PeekToken = Ident then
return ParseValue;
return Error;
}
ParseValue {
ident = TokenValue;
if NextToken <> '=' then
return Error;
if NextToken <> Literal then
return Error;
return(ident, TokenValue);
}
ParseObject {
AssertToken('{');
temp = ParseDataList;
AssertToken('}');
return temp;
}
ParseDataList {
data = ParseData;
temp = []
while Ok(data) {
temp = temp + data;
data = ParseData;
}
}
I suggest that you don't write the lexer by hand - use flex or similar. The task of recognising tokens is not that hard to do by hand, but I don't think you'd gain much.
As others have said, recursive descent parsers are easiest to write by hand. Otherwise you have to maintain the table of state transitions for each token, which isn't really human-readable.
I'm pretty sure ANTLR implements a recursive descent parser anyway: there's a mention of it in an interview about ANTLR 3.0.
I've also found a series of blog posts about writing a parser in C#. It seems quite gentle.
At the risk of offending the OP, writing a parser for a large langauge like some specific vendor's SQL by hand when good parser generator tools (such as ANTLR) are available is simply crazy. You'll spend far more time rewriting your parser by hand than you will fixing the "edge cases" using the parser generator, and you'll invariably have to go back and revise the parser anyway as the SQL standards move or you discover you misunderstood something else. If you don't understand your parsing technology well enough, invest the time to understand it. It won't take you months to figure out how to deal with the edge cases with the parser generator, and you've already admitted it you are willing to spend months doing it by hand.
Having said that, if you are hell-bent on redoing it manually, using a recursive descent parser is the best way to do it by hand.
(NOTE: OP clarified below that he wants a reference implementation. IMHO you should NEVER code a reference implementation of a grammar procedurally, so recursive descent is out.)
There is no "one best" way. Depending on your needs you may want bottom up (LALR1) or recursive descent(LLk). Articles such as this one give personal reasons for prefering LALR(1) (bottom up) to LL(k). However, each type of parser has its benefits and drawbacks. Generally LALR will be faster as a finite-state_machine is generated as a lookup table.
To pick what's right for you examine your situation; familiarize yourself with the tools and technologies. Starting with some LALR and LL Wikipedia articles isn't a bad choice. In both cases you should ALWAYS start with specifying the grammar in BNF or EBNF. I prefer EBNF for its succinctness.
Once you've gotten your mind wrapped around what you want to do, and how to represent it as a grammar, (BNF or EBNF) try a couple of different tools and run them across representative samples of text to be parsed.
Anecdotally:
However, I've heard that LL(k) is more flexible. I've never bothered to find out for myself. From my few parser building experiences I have noticed that regardles if it's LALR or LL(k) the best way to pick what's best for your needs is to start with writing the grammar. I've written my own C++ EBNF RD parser builder template library, used Lex/YACC and had coded a small R-D parser. This was spread over the better part of 15 years, and I spent no more than 2 months on the longest of the three projects.
In C/Unix, the traditional way is to use lex and yacc. With GNU, the equivalent tools are flex and bison. I don't know for Windows/C#.
If I were you I would have another go at ANTLRv3 using the GUI ANTLRWorks which gives you a very convenient way of testing your grammar. We use ANTLR in our project and although the learning curve may be a bit steep in the beginning once you learn it is quite convenient. Also on their email newsletter there are a lot of people who are very helpful.
PS. IIRC they also have a SQL-grammar you could take a look at.
hth
Well if you don't mind using another compiler compiler tool like ANTLR I suggest you take a look at Coco/R
I've used it in the past and it was pretty good...
The reason you don't want a table-driven parser is that you will not be able to create sensible error-messages. That is ok for a generated language, but not one where humans are involved. The error messages produced by c-like language compilers provide ample evidence that people can adapt to anything, no matter how bad.
I would also vote for using an existing parser + lexer.
The only reason I can see in doing it by hand:
if you want something relatively simple (like validating/parsing some input)
to learn/understand the principles.
Look at gplex and gppg, lexer and parser generators for .NET. They work well, and are based on the same (and almost compatible) input as lex and yacc, which is relatively easy to use.
JFlex is a flex implementation for Java, and there is now a C# port of that project http://sourceforge.net/projects/csflex/. There also appears to be a C# port of CUP in progress, which can be found here: http://sourceforge.net/projects/datagraph/
I too would recommend avoiding hand crafting your own solution. I tried this once myself for a very simple language (part of a university project) and it was incredibly time consuming and difficult. It is also exceedingly hard to maintain and change once written.
Using an existing parser generator is the way to go, as the bulk of the hard work has been done and has been well tested over the years.
In a project that I'm working on I have to work with a rather weird data source. I can give it a "query" and it will return me a DataTable. But the query is not a traditional string. It's more like... a set of method calls that define the criteria that I want. Something along these lines:
var tbl = MySource.GetObject("TheTable");
tbl.AddFilterRow(new FilterRow("Column1", 123, FilterRow.Expression.Equals));
tbl.AddFilterRow(new FilterRow("Column2", 456, FilterRow.Expression.LessThan));
var result = tbl.GetDataTable();
In essence, it supports all the standard stuff (boolean operators, parantheses, a few functions, etc.) but the syntax for writing it is quite verbose and uncomfortable for everyday use.
I wanted to make a little parser that would parse a given expression (like "Column1 = 123 AND Column2 < 456") and convert it to the above function calls. Also, it would be nice if I could add parameters there, so I would be protected against injection attacks. The last little piece of sugar on the top would be if it could cache the parse results and reuse them when the same query is to be re-executed on another object.
So I was wondering - are there any existing solutions that I could use for this, or will I have to roll out my own expression parser? It's not too complicated, but if I can save myself two or three days of coding and a heapload of bugs to fix, it would be worth it.
Try out Irony. Though the documentation is lacking, the samples will get you up and running very quickly. Irony is a project for parsing code and building abstract syntax trees, but you might have to write a little logic to create a form that suits your needs. The DLR may be the complement for this, since it can dynamically generate / execute code from abstract syntax trees (it's used for IronPython and IronRuby). The two should make a good pair.
Oh, and they're both first-class .NET solutions and open source.
Bison or JavaCC or the like will generate a parser from a grammar. You can then augment the nodes of the tree with your own code to transform the expression.
OP comments:
I really don't want to ship 3rd party executables with my soft. I want it to be compiled in my code.
Both tools generate source code, which you link with.
I wrote a parser for exaclty this usage and complexity level by hand. It took about 2 days. I'm glad I did it, but I wouldn't do it again. I'd use ANTLR or F#'s Fslex.