Tokenizing an expression string in C# - c#

I have a calculation formula string of the form
string formula = "w_tb + Min(d_3,a_x) * Pow(x,2)";
In the above example, w_tb, d_3, a_x and x are variables. I can find the list of variables in the formula by splitting the string using the operators as delimiters.
To assign values to the variables (the values come from the database), my first approach was to replace each variable name with its value. But that also replaces any substring of the expression that happens to match the name. E.g. if the variable x is replaced with the value 1,2, then a_x becomes a_1,2, which is not the required result.
I have the list of variables and the complete list of delimiters (operators).
What I am trying to achieve?
I am trying to get the following list from the expression string
List<string>() {"w_tb","+","Min","(","d_3",",","a_x",")","*","Pow","(","x",",","2",")"}
Is there a way I could achieve this?
I have already tried the solution mentioned in this answer. But the tokenizer I am having is a string.

What you are trying to do is called parsing, which is what compilers do with source code.
You can define a simple grammar for your expression and let a parser generator produce the parser code for you. It lets you report syntax errors and gives you the full token list, like the one you want. A very good example is ANTLR. Have a look: http://www.antlr.org/

So I found an approach to achieve my task. I have a list of all available operators and a list of all available operands. The list of operands can be obtained by splitting the formula string on the operator strings.
var Operatorlist = new string[] { "Min", "Max", "Abs", "Pow", "+", "-", "*", "/", "(", ")", "²", "³", "Length", " ", "\r", "\n", ",", "[", "]", "Sqrt", "Cubrt", "^" };
string[] formulaSplit = formula.Split(Operatorlist, StringSplitOptions.None);
Now to parse the formula to get a list of operands and operators:
string sb = "";
var formlist = new List<string>();
foreach (var c in formula)
{
    // Grow the buffer one character at a time until it equals a known operator or operand.
    sb = sb + c;
    if (Operatorlist.Contains(sb) || formulaSplit.Contains(sb))
    {
        formlist.Add(sb);
        sb = "";
    }
}
This may not be very efficient, and it assumes that no operand or operator is a prefix of another one (a variable x next to a variable x_1 would be cut off at x), but it does the task for formulas like the one above.
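For comparison, here is a minimal sketch of a regex-based tokenizer that produces the token list asked for in the question and then substitutes values token by token, so a_x is never touched when x is replaced. The class name FormulaTokenizer, its method names and the exact token pattern are assumptions made for this illustration; the pattern only knows identifiers, plain numbers and single-character operators, and silently skips anything else (including whitespace).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class FormulaTokenizer
{
    // Alternatives, tried left to right at each position:
    // identifier | number | one operator or punctuation character.
    private static readonly Regex TokenPattern = new Regex(
        @"[A-Za-z_][A-Za-z0-9_]*|\d+(?:\.\d+)?|[-+*/^(),\[\]²³]");

    // Returns the tokens in order; characters the pattern does not
    // recognize (spaces, line breaks, unknown symbols) are skipped.
    public static List<string> Tokenize(string formula)
    {
        return TokenPattern.Matches(formula)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    // Replaces whole tokens only, so a variable name that is a substring
    // of another name is left alone. Tokens are re-joined with spaces.
    public static string Substitute(string formula, IDictionary<string, string> values)
    {
        var replaced = new List<string>();
        foreach (string token in Tokenize(formula))
        {
            string value;
            replaced.Add(values.TryGetValue(token, out value) ? value : token);
        }
        return string.Join(" ", replaced);
    }
}
```

Calling FormulaTokenizer.Tokenize("w_tb + Min(d_3,a_x) * Pow(x,2)") gives the list from the question, and Substitute with x mapped to 1,2 leaves a_x unchanged.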

Related

Parse multiple values from string c#

Suppose I have written "5 and 6" or "5+6". How can I assign 5 and 6 to two different variables in c# ?
P.S. I also want to do certain work if certain chars are found in string. Suppose I have written 5+5. Will this code do that ?
if(string.Contains("+"))
{
sum=x+y;
}
string input="5+5";
var numbers = Regex.Matches(input, @"\d+")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
Personally, I would vote against doing some splitting and regular expression stuff.
Instead I would (and did in the past) use one of the many Expression Evaluation libraries, like e.g. this one over at Code Project (and the updated version over at CodePlex).
Using the parser/tool above, you could do things like:
A simple expression evaluation then could look like:
Expression e = new Expression("5 + 6");
Debug.Assert(11 == e.Evaluate());
To me this is much more error-proof than doing the parsing all by myself, including regular expressions and the like.
You should use another name for your string than string
var numbers = yourString.Split('+');
var sum = Convert.ToInt32(numbers[0]) + Convert.ToInt32(numbers[1]);
Note: That's an implementation without any error checking or error handling...
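A minimal sketch of the same idea with the error checking added, assuming the input is always of the form number+number. The class and method names (SumParser, TryAdd) are made up for this example; int.TryParse is used so that bad input yields false instead of an exception.

```csharp
// Hypothetical helper, not part of any library.
public static class SumParser
{
    // Returns true and sets sum when input looks like "5+6";
    // returns false for null, missing operands or non-numeric parts.
    public static bool TryAdd(string input, out int sum)
    {
        sum = 0;
        if (input == null) return false;

        string[] parts = input.Split('+');
        if (parts.Length != 2) return false;

        int left, right;
        if (!int.TryParse(parts[0].Trim(), out left)) return false;
        if (!int.TryParse(parts[1].Trim(), out right)) return false;

        sum = left + right; // overflow is not checked in this sketch
        return true;
    }
}
```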
If you want to assign numbers from a string to variables, you will have to parse the string and convert the pieces.
Simple example, if you have text with only one number
string text = "500";
int num = int.Parse(text);
Now, if you want to parse something more complicated, you can use Split() and/or a regex to get all the numbers and the operators between them. Then you just iterate over the array and assign the numbers to variables.
string text = "500+400";
if (text.Contains("+"))
{
string[] data = text.Split('+');
int a = int.Parse(data[0]);
int b = int.Parse(data[1]);
int res = a + b;
}
Basically, if you have just two numbers and an operator between them, this is OK. If you want to make a "calculator" you will need something more, like binary trees or a stack.
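To illustrate the stack idea, here is a small sketch of a two-stack evaluator that handles non-negative integers with + - * / and the usual precedence. It is only an illustration under those assumptions (no parentheses, no unary minus, no decimals), and the name TinyCalculator is invented for it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class TinyCalculator
{
    public static double Evaluate(string expression)
    {
        var values = new Stack<double>();
        var operators = new Stack<char>();
        int i = 0;
        while (i < expression.Length)
        {
            char c = expression[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }

            if (c >= '0' && c <= '9')
            {
                // Read the whole run of digits as one number.
                int start = i;
                while (i < expression.Length && expression[i] >= '0' && expression[i] <= '9') i++;
                values.Push(int.Parse(expression.Substring(start, i - start)));
                continue;
            }

            if ("+-*/".IndexOf(c) < 0)
                throw new FormatException("Unexpected character: " + c);

            // Apply pending operators that bind at least as tightly as c.
            while (operators.Count > 0 && Precedence(operators.Peek()) >= Precedence(c))
                ApplyTop(values, operators);
            operators.Push(c);
            i++;
        }

        while (operators.Count > 0) ApplyTop(values, operators);
        if (values.Count != 1) throw new FormatException("Malformed expression.");
        return values.Pop();
    }

    private static int Precedence(char op)
    {
        return (op == '+' || op == '-') ? 1 : 2;
    }

    // Pops one operator and two operands and pushes the result.
    private static void ApplyTop(Stack<double> values, Stack<char> operators)
    {
        if (values.Count < 2) throw new FormatException("Missing operand.");
        char op = operators.Pop();
        double right = values.Pop();
        double left = values.Pop();
        switch (op)
        {
            case '+': values.Push(left + right); break;
            case '-': values.Push(left - right); break;
            case '*': values.Push(left * right); break;
            default: values.Push(left / right); break;
        }
    }
}
```

For anything beyond this, the expression evaluation libraries mentioned in the other answer are the safer route.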
Use the String.Split method. It splits your string at the given character and returns a string array containing the pieces. In this case the character to split on is "+".
int x = 0;
int y = 0;
int z = 0;
string value = "5+6";
if (value.Contains("+"))
{
string[] returnedArray = value.Split('+');
x = Convert.ToInt32(returnedArray[0]);
y = Convert.ToInt32(returnedArray[1]);
z = x + y;
}
Something like this may be helpful:
string strMy = "5&6";
char[] arr = strMy.ToCharArray();
List<int> list = new List<int>();
foreach (char item in arr)
{
int value;
if (int.TryParse(item.ToString(), out value))
{
list.Add(value);
}
}
list will then contain every digit found in the string as an integer.
You can use the String.Split method like this:
string s = "5 and 6";
string[] a = s.Split(new string[] { "and", "+" }, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(a[0].Trim());
Console.WriteLine(a[1].Trim());
Here is a DEMO.
Use a regex to get those values and then switch on the operator to do the calculation:
string str = "51 + 6";
str = str.Replace(" ", "");
Regex regex = new Regex(@"(?<leftHand>\d+)(?<operand>\+|and)(?<rightHand>\d+)");
var match = regex.Match(str);
int rightHand = int.Parse(match.Groups["rightHand"].Value);
int leftHand = int.Parse(match.Groups["leftHand"].Value);
string op = match.Groups["operand"].Value;
switch (op)
{
case "+":
.
.
.
}
The Split function may be convenient to use, but it is space-inefficient because it needs an array of strings. Trim(), IndexOf() and Substring() can replace Split() when that matters.

Extracting data from plain text string

I am trying to process a report from a system which gives me the following code
000=[GEN] OK {Q=1 M=1 B=002 I=3e5e65656-e5dd-45678-b785-a05656569e}
I need to extract the values between the curly brackets {} and save them in to variables. I assume I will need to do this using regex or similar? I've really no idea where to start!! I'm using c# asp.net 4.
I need the following variables
param1 = 000
param2 = GEN
param3 = OK
param4 = 1 //Q
param5 = 1 //M
param6 = 002 //B
param7 = 3e5e65656-e5dd-45678-b785-a05656569e //I
I will name the params based on what they actually mean. Can anyone please help me here? I have tried to split based on spaces, but I get the other garbage with it!
Thanks for any pointers/help!
If the format is pretty constant, you can use .NET string processing methods to pull out the values, something along the lines of
string line =
"000=[GEN] OK {Q=1 M=1 B=002 I=3e5e65656-e5dd-45678-b785-a05656569e}";
int start = line.IndexOf('{');
int end = line.IndexOf('}');
string variablePart = line.Substring(start + 1, end - start - 1);
string[] variables = variablePart.Split(' ');
foreach (string variable in variables)
{
string[] parts = variable.Split('=');
// parts[0] holds the variable name, parts[1] holds the value
}
Wrote this off the top of my head, so there may be an off-by-one error somewhere. Also, it would be advisable to add error checking e.g. to make sure the input string has both a { and a }.
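A sketch of the same IndexOf/Substring approach with the suggested error checking added, collecting the name=value pairs into a dictionary. The class and method names (ReportLine, ParseBraces) are invented for this example, and it assumes values never contain spaces, as in the sample line.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class ReportLine
{
    // Returns the name=value pairs found between the first '{' and the
    // next '}'. Returns an empty dictionary if either brace is missing.
    public static Dictionary<string, string> ParseBraces(string line)
    {
        var result = new Dictionary<string, string>();

        int start = line.IndexOf('{');
        if (start < 0) return result;
        int end = line.IndexOf('}', start + 1);
        if (end < 0) return result;

        string variablePart = line.Substring(start + 1, end - start - 1);
        string[] variables = variablePart.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string variable in variables)
        {
            // Split on the first '=' only, in case a value contains '='.
            string[] parts = variable.Split(new[] { '=' }, 2);
            if (parts.Length == 2) result[parts[0]] = parts[1];
        }
        return result;
    }
}
```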
I would suggest a regular expression for this type of work.
var objRegex = new System.Text.RegularExpressions.Regex(@"^(\d+)=\[([A-Z]+)\] ([A-Z]+) \{Q=(\d+) M=(\d+) B=(\d+) I=([a-z0-9\-]+)\}$");
var objMatch = objRegex.Match("000=[GEN] OK {Q=1 M=1 B=002 I=3e5e65656-e5dd-45678-b785-a05656569e}");
if (objMatch.Success)
{
Console.WriteLine(objMatch.Groups[1].ToString());
Console.WriteLine(objMatch.Groups[2].ToString());
Console.WriteLine(objMatch.Groups[3].ToString());
Console.WriteLine(objMatch.Groups[4].ToString());
Console.WriteLine(objMatch.Groups[5].ToString());
Console.WriteLine(objMatch.Groups[6].ToString());
Console.WriteLine(objMatch.Groups[7].ToString());
}
I've just tested this out and it works well for me.
Use a regular expression.
Quick and dirty attempt:
(?<ID1>[0-9]*)=\[(?<GEN>[a-zA-Z]*)\] OK {Q=(?<Q>[0-9]*) M=(?<M>[0-9]*) B=(?<B>[0-9]*) I=(?<I>[a-zA-Z0-9\-]*)}
This will generate named groups called ID1, GEN, Q, M, B and I.
Check out the MSDN docs for details on using Regular Expressions in C#.
You can use Regex Hero for quick C# regex testing.
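For completeness, a sketch of how the named groups could be read back in C#. The pattern is a slightly stricter variant of the one above: the braces are escaped, + is used instead of *, and a STATUS group is added for the "OK" part. Those changes, and the names ReportCodeRegex and Parse, are assumptions made for this example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ReportCodeRegex
{
    private static readonly Regex Pattern = new Regex(
        @"^(?<ID1>[0-9]+)=\[(?<GEN>[a-zA-Z]+)\] (?<STATUS>[A-Z]+) " +
        @"\{Q=(?<Q>[0-9]+) M=(?<M>[0-9]+) B=(?<B>[0-9]+) I=(?<I>[a-zA-Z0-9\-]+)\}$");

    // Returns the captured values keyed by group name,
    // or null when the line does not match the expected format.
    public static Dictionary<string, string> Parse(string line)
    {
        Match match = Pattern.Match(line);
        if (!match.Success) return null;

        var result = new Dictionary<string, string>();
        foreach (string name in new[] { "ID1", "GEN", "STATUS", "Q", "M", "B", "I" })
        {
            result[name] = match.Groups[name].Value;
        }
        return result;
    }
}
```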
You can use String.Split
string[] parts = s.Split(new string[] {"=[", "] ", " {Q=", " M=", " B=", " I=", "}"},
StringSplitOptions.None);
This solution breaks up your report code into segments and stores the desired values into an array.
The regular expression matches one report code segment at a time and stores the appropriate values in the "Parsed Report Code Array".
As your example implied, the first two code segments are treated differently than the ones after that. I made the assumption that it is always the first two segments that are processed differently.
private static string[] ParseReportCode(string reportCode) {
const int FIRST_VALUE_ONLY_SEGMENT = 3;
const int GRP_SEGMENT_NAME = 1;
const int GRP_SEGMENT_VALUE = 2;
Regex reportCodeSegmentPattern = new Regex(@"\s*([^\}\{=\s]+)(?:=\[?([^\s\]\}]+)\]?)?");
Match matchReportCodeSegment = reportCodeSegmentPattern.Match(reportCode);
List<string> parsedCodeSegmentElements = new List<string>();
int segmentCount = 0;
while (matchReportCodeSegment.Success) {
if (++segmentCount < FIRST_VALUE_ONLY_SEGMENT) {
string segmentName = matchReportCodeSegment.Groups[GRP_SEGMENT_NAME].Value;
parsedCodeSegmentElements.Add(segmentName);
}
string segmentValue = matchReportCodeSegment.Groups[GRP_SEGMENT_VALUE].Value;
if (segmentValue.Length > 0) parsedCodeSegmentElements.Add(segmentValue);
matchReportCodeSegment = matchReportCodeSegment.NextMatch();
}
return parsedCodeSegmentElements.ToArray();
}

Converting String List to Single Quoted List in C# .NET

I have a list of strings. I want to convert each element of it to a single-quoted string (i.e. "ABC" --> 'ABC'). How can I do this in .NET?
Thanks,
Omkar
Linq can help here.
var newList = oldList.Select(c => c.Replace("\"", "'"));
This is already well answered. However, I have the hunch that you are taking a list of strings in C#, then trying to build an SQL expression for use in IN statements, e.g.:
SELECT * FROM table WHERE name IN ('John','Mary','Peter')
In that case, you'd need to join the strings together, as well as protect from code injection attacks by doubling any single-quote characters.
StringBuilder sb = new StringBuilder();
foreach (string entry in list) {
if (sb.Length > 0) sb.Append(",");
sb.Append("\'" + entry.Replace("'","''") + "\'");
}
string expr = sb.ToString();
You'd also need to handle the special case when the list is empty because IN () is not a valid syntax for SQL.
If this is not what you want, just ignore me. :-)
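If the goal really is an IN clause, an alternative to quoting by hand is to generate parameter placeholders and pass the values as query parameters. The sketch below only builds the SQL fragment and a name-to-value map; the names InClauseBuilder and Build and the @p0, @p1 placeholder style (as used by SQL Server) are assumptions, and the column name must come from trusted code, not from user input.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class InClauseBuilder
{
    // Fills 'parameters' with @p0, @p1, ... and returns e.g. "name IN (@p0, @p1)".
    // For an empty list it returns an always-false condition,
    // because "IN ()" is not valid SQL.
    public static string Build(string column, IList<string> values, IDictionary<string, object> parameters)
    {
        if (values.Count == 0) return "1 = 0";

        var names = new List<string>();
        for (int i = 0; i < values.Count; i++)
        {
            string name = "@p" + i;
            parameters[name] = values[i];
            names.Add(name);
        }
        return column + " IN (" + string.Join(", ", names) + ")";
    }
}
```

The entries collected in the dictionary would then be added to the command's parameter collection of whichever data provider is in use.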
I assume you have regular strings s to 's' (quoted string) and you wanted a List<> to be converted.
List<string> stringList = new List<string>();
//Fill the list with strings here.
var query = from str in stringList
select string.Format("\'{0}\'", str);
List<string> quotedList = query.ToList<string>();
If you want to replace all double with single quotes, simply do this:
myString = myString.Replace( "\"", "'" );
However, note that ' is not a valid string delimiter in C#, so you can't have the string 'ABC', but you can have the string "'ABC'" that contains the text 'ABC'
EDIT
When looking at Geoff's answer, I saw that you wanted a list. In that case, his answer is almost correct. Try this variant instead:
var convertedList = myStringList.Select(s => s.Replace("\"", "'")).ToList();

Ignore "The" and "A" when sorting a listview in C#

Currently I am making a mini music player / organizer for myself. However, when using a list view, it sorts alphabetically, and doesn't ignore "The" and "A":
A Lot Like Me
Adiago For Strings
Stay Crunchy
The Foresaken
Time to Pretend
should be:
Adiago For Strings
The Foresaken
A Lot Like Me
Stay Crunchy
Time To Pretend
It's all loaded from a multi-dimensional array, and I've even tried filtering out "The" and "A" manually, and then display the real name (from a different array), but then it just sorts the displayed name (including "The" and "A")
What you could do is to create a custom comparer and set it on your ListView instance using the ListView.ListViewItemSorter property. Then, your comparer is responsible for removing "the" and "a" from the start of the items being compared.
When your ListView is sorted, it will use this custom comparer to sort with, but the original values including "the" and "a" will be used as display values in the ListView (ie. you do not need to modify the values you put in the ListView - the comparer just ignores the words you want it to when sorting).
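A minimal sketch of such a comparer, written against plain strings so it can be tried without a form. The class name TitleComparer and the SortKey helper are made up for this example, and it assumes titles are never null.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public sealed class TitleComparer : IComparer<string>
{
    // Leading articles to ignore; each includes the trailing space so that
    // "Adiago" is not mistaken for "A" followed by "diago".
    private static readonly string[] Articles = { "The ", "A " };

    // Returns the title without a leading article; the title itself is not modified.
    public static string SortKey(string title)
    {
        foreach (string article in Articles)
        {
            if (title.StartsWith(article, StringComparison.OrdinalIgnoreCase))
                return title.Substring(article.Length);
        }
        return title;
    }

    public int Compare(string x, string y)
    {
        return string.Compare(SortKey(x), SortKey(y), StringComparison.OrdinalIgnoreCase);
    }
}
```

ListView.ListViewItemSorter expects a non-generic System.Collections.IComparer whose arguments are ListViewItem objects, so for the ListView case a thin wrapper would cast both arguments to ListViewItem and compare their Text values with the same sort key.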
You could do it with a custom comparison method like this:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
class Example
{
static void Main()
{
List<String> names = new List<String>
{
"A Lot Like Me",
"Adiago For Strings",
"Stay Crunchy",
"The Foresaken",
"Time to Pretend"
};
names.Sort(smartCompare);
}
static Regex smartCompareExpression
= new Regex(@"^(?:A|The)\s+",
RegexOptions.Compiled |
RegexOptions.CultureInvariant |
RegexOptions.IgnoreCase);
static Int32 smartCompare(String x, String y)
{
x = smartCompareExpression.Replace(x, "");
y = smartCompareExpression.Replace(y, "");
return x.CompareTo(y);
}
}
The regular expression strips off any leading "A " or "The " from the strings so that they won't affect the comparison.
This LINQ approach seems to work:
string[] input = new string[] {
"A Lot Like Me",
"Adiago For Strings",
"Stay Crunchy",
"The Foresaken",
"Time to Pretend"
};
IEnumerable<string> ordered = input.OrderBy(s =>
s.StartsWith("A ", StringComparison.OrdinalIgnoreCase) || s.StartsWith("The ", StringComparison.OrdinalIgnoreCase) ?
s.Substring(s.IndexOf(" ") + 1) :
s);
foreach (var item in ordered)
{
Console.WriteLine(item);
}
It strips away leading "a " and "the " from the comparison, but does not alter the values in the list.

C# Named parameters to a string that replace to the parameter values

I want to replace a named parameter in my string with a value from code, with good performance (I hope). For example, my string:
"Hi {name}, do you like milk?"
How could I replace {name} from code? Regular expressions? Too expensive? Which way do you recommend?
How do libraries do it, for example NHibernate's HQL, which replaces :my_param with the user-defined value? Or ASP.NET (MVC) routing, which I like better: "{controller}/{action}", new { controller = "Hello", ... }?
Have you confirmed that regular expressions are too expensive?
The cost of regular expressions is greatly exaggerated. For such a simple pattern performance will be quite good, probably only slightly less good than direct search-and-replace, in fact. Also, have you experimented with the Compiled flag when constructing the regular expression?
That said, can't you just use the simplest way, i.e. Replace?
string varname = "name";
string pattern = "{" + varname + "}";
Console.WriteLine("Hi {name}".Replace(pattern, "Mike"));
Regex is certainly a viable option, especially with a MatchEvaluator:
Regex re = new Regex(@"\{(\w*?)\}", RegexOptions.Compiled); // store this...
string input = "Hi {name}, do you like {food}?";
Dictionary<string, string> vals = new Dictionary<string, string>();
vals.Add("name", "Fred");
vals.Add("food", "milk");
string q = re.Replace(input, delegate(Match match)
{
string key = match.Groups[1].Value;
return vals[key];
});
Now if you have your replacements in a dictionary, like this:
var replacements = new Dictionary<string, string>();
replacements["name"] = "Mike";
replacements["age"]= "20";
then the Regex becomes quite simple:
Regex regex = new Regex(@"\{(?<key>\w+)\}");
string formattext = "{name} is {age} years old";
string newStr = regex.Replace(formattext,
match=>replacements[match.Groups[1].Captures[0].Value]);
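Both regex snippets above throw a KeyNotFoundException when the template contains a placeholder that is not in the dictionary. A sketch of a more forgiving variant, which leaves unknown placeholders as they are, could look like this; the names NamedTemplate and Fill are invented for the example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class NamedTemplate
{
    private static readonly Regex Placeholder =
        new Regex(@"\{(?<key>\w+)\}", RegexOptions.Compiled);

    // Replaces {key} with values[key]; placeholders without a matching
    // key are copied to the output unchanged.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Placeholder.Replace(template, match =>
        {
            string value;
            return values.TryGetValue(match.Groups["key"].Value, out value)
                ? value
                : match.Value;
        });
    }
}
```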
After thinking about this, I realized what I actually wished for, was that String.Format() would take an IDictionary as argument, and that templates could be written using names instead of indexes.
For string substitutions with lots of possible keys/values, the index numbers result in illegible string templates - and in some cases, you may not even know which items are going to have what number, so I came up with the following extension:
https://gist.github.com/896724
Basically this lets you use string templates with names instead of numbers, and a dictionary instead of an array, and lets you have all the other good features of String.Format(), allowing the use of a custom IFormatProvider, if needed, and allowing the use of all the usual formatting syntax - precision, length, etc.
The example provided in the reference material for String.Format is a great example of how templates with many numbered items become completely illegible - porting that example to use this new extension method, you get something like this:
var replacements = new Dictionary<String, object>()
{
{ "date1", new DateTime(2009, 7, 1) },
{ "hiTime", new TimeSpan(14, 17, 32) },
{ "hiTemp", 62.1m },
{ "loTime", new TimeSpan(3, 16, 10) },
{ "loTemp", 54.8m }
};
var template =
"Temperature on {date1:d}:\n{hiTime,11}: {hiTemp} degrees (hi)\n{loTime,11}: {loTemp} degrees (lo)";
var result = template.Substitute(replacements);
As someone pointed out, if what you're writing needs to be highly optimized, don't use something like this - if you have to format millions of strings this way, in a loop, the memory and performance overhead could be significant.
On the other hand, if you're concerned about writing legible, maintainable code - and if you're doing, say, a bunch of database operations, in the grand scheme of things, this function will not add any significant overhead.
...
For convenience, I did attempt to add a method that would accept an anonymous object instead of a dictionary:
public static String Substitute(this String template, object obj)
{
return Substitute(
template,
obj.GetType().GetProperties().ToDictionary(p => p.Name, p => p.GetValue(obj, null))
);
}
For some reason, this doesn't work - passing an anonymous object like new { name: "value" } to that extension method gives a compile-time error message saying the best match was the IDictionary version of that method. Not sure how to fix that. (anyone?)
How about
string stringVar = "Hello, {0}. How are you doing?";
string arg1 = "John"; // or args[0]
string result = String.Format(stringVar, arg1);
You can even have multiple args: just increment the {x} index and add another parameter to the Format() call. string is simply the C# alias for System.String, so string.Format and String.Format are the same method.
A compiled regex might do the trick, especially if there are many tokens to be replaced. If there are just a handful of them and performance is key, I would simply find the token by index and replace it using string functions. Believe it or not, this will be faster than a regex.
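A sketch of what "find the token by index" could look like, using only IndexOf, Substring and a StringBuilder. The names IndexTemplate and Fill are assumptions for this example, no timing claim is made for it, and placeholders that are not in the dictionary are copied through unchanged.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of any library.
public static class IndexTemplate
{
    public static string Fill(string template, IDictionary<string, string> values)
    {
        var sb = new StringBuilder(template.Length);
        int pos = 0;
        while (pos < template.Length)
        {
            int open = template.IndexOf('{', pos);
            if (open < 0) break;
            int close = template.IndexOf('}', open + 1);
            if (close < 0) break;

            // Copy the literal text before the placeholder.
            sb.Append(template, pos, open - pos);

            string key = template.Substring(open + 1, close - open - 1);
            string value;
            if (values.TryGetValue(key, out value))
                sb.Append(value);
            else
                sb.Append(template, open, close - open + 1); // keep "{key}" as is

            pos = close + 1;
        }

        // Copy whatever is left after the last placeholder.
        sb.Append(template, pos, template.Length - pos);
        return sb.ToString();
    }
}
```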
Try using StringTemplate. It's much more powerful than that, but it does the job flawlessly.
Or try this with LINQ if you have all your replacement values stored in a Dictionary object.
For example:
Dictionary<string,string> dict = new Dictionary<string,string>();
dict.Add("replace1","newVal1");
dict.Add("replace2","newVal2");
dict.Add("replace3","newVal3");
var newstr = dict.Aggregate(str, (current, value) => current.Replace(value.Key, value.Value));
dict is your search-replace pairs defined Dictionary object.
str is your string which you need to do some replacements with.
I would go for the mindplay.dk solution... Works quite well.
And, with a slight modification, it supports templates-of-templates, like
"Hi {name}, do you like {0}?", replacing {name} but retaining {0}:
In the given source (https://gist.github.com/896724), replace as follows:
var format = Pattern.Replace(
template,
match =>
{
var name = match.Groups[1].Captures[0].Value;
if (!int.TryParse(name, out parsedInt))
{
if (!map.ContainsKey(name))
{
map[name] = map.Count;
list.Add(dictionary.ContainsKey(name) ? dictionary[name] : null);
}
return "{" + map[name] + match.Groups[2].Captures[0].Value + "}";
}
else return "{{" + name + "}}";
}
);
Furthermore, it supports a length ({name,30}) as well as a formatspecifier, or a combination of both.
UPDATE for 2022 (for both .NET 4.8 and .NET 6):
Especially when multi-line string templates are needed, C# 6 and later let us use $ and @ together, as in the example below.
(Inside such a string you just need to escape quotes by replacing " with "".)
string name = "Mike";
int age = 20 + 14; // 34
string product = "milk";
var htmlTemplateContent = $@"
<!DOCTYPE html>
<html>
<head>
<meta charset=""utf-8"" />
<title>Sample HTML page</title>
</head>
<body>
Hi {name}, now that you're {age.ToString()}, how do you like {product}?
</body>
</html>";
