What is the best way to do calculations on values in a string, for example:
"(3.25 * 4) / 1.25 + 10"
You need to write a math formula parser.
If you don't want to write your own, check out this one
http://www.codeproject.com/KB/cs/MathParserLibrary.aspx
Have you seen http://ncalc.codeplex.com ?
It's extensible and fast (e.g. it has its own cache), and it lets you provide custom functions and variables at run time by handling the EvaluateFunction/EvaluateParameter events. Example expression it can parse:
Expression e = new Expression("Round(Pow(Pi, 2) + Pow([Pi2], 2) + X, 2)");
e.Parameters["Pi2"] = new Expression("Pi * Pi");
e.Parameters["X"] = 10;
e.EvaluateParameter += delegate(string name, ParameterArgs args)
{
if (name == "Pi")
args.Result = 3.14;
};
Debug.Assert(117.07 == e.Evaluate());
It also handles Unicode and many data types natively. It comes with an ANTLR file if you want to change the grammar. There is also a fork which supports MEF to load new functions.
It also supports logical operators, date/time strings and if statements.
Converting from infix to postfix notation is one way of solving this problem, as you can use a stack to parse and compute the expression.
There are a dozen solutions available online. Here's one of them.
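A minimal sketch of that stack-based approach (my own illustrative code, not any particular library: binary +, -, *, / and parentheses only, no unary minus, functions, or error handling):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class TinyEvaluator
{
    static int Precedence(string op) => (op == "+" || op == "-") ? 1 : 2;

    public static double Evaluate(string expr)
    {
        // 1) Infix -> postfix via the shunting-yard algorithm.
        var output = new Queue<string>();
        var ops = new Stack<string>();
        foreach (var t in Tokenize(expr))
        {
            if (double.TryParse(t, NumberStyles.Float, CultureInfo.InvariantCulture, out _))
                output.Enqueue(t);
            else if (t == "(")
                ops.Push(t);
            else if (t == ")")
            {
                while (ops.Peek() != "(") output.Enqueue(ops.Pop());
                ops.Pop(); // discard the "("
            }
            else // operator: pop anything of equal/higher precedence first
            {
                while (ops.Count > 0 && ops.Peek() != "(" && Precedence(ops.Peek()) >= Precedence(t))
                    output.Enqueue(ops.Pop());
                ops.Push(t);
            }
        }
        while (ops.Count > 0) output.Enqueue(ops.Pop());

        // 2) Evaluate the postfix queue with a value stack.
        var vals = new Stack<double>();
        foreach (var t in output)
        {
            if (double.TryParse(t, NumberStyles.Float, CultureInfo.InvariantCulture, out var n))
                vals.Push(n);
            else
            {
                double b = vals.Pop(), a = vals.Pop();
                vals.Push(t == "+" ? a + b : t == "-" ? a - b : t == "*" ? a * b : a / b);
            }
        }
        return vals.Pop();
    }

    static IEnumerable<string> Tokenize(string s)
    {
        for (int i = 0; i < s.Length; )
        {
            if (char.IsWhiteSpace(s[i])) { i++; continue; }
            if (char.IsDigit(s[i]) || s[i] == '.')
            {
                int start = i;
                while (i < s.Length && (char.IsDigit(s[i]) || s[i] == '.')) i++;
                yield return s.Substring(start, i - start);
            }
            else yield return s[i++].ToString();
        }
    }
}
```

With this, TinyEvaluator.Evaluate("(3.25 * 4) / 1.25 + 10") yields 20.4.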
Related
I would like to store math expressions and do operations with them, something like:
a=x^2+2x+1
b=2x+3
c=a+b
writeln(c(4)) - would calculate and write the answer of 36
Is there a math library that allows this kind of coding in C# ?
If you want to calculate the same expressions every time, delegates are a good idea, as @Paul Griffin suggested. But if you have very different expressions, you will likely have to build your own parsing algorithm or find an existing one. In that case this would be a duplicate of Parse Math Expression.
I don't know about a math library per se, but you could certainly do what you are showing us here with delegates. Off the top of my head (EDIT: a little better example):
delegate int myDelegate(int x);
var a = new myDelegate(x => (int)Math.Pow(x, 2) + 2 * x + 1);
var b = new myDelegate(x => (2*x) + 3);
var c = new myDelegate(x => a(x) + b(x));
Console.WriteLine(c(4));
//Output: 36
I am reading from a text file using the code below.
if (!allLines.Contains(":70"))
{
var firstIndex = allLines.IndexOf(":20");
var secondIndex = allLines.IndexOf(":23B");
var thirdIndex = allLines.IndexOf(":59");
var fourthIndex = allLines.IndexOf(":71A");
var fifthIndex = allLines.IndexOf(":72");
var sixthIndex = allLines.IndexOf("-}");
var firstValue = allLines.Substring(firstIndex + 4, secondIndex - firstIndex - 5).TrimEnd();
var secondValue = allLines.Substring(thirdIndex + 4, fourthIndex - thirdIndex - 5).TrimEnd();
var thirdValue = allLines.Substring(fifthIndex + 4, sixthIndex - fifthIndex - 5).TrimEnd();
var len1 = firstValue.Length;
var len2 = secondValue.Length;
var len3 = thirdValue.Length;
inflow103.REFERENCE = firstValue.TrimEnd();
pointer = 1;
inflow103.BENEFICIARY_CUSTOMER = secondValue;
inflow103.RECEIVER_INFORMATION = thirdValue;
}
else if (allLines.Contains(":70"))
{
var firstIndex = allLines.IndexOf(":20");
var secondIndex = allLines.IndexOf(":23B");
var thirdIndex = allLines.IndexOf(":59");
var fourthIndex = allLines.IndexOf(":70");
var fifthIndex = allLines.IndexOf(":71");
var sixthIndex = allLines.IndexOf(":72");
var seventhIndex = allLines.IndexOf("-}");
var firstValue = allLines.Substring(firstIndex + 4, secondIndex - firstIndex - 5).TrimEnd();
var secondValue = allLines.Substring(thirdIndex + 5, fourthIndex - thirdIndex - 5).TrimEnd();
var thirdValue = allLines.Substring(sixthIndex + 4, seventhIndex - sixthIndex - 5).TrimEnd();
var len1 = firstValue.Length;
var len2 = secondValue.Length;
var len3 = thirdValue.Length;
inflow103.REFERENCE = firstValue.TrimEnd();
pointer = 1;
inflow103.BENEFICIARY_CUSTOMER = secondValue;
inflow103.RECEIVER_INFORMATION = thirdValue;
}
Below is the format of the text file am reading.
{1:F21DBLNNGLAAXXX4695300820}{4:{177:1405260906}{451:0}}{1:F01DBLNNGLAAXXX4695300820}{2:O1030859140526SBICNGLXAXXX74790400761405260900N}{3:{103:NGR}{108:AB8144573}{115:3323774}}{4:
:20:SBICNG958839-2
:23B:CRED
:23E:SDVA
:32A:140526NGN168000000,
:50K:IHS PLC
:53A:/3000025296
SBICNGLXXXX
:57A:/3000024426
DBLNNGLA
:59:/0040186345
SONORA CAPITAL AND INVSTMENT LTD
:71A:OUR
:72:/CODTYPTR/001
-}{5:{MAC:00000000}{PAC:00000000}{CHK:42D0D867739F}}{S:{SPD:}{SAC:}{FAC:}{COP:P}}
The above file format represents one transaction in a single text file, but while testing with live files, I came across a situation where a file can have more than one transaction. An example is the file below.
{1:F21DBLNNGLAAXXX4694300150}{4:{177:1405231923}{451:0}}{1:F01DBLNNGLAAXXX4694300150}{2:O1031656140523FCMBNGLAAXXX17087957771405231916N}{3:{103:NGR}{115:3322817}}{4:
:20:TRONGN3RDB16
:23B:CRED
:23E:SDVA
:26T:001
:32A:140523NGN1634150,00
:50K:/2206117013
SUNLEK INVESTMENT LTD
:53A:/3000024763
FCMBNGLA
:57A:/3000024426
DBLNNGLA
:59:/0022617678
GOLDEN DC INT'L LTD
:71A:OUR
:72:/CODTYPTR/001
//BNF/TRSF
-}{5:{MAC:00000000}{PAC:00000000}{CHK:C21000C4ECBA}{DLM:}}{S:{SPD:}{SAC:}{FAC:}{COP:P}}${1:F21DBLNNGLAAXXX4694300151}{4:{177:1405231923}{451:0}}{1:F01DBLNNGLAAXXX4694300151}{2:O1031656140523FCMBNGLAAXXX17087957781405231916N}{3:{103:NGR}{115:3322818}}{4:
:20:TRONGN3RDB17
:23B:CRED
:23E:SDVA
:26T:001
:32A:140523NGN450000,00
:50K:/2206117013
SUNLEK INVESTMENT LTD
:53A:/3000024763
FCMBNGLA
:57A:/3000024426
DBLNNGLA
:59:/0032501697
SUNSTEEL INDUSTRIES LTD
:71A:OUR
:72:/CODTYPTR/001
//BNF/TRSF
-}{5:{MAC:00000000}{PAC:00000000}{CHK:01C3B7B3CA53}{DLM:}}{S:{SPD:}{SAC:}{FAC:}{COP:P}}
My challenge is that in my code, each field is located by the index of its tag in allLines. In a situation where I need to pick up the second transaction from the file, the same tags exist again just as before. How can I manage this situation?
This is a simple problem obscured by excess code. All you are doing is extracting 3 values from a chunk of text where the precise layout can vary from one chunk to another.
There are 3 things I think you need to do.
Refactor the code. Instead of two hefty if blocks inline, you need functions that extract the required text.
Use regular expressions. A single regular expression can extract the values you need in one line instead of several.
Separate the code from the data. The logic of these two blocks is identical, only the data changes. So write one function and pass in the regular expression(s) needed to extract the data items you need.
Unfortunately this calls for a significant lift in the abstraction level of the code, which may be beyond what you're ready for. However, if you can do this and (say) you have function Extract() with regular expressions as arguments, you can apply that function once, twice or as often as needed to handle variations in your basic transaction.
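A hypothetical Extract() along those lines might look like this (the helper name and the patterns are my own illustrative assumptions about the layout shown in the question; each pattern captures everything between one tag and the next):

```csharp
using System;
using System.Text.RegularExpressions;

static class SwiftFields
{
    // Returns the first capture group of the pattern, or "" if the field
    // is absent. Singleline lets "." span multi-line fields such as :59:.
    public static string Extract(string text, string pattern)
    {
        var m = Regex.Match(text, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.TrimEnd() : "";
    }
}

// Usage sketch against the question's message text:
// var reference   = SwiftFields.Extract(allLines, @":20:(.*?)\n:");
// var beneficiary = SwiftFields.Extract(allLines, @":59:(.*?)\n:");
// var receiverInf = SwiftFields.Extract(allLines, @":72:(.*?)\n-\}");
```

Because each field is found by its own tag rather than by a fixed index, the same function works whether or not the optional :70 tag is present.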
You could perhaps use the code below to process multiple records with your existing code:
//assuming fileText is all the text read from the text file
string[] fileData = fileText.Split('$');
foreach(string allLines in fileData)
{
//your code goes here
}
Maybe indexing works, but given the particular structure of the format, I highly doubt that is a good solution. But if it works for you, then that's great. You can simply split on $ and then pass each substring into a method. This ensures that the index for each substring starts at the beginning of the entry.
However, if you run into a situation where indices are no longer static, then before you even start to write a parser for any format, you need to first understand the format. If you don't have any documentation and are basically reverse engineering it, that's what you need to do. Maybe someone else has specifications. Maybe the source of this data has it somewhere. But I will proceed under the assumption that none of this information is available and you have been given a task with absolutely no support and are expected to reverse-engineer it.
Any format that is meant to be parsed and written by a computer will, 9 times out of 10, be well-formed. I'd say 9.9 out of 10, for that matter, since there are cases where people make things unnecessarily complex for the sake of "security".
When I look at your sample data, I see "chunks" of data enclosed within curly braces, as well as nested chunks.
For example, you have things like
{tag1:value1} // simple chunk
{tag2:{tag3: value3}{tag4:value4}} // nested chunk
Multiple transactions are delimited by a $ apparently. You may be able to split on $ signs in this case, but again you need to be sure that the $ is a special character and doesn't appear in tags or values themselves.
Do not be fixated on what a "chunk" is or why I use the term. All you need to know is that there are "tags" and each tag comes with a particular "value".
The value can be anything: a primitive such as a string or number, or it can be another chunk. This suggests that you need to first figure out what type of value each tag accepts. For example, the 1 tag takes a string. The 4 tag takes multiple chunks, possibly representing different companies. There are chunks like DLM that have an empty value.
From these two samples, I would assume that you need to consume each chunk, check the tag, and then parse the value. Since there are nested chunks, you likely need to store them in a particular way to handle them correctly.
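A rough sketch of that consume-and-recurse idea (my own illustrative types, not a full SWIFT parser: any value not starting with { is treated as primitive text, and block 4's multiline -} terminator is not handled):

```csharp
using System;
using System.Collections.Generic;

// A chunk is {tag:value}; the value is either raw text or nested chunks.
class Chunk
{
    public string Tag = "";
    public string Text = "";
    public List<Chunk> Children = new List<Chunk>();
}

static class ChunkParser
{
    // Parses all chunks at one nesting level, starting at position i.
    public static List<Chunk> Parse(string s, ref int i)
    {
        var result = new List<Chunk>();
        while (i < s.Length && s[i] == '{')
        {
            i++;                                   // consume '{'
            var chunk = new Chunk();
            while (s[i] != ':') chunk.Tag += s[i++];
            i++;                                   // consume ':'
            if (i < s.Length && s[i] == '{')
                chunk.Children = Parse(s, ref i);  // value is nested chunks
            else
                while (s[i] != '}') chunk.Text += s[i++]; // primitive value
            i++;                                   // consume '}'
            result.Add(chunk);
        }
        return result;
    }
}
```

For input like {1:F21DBLN}{4:{177:1405260906}{451:0}} this yields a flat list of two chunks, the second of which holds two nested child chunks.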
I want to append a character to a long integer, using the below code:
if (strArrIds[1].Contains("CO"))
{
long rdb2 = Convert.ToInt64(strArrIds[1].Substring(strArrIds[1].Length - 1));
assessmentEntity.RatingType = rdb2;
}
If rdb2 = 5, I want to append an L to this value, like rdb2 = 5L.
Any ideas? Thanks in advance.
You can use long.Parse instead of Convert.ToInt64 to get the long, and you won't need to append an L to make it a long:
if (strArrIds[1].Contains("CO"))
{
long rdb2 = long.Parse(strArrIds[1].Substring(strArrIds[1].Length - 1));
assessmentEntity.RatingType = rdb2;
}
If you are processing a lot of these, and if this item will be used for further processing, you could consider turning this into a class which might be easier to manage. This would also allow you to more easily customise your ratings.
I'm thinking maybe a factory might serve you well also which could instantiate and return your assessment entity. You could then leverage a dependency injection strategy for any other functionality.
It's a little hard to tell what you require this for without a bit more context.
If this is a once off, I would refactor to
assessmentEntity.RatingType = strArrIds[1].Contains("CO") ?
String.Concat(long.Parse(strArrIds[1].Substring(strArrIds[1].Length - 1)).ToString(), "L") :
"0N";
Assuming "0N" is some other default rating.
You do not need an L here. That is only for literals of type long appearing in the C# source.
You could do something like:
string str1 = strArrIds[1];
if (str1.Contains("CO"))
{
long rdb2 = str1[str1.Length - 1] - '0';
if (rdb2 < 0L || rdb2 > 9L)
throw new InvalidOperationException("Unexpected rdb2 value from str1=" + str1);
assessmentEntity.RatingType = rdb2;
}
In C#, I am looking for a way to solve simple equations like this: Z = A + B
I am trying to build a class that would give me the 3rd parameter if I give any of the 2 others.
Example, given Z=A+B
If you know A=3 and B=6 then you know Z=9
If you know A=4 and Z=8 then you know B=4
How would I best perform these kinds of tasks in software?
The other idea is to use math expression evaluators, like NCalc. They can interpret math expressions, for example converting 3*(8+2) into 30, but they cannot solve equations like 3*(8+x)=30 --> x=2.
Are you sure NCalc wouldn't do what you need? Take a look at an example from http://ncalc.codeplex.com/.
Define parameters, even dynamic or expressions
Expression e = new Expression("Round(Pow([Pi], 2) + Pow([Pi2], 2) + [X], 2)");
e.Parameters["Pi2"] = new Expression("Pi * [Pi]");
e.Parameters["X"] = 10;
e.EvaluateParameter += delegate(string name, ParameterArgs args)
{
if (name == "Pi")
args.Result = 3.14;
};
Debug.Assert(117.07 == e.Evaluate());
Please note this is untested - but it looks like you could do something like this with NCalc:
var e = new Expression("[A] + [B]");
e.Parameters = /* your input */
var result = e.Evaluate();
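For the specific Z = A + B case you don't need a general solver at all: rearrange by hand and pick the formula based on which value is missing. An illustrative sketch (my own code, not part of NCalc):

```csharp
using System;

static class SumEquation
{
    // For Z = A + B: pass exactly one of the three values as null,
    // and that unknown is solved for by rearranging the equation.
    public static double Solve(double? z, double? a, double? b)
    {
        if (z == null) return a.Value + b.Value; // Z = A + B
        if (a == null) return z.Value - b.Value; // A = Z - B
        if (b == null) return z.Value - a.Value; // B = Z - A
        throw new ArgumentException("Exactly one value must be null.");
    }
}
```

So SumEquation.Solve(null, 3, 6) gives 9, and SumEquation.Solve(8, 4, null) gives 4, matching the examples in the question. A general equation solver is only needed when the expression can't be rearranged ahead of time.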
Try out C# Expression Evaluator and see if it matches your requirements.
The following class parses through a very large string (an entire novel of text) and breaks it into consecutive 4-character strings that are stored as a Tuple. Then each tuple can be assigned a probability based on a calculation. I am using this as part of a Monte Carlo / genetic algorithm to train the program to recognize a language based on syntax only (just the character transitions).
I am wondering if there is a faster way of doing this. It takes about 400 ms to look up the probability of any given 4-character tuple. The relevant method _Probability() is at the end of the class.
This is a computationally intensive problem related to another post of mine: Algorithm for computing the plausibility of a function / Monte Carlo Method
Ultimately I'd like to store these values in a 4D matrix. But given that there are 26 letters in the alphabet, that would be a HUGE task (26x26x26x26). If I take only the first 15,000 characters of the novel then performance improves a ton, but my data isn't as useful.
Here is the method that parses the text 'source':
private List<Tuple<char, char, char, char>> _Parse(string src)
{
var _map = new List<Tuple<char, char, char, char>>();
for (int i = 0; i < src.Length - 3; i++)
{
int j = i + 1;
int k = i + 2;
int l = i + 3;
_map.Add
(new Tuple<char, char, char, char>(src[i], src[j], src[k], src[l]));
}
return _map;
}
And here is the _Probability method:
private double _Probability(char x0, char x1, char x2, char x3)
{
var subset_x0 = map.Where(x => x.Item1 == x0);
var subset_x0_x1_following = subset_x0.Where(x => x.Item2 == x1);
var subset_x0_x2_following = subset_x0_x1_following.Where(x => x.Item3 == x2);
var subset_x0_x3_following = subset_x0_x2_following.Where(x => x.Item4 == x3);
int count_of_x0 = subset_x0.Count();
int count_of_x1_following = subset_x0_x1_following.Count();
int count_of_x2_following = subset_x0_x2_following.Count();
int count_of_x3_following = subset_x0_x3_following.Count();
decimal p1;
decimal p2;
decimal p3;
if (count_of_x0 <= 0 || count_of_x1_following <= 0 || count_of_x2_following <= 0 || count_of_x3_following <= 0)
{
p1 = e;
p2 = e;
p3 = e;
}
else
{
p1 = (decimal)count_of_x1_following / (decimal)count_of_x0;
p2 = (decimal)count_of_x2_following / (decimal)count_of_x1_following;
p3 = (decimal)count_of_x3_following / (decimal)count_of_x2_following;
p1 = (p1 * 100) + e;
p2 = (p2 * 100) + e;
p3 = (p3 * 100) + e;
}
//more calculations omitted
return _final;
}
}
EDIT - I'm providing more details to clear things up,
1) Strictly speaking I've only worked with English so far, but it's true that different alphabets will have to be considered. Currently I only want the program to recognize English, similar to what's described in this paper: http://www-stat.stanford.edu/~cgates/PERSI/papers/MCMCRev.pdf
2) I am calculating the probabilities of n-tuples of characters where n <= 4. For instance if I am calculating the total probability of the string "that", I would break it down into these independent tuples and calculate the probability of each individually first:
[t][h]
[t][h][a]
[t][h][a][t]
[t][h] is given the most weight, then [t][h][a], then [t][h][a][t]. Since I am not just looking at the 4-character tuple as a single unit, I wouldn't be able to just divide the instances of [t][h][a][t] in the text by the total no. of 4-tuples in the text.
The value assigned to each 4-tuple can't overfit to the text, because by chance many real English words may never appear in the text and they shouldn't get disproportionately low scores. Emphasizing first-order character transitions (2-tuples) ameliorates this issue. Moving to the 3-tuple and then the 4-tuple just refines the calculation.
I came up with a Dictionary that simply tallies how often each tuple occurs in the text (similar to what Vilx suggested), rather than storing identical tuples repeatedly, which is a waste of memory. That got me from about ~400 ms per lookup down to about ~40 ms, which is a pretty great improvement. I still have to look into some of the other suggestions, however.
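That tally might look roughly like this (a sketch of the idea, not the exact code used; Tuple keys compare by value, so identical 4-character windows land in the same slot):

```csharp
using System;
using System.Collections.Generic;

static class TupleCounter
{
    // Count each 4-character window once instead of storing every occurrence.
    public static Dictionary<Tuple<char, char, char, char>, int> Tally(string src)
    {
        var counts = new Dictionary<Tuple<char, char, char, char>, int>();
        for (int i = 0; i < src.Length - 3; i++)
        {
            var key = Tuple.Create(src[i], src[i + 1], src[i + 2], src[i + 3]);
            int n;
            counts.TryGetValue(key, out n); // n stays 0 when the key is absent
            counts[key] = n + 1;
        }
        return counts;
    }
}
```

A probability lookup then becomes a single dictionary access instead of a scan over the whole list.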
In your probability method you iterate the map eight times: each of your Where calls iterates the entire list, and so does each Count. Adding a .ToList() at the end of each Where would (potentially) speed things up. That said, I think your main problem is that the structure you've chosen to store the data in is not suited for the purpose of the probability method. You could create a one-pass version where the structure you store your data in calculates the tentative distribution on insert. That way, when you're done with the inserts (which shouldn't be slowed down much), you're done; or, as in the code below, you can do a cheap calculation of the probability when you need it.
As an aside, you might want to take punctuation and whitespace into account. The first letter of a sentence and the first letter of a word give a clear indication of what language a given text is written in; by taking punctuation characters and whitespace as part of your distribution, you include those characteristics of the sample data. We did that some years back. We showed that using just three characters was almost as accurate as using more (we tested up to seven), and the speed of three letters made it the best choice. (We had no failures with three on our test data; "almost as accurate" is an assumption, since there must be some weird text where the lack of information would yield an incorrect result.)
EDIT
Here's an example of how I think I would do it in C#
class TextParser{
public Node Parse(string src){
var top = new Node(null);
for (int i = 0; i < src.Length - 3; i++){
var first = src[i];
var second = src[i+1];
var third = src[i+2];
var fourth = src[i+3];
var firstLevelNode = top.AddChild(first);
var secondLevelNode = firstLevelNode.AddChild(second);
var thirdLevelNode = secondLevelNode.AddChild(third);
thirdLevelNode.AddChild(fourth);
}
return top;
}
}
public class Node{
private readonly Node _parent;
private readonly Dictionary<char,Node> _children
= new Dictionary<char, Node>();
private int _count;
public Node(Node parent){
_parent = parent;
}
public Node AddChild(char value){
if (!_children.ContainsKey(value))
{
_children.Add(value, new Node(this));
}
var levelNode = _children[value];
levelNode._count++;
return levelNode;
}
public decimal Probability(string substring){
var node = this;
foreach (var c in substring){
if(!node.Contains(c))
return 0m;
node = node[c];
}
// Divide by how often the parent was extended (the sum of the sibling
// counts), not by the number of distinct child characters.
return ((decimal) node._count)/node._parent.ChildTotal();
}
private int ChildTotal(){
var total = 0;
foreach (var child in _children.Values)
total += child._count;
return total;
}
public Node this[char value]{
get { return _children[value]; }
}
private bool Contains(char c){
return _children.ContainsKey(c);
}
}
the usage would then be:
var top = new TextParser().Parse(src);
top.Probability("test");
I would suggest changing the data structure to make that faster...
I think a Dictionary<char,Dictionary<char,Dictionary<char,Dictionary<char,double>>>> would be much more efficient, since you would be accessing each "level" (Item1...Item4) as you calculate, and you could cache the result in the innermost Dictionary so that next time you don't have to calculate at all.
OK, I don't have time to work out the details, but this really calls for:
neural classifier nets (just take any off the shelf; even the Controllable Regex Mutilator would do the job, with way more scalability) -- heuristics over brute force
you could use tries (Patricia tries, a.k.a. radix trees) to make a space-optimized version of your data structure that can be sparse (the Dictionary of Dictionaries of Dictionaries of Dictionaries... looks like an approximation of this to me)
There's not much you can do with the parse function as it stands. However, the tuples appear to be four consecutive characters from a large body of text. Why not just replace the tuple with an int, and use the int to index into the body of text when you need the character values? Your tuple-based method effectively consumes four times the memory the original text would use, and since memory is usually the bottleneck to performance, it's best to use as little as possible.
You then try to find the number of matches in the body of text against a set of characters. I wonder how a straightforward linear search over the original body of text would compare with the LINQ statements you're using? The .Where calls will be doing memory allocation (which is a slow operation), and the LINQ statements have parsing overhead (though the compiler might do something clever here). Having a good understanding of the search space will make it easier to find an optimal algorithm.
But then, as has been mentioned in the comments, using a 26^4 matrix would be the most efficient. Parse the input text once and create the matrix as you parse. You'd probably want a set of dictionaries:
SortedDictionary <int,int> count_of_single_letters; // key = single character
SortedDictionary <int,int> count_of_double_letters; // key = char1 + char2 * 32
SortedDictionary <int,int> count_of_triple_letters; // key = char1 + char2 * 32 + char3 * 32 * 32
SortedDictionary <int,int> count_of_quad_letters; // key = char1 + char2 * 32 + char3 * 32 * 32 + char4 * 32 * 32 * 32
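Note that the *32 packing above assumes the characters have already been mapped into a 0-25 range (each letter then fits in one 5-bit "digit" of the radix-32 key); raw char values like 'a' = 97 would collide. A sketch of the key function under that assumption:

```csharp
using System;

static class LetterKey
{
    // Pack up to four lowercase letters ('a'..'z' mapped to 0..25)
    // into one int: char1 + char2*32 + char3*32*32 + char4*32*32*32.
    public static int Key(string letters)
    {
        int key = 0, radix = 1;
        foreach (char c in letters)
        {
            key += (c - 'a') * radix;
            radix *= 32;
        }
        return key;
    }
}
```

The same function produces keys for the single, double, triple, and quad dictionaries, depending on how many letters you pass in.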
Finally, a note on data types. You're using the decimal type. This is not an efficient type, as there is no direct mapping to a CPU-native type and there is overhead in processing the data. Use a double instead; I think the precision will be sufficient. The most precise way would be to store each probability as two integers, the numerator and the denominator, and do the division as late as possible.
The best approach here is to use sparse storage, with pruning after every 10,000 characters, for example. The best storage structure in this case is a prefix tree; it allows fast probability calculation, updating, and sparse storage. You can find more theory in this javadoc: http://alias-i.com/lingpipe/docs/api/com/aliasi/lm/NGramProcessLM.html