Extracting Formula from String

I have to extract all variable names from a formula,
e.g. (FB+AB+ESI) / 12
Output: {FB,AB,ESI}
Code written so far:
var length = formula.Length;
List<string> variables = new List<string>();
List<char> operators = new List<char> { '+', '-', '*', '/', ')', '(', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
int count = 0;
string character = string.Empty;
for (int i = 0; i < length; i++)
{
if (!operators.Contains(formula[i]))
character += formula[i];
else
{
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
character = string.Empty;
count = i;
}
}
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
return variables;
The output of the method is {FB,AB,ESI}, which is correct.
My problem is when a variable name contains digits,
e.g. (FB1+AB1)/100
Expected output: {FB1,AB1}
But my method returns {FB,AB}.

If variable names must start with
a letter A..Z, a..z
and if variable names can contain
letters A..Z, a..z
digits 0..9
underscores _
you can use regular expressions:
String source = "(FB2+a_3B+EsI) / 12";
// A letter first, then any number of letters, digits or underscores
String pattern = @"[A-Za-z][A-Za-z0-9_]*";
// output will be "{FB2,a_3B,EsI}"
String output = "{" + String.Join(",",
    Regex.Matches(source, pattern)
        .OfType<Match>()
        .Select(item => item.Value)) + "}";
In case you need a collection, say an array of variable names, just modify the LINQ:
String[] names = Regex.Matches(source, pattern)
    .OfType<Match>()
    .Select(item => item.Value)
    .ToArray();
However, this is just a naive tokenizer: you still have to separate the "variable names" it finds from function names and class names, check whether they are commented out, etc.
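One way to handle the function-name caveat is sketched below. It assumes that an identifier directly followed by "(" is a function call and should be skipped; the class name FormulaVariables and the method name Extract are made up for this illustration, and comments or class names are not handled at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FormulaVariables
{
    // An identifier (letter first, then letters, digits or underscores) that is
    // NOT followed by "(", i.e. not used as a function call.
    // The \b anchors stop the regex from matching only part of an identifier.
    private static readonly Regex VariablePattern =
        new Regex(@"\b[A-Za-z][A-Za-z0-9_]*\b(?!\s*\()");

    // Returns each variable name found in the formula once.
    public static List<string> Extract(string formula)
    {
        if (string.IsNullOrWhiteSpace(formula)) return new List<string>();

        return VariablePattern.Matches(formula)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .ToList();
    }
}
```

Calling FormulaVariables.Extract("Max(FB2, a_3B) + EsI") would skip Max and keep the three variable names.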

Have changed your code to do what you asked, but not sure about the approach of the solution, seeing that bracket and operator precedence is not taken into consideration.
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
string formula = "AB1+FB+100";
var length = formula.Length;
List<string> variables = new List<string>();
List<char> operators = new List<char>{'+', '-', '*', '/', ')', '('};
List<char> numerals = new List<char>{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
int count = 0;
string character = string.Empty;
char prev_char = '\0';
for (int i = 0; i < length; i++)
{
bool is_operator = operators.Contains(formula[i]);
bool is_numeral = numerals.Contains(formula[i]);
bool is_variable = !(is_operator || is_numeral);
bool was_variable = character.Contains(prev_char);
if (is_variable || (was_variable && is_numeral) )
character += formula[i];
else
{
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
character = string.Empty;
count = i;
}
prev_char = formula[i];
}
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
foreach (var item in variables)
Console.WriteLine(item);
Console.WriteLine();
Console.WriteLine();
}
}
Maybe also consider something like Math-Expression-Evaluator (on nuget)

Here is how you could do it with Regular Expressions.
Regex regex = new Regex(@"([A-Z])\w+");
List<string> matchedStrings = new List<string>();
foreach (Match match in regex.Matches("(FB1+AB1)/100"))
{
matchedStrings.Add(match.Value);
}
This will create a list of strings of all the matches.

Without regex, you can split on the actual operators (not digits), and then remove any items that begin with a digit:
public static List<string> GetVariables(string formula)
{
    if (string.IsNullOrWhiteSpace(formula)) return new List<string>();
    var operators = new[] { '+', '-', '*', '/', '^', '%', '(', ')' };
    return formula
        .Split(operators, StringSplitOptions.RemoveEmptyEntries)
        // Trim so that spaces around operators do not produce blank or padded items
        .Select(operand => operand.Trim())
        .Where(operand => operand.Length > 0 && !char.IsDigit(operand[0]))
        .ToList();
}

You can do it this way, just optimize the code as you want.
string ss = "(FB+AB+ESI) / 12";
string[] spl = ss.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
string final = spl[0].Replace("(", "").Replace(")", "").Trim();
string[] entries = final.Split(new char[] {'+'}, StringSplitOptions.RemoveEmptyEntries);
StringBuilder sbFinal = new StringBuilder();
sbFinal.Append("{");
foreach(string en in entries)
{
sbFinal.Append(en + ",");
}
string finalString = sbFinal.ToString().TrimEnd(',');
finalString += "}";

What you are trying to do is an interpreter.
I can't give you the whole code but what I can give you is a head start (it will require a lot of coding).
First, learn about reverse polish notation.
Second, you need to learn about stacks.
Third, you have to apply both to get what you want to interpret.
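As a rough illustration of how the stack and reverse Polish notation fit together, here is a minimal sketch of the shunting-yard conversion from infix to RPN. It is only a starting point under stated assumptions: it supports the binary operators + - * / and parentheses, treats every operator as left-associative, does not handle unary minus or functions, and the names InfixToRpn and Convert are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InfixToRpn
{
    // Binary operators and their precedence; all are treated as left-associative.
    private static readonly Dictionary<string, int> Precedence =
        new Dictionary<string, int> { { "+", 1 }, { "-", 1 }, { "*", 2 }, { "/", 2 } };

    // Tokens: identifiers, numbers (optionally with a fraction), operators, parentheses.
    private static readonly Regex Token =
        new Regex(@"[A-Za-z][A-Za-z0-9_]*|\d+(?:\.\d+)?|[-+*/()]");

    public static List<string> Convert(string formula)
    {
        var output = new List<string>();
        var operators = new Stack<string>();

        foreach (Match m in Token.Matches(formula))
        {
            string token = m.Value;
            if (Precedence.ContainsKey(token))
            {
                // Pop operators that bind at least as tightly before pushing this one.
                while (operators.Count > 0
                       && Precedence.ContainsKey(operators.Peek())
                       && Precedence[operators.Peek()] >= Precedence[token])
                {
                    output.Add(operators.Pop());
                }
                operators.Push(token);
            }
            else if (token == "(")
            {
                operators.Push(token);
            }
            else if (token == ")")
            {
                // Move operators to the output until the matching "(" is found.
                while (operators.Count > 0 && operators.Peek() != "(")
                    output.Add(operators.Pop());
                if (operators.Count == 0)
                    throw new FormatException("Unbalanced parentheses.");
                operators.Pop(); // discard the "("
            }
            else
            {
                output.Add(token); // variable or number
            }
        }

        // Flush whatever operators are left on the stack.
        while (operators.Count > 0)
        {
            string op = operators.Pop();
            if (op == "(") throw new FormatException("Unbalanced parentheses.");
            output.Add(op);
        }
        return output;
    }
}
```

With the formula from the question, string.Join(" ", InfixToRpn.Convert("(FB+AB+ESI) / 12")) gives FB AB + ESI + 12 /. Evaluating that list is then a second stack loop: push operands, and on each operator pop two values and push the result.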

Related

Extract multiple substring in the same line

I'm trying to build a log parser, but I'm stuck.
Right now my program goes through multiple files in a directory and reads each file line by line.
I was able to identify the substring I was looking for, "fct=", and extract the value next to the "=" using a delimiter, but I noticed that when a line has more than one "fct=" the additional ones are missed.
So I restarted my code and found a way to get the index of every occurrence of "fct=" in a line, using an extension method that puts the indexes in a list, but I don't see how to use this list to get the value next to the "=" with my delimiter.
How can I extract the value next to the "=", knowing the start position of "fct=" and the delimiter at the end of the wanted value?
I'm just starting with C#, so let me know if I can give you more information.
Thanks,
Here's an example of what I would like to parse:
<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>
<dat>XN=KEY,CN=RTU FCT=4515</dat></logurl>
<dat>XN=KEY,CN=RT</dat></logurl>
I would like to retrieve 10019, 666 and 4515.
namespace LogParserV1
{
class Program
{
static void Main(string[] args)
{
int counter = 0;
string[] dirs = Directory.GetFiles(@"C:/LogParser/LogParserV1", "*.txt");
string fctnumber;
char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
foreach (string fileName in dirs)
{
StreamReader sr = new StreamReader(fileName);
{
String lineRead;
while ((lineRead = sr.ReadLine()) != null)
{
if (lineRead.Contains("fct="))
{
List<int> list = MyExtensions.GetPositions(lineRead, "fct");
//int start = lineRead.IndexOf("fct=") + 4;
// int end = lineRead.IndexOfAny(enddelimiter, start);
//string result = lineRead.Substring(start, end - start);
// fctnumber = result; // 'result' only exists in the commented-out lines above
//System.Console.WriteLine(fctnumber);
list.ForEach(Console.WriteLine);
}
// prints every line: System.Console.WriteLine(lineRead);
counter++;
}
System.Console.WriteLine(fileName);
sr.Close();
}
}
// Suspend the screen.
System.Console.ReadLine();
}
}
}
namespace ExtensionMethods
{
public class MyExtensions
{
public static List<int> GetPositions(string source, string searchString)
{
List<int> ret = new List<int>();
int len = searchString.Length;
int start = -len;
while (true)
{
start = source.IndexOf(searchString, start + len);
if (start == -1)
{
break;
}
else
{
ret.Add(start);
}
}
return ret;
}
}
}

You could simplify your code a lot by using Regex pattern matching instead.
The following pattern: (?<=FCT=)[0-9]* will match any group of digits preceded by FCT=.
This enables us to do the following:
string input = "<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>...";
string pattern = "(?<=FCT=)[0-9]*";
var values = Regex.Matches(input, pattern).Cast<Match>().Select(x => x.Value);
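Applied to several log lines, the same idea might look like the sketch below. Two changes are my own assumptions rather than part of the answer: the match is case-insensitive (the question searches for "fct=" while the sample data uses "FCT="), and the quantifier is + instead of * so that an "FCT=" followed by a non-digit does not produce an empty match. The class name FctExtractor is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FctExtractor
{
    // One or more digits that directly follow "FCT=", in any letter case.
    private static readonly Regex FctValue =
        new Regex(@"(?<=FCT=)[0-9]+", RegexOptions.IgnoreCase);

    // Collects every FCT value from every line, in reading order.
    public static List<string> Extract(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => FctValue.Matches(line).Cast<Match>())
            .Select(m => m.Value)
            .ToList();
    }
}
```

For files, the lines could come from System.IO.File.ReadLines(fileName), which reads lazily instead of loading the whole file.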

I have tested this solution with your data, and it gives me the expected results (10019,666 and 4515)
string data = @"<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>
<dat>XN=KEY,CN=RTU FCT=4515</dat></logurl>
<dat>XN=KEY,CN=RT</dat></logurl>";
char[] delimiters = { '<', ',', '&', ':', ' ', '\\', '\'' };
Regex regex = new Regex("fct=(.+)", RegexOptions.IgnoreCase);
var values = data.Split(delimiters).Select(x => regex.Match(x).Groups[1].Value);
values = values.Where(x => !string.IsNullOrWhiteSpace(x));
values.ToList().ForEach(Console.WriteLine);
I hope my solution will be helpful, let me know.

The code below is useful for extracting repeated words from a text with LINQ:
string text = "Hi Naresh, How are you. You will be next Super man";
IEnumerable<string> strings = text.Split(' ').ToList();
var result = strings.AsEnumerable().Select(x => new {str = Regex.Replace(x.ToLowerInvariant(), @"[^0-9a-zA-Z]+", ""), count = Regex.Matches(text.ToLowerInvariant(), @"\b" + Regex.Escape(Regex.Replace(x.ToLowerInvariant(), @"[^0-9a-zA-Z]+", "")) + @"\b").Count}).Where(x=>x.count>1).GroupBy(x => x.str).Select(x => x.First());
foreach(var item in result)
{
Console.WriteLine(item.str +" = "+item.count.ToString());
}

As always, break the problem down into smaller bits. See if the following methods help in any way. Tying them into your code is left as an exercise.
private const string Prefix = "fct=";
//make delimiter look up fast
private static HashSet<char> endDelimiters =
new HashSet<char>(new [] { '<', ',', '&', ':', ' ', '\\', '\'' });
private static string[] GetAllFctFields(string line) =>
    line.Split(new string[] { Prefix }, StringSplitOptions.None);
private static bool TryGetValue(string delimitedString, out string value)
{
var buffer = new StringBuilder(delimitedString.Length);
foreach (var c in delimitedString)
{
if (endDelimiters.Contains(c))
break;
buffer.Append(c);
}
//I'm assuming that no end delimiter is a format error.
//Modify according to requirements
if (buffer.Length == delimitedString.Length)
{
value = null;
return false;
}
value = buffer.ToString();
return true;
}

Something like :
class Program
{
static void Main(string[] args)
{
char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
var fct = "fct=";
var lineRead = "fct=value1,useless text fct=vfct=alue2,fct=value3";
var values = new List<string>();
int start = lineRead.IndexOf(fct);
while(start != -1)
{
start += fct.Length;
int end = lineRead.IndexOfAny(enddelimiter, start);
if (end == -1)
end = lineRead.Length;
string result = lineRead.Substring(start, end - start);
values.Add(result);
start = lineRead.IndexOf(fct, end);
}
values.ForEach(Console.WriteLine);
}
}

You can split the line by string[]
char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
while ((lineRead = sr.ReadLine()) != null)
{
string[] parts1 = lineRead.Split(new string[] { "fct=" },StringSplitOptions.None);
if(parts1.Length > 0)
{
foreach(string _ar in parts1)
{
if(!string.IsNullOrEmpty(_ar))
{
if(_ar.IndexOfAny(enddelimiter) > 0)
{
MessageBox.Show(_ar.Substring(0, _ar.IndexOfAny(enddelimiter)));
}
else
{
MessageBox.Show(_ar);
}
}
}
}
}

manipulating strings

I am trying to remove some special characters from a string.
I have the following string:
[_fesd][009] Statement
and I want to get rid of every '_', '[' and ']'.
I managed to remove the leading characters with TrimStart, which gives fesd][009] Statement
How should I remove the special characters from the middle of my string?
Currently I'm using the following code:
string newStr = str.Trim(new Char[] { '[', ']', '_' });
where str is the string to be manipulated and the result should be stored in newStr.

string newStr = str.Replace("[", "").Replace("]", "").Replace("_", "");

var newStr = Regex.Replace("[_fesd][009] Statement", "(\\[)|(\\])|(_)", string.Empty);

Use string.Replace with string.Empty as the string to replace with.
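To make the difference concrete, here is a small sketch that puts the two calls side by side. Trim only strips the listed characters from the two ends of the string, which is why the brackets in the middle survive; Replace removes every occurrence. The class and method names are invented for the comparison.

```csharp
using System;

public static class TrimVersusReplace
{
    // Trim only strips the listed characters from both ends of the string.
    public static string WithTrim(string s)
    {
        return s.Trim('[', ']', '_');
    }

    // Replace removes every occurrence, wherever it appears.
    public static string WithReplace(string s)
    {
        return s.Replace("[", string.Empty)
                .Replace("]", string.Empty)
                .Replace("_", string.Empty);
    }
}
```

For the string in the question, WithTrim returns "fesd][009] Statement" and WithReplace returns "fesd009 Statement".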

You could use Linq for it:
static void Main(string[] args)
{
var s = @"[_fesd][009] Statement";
var unwanted = @"_[]";
var sanitizedS = s
.Where(i => !unwanted.Contains(i))
.Aggregate<char, string>("", (a, b) => a + b);
Console.WriteLine(sanitizedS);
// output: fesd009 Statement
}

var chars = new Char[] { '[', ']', '_' };
var newValue = new String(str.Where(x => !chars.Contains(x)).ToArray());

How to convert Turkish chars to English chars in a string?

string strTurkish = "ÜST";
How can I make the value of strTurkish "UST"?

var text = "ÜST";
var unaccentedText = String.Join("", text.Normalize(NormalizationForm.FormD)
.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));

You can use the following method for solving your problem. The other methods do not convert "Turkish Lowercase I (\u0131)" correctly.
public static string RemoveDiacritics(string text)
{
Encoding srcEncoding = Encoding.UTF8;
Encoding destEncoding = Encoding.GetEncoding(1252); // Latin alphabet
text = destEncoding.GetString(Encoding.Convert(srcEncoding, destEncoding, srcEncoding.GetBytes(text)));
string normalizedString = text.Normalize(NormalizationForm.FormD);
StringBuilder result = new StringBuilder();
for (int i = 0; i < normalizedString.Length; i++)
{
if (!CharUnicodeInfo.GetUnicodeCategory(normalizedString[i]).Equals(UnicodeCategory.NonSpacingMark))
{
result.Append(normalizedString[i]);
}
}
return result.ToString();
}

I'm not an expert on this sort of thing, but I think you can use string.Normalize to do it, by decomposing the value and then effectively removing any non-ASCII characters:
using System;
using System.Linq;
using System.Text;
class Test
{
static void Main()
{
string text = "\u00DCST";
string normalized = text.Normalize(NormalizationForm.FormD);
string asciiOnly = new string(normalized.Where(c => c < 128).ToArray());
Console.WriteLine(asciiOnly);
}
}
It's entirely possible that this does horrible things in some cases though.
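One such case is the Turkish dotless i (U+0131): it has no canonical decomposition, so FormD leaves it unchanged and a "c < 128" filter silently drops it. A possible workaround, sketched below, is to map that one letter by hand and remove only the combining marks instead of everything outside ASCII. The names TurkishText and StripDiacritics are made up for this example, and I have only reasoned about Turkish input, not other scripts.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class TurkishText
{
    public static string StripDiacritics(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        // Dotless i (U+0131) has no canonical decomposition, so map it by hand first.
        string decomposed = text.Replace('\u0131', 'i')
                                .Normalize(NormalizationForm.FormD);

        var result = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Drop the combining marks that FormD split off from the base letters.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                result.Append(c);
        }

        // Recompose anything that is still in decomposed form.
        return result.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Unlike the ASCII filter, this keeps characters that are not accented Latin letters instead of deleting them.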

public string TurkishCharacterToEnglish(string text)
{
char[] turkishChars = {'ı', 'ğ', 'İ', 'Ğ', 'ç', 'Ç', 'ş', 'Ş', 'ö', 'Ö', 'ü', 'Ü'};
char[] englishChars = {'i', 'g', 'I', 'G', 'c', 'C', 's', 'S', 'o', 'O', 'u', 'U'};
// Match chars
for (int i = 0; i < turkishChars.Length; i++)
text = text.Replace(turkishChars[i], englishChars[i]);
return text;
}

This is not a problem that requires a general solution. There are only 12 special characters in the Turkish alphabet that have to be normalized: ı,İ,ö,Ö,ç,Ç,ü,Ü,ğ,Ğ,ş,Ş. You can write 12 rules to replace them with their English counterparts: i,I,o,O,c,C,u,U,g,G,s,S.

Public Function Ceng(ByVal _String As String) As String
Dim Source As String = "ığüşöçĞÜŞİÖÇ"
Dim Destination As String = "igusocGUSIOC"
For i As Integer = 0 To Source.Length - 1
_String = _String.Replace(Source(i), Destination(i))
Next
Return _String
End Function

public static string TurkishChrToEnglishChr(this string text)
{
if (string.IsNullOrEmpty(text)) return text;
Dictionary<char, char> TurkishChToEnglishChDic = new Dictionary<char, char>()
{
{'ç','c'},
{'Ç','C'},
{'ğ','g'},
{'Ğ','G'},
{'ı','i'},
{'İ','I'},
{'ş','s'},
{'Ş','S'},
{'ö','o'},
{'Ö','O'},
{'ü','u'},
{'Ü','U'}
};
return text.Aggregate(new StringBuilder(), (sb, chr) =>
{
if (TurkishChToEnglishChDic.ContainsKey(chr))
sb.Append(TurkishChToEnglishChDic[chr]);
else
sb.Append(chr);
return sb;
}).ToString();
}

Removing Specified Punctuation From Strings

I have a String that I need to convert into a String[] containing each word in the string. However, I do not want any white space or punctuation EXCEPT hyphens and apostrophes that belong to a word.
Example Input:
Hello! This is a test and it's a short-er 1. - [ ] { } ___)
Example of the Array made from Input:
[ "Hello", "this", "is", "a", "test", "and", "it's", "a", "short-er", "1" ]
Currently this is the code I have tried
(Note: the 2nd gives an error later in the program when string.First() is called):
private string[] ConvertWordsFromFile(String NewFileText)
{
char[] delimiterChars = { ' ', ',', '.', ':', '/', '|', '<', '>', '/', '@', '#', '$', '%', '^', '&', '*', '"', '(', ')', ';' };
string[] words = NewFileText.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
return words;
}
or
private string[] ConvertWordsFromFile(String NewFileText)
{
return Regex.Split(NewFileText, @"\W+");
}
The second example crashes with the following code
private string GroupWordsByFirstLetter(List<String> words)
{
var groups =
from w in words
group w by w.First();
return FormatGroupsByAlphabet(groups);
}
specifically, when w.First() is called.

To remove unwanted characters from a String
string randomString = "thi$ is h#ving s*me inva!id ch#rs";
string excpList ="$#*!";
LINQ Option 1
var chRemoved = randomString
.Select(ch => excpList.Contains(ch) ? (char?)null : ch);
var Result = string.Concat(chRemoved.ToArray());
LINQ Option 2
var Result = randomString.Split()
    // Where keeps repeated letters; Except would also drop duplicates within a word
    .Select(x => x.Where(ch => !excpList.Contains(ch)))
    .Select(c => new string(c.ToArray()))
    .ToArray();

Here is a little something I worked up. Splits on \n and removes any unwanted characters.
private string ValidChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'-";
private IEnumerable<string> SplitRemoveInvalid(string input)
{
string tmp = "";
foreach(char c in input)
{
if(c == '\n')
{
if(!String.IsNullOrEmpty(tmp))
{
yield return tmp;
tmp = "";
}
continue;
}
if(ValidChars.Contains(c))
{
tmp += c; // append the current character
}
}
if (!String.IsNullOrEmpty(tmp)) yield return tmp;
}
Usage could be something like this:
string[] array = SplitRemoveInvalid("Hello! This is a test and it's a short-er 1. - [ ] { } _)")
.ToArray();
I didn't actually test it, but it should work. If it doesn't, it should be easy enough to fix.

Use string.Split(char [])
string strings = "4,6,8\n9,4";
string [] split = strings .Split(new Char [] {',' , '\n' });
OR
Try below if you get any unwanted empty items. String.Split Method (String[], StringSplitOptions)
string [] split = strings .Split(new Char [] {',' , '\n' },
StringSplitOptions.RemoveEmptyEntries);

This can be done quite easily with a RegEx, by matching words. I am using the following RegEx, which will allow hyphens and apostrophes in the middle of words, but will strip them out if they occur at a word boundary.
\w(?:[\w'-]*\w)?
In C# it could look like this:
private string[] ConvertWordsFromFile(String NewFileText)
{
return (from Match m in new Regex(@"\w(?:[\w'-]*\w)?").Matches(NewFileText)
        select m.Value).ToArray();
}
I am using LINQ to get an array of words from the MatchCollection returned by Matches.
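A likely cause of the w.First() crash in the question is that Regex.Split with \W+ returns an empty string whenever the text starts or ends with non-word characters, and First() on an empty string throws InvalidOperationException. Matching words instead of splitting never yields empty entries. The sketch below is a variation, not the exact regex above: it swaps \w for an explicit [A-Za-z0-9] class, because \w also matches the underscore and would keep "___" from the sample input as a word. The names WordSplitter, Split and GroupByFirstLetter are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordSplitter
{
    // A word starts and ends with a letter or digit; hyphens and apostrophes
    // are allowed only in between.
    private static readonly Regex Word =
        new Regex(@"[A-Za-z0-9](?:[A-Za-z0-9'-]*[A-Za-z0-9])?");

    public static string[] Split(string text)
    {
        return Word.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    // Groups words by their upper-cased first character.
    // Indexing w[0] is safe here as long as the words come from Split,
    // which never returns an empty string.
    public static Dictionary<char, List<string>> GroupByFirstLetter(IEnumerable<string> words)
    {
        return words
            .GroupBy(w => char.ToUpperInvariant(w[0]))
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

Restricting the class to ASCII letters and digits is an assumption; for accented words, \p{L} and \p{Nd} would be the Unicode-aware equivalents.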

Using LINQ to parse the numbers from a string

Is it possible to write a query that returns all the characters of a given string that can be parsed into an int?
For example, we have a string like: "$%^DDFG 6 7 23 1"
The result must be "67231"
And slightly harder: can we get only the first three numbers?

This will give you your string
string result = new String("y0urstr1ngW1thNumb3rs".
Where(x => Char.IsDigit(x)).ToArray());
And for the first 3 chars use .Take(3) before ToArray()

The following should work.
var myString = "$%^DDFG 6 7 23 1";
//note that this is still an IEnumerable object and will need
// conversion to int, or whatever type you want.
var myNumber = myString.Where(a=>char.IsNumber(a)).Take(3);
It's not clear if you want 23 to be considered a single number sequence, or 2 distinct numbers. My solution above assumes you want the final result to be 672
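The two readings can be sketched side by side. This is only an illustration with invented names (NumberParsing, FirstDigits, FirstNumbers); the second method assumes each run of digits is short enough to fit in an int.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberParsing
{
    // Reading 1: every digit is its own item, so "23" counts as 2 and 3.
    public static string FirstDigits(string input, int count)
    {
        return new string(input.Where(c => char.IsDigit(c)).Take(count).ToArray());
    }

    // Reading 2: a run of consecutive digits is one number, so "23" counts once.
    public static int[] FirstNumbers(string input, int count)
    {
        return Regex.Matches(input, "[0-9]+")
            .Cast<Match>()
            .Take(count)
            .Select(m => int.Parse(m.Value))
            .ToArray();
    }
}
```

For "$%^DDFG 6 7 23 1" and a count of 3, FirstDigits returns "672" while FirstNumbers returns 6, 7 and 23.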

public static string DigitsOnly(string strRawData)
{
return Regex.Replace(strRawData, "[^0-9]", "");
}

string testString = "$%^DDFG 6 7 23 1";
string cleaned = new string(testString.ToCharArray()
.Where(c => char.IsNumber(c)).Take(3).ToArray());
If you want to use a white list (not always numbers):
char[] acceptedChars = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
string cleaned = new string(testString.ToCharArray()
.Where(c => acceptedChars.Contains(c)).Take(3).ToArray());

How about something like this?
var yourstring = "$%^DDFG 6 7 23 1";
var selected = yourstring.ToCharArray().Where(c=> c >= '0' && c <= '9').Take(3);
var reduced = yourstring.Where(char.IsDigit).Take(3);

Regex:
private int ParseInput(string input)
{
System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(@"\d+");
string valueString = string.Empty;
foreach (System.Text.RegularExpressions.Match match in r.Matches(input))
valueString += match.Value;
return Convert.ToInt32(valueString);
}
"And even slight harder: Can we get only first three numbers?"
private static int ParseInput(string input, int take)
{
System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(@"\d+");
string valueString = string.Empty;
foreach (System.Text.RegularExpressions.Match match in r.Matches(input))
valueString += match.Value;
valueString = valueString.Substring(0, Math.Min(valueString.Length, take));
return Convert.ToInt32(valueString);
}

string strRawData = "12#$%33fgrt$%$5";
string[] arr = Regex.Split(strRawData, "[^0-9]");
int a1 = 0;
foreach (string value in arr)
{
    Console.WriteLine("line no." + a1 + " =" + value);
    a1++;
}
Output:
line no.0 =12
line no.1 =
line no.2 =
line no.3 =33
line no.4 =
line no.5 =
line no.6 =
line no.7 =
line no.8 =
line no.9 =
line no.10 =5
