Remove keywords and operators from string expression - c#

I need your help.
I have a string like
a1 + b1 + ( v1 + g1 ) * 10
I need to retrieve only a1,b1,v1,g1
Any ideas?

I would use a regex to filter out the desired output.
Assuming each variable is always a lowercase letter followed by a digit, the pattern [a-z][0-9] will match it:
string input = "a1 + b1 + ( v1 + g1 ) * 10";
List<string> Result = Regex.Matches(input, @"[a-z][0-9]")
.Cast<Match>()
.Select(x => x.Value)
.ToList();
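If the variable names can be longer than two characters, the same idea might be sketched as below. The class name ExpressionVariables and the identifier rule (a letter or underscore followed by word characters) are assumptions for illustration, not something stated in the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExpressionVariables
{
    // Assumption: a variable starts with a letter or underscore and
    // continues with letters, digits, or underscores. Pure numbers
    // such as "10" do not match because they start with a digit.
    private static readonly Regex Identifier = new Regex(@"\b[A-Za-z_]\w*\b");

    // Returns every identifier-like token, in the order it appears.
    public static List<string> Extract(string expression)
    {
        return Identifier.Matches(expression)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

Calling ExpressionVariables.Extract("a1 + b1 + ( v1 + g1 ) * 10") yields a1, b1, v1 and g1. Keywords of the expression language, if any, would still need to be filtered out separately.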

I'd go after it using a recursive descent parser. It may seem like overkill to start with, but it will work for all sorts of expressions.
Here's a good introduction to the theory.

Related

LINQ conditional selection and formatting

I have a string of characters and I'm trying to set up a query that'll substitute a specific sequence of similar characters into a character count. Here's an example of what I'm trying to do:
agd69dnbd555bdggjykbcx555555bbb
In this case, I'm trying to isolate and count ONLY the occurrences of the number 5, so my output should read:
agd69dnbd3bdggjykbcx6bbb
My current code is the following, where GroupAdjacentBy is a function that groups and counts the character occurrences as above.
var res = text
.GroupAdjacentBy((l, r) => l == r)
.Select(x => new { n = x.First(), c = x.Count()})
.ToArray();
The problem is that the above function groups and counts EVERY SINGLE character in my string, not just the one character I'm after. Is there a way to conditionally perform that operation on ONLY the character I need counted?
Regex is a better tool for this job than LINQ.
Have a look at this:
string input = "agd69dnbd555bdggjykbcx555555bbb";
string pattern = @"5+"; // Match 1 or more adjacent 5 characters
string output = Regex.Replace(input, pattern, match => match.Length.ToString());
// output = agd69dnbd3bdggjykbcx6bbb
I'm not sure whether you intend to replace every 5 character, or only runs of more than one adjacent 5.
If it's the latter, just change your pattern to:
string pattern = @"5{2,}"; // Match 2 or more adjacent 5's
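Both variants can be folded into one helper. The following is a minimal sketch; the names RunCounter, CollapseRuns and minRun are hypothetical and only serve the illustration.

```csharp
using System.Text.RegularExpressions;

public static class RunCounter
{
    // Replaces each run of at least minRun adjacent copies of target
    // with the length of that run. Shorter runs are left untouched.
    public static string CollapseRuns(string input, char target, int minRun = 1)
    {
        // Regex.Escape guards against targets that are regex metacharacters.
        string pattern = Regex.Escape(target.ToString()) + "{" + minRun + ",}";
        return Regex.Replace(input, pattern, m => m.Length.ToString());
    }
}
```

With the question's input, CollapseRuns("agd69dnbd555bdggjykbcx555555bbb", '5') gives agd69dnbd3bdggjykbcx6bbb. Passing minRun: 2 leaves a lone 5 as it is.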
The best answer was already given by Johnathan Barclay. But in case you need something similar using LINQ, here is an alternative solution:
var charToCombine = '5';
var res = text
.GroupAdjacentBy((l, r) => l == r )
.SelectMany(x => x.Count() > 1 && x.First() == charToCombine ? x.Count().ToString() : x)
.ToArray();

Why is one character missing in the query result?

Take a look at the code:
string expression = "x & ~y -> (s + t) & z";
var exprCharsNoWhitespace = expression.Except( new[]{' ', '\t'} ).ToList();
var exprCharsNoWhitespace_2 = expression.Replace( " ", "" ).Replace( "\t", "" ).ToList();
// output for examination
Console.WriteLine( exprCharsNoWhitespace.Aggregate( "", (a,x) => a+x ) );
Console.WriteLine( exprCharsNoWhitespace_2.Aggregate( "", (a,x) => a+x ) );
// Output:
// x&~y->(s+t)z
// x&~y->(s+t)&z
I want to remove all whitespace from the original string and then get the individual characters.
The result surprised me.
The variable exprCharsNoWhitespace contains no whitespace, as expected, but it also does not contain all of the other characters. The last occurrence of '&' is missing, and the Count of the list is 12.
Whereas exprCharsNoWhitespace_2 is completely as expected: Count is 13, all characters other than whitespace are contained.
The framework used was .NET 4.0.
I also just pasted this to csharppad (web-based IDE/compiler) and got the same results.
Why does this happen?
EDIT:
All right, I was unaware that Except is, as pointed out by Ryan O'Hara, a set operation. I hadn't used it before.
// So I'll continue just using something like this:
expression.Where( c => c!=' ' && c!='\t' )
// or for more characters this can be shorter:
expression.Where( c => ! new[]{'a', 'b', 'c', 'd'}.Contains(c) )
Except produces a set difference, so its result contains each distinct element at most once. Your expression isn’t a set, so it’s not the right method to use. As to why the & specifically is missing: it’s because it’s repeated. None of the other non-whitespace characters is.
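The difference between the set operation and a plain filter can be seen side by side in a small sketch. The class and method names here are made up for the comparison.

```csharp
using System.Linq;

public static class ExceptDemo
{
    // Set difference: drops the excluded characters AND any repeated
    // occurrence of the characters that remain.
    public static string WithExcept(string text, params char[] excluded)
    {
        return new string(text.Except(excluded).ToArray());
    }

    // Filter: drops the excluded characters and keeps every other
    // occurrence, duplicates included.
    public static string WithWhere(string text, params char[] excluded)
    {
        return new string(text.Where(c => !excluded.Contains(c)).ToArray());
    }
}
```

For "x & ~y -> (s + t) & z" with ' ' and '\t' excluded, WithExcept loses the second & while WithWhere keeps it, which matches the two outputs shown in the question.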
Ryan already answered your question as asked, but I'd like to provide you an alternative solution to the problem you are facing. If you need to do a lot of string manipulation, you may find regular expression pattern matching to be helpful. The examples you've given would work something like this:
string expression = "x & ~y -> (s + t) & z";
string pattern = @"\s";
string replacement = "";
string noWhitespace = new Regex(pattern).Replace(expression, replacement);
Or for the second example, keep everything the same except the pattern:
string pattern = "[abcd]";
Keep the Regex object stored somewhere rather than creating it each time if you're going to use the same pattern a lot.
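One way to keep the Regex around is a static readonly field. This is a sketch only: the class name WhitespaceStripper is hypothetical, and RegexOptions.Compiled is an optional choice that trades a slower start-up for faster repeated matching.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceStripper
{
    // Constructed once and reused for every call. Regex instances are
    // immutable and safe to share between threads.
    private static readonly Regex Whitespace =
        new Regex(@"\s", RegexOptions.Compiled);

    // Removes every whitespace character from the expression.
    public static string Strip(string expression)
    {
        return Whitespace.Replace(expression, "");
    }
}
```

WhitespaceStripper.Strip("x & ~y -> (s + t) & z") returns x&~y->(s+t)&z.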
As already mentioned .Except(...) is a set operation so it drops duplicates.
Try just using .Where(...) instead:
string expression = "x & ~y -> (s + t) & z";
var exprCharsNoWhitespace =
String.Join(
"",
expression.Where(c => !new[] { ' ', '\t' }.Contains(c)));
This gives:
x&~y->(s+t)&z

Custom string variable in regex

I want to find two or more variables in a string with Regex. For instance, I have a string like this: "Result = Num + 2 ( 6 * Count )". I want to find out whether "Result", "Num" and "Count" are in this string or not. Suppose that I want to build a small compiler, these strings are my reserved words, and I want to use regex for these checks.
Case sensitivity is important to me. For example, if the client inputs "num" or "count" in a string, the method must return false.
How can I do it in C#?
Updated to use an arbitrary word collection:
var words = new [] {
"Result",
"Num",
"Count"
};
var source = "Result = Num + 2 ( 6 * Count)";
var regex = new Regex(string.Format(@"\b(?<words>(?-i){0})\b", string.Join("|", words)));
var results = (
from m in regex.Matches(source).OfType<Match>()
select m.Groups["words"].Value
).ToArray();
results will be an array of the matching words.
However, if, as you state in your comment on another answer, you are building a small compiler, you would be better off building a tokenising engine. For example, see Build a Better Tokeniser.
This is probably one of the first things in a Regex tutorial.
string expression = "Result = Num + 2 ( 6 * Count )";
foreach (Match match in Regex.Matches(expression, "[a-zA-Z]+")) {
Console.WriteLine(match.Value);
}
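Neither snippet returns the true/false answer the question asks for. A minimal sketch of such a check could look like this, assuming that "found" means every reserved word occurs as a whole word with exact casing. The names ReservedWords and ContainsAll are invented for the example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ReservedWords
{
    // True only if every word occurs in source as a whole word.
    // Regex matching is case-sensitive by default, so "num" does not
    // satisfy a search for "Num".
    public static bool ContainsAll(string source, IEnumerable<string> words)
    {
        return words.All(w =>
            Regex.IsMatch(source, @"\b" + Regex.Escape(w) + @"\b"));
    }
}
```

With the words Result, Num and Count, the string "Result = Num + 2 ( 6 * Count )" passes, while "Result = num + 2 ( 6 * count )" does not.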

How to trim characters from certain patterned words in string?

Given the following string:
string s = "I need drop the 1 from the end of AAAAAAAA1 and BBBBBBBB1";
How can I trim the "1" from any 8 character string that ends in a 1? I got as far as finding a working Regex pattern that matches these strings, and I'm guessing I could use TrimEnd to remove the "1", but how do I modify the string itself?
Regex regex = new Regex("\\w{8}1");
foreach (Match match in regex.Matches(s))
{
MessageBox.Show(match.Value.TrimEnd('1'));
}
The result I'm looking for would be "I need drop the 1 from the end of AAAAAAAA and BBBBBBBB"
Regex.Replace is the tool for the job. Strings are immutable, so assign the result back to s:
var regex = new Regex("\\b(\\w{8})1\\b");
s = regex.Replace(s, "$1");
I slightly modified the regular expression to match the description of what you are trying to do more closely.
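Wrapped in a helper, the replacement might look like the sketch below. The class name TrailingOneTrimmer is hypothetical. The word boundaries mean that only words of exactly eight word characters plus a trailing 1 are changed.

```csharp
using System.Text.RegularExpressions;

public static class TrailingOneTrimmer
{
    // Group 1 captures the eight characters in front of the final "1".
    private static readonly Regex EightCharsThenOne =
        new Regex(@"\b(\w{8})1\b");

    // Returns a new string; the input string itself is never modified.
    public static string Trim(string s)
    {
        return EightCharsThenOne.Replace(s, "$1");
    }
}
```

The standalone "1" in the sample sentence is left alone because nothing precedes it inside the same word, and a ten-character word such as AAAAAAAAA1 is also untouched.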
Here is a non-regex approach:
s = string.Join(" ", s.Split().Select(w => w.Length == 9 && w.EndsWith("1") ? w.Substring(0, 8) : w));
In VB with LINQ:
Dim l = 8
Dim s = "I need drop the 1 from the end of AAAAAAAA1 and BBBBBBBB1"
Dim d = s.Split(" ").Aggregate(Function(p1, p2) p1 & " " & If(p2.Length = l + 1 And p2.EndsWith("1"), p2.Substring(0, p2.Length - 1), p2))
Try this:
s = s.Replace(match.Value, match.Value.TrimEnd('1'));
And the s string will have the value you want.

LINQ to count occurrences

I have the following query which works great:
string[] Words = {"search","query","example"};
... Snip ...
var Results = (
from a in q
from w in Words
where
(
a.Title.ToLower().Contains(w)
|| a.Body.ToLower().Contains(w)
)
select new
{
a,
Count = 0
}).OrderByDescending(x=> x.Count)
.Distinct()
.Take(Settings.ArticlesPerPage);
What I need it to do, is return Count which is the total occurrences of the words. I'm going to weight it in favour of the title as well, example:
Count = (OccuranceInTitle * 5) + (OccurancesInBody)
I'm assuming I need to use the Linq.Count but I'm not sure how to apply it in this instance.
This is what I came up with:
var query =
from a in q
from w in Words
let title = a.Title.ToLower()
let body = a.Body.ToLower()
let replTitle = Regex.Replace(title, string.Format("\\b{0}\\b", w), string.Empty)
let replBody = Regex.Replace(body, string.Format("\\b{0}\\b", w), string.Empty)
let titleOccurences = (title.Length - replTitle.Length) / w.Length
let bodyOccurences = (body.Length - replBody.Length) / w.Length
let score = titleOccurences * 5 + bodyOccurences
where score > 0
select new { Article = a, Score = score };
var results = query.GroupBy(r => r.Article)
.OrderByDescending(g => g.Sum(r => r.Score))
.Take(Settings.ArticlesPerPage);
Counting occurrences is done with the (surprisingly) quick and dirty method of replacing occurrences with string.Empty and calculating the count from the difference in string length. After the scores for each article and each word are calculated, I'm grouping by article, ordering by the sum of scores for all the words, and taking a chunk out of the results.
I didn't fire up the compiler, so please excuse any obvious mistakes.
Update: This version uses regexes as in
Regex.Replace(title, string.Format("\\b{0}\\b", w), string.Empty)
instead of the original version's
title.Replace(w, string.Empty)
so that it now matches only whole words (the string.Replace version would also match word fragments).
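As an alternative to the replace-and-measure trick, the matches can be counted directly. This is a sketch under a few assumptions: the helper names are invented, Regex.Escape is added in case a search word contains regex metacharacters, and RegexOptions.IgnoreCase stands in for the ToLower calls. It is meant for in-memory strings and may not translate if q is a database-backed query.

```csharp
using System.Text.RegularExpressions;

public static class WordScore
{
    // Counts whole-word, case-insensitive occurrences of word in text.
    public static int CountWholeWord(string text, string word)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }

    // Title hits are weighted five times as much as body hits,
    // following the formula in the question.
    public static int Score(string title, string body, string word)
    {
        return CountWholeWord(title, word) * 5 + CountWholeWord(body, word);
    }
}
```

For the title "Search query" and the body "A search for search terms, not researching", the word "search" scores 1 * 5 + 2 = 7, because "researching" is not a whole-word match.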
