Remove substring if number exists before keyword

I have strings of the form:
5 dogs = 1 medium size house
4 cats = 2 small houses
one bird = 1 bird cage
What I am trying to do is remove the substring before the equals sign (together with the equals sign itself), but only if that substring contains a keyword and the data before the keyword is an integer.
So in this example my keywords are:
dogs,
cats,
bird
In the above example, the ideal output of my process would be:
1 medium size house
2 small houses
one bird = 1 bird cage
My code so far looks like this (I am hard-coding the keyword values/strings for now):
var originalString = "5 dogs = 1 medium size house";
int equalsIndex = originalString.IndexOf('=');
if (equalsIndex < 0)
    return originalString; // no '=' in the string, nothing to remove
var prefix = originalString.Substring(0, equalsIndex);
if (prefix.Contains("dogs"))
{
    // Remove the prefix and the '=' sign, then the leading space
    var modifiedString = originalString.Remove(0, equalsIndex + 1).TrimStart();
    return modifiedString;
}
return originalString;
The issue here is that I am removing the whole substring regardless of whether or not the data preceding the keyword is a number.
Would somebody be able to help me with this additional logic?
Thanks so much as always for anybody who takes a few minutes to read this question.
Mick

You can do it with a simple regex of the form
\d+\s+(?:kw1|kw2|kw3|...)\s*=\s*
where kwX is the corresponding keyword.
using System;
using System.Text.RegularExpressions;
var data = new[] {
    "5 dogs = 1 medium size house",
    "4 cats = 2 small houses",
    "one bird = 1 bird cage"
};
var keywords = new[] {"dogs", "cats", "bird"};
var regexStr = string.Format(@"\d+\s+(?:{0})\s*=\s*", string.Join("|", keywords));
var regex = new Regex(regexStr);
foreach (var s in data) {
    Console.WriteLine("'{0}'", regex.Replace(s, string.Empty));
}
In the example above, the call to string.Format pastes the list of keywords, joined by |, into the "template" expression shown at the top of the post, producing:
\d+\s+(?:dogs|cats|bird)\s*=\s*
This expression matches
One or more digits \d+, followed by
One or more spaces \s+, followed by
A keyword from the list: dogs, cats, bird (?:dogs|cats|bird), followed by
Zero or more spaces \s*, followed by
An equals sign =, followed by
Zero or more spaces \s*
The rest is easy: since this regex matches the part that you wish to remove, you need to call Replace and pass it string.Empty.
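As a sketch of how the same approach might be wrapped into a reusable method: the version below assumes the number must start the string (hence an extra ^ anchor, which the pattern above does not have) and passes each keyword through Regex.Escape in case a keyword ever contains a regex metacharacter such as . or +. The names PrefixStripper and StripNumericPrefix are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; not part of the original answer.
public static class PrefixStripper
{
    // Removes "<integer> <keyword> = " from the start of the input, if present.
    public static string StripNumericPrefix(string input, string[] keywords)
    {
        // Escape each keyword so characters such as '.' or '+' match literally.
        string alternation = string.Join("|", keywords.Select(k => Regex.Escape(k)));

        // Same shape as \d+\s+(?:kw1|kw2|...)\s*=\s* but anchored with ^.
        var regex = new Regex(@"^\d+\s+(?:" + alternation + @")\s*=\s*");

        // Regex.Replace returns the input unchanged when nothing matches.
        return regex.Replace(input, string.Empty);
    }
}
```

Lines that do not match, such as one bird = 1 bird cage, come back unchanged because Replace simply returns the input when there is no match.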

You can use regex (System.Text.RegularExpressions) to identify whether or not there is a number in the string.
Regex r = new Regex("[0-9]"); // Look for any single digit from 0 to 9
bool hasNumber = r.IsMatch(prefix);
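This regex simply searches for any digit in the string. If you want to search for a digit followed by a space and a letter, you could use [0-9] [a-zA-Z]; the character class accepts both upper- and lower-case letters. Note that writing it as [0-9] [a-z]|[A-Z] would not work as intended: | has the lowest precedence, so that pattern means "[0-9] [a-z]" or "[A-Z]" on its own.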
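A small check that illustrates the precedence issue with | (the class AlternationDemo is purely illustrative): a top-level | splits the whole pattern into two alternatives, so [0-9] [a-z]|[A-Z] accepts any string that merely contains a capital letter, whereas a single character class keeps the digit-and-space part mandatory.

```csharp
using System.Text.RegularExpressions;

// Illustrative only: compares two ways of writing "digit, space, letter".
public static class AlternationDemo
{
    // Top-level | splits the WHOLE pattern: "[0-9] [a-z]" OR "[A-Z]".
    public const string Unscoped = "[0-9] [a-z]|[A-Z]";

    // One character class, so the digit and the space are always required.
    public const string Scoped = "[0-9] [a-zA-Z]";

    public static bool Matches(string input, string pattern) =>
        Regex.IsMatch(input, pattern);
}
```

For example, Matches("Cat", AlternationDemo.Unscoped) is true because the lone [A-Z] alternative matches "C", while the scoped pattern needs something like "5 Dogs".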

You can try something like this:
int i;
if(int.TryParse(prefix.Substring(0, 1), out i)) //try to get an int from first char of prefix
{
//remove prefix
}
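This only checks the first character of the prefix, however, so it cannot confirm that the whole token before the keyword is an integer (for example, "5x dogs" would also pass).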
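If you prefer to avoid regex, a variant of this idea can validate the whole token before the keyword instead of just its first character. This is a minimal sketch, assuming the part before = consists of exactly two whitespace-separated tokens (a number and a keyword); TokenPrefix and RemovePrefixIfNumeric are hypothetical names.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the original answer.
public static class TokenPrefix
{
    // Strips "<integer> <keyword> =" when both parts check out.
    public static string RemovePrefixIfNumeric(string line, string[] keywords)
    {
        int equalsIndex = line.IndexOf('=');
        if (equalsIndex < 0)
            return line; // no '=' at all, nothing to strip

        // Split the text before '=' on any whitespace, dropping empty entries.
        string[] tokens = line.Substring(0, equalsIndex)
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Expect exactly two tokens: an integer followed by a known keyword.
        bool isNumeric = tokens.Length == 2 && int.TryParse(tokens[0], out _);
        if (!isNumeric || !keywords.Contains(tokens[1]))
            return line;

        // Keep everything after '=' without the leading space.
        return line.Substring(equalsIndex + 1).TrimStart();
    }
}
```

int.TryParse accepts multi-digit values such as 15 and rejects tokens like one or 5x. It also accepts a leading sign such as -5, which may or may not be what you want.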

Related

Extract numbers if string format matches

I want to check if an input string follows a pattern and if it does extract information from it.
My pattern looks like this: Episode 000 (Season 00), where each 0 is a digit from 0-9. I want to check whether an input such as Episode 094 (Season 02) matches this pattern and, because it does, extract the two numbers so that I end up with two integer variables, 94 and 2:
using System;
using System.Text.RegularExpressions;
string latestFile = "Episode 094 (Season 02)";
if (!Regex.IsMatch(latestFile, @"^(Episode)\s[0-9][0-9][0-9]\s\((Season)\s[0-9][0-9]\)$"))
    return;
int Episode = Int32.Parse(Regex.Match(latestFile, @"\d+").Value);
int Season = Int32.Parse(Regex.Match(latestFile, @"\d+").Value);
The first part, where I check whether the overall string matches the pattern, works, but I think it can be improved. For the second part, where I actually extract the numbers, I'm stuck: what I posted above obviously doesn't work, because both calls simply return the first run of digits in the string (094). So if any of you could help me extract only the three digits after Episode and the two digits after Season, that would be great.

^Episode (\d{1,3}) \(Season (\d{1,2})\)$
This captures the two numbers (1 to 3 digits for the episode, 1 to 2 for the season) and returns each of them as a group.
You can go even further and name your groups:
^Episode (?<episode>\d{1,3}) \(Season (?<season>\d{1,2})\)$
and then access them by name.
Example of using named groups:
using System.Text.RegularExpressions;
string pattern = @"abc(?<firstGroup>\d{1,3})abc";
string input = "abc234abc";
Regex rgx = new Regex(pattern);
Match match = rgx.Match(input);
string result = match.Groups["firstGroup"].Value; //=> 234
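Applied to the Episode/Season question, the named-group pattern might be used as follows. This is a sketch under a few assumptions: the class and method names (EpisodeParser, TryParse) are hypothetical, the match is checked with Match.Success before reading the groups, and [0-9] is used instead of \d because in .NET \d also matches non-ASCII digits that int.Parse would reject.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper built on the named-group pattern above.
public static class EpisodeParser
{
    // [0-9] rather than \d: .NET's \d also matches non-ASCII decimal digits.
    private static readonly Regex Pattern =
        new Regex(@"^Episode (?<episode>[0-9]{1,3}) \(Season (?<season>[0-9]{1,2})\)$");

    public static bool TryParse(string input, out int episode, out int season)
    {
        episode = 0;
        season = 0;

        Match match = Pattern.Match(input);
        if (!match.Success)
            return false; // wrong shape: both outputs stay 0

        // int.Parse ignores leading zeros: "094" -> 94, "02" -> 2.
        episode = int.Parse(match.Groups["episode"].Value);
        season = int.Parse(match.Groups["season"].Value);
        return true;
    }
}
```

Checking Success first avoids calling int.Parse on an empty group value, which would throw a FormatException.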

In your regex ^(Episode)\s[0-9][0-9][0-9]\s\((Season)\s[0-9][0-9]\)$ you are capturing the words Episode and Season in capturing groups, but what you actually want to capture are the digits. You could move the capturing groups like this:
^Episode\s([0-9][0-9][0-9])\s\(Season\s([0-9][0-9])\)$
Matching 3 digits in this way [0-9][0-9][0-9] can be written as \d{3} and [0-9][0-9] as \d{2}.
That would look like ^Episode\s(\d{3})\s\(Season\s(\d{2})\)$
To match one or more digits you could use \d+.
\s matches any whitespace character, such as a space, tab or newline. You could use \s or a literal space.
Your regex could look like:
^Episode (\d{3}) \(Season (\d{2})\)$
using System;
using System.Text.RegularExpressions;
string latestFile = "Episode 094 (Season 02)";
GroupCollection groups = Regex.Match(latestFile, @"^Episode (\d{3}) \(Season (\d{2})\)$").Groups;
int Episode = Int32.Parse(groups[1].Value);
int Season = Int32.Parse(groups[2].Value);
Console.WriteLine(Episode);
Console.WriteLine(Season);
That would result in:
94
2

Checking syntax of strings - C#

I'm trying to find out how to analyze the syntax of a sentence in C#.
In my case I have a syntax which every sentence has to follow.
The syntax looks like this:
A 'B' is a 'C'.
Every sentence has to contain five words. The first word of my sentence has to be 'A', the third 'is' and the fourth 'a'.
Now I would like to check whether a test sentence matches my syntax.
Test sentence:
A Dog is no Cat.
In this example the test sentence would be wrong, because the fourth word is 'no' and not 'a', which is what it should be based on the syntax.
I read about LINQ, which lets me query for sentences that contain a specified set of words.
The code would look something like this:
using System;
using System.Linq;
//Notice the third sentence would have the correct syntax
string text = "A Dog is no Cat. My Dog is a Cat. A Dog is a Cat.";
//Splitting text into single sentences
string[] sentences = text.Split(new char[] { '.' });
//Defining the search terms
string[] wordsToMatch = { "A", "is" };
//Find sentences that contain all terms I'm looking for
var sentenceQuery = from sentence in sentences
                    let w = sentence.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                    where w.Distinct().Intersect(wordsToMatch).Count() == wordsToMatch.Count()
                    select sentence;
With this code I can check whether the sentences contain the terms I'm looking for, but the problem is that it does not check the position of the words in the sentence.
Is there a way I could check the position as well or maybe a better way to check the syntax of a sentence with C#?

Try using regular expressions, something like this:
using System.Text.RegularExpressions;
...
string source = "A Dog is no Cat.";
bool result = Regex.IsMatch(source, @"^A\s+[A-Za-z0-9]+\s+is\s+a\s+[A-Za-z0-9]+\.$");
Pattern explanation:
^ - start of the string (anchor)
A - Letter A
\s+ - one or more whitespace characters (spaces)
[A-Za-z0-9]+ - 1st word (can contain A..Z, a..z letters and 0..9 digits)
\s+ - one or more whitespace characters (spaces)
is - is
\s+ - one or more whitespace characters (spaces)
a - a
\s+ - one or more whitespace characters (spaces)
[A-Za-z0-9]+ - 2nd word (can contain A..Z, a..z letters and 0..9 digits)
\. - full stop
$ - end of the string (anchor)
You can modify the code slightly to obtain the actual values of the 1st and 2nd words:
string source = "A Dog is a Cat."; // valid string
string pattern =
@"^A\s+(?<First>[A-Za-z0-9]+)\s+is\s+a\s+(?<Second>[A-Za-z0-9]+)\.$";
var match = Regex.Match(source, pattern);
if (match.Success) {
string first = match.Groups["First"].Value; // "Dog"
string second = match.Groups["Second"].Value; // "Cat"
...
}
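To run this check over the whole text from the question rather than a single sentence, one option is to split the text after each full stop and test every piece against the answer's pattern. A minimal sketch, assuming sentences always end with a period; SentenceFilter and ValidSentences are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; reuses the pattern from the answer above.
public static class SentenceFilter
{
    // "A <word> is a <word>." with nothing else before or after it.
    private static readonly Regex Syntax =
        new Regex(@"^A\s+[A-Za-z0-9]+\s+is\s+a\s+[A-Za-z0-9]+\.$");

    // Splits after every '.', keeping the period, then keeps the valid sentences.
    public static string[] ValidSentences(string text) =>
        Regex.Split(text, @"(?<=\.)")   // zero-width split right after each '.'
             .Select(s => s.Trim())     // drop the space that follows each period
             .Where(s => Syntax.IsMatch(s))
             .ToArray();
}
```

The lookbehind split keeps each full stop attached to its sentence, so the anchored pattern can still require the trailing \. as before.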

A regular expression would work for this, and would be the most concise, but may not be the most readable solution. Here is a simple method that will return true if the sentence is valid:
private bool IsSentenceValid(string sentence)
{
// split the sentence into an array of words
char[] splitOn = new char[] {' '};
string[] words = sentence.ToLower().Split(splitOn); // make all chars lowercase for easy comparison
// check for 5 words.
if (words.Length != 5)
return false;
// check for required words
if (words[0] != "a" || words[2] != "is" || words[3] != "a")
return false;
// if we got here, we're fine!
return true;
}

Just throwing out some ideas. I would write three classes for this:
SentenceManager: takes a string as the sentence and has a public method public string GetWord(word_index). For example, GetWord(3) would return the 3rd word of the sentence that was passed to the class constructor.
SentenceSyntax: in this class you specify how many words a sentence must have, which words are required, and the index at which each of those words must appear.
SyntaxChecker: this class gets a SentenceSyntax object and a SentenceManager object and has a method called Check which returns true if the sentence matches the syntax.
Remember, there can be thousands of ways to make something work, but only a few ways to do it right.
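A rough sketch of how those three classes could fit together. The class names and the GetWord and Check methods come from the answer; everything else (the WordCount and RequiredWords members, 1-based word positions, the constructor signatures, and treating the final full stop as punctuation rather than part of a word) is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Wraps one sentence and exposes its words by position.
public class SentenceManager
{
    private readonly string[] words;

    public SentenceManager(string sentence)
    {
        // Assumption: words are whitespace-separated and the final '.' is punctuation.
        words = sentence.Trim().TrimEnd('.')
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    public int WordCount => words.Length;

    // 1-based, so GetWord(3) returns the third word.
    public string GetWord(int wordIndex) => words[wordIndex - 1];
}

// Describes the rule: how many words, and which word must appear at which position.
public class SentenceSyntax
{
    public int WordCount { get; }
    public IReadOnlyDictionary<int, string> RequiredWords { get; }

    public SentenceSyntax(int wordCount, IDictionary<int, string> requiredWords)
    {
        WordCount = wordCount;
        RequiredWords = new Dictionary<int, string>(requiredWords);
    }
}

// Compares one sentence against one syntax.
public class SyntaxChecker
{
    private readonly SentenceSyntax syntax;
    private readonly SentenceManager sentence;

    public SyntaxChecker(SentenceSyntax syntax, SentenceManager sentence)
    {
        this.syntax = syntax;
        this.sentence = sentence;
    }

    // Word count must match first; only then is every required position inspected.
    public bool Check() =>
        sentence.WordCount == syntax.WordCount &&
        syntax.RequiredWords.All(rule => sentence.GetWord(rule.Key) == rule.Value);
}
```

For the syntax in the question you would create new SentenceSyntax(5, new Dictionary<int, string> { { 1, "A" }, { 3, "is" }, { 4, "a" } }) and pass it, together with a SentenceManager for each sentence, to a SyntaxChecker.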

You should definitely do this using a regex or something similar, as Dmitry has answered.
Just for kicks, I wanted to do it your way. This is how I would do it if I were going nuts :)
using System.Linq;
//Notice the third sentence would have the correct syntax
string text = "A Dog is no Cat.My Dog is a Cat.A Dog is a Cat.";
//Splitting text into single sentences
string[] sentences = text.Split(new char[] { '.' });
string[] wordsToMatch = { "A", "*", "is", "a", "*" };
var sentenceQuery = from sentence in sentences
let words = sentence.Split(' ')
where words.Length == wordsToMatch.Length &&
wordsToMatch.Zip(words, (f, s) => f == "*" || f == s).All(p => p)
select sentence;
Using this code you can also add flexibility such as case-insensitive comparison or trimming whitespace around each word, though of course you will have to code that yourself.
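As an illustration of those extras, here is a sketch of the same Zip approach with case-insensitive comparison and tolerance for extra whitespace. WildcardSentenceMatcher and FindMatches are hypothetical names, and returning the trimmed sentences without their trailing period is a choice made here, not part of the original answer.

```csharp
using System;
using System.Linq;

// Hypothetical extension of the Zip-based answer.
public static class WildcardSentenceMatcher
{
    // "*" means "any single word"; other entries must match at the same position.
    private static readonly string[] WordsToMatch = { "A", "*", "is", "a", "*" };

    public static string[] FindMatches(string text) =>
        text.Split('.')
            .Select(sentence => sentence.Trim())
            .Where(sentence =>
            {
                // Splitting on any whitespace and dropping empties tolerates extra spaces.
                string[] words = sentence.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
                return words.Length == WordsToMatch.Length &&
                       WordsToMatch.Zip(words, (pattern, word) =>
                               pattern == "*" ||
                               string.Equals(pattern, word, StringComparison.OrdinalIgnoreCase))
                           .All(ok => ok);
            })
            .ToArray();
}
```

Because the comparison ignores case, a sentence such as "a  dog IS A cat." is accepted as well.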

.NET RegEx.Replace substring with special chars [duplicate]

I want to insert a dollar sign at a specific position between two named capturing groups. The problem is that this means two immediately following dollar-signs in the replacement-string which results in problems.
How am I able to do that directly with the Replace-method? I only found a workaround by adding some temporary garbage that I instantly remove again.
See code for the problem:
using System;
using System.Text.RegularExpressions;
// We want to add a dollar sign before a number and use named groups for capturing;
// varying parts of the strings are in brackets []
// [somebody] has [some-dollar-amount] in his [something]
string joeHas = "Joe has 500 in his wallet.";
string jackHas = "Jack has 500 in his pocket.";
string jimHas = "Jim has 740 in his bag.";
string jasonHas = "Jason has 900 in his car.";
Regex dollarInsertion = new Regex(@"(?<start>^.*? has )(?<end>\d+ in his .*?$)", RegexOptions.Multiline);
Console.WriteLine(joeHas);
Console.WriteLine(jackHas);
Console.WriteLine(jimHas);
Console.WriteLine(jasonHas);
Console.WriteLine("--------------------------");
joeHas = dollarInsertion.Replace(joeHas, @"${start}$${end}");
jackHas = dollarInsertion.Replace(jackHas, @"${start}$-${end}");
jimHas = dollarInsertion.Replace(jimHas, @"${start}\$${end}");
jasonHas = dollarInsertion.Replace(jasonHas, @"${start}$kkkkkk----kkkk${end}").Replace("kkkkkk----kkkk", "");
Console.WriteLine(joeHas);
Console.WriteLine(jackHas);
Console.WriteLine(jimHas);
Console.WriteLine(jasonHas);
Output:
Joe has 500 in his wallet.
Jack has 500 in his pocket.
Jim has 740 in his bag.
Jason has 900 in his car.
--------------------------
Joe has ${end}
Jack has $-500 in his pocket.
Jim has \${end}
Jason has $900 in his car.

Use this replacement pattern: "${start}$$${end}"
The double $$ escapes the $ so that it is treated as a literal character. The third $ is actually part of the named-group substitution ${end}. You can read about this on the MSDN Substitutions page.
I would stick with the above approach. Alternatively, you can use the Replace overload that accepts a MatchEvaluator and concatenate what you need, similar to the following:
jackHas = dollarInsertion.Replace(jackHas,
m => m.Groups["start"].Value + "$" + m.Groups["end"].Value);
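Both options from this answer can be checked side by side on the question's data. A minimal sketch; the class DollarInsertion and its two method names are hypothetical, while the pattern is the one from the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from the question.
public static class DollarInsertion
{
    private static readonly Regex Pattern =
        new Regex(@"(?<start>^.*? has )(?<end>\d+ in his .*?$)", RegexOptions.Multiline);

    // "$$" produces a literal '$'; the following "${end}" is the named-group substitution.
    public static string WithEscapedDollar(string input) =>
        Pattern.Replace(input, "${start}$$${end}");

    // A MatchEvaluator builds the result in code, so '$' needs no escaping here.
    public static string WithEvaluator(string input) =>
        Pattern.Replace(input, m => m.Groups["start"].Value + "$" + m.Groups["end"].Value);
}
```

Both methods turn Joe has 500 in his wallet. into Joe has $500 in his wallet.; the evaluator version avoids substitution syntax entirely, which can be easier to read when the inserted text contains several $ characters.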

Why are you using regex for this in the first place?
string name = "Joe";
int amount = 500;
string place = "car";
string output = string.Format("{0} has ${1} in his {2}",name,amount,place);
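On C# 6 or later, string interpolation expresses the same thing; a literal $ is fine inside an interpolated string because only the braces are special there. Summary and Describe are hypothetical names.

```csharp
// Hypothetical helper showing the interpolated-string equivalent of string.Format.
public static class Summary
{
    // Only { and } are special inside $"..."; the '$' before {amount} is literal text.
    public static string Describe(string name, int amount, string place) =>
        $"{name} has ${amount} in his {place}";
}
```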

Regex expression for matching only floating point numbers

Hi, I need a regular expression for extracting only floating-point numbers, from right to left.
Example string:
Earning per Equity Share (in ) face value of 2 each26 1,675.10
1,252.56
My current regex:
(\+|-)?[0-9][0-9]*(\,[0-9]*)?(\.[0-9]*)? with RegexOptions.RightToLeft,
but the current output is:
1,252.56
1,675.10
26
2
However, I do not want to match 26 or 2.
Please help me

Maybe something like this will help
Regex
/[-+]?[0-9,\.]*([,\.])[0-9]*/g
Example input
Earning -34 5 b4 pe8r blah4 t3st + - (in) 1,252.56 face
-12234,23423.342 of 1,675.10 1,252.56
Matches
1,252.56
-12234,23423.342
1,675.10
1,252.56
Explanation
[-+]? match a single character present in the list below
Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
-+ a single character in the list -+ literally
[0-9,\.]* match a single character present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
, the literal character ,
\. matches the character . literally
1st Capturing group ([,\.])
[,\.] match a single character present in the list below
, the literal character ,
\. matches the character . literally
[0-9]* match a single character present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
g modifier: global. All matches (don't return on first match); .NET has no g flag, and Regex.Matches returns all matches instead
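In C#, the pattern from this answer can be used without the /.../g delimiters, since Regex.Matches already returns every match, and RegexOptions.RightToLeft (the option mentioned in the question) makes the scan start from the end of the input. This is a sketch; DecimalExtractor and Extract are hypothetical names. Note that the pattern also accepts a lone , or . on its own, so it may need tightening for other inputs.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper around the answer's pattern.
public static class DecimalExtractor
{
    // The answer's pattern without the /.../g delimiters.
    private const string Pattern = @"[-+]?[0-9,\.]*([,\.])[0-9]*";

    // RightToLeft scans from the end, so the last number in the input comes first.
    public static string[] Extract(string input) =>
        Regex.Matches(input, Pattern, RegexOptions.RightToLeft)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

Because a separator is required, values such as 26 and 2 in the question's example string are not matched.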

Although this is a regex question, it is also tagged C#.
Below is an example of how you might get a little more control over your output.
It is also culture-specific and only picks up numbers that contain a decimal point, so it avoids false positives such as 26 and 2.
Method
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
private List<double> GetNumbers(string input)
{
// declare result
var resultList = new List<double>();
// if input is empty return empty results
if (string.IsNullOrEmpty(input))
{
return resultList;
}
// Split input into words, excluding empty entries
var words = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
// set your desired culture
var culture = CultureInfo.CreateSpecificCulture("en-GB");
// Refine words into a list that represents potential numbers
// must have decimal place, must not start or end with decimal place
var refinedList = words.Where(x => x.Contains(".") && !x.StartsWith(".") && !x.EndsWith("."));
foreach (var word in refinedList)
{
double value;
// parse words using designated culture, and the Number option of double.TryParse
if (double.TryParse(word, NumberStyles.Number, culture, out value))
{
resultList.Add(value);
}
}
return resultList;
}
Usage
var testString = "Earning -34 5 b4 , . 234. 234, ,345 45.345 $234234 234.3453.345 $23423.2342 +234 -23423 pe8r blah4 t3st + - (in) 1,252.56 face -12234,23423.342 of 1,675.10 1,252.56";
var results = GetNumbers(testString);
foreach (var item in results)
{
Debug.WriteLine("{0}", item);
}
Output
45.345
1252.56
-1223423423.342
1675.1
1252.56
Additional Notes
You can learn more about double.TryParse and its NumberStyles options in the .NET documentation.
You can learn more about the CultureInfo class in the .NET documentation.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.
