String not in regular expression - c#

I want to use Regex to find matches in a string. There are other ways to find the pattern I am looking for, but I am interested in the Regex solution.
Consider these strings:
"ABC123"
"ABC245"
"ABC435"
"ABC Oh say can You see"
I want to find "ABC" followed by anything but "123". What is the correct regex?

Using a negative lookahead:
/ABC(?!123)/
You can check if there are matches in a string str with:
Regex.IsMatch(str, "ABC(?!123)")
Full example:
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string[] strings = {
            "ABC123",
            "ABC245",
            "ABC435",
            "ABC Oh say can You see"
        };
        string pattern = "ABC(?!123)";
        foreach (string str in strings)
        {
            Console.WriteLine(
                "\"{0}\" {1} match.",
                str, Regex.IsMatch(str, pattern) ? "does" : "does not"
            );
        }
    }
}
Note that the regex above matches ABC as long as it is not immediately followed by 123, including an ABC at the very end of the string. If you need at least one character after ABC (that is, do not match ABC on its own at the end of the string), use ABC(?!123). instead; the dot requires at least one character after ABC.
I believe the first Regex is what you're looking for though (as long as "nothing" can be considered "anything" :P).
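A minimal sketch contrasting the two patterns from the answer above. The class and method names (LookaheadDemo, MatchesAnythingBut, MatchesAtLeastOneChar) are illustrative only.

```csharp
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // "ABC" not immediately followed by "123"; an "ABC" at the very end still matches.
    public static bool MatchesAnythingBut(string input) =>
        Regex.IsMatch(input, "ABC(?!123)");

    // Same check, but the trailing '.' requires at least one (non-newline) character after "ABC".
    public static bool MatchesAtLeastOneChar(string input) =>
        Regex.IsMatch(input, "ABC(?!123).");
}
```

For "ABC" the first method returns true and the second false; both return false for "ABC123" and true for "ABC245".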

Try the following test code. This should do what you require.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string s1 = "ABC123";
string s2 = "we ABC123 weew";
string s3 = "ABC435";
string s4 = "Can ABC Oh say can You see";
List<string> list = new List<string>() { s1, s2, s3, s4 };
Regex regex = new Regex(@".*(?<=.*ABC(?!.*123.*)).*");
foreach (string s in list)
{
    Match m = regex.Match(s);
    // Match never returns null; check Success to skip strings that do not match
    if (m.Success)
        Console.WriteLine(m.ToString());
}
The output is:
ABC435
Can ABC Oh say can You see
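This uses both a negative lookahead and a positive lookbehind. Because of the .* inside the lookahead, it rejects a string if 123 appears anywhere after ABC, not only immediately after it.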
I hope this helps.

An alternative to regex, should you find this easier to use. Only a suggestion.
List<string> strs = new List<string>() { "ABC123",
    "ABC245",
    "ABC435",
    "NOTABC",
    "ABC Oh say can You see"
};
// Iterate backwards so RemoveAt does not shift items that have not been checked yet
for (int i = strs.Count - 1; i >= 0; i--)
{
    //Set the current string variable
    string str = strs[i];
    //Get the index of "ABC"
    int index = str.IndexOf("ABC", StringComparison.Ordinal);
    //Do you want to remove if ABC doesn't exist?
    if (index == -1)
        continue;
    //Set the index to the first character after "ABC"
    index += 3;
    //Only compare if there are at least 3 more characters (room for "123")
    if (index + 3 <= str.Length && str.Substring(index, 3) == "123")
        strs.RemoveAt(i);
}
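If you prefer the non-regex route, List<T>.RemoveAll can apply the same check through a predicate, which avoids managing indexes while removing. This is a sketch assuming the same rule as the loop above (only the first "ABC" is inspected, and strings without "ABC" are kept); AbcFilter and RemoveAbc123 are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class AbcFilter
{
    // Removes every string whose first "ABC" is immediately followed by "123".
    // Returns the number of removed strings (the return value of RemoveAll).
    public static int RemoveAbc123(List<string> strs)
    {
        return strs.RemoveAll(s =>
        {
            int index = s.IndexOf("ABC", StringComparison.Ordinal);
            // keep strings that do not contain "ABC" at all
            if (index == -1)
                return false;
            // index + 3 is at most s.Length, so Substring is safe ("" at the end)
            return s.Substring(index + 3).StartsWith("123", StringComparison.Ordinal);
        });
    }
}
```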

Related

Variable length colon separated string regex validation

I am trying to write a regex to validate and extract the values from a colon-separated string that can have 1-4 values. I have found examples with a fixed number of values and tried to adapt them, but my version only picks up the first and last values, and I need to extract all of them. The current regex also includes the : in the match; I simply want the value if possible.
I am currently using this:
^([01ab])+(\:[01ab])*
but it only pulls the first and last values, not those in between if they exist.
Valid values:
0
0:a
0:a:1
0:1:a:b
Not valid:
0:a:
0:a:1:b:
I suggest a two-step approach: validate the format with the regex and then split the string with : if it qualifies:
if (Regex.IsMatch(text, @"^[01ab](?::[01ab])*$"))
{
    result = text.Split(':');
}
The ^[01ab](?::[01ab])*$ regex matches the start of the string with ^, then a 0, 1, a or b, then 0 or more repetitions of : followed by a 0, 1, a or b, and then the end of the string ($). To enforce the limit of 1-4 values from the question, use {0,3} instead of *.
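A sketch of the validate-then-split approach that also enforces the question's limit of 1 to 4 values; ColonList and ValidateAndSplit are hypothetical names. It uses \z rather than $, because in .NET $ also matches before a trailing newline.

```csharp
using System.Text.RegularExpressions;

public static class ColonList
{
    // One value from {0, 1, a, b}, then at most three more ":value" parts (1 to 4 values total).
    // \z anchors at the absolute end, so "0\n" is rejected.
    private static readonly Regex Format = new Regex(@"^[01ab](?::[01ab]){0,3}\z");

    // Returns the individual values, or null if the input does not match the format.
    public static string[] ValidateAndSplit(string text)
    {
        return Format.IsMatch(text) ? text.Split(':') : null;
    }
}
```

ValidateAndSplit("0:1:a:b") returns the four values, while "0:a:" and a five-value input such as "0:1:a:b:0" return null.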
If you want to play with the regex a bit you will see that C# allows you to access all capture group values via CaptureCollection:
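// requires: using System.Linq; using System.Text.RegularExpressions;
var text = "0:1:a:b";
// Regex.Match never returns null; a failed match simply yields no captures
var results = Regex.Match(text, @"^(?:([01ab])(?::\b|$))+$")
    .Groups[1].Captures.Cast<Capture>().Select(c => c.Value);
Console.WriteLine(string.Join(", ", results)); // => 0, 1, a, b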
Regex details
^ - start of string
(?:([01ab])(?::\b|$))+ - 1 or more repetitions of:
([01ab]) - Group 1: 0, 1, a or b
(?::\b|$) - either a : followed by a word boundary (that is, a letter or digit comes next; \b would also allow _, but _ is not in the character class anyway), or the end of the string
$ - end of string.
A non-regex approach (why use a regex unless you really have to?) is this:
// requires: using System.Linq;
bool Validate(string s)
{
    string[] valid = { "0", "1", "a", "b" };
    var splitArray = s.Split(':');
    if (splitArray.Length < 1 || splitArray.Length > 4)
        return false;
    return splitArray.All(a => valid.Contains(a));
}
It is usually more efficient to use string methods than a regex, so try the following:
using System;
using System.Linq;

namespace ConsoleApplication137
{
    class Program
    {
        static readonly string[] ValidValues = { "0", "1", "a", "b" };

        static void Main(string[] args)
        {
            string[] inputs = { "0", "0:a", "0:a:1", "0:1:a:b", "Not valid", "0:a:", "0:a:1:b:" };
            foreach (string input in inputs)
            {
                // Keep empty entries so a trailing ':' produces an invalid "" value
                string[] splitArray = input.Split(':');
                bool valid = splitArray.Length >= 1 && splitArray.Length <= 4
                    && splitArray.All(v => ValidValues.Contains(v));
                if (!valid)
                {
                    Console.WriteLine("Input: '{0}' Not Valid", input);
                }
                else
                {
                    Console.WriteLine("Input: '{0}' Values: {1}", input, string.Join(", ", splitArray));
                }
            }
            Console.ReadLine();
        }
    }
}

C# "between strings" run several times

Here is my code to find a string between { }:
var text = "Hello this is a {Testvar}...";
int tagFrom = text.IndexOf("{") + "{".Length;
int tagTo = text.LastIndexOf("}");
String tagResult = text.Substring(tagFrom, tagTo - tagFrom);
tagResult Output: Testvar
This only works for one time use.
How can I apply this to several tags (e.g. in a while loop)?
For example:
var text = "Hello this is a {Testvar}... and we have more {Tagvar} in this string {Endvar}.";
tagResult[] Output (e.g. an array): Testvar, Tagvar, Endvar
IndexOf() has an overload that takes the index at which to start searching. If you omit it, the search always starts at the beginning of the string and always finds the first occurrence.
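var text = "Hello this is a {Testvar}... and we have more {Tagvar} in this string {Endvar}.";
int start = 0, end = -1;
List<string> results = new List<string>();
while (true)
{
    // IndexOf returns -1 when no '{' is left, so start becomes 0
    start = text.IndexOf("{", start) + 1;
    if (start != 0)
        end = text.IndexOf("}", start);
    else
        break;
    if (end == -1) break;
    results.Add(text.Substring(start, end - start));
    // continue searching after the '}' just found
    start = end + 1;
}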
I strongly recommend using regular expressions for the task.
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var regex = new Regex(@"(\{(?<var>\w*)\})+", RegexOptions.IgnoreCase);
            var text = "Hello this is a {Testvar}... and we have more {Tagvar} in this string {Endvar}.";
            var matches = regex.Matches(text);
            foreach (Match match in matches)
            {
                var variable = match.Groups["var"];
                Console.WriteLine($"Found {variable.Value} from position {variable.Index} to {variable.Index + variable.Length}");
            }
        }
    }
}
Output:
Found Testvar from position 17 to 24
Found Tagvar from position 47 to 53
Found Endvar from position 71 to 77
For more information about regular expressions, see the MSDN reference page:
https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference
and this tool may be great to start testing your own expressions:
http://regexstorm.net/tester
Hope this helps!
I would use the regex pattern {(\w+)} (written as "{(\\w+)}" in a regular C# string) to get the value.
// requires: using System.Linq; using System.Text.RegularExpressions;
Regex reg = new Regex("{(\\w+)}");
var text = "Hello this is a {Testvar}... and we have more {Tagvar} in this string {Endvar}.";
string[] tagResult = reg.Matches(text)
    .Cast<Match>()
    .Select(match => match.Groups[1].Value).ToArray();
foreach (var item in tagResult)
{
    Console.WriteLine(item);
}
Result
Testvar
Tagvar
Endvar
Many ways to skin this cat; here are a few:
Split it on { then loop through, splitting each result on } and taking element 0 each time (skip the first piece, which is the text before the first {)
Split on { or } then loop through, taking only the odd-numbered elements
Adjust your existing logic so you use IndexOf twice (instead of LastIndexOf). When you're looking for a }, pass the index of the { as the start index of the search
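The first two options above can be sketched like this; BraceTags and its method names are hypothetical, and both methods assume tags are not nested. The first uses IndexOf('}') on each piece, which gives the same result as splitting on } and taking element 0, but also skips a { that is never closed.

```csharp
using System.Collections.Generic;

public static class BraceTags
{
    // Option 1: split on '{'; in each later piece the tag is the text before the first '}'.
    public static List<string> BySplitOnOpen(string text)
    {
        var results = new List<string>();
        string[] parts = text.Split('{');
        for (int i = 1; i < parts.Length; i++) // parts[0] is the text before the first '{'
        {
            int close = parts[i].IndexOf('}');
            if (close >= 0) // skip a '{' that is never closed
                results.Add(parts[i].Substring(0, close));
        }
        return results;
    }

    // Option 2: split on both braces; with well-formed, non-nested tags the odd indexes are the tags.
    public static List<string> BySplitOnBoth(string text)
    {
        var results = new List<string>();
        string[] parts = text.Split('{', '}');
        for (int i = 1; i < parts.Length; i += 2)
            results.Add(parts[i]);
        return results;
    }
}
```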
This is easy with a regular expression, using a simple pattern like {([\d\w]+)} (\w already includes digits, so {(\w+)} is equivalent).
See the example below:
using System.Text.RegularExpressions;
...
MatchCollection matches = Regex.Matches("Hello this is a {Testvar}... and we have more {Tagvar} in this string {Endvar}.", @"{([\d\w]+)}");
foreach (Match match in matches)
{
    Console.WriteLine("match : {0}, index : {1}", match.Groups[1], match.Index);
}
It finds each run of letters or digits inside the braces, one at a time.

Splitting a string at first number and then returning 2 strings

I'm having some trouble adapting my code, which splits a string into two parts, so that it splits at the first number instead. It currently splits on the first space, but that won't work long term because city names can contain spaces too.
Current code:
var parameters = "Chicago 1234 Anytown, NY";
var commands = parameters.Split(new[] { ' ' }, 2);
var originCity = commands[0];
var destination = commands[1];
This works great for a city that has a single-word name, but it breaks on:
var parameters = "Los Angeles 1234 Anytown, NY";
I've tried several different approaches that I just haven't been able to work out. Any ideas on how to return two strings like the following:
originCity = Los Angeles
destination = 1234 Anytown, NY
You can't use .Split() for this.
Instead, you need to find the index of the first number. You can use .IndexOfAny() with an array of digits (technically a char[]) to do this.
int numberIndex = address.IndexOfAny("0123456789".ToCharArray());
You can then capture two substrings: one before the index, the other from the index on (check that numberIndex is not -1 first).
string before = address.Substring(0, numberIndex).TrimEnd();
string after = address.Substring(numberIndex);
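Putting the IndexOfAny pieces together with a check for inputs that contain no digit might look like this. AddressSplitter and TrySplitAtFirstDigit are hypothetical names, and the TrimEnd call assumes you don't want the space before the number in the city name.

```csharp
using System;

public static class AddressSplitter
{
    private static readonly char[] Digits = "0123456789".ToCharArray();

    // Splits at the first ASCII digit: the text before it (trailing spaces trimmed)
    // and the text from the digit onwards. Returns false when there is no digit.
    public static bool TrySplitAtFirstDigit(string address, out string originCity, out string destination)
    {
        int numberIndex = address.IndexOfAny(Digits);
        if (numberIndex == -1)
        {
            originCity = address;
            destination = string.Empty;
            return false;
        }
        originCity = address.Substring(0, numberIndex).TrimEnd();
        destination = address.Substring(numberIndex);
        return true;
    }
}
```

For "Los Angeles 1234 Anytown, NY" this yields "Los Angeles" and "1234 Anytown, NY".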
You could use Regex. In the following, match is the first match in the regex results.
var match = Regex.Match(s, "[0-9]");
if (match.Success)
{
    int index = match.Index;
    originCity = s.Substring(0, index).TrimEnd();
    destination = s.Substring(index);
}
Or you can do it yourself:
int index = -1;
for (int i = 0; i < s.Length; i++)
{
    if (char.IsDigit(s[i]))
    {
        index = i; // position of the first digit
        break;
    }
}
// if index != -1, split with s.Substring(0, index) and s.Substring(index)
...
You should see if using a regular expression will do what you need here. At least with the sample data you're showing, the expression:
(\D+)(\d+)(\D+)
would group the results into non-numeric characters up to the first numeric character, the numeric characters until a non-numeric is encountered, and then the rest of the non-numeric characters. Here is how it would be used in code:
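var pattern = @"(\D+)(\d+)(\D+)";
var input = "Los Angeles 1234 Anytown, NY";
var result = Regex.Match(input, pattern);
// Group 1 keeps the space before the number, so trim it
var city = result.Groups[1].Value.TrimEnd();
// Group 3 already starts with the space after the number
var destination = result.Groups[2].Value + result.Groups[3].Value;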
This falls apart for inputs like 29 Palms, California, or if the numbers contain commas, decimal points, etc., so it is certainly not a silver bullet. But I don't know your data, and such a simple solution may be fine for it.

Extracting string between two characters?

I want to extract the email IDs between < and >, for example:
input string: "abc" <abc@gmail.com>; "pqr" <pqr@gmail.com>;
output string: abc@gmail.com;pqr@gmail.com
Without regex, you can use this:
public static string GetStringBetweenCharacters(string input, char charFrom, char charTo)
{
    int posFrom = input.IndexOf(charFrom);
    if (posFrom != -1) //if found char
    {
        int posTo = input.IndexOf(charTo, posFrom + 1);
        if (posTo != -1) //if found char
        {
            return input.Substring(posFrom + 1, posTo - posFrom - 1);
        }
    }
    return string.Empty;
}
And then:
GetStringBetweenCharacters("\"abc\" <abc@gmail.com>;", '<', '>')
you will get:
abc@gmail.com
string input = @"""abc"" <abc@gmail.com>; ""pqr"" <pqr@gmail.com>;";
var output = String.Join(";", Regex.Matches(input, @"\<(.+?)\>")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value));
Tested
string input = "\"abc\" <abc@gmail.com>; \"pqr\" <pqr@gmail.com>;";
string matchedValuesConcatenated = string.Join(";",
    Regex.Matches(input, @"(?<=<)([^>]+)(?=>)")
        .Cast<Match>()
        .Select(m => m.Value));
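(?<=<) is a lookbehind, which does not capture, so < must precede the match but is not included in the output; (?=>) does the same for the closing >.
The capturing group matches one or more characters other than >.
You can also use non-capturing groups, @"(?:<)([^>]+)(?:>)", but then < and > are part of m.Value, so select m.Groups[1].Value instead.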
The answer from LB +1 is also correct. I just did not realize it was correct until I wrote an answer myself.
Use the String.IndexOf(char, int) method to search for < starting at a given index in the string (e.g. the index just after the last > you found, i.e. the end of the previous e-mail address, or 0 when looking for the first address).
Write a loop that repeats for as long as you find another < character, and every time you find a < character, look for the next > character. Use the String.Substring(int, int) method to extract the e-mail address, whose start and end positions are then known to you.
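The loop described above could be sketched like this, assuming the brackets are not nested; AngleBrackets and ExtractAll are hypothetical names.

```csharp
using System.Collections.Generic;

public static class AngleBrackets
{
    // Collects every substring enclosed in '<' ... '>' by scanning with IndexOf(char, int).
    public static List<string> ExtractAll(string input)
    {
        var results = new List<string>();
        int searchFrom = 0;
        while (true)
        {
            int open = input.IndexOf('<', searchFrom);
            if (open == -1) break;                  // no more '<'
            int close = input.IndexOf('>', open + 1);
            if (close == -1) break;                 // '<' without a closing '>'
            results.Add(input.Substring(open + 1, close - open - 1));
            searchFrom = close + 1;                 // resume after this '>'
        }
        return results;
    }
}
```

string.Join(";", AngleBrackets.ExtractAll(input)) then gives abc@gmail.com;pqr@gmail.com for the sample input.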
You could use the following regex and some LINQ:
var regex = new Regex(@"\<(.*?)\>");
var input = @"""abc"" <abc@gmail.com>; ""pqr"" <pqr@gmail.com>";
var matches = regex.Matches(input);
var res = string.Join(";", matches.Cast<Match>().Select(x => x.Value.Replace("<", "").Replace(">", "")).ToArray());
The < and > brackets are removed afterwards; you could also let the regex do it by selecting the capture group, x.Groups[1].Value, instead.
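string str = "\"abc\" <abc@gmail.com>; \"pqr\" <pqr@gmail.com>;";
string output = string.Empty;
int open;
while ((open = str.IndexOf('<')) != -1)
{
    int close = str.IndexOf('>', open + 1);
    if (close == -1) break; // '<' without a closing '>'
    // append the address between '<' and '>', separated by ';'
    output += (output.Length > 0 ? ";" : "") + str.Substring(open + 1, close - open - 1);
    // drop everything up to and including the '>' just processed
    str = str.Substring(close + 1);
}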

How to find the number of occurrences of a letter in only the first sentence of a string?

I want to count the occurrences of the letter "a" in only the first sentence. The code below counts "a" in all sentences, but I want only the first sentence.
static void Main(string[] args)
{
    string text; int k = 0;
    text = "bla bla bla. something second. maybe last sentence.";
    foreach (char a in text)
    {
        char b = 'a';
        if (b == a)
        {
            k += 1;
        }
    }
    Console.WriteLine("number of a in first sentence is " + k);
    Console.ReadKey();
}
This splits the string into an array on '.', '!' and '?', then counts the number of 'a' characters in the first element of the array (the first sentence).
var count = text.Split(new[] { '.', '!', '?' })[0].Count(c => c == 'a'); // requires using System.Linq;
This example assumes a sentence is separated by a ., ? or !. If you have a decimal number in your string (e.g. 123.456), that will count as a sentence break. Breaking up a string into accurate sentences is a fairly complex exercise.
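One way to reduce the decimal-number problem is to treat ., ? or ! as a sentence end only when it is followed by whitespace or the end of the text. This is a sketch under that assumption (FirstSentence is a hypothetical name); abbreviations such as "e.g. " will still end a sentence early.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class FirstSentence
{
    // Shortest prefix ending in '.', '!' or '?' that is followed by whitespace or the end of the text.
    // The '.' in "3.14" is followed by a digit, so it does not end the sentence.
    private static readonly Regex FirstSentenceRegex =
        new Regex(@"^.*?[.!?](?=\s|$)", RegexOptions.Singleline);

    public static string Get(string text)
    {
        Match m = FirstSentenceRegex.Match(text);
        // With no terminator at all, treat the whole text as one sentence.
        return m.Success ? m.Value : text;
    }

    public static int CountChar(string text, char target) =>
        Get(text).Count(c => c == target);
}
```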
This is perhaps more verbose than what you were looking for, but hopefully it'll breed understanding as you read through it.
// requires: using System.Text.RegularExpressions;
public static void Main()
{
    //Make an array of the possible sentence enders. Doing this pattern lets us easily update
    // the code later if it becomes necessary, or allows us easily to move this to an input
    // parameter
    string[] SentenceEnders = new string[] { "$", @"\.", @"\?", @"\!" /* Add Any Others */ };
    string WhatToFind = "a"; //What are we looking for? Regular Expressions Will Work Too!!!
    string SentenceToCheck = "This, but not to exclude any others, is a sample."; //First example
    string MultipleSentencesToCheck = @"
Is this a sentence
that breaks up
among multiple lines?
Yes!
It also has
more than one
sentence.
"; //Second Example
    //This will split the input on all the enders put together (by way of joining them in [] inside a regular
    // expression)
    string[] SplitSentences = Regex.Split(SentenceToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase);
    //SplitSentences is an array, with sentences on each index. The first index is the first sentence
    string FirstSentence = SplitSentences[0];
    //Now, split that single sentence on our matching pattern for what we should be counting
    string[] SubSplitSentence = Regex.Split(FirstSentence, WhatToFind, RegexOptions.IgnoreCase);
    //Now that it's split, it's split a number of times that matches how many matches we found, plus one
    // (the "left over" is the +1)
    int HowMany = SubSplitSentence.Length - 1;
    System.Console.WriteLine(string.Format("We found, in the first sentence, {0} '{1}'.", HowMany, WhatToFind));
    //Do all this again for the second example. Note that ideally, this would be in a separate function
    // and you wouldn't be writing code twice, but I wanted you to see it without all the comments so you can
    // compare and contrast
    SplitSentences = Regex.Split(MultipleSentencesToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    SubSplitSentence = Regex.Split(SplitSentences[0], WhatToFind, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    HowMany = SubSplitSentence.Length - 1;
    System.Console.WriteLine(string.Format("We found, in the first sentence of the second example, {0} '{1}'.", HowMany, WhatToFind));
}
Here is the output:
We found, in the first sentence, 3 'a'.
We found, in the first sentence of the second example, 4 'a'.
You didn't define "sentence", but if we assume it's always terminated by a period (.), just add this inside the loop:
if (a == '.') {
break;
}
Expand from this to support other sentence delimiters.
Simply break out of the foreach (...) loop when you encounter a "." (period).
Well, assuming you define a sentence as ending with a '.':
Use String.IndexOf() to find the position of the first '.'. After that, search in a substring instead of the entire string.
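A sketch of this IndexOf/Substring approach, assuming only '.' ends a sentence and treating the whole text as one sentence when there is no '.'; FirstSentenceCounter is a hypothetical name.

```csharp
public static class FirstSentenceCounter
{
    // Counts target characters up to (not including) the first '.'.
    public static int CountInFirstSentence(string text, char target)
    {
        int end = text.IndexOf('.');
        string firstSentence = end == -1 ? text : text.Substring(0, end);
        int count = 0;
        foreach (char c in firstSentence)
        {
            if (c == target)
                count++;
        }
        return count;
    }
}
```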
Find the position of the '.' in the text (you can use Split).
Count the 'a' characters in the text from position 0 up to the '.'.
// requires: using System.Linq; using System.Text.RegularExpressions;
string SentenceToCheck = "Hi, I can wonder this situation where I can do best";
//Here I am giving several ways to find this
//Using regular expressions (the Split is case-insensitive, Matches is case-sensitive)
int HowMany = Regex.Split(SentenceToCheck, "a", RegexOptions.IgnoreCase).Length - 1;
int i = Regex.Matches(SentenceToCheck, "a").Count;
// Simple way: length difference after removing every "a"
int Count = SentenceToCheck.Length - SentenceToCheck.Replace("a", "").Length;
//LINQ, method syntax (case-insensitive)
int _lambdaCount = SentenceToCheck.Count(t => char.ToUpperInvariant(t) == 'A');
//LINQ, query syntax (case-insensitive)
var _linqAIEnumerable = from _char in SentenceToCheck
                        where char.ToUpperInvariant(_char) == 'A'
                        select _char;
int a = _linqAIEnumerable.Count();
//LINQ, query syntax (case-sensitive)
var _linqCount = from g in SentenceToCheck
                 where g == 'a'
                 select g;
int b = _linqCount.Count();
