Regex Puzzle Find all Valid String Combinations - c#

I am trying to find all possible substrings of a string that satisfy all of the given conditions:
The first letter is a lowercase English letter.
Next, it contains a sequence of zero or more of the following characters:
lowercase English letters, digits, and colons.
Next, it contains a forward slash '/'.
Next, it contains a sequence of one or more of the following characters:
lowercase English letters and digits.
Next, it contains a backward slash '\'.
Next, it contains a sequence of one or more lowercase English letters.
Given some string, s, we define the following:
s[i..j] is a substring consisting of all the characters in the inclusive range between index i and index j.
Two substrings, s[i1..j1] and s[i2..j2], are said to be distinct if either i1 ≠ i2 or j1 ≠ j2.
For example, your command line is abc:/b1c\xy. Valid command substrings are:
abc:/b1c\xy
bc:/b1c\xy
c:/b1c\xy
abc:/b1c\x
bc:/b1c\x
c:/b1c\x
I solved this with ^([a-z])([a-z0-9:]*)(/)([a-z0-9]+)([\\])([a-z]*)
but this doesn't satisfy the second condition. I then tried ^([a-z])([a-z0-9:]*)(/)([a-z0-9]+)([\\])([a-z]+[a-z]*), but for w:/a\bc there should be 2 substrings [w:/a\b, w:/a\bc], whereas the regex returns only 1 match, which is obvious. What am I doing wrong?
Edit: w:/a\bc should yield two substrings [w:/a\b, w:/a\bc] because each satisfies all 6 constraints and they are distinct; 'w:/a\bc' is a superset of w:/a\b.

You have to perform substring operations after matching the string.
For example:
your string is "abc:/b1c\xy" and you matched it using your regex; now it's time to extract the required data.
string st = @"abc:/b1c\xy";
// Split the matched string into its prefix, center and postfix parts.
Match m = Regex.Match(st, @"^([a-z][a-z0-9:]*)(/[a-z0-9]+\\)([a-z]+)$");
string prefixedString = m.Groups[1].Value;  // "abc:"
string centerString = m.Groups[2].Value;    // "/b1c\"
string postfixedString = m.Groups[3].Value; // "xy"
for (int i = 0; i < prefixedString.Length; i++)
{
// Only a lowercase letter may start a valid substring (skips digits and ':').
if (prefixedString[i] < 'a' || prefixedString[i] > 'z') continue;
// Every non-empty leading part of the postfix is a valid ending.
for (int len = postfixedString.Length; len >= 1; len--)
{
Console.WriteLine(prefixedString.Substring(i) + centerString + postfixedString.Substring(0, len));
}
}
I hope this will give you some idea.

You may be able to create a regex that helps you in separating all relevant result parts, but as far as I know, you can't create a regex that gives you all result sets with a single search.
The tricky part is the first two conditions, since there can be many possible starting points when there is a mix of letters, digits and colons.
In order to find possible starting points, I suggest the following pattern for the part before the forward slash: (?:([a-z]+)(?:[a-z0-9:]*?))+
This will match potentially multiple captures where every letter within the capture could be a starting point to the substring.
Whole regex: (?:([a-z]+)(?:[a-z0-9:]*?))+/[a-z0-9]+\\([a-z]*)
Create your results by combining all postfix sub-lengths from all captures of group 1 and all prefix sub-lengths from group 2.
Example code:
var testString = @"a:ab2c:/b1c\xy";
var reg = new Regex(@"(?:([a-z]+)(?:[a-z0-9:]*?))+/[a-z0-9]+\\([a-z]*)");
var matches = reg.Matches(testString);
foreach (Match match in matches)
{
var prefixGroup = match.Groups[1];
var postfixGroup = match.Groups[2];
foreach (Capture prefixCapture in prefixGroup.Captures)
{
for (int i = 0; i < prefixCapture.Length; i++)
{
for (int j = 0; j < postfixGroup.Length; j++)
{
var start = prefixCapture.Index + i;
var end = postfixGroup.Index + postfixGroup.Length - j;
Console.WriteLine(testString.Substring(start, end - start));
}
}
}
}
Output:
a:ab2c:/b1c\xy
a:ab2c:/b1c\x
ab2c:/b1c\xy
ab2c:/b1c\x
b2c:/b1c\xy
b2c:/b1c\x
c:/b1c\xy
c:/b1c\x

An intuitive approach, though it might not be correct:
var regex = new Regex(@"(^[a-z])([a-z0-9:]*)(/)([a-z0-9]+)([\\])([a-z]+)");
var counter = 0;
for (var c = 0; c < command.Length; c++)
{
var isMatched = regex.Match(string.Join(string.Empty, command.Skip(c)));
if (isMatched.Success)
{
counter += isMatched.Groups[6].Value.Length; // group 6 holds the trailing letters; each one is a valid end position
}
}
return counter;
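As a cross-check for the approaches above, here is a minimal brute-force sketch. It simply tests every substring against a fully anchored version of the pattern, so it is slow but easy to reason about. The class and method names (CommandCounter, CountValidSubstrings) are made up for illustration, and \z is used instead of $ so that a trailing newline cannot sneak into a match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommandCounter
{
    // Anchored at both ends, so a candidate substring must match in full.
    private static readonly Regex Valid =
        new Regex(@"^[a-z][a-z0-9:]*/[a-z0-9]+\\[a-z]+\z");

    // Brute force: test every substring s[i..j]. Only suitable for short inputs.
    public static int CountValidSubstrings(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            for (int len = 1; i + len <= s.Length; len++)
            {
                if (Valid.IsMatch(s.Substring(i, len)))
                {
                    count++;
                }
            }
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountValidSubstrings(@"abc:/b1c\xy")); // 6
        Console.WriteLine(CountValidSubstrings(@"w:/a\bc"));     // 2
    }
}
```

If a faster solution disagrees with this count on a small input, the faster solution is probably the one that is wrong.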

Related

String splitting with a special structure

I have strings of the following form:
str = "[int]:[int],[int]:[int],[int]:[int],[int]:[int], ..." (for undefined number of times).
What I did was this:
string[] str_split = str.Split(',');
for( int i = 0; i < str_split.Length; i++ )
{
string[] str_split2 = str_split[i].Split(':');
}
Unfortunately this breaks when some of the numbers have extra ',' inside a number. For example, we have something like this:
695,000:14,306,000:12,136000:12,363000:6
in which the following are the numbers, ordered from left to right:
695,000
14
306,000
12
136000
12
363000
6
How can I resolve this string splitting problem?
If it is the case that only the number to the left of the colon separator can contain commas, then you could simply express this as:
string s = "695,000:14,306,000:12,136000:12,363000:6";
var parts = Regex.Split(s, @":|(?<=:\d+),");
The regex pattern, which identifies the separators, reads: "any colon, or any comma that follows a colon and a sequence of digits (but not another comma)".
A simple solution is to split using : as the delimiter. The resulting array will have entries of the format [int],[int]. Go through the array and split each entry using , as the delimiter. This will give you an array of [int] numbers.
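Here is a rough sketch of the split-on-colon idea. It assumes, as the regex answer does, that only the number to the left of a colon may contain commas, so each middle piece is cut at its first comma only. PairSplitter and SplitNumbers are names I made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class PairSplitter
{
    // Assumption: the number to the right of ':' never contains a comma.
    public static List<string> SplitNumbers(string s)
    {
        string[] pieces = s.Split(':');
        var numbers = new List<string>();
        for (int i = 0; i < pieces.Length; i++)
        {
            bool isFirst = i == 0;                // whole piece is a left-hand number
            bool isLast = i == pieces.Length - 1; // whole piece is a right-hand number
            int comma = pieces[i].IndexOf(',');
            if (isFirst || isLast || comma < 0)
            {
                numbers.Add(pieces[i]);
            }
            else
            {
                // Before the first comma: right-hand number of the previous pair.
                numbers.Add(pieces[i].Substring(0, comma));
                // After it: left-hand number of the next pair (may contain commas).
                numbers.Add(pieces[i].Substring(comma + 1));
            }
        }
        return numbers;
    }
}
```

For the string in the question this should give 695,000 / 14 / 306,000 / 12 / 136000 / 12 / 363000 / 6. If the right-hand numbers can also contain commas, the input is ambiguous and no splitting rule will be reliable.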
It might not be the best way to do it and it might not work all the time but here's what I'd do.
string[] leftRightDoubles = str.Split(':');
foreach(string substring in leftRightDoubles){
string[] indivNumbers = substring.Split(',');
//if indivNumbers.Length == 2, you know that these two are separate numbers
//if indivNumbers.Length > 2, use heuristics to determine which parts belong to which number
if(indivNumbers.Length > 2) {
for(int i = 0; i < indivNumbers.Length; i++) {
if(indivNumbers[i] != "000") { //Or use some other heuristic
//It's a new number
} else {
//It's the rest of previous number
}
}
}
}
//It's sort of pseudocode with comments (haven't touched C# in a while so I don't want to write full C# code)

Check for special characters are not allowed in C#

I have to validate a text box against a list of special characters that are not allowed.
These are the characters that are not allowed:
"&";"\";"/";"!";"%";"#";"^";"(";")";"?";"|";"~";"+";" ";
"{";"}";"*";",";"[";"]";"$";";";":";"=";"
The semicolon is used just to separate the characters. I tried to write a regex for some of the characters; if it had worked I would have extended it, but it is not working.
What am I doing wrong here?
Regex.IsMatch(textBox1.Text, @"^[\%\/\\\&\?\,\'\;\:\!\-]+$")
^[\%\/\\\&\?\,\'\;\:\!\-]+$
matches the strings that consist entirely of special characters. You need to invert the character class to match the strings that do not contain a special character:
^[^\%\/\\\&\?\,\'\;\:\!\-]+$
(the ^ right after the opening [ is the added character)
Alternatively, you can use this regex to match any string containing only alphanumeric characters, hyphens, underscores and apostrophes.
^[a-zA-Z0-9\-'_]+$
The regex you mention in the comments
[^a-zA-Z0-9-'_]
matches a string that contains any character except those that are allowed (you might need to escape the hyphen, though). This works as well, assuming you reverse the condition correctly (accept the strings that do not match).
If you are just looking for any of a list of characters then a regular expression is the more complicated option. String.IndexOfAny will return the first index of any of an array of characters or -1. So the check:
if (input.IndexOfAny(theCharacetrers) != -1) {
// Found one of them.
}
where theCharacetrers has previously been set up at class scope:
private readonly char[] theCharacetrers = new [] {'&','\\','/','!','%','#','^',... };
You need to remove ^ from the beginning and $ from the end of the pattern; otherwise the string matches only if it consists entirely of special characters.
So, instead of
@"^[\%\/\\\&\?\,\'\;\:\!\-]+$"
it should be
@"[\%\/\\\&\?\,\'\;\:\!\-]+"
You can read more about start of string and end of string anchors here
Your regex means "a string consisting only of special characters" (since you have the begin/end markers ^ and $).
You probably just want to check that the string does not contain any of the characters; @"[\%\/\\\&\?\,\'\;\:\!\-]" would be enough.
Also String.IndexOfAny may be better fit if you just need to see if any of the characters is present in the source string.
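For what it's worth, the IndexOfAny check could look something like the sketch below. The Forbidden array holds only part of the list from the question (the original list is cut off, so I have not tried to complete it), and InputValidator / IsAllowed are hypothetical names.

```csharp
using System;

public static class InputValidator
{
    // Partial list taken from the question; extend it as needed.
    private static readonly char[] Forbidden =
    {
        '&', '\\', '/', '!', '%', '#', '^', '(', ')', '?', '|', '~', '+', ' '
    };

    // True when the input contains none of the forbidden characters.
    public static bool IsAllowed(string input)
    {
        return input.IndexOfAny(Forbidden) == -1;
    }
}
```

Called as InputValidator.IsAllowed(textBox1.Text), this avoids regex escaping questions entirely, which is where the original attempt went wrong.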
Please use this in the TextChanged event:
//Regex regex = new Regex("([a-zA-Z0-9 ._#]+)");
Regex regex = new Regex("^[a-zA-Z0-9_#(+).,-]+$");
string alltxt = txtOthers.Text;//txtOthers is textboxes name;
int k = alltxt.Length;
for (int i = 0; i <= k - 1; i++)
{
string lastch = alltxt.Substring(i, 1);
MatchCollection matches = regex.Matches(lastch);
if (matches.Count > 0)
{
}
else
{
txtOthers.Text = alltxt.Remove(i, 1);
i = i - 1;
alltxt = txtOthers.Text;
k = alltxt.Length;
}
txtOthers.Select(txtOthers.TextLength, 0);
}
BY Sharafu Hameed

Find the longest sequence of digits in a string

I am trying to clear up the results for poor quality OCR reads, attempting to remove everything I can safely assume is a mistake.
The desired result is a 6 digit numerical string, so I can rule out any character that isn't a digit from the results. I also know these numbers appear sequentially, so any numbers out of sequence are also very likely to be incorrect.
(Yes, fixing the quality would be best but no... they won't/can't change their documents)
I immediately Trim() to remove white space, also as these are going to end up as file names I also remove all illegal characters.
I've found out which characters are digits and added them to a dictionary keyed by the array position at which they were found.
This leaves me with a clear visual indication of the number sequences, but I am struggling with the logic of how to get my program to recognise this.
Tested with the string "Oct', 2$3622" (an actual bad read)
The ideal output for this would be "3622"
public String FindLongest(string OcrText)
{
try
{
Char[] text = OcrText.ToCharArray();
List<char> numbers = new List<char>();
Dictionary<int, char> consec = new Dictionary<int, char>();
for (int a = 0; a < text.Length; a++)
{
if (Char.IsDigit(text[a]))
{
consec.Add(a, text[a]);
// Won't allow duplicates?
//consec.Add(text[a].ToString(), true);
}
}
foreach (var item in consec.Keys)
{
#region Idea that didn't work
// Combine values with consecutive keys into new list
// With most consecutive?
for (int i = 0; i < consec.Count; i++)
{
// if index key doesn't match loop, value was not consecutive
// Ah... falsely assuming it will start at 1. Won't work.
if (item == i)
numbers.Add(consec[item]);
else
numbers.Add(Convert.ToChar("#")); //string split value
}
#endregion
}
return null;
}
catch (Exception ex)
{
string message;
if (ex.InnerException != null)
message =
"Exception: " + ex.Message +
"\r\n" +
"Inner: " + ex.InnerException.Message;
else
message = "Exception: " + ex.Message;
MessageBox.Show(message);
return null;
}
}
A quick and dirty way to get the longest sequence of digits would be by using a Regex like this:
var t = "sfas234sdfsdf55323sdfasdf23";
var longest = Regex.Matches(t, @"\d+").Cast<Match>().OrderByDescending(m => m.Length).First();
Console.WriteLine(longest);
This will actually get all the sequences and obviously you can use LINQ to select the longest of these.
This doesn't handle multiple sequences of the same length.
So you just need to find the longest number sequence? Why not use a regex?
Regex reg = new Regex(@"\d+");
MatchCollection mc = reg.Matches(input);
foreach (Match mt in mc)
{
// mt.Groups[0].Value.Length is the len of the sequence
// just find the longest
}
Just a thought.
Since you strictly want numeric matches, I would suggest using a regex that matches (\d+).
MatchCollection matches = Regex.Matches(input, @"(\d+)");
string longest = string.Empty;
foreach (Match match in matches) {
if (match.Success) {
if (match.Value.Length > longest.Length) longest = match.Value;
}
}
This will give you the number of the longest length. If you wanted to actually compare values (which would also work with the "longest length", but could solve an issue with same-length matches):
MatchCollection matches = Regex.Matches(input, @"(\d+)");
int biggest = 0;
foreach (Match match in matches) {
if (match.Success) {
int current = 0;
int.TryParse(match.Value, out current);
if (current > biggest) biggest = current;
}
}
var split = Regex.Split(OcrText, @"\D+").ToList();
var longest = (from s in split
orderby s.Length descending
select s).FirstOrDefault();
I would recommend using a Regex.Split using \D (@"\D+" in code) which finds all characters that are not digits. I would then perform a Linq query to find the longest string by .Length.
As you can see, it's both simple and very readable.
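None of the snippets above say what to do when two digit runs tie for longest. One possible sketch, assuming you want all of the tied runs back in their original order, is below. DigitRuns and LongestRuns are invented names, and [0-9] is used instead of \d because \d in .NET also matches non-ASCII digits by default.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitRuns
{
    // Returns every digit run that has the maximum length, in order of appearance.
    public static List<string> LongestRuns(string text)
    {
        List<string> runs = Regex.Matches(text, @"[0-9]+")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
        if (runs.Count == 0)
        {
            return runs; // no digits at all
        }
        int max = runs.Max(r => r.Length);
        return runs.Where(r => r.Length == max).ToList();
    }
}
```

With the bad read from the question, "Oct', 2$3622", this should return a single entry, "3622". When several entries come back, the caller still has to decide which one to trust, for example by comparing against the previous number in the sequence.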

How to find the number of occurrences of a letter in only the first sentence of a string?

I want to find the number of occurrences of the letter "a" in only the first sentence. The code below counts "a" in all sentences, but I want it for only the first sentence.
static void Main(string[] args)
{
string text; int k = 0;
text = "bla bla bla. something second. maybe last sentence.";
foreach (char a in text)
{
char b = 'a';
if (b == a)
{
k += 1;
}
}
Console.WriteLine("number of a in first sentence is " + k);
Console.ReadKey();
}
This splits the string into an array separated by '.', '!' or '?', then counts the number of 'a' chars in the first element of the array (the first sentence).
var count = text.Split(new[] { '.', '!', '?' })[0].Count(c => c == 'a');
This example assumes a sentence is separated by a ., ? or !. If you have a decimal number in your string (e.g. 123.456), that will count as a sentence break. Breaking up a string into accurate sentences is a fairly complex exercise.
This is perhaps more verbose than what you were looking for, but hopefully it'll breed understanding as you read through it.
public static void Main()
{
//Make an array of the possible sentence enders. Doing this pattern lets us easily update
// the code later if it becomes necessary, or allows us easily to move this to an input
// parameter
string[] SentenceEnders = new string[] {"$", @"\.", @"\?", @"\!" /* Add Any Others */};
string WhatToFind = "a"; //What are we looking for? Regular Expressions Will Work Too!!!
string SentenceToCheck = "This, but not to exclude any others, is a sample."; //First example
string MultipleSentencesToCheck = @"
Is this a sentence
that breaks up
among multiple lines?
Yes!
It also has
more than one
sentence.
"; //Second Example
//This will split the input on all the enders put together(by way of joining them in [] inside a regular
// expression.
string[] SplitSentences = Regex.Split(SentenceToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase);
//SplitSentences is an array, with sentences on each index. The first index is the first sentence
string FirstSentence = SplitSentences[0];
//Now, split that single sentence on our matching pattern for what we should be counting
string[] SubSplitSentence = Regex.Split(FirstSentence, WhatToFind, RegexOptions.IgnoreCase);
//Now that it's split, it's split a number of times that matches how many matches we found, plus one
// (The "Left over" is the +1
int HowMany = SubSplitSentence.Length - 1;
System.Console.WriteLine(string.Format("We found, in the first sentence, {0} '{1}'.", HowMany, WhatToFind));
//Do all this again for the second example. Note that ideally, this would be in a separate function
// and you wouldn't be writing code twice, but I wanted you to see it without all the comments so you can
// compare and contrast
SplitSentences = Regex.Split(MultipleSentencesToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase | RegexOptions.Singleline);
SubSplitSentence = Regex.Split(SplitSentences[0], WhatToFind, RegexOptions.IgnoreCase | RegexOptions.Singleline);
HowMany = SubSplitSentence.Length - 1;
System.Console.WriteLine(string.Format("We found, in the second sentence, {0} '{1}'.", HowMany, WhatToFind));
}
Here is the output:
We found, in the first sentence, 3 'a'.
We found, in the second sentence, 4 'a'.
You didn't define "sentence", but if we assume it's always terminated by a period (.), just add this inside the loop:
if (a == '.') {
break;
}
Expand from this to support other sentence delimiters.
Simply "break" the foreach(...) loop when you encounter a "." (period)
Well, assuming you define a sentence as being ended with a '.':
Use String.IndexOf() to find the position of the first '.'. After that, search in a Substring instead of the entire string.
Find the position of the '.' in the text (you can use Split).
Count the 'a' characters in the text from position 0 up to the first '.'.
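The IndexOf suggestions above can be sketched roughly like this. It assumes a sentence ends at the first '.', '!' or '?' (which, as noted earlier, will misfire on decimals such as 123.456). FirstSentence and CountInFirstSentence are names made up for the example.

```csharp
using System;
using System.Linq;

public static class FirstSentence
{
    // Counts occurrences of 'letter' before the first sentence-ending character.
    public static int CountInFirstSentence(string text, char letter)
    {
        int end = text.IndexOfAny(new[] { '.', '!', '?' });
        // No terminator found: treat the whole text as a single sentence.
        string first = end < 0 ? text : text.Substring(0, end);
        return first.Count(c => c == letter);
    }

    public static void Main()
    {
        string text = "bla bla bla. something second. maybe last sentence.";
        Console.WriteLine(CountInFirstSentence(text, 'a')); // 3
    }
}
```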
string SentenceToCheck = "Hi, I can wonder this situation where I can do best";
//Here I am giving several way to find this
//Using Regular Experession
int HowMany = Regex.Split(SentenceToCheck, "a", RegexOptions.IgnoreCase).Length - 1;
int i = Regex.Matches(SentenceToCheck, "a").Count;
// Simple way
int Count = SentenceToCheck.Length - SentenceToCheck.Replace("a", "").Length;
//Linq
var _lamdaCount = SentenceToCheck.ToCharArray()
.Count(t => t.ToString().ToUpper().Equals("A"));
var _linqAIEnumareable = from _char in SentenceToCheck.ToCharArray()
where !String.IsNullOrEmpty(_char.ToString())
&& _char.ToString().ToUpper().Equals("A")
select _char;
int a = _linqAIEnumareable.Count();
var _linqCount = from g in SentenceToCheck.ToCharArray()
where g.ToString().Equals("a")
select g;
int b = _linqCount.Count();

How to find the first x occurrences of a Char in a String using Regex

I'm trying to find out how I can get the first x matches of a char in a string. I tried using a MatchCollection, but I can't find any escape sequence to stop after the x-th match.
FYI:
I need this for strings of variable length with a varying number of occurrences of the searched char, so just getting all matches and using only the first x isn't a solution.
Thanks in advance
Edit:
I am using a StreamReader to get information out of .txt files and write it to a string, one string per file. These strings have very different lengths. Every string should contain, let's say, 3 keywords. But sometimes something went wrong and I have only one or two of the keywords. Between the keywords are other fields separated by a ;. So if I use a MatchCollection to get the indexes of the ;'s and one keyword is missing, the information in the file is shifted. Because of that I need to find the first x occurrences before/after an (existing) keyword.
Do you really want to use Regex, something like this won't do ?
string simpletext = "Hello World";
int firstoccur = simpletext.IndexOfAny(new char[]{'o'});
Since you want all the indexes for that character you can try in this fashion
string simpletext = "Hello World";
int[] occurences = Enumerable.Range(0, simpletext.Length).Where(x => simpletext[x] == 'o').ToArray();
You can use the Match class. It returns only one result at a time, but you can iterate over the string until the last match is found.
Something like this:
Match match = Regex.Match(input, pattern);
int count = 0;
while (match.Success)
{
count++;
// do something with match
match = match.NextMatch();
// Exit the loop when your match number is reached
}
If you're determined to use Regex then I'd do this with Matches as opposed to Match actually; largely because you get the count up front.
string pattern = "a";
string source = "this is a test of a regex match";
int maxMatches = 2;
MatchCollection mc = Regex.Matches(source, pattern);
if (mc.Count > 0)
{
for (int i = 0; i < Math.Min(maxMatches, mc.Count); i++)
{
//do something with mc[i].Index, mc[i].Length
}
}
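A possible variation on the Matches approach is to let LINQ's Take cut the enumeration short. As far as I know, MatchCollection is filled lazily while it is being enumerated (reading Count forces the full scan), so this should stop after the first x matches, but treat that as an assumption and measure it if it matters. FirstMatches and FirstIndices are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FirstMatches
{
    // Returns the start indices of at most 'max' matches of 'pattern' in 'source'.
    public static List<int> FirstIndices(string source, string pattern, int max)
    {
        return Regex.Matches(source, pattern)
            .Cast<Match>()      // needed on older frameworks where MatchCollection is non-generic
            .Take(max)          // yields fewer items if there are fewer matches
            .Select(m => m.Index)
            .ToList();
    }
}
```

For example, FirstIndices("this is a test of a regex match", "a", 2) should give the indices 8 and 18, and asking for more matches than exist just returns the ones that are there instead of throwing.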
The split operation is pretty fast so if the regex is not a requirement this could be used:
public static IEnumerable<int> IndicesOf(this string text, char value, int count)
{
var tokens = text.Split(value);
var sum = tokens[0].Length;
var currentCount = 0;
for (int i = 1; i < tokens.Length &&
sum < text.Length &&
currentCount < count; i++)
{
yield return sum;
sum += 1 + tokens[i].Length;
currentCount++;
}
}
This executes in roughly 60% of the time of the regex.
