Match text after colon - c#

I want to match the word that follows "type :".
What I have so far is this pattern:
(?<=type\s:\s)(\w*)
Text:
"type : text,"
It works exactly as I want when there is just one whitespace character before and after the colon:
"type_SPACE_:_SPACE_text"
But if I have two spaces or none, it doesn't work.
I already tried this, but it doesn't match:
(?<=type\s*:\s*)(\w*)
I also tried this, which is my best approach so far, but the matched text contains the colon:
(?<=type)(\s*):(\s*)(.*)(?=,)
I am testing with gskinner's tester:
http://gskinner.com/RegExr/

If you're doing this in C# and using the included Regex engine, your original regex should work, with a slight modification:
string myString = "type : something";
var match = Regex.Match(myString, @"(?<=type\s*:\s*)\w+");
Console.Write(match);
Edit: The reason why the (?<=type\s*:\s*)\w* version wasn't working for you with multiple spaces is that \w* is allowed to match zero characters. The look-behind is already satisfied immediately after the colon and after each of the spaces that follow it, so the regex was happily returning empty strings at those positions before it ever reached the word.
You can view the various matched strings by using Regex.Matches; you'll see that your matched word is in there, but it's not the first result.
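As a rough illustration of the point above, the following sketch lists every match a pattern produces, so the \w* and \w+ variants can be compared side by side. The class and method names (TypeValueDemo, AllMatches, ExtractType) are made up for this example and are not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class TypeValueDemo
{
    // Returns the text of every match, including empty matches.
    public static string[] AllMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    // Returns the word after "type", optional whitespace, ':' and optional
    // whitespace. \w+ requires at least one word character, so the empty
    // matches right after the colon are skipped. Returns "" if nothing matches.
    public static string ExtractType(string input) =>
        Regex.Match(input, @"(?<=type\s*:\s*)\w+").Value;
}
```

For the input "type  :  text," (two spaces on each side of the colon), AllMatches with the pattern (?<=type\s*:\s*)\w* returns two empty strings followed by "text", whereas ExtractType returns "text" directly.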

Related

Regex or Substring to Match Filename With Extension

I have a current situation where I can be given a filename with path that looks like:
C:\\Users\\testUser\\Documents\\testFile.txt.9043632d298f44ad88509c677a8249f8
or
C:\\Users\\testUser\\Documents\\testFile.txt.9043632d298f44ad88509c677a8249f8.enc
I need to be able to extract everything up until the end of the extension. It can be any file extension, and it will always be followed by a GUID string preceded by a dot.
So an example output would be:
C:\\Users\\testUser\\Documents\\testFile.txt
C:\\Users\\testUser\\Documents\\testFile.pdf
C:\\Users\\testUser\\Documents\\testFile.jpeg
I have tried Substring but cannot work out the right arguments (though I assume it is a simple task), so I never get the proper result.
An example I tried was along the lines of:
filename.Substring(0, filename.IndexOf('.', /* what goes here?? */));
but keep getting stuck.
Any help would be lovely!
You might use:
new Regex(@".*(?=\.[a-f\d]{32})", RegexOptions.IgnoreCase).Match(yourString)
Explanation:
.* match zero or more of any char
(?= ) look ahead, check if the following chars match, but don't include in match
\. match a dot
[a-f\d]{32} match any character a-f or digit exactly 32 times
RegexOptions.IgnoreCase ignores the case
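The pattern explained above could be wrapped in a small helper along these lines. This is only a sketch: the names GuidSuffixStripper and StripGuid are invented for the example, and returning the input unchanged when no GUID suffix is found is an assumption, not something the question specifies.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class GuidSuffixStripper
{
    // Everything before a dot that is followed by 32 hex characters.
    private static readonly Regex BeforeGuid =
        new Regex(@".*(?=\.[a-f\d]{32})", RegexOptions.IgnoreCase);

    // Returns the path up to the end of the real extension.
    // Assumption: if no ".<32 hex chars>" suffix exists, the input is returned as-is.
    public static string StripGuid(string path)
    {
        Match m = BeforeGuid.Match(path);
        return m.Success ? m.Value : path;
    }
}
```

Called with @"C:\Users\testUser\Documents\testFile.txt.9043632d298f44ad88509c677a8249f8.enc", StripGuid returns @"C:\Users\testUser\Documents\testFile.txt".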

Regex - find matches not contained within pattern

I would like to use a regular expression to match all occurrences of a phrase where it's not contained within some delimiting characters. I tried putting one together but had some difficulty with the negative lookaheads.
My search phrase is "my phrase". The start delimiter tag is [[ and the end delimiter tag is ]]. The string I'd like to search is:
Here is a sentence with my phrase, here's another part which I don't want to match on [[my phrase]]. I would like to find this occurrence of my phrase.
From this string I would expect to find all occurrences of "my phrase" except the one contained within [[ ]].
I hope that makes sense, thanks in advance for any guidance.
[^#]my phrase[^#]
I have knocked up a RegEx that will do what you ask; it can be seen here.
It simply excludes # on either side of the phrase and allows any other character there. You can return the index of these results, but remember to strip off the first and last character of each match, because the neighbouring characters are part of it.
Note: This will not pick up any "my phrase" that ends the string without a character following it.
Edit - Seeing as you changed the scope while I was writing this answer,
here is the RegEx for the other delimiter:
[^[[]my phrase[^\]\]]
(?<=[^\[])my phrase(?=[^\]]*)
This will also eliminate the trailing punctuation marks.
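Since the question mentions negative lookarounds, here is a minimal sketch of that alternative. It is not the pattern from the answer above: it uses (?<!\[\[) and (?!\]\]), and it only rejects a phrase that is wrapped directly in [[ and ]]; a phrase sitting somewhere in the middle of a longer [[ ... ]] block would still be matched. PhraseFinder and FindOutsideBrackets are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class PhraseFinder
{
    // Returns the start index of every occurrence of the phrase that is
    // neither immediately preceded by "[[" nor immediately followed by "]]".
    public static int[] FindOutsideBrackets(string text, string phrase)
    {
        string pattern = @"(?<!\[\[)" + Regex.Escape(phrase) + @"(?!\]\])";
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Index)
                    .ToArray();
    }
}
```

For the sample sentence from the question this returns two indexes, one for the first and one for the last occurrence of "my phrase". Because lookarounds consume no text, the matches contain only the phrase itself and need no trimming.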

Find a pattern using Regex

I have a bunch of string lines with different formats. I want to find a pattern using regex in order to match specific lines. I have tried to figure it out myself up to some degree, using this: \b([A-Z0-9]{2,})\b. However, I wasn't able to find the right pattern that will match only lines 3, 6 and 8. Thank you.
// DONE:
return Test;
TESTER
MessageBoxButtons.OK,
.GetConnectionString();
TOURNAMENT TRACKER
// Create
TEST 4 ME
My guess is that your solution also matched the first and fourth line. If you want to exclude the lines with characters other than those specified, you could look at the whole line instead of checking single words:
^[0-9A-Z]+(\s[0-9A-Z]+)*$
It will match lines consisting of whitespace-separated words that are made up only of digits and uppercase letters.
If you check the whole line you can use this
^[A-Z0-9 ]+$
Assuming the match is case-sensitive, this will match only lines that consist of uppercase letters, digits and spaces from the start to the end of the line.
See demo here
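A small sketch of how the whole-line check could be applied to the sample lines from the question. It tests each line separately rather than relying on RegexOptions.Multiline, which sidesteps line-ending differences. UpperCaseLineFilter and Filter are names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class UpperCaseLineFilter
{
    // Whole line: words of digits/uppercase letters separated by single whitespace characters.
    private static readonly Regex WholeLine =
        new Regex(@"^[0-9A-Z]+(\s[0-9A-Z]+)*$");

    // Keeps only the lines that match the pattern from start to end.
    public static string[] Filter(IEnumerable<string> lines) =>
        lines.Where(line => WholeLine.IsMatch(line)).ToArray();
}
```

Given the eight lines from the question, Filter keeps "TESTER", "TOURNAMENT TRACKER" and "TEST 4 ME", which are lines 3, 6 and 8.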

Match text not surrounded by & and ;

I am currently using the following regular expression:
(?<!&)[^&;]*(?!;)
To match text like this:
match1&lt;match2&gt;
And extract:
match1
match2
However, this seems to match an extra five empty strings. See Regex Storm.
How can I only match the two listed above?
Note the existing pattern ((?<=^|;)[^&]+) by @xanatos will only match matches 1 to 3 in the following string and not match4:
match1&lte;match2&lt;match;3+match&4
Try changing the * to a +:
(?<!&)[^&;]+(?!;)
Test here
More correct regex:
(?<=^|;)[^&]+
Test here
The basic idea here is that a "good" substring starts at the beginning of the string (^) or right after the ;, and ends when you encounter a & ([^&]+).
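The idea just described can be tried out with a sketch like the one below. EntityTextExtractor and Extract are hypothetical names, and the sample inputs assume the text contains entity-like sequences such as &lt; and &gt;.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class EntityTextExtractor
{
    // A "good" substring starts at the beginning of the string or right
    // after a ';' and runs until the next '&' (or the end of the string).
    public static string[] Extract(string input) =>
        Regex.Matches(input, @"(?<=^|;)[^&]+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

For "match1&lt;match2&gt;" this yields "match1" and "match2" with no empty matches, because [^&]+ needs at least one character.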
Third version (although at this point we are showing how, if you have a problem and decide to use regexes, you now have two problems):
(?<=^|;)([^&]|&(?=[^&;]*(?:&|$)))+
Test here
I have managed it with:
(?<Text>.+?)(?:&[^&;]*?;|$)
This seems to match all of the corner cases but it might not work with a case I can't think of at the moment.
This won't work if the string starts with a &...; pattern or is only that.
See Regex Storm.

I want only matching string using regex

I have a string "myname 18-may 1234" and I want only "myname" from whole string using a regex.
I tried using the \b(^[a-zA-Z]*)\b regex and that gave me "myname" as a result.
But when the string changes to "1234 myname 18-may" the regex does not return "myname". Please suggest the correct way to select only "myname" whole word.
Is it also possible, given the string in "1234 myname 18-may" format, to get myname only, not may?
UPDATE
Judging by your feedback to your other question you might need
(?<!\p{L})\p{L}+(?!\p{L})
ORIGINAL ANSWER
I have come up with a lighter regex that relies on the specific nature of your data (just a couple of words in the string, only one is whole word):
\b(?<!-)\p{L}+\b
See demo
Or even a more restrictive regex that finds a match only between (white)spaces and string start/end:
(?<=^|\s)\p{L}+(?=\s|$)
The following regex is context-dependent:
\p{L}+(?=\s+\d{1,2}-\p{L}{3}\b)
See demo
This will match only the word myname.
The regex means:
\p{L}+ - Match 1 or more Unicode letters...
(?=\s+\d{1,2}-\p{L}{3}\b) - until it finds 1 or more whitespaces (\s+) followed with 1 or 2 digits, followed with a hyphen and 3 Unicode letters (\p{L}{3}) which is a whole word (\b). This construction is a positive look-ahead that only checks if something can be found after the current position in the string, but it does not "consume" text.
Since the date may come before the string, you can add an alternation:
\p{L}+(?=[ ]+\d{1,2}-\p{L}{3}\b)|(?<=\d{1,2}-\p{L}{3}[ ]+)\p{L}+
See another demo
The (?<=\d{1,2}-\p{L}{3}[ ]+) is a look-behind that checks for the same thing (almost) as the look-ahead, but before the myname.
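To see the alternation at work, here is a minimal sketch that applies it to a few orderings of the sample data. NameFinder and FindName are invented names, and returning null when nothing matches is an assumption made for the example. Note that the variable-length look-behind relies on the .NET regex engine.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class NameFinder
{
    // First alternative: letters followed by spaces and a date such as "18-may".
    // Second alternative: letters preceded by such a date and spaces.
    private static readonly Regex NamePattern = new Regex(
        @"\p{L}+(?=[ ]+\d{1,2}-\p{L}{3}\b)|(?<=\d{1,2}-\p{L}{3}[ ]+)\p{L}+");

    // Returns the matched word, or null if no word sits next to a date (assumption).
    public static string FindName(string input)
    {
        Match m = NamePattern.Match(input);
        return m.Success ? m.Value : null;
    }
}
```

FindName returns "myname" for "myname 18-may 1234", for "1234 myname 18-may" and for "18-may myname 1234"; the month name is never returned because it is not separated from a date by spaces.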
Here is a solution without RegEx (it needs using System.Linq;):
string input = "myname 18-may 1234";
string result = input.Split(' ').Where(x => x.All(y => char.IsLetter(y))).FirstOrDefault();
Do a replace using this regex:
(\s*\d+\-.{3}\s*|\s*.{3}\-\d+\s*)|(\s*\d+\s*)
You will end up with just your name.
Demo
