Regex, replace group between other groups? - c#

I have such a regex:
string ipPort = @"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}[\s\S]*?[0-9]{1,5}";
Regex Rx = new Regex(ipPort,RegexOptions.Singleline);
List<string> catched = new List<string>();
foreach (Match ItemMatch in Rx.Matches(page))
{
catched.Add(ItemMatch.ToString());
}
It finds an IP address, followed by any number of characters, followed by a port number. I want this "any number of characters" replaced by a single colon ":". How can I do that? I'm not very experienced with regular expressions.

You can use this general expression that uses lookarounds in order to find a pattern between a prefix and a suffix:
(?<=prefix)find(?=suffix)
Applied to your specific problem:
(?<=[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})[^0-9].*?(?=[0-9]{1,5})
Note that I have added a [^0-9] meaning "not a digit". There must be at least one non-digit character there, because otherwise the search cannot distinguish between digits belonging to the last ip-group and the port number.
You can also repeat the number-dot group three times and then append the fourth number
(?<=([0-9]{1,3}\.){3}[0-9]{1,3})[^0-9].*?(?=[0-9]{1,5})
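You can also replace [\s\S] (whitespace or non-whitespace character) with . (any character). The two are equivalent here because RegexOptions.Singleline makes . match line breaks as well.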
Applied to our general expression, now we have:
prefix (ip): ([0-9]{1,3}\.){3}[0-9]{1,3}
find (stuff to be replaced by colon): [^0-9].*?
suffix (port): [0-9]{1,5}
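A minimal sketch of the replacement itself, assuming the goal is to call Regex.Replace with the lookaround pattern above. The helper name NormalizeIpPort and the sample input are made up for illustration, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces whatever separates an IP address from the
// port number with a single colon.
static string NormalizeIpPort(string text)
{
    const string pattern =
        @"(?<=([0-9]{1,3}\.){3}[0-9]{1,3})" + // prefix: the IP address (lookbehind)
        @"[^0-9].*?" +                        // find: the separator, starting with a non-digit
        @"(?=[0-9]{1,5})";                    // suffix: the port (lookahead)

    // Singleline lets "." match line breaks, as in the question's setup.
    return Regex.Replace(text, pattern, ":", RegexOptions.Singleline);
}

Console.WriteLine(NormalizeIpPort("192.168.1.100 port 8080")); // prints: 192.168.1.100:8080
```

Only the separator is matched, because the IP and the port sit inside lookarounds, so the replacement string can be the bare colon.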

Regex pattern for search first letters of the first and last name

I have a problem with regex pattern. Every day I get names and surnames. Example:
Darkholme Van Tadashi
Herrington Billy Aniki
Johny
Walker Sam Cooler
etc..
The fact is that they are specific and do not consist of just one last name and first name.
From this list, I need to select one person (whose last name and first name I know). To do this, I found this pattern:
"Darkholme|\b[vt]"
As I said, I know the person's data in advance (before the list arrives), but I only know his last name. The second and third names (Van Tadashi) are unknown to me; I only know the first letters of these names ("V" and "T"). I ran into this problem: when the regex analyzes incoming data (I use Regex.IsMatch), it returns true if the input string is "Van Dungeonmaster". How do I create a pattern that only returns true if the surname is Darkholme and the first letters of the second and third names match (V and T)?
Perhaps I'm not making myself clear. In the end, I want to pass only the last name and the first letters of the first name and patronymic to the pattern, and have the regex match the input string.
If there is a comma present and the names can start with either V or T where the third name can be optional, you could use an optional group matching any non whitespace char except a comma.
\bDarkholme\s+[VT][^\s,]+(?:\s+[VT][^\s,]+)?
\b Word boundary, to prevent Darkholme being part of a larger word
Darkholme Match literally
\s+[VT] Match 1+ whitespace chars followed by either V or T
[^\s,]+ Match 1+ times any char except a whitespace char or comma
(?: Non capture group
\s+[VT] Match 1+ whitespace chars followed by either V or T
[^\s,]+ Match 1+ times any char except a whitespace char or comma
)? Close the group to make the 3rd part optional
If you know that the name starts with V for the second and T for the third:
\bDarkholme\s+V[^\s,]+(?:\s+T[^\s,]+)?
If the name can also be a single V or T, the quantifier could be an asterisk: [^\s,]*
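Since only the surname and the two initials are known in advance, the pattern can be assembled from those inputs. The following is a rough sketch, not a definitive implementation: BuildNameRegex is a hypothetical helper, and it assumes the second name must start with the first initial and the optional third name with the second initial.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds \bSurname\s+X[^\s,]+(?:\s+Y[^\s,]+)? from its parts.
static Regex BuildNameRegex(string surname, char second, char third)
{
    string pattern =
        @"\b" + Regex.Escape(surname) +                               // surname, not inside a larger word
        @"\s+" + Regex.Escape(second.ToString()) + @"[^\s,]+" +       // second name by its initial
        @"(?:\s+" + Regex.Escape(third.ToString()) + @"[^\s,]+)?";    // optional third name
    return new Regex(pattern);
}

Regex rx = BuildNameRegex("Darkholme", 'V', 'T');
foreach (string name in new[] { "Darkholme Van Tadashi", "Van Dungeonmaster", "Johny" })
{
    // Only the first entry prints True.
    Console.WriteLine($"{name}: {rx.IsMatch(name)}");
}
```

Regex.Escape keeps surnames that contain regex metacharacters from changing the meaning of the pattern. The match is case sensitive unless RegexOptions.IgnoreCase is passed.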
Your pattern as is means "match any string that contains Darkholme, or any string in which some word starts with v or t", which isn't quite what you want.
Perhaps
Darkholme\s+V\S*\s+T
would suit you better. It means "Darkholme, followed by at least one whitespace character, then V, followed by any number of non-whitespace characters, then at least one whitespace character, followed by T".

Split String At Every Non-Letter/Non-Number Character

Imagine a string that contains special characters like $§%%,., numbers and letters.
I want to receive the letter and number chunks of an arbitrary string as an array of strings.
A good solution seems to be the use of regex, but I don't know how to express [numbers and letters]
// example
"abc" = {"abc"};
"ab .c" = {"ab", "c"}
"ab123,cd2, ,,%&$§56" = {"ab123", "cd2", "56"}
// try
string input = "jdahs32455$§&%$§df233§$fd";
string[] output = input.Split(Regex("makejunksfromstring"));
To extract chunks of 1 or more letters/digits you may use
[A-Za-z0-9]+ # ASCII only letters/digits
[\p{L}0-9]+ # Any Unicode letters and ASCII only digits
[\p{L}\p{N}]+ # Any Unicode letters/digits
C# usage:
string[] output = Regex.Matches(input, @"[\p{L}\p{N}]+").Cast<Match>().Select(x => x.Value).ToArray();
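For completeness, here is the same call as a small self-contained program, run against one of the examples from the question. ExtractChunks is just an illustrative name; note that Cast and Select need System.Linq.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every run of Unicode letters/digits in the input.
static string[] ExtractChunks(string input) =>
    Regex.Matches(input, @"[\p{L}\p{N}]+")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

Console.WriteLine(string.Join("|", ExtractChunks("ab123,cd2, ,,%&$§56"))); // prints: ab123|cd2|56
```

Collecting matches avoids the empty entries that splitting on the separators can produce when the input starts or ends with a separator.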
Yes, regex is indeed a good solution for this.
And in fact, to just match all standard words in the input sequence, this is all you need:
(\w+)
Let me quickly explain
\w matches any word character. In .NET that means Unicode letters, decimal digits, and connector punctuation such as the underscore; it is equivalent to [a-zA-Z0-9_] only when RegexOptions.ECMAScript is set. You might want to go with [a-zA-Z0-9] instead to avoid that underscore.
Wrapping an expression in () means that you want to capture that part as a group.
The + means that you want sequences of 1 or more of the preceding characters.
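To see what the underscore remark means in practice, here is a small sketch comparing the two classes. The helper name Words and the sample string are made up.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns all matches of the given pattern as strings.
static string[] Words(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

const string sample = "ab_c 12";

// \w treats the underscore as a word character, so "ab_c" stays in one piece.
Console.WriteLine(string.Join("|", Words(sample, @"\w+")));          // prints: ab_c|12

// The explicit ASCII class splits at the underscore.
Console.WriteLine(string.Join("|", Words(sample, @"[a-zA-Z0-9]+"))); // prints: ab|c|12
```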
Refer to a regular expression cheat sheet to see all the possibilities, such as
https://cheatography.com/davechild/cheat-sheets/regular-expressions/
Or any that you find online.
Also there are tools available to quickly test out your regular expressions, such as
https://regex101.com/ (quite well visualised matching)
or http://regexstorm.net/tester specifically for .NET

.NET Regular Expression white space special characters

This pattern is not working sometimes (it works only for the 3rd instance). The pattern is ^\s*flood\s{55}\s+\w+
I am new to regular expressions and I am trying to write one that captures all of the following cases:
Example 1: flood a)
Example 2: flood As respects
Example 3: flood USD100,000
(it's in a tabular format and there's a lot of space between flood and the next word)
Your expression is saying:
^\s* The start of the string may have zero or more whitespace characters
flood followed by the string flood
\s{55} followed by exactly 55 whitespace characters
\s+\w+ followed by one or more whitespace characters and then one or more word characters.
If you want a minimum number of whitespace characters, say at least 30, followed by one or more word characters, then you could do this:
^\s*flood\s{30,}\w+
Try this:
string input =
@" flood a)
flood As respects
flood USD100,000";
string pattern = @"^\s*flood\s+.+$";
MatchCollection matches = Regex.Matches(input, pattern, RegexOptions.Multiline);
If there are a lot of spaces between flood and the next word you could omit \s{55} which is a quantifier that matches a whitespace character 55 times.
That would leave you with ^\s*flood\s+\w+ which does not yet match all the values at the end because \w matches a word character but not a whitespace or any of ),.
To match your values you might use a character class and add the characters that you allow to match:
^\s*flood\s+[\w,) ]+
Or if you want to match any character you could use a dot instead of a character class.
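As a quick check of the character-class version, the sketch below runs it over three rows shaped like the examples in the question. The exact column spacing is an assumption, and FindFloodRows is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: finds every "flood ..." row. Multiline makes ^ match at
// the start of each line rather than only at the start of the input.
static MatchCollection FindFloodRows(string text) =>
    Regex.Matches(text, @"^\s*flood\s+[\w,) ]+", RegexOptions.Multiline);

// Sample rows; the amount of padding between the columns is made up.
string input = string.Join("\n",
    "flood                    a)",
    "flood          As respects",
    "flood               USD100,000");

foreach (Match m in FindFloodRows(input))
{
    Console.WriteLine(m.Value);
}
```

All three rows are matched, because the character class admits the closing parenthesis, the comma, and the inner space that \w alone rejects.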
According to your comment, you might use a positive lookbehind:
(?<=\(13\. Deductible\))\s*(\s*flood\s+[\w,) ]+)+

C# Regular Expression: Search the first 3 letters of each name

Does anyone know how I can write a regex (C#) that searches for the first 3 letters of each part of a full name?
Without the use of (.*)
I used (.**) but it runs far past the requested name; or,
if it finds the first condition and then finds the second condition 100 words later, it returns text that is not what I am looking for, so I have to limit the number of words.
Example: \s*(?:\s+\S+){0,2}\s*
I would like to ignore name parts with fewer than 3 characters if they occur in the name.
Search for any name whose parts start with these 3-character prefixes:
'Mar Jac Rey' (regex that performs search)
Should match:
Marck Jacobs L. S. Reynolds
Marcus Jacobine Reys
Maroon Jacqueline by Reyils
Can anyone help me?
The zero or more quantifier (*) is 'greedy' by default; that is, it will consume as many characters as possible while still allowing the remainder of the pattern to match. This is why Mar.*Jac will match from the first Mar in the input to the last Jac, including everything in between.
One potential solution is just to make your pattern 'non-greedy' (*?). This will make it consume as few characters as possible in order to match the remainder of the pattern.
Mar.*?Jac.*?Rey
However, this is not a great solution because it would still match the various name parts regardless of what other text appears in between—e.g. Marcus Jacobine Should Not Match Reys would be a valid match.
To allow only whitespace or at most 2 consecutive non-whitespace characters to appear between each name part, you'd have to get more fancy:
\bMar\w*(\s+\S{0,2})*\s+Jac\w*(\s+\S{0,2})*\s+Rey\w*
The pattern (\s+\S{0,2})*\s+ will match any number of short tokens of at most two non-whitespace characters, each preceded by whitespace, followed by the whitespace before the next name part. The \w* after each name part ensures that the entire name is included in that part of the match (you might want to use \S* instead here, but that's not entirely clear from your question). And I threw in a word boundary (\b) at the beginning to ensure that the match does not start in the middle of a 'word' (e.g. OMar would not match).
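A short sketch that runs this pattern against the three names from the question plus the counter-example mentioned above. MatchesName is an illustrative helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true if the text contains the three name parts with only
// whitespace or tokens of at most two characters between them.
static bool MatchesName(string text) =>
    Regex.IsMatch(text, @"\bMar\w*(\s+\S{0,2})*\s+Jac\w*(\s+\S{0,2})*\s+Rey\w*");

string[] samples =
{
    "Marck Jacobs L. S. Reynolds",
    "Marcus Jacobine Reys",
    "Maroon Jacqueline by Reyils",
    "Marcus Jacobine Should Not Match Reys",
};

foreach (string s in samples)
{
    // The first three print True, the last one prints False.
    Console.WriteLine($"{MatchesName(s)}: {s}");
}
```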
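I think what you want is this regular expression, which checks whether the input starts with one of the three prefixes (pass RegexOptions.IgnoreCase to make it case insensitive):
@"^(Mar|Jac|Rey)"
Note that square brackets would not work here: [Mar|Jac|Rey]{3} is a character class and matches any three of the listed characters, such as "yaM".
Less specific:
@"^[\w]{3}"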
If you want to capture the first three letters of every word longer than three characters, you could use something like:
((?<name>[\w]{3})\w+)+
And enable ExplicitCapture when initializing your Regex.
Each resulting Match contains a group named "name" holding one result.
Code sample:
Regex regex = new Regex(@"((?<name>[\w]{3})\w+)+", RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);
var matches = regex.Matches("Marck Jacobs L. S. Reynolds");
If you also want to capture words of exactly three characters, you can replace the trailing \w+ with \w*.
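To read the results back, each Match exposes the named group through its Groups collection. A minimal sketch using the pattern exactly as given in the code sample (with the trailing \w+); FirstThreeLetters is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the "name" group of every match.
static string[] FirstThreeLetters(string fullName)
{
    var regex = new Regex(@"((?<name>[\w]{3})\w+)+",
        RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);

    return regex.Matches(fullName)
                .Cast<Match>()
                .Select(m => m.Groups["name"].Value)
                .ToArray();
}

// "L." and "S." are skipped because they contain fewer than four word characters.
Console.WriteLine(string.Join(" ", FirstThreeLetters("Marck Jacobs L. S. Reynolds"))); // prints: Mar Jac Rey
```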

Regex: Match any punctuation character except . and _

Is there an easy way to match all punctuation except period and underscore, in a C# regex? Hoping to do it without enumerating every single punctuation mark.
Use Regex Subtraction
[\p{P}-[._]]
See the .NET Regex documentation. I'm not sure if other flavors support it.
C# example
string pattern = @"[\p{P}\p{S}-[._]]"; // added \p{S} to get ^,~ and ` (among others)
string test = @"_""'a:;%^&*~`bc!@#.,?";
MatchCollection mx = Regex.Matches(test, pattern);
foreach (Match m in mx)
{
Console.WriteLine("{0}: {1} {2}", m.Value, m.Index, m.Length);
}
Explanation
The pattern is a Character Class Subtraction. It starts with a standard character class like [\p{P}] and then adds a Subtraction Character Class like -[._], which says to remove the . and _. The subtraction is placed inside the [ ] after the standard class guts.
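Here is a compact sketch of the plain \p{P} subtraction (without \p{S}). The helper name and the sample strings are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: concatenates every punctuation character in the input
// except '.' and '_', using .NET character class subtraction.
static string PunctuationExceptDotAndUnderscore(string input) =>
    string.Concat(Regex.Matches(input, @"[\p{P}-[._]]")
                       .Cast<Match>()
                       .Select(m => m.Value));

Console.WriteLine(PunctuationExceptDotAndUnderscore("a.b_c,d!e")); // prints: ,!
```

Characters such as ^ belong to the symbol categories (\p{S}) rather than \p{P}, which is why the C# example above adds \p{S} to the base class.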
The answers so far do not respect ALL punctuation. This should work:
(?![\._])\p{P}
(Explanation: Negative lookahead to ensure that neither . nor _ are matched, then match any unicode punctuation character.)
Here is something a little simpler. Not words or white-space (where words include A-Za-z0-9 AND underscore).
[^\w\s.]
You could possibly use a negated character class like this:
[^0-9A-Za-z._\s]
This includes every character except those listed. You may need to exclude more characters (such as control characters), depending on your ultimate requirements.
