Match text not surrounded by & and ; - c#

I am currently using the following regular expression:
(?<!&)[^&;]*(?!;)
To match text like this:
match1&lt;match2&gt;
And extract:
match1
match2
However, this seems to match an extra five empty strings. See Regex Storm.
How can I only match the two listed above?
Note the existing pattern ((?<=^|;)[^&]+) by @xanatos will only match matches 1 to 3 in the following string and not match4:
match1&lte;match2&lt;match;3+match&4

Try changing the * to a +:
(?<!&)[^&;]+(?!;)
Test here
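To see the difference between the two quantifiers, here is a minimal sketch written as a C# top-level program (C# 9 or later). It assumes the sample input is the entity-encoded string match1&lt;match2&gt;; the FindAll helper is hypothetical and only collects the match values.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the value of every match, including empty ones.
static string[] FindAll(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// Assumed sample input (entities left encoded).
string input = "match1&lt;match2&gt;";

// With *, the pattern can succeed without consuming anything,
// so empty matches show up between the real ones.
string[] star = FindAll(input, @"(?<!&)[^&;]*(?!;)");

// With +, every match must contain at least one character.
string[] plus = FindAll(input, @"(?<!&)[^&;]+(?!;)");

Console.WriteLine($"*: {star.Length} matches, {star.Count(s => s.Length == 0)} of them empty");
Console.WriteLine($"+: {string.Join(", ", plus)}");
```

If empty matches cannot be avoided in a larger pattern, filtering on m.Length > 0 after the fact is another option.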
More correct regex:
(?<=^|;)[^&]+
Test here
The basic idea here is that a "good" substring starts at the beginning of the string (^) or right after the ;, and ends when you encounter a & ([^&]+).
Third version... though it mostly shows that if you have a problem and decide to solve it with regexes, you now have two problems:
(?<=^|;)([^&]|&(?=[^&;]*(?:&|$)))+
Test here

I have managed it with:
(?<Text>.+?)(?:&[^&;]*?;|$)
This seems to match all of the corner cases but it might not work with a case I can't think of at the moment.
This won't work if the string starts with a &...; pattern or is only that.
See Regex Storm.

Related

How to eliminate digits followed by specific string

I have quite a long regex pattern. Here is just a part of it:
string pattern = @"((?<!top=)(?<![A-Za-z])\d)+";
Given the string:
date(Account/AccountClose) gt 2019-03-25 and Brg eq '100'&$select=IdAccountCurrent&$skip=10&$top=10
It matches 2019, 03, 25, 100, 10 and 0.
I want to eliminate the last 0 from the matching result. In other words, all numbers that are followed by top= should not match.
My solution works only if I have one digit after top=. How can I achieve the desired result?
regex101 example
UPDATE: Unfortunately, the suggested solutions are not suited for the whole pattern. I tried to make my example simple, but it looks like that's impossible.
So my whole regex pattern is:
string pattern = @"((?<!top=)(?<![A-Za-z])\d|-|T\d+|:|\.|\+|(?<=\d)Z)+|\bfalse\b|\btrue\b|\bnull\b|'[^']+'|\(['\d][^\)]+\)";
I need to edit this pattern to eliminate all digits right after top=.
my whole example (please see the last row in this example, last 0 should not be matched)
Just add 0-9 in your regex, for forcing the digit not to be preceded by another digit:
((?<!top=)(?<![A-Za-z0-9])\d+)
See here for a demo.
But you can also just use word boundaries:
(?<!top=)\b(\d+)
See here for a demo.
You can change your regex to this where I've used \b to reject the partial matching of digits,
(?<!top=)(?<![A-Za-z])\b\d+
Demo
The way you wrote your regex, ((?<!top=)(?<![A-Za-z])\d)+, applies the lookbehind conditions to each digit individually and then repeats that group one or more times, which leaves no room for \b. Hence I removed the outer parentheses and used \b\d+ instead. Hopefully this gives you all your desired matches. Let me know if you face any issues.
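As a quick check of the explanation above, the following sketch (a C# top-level program, C# 9 or later) runs the original per-digit pattern and the first suggested fix against the sample query string from the question. The variable names are only illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string query = "date(Account/AccountClose) gt 2019-03-25 and Brg eq '100'"
             + "&$select=IdAccountCurrent&$skip=10&$top=10";

// Original pattern: the lookbehinds are checked per digit, so the 0 in
// "top=10" slips through (it is preceded by "1", not by "top=").
string perDigit = @"((?<!top=)(?<![A-Za-z])\d)+";

// Suggested fix: also forbid a preceding digit and match the run with \d+.
string wholeNumber = @"(?<!top=)(?<![A-Za-z0-9])\d+";

string[] before = Regex.Matches(query, perDigit).Cast<Match>().Select(m => m.Value).ToArray();
string[] after = Regex.Matches(query, wholeNumber).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(", ", before)); // 2019, 03, 25, 100, 10, 0
Console.WriteLine(string.Join(", ", after));  // 2019, 03, 25, 100, 10
```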

Use OR in Regex Expression

I have a regex to match the following:
somedomain.com/services/something
Basically I need to ensure that /services is present.
The regex I am using and which is working is:
\/services*
But I need to match /services OR /servicos. I tried the following:
(\/services|\/servicos)*
But this shows 24 matches?! https://regex101.com/r/jvB1lr/1
How to create this regex?
The (\/services|\/servicos)* matches 0+ occurrences of /services or /servicos, and that means it can match an empty string anywhere inside the input string.
You can group the alternatives like /(services|servicos) and remove the * quantifier, but for this case, it is much better to use a character class [oe] as the strings only differ in 1 char.
You want to use the following pattern:
/servic[eo]s
See the regex demo
To make sure you match a whole subpart, you may append (?:/|$) at the pattern end, /servic[eo]s(?:/|$).
In C#, you may use Regex.IsMatch with the pattern to see if there is a match in a string:
var isFound = Regex.IsMatch(s, @"/servic[eo]s(?:/|$)");
Note that you do not need to escape / in a .NET regex as it is not a special regex metacharacter.
Pattern details
/ - a /
servic[eo]s - services or servicos
(?:/|$) - / or end of string.
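A small sketch of both points, written as a C# top-level program (C# 9 or later). The URLs other than the one from the question are hypothetical samples chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: /services or /servicos as a whole path segment.
var segment = new Regex(@"/servic[eo]s(?:/|$)");

bool english = segment.IsMatch("somedomain.com/services/something");
bool portuguese = segment.IsMatch("somedomain.com/servicos");
bool longer = segment.IsMatch("somedomain.com/servicesabc/something");

Console.WriteLine($"{english} {portuguese} {longer}"); // True True False

// The starred group can match the empty string at every position,
// so even a string without "/services" in it produces matches.
int emptyMatches = Regex.Matches("abc", @"(/services|/servicos)*").Count;
Console.WriteLine(emptyMatches); // 4 (one empty match per position, including the end)
```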
Well the * quantifier means zero or more, so that is the problem. Remove that and it should work fine:
(\/services|\/servicos)
Keep in mind that in your example, you have a typo in the URL so it will correctly not match anything as it stands.
Here is an example with the typo in the URL fixed, so it shows 1 match as expected.
First off, you specify C# in this post (really, .NET is the library that holds regex, not the language), but regex101 in your example is set to PHP. That gives you misleading information, such as the need to escape a forward slash / as \/, which is unnecessary in .NET regular expressions. The regex language is largely the same, but different tools behave differently, and PHP's regex is not like .NET's.
Secondly, the star * on the ( ) says the parenthesized group may match nothing at all, so you get empty matches at almost every position.
Thirdly, one does not need to alternate on the whole word. I would just put the characters that differ into a set [ ], which gives you the "or-ness" you need to match either services or servicos, such as:
(/servic[oe]s)
This will tell you whether services or servicos is present. Nothing else is needed.

Regex for git's repository

I want to use a regex to validate a git repository URL. I found a few answers on Stack Overflow, but none of them passes my tests.
The debug is here: http://regexr.com/39qia
How can I make it pass the last four cases?
git@git.host.hy:group-name/project-name.git
git@git.ho-st.hy:group-name/project-name.git
http://host.xy/agroup-name/project-name.git
http://ho-st.xy/agroup-name/project-name.git
I can't be certain since I'm not familiar with git link syntaxes, but the following regex will additionally match the 4 next values:
((git|ssh|http(s)?)|(git@[\w.-]+))(:(//)?)([\w.@\:/~-]+)(\.git)(/)?
                             ^                     ^^    ^
I have indicated the changed parts; namely:
Added - to the part after @ because ho-st was not passing otherwise.
Moved - to the end of the character class because otherwise /-~ would mean the character range / to ~ which matches a lot of characters.
Escaped the final dot (thanks @MatiCicero)
There are a lot of things that could be simplified from the above, but since I don't know your exact goals, I'm leaving the regex as close as possible to the one you have.
You can try this one:
(?'protocol'git@|https?:\/\/)(?'domain'[a-zA-Z0-9\.\-_]+)(\/|:)(?'group'[a-zA-Z0-9\-]+)\/(?'project'[a-zA-Z0-9\-]+)\.git
You can then extract the needed information from the matched groups.
You can test this regex on: Regex101
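Reading the named groups in .NET could look like the sketch below, written as a C# top-level program (C# 9 or later). It assumes the first alternative of the pattern is git@, and it adds ^ and $ anchors (not part of the answer) so that the whole URL has to match.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer, assuming "git@" as the first alternative.
// The ^ and $ anchors are an addition for whole-string validation.
var repo = new Regex(
    @"^(?'protocol'git@|https?:\/\/)(?'domain'[a-zA-Z0-9\.\-_]+)(\/|:)" +
    @"(?'group'[a-zA-Z0-9\-]+)\/(?'project'[a-zA-Z0-9\-]+)\.git$");

Match ssh = repo.Match("git@git.ho-st.hy:group-name/project-name.git");
Match http = repo.Match("http://ho-st.xy/agroup-name/project-name.git");

// Named groups are read by name from the Groups collection.
string sshDomain = ssh.Groups["domain"].Value;
string sshProject = ssh.Groups["project"].Value;
string httpDomain = http.Groups["domain"].Value;
string httpGroup = http.Groups["group"].Value;

Console.WriteLine(sshDomain + " / " + sshProject);  // git.ho-st.hy / project-name
Console.WriteLine(httpDomain + " / " + httpGroup);  // ho-st.xy / agroup-name
```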
Ok, the following expression matches all of your current test-text and does not match any of your false positives provided before:
((((git|user)@[\w.-]+)|(git|ssh|http(s)?|file))(:(\/){0,3}))?([\w.@\:/~\-]+)(\.git)(\/)?
See also, regex.
Caveat: Be aware, that currently input is matched with '~' and '-' appearing in places where they shouldn't.

Character 'e' is not recognized by simple regular expression - why?

I wrote a very simple regular expression that needs to match the following pattern:
word.otherWord
- Word must have at least 2 characters and must not start with a digit.
I wrote the following expression:
[a-zA-Z][a-zA-Z](.[a-zA-Z0-9])+
I tested it using a regex tester and it seems to work in most cases, but it fails for some inputs that end with 'e'.
For example:
Hardware.Make does not work but Hardware.Makee works fine. Why? How can I fix it?
That's because your regex only matches inputs whose length is even.
You have two characters matched by [a-zA-Z][a-zA-Z] and then another two characters matched by (.[a-zA-Z0-9]) as a group which is repeated one or more times (because of +).
You can see it here: http://regex101.com/r/fW2bC1
I think you need that:
[a-zA-Z]+(\.[a-zA-Z0-9]+)+
Actually, the dot is a regex metacharacter, which stands for "any character". You'll need to escape the dot.
For your situation, I'd do this:
[a-zA-Z]{2,}\.[a-zA-Z0-9]+
The {2,} means, at least 2 characters from the previous range.
In regex, the dot period is one of the most commonly used metacharacters and unfortunately also commonly misused metacharacter. The dot matches a single character without caring what that character is...
So I would also rewrite it like this:
[a-zA-Z]+(\.[a-zA-Z0-9]+)+
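The even-length effect can be reproduced with the sketch below, a C# top-level program (C# 9 or later). It assumes the tester validated the whole string, so both patterns are anchored with ^ and $ here; the digit-first input is a hypothetical extra case.

```csharp
using System;
using System.Text.RegularExpressions;

// Original pattern, anchored. The unescaped dot matches any character,
// and each repetition of the group consumes exactly two characters.
var original = new Regex(@"^[a-zA-Z][a-zA-Z](.[a-zA-Z0-9])+$");

// Fixed pattern: at least two letters, a literal dot, then letters or digits.
var corrected = new Regex(@"^[a-zA-Z]{2,}\.[a-zA-Z0-9]+$");

bool oddOriginal = original.IsMatch("Hardware.Make");    // 13 characters
bool evenOriginal = original.IsMatch("Hardware.Makee");  // 14 characters
bool oddFixed = corrected.IsMatch("Hardware.Make");
bool digitFirst = corrected.IsMatch("1ardware.Make");

Console.WriteLine($"{oddOriginal} {evenOriginal} {oddFixed} {digitFirst}"); // False True True False
```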

Match text after colon

I want to match the word after the "type :".
What I have?
My actual pattern:
(?<=type\s:\s)(\w*)
Text:
"type : text,"
It works exactly as I want when I have just one whitespace character before/after the colon...
"type_SPACE_:_SPACE_text
But if I have 2 spaces or none, it doesn't work.
I already tried this, but it doesn't match.
(?<=type\s*:\s*)(\w*)
I also tried this, my best attempt so far, but the matched text contains the colon.
(?<=type)(\s*):(\s*)(.*)(?=,)
To do the test I use gskinner's tester...
http://gskinner.com/RegExr/
If you're doing this in C# and using the included Regex engine, your original regex should work, with a slight modification:
string myString = "type : something";
var match = Regex.Match(myString, @"(?<=type\s*:\s*)\w+");
Console.Write(match);
Edit: The reason the (?<=type\s*:\s*)\w* version wasn't working for you with multiple spaces is that \w* can match zero characters, so the regex was happily returning empty matches at the positions after the colon before it ever reached the word.
You can view the various matched strings by using Regex.Matches, you'll see that your matched word is in there, but it's not the first result.
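A minimal sketch of that inspection, written as a C# top-level program (C# 9 or later). The input with two spaces on each side of the colon is an assumed sample.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed sample input with two spaces on each side of the colon.
string text = "type  :  text,";

// \w* may match nothing, so the first matches are empty strings found
// at the spaces after the colon, before the engine reaches "text".
string[] star = Regex.Matches(text, @"(?<=type\s*:\s*)\w*")
    .Cast<Match>().Select(m => m.Value).ToArray();

// \w+ needs at least one word character, so the first match is the word.
string first = Regex.Match(text, @"(?<=type\s*:\s*)\w+").Value;

Console.WriteLine(star.Length);               // 3: two empty matches, then "text"
Console.WriteLine(star[star.Length - 1]);     // text
Console.WriteLine(first);                     // text
```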
