Simple regex match question - c#

I have the following string "sometextsometextSiteId-111-aaaaasometext"
If the string contains "SiteId-111-aaaaa", I would like to get the 111-aaaaa part (any number, any char):
"sometextsometextSiteId-111-aaaaasometext" -> 111-aaaaa
"sometextsometextSiteId-123-abcdesometext" -> 123-abcde
"sometextsometextsitId-111-aaaaasometext" -> (nothing)
"SiteId-999-QWERTPOIPOI" -> 999-QWERT
I guess this should be possible to do?
Any hints?
Thanks Larsi

(?<=SiteId-)([0-9]+-[a-zA-Z]{5})
should capture that part.
PowerShell test:
$re = '(?<=SiteId-)([0-9]+-[a-zA-Z]{5})'
'sometextsometextSiteId-111-aaaaasometext',
"sometextsometextSiteId-123-abcdesometext",
"sometextsometextsitId-111-aaaaasometext",
"SiteId-999-QWERTPOIPOI" |
% {
$x = [regex]::Matches($_, $re)
Write-Host $_ - $x
}
yields
sometextsometextSiteId-111-aaaaasometext - 111-aaaaa
sometextsometextSiteId-123-abcdesometext - 123-abcde
sometextsometextsitId-111-aaaaasometext -
SiteId-999-QWERTPOIPOI - 999-QWERT
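The question asks about C#, so here is a minimal sketch of the same lookbehind pattern used from C#. The names SiteIdExtractor and Extract are made up for illustration. The extra capture group is dropped because the lookbehind already keeps "SiteId-" out of the match.

```csharp
using System.Text.RegularExpressions;

static class SiteIdExtractor
{
    // (?<=SiteId-) requires "SiteId-" right before the match without including it.
    // Matching is case-sensitive by default, so "sitId-" (or "siteid-") is not accepted.
    private static readonly Regex SiteIdRegex = new Regex(@"(?<=SiteId-)[0-9]+-[a-zA-Z]{5}");

    // Returns e.g. "111-aaaaa", or an empty string when there is no match
    // (a failed Match has Value == "").
    public static string Extract(string input) => SiteIdRegex.Match(input).Value;
}
```

For "SiteId-999-QWERTPOIPOI" this returns "999-QWERT", because {5} stops after five letters.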

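SiteId-(\d{3}-\D{5}) should capture that in group 1. Keep a count on \D: with \D+ the match would also swallow the trailing "sometext", because \D+ greedily takes every non-digit that follows.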
You can also use Rubular to try out your regular expressions; it has a quick regex reference at the bottom.

Related

Regex trying to get just package name from `az.accounts.2.10.4.nupkg`

I am trying to get the package name from the file name using C# and Regex. This is my attempt so far, which works, but I am wondering if there is a more elegant way.
For example, given az.accounts.2.10.4.nupkg, I want to get az.accounts.
My attempt:
var filename = Path.GetFileNameWithoutExtension(nupkgPackagePath);
var nupkgPackageGetModulePath = Regex.Matches(filename, @"[^\d]+").First().Value.TrimEnd('.');
Test cases:
$ ls *.nupkg
PowerShellGet.nupkg az.iothub.2.7.4.nupkg
az.9.2.0.nupkg az.keyvault.4.9.1.nupkg
az.accounts.2.10.4.nupkg az.kusto.2.1.0.nupkg
az.advisor.2.0.0.nupkg az.logicapp.1.5.0.nupkg
az.aks.5.1.0.nupkg az.machinelearning.1.1.3.nupkg
az.analysisservices.1.1.4.nupkg az.maintenance.1.2.1.nupkg
az.apimanagement.4.0.1.nupkg az.managedserviceidentity.1.1.0.nupkg
az.appconfiguration.1.2.0.nupkg az.managedservices.3.0.0.nupkg
az.applicationinsights.2.2.0.nupkg az.marketplaceordering.2.0.0.nupkg
az.attestation.2.0.0.nupkg az.media.1.1.1.nupkg
az.automation.1.8.0.nupkg az.migrate.2.1.0.nupkg
az.batch.3.2.1.nupkg az.monitor.4.3.0.nupkg
az.billing.2.0.0.nupkg az.mysql.1.1.0.nupkg
az.cdn.2.1.0.nupkg az.network.5.2.0.nupkg
az.cloudservice.1.1.0.nupkg az.notificationhubs.1.1.1.nupkg
az.cognitiveservices.1.12.0.nupkg az.operationalinsights.3.2.0.nupkg
az.compute.5.2.0.nupkg az.policyinsights.1.5.1.nupkg
az.confidentialledger.1.0.0.nupkg az.postgresql.1.1.0.nupkg
az.containerinstance.3.1.0.nupkg az.powerbiembedded.1.2.0.nupkg
az.containerregistry.3.0.0.nupkg az.privatedns.1.0.3.nupkg
az.cosmosdb.1.9.0.nupkg az.recoveryservices.6.1.2.nupkg
az.databoxedge.1.1.0.nupkg az.rediscache.1.6.0.nupkg
az.databricks.1.4.0.nupkg az.redisenterprisecache.1.1.0.nupkg
az.datafactory.1.16.11.nupkg az.relay.1.0.3.nupkg
az.datalakeanalytics.1.0.2.nupkg az.resourcemover.1.1.0.nupkg
az.datalakestore.1.3.0.nupkg az.resources.6.5.0.nupkg
az.dataprotection.1.0.1.nupkg az.security.1.3.0.nupkg
az.datashare.1.0.1.nupkg az.securityinsights.3.0.0.nupkg
az.deploymentmanager.1.1.0.nupkg az.servicebus.2.1.0.nupkg
az.desktopvirtualization.3.1.1.nupkg az.servicefabric.3.1.0.nupkg
az.devtestlabs.1.0.2.nupkg az.signalr.1.5.0.nupkg
az.dns.1.1.2.nupkg az.sql.4.1.0.nupkg
az.eventgrid.1.5.0.nupkg az.sqlvirtualmachine.1.1.0.nupkg
az.eventhub.3.2.0.nupkg az.stackhci.1.4.0.nupkg
az.frontdoor.1.9.0.nupkg az.storage.5.2.0.nupkg
az.functions.4.0.6.nupkg az.storagesync.1.7.0.nupkg
az.hdinsight.5.0.1.nupkg az.streamanalytics.2.0.0.nupkg
az.healthcareapis.2.0.0.nupkg az.support.1.0.0.nupkg
You can try something like this:
string text = "az.streamanalytics.2.0.0.nupkg";
var result = Regex
.Match(text, @"(?<name>[a-zA-Z0-9.]+?)(\.[0-9]+)*\.nupkg$")
.Groups["name"]
.Value;
Pattern explained:
(?<name>[a-zA-Z0-9.]+?) - letters, digits and dots, as few as possible (so that the version part is not matched)
(\.[0-9]+)* - zero or more version part: . followed by digits
\.nupkg - .nupkg
$ - end of string
^[^.]*\.[^.]*
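You can test it out at https://regex101.com/. Note that it simply takes the first two dot-separated parts, so "PowerShellGet.nupkg" comes back unchanged and "az.9.2.0.nupkg" gives "az.9".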
using System.Text.RegularExpressions;
// ...
string filename = "az.accounts.2.10.4.nupkg";
string pattern = @"^[^.]*\.[^.]*";
string nupkgPackageGetModulePath = Regex.Match(filename, pattern).Value;
// nupkgPackageGetModulePath is now "az.accounts"
You've got two different input formats:
<PackageName>.nupkg
<PackageName>.<Major>.<Minor>.<Patch>.nupkg
Your current attempt:
Regex.Matches(fileName, @"[^\d]+").First().Value.TrimEnd('.')
This actually doesn't work for an input of "PowerShellGet.nupkg". Here is how this code works:
Starting at the beginning of the string, find the first non-digit character, and greedily include all the consecutive non-digit characters after it. This is the "matched text".
If the matched text ends with a period, take off that period.
This works fine if your input has a number in it, but "PowerShellGet.nupkg" doesn't, hence nupkgPackageGetModulePath in your code example will be the full file name not "PowerShellGet".
This will also be a huge problem if the package name itself contains a digit. How about "runtime.opensuse.13.2-x64.runtime.native.System.Security.Cryptography.OpenSsl.4.3.3.nupkg", or (and I can't believe this is actually a package) "2.2.0.0.nupkg"?
It's not a good idea to find the first non-digit. Instead, work with the expected format of nuget packages.
Using string.Split:
Split the input by periods. If there are two elements in the resulting array, it's the first format, so return the first element. If there are at least five elements, it's the second format. Otherwise, the format is unknown.
private static string GetPackageName(string packageFileName)
{
var segments = packageFileName.Split('.');
return segments.Length switch
{
2 => segments[0],
>= 5 => string.Join(".", segments[..^4]),
_ => throw new Exception("Unknown what you want done here")
};
}
segments[..^4] is a handy way to get all the elements before the major version (the last four elements are Major, Minor, Patch and "nupkg").
https://dotnetfiddle.net/Ok6jbq
Using Regex:
Again, because you've got two different formats, you've got to account for both, so this gets a bit more complicated.
([\S]+?)(?:\.\d+\.\d+\.\d+)?\.nupkg
The middle section ((?:\.\d+\.\d+\.\d+)?) is a non-capture group (starts with ?:) which is optional (suffixed with ?).
Capture group 1 will have the package name.
https://regexr.com/74mgf
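For completeness, a minimal C# sketch applying this pattern. The ^ and $ anchors are an addition (not part of the regex above) so that the whole file name has to fit; NupkgName and Get are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class NupkgName
{
    // The pattern above plus ^ and $ anchors. \S+? is lazy, so the optional
    // .Major.Minor.Patch group ends up with the version digits.
    private static readonly Regex Pattern = new Regex(@"^(\S+?)(?:\.\d+\.\d+\.\d+)?\.nupkg$");

    // Capture group 1 is the package name; returns "" if the file name doesn't fit.
    public static string Get(string fileName)
    {
        Match m = Pattern.Match(fileName);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

NupkgName.Get("az.accounts.2.10.4.nupkg") gives "az.accounts", and NupkgName.Get("PowerShellGet.nupkg") gives "PowerShellGet".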

Find index of first Char(Letter) in string

I have a mental block and can't seem to figure this out; I'm sure it's pretty easy 0_o
I have the following string: "5555S1"
The string can contain any number of digits, followed by a letter (A-Z), followed by digits again.
How do I get the index of the letter (S), so that I can take a substring of everything from the letter onward?
I.e.: 5555S1
Should return S1
Cheers
You could also check whether the integer value of each character is >= 65 && <= 90 (the ASCII codes of 'A' to 'Z').
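A minimal C# sketch of that check, assuming (as in the question) that the letter is always an uppercase ASCII letter. FirstLetter and IndexOfFirstUpper are made-up names.

```csharp
static class FirstLetter
{
    // Index of the first character whose code is 65..90 ('A'..'Z'), or -1 if there is none.
    public static int IndexOfFirstUpper(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            int code = s[i]; // implicit char -> int conversion gives the UTF-16 code
            if (code >= 65 && code <= 90)
                return i;
        }
        return -1;
    }
}
```

With index = FirstLetter.IndexOfFirstUpper("5555S1"), "5555S1".Substring(index) is "S1". Check for -1 before calling Substring.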
Simple Python:
test = '5555Z187456764587368457638'
for i in range(0, len(test)):
    if test[i].isalpha():
        break
print(test[i:])
Yields: Z187456764587368457638
Given that you didn't say what language you're using, I'm going to pick the one I want to answer in: C#.
String.IndexOf; see http://msdn.microsoft.com/en-us/library/system.string.indexof.aspx for more.
For good measure, here it is in Java: String.indexOf.
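String.IndexOf looks for one specific character, so for "any letter A-Z" the related String.IndexOfAny fits better: it takes an array of characters and returns the index of the first occurrence of any of them, or -1. A sketch, with LetterSuffix and FromFirstLetter as hypothetical names:

```csharp
static class LetterSuffix
{
    private static readonly char[] UpperLetters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray();

    // Everything from the first A-Z letter onward, or "" when the string has no such letter.
    public static string FromFirstLetter(string s)
    {
        int index = s.IndexOfAny(UpperLetters);
        return index < 0 ? string.Empty : s.Substring(index);
    }
}
```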
One way could be to loop through the string until you find a letter:
int i = 0;
while (i < s.Length && !char.IsLetter(s[i]))
    i++;
or something similar should work.
This doesn't answer your question but it does solve your problem.
(Although you can use it to work out the index)
Your problem is a good candidate for Regular Expressions (regex)
Here is one I prepared earlier:
using System;
using System.Text.RegularExpressions;
String code = "1234A0987";
//Timeout is optional, but recommended for security (so malicious input can't tie up your server)
TimeSpan timeout = TimeSpan.FromMilliseconds(150);
//Magic here:
//Pattern == (block of 1 or more digits)(block of 1 or more non-digits)(optional block of 1 or more digits)
String regexPattern = @"^(?<firstNum>\d+)(?<notNumber>\D+)(?<SecondNum>\d+)?";
Regex r = new Regex(regexPattern, RegexOptions.None, timeout);
Match m = r.Match(code);
if (m.Success) //We got a match! Reuse m instead of running the regex again.
{
Console.WriteLine("SecondNumber: {0}", m.Groups["SecondNum"].Value);
Console.WriteLine("All data (formatted): {0}", m.Result("${firstNum}-${notNumber}-${SecondNum}"));
Console.WriteLine("Offset length (not that you need it now): {0}", m.Groups["firstNum"].Length);
}
Output:
SecondNumber: 0987
All data (formatted): 1234-A-0987
Offset length (not that you need it now): 4
So there you go: you can even work out what that index was.

antlr grammar for tree construction from simple logic string

I want to create a parser using antlr for the following string:
"1 AND (2 OR (3 AND 4)) AND 5"
So I want AND and OR operations, and a successful parse should give me a tree. For the string above, the result should be the following tree:
AND
- 1
- OR
  - 2
  - AND
    - 3
    - 4
- 5
I also want to reject ambiguous inputs like "1 AND 2 OR 3", as it is not clear how to construct the tree from them. It also seems that the parser "accepts" input with trailing characters, such as "1 AND 2asdf".
What I have so far (not working as expected):
grammar code;
options {
language=CSharp3;
output=AST;
ASTLabelType=CommonTree;
//backtrack=true;
}
tokens {
ROOT;
}
@rulecatch {
catch {
throw;
}
}
@parser::namespace { Web.DealerNet.Areas.QueryBuilder.Parser }
@lexer::namespace { Web.DealerNet.Areas.QueryBuilder.Parser }
@lexer::members {
public override void ReportError(RecognitionException e) {
throw e;
}
}
public parse : exp EOF -> ^(ROOT exp);
exp
: atom
( And^ atom (And! atom)*
| Or^ atom (Or! atom)*
)?
;
atom
: Number
| '(' exp ')' -> exp
;
Number
: ('0'..'9')+
;
And
: 'AND' | 'and'
;
Or
: 'OR' | 'or'
;
WS : (' '|'\t'|'\f'|'\n'|'\r')+{ Skip(); };
Hope one of you guys can help me get on the right track!
edit: and how can I achieve "1 AND 2 AND 3" resulting in
AND
- 1
- 2
- 3
instead of
AND
- AND
  - 1
  - 2
- 3
EDIT:
Thanks for the great solution, it works like a charm except for one thing: when I call the parse() method on the term "1 AND (2 OR (1 AND 3) AND 4" (closing bracket missing), the parser still accepts the input as valid.
This is my code so far:
grammar code;
options {
language=CSharp3;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
ROOT;
}
@rulecatch {
catch {
throw;
}
}
@lexer::members {
public override void ReportError(RecognitionException e) {
throw e;
}
}
public parse
: exp -> ^(ROOT exp)
;
exp
: atom
( And^ atom (And! atom)*
| Or^ atom (Or! atom)*
)?
;
atom
: Number
| '(' exp ')' -> exp
;
Number
: ('0'..'9')+
;
And
: 'AND' | 'and'
;
Or
: 'OR' | 'or'
;
WS : (' '|'\t'|'\f'|'\n'|'\r')+{ Skip(); };
edit2:
I just found another problem with my grammar:
when I have input like "1 AND 2 OR 3", it gets parsed just fine, but it should fail, because either the "1 AND 2" part or the "2 OR 3" part needs to be inside brackets.
I don't understand why the parser runs through, as in my opinion this grammar should cover that case.
Is there any sort of online testing environment or similar to find the problem? (I tried ANTLRWorks, but the errors given there don't lead me anywhere...)
edit3:
I updated the code to reflect the new grammar, as suggested.
I still have the same problem with the following rule:
public parse : exp EOF -> ^(ROOT exp);
It doesn't parse to the end; the generated C# sources seem to just ignore the EOF. Can you provide any further guidance on how I could resolve the issue?
edit4
The problem seems to be in this part of the generated code:
EOF2=(IToken)Match(input,EOF,Follow._EOF_in_parse97);
stream_EOF.Add(EOF2);
When I add the following code (just a hack), it works:
if (EOF2.Text == "<missing EOF>") {
throw new Exception(EOF2.Text);
}
Can I change anything so the parser is generated correctly from the start?
This rule will disallow expressions containing both AND and OR without parentheses. It will also construct the parse tree you described by making the first AND or OR token the root of the AST, and then hiding the rest of the AND or OR tokens from the same expression.
exp
: atom
( 'AND'^ atom ('AND'! atom)*
| 'OR'^ atom ('OR'! atom)*
)?
;
Edit: The second problem is unrelated to this. If you don't instruct ANTLR to consume all input by including an explicit EOF symbol in one of your parser rules, then it is allowed to consume only a portion of the input in an attempt to successfully match something.
The original parse rule says "match some input as exp". The following modification to the parse rule says "match the entire input as exp".
public parse : exp EOF -> ^(ROOT exp);
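The behaviour that answer relies on (one operator per parenthesised level, a flat child list, and an explicit end-of-input check) can be illustrated without ANTLR. The following is a hand-written recursive-descent parser in C#, not ANTLR and not ANTLR-generated code; Node, LogicParser and ParseAll are hypothetical names, and the tokenizer is a simplification of the lexer in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical tree node: a number leaf, or "AND"/"OR" with two or more children.
sealed class Node
{
    public string Label { get; }
    public List<Node> Children { get; } = new List<Node>();
    public Node(string label) { Label = label; }

    // Prints e.g. AND(1,OR(2,3)) so trees are easy to compare.
    public override string ToString() =>
        Children.Count == 0 ? Label : Label + "(" + string.Join(",", Children) + ")";
}

// Hand-written sketch of:  parse : exp EOF ;
//   exp : atom ( (AND atom)+ | (OR atom)+ )? ;   atom : Number | '(' exp ')' ;
sealed class LogicParser
{
    private readonly List<string> _tokens = new List<string>();
    private int _pos;

    public LogicParser(string text)
    {
        // Numbers, AND/and, OR/or and parentheses; any other non-space character becomes
        // its own token so the parser rejects it. Whitespace is skipped.
        foreach (Match m in Regex.Matches(text, @"\d+|AND|and|OR|or|[()]|\S"))
            _tokens.Add(m.Value);
    }

    public Node ParseAll()
    {
        Node result = ParseExp();
        // The equivalent of EOF in the grammar: every token must have been consumed.
        if (_pos != _tokens.Count)
            throw new FormatException("Unexpected token: " + _tokens[_pos]);
        return result;
    }

    private Node ParseExp()
    {
        Node first = ParseAtom();
        if (!IsOperator(Peek()))
            return first;

        // The first operator decides which one this level may use.
        string op = Peek().ToUpperInvariant();
        var node = new Node(op);
        node.Children.Add(first);
        while (IsOperator(Peek()))
        {
            if (Peek().ToUpperInvariant() != op)
                throw new FormatException("AND and OR cannot be mixed without parentheses");
            _pos++;
            node.Children.Add(ParseAtom()); // flat list: 1 AND 2 AND 3 -> AND(1,2,3)
        }
        return node;
    }

    private Node ParseAtom()
    {
        string t = Peek();
        if (t == "(")
        {
            _pos++;
            Node inner = ParseExp();
            if (Peek() != ")")
                throw new FormatException("Missing ')'");
            _pos++;
            return inner;
        }
        if (t != null && char.IsDigit(t[0]))
        {
            _pos++;
            return new Node(t);
        }
        throw new FormatException("Expected a number or '(' but found " + (t ?? "end of input"));
    }

    private string Peek() => _pos < _tokens.Count ? _tokens[_pos] : null;

    private static bool IsOperator(string t) => t == "AND" || t == "and" || t == "OR" || t == "or";
}
```

With this sketch, "1 AND (2 OR (3 AND 4)) AND 5" prints as AND(1,OR(2,AND(3,4)),5), while "1 AND 2 OR 3", "1 AND (2 OR 3" and "1 AND 2asdf" all throw a FormatException.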

How to delete some text between the string C#

I have this kind of text:
LINE\r\n 5\r\n11DA3\r\n330\r\n2\r\n100\r\nAcDbEntity\r\n
8\r\n0-FD\r\n 6\r\nHIDDEN\r\n100
I would like to replace the text between 5\r\n and \r\n100 (here, 11DA3\r\n330\r\n2) with 0. I tried this code:
result[line] = Regex.Replace(result[line], @"((?<=5\r\n)(\S+?)(?=\r\n100))", "0");
But it doesn't work. Is there something wrong with my code? I suspect the (\S+?) is the problem. Is there any way to solve it?
You can use this code:
string type_1 = "LINE\r\n 5\r\n11DA3\r\n330\r\n2\r\n100\r\nAcDbEntity\r\n 8\r\n0-FD\r\n 6\r\nHIDDEN\r\n100";
string output = Regex.Replace (
type_1,
"5\r\n(.*?)\r\n100",
"5\r\n0\r\n100",
RegexOptions.Singleline|RegexOptions.Compiled
);
Console.WriteLine (output);
it outputs:
LINE
5
0
100
AcDbEntity
8
0-FD
6
HIDDEN
100
It will change all occurrences of 5\r\n - ANYTHING HERE - \r\n100 to 5\r\n0\r\n100. If you want a more specific change, please let me know.
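As for why the question's own pattern fails: \S never matches \r or \n, and the text between 5\r\n and \r\n100 spans several lines. A minimal sketch that keeps the question's lookarounds but uses [\s\S]+? (any character, lazily) instead; it assumes the text contains real CR/LF characters, and DxfEdit / ReplaceBlock are hypothetical names.

```csharp
using System.Text.RegularExpressions;

static class DxfEdit
{
    // Same lookarounds as the question; [\s\S]+? matches any character, including \r and \n,
    // lazily, so it stops at the first "\r\n100" after "5\r\n".
    public static string ReplaceBlock(string text) =>
        Regex.Replace(text, @"(?<=5\r\n)[\s\S]+?(?=\r\n100)", "0");
}
```

Using .+? together with RegexOptions.Singleline, as in the answer above, is an equivalent way to let the match cross line breaks.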
If the content to remove is always the same, you can use
s = s.Replace("11DA3\r\n330\r\n2", "0");
Or you could use string.IndexOf.

How to check the arguments with Regex?

I'm stuck with regular expressions. The program is a console application written in C#. There are a few commands. I want to check that the arguments are right first. I thought it would be easy with Regex, but I couldn't get it to work:
var strArgs = "";
foreach (var x in args)
{
strArgs += x + " ";
}
if (!Regex.IsMatch(strArgs, @"(-\?|-help|-c|-continuous|-l|-log|-ip|)* .{1,}"))
{
Console.WriteLine("Command arrangement is wrong. Use \"-?\" or \"-help\" to see help.");
return;
}
Usage is:
program.exe [-options] [domains]
The problem is that the program accepts all commands. I also need to check that the "-" prefixed commands come before the domains. I think the problem is not difficult to solve.
Thanks...
Since you will end up writing a switch statement to process the options anyway, you would be better off doing the checking there:
switch(args[i])
{
case "-?": ...
case "-help": ...
...
default:
if (args[i][0] == '-')
throw new Exception("Unrecognised option: " + args[i]);
}
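A slightly fuller, runnable version of that idea, which also enforces the question's rule that options must come before domains. It treats every option as a plain flag (whether -ip takes a value is not stated in the question, so that is an assumption), and ArgChecker / GetDomains are hypothetical names. A real program would also record which flags were seen.

```csharp
using System;
using System.Collections.Generic;

static class ArgChecker
{
    // Returns the domain arguments; throws ArgumentException for an unknown option
    // or for an option that appears after a domain.
    public static List<string> GetDomains(string[] args)
    {
        var domains = new List<string>();
        foreach (string arg in args)
        {
            switch (arg)
            {
                case "-?":
                case "-help":
                case "-c":
                case "-continuous":
                case "-l":
                case "-log":
                case "-ip":
                    // Options are only allowed before the first domain.
                    if (domains.Count > 0)
                        throw new ArgumentException("Option after domain: " + arg);
                    break;
                default:
                    // Anything else starting with '-' is an unknown option.
                    if (arg.StartsWith("-"))
                        throw new ArgumentException("Unrecognised option: " + arg);
                    domains.Add(arg);
                    break;
            }
        }
        return domains;
    }
}
```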
First, don't use regular expressions to parse command line arguments. Here is a related question that I think you should look at:
Best way to parse command line arguments in C#?
But for your specific problem with your regular expression: the options are optional, and then you match a space followed by anything at all, where anything can include, for example, invalid domains and/or invalid options. So, for example, this is valid according to your regular expression:
program.exe -c -invalid
One way to improve this is to be more precise about the allowed characters in a domain rather than just matching anything.
Another problem with your regular expression is that you don't allow spaces between the switches. To handle that, you probably want something like this:
(?:(?:-\?|-help|-c|-continuous|-l|-log|-ip) +)*
I'd also like to point out that you should use string.Join instead of the loop you are currently using.
string strArgs = string.Join(" ", args);
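Putting those pieces together, a sketch of an anchored pattern that checks the joined arguments: options first, then domains. The domain part [A-Za-z0-9][A-Za-z0-9.-]* is a deliberately loose, hypothetical stand-in for a real domain rule, and ArgPattern / IsValid are made-up names.

```csharp
using System.Text.RegularExpressions;

static class ArgPattern
{
    // Zero or more known options, then zero or more domain-like tokens, each followed
    // by a single space or the end of the string. ^ and $ force the whole line to fit.
    private static readonly Regex Valid = new Regex(
        @"^(?:(?:-\?|-help|-c|-continuous|-l|-log|-ip)(?: |$))*" +
        @"(?:[A-Za-z0-9][A-Za-z0-9.-]*(?: |$))*$");

    public static bool IsValid(string[] args) => Valid.IsMatch(string.Join(" ", args));
}
```

Because of the anchors, "-c -invalid" and "example.com -c" are both rejected, which the original unanchored pattern could not do.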
Don't reinvent the wheel, handling command line arguments is a solved problem.
I've gotten good use out of the Command Line Parser Library for .Net.
Actually, the easiest way to achieve command line argument parsing is to create a PowerShell cmdlet. That gives you a really nice way to work with arguments.
I have been using this function with success... perhaps it will be useful for someone else...
First, define your variables:
private string myVariable1;
private string myVariable2;
private Boolean debugEnabled = false;
Then, execute the function:
loadArgs();
and add the function to your code:
private void loadArgs()
{
const string namedArgsPattern = "^(/|-)(?<name>\\w+)(?:\\:(?<value>.+)$|\\:$|$)";
System.Text.RegularExpressions.Regex argRegEx = new System.Text.RegularExpressions.Regex(namedArgsPattern, System.Text.RegularExpressions.RegexOptions.Compiled);
foreach (string arg in Environment.GetCommandLineArgs())
{
System.Text.RegularExpressions.Match namedArg = argRegEx.Match(arg);
if (namedArg.Success)
{
switch (namedArg.Groups["name"].ToString().ToLower())
{
case "myarg1": // the name was lower-cased with ToLower(), so the labels must be lower case too
myVariable1 = namedArg.Groups["value"].ToString();
break;
case "myarg2":
myVariable2 = namedArg.Groups["value"].ToString();
break;
case "debug":
debugEnabled = true;
break;
default:
break;
}
}
}
}
and to use it you can use the command syntax with either a forward slash "/" or a dash "-":
myappname.exe /myArg1:Hello /myArg2:Chris -debug
This regex parses the command line arguments into matches and groups so that you can build a parser based on this regex.
((?:|^\b|\s+)--(?<option_1>.+?)(?:\s|=|$)(?!-)(?<value_1>[\"\'].+?[\"\']|.+?(?:\s|$))?|(?:|^\b)-(?<option_2>.)(?:\s|=|$)(?!-)(?<value_2>[\"\'].+?[\"\']|.+?(?:\s|$))?|(?<arg>[\"\'].+?[\"\']|.+?(?:\s|$)))
This regex will parse the following, and works in almost all languages:
--in-argument hello --out-stdout false positional -x
--in-argument 'hello world"
"filename"
--in-argument="hello world'
--in-argument='hello'
--in-argument hello
"hello"
helloworld
--flag-off
-v
-x="hello"
-u positive
C:\serverfile
--in-arg1='abc' --in-arg2=hello world c:\\test
Try on Regex101
