How to find &#171 («) in a string - C#

I need to find &#171 in a string with a regex. How would I add that to the following:
String RegExPattern = @"^[0-9a-df-su-z]+\.\s&#171";
Regex PatternRegex = new Regex(RegExPattern);
return (PatternRegex.Match(Source).Value);

You should be able to simply use it directly:
var pattern = new Regex("&#171");
Of course, if you are only looking for the entity on its own, you can use String.IndexOf instead. If you want to use it inside a larger pattern, as in your question, go ahead; your usage is correct.
If, on the other hand, you also want to allow the named entity, use an alternation:
var pattern = new Regex("(?:&#171|&laquo;)");
Once again, the same can be done in a more complex expression. The ?: at the beginning of the group isn't necessary; it just prevents a capture group from being created for this alternation.
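A minimal, self-contained sketch of both suggestions, assuming the asker's pattern with the alternation appended and `&laquo;` as the named entity. The class and method names (EntityFinder, FindPrefix, ContainsEntity) are hypothetical. Neither & nor # is a regex metacharacter in .NET unless RegexOptions.IgnorePatternWhitespace is set (where # starts a comment), so the entity text needs no escaping.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: the question's prefix pattern with the answer's alternation appended.
// '&' and '#' are matched literally here (no IgnorePatternWhitespace option).
public static class EntityFinder
{
    static readonly Regex Pattern =
        new Regex(@"^[0-9a-df-su-z]+\.\s(?:&#171|&laquo;)");

    // Returns the matched prefix, or "" when the pattern does not match
    // (Match.Value is an empty string for a failed match).
    public static string FindPrefix(string source) => Pattern.Match(source).Value;

    // If the entity is all you need, an ordinal substring search is enough.
    public static bool ContainsEntity(string source) =>
        source.IndexOf("&#171", StringComparison.Ordinal) >= 0;
}
```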

Related

Regex : Replace text between semicolons a certain amount of times

I'm a bit confused by regex. I have a line that looks something like this:
test = "article;vendor;qty;desc;price1;price2"
What I'm trying to do is get only price1.
I'm currently using this function:
Regex.Replace(test, @".*;[^;]*;", "");
which lets me get price2, but I can't see how to isolate price1.
Have you considered just using a String.Split() call instead to break your semicolon-delimited string into an array:
var input = "article;vendor;qty;desc;price1;price2";
var output = input.Split(';');
Then you can simply access the value by its index:
var result = output[4]; // yields "price1"
You'll generally want to use a regular expression only when there is a specific pattern you can match to select exactly what you are looking for. For delimited lists, String.Split() usually makes things easier (especially if nothing uniquely identifies the item you are trying to pull from the list).
Use the following regex:
(?:[^;]*;){4}([^;]*);
Then take the value of the first capture group (Groups[1]), which holds price1.
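A sketch putting the options side by side for the sample line: Split with an index, Match with the answer's regex reading group 1, and a Replace variant closer to the asker's original attempt that substitutes the whole line with group 1. PriceParser and its method names are hypothetical, and the anchored Replace pattern (^ ... .*$) is an assumption added here, not part of either answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PriceParser
{
    // Split: fields are zero-based, so price1 (the 5th field) is at index 4.
    // Throws IndexOutOfRangeException if the line has fewer than five fields.
    public static string WithSplit(string line) => line.Split(';')[4];

    // Match: skip four "field;" chunks, capture the fifth field in group 1.
    static readonly Regex FifthField = new Regex(@"(?:[^;]*;){4}([^;]*);");

    public static string WithMatch(string line)
    {
        Match m = FifthField.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Replace: match the entire line and substitute it with group 1 only.
    public static string WithReplace(string line) =>
        Regex.Replace(line, @"^(?:[^;]*;){4}([^;]*);.*$", "$1");
}
```

Note that WithReplace returns the input unchanged when the pattern does not match, whereas WithMatch returns null.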

.net System.Text.RegularExpressions nesting regexps in regexp

I would like to ask any skilled .NET developer whether it is possible to define a regular expression (using the capabilities of the .NET RegularExpressions namespace) that includes references to other regexps. I would like to describe grammar rules, each rule as a single regexp. The final regexp would be the grammar's start symbol.
Of course I can perform the expansion to a single-line regular expression, but readability would suffer. I also would not like to try each option included in the start symbol programmatically (like foreach(regexp r in line.regexps) {check if r.matches(input)}).
For example, here is the grammar of an ini-like file in regexp-like form (it does not follow Microsoft's regexp rules, just general ones):
sp = \s*
allowed_char = [a-zA-Z0-9_]
key = <allowed_char>+
value = <allowed_char>((<allowed_char>|[ ])*<allowed_char>)?
comment = (;|(//)|#)(.*)
empty_line = ^<sp>$
line_comment = ^<sp><comment>$
section = ^<sp>\[<sp><value><sp>\]<sp>(<comment>)?$
item = ^<sp><key><sp>=<sp><value><sp>(<comment>)?$
line = <empty_line>|<line_comment>|<section>|<item>
I would like to:
Check if a sentence is part of the language (true/false) - seems trivial: matches the <line> start symbol.
Access the terminal-like symbol values (e.g. <section>, <key>, <value>, ...). I suppose this could be achieved via named matching groups (or whatever exactly they are called; I still need to read some details on MSDN).
I do not expect you to write the code; I would just like some hints on whether it is possible (and how), because I have not found this information yet. All the examples I have found match a single regexp.
Thank you.
This is what I came up with when I was doing my own regex based mathematical expression parser:
private static class Regexes {
// omitted...
private static readonly string
strFunctionNames = "sin|ln|cos|tg|tan|abs",
strReal = @"([\+-]?\d+([,\.]\d+)?(E[\+-]?\d+)?)|[\+-]Infinit(y|o)",
strFunction = string.Format( @"(?<function>{0})(?<argument>{1})",
strFunctionNames, strReal );
// omitted...
public static readonly Regex
FunzioniLowerCase = new Regex( strFunctionNames ),
RealNumber = new Regex( strReal ),
Function = new Regex( strFunction );
}
This has the obvious disadvantage that there's some sort of repetition in the code, but you could use reflection to compile (and perhaps even create) those regexes in a static constructor.
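The same composition idea applied to the ini-like grammar from the question, as a hedged sketch: each rule is a pattern string built by interpolating smaller rules, only the start symbol is compiled, and named groups (section, key, value, comment) expose the terminal-like values. The class and group names are hypothetical, and the ^...$ anchors are applied once to the start symbol instead of to each rule. This only works because the grammar is not recursive; plain string substitution cannot express a rule that refers to itself.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch: each grammar rule is a pattern string, and larger rules
// are composed from smaller ones by interpolation. Fields initialize in order.
public static class IniGrammar
{
    static readonly string Sp = @"\s*";
    static readonly string AllowedChar = "[a-zA-Z0-9_]";
    static readonly string Key = AllowedChar + "+";
    static readonly string Value = $"{AllowedChar}(?:(?:{AllowedChar}|[ ])*{AllowedChar})?";
    static readonly string Comment = @"(?:;|//|#).*";

    // Named groups expose the values after a match. .NET allows the same group
    // name ("comment") in several alternatives.
    static readonly string LineComment = $"{Sp}(?<comment>{Comment})";
    static readonly string Section = $@"{Sp}\[{Sp}(?<section>{Value}){Sp}\]{Sp}(?<comment>{Comment})?";
    static readonly string Item = $"{Sp}(?<key>{Key}){Sp}={Sp}(?<value>{Value}){Sp}(?<comment>{Comment})?";

    // Start symbol: empty line | comment line | section header | key/value item.
    public static readonly Regex Line =
        new Regex($"^(?:{Sp}|{LineComment}|{Section}|{Item})$");
}
```

After a successful match, Groups["key"].Success or Groups["section"].Success tells you which alternative matched, because groups in the alternatives that were not taken do not participate.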

Alternative to RegEx

I am currently passing a parameter to a SQL string like this -
grid=0&
And I am using a RegEx to get the 0 value like so-
Match match = Regex.Match(input, @"grid=([A-Za-z0-9\-]+)\&$",
RegexOptions.IgnoreCase);
string grid = match.Groups[1].Value;
which works perfectly.
However as development has progressed it is clear that more parameters will be added to the string like so-
grid=0&hr=3&tb=0
These parameters may come in a different order each time, so the regex I am currently using clearly won't work. I have looked into it and think Split may be an option, but I'm not sure.
What would the best method be and how could I apply it to my current problem?
If you're parsing a query string and looking for an alternative to Regex, there is a specialized class and method for that, which returns a collection of parameters:
string s = "http://www.something.com?grid=0&hr=3&tb=0";
Uri uri = new Uri(s);
var result = HttpUtility.ParseQueryString(uri.Query);
You have to import the System.Web namespace (using System.Web;).
You can access each parameter's value by its key:
foreach (string key in result.Keys)
{
string value = result[key];
// action...
}
Regexes can still be used here. Consider adding another capture group to capture the parameter name, and then looping over all of the results using Matches rather than Match, or calling Match multiple times.
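A sketch of both ideas, the Matches loop suggested above and the Split option mentioned in the question, each collecting the parameters into a dictionary so their order in the string does not matter. QueryParams and its methods are hypothetical; keys are compared case-insensitively to mirror RegexOptions.IgnoreCase in the original code. Unlike HttpUtility.ParseQueryString, neither version URL-decodes values, and a repeated key keeps its last value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QueryParams
{
    // Regex approach: one match per name=value pair, iterated with Matches().
    static readonly Regex Pair = new Regex(@"(?<name>[^=&]+)=(?<value>[^&]*)");

    public static Dictionary<string, string> ParseWithRegex(string input)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in Pair.Matches(input))
        {
            result[m.Groups["name"].Value] = m.Groups["value"].Value;
        }
        return result;
    }

    // Split approach: split on '&', then split each pair once on '='.
    public static Dictionary<string, string> ParseWithSplit(string input)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string pair in input.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] parts = pair.Split(new[] { '=' }, 2);
            result[parts[0]] = parts.Length == 2 ? parts[1] : "";
        }
        return result;
    }
}
```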

String-parsing-fu: Can you help me find a way to retrieve this value?

I need to somehow detect whether there is a parent OU value, and if there is, retrieve it.
For example, here there is no parent:
LDAP://servera/OU=Santa Cruz,DC=contoso,DC=com
But here, there is a parent:
LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com
So I would need to retrieve that "Ventas" string.
Another example:
LDAP://servera/OU=Contabilidad,OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com
I would need to retrieve that "Ventas" string as well.
Any suggestions on how to tackle this?
string ldap = "LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com";
Match match = Regex.Match(ldap, @"LDAP://\w+/OU=(?<toplevelou>\w+?),OU=");
if(match.Success)
{
Console.WriteLine(match.Result("${toplevelou}"));
}
I'd find the first occurrence of OU=... and get its value. Then I'd check whether there is another occurrence after it. If so, return the value I've got. If not, return whatever you want when there's no parent (String.Empty, null, or whatever).
You could also use a regular expression like this:
var regex = new Regex(@"OU=(.*?),");
var matches = regex.Matches(ldapString);
Then check how many matches there are. If >1 return the captured value from the first match.
Update
The regex above needs to be improved to allow the case where there's an escaped comma (\,) in the LDAP string. Maybe something like:
var regex = new Regex(@"OU=((.*?(\\\,)+?)+?),");
That may be broken, and there may be a simpler way to do the same thing. I'm not a regex wizard.
Another Update
Per Kimberly's comment below, the regex should be @"OU=((?:.*?(?:\\\,)*?)+?),".
Call me crazy, but I'd do it this way (hey ma, look, a one-liner!):
var str = "LDAP://servera/OU=Ventas,OU=Santa Cruz,DC=contoso,DC=com";
var result = str.Substring(str.LastIndexOf('/') + 1).Split(',')
.Select(s => s.Split('='))
.Where(a => a[0] == "OU")
.Select(a => a[1])
.Reverse().Skip(1).FirstOrDefault();
result is either null or has the string you want. This will work no matter how many OUs are in there and return the second-to-last one, as long as the format of the string is valid to begin with.
Update: possible improvements:
The above will not work correctly if your DN contains an escaped forward slash or an escaped comma.
To fix both of these you need to use regular expressions. Change:
str.Substring(str.LastIndexOf('/') + 1).Split(',')
to:
Regex.Split(Regex.Split(str, "(?<!\\\\)/").Last(), "(?<!\\\\),")
This isolates the DN by taking the last part of str after splitting on forward slashes, and then splits the DN into parts on commas. In both cases, negative lookbehind is used to make sure that the slashes/commas are not escaped.
Not as pretty, I know. But it's still a one-liner (yay!) and it still lets you use LINQ further down to handle multiple OUs any way you choose.
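A self-contained version of the escape-aware one-liner, wrapped in a hypothetical GetParentOu helper so it can be checked against the three example paths from the question. It adds two small assumptions of its own: each RDN is split only on its first '=', and parts without an '=' are skipped. Returned values keep their escaped form (e.g. "Ventas\, Norte").

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LdapPath
{
    // Returns the second-to-last OU value (the parent OU), or null if there is none.
    public static string GetParentOu(string adsPath)
    {
        // The DN is whatever follows the last '/' that is not escaped as "\/".
        string dn = Regex.Split(adsPath, @"(?<!\\)/").Last();

        return Regex.Split(dn, @"(?<!\\),")              // RDNs, ignoring escaped commas
            .Select(rdn => rdn.Split(new[] { '=' }, 2))  // "OU=Ventas" -> ["OU", "Ventas"]
            .Where(kv => kv.Length == 2 && kv[0] == "OU")
            .Select(kv => kv[1])
            .Reverse()                                   // innermost OU last -> first
            .Skip(1)                                     // skip the top-level OU
            .FirstOrDefault();
    }
}
```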

Conditional Regex Replace in C# without MatchEvaluator

So, I'm trying to make a program to rename some files. For the most part, I want them to look like this:
[Testing]StupidName - 2[720p].mkv
But I would like to be able to change the format if desired. If I use MatchEvaluators, you would have to recompile every time. That's why I don't want to use a MatchEvaluator.
The problem I have is that I don't know how, or whether it's possible, to tell Replace to include a string only if a group was found. The only syntax for this I have ever seen was something like (?<group>:data), but I can't get it to work. If anyone has an idea, I'm all for it.
EDIT:
Current Capture Regexes =
^(\[(?<FanSub>[^\]\)\}]+)\])?[. _]*(?<SeriesTitle>[\w. ]*?)[. _]*\-[. _]*(?<EpisodeNumber>\d+)[. _]*(\-[. _]*(?<EpisodeName>[\w. ]*?)[. _]*)?([\[\(\{](?<MiscInfo>[^\]\)\}]*)[\]\)\}][. _]*)*[\w. ]*(?<Extension>\.[a-zA-Z]+)$
^(?<SeriesTitle>[\w. ]*?)[. _]*[Ss](?<SeasonNumber>\d+)[Ee](?<EpisodeNumber>\d+).*?(?<Extension>\.[a-zA-Z]+)$
^(?<SeriesTitle>[\w. ]*?)[. _]*(?<SeasonNumber>\d)(?<EpisodeNumber>\d{2}).*?(?<Extension>\.[a-zA-Z]+)$
Current Replace Regex = [${FanSub}]${SeriesTitle} - ${EpisodeNumber} [${MiscInfo}]${Extension}
Using Regex.Replace on the file TestFile 101.mkv, I get []TestFile - 1[].mkv. What I want is for [] to be included only if the group FanSub or MiscInfo was found.
I can solve this with a MatchEvaluator because then I actually get to compile a function. But that would not be an easy solution for users of the program. The only other idea I have is to write my own Regex.Replace function that accepts special syntax.
It sounds like you want to be able to specify an arbitrary format dynamically rather than hard-code it into your code.
Perhaps one solution is to break your filename parts into specific groups then pass in a replacement pattern that takes advantage of those group names. This would give you the ability to pass in different replacement patterns which return the desired filename structure using the Regex.Replace method.
Since you didn't explain the categories of your filename I came up with some random groups to demonstrate. Here's a quick example:
string input = "Testing StupidName Number2 720p.mkv";
string pattern = @"^(?<Category>\w+)\s+(?<Name>.+?)\s+Number(?<Number>\d+)\s+(?<Resolution>\d+p)(?<Extension>\.mkv)$";
string[] replacePatterns =
{
"[${Category}]${Name} - ${Number}[${Resolution}]${Extension}",
"${Category} - ${Name} - ${Number} - ${Resolution}${Extension}",
"(${Number}) - [${Resolution}] ${Name} [${Category}]${Extension}"
};
foreach (string replacePattern in replacePatterns)
{
Console.WriteLine(Regex.Replace(input, pattern, replacePattern));
}
As shown in the sample, named groups in the pattern, specified as (?<Name>pattern), are referred to in the replacement pattern by ${Name}.
With this approach you would need to know the group names beforehand and pass these in to rearrange the pattern as needed.
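The asker's last idea, a Replace that accepts special syntax, stays fairly small if the MatchEvaluator is written once inside the program rather than by its users, who only supply template strings. A minimal sketch, assuming a made-up {?Group:text?} syntax (this is not a .NET feature): conditional sections are resolved per match, then Match.Result expands the ordinary ${Group} references. ConditionalReplace and the syntax are hypothetical, and nested conditional sections are not supported.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical template syntax (not part of .NET): {?Group:text?} emits "text"
// only when the named group took part in the match. "text" may itself contain
// ordinary ${Group} references; the rest is a normal replacement pattern.
public static class ConditionalReplace
{
    static readonly Regex Conditional =
        new Regex(@"\{\?(?<group>\w+):(?<text>.*?)\?\}");

    public static string Replace(string input, Regex pattern, string template)
    {
        // The MatchEvaluator is compiled once, here; users only supply strings.
        return pattern.Replace(input, match =>
        {
            // 1. Resolve the conditional sections for this particular match.
            string expanded = Conditional.Replace(template, c =>
                match.Groups[c.Groups["group"].Value].Success
                    ? c.Groups["text"].Value
                    : "");
            // 2. Let .NET expand the remaining ${Group} references.
            return match.Result(expanded);
        });
    }
}
```

With a template such as {?FanSub:[${FanSub}]?}${SeriesTitle} - ${EpisodeNumber}{?MiscInfo:[${MiscInfo}]?}${Extension}, the brackets would appear only when FanSub or MiscInfo actually matched.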
