C# Regex Formatting String without Input string - c#

I have the following Regex that is used to match incoming packets:
public static class ProtobufConstants
{
public static Regex ContentTypeNameRegex => new Regex("application/protobuf; proto=(.*?)");
}
I also need to write outgoing packet strings in the same format, i.e. create strings such as "application/protobuf; proto=mynamespace.class1", ideally by reusing the same regex definition, new Regex("application/protobuf; proto=(.*?)").
To keep this code in one place, is it possible to use this regex as a template and replace the (.*?) group with a string (in the example above I would like to substitute "mynamespace.class1")?
I see there is a Regex.Replace(string input, string replacement), but since ContentTypeNameRegex already defines the format I don't have an input as such; I just want to format a string, and I'm not sure what to pass as the input, if anything.
Is it possible to use the regex in this manner, or do I need to fall back to string.Format?

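If you just want to replace the matched group with something else, you can change your pattern to:
(application/protobuf; proto=)(.*)
(A trailing lazy (.*?) always matches the empty string, so the second group is made greedy here.)
That way, you can replace it by doing something like:
Regex re = ContentTypeNameRegex;
string replacement = "mynamespace.class1";
// ${1} keeps the group reference unambiguous even if the replacement starts with a digit
string result = re.Replace(input, "${1}" + replacement);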

Use Regex.Replace, but use the match evaluator to handle your formatting needs. Here is an example which simply replaces a slash with a dash and vice versa, based on what has been matched.
var text = "001-34/323";
var result = Regex.Replace(text, "[-/]", me => me.Value == "-" ? "/" : "-");
Result
001/34-323
You can do the same with your input, to decide to change it or send it on as is.
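One way to keep the format in a single place without running Regex.Replace on an artificial input is to share the literal prefix between the parser and the formatter. The following is a minimal sketch under that assumption; the class and member names (ProtobufContentType, Format, TryParse) are hypothetical and not part of the original code. It also anchors the pattern and uses a greedy (.+), because a lazy (.*?) at the end of a pattern captures an empty string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProtobufContentType
{
    // Single source of truth for the literal part of the header value.
    private const string Prefix = "application/protobuf; proto=";

    // Built once; Regex.Escape protects any metacharacters in the prefix.
    private static readonly Regex Pattern =
        new Regex("^" + Regex.Escape(Prefix) + "(.+)$");

    // Outgoing direction: plain concatenation, no regex involved.
    public static string Format(string typeName) => Prefix + typeName;

    // Incoming direction: true plus the captured type name on a match.
    public static bool TryParse(string contentType, out string typeName)
    {
        Match m = Pattern.Match(contentType);
        typeName = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}

public static class Program
{
    public static void Main()
    {
        string header = ProtobufContentType.Format("mynamespace.class1");
        Console.WriteLine(header);

        if (ProtobufContentType.TryParse(header, out string name))
            Console.WriteLine(name);
    }
}
```

The regex stays responsible only for parsing, and string concatenation (or string.Format) handles output, so neither direction can drift from the shared prefix.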

Related

C# Extract part of the string that starts with specific letters

I have a string which I extract from an HTML document like this:
var elas = htmlDoc.DocumentNode.SelectSingleNode("//a[@class='a-size-small a-link-normal a-text-normal']");
if (elas != null)
{
//
_extractedString = elas.Attributes["href"].Value;
}
The HREF attribute contains this part of the string:
gp/offer-listing/B002755TC0/
And I'm trying to extract the B002755TC0 value, but the string varies in length, so I cannot simply use C#'s Substring method with fixed positions to extract it.
Instead, I was wondering whether there's a clever way to do this, perhaps by matching a known part of the string that precedes the value I'm looking for?
For example, I know for a fact that every href has the structure shown above, so I could simply match this keyword:
offer-listing/
So I would find this keyword and then extract the following part of the string, B002755TC0, up to the next "/" character?
Can someone help me out with this?
This is a perfect job for a regular expression :
string text = "gp/offer-listing/B002755TC0/";
Regex pattern = new Regex(@"offer-listing/(\w+)/");
Match match = pattern.Match(text);
string whatYouAreLookingFor = match.Groups[1].Value;
Explanation : we just match the exact pattern you need.
'offer-listing/'
followed by any combination of (at least one) 'word characters' (letters, digits and underscore; hyphens are not included),
followed by a slash.
The parentheses () mean 'capture this group' (so we can extract it later with match.Groups[1]).
EDIT: if you want to extract also from this : /dp/B01KRHBT9Q/
Then you could use this pattern :
Regex pattern = new Regex(@"/(\w+)/$");
which will match both this string and the previous. The $ stands for the end of the string, so this literally means :
capture the characters in between the last two slashes of the string
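The snippets above read match.Groups[1] unconditionally, which yields an empty string when the pattern is not found. A small sketch that makes the "no match" case explicit might look like this; the class and method names (OfferListing, ExtractId) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OfferListing
{
    private static readonly Regex IdPattern = new Regex(@"offer-listing/(\w+)/");

    // Returns the captured ID, or an empty string when the href
    // does not contain an "offer-listing/<id>/" segment.
    public static string ExtractId(string href)
    {
        Match match = IdPattern.Match(href);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(OfferListing.ExtractId("gp/offer-listing/B002755TC0/")); // B002755TC0
        Console.WriteLine(OfferListing.ExtractId("/dp/B01KRHBT9Q/").Length);       // 0
    }
}
```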
Though there is already an accepted answer, I thought I'd share another solution that doesn't use Regex. Find the position of your pattern in the input and add its length; the wanted text starts at that index. To find the end, search for the first "/" after the beginning of the wanted text:
string input = "gp/offer-listing/B002755TC0/";
string pat = "offer-listing/";
int begining = input.IndexOf(pat)+pat.Length;
int end = input.IndexOf("/",begining);
string result = input.Substring(begining,end-begining);
If your desired output is always the last piece, you could also use split and get the last non-empty piece:
string result2 = input.Split(new string[]{"/"},StringSplitOptions.RemoveEmptyEntries)
.ToList().Last();

Cutting text to specific length preserving the words

I have the following text:
Test some text. Now here is some new realylonglonglong text
And I need to cut it to 50 characters, but without cutting words in half. So, the desired result is:
Test some text. Now here is some new ...
I am looking only for solution using regular expression replace. The following regular expression:
^.{0,50}(?= |$)
matches:
Test some text. Now here is some new
but I failed transforming it for use in replace function.
In my real case I have SQL CLR function called [dbo].[RegexReplace] and I am calling it like this:
SELECT [dbo].[RegexReplace](@TEST, '^.{0,50}(?= |$)', '...')
Its C# definition is:
public static string Replace(SqlString sqlInput, SqlString sqlPattern, SqlString sqlReplacement)
{
string input = (sqlInput.IsNull) ? string.Empty : sqlInput.Value;
string pattern = (sqlPattern.IsNull) ? string.Empty : sqlPattern.Value;
string replacement = (sqlReplacement.IsNull) ? string.Empty : sqlReplacement.Value;
return Regex.Replace(input, pattern, replacement);
}
That's why I want to do this with a regular expression replace function.
This is the regex you want:
string result = Regex.Replace("Test some text. Now here is some new realylonglonglong text", "(?=.{50,})(^.{0,50}) .*", "$1...");
so look for ^(?=.{50,})(.{0,50}) .* and replace it with $1...
Explanation... You are looking for texts that are AT LEAST 50 characters long, because shorter texts don't need shortening, so (?=.{50,}) (but note that this won't capture anything). Then you look for the first 0...50 characters (.{0,50}) followed by a space , followed by anything else .*. You'll replace all of this with the first 0...50 characters ($1) followed by ...
I need the (?=.{50,}) because otherwise the regex would replace Test test with Test..., replacing from the first space.
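To see the pattern in action, here is a small sketch that wraps it in a helper and runs it on the sample text and on a short text. The class and method names (TextTruncator, Truncate) are hypothetical; the pattern and replacement are the ones given in the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextTruncator
{
    // Lookahead: only texts of at least 50 characters are touched.
    // Group 1: up to 50 leading characters that end just before a space.
    public static string Truncate(string text) =>
        Regex.Replace(text, @"^(?=.{50,})(.{0,50}) .*", "$1...");
}

public static class Program
{
    public static void Main()
    {
        string longText = "Test some text. Now here is some new realylonglonglong text";

        // Prints: Test some text. Now here is some new...
        Console.WriteLine(TextTruncator.Truncate(longText));

        // Shorter than 50 characters, so it is returned unchanged: Test test
        Console.WriteLine(TextTruncator.Truncate("Test test"));
    }
}
```

Note that the space before the cut is consumed by the match, so the result ends in "new..." rather than "new ...".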

Check if an expression is a match with regex

In C# I have two strings: [I/text] and [S/100x20].
So, the first one is [I/ followed by text and ending in ].
And the second is [S/ followed by an integer, then x, then another integer, and ending in ].
I need to check whether a given string matches one of these formats. I tried the following:
(?<word>.*?) and (?<word>[0-9]x[0-9])
But this does not seem to work and I am missing the [I/...] and [S/...] parts.
How can I do this?
This should do nicely:
Regex rex = new Regex(@"\[I/[^\]]+\]|\[S/\d+x\d+\]");
If the text in [I/text] is supposed to include only alphanumeric characters, then @Oleg's use of \w instead of [^\]] would be better. Also, using + means there must be at least one character from the preceding character class, while * would allow the class to be empty. Adjust as needed.
And use:
string testString1 = "[I/text]";
if(rex.IsMatch(testString1))
{
// should match..
}
string testString2 = "[S/100x20]";
if(rex.IsMatch(testString2))
{
// should match..
}
The following regex does it. Each alternative matches one complete bracketed token:
"(\[I/\w+\])|(\[S/\d+x\d+\])"
(\[I/\w+\]) matches the [I/text] form.
(\[S/\d+x\d+\]) matches the [S/100x20] form.
use http://regexr.com?34543 to play with your expressions
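Neither pattern above is anchored, so IsMatch also succeeds when one of the formats appears anywhere inside a longer string. If the whole string has to be exactly one of the two formats, a sketch with ^ and $ anchors could look like this; the names TagFormat and IsValid are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagFormat
{
    // ^ and $ force the entire input to be one of the two formats.
    private static readonly Regex Pattern =
        new Regex(@"^(?:\[I/[^\]]+\]|\[S/\d+x\d+\])$");

    public static bool IsValid(string candidate) => Pattern.IsMatch(candidate);
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(TagFormat.IsValid("[I/text]"));          // True
        Console.WriteLine(TagFormat.IsValid("[S/100x20]"));        // True
        Console.WriteLine(TagFormat.IsValid("[S/100x]"));          // False
        Console.WriteLine(TagFormat.IsValid("see [I/text] here")); // False
    }
}
```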

A string replace function with support of custom wildcards and escaping these wildcards in C#

I need to write a string replace function with custom wildcards support. I also should be able to escape these wildcards. I currently have a wildcard class with Usage, Value and Escape properties.
So let's say I have a global list called Wildcards. Wildcards has only one member added here:
Wildcards.Add(new Wildcard
{
Usage = @"\Break",
Value = Environment.NewLine,
Escape = @"\\Break"
});
So I need a CustomReplace method to do the trick. I should replace the specified parameter in a given string with another one just like the string.Replace. The only difference here that it must use my custom wildcards.
string test = CustomReplace("Hi there! What's up?", "! ", "!\\Break");
// Value of the test variable should be: "Hi there!\r\nWhat's up?"
// Because \Break is specified in a custom wildcard in Wildcards
// But if I use the value of the wildcard's Escape member,
// it should be replaced with the value of Usage member.
test = CustomReplace("Hi there! What's up?", "! ", "!\\\\Break");
// Value of the test variable should be: "Hi there!\\BreakWhat's up?"
My current method doesn't support escape strings.
It also can't be good when it comes to performance since I call string.Replace two times and each one searches the whole string, I guess.
// My current method. Has no support for escape strings.
string CustomReplace(string text, string oldValue, string newValue)
{
string done = text.Replace(oldValue, newValue);
foreach (Wildcard wildcard in Wildcards)
{
// Doing this:
// done = done.Replace(wildcard.Escape, wildcard.Usage);
// ...would cause trouble when Escape contains Usage.
done = done.Replace(wildcard.Usage, wildcard.Value);
}
return done;
}
So, do I have to write a replace method which searches the string char by char, with the logic to find and separate both Usage and Escape values, and then replaces Escape with Usage while replacing Usage with another given string?
Or do you know of an existing one?
Can I use regular expressions in this scenario?
If I can, how? (I have no experience with this; a pattern would be nice.)
If I do, would it be faster or slower than char-by-char searching?
Sorry for the long post, I tried to keep it clear and sorry for any typos and such; it's not my primary language. Thanks in advance.
You can try this:
public string CustomReplace(string text, string oldValue, string newValue)
{
string done = text.Replace(oldValue, newValue);
if (Wildcards.Count == 0)
return done; // Nothing to expand
var builder = new StringBuilder();
foreach (var wildcard in Wildcards)
{
// Try the Escape form first so an escaped wildcard is never read as a plain one
builder.AppendFormat("({0}|{1})|", Regex.Escape(wildcard.Escape),
Regex.Escape(wildcard.Usage));
}
builder.Length = builder.Length - 1; // Remove the last '|' character
return Regex.Replace(done, builder.ToString(), WildcardEvaluator);
}
private string WildcardEvaluator(Match match)
{
// An escaped wildcard collapses to its literal Usage text
var escaped = Wildcards.Find(w => w.Escape == match.Value);
if (escaped != null)
return escaped.Usage;
var wildcard = Wildcards.Find(w => w.Usage == match.Value);
if (wildcard != null)
return wildcard.Value;
else
return match.Value;
}
I think this is the easiest and fastest solution as there is only one Replace method call for all wildcards.
So if you are happy to just use Regex to fulfil your needs, then you should check out this link. It has some great info on using regular expressions in .NET. The website also has loads of examples of how to construct Regex patterns for many different needs.
A basic example of a Replace on a string with wildcards might look like this...
string input = "my first regex replace";
string result = System.Text.RegularExpressions.Regex.Replace(input, "rep...e", "result");
//result is now "my first regex result"
Notice how the second argument of the Replace function takes a regex pattern string. In this case, the dots act as wildcard characters: each one means "match any single character".
Hopefully this will help you get what you need.
If you define a pattern for both your wildcard and your escape method, you can create a Regex which will find all the wildcards in your text. You can then use a MatchEvaluator to replace them.
class Program
{
static Dictionary<string, string> replacements = new Dictionary<string, string>();
static void Main(string[] args)
{
replacements.Add("\\Break", Environment.NewLine);
string template = @"This is an \\Break escaped newline and this should \Break contain a newline.";
// (?<=($|[^\\])(\\\\){0,}) will handle double escaped items
string outcome = Regex.Replace(template, @"(?<=($|[^\\])(\\\\){0,})\\\w+\b", ReplaceMethod);
}
public static string ReplaceMethod(Match m)
{
string replacement = null;
if (replacements.TryGetValue(m.Value, out replacement))
{
return replacement;
}
else
{
//return string.Empty?
//throw new FormatException()?
return m.Value;
}
}
}
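For the behaviour the question asks for (Usage expands to Value, Escape collapses to the literal Usage text), a single-pass replacement can be sketched as follows. This is only an illustration under stated assumptions: the Wildcard class mirrors the three properties described in the question, while WildcardExpander and Expand are hypothetical names. Escape forms are listed first in the alternation so that an escaped wildcard is matched as a whole.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public sealed class Wildcard
{
    public string Usage { get; set; } = "";
    public string Value { get; set; } = "";
    public string Escape { get; set; } = "";
}

public static class WildcardExpander
{
    public static string Expand(string text, IReadOnlyList<Wildcard> wildcards)
    {
        if (wildcards.Count == 0)
            return text;

        // All Escape forms come before all Usage forms in the alternation.
        IEnumerable<string> alternatives = wildcards
            .Select(w => Regex.Escape(w.Escape))
            .Concat(wildcards.Select(w => Regex.Escape(w.Usage)));
        string pattern = string.Join("|", alternatives);

        // One pass over the text; replaced text is not rescanned.
        return Regex.Replace(text, pattern, m =>
        {
            foreach (Wildcard w in wildcards)
            {
                if (m.Value == w.Escape) return w.Usage; // escaped: keep literal
                if (m.Value == w.Usage) return w.Value;  // plain: expand
            }
            return m.Value;
        });
    }
}

public static class Program
{
    public static void Main()
    {
        var wildcards = new List<Wildcard>
        {
            new Wildcard { Usage = @"\Break", Value = Environment.NewLine, Escape = @"\\Break" }
        };

        Console.WriteLine(WildcardExpander.Expand(@"Hi there!\BreakWhat's up?", wildcards));
        Console.WriteLine(WildcardExpander.Expand(@"Hi there!\\BreakWhat's up?", wildcards));
    }
}
```

Because the evaluator's return value is inserted literally and never scanned again, the Usage text produced from an Escape cannot be expanded a second time, which is the problem the two-step string.Replace approach runs into.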

Multifunction RegEx for parsing JCL variables - out of working solutions

I'm a bit lost creating a RegEx under C#.NET.
I'm doing something like parser, so I use Regex.Replace to search text for certain "variables" and replace them with their "values".
Each variable starts with an ampersand ("&") and ends with an ampersand (the beginning of another variable) or a dot.
Each variable (as well as the text surrounding the variables) can only consist of alphanumeric characters and certain "special" characters, namely "$", "#", "@" and "-".
Neither the variables nor the rest of the text can contain space characters (" ").
Now, the problem is that I'm trying to figure out a RegEx that replaces one possible ending character (".") while not replacing the other possible ending character ("&").
This happens to be quite an issue:
"&"+variable+"[^A-Za-z0-9#@$]" does what I want, except that it also replaces "&", which is not acceptable.
"&"+variable+"(.)?\b" replaces dot, but only if followed by literal character - not if it's followed by \&\##\$\- and that could occur, so this doesn't work either.
"&"+variable+"(.)?(?!A-Za-z0-9)" does exactly what I want for the ending characters, except that it doesn't recognize the true end of a variable: a search-and-replace for "&DEN" also replaces that part of another variable called "&DENV", of which "&DEN" is a substring. This would create false/misleading results, which is totally unacceptable.
These were all the possibilities I could think of (and search of); is it possible to do the task I require with one RegEx at all? Under C#.NET RegEx parser?
Just to illustrate desired function:
string variable="DEN";
string replaceWith="28";
string replText;
string regex = "<desired regex>";
replText = Regex.Replace(replText, "&"+variable+regex, replaceWith);
replText="&DEN";
=> replaced => repltext=="28"
replText="&DENV"
=> not replaced => repltext=="&DENV"
replText="&DEN&DEN"
=> replaced => repltext=="2828"
replText="&DEN&DENV"
=> replaced, not replaced => repltext=="28&DENV"
replText="&DEN.anything"
=> replaced and dot removed => repltext=="28anything"
replText="&DEN..anything"
=> replaced and first dot removed => repltext=="28.anything"
variable could also be like "#DE#N-$".
The following works correctly on all of your examples. I assumed that a variable &FOO should only be replaced if it's followed by ., &, or end-of-string $. If it's followed by anything else, it's not replaced.
In order to match but not capture a terminating &, I used a lookahead assertion (?=&). Assertions force the string to match the regex, but they don't consume any characters, so those characters aren't replaced. Trailing . are still captured and replaced as part of the variable, however.
Finally, a MatchEvaluator is specified to use the captured pattern to do a lookup in the replacements dictionary for the replacement value. If the pattern (variable name) is not found, the text is effectively untouched (the full original capture is returned).
class Program
{
static string ReplaceVariables(Dictionary<string, string> replacements, string input)
{
return Regex.Replace(input, @"&([\w\d$#@-]+)(\.|(?=&)|$)", m =>
{
string replacement = null;
return replacements.TryGetValue(m.Groups[1].Value, out replacement)
? replacement
: m.Groups[0].Value;
});
}
static void Main(string[] args)
{
string[] tests = new[]
{
"&DEN", "&DENV", "&DEN&DEN",
"&DEN&DENV", "&DEN.anything",
"&DEN..anything", "&DEN Foo",
"&DEN&FOO&DEN"
};
var replace = new Dictionary<string, string>
{
{ "DEN", "28" },
{ "FOO", "42" }
};
foreach (var test in tests)
{
Console.WriteLine("{0} -> {1}", test, ReplaceVariables(replace, test));
}
}
}
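The same terminator logic can also be applied to the one-variable-at-a-time call shape used in the question. The sketch below is an assumption-laden illustration, not the answer's code: JclVariables and ReplaceVariable are hypothetical names. The variable name goes through Regex.Escape because it may contain "$" or "#", and the replacement is supplied through an evaluator so that a "$" in the value is not treated as a substitution token.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JclVariables
{
    public static string ReplaceVariable(string text, string variable, string replaceWith)
    {
        // The variable must be followed by a dot (consumed),
        // by "&" (looked at but not consumed), or by the end of the text.
        string pattern = "&" + Regex.Escape(variable) + @"(?:\.|(?=&)|$)";
        return Regex.Replace(text, pattern, _ => replaceWith);
    }
}

public static class Program
{
    public static void Main()
    {
        string[] tests = { "&DEN", "&DENV", "&DEN&DEN", "&DEN&DENV", "&DEN.anything", "&DEN..anything" };
        foreach (string test in tests)
        {
            Console.WriteLine("{0} -> {1}", test, JclVariables.ReplaceVariable(test, "DEN", "28"));
        }
    }
}
```

With variable "DEN" and value "28", this leaves "&DENV" alone, turns "&DEN&DENV" into "28&DENV", and turns "&DEN..anything" into "28.anything", matching the desired behaviour listed in the question.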
Ok, I think I finally found it, using ORs. Regex
(.)?([^A-Za-z0-9#\#\$\&\,\;\:-\<>()\ ]|(?=\&)|\b)
seems to work fine. I'm just posting this in case anyone finds it helpful.
EDIT: sorry, I haven't refreshed the page and thus reacted without knowing there is a better answer provided by Chris Schmich
