C# Regex: find placeholders as substring

I have the following string:
"hello [#NAME#]. nice to meet you. I heard about you via [#SOURCE#]."
The text above contains two placeholders, NAME and SOURCE.
I want to extract these substrings using a regex.
What would the regex pattern be to find the list of these placeholders?
I tried:
string pattern = @"\[#(\w+)#\]";
Result:
hello
NAME
. nice to meet you. I heard about you via
SOURCE
.
What I want is only:
NAME
SOURCE
Sample code:
string tex = "hello [#NAME#]. nice to meet you. I heard about you via [#SOURCE#].";
string pattern = @"\[#(\w+)#\]";
var sp = Regex.Split(tex, pattern);
sp.Dump();

Your regex is working correctly. That is how Regex.Split() is supposed to behave (see the documentation): when the pattern contains a capturing group, the captured text is included in the resulting array alongside the pieces between matches. If what you described is really what you want, you can use something like:
var matches = from Match match in Regex.Matches(text, pattern)
              select match.Groups[1].Value;
If, on the other hand, you want to replace the placeholders using some rules (e.g. using a Dictionary<string, string>), then you could do:
Regex.Replace(text, pattern, m => substitutions[m.Groups[1].Value]);
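To make the two approaches above concrete, here is a minimal sketch that wraps them in helper methods. The class name PlaceholderDemo and the method names FindNames and Fill are made up for illustration, and leaving unknown placeholders untouched is an assumption about the desired behaviour, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class PlaceholderDemo
{
    // Same pattern as in the question: "[#", one or more word characters, "#]".
    private const string Pattern = @"\[#(\w+)#\]";

    // Returns the placeholder names in order of appearance.
    public static List<string> FindNames(string text)
    {
        return Regex.Matches(text, Pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToList();
    }

    // Replaces each placeholder with its value from the dictionary.
    // Assumption: a placeholder with no entry is left unchanged.
    public static string Fill(string text, IDictionary<string, string> substitutions)
    {
        return Regex.Replace(text, Pattern, m =>
            substitutions.TryGetValue(m.Groups[1].Value, out var value) ? value : m.Value);
    }
}
```

Calling PlaceholderDemo.FindNames on the string from the question should give NAME and SOURCE only, because Regex.Matches returns just the matches and not the text between them.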

Try this regex:
\[#([A-Z]+)#\]

^hello (.*?). nice to meet you. I heard about you via (.*?).$
Very simply, the () means you want to capture what's inside, .*? is a lazy ("ungreedy") match that takes as few characters as possible, and . means any character.
If your placeholders are always going to use the [# prefix and the #] suffix, see the other users' posts instead.

Related

Regex Expression for Pseudocode

I am trying to figure out a regex expression for a project, and struggling here.
Here's my sample string:
[link="http://www.cnn.com"]CNN Webpage[/link]
I want to regex.replace the above example to this:
CNN Webpage
I know there is a Regex way to do this. Can anyone help?

I personally prefer named groups when I can use them. They make the regex and the code more readable, and they help with maintenance because captured groups are no longer referenced by index. Group indexes shift whenever you add or remove a preceding capturing group in the regex.
Named groups stay the same for the lifetime of the regex unless you explicitly rename them.
Regex
\[link=["\u201C](?<href>[^"\u201D]+)["\u201D]\](?<title>[^\[]+)\[/link\]
Note: the regex in the linked demo looks slightly different because it runs on a different regex engine, but it is equivalent to the one presented here.
Code
var str = "[link=\"http://www.cnn.com\"]CNN Webpage[/link] OR [link=“http://www.cnn.com”]CNN Webpage[/link]";
var regex = new Regex(@"\[link=[""\u201C](?<href>[^""\u201D]+)[""\u201D]\](?<title>[^\[]+)\[/link\]");
//The ${name} refers to a named capture group in the regex. Makes it a little more readable, and maintainable.
str = regex.Replace(str, "${title}");
Console.WriteLine(str);
Please note that this regex only supports "smart quotes" when they are used properly (an opening quote followed by a closing quote). To handle cases where the quotes might be reversed, you would need something like this:
\[link=["\u201C\u201D](?<href>[^"\u201D\u201C]+)["\u201D\u201C]\](?<title>[^\[]+)\[/link\]
For clarity, the example below shows where this regex is useful. Notice that the last link has its Unicode quotes mixed up: it uses the right double quotation mark (\u201D, ”) on both sides of the URL. This regex handles that input, but the one at the beginning of the post does not.
var str = "[link=\"http://www.cnn.com\"]CNN Webpage[/link] OR [link=“http://www.cnn.com”]CNN Webpage[/link] OR [link=”http://www.cnn.com”]CNN Webpage[/link]";
var regex = new Regex(@"\[link=[""\u201C\u201D](?<href>[^""\u201D\u201C]+)[""\u201D\u201C]\](?<title>[^\[]+)\[/link\]");
//The ${name} refers to a named capture group in the regex. Makes it a little more readable, and maintainable.
str = regex.Replace(str, "${title}");

Use capturing groups to capture the http link and the content of [link] tag.
Regex:
\[link="([^"]*)"\]([^\[\]]*)\[\/link]
Replacement string:
$2

\[link(="[^"]+")\]([^\[]+)\[\/link\]
Try this. Replace with <a href$1 target="_blank">$2</a>. See the demo:
http://regex101.com/r/kP8uF5/18
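The answers above can be combined into one small sketch that uses named groups both to strip the tag and to turn it into an HTML anchor. The class name LinkTagDemo and its methods are hypothetical, only straight double quotes are handled, and the URL and title are inserted without HTML encoding, which you would probably want to add for untrusted input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class LinkTagDemo
{
    // href: everything between the straight double quotes.
    // title: everything up to the next '[' (so nested tags are not supported).
    private static readonly Regex LinkTag = new Regex(
        @"\[link=""(?<href>[^""]+)""\](?<title>[^\[]+)\[/link\]");

    // Keeps only the link text, as asked in the question.
    public static string ToTitle(string input)
    {
        return LinkTag.Replace(input, "${title}");
    }

    // Builds an HTML anchor from both named groups.
    // Assumption: the captured values are safe to emit without encoding.
    public static string ToAnchor(string input)
    {
        return LinkTag.Replace(input, "<a href=\"${href}\" target=\"_blank\">${title}</a>");
    }
}
```

For the sample string in the question, ToTitle should return CNN Webpage, and ToAnchor should return an anchor element that points at http://www.cnn.com.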

Why isn't this C# Regex working?

I have the following string (from a large HTML string):
href="/cgi-bin/pin.cgi?pin=94841&sid=9548.1386389012.v1"><
And here is my code:
var sids = Regex.Matches( htmlCode, "sid=(.)\">" );
I'm not pulling back any results. Is my Regex correct?

This is what it should be:
var str = @"href=""/cgi-bin/pin.cgi?pin=94841&sid=9548.1386389012.v1"">";
var sid = Regex.Match(str, @"sid=([^""]*)");
Console.WriteLine(sid.Groups[1].Value);
What you originally posted did not work because "." is a wildcard that matches exactly one character, so (.) captured a single character and the pattern then required "> immediately after it. Unbounded wildcards such as .* have the opposite problem: they are hard to stop before the end of the line, so prefer a negated character class like [^"]* when you know which character ends the value.

. matches only a single character. To match multiple characters, use the * or + quantifier: (.+), or preferably the non-greedy version: (.+?)
Use a @"verbatim string literal" for regular expressions where possible.
var sids = Regex.Matches(htmlCode, @"sid=(.+?)""");
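As a rough sketch of how the captured values could be read back from Regex.Matches, the helper below collects every sid in a piece of HTML. It uses the negated character class from the other answer instead of the lazy quantifier, and the class and method names (SidDemo, FindSids, OriginalPatternMatches) are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class SidDemo
{
    // Captures everything after "sid=" up to, but not including, the next double quote.
    private static readonly Regex SidPattern = new Regex(@"sid=([^""]+)");

    // Returns the value of group 1 for every match in the input.
    public static List<string> FindSids(string html)
    {
        return SidPattern.Matches(html)
                         .Cast<Match>()
                         .Select(m => m.Groups[1].Value)
                         .ToList();
    }

    // The pattern from the question, kept for comparison:
    // one character after "sid=", then a quote and '>' straight away.
    public static bool OriginalPatternMatches(string html)
    {
        return Regex.IsMatch(html, "sid=(.)\">");
    }
}
```

On the string from the question, FindSids should return the single value 9548.1386389012.v1, while OriginalPatternMatches returns false because the sid is longer than one character.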

I think you are pretty close. Consider the following minor change to your regex...
sid=.*?\">
Good Luck!

Replacing text with RegEx and C# isn't working the way I need it to

I’m looking for a way to go through a string and replace all instances where the second and third characters will always be different but the rest will be the same. For example, if I had:
"ú07ú" to be replaced with "ú07 ú"
"ú1Eú" to be replaced with "ú1E ú"
"ú12ú" to be replaced with "ú12 ú"
I know I should use Regular Expressions, but they baffle me. I’m pretty sure the syntax will be something like:
Content = Regex.Replace(Content, @"ú...", "ú.. ú");
But obviously this isn’t working. Can any RegEx gurus lend a hand please?
Thanks

Looks like you want:
Content = Regex.Replace(Content, @"ú([^ú]+)ú", @"ú$1 ú");
This regex:
ú([^ú]+)ú
Means: match ú, then at least one character that isn't ú (capturing this part), then another ú. If you want it to match exactly two characters in the middle, change [^ú]+ to [^ú]{2}.
Then we replace the whole thing by:
ú$1 ú
Which is: ú, then the captured part of the string, then a space and ú again.
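Here is a small sketch of the exactly-two-characters variant described above. The class and method names are hypothetical. The second method is an assumption about a case the question does not mention: markers that share a ú (for example "ú07ú1Eú"), where a look-ahead avoids consuming the closing ú so that it can also open the next match.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class SeparatorDemo
{
    // ú, exactly two non-ú characters (captured), then ú; a space is inserted before the closing ú.
    public static string InsertSpace(string content)
    {
        return Regex.Replace(content, @"ú([^ú]{2})ú", "ú$1 ú");
    }

    // Variant for back-to-back markers: the closing ú is only checked with a
    // look-ahead and not consumed, so it can start the next match.
    public static string InsertSpaceShared(string content)
    {
        return Regex.Replace(content, @"ú([^ú]{2})(?=ú)", "ú$1 ");
    }
}
```

Both methods turn "ú07ú" into "ú07 ú". They differ on "ú07ú1Eú": the first only changes the first marker, because matches cannot overlap, while the second inserts a space before every closing ú.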

I'm totally unfamiliar with C#, but from a regex perspective you need capturing groups.
"ú..." needs to be "(ú..)(ú)" and "ú.. ú" needs to be "$1 $2" (C# uses the standard $n notation for capturing groups in replacement strings).

[TestMethod]
public void regex_test()
{
    string expr = @"(?<firstThree>.{3})(?<lastOne>.{1})";
    string replace = "${firstThree} ${lastOne}";
    string first = "u84u";
    string firstResult = "u84 u";
    Assert.AreEqual<string>(firstResult, Regex.Replace(first, expr, replace));
}

Convert C# regex Code to Java

I have found this Regex extractor code in C#.
Can someone tell me how this works, and how do I write the equivalent in Java?
// extract songtitle from metadata header.
// Trim was needed, because some stations don't trim the songtitle
fileName =
    Regex.Match(metadataHeader,
        "(StreamTitle=')(.*)(';StreamUrl)").Groups[2].Value.Trim();

This should be what you want.
// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
// Create the regex pattern
Pattern p = Pattern.compile("(StreamTitle=')(.*)(';StreamUrl)");
// Create a matcher that matches the pattern against your input
Matcher m = p.matcher(metadataHeader);
// If we found a match
if (m.find()) {
    // The file name is the second group (the `(.*)` part), trimmed as in the C# original
    fileName = m.group(2).trim();
}

It pulls "MyTitle" from a string such as "StreamTitle='MyTitle';StreamUrl".
The () operators define match groups; there are three in your regex. The second one contains the string of interest and is retrieved with Groups[2].Value.
There are a few very good regex designers out there. The one I use is Rad Software's Regular Expression Designer (www.radsoftware.com.au). It is very useful for figuring out things like this, and it uses the C# regex engine.

C# regex need characters after \player_n\

I need a regex pattern that handles the following.
I get a response from a UDP server. It is a very long string in which each word is separated by \, for example:
\g79g97\g879o\wot87gord\player_0\name0\g6868o\g78og89\g79g79\player_1\name1\gyuvui\yivyil\player_2\name2\g7g87\g67og9o\v78v9i7
I need the strings after \player_n\, so in the example above I would need name0, name1 and name2.
I know this is the second regex question of the day, but I have the book (Mastering Regular Expressions) on order! Thank you.
UPDATE: elusive's regex pattern will suffice, and I can add match(0) to a text box. However, what if I want to add all the matches to the text box?
textBox1.Text += match.Captures[0].ToString(); // this works fine
How do I add all of the match captures to the text box? Sorry for being so lame, this Regex class is brand new to me.

Try this one:
\\player_\d+\\([^\\]+)

I think this test sample can help you:
string inp = @"\g79g97\g879o\wot87gord\player_0\name0\g6868o\g78og89\g79g79\player_1\name1\gyuvui\yivyil\player_2\name2\g7g87\g67og9o\v78v9i7";
string rex = @"[\w]*[\\]player_[0-9]+[\\](?<name>[A-Za-z0-9]*)\b";
Regex re = new Regex(rex);
// Walk through every match in the input
for (Match m = re.Match(inp); m.Success; m = m.NextMatch())
{
    Console.WriteLine(m.Groups["name"]);
}
You can read each player's name from m.Groups["name"].

To get only the player name, you could use:
(?<=\\player_\d+\\)[^\\]+
This (?<=\\player_\d+\\) is called a positive look-behind. It makes sure that the actual match [^\\]+ is preceded by the expression in the parentheses.
In this case, the pattern only works in a few regex engines (.NET luckily being among them), because the look-behind contains a variable-length expression (due to \d+). Most regex engines only support fixed-length look-behind.
In any case, look-behind is not necessarily the best approach to this problem; match groups are simpler and easier to read.
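For the follow-up in the question (adding all matches to the text box, not just the first), a minimal sketch using the capture-group pattern could look like this. The class and method names are made up for illustration, and the comma separator is an arbitrary choice.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class PlayerNameDemo
{
    // "\player_", one or more digits, "\", then the name (everything up to the next backslash).
    private static readonly Regex PlayerName = new Regex(@"\\player_\d+\\([^\\]+)");

    // Returns every player name found in the server response, in order.
    public static List<string> FindNames(string response)
    {
        return PlayerName.Matches(response)
                         .Cast<Match>()
                         .Select(m => m.Groups[1].Value)
                         .ToList();
    }

    // Joins all names into one string that can be assigned to a text box.
    public static string JoinNames(string response)
    {
        return string.Join(", ", FindNames(response));
    }
}
```

With the textBox1 control from the question, usage would be something like textBox1.Text = PlayerNameDemo.JoinNames(response); where response is the string received from the server.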
