how to build a regex with square brackets

I need to build a regex to find all strings between [" and ",
There are multiple occurrences of both of these delimiters, and I want all the content between them. Help please?
Here is an example: http://pastebin.com/crFDit2N

You mean a string such as [" my beautiful string ", ?
Then it sounds like this simple regex:
\[".*?",
To get all the strings in C#, you can do something like this:
using System;
using System.Text.RegularExpressions;
using System.Collections.Specialized;
class Program {
static void Main() {
string s1 = @" ["" my beautiful string "", ["" my second string "", ";
var resultList = new StringCollection();
try {
var myRegex = new Regex(@"\["".*?"",", RegexOptions.Multiline);
Match matchResult = myRegex.Match(s1);
while (matchResult.Success) {
resultList.Add(matchResult.Groups[0].Value);
Console.WriteLine(matchResult.Groups[0].Value);
matchResult = matchResult.NextMatch();
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
Console.WriteLine("\nPress Any Key to Exit.");
Console.ReadKey();
} // END Main
} // END Program

You can use this regex:
(?<=\[").*?(?=",)
It uses look-behind and look-ahead positive assertions to check that the match is preceded by [" and followed by ",.
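
A minimal sketch of how the lookaround pattern could be applied to collect every value. The names `LookaroundDemo` and `ExtractBetween` are made up for illustration, and the sample input is an assumption modeled on the strings discussed in this thread, not the poster's real data.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: returns the text found between [" and ",
    // The delimiters are not part of each match because lookarounds
    // assert their presence without consuming them.
    public static List<string> ExtractBetween(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(?<=\["").*?(?="",)"))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Assumed sample input with two delimited values.
        string input = @" ["" my beautiful string "", ["" my second string "", ";
        foreach (string value in ExtractBetween(input))
        {
            Console.WriteLine("<" + value + ">");
        }
    }
}
```

Compared with a pattern that matches the delimiters themselves, this variant needs no capture group to get at the inner text.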

@Szymon and @zx81:
Be careful: there can be a problem with your regexes (depending on xhammer's needs). If the string is, for example:
["anything you want["anything you want",anything you want
Your regex will catch ["anything you want["anything you want", and not ["anything you want",
To solve this problem, you can use [^\[] instead of the . in each regex.
The best way to see if a regex works for your needs is to test it in an online regex tester.
(PS: Even this solution isn't perfect if the string itself can contain '[', but I don't see how to solve that case in a single regex.)
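
To make the difference concrete, here is a small sketch that runs both patterns against the problem string from this answer. `NestedBracketDemo` and `FirstMatch` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedBracketDemo
{
    // Hypothetical helper: returns the first match of the pattern, or "" if there is none.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        string input = @"[""anything you want[""anything you want"",anything you want";

        // Lazy dot: the match starts at the first [" and runs to the first ",
        // so it swallows the second [" on the way.
        Console.WriteLine(FirstMatch(input, @"\["".*?"","));

        // Negated class: the match cannot cross a second [,
        // so it starts at the inner [" instead.
        Console.WriteLine(FirstMatch(input, @"\[""[^\[]*?"","));
    }
}
```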

Related

c# Replace text within curly brackets, including the curly brackets

I'm trying to replace a string enclosed in curly brackets.
If I use the Replace method provided by the Regex class and I don't specify the curly brackets, the string is found and replaced correctly, but if I do specify the curly brackets like this: {{FullName}}, the text is left untouched.
var pattern = "{{" + keyValue.Key + "}}";
docText = new Regex(pattern, RegexOptions.IgnoreCase).Replace(docText, keyValue.Value);
Take this string as an example:
Dear {{FullName}}
I want to replace it with John, so that the text ends up like this:
Dear John.
How can I express the regex so that the string is found and replaced correctly?

You don't need a regular expression if the key is just a string. Just replace "{{FullName}}" with "John". Example:
string template = "Dear {{FullName}}";
string result = template.Replace("{{" + keyValue.Key + "}}", keyValue.Value);
Edit: addressing concerns that this doesn't work...
The following is a complete example. You can run it at https://dotnetfiddle.net/wnIkvf
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
var keyValue = new KeyValuePair<string,string>("FullName", "John");
string docText = "Dear {{FullName}}";
string result = docText.Replace("{{" + keyValue.Key + "}}", keyValue.Value);
Console.WriteLine(result);
}
}

var keyValue = new KeyValuePair<string,string>("FullName", "John");
var pattern = "{{" + keyValue.Key + "}}";
Console.WriteLine(new Regex(Regex.Escape(pattern), RegexOptions.IgnoreCase).Replace("Dear {{FullName}}", keyValue.Value));
Output:
Dear John

Looking to turn "Dear {{FullName}}" into "Dear John"?
Not a regex solution, but this is how I prefer to do it sometimes.
string s = "Dear {{FullName}}";
// use regex to replace FullName like you mentioned before, then...
s = s.Replace("{", string.Empty); // Replace returns a new string, so assign it back
s = s.Replace("}", string.Empty);

If you actually want to use a regular expression, then escape your literal text to turn it into a regular expression pattern using Regex.Escape.
var keyValue = new KeyValuePair<string,string>("FullName", "John");
string docText = "Dear {{FullName}}";
var pattern = "{{" + keyValue.Key + "}}";
docText = new Regex(Regex.Escape(pattern), RegexOptions.IgnoreCase).Replace(docText, keyValue.Value);
docText will be Dear John

I believe what you really want to do is replace multiple things in a document.
To do so, use the regex pattern I provide together with the Regex.Replace match evaluator delegate. The delegate is called for every match, so each match can be evaluated and replaced with the proper value using C# logic.
Here is an example with two possible keywords set up.
string text = "Dear {{FullName}}, I {{UserName}} am writing to say what a great answer!";
string pattern = @"\{\{(?<Keyword>[^}]+)\}\}";
var replacements
= new Dictionary<string, string>() { { "FullName", "OmegaMan" }, { "UserName", "eddy" } };
Regex.Replace(text, pattern, mt =>
{
return replacements.ContainsKey(mt.Groups["Keyword"].Value)
? replacements[mt.Groups["Keyword"].Value]
: "???";
}
);
Result
Dear OmegaMan, I eddy am writing to say what a great answer!
The preceding example uses:
Match Evaluator Delegate
Named match capture groups (?<{Name here}> …)
Set Negation [^ ], which says match until the negated item is found, in this case a closing curly brace }.
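
The same idea can be wrapped in a reusable method. This is only a sketch: `TemplateDemo` and `Fill` are hypothetical names, and leaving unknown placeholders untouched (instead of writing "???") is a design choice made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateDemo
{
    // Hypothetical helper: replaces every {{Keyword}} that has an entry in 'values'.
    // Placeholders without an entry are left as they are (an assumption of this sketch).
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Regex.Replace(template, @"\{\{(?<Keyword>[^}]+)\}\}", m =>
        {
            string replacement;
            return values.TryGetValue(m.Groups["Keyword"].Value, out replacement)
                ? replacement
                : m.Value;
        });
    }

    public static void Main()
    {
        var values = new Dictionary<string, string> { { "FullName", "John" } };
        Console.WriteLine(Fill("Dear {{FullName}}, see {{Unknown}}", values));
    }
}
```

Using TryGetValue does a single dictionary lookup per match, where ContainsKey followed by the indexer does two.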

How can I return only the digits from a large string with symbols, letters and of course digits with C#

I have the code below and I need to return args.Content (my input data) with only the digits, deleting the rest of the characters. I've been trying many things with regular expressions but nothing has worked for me. I have almost no experience with C# and I really need help from the programmers of this website.
using System;
using VisualWebRipper.Internal.SimpleHtmlParser;
using VisualWebRipper;
public class Script
{
public static string TransformContent(WrContentTransformationArguments args)
{
try
{
//Place your transformation code here.
//This example just returns the input data
return args.Content;
}
catch(Exception exp)
{
//Place error handling here
args.WriteDebug("Custom script error: " + exp.Message);
return "Custom script error";
}
}
}
Hope you can help

Just delete anything that is not a digit. There is a predefined character class for digits: \d, the negation is \D.
So your regex is simply:
\D+
In your C# code it would be something like
return Regex.Replace(args.Content, @"\D+", "");
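
A self-contained sketch of that replacement, outside the Visual Web Ripper script. `DigitsDemo`, `DigitsOnly` and `AsciiDigitsOnly` are hypothetical names. Note that in .NET \d (and therefore \D) is Unicode-aware by default, so the second helper uses [^0-9] for the case where only ASCII digits should survive.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitsDemo
{
    // Removes every run of non-digit characters (Unicode-aware \D).
    public static string DigitsOnly(string input)
    {
        return Regex.Replace(input, @"\D+", "");
    }

    // Variant that keeps only the ASCII digits 0-9.
    public static string AsciiDigitsOnly(string input)
    {
        return Regex.Replace(input, @"[^0-9]+", "");
    }

    public static void Main()
    {
        // Assumed sample input mixing letters, symbols and digits.
        Console.WriteLine(DigitsOnly("Tel: +1 (555) 010-9999"));
        Console.WriteLine(AsciiDigitsOnly("abc123-45 def6"));
    }
}
```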

Certainly not the most efficient, but oh, well, I couldn't resist doing some LINQ:
var digitsOnly = new string(args.Content.Where(c => char.IsDigit(c)).ToArray()); // requires using System.Linq;

StringBuilder builder = new StringBuilder();
Regex regex = new Regex(@"\d{1}");
MatchCollection matches = regex.Matches(args.Content);
foreach (var match in matches)
{
builder.Append(match.ToString());
}
return builder.ToString();

I want to strip off everything but numbers, $, comma(,)

I want to strip off everything but numbers, $, comma(,).
This only strips letters:
string Cadena;
Cadena = tbpatronpos6.Text;
Cadena = Regex.Replace(Cadena, "([^0-9]|\\$|,)", "");
tbpatronpos6.Text = Cadena;
Why doesn't my regex work, and how can I fix it?

I suspect this is what you want:
using System;
using System.Text.RegularExpressions;
class Test
{
static void Main(string[] args)
{
string original = @"abc%^&123$\|a,sd";
string replaced = Regex.Replace(original, @"[^0-9$,]", "");
Console.WriteLine(replaced); // Prints 123$,
}
}
The problem was your use of the alternation operator, basically - you just want the set negation for all of (digits, comma, dollar).
Note that you don't need to escape the dollar within a character group.
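
To see why the alternation fails, the following sketch applies both patterns to the same sample string. `StripDemo` and `Strip` are hypothetical names used only here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripDemo
{
    // Hypothetical helper: removes everything the pattern matches.
    public static string Strip(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "");
    }

    public static void Main()
    {
        string original = @"abc%^&123$\|a,sd";

        // Pattern from the question: "a non-digit OR a dollar OR a comma".
        // The dollar and comma are removed as well, leaving only the digits.
        Console.WriteLine(Strip(original, @"([^0-9]|\$|,)"));

        // Single negated set: "anything that is not a digit, dollar or comma".
        Console.WriteLine(Strip(original, @"[^0-9$,]"));
    }
}
```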

You want something like this?
[^\\d\\$,]

How can I change this regular expression so grab the text before the FIRST colon and ignore the rest?

What do I have to change in this regular expression so that in both cases below it gets the text before the first colon as the "label" and all the rest of the text as the "text"?
using System;
using System.Text.RegularExpressions;
namespace TestRegex92343
{
class Program
{
static void Main(string[] args)
{
{
//THIS WORKS:
string line = "title: The Way We Were";
Regex regex = new Regex(@"(?<label>.+):\s*(?<text>.+)");
Match match = regex.Match(line);
Console.WriteLine("LABEL IS: {0}", match.Groups["label"]); //"title"
Console.WriteLine("TEXT IS: {0}", match.Groups["text"]); //"The Way We Were"
}
{
//THIS DOES NOT WORK:
string line = "title: The Way We Were: A Study of Youth";
Regex regex = new Regex(@"(?<label>.+):\s*(?<text>.+)");
Match match = regex.Match(line);
Console.WriteLine("LABEL IS: {0}", match.Groups["label"]);
//GETS "title: The Way We Were"
//SHOULD GET: "title"
Console.WriteLine("TEXT IS: {0}", match.Groups["text"]);
//GETS: "A Study of Youth"
//SHOULD GET: "The Way We Were: A Study of Youth"
}
Console.ReadLine();
}
}
}

new Regex(@"(?<label>[^:]+):\s*(?<text>.+)");
This simply replaces the dot with a [^:] character class. This means any character except colon.

Regex quantifiers are greedy by default, and the . matches any character. That's why label is getting the whole string. If your titles are always just text, I would recommend the following:
(?<label>\w+):\s*(?<text>.+)
Otherwise, you could make the expression not greedy with:
(?<label>.+?):\s*(?<text>.+)
You want to avoid the greedy operators whenever possible and always try to match specifically what you want.
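
The following sketch compares the greedy pattern from the question with the lazy and negated-class variants on the two-colon line. `LabelDemo` and `Parse` are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LabelDemo
{
    // Hypothetical helper: returns { label, text }, or null when the line does not match.
    public static string[] Parse(string line, string pattern)
    {
        Match m = Regex.Match(line, pattern);
        return m.Success
            ? new[] { m.Groups["label"].Value, m.Groups["text"].Value }
            : null;
    }

    public static void Main()
    {
        string line = "title: The Way We Were: A Study of Youth";
        string[] patterns =
        {
            @"(?<label>.+):\s*(?<text>.+)",     // greedy: label runs to the LAST colon
            @"(?<label>.+?):\s*(?<text>.+)",    // lazy: label stops at the first colon
            @"(?<label>[^:]+):\s*(?<text>.+)",  // negated class: label cannot contain a colon
        };

        foreach (string pattern in patterns)
        {
            string[] parts = Parse(line, pattern);
            Console.WriteLine("[" + parts[0] + "] [" + parts[1] + "]");
        }
    }
}
```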

Regular expression to use which matches text before .html and after /

With this string
http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html
I need to get sdf-as
with this
hellow-1/yo-sdf.html
I need yo-sdf

This should get you what you need:
Regex re = new Regex(@"/([^/]*)\.html$");
Match match = re.Match("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html");
Console.WriteLine(match.Groups[1].Value); //Or do whatever you want with the value
This needs using System.Text.RegularExpressions; at the top of the file to work.

There are many ways to do this. The following uses lookarounds to match only the filename portion. It actually allows no / if such is the case:
string[] urls = {
@"http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html",
@"hellow-1/yo-sdf.html",
@"noslash.html",
@"what-is/this.lol",
};
foreach (string url in urls) {
Console.WriteLine("[" + Regex.Match(url, @"(?<=/|^)[^/]*(?=\.html$)") + "]");
}
This prints:
[sdf-as]
[yo-sdf]
[noslash]
[]
How the pattern works
There are 3 parts:
(?<=/|^) : a positive lookbehind to assert that we're preceded by a slash /, or we're at the beginning of the string
[^/]* : match anything but slashes
(?=\.html$): a positive lookahead to assert that we're followed by ".html" (literally on the dot)
References
regular-expressions.info/Lookarounds, Anchors
A non-regex alternative
Knowing regex is good, and it can do wonderful things, but you should always know how to do basic string manipulations without it. Here's a non-regex solution:
static String getFilename(String url, String ext) {
if (url.EndsWith(ext)) {
int k = url.LastIndexOf("/");
return url.Substring(k + 1, url.Length - ext.Length - k - 1);
} else {
return "";
}
}
Then you'd call it as:
getFilename(url, ".html")
API links
String.Substring, EndsWith, and LastIndexOf
Attachments
Source code and output on ideone.com

Try this:
string url = "http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html";
Match match = Regex.Match(url, @"/([^/]+)\.html$");
if (match.Success)
{
string result = match.Groups[1].Value;
Console.WriteLine(result);
}
Result:
sdf-as
However, it would be a better idea to use the System.Uri class to parse the string so that you correctly handle things like http://example.com/foo.html?redirect=bar.html.
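
A rough sketch of the Uri-based approach, assuming the input is an absolute URL (a relative string such as hellow-1/yo-sdf.html would need separate handling). `UriDemo` and `FileNameWithoutExtension` are hypothetical names.

```csharp
using System;
using System.IO;

public static class UriDemo
{
    // Hypothetical helper: returns the last path segment without its extension,
    // or "" when the input is not an absolute URL.
    public static string FileNameWithoutExtension(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            return "";
        }
        // AbsolutePath excludes the query string, so "?redirect=bar.html" is ignored.
        return Path.GetFileNameWithoutExtension(uri.AbsolutePath);
    }

    public static void Main()
    {
        Console.WriteLine(FileNameWithoutExtension("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html"));
        Console.WriteLine(FileNameWithoutExtension("http://example.com/foo.html?redirect=bar.html"));
    }
}
```

Unlike the regex, this sketch does not check that the extension is .html; it strips whatever extension the last segment has.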

using System.Text.RegularExpressions;
Regex pattern = new Regex(@".*/([a-z\-]+)\.html"); // verbatim string, so the backslashes reach the regex engine
Match match = pattern.Match("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html");
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value); // group 1 holds the file name; match.Value is the whole match
}
else
{
Console.WriteLine("Not found :(");
}

This one makes the slash and dot parts optional, and allows the file to have any extension:
new Regex(@"^(.*/)?(?<fileName>[^/]*?)(\.[^/.]*)?$", RegexOptions.ExplicitCapture);
But I still prefer Substring(LastIndexOf(...)) because it is far more readable.
