How can I remove all text between BBCode quote tags (including the tags themselves)?
[quote date=2011-07-02 14:43:53 user=test link=1]blabla[/quote]
I should add that the text between the tags can contain HTML tags for formatting.
My current attempt looks like:
Regex regex = new Regex(@"[quote+].+?[/\+quote]");
Well it's almost working.
You may try the following regex:
@"\[quote.*\].*?\[/quote\]"
Note that you have to escape square brackets in a regex.
Since your BBCode blocks contain attributes, a simple + won't suffice to cover everything. + repeats the preceding element one or more times, in this case the e.
Off the top of my head, I'd try something like this:
\[quote([^\[]*)\](.*?)\[\/quote\]
Please bear in mind that I have not tested this in C#, where the syntax might differ slightly. Also note that I've added capture groups so that you can examine what each part of the expression matched. As @Howard answered, [ and ] are reserved symbols and consequently need to be escaped.
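A minimal C# sketch of the approach above, assuming quote blocks are never nested and that the quoted body may span several lines. The helper name StripQuotes is made up for illustration, and the snippet uses top-level statements (C# 9 or later):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes every [quote ...]...[/quote] block, tags included.
// Assumes quotes are not nested; nested quotes would need a real parser.
string StripQuotes(string input)
{
    // \[quote[^\]]*\]  opening tag with optional attributes
    // .*?              body, as short as possible
    // \[/quote\]       closing tag
    // Singleline makes . match newlines as well.
    return Regex.Replace(
        input,
        @"\[quote[^\]]*\].*?\[/quote\]",
        string.Empty,
        RegexOptions.Singleline | RegexOptions.IgnoreCase);
}

Console.WriteLine(StripQuotes(
    "before [quote date=2011-07-02 14:43:53 user=test link=1]bla<b>bla</b>[/quote] after"));
// prints "before  after" (the two surrounding spaces remain)
```

Using [^\]]* for the attributes instead of .* keeps the opening tag from swallowing a later ] when several quotes sit on the same line.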
Related
I have a block of text as such.
google.sbox.p50 && google.sbox.p50(["how to",[["how to tie a tie",0],["how to train your dragon 2 trailer",0],["how to do the cup song",0],["how to get a six pack in 3 minutes",0],["how to make a paper gun that shoots",0],["how to basic",0],["how to love lil wayne",0],["how to sing like your favorite artist",0],["how to be a heartbreaker marina and the diamonds",0],["how to tame a horse in minecraft",0]],{"q":"XJW--0IKH6sqOp0ME-x5B7b_5wY","j":"5","k":1}])
Using \\[([^]]+)\\] I am able to get everything I need, but with a little extra that I don't. I do not need the ["how to",[[. I only need the blocks that are formatted like,
["how to tie a tie",0]
Can someone please help me modify my expression to only get what I need? I've been at it for hours and I can't grasp the idea of RegEx.
Put both the opening and closing square brackets in the negated character class?
\\[([^][]+)\\]
\\[ matches a literal [
\\] matches a literal ]
[^][] is a negated character class that matches any character except ] and [. It might be a little difficult to see, but it's equivalent to [^\\]\\[]. The escapes are not required here because, inside a character class, a ] placed right after the ^ and a [ are taken literally (just like \\. is equivalent to [.])
([^][]+) captures everything within square brackets, making sure there's no ] or [ inside.
In C#, you can use the @ symbol (a verbatim string) to avoid having to double every backslash, which makes the regex look like this:
var regex = new Regex(@"\[([^][]+)\]");
Note: This regex will capture everything within square brackets. If you wish to specifically get the format ["how to tie a tie",0], you can be more precise. After all, the regex will only match what you make it match (inside a verbatim string, each quote character is written twice):
var regex = new Regex(@"\[""[^""]+"",0\]");
Here, we have another negated character class: [^"]. This will match any character which is not a quote character.
This one assumes that the digit is always 0, as depicted in your sample text block. If you have multiple possibilities of numbers, you can use the character class [0-9]+:
var regex = new Regex(@"\[""[^""]+"",[0-9]+\]");
You can use \d+ as well, but in .NET \d also matches decimal digits from other scripts (unless RegexOptions.ECMAScript is set), which may or may not matter for your data. If you want to be even more cautious and allow spaces, tabs, newlines or form feeds between the tokens, you can use this regex:
var regex = new Regex(@"\[\s*""[^""]+""\s*,\s*[0-9]+\s*\]");
In conclusion, many regexes may suit your needs; just make sure you know what your data looks like so you can pick one with the right amount of leeway.
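To show how the whitespace-tolerant pattern could be used end to end, here is a small sketch that collects the quoted phrases. The helper name ExtractSuggestions and the shortened sample string are assumptions for illustration; a capture group was added around the phrase so it can be read without the quotes:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the quoted phrase of every ["phrase",number] item.
List<string> ExtractSuggestions(string input)
{
    var results = new List<string>();
    // Group 1 captures the text between the quotes; whitespace around tokens is tolerated.
    var regex = new Regex(@"\[\s*""([^""]+)""\s*,\s*[0-9]+\s*\]");
    foreach (Match m in regex.Matches(input))
    {
        results.Add(m.Groups[1].Value);
    }
    return results;
}

// Shortened version of the sample text from the question.
var sample = @"p50([""how to"",[[""how to tie a tie"",0],[""how to basic"",0]],{""q"":""x"",""k"":1}])";
Console.WriteLine(string.Join(" | ", ExtractSuggestions(sample)));
// prints "how to tie a tie | how to basic"
```

The leading ["how to",[[ is skipped because the comma after "how to" is followed by a bracket rather than a number.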
I think this is what you are looking for to match the format of ["how to tie a tie",0]:
(\["[^"]+",\d\])
( ) - around the whole thing so it all gets captured in this group
\[" - find ["
[^"]+ - find one or more of anything except "
", - find ",
\d - find a number, if you want more than just a single digit, do \d+
\] - match the ending ]
The only variable things in this regex are whatever is within the quotes ([^"]+) and the number (\d+).
If you don't want the square brackets in the capture group, you can do it like this:
\[("[^"]+",\d+)\]
I assume you don't want to match if there are quotes within your quotes as it would probably break whatever purpose you are using it for, but if you do, this should work:
\[("[^[\]]+",\d+)\]
You must use this pattern
@"\[[^][]+\]"
More information about square brackets here.
I think you need this one: (\[[^\[\]]+?\])
What you missed is the ? (smallest match) and excluding any [ or ].
Seemingly the text in the outer brackets is a JSON representation of an object. Instead of a regular expression I'd just:
strip off everything up to and including the opening parenthesis (google.sbox.p50 && google.sbox.p50() and the trailing parenthesis ). There are several ways to do this, and it can be more efficient than a regex.
JSON parse the remaining inner part.
From that point you have the object representation: you can leave out the first element of the array, which you don't need, and you have everything else in a traversable form.
There's the session information at the end along with parameters anyway (in {} brackets), so in the end you may end up parsing stuff anyway. Better not to reinvent the wheel (JSON parsing).
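The steps above could be sketched as follows, assuming System.Text.Json is available (.NET Core 3.0 or later; on older frameworks a library such as Json.NET would play the same role). The helper name ParseSuggestions is hypothetical, and the code assumes the JSON array starts at the first [ and ends at the last ] of the response:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: strips the JavaScript call wrapper and parses the JSON argument.
// Assumes the JSON array runs from the first '[' to the last ']' of the response.
List<string> ParseSuggestions(string response)
{
    int start = response.IndexOf('[');
    int end = response.LastIndexOf(']');
    string json = response.Substring(start, end - start + 1);

    var results = new List<string>();
    using (JsonDocument doc = JsonDocument.Parse(json))
    {
        // Element 0 is the query ("how to"); element 1 is the array of [phrase, number] pairs.
        foreach (JsonElement pair in doc.RootElement[1].EnumerateArray())
        {
            results.Add(pair[0].GetString());
        }
    }
    return results;
}

var response = "google.sbox.p50 && google.sbox.p50([\"how to\",[[\"how to tie a tie\",0],[\"how to basic\",0]],{\"q\":\"x\",\"j\":\"5\",\"k\":1}])";
Console.WriteLine(string.Join(" | ", ParseSuggestions(response)));
// prints "how to tie a tie | how to basic"
```

The trailing object with q, j and k is reachable as doc.RootElement[2] should the session parameters be needed later.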
I have an XML file. I need to convert empty elements to self-closing tags, for example change <xml></xml> to <xml/>. Is it possible to change the tags like that?
Thank you...
var xmlString="<xml></xml> <toto></toto>";
var properString=System.Text.RegularExpressions.Regex.Replace(xmlString, "<([^>]+)></[^>]+>", "<$1/>");
EDIT: explanation!
@Neil Knight has already provided, in a comment, a link to Wikipedia explaining the concept of regular expressions. The part specific to .NET is available here: .NET Framework Regular Expressions
A starting XML tag can be matched with the following regular expression: <[^>]+>. The [^>]+ part can be read as: all characters that are not ">", with at least one character (so <> is not matched but <a> is). An ending XML tag can be matched with the same kind of expression: </[^>]+> (note the slash after the first character). So the regular expression <[^>]+></[^>]+> matches empty tags such as <foo></foo> (but be careful, it also matches <foo></bar> which is not valid XML code).
What we need now is to isolate the characters between "<" and ">". For that, we use parentheses: <([^>]+)>. This instructs the regular expression engine to capture the matched characters. Each pair of parentheses can be referred to later in a replacement operation by the "$x" string (where "x" is a number: "$1" for the first capture group, "$2" for the second one, etc.).
So, with a call to Regex.Replace(xmlString, "<([^>]+)></[^>]+>", "<$1/>"), <foo></foo> will be replaced by <foo/> ("foo" characters are captured, and "$1" is replaced by them). <foo></bar> will also be replaced by <foo/>.
I hope that this explanation is enough for @Felix K. ;o)
(my English is not so good, that's why I did not provide many details)
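The <foo></bar> caveat mentioned above can be avoided with a backreference. The following is a sketch, not part of the original answer: \1 requires the closing tag name to equal the captured opening name, and an optional second group carries any attributes over. The helper name CollapseEmptyTags is made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: collapses <tag attrs></tag> into <tag attrs/>.
string CollapseEmptyTags(string xml)
{
    // Group 1: tag name (no whitespace, '/' or '>').
    // Group 2: optional attributes, which must start with whitespace.
    // \1:      the closing tag must repeat the captured name, so <foo></bar> is left alone.
    return Regex.Replace(xml, @"<([^\s/>]+)(\s[^>]*)?></\1>", "<$1$2/>");
}

Console.WriteLine(CollapseEmptyTags("<xml></xml> <toto id=\"1\"></toto> <foo></bar>"));
// prints: <xml/> <toto id="1"/> <foo></bar>
```

For anything beyond simple, well-formed input, loading the document with an XML parser remains the safer route.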
if (someElement.InnerText == string.Empty)
{
    someElement.InnerText = null;
}
I'm working on a new feature for a C# application that will process a text given by the user. This text can contain any character, but everything that is between braces ({}) or between brackets ([]) will be treated in a special way (basically, the text inside brackets will be replaced with other text, and the braces will mark a subsection of the given text that will be processed differently).
So, I want to give the user the choice to use braces and brackets on his text, so the first thing I thought was to use "{{" to represent "{", and the same for all other special characters, but this will give problems. If he wants to open a subsection and wants the first character in the subsection to be "{", then he would write "{{{", but that's the same thing he would write if he would like the character before the subsection to be "{". So this causes an ambiguity.
Now I'm thinking I could use "\" to escape braces and brackets, and use "\\" to represent "\". And I'm kinda figuring out how to process this, but I got a feeling I'm trying to reinvent the wheel here. Wonder if there is a known algorithm or library that does what I'm trying to do.
Why don't you use an existing markup convention? There are plenty of lightweight syntaxes to choose from; depending on your user population, some of them might already be familiar with MediaWiki markup and/or BBcode and/or reST and/or Markdown.
Why don't you use XML tags instead of special characters?
<section>
Blah blah blah blah <replace id="some identifier" />
</section>
This approach would let you parse your text using any XML parser in Microsoft .NET and any other platform. And you'll save time because there's nothing to escape.
I'd recommend using \ to escape {} chars in the text and un-escaped {} to surround a subsection. This is how C# handles " chars in a string. Using double braces introduces ambiguities and makes correctly processing the text difficult, if not impossible. Your choice also depends on your target users. Developers are comfortable using escape chars but they can be confusing to non-dev users. You might want to use tags like <sub> and </sub> to indicate a subsection. Either way, you can use a regular expression to parse the user's text; Regex.Matches returns the results as a MatchCollection.
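As an alternative to a regex, the backslash convention can also be handled with a single pass over the characters. The sketch below is only one possible shape, with a hypothetical Tokenize helper: it separates literal text from unescaped braces and brackets and leaves the interpretation of subsections and replacements to the caller:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical tokenizer: splits input into literal text and structural characters.
// A backslash makes the next character literal, so \{ is text and { is structure.
List<(bool IsStructural, string Text)> Tokenize(string input)
{
    var tokens = new List<(bool IsStructural, string Text)>();
    var literal = new StringBuilder();

    for (int i = 0; i < input.Length; i++)
    {
        char c = input[i];
        if (c == '\\' && i + 1 < input.Length)
        {
            literal.Append(input[++i]);          // escaped character: keep it as plain text
        }
        else if ("{}[]".IndexOf(c) >= 0)
        {
            if (literal.Length > 0)
            {
                tokens.Add((false, literal.ToString()));
                literal.Clear();
            }
            tokens.Add((true, c.ToString()));    // unescaped brace or bracket
        }
        else
        {
            literal.Append(c);
        }
    }
    if (literal.Length > 0) tokens.Add((false, literal.ToString()));
    return tokens;
}

foreach (var token in Tokenize(@"a\{{\{b}"))
{
    Console.WriteLine((token.IsStructural ? "STRUCT " : "TEXT   ") + token.Text);
}
```

With the input a\{{\{b} the ambiguous case from the question is resolved: the first and third braces come out as text, while the second brace and the final } come out as structure.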
I have a string like:
[a b="c" d="e"]Some multi line text[/a]
Now the part d="e" is optional. I want to convert such type of string into:
<a b="c" d="e">Some multi line text</a>
The values of a b and d are constant, so I don't need to catch them. I just need the values of c, e and the text between the tags and create an equivalent xml based expression. So how to do that, because there is some optional part also.
For HTML tags, please use HTML parser.
For [a][/a], you can do like following
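// RegexOptions.Singleline lets . match newlines, which the multi-line text needs
// (RegexOptions.Multiline only changes the meaning of ^ and $).
Match m = Regex.Match(@"[a b=""c"" d=""e""]Some multi line text[/a]",
    @"\[a b=""([^""]+)"" d=""([^""]+)""\](.*?)\[/a\]",
    RegexOptions.Singleline);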
m.Groups[1].Value
"c"
m.Groups[2].Value
"e"
m.Groups[3].Value
"Some multi line text"
Here is a Regex.Replace version (not my preferred approach, though):
string inputStr = @"[a b=""[[[[c]]]]"" d=""e[]""]Some multi line text[/a]";
// The d attribute is optional; Singleline lets . match newlines in the text.
string resultStr = Regex.Replace(inputStr,
    @"\[a( b=""[^""]+"")( d=""[^""]+"")?\](.*?)\[/a\]",
    @"<a$1$2>$3</a>",
    RegexOptions.Singleline);
If you are actually thinking of processing (pseudo)-HTML using regexes,
don't
SO is filled with posts where regexes are proposed for HTML/XML and answers pointing out why this is a bad idea.
Suppose your multiline text ("which can be anything") contains
[a b="foo" [a b="bar"]]
a regex cannot detect this.
See the classic answer in:
RegEx match open tags except XHTML self-contained tags
which has:
I think it's time for me to quit the
post of Assistant Don't Parse HTML
With Regex Officer. No matter how many
times we say it, they won't stop
coming every day... every hour even.
It is a lost cause, which someone else
can fight for a bit. So go on, parse
HTML with regex, if you must. It's
only broken code, not life and death.
– bobince
Seriously. Find an XML or HTML DOM and populate it with your data. Then serialize it. That will take care of all the problems you don't even know you have got.
Would the multi-line text include [ and ]? If not, you can just replace [ with < and ] with > using string.Replace; no regex is needed.
Update:
If it can be anything but [/a], you can replace
^\[a([^\]]+)](.*?)\[/a]$
with
<a$1>$2</a>
I haven't escaped ] and / in the regex - escape them if necessary to get
^\[a([^\]]+)\](.*?)\[\/a\]$
I am using this regex:
[Blah(?:\s*)\]
I want to strip out the tag that looks like:
[Blah:http:..anyting goes here so catch all types of characters ]
Any tips on what's wrong with my regex?
A regex of \[Blah[^\]]*\] is the usual way. It means:
literal string [Blah
zero or more:
characters that aren't ]
literal string ]
If you want to handle nesting (e.g. input of the form [a[b[c]]]), then you need something other than regex (this is one reason why trying to use regex to parse HTML doesn't work).
Your regex [Blah(?:\s*)\] starts with an unescaped '[' which is "seen" as the start of a character class. That's what's wrong with your regex (there are probably more errors, but that one is the main reason).
Try changing it to \[Blah[^\]]*\] or \[Blah.*?\]. They give the same result as long as the tag stays on one line (by default . does not match a newline), but there might be a difference in their performance.
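A short C# sketch of the first suggestion, assuming the tags are never nested. The helper name StripBlahTags and the sample URL are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes every [Blah...] tag, assuming tags are not nested.
string StripBlahTags(string input)
{
    // [^\]]* consumes everything up to the first closing bracket, newlines included.
    return Regex.Replace(input, @"\[Blah[^\]]*\]", string.Empty);
}

Console.WriteLine(StripBlahTags("keep [Blah:http://example.com/a?b=c ] this"));
// prints "keep  this" (the two surrounding spaces remain)
```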