Delete extra text and punctuation marks from a string, keeping just the smileys? - C#

I am running into some problems using a regular expression. Can you please help me out? The following is the problem I am trying to solve:
Input: :,... :D..:::))How are you today :P?..:(*
Output :D :) :P :(
Basically, I want to remove the punctuation and text from the input string (characters like . , : ; etc.) and replace them with an empty string. But I want to keep the smileys :) , :( , :D and :P. I have written the following code, but it is not working.
Regex= "[A-Za-z]|:[D(P(]"
but it also removes the ":D" and ":P" smileys.

The following regex string should work for you:
(((?<!:)[^:])|(:(?![PD\(\)])))[^:]*
It's made up of two parts:
( ((?<!:)[^:]) | (:(?![PD\(\)])) )
[^:]*
The first part is an OR (|) statement that uses Negative Lookahead and Lookbehind. It finds the first character in a block of text that doesn't contain a smiley by looking for either:
A character that is obviously not in a smiley:
  Any character that is not preceded by a colon: (?<!:)
  and is not a colon itself: [^:]
OR a colon that is not followed by a smiley character:
  A colon :
  That is not followed by a character that is the second half of a smiley: (?![PD\(\)])
The second part ([^:]*) continues looking until we find the beginning of a potential smiley (a colon).
This Regex currently only finds the following smileys:
:D
:P
:(
:)
You can update the second half of the OR statement to find other smileys.
To sum it up, this Regex should find everything that is not part of a smiley. You can simply declare it in a Regex variable and then call .Replace(string input, string replacement), passing in your input string and the string you want to replace the non-smiley characters with (String.Empty in this case).
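As a minimal sketch of that last step, assuming a top-level program (C# 9 or later) and the sample input from the question; the variable names are only illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the answer: matches every run of text that is not part of a smiley.
var nonSmiley = new Regex(@"(((?<!:)[^:])|(:(?![PD\(\)])))[^:]*");

string input = ":,... :D..:::))How are you today :P?..:(*";

// Replace each non-smiley run with the empty string.
string smileysOnly = nonSmiley.Replace(input, String.Empty);

Console.WriteLine(smileysOnly); // prints ":D:):P:(" (no separators between the smileys)
```

Note that the smileys come out without spaces between them, because the spaces are removed along with the rest of the text.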

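Not so perfect solution:
string text = ":,... :D..:::))How are you today :P?..:(*";
// protect the smileys with placeholders
text = text.Replace(":)", "###)");
text = text.Replace(":(", "###(");
text = text.Replace(":D", "###D");
text = text.Replace(":P", "###P");
// clean up your punctuation marks here (leave the ### placeholders intact)
//
// restore the smileys
text = text.Replace("###)", ":)");
text = text.Replace("###(", ":(");
text = text.Replace("###D", ":D");
text = text.Replace("###P", ":P");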
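A different approach, not taken from either answer, is to extract the smileys rather than delete everything else. The sketch below assumes only the four smileys from the question matter and that a single space between them is the desired separator:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = ":,... :D..:::))How are you today :P?..:(*";

// Match only a colon followed by D, P, ( or ).
var smileys = Regex.Matches(input, @":[DP()]")
    .Cast<Match>()
    .Select(m => m.Value);

// Join the matches with a single space.
string result = string.Join(" ", smileys);

Console.WriteLine(result); // prints ":D :) :P :("
```

Adding another smiley then only means adding its second character to the character class.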

Related

Regex to extract characters following 2 forward slashes but ignore if both forward slashes (and following text) are between double quotes

I am working with some string manipulation where I have a multiline text and have to extract text that follows //, but if // (and text) are between double quotes then match should not happen. A sample of the text I am working with is below:
This a line // tester 7897
//Ola
asdfasdf
//554654
Open("asd//Not this")
From the above text I'm expecting the intended Regex to return me the following matches
// tester 7897
//Ola
//554654
I have tried quite a few options, but the closest I have got is the following regex (with RegexOptions.Multiline):
(//).+
This gives me all matches following // and that includes //Not this from the last line (which I don't want).
I don't have a lot of experience using Regex. Any help will be greatly appreciated.
Try this.
^(?:[^\"\n]|\\\")*(?:(?<!\\)\"(?:[^\"\n]|\\\")*(?:(?<!\\)\")(?:[^\"\n]|\\\")*)*((?:\/\/).*)
Here is the link for regex101.
Here is the reference.
I do not think this is possible with lookarounds in most regex flavors. You want to match everything after // only if it is not surrounded by "". That would require a negative lookbehind, and most engines cannot use one when the number of characters between the first " and // is not fixed. (.NET is an exception: it allows variable-length lookbehind.)
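One way to sidestep lookbehind entirely is to let the regex consume quoted strings as a separate alternative and only capture // text that it reaches outside of them. This is a sketch of that idea, not the pattern from the answer above; it assumes double-quoted strings with backslash escapes that do not span lines, and \n line endings:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "This a line // tester 7897\n//Ola\nasdfasdf\n//554654\nOpen(\"asd//Not this\")";

// Alternative 1 consumes a whole double-quoted string and captures nothing.
// Alternative 2 captures a // comment up to the end of the line in group 1.
var pattern = new Regex(@"""(?:\\.|[^""\\\n])*""|(//.*)");

string[] comments = pattern.Matches(text)
    .Cast<Match>()
    .Where(m => m.Groups[1].Success)   // keep only matches made by alternative 2
    .Select(m => m.Groups[1].Value)
    .ToArray();

foreach (string comment in comments)
    Console.WriteLine(comment);
// prints:
// // tester 7897
// //Ola
// //554654
```

Because the quoted string is matched as a whole, the // inside Open("asd//Not this") is never considered as the start of a match.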

Regex - find matches not contained within pattern

I would like to use a regular expression to match all occurrences of a phrase where it's not contained within some delimiting characters. I tried putting one together but had some difficulty with the negative lookaheads.
My search phrase is "my phrase". The start delimiter tag is [[ and the end delimiter tag is ]]. The string I'd like to search is:
Here is a sentence with my phrase, here's another part which I don't want to match on [[my phrase]]. I would like to find this occurrence of my phrase.
From this string I would expect to find all occurrences of "my phrase" except the one contained within [[ ]].
I hope that makes sense, thanks in advance for any guidance.
[^#]my phrase[^#]
I have knocked up a RegEx that will do what you ask, this can be seen here.
Literally just escaping out # as a character and allowing any other character to be returned. You can return the index of these results but remember to strip off the first and last character of the string.
Note: This will not pick up a "my phrase" that ends the string, because the pattern requires a character to follow it.
Edit - Seeing as you changed the scope while I was writing this answer,
here is the RegEx for the other delimiter:
[^[[]my phrase[^\]\]]
(?<=[^\[])my phrase(?=[^\]]*)
This will also eliminate the trailing punctuation marks.
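Another option, different from the patterns above, is to match each whole [[ ... ]] block as one alternative and capture the phrase only when it is found outside such a block. A minimal sketch, assuming the delimiters are balanced and never nested:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "Here is a sentence with my phrase, here's another part which I don't want " +
              "to match on [[my phrase]]. I would like to find this occurrence of my phrase.";

// Alternative 1 swallows a complete [[ ... ]] block; alternative 2 captures the phrase.
var pattern = new Regex(@"\[\[.*?\]\]|(my phrase)");

int[] positions = pattern.Matches(text)
    .Cast<Match>()
    .Where(m => m.Groups[1].Success)   // skip matches that were delimiter blocks
    .Select(m => m.Groups[1].Index)
    .ToArray();

Console.WriteLine(positions.Length); // prints 2
```

The capture group's Index gives the exact start of each phrase, so there is no need to strip extra characters off the match.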

Match regex pattern in a line of text without targeting the text within quotations

Stack Overflow has been very generous with answers to my regex questions so far, but with this one I'm blanking on what to do and just can't seem to find it answered here.
So I'm parsing a string, let's say for example's sake, a line of VB-esque code like either of the following:
Call Function ( "Str ing 1 ", "String 2" , " String 3 ", 1000 ) As Integer
Dim x = "This string should not be affected "
I'm trying to parse the text in order to eliminate all leading spaces, trailing spaces, and extra internal spaces (when two "words/chunks" are separated by two or more spaces, or when there are one or more spaces between a character and a parenthesis) using regex in C#. The result after parsing the above should look like:
Call Function("Str ing 1 ", "String 2", " String 3 ", 1000) As Integer
Dim x = "This string should not be affected "
The issue I'm running into is that I want to parse all of the line except any text contained within quotation marks (i.e. a string). Basically, if there are extra spaces or whatever inside a string, I want to assume that it was intended and move on without changing the string at all, but if there are extra spaces in the line text outside of the quotation marks, I want to parse and adjust that accordingly.
So far I have the following regex which does all of the parsing I mentioned above, the only issue is it will affect the contents of strings just like any other part of the line:
var rx = new Regex(@"\A\s+|(?<=\s)\s+|(?<=.)\s+(?=\()|(?<=\()\s+(?=.)|(?<=.)\s+(?=\))|\s+\z");
.
.
.
lineOfText = rx.Replace(lineOfText, String.Empty);
Anyone have any idea how I can approach this, or know of a past question answering this that I couldn't find? Thank you!
Since you are reading the file line by line, you can use the following fix:
("[^"]*(?:""[^"]*)*")|^\s+|(?<=\s)\s+|(?<=\w)\s+(?=\()|(?<=\()\s+(?=\w)|(?<=\w)\s+(?=\))|\s+$
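Replace the matched text with $1. The first alternative, ("[^"]*(?:""[^"]*)*"), captures each string literal, so $1 puts it back unchanged; for every other alternative the group is empty and the matched whitespace is removed.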
See demo
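A small sketch of how that replacement could be wired up in C#. The pattern is the one from the answer, with each " doubled because it sits inside a verbatim string; the helper name Normalize and the two sample lines are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 captures a string literal (VB-style, "" is an escaped quote);
// the remaining alternatives match whitespace that should be dropped.
var rx = new Regex(
    @"(""[^""]*(?:""""[^""]*)*"")|^\s+|(?<=\s)\s+|(?<=\w)\s+(?=\()|(?<=\()\s+(?=\w)|(?<=\w)\s+(?=\))|\s+$");

// $1 restores a captured string literal and is empty for the whitespace alternatives.
string Normalize(string line) => rx.Replace(line, "$1");

string first = Normalize("  Dim  x   =  \"a   b\"  ");
string second = Normalize("Foo  ( 1000 )");

Console.WriteLine(first);  // prints: Dim x = "a   b"
Console.WriteLine(second); // prints: Foo(1000)
```

The spaces inside "a   b" survive because the whole literal is consumed by the first alternative before any whitespace alternative gets a chance to match inside it.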

Regular Expression to Replace Unwanted Letters

I wrote a small program in C# to capture in-game text.
My issue is that the text also contains color codes, which I am trying to get rid of. I read about the function Regex.Replace,
which I think is going to be suitable for that.
I have the following string (line) that I want to clean up. I used the small tool espresso to play around with regular expressions, but I never really figured it out.
This is the string I am going to work with:
|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R
I tried to use ^|( [a-zA-Z0-9]{9})
which gave me these matches:
c001177ff
cff00AA00
cff00AA00
cff00AA00
cffff69b4
cff00AA00
cff40e0d0
cffffff00
cffffff00
cff40e0d0
cffff69b4
cff00AA00
Well, I am not good at regex; I have only just started with it. I don't expect anybody to present me with a complete solution (though you are more than welcome to), just a little help with how I can solve this issue. I want to filter the text.
Input code:
|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R
It should be filtered to this:
Save Code = AGQg R9$# 4fR
I think these are hexadecimal color codes: the |c marks the beginning and the |r the end of a colored string. I think the "|r |" just indicates that the first colored string ends, then we get a space, and the next | indicates the next start.
How about a simple Linq?
var output = String.Join("", input.Split('|')
.Select(s => s.Length != 10 ? ' ' : s.Last()))
.Trim();
So I think the problem you were having was not escaping your |... the following regex works for me:
var replaced = Regex.Replace(input, @"\|c[0-9a-zA-Z]{8}|\|r", "");
\|c[0-9a-zA-Z]{8} - match starting with "|c" and then any 8 letters or numbers
| - or
\|r - match "|r"
You're on the right track. Your regex
^|( [a-zA-Z0-9]{9})
has two problems. The ^ start-of-line anchor ties the match to the start of your input string, and the | needs to be escaped, because unescaped it is the special "or" operator, which completely changes the meaning of your regex.
In addition, the space after the | is undesired, and the capture group is unnecessary, as you only want to eliminate this portion.
If you replace all instances of this
\|[a-zA-Z0-9]{9}
with nothing (the empty string)
You will achieve most of your goal. Try it here: http://regex101.com/r/rF6yB6/1
But it seems you really want to eliminate not exactly nine characters after the pipe, but up to nine characters. So use the {1,9} range quantifier instead:
\|[a-zA-Z0-9]{1,9}
Try it: http://regex101.com/r/rF6yB6/2
This seems to achieve your goal exactly.
Please consider bookmarking the Stack Overflow Regular Expressions FAQ for future reference.
string input = "[The example input from your question]";
string output = input.Replace("|r", "");
while (output.Contains("|c"))
    output = output.Remove(output.IndexOf("|c"), 10);
// output = "Save Code = AGQg R9$# 4fR"
I like this much more than using Regexes just because it's so much more clear to me.
var str1 = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";
var str2 = Regex.Replace(str1, @"\|(r|[a-zA-Z0-9]{9})", ""); // "Save Code = AGQg R9$# 4fR"
In addition to this answer re: escaping the "pipe" character, you're starting your regex with the caret (^) character. This matches the beginning of a line.
A correct regex would be:
\|c[0-9a-zA-Z]{8}
This regex should match all of the characters you want to remove:
([|]c([0-9]|[a-f]|[A-F]){8})|[|]r
Here's the breakdown...
The vertical pipe is an OR marker, so to search for a literal one, place it in square brackets [ and ].
The parentheses make a group. So you're searching for ([|]c([0-9]|[a-f]|[A-F]){8}) OR [|]r, which is all of your color codes OR |r.
A color code is the group that begins with |c and is followed by exactly 8 characters, each of which can be 0 through 9, a through f, or A through F.
I tested it at RegexPal.com.
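Put together in C#, the same idea might look like the sketch below. It uses a single hexadecimal character class in place of the three-way alternation and escapes the pipe with a backslash, both of which are stylistic choices rather than part of the answer above:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r " +
               "|cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";

// "|c" followed by exactly 8 hex digits, or the "|r" reset marker.
string output = Regex.Replace(input, @"\|c[0-9a-fA-F]{8}|\|r", String.Empty);

Console.WriteLine(output); // prints "Save Code = AGQg R9$# 4fR"
```

Restricting the class to hex digits keeps the pattern from eating ordinary text that happens to follow a pipe.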

Regex Replace in between

I have been trying really hard to understand regular expressions.
Is there any way I can replace the character(s) between two strings?
For example
I have
sometextREPLACEsomeothertext
I want to replace REPLACE (which can be anything in real use) ONLY between sometext and someothertext with another string.
Can anyone please help me with this?
EDIT
Suppose, my input string is
sometext_REPLACE_someotherText_something_REPLACE_nothing
I want to replace the REPLACE text in between sometext and someotherText,
resulting in the following output:
sometext_THISISREPLACED_someotherText_something_REPLACE_nothing
Thank you
If I understand your question correctly you might want to use lookahead and lookbehind for your regular expression
(?<=...) # matches a positive look behind
(?=...) # matches a positive look ahead
Thus
(?<=sometext)(\w+?)(?=someothertext)
would match any 'word' with at least 1 character following 'sometext' and followed by 'someothertext'
In C#:
result = Regex.Replace(subject, @"(?<=sometext)(\w+?)(?=someothertext)", "REPLACE");
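Applied to the input from the question's EDIT, the lookarounds need to account for the underscores and the capital T in someotherText, since \w also matches an underscore and the match is case-sensitive by default. A hedged sketch with the delimiters adjusted accordingly:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "sometext_REPLACE_someotherText_something_REPLACE_nothing";

// Only text preceded by "sometext_" and followed by "_someotherText" is replaced.
string result = Regex.Replace(
    input,
    @"(?<=sometext_)\w+?(?=_someotherText)",
    "THISISREPLACED");

Console.WriteLine(result);
// prints "sometext_THISISREPLACED_someotherText_something_REPLACE_nothing"
```

The second REPLACE is left alone because it is preceded by something_ rather than sometext_.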
This is the regex to test if the string is valid.
\^.REPLACE.\
C# replace
string s = "sdfsdfREPLACEdhfsdg";
string v = s.Replace("REPLACE", "SOMETEXT");
