Error in using regex (verifying the code for regex) - c#

I have this piece of code
string myText = new TextRange(mainWindow.richtextbox2.Document.ContentStart,
mainWindow.richtextbox2.Document.ContentEnd).Text;
//replace two or more consecutive spaces with a single space, and
//replace two or more consecutive newlines with a single newline.
var str = Regex.Replace(myText, @"( |\r?\n)\1+", "$1", RegexOptions.Multiline);
mainWindow.Dispatcher.Invoke(new Action(() =>
    mainWindow.richtextbox2.Document.Blocks.Add(new Paragraph(new Run("Hello")))));
The regex itself works, but the spacing still remains between the lines of text that are sent. How can I fix this, or update my RichTextBox? I am trying to eliminate the spacing when displaying text in the RichTextBox.
I want to show:
Hello
Hello
Hello
without the extra newlines or spacing.

Document is not of type string.
EDIT
string myText = new TextRange(richtextbox2.Document.ContentStart, richtextbox2.Document.ContentEnd).Text;
//replace two or more consecutive spaces with a single space, and
//replace two or more consecutive newlines with a single newline.
var str = Regex.Replace(myText, @"( |\r?\n)\1+", "$1", RegexOptions.Multiline);
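The pattern can be checked on a plain string, independently of the RichTextBox. A minimal sketch, using a hypothetical helper class `WhitespaceCollapser`: `Regex.Replace` returns a new string, so the collapsed text only appears if that returned value (`str` in the code above) is what actually gets written back to the control.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, used only to exercise the question's pattern.
public static class WhitespaceCollapser
{
    // Group 1 captures a single space or a single newline (LF or CRLF);
    // \1+ then requires one or more further copies of exactly that text.
    // RegexOptions.Multiline is left out: it only changes ^ and $,
    // which this pattern does not use.
    private static readonly Regex RepeatedBreak = new Regex(@"( |\r?\n)\1+");

    public static string Collapse(string text)
    {
        // Regex.Replace leaves `text` unchanged and returns a new string;
        // "$1" writes back one copy of the captured whitespace.
        return RepeatedBreak.Replace(text, "$1");
    }
}
```

For example, `WhitespaceCollapser.Collapse("Hello\r\n\r\nHello")` returns `"Hello\r\nHello"`. If the visible gaps sit between separate `Paragraph` blocks rather than inside the text, they may come from the margin WPF gives each `Paragraph` by default, which a regex over the text cannot remove; setting the paragraph's `Margin` (for example to `new Thickness(0)`) is one thing to try.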

Related

My Regex.Split with '\n' takes up two spaces instead of 1

I need to split my text into each word, space, and new line.
Although words and spaces work properly, a \n takes up two elements of the result array when it is not preceded by a word.
Example: for "\nTest\nword", the first \n takes up two elements while the second one takes up one.
How would I write the proper regex?
My code:
string delimiterChars = "([ \r\n])";
wordArray = Regex.Split(myTexy, delimiterChars);
For context, I am using Unity.
In the output, the first element is empty and the second is \n. I don't want the empty element.
Regex.Split will always produce empty items where matches are consecutive, or where a match is at the start or end of the string.
Instead, you can use a matching and extracting approach:
string delimiterChars = "[^ \r\n]+|[ \r\n]";
string[] wordArray = Regex.Matches(myTexy, delimiterChars)
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
The [^ \r\n]+|[ \r\n] regex matches either one or more characters other than a space, CR or LF, or a single space, CR or LF character.
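A minimal sketch of the matching approach wrapped in a hypothetical `Tokenizer` class, so it can be run against the question's input:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the matching approach from the answer above.
public static class Tokenizer
{
    // Either a run of characters other than space/CR/LF (a word),
    // or exactly one space, CR or LF (a delimiter kept as its own token).
    private static readonly Regex Token = new Regex(@"[^ \r\n]+|[ \r\n]");

    public static string[] Tokenize(string text)
    {
        // Every match consumes at least one character,
        // so no empty strings can appear in the result.
        return Token.Matches(text)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

`Tokenizer.Tokenize("\nTest\nword")` yields `"\n"`, `"Test"`, `"\n"`, `"word"`. Because `[ \r\n]` matches a single character, a Windows line ending `"\r\n"` comes out as two tokens, `\r` and `\n`.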
You can use regular expressions to remove leading delimiter characters.
var myTexy = "\nTest\nword";
string delimiterChars = "([ \r\n])";
myTexy = Regex.Replace(myTexy, "^" + delimiterChars, "");
var wordArray = Regex.Split(myTexy, delimiterChars);
The ^ anchor restricts the match to the beginning of the string, so only a leading delimiter is removed.
Also, just so you are aware, the behavior you are seeing is intended and is documented for Regex.Split:
If a match is found at the beginning or the end of the input string,
an empty string is included at the beginning or the end of the
returned array.
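The documented behavior and the strip-then-split workaround can be compared side by side. A minimal sketch with a hypothetical `SplitDemo` class:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper contrasting the two Regex.Split variants discussed above.
public static class SplitDemo
{
    // The capturing group makes Regex.Split include each delimiter in the result.
    private const string Delimiter = "([ \r\n])";

    // A match at index 0 yields a leading empty string, as documented.
    public static string[] PlainSplit(string text) => Regex.Split(text, Delimiter);

    // Removes one leading delimiter first ("^" anchors the match to the start),
    // so the split no longer begins with a match. The removed newline is lost.
    public static string[] SplitAfterStrippingLeading(string text)
    {
        string trimmed = Regex.Replace(text, "^" + Delimiter, "");
        return Regex.Split(trimmed, Delimiter);
    }
}
```

For `"\nTest\nword"`, `PlainSplit` returns `""`, `"\n"`, `"Test"`, `"\n"`, `"word"`, while `SplitAfterStrippingLeading` returns `"Test"`, `"\n"`, `"word"`: the empty element is gone, but so is the leading newline.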
Let me know if this is what you are looking for:
String text = "\nTest\nword";
string[] words = Regex.Split(text, @"(\n+)");
Try this:
string myStr = "This is test text";
wordArray = myStr.Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

Find hashtags in string

I am working on a Xamarin.Forms PCL project in C# and would like to detect all the hashtags.
I tried splitting at spaces and checking whether each word begins with a #, but the problem is that if the post contains two spaces, like "Hello  #World Test", it would lose the double space.
string body = "Example string with a #hashtag in it";
string newbody = "";
foreach (var word in body.Split(' '))
{
    if (word.StartsWith("#"))
        newbody += "[" + word + "]";
    else
        newbody += word;
    // the spaces removed by Split(' ') are never added back here
}
Goal output:
Example string with a [#hashtag] in it
I also want the hashtag to contain only A-Z, a-z, 0-9 and _, stopping at any other character:
Test #H3ll0_W0rld$%Test => Test [#H3ll0_W0rld]$%Test
Other Stack Overflow questions detect the hashtag and extract it; I would like to work with it and put it back into the string without losing anything that methods such as splitting on certain characters would lose.
You can use a Regex with the pattern #\w+ and the $& substitution.
Explanation
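# matches the character # literally (case sensitive)
\w+ matches one or more word characters. In .NET, \w matches Unicode letters, decimal digits, nonspacing marks and connector punctuation such as _, so it is broader than [a-zA-Z0-9_] unless RegexOptions.ECMAScript is specified
+ is a greedy quantifier: it matches between one and unlimited times, as many times as possible, giving back as needed
$& includes a copy of the entire match in the replacement string.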
Example
var input = "asdads sdfdsf #burgers, #rabbits dsfsdfds #sdf #dfgdfg";
var regex = new Regex(@"#\w+");
var matches = regex.Matches(input);
foreach (var match in matches)
{
Console.WriteLine(match);
}
or
var result = regex.Replace(input, "[$&]" );
Console.WriteLine(result);
Output
#burgers
#rabbits
#sdf
#dfgdfg
asdads sdfdsf [#burgers], [#rabbits] dsfsdfds [#sdf] [#dfgdfg]
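Since the question restricts hashtags to A-Z, a-z, 0-9 and _, an explicit character class is one way to avoid the wider Unicode reach of .NET's \w. A minimal sketch; the class name `Hashtags` is hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: brackets ASCII-only hashtags and leaves everything else untouched.
public static class Hashtags
{
    // '#' followed by one or more ASCII letters, digits or underscores.
    // An explicit class is used instead of \w, which in .NET also matches
    // non-ASCII letters and digits.
    private static readonly Regex Hashtag = new Regex(@"#[A-Za-z0-9_]+");

    // "$&" re-inserts the whole match inside the brackets; text between
    // matches (including runs of spaces) is copied through unchanged.
    public static string Bracket(string text) => Hashtag.Replace(text, "[$&]");
}
```

With this, `Hashtags.Bracket("Test #H3ll0_W0rld$%Test")` gives `"Test [#H3ll0_W0rld]$%Test"`, and the double space in `"Hello  #World Test"` survives as `"Hello  [#World] Test"`.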
Another Example
Use a regular expression: \#\w*
string pattern = @"\#\w*"; // \# is an escaped '#'; unlike \w+, \w* also matches a lone "#"
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);

How to remove text between multiple pairs of brackets?

I would like to remove text contained between each of multiple pairs of brackets. The code below works fine if there is only ONE pair of brackets within the string:
var text = "This (remove me) works fine!";
// Remove text between brackets.
text = Regex.Replace(text, @"\(.*\)", "");
// Remove extra spaces.
text = Regex.Replace(text, @"\s+", " ");
Console.WriteLine(text);
This works fine!
However, if there are MULTIPLE sets of brackets in the string, too much text is removed. The regex removes all text between the FIRST opening bracket and the LAST closing bracket.
var text = "This is (remove me) not (remove me) a problem!";
// Remove text between brackets.
text = Regex.Replace(text, @"\(.*\)", "");
// Remove extra spaces.
text = Regex.Replace(text, @"\s+", " ");
Console.WriteLine(text);
This is a problem!
I'm stumped - I'm sure there's a simple solution, but I'm out of ideas...
Help most welcome!
You have two main possibilities:
change .* to .*?, i.e. match as few characters as possible and thus stop at the first ):
text = Regex.Replace(text, @"\(.*?\)", "");
text = Regex.Replace(text, @"\s{2,}", " "); // only runs of 2+, so single spaces are not needlessly replaced by themselves
change .* to [^)]*, i.e. match any characters except ):
text = Regex.Replace(text, @"\([^)]*\)", "");
text = Regex.Replace(text, @"\s{2,}", " ");
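Both options can be wrapped and checked against the question's failing input. A minimal sketch; `BracketRemover` and its method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper holding the two variants from the answer above.
public static class BracketRemover
{
    // Option 1: lazy quantifier, stops at the first ')' after each '('.
    // '.' does not match '\n', so a bracket pair spanning lines is left alone.
    public static string RemoveLazy(string text)
    {
        text = Regex.Replace(text, @"\(.*?\)", "");
        return Regex.Replace(text, @"\s{2,}", " ");
    }

    // Option 2: negated character class, which can never run past a ')'.
    // [^)] also matches '\n', so this variant does remove multi-line pairs.
    public static string RemoveNegated(string text)
    {
        text = Regex.Replace(text, @"\([^)]*\)", "");
        return Regex.Replace(text, @"\s{2,}", " ");
    }
}
```

Both turn `"This is (remove me) not (remove me) a problem!"` into `"This is not a problem!"`. Neither handles nested brackets: for `"a (b (c) d) e"` both return `"a d) e"`.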
A working example in C# that handles curly braces "{"; the match it finds is {{pc_mem_kc}}:
string str = "{{pc_mem_kc}} of members were health (test message)";
var pattern = @"\{.*?\}}";
var data11 = Regex.Matches(str, pattern, RegexOptions.IgnoreCase);

Searching for a RegEx to split a text into its words

I am searching for a regular expression to split a text into its words.
I have tested
Regex.Split(text, @"\s+")
But for example, for the input
this (is a) text. and
this gives me
this
(is
a)
text.
and
But I am looking for a solution that gives me only the words, without the (, ), . etc.
It should also split a text like
end.begin
into two words.
Try this:
Regex.Split(text, @"\W+")
\W is the complement of \w, which matches word characters (letters, digits and the underscore), so this splits on every run of non-word characters.
You're probably better off matching the words rather than splitting.
If you use Split (with \W as Regexident suggested), then you could get an extra string at the beginning and end. For example, the input string (a b) would give you four outputs: "", "a", "b", and another "", because you're using the ( and ) as separators.
What you probably want to do is just match the words. You can do that like this:
Regex.Matches(text, "\\w+").Cast<Match>().Select(match => match.Value)
Then you'll get just the words, and no extra empty strings at the beginning and end.
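The difference between matching and splitting shows up on the examples from the question. A minimal sketch; the `Words` class and its method names are hypothetical:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper comparing the two answers above.
public static class Words
{
    // Match runs of word characters: only the words come back,
    // with no empty strings at the start or end.
    public static string[] Extract(string text) =>
        Regex.Matches(text, @"\w+").Cast<Match>().Select(m => m.Value).ToArray();

    // Split on runs of non-word characters: a separator at either end
    // of the input produces an empty string there.
    public static string[] SplitOnNonWord(string text) => Regex.Split(text, @"\W+");
}
```

`Words.Extract("this (is a) text. and")` yields `this`, `is`, `a`, `text`, `and`, and `Words.Extract("end.begin")` yields `end` and `begin`, whereas `Words.SplitOnNonWord("(a b)")` returns the four elements `""`, `"a"`, `"b"`, `""`.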
You can do:
var text = "this (is a) text. and";
// to replace unwanted characters with space
text = System.Text.RegularExpressions.Regex.Replace(text, "[(),.]", " ");
// to split the text with SPACE delimiter
var splitted = text.Split(null as char[], StringSplitOptions.RemoveEmptyEntries);
foreach (var token in splitted)
{
Console.WriteLine(token);
}

How to remove extra returns and spaces in a string by regex?

I convert HTML code to plain text, but there are many extra returns and spaces. How can I remove them?
This removes all whitespace:
string new_string = Regex.Replace(orig_string, @"\s", "");
This just collapses each run of whitespace into a single space:
string new_string = Regex.Replace(orig_string, @"\s+", " ");
I'm assuming that you want to
find two or more consecutive spaces and replace them with a single space, and
find two or more consecutive newlines and replace them with a single newline.
If that's correct, then you could use
resultString = Regex.Replace(subjectString, @"( |\r?\n)\1+", "$1");
This keeps the original "type" of whitespace intact and also preserves Windows line endings correctly. If you also want to "condense" multiple tabs into one, use
resultString = Regex.Replace(subjectString, @"( |\t|\r?\n)\1+", "$1");
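To condense a run of newlines and spaces into a single newline, use the following (as written, the pattern only matches two or more groups that each consist of one or more newlines followed by at least one space):
resultString = Regex.Replace(subjectString, @"(?:(?:\r?\n)+ +){2,}", "\n"); // regular string: a .NET replacement does not treat \ as an escape, so @"\n" would insert a literal backslash and n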
I tried several approaches for this. The loops all worked, but this one was the clearest:
//define what you want to remove as chars
char tb = (char)9;    //tab, char code 9
char spc = (char)32;  //space, char code 32
char nwln = (char)10; //newline (LF), char code 10
//string.Replace returns a new string, so assign the result back.
//Note: removing spc also removes the spaces between words.
yourstring = yourstring.Replace(tb.ToString(), "").Replace(spc.ToString(), "").Replace(nwln.ToString(), "");
//by defining chars, result was better.
You can use Trim() to remove leading and trailing spaces and returns; it does not touch whitespace inside the string. In HTML, extra whitespace is not significant, so you can drop it at the ends using the Trim() method of the System.String class.
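The difference between trimming the ends and collapsing internal runs is easy to see on one string. A minimal sketch with a hypothetical `TrimVersusRegex` class:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper contrasting Trim() with a regex-based cleanup.
public static class TrimVersusRegex
{
    // string.Trim() only strips whitespace at the two ends of the string.
    public static string TrimOnly(string s) => s.Trim();

    // Collapsing internal runs needs something else, such as a regex;
    // Trim() then removes the single space left at each end.
    public static string CollapseAndTrim(string s) => Regex.Replace(s, @"\s+", " ").Trim();
}
```

For `"\r\n  Hello   world \r\n"`, `TrimOnly` returns `"Hello   world"` (the inner run survives), while `CollapseAndTrim` returns `"Hello world"`.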
