Regular expression for removing one space - c#

What is the regular expression for removing ONE space?
e.g:
H e l l o  W o r l d ----> Hello World
(Notice that there is still one space between Hello and World; the input has two spaces there to begin with.)
FYI, I'm working with C# regex:
Previously I did something like this, but it doesn't work properly for the above case:
Regex pattern = new Regex(@"[ ]{2,}");
pattern.Replace(content, @" ");

To remove one space from every group of one or more spaces, use
pattern = Regex.Replace(content, " ( *)", "$1");
To change n spaces to floor(n/2) spaces, use
pattern = Regex.Replace(content, " ( ?)", "$1");
I tried to add examples, but Stack Overflow consolidates whitespace even in inline code spans, it seems.
Explanation, as requested: The first finds a space followed by zero or more spaces and replaces it with the zero or more spaces, reducing the length by 1. The second finds each group of one or two spaces and replaces it with zero or one spaces, changing 1 to 0 in one replacement, 2 to 1 in one replacement, 3 to 1 in two replacements (2 to 1, then 1 to 0), etc.
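A minimal sketch of the two replacements described above, so the effect on each run of spaces can be checked directly. The class and method names (SpaceDemo, RemoveOneSpace, HalveSpaces) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceDemo
{
    // Removes exactly one space from every run of one or more spaces:
    // the first space is dropped, the rest of the run is kept via $1.
    public static string RemoveOneSpace(string s) =>
        Regex.Replace(s, " ( *)", "$1");

    // Turns every run of n spaces into floor(n/2) spaces:
    // each match is one or two spaces and is replaced by zero or one.
    public static string HalveSpaces(string s) =>
        Regex.Replace(s, " ( ?)", "$1");

    public static void Main()
    {
        // Two spaces between "o" and "W", single spaces elsewhere.
        string input = "H e l l o  W o r l d";
        Console.WriteLine(RemoveOneSpace(input)); // Hello World
        Console.WriteLine(HalveSpaces("a   b"));  // a b (three spaces become one)
    }
}
```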

Try using a negative look ahead.
Regex pattern = new Regex(@"\s(?!\s)");
Console.WriteLine(pattern.Replace(content, ""));

If I understand correctly, you want to remove exactly one space from each occurrence of one or more consecutive spaces.
For that you need a regex that matches each such occurrence, putting all but one of the spaces into a capturing group, and then replace each occurrence with the capturing group. So if there are 2 spaces next to each other, they are found as one match and the second space goes into the capturing group. After the replacement, the two spaces have been reduced to one space.
Regex pattern = new Regex(@" ( *)");
String newString = pattern.Replace("H e l l o  W o r l d", "$1");
// newString == "Hello World"

Another way would be to use a match evaluator to use a two character match:
string s = Regex.Replace("H e l l o  W o r l d", @"\s\S", x => x.Value[1].ToString());

This matches a whitespace character followed by a non-whitespace character and removes the whitespace.
Regex.Replace(input, @"(\s)(\S)", "$2");
It kind of looks like it's a string with an added space after each character. If it's so then you could get the original string by retrieving only the even indexed characters in the string.
var input = "H e l l o W o r l d ";
var res = String.Join("", input.Where((c, i) => (i % 2) == 0));

By no means a general solution given the nature of your problem, but in this particular case it looks as though you could get away with removing spaces that touch a word character on either side: on the left, for example:
Regex.Replace(content, @" \b", "");

The expression "[\s]{1}" should do it

Related

RegEx string between N and (N+1)th Occurrence

I am attempting to find the nth occurrence of a substring between two special characters. For example:
one|two|three|four|five
Say I am looking for the string between the nth and (n+1)th occurrence of the '|' character, here the 2nd and 3rd, which turns out to be 'three'. I want to do it using RegEx. Could someone guide me?
My Current Attempt is as follows.
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?:([^|]*)|){3}");
var m = r.Match(subtext).Value;
If you have full access to C# code, you should consider a mere splitting approach:
var idx = 2; // Might be user-defined
var subtext = "zero|one|two|three|four";
var result = subtext.Split('|').ElementAtOrDefault(idx);
Console.WriteLine(result);
// => two
A regex can be used if you have no access to code (if you use some tool that is powered with .NET regex):
^(?:[^|]*\|){2}([^|]*)
It matches:
^ - start of string
(?:[^|]*\|){2} - exactly 2 (adjust the number as needed) sequences of:
[^|]* - zero or more chars other than |
\| - a | symbol
([^|]*) - Group 1 (access via .Groups[1]): zero or more chars other than |
C# code to test:
var pat = $@"^(?:[^|]*\|){{{idx}}}([^|]*)";
var m = Regex.Match(subtext, pat);
if (m.Success) {
Console.WriteLine(m.Groups[1].Value);
}
// => two
If a tool does not let you access captured groups, turn the initial part into a non-consuming lookbehind pattern:
(?<=^(?:[^|]*\|){2})[^|]*
^^^^^^^^^^^^^^^^^^^^
The (?<=...) positive lookbehind only checks for the presence of a pattern immediately to the left of the current location; if that pattern is not matched there, the match fails at that position.
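As a rough sketch of how the lookbehind variant could be wrapped for a variable number of separators: the helper name FieldAfter and its skip parameter are hypothetical, and the separator is assumed to be a literal |.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldDemo
{
    // Returns the field that follows the first `skip` '|' separators,
    // or null when the text has fewer separators than that.
    public static string FieldAfter(string text, int skip)
    {
        // {{ and }} are literal braces in an interpolated string,
        // so skip = 2 produces the quantifier {2}.
        string pattern = $@"(?<=^(?:[^|]*\|){{{skip}}})[^|]*";
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FieldAfter("zero|one|two|three|four", 2)); // two
    }
}
```

Because the whole match is the field itself, m.Value can be used directly and no capturing group is needed.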
Use this:
(?:.*?\|){n}(.[^|]*)
where n is the number of times you need to skip your special character. The first capturing group will contain the result.
Use this regex and then select the n-th match (in this case 2) from the Matches collection:
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?<=\|)[^\|]*");
// The matches are "one", "two", "three", "four" (zero-based), so index 2 is "three".
var m = r.Matches(subtext)[2];

Can I use the same substring as part of different captures?

I want to create a function that will allow me to convert CamelCase to Title Case. This seems like a good task for regular expressions, but I am not committed to using regular expressions, if you have a better solution.
Here is my first attempt that works in most cases, but there are some issues I will get to in a few lines:
private static Regex camelSplitRegex = new Regex(@"(\S)([A-Z])");
private static String camelReplacement = "$1 $2";
public String SplitCamel(String text){
return camelSplitRegex.Replace(text, camelReplacement);
}
The regex pattern looks for a non-whitespace character (1st capture) followed by a capital letter (2nd capture). In the function, Regex.Replace is used to insert a space between the 1st and 2nd captures.
This works fine for many examples:
SplitCamel("privateField") returns "private Field"
SplitCamel("PublicMethod") returns "Public Method"
SplitCamel(" LeadingSpace") returns " Leading Space" without inserting an extra space before "Leading", as desired.
The problem I have is when dealing with multiple consecutive capital letters.
SplitCamel("NASA") returns "N AS A" not "N A S A"
SplitCamel("C3PO") returns "C3 PO" not "C3 P O"
SplitCamel("CAPS LOCK FEVER") returns "C AP S L OC K F EV E R" not "C A P S L O C K F E V E R"
In these cases, I believe the issue is that each capital letter is only being captured as either \S or [A-Z], but cannot be \S on one match and [A-Z] on the next match.
My main question is, "Does the .NET regex engine have some way of supporting the same substring being used as different captures on consecutive matches?" Secondarily, is there a better way of splitting camel case?
private static Regex camelSplitRegex = new Regex(@"(?<=\w)(?=[A-Z])");
private static String camelReplacement = " ";
does the job.
The problem with your pattern is that when you have the string "ABCD", \S matches A and ([A-Z]) matches B and you obtain "A BCD", but for the next replacement B is already consumed by the pattern and can't be used any more.
The way around this is to use lookarounds (a lookbehind (?<=...) and a lookahead (?=...)), which don't consume characters; they only test the current position in the string. That's why you don't need any group reference in the replacement string: you only need to insert a space at the current position.
The \w character class contains unicode letters, unicode digits and the underscore. If you want to restrict the search to ASCII digits and letters, use [0-9a-zA-Z] instead.
To be more precise:
for unicode, use (?<=[\p{L}\p{N}])(?=\p{Lu}) that works with accented letters and other alphabets and digits.
for ASCII use (?<=[a-zA-Z0-9])(?=[A-Z])
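A short sketch of the lookaround approach using the Unicode variant given above, run against the inputs from the question. The class name CamelDemo is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CamelDemo
{
    // Zero-width match: a position preceded by a letter or digit
    // and followed by an uppercase letter. Nothing is consumed.
    private static readonly Regex Boundary =
        new Regex(@"(?<=[\p{L}\p{N}])(?=\p{Lu})");

    // Inserts a space at every such position.
    public static string SplitCamel(string text) =>
        Boundary.Replace(text, " ");

    public static void Main()
    {
        Console.WriteLine(SplitCamel("privateField"));  // private Field
        Console.WriteLine(SplitCamel("NASA"));          // N A S A
        Console.WriteLine(SplitCamel("C3PO"));          // C3 P O
        Console.WriteLine(SplitCamel(" LeadingSpace")); // " Leading Space"
    }
}
```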
Here's a non-regular expression way to do that.
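// Extension method: declare it inside a static class (needs using System.Text;).
public static string SplitCamel(this string stuff)
{
    var builder = new StringBuilder();
    char? prev = null;
    foreach (char c in stuff)
    {
        // Insert a space before an ASCII capital unless it starts the string or follows whitespace.
        if (prev.HasValue && !char.IsWhiteSpace(prev.Value) && 'A' <= c && c <= 'Z')
            builder.Append(' ');
        builder.Append(c);
        prev = c;
    }
    return builder.ToString();
}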
The following
Console.WriteLine("'{0}'", "privateField".SplitCamel());
Console.WriteLine("'{0}'", "PublicMethod".SplitCamel());
Console.WriteLine("'{0}'", " LeadingSpace".SplitCamel());
Console.WriteLine("'{0}'", "NASA".SplitCamel());
Console.WriteLine("'{0}'", "C3PO".SplitCamel());
Console.WriteLine("'{0}'", "CAPS LOCK FEVER".SplitCamel());
Prints
'private Field'
'Public Method'
' Leading Space'
'N A S A'
'C3 P O'
'C A P S L O C K F E V E R'
Please consider using the C# keyword string (an alias for System.String) instead of the class name String. Update the pattern to this:
private static Regex camelSplitRegex = new Regex(@"(^\S)?([A-Z])");

Regex to get first 6 and last 4 characters of a string

I would like to use regex instead of string.replace() to get the first 6 chars of a string and the last 4 chars of the same string and substitute them with another character, & for example. The string is always 16 chars long. I'm doing some research, but I have never worked with regex before. Thanks.
If you prefer to use regular expression, you could use the following. The dot . will match any character except a newline sequence, so you can specify {n} to match exactly n times and use beginning/end of string anchors.
String r = Regex.Replace("123456foobar7890", @"^.{6}|.{4}$",
    m => new string('&', m.ToString().Length));
Console.WriteLine(r); //=> "&&&&&&foobar&&&&"
If you want to invert the logic and replace the middle portion of your string instead, you can use a positive lookbehind.
String r = Regex.Replace("123456foobar7890", @"(?<=^.{6}).{6}",
    m => new string('&', m.ToString().Length));
Console.WriteLine(r); //=> "123456&&&&&&7890"

Regex to remove carriage return followed by space

Relatively new to regex, but hoping someone can help. While I've seen loads of examples on how to remove certain characters or combinations of characters, I can't seem to get the following to work for me.
I have a file with the following lines:
a b c
d
ef
 g h
i
What I need is to end up with a string that removes the exact occurrence of a newline followed by a space (and only that), so the result would be
a b c
d
efg h
i
Right now I have
string contents = File.ReadAllText("input.text");
string result = Regex.Replace(contents, @"[\n \r]\ ", "");
Console.WriteLine(result);
but that only removes the space in front of the g h line, instead of also combining it with the previous line.
What am I doing wrong?
string text = Regex.Replace(contents, @"(\r|\n)+^ +", "", RegexOptions.None | RegexOptions.Multiline);
Mine reads as:
One or more matches of \r or \n (new line characters) --> "(\r|\n)+"
followed by the beginning of a line --> "^"
followed by one or more spaces --> " +"
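A small sketch of that pattern applied to the sample from the question, assuming Windows line endings and a single leading space on the "g h" line. The names JoinDemo and JoinIndented are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JoinDemo
{
    // Removes a line break (one or more \r / \n characters) together with
    // the spaces that start the following line, joining the two lines.
    public static string JoinIndented(string contents) =>
        Regex.Replace(contents, @"(\r|\n)+^ +", "", RegexOptions.Multiline);

    public static void Main()
    {
        string contents = "a b c\r\nd\r\nef\r\n g h\r\ni";
        // Only the break before " g h" is removed; the other lines are untouched.
        Console.WriteLine(JoinIndented(contents));
    }
}
```

Note that the + after (\r|\n) means a blank line directly before an indented line is swallowed as well.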
Try using:
string result = Regex.Replace(contents, @"(?s)(?:(?:\r|\n)+ +)", "");
(?s) enables single-line mode. (It only changes how . matches, so it has no effect on this particular pattern.)

Extracting one word based on special character using Regular Expression in C#

I am not very good at regular expression but want to do some thing like this :
string = "c test123 d split"
I want to split the string based on "c" and "d"; these can be any words, which I already have. The string will be given by the user. I want "test123" and "split" as my output. There can be any number of words, e.g. "c test123 d split e new"; I already have c, d and e. I just want the word that follows each of them: test123 after c, split after d, and new after e. How can I do this? One more thing: I will pass c first, then d, then e, not all of them together. I tried:
string strSearchWord = "c ";
Regex testRegex1 = new Regex(strSearchWord);
List<string> lstValues = testRegex1.Split("c test123 d split").ToList();
But it's working only for last character i.e for d it's giving the last word but for c it includes test123 d split.
How shall I do this???
The input might be
string strSearchWord="c mytest1 d newtest1 e lasttest1";
split should be based on characters "c d and e". I will pass them one by one.
or
string strSearchword="q 100 p 200 t 2000";
split should be based on characters "q p and t". I will pass them one by one.
or
string strSearchWord="t 100 r pass";
split should be based on characters "t r". I will pass them one by one.
or
string strSeaRCHwORD="fi 100 se 2000 td 500 ft 200 fv 6000 lt thanks ";
split should be based on characters "fi,se,td,ft,fv and lt". I will pass them one by one.
Hope it's clear. Any other specification????
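string[] splitArray = null;
splitArray = Regex.Split(subjectString, @"\s*\b(c|d)\b\s*");
will split the string along the "words" c or d, whether or not they are surrounded by whitespace, but only if they occur as entire words (hence the \b word boundary anchors).
This gives you the substrings between your words as an array. Note that because (c|d) is a capturing group, .NET's Regex.Split also includes the captured c or d in the array; use a non-capturing group, (?:c|d), if you want only the substrings between them.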
If you want to loop through the string manually, picking out each word after the search words one by one, you could use positive lookbehind:
string resultString = null;
resultString = Regex.Match(subjectString, @"(?<=\bc\b\s*)\w+").Value;
will find the word after c. Do the same for d ((?<=\bd\b\s*)\w+) etc.
This regex means:
(?<=\bc\b\s*): Assert that it is possible to match the "complete word" c, optionally followed by space characters, to the left of the current position in the string (positive lookbehind).
\w+: Then match any alphanumeric characters (including _) that follow.
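Since the search words are passed one at a time, the lookbehind could be built from the keyword. This is only a sketch: the helper name WordAfter is hypothetical, and Regex.Escape is used on the assumption that a keyword might contain regex metacharacters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordDemo
{
    // Returns the word that follows `key` (matched as a whole word),
    // or null when the key is not found.
    public static string WordAfter(string text, string key)
    {
        string pattern = @"(?<=\b" + Regex.Escape(key) + @"\b\s*)\w+";
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string input = "c test123 d split e new";
        Console.WriteLine(WordAfter(input, "c")); // test123
        Console.WriteLine(WordAfter(input, "d")); // split
        Console.WriteLine(WordAfter(input, "e")); // new
    }
}
```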
Use regex groups.
The regex would be
"c(.+?)d(.+)"
and you would retrieve the groups like this:
Regex r = new Regex(@"c\s(.+?)\sd\s(.+)"); // \s is whitespace; the last group is greedy, otherwise it would capture a single character
r.Match("c test123 d split").Groups[1] // the 1st group, "test123"
r.Match("c test123 d split").Groups[2] // the 2nd group, "split"
r.Match("c test123 d split").Groups[0] // the whole match, "c test123 d split"
