C# Replacing Multiple Spaces with 1 space leaving special characters intact

Having a bit of a problem as I have to translate a string into a table. I'd like to remove multiple spaces, but not all of them. So the data in text comes back with lots of spaces in between like so:
SESSIONNAME USERNAME ID STATE TYPE DEVICE\r\n
services 0 Disc \r\n
console 1 Conn \r\n
alinav 2 Disc \r\n
rdp-tcp 65536 Listen \r\n
I would like to keep the \r\n values that define my rows, keep the empty values that are legitimate under some columns, and keep single spaces to separate the columns. I only want to remove the extra spaces so that they are not fed into the values.
I've tried:
output = Regex.Replace(output, @"\s{2,}", " ", RegexOptions.Multiline);
output = output.Replace("  ", " ");
But the first one just removes everything (things I need and don't need). And the second one still leaves too many spaces.
Thanks.

You can do two things:
Use a literal space in the regular expression. \s also matches other whitespace characters such as \n, \r and \t, which is why your line breaks disappeared. Thus:
output = Regex.Replace(output, @" +", " ", RegexOptions.Multiline);
Or apply the second method until convergence:
string s2 = output;
do {
output = s2;
s2 = s2.Replace("  "," ");
} while(output != s2);
In most cases the first method will outperform the second one. Regexes are in general a bit slower than simple string replacement, but the regex collapses each run of spaces in a single pass, whereas the loop needs several passes when the string contains long runs of spaces.
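As a minimal sketch of the first method, the helper below collapses runs of two or more space characters and leaves \r, \n and \t alone. The class and method names are made up for illustration. Note that collapsing spaces also collapses the gap left by an empty column, so it cannot by itself tell you which column a value belongs to.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceCollapser
{
    // Collapses every run of two or more U+0020 characters into one space.
    // The pattern contains a literal space, not \s, so line breaks
    // and tabs are never matched.
    public static string CollapseSpaces(string input)
    {
        return Regex.Replace(input, " {2,}", " ");
    }
}
```

Usage would look like output = SpaceCollapser.CollapseSpaces(output); the row terminators stay in place, so the result can still be split on "\r\n".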

In your example the data is delimited by position, not by characters; is that correct? If so, you should extract by position; something like:
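// Split on the row terminator, not on every whitespace character.
// The first row is the header; skip it before calling Int32.Parse.
foreach (string s in output.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries))
{
var sessionName = s.Substring(0, 18).Trim();
var userName = s.Substring(18, 19).Trim();
var id = Int32.Parse(s.Substring(37, 8).Trim());
var whateverType = s.Substring(45, 12).Trim();
var device = s.Substring(57, 6).Trim();
}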
Of course you need to do proper error checking, and should probably put the field widths in an array and calculate positions instead of hard-coding them as I have shown.
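Here is a rough sketch of the "field widths in an array" idea. The widths are passed in by the caller (the values 18, 19, 8, 12, 6 from the snippet above are an assumption about the real layout), the class and method names are hypothetical, and lines shorter than the full width are tolerated instead of throwing.

```csharp
using System;

public static class FixedWidthParser
{
    // Cuts one line into trimmed fields using the given column widths.
    // Positions are calculated from the widths, so nothing is hard-coded.
    // Columns that start past the end of the line come back as "".
    public static string[] ParseLine(string line, int[] widths)
    {
        var fields = new string[widths.Length];
        int start = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            if (start >= line.Length)
            {
                fields[i] = string.Empty;
            }
            else
            {
                int length = Math.Min(widths[i], line.Length - start);
                fields[i] = line.Substring(start, length).Trim();
            }
            start += widths[i];
        }
        return fields;
    }
}
```

An empty column simply yields an empty string, which is what lets you keep the legitimately blank values apart from the padding.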

Related

How can I remove the spaces that appear between the words even after splitting the string? [duplicate]

I have the following input:
string txt = " i am a string ";
I want to remove the spaces from the start and the end of the string.
The result should be: "i am a string"
How can I do this in c#?
String.Trim
Removes all leading and trailing white-space characters from the current String object.
Usage:
txt = txt.Trim();
If this isn't working then it is highly likely that the "spaces" aren't spaces but some other non-printing or white-space character, possibly tabs. In this case you need to use the String.Trim overload that takes an array of characters:
char[] charsToTrim = { ' ', '\t' };
string result = txt.Trim(charsToTrim);
You can add to this list as and when you come across more space like characters that are in your input data. Storing this list of characters in your database or configuration file would also mean that you don't have to rebuild your application each time you come across a new character to check for.
NOTE
As of .NET 4 .Trim() removes any character that Char.IsWhiteSpace returns true for so it should work for most cases you come across. Given this, it's probably not a good idea to replace this call with the one that takes a list of characters you have to maintain.
It would be better to call the default .Trim() and then call the method with your list of characters.
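A small sketch of combining the two, assuming the stray characters are things like the zero-width space (U+200B) or a byte-order mark (U+FEFF), which as far as I know Char.IsWhiteSpace does not report as white space. The class name, method name and the list of extra characters are all illustrative; swap in whatever shows up in your data.

```csharp
using System;

public static class ExtendedTrim
{
    // Hypothetical list of extra characters to strip in addition to
    // everything Char.IsWhiteSpace recognises.
    private static readonly char[] ExtraChars = { '\u200B', '\uFEFF' };

    private static bool IsTrimmable(char c)
    {
        return char.IsWhiteSpace(c) || Array.IndexOf(ExtraChars, c) >= 0;
    }

    // Trims white space and the extra characters from both ends,
    // even when the two kinds are interleaved.
    public static string TrimAll(string input)
    {
        int start = 0;
        int end = input.Length;
        while (start < end && IsTrimmable(input[start])) start++;
        while (end > start && IsTrimmable(input[end - 1])) end--;
        return input.Substring(start, end - start);
    }
}
```

Scanning from both ends in one go avoids having to call Trim() and Trim(char[]) repeatedly when the unwanted characters alternate.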
You can use:
String.TrimStart - Removes all leading occurrences of a set of characters specified in an array from the current String object.
String.TrimEnd - Removes all trailing occurrences of a set of characters specified in an array from the current String object.
String.Trim - combination of the two functions above
Usage:
string txt = " i am a string ";
char[] charsToTrim = { ' ' };
txt = txt.Trim(charsToTrim); // txt = "i am a string"
EDIT:
txt = txt.Replace(" ", ""); // txt = "iamastring"
I really don't understand some of the hoops the other answers are jumping through.
var myString = " this is my String ";
var newstring = myString.Trim(); // results in "this is my String"
var noSpaceString = myString.Replace(" ", ""); // results in "thisismyString";
It's not rocket science.
txt = txt.Trim();
Or you can split your string into a string array, splitting by space, and then append every item of the array to an empty string.
Maybe this is not the best or fastest method, but you can try it if the other answers aren't what you want.
text.Trim() is to be used
string txt = " i am a string ";
txt = txt.Trim();
Use the Trim method.
static void Main()
{
// A.
// Example strings with multiple whitespaces.
string s1 = "He saw a cute\tdog.";
string s2 = "There\n\twas another sentence.";
// B.
// Create the Regex.
Regex r = new Regex(@"\s+");
// C.
// Strip multiple spaces.
string s3 = r.Replace(s1, @" ");
Console.WriteLine(s3);
// D.
// Strip multiple spaces.
string s4 = r.Replace(s2, @" ");
Console.WriteLine(s4);
Console.ReadLine();
}
OUTPUT:
He saw a cute dog.
There was another sentence.
You Can Use
string txt = " i am a string ";
txt = txt.TrimStart().TrimEnd();
Output is "i am a string"

using Regex to iterate over a string and search for 3 consecutive hyphens and replace it with [space][hyphen][space]

I currently have a string which looks like this when it is returned :
//This is the url string
// the-great-debate---toilet-paper-over-or-under-the-roll
string name = string.Format("{0}",url);
name = Regex.Replace(name, "-", " ");
And when I perform the following Regex operation it becomes like this :
the great debate toilet paper over or under the roll
However, like I mentioned in the question, I want to be able to apply regex to the url string so that I have the following output:-
the great debate - toilet paper over or under the roll
I would really appreciate any assistance.
[EDIT] However, not all the strings look like this; some of them just have single hyphens, so the above method works:
world-water-day-2016
and it changes to
world water day 2016
but for this one:
the-great-debate---toilet-paper-over-or-under-the-roll
I need a way to check whether the string has 3 consecutive hyphens and, if so, replace those 3 hyphens with [space][hyphen][space], and then replace all the remaining single hyphens between the words with a space.
First of all, there is always a very naive solution to this kind of problem: you replace your specific matches in context with some chars that are not usually used in the current environment and after replacing generic substrings you may replace the temporary substrings with the necessary exception.
var name = url.Replace("---", "[ \uFFFD ]").Replace("-", " ").Replace("[ \uFFFD ]", " - ");
You may also use a regex based replacement that matches either a 3-hyphen substring capturing it, or just match a single hyphen, and then check if Group 1 matched inside a match evaluator (the third parameter to Regex.Replace can be a Match evaluator method).
It will look like
var name = Regex.Replace(url, #"(---)|-", m => m.Groups[1].Success ? " - " : " ");
See the C# demo.
So, when (---) part matches, the 3 hyphens are put into Group 1 and the .Success property is set to true. Thus, m => m.Groups[1].Success ? " - " : " " replaces 3 hyphens with space+-+space and 1 hyphen (that may be actually 1 of the 2 consecutive hyphens) with a space.
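For reference, here is the match-evaluator approach wrapped in a small helper so it can be tried against both kinds of URL string from the question. The class and method names are mine, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugFormatter
{
    // "---" is tried first at each position and captured in Group 1;
    // if that alternative fails, a single "-" is matched instead.
    public static string ToTitleText(string slug)
    {
        return Regex.Replace(
            slug,
            "(---)|-",
            m => m.Groups[1].Success ? " - " : " ");
    }
}
```

Because the alternation is ordered, the three-hyphen case always wins over three separate single-hyphen matches.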
Here's a solution using LINQ rather than Regex:
var str = "the-great-debate---toilet-paper-over-or-under-the-roll";
var result = str.Split(new string[] {"---"}, StringSplitOptions.None)
.Select(s => s.Replace("-", " "))
.Aggregate((c,n) => $"{c} - {n}");
// result = "the great debate - toilet paper over or under the roll"
Split the string up based on the ---, then remove hyphens from each substring, then join them back together.
The easy way:
name = Regex.Replace(name, @"\b-|-\b", " ");
The show-off way:
name = Regex.Replace(name, @"(\b)?-(?(1)|\b)", " ");

Regex to find embedded quotes in a quotes string

Original string:
11235485|56987|0|2010|05|"This is my sample
"text""|"01J400B"|""|1|"Sample "text" number two"|""sample text number
three""|""|""|
Desired string:
11235485|56987|0|2010|05|"This is my sample
""text"""|"01J400B"|""|1|"Sample ""text"" number two"|"""sample text
number three"""|""|""|
The desired string unfortunately is a requirement that is out of my control, all nested quotes MUST be qualified with quotes (I KNOW).
Try as I might I have not been able to create the desired string from the original.
A regex match/replace seems to be the way to go, I need help. Any help is appreciated.
I'd actually split the string and evaluate each piece:
public string Escape(string input)
{
string[] pieces = input.Split('|');
for (int i = 0; i < pieces.Length; i++)
{
string piece = pieces[i];
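if (piece.Length >= 2 && piece.StartsWith("\"") && piece.EndsWith("\""))
{
// Strip exactly one quote from each end; Trim('"') would also eat embedded quotes that sit at the edges.
string inner = piece.Substring(1, piece.Length - 2);
pieces[i] = "\"" + inner.Replace("\"", "\"\"") + "\"";
}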
}
return string.Join("|", pieces);
}
This is making several assumptions about the input:
Items are delimited by pipes (|)
Items are well formed and will begin and end with quotation marks
This will also break if you have |s inside of quoted strings.
You may be able to just use the normal string.Replace() method. You know that | is what starts the column, so you can replace all " to "" and then fix the column start and end by replacing |"" to |" and ""| to "|.
It'd look like this:
var input = YOUR_ORIGINAL_STRING;
var result = input.Replace("\"", "\"\"").Replace("|\"\"", "|\"").Replace("\"\"|", "\"|");
It's not pretty, but it gets the job done.

Remove new line character from C# String

I have following string.
string str = @"One
Two
Four
Five
Six
Seven
Eight
Thirteen
Twenty
";
I want to remove the extra new lines in this string. So that the string should look like:
str = "One
Two
Four
Five
Six
Seven
Eight
Thirteen
Twenty"
I am using this code but it is not working.
Str = Str.Replace("\n\n", "\n");
while (Str.IndexOf("\n") > 0)
{
Str = Str.Replace("\n\n", "\n");
}
I even tried Str = Str.Replace("\u000a\u000a", "\u000a"); but it still didn't work.
You could split the string into lines, remove the empty entries and join it back together:
var lines = str.Split('\n')
.Where(s => !string.IsNullOrWhiteSpace(s));
str = string.Join("\n", lines);
Try this:
str = System.Text.RegularExpressions.Regex.Replace(str, "(" + Environment.NewLine + ")+", Environment.NewLine);
Environment.NewLine is the newline string of the platform the code runs on. Even the above code is not guaranteed to remove duplicate newlines, because the document or string you are parsing could have been created on a different machine where the newline sequence is different:
"\r\n" - windows newline,
"\n" - unix newline,
"\r" - mac newline
For an introduction to regular expressions, the Wikipedia article should be quite informative, but in short:
Environment.NewLine can consist of multiple characters, such as "\r\n", and that's why I am enclosing it in "()" to mark it as a group of characters (a single element) that should be treated as a unit,
"+" matches the preceding element (Environment.NewLine enclosed in "()") one or more times.
Thanks to the above and to Regex.Replace we get exactly the desired output.
I tried your code and it hangs at the while loop. That is to be expected: the condition tests for a single \n, and the replace will never get rid of every \n. You want to change your current while loop to this:
while (str.Contains("\n\n"))
{
str = str.Replace("\n\n", "\n");
}
This will loop until all repeated instances of \n\n have been removed.
Edit: Testing for "\n\n" with Contains (rather than IndexOf(...) > 0) also covers strings that start with \n\n, where IndexOf returns 0.
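If the text may mix \r\n, \n and \r, a sketch along the lines of the split-and-join answer avoids the loop altogether. It assumes you are happy to normalise every line break to \n and to drop a trailing newline; lines that contain only spaces are kept. The names are illustrative.

```csharp
using System;

public static class BlankLineRemover
{
    // "\r\n" is listed first so a Windows line break is treated as one
    // separator rather than as "\r" followed by "\n".
    private static readonly string[] LineBreaks = { "\r\n", "\n", "\r" };

    // Splits on any line break, drops the empty entries produced by
    // consecutive breaks, and joins the remaining lines with "\n".
    public static string RemoveBlankLines(string text)
    {
        string[] lines = text.Split(LineBreaks, StringSplitOptions.RemoveEmptyEntries);
        return string.Join("\n", lines);
    }
}
```

For the string in the question, str = BlankLineRemover.RemoveBlankLines(str); would leave one entry per line with a single \n between them.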

How do I find and remove any rule or newline in an output? [duplicate]

How can I replace Line Breaks within a string in C#?
Use replace with Environment.NewLine
myString = myString.Replace(System.Environment.NewLine, "replacement text");
As mentioned in other posts, if the string comes from another environment (OS) then you'd need to replace that particular environment's newline control characters.
The solutions posted so far either only replace Environment.NewLine or they fail if the replacement string contains line breaks because they call string.Replace multiple times.
Here's a solution that uses a regular expression to make all three replacements in just one pass over the string. This means that the replacement string can safely contain line breaks.
string result = Regex.Replace(input, @"\r\n?|\n", replacementString);
To extend The.Anyi.9's answer, you should also be aware of the different types of line break in general use. Dependent on where your file originated, you may want to look at making sure you catch all the alternatives...
string replaceWith = "";
string removedBreaks = Line.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
should get you going...
I would use Environment.NewLine when I wanted to insert a newline into a string, but not to remove all newlines from a string.
Depending on your platform you can have different types of newlines, but even inside the same platform often different types of newlines are used. In particular when dealing with file formats and protocols.
string ReplaceNewlines(string blockOfText, string replaceWith)
{
return blockOfText.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
}
If your code is supposed to run in different environments, I would consider using the Environment.NewLine constant, since it is specifically the newline used in the specific environment.
line = line.Replace(Environment.NewLine, "newLineReplacement");
However, if you get the text from a file originating on another system, this might not be the correct answer, and you should replace with whatever newline constant is used on the other system. It will typically be \n or \r\n.
If you want to "clean" the new lines, flamebaud's comment suggesting the regex @"[\r\n]+" is the best choice.
using System;
using System.Text.RegularExpressions;
class MainClass {
public static void Main (string[] args) {
string str = "AAA\r\nBBB\r\n\r\n\r\nCCC\r\r\rDDD\n\n\nEEE";
Console.WriteLine (str.Replace(System.Environment.NewLine, "-"));
/* Result:
AAA
-BBB
-
-
-CCC
DDD---EEE
*/
Console.WriteLine (Regex.Replace(str, @"\r\n?|\n", "-"));
// Result:
// AAA-BBB---CCC---DDD---EEE
Console.WriteLine (Regex.Replace(str, @"[\r\n]+", "-"));
// Result:
// AAA-BBB-CCC-DDD-EEE
}
}
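Use the ReplaceLineEndings method, new in .NET 6:
myString = myString.ReplaceLineEndings();
With no argument it replaces every newline sequence in the string with Environment.NewLine; an overload takes the replacement text.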
Documentation:
ReplaceLineEndings
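A short sketch of the overload that takes replacement text, assuming a .NET 6 or later target. The wrapper class and method are just for illustration; the only framework call is String.ReplaceLineEndings(string).

```csharp
using System;

public static class LineEndingDemo
{
    // Replaces each newline sequence (\r\n, \n, \r, ...) with the given
    // text. Consecutive line breaks are replaced one by one, not merged.
    public static string Normalize(string text, string replacement)
    {
        return text.ReplaceLineEndings(replacement);
    }
}
```

Passing "\n" as the replacement gives you a string with uniform Unix-style line breaks; passing "" removes the line breaks entirely.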
Don't forget that Replace doesn't modify the string in place; it returns a new string with the characters replaced. The following will remove line breaks (not replace them). I'd use @Brian R. Bondy's method if replacing them with something else, perhaps wrapped as an extension method. Remember to check for null values before calling Replace or the extension methods provided.
string line = ...
line = line.Replace( "\r", "").Replace( "\n", "" );
As extension methods:
public static class StringExtensions
{
public static string RemoveLineBreaks( this string lines )
{
return lines.Replace( "\r", "").Replace( "\n", "" );
}
public static string ReplaceLineBreaks( this string lines, string replacement )
{
return lines.Replace( "\r\n", replacement )
.Replace( "\r", replacement )
.Replace( "\n", replacement );
}
}
To make sure all possible kinds of line break (Windows, Mac and Unix) are replaced you should use:
myString.Replace("\r\n", "\n").Replace("\r", "\n").Replace("\n", "replacement");
in this order, so that a combination of line-ending characters does not produce extra line breaks.
Why not both?
string ReplacementString = "";
string result = Regex.Replace(strin.Replace(System.Environment.NewLine, ReplacementString), @"(\r\n?|\n)", ReplacementString);
Note: Replace strin with the name of your input string.
I needed to replace the \r\n with an actual carriage return and line feed and replace \t with an actual tab. So I came up with the following:
public string Transform(string data)
{
string result = data;
char cr = (char)13;
char lf = (char)10;
char tab = (char)9;
result = result.Replace("\\r", cr.ToString());
result = result.Replace("\\n", lf.ToString());
result = result.Replace("\\t", tab.ToString());
return result;
}
var answer = Regex.Replace(value, "(\n|\r)+", replacementString);
As new line can be delimited by \n, \r and \r\n, first we’ll replace \r and \r\n with \n, and only then split data string.
The following lines should go to the parseCSV method:
function parseCSV(data) {
//normalize Windows new lines (\r\n)
data = data.replace(/\r\n/g, "\n");
//normalize old Mac new lines (\r)
data = data.replace(/\r/g, "\n");
//split into rows
var rows = data.split("\n");
return rows;
}
Use the .Replace() method
Line = Line.Replace("\n", "whatever you want to replace with");
The best way to replace line breaks safely is
yourString = yourString.Replace("\r\n","\n") //handling windows linebreaks
.Replace("\r","\n"); //handling mac linebreaks
That should produce a string with only \n (i.e. linefeed) as line breaks.
This code is also useful for fixing mixed line breaks.
Another option is to create a StringReader over the string in question. On the reader, do .ReadLine() in a loop. Then you have the lines separated, no matter what (consistent or inconsistent) separators they had. With that, you can proceed as you wish; one possibility is to use a StringBuilder and call .AppendLine on it.
The advantage is, you let the framework decide what constitutes a "line break".
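A minimal sketch of the StringReader idea. The helper name and the choice to take the new separator as a parameter are mine; also note that a trailing line break in the input is dropped, because ReadLine never returns it.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineNormalizer
{
    // Reads the text line by line and re-joins the lines with newLine.
    // StringReader.ReadLine treats \r, \n and \r\n as line terminators.
    public static string NormalizeWithReader(string text, string newLine)
    {
        var builder = new StringBuilder();
        using (var reader = new StringReader(text))
        {
            string line;
            bool first = true;
            while ((line = reader.ReadLine()) != null)
            {
                if (!first)
                {
                    builder.Append(newLine);
                }
                builder.Append(line);
                first = false;
            }
        }
        return builder.ToString();
    }
}
```

Appending the separator only between lines (instead of calling AppendLine) keeps the result free of an extra trailing line break.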
string s = Regex.Replace(source_string, "\n", "\r\n");
or
string s = Regex.Replace(source_string, "\r\n", "\n");
depending on which way you want to go.
Hope it helps.
If you want to replace only the newlines:
var input = @"sdfhlu \r\n sdkuidfs\r\ndfgdgfd";
var match = @"[\\ ]+";
var replaceWith = " ";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input.Replace(@"\n", replaceWith).Replace(@"\r", replaceWith), match, replaceWith);
Console.WriteLine("output: " + x);
If you want to replace newlines, tabs and white spaces:
var input = "sdfhlusdkuidfs\r\ndfgdgfd";
var match = @"\s+";
var replaceWith = "";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input, match, replaceWith);
Console.WriteLine("output: " + x);
This is a very long-winded one-liner, but it is the only solution I have found to work if you cannot use the special character escapes like "\r" and "\n" and \x0d and \u000D, or System.Environment.NewLine, as parameters to the Replace() method:
MyStr.Replace( System.String.Concat( System.Char.ConvertFromUtf32(13).ToString(), System.Char.ConvertFromUtf32(10).ToString() ), ReplacementString );
This is somewhat off-topic, but to get it to work inside Visual Studio's XML .props files, which invoke .NET via the XML properties, I had to dress it up as shown below.
The Visual Studio XML --> .NET environment just would not accept the special character escapes like "\r" and "\n" and \x0d and \u000D, or System.Environment.NewLine, as parameters to the replace() method.
$([System.IO.File]::ReadAllText('MyFile.txt').replace( $([System.String]::Concat($([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString()))),$([System.String]::Concat('^',$([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString())))))
Based on @mark-bayers' answer, and for cleaner output:
string result = Regex.Replace(ex.Message, @"(\r\n?|\r?\n)+", "replacement text");
It matches \r\n, \n and \r, preferring the longer sequence, and collapses multiple consecutive occurrences into a single replacement.
