Remove new line character from C# String - c#

I have the following string.
string str = @"One
Two
Four
Five
Six
Seven
Eight
Thirteen
Twenty
";
I want to remove the extra new lines in this string. So that the string should look like:
str = "One
Two
Four
Five
Six
Seven
Eight
Thirteen
Twenty"
I am using this code but it is not working.
Str = Str.Replace("\n\n", "\n");
while (Str.IndexOf("\n") > 0)
{
Str = Str.Replace("\n\n", "\n");
}
I even tried Str = Str.Replace("\u000a\u000a", "\u000a"); but that didn't work either.

You could split the string into lines, remove the empty entries and join it back together:
var lines = str.Split('\n')
.Where(s => !string.IsNullOrWhiteSpace(s));
str = string.Join("\n", lines);

Try this:
str = System.Text.RegularExpressions.Regex.Replace(str, "(" + Environment.NewLine + ")+", Environment.NewLine);
See the documentation to learn more about Environment.NewLine. But even the above code is not guaranteed to remove duplicate newlines, because the document or string you are parsing could have been created on a different machine where the code for a newline is different:
"\r\n" - windows newline,
"\n" - unix newline,
"\r" - mac newline
For an introduction to regular expressions, the Wikipedia article should be quite informative, but in short:
Environment.NewLine can consist of multiple characters, such as "\r\n", and that's why I am enclosing this variable in "()" to mark it as a group of characters (a single element) that should be treated as a unit,
"+" matches the preceding element (Environment.NewLine enclosed in "()") one or more times.
Together with Regex.Replace, this gives the desired output when the string uses the current platform's newline.
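Since the input may mix newline conventions, the same idea can be extended to match all three forms rather than only Environment.NewLine. The following is a minimal sketch, not a definitive implementation: the names NewlineCollapser and CollapseBlankLines are made up for illustration, and it assumes you are happy to normalise every run of line breaks to a single "\n".

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineCollapser
{
    // Matches one or more consecutive line breaks of any common kind.
    // "\r\n" is listed first so a Windows break is consumed as one unit.
    private static readonly Regex Breaks = new Regex("(\r\n|\r|\n)+");

    // Hypothetical helper: collapses each run of line breaks to a single "\n"
    // and trims line breaks at the start and end of the string.
    public static string CollapseBlankLines(string text)
    {
        return Breaks.Replace(text, "\n").Trim('\r', '\n');
    }
}

public static class Program
{
    public static void Main()
    {
        string str = "One\r\n\r\nTwo\n\n\nFour\r\rFive\n";
        Console.WriteLine(NewlineCollapser.CollapseBlankLines(str));
    }
}
```

Lines that contain only spaces or tabs are not treated as empty here; the Split/Where approach with IsNullOrWhiteSpace handles that case.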

I tried your code and it hangs at the while. That is to be expected: the loop only ends when no \n remains after position 0, but the replace never removes single \n instances. You want to change your current while loop to this:
while (str.IndexOf("\n\n") > 0)
{
str = str.Replace("\n\n", "\n");
}
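This will loop until all repeated instances of \n\n have been removed.
Edit: I've tested this for a variety of cases and it works as long as the string does not start with \n or \n\n. A match at index 0 fails the > 0 test, so a leading \n\n is never collapsed; comparing with >= 0 covers that case.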

Related

Why Notepad++ shows Carriage return + Line feed for both \r and \n

In C# define 4 variable as below:
string s1 = "\r";
string s2 = "\n";
string CarriageReturn = (Convert.ToChar(13)).ToString();
string LineFeed = (Convert.ToChar(10)).ToString();
Then, while debugging, copy their values from the Watch window into Notepad++ and click on "Show all characters". Interestingly, you can see there is no difference between \r and \n: for both of them, it shows CR LF.
Is it a bug or something else? How can we explain this?
Interestingly you can see there is no difference between \r and \n and for both of them it shows CR LF. Is it a bug or something else?
It is not a bug. CRLF is the default for the Environment.NewLine in Windows: a 'string containing "\r\n" for non-Unix platforms, or a string containing "\n" for Unix platforms.'
How can we explain this?
It probably results from the way you are outputting the string values to a file. If you use a method that adds new lines, such as WriteAllLines() does, then there will automatically be a CRLF at the end of each value you write.
For instance, we can run the following program.
string r = "\r";
string n = "\n";
string CarriageReturn = (Convert.ToChar(13)).ToString();
string LineFeed = (Convert.ToChar(10)).ToString();
var content = new string[] {
$"(r:{r})",
$"(n:{n})",
$"(13:{CarriageReturn})",
$"(10:{LineFeed})"
};
System.IO.File.WriteAllLines("output1.txt", content);
System.IO.File.WriteAllText("output2.txt", string.Join("", content));
It produces two output files. The first, output1.txt, was written with WriteAllLines() and contains four lines. The second, output2.txt, was written with WriteAllText() and contains no added new lines.
In both, all of the content outside parentheses is independent of your code. That is, the CRLF symbols are part of writing a line in the call to WriteAllLines.
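To confirm that the strings themselves differ before any file or editor is involved, you can print the character codes directly. This is a small sketch; CharCodes.Describe is a made-up helper name, not part of the framework.

```csharp
using System;
using System.Linq;

public static class CharCodes
{
    // Hypothetical helper: returns the numeric code of every character
    // in the string, separated by spaces.
    public static string Describe(string s)
    {
        return string.Join(" ", s.Select(c => ((int)c).ToString()));
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(CharCodes.Describe("\r"));   // 13
        Console.WriteLine(CharCodes.Describe("\n"));   // 10
        Console.WriteLine(CharCodes.Describe("\r\n")); // 13 10
        // Depends on the platform: 13 10 on Windows, 10 on Unix.
        Console.WriteLine(CharCodes.Describe(Environment.NewLine));
    }
}
```

If the codes printed here are 13 and 10 but the editor still shows CR LF for both, the extra characters were added on the way into the file or the clipboard, not by the string literals.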

Complex string split C#

I have input file like this:
input.txt
aa@aa.com bb@bb.com "Information" "Hi there"
cc@cc.com dd@dd.com "Follow up" "Interview"
I have used this method:
string[] words = item.Split(' ');
However, it splits at every space. I also have spaces inside the quoted strings, and I don't want to split on those.
Basically I want to parse this input from file to this output:
From = aa@aa.com
To = bb@bb.com
Subject = Information
Body = Hi there
How do I split these strings in C#?
You can simply use a Regex, as suggested in this question:
var stringValue = "aa@aa.com bb@bb.com \"Information\" \"Hi there\"";
var parts = Regex.Matches(stringValue, @"[\""].+?[\""]|[^ ]+")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
// parts:
// aa@aa.com
// bb@bb.com
// "Information"
// "Hi there"
You can also use the Replace function (or Trim('"')) to remove the surrounding " characters.
The String.Split() method has an overload that allows you to specify the number of splits required. You can get what you want like this:
Read one line at a time
Call input.Split(new string[] { " " }, 3, StringSplitOptions.None) - this returns an array of strings with 3 parts. Since email addresses don't have spaces in them, the first two strings will be the from/to addresses, and the third string will be the subject and message. Assume the result of this call is stored in firstSplit[], then firstSplit[0] is the from address, firstSplit[1] is the to address, and firstSplit[2] is the subject and message combined.
Call firstSplit[2].Split(new string[] { @""" """ }, 2, StringSplitOptions.None) - this searches for the string " " in the concatenated subject+message from the previous call, which should pinpoint the separator between the end of the subject and the start of the message. This will give you the subject and message in another array. (The double quotes inside the verbatim string are doubled to escape them.)
This assumes you disallow double quotes in your subject and message. If you do allow double quotes, then you need to ensure you escape them before putting it in the file in the first place.
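The two Split calls described above can be sketched as follows. This is only an illustration under the same assumptions (no spaces in the addresses, no double quotes inside the subject or message); MailLineParser and Parse are hypothetical names, and a malformed line would throw an IndexOutOfRangeException.

```csharp
using System;

public static class MailLineParser
{
    // Hypothetical helper: splits one input line into from, to, subject, body.
    public static string[] Parse(string line)
    {
        // Step 1: at most 3 parts, so everything after the second space stays together.
        string[] firstSplit = line.Split(new[] { " " }, 3, StringSplitOptions.None);

        // Step 2: split the remainder on the quote-space-quote separator.
        string[] rest = firstSplit[2].Split(new[] { "\" \"" }, 2, StringSplitOptions.None);

        // Drop the opening quote of the subject and the closing quote of the body.
        string subject = rest[0].TrimStart('"');
        string body = rest[1].TrimEnd('"');
        return new[] { firstSplit[0], firstSplit[1], subject, body };
    }
}

public static class Program
{
    public static void Main()
    {
        string[] parts = MailLineParser.Parse("aa@aa.com bb@bb.com \"Information\" \"Hi there\"");
        Console.WriteLine("From = " + parts[0]);
        Console.WriteLine("To = " + parts[1]);
        Console.WriteLine("Subject = " + parts[2]);
        Console.WriteLine("Body = " + parts[3]);
    }
}
```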
You can do this without regex by using IndexOf and Substring; just put it in a loop if you have multiple emails to parse.
It's not pretty, but it would be faster than Regex if you're doing a lot of them.
string content = @"aa@aa.com bb@bb.com ""Information"" ""Hi there""";
// The first address ends at the first space.
int firstSpace = content.IndexOf(" ", StringComparison.Ordinal);
string firstEmail = content.Substring(0, firstSpace);
// The second address sits between the first and second space.
int secondSpace = content.IndexOf(" ", firstSpace + 1, StringComparison.Ordinal);
string secondEmail = content.Substring(firstSpace + 1, secondSpace - firstSpace - 1);
// Everything from the first quote onwards is the subject and message.
int firstQuote = content.IndexOf("\"", StringComparison.Ordinal);
string subjectandMessage = content.Substring(firstQuote);
String[] words = subjectandMessage.Split(new string[] { "\" \"" }, StringSplitOptions.None);
Console.WriteLine(firstEmail);
Console.WriteLine(secondEmail);
Console.WriteLine(words[0].Remove(0,1));
Console.WriteLine(words[1].Remove(words[1].Length -1));
Output:
aa@aa.com
bb@bb.com
Information
Hi there
As Spencer pointed out, read this file line by line using the File.ReadAllLines() method and then apply the String.Split() method to each line, splitting on whitespace, using something like this:
string[] elements = line.Split(new char[0]);
UPDATE
Not a pretty solution, but this is how I think it can work:
string[] readText = line.Split(' '); // line: one entry from File.ReadAllLines()
//Take value of first 3 fields by simple readText[index]; (index: 0-2)
string temp = "";
for(int i=3; i<readText.Length; i++)
{
temp += readText[i];
}
Requires reference to Microsoft.VisualBasic, but a bit more reliable than Regex:
using (var tfp = new Microsoft.VisualBasic.FileIO.TextFieldParser("input.txt")) {
for (tfp.SetDelimiters(" "); !tfp.EndOfData;) {
string[] fields = tfp.ReadFields();
Debug.Print(string.Join(",", fields)); // "aa@aa.com,bb@bb.com,Information,Hi there"
}
}

How do I remove the newlines from this string?

I've spent many, many hours looking for an answer. If you look up "remove new lines" on this site, it gives answers that look like they would work, but I can't get them to work.
string TextFileBlock = File.ReadAllText("TextFile.txt");
char newlinechar = '\n' ;
TextFileBlock = TextFileBlock.Replace(" ", String.Empty); //works for the spaces
TextFileBlock = TextFileBlock.Replace(newlinechar.ToString(), String.Empty);
///Does not get rid of the newlines. The enter key.
// TextFileBlock = TextFileBlock.Replace("\n", String.Empty);// not works
//TextFileBlock.Replace(Environment.NewLine, string.Empty);//not works
This should take care of it:
public static string RemoveNewLines(this string input)
{
return input.Replace("\r\n", string.Empty)
.Replace("\n", string.Empty)
.Replace("\r", string.Empty);
}
Assuming the file is reasonably small, you can replace the entire segment with:
string[] lines = File.ReadAllLines("TextFile.txt");
string TextFileBlock = String.Concat(lines).Replace(" ", "");
The ReadAllLines method returns an array of lines, where a line is terminated by a carriage return (\r), line feed (\n), or CRLF (\r\n). Individual elements in the array do not include the terminating carriage return or line feed, and are thus not included in the final string when the array elements are concatenated.

C# Replacing Multiple Spaces with 1 space leaving special characters intact

Having a bit of a problem as I have to translate a string into a table. I'd like to remove multiple spaces, but not all of them. So the data in text comes back with lots of spaces in between like so:
SESSIONNAME USERNAME ID STATE TYPE DEVICE\r\n
services 0 Disc \r\n
console 1 Conn \r\n
alinav 2 Disc \r\n
rdp-tcp 65536 Listen \r\n
I would like to keep the \r\n values that define my rows, I want to keep the empty values that are legitimate under some columns, and I want to keep a space to separate the columns. But I want to remove the extra spaces so they are not fed into the values.
I've tried:
output = Regex.Replace(output, @"\s{2,}", " ", RegexOptions.Multiline);
output = output.Replace("  ", " ");
But the first one just removes everything (things I need and don't need). And the second one still leaves too many spaces.
Thanks.
You can do two things:
Use a literal space in the regular expression. \s also matches other whitespace characters (\n, \r, \t, ...), which is why your row separators disappeared:
output = Regex.Replace(output, @" +", " ", RegexOptions.Multiline);
Or apply the second method until convergence:
string s2 = output;
do {
output = s2;
s2 = s2.Replace("  "," ");
} while(output != s2);
In most cases the first method will outperform the second one, because it replaces each run of spaces in a single pass. Regexes are in general a bit slower than simple string replacement, but if the string contains sequences with many spaces, the first method will be faster.
In your example the data is delimited by position, not by characters; is that correct? If so, you should extract by position; something like:
foreach (string s in output.Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries))
{
var sessionName = s.Substring(0, 18).Trim();
var userName = s.Substring(18, 19).Trim();
var id = Int32.Parse(s.Substring(37, 8).Trim());
var whateverType = s.Substring(45, 12).Trim();
var device = s.Substring(57, 6).Trim();
}
Of course you need to do proper error checking, and should probably put the field widths in an array and calculate positions instead of hard-coding them as I have shown.
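The suggestion to keep the field widths in an array and compute the positions could look roughly like this. It is a sketch only: FixedWidth.SplitFields is a hypothetical helper, and the widths in Main are simply the ones hard-coded in the snippet above, which may not match your real column layout.

```csharp
using System;

public static class FixedWidth
{
    // Hypothetical helper: cuts a line into fields of the given widths,
    // computing each start position from the widths before it.
    public static string[] SplitFields(string line, int[] widths)
    {
        var fields = new string[widths.Length];
        int start = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            // Clamp so short lines yield empty or truncated fields
            // instead of throwing ArgumentOutOfRangeException.
            int from = Math.Min(start, line.Length);
            int length = Math.Min(widths[i], line.Length - from);
            fields[i] = line.Substring(from, length).Trim();
            start += widths[i];
        }
        return fields;
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumed widths, taken from the hard-coded Substring calls above.
        int[] widths = { 18, 19, 8, 12, 6 };
        string line = " console".PadRight(37) + "1".PadRight(8) + "Conn";
        Console.WriteLine(string.Join("|", FixedWidth.SplitFields(line, widths)));
    }
}
```

Empty columns come back as empty strings, so a value like the missing user name keeps its place in the row.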

How do I find and remove any rule or newline in an output? [duplicate]

How can I replace Line Breaks within a string in C#?
Use replace with Environment.NewLine
myString = myString.Replace(System.Environment.NewLine, "replacement text");
As mentioned in other posts, if the string comes from another environment (OS) then you'd need to replace that particular environment's implementation of new line control characters.
The solutions posted so far either only replace Environment.NewLine or they fail if the replacement string contains line breaks because they call string.Replace multiple times.
Here's a solution that uses a regular expression to make all three replacements in just one pass over the string. This means that the replacement string can safely contain line breaks.
string result = Regex.Replace(input, #"\r\n?|\n", replacementString);
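To see why the single pass matters, compare it with chained Replace calls when the replacement itself contains a line break. This is a small sketch with made-up names (LineBreakReplacer, SinglePass, Chained); it uses a MatchEvaluator so the replacement text is inserted literally, which is a choice of this example rather than part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakReplacer
{
    // Single pass: each "\r\n", lone "\r" or lone "\n" is replaced exactly once.
    // The lambda returns the replacement as-is, so "$" in it is not interpreted.
    public static string SinglePass(string input, string replacement)
    {
        return Regex.Replace(input, @"\r\n?|\n", m => replacement);
    }

    // Chained calls: later calls also see line breaks inserted by earlier ones.
    public static string Chained(string input, string replacement)
    {
        return input.Replace("\r\n", replacement)
                    .Replace("\n", replacement)
                    .Replace("\r", replacement);
    }
}

public static class Program
{
    public static void Main()
    {
        string input = "a\r\nb";
        string replacement = "<br/>\n";
        // Single pass gives "a<br/>\nb".
        Console.WriteLine(LineBreakReplacer.SinglePass(input, replacement));
        // Chained gives "a<br/><br/>\nb": the "\n" inside the first
        // replacement is replaced again by the second call.
        Console.WriteLine(LineBreakReplacer.Chained(input, replacement));
    }
}
```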
To extend The.Anyi.9's answer, you should also be aware of the different types of line break in general use. Depending on where your file originated, you may want to make sure you catch all the alternatives...
string replaceWith = "";
string removedBreaks = Line.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
should get you going...
I would use Environment.NewLine when I wanted to insert a newline into a string, but not to remove all newlines from a string.
Depending on your platform you can have different types of newlines, but even inside the same platform often different types of newlines are used. In particular when dealing with file formats and protocols.
string ReplaceNewlines(string blockOfText, string replaceWith)
{
return blockOfText.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
}
If your code is supposed to run in different environments, I would consider using the Environment.NewLine constant, since it is specifically the newline used in the specific environment.
line = line.Replace(Environment.NewLine, "newLineReplacement");
However, if you get the text from a file originating on another system, this might not be the correct answer, and you should replace with whatever newline constant is used on the other system. It will typically be \n or \r\n.
If you want to "clean" the new lines, flamebaud's comment suggesting the regex @"[\r\n]+" is the best choice.
using System;
using System.Text.RegularExpressions;
class MainClass {
public static void Main (string[] args) {
string str = "AAA\r\nBBB\r\n\r\n\r\nCCC\r\r\rDDD\n\n\nEEE";
Console.WriteLine (str.Replace(System.Environment.NewLine, "-"));
/* Result:
AAA
-BBB
-
-
-CCC
DDD---EEE
*/
Console.WriteLine (Regex.Replace(str, @"\r\n?|\n", "-"));
// Result:
// AAA-BBB---CCC---DDD---EEE
Console.WriteLine (Regex.Replace(str, @"[\r\n]+", "-"));
// Result:
// AAA-BBB-CCC-DDD-EEE
}
}
Use the ReplaceLineEndings method, new in .NET 6:
myString = myString.ReplaceLineEndings();
It replaces ALL newline sequences in the current string with Environment.NewLine.
Documentation:
ReplaceLineEndings
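A short sketch of both overloads, assuming a .NET 6 or later target. The overload that takes a string replaces every line ending with that text instead of Environment.NewLine.

```csharp
using System;

public static class Program
{
    public static void Main()
    {
        string mixed = "AAA\r\nBBB\rCCC\nDDD";

        // No argument: every line ending becomes Environment.NewLine.
        string normalized = mixed.ReplaceLineEndings();
        Console.WriteLine(normalized);

        // With an argument: every line ending becomes the given text.
        // "\r\n" is treated as one line ending, not two.
        string dashed = mixed.ReplaceLineEndings("-");
        Console.WriteLine(dashed); // AAA-BBB-CCC-DDD

        // Passing an empty string removes the line endings.
        Console.WriteLine(mixed.ReplaceLineEndings("")); // AAABBBCCCDDD
    }
}
```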
Don't forget that Replace doesn't change the string in place; it returns a new string with the characters replaced. The following will remove line breaks (not replace them). I'd use @Brian R. Bondy's method if replacing them with something else, perhaps wrapped as an extension method. Remember to check for null values before calling Replace or the extension methods provided.
string line = ...
line = line.Replace( "\r", "").Replace( "\n", "" );
As extension methods:
public static class StringExtensions
{
public static string RemoveLineBreaks( this string lines )
{
return lines.Replace( "\r", "").Replace( "\n", "" );
}
public static string ReplaceLineBreaks( this string lines, string replacement )
{
return lines.Replace( "\r\n", replacement )
.Replace( "\r", replacement )
.Replace( "\n", replacement );
}
}
To make sure all kinds of line breaks (Windows, Mac and Unix) are replaced, you should use:
myString.Replace("\r\n", "\n").Replace("\r", "\n").Replace("\n", "replacement");
and in this order, so that a combination of line-ending characters does not produce extra line breaks.
Why not both?
string ReplacementString = "";
Regex.Replace(strin.Replace(System.Environment.NewLine, ReplacementString), @"(\r\n?|\n)", ReplacementString);
Note: Replace strin with the name of your input string.
I needed to replace the \r\n with an actual carriage return and line feed and replace \t with an actual tab. So I came up with the following:
public string Transform(string data)
{
string result = data;
char cr = (char)13;
char lf = (char)10;
char tab = (char)9;
result = result.Replace("\\r", cr.ToString());
result = result.Replace("\\n", lf.ToString());
result = result.Replace("\\t", tab.ToString());
return result;
}
var answer = Regex.Replace(value, "(\n|\r)+", replacementString);
As new line can be delimited by \n, \r and \r\n, first we’ll replace \r and \r\n with \n, and only then split data string.
The following lines should go to the parseCSV method:
function parseCSV(data) {
//alert(data);
//replace Windows new lines (\r\n)
data = data.replace(/\r\n/g, "\n");
//replace old Mac new lines (\r)
data = data.replace(/\r/g, "\n");
//split into rows
var rows = data.split("\n");
return rows;
}
Use the .Replace() method
Line.Replace("\n", "whatever you want to replace with");
The best way to replace line breaks safely is
yourString.Replace("\r\n","\n") //handling windows linebreaks
.Replace("\r","\n") //handling mac linebreaks
That should produce a string with only \n (i.e. line feed) as line breaks.
This code is useful for fixing mixed line breaks too.
Another option is to create a StringReader over the string in question. On the reader, do .ReadLine() in a loop. Then you have the lines separated, no matter what (consistent or inconsistent) separators they had. With that, you can proceed as you wish; one possibility is to use a StringBuilder and call .AppendLine on it.
The advantage is, you let the framework decide what constitutes a "line break".
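The StringReader approach described above can be sketched as follows. LineNormalizer.Normalize is a hypothetical name, and writing each line back with AppendLine is just one of the possibilities mentioned; note that the result always ends with Environment.NewLine, even if the input did not end with a line break.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineNormalizer
{
    // Hypothetical helper: rewrites text so every line ends with Environment.NewLine.
    public static string Normalize(string text)
    {
        var builder = new StringBuilder();
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine treats "\r", "\n" and "\r\n" as line terminators and
            // returns the line without them; null marks the end of the text.
            while ((line = reader.ReadLine()) != null)
            {
                builder.AppendLine(line);
            }
        }
        return builder.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.Write(LineNormalizer.Normalize("a\rb\nc\r\nd"));
    }
}
```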
string s = Regex.Replace(source_string, "\n", "\r\n");
or
string s = Regex.Replace(source_string, "\r\n", "\n");
depending on which way you want to go.
Hope it helps.
If you want to replace only the newlines:
// Verbatim string: here \r and \n are literal backslash sequences, not control characters.
var input = @"sdfhlu \r\n sdkuidfs\r\ndfgdgfd";
var match = @"[\\ ]+";
var replaceWith = " ";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input.Replace(@"\n", replaceWith).Replace(@"\r", replaceWith), match, replaceWith);
Console.WriteLine("output: " + x);
If you want to replace newlines, tabs and white spaces:
var input = "sdfhlusdkuidfs\r\ndfgdgfd";
var match = @"\s+";
var replaceWith = "";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input, match, replaceWith);
Console.WriteLine("output: " + x);
This is a very long-winded one-liner, but it is the only solution I have found to work if you cannot use the special character escapes like "\r" and "\n" and \x0d and \u000D, or System.Environment.NewLine, as parameters to the Replace() method:
MyStr.Replace( System.String.Concat( System.Char.ConvertFromUtf32(13).ToString(), System.Char.ConvertFromUtf32(10).ToString() ), ReplacementString );
This is somewhat off-topic, but to get it to work inside Visual Studio's XML .props files, which invoke .NET via the XML properties, I had to dress it up as shown below.
The Visual Studio XML --> .NET environment just would not accept the special character escapes like "\r" and "\n" and \x0d and \u000D, or System.Environment.NewLine, as parameters to the replace() method.
$([System.IO.File]::ReadAllText('MyFile.txt').replace( $([System.String]::Concat($([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString()))),$([System.String]::Concat('^',$([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString())))))
Based on @mark-bayers's answer, and for cleaner output:
string result = Regex.Replace(ex.Message, @"(\r\n?|\r?\n)+", "replacement text");
It removes \r\n, \n and \r, preferring the longest match, and collapses multiple consecutive occurrences into one.
