I have a block of text and I want to get its lines without losing the \r and \n at the end. Right now, I have the following (suboptimal code):
string[] lines = tbIn.Text.Split('\n')
.Select(t => t.Replace("\r", "\r\n")).ToArray();
So I'm wondering - is there a better way to do it?
Accepted answer
string[] lines = Regex.Split(tbIn.Text, #"(?<=\r\n)(?!$)");
The following seems to do the job:
string[] lines = Regex.Split(tbIn.Text, #"(?<=\r\n)(?!$)");
(?<=\r\n) uses 'positive lookbehind' to match after \r\n without consuming it.
(?!$) uses negative lookahead to prevent matching at the end of the input and so avoids a final line that is just an empty string.
Something along the lines of using this regular expression:
[^\n\r]*\r\n
Then use Regex.Matches().
The problem is you need Group(1) out of each match and create your string list from that. In Python you'd just use the map() function. Not sure the best way to do it in .NET, you take it from there ;-)
Dmitri, your solution is actually pretty compact and straightforward. The only thing more efficient would be to keep the string-splitting characters in the generated array, but the APIs simply don't allow for that. As a result, every solution will require iterating over the array and performing some kind of modification (which in C# means allocating new strings every time). I think the best you can hope for is to not re-create the array:
string[] lines = tbIn.Text.Split('\n');
for (int i = 0; i < lines.Length; ++i)
{
lines[i] = lines[i].Replace("\r", "\r\n");
}
... but as you can see that looks a lot more cumbersome! If performance matters, this may be a bit better. If it really matters, you should consider manually parsing the string by using IndexOf() to find the '\r's one at a time, and then create the array yourself. This is significantly more code, though, and probably not necessary.
One of the side effects of both your solution and this one is that you won't get a terminating "\r\n" on the last line if there wasn't one already there in the TextBox. Is this what you expect? What about blank lines... do you expect them to show up in 'lines'?
If you are just going to replace the newline (\n) then do something like this:
string[] lines = tbIn.Text.Split('\n')
.Select(t => t + "\r\n").ToArray();
Edit: Regex.Replace allows you to split on a string.
string[] lines = Regex.Split(tbIn.Text, "\r\n")
.Select(t => t + "\r\n").ToArray();
As always, extension method goodies :)
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
usage:
string text = "One,Two,Three,Four";
foreach (var s in text.SplitAndKeep(","))
{
Console.WriteLine(s);
}
Output:
One,
Two,
Three,
Four
You can achieve this with a regular expression. Here's an extension method with it:
public static string[] SplitAndKeepDelimiter(this string input, string delimiter)
{
MatchCollection matches = Regex.Matches(input, #"[^" + delimiter + "]+(" + delimiter + "|$)", RegexOptions.Multiline);
string[] result = new string[matches.Count];
for (int i = 0; i < matches.Count ; i++)
{
result[i] = matches[i].Value;
}
return result;
}
I'm not sure if this is a better solution. Yours is very compact and simple.
Related
I need to split a string into newlines in .NET and the only way I know of to split strings is with the Split method. However that will not allow me to (easily) split on a newline, so what is the best way to do it?
To split on a string you need to use the overload that takes an array of strings:
string[] lines = theText.Split(
new string[] { Environment.NewLine },
StringSplitOptions.None
);
Edit:
If you want to handle different types of line breaks in a text, you can use the ability to match more than one string. This will correctly split on either type of line break, and preserve empty lines and spacing in the text:
string[] lines = theText.Split(
new string[] { "\r\n", "\r", "\n" },
StringSplitOptions.None
);
What about using a StringReader?
using (System.IO.StringReader reader = new System.IO.StringReader(input)) {
string line = reader.ReadLine();
}
Try to avoid using string.Split for a general solution, because you'll use more memory everywhere you use the function -- the original string, and the split copy, both in memory. Trust me that this can be one hell of a problem when you start to scale -- run a 32-bit batch-processing app processing 100MB documents, and you'll crap out at eight concurrent threads. Not that I've been there before...
Instead, use an iterator like this;
public static IEnumerable<string> SplitToLines(this string input)
{
if (input == null)
{
yield break;
}
using (System.IO.StringReader reader = new System.IO.StringReader(input))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
This will allow you to do a more memory efficient loop around your data;
foreach(var line in document.SplitToLines())
{
// one line at a time...
}
Of course, if you want it all in memory, you can do this;
var allTheLines = document.SplitToLines().ToArray();
You should be able to split your string pretty easily, like so:
aString.Split(Environment.NewLine.ToCharArray());
Based on Guffa's answer, in an extension class, use:
public static string[] Lines(this string source) {
return source.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
}
Regex is also an option:
private string[] SplitStringByLineFeed(string inpString)
{
string[] locResult = Regex.Split(inpString, "[\r\n]+");
return locResult;
}
For a string variable s:
s.Split(new string[]{Environment.NewLine},StringSplitOptions.None)
This uses your environment's definition of line endings. On Windows, line endings are CR-LF (carriage return, line feed) or in C#'s escape characters \r\n.
This is a reliable solution, because if you recombine the lines with String.Join, this equals your original string:
var lines = s.Split(new string[]{Environment.NewLine},StringSplitOptions.None);
var reconstituted = String.Join(Environment.NewLine,lines);
Debug.Assert(s==reconstituted);
What not to do:
Use StringSplitOptions.RemoveEmptyEntries, because this will break markup such as Markdown where empty lines have syntactic purpose.
Split on separator new char[]{Environment.NewLine}, because on Windows this will create one empty string element for each new line.
I just thought I would add my two-bits, because the other solutions on this question do not fall into the reusable code classification and are not convenient.
The following block of code extends the string object so that it is available as a natural method when working with strings.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections;
using System.Collections.ObjectModel;
namespace System
{
public static class StringExtensions
{
public static string[] Split(this string s, string delimiter, StringSplitOptions options = StringSplitOptions.None)
{
return s.Split(new string[] { delimiter }, options);
}
}
}
You can now use the .Split() function from any string as follows:
string[] result;
// Pass a string, and the delimiter
result = string.Split("My simple string", " ");
// Split an existing string by delimiter only
string foo = "my - string - i - want - split";
result = foo.Split("-");
// You can even pass the split options parameter. When omitted it is
// set to StringSplitOptions.None
result = foo.Split("-", StringSplitOptions.RemoveEmptyEntries);
To split on a newline character, simply pass "\n" or "\r\n" as the delimiter parameter.
Comment: It would be nice if Microsoft implemented this overload.
Starting with .NET 6 we can use the new String.ReplaceLineEndings() method to canonicalize cross-platform line endings, so these days I find this to be the simplest way:
var lines = input
.ReplaceLineEndings()
.Split(Environment.NewLine, StringSplitOptions.None);
I'm currently using this function (based on other answers) in VB.NET:
Private Shared Function SplitLines(text As String) As String()
Return text.Split({Environment.NewLine, vbCrLf, vbLf}, StringSplitOptions.None)
End Function
It tries to split on the platform-local newline first, and then falls back to each possible newline.
I've only needed this inside one class so far. If that changes, I will probably make this Public and move it to a utility class, and maybe even make it an extension method.
Here's how to join the lines back up, for good measure:
Private Shared Function JoinLines(lines As IEnumerable(Of String)) As String
Return String.Join(Environment.NewLine, lines)
End Function
Well, actually split should do:
//Constructing string...
StringBuilder sb = new StringBuilder();
sb.AppendLine("first line");
sb.AppendLine("second line");
sb.AppendLine("third line");
string s = sb.ToString();
Console.WriteLine(s);
//Splitting multiline string into separate lines
string[] splitted = s.Split(new string[] {System.Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
// Output (separate lines)
for( int i = 0; i < splitted.Count(); i++ )
{
Console.WriteLine("{0}: {1}", i, splitted[i]);
}
string[] lines = text.Split(
Environment.NewLine.ToCharArray(),
StringSplitOptions.RemoveEmptyStrings);
The RemoveEmptyStrings option will make sure you don't have empty entries due to \n following a \r
(Edit to reflect comments:) Note that it will also discard genuine empty lines in the text. This is usually what I want but it might not be your requirement.
I did not know about Environment.Newline, but I guess this is a very good solution.
My try would have been:
string str = "Test Me\r\nTest Me\nTest Me";
var splitted = str.Split('\n').Select(s => s.Trim()).ToArray();
The additional .Trim removes any \r or \n that might be still present (e. g. when on windows but splitting a string with os x newline characters). Probably not the fastest method though.
EDIT:
As the comments correctly pointed out, this also removes any whitespace at the start of the line or before the new line feed. If you need to preserve that whitespace, use one of the other options.
Examples here are great and helped me with a current "challenge" to split RSA-keys to be presented in a more readable way. Based on Steve Coopers solution:
string Splitstring(string txt, int n = 120, string AddBefore = "", string AddAfterExtra = "")
{
//Spit each string into a n-line length list of strings
var Lines = Enumerable.Range(0, txt.Length / n).Select(i => txt.Substring(i * n, n)).ToList();
//Check if there are any characters left after split, if so add the rest
if(txt.Length > ((txt.Length / n)*n) )
Lines.Add(txt.Substring((txt.Length/n)*n));
//Create return text, with extras
string txtReturn = "";
foreach (string Line in Lines)
txtReturn += AddBefore + Line + AddAfterExtra + Environment.NewLine;
return txtReturn;
}
Presenting a RSA-key with 33 chars width and quotes are then simply
Console.WriteLine(Splitstring(RSAPubKey, 33, "\"", "\""));
Output:
Hopefully someone find it usefull...
Silly answer: write to a temporary file so you can use the venerable
File.ReadLines
var s = "Hello\r\nWorld";
var path = Path.GetTempFileName();
using (var writer = new StreamWriter(path))
{
writer.Write(s);
}
var lines = File.ReadLines(path);
using System.IO;
string textToSplit;
if (textToSplit != null)
{
List<string> lines = new List<string>();
using (StringReader reader = new StringReader(textToSplit))
{
for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
{
lines.Add(line);
}
}
}
Very easy, actually.
VB.NET:
Private Function SplitOnNewLine(input as String) As String
Return input.Split(Environment.NewLine)
End Function
C#:
string splitOnNewLine(string input)
{
return input.split(environment.newline);
}
I breezed through the documentation for the string class and didn't see any good tools for combining an arbitrary number of strings into a single string. The best procedure I could come up with in my program is
string [] assetUrlPieces = { Server.MapPath("~/assets/"),
"organizationName/",
"categoryName/",
(Guid.NewGuid().ToString() + "/"),
(Path.GetFileNameWithoutExtension(file.FileName) + "/")
};
string assetUrl = combinedString(assetUrlPieces);
private string combinedString ( string [] pieces )
{
string alltogether = "";
foreach (string thispiece in pieces) alltogether += alltogether + thispiece;
return alltogether;
}
but that seems like too much code and too much inefficiency (from the string addition) and awkwardness.
If you want to insert a separator between values, string.Join is your friend. If you just want to concatenate the strings, then you can use string.Concat:
string assetUrl = string.Concat(assetUrlPieces);
That's marginally simpler (and possibly more efficient, but probably insignificantly) than calling string.Join with an empty separator.
As noted in comments, if you're actually building up the array at the same point in the code that you do the concatenation, and you don't need the array for anything else, just use concatenation directly:
string assetUrl = Server.MapPath("~/assets/") +
"organizationName/" +
"categoryName/" +
Guid.NewGuid() + "/" +
Path.GetFileNameWithoutExtension(file.FileName) + "/";
... or potentially use string.Format instead.
I prefer using string.Join:
var result = string.Join("", pieces);
You can read about string.Join on MSDN
You want a StringBuilder, I think.
var sb = new StringBuilder(pieces.Count());
foreach(var s in pieces) {
sb.Append(s);
}
return sb.ToString();
Update
#FiredFromAmazon.com: I think you'll want to go with the string.Concat solution offered by others for
Its sheer simplicity
Higher performance. Under the hood, it uses FillStringChecked, which does pointer copies, whereas string.Join uses StringBuilder. See http://referencesource.microsoft.com/#mscorlib/system/string.cs,1512. (Thank you to #Bas).
string.Concat is the most appropriate method for what you want.
var result = string.Concat(pieces);
Unless you want to put delimiters between the individual strings. Then you'd use string.Join
var result = string.Join(",", pieces); // comma delimited result.
A simple way to do this with a regular for loop:
(since you can use the indices, plus I like these loops better than foreach loops)
private string combinedString(string[] pieces)
{
string alltogether = "";
for (int index = 0; index <= pieces.Length - 1; index++) {
if (index != pieces.Length - 1) {
alltogether += string.Format("{0}/" pieces[index]);
}
}
return alltogether;
I have a string that looks like this
2,"E2002084700801601390870F"
3,"E2002084700801601390870F"
1,"E2002084700801601390870F"
4,"E2002084700801601390870F"
3,"E2002084700801601390870F"
This is one whole string, you can imagine it being on one row.
And I want to split this in the way they stand right now like this
2,"E2002084700801601390870F"
I cannot change the way it is formatted. So my best bet is to split at every second quotation mark. But I haven't found any good ways to do this. I've tried this https://stackoverflow.com/a/17892392/2914876 But I only get an error about invalid arguements.
Another issue is that this project is running .NET 2.0 so most LINQ functions aren't available.
Thank you.
Try this
var regEx = new Regex(#"\d+\,"".*?""");
var lines = regex.Matches(txt).OfType<Match>().Select(m => m.Value).ToArray();
Use foreach instead of LINQ Select on .Net 2
Regex regEx = new Regex(#"\d+\,"".*?""");
foreach(Match m in regex.Matches(txt))
{
var curLine = m.Value;
}
I see three possibilities, none of them are particularly exciting.
As #dvnrrs suggests, if there's no comma where you have line-breaks, you should be in great shape. Replace ," with something novel. Replace the remaining "s with what you need. Replace the "something novel" with ," to restore them. This is probably the most solid--it solves the problem without much room for bugs.
Iterate through the string looking for the index of the next " from the previous index, and maintain a state machine to decide whether to manipulate it or not.
Split the string on "s and rejoin them in whatever way works the best for your application.
I realize regular expressions will handle this but here's a pure 2.0 way to handle as well. It's much more readable and maintainable in my humble opinion.
using System;
using System.Collections.Generic;
namespace ConsoleApplication1
{
internal class Program
{
private static void Main(string[] args)
{
const string data = #"2,""E2002084700801601390870F""3,""E2002084700801601390870F""1,""E2002084700801601390870F""4,""E2002084700801601390870F""3,""E2002084700801601390870F""";
var parsedData = ParseData(data);
foreach (var parsedDatum in parsedData)
{
Console.WriteLine(parsedDatum);
}
Console.ReadLine();
}
private static IEnumerable<string> ParseData(string data)
{
var results = new List<string>();
var split = data.Split(new [] {'"'}, StringSplitOptions.RemoveEmptyEntries);
if (split.Length % 2 != 0)
{
throw new Exception("Data Formatting Error");
}
for (var index = 0; index < split.Length / 2; index += 2)
{
results.Add(string.Format(#"""{0}""{1}""", split[index], split[index + 1]));
}
return results;
}
}
}
I have a StringBuilder instance where I am doing numerous sb.AppendLine("test"); for example.
How do I work out how many lines I have?
I see the class has .Length but that tells me how many characters in all.
Any ideas?
Sorted by efficiency:
Counting your AppendLine() calls
Calling IndexOf() in a loop
Using Regex
Using String.Split()
The last one is extraordinary expensive and generates lots of garbage, don't use.
You could wrap StringBuilder with your own class that would keep a count of lines as they are added or could the number of '\n' after your builder is full.
Regex.Matches(builder.ToString(), Environment.NewLine).Count
You can create a wrapper class do the following:
public class Wrapper
{
private StringBuilder strBuild = null;
private int count = 0;
public Wrapper(){
strBuild = new StringBuilder();
}
public void AppendLine(String toAppendParam){
strBuild.AppendLine(toAppendParam);
count++;
}
public StringBuilder getStringBuilder(){
return strBuild;
}
public int getCount(){
return count;
}
}
Try this:
sb.ToString().Split(System.Environment.NewLine.ToCharArray()).Length;
You should be able to search for the number of occurences of \n in the string.
UPDATE:
One way could be to split on the newline character and count the number of elements in the array as follows:
sb.ToString().Split('\n').length;
If you're going to use String.Split(), you will need to split the string with some options. Like this:
static void Main(string[] args)
{
var sb = new StringBuilder();
sb.AppendLine("this");
sb.AppendLine("is");
sb.AppendLine("a");
sb.AppendLine("test");
// StringSplitOptions.None counts the last (blank) newline
// which the last AppendLine call creates
// if you don't want this, then replace with
// StringSplitOptions.RemoveEmptyEntries
var lines = sb.ToString().Split(
new string[] {
System.Environment.NewLine },
StringSplitOptions.None).Length;
Console.WriteLine("Number of lines: " + lines);
Console.WriteLine("Press enter to exit.");
Console.ReadLine();
}
This results in:
Number of lines: 5
UPDATE What Gabe said
b.ToString().Count(c => c =='\n') would work here too, and might not
be much less efficient (aside from creating a separate copy of the
string!).
A better way, faster than creating a string from the StringBuilder and splitting it (or creating the string and regexing it), is to look into the StringBuilder and count the number of '\n' characters there in.
The following extension method will enumerate through the characters in the string builder, you can then linq on it until to your heart is content.
public static IEnumerable<char> GetEnumerator(this StringBuilder sb)
{
for (int i = 0; i < sb.Length; i++)
yield return sb[i];
}
... used here, count will be 4
StringBuilder b = new StringBuilder();
b.AppendLine("Hello\n");
b.AppendLine("World\n");
int lineCount = b.GetEnumerator().Count(c => c =='\n');
Derive your own line counting StringBuilder where AppendLine ups an internal line count and provides a method to get the value of line count.
Do a regex to count the number of line terminators (ex: \r\n) in the string. Or, load the strings into a text box and do a line count but thats the hack-ey way of doing it
You can split string bulider data into String[] array and then use String[].Length for number of lines.
something like as below:
String[] linestext = sb.Split(newline)
Console.Writeline(linetext.Length)
I have a file, the text format is like this:
.640 .070 -.390 -.740 -1.030 -1.410 -1.780 -1.840
-1.360 -.360 .860 1.880 2.340 2.250 1.950 1.710
1.410 .700 -.300 -.840 -.280 1.020 1.860 1.460
.310 -.460 -.320 .350 1.020 1.650 2.430 3.070
2.840 1.440 -.460 -1.650 -1.520 -.520 .250 .190
-.420 -.870 -.800 -.280 .570 1.660 2.500 2.220
.520 -1.560 -2.530 -2.030 -1.200 -1.060 -1.230 -.600
.990 2.300 2.180 .940 -.090 -.140 .320 .470
.330 .420 .830 1.080 1.090 1.530 2.740 3.800
3.410 1.610 -.150 -.900 -1.120 -1.640 -2.140 -1.590
.210 2.210 3.290 3.170 2.380 1.880 2.530 4.210
5.280 3.820 -.040 -3.670 -4.190 -1.260 2.930 5.740
5.980 3.920 .540 -2.890 -5.010 -4.780 -2.150 1.640
4.670 5.540 4.230 1.950 .120 -.470 -.010 .340
-.710 -2.940 -4.070 -1.810 3.000 6.590 6.140 2.750
-.490 -2.460 -4.180 -5.660 -4.800 -.560 4.510 6.630
5.140 2.860 2.230 2.510 1.670 -.440 -2.030 -2.330
Note that there are a lot of white characters between one value and another.
I tried to read each line, and then split the line according to a ' ' character. My code is something like this:
public List<double> Parse(StreamReader sr)
{
var dataList = new List<double>();
while (sr.Peek() >= 0)
{
string line = sr.ReadLine();
if (lineCount > 1)
{
string[] columns = line.Split(' ');
for (var j = 0; j < columns.Length; j++)
{
dataList.Add(double.Parse(columns[j]) ));
}
}
}
return dataList ;
}
The problem with the above code is that it is only able to handle the case where values are separated by a single white character.
Any idea ?
The simplest way is probably to use an overload of String.Split which includes a StringSplitOptions parameter, and specify StringSplitOptions.RemoveEmptyEntries.
I would also personally just call ReadLine until that returned null, rather than using TextReader.Peek. Aside from anything else, it's more general - it will work even if the underlying stream (if any) doesn't support seeking.
Before you do the split, replace all multi spaces with a single space, something like:
line = System.Text.RegularExpressions.Regex.Replace(line, #" +", #" ");
You may use the simple one line code for this. Let your text is in the string named input.
string[] values = System.Text.RegularExpressions.Regex.Split(input, #"\s+");
You will get all values in a string array simply