My question is quite simple. I need to get all the text lines from a Windows text file.
All lines are separated by \r\n. I use String.Split, but it's not ideal, because
it only splits on a single character, leaving empty strings that I need to remove with the options flag. Is there a better way?
My implementation
string operationtext = GetFromSomeWhere();
// now parsing
string[] lines = operationtext.Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
// ok now I have lines array
UPDATE
File.ReadAllXXX is of no use here because GetFromSomeWhere is actually a Regex, so I have no file after this point.
You can use this overload of String.Split, which takes an array of strings that can serve as delimiters:
string[] lines = operationtext.Split(new[] { Environment.NewLine },
StringSplitOptions.RemoveEmptyEntries);
Of course, if you already have the file-path, it's much simpler to use File.ReadAllLines:
string[] lines = File.ReadAllLines(filePath);
String.Split does accept strings (like "\r\n") as delimiters, not just a char array.
string[] lines = wholetext.Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
You may find it much easier to simply use File.ReadAllLines() or File.ReadLines()
you could use an extension method like the one below and your code would then look like this:
var lines = operationText.ReadAsLines();
Extension method implementation:
public static IEnumerable<string> ReadAsLines(this string text)
{
TextReader reader = new StringReader(text);
while(reader.Peek() >= 0)
{
yield return reader.ReadLine();
}
}
I'm guessing it's not as performant as the split option which is usually very performant but if that's not an issue...
Related
I need to split a string into newlines in .NET and the only way I know of to split strings is with the Split method. However that will not allow me to (easily) split on a newline, so what is the best way to do it?
To split on a string you need to use the overload that takes an array of strings:
string[] lines = theText.Split(
new string[] { Environment.NewLine },
StringSplitOptions.None
);
Edit:
If you want to handle different types of line breaks in a text, you can use the ability to match more than one string. This will correctly split on either type of line break, and preserve empty lines and spacing in the text:
string[] lines = theText.Split(
new string[] { "\r\n", "\r", "\n" },
StringSplitOptions.None
);
What about using a StringReader?
using (System.IO.StringReader reader = new System.IO.StringReader(input)) {
string line = reader.ReadLine();
}
Try to avoid using string.Split for a general solution, because you'll use more memory everywhere you use the function -- the original string, and the split copy, both in memory. Trust me that this can be one hell of a problem when you start to scale -- run a 32-bit batch-processing app processing 100MB documents, and you'll crap out at eight concurrent threads. Not that I've been there before...
Instead, use an iterator like this;
public static IEnumerable<string> SplitToLines(this string input)
{
if (input == null)
{
yield break;
}
using (System.IO.StringReader reader = new System.IO.StringReader(input))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
This will allow you to do a more memory efficient loop around your data;
foreach(var line in document.SplitToLines())
{
// one line at a time...
}
Of course, if you want it all in memory, you can do this;
var allTheLines = document.SplitToLines().ToArray();
You should be able to split your string pretty easily, like so:
aString.Split(Environment.NewLine.ToCharArray());
Based on Guffa's answer, in an extension class, use:
public static string[] Lines(this string source) {
return source.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
}
Regex is also an option:
private string[] SplitStringByLineFeed(string inpString)
{
string[] locResult = Regex.Split(inpString, "[\r\n]+");
return locResult;
}
For a string variable s:
s.Split(new string[]{Environment.NewLine},StringSplitOptions.None)
This uses your environment's definition of line endings. On Windows, line endings are CR-LF (carriage return, line feed) or in C#'s escape characters \r\n.
This is a reliable solution, because if you recombine the lines with String.Join, this equals your original string:
var lines = s.Split(new string[]{Environment.NewLine},StringSplitOptions.None);
var reconstituted = String.Join(Environment.NewLine,lines);
Debug.Assert(s==reconstituted);
What not to do:
Use StringSplitOptions.RemoveEmptyEntries, because this will break markup such as Markdown where empty lines have syntactic purpose.
Split on separator Environment.NewLine.ToCharArray(), because on Windows this will create one empty string element for each new line.
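To make the second point concrete, here is a minimal sketch comparing a split on the individual characters with a split on the two-character string. The sample string is made up for illustration, and "\r\n" is hard-coded instead of Environment.NewLine so the result does not depend on the platform:

```csharp
using System;

// Hypothetical sample text: Windows line endings, with one blank line in the middle.
string text = "one\r\n\r\nthree";

// Each of '\r' and '\n' is treated as its own delimiter, so every "\r\n"
// pair contributes an extra empty element.
string[] byChars = text.Split(new[] { '\r', '\n' }, StringSplitOptions.None);

// Splitting on the two-character string yields exactly one element per line.
string[] byString = text.Split(new[] { "\r\n" }, StringSplitOptions.None);

Console.WriteLine(byChars.Length);   // 5
Console.WriteLine(byString.Length);  // 3
```

The string-based split keeps the genuine blank line as a single empty element, which is what allows the String.Join round trip above to reproduce the input.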
I just thought I would add my two-bits, because the other solutions on this question do not fall into the reusable code classification and are not convenient.
The following block of code extends the string object so that it is available as a natural method when working with strings.
using System;
namespace System
{
public static class StringExtensions
{
public static string[] Split(this string s, string delimiter, StringSplitOptions options = StringSplitOptions.None)
{
return s.Split(new string[] { delimiter }, options);
}
}
}
You can now use the .Split() function from any string as follows:
string[] result;
// Pass a string and the delimiter (an extension method can also be called as a regular static method)
result = StringExtensions.Split("My simple string", " ");
// Split an existing string by delimiter only
string foo = "my - string - i - want - split";
result = foo.Split("-");
// You can even pass the split options parameter. When omitted it is
// set to StringSplitOptions.None
result = foo.Split("-", StringSplitOptions.RemoveEmptyEntries);
To split on a newline character, simply pass "\n" or "\r\n" as the delimiter parameter.
Comment: It would be nice if Microsoft implemented this overload.
Starting with .NET 6 we can use the new String.ReplaceLineEndings() method to canonicalize cross-platform line endings, so these days I find this to be the simplest way:
var lines = input
.ReplaceLineEndings()
.Split(Environment.NewLine, StringSplitOptions.None);
I'm currently using this function (based on other answers) in VB.NET:
Private Shared Function SplitLines(text As String) As String()
Return text.Split({Environment.NewLine, vbCrLf, vbLf}, StringSplitOptions.None)
End Function
It tries to split on the platform-local newline first, and then falls back to each possible newline.
I've only needed this inside one class so far. If that changes, I will probably make this Public and move it to a utility class, and maybe even make it an extension method.
Here's how to join the lines back up, for good measure:
Private Shared Function JoinLines(lines As IEnumerable(Of String)) As String
Return String.Join(Environment.NewLine, lines)
End Function
Well, actually split should do:
//Constructing string...
StringBuilder sb = new StringBuilder();
sb.AppendLine("first line");
sb.AppendLine("second line");
sb.AppendLine("third line");
string s = sb.ToString();
Console.WriteLine(s);
//Splitting multiline string into separate lines
string[] splitted = s.Split(new string[] {System.Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
// Output (separate lines)
for( int i = 0; i < splitted.Length; i++ )
{
Console.WriteLine("{0}: {1}", i, splitted[i]);
}
string[] lines = text.Split(
Environment.NewLine.ToCharArray(),
StringSplitOptions.RemoveEmptyEntries);
The RemoveEmptyEntries option will make sure you don't have empty entries due to \n following a \r
(Edit to reflect comments:) Note that it will also discard genuine empty lines in the text. This is usually what I want but it might not be your requirement.
I did not know about Environment.NewLine, but I guess this is a very good solution.
My try would have been:
string str = "Test Me\r\nTest Me\nTest Me";
var splitted = str.Split('\n').Select(s => s.Trim()).ToArray();
The additional .Trim removes any \r or \n that might still be present (e.g. when on Windows but splitting a string with OS X newline characters). Probably not the fastest method, though.
EDIT:
As the comments correctly pointed out, this also removes any whitespace at the start of the line or before the new line feed. If you need to preserve that whitespace, use one of the other options.
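If the whitespace has to survive, one possible variation on the same idea (a sketch, not the original poster's code) is to swap Trim() for TrimEnd('\r'), which only drops a trailing carriage return. The sample string below is made up and has extra spaces added to show what is preserved:

```csharp
using System;
using System.Linq;

// Hypothetical input mixing "\r\n" and "\n" endings, with leading and trailing spaces.
string str = "  Test Me\r\nTest Me  \nTest Me";

// Split on '\n', then remove only a trailing '\r' left over from "\r\n".
string[] lines = str.Split('\n').Select(s => s.TrimEnd('\r')).ToArray();

foreach (string line in lines)
{
    // Brackets make the preserved spaces visible.
    Console.WriteLine("[" + line + "]");
}
```

Note that this does not handle a lone "\r" used as a line ending.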
The examples here are great and helped me with a current "challenge": splitting RSA keys so they can be presented in a more readable way. Based on Steve Cooper's solution:
string Splitstring(string txt, int n = 120, string AddBefore = "", string AddAfterExtra = "")
{
//Split the string into a list of chunks of n characters each
var Lines = Enumerable.Range(0, txt.Length / n).Select(i => txt.Substring(i * n, n)).ToList();
//Check if there are any characters left after split, if so add the rest
if(txt.Length > ((txt.Length / n)*n) )
Lines.Add(txt.Substring((txt.Length/n)*n));
//Create return text, with extras
string txtReturn = "";
foreach (string Line in Lines)
txtReturn += AddBefore + Line + AddAfterExtra + Environment.NewLine;
return txtReturn;
}
Presenting an RSA key at a width of 33 characters, wrapped in quotes, is then simply:
Console.WriteLine(Splitstring(RSAPubKey, 33, "\"", "\""));
Output:
Hopefully someone finds it useful...
Silly answer: write to a temporary file so you can use the venerable
File.ReadLines
var s = "Hello\r\nWorld";
var path = Path.GetTempFileName();
using (var writer = new StreamWriter(path))
{
writer.Write(s);
}
var lines = File.ReadLines(path);
using System.Collections.Generic;
using System.IO;
string textToSplit = "first line\r\nsecond line"; // sample input; use your own text here
if (textToSplit != null)
{
List<string> lines = new List<string>();
using (StringReader reader = new StringReader(textToSplit))
{
for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
{
lines.Add(line);
}
}
}
Very easy, actually.
VB.NET:
Private Function SplitOnNewLine(input As String) As String()
Return input.Split({Environment.NewLine}, StringSplitOptions.None)
End Function
C#:
string[] SplitOnNewLine(string input)
{
return input.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
}
I have 1000 text files that I want to read one by one. Each file has 4,700,000 records. For example, one line in a file is:
43266200 6819 43295200 1393/05/23 14:28:45 113 1
and I want to save it into SQL Server, for example:
field1:43266200
field2:6819
How can I do this?
var separators = " ".ToCharArray();
foreach(var line in File.ReadLines(path))
{
var fields = line.Split(separators, StringSplitOptions.RemoveEmptyEntries);
//now you have fields[0] and fields[1], save them in your database
}
This may help you
var message ="43266200 6819 43295200 1393/05/23 14:28:45 113 1";
//Split your data into pieces
var messages=message.Split(' ').Where( o => !string.IsNullOrEmpty(o));
var i=0;
foreach(var item in messages)
{
// do whatever you want to do with the pieces
Console.Write( "field {0}:{1}",++i,item);
}
If you're reading the text from a file, and you can reasonably assume that the space character will be your only delimiter, you should use the String.Split() method to tokenize each line:
// instantiate FileInfo of your file as yourFile
foreach (string line in File.ReadLines(yourFile.FullName))
{
string[] lineTokens = line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
}
String.Split() allows you to separate any string into a string[] of substrings based on the char delimiters you provide in the first argument. The second argument in the code above is a value from the StringSplitOptions enumeration, such as None (return all substrings, including empty ones) or RemoveEmptyEntries (omit empty substrings, such as those produced by consecutive delimiters).
Then, from there, you can iterate through lineTokens and assemble an object from each token, or you can assemble an SQL query where any given index corresponds to a column in the row you intend to add.
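As a rough illustration of the "assemble an object from each token" step, here is a sketch that parses the sample line from the question. The names field1 and field2 come from the question; the numeric types (long and int) are assumptions, and the database insert itself is left out:

```csharp
using System;
using System.Globalization;

// Sample line taken from the question.
string line = "43266200 6819 43295200 1393/05/23 14:28:45 113 1";

// Tokenize on spaces, ignoring empty entries from repeated spaces.
string[] tokens = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// Assumed column types; adjust them to match the real table definition.
long field1 = long.Parse(tokens[0], CultureInfo.InvariantCulture);
int field2 = int.Parse(tokens[1], CultureInfo.InvariantCulture);

Console.WriteLine("field1:" + field1);
Console.WriteLine("field2:" + field2);
```

From here the values would typically be passed to the database as command parameters rather than concatenated into the SQL text.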
I used MyString.Split(Environment.NewLine.ToCharArray()[0]) to split my string from a file into different pieces. But every item in the array except the first one starts with \n after I did that. I know the way that I'm splitting by newlines is kind of "cheaty", for lack of a better word, so if there is a better way of doing this, please tell me...
Here is the file...
If you want to keep using .Split() instead of reading the file one line at a time, you can do:
var splitResult = MyString.Split( new string[]{ System.Environment.NewLine },
System.StringSplitOptions.RemoveEmptyEntries );
/* or System.StringSplitOptions.None if you want empty results as well */
EDIT:
The problem you were having is that in a non-unix environment the new-line "character" is actually two characters. So when you grabbed the zero index you were actually splitting on a carriage return...not the new-line character (\n).
Windows = "\r\n"
Unix = "\n"
Per http://msdn.microsoft.com/en-us/library/system.environment.newline.aspx
A newline in Windows is two characters (\r and \n). The Environment.NewLine.ToCharArray()[0] expression specifies only one of those characters: \r. Therefore, the other character (\n) remains as a portion of the split string.
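A small sketch of that effect, using a made-up sample string and a literal '\r' in place of Environment.NewLine.ToCharArray()[0] so that it behaves the same on any platform:

```csharp
using System;

// Hypothetical sample text with Windows line endings.
string text = "a\r\nb\r\nc";

// Splitting on '\r' alone leaves the '\n' at the start of every later piece.
string[] parts = text.Split('\r');

// Splitting on the full two-character sequence removes both characters.
string[] lines = text.Split(new[] { "\r\n" }, StringSplitOptions.None);

Console.WriteLine(parts[1] == "\nb");  // True
Console.WriteLine(lines[1] == "b");    // True
```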
May I suggest you read your file using something like this:
public IEnumerable<string> ReadFile(string filePath)
{
using (StreamReader reader = new StreamReader(filePath))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
You might need more error handling, or to specify different file-open options, or to pass a stream to the method rather than the path, but the idea of using an iterator over the ReadLine() method is sound. The result is that you can just use code like this:
foreach (string line in ReadFile(" ... my file path ... "))
{
}
Using the .NET MicroFramework which is a really cut-down version of C#. For instance, System.String barely has any of the goodies that we've enjoyed over the years.
I need to split a text document into lines, which means splitting by \r\n. However, String.Split only provides a split by char, not by string.
How can I split a document into lines in an efficient manner (e.g. not looping madly across each char in the doc)?
P.S. System.String is also missing a Replace method, so that won't work.
P.P.S. Regex is not part of the MicroFramework either.
You can do
string[] lines = doc.Split('\n');
for (int i = 0; i < lines.Length; i+= 1)
lines[i] = lines[i].Trim();
This assumes that the µF supports Trim() at all. Trim() removes all leading and trailing whitespace, which might be useful. Otherwise use TrimEnd('\r').
I would loop across each char in the document, because that's clearly required. How do you think String.Split works? I would try to do so only hitting each character once, however.
Keep a list of strings found so far. Use IndexOf repeatedly, passing in the current offset into the string (i.e. the previous match + 2).
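A minimal sketch of that loop, with one deliberate change: instead of searching for the two-character "\r\n" and advancing by 2, it searches for '\n' and drops a preceding '\r', so it also copes with bare "\n" endings. SplitLines is a hypothetical helper name, and the sketch assumes IndexOf(char, int), Substring and ArrayList are available on the target framework:

```csharp
using System;
using System.Collections;

// Hypothetical helper: walks the string once, cutting a line at every '\n'.
static string[] SplitLines(string doc)
{
    var lines = new ArrayList();
    int start = 0;
    int i;
    while ((i = doc.IndexOf('\n', start)) != -1)
    {
        int end = i;
        // Drop the '\r' of a "\r\n" pair, if present.
        if (end > start && doc[end - 1] == '\r') end--;
        lines.Add(doc.Substring(start, end - start));
        start = i + 1;
    }
    // Whatever follows the last '\n' is the final line (possibly empty).
    lines.Add(doc.Substring(start));
    return (string[])lines.ToArray(typeof(string));
}

string[] result = SplitLines("a\r\nb\nc");
Console.WriteLine(result.Length);  // 3
```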
How can I split a document into lines in an efficient manner (e.g. not looping madly across each char in the doc)?
How do you think the built-in Split works?
Just reimplement it yourself as an extension method.
What about:
string path = "yourfile.txt";
string[] lines = File.ReadAllLines(path);
Or
string content = File.ReadAllText(path);
string[] lines = content.Split(
Environment.NewLine.ToCharArray(),
StringSplitOptions.RemoveEmptyEntries);
From what I have read about .NET Micro Framework 3.0, this code can work:
string line = String.Empty;
StreamReader reader = new StreamReader(path);
while ((line = reader.ReadLine()) != null)
{
// do stuff
}
This may help in some scenarios:
StreamReader reader = new StreamReader(file);
string _Line = reader.ReadToEnd();
// Remove the unwanted line-break marker first ("entersign" is a placeholder)
string IntMediateLine = _Line.Replace("entersign", "");
// Then split on whichever special character separates your records ('|' is only an example)
string[] ArrayLineSpliter = IntMediateLine.Split('|');
If you'd like a MicroFramework compatible split function that works for an entire string of characters, here's one that does the trick, similar to the regular frameworks' version using StringSplitOptions.None:
private static string[] Split(string s, string delim)
{
if (s == null) throw new ArgumentNullException("s");
// Declarations
var strings = new ArrayList();
var start = 0;
// Tokenize
if (delim != null && delim != "")
{
int i;
while ((i = s.IndexOf(delim, start)) != -1)
{
strings.Add(s.Substring(start, i - start));
start = i + delim.Length;
}
}
// Append left over
strings.Add(s.Substring(start));
return (string[]) strings.ToArray(typeof(string));
}
You can split your string on a substring:
string[] lines = text.Split(new string[] { "\r\n" }, StringSplitOptions.None);
I have a block of text and I want to get its lines without losing the \r and \n at the end. Right now, I have the following (suboptimal code):
string[] lines = tbIn.Text.Split('\n')
.Select(t => t.Replace("\r", "\r\n")).ToArray();
So I'm wondering - is there a better way to do it?
The following seems to do the job:
string[] lines = Regex.Split(tbIn.Text, @"(?<=\r\n)(?!$)");
(?<=\r\n) uses 'positive lookbehind' to match after \r\n without consuming it.
(?!$) uses negative lookahead to prevent matching at the end of the input and so avoids a final line that is just an empty string.
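A quick way to see both assertions at work is to run the pattern against a couple of made-up inputs (the sample strings below are hypothetical, standing in for tbIn.Text):

```csharp
using System;
using System.Text.RegularExpressions;

// Zero-width split point after every "\r\n", except at the end of the input.
string pattern = @"(?<=\r\n)(?!$)";

// A blank line in the middle is kept as a "\r\n"-only element.
string[] withBlank = Regex.Split("first\r\n\r\nthird", pattern);

// A trailing "\r\n" does not produce an extra empty element.
string[] withTrailing = Regex.Split("a\r\nb\r\n", pattern);

Console.WriteLine(withBlank.Length);     // 3
Console.WriteLine(withTrailing.Length);  // 2
```

Every element except possibly the last still ends in "\r\n", so concatenating the pieces gives back the original text.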
Something along the lines of using this regular expression:
[^\n\r]*\r\n
Then use Regex.Matches().
The problem is you need Group(1) out of each match and create your string list from that. In Python you'd just use the map() function. Not sure the best way to do it in .NET, you take it from there ;-)
Dmitri, your solution is actually pretty compact and straightforward. The only thing more efficient would be to keep the string-splitting characters in the generated array, but the APIs simply don't allow for that. As a result, every solution will require iterating over the array and performing some kind of modification (which in C# means allocating new strings every time). I think the best you can hope for is to not re-create the array:
string[] lines = tbIn.Text.Split('\n');
for (int i = 0; i < lines.Length; ++i)
{
lines[i] = lines[i].Replace("\r", "\r\n");
}
... but as you can see that looks a lot more cumbersome! If performance matters, this may be a bit better. If it really matters, you should consider manually parsing the string by using IndexOf() to find the '\r's one at a time, and then create the array yourself. This is significantly more code, though, and probably not necessary.
One of the side effects of both your solution and this one is that you won't get a terminating "\r\n" on the last line if there wasn't one already there in the TextBox. Is this what you expect? What about blank lines... do you expect them to show up in 'lines'?
If you are just going to replace the newline (\n) then do something like this:
string[] lines = tbIn.Text.Split('\n')
.Select(t => t + "\r\n").ToArray();
Edit: Regex.Split allows you to split on a string.
string[] lines = Regex.Split(tbIn.Text, "\r\n")
.Select(t => t + "\r\n").ToArray();
As always, extension method goodies :)
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
usage:
string text = "One,Two,Three,Four";
foreach (var s in text.SplitAndKeep(","))
{
Console.WriteLine(s);
}
Output:
One,
Two,
Three,
Four
You can achieve this with a regular expression. Here's an extension method with it:
public static string[] SplitAndKeepDelimiter(this string input, string delimiter)
{
MatchCollection matches = Regex.Matches(input, @"[^" + delimiter + "]+(" + delimiter + "|$)", RegexOptions.Multiline);
string[] result = new string[matches.Count];
for (int i = 0; i < matches.Count ; i++)
{
result[i] = matches[i].Value;
}
return result;
}
I'm not sure if this is a better solution. Yours is very compact and simple.