I have a StringBuilder instance on which I make numerous calls like sb.AppendLine("test");.
How do I work out how many lines I have?
I see the class has .Length, but that tells me the total number of characters.
Any ideas?
Sorted by efficiency:
Counting your AppendLine() calls
Calling IndexOf() in a loop
Using Regex
Using String.Split()
The last one is extraordinarily expensive and generates lots of garbage; don't use it.
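To make option 2 concrete, here is a rough sketch of the IndexOf() loop in Java (the other language used on this page), where java.lang.StringBuilder has its own indexOf(String, int), so you don't have to call toString() first. The class and method names are made up for illustration, and it assumes every line was appended with a trailing '\n'.

```java
final class NewlineCounter {
    // Counts '\n' occurrences by repeatedly calling StringBuilder.indexOf,
    // starting each search just past the previous hit.
    static int countNewlines(StringBuilder sb) {
        int count = 0;
        int ix = sb.indexOf("\n");
        while (ix >= 0) {
            count++;
            ix = sb.indexOf("\n", ix + 1);
        }
        return count;
    }
}
```

If every line ends with '\n', the newline count equals the line count; an unterminated last line would need one extra.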
You could wrap StringBuilder with your own class that keeps a count of lines as they are added, or you could count the number of '\n' characters once your builder is complete.
Regex.Matches(builder.ToString(), Environment.NewLine).Count
You can create a wrapper class that does the following:
using System.Text;
public class Wrapper
{
private readonly StringBuilder strBuild = new StringBuilder();
private int count = 0;
// Appends the text plus a newline and bumps the line counter
public void AppendLine(string toAppendParam)
{
strBuild.AppendLine(toAppendParam);
count++;
}
public StringBuilder GetStringBuilder()
{
return strBuild;
}
// Number of AppendLine calls made so far
public int GetCount()
{
return count;
}
}
Try this:
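// RemoveEmptyEntries stops "\r\n" from producing an extra empty element between '\r' and '\n' (blank lines are not counted either)
sb.ToString().Split(System.Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Length;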
You should be able to search for the number of occurrences of \n in the string.
UPDATE:
One way could be to split on the newline character and count the number of elements in the array as follows:
sb.ToString().Split('\n').Length; // includes the empty element after the final newline
If you're going to use String.Split(), you will need to pass some options, like this:
static void Main(string[] args)
{
var sb = new StringBuilder();
sb.AppendLine("this");
sb.AppendLine("is");
sb.AppendLine("a");
sb.AppendLine("test");
// StringSplitOptions.None counts the last (blank) newline
// which the last AppendLine call creates
// if you don't want this, then replace with
// StringSplitOptions.RemoveEmptyEntries
var lines = sb.ToString().Split(
new string[] {
System.Environment.NewLine },
StringSplitOptions.None).Length;
Console.WriteLine("Number of lines: " + lines);
Console.WriteLine("Press enter to exit.");
Console.ReadLine();
}
This results in:
Number of lines: 5
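For comparison, Java's String.split (which takes a regex) behaves differently: by default it drops trailing empty strings, while a negative limit keeps them, which corresponds to StringSplitOptions.None above. Unlike RemoveEmptyEntries, the default Java split only drops the trailing empties, not interior ones. A small sketch with hypothetical names:

```java
final class SplitCount {
    // Counts segments the way StringSplitOptions.None does: a negative limit
    // keeps the trailing empty string after the final newline.
    static int countKeepingTrailing(String text) {
        return text.split("\n", -1).length;
    }

    // The default split(regex) discards trailing empty strings, so the blank
    // segment after the last newline is not counted.
    static int countDroppingTrailing(String text) {
        return text.split("\n").length;
    }
}
```

For the text "this\nis\na\ntest\n", the first method returns 5 and the second returns 4.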
UPDATE: What Gabe said:
b.ToString().Count(c => c == '\n') would work here too, and might not be much less efficient (aside from creating a separate copy of the string!).
A better way, faster than creating a string from the StringBuilder and splitting it (or creating the string and running a regex over it), is to look into the StringBuilder itself and count the number of '\n' characters therein.
The following extension method enumerates the characters in the StringBuilder; you can then use LINQ on it to your heart's content.
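using System.Collections.Generic;
using System.Linq;
using System.Text;
// Extension methods must live in a static class
public static class StringBuilderExtensions
{
// Yields each character of the builder without calling ToString()
public static IEnumerable<char> EnumerateChars(this StringBuilder sb)
{
for (int i = 0; i < sb.Length; i++)
yield return sb[i];
}
}
... used here, count will be 4 (each call adds the '\n' in the literal plus the one at the end of Environment.NewLine):
StringBuilder b = new StringBuilder();
b.AppendLine("Hello\n");
b.AppendLine("World\n");
int lineCount = b.EnumerateChars().Count(c => c == '\n');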
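The same "look inside the builder" idea can be sketched in Java, where StringBuilder implements CharSequence and CharSequence.chars() (Java 8+) streams its characters without a toString() call. This is a hypothetical helper, not part of any library:

```java
final class BuilderCharCount {
    // Streams the builder's chars through the CharSequence interface and
    // counts the '\n' ones.
    static long countNewlines(StringBuilder sb) {
        return sb.chars().filter(c -> c == '\n').count();
    }
}
```

Mirroring the C# example, appending "Hello\n" and "World\n" each followed by a newline gives a count of 4.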
Wrap StringBuilder in your own line-counting class (StringBuilder is sealed, so you can't derive from it) whose AppendLine increments an internal line count and which provides a method to get that count.
Use a regex to count the number of line terminators (e.g. \r\n) in the string. Or load the strings into a text box and do a line count, but that's the hacky way of doing it.
You can split the StringBuilder's data into a String[] array and then use String[].Length for the number of lines,
something like below:
String[] linesText = sb.ToString().Split(new[] { Environment.NewLine }, StringSplitOptions.None);
Console.WriteLine(linesText.Length);
I have a string like the following:
string myString = @"This is the first line

This is the third line as the 2nd is empty
The string continues
And so on ...
...
..
.";
I know that I can split this into an array, delete the first 2 elements and then rebuild my string from that array, but I'm looking for something much simpler.
myString = String.Join("\n", myString.Split('\n').Skip(2));
Here's @maccettura's fiddle of that code with your string literal.
To break that down:
Split on newlines, return a sequence of segments -- the segments are lines, since we split on newline:
myString.Split('\n')
Skip the first two segments and return the rest as a sequence:
.Skip(2)
And rejoin the shorter sequence with newlines:
String.Join("\n", ...
This is just what you were contemplating doing in a loop, but with Skip(), it can be expressed as a readable one-liner.
Lastly, here's @user1242967's version of the Split() call, which will handle \r\n newlines:
myString.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None)
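For comparison, a rough Java sketch of the same split, skip, rejoin pipeline using streams; the class and method names are hypothetical. Splitting on the regex "\r?\n" covers both newline styles, and rejoining with "\n" normalizes the output to "\n", just as String.Join("\n", ...) does above.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class SkipLines {
    // Splits on "\r\n" or "\n", skips the first two segments and rejoins the
    // rest with "\n". The -1 limit keeps trailing empty segments, like
    // StringSplitOptions.None in the C# version.
    static String removeFirstTwoLines(String text) {
        return Arrays.stream(text.split("\r?\n", -1))
                .skip(2)
                .collect(Collectors.joining("\n"));
    }
}
```

With "line 1\r\nline 2\r\nline 3\r\nline 4" as input, this sketch returns "line 3\nline 4".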
If you do want to micro-optimize (or your strings are large, or you're calling this in a loop), here is a more performant way to do it:
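private static string RemoveFirstTwoLines(string myString) {
int ix = myString.IndexOf('\n');
if (ix < 0) return string.Empty; // fewer than two lines: nothing left
ix = myString.IndexOf('\n', ix + 1);
if (ix < 0) return string.Empty; // only one newline: nothing left
return myString.Substring(ix + 1);
}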
Good for large strings, code easy to read:
using System.IO;
namespace ConsoleApplication
{
class Program
{
static void Main (string[] args)
{
var text = "line 1\r\nline 2\r\nline 3\r\nline 4";
using (var sr = new StringReader (text))
{
sr.ReadLine ();
sr.ReadLine ();
string result = sr.ReadToEnd ();
}
}
}
}
I'm trying to write code that searches a text file for a certain phrase and then populates a text box with each line in which the phrase occurs. There are no errors with this code, but it doesn't work at all. Does anyone know what is wrong? I'm not too sure if what I'm doing is remotely correct.
{
tuitDisplayTextBox.Text = "";
string[] tuitFilePath = File.ReadAllLines(Server.MapPath("~") +"/App_Data/tuitterMessages.txt");
for (int i = 0; i < tuitFilePath.Length; i++)
{
if (tuitFilePath[i].Contains(searchTextBox.Text))
{
tuitDisplayTextBox.Text += tuitFilePath[i];
}
}
}
Your solution should work... for the last line that matches, and only that one.
LINQ can help you here, though. Here's a solution that should work.
tuitDisplayTextBox.Text =
File.ReadLines(Server.MapPath("~") +"/App_Data/tuitterMessages.txt")
.Where(n => n.Contains(searchTextBox.Text)).Aggregate((a, b) =>
a + Environment.NewLine + b);
Here, it reads the lines of the file into an IEnumerable<string>, and then I filter that with the Where method, which basically means "if the condition is true for this element, add this element to the list of things to return; otherwise don't add it". Aggregate is a bit more complicated. It takes the first two items from the collection and passes them to a lambda that returns a value. Then it calls the lambda again with that result and the third element, then with that result and the fourth element, and so on. Note that this overload of Aggregate (without a seed) throws an InvalidOperationException if no line matches.
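For comparison, here is a rough Java sketch of the same filter-then-fold idea, with a List<String> standing in for the file's lines (in real code they might come from java.nio.file.Files.readAllLines). Stream.reduce((a, b) -> ...) plays the role of Aggregate, and Collectors.joining is the more common way to do this join. The class and method names are made up for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class LineFilter {
    // Equivalent of Where + Aggregate: reduce folds the filtered lines pairwise
    // and yields an empty Optional (instead of throwing) when nothing matched.
    static Optional<String> joinMatchesByReduce(List<String> lines, String phrase, String sep) {
        return lines.stream()
                .filter(line -> line.contains(phrase))
                .reduce((a, b) -> a + sep + b);
    }

    // Collectors.joining does the same join and simply returns "" when nothing matched.
    static String joinMatches(List<String> lines, String phrase, String sep) {
        return lines.stream()
                .filter(line -> line.contains(phrase))
                .collect(Collectors.joining(sep));
    }
}
```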
Here's some code more similar to yours that will also work:
tuitDisplayTextBox.Text = "";
IEnumerable<string> lines =
File.ReadAllLines(Server.MapPath("~") +"/App_Data/tuitterMessages.txt");
StringBuilder sb = new StringBuilder();
foreach (string line in lines)
{
if (line.Contains(searchTextBox.Text))
{
sb.AppendLine(line);
}
}
tuitDisplayTextBox.Text = sb.ToString();
Here it's a bit different. First it reads all the lines into an IEnumerable<string> called lines. Then it creates a StringBuilder object (basically a mutable string). After that, it loops over the lines with foreach (I thought it was more appropriate here), and if a line contains the text you want, it appends that line and a newline to the StringBuilder. Finally, it sets your text box's text to the result by getting the string representation of the StringBuilder instance.
And if you really want a for loop, here's the code modified to use a for loop:
tuitDisplayTextBox.Text = "";
string[] lines =
File.ReadAllLines(Server.MapPath("~") +"/App_Data/tuitterMessages.txt");
StringBuilder sb = new StringBuilder();
for (int i = 0; i < lines.Length; i++)
{
if (lines[i].Contains(searchTextBox.Text))
{
sb.AppendLine(lines[i]);
}
}
tuitDisplayTextBox.Text = sb.ToString();
Please note that File.ReadAllLines breaks the text at '\r' or '\n'.
So if you search for "hello world" and that text is broken across two lines in the file (e.g. "... hello\n world"), your code will fail.
So use ReadAllText() instead, which returns a single string containing all of the file's text.
You might still sometimes face problems with file encoding, but that is another issue.
Once you have found the text you are searching for, you can use ReadAllLines to determine where the text is located.
I have a large string, where there can be specific words (text followed by a single colon, like "test:") occurring more than once. For example, like this:
word:
TEST:
word:
TEST:
TEST: // random text
"word" occurs twice and "TEST" occurs thrice, but the amount can be variable. Also, these words don't have to be in the same order and there can be more text in the same line as the word (as shown in the last example of "TEST"). What I need to do is append the occurrence number to each word, for example the output string needs to be this:
word_ONE:
TEST_ONE:
word_TWO:
TEST_TWO:
TEST_THREE: // random text
The RegEx for getting these words which I've written is ^\b[A-Za-z0-9_]{4,}\b:. However, I don't know how to accomplish the above in a fast way. Any ideas?
Regex is perfect for this job - using Replace with a match evaluator:
This example is neither tested nor compiled:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Fix
{
public static String Execute(string largeText)
{
// Verbatim string so \w is not treated as a C# escape; Multiline makes ^ match at every line start
return Regex.Replace(largeText, @"^(\w{4,}):", new Fix().Evaluator, RegexOptions.Multiline);
}
private Dictionary<String, int> counters = new Dictionary<String, int>();
private static String[] numbers = {"ONE", "TWO", "THREE" /* , ... */};
public String Evaluator(Match m)
{
String word = m.Groups[1].Value;
int count;
if (!counters.TryGetValue(word, out count))
count = 0;
count++;
counters[word] = count;
return word + "_" + numbers[count-1] + ":";
}
}
This should return what you requested when calling:
result = Fix.Execute(largeText);
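For comparison, a rough Java sketch of the same match-evaluator idea, using Matcher.appendReplacement/appendTail with a per-word counter. The NUMBERS array is a hypothetical stand-in for the elided list above and would need extending if a word can occur more than five times; the class and method names are also made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OccurrenceNumbering {
    // (?m) makes ^ match at the start of every line, not just the whole input.
    private static final Pattern WORD = Pattern.compile("(?m)^(\\w{4,}):");
    // Hypothetical list of number words; index = occurrence - 1.
    private static final String[] NUMBERS = {"ONE", "TWO", "THREE", "FOUR", "FIVE"};

    static String execute(String largeText) {
        Map<String, Integer> counters = new HashMap<>();
        Matcher m = WORD.matcher(largeText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group(1);
            // merge returns the updated running count for this word
            int count = counters.merge(word, 1, Integer::sum);
            String replacement = word + "_" + NUMBERS[count - 1] + ":";
            // quoteReplacement stops '$' or '\' in the text being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```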
I think you can do this with Regex.Replace(string, string, MatchEvaluator) and a dictionary.
using System.Collections.Generic;
using System.Text.RegularExpressions;
class Program
{
static Dictionary<string, int> wordCount = new Dictionary<string, int>();
static string AppendIndex(Match m)
{
string word = m.Groups[1].Value; // the matched word without its colon
if (wordCount.ContainsKey(word))
wordCount[word] = wordCount[word] + 1;
else
wordCount.Add(word, 1);
return word + "_" + wordCount[word] + ":"; // in the format: word_1:, word_2:
}
static void Main()
{
string text = "....";
string result = Regex.Replace(text, @"^\b([A-Za-z0-9_]{4,})\b:",
new MatchEvaluator(AppendIndex), RegexOptions.Multiline);
}
}
See this:
http://msdn.microsoft.com/en-US/library/cft8645c(v=VS.80).aspx
If I understand you correctly, regex is not necessary here.
You can split your large string by the ':' character. Maybe you also need to read line by line (split by '\n'). After that you just create a dictionary (IDictionary<string, int>), which counts the occurrences of certain words. Every time you find word x, you increase the counter in the dictionary.
EDIT
Read your file line by line OR split the string by '\n'
Check if your delimiter is present. Either by splitting by ':' OR using regex.
Get the first item from the split array OR the first match of your regex.
Use a dictionary to count your occurrences.
if (dictionary.ContainsKey(key)) dictionary[key]++;
else dictionary.Add(key, 1);
If you need words instead of numbers, then create another dictionary for them, so that it maps 1 to "ONE", 2 to "TWO", and so on. Maybe there is another solution for that.
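A minimal Java sketch of those steps, assuming lines are separated by '\n' and that the text before the first ':' on a line is the word to count (it does not check the four-character rule from the question). It uses Map.merge for the counting step and appends plain numbers rather than words; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ColonWordCounter {
    // Processes the text line by line without regex: the text before the first
    // ':' is the key, a running count is kept per key, and that count is
    // inserted before the colon.
    static String numberOccurrences(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String[] lines = text.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int colon = line.indexOf(':');
            if (colon > 0) {
                String key = line.substring(0, colon);
                int n = counts.merge(key, 1, Integer::sum);
                line = key + "_" + n + line.substring(colon);
            }
            out.append(line);
            if (i < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }
}
```

Lines without a colon, or starting with one, pass through unchanged.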
Look at this example (I know it's not perfect and not so nice)
Let's leave aside the exact argument for the Split function; I think it can help.
static void Main(string[] args)
{
string a = "word:word:test:-1+234=567:test:test:";
string[] tks = a.Split(':');
Regex re = new Regex(@"^\b[A-Za-z0-9_]{4,}\b");
var res = from x in tks
where re.Matches(x).Count > 0
// note: this appends the total number of occurrences of x, not its running index
select x + DecodeNO(tks.Count(y=>y.Equals(x)));
foreach (var item in res)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
private static string DecodeNO(int n)
{
switch (n)
{
case 1:
return "_one";
case 2:
return "_two";
case 3:
return "_three";
}
return "";
}
I have a block of text and I want to get its lines without losing the \r and \n at the end. Right now, I have the following (suboptimal) code:
string[] lines = tbIn.Text.Split('\n')
.Select(t => t.Replace("\r", "\r\n")).ToArray();
So I'm wondering - is there a better way to do it?
The following seems to do the job:
string[] lines = Regex.Split(tbIn.Text, @"(?<=\r\n)(?!$)");
(?<=\r\n) uses 'positive lookbehind' to match after \r\n without consuming it.
(?!$) uses negative lookahead to prevent matching at the end of the input and so avoids a final line that is just an empty string.
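The same lookbehind trick carries over to Java's String.split. A rough sketch with hypothetical names: it uses \z rather than $ in the lookahead, because Java's $ can also match before a final line terminator, and \z only ever matches at the absolute end of the input.

```java
final class CrLfSplit {
    // (?<=\r\n) splits right after each CRLF without consuming it;
    // (?!\z) prevents a split at the very end of the input, so there is
    // no trailing empty line.
    static String[] splitKeepingCrLf(String text) {
        return text.split("(?<=\r\n)(?!\\z)");
    }
}
```

For "line1\r\nline2\r\nline3" this yields "line1\r\n", "line2\r\n" and "line3".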
Something along the lines of using this regular expression:
[^\n\r]*\r\n
Then use Regex.Matches().
The problem is you need the value of each match (the pattern has no capture group, so that is the whole match rather than Group(1)) and have to create your string list from that. In Python you'd just use the map() function; in .NET, LINQ's Select() plays that role. You take it from there ;-)
Dmitri, your solution is actually pretty compact and straightforward. The only thing more efficient would be to keep the string-splitting characters in the generated array, but the APIs simply don't allow for that. As a result, every solution will require iterating over the array and performing some kind of modification (which in C# means allocating new strings every time). I think the best you can hope for is to not re-create the array:
string[] lines = tbIn.Text.Split('\n');
for (int i = 0; i < lines.Length; ++i)
{
lines[i] = lines[i].Replace("\r", "\r\n");
}
... but as you can see that looks a lot more cumbersome! If performance matters, this may be a bit better. If it really matters, you should consider manually parsing the string by using IndexOf() to find the '\r's one at a time, and then create the array yourself. This is significantly more code, though, and probably not necessary.
One of the side effects of both your solution and this one is that you won't get a terminating "\r\n" on the last line if there wasn't one already there in the TextBox. Is this what you expect? What about blank lines... do you expect them to show up in 'lines'?
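The manual IndexOf() parsing described above could look roughly like this in Java. It is a sketch under the assumption that lines end in "\r\n" (it searches for the pair rather than a lone '\r'); blank lines come out as "\r\n", and a last line without a terminator is kept as-is. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ManualLineSplit {
    // Walks the text with indexOf, cutting a line after each "\r\n" so the
    // terminator stays attached; a final unterminated line is kept as-is.
    static List<String> splitKeepingCrLf(String text) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int ix;
        while ((ix = text.indexOf("\r\n", start)) >= 0) {
            lines.add(text.substring(start, ix + 2));
            start = ix + 2;
        }
        if (start < text.length()) {
            lines.add(text.substring(start));
        }
        return lines;
    }
}
```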
If you are just going to replace the newline (\n) then do something like this:
string[] lines = tbIn.Text.Split('\n')
.Select(t => t + "\r\n").ToArray();
Edit: Regex.Split allows you to split on a string.
string[] lines = Regex.Split(tbIn.Text, "\r\n")
.Select(t => t + "\r\n").ToArray();
As always, extension method goodies :)
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string separator)
{
string[] obj = s.Split(new string[] { separator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + separator;
yield return result;
}
}
}
usage:
string text = "One,Two,Three,Four";
foreach (var s in text.SplitAndKeep(","))
{
Console.WriteLine(s);
}
Output:
One,
Two,
Three,
Four
You can achieve this with a regular expression. Here's an extension method with it:
public static string[] SplitAndKeepDelimiter(this string input, string delimiter)
{
MatchCollection matches = Regex.Matches(input, @"[^" + delimiter + "]+(" + delimiter + "|$)", RegexOptions.Multiline);
string[] result = new string[matches.Count];
for (int i = 0; i < matches.Count ; i++)
{
result[i] = matches[i].Value;
}
return result;
}
I'm not sure if this is a better solution. Yours is very compact and simple.
I know this is a bit of a newbie question, but are there equivalents to C#'s string operations in Java?
Specifically, I'm talking about String.Format and String.Join.
The Java String class has a static format method (as of 1.5), but no join method before Java 8.
To get a bunch of useful String utility methods not already included you could use org.apache.commons.lang.StringUtils.
String.format. As for join, you need to write your own:
import java.util.Collection;
import java.util.Iterator;
static String join(Collection<?> s, String delimiter) {
StringBuilder builder = new StringBuilder();
Iterator<?> iter = s.iterator();
while (iter.hasNext()) {
builder.append(iter.next());
if (!iter.hasNext()) {
break;
}
builder.append(delimiter);
}
return builder.toString();
}
The above comes from http://snippets.dzone.com/posts/show/91
Guava comes with the Joiner class.
import com.google.common.base.Joiner;
Joiner.on(separator).join(data);
As of Java 8, join() is now available as two class methods on the String class. In both cases the first argument is the delimiter.
You can pass individual CharSequences as additional arguments:
String joined = String.join(", ", "Antimony", "Arsenic", "Aluminum", "Selenium");
// "Antimony, Arsenic, Aluminum, Selenium"
Or you can pass an Iterable<? extends CharSequence>:
List<String> strings = new LinkedList<String>();
strings.add("EX");
strings.add("TER");
strings.add("MIN");
strings.add("ATE");
String joined = String.join("-", strings);
// "EX-TER-MIN-ATE"
Java 8 also adds a new class, StringJoiner, which you can use like this:
StringJoiner joiner = new StringJoiner("&");
joiner.add("x=9");
joiner.add("y=5667.7");
joiner.add("z=-33.0");
String joined = joiner.toString();
// "x=9&y=5667.7&z=-33.0"
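Java 8 also added Collectors.joining in java.util.stream, which joins the elements of a stream and optionally wraps the result in a prefix and suffix. A small sketch (the class and method names are hypothetical) that maps each element with String.valueOf first, so it works for non-String elements too:

```java
import java.util.List;
import java.util.stream.Collectors;

final class StreamJoin {
    // Collectors.joining(delimiter, prefix, suffix) concatenates the stream's
    // strings; map(String::valueOf) lets it accept any element type.
    static String joinAsList(List<?> items) {
        return items.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", ", "[", "]"));
    }
}
```

An empty list produces just the prefix and suffix, "[]".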
TextUtils.join is available on Android
You can also use variable arguments for strings as follows:
String join (String delim, String ... data) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < data.length; i++) {
sb.append(data[i]);
if (i >= data.length-1) {break;}
sb.append(delim);
}
return sb.toString();
}
As for join, I believe this might look a little less complicated:
public String join (Collection<String> c) {
// note: this concatenates without inserting any delimiter between elements
StringBuilder sb=new StringBuilder();
for(String s: c)
sb.append(s);
return sb.toString();
}
I don't get to use Java 5 syntax as much as I'd like (Believe it or not, I've been using 1.0.x lately) so I may be a bit rusty, but I'm sure the concept is correct.
edit addition: String appends can be slowish, but if you are working on GUI code or some short-running routine, it really doesn't matter if you take .005 seconds or .006, so if you had a collection called "joinMe" that you want to append to an existing string "target" it wouldn't be horrific to just inline this:
for(String s : joinMe)
target += s;
It's quite inefficient (and a bad habit), but not anything you will be able to perceive unless there are either thousands of strings or this is inside a huge loop or your code is really performance critical.
More importantly, it's easy to remember, short, quick and very readable. Performance isn't always the automatic winner in design choices.
Here is a pretty simple answer. Use += since it is less code and let the optimizer convert it to a StringBuilder for you. Using this method, you don't have to do any "is last" checks in your loop (performance improvement) and you don't have to worry about stripping off any delimiters at the end.
String output = "";
Iterator<String> iter = args.iterator(); // args: any Iterable<String>, e.g. a List<String>
output += iter.hasNext() ? iter.next() : "";
while (iter.hasNext()) {
output += "," + iter.next();
}
I didn't want to import an entire Apache library to add a simple join function, so here's my hack.
public String join(String delim, List<String> destinations) {
StringBuilder sb = new StringBuilder();
int delimLength = delim.length();
for (String s: destinations) {
sb.append(s);
sb.append(delim);
}
// we have appended the delimiter to the end
// in the previous for-loop. Let's now remove it.
if (sb.length() >= delimLength) {
return sb.substring(0, sb.length() - delimLength);
} else {
return sb.toString();
}
}
If you wish to join (concatenate) several strings into one, you should use a StringBuilder. It is far better than using
for(String s : joinMe)
target += s;
There is also a slight performance win over StringBuffer, since StringBuilder does not use synchronization.
For a general purpose utility method like this, it will (eventually) be called many times in many situations, so you should make it efficient and not allocate many transient objects. We've profiled many, many different Java apps and almost always find that string concatenation and string/char[] allocations take up a significant amount of time/memory.
Our reusable collection-to-string method first calculates the size of the required result and then creates a StringBuilder with that initial size; this avoids unnecessary doubling/copying of the internal char[] used when appending strings.
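The answer doesn't show its code, so here is only a rough sketch of the idea it describes (not their implementation): compute the exact result length first, then allocate the StringBuilder once with that capacity. The names are hypothetical, and it assumes the collection contains no null elements.

```java
import java.util.Collection;

final class SizedJoin {
    // Computes the exact result length first so the StringBuilder is allocated
    // once and never has to grow its internal buffer. Assumes no null elements.
    static String join(Collection<String> items, String delim) {
        if (items.isEmpty()) {
            return "";
        }
        int size = delim.length() * (items.size() - 1);
        for (String s : items) {
            size += s.length();
        }
        StringBuilder sb = new StringBuilder(size);
        boolean first = true;
        for (String s : items) {
            if (!first) {
                sb.append(delim);
            }
            sb.append(s);
            first = false;
        }
        return sb.toString();
    }
}
```

The cost is a second pass over the collection, which is usually cheap compared with repeated buffer copies for large results.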
I wrote my own:
public static String join(Collection<String> col, String delim) {
StringBuilder sb = new StringBuilder();
Iterator<String> iter = col.iterator();
if (iter.hasNext())
sb.append(iter.next().toString());
while (iter.hasNext()) {
sb.append(delim);
sb.append(iter.next().toString());
}
return sb.toString();
}
but Collection isn't supported by JSP, so for the tag function I wrote:
public static String join(List<?> list, String delim) {
int len = list.size();
if (len == 0)
return "";
StringBuilder sb = new StringBuilder(list.get(0).toString());
for (int i = 1; i < len; i++) {
sb.append(delim);
sb.append(list.get(i).toString());
}
return sb.toString();
}
and put it in a .tld file:
<?xml version="1.0" encoding="UTF-8"?>
<taglib version="2.1" xmlns="http://java.sun.com/xml/ns/javaee">
<tlib-version>1.0</tlib-version>
<short-name>funnyFmt</short-name>
<uri>tag:com.core.util,2013:funnyFmt</uri>
<function>
<name>join</name>
<function-class>com.core.util.ReportUtil</function-class>
<function-signature>java.lang.String join(java.util.List, java.lang.String)</function-signature>
</function>
</taglib>
and use it in JSP files as:
<%@taglib prefix="funnyFmt" uri="tag:com.core.util,2013:funnyFmt"%>
${funnyFmt:join(books, ", ")}
StringUtils is a pretty useful class in the Apache Commons Lang library.
There is MessageFormat.format() which works like C#'s String.Format().
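A small sketch contrasting the two, with hypothetical method names: MessageFormat uses numbered placeholders like C#'s {0}, {1}, while String.format uses printf-style specifiers, where %1$s and %2$s refer to arguments by position. One gotcha with MessageFormat is that single quotes are escape characters, so a literal apostrophe must be written as ''.

```java
import java.text.MessageFormat;

final class FormatDemo {
    // MessageFormat: numbered placeholders, and an index may be reused.
    static String withMessageFormat(String first, String second) {
        return MessageFormat.format("{0}-{1}-{0}", first, second);
    }

    // String.format: positional printf-style specifiers give the same result.
    static String withStringFormat(String first, String second) {
        return String.format("%1$s-%2$s-%1$s", first, second);
    }
}
```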
I see a lot of overly complex implementations of String.Join here. If you don't have Java 1.8 and you don't want to import a new library, the implementation below should suffice.
public String join(Collection<String> col, String delim) {
StringBuilder sb = new StringBuilder();
boolean first = true;
for ( String s : col ) {
if ( !first ) sb.append(delim); // a flag, not sb.length(), so leading empty strings still get a delimiter
sb.append(s);
first = false;
}
return sb.toString();
}
ArrayList<Double> j = new ArrayList<>();
j.add(1.0); // an int literal like 1 would not autobox to Double
j.add(.92);
j.add(3.0);
String ntop = j.toString(); // ntop = "[1.0, 0.92, 3.0]"
So basically, the String ntop stores the value of the entire collection with comma separators and brackets.
I would just use the string concatenation operator "+" to join two strings. s1 += s2;