Java equivalents of C# String.Format() and String.Join()

I know this is a bit of a newbie question, but are there equivalents to C#'s string operations in Java?
Specifically, I'm talking about String.Format and String.Join.

The Java String object has a format method (as of 1.5), but no join method.
To get a bunch of useful String utility methods not already included you could use org.apache.commons.lang.StringUtils.
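As a minimal sketch of the format half of the question: Java's String.format uses printf-style specifiers, and positional ones such as %1$s play roughly the role of {0} in C#. The class and method names below (FormatDemo, describe, price) are made up for illustration, and Locale.ROOT is passed only to keep the output predictable.

```java
import java.util.Locale;

final class FormatDemo {
    // %1$s and %2$d are positional: an argument can be reused, like {0} in C#.
    static String describe(String name, int count) {
        return String.format(Locale.ROOT, "%1$s has %2$d items; again: %1$s", name, count);
    }

    // %.2f rounds to two decimal places.
    static String price(double amount) {
        return String.format(Locale.ROOT, "%.2f", amount);
    }

    public static void main(String[] args) {
        System.out.println(describe("cart", 3));
        System.out.println(price(9.5));
    }
}
```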

For formatting, use String.format. As for join, you need to write your own:
static String join(Collection<?> s, String delimiter) {
StringBuilder builder = new StringBuilder();
Iterator<?> iter = s.iterator();
while (iter.hasNext()) {
builder.append(iter.next());
if (!iter.hasNext()) {
break;
}
builder.append(delimiter);
}
return builder.toString();
}
The above comes from http://snippets.dzone.com/posts/show/91

Guava comes with the Joiner class.
import com.google.common.base.Joiner;
Joiner.on(separator).join(data);

As of Java 8, join() is now available as two class methods on the String class. In both cases the first argument is the delimiter.
You can pass individual CharSequences as additional arguments:
String joined = String.join(", ", "Antimony", "Arsenic", "Aluminum", "Selenium");
// "Antimony, Arsenic, Aluminum, Selenium"
Or you can pass an Iterable<? extends CharSequence>:
List<String> strings = new LinkedList<String>();
strings.add("EX");
strings.add("TER");
strings.add("MIN");
strings.add("ATE");
String joined = String.join("-", strings);
// "EX-TER-MIN-ATE"
Java 8 also adds a new class, StringJoiner, which you can use like this:
StringJoiner joiner = new StringJoiner("&");
joiner.add("x=9");
joiner.add("y=5667.7");
joiner.add("z=-33.0");
String joined = joiner.toString();
// "x=9&y=5667.7&z=-33.0"
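String.join only accepts CharSequence elements. For a list of arbitrary objects, one option on Java 8 is Collectors.joining on a stream. This is only a sketch; JoiningDemo and joinAll are hypothetical names, not part of any library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class JoiningDemo {
    // Converts each element to its string form, then joins with the delimiter.
    static String joinAll(List<?> items, String delimiter) {
        return items.stream()
                .map(item -> String.valueOf(item))
                .collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(joinAll(Arrays.asList(1, 2.5, "x"), "-"));
    }
}
```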

TextUtils.join is available on Android

You can also use variable arguments for strings as follows:
String join (String delim, String ... data) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < data.length; i++) {
sb.append(data[i]);
if (i >= data.length-1) {break;}
sb.append(delim);
}
return sb.toString();
}

As for join, I believe this might look a little less complicated:
public String join (Collection<String> c) {
StringBuilder sb=new StringBuilder();
for(String s: c)
sb.append(s);
return sb.toString();
}
I don't get to use Java 5 syntax as much as I'd like (Believe it or not, I've been using 1.0.x lately) so I may be a bit rusty, but I'm sure the concept is correct.
edit addition: String appends can be slowish, but if you are working on GUI code or some short-running routine, it really doesn't matter if you take .005 seconds or .006, so if you had a collection called "joinMe" that you want to append to an existing string "target" it wouldn't be horrific to just inline this:
for(String s : joinMe)
target += s;
It's quite inefficient (and a bad habit), but not anything you will be able to perceive unless there are either thousands of strings or this is inside a huge loop or your code is really performance critical.
More importantly, it's easy to remember, short, quick and very readable. Performance isn't always the automatic winner in design choices.

Here is a pretty simple answer. Use += since it is less code and let the optimizer convert it to a StringBuilder for you. Using this method, you don't have to do any "is last" checks in your loop (performance improvement) and you don't have to worry about stripping off any delimiters at the end.
Iterator<String> iter = args.iterator();
output += iter.hasNext() ? iter.next() : "";
while (iter.hasNext()) {
output += "," + iter.next();
}

I didn't want to import an entire Apache library to add a simple join function, so here's my hack.
public String join(String delim, List<String> destinations) {
StringBuilder sb = new StringBuilder();
int delimLength = delim.length();
for (String s: destinations) {
sb.append(s);
sb.append(delim);
}
// we have appended the delimiter to the end
// in the previous for-loop. Let's now remove it.
if (sb.length() >= delimLength) {
return sb.substring(0, sb.length() - delimLength);
} else {
return sb.toString();
}
}

If you wish to join (concatenate) several strings into one, you should use a StringBuilder. It is far better than using
for(String s : joinMe)
target += s;
There is also a slight performance win over StringBuffer, since StringBuilder does not use synchronization.
For a general purpose utility method like this, it will (eventually) be called many times in many situations, so you should make it efficient and not allocate many transient objects. We've profiled many, many different Java apps and almost always find that string concatenation and string/char[] allocations take up a significant amount of time/memory.
Our reusable collection -> string method first calculates the size of the required result and then creates a StringBuilder with that initial size; this avoids unnecessary doubling/copying of the internal char[] used when appending strings.
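The pre-sizing idea described above could look roughly like this. It is a sketch, not the poster's actual method: PresizedJoin is a made-up name, and the code assumes the collection contains no null elements.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;

final class PresizedJoin {
    static String join(Collection<String> parts, String delimiter) {
        if (parts.isEmpty()) {
            return "";
        }
        // First pass: exact length of the result (elements plus delimiters between them).
        int length = delimiter.length() * (parts.size() - 1);
        for (String part : parts) {
            length += part.length(); // assumes no null elements
        }
        // Second pass: append into a builder that never has to grow.
        StringBuilder sb = new StringBuilder(length);
        Iterator<String> it = parts.iterator();
        sb.append(it.next());
        while (it.hasNext()) {
            sb.append(delimiter).append(it.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b", "c"), ", "));
    }
}
```

The trade-off is a second iteration over the collection in exchange for a single allocation of the backing array.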

I wrote my own:
public static String join(Collection<String> col, String delim) {
StringBuilder sb = new StringBuilder();
Iterator<String> iter = col.iterator();
if (iter.hasNext())
sb.append(iter.next().toString());
while (iter.hasNext()) {
sb.append(delim);
sb.append(iter.next().toString());
}
return sb.toString();
}
but Collection isn't supported by JSP, so for the tag function I wrote:
public static String join(List<?> list, String delim) {
int len = list.size();
if (len == 0)
return "";
StringBuilder sb = new StringBuilder(list.get(0).toString());
for (int i = 1; i < len; i++) {
sb.append(delim);
sb.append(list.get(i).toString());
}
return sb.toString();
}
and put this in the .tld file:
<?xml version="1.0" encoding="UTF-8"?>
<taglib version="2.1" xmlns="http://java.sun.com/xml/ns/javaee">
<function>
<name>join</name>
<function-class>com.core.util.ReportUtil</function-class>
<function-signature>java.lang.String join(java.util.List, java.lang.String)</function-signature>
</function>
</taglib>
and use it in JSP files as:
<%@taglib prefix="funnyFmt" uri="tag:com.core.util,2013:funnyFmt"%>
${funnyFmt:join(books, ", ")}

StringUtils is a pretty useful class in the Apache Commons Lang library.

There is MessageFormat.format() which works like C#'s String.Format().
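A short sketch of how that looks in practice. The class and method names are invented for the example, and string arguments are used so the output does not depend on locale-specific number formatting.

```java
import java.text.MessageFormat;

final class MessageFormatDemo {
    // {0} and {1} are zero-based placeholders and may be repeated, as in C#.
    static String greet(String user, String folder) {
        return MessageFormat.format("User {0} opened {1}; bye, {0}", user, folder);
    }

    // A single quote is an escape character in the pattern, so a literal
    // apostrophe has to be written twice.
    static String quoted(String name) {
        return MessageFormat.format("It''s {0}", name);
    }

    public static void main(String[] args) {
        System.out.println(greet("ada", "inbox"));
        System.out.println(quoted("Ada"));
    }
}
```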

I see a lot of overly complex implementations of String.Join here. If you don't have Java 1.8, and you don't want to import a new library the below implementation should suffice.
public String join(Collection<String> col, String delim) {
StringBuilder sb = new StringBuilder();
for ( String s : col ) {
if ( sb.length() != 0 ) sb.append(delim);
sb.append(s);
}
return sb.toString();
}

ArrayList<Double> j = new ArrayList<>();
j.add(1.0);
j.add(.92);
j.add(3.0);
String ntop = j.toString(); // ntop = "[1.0, 0.92, 3.0]"
So basically, the String ntop stores the value of the entire collection with comma separators and brackets.

I would just use the string concatenation operator "+" to join two strings. s1 += s2;

Related

Proper way in C# to combine an arbitrary number of strings into a single string

I breezed through the documentation for the string class and didn't see any good tools for combining an arbitrary number of strings into a single string. The best procedure I could come up with in my program is
string [] assetUrlPieces = { Server.MapPath("~/assets/"),
"organizationName/",
"categoryName/",
(Guid.NewGuid().ToString() + "/"),
(Path.GetFileNameWithoutExtension(file.FileName) + "/")
};
string assetUrl = combinedString(assetUrlPieces);
private string combinedString ( string [] pieces )
{
string alltogether = "";
foreach (string thispiece in pieces) alltogether += thispiece;
return alltogether;
}
but that seems like too much code and too much inefficiency (from the string addition) and awkwardness.
If you want to insert a separator between values, string.Join is your friend. If you just want to concatenate the strings, then you can use string.Concat:
string assetUrl = string.Concat(assetUrlPieces);
That's marginally simpler (and possibly more efficient, but probably insignificantly) than calling string.Join with an empty separator.
As noted in comments, if you're actually building up the array at the same point in the code that you do the concatenation, and you don't need the array for anything else, just use concatenation directly:
string assetUrl = Server.MapPath("~/assets/") +
"organizationName/" +
"categoryName/" +
Guid.NewGuid() + "/" +
Path.GetFileNameWithoutExtension(file.FileName) + "/";
... or potentially use string.Format instead.
I prefer using string.Join:
var result = string.Join("", pieces);
You can read about string.Join on MSDN
You want a StringBuilder, I think.
var sb = new StringBuilder(pieces.Count());
foreach(var s in pieces) {
sb.Append(s);
}
return sb.ToString();
Update
@FiredFromAmazon.com: I think you'll want to go with the string.Concat solution offered by others for
Its sheer simplicity
Higher performance. Under the hood, it uses FillStringChecked, which does pointer copies, whereas string.Join uses StringBuilder. See http://referencesource.microsoft.com/#mscorlib/system/string.cs,1512. (Thank you to @Bas).
string.Concat is the most appropriate method for what you want.
var result = string.Concat(pieces);
Unless you want to put delimiters between the individual strings. Then you'd use string.Join
var result = string.Join(",", pieces); // comma delimited result.
A simple way to do this with a regular for loop:
(since you can use the indices, plus I like these loops better than foreach loops)
private string combinedString(string[] pieces)
{
string alltogether = "";
for (int index = 0; index <= pieces.Length - 1; index++) {
if (index != pieces.Length - 1) {
alltogether += string.Format("{0}/", pieces[index]);
} else {
// last piece: no trailing separator
alltogether += pieces[index];
}
}
return alltogether;
}

Best way to search for many guids at once in a string?

I have before me the problem of searching for 90,000 GUIDs in a string. I need to get all instances of each GUID. Technically, I need to replace them too, but that's another story.
Currently, I'm searching for each one individually, using a regex. But it occurred to me that I could get better performance by searching for them all together. I've read about tries in the past and have never used one but it occurred to me that I could construct a trie of all 90,000 GUIDs and use that to search.
Alternatively, perhaps there is an existing library in .NET that can do this. It crossed my mind that there is no reason why I shouldn't be able to get good performance with just a giant regex, but this appears not to work.
Beyond that, perhaps there is some clever trick I could use relating to GUID structure to get better results.
This isn't really a crucial problem for me, but I thought I might be able to learn something.
Have a look at the Rabin-Karp string search algorithm. It's well suited for multi-pattern searching in a string:
http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_string_search_algorithm#Rabin.E2.80.93Karp_and_multiple_pattern_search
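Since canonical GUID strings all have the same length (36 characters), the multi-pattern variant reduces to one rolling hash over a fixed-size window plus a hash lookup. The following is a rough sketch of that idea in Java rather than .NET; MultiPatternSearch, findAll and the base 31 are arbitrary choices for illustration, and int overflow serves as the modulus.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MultiPatternSearch {
    private static final int BASE = 31;

    // Polynomial hash of s[from, from + len); int overflow acts as the modulus.
    static int hash(CharSequence s, int from, int len) {
        int h = 0;
        for (int i = 0; i < len; i++) {
            h = h * BASE + s.charAt(from + i);
        }
        return h;
    }

    // Returns the start offset of every occurrence of any pattern.
    // All patterns must have exactly len characters.
    static List<Integer> findAll(String text, Set<String> patterns, int len) {
        List<Integer> hits = new ArrayList<>();
        if (len <= 0 || text.length() < len) {
            return hits;
        }
        Map<Integer, Set<String>> byHash = new HashMap<>();
        for (String p : patterns) {
            if (p.length() != len) {
                throw new IllegalArgumentException("pattern length must be " + len);
            }
            byHash.computeIfAbsent(hash(p, 0, len), k -> new HashSet<>()).add(p);
        }
        int highPower = 1; // BASE^(len - 1), weight of the window's first char
        for (int i = 1; i < len; i++) {
            highPower *= BASE;
        }
        int h = hash(text, 0, len);
        for (int i = 0; ; i++) {
            Set<String> candidates = byHash.get(h);
            // Confirm with a real comparison, since different strings can share a hash.
            if (candidates != null && candidates.contains(text.substring(i, i + len))) {
                hits.add(i);
            }
            if (i + len >= text.length()) {
                break;
            }
            // Slide the window: drop text[i], append text[i + len].
            h = (h - text.charAt(i) * highPower) * BASE + text.charAt(i + len);
        }
        return hits;
    }
}
```

The text is scanned once regardless of how many patterns there are; a substring is only allocated when the window hash matches some pattern's hash.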
You will not get good performance with RegEx because its performance is inherently poor. Additionally, if all the GUIDs share the same format you should only need one RegEx, and regex.Replace(input, replacement); would do it.
If you have the list of GUIDs in memory already, performance would be better by looping over that list and calling String.Replace like so
foreach (string guid in guids)
inputString = inputString.Replace(guid, replacement);
I developed a method for replacing a large number of strings a while back, that may be useful:
A better way to replace many strings - obfuscation in C#
Another option would be to use a regular expression to find all GUIDs in the string, then loop through them and check if each is part of your set of GUIDs.
Basic example, using a Dictionary for fast lookup of the GUIDs:
Dictionary<string, string> guids = new Dictionary<string, string>();
guids.Add("3f74a071-54fc-10de-0476-a6b991f0be76", "(replacement)");
string text = "asdf 3f74a071-54fc-10de-0476-a6b991f0be76 lkaq2hlqwer";
text = Regex.Replace(text, @"[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}", m => {
string replacement;
if (guids.TryGetValue(m.Value, out replacement)) {
return replacement;
} else {
return m.Value;
}
});
Console.WriteLine(text);
Output:
asdf (replacement) lkaq2hlqwer
OK, this looks good. So to be clear here is the original code, which took 65s to run on the example string:
var unusedGuids = new HashSet<Guid>(oldToNewGuid.Keys);
foreach (var guid in oldToNewGuid) {
var regex = guid.Key.ToString();
if (!Regex.IsMatch(xml, regex))
unusedGuids.Add(guid.Key);
else
xml = Regex.Replace(xml, regex, guid.Value.ToString());
}
The new code is as follows and takes 6.7s:
var unusedGuids = new HashSet<Guid>(oldToNewGuid.Keys);
var guidHashes = new MultiValueDictionary<int, Guid>();
foreach (var guid in oldToNewGuid.Keys) {
guidHashes.Add(guid.ToString().GetHashCode(), guid);
}
var indices = new List<Tuple<int, Guid>>();
const int guidLength = 36;
for (int i = 0; i <= xml.Length - guidLength; i++) {
var substring = xml.Substring(i, guidLength);
foreach (var value in guidHashes.GetValues(substring.GetHashCode())) {
if (value.ToString() == substring) {
unusedGuids.Remove(value);
indices.Add(new Tuple<int, Guid>(i, value));
break;
}
}
}
var builder = new StringBuilder();
int start = 0;
for (int i = 0; i < indices.Count; i++) {
var tuple = indices[i];
var substring = xml.Substring(start, tuple.Item1 - start);
builder.Append(substring);
builder.Append(oldToNewGuid[tuple.Item2].ToString());
start = tuple.Item1 + guidLength;
}
builder.Append(xml.Substring(start, xml.Length - start));
xml = builder.ToString();

How to use a comma separator efficiently

I work in C#. I have an array, and I need to separate its items with commas. I did it, but I think it's not efficient. How can I do that without an if condition? Please don't use the Replace method. My code is below.
string container = "";
string[] s = "Hellow world how are you".Split(' ');
foreach (string item in s)
{
if (container == "")
{
container += item;
}
else
{
container += "," + item;
}
}
I need to keep the loop. I just want a solution like the one below.
string container = "";
string[] s = "Hellow world how are you".Split(' ');
foreach (string item in s)
{
container += "," + item;
}
Thanks in advance. If you have any queries, please ask.
Use String.Join to join an array with comma separators.
string[] s = "Hello world how are you".Split(' ');
string container = String.Join(",", s);
Also, if you like getting help on this site, I recommend you start accepting a few answers.
Your problem is not the if statement. Your problem is that it is generally poor form and bad practice to perform string concatenation and other manipulations in a loop. The string class is immutable, so every change creates a new string, allocates new memory, etc. As a result, this practice is slow and inefficient, much more than your if statement will be. The more iterations of your loop, the more you'll notice the inefficiency.
You should familiarize yourself with the StringBuilder class, which allows you to perform efficient manipulations of a string without repeatedly allocating new objects. It is particularly useful in loops like yours above.
An example of using a StringBuilder is like the following
StringBuilder builder = new StringBuilder();
foreach (string item in array)
{
if (builder.Length != 0) builder.Append(",");
builder.Append(item);
}
string finalOutput = builder.ToString();
With that said, string.Join is also a powerful tool for the type of concatenation you are performing.
string.Split starts with a string and ends with an array of strings and is very fast. It uses unsafe code to determine the indexes of the separators. Then it allocates the array of the correct size and then cuts up the original string by allocating a bunch of other strings.
string.join starts with an array of strings and ends with a string and is also very fast and uses unsafe code. It creates a buffer and adds to the buffer each item in the string growing the string as it goes.
But since you want to start with a string and end with a string, your best bet is to use a method that uses unsafe code to replace each ' ' with ','.
string s1 = "Hellow world how are you";
fixed (char* p = s1)
{
for (int i = 0; i < s1.Length; i++)
{
if (p[i] == ' ')
{
p[i] = ',';
}
}
}
This is a really bad idea
It only works because the source and target are the same length
It requires unsafe code
Since I'm mutating the string directly all references to the string get updated
There's probably a bunch of checks that I'm missing
It's only marginally faster than string.Replace
Just use String.Replace if you really need it to be very fast and it's very safe

How to retrieve a StringBuilder Line Count?

I have a StringBuilder instance where I am doing numerous sb.AppendLine("test"); for example.
How do I work out how many lines I have?
I see the class has .Length but that tells me how many characters in all.
Any ideas?
Sorted by efficiency:
Counting your AppendLine() calls
Calling IndexOf() in a loop
Using Regex
Using String.Split()
The last one is extraordinarily expensive and generates lots of garbage, don't use.
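For comparison, here is what the "IndexOf() in a loop" option could look like in Java, where StringBuilder has its own indexOf(String, int) and no intermediate string is needed. This is a Java analogue, not the .NET API the question is about, and LineCount / countOccurrences are hypothetical names.

```java
final class LineCount {
    // Counts non-overlapping occurrences of terminator in the builder.
    static int countOccurrences(StringBuilder sb, String terminator) {
        if (terminator.isEmpty()) {
            throw new IllegalArgumentException("terminator must not be empty");
        }
        int count = 0;
        int from = 0;
        int at;
        while ((at = sb.indexOf(terminator, from)) >= 0) {
            count++;
            from = at + terminator.length(); // continue after this match
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        sb.append("this").append('\n').append("is").append('\n').append("a test").append('\n');
        System.out.println(countOccurrences(sb, "\n"));
    }
}
```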
You could wrap StringBuilder with your own class that would keep a count of lines as they are added, or count the number of '\n' after your builder is full.
Regex.Matches(builder.ToString(), Environment.NewLine).Count
You can create a wrapper class that does the following:
public class Wrapper
{
private StringBuilder strBuild = null;
private int count = 0;
public Wrapper(){
strBuild = new StringBuilder();
}
public void AppendLine(String toAppendParam){
strBuild.AppendLine(toAppendParam);
count++;
}
public StringBuilder getStringBuilder(){
return strBuild;
}
public int getCount(){
return count;
}
}
Try this:
sb.ToString().Split(System.Environment.NewLine.ToCharArray()).Length;
You should be able to search for the number of occurrences of \n in the string.
UPDATE:
One way could be to split on the newline character and count the number of elements in the array as follows:
sb.ToString().Split('\n').Length;
If you're going to use String.Split(), you will need to split the string with some options. Like this:
static void Main(string[] args)
{
var sb = new StringBuilder();
sb.AppendLine("this");
sb.AppendLine("is");
sb.AppendLine("a");
sb.AppendLine("test");
// StringSplitOptions.None counts the last (blank) newline
// which the last AppendLine call creates
// if you don't want this, then replace with
// StringSplitOptions.RemoveEmptyEntries
var lines = sb.ToString().Split(
new string[] {
System.Environment.NewLine },
StringSplitOptions.None).Length;
Console.WriteLine("Number of lines: " + lines);
Console.WriteLine("Press enter to exit.");
Console.ReadLine();
}
This results in:
Number of lines: 5
UPDATE What Gabe said
b.ToString().Count(c => c =='\n') would work here too, and might not
be much less efficient (aside from creating a separate copy of the
string!).
A better way, faster than creating a string from the StringBuilder and splitting it (or creating the string and regexing it), is to look into the StringBuilder and count the number of '\n' characters in it.
The following extension method will enumerate through the characters in the StringBuilder; you can then use LINQ on it to your heart's content.
public static IEnumerable<char> GetEnumerator(this StringBuilder sb)
{
for (int i = 0; i < sb.Length; i++)
yield return sb[i];
}
... used here, count will be 4
StringBuilder b = new StringBuilder();
b.AppendLine("Hello\n");
b.AppendLine("World\n");
int lineCount = b.GetEnumerator().Count(c => c =='\n');
Derive your own line counting StringBuilder where AppendLine ups an internal line count and provides a method to get the value of line count.
Do a regex to count the number of line terminators (ex: \r\n) in the string. Or, load the strings into a text box and do a line count, but that's the hacky way of doing it.
You can split the StringBuilder data into a String[] array and then use String[].Length for the number of lines.
Something like the following:
String[] linestext = sb.ToString().Split(new[] { Environment.NewLine }, StringSplitOptions.None);
Console.WriteLine(linestext.Length);

How to split a string while preserving line endings?

I have a block of text and I want to get its lines without losing the \r and \n at the end. Right now, I have the following (suboptimal code):
string[] lines = tbIn.Text.Split('\n')
.Select(t => t.Replace("\r", "\r\n")).ToArray();
So I'm wondering - is there a better way to do it?
The following seems to do the job:
string[] lines = Regex.Split(tbIn.Text, @"(?<=\r\n)(?!$)");
(?<=\r\n) uses 'positive lookbehind' to match after \r\n without consuming it.
(?!$) uses negative lookahead to prevent matching at the end of the input and so avoids a final line that is just an empty string.
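The same lookbehind trick carries over to Java, shown here only as a cross-language sketch (KeepLineEndings is a made-up name). The (?!$) part is left out on purpose: Java's split with the default limit already discards trailing empty strings.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class KeepLineEndings {
    // Zero-width match at every position that is preceded by CR LF.
    private static final Pattern AFTER_CRLF = Pattern.compile("(?<=\r\n)");

    // Splits after each "\r\n" without consuming it, so every piece
    // keeps its own line ending.
    static String[] splitKeepingCrLf(String text) {
        return AFTER_CRLF.split(text);
    }

    public static void main(String[] args) {
        String[] lines = splitKeepingCrLf("one\r\ntwo\r\nthree");
        System.out.println(Arrays.toString(lines).replace("\r\n", "\\r\\n"));
    }
}
```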
Something along the lines of using this regular expression:
[^\n\r]*\r\n
Then use Regex.Matches().
The problem is you need Group(1) out of each match and create your string list from that. In Python you'd just use the map() function. Not sure the best way to do it in .NET, you take it from there ;-)
Dmitri, your solution is actually pretty compact and straightforward. The only thing more efficient would be to keep the string-splitting characters in the generated array, but the APIs simply don't allow for that. As a result, every solution will require iterating over the array and performing some kind of modification (which in C# means allocating new strings every time). I think the best you can hope for is to not re-create the array:
string[] lines = tbIn.Text.Split('\n');
for (int i = 0; i < lines.Length; ++i)
{
lines[i] = lines[i].Replace("\r", "\r\n");
}
... but as you can see that looks a lot more cumbersome! If performance matters, this may be a bit better. If it really matters, you should consider manually parsing the string by using IndexOf() to find the '\r's one at a time, and then create the array yourself. This is significantly more code, though, and probably not necessary.
One of the side effects of both your solution and this one is that you won't get a terminating "\r\n" on the last line if there wasn't one already there in the TextBox. Is this what you expect? What about blank lines... do you expect them to show up in 'lines'?
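The manual IndexOf() approach mentioned above might be sketched as follows, in Java for illustration. Note one deliberate difference: this version scans for '\n' rather than '\r', so both "\r\n" and bare "\n" endings are preserved as they are. ManualLineSplit is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class ManualLineSplit {
    // Cuts the text after every '\n', keeping the terminator on each line.
    static List<String> splitKeepingNewlines(String text) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int nl;
        while ((nl = text.indexOf('\n', start)) >= 0) {
            lines.add(text.substring(start, nl + 1)); // includes "\r\n" or "\n"
            start = nl + 1;
        }
        if (start < text.length()) {
            lines.add(text.substring(start)); // last line, no terminator present
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingNewlines("one\r\ntwo\r\n\r\nthree").size());
    }
}
```

This also answers the two questions above in one particular way: blank lines show up as entries of their own, and a missing terminator on the last line is not added.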
If you are just going to replace the newline (\n) then do something like this:
string[] lines = tbIn.Text.Split('\n')
.Select(t => t + "\r\n").ToArray();
Edit: Regex.Split allows you to split on a string.
string[] lines = Regex.Split(tbIn.Text, "\r\n")
.Select(t => t + "\r\n").ToArray();
As always, extension method goodies :)
public static class StringExtensions
{
public static IEnumerable<string> SplitAndKeep(this string s, string seperator)
{
string[] obj = s.Split(new string[] { seperator }, StringSplitOptions.None);
for (int i = 0; i < obj.Length; i++)
{
string result = i == obj.Length - 1 ? obj[i] : obj[i] + seperator;
yield return result;
}
}
}
usage:
string text = "One,Two,Three,Four";
foreach (var s in text.SplitAndKeep(","))
{
Console.WriteLine(s);
}
Output:
One,
Two,
Three,
Four
You can achieve this with a regular expression. Here's an extension method with it:
public static string[] SplitAndKeepDelimiter(this string input, string delimiter)
{
MatchCollection matches = Regex.Matches(input, @"[^" + delimiter + "]+(" + delimiter + "|$)", RegexOptions.Multiline);
string[] result = new string[matches.Count];
for (int i = 0; i < matches.Count ; i++)
{
result[i] = matches[i].Value;
}
return result;
}
I'm not sure if this is a better solution. Yours is very compact and simple.
