So I have this C# code:
static void Main(string[] args)
{
string @string = "- hello dude! - oh hell yeah hey what's up guy";
Console.WriteLine(String.Join(".", @string.GetSubstringsIndexes("he")));
Console.Read();
}
And a partial class that adds the "GetSubstringsIndexes" extension method:
static partial class StringExtension // extension methods must be declared in a static class
{
public static int[] GetSubstringsIndexes(this string @string, string substring)
{
List<int> indexes = new List<int>(@string.Length / substring.Length);
int result = @string.IndexOf(substring, 0);
while (result >= 0)
{
indexes.Add(result);
result = @string.IndexOf(substring, result + substring.Length);
}
return indexes.ToArray();
}
}
What I would like instead is a lambda expression inside the parentheses of the String.Join call, rather than calling a function I wrote.
I mean, I would rather not write this function and THEN call it, but write a lambda expression that is used only once!
Here is an example of how I would like it to look:
static void Main(string[] args)
{
string @string = "- hello dude! - oh hell yeah hey what's up guy";
Console.WriteLine(String.Join(".", () => {List<int> ind = new List<int>()..... AND SO ON...} ));
Console.Read();
}
Well, actually, I've just realized (while writing this question) that for this kind of situation it is unnecessary, because my GetSubstringsIndexes method is too big. But imagine if it were a short one.
Just tell me whether or not it is possible to do something like that, and if it is possible, please, tell me how!
Edit:
I've done it, and this is what it looks like:
Console.WriteLine(String.Join(".", ((Func<int[]>)
( () =>
{
List<int> indx = new List<int>();
int res = @string.IndexOf("he", 0);
while (res >= 0)
{
indx.Add(res);
res = @string.IndexOf("he", res + "he".Length);
}
return indx.ToArray();
}
))()));
Your "improvement" in the question works. Here is a more concise way of doing that with a helper function that you need to define only once:
static TReturn Execute<TReturn>(Func<TReturn> func) => func();
Then:
Console.WriteLine(Execute(() => { /* any code here */ }));
This infers the delegate type automatically and calls the delegate. This removes a lot of clutter.
In general I'd advise against this style. Use multiple lines of code instead.
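A self-contained sketch of that helper follows. It assumes `Execute` returns the delegate's result (so the value can be handed straight to String.Join); the class name `InlineHelpers` and the wrapper `FindIndexes` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

static class InlineHelpers
{
    // Invokes the delegate immediately and returns its result. The compiler
    // infers TReturn from the lambda, so no explicit (Func<int[]>) cast is needed.
    public static TReturn Execute<TReturn>(Func<TReturn> func) => func();

    // Hypothetical usage: the index search from the question, written inline.
    public static string FindIndexes(string text, string substring)
    {
        return string.Join(".", Execute(() =>
        {
            var indexes = new List<int>();
            // Ordinal comparison avoids culture-sensitive matching.
            int result = text.IndexOf(substring, 0, StringComparison.Ordinal);
            while (result >= 0)
            {
                indexes.Add(result);
                result = text.IndexOf(substring, result + substring.Length, StringComparison.Ordinal);
            }
            return indexes.ToArray();
        }));
    }
}
```

Because `TReturn` is inferred from the lambda's `return` statement, the `(Func<int[]>)` cast and the trailing `()` from the question's edit are no longer needed.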
What you want isn't possible, as String.Join() doesn't accept a Func<int[]> or an Expression<Func<int[]>>.
You could use a local function and call that, if you don't want to write an extension method.
static void Main(string[] args)
{
string @string = "- hello dude! - oh hell yeah hey what's up guy";
Console.WriteLine(String.Join(".", GetIndexes("he")));
Console.Read();
int[] GetIndexes(string substring) {
var indexes = new List<int>();
// compute indexes as desired. @string is available here.
return indexes.ToArray();
}
}
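Filling in the placeholder with the question's IndexOf loop gives something like the sketch below. The wrapper class `LocalFunctionDemo` and its `Run` method are illustrative assumptions; the point is that the local function can be declared after the `return` and still capture `@string`.

```csharp
using System;
using System.Collections.Generic;

static class LocalFunctionDemo
{
    public static string Run()
    {
        string @string = "- hello dude! - oh hell yeah hey what's up guy";
        return String.Join(".", GetIndexes("he"));

        // Local function: visible only inside Run(), and it captures @string.
        int[] GetIndexes(string substring)
        {
            var indexes = new List<int>();
            int result = @string.IndexOf(substring, StringComparison.Ordinal);
            while (result >= 0)
            {
                indexes.Add(result);
                // Resume after the match, so matches never overlap.
                result = @string.IndexOf(substring, result + substring.Length, StringComparison.Ordinal);
            }
            return indexes.ToArray();
        }
    }
}
```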
One way to do it would be to use a Select where you capture both the character being examined and its index in the string (using the syntax Select((item, index) => ...)). If the substring starts at that index, return the index; otherwise return -1. Then follow that up with a Where clause to remove the -1 results.
The code is a little long because we also have to be sure that we aren't too close to the end of the string before we check the substring (which results in another ternary condition that returns -1).
I'm sure this can be improved, but it's a start:
string @string = "- hello dude! - oh hell yeah hey what's up guy";
var subString = "he";
Console.WriteLine(string.Join(".", @string.Select((chr, index) =>
index + subString.Length <= @string.Length
? @string.Substring(index, subString.Length) == subString
? index
: -1
: -1)
.Where(result => result > -1)));
In case that's too hard to read (due to the multiple ?: ternary expressions), here it is with comments before each line:
// For each character in the string, grab the character and its index
Console.WriteLine(string.Join(".", @string.Select((chr, index) =>
// If we're not too close to the end of the string
index + subString.Length <= @string.Length
// And the substring exists at this index
? @string.Substring(index, subString.Length) == subString
// Return this index
? index
// Substring not found here; return -1
: -1
// We're too close to end; return -1
: -1)
// Return only the indexes where the substring was found
.Where(result => result > -1)));
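If you do want a one-expression LINQ version, a somewhat shorter sketch (not from the answer above) enumerates only the candidate start positions with Enumerable.Range and compares in place with string.CompareOrdinal, which avoids allocating a Substring at every position. The class and method names are illustrative.

```csharp
using System;
using System.Linq;

static class SubstringSearch
{
    // Every start position where `substring` occurs in `text`.
    // Range stops at the last position where a full match still fits,
    // so no separate end-of-string check is needed.
    // Like the Select/Where version, this reports overlapping matches
    // ("aa" in "aaa" gives 0 and 1), unlike the question's IndexOf loop,
    // which resumes searching after each match.
    public static int[] IndexesOf(string text, string substring) =>
        Enumerable.Range(0, Math.Max(0, text.Length - substring.Length + 1))
                  .Where(i => string.CompareOrdinal(text, i, substring, 0, substring.Length) == 0)
                  .ToArray();
}
```

With the question's string, `string.Join(".", SubstringSearch.IndexesOf(@string, "he"))` yields the same "2.19.29" as the original extension method, since "he" cannot overlap itself.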
I want an efficient way of grouping strings whilst keeping duplicates and order.
Something like this
1100110002200 -> 101020
I tried this previously
_case.GroupBy(c => c).Select(g => g.Key)
but I got 102
But this gives me what I want; I just want to optimize it so I wouldn't have to scour the entire list each time:
static List<char> group(string _case)
{
var groups = new List<char>();
for (int i = 0; i < _case.Length; i++)
{
if (groups.LastOrDefault() != _case[i])
groups.Add(_case[i]);
}
return groups;
}
While I like rshepp's elegant solution, it turns out that very basic code can run as much as 5 times faster than that.
public static string Simplify2(string str)
{
if (string.IsNullOrEmpty(str)) { return str; }
StringBuilder sb = new StringBuilder();
char last = str[0];
sb.Append(last);
foreach (char c in str)
{
if (last != c)
{
sb.Append(c);
last = c;
}
}
return sb.ToString();
}
You could create a method that loops over each character and compares it with the previous character. If they aren't the same, append (or yield return) the character. This is pretty easy to do with LINQ:
public static string Simplify(string str)
{
return string.Concat(str.Where((c, i) => i == 0 || c != str[i - 1]));
}
Usage:
string simplified = Simplify("1100110002200");
// 101020
In my testing, my method and yours are roughly equal in speed, mine being insignificantly slower after 10 million executions (4260ms vs 4241ms).
However, my method returns the result as a string whereas yours doesn't. If you need to convert your result back to a string (which is likely) then my method is indeed much faster/more efficient (4260ms vs 6569ms).
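Since the yield return option was mentioned above, here is a minimal iterator sketch (the names `ConsecutiveDuplicates` and `Collapse` are hypothetical). It keeps the previous character in a local variable instead of looking it up in the output list, and it works on any IEnumerable<char>, not only strings.

```csharp
using System.Collections.Generic;

static class ConsecutiveDuplicates
{
    // Streams the input and yields a character only when it differs
    // from the one immediately before it.
    public static IEnumerable<char> Collapse(IEnumerable<char> source)
    {
        bool first = true;
        char previous = default;
        foreach (char c in source)
        {
            if (first || c != previous)
            {
                yield return c;
            }
            previous = c;
            first = false;
        }
    }
}
```

Usage: `string.Concat(ConsecutiveDuplicates.Collapse("1100110002200"))` returns "101020".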
A quick brain teaser: given a string
This  is   a string    with  repeating   spaces
What would be the LINQ expression to end up with
This is a string with repeating spaces
Thanks!
For reference, here's one non-LINQ way:
private static IEnumerable<char> RemoveRepeatingSpaces(IEnumerable<char> text)
{
bool isSpace = false;
foreach (var c in text)
{
if (isSpace && char.IsWhiteSpace(c)) continue;
isSpace = char.IsWhiteSpace(c);
yield return c;
}
}
This is not a LINQ-type task; use a regex:
string output = Regex.Replace(input," +"," ");
Of course, you could use LINQ to apply this to a collection of strings.
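For example, a sketch that applies the same regex to every string in a sequence (the class and method names are illustrative):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class SpaceCollapser
{
    // Replaces each run of spaces with a single space in every string;
    // Select preserves the original order of the sequence.
    public static List<string> CollapseAll(IEnumerable<string> lines) =>
        lines.Select(line => Regex.Replace(line, " +", " ")).ToList();
}
```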
public static string TrimInternal(this string text)
{
var trimmed = text.Where((c, index) => !char.IsWhiteSpace(c) || (index != 0 && !char.IsWhiteSpace(text[index - 1])));
return new string(trimmed.ToArray());
}
Since nobody seems to have given a satisfactory answer, I came up with one. Here's a string-based solution (.NET 4):
public static string RemoveRepeatedSpaces(this string s)
{
return s[0] + string.Join("",
s.Zip(
s.Skip(1),
(x, y) => x == y && y == ' ' ? (char?)null : y));
}
However, this is just a general case of removing repeated elements from a sequence, so here's the generalized version:
public static IEnumerable<T> RemoveRepeatedElements<T>(
this IEnumerable<T> s, T dup)
{
return s.Take(1).Concat(
s.Zip(
s.Skip(1),
(x, y) => x.Equals(y) && y.Equals(dup) ? (object)null : y)
.OfType<T>());
}
Of course, that's really just a more specific version of a function that removes all consecutive duplicates from its input stream:
public static IEnumerable<T> RemoveRepeatedElements<T>(this IEnumerable<T> s)
{
return s.Take(1).Concat(
s.Zip(
s.Skip(1),
(x, y) => x.Equals(y) ? (object)null : y)
.OfType<T>());
}
And obviously you can implement the first function in terms of the second:
public static string RemoveRepeatedSpaces(this string s)
{
return string.Join("", s.RemoveRepeatedElements(' '));
}
BTW, I benchmarked my last function against the regex version (Regex.Replace(s, " +", " ")) and they were within nanoseconds of each other, so the extra LINQ overhead is negligible compared to the extra regex overhead. When I generalized it to remove all consecutive duplicate characters, the equivalent regex (Regex.Replace(s, "(.)\\1+", "$1")) was 3.5 times slower than my LINQ version (string.Join("", s.RemoveRepeatedElements())).
I also tried the "ideal" procedural solution:
public static string RemoveRepeatedSpaces(string s)
{
StringBuilder sb = new StringBuilder(s.Length);
char lastChar = '\0';
foreach (char c in s)
if (c != ' ' || lastChar != ' ')
sb.Append(lastChar = c);
return sb.ToString();
}
This is more than 5 times faster than a regex!
In practice, I would probably just use your original solution or regular expressions (if you want a quick & simple solution). A geeky approach that uses lambda functions would be to define a fixed point operator:
T FixPoint<T>(T initial, Func<T, T> f) {
T current = initial;
do {
initial = current;
current = f(initial);
} while (!EqualityComparer<T>.Default.Equals(initial, current)); // != is not defined for an unconstrained T
return current;
}
This keeps calling the operation f repeatedly until the operation returns the same value that it got as an argument. You can think of the operation as a generalized loop - it is quite useful, though I guess it is too geeky to be included in .NET BCL. Then you can write:
string res = FixPoint(original, s => s.Replace("  ", " "));
It is not as efficient as your original version, but unless there are too many spaces it should work fine.
LINQ is by definition related to enumerables (i.e. collections, lists, arrays). You could transform your string into a collection of chars and select the non-space ones, but this is definitely not a job for LINQ.
Paul Creasey's answer is the way to go.
If you want to treat tabs as whitespace as well, go with:
text = Regex.Replace(text, "[ \t]+", " ");
UPDATE:
The most logical way to solve this problem while satisfying the "using LINQ" requirement has been suggested by both Hasan and Ani. However, notice that these solutions involve accessing a character in a string by index.
The spirit of the LINQ approach is that it can be applied to any enumerable sequence. Because any reasonably efficient solution to this problem requires maintaining some kind of state (with Ani's and Hasan's solutions it's easy to miss this fact as the state is already maintained within the string itself), a generic approach that accepts any sequence of items is likely going to be much more straightforward to implement using procedural code.
This procedural code may then be abstracted into a method that looks like a LINQ-style method, of course. But I would not recommend tackling a problem like this with the attitude of "I want to use LINQ in this solution" from the get-go, because it will impose very awkward restrictions on your code.
For what it's worth, here's how I'd implement the general idea.
public static IEnumerable<T> StripConsecutives<T>(this IEnumerable<T> source, T value, IEqualityComparer<T> comparer)
{
// null-checking omitted for brevity
using (var enumerator = source.GetEnumerator())
{
if (enumerator.MoveNext())
{
yield return enumerator.Current;
}
else
{
yield break;
}
T prev = enumerator.Current;
while (enumerator.MoveNext())
{
T current = enumerator.Current;
if (comparer.Equals(prev, value) && comparer.Equals(current, value))
{
// This is a consecutive occurrence of value --
// moving on...
}
else
{
yield return current;
}
prev = current;
}
}
}
Split to list, filter, then rejoin, 2 lines of code...
var test = " Alpha Beta Tango ";
var l = test.Split(' ').Where(s => !string.IsNullOrEmpty(s));
var result = string.Join(" ", l);
// result = "Alpha Beta Tango"
Refactoring as an extension method:
void Main()
{
var test = " Alpha Beta Tango ";
var result = test.RemoveRepeatedSpaces();
// result = "Alpha Beta Tango";
}
static class Extensions
{
public static string RemoveRepeatedSpaces(this string s)
{
if (s == null)
return string.Empty;
var l = s.Split(' ').Where(a => !string.IsNullOrEmpty(a));
return string.Join(" ", l);
}
}
Not that I would want to use this practically (for many reasons), but out of strict curiosity I would like to know if there is a way to reverse a string using LINQ and/or lambda expressions in one line of code, without using any framework "Reverse" methods.
e.g.
string value = "reverse me";
string reversedValue = (....);
and reversedValue will result in "em esrever"
EDIT
Clearly an impractical problem/solution, I know, so don't worry: it's strictly a curiosity question about the LINQ/lambda constructs.
Well, I can do it in one very long line, even without using LINQ or a lambda:
string original = "reverse me"; char[] chars = original.ToCharArray(); char[] reversed = new char[chars.Length]; for (int i=0; i < chars.Length; i++) reversed[chars.Length-i-1] = chars[i]; string reversedValue = new string(reversed);
(Dear potential editors: do not unwrap this onto multiple lines. The whole point is that it's a single line, as per the sentence above it and the question.)
However, if I saw anyone avoiding using framework methods for the sake of it, I'd question their sanity.
Note that this doesn't use LINQ at all. A LINQ answer would be:
string reverseValue = new string(original.Reverse().ToArray());
Avoiding using Reverse, but using OrderByDescending instead:
string reverseValue = new string(original.Select((c, index) => new { c, index })
.OrderByDescending(x => x.index)
.Select(x => x.c)
.ToArray());
Blech. I like Mehrdad's answer though. Of course, all of these are far less efficient than the straightforward approach.
Oh, and they're all wrong, too. Reversing a string is more complex than reversing the order of the code points. Consider combining characters, surrogate pairs etc...
I don't see a practical use for this but just for the sake of fun:
new string(Enumerable.Range(1, input.Length).Select(i => input[input.Length - i]).ToArray())
new string(value.Reverse().ToArray())
var reversedValue = value.ToCharArray()
.Select(ch => ch.ToString())
.Aggregate<string>((xs, x) => x + xs);
Variant with recursive lambda:
var value = "reverse me";
Func<String, String> f = null; f = s => s.Length <= 1 ? s : f(s.Substring(1)) + s[0];
var reverseValue = f(value);
You can use Aggregate to prepend each Char to the reversed string:
"reverse me".Aggregate("", (acc, c) => c + acc);
var reversedValue = new string("reverse me".Reverse().ToArray());
Building on a previous answer, here is a more performant variant that builds the result in a StringBuilder instead of allocating a new string for every character:
var actual0 = "reverse me".Aggregate(new StringBuilder(), (x, y) => x.Insert(0, y)).ToString();
public static string Reverse(string word)
{
int index = word.Length - 1;
string reversal = "";
//for each char in word
for (int i = index; i >= 0; i--)
{
reversal = reversal + word.Substring(i, 1);
}
return reversal;
}
Quite simple. So, from this point on, I have a single method that reverses a string, that doesn't use any built-in Reverse functions.
So in your main method, just go,
Console.WriteLine(Reverse("Some word"));
Technically that's your one liner :P
If we need to support combining characters and surrogate pairs:
using System.Collections.Generic;
using System.Globalization;
// This method tries to handle:
// (1) Combining characters
// These are two or more Unicode characters that are combined into one glyph.
// For example, try reversing "Not nai\u0308ve.". The diaeresis (¨) should stay over the i, not move to the v.
// (2) Surrogate pairs
// These are Unicode characters whose code points exceed U+FFFF (so are not in "plane 0").
// To be represented with 16-bit 'char' values (which are really UTF-16 code units), one character needs *two* char values, a so-called surrogate pair.
// For example, try "The sphere \U0001D54A and the torus \U0001D54B.". The 𝕊 and the 𝕋 should be preserved, not corrupted.
var value = "reverse me"; // or "Not nai\u0308ve.", or "The sphere \U0001D54A and the torus \U0001D54B.".
var list = new List<string>(value.Length);
var enumerator = StringInfo.GetTextElementEnumerator(value);
while (enumerator.MoveNext())
{
list.Add(enumerator.GetTextElement());
}
list.Reverse();
var result = string.Concat(list);
Documentation: MSDN: System.Globalization.StringInfo Class
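To keep the question's constraint (LINQ, no framework Reverse method) while still respecting text elements, the same idea can be written as a query over StringInfo.ParseCombiningCharacters, which returns the start index of each text element. This is a sketch, not taken from the answer above; the class and method names are hypothetical, and how complex sequences such as emoji are grouped may vary between .NET versions.

```csharp
using System.Globalization;
using System.Linq;

static class TextElementReverser
{
    // Reverses by text element (user-perceived character) rather than by
    // UTF-16 code unit, walking the element indexes from last to first.
    public static string Reverse(string value)
    {
        // Start index of each text element, e.g. "ai\u0308" -> { 0, 1 }.
        int[] starts = StringInfo.ParseCombiningCharacters(value);
        return string.Concat(
            Enumerable.Range(1, starts.Length)
                .Select(k => starts.Length - k)   // element indexes, last to first
                .Select(i => value.Substring(
                    starts[i],
                    // Each element runs up to the next start (or the end of the string).
                    (i + 1 < starts.Length ? starts[i + 1] : value.Length) - starts[i])));
    }
}
```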
string str="a and b";
string t="";
char[] schar = str.Reverse().ToArray();
foreach (char c in schar )
{
t += c.ToString();
}
What is the most efficient way to write the old-school:
StringBuilder sb = new StringBuilder();
if (strings.Count > 0)
{
foreach (string s in strings)
{
sb.Append(s + ", ");
}
sb.Remove(sb.Length - 2, 2);
}
return sb.ToString();
...in LINQ?
This answer shows usage of LINQ (Aggregate) as requested in the question and is not intended for everyday use. Because this does not use a StringBuilder, it will have horrible performance for very long sequences. For regular code, use String.Join as shown in the other answer.
Use aggregate queries like this:
string[] words = { "one", "two", "three" };
var res = words.Aggregate(
"", // start with empty string to handle empty list case.
(current, next) => current + ", " + next);
Console.WriteLine(res);
This outputs:
, one, two, three
An aggregate is a function that takes a collection of values and returns a scalar value. Examples from T-SQL include min, max, and sum. Both VB and C# support aggregates as extension methods: using dot notation, you simply call the method on an IEnumerable object.
Remember that aggregate queries are executed immediately.
More information - MSDN: Aggregate Queries
If you really want to use Aggregate use variant using StringBuilder proposed in comment by CodeMonkeyKing which would be about the same code as regular String.Join including good performance for large number of objects:
var res = words.Aggregate(
new StringBuilder(),
(current, next) => current.Append(current.Length == 0? "" : ", ").Append(next))
.ToString();
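The same StringBuilder approach can also use the Aggregate overload that takes a result selector, so the final ToString() moves into the Aggregate call itself. A sketch with hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class AggregateJoin
{
    // Aggregate(seed, accumulator, resultSelector): the seed is an empty
    // StringBuilder, the accumulator appends a separator before every item
    // except the first, and the selector converts the builder to a string.
    // Caveat (same as the variant above): "first item" is detected via
    // sb.Length == 0, so a leading empty string swallows one separator.
    public static string JoinWith(IEnumerable<string> items, string separator) =>
        items.Aggregate(
            new StringBuilder(),
            (sb, next) => (sb.Length == 0 ? sb : sb.Append(separator)).Append(next),
            sb => sb.ToString());
}
```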
return string.Join(", ", strings.ToArray());
In .Net 4, there's a new overload for string.Join that accepts IEnumerable<string>. The code would then look like:
return string.Join(", ", strings);
Why use Linq?
string[] s = {"foo", "bar", "baz"};
Console.WriteLine(String.Join(", ", s));
That works perfectly and accepts any IEnumerable<string> as far as I remember. No need to Aggregate anything here, which is a lot slower.
Have you looked at the Aggregate extension method?
var sa = (new[] { "yabba", "dabba", "doo" }).Aggregate((a,b) => a + "," + b);
Real example from my code:
return selected.Select(query => query.Name).Aggregate((a, b) => a + ", " + b);
A query is an object that has a Name property which is a string, and I want the names of all the queries on the selected list, separated by commas.
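One caveat: Aggregate without a seed throws InvalidOperationException when the sequence is empty (for example, when nothing is selected). A minimal sketch of one workaround uses DefaultIfEmpty; the class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

static class SeedlessAggregate
{
    // DefaultIfEmpty("") substitutes a single empty string when the input
    // is empty, so Aggregate returns "" instead of throwing.
    public static string JoinNames(IEnumerable<string> names) =>
        names.DefaultIfEmpty("").Aggregate((a, b) => a + ", " + b);
}
```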
Here is the combined Join/Linq approach I settled on after looking at the other answers and the issues addressed in a similar question (namely that Aggregate and Concatenate fail with 0 elements).
string Result = String.Join(",", split.Select(s => s.Name));
or (if s is not a string)
string Result = String.Join(",", split.Select(s => s.ToString()));
Simple
easy to read and understand
works for generic elements
allows using objects or object properties
handles the case of 0 elements (an empty sequence)
could be used with additional Linq filtering
performs well (at least in my experience)
doesn't require (manual) creation of an additional object (e.g. StringBuilder) to implement
And of course Join takes care of the pesky final comma that sometimes sneaks into other approaches (for, foreach), which is why I was looking for a Linq solution in the first place.
You can use StringBuilder in Aggregate:
List<string> strings = new List<string>() { "one", "two", "three" };
StringBuilder sb = strings
.Select(s => s)
.Aggregate(new StringBuilder(), (ag, n) => ag.Append(n).Append(", "));
if (sb.Length > 0) { sb.Remove(sb.Length - 2, 2); }
Console.WriteLine(sb.ToString());
(The Select is in there just to show you can do more LINQ stuff.)
quick performance data for the StringBuilder vs Select & Aggregate case over 3000 elements:
Unit test - Duration (seconds)
LINQ_StringBuilder - 0.0036644
LINQ_Select.Aggregate - 1.8012535
[TestMethod()]
public void LINQ_StringBuilder()
{
IList<int> ints = new List<int>();
for (int i = 0; i < 3000;i++ )
{
ints.Add(i);
}
StringBuilder idString = new StringBuilder();
foreach (int id in ints)
{
idString.Append(id + ", ");
}
}
[TestMethod()]
public void LINQ_SELECT()
{
IList<int> ints = new List<int>();
for (int i = 0; i < 3000; i++)
{
ints.Add(i);
}
string ids = ints.Select(query => query.ToString())
.Aggregate((a, b) => a + ", " + b);
}
I always use the extension method:
public static string JoinAsString<T>(this IEnumerable<T> input, string separator)
{
var ar = input.Select(i => i.ToString());
return string.Join(separator, ar);
}
By 'super-cool LINQ way' you might be talking about the way that LINQ makes functional programming a lot more palatable with the use of extension methods. I mean, the syntactic sugar that allows functions to be chained in a visually linear way (one after the other) instead of nesting (one inside the other). For example:
int totalEven = Enumerable.Sum(Enumerable.Where(myInts, i => i % 2 == 0));
can be written like this:
int totalEven = myInts.Where(i => i % 2 == 0).Sum();
You can see how the second example is easier to read. You can also see how more functions can be added with less of the indentation problems or the Lispy closing parens appearing at the end of the expression.
A lot of the other answers state that the String.Join is the way to go because it is the fastest or simplest to read. But if you take my interpretation of 'super-cool LINQ way' then the answer is to use String.Join but have it wrapped in a LINQ style extension method that will allow you to chain your functions in a visually pleasing way. So if you want to write sa.Concatenate(", ") you just need to create something like this:
public static class EnumerableStringExtensions
{
public static string Concatenate(this IEnumerable<string> strings, string separator)
{
return String.Join(separator, strings);
}
}
This will provide code that is as performant as the direct call (at least in terms of algorithm complexity) and in some cases may make the code more readable (depending on the context) especially if other code in the block is using the chained function style.
Here it is using pure LINQ as a single expression:
static string StringJoin(string sep, IEnumerable<string> strings) {
return strings
.Skip(1)
.Aggregate(
new StringBuilder().Append(strings.FirstOrDefault() ?? ""),
(sb, x) => sb.Append(sep).Append(x))
.ToString();
}
And it's pretty damn fast!
I'm going to cheat a little and throw out a new answer to this that seems to sum up the best of everything on here instead of sticking it inside of a comment.
So you can one line this:
List<string> strings = new List<string>() { "one", "two", "three" };
string concat = strings
.Aggregate(new StringBuilder("\a"),
(current, next) => current.Append(", ").Append(next))
.ToString()
.Replace("\a, ",string.Empty);
Edit: You'll either want to check for an empty enumerable first or add an .Replace("\a",string.Empty); to the end of the expression. Guess I might have been trying to get a little too smart.
The answer from @a.friend might be slightly more performant; I'm not sure what Replace does under the hood compared to Remove. The only other caveat: if for some reason you wanted to concat strings that ended in \a's, you would lose your separators... I find that unlikely. If that is the case, you do have other fancy characters to choose from.
Lots of choices here. You can use LINQ and a StringBuilder so you get the performance too like so:
StringBuilder builder = new StringBuilder();
List<string> MyList = new List<string>() {"one","two","three"};
MyList.ForEach(w => builder.Append(builder.Length > 0 ? ", " + w : w));
return builder.ToString();
You can combine LINQ and string.Join() quite effectively. Here I am removing an item from a string. There are better ways of doing this too, but here it is:
filterset = String.Join(",",
filterset.Split(',')
.Where(f => mycomplicatedMatch(f,paramToMatch))
);
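Since mycomplicatedMatch isn't shown, here is a self-contained sketch in which a plain inequality check stands in for it (an assumption), keeping every item except the one to remove. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

static class CsvFilter
{
    // Splits on commas, keeps the items that pass the predicate, and joins
    // them back. The predicate here is a stand-in for mycomplicatedMatch.
    public static string RemoveItem(string filterset, string itemToRemove) =>
        String.Join(",", filterset.Split(',').Where(f => f != itemToRemove));
}
```

For example, `CsvFilter.RemoveItem("a,b,c", "b")` returns "a,c".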
I did the following quick and dirty when parsing an IIS log file using LINQ. It worked at 1 million lines pretty well (15 seconds), although I got an out-of-memory error when trying 2 million lines.
static void Main(string[] args)
{
Debug.WriteLine(DateTime.Now.ToString() + " entering main");
// USED THIS DOS COMMAND TO GET ALL THE DAILY FILES INTO A SINGLE FILE: copy *.log target.log
string[] lines = File.ReadAllLines(@"C:\Log File Analysis\12-8 E5.log");
Debug.WriteLine(lines.Count().ToString());
string[] a = lines.Where(x => !x.StartsWith("#Software:") &&
!x.StartsWith("#Version:") &&
!x.StartsWith("#Date:") &&
!x.StartsWith("#Fields:") &&
!x.Contains("_vti_") &&
!x.Contains("/c$") &&
!x.Contains("/favicon.ico") &&
!x.Contains("/ - 80")
).ToArray();
Debug.WriteLine(a.Count().ToString());
string[] b = a
.Select(l => l.Split(' '))
.Select(words => string.Join(",", words))
.ToArray()
;
System.IO.File.WriteAllLines(@"C:\Log File Analysis\12-8 E5.csv", b);
Debug.WriteLine(DateTime.Now.ToString() + " leaving main");
}
The real reason I used LINQ was for a Distinct() I needed previously:
string[] b = a
.Select(l => l.Split(' '))
.Where(l => l.Length > 11)
.Select(words => string.Format("{0},{1}",
words[6].ToUpper(), // virtual dir / service
words[10]) // client ip
).Distinct().ToArray()
;
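The out-of-memory error at 2 million lines is consistent with File.ReadAllLines loading the whole file into a string[] before any filtering happens. A streaming sketch (class and method names are hypothetical; paths are passed in rather than hard-coded) uses File.ReadLines, which reads lazily, and the File.WriteAllLines overload that accepts an IEnumerable<string>. It keeps the original filter conditions but drops the Debug.WriteLine counts, which would force extra passes over the data.

```csharp
using System.IO;
using System.Linq;

static class LogToCsv
{
    // Converts a space-separated IIS log to comma-separated lines, one line
    // at a time: nothing is materialized as a full string[] in memory.
    public static void Convert(string inputPath, string outputPath)
    {
        var csvLines = File.ReadLines(inputPath)
            .Where(x => !x.StartsWith("#Software:") &&
                        !x.StartsWith("#Version:") &&
                        !x.StartsWith("#Date:") &&
                        !x.StartsWith("#Fields:") &&
                        !x.Contains("_vti_") &&
                        !x.Contains("/c$") &&
                        !x.Contains("/favicon.ico") &&
                        !x.Contains("/ - 80"))
            .Select(line => string.Join(",", line.Split(' ')));

        // WriteAllLines enumerates csvLines as it writes.
        File.WriteAllLines(outputPath, csvLines);
    }
}
```

The Distinct() variant still has to hold the set of distinct rows in memory, but no longer the raw log.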
I blogged about this a while ago; what I did seems to be exactly what you're looking for:
http://ondevelopment.blogspot.com/2009/02/string-concatenation-made-easy.html
In the blog post I describe how to implement extension methods that work on IEnumerable and are named Concatenate; this lets you write things like:
var sequence = new string[] { "foo", "bar" };
string result = sequence.Concatenate();
Or more elaborate things like:
var methodNames = typeof(IFoo).GetMethods().Select(x => x.Name);
string result = methodNames.Concatenate(", ");
FWIW, I benchmarked string.Join vs. .Aggregate on an array of 15 strings using BenchmarkDotNet (BDN):
| Method         | Mean      | Error     | StdDev   | Gen0   | Allocated |
|--------------- |----------:|----------:|---------:|-------:|----------:|
| String_Join    |  92.99 ns |  9.905 ns | 0.543 ns | 0.0560 |     352 B |
| LING_Aggregate | 406.00 ns | 74.662 ns | 4.092 ns | 0.4640 |    2912 B |
The gap increases with bigger arrays.