Can I use LINQ to strip repeating spaces from a string?

Can I use LINQ to strip repeating spaces from a string? - c#

A quick brain teaser: given a string
This is a string with repeating spaces
What would be the LINQ expressing to end up with
This is a string with repeating spaces
Thanks!
For reference, here's one non-LINQ way:
private static IEnumerable<char> RemoveRepeatingSpaces(IEnumerable<char> text)
{
bool isSpace = false;
foreach (var c in text)
{
if (isSpace && char.IsWhiteSpace(c)) continue;
isSpace = char.IsWhiteSpace(c);
yield return c;
}
}

This is not a linq type task, use regex
string output = Regex.Replace(input," +"," ");
Of course you could use linq to apply this to a collection of strings.

public static string TrimInternal(this string text)
{
var trimmed = text.Where((c, index) => !char.IsWhiteSpace(c) || (index != 0 && !char.IsWhiteSpace(text[index - 1])));
return new string(trimmed.ToArray());
}

Since nobody seems to have given a satisfactory answer, I came up with one. Here's a string-based solution (.Net 4):
public static string RemoveRepeatedSpaces(this string s)
{
return s[0] + string.Join("",
s.Zip(
s.Skip(1),
(x, y) => x == y && y == ' ' ? (char?)null : y));
}
However, this is just a general case of removing repeated elements from a sequence, so here's the generalized version:
public static IEnumerable<T> RemoveRepeatedElements<T>(
this IEnumerable<T> s, T dup)
{
return s.Take(1).Concat(
s.Zip(
s.Skip(1),
(x, y) => x.Equals(y) && y.Equals(dup) ? (object)null : y)
.OfType<T>());
}
Of course, that's really just a more specific version of a function that removes all consecutive duplicates from its input stream:
public static IEnumerable<T> RemoveRepeatedElements<T>(this IEnumerable<T> s)
{
return s.Take(1).Concat(
s.Zip(
s.Skip(1),
(x, y) => x.Equals(y) ? (object)null : y)
.OfType<T>());
}
And obviously you can implement the first function in terms of the second:
public static string RemoveRepeatedSpaces(this string s)
{
return string.Join("", s.RemoveRepeatedElements(' '));
}
BTW, I benchmarked my last function against the regex version (Regex.Replace(s, " +", " ")) and they were were within nanoseconds of each other, so the extra LINQ overhead is negligible compared to the extra regex overhead. When I generalized it to remove all consecutive duplicate characters, the equivalent regex (Regex.Replace(s, "(.)\\1+", "$1")) was 3.5 times slower than my LINQ version (string.Join("", s.RemoveRepeatedElements())).
I also tried the "ideal" procedural solution:
public static string RemoveRepeatedSpaces(string s)
{
StringBuilder sb = new StringBuilder(s.Length);
char lastChar = '\0';
foreach (char c in s)
if (c != ' ' || lastChar != ' ')
sb.Append(lastChar = c);
return sb.ToString();
}
This is more than 5 times faster than a regex!

In practice, I would probably just use your original solution or regular expressions (if you want a quick & simple solution). A geeky approach that uses lambda functions would be to define a fixed point operator:
T FixPoint<T>(T initial, Func<T, T> f) {
T current = initial;
do {
initial = current;
current = f(initial);
} while (initial != current);
return current;
}
This keeps calling the operation f repeatedly until the operation returns the same value that it got as an argument. You can think of the operation as a generalized loop - it is quite useful, though I guess it is too geeky to be included in .NET BCL. Then you can write:
string res = FixPoint(original, s => s.Replace(" ", " "));
It is not as efficient as your original version, but unless there are too many spaces it should work fine.

Linq is by definition related to enumerable (i.e. collections, list, arrays). You could transorm your string into a collection of char and select the non space one but this is definitevly not a job for Linq.

Paul Creasey's answer is the way to go.
If you want to treat tabs as whitespace as well, go with:
text = Regex.Replace(text, "[ |\t]+", " ");
UPDATE:
The most logical way to solve this problem while satisfying the "using LINQ" requirement has been suggested by both Hasan and Ani. However, notice that these solutions involve accessing a character in a string by index.
The spirit of the LINQ approach is that it can be applied to any enumerable sequence. Because any reasonably efficient solution to this problem requires maintaining some kind of state (with Ani's and Hasan's solutions it's easy to miss this fact as the state is already maintained within the string itself), a generic approach that accepts any sequence of items is likely going to be much more straightforward to implement using procedural code.
This procedural code may then be abstracted into a method that looks like a LINQ-style method, of course. But I would not recommend tackling a problem like this with the attitude of "I want to use LINQ in this solution" from the get-go because it will impose very awkward restriction on your code.
For what it's worth, here's how I'd implement the general idea.
public static IEnumerable<T> StripConsecutives<T>(this IEnumerable<T> source, T value, IEqualityComparer<T> comparer)
{
// null-checking omitted for brevity
using (var enumerator = source.GetEnumerator())
{
if (enumerator.MoveNext())
{
yield return enumerator.Current;
}
else
{
yield break;
}
T prev = enumerator.Current;
while (enumerator.MoveNext())
{
T current = enumerator.Current;
if (comparer.Equals(prev, value) && comparer.Equals(current, value))
{
// This is a consecutive occurrence of value --
// moving on...
}
else
{
yield return current;
}
prev = current;
}
}
}

Split to list, filter, then rejoin, 2 lines of code...
var test = " Alpha Beta Tango ";
var l = test.Split(' ').Where(s => !string.IsNullOrEmpty(s));
var result = string.Join(" ", l);
// result = "Alpha Beta Tango"
Refactoring as an extension method:
using Extensions;
void Main()
{
var test = " Alpha Beta Tango ";
var result = test.RemoveRepeatedSpaces();
// result = "Alpha Beta Tango";
}
static class Extentions
{
public static string RemoveRepeatedSpaces(this string s)
{
if (s == null)
return string.Empty;
var l = s.Split(' ').Where(a => !string.IsNullOrEmpty(a));
return string.Join(" ", l);
}
}

Related

Efficient way of grouping characters in string c#

I want an efficient way of grouping strings whilst keeping duplicates and order.
Something like this
1100110002200 -> 101020
I tried this previously
_case.GroupBy(c => c).Select(g => g.Key)
but I got 102
But this gives me what I want, I just want to optimize it, so I wouldn't have to scour the entire list each time
static List<char> group(string _case)
{
var groups = new List<char>();
for (int i = 0; i < _case.Length; i++)
{
if (groups.LastOrDefault() != _case[i])
groups.Add(_case[i]);
}
return groups;
}

While I like the elegant solution of rshepp, it turns out that the very basic code can run even 5 times faster than that.
public static string Simplify2(string str)
{
if (string.IsNullOrEmpty(str)) { return str; }
StringBuilder sb = new StringBuilder();
char last = str[0];
sb.Append(last);
foreach (char c in str)
{
if (last != c)
{
sb.Append(c);
last = c;
}
}
return sb.ToString();
}

You could create a method that loops each character and checks the previous character for equality. If they aren't the same, append/yield return the character. This is pretty easy to do with Linq.
public static string Simplify(string str)
{
return string.Concat(str.Where((c, i) => i == 0 || c != str[i - 1]));
}
Usage:
string simplified = Simplify("1100110002200");
// 101020
In my testing, my method and yours are roughly equal in speed, mine being insignificantly slower after 10 million executions (4260ms vs 4241ms).
However, my method returns the result as a string whereas yours doesn't. If you need to convert your result back to a string (which is likely) then my method is indeed much faster/more efficient (4260ms vs 6569ms).

how to find all the double characters in a string in c#

I am trying to get a count of all the double characters in a string using C#,i.e "ssss" should be two doubles not three doubles.
For example right now i have to do a for loop in the string like this
string s="shopkeeper";
for(int i=1;i<s.Length;i++) if(s[i]==s[i-1]) d++;
the value of d at the end should be 1
Is there a shorter way to do this? in linq or regex? and what are the perfomance implications, what is the most effective way? Thanks for your help
I have read [How to check repeated letters in a string c#] and
it's helpful, but doesn't address double characters, i am looking for
double characters

Try following Regex to extract any double characters: "(.)\1"
UPD: simple example:
foreach (var match in Regex.Matches("shhopkeeper", #"(.)\1"))
Console.WriteLine(match);

This works:
var doubles =
text
.Skip(1)
.Aggregate(
text.Take(1).Select(x => x.ToString()).ToList(),
(a, c) =>
{
if (a.Last().Last() == c)
a[a.Count - 1] += c.ToString();
else
a.Add(c.ToString());
return a;
})
.Select(x => x.Length / 2)
.Sum();
I gives me these results:
"shopkeeper" -> 1
"beekeeper" -> 2
"bookkeeper" -> 3
"boookkkeeeper" -> 3
"booookkkkeeeeper" -> 6

First I would like to mention that there is no "natural" LINQ solution to this problem, so every standard LINQ based solution will be ugly and highly inefficient compared to a simple for loop.
However there is a LINQ "spirit" solution to this and similar problems, like the linked How to check repeated letters in a string c# or if you want for instance finding not doubles, but let say triples, quadruples etc.
The common sub problem is, given a some sequence of elements, generate a new sequence of (value, count) pair groups for the consecutive elements having one and the same value.
It can be done with a custom extension method like this (the name of the method could be different, it's not essential for the point):
public static class EnumerableEx
{
public static IEnumerable<TResult> Zip<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, int, TResult> resultSelector, IEqualityComparer<TSource> comparer = null)
{
if (comparer == null) comparer = EqualityComparer<TSource>.Default;
using (var e = source.GetEnumerator())
{
for (bool more = e.MoveNext(); more;)
{
var value = e.Current;
int count = 1;
while ((more = e.MoveNext()) && comparer.Equals(e.Current, value)) count++;
yield return resultSelector(value, count);
}
}
}
}
Using this function in combination with standard LINQ, one can easily solve the original question:
var s = "shhopkeeperssss";
var countDoubles = s.Zip((value, count) => count / 2).Sum();
but also
var countTriples = s.Zip((value, count) => count / 3).Sum();
or
var countQuadruples = s.Zip((value, count) => count / 4).Sum();
or the question from the link
var repeatedChars = s.Zip((value, count) => new { Char = value, Count = count })
.Where(e => e.Count > 1);
etc.

Compare two values using RegEx

If I have two values eg/ABC001 and ABC100 or A0B0C1 and A1B0C0, is there a RegEx I can use to make sure the two values have the same pattern?

Well, here's my shot at it. This doesn't use regular expressions, and assumes s1 and s2 only contain numbers or digits:
public static bool SamePattern(string s1, string s2)
{
if (s1.Length == s2.Length)
{
char[] chars1 = s1.ToCharArray();
char[] chars2 = s2.ToCharArray();
for (int i = 0; i < chars1.Length; i++)
{
if (!Char.IsDigit(chars1[i]) && chars1[i] != chars2[i])
{
return false;
}
else if (Char.IsDigit(chars1[i]) != Char.IsDigit(chars2[i]))
{
return false;
}
}
return true;
}
else
{
return false;
}
}
A description of the algorithm is as follows:
If the strings have different lengths, return false.
Otherwise, check the characters in the same position in both strings:
If they are both digits or both numbers, move on to the next iteration.
If they aren't digits but aren't the same, return false.
If one is a digit and one is a number, return false.
If all characters in both strings were checked successfully, return true.

If you don't know the pattern in advance, but are only going to encounter two groups of characters (alpha and digits), then you could do the following:
Write some C# that parsed the first pattern, looking at each char and determine if it's alpha, or digit, then generate a regex accordingly from that pattern.
You may find that there's no point writing code to generate a regex, as it could be just as simple to check the second string against the first.
Alternatively, without regex:
First check the strings are the same length.
Then loop through both strings at the same time, char by char. If char[x] from string 1 is alpha, and char[x] from string two is the same, you're patterns are matching.
Try this, it should cope if a string sneaks in some symbols. Edited to compare character values ... and use Char.IsLetter and Char.IsDigit
private bool matchPattern(string string1, string string2)
{
bool result = (string1.Length == string2.Length);
char[] chars1 = string1.ToCharArray();
char[] chars2 = string2.ToCharArray();
for (int i = 0; i < string1.Length; i++)
{
if (Char.IsLetter(chars1[i]) != Char.IsLetter(chars2[i]))
{
result = false;
}
if (Char.IsLetter(chars1[i]) && (chars1[i] != chars2[i]))
{
//Characters must be identical
result = false;
}
if (Char.IsDigit(chars1[i]) != Char.IsDigit(chars2[i]))
result = false;
}
return result;
}

Consider using Char.GetUnicodeCategory
You can write a helper class for this task:
public class Mask
{
public Mask(string originalString)
{
OriginalString = originalString;
CharCategories = originalString.Select(Char.GetUnicodeCategory).ToList();
}
public string OriginalString { get; private set; }
public IEnumerable<UnicodeCategory> CharCategories { get; private set; }
public bool HasSameCharCategories(Mask other)
{
//null checks
return CharCategories.SequenceEqual(other.CharCategories);
}
}
Use as
Mask mask1 = new Mask("ab12c3");
Mask mask2 = new Mask("ds124d");
MessageBox.Show(mask1.HasSameCharCategories(mask2).ToString());

I don't know C# syntax but here is a pseudo code:
split the strings on ''
sort the 2 arrays
join each arrays with ''
compare the 2 strings

A general-purpose solution with LINQ can be achieved quite easily. The idea is:
Sort the two strings (reordering the characters).
Compare each sorted string as a character sequence using SequenceEquals.
This scheme enables a short, graceful and configurable solution, for example:
// We will be using this in SequenceEquals
class MyComparer : IEqualityComparer<char>
{
public bool Equals(char x, char y)
{
return x.Equals(y);
}
public int GetHashCode(char obj)
{
return obj.GetHashCode();
}
}
// and then:
var s1 = "ABC0102";
var s2 = "AC201B0";
Func<char, double> orderFunction = char.GetNumericValue;
var comparer = new MyComparer();
var result = s1.OrderBy(orderFunction).SequenceEqual(s2.OrderBy(orderFunction), comparer);
Console.WriteLine("result = " + result);
As you can see, it's all in 3 lines of code (not counting the comparer class). It's also very very easily configurable.
The code as it stands checks if s1 is a permutation of s2.
Do you want to check if s1 has the same number and kind of characters with s2, but not necessarily the same characters (e.g. "ABC" to be equal to "ABB")? No problem, change MyComparer.Equals to return char.GetUnicodeCategory(x).Equals(char.GetUnicodeCategory(y));.
By changing the values of orderFunction and comparer you can configure a multitude of other comparison options.
And finally, since I don't find it very elegant to define a MyComparer class just to enable this scenario, you can also use the technique described in this question:
Wrap a delegate in an IEqualityComparer
to define your comparer as an inline lambda. This would result in a configurable solution contained in 2-3 lines of code.

What are the alternatives to Split a string in c# that don't use String.Split()

I saw this question that asks given a string "smith;rodgers;McCalne" how can you produce a collection. The answer to this was to use String.Split.
If we don't have Split() built in what do you do instead?
Update:
I admit writing a split function is fairly easy. Below is what I would've wrote. Loop through the string using IndexOf and extract using Substring.
string s = "smith;rodgers;McCalne";
string seperator = ";";
int currentPosition = 0;
int lastPosition = 0;
List<string> values = new List<string>();
do
{
currentPosition = s.IndexOf(seperator, currentPosition + 1);
if (currentPosition == -1)
currentPosition = s.Length;
values.Add(s.Substring(lastPosition, currentPosition - lastPosition));
lastPosition = currentPosition+1;
} while (currentPosition < s.Length);
I took a peek at SSCLI implementation and its similar to the above except it handles way more use cases and it uses an unsafe method to determine the indexes of the separators before doing the substring extraction.
Others have suggested the following.
An Extension Method that uses an Iterator Block
Regex suggestion (no implementation)
Linq Aggregate method
Is this it?

It's reasonably simple to write your own Split equivalent.
Here's a quick example, although in reality you'd probably want to create some overloads for more flexibility. (Well, in reality you'd just use the framework's built-in Split methods!)
string foo = "smith;rodgers;McCalne";
foreach (string bar in foo.Split2(";"))
{
Console.WriteLine(bar);
}
// ...
public static class StringExtensions
{
public static IEnumerable<string> Split2(this string source, string delim)
{
// argument null checking etc omitted for brevity
int oldIndex = 0, newIndex;
while ((newIndex = source.IndexOf(delim, oldIndex)) != -1)
{
yield return source.Substring(oldIndex, newIndex - oldIndex);
oldIndex = newIndex + delim.Length;
}
yield return source.Substring(oldIndex);
}
}

You make your own loop to do the split. Here is one that uses the Aggregate extension method. Not very efficient, as it uses the += operator on the strings, so it should not really be used as anything but an example, but it works:
string names = "smith;rodgers;McCalne";
List<string> split = names.Aggregate(new string[] { string.Empty }.ToList(), (s, c) => {
if (c == ';') s.Add(string.Empty); else s[s.Count - 1] += c;
return s;
});

Regex?
Or just Substring. This is what Split does internally

How to search a string in String array

I need to search a string in the string array. I dont want to use any for looping in it
string [] arr = {"One","Two","Three"};
string theString = "One"
I need to check whether theString variable is present in arr.

Well, something is going to have to look, and looping is more efficient than recursion (since tail-end recursion isn't fully implemented)... so if you just don't want to loop yourself, then either of:
bool has = arr.Contains(var); // .NET 3.5
or
bool has = Array.IndexOf(arr, var) >= 0;
For info: avoid names like var - this is a keyword in C# 3.0.

Every method, mentioned earlier does looping either internally or externally, so it is not really important how to implement it. Here another example of finding all references of target string
string [] arr = {"One","Two","Three"};
var target = "One";
var results = Array.FindAll(arr, s => s.Equals(target));

Does it have to be a string[] ? A List<String> would give you what you need.
List<String> testing = new List<String>();
testing.Add("One");
testing.Add("Two");
testing.Add("Three");
testing.Add("Mouse");
bool inList = testing.Contains("Mouse");

bool exists = arr.Contains("One");

I think it is better to use Array.Exists than Array.FindAll.

Its pretty simple. I always use this code to search string from a string array
string[] stringArray = { "text1", "text2", "text3", "text4" };
string value = "text3";
int pos = Array.IndexOf(stringArray, value);
if (pos > -1)
{
return true;
}
else
{
return false;
}

If the array is sorted, you can use BinarySearch. This is a O(log n) operation, so it is faster as looping. If you need to apply multiple searches and speed is a concern, you could sort it (or a copy) before using it.

Each class implementing IList has a method Contains(Object value). And so does System.Array.

Why the prohibition "I don't want to use any looping"? That's the most obvious solution. When given the chance to be obvious, take it!
Note that calls like arr.Contains(...) are still going to loop, it just won't be you who has written the loop.
Have you considered an alternate representation that's more amenable to searching?
A good Set implementation would perform well. (HashSet, TreeSet or the local equivalent).
If you can be sure that arr is sorted, you could use binary search (which would need to recurse or loop, but not as often as a straight linear search).

You can use Find method of Array type. From .NET 3.5 and higher.
public static T Find<T>(
T[] array,
Predicate<T> match
)
Here is some examples:
// we search an array of strings for a name containing the letter “a”:
static void Main()
{
string[] names = { "Rodney", "Jack", "Jill" };
string match = Array.Find (names, ContainsA);
Console.WriteLine (match); // Jack
}
static bool ContainsA (string name) { return name.Contains ("a"); }
Here’s the same code shortened with an anonymous method:
string[] names = { "Rodney", "Jack", "Jill" };
string match = Array.Find (names, delegate (string name)
{ return name.Contains ("a"); } ); // Jack
A lambda expression shortens it further:
string[] names = { "Rodney", "Jack", "Jill" };
string match = Array.Find (names, n => n.Contains ("a")); // Jack

At first shot, I could come up with something like this (but it's pseudo code and assuming you cannot use any .NET built-in libaries). Might require a bit of tweaking and re-thinking, but should be good enough for a head-start, maybe?
int findString(String var, String[] stringArray, int currentIndex, int stringMaxIndex)
{
if currentIndex > stringMaxIndex
return (-stringMaxIndex-1);
else if var==arr[currentIndex] //or use any string comparison op or function
return 0;
else
return findString(var, stringArray, currentIndex++, stringMaxIndex) + 1 ;
}
//calling code
int index = findString(var, arr, 0, getMaxIndex(arr));
if index == -1 printOnScreen("Not found");
else printOnScreen("Found on index: " + index);

In C#, if you can use an ArrayList, you can use the Contains method, which returns a boolean:
if MyArrayList.Contains("One")

You can check the element existence by
arr.Any(x => x == "One")

it is old one ,but this is the way i do it ,
enter code herevar result = Array.Find(names, element => element == "One");

I'm surprised that no one suggested using Array.IndexOf Method.
Indeed, Array.IndexOf has two advantages :
It allows searching if an element is included into an array,
It gets at the same time the index into the array.
int stringIndex = Array.IndexOf(arr, theString);
if (stringIndex >= 0)
{
// theString has been found
}
Inline version :
if (Array.IndexOf(arr, theString) >= 0)
{
// theString has been found
}

Using Contains()
string [] SomeArray = {"One","Two","Three"};
bool IsExist = SomeArray.Contains("One");
Console.WriteLine("Is string exist: "+ IsExist);
Using Find()
string [] SomeArray = {"One","Two","Three"};
var result = Array.Find(SomeArray, element => element == "One");
Console.WriteLine("Required string is: "+ result);
Another simple & traditional way, very useful for beginners to build logic.
string [] SomeArray = {"One","Two","Three"};
foreach (string value in SomeArray) {
if (value == "One") {
Console.WriteLine("Required string is: "+ value);
}
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Can I use LINQ to strip repeating spaces from a string? - c#

This is not a linq type task, use regex string output = Regex.Replace(input," +"," "); Of course you could use linq to apply this to a collection of strings.

public static string TrimInternal(this string text) { var trimmed = text.Where((c, index) => !char.IsWhiteSpace(c) || (index != 0 && !char.IsWhiteSpace(text[index - 1]))); return new string(trimmed.ToArray()); }

Linq is by definition related to enumerable (i.e. collections, list, arrays). You could transorm your string into a collection of char and select the non space one but this is definitevly not a job for Linq.

Related

Efficient way of grouping characters in string c#

how to find all the double characters in a string in c#

Compare two values using RegEx

What are the alternatives to Split a string in c# that don't use String.Split()

How to search a string in String array

Categories

Resources