Eric Lippert's challenge "comma-quibbling", best answer? - c#

I wanted to bring this challenge to the attention of the stackoverflow community. The original problem and answers are here. BTW, if you did not follow it before, you should try to read Eric's blog, it is pure wisdom.
Summary:
Write a function that takes a non-null IEnumerable and returns a string with the following characteristics:
If the sequence is empty the resulting string is "{}".
If the sequence is a single item "ABC" then the resulting string is "{ABC}".
If the sequence is the two item sequence "ABC", "DEF" then the resulting string is "{ABC and DEF}".
If the sequence has more than two items, say, "ABC", "DEF", "G", "H" then the resulting string is "{ABC, DEF, G and H}". (Note: no Oxford comma!)
As you can see even our very own Jon Skeet (yes, it is well known that he can be in two places at the same time) has posted a solution but his (IMHO) is not the most elegant although probably you can not beat its performance.
What do you think? There are pretty good options there. I really like one of the solutions that involves the select and aggregate methods (from Fernando Nicolet). Linq is very powerful and dedicating some time to challenges like this make you learn a lot. I twisted it a bit so it is a bit more performant and clear (by using Count and avoiding Reverse):
public static string CommaQuibbling(IEnumerable<string> items)
{
int last = items.Count() - 1;
Func<int, string> getSeparator = (i) => i == 0 ? string.Empty : (i == last ? " and " : ", ");
string answer = string.Empty;
return "{" + items.Select((s, i) => new { Index = i, Value = s })
.Aggregate(answer, (s, a) => s + getSeparator(a.Index) + a.Value) + "}";
}

Inefficient, but I think clear.
public static string CommaQuibbling(IEnumerable<string> items)
{
List<String> list = new List<String>(items);
if (list.Count == 0) { return "{}"; }
if (list.Count == 1) { return "{" + list[0] + "}"; }
String[] initial = list.GetRange(0, list.Count - 1).ToArray();
return "{" + String.Join(", ", initial) + " and " + list[list.Count - 1] + "}";
}
If I was maintaining the code, I'd prefer this to more clever versions.

How about this approach? Purely cumulative - no back-tracking, and only iterates once. For raw performance, I'm not sure you'll do better with LINQ etc, regardless of how "pretty" a LINQ answer might be.
using System;
using System.Collections.Generic;
using System.Text;
static class Program
{
public static string CommaQuibbling(IEnumerable<string> items)
{
StringBuilder sb = new StringBuilder('{');
using (var iter = items.GetEnumerator())
{
if (iter.MoveNext())
{ // first item can be appended directly
sb.Append(iter.Current);
if (iter.MoveNext())
{ // more than one; only add each
// term when we know there is another
string lastItem = iter.Current;
while (iter.MoveNext())
{ // middle term; use ", "
sb.Append(", ").Append(lastItem);
lastItem = iter.Current;
}
// add the final term; since we are on at least the
// second term, always use " and "
sb.Append(" and ").Append(lastItem);
}
}
}
return sb.Append('}').ToString();
}
static void Main()
{
Console.WriteLine(CommaQuibbling(new string[] { }));
Console.WriteLine(CommaQuibbling(new string[] { "ABC" }));
Console.WriteLine(CommaQuibbling(new string[] { "ABC", "DEF" }));
Console.WriteLine(CommaQuibbling(new string[] {
"ABC", "DEF", "G", "H" }));
}
}

If I was doing a lot with streams which required first/last information, I'd have thid extension:
[Flags]
public enum StreamPosition
{
First = 1, Last = 2
}
public static IEnumerable<R> MapWithPositions<T, R> (this IEnumerable<T> stream,
Func<StreamPosition, T, R> map)
{
using (var enumerator = stream.GetEnumerator ())
{
if (!enumerator.MoveNext ()) yield break ;
var cur = enumerator.Current ;
var flags = StreamPosition.First ;
while (true)
{
if (!enumerator.MoveNext ()) flags |= StreamPosition.Last ;
yield return map (flags, cur) ;
if ((flags & StreamPosition.Last) != 0) yield break ;
cur = enumerator.Current ;
flags = 0 ;
}
}
}
Then the simplest (not the quickest, that would need a couple more handy extension methods) solution will be:
public static string Quibble (IEnumerable<string> strings)
{
return "{" + String.Join ("", strings.MapWithPositions ((pos, item) => (
(pos & StreamPosition.First) != 0 ? "" :
pos == StreamPosition.Last ? " and " : ", ") + item)) + "}" ;
}

Here as a Python one liner
>>> f=lambda s:"{%s}"%", ".join(s)[::-1].replace(',','dna ',1)[::-1]
>>> f([])
'{}'
>>> f(["ABC"])
'{ABC}'
>>> f(["ABC","DEF"])
'{ABC and DEF}'
>>> f(["ABC","DEF","G","H"])
'{ABC, DEF, G and H}'
This version might be easier to understand
>>> f=lambda s:"{%s}"%" and ".join(s).replace(' and',',',len(s)-2)
>>> f([])
'{}'
>>> f(["ABC"])
'{ABC}'
>>> f(["ABC","DEF"])
'{ABC and DEF}'
>>> f(["ABC","DEF","G","H"])
'{ABC, DEF, G and H}'

Here's a simple F# solution, that only does one forward iteration:
let CommaQuibble items =
let sb = System.Text.StringBuilder("{")
// pp is 2 previous, p is previous
let pp,p = items |> Seq.fold (fun (pp:string option,p) s ->
if pp <> None then
sb.Append(pp.Value).Append(", ") |> ignore
(p, Some(s))) (None,None)
if pp <> None then
sb.Append(pp.Value).Append(" and ") |> ignore
if p <> None then
sb.Append(p.Value) |> ignore
sb.Append("}").ToString()
(EDIT: Turns out this is very similar to Skeet's.)
The test code:
let Test l =
printfn "%s" (CommaQuibble l)
Test []
Test ["ABC"]
Test ["ABC";"DEF"]
Test ["ABC";"DEF";"G"]
Test ["ABC";"DEF";"G";"H"]
Test ["ABC";null;"G";"H"]

I'm a fan of the serial comma: I eat, shoot, and leave.
I continually need a solution to this problem and have solved it in 3 languages (though not C#). I would adapt the following solution (in Lua, does not wrap answer in curly braces) by writing a concat method that works on any IEnumerable:
function commafy(t, andword)
andword = andword or 'and'
local n = #t -- number of elements in the numeration
if n == 1 then
return t[1]
elseif n == 2 then
return concat { t[1], ' ', andword, ' ', t[2] }
else
local last = t[n]
t[n] = andword .. ' ' .. t[n]
local answer = concat(t, ', ')
t[n] = last
return answer
end
end

This isn't brilliantly readable, but it scales well up to tens of millions of strings. I'm developing on an old Pentium 4 workstation and it does 1,000,000 strings of average length 8 in about 350ms.
public static string CreateLippertString(IEnumerable<string> strings)
{
char[] combinedString;
char[] commaSeparator = new char[] { ',', ' ' };
char[] andSeparator = new char[] { ' ', 'A', 'N', 'D', ' ' };
int totalLength = 2; //'{' and '}'
int numEntries = 0;
int currentEntry = 0;
int currentPosition = 0;
int secondToLast;
int last;
int commaLength= commaSeparator.Length;
int andLength = andSeparator.Length;
int cbComma = commaLength * sizeof(char);
int cbAnd = andLength * sizeof(char);
//calculate the sum of the lengths of the strings
foreach (string s in strings)
{
totalLength += s.Length;
++numEntries;
}
//add to the total length the length of the constant characters
if (numEntries >= 2)
totalLength += 5; // " AND "
if (numEntries > 2)
totalLength += (2 * (numEntries - 2)); // ", " between items
//setup some meta-variables to help later
secondToLast = numEntries - 2;
last = numEntries - 1;
//allocate the memory for the combined string
combinedString = new char[totalLength];
//set the first character to {
combinedString[0] = '{';
currentPosition = 1;
if (numEntries > 0)
{
//now copy each string into its place
foreach (string s in strings)
{
Buffer.BlockCopy(s.ToCharArray(), 0, combinedString, currentPosition * sizeof(char), s.Length * sizeof(char));
currentPosition += s.Length;
if (currentEntry == secondToLast)
{
Buffer.BlockCopy(andSeparator, 0, combinedString, currentPosition * sizeof(char), cbAnd);
currentPosition += andLength;
}
else if (currentEntry == last)
{
combinedString[currentPosition] = '}'; //set the last character to '}'
break; //don't bother making that last call to the enumerator
}
else if (currentEntry < secondToLast)
{
Buffer.BlockCopy(commaSeparator, 0, combinedString, currentPosition * sizeof(char), cbComma);
currentPosition += commaLength;
}
++currentEntry;
}
}
else
{
//set the last character to '}'
combinedString[1] = '}';
}
return new string(combinedString);
}

Another variant - separating punctuation and iteration logic for the sake of code clarity. And still thinking about perfomrance.
Works as requested with pure IEnumerable/string/ and strings in the list cannot be null.
public static string Concat(IEnumerable<string> strings)
{
return "{" + strings.reduce("", (acc, prev, cur, next) =>
acc.Append(punctuation(prev, cur, next)).Append(cur)) + "}";
}
private static string punctuation(string prev, string cur, string next)
{
if (null == prev || null == cur)
return "";
if (null == next)
return " and ";
return ", ";
}
private static string reduce(this IEnumerable<string> strings,
string acc, Func<StringBuilder, string, string, string, StringBuilder> func)
{
if (null == strings) return "";
var accumulatorBuilder = new StringBuilder(acc);
string cur = null;
string prev = null;
foreach (var next in strings)
{
func(accumulatorBuilder, prev, cur, next);
prev = cur;
cur = next;
}
func(accumulatorBuilder, prev, cur, null);
return accumulatorBuilder.ToString();
}
F# surely looks much better:
let rec reduce list =
match list with
| [] -> ""
| head::curr::[] -> head + " and " + curr
| head::curr::tail -> head + ", " + curr :: tail |> reduce
| head::[] -> head
let concat list = "{" + (list |> reduce ) + "}"

Disclaimer: I used this as an excuse to play around with new technologies, so my solutions don't really live up to the Eric's original demands for clarity and maintainability.
Naive Enumerator Solution
(I concede that the foreach variant of this is superior, as it doesn't require manually messing about with the enumerator.)
public static string NaiveConcatenate(IEnumerable<string> sequence)
{
StringBuilder sb = new StringBuilder();
sb.Append('{');
IEnumerator<string> enumerator = sequence.GetEnumerator();
if (enumerator.MoveNext())
{
string a = enumerator.Current;
if (!enumerator.MoveNext())
{
sb.Append(a);
}
else
{
string b = enumerator.Current;
while (enumerator.MoveNext())
{
sb.Append(a);
sb.Append(", ");
a = b;
b = enumerator.Current;
}
sb.AppendFormat("{0} and {1}", a, b);
}
}
sb.Append('}');
return sb.ToString();
}
Solution using LINQ
public static string ConcatenateWithLinq(IEnumerable<string> sequence)
{
return (from item in sequence select item)
.Aggregate(
new {sb = new StringBuilder("{"), a = (string) null, b = (string) null},
(s, x) =>
{
if (s.a != null)
{
s.sb.Append(s.a);
s.sb.Append(", ");
}
return new {s.sb, a = s.b, b = x};
},
(s) =>
{
if (s.b != null)
if (s.a != null)
s.sb.AppendFormat("{0} and {1}", s.a, s.b);
else
s.sb.Append(s.b);
s.sb.Append("}");
return s.sb.ToString();
});
}
Solution with TPL
This solution uses a producer-consumer queue to feed the input sequence to the processor, whilst keeping at least two elements buffered in the queue. Once the producer has reached the end of the input sequence, the last two elements can be processed with special treatment.
In hindsight there is no reason to have the consumer operate asynchronously, which would eliminate the need for a concurrent queue, but as I said previously, I was just using this as an excuse to play around with new technologies :-)
public static string ConcatenateWithTpl(IEnumerable<string> sequence)
{
var queue = new ConcurrentQueue<string>();
bool stop = false;
var consumer = Future.Create(
() =>
{
var sb = new StringBuilder("{");
while (!stop || queue.Count > 2)
{
string s;
if (queue.Count > 2 && queue.TryDequeue(out s))
sb.AppendFormat("{0}, ", s);
}
return sb;
});
// Producer
foreach (var item in sequence)
queue.Enqueue(item);
stop = true;
StringBuilder result = consumer.Value;
string a;
string b;
if (queue.TryDequeue(out a))
if (queue.TryDequeue(out b))
result.AppendFormat("{0} and {1}", a, b);
else
result.Append(a);
result.Append("}");
return result.ToString();
}
Unit tests elided for brevity.

Late entry:
public static string CommaQuibbling(IEnumerable<string> items)
{
string[] parts = items.ToArray();
StringBuilder result = new StringBuilder('{');
for (int i = 0; i < parts.Length; i++)
{
if (i > 0)
result.Append(i == parts.Length - 1 ? " and " : ", ");
result.Append(parts[i]);
}
return result.Append('}').ToString();
}

public static string CommaQuibbling(IEnumerable<string> items)
{
int count = items.Count();
string answer = string.Empty;
return "{" +
(count==0) ? "" :
( items[0] +
(count == 1 ? "" :
items.Range(1,count-1).
Aggregate(answer, (s,a)=> s += ", " + a) +
items.Range(count-1,1).
Aggregate(answer, (s,a)=> s += " AND " + a) ))+ "}";
}
It is implemented as,
if count == 0 , then return empty,
if count == 1 , then return only element,
if count > 1 , then take two ranges,
first 2nd element to 2nd last element
last element

Here's mine, but I realize it's pretty much like Marc's, some minor differences in the order of things, and I added unit-tests as well.
using System;
using NUnit.Framework;
using NUnit.Framework.Extensions;
using System.Collections.Generic;
using System.Text;
using NUnit.Framework.SyntaxHelpers;
namespace StringChallengeProject
{
[TestFixture]
public class StringChallenge
{
[RowTest]
[Row(new String[] { }, "{}")]
[Row(new[] { "ABC" }, "{ABC}")]
[Row(new[] { "ABC", "DEF" }, "{ABC and DEF}")]
[Row(new[] { "ABC", "DEF", "G", "H" }, "{ABC, DEF, G and H}")]
public void Test(String[] input, String expectedOutput)
{
Assert.That(FormatString(input), Is.EqualTo(expectedOutput));
}
//codesnippet:93458590-3182-11de-8c30-0800200c9a66
public static String FormatString(IEnumerable<String> input)
{
if (input == null)
return "{}";
using (var iterator = input.GetEnumerator())
{
// Guard-clause for empty source
if (!iterator.MoveNext())
return "{}";
// Take care of first value
var output = new StringBuilder();
output.Append('{').Append(iterator.Current);
// Grab next
if (iterator.MoveNext())
{
// Grab the next value, but don't process it
// we don't know whether to use comma or "and"
// until we've grabbed the next after it as well
String nextValue = iterator.Current;
while (iterator.MoveNext())
{
output.Append(", ");
output.Append(nextValue);
nextValue = iterator.Current;
}
output.Append(" and ");
output.Append(nextValue);
}
output.Append('}');
return output.ToString();
}
}
}
}

How about skipping complicated aggregation code and just cleaning up the string after you build it?
public static string CommaQuibbling(IEnumerable<string> items)
{
var aggregate = items.Aggregate<string, StringBuilder>(
new StringBuilder(),
(b,s) => b.AppendFormat(", {0}", s));
var trimmed = Regex.Replace(aggregate.ToString(), "^, ", string.Empty);
return string.Format(
"{{{0}}}",
Regex.Replace(trimmed,
", (?<last>[^,]*)$", #" and ${last}"));
}
UPDATED: This won't work with strings with commas, as pointed out in the comments. I tried some other variations, but without definite rules about what the strings can contain, I'm going to have real problems matching any possible last item with a regular expression, which makes this a nice lesson for me on their limitations.

I quite liked Jon's answer, but that's because it's much like how I approached the problem. Rather than specifically coding in the two variables, I implemented them inside of a FIFO queue.
It's strange because I just assumed that there would be 15 posts that all did exactly the same thing, but it looks like we were the only two to do it that way. Oh, looking at these answers, Marc Gravell's answer is quite close to the approach we used as well, but he's using two 'loops', rather than holding on to values.
But all those answers with LINQ and regex and joining arrays just seem like crazy-talk! :-)

I don't think that using a good old array is a restriction. Here is my version using an array and an extension method:
public static string CommaQuibbling(IEnumerable<string> list)
{
string[] array = list.ToArray();
if (array.Length == 0) return string.Empty.PutCurlyBraces();
if (array.Length == 1) return array[0].PutCurlyBraces();
string allExceptLast = string.Join(", ", array, 0, array.Length - 1);
string theLast = array[array.Length - 1];
return string.Format("{0} and {1}", allExceptLast, theLast)
.PutCurlyBraces();
}
public static string PutCurlyBraces(this string str)
{
return "{" + str + "}";
}
I am using an array because of the string.Join method and because if the possibility of accessing the last element via an index. The extension method is here because of DRY.
I think that the performance penalities come from the list.ToArray() and string.Join calls, but all in one I hope that piece of code is pleasent to read and maintain.

I think Linq provides fairly readable code. This version handles a million "ABC" in .89 seconds:
using System.Collections.Generic;
using System.Linq;
namespace CommaQuibbling
{
internal class Translator
{
public string Translate(IEnumerable<string> items)
{
return "{" + Join(items) + "}";
}
private static string Join(IEnumerable<string> items)
{
var leadingItems = LeadingItemsFrom(items);
var lastItem = LastItemFrom(items);
return JoinLeading(leadingItems) + lastItem;
}
private static IEnumerable<string> LeadingItemsFrom(IEnumerable<string> items)
{
return items.Reverse().Skip(1).Reverse();
}
private static string LastItemFrom(IEnumerable<string> items)
{
return items.LastOrDefault();
}
private static string JoinLeading(IEnumerable<string> items)
{
if (items.Any() == false) return "";
return string.Join(", ", items.ToArray()) + " and ";
}
}
}

You can use a foreach, without LINQ, delegates, closures, lists or arrays, and still have understandable code. Use a bool and a string, like so:
public static string CommaQuibbling(IEnumerable items)
{
StringBuilder sb = new StringBuilder("{");
bool empty = true;
string prev = null;
foreach (string s in items)
{
if (prev!=null)
{
if (!empty) sb.Append(", ");
else empty = false;
sb.Append(prev);
}
prev = s;
}
if (prev!=null)
{
if (!empty) sb.Append(" and ");
sb.Append(prev);
}
return sb.Append('}').ToString();
}

public static string CommaQuibbling(IEnumerable<string> items)
{
var itemArray = items.ToArray();
var commaSeparated = String.Join(", ", itemArray, 0, Math.Max(itemArray.Length - 1, 0));
if (commaSeparated.Length > 0) commaSeparated += " and ";
return "{" + commaSeparated + itemArray.LastOrDefault() + "}";
}

Here's my submission. Modified the signature a bit to make it more generic. Using .NET 4 features (String.Join() using IEnumerable<T>), otherwise works with .NET 3.5. Goal was to use LINQ with drastically simplified logic.
static string CommaQuibbling<T>(IEnumerable<T> items)
{
int count = items.Count();
var quibbled = items.Select((Item, index) => new { Item, Group = (count - index - 2) > 0})
.GroupBy(item => item.Group, item => item.Item)
.Select(g => g.Key
? String.Join(", ", g)
: String.Join(" and ", g));
return "{" + String.Join(", ", quibbled) + "}";
}

There's a couple non-C# answers, and the original post did ask for answers in any language, so I thought I'd show another way to do it that none of the C# programmers seems to have touched upon: a DSL!
(defun quibble-comma (words)
(format nil "~{~#[~;~a~;~a and ~a~:;~#{~a~#[~; and ~:;, ~]~}~]~}" words))
The astute will note that Common Lisp doesn't really have an IEnumerable<T> built-in, and hence FORMAT here will only work on a proper list. But if you made an IEnumerable, you certainly could extend FORMAT to work on that, as well. (Does Clojure have this?)
Also, anyone reading this who has taste (including Lisp programmers!) will probably be offended by the literal "~{~#[~;~a~;~a and ~a~:;~#{~a~#[~; and ~:;, ~]~}~]~}" there. I won't claim that FORMAT implements a good DSL, but I do believe that it is tremendously useful to have some powerful DSL for putting strings together. Regex is a powerful DSL for tearing strings apart, and string.Format is a DSL (kind of) for putting strings together but it's stupidly weak.
I think everybody writes these kind of things all the time. Why the heck isn't there some built-in universal tasteful DSL for this yet? I think the closest we have is "Perl", maybe.

Just for fun, using the new Zip extension method from C# 4.0:
private static string CommaQuibbling(IEnumerable<string> list)
{
IEnumerable<string> separators = GetSeparators(list.Count());
var finalList = list.Zip(separators, (w, s) => w + s);
return string.Concat("{", string.Join(string.Empty, finalList), "}");
}
private static IEnumerable<string> GetSeparators(int itemCount)
{
while (itemCount-- > 2)
yield return ", ";
if (itemCount == 1)
yield return " and ";
yield return string.Empty;
}

return String.Concat(
"{",
input.Length > 2 ?
String.Concat(
String.Join(", ", input.Take(input.Length - 1)),
" and ",
input.Last()) :
String.Join(" and ", input),
"}");

I have tried using foreach. Please let me know your opinions.
private static string CommaQuibble(IEnumerable<string> input)
{
var val = string.Concat(input.Process(
p => p,
p => string.Format(" and {0}", p),
p => string.Format(", {0}", p)));
return string.Format("{{{0}}}", val);
}
public static IEnumerable<T> Process<T>(this IEnumerable<T> input,
Func<T, T> firstItemFunc,
Func<T, T> lastItemFunc,
Func<T, T> otherItemFunc)
{
//break on empty sequence
if (!input.Any()) yield break;
//return first elem
var first = input.First();
yield return firstItemFunc(first);
//break if there was only one elem
var rest = input.Skip(1);
if (!rest.Any()) yield break;
//start looping the rest of the elements
T prevItem = first;
bool isFirstIteration = true;
foreach (var item in rest)
{
if (isFirstIteration) isFirstIteration = false;
else
{
yield return otherItemFunc(prevItem);
}
prevItem = item;
}
//last element
yield return lastItemFunc(prevItem);
}

Here are a couple of solutions and testing code written in Perl based on the replies at http://blogs.perl.org/users/brian_d_foy/2013/10/comma-quibbling-in-perl.html.
#!/usr/bin/perl
use 5.14.0;
use warnings;
use strict;
use Test::More qw{no_plan};
sub comma_quibbling1 {
my (#words) = #_;
return "" unless #words;
return $words[0] if #words == 1;
return join(", ", #words[0 .. $#words - 1]) . " and $words[-1]";
}
sub comma_quibbling2 {
return "" unless #_;
my $last = pop #_;
return $last unless #_;
return join(", ", #_) . " and $last";
}
is comma_quibbling1(qw{}), "", "1-0";
is comma_quibbling1(qw{one}), "one", "1-1";
is comma_quibbling1(qw{one two}), "one and two", "1-2";
is comma_quibbling1(qw{one two three}), "one, two and three", "1-3";
is comma_quibbling1(qw{one two three four}), "one, two, three and four", "1-4";
is comma_quibbling2(qw{}), "", "2-0";
is comma_quibbling2(qw{one}), "one", "2-1";
is comma_quibbling2(qw{one two}), "one and two", "2-2";
is comma_quibbling2(qw{one two three}), "one, two and three", "2-3";
is comma_quibbling2(qw{one two three four}), "one, two, three and four", "2-4";

It hasn't quite been a decade since the last post so here's my variation:
public static string CommaQuibbling(IEnumerable<string> items)
{
var text = new StringBuilder();
string sep = null;
int last_pos = items.Count();
int next_pos = 1;
foreach(string item in items)
{
text.Append($"{sep}{item}");
sep = ++next_pos < last_pos ? ", " : " and ";
}
return $"{{{text}}}";
}

In one statement:
public static string CommaQuibbling(IEnumerable<string> inputList)
{
return
String.Concat("{",
String.Join(null, inputList
.Select((iw, i) =>
(i == (inputList.Count() - 1)) ? $"{iw}" :
(i == (inputList.Count() - 2) ? $"{iw} and " : $"{iw}, "))
.ToArray()), "}");
}

In .NET Core we can leverage SkipLast and TakeLast.
public static string CommaQuibblify(IEnumerable<string> items)
{
var head = string.Join(", ", items.SkipLast(2).Append(""));
var tail = string.Join(" and ", items.TakeLast(2));
return '{' + head + tail + '}';
}
https://dotnetfiddle.net/X58qvZ

Related

Get count of unique characters between first and last letter

I'm trying to get the unique characters count that are between the first and last letter of a word. For example: if I type Yellow the expected output is Y3w, if I type People the output should be P4e and if I type Money the output should be M3y. This is what I tried:
//var strArr = wordToConvert.Split(' ');
string[] strArr = new[] { "Money","Yellow", "People" };
List<string> newsentence = new List<string>();
foreach (string word in strArr)
{
if (word.Length > 2)
{
//ignore 2-letter words
string newword = null;
int distinctCount = 0;
int k = word.Length;
int samecharcount = 0;
int count = 0;
for (int i = 1; i < k - 2; i++)
{
if (word.ElementAt(i) != word.ElementAt(i + 1))
{
count++;
}
else
{
samecharcount++;
}
}
distinctCount = count + samecharcount;
char frst = word[0];
char last = word[word.Length - 1];
newword = String.Concat(frst, distinctCount.ToString(), last);
newsentence.Add(newword);
}
else
{
newsentence.Add(word);
}
}
var result = String.Join(" ", newsentence.ToArray());
Console.WriteLine("Output: " + result);
Console.WriteLine("----------------------------------------------------");
With this code I'm getting the expect output for Yellow, but seems that is not working with People and Money. What can I do to fix this issue or also I'm wondering is maybe there is a better way to do this for example using LINQ/Regex.
Here's an implementation that uses Linq:
string[] strArr = new[]{"Money", "Yellow", "People"};
List<string> newsentence = new List<string>();
foreach (string word in strArr)
{
if (word.Length > 2)
{
// we want the first letter, the last letter, and the distinct count of everything in between
var first = word.First();
var last = word.Last();
var others = word.Skip(1).Take(word.Length - 2);
// Case sensitive
var distinct = others.Distinct();
// Case insensitive
// var distinct = others.Select(c => char.ToLowerInvariant(c)).Distinct();
string newword = first + distinct.Count().ToString() + last;
newsentence.Add(newword);
}
else
{
newsentence.Add(word);
}
}
var result = String.Join(" ", newsentence.ToArray());
Console.WriteLine(result);
Output:
M3y Y3w P4e
Note that this doesn't take account of case, so the output for FiIsSh is 4.
Maybe not the most performant, but here is another example using linq:
var words = new[] { "Money","Yellow", "People" };
var transformedWords = words.Select(Transform);
var sentence = String.Join(' ', transformedWords);
public string Transform(string input)
{
if (input.Length < 3)
{
return input;
}
var count = input.Skip(1).SkipLast(1).Distinct().Count();
return $"{input[0]}{count}{input[^1]}";
}
You can implement it with the help of Linq. e.g. (C# 8+)
private static string EncodeWord(string value) => value.Length <= 2
? value
: $"{value[0]}{value.Substring(1, value.Length - 2).Distinct().Count()}{value[^1]}";
Demo:
string[] tests = new string[] {
"Money","Yellow", "People"
};
var report = string.Join(Environment.NewLine, tests
.Select(test => $"{test} :: {EncodeWord(test)}"));
Console.Write(report);
Outcome:
Money :: M3y
Yellow :: Y3w
People :: P4e
A lot of people have put up some good solutions. I have two solutions for you: one uses LINQ and the other does not.
LINQ, Probably not much different from others
if (str.Length < 3) return str;
var midStr = str.Substring(1, str.Length - 2);
var midCount = midStr.Distinct().Count();
return string.Concat(str[0], midCount, str[str.Length - 1]);
Non-LINQ
if (str.Length < 3) return str;
var uniqueLetters = new Dictionary<char, int>();
var midStr = str.Substring(1, str.Length - 2);
foreach (var c in midStr)
{
if (!uniqueLetters.ContainsKey(c))
{
uniqueLetters.Add(c, 0);
}
}
var midCount = uniqueLetters.Keys.Count();
return string.Concat(str[0], midCount, str[str.Length - 1]);
I tested this with the following 6 strings:
Yellow
Money
Purple
Me
You
Hiiiiiiiii
Output:
LINQ: Y3w, Non-LINQ: Y3w
LINQ: M3y, Non-LINQ: M3y
LINQ: P4e, Non-LINQ: P4e
LINQ: Me, Non-LINQ: Me
LINQ: Y1u, Non-LINQ: Y1u
LINQ: H1i, Non-LINQ: H1i
Fiddle
Performance-wise I'd guess they're pretty much the same, if not identical, but I haven't run any real perf test on the two approaches. I can't imagine they'd be much different, if at all. The only real difference is that the second route expands Distinct() into what it probably does under the covers anyway (I haven't looked at the source to see if that's true, but that's a pretty common way to get a count of . And the first route is certainly less code.
I Would use Linq for that purpose:
string[] words = new string[] { "Yellow" , "People", "Money", "Sh" }; // Sh for 2 letter words (or u can insert 0 and then remove the trinary operator)
foreach (string word in words)
{
int uniqeCharsInBetween = word.Substring(1, word.Length - 2).ToCharArray().Distinct().Count();
string result = word[0] + (uniqeCharsInBetween == 0 ? string.Empty : uniqeCharsInBetween.ToString()) + word[word.Length - 1];
Console.WriteLine(result);
}

Find the longest repetition character in String

Let's say i have a string like
string text = "hello dear";
Then I want to determine the longest repetition of coherented characters - in this case it would be ll. If there are more than one with the same count, take any.
I have tried to solve this with linq
char rchar = text.GroupBy(y => y).OrderByDescending(y => y.Count()).Select(x => x).First().Key;
int rcount = text.Where(x => x == rchar).Count();
string routput = new string(rchar, rcount);
But this returns ee. Am I on the right track?
Another solution using LINQ:
string text = "hello dear";
string longestRun = new string(text.Select((c, index) => text.Substring(index).TakeWhile(e => e == c))
.OrderByDescending(e => e.Count())
.First().ToArray());
Console.WriteLine(longestRun); // ll
It selects a sequence of substrings starting with the same repeating character, and creates the result string with the longest of them.
Although a regex or custom Linq extension is fine, if you don't mind "doing it the old way" you can achieve your result with two temporary variables and a classic foreach loop.
The logic is pretty simple, and runs in O(n). Loop through your string and compare the current character with the previous one.
If it's the same, increase your count. If it's different, reset it to 1.
Then, if your count is greater than the previous recorded max count, overwrite the rchar with your current character.
string text = "hello dear";
char rchar = text[0];
int rcount = 1;
int currentCount = 0;
char previousChar = char.MinValue;
foreach (char character in text)
{
if (character != previousChar)
{
currentCount = 1;
}
else
{
currentCount++;
}
if (currentCount >= rcount)
{
rchar = character;
rcount = currentCount;
}
previousChar = character;
}
string routput = new string(rchar, rcount);
It's indeed verbose, but gets the job done.
If you prefer doing things with LINQ, you can do it fairly simply with a LINQ extension:
var text = "hello dear";
var result = string.Join("",
text
.GroupAdjacentBy((l, r) => (l == r)) /* Create groups where the current character is
Is the same as the previous character */
.OrderByDescending(g => g.Count()) //Order by the group lengths
.First() //Take the first group (which is the longest run)
);
With this extension (taken from Use LINQ to group a sequence of numbers with no gaps):
public static class LinqExtensions
{
public static IEnumerable<IEnumerable<T>> GroupAdjacentBy<T>(this IEnumerable<T> source, Func<T, T, bool> predicate)
{
using (var e = source.GetEnumerator())
{
if (e.MoveNext())
{
var list = new List<T> { e.Current };
var pred = e.Current;
while (e.MoveNext())
{
if (predicate(pred, e.Current))
{
list.Add(e.Current);
}
else
{
yield return list;
list = new List<T> { e.Current };
}
pred = e.Current;
}
yield return list;
}
}
}
}
Might be a bit overkill - but I've found GroupAdjacentBy quite useful in other situations as well.
RegEx would be a option
string text = "hello dear";
string Result = string.IsNullOrEmpty(text) ? string.Empty : Regex.Matches(text, #"(.)\1*", RegexOptions.None).Cast<Match>().OrderByDescending(x => x.Length).First().Value;

compare the characters in two strings

In C#, how do I compare the characters in two strings.
For example, let's say I have these two strings
"bc3231dsc" and "bc3462dsc"
How do I programically figure out the the strings
both start with "bc3" and end with "dsc"?
So the given would be two variables:
var1 = "bc3231dsc";
var2 = "bc3462dsc";
After comparing each characters from var1 to var2, I would want the output to be:
leftMatch = "bc3";
center1 = "231";
center2 = "462";
rightMatch = "dsc";
Conditions:
1. The strings will always be a length of 9 character.
2. The strings are not case sensitive.
The string class has 2 methods (StartsWith and Endwith) that you can use.
After reading your question and the already given answers i think there are some constraints are missing, which are maybe obvious to you, but not to the community. But maybe we can do a little guess work:
You'll have a bunch of string pairs that should be compared.
The two strings in each pair are of the same length or you are only interested by comparing the characters read simultaneously from left to right.
Get some kind of enumeration that tells me where each block starts and how long it is.
Due to the fact, that a string is only a enumeration of chars you could use LINQ here to get an idea of the matching characters like this:
private IEnumerable<bool> CommonChars(string first, string second)
{
if (first == null)
throw new ArgumentNullException("first");
if (second == null)
throw new ArgumentNullException("second");
var charsToCompare = first.Zip(second, (LeftChar, RightChar) => new { LeftChar, RightChar });
var matchingChars = charsToCompare.Select(pair => pair.LeftChar == pair.RightChar);
return matchingChars;
}
With this we can proceed and now find out how long each block of consecutive true and false flags are with this method:
private IEnumerable<Tuple<int, int>> Pack(IEnumerable<bool> source)
{
if (source == null)
throw new ArgumentNullException("source");
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
bool current = iterator.Current;
int index = 0;
int length = 1;
while (iterator.MoveNext())
{
if(current != iterator.Current)
{
yield return Tuple.Create(index, length);
index += length;
length = 0;
}
current = iterator.Current;
length++;
}
yield return Tuple.Create(index, length);
}
}
Currently i don't know if there is an already existing LINQ function that provides the same functionality. As far as i have already read it should be possible with SelectMany() (cause in theory you can accomplish any LINQ task with this method), but as an adhoc implementation the above was easier (for me).
These functions could then be used in a way something like this:
var firstString = "bc3231dsc";
var secondString = "bc3462dsc";
var commonChars = CommonChars(firstString, secondString);
var packs = Pack(commonChars);
foreach (var item in packs)
{
Console.WriteLine("Left side: " + firstString.Substring(item.Item1, item.Item2));
Console.WriteLine("Right side: " + secondString.Substring(item.Item1, item.Item2));
Console.WriteLine();
}
Which would you then give this output:
Left side: bc3
Right side: bc3
Left side: 231
Right side: 462
Left side: dsc
Right side: dsc
The biggest drawback is in someway the usage of Tuple cause it leads to the ugly property names Item1 and Item2 which are far away from being instantly readable. But if it is really wanted you could introduce your own simple class holding two integers and has some rock-solid property names. Also currently the information is lost about if each block is shared by both strings or if they are different. But once again it should be fairly simply to get this information also into the tuple or your own class.
static void Main(string[] args)
{
string test1 = "bc3231dsc";
string tes2 = "bc3462dsc";
string firstmatch = GetMatch(test1, tes2, false);
string lasttmatch = GetMatch(test1, tes2, true);
string center1 = test1.Substring(firstmatch.Length, test1.Length -(firstmatch.Length + lasttmatch.Length)) ;
string center2 = test2.Substring(firstmatch.Length, test1.Length -(firstmatch.Length + lasttmatch.Length)) ;
}
public static string GetMatch(string fist, string second, bool isReverse)
{
if (isReverse)
{
fist = ReverseString(fist);
second = ReverseString(second);
}
StringBuilder builder = new StringBuilder();
char[] ar1 = fist.ToArray();
for (int i = 0; i < ar1.Length; i++)
{
if (fist.Length > i + 1 && ar1[i].Equals(second[i]))
{
builder.Append(ar1[i]);
}
else
{
break;
}
}
if (isReverse)
{
return ReverseString(builder.ToString());
}
return builder.ToString();
}
public static string ReverseString(string s)
{
char[] arr = s.ToCharArray();
Array.Reverse(arr);
return new string(arr);
}
Pseudo code of what you need..
int stringpos = 0
string resultstart = ""
while not end of string (either of the two)
{
if string1.substr(stringpos) == string1.substr(stringpos)
resultstart =resultstart + string1.substr(stringpos)
else
exit while
}
resultstart has you start string.. you can do the same going backwards...
Another solution you can use is Regular Expressions.
Regex re = new Regex("^bc3.*?dsc$");
String first = "bc3231dsc";
if(re.IsMatch(first)) {
//Act accordingly...
}
This gives you more flexibility when matching. The pattern above matches any string that starts in bc3 and ends in dsc with anything between except a linefeed. By changing .*? to \d, you could specify that you only want digits between the two fields. From there, the possibilities are endless.
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
class Sample {
static public void Main(){
string s1 = "bc3231dsc";
string s2 = "bc3462dsc";
List<string> common_str = commonStrings(s1,s2);
foreach ( var s in common_str)
Console.WriteLine(s);
}
static public List<string> commonStrings(string s1, string s2){
int len = s1.Length;
char [] match_chars = new char[len];
for(var i = 0; i < len ; ++i)
match_chars[i] = (Char.ToLower(s1[i])==Char.ToLower(s2[i]))? '#' : '_';
string pat = new String(match_chars);
Regex regex = new Regex("(#+)", RegexOptions.Compiled);
List<string> result = new List<string>();
foreach (Match match in regex.Matches(pat))
result.Add(s1.Substring(match.Index, match.Length));
return result;
}
}
for UPDATE CONDITION
using System;
class Sample {
static public void Main(){
string s1 = "bc3231dsc";
string s2 = "bc3462dsc";
int len = 9;//s1.Length;//cond.1)
int l_pos = 0;
int r_pos = len;
for(int i=0;i<len && Char.ToLower(s1[i])==Char.ToLower(s2[i]);++i){
++l_pos;
}
for(int i=len-1;i>0 && Char.ToLower(s1[i])==Char.ToLower(s2[i]);--i){
--r_pos;
}
string leftMatch = s1.Substring(0,l_pos);
string center1 = s1.Substring(l_pos, r_pos - l_pos);
string center2 = s2.Substring(l_pos, r_pos - l_pos);
string rightMatch = s1.Substring(r_pos);
Console.Write(
"leftMatch = \"{0}\"\n" +
"center1 = \"{1}\"\n" +
"center2 = \"{2}\"\n" +
"rightMatch = \"{3}\"\n",leftMatch, center1, center2, rightMatch);
}
}

How can I speed this loop up? Is there a class for replacing multiple terms at at time?

The loop:
var pattern = _dict[key];
string before;
do
{
before = pattern;
foreach (var pair in _dict)
if (key != pair.Key)
pattern = pattern.Replace(string.Concat("{", pair.Key, "}"), string.Concat("(", pair.Value, ")"));
} while (pattern != before);
return pattern;
It just does a repeated find-and-replace on a bunch of keys. The dictionary is just <string,string>.
I can see 2 improvements to this.
Every time we do pattern.Replace it searches from the beginning of the string again. It would be better if when it hit the first {, it would just look through the list of keys for a match (perhaps using a binary search), and then replace the appropriate one.
The pattern != before bit is how I check if anything was replaced during that iteration. If the pattern.Replace function returned how many or if any replaces actually occured, I wouldn't need this.
However... I don't really want to write a big nasty thing class to do all that. This must be a fairly common scenario? Are there any existng solutions?
Full Class
Thanks to Elian Ebbing and ChrisWue.
class FlexDict : IEnumerable<KeyValuePair<string,string>>
{
private Dictionary<string, string> _dict = new Dictionary<string, string>();
private static readonly Regex _re = new Regex(#"{([_a-z][_a-z0-9-]*)}", RegexOptions.Compiled | RegexOptions.IgnoreCase);
public void Add(string key, string pattern)
{
_dict[key] = pattern;
}
public string Expand(string pattern)
{
pattern = _re.Replace(pattern, match =>
{
string key = match.Groups[1].Value;
if (_dict.ContainsKey(key))
return "(" + Expand(_dict[key]) + ")";
return match.Value;
});
return pattern;
}
public string this[string key]
{
get { return Expand(_dict[key]); }
}
public IEnumerator<KeyValuePair<string, string>> GetEnumerator()
{
foreach (var p in _dict)
yield return new KeyValuePair<string,string>(p.Key, this[p.Key]);
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
Example Usage
class Program
{
static void Main(string[] args)
{
var flex = new FlexDict
{
{"h", #"[0-9a-f]"},
{"nonascii", #"[\200-\377]"},
{"unicode", #"\\{h}{1,6}(\r\n|[ \t\r\n\f])?"},
{"escape", #"{unicode}|\\[^\r\n\f0-9a-f]"},
{"nmstart", #"[_a-z]|{nonascii}|{escape}"},
{"nmchar", #"[_a-z0-9-]|{nonascii}|{escape}"},
{"string1", #"""([^\n\r\f\\""]|\\{nl}|{escape})*"""},
{"string2", #"'([^\n\r\f\\']|\\{nl}|{escape})*'"},
{"badstring1", #"""([^\n\r\f\\""]|\\{nl}|{escape})*\\?"},
{"badstring2", #"'([^\n\r\f\\']|\\{nl}|{escape})*\\?"},
{"badcomment1", #"/\*[^*]*\*+([^/*][^*]*\*+)*"},
{"badcomment2", #"/\*[^*]*(\*+[^/*][^*]*)*"},
{"baduri1", #"url\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}"},
{"baduri2", #"url\({w}{string}{w}"},
{"baduri3", #"url\({w}{badstring}"},
{"comment", #"/\*[^*]*\*+([^/*][^*]*\*+)*/"},
{"ident", #"-?{nmstart}{nmchar}*"},
{"name", #"{nmchar}+"},
{"num", #"[0-9]+|[0-9]*\.[0-9]+"},
{"string", #"{string1}|{string2}"},
{"badstring", #"{badstring1}|{badstring2}"},
{"badcomment", #"{badcomment1}|{badcomment2}"},
{"baduri", #"{baduri1}|{baduri2}|{baduri3}"},
{"url", #"([!#$%&*-~]|{nonascii}|{escape})*"},
{"s", #"[ \t\r\n\f]+"},
{"w", #"{s}?"},
{"nl", #"\n|\r\n|\r|\f"},
{"A", #"a|\\0{0,4}(41|61)(\r\n|[ \t\r\n\f])?"},
{"C", #"c|\\0{0,4}(43|63)(\r\n|[ \t\r\n\f])?"},
{"D", #"d|\\0{0,4}(44|64)(\r\n|[ \t\r\n\f])?"},
{"E", #"e|\\0{0,4}(45|65)(\r\n|[ \t\r\n\f])?"},
{"G", #"g|\\0{0,4}(47|67)(\r\n|[ \t\r\n\f])?|\\g"},
{"H", #"h|\\0{0,4}(48|68)(\r\n|[ \t\r\n\f])?|\\h"},
{"I", #"i|\\0{0,4}(49|69)(\r\n|[ \t\r\n\f])?|\\i"},
{"K", #"k|\\0{0,4}(4b|6b)(\r\n|[ \t\r\n\f])?|\\k"},
{"L", #"l|\\0{0,4}(4c|6c)(\r\n|[ \t\r\n\f])?|\\l"},
{"M", #"m|\\0{0,4}(4d|6d)(\r\n|[ \t\r\n\f])?|\\m"},
{"N", #"n|\\0{0,4}(4e|6e)(\r\n|[ \t\r\n\f])?|\\n"},
{"O", #"o|\\0{0,4}(4f|6f)(\r\n|[ \t\r\n\f])?|\\o"},
{"P", #"p|\\0{0,4}(50|70)(\r\n|[ \t\r\n\f])?|\\p"},
{"R", #"r|\\0{0,4}(52|72)(\r\n|[ \t\r\n\f])?|\\r"},
{"S", #"s|\\0{0,4}(53|73)(\r\n|[ \t\r\n\f])?|\\s"},
{"T", #"t|\\0{0,4}(54|74)(\r\n|[ \t\r\n\f])?|\\t"},
{"U", #"u|\\0{0,4}(55|75)(\r\n|[ \t\r\n\f])?|\\u"},
{"X", #"x|\\0{0,4}(58|78)(\r\n|[ \t\r\n\f])?|\\x"},
{"Z", #"z|\\0{0,4}(5a|7a)(\r\n|[ \t\r\n\f])?|\\z"},
{"Z", #"z|\\0{0,4}(5a|7a)(\r\n|[ \t\r\n\f])?|\\z"},
{"CDO", #"<!--"},
{"CDC", #"-->"},
{"INCLUDES", #"~="},
{"DASHMATCH", #"\|="},
{"STRING", #"{string}"},
{"BAD_STRING", #"{badstring}"},
{"IDENT", #"{ident}"},
{"HASH", #"#{name}"},
{"IMPORT_SYM", #"#{I}{M}{P}{O}{R}{T}"},
{"PAGE_SYM", #"#{P}{A}{G}{E}"},
{"MEDIA_SYM", #"#{M}{E}{D}{I}{A}"},
{"CHARSET_SYM", #"#charset\b"},
{"IMPORTANT_SYM", #"!({w}|{comment})*{I}{M}{P}{O}{R}{T}{A}{N}{T}"},
{"EMS", #"{num}{E}{M}"},
{"EXS", #"{num}{E}{X}"},
{"LENGTH", #"{num}({P}{X}|{C}{M}|{M}{M}|{I}{N}|{P}{T}|{P}{C})"},
{"ANGLE", #"{num}({D}{E}{G}|{R}{A}{D}|{G}{R}{A}{D})"},
{"TIME", #"{num}({M}{S}|{S})"},
{"PERCENTAGE", #"{num}%"},
{"NUMBER", #"{num}"},
{"URI", #"{U}{R}{L}\({w}{string}{w}\)|{U}{R}{L}\({w}{url}{w}\)"},
{"BAD_URI", #"{baduri}"},
{"FUNCTION", #"{ident}\("},
};
var testStrings = new[] { #"""str""", #"'str'", "5", "5.", "5.0", "a", "alpha", "url(hello)",
"url(\"hello\")", "url(\"blah)", #"\g", #"/*comment*/", #"/**/", #"<!--", #"-->", #"~=",
"|=", #"#hash", "#import", "#page", "#media", "#charset", "!/*iehack*/important"};
foreach (var pair in flex)
{
Console.WriteLine("{0}\n\t{1}\n", pair.Key, pair.Value);
}
var sw = Stopwatch.StartNew();
foreach (var str in testStrings)
{
Console.WriteLine("{0} matches: ", str);
foreach (var pair in flex)
{
if (Regex.IsMatch(str, "^(" + pair.Value + ")$", RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
Console.WriteLine(" {0}", pair.Key);
}
}
Console.WriteLine("\nRan in {0} ms", sw.ElapsedMilliseconds);
Console.ReadLine();
}
}
Purpose
For building complex regular expressions that may extend eachother. Namely, I'm trying to implement the css spec.
I think it would be faster if you look for any occurrences of {foo} using a regular expression, and then use a MatchEvaluator that replaces the {foo} if foo happens to be a key in the dictionary.
I have currently no visual studio here, but I guess this is functionally equivalent with your code example:
var pattern = _dict[key];
bool isChanged = false;
do
{
isChanged = false;
pattern = Regex.Replace(pattern, "{([^}]+)}", match => {
string matchKey = match.Groups[1].Value;
if (matchKey != key && _dict.ContainsKey(matchKey))
{
isChanged = true;
return "(" + _dict[matchKey] + ")";
}
return match.Value;
});
} while (isChanged);
Can I ask you why you need the do/while loop? Can the value of a key in the dictionary again contain {placeholders} that have to be replaced? Can you be sure you don't get stuck in an infinite loop where key "A" contains "Blahblah {B}" and key "B" contains "Blahblah {A}"?
Edit: further improvements would be:
Using a precompiled Regex.
Using recursion instead of a loop (see ChrisWue's comment).
Using _dict.TryGetValue(), as in Guffa's code.
You will end up with an O(n) algorithm where n is the size of the output, so you can't do much better than this.
You should be able to use a regular expression to find the matches. Then you can also make use of the fast lookup of the dictionary and not just use it as a list.
var pattern = _dict[key];
bool replaced = false;
do {
pattern = Regex.Replace(pattern, #"\{([^\}]+)\}", m => {
string k = m.Groups[1].Value;
string value;
if (k != key && _dict.TryGetValue(k, out value) {
replaced = true;
return "(" + value + ")";
} else {
return "{" + k + "}";
}
});
} while (replaced);
return pattern;
You can implement the following algorithm:
Search for { in source string
Copy everything upto { to StringBuilder
Find matching } (the search is done from last fond position)
Compare value between { and } to keys in your dictionary
If it matches copy to String builder ( + Value + )
Else copy from source string
If source string end is not reached go to step 1
Could you use PLINQ at all?
Something along the lines of:
var keys = dict.KeyCollection.Where(k => k != key);
bool replacementMade = keys.Any();
foreach(var k in keys.AsParallel(), () => {replacement code})

How to word by word iterate in string in C#?

I want to iterate over string as word by word.
If I have a string "incidentno and fintype or unitno", I would like to read every word one by one as "incidentno", "and", "fintype", "or", and "unitno".
foreach (string word in "incidentno and fintype or unitno".Split(' ')) {
...
}
var regex = new Regex(#"\b[\s,\.-:;]*");
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));
This works even if you have ".,; tabs and new lines" between your words.
Slightly twisted I know, but you could define an iterator block as an extension method on strings. e.g.
/// <summary>
/// Sweep over text
/// </summary>
/// <param name="Text"></param>
/// <returns></returns>
public static IEnumerable<string> WordList(this string Text)
{
int cIndex = 0;
int nIndex;
while ((nIndex = Text.IndexOf(' ', cIndex + 1)) != -1)
{
int sIndex = (cIndex == 0 ? 0 : cIndex + 1);
yield return Text.Substring(sIndex, nIndex - sIndex);
cIndex = nIndex;
}
yield return Text.Substring(cIndex + 1);
}
foreach (string word in "incidentno and fintype or unitno".WordList())
System.Console.WriteLine("'" + word + "'");
Which has the advantage of not creating a big array for long strings.
Use the Split method of the string class
string[] words = "incidentno and fintype or unitno".Split(" ");
This will split on spaces, so "words" will have [incidentno,and,fintype,or,unitno].
Assuming the words are always separated by a blank, you could use String.Split() to get an Array of your words.
There are multiple ways to accomplish this. Two of the most convenient methods (in my opinion) are:
Using string.Split() to create an array. I would probably use this method, because it is the most self-explanatory.
example:
string startingSentence = "incidentno and fintype or unitno";
string[] seperatedWords = startingSentence.Split(' ');
Alternatively, you could use (this is what I would use):
string[] seperatedWords = startingSentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
StringSplitOptions.RemoveEmptyEntries will remove any empty entries from your array that may occur due to extra whitespace and other minor problems.
Next - to process the words, you would use:
foreach (string word in seperatedWords)
{
//Do something
}
Or, you can use regular expressions to solve this problem, as Darin demonstrated (a copy is below).
example:
var regex = new Regex(#"\b[\s,\.-:;]*");
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));
For processing, you can use similar code to the first option.
foreach (string word in words)
{
//Do something
}
Of course, there are many ways to solve this problem, but I think that these two would be the simplest to implement and maintain. I would go with the first option (using string.Split()) just because regex can sometimes become quite confusing, while a split will function correctly most of the time.
When using split, what about checking for empty entries?
string sentence = "incidentno and fintype or unitno"
string[] words = sentence.Split(new char[] { ' ', ',' ,';','\t','\n', '\r'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
// Process
}
EDIT:
I can't comment so I'm posting here but this (posted above) works:
foreach (string word in "incidentno and fintype or unitno".Split(' '))
{
...
}
My understanding of foreach is that it first does a GetEnumerator() and the calles .MoveNext until false is returned. So the .Split won't be re-evaluated on each iteration
public static string[] MyTest(string inword, string regstr)
{
var regex = new Regex(regstr);
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase);
return words;
}
? MyTest("incidentno, and .fintype- or; :unitno",#"[^\w+]")
[0]: "incidentno"
[1]: "and"
[2]: "fintype"
[3]: "or"
[4]: "unitno"
I'd like to add some information to JDunkerley's awnser.
You can easily make this method more reliable if you give a string or char parameter to search for.
public static IEnumerable<string> WordList(this string Text,string Word)
{
int cIndex = 0;
int nIndex;
while ((nIndex = Text.IndexOf(Word, cIndex + 1)) != -1)
{
int sIndex = (cIndex == 0 ? 0 : cIndex + 1);
yield return Text.Substring(sIndex, nIndex - sIndex);
cIndex = nIndex;
}
yield return Text.Substring(cIndex + 1);
}
public static IEnumerable<string> WordList(this string Text, char c)
{
int cIndex = 0;
int nIndex;
while ((nIndex = Text.IndexOf(c, cIndex + 1)) != -1)
{
int sIndex = (cIndex == 0 ? 0 : cIndex + 1);
yield return Text.Substring(sIndex, nIndex - sIndex);
cIndex = nIndex;
}
yield return Text.Substring(cIndex + 1);
}
I write a string processor class.You can use it.
Example:
metaKeywords = bodyText.Process(prepositions).OrderByDescending().TakeTop().GetWords().AsString();
Class:
public static class StringProcessor
{
private static List<String> PrepositionList;
public static string ToNormalString(this string strText)
{
if (String.IsNullOrEmpty(strText)) return String.Empty;
char chNormalKaf = (char)1603;
char chNormalYah = (char)1610;
char chNonNormalKaf = (char)1705;
char chNonNormalYah = (char)1740;
string result = strText.Replace(chNonNormalKaf, chNormalKaf);
result = result.Replace(chNonNormalYah, chNormalYah);
return result;
}
public static List<KeyValuePair<String, Int32>> Process(this String bodyText,
List<String> blackListWords = null,
int minimumWordLength = 3,
char splitor = ' ',
bool perWordIsLowerCase = true)
{
string[] btArray = bodyText.ToNormalString().Split(splitor);
long numberOfWords = btArray.LongLength;
Dictionary<String, Int32> wordsDic = new Dictionary<String, Int32>(1);
foreach (string word in btArray)
{
if (word != null)
{
string lowerWord = word;
if (perWordIsLowerCase)
lowerWord = word.ToLower();
var normalWord = lowerWord.Replace(".", "").Replace("(", "").Replace(")", "")
.Replace("?", "").Replace("!", "").Replace(",", "")
.Replace("<br>", "").Replace(":", "").Replace(";", "")
.Replace("،", "").Replace("-", "").Replace("\n", "").Trim();
if ((normalWord.Length > minimumWordLength && !normalWord.IsMemberOfBlackListWords(blackListWords)))
{
if (wordsDic.ContainsKey(normalWord))
{
var cnt = wordsDic[normalWord];
wordsDic[normalWord] = ++cnt;
}
else
{
wordsDic.Add(normalWord, 1);
}
}
}
}
List<KeyValuePair<String, Int32>> keywords = wordsDic.ToList();
return keywords;
}
public static List<KeyValuePair<String, Int32>> OrderByDescending(this List<KeyValuePair<String, Int32>> list, bool isBasedOnFrequency = true)
{
List<KeyValuePair<String, Int32>> result = null;
if (isBasedOnFrequency)
result = list.OrderByDescending(q => q.Value).ToList();
else
result = list.OrderByDescending(q => q.Key).ToList();
return result;
}
public static List<KeyValuePair<String, Int32>> TakeTop(this List<KeyValuePair<String, Int32>> list, Int32 n = 10)
{
List<KeyValuePair<String, Int32>> result = list.Take(n).ToList();
return result;
}
public static List<String> GetWords(this List<KeyValuePair<String, Int32>> list)
{
List<String> result = new List<String>();
foreach (var item in list)
{
result.Add(item.Key);
}
return result;
}
public static List<Int32> GetFrequency(this List<KeyValuePair<String, Int32>> list)
{
List<Int32> result = new List<Int32>();
foreach (var item in list)
{
result.Add(item.Value);
}
return result;
}
public static String AsString<T>(this List<T> list, string seprator = ", ")
{
String result = string.Empty;
foreach (var item in list)
{
result += string.Format("{0}{1}", item, seprator);
}
return result;
}
private static bool IsMemberOfBlackListWords(this String word, List<String> blackListWords)
{
bool result = false;
if (blackListWords == null) return false;
foreach (var w in blackListWords)
{
if (w.ToNormalString().Equals(word))
{
result = true;
break;
}
}
return result;
}
}

Categories