Query Expansion logic in C# - c#

I have to write logic for expanding queries for solr search engine. I am using
Dictionary<string, string> dicSynon = new Dictionary<string, string>();.
Each time I will be getiing strings like "2 ln ca". In my dictionary I am having synonyms for ln and ca as lane and california. Now I need to pass solr all the combination of strings. Like this
2 ln ca
2 lane ca
2 lane california
2 ln california
Please help me to build the logic....

This is a exercise in using combinatorics and Linqs SelectMany:
First you have to writeyourself some function to give you a sequence of synonyms given a word (including the word, so "2" will result in ("2")) - let's call this 'Synonmys' - using a dictionary it can look like this:
private Dictionary<string, IEnumerable<string>> synonyms = new Dictionary<string, IEnumerable<string>>();
public IEnumerable<string> GetSynonmys(string word)
{
return synonyms.ContainsKey(word) ? synonyms[word] : new[]{word};
}
(you have to fill the Dictionary yourself...)
Having this your task is rather easy - just think on it. You have to combine every synonym of a word with all the combinations you get from doing the task on the rest of the words - that's exactly where you can use SelectMany (I only paste the rest seperated by a space - maybe you should refactor a bit) - the Algorithm yourself is your standard recursive combinations-algorithm - you will see this a lot if you know this kind of problem:
public string[] GetWords(string text)
{
return text.Split(new[]{' '}); // add more seperators if you need
}
public IEnumerable<string> GetCombinations(string[] words, int lookAt = 0)
{
if (lookAt >= words.Length) return new[]{""};
var currentWord = words[lookAt];
var synonymsForCurrentWord = GetSynonmys(currentWord);
var combinationsForRest = GetCombinations(words, lookAt + 1);
return synonymsForCurrentWord.SelectMany(synonym => combinationsForRest.Select(rest => synonym + " " + rest));
}
Here is a complete example:
class Program
{
static void Main(string[] args)
{
var test = new Synonmys(Tuple.Create("ca", "ca"),
Tuple.Create("ca", "california"),
Tuple.Create("ln", "ln"),
Tuple.Create("ln", "lane"));
foreach (var comb in test.GetCombinations("2 ln ca"))
Console.WriteLine("Combination: " + comb);
}
}
class Synonmys
{
private Dictionary<string, IEnumerable<string>> synonyms;
public Synonmys(params Tuple<string, string>[] syns )
{
synonyms = syns.GroupBy(s => s.Item1).ToDictionary(g => g.Key, g => g.Select(kvp => kvp.Item2));
}
private IEnumerable<string> GetSynonmys(string word)
{
return synonyms.ContainsKey(word) ? synonyms[word] : new[]{word};
}
private string[] GetWords(string text)
{
return text.Split(new[]{' '}); // add more seperators if you need
}
private IEnumerable<string> GetCombinations(string[] words, int lookAt = 0)
{
if (lookAt >= words.Length) return new[]{""};
var currentWord = words[lookAt];
var synonymsForCurrentWord = GetSynonmys(currentWord);
var combinationsForRest = GetCombinations(words, lookAt + 1);
return synonymsForCurrentWord.SelectMany(synonym => combinationsForRest.Select(rest => synonym + " " + rest));
}
public IEnumerable<string> GetCombinations(string text)
{
return GetCombinations(GetWords(text));
}
}
Feel free to comment if something is not crystal-clear here ;)

Related

Replace every other of a certain char in a string

I have searched a lot to find a solution to this, but could not find anything. I do however suspect that it is because I don't know what to search for.
First, I have a string that I convert to an array. The string will be formatted like so:
"99.28099822998047,68.375 118.30699729919434,57.625 126.49999713897705,37.875 113.94499683380127,11.048999786376953 96.00499725341797,8.5"
I create the array with the following code:
public static Array StringToArray(string String)
{
var list = new List<string>();
string[] Coords = String.Split(' ', ',');
foreach (string Coord in Coords)
{
list.Add(Coord);
}
var array = list.ToArray();
return array;
}
Now my problem is; I am trying to find a way to convert it back into a string, with the same formatting. So, I could create a string simply using:
public static String ArrayToString(Array array)
{
string String = string.Join(",", array);
return String;
}
and then hopefully replace every 2nd "," with a space (" "). Is this possible? Or are there a whole other way you would do this?
Thank you in advance! I hope my question makes sense.
There is no built-in way of doing what you need. However, it's pretty trivial to achieve what it is you need e.g.
public static string[] StringToArray(string str)
{
return str.Replace(" ", ",").Split(',');
}
public static string ArrayToString(string[] array)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i <= array.Length-1; i++)
{
sb.AppendFormat(i % 2 != 0 ? "{0} " : "{0},", array[i]);
}
return sb.ToString();
}
If those are pairs of coordinates, you can start by parsing them like pairs, not like separate numbers:
public static IEnumerable<string[]> ParseCoordinates(string input)
{
return input.Split(' ').Select(vector => vector.Split(','));
}
It is easier then to reconstruct the original string:
public static string PrintCoordinates(IEnumerable<string[]> coords)
{
return String.Join(" ", coords.Select(vector => String.Join(",", vector)));
}
But if you absolutely need to have your data in a flat structure like array, it is then possible to convert it to a more structured format:
public static IEnumerable<string[]> Pairwise(string[] coords)
{
coords.Zip(coords.Skip(1), (coord1, coord2) => new[] { coord1, coord2 });
}
You then can use this method in conjunction with PrintCoordinates to reconstruct your initial string.
Here is a route to do it. I don't think other solutions were removing last comma or space. I also include a test.
public static String ArrayToString(Array array)
{
var useComma = true;
var stringBuilder = new StringBuilder();
foreach (var value in array)
{
if (useComma)
{
stringBuilder.AppendFormat("{0}{1}", value, ",");
}
else
{
stringBuilder.AppendFormat("{0}{1}", value, " ");
}
useComma = !useComma;
}
// Remove last space or comma
stringBuilder.Length = stringBuilder.Length - 1;
return stringBuilder.ToString();
}
[TestMethod]
public void ArrayToStringTest()
{
var expectedStringValue =
"99.28099822998047,68.375 118.30699729919434,57.625 126.49999713897705,37.875 113.94499683380127,11.048999786376953 96.00499725341797,8.5";
var array = new[]
{
"99.28099822998047",
"68.375",
"118.30699729919434",
"57.625",
"126.49999713897705",
"37.875",
"113.94499683380127",
"11.048999786376953",
"96.00499725341797",
"8.5",
};
var actualStringValue = ArrayToString(array);
Assert.AreEqual(expectedStringValue, actualStringValue);
}
Another way of doing it:
string inputString = "1.11,11.3 2.22,12.4 2.55,12.8";
List<string[]> splitted = inputString.Split(' ').Select(a => a.Split(',')).ToList();
string joined = string.Join(" ", splitted.Select(a => string.Join(",",a)).ToArray());
"splitted" list will look like this:
1.11 11.3
2.22 12.4
2.55 12.8
"joined" string is the same as "inputString"
Here's another approach to this problem.
public static string ArrayToString(string[] array)
{
Debug.Assert(array.Length % 2 == 0, "Array is not dividable by two.");
// Group all coordinates as pairs of two.
int index = 0;
var coordinates = from item in array
group item by index++ / 2
into pair
select pair;
// Format each coordinate pair with a comma.
var formattedCoordinates = coordinates.Select(i => string.Join(",", i));
// Now concatinate all the pairs with a space.
return string.Join(" ", formattedCoordinates);
}
And a simple demonstration:
public static void A_Simple_Test()
{
string expected = "1,2 3,4";
string[] array = new string[] { "1", "2", "3", "4" };
Debug.Assert(expected == ArrayToString(array));
}

Need algorithm to make simple program (sentence permutations)

I really cant understand how to make a simple algorithm on C# to solve my problem. So, we have a sentences:
{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}.
So, my program should make a lot of sentences looks like:
Hello my mate.
Hello my m8.
Hello my friend.
Hello my friends.
Hi my mate.
...
Hi-Hi my friends.
I know, there are a lot of programs which could do this, but i'd like to make it myself. Ofcourse, it should work with this too:
{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}, {i|we} want to {tell|say} you {hello|hi|hi-hi}.
Update I just wasn't too happy about my using the regexen to parse so simple input; yet I disliked the manual index manipulation jungle found in other answers.
So I replaced the tokenizing with a Enumerator-based scanner with two alternating token-states. This is more justified by the complexity of the input, and has a 'Linqy' feel to it (although it really isn't Linq). I have kept the original Regex based parser at the end of my post for interested readers.
This just had to be solved using Eric Lippert's/IanG's CartesianProduct Linq extension method, in which the core of the program becomes:
public static void Main(string[] args)
{
const string data = #"{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}, {i|we} want to {tell|say} you {hello|hi|hi-hi}.";
var pockets = Tokenize(data.GetEnumerator());
foreach (var result in CartesianProduct(pockets))
Console.WriteLine(string.Join("", result.ToArray()));
}
Using just two regexen (chunks and legs) to do the parsing into 'pockets', it becomes a matter of writing the CartesianProduct to the console :) Here is the full working code (.NET 3.5+):
using System;
using System.Text;
using System.Text.RegularExpressions;
using System.Linq;
using System.Collections.Generic;
namespace X
{
static class Y
{
private static bool ReadTill(this IEnumerator<char> input, string stopChars, Action<StringBuilder> action)
{
var sb = new StringBuilder();
try
{
while (input.MoveNext())
if (stopChars.Contains(input.Current))
return true;
else
sb.Append(input.Current);
} finally
{
action(sb);
}
return false;
}
private static IEnumerable<IEnumerable<string>> Tokenize(IEnumerator<char> input)
{
var result = new List<IEnumerable<string>>();
while(input.ReadTill("{", sb => result.Add(new [] { sb.ToString() })) &&
input.ReadTill("}", sb => result.Add(sb.ToString().Split('|'))))
{
// Console.WriteLine("Expected cumulative results: " + result.Select(a => a.Count()).Aggregate(1, (i,j) => i*j));
}
return result;
}
public static void Main(string[] args)
{
const string data = #"{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}, {i|we} want to {tell|say} you {hello|hi|hi-hi}.";
var pockets = Tokenize(data.GetEnumerator());
foreach (var result in CartesianProduct(pockets))
Console.WriteLine(string.Join("", result.ToArray()));
}
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] {item}));
}
}
}
Old Regex based parsing:
static readonly Regex chunks = new Regex(#"^(?<chunk>{.*?}|.*?(?={|$))+$", RegexOptions.Compiled);
static readonly Regex legs = new Regex(#"^{((?<alternative>.*?)[\|}])+(?<=})$", RegexOptions.Compiled);
private static IEnumerable<String> All(this Regex regex, string text, string group)
{
return !regex.IsMatch(text)
? new [] { text }
: regex.Match(text).Groups[group].Captures.Cast<Capture>().Select(c => c.Value);
}
public static void Main(string[] args)
{
const string data = #"{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}, {i|we} want to {tell|say} you {hello|hi|hi-hi}.";
var pockets = chunks.All(data, "chunk").Select(v => legs.All(v, "alternative"));
The rest is unchanged
Not sure what you need Linq (#user568262) or "simple" recursion (#Azad Salahli) for. Here's my take on it:
using System;
using System.Text;
class Program
{
static Random rng = new Random();
static string GetChoiceTemplatingResult(string t)
{
StringBuilder res = new StringBuilder();
for (int i = 0; i < t.Length; ++i)
if (t[i] == '{')
{
int j;
for (j = i + 1; j < t.Length; ++j)
if (t[j] == '}')
{
if (j - i < 1) continue;
var choices = t.Substring(i + 1, j - i - 1).Split('|');
res.Append(choices[rng.Next(choices.Length)]);
i = j;
break;
}
if (j == t.Length)
throw new InvalidOperationException("No matching } found.");
}
else
res.Append(t[i]);
return res.ToString();
}
static void Main(string[] args)
{
Console.WriteLine(GetChoiceTemplatingResult(
"{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}, {i|we} want to {tell|say} you {hello|hi|hi-hi}."));
}
}
As others have noted, you can solve your problem by splitting up the string into a sequence of sets, and then taking the Cartesian product of all of those sets. I wrote a bit about generating arbitrary Cartesial products here:
http://blogs.msdn.com/b/ericlippert/archive/2010/06/28/computing-a-cartesian-product-with-linq.aspx
An alternative approach, more powerful than that, is to declare a grammar for your language and then write a program that generates every string in that language. I wrote a long series of articles on how to do so. It starts here:
http://blogs.msdn.com/b/ericlippert/archive/2010/04/26/every-program-there-is-part-one.aspx
You can use a Tuple to hold index values of each collection.
For example, you would have something like:
List<string> Greetings = new List<string>()
{
"Hello",
"Hi",
"Hallo"
};
List<string> Targets = new List<string>()
{
"Mate",
"m8",
"friend",
"friends"
};
So now you have your greetings, let's create random numbers and fetch items.
static void Main(string[] args)
{
List<string> Greetings = new List<string>()
{
"Hello",
"Hi",
"Hallo"
};
List<string> Targets = new List<string>()
{
"Mate",
"m8",
"friend",
"friends"
};
var combinations = new List<Tuple<int, int>>();
Random random = new Random();
//Say you want 5 unique combinations.
while (combinations.Count < 6)
{
Tuple<int, int> tmpCombination = new Tuple<int, int>(random.Next(Greetings.Count), random.Next(Targets.Count));
if (!combinations.Contains(tmpCombination))
{
combinations.Add(tmpCombination);
}
}
foreach (var item in combinations)
{
Console.WriteLine("{0} my {1}", Greetings[item.Item1], Targets[item.Item2]);
}
Console.ReadKey();
}
This doesn't look trivial. You need to
1. do some parsing, to extract all the lists of words that you want to combine,
2. obtain all the actual combinations of these words (which is made harder by the fact that the number of lists you want to combine is not fixed)
3. rebuild the original sentence putting all the combinations in the place of the group they came from
part 1 (the parsing part) is probably the easiest: it could be done with a Regex like this
// get all the text within {} pairs
var pattern = #"\{(.*?)\}";
var query = "{Hello|Hi|Hi-Hi} my {mate|m8|friend|friends}.";
var matches = Regex.Matches(query, pattern);
// create a List of Lists
for(int i=0; i< matches.Count; i++)
{
var nl = matches[i].Groups[1].ToString().Split('|').ToList();
lists.Add(nl);
// build a "template" string like "{0} my {1}"
query = query.Replace(matches[i].Groups[1].ToString(), i.ToString());
}
for part 2 (taking a List of Lists and obtain all resulting combinations) you can refer to this answer
for part 3 (rebuilding your original sentence) you can now take the "template" string you have in query and use String.Format to substitute all the {0}, {1} .... with the combined values from part 2
// just one example,
// you will need to loop through all the combinations obtained from part 2
var OneResultingCombination = new List<string>() {"hi", "mate"};
var oneResult = string.Format(query, OneResultingCombination.ToArray());

How can I speed this loop up? Is there a class for replacing multiple terms at at time?

The loop:
var pattern = _dict[key];
string before;
do
{
before = pattern;
foreach (var pair in _dict)
if (key != pair.Key)
pattern = pattern.Replace(string.Concat("{", pair.Key, "}"), string.Concat("(", pair.Value, ")"));
} while (pattern != before);
return pattern;
It just does a repeated find-and-replace on a bunch of keys. The dictionary is just <string,string>.
I can see 2 improvements to this.
Every time we do pattern.Replace it searches from the beginning of the string again. It would be better if when it hit the first {, it would just look through the list of keys for a match (perhaps using a binary search), and then replace the appropriate one.
The pattern != before bit is how I check if anything was replaced during that iteration. If the pattern.Replace function returned how many or if any replaces actually occured, I wouldn't need this.
However... I don't really want to write a big nasty thing class to do all that. This must be a fairly common scenario? Are there any existng solutions?
Full Class
Thanks to Elian Ebbing and ChrisWue.
class FlexDict : IEnumerable<KeyValuePair<string,string>>
{
private Dictionary<string, string> _dict = new Dictionary<string, string>();
private static readonly Regex _re = new Regex(#"{([_a-z][_a-z0-9-]*)}", RegexOptions.Compiled | RegexOptions.IgnoreCase);
public void Add(string key, string pattern)
{
_dict[key] = pattern;
}
public string Expand(string pattern)
{
pattern = _re.Replace(pattern, match =>
{
string key = match.Groups[1].Value;
if (_dict.ContainsKey(key))
return "(" + Expand(_dict[key]) + ")";
return match.Value;
});
return pattern;
}
public string this[string key]
{
get { return Expand(_dict[key]); }
}
public IEnumerator<KeyValuePair<string, string>> GetEnumerator()
{
foreach (var p in _dict)
yield return new KeyValuePair<string,string>(p.Key, this[p.Key]);
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
Example Usage
class Program
{
static void Main(string[] args)
{
var flex = new FlexDict
{
{"h", #"[0-9a-f]"},
{"nonascii", #"[\200-\377]"},
{"unicode", #"\\{h}{1,6}(\r\n|[ \t\r\n\f])?"},
{"escape", #"{unicode}|\\[^\r\n\f0-9a-f]"},
{"nmstart", #"[_a-z]|{nonascii}|{escape}"},
{"nmchar", #"[_a-z0-9-]|{nonascii}|{escape}"},
{"string1", #"""([^\n\r\f\\""]|\\{nl}|{escape})*"""},
{"string2", #"'([^\n\r\f\\']|\\{nl}|{escape})*'"},
{"badstring1", #"""([^\n\r\f\\""]|\\{nl}|{escape})*\\?"},
{"badstring2", #"'([^\n\r\f\\']|\\{nl}|{escape})*\\?"},
{"badcomment1", #"/\*[^*]*\*+([^/*][^*]*\*+)*"},
{"badcomment2", #"/\*[^*]*(\*+[^/*][^*]*)*"},
{"baduri1", #"url\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}"},
{"baduri2", #"url\({w}{string}{w}"},
{"baduri3", #"url\({w}{badstring}"},
{"comment", #"/\*[^*]*\*+([^/*][^*]*\*+)*/"},
{"ident", #"-?{nmstart}{nmchar}*"},
{"name", #"{nmchar}+"},
{"num", #"[0-9]+|[0-9]*\.[0-9]+"},
{"string", #"{string1}|{string2}"},
{"badstring", #"{badstring1}|{badstring2}"},
{"badcomment", #"{badcomment1}|{badcomment2}"},
{"baduri", #"{baduri1}|{baduri2}|{baduri3}"},
{"url", #"([!#$%&*-~]|{nonascii}|{escape})*"},
{"s", #"[ \t\r\n\f]+"},
{"w", #"{s}?"},
{"nl", #"\n|\r\n|\r|\f"},
{"A", #"a|\\0{0,4}(41|61)(\r\n|[ \t\r\n\f])?"},
{"C", #"c|\\0{0,4}(43|63)(\r\n|[ \t\r\n\f])?"},
{"D", #"d|\\0{0,4}(44|64)(\r\n|[ \t\r\n\f])?"},
{"E", #"e|\\0{0,4}(45|65)(\r\n|[ \t\r\n\f])?"},
{"G", #"g|\\0{0,4}(47|67)(\r\n|[ \t\r\n\f])?|\\g"},
{"H", #"h|\\0{0,4}(48|68)(\r\n|[ \t\r\n\f])?|\\h"},
{"I", #"i|\\0{0,4}(49|69)(\r\n|[ \t\r\n\f])?|\\i"},
{"K", #"k|\\0{0,4}(4b|6b)(\r\n|[ \t\r\n\f])?|\\k"},
{"L", #"l|\\0{0,4}(4c|6c)(\r\n|[ \t\r\n\f])?|\\l"},
{"M", #"m|\\0{0,4}(4d|6d)(\r\n|[ \t\r\n\f])?|\\m"},
{"N", #"n|\\0{0,4}(4e|6e)(\r\n|[ \t\r\n\f])?|\\n"},
{"O", #"o|\\0{0,4}(4f|6f)(\r\n|[ \t\r\n\f])?|\\o"},
{"P", #"p|\\0{0,4}(50|70)(\r\n|[ \t\r\n\f])?|\\p"},
{"R", #"r|\\0{0,4}(52|72)(\r\n|[ \t\r\n\f])?|\\r"},
{"S", #"s|\\0{0,4}(53|73)(\r\n|[ \t\r\n\f])?|\\s"},
{"T", #"t|\\0{0,4}(54|74)(\r\n|[ \t\r\n\f])?|\\t"},
{"U", #"u|\\0{0,4}(55|75)(\r\n|[ \t\r\n\f])?|\\u"},
{"X", #"x|\\0{0,4}(58|78)(\r\n|[ \t\r\n\f])?|\\x"},
{"Z", #"z|\\0{0,4}(5a|7a)(\r\n|[ \t\r\n\f])?|\\z"},
{"Z", #"z|\\0{0,4}(5a|7a)(\r\n|[ \t\r\n\f])?|\\z"},
{"CDO", #"<!--"},
{"CDC", #"-->"},
{"INCLUDES", #"~="},
{"DASHMATCH", #"\|="},
{"STRING", #"{string}"},
{"BAD_STRING", #"{badstring}"},
{"IDENT", #"{ident}"},
{"HASH", #"#{name}"},
{"IMPORT_SYM", #"#{I}{M}{P}{O}{R}{T}"},
{"PAGE_SYM", #"#{P}{A}{G}{E}"},
{"MEDIA_SYM", #"#{M}{E}{D}{I}{A}"},
{"CHARSET_SYM", #"#charset\b"},
{"IMPORTANT_SYM", #"!({w}|{comment})*{I}{M}{P}{O}{R}{T}{A}{N}{T}"},
{"EMS", #"{num}{E}{M}"},
{"EXS", #"{num}{E}{X}"},
{"LENGTH", #"{num}({P}{X}|{C}{M}|{M}{M}|{I}{N}|{P}{T}|{P}{C})"},
{"ANGLE", #"{num}({D}{E}{G}|{R}{A}{D}|{G}{R}{A}{D})"},
{"TIME", #"{num}({M}{S}|{S})"},
{"PERCENTAGE", #"{num}%"},
{"NUMBER", #"{num}"},
{"URI", #"{U}{R}{L}\({w}{string}{w}\)|{U}{R}{L}\({w}{url}{w}\)"},
{"BAD_URI", #"{baduri}"},
{"FUNCTION", #"{ident}\("},
};
var testStrings = new[] { #"""str""", #"'str'", "5", "5.", "5.0", "a", "alpha", "url(hello)",
"url(\"hello\")", "url(\"blah)", #"\g", #"/*comment*/", #"/**/", #"<!--", #"-->", #"~=",
"|=", #"#hash", "#import", "#page", "#media", "#charset", "!/*iehack*/important"};
foreach (var pair in flex)
{
Console.WriteLine("{0}\n\t{1}\n", pair.Key, pair.Value);
}
var sw = Stopwatch.StartNew();
foreach (var str in testStrings)
{
Console.WriteLine("{0} matches: ", str);
foreach (var pair in flex)
{
if (Regex.IsMatch(str, "^(" + pair.Value + ")$", RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
Console.WriteLine(" {0}", pair.Key);
}
}
Console.WriteLine("\nRan in {0} ms", sw.ElapsedMilliseconds);
Console.ReadLine();
}
}
Purpose
For building complex regular expressions that may extend eachother. Namely, I'm trying to implement the css spec.
I think it would be faster if you look for any occurrences of {foo} using a regular expression, and then use a MatchEvaluator that replaces the {foo} if foo happens to be a key in the dictionary.
I have currently no visual studio here, but I guess this is functionally equivalent with your code example:
var pattern = _dict[key];
bool isChanged = false;
do
{
isChanged = false;
pattern = Regex.Replace(pattern, "{([^}]+)}", match => {
string matchKey = match.Groups[1].Value;
if (matchKey != key && _dict.ContainsKey(matchKey))
{
isChanged = true;
return "(" + _dict[matchKey] + ")";
}
return match.Value;
});
} while (isChanged);
Can I ask you why you need the do/while loop? Can the value of a key in the dictionary again contain {placeholders} that have to be replaced? Can you be sure you don't get stuck in an infinite loop where key "A" contains "Blahblah {B}" and key "B" contains "Blahblah {A}"?
Edit: further improvements would be:
Using a precompiled Regex.
Using recursion instead of a loop (see ChrisWue's comment).
Using _dict.TryGetValue(), as in Guffa's code.
You will end up with an O(n) algorithm where n is the size of the output, so you can't do much better than this.
You should be able to use a regular expression to find the matches. Then you can also make use of the fast lookup of the dictionary and not just use it as a list.
var pattern = _dict[key];
bool replaced = false;
do {
pattern = Regex.Replace(pattern, #"\{([^\}]+)\}", m => {
string k = m.Groups[1].Value;
string value;
if (k != key && _dict.TryGetValue(k, out value) {
replaced = true;
return "(" + value + ")";
} else {
return "{" + k + "}";
}
});
} while (replaced);
return pattern;
You can implement the following algorithm:
Search for { in source string
Copy everything upto { to StringBuilder
Find matching } (the search is done from last fond position)
Compare value between { and } to keys in your dictionary
If it matches copy to String builder ( + Value + )
Else copy from source string
If source string end is not reached go to step 1
Could you use PLINQ at all?
Something along the lines of:
var keys = dict.KeyCollection.Where(k => k != key);
bool replacementMade = keys.Any();
foreach(var k in keys.AsParallel(), () => {replacement code})

Extract domain name from URL in C#

This question has answer in other languages/platforms but I couldn't find a robust solution in C#. Here I'm looking for the part of URL which we use in WHOIS so I'm not interested in sub-domains, port, schema, etc.
Example 1: http://s1.website.co.uk/folder/querystring?key=value => website.co.uk
Example 2: ftp://username:password#website.com => website.com
The result should be the same when the owner in whois is the same so sub1.xyz.com and sub2.xyz.com both belong to who has the xyz.com which I'm need to extract from a URL.
I needed the same, so I wrote a class that you can copy and paste into your solution. It uses a hard coded string array of tld's. http://pastebin.com/raw.php?i=VY3DCNhp
Console.WriteLine(GetDomain.GetDomainFromUrl("http://www.beta.microsoft.com/path/page.htm"));
outputs microsoft.com
and
Console.WriteLine(GetDomain.GetDomainFromUrl("http://www.beta.microsoft.co.uk/path/page.htm"));
outputs microsoft.co.uk
As #Pete noted, this is a little bit complicated, but I'll give it a try.
Note that this application must contain a complete list of known TLD's. These can be retrieved from http://publicsuffix.org/. Left extracting the list from this site as an exercise for the reader.
class Program
{
static void Main(string[] args)
{
var testCases = new[]
{
"www.domain.com.ac",
"www.domain.ac",
"domain.com.ac",
"domain.ac",
"localdomain",
"localdomain.local"
};
foreach (string testCase in testCases)
{
Console.WriteLine("{0} => {1}", testCase, UriHelper.GetDomainFromUri(new Uri("http://" + testCase + "/")));
}
/* Produces the following results:
www.domain.com.ac => domain.com.ac
www.domain.ac => domain.ac
domain.com.ac => domain.com.ac
domain.ac => domain.ac
localdomain => localdomain
localdomain.local => localdomain.local
*/
}
}
public static class UriHelper
{
private static HashSet<string> _tlds;
static UriHelper()
{
_tlds = new HashSet<string>
{
"com.ac",
"edu.ac",
"gov.ac",
"net.ac",
"mil.ac",
"org.ac",
"ac"
// Complete this list from http://publicsuffix.org/.
};
}
public static string GetDomainFromUri(Uri uri)
{
return GetDomainFromHostName(uri.Host);
}
public static string GetDomainFromHostName(string hostName)
{
string[] hostNameParts = hostName.Split('.');
if (hostNameParts.Length == 1)
return hostNameParts[0];
int matchingParts = FindMatchingParts(hostNameParts, 1);
return GetPartOfHostName(hostNameParts, hostNameParts.Length - matchingParts);
}
private static int FindMatchingParts(string[] hostNameParts, int offset)
{
if (offset == hostNameParts.Length)
return hostNameParts.Length;
string domain = GetPartOfHostName(hostNameParts, offset);
if (_tlds.Contains(domain.ToLowerInvariant()))
return (hostNameParts.Length - offset) + 1;
return FindMatchingParts(hostNameParts, offset + 1);
}
private static string GetPartOfHostName(string[] hostNameParts, int offset)
{
var sb = new StringBuilder();
for (int i = offset; i < hostNameParts.Length; i++)
{
if (sb.Length > 0)
sb.Append('.');
sb.Append(hostNameParts[i]);
}
string domain = sb.ToString();
return domain;
}
}
The closest you could get is the System.Uri.Host property, which would extract the sub1.xyz.com portion. Unfortunately, it's hard to know what exactly is the "toplevel" portion of the host (e.g. sub1.foo.co.uk versus sub1.xyz.com)
if you need to domain name then you can use URi.hostadress in .net
if you need the url from content then you need to parse them using regex.

Eric Lippert's challenge "comma-quibbling", best answer?

I wanted to bring this challenge to the attention of the stackoverflow community. The original problem and answers are here. BTW, if you did not follow it before, you should try to read Eric's blog, it is pure wisdom.
Summary:
Write a function that takes a non-null IEnumerable and returns a string with the following characteristics:
If the sequence is empty the resulting string is "{}".
If the sequence is a single item "ABC" then the resulting string is "{ABC}".
If the sequence is the two item sequence "ABC", "DEF" then the resulting string is "{ABC and DEF}".
If the sequence has more than two items, say, "ABC", "DEF", "G", "H" then the resulting string is "{ABC, DEF, G and H}". (Note: no Oxford comma!)
As you can see even our very own Jon Skeet (yes, it is well known that he can be in two places at the same time) has posted a solution but his (IMHO) is not the most elegant although probably you can not beat its performance.
What do you think? There are pretty good options there. I really like one of the solutions that involves the select and aggregate methods (from Fernando Nicolet). Linq is very powerful and dedicating some time to challenges like this make you learn a lot. I twisted it a bit so it is a bit more performant and clear (by using Count and avoiding Reverse):
public static string CommaQuibbling(IEnumerable<string> items)
{
int last = items.Count() - 1;
Func<int, string> getSeparator = (i) => i == 0 ? string.Empty : (i == last ? " and " : ", ");
string answer = string.Empty;
return "{" + items.Select((s, i) => new { Index = i, Value = s })
.Aggregate(answer, (s, a) => s + getSeparator(a.Index) + a.Value) + "}";
}
Inefficient, but I think clear.
public static string CommaQuibbling(IEnumerable<string> items)
{
List<String> list = new List<String>(items);
if (list.Count == 0) { return "{}"; }
if (list.Count == 1) { return "{" + list[0] + "}"; }
String[] initial = list.GetRange(0, list.Count - 1).ToArray();
return "{" + String.Join(", ", initial) + " and " + list[list.Count - 1] + "}";
}
If I was maintaining the code, I'd prefer this to more clever versions.
How about this approach? Purely cumulative - no back-tracking, and only iterates once. For raw performance, I'm not sure you'll do better with LINQ etc, regardless of how "pretty" a LINQ answer might be.
using System;
using System.Collections.Generic;
using System.Text;
static class Program
{
public static string CommaQuibbling(IEnumerable<string> items)
{
StringBuilder sb = new StringBuilder('{');
using (var iter = items.GetEnumerator())
{
if (iter.MoveNext())
{ // first item can be appended directly
sb.Append(iter.Current);
if (iter.MoveNext())
{ // more than one; only add each
// term when we know there is another
string lastItem = iter.Current;
while (iter.MoveNext())
{ // middle term; use ", "
sb.Append(", ").Append(lastItem);
lastItem = iter.Current;
}
// add the final term; since we are on at least the
// second term, always use " and "
sb.Append(" and ").Append(lastItem);
}
}
}
return sb.Append('}').ToString();
}
static void Main()
{
Console.WriteLine(CommaQuibbling(new string[] { }));
Console.WriteLine(CommaQuibbling(new string[] { "ABC" }));
Console.WriteLine(CommaQuibbling(new string[] { "ABC", "DEF" }));
Console.WriteLine(CommaQuibbling(new string[] {
"ABC", "DEF", "G", "H" }));
}
}
If I was doing a lot with streams which required first/last information, I'd have thid extension:
[Flags]
public enum StreamPosition
{
First = 1, Last = 2
}
public static IEnumerable<R> MapWithPositions<T, R> (this IEnumerable<T> stream,
Func<StreamPosition, T, R> map)
{
using (var enumerator = stream.GetEnumerator ())
{
if (!enumerator.MoveNext ()) yield break ;
var cur = enumerator.Current ;
var flags = StreamPosition.First ;
while (true)
{
if (!enumerator.MoveNext ()) flags |= StreamPosition.Last ;
yield return map (flags, cur) ;
if ((flags & StreamPosition.Last) != 0) yield break ;
cur = enumerator.Current ;
flags = 0 ;
}
}
}
Then the simplest (not the quickest, that would need a couple more handy extension methods) solution will be:
public static string Quibble (IEnumerable<string> strings)
{
return "{" + String.Join ("", strings.MapWithPositions ((pos, item) => (
(pos & StreamPosition.First) != 0 ? "" :
pos == StreamPosition.Last ? " and " : ", ") + item)) + "}" ;
}
Here as a Python one liner
>>> f=lambda s:"{%s}"%", ".join(s)[::-1].replace(',','dna ',1)[::-1]
>>> f([])
'{}'
>>> f(["ABC"])
'{ABC}'
>>> f(["ABC","DEF"])
'{ABC and DEF}'
>>> f(["ABC","DEF","G","H"])
'{ABC, DEF, G and H}'
This version might be easier to understand
>>> f=lambda s:"{%s}"%" and ".join(s).replace(' and',',',len(s)-2)
>>> f([])
'{}'
>>> f(["ABC"])
'{ABC}'
>>> f(["ABC","DEF"])
'{ABC and DEF}'
>>> f(["ABC","DEF","G","H"])
'{ABC, DEF, G and H}'
Here's a simple F# solution, that only does one forward iteration:
let CommaQuibble items =
let sb = System.Text.StringBuilder("{")
// pp is 2 previous, p is previous
let pp,p = items |> Seq.fold (fun (pp:string option,p) s ->
if pp <> None then
sb.Append(pp.Value).Append(", ") |> ignore
(p, Some(s))) (None,None)
if pp <> None then
sb.Append(pp.Value).Append(" and ") |> ignore
if p <> None then
sb.Append(p.Value) |> ignore
sb.Append("}").ToString()
(EDIT: Turns out this is very similar to Skeet's.)
The test code:
let Test l =
printfn "%s" (CommaQuibble l)
Test []
Test ["ABC"]
Test ["ABC";"DEF"]
Test ["ABC";"DEF";"G"]
Test ["ABC";"DEF";"G";"H"]
Test ["ABC";null;"G";"H"]
I'm a fan of the serial comma: I eat, shoot, and leave.
I continually need a solution to this problem and have solved it in 3 languages (though not C#). I would adapt the following solution (in Lua, does not wrap answer in curly braces) by writing a concat method that works on any IEnumerable:
function commafy(t, andword)
andword = andword or 'and'
local n = #t -- number of elements in the numeration
if n == 1 then
return t[1]
elseif n == 2 then
return concat { t[1], ' ', andword, ' ', t[2] }
else
local last = t[n]
t[n] = andword .. ' ' .. t[n]
local answer = concat(t, ', ')
t[n] = last
return answer
end
end
This isn't brilliantly readable, but it scales well up to tens of millions of strings. I'm developing on an old Pentium 4 workstation and it does 1,000,000 strings of average length 8 in about 350ms.
public static string CreateLippertString(IEnumerable<string> strings)
{
char[] combinedString;
char[] commaSeparator = new char[] { ',', ' ' };
char[] andSeparator = new char[] { ' ', 'A', 'N', 'D', ' ' };
int totalLength = 2; //'{' and '}'
int numEntries = 0;
int currentEntry = 0;
int currentPosition = 0;
int secondToLast;
int last;
int commaLength= commaSeparator.Length;
int andLength = andSeparator.Length;
int cbComma = commaLength * sizeof(char);
int cbAnd = andLength * sizeof(char);
//calculate the sum of the lengths of the strings
foreach (string s in strings)
{
totalLength += s.Length;
++numEntries;
}
//add to the total length the length of the constant characters
if (numEntries >= 2)
totalLength += 5; // " AND "
if (numEntries > 2)
totalLength += (2 * (numEntries - 2)); // ", " between items
//setup some meta-variables to help later
secondToLast = numEntries - 2;
last = numEntries - 1;
//allocate the memory for the combined string
combinedString = new char[totalLength];
//set the first character to {
combinedString[0] = '{';
currentPosition = 1;
if (numEntries > 0)
{
//now copy each string into its place
foreach (string s in strings)
{
Buffer.BlockCopy(s.ToCharArray(), 0, combinedString, currentPosition * sizeof(char), s.Length * sizeof(char));
currentPosition += s.Length;
if (currentEntry == secondToLast)
{
Buffer.BlockCopy(andSeparator, 0, combinedString, currentPosition * sizeof(char), cbAnd);
currentPosition += andLength;
}
else if (currentEntry == last)
{
combinedString[currentPosition] = '}'; //set the last character to '}'
break; //don't bother making that last call to the enumerator
}
else if (currentEntry < secondToLast)
{
Buffer.BlockCopy(commaSeparator, 0, combinedString, currentPosition * sizeof(char), cbComma);
currentPosition += commaLength;
}
++currentEntry;
}
}
else
{
//set the last character to '}'
combinedString[1] = '}';
}
return new string(combinedString);
}
Another variant - separating punctuation and iteration logic for the sake of code clarity. And still thinking about perfomrance.
Works as requested with pure IEnumerable/string/ and strings in the list cannot be null.
public static string Concat(IEnumerable<string> strings)
{
return "{" + strings.reduce("", (acc, prev, cur, next) =>
acc.Append(punctuation(prev, cur, next)).Append(cur)) + "}";
}
private static string punctuation(string prev, string cur, string next)
{
if (null == prev || null == cur)
return "";
if (null == next)
return " and ";
return ", ";
}
private static string reduce(this IEnumerable<string> strings,
string acc, Func<StringBuilder, string, string, string, StringBuilder> func)
{
if (null == strings) return "";
var accumulatorBuilder = new StringBuilder(acc);
string cur = null;
string prev = null;
foreach (var next in strings)
{
func(accumulatorBuilder, prev, cur, next);
prev = cur;
cur = next;
}
func(accumulatorBuilder, prev, cur, null);
return accumulatorBuilder.ToString();
}
F# surely looks much better:
let rec reduce list =
match list with
| [] -> ""
| head::curr::[] -> head + " and " + curr
| head::curr::tail -> head + ", " + curr :: tail |> reduce
| head::[] -> head
let concat list = "{" + (list |> reduce ) + "}"
Disclaimer: I used this as an excuse to play around with new technologies, so my solutions don't really live up to the Eric's original demands for clarity and maintainability.
Naive Enumerator Solution
(I concede that the foreach variant of this is superior, as it doesn't require manually messing about with the enumerator.)
public static string NaiveConcatenate(IEnumerable<string> sequence)
{
StringBuilder sb = new StringBuilder();
sb.Append('{');
IEnumerator<string> enumerator = sequence.GetEnumerator();
if (enumerator.MoveNext())
{
string a = enumerator.Current;
if (!enumerator.MoveNext())
{
sb.Append(a);
}
else
{
string b = enumerator.Current;
while (enumerator.MoveNext())
{
sb.Append(a);
sb.Append(", ");
a = b;
b = enumerator.Current;
}
sb.AppendFormat("{0} and {1}", a, b);
}
}
sb.Append('}');
return sb.ToString();
}
Solution using LINQ
public static string ConcatenateWithLinq(IEnumerable<string> sequence)
{
return (from item in sequence select item)
.Aggregate(
new {sb = new StringBuilder("{"), a = (string) null, b = (string) null},
(s, x) =>
{
if (s.a != null)
{
s.sb.Append(s.a);
s.sb.Append(", ");
}
return new {s.sb, a = s.b, b = x};
},
(s) =>
{
if (s.b != null)
if (s.a != null)
s.sb.AppendFormat("{0} and {1}", s.a, s.b);
else
s.sb.Append(s.b);
s.sb.Append("}");
return s.sb.ToString();
});
}
Solution with TPL
This solution uses a producer-consumer queue to feed the input sequence to the processor, whilst keeping at least two elements buffered in the queue. Once the producer has reached the end of the input sequence, the last two elements can be processed with special treatment.
In hindsight there is no reason to have the consumer operate asynchronously, which would eliminate the need for a concurrent queue, but as I said previously, I was just using this as an excuse to play around with new technologies :-)
public static string ConcatenateWithTpl(IEnumerable<string> sequence)
{
var queue = new ConcurrentQueue<string>();
bool stop = false;
var consumer = Future.Create(
() =>
{
var sb = new StringBuilder("{");
while (!stop || queue.Count > 2)
{
string s;
if (queue.Count > 2 && queue.TryDequeue(out s))
sb.AppendFormat("{0}, ", s);
}
return sb;
});
// Producer
foreach (var item in sequence)
queue.Enqueue(item);
stop = true;
StringBuilder result = consumer.Value;
string a;
string b;
if (queue.TryDequeue(out a))
if (queue.TryDequeue(out b))
result.AppendFormat("{0} and {1}", a, b);
else
result.Append(a);
result.Append("}");
return result.ToString();
}
Unit tests elided for brevity.
Late entry:
public static string CommaQuibbling(IEnumerable<string> items)
{
string[] parts = items.ToArray();
StringBuilder result = new StringBuilder('{');
for (int i = 0; i < parts.Length; i++)
{
if (i > 0)
result.Append(i == parts.Length - 1 ? " and " : ", ");
result.Append(parts[i]);
}
return result.Append('}').ToString();
}
public static string CommaQuibbling(IEnumerable<string> items)
{
int count = items.Count();
string answer = string.Empty;
return "{" +
(count==0) ? "" :
( items[0] +
(count == 1 ? "" :
items.Range(1,count-1).
Aggregate(answer, (s,a)=> s += ", " + a) +
items.Range(count-1,1).
Aggregate(answer, (s,a)=> s += " AND " + a) ))+ "}";
}
It is implemented as,
if count == 0 , then return empty,
if count == 1 , then return only element,
if count > 1 , then take two ranges,
first 2nd element to 2nd last element
last element
Here's mine, but I realize it's pretty much like Marc's, some minor differences in the order of things, and I added unit-tests as well.
using System;
using NUnit.Framework;
using NUnit.Framework.Extensions;
using System.Collections.Generic;
using System.Text;
using NUnit.Framework.SyntaxHelpers;
namespace StringChallengeProject
{
[TestFixture]
public class StringChallenge
{
[RowTest]
[Row(new String[] { }, "{}")]
[Row(new[] { "ABC" }, "{ABC}")]
[Row(new[] { "ABC", "DEF" }, "{ABC and DEF}")]
[Row(new[] { "ABC", "DEF", "G", "H" }, "{ABC, DEF, G and H}")]
public void Test(String[] input, String expectedOutput)
{
Assert.That(FormatString(input), Is.EqualTo(expectedOutput));
}
//codesnippet:93458590-3182-11de-8c30-0800200c9a66
public static String FormatString(IEnumerable<String> input)
{
if (input == null)
return "{}";
using (var iterator = input.GetEnumerator())
{
// Guard-clause for empty source
if (!iterator.MoveNext())
return "{}";
// Take care of first value
var output = new StringBuilder();
output.Append('{').Append(iterator.Current);
// Grab next
if (iterator.MoveNext())
{
// Grab the next value, but don't process it
// we don't know whether to use comma or "and"
// until we've grabbed the next after it as well
String nextValue = iterator.Current;
while (iterator.MoveNext())
{
output.Append(", ");
output.Append(nextValue);
nextValue = iterator.Current;
}
output.Append(" and ");
output.Append(nextValue);
}
output.Append('}');
return output.ToString();
}
}
}
}
How about skipping complicated aggregation code and just cleaning up the string after you build it?
public static string CommaQuibbling(IEnumerable<string> items)
{
var aggregate = items.Aggregate<string, StringBuilder>(
new StringBuilder(),
(b,s) => b.AppendFormat(", {0}", s));
var trimmed = Regex.Replace(aggregate.ToString(), "^, ", string.Empty);
return string.Format(
"{{{0}}}",
Regex.Replace(trimmed,
", (?<last>[^,]*)$", #" and ${last}"));
}
UPDATED: This won't work with strings with commas, as pointed out in the comments. I tried some other variations, but without definite rules about what the strings can contain, I'm going to have real problems matching any possible last item with a regular expression, which makes this a nice lesson for me on their limitations.
I quite liked Jon's answer, but that's because it's much like how I approached the problem. Rather than specifically coding in the two variables, I implemented them inside of a FIFO queue.
It's strange because I just assumed that there would be 15 posts that all did exactly the same thing, but it looks like we were the only two to do it that way. Oh, looking at these answers, Marc Gravell's answer is quite close to the approach we used as well, but he's using two 'loops', rather than holding on to values.
But all those answers with LINQ and regex and joining arrays just seem like crazy-talk! :-)
I don't think that using a good old array is a restriction. Here is my version using an array and an extension method:
public static string CommaQuibbling(IEnumerable<string> list)
{
string[] array = list.ToArray();
if (array.Length == 0) return string.Empty.PutCurlyBraces();
if (array.Length == 1) return array[0].PutCurlyBraces();
string allExceptLast = string.Join(", ", array, 0, array.Length - 1);
string theLast = array[array.Length - 1];
return string.Format("{0} and {1}", allExceptLast, theLast)
.PutCurlyBraces();
}
public static string PutCurlyBraces(this string str)
{
return "{" + str + "}";
}
I am using an array because of the string.Join method and because if the possibility of accessing the last element via an index. The extension method is here because of DRY.
I think that the performance penalities come from the list.ToArray() and string.Join calls, but all in one I hope that piece of code is pleasent to read and maintain.
I think Linq provides fairly readable code. This version handles a million "ABC" in .89 seconds:
using System.Collections.Generic;
using System.Linq;
namespace CommaQuibbling
{
internal class Translator
{
public string Translate(IEnumerable<string> items)
{
return "{" + Join(items) + "}";
}
private static string Join(IEnumerable<string> items)
{
var leadingItems = LeadingItemsFrom(items);
var lastItem = LastItemFrom(items);
return JoinLeading(leadingItems) + lastItem;
}
private static IEnumerable<string> LeadingItemsFrom(IEnumerable<string> items)
{
return items.Reverse().Skip(1).Reverse();
}
private static string LastItemFrom(IEnumerable<string> items)
{
return items.LastOrDefault();
}
private static string JoinLeading(IEnumerable<string> items)
{
if (items.Any() == false) return "";
return string.Join(", ", items.ToArray()) + " and ";
}
}
}
You can use a foreach, without LINQ, delegates, closures, lists or arrays, and still have understandable code. Use a bool and a string, like so:
public static string CommaQuibbling(IEnumerable items)
{
StringBuilder sb = new StringBuilder("{");
bool empty = true;
string prev = null;
foreach (string s in items)
{
if (prev!=null)
{
if (!empty) sb.Append(", ");
else empty = false;
sb.Append(prev);
}
prev = s;
}
if (prev!=null)
{
if (!empty) sb.Append(" and ");
sb.Append(prev);
}
return sb.Append('}').ToString();
}
public static string CommaQuibbling(IEnumerable<string> items)
{
var itemArray = items.ToArray();
var commaSeparated = String.Join(", ", itemArray, 0, Math.Max(itemArray.Length - 1, 0));
if (commaSeparated.Length > 0) commaSeparated += " and ";
return "{" + commaSeparated + itemArray.LastOrDefault() + "}";
}
Here's my submission. Modified the signature a bit to make it more generic. Using .NET 4 features (String.Join() using IEnumerable<T>), otherwise works with .NET 3.5. Goal was to use LINQ with drastically simplified logic.
static string CommaQuibbling<T>(IEnumerable<T> items)
{
int count = items.Count();
var quibbled = items.Select((Item, index) => new { Item, Group = (count - index - 2) > 0})
.GroupBy(item => item.Group, item => item.Item)
.Select(g => g.Key
? String.Join(", ", g)
: String.Join(" and ", g));
return "{" + String.Join(", ", quibbled) + "}";
}
There's a couple non-C# answers, and the original post did ask for answers in any language, so I thought I'd show another way to do it that none of the C# programmers seems to have touched upon: a DSL!
(defun quibble-comma (words)
(format nil "~{~#[~;~a~;~a and ~a~:;~#{~a~#[~; and ~:;, ~]~}~]~}" words))
The astute will note that Common Lisp doesn't really have an IEnumerable<T> built-in, and hence FORMAT here will only work on a proper list. But if you made an IEnumerable, you certainly could extend FORMAT to work on that, as well. (Does Clojure have this?)
Also, anyone reading this who has taste (including Lisp programmers!) will probably be offended by the literal "~{~#[~;~a~;~a and ~a~:;~#{~a~#[~; and ~:;, ~]~}~]~}" there. I won't claim that FORMAT implements a good DSL, but I do believe that it is tremendously useful to have some powerful DSL for putting strings together. Regex is a powerful DSL for tearing strings apart, and string.Format is a DSL (kind of) for putting strings together but it's stupidly weak.
I think everybody writes these kind of things all the time. Why the heck isn't there some built-in universal tasteful DSL for this yet? I think the closest we have is "Perl", maybe.
Just for fun, using the new Zip extension method from C# 4.0:
private static string CommaQuibbling(IEnumerable<string> list)
{
IEnumerable<string> separators = GetSeparators(list.Count());
var finalList = list.Zip(separators, (w, s) => w + s);
return string.Concat("{", string.Join(string.Empty, finalList), "}");
}
private static IEnumerable<string> GetSeparators(int itemCount)
{
while (itemCount-- > 2)
yield return ", ";
if (itemCount == 1)
yield return " and ";
yield return string.Empty;
}
return String.Concat(
"{",
input.Length > 2 ?
String.Concat(
String.Join(", ", input.Take(input.Length - 1)),
" and ",
input.Last()) :
String.Join(" and ", input),
"}");
I have tried using foreach. Please let me know your opinions.
private static string CommaQuibble(IEnumerable<string> input)
{
var val = string.Concat(input.Process(
p => p,
p => string.Format(" and {0}", p),
p => string.Format(", {0}", p)));
return string.Format("{{{0}}}", val);
}
public static IEnumerable<T> Process<T>(this IEnumerable<T> input,
Func<T, T> firstItemFunc,
Func<T, T> lastItemFunc,
Func<T, T> otherItemFunc)
{
//break on empty sequence
if (!input.Any()) yield break;
//return first elem
var first = input.First();
yield return firstItemFunc(first);
//break if there was only one elem
var rest = input.Skip(1);
if (!rest.Any()) yield break;
//start looping the rest of the elements
T prevItem = first;
bool isFirstIteration = true;
foreach (var item in rest)
{
if (isFirstIteration) isFirstIteration = false;
else
{
yield return otherItemFunc(prevItem);
}
prevItem = item;
}
//last element
yield return lastItemFunc(prevItem);
}
Here are a couple of solutions and testing code written in Perl based on the replies at http://blogs.perl.org/users/brian_d_foy/2013/10/comma-quibbling-in-perl.html.
#!/usr/bin/perl
use 5.14.0;
use warnings;
use strict;
use Test::More qw{no_plan};
sub comma_quibbling1 {
my (#words) = #_;
return "" unless #words;
return $words[0] if #words == 1;
return join(", ", #words[0 .. $#words - 1]) . " and $words[-1]";
}
sub comma_quibbling2 {
return "" unless #_;
my $last = pop #_;
return $last unless #_;
return join(", ", #_) . " and $last";
}
is comma_quibbling1(qw{}), "", "1-0";
is comma_quibbling1(qw{one}), "one", "1-1";
is comma_quibbling1(qw{one two}), "one and two", "1-2";
is comma_quibbling1(qw{one two three}), "one, two and three", "1-3";
is comma_quibbling1(qw{one two three four}), "one, two, three and four", "1-4";
is comma_quibbling2(qw{}), "", "2-0";
is comma_quibbling2(qw{one}), "one", "2-1";
is comma_quibbling2(qw{one two}), "one and two", "2-2";
is comma_quibbling2(qw{one two three}), "one, two and three", "2-3";
is comma_quibbling2(qw{one two three four}), "one, two, three and four", "2-4";
It hasn't quite been a decade since the last post so here's my variation:
public static string CommaQuibbling(IEnumerable<string> items)
{
var text = new StringBuilder();
string sep = null;
int last_pos = items.Count();
int next_pos = 1;
foreach(string item in items)
{
text.Append($"{sep}{item}");
sep = ++next_pos < last_pos ? ", " : " and ";
}
return $"{{{text}}}";
}
In one statement:
public static string CommaQuibbling(IEnumerable<string> inputList)
{
return
String.Concat("{",
String.Join(null, inputList
.Select((iw, i) =>
(i == (inputList.Count() - 1)) ? $"{iw}" :
(i == (inputList.Count() - 2) ? $"{iw} and " : $"{iw}, "))
.ToArray()), "}");
}
In .NET Core we can leverage SkipLast and TakeLast.
public static string CommaQuibblify(IEnumerable<string> items)
{
var head = string.Join(", ", items.SkipLast(2).Append(""));
var tail = string.Join(" and ", items.TakeLast(2));
return '{' + head + tail + '}';
}
https://dotnetfiddle.net/X58qvZ

Categories