Linq - lookahead Iteration - c#

I am iterating through a collection using a visitor-type pattern and need to access the current and next item in the list. At the moment I am doing it via an extension method like this:
public void Visit<TItem>(this IEnumerable<TItem> theList, Action<TItem, TItem> visitor)
{
for (i = 0; i <= theList.Count - 1; i++) {
if (i == theList.Count - 1) {
visitor(theList(i), null);
} else {
visitor(theList(i), theList(i + 1));
}
}
}
I was wondering whether there are other/better/more elegant ways to achieve this? At the moment I think I only need access to the current and next items in the list, but I may run into situations where I need to look ahead at the next 'n' items, for example.

Assuming you're using .NET 4, you can use Zip to accomplish the same thing:
var query = original.Zip(original.Skip(1),
(current, next) => new { current, next });
This will iterate over the sequence twice though. A nicer alternative to your current extension method (which I don't believe will work, btw, as IEnumerable doesn't have a Count property, and you're trying to call theList as a method as well...) would be something like:
public static void Visit<TItem>(this IEnumerable<TItem> theList,
                                Action<TItem, TItem> visitor)
{
    TItem prev = default(TItem);
    using (var iterator = theList.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            return;
        }
        prev = iterator.Current;
        while (iterator.MoveNext())
        {
            TItem current = iterator.Current;
            visitor(prev, current);
            prev = current;
        }
    }
    visitor(prev, default(TItem)); // Are you sure you want this?
}
A more general lookahead is trickier, to be honest... you'd want some sort of circular buffer, I suspect... probably a custom collection.
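As a rough illustration of the buffering idea, here is a minimal sketch that uses a Queue<T> as the buffer rather than a hand-written circular buffer. The name WithLookahead and the choice to return each window as an array are assumptions made for this example, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookaheadExtensions
{
    // Hypothetical helper: yields each item together with up to `lookahead`
    // following items. Windows near the end of the sequence are shorter.
    public static IEnumerable<T[]> WithLookahead<T>(this IEnumerable<T> source, int lookahead)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (lookahead < 0) throw new ArgumentOutOfRangeException(nameof(lookahead));

        var buffer = new Queue<T>(lookahead + 1);
        foreach (var item in source)
        {
            buffer.Enqueue(item);
            if (buffer.Count == lookahead + 1)
            {
                // Current item first, then the items ahead of it.
                yield return buffer.ToArray();
                buffer.Dequeue();
            }
        }

        // Flush the tail: these items have fewer than `lookahead` successors.
        while (buffer.Count > 0)
        {
            yield return buffer.ToArray();
            buffer.Dequeue();
        }
    }
}
```

Each window holds the current item at index 0, so a visitor can read window[0] as "current" and window[1..] as the items ahead. The source is enumerated only once, at the cost of one array allocation per item.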

When we ran into a similar task, we defined these extension methods:
/// <summary>
/// Projects a window of source elements in a source sequence into target sequence.
/// Thus
/// target[i] =
/// selector(source[i], source[i - 1], ... source[i - window + 1])
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="window">A size of window.</param>
/// <param name="lookbehind">
/// Indicate whether to produce target if the number of source elements
/// preceding the current is less than the window size.
/// </param>
/// <param name="lookahead">
/// Indicate whether to produce target if the number of source elements
/// following current is less than the window size.
/// </param>
/// <param name="selector">
/// A selector that derives target element.
/// On input it receives:
/// an array of source elements stored in round-robin fashion;
/// an index of the first element;
/// a number of elements in the array to count.
/// </param>
/// <returns>Returns a sequence of target elements.</returns>
public static IEnumerable<R> Window<T, R>(
this IEnumerable<T> source,
int window,
bool lookbehind,
bool lookahead,
Func<T[], int, int, R> selector)
{
var buffer = new T[window];
var index = 0;
var count = 0;
foreach(var value in source)
{
if (count < window)
{
buffer[count++] = value;
if (lookbehind || (count == window))
{
yield return selector(buffer, 0, count);
}
}
else
{
buffer[index] = value;
index = index + 1 == window ? 0 : index + 1;
yield return selector(buffer, index, count);
}
}
if (lookahead)
{
while(--count > 0)
{
index = index + 1 == window ? 0 : index + 1;
yield return selector(buffer, index, count);
}
}
}
/// <summary>
/// Projects a window of source elements in a source sequence into a
/// sequence of window arrays.
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="window">A size of window.</param>
/// <param name="lookbehind">
/// Indicate whether to produce target if the number of source elements
/// preceding the current is less than the window size.
/// </param>
/// <param name="lookahead">
/// Indicate whether to produce target if the number of source elements
/// following current is less than the window size.
/// </param>
/// <returns>Returns a sequence of windows.</returns>
public static IEnumerable<T[]> Window<T>(
this IEnumerable<T> source,
int window,
bool lookbehind,
bool lookahead)
{
return source.Window(
window,
lookbehind,
lookahead,
(buffer, index, count) =>
{
var result = new T[count];
for(var i = 0; i < count; ++i)
{
result[i] = buffer[index];
index = index + 1 == buffer.Length ? 0 : index + 1;
}
return result;
});
}
These functions help to produce output elements from a window of input elements.
See also LINQ extensions.

It seems like you are using the wrong type. Indexing a sequence iterates it from the start until it reaches the specified index, every single time. Why not use IList<T> or ReadOnlyCollection<T>?
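A minimal sketch of what that could look like, assuming the caller can supply an IList<T>. The method name VisitPairs is made up for this example; it keeps the question's behaviour of passing a default value as the "next" item for the last element.

```csharp
using System;
using System.Collections.Generic;

public static class ListVisitExtensions
{
    // Hypothetical variant of the question's Visit method for IList<T>.
    // Count and the indexer are O(1) for List<T> and arrays, so nothing
    // is re-enumerated on each access.
    public static void VisitPairs<TItem>(this IList<TItem> list, Action<TItem, TItem> visitor)
    {
        for (int i = 0; i < list.Count; i++)
        {
            // The last item has no successor, so default(TItem) is passed instead.
            TItem next = i + 1 < list.Count ? list[i + 1] : default(TItem);
            visitor(list[i], next);
        }
    }
}
```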

Not tested, but I think this works? When the visit would exceed the bounds it loops to the front of the list.
public class FriendlyEnumerable<T> : IEnumerable<T>
{
private IEnumerable<T> _enum;
public FriendlyEnumerable(IEnumerable<T> enumerable)
{
_enum = enumerable;
}
public void VisitAll(Action<T, T> visitFunc)
{
VisitAll(visitFunc, 1);
}
public void VisitAll(Action<T, T> visitFunc, int lookahead)
{
int index = 0;
int length = _enum.Count();
_enum.ToList().ForEach(t =>
{
for (int i = 1; i <= lookahead; i++)
visitFunc(t, _enum.ElementAt((index + i) % length));
index++;
});
}
#region IEnumerable<T> Members
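public IEnumerator<T> GetEnumerator()
{
return _enum.GetEnumerator();
}
// IEnumerable<T> also requires the non-generic overload (needs using System.Collections;).
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}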
#endregion
}
You could use it like:
List<string> results = new List<string>();
List<string> strings = new List<string>()
{ "a", "b", "c", "d", "a", "b", "c", "d" };
FriendlyEnumerable<string> fe = new FriendlyEnumerable<string>(strings);
Action<string, string> compareString =
new Action<string,string>((s1, s2) =>
{
if (s1 == s2)
results.Add(s1 + " == " + s2);
});
fe.VisitAll(compareString);
//no results
fe.VisitAll(compareString, 4);
//8 results

public static void VisitLookAhead<TItem>(
this IEnumerable<TItem> source,
Action<IEnumerable<TItem>> visitor,
int targetSize
)
{
if (targetSize <= 1)
{
throw new Exception("invalid targetSize for VisitLookAhead");
}
List<List<TItem>> collections = new List<List<TItem>>();
// after 6th iteration with targetSize 6
//1, 2, 3, 4, 5, 6 <-- foundlist
//2, 3, 4, 5, 6
//3, 4, 5, 6
//4, 5, 6
//5, 6
//6
foreach(TItem x in source)
{
collections.Add(new List<TItem>());
collections.ForEach(subList => subList.Add(x));
List<TItem> foundList = collections
.FirstOrDefault(subList => subList.Count == targetSize);
if (foundList != null)
{
collections.Remove(foundList);
visitor(foundList);
}
}
//generate extra lists at the end - when lookahead will be missing items.
foreach(int i in Enumerable.Range(1, targetSize))
{
collections.ForEach(subList => subList.Add(default(TItem)));
List<TItem> foundList = collections
.FirstOrDefault(subList => subList.Count == targetSize);
if (foundList != null)
{
collections.Remove(foundList);
visitor(foundList);
}
}
}

How to use IF-ELSE in RPN(Reverse Polish Notation)?

I have written an RPN class to evaluate expression strings that the end user enters, such as
"1.0+3/2-tan(45)/(1+1)+sin(30)*abs(-1)+Abs(-10)"
Next, I want to parse conditional statements and multi-parameter functions such as "if(1>2,3/3,2*1)" and "max(1,2,3,4)".
So my question is: how do I use IF-ELSE in RPN?
For if(1>2,3/3,2*1) you would first evaluate the three arguments from right to left and push their results onto the stack so that it looks like this:
top-of-stack->false
1
2
Then if would be implemented in the RPN engine something like (pseudo-code):
void DoIf()
{
    if (pop()) // pop result of "if" evaluation
    {
        var result = pop(); // pop "true" result from stack
        pop(); // discard "false" result
        push(result); // push back "true" result
    }
    else
    {
        pop(); // discard "true" result, leaving "false" result on stack
    }
}
As for multi-parameter functions, there should be no special handling needed. Just evaluate and push all arguments (right to left, typically). The implementation of the function should pop off the required number of arguments and then push its result (if any).
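To make the stack handling concrete, here is a small runnable sketch of both operations. It assumes the engine keeps its operands on a Stack<object> (booleans for conditions, doubles for numbers) and that the parser tells a variadic function such as max how many arguments were pushed. The class and method names are hypothetical, not taken from the asker's RPN class.

```csharp
using System;
using System.Collections.Generic;

public static class RpnFunctions
{
    // Expects, from the top of the stack down: condition, "true" result, "false" result.
    public static void DoIf(Stack<object> stack)
    {
        bool condition = (bool)stack.Pop();
        object whenTrue = stack.Pop();
        object whenFalse = stack.Pop();
        stack.Push(condition ? whenTrue : whenFalse);
    }

    // Multi-parameter function: pops `argCount` numbers and pushes the largest.
    // `argCount` is assumed to be supplied by the parser.
    public static void DoMax(Stack<object> stack, int argCount)
    {
        if (argCount < 1) throw new ArgumentOutOfRangeException(nameof(argCount));
        double max = (double)stack.Pop();
        for (int i = 1; i < argCount; i++)
        {
            max = Math.Max(max, (double)stack.Pop());
        }
        stack.Push(max);
    }
}
```

For if(1>2,3/3,2*1), the engine would push 2.0 (the "false" result), then 1.0 (the "true" result), then false (the condition), and calling DoIf leaves 2.0 on the stack.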
I try to parse multi-parameter functions such as IF and MAX before calling RPN.Parse():
public class MultiParameterFunctionParser
{
public readonly List<string> Funcs = new List<string> {"IF", "MAX"};
public string Parse(string exp)
{
while (IsFunction(exp, out var index, out var funcName))
{
var parameters = GetParameters(exp, index, funcName, out var before, out var after);
var list = GetParameterList(parameters);
var value = Evaluate(list, funcName);
exp= $"{before}({value}){after}";
}
return exp;
}
/// <summary>
/// Is Exp Contains a function?
/// </summary>
/// <param name="exp"></param>
/// <param name="index"></param>
/// <param name="funcName"></param>
/// <returns></returns>
private bool IsFunction(string exp, out int index, out string funcName)
{
index = -1;
funcName = "";
foreach (var func in Funcs)
{
var idx = exp.IndexOf($"{func}(", StringComparison.CurrentCultureIgnoreCase);
if (idx == -1 || idx + 3 >= exp.Length - 1)
continue;
index = idx;
funcName = func;
break;
}
return index != -1 && index + 3 < exp.Length - 1;
}
/// <summary>
/// Get Parameters' string
/// </summary>
/// <param name="exp">8+if(12,sin(90),0)+1.2</param>
/// <param name="index">2 if's start index</param>
/// <param name="before">8+</param>
/// <param name="after">+1.2</param>
/// <returns>12,sin(90),0</returns>
private static string GetParameters(string exp,int index, string funcName, out string before, out string after)
{
before = exp.Substring(0, index);
index += funcName.Length + 1;
var leftCount = 1; // '(' count
var rightCount = 0;// ')' count
var results = "";
while (index < exp.Length && leftCount != rightCount)
{
var c = exp[index];
if (c.Equals('('))
leftCount++;
else if (c.Equals(')'))
rightCount++;
if (leftCount > rightCount)
results += c;
else
break;
index++;
}
after = exp.Substring(index + 1, exp.Length - index - 1);
return results;
}
/// <summary>
/// Parse Parameter string to list.
/// </summary>
/// <param name="exp">MAX(1,-1),1,0</param>
/// <returns>{"MAX(1,-1)","1","0"}</returns>
private static List<string> GetParameterList(string exp)
{
var count = exp.Length;
for (var i = count - 1; i > -1 && exp.Length > 0; i--)
{
var c = exp[i];
if (c != ',')
continue;
var after = exp.Substring(i + 1);
var before = exp.Substring(0,i);
if (after.Count(a => a == '(').Equals(after.Count(a => a == ')')))
{
exp = before + '#' + after;
}
}
var results = exp.Split('#').ToList();
return results;
}
private static double Evaluate(List<string> parameters, string funcName)
{
if (funcName.Equals("MAX", StringComparison.CurrentCultureIgnoreCase))
return EvaluateMax(parameters);
if (funcName.Equals("IF", StringComparison.CurrentCultureIgnoreCase))
return EvaluateIF(parameters);
return 0;
}
private static double EvaluateIF(List<string> parameters)
{
if (parameters == null || parameters.Count != 3)
throw new Exception("EvaluateIF parameters.Count()!=3");
var results = new List<double>();
foreach (var parameter in parameters)
{
var rpn = new RPN();
rpn.Parse(parameter);
var obj = rpn.Evaluate();
if (obj == null)
{
throw new Exception("EvaluateIF Not Number!");
}
if (obj.ToString().Equals("true", StringComparison.CurrentCultureIgnoreCase))
{
results.Add(1);
}
else if (obj.ToString().Equals("false", StringComparison.CurrentCultureIgnoreCase))
{
results.Add(-1);
}
else
{
if (double.TryParse(obj.ToString(), out var d))
results.Add(d);
else
throw new Exception("EvaluateIF Not Number!");
}
}
return results[0] >= 0 ? results[1] : results[2];
}
private static double EvaluateMax(IEnumerable<string> parameters)
{
var results = new List<double>();
foreach (var parameter in parameters)
{
var rpn = new RPN();
rpn.Parse(parameter);
var obj = rpn.Evaluate();
if (double.TryParse(obj.ToString(), out var d))
results.Add(d);
}
return results.Count > 0 ? results.Max() : 0;
}
}

What is c# alternative of splice in c++

I have some C++ code that I am trying to port to C#.
I couldn't figure out the best alternative to splice in C#.
C++ also has a 'find' that works on a map, which I want to implement in C# on a dictionary.
In your C++ example, you show:
statement_tokens.splice(statement_tokens.begin(), tokens, tokens.begin(), next_sc);
From what I understand (documentation), this overload takes an insert position, a list (of the same type), and the first (inclusive) and last (exclusive) indexes of a range to splice into the insert position, and then inserts this range into the original list.
Update: AND it removes the items from the other list. I just added that functionality.
If this is correct, then the following extension method should work:
List Extension Method (check the end of this answer for other overloads of this method)
public static class ListExtensions
{
public static void Splice<T>(this List<T> list, int insertAtIndex, List<T> items,
int first, int last)
{
if (items == null || items.Count == 0) return; // nothing to move (an empty list would make 'first' -1 below)
insertAtIndex = Math.Min(list.Count, Math.Max(0, insertAtIndex));
first = Math.Min(items.Count - 1, Math.Max(0, first));
last = Math.Min(items.Count, Math.Max(1, last));
if (first >= last) return;
list.InsertRange(insertAtIndex, items.GetRange(first, last - first));
items.RemoveRange(first, last - first);
}
}
Update 2: Now, it looks like you're missing another extension method for std::find_if, which returns the index of a list item in a specified range, based on a method that returns true if the item meets some condition. So let's add the following method to the ListExtensions class above:
public static int FindIf<T>(this List<T> list, int start, int end, Func<T, bool> method)
{
    if (method == null) return end;
    // Clamp the search range [start, last) to the bounds of the list.
    start = Math.Max(0, start);
    var last = Math.Min(list.Count, end);
    // Return the index, relative to the whole list, of the first match in the range.
    for (var i = start; i < last; i++)
    {
        if (method(list[i])) return i;
    }
    return end; // not found, like std::find_if returning 'last'
}
Notice that one of the arguments to this method is a function that takes an item of type T and returns a bool. This will be a simple method that checks if the string value of our token is a semicolon:
static bool TokenIsSemicolon(EvlToken token)
{
return (token != null && token.Str == ";");
}
Now, you may notice that I referenced token.Str. This is from the EvlToken class, which was created to mimic the C++ struct:
class EvlToken
{
public enum TokenType { Name, Number, Single }
public TokenType Type { get; set; }
public string Str { get; set; }
public int LineNo { get; set; }
}
Now we can finish the conversion of the original method, calling our FindIf and Splice extension methods:
static bool MoveTokensToStatement(List<EvlToken> statementTokens, List<EvlToken> tokens)
{
if (statementTokens == null || statementTokens.Count > 0) return false;
if (tokens == null || tokens.Count == 0) return false;
int nextSemiColon = tokens.FindIf(0, tokens.Count, TokenIsSemicolon);
if (nextSemiColon == tokens.Count)
{
Console.WriteLine("Looked for ';' but reached the end of the file.");
return false;
}
++nextSemiColon;
statementTokens.Splice(0, tokens, 0, nextSemiColon);
return true;
}
Additional Overloads
For completeness, here is the extensions class with the other two overloads mentioned in the documentation:
public static class ListExtensions
{
/// <summary>
/// Transfers all elements from 'items' into 'this' at the specified index
/// </summary>
/// <typeparam name="T">The type of items in the list</typeparam>
/// <param name="list">'this' instance</param>
/// <param name="insertAtIndex">The index to insert the items</param>
/// <param name="items">The list to transfer the items from</param>
public static void Splice<T>(this List<T> list, int insertAtIndex,
List<T> items)
{
if (items == null) return;
list.Splice(insertAtIndex, items, 0, items.Count);
}
/// <summary>
/// Transfers the element at 'itemIndex' from 'items'
/// into 'this' at the specified index
/// </summary>
/// <typeparam name="T">The type of items in the list</typeparam>
/// <param name="list">'this' instance</param>
/// <param name="insertAtIndex">The index to insert the item</param>
/// <param name="items">The list to transfer the item from</param>
/// <param name="itemIndex">The index of the item to transfer</param>
public static void Splice<T>(this List<T> list, int insertAtIndex,
List<T> items, int itemIndex)
{
list.Splice(insertAtIndex, items, itemIndex, itemIndex + 1);
}
/// <summary>
/// Transfers the specified range of elements from 'items'
/// into 'this' at the specified index
/// </summary>
/// <typeparam name="T">The type of items in the list</typeparam>
/// <param name="list">'this' instance</param>
/// <param name="insertAtIndex">The index to insert the item</param>
/// <param name="items">The list to transfer the item from</param>
/// <param name="first">The index of the first item in the range</param>
/// <param name="last">The exclusive index of the last item in the range</param>
public static void Splice<T>(this List<T> list, int insertAtIndex, List<T> items,
int first, int last)
{
if (items == null || items.Count == 0) return; // nothing to move (an empty list would make 'first' -1 below)
insertAtIndex = Math.Min(list.Count, Math.Max(0, insertAtIndex));
first = Math.Min(items.Count - 1, Math.Max(0, first));
last = Math.Min(items.Count, Math.Max(1, last));
if (first >= last) return;
list.InsertRange(insertAtIndex, items.GetRange(first, last - first));
items.RemoveRange(first, last - first);
}
/// <summary>
/// Searches for the first item in the specified range that "method" returns true for
/// </summary>
/// <typeparam name="T">The type of items in the list</typeparam>
/// <param name="list">'this' instance</param>
/// <param name="start">The index of the first item in the range</param>
/// <param name="end">The exclusive index of the last item in the range</param>
/// <param name="method">A method which takes type 'T' and returns a bool</param>
/// <returns>The index of the item, if found, otherwise 'end'</returns>
public static int FindIf<T>(this List<T> list, int start, int end, Func<T, bool> method)
{
if (method == null) return end;
// Clamp the search range [start, last) to the bounds of the list.
start = Math.Max(0, start);
var last = Math.Min(list.Count, end);
// Return the index, relative to the whole list, of the first match in the range.
for (var i = start; i < last; i++)
{
if (method(list[i])) return i;
}
return end; // not found, like std::find_if returning 'last'
}
}
Example Usage
Here's an example using a list of EvlTokens, and then calling MoveTokensToStatement twice:
private static void Main()
{
var tokens = new List<EvlToken>
{
new EvlToken {LineNo = 3, Str = "int", Type = EvlToken.TokenType.Single},
new EvlToken {LineNo = 3, Str = "x", Type = EvlToken.TokenType.Name},
new EvlToken {LineNo = 3, Str = "=", Type = EvlToken.TokenType.Single},
new EvlToken {LineNo = 3, Str = "1", Type = EvlToken.TokenType.Number},
new EvlToken {LineNo = 3, Str = "+", Type = EvlToken.TokenType.Single},
new EvlToken {LineNo = 3, Str = "5", Type = EvlToken.TokenType.Number},
new EvlToken {LineNo = 3, Str = ";", Type = EvlToken.TokenType.Single},
new EvlToken {LineNo = 4, Str = "Console", Type = EvlToken.TokenType.Single},
new EvlToken {LineNo = 4, Str = ".", Type = EvlToken.TokenType.Single},
new EvlToken {LineNo = 4, Str = "WriteLine", Type = EvlToken.TokenType.Single},
new EvlToken {LineNo = 4, Str = "(", Type = EvlToken.TokenType.Single},
new EvlToken {LineNo = 4, Str = "Hello World", Type = EvlToken.TokenType.Single},
new EvlToken {LineNo = 4, Str = ")", Type = EvlToken.TokenType.Single},
new EvlToken {LineNo = 4, Str = ";", Type = EvlToken.TokenType.Single}
};
var statementTokens = new List<EvlToken>();
MoveTokensToStatement(statementTokens, tokens);
Console.WriteLine("Here is the result of calling 'MoveTokensToStatement' the first time:");
Console.WriteLine(string.Join(" ", statementTokens.Select(t => t.Str)));
statementTokens.Clear();
MoveTokensToStatement(statementTokens, tokens);
Console.WriteLine("\nHere is the result of calling 'MoveTokensToStatement' the second time:");
Console.WriteLine(string.Join("", statementTokens.Select(t => t.Str)));
statementTokens.Clear();
Console.WriteLine("\nDone!\nPress any key to exit...");
Console.ReadKey();
}
@RufusL shows you how to write a method with the same postcondition, but doesn't really discuss any other characteristics of the algorithm. In particular, his algorithm has higher complexity than the C++ splice.
Namely, splice on a doubly linked list such as C++'s std::list is an O(1) operation, because it only requires a constant number of pointer swaps.
.NET does have a doubly linked list class in the base library, which is System.Collections.Generic.LinkedList, but it keeps pointers back to the list from each node (System.Collections.Generic.LinkedListNode, List property) and each list stores a count. As a result, in addition to the constant number of swaps of the forward and back pointers, O(n) node-to-list pointer updates will be required, and O(n) "distance" calculation is required in order to update the Count field on both lists.
So in order to achieve a true equivalent (O(1)) to C++ std::list::splice, one has to abandon the BCL LinkedList class and make a custom doubly linked list, with neither a cached Count field (LINQ Count() that walks the list will still work) nor pointers from list nodes to the list.
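A minimal sketch of such a list follows. SpliceNode and SpliceList are hypothetical names invented for this example. The list uses a sentinel head node and keeps neither a count nor a node-to-list pointer, so moving a range only rewires a fixed number of links.

```csharp
using System;
using System.Collections.Generic;

public sealed class SpliceNode<T>
{
    public T Value;
    public SpliceNode<T> Prev, Next;
    public SpliceNode(T value) { Value = value; }
}

public sealed class SpliceList<T>
{
    // Sentinel node: Head.Next is the first item, Head.Prev is the last.
    public readonly SpliceNode<T> Head;

    public SpliceList()
    {
        Head = new SpliceNode<T>(default(T));
        Head.Next = Head;
        Head.Prev = Head;
    }

    public SpliceNode<T> AddLast(T value)
    {
        var node = new SpliceNode<T>(value);
        node.Prev = Head.Prev;
        node.Next = Head;
        Head.Prev.Next = node;
        Head.Prev = node;
        return node;
    }

    // Moves the nodes in [first, last) to just before `position` in O(1).
    // `position` must not lie inside the range being moved; `last` may be
    // the source list's Head to mean "up to the end".
    public static void Splice(SpliceNode<T> position, SpliceNode<T> first, SpliceNode<T> last)
    {
        if (first == last) return;   // empty range
        var tail = last.Prev;        // last node that actually moves

        // Unlink [first, tail] from the source list.
        first.Prev.Next = last;
        last.Prev = first.Prev;

        // Link the range in before `position`.
        first.Prev = position.Prev;
        position.Prev.Next = first;
        tail.Next = position;
        position.Prev = tail;
    }

    // Walks the list; counting items this way is O(n) by design.
    public IEnumerable<T> Items()
    {
        for (var n = Head.Next; n != Head; n = n.Next)
        {
            yield return n.Value;
        }
    }
}
```

Passing the destination's Head as `position` appends to the end of that list, and passing Head.Next inserts at the front. Finding the `last` node (for example, the node after the next semicolon) is still a linear walk, just as std::find_if is in the C++ version.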

How do I use First() in LINQ but random?

In a list like this:
var colors = new List<string>{"green", "red", "blue", "black","purple"};
I can get the first value like this:
var color = colors.First(c => c.StartsWith("b")); // This will return the string "blue"
But how do I do it if I want a random value matching the condition? For example, something like this:
Debug.log(colors.RandomFirst(c => c.StartsWith("b"))) // Prints out black
Debug.log(colors.RandomFirst(c => c.StartsWith("b"))) // Prints out black
Debug.log(colors.RandomFirst(c => c.StartsWith("b"))) // Prints out blue
Debug.log(colors.RandomFirst(c => c.StartsWith("b"))) // Prints out black
That is, if there are multiple entries in the list matching the condition, I want to pull one of them randomly.
It needs to be an inline solution.
Thank you.
Random ordering then:
var rnd = new Random();
var color = colors.Where(c => c.StartsWith("b"))
.OrderBy(x => rnd.Next())
.First();
The above generates a random number for each element and sorts the results by that number.
You probably won't notice the randomness if only two elements match your condition, but you can try the sample below (it uses the RandomFirst extension method defined further down):
var colors = Enumerable.Range(0, 100).Select(i => "b" + i);
var rnd = new Random();
for (int i = 0; i < 5; i++)
{
Console.WriteLine(colors.RandomFirst(x => x.StartsWith("b"), rnd));
}
Output:
b23
b73
b27
b11
b8
You can create an extension method out of this called RandomFirst:
public static class MyExtensions
{
public static T RandomFirst<T>(this IEnumerable<T> source, Func<T, bool> predicate,
Random rnd)
{
return source.Where(predicate).OrderBy(i => rnd.Next()).First();
}
}
Usage:
var rnd = new Random();
var color1 = colors.RandomFirst(x => x.StartsWith("b"), rnd);
var color2 = colors.RandomFirst(x => x.StartsWith("b"), rnd);
var color3 = colors.RandomFirst(x => x.StartsWith("b"), rnd);
Optimization:
If you're worried about performance, you can try this optimized method (it cuts the time roughly in half for large lists):
public static T RandomFirstOptimized<T>(this IEnumerable<T> source,
Func<T, bool> predicate, Random rnd)
{
var matching = source.Where(predicate);
int matchCount = matching.Count();
if (matchCount == 0)
matching.First(); // force the exception;
return matching.ElementAt(rnd.Next(0, matchCount));
}
In case you have IList<T> you could also write a tiny extension method to pick a random element:
static class IListExtensions
{
    private static readonly Random _rnd = new Random();
    // Returns a random element; the method must return T, not void.
    public static T PickRandom<T>(this IList<T> items) =>
        items[_rnd.Next(items.Count)];
}
and use it like this:
var color = colors.Where(c => c.StartsWith("b")).ToList().PickRandom();
Another implementation is to extract all possible colors (sample) and take random one from them:
// Simplest, but not thread safe
private static Random random = new Random();
...
// All possible colors: [blue, black]
var sample = colors
.Where(c => c.StartsWith("b"))
.ToArray();
var color = sample[random.Next(sample.Length)];
Simple way for short sequences if you don't mind iterating the sequence twice:
var randomItem = sequence.Skip(rng.Next(sequence.Count())).First();
For example (error handling elided for clarity):
var colors = new List<string> { "bronze", "green", "red", "blue", "black", "purple", "brown" };
var rng = new Random();
for (int i = 0; i < 10; ++i)
{
var sequence = colors.Where(c => c.StartsWith("b"));
var randomItem = sequence.Skip(rng.Next(sequence.Count())).First();
Console.WriteLine(randomItem);
}
This is an O(N) solution, but requires that the sequence is iterated once to get the count, then again to select a random item.
More complex solution using Reservoir Sampling suitable for long sequences
You can randomly select N items from a sequence of an unknown length in a single pass (O(N)) without resorting to expensive sorting, using a method known as Reservoir Sampling.
You would especially want to use Reservoir Sampling when:
The number of items to randomly choose from is large
The number of items to randomly choose from is unknown in advance
The number of items to randomly choose is small compared to the number of items to choose from
although you can use it for other situations too.
Here's a sample implementation:
/// <summary>
/// This uses Reservoir Sampling to select <paramref name="n"/> items from a sequence of items of unknown length.
/// The sequence must contain at least <paramref name="n"/> items.
/// </summary>
/// <typeparam name="T">The type of items in the sequence from which to randomly choose.</typeparam>
/// <param name="items">The sequence of items from which to randomly choose.</param>
/// <param name="n">The number of items to randomly choose from <paramref name="items"/>.</param>
/// <param name="rng">A random number generator.</param>
/// <returns>The randomly chosen items.</returns>
public static T[] RandomlySelectedItems<T>(IEnumerable<T> items, int n, System.Random rng)
{
var result = new T[n];
int index = 0;
int count = 0;
foreach (var item in items)
{
if (index < n)
{
result[count++] = item;
}
else
{
int r = rng.Next(0, index + 1);
if (r < n)
result[r] = item;
}
++index;
}
if (index < n)
throw new ArgumentException("Input sequence too short");
return result;
}
For your case, you will need to pass n as 1, and you will receive an array of size 1.
You could use it like this (but note that this has no error checking for the case where colors.Where(c => c.StartsWith("b")) returns an empty sequence):
var colors = new List<string> { "green", "red", "blue", "black", "purple" };
var rng = new Random();
for (int i = 0; i < 10; ++i)
Console.WriteLine(RandomlySelectedItems(colors.Where(c => c.StartsWith("b")), 1, rng)[0]);
However, if you want to call this multiple times rather than just once, then you would be better off shuffling the array and accessing the first N items in the shuffled array. (It's hard to tell what your actual usage pattern will be from the question.)
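For the shuffling alternative mentioned above, a minimal sketch using the Fisher-Yates algorithm might look like this. The extension method name Shuffled is an assumption for this example. It copies the input first so that the original collection is left untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleExtensions
{
    // Returns a shuffled copy of the input using the Fisher-Yates algorithm.
    public static T[] Shuffled<T>(this IEnumerable<T> source, Random rng)
    {
        var items = source.ToArray();
        for (int i = items.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
        return items;
    }
}
```

After a single shuffle of the matching items, the first N entries of the returned array can be consumed one after another without repeating an item.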
I have created these two RandomOrDefault methods, which are optimized to work on IList: one with a predicate and one without.
/// <summary>
/// Get a random element in the list
/// </summary>
public static TSource RandomOrDefault<TSource>(this IList<TSource> source)
{
if (source == null || source.Count == 0)
return default;
if (source.Count == 1)
return source[0];
var rand = new Random();
return source[rand.Next(source.Count)];
}
/// <summary>
/// Get a random element in the list that satisfies a condition
/// </summary>
public static TSource RandomOrDefault<TSource>(this IList<TSource> source, Func<TSource, bool> predicate)
{
if (source == null || source.Count == 0)
return default;
if (source.Count == 1)
{
var first = source[0];
if (predicate(first))
return first;
return default;
}
var matching = source.Where(predicate);
int matchCount = matching.Count();
if (matchCount == 0)
return default;
var rand = new Random();
return matching.ElementAt(rand.Next(matchCount));
}

How to GroupBy objects by numeric values with tolerance factor?

I have a C# list of objects with the following simplified data:
ID, Price
2, 80.0
8, 44.25
14, 43.5
30, 79.98
54, 44.24
74, 80.01
I am trying to GroupBy the lowest number while taking into account a tolerance factor.
For example, with tolerance = 0.02, my expected result would be:
44.24 -> 8, 54
43.5 -> 14
79.98 -> 2, 30, 74
How can I do this while achieving good performance for large datasets?
Is LINQ the way to go in this case?
It seemed to me that if you have a large data set you'll want to avoid the straightforward solution of sorting the values and then collecting them as you iterate through the sorted list, since sorting a large collection can be expensive. The most efficient solution I could think of which doesn't do any explicit sorting was to build a tree where each node contains the items where the key falls within a "contiguous" range (where all the keys are within tolerance of each other) - the range for each node expands every time an item is added which falls outside the range by less than tolerance. I implemented a solution - which turned out to be more complicated and interesting than I expected - and based on my rough benchmarking it looks like doing it this way takes about half as much time as the straightforward solution.
Here's my implementation as an extension method (so you can chain it, although like the normal GroupBy method it'll iterate the source completely as soon as the result IEnumerable is iterated).
public static IEnumerable<IGrouping<double, TValue>> GroupWithTolerance<TValue>(
this IEnumerable<TValue> source,
double tolerance,
Func<TValue, double> keySelector)
{
if(source == null)
throw new ArgumentNullException("source");
return GroupWithToleranceHelper<TValue>.Group(source, tolerance, keySelector);
}
private static class GroupWithToleranceHelper<TValue>
{
public static IEnumerable<IGrouping<double, TValue>> Group(
IEnumerable<TValue> source,
double tolerance,
Func<TValue, double> keySelector)
{
Node root = null, current = null;
foreach (var item in source)
{
var key = keySelector(item);
if(root == null) root = new Node(key);
current = root;
while(true){
if(key < current.Min - tolerance) { current = (current.Left ?? (current.Left = new Node(key))); }
else if(key > current.Max + tolerance) {current = (current.Right ?? (current.Right = new Node(key)));}
else
{
current.Values.Add(item);
if(current.Max < key){
current.Max = key;
current.Redistribute(tolerance);
}
if(current.Min > key) {
current.Min = key;
current.Redistribute(tolerance);
}
break;
}
}
}
if (root != null)
{
foreach (var entry in InOrder(root))
{
yield return entry;
}
}
else
{
//Return an empty collection
yield break;
}
}
private static IEnumerable<IGrouping<double, TValue>> InOrder(Node node)
{
if(node.Left != null)
foreach (var element in InOrder(node.Left))
yield return element;
yield return node;
if(node.Right != null)
foreach (var element in InOrder(node.Right))
yield return element;
}
private class Node : IGrouping<double, TValue>
{
public double Min;
public double Max;
public readonly List<TValue> Values = new List<TValue>();
public Node Left;
public Node Right;
public Node(double key) {
Min = key;
Max = key;
}
public double Key { get { return Min; } }
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
public IEnumerator<TValue> GetEnumerator() { return Values.GetEnumerator(); }
public IEnumerable<TValue> GetLeftValues(){
return Left == null ? Values : Values.Concat(Left.GetLeftValues());
}
public IEnumerable<TValue> GetRightValues(){
return Right == null ? Values : Values.Concat(Right.GetRightValues());
}
public void Redistribute(double tolerance)
{
if(this.Left != null) {
this.Left.Redistribute(tolerance);
if(this.Left.Max + tolerance > this.Min){
this.Values.AddRange(this.Left.GetRightValues());
this.Min = this.Left.Min;
this.Left = this.Left.Left;
}
}
if(this.Right != null) {
this.Right.Redistribute(tolerance);
if(this.Right.Min - tolerance < this.Max){
this.Values.AddRange(this.Right.GetLeftValues());
this.Max = this.Right.Max;
this.Right = this.Right.Right;
}
}
}
}
}
You can switch double to another type if you need to (I so wish C# had a numeric generic constraint).
The most straightforward approach is to write your own IEqualityComparer<double>.
public class ToleranceEqualityComparer : IEqualityComparer<double>
{
public double Tolerance { get; set; } = 0.02;
public bool Equals(double x, double y)
{
return x - Tolerance <= y && x + Tolerance > y;
}
//A constant hash code puts every key in the same bucket, which forces GroupBy to call Equals for each comparison.
public int GetHashCode(double obj) => 1;
}
Use it like so:
var dataByPrice = data.GroupBy(d => d.Price, new ToleranceEqualityComparer());
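One caveat worth checking before relying on this: tolerance-based equality is not transitive (a can match b and b can match c without a matching c), so the groups that GroupBy produces can depend on the order of the input. Below is a minimal sketch to see this. It repeats the comparer from above so that it compiles on its own; the ToleranceComparerDemo class and its CountGroups method are made-up names for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same comparer as in the answer, repeated so the sketch is self-contained.
public class ToleranceEqualityComparer : IEqualityComparer<double>
{
    public double Tolerance { get; set; } = 0.02;

    public bool Equals(double x, double y)
    {
        return x - Tolerance <= y && x + Tolerance > y;
    }

    // Constant hash code: every key lands in one bucket, so Equals decides.
    public int GetHashCode(double obj) => 1;
}

public static class ToleranceComparerDemo
{
    // Hypothetical helper: counts the groups GroupBy produces
    // for the keys in the order they are given.
    public static int CountGroups(IEnumerable<double> keys, double tolerance)
    {
        var comparer = new ToleranceEqualityComparer { Tolerance = tolerance };
        return keys.GroupBy(k => k, comparer).Count();
    }
}
```

With a tolerance of 1.0, CountGroups(new[] { 1.0, 1.8, 2.6 }, 1.0) gives 2 groups, while the same keys ordered as { 1.8, 1.0, 2.6 } give 1, because each new key is compared against the first key of an existing group. If your data can chain like this, sort it first or use one of the dedicated methods.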
Here is a new implementation that passes unit tests which the other two solutions failed. It has the same signature as the currently accepted answer. The unit tests check that no group's spread (its maximum key minus its minimum key) exceeds the tolerance, and that the total number of grouped items matches the number of items provided.
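The two properties those tests look for can be expressed as a small check. This is only a sketch of the idea, not the actual test code: ToleranceGroupingChecks and SatisfiesTolerance are hypothetical names, and the check works on any IEnumerable<IGrouping<double, TValue>>, whichever implementation produced it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToleranceGroupingChecks
{
    // Hypothetical helper: returns true when every input item was grouped
    // and no group's keys span more than the tolerance.
    public static bool SatisfiesTolerance<TValue>(
        IEnumerable<IGrouping<double, TValue>> groups,
        Func<TValue, double> keySelector,
        double tolerance,
        int expectedItemCount)
    {
        // Materialize once so the groups are not enumerated twice.
        var materialized = groups.Select(g => g.ToList()).ToList();

        // The number of grouped items must match the number of items provided.
        if (materialized.Sum(g => g.Count) != expectedItemCount)
            return false;

        // Within each group, max key minus min key must not exceed the tolerance.
        return materialized.All(g =>
            g.Max(keySelector) - g.Min(keySelector) <= tolerance);
    }
}
```

For example, grouping { 1.0, 1.5, 9.0 } by Math.Floor passes with a tolerance of 1.0, while putting all three values into a single group fails because that group spans 8.0.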
How to use
var values = new List<Tuple<double, string>>
{
new Tuple<double, string>(113.5, "Text Item 1"),
new Tuple<double, string>(109.62, "Text Item 2"),
new Tuple<double, string>(159.06, "Text Item 3"),
new Tuple<double, string>(114, "Text Item 4")
};
var groups = values.GroupWithTolerance(5, a => a.Item1).ToList();
Extension Method
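/// <summary>
/// Groups the items of a sequence so that the keys in each group span no more than the given tolerance.
/// </summary>
/// <typeparam name="TValue">The type of the items in the source sequence.</typeparam>
/// <param name="source">The items to group.</param>
/// <param name="tolerance">The largest allowed difference between the smallest and largest key in a group.</param>
/// <param name="keySelector">Extracts the numeric key used for grouping from an item.</param>
/// <returns>The groups, each keyed by the average of its members' keys.</returns>
/// <exception cref="ArgumentNullException"><paramref name="source"/> is null.</exception>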
public static IEnumerable<IGrouping<double, TValue>> GroupWithTolerance<TValue>(
this IEnumerable<TValue> source,
double tolerance,
Func<TValue, double> keySelector
)
{
var sortedValuesWithKey = source
.Select((a, i) => Tuple.Create(a, keySelector(a), i))
.OrderBy(a => a.Item2)
.ToList();
var diffsByIndex = sortedValuesWithKey
.Skip(1)
//i will start at 0 but we are targeting the diff between 0 and 1.
.Select((a, i) => Tuple.Create(i + 1, sortedValuesWithKey[i + 1].Item2 - sortedValuesWithKey[i].Item2))
.ToList();
var groupBreaks = diffsByIndex
.Where(a => a.Item2 > tolerance)
.Select(a => a.Item1)
.ToHashSet();
var groupKeys = new double[sortedValuesWithKey.Count];
void AddRange(int startIndex, int endIndex)
{
//If there is just one value in the group, take a short cut.
if (endIndex - startIndex == 0)
{
groupKeys[sortedValuesWithKey[startIndex].Item3] = sortedValuesWithKey[startIndex].Item2;
return;
}
var min = sortedValuesWithKey[startIndex].Item2;
var max = sortedValuesWithKey[endIndex].Item2;
//If the range is within tolerance, we are done with this group.
if (max - min < tolerance)
{
//Get the average value of the group and assign it to all elements.
var rangeValues = new List<double>(endIndex - startIndex);
for (var x = startIndex; x <= endIndex; x++)
rangeValues.Add(sortedValuesWithKey[x].Item2);
var average = rangeValues.Average();
for (var x = startIndex; x <= endIndex; x++)
groupKeys[sortedValuesWithKey[x].Item3] = average;
return;
}
//The range is not within tolerance and needs to be divided again.
//Find the largest gap and divide.
double maxDiff = -1;
var splitIndex = -1;
for (var i = startIndex; i < endIndex; i++)
{
var currentDif = diffsByIndex[i].Item2;
if (currentDif > maxDiff)
{
maxDiff = currentDif;
splitIndex = i;
}
}
AddRange(startIndex, splitIndex);
AddRange(splitIndex + 1, endIndex);
}
var groupStartIndex = 0;
for (var i = 1; i < sortedValuesWithKey.Count; i++)
{
//There isn't a group break here, at least not yet, so continue.
if (!groupBreaks.Contains(i))
continue;
AddRange(groupStartIndex, i - 1);
groupStartIndex = i;
}
//Add the last group's keys if we haven't already.
if (groupStartIndex < sortedValuesWithKey.Count)
AddRange(groupStartIndex, sortedValuesWithKey.Count - 1);
return sortedValuesWithKey.GroupBy(a => groupKeys[a.Item3], a => a.Item1);
}

List<string>.Contains using trim

It would be nice if this worked, but alas it doesn't.
List<string> items = new List<string>();
items.Add("a ");
bool useTrim = true;
if (items.Contains("a", useTrim)) {
Console.WriteLine("I'm happy");
}
I ended up implementing it as the extension method below, but I was wondering whether anyone has a more elegant idea that avoids creating a comparer class or looping through the list.
/// <summary>
/// Determines whether an element in the List of strings
/// matches the item. .Trim() is applied to each element
/// for the comparison
/// </summary>
/// <param name="value">a list of strings</param>
/// <param name="item">the string to search for in the list</param>
/// <returns>true if item is found in the list</returns>
public static bool ContainsTrimmed(this List<string> value, string item) {
bool ret = false;
if ((value.FindIndex(s => s.Trim() == item)) >= 0) {
ret = true;
}
return ret;
}
Well, you'll either need to loop through the list each time, or create another list of just the trimmed values and use that for searching. (Heck, you could create a HashSet<string> if you only need to know whether or not a trimmed value is present.)
However, if you want to stick to just a single list, then rather than using FindIndex I'd use Any from LINQ:
if (items.Any(x => x.Trim() == item))
Note that even if you do want to keep your ContainsTrimmed method, you can simplify it to just:
return value.FindIndex(s => s.Trim() == item) >= 0;
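For the HashSet<string> idea mentioned above, a minimal sketch could look like the following. TrimmedLookup and BuildTrimmedSet are hypothetical names, and skipping null entries is an assumption of this sketch rather than something the question asks for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TrimmedLookup
{
    // Hypothetical helper: trims every entry once and stores the results
    // in a set, so that repeated lookups do not re-trim the whole list.
    public static HashSet<string> BuildTrimmedSet(IEnumerable<string> items)
    {
        // Assumption: null entries are ignored rather than treated as errors.
        return new HashSet<string>(
            items.Where(s => s != null).Select(s => s.Trim()));
    }
}
```

You would build the set once, e.g. var trimmed = TrimmedLookup.BuildTrimmedSet(items); and then call trimmed.Contains("a") as often as needed. Keep in mind that the set is a snapshot: if the list changes afterwards, the set has to be rebuilt.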
I would suggest creating a custom IEqualityComparer<string> and passing it to the Contains overload that accepts a comparer (LINQ's Enumerable.Contains).
This is exactly why that overload exists.
class TrimmedEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
if (x == null && y != null || x != null && y == null)
return false;
if (x == null && y == null)
return true;
return x.Trim() == y.Trim();
}
public int GetHashCode(string obj)
{
//Hash the trimmed string so that values Equals treats as equal also share a hash code.
return obj != null ? obj.Trim().GetHashCode() : 0;
}
}
You call it like this.
var strs = new string[] {"a ", "b ", "c"};
if (strs.Contains("b", new TrimmedEqualityComparer()))
Console.WriteLine("I'm happy");
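If a comparer like this is also going to be handed to hash-based code such as HashSet<string> or Distinct, Equals and GetHashCode have to agree: any two strings that compare equal must return the same hash code. A minimal sketch, using a hypothetical TrimAwareComparer that trims in both methods:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer for this sketch: equality and hashing
// are both based on the trimmed string.
public sealed class TrimAwareComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        // Two nulls are equal; a null never equals a non-null string.
        if (x == null || y == null) return x == null && y == null;
        return x.Trim() == y.Trim();
    }

    public int GetHashCode(string obj)
    {
        // Hash the trimmed value so "a" and "a " land in the same bucket.
        return obj == null ? 0 : obj.Trim().GetHashCode();
    }
}
```

With this in place, new[] { "a ", "a", " b", "b " }.Distinct(new TrimAwareComparer()) yields two entries, and a HashSet<string> built with the same comparer finds "a" when it was filled with "a ".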
