I have a List<T> where T : IComparable<T>.
I want to write a method List<T> GetFirstNElements<T>(IList<T> list, int n) where T : IComparable<T> that returns the first n distinct largest elements (the list can have duplicates), using expression trees.
In some performance-critical code I wrote recently, I had a very similar requirement: the candidate set was very large, and the number needed very small. To avoid sorting the entire candidate set, I used a custom extension method that simply keeps the n largest items found so far in a linked list. Then I can simply:
loop once over the candidates
if I haven't yet found "n" items, or the current item is better than the worst already selected, then add it (at the correct position) in the linked-list (inserts are cheap here)
if we now have more than "n" selected, drop the worst (deletes are cheap here)
then we are done; at the end of this, the linked list contains the best "n" items, already sorted. No need to use expression trees, and no "sort a huge list" overhead. Something like:
public static IEnumerable<T> TakeTopDistinct<T>(this IEnumerable<T> source, int count)
{
    if (source == null) throw new ArgumentNullException("source");
    if (count < 0) throw new ArgumentOutOfRangeException("count");
    if (count == 0) yield break;

    var comparer = Comparer<T>.Default;
    LinkedList<T> selected = new LinkedList<T>();
    foreach (var value in source)
    {
        if (selected.Count < count // need to fill
            || comparer.Compare(selected.Last.Value, value) < 0 // better candidate
        )
        {
            var tmp = selected.First;
            bool add = true;
            while (tmp != null)
            {
                var delta = comparer.Compare(tmp.Value, value);
                if (delta == 0)
                {
                    add = false; // not distinct
                    break;
                }
                else if (delta < 0)
                {
                    selected.AddBefore(tmp, value);
                    add = false;
                    if (selected.Count > count) selected.RemoveLast();
                    break;
                }
                tmp = tmp.Next;
            }
            if (add && selected.Count < count) selected.AddLast(value);
        }
    }
    foreach (var value in selected) yield return value;
}
If I get the question right, you just want to sort the entries in the list.
Wouldn't it be possible for you to implement IComparable<T> and use the Sort method of List<T>?
The code in your IComparable<T> implementation can handle the comparison however you need, so you can just use the built-in sort mechanism at this point.
List<T> GetFirstNElements<T>(IList<T> list, int n) where T : IComparable<T>
{
    // IList<T> has no Sort method, so copy into a List<T> first
    var sorted = new List<T>(list);
    sorted.Sort();
    List<T> returnList = new List<T>();
    for (int i = 0; i < n; i++)
    {
        returnList.Add(sorted[i]);
    }
    return returnList;
}
Wouldn't be the fastest code ;-)
The standard algorithm for doing this, which takes expected time O(list.Length), is described on Wikipedia as "quickfindFirstK" on this page:
http://en.wikipedia.org/wiki/Selection_algorithm#Selecting_k_smallest_or_largest_elements
This improves on @Marc Gravell's answer because its expected running time is linear in the length of the input list, regardless of the value of n.
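A rough sketch of that selection idea (the names, the random pivot choice, and the up-front Distinct() call are mine, not the exact Wikipedia pseudocode): partition so larger elements come first, narrow in on only the side containing the n-th boundary, and stop once the first n slots hold the n largest distinct values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SelectionSketch
{
    // Returns the n largest distinct elements, sorted descending.
    // A quickselect-style partition replaces the full sort, so the
    // expected running time is linear in the number of distinct elements.
    public static List<T> TopNDistinct<T>(IList<T> list, int n) where T : IComparable<T>
    {
        var distinct = list.Distinct().ToArray();
        if (n >= distinct.Length)
            return distinct.OrderByDescending(x => x).ToList();

        int lo = 0, hi = distinct.Length - 1;
        var rng = new Random(12345); // fixed seed just to keep the sketch deterministic
        while (lo < hi)
        {
            int p = Partition(distinct, lo, hi, rng.Next(lo, hi + 1));
            if (p == n - 1) break;     // first n slots now hold the n largest
            if (p < n - 1) lo = p + 1; // boundary is to the right of p
            else hi = p - 1;           // boundary is to the left of p
        }

        var result = distinct.Take(n).ToList();
        result.Sort((a, b) => b.CompareTo(a)); // only the n survivors get sorted
        return result;
    }

    // Lomuto partition, descending: larger-than-pivot elements end up first.
    static int Partition<T>(T[] a, int lo, int hi, int pivotIndex) where T : IComparable<T>
    {
        T pivot = a[pivotIndex];
        Swap(a, pivotIndex, hi);
        int store = lo;
        for (int i = lo; i < hi; i++)
            if (a[i].CompareTo(pivot) > 0)
                Swap(a, i, store++);
        Swap(a, store, hi);
        return store;
    }

    static void Swap<T>(T[] a, int i, int j) { T t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

For example, SelectionSketch.TopNDistinct(new[] { 3, 1, 4, 1, 5, 9, 2, 6 }, 3) yields [9, 6, 5].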
Related
In a checksum calculation algorithm I'm implementing, the input must be an even number of bytes; if it isn't, an extra zero byte must be appended at the end.
I do not want to modify the input data to my method by actually adding an element (and the input might be non-modifiable). Neither do I want to create a new data structure and copy the input.
I wondered if LINQ is a good option to create a lightweight IEnumerable something like:
void Calculate(IList<byte> input)
{
IEnumerable<byte> items = (input.Count & 1) == 0 ? input : X(input, 0x0);
foreach(var i in items)
{
...
}
}
i.e. what would X(...) look like?
You can use this iterator (yield return) extension method to add extra items to the end of an IEnumerable<T> without needing to iterate over the elements up front (which you would otherwise need to do in order to get a .Count value).
Note that you should check if input is an IReadOnlyCollection<T> or an IList<T>, because then you can use a more optimal code path when the .Count is known in advance.
public static IEnumerable<T> EnsureModuloItems<T>( this IEnumerable<T> source, Int32 modulo, T defaultValue = default )
{
    if( source is null ) throw new ArgumentNullException(nameof(source));
    if( modulo < 1 ) throw new ArgumentOutOfRangeException( nameof(modulo), modulo, message: "Value must be 1 or greater." );

    Int32 count = 0;
    foreach( T item in source )
    {
        yield return item;
        count++;
    }

    // Pad up to the next multiple of modulo (no padding if already aligned).
    // Note: yielding count % modulo items would under-pad for modulo > 2.
    Int32 padding = ( modulo - ( count % modulo ) ) % modulo;
    for( Int32 i = 0; i < padding; i++ )
    {
        yield return defaultValue;
    }
}
Used like so:
foreach( Byte b in input.EnsureModuloItems( modulo: 2, defaultValue: 0x00 ) )
{
}
You might use the Concat method for that:
IEnumerable<byte> items = input.Count() % 2 == 0 ? input : input.Concat(new[] { (byte)0x0 });
I've also changed your code a little bit: there is no Count property on IEnumerable<T>, so you should use the Count() method.
Since Concat() accepts an IEnumerable<T>, you have to pass it a List<T> or an array. Alternatively, you can make a simple extension method to wrap a single item as an IEnumerable<T>:
internal static class Ext
{
public static IEnumerable<T> Yield<T>(this T item)
{
yield return item;
}
}
and use it
IEnumerable<byte> items = input.Count() % 2 == 0 ? input : input.Concat(((byte)0x0).Yield());
However, according to comments, the better option here can be an Append method
IEnumerable<byte> items = input.Count() % 2 == 0 ? input : input.Append((byte)0x0);
I have a List of Lists.
To do some operations with each of those lists, I separate the lists by a property and set a temp list with its value.
The list can sometimes be empty.
That is why I use this function for the assignment.
EDIT:
My current solution is this simple method.
It should be easily adaptable.
private List<string> setList(List<string> a, int count)
{
    List<string> retr;
    if (a.Capacity == 0)
    {
        retr = new List<string>();
        for (int counter = 0; counter < count; counter++)
        {
            retr.Add(string.Empty);
        }
    }
    else
    {
        retr = a;
    }
    return retr;
}
Is there a better way to either take a list as values or initialize a list with element count?
Or should I implement my own "List" class that has this behavior?
You could use Enumerable.Repeat<T> if you wanted to avoid the loop:
var list = Enumerable.Repeat<string>("", count).ToList();
But there are several things that are problematic with your code:
If Capacity is not 0, it doesn't mean it's equal to your desired count. Even if it is equal to the specified count, it doesn't mean that the actual List.Count is equal to count. A safer way would be to do:
static List<string> PreallocateList(List<string> a, int count)
{
// reuse the existing list?
if (a.Count >= count)
return a;
return Enumerable.Repeat("", count).ToList();
}
Preallocating a List<T> is unusual. It's usually common to use arrays when you have a fixed length known in advance.
// this would (perhaps) make more sense
var array = new string[count];
And keep in mind, as mentioned in 1., that list's Capacity is not the same as Count:
var list = new List<string>(10);
// this will print 10
Console.WriteLine("Capacity is {0}", list.Capacity);
// but this will throw an exception
list[0] = "";
Most likely, however, this method is unnecessary and there is a better way to accomplish what you're doing. If nothing else, I would play the safe card and simply instantiate a new list each time (presuming that you have an algorithm which depends on a preallocated list):
static List<string> PreallocateList(int count)
{
return Enumerable.Repeat("", count).ToList();
}
Or, if you are only interested in having the right capacity (not count), then just use the appropriate constructor:
static List<string> PreallocateList(int count)
{
// this will prevent internal array resizing, if that's your concern
return new List<string>(count);
}
For what it's worth, if you want LINQ, your method is equivalent to
static List<string> setList(List<string> a, int count) =>
    a.Capacity == 0 ? Enumerable.Repeat("", count).ToList() : a;
I have code that needs to know that a collection should not be empty or contain only one item.
In general, I want an extension of the form:
bool collectionHasAtLeast2Items = collection.AtLeast(2);
I can write an extension easily, enumerating over the collection and incrementing an indexer until I hit the requested size or run out of elements, but is there something already in the LINQ framework that would do this? My thoughts (in order of what came to me) are:
bool collectionHasAtLeast2Items = collection.Take(2).Count() == 2; or
bool collectionHasAtLeast2Items = collection.Take(2).ToList().Count == 2;
Either would seem to work, though the behaviour of taking more elements than the collection contains is not spelled out in the documentation for the Enumerable.Take method; however, it seems to do what one would expect.
It's not the most efficient solution: either I enumerate once to take the elements and then enumerate again to count them, which is unnecessary, or I enumerate once to take the elements and then construct a list just to get its Count property, which isn't very enumerator-y, as I don't actually want the list.
It's not pretty, as I always have to make two assertions: first taking 'x', then checking that I actually received 'x'. And it depends upon undocumented behaviour.
Or perhaps I could use:
bool collectionHasAtLeast2Items = collection.ElementAtOrDefault(2) != null;
However, that's not semantically-clear. Maybe the best is to wrap that with a method-name that means what I want. I'm assuming that this will be efficient, I haven't reflected on the code.
Some other thoughts are using Last(), but I explicitly don't want to enumerate through the whole collection.
Or maybe Skip(2).Any(), again not semantically completely obvious, but better than ElementAtOrDefault(2) != null, though I would think they produce the same result?
Any thoughts?
public static bool AtLeast<T>(this IEnumerable<T> source, int count)
{
    // Optimization for ICollection<T>
    var genericCollection = source as ICollection<T>;
    if (genericCollection != null)
        return genericCollection.Count >= count;

    // Optimization for ICollection
    var collection = source as ICollection;
    if (collection != null)
        return collection.Count >= count;

    // General case
    using (var en = source.GetEnumerator())
    {
        int n = 0;
        while (n < count && en.MoveNext()) n++;
        return n == count;
    }
}
You can use Count() >= 2 if your sequence implements ICollection.
Behind the scenes, the Enumerable.Count() extension method checks whether the sequence being enumerated implements ICollection. If it does, the Count property is returned, so the target performance is O(1).
Thus ((IEnumerable<T>)((ICollection)sequence)).Count() >= x should also be O(1).
You could use Count, but if performance is an issue, you will be better off with Take.
bool atLeastX = collection.Take(x).Count() == x;
Since Take (I believe) uses deferred execution, it will only go through the collection once.
abatishchev mentioned that Count is O(1) with ICollection, so you could do something like this and get the best of both worlds.
IEnumerable<int> col;
// set col
int x;
// set x
bool atLeastX;
if (col is ICollection<int>)
{
atLeastX = col.Count() >= x;
}
else
{
atLeastX = col.Take(x).Count() == x;
}
You could also use Skip/Any, in fact I bet it would be even faster than Take/Count.
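That Skip/Any variant could be wrapped up like this (a sketch; HasAtLeast is a name I made up, and the count - 1 offset is the important detail, since Skip(n).Any() by itself tests for at least n + 1 elements):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SequenceExtensions
{
    // True when source has at least count elements. Skip(count - 1).Any()
    // stops enumerating as soon as the answer is known, so at most count
    // elements are ever touched.
    public static bool HasAtLeast<T>(this IEnumerable<T> source, int count)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (count <= 0) return true; // every sequence has at least zero items
        return source.Skip(count - 1).Any();
    }
}
```

With this, new[] { 1, 2, 3 }.HasAtLeast(2) is true and new[] { 1 }.HasAtLeast(2) is false.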
I'm a complete LINQ newbie, so I don't know if my LINQ is incorrect for what I need to do or if my expectations of performance are too high.
I've got a SortedList of objects, keyed by int; SortedList as opposed to SortedDictionary because I'll be populating the collection with pre-sorted data. My task is to find either the exact key or, if there is no exact key, the one with the next higher value. If the search is too high for the list (e.g. highest key is 100, but search for 105), return null.
// The structure of this class is unimportant. Just using
// it as an illustration.
public class CX
{
public int KEY;
public DateTime DT;
}
static CX getItem(int i, SortedList<int, CX> list)
{
    var items =
        (from kv in list
         where kv.Key >= i
         select kv.Key);
    if (items.Any())
    {
        return list[items.Min()];
    }
    return null;
}
Given a list of 50,000 records, calling getItem 500 times takes about a second and a half. Calling it 50,000 times takes over 2 minutes. This performance seems very poor. Is my LINQ bad? Am I expecting too much? Should I be rolling my own binary search function?
First, your query is being evaluated twice (once for Any, and once for Min). Second, Min requires that it iterate over the entire list, even though the fact that it's sorted means that the first item will be the minimum. You should be able to change this:
if (items.Any())
{
return list[items.Min()];
}
To this (note that default is a reserved word in C#, so the variable needs another name):
var firstMatch =
    (from kv in list
     where kv.Key >= i
     select (int?)kv.Key).FirstOrDefault();
if (firstMatch != null) return list[firstMatch.Value];
return null;
UPDATE
Because you're selecting a value type, FirstOrDefault doesn't return a nullable object. I have altered your query to cast the selected value to an int? instead, allowing the resulting value to be checked for null. I would advocate this over calling FirstOrDefault and then using ContainsKey on the result, as that would return true if your list happened to contain a key of 0. For example, say you have the following keys:
0 2 4 6 8
If you were to pass in anything less than or equal to 8, then you would get the correct value. However, if you were to pass in 9, you would get 0 (default(int)), which is in the list but isn't a valid result.
Writing a binary search on your own can be tough.
Fortunately, Microsoft already wrote a pretty robust one: Array.BinarySearch<T>. This is, in fact, the method that SortedList<TKey, TValue>.IndexOfKey uses internally. Only problem is, it takes a T[] argument, instead of any IList<T> (like SortedList<TKey, TValue>.Keys).
You know what, though? There's this great tool called Reflector that lets you look at .NET source code...
Check it out: a generic BinarySearch extension method on IList<T>, taken straight from the reflected code of Microsoft's Array.BinarySearch<T> implementation.
public static int BinarySearch<T>(this IList<T> list, int index, int length, T value, IComparer<T> comparer) {
    if (list == null)
        throw new ArgumentNullException("list");
    else if (index < 0 || length < 0)
        throw new ArgumentOutOfRangeException((index < 0) ? "index" : "length");
    else if (list.Count - index < length)
        throw new ArgumentException();

    int lower = index;
    int upper = (index + length) - 1;

    while (lower <= upper) {
        int adjustedIndex = lower + ((upper - lower) >> 1);
        int comparison = comparer.Compare(list[adjustedIndex], value);
        if (comparison == 0)
            return adjustedIndex;
        else if (comparison < 0)
            lower = adjustedIndex + 1;
        else
            upper = adjustedIndex - 1;
    }

    return ~lower;
}

public static int BinarySearch<T>(this IList<T> list, T value, IComparer<T> comparer) {
    return list.BinarySearch(0, list.Count, value, comparer);
}

public static int BinarySearch<T>(this IList<T> list, T value) where T : IComparable<T> {
    return list.BinarySearch(value, Comparer<T>.Default);
}
This will let you call list.Keys.BinarySearch and get the negative bitwise complement of the index you want in case the desired key isn't found (the below is taken basically straight from tzaman's answer):
int index = list.Keys.BinarySearch(i);
if (index < 0)
index = ~index;
var item = index < list.Count ? list[list.Keys[index]] : null;
return item;
Using LINQ on a SortedList will not give you the benefit of the sort.
For optimal performance, you should write your own binary search.
OK, just to give this a little more visibility - here's a more concise version of Adam Robinson's answer:
return list.FirstOrDefault(kv => kv.Key >= i).Value;
The FirstOrDefault function has an overload that accepts a predicate, which selects the first element satisfying a condition - you can use that to directly get the element you want, or null if it doesn't exist.
Why not use the BinarySearch that's built into the List class?
var keys = list.Keys.ToList();
int index = keys.BinarySearch(i);
if (index < 0)
index = ~index;
var item = index < keys.Count ? list[keys[index]] : null;
return item;
If the search target isn't in the list, BinarySearch returns the bit-wise complement of the next-higher item; we can use that to directly get you what you want by re-complementing the result if it's negative. If it becomes equal to the Count, your search key was bigger than anything in the list.
This should be much faster than doing a LINQ where, since it's already sorted...
As comments have pointed out, the ToList call will force an evaluation of the whole list, so this is only beneficial if you do multiple searches without altering the underlying SortedList, and you keep the keys list around separately.
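To see the complement trick with concrete numbers (the sample list is mine), List<T>.BinarySearch behaves the same way as the key search above: a miss returns the bitwise complement of the index where the value would be inserted, i.e. the index of the next-higher key.

```csharp
using System;
using System.Collections.Generic;

class ComplementDemo
{
    static void Main()
    {
        var keys = new List<int> { 0, 2, 4, 6, 8 };

        int index = keys.BinarySearch(5);  // 5 is absent
        Console.WriteLine(index);          // prints -4, i.e. ~3

        if (index < 0)
            index = ~index;                // 3: index of the next-higher key

        Console.WriteLine(keys[index]);    // prints 6
    }
}
```

Searching for anything above 8 would give an insertion point equal to Count, which is the "search key was bigger than anything in the list" case described above.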
Using OrderedDictionary in PowerCollections you can get an enumerator that starts where the key you are looking for should be; if it's not there, you'll get the next closest node and can then navigate forwards/backwards from that, in O(log N) time per navigation call.
This has the advantage of you not having to write your own search or even manage your own searches on top of a SortedList.
Is there a good way to enumerate through only a subset of a Collection in C#? That is, I have a collection of a large number of objects (say, 1000), but I'd like to enumerate through only elements 250 - 340. Is there a good way to get an Enumerator for a subset of the collection, without using another Collection?
Edit: should have mentioned that this is using .NET Framework 2.0.
Try the following
var col = GetTheCollection();
var subset = col.Skip(250).Take(90);
Or more generally
public static IEnumerable<T> GetRange<T>(this IEnumerable<T> source, int start, int end) {
    // Error checking removed
    return source.Skip(start).Take(end - start);
}
EDIT 2.0 Solution
public static IEnumerable<T> GetRange<T>(IEnumerable<T> source, int start, int end) {
    using (var e = source.GetEnumerator()) {
        var i = 0;
        while (i < start && e.MoveNext()) { i++; }
        while (i < end && e.MoveNext()) {
            yield return e.Current;
            i++;
        }
    }
}
IEnumerable<Foo> col = GetTheCollection();
IEnumerable<Foo> range = GetRange(col, 250, 340);
I like to keep it simple (if you don't necessarily need the enumerator):
for (int i = 249; i < Math.Min(340, list.Count); i++)
{
// do something with list[i]
}
Adapting Jared's original code for .Net 2.0:
static IEnumerable<T> GetRange<T>(IEnumerable<T> source, int start, int end)
{
    int i = 0;
    foreach (T item in source)
    {
        i++;
        if (i > end) yield break;
        if (i > start) yield return item;
    }
}
And to use it:
foreach (T item in GetRange(MyCollection, 250, 340))
{
// do something
}
Adapting Jared's code once again, this extension method will get you a subset that is defined by item, not index.
//! Get subset of collection between \a start and \a end, inclusive
//! Usage
//! \code
//! using ExtensionMethods;
//! ...
//! var subset = MyList.GetRange(firstItem, secondItem);
//! \endcode
static class ExtensionMethods // extension methods must live in a static class
{
    public static IEnumerable<T> GetRange<T>(this IEnumerable<T> source, T start, T end)
    {
#if DEBUG
        if (source.ToList().IndexOf(start) > source.ToList().IndexOf(end))
            throw new ArgumentException("Start must be earlier in the enumerable than end, or both must be the same");
#endif
        yield return start;
        if (start.Equals(end))
            yield break; // start == end, so we are finished here
        using (var e = source.GetEnumerator())
        {
            while (e.MoveNext() && !e.Current.Equals(start)) ; // skip until start
            while (!e.Current.Equals(end) && e.MoveNext())     // return items between start and end
                yield return e.Current;
        }
    }
}
You might be able to do something with LINQ. The way I would do this is to put the objects into an array; then I can choose which items I want to process based on the array index.
If you find that you need to do a fair amount of slicing and dicing of lists and collections, it might be worth climbing the learning curve into the C5 Generic Collection Library.