Set/extend List<T> length in c# - c#

Given a List<T> in c# is there a way to extend it (within its capacity) and set the new elements to null? I'd like something that works like a memset. I'm not looking for sugar here, I want fast code. I known that in C the operation could be done in something like 1-3 asm ops per entry.
The best solution I've found is this:
list.AddRange(Enumerable.Repeat(null, count-list.Count));
however that is c# 3.0 (<3.0 is preferred) and might be generating and evaluating an enumerator.
My current code uses:
while(list.Count < lim) list.Add(null);
so that's the starting point for time cost.
The motivation for this is that I need to set the n'th element even if it is after the old Count.

The simplest way is probably by creating a temporary array:
list.AddRange(new T[size - count]);
Where size is the required new size, and count is the count of items in the list. However, for relatively large values of size - count, this can have bad performance, since it can cause the list to reallocate multiple times.(*) It also has the disadvantage of allocating an additional temporary array, which, depending on your requirements, may not be acceptable. You could mitigate both issues at the expense of more explicit code, by using the following methods:
public static class CollectionsUtil
{
public static List<T> EnsureSize<T>(this List<T> list, int size)
{
return EnsureSize(list, size, default(T));
}
public static List<T> EnsureSize<T>(this List<T> list, int size, T value)
{
if (list == null) throw new ArgumentNullException("list");
if (size < 0) throw new ArgumentOutOfRangeException("size");
int count = list.Count;
if (count < size)
{
int capacity = list.Capacity;
if (capacity < size)
list.Capacity = Math.Max(size, capacity * 2);
while (count < size)
{
list.Add(value);
++count;
}
}
return list;
}
}
The only C# 3.0 here is the use of the "this" modifier to make them extension methods. Remove the modifier and it will work in C# 2.0.
Unfortunately, I never compared the performance of the two versions, so I don't know which one is better.
Oh, and did you know you could resize an array by calling Array.Resize<T>? I didn't know that. :)
Update:
(*) Using list.AddRange(array) will not cause an enumerator to be used. Looking further through Reflector showed that the array will be casted to ICollection<T>, and the Count property will be used so that allocation is done only once.

static IEnumerable<T> GetValues<T>(T value, int count) {
for (int i = 0; i < count; ++i)
yield return value;
}
list.AddRange(GetValues<object>(null, number_of_nulls_to_add));
This will work with 2.0+

Why do you want to do that ?
The main advantage of a List is that it can grow as needed, so why do you want to add a number of null or default elements to it ?
Isn't it better that you use an array in this case ?

Related

Splitting collection into equal batches in Parallel using c#

I am trying to split collection into equal number of batches.below is the code.
public static List<List<T>> SplitIntoBatches<T>(List<T> collection, int size)
{
var chunks = new List<List<T>>();
var count = 0;
var temp = new List<T>();
foreach (var element in collection)
{
if (count++ == size)
{
chunks.Add(temp);
temp = new List<T>();
count = 1;
}
temp.Add(element);
}
chunks.Add(temp);
return chunks;
}
can we do it using Parallel.ForEach() for better performance as we have around 1 Million items in the list?
Thanks!
If performance is the concern, my thoughts (in increasing order of impact):
right-sizing the lists when you create them would save a lot of work, i.e. figure out the output batch sizes before you start copying, i.e. temp = new List<T>(thisChunkSize)
working with arrays would be more effective than working with lists - new T[thisChunkSize]
especially if you use BlockCopy (or CopyTo, which uses that internally) rather than copying individual elements one by one
once you've calculated the offsets for each of the chunks, the individual block-copies could probably be executed in parallel, but I wouldn't assume it will be faster - memory bandwidth will be the limiting factor at that point
but the ultimate fix is: don't copy the data at all, but instead just create ranges over the existing data; for example, if using arrays, ArraySegment<T> would help; if you're open to using newer .NET features, this is a perfect fit for Memory<T>/Span<T> - creating memory/span ranges over an existing array is essentially free and instant - i.e. take a T[] and return List<Memory<T>> or similar.
Even if you can't switch to ArraySegment<T> / Memory<T> etc, returning something like that could still be used - i.e. List<ListSegment<T>> where ListSegment<T> is something like:
readonly struct ListSegment<T> { // like ArraySegment<T>, but for List<T>
public List<T> List {get;}
public int Offset {get;}
public int Count {get;}
}
and have your code work with ListSegment<T> by processing the Offset and Count appropriately.

C# Array slice without copy

I'd like to pass a sub-set of a C# array to into a method. I don't care if the method overwrites the data so would like to avoid creating a copy.
Is there a way to do this?
Thanks.
Change the method to take an IEnumerable<T> or ArraySegment<T>.
You can then pass new ArraySegment<T>(array, 5, 2)
With C# 7.2 we have Span<T> . You can use the extension method AsSpan<T> for your array and pass it to the method without copying the sliced part. eg:
Method( array.AsSpan().Slice(1,3) )
You can use the following class. Note you may need to modify it depending on whether you want endIndex to be inclusive or exclusive. You could also modify it to take a start and a count, rather than a start and an end index.
I intentionally didn't add mutable methods. If you specifically want them, that's easy enough to add. You may also want to implement IList if you add the mutable methods.
public class Subset<T> : IReadOnlyList<T>
{
private IList<T> source;
private int startIndex;
private int endIndex;
public Subset(IList<T> source, int startIndex, int endIndex)
{
this.source = source;
this.startIndex = startIndex;
this.endIndex = endIndex;
}
public T this[int i]
{
get
{
if (startIndex + i >= endIndex)
throw new IndexOutOfRangeException();
return source[startIndex + i];
}
}
public int Count
{
get { return endIndex - startIndex; }
}
public IEnumerator<T> GetEnumerator()
{
return source.Skip(startIndex)
.Take(endIndex - startIndex)
.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
Arrays are immutable by size (i.e. you can't change size of a array), so you can only pass a subtracted copy of the original array. As option you can pass two indexes aside original array into method and operate on the basis of additional two indexes..
you can use Linq take funktion and take as many elements from array as you want
var yournewarray = youroldarray.Take(4).ToArray();

is there in C# a method for List<T> like resize in c++ for vector<T>

When I use resize(int newsize) in C++ for vector<T>, it means that the size of this vector are set to newsize and the indexes run in range [0..newsize). How to do the same in C# for List<T>?
Changing the List<T> property Capacity only changes the Capacity but leaves the Count the same, and furthermore the indexes still are in range [0..Count). Help me out, please.
P.S. Imagine I have a vector<T> tmp with a tmp.size() == 5 I cannot refer to tmp[9], but when I then use tmp.resize(10) I may refer to tmp[9]. In C# if I have List<T> tmp with tmp.Count == 5 I cannot refer to tmp[9] (IndexOutOfRangeException), but even when I set tmp.Capacity=10 I will not be able to refer to tmp[9] coz of tmp.Count is still 5. I want to find some analogy of resize in C#.
No, but you can use extension methods to add your own. The following has the same behaviour as std::vector<T>::resize(), including the same time-complexity. The only difference is that in C++ we can define a default with void resize ( size_type sz, T c = T() ) and the way templates work means that that's fine if we call it without the default for a T that has no accessible parameterless constructor. In C# we can't do that, so instead we have to create one method with no constraint that matches the non-default-used case, and another with a where new() constraint that calls into it.
public static class ListExtra
{
public static void Resize<T>(this List<T> list, int sz, T c)
{
int cur = list.Count;
if(sz < cur)
list.RemoveRange(sz, cur - sz);
else if(sz > cur)
{
if(sz > list.Capacity)//this bit is purely an optimisation, to avoid multiple automatic capacity changes.
list.Capacity = sz;
list.AddRange(Enumerable.Repeat(c, sz - cur));
}
}
public static void Resize<T>(this List<T> list, int sz) where T : new()
{
Resize(list, sz, new T());
}
}
Now the likes of myList.Resize(23) or myList.Resize(23, myDefaultValue) will match what one expects from C++'s vector. I'd note though that sometimes where with C++ you'd have a vector of pointers, in C# you'd have a list of some reference-type. Hence in cases where the C++ T() produces a null pointer (because it's a pointer), here we're expecting it to call a parameterless constructor. For that reason you might find it closer to the behaviour you're used to to replace the second method with:
public static void Resize<T>(this List<T> list, int sz)
{
Resize(list, sz, default(T));
}
This has the same effect with value types (call parameterless constructor), but with reference-types, it'll fill with nulls. In which case, we can just rewrite the entire class to:
public static class ListExtra
{
public static void Resize<T>(this List<T> list, int sz, T c = default(T))
{
int cur = list.Count;
if(sz < cur)
list.RemoveRange(sz, cur - sz);
else if(sz > cur)
list.AddRange(Enumerable.Repeat(c, sz - cur));
}
}
Note that this isn't so much about differences between std::vector<T> and List<T> as about the differences in how pointers are used in C++ and C#.
Just to make Jon Hanna's answer more readable:
public static class ListExtras
{
// list: List<T> to resize
// size: desired new size
// element: default value to insert
public static void Resize<T>(this List<T> list, int size, T element = default(T))
{
int count = list.Count;
if (size < count)
{
list.RemoveRange(size, count - size);
}
else if (size > count)
{
if (size > list.Capacity) // Optimization
list.Capacity = size;
list.AddRange(Enumerable.Repeat(element, size - count));
}
}
}
Setting List<T>.Capacity is like using std::vector<T>.reserve(..). Maybe List<T>.AddRange(..) fit your needs.
sorry. is this what u need?
List.TrimExcess()
This is my solution.
private void listResize<T>(List<T> list, int size)
{
if (size > list.Count)
while (size - list.Count > 0)
list.Add(default<T>);
else if (size < list.Count)
while (list.Count - size > 0)
list.RemoveAt(list.Count-1);
}
When the size and list.Count are the same, there is no need to resize the list.
The default(T) parameter is used instead of null,"",0 or other nullable types, to fill an empty item in the list, because we don't know what type <T> is (reference, value, struct etc.).
P.S. I used for loops instead of while loops and i ran into a
problem. Not always the size of the list was that i was asking for. It
was smaller. Any thoughts why?
Check it:
private void listResize<T>(List<T> list, int size)
{
if (size > list.Count)
for (int i = 0; i <= size - list.Count; i++)
list.Add(default(T));
else if (size < list.Count)
for (int i = 0; i <= list.Count - size; i++)
list.RemoveAt(list.Count-1);
}
Haven't you read at MSDN:-
A list is a resizable collection of items. Lists can be constructed
multiple ways, but the most useful class is List. This allows you
to strongly type your list, includes all of the essential
functionality for dealing with collections, and can be easily
searched.
Further:-
Capacity is the number of elements that the List can store before
resizing is required, while Count is the number of elements that are
actually in the List.
Capacity is always greater than or equal to Count. If Count exceeds
Capacity while adding elements, the capacity is increased by
automatically reallocating the internal array before copying the old
elements and adding the new elements.
A list doesn't have a finite size.
Is there a reason why the size matters to you?
Perhaps an array or a dictionary is closer to your requirements

Is there a LINQ extension or (a sensible/efficient set of LINQ entensions) that determine whether a collection has at least 'x' elements?

I have code that needs to know that a collection should not be empty or contain only one item.
In general, I want an extension of the form:
bool collectionHasAtLeast2Items = collection.AtLeast(2);
I can write an extension easily, enumerating over the collection and incrementing an indexer until I hit the requested size, or run out of elements, but is there something already in the LINQ framework that would do this? My thoughts (in order of what came to me) are::
bool collectionHasAtLeast2Items = collection.Take(2).Count() == 2; or
bool collectionHasAtLeast2Items = collection.Take(2).ToList().Count == 2;
Which would seem to work, though the behaviour of taking more elements than the collection contains is not defined (in the documentation) Enumerable.Take Method , however, it seems to do what one would expect.
It's not the most efficient solution, either enumerating once to take the elements, then enumerating again to count them, which is unnecessary, or enumerating once to take the elements, then constructing a list in order to get the count property which isn't enumerator-y, as I don't actually want the list.
It's not pretty as I always have to make two assertions, first taking 'x', then checking that I actually received 'x', and it depends upon undocumented behaviour.
Or perhaps I could use:
bool collectionHasAtLeast2Items = collection.ElementAtOrDefault(2) != null;
However, that's not semantically-clear. Maybe the best is to wrap that with a method-name that means what I want. I'm assuming that this will be efficient, I haven't reflected on the code.
Some other thoughts are using Last(), but I explicitly don't want to enumerate through the whole collection.
Or maybe Skip(2).Any(), again not semantically completely obvious, but better than ElementAtOrDefault(2) != null, though I would think they produce the same result?
Any thoughts?
public static bool AtLeast<T>(this IEnumerable<T> source, int count)
{
// Optimization for ICollection<T>
var genericCollection = source as ICollection<T>;
if (genericCollection != null)
return genericCollection.Count >= count;
// Optimization for ICollection
var collection = source as ICollection;
if (collection != null)
return collection.Count >= count;
// General case
using (var en = source.GetEnumerator())
{
int n = 0;
while (n < count && en.MoveNext()) n++;
return n == count;
}
}
You can use Count() >= 2, if you sequence implements ICollection?
Behind the scene, Enumerable.Count() extension method checks does the sequence under loop implements ICollection. If it does indeed, Count property returned, so target performance should be O(1).
Thus ((IEnumerable<T>)((ICollection)sequence)).Count() >= x also should have O(1).
You could use Count, but if performance is an issue, you will be better off with Take.
bool atLeastX = collection.Take(x).Count() == x;
Since Take (I believe) uses deferred execution, it will only go through the collection once.
abatishchev mentioned that Count is O(1) with ICollection, so you could do something like this and get the best of both worlds.
IEnumerable<int> col;
// set col
int x;
// set x
bool atLeastX;
if (col is ICollection<int>)
{
atLeastX = col.Count() >= x;
}
else
{
atLeastX = col.Take(x).Count() == x;
}
You could also use Skip/Any, in fact I bet it would be even faster than Take/Count.

LINQ into SortedList

I'm a complete LINQ newbie, so I don't know if my LINQ is incorrect for what I need to do or if my expectations of performance are too high.
I've got a SortedList of objects, keyed by int; SortedList as opposed to SortedDictionary because I'll be populating the collection with pre-sorted data. My task is to find either the exact key or, if there is no exact key, the one with the next higher value. If the search is too high for the list (e.g. highest key is 100, but search for 105), return null.
// The structure of this class is unimportant. Just using
// it as an illustration.
public class CX
{
public int KEY;
public DateTime DT;
}
static CX getItem(int i, SortedList<int, CX> list)
{
var items =
(from kv in list
where kv.Key >= i
select kv.Key);
if (items.Any())
{
return list[items.Min()];
}
return null;
}
Given a list of 50,000 records, calling getItem 500 times takes about a second and a half. Calling it 50,000 times takes over 2 minutes. This performance seems very poor. Is my LINQ bad? Am I expecting too much? Should I be rolling my own binary search function?
First, your query is being evaluated twice (once for Any, and once for Min). Second, Min requires that it iterate over the entire list, even though the fact that it's sorted means that the first item will be the minimum. You should be able to change this:
if (items.Any())
{
return list[items.Min()];
}
To this:
var default =
(from kv in list
where kv.Key >= i
select (int?)kv.Key).FirstOrDefault();
if(default != null) return list[default.Value];
return null;
UPDATE
Because you're selecting a value type, FirstOrDefault doesn't return a nullable object. I have altered your query to cast the selected value to an int? instead, allowing the resulting value to be checked for null. I would advocate this over using ContainsKey, as that would return true if your list contained a value for 0. For example, say you have the following values
0 2 4 6 8
If you were to pass in anything less than or equal to 8, then you would get the correct value. However, if you were to pass in 9, you would get 0 (default(int)), which is in the list but isn't a valid result.
Writing a binary search on your own can be tough.
Fortunately, Microsoft already wrote a pretty robust one: Array.BinarySearch<T>. This is, in fact, the method that SortedList<TKey, TValue>.IndexOfKey uses internally. Only problem is, it takes a T[] argument, instead of any IList<T> (like SortedList<TKey, TValue>.Keys).
You know what, though? There's this great tool called Reflector that lets you look at .NET source code...
Check it out: a generic BinarySearch extension method on IList<T>, taken straight from the reflected code of Microsoft's Array.BinarySearch<T> implementation.
public static int BinarySearch<T>(this IList<T> list, int index, int length, T value, IComparer<T> comparer) {
if (list == null)
throw new ArgumentNullException("list");
else if (index < 0 || length < 0)
throw new ArgumentOutOfRangeException((index < 0) ? "index" : "length");
else if (list.Count - index < length)
throw new ArgumentException();
int lower = index;
int upper = (index + length) - 1;
while (lower <= upper) {
int adjustedIndex = lower + ((upper - lower) >> 1);
int comparison = comparer.Compare(list[adjustedIndex], value);
if (comparison == 0)
return adjustedIndex;
else if (comparison < 0)
lower = adjustedIndex + 1;
else
upper = adjustedIndex - 1;
}
return ~lower;
}
public static int BinarySearch<T>(this IList<T> list, T value, IComparer<T> comparer) {
return list.BinarySearch(0, list.Count, value, comparer);
}
public static int BinarySearch<T>(this IList<T> list, T value) where T : IComparable<T> {
return list.BinarySearch(value, Comparer<T>.Default);
}
This will let you call list.Keys.BinarySearch and get the negative bitwise complement of the index you want in case the desired key isn't found (the below is taken basically straight from tzaman's answer):
int index = list.Keys.BinarySearch(i);
if (index < 0)
index = ~index;
var item = index < list.Count ? list[list.Keys[index]] : null;
return item;
Using LINQ on a SortedList will not give you the benefit of the sort.
For optimal performance, you should write your own binary search.
OK, just to give this a little more visibility - here's a more concise version of Adam Robinson's answer:
return list.FirstOrDefault(kv => kv.Key >= i).Value;
The FirstOrDefault function has an overload that accepts a predicate, which selects the first element satisfying a condition - you can use that to directly get the element you want, or null if it doesn't exist.
Why not use the BinarySearch that's built into the List class?
var keys = list.Keys.ToList();
int index = keys.BinarySearch(i);
if (index < 0)
index = ~index;
var item = index < keys.Count ? list[keys[index]] : null;
return item;
If the search target isn't in the list, BinarySearch returns the bit-wise complement of the next-higher item; we can use that to directly get you what you want by re-complementing the result if it's negative. If it becomes equal to the Count, your search key was bigger than anything in the list.
This should be much faster than doing a LINQ where, since it's already sorted...
As comments have pointed out, the ToList call will force an evaluation of the whole list, so this is only beneficial if you do multiple searches without altering the underlying SortedList, and you keep the keys list around separately.
Using OrderedDictionary in PowerCollections you can get an enumerator that starts where they key you are looking for should be... if it's not there, you'll get the next closest node and can then navigate forwards/backwards from that in O(log N) time per nav call.
This has the advantage of you not having to write your own search or even manage your own searches on top of a SortedList.

Categories