Why is there no SelectToArray method in System.Linq.Enumerable? - C#

So I'd like to ask: why do we only have a Select that returns an enumerable? I frequently run into situations where I must modify each value of an array, for example:
int[] a = {1,2,3,4,5};
a = a.Select(x=>x*2).ToArray();
Here we get an enumerable, and only after that can we convert it back into an array.
We could try Array.ForEach, but only if we are allowed to modify the source. If we have an array of reference types and can't modify them, we still have to write something like this:
SomeClass[] a = FillSomeClassArray();
SomeClass[] b = a.Select(x=> ((SomeClass)x.Clone()).Modify()).ToArray();
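For that line to compile, SomeClass needs a Clone and a Modify that returns the modified instance; a minimal sketch of the shape I mean (the Value field and the doubling are placeholders, not the real class):
public class SomeClass : ICloneable
{
    public int Value;

    public object Clone()
    {
        return MemberwiseClone();
    }

    // Returns the modified clone so the call can be chained inside Select.
    public SomeClass Modify()
    {
        Value *= 2; // placeholder modification
        return this;
    }
}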
In my case I'm using my own helper class:
public static class CollectionHelper
{
    public static TResult[] SelectToArray<T, TResult>(this ICollection<T> source, Func<T, TResult> selector)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (selector == null)
            throw new ArgumentNullException("selector");

        var result = new TResult[source.Count];
        int i = 0;
        foreach (T t in source)
        {
            result[i] = selector(t);
            i++;
        }
        return result;
    }
}
Here we avoid the double conversion: when there is no predicate we know the length of the result, and we should use that information. I know MS shouldn't do all the work for me, but AFAIK this functionality is standard enough.
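With the helper above, the first example collapses to a single call with no intermediate enumerable:
int[] a = { 1, 2, 3, 4, 5 };
int[] doubled = a.SelectToArray(x => x * 2);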

The biggest problem with adding SelectToArray to the framework is consistency. If you add SelectToArray, you should also add each of the following:
CastToArray<T>
ConcatToArray<T>
RepeatToArray<T>
ReverseToArray<T>
SkipToArray<T>
OfTypeToArray<T>
TakeToArray<T>
While we're on the subject of adding new methods, what's wrong with adding the same optimization to lists? Now we also need:
SelectToList<T> (similar to the one that started it all)
CastToList<T>
ConcatToList<T>
... and so on - I'm sure you got the idea.
Considering the minuscule savings from knowing the size of the target array or the target list, such major refactoring is impractical. You would be able to achieve the same effect with a simple method like this:
static T[] CopyToArray<T>(
    this IEnumerable<T> source
,   T[] result
,   int pos = 0
,   int? lengthOrNull = null
) {
    // Copy at most lengthOrNull items (default: up to the end of result), starting at pos.
    int end = pos + (lengthOrNull ?? result.Length - pos);
    foreach (var item in source) {
        if (pos >= end) break;
        result[pos++] = item;
    }
    return result;
}
Now the caller can combine the existing LINQ functionality with this method to compose all of the above XyzToArray methods, like this:
IList<MyClass> data = ...
int[] res = data.Select(x => x.IntProperty).CopyToArray(new int[data.Count]);
You would also be able to write results of LINQ queries into different parts of an existing array, like this:
IList<MyClass> data1 = ...
IList<MyClass> data2 = ...
int[] res = new int[data1.Count+data2.Count];
data1.Select(x => x.IntProperty).CopyToArray(res, 0, data1.Count);
data2.Select(x => x.IntProperty).CopyToArray(res, data1.Count, data2.Count);

Related

How to replace an element in a Collection

The thing I wanna do would appear really simple - I want to find an element in an ICollection<T> that satisfies a given predicate and replace it with another. In C++ I would write this like:
for (auto &element : collection) {
    if (predicate(element)) {
        element = newElement;
    }
}
Grab the element by reference and reassign it. However doing
foreach(ref var element in collection)
in C# fails to compile, and I'm unsure if it'd even do what I want if it did compile. How do I access the physical reference within a collection to modify it?
My method signature if it helps:
public static void ReplaceReference<T>(
    ICollection<T> collection,
    T newReference,
    Func<T, bool> predicate)
EDIT:
Since it appears unclear, I cannot just take the ICollection<T> and change it to something else. I'm getting an ICollection - that's all I know and I can't change that. No matter how much I'd love this to be an IList, or IEasilyReplacable I can't influence that.
ICollection<T> wouldn't be the best for this scenario. IList<T> allows you to assign with the indexer.
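If you can get at an IList<T>, the replacement is just an index assignment; a rough sketch (the helper name is mine, not from the question):
// Hypothetical helper: replaces every matching element in place via the indexer.
public static void ReplaceInList<T>(IList<T> list, T newReference, Func<T, bool> predicate)
{
    for (int i = 0; i < list.Count; i++)
    {
        if (predicate(list[i]))
        {
            list[i] = newReference;
        }
    }
}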
Another option would be to create a new collection as you iterate.
You could also write some sort of wrapper that is the actual reference in the collection and holds the value:
ICollection<Wrapper<T>> collection = ...;
foreach (var wrapper in collection)
{
    if (predicate(wrapper.Value))
    {
        wrapper.Value = newReference;
    }
}
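A minimal Wrapper<T> for that approach could be as simple as this (not spelled out in the original answer):
// The collection stores the wrapper, so assigning Value effectively
// "replaces" the element without mutating the ICollection itself.
public class Wrapper<T>
{
    public T Value { get; set; }
}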
As per my understanding, you want to replace a specific item in a collection based on a given predicate. I tried the code below and it works fine for me.
I've created a list of strings with 4 items and asked my generic method to search for the string with value "Name 1"; if found, it should be changed to "Name 5".
I've tested it using a console application, so you can verify it by writing a loop that shows the values of the list using Console.WriteLine().
public static void Main(string[] args)
{
    List<string> list = new List<string>();
    list.Add("Name 1");
    list.Add("Name 2");
    list.Add("Name 3");
    list.Add("Name 4");

    Func<string, bool> logicFunc = (listItemValue) => listItemValue == "Name 1";
    ReplaceReference(list, "Name 5", logicFunc);
}
public static void ReplaceReference<T>(ICollection<T> collection, T newReference, Func<T, bool> predicate)
{
    var newCollection = collection.ToList();
    for (int i = 0; i < newCollection.Count; i++)
    {
        if (predicate(newCollection[i]))
        {
            newCollection[i] = newReference;
        }
    }
    // Copy the modified items back, since ToList() works on a copy.
    collection.Clear();
    foreach (var item in newCollection)
    {
        collection.Add(item);
    }
}
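As suggested above, a quick way to verify the result at the end of Main (sketch):
foreach (var item in list)
{
    Console.WriteLine(item); // expected: Name 5, Name 2, Name 3, Name 4
}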
So I bashed my head against the wall and came up with a really simple solution for the particular replace problem, which is to find, remove and then add.
var existing = collection.FirstOrDefault(predicate);
if (existing != null)
{
    collection.Remove(existing);
    collection.Add(newReference);
}
However, I see it as rather a workaround to my foreach issue, and have thus posted this question as a follow-up: Grab element from a Collection by reference in a foreach
EDIT:
For Daniel A. White's comment:
Handling only the first one was what I intended to do, but it can be easily changed to replace-all:
// Materialize first, so we don't modify the collection while enumerating the Where query.
var existing = collection.Where(predicate).ToList();
foreach (var element in existing)
{
    collection.Remove(element);
}
for (int i = 0; i < existing.Count; ++i)
{
    collection.Add(newReference);
}
As for ordering - ICollection is not necessarily ordered. So the way to fix that would be to create a new method with a less general signature
static void ReplaceReference<T>(
    IList<T> list,
    T newReference,
    Func<T, bool> predicate)
that would use the indexer to replace the values
for (int i = 0; i < list.Count; ++i)
{
    if (predicate(list[i]))
    {
        list[i] = newReference;
        // break here if replace-one variant.
    }
}
And now in the main method we check if our collection is an IList, therefore ordered, and pass it to the ordered version:
if (collection is IList<T> list)
{
    ReplaceReference(list, newReference, predicate);
    return;
}
===========================================================================
Sidenote: of course there is also the dumbo approach:
var newCollection = new List<T>();
foreach (var element in collection)
{
    newCollection.Add(predicate(element) ? newReference : element);
}
collection.Clear();
foreach (var newElement in newCollection)
{
    collection.Add(newElement);
}
but it's highly inefficient.

LINQ to return null if an array is empty

public class Stuff
{
    public int x;
    // ... other stuff
}
I have an IEnumerable<Stuff> and I want to build an int[] of the x properties of all the Stuff objects in the collection.
I do:
IEnumerable<Stuff> coll;
// ...
var data = coll.Select(s => s.x).ToArray();
What I want is null rather than an int[0] if the collection is empty. In other words, if !coll.Any(), then I want data = null. (My actual need is that coll is an intermediate result of a complex LINQ expression, and I would like to do this with a LINQ operation on the expression chain, rather than saving the intermediate result.)
I know that int[0] is more desirable than null in many contexts, but I am storing many of these results and would prefer to pass around nulls than empty arrays.
So my current solution is something like:
var tmp = coll.Select(s => s.x).ToArray();
int[] data = tmp.Any() ? tmp : null;
Any way to do this without storing tmp?
EDIT: The main question is how to do this without storing intermediate results. Something like NULLIF() from T-SQL where you get back what you passed in if the condition is false, and NULL if the condition is true.
If you're doing this a lot, you could write an extension method:
public static class IEnumerableExt
{
    public static T[] ToArrayOrNull<T>(this IEnumerable<T> seq)
    {
        var result = seq.ToArray();
        if (result.Length == 0)
            return null;
        return result;
    }
}
Then your calling code would be:
var data = coll.Select(s => s.x).ToArrayOrNull();
Create the array only if coll is not empty, so the other way round:
int[] data = null;
if(coll.Any()) data = coll.Select(s => s.x).ToArray();
There's not a way to get Select to return null, but if you don't want to create an additional array you could do:
var tmp = coll.Select(s => s.x);
int[] data = tmp.Any() ? tmp.ToArray() : null;

Extensions for IEnumerable generic

I've got two extensions for IEnumerable:
public static class IEnumerableGenericExtensions
{
    public static IEnumerable<IEnumerable<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
    {
        List<T> toReturn = new List<T>(max);
        foreach (var item in source)
        {
            toReturn.Add(item);
            if (toReturn.Count == max)
            {
                yield return toReturn;
                toReturn = new List<T>(max);
            }
        }
        if (toReturn.Any())
        {
            yield return toReturn;
        }
    }

    public static int IndexOf<T>(this IEnumerable<T> source, Predicate<T> searchPredicate)
    {
        int i = 0;
        foreach (var item in source)
            if (searchPredicate(item))
                return i;
            else
                i++;
        return -1;
    }
}
Then I write this code:
Pages = history.InSetsOf<Message>(500);
var index = Pages.IndexOf(x => x == Pages.ElementAt(0));
where
public class History : IEnumerable
But as a result I got '-1', not the '0' I expected. I can't understand why.
When you write Pages.IndexOf(x => x == Pages.ElementAt(0));, you actually run InSetsOf many times, due to deferred execution (aka lazy). To expand:
Pages = history.InSetsOf<Message>(500) - this line doesn't run InSetsOf at all.
Pages.IndexOf - Iterates over Pages, so it starts executing InSetsOf once.
x == Pages.ElementAt(0) - this executes InSetsOf again, once for every element in the collection of Pages (or at least until searchPredicate returns true, which doesn't happen here).
Each time you run InSetsOf you create a new list (specifically, a new first list, because you use ElementAt(0)). These are two different objects, so comparison of == between them fails.
An extremely simple fix would be to return a list, so Pages is not a deferred query, but a concrete collection:
Pages = history.InSetsOf<Message>(500).ToList();
Another option is to use SequenceEqual, though I'd recommend caching the first element anyway:
Pages = history.InSetsOf<Message>(500);
var firstPage = Pages.FirstOrDefault();
var index = Pages.IndexOf(x => x.SequenceEqual(firstPage));
Does your class T implement IEquatable<T>? If not, your equality check might be flawed, as the framework does not know exactly when one T equals another. You could also get by just overriding Equals (and GetHashCode) on your class T, I would guess.

What's the fastest way to convert List<string> to List<int> in C# assuming int.Parse will work for every item?

By fastest I mean: what is the most performant means of converting each item in the List<string> to an int using C#, assuming int.Parse will work for every item?
You won't get around iterating over all elements. Using LINQ:
var ints = strings.Select(s => int.Parse(s));
This has the added bonus that it will only convert at the time you iterate over it, and only as many elements as you request.
If you really need a list, use the ToList method. However, you have to be aware that the performance bonus mentioned above won't be available then.
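For example, the deferred and the materialized versions side by side (assuming the source is called strings, as above):
IEnumerable<int> lazyInts = strings.Select(int.Parse);   // parsed lazily, on enumeration
List<int> intList = strings.Select(int.Parse).ToList();  // parsed once, up front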
If you're really trying to eke out the last bit of performance you could try doing something with pointers like this, but personally I'd go with the simple LINQ implementation that others have mentioned.
// Note: handles only plain non-negative digits (no sign, whitespace or overflow checks).
unsafe static int ParseUnsafe(string value)
{
    int result = 0;
    fixed (char* v = value)
    {
        char* str = v;
        while (*str != '\0')
        {
            result = 10 * result + (*str - '0');
            str++;
        }
    }
    return result;
}
var parsed = input.Select(i => ParseUnsafe(i)); // optionally .ToList() if you really need a list
There is likely to be very little difference between any of the obvious ways to do this: therefore go for readability (one of the LINQ-style methods posted in other answers).
You may gain some performance for very large lists by initializing the output list to its required capacity, but it's unlikely you'd notice the difference, and readability will suffer:
List<string> input = ...;
List<int> output = new List<int>(input.Count);
foreach (string s in input)
    output.Add(int.Parse(s));
The slight performance gain will come from the fact that the output list won't need to be repeatedly reallocated as it grows.
I don't know what the performance implications are, but there is a List<T>.ConvertAll<TOutput> method for converting the elements in the current List to another type, returning a list containing the converted elements.
List.ConvertAll Method
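A short sketch of what that looks like, assuming the source is already a List<string>:
List<string> strings = new List<string> { "1", "2", "3" };
List<int> ints = strings.ConvertAll(int.Parse); // no iterator or ToList round trip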
var myListOfInts = myListString.Select(x => int.Parse(x)).ToList();
Side note: if you call ToList() on an ICollection<T>, the .NET Framework automatically preallocates a List<T> of the needed size, so it doesn't have to allocate new space for each item added to the list.
Unfortunately LINQ's Select doesn't return an ICollection (as Joe pointed out in comments).
From ILSpy:
// System.Linq.Enumerable
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    return new List<TSource>(source);
}

// System.Collections.Generic.List<T>
public List(IEnumerable<T> collection)
{
    if (collection == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
    }
    ICollection<T> collection2 = collection as ICollection<T>;
    if (collection2 != null)
    {
        int count = collection2.Count;
        this._items = new T[count];
        collection2.CopyTo(this._items, 0);
        this._size = count;
        return;
    }
    this._size = 0;
    this._items = new T[4];
    using (IEnumerator<T> enumerator = collection.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            this.Add(enumerator.Current);
        }
    }
}
So ToList() just calls the List<T> constructor and passes in the IEnumerable<T>.
The List<T> constructor is smart enough that, if the source is an ICollection<T>, it uses the most efficient way of filling the new instance.

How to iterate over two arrays at once?

I have two arrays built while parsing a text file. The first contains the column names, the second contains the values from the current row. I need to iterate over both lists at once to build a map. Right now I have the following:
var currentValues = currentRow.Split(separatorChar);
var valueEnumerator = currentValues.GetEnumerator();
foreach (String column in columnList)
{
    valueEnumerator.MoveNext();
    valueMap.Add(column, (String)valueEnumerator.Current);
}
This works just fine, but it doesn't quite satisfy my sense of elegance, and it gets really hairy if the number of arrays is larger than two (as I have to do occasionally). Does anyone have another, terser idiom?
You've got a non-obvious pseudo-bug in your initial code - IEnumerator<T> extends IDisposable so you should dispose it. This can be very important with iterator blocks! Not a problem for arrays, but would be with other IEnumerable<T> implementations.
I'd do it like this:
public static IEnumerable<TResult> PairUp<TFirst, TSecond, TResult>
    (this IEnumerable<TFirst> source, IEnumerable<TSecond> secondSequence,
     Func<TFirst, TSecond, TResult> projection)
{
    using (IEnumerator<TSecond> secondIter = secondSequence.GetEnumerator())
    {
        foreach (TFirst first in source)
        {
            if (!secondIter.MoveNext())
            {
                throw new ArgumentException
                    ("First sequence longer than second");
            }
            yield return projection(first, secondIter.Current);
        }
        if (secondIter.MoveNext())
        {
            throw new ArgumentException
                ("Second sequence longer than first");
        }
    }
}
Then you can reuse this whenever you have the need:
foreach (var pair in columnList.PairUp(currentRow.Split(separatorChar),
                                       (column, value) => new { column, value }))
{
    // Do something
}
Alternatively you could create a generic Pair type, and get rid of the projection parameter in the PairUp method.
EDIT:
With the Pair type, the calling code would look like this:
foreach (var pair in columnList.PairUp(currentRow.Split(separatorChar)))
{
    // column = pair.First, value = pair.Second
}
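A minimal Pair type to go with that (not shown in the original answer, just the obvious shape), plus a PairUp overload that drops the projection and yields new Pair<TFirst, TSecond>(first, secondIter.Current):
public sealed class Pair<TFirst, TSecond>
{
    public TFirst First { get; private set; }
    public TSecond Second { get; private set; }

    public Pair(TFirst first, TSecond second)
    {
        First = first;
        Second = second;
    }
}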
That looks about as simple as you can get. Yes, you need to put the utility method somewhere, as reusable code. Hardly a problem in my view. Now for multiple arrays...
If the arrays are of different types, we have a problem. You can't express an arbitrary number of type parameters in a generic method/type declaration - you could write versions of PairUp for as many type parameters as you wanted, just like there are Action and Func delegates for up to 4 delegate parameters - but you can't make it arbitrary.
If the values will all be of the same type, however - and if you're happy to stick to arrays - it's easy. (Non-arrays is okay too, but you can't do the length checking ahead of time.) You could do this:
public static IEnumerable<T[]> Zip<T>(params T[][] sources)
{
    // (Insert error checking code here for null or empty sources parameter)
    int length = sources[0].Length;
    if (!sources.All(array => array.Length == length))
    {
        throw new ArgumentException("Arrays must all be of the same length");
    }
    for (int i = 0; i < length; i++)
    {
        // Could do this bit with LINQ if you wanted
        T[] result = new T[sources.Length];
        for (int j = 0; j < result.Length; j++)
        {
            result[j] = sources[j][i];
        }
        yield return result;
    }
}
Then the calling code would be:
foreach (var array in Zip(columns, row, whatevers))
{
    // column = array[0]
    // value = array[1]
    // whatever = array[2]
}
This involves a certain amount of copying, of course - you're creating an array each time. You could change that by introducing another type like this:
public struct Snapshot<T>
{
    readonly T[][] sources;
    readonly int index;

    public Snapshot(T[][] sources, int index)
    {
        this.sources = sources;
        this.index = index;
    }

    public T this[int element]
    {
        get { return sources[element][index]; }
    }
}
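To make that usable you'd also need a companion iterator that yields snapshots instead of arrays; a sketch along the lines of the Zip method above (ZipSnapshots is my name for it, not part of the original answer):
public static IEnumerable<Snapshot<T>> ZipSnapshots<T>(params T[][] sources)
{
    // The same length checks as in Zip above would apply here.
    int length = sources[0].Length;
    for (int i = 0; i < length; i++)
    {
        yield return new Snapshot<T>(sources, i);
    }
}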
This would probably be regarded as overkill by most though ;)
I could keep coming up with all kinds of ideas, to be honest... but the basics are:
With a little bit of reusable work, you can make the calling code nicer
For arbitrary combinations of types you'll have to do each number of parameters (2, 3, 4...) separately due to the way generics work
If you're happy to use the same type for each part, you can do better
If there are the same number of column names as there are elements in each row, could you not use a for loop?
var currentValues = currentRow.Split(separatorChar);
for (var i = 0; i < columnList.Length; i++)
{
    // use i to index both (or all) arrays and build your map
    valueMap.Add(columnList[i], currentValues[i]);
}
In a functional language you would usually find a "zip" function, which will hopefully be part of C# 4.0. Bart de Smet provides a fun implementation of Zip based on existing LINQ functions:
public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
    this IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, TResult> func)
{
    return first.Select((x, i) => new { X = x, I = i })
                .Join(second.Select((x, i) => new { X = x, I = i }),
                      o => o.I,
                      i => i.I,
                      (o, i) => func(o.X, i.X));
}
Then you can do:
int[] s1 = new [] { 1, 2, 3 };
int[] s2 = new[] { 4, 5, 6 };
var result = s1.Zip(s2, (i1, i2) => new {Value1 = i1, Value2 = i2});
If you're really using arrays, the best way is probably just to use the conventional for loop with indices. Not as nice, granted, but as far as I know .NET doesn't offer a better way of doing this.
You could also encapsulate your code into a method called zip – this is a common higher-order list function. However, C# lacking a suitable Tuple type, this is quite crufty. You'd end up returning an IEnumerable<KeyValuePair<T1, T2>> which isn't very nice.
By the way, are you really using IEnumerable instead of IEnumerable<T> or why do you cast the Current value?
Using an IEnumerator for both would be nice:
var currentValues = currentRow.Split(separatorChar);
// The cast is needed because an array's instance GetEnumerator() is the non-generic one.
using (IEnumerator<string> valueEnum = ((IEnumerable<string>)currentValues).GetEnumerator(),
                           columnEnum = ((IEnumerable<string>)columnList).GetEnumerator())
{
    while (valueEnum.MoveNext() && columnEnum.MoveNext())
        valueMap.Add(columnEnum.Current, valueEnum.Current);
}
Or create an extension method:
public static IEnumerable<TResult> Zip<T1, T2, TResult>(this IEnumerable<T1> source, IEnumerable<T2> other, Func<T1, T2, TResult> selector)
{
    using (IEnumerator<T1> sourceEnum = source.GetEnumerator())
    {
        using (IEnumerator<T2> otherEnum = other.GetEnumerator())
        {
            while (sourceEnum.MoveNext() && otherEnum.MoveNext())
                yield return selector(sourceEnum.Current, otherEnum.Current);
        }
    }
}
Usage
var currentValues = currentRow.Split(separatorChar);
foreach (var valueColumnPair in currentValues.Zip(columnList, (a, b) => new { Value = a, Column = b }))
{
    valueMap.Add(valueColumnPair.Column, valueColumnPair.Value);
}
Instead of creating two separate arrays you could make a two-dimensional array, or a dictionary (which would be better). But really, if it works I wouldn't try to change it.
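For the dictionary idea, a rough sketch using a Zip like the ones above (or .NET 4's Enumerable.Zip), assuming the column and value arrays have equal lengths:
var currentValues = currentRow.Split(separatorChar);
var valueMap = columnList
    .Zip(currentValues, (column, value) => new { column, value })
    .ToDictionary(p => p.column, p => p.value);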
