How to iterate over two arrays at once? - c#

I have two arrays built while parsing a text file. The first contains the column names, the second contains the values from the current row. I need to iterate over both lists at once to build a map. Right now I have the following:
var currentValues = currentRow.Split(separatorChar);
var valueEnumerator = currentValues.GetEnumerator();
foreach (String column in columnList)
{
valueEnumerator.MoveNext();
valueMap.Add(column, (String)valueEnumerator.Current);
}
This works just fine, but it doesn't quite satisfy my sense of elegance, and it gets really hairy if the number of arrays is larger than two (as I have to do occasionally). Does anyone have another, terser idiom?

You've got a non-obvious pseudo-bug in your initial code - IEnumerator<T> extends IDisposable so you should dispose it. This can be very important with iterator blocks! Not a problem for arrays, but would be with other IEnumerable<T> implementations.
I'd do it like this:
public static IEnumerable<TResult> PairUp<TFirst,TSecond,TResult>
(this IEnumerable<TFirst> source, IEnumerable<TSecond> secondSequence,
Func<TFirst,TSecond,TResult> projection)
{
using (IEnumerator<TSecond> secondIter = secondSequence.GetEnumerator())
{
foreach (TFirst first in source)
{
if (!secondIter.MoveNext())
{
throw new ArgumentException
("First sequence longer than second");
}
yield return projection(first, secondIter.Current);
}
if (secondIter.MoveNext())
{
throw new ArgumentException
("Second sequence longer than first");
}
}
}
Then you can reuse this whenever you have the need:
foreach (var pair in columnList.PairUp(currentRow.Split(separatorChar),
(column, value) => new { column, value }))
{
// Do something
}
Alternatively you could create a generic Pair type, and get rid of the projection parameter in the PairUp method.
EDIT:
With the Pair type, the calling code would look like this:
foreach (var pair in columnList.PairUp(currentRow.Split(separatorChar)))
{
// column = pair.First, value = pair.Second
}
That looks about as simple as you can get. Yes, you need to put the utility method somewhere, as reusable code. Hardly a problem in my view. Now for multiple arrays...
If the arrays are of different types, we have a problem. You can't express an arbitrary number of type parameters in a generic method/type declaration - you could write versions of PairUp for as many type parameters as you wanted, just like there are Action and Func delegates for up to 4 delegate parameters - but you can't make it arbitrary.
If the values will all be of the same type, however - and if you're happy to stick to arrays - it's easy. (Non-arrays are okay too, but you can't do the length checking ahead of time.) You could do this:
public static IEnumerable<T[]> Zip<T>(params T[][] sources)
{
// (Insert error checking code here for null or empty sources parameter)
int length = sources[0].Length;
if (!sources.All(array => array.Length == length))
{
throw new ArgumentException("Arrays must all be of the same length");
}
for (int i=0; i < length; i++)
{
// Could do this bit with LINQ if you wanted
T[] result = new T[sources.Length];
for (int j=0; j < result.Length; j++)
{
result[j] = sources[j][i];
}
yield return result;
}
}
Then the calling code would be:
foreach (var array in Zip(columns, row, whatevers))
{
// column = array[0]
// value = array[1]
// whatever = array[2]
}
This involves a certain amount of copying, of course - you're creating an array each time. You could change that by introducing another type like this:
public struct Snapshot<T>
{
readonly T[][] sources;
readonly int index;
public Snapshot(T[][] sources, int index)
{
this.sources = sources;
this.index = index;
}
public T this[int element]
{
get { return sources[element][index]; }
}
}
This would probably be regarded as overkill by most though ;)
I could keep coming up with all kinds of ideas, to be honest... but the basics are:
With a little bit of reusable work, you can make the calling code nicer
For arbitrary combinations of types you'll have to do each number of parameters (2, 3, 4...) separately due to the way generics work
If you're happy to use the same type for each part, you can do better
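The Pair-based variant of PairUp mentioned earlier in this answer is not shown there, so here is a minimal sketch of how it might look. The Pair<TFirst, TSecond> struct and its First/Second members are assumptions chosen to match the calling code above, not a framework type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Pair type; the name and members are assumptions for illustration.
public struct Pair<TFirst, TSecond>
{
    public TFirst First { get; }
    public TSecond Second { get; }

    public Pair(TFirst first, TSecond second)
    {
        First = first;
        Second = second;
    }
}

public static class PairUpExtensions
{
    // Same length checks as the projection-based PairUp, but yields Pair values.
    public static IEnumerable<Pair<TFirst, TSecond>> PairUp<TFirst, TSecond>(
        this IEnumerable<TFirst> source, IEnumerable<TSecond> secondSequence)
    {
        using (IEnumerator<TSecond> secondIter = secondSequence.GetEnumerator())
        {
            foreach (TFirst first in source)
            {
                if (!secondIter.MoveNext())
                {
                    throw new ArgumentException("First sequence longer than second");
                }
                yield return new Pair<TFirst, TSecond>(first, secondIter.Current);
            }
            if (secondIter.MoveNext())
            {
                throw new ArgumentException("Second sequence longer than first");
            }
        }
    }
}
```

Because this is an iterator block, the length checks only run while the result is being enumerated, not when PairUp is called.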

If there are the same number of column names as there are elements in each row, could you not use a for loop?
var currentValues = currentRow.Split(separatorChar);
for(var i=0;i<columnList.Length;i++){
// use i to index both (or all) arrays and build your map
}
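Filled in, the indexed loop could look roughly like the sketch below. It assumes both inputs are string arrays and that the map is a Dictionary<string, string>; the helper name BuildValueMap is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class IndexedLoopExample
{
    // Hypothetical helper: builds the column-to-value map with one shared index.
    public static Dictionary<string, string> BuildValueMap(
        string[] columnList, string[] currentValues)
    {
        if (columnList.Length != currentValues.Length)
        {
            throw new ArgumentException("Row must have one value per column");
        }

        var valueMap = new Dictionary<string, string>(columnList.Length);
        for (int i = 0; i < columnList.Length; i++)
        {
            // The same index addresses both arrays; a third array would be indexed the same way.
            valueMap.Add(columnList[i], currentValues[i]);
        }
        return valueMap;
    }
}
```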

In a functional language you would usually find a "zip" function, which will hopefully be part of C# 4.0. Bart de Smet provides a funny implementation of zip based on existing LINQ functions:
public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> func)
{
return first.Select((x, i) => new { X = x, I = i })
.Join(second.Select((x, i) => new { X = x, I = i }),
o => o.I,
i => i.I,
(o, i) => func(o.X, i.X));
}
Then you can do:
int[] s1 = new [] { 1, 2, 3 };
int[] s2 = new[] { 4, 5, 6 };
var result = s1.Zip(s2, (i1, i2) => new {Value1 = i1, Value2 = i2});
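For readers on newer frameworks: Enumerable.Zip has been part of System.Linq since .NET 4.0, so no custom extension is needed there. The sketch below applies it to the original problem; the helper name BuildValueMap and the Dictionary<string, string> result type are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BuiltInZipExample
{
    // Hypothetical helper: pairs each column name with the matching value of one row.
    public static Dictionary<string, string> BuildValueMap(
        string[] columnList, string currentRow, char separatorChar)
    {
        string[] currentValues = currentRow.Split(separatorChar);

        // Enumerable.Zip stops at the end of the shorter sequence,
        // so surplus columns or values are silently ignored.
        return columnList
            .Zip(currentValues, (column, value) => new { column, value })
            .ToDictionary(pair => pair.column, pair => pair.value);
    }
}
```

Unlike the PairUp method shown earlier, the built-in Zip does not throw when the sequences differ in length, so a length check is still worth adding if that case matters.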

If you're really using arrays, the best way is probably just to use the conventional for loop with indices. Not as nice, granted, but as far as I know .NET doesn't offer a better way of doing this.
You could also encapsulate your code into a method called zip – this is a common higher-order list function. However, C# lacking a suitable Tuple type, this is quite crufty. You'd end up returning an IEnumerable<KeyValuePair<T1, T2>> which isn't very nice.
By the way, are you really using IEnumerable instead of IEnumerable<T> or why do you cast the Current value?

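Using an IEnumerator for both would be nice:
var currentValues = currentRow.Split(separatorChar);
// Arrays expose the non-generic GetEnumerator(), so cast to IEnumerable<string> first.
using (IEnumerator<string> valueEnum = ((IEnumerable<string>)currentValues).GetEnumerator(), columnEnum = ((IEnumerable<string>)columnList).GetEnumerator()) {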
while (valueEnum.MoveNext() && columnEnum.MoveNext())
valueMap.Add(columnEnum.Current, valueEnum.Current);
}
Or create an extension method:
public static IEnumerable<TResult> Zip<T1, T2, TResult>(this IEnumerable<T1> source, IEnumerable<T2> other, Func<T1, T2, TResult> selector) {
using (IEnumerator<T1> sourceEnum = source.GetEnumerator()) {
using (IEnumerator<T2> otherEnum = other.GetEnumerator()) {
while (sourceEnum.MoveNext() && otherEnum.MoveNext())
yield return selector(sourceEnum.Current, otherEnum.Current);
}
}
}
Usage
var currentValues = currentRow.Split(separatorChar);
foreach (var valueColumnPair in currentValues.Zip(columnList, (a, b) => new { Value = a, Column = b })) {
valueMap.Add(valueColumnPair.Column, valueColumnPair.Value);
}

Instead of creating two separate arrays you could make a two-dimensional array, or a dictionary (which would be better). But really, if it works I wouldn't try to change it.

Related

Is it possible to create an ICollection from a given type?

I would like to create an extension method to convert an ICollection of type long to an ICollection of type long?. It would be something like this:
public static void ConvetirLongANullableLong<T, U>(this ICollection<T> paramIcOrigenNoNullable, ICollection<U> paramIcDestinoNullable)
{
for (int i = 0; i < paramIcOrigenNoNullable.Count; i++)
{
paramIcDestinoNullable.Add((U)paramIcOrigenNoNullable.ElementAt(i));
}
}
But I have a problem because I can't convert T to U.
The idea was to create a generic method that converts, for example, long to long? in this case, but also int to int? or any other non-nullable basic type to its nullable type.
Is this possible, or should I create one method for each type?
This is motivated by this post: Fastest way to convert List<int> to List<int?>, which says it is better to use a foreach loop than LINQ Select or Cast because those are slower.
Thanks.
PS: I accepted V0ldek's answer because it addresses what I actually asked in this post, but in practice LINQ Select is faster, as PavelAnikhouski points out in a comment, at least with Entity Core 3.0. I don't know whether the for loop is faster in other versions; the timings in the post I linked are very different from PavelAnikhouski's and from my own.
If you're always going to convert from T to T?, then you don't need a type U. You know that U = T?. But you do need to constrain T to struct.
public static void ConvertToCollectionOfNullable<T>(
this ICollection<T> source,
ICollection<T?> destination) where T : struct
{
for (int i = 0; i < source.Count; i++)
{
destination.Add(source.ElementAt(i));
}
}
Also, you can use foreach on a collection for cleaner code.
public static void ConvertToCollectionOfNullable<T>(
this ICollection<T> source,
ICollection<T?> destination) where T : struct
{
foreach (var element in source)
{
destination.Add(element);
}
}
You can make it even simpler by using the OfType<T> or Cast<T> methods from System.Linq:
var list = new List<long>() { 1, 2, 3 };
var converted = list.OfType<long?>().ToList();
Unlike Cast<T>, OfType<T> doesn't throw an exception if a cast fails; it simply skips that element.
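The difference between the two only shows up when the source contains something that is not a long, which a List<long> never does. As a small illustration, the sketch below uses a mixed object sequence; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OfTypeVersusCast
{
    // OfType<long?> keeps only the elements that are boxed longs and skips the rest.
    public static List<long?> KeepConvertible(IEnumerable<object> items)
    {
        return items.OfType<long?>().ToList();
    }

    // Cast<long?> attempts the cast on every element and throws on the first failure.
    public static bool CastThrows(IEnumerable<object> items)
    {
        try
        {
            items.Cast<long?>().ToList();
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

For a sequence such as 1L, "x", 3L, KeepConvertible returns the two longs, while CastThrows reports that Cast failed on the string.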
One more option is to use the Select method. Since you know the result type, the cast is not a problem (the compiler helps you too), and no boxing is involved:
var list = new List<long>() { 1, 2, 3 };
var converted = list.Select(_ => (long?)_).ToList();
Enumerable.Cast is slow because it uses boxing. This is what happens:
long? l = (long?)(object)8;
But this is the only way to cast from anything to anything.
If you want it faster, implement it for the specific type.
For the special case where you want to convert a value type to its nullable type, however, you can stay generic. You just need to avoid handling two type parameters (your <T, U>), since the types are really T and T?, and you have to tell the compiler that T is a value type:
public static IEnumerable<T?> Convert<T>(this IEnumerable<T> list) where T : struct
{
return list.Select(x => new T?(x));
}
If you want it faster, avoid the LINQ enumerator (avoid foreach) and use a simple loop:
public static IList<T?> Convert<T>(this IList<T> list) where T : struct
{
var newlist = new List<T?>(list.Count);
for (int i = 0; i < list.Count; i++)
newlist.Add(new T?(list[i]));
return newlist;
}

Why there is no SelectToArray method in System.Linq.Enumerable

So I'd like to ask: why do we only have a selector that returns an enumerable? I frequently have to modify each value of an array, for example:
int[] a = {1,2,3,4,5};
a = a.Select(x=>x*2).ToArray();
Here we get an enumerable, and only then can we convert it back into an array.
We could try Array.ForEach, but only if we are allowed to modify the source. If we have an array of reference types and can't modify them, we still have to write something like this:
SomeClass[] a = FillSomeClassArray();
SomeClass[] b = a.Select(x=> ((SomeClass)x.Clone()).Modify()).ToArray();
In my case I'm using my own class:
public static class CollectionHelper
{
public static TResult[] SelectToArray<T, TResult>(this ICollection<T> source, Func<T, TResult> selector)
{
if (source == null)
throw new ArgumentNullException("source");
if (selector == null)
throw new ArgumentNullException("selector");
var result = new TResult[source.Count];
int i = 0;
foreach (T t in source)
{
result[i] = selector(t);
i++;
}
return result;
}
}
Here there is no double conversion: when there is no predicate we know the length of the result, and we should use this information. I know that MS shouldn't do all the work for me, but as far as I know this functionality is standard enough.
The biggest problem with adding SelectToArray to the framework is consistency. If you add SelectToArray, you should also add each of the following:
CastToArray<T>
ConcatToArray<T>
RepeatToArray<T>
ReverseToArray<T>
SkipToArray<T>
OfTypeToArray<T>
TakeToArray<T>
While we're on the subject of adding new methods, what's wrong with adding the same optimization to lists? Now we also need
SelectToList<T> (similar to the one that started it all)
CastToList<T>
ConcatToList<T>
... and so on - I'm sure you got the idea.
Considering the minuscule savings from knowing the size of the target array or the target list, such major refactoring is impractical. You would be able to achieve the same effect with a simple method like this:
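// Must be declared in a static class to work as an extension method.
public static T[] CopyToArray<T>(
this IEnumerable<T> source
, T[] result
, int pos = 0
, int? lengthOrNull = null
) {
// lengthOrNull is the maximum number of items to write; by default, fill to the end of result.
int end = lengthOrNull.HasValue ? pos + lengthOrNull.Value : result.Length;
foreach (var item in source) {
if (pos >= end) break;
result[pos++] = item;
}
return result;
}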
Now the caller can combine the existing LINQ functionality with this method to compose all of the above XyzToArray methods, like this:
IList<MyClass> data = ...
int[] res = data.Select(x => x.IntProperty).CopyToArray(new int[data.Count]);
You would also be able to write results of LINQ queries into different parts of an existing array, like this:
IList<MyClass> data1 = ...
IList<MyClass> data2 = ...
int[] res = new int[data1.Count+data2.Count];
data1.Select(x => x.IntProperty).CopyToArray(res, 0, data1.Count);
data2.Select(x => x.IntProperty).CopyToArray(res, data1.Count, data2.Count);

Convert an object array of object arrays to a two dimensional array of object

I have a third party library returning an object array of object arrays that I can stuff into an object[]:
object[] arr = myLib.GetData(...);
The resulting array consists of object[] entries, so you can think of the return value as some kind of recordset with the outer array representing the rows and the inner arrays containing the field values where some fields might not be filled (a jagged array). To access the individual fields I have to cast like:
int i = (int) ((object[])arr[row])[col];//access a field containing an int
Now as I'm lazy I want to access the elements like this:
int i = (int) arr[row][col];
To do this I use the following Linq query:
object[] result = myLib.GetData(...);
object[][] arr = result.Select(o => (object[])o ).ToArray();
I tried using a simple cast like object[][] arr = (object[][])result; but that fails with a runtime error.
Now, my questions:
Is there a simpler way of doing this? I have the feeling that some nifty cast should do the trick?
Also I am worried about performance as I have to reshape a lot of data just to save me some casting, so I wonder if this is really worth it?
EDIT:
Thank you all for the speedy answers.
@James: I like your answer wrapping up the culprit in a new class, but the drawback is that I always have to do the Linq wrapping when taking in the source array, and the indexer needs both row and col values, as in int i = (int) arr[row, col]; (I need to get a complete row as well, like object[] row = arr[row];, sorry I didn't post that in the beginning).
@Sergiu Mindras: Like James, I feel the extension method is a bit dangerous as it would apply to all object[] variables.
@Nair: I chose your answer for my implementation, as it does not need the Linq wrapper and I can access both individual fields using int i = (int) arr[row][col]; and an entire row using object[] row = arr[row];
@quetzalcoatl and @Abe Heidebrecht: Thanks for the hints on Cast<>().
Conclusion: I wish I could choose both James' and Nair's answer, but as I stated above, Nair's solution gives me (I think) the best flexibility and performance.
I added a function that will 'flatten' the internal array using the above Linq statement because I have other functions that need to be fed with such a structure.
Here is how I (roughly) implemented it (taken from Nair's solution):
public class CustomArray
{
private object[] data;
public CustomArray(object[] arr)
{
data = arr;
}
//get a row of the data
public object[] this[int index]
{ get { return (object[]) data[index]; } }
//get a field from the data
public object this[int row, int col]
{ get { return ((object[])data[row])[col]; } }
//get the array as 'real' 2D - Array
public object[][] Data2D()
{//this could be cached in case it is accessed more than once
return data.Select(o => (object[])o ).ToArray();
}
static void Main()
{
var ca = new CustomArray(new object[] {
new object[] {1,2,3,4,5 },
new object[] {1,2,3,4 },
new object[] {1,2 } });
var row = ca[1]; //gets a full row
int i = (int) ca[2,1]; //gets a field
int j = (int) ca[2][1]; //gets me the same field
object[][] arr = ca.Data2D(); //gets the complete array as 2D-array
}
}
So - again - thank you all! It always is a real pleasure and enlightenment to use this site.
You could create a wrapper class to hide the ugly casting e.g.
public class DataWrapper
{
private readonly object[][] data;
public DataWrapper(object[] data)
{
this.data = data.Select(o => (object[])o ).ToArray();
}
public object this[int row, int col]
{
get { return this.data[row][col]; }
}
}
Usage
var data = new DataWrapper(myLib.GetData(...));
int i = (int)data[row, col];
There is also the opportunity to make the wrapper generic, e.g. DataWrapper<int>; however, I wasn't sure whether your data would all be of the same type, and returning object keeps it generic enough for you to decide which cast is needed.
A few similar answers have already been posted. This one differs only in that it lets you access elements like this:
int i = (int) arr[row][col];
To demonstrate the idea
public class CustomArray
{
private object[] _arr;
public CustomArray(object[] arr)
{
_arr = arr;
}
public object[] this[int index]
{
get
{
// This indexer is very simple: it just returns the corresponding
// row from the internal array, cast to object[].
return (object[]) _arr[index];
}
}
static void Main()
{
var c = new CustomArray(new object[] { new object[] {1,2,3,4,5 }, new object[] {1,2,3,4 }, new object[] {1,2 } });
var a =(int) c[1][2]; //here a will be 3: row 1 is {1,2,3,4} and index 2 holds 3.
}
}
(1) This could probably be done in a short and easy form with the dynamic keyword, but you'll lose compile-time checking. Considering that you already use object[], that's a small price:
dynamic results = obj.GetData();
object something = results[0][1];
I've not checked it with a compiler though.
(2) instead of Select(o => (type)o) there's a dedicated Cast<> function:
var tmp = items.Select(o => (object[])o).ToArray();
var tmp = items.Cast<object[]>().ToArray();
They are almost the same. I'd guess that Cast is a bit faster, but again, I've not checked that.
(3) Yes, reshaping in that way will affect the performance somewhat, depending mostly on the amount of items. The impact will be the larger the more elements you have. That's mostly related to .ToArray as it will enumerate all the items and it will make an additional array. Consider this:
var results = ((object[])obj.GetData()).Cast<object[]>();
The 'results' here are of type IEnumerable<object[]> and the difference is that it will be enumerated lazily, so the extra iteration over all elements is gone, the temporary extra array is gone, and also the overhead is minimal - similar to manual casting of every element, which you'd do anyways.. But - you lose the ability to index over the topmost array. You can loop/foreach over it, but you cannot index/[123] it.
EDIT:
James's wrapper approach is probably the best in terms of overall performance. I also like it the most for readability, though that's a personal opinion and others may prefer LINQ. I'd suggest James's wrapper.
You could use extension method:
static int getValue(this object[] arr, int col, int row)
{
return (int) ((object[])arr[row])[col];
}
And retrieve by
int requestedValue = arr.getValue(col, row);
No idea for arr[int x][int y] syntax.
EDIT
Thanks James for your observation
You can use a nullable int so you don't get an exception when casting.
So, the method will become:
static int? getIntValue(this object[] arr, int col, int row)
{
try
{
// "as" requires a nullable type; it yields null if the field is not an int.
return ((object[])arr[row])[col] as int?;
}
catch { return null; }
}
And can be retrieved by
int? requestedValue = arr.getIntValue(col, row);
This way you get a nullable int, and any exception encountered results in null being returned.
You can use LINQ Cast operator instead of Select...
object[][] arr = result.Cast<object[]>().ToArray();
This is a little less verbose, but should be nearly identical performance wise. Another way is to do it manually:
object[][] arr = new object[result.Length][];
for (int i = 0; i < arr.Length; ++i)
arr[i] = (object[])result[i];
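The manual loop can also be written with Array.ConvertAll, which allocates a result array of exactly the source length and needs no LINQ. A minimal sketch follows; the class and method names are hypothetical, and it assumes every element of the outer array really is an object[].

```csharp
using System;

public static class JaggedArrayConversion
{
    // Converts an object[] whose elements are object[] rows into an object[][].
    // Throws InvalidCastException if any element is not an object[].
    public static object[][] ToJagged(object[] rows)
    {
        return Array.ConvertAll(rows, row => (object[])row);
    }
}
```

Only the outer array is copied; the inner row arrays are the same instances as in the source, so the per-row cost is a single cast.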

Execute Action<T> by IEnumerable Characteristics?

I have this code :
int[] g = new int[3] { 1, 2, 3 };
g.ToList().ForEach(f=>Console.Write(f));
For each item in the array, I want to execute an Action.
int[] already implements IEnumerable.
I would like to execute the Action without ToList().
Is there another one-line solution that avoids ToList(), i.e. one that uses the array's IEnumerable characteristics?
You could use Array.ForEach instead:
Array.ForEach(g, f => Console.Write(f));
or even [1]:
Array.ForEach(g, Console.Write);
Personally I'd probably use a foreach loop instead though, for the reasons given by Eric Lippert...
[1] If it compiles. I've given up trying to predict whether method group conversion will work in the context of generic type inference.
ForEach() is a method of the List<T> class, not of the IEnumerable<T> interface, so it is not available on the array directly.
If you are hardcore about doing it one line of code and IEnumerable, you could use (or abuse) a method like Any() or All() and do your desired operation (in this case, printing) before returning an appropriate value that would cause the iteration to continue.
Or you could instead use Array.ForEach().
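If the one-liner should work on any sequence rather than only on arrays, one option is to write the missing method yourself. The framework does not ship a ForEach for IEnumerable<T>, so the extension below is a hypothetical helper, sketched under that assumption.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableActionExtensions
{
    // Hypothetical helper: runs an action for every element of any sequence.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (action == null) throw new ArgumentNullException(nameof(action));

        foreach (T item in source)
        {
            action(item);
        }
    }
}
```

With this in scope, g.ForEach(f => Console.Write(f)); works on the array without ToList(). On a List<T> the built-in instance method still takes precedence over the extension.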
Inspired by Jon Skeet, here's a useful extension method that I wrote:
Client:
var jobs = new List<Job>()
{
new Job { Id = "XAML Developer" },
new Job { Id = "Assassin" },
new Job { Id = "Narco Trafficker" }
};
jobs.Execute(ApplyFilter, j => j.Id);
public void ApplyFilter(string filterId) { }
Extension Method:
public static void Execute<TSource, TKey>(this IEnumerable<TSource> source, Action<TKey> applyBehavior, Func<TSource, TKey> keySelector)
{
foreach (var item in source)
{
var target = keySelector(item);
applyBehavior(target);
}
}

Extensions for IEnumerable generic

I've got two extensions for IEnumerable:
public static class IEnumerableGenericExtensions
{
public static IEnumerable<IEnumerable<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
{
List<T> toReturn = new List<T>(max);
foreach (var item in source)
{
toReturn.Add(item);
if (toReturn.Count == max)
{
yield return toReturn;
toReturn = new List<T>(max);
}
}
if (toReturn.Any())
{
yield return toReturn;
}
}
public static int IndexOf<T>(this IEnumerable<T> source, Predicate<T> searchPredicate)
{
int i = 0;
foreach (var item in source)
if (searchPredicate(item))
return i;
else
i++;
return -1;
}
}
Then I write this code:
Pages = history.InSetsOf<Message>(500);
var index = Pages.IndexOf(x => x == Pages.ElementAt(0));
where
public class History : IEnumerable<Message>
But the result is not '0' as I expected, but '-1'. I can't understand why.
When you write Pages.IndexOf(x => x == Pages.ElementAt(0));, you actually run InSetsOf many times, due to deferred execution (aka lazy evaluation). To expand:
Pages = history.InSetsOf<Message>(500) - this line doesn't run InSetsOf at all.
Pages.IndexOf - Iterates over Pages, so it starts executing InSetsOf once.
x == Pages.ElementAt(0) - this executes InSetsOf again, once for every element in the collection of Pages (or at least until searchPredicate returns true, which doesn't happen here).
Each time you run InSetsOf you create a new list (specifically, a new first list, because you use ElementAt(0)). These are two different objects, so comparison of == between them fails.
An extremely simple fix would be to return a list, so Pages is not a deferred query, but a concrete collection:
Pages = history.InSetsOf<Message>(500).ToList();
Another option is to use SequenceEqual, though I'd recommend caching the first element anyway:
Pages = history.InSetsOf<Message>(500);
var firstPage = Pages.FirstOrDefault();
var index = Pages.IndexOf(x => x.SequenceEqual(firstPage));
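The effect described above can be reproduced in isolation. The sketch below uses a hypothetical iterator method, Batches, that creates a new list every time it is enumerated, much as InSetsOf does, and compares the first element of two enumerations by reference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Hypothetical iterator: each enumeration runs this body again and builds a new list.
    public static IEnumerable<List<int>> Batches()
    {
        yield return new List<int> { 1, 2 };
    }

    // Deferred query: First() enumerates twice, so two different list instances come back.
    public static bool SameInstanceWhenDeferred()
    {
        IEnumerable<List<int>> pages = Batches();
        return ReferenceEquals(pages.First(), pages.First());
    }

    // Materialized with ToList(): the iterator runs once and both calls see the stored list.
    public static bool SameInstanceWhenMaterialized()
    {
        List<List<int>> pages = Batches().ToList();
        return ReferenceEquals(pages.First(), pages.First());
    }
}
```

SameInstanceWhenDeferred returns false and SameInstanceWhenMaterialized returns true, which is why calling ToList() on the result of InSetsOf makes the == comparison succeed.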
Does your class T implement IComparable? If not, your equality check might be flawed, as the framework does not know exactly when one T equals another. I would guess you could also get by just by overriding Equals on your class T.
