What should I prefer if I know the number of elements before runtime?
ReSharper suggests IEnumerable<string> instead of string[].
ReSharper suggests IEnumerable<string> if you are only using methods defined for IEnumerable. It does so with the idea that, since you clearly do not need the value to be typed as array, you might want to hide the exact type from the consumers of (i.e., the code that uses) the value because you might want to change the type in the future.
In most cases, going with the suggestion is the right thing to do. The difference will not be something that you can observe while your program is running; rather, it's in how easily you will find it to make changes to your program in the future.
From the above you can also infer that the whole suggestion/question is meaningless unless the value we are talking about is passed across method boundaries (I don't remember if R# also offers it for a local variable).
If ReSharper suggests you use IEnumerable<string> it means you are only using features of that interface and no array specific features. Go with the suggestion of ReSharper and change it.
If you are trying to provide this method as an interface to other methods, I would prefer to have the output of your method more generic, hence would go for IEnumerable<string>.
Inside a method, if you are instantiating the value and it is not being passed around to other methods, I would go for string[] unless I need deferred execution, although it doesn't really matter which one you use in this case.
The actual type should be string[] but depending on the user you may want to expose it as something else. e.g. IEnumerable<string> sequence = new string[5]... In particular if it's something like static readonly, then you should make it a ReadOnlyCollection so the entries can't be modified.
With string[] you can do more: you can access items by index, whereas with IEnumerable you have to loop to reach the item at a specific index.
It's probably suggesting this because it's looking for a better Liskov Substitution at this point in your code. Keep in mind the difference between the declared type and the implementing type. IEnumerable<> isn't an implementation, it's an interface. You can declare the variable as an IEnumerable<string> and build it with a string[] since the string array implements IEnumerable<string>.
What this does for you is allow you to pass around that string array as a more generic, more abstracted type. Anything which expects or returns an IEnumerable<string> (regardless of implementation, be it List<string> or string[] or anything else) can then use your string array, without having to worry about the specific implementation you pass it. As long as it satisfies the interface, it's polymorphic of the correct type.
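The distinction between declared type and implementing type can be sketched as follows. This is only an illustration; the class and method names (DeclaredVersusActualType, TotalLength) are made up for the example and do not come from the question.

```csharp
using System.Collections.Generic;

public static class DeclaredVersusActualType
{
    // Hypothetical consumer: it only iterates, so it asks for IEnumerable<string>.
    public static int TotalLength(IEnumerable<string> words)
    {
        int total = 0;
        foreach (string word in words)
        {
            total += word.Length;
        }
        return total;
    }

    // Declared as the interface, built as an array.
    public static IEnumerable<string> BuiltFromArray()
    {
        IEnumerable<string> sequence = new string[] { "alpha", "beta" };
        return sequence;
    }

    // Same declared type, different implementing type.
    public static IEnumerable<string> BuiltFromList()
    {
        IEnumerable<string> sequence = new List<string> { "alpha", "beta" };
        return sequence;
    }
}
```

TotalLength accepts either result without knowing which implementation it was given, which is the kind of substitution the suggestion is aiming at.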
Keep in mind that this isn't always the way to go. Sometimes you, as the developer, are very concerned with the implementation (perhaps for really fine-grained performance tuning, for example) and don't want to move up to an abstraction. The decision is up to you. ReSharper is merely making a suggestion to use an abstraction rather than an implementation in a variable/method declaration.
ReSharper is likely flagging it for you because you are not returning the least constrained type. If you aren't going to be using access on it by index in the future, I'd go with IEnumerable to have less constraint on the method which returns it.
Depends on your usage later on. If you need to enumerate through these elements or sort or compare them later on then I would recommend IEnumerable, otherwise go with array.
I wrote this response for a similar question regarding array or IEnumerable for return values, which was then closed as duplicate before I could post it. I thought the answer might be interesting to some so I post it here.
The main advantage of IEnumerable over T[] is that IEnumerable (for return values) can be made lazy, i.e. it only computes the next element when needed.
Consider the difference between Directory.GetFiles and Directory.EnumerateFiles. GetFiles returns an array; EnumerateFiles returns an IEnumerable. This means that for a directory with two million files the array will contain two million strings. EnumerateFiles only instantiates the strings as needed, saving memory and improving response time.
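To make the lazy/eager difference concrete without touching the file system, here is a small sketch that mimics the GetFiles/EnumerateFiles split with an iterator. The names (LazyVersusEager, EnumerateNames, GetNames, Produced) are invented for the illustration and are not how the Directory class is implemented.

```csharp
using System.Collections.Generic;

public static class LazyVersusEager
{
    // Counts how many items have actually been produced so far.
    public static int Produced;

    // Lazy: the loop body runs only as the caller pulls items.
    public static IEnumerable<string> EnumerateNames(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return "file" + i;
        }
    }

    // Eager: every item is produced before the method returns.
    public static string[] GetNames(int count)
    {
        return new List<string>(EnumerateNames(count)).ToArray();
    }

    // Stops after the second item; only the items pulled so far were produced.
    public static int ProducedWhenStoppingAfterTwo(int count)
    {
        Produced = 0;
        int seen = 0;
        foreach (string name in EnumerateNames(count))
        {
            if (++seen == 2) break;
        }
        return Produced;
    }

    // Calls the eager version; all items were produced up front.
    public static int ProducedByEagerCall(int count)
    {
        Produced = 0;
        GetNames(count);
        return Produced;
    }
}
```

With a count of 1000, the lazy caller that stops early causes two items to be produced, while the eager call produces all 1000 regardless of how many the caller ends up using.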
However, it's not all benefits.
foreach is significantly less efficient on non-arrays (you can see this by disassembling the IL code).
An array promises more, i.e. that its length will not change.
Lazy evaluation is not always better. Consider the Directory class: the GetFiles implementation opens a find-file handle, iterates over all files, closes the handle and then returns the results. EnumerateFiles does nothing until the first file is requested; then the find-file handle is opened and the files are iterated, and the handle is closed when the enumerator is disposed. This means that the lifetime of the find-file handle is controlled by the caller, not the callee. That can be seen as weaker encapsulation and can lead to runtime errors with locked file handles.
In my humble opinion, R# is overzealous in suggesting IEnumerable over arrays, especially for return values (input parameters have fewer potential drawbacks). What I tend to do when I see a function that returns IEnumerable is call .ToArray() to avoid potential issues with lazy evaluation, but if the collection is already an array this is inefficient.
I like the principle: promise a lot, require little. That is, don't require that input parameters be arrays (use IEnumerable), but return an array rather than IEnumerable, as an array is a bigger promise.
I have IEnumerable<Object> and need to pass to a method as a parameter but this method takes IReadOnlyCollection<Object>
Is it possible to convert IEnumerable<Object> to IReadOnlyCollection<Object> ?
One way would be to construct a list, and call AsReadOnly() on it:
IReadOnlyCollection<Object> rdOnly = orig.ToList().AsReadOnly();
This produces ReadOnlyCollection<object>, which implements IReadOnlyCollection<Object>.
Note: Since List<T> implements IReadOnlyCollection<T> as well, the call to AsReadOnly() is optional. Although it is possible to call your method with the result of ToList(), I prefer using AsReadOnly(), so that the readers of my code would see that the method that I am calling has no intention to modify my list. Of course they could find out the same thing by looking at the signature of the method that I am calling, but it is nice to be explicit about it.
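The difference between the two options can be sketched like this. The helper names (ReadOnlyViews, AsInterfaceOnly, AsWrapper, CanBeCastToList) are made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ReadOnlyViews
{
    // Returns the List<object> itself, merely typed as the read-only interface.
    public static IReadOnlyCollection<object> AsInterfaceOnly(IEnumerable<object> orig)
    {
        return orig.ToList();
    }

    // Returns a ReadOnlyCollection<object> wrapper around the list.
    public static IReadOnlyCollection<object> AsWrapper(IEnumerable<object> orig)
    {
        return orig.ToList().AsReadOnly();
    }

    // True when a consumer could cast the value back to List<object> and mutate it.
    public static bool CanBeCastToList(IReadOnlyCollection<object> items)
    {
        return items is List<object>;
    }
}
```

Both results satisfy a parameter of type IReadOnlyCollection<Object>; only the plain ToList() result can be cast back to a mutable list, which is the point of making the AsReadOnly() call explicit.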
Since the other answers seem to steer in the direction of wrapping the collections in a truly read-only type, let me add this.
I have rarely, if ever, seen a situation where the caller is so scared that an IEnumerable<T>-taking method might maliciously try to cast that IEnumerable<T> back to a List or other mutable type, and start mutating it. Cue organ music and evil laughter!
No. If the code you are working with is even remotely reasonable, then if it asks for a type that only has read functionality (IEnumerable<T>, IReadOnlyCollection<T>...), it will only read.
Use ToList() and be done with it.
As a side note, if you are creating the method in question, it is generally best to ask for no more than an IEnumerable<T>, indicating that you "just want a bunch of items to read". Whether or not you need its Count or need to enumerate it multiple times is an implementation detail, and is certainly prone to change. If you need multiple enumeration, simply do this:
items = items as IReadOnlyCollection<T> ?? items.ToList(); // Avoid multiple enumeration
This keeps the responsibility where it belongs (as locally as possible) and the method signature clean.
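A minimal sketch of that pattern inside a complete method, assuming a made-up Stats.Average helper that needs both the Count and a pass over the items:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Stats
{
    // Hypothetical method: the signature asks only for "a bunch of items to read".
    public static double Average(IEnumerable<int> items)
    {
        // Reuse the argument if it is already materialized; otherwise copy it once.
        IReadOnlyCollection<int> materialized =
            items as IReadOnlyCollection<int> ?? items.ToList();

        if (materialized.Count == 0)
        {
            return 0;
        }

        long sum = 0;
        foreach (int item in materialized)
        {
            sum += item;
        }
        return (double)sum / materialized.Count;
    }
}
```

Callers can pass an array, a list, or a lazy query; a lazy query is enumerated exactly once, inside ToList().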
When returning a bunch of items, on the other hand, I prefer to return an IReadOnlyCollection<T>. Why? The goal is to give the caller something that fulfills reasonable expectations - no more, no less. Those expectations are usually that the collection is materialized and that the Count is known - precisely what IReadOnlyCollection<T> provides (and a simple IEnumerable<T> does not). By being no more specific than this, our contract matches expectations, and the method is still free to change the underlying collection. (In contrast, if a method returns a List<T>, it makes me wonder what context there is that I should want to index into the list and mutate it... and the answer is usually "none".)
As an alternative to dasblinkenlight's answer, to prevent the caller casting to List<T>, instead of doing orig.ToList().AsReadOnly(), the following might be better:
ReadOnlyCollection<object> rdOnly = Array.AsReadOnly(orig.ToArray());
It's the same number of method calls, but one takes the other as a parameter instead of being called on the return value.
I have a method that looks like this:
public void UpdateTermInfo(List<Term> termInfoList)
{
    foreach (Term termInfo in termInfoList)
    {
        UpdateTermInfo(termInfo);
    }
    m_xdoc.Save(FileName.FullName);
}
ReSharper advises me to change the method signature to IEnumerable<Term> instead of List<Term>. What is the benefit of doing this?
The other answers point out that by choosing a "larger" type you permit a broader set of callers to call you. Which is a good enough reason in itself to make this change. However, there are other reasons. I would recommend that you make this change because when I see a method that takes a list or an array, the first thing I think is "what if that method tries to change an item in my list/array?"
You want the contents of a bucket, but you are requiring not just the bucket but also the ability to change its contents. Why would you require that if you're not going to use that ability? When you say "this method cannot take any old sequence; it has to take a mutable list that is indexed by integers" I think that you're making that requirement on the caller because you're going to take advantage of that power.
If "I'm planning on messing up your data structure" is not what you intend to communicate to the caller of the method then don't communicate that. A method that takes a sequence communicates "The most I'm going to do is read from this sequence in order".
Simply put, accepting an enumerable allows your function to be compatible with a broader scope of input arguments, such as arrays and LINQ queries.
To expound on accepting LINQ queries, one could do:
UpdateTermInfo(myTermList.Where(x => somefilter));
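As a rough sketch of what the broader signature buys you: the Term class below is a hypothetical stand-in for the one in the question, and the method marks and counts terms instead of saving an XML document.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Term type in the question.
public sealed class Term
{
    public string Name { get; set; }
    public bool Updated { get; set; }
}

public static class TermUpdater
{
    // Accepts any sequence of terms and returns how many were processed.
    public static int UpdateTermInfo(IEnumerable<Term> termInfoList)
    {
        int processed = 0;
        foreach (Term termInfo in termInfoList)
        {
            termInfo.Updated = true;
            processed++;
        }
        return processed;
    }

    // Calls the same method with a list, an array, and a LINQ query.
    public static int[] Demo()
    {
        var terms = new List<Term>
        {
            new Term { Name = "a" },
            new Term { Name = "bb" },
            new Term { Name = "ccc" }
        };

        return new int[]
        {
            UpdateTermInfo(terms),                              // List<Term>
            UpdateTermInfo(terms.ToArray()),                    // Term[]
            UpdateTermInfo(terms.Where(t => t.Name.Length > 1)) // filtered query
        };
    }
}
```

With a List<Term> parameter, the second and third calls would each need an extra ToList() at the call site.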
Additionally, specifying an interface rather than a concrete class allows others to provide their own implementation of that interface. In this way, you are being "subscriptive" rather than "proscriptive." (Yes, I did just make up a word.)
In general (with many exceptions relating to what sort of abilities you want to reserve for potential later modifications), it is a best-practice to implement functions using arguments that are the most general that they can be. This gives maximum flexibility to the consumer of your function.
As a result, if you are dead-set on using a list for this function (perhaps because at some later date you expect you might want to use properties such as Count or the index operator), I would strongly urge you to consider using IList<Term> instead of List<Term> for the reasons mentioned above.
List implements IEnumerable, so using it would make things more flexible. If a case came along where you didn't want to use a List and wanted a different collection type, it could be passed as an IEnumerable with ease.
For instance IEnumerable allows you to use Arrays and many others as opposed to always using a List.
IEnumerable is simply a sequence of items that you can iterate over, unlike a List, where you can also add, remove, sort, use Count, etc.
The main idea behind that refactor is that you make the method more general. You don't say what data structure you want, only what you need from it: that you can iterate through its elements.
So later, when you decide that O(n) search is not good enough for you, you only have to change one line and move along.
If you use List then you are confining yourself to a concrete implementation of List, whereas with IEnumerable you can pass in arrays, lists, and other collections, as they all implement that interface.
My question is about naming, design, and implementation choices. I can see myself going in two different directions with how to solve an issue and I'm interested to see where others who may have come across similar concerns would handle the issue. It's part aesthetics, part function.
A little background on the code... I created a type called ISlice<T> that provides a reference into a section of a source of items which can be a collection (e.g. array, list) or a string. The core support comes from a few implementation classes that support fast indexing using the Begin and End markers for the slice to get the item from the original source. The purpose is to provide slicing capabilities similar to what the Go language provides while using Python style indexing (i.e. both positive and negative indexes are supported).
To make creating slices (instances of ISlice<T>) easier and more "fluent", I created a set of extension methods. For example:
public static ISlice<T> Slice<T>(this IList<T> source, int begin, int end)
{
    return new ListSlice<T>(source, begin, end);
}
public static ISlice<char> Slice(this string source, int begin, int end)
{
    return new StringSlice(source, begin, end);
}
There are others, such as providing optional begin/end parameters, but the above will suffice for where I'm going with this.
These routines work well and make it easy to slice up a collection or a string. What I also need is a way to take a slice and create a copy of it as an array, a list, or a string. That's where things get "interesting". Originally, I thought I'd need to create ToArray and ToList extension methods, but then remembered that the LINQ variants perform optimizations if your collection implements ICollection<T>. In my case, ISlice<T> does inherit from it, though much to my chagrin, as I dislike throwing NotSupportedExceptions from methods like Add. Regardless, I get those for free. Great.
What about converting back into a string, as there's no built-in support for converting an IEnumerable<char> easily back into a string? The closest thing I found is one of the string.Concat overloads, but it would not handle chars as efficiently as it could. Just as important from a design standpoint is that it doesn't jump out as a "conversion" routine.
The first thought was to create a ToString extension method, but that doesn't work as ToString is an instance method which means it trumps extension methods and would never be called. I could override ToString, but the behavior would be inconsistent as ListSlice<T> would need to special case its ToString for times where T is a char. I don't like that as the ToString will give something useful when the type parameter is a char, but the class name in other cases. Also, if there are other slice types created in the future I'd have to create a common base class to ensure the same behavior or each class would have to implement this same check. An extension method on the interface would handle that much more elegantly.
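The lookup rule described above can be seen in a small sketch. CharBox is a hypothetical stand-in for a slice type that does not override ToString, and ToNewString is just a placeholder name for the alternative.

```csharp
using System.Collections.Generic;

// Hypothetical slice-like type that does not override ToString.
public sealed class CharBox
{
    private readonly List<char> _items = new List<char>();

    public List<char> Items
    {
        get { return _items; }
    }
}

public static class CharBoxExtensions
{
    // Compiles, but box.ToString() never binds to it:
    // the instance method inherited from object is always found first.
    public static string ToString(this CharBox box)
    {
        return new string(box.Items.ToArray());
    }

    // A differently named extension method is resolved as expected.
    public static string ToNewString(this CharBox box)
    {
        return new string(box.Items.ToArray());
    }
}
```

Calling box.ToString() yields the default type-name text rather than the characters, while box.ToNewString() yields the characters. The extension named ToString is reachable only through the static call CharBoxExtensions.ToString(box).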
The extension method leads me to a naming convention issue. The obvious is to use ToString, but as stated earlier it's not allowed. I could name it something different, but what? ToNewString? NewString? CreateString? Something in the To-family of methods would let it fall in with the ToArray/ToList routines, but ToNewString sticks out as being 'odd' when seen in the intellisense and code editor. NewString/CreateString are not as discoverable as you'd have to know to look for them. It doesn't fit the "conversion method" pattern that the To-family methods provide.
Go with overriding ToString and accept the inconsistent behavior hardcoded into the ListSlice<T> implementation and other implementations? Go with the more flexible, but potentially more poorly named extension method route? Is there a third option I haven't considered?
My gut tells me to go with the ToString despite my reservations, though, it also occurred to me... Would you even consider ToString giving you a useful output on a collection/enumerable type? Would that violate the principle of least surprise?
Update
Most implementations of slicing operations provide a copy, albeit a subset, of the data from whatever source was used for the slice. This is perfectly acceptable in most use cases and leaves for a clean API as you can simply return the same data type back. If you slice a list, you return a list containing only the items in the range specified in the slice. If you slice a string, you return a string. And so on.
The slicing operations I'm describing above are solving an issue when working with constraints which make this behavior undesirable. For example, if you work with large data sets, the slice operations would lead to unnecessary additional memory allocations not to mention the performance impact of copying the data. This is especially true if the slices will have further processing done on them before getting to your final results. So, the goal of the slice implementation is to have references into larger data sets to avoid making unnecessary copies of the information until it becomes beneficial to do so.
The catch is that, at the end of the processing, there is a desire to turn the slice-based processed data back into a more API- and .NET-friendly type like lists, arrays, and strings. It makes the data easier to pass into other APIs. It also allows you to discard the slices and thus also the large data set the slices referenced.
Would you even consider ToString giving you a useful output on a collection/enumerable type? Would that violate the principle of least surprise?
No, and yes. That would be completely unexpected behavior, since it would behave differently than every other collection type.
As for this:
What about converting back into a string as there's no built-in support for converting an IEnumerable<char> easily back into a string?
Personally, I would just use the string constructor taking an array:
string result = new string(mySlice.ToArray());
This is explicit, understood, and expected - I expect to create a new string by passing an object to a constructor.
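For illustration, here is that constructor call applied to an arbitrary IEnumerable<char>. Since ISlice<T> is the asker's own type, the sketch uses LINQ's Skip/Take as a stand-in for a slice; the CharSequences class and its method names are invented for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CharSequences
{
    // Stand-in for a slice: a lazy view over part of the source string.
    public static IEnumerable<char> Range(string source, int begin, int end)
    {
        return source.Skip(begin).Take(end - begin);
    }

    // Copies any character sequence into a new string via the char[] constructor.
    public static string MakeString(IEnumerable<char> chars)
    {
        return new string(chars.ToArray());
    }
}
```

MakeString(Range("hello world", 6, 11)) produces "world"; the same call would work for any type that implements IEnumerable<char>.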
Perhaps the reason for your conundrum is the fact that you are treating string as an ICollection<char>. You haven't provided details about the problem that you are trying to solve, but maybe that's a wrong assumption.
It's true that a string is an IEnumerable<char>. But as you've noticed assuming a direct mapping to a collection of chars creates problems. Strings are just too "special" in the framework.
Looking at it from the other end, would it be obvious that the difference between an ISlice<char> and ISlice<byte> is that you can concatenate the former into a string? Would there be a concatenate operation on the latter that makes sense? What about ISlice<string>? Shouldn't I be able to concatenate those as well?
Sorry I'm not providing specific answers but maybe these questions will point you at the right solution for your problem.
string[] is lightweight compared to List<string>. So if I don't have any need to manipulate my collection, should I use string[], or is it always advisable to go for List<string>?
In the case of List<string>, do we need to perform a null check, or is it not required?
Use string[] when you need to work with static arrays: you don't need to add and remove elements -> only access elements by index. If you need to modify the collection use List<string>. And if you intend to only loop through the contents and never access by index use IEnumerable<string>.
If the collection should not be modified, use string[] or even better, IEnumerable<string>. This indicates that the collection of strings should be treated as a read-only collection of strings.
Using IEnumerable<string> in your API also opens up for the underlying implementation to be changed without breaking client code. You can easily use a string array as the underlying implementation and expose it as IEnumerable<string>. If the implementation at a later stage is better suited using a list or other structure, you can change it as long as it supports IEnumerable<string>.
I'd say you've summed it up well yourself.
If the size of your list won't change, and you don't need any of the advanced List functions like sorting, then String[] is preferable because as you say it's lightweight.
But consider potential future requirements - is it possible that you might one day want to use List for something? If so, consider using List now.
You need to check for null, both in String[] and also List. Both types can have a null value.
I would say it depends what you're trying to accomplish. Generally, however, my opinion is that you have access to a great framework that does a lot of hard work for you so use it (ie. use List<> instead of array).
Have a look at the members on offer to you by a class like List<> and you'll see what I mean: in addition to not having to worry as much about array capacity and index out of bounds exceptions, List and other ICollection/IList classes give you methods like Add, Remove, Clear, Insert, Find, etc that are infinitely helpful. I also believe
myList.Add(myWidg);
is a lot nicer to read and maintain than
myArr[i] = myWidg;
I would definitely vote for List. Apart from the various member functions that a list supports, it provides a 'no element' concept. There can be a list which has no elements but there cannot be an array with no elements. So, if we adhere to the best practice of not returning null from a function, then we can safely check the count of the elements without doing a null check. In the case of an array, we have to check for null. Moreover, I seldom use a loop to search for an element, either in an array or a list. LINQ just makes it neat and we can use it with List, not array. An array has to be converted to a list to make use of LINQ.
This really really depends on the situation. Anything really performance related should probably be done with arrays. Anything else would go with lists.
I just realize that maybe I was mistaken all the time in exposing T[] to my views, instead of IEnumerable<T>.
Usually, for this kind of code:
foreach (var item in items) {}
should items be T[] or IEnumerable<T>?
Then, if I need to get the count of the items, would Array.Length be faster than IEnumerable<T>.Count()?
IEnumerable<T> is generally a better choice here, for the reasons listed elsewhere. However, I want to bring up one point about Count(). Quintin is incorrect when he says that the type itself implements Count(). It's actually implemented in Enumerable.Count() as an extension method, which means other types don't get to override it to provide more efficient implementations.
By default, Count() has to iterate over the whole sequence to count the items. However, it does know about ICollection<T> and ICollection, and is optimised for those cases. (In .NET 3.5 IIRC it's only optimised for ICollection<T>.) Now the array does implement that, so Enumerable.Count() defers to ICollection<T>.Count and avoids iterating over the whole sequence. It's still going to be slightly slower than calling Length directly, because Count() has to discover that it implements ICollection<T> to start with - but at least it's still O(1).
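One way to observe the ICollection<T> shortcut is with a collection that records whether it was ever enumerated. This is only a sketch; TrackingCollection and CountDemo are made-up names for the illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical read-only collection that notes when GetEnumerator is called.
public sealed class TrackingCollection : ICollection<int>
{
    private readonly List<int> _items;

    public TrackingCollection(IEnumerable<int> items)
    {
        _items = new List<int>(items);
    }

    public bool WasEnumerated { get; private set; }

    public int Count { get { return _items.Count; } }
    public bool IsReadOnly { get { return true; } }
    public bool Contains(int item) { return _items.Contains(item); }
    public void CopyTo(int[] array, int arrayIndex) { _items.CopyTo(array, arrayIndex); }
    public void Add(int item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public bool Remove(int item) { throw new NotSupportedException(); }

    public IEnumerator<int> GetEnumerator()
    {
        WasEnumerated = true;
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class CountDemo
{
    // Calls the LINQ Count() extension through the IEnumerable<int> view.
    public static bool CountSkipsEnumeration()
    {
        var tracked = new TrackingCollection(new int[] { 1, 2, 3 });
        IEnumerable<int> sequence = tracked;
        int count = sequence.Count();
        return count == 3 && !tracked.WasEnumerated;
    }

    // For an array, Count() and Length agree.
    public static bool ArrayCountMatchesLength(int[] values)
    {
        return values.Count() == values.Length;
    }
}
```

CountSkipsEnumeration returns true because Enumerable.Count() reads the Count property instead of walking the sequence.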
The same kind of thing is true for performance in general: the JITted code may well be somewhat tighter when iterating over an array rather than a general sequence. You'd basically be giving the JIT more information to play with, and even the C# compiler itself treats arrays differently for iteration (using the indexer directly).
However, these performance differences are going to be inconsequential for most applications - I'd definitely go with the more general interface until I had good reason not to.
It's partially inconsequential, but standard theory would dictate "Program against an interface, not an implementation". With the interface model you can change the actual datatype being passed without affecting the caller, as long as it conforms to the same interface.
The contrast to that is that you might have a reason for exposing an array specifically and in which case would want to express that.
For your example I think IEnumerable<T> would be desirable. It's also worth noting that, for testing purposes, using an interface can reduce the headache of having to re-create particular classes all the time. Collections generally aren't as bad, but having an interface contract you can mock easily is very nice.
Added for edit:
This is more inconsequential because the underlying datatype is what will implement the Count() method, for an array it should access the known length, I would not worry about any perceived overhead of the method.
See Jon Skeet's answer for an explanation of the Count() implementation.
T[] (one-dimensional, zero-based) also implements ICollection<T> and IList<T> along with IEnumerable<T>.
Therefore, if you want looser coupling in your application, IEnumerable<T> is preferable, unless you want indexed access inside the foreach.
Since the Array class implements the System.Collections.Generic.IList<T>, System.Collections.Generic.ICollection<T>, and System.Collections.Generic.IEnumerable<T> generic interfaces, I would use IEnumerable, unless you need the members of the other interfaces.
http://msdn.microsoft.com/en-us/library/system.array.aspx
Your gut feeling is correct: if all the view cares about, or should care about, is having an enumerable, that's all it should demand in its interfaces.
What is it logically (conceptually) from the outside?
If it's an array, then return the array. If the only point is to enumerate, then return IEnumerable. Otherwise IList or ICollection may be the way to go.
If you want to offer lots of functionality but not allow it to be modified, then perhaps use a List internally and return the ReadOnlyCollection<T> returned from its AsReadOnly() method.
Given that changing the code from an array to IEnumerable at a later date is easy, but changing it the other way is not, I would go with an IEnumerable until you know you need the small speed benefit of returning an array.
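A minimal sketch of the "List inside, read-only view outside" arrangement, using a made-up Playlist class:

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical class: mutable internally, read-only to its consumers.
public sealed class Playlist
{
    private readonly List<string> _tracks = new List<string>();

    // Consumers get indexing and Count, but no Add or Remove.
    public ReadOnlyCollection<string> Tracks
    {
        get { return _tracks.AsReadOnly(); }
    }

    // Changes go through the owning class.
    public void Add(string track)
    {
        _tracks.Add(track);
    }
}
```

The wrapper is a view over the same list rather than a copy, so a Tracks reference obtained earlier still reflects items added later through Add.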