What is the correct exception to throw in the following instance?
If, for example, I have a class: Album with a collection of Songs:
List<Song>
And a method within Album to add a Song:
public void AddSong(Song song)
{
songs.Add(song);
}
Should I throw an exception if a user attempts to add a song that already exists? If so, what type of exception?
I have heard the phrase: "Only use exceptions in exceptional circumstances", but I want to tell the client implementing Album exactly what has gone wrong (not just return a Boolean value).
In exactly the same situation, the .NET designers at Microsoft chose to throw ArgumentException with a descriptive message. Oh, and they were pretty consistent about that.
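For reference, here is a minimal sketch of that behaviour, using Dictionary<TKey, TValue>.Add, which throws ArgumentException when the key is already present. The DuplicateKeyDemo class and the sample keys are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateKeyDemo
{
    // Returns true if adding the same key twice raised ArgumentException.
    public static bool AddingTwiceThrows()
    {
        var tracks = new Dictionary<string, int>();
        tracks.Add("Intro", 1);
        try
        {
            tracks.Add("Intro", 2); // same key again
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(AddingTwiceThrows()); // prints "True"
    }
}
```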
If your use case implies that items in the collection should be unique, then you should use a data structure that enforces that.
By doing that, you not only avoid having to write an O(N) lookup method to check for duplicates, but you can also bubble up the pre-existing duplicate key exception that a collection of this sort would throw.
However, .NET does not have a collection of distinct items that also preserves insertion order, though it is very easy to extend List to support this.
The approach I used below sacrifices memory footprint for speed by storing the unique values in a second collection, a HashSet. If memory size were more important, you'd just have to do an O(N) check on each Add operation. Because the methods in List are not virtual (for some reason), I resorted to hiding the base methods using the new keyword, which means the checks are bypassed whenever the object is used through a List<T> or IList<T> reference.
Note that this is just an example, and is not thread safe, and should probably not be used in a real production application.
public class UniqueList<T> : List<T>
{
private HashSet<T> _internalHash = new HashSet<T>();
public UniqueList() : base() { }
// Go through AddRange so the hash set is populated and duplicates are rejected.
public UniqueList(IEnumerable<T> collection) : base() { AddRange(collection); }
public UniqueList(int capacity) : base(capacity) { }
public new void Add(T item)
{
if (!_internalHash.Add(item))
throw new ArgumentException("Item already exists in UniqueList");
base.Add(item);
}
public new void AddRange(IEnumerable<T> collection)
{
foreach (T t in collection)
{
this.Add(t);
}
}
public new bool Remove(T item)
{
_internalHash.Remove(item);
return base.Remove(item);
}
public new int RemoveAll(Predicate<T> match)
{
// Removing items inside a foreach over this list would throw, so delegate to
// the base class and apply the same predicate to the hash set.
_internalHash.RemoveWhere(match);
return base.RemoveAll(match);
}
public new void RemoveAt(int index)
{
this.Remove(this[index]);
}
public new void RemoveRange(int index, int count)
{
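// Drop the affected items from the hash set first, then let the base class remove the range.
foreach (T item in GetRange(index, count))
{
_internalHash.Remove(item);
}
base.RemoveRange(index, count);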
}
}
Instead of throwing an exception, you could have the AddSong method return a boolean: true if the song is successfully added and false otherwise. Personally, I think throwing an exception would be acceptable in this case if it's reasonable to expect that the song is unique in the collection. For example, if the collection is a list of songs on an album, you wouldn't reasonably expect a duplicate song (same title, same duration, same position in the sequence of tracks, etc.). You also have the option of creating your own exception class derived from System.Exception, so that you can throw an exception that explains exactly why the error occurred.
You can always create your own exceptions. Simply create a class that inherits from Exception (or, in this case, ArgumentException).
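The two options can sit side by side. The sketch below is only an illustration: it assumes songs are identified by their title, and the Song type, TryAddSong and Count are hypothetical names that are not part of the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Song type; the title is assumed to identify a song.
public class Song
{
    public Song(string title) { Title = title; }
    public string Title { get; }
}

public class Album
{
    private readonly List<Song> songs = new List<Song>();          // keeps track order
    private readonly HashSet<string> titles = new HashSet<string>(); // fast duplicate check

    // Returns false instead of throwing when the song is already on the album.
    public bool TryAddSong(Song song)
    {
        if (!titles.Add(song.Title))
            return false;
        songs.Add(song);
        return true;
    }

    // Throws ArgumentException when the song is already on the album.
    public void AddSong(Song song)
    {
        if (!TryAddSong(song))
            throw new ArgumentException("Song already exists on this album.", nameof(song));
    }

    public int Count { get { return songs.Count; } }
}
```

Callers that expect duplicates as a normal case can use TryAddSong, while callers that treat a duplicate as a bug can use AddSong.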
Something along the lines of DuplicateItemException (or DuplicateSongException if you want something very specific) sounds about right.
If you want to offer up useful exceptions you may want to have a base exception.
AlbumException
Then, borrowing from CMerat's answer, create:
DuplicateSongException
This should of course inherit from AlbumException.
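A minimal sketch of that hierarchy might look like the following. The Title property and the message text are assumptions added for illustration.

```csharp
using System;

// Base type for every error raised by the (hypothetical) Album API.
public class AlbumException : Exception
{
    public AlbumException(string message) : base(message) { }
    public AlbumException(string message, Exception inner) : base(message, inner) { }
}

// Raised when a song is added twice; carries the offending title.
public class DuplicateSongException : AlbumException
{
    public DuplicateSongException(string title)
        : base("The song '" + title + "' is already on the album.")
    {
        Title = title;
    }

    public string Title { get; }
}
```

Callers can then catch AlbumException for anything album-related, or DuplicateSongException when they only care about duplicates.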
Personally, I would make the Album class immutable. In that case this whole situation would disappear.
Related
(Sorry for the vague title; couldn't think of anything better. Feel free to rephrase.)
So let's say my function or property returns an IEnumerable<T>:
public IEnumerable<Person> Adults
{
get
{
return _Members.Where(i => i.Age >= 18);
}
}
If I run a foreach on this property without actually materializing the returned enumerable:
foreach(var Adult in Adults)
{
//...
}
Is there a rule that governs whether IEnumerable<Person> will be materialized to array or list or something else?
Also is it safe to cast Adults to List<Person> or Array without calling ToList() or ToArray()?
Edit
Many people have put a lot of effort into answering this question. Thanks to all of them. However, the gist of this question still remains unanswered. Let me add some more details:
I understand that foreach doesn't require the target object to be an array or list. It doesn't even need to be a collection of any kind. All it needs the target object to do is to implement enumeration. However, if I inspect the value of the target object, it reveals that the actual underlying object is List<T> (just like it shows object (string) when you inspect a boxed string object). This is where the confusion starts. Who performed this materialization? I inspected the underlying layers (the source of the Where() function) and it doesn't look like those functions are doing this.
So my problem lies at two levels.
The first one is purely theoretical. Unlike many other disciplines like physics and biology, in computer science we always know precisely how something works (answering @zzxyz's last comment); so I was trying to find out which agent created the List<T>, how it decided that it should choose a List and not an Array, and whether there is a way of influencing that decision from our code.
My second reason was practical. Can I rely on the type of actual underlying object and cast it to List<T>? I need to use some List<T> functionality and I was wondering if for example ((List<Person>)Adults).BinarySearch() is as safe as Adults.ToList().BinarySearch()?
I also understand that it isn't going to create any performance penalty even if I do call ToList() explicitly. I was just trying to understand how it is working. Anyway, thanks again for the time; I guess I have spent just too much time on it.
In general terms, all you need for a foreach to work is an object with an accessible GetEnumerator() method that returns an object with the following members:
bool MoveNext()
T Current { get; } // where `T` is some type.
(Reset() is part of the IEnumerator interface, but foreach never calls it, so the pattern does not require it.)
You don't even need an IEnumerable or IEnumerable<T>.
This code works as the compiler figures out everything it needs:
void Main()
{
foreach (var adult in new Adults())
{
Console.WriteLine(adult.ToString());
}
}
public class Adult
{
public override string ToString() => "Adult!";
}
public class Adults
{
public class Enumerator
{
public Adult Current { get; private set; }
public bool MoveNext()
{
if (this.Current == null)
{
this.Current = new Adult();
return true;
}
this.Current = null;
return false;
}
public void Reset() { this.Current = null; }
}
public Enumerator GetEnumerator() { return new Enumerator(); }
}
Having a proper enumerable makes the process work more easily and more robustly. The more idiomatic version of the above code is:
public class Adults
{
private class Enumerator : IEnumerator<Adult>
{
public Adult Current { get; private set; }
object IEnumerator.Current => this.Current;
public void Dispose() { }
public bool MoveNext()
{
if (this.Current == null)
{
this.Current = new Adult();
return true;
}
this.Current = null;
return false;
}
public void Reset()
{
this.Current = null;
}
}
public IEnumerator<Adult> GetEnumerator()
{
return new Enumerator();
}
}
This enables the Enumerator to be a private class, i.e. private class Enumerator. The interface then does all of the hard work - it's not even possible to get a reference to the Enumerator class outside of Adults.
The point is that you do not know at compile-time what the concrete type of the class is - and if you did you may not even be able to cast to it.
The interface is all you need, and even that isn't strictly true if you consider my first example.
If you want a List<Adult> or an Adult[] you must call .ToList() or .ToArray() respectively.
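As a small sketch of the difference, the cast below fails at run time even though the source is a List<int>, while ToList() always produces a real list. The CastDemo class and the sample ages are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CastDemo
{
    // Returns true if casting the result of Where to List<int> throws.
    public static bool CastToListFails()
    {
        var ages = new List<int> { 12, 18, 40 };
        IEnumerable<int> adults = ages.Where(a => a >= 18);
        try
        {
            var list = (List<int>)adults; // the iterator object is not a List<int>
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Materializing explicitly always yields a List<int>.
    public static List<int> Materialize()
    {
        var ages = new List<int> { 12, 18, 40 };
        return ages.Where(a => a >= 18).ToList();
    }
}
```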
There is no such thing as a default concrete type for any interface.
The entire point of an interface is to guarantee properties, methods, events or indexers, without the user needing any knowledge of the concrete type that implements it.
When using an interface, all you can know is the properties, methods, events and indexers this interface declares, and that's all you actually need to know. That's just another aspect of encapsulation - same as when you are using a method of a class you don't need to know the internal implementation of that method.
To answer your question in the comments:
who decides that concrete type in case we don't, just as I did above?
That's the code that created the instance that's implementing the interface.
Since you can't do var Adults = new IEnumerable<Person> - it has to be a concrete type of some sort.
As far as I can see in the source code for LINQ's Enumerable extensions, Where returns either an instance of Iterator<TSource> or an instance of WhereEnumerableIterator<TSource>. I didn't bother checking further what exactly those types are, but I can pretty much guarantee they both implement IEnumerable, or the guys at Microsoft are using a different C# compiler than the rest of us... :-)
The following code hopefully highlights why neither you nor the compiler can assume an underlying collection:
public class OneThroughTen : IEnumerable<int>
{
private static int bar = 0;
public IEnumerator<int> GetEnumerator()
{
while (true)
{
yield return ++bar;
if (bar == 10)
{ yield break; }
}
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
class Program
{
static void Main(string[] args)
{
IEnumerable<int> x = new OneThroughTen();
foreach (int i in x)
{ Console.Write("{0} ", i); }
}
}
Output being, of course:
1 2 3 4 5 6 7 8 9 10
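Note that the code above behaves extremely poorly in the debugger. The likely cause is that bar is static and never reset: any second enumeration (for example, the debugger expanding the results) starts counting from 10 and no longer hits the bar == 10 exit. This code behaves just fine: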
public IEnumerator<int> GetEnumerator()
{
while (bar < 10)
{
yield return ++bar;
}
bar = 0;
}
(I used static for bar to highlight that not only does the OneThroughTen not have a specific collection, it doesn't have any collection, and in fact has no instance data whatsoever. We could just as easily return 10 random numbers, which would've been a better example, now that I think on it :))
From your edited question and comments it sounds like you understand the general concept of using IEnumerable, and that you cannot assume that "a list object backs all IEnumerable objects". Your real question is about something that has confused you in the debugger, but we've not really been able to understand exactly what it is you are seeing. Perhaps a screenshot would help?
Here I have 5 IEnumerable<int> variables which I assign in various ways, along with how the "Watch" window describes them. Does this show the confusion you are having? If not, can you construct a similarly short program and screenshot that does?
Coming a bit late into the party here :)
Actually Linq's "Where" decides what's going to be the underlying implementation of IEnumerable's GetEnumerator.
Look at the source code:
https://github.com/dotnet/runtime/blob/918e6a9a278bc66fb191c43d4db4a71e63ffad31/src/libraries/System.Linq/src/System/Linq/Where.cs#L59
You'll see that, based on the "source" type, the method returns a "WhereArrayIterator", a "WhereListIterator", or a more generic "WhereEnumerableIterator".
Each of these objects implements GetEnumerator over an Array, a List, or a general IEnumerable, so I'm pretty sure that's why you see the underlying object type being one of these in the VS inspector.
Hope this helps clarify things.
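One way to see this for yourself is to print the runtime type that Where returns for different sources. This is only a sketch: the iterator classes are internal to System.Linq and their names differ between runtime versions, so no particular output is claimed here. WhereTypeDemo and IteratorTypeFor are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereTypeDemo
{
    // Returns the name of the concrete (internal) type that Where produced.
    public static string IteratorTypeFor(IEnumerable<int> source)
    {
        return source.Where(x => x > 0).GetType().Name;
    }

    public static void Main()
    {
        // The printed names depend on the runtime; compare them on your own machine.
        Console.WriteLine(IteratorTypeFor(new[] { 1, 2, 3 }));
        Console.WriteLine(IteratorTypeFor(new List<int> { 1, 2, 3 }));
        Console.WriteLine(IteratorTypeFor(Enumerable.Range(1, 3)));
    }
}
```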
I have been digging into this myself. I believe the 'underlying type' is an iterator method, not an actual data structure type.
An iterator method defines how to generate the objects in a sequence
when requested.
https://learn.microsoft.com/en-us/dotnet/csharp/iterators#enumeration-sources-with-iterator-methods
In my use case/testing, the iterator is System.Linq.Enumerable.SelectManySingleSelectorIterator. I don't think this is a collection data type. It is an object that knows how to enumerate other IEnumerables.
Here is a snippet:
public IEnumerable<Item> ItemsToBuy { get; set; }
...
ItemsToBuy = Enumerable.Range(1, rng.Next(1, 20))
.Select(RandomItem(rng, market))
.SelectMany(e => e);
The property is IEnumerable and .SelectMany returns IEnumerable. So what is the actual collection data structure? I don't think there is one in how I am interpreting 'collection data structure'.
Also is it safe to cast Adults to List<Person> or Array without calling ToList() or ToArray()?
Not for me. When attempting to cast the ItemsToBuy collection in a foreach loop, I get the following runtime exception:
{"Unable to cast object of type 'SelectManySingleSelectorIterator`2[System.Collections.Generic.IEnumerable`1[CashMart.Models.Item],CashMart.Models.Item]' to type 'CashMart.Models.Item[]'."}
So I could not cast, but I could .ToArray(). I do suspect there is a performance hit as I would think that the IEnumerable would have to 'do things' to make it an array, including memory allocation for the array even if the entities are already in memory.
However, if I inspect the value of the target object, it reveals that the actual underlying object is List<T>
This was not my experience, and I think it may depend on the IEnumerable source as well as the LINQ provider. If I add a Where, the returned iterator is:
System.Linq.Enumerable.WhereEnumerableIterator
I am unsure what your _Members source is, but using LINQ to Objects, I get an iterator. LINQ to Entities must call the database, store the result set in memory somehow, and then enumerate over that result. I doubt that it internally makes it a List, but I don't know much about it. I suspect instead that _Members may be a List somewhere else in your code, so the debugger still shows a List as the source even after the .Where.
I am wondering about the more in-depth functionality of the IEnumerable<T> interface.
Basically, it works as an intermediary step in execution. For example, if you write:
IEnumerable<int> temp = new int[]{1,2,3}.Select(x => 2*x);
The result of the Select function will not be calculated (enumerated) until something is done with temp to allow it (such as List<int> list = temp.ToList()).
However, what puzzles me is, since IEnumerable<T> is an interface, it cannot, by definition, be instantiated. So, what is the collection the actual items (in the example 2*x items) reside in?
Moreover, if we were to write IEnumerable<int> temp = Enumerable.Repeat(1, 10);, what would be the underlying collection where the 1s are stored (array, list, something else)?
I cannot seem to find a thorough (more in-depth) explanation as to the actual implementation of this interface and its functionality (for example, if there is an underlying collection, how does the yield keyword work).
Basically, what I am asking for is a more elaborate explanation on the functionality of IEnumerable<T>.
Implementation shouldn't matter. All these (LINQ) methods return IEnumerable<T>, interface members are the only members you can access, and that should be enough to use them.
However, if you really have to know, you can find actual implementations on http://sourceof.net.
Enumerable.cs
However, for some of the methods you won't be able to find an explicit class declaration, because they use yield return, which means the actual class (with its state machine) is generated by the compiler during compilation. For example, Enumerable.Range is implemented that way:
public static IEnumerable<int> Range(int start, int count) {
long max = ((long)start) + count - 1;
if (count < 0 || max > Int32.MaxValue)
throw Error.ArgumentOutOfRange("count");
return RangeIterator(start, count);
}
static IEnumerable<int> RangeIterator(int start, int count) {
for (int i = 0; i < count; i++)
yield return start + i;
}
You can read more about that on MSDN: Iterators (C# and Visual Basic)
Not all objects that implement IEnumerable defer execution in some way. The API of the interface makes it possible to defer execution, but it doesn't require it. There are likewise implementations that don't defer execution in any way.
So, what is the collection the actual items (in the example 2*x items) reside in?
There is none. Whenever the next value is requested it computes that one value on demand, gives it to the caller, and then forgets the value. It doesn't store it anywhere else.
Moreover, if we were to write IEnumerable<int> temp = Enumerable.Repeat(1, 10);, what would be the underlying collection where the 1s are stored (array, list, something else)?
There wouldn't be one. It would compute each new value immediately when you ask for the next value and it won't remember it afterward. It only stores enough information to be able to compute the next value, which means it only needs to store the element and the number of values left to yield.
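The on-demand behaviour described above can be observed by counting how often the selector runs. This is a minimal sketch; DeferredDemo and its Run method are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns { calls before enumeration, first value, calls after First, calls after ToList }.
    public static int[] Run()
    {
        int calls = 0;
        IEnumerable<int> doubled = new[] { 1, 2, 3 }.Select(x => { calls++; return 2 * x; });

        int before = calls;           // 0: building the query runs nothing
        int first = doubled.First();  // computes only the first element
        int afterFirst = calls;       // 1
        doubled.ToList();             // enumerates again from the start: 3 more calls
        int afterToList = calls;      // 4

        return new[] { before, first, afterFirst, afterToList };
    }
}
```

Nothing is cached between enumerations: the second pass recomputes every value, which is why the counter ends at 4 rather than 3.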
While the actual .NET implementations will use much more concise means of creating such a type, creating an enumerable that defers execution is not particularly hard. Doing so even the long way is more tedious than difficult. You simply compute the next value in the MoveNext method of the iterator. In the example you asked about, Repeat, this is easy, as you only need to compute whether there is another value, not what it is:
public class Repeater<T> : IEnumerator<T>
{
private int count;
private T element;
public Repeater(T element, int count)
{
this.element = element;
this.count = count;
}
public T Current { get { return element; } }
object IEnumerator.Current
{
get { return Current; }
}
public void Dispose() { }
public bool MoveNext()
{
if (count > 0)
{
count--;
return true;
}
else
return false;
}
public void Reset()
{
throw new NotSupportedException();
}
}
(I've omitted an IEnumerable type that just returns a new instance of this type, or a static Repeat method that creates a new instance of that enumerable. There isn't anything particularly interesting to see there.)
A slightly more interesting example would be a counter, something like Enumerable.Range:
public class Counter : IEnumerator<int>
{
private int remaining;
public Counter(int start, int count)
{
// MoveNext increments before the first read, so begin one below start.
Current = start - 1;
this.remaining = count;
}
public int Current { get; private set; }
object IEnumerator.Current
{
get { return Current; }
}
public void Dispose() { }
public bool MoveNext()
{
if (remaining > 0)
{
remaining--;
Current++;
return true;
}
else
return false;
}
public void Reset()
{
throw new NotSupportedException();
}
}
Here we're not only computing if we have another value, but what that next value is, each time a new value is requested of us.
So, what is the collection the actual items (in the example 2*x items) reside in?
It is not residing anywhere. There is code that will produce the individual items "on demand" when you iterate, but the 2*x numbers are not computed upfront. They are also not stored anywhere, unless you call ToList or ToArray.
Moreover, if we were to write IEnumerable<int> temp = Enumerable.Repeat(1, 10);, what would be the underlying collection where the 1s are stored (array, list, something else)?
The same picture is here: the returned implementation of IEnumerable is not public, and it returns its items on demand, without storing them anywhere.
The C# compiler provides a convenient way to implement IEnumerable without defining a class for it. All you need is to declare your method's return type as IEnumerable<T>, and use yield return to supply values on an as-needed basis.
Let's say I have an IEnumerable<int> property backed by a List<int> field, so I can modify the collection from within the class, but it's publicly exposed as read-only.
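As a sketch of that technique, here is a stand-in for Enumerable.Repeat written as an iterator method. Sequences and RepeatValue are hypothetical names, and this is not the framework's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Sequences
{
    // Hypothetical stand-in for Enumerable.Repeat, written as an iterator method.
    public static IEnumerable<T> RepeatValue<T>(T element, int count)
    {
        for (int i = 0; i < count; i++)
            yield return element; // produced one at a time, never stored in a collection
    }

    public static void Main()
    {
        Console.WriteLine(RepeatValue(1, 10).Count()); // prints "10"
    }
}
```

The compiler turns RepeatValue into a hidden class that remembers only element, count and the loop position between calls to MoveNext.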
public class Foo
{
private List<int> _bar = new List<int>();
public IEnumerable<int> Bar
{
get { return _bar; }
}
}
But with code like that, you can easily cast the object retrieved from the property back to List<int> and modify it:
var foo = new Foo();
var bar = (List<int>)foo.Bar;
bar.Add(10);
The question is: what is the best (most readable, easiest to write, without performance loss) way to avoid that?
I can come up with at least 4 solutions, but none of them is perfect:
foreach and yield return:
public IEnumerable<int> Bar
{
get
{
foreach (var item in _bar)
yield return item;
}
}
- really annoying to write and to read.
AsReadOnly():
public IEnumerable<int> Bar
{
get { return _bar.AsReadOnly(); }
}
+ will cause exception when someone tries to modify the returned collection
+ does not create a copy of the entire collection.
ToList()
public IEnumerable<int> Bar
{
get { return _bar.ToList(); }
}
+ User can still modify retrieved collection, but it's not the same collection we are modifying from within the class, so we shouldn't care.
- creates a copy of the entire collection, which may cause problems when the collection is big.
Custom wrapper class.
public static class MyExtensions
{
private class MyEnumerable<T> : IEnumerable<T>
{
private ICollection<T> _source;
public MyEnumerable(ICollection<T> source)
{
_source = source;
}
public IEnumerator<T> GetEnumerator()
{
return _source.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return ((IEnumerable)_source).GetEnumerator();
}
}
public static IEnumerable<T> AsMyEnumerable<T>(this ICollection<T> source)
{
return new MyEnumerable<T>(source);
}
}
usage:
public IEnumerable<int> Bar
{
get
{
return _bar.AsMyEnumerable();
}
}
+ don't need to clone the collection
- when you use it as a LINQ query source, some methods won't be able to use ICollection.Count, because you don't expose it.
Is there any better way to do that?
Question is: what is the best (best readable, easiest to write, without performance loss) way to avoid that?
In general, I don't try to avoid it. The consumer of my API should use the type I expose, and if they don't, any bugs resulting are their fault, not mine. As such, I don't really care if they cast the data that way - when I change my internal representation, and they get cast exceptions, that's their issue.
That being said, if there is a security concern, I would likely just use AsReadOnly. This is effectively self-documenting, and has no real downsides (apart from a small allocation for the wrapper, as there is no copy of the data, you do get meaningful exceptions on modification, etc). There is no real disadvantage to this vs. making your own custom wrapper, and a custom wrapper means more code to test and maintain.
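A minimal sketch of what AsReadOnly gives you, assuming a Foo class shaped like the one in the question (the Add method and ReadOnlyDemo helper are added here for illustration):

```csharp
using System;
using System.Collections.Generic;

public class Foo
{
    private readonly List<int> _bar = new List<int> { 1, 2, 3 };

    // Wraps the list; no elements are copied.
    public IEnumerable<int> Bar { get { return _bar.AsReadOnly(); } }

    public void Add(int value) { _bar.Add(value); }
}

public static class ReadOnlyDemo
{
    // The wrapper is a ReadOnlyCollection<int>, so it is not a List<int>.
    public static bool IsList(Foo foo) { return foo.Bar is List<int>; }

    // Casting to ICollection<int> works, but Add throws NotSupportedException.
    public static bool AddThrows(Foo foo)
    {
        try
        {
            ((ICollection<int>)foo.Bar).Add(10);
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }
}
```

The wrapper is a live view, so items added later through Foo.Add show up in a previously returned Bar as well.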
In general, I personally try to avoid copying without reason. That would eliminate ToList() as an option in general. Using an iterator (your first option) is not as bad, though it does not really provide many advantages over a ReadOnlyCollection<T>.
In general I'm in the "don't bother" camp. If I give you an IEnumerable and you cast it to something else, you are the one who broke the contract, and it's not my fault if something breaks. If I found myself in a position where I really needed to protect a client from corrupting my state, I would create an extension method like:
static IEnumerable<T> AsSafeEnumerable<T>(this IEnumerable<T> obj)
{
foreach (var item in obj)
{
yield return item;
}
}
and never worry about it again.
I think you've sufficiently outlined many of the solutions but one thing I didn't see you take into account was allocation overhead (yet performance was listed as a concern). Allocations matter and having the property allocate a new object on every call for the same list is wasteful. Instead I would prefer an object with lifetime equal to the original list that could be returned without the extra overhead. Essentially a modified version of #4
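The answer's own code is not shown here, so the following is only one possible reading of the idea: create the wrapper once, store it in a field, and return the same instance on every call. It uses ReadOnlyCollection<int> rather than the custom wrapper from option 4, and the _barView field and Add method are assumptions.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Foo
{
    private readonly List<int> _bar = new List<int>();
    private readonly ReadOnlyCollection<int> _barView;

    public Foo()
    {
        // Allocated once; the wrapper stays in sync with _bar for the object's lifetime.
        _barView = _bar.AsReadOnly();
    }

    // Returns the cached wrapper, so the getter itself allocates nothing.
    public IEnumerable<int> Bar { get { return _barView; } }

    public void Add(int value) { _bar.Add(value); }
}
```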
I agree with Reed Copsey's post: if the consumer wishes to cast and manage the data in a way that will corrupt the class data, then let them.
You can spend a lot of time trying to stop a particular action from happening, just to have some clever Reflection spoil all your hard work.
Now, to answer the question, if you had to do this: the first option seems the most relevant to me. However, I would not enumerate through the list when calling the property. I would just return the inner collection, like this:
public IEnumerable<int> Bar
{
get { return _bar; }
}
I have a few methods which accept collections of a fixed size (e.g. 2, 3, 5), and I can't decide which way is better:
public void Foo(IEnumerable<Object> objects)
{
if(objects.Count() != 3)
{
throw new Exception();
}
// actions
}
public void Foo(Object objectA, Object objectB, Object objectC)
{
// actions
}
Are there any decisive pros and cons for each option?
The second is much better in my view:
It's obvious from the signature that it's expecting 3 values
Failures are flagged at compile time instead of execution time
If you have a specific number of members that are required, use your second option. It is confusing to the consumer of your method if a collection is allowed but then an exception is thrown at run time. This may or may not be caught if proper testing is not utilized and it is misleading. Always design for the person who will consume your code, never assuming that you will always be the one to maintain it.
I would go for this:
public class Bar
{
public Object object1;
public Object object2;
public Object object3;
// add a constructor if you want
}
...
public void Foo(Bar b)
{
// actions
}
Often times I need a collection of non-sequential objects with numeric identifiers. I like using the KeyedCollection for this, but I think there's a serious drawback. If you use an int for the key, you can no longer access members of the collection by their index (collection[index] is now really collection[key]). Is this a serious enough problem to avoid using the int as the key? What would a preferable alternative be? (maybe int.ToString()?)
I've done this before without any major problems, but recently I hit a nasty snag where XML serialization against a KeyedCollection does not work if the key is an int, due to a bug in .NET.
Basically you need to decide if users of the class are likely to be confused by the fact that they can't, for example, do:
for(int i=0; i < myCollection.Count; i++)
{
... myCollection[i] ...
}
though they can of course use foreach, or use a cast:
for(int i=0; i < myCollection.Count; i++)
{
... ((Collection<MyType>)myCollection)[i] ...
}
It's not an easy decision, as it can easily lead to heisenbugs. I decided to allow it in one of my apps, where access from users of the class was almost exclusively by key.
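To make the trap concrete, here is a small sketch with an int-keyed KeyedCollection. Item, ItemCollection and KeyedDemo are hypothetical names; the point is only that the same expression, items[0], means "key 0" on the derived type and "index 0" after the cast.

```csharp
using System;
using System.Collections.ObjectModel;

public class Item
{
    public Item(int id, string name) { Id = id; Name = name; }
    public int Id { get; }
    public string Name { get; }
}

public class ItemCollection : KeyedCollection<int, Item>
{
    protected override int GetKeyForItem(Item item) { return item.Id; }
}

public static class KeyedDemo
{
    // Returns { name found by key 0, name found at index 0 }.
    public static string[] Run()
    {
        var items = new ItemCollection { new Item(10, "ten"), new Item(0, "zero") };

        string byKey = items[0].Name;                       // key 0   -> "zero"
        string byIndex = ((Collection<Item>)items)[0].Name; // index 0 -> "ten"

        return new[] { byKey, byIndex };
    }
}
```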
I'm not sure I'd do so for a shared class library though: in general I'd avoid exposing a KeyedCollection in a public API: instead I would expose IList<T> in a public API, and consumers of the API who need keyed access can define their own internal KeyedCollection with a constructor that takes an IEnumerable<TItem> and populates the collection with it. This means you can easily build a new KeyedCollection from a list retrieved from an API.
Regarding serialization, there is also a performance problem that I reported to Microsoft Connect: the KeyedCollection maintains an internal dictionary as well as a list, and serializes both - it is sufficient to serialize the list as the dictionary can easily be recreated on deserialization.
For this reason as well as the XmlSerialization bug, I'd recommend you avoid serializing a KeyedCollection - instead only serialize the KeyedCollection.Items list.
I don't like the suggestion of wrapping your int key in another type. It seems to me wrong to add complexity simply so that a type can be used as an item in a KeyedCollection. I'd use a string key (ToString) rather than doing this - this is rather like the VB6 Collection class.
FWIW, I asked the same question some time ago on the MSDN forums. There is a response from a member of the FxCop team, but no conclusive guidelines.
An easy solution might be to wrap the int into another type to create a distinct type for overload resolution. If you use a struct, this wrapper doesn't have any additional overhead:
struct Id {
public int Value;
public Id(int value) { Value = value; }
public override int GetHashCode() { return Value.GetHashCode(); }
// … Equals method.
}
It might be best to add a GetById(int) method to a collection type. Collection<T> can be used instead if you don't need any other key for accessing the contained objects:
public class FooCollection : Collection<Foo>
{ Dictionary<int,Foo> dict = new Dictionary<int,Foo>();
public Foo GetById(int id) { return dict[id]; }
public bool Contains(int id) { return dict.ContainsKey(id);}
protected override void InsertItem(int index, Foo f)
{ dict[f.Id] = f;
base.InsertItem(index, f);
}
protected override void ClearItems()
{ dict.Clear();
base.ClearItems();
}
protected override void RemoveItem(int index)
{ dict.Remove(base.Items[index].Id);
base.RemoveItem(index);
}
protected override void SetItem(int index, Foo item)
{ dict.Remove(base.Items[index].Id);
dict[item.Id] = item;
base.SetItem(index, item);
}
}
The key in a KeyedCollection should be unique and quickly derivable from the object being collected. Given a person class, for example, it could be the SSN property or perhaps even a concatenation of the FirstName and LastName properties (if the result is known to be unique). If an ID is legitimately a field of the object being collected, then it is a valid candidate for the key. But perhaps try converting it to a string instead, to avoid the collision between key and index.