I've been given some code from a customer that looks like this:
public class Thing
{
    // custom functionality for Thing...
}

public class Things : IEnumerable
{
    Thing[] things;

    internal int Count { get { return things.Length; } }

    public Thing this[int i] { get { return this.things[i]; } }

    public IEnumerator GetEnumerator() { return new ThingEnumerator(this); }

    // custom functionality for Things...
}

public class ThingEnumerator : IEnumerator
{
    int i;
    readonly int count;
    Things container;

    public ThingEnumerator(Things container)
    {
        i = -1;
        count = container.Count;
        this.container = container;
    }

    public object Current { get { return this.container[i]; } }

    public bool MoveNext() { return ++i < count; }

    public void Reset() { i = -1; }
}
What I'm wondering is whether it would have been better to get rid of the ThingEnumerator class and replace the Things.GetEnumerator implementation with one that simply delegates to the array's GetEnumerator, like so:
public IEnumerator GetEnumerator() { return things.GetEnumerator(); }
Are there any advantages to keeping the code as is? (Another thing I've noticed is that the existing code could be improved by replacing IEnumerator with IEnumerator<Thing>.)
With generics, there is really little value in implementing IEnumerable and IEnumerator yourself.
Removing them and replacing the class with a generic collection means you have far less code to maintain, and it has the advantage of using code that is known to work.
In the general case, there can sometimes be a reason to implement your own enumerator. You might want some functionality that the built-in one doesn't offer - some validation, logging, raising OnAccess-type events somewhere, perhaps some logic to lock items and release them afterwards for concurrent access (I've seen code that does that last one; it's odd and I wouldn't recommend it).
Having said that, I can't see anything like that in the example you've posted, so it doesn't seem to be adding any value beyond what IEnumerable provides. As a rule, if there's built-in code that does what you want, use it. All you'll achieve by rolling your own is to create more code to maintain.
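For illustration, here's a minimal sketch (assuming the same array-backed design as the original) of Things rewritten against the generic interfaces with an iterator block, which removes the hand-written enumerator entirely:

using System.Collections;
using System.Collections.Generic;

public class Things : IEnumerable<Thing>
{
    Thing[] things;

    public Thing this[int i] { get { return things[i]; } }

    public IEnumerator<Thing> GetEnumerator()
    {
        // the compiler generates the enumerator class from this iterator block
        for (int i = 0; i < things.Length; i++)
            yield return things[i];
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

A foreach over Things now yields strongly-typed Thing items, with no enumerator class left to maintain.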
The code you have looks like code that was written for .NET 1.0/1.1, before .NET generics were available - at that time, there was value in implementing your own collection class (generally derived from System.Collections.CollectionBase) so that the indexer property could be typed to the element type of the collection.
That said, unless you were using value types and boxing/unboxing was the performance bottleneck, I would still have inherited from CollectionBase, and there would have been no need to redefine GetEnumerator() or Count.
Now, however, I would recommend one of these two approaches:
If you need the custom collection to have some custom functionality, then derive the collection from System.Collections.ObjectModel.Collection<Thing> - it provides all the necessary hooks for you to control insertion, replacement and deletion of items in the collection.
If you actually only need something that needs to be enumerated, I would return a standard IList<Thing> backed by a List<Thing>.
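For the first approach, here is a minimal sketch (hypothetical, reusing the question's Thing type) of deriving from Collection<Thing>; the overridable methods are the hooks for controlling insertion, replacement and deletion:

using System.Collections.ObjectModel;

public class Things : Collection<Thing>
{
    // Collection<T> already supplies Count, the typed indexer and GetEnumerator.
    protected override void InsertItem(int index, Thing item)
    {
        // validation, logging, event-raising etc. could go here
        base.InsertItem(index, item);
    }

    protected override void SetItem(int index, Thing item)
    {
        base.SetItem(index, item);
    }

    protected override void RemoveItem(int index)
    {
        base.RemoveItem(index);
    }
}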
Unless you are doing something truly custom (such as some sort of validation) in the custom enumerator, no, there really isn't any reason to do this.
Generally, go with what is available in the standard libraries unless there is a definite reason not to. They are likely better tested and have had more time spent on them, as individual units of code, than you can afford to spend, so why reinvent the wheel?
In cases like this, the code already exists but it may still be better to replace the code if you have time to test very well. (It's a no-brainer if there is decent unit test coverage.)
You'll be reducing your maintenance overhead, removing a potential source of obscure bugs and leaving the code cleaner than you found it. Uncle Bob would be proud.
An array enumerator does pretty much the same as your custom enumerator, so yes, you can just as well return the array's enumerator directly.
In this case, I would recommend you do it, because array enumerators also perform more error checking and, as you stated, it's just simpler.
I can't seem to find much information on this so I thought I'd bring it up here. One of the issues I often find myself running into is unit testing the creation of a single object while processing a list. For example, I'd have a method signature such as IEnumerable<Output> Process(IEnumerable<Input> inputs). When unit testing a single input I would create a list of one input and simply call First() on the results and ensure it is what I expect it to be. This would lead to something such as:
public class BatchCreator
{
    public IEnumerable<Output> Create(IEnumerable<Input> inputs)
    {
        foreach (var input in inputs)
        {
            Console.WriteLine("Creating Output...");
            yield return new Output();
        }
    }
}
My current thinking is that maybe one class should be responsible for the objects creation while another class be responsible for orchestrating my list of inputs. See example below.
public interface ICreator<in TInput, out TReturn>
{
    TReturn Create(TInput input);
}

public class SingleCreator : ICreator<Input, Output>
{
    public Output Create(Input input)
    {
        Console.WriteLine("Creating Output...");
        return new Output();
    }
}

public class CompositeCreator : ICreator<IEnumerable<Input>, IEnumerable<Output>>
{
    private readonly ICreator<Input, Output> _singleCreator;

    public CompositeCreator(ICreator<Input, Output> singleCreator)
    {
        _singleCreator = singleCreator;
    }

    public IEnumerable<Output> Create(IEnumerable<Input> inputs)
    {
        return inputs.Select(input => _singleCreator.Create(input));
    }
}
With what's been posted above, I can easily test that I'm able to create one single instance of Output given an Input. Note that I do not need to call SingleCreator anywhere else in the code base other than from CompositeCreator. Creating ICreator also gives me the benefit of reusing it for the other times I need to do similar tasks, which comes up 2-3 other times in my current project.
Anyone have any experience with this that could shed some light? Am I simply overthinking this? Suggestions are greatly appreciated.
Generally speaking, there's nothing inherently wrong with your reasoning. More or less that's how the issue can be solved.
However, your CompositeCreator isn't actually composite, since it uses precisely one "creation method".
It's difficult to say anything more, because we don't know your project's internals, but if it integrates well into your use cases, then it's fine. What I'd try is to stay with ICreator<Tin, Tout> only and make an extension method IEnumerable<Tout> CreateMany(this IEnumerable<Tin> c) to deal with collections. You can test both easily and independently (fake the ICreator and check whether the collection of inputs is processed). This way you get rid of ICreator<IEnumerable, ...>, which is usually good, because operating on a collection as a whole and operating on individual items often don't go well together.
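A minimal sketch of that suggestion (the signature is adapted slightly so the single-item creator can be passed in; all names are illustrative):

using System.Collections.Generic;
using System.Linq;

public static class CreatorExtensions
{
    // Lifts a single-item ICreator over a collection of inputs.
    public static IEnumerable<TOut> CreateMany<TIn, TOut>(
        this IEnumerable<TIn> inputs, ICreator<TIn, TOut> creator)
    {
        return inputs.Select(creator.Create);
    }
}

With that in place, CompositeCreator becomes unnecessary; call sites just write inputs.CreateMany(singleCreator), and the extension can be tested independently with a fake ICreator.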
I'm not entirely sure why you need the IEnumerable input/output option (the composite creator), unless it is more than just a collection, as that's a problem already solved by LINQ, which would look something like:
var singleCreator = new SingleCreator();
var outputs = InputEnumerable.Select(singleCreator.Create);
I think this is subjective, and depends on the complexity of the classes you are passing around - if it's not just an IEnumerable then it's worthwhile having some sort of multiple creator, which may or may not need to be a class.
For instance, if I have a class:

public class StuffHolder
{
    List<Stuff> myList;

    public StuffHolder()
    {
        myList = new List<Stuff>();
        myList.Add(new Stuff(myList));
        myList[0].stuffHappens();
    }
}

and a Stuff object:

public class Stuff
{
    List<Stuff> myList;

    public Stuff(List<Stuff> myList)
    {
        this.myList = myList;
    }

    public void stuffHappens()
    {
        myList.Remove(this);
    }
}
What are the disadvantages of calling stuffHappens() rather than having stuff pass the information that it should be removed to the StuffHolder class and having the StuffHolder class remove that specific Stuff?
There's a hazard if stuffHappens() ever occurs in more than one thread at a time, as the List<T> collection is not thread-safe.
The bigger hazard is the confusion of responsibility, as it probably shouldn't be the job of Stuff to know about it being stored in a collection. This kind of design 'fuzziness' causes steadily increasing confusion as systems grow and evolve.
It will certainly work (i.e. the Stuff object will be removed from the list).
The question is why you have a StuffHolder in the first place. Usually when you wrap a collection like that you are doing it to maintain some invariants or cache some data. Using the list like this means you could violate the invariant.
Essentially the issue is that StuffHolder has no idea that an object has been removed from its list. It's up to you whether that is a problem for your particular situation.
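To make that concrete, here's a minimal sketch (a hypothetical API, not the only way to do it) of the alternative, where removal is routed through the holder so it can maintain any invariants or cached data:

using System.Collections.Generic;

public class StuffHolder
{
    private readonly List<Stuff> myList = new List<Stuff>();

    public void Add(Stuff stuff) { myList.Add(stuff); }

    public void Remove(Stuff stuff)
    {
        // invariants and caches can be updated here, because the
        // holder now observes every removal
        myList.Remove(stuff);
    }
}

public class Stuff
{
    private readonly StuffHolder holder;

    public Stuff(StuffHolder holder) { this.holder = holder; }

    public void stuffHappens()
    {
        holder.Remove(this); // ask the owner instead of mutating its list directly
    }
}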
It's possible from the code perspective, so it's OK. The answer depends on what you are modeling and the design you have made. There may be some scenarios where that solution is not a good one and some where it is. If you want, you can share what you are trying to achieve and we can discuss it then.
Hope it helps.
My question may be a part of an old topic - "properties vs fields".
I have a situation where a variable is read-only for outside classes but needs to be modified inside its own class. I can approach it in 2 ways:
First:
private Type m_Field;
public Type MyProperty { get { return m_Field; } }
Second:
public Type MyProperty { get; private set; }
After reading several articles (that mostly covered the benefits of using public properties instead of public fields) I did not get a clear idea of whether the second method has any advantage over the first one besides writing less code. I am interested in which one is better practice to use in projects (and why), or whether it's just a personal choice.
Maybe this question does not belong to SO so I apologize in advance.
The second version produces less clutter, but is less flexible. I suggest you use the second version until you run into a situation that makes the first version necessary and then refactor - changes will be local to the class anyway, so don't worry too much about that!
Generally, writing less code is a good idea. The less code you write, the fewer bugs you produce :)
Second version is shorter, so I think it's usually better.
The exception is, when the only write access occurs in the constructor. Then I prefer the first version as this allows the field to be marked as readonly.
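A minimal sketch of that constructor-only case, using the question's names inside a hypothetical enclosing class (note that later C# versions also offer get-only auto-properties for this):

public class MyClass
{
    // readonly: assignable only at the declaration or in a constructor,
    // which an auto-property with a private setter cannot express
    private readonly Type m_Field;

    public MyClass(Type value)
    {
        m_Field = value;
    }

    public Type MyProperty { get { return m_Field; } }
}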
The second one will pretty much compile down to the first one anyway, so IMO always use the second as it's less & neater code.
The only scenario where I tend to use the first approach is when I want to lazily load a property, e.g.:

private List<string> _items;

...

public List<string> Items
{
    get
    {
        if (_items == null)
        {
            _items = new List<string>();
            // load items
        }
        return _items;
    }
}
Please go through the following question; it seems to be the same:
https://softwareengineering.stackexchange.com/questions/72495/net-properties-use-private-set-or-readonly-property
For debugging, the second is the best. Otherwise you'll have to put breakpoints at each place where you set the field. With the second, you put one breakpoint on the setter of the property.
Personally I prefer the second version because it is less to write, so I can use the time to do more complex coding... plus, in my opinion, it promotes lazy development.
I am putting together a presentation on the benefits of Unit Testing and I would like a simple example of unintended consequences: changing code in one class that breaks functionality in another class.
Can someone suggest a simple, easy-to-explain example of this?
My plan is to write unit tests around this functionality to demonstrate that we know we broke something by immediately running the test.
A slightly simpler, and thus perhaps clearer, example is:
public string GetServerAddress()
{
    return "127.0.0.1";
}

public void DoSomethingWithServer()
{
    Console.WriteLine("Server address is: " + GetServerAddress());
}

If GetServerAddress is changed to return an array:

public string[] GetServerAddress()
{
    return new string[] { "127.0.0.1", "localhost" };
}
The output from DoSomethingWithServer will be somewhat different, but it will all still compile, making for an even subtler bug.
The first (non-array) version will print Server address is: 127.0.0.1 and the second will print Server address is: System.String[]. This is something I've also seen in production code; needless to say, it's no longer there!
Here's an example:
class DataProvider
{
    public static IEnumerable<Something> GetData()
    {
        return new Something[] { ... };
    }
}

class Consumer
{
    void DoSomething()
    {
        Something[] data = (Something[])DataProvider.GetData();
    }
}
Change GetData() to return a List<Something>, and Consumer will break.
This might seem somewhat contrived, but I've seen similar problems in real code.
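A small sketch of the defensive version of the consumer: materializing a copy instead of downcasting means the provider's concrete return type can change freely.

using System.Linq;

class Consumer
{
    void DoSomething()
    {
        // ToArray() works for any IEnumerable<Something>, whether the
        // provider returns an array, a List<T> or an iterator
        Something[] data = DataProvider.GetData().ToArray();
    }
}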
Say you have a method that does:
abstract class ProviderBase<T>
{
    public IEnumerable<T> Results
    {
        get
        {
            List<T> list = new List<T>();
            using (IDataReader rdr = GetReader())
                while (rdr.Read())
                    list.Add(Build(rdr));
            return list;
        }
    }

    protected abstract IDataReader GetReader();
    protected abstract T Build(IDataReader rdr);
}
There are various implementations in use; one of them ends up in:
public bool CheckNames(NameProvider source)
{
    IEnumerable<string> names = source.Results;
    switch (names.Count())
    {
        case 0:
            return true; // obviously none invalid.
        case 1:
            // having one name to check is a common case and for some reason
            // allows us some optimal approach compared to checking many.
            return FastCheck(names.Single());
        default:
            return NormalCheck(names);
    }
}
Now, none of this is particularly weird. We aren't assuming a particular implementation of IEnumerable. Indeed, this will work for arrays and very many commonly used collections (I can't think of one in System.Collections.Generic that doesn't match, off the top of my head). We've only used the normal methods and the normal extension methods. It's not even unusual to have an optimised case for single-item collections. We could, for instance, change the list to be an array, or maybe a HashSet (to automatically remove duplicates), or a LinkedList or a few other things, and it'll keep working.
Still, while we aren't depending on a particular implementation, we are depending on a particular feature: that of being rewindable. Count() will either call ICollection.Count or else enumerate through the enumerable, after which the name-checking enumerates it a second time.
Someone, though, sees the Results property and thinks "hmm, that's a bit wasteful". They replace it with:
public IEnumerable<T> Results
{
    get
    {
        using (IDataReader rdr = GetReader())
            while (rdr.Read())
                yield return Build(rdr);
    }
}
This again is perfectly reasonable, and will indeed lead to a considerable performance boost in many cases. But if CheckNames isn't hit in the immediate "tests" done by the coder in question (maybe it isn't hit in a lot of code paths), the fact that CheckNames will now error (and possibly return a false result when there is more than one name, which may be even worse if it opens a security risk) goes unnoticed.
Any unit test that hits CheckNames with more than zero results is going to catch it, though.
Incidentally, a comparable (if more complicated) change is the reason for a backwards-compatibility feature in Npgsql. It wasn't quite as simple as replacing a List.Add() with a yield return, but a change to the way ExecuteReader worked gave a comparable shift from O(n) to O(1) in obtaining the first result. However, before the change NpgsqlConnection allowed users to obtain another reader from a connection while the first was still open, and afterwards it didn't. The docs for IDbConnection say you shouldn't do this, but that didn't mean there was no running code that did. Luckily, one such piece of running code was an NUnit test, and a backwards-compatibility feature was added to allow such code to continue to function with just a change to configuration.
I often find myself writing a property that is evaluated lazily. Something like:
if (backingField == null)
    backingField = SomeOperation();
return backingField;
It is not much code, but it does get repeated a lot if you have a lot of properties.
I am thinking about defining a class called LazyProperty:
public class LazyProperty<T>
{
    private readonly Func<T> getter;

    public LazyProperty(Func<T> getter)
    {
        this.getter = getter;
    }

    private bool loaded = false;
    private T propertyValue;

    public T Value
    {
        get
        {
            if (!loaded)
            {
                propertyValue = getter();
                loaded = true;
            }
            return propertyValue;
        }
    }

    public static implicit operator T(LazyProperty<T> rhs)
    {
        return rhs.Value;
    }
}
This would enable me to initialize a field like this:
first = new LazyProperty<HeavyObject>(() => new HeavyObject { MyProperty = Value });
And then the body of the property could be reduced to:
public HeavyObject First { get { return first; } }
This would be used by most of the company, since it would go into a common class library shared by most of our products.
I cannot decide whether this is a good idea or not. I think the solution has some pros, like:
Less code
Prettier code
On the downside, it would be harder to look at the code and determine exactly what happens - especially if a developer is not familiar with the LazyProperty class.
What do you think? Is this a good idea or should I abandon it?
Also, is the implicit operator a good idea, or would you prefer to use the Value property explicitly when using this class?
Opinions and suggestions are welcome :-)
Just to be overly pedantic:
Your proposed solution to avoid repeating code:
private LazyProperty<HeavyObject> first =
    new LazyProperty<HeavyObject>(() => new HeavyObject { MyProperty = Value });

public HeavyObject First {
    get {
        return first;
    }
}
Is actually more characters than the code that you did not want to repeat:
private HeavyObject first;

public HeavyObject First {
    get {
        if (first == null) first = new HeavyObject { MyProperty = Value };
        return first;
    }
}
Apart from that, I think that the implicit cast makes the code very hard to understand. I would not have guessed that a method that simply returns first actually ends up creating a HeavyObject. I would at least have dropped the implicit conversion and returned first.Value from the property.
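That is, keeping the wrapper but making the laziness visible at the call site:

public HeavyObject First { get { return first.Value; } }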
Don't do it at all.
Generally, using this kind of lazily initialized property is a valid design choice in one case: when SomeOperation() is an expensive operation (in terms of I/O, like when it requires a DB hit, or computationally) AND when you are certain you will often NOT need to access it.
That said, by default you should go for eager initialization, and when the profiler says it's your bottleneck, change it to lazy initialization.
If you feel the urge to create that kind of abstraction, it's a smell.
Surely you'd at least want the LazyProperty<T> to be a value type; otherwise you've added memory and GC pressure for every "lazily-loaded" property in your system.
Also, what about multithreaded scenarios? Consider two threads requesting the property at the same time. Without locking, you could potentially create two instances of the underlying property. To avoid locking in the common case, you would want to do a double-checked lock.
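A sketch of what that might look like inside the proposed LazyProperty<T> (assumptions: the loaded flag is made volatile and a dedicated lock object is added; from .NET 4 onwards, System.Lazy<T> packages this up for you):

private readonly object syncRoot = new object();
private volatile bool loaded = false;
private T propertyValue;

public T Value
{
    get
    {
        if (!loaded)                 // first check, without taking the lock
        {
            lock (syncRoot)
            {
                if (!loaded)         // second check, now under the lock
                {
                    propertyValue = getter();
                    loaded = true;   // volatile write publishes propertyValue
                }
            }
        }
        return propertyValue;
    }
}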
I prefer the first code, because a) it is such a common pattern with properties that I immediately understand it, and b) the point you raised: that there is no hidden magic that you have to go look up to understand where and when the value is being obtained.
I like the idea in that it is much less code and more elegant, but I would be very worried about the fact that it becomes hard to look at it and tell what is going on. The only way I would consider it is to have a convention for variables set using the "lazy" way, and also to comment anywhere it is used. Now there isn't going to be a compiler or anything that will enforce those rules, so still YMMV.
In the end, for me, decisions like this boil down to who is going to be looking at it and the quality of those programmers. If you can trust your fellow developers to use it right and comment well, then go for it; but if not, you are better off doing it in an easily understood and followed way. /my 2 cents
I don't think worrying about a developer not understanding is a good argument against doing something like this...
If you think that way, then you couldn't do anything for fear of someone not understanding what you did.
You could write a tutorial or something in a central repository; we have a wiki here for these kinds of notes.
Overall, I think it's a good implementation idea (not wanting to start a debate about whether lazy loading is a good idea or not).
What I do in this case is I create a Visual Studio code snippet. I think that's what you really should do.
For example, when I create ASP.NET controls, I often have data that gets stored in the ViewState, so I created a code snippet like this:
public Type Value
{
    get
    {
        if (ViewState["key"] == null)
            ViewState["key"] = someDefaultValue;
        return (Type)ViewState["key"];
    }
    set { ViewState["key"] = value; }
}
This way, the code can be easily created with only a little work (defining the type, the key, the name, and the default value). It's reusable, but you don't have the disadvantage of a complex piece of code that other developers might not understand.
I like your solution as it is very clever, but I don't think you win much by using it. Lazy loading a private field in a public property is definitely a place where code can be duplicated. However, this has always struck me as a pattern to use rather than code that needs to be refactored into a common place.
Your approach may become a concern in the future if you do any serialization. Also it is more confusing initially to understand what you are doing with the custom type.
Overall I applaud your attempt and appreciate its cleverness but would suggest that you revert to your original solution for the reasons stated above.
Personally, I don't think the LazyProperty class as is offers enough value to justify using it especially considering the drawbacks using it for value types has (as Kent mentioned). If you needed other functionality (like making it multithreaded), it might be justified as a ThreadSafeLazyProperty class.
Regarding the implicit operator, I like the "Value" property better. It's a little more typing, but a lot more clear to me.
I think this is an interesting idea. First, I would recommend that you hide the LazyProperty from the calling code; you don't want to leak into your domain model the fact that it is lazy. You're already doing that with the implicit operator, so keep it.
I like how you can use this approach to handle and abstract away the details of locking, for example. If you do that, then I think there is value and merit in it. If you do add locking, watch out for the double-checked locking pattern; it's very easy to get wrong.