I know I'm doing validation wrong. Please persuade me to stop :) - c#

First let me explain how I currently handle validation, say, for an IPv4 address:
public struct IPv4Address {
private string value;
private IPv4Address(string value) {
this.value = value;
}
private static IPv4Address CheckSyntax(string value) {
// If everything's fine...
return new IPv4Address(value);
// If something's wrong with the syntax...
throw new ApplicationException("message");
}
public static implicit operator IPv4Address(string value) {
return CheckSyntax(value);
}
public static implicit operator string(IPv4Address address) {
return address.value;
}
}
I have a bunch of structs like this one.
They often have additional private members to handle things but not methods are exposed publicly.
Here is a sample usage:
IPv4Address a;
IPv4Address b = "1.2.3.4";
a = b;
b = "5.6.7.8";
string address = a;
// c contains "1.2.3.4"
IPv4Address c = address;
// See if it's valid
try {
IPv4Address d = "111.222.333.444";
}
catch (ApplicationException e) {
// Handle the exception...
}
I can feel that there's something very disturbing with this, hence I'm pondering about switching to a static class with methods like IsIPv4Address and so on.
Now, here's what I think is wrong with the approach above:
New team members will have to wrap their heads around this
It might impede integration with 3rd party code
Exceptions are expensive
Never seen anything like this and I am a conservative type at heart :)
And then what I like about it:
Very close to having a lot of specialized primitives since you have value types.
In practice they can be often used like primitive types would, for example it isn't a problem passing the struct above to a method that accepts a string.
And, my favourite, you can pass these structs between objects and be sure that they contain a syntactically valid value. This also avoids having to always check for correctness, which can be expensive if done unnecessarily and even forgotten.
I can't find a fatal flaw to the approach above (just a beginner here), what do you think?
Edit: as you can infer from the first line this is only an example, I'm not asking for a way to validate an IP address.

First of all, you should read posts about implicit casting, and when to use it (and why it's bad to use it in your scenario), you can start here.
If you need to have checking methods, they should be public static, instead of such strange constructs, beside this, having such methods allows you to choose if you want to throw exceptions (like .Parse() methods do), or signal by returning some value that should be checked (like .TryParse() methods).
Beside this, having static methods for creating valid objects doesn't mean you can't use value type (struct) instead of a class, if you really want to. Also, remember that structs have implicit empty constructor, that you can't "hide", so even your construct can be used like this:
IPv4Address a = new IPv4Address();
which will give you invalid struct (value will be null).

I also very much like the concept of mimicking the primitive patterns in the framework, so I would follow that through and lean toward IpV4Address.Parse("1.2.3.4") (along with TryParse) rather than implicit cast.

Seems like a whole lot of complexity with no benefit. Just because you can doesn't mean you should.
Instead of a CheckSyntax(string value) that returns a IP (This method is poorly worded btw) I would just have something like a
bool IsIP(string)
Then you could put this in a utility class, or a base class, or a separate abstraction someplace.

From the MSDN topic on implicit conversions
The pre-defined implicit conversions
always succeed and never cause
exceptions to be thrown. Properly
designed user-defined implicit
conversions should exhibit these
characteristics as well
Microsoft itself has not always followed this recommendation. The following use of the implicit operator of System.Xml.Linq.XName throws an XmlException:
XName xname = ":";
Maybe the designers of System.Xml.Linq got away with it by not documenting the exception :-)

Related

Using object type as a parameter

I've been wondering about this: would you support using simply an object as a parameter in your method? My reason behind doing this would be overloading. Currently, I'm trying to create a method that caters to many different datatypes: string, decimal, DateTime... the list goes on.
It's getting a bit messy though, so I was thinking of doing the following
public void GenericMethod(object val)
{
if (val is string)
// process as string
else if (val is decimal)
// process as decimal
else if (val is DateTime)
// do something for dt
else
// ...
}
What do you think of such a method? Would it incur unnecessary overhead? (during type checking) Have you implemented it? Tell me...
EDIT: Yeah, and just a sidenote, I'm kinda familiar with overloading. But it gets a little annoying when there are like more than 10 overloads...
Yes, that would work. However there are better ways of doing this.
Your best bet is to use overloads:
public void GenericMethod(string val)
{
// process as string
}
public void GenericMethod(decimal val)
{
// process as decimal
}
etc.
Whenever you use a is keyword in your code that's a huge hint that you're probably forgetting to use some important O.O. principles: overloads, subclasses, or the like.
Overloads aren't actually that annoying to work with, just to write. Remember, you're not coding this for yourself today, you're coding this for yourself three months from now when you have to read the code and figure out why the heck you did it that way, or where in the world that bug comes from.
Yet another reason to avoid the "switch on type" technique is for consistency with the .NET framework (and thus people's expectations). Follow Console.Write and the other wide variety of methods that are overridden within and under a given class.
I've been wondering about this: would you support using simply an object as a parameter in your method?
Very rarely. If there's a fixed set of types which are properly supported - and you'll throw an exception otherwise - then I'd use overloads.
If you can actually accept any type, and you'll handle a not-specially-supported type in some well-known way, then it's okay to accept object. That's what LINQ to XML does all over the place, and the result is a very clean API. I'd do it very carefully though - it's rarely a good idea.
And yes, there'd be an overhead. I wouldn't usually make that the basis of the decision though - the overhead will be small enough to be negligible in most cases. Design your API as cleanly as you can, then work out whether it's going to cause a bottleneck.
Yes, it would incur overhead for both type checking and boxing/unboxing the value type. I would recommend the overloads.
Another possibility, as long as you aren't doing a lot of math with the number, would be to make it a generic method. Arithmetic is rather difficult with generics though, as there are no constraints for value types which enables the use of operators.
No need of those!
Just declare as many methods with same name as you want and take each type as argument in each method.[This is called Overloading. e.g. You may have seen that +1 Overloads beside Methods, which implies that there is 1 more Method with same name but with different argument types]
Say Like this :
void Method(decimal d)
{
//Process Decimal
}
void Method(string s)
{
//Process String
}
By default, It will find its own method according to the Type.
There are cases where your approach makes sense. I've used it before, mostly when I have a bunch of processing that is the same for different data types.
But this is not overloading. Overloading would be to define different signatures for the same method name like this:
public void GenericMethod(string val)
{
// process as string
}
public void GenericMethod(decimal val)
{
// process as decimal
}
public void GenericMethod(DateTime val)
{
// do something for dt
}
// Etc.
And for some cases, this approach makes more sense.
Implementing many overloads one of them takes object is no problem. Take a look at Console.WriteLine overloads for example. http://msdn.microsoft.com/en-us/library/system.console.writeline.aspx However, take care that int for example can conflict with double:
int sum(int i, int j)
{
return i + j;
}
double sum(double i, double j)
{
return i + j;
}
object sum(object i, object j)
{
return i.ToString() + j.ToString();
}
==============================
static void Main()
{
sum(1, 2); // Error: ambigous call between `int` and `double` versions
sum(1.0, 2.0); // calls double version, although 1.0 and 2.0 are objects too
sum("Hello", "World"); // object
}

Non-read only alternative to anonymous types

In C#, an anonymous type can be as follows:
method doStuff(){
var myVar = new {
a = false,
b = true
}
if (myVar.a)
{
// Do stuff
}
}
However, the following will not compile:
method doStuff(){
var myVar = new {
a = false,
b = true
}
if (myVar.a)
{
myVar.b = true;
}
}
This is because myVar's fields are read-only and cannot be assigned to. It seems wanting to do something like the latter is fairly common; perhaps the best solution I've seen is to just define a struct outside the method.
However, is there really no other way to make the above block work? The reason it bothers me is, myVar is a local variable of this field, so it seems like it should only be referred to inside the method that uses it. Besides, needing to place the struct outside of the method can make the declaration of an object quite far from its use, especially in a long method.
Put in another way, is there an alternative to anonymous types which will allow me to define a "struct" like this (I realize struct exists in C# and must be defined outside of a method) without making it read-only? If no, is there something fundamentally wrong with wanting to do this, and should I be using a different approach?
No, you'll have to create your own class or struct to do this (preferrably a class if you want it to be mutable - mutable structs are horrible).
If you don't care about Equals/ToString/GetHashCode implementations, that's pretty easy:
public class MyClass {
public bool Foo { get; set; }
public bool Bar { get; set; }
}
(I'd still use properties rather than fields, for various reasons.)
Personally I usually find myself wanting an immutable type which I can pass between methods etc - I want a named version of the existing anonymous type feature...
Is there an alternative to anonymous types which will allow me to concisely define a simple "record" type like this without making it read-only?
No. You'll have to make a nominal type.
If no, is there something fundamentally wrong with wanting to do this?
No, it's a reasonable feature that we have considered before.
I note that in Visual Basic, anonymous types are mutable if you want them to be.
The only thing that is really "fundamentally wrong" about a mutable anonymous type is that it would be dangerous to use one as a hash key. We designed anonymous types with the assumptions that (1) you're going to use them as the keys in equijoins in LINQ query comprehensions, and (2) in LINQ-to-Objects and other implementations, joins will be implemented using hash tables. Therefore anonymous types should be useful as hash keys, and mutable hash keys are dangerous.
In Visual Basic, the GetHashCode implementation does not consume any information from mutable fields of anonymous types. Though that is a reasonable compromise, we simply decided that in C# the extra complexity wasn't worth the effort.
In C# 7 we can leverage named tuples to do the trick:
(bool a, bool b) myVar = (false, true);
if (myVar.a)
{
myVar.b = true;
}
You won't be able to get the nice initialization syntax but the ExpandoObject class introduced in .NET 4 would serve as a viable solution.
dynamic eo = new ExpandoObject();
eo.SomeIntValue = 5;
eo.SomeIntValue = 10; // works fine
For the above types of operation, you should define your own mutable STRUCT. Mutable structs may pose a headache for compiler writers like Eric Lippert, and there are some unfortunate limitations in how .net handles them, but nonetheless the semantics of mutable "Plain Old Data" structs (structs in which all fields are public, and the only public functions which write this are constructors, or are called exclusively from constructors) offer far clearer semantics than can be achieved via classes.
For example, consider the following:
struct Foo {
public int bar;
...other stuff;
}
int test(Action<Foo[]> proc1, Action<Foo> proc2)
{
foo myFoos[] = new Foo[100];
proc1(myFoos);
myFoos[4].bar = 9;
proc2(myFoos[4]); // Pass-by-value
return myFoos[4].bar;
}
Assuming there's no unsafe code and that the passed-in delegates can be called and will return in finite time, what will test() return? The fact that Foo is a struct with a public field bar is sufficient to answer the question: it will return 9, regardless of what else appears in the declaration of Foo, and regardless of what functions are passed in proc1 and proc2. If Foo were a class, one would have to examine every single Action<Foo[]> and Action<Foo> that exists, or will ever exist, to know what test() would return. Determining that Foo is a struct with public field bar seems much easier than examining all past and future functions that might get passed in.
Struct methods which modify this are handled particularly poorly in .net, so if one needs to use a method to modify a struct, it's almost certainly better to use one of these patterns:
myStruct = myStruct.ModifiedInSomeFashion(...); // Approach #1
myStructType.ModifyInSomeFashion(ref myStruct, ...); // Approach #2
than the pattern:
myStruct.ModifyInSomeFashion(...);
Provided one uses the above approach to struct-modifying patterns, however, mutable structs have the advantage of allowing code which is both more efficient and easier to read than immutable structs or immutable classes, and is much less trouble-prone than mutable classes. For things which represent an aggregation of values, with no identity outside the values they contain, mutable class types are often the worst possible representation.
I find it really annoying that you can't set anonymous properties as read/write as you can in VB - often I want to return data from a database using EF/LINQ projection, and then do some massaging of the data in c# that can't be done at the database for whatever reason. The easiest way to do this is to iterate over existing anonymous instances and update properties as you go. NOTE this is not so bad now in EF.Core, as you can mix db functions and .net functions in a single query finally.
My go-to workaround is to use reflection and will be frowned upon and down-voted but works; buyer beware if the underlying implementation changes and all your code breaks.
public static class AnonClassHelper {
public static void SetField<T>(object anonClass, string fieldName, T value) {
var field = anonClass.GetType().GetField($"<{fieldName}>i__Field", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance);
field.SetValue(anonClass, value);
}
}
// usage
AnonClassHelper.SetField(inst, nameof(inst.SomeField), newVal);
An alternative I have used when dealing with strings is to make properties of type StringBuilder, then these individual properties will be settable via the StringBuilder methods after you have an instance of your anonymous type.
I know it is really old question but how about replacing whole anonymous
object:
`
if (myVar.a)
{
myVar = new
{ a = false, b = true };
}
`

Is it necessarily bad style to ignore the return value of a method

Let's say I have a C# method public void CheckXYZ(int xyz) {
// do some operation with side effects
}
Elsewhere in the same class is another method public int GetCheckedXYZ(int xyz) {
int abc;
// functionally equivalent operation to CheckXYZ,
// with additional side effect of assigning a value to abc
return abc; // this value is calculated during the check above
}
Is it necessarily bad style to refactor this by removing the CheckXYZ method, and replacing all existing CheckXYZ() calls with GetCheckedXYZ(), ignoring the return value? The returned type isn't IDisposable in this case. Does it come down to discretion?
EDIT: After all the responses, I've expanded the example a little. (Yes, I realise it's now got out in it, it's especially for #Steven)
public void EnsureXYZ(int xyz) {
if (!cache.ContainsKey(xyz))
cache.Add(xyz, random.Next());
}
public int AlwaysGetXYZ(int xyz) {
int abc;
if (!cache.TryGetValue(xyz, out abc))
{
abc = random.Next();
cache.Add(xyz, abc);
}
return abc;
}
It entirely depends upon what that return value is telling you and if that is important to know or not. If the data returned by the method is not relevant to the code that is calling it then ignoring it is entirely valid. But if it indicates some failure/counter/influential value then ignore it at your peril.
Usually it's bad style, yes. It's allowed and ok where methods return an instance of the class for chaining (foo.bar().baz().xyz().asdf() => asdf returns the instance foo but you don't need it anymore)
In your case the point of bad style wouldn't be the ignored return value but the methods with side effects. A CheckXyz() function should always return a boolean and have no further side effects.
In general, side effects are bad and if you call a method and can ignore the returned value it means that the method/object/library/program might be poorly designed.
A common C convention is to write this:
(void)GetCheckedXYZ();
The cast to void has no effect, but by convention it shows that the developer knows that the return value is being ignored, i.e. it shows it's deliberate.
C# won't let you do that, but I've seen this instead (Also in Java):
/*(void)*/GetCheckedXYZ();
Which some may feel lacks aesthetics, but it does convey the intent of the developer without resorting to alternative versions of methods, which to my eye seems worse.
A lot of these answers have good points. I'd just add that if you choose to ignore a return value, then comment it along the lines of "don't care about the return value because...", so that the next person who comes into the code will see that you have not missed it by accident and that you have thought things through
EDIT: better still, put your comment inside an empty block
if (!something()) {
// Not worried if this fails because blah
}
IMHO it's generally best to have one (and only one) way of doing things to avoid duplicating your codebase. Generally though, if you sometimes use and sometimes don't use the return value, it's probably a sign that your code could be broken down in a better way. In your example, it would probably be fine if both of those functions called a third (common) function to avoid duplicating the core functionality.
Error codes should always be checked for and handled but if the function's just returning information then what you do with that information is up to you.
[Edit] ...and as dbemerlin points out, side effects should be avoided wherever possible.
you could work with out parameters as well.

C# myths about best practices?

My colleague keeps telling me of the things listed in comments.
I am confused.
Can somebody please demystify these things for me?
class Bar
{
private int _a;
public int A
{
get { return _a; }
set { _a = value; }
}
private Foo _objfoo;
public Foo OFoo
{
get { return _objfoo; }
set { _objfoo = value; }
}
public Bar(int a, Foo foo)
{
// this is a bad idea
A = a;
OFoo = foo;
}
// MYTHS
private void Method()
{
this.A //1 -
this._a //2 - use this when inside the class e.g. if(this._a == 2)
A //3 - use this outside the class e.g. barObj.A
_a //4 -
// Not using this.xxx creates threading issues.
}
}
class Foo
{
// implementation
}
The this. is redundant if there isn't a name collision. You only need it when you need a reference to the current object or if you have an argument with the same name as a field.
Threading issues have nothing to do with it. The confusion maybe comes from the fact that most static members are implemented so that they are thread-safe and static members cannot (!) be called with this. since they aren't bound to the instance.
"Not using this.xxx creates threading
issues"
is a complete myth. Just ask your co-worker to check the generate IL and have him explain why they are the same whether you add this or not.
"use this when inside the class e.g.
if(this._a == 2)"
is down to what you want to achieve. What your co-worker seems to be saying is always reference the private field, which does not seem to me sensible. Often you want to access the public property, even inside a class, since the getter may modify the value (for instance, a property of type List may return a new List instance when the list is null to avoid null reference exceptions when accessing the property).
My personal "best practice" is to always use this. Yes it's redundant but it's great way to identify from the first look where the state of the instance is chaged or retrieved when you consider multi-threaded app.
It may help to ask your co-worker why he considers these suggestions are best practice? Often people quote best-practice "rules" that they have picked up somewhere without any real understanding of the reasons behind the practices.
As Lucero says, the "this" is not required unless () there is a name collision. However, some people like to include the "this" when it is not strictly required, because they believe it enhances readability / more clearly shows the programmers intentions. In my opinion, this is a matter of personal preference rather than anything else.
As for the "bad idea" in your "Bar" method: Your co-worker may consider this bad practice for the following reason: if the setter method for "A" is altered to have some side effect then A=a; will also produce this side effect, whereas _a = a; will just set the private variable. In my view, best practice is a matter of being aware of the difference rather than prefering one over another.
Finally, the "threading issues" are nonsense - AFAIK "this" has nothing to do with threading.
The number 2 is a myth that is easily debunked by mentioning automatic properties. Automatic properties allow you to define a property without the backing field which is automatically generated by the compiler. So ask your co-worker what is his opinion about automatic properties.

why C# does not provide internal helper for passing property as reference?

This is issue about LANGUAGE DESIGN.
Please do not answer to the question until you read entire post! Thank you.
With all helpers existing in C# (like lambdas, or automatic properties) it is very odd for me that I cannot pass property by a reference. Let's say I would like to do that:
foo(ref my_class.prop);
I get error so I write instead:
{
var tmp = my_class.prop;
foo(tmp);
my_class.prop = tmp;
}
And now it works. But please notice two things:
it is general template, I didn't put anywhere type, only "var", so it applies for all types and number of properties I have to pass
I have to do it over and over again, with no benefit -- it is mechanical work
The existing problem actually kills such useful functions as Swap. Swap is normally 3 lines long, but since it takes 2 references, calling it takes 5 lines. Of course it is nonsense and I simply write "swap" by hand each time I would like to call it. But this shows C# prevents reusable code, bad.
THE QUESTION
So -- what bad could happen if compiler automatically create temporary variables (as I do by hand), call the function, and assign the values back to properties? Is this any danger in it? I don't see it so I am curious what do you think why the design of this issue looks like it looks now.
Cheers,
EDIT As 280Z28 gave great examples for beating idea of automatically wrapping ref for properties I still think wrapping properties with temporary variables would be useful. Maybe something like this:
Swap(inout my_class.prop1,inout my_class.prop2);
Otherwise no real Swap for C# :-(
There are a lot of assumptions you can make about the meaning and behavior of a ref parameter. For example,
Case 1:
int x;
Interlocked.Increment(ref x);
If you could pass a property by ref to this method, the call would be the same but it would completely defeat the semantics of the method.
Case 2:
void WaitForCompletion(ref bool trigger)
{
while (!trigger)
Thread.Sleep(1000);
}
Summary: A by-ref parameter passes the address of a memory location to the function. An implementation creating a temporary variable in order to "pass a property by reference" would be semantically equivalent to passing by value, which is precisely the behavior that you're disallowing when you make the parameter a ref one.
Your proposal is called "copy in - copy out" reference semantics. Copy-in-copy-out semantics are subtly different from what we might call "ref to variable" semantics; different enough to be confusing and wrong in many situations. Others have already given you some examples; there are plenty more. For example:
void M() { F(ref this.p); }
void F(ref int x) { x = 123; B(); }
void B() { Console.WriteLine(this.p); }
If "this.p" is a property, with your proposal, this prints the old value of the property. If it is a field then it prints the new value.
Now imagine that you refactor a field to be a property. In the real language, that causes errors if you were passing a field by ref; the problem is brought to your attention. With your proposal, there is no error; instead, behaviour changes silently and subtly. That makes for bugs.
Consistency is important in C#, particularly in parts of the language that people find confusing, like reference semantics. I would want either references to always be copy-in-copy-out or never copy-in-copy-out. Doing it one way sometimes and another way other times seems like really bad design for C#, a language which values consistency over brevity.
Because a property is a method. It is a language construct responding to a pattern of encapsulating the setting and retrieval of a private field through a set of methods. It is functionally equivalent to this:
class Foo
{
private int _bar;
public int GetBar( ) { return _bar; }
public void SetBar( ) { _bar = value; }
}
With a ref argument, changes to the underlying variable will be observed by the method, this won't happen in your case. In other words, it is not exactly the same.
var t = obj.prop;
foo(ref t);
obj.prop = t;
Here, side effects of getter and setter are only visible once each, regardless of how many times the "by-ref" parameter got assigned to.
Imagine a dynamically computed property. Its value might change at any time. With this construct, foo is not kept up to date even though the code suggests this ("I'm passing the property to the method")
So -- what bad could happen if
compiler automatically create
temporary variables (as I do by hand),
call the function, and assign the
values back to properties? Is this any
danger in it?
The danger is that the compiler is doing something you don't know. Making the code confusing because properties are methods, not variables.
I'll provide just one simple example where it would cause confusion. Assume it was possible (as is in VB):
class Weird {
public int Prop { get; set; }
}
static void Test(ref int x) {
x = 42;
throw new Exception();
}
static void Main() {
int v = 10;
try {
Test(ref v);
} catch {}
Console.WriteLine(v); // prints 42
var c = new Weird();
c.Prop = 10;
try {
Test(ref c.Prop);
} catch {}
Console.WriteLine(c.Prop); // prints 10!!!
}
Nice. Isn't it?
Because, as Eric Lippert is fond of pointing out, every language feature must be understood, designed, specified, implemented, tested and documented. And it's obviously not a common scenario/pain point.

Categories