I think I understand strong typing, but every time I look for examples of what weak typing is, I end up finding examples of programming languages that simply coerce/convert types automatically.
For instance, the article Typing: Strong vs. Weak, Static vs. Dynamic says that Python is strongly typed because you get an exception if you try to evaluate:
Python
1 + "1"
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'str'
However, such a thing is possible in Java and in C#, and we do not consider them weakly typed just for that.
Java
int a = 10;
String b = "b";
String result = a + b;
System.out.println(result);
C#
int a = 10;
string b = "b";
string c = a + b;
Console.WriteLine(c);
In another article, Weakly Type Languages, the author says that Perl is weakly typed simply because you can concatenate a string to a number and vice versa without any explicit conversion.
Perl
$a=10;
$b="a";
$c=$a.$b;
print $c; #10a
So the same example makes Perl weakly typed, but not Java and C#?
Gee, this is confusing.
The authors seem to imply that a language that prevents the application of certain operations on values of different types is strongly typed, and that one that allows them is weakly typed.
Therefore, at some point I felt prompted to believe that a language that provides a lot of automatic conversions or coercions between types (as Perl does) may end up being considered weakly typed, whereas other languages that provide only a few conversions may end up being considered strongly typed.
I am inclined to believe, though, that I must be wrong in this interpretation; I just do not know why or how to explain it.
So, my questions are:
What does it really mean for a language to be truly weakly typed?
Could you mention any good examples of weak typing that are not related to automatic conversion/automatic coercion done by the language?
Can a language be weakly typed and strongly typed at the same time?
UPDATE: This question was the subject of my blog on the 15th of October, 2012. Thanks for the great question!
What does it really mean for a language to be "weakly typed"?
It means "this language uses a type system that I find distasteful". A "strongly typed" language by contrast is a language with a type system that I find pleasant.
The terms are essentially meaningless and you should avoid them. Wikipedia lists eleven different meanings for "strongly typed", several of which are contradictory. This indicates that the odds of confusion being created are high in any conversation involving the term "strongly typed" or "weakly typed".
All that you can really say with any certainty is that a "strongly typed" language under discussion has some additional restriction in the type system, either at runtime or compile time, that a "weakly typed" language under discussion lacks. What that restriction might be cannot be determined without further context.
Instead of using "strongly typed" and "weakly typed", you should describe in detail what kind of type safety you mean. For example, C# is a statically typed language and a type safe language and a memory safe language, for the most part. C# allows all three of those forms of "strong" typing to be violated. The cast operator violates static typing; it says to the compiler "I know more about the runtime type of this expression than you do". If the developer is wrong, then the runtime will throw an exception in order to protect type safety. If the developer wishes to break type safety or memory safety, they can do so by turning off the type safety system by making an "unsafe" block. In an unsafe block you can use pointer magic to treat an int as a float (violating type safety) or to write to memory you do not own. (Violating memory safety.)
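Python has no unsafe block, but CPython's standard ctypes module offers a comparable escape hatch. The sketch below is only an illustration, assuming 32-bit C int and float (true on mainstream platforms): it takes a pointer to an int and reads the same bytes as a float, the kind of type-safety violation described above, and contrasts that with an ordinary checked conversion.

```python
import ctypes

# 0x3F800000 is the IEEE 754 single-precision bit pattern for 1.0
# (assumes 32-bit C int and float, as on mainstream platforms).
n = ctypes.c_int(0x3F800000)

# Take a pointer to the int and cast it to a pointer-to-float.
# No conversion happens: the same four bytes are simply read as a float.
as_float_ptr = ctypes.cast(ctypes.pointer(n), ctypes.POINTER(ctypes.c_float))
reinterpreted = as_float_ptr.contents.value
print(reinterpreted)  # 1.0

# Contrast: an ordinary conversion produces a new value with the same meaning.
converted = float(0x3F800000)
print(converted)  # 1065353216.0
```

The pointer cast changes nothing in memory, only how the bytes are interpreted; float() is a type-safe conversion, which is why the two results differ.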
C# imposes type restrictions that are checked at both compile time and runtime, which makes it a "strongly typed" language compared to languages that do less compile-time or runtime checking. C# also allows you, in special circumstances, to do an end-run around those restrictions, which makes it a "weakly typed" language compared with languages that do not allow such an end-run.
Which is it really? It is impossible to say; it depends on the point of view of the speaker and their attitude towards the various language features.
As others have noted, the terms "strongly typed" and "weakly typed" have so many different meanings that there's no single answer to your question. However, since you specifically mentioned Perl in your question, let me try to explain in what sense Perl is weakly typed.
The point is that, in Perl, there is no such thing as an "integer variable", a "float variable", a "string variable" or a "boolean variable". In fact, as far as the user can (usually) tell, there aren't even integer, float, string or boolean values: all you have are "scalars", which are all of these things at the same time. So you can, for example, write:
$foo = "123" + "456"; # $foo = 579
$bar = substr($foo, 2, 1); # $bar = 9
$bar .= " lives"; # $bar = "9 lives"
$foo -= $bar; # $foo = 579 - 9 = 570
Of course, as you correctly note, all of this can be seen as just type coercion. But the point is that, in Perl, types are always coerced. In fact, it's quite hard for a user to tell what the internal "type" of a variable might be: at line 2 in my example above, asking whether the value of $bar is the string "9" or the number 9 is pretty much meaningless, since, as far as Perl is concerned, those are the same thing. Indeed, it's even possible for a Perl scalar to internally have both a string and a numeric value at the same time, as is e.g. the case for $foo after line 2 above.
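To make the "types are always coerced" point concrete, here is a rough Python model of the four Perl lines above, with every coercion that Perl performs implicitly written out as an explicit call. numify and stringify are hypothetical helpers that only approximate Perl's rules (no hex strings, Inf/NaN, or dual-valued scalars).

```python
import re

# Leading numeric prefix, roughly as Perl reads a string used as a number.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")

def numify(value):
    """Simplified model of Perl's numeric context (hypothetical helper)."""
    if isinstance(value, (int, float)):
        return value
    match = _NUMERIC_PREFIX.match(value)
    if match is None:
        return 0  # Perl treats a non-numeric string as 0 (warning under `use warnings`)
    text = match.group().strip()
    try:
        return int(text)
    except ValueError:
        return float(text)

def stringify(value):
    """Simplified model of Perl's string context (hypothetical helper)."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))  # Perl prints 570.0 as "570"
    return str(value)

# The four Perl lines, with each implicit coercion made explicit:
foo = numify("123") + numify("456")  # $foo = "123" + "456"
bar = stringify(foo)[2:3]            # $bar = substr($foo, 2, 1)
bar = stringify(bar) + " lives"      # $bar .= " lives"
foo = numify(foo) - numify(bar)      # $foo -= $bar
print(foo, repr(bar))                # 570 '9 lives'
```

In Python the programmer has to request each conversion; in Perl the operator (+, substr, .=, -=) decides which context applies, whatever the scalar currently holds.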
The flip side of all this is that, since Perl variables are untyped (or, rather, don't expose their internal type to the user), operators cannot be overloaded to do different things for different types of arguments; you can't just say "this operator will do X for numbers and Y for strings", because the operator can't (won't) tell which kind of values its arguments are.
Thus, for example, Perl has and needs both a numeric addition operator (+) and a string concatenation operator (.): as you saw above, it's perfectly fine to add strings ("1" + "2" == "3") or to concatenate numbers (1 . 2 == 12). Similarly, the numeric comparison operators ==, !=, <, >, <=, >= and <=> compare the numeric values of their arguments, while the string comparison operators eq, ne, lt, gt, le, ge and cmp compare them lexicographically as strings. So 2 < 10, but 2 gt 10 (but "02" lt 10, while "02" == 2). (Mind you, certain other languages, like JavaScript, try to accommodate Perl-like weak typing while also doing operator overloading. This often leads to ugliness, like the loss of associativity for +.)
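A rough Python model of this paragraph: Perl-style operator pairs in which the operator, not the operand, decides whether to compare numerically or as strings, followed by a JavaScript-like + that both overloads on type and coerces. All helper names are hypothetical, and the numeric conversion is deliberately simpler than Perl's.

```python
def to_number(x):
    # Simplified: accepts only well-formed numeric strings, unlike Perl.
    return x if isinstance(x, (int, float)) else float(x)

def to_string(x):
    # Simplified: adequate for ints and strings.
    return x if isinstance(x, str) else str(x)

# Perl-style operator pairs: the operator picks the comparison.
def num_lt(a, b):  # Perl <
    return to_number(a) < to_number(b)

def num_eq(a, b):  # Perl ==
    return to_number(a) == to_number(b)

def str_lt(a, b):  # Perl lt
    return to_string(a) < to_string(b)

def str_gt(a, b):  # Perl gt
    return to_string(a) > to_string(b)

print(num_lt(2, 10), str_gt(2, 10))       # True True
print(str_lt("02", 10), num_eq("02", 2))  # True True

# A JavaScript-like + (numbers and strings only) that overloads on the
# operand types AND coerces: associativity is lost.
def js_add(a, b):
    if isinstance(a, str) or isinstance(b, str):
        return to_string(a) + to_string(b)
    return a + b

print(js_add(js_add(1, 2), "3"))  # 33
print(js_add(1, js_add(2, "3")))  # 123
```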
(The fly in the ointment here is that, for historical reasons, Perl 5 does have a few corner cases, like the bitwise logical operators, whose behavior depends on the internal representation of their arguments. Those are generally considered an annoying design flaw, since the internal representation can change for surprising reasons, and so predicting just what those operators do in a given situation can be tricky.)
All that said, one could argue that Perl does have strong types; they're just not the kind of types you might expect. Specifically, in addition to the "scalar" type discussed above, Perl also has two structured types: "array" and "hash". Those are very distinct from scalars, to the point where Perl variables have different sigils indicating their type ($ for scalars, @ for arrays, % for hashes)1. There are coercion rules between these types, so you can write e.g. %foo = @bar, but many of them are quite lossy: for example, $foo = @bar assigns the length of the array @bar to $foo, not its contents. (Also, there are a few other strange types, like typeglobs and I/O handles, that you don't often see exposed.)
Also, a slight chink in this nice design is the existence of reference types, which are a special kind of scalars (and which can be distinguished from normal scalars, using the ref operator). It's possible to use references as normal scalars, but their string/numeric values are not particularly useful, and they tend to lose their special reference-ness if you modify them using normal scalar operations. Also, any Perl variable2 can be blessed to a class, turning it into an object of that class; the OO class system in Perl is somewhat orthogonal to the primitive type (or typelessness) system described above, although it's also "weak" in the sense of following the duck typing paradigm. The general opinion is that, if you find yourself checking the class of an object in Perl, you're doing something wrong.
1 Actually, the sigil denotes the type of the value being accessed, so that e.g. the first scalar in the array @foo is denoted $foo[0]. See perlfaq4 for more details.
2 Objects in Perl are (normally) accessed through references to them, but what actually gets blessed is the (possibly anonymous) variable the reference points to. However, the blessing is indeed a property of the variable, not of its value, so that e.g. assigning the actual blessed variable to another one just gives you a shallow, unblessed copy of it. See perlobj for more details.
In addition to what Eric has said, consider the following C code:
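void f(void* x);  /* x carries no information about what it points to */
int n = 42;
f(&n);            /* an int* converts implicitly to void* */
f("hello");       /* and so does a char* */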
In contrast to languages such as Python, C#, Java or whatnot, the above is weakly typed because we lose type information. Eric correctly pointed out that in C# we can circumvent the compiler by casting, effectively telling it “I know more about the type of this variable than you”.
But even then, the runtime will still check the type! If the cast is invalid, the runtime system will catch it and throw an exception.
With type erasure, this doesn’t happen – type information is thrown away. A cast to void* in C does exactly that. In this regard, the above is fundamentally different from a C# method declaration such as void f(Object x).
(Technically, C# also allows type erasure through unsafe code or marshalling.)
This is as weakly typed as it gets. Everything else is just a matter of static vs. dynamic type checking, i.e. of the time when a type is checked.
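A rough Python analogue of the contrast between `void f(void* x)` and `void f(Object x)`, using CPython's standard ctypes module as a stand-in for C pointers (describe is a hypothetical name). A parameter typed only as object still carries its runtime type; a c_void_p is just an address.

```python
import ctypes

def describe(x: object) -> str:
    # Like C#'s `void f(Object x)`: statically just "object", but the
    # runtime type is still attached to the value and can be inspected.
    return type(x).__name__

print(describe(42), describe("hello"))  # int str

# Like C's `void f(void* x)`: a c_void_p holds only an address.
n = ctypes.c_int(42)
erased = ctypes.c_void_p(ctypes.addressof(n))

# Nothing records that this address held an int, so casting back is taken
# on trust: the right cast recovers 42 ...
as_int = ctypes.cast(erased, ctypes.POINTER(ctypes.c_int)).contents.value
# ... and a wrong cast is accepted just as silently, reading the same
# bytes as a (tiny, denormal) float.
as_float = ctypes.cast(erased, ctypes.POINTER(ctypes.c_float)).contents.value
print(as_int, as_float)
```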
A perfect example comes from the wikipedia article of Strong Typing:
Generally strong typing implies that the programming language places severe restrictions on the intermixing that is permitted to occur.
Weak Typing
a = 2
b = "2"
concatenate(a, b) # returns "22"
add(a, b) # returns 4
Strong Typing
a = 2
b = "2"
concatenate(a, b) # Type Error
add(a, b) # Type Error
concatenate(str(a), b) #Returns "22"
add(a, int(b)) # Returns 4
Notice that a weakly typed language can intermix different types without errors, while a strongly typed language requires the input types to be the expected types. In a strongly typed language a value can be converted (str(a) converts an integer to a string) or cast (int(b)).
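In Python terms, with the standard operator module standing in for the pseudocode's concatenate and add, the two columns look roughly like this; the weak_ helpers are hypothetical and simply guess the conversion.

```python
import operator

a = 2
b = "2"

# Python behaves like the "Strong Typing" column: mixing int and str fails.
failures = []
for op in (operator.concat, operator.add):
    try:
        op(a, b)
    except TypeError:
        failures.append(op.__name__)
print(failures)  # ['concat', 'add']

# Explicit conversions make the intent unambiguous.
print(operator.concat(str(a), b))  # 22
print(operator.add(a, int(b)))     # 4

# The "Weak Typing" column needs helpers that guess (hypothetical names).
def weak_concatenate(x, y):
    return str(x) + str(y)

def weak_add(x, y):
    return int(x) + int(y)

print(weak_concatenate(a, b), weak_add(a, b))  # 22 4
```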
This all depends on the interpretation of typing.
I would like to contribute to the discussion with my own research on the subject. As others have commented and contributed, I have been reading their answers and following their references, and I have found interesting information. As suggested, most of this would probably be better discussed in the Programmers forum, since it appears to be more theoretical than practical.
From a theoretical standpoint, I think the article by Luca Cardelli and Peter Wegner named On Understanding Types, Data Abstraction and Polymorphism has one of the best arguments I have read.
A type may be viewed as a set of clothes (or a suit of armor) that protects an underlying untyped representation from arbitrary or unintended use. It provides a protective covering that hides the underlying representation and constrains the way objects may interact with other objects. In an untyped system untyped objects are naked in that the underlying representation is exposed for all to see. Violating the type system involves removing the protective set of clothing and operating directly on the naked representation.
This statement seems to suggest that weak typing would let us access the inner structure of a type and manipulate it as if it were something else (another type). Perhaps this is what we could do with unsafe code (mentioned by Eric) or with the type-erased C pointers mentioned by Konrad.
The article continues...
Languages in which all expressions are type-consistent are called strongly typed languages. If a language is strongly typed its compiler can guarantee that the programs it accepts will execute without type errors. In general, we should strive for strong typing, and adopt static typing whenever possible. Note that every statically typed language is strongly typed but the converse is not necessarily true.
As such, if strong typing means the absence of type errors, I can only assume that weak typing means the contrary: the likely presence of type errors. At runtime or at compile time? That seems irrelevant here.
Funnily enough, by this definition a language with powerful type coercions like Perl would be considered strongly typed, because the system is not failing; it deals with the types by coercing them into appropriate, well-defined equivalents.
On the other hand, could I say that the possibility of ClassCastException and ArrayStoreException (in Java) and InvalidCastException and ArrayTypeMismatchException (in C#) indicates a degree of weak typing, at least at compile time? Eric's answer seems to agree with this.
In a second article, Typeful Programming, found through one of the references in the answers to this question, Luca Cardelli delves into the concept of type violations:
Most system programming languages allow arbitrary type violations, some indiscriminately, some only in restricted parts of a program. Operations that involve type violations are called unsound. Type violations fall in several classes [among which we can mention]:
Basic-value coercions: These include conversions between integers, booleans, characters, sets, etc. There is no need for type violations here, because built-in interfaces can be provided to carry out the coercions in a type-sound way.
As such, type coercions like those provided by operators could be considered type violations, but unless they break the consistency of the type system, we might say that they do not lead to a weakly typed system.
Based on this, none of Python, Perl, Java or C# is weakly typed.
Cardelli mentions two type violations that I would readily consider cases of truly weak typing:
Address arithmetic. If necessary, there should be a built-in (unsound) interface, providing the adequate operations on addresses and type conversions. Various situations involve pointers into the heap (very dangerous with relocating collectors), pointers to the stack, pointers to static areas, and pointers into other address spaces. Sometimes array indexing can replace address arithmetic.
Memory mapping. This involves looking at an area of memory as an unstructured array, although it contains structured data. This is typical of memory allocators and collectors.
These kinds of things, possible in languages like C (mentioned by Konrad) or through unsafe code in .NET (mentioned by Eric), would truly imply weak typing.
I believe the best answer so far is Eric's, because the definitions of these concepts are very theoretical, and when it comes to a particular language, interpretations of these concepts may lead to different, debatable conclusions.
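Python's standard memoryview gives a small, controlled taste of Cardelli's "memory mapping". In this sketch (CPython's array and memoryview, nothing hypothetical), an array of ints is viewed as raw bytes, and an element is overwritten through that untyped view.

```python
import sys
from array import array

# A typed container: three C ints.
numbers = array("i", [1, 2, 3])

# "Memory mapping": view the same storage as an unstructured array of bytes.
raw = memoryview(numbers).cast("B")
print(len(raw) == len(numbers) * numbers.itemsize)  # True

# Write bytes directly into the first element's storage. Nothing here knows
# or checks that these bytes belong to an int.
raw[0:numbers.itemsize] = (7).to_bytes(numbers.itemsize, sys.byteorder)
print(numbers[0])  # 7
```

This is still bounds-checked, so it is a type-level violation rather than the fully unsound kind Cardelli describes; C pointers or .NET unsafe code drop that remaining check as well.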
Weak typing does indeed mean that a high percentage of types can be implicitly coerced, attempting to guess what the coder intended.
Strong typing means that types are not coerced, or at least coerced less.
Static typing means your variables' types are determined at compile time.
Many people have recently been confusing "manifestly typed" with "strongly typed". "Manifestly typed" means that you declare your variables' types explicitly.
Python is mostly strongly typed, though you can use almost anything in a boolean context, and booleans can be used in an integer context, and you can use an integer in a float context. It is not manifestly typed, because you don't need to declare your types (except for Cython, which isn't entirely python, albeit interesting). It is also not statically typed.
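The Python coercions listed above, shown directly (plain Python 3 built-ins, nothing assumed):

```python
# Almost anything can be used in a boolean context ("truthiness").
print(bool([]), bool([0]), bool(""), bool("0"))  # False True False True

# bool is a subclass of int, so booleans work in integer contexts.
print(issubclass(bool, int), True + True)  # True 2

# ints are implicitly widened to float in mixed arithmetic ...
print(1 + 2.5)  # 3.5

# ... but there is no implicit int/str coercion.
try:
    1 + "1"
except TypeError:
    print("TypeError")
```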
C and C++ are manifestly typed, statically typed, and only somewhat strongly typed: you declare your types and types are determined at compile time, but you can mix integers and pointers, or integers and doubles, or even cast a pointer to one type into a pointer to another type.
Haskell is an interesting example, because it is not manifestly typed, but it's also statically and strongly typed.
The strong vs. weak typing distinction is not only about the continuum of how much or how little the language automatically coerces values from one datatype to another, but also about how strongly or weakly the actual values are typed. In Python and Java, and mostly in C#, values have their types set in stone. In Perl, not so much: there are really only a handful of different value types to store in a variable.
Let's go through the cases one by one.
Python
In the Python example 1 + "1", the + operator calls int's __add__, giving it the string "1" as an argument; however, this returns NotImplemented:
>>> (1).__add__('1')
NotImplemented
Next, the interpreter tries the __radd__ of str:
>>> '1'.__radd__(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute '__radd__'
As that fails too, the + operator fails with TypeError: unsupported operand type(s) for +: 'int' and 'str'. As such, the exception itself does not say much about strong typing, but the fact that the + operator does not automatically coerce its arguments to the same type is a hint that Python is not the most weakly typed language in the continuum.
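One way to see that this TypeError reflects a choice made by int and str, not something hard-wired into the language: a hypothetical str subclass can opt into Perl-like numeric addition through the same __add__/__radd__ protocol. NumericStr is an illustration, not a recommended design.

```python
class NumericStr(str):
    """Hypothetical str subclass that + treats as a number next to ints/floats."""

    def __add__(self, other):
        if isinstance(other, (int, float)):
            return float(self) + other
        return str.__add__(self, other)  # str + str still concatenates

    def __radd__(self, other):
        # Reached for `1 + NumericStr("1")` after int.__add__ returns NotImplemented.
        if isinstance(other, (int, float)):
            return other + float(self)
        return NotImplemented

print((1).__add__(NumericStr("1")))  # NotImplemented
print(1 + NumericStr("1"))           # 2.0
print(NumericStr("1") + "1")         # 11
```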
On the other hand, in Python 'a' * 5 is implemented:
>>> 'a' * 5
'aaaaa'
That is,
>>> 'a'.__mul__(5)
'aaaaa'
That the operation depends on the operand types requires some strong typing; however, if * instead coerced its operands to numbers before multiplying, that alone would still not necessarily make the values weakly typed.
Java
The Java example, String result = "1" + 1;, works only because, as a matter of convenience, the + operator is overloaded for strings. The Java compiler replaces the + with code that creates a StringBuilder (see this):
String result = a + b;
// becomes something like
String result = new StringBuilder().append(a).append(b).toString();
This is rather an example of very static typing, without any actual coercion: StringBuilder has a method append(Object) that is specifically used here. The documentation says the following:
Appends the string representation of the Object argument.
The overall effect is exactly as if the argument were converted to a
string by the method String.valueOf(Object), and the characters of
that string were then appended to this character sequence.
Where String.valueOf then
Returns the string representation of the Object argument.
[Returns] if the argument is null, then a string equal to "null"; otherwise, the value of obj.toString() is returned.
Thus this is a case of absolutely no coercion by the language; every concern is delegated to the objects themselves.
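To illustrate the delegation, a Python sketch of roughly what the compiler-generated StringBuilder chain does. value_of and java_concat are hypothetical names, None stands in for Java's null, and details such as Java printing booleans as true/false are not modelled.

```python
def value_of(obj):
    """Rough counterpart of String.valueOf(Object): "null" for a missing
    value, otherwise whatever the object's own __str__ returns."""
    return "null" if obj is None else str(obj)

def java_concat(*parts):
    # Hypothetical model of new StringBuilder().append(a).append(b).toString():
    # each operand is asked for its own string form; the operator coerces nothing.
    return "".join(value_of(p) for p in parts)

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):  # plays the role of Java's toString()
        return f"({self.x}, {self.y})"

print(java_concat(10, "b"))              # 10b
print(java_concat(None, "!"))            # null!
print(java_concat("p = ", Point(1, 2)))  # p = (1, 2)
```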
C#
According to Jon Skeet's answer here, operator + is not even overloaded for the string class; as in Java, this is just a convenience generated by the compiler, thanks to both static and strong typing.
Perl
As the perldata explains,
Perl has three built-in data types: scalars, arrays of scalars, and associative arrays of scalars, known as "hashes". A scalar is a single string (of any size, limited only by the available memory), number, or a reference to something (which will be discussed in perlref). Normal arrays are ordered lists of scalars indexed by number, starting with 0. Hashes are unordered collections of scalar values indexed by their associated string key.
Perl, however, does not have a separate data type for numbers, booleans, strings, nulls, undefineds, references to other objects etc.; it has just one type for all of these, the scalar type. 0 is as much a scalar value as "0" is. A scalar variable that was set as a string can really change into a number, and from then on behave differently from "just a string" if it is accessed in a numerical context. A scalar can hold anything in Perl; it is the object itself as it exists in the system. Whereas in Python names just refer to objects, in Perl the scalar values in the names are changeable objects. Furthermore, the object-oriented type system is glued on top of this: there are just 3 datatypes in Perl (scalars, arrays and hashes). A user-defined object in Perl is a reference (that is, a pointer to any of the 3 previous) blessed to a package; you can take any such value and bless it into any class at any instant you want.
Perl even allows you to change the classes of values at whim. This is largely impossible in Python, where to create a value of some class you need to explicitly construct a value belonging to that class with object.__new__ or similar. In Python you cannot really change the essence of an object after its creation (__class__ assignment is only permitted between compatible user-defined classes, and not at all for built-in values such as 42); in Perl you can do pretty much anything:
package Foo;
package Bar;
my $val = 42;
# $val is now a scalar holding the number 42
bless \$val, 'Foo';
# all references to $val now belong to class Foo
my $obj = \$val;
# now $obj refers to the SV stored in $val
# thus this prints: Foo=SCALAR(0x1c7d8c8)
print \$val, "\n";
# all references to $val now belong to class Bar
bless \$val, 'Bar';
# thus this prints Bar=SCALAR(0x1c7d8c8)
print \$val, "\n";
# we change the value stored in $val from number to a string
$val = 'abc';
# yet still the SV is blessed: Bar=SCALAR(0x1c7d8c8)
print \$val, "\n";
# and of course, $obj now refers to a "Bar" even though
# at the time of copying it did refer to a "Foo".
print $obj, "\n";
Thus the type identity is weakly bound to the variable, and it can be changed through any reference on the fly. In fact, if you do
my $another = $val;
\$another does not have the class identity, even though \$val will still give the blessed reference.
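For comparison with the Perl snippet, a minimal Python sketch of the closest equivalent, __class__ assignment. Foo and Bar mirror the Perl package names; the rest is illustrative.

```python
import copy

class Foo:
    pass

class Bar:
    pass

obj = Foo()
alias = obj  # a second reference to the same object, like $obj = \$val

# Python does allow re-classing an instance of a compatible user-defined
# class, and, as with Perl's bless, every reference sees the change.
obj.__class__ = Bar
print(type(alias).__name__)  # Bar

# Unlike `my $another = $val` in Perl, a copy keeps the class, because the
# class belongs to the object itself rather than to a variable.
clone = copy.copy(obj)
print(type(clone).__name__)  # Bar

# Built-in values such as 42 cannot be re-classed at all.
int_reclass_failed = False
val = 42
try:
    val.__class__ = Foo
except TypeError:
    int_reclass_failed = True
print(int_reclass_failed)  # True
```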
TL;DR
There is much more to Perl's weak typing than just automatic coercions: it is more that the types of the values themselves are not set in stone, unlike in Python, which is a dynamically yet very strongly typed language. That Python gives a TypeError on 1 + "1" is an indication that the language is strongly typed, but the contrary behavior of doing something useful, as in Java or C#, does not preclude those languages from being strongly typed.
As many others have expressed, the entire notion of "strong" vs "weak" typing is problematic.
As an archetype, Smalltalk is very strongly typed -- it will always raise an exception if an operation between two objects is incompatible. However, I suspect few on this list would call Smalltalk a strongly-typed language, because it is dynamically typed.
I find the notion of "static" versus "dynamic" typing more useful than "strong" versus "weak". A statically-typed language has all the types figured out at compile time, unless the programmer explicitly declares otherwise.
Contrast with a dynamically-typed language, where typing is performed at run-time. This is typically a requirement for polymorphic languages, so that whether an operation between two objects is legal does not have to be decided by the programmer in advance.
In polymorphic, dynamically-typed languages (like Smalltalk and Ruby), it's more useful to think of a "type" as a "conformance to protocol." If an object obeys a protocol the same way another object does -- even if the two objects do not share any inheritance or mixins or other voodoo -- they are considered the same "type" by the run-time system. More correctly, an object in such systems is autonomous, and can decide if it makes sense to respond to any particular message referring to any particular argument.
Want an object that can make some meaningful response to the message "+" with an object argument that describes the colour blue? You can do that in dynamically-typed languages, but it is a pain in statically-typed languages.
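A minimal sketch of the last two paragraphs in Python, another dynamically typed language. Paint, Colour and the mixing table are hypothetical; the only "type" requirement is the protocol of having a name attribute.

```python
class Colour:
    def __init__(self, name):
        self.name = name

class Paint:
    # Hypothetical mixing table, keyed on unordered pairs of colour names.
    MIXES = {
        frozenset({"yellow", "blue"}): "green",
        frozenset({"red", "blue"}): "purple",
    }

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # Paint and Colour share no base class beyond object; any argument
        # that answers .name with a known colour is acceptable.
        other_name = getattr(other, "name", None)
        mixed = self.MIXES.get(frozenset({self.name, other_name}))
        if mixed is None:
            return NotImplemented  # lets Python raise TypeError as usual
        return Paint(mixed)

print((Paint("yellow") + Colour("blue")).name)  # green
print((Paint("red") + Paint("blue")).name)      # purple
```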
I like Eric Lippert's answer, but to address the question: strongly typed languages typically have explicit knowledge of the types of variables at each point of the program. Weakly typed languages do not, so they can attempt to perform an operation that may not be possible for a particular type.
I think the easiest way to see this is in a function.
C++:
void func(string a) {...}
The variable a is known to be of type string and any incompatible operation will be caught at compile time.
Python:
def func(a):
...
The variable a could be anything and we can have code that calls an invalid method, which will only get caught at runtime.
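Even with Python 3 type hints, the picture is the same at runtime: the annotation documents intent but CPython does not enforce it, so the bad call is caught only when the invalid method is looked up. (A separate static checker such as mypy would flag func(42) before the program runs.)

```python
def func(a: str) -> str:
    # The annotation documents intent; CPython does not check it at call time.
    return a.upper()

print(func("abc"))  # ABC

failed_at_runtime = False
try:
    func(42)  # the call itself is accepted; the error appears at a.upper()
except AttributeError as exc:
    failed_at_runtime = True
    print("runtime error:", exc)
```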
Classes and structs in C# share several characteristics:
they can be instantiated (absent restrictions to the contrary, as with abstract and static classes)
they can contain method and property implementations
the type's author defines the type's instance fields
We often use "class" and "struct" to distinguish between "reference type" and "value type", but sometimes it's useful to consider both types of types. Furthermore, "reference type" also includes interfaces and delegates, which are not classes. So "class" doesn't mean any reference type, it means "a reference _(fill in the blank)_".
For example, if reference and value type declarations were like this:
public sealed class ref String { }
public class val Int32 { }
instead of like this:
public sealed class String { }
public struct Int32 { }
then the word "class" could be used to denote the concept.
The best answer I've come up with here is "concrete type", but that would be confusing, since it could also refer to the non-abstract subclass of an abstract class.
Any suggestions?
EDIT
To clarify, I'm not seeking a word that can collectively describe instances of classes and structs. I'm trying to describe class types and struct types.
In other words, if "class" denotes a set that includes System.String, System.FileInfo, etc., and "struct" denotes a set that includes System.Int32, System.Collections.Generic.List<T>.Enumerator, etc., then I'm looking for the word that denotes the union of those sets.
EDIT 2
(In reaction to Jordão's answer) Another way to answer this question would be to complete the following sentence: "All C# method implementations must be declared as members of a _(fill in the blank)_".
I normally use the term "type" to refer to any of those elements: class, struct and even interfaces and enums.
I never really felt the need to talk about classes and structs exclusively, I would probably just say "class", and then differentiate them as needed.
The term type in C# can refer to any of:
Reference types
    object, dynamic and string
    Class types
    Interface types
    Delegate types
    Constructed class/interface/delegate types (e.g. List<string>)
    Array types
Value types
    Struct types
    Enumeration types
    Simple types (integral types, floating point types, decimal, and bool)
    Nullable types
Pointer types
All of these are terms from the C# specification.
class, interface, delegate, struct and enum types are also called type declarations (or: user-definable types).
Depending on your point of view, you might also consider type parameters and void to be types.
However, there isn't any special term for "classes or structs". In the language of the C# specification, one would say:
All C# method implementations must be declared as members of a class or struct declaration.
I realize this question has a selected answer, but I believe I can offer a fresh insight that will still be helpful:
I think the word you are looking for may be Model. This term is used to mean several different things in CS, but the Wikipedia article for mathematical model describes my intention.
In this context, a model is a description of a system in some meta language. A system can be fully expressed in terms of its three parts: structure; behavior; and, interconnectivity. Both .NET classes and .NET structs are compatible with this definition. Interfaces are not, because the behavior is not defined. You can only indicate the structure of method calls and member declarations and the type contracts for operations (interconnectivity). Enums may or may not be compatible with this definition, but as most frequently used they are not, because they typically do not express behavior. The exceptions are enums for which bitwise operations are sufficient representations of meaningful set operations. With this precondition, I think it's fair to classify an enum along with classes and structs.
As a side note, both interfaces and standard enums could be considered as systems by themselves, if extension methods were interpreted as intrinsic to the types they extend. However, neither the compiler nor I would consider extension methods to be intrinsic to the type of the first operand. A more accurate interpretation would be to consider both the enum/interface and the extension method as necessary components of a system. The difference between these component types that are extended and a class/struct/special-case enum is that the class/struct/special-case enum is a system in-itself, and therefore a subsystem of its containing system, whereas the component type is a component but not a system in itself.
It is probably worthwhile to clarify that, under this interpretation, the term model is analogous to a type, whereas the term system is analogous to an instance. A system could also apply to a larger composite, such as an assembly, but that is not what the question was about.
The statement "All C# method implementations must be declared as members of a model" seems to work. It also does not logically entail that "all models can contain custom method implementations", so we are safe in the special-case of set-theoretic enums. It would also work in the case where the modeled system is the composition of static extension method implementations and interfaces.
I find this to be a pseudo discussion. Point #3 can be said about enums and interfaces too, and the point about subclasses of abstract classes not fitting into the mix, I simply don't get. I think your own suggestion of "concrete types" is ok, but maybe you just want to talk about them as classes and structs, oh wait, but with the exception of subclasses of abstract classes and classes that implement interfaces. The reason that there is no term for what you are looking for might be that it is not a very useful concept in its own right.
EDIT:
All C# method implementations must be declared as members of a class or struct.
It would have to be a Microsoft term. These notions originate in C++, where structs are just classes whose members are public by default. Microsoft came up with new semantics here, mixing elements of C, C++ and Java, so they would also have to invent the terms.
Microsoft denotes them all as "types" which can be "value", "reference" and "pointer": http://msdn.microsoft.com/en-us/library/3ewxz6et(v=vs.100).aspx
But these categories do not contain only structs and classes.
So, if we invent a custom term, we could borrow one from Pascal, for example, where it is "record". Or other terms could be coined from here: http://en.wikipedia.org/wiki/Object_composition
Both classes and structs are types which define objects. They are building blocks within an object oriented programming language. You can model both of them using UML or some other high-level object oriented modelling language. The choice between one or the other is an implementation detail.
I'm looking at OCaml's functors. They look to me pretty much identical to the so-called generic objects in C++/C#/Java. If you ignore Java's type erasure for now, and ignore the implementation details of C++ templates (I'm interested in the language feature), functors are quite identical to generics.
If I understand it correctly, a functor gives you a new set of functions from a type you provide, so that for example
List<MyClass>.GetType() != List<MyOtherClass>.GetType()
But you could roughly rewrite OCaml's
type comparison = Less | Equal | Greater;;
module type ORDERED_TYPE =
  sig
    type t
    val compare: t -> t -> comparison
  end;;
module Set =
  functor (Elt: ORDERED_TYPE) ->
    struct
      type element = Elt.t
      type set = element list
      let empty = []
      let rec add x s =
        match s with
          [] -> [x]
        | hd::tl ->
            match Elt.compare x hd with
              Equal   -> s         (* x is already in s *)
            | Less    -> x :: s    (* x is smaller than all elements of s *)
            | Greater -> hd :: add x tl
      let rec member x s =
        match s with
          [] -> false
        | hd::tl ->
            match Elt.compare x hd with
              Equal   -> true      (* x belongs to s *)
            | Less    -> false     (* x is smaller than all elements of s *)
            | Greater -> member x tl
    end;;
into C#
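using System;
using System.Collections.Generic;

class Set<T> where T : IComparable<T>
{
    private List<T> l = new List<T>();
    // A fresh empty set on each call, instead of one shared mutable instance
    public static Set<T> Empty() { return new Set<T>(); }
    public bool IsMember(T x) { return l.Contains(x); }
    // Like the OCaml add, an element that is already present is not added again
    public void Add(T x) { if (!IsMember(x)) l.Add(x); }
}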
Sure, there's a slight difference, since a functor applies to a module (which is just a bunch of type, function and value definitions, similar to a C# namespace).
But is it just it? Are functors merely generics applied to namespaces? Or is there some significant difference between functors and generics that I'm missing?
Even if functors are just generics-for-namespace, what's the significant advantage of that approach? Classes can also be used as ad-hoc namespaces using nested classes.
But is it just it? Are functors merely generics applied to namespaces?
Yes, I think one can treat functors as "namespaces with generics", and that by itself would be very welcome in C++, where the only option left is to use classes with all-static members, which soon becomes pretty ugly. Compared to C++ templates, one huge advantage is the explicit signature on module parameters (this is what I believe C++0x concepts could have become, but oops).
Also modules are quite different from namespaces (consider multiple structural signatures, abstract and private types).
Even if functors are just generics-for-namespace, what's the significant advantage of that approach? Classes can also be used as ad-hoc namespaces using nested classes.
Not sure whether it qualifies for significant, but namespaces can be opened, while class usage is explicitly qualified.
All in all, I think there is no obvious "significant advantage" of functors alone; it is just a different approach to code modularization (ML style), and it fits nicely with the core language. I'm not sure that comparing the module system apart from the language makes much sense.
PS: C++ templates and C# generics are also quite different from each other, so comparing against them as a whole feels a little strange.
If I understand it correctly, functor gives you a new set of functions from a type you provide
More generally, functors map modules to modules. Your Set example maps a module adhering to the ORDERED_TYPE signature to a module implementing a set. The ORDERED_TYPE signature requires a type and a comparison function. Therefore, your C# is not equivalent because it parameterizes the set only over the type and not over the comparison function. So your C# can only implement one set type for each element type whereas the functor can implement many set modules for each element module, e.g. in ascending and descending order.
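A rough Java analogue of that point, as a minimal sketch: instead of bounding the element type, the set takes the comparison function as a parameter (playing the role of the ORDERED_TYPE argument), so one element type can yield several set "modules", such as ascending and descending. The class name `OrderedListSet` is hypothetical; it mirrors the sorted-list logic of the OCaml `Set` functor rather than any library type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical analogue of the OCaml Set functor: parameterized over both the
// element type T and the comparison function (the ORDERED_TYPE part).
final class OrderedListSet<T> {
    private final Comparator<? super T> cmp;
    private final List<T> elems = new ArrayList<>(); // kept sorted according to cmp

    OrderedListSet(Comparator<? super T> cmp) { this.cmp = cmp; }

    // Same logic as the functor's add: walk the sorted list, skip duplicates.
    void add(T x) {
        for (int i = 0; i < elems.size(); i++) {
            int c = cmp.compare(x, elems.get(i));
            if (c == 0) return;                     // Equal: already present
            if (c < 0) { elems.add(i, x); return; } // Less: insert before this element
        }
        elems.add(x);                               // Greater than every element
    }

    // Same logic as the functor's member: stop early once past x's position.
    boolean member(T x) {
        for (T e : elems) {
            int c = cmp.compare(x, e);
            if (c == 0) return true;
            if (c < 0) return false;
        }
        return false;
    }

    List<T> toList() { return new ArrayList<>(elems); }

    public static void main(String[] args) {
        // Two different "set modules" for the same element type Integer.
        OrderedListSet<Integer> asc = new OrderedListSet<Integer>(Comparator.<Integer>naturalOrder());
        OrderedListSet<Integer> desc = new OrderedListSet<Integer>(Comparator.<Integer>reverseOrder());
        for (int x : new int[] {3, 1, 2, 3}) { asc.add(x); desc.add(x); }
        System.out.println(asc.toList());  // [1, 2, 3]
        System.out.println(desc.toList()); // [3, 2, 1]
    }
}
```

A Java object still bundles only functions, not abstract types, so this captures the "parameterize over the comparison" part of the answer, not the full power of module signatures.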
Even if functors are just generics-for-namespace, what's the significant advantage of that approach?
One major advantage of a higher-order module system is the ability to gradually refine interfaces. In OOP, everything is public or private (or sometimes protected or internal etc.). With modules, you can gradually refine module signatures at will giving more public access closer to the internals of a module and abstracting more and more of it away as you get further from that part of the code. I find that to be a considerable benefit.
Two examples where higher-order module systems shine compared to OOP are parameterizing data structure implementations over each other and building extensible graph libraries. See the section on "Structural abstraction" in Chris Okasaki's PhD thesis for examples of data structures parameterized over other data structures, e.g. a functor that converts a queue into a catenable list. See OCaml's excellent ocamlgraph library and the paper Designing a Generic Graph Library using ML Functors for an example of extensible and reusable graph algorithms using functors.
Classes can also be used as ad-hoc namespaces using nested classes.
In C#, you cannot parameterize classes over other classes. In C++, you can do some things like inheriting from a class passed in via a template.
Also, you can curry functors.
Functors in SML are generative, so the abstract types produced by an application of a functor at one point in a program are not the same as the abstract types produced by the same application (i.e. same functor, same argument) at another point.
For example, in:
structure IntMap1 = MakeMap(Int)
(* ... some other file in some other persons code: *)
structure IntMap2 = MakeMap(Int)
You can't take a map produced by a function in IntMap1 and use it with a function from IntMap2, because IntMap1.t is a different abstract type to IntMap2.t.
In practice this means that if your library has a function producing an IntMap.t, then you must also supply the IntMap structure as part of your library, and if the user of your library wants to use his own (or another library's) IntMap, then he has to convert the values from your IntMap to his IntMap, even though they are already structurally equivalent.
The alternative is to make your library a functor itself, and require the user of the library to apply that functor with their choice of IntMap. This also requires the user of the library to do more work than is ideal, especially when your library uses not only IntMap but also other kinds of Map, various Sets, and so on.
With generics, OTOH, it is quite easy to write a library producing a Map, and have that value work with other libraries' functions that take a Map.
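A minimal Java sketch of that last point, using only the standard `java.util.Map` interface: one hypothetical "library" builds a map, and an independently written one consumes it directly, with no conversion step, because both agree on the same generic type `Map<String, Integer>`. The class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical "library A": produces a word-count map.
final class LibraryA {
    static Map<String, Integer> wordCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : text.split("\\s+")) {
            if (!w.isEmpty()) counts.merge(w, 1, Integer::sum); // add 1 to the current count
        }
        return counts;
    }
}

// Hypothetical "library B": written independently, accepts any Map<String, Integer>.
final class LibraryB {
    static int total(Map<String, Integer> m) {
        int sum = 0;
        for (int v : m.values()) sum += v;
        return sum;
    }
}

final class MapDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = LibraryA.wordCounts("a b a c");
        // B's function takes A's map as-is, and also a different Map implementation.
        System.out.println(LibraryB.total(counts));                                  // 4
        System.out.println(LibraryB.total(new TreeMap<String, Integer>(counts)));    // 4
    }
}
```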
I just found a source that may help you with your problem - as OCaml has a different meaning for functors:
http://books.google.de/books?id=lfTv3iU0p8sC&pg=PA160&lpg=PA160&dq=ocaml+functor&source=bl&ots=vu0sdIB3ja&sig=OhGGcBdaIUR-3-UU05W1VoXQPKc&hl=de&ei=u2e8SqqCNI7R-Qa43IHSCw&sa=X&oi=book_result&ct=result&resnum=9#v=onepage&q=ocaml%20functor&f=false
still - I find it confusing if the same word is used for different concepts.
I don't know if OCaml has a different meaning - but normally a Functor is a "Function object" (see here: http://en.wikipedia.org/wiki/Function_object). This is totally different to generics (see here: http://en.wikipedia.org/wiki/Generic_programming)
A function object is an object that can be used as a function. Generics are a way to parameterize objects. Generics are more or less orthogonal to inheritance (which specializes objects). Generics introduce type safety and reduce the need for casting. Functors are an improved function pointer.
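To make the distinction concrete, here is a small Java sketch (Java's standard `java.util.function.Function` interface is a function-object type): `AddN` is a function object, a value you call, while `applyTwice` is generic, parameterized over a type `T`. Both `AddN` and `applyTwice` are hypothetical names for illustration.

```java
import java.util.function.Function;

final class FunctionObjectDemo {
    // A hand-written function object: an ordinary object whose job is to be "called".
    static final class AddN implements Function<Integer, Integer> {
        private final int n;
        AddN(int n) { this.n = n; }
        @Override public Integer apply(Integer x) { return x + n; }
    }

    // A generic method: T is a type parameter, not a function.
    // It works for any T, as long as the function maps T to T.
    static <T> T applyTwice(Function<T, T> f, T x) {
        return f.apply(f.apply(x));
    }

    public static void main(String[] args) {
        System.out.println(applyTwice(new AddN(3), 10));    // 16
        System.out.println(applyTwice(s -> s + "!", "hi")); // hi!!
    }
}
```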
From Wikipedia:
Generic programming is a style of computer programming in which algorithms are written in terms of to-be-specified-later types that are then instantiated when needed for specific types provided as parameters and was pioneered by Ada which appeared in 1983. This approach permits writing common functions or types that differ only in the set of types on which they operate when used, thus reducing duplication.
Generics provide the ability to define types that are specified later. You don't have to cast items to a type to use them because they are already typed.
Why do C# and VB have generics? What benefit do they provide? What benefits do you find in using them?
What other languages also have generics?
C# and VB have generics to take advantage of generics support in the underlying CLR (or is it the other way around?). They allow you to write code in a statically typed language that can apply to more than one kind of type without rewriting the code for each type you use them for (the runtime will do that for you), or otherwise using System.Object and casting everywhere (like we had to do with ArrayList).
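Java went through the same transition, so a small Java sketch can show the contrast: with a raw (pre-generics) list, every read is a cast and a wrong element is only caught at run time, while with `List<String>` the compiler rejects it and no cast is needed. The method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CastDemo {
    // Pre-generics style: a raw list holds Object, so every read needs a cast.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int sumLengthsRaw(List raw) {
        int total = 0;
        for (Object o : raw) {
            total += ((String) o).length(); // ClassCastException if o is not a String
        }
        return total;
    }

    // Generic style: the element type is part of the list's type; no cast needed.
    static int sumLengths(List<String> words) {
        int total = 0;
        for (String w : words) total += w.length();
        return total;
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        words.add("foo");
        words.add("ba");
        // words.add(42);  // would not compile: 42 is not a String
        System.out.println(sumLengths(words)); // 5

        List raw = new ArrayList();
        raw.add("foo");
        raw.add(42); // compiles; the mistake surfaces only at run time
        try {
            sumLengthsRaw(raw);
        } catch (ClassCastException e) {
            System.out.println("caught at run time: " + e.getClass().getSimpleName());
        }
    }
}
```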
Did you read the article?
These languages also have generics:
C++ (via templates)
Ada (via generic units)
Eiffel
D (via templates)
Haskell
Java
Personally, I think they save a lot of time. I'm still using .NET Framework 1.1, and every time you want a specific collection, you need to create a strongly typed collection by deriving from CollectionBase. With generics, you just declare your collection as List<MyObject> and it's done.
Consider these method signatures:
//Old and busted
public abstract class Enum
{
public static object Parse(Type enumType, string value);
}
//To call it:
MyEnum x = (MyEnum) Enum.Parse(typeof(MyEnum), someString);
//New and groovy
public abstract class Enum
{
public static T Parse<T>(string value);
}
//To call it:
MyEnum x = Enum.Parse<MyEnum>(someString);
Look ma: No runtime type manipulation.
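For comparison, Java's standard library already has a method of the "new and groovy" shape: `Enum.valueOf(Class<T>, String)` is generic and returns `T`, so the caller gets a typed result without a cast. This is Java's API, not the C# signature sketched above; the `Color` enum and the `parse` wrapper are hypothetical.

```java
enum Color { RED, GREEN, BLUE }

final class EnumParseDemo {
    // Same shape as the "New and groovy" signature: name the enum type once,
    // get a typed value back. Throws IllegalArgumentException for unknown names.
    static <T extends Enum<T>> T parse(Class<T> type, String value) {
        return Enum.valueOf(type, value);
    }

    public static void main(String[] args) {
        Color c = parse(Color.class, "GREEN");      // no cast needed
        System.out.println(c);                      // GREEN
        System.out.println(Color.valueOf("BLUE"));  // each enum also gets a typed valueOf(String)
    }
}
```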
From MSDN:
Generics provide the solution to a limitation in earlier versions of the common language runtime and the C# language in which generalization is accomplished by casting types to and from the universal base type Object. By creating a generic class, you can create a collection that is type-safe at compile-time.
Read the rest of that article to see some examples of how Generics can improve the readability and performance of your code.
Probably the most common use for them is having strongly typed ArrayLists. In .NET 1.1, you'd either have to cast everything from object to your desired Type, or use something like CodeSmith to generate a strongly typed ArrayList.
Additionally, they help decrease boxing. Again, in .NET 1.x, if you tried to use an ArrayList with a Value Type, you'd end up boxing and unboxing the objects all over the place. Generics avoid that by letting you define the Type, whether Reference or Value.
There are other handy uses for them too: event handlers, LINQ queries, etc.
Generics in .NET are excellent for object collections. You can define your object type however you want and be able to have, say, a List<T> without writing any code for it, with access to all the efficient functionality of the .NET List<T> generic collection while being type-safe with respect to T. It's great stuff.
Generics are built on the concept of templates in C++, if you are familiar with them.
It's a way to implement an algorithm or data structure while delaying the choice of the actual type it is used on.
A List<T> can then be instantiated with any type of your choice: int, string and even custom types; the type is fixed when the list is constructed. But you will still be able to use the list operations Add, Remove, etc.
You can really save a lot of coding effort by getting used to generics. And you don't have to box and unbox between types.
Java has generics as well; wildcards (such as List<? extends Number>) are one of their features, not their name.
Generics in .NET, like inheritance and extension methods, allow you to reduce code duplication. Let me explain by way of refactoring.
If all classes with a common ancestor have a common method, place the common method in the classes' common ancestor (inheritance).
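A short Java sketch of what a wildcard looks like inside a generic signature: `List<? extends Number>` lets one method accept a `List<Integer>` or a `List<Double>`, which a plain `List<Number>` parameter would not. The `sum` method is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.List;

final class WildcardDemo {
    // "? extends Number": any list whose element type is Number or a subtype of it.
    static double sum(List<? extends Number> xs) {
        double total = 0;
        for (Number n : xs) total += n.doubleValue();
        return total;
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Double> doubles = Arrays.asList(0.5, 1.5);
        System.out.println(sum(ints));    // 6.0
        System.out.println(sum(doubles)); // 2.0
        // With a List<Number> parameter, sum(ints) would not compile,
        // because List<Integer> is not a subtype of List<Number>.
    }
}
```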
If some classes have a common method that uses a public contract to achieve some result, make the common method into an extension method on that public contract.
If several methods or classes have the same code that differs only by the types acted upon (especially where the details of the type are not relevant to the operation of the method), collect those methods or classes into a generic.
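A minimal Java sketch of that third refactoring step: two hypothetical methods that differ only in the type they act on are collapsed into one generic method. The bound `T extends Comparable<? super T>` states the only thing the method needs to know about `T`.

```java
import java.util.Arrays;
import java.util.List;

final class RefactorDemo {
    // Before: duplicated code that differs only in the element type.
    static int maxInt(List<Integer> xs) {
        int best = xs.get(0);
        for (int x : xs) if (x > best) best = x;
        return best;
    }

    static String maxString(List<String> xs) {
        String best = xs.get(0);
        for (String x : xs) if (x.compareTo(best) > 0) best = x;
        return best;
    }

    // After: one generic method; T only needs to be comparable with itself.
    // Like the originals, it assumes a non-empty list.
    static <T extends Comparable<? super T>> T max(List<T> xs) {
        T best = xs.get(0);
        for (T x : xs) if (x.compareTo(best) > 0) best = x;
        return best;
    }

    public static void main(String[] args) {
        System.out.println(max(Arrays.asList(3, 9, 4)));                 // 9
        System.out.println(max(Arrays.asList("pear", "apple", "plum"))); // plum
    }
}
```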
They increase performance for collections using value types, since no boxing/unboxing will be required. They're a lot cleaner to use since you won't have to cast an object (for example using ArrayList) to the desired type - and likewise they help enforce type safety.
Biggest advantage of generics over non generic types in C# (not Java, Java is a different story) is that they are much faster. The JIT generates the best machine code it can come up with for a given type. List<int> is actually a list of ints and not integer objects wrapping an int. This makes generic types awesomely fast and also type safe which can help you detect an awesome lot of errors at compile time :)
The common example is collections, e.g. a set of type T has an Add(T) method and a T Get() method. Same code, different type-safe collections.
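A minimal Java sketch of that common example: one generic class, written once, yields separately type-checked collections. `SimpleSet` and its index-based `get(int)` accessor are hypothetical names, backed by `java.util.ArrayList` for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal set: same code, but SimpleSet<String> and
// SimpleSet<Integer> are checked separately by the compiler.
final class SimpleSet<T> {
    private final List<T> items = new ArrayList<>();

    // Adds x unless an equal element is already present; returns whether it was added.
    boolean add(T x) {
        if (items.contains(x)) return false;
        items.add(x);
        return true;
    }

    T get(int index) { return items.get(index); } // typed result, no cast

    int size() { return items.size(); }

    public static void main(String[] args) {
        SimpleSet<String> names = new SimpleSet<>();
        names.add("ada");
        names.add("ada");                                // duplicate, ignored
        String first = names.get(0);                     // already a String
        System.out.println(first + " " + names.size());  // ada 1

        SimpleSet<Integer> ids = new SimpleSet<>();
        ids.add(7);
        // ids.add("seven");  // would not compile: ids only accepts Integer
        System.out.println(ids.get(0) + 1);              // 8
    }
}
```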
C++, D and others have templates, a superset of generics that works a little differently but gets the same end result (and then some); Ada's equivalent is its generic units.
IIRC Java has generics, but I don't do Java.
The easiest way to explain it is to give an example. Say you want two hashtables, one that maps objects of type string to type int and one that maps objects of type string to type double. You could define Hashtable<K, V> and then use the K and V types. Without generics, you'd have to use the 'object' type, which, in addition to having to be cast to be meaningful, gives up type safety. Just instantiate Hashtable<string, int> and Hashtable<string, double> and you've got your hash tables with proper type checking and all.
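The same example in Java, whose `java.util.Hashtable<K, V>` is a real generic class: the two instantiations share one implementation but are type-checked separately. (In modern Java, `HashMap` is the more usual choice; `Hashtable` is used here only to match the example.)

```java
import java.util.Hashtable;

final class HashtableDemo {
    public static void main(String[] args) {
        // One generic class, two instantiations with different value types.
        Hashtable<String, Integer> ages = new Hashtable<>();
        Hashtable<String, Double> heights = new Hashtable<>();

        ages.put("alice", 30);
        heights.put("alice", 1.70);

        int age = ages.get("alice");          // no cast needed
        double height = heights.get("alice"); // no cast needed
        // ages.put("bob", 1.85);             // would not compile: value must be an Integer

        System.out.println(age + " " + height); // 30 1.7
    }
}
```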
Java also has generics. C++ has templates.
Dynamic languages like Perl and Javascript don't have the same type restrictions so they get mostly the same benefits with less work.
In Objective-C you can use protocols to achieve the aims of generics. Since the language is weakly typed, however, it's generally not as much of a concern as when you are fighting the type system to use one code path for many types.
Personally I am a huge fan of generics because of all of the code I don't have to write.
What is Inversion of Control?