Hi Stan,
Type inference usually gets tricky once you add subtyping and overloading. Dafny has both, in certain restricted ways. Overloading, because built-in operators like "+" in A + B can mean arithmetic addition, set union, multiset union, map union, or sequence concatenation, depending on the types of A and B. Subtyping, because "object" is a supertype of all reference types and because "nat" is a subrange type of "int". Because of these restrictions, it may still be possible to define an algorithm for type inference.
In the following, I describe three issues with special ties to Dafny. (They are not necessarily problems, just things to think about.)
One commonly encountered problem in Dafny that an improved type inference algorithm would solve is this. Suppose "Repr" is a field of type "set<object>" and you want to do:
Repr := {this} + {this.p};
or:
Repr := {this, this.p};
where "p" is of some reference type other than that of "this". Today, Dafny gives a typechecking error, forcing you to rewrite this as:
Repr := {this};
Repr := Repr + {this.p};
The rewrite works, because when one operand of + is of type set<object>, Dafny is happy to coerce the other operand to set<object> as well. It would be nice if the programmer didn't have to do this rewrite manually.
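To make the example self-contained, here is a sketch of the kind of class where this comes up (the class, method, and parameter names are invented for illustration, and later versions of Dafny may well accept the first form directly):

```dafny
class Cell { }

class Node {
  var Repr: set<object>

  method Init(p: Cell)
    modifies this
  {
    // Desired, but rejected by the typechecker as described above:
    //   Repr := {this, p};
    // Accepted workaround: start with a set<object>, then extend it:
    Repr := {this};
    Repr := Repr + {p};
  }
}
```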
In the example I just gave, it is clear what we want the outcome to be. That's not always as clear, as I explain in this next issue. Suppose F is a function that takes and returns a "nat":
function F(n: nat): nat
Consider a statement:
var x := F(15);
What should be the inferred type of "x"? "nat" is a possibility, but what if the code continues with:
x := x - F(x);
In a language where static checking ends with type checking, this might be a problem to which one would apply an elaborate and complicated solution, trying to figure out when "x" can stay a "nat" and when it instead has to be an "int". In Dafny, it seems one can get away with something much simpler, namely: infer the type to be the coarser-grained "base type" "int" and check any further int-to-nat coercion by the (static) verifier. Hence, in this case, there seems to be no harm in inferring just base types.
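Concretely, that scheme would play out roughly like this (a sketch; the last line is there only to show where a verification condition would arise):

```dafny
function F(n: nat): nat

method Example()
{
  var x := F(15);   // infer x: int, the base type of nat
  x := x - F(x);    // fine for an int; the call F(x) generates the check 0 <= x
  // any later use of x at type nat becomes a proof obligation for the verifier:
  var y: nat := x;  // the verifier must prove 0 <= x here
}
```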
In this regard, subranges seem different from the subtyping that "object" gives rise to, because if "c" is a reference of class type "C", then you would want "{c}" to be of type set<C>; you wouldn't want the type inference algorithm to leap to the "base" type and infer set<object>.
Note also that the coercion-checked-by-the-verifier trick would not always be pleasant. For example, a coercion from List<int> to List<nat>, where List<T> is the standard datatype for lists of T's, may require an inductive proof. And a coercion from List<C<int>> to List<C<nat>>, where C<T> is some class with fields of type T, would require a proof that depends on the state of the heap (and, in fact, would be unsafe in the presence of changes to the heap). Heap-dependent coercions should probably be forbidden; the others could be allowed, but would require some engineering of the verifier in order to set up provable proof obligations. (Note that the List<int>-to-List<nat> coercion does not look very appealing for run-time checking either, in languages where that is of concern.)
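For concreteness, here is a sketch of what the List<int>-to-List<nat> direction might look like (the datatype is the standard cons-list, but the particular predicate and function names are invented):

```dafny
datatype List<T> = Nil | Cons(head: T, tail: List<T>)

// elementwise property that justifies viewing a List<int> as a List<nat>
predicate AllNonneg(xs: List<int>) {
  match xs
  case Nil => true
  case Cons(h, t) => 0 <= h && AllNonneg(t)
}

// the coercion itself, by structural recursion; establishing AllNonneg for a
// particular list is where an inductive proof may be needed
function ToNatList(xs: List<int>): List<nat>
  requires AllNonneg(xs)
{
  match xs
  case Nil => Nil
  case Cons(h, t) => Cons(h, ToNatList(t))
}
```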
The third issue concerns quantifications. Consider an expression
forall x :: F(x) < 100
In order for this expression to be well-formed, "x" must be a "nat" (that is, "x" must be an integer at least 0). If the type of "x" is inferred as "int", then the verifier would complain about a violated precondition for F. That's probably how type inference should behave. Unfortunately, it means the programmer has to manually write:
forall x :: 0 <= x ==> F(x) < 100
or
forall x: nat :: F(x) < 100
Note that for bound variables of quantifiers, it really matters what the inference does: the more values the types allow, the stronger the universal quantifier is! For example, although "nat" is a possible typing of:
forall n :: n <= 2*n
the quantifier is "true" with "n: nat" but "false" with "n: int".
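Both directions can be seen in the verifier itself (a sketch; the commented-out assertion is the one the verifier would reject):

```dafny
lemma QuantifierStrength()
{
  assert forall n: nat :: n <= 2 * n;     // holds, since 0 <= n
  // assert forall n: int :: n <= 2 * n;  // fails: n == -1 is a counterexample
}
```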
Rustan
