My good friends Claudio Russo and Andrew Kennedy have been kind enough to send me a draft paper about Generalized Algebraic Data Types (GADTs) and Object-Oriented Programming. GADTs generalize the datatypes of ML and Haskell by permitting constructors to produce different type-instantiations of the same datatype. One of Andrew and Claudio’s examples is a slightly modified version of Peter Sestoft’s example of representing abstract syntax trees for typed expressions in Generic C#. To get a better feeling of the difference between the generics in C#/Java and the generics in C++ I have rewritten the the example in C++.
To recap the example: we want to represent abstract syntax trees for a tiny language of simple expressions. Futhermore, we want representation to be type safe. That is, we do not want to be able to represent the nonsense expression “true + 42
“. We shall represent the abstract syntax trees the standard OO way with an abstract base class Exp
for the general type of expressions and the concreate subclasses for each node type. To enforce the type safety of our embedded language we use generics. That is, a value with type Exp<R>
is an abstract syntax tree for an expression that evaluates to a value of type R
.
First the base class.
Peter’s Generic C# version:
abstract class Exp<R> { // R = result type abstract public R eval(); }
C++ version:
template< typename R > // R = result type class Exp { public: virtual R eval() const = 0; virtual ~Exp() {} };
Note that in C++ we need to remember to define a virtual destructor because we plan to inherit from Exp
and we also want to be able to dynamically allocate subclasses of Exp
.
Next up is the class for literals.
Peter’s Generic C# version:
class Lit<R> : Exp<R> { private readonly R v; public Lit(R v) { this.v = v; } public override R eval() { return v; } }
C++ version:
template< typename R > class Lit : public Exp<R> { public: virtual R eval() const { return val; } explicit Lit(R val_) : val(val_) {} private: R const val; };
No surprises here.
But for binary expressions like plus, for instances, it starts to get a bit more tricky. Here I’ll use a simpler version of the class for representing plus nodes than the version Peter uses.
Generic C# version:
class Plus : Exp<int> { private readonly Exp<int> lhs, rhs; public Plus(Exp<int> lhs, Exp<int> rhs) { this.lhs = lhs; this.rhs = rhs; } public override int eval() { return lhs.eval() + rhs.eval(); } }
First cut at a C++ version:
class Plus : public Exp<int> { public: virtual int eval() const { return lhs.eval() + rhs.eval(); } explicit Plus(Exp<int> const lhs_, Exp<int> const rhs_) : lhs(lhs_), rhs(rhs_) {} private: Exp<int> lhs, rhs; };
Alas, this is not valid C++ because we are not allowed to declare fields and parameters of type Exp<int>
because this class has an abstract virtual function, namely eval
(and besides if we could, for example by modifying the class Exp
, the code would not do what we expects). These problems has nothing to do with the generics in C++, they are just the usual C++ OO quirks. And the solution is straightforward: we must use a pointer to values of type Exp
. But then we have to make decisions about ownership: should we make a deep copy of subexpressions? should we share subexpressions? and if so who should take take of freeing resources? should we use ref counting? or what? Here we shall just use one of the the wonderful smart pointers from Boost: shared_ptr
which will take care of the refcounting and deletion of expressions. Thus, first we make a couple of convenient typedefs for integer expressions and shared pointers to integer expressions (and similar for boolean expressions):
typedef Exp< int > IntExp; typedef boost::shared_ptr< IntExp > IntExpPtr;
Then the class for plus nodes looks like this:
class Plus : public IntExp { public: virtual int eval() const { return lhs->eval() + rhs->eval(); } explicit Plus(IntExpPtr const & lhs_, IntExpPtr const & rhs_) : lhs(lhs_), rhs(rhs_) {} private: IntExpPtr lhs, rhs; };
and a wrapper for constructing shared pointers for new plus expressions:
IntExpPtr plus(IntExpPtr const & lhs, IntExpPtr const & rhs) { return IntExpPtr( new Plus( lhs, rhs ) ); }
Similar to the Plus
class we can define a class for representing condtional expressions:
Generic C# version:
class Cond<R> : Exp<R> { private readonly E<bool> cond; private readonly E<R> truth, falsehood; public Cond(E<bool> cond, E<R> truth, E<R> falsehood) { this.cond = cond; this.truth = truth; this.falsehood = falsehood; } public override R eval() { return cond.eval() ? truth.eval() : falsehood.eval(); } }
C++ version:
template < typename R > class Cond : public Exp<R> { public: typedef boost::shared_ptr< Exp<R> > SubExpPtr; virtual R eval() const { return cond->eval() ? truth->eval() : falsehood->eval(); } explicit Cond(BoolExpPtr const & cond_, SubExpPtr const & truth_, SubExpPtr const & falsehood_) : cond(cond_), truth(truth_), falsehood(falsehood_) {} private: BoolExpPtr cond; SubExpPtr truth, falsehood; };
The interesting things to note about this class are that Cond
contains subexpressions of different types, that Cond
is polymorphic in the result type, that Cond
enforces that the two branches of a condtional expression must have the same type, and the only one of the branchs is evaluated based on the evaluation of the condition.
Using these classes we can define a function that builds an expression for calculating the n‘th fibonacci number:
IntExpPtr fib(int n) { IntExpPtr fib0( lit(0) ), fib1( lit(1) ); switch ( n ) { case 0: return fib0; case 1: return fib1; default: for(int i = 1; i != n; ++i) { IntExpPtr tmp( fib1 ); fib1 = plus( fib0, fib1 ); fib0 = tmp; } return fib1; } }
This function builds an expression that takes linear, O(n), space, but it will use exponential time, O(2n), to be evaluated.
Conclusion
Non-surprising this GADT example can be translated more or less straight forward from generinc C# to C++. The things that are a bit tedious are just the normal C++ oddities and has nothing to do with generics. In fact, we would have had the exact same problems without generics.
Next up is to try and translate a few more of Claudio and Andrews examples. The statically typed printf
and the typed LR parsing looks nifty.
Do you know about the Boost variant library?
http://www.boost.org/doc/html/variant.html
How does this compare to your solution?
Also, how does this compare to AST:
http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/ast/index.html
This is a Domain Specific Language which generates ASTs in C++.
Yes, I know about the Boost variant library. I think the best way to compare my solution with a varient is to look at the Recursive variant types. The variant library coresponds roughly to ML and Haskell datatypes in C++. That is, if you remove all the template stuff from my solution then you get more or less the idiom that variant package into a library. Furthermore, variant makes a different tradeoff with respect to extensability of the datatype and certain forms of type safety at the C++ level (see my entry on the visitor pattern for a bit more discussion of the tradeoffs). But the variant example of embedding expressions does not enforce the type-safty of the embedded expressions. But I think you could use my technique (which is really Andrew, Claudio, and Peter’s technique) with the variant library. That would be an interesting experiment. Thus, I think that my technique complements the variant library.
With respect to AST (the language), as you say it is a DSL for generationg AST classes in C++, whereas my solution is a pattern/style/idiom for writting ASTs (or DSELs) manually. But if you look at the C++ classes generated by AST then they use a different (ugly) idiom. Also, I don’t think that AST will enforce type-safety of the embedded language. But AST could be changed to generate classes using my technique.
Hope this makes my writing more clear. 🙂
I agree that trying to implement this using the Boost variant is an interesting thought.
But the question on my mind is: Is this really new? I would have thought that one of the numerous expression template systems like
http://exmat.sourceforge.net/
http://research.microsoft.com/projects/greta/
http://www.boost.org/libs/numeric/ublas/doc/index.htm
http://www.oonumerics.org/blitz/
http://acts.nersc.gov/events/PI1999/pete/
etc. etc. would have compile time type safety for expressions already?
Oh, but I don’t claim to have done anything new with my embedding of the tiny expression language. And, as you say, somebodyelse have properly done something like this already. At least the matrix libraries that checks dimentions at compile time must have something similar. However, what is new is that Claudio and Andrew present a general and structured way of handling the embedding of the static semantics of a DSEL (at least they present one of the needed pieces).
In fact, I’m not really sure that the presented embedding of the expression language is really useful. Because often you want to be able to construct nonsense expressions, for example in a parser. But that example had the advantage that it was inspired by a publically available example by Peter Sestoft.
You could have used references instead of shared pointers, especially since you have immutable classes.
class Plus : public Exp {
public:
virtual int eval() const {
return lhs.eval() + rhs.eval();
}
explicit Plus(Exp const &lhs_, Exp const &rhs_)
: lhs(lhs_), rhs(rhs_)
{}
private:
const Exp &lhs, &rhs;
};