Accessible Functional Programming
Apr. 18th, 2013 03:05 pm In theory I should be writing my compiler right now, but I just got a new keyboard and am typing at about quarter-speed, so this is a great chance to write down some stuff that's been bouncing around my head lately and get used to my new keyboard, all at once. Not that this will help me with programming-style typing, but oh well.
Fair warning: this is pretty rambly! I just wanted to get these words out of my head and onto something more concrete; a more coherent version will come when I actually start working on this
It is a tragedy that Python is as popular as it is. I believe this in the usual sense, of course---its semantics are terrible---but that's not what I want to emphasize here. I think a lot of the PL community, myself included, is to blame for this. Our languages are not accessible; Python is. While I think it should be possible for most programmers to learn how to write good functional code in ML or Haskell or whatever, I also don't think it should be mandatory.
From talking to people who like Python, the feature I hear mentioned the most is how easy it is to use. While ML and Haskell are comparable to Python in terms of the length of a Hello, world program, presumably (unfortunately, the people I'm talking to don't also know ML or Haskell) the ease of use is diminished somewhat in more complicated programs.
We understand semantics well, yes; that's kind of our job. But programming is a populist skill, to some extent, not limited to people with the time and inclination to learn about monads and category theory and parentheses, and requiring that people sit down and learn these things, instead of presenting languages that use them in an intuitive way, is narrowminded and exclusionary. Programming may not be the most important thing in people's hierarchy of needs, but everything I've been learning and acting on lately in the rest of my life compels me, as someone who hopes to be an expert in the field, to make sure everyone can benefit from my work and the work of my colleagues, not just those of us with graduate degrees.
This is really the driving force behind this post; everything else is details. I'm sure I'm leaving things out, too---I'm using "accessible" here in a very broad sense, but it's also important to make sure languages are accessible to people with disabilities, which is an area in which I am woefully ignorant. My ultimate goal is to have a good, theoretically sound language where I could walk up to anyone in the world and say "If you want to program, you should use <foo>", and not deserve to have an angry mob chase me off. Right now, Python is, I think, the best candidate for that. Certainly all of my favorite languages (SML, Haskell, Agda), while theoretically well-founded, fail the angry mob test. This is a problem with the languages, not with the people who don't want to use them.
As someone who is relatively new to Haskell, I can see a lot of areas in it that I would never, ever want to explain to a new programmer, especially not one who is new to functional programming. In particular, the syntax for composing monadic functions, littered with <$> and <*> in the best case, is atrocious. do-notation is a step towards usability, but does not go anywhere near far enough to make it accessible. Thinking about the types of monadic code and how they compose is also a nuisance, and while I am in favor of keeping track of effects in the type system, I believe the interface to this can be vastly improved.
Lately, I have been thinking of how to fix these things. My ultimate goal is an accessible programming language with a good, ideally somewhat clean, semantics. I haven't defined "accessible" yet, and for good reason---I'm not sure what all it should encompass yet. A few criteria, though, are:
- Lightweight, unobtrusive types
- Clean syntax (no System.out.whatever nonsense, and as few reserved keywords as possible)
- Intuitive semantics. Calling effectful code should be easy. Doing effectful things should be easy. Doing all of these in a SAFE way should, of course, be easy. I should be able to write a for loop or a while loop without much effort expended. Haskell fails all of these.
Python passes all of these criteria (even the type system one---its single type is unobtrusive!), but if I were okay with python I wouldn't be writing this post, or in PL grad school, for that matter. So we also want some criteria for the semantic side of things, with design informed by PL research and experience with languages like Haskell and ML.
- Expressive module system (I come from ML land, and am therefore morally obligated to put this first)
- Expressive type system---I still believe that it is important for effects to be captured in the types of code, for example.
- Operator overloading, presumably with typeclasses
- First-class proofs of correctness. I'm not quite sure how this will fit in, if at all, but it would be nice to have dependent types of some sort baked in to the language so you could prove things about the function you just wrote, if you want to.
At first glance, these seem irreconcilable, but I believe they are not. There is already a lot of work being done to make type systems less intrusive, for example---gradual types approach this from one angle, and bidirectional typechecking from another. It should be possible for the user to customize the amount of static checking they want---a compiler can easily provide passive-aggressive warnings when things are lower than a certain threshold, but there is no reason not to provide users with flexibility there. Ideally, the benefits of types will become self-evident when people can compare the security, correctness, and speed of typed and untyped versions of the same code for themselves.
I also believe that effects can be handled much more intuitively than is currently done in haskell. Every time I have to write "do x' <- x ; case x' of ..." or "f <$> g x <*> g y <*> ..." I die a little inside. I assume someone, somewhere has done research into streamlining this, perhaps by overloading function application or doing some fancy elaboration during typechecking? I haven't looked into this at all, but it seems like an obvious avenue of inquiry.
Programming with the state and reader monads is also a nuisance due to the anonymous nature of their environments. I think we can solve two problems at once here by explicitly declaring the environments that these things refer to: this will remove the need for obtuse combinators such as put, get, and ask, while still providing the same information as the monadic types, just in a different location. For example, instead of writing
incr :: State Integer
incr = do
i <- get
put $ i + 1
get
we could write
state i : Integer
def incr () =
i = i + 1
return i
or something along those lines. This places an added burden on the compiler, but as long as the problem is decidable, this is exactly where the burden should be, rather than the programmer. In an ideal world we would even be able to infer the appropriate monad transformer stacks (one of the most terrifying parts of writing Haskell code), at least in a limited sense, though I don't know how feasible this is in the general case or even a restricted case. If there's no work on this out there already, though, this again seems like an important avenue for research.
Another issue is code verification. Not even Haskell does a good job on this front, since type classes are subject to extralingual laws, but in a language that permits dependent types, this can be fixed. I don't know how well these would compose, but it would be nice to have a built-in layer that allows some forms of verification from within the language itself. The compiler could provide a series of missing proof obligations that would appear in warnings, so the user can choose what level of proof they want to apply. Most people won't care, for example, about proving that their monoids are actual monoids, or that their call to foldr could be turned into a faster call to foldl, but for production code, or critical code, or whatever, it adds an extra level of assurance that no number of code reviews can provide.
Of course, to do any of this, it will take a lot of time, and a lot of libraries, especially since a lot of the effectful stuff will need to be special-cased. Having an FFI can probably help a bit on the library front, though I'm not sure what the FFI should be to---C seems like the obvious choice, but that seems like it would also encourage writing and using a ton of unsafe code, so I'm not sure where to fall on that. Libraries aren't going to write themselves, though.
Anyway, this is all very preliminary. Over the summer I hope to get started on writing some of this as a side project; it will probably proceed at a glacial pace, especially since I am not very familiar with implementing many of these things, but I think a decent first milestone would be a hybrid functional-imperative language with accessible syntax, but that elaborates into a Haskell-like pure monadic backend. We'll see. If anyone knows of any results on any of the things I mentioned, let me know; I'm guessing there's at least one grotesquely undecideable problem up there somewhere. I want to have a high-level functional language that you don't have to be a type- or category-theorist to use, though, and nothing that currently exists that I'm aware of meets these requirements. Of course, I am a type- and category-theorist, so the implementation will also be a great chance for me to learn Idris!
Fair warning: this is pretty rambly! I just wanted to get these words out of my head and onto something more concrete; a more coherent version will come when I actually start working on this
It is a tragedy that Python is as popular as it is. I believe this in the usual sense, of course---its semantics are terrible---but that's not what I want to emphasize here. I think a lot of the PL community, myself included, is to blame for this. Our languages are not accessible; Python is. While I think it should be possible for most programmers to learn how to write good functional code in ML or Haskell or whatever, I also don't think it should be mandatory.
From talking to people who like Python, the feature I hear mentioned the most is how easy it is to use. While ML and Haskell are comparable to Python in terms of the length of a Hello, world program, presumably (unfortunately, the people I'm talking to don't also know ML or Haskell) the ease of use is diminished somewhat in more complicated programs.
We understand semantics well, yes; that's kind of our job. But programming is a populist skill, to some extent, not limited to people with the time and inclination to learn about monads and category theory and parentheses, and requiring that people sit down and learn these things, instead of presenting languages that use them in an intuitive way, is narrowminded and exclusionary. Programming may not be the most important thing in people's hierarchy of needs, but everything I've been learning and acting on lately in the rest of my life compels me, as someone who hopes to be an expert in the field, to make sure everyone can benefit from my work and the work of my colleagues, not just those of us with graduate degrees.
This is really the driving force behind this post; everything else is details. I'm sure I'm leaving things out, too---I'm using "accessible" here in a very broad sense, but it's also important to make sure languages are accessible to people with disabilities, which is an area in which I am woefully ignorant. My ultimate goal is to have a good, theoretically sound language where I could walk up to anyone in the world and say "If you want to program, you should use <foo>", and not deserve to have an angry mob chase me off. Right now, Python is, I think, the best candidate for that. Certainly all of my favorite languages (SML, Haskell, Agda), while theoretically well-founded, fail the angry mob test. This is a problem with the languages, not with the people who don't want to use them.
As someone who is relatively new to Haskell, I can see a lot of areas in it that I would never, ever want to explain to a new programmer, especially not one who is new to functional programming. In particular, the syntax for composing monadic functions, littered with <$> and <*> in the best case, is atrocious. do-notation is a step towards usability, but does not go anywhere near far enough to make it accessible. Thinking about the types of monadic code and how they compose is also a nuisance, and while I am in favor of keeping track of effects in the type system, I believe the interface to this can be vastly improved.
Lately, I have been thinking of how to fix these things. My ultimate goal is an accessible programming language with a good, ideally somewhat clean, semantics. I haven't defined "accessible" yet, and for good reason---I'm not sure what all it should encompass yet. A few criteria, though, are:
- Lightweight, unobtrusive types
- Clean syntax (no System.out.whatever nonsense, and as few reserved keywords as possible)
- Intuitive semantics. Calling effectful code should be easy. Doing effectful things should be easy. Doing all of these in a SAFE way should, of course, be easy. I should be able to write a for loop or a while loop without much effort expended. Haskell fails all of these.
Python passes all of these criteria (even the type system one---its single type is unobtrusive!), but if I were okay with python I wouldn't be writing this post, or in PL grad school, for that matter. So we also want some criteria for the semantic side of things, with design informed by PL research and experience with languages like Haskell and ML.
- Expressive module system (I come from ML land, and am therefore morally obligated to put this first)
- Expressive type system---I still believe that it is important for effects to be captured in the types of code, for example.
- Operator overloading, presumably with typeclasses
- First-class proofs of correctness. I'm not quite sure how this will fit in, if at all, but it would be nice to have dependent types of some sort baked in to the language so you could prove things about the function you just wrote, if you want to.
At first glance, these seem irreconcilable, but I believe they are not. There is already a lot of work being done to make type systems less intrusive, for example---gradual types approach this from one angle, and bidirectional typechecking from another. It should be possible for the user to customize the amount of static checking they want---a compiler can easily provide passive-aggressive warnings when things are lower than a certain threshold, but there is no reason not to provide users with flexibility there. Ideally, the benefits of types will become self-evident when people can compare the security, correctness, and speed of typed and untyped versions of the same code for themselves.
I also believe that effects can be handled much more intuitively than is currently done in haskell. Every time I have to write "do x' <- x ; case x' of ..." or "f <$> g x <*> g y <*> ..." I die a little inside. I assume someone, somewhere has done research into streamlining this, perhaps by overloading function application or doing some fancy elaboration during typechecking? I haven't looked into this at all, but it seems like an obvious avenue of inquiry.
Programming with the state and reader monads is also a nuisance due to the anonymous nature of their environments. I think we can solve two problems at once here by explicitly declaring the environments that these things refer to: this will remove the need for obtuse combinators such as put, get, and ask, while still providing the same information as the monadic types, just in a different location. For example, instead of writing
incr :: State Integer
incr = do
i <- get
put $ i + 1
get
we could write
state i : Integer
def incr () =
i = i + 1
return i
or something along those lines. This places an added burden on the compiler, but as long as the problem is decidable, this is exactly where the burden should be, rather than the programmer. In an ideal world we would even be able to infer the appropriate monad transformer stacks (one of the most terrifying parts of writing Haskell code), at least in a limited sense, though I don't know how feasible this is in the general case or even a restricted case. If there's no work on this out there already, though, this again seems like an important avenue for research.
Another issue is code verification. Not even Haskell does a good job on this front, since type classes are subject to extralingual laws, but in a language that permits dependent types, this can be fixed. I don't know how well these would compose, but it would be nice to have a built-in layer that allows some forms of verification from within the language itself. The compiler could provide a series of missing proof obligations that would appear in warnings, so the user can choose what level of proof they want to apply. Most people won't care, for example, about proving that their monoids are actual monoids, or that their call to foldr could be turned into a faster call to foldl, but for production code, or critical code, or whatever, it adds an extra level of assurance that no number of code reviews can provide.
Of course, to do any of this, it will take a lot of time, and a lot of libraries, especially since a lot of the effectful stuff will need to be special-cased. Having an FFI can probably help a bit on the library front, though I'm not sure what the FFI should be to---C seems like the obvious choice, but that seems like it would also encourage writing and using a ton of unsafe code, so I'm not sure where to fall on that. Libraries aren't going to write themselves, though.
Anyway, this is all very preliminary. Over the summer I hope to get started on writing some of this as a side project; it will probably proceed at a glacial pace, especially since I am not very familiar with implementing many of these things, but I think a decent first milestone would be a hybrid functional-imperative language with accessible syntax, but that elaborates into a Haskell-like pure monadic backend. We'll see. If anyone knows of any results on any of the things I mentioned, let me know; I'm guessing there's at least one grotesquely undecideable problem up there somewhere. I want to have a high-level functional language that you don't have to be a type- or category-theorist to use, though, and nothing that currently exists that I'm aware of meets these requirements. Of course, I am a type- and category-theorist, so the implementation will also be a great chance for me to learn Idris!