In theory I should be writing my compiler right now, but I just got a new keyboard and am typing at about quarter-speed, so this is a great chance to write down some stuff that's been bouncing around my head lately and get used to my new keyboard, all at once. Not that this will help me with programming-style typing, but oh well.

Fair warning: this is pretty rambly! I just wanted to get these words out of my head and onto something more concrete; a more coherent version will come when I actually start working on this.

It is a tragedy that Python is as popular as it is. I believe this in the usual sense, of course---its semantics are terrible---but that's not what I want to emphasize here. I think a lot of the PL community, myself included, is to blame for this. Our languages are not accessible; Python is. While I think it should be possible for most programmers to learn how to write good functional code in ML or Haskell or whatever, I also don't think it should be mandatory.

From talking to people who like Python, the feature I hear mentioned the most is how easy it is to use. While ML and Haskell are comparable to Python in terms of the length of a Hello, world program, presumably the ease of use diminishes somewhat in more complicated programs (unfortunately, the people I'm talking to don't also know ML or Haskell, so I can't confirm this).

We understand semantics well, yes; that's kind of our job. But programming is a populist skill, to some extent, not limited to people with the time and inclination to learn about monads and category theory and parentheses, and requiring that people sit down and learn these things, instead of presenting languages that use them in an intuitive way, is narrow-minded and exclusionary. Programming may not be the most important thing in people's hierarchy of needs, but everything I've been learning and acting on lately in the rest of my life compels me, as someone who hopes to be an expert in the field, to make sure everyone can benefit from my work and the work of my colleagues, not just those of us with graduate degrees.

This is really the driving force behind this post; everything else is details. I'm sure I'm leaving things out, too---I'm using "accessible" here in a very broad sense, but it's also important to make sure languages are accessible to people with disabilities, which is an area in which I am woefully ignorant. My ultimate goal is to have a good, theoretically sound language where I could walk up to anyone in the world and say "If you want to program, you should use <foo>", and not deserve to have an angry mob chase me off. Right now, Python is, I think, the best candidate for that. Certainly all of my favorite languages (SML, Haskell, Agda), while theoretically well-founded, fail the angry mob test. This is a problem with the languages, not with the people who don't want to use them.

As someone who is relatively new to Haskell, I can see a lot of areas in it that I would never, ever want to explain to a new programmer, especially not one who is new to functional programming. In particular, the syntax for composing monadic functions, littered with <$> and <*> in the best case, is atrocious. do-notation is a step towards usability, but does not go anywhere near far enough to make it accessible. Thinking about the types of monadic code and how they compose is also a nuisance, and while I am in favor of keeping track of effects in the type system, I believe the interface to this can be vastly improved.
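To make the complaint concrete, here is a small Haskell example (the lookupUser and lookupGroup functions are made up purely for illustration) showing the same computation in applicative style and in do-notation:

-- Hypothetical lookups, just for illustration.
lookupUser :: String -> Maybe String
lookupUser u = if u == "zach" then Just "Zach Sparks" else Nothing

lookupGroup :: String -> Maybe String
lookupGroup g = if g == "pl" then Just "PL Wonks" else Nothing

-- Applicative style: terse, but opaque to a newcomer.
pair1 :: String -> String -> Maybe (String, String)
pair1 u g = (,) <$> lookupUser u <*> lookupGroup g

-- do-notation: friendlier, but still demands an understanding of Monad.
pair2 :: String -> String -> Maybe (String, String)
pair2 u g = do
  user <- lookupUser u
  group <- lookupGroup g
  return (user, group)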

Lately, I have been thinking of how to fix these things. My ultimate goal is an accessible programming language with a good, ideally somewhat clean, semantics. I haven't defined "accessible" yet, and for good reason---I'm not sure what all it should encompass yet. A few criteria, though, are:

- Lightweight, unobtrusive types
- Clean syntax (no System.out.whatever nonsense, and as few reserved keywords as possible)
- Intuitive semantics. Calling effectful code should be easy. Doing effectful things should be easy. Doing all of these in a SAFE way should, of course, be easy. I should be able to write a for loop or a while loop without much effort expended. Haskell fails all of these.

Python passes all of these criteria (even the type system one---its single type is unobtrusive!), but if I were okay with Python I wouldn't be writing this post, or in PL grad school, for that matter. So we also want some criteria for the semantic side of things, with design informed by PL research and experience with languages like Haskell and ML.

- Expressive module system (I come from ML land, and am therefore morally obligated to put this first)
- Expressive type system---I still believe that it is important for effects to be captured in the types of code, for example.
- Operator overloading, presumably with typeclasses
- First-class proofs of correctness. I'm not quite sure how this will fit in, if at all, but it would be nice to have dependent types of some sort baked in to the language so you could prove things about the function you just wrote, if you want to.

At first glance, these seem irreconcilable, but I believe they are not. There is already a lot of work being done to make type systems less intrusive, for example---gradual types approach this from one angle, and bidirectional typechecking from another. It should be possible for the user to customize the amount of static checking they want---a compiler can easily provide passive-aggressive warnings when the amount of checking falls below a certain threshold, but there is no reason not to give users flexibility there. Ideally, the benefits of types will become self-evident when people can compare the security, correctness, and speed of typed and untyped versions of the same code for themselves.

I also believe that effects can be handled much more intuitively than is currently done in Haskell. Every time I have to write "do x' <- x ; case x' of ..." or "f <$> g x <*> g y <*> ..." I die a little inside. I assume someone, somewhere has done research into streamlining this, perhaps by overloading function application or doing some fancy elaboration during typechecking? I haven't looked into this at all, but it seems like an obvious avenue of inquiry.
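For what it's worth, GHC's LambdaCase extension at least removes the throwaway binding in that first pattern, though it hardly fixes the underlying problem:

{-# LANGUAGE LambdaCase #-}
import System.Environment (lookupEnv)

-- Instead of: do x' <- lookupEnv "HOME"; case x' of ...
report :: IO ()
report = lookupEnv "HOME" >>= \case
  Nothing   -> putStrLn "no HOME set"
  Just home -> putStrLn ("HOME is " ++ home)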

Programming with the state and reader monads is also a nuisance due to the anonymous nature of their environments. I think we can solve two problems at once here by explicitly declaring the environments that these things refer to: this will remove the need for obtuse combinators such as put, get, and ask, while still providing the same information as the monadic types, just in a different location. For example, instead of writing

import Control.Monad.State  -- from the mtl package

incr :: State Integer Integer
incr = do
  i <- get
  put $ i + 1
  get

we could write

state i : Integer

  def incr () =
    i = i + 1
    return i

or something along those lines. This places an added burden on the compiler, but as long as the problem is decidable, that is exactly where the burden should be, rather than on the programmer. In an ideal world we would even be able to infer the appropriate monad transformer stacks (one of the most terrifying parts of writing Haskell code), at least in a limited sense, though I don't know how feasible this is in the general case or even a restricted one. If there's no work on this out there already, though, this again seems like an important avenue for research.
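To pin down what I'm asking the compiler to do, here is one plausible (entirely hypothetical) elaboration of the sketch above into today's Haskell; the point is that the surface syntax could compile down to something like this without the programmer ever writing get or put:

import Control.Monad.State

-- The declared "state i : Integer" becomes the State monad's environment.
incr' :: State Integer Integer
incr' = do
  modify (+ 1)  -- elaborated from: i = i + 1
  get           -- elaborated from: return i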

Another issue is code verification. Not even Haskell does a good job on this front, since type classes are subject to extralingual laws, but in a language that permits dependent types, this can be fixed. I don't know how well these would compose, but it would be nice to have a built-in layer that allows some forms of verification from within the language itself. The compiler could provide a series of missing proof obligations that would appear in warnings, so the user can choose what level of proof they want to apply. Most people won't care, for example, about proving that their monoids are actual monoids, or that their call to foldr could be turned into a faster call to foldl, but for production code, or critical code, or whatever, it adds an extra level of assurance that no number of code reviews can provide.
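To illustrate what "extralingual" means in practice, here is a deliberately unlawful Monoid instance that GHC accepts without complaint; nothing in the language can even state, let alone check, the associativity and identity laws:

newtype Diff = Diff Integer deriving Show

instance Semigroup Diff where
  Diff a <> Diff b = Diff (a - b)   -- not associative!

instance Monoid Diff where
  mempty = Diff 0                   -- not a left identity: 0 - b /= b

In a dependently typed language, the instance could carry (or the compiler could warn about missing) proofs of those laws.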

Of course, doing any of this will take a lot of time, and a lot of libraries, especially since a lot of the effectful stuff will need to be special-cased. Having an FFI can probably help a bit on the library front, though I'm not sure what the FFI should target---C seems like the obvious choice, but that also seems like it would encourage writing and using a ton of unsafe code, so I'm not sure where to fall on that. Libraries aren't going to write themselves, though.

Anyway, this is all very preliminary. Over the summer I hope to get started on writing some of this as a side project; it will probably proceed at a glacial pace, especially since I am not very familiar with implementing many of these things, but I think a decent first milestone would be a hybrid functional-imperative language with accessible syntax that elaborates into a Haskell-like pure monadic backend. We'll see. If anyone knows of any results on any of the things I mentioned, let me know; I'm guessing there's at least one grotesquely undecidable problem up there somewhere. I want a high-level functional language that you don't have to be a type- or category-theorist to use, though, and nothing I'm aware of currently meets these requirements. Of course, I am a type- and category-theorist, so the implementation will also be a great chance for me to learn Idris!

A while ago, Andy Keep defended his thesis here at IU about the nanopass framework, a super-slick (to say the least) DSL for writing compilers in Scheme. One of the cool things about it is that you only have to define the local parts of passes: if there's a peephole optimization you want to do, you only have to specify the conversion for the part of the AST you want to change; everything else stays the same.

This reminded me a little bit of substructural logic programming; there's a lot going on there, but the idea is that you use a substructural logic (such as linear logic) as the semantics of your logic programming language. This gives you the ability to write what are essentially state machine transitions as a forward-chaining logic program. I can imagine a sufficiently stringy AST definition that "flattens" the AST, putting everything in the context at once and using something like pointers (or even ordered contexts, perhaps, if using ordered logic programming) to tell what the structure of the term is.

Such a representation raises questions of its own: is flattening the AST like that a good idea? Is there a way to do it that doesn't feel like munging around in C? Could it be done behind-the-scenes so that none of the ensuing ugliness would be exposed to the programmer? It seems like it might make a cool backend for nanopass or something similar, for example. But I just want to ask these questions, not answer them; the point is, if we have this kind of representation, it seems like we then automatically get free parallel peephole optimizations. Write the transition rule and run it, and it will automatically apply wherever it can.
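Since I can't resist, here is a rough Haskell approximation of what I mean (a real substructural logic programming engine would express the rule as a transition, but the shape is the same): the AST lives in a single map keyed by node ids, rules are written against individual nodes, and one sweep applies the rule at every site where it matches.

import Data.Maybe (fromMaybe)
import qualified Data.Map as Map
import Data.Map (Map)

-- Every node lives in one big "context," with children referenced by id
-- (the pointers mentioned above).
type NodeId = Int
data Node = Lit Int | Add NodeId NodeId deriving Show
type Graph = Map NodeId Node

-- One local peephole rule: n = l + 0 ==> n = l.
peephole :: Graph -> NodeId -> Node -> Node
peephole g _ n@(Add l r)
  | Just (Lit 0) <- Map.lookup r g = fromMaybe n (Map.lookup l g)
peephole _ _ n = n

-- Run the rule everywhere it applies, in one (embarrassingly parallel) sweep.
sweep :: Graph -> Graph
sweep g = Map.mapWithKey (peephole g) g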

But can we do better than just myopic, local transformations? I was talking to Aaron Todd, another grad student here, and he came up with an idea to parallelize compilation:

<toddaaro> it seems like in a piece of code the syntax monotonically gets closer to machine code, and the steps along the way are triggered by reaching an information threshold
<toddaaro> which could let you easily parallelize compilers
 
His idea was to write passes in terms of the pieces of syntax themselves, rather than the entire AST. Ideally, this would result in abstracting away from information dependencies and allowing the code to run in parallel, to some extent; some dependencies exist, of course, but it seems like it would be possible to block on that information becoming available in a sufficiently expressive setting.

I think substructural logic programming might provide such a setting. With well-defined interfaces between each part of a pass, you could write everything declaratively and trust the logic programming engine to do the rest. This would require some good mechanisms for modularity in SSLP, but there's at least been some thought put into this already; if this kind of approach is feasible, I think the main question is whether to push SSLP as a compiler implementation paradigm or to use it as a backend for some kind of (potentially nanopass-like) DSL.

This is, of course, assuming the approach I've outlined above works out, but I think it should. A compiler is, at its core, a series of translations, and translations can be thought of as rewrite rules. In this case they'll be stateful, but SSLP is already good at dealing with state (it is, if anything, even more convenient than using the State monad in Haskell, which has already been a lifesaver for me in this course so far!). If anything, it seems like a perfect fit! But anyway, this is a pretty fluffy blog post, so what do y'all think? I might try to come up with a few code examples, if there's interest.

I wrote an unfortunately long PDF about linear logic and comonads. It might be interesting? It might solve a long-standing problem? Or it might just be a good way for me to procrastinate on my homework---only one way to find out!

It's been a while since I made this account, ostensibly for the purpose of research blogging, so November---often used by folks as a kind of "thing-a-day" month---seems like a good time to (re?)surrect it and actually get some research blogging done. I doubt I'll update every day, but I have a fair amount of stuff going on, so it shouldn't be too hard to post about that every few days or so.

Also, if anyone knows how to change the font in one of these posts, let me know; it'd be nice to use a fixed width font for programming stuff. Footnotes would come in handy, too.

For anyone who might not know me---hi! I'm Zach Sparks, a PhD student (currently in my second year) at Indiana University. My broad area of research is programming languages, in particular type systems and logic, and I've been working with Amr Sabry (who may or may not technically be my advisor yet, but I think it is pretty clear at this point that he is my advisor) on a logic for reversible computation.

I've also got another project going on with Dan Friedman, working on implementing a little dependently-typed programming language, kind of like Agda. It is pretty cool! We haven't done much so far (so I probably won't mention it beyond this), but I've already learned a hell of a lot about dependent types just from working on/thinking about it.

Aside from those, I have lots of smallish ideas floating around---these are probably the things that I will use to fill up this blog in the coming month! In general, I care a lot about formal treatments of modularity (in particular, ML-style modules) and variable binding (especially higher-order abstract syntax). I've also been thinking a lot about comonads, in several different contexts, most recently that of designing a rad new programming language.

So, there should be tons of cool stuff coming in to populate this space in the near future. Stay tuned!
