Our composable community infrastructure

TL;DR: we propose to factor Hackage into a separate, very simple service serving a stash of Haskell packages, with the rest of Hackage built on top of that, in the name of availability, reliability and extensibility.

One of the main strengths of Haskell is just how much it encourages composable code. As programmers, we are goaded along a good path for composability by the strictures of purity, which forces us to be honest about the side effects we might use, but first and foremost because first class functions and lazy evaluation afford us the freedom to decompose solutions into orthogonal components and recompose them elsewhere. In the words of John Hughes, Haskell provides the necessary glue to build composable programs, ultimately enabling robust code reuse. Perhaps we ought to build our shared community infrastructure along the same principles: freedom to build awesome new services by assembling together existing ones, made possible by the discipline to write these basic building blocks as stateless, scalable, essentially pure services. Let's think about how, taking packages hosting as an example, with a view towards solving three concrete problems:

Haskell packages

Today Haskell packages are sets of files with a distinguished *.cabal file containing the package metadata. We host these files on a central package repository called Hackage, a community supported service. Hackage is a large service that has by and large served the community well, and has done so since 2007. The repository has grown tremendously, by now hosting no less than 5,600 packages. It implements many features, some of which include package management. In particular, Hackage allows registered users to:

Some of the above constitute the defining features of a central package repository. Of course, Hackage is much more than just that today - it is a portal for exploring what packages are out there through a full blown web interface, running nightly builds on all packages to make sure they compile and putting together build reports, generating package API documentation and providing access to the resulting HTML files, maintaining RSS feeds for new package uploads, generating activity graphs, integration with Hoogle and Hayoo, etc.

In the rest of this blog post, we'll explore why it's important to tweeze out the package repository from the rest, and build the Hackage portal on top of that. That is to say, talk separately about Hackage-the-repository and Hackage-the-portal.

A central hub is the cement of the community

A tell-tale sign of a thriving development community is that a number of services pop up independently to address the needs of niche segments of the community or indeed the community as a whole. Over time, these community resources together as a set of resources form an ecosystem, or perhaps even a market, in much the same way that the set of all Haskell packages form an ecosystem. There is no central authority deciding which package ought to be the unique consecrated package for e.g. manipulating filesystem paths: on Hackage today there are at least 5, each exploring different parts of the design space.

However, we do need common infrastructure in place, because we do need consensus about what package names refer to what code and where to find it. People often refer to Hackage as the "wild west" of Haskell, due to its very permissive policies about what content makes it on Hackage. But that's not to say that it's an entirely chaotic free-for-all: package names are unique, only designated maintainers can upload new versions of some given package and version numbers are bound to a specific set of source files and content for all time.

The core value of Hackage-the-repository then, is to establish consensus about who maintains what package, what versions are available and the metadata associated with each version. If Alice has created a package called foolib, then Bob can't claim foolib for any of his own packages, he must instead choose another name. There is therefore agreement across the community about what foolib means. Agreement makes life much easier for users, tools and developers talking about these packages.

What doesn't need consensus is anything outside of package metadata and authorization: we may want multiple portals to Haskell code, or indeed have some portals dedicated to particular views (a particular subset of the full package set) of the central repository. For example, stackage.org today is one such portal, dedicated to LTS Haskell and Stackage Nightly, two popular views of consistent package sets maintained by FP Complete. We fully anticipate that others will over time contribute other views — general-purpose or niche (e.g. specialized for a particular web framework) — or indeed alternative portals — ranging from small, fast and stable to experimental and replete with social features aplenty. Think powerful new search functionality, querying reverse dependencies, pre-built package sets for Windows, OS X and Linux, package reviews, package voting ... you name it!

Finally, by carving out the central package repository into its own simple and reliable service, we limit the impact of bugs on both availability and reliably, and thus preserve on one of our most valuable assets: the code that we together as a community have written. Complex solutions invariably affect reliability. Keeping the core infrastructure small and making it easy to build on top is how we manage that.

The next section details one way to carve the central package repository, to illustrate my point. Alternative designs are possible of course - I merely wish to seek agreement that a modular architecture with at its core a set of very small and simple services as our community commons would be beneficial to the community.

Pairing down the central hub to its bare essence

Before we proceed, let's first introduce a little bit of terminology:

A central hub for all open source Haskell packages might look something like this:

The metadata and package directories together form a central repository of all open source Haskell packages. Just as is the case with Hackage today, anyone is allowed to upload any package they like via the upload service. We might call these directories collectively The Haskell Stash. End-user command-line tools, such as cabal-install, need only interact with the Stash to get the latest list of packages and versions. If the upload or authentication services go down, existing packages can still be downloaded without any issue.

Availability is a crucial property for such a core piece of infrastructure: users from around the world rely on it today to locate the dependencies necessary for building and deploying Haskell code. The strategy for maintaining high availability can be worked out independently for each service. A tried and tested approach is to avoid reinventing as much of the wheel as we can, reusing existing protocols and infrastructure where possible. I envision the following implementation:

Conclusion

The Haskell Stash is a repository in which to store our community's shared code assets in as simple, highly available and composable a manner as possible. Reduced to its bare essence, easily consumable by all manner of downstream services, most notably, Hackage itself, packdeps.haskellers.com, hdiff.luite.com, stackage.org, etc. It is by enabling people to extend core infrastructure in arbitrary directions that we can hope to build a thriving community that meets not just the needs of those that happened to seed it, but that furthermore embraces new uses, new needs, new people.

Provided community interest in this approach, the next steps would be:

  1. implement the Haskell Stash;
  2. implement support for the Haskell Stash in Hackage Server;
  3. in the interim, if needed, mirror Hackage content in the Haskell Stash.

In the next post in this series, we'll explore ways to apply the same principles of composability to our command-line tooling, in the interest of making our tools more hackable, more powerful and ship with fewer bugs.

Subscribe to our blog via email
Email subscriptions come from our Atom feed and are handled by Blogtrottr. You will only receive notifications of blog posts, and can unsubscribe any time.

Do you like this blog post and need help with Next Generation Software Engineering, Platform Engineering or Blockchain & Smart Contracts? Contact us.