Preface for the unaware
When you install a particular version of GHC on your machine, it comes with a collection of
"boot" libraries. What does it mean to be a "boot" library? Quite simply, it is a library that
is used in the implementation of GHC and other core components. Two such notable libraries are
ghc. All the matching package names and
their versions for a particular GHC release can be found in this table.
The fact that a library comes wired-in with GHC means that there is never a need to download its sources for that particular version from Hackage or elsewhere. In fact, there is really no need to upload the sources to Hackage even for the purpose of building the Haddock for each individual package, since those are conveniently hosted on haskell.org.
That being said, Hackage has always been the central place for releasing a Haskell package,
and historically Hackage trustees would upload the exact version of almost every "boot"
package to Hackage. That is why, for example, we have
on Hackage, even though it comes with versions of GHC from
Such an upload makes total sense. Any Haskeller using a core package as a dependency for their own package in a cabal file has a central place to look for available versions and documentation for those versions. In fact, some people have become so accustomed to this process that it has been discussed on Haskell-Cafe and in a few other places when such a package was never uploaded:
It's a crisis that the standard library is unavailable on Hackage...
A bit over half a year ago
ghc-8.8.1 was released, with the latest release currently being
ghc-8.8.3. If you carefully inspect the table of core packages
and try to match it against the versions of those libraries available on Hackage, you will quickly
notice that a few of them are missing. I personally don't know the exact reasoning
behind this, but from what I've heard it has something to do with the fact that
ghc-8.8.1 now depends on
The problem for us is that it also affects Stackage's web interface. Let's see how and why.
The "how" is very simple. Until recently, if a package was missing from Hackage, it would
not have been listed on Stackage either. This means that if you tried to follow a
dependency of any package on
base-4.13.0.0 in nightly snapshots starting in September of
last year, you would not find it. As I noted before, not only was
base missing, but a few
others as well.
This problem also manifested itself as a funny-looking bug on Stackage. For every package,
the count shown for its list of dependencies was always off by at least 1 when compared with the
actual links in the list
(e.g. primitive). This
had me puzzled at first. It was only later that I realized that
base was missing, and since
almost every package depends on it, it was counted but not listed, causing a mismatch.
Stackage was structured in such a way that it always used Hackage as the true source of available packages, except for the core packages, since those would always come bundled with GHC. For example, if you look at the specification of the latest LTS-15.3 snapshot, you will not find any of the core packages listed there, for they are determined by the GHC version, which in turn is specified in the snapshot.
There are a few stages, tools and actual people involved in making a Stackage snapshot happen. Here are some of the steps in the pipeline:
- A curated list of packages that involves package maintainers and sometimes Stackage curators.
- A curator tool that is used to construct the actual snapshot, build packages, run test suites and generate Haddocks.
The last step is of the most interest to us because
stackage.org is the place where we had stuff missing. Let's
look at some pieces of information the tool needs in order for
stackage-server to create
a page for a package:
- Package name, its version and Pantry keys (cryptographic keys that uniquely identify the contents of source distribution)
- Previously generated haddocks and hoogle files for each package
- Cabal file, so we can extract useful information about the package, such as description, license, maintainers, module names etc.
- Optionally Readme and Changelog files from the source distribution can be served on a package page as well.
Information from the latter two bullet points is only available in the source distribution tarballs. Packages that are defined in the snapshot do not pose a problem for us, because by definition their sources are available from Hackage or any of its mirrors. Core packages, on the other hand, are different in the sense that they are always available in a build environment, so information about them is present when we build a package:
    $ stack --resolver lts-15.0 exec -- ghc-pkg describe base
    name: base
    version: 4.13.0.0
    visibility: public
    ...
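As a rough illustration of how that information can be picked apart in a build environment, here is a minimal sketch of a helper that pulls a single field out of `ghc-pkg describe` output. The function name `pkg_field` is made up for this post and is not part of any Stackage tooling; it only handles simple one-line fields such as `name` and `version`.

```shell
# Hypothetical helper: print the value of a single-line field from
# `ghc-pkg describe` output supplied on stdin.
# $1 is the field name, e.g. "version".
pkg_field() {
  awk -v key="$1" '
    # Match only lines that begin with "<key>:".
    index($0, key ":") == 1 {
      value = substr($0, length(key) + 2)   # everything after the colon
      sub(/^[ \t]+/, "", value)             # drop leading whitespace
      print value
      exit
    }'
}
```

In an environment where `stack` is available, it could be used as `stack --resolver lts-15.0 exec -- ghc-pkg describe base | pkg_field version`.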
The problem is that the
stackage-server-cron tool is just an executable that is running
somewhere in the cloud, and it doesn't have such an environment. Therefore, until recently, we
had no means of getting the cabal files for core packages except by checking on
Hackage. With more and more core packages missing from Hackage, especially ones as critical as
bytestring, we had to come up with a solution.
Solving this problem should be simple, because all we really need is the cabal files. Haddock for the missing packages has been generated and was always available; it is only the extra little bit of meta information that was needed in order to generate the appropriate links and the package home page.
The first place to look for cabal files was the GHC git repository. The whole GHC bundle though is quite different from all other packages that we are normally used to:
- Libraries that GHC depends on do not come from Hackage, as we already know, instead they are pinned as git submodules.
- Most of the packages that are defined in the GHC repository do not have cabal files. Instead they have templates that are used for generating cabal files for a particular architecture during the build process.
This means that the repository is not a good source for grabbing cabal files. Building GHC from source is a time-consuming process, and we don't want to be doing that for every release just to get the cabal files we need. A better alternative is to simply download a distribution package for a common operating system and extract the missing cabal files from there. We used Linux x86_64 for Debian, but the choice of the OS shouldn't really matter, since we only really need high-level information from those cabal files.
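To make the idea concrete, here is a minimal sketch of pulling every `.cabal` file out of an already downloaded distribution tarball. It is not the script we actually used: the function name `extract_cabal_files` is hypothetical, and it deliberately assumes nothing about where the cabal files live inside the archive, matching on the file extension only.

```shell
# Hypothetical helper: copy every *.cabal member of a tarball into a
# flat destination directory.
# $1 is the path to the tarball, $2 is the destination directory.
extract_cabal_files() {
  tarball=$1
  dest=$2
  mkdir -p "$dest"
  # List the archive members, keep only cabal files, and write each
  # one to the destination under its base name.
  tar -tf "$tarball" | grep '\.cabal$' | while IFS= read -r entry; do
    tar -xOf "$tarball" "$entry" > "$dest/$(basename "$entry")"
  done
}
```

For example, `extract_cabal_files ghc-bindist.tar.xz cabal-files`, where both names are placeholders. The archive is re-read once per cabal file, which is slow for a large compressed tarball but keeps the sketch short.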
That was it. The only thing we really needed to do in order to get missing core files on
Stackage was to collect all missing cabal files and make them available to the
Going back to the origin of Stackage, it turns out that there were quite a few such core
packages missing; the most common and most notable one was
ghc itself. Only a handful of
officially released versions were ever uploaded to Hackage.
From now on we have a special repository
where we can place cabal files for missing core packages, which
tool will pick up automatically. As usually goes with public repositories,
anyone from the community is encouraged to submit a pull request whenever they notice
that a core package is not being listed on Stackage for a newly created snapshot.
For the past few weeks, the very first such core package missing from Hackage,
base-4.13.0.0, has been
included on Stackage. With recent notable additions being