Binder is a publicly accessible online service for executing interactive notebooks based on Git repositories. Binder dynamically builds and deploys containers following a recipe stored in the repository, then gives the user a browser-based notebook interface. The Binder group periodically releases a log of container launches from the public Binder service. Archives of launch records are available here. These records do not include identifiable information like IP addresses, but do give the source repo being launched along with some other metadata.

The main content of this dataset is in the binder.sqlite file. This SQLite database includes launch records from 2018-11-03 to 2021-06-06 in the events table, which has the following schema.

    CREATE TABLE events(
        version INTEGER,
        timestamp TEXT,
        provider TEXT,
        spec TEXT,
        origin TEXT,
        ref TEXT,
        guessed_ref TEXT
    );
    CREATE INDEX idx_timestamp ON events(timestamp);
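As a rough illustration of how the events table might be queried, the sketch below recreates the schema in an in-memory SQLite database and counts launches per provider within a time window. The sample rows, the helper name launches_per_provider, and the exact timestamp format are assumptions made for the example, not values taken from the dataset. The range filter relies on ISO timestamps sorting lexicographically, which holds only if all records use the same format and time zone.

```python
import sqlite3

# Schema as given in the dataset description.
SCHEMA = """
CREATE TABLE events(
    version INTEGER,
    timestamp TEXT,
    provider TEXT,
    spec TEXT,
    origin TEXT,
    ref TEXT,
    guessed_ref TEXT
);
CREATE INDEX idx_timestamp ON events(timestamp);
"""


def launches_per_provider(conn, start, end):
    """Count launches per provider with start <= timestamp < end.

    Hypothetical helper: compares timestamps as text, so it assumes
    uniformly formatted ISO strings. A plain range comparison like this
    lets SQLite use the idx_timestamp index.
    """
    rows = conn.execute(
        "SELECT provider, COUNT(*) FROM events "
        "WHERE timestamp >= ? AND timestamp < ? "
        "GROUP BY provider",
        (start, end),
    ).fetchall()
    return dict(rows)


# Made-up sample rows, for illustration only (not real launch records).
SAMPLE_ROWS = [
    (1, "2018-11-03T00:00:00+00:00", "GitHub",
     "example-user/example-repo/master", None, None, None),
    (3, "2020-01-15T12:30:00+00:00", "GitHub",
     "example-user/example-repo/master", "example-origin", None, None),
    (4, "2021-06-06T08:00:00+00:00", "Gist",
     "example-user/0123456789abcdef", "example-origin", "0123abc", None),
]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

counts = launches_per_provider(conn, "2020-01-01", "2022-01-01")
print(counts)
```

To run the same query against the real data, replacing ":memory:" with the path to binder.sqlite and skipping the schema and insert steps should be enough, since the file already contains the table and index.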
The version field indicates the version of the record as assigned by Binder. The origin field became available with version 3, and the ref field with version 4. Older records where this information was not recorded have the corresponding fields set to null.

timestamp is the ISO timestamp of the launch.

provider gives the type of source repo being launched ("GitHub" is by far the most common). The remaining explanations assume GitHub; other providers may differ.

spec gives the particular branch/release/commit being built. It consists of <github-id>/<repo>/<branch>.

origin indicates which backend was used. Each backend has its own storage, compute, etc., so this information might be important for evaluating caching and performance. Only recent records (version 3 and later) include this field; in older records it is null.

ref specifies the git commit that was actually used, rather than the named branch referenced by spec. This was not recorded from the beginning, so only the more recent entries (version 4 and later) include it; in older records it is null.

For records where ref is not available, we attempted to clone the named reference given by spec rather than the specific commit (see below). The guessed_ref field records the commit found at the time of cloning. If the branch was updated since the container was launched, this will not be the exact version that was used, and instead will refer to ...