Project everware: Reusable science
Take interesting code for a spin, in your browser!
Easily spin up the code from a git repository in a custom docker container, in order to quickly test out and play with something you are curious about.
Like nbviewer, but executable. A super easy way to run a notebook (with all of its complicated dependencies). Actually, you do not even have to use python or even jupyter notebooks to profit from this.
Does all this sound interesting? Let us introduce project everware! The marriage of jupyterhub and custom docker containers.
The everware project is making the data analysis part of science easier to reuse and reproduce. As easy as pasting the URL of a github repository. We will then launch a custom docker container in which the code runs and connect you to it in your browser. This makes it super easy for you to try out someone else's code, modify it and take the parts that interest you and reuse them. It also means that reproducing and preserving an analysis comes for free.
We have a working demo where you can paste the URL of any of the repositories compatible with everware.
How does it work?
What is needed for this magic to work? The only real, fundamental requirement is that the repository you want to try out contains a Dockerfile describing how to set up the environment for the analysis. Preferably it should also contain a jupyter notebook (an executable README on steroids) describing how to run each step of the analysis.
This notebook provides the narrative that links the individual steps of the analysis. It can contain LaTeX, images, equations and code. A notebook alternates between narrative and executable cells.
One step of your analysis can be as simple as echo 3.141 > PI.txt, or require compiling large amounts of code and running it.
The steps of your analysis do not have to be written in python; they can be whatever you want them to be. Compiled c++, bash, FORTRAN, R, what have you.
Not even the executable cells in the notebook describing the steps have to be python: as of today there are 49 kernels for jupyter, so you can write the steps in your favourite language. Actually, you do not even need to use jupyter notebooks ... but it is a good idea!
To recap: you add a Dockerfile to your analysis repository, we offer you everlasting single-click executability, reusability, and reproducibility.
How is everware different?
There already exist great services like sagemath, dominodatalab, or tmpnb. If you could mix all of them together they would offer docker containers in the cloud, jupyter notebooks as interface, git repositories for collaboration, and no-wasteful-clicking-just-run-the-thing usability.
However, none of them on their own solves all the challenges of a modern, computationally intensive, collaborative, reusable, and reproducible scientific endeavour.
This is why we are building everware.
What is in a Dockerfile?
A Dockerfile is a convenient way of describing all your project's dependencies. You probably do not even have to write your own Dockerfile, just pick one that is close enough to your requirements.
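To make this concrete, here is a minimal sketch of what such a Dockerfile could look like for a small python-based analysis. It is an illustration, not an official everware template: the base image, the requirements.txt file, the /analysis path and the start command are all assumptions, and a real everware-compatible repository may need a different base image or start command.

```dockerfile
# Hypothetical Dockerfile for an analysis repository (a sketch under the
# assumptions named above, not everware's own template).
#
# To try it locally (image name "my-analysis" and the data path are made up):
#   docker build -t my-analysis .
#   docker run --rm -p 8888:8888 -v /path/to/input/files:/data my-analysis

# Assumed base image; pick whichever existing image is closest to your needs.
FROM python:3.11-slim

# System packages for compiled steps of the analysis (here a C++ compiler and make).
RUN apt-get update \
    && apt-get install -y --no-install-recommends g++ make \
    && rm -rf /var/lib/apt/lists/*

# Python libraries, listed in a (hypothetical) requirements.txt in the repository.
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir notebook -r /tmp/requirements.txt

# Copy the analysis code into the image and work from there.
COPY . /analysis
WORKDIR /analysis

# Serve the notebook on port 8888 so a browser can connect to it.
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
```

Pinning exact library versions in requirements.txt is what keeps the environment from drifting while you are not looking.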
We are building this because it fixes several painful things we encounter most days as experimental physicists. Below I list some of them.
The convenience
You can use this environment while developing your analysis in the first place. The container will run locally, you can use your favourite editor, mount your large input files directly into the container, etc. You decide which libraries and which versions of them to use, not your local admin. When you are done with your analysis you get reproducibility for free.
The holiday
Sometimes libraries and software get updated while you are on holiday. Nothing spoils your holiday faster than returning to "nothing works anymore ... ahhrgg ... why!!! It was working before I left!!"
The new guy
How many hours have you spent trying to figure out why that new student or postdoc cannot run your code? In the end it is always some weird environment variable or something hidden at the bottom of your .bashrc (or worse, theirs).
The coding ninja student
If you get others to use it, it has the side effect that you make it much easier for yourself to take the code from that coding-ninja-student who makes that amazing Figure 3 in your paper and use it yourself. You even have a chance to understand what you need to install to run it in your analysis (Dockerfile syntax is like a shared language, unlike the l33t speak of coding-ninja-student).
The helper
Personally, I am kind of into machine learning. Often people will send me some weird script that depends on libraries X, Y and Z, in some weird combination of versions. I look at it and think 'yeah right ... this is gonna take a while.' Getting things set up to help you will take a while. Therefore I will be demanding a lot of beers in exchange for helping you with your problem. If I can run it in my browser at the click of a button ... one coffee will do it (and I will do it now, not maybe later).
So really you are doing yourself a favour, at a very small price.
The Everware Project
We started this project at the CERN webfest hackathon two weeks ago, where we won the prize for "best tech".
We are: @OmeGak, @ibab, @ndawe, @betatim, @uzzielperez, @anaderi, and @AxelVoitier.
It is based on the amazing work by the Jupyterhub and docker guys. We shamelessly stole, copied, hacked and modified all the things. Check out the code on github: everware project
Do join us!
We have a working demo where you can paste the URL of any of the repositories compatible with everware.
Lots is still missing. Right now we provide the computing power; this is OK for trying stuff out, but if you want to run a typical LHC analysis pipeline you need "days" of CPU time. So we want to add a way for the visitor to type in the repository and some credentials that allow us to launch the docker container on their private "cloud" (at their company, institute, AWS, ...). This is the number one missing piece, along with lots and lots of tidying and making things more robust. We also need a lot more documentation; so far your best bet is to drop by on gitter: everware/everware or email everware@googlegroups.com.