Project everware: Reusable science

13 August 2015

Take interesting code for a spin, in your browser!

Easily spin up the code from a git repository in a custom docker container, in order to quickly test out and play with something you are curious about.

Like nbviewer, but executable. A super easy way to run a notebook (with all of its complicated dependencies). Actually, you do not even have to use python or even jupyter notebooks to profit from this.

Does all this sound interesting? Let us introduce project everware! The marriage of jupyterhub and custom docker containers.

The everware project is making the data analysis part of science easier to reuse and reproduce. As easy as pasting the URL of a github repository. We will then launch a custom docker container in which the code runs and connect you to it in your browser. This makes it super easy for you to try out someone else's code, modify it and take the parts that interest you and reuse them. It also means that reproducing and preserving an analysis comes for free.

We have a working demo where you can paste the URL of any repository compatible with everware.

How does it work?

What is needed for this magic to work? The only real, fundamental requirement is that the repository you want to try out contains a Dockerfile describing how to set up the environment for the analysis. Preferably it should also contain a jupyter notebook (an executable README on steroids) describing how to run each step of the analysis.
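To make the requirement concrete, here is a minimal sketch of how you could check a local checkout for the two ingredients. The function name check_repository is made up for this post, and this is not how everware itself inspects a repository; it only assumes the Dockerfile and any notebooks sit at the top level of the repository.

```python
from pathlib import Path


def check_repository(repo_dir):
    """Report whether a checked-out repository has a Dockerfile and notebooks.

    Hypothetical helper for illustration only: it looks at the top level
    of the directory and nowhere else.
    """
    repo = Path(repo_dir)
    # The one hard requirement: a file called "Dockerfile".
    has_dockerfile = (repo / "Dockerfile").is_file()
    # Nice to have: one or more notebooks that narrate the analysis.
    notebooks = sorted(p.name for p in repo.glob("*.ipynb"))
    return {"dockerfile": has_dockerfile, "notebooks": notebooks}
```

Calling check_repository(".") inside your analysis repository tells you whether the Dockerfile is there and which notebooks a visitor would find.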

This notebook provides the narrative that links the individual steps of the analysis. It can contain LaTeX, images, equations and code. A notebook alternates between narrative and executable cells.

One step of your analysis can be as simple as echo 3.141 > PI.txt, or require compiling large amounts of code and running it.
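If you prefer to keep everything in python, the same trivial step could be written as a notebook cell like the sketch below. The function name write_pi and the output_dir parameter are invented for this example; only the file name and its content come from the shell one-liner.

```python
from pathlib import Path


def write_pi(output_dir="."):
    """Python equivalent of `echo 3.141 > PI.txt`."""
    path = Path(output_dir) / "PI.txt"
    # echo appends a newline, so we do the same.
    path.write_text("3.141\n")
    return path
```

A later cell can then read PI.txt back in, which is all "linking the steps of the analysis" really means at this scale.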

The steps of your analysis do not have to be written in python; they can be whatever you want them to be: compiled C++, bash, FORTRAN, R, what have you.

Not even the executable cells in the notebook describing the steps have to be python. As of today there are 49 kernels for jupyter, so you can write the steps in your favourite language. Actually, you do not even need to use jupyter notebooks ... but it is a good idea!

To recap: you add a Dockerfile to your analysis repository, we offer you everlasting single-click executability, reusability, and reproducibility.

How is everware different?

There already exist great services like sagemath, dominodatalab, or tmpnb. If you could mix all of them together they would offer docker containers in the cloud, jupyter notebooks as interface, git repositories for collaboration, and no-wasteful-clicking-just-run-the-thing usability.

However, none of them on its own solves all the challenges of a modern, computationally intensive, collaborative, reusable, and reproducible scientific endeavour.

This is why we are building everware.

What is in a Dockerfile?

A Dockerfile is a convenient way of describing all your project's dependencies. You probably do not even have to write your own Dockerfile: just pick one that is close enough to your requirements.
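To give a feel for how little goes into such a file, here is a rough sketch that renders a minimal Dockerfile from a base image and a list of pinned python packages. The helper render_dockerfile, the /analysis working directory, and the example values in the usage note are placeholders chosen for this post, not something everware prescribes; FROM, WORKDIR, RUN and COPY are standard Dockerfile instructions.

```python
def render_dockerfile(base_image, pip_packages, workdir="/analysis"):
    """Return the text of a minimal Dockerfile (illustrative helper).

    base_image   -- image to start from, e.g. one that is "close enough"
    pip_packages -- python packages to install, ideally with pinned versions
    workdir      -- where the repository content ends up in the container
    """
    lines = [
        f"FROM {base_image}",
        f"WORKDIR {workdir}",
    ]
    if pip_packages:
        # Pinning versions here is what keeps the environment stable.
        lines.append("RUN pip install " + " ".join(pip_packages))
    # Copy the analysis code from the repository into the image.
    lines.append(f"COPY . {workdir}")
    return "\n".join(lines) + "\n"
```

For example, render_dockerfile("python:3", ["numpy==1.9.2"]) gives you a four-line file you can save as Dockerfile at the top of your repository and adjust from there.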

We are building this because it fixes several painful things we encounter most days as experimental physicists. Below I list some of them.

The convenience

You can then use this environment while developing your analysis in the first place. The container will run locally, you can use your favourite editor, mount your large input files directly into the container, etc. You, not your local admin, choose which libraries and which versions of them to use. When you are done with your analysis you get reproducibility for free.
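Running the container locally with your input files mounted boils down to one docker run call. The sketch below only assembles the command line; the helper docker_run_command, the /data mount point and the port 8888 default are assumptions made for this example, while --rm, -it, -p and -v are regular docker run options.

```python
def docker_run_command(image, host_data_dir, container_data_dir="/data", port=8888):
    """Build a `docker run` command that mounts a data directory.

    Illustrative helper: it returns the argument list and does not
    start anything itself.
    """
    return [
        "docker", "run",
        "--rm",                      # remove the container when it exits
        "-it",                       # interactive, with a terminal
        "-p", f"{port}:{port}",      # expose e.g. a notebook server
        "-v", f"{host_data_dir}:{container_data_dir}",  # mount input files
        image,
    ]
```

You can hand the resulting list to subprocess.run, or join it with spaces and paste it into your terminal. The image name is whatever you tagged your build with.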

The holiday

Sometimes libraries and software get updated while you are on holiday. Nothing spoils your holiday faster than returning to "nothing works anymore ... ahhrgg ... why!!! It was working before I left!!"

The new guy

How many hours have you spent trying to figure out why that new student or postdoc cannot run your code? In the end it is always some weird environment variable or something hidden at the bottom of your .bashrc (or worse, theirs).

The coding ninja student

If you get others to use it, there is a nice side effect: it becomes much easier to take the code from the coding-ninja student who made that amazing Figure 3 in your paper and use it yourself. You even have a chance to understand what you need to install to run it in your analysis (Dockerfile syntax is like a shared language, unlike the l33t speak of the coding-ninja student).

The helper

Personally, I am kind of into machine learning. Often people will send me some weird script that depends on libraries X, Y and Z, in some weird combination of versions. I look at it and think 'yeah right ... this is gonna take a while', because just getting things set up to help you will take time. Therefore I will be demanding a lot of beers in exchange for helping you with your problem. If I can run it in my browser at the click of a button ... one coffee will do it (and I will do it now, not maybe later).

So really you are doing yourself a favour, at a very small price.

The Everware Project

We started this project at the CERN webfest hackathon two weeks ago, where we won the prize for "best tech".

We are: @OmeGak, @ibab, @ndawe, @betatim, @uzzielperez, @anaderi, and @AxelVoitier.

It is based on the amazing work by the Jupyterhub and docker guys. We shamelessly stole, copied, hacked and modified all the things. Check out the code on github: everware project

Do join us!

We have a working demo where you can paste the URL of any repository compatible with everware.

Lots is still missing. Right now we provide the computing power, which is OK for trying stuff out, but if you want to run a typical LHC analysis pipeline you need "days" of CPU time. So we want to add a way for the visitor to type in the repository and some credentials that allow us to launch the docker container on their private "cloud" (at their company, institute, AWS, ...). This is the number one missing piece. There is also lots and lots of tidying and making things more robust to do. We also need a lot more documentation; so far your best bet is to drop by on gitter: everware/everware or email