Distributed workflows with Jupyter

Jupyter Notebooks’ ability to unify imperative code and declarative metadata in a single format puts them halfway between the two classes of tools commonly used for workflow modeling: high-level coordination languages and low-level distributed computing libraries. Jupyter Notebooks also come with a feature-rich, user-friendly web interface out of the box, making them far more accessible to domain experts than the SSH-based remote shells commonly exposed by HPC facilities worldwide.

Iacopo Colonnelli

@octectcomposer

Ph.D. student in Modeling and Data Science

Iacopo Colonnelli is a Ph.D. student in Modeling and Data Science at Università di Torino. He received his master’s degree in Computer Engineering from Politecnico di Torino with a thesis on a high-performance parallel tracking algorithm for the ALICE experiment at CERN.
His research focuses on both statistical and computational aspects of large-scale data analysis, and on workflow modeling and management in heterogeneous distributed architectures.

What the attendees will learn

Attendees will learn how the literate computing paradigm (and in particular the Jupyter software stack) can be used to produce well-documented application prototypes and scientific experiments, especially in the data science domain. We will then explore how these prototypes can scale to real distributed applications through the literate distributed computing abstraction, without rewriting the code from scratch.

 

Requirements

A laptop with Docker and Docker Compose installed.

Basic knowledge of Python is required for the hands-on part, while the general discussion is open to everyone.

Companies using this technology

Jupyter Notebooks are used by almost every data science platform offered as-a-Service in the Cloud (either pure Jupyter Notebooks or a decorated version of them, e.g. Google Colab).

Content

This workshop explores Jupyter Notebooks and their potential to express complex workflows and coordinate their distributed execution, powered by the Jupyter-workflow kernel developed at the University of Torino.

In particular, the workshop will be composed of two main units. The first part will cover a general introduction to literate computing and Jupyter workflows, exploring their features and limitations in terms of portability, reproducibility, and ease of use by domain experts. Then, the second part will explore their capability to express distributed applications and to automatically optimize their distributed execution. In both parts, a theoretical introduction will be followed by hands-on exercises.
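To make the idea of combining imperative code with declarative metadata in one document more concrete, here is a minimal sketch that builds a notebook in the standard notebook JSON layout using only Python's standard library. The `workflow_hint` metadata key, its fields, and the `remote-cluster` target name are hypothetical and chosen purely for illustration; they are not the schema used by the Jupyter-workflow kernel.

```python
import json


def make_code_cell(source, metadata=None):
    """Return a code cell in the notebook JSON layout (format version 4)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": metadata or {},
        "outputs": [],
        "source": source,
    }


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        # Documentation lives next to the code it describes.
        {"cell_type": "markdown", "metadata": {}, "source": "# Square some numbers"},
        make_code_cell("values = [1, 2, 3]"),
        make_code_cell(
            "squares = [v * v for v in values]",
            # Hypothetical declarative annotation: what the cell reads,
            # what it produces, and where it could run.
            metadata={
                "workflow_hint": {
                    "inputs": ["values"],
                    "outputs": ["squares"],
                    "target": "remote-cluster",
                }
            },
        ),
    ],
}

# The whole notebook (prose, code, metadata) serializes to one JSON document.
serialized = json.dumps(notebook, indent=1)


def cells_with_hint(nb):
    """Return the cells that carry the hypothetical workflow annotation."""
    return [c for c in nb["cells"] if "workflow_hint" in c.get("metadata", {})]


hinted = cells_with_hint(json.loads(serialized))
print(len(hinted), hinted[0]["metadata"]["workflow_hint"]["target"])
```

The point of the sketch is that the code in each cell stays ordinary Python, while the per-cell metadata gives a tool a declarative place to read dependencies and placement information without touching the code itself.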

Workshop Plan

Part 1: Jupyter Notebooks
 - Literate computing paradigm
 - Jupyter Notebooks and their software stack
 - Pros and cons of Jupyter
 - Hands-on: write your own Notebook
 
Part 2: Distributed workflows with Jupyter
 - Literate distributed computing
 - Distributed workflows with Jupyter
 - The Jupyter-workflow kernel
 - Hands-on: write your own workflow
 


 



Date and time:

Wednesday 27th

15:00 - 18:00

Topics:

Workflows, Jupyter Notebooks, Prototyping, Data Science

Target audience roles:

Anyone who would like to quickly prototype research experiments or applications in a well-documented format, and potentially evolve them into real applications at scale without rewriting a line of code.

Attendees:

22

Included:

Coffee and tea



MÁLAGA

VENUE

27TH APRIL

Polo de Contenidos Digitales Málaga
Av de Sor Teresa Prat, 15, 29003 Málaga

28TH-29TH APRIL

FYCMA - Palacio de Ferias y Congresos de Málaga
Av. de José Ortega y Gasset, 201, 29006 Málaga