When and how Jupyter Notebooks fail, and what to use instead
Have you ever come across a promising Jupyter Notebook that is supposed to show you how to use some awesome tech, only to be disappointed when it crashes after a few cells? Unlike many data scientists, I have come to really dislike Jupyter Notebooks because of all the headaches they’ve given me over the years. In this blog, I will explain some common problems with Jupyter Notebooks, share real-world examples of how things can go wrong, and suggest alternative tools and best practices that can help you avoid these problems. Additionally, I will outline specific scenarios where Jupyter Notebooks can still be very useful, as long as you follow some simple rules.
Jupyter Notebooks are a popular tool for data scientists to develop code, process and analyse data, train machine learning models, and make predictions with those models. Jupyter provides two main features:
- Interactive Computing Environment: Jupyter offers a web-based interface where users can write and execute code in real-time, supporting multiple programming languages including Julia, Python and R.
- Notebook Document Format: Jupyter Notebooks are documents that combine live code, visualisations, and narrative text, saved in a JSON-based format. Their content is split over multiple cells, which can be executed all at once, or individually one at a time.
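To make the notebook format a little more concrete, here is a minimal sketch using the `nbformat` library (the same library Jupyter uses internally) to build a notebook programmatically; the file name is just a placeholder. It shows that a notebook is simply a structured document containing a list of cells, serialised as JSON.

```python
import nbformat

# Build a minimal notebook in memory: a structured document with a list of cells.
nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_markdown_cell("# My analysis"))
nb.cells.append(nbformat.v4.new_code_cell("print('hello world')"))

# Writing it to disk produces a .ipynb file, which is plain JSON under the hood.
nbformat.write(nb, "example.ipynb")
```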
What can go wrong
Jupyter Notebooks are primarily designed for exploratory data analysis and early prototyping. However, when used in production environments they are usually a source of trouble. Here are some common issues:
1. Execution order: Jupyter Notebooks allow code cells to be executed out of order, leading to unexpected behaviour. For example, if you load data in one cell, pre-process it in a few more cells, and train a model in another, running these cells out of order can corrupt your data pipeline (a minimal sketch after this list illustrates this).
2. Code changes: Jupyter Notebooks also allow code cells to be modified after they have been executed. This could again lead to unexpected behaviour when the notebook is executed once more from start to finish.
3. Data leakage: Too often, notebooks contain sensitive output. To understand the data sets they’re working with, data scientists like to print samples of them, e.g. before and after transformations. This can lead to serious compliance issues, for example with GDPR. Therefore, you should always clear outputs before saving and committing notebooks to a version control system, but this step is easily skipped over.
4. Security breaches: Credentials are sometimes stored in notebooks for convenience. Saving passwords, security keys and access tokens in plain text in a notebook is a major security risk and can lead to unauthorised access and potential security breaches. A better way to manage sensitive information is to use environment variables or secure vaults (see the second sketch after this list).
5. Lack of best practices: Implementing software engineering best practices with notebooks is challenging. For instance, separating code into reusable modules is not straightforward in notebooks, and building CI/CD pipelines for code embedded in notebooks can be difficult. This typically means that code in notebooks ends up being less robust and maintainable.
6. Collaboration challenges: Merging changes from multiple contributors to notebooks is difficult, due to their JSON format. This can lead to corrupted notebooks, and in the worst case, loss of valuable work.
7. Dependency issues: Notebooks frequently lack clear documentation of the software environment and dependencies needed to run the code, making it hard to reproduce results. While this problem is not inherent to notebooks and can be avoided, it still happens more often with notebooks than with code files. There are many tools available to manage and document dependencies (such as `pip`, `conda`, `uv` or `docker`) which can help improve reproducibility.
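To make the first point concrete, here is a minimal, hypothetical sketch of how out-of-order execution and hidden kernel state can silently corrupt a result; the file and column names are made up for illustration.

```python
import pandas as pd

# Cell 1: load the data (the file name is hypothetical)
df = pd.read_csv("sales.csv")

# Cell 2: a pre-processing cell that mutates df in place.
# If this cell is re-run, or run against stale kernel state,
# the transformation is applied twice and the data is silently corrupted.
df["revenue"] = df["revenue"] * 1.21  # e.g. adding VAT

# Cell 3: train or analyse using whatever df happens to contain right now.
average_revenue = df["revenue"].mean()
```

And for the credentials issue in point 4, a better pattern is to read secrets from the environment (or a secrets manager) rather than pasting them into a cell; the variable name below is only an example.

```python
import os

# Read the secret from the environment instead of hard-coding it in the notebook.
# "DB_PASSWORD" is a hypothetical name; set it outside the notebook or in a vault.
db_password = os.environ["DB_PASSWORD"]
```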
Real-world example
In several companies, I’ve seen data scientists using sets of notebooks to load and prepare data, train a machine learning model and save it to disk, and then load the model and new data to make predictions. Sometimes, this process involved over 10 notebooks, all executed and orchestrated manually. The data scientist had to keep track of all the input data files, model versions and prediction files produced by the notebooks, stored either locally or on a shared folder on the network.
In one case, a model was used to predict which customers would be most receptive to a targeted marketing campaign. Every week, the data scientist would use a prediction notebook and a previously trained model to select customers for the call centre. Every few months, he would retrain the model using a training notebook, and manually update the prediction notebook to reference the latest version of the model. Similarly, the data needed for training and predicting was extracted by running two more notebooks, saving data locally as CSV files. Occasionally, new features would be added to the model, requiring changes to multiple notebooks.
In the example above, you can easily see where things could go wrong, and they did! The data extraction, data preparation and feature engineering steps were modified over time and ended up misaligned with the actual input data. The performance metrics used for training the model did not match those used for customer selection. Sometimes the code simply crashed; other times it produced obviously incorrect results; and in the worst cases, it gave plausible-looking results that turned out to be completely wrong on closer inspection (in one instance, predictions were made with several numerical features swapped). Finally, when the product owner asked for it, it proved impossible to reproduce the predictions of a previously trained model, even using the original notebooks! We never found out why, because we ultimately rebuilt the solution from scratch, applying best practices from software engineering, data engineering and machine learning to create a robust, reliable and tested solution, leaving the Jupyter Notebook mess behind us.
When Jupyter Notebooks are actually OK
Despite their drawbacks, Jupyter Notebooks can actually be very valuable if used properly. I usually advise data science teams to…
Use notebooks in three very specific scenarios only!
… each scenario having its own specific rules on how to use a notebook:
1. Individual exploration
- In these notebooks you are free to do whatever you like.
- But you are not allowed to commit them to a version control system such as Git (at least not to shared branches, such as `main` or `development`).
- Move useful code into plain code files and/or a knowledge sharing notebook (see below), where you should add proper documentation and unit tests, and implement other best practices as appropriate.
2. Knowledge sharing
The goal of these notebooks is to explain your code to other developers. Think of this notebook as replacing a PowerPoint presentation to showcase your code.
- The code should be well documented, with a logical story line and clear explanations of what the code is used for and how it works.
- These notebooks should run without errors from start to finish (Restart Kernel and Run All Cells…). This also means you need to specify the environment and dependencies required for running the notebook.
- Usually, you should clear all output from the notebook before saving and committing it to a version control system. You don’t want (sensitive) data appearing in your Git repository. Both this rule and the previous one can be automated, as sketched below.
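One way to enforce those last two rules is to automate them, for example in a pre-commit hook or CI job. Below is a minimal sketch using `nbformat` and `nbclient`; the notebook name is a placeholder, and dedicated tools such as `nbstripout` offer a more polished way to clear outputs.

```python
import nbformat
from nbclient import NotebookClient

# Execute the notebook from top to bottom, just like
# "Restart Kernel and Run All Cells..."; any failing cell raises an error.
nb = nbformat.read("knowledge_sharing.ipynb", as_version=4)
NotebookClient(nb, timeout=600, kernel_name="python3").execute()

# Strip all outputs and execution counts before committing to version control.
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, "knowledge_sharing.ipynb")
```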
3. Hands-on exercises
These notebooks are designed to teach the reader something about a particular topic through a hands-on exercise. This scenario typically consists of a pair of notebooks:
- a solution notebook, which follows the rules of the knowledge sharing notebook, and
- an exercise notebook, which is a copy of the solution notebook but in which several code blocks have been cleared for the reader to work out.
What to do instead of Jupyter Notebooks
1. Develop code files using an IDE
Code files: When developing code to be deployed in production, I would always recommend writing code files (e.g. Python files) instead of using Jupyter Notebooks. Code files provide a more structured and maintainable way to organise your code. They allow you to implement software engineering best practices such as modularisation, version control, and unit testing much more effectively. Such best practices are critical in software development as they ensure the reliability and maintainability of your application.
IDEs: Integrated Development Environments, like PyCharm and Visual Studio Code, offer a range of features that enhance the coding process. Code completion speeds up development and reduces errors, while debugging tools allow you to step through your code, inspect variables, and identify issues more easily. IDEs also have built-in support for version control systems, making it easier to commit changes, resolve conflicts, and collaborate with other developers. Additionally, IDEs help you organise your project files and dependencies, making it easier to manage large codebases.
Notebooks on the side: Even when production code lives in plain code files, Jupyter Notebooks are still a great tool for individual exploration and experimentation during the development process. By using exploratory notebooks on the side, you can iterate rapidly on your ideas and gain insights into your data or problem domain before deciding on a final solution. Once you are satisfied with the results in the notebook, you can copy your code into code files for further refinement, testing, optimisation, and integration into modules of a larger codebase, as sketched below.
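As a small, hypothetical illustration of that last step, a transformation that started life in an exploratory notebook cell can be moved into a module and covered by a unit test; the file, function and column names below are made up.

```python
# features.py (extracted from an exploratory notebook cell)
import pandas as pd


def add_revenue_per_customer(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a revenue-per-customer column added."""
    out = df.copy()
    out["revenue_per_customer"] = out["revenue"] / out["customers"]
    return out


# test_features.py (a unit test the notebook version never had)
def test_add_revenue_per_customer():
    df = pd.DataFrame({"revenue": [100.0, 50.0], "customers": [10, 5]})
    result = add_revenue_per_customer(df)
    assert result["revenue_per_customer"].tolist() == [10.0, 10.0]
```

Running `pytest` would then pick up and execute the test automatically.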
2. Create and use template solutions
Creating and using template solutions can greatly enhance productivity, code quality and collaboration between developers. Templates provide a standardised starting point for new projects, incorporating best practices and common workflows: predefined structures for loading data, preprocessing, model training and evaluation, as well as integration with CI/CD pipelines and MLOps platforms. By starting from a template, you avoid reinventing the wheel, ensure that new projects adhere to established standards and are built on a solid foundation, and reduce the likelihood of errors and inconsistencies. This not only speeds up the development process but also makes your codebase easier to maintain and extend. Finally, projects built on a standardised template are much easier to hand over to a colleague for further development or maintenance.
As I argued back in 2020 (in Data science is boring), most machine learning projects follow very similar workflows. Therefore, creating a new solution usually requires only small changes to an existing one. In fact, a data scientist should really only need to specify:
- the required training data,
- the base model structure,
- the training algorithm, and
- the performance metric to be optimised.
The rest of the solution can and should be automated. Template solutions are perfect for this, and once you have created a single solution, it can essentially serve as your first template. (Please invest a little time in preparing a proper template repository, though: you don’t want specific details of your current project cropping up in future projects.) When I’m asked to build a new machine learning model, I use a template repository where possible, make the necessary changes, and am ready to run the first full version of my new ML pipeline, perhaps faster than I could finish my first experiment in a Jupyter Notebook.
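As a hedged sketch of what this can look like in practice, the project-specific choices listed above can be captured in one small configuration object, while the template provides the generic pipeline around it. All names below (the config fields, the `run_pipeline` helper, the model choice) are hypothetical and would come from your own template repository.

```python
from dataclasses import dataclass
from typing import Callable

from sklearn.ensemble import GradientBoostingClassifier  # hypothetical model choice


@dataclass
class ProjectConfig:
    """The only parts a data scientist should have to specify."""
    training_data: str        # e.g. a table name or file path
    base_model: Callable      # callable that returns an untrained model
    training_algorithm: str   # e.g. "gradient_boosting"
    metric: str               # the performance metric to optimise


def run_pipeline(config: ProjectConfig) -> None:
    """Generic pipeline provided by the template repository (not shown here):
    data loading, preprocessing, training, evaluation and deployment."""
    ...


# A new project then boils down to filling in the config:
config = ProjectConfig(
    training_data="customer_features.parquet",
    base_model=GradientBoostingClassifier,
    training_algorithm="gradient_boosting",
    metric="roc_auc",
)
run_pipeline(config)
```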
3. Implement MLOps
Implementing an MLOps platform can significantly streamline the development and deployment of machine learning models, enhancing efficiency and improving quality (as I’ve written about in my blog The Power of MLOps). By automating repetitive tasks such as defining an ML pipeline, data preprocessing, model training, and deployment, MLOps allows data scientists to focus on understanding the business, finding the most valuable opportunities for analytics and machine learning, and experimenting with new ideas.
Building an MLOps platform as a self-service platform further enhances the efficiency of the machine learning development process (see Empowering Efficiency — Building a Self-Service Platform for Analytics). This approach reduces the time and effort required to move from experimentation to production, while ensuring that models are deployed reliably and consistently. If you want to give innovation a boost in your organisation, this is the way to go!
Conclusion
While Jupyter Notebooks have their place in the data science toolkit, they come with significant drawbacks that can hinder the development of reliable and maintainable systems. By understanding these limitations and following best practices, you can enhance the quality and reliability of your work, while increasing your efficiency and productivity.
- Use Jupyter Notebooks only for individual exploration, knowledge sharing and hands-on exercises.
- Develop production-ready code using an IDE.
- Use customisable templates to get started quickly.
- Implement MLOps to enhance efficiency and ensure quality.
This way, you can quickly build solutions that are not only functional, but also reliable and maintainable in the long run.
About the author
Erik Jan de Vries is an award-winning ML product architect / engineer and freelance consultant. While AI tools were used in the writing process for this blog, all the ideas and arguments presented here are my own. I’m always up for a chat about AI and ML, feel free to contact me on LinkedIn.