Like many managers in the corporate world, until recently I thought you should not use these tools. The common theme is that they are for small projects or classroom problems, not for the real world. Then, in the process of designing a new course, I had to work with notebooks. Because all classes use notebooks these days, I thought I should too. After all, if millions of students pay so much money to stare at notebooks, they must love it! Even though, in my opinion, these classes progress at a snail's pace as a result.
For a self-learner like myself, navigating through many small code snippets is the wrong way to learn. I like to see the full solution first and then dive in: the top-down approach. But for most people, the bottom-up approach (the notebook experience, in short) works a lot better. Since my goal in the end is to sell to interested buyers, I jumped into Jupyter, Colab, and so on.
I thought I would limit the use of notebooks to teaching activities only. Working with Google Colab did not start well. Eventually, though, I found notebooks a lot more useful than I imagined, and I have now adopted them for some development activities. In the remainder of this article, I share my experience, the pluses and the minuses, and how to get the best out of notebooks. It is aimed at people with little or no practical experience with them, possibly people who think that they are a waste of time.
What is a notebook?
A notebook is a document that contains both code and text. The code is executable step by step. Each code snippet, short or long, constitutes an element called a cell. You can run cells sequentially, stop, and resume hours later from where you stopped; nothing is lost. By the end, the output is the same as if you had run the full Python script in your local environment. The output is also produced sequentially and displayed after each cell. Thus, notebooks work well with linear programs. Programs with many nested loops (best avoided when possible) benefit from having a separate function for each loop, with each function in its own cell.
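As a hypothetical sketch of this one-function-per-cell layout, two nested loops could be split as follows; `row_sum` and `table_sum` are illustrative names, not from any particular project:

```python
# Cell 1: the inner loop becomes its own function
def row_sum(row):
    # sum the entries of one row
    total = 0
    for x in row:
        total += x
    return total

# Cell 2: the outer loop calls the inner function
def table_sum(table):
    total = 0
    for row in table:
        total += row_sum(row)
    return total

# Cell 3: run the computation and display the result
table = [[1, 2], [3, 4]]
print(table_sum(table))  # prints 10
```

Each cell can then be re-run and tested on its own, without touching the others.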
From a physical standpoint, notebook files have the extension .ipynb. You can see an example in my GitHub repository, here. If you click on this notebook, it renders the full (completed) static version as an HTML page. You can download it and open it in your own notebook environment. Then you will be able to run it step by step and modify it, mostly to test or understand the Python code, or for re-use. Besides Python code, some cells may contain Unix commands. These are recognizable because they start with an exclamation point, and they execute just like Python code.
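For instance, a single code cell can mix a Unix command with plain Python (a made-up example, not from the notebook mentioned above):

```
!ls -l        # Unix command: starts with an exclamation point
x = 2 + 2     # regular Python in the same cell
print(x)
```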
How to get started
How to get started with notebooks? In other words, how do you use a notebook viewer / editor, and create and execute notebooks? You can do it remotely from your browser by signing up on Google Colab. Some Python distributions (such as Anaconda) come with notebooks pre-installed. What worked best for me is to install it in my local environment, just as you would install any new Python library, with the command
pip install notebook
(the exclamation point prefix is only needed when running the command inside a notebook cell). Then, to run it, type
jupyter notebook
on the command line. It will open a new browser window and start its own local server. The installation does not modify your local Python environment; you will still be able to use it unchanged.
Colab allows you to run notebooks remotely and share them with other users or students. The input datasets you need to run your code can be imported from Google Drive. You can save your notebook to Google Drive or GitHub from the Colab environment. On occasion, you may have to install some Python libraries if your code uses special ones, such as svd. In this example, do it with
!pip install svd
in a notebook cell (and then run the cell). To display Matplotlib output (plots), run the command
%matplotlib inline
in a cell, once for the entire notebook.
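A minimal plotting cell might then look as follows (a sketch with made-up data, assuming Matplotlib is installed):

```python
import matplotlib.pyplot as plt

# Toy data: x from 0 to 9.9, y = x squared
x = [0.1 * k for k in range(100)]
y = [v * v for v in x]

plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("y = x^2")
```

With %matplotlib inline active, the figure renders directly below this cell, with no need to call plt.show().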
My experience with Colab was not a success. Sometimes you can access files on Google Drive, sometimes not. A library that once worked stops working for unknown reasons. And when you think you have saved the environment (the Python output files), you have not: next time you log on, it is gone, because Colab creates a new virtual machine each time. What works consistently is copying and pasting your code from some other place, or entering it manually. What does not work well is trying to replicate your typical batch-mode working environment, which makes it difficult to use Colab for development purposes or larger projects. Finally, to copy and paste into a cell, Ctrl-V works. You have to know that, but now you do.
I installed Jupyter notebook on my machine; it did not impact my standard working environment. I like the fact that I can run my code incrementally, piece by piece: stop in the middle, make some changes or tests, then resume from where I stopped, without having to re-run the whole script each time. Everything is still there in memory, even days later, as long as I have not closed the notebook.
Then I can document the code and explain the models that I use by adding text cells to the notebook. I can add navigation links as in an HTML page, and even complex mathematical formulas in LaTeX. The output images from Python are also easy to blend into the notebook. And the final step is to share it on GitHub, allowing everyone to replicate the results and get all the documentation needed, all in just one document: the notebook.
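As a hypothetical illustration, a text (Markdown) cell could contain a navigation link together with a LaTeX formula:

```
## Model [back to top](#contents)

We estimate $y = \beta_0 + \beta_1 x + \varepsilon$,
with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$.
```

The notebook renders this as formatted text, with the formula typeset and the link clickable.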
About the Author
Vincent Granville is a pioneering data scientist and machine learning expert, founder of MLTechniques.com and co-founder of Data Science Central (acquired by TechTarget in 2020), former VC-funded executive, author, and patent owner. Vincent's past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, CNET, and InfoSpace. Vincent is also a former post-doc at Cambridge University and the National Institute of Statistical Sciences (NISS).
Vincent has published in the Journal of Number Theory, the Journal of the Royal Statistical Society (Series B), and IEEE Transactions on Pattern Analysis and Machine Intelligence. He is also the author of “Intuitive Machine Learning and Explainable AI”, available here. He lives in Washington state, and enjoys doing research on stochastic processes, dynamical systems, experimental math, and probabilistic number theory.