An Introduction to Colab and Jupyter for Beginners

I strongly suggest watching the video for all the screen recordings and visuals!

Let’s talk about something that has become super popular in data science, machine learning, and all sorts of coding adventures: Jupyter Notebooks and Google Colab. These are two fantastic environments that let you write code, see the outputs right away, add notes and explanations for yourself or others, and basically craft these interactive experiences that can go far beyond a traditional script file.

You might have heard of both and thought they were basically the same thing, or maybe you’ve never really used either and want to know which is best to start with. Whichever category you’re in, today I’ll walk you through a friendly introduction to Jupyter Notebooks and Google Colab, compare them, and share some tips on when you might want to use each one.

Let’s start with Jupyter Notebooks.

Jupyter Notebooks are pretty much a staple in the Python ecosystem. Instead of writing all your Python commands in one big file called something like myscript.py, you break your code into smaller, independent cells. Each cell can be run on its own without affecting or forcing you to re-run the rest of the code. This is a huge convenience because it means you can experiment with a snippet of code, see what it does, and if you like the results, you move on. If it’s not working, you can tweak it without having to restart your entire program. This makes your coding process a lot more interactive and exploratory.
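As a minimal sketch of what that cell-based workflow looks like (cell boundaries shown as comments, since in a real notebook the state lives in the kernel, not the file):

```python
# Cell 1: load and prepare some data (run once; the result stays in memory)
numbers = [3, 7, 1, 9, 4]
total = sum(numbers)

# Cell 2: experiment freely; re-running just this cell is cheap, because
# `numbers` and `total` still live in the kernel's memory from Cell 1
average = total / len(numbers)
print(average)  # 4.8
```

Each "cell" can be re-run independently, and whatever it defines stays available to the cells you run after it.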

Jupyter Notebooks also let you insert special cells that hold text written in Markdown. Markdown is just a simple way of formatting text, so you can create headers, italic or bold text, and bullet points for your notes, or even embed images and LaTeX equations. This feature is super cool if you like to keep your code and your explanations together. It is especially useful when you want to share your findings with teammates or students, or when you just want a record of your own thought process alongside code you can run live. You can also create raw cells, which hold content that isn't executed or rendered in any special way; it passes through untouched when the notebook is converted to another format.

Another term you’ll hear a lot in the Jupyter world is the kernel. The kernel is basically the computational engine that runs your code. When you hit the run button on a cell, the kernel takes your code, executes it, and shows the result right there in the notebook. If the kernel crashes, you lose all your variables, so you may have to restart it and reimport your libraries, which means re-running your code from the start, just as you would with a plain Python file. Likewise, if you want to start fresh, you can restart the kernel so that all the variables and state are wiped out, which helps you avoid those weird situations where your notebook’s memory is in some unknown state.
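Here’s a hypothetical example of the kind of hidden state that a kernel restart cleans up:

```python
# Cell 1: define a variable, then imagine deleting this cell afterwards
greeting = "hello"

# Cell 2: this still works, because `greeting` lives in the kernel's
# memory even though the cell that defined it is gone
print(greeting.upper())  # HELLO

# Restarting the kernel wipes that state, so a clean top-to-bottom
# re-run would immediately reveal the missing definition
```

This is why "Restart and Run All" is a good sanity check before sharing a notebook: it proves the notebook actually works from scratch.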

One of the things I love about Jupyter Notebooks is that you can have immediate feedback on whatever code you write, whether it’s producing a table, a graph, or an error message. If you’re trying to figure out which data transformation to apply or what part of a dataset might have outliers, you can just try something out, run that cell, and see what happens without re-running everything else. It’s no wonder so many data scientists and researchers prefer Jupyter Notebooks for tasks like data cleaning, data visualization, machine learning experiments, and teaching. I preferred it over Python files while doing my Master’s, too.
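For instance, here’s the kind of quick, throwaway check you might run in a single cell while exploring a column of data (the numbers are made up for illustration):

```python
import statistics

# A tiny "exploration cell": does this column have obvious outliers?
values = [10, 12, 11, 13, 250, 12, 11]

mean = statistics.mean(values)
stdev = statistics.stdev(values)

# Flag anything more than 2 standard deviations from the mean
outliers = [v for v in values if abs(v - mean) > 2 * stdev]
print(outliers)  # [250]
```

If the result looks wrong, you tweak this one cell and run it again, without touching the cells that loaded the data.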

Because Jupyter Notebooks are so common, you’ll find that you can install them in multiple ways. One of the simplest is to use pip, the Python package manager, so you just type pip install notebook in your terminal. Ideally, you do this inside an environment to keep your Python setup organized; otherwise, you could really complicate your life.

There are two common ways to create an environment:

First, using venv, which is built into Python. You create a virtual environment by running python -m venv my_env, then activate it with source my_env/bin/activate on Mac/Linux or my_env\Scripts\activate on Windows. Once it's active, install Jupyter inside the environment with pip install notebook.

The second way is using Anaconda, which is popular for data science. If you have Anaconda installed, you can create an environment with conda create --name my_env, activate it using conda activate my_env, and install Jupyter with conda install jupyter.

Both methods keep your projects organized and prevent conflicts between different libraries. If you’re just getting started, Anaconda is often recommended because it comes with many useful tools pre-installed and has a user-friendly interface, but venv is a great lightweight alternative if you’re comfortable working in the terminal. You can also ask ChatGPT for the specific steps to set this up easily! ;)

Then, once it’s installed, you can open a terminal or command prompt, type jupyter notebook, and it’ll pop up in your web browser. You’ll see what looks like a file manager in the browser, letting you navigate to whatever folder you want and create a new notebook there. It’ll open in a new tab, and voilà, you’re ready to start coding. If you have multiple versions of Python installed, just be sure that your kernel is set to the one you want to use.

Now, once you have Jupyter running, the files you see or can modify are the ones on your computer. You can store your notebooks wherever you like on your local machine. The advantage of doing everything locally is that you’re in total control and you can install any version of a library you want. You can also work offline, which might be useful if you’re on a plane or in a location with an unreliable internet connection. The downside is that you’re limited by your computer’s hardware. If you have an older laptop, it might be slow. That’s where Google Colab comes in as a handy alternative.

A note again on watching the video here if you’d like the visuals! :) 

Google Colab is essentially Jupyter in the cloud, and you can use it for free. Instead of installing anything on your computer, you skip the whole Anaconda setup: just open your browser, head to the Colab website, and create a new notebook, or open someone else’s with a single click on a link. All your notebooks live in Google Drive, and Colab links directly to them, which is quite convenient if you already use Google Drive for storage. It provides all the same features as Jupyter (cells, code, Markdown, text rendering) and even the same look and feel. The big difference is that your code is actually running on a virtual machine in Google’s data centers. Instead of using your local CPU or GPU, you’re using the remote system’s hardware.

In case you’re curious about a straightforward demonstration of how to use Colab: you basically open your browser, go to the Colab website, and you’ll see a big button that says something like New Notebook. Once you click that, you get a fresh notebook. You can type something like print("Hello, I am a Colab Notebook") in the cell, then press Shift+Enter to run it, and the output appears below the cell. You can add more cells, rename the notebook, and so on. If you want to get fancy and see if you have GPU access, you open the Runtime menu and change the runtime type to GPU. Then if you type something like !nvidia-smi in a cell, it’ll show you the GPU information of the system you are using. And yes, you have access to a GPU for free, though it’s limited compared to paid users.

Colab has a lot of advantages. For one, you don’t have to worry about installing commonly used data science libraries like NumPy, PyTorch, or Pandas, because Colab already has a bunch of them installed by default. In many cases, you can just write your import statements and start coding, which makes prototyping and quick testing super fast, much like reducing friction when you try to build better habits. And if you need a specific library that isn’t installed, you can install it right there in the notebook by running a command like !pip install library-name; the extra exclamation mark makes the cell run like a terminal. Another huge perk is that Colab offers free access to GPUs and even TPUs. That’s a game-changer if you want to train a deep learning model or do any computation that would normally overwhelm your personal computer. You can just select a GPU runtime as we did and harness that extra power at no cost. Of course, it’s not unlimited: Google might time you out if your session is idle for too long or if you exceed their usage limits. But for many learners and practitioners, it’s more than enough to experiment with large datasets, train models, or try new techniques, especially for quick tests.

Another advantage of Colab is how easy it is to share notebooks with other people. It’s basically the same process as sharing a Google Doc. You can let people view, comment, or edit. Colab also saves your changes automatically. You can literally close your browser or have your computer crash, and your notebook is saved in Drive.

Of course, Colab has its limitations too. Because it’s cloud-based, you need a decent internet connection. If your connection drops and you lose your session, you might lose some of your work unless you saved it carefully or your environment was pinned. And there are usage limits for how long you can run a notebook or how much memory you can use. If you want more consistent performance or resources, there’s a paid tier called Colab Pro, which unlocks more GPU time and other perks. Another thing to consider is that you can’t just access local files on your computer unless you manually upload them or link them through some external service. The easiest route is mounting Google Drive in your notebook, which just takes a couple lines of Python. That way you can read or write files as if they were in a local folder.
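For reference, mounting Drive really is just a couple of lines. Note that this only works inside a Colab runtime, where the google.colab package is available (it won’t run in a local Jupyter session):

```python
# Only available inside Google Colab, not in a local Jupyter session
from google.colab import drive

# Mount your Drive under /content/drive; Colab will ask you to authorize access
drive.mount('/content/drive')

# After that, your Drive files are readable like any local path, e.g.:
# open('/content/drive/MyDrive/data.csv')
```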

Now, you might be wondering which environment is right for you, Jupyter or Colab. The honest answer is that it depends on what you’re doing. If you like controlling everything, want your environment set up exactly how you like it, need offline usage, and have a decent computer and graphics card, Jupyter Notebooks might be the ideal solution. If you don’t want to install anything, want to test something right now, need more computational power than your laptop can handle, or want easy collaboration with others, Colab might be a better choice. I typically use both depending on what I want to do. You can also develop locally, then upload to Colab if you suddenly need more resources or want to share with a friend. It’s pretty flexible.

As an example, let’s say you’re a data scientist exploring a new dataset. You might fire up a Jupyter Notebook locally, load the data, and start cleaning and visualizing. You’ll create a couple of cells for reading the data, maybe some explanation in Markdown to keep track of your logic, and a few code cells for exploratory plots. Once you have a sense of the data, you might do some machine learning experiments. If they’re not too big, you’re good to go. You can run them on your local CPU. But if you want to try some large neural network or an algorithm that benefits a lot from GPU acceleration, you might then say, “Alright, let’s upload this notebook to Colab, turn on GPU mode, and see what kind of performance boost we can get.” You can store your dataset on Google Drive and then link it. And once you’ve done that, you can run the same code in the cloud, usually without many modifications. The difference is that you might see it run in a fraction of the time, or at least you’re not tying up your own machine.

Another scenario is teaching and sharing tutorials. When I’ve taught Python or data science, using Colab is amazing. Students just need a Google account. They click a link, copy a notebook, and they can follow along, executing and testing stuff live while we do the course. No more messing around with instructions like “Alright, install Conda, open the command prompt, install this library, set up your environment,” or fixing tons of local environment bugs. On the other hand, if you’re teaching in an environment where installing Python is acceptable, or your students already have it, and your task doesn’t require much computing power (or they have powerful PCs), using Jupyter is the classic approach. You give them a Jupyter notebook file (.ipynb), they open it in their local Jupyter, and they can code away. In either case, they get the magic of an interactive notebook, which is perfect for learning.

It’s not all sunshine and rainbows, of course. Colab can be annoying. Sometimes your session will reset if it’s idle for a while, or you might lose data that wasn’t saved properly. You always have to be mindful of how much time you have on a GPU session or whether you’re hitting any usage caps. At the end of the day, while notebooks are fantastic for exploration, testing ideas, and teaching, they aren’t the only way to write Python. If you’re building a full-fledged application, deploying machine learning models to production, automating tasks, or working on large software projects, traditional Python scripts (.py files) are often the better choice. They integrate more easily into production pipelines, can be packaged into applications, and offer better structure for complex projects. That said, notebooks always have a place, whether for prototyping, debugging, or quickly trying out new concepts before translating them into a more permanent codebase.

I hope you found this introduction to notebooks useful and that you now better understand why they exist. Thanks for reading, and happy coding!

P.S. This article was made for our full Python course. Master the most in-demand skill for building AI-powered solutions — from scratch: https://academy.towardsai.net/courses/python-for-genai?ref=1f9b29