S2 — Rounding Up In Kaggle Environment
In the previous article, we learned how to create a Kaggle account and covered
the most important things to know about Competitions, Datasets, Code
Notebooks, Discussions, Models and Kaggle Learn. In this article, we
will explore the functionality of Kaggle Notebooks. The names Kernels and Code
are sometimes used as alternatives for Notebooks: Kernels is the old
name, and Code is the current menu name for Notebooks. Both terms, the
old one and the new one, illustrate something important about a notebook on
Kaggle.
Table of Contents
- What Is a Kaggle Notebook?
- How to Create Notebooks
- Exploring Notebook Capabilities
What Is a Kaggle Notebook?
Kaggle Notebooks are integrated development environments that allow you
to write code, version it, and run it using the computational resources of
the Kaggle platform, producing results in various forms. When you start
working on a notebook, you open a code editor. This, in turn, starts a Docker
container, provisioned with the most widely used Python packages for data
analysis and machine learning, which runs in a virtual machine allocated in
Google Cloud. The code itself is linked to a code repository.

You can write code in one of two languages, Python or R. Currently, most
Kaggle users work in Python, and all examples in this article are in Python.

The term Notebooks is used generically, but there are two types of Kaggle
Notebooks:
- Kaggle Scripts
- Kaggle Notebooks
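Since every notebook runs inside a prepared Docker container, it can be useful to check from a code cell what that container provides. The sketch below reports the Python version and whether a few common data-science packages are importable. It also looks for the `KAGGLE_KERNEL_RUN_TYPE` environment variable, which Kaggle sets inside its notebook containers; treat that as an assumption about the platform and confirm it in your own session. The `describe_environment` name and the package list are illustrative choices.

```python
import importlib.util
import os
import platform


def describe_environment(packages=("numpy", "pandas", "sklearn")):
    """Summarize the current Python environment.

    Assumption: Kaggle sets KAGGLE_KERNEL_RUN_TYPE inside notebook
    containers, so its presence is used as a hint that we run on Kaggle.
    """
    info = {
        "python": platform.python_version(),
        "on_kaggle": "KAGGLE_KERNEL_RUN_TYPE" in os.environ,
        "run_type": os.environ.get("KAGGLE_KERNEL_RUN_TYPE", "local"),
    }
    # find_spec tells us whether a top-level package is importable
    # without actually importing it.
    for name in packages:
        info[name] = importlib.util.find_spec(name) is not None
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

The same function runs unchanged on a local machine, where `on_kaggle` is simply `False`.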
Kaggle Scripts
Kaggle Scripts are files whose code is executed sequentially, from top to
bottom. The output of the script's execution is printed in the console. If
you want, you can also execute only part of the script by selecting a
few lines and pressing the Run button. If you develop in R, you can use a
special type of script, the R Markdown script. Its development environment
is similar to the one for Python or R scripts, but you can use R Markdown
syntax, and the output will be a combination of the results of executing
the R code and the text and graphical effects produced by the R Markdown
syntax.
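To illustrate this sequential execution model, here is a minimal Python script of the kind you could run as a Kaggle Script: the statements run from top to bottom, and the `print` output appears in the console. The data and the `summarize` helper are hypothetical placeholders, not part of Kaggle.

```python
import statistics


def summarize(values):
    """Return a few simple statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Hypothetical sample data; a real script would typically load a file.
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    print(summarize(data))  # prints {'count': 8, 'mean': 3.875, 'max': 9}
```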
Kaggle Notebooks
Notebooks have a look and feel similar to that of Jupyter
Notebooks. They are similar but not identical: Kaggle Notebooks have
multiple additional options that support integration with the Kaggle
environment and provide a better user experience. Notebooks are composed of
a succession of cells containing either code or Markdown content, and each
cell can be executed independently. You can code in either R or Python
in Notebooks. When you run a code cell, its output is displayed right
under the cell.

With this brief overview of Kaggle Notebooks and their essential components,
let's now see how you can create a notebook.
How to Create Notebooks
There are several ways to start a notebook. You can start it from the Code
item in the main menu (shown in the figure below), from the context of a
Dataset (also shown in the figure below), from the context of a Competition
(shown in the second figure), or by forking (copying and editing) an existing
notebook.
When you create a new notebook from the Code menu, the new notebook
will appear in your list of notebooks but will not be attached to any Dataset
or Competition. If you start it from a Kaggle Dataset, the dataset will
already be added to the list of data associated with the notebook, and you
will see it in the right-side panel, shown in the fifth figure, when you edit
the notebook. The same is true for a Competition: the dataset associated with
it will already be present in the list of datasets when you initialize the
notebook.
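Inside the notebook, the attached data appears as folders under `/kaggle/input`, the directory that Kaggle's default Python notebook template walks to list the available input files. The sketch below follows the same idea but returns the paths instead of printing them. The `list_input_files` name and its `root` parameter are illustrative, and the default path reflects the usual Kaggle layout rather than anything configured in this article.

```python
import os


def list_input_files(root="/kaggle/input"):
    """Return the paths of all files below root, sorted for stable output.

    Assumption: on Kaggle, attached datasets are mounted under /kaggle/input.
    If root does not exist (e.g. when running locally), os.walk yields
    nothing, so an empty list is returned.
    """
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return sorted(paths)


if __name__ == "__main__":
    for path in list_input_files():
        print(path)
```

Running this right after creating a notebook from a Dataset or a Competition is a quick way to confirm which files were attached.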
To fork (copy and edit) an existing notebook, click the three vertical dots
next to that notebook's Edit button and then select the Copy & edit
notebook item from the drop-down menu.
Once created, the notebook opens for editing, as you can see in the
following screenshot. On the upper-left side, there is a regular menu (File,
Edit, View, Run, Add-ons and Help), with quick-action icons for editing
and running below it. On the right-hand side, there is a collapsible panel
with more quick actions.
The File menu is fairly complex. It offers options for input and output, as
well as various settings for interacting with other resources on the
platform: Models, Utility Scripts and Notebooks. It has menu items to import
an external notebook, to export the current notebook, and even to add data or
models to the notebook. You can also save the current notebook as a
utility script or add a utility script to the notebook. You can set the
language to R or Python (Python is the default), and you can set the current
notebook to be either a script or a notebook (notebook is the default).
Additional options let you publish and share the notebook on GitHub.

To publish the notebook on GitHub, you have to link your Kaggle account to
your GitHub account by authorizing Kaggle to access it. Once you do this,
updates to the notebook will be mirrored on GitHub as well. Using the Share
menu item, you can set who can view or edit the notebook. Initially, you are
the only user with read and write access, but once you add collaborators,
each of them can be given either read and write access or read-only (view)
access. If you make your notebook public, everyone will be able to read it
and to fork (copy and edit) it for their own work.

The Edit menu allows you to move cells up and down or delete a selected
cell. In View, you have options to adjust the look and feel of the editor
(themes, line numbers and editor layout), to show or hide the input or output
of selected cells in the resulting HTML output, and to collapse or expand
cells.

The Run menu provides controls to run one cell, all cells, or all cells
before or after the current one, and to start or stop a session. When you
restart the session, the kernel (the Docker container in which the notebook
is running) is restarted, and all the context data initialized by the cells
you have run so far is reset. This is very useful when, while editing, you
want to reset the environment along with all its variables.

The Add-ons menu groups secrets management, Google Cloud services and the
Google Cloud SDK. Each of these extends the functionality of notebooks, and
they will be presented under the Advanced capabilities section later in this
article.

Now that we have learned how to create, edit and run notebooks, let's
continue by exploring more notebook features.
Exploring Notebook Capabilities
Notebooks serve as powerful tools for data exploration, model training and
running inference. In this section, we will examine the various capabilities
that Kaggle Notebooks have to offer.
We will start with the most frequently used features of notebooks. We
will go through the options to add various resources (data and models) to a
notebook and to modify the execution environment. Then, we will continue with
more advanced features, including setting up utility scripts, adding and
using secrets, using Google Cloud services, and upgrading a notebook to a
Google Cloud AI Notebook.
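As a preview of the secrets option, here is a sketch of reading a secret from code. In a Kaggle notebook, secrets attached through the Add-ons menu are read with `UserSecretsClient` from the `kaggle_secrets` module, which is only available inside Kaggle. The fallback to an environment variable, the `get_secret_value` helper and the `MY_API_KEY` label are our own assumptions, added so the same code also runs outside Kaggle.

```python
import os


def get_secret_value(label):
    """Return the secret stored under label.

    On Kaggle, kaggle_secrets.UserSecretsClient reads secrets attached
    through the Add-ons menu. Outside Kaggle the module is missing, so this
    sketch falls back to an environment variable with the same name
    (an assumption of this example, not Kaggle behavior).
    """
    try:
        from kaggle_secrets import UserSecretsClient
    except ImportError:
        value = os.environ.get(label)
        if value is None:
            raise KeyError(f"secret {label!r} not found in the environment")
        return value
    return UserSecretsClient().get_secret(label)
```

A typical call would be `api_key = get_secret_value("MY_API_KEY")`, where `MY_API_KEY` is a hypothetical label. Avoid printing the returned value, since notebook output can be shared along with the notebook.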
Conclusion
In this article, we explored the fundamentals of the Kaggle environment and learned the basic concepts and methods that help us take the next step in our data science and machine learning journey. We took a descriptive tour of Kaggle Notebooks and their most useful and frequently used options. In the upcoming sections, we will dive deeper into Kaggle and take a hands-on look as well.