Functions and Capabilities in Kaggle
In this article, we walk through the core concepts of the Kaggle environment, cover the fundamentals of the Kaggle platform, and take a detailed tour of Kaggle's different options and functions.
Table of Contents
- Basic Functions and Capabilities
- Advanced Capabilities
- Setting a Notebook as a Utility Script or Adding a Utility Script
Basic Functions and Capabilities
On the right-side panel, we have quick menu actions that give access to frequently
used notebook features. In the following screenshot, we take a more
detailed look at these quick menu actions.
As you can see, the first quick menu actions are grouped under the Data
section. Here, you have buttons to add or remove datasets from the
notebook. By clicking the Add Data button, you can add an existing
dataset; a search text box and quick buttons let you select from your
datasets, competition datasets, and notebooks. When you select your
notebooks, you can include the output of those notebooks as data sources for the
current notebook. There is also an upload button next to the Add Data
button, which you can use to upload a new dataset before adding it to the
notebook. In the same Data section on the panel, you have the input and
output folder browser, with buttons next to each item so that you can copy the
path to any folder or file.
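As a minimal sketch of how these copied paths are typically used: everything attached to the notebook is mounted read-only under /kaggle/input, and anything written to /kaggle/working is saved as notebook output.

```python
import os

# Walk the read-only input folder to see every file attached to the notebook
for dirname, _, filenames in os.walk("/kaggle/input"):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Files written under /kaggle/working are kept as the notebook's output
with open("/kaggle/working/results.txt", "w") as f:
    f.write("example output")
```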
Right under the Data section, we have the Models section (see Figure 1.1).
Here, we can add models to the notebook. Models is a new feature on the
platform, and it allows you to use powerful pre-trained models in your
notebook. In the Notebook options section, we can configure the accelerator, the
language, the persistence option, the environment, and internet access as per
our preferences (see Figure 1.1). By default, the notebook will use a Central
Processing Unit (CPU) only. See the following screenshot for the expanded
view of Add Data, Add Models, and Notebook options in the right-hand
side panel.
You can search datasets by name or path, and quick filters let you
search within competitions or the output of your notebooks. For Models as well,
you can search by name and filter by type: text, image, computer vision, or
video. Notebook options allows the selection of an accelerator type
(None means CPU only), the programming language, the persistence type, and the environment option.
By choosing the accelerator, you can switch to using one of the two
hardware accelerator options: Graphics Processing Unit (GPU) or
Tensor Processing Unit (TPU). The technical specifications for the CPU
and accelerator configurations, at the time of writing, are given
in the accompanying table. For all these specifications, whether CPU or GPU, you have
a maximum of 12 hours of continuous execution time; in the case of TPUs,
the execution time is limited to 9 hours. The input data size is not
limited, but the output is limited to 20 GB. An additional 20 GB can be used
temporarily during the run, but it will not be saved afterwards.
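As a quick sanity check after changing the accelerator setting, you can ask the framework what it actually sees. A minimal sketch using PyTorch, which is preinstalled in the Kaggle Python environment:

```python
import torch

# Confirm that the accelerator selected in Notebook options is visible;
# on a CPU-only or TPU session this prints the fallback message instead
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected; running on CPU (or TPU)")
```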
By default, your notebook is set to not use any persistence. You can opt to
ensure persistence for files and variables, files only, or variables only.
You can set your notebook to keep the original environment or to always use
the latest available environment. Depending on which libraries you use and what
data processing you perform, it may be preferable to work with the
original environment or with the latest available one. When you
select the original environment, its settings will
be kept every time you run new versions of the notebook. With the
alternative option, the environment with its predefined library versions will be updated to the latest version.
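Since library versions differ between the original and the latest environments, it can help to record what a given run actually used. A minimal sketch (numpy and pandas are preinstalled in the Kaggle Python environment):

```python
import sys
import numpy as np
import pandas as pd

# Record the interpreter and key library versions for this run, so results
# can later be traced back to a specific environment
print("python:", sys.version.split()[0])
print("numpy:", np.__version__)
print("pandas:", pd.__version__)
```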
Internet access is preset to "On", but in some cases you will want to
set it to "Off". For certain code competitions, internet access is not allowed. In
such cases, you can still download resources from the
internet in your training notebook, but when running the inference notebook for
that code competition, you will have to make sure that every
needed resource is either internal to the notebook or in one of the attached
models, utility scripts, or datasets.
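For example, installing a package looks different with and without internet access; the dataset and wheel names below are hypothetical:

```python
# With internet access "On", packages can be installed straight from PyPI
!pip install -q lightgbm

# With internet access "Off" (as in many code competitions), attach the
# package files as a dataset beforehand and install from the local wheel
# (hypothetical dataset and wheel names):
!pip install -q /kaggle/input/my-offline-wheels/my_package-1.0.0-py3-none-any.whl
```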
We have now seen the basic features of notebooks and how to add data and
models and configure the running environment. Let's see now what the
more advanced features are.
Advanced Capabilities
Basic notebook functionality allows us to perform quick experiments, test
ideas, and prototype solutions. If we want to build more complex functionality, however, we will need to write reusable code, separate configuration (including secrets, like API keys) from code, and even
integrate our code with external systems or components.
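As an example of keeping secrets out of code, Kaggle Notebooks provides a secrets store under the notebook's Add-ons menu. A minimal sketch, assuming a secret with the hypothetical label MY_API_KEY has already been added there:

```python
from kaggle_secrets import UserSecretsClient

# Fetch the secret at runtime instead of hard-coding it in the notebook;
# "MY_API_KEY" is a hypothetical label set under Add-ons > Secrets
user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("MY_API_KEY")
```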
The Kaggle environment offers generous computational resources, but these
are limited. We might want to combine Kaggle Notebooks with external
resources, or to integrate Kaggle notebooks and datasets with other
components, such as Google Cloud or our local environment. In the upcoming
sections, we will learn how to achieve all of this.
Setting a Notebook as a Utility Script or Adding a Utility Script
In most cases, you will write all the code for your notebook in successive
cells of the same file. For more complex code, and especially when you
would like to reuse some of the code without copying it between
notebooks, you can choose to develop utility modules. Kaggle Notebooks
offers a useful feature for this purpose, namely utility scripts.

Utility scripts are created in the same way notebooks are: you
start a notebook and then choose the Set as Utility Script item from
the File menu. If you want to use a utility script in your current notebook,
you select the Add utility script item from the File menu.
This opens a selector window for utility scripts in the right-side panel,
where you can choose from your existing utility scripts and add one or
more to your notebook. As you can see in the following screenshot, added
utility scripts appear with a + button next to them (seen on the left panel)
and are added to the notebook under a separate group, usr/lib Utility
Scripts, just under the Input data section and before the Output data
section (seen on the right panel).
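For illustration, here is a hypothetical version of the data_quality_stats utility script referenced later in this section; the real script's contents may differ:

```python
# data_quality_stats - hypothetical contents of the utility script
import pandas as pd

def missing_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    total = df.isnull().sum()
    percent = 100 * total / len(df)
    return pd.concat([total, percent], axis=1, keys=["Total", "Percent"])
```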
To use the utility script within your code, you will have to import the module
in the same way you import Python packages. In the following snippet,
we import a function included in one utility script and apply it (the usage
assumes missing_data takes a DataFrame, as in the hypothetical sketch above).
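```python
from data_quality_stats import missing_data
import pandas as pd

# Apply the utility function to any DataFrame loaded in the notebook
df = pd.read_csv("/kaggle/input/titanic/train.csv")  # example input path
missing_data(df)
```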
As you can see, the function missing_data is defined in the utility script
data_quality_stats.
Conclusion
In this article, we took a detailed tour of Kaggle's functions and capabilities. Complex coding tasks can feel cumbersome to tackle alone, but Kaggle provides a smooth and flexible environment, with ready-made resources and automation that ease the work. The experts who have been active on the platform for many years also help newcomers grow and offer practical guidance for approaching problems on Kaggle in different ways.