Notebooks
Data practitioners can experiment machine learning algorithms spinning up Jupyter notebook with access to all your datasets. data.all leverages Amazon SageMaker instance to access Jupyter notebooks.
📦 Create a Notebook
⚠️ Pre-requisites
To use Notebooks you need to introduce your own VPC ID or create a Sagemaker Studio domain inside a VPC (read the docs). Provisioning the notebook instances inside a VPC enables the notebook to access VPC-only resources such as EFS file systems.
To create a Notebook, go to Notebooks on the left pane and click on the Create button. Then fill in the following form:
Field | Description | Required | Editable | Example |
---|---|---|---|---|
Sagemaker instance name | Name of the notebook | Yes | No | Cannes Project |
Short description | Short description about the notebook | No | No | Notebook for Cannes exploration |
Tags | Tags | No | No | deleteme |
Environment | Environment (and mapped AWS account) | Yes | No | Data Science |
Region (auto-filled) | AWS region | Yes | No | Europe (Ireland) |
Organization (auto-filled) | Organization of the environment | Yes | No | AnyCompany EMEA |
Team | Team that owns the notebook | Yes | No | DataScienceTeam |
VPC Identifier | VPC provided to host the notebook | No | No | vpc-…… |
Subnets | Subnets provided to host the notebook | No | No | subnet-…. |
Instance type | [ml.t3.medium, ml.t3.large, ml.m5.xlarge] | Yes | No | ml.t3.medium |
Volume size | [32, 64, 128, 256] | Yes | No | 32 |
If successfully created we can check its metadata in the Overview tab. Unlike other data.all resources, Notebooks are non-editable.
☁️ Check CloudFormation stack
In the Stack tab of the Notebook, is where we check the AWS resources provisioned by data.all as well as its status. As part of the Notebook CloudFormation stack deployed using CDK, data.all will deploy:
- AWS EC2 Security Group
- AWS SageMaker Notebook Instance
- AWS KMS Key and Alias
🗑️ Delete a Notebook
To delete a Notebook, simply select it and click on the Delete button in the top right corner. It is possible to keep the CloudFormation stack associated with the Notebook by selecting this option in the confirmation delete window that appears after clicking on delete.
💻 Open JupyterLab
Click on the Open JupyterLab button of the Notebook window to start writing code on Jupyter Notebooks.
⏹️ Stop/Start instance
As we briefly commented, data.all uses AWS SageMaker instances to access Jupyter notebooks. Be frugal and stop your
instances when you are not developing. To do that, close the Jupyter window and click on
Stop Instance in the Notebook buttons. It takes a couple of minutes, just refresh and check the Notebook Status
in the overview tab. It should end up in STOPPED
.
✅ Save money, stop your instances
This feature allows users to easily manage their instances directly from data.all UI.
Same when you are coming back to work on your Notebook, click on Start instance to start the SageMaker instance.
In this case the Status of the notebook should first be PENDING
and once the instance is ready, INSERVICE
.
🏷️ Create Key-value tags
In the Tags tab of the notebook window, we can create key-value tags. These tags are not data.all tags that are used to tag datasets and find them in the catalog. In this case we are creating AWS tags as part of the notebook CloudFormation stack. There are multiple tagging strategies as explained in the documentation.