Ubuntu – How to upload a data set from the command line (such as Google Colaboratory) into Kaggle


I have found some commands for uploading a file or data set from Google Colaboratory or a Linux terminal into GitHub (see my previous question).

However, I have no idea how to upload a data set from Google Colaboratory or a Linux shell directly into Kaggle using commands. How can I achieve that?

Best Answer

1 Preparation

Based on the official Kaggle API documentation:

  1. Install the Kaggle command-line interface (here via pip, the Python package manager):

    sudo apt install python3-pip
    pip3 install --user kaggle
    
  2. Create a configuration directory for the next step:

    mkdir ~/.kaggle
    
  3. Authentication:

    To use Kaggle’s public API, you must first authenticate using an API token. From the site header, click on your user profile picture, then on “My Account” in the dropdown menu. This will take you to your account settings at https://www.kaggle.com/account. Scroll down to the section of the page labelled API.

    To create a new token, click on the “Create New API Token” button. This will download a fresh authentication token onto your machine.

    Store it as ~/.kaggle/kaggle.json, since that’s where the CLI will look for it by default. You can simply copy and paste that path into the file selection dialogue of your web browser.
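Step 3 can also be scripted, which helps on a headless machine or in a Colab notebook where there is no file dialogue (in Colab you would first upload `kaggle.json` to the runtime). The following is a minimal sketch, assuming the token was downloaded to `~/Downloads/kaggle.json`; adjust the path as needed. The function name `install_kaggle_token` is hypothetical. The `chmod 600` follows the Kaggle API README's advice to keep the credentials unreadable by other users.

```shell
# Hypothetical helper: copy a downloaded Kaggle API token to the location
# the kaggle CLI reads by default and restrict its permissions.
install_kaggle_token() {
  local src="${1:-$HOME/Downloads/kaggle.json}"  # assumed download location
  local dest_dir="$HOME/.kaggle"

  if [ ! -f "$src" ]; then
    echo "token not found: $src" >&2
    return 1
  fi

  mkdir -p "$dest_dir"                 # like step 2, but safe to re-run
  cp "$src" "$dest_dir/kaggle.json"
  chmod 600 "$dest_dir/kaggle.json"    # owner read/write only
}
```

After running `install_kaggle_token ~/Downloads/kaggle.json`, you can check that authentication works with `kaggle datasets list --mine`, which lists only your own datasets.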

2 Dataset Upload

Again from the same official API documentation:

Create a new Dataset

Here are the steps you can follow to create a new dataset on Kaggle:

  1. Create a folder containing the files you want to upload.

  2. Run

    kaggle datasets init -p /path/to/dataset
    

    to generate a metadata file.

  3. Add your dataset’s metadata to the generated file, dataset-metadata.json (older versions of the CLI generate datapackage.json instead).

  4. Run

    kaggle datasets create -p /path/to/dataset
    

    to create the dataset.

Your dataset will be private by default. You can also add a -u flag to make it public when you create it, or navigate to “Settings” > “Sharing” from your dataset’s page to make it public or share with collaborators.
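In Colab or another non-interactive shell, editing the generated metadata by hand can be awkward. The following is a minimal sketch that writes the metadata file directly, using the same fields as the template that `kaggle datasets init` produces (`title`, `id`, `licenses`). The helper name `write_dataset_metadata`, the example title and id, and the `CC0-1.0` license choice are placeholders. The sketch also assumes the title and id contain no double quotes or backslashes, because the JSON is built with `printf` rather than a JSON library.

```shell
# Hypothetical helper: write dataset-metadata.json into the dataset folder
# so that `kaggle datasets create -p DIR` can pick it up.
write_dataset_metadata() {
  local dir="$1" title="$2" id="$3"   # id must look like "username/slug"

  case "$id" in
    */*) ;;                            # crude check for the owner/slug form
    *) echo "id must be username/slug, got: $id" >&2; return 1 ;;
  esac

  mkdir -p "$dir"
  # Assumes title and id contain no characters that need JSON escaping.
  printf '{\n  "title": "%s",\n  "id": "%s",\n  "licenses": [{"name": "CC0-1.0"}]\n}\n' \
    "$title" "$id" > "$dir/dataset-metadata.json"
}
```

For example, `write_dataset_metadata /path/to/dataset "My First Dataset" your-username/my-first-dataset` followed by `kaggle datasets create -p /path/to/dataset -u` would create a public dataset. The data files themselves still have to be placed in `/path/to/dataset`, because the helper writes only the metadata.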

Create a new Dataset version

If you’d like to upload a new version of an existing dataset, follow these steps:

  1. Run

    kaggle datasets init -p /path/to/dataset
    

    to generate a metadata file (if you don’t already have one).

  2. Make sure the id field in dataset-metadata.json (or datapackage.json) points to your dataset.

  3. Run:

    kaggle datasets version -p /path/to/dataset -m "Your message here"
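Steps 2 and 3 can be combined so that a new version is only uploaded when the metadata really points at the intended dataset. The following is a minimal sketch. The function names are hypothetical. The id check is a simple `grep` that assumes the id is written on one line, as in the generated template. The file lookup tries dataset-metadata.json first and falls back to the older datapackage.json name.

```shell
# Hypothetical helper: print the metadata file in DIR, preferring
# dataset-metadata.json and falling back to the older datapackage.json.
find_dataset_metadata() {
  local dir="$1" name
  for name in dataset-metadata.json datapackage.json; do
    if [ -f "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  echo "no metadata file in $dir; run: kaggle datasets init -p $dir" >&2
  return 1
}

# Succeed only if the metadata's "id" field equals the expected id
# (simple regex match; assumes the id is on a single line).
metadata_points_to() {
  local dir="$1" expected_id="$2" file
  file=$(find_dataset_metadata "$dir") || return 1
  grep -Eq "\"id\"[[:space:]]*:[[:space:]]*\"${expected_id}\"" "$file"
}

# Upload a new version only after the id check passes.
publish_new_version() {
  local dir="$1" expected_id="$2" message="$3"
  if ! metadata_points_to "$dir" "$expected_id"; then
    echo "id in metadata does not match $expected_id" >&2
    return 1
  fi
  kaggle datasets version -p "$dir" -m "$message"
}
```

For example, `publish_new_version /path/to/dataset your-username/my-first-dataset "Your message here"` refuses to upload if the folder's metadata names a different dataset.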
    

These are the basic commands needed to start creating and updating datasets on Kaggle. More details are available in the official Kaggle API documentation on GitHub.

Looking back at my answer, it turned out to be a nice way to tell you to RTFM. ;-]