Ubuntu – How to upload a data set from a command line (such as Google Colaboratory) to Kaggle

Tags: command line, scripts, upload

I have found commands for uploading a file or data set from Google Colaboratory or a Linux terminal to GitHub (see my previous question).

However, I have no idea how to upload a data set from Google Colaboratory or a Linux shell to Kaggle directly via commands. How can I achieve that?

Best Answer

1 Preparation

Based on the official Kaggle API documentation:

Install the Kaggle command-line interface (here via pip, the Python package manager):
```
sudo apt install python3-pip
pip3 install --user kaggle
```
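Because of the `--user` flag, pip usually places the `kaggle` executable in `~/.local/bin`, which is not on `PATH` in every environment. The following is a minimal sketch, assuming that default location, which adds the directory for the current session and reports whether the command can be found:

```shell
# Add ~/.local/bin to PATH for this session if it is not already there.
# Assumption: pip's user-level scripts directory is ~/.local/bin (the usual default on Linux).
case ":$PATH:" in
    *":$HOME/.local/bin:"*) ;;                        # already present, nothing to do
    *) PATH="$HOME/.local/bin:$PATH"; export PATH ;;  # prepend for this session only
esac

# Report if the kaggle command still cannot be found.
command -v kaggle >/dev/null 2>&1 || echo "kaggle not found on PATH; check the pip install step" >&2
```

To make the change permanent, the same `PATH` line could be appended to `~/.profile` or `~/.bashrc`.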
Create a configuration directory for the next step (`-p` avoids an error if the directory already exists):
```
mkdir -p ~/.kaggle
```
Authentication:

In order to use Kaggle’s public API, you must first authenticate using an API token. From the site header, click on your user profile picture, then on “My Account” from the dropdown menu. This will take you to your account settings at https://www.kaggle.com/account. Scroll down to the section of the page labelled “API”.

To create a new token, click on the “Create New API Token” button. This will download a fresh authentication token onto your machine.

Store it as ~/.kaggle/kaggle.json, since that’s where the CLI will look for it by default. You can simply copy and paste that path into the file selection dialogue of your web browser.
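If the browser has already saved the token somewhere else, or you are on a machine without a browser, the file can be put in place from the shell. The sketch below is one way to do it; the function name `install_kaggle_token` and the `~/Downloads` path in the example call are assumptions for illustration. It also restricts the file to your user, since the token is a credential.

```shell
# install_kaggle_token SRC [DEST_DIR]
# Copies a downloaded API token to DEST_DIR/kaggle.json and restricts its permissions.
# DEST_DIR defaults to $KAGGLE_CONFIG_DIR if set, otherwise to ~/.kaggle.
# (Hypothetical helper, not part of the Kaggle CLI.)
install_kaggle_token() {
    src="$1"
    dest_dir="${2:-${KAGGLE_CONFIG_DIR:-$HOME/.kaggle}}"

    # Refuse to continue if the token file does not exist.
    [ -f "$src" ] || { echo "token not found: $src" >&2; return 1; }

    # Create the configuration directory, readable only by the owner.
    mkdir -p "$dest_dir" && chmod 700 "$dest_dir" || return 1

    # Copy the token and make it readable and writable by the owner only.
    cp "$src" "$dest_dir/kaggle.json" && chmod 600 "$dest_dir/kaggle.json"
}

# Example call (the Downloads path is an assumption):
# install_kaggle_token "$HOME/Downloads/kaggle.json"
```

In Google Colaboratory, the token first has to be uploaded to the notebook's file system; the function can then be called with that upload path.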

2 Dataset Upload

Again from the same official API documentation:

Create a new Dataset

Here are the steps you can follow to create a new dataset on Kaggle:

1. Create a folder containing the files you want to upload.
2. Run `kaggle datasets init -p /path/to/dataset` to generate a metadata file.
3. Add your dataset’s metadata to the generated file, dataset-metadata.json (called datapackage.json in older versions of the documentation).
4. Run `kaggle datasets create -p /path/to/dataset` to create the dataset.

Your dataset will be private by default. You can also add a -u flag to make it public when you create it, or navigate to “Settings” > “Sharing” from your dataset’s page to make it public or share with collaborators.
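Editing the metadata file by hand is awkward in a notebook or script, so the steps above can also be sketched non-interactively. The helper name `write_dataset_metadata` and the example values are hypothetical; the three fields shown (`title`, `id`, `licenses`) are assumed to match what `kaggle datasets init` generates, so compare with a freshly generated file before relying on this.

```shell
# write_dataset_metadata DIR OWNER SLUG TITLE
# Writes DIR/dataset-metadata.json directly instead of editing the file
# produced by "kaggle datasets init". (Hypothetical helper.)
# Note: TITLE is inserted verbatim, so it must not contain double quotes or backslashes.
write_dataset_metadata() {
    dir="$1"; owner="$2"; slug="$3"; title="$4"
    mkdir -p "$dir" || return 1
    cat > "$dir/dataset-metadata.json" <<EOF
{
  "title": "$title",
  "id": "$owner/$slug",
  "licenses": [{"name": "CC0-1.0"}]
}
EOF
}

# Example usage (user name, slug and title are placeholders):
# write_dataset_metadata /path/to/dataset your-username my-dataset "My Dataset"
# kaggle datasets create -p /path/to/dataset        # private by default
# kaggle datasets create -p /path/to/dataset -u     # or public from the start
```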

Create a new Dataset version

If you’d like to upload a new version of an existing dataset, follow these steps:

1. Run `kaggle datasets init -p /path/to/dataset` to generate a metadata file (if you don’t already have one).
2. Make sure the id field in dataset-metadata.json (or datapackage.json) points to your dataset.
3. Run `kaggle datasets version -p /path/to/dataset -m "Your message here"`.
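A forgotten or placeholder id is an easy mistake when updating a dataset from a script. The sketch below checks the metadata file before the upload is attempted. The function name `metadata_ready` is hypothetical, and the check assumes that the unedited template contains placeholder values starting with `INSERT_`.

```shell
# metadata_ready DIR
# Succeeds only if DIR/dataset-metadata.json exists, has an "id" field,
# and no longer contains template placeholders. (Hypothetical helper.)
metadata_ready() {
    meta="$1/dataset-metadata.json"

    [ -f "$meta" ] || { echo "missing $meta" >&2; return 1; }

    # Assumption: an unedited template still contains values such as INSERT_SLUG_HERE.
    if grep -q 'INSERT_' "$meta"; then
        echo "placeholder values still present in $meta" >&2
        return 1
    fi

    # The id field must be present so the CLI knows which dataset to update.
    grep -q '"id"' "$meta"
}

# Example usage: only upload the new version if the check passes.
# metadata_ready /path/to/dataset &&
#     kaggle datasets version -p /path/to/dataset -m "Your message here"
```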

These instructions are the basic commands required to get started with creating and updating Datasets on Kaggle. You can find more details in the official documentation on GitHub, in the sections “Initializing metadata”, “Create a Dataset” and “Update a Dataset”.

Looking back at my answer, it turned out to be a nice way of telling you to RTFM. ;-]

Related Solutions

Ubuntu – How to activate/deactivate a gnome-shell extension from command line

It is well described in the GNOME wiki, quoting:

You can do this with the GSettings key, org.gnome.shell.enabled-extensions, or several tools that manipulate this GSettings key, such as GNOME Tweak Tool or a recent version of gnome-shell-extension-tool.

If you invoke gnome-shell-extension-tool --help, you will see that it is capable of enabling and disabling extensions by their name. For example, the following command enables user themes:

gnome-shell-extension-tool -e user-theme

Oh, and you can get the names of all your locally installed extensions by doing ls ~/.local/share/gnome-shell/extensions. It will give you entries of the form the-name@author.

Ubuntu – Cron is executed but script doesn’t work

You seem to be confusing two different methods of invoking cron jobs.

Ubuntu inherits from Debian a somewhat confusing policy of supporting both user crontabs that are stored in a spool area /var/spool/cron, and system-wide cron jobs run from /etc/crontab and the files in /etc/cron.d.

Jobs specified in /etc/crontab or via files in /etc/cron.d need an extra field so that they can be run as a different user, so the format is something like:

*/10 * * * * <username> <command> <args>

Jobs set up via the spool area using crontab -e (or sudo crontab -e for root) already belong to a specific user, and don't need the user field:

*/10 * * * * <command> <args>

If you include the username field in a cron job set up via a crontab -e command, it will be misinterpreted as a command: as we can see from your log output,

Oct 21 07:30:01 stan CRON[7604]: (stan) CMD (stan /home/stan/update.sh)

cron is interpreting stan as a command with argument /home/stan/update.sh

The solution should be simply to remove the username stan from your crontab.
