NeuralkAI Categorization Example
This example shows how to use the NeuralkAI SDK to predict product categories. We use a subset of the Best Buy dataset.
import os
import tempfile
from pathlib import Path
import polars as pl
from neuralk import Neuralk
from neuralk.datasets import best_buy
Loading username and password

To connect to the Neuralk API, we need to authenticate. Here we read the username and password from environment variables, first attempting to load any variables set in a dotenv file. Then we can create a Neuralk client to connect to the API.
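As a rough sketch of what loading a dotenv file involves (in practice the python-dotenv package handles this, and the variable names here are illustrative assumptions, not documented API), the file is just KEY=VALUE lines that get merged into the process environment:

```python
import os
import tempfile
from pathlib import Path


def load_dotenv_sketch(path):
    """Minimal stand-in for python-dotenv's load_dotenv: parse KEY=VALUE lines."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Like load_dotenv's default behavior, do not override variables
        # that are already set in the environment.
        os.environ.setdefault(key.strip(), value.strip())


# Demonstrate with a throwaway .env file.
with tempfile.TemporaryDirectory() as d:
    env_file = Path(d) / ".env"
    env_file.write_text("NEURALK_USERNAME=alice\nNEURALK_PASSWORD=secret\n")
    load_dotenv_sketch(env_file)

print(os.environ["NEURALK_USERNAME"])
```

With the credentials in the environment, the client is presumably constructed along the lines of Neuralk(username=..., password=...); the exact constructor signature is an assumption here, so check the SDK documentation.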
Uploading datasets into a project

All the datasets and the analyses we run on them belong to a project. Projects are managed through the .projects attribute of the Neuralk client. Here we retrieve the "best_buy" project, creating it if it does not exist yet. Note that we can list all the projects accessible from our client with Neuralk.projects.get_list.
def get_or_create_project(name):
    project = next((p for p in client.projects.get_list() if p.name == name), None)
    if project is None:
        project = client.projects.create(name)
    return project
project = get_or_create_project("best_buy")
The Neuralk SDK package provides, in its datasets module, a few example datasets with which we can try out the API. We obtain the path to a Parquet file containing a (training) subset of the Best Buy data.
# To make the example run fast, we use a small subset; pass subsample=False to run on the full dataset.
local_dataset = best_buy(subsample=True)
pl.read_parquet(local_dataset["train_path"])
To run an analysis, we need to create a dataset on the platform. Datasets are managed through the .datasets attribute of the Neuralk client. The data is uploaded to our project with Neuralk.datasets.create. We pass the project object, a name for the dataset, and the local path of the CSV or Parquet file (in this case, Parquet) to upload.
train_dataset = client.datasets.create(
    project, "best_buy_train", local_dataset["train_path"]
)
Fitting a product categorization workflow
The next step is to fit a workflow on the training data. Later, we will be able to use it to categorize new products for which we do not have a ground truth.
Analyses are managed through the analysis attribute of the Neuralk client. We launch the fit of our workflow by creating a "categorization fit" analysis with Neuralk.analysis.create_categorization_fit.
analysis_fit = client.analysis.create_categorization_fit(
    train_dataset,
    "best_buy_fit",
    target_columns=[
        "neuralk categorization level 0",
        "neuralk categorization level 1",
        "neuralk categorization level 2",
        "neuralk categorization level 3",
        "neuralk categorization level 4",
    ],
)
print("Categorization fit analysis created:", analysis_fit)
Categorization fit analysis created: Analysis(id='0ffd5794-81fa-4656-adce-97f4aaffedf8', name='best_buy_fit', error=None, advancement=None, status='PENDING')
Neuralk.analysis.create_categorization_fit immediately returns an Analysis object that represents the analysis we have just launched. This analysis has not finished running yet. Depending on our needs, we may want to continue with other tasks or end our script at this point. In this example, however, we want to wait and use the fitted model once it is ready. Neuralk.analysis.wait_until_complete allows us to pause the execution of our script and wait for the analysis to finish (or error). It returns a new Analysis object, corresponding to the same analysis but with an updated status.
analysis_fit = client.analysis.wait_until_complete(analysis_fit, verbose=True)
Analysis status: None
Analysis status: PENDING
Analysis status: RUNNING
Analysis status: SUCCEEDED ✅
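The polling behind such a call can be sketched generically. The stub below is not the real SDK: the status names are taken from the output above (the terminal "FAILED" state, the refresh method, and the poll interval are assumptions for illustration).

```python
import time
from dataclasses import dataclass


@dataclass
class Analysis:
    status: str


class StubClient:
    """Fake client whose analysis advances one state per poll."""

    def __init__(self):
        self._states = iter(["PENDING", "RUNNING", "SUCCEEDED"])

    def refresh(self, analysis):
        # Stand-in for re-fetching the analysis from the API.
        return Analysis(next(self._states))


def wait_until_complete(client, analysis, poll_interval=0.01, verbose=False):
    # Re-fetch the analysis until it reaches a terminal status.
    while analysis.status not in ("SUCCEEDED", "FAILED"):
        time.sleep(poll_interval)
        analysis = client.refresh(analysis)
        if verbose:
            print("Analysis status:", analysis.status)
    return analysis


final = wait_until_complete(StubClient(), Analysis("PENDING"), verbose=True)
```

Returning a fresh Analysis object on each poll, rather than mutating the old one, matches the documented behavior of getting back a new object with an updated status.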
Using the fitted workflow
Now that we have fitted our categorizer, we can use it for some new, unseen products.
We start by creating a new dataset to upload the unseen data.
test_dataset = client.datasets.create(
    project, "best_buy_test", local_dataset["test_path"]
)
We can now apply the fitted workflow to the test_dataset we just created. This is done with Neuralk.analysis.create_categorization_predict, to which we pass the new dataset, a name for the analysis, and the Analysis object resulting from fitting the model we wish to use.
analysis_predict = client.analysis.create_categorization_predict(
    test_dataset, "best_buy_predict", analysis_fit
)
print("Categorization predict analysis created:", analysis_predict)
As before, we wait until the analysis finishes before continuing the example.
analysis_predict = client.analysis.wait_until_complete(analysis_predict, verbose=True)
Analysis status: None
Analysis status: PENDING
Analysis status: RUNNING
Analysis status: SUCCEEDED ✅
Downloading the prediction results
Now that our prediction analysis is complete, we want to download the predictions. This is done with Neuralk.analysis.download_results, to which we pass a reference to the prediction analysis whose results we want. All the results are stored in the provided directory, from which we can load them to use as we wish.
with tempfile.TemporaryDirectory() as results_dir:
    client.analysis.download_results(analysis_predict, folder_path=results_dir)
    print("Prediction results downloaded to temporary directory")
    results_file = next(Path(results_dir).iterdir())
    y_pred = pl.read_parquet(results_file)
    print(y_pred.shape)
Prediction results downloaded to temporary directory
(100, 5)
Total running time of the script: (0 minutes 55.692 seconds)