NeuralkAI Categorization Example
This example shows how to use the NeuralkAI SDK to predict product categories. We use a subset of the Best Buy dataset.
import os
import tempfile
from pathlib import Path
import polars as pl
from neuralk import Neuralk
from neuralk.datasets import best_buy
Loading username and password

To connect to the Neuralk API, we need to authenticate. Here we read the username and password from environment variables, first attempting to load any variables set in a dotenv file. Then we can create a Neuralk client to connect to the API.
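As a rough sketch of what loading a dotenv file involves (in practice the python-dotenv package handles this, and the variable names here are illustrative assumptions, not documented API), the file is just KEY=VALUE lines that get merged into the process environment:

```python
import os
import tempfile
from pathlib import Path


def load_dotenv_sketch(path):
    """Minimal stand-in for python-dotenv's load_dotenv: parse KEY=VALUE lines."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Like load_dotenv's default behavior, do not override variables
        # that are already set in the environment.
        os.environ.setdefault(key.strip(), value.strip())


# Demonstrate with a throwaway .env file.
with tempfile.TemporaryDirectory() as d:
    env_file = Path(d) / ".env"
    env_file.write_text("NEURALK_USERNAME=alice\nNEURALK_PASSWORD=secret\n")
    load_dotenv_sketch(env_file)

print(os.environ["NEURALK_USERNAME"])
```

With the credentials in the environment, the client is presumably constructed along the lines of Neuralk(username=..., password=...); the exact constructor signature is an assumption here, so check the SDK documentation.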
Uploading datasets into a project

All the datasets and the analyses we run on them belong to a project. Projects are managed through the .projects attribute of the Neuralk client. Here we retrieve the "best_buy" project, creating it if it does not exist yet. Note that we can list all the projects accessible from our client with Neuralk.projects.get_list.
def get_or_create_project(name):
    project = next((p for p in client.projects.get_list() if p.name == name), None)
    if project is None:
        project = client.projects.create(name)
    return project
project = get_or_create_project("best_buy")
The Neuralk SDK package provides, in its datasets module, a few example datasets with which we can try out the API. We obtain the path to a Parquet file containing a (training) subset of the Best Buy data.
# To make the example run fast, we use a small subset; pass subsample=False to run on the full dataset.
local_dataset = best_buy(subsample=True)
pl.read_parquet(local_dataset["train_path"])
To run an analysis, we need to create a dataset on the platform. Datasets are managed through the .datasets attribute of the Neuralk client. The data is uploaded to our project with Neuralk.datasets.create. We pass the project object, a name for the dataset, and the local path of the CSV or Parquet file (in this case, Parquet) to upload.
train_dataset = client.datasets.create(
    project, "best_buy_train", local_dataset["train_path"]
)
Fitting a product categorization workflow
The next step is to fit a workflow on the training data. Later, we will be able to use it to categorize new products for which we do not have a ground truth.
Analyses are managed through the analysis attribute of the Neuralk client. We launch the fit of our workflow by creating a "categorization fit" analysis with Neuralk.analysis.create_categorization_fit.
analysis_fit = client.analysis.create_categorization_fit(
    train_dataset,
    "best_buy_fit",
    target_columns=[
        "neuralk categorization level 0",
        "neuralk categorization level 1",
        "neuralk categorization level 2",
        "neuralk categorization level 3",
        "neuralk categorization level 4",
    ],
)
print("Categorization fit analysis created:", analysis_fit)
Categorization fit analysis created: Analysis(id='0ffd5794-81fa-4656-adce-97f4aaffedf8', name='best_buy_fit', error=None, advancement=None, status='PENDING')
Neuralk.analysis.create_categorization_fit immediately returns an Analysis object that represents the analysis we have just launched. This analysis has not finished running yet. Depending on our needs, we may want to continue with other tasks or end our script at this point. In this example, however, we want to wait and use the fitted model once it is ready. Neuralk.analysis.wait_until_complete allows us to pause the execution of our script and wait for the analysis to finish (or error). It returns a new Analysis object, corresponding to the same analysis but with an updated status.
analysis_fit = client.analysis.wait_until_complete(analysis_fit, verbose=True)
Analysis status: None
Analysis status: PENDING
Analysis status: RUNNING
Analysis status: SUCCEEDED ✅
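The polling behind such a call can be sketched generically. The stub below is not the real SDK: the status names are taken from the output above (the terminal "FAILED" state, the refresh method, and the poll interval are assumptions for illustration).

```python
import time
from dataclasses import dataclass


@dataclass
class Analysis:
    status: str


class StubClient:
    """Fake client whose analysis advances one state per poll."""

    def __init__(self):
        self._states = iter(["PENDING", "RUNNING", "SUCCEEDED"])

    def refresh(self, analysis):
        # Stand-in for re-fetching the analysis from the API.
        return Analysis(next(self._states))


def wait_until_complete(client, analysis, poll_interval=0.01, verbose=False):
    # Re-fetch the analysis until it reaches a terminal status.
    while analysis.status not in ("SUCCEEDED", "FAILED"):
        time.sleep(poll_interval)
        analysis = client.refresh(analysis)
        if verbose:
            print("Analysis status:", analysis.status)
    return analysis


final = wait_until_complete(StubClient(), Analysis("PENDING"), verbose=True)
```

Returning a fresh Analysis object on each poll, rather than mutating the old one, matches the documented behavior of getting back a new object with an updated status.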
Using the fitted workflow
Now that we have fitted our categorizer, we can use it for some new, unseen products.
We start by creating a new dataset to upload the unseen data.
test_dataset = client.datasets.create(
    project, "best_buy_test", local_dataset["test_path"]
)
We can now apply the fitted workflow to the test_dataset we just created. This is done with Neuralk.analysis.create_categorization_predict, to which we pass the new dataset, a name for the analysis, and the Analysis object resulting from fitting the model we wish to use.
analysis_predict = client.analysis.create_categorization_predict(
    test_dataset, "best_buy_predict", analysis_fit
)
print("Categorization predict analysis created:", analysis_predict)
As before, we wait until the analysis finishes before continuing the example.
analysis_predict = client.analysis.wait_until_complete(analysis_predict, verbose=True)
Analysis status: None
Analysis status: PENDING
Analysis status: RUNNING
Analysis status: SUCCEEDED ✅
Downloading the prediction results
Now that our prediction analysis is complete, we want to download the predictions. This is done with Neuralk.analysis.download_results, to which we pass a reference to the prediction analysis whose results we want. All the results are stored in the provided directory, from which we can load them to use as we wish.
with tempfile.TemporaryDirectory() as results_dir:
    client.analysis.download_results(analysis_predict, folder_path=results_dir)
    print("Prediction results downloaded to temporary directory")
    results_file = next(Path(results_dir).iterdir())
    y_pred = pl.read_parquet(results_file)
    print(y_pred.shape)
Prediction results downloaded to temporary directory
(100, 5)
Total running time of the script: (0 minutes 55.692 seconds)