NeuralkAI Enrichment Workflow Example#

This example demonstrates how to use the Neuralk SDK to:

  1. Create a project

  2. Train an enrichment analysis on a training dataset

  3. Run predictions on a new dataset

  4. Retrieve the results

Step 1 - Import required libraries#

import os
from pathlib import Path
import polars as pl
import tempfile

from neuralk import Neuralk

Step 2 - Load username and password#

To connect to the Neuralk API, we need to authenticate. Here we read the username and password from environment variables. We first attempt to load any variables set in a dotenv file.

Then, we can create a Neuralk client to connect to the API.

try:
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    print("python-dotenv not installed, skipping .env loading")

user = os.environ.get("NEURALK_USER")
password = os.environ.get("NEURALK_PASSWORD")
client = Neuralk(user, password)
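If either variable is unset, the client is created with None credentials and only fails later, at the first API call. A small defensive check surfaces the problem immediately (plain Python, no Neuralk-specific behavior assumed; `require_env` is a helper written for this example):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# user = require_env("NEURALK_USER")
# password = require_env("NEURALK_PASSWORD")
```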

Step 3 - Preview the public Amazon reviews training dataset#

The dataset was extracted from the Hugging Face Hub, https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023, and preprocessed.

dataset_train = pl.read_parquet("datasets/amazon_reviews_train.parquet", n_rows=5)
print("Columns in dataset_train: ", dataset_train.columns)
print("First 5 rows of dataset_train: ", dataset_train)
Columns in dataset_train:  ['title', 'average_rating', 'rating_number', 'features', 'description', 'price', 'images', 'categories', 'parent_asin', 'subtitle', 'author', 'category', 'upc']
First 5 rows of dataset_train:  shape: (5, 13)
┌─────────────────────────────────┬────────────────┬───────────────┬─────────────────────────────────┬───┬─────────────────────────────────┬────────┬───────────────────────────┬──────┐
│ title                           ┆ average_rating ┆ rating_number ┆ features                        ┆ … ┆ subtitle                        ┆ author ┆ category                  ┆ upc  │
│ ---                             ┆ ---            ┆ ---           ┆ ---                             ┆   ┆ ---                             ┆ ---    ┆ ---                       ┆ ---  │
│ str                             ┆ f64            ┆ f64           ┆ list[str]                       ┆   ┆ str                             ┆ str    ┆ str                       ┆ str  │
╞═════════════════════════════════╪════════════════╪═══════════════╪═════════════════════════════════╪═══╪═════════════════════════════════╪════════╪═══════════════════════════╪══════╡
│ Herduk Flower Pots - 5.3" Larg… ┆ 4.5            ┆ 3.0           ┆ []                              ┆ … ┆ null                            ┆ null   ┆ Patio_Lawn_and_Garden     ┆ null │
│ 3D Matalchok Quality NF-MSS Ca… ┆ 4.5            ┆ 33.0          ┆ ["High temperature resistance … ┆ … ┆ null                            ┆ null   ┆ Industrial_and_Scientific ┆ null │
│ EXCEART Crash Ride Cymbal Bass… ┆ 3.7            ┆ 10.0          ┆ ["Easy playability, A bright c… ┆ … ┆ null                            ┆ null   ┆ Musical_Instruments       ┆ null │
│ Garfield 2015 Day-to-Day Calen… ┆ 4.9            ┆ 47.0          ┆ ["Garfield on a diet? No way, … ┆ … ┆ Calendar – Day to Day Calendar… ┆ null   ┆ Office_Products           ┆ null │
│ DecoArt DS3-9 DecoMagic Brush … ┆ 4.6            ┆ 39.0          ┆ ["Non-toxic", "Can be used on … ┆ … ┆ null                            ┆ null   ┆ Arts_Crafts_and_Sewing    ┆ null │
└─────────────────────────────────┴────────────────┴───────────────┴─────────────────────────────────┴───┴─────────────────────────────────┴────────┴───────────────────────────┴──────┘

Step 4 - Create a new project and upload dataset#

A dataset can be uploaded to the Neuralk platform from the local file system. Here we upload the training dataset.

project_name = "Amazon_Enrichment"
for project in client.projects.get_list():
    if project.name == project_name:
        client.projects.delete(project)

project = client.projects.create(project_name)
print("Project created:", project)

dataset = client.datasets.create(project, "Amazon Train set", "datasets/amazon_reviews_train.parquet")
print("Dataset uploaded:", dataset)
Project created: Project(name='Amazon_Enrichment', id='c7096bfe-feb1-42c8-ac1d-d891879e127a', dataset_list=[], user_list=[('OWNER', User(id='5e00262a-34d4-416a-80d4-296ae1b87585', email='alexandre.abraham@neuralk-ai.com', firstname='Alexandre', lastname='Abraham'))], project_file_list=[], analysis_list=[])
Dataset uploaded: Dataset(id='e1eca076-ad3d-46c5-baa1-fd06861af99f', name='Amazon Train set', file_name='amazon_reviews_train.parquet', analysis_list=[])

Step 5 - Fit an enrichment analysis#

The enrichment analysis performs two tasks, product categorization and attribute extraction, to enrich your products' raw data. We specify the column to predict for the category (“category”) and the features to use. You can also provide a taxonomy file, as well as the attributes to extract for all products or per category. In this example, we only specify the attributes to extract for the category “Clothing_Shoes_and_Jewelry”, and define simple generic attributes for all products.

enrichment_fit = client.analysis.create_enrichment_fit(
    dataset=dataset,
    name="Amazon Enrichment - Fit",
    target_columns=["category"],
    taxonomy_file=None,  # No reference taxonomy is provided
    feature_cols=["title", "description", "features"],
    generic_attributes_schema=["brand", "product model"], # Attributes shared by all products
    specific_attributes_schema={"Clothing_Shoes_and_Jewelry": ["material", "gender", "color"]}  # For this category, also extract the material, gender, and color
)

print("Enrichment training analysis created:", enrichment_fit)


# We monitor the training progress until it's complete (100% advancement).
client.analysis.wait_until_complete(
    enrichment_fit,
    refresh_time=10,  # Refresh the progress of the analysis every 10 seconds
    verbose=True,  # Print the progress of the analysis
)
Enrichment training analysis created: Analysis(id='10c89afc-b48b-444a-a092-a1d5ba988883', name='Amazon Enrichment - Fit', error=None, advancement=None, status='PENDING')

Analysis status: None

Analysis status: PENDING

Analysis status: RUNNING

Analysis status: SUCCEEDED ✅

Analysis(id='10c89afc-b48b-444a-a092-a1d5ba988883', name='Amazon Enrichment - Fit', error=None, advancement=100, status='SUCCEEDED')
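Conceptually, wait_until_complete polls the analysis status until it reaches a terminal state. The generic sketch below illustrates that pattern; the `fetch_status` callable and the status strings are illustrative only, not the SDK's actual internals:

```python
import time


def wait_until_done(fetch_status, refresh_time=10, max_wait=3600, verbose=False):
    """Poll fetch_status() until it returns a terminal state or max_wait elapses."""
    terminal = {"SUCCEEDED", "FAILED"}
    waited = 0
    while waited <= max_wait:
        status = fetch_status()
        if verbose:
            print("Analysis status:", status)
        if status in terminal:
            return status
        time.sleep(refresh_time)
        waited += refresh_time
    raise TimeoutError("analysis did not finish in time")
```

The refresh_time parameter of the SDK call plays the same role as here: a shorter interval gives faster feedback at the cost of more API requests.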

Step 6 - Launch a prediction analysis#

Now that the enrichment analysis is fitted, we can use it to predict the category and attributes of a new dataset. We use a test dataset from the same source, different from the training dataset, to perform predictions.

dataset = client.datasets.create(project, "Amazon Prediction Set", "datasets/amazon_reviews_predict.parquet")

enrichment_predict = client.analysis.create_enrichment_predict(
    dataset=dataset,
    name="Amazon Enrichment - Predict",
    enrichment_fit_analysis=enrichment_fit,
)
print("Prediction analysis launched:", enrichment_predict)

enrichment_predict = client.analysis.wait_until_complete(
    enrichment_predict,
    refresh_time=5,
    verbose=True,
)
Prediction analysis launched: Analysis(id='48468321-a4d1-483d-984c-00304d40d96d', name='Amazon Enrichment - Predict', error=None, advancement=None, status='PENDING')

Analysis status: None

Analysis status: PENDING

Analysis status: RUNNING

Analysis status: SUCCEEDED ✅

Step 7 - Download the prediction results#

Now that our prediction analysis is complete, we can download the predictions. This is done with Neuralk.analysis.download_results, to which we pass the reference to the prediction analysis whose results we want.

All the results are stored in the provided directory, from which we can load them to use as we wish.

with tempfile.TemporaryDirectory() as results_dir:
    client.analysis.download_results(enrichment_predict, folder_path=results_dir)
    print("Prediction results downloaded to temporary directory")
    results_file = next(Path(results_dir).iterdir())
    prediction_results = pl.read_parquet(results_file)
Prediction results downloaded to temporary directory

Step 8 - Analyze the enriched product sheets#

We can now analyze the enriched product sheets. In the end, the enrichment analysis gives you a structured table for each category.

print("Prediction results for Clothing_Shoes_and_Jewelry products")
clothing_shoes_and_jewelry_results = prediction_results.filter(pl.col("neuralk_categorization") == "Clothing_Shoes_and_Jewelry")
unnest_infos = clothing_shoes_and_jewelry_results.with_columns(pl.col("neuralk_extracted_information").str.json_decode()).unnest("neuralk_extracted_information")
print(unnest_infos.head())

print("Prediction results for Non-Clothing_Shoes_and_Jewelry products")
non_clothing_shoes_and_jewelry_results = prediction_results.filter(pl.col("neuralk_categorization") != "Clothing_Shoes_and_Jewelry")
unnest_infos = non_clothing_shoes_and_jewelry_results.with_columns(pl.col("neuralk_extracted_information").str.json_decode()).unnest("neuralk_extracted_information")
print(unnest_infos.head())
Prediction results for Clothing_Shoes_and_Jewelry products
shape: (5, 6)
┌────────────────────────────┬────────────────┬─────────────────────────────────┬─────────────────┬─────────────┬────────┐
│ neuralk_categorization     ┆ brand          ┆ product model                   ┆ material        ┆ color       ┆ gender │
│ ---                        ┆ ---            ┆ ---                             ┆ ---             ┆ ---         ┆ ---    │
│ str                        ┆ str            ┆ str                             ┆ str             ┆ str         ┆ str    │
╞════════════════════════════╪════════════════╪═════════════════════════════════╪═════════════════╪═════════════╪════════╡
│ Clothing_Shoes_and_Jewelry ┆ petzl          ┆ null                            ┆ fabric          ┆ red-black   ┆ unisex │
│ Clothing_Shoes_and_Jewelry ┆ detomaso       ┆ dt1064-c                        ┆ leather         ┆ black       ┆ men's  │
│ Clothing_Shoes_and_Jewelry ┆ falcon jewelry ┆ 6-piece puzzle ring             ┆ sterling silver ┆ null        ┆ men    │
│ Clothing_Shoes_and_Jewelry ┆ null           ┆ null                            ┆ null            ┆ black white ┆ null   │
│ Clothing_Shoes_and_Jewelry ┆ loungefly      ┆ captain america costume cospla… ┆ null            ┆ blue        ┆ null   │
└────────────────────────────┴────────────────┴─────────────────────────────────┴─────────────────┴─────────────┴────────┘
Prediction results for Non-Clothing_Shoes_and_Jewelry products
shape: (5, 3)
┌────────────────────────┬────────────┬─────────────────────────────────┐
│ neuralk_categorization ┆ brand      ┆ product model                   │
│ ---                    ┆ ---        ┆ ---                             │
│ str                    ┆ str        ┆ str                             │
╞════════════════════════╪════════════╪═════════════════════════════════╡
│ Digital_Music          ┆ null       ┆ null                            │
│ Office_Products        ┆ null       ┆ null                            │
│ Home_and_Kitchen       ┆ null       ┆ null                            │
│ Amazon_Fashion         ┆ oenbopo    ┆ null                            │
│ Sports_and_Outdoors    ┆ powerextra ┆ 48-11-2410/48-11-2420/48-11-24… │
└────────────────────────┴────────────┴─────────────────────────────────┘

Step 9 - Clean up the environment#

client.logout()
<Response [200]>

Total running time of the script: (2 minutes 27.771 seconds)
