
# NeuralkAI Enrichment Workflow Example

This example demonstrates how to use the Neuralk SDK to:

1. Create a project and upload a training dataset
2. Fit an enrichment analysis on the training dataset
3. Run predictions on a new dataset
4. Retrieve the results


## Step 1 - Import required libraries



In [None]:
import os
from pathlib import Path
import polars as pl
import tempfile

from neuralk import Neuralk

## Step 2 - Load username and password

To connect to the Neuralk API, we need to authenticate. Here we read the
username and password from environment variables. We first attempt to load
any variables set in a [dotenv](https://github.com/theskumar/python-dotenv)
file.

Then, we can create a `Neuralk` client to connect to the API.



In [None]:
try:
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    print("python-dotenv not installed, skipping .env loading")

user = os.environ.get("NEURALK_USER")
password = os.environ.get("NEURALK_PASSWORD")
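
`os.environ.get` returns `None` when a variable is unset, so a missing credential only surfaces later, when the client tries to authenticate. A minimal sketch of a fail-fast check is shown below. The helper name `require_env` is hypothetical and is not part of the Neuralk SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage, replacing the plain os.environ.get calls:
# user = require_env("NEURALK_USER")
# password = require_env("NEURALK_PASSWORD")
```

Raising early keeps the error message close to its cause, whether that is a missing `.env` entry or an unexported variable.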

In [None]:
client = Neuralk(user, password)

## Step 3 - Preview the training dataset

We preview the public Amazon reviews dataset that we will use for training.
The dataset was extracted from the Hugging Face Hub (https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023) and preprocessed.



In [None]:
dataset_train = pl.read_parquet("datasets/amazon_reviews_train.parquet", n_rows=5)
print("Columns in dataset_train: ", dataset_train.columns)
print("First 5 rows of dataset_train: ", dataset_train)

## Step 4 - Create a new project and upload dataset

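A dataset can be uploaded to the Neuralk platform from the local file system.
Here we create the project and upload the training dataset. Any existing
project with the same name is deleted first so that the example starts from a
clean state.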



In [None]:
project_name = "Amazon_Enrichment"
for project in client.projects.get_list():
    if project.name == project_name:
        client.projects.delete(project)

project = client.projects.create(project_name)
print("Project created:", project)

dataset = client.datasets.create(project, "Amazon Train set", "datasets/amazon_reviews_train.parquet")
print("Dataset uploaded:", dataset)

## Step 5 - Fit an enrichment analysis
The enrichment analysis performs two tasks, product categorization and attribute extraction, to enrich your raw product data.
We specify the column to predict for the category (`"category"`) and the feature columns to use.
You can provide a taxonomy file, or list the attributes to extract either for all products or per category.
In this example, we define simple generic attributes for all products and category-specific attributes only for "Clothing_Shoes_and_Jewelry".



In [None]:
enrichment_fit = client.analysis.create_enrichment_fit(
    dataset=dataset,
    name="Amazon Enrichment - Fit",
    target_columns=["category"],
    taxonomy_file=None,  # No reference taxonomy is provided
    feature_cols=["title", "description", "features"],
    generic_attributes_schema=["brand", "product model"],  # Attributes shared by all products
    specific_attributes_schema={
        # For this category, also extract the material, gender, and color
        "Clothing_Shoes_and_Jewelry": ["material", "gender", "color"],
    },
)

print("Enrichment training analysis created:", enrichment_fit)


# We monitor the training progress until it's complete (100% advancement).
client.analysis.wait_until_complete(
    enrichment_fit,
    refresh_time=10,  # Refresh the progress of the analysis every 10 seconds
    verbose=True,  # Print the progress of the analysis
)

## Step 6 - Launch a prediction analysis
Now that the enrichment analysis is fitted, we can use it to predict the category and attributes of a new dataset.
We use a separate test dataset from the same source to perform predictions.



In [None]:
dataset = client.datasets.create(project, "Amazon Prediction Set", "datasets/amazon_reviews_predict.parquet")

enrichment_predict = client.analysis.create_enrichment_predict(
    dataset=dataset,
    name="Amazon Enrichment - Predict",
    enrichment_fit_analysis=enrichment_fit,
)
print("Prediction analysis launched:", enrichment_predict)

enrichment_predict = client.analysis.wait_until_complete(
    enrichment_predict,
    refresh_time=5,
    verbose=True,
)

## Step 7 - Download the prediction results

Now that our prediction analysis is complete, we want to download the
predictions. This is done with `Neuralk.analysis.download_results`, to
which we pass the reference to the prediction analysis whose results we want.

All the results are stored in the provided directory, from which we can load
them to use as we wish.



In [None]:
with tempfile.TemporaryDirectory() as results_dir:
    client.analysis.download_results(enrichment_predict, folder_path=results_dir)
    print("Prediction results downloaded to temporary directory")
    results_file = next(Path(results_dir).iterdir())
    prediction_results = pl.read_parquet(results_file)
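
`next(Path(results_dir).iterdir())` takes whichever entry the file system lists first, and `iterdir` makes no ordering guarantee. If the results folder can hold more than one file, selecting by extension is more predictable. This sketch assumes the results are written as `.parquet` files, which the `pl.read_parquet` call suggests but which is not confirmed here. The helper name `find_result_files` is hypothetical.

```python
import tempfile
from pathlib import Path


def find_result_files(folder: Path, pattern: str = "*.parquet") -> list[Path]:
    """Return the files in `folder` matching `pattern`, sorted by name."""
    return sorted(p for p in folder.glob(pattern) if p.is_file())


# Self-contained demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp_dir:
    folder = Path(tmp_dir)
    (folder / "predictions.parquet").touch()
    (folder / "notes.txt").touch()
    found = [p.name for p in find_result_files(folder)]

print(found)
```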

## Step 8 - Analyze the enriched product sheets

We can now analyze the enriched product sheets.
At the end of the day, the enrichment analysis gives you a structured table for each category.



In [None]:
print("Prediction results for Clothing_Shoes_and_Jewelry products")
clothing_shoes_and_jewelry_results = prediction_results.filter(
    pl.col("neuralk_categorization") == "Clothing_Shoes_and_Jewelry"
)
unnest_infos = clothing_shoes_and_jewelry_results.with_columns(
    pl.col("neuralk_extracted_information").str.json_decode()
).unnest("neuralk_extracted_information")
print(unnest_infos.head())

print("Prediction results for Non-Clothing_Shoes_and_Jewelry products")
non_clothing_shoes_and_jewelry_results = prediction_results.filter(
    pl.col("neuralk_categorization") != "Clothing_Shoes_and_Jewelry"
)
unnest_infos = non_clothing_shoes_and_jewelry_results.with_columns(
    pl.col("neuralk_extracted_information").str.json_decode()
).unnest("neuralk_extracted_information")
print(unnest_infos.head())

## Step 9 - Clean up the environment



In [None]:
client.logout()