
Neuralk-AI Categorization Example

This example shows how to use the Neuralk SDK to predict product categories. We use a subset of the Best Buy dataset.

import os
import tempfile
from pathlib import Path

import polars as pl

from neuralk import Neuralk
from neuralk.datasets import best_buy

Loading username and password

To connect to the Neuralk API, we need to authenticate. Here we read the username and password from environment variables. We first attempt to load any variables set in a dotenv file.

Then, we can create a Neuralk client to connect to the API.

try:
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    print("python-dotenv not installed, skipping .env loading")

user = os.environ.get("NEURALK_USERNAME")
password = os.environ.get("NEURALK_PASSWORD")
client = Neuralk(user, password)
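For reference, a minimal .env file for this example would define the two variables read above (the values here are placeholders):

```
NEURALK_USERNAME=your-username
NEURALK_PASSWORD=your-password
```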

Uploading datasets into a project

All the datasets and the analyses we run on them belong to a project. Those are managed through the .projects attribute of the Neuralk client.

Here we retrieve the "best_buy" project, creating it if it does not exist yet. Note that we can list all the projects accessible from our client with Neuralk.projects.get_list.

def get_or_create_project(name):
    project = next((p for p in client.projects.get_list() if p.name == name), None)
    if project is None:
        project = client.projects.create(name)
    return project


project = get_or_create_project("best_buy")

The Neuralk SDK package provides a few example datasets in its datasets module, with which we can try out the API. We obtain the path to a Parquet file containing a (training) subset of the Best Buy data.

To make the example run fast, we use a small subset; pass subsample=False to run on the full dataset.

local_dataset = best_buy(subsample=True)
pl.read_parquet(local_dataset["train_path"])

To run an analysis, we need to create a dataset on the platform. Those are managed through the .datasets attribute of the Neuralk client.

The data is uploaded to our project with Neuralk.datasets.create. We pass the project object, a name for the dataset, and the local path of the CSV or Parquet file (in this case, Parquet) to upload.

train_dataset = client.datasets.create(
    project, "best_buy_train", local_dataset["train_path"]
)

client.datasets.wait_until_complete(train_dataset, verbose=True)

Fitting a product categorization workflow

The next step is to fit a workflow on the training data. Later, we will be able to use it to categorize new products for which we do not have a ground truth.

Analyses are managed through the .analysis attribute of the Neuralk client. We launch the fit of our workflow by creating a "categorization fit" analysis with Neuralk.analysis.create_categorization_fit.

analysis_fit = client.analysis.create_categorization_fit(
    train_dataset,
    "best_buy_fit",
    target_columns=[
        "neuralk categorization level 0",
        "neuralk categorization level 1",
        "neuralk categorization level 2",
        "neuralk categorization level 3",
        "neuralk categorization level 4",
    ],
)
print("Categorization fit analysis created:", analysis_fit)

Neuralk.analysis.create_categorization_fit immediately returns an Analysis object that represents the analysis we have just launched. This analysis has not finished running yet. Depending on our needs, we may want to continue with other tasks or finish our script at this point. But in this example, we want to wait and use the fitted model once it is ready.

Neuralk.analysis.wait_until_complete allows us to pause the execution of our script and wait for the analysis to finish (or error). It returns a new Analysis object, corresponding to the same analysis but with an updated status.

analysis_fit = client.analysis.wait_until_complete(analysis_fit, verbose=True)
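Under the hood, wait_until_complete presumably polls the analysis status until it reaches a terminal state. A generic, hypothetical sketch of such a poll loop (the function name, status strings, and timeout below are illustrative, not the SDK's actual API):

```python
import time

def wait_until_done(poll_status, interval=1.0, timeout=600.0):
    """Call poll_status() repeatedly until it returns a terminal status or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = poll_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("analysis did not finish in time")

# Toy usage: a fake status source that completes on the third poll.
statuses = iter(["pending", "running", "completed"])
result = wait_until_done(lambda: next(statuses), interval=0.01)
print(result)  # completed
```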

Using the fitted workflow

Now that we have fitted our categorizer, we can use it for some new, unseen products.

We start by creating a new dataset to upload the unseen data.

test_dataset = client.datasets.create(
    project, "best_buy_test", local_dataset["test_path"]
)

client.datasets.wait_until_complete(test_dataset, verbose=True)

We can now apply the fitted workflow to the test_dataset we just created. This is done with Neuralk.analysis.create_categorization_predict, to which we pass the new dataset, a name for the analysis, and the Analysis object resulting from fitting the model we wish to use now.

analysis_predict = client.analysis.create_categorization_predict(
    test_dataset, "best_buy_predict", analysis_fit
)
print("Categorization predict analysis created:", analysis_predict)

As before, we wait until the analysis finishes before continuing the example.

analysis_predict = client.analysis.wait_until_complete(analysis_predict, verbose=True)

Downloading the prediction results

Now that our prediction analysis is complete, we want to download the predictions. This is done with Neuralk.analysis.download_results, to which we pass the reference to the prediction analysis whose results we want.

All the results are stored in the provided directory, from which we can load them to use as we wish.

with tempfile.TemporaryDirectory() as results_dir:
    client.analysis.download_results(analysis_predict, folder_path=results_dir)
    print("Prediction results downloaded to temporary directory")
    results_file = next(Path(results_dir).iterdir())
    y_pred = pl.read_parquet(results_file)

print(y_pred.shape)