Neuralk-AI Categorization Example
This example shows how to use the Neuralk SDK to predict product categories. We use a subset of the Best Buy dataset.
import os
import tempfile
from pathlib import Path
import polars as pl
from neuralk import Neuralk
from neuralk.datasets import best_buy
Loading username and password
To connect to the Neuralk API, we need to authenticate. Here we read the username and password from environment variables. We first attempt to load any variables set in a dotenv file.
Then, we can create a Neuralk client to connect to the API.
try:
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    print("python-dotenv not installed, skipping .env loading")
user = os.environ.get("NEURALK_USERNAME")
password = os.environ.get("NEURALK_PASSWORD")
client = Neuralk(user, password)
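Note that os.environ.get returns None when a variable is unset, so a missing credential may only surface later, when the client tries to authenticate. A minimal sketch of an early check follows; require_env is a hypothetical helper written for this example and is not part of the Neuralk SDK.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With such a helper, the credentials could be read as require_env("NEURALK_USERNAME") and require_env("NEURALK_PASSWORD"), so the script stops with a readable message before any network call is made.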
Uploading datasets into a project
All the datasets and the analyses we run on them belong to a project. Those
are managed through the .projects attribute of the Neuralk
client.
Here we retrieve the "best_buy" project, creating it if it does not exist
yet. Note that we can list all the projects accessible from our client
with Neuralk.projects.get_list.
def get_or_create_project(name):
    # Look for an existing project with this name; create it only if none is found.
    project = next((p for p in client.projects.get_list() if p.name == name), None)
    if project is None:
        project = client.projects.create(name)
    return project
project = get_or_create_project("best_buy")
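The lookup in get_or_create_project relies on the two-argument form of the built-in next, which returns a default instead of raising StopIteration when nothing matches. The sketch below shows that behaviour in isolation, using stand-in objects (an assumption for illustration; real project objects come from the API).

```python
from types import SimpleNamespace

# Stand-in objects with a `name` attribute; hypothetical, not real Neuralk projects.
projects = [SimpleNamespace(name="demo"), SimpleNamespace(name="best_buy")]

# next() returns the first match, or the default (None) if the generator is exhausted.
found = next((p for p in projects if p.name == "best_buy"), None)
missing = next((p for p in projects if p.name == "unknown"), None)

print(found.name)  # best_buy
print(missing)     # None
```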
The Neuralk SDK package provides a few example datasets with which we can try
out the API in the datasets module. We obtain the path to a Parquet
file containing a (training) subset of the Best Buy data.
# To make the example run fast, we use a small subset; pass subsample=False to run on the full dataset.
local_dataset = best_buy(subsample=True)
pl.read_parquet(local_dataset["train_path"])
To run an analysis, we need to create a dataset on the platform. Those are
managed through the .datasets attribute of the Neuralk client.
The data is uploaded to our project with Neuralk.datasets.create. We
pass the project object, a name for the dataset, and the local path of the
CSV or Parquet file (in this case, Parquet) to upload.
train_dataset = client.datasets.create(
    project, "best_buy_train", local_dataset["train_path"]
)
client.datasets.wait_until_complete(train_dataset, verbose=True)
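Since the upload expects a CSV or Parquet file, it can be convenient to validate a local path before sending it. The following is only a sketch of such a check: check_upload_path and SUPPORTED_SUFFIXES are hypothetical names introduced here, and the SDK may well perform its own validation.

```python
from pathlib import Path

# File types named in the text above; the set itself is an assumption of this sketch.
SUPPORTED_SUFFIXES = {".csv", ".parquet"}


def check_upload_path(path):
    """Return `path` as a Path if it has a supported suffix and exists, else raise."""
    path = Path(path)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(path)
    return path
```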
Fitting a product categorization workflow
The next step is to fit a workflow on the training data. Later, we will be able to use it to categorize new products for which we do not have a ground truth.
Analyses are managed through the .analysis attribute of the
Neuralk client. We launch the fit of our workflow by creating a
“categorization fit” analysis with
Neuralk.analysis.create_categorization_fit.
analysis_fit = client.analysis.create_categorization_fit(
    train_dataset,
    "best_buy_fit",
    target_columns=[
        "neuralk categorization level 0",
        "neuralk categorization level 1",
        "neuralk categorization level 2",
        "neuralk categorization level 3",
        "neuralk categorization level 4",
    ],
)
print("Categorization fit analysis created:", analysis_fit)
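The five target column names passed to the fit differ only by their level index, so the same list can also be generated rather than typed out. A small sketch, assuming the column names keep exactly this pattern:

```python
# Build the column names "neuralk categorization level 0" through "... level 4".
target_columns = [f"neuralk categorization level {level}" for level in range(5)]

print(target_columns[0])   # neuralk categorization level 0
print(target_columns[-1])  # neuralk categorization level 4
```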
Neuralk.analysis.create_categorization_fit immediately returns an
Analysis object that represents the analysis we have just launched.
This analysis has not finished running yet. Depending on our needs, we may
want to continue with other tasks or finish our script at this point. But in
this example, we want to wait and use the fitted model once it is ready.
Neuralk.analysis.wait_until_complete allows us to pause the execution
of our script and wait for the analysis to finish (or error). It returns a
new Analysis object, corresponding to the same analysis but with an
updated status.
analysis_fit = client.analysis.wait_until_complete(analysis_fit, verbose=True)
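To make the idea of blocking until a remote job finishes concrete, here is a generic polling loop. It is not the SDK's implementation: wait_until, the status strings, and the fake status source are all assumptions made for this sketch.

```python
import time


def wait_until(get_status, done_states=("SUCCEEDED", "FAILED"), poll_interval=0.01, timeout=5.0):
    """Call `get_status()` repeatedly until it returns a terminal state or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still in state {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Fake status source standing in for a remote analysis: two unfinished polls, then success.
states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
final_status = wait_until(lambda: next(states))
print(final_status)  # SUCCEEDED
```

The loop returns on both success and failure states, which mirrors the description above of waiting for the analysis to finish or error; the caller then inspects the returned status.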
Using the fitted workflow
Now that we have fitted our categorizer, we can use it for some new, unseen products.
We start by creating a new dataset to upload the unseen data.
test_dataset = client.datasets.create(
    project, "best_buy_test", local_dataset["test_path"]
)
client.datasets.wait_until_complete(test_dataset, verbose=True)
We can now apply the fitted workflow to the test_dataset we just created.
This is done with Neuralk.analysis.create_categorization_predict, to
which we pass the new dataset, a name for the analysis, and the
Analysis object resulting from fitting the model we wish to use now.
analysis_predict = client.analysis.create_categorization_predict(
    test_dataset, "best_buy_predict", analysis_fit
)
print("Categorization predict analysis created:", analysis_predict)
As before, we wait until the analysis finishes before continuing the example.
analysis_predict = client.analysis.wait_until_complete(analysis_predict, verbose=True)
Downloading the prediction results
Now that our prediction analysis is complete, we want to download the
predictions. This is done with Neuralk.analysis.download_results, to
which we pass the reference to the prediction analysis whose results we want.
All the results are stored in the provided directory, from which we can load them to use as we wish.
with tempfile.TemporaryDirectory() as results_dir:
    client.analysis.download_results(analysis_predict, folder_path=results_dir)
    print("Prediction results downloaded to temporary directory")
    # Read the results while the temporary directory still exists.
    results_file = next(Path(results_dir).iterdir())
    y_pred = pl.read_parquet(results_file)
print(y_pred.shape)
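Taking the first entry returned by iterdir works when the download produces a single file. If the results directory could contain several files (an assumption; the example does not say what else may be written there), selecting by extension is more explicit. A minimal sketch, where find_parquet_files is a hypothetical helper and the file names are made up:

```python
import tempfile
from pathlib import Path


def find_parquet_files(folder):
    """Return the Parquet files found directly inside `folder`, sorted by name."""
    return sorted(Path(folder).glob("*.parquet"))


# Demonstration with empty placeholder files instead of real prediction results.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "predictions.parquet").touch()
    (Path(demo_dir) / "notes.txt").touch()
    found_names = [path.name for path in find_parquet_files(demo_dir)]

print(found_names)  # ['predictions.parquet']
```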