Neuralk-AI Classification Expert Module
The previous examples have shown how to use the Neuralk Classifier to
solve classification problems with Neuralk's foundational model. However, as we
saw, the Classifier is a generic building block, and for complex use cases it
should be integrated into a full data-processing pipeline. One option is to build
our own pipeline locally and use the Classifier in it. An alternative is to
rely on one of the fully integrated, end-to-end workflows available on the
Neuralk platform (at the moment, classification and product categorization).
This is a different way of using the platform: we rely on it more heavily, to create and manage projects and datasets, run data-processing workflows, and store the resulting models and prediction results.
Here we illustrate the practical aspects of using the platform (how to connect to it, upload data, and so on) on a toy example. More information on use cases and support is available on the Neuralk website.
WARNING
For this example to run, the environment variables NEURALK_USERNAME and
NEURALK_PASSWORD must be defined. They will be used to connect to the
Neuralk API.
The first step is to create a Neuralk client that we will use to
interact with the platform. Note that we chose to make our username and
password available through environment variables, but you can use other
approaches to load them.
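If you do read the credentials from the environment, it can help to fail early with a clear message when one of them is missing, rather than with a bare KeyError. The helper below is a minimal sketch; the name `read_credentials` is hypothetical and not part of the neuralk package. It only reads the two variables named in the warning above.

```python
import os


def read_credentials(environ=os.environ):
    """Return (username, password), raising a clear error if either is unset.

    Hypothetical helper for illustration; not part of the neuralk package.
    """
    names = ("NEURALK_USERNAME", "NEURALK_PASSWORD")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return environ[names[0]], environ[names[1]]


# A plain dict stands in for the real environment here.
user, password = read_credentials(
    {"NEURALK_USERNAME": "alice", "NEURALK_PASSWORD": "s3cret"}
)
```

Called with no argument, it reads `os.environ`, and the returned pair can be passed to `Neuralk(...)` in the same order as in the snippet that follows.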
import os
from neuralk import Neuralk
client = Neuralk(os.environ['NEURALK_USERNAME'], os.environ['NEURALK_PASSWORD'])
Next, we load a small two-moons dataset bundled with the neuralk package (the
kind of data produced by scikit-learn's make_moons) to simulate a binary
classification task.
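To see how such data is built, or to generate a variant with a different size or noise level, the sketch below follows the construction of scikit-learn's `make_moons` (without its shuffling step) using only the standard library. It is an illustrative assumption, not the loader behind `neuralk.datasets.two_moons`; the column names `feature1`, `feature2` and `label` are taken from the fit call later in this example, and the helper names are hypothetical.

```python
import csv
import math
import random


def make_two_moons(n_samples=200, noise=0.1, seed=0):
    """Return (feature1, feature2, label) rows for two interleaving half circles.

    Follows scikit-learn's make_moons construction: the outer moon is
    labelled 0, the inner (shifted) moon is labelled 1. Illustrative sketch,
    not the loader used by neuralk.datasets.two_moons.
    """
    if n_samples < 4:
        raise ValueError("need at least 2 points per moon")
    rng = random.Random(seed)
    n_outer = n_samples // 2
    n_inner = n_samples - n_outer
    rows = []
    for i in range(n_outer):
        t = math.pi * i / (n_outer - 1)  # angles evenly spaced over [0, pi]
        rows.append((math.cos(t), math.sin(t), 0))
    for i in range(n_inner):
        t = math.pi * i / (n_inner - 1)
        rows.append((1 - math.cos(t), 1 - math.sin(t) - 0.5, 1))
    # Add Gaussian noise with standard deviation `noise` to both coordinates.
    return [
        (x + rng.gauss(0, noise), y + rng.gauss(0, noise), label)
        for x, y, label in rows
    ]


def write_moons_csv(rows, path):
    """Write rows using the column names referenced later in this example."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["feature1", "feature2", "label"])
        writer.writerows(rows)


rows = make_two_moons(n_samples=200, noise=0.1)
```

Calling `write_moons_csv(rows, "my_moons.csv")` produces a CSV file whose path could presumably be uploaded in place of `moons_data["path"]`.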
from neuralk.datasets import two_moons
moons_data = two_moons()
print(moons_data["path"])
/Users/aabraham/neuralk/src/neuralk/datasets/data/moons.csv
We want to learn to classify this data with Neuralk. All analyses that run on the platform happen within a "project", so we must first create a project and upload a dataset to it.
project = client.projects.create("MoonsExample", exist_ok=True)
dataset = client.datasets.create(
project,
"MoonsExample",
moons_data["path"],
)
client.datasets.wait_until_complete(dataset, verbose=True)
Dataset(id='e416598a-84b3-419e-8047-a0cecf8e177f', name='MoonsExample', file_name='moons.csv', status='OK', analysis_list=[])
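`wait_until_complete` blocks until the platform reports that the dataset has been processed (status `'OK'` above). We have not looked at how it is implemented internally; as a rough mental model only, a generic polling loop looks like the sketch below. `get_status` is a hypothetical callable standing in for a status request to the API, and the `"ERROR"`, `"PENDING"` and `"RUNNING"` statuses are assumptions.

```python
import time


def poll_until_done(get_status, done=("OK",), failed=("ERROR",),
                    interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns a terminal status.

    Illustrative only: get_status is a hypothetical callable, and the
    "ERROR" failure status is an assumption (only "OK" appears above).
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"Job failed with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout} s")
        sleep(interval)


# Simulated status sequence; sleep is replaced by a no-op so this runs instantly.
statuses = iter(["PENDING", "RUNNING", "OK"])
final_status = poll_until_done(lambda: next(statuses), sleep=lambda _: None)
```

Injecting `sleep` keeps the loop testable without waiting, and a timeout guards against a job that never reaches a terminal state.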
Now we fit a classification pipeline. No long-running training takes place, because the core of the pipeline is Neuralk's pretrained foundational model.
We specify the column to predict ("label") and the feature columns to use ("feature1" and "feature2").
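Before launching a fit, it can be worth checking locally that the target and feature columns actually exist in the file, since a typo would otherwise only surface once the remote job fails. A standard-library sketch follows; the helper name `check_columns` is hypothetical.

```python
import csv
import io


def check_columns(csv_file, target_column, feature_columns):
    """Raise ValueError if a requested column is absent from the CSV header."""
    header = next(csv.reader(csv_file))
    wanted = [target_column, *feature_columns]
    missing = [col for col in wanted if col not in header]
    if missing:
        raise ValueError(f"Columns not found in CSV header: {missing}")
    if target_column in feature_columns:
        raise ValueError("The target column must not also be used as a feature")
    return header


# An in-memory sample stands in for the real file.
sample = io.StringIO("feature1,feature2,label\n0.1,0.2,0\n")
header = check_columns(sample, "label", ["feature1", "feature2"])
```

In practice you would pass `open(moons_data["path"], newline="")` instead of the in-memory sample.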
analysis_fit = client.analysis.create_classifier_fit(
dataset,
"Two Moons Classifier",
target_column="label",
feature_column_name_list=["feature1", "feature2"],
)
analysis_fit = client.analysis.wait_until_complete(analysis_fit, verbose=True)
Once the fit has completed, we can refer to it to make predictions on unseen data. Here, to keep the example simple, we just apply it to the same data that we used for training.
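To measure performance on rows the model has genuinely not seen, one option is to split the CSV into a training file and a held-out file, upload each as its own dataset, fit on the first and predict on the second. The split itself can be sketched with the standard library; the function names and the 20% test fraction are illustrative choices.

```python
import csv
import io
import random


def split_rows(csv_text, test_fraction=0.2, seed=0):
    """Shuffle the data rows and return (header, train_rows, test_rows)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    return header, rows[n_test:], rows[:n_test]


def to_csv_text(header, rows):
    """Serialize a header and rows back to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Ten toy rows with a unique feature1 value each.
text = "feature1,feature2,label\n" + "\n".join(
    f"{i},{-i},{i % 2}" for i in range(10)
) + "\n"
header, train_rows, test_rows = split_rows(text, test_fraction=0.2)
```

Writing `to_csv_text(header, train_rows)` and `to_csv_text(header, test_rows)` to two files would give the inputs for two `client.datasets.create` calls.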
analysis_predict = client.analysis.create_classifier_predict(
dataset, "Two Moons Prediction", analysis_fit
)
analysis_predict = client.analysis.wait_until_complete(analysis_predict, verbose=True)
Finally, we download the prediction results and compare them with the true labels.
import tempfile
from pathlib import Path
from sklearn.metrics import accuracy_score
import polars as pl
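with tempfile.TemporaryDirectory() as results_dir:
    client.analysis.download_results(analysis_predict, folder_path=results_dir)
    # Take the first (and presumably only) file written to the folder
    results_file = next(Path(results_dir).iterdir())
    # Read predictions while the temporary directory still exists
    y_pred = pl.read_parquet(results_file)["label"].to_numpy()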
X = pl.read_csv(moons_data["path"])
y = X["label"].to_numpy()
X = X.drop("label").to_numpy()
acc = accuracy_score(y, y_pred)
print(f"Accuracy of classification: {acc}")
Accuracy of classification: 0.984
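`accuracy_score` is simply the fraction of positions where the prediction equals the ground truth. Writing it out, together with a count of (true, predicted) pairs, makes it easy to see which class the errors come from. The sketch below uses plain Python on toy label lists (deliberately named so as not to overwrite `y`, `y_pred` or `acc`), not the actual results above.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (true_label, predicted_label) pair occurs."""
    return Counter(zip(y_true, y_pred))


toy_true = [0, 0, 1, 1, 1]
toy_pred = [0, 1, 1, 1, 1]
toy_acc = accuracy(toy_true, toy_pred)  # 4 matches out of 5 -> 0.8
counts = confusion_counts(toy_true, toy_pred)
```

Here `counts[(0, 1)]` is 1: the single error is a class-0 point predicted as class 1.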
We finish by plotting the results.
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams.update(
{
"axes.edgecolor": "#4d4d4d",
"axes.linewidth": 1.2,
"axes.facecolor": "#f5f5f5",
"figure.facecolor": "white",
}
)
fig, axes = plt.subplots(1, 2, figsize=(11, 5), dpi=120)
titles = ["Ground Truth", f"Model Prediction\nAccuracy: {acc:.2f}"]
colors = ["#1a73e8", "#ffa600"] # Professional blue & orange
for idx, ax in enumerate(axes):
    labels = y if idx == 0 else y_pred
    for lab in np.unique(labels):
        ax.scatter(
            X[labels == lab, 0],
            X[labels == lab, 1],
            s=70,
            marker="o",
            c=colors[lab],
            edgecolors="white",
            linewidths=0.8,
            alpha=0.9,
            label=f"Class {lab}" if idx == 0 else None,  # Legend only on first panel
            zorder=3,
        )
    # Aesthetics
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_aspect("equal")
    ax.set_title(titles[idx], fontsize=14, weight="bold", pad=12)
    ax.grid(False)
    # Pad the axis limits so that markers near the edges are not clipped
    x_margin = 0.4
    y_margin = 0.4
    ax.set_xlim(X[:, 0].min() - x_margin, X[:, 0].max() + x_margin)
    ax.set_ylim(X[:, 1].min() - y_margin, X[:, 1].max() + y_margin)
    # Panel annotation (A, B)
    ax.text(
        0.05,
        0.98,
        chr(ord("A") + idx),
        transform=ax.transAxes,
        fontsize=16,
        fontweight="bold",
        va="top",
        ha="right",
    )
# Shared legend beneath the plots
handles, labels_ = axes[0].get_legend_handles_labels()
fig.legend(
handles,
labels_,
loc="lower center",
ncol=2,
frameon=False,
fontsize=12,
bbox_to_anchor=(0.5, 0.02),
)
fig.tight_layout()
plt.subplots_adjust(bottom=0.05)
plt.show()
Total running time of the script: (0 minutes 21.339 seconds)