Using the model directly¶
There are two ways to use NICL, Neuralk's In-Context Learning model.
The simplest is to use only the model itself, through a familiar interface compatible with scikit-learn.
The NICLClassifier supports both deployment modes:

- Neuralk Cloud API: use NICLClassifier with your API key (the default).
- On-premise server: use NICLClassifier with the host parameter.
We describe this usage pattern here.
The second option is to use Neuralk’s end-to-end expert use-cases. These are not just predictors but fully integrated workflows that encompass project and dataset management, preprocessing and feature extraction, and prediction. This usage pattern is described in the next section.
Effective use of in-context learning¶
When we use the model directly, we are responsible for selecting and preprocessing its inputs. Here we provide some general tips for using the NICL model effectively.
Data preparation¶
Missing values are not supported and must be imputed.
We find that simple imputation strategies like scikit-learn’s SimpleImputer perform as well or even better than more sophisticated and computationally expensive strategies.
Only numeric inputs are accepted.
Other data types such as text, categories and datetimes must be encoded.
This can be handled for example with sklearn.preprocessing or skrub.
High dimensionality can be detrimental.
If your features have more than a few hundred dimensions, we recommend applying a sklearn.decomposition.PCA to reduce the dimension.
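The preparation steps above can be sketched with scikit-learn alone. The column names and values below are made up for illustration; any encoder from sklearn.preprocessing (or skrub's TableVectorizer) can play the encoding role:

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Illustrative table with a missing number and a missing category.
X = pd.DataFrame({
    "price": [10.0, np.nan, 12.5, 8.0],
    "color": ["red", "blue", None, "red"],
})

# Encode the categorical column to numbers, then impute remaining
# missing values with a simple mean strategy.
preprocess = make_pipeline(
    make_column_transformer(
        (OrdinalEncoder(), ["color"]),
        remainder="passthrough",
    ),
    SimpleImputer(strategy="mean"),
)
X_num = preprocess.fit_transform(X)
print(X_num.shape)  # (4, 2): fully numeric, no missing values
```

For wider tables, a `sklearn.decomposition.PCA` step would follow the imputer, as in the full pipeline shown below.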
Dataset size¶
NICL imposes no hard limit on dataset size; the only practical constraints are network-related (request size and timeout).
For very large datasets, you may want to sample a relevant subset as context to optimize inference speed. This is discussed in more detail in Controlling the sampling of the Classifier’s context.
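One way to sample such a subset is stratified subsampling with scikit-learn, so the context keeps the class proportions of the full dataset. The data and the 10,000-row context size below are arbitrary illustrations, not NICL requirements:

```python
import numpy as np
from sklearn.utils import resample

# Stand-in for a large training set.
rng = np.random.RandomState(0)
X = rng.randn(100_000, 5)
y = rng.randint(0, 3, size=100_000)

# Draw a class-balanced subset to serve as the in-context examples.
X_ctx, y_ctx = resample(
    X, y,
    n_samples=10_000,
    replace=False,
    stratify=y,      # preserve class proportions in the sample
    random_state=0,
)
# classifier.fit(X_ctx, y_ctx) would then use this subset as context.
```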
Estimator interface¶
NICL can be used as any other classifier compatible with scikit-learn. Here is an example of a simple pipeline that should perform well on most datasets:
>>> import os
>>> from skrub import TableVectorizer
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.impute import SimpleImputer
>>> from sklearn.decomposition import PCA
>>> from neuralk import NICLClassifier
>>>
>>> api_key = os.environ["NEURALK_API_KEY"]
>>> classifier = make_pipeline(
... TableVectorizer(),
... SimpleImputer(),
... PCA(100),
... NICLClassifier(api_key=api_key),
... )
Adjust the PCA dimension according to your data’s dimensionality or tune this hyperparameter.
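Tuning the PCA dimension can be done with an ordinary grid search over the pipeline. In this sketch, LogisticRegression stands in for NICLClassifier so the snippet runs without an API key; the same parameter grid applies unchanged to the pipeline above:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Synthetic data standing in for a real tabular dataset.
X, y = make_classification(n_samples=200, n_features=50, random_state=0)

pipe = make_pipeline(SimpleImputer(), PCA(), LogisticRegression(max_iter=1000))

# Cross-validate a few candidate dimensions for the PCA step.
search = GridSearchCV(pipe, {"pca__n_components": [10, 20, 40]}, cv=3)
search.fit(X, y)
print(search.best_params_)
```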
For on-premise deployments, use NICLClassifier with the host parameter:
>>> from neuralk import NICLClassifier
>>>
>>> classifier = NICLClassifier(host="http://localhost:8000")
Deprecated since version 0.1.0: Classifier and OnPremiseClassifier are deprecated and will be removed in a future version.
Use NICLClassifier instead.