Frequently Asked Questions¶
General¶
What is NICL?¶
NICL (Neuralk In-Context Learning) is a foundation model designed specifically for tabular prediction tasks (classification) where patterns can be inferred directly from the data distribution without extensive model tuning.
At Neuralk AI we develop powerful Tabular Foundation Models (TFMs) purpose-built for industry applications, delivering state-of-the-art accuracy on real-world ML tasks across commerce, finance, healthcare and beyond.
When should I use NICL?¶
NICL is ideal when you:
Need strong baseline performance without hyper-parameter tuning
Want a unified approach to handle mixed feature types
Are exploring new datasets and want fast iteration
Prefer interpretability and flexible prompting over black-box optimisation
What is In-Context Learning?¶
In-Context Learning (ICL) is a process where a model makes predictions based on contextual examples provided directly within the input, rather than by updating its internal parameters through traditional training methods. This approach is particularly advantageous when labeled data is scarce or frequently changing.
For NICL, a small set of input–output pairs (the context) acts as a description of the task. The model uses this contextual information to infer relationships and predict outcomes for new samples, effectively adapting to new datasets without retraining.
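As a loose analogy (not NICL's actual mechanism), in-context prediction can be sketched as inference conditioned on a set of labelled examples, with no parameter updates between tasks:

```python
# Illustrative analogy only, NOT NICL's architecture: a predictor whose
# behaviour is defined entirely by the labelled context it is given.

def predict_from_context(context, query):
    """Return the label of the context example nearest to `query`."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest_x, nearest_y = min(context, key=lambda pair: dist(pair[0], query))
    return nearest_y

# The context *is* the task description: swapping it out changes the
# model's behaviour instantly, without any retraining.
context = [((0.0, 0.0), "low"), ((1.0, 1.0), "high")]
print(predict_from_context(context, (0.9, 0.8)))  # "high"
```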
How is NICL different from LLMs?¶
In-Context Learning in NICL is conceptually similar to how Large Language Models use prompts. In both cases, the model conditions its predictions on the information provided in the input without updating internal parameters.
Key difference: Unlike LLMs, Tabular Foundation Models like NICL do not contain broad, latent world knowledge. They act as inference engines that rely almost entirely on the information provided in the context, learning the task from the examples themselves rather than retrieving facts from internal memory.
Think of NICL as a reasoning engine for tabular data: the clearer the context, the more accurate the inference.
What are the limitations of NICL?¶
While NICL provides excellent generalisation across a wide range of datasets, for very high-dimensional or sparse data (e.g., text bag-of-words, one-hot expansions) it may require further preprocessing steps to achieve optimal performance.
What’s the difference between nicl-flash, nicl-small, and nicl-large?¶
| Model | Best For | Trade-off |
|---|---|---|
| `nicl-flash` | Prototyping, real-time applications | Fastest inference, slightly lower accuracy |
| `nicl-small` | General use (default) | Balanced speed and accuracy |
| `nicl-large` | Complex tasks, maximum accuracy | Highest accuracy, slower inference |
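As a rule of thumb, the table above reduces to a simple decision: latency-sensitive work favours `nicl-flash`, accuracy-critical work favours `nicl-large`, and everything else can start from the default. A small illustrative helper (not part of the SDK):

```python
# Illustrative helper (not part of the neuralk SDK): pick a model name
# from the trade-offs in the table above.
def choose_model(latency_sensitive: bool, accuracy_critical: bool) -> str:
    if latency_sensitive:
        return "nicl-flash"   # fastest inference, slightly lower accuracy
    if accuracy_critical:
        return "nicl-large"   # highest accuracy, slower inference
    return "nicl-small"       # balanced default

print(choose_model(latency_sensitive=False, accuracy_critical=True))  # nicl-large
```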
Does Neuralk support regression?¶
Currently, the SDK supports classification tasks only. Regression support is planned for a future release.
Can I use Neuralk offline?¶
Yes, with an on-premise deployment. Contact us for enterprise licensing:
clf = NICLClassifier(host="http://your-server:8000")
Data & Limits¶
What’s the maximum dataset size?¶
For the current release, while our infrastructure is being scaled up, we limit data size to:
Maximum rows: 10 million
Maximum columns: 1,000
Additional practical constraints:
Request timeout: Default 15 minutes (configurable via timeout_s)
Network bandwidth: Large datasets take longer to upload
For very large datasets, consider using sampling strategies. See Built-in selection of the most informative context for details.
What data types are supported?¶
NICL accepts numeric data only. You must preprocess other types:
Categorical: Use sklearn.preprocessing.LabelEncoder or skrub.TableVectorizer
Text: Use embeddings or bag-of-words encoding (native textual data handling is in development)
Dates: Convert to numeric features (timestamp, day of week, etc.)
Missing values: Impute before passing to the model
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from skrub import TableVectorizer
from neuralk import NICLClassifier
clf = make_pipeline(
TableVectorizer(),
SimpleImputer(),
NICLClassifier(),
)
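For datetime columns specifically, the conversion mentioned above can be done with the standard library; the feature set here (timestamp, day of week, month) is illustrative:

```python
from datetime import datetime

def date_features(iso_date: str) -> dict:
    """Turn an ISO date string into numeric features (illustrative choice)."""
    dt = datetime.fromisoformat(iso_date)
    return {
        "timestamp": dt.timestamp(),  # seconds since epoch
        "day_of_week": dt.weekday(),  # Monday = 0
        "month": dt.month,
    }

print(date_features("2024-01-15"))
```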
How do I handle high-dimensional data?¶
For data with more than 1,000 features, we recommend applying dimensionality reduction using PCA:
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from neuralk import NICLClassifier
clf = make_pipeline(
SimpleImputer(),
PCA(n_components=100),
NICLClassifier(),
)
How much data preparation is needed?¶
NICL is designed to work with minimal preprocessing. In most cases, you can pass your dataset as a clean tabular structure (e.g., pandas DataFrame) and obtain strong performance.
However, you can improve accuracy and stability by:
Handling missing values (imputation or explicit missing indicators)
Encoding categorical features as integers consistently across training and inference
Keeping feature semantics consistent between training and prediction
Expert-based preprocessing or domain-specific feature engineering can further enhance results, but NICL is intentionally built to work well “as is.”
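For the consistent categorical encoding point above, the key is to build the mapping once on training data and reuse the same mapping at prediction time. A minimal stdlib sketch (mapping unseen categories to -1 is a design choice here, not an SDK requirement):

```python
def fit_encoder(values):
    """Build a stable category -> integer mapping from training data."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

def encode(values, mapping, unknown=-1):
    """Apply the training-time mapping; unseen categories get `unknown`."""
    return [mapping.get(v, unknown) for v in values]

# Fit once on training data, reuse at inference time:
mapping = fit_encoder(["red", "blue", "red"])
print(encode(["blue", "red", "green"], mapping))  # [0, 1, -1]
```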
Note
Native support for missing values and datetime features is coming soon.
How do I optimise context for better accuracy?¶
Unlike traditional ML models, NICL does not rely on hyper-parameter tuning. Instead, performance can be improved through context specification:
Include representative samples to help NICL infer feature importance and relationships
Experiment with context to help NICL focus on the most relevant aspects of your data
How many samples should I use as context?¶
While NICL can handle large datasets, more examples are not always better. The best performance often comes from a compact, informative subset rather than a full dataset dump.
Too few samples: might lead to underfitting and poor generalization
Too many samples: might cause context dilution, slow down inference and introduce noise
Optimal: diverse, non-redundant examples as context
Tip
Start with representative samples and increase gradually until you reach diminishing returns.
→ See Built-in selection of the most informative context for advanced sampling strategies.
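One practical way to find the sweet spot described above is to grow the context incrementally and stop when validation gains flatten out. In this generic sketch, `evaluate` is a hypothetical helper standing in for fitting the model on n context samples and returning a validation score:

```python
def best_context_size(sizes, evaluate, tolerance=0.005):
    """Grow the context and stop once gains fall below `tolerance`.

    `evaluate(n)` is a hypothetical callback (not part of the SDK) that
    fits on n context samples and returns a validation score.
    """
    best_n, best_score = sizes[0], evaluate(sizes[0])
    for n in sizes[1:]:
        score = evaluate(n)
        if score - best_score < tolerance:  # diminishing returns
            break
        best_n, best_score = n, score
    return best_n

# Toy evaluation curve that saturates around 400 samples:
curve = {100: 0.80, 200: 0.86, 400: 0.90, 800: 0.903}
print(best_context_size([100, 200, 400, 800], curve.get))  # 400
```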
Performance¶
Why is my prediction slow?¶
Improving latency is a priority at Neuralk, and our API is updated regularly to enhance performance.
Common causes of slow predictions:
Large dataset: More data = longer upload and processing time
Network latency: Cloud API depends on your connection
Model choice: nicl-large is slower than nicl-flash
Solutions:
Use nicl-flash for faster inference
Sample your training data (see Built-in selection of the most informative context)
Consider on-premise deployment for latency-sensitive applications
How can I speed up inference?¶
Use a faster model: clf = NICLClassifier(model="nicl-flash")
Reduce context size with sampling strategies: clf = NICLClassifier(strategy="random", n_groups=10)
Reduce feature dimensionality with PCA
Authentication & Billing¶
How do I get an API key?¶
Run neuralk login in your terminal, then register at:
How does billing work?¶
Each prediction consumes credits based on dataset size and model used. Check your usage after predictions:
clf.fit(X_train, y_train)
clf.predict(X_test)
print(f"Credits used: {clf.credits_consumed}")
Contact sales@neuralk-ai.com for enterprise pricing and volume discounts.
What are the rate limits?¶
Rate limits depend on your subscription tier. If you encounter HTTP 429 errors, wait and retry or contact support for higher limits.
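A common way to handle 429 responses is to retry with exponential backoff. This is a generic sketch: `RateLimited` is a stand-in class, and the actual exception type and status-code attribute depend on your SDK version:

```python
import time

class RateLimited(Exception):
    """Stand-in for however the client surfaces HTTP 429 (illustrative)."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimited:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return call()  # final attempt: let any remaining error propagate
```

Usage: wrap the rate-limited call, e.g. `with_backoff(lambda: clf.predict(X_test))`.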
Troubleshooting¶
“API key is required for cloud mode”¶
Set your API key via environment variable:
export NEURALK_API_KEY=nk_live_your_key_here
Or pass it directly:
clf = NICLClassifier(api_key="nk_live_your_key_here")
Request timeout errors¶
For large datasets, increase the timeout:
clf = NICLClassifier(timeout_s=1800) # 30 minutes
How do I debug a failed request?¶
Use the request_id to reference your request when contacting support:
try:
predictions = clf.predict(X_test)
except NeuralkException as e:
print(f"Request ID: {clf.request_id}")
print(f"Error: {e.message}")
Still Have Questions?¶
Documentation: https://docs.neuralk-ai.com
Support: alex@neuralk-ai.com
Enterprise & Collaborations: antoine@neuralk-ai.com