Skorch Read The Docs

Leo Migdal

A scikit-learn compatible neural network library that wraps PyTorch. The goal of skorch is to make it possible to use PyTorch with sklearn. This is achieved by providing a wrapper around PyTorch that has an sklearn interface. skorch does not re-invent the wheel, instead getting as much out of your way as possible. If you are familiar with sklearn and PyTorch, you don’t have to learn any new concepts, and the syntax should be well known. (If you’re not familiar with those libraries, it is worth getting familiarized.)

Additionally, skorch abstracts away the training loop, making a lot of boilerplate code obsolete. A simple net.fit(X, y) is enough. Out of the box, skorch works with many types of data, be it PyTorch Tensors, NumPy arrays, Python dicts, and so on. However, if you have other data, extending skorch to handle it is easy. Overall, skorch aims at being as flexible as PyTorch while having an interface as clean as sklearn's.
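To make this concrete, here is a minimal sketch of the workflow; the module, data shapes, and hyperparameters are illustrative assumptions rather than anything prescribed by skorch:

```python
import numpy as np
import torch
from torch import nn
from skorch import NeuralNetClassifier

class MyModule(nn.Module):
    # Small, hypothetical PyTorch module used only for illustration.
    def __init__(self, num_units=10):
        super().__init__()
        self.dense = nn.Linear(20, num_units)
        self.output = nn.Linear(num_units, 2)

    def forward(self, X):
        X = torch.relu(self.dense(X))
        return self.output(X)  # raw logits

# skorch accepts NumPy arrays directly: float32 features, int64 class labels.
X = np.random.randn(100, 20).astype(np.float32)
y = np.random.randint(0, 2, size=100).astype(np.int64)

net = NeuralNetClassifier(
    MyModule,
    criterion=nn.CrossEntropyLoss,  # logits + integer labels
    max_epochs=10,
    lr=0.1,
)
net.fit(X, y)            # the entire training loop in one call
y_pred = net.predict(X)  # sklearn-style predictions
```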

The skorch.history module contains the History class and helper functions, including a history for use in training with multiple processes (DistributedHistory). When using skorch with AccelerateMixin for multi-GPU training, use this class instead of the default History class. When using PyTorch's torch.nn.parallel.DistributedDataParallel, the whole training process is forked and batches are processed in parallel. That means that the standard History does not see all the batches being processed, which results in the different processes having histories that are out of sync. This is bad because the history is used as a reference to influence the training, e.g. to control early stopping.

This class solves the problem by using a distributed store from PyTorch, e.g. torch.distributed.TCPStore, to synchronize the batch information across processes. This ensures that the information stored in the individual history copies is identical for history[:, 'batches']. When it comes to the epoch-level information, it can still diverge between processes (e.g. the recorded duration of the epoch).
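A rough sketch of how such a store might be wired up; the host, port, rank, and world size below are placeholder assumptions, and how you obtain them depends on your launcher (e.g. accelerate):

```python
from torch.distributed import TCPStore
from skorch.history import DistributedHistory

# In a real run these values come from the process launcher.
rank = 0          # index of this process
world_size = 2    # total number of processes
is_master = rank == 0

# TCP-based key-value store shared by all processes (host and port are placeholders).
store = TCPStore("127.0.0.1", 11111, world_size=world_size, is_master=is_master)

dist_history = DistributedHistory(store=store, rank=rank, world_size=world_size)

# Pass it to the net in place of the default History, e.g. (hypothetical net class):
# net = MyAcceleratedNet(..., history=dist_history)
```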

The NeuralNet base class covers the more generic cases; depending on your use case, you might want to use NeuralNetClassifier or NeuralNetRegressor instead. In addition to the parameters listed in the class documentation, there are parameters with specific prefixes that are handled separately. To illustrate this, consider the sketch below: when the optimizer is initialized, NeuralNet takes care of setting its momentum parameter to 0.95.
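A minimal sketch of this prefix notation, assuming a toy module and arbitrary hyperparameters (none of the concrete values come from the skorch documentation):

```python
import torch
from torch import nn
from skorch import NeuralNet

class MyModule(nn.Module):
    # Hypothetical module, for illustration only.
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(20, 1)

    def forward(self, X):
        return self.linear(X)

net = NeuralNet(
    MyModule,
    criterion=nn.MSELoss,
    optimizer=torch.optim.SGD,
    optimizer__momentum=0.95,  # routed to SGD when the optimizer is initialized
    max_epochs=10,
    lr=0.1,
)

# The same double-underscore notation works when changing parameters later:
net.set_params(optimizer__momentum=0.99)
```

The same routing applies to other prefixed components, for example module__ and criterion__ parameters.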

(Note that the double underscore notation in optimizer__momentum means that the parameter momentum should be set on the optimizer object. These are the same semantics as used by sklearn.) Furthermore, this notation allows those parameters to be changed later, for instance via set_params, as shown in the sketch above.

skorch's dataset module contains classes and functions related to data handling. Its CVSplit class is responsible for performing the NeuralNet's internal cross validation.

For this, it sticks closely to the sklearn standards; for more information on how sklearn handles cross validation, see the sklearn documentation. The first argument that CVSplit takes is cv. It works analogously to the cv argument from sklearn's GridSearchCV, cross_val_score(), and so on: you may pass an integer (the number of folds), a float (the fraction of data to hold out for validation), a cross validation generator, or an iterable of train/validation splits. Furthermore, CVSplit takes a stratified argument that determines whether a stratified split should be made (this only makes sense for discrete targets), and a random_state argument, which seeds the split when it involves randomness.

One difference to sklearn's cross validation is that skorch makes only a single split. In sklearn, you would expect that in a 5-fold cross validation, the model is trained 5 times on different combinations of folds. This is often not desirable for neural networks, since training takes a lot of time. Therefore, skorch only ever makes one split.
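As a sketch of how this single internal split is configured (the module, data, and hyperparameters are assumptions; note that the CVSplit described here is exposed as ValidSplit in newer skorch releases):

```python
import numpy as np
from torch import nn
from skorch import NeuralNetClassifier
from skorch.dataset import CVSplit  # called ValidSplit in newer skorch versions

class TinyClassifier(nn.Module):
    # Hypothetical module, for illustration only.
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(20, 2)

    def forward(self, X):
        return self.linear(X)  # raw logits

# cv=5 holds out one fifth of the data for validation -- a single split,
# not five training runs as in a full 5-fold cross validation.
net = NeuralNetClassifier(
    TinyClassifier,
    criterion=nn.CrossEntropyLoss,
    train_split=CVSplit(cv=5, stratified=True),  # stratified: only for discrete targets
    max_epochs=5,
    lr=0.1,
)

X = np.random.randn(100, 20).astype(np.float32)
y = np.random.randint(0, 2, size=100).astype(np.int64)
net.fit(X, y)  # validation metrics come from the one held-out fold
```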
