torch.optim - PyTorch 2.9 Documentation

Leo Migdal

Created On: Jun 13, 2025 | Last Updated On: Aug 24, 2025. torch.optim is a package implementing various optimization algorithms. Most commonly used methods are already supported, and the interface is general enough that more sophisticated ones can also be easily integrated in the future. To use torch.optim you have to construct an optimizer object that will hold the current state and will update the parameters based on the computed gradients. To construct an Optimizer you have to give it an iterable containing the parameters (all should be Parameter instances) or named parameters (tuples of (str, Parameter)) to optimize. Then, you can specify optimizer-specific options such as the learning rate, weight decay, etc.
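As a minimal sketch of that construction (the model below is a placeholder introduced here for illustration, not part of the original text):

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder model purely for illustration; any nn.Module works.
model = nn.Linear(10, 1)

# Construct an optimizer over the model's parameters, with
# optimizer-specific options such as learning rate and weight decay.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

# Per-parameter options can be given as an iterable of dicts (parameter groups);
# groups that omit an option fall back to the defaults passed as keyword arguments.
optimizer = optim.SGD(
    [
        {"params": [model.weight], "lr": 1e-2},  # overrides the default lr
        {"params": [model.bias]},                # uses lr=1e-3 from below
    ],
    lr=1e-3,
    momentum=0.9,
)
```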

PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. Features described in this documentation are classified by release status: Stable (API-Stable): These features will be maintained long-term and there should generally be no major performance limitations or gaps in documentation. We also expect to maintain backwards compatibility (although breaking changes can happen and notice will be given one release ahead of time). Unstable (API-Unstable): Encompasses all features that are under active development where APIs may change based on user feedback, requisite performance improvements or because coverage across operators is not yet complete. The APIs and performance characteristics of these features may change.

Go to the end to download the full example code. Created On: Dec 03, 2020 | Last Updated: Sep 29, 2025 | Last Verified: Not Verified. A third-order polynomial, trained to predict \(y=\sin(x)\) from \(-\pi\) to \(\pi\) by minimizing squared Euclidean distance. This implementation uses the nn package from PyTorch to build the network. Rather than manually updating the weights of the model as we have been doing, we use the optim package to define an Optimizer that will update the weights for us. The optim package defines many optimization algorithms that are commonly used for deep learning, including SGD+momentum, RMSProp, Adam, etc.
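A condensed sketch of that example's training loop, assuming the usual setup from the tutorial (features x, x^2, x^3 feeding a linear layer); treat it as illustrative rather than a verbatim copy:

```python
import math
import torch

# Training data: y = sin(x) on [-pi, pi].
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Represent the polynomial as a linear layer over the features (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1),
)
loss_fn = torch.nn.MSELoss(reduction="sum")

# The optim package updates the weights for us; RMSprop is used here,
# but SGD, Adam, etc. are constructed and used the same way.
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)

for t in range(2000):
    y_pred = model(xx)                # forward pass
    loss = loss_fn(y_pred, y)         # squared Euclidean distance
    optimizer.zero_grad()             # clear gradients from the previous step
    loss.backward()                   # compute gradients of the loss w.r.t. parameters
    optimizer.step()                  # update parameters using the computed gradients
```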

We are excited to announce the release of PyTorch® 2.9 (release notes)! This release is composed of 3216 commits from 452 contributors since PyTorch 2.8. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.9. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.

If you maintain and build your own custom C++/CUDA extensions with PyTorch, this update is for you! We’ve been building out a stable ABI with C++ convenience wrappers to enable you to build extensions with one torch version and run with another. We have added new APIs since the last release, and with them we have been able to enable a libtorch-ABI wheel for Flash-Attention 3: see the PR here. While we have been intentional about API design to ensure maximal stability, please note that the high-level C++ APIs are still in preview! We are working on many next steps: building out the ABI surface, establishing versioning, writing more docs, and enabling more custom kernels to be ABI stable.

We introduce PyTorch Symmetric Memory to enable easy programming of multi-GPU kernels that work over NVLinks as well as RDMA networks. Symmetric Memory unlocks three new programming opportunities.

Related topics covered elsewhere in this collection (Created On: Jul 18, 2025 | Last Updated On: Jul 18, 2025): approaches to learning rate scheduling beyond torch.optim.lr_scheduler; advanced optimization techniques (not direct alternatives, but related); and going without an optimizer (rare and specific use cases). One such scheduling approach is sketched below.
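As a hedged illustration of scheduling outside torch.optim.lr_scheduler (not taken from the original text; the model and decay schedule are assumptions), the learning rate can be adjusted directly through the optimizer's param_groups:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)  # stand-in model for illustration
optimizer = optim.SGD(model.parameters(), lr=0.1)

def decay_lr(optimizer, epoch, base_lr=0.1, gamma=0.5, every=10):
    """Halve the learning rate every `every` epochs by writing
    directly into the optimizer's parameter groups."""
    lr = base_lr * (gamma ** (epoch // every))
    for group in optimizer.param_groups:
        group["lr"] = lr
    return lr

for epoch in range(30):
    current_lr = decay_lr(optimizer, epoch)
    # ... run the usual forward/backward/step training loop here ...
```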

The following are aliases to their counterparts in torch.optim in the nested namespaces in which they are defined. For any of these APIs, feel free to use the top-level version in torch.optim like torch.optim.Adam or the nested version torch.optim.adam.Adam. The nested namespaces also expose functional APIs that perform the Adadelta, Adagrad, and Adam algorithm computations. For further details regarding the Adam algorithm we refer to Adam: A Method for Stochastic Optimization.
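A tiny check of that equivalence (the Linear model is only a stand-in so the optimizer has parameters to work on):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in model for illustration

# The top-level name and the nested-namespace name refer to the same class,
# so either spelling constructs the same optimizer.
opt_top = torch.optim.Adam(model.parameters(), lr=1e-3)
opt_nested = torch.optim.adam.Adam(model.parameters(), lr=1e-3)

assert torch.optim.Adam is torch.optim.adam.Adam
```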

params (iterable) – iterable of parameters or named_parameters to optimize, or iterable of dicts defining parameter groups. When using named_parameters, all parameters in all groups should be named.

lr (float, Tensor, optional) – learning rate (default: 1e-3). A tensor LR is not yet supported for all our implementations. Please use a float LR if you are not also specifying fused=True or capturable=True.

betas (Tuple[float, float], optional) – coefficients used for computing running averages of the gradient and its square (default: (0.9, 0.999)).

eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8).

Created On: Mar 01, 2021 | Last Updated On: Jun 16, 2025. The distributed optimizer is not currently supported when using CUDA tensors. torch.distributed.optim exposes DistributedOptimizer, which takes a list of remote parameters (RRef) and runs the optimizer locally on the workers where the parameters live. The distributed optimizer can use any of the local optimizers to apply the gradients on each worker. DistributedOptimizer takes remote references to parameters scattered across workers and applies the given optimizer locally for each parameter. This class uses get_gradients() in order to retrieve the gradients for specific parameters.
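Putting the Adam options listed above together, a minimal construction sketch (the model is a placeholder introduced here, not part of the original text):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # placeholder model for illustration

# Defaults written out explicitly: lr=1e-3, betas=(0.9, 0.999), eps=1e-8.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
)
```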

In the realm of deep learning, optimization is a crucial step that can significantly impact the performance of a model. PyTorch, a popular open-source deep learning framework, provides a powerful module called torch.optim for handling optimization algorithms. Hosted on GitHub, this module offers a wide range of optimization algorithms that can be used to train neural networks effectively. In this blog post, we will delve into the fundamental concepts, usage methods, common practices, and best practices of torch.optim to help you make the most of it in your deep learning projects. In deep learning, the goal of optimization is to find the optimal set of parameters (weights and biases) of a neural network that minimizes a given loss function.

The loss function measures how well the model is performing on the training data. Optimization algorithms iteratively update the parameters of the model to reduce the value of the loss function over time. torch.optim is a PyTorch module that provides various optimization algorithms such as Stochastic Gradient Descent (SGD), Adam, Adagrad, etc. These algorithms are used to update the parameters of a neural network based on the gradients of the loss function with respect to the parameters. Gradient descent is the most basic optimization algorithm. The idea is to compute the gradient of the loss function with respect to the parameters and update the parameters in the opposite direction of the gradient.

The update rule for a parameter $\theta$ with learning rate $\alpha$ is given by: $\theta_{\text{new}} = \theta_{\text{old}} - \alpha \nabla L(\theta_{\text{old}})$.
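As a small illustration of that rule (not from the original text; the toy loss $L(\theta) = (\theta - 3)^2$ is an assumption made here), the same update can be written by hand or delegated to torch.optim.SGD:

```python
import torch

alpha = 0.1  # learning rate

# Manual gradient descent: theta_new = theta_old - alpha * grad L(theta_old)
theta = torch.tensor([0.0], requires_grad=True)
for _ in range(100):
    loss = (theta - 3.0) ** 2
    loss.backward()                       # compute dL/dtheta
    with torch.no_grad():
        theta -= alpha * theta.grad       # apply the update rule by hand
    theta.grad.zero_()                    # reset the gradient for the next step

# The same update delegated to torch.optim.SGD
theta = torch.tensor([0.0], requires_grad=True)
optimizer = torch.optim.SGD([theta], lr=alpha)
for _ in range(100):
    optimizer.zero_grad()
    loss = (theta - 3.0) ** 2
    loss.backward()
    optimizer.step()                      # performs theta -= lr * theta.grad
```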
