Developer Notes — PyTorch 2.9 Documentation
Created On: Apr 16, 2025 | Last Updated On: Apr 16, 2025

PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. It provides a flexible and efficient platform for building deep learning models, offering dynamic computation graphs and a rich ecosystem of tools and libraries. This guide will help you harness the power of PyTorch to create and deploy machine learning models effectively. Features described in this documentation are classified by release status:
- Stable (API-Stable): These features will be maintained long-term, and there should generally be no major performance limitations or gaps in documentation. We also expect to maintain backwards compatibility (although breaking changes can happen, and notice will be given one release ahead of time).
- Unstable (API-Unstable): Encompasses all features that are under active development, where APIs may change based on user feedback, requisite performance improvements, or because coverage across operators is not yet complete. The APIs and performance characteristics of these features may change.

Created On: Apr 24, 2025 | Last Updated On: Apr 24, 2025

When a user passes one or more tensors to out=, the contract is as follows:
- If an out= tensor has no elements, it will be resized to the shape, stride, and memory format of the output of the computation.
- If an out= tensor has a different shape than the result of the computation, either an error is thrown OR the out= tensor is resized to the same shape, stride, and memory format as the output of the computation. (This resizing behavior is deprecated, and PyTorch is updating its operators to consistently throw an error.)
- Passing out= tensors with the correct shape is numerically equivalent to performing the operation and "safe copying" its results to the (possibly resized) out= tensor. In this case strides and memory format are preserved.

We are excited to announce the release of PyTorch® 2.9 (release notes)!
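The out= contract described above can be sketched with a short example; this is an illustrative sketch using torch.add as a representative operator, not an exhaustive statement of the contract:

```python
import torch

a = torch.ones(2, 3)
b = torch.ones(2, 3)

# Case 1: an out= tensor with no elements is resized to the shape,
# stride, and memory format of the computation's output.
out = torch.empty(0)
torch.add(a, b, out=out)
assert out.shape == torch.Size([2, 3])

# Case 2: an out= tensor with the correct shape receives a "safe copy"
# of the result; the out-variant returns the out tensor itself.
out2 = torch.zeros(2, 3)
ret = torch.add(a, b, out=out2)
assert ret is out2
assert bool((out2 == 2).all())
```

Passing an out= tensor with a mismatched, non-empty shape is the deprecated case: depending on the operator, it currently either errors or resizes, and the direction of travel is to consistently throw an error.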
This release is composed of 3216 commits from 452 contributors since PyTorch 2.8. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these features out and report any issues as we improve 2.9. More information about how to get started with the PyTorch 2-series can be found on our Getting Started page.

If you maintain and build your own custom C++/CUDA extensions with PyTorch, this update is for you!
We've been building out a stable ABI with C++ convenience wrappers to enable you to build extensions with one torch version and run them with another. We've added several new APIs since the last release. With these APIs, we have been able to enable a libtorch-ABI wheel for Flash-Attention 3: see the PR here. While we have been intentional about API design to ensure maximal stability, please note that the high-level C++ APIs are still in preview! We are working on many next steps: building out the ABI surface, establishing versioning, writing more docs, and enabling more custom kernels to be ABI-stable.

We introduce PyTorch Symmetric Memory to enable easy programming of multi-GPU kernels that work over NVLink as well as RDMA networks.
Symmetric Memory unlocks three new programming opportunities.

This release is meant to fix the following issues (regressions / silent correctness):
- Significant memory regression in F.conv3d with bfloat16 inputs in PyTorch 2.9.0 (#166643). This release provides a workaround for this issue; if you are impacted, please install the nvidia-cudnn package, version 9.15+, from PyPI. (#166480) (#167111)
- Fix Inductor bug when compiling Gemma (#165601)
- Fix InternalTorchDynamoError in bytecode_transformation (#166036)
- Fix silent-correctness error_on_graph_break bug where a non-empty checkpoint results in unwanted graph break resumption (#166586)
- Improve performance by avoiding recompilation with mark_static_address...

Created On: Jul 24, 2019 | Last Updated On: Jul 15, 2025

This note discusses several extension points and tricks that might be useful when running PyTorch within a larger system, or when operating multiple systems using PyTorch in a larger organization.
The note assumes that you either build PyTorch from source in your organization or have the ability to statically link additional code to be loaded when PyTorch is used. Therefore, many of the hooks are exposed as C++ APIs that can be triggered once in a centralized place, e.g. in static initialization code.

PyTorch comes with torch.autograd.profiler, capable of measuring the time taken by individual operators on demand. One can use the same mechanism to do "always ON" measurements for any process running PyTorch. This can be useful for gathering information about PyTorch workloads running in a given process or across an entire fleet of machines.
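The on-demand measurement path can be sketched from the Python side; this is a minimal example of torch.autograd.profiler, which records operator invocations through the same RecordFunction mechanism that the C++ hooks described below build on:

```python
import torch
from torch.autograd import profiler

x = torch.randn(32, 32)

# Profile a small region on demand; every operator invocation inside
# the context manager is recorded.
with profiler.profile(record_shapes=True) as prof:
    y = torch.mm(x, x)

# Aggregate timing per operator name (e.g. "aten::mm").
stats = prof.key_averages()
names = [evt.key for evt in stats]
```

For the "always ON" variant, the same information is surfaced in C++ via global callbacks, as described next.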
New callbacks for any operator invocation can be added with torch::addGlobalCallback. Hooks will be called with a torch::RecordFunction struct that describes the invocation context (e.g. name). If enabled, RecordFunction::inputs() contains the arguments of the function, represented as the torch::IValue variant type. Note that input logging is relatively expensive and thus has to be enabled explicitly.

PyTorch v2.9.1 should now be generally available.
Promotions to PyPI and download.pytorch.org have been done. Installation instructions for the new release can be found on the Getting Started page. Release notes for PyTorch and the domain libraries are available at the following links: All tags, including those for the following domains, have been pushed: Please contact Releng team members if you have any questions or comments.
We are kicking off the PyTorch 2.9.0 release cycle and continue to be excited for all the great features from our PyTorch community!

REMINDER OF CHANGES TO CLASSIFICATION & TRACKING

As mentioned in this RFC, beginning in release 2.8, there are changes to classification and tracking. Feature submissions are now classified as either Stable (API-Stable) or Unstable (API-Unstable); the previous classifications of Prototype, Beta, and Stable will no longer be used. The requirements for a feature to be considered stable remain the same, and in the RFC we propose a suggested path to stable. All features continue to be welcome.
If you would like a feature to be included in the release blogpost, please mention it in the "Release highlight for Proposed Feature" issue that you create, and include the release version this is...

PyTorch 2.9 was released October 15, 2025. Key improvements include expanded wheel variant support (ROCm, XPU, CUDA 13), Symmetric Memory for multi-GPU programming, a stable libtorch ABI for C++/CUDA extensions, flexible graph break control, Python 3.14 support, and Linux aarch64 CUDA builds. See PyTorch setup with uv for detailed configuration, including automatic backend selection.

PyTorch 2.9 introduces experimental wheel variant support that automatically detects your hardware. This feature is being tested with a special build of uv.
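As an illustration of backend selection, stock uv (separate from the special wheel-variant build mentioned above) already exposes a preview backend selector for PyTorch; this is a hedged sketch and the flag is a preview uv feature whose behavior may change:

```shell
# Let uv pick the torch index (CPU, CUDA, ROCm, XPU) based on the
# detected hardware; --torch-backend is a preview uv option.
uv pip install torch --torch-backend=auto

# Or pin a backend explicitly, e.g. CUDA 12.8 wheels:
uv pip install torch --torch-backend=cu128
```

The wheel-variant work aims to make even this explicit selection unnecessary by encoding hardware compatibility in the wheel metadata itself.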
Note: wheel variant support is experimental in 2.9; the goal is eventual automatic hardware detection without manual backend selection.

PyTorch 2.9 also introduces Symmetric Memory for simplified multi-GPU kernel programming.