The Road Less Scheduled

Leo Migdal

Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024), Main Conference Track.
Aaron Defazio, Xingyu (Alice) Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

Existing learning rate schedules that do not require specification of the optimization stopping step $T$ are greatly out-performed by learning rate schedules that depend on $T$. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from... Our Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging.

An open source implementation of our method is available at https://github.com/facebookresearch/schedule_free. Schedule-Free AdamW is the core algorithm behind our winning entry to the MLCommons 2024 AlgoPerf Algorithmic Efficiency Challenge Self-Tuning track.
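For orientation, here is a minimal sketch of how the released optimizer is meant to slot into a standard PyTorch training loop, based on the repository's documented interface; the class name `AdamWScheduleFree` and the `train()`/`eval()` mode switches come from that README and should be verified against the installed version.

```python
# Minimal usage sketch of the released schedulefree package (pip install schedulefree),
# following the repository README; verify names against the installed version.
import torch
import schedulefree

model = torch.nn.Linear(10, 1)
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-3)

optimizer.train()  # the optimizer tracks two parameter sequences, so it must know the mode
for step in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

optimizer.eval()  # switch the parameters to the averaged iterate before evaluation
with torch.no_grad():
    val_loss = torch.nn.functional.mse_loss(model(torch.randn(32, 10)), torch.randn(32, 1))
```

The explicit mode switch reflects the method itself: gradients are taken at one sequence of points during training, while evaluation and checkpointing are meant to use the averaged sequence.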


Many machine learning optimization techniques depend on learning rate schedules. However, these schedules require the training duration to be fixed in advance, which limits their applicability, and their theoretical guarantees often do not translate into real-world performance, creating a theory-practice gap. This paper addresses both shortcomings: it introduces Schedule-Free optimization, a novel approach that forgoes learning rate schedules entirely while maintaining state-of-the-art performance across a wide range of tasks.
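To make the dependence on the training horizon concrete, here is a generic PyTorch illustration (not code from the paper): a cosine schedule must be told the total number of steps before training begins, and stopping earlier or later than planned leaves the learning rate at the wrong point of its decay.

```python
# Generic illustration (not from the paper): a cosine learning-rate schedule
# requires the total step count T to be chosen before training begins.
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

T = 1_000  # must be fixed in advance; the schedule is tuned to end exactly here
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=T)

for step in range(T):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()  # decays the learning rate toward zero at step T
```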

The core contribution is a unified theoretical framework linking iterate averaging and learning rate schedules, which leads to a new momentum-based algorithm. In practice the approach matches, and often surpasses, schedule-based training, improving both efficiency and usability. This matters because current methods rely on learning rate schedules that require prior knowledge of the training duration; the Schedule-Free approach removes that requirement while achieving state-of-the-art performance across diverse problems, offering researchers a more efficient, theoretically grounded method and opening new avenues in large-scale optimization.
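As a rough sketch of the flavor of the resulting update, the following follows the Schedule-Free SGD recursion described in the paper, with gradients evaluated at an interpolation between the base SGD sequence and its running average; details such as warmup and weighted averaging are omitted, so treat this as a simplified illustration rather than the reference implementation.

```python
# Simplified sketch of the Schedule-Free SGD recursion described in the paper
# (warmup and weighted averaging omitted; see the reference repo for the full method).
import torch

def schedule_free_sgd(grad_fn, x0, lr=0.1, beta=0.9, steps=1000):
    """grad_fn(y) returns a (stochastic) gradient at y; x0 is the initial point."""
    z = x0.clone()  # base SGD sequence
    x = x0.clone()  # running average; this is the point that is returned / evaluated
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x  # gradients are evaluated at an interpolation
        z = z - lr * grad_fn(y)        # plain SGD step on the z sequence
        c = 1.0 / t                    # equal-weight online-averaging coefficient
        x = (1 - c) * x + c * z        # x stays the mean of all z iterates so far
    return x

# Toy usage: minimize ||w - 3||^2 from noisy gradients.
target = torch.full((5,), 3.0)
noisy_grad = lambda y: 2 * (y - target) + 0.1 * torch.randn_like(y)
w_hat = schedule_free_sgd(noisy_grad, torch.zeros(5))
```

Here `beta` plays the role of the usual momentum parameter, which is consistent with the paper's claim that the method adds no hyper-parameters beyond those of a standard momentum optimizer.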

This figure shows the training performance of Schedule-Free SGD and Schedule-Free AdamW compared to traditional cosine learning rate schedules. The black lines represent the Schedule-Free methods and closely track the Pareto frontier (the optimal trade-off between loss and training time), while the red lines represent cosine schedules of different lengths. Schedule-Free methods perform comparably to or better than the tuned cosine schedules, even though they do not require the optimization stopping time to be specified.
