On Hyper-Parameter Selection for Guaranteed Convergence of RMSProp

Leo Migdal

RMSProp is one of the most popular stochastic optimization algorithms in deep learning applications. However, recent work has pointed out that this method may not converge to the optimal solution even in simple convex settings. To this end, we propose a time-varying version of RMSProp to fix the non-convergence issue. Specifically, the hyperparameter \(\beta _t\) is treated as a time-varying sequence rather than a fine-tuned constant. We also provide a rigorous proof that RMSProp can converge to critical points even for smooth, non-convex objectives, with a convergence rate of order \(\mathcal {O}(\log T/\sqrt{T})\). This provides a new understanding of RMSProp divergence, a common issue in practical applications.
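For concreteness, this scheme can be read as the standard RMSProp recursion with the constant second-moment coefficient replaced by the time-varying sequence \(\beta _t\). The notation below (stochastic gradient \(g_t\), second-moment estimate \(v_t\), step size \(\alpha _t\), stabilizer \(\epsilon \)) is ours, and the specific schedule for \(\beta _t\) analysed in the paper is not reproduced here:

\[
\begin{aligned}
v_t &= \beta _t\, v_{t-1} + (1-\beta _t)\, g_t^{2},\\
x_{t+1} &= x_t - \frac{\alpha _t}{\sqrt{v_t}+\epsilon }\, g_t .
\end{aligned}
\]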

Finally, numerical experiments show that time-varying RMSProp exhibits advantages over standard RMSProp on benchmark datasets and support the theoretical results. The datasets analysed during the current study are available in the following public-domain resources: http://yann.lecun.com/exdb/mnist/; http://www.cs.toronto.edu/~kriz/cifar.html; https://github.com/kuangliu/pytorch-cifar.

This repository is the official implementation of the paper "RMSprop can converge with proper hyper-parameter". The folder contains the following code:

(a) cifar_resnet.py: the RMSProp/Adam algorithm for training CIFAR-10 on ResNet, presented in Section 5;
(b) cifar_resnet_SGD.py: the SGD algorithm with momentum for training CIFAR-10 on ResNet, presented in Section 5;
(c) reddiexample.m: the Adam algorithm for training counter-example (1) of Reddi et al., presented in Section 1.
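Below is a minimal sketch, in the same PyTorch setting as the files above, of a single RMSProp update with a time-varying \(\beta _t\). The schedule \(\beta _t = 1 - 1/(t+1)\), the function name time_varying_rmsprop_step, and the argument defaults are illustrative assumptions and are not taken from the paper or the repository.

```python
# Minimal sketch of a single RMSProp update with a time-varying beta_t.
# Illustrative only: the schedule beta_t = 1 - 1/(t + 1), the function name,
# and the defaults are assumptions, not taken from the paper or the repository.
import torch


def time_varying_rmsprop_step(param, grad, v, t, lr=1e-3, eps=1e-8):
    """Update `param` in place; `v` is the running second-moment estimate."""
    beta_t = 1.0 - 1.0 / (t + 1)  # example schedule that increases toward 1
    # v_t = beta_t * v_{t-1} + (1 - beta_t) * g_t^2
    v.mul_(beta_t).addcmul_(grad, grad, value=1.0 - beta_t)
    # x_{t+1} = x_t - lr * g_t / (sqrt(v_t) + eps)
    param.addcdiv_(grad, v.sqrt().add_(eps), value=-lr)
    return v


# Toy usage on f(x) = 0.5 * x^2, whose gradient is x.
x = torch.tensor([5.0])
v = torch.zeros_like(x)
for t in range(1000):
    grad = x.clone()
    v = time_varying_rmsprop_step(x, grad, v, t, lr=1e-2)
```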

Keywords: Convergence; Deep learning; Neural networks; Non-convex optimization; RMSProp.

© The Author(s), under exclusive licence to Springer Nature B.V. 2022.

Conflict of interest: The authors declare that they have no conflict of interest.
