CoCalc Lab 08: Optimization (.ipynb)
This class, Optimization, is the eighth of eight classes in the Machine Learning Foundations series. It builds upon the material from each of the other classes in the series -- on linear algebra, calculus, probability, statistics, and algorithms -- to provide a detailed introduction to training machine learning models. Through the measured exposition of theory paired with interactive examples, you'll develop a working understanding of all of the essential theory behind the ubiquitous gradient descent approach to optimization, as well as how to apply it. You'll also learn about the latest optimizers, such as Adam and Nadam, that are widely used for training deep neural networks. Over the course of studying this topic, you'll:

- Discover how the statistical and machine learning approaches to optimization differ, and why you would select one or the other for a given problem you're solving.
- Understand exactly how the extremely versatile (stochastic) gradient descent optimization algorithm works, including how to apply it.

Some code in this notebook runs pretty slowly in CoCalc.
If you have the ability, you might wish to run this notebook locally or on a more powerful remote machine.

Fitting a model in machine learning is an optimization problem. In a previous lesson we saw how logistic and linear regression use optimization to find the regression coefficients that minimize the difference between the observed and predicted values of the response variable. Most machine learning models also come with a number of parameters that need to be set and that can alter the fit of the model. For example, consider the LogisticRegression class from scikit-learn (sklearn), sketched below.
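This is only a minimal sketch: it instantiates the estimator with a handful of its keyword arguments and prints the full set of hyperparameters. The defaults noted in the comments are typical but may differ across scikit-learn versions.

```python
from sklearn.linear_model import LogisticRegression

# Instantiate the model with a subset of its keyword arguments spelled out.
# (The defaults shown here may vary between scikit-learn versions.)
model = LogisticRegression(
    penalty="l2",        # form of regularization added to the objective function
    C=1.0,               # inverse regularization strength (larger C = less regularization)
    solver="lbfgs",      # algorithm used to optimize the coefficients
    max_iter=100,        # maximum number of optimizer iterations
    fit_intercept=True,  # whether to learn an intercept term
)

# get_params() lists every hyperparameter the estimator accepts.
print(model.get_params())
```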
Some of these parameters have to do with exactly what model is fit. For instance, penalty changes the form of regularization added to the objective function to prevent overfitting, while C changes the strength of that regularization (a larger C means less regularization). These extra parameters are usually called hyperparameters, and to get the best model they often need to be tuned. This tuning is another kind of optimization, usually called "hyperparameter optimization" or "hyperparameter tuning". It is an active area, and a little searching will turn up many overviews of hyperparameter tuning methods (some of which get fairly technical). To keep everything straight, it helps to remember that model parameters and hyperparameters are different: hyperparameters are set or chosen before the model is fit, while model parameters are determined during the process of fitting the model to the data.
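As a concrete illustration of hyperparameter tuning, the sketch below uses scikit-learn's GridSearchCV to try a few values of C by cross-validation. The dataset, the grid of candidate values, and the use of a scaling pipeline are arbitrary choices made for this example, not the notebook's own setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameters are chosen *before* fitting; here we search a small,
# arbitrary grid of C values with 5-fold cross-validation.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)   # the model *parameters* (coefficients) are learned inside each fit

print(search.best_params_)  # best hyperparameter setting found
print(search.best_score_)   # its cross-validated accuracy
```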
Until now, you've always used Gradient Descent to update the parameters and minimize the cost. In this notebook, you'll gain skills with some more advanced optimization methods that can speed up learning and perhaps even get you to a better final value of the cost function. Having a good optimization algorithm can be the difference between waiting days and waiting just a few hours to get a good result.
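For reference, here is a minimal sketch of the plain gradient-descent update you've been using so far. The dictionary-of-arrays layout and the function name gradient_descent_step are illustrative choices, not any particular library's API.

```python
import numpy as np

def gradient_descent_step(params, grads, learning_rate=0.01):
    """One plain gradient descent update: theta <- theta - alpha * dJ/dtheta."""
    return {name: value - learning_rate * grads[name] for name, value in params.items()}

# Toy usage with a made-up weight matrix, bias vector, and gradients.
params = {"W": np.ones((2, 2)), "b": np.zeros(2)}
grads = {"W": np.full((2, 2), 0.5), "b": np.array([0.1, -0.1])}
params = gradient_descent_step(params, grads, learning_rate=0.1)
print(params["W"])  # each entry moved from 1.0 to 0.95
```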
By the end of this notebook, you'll be able to:

- Apply optimization methods such as (Stochastic) Gradient Descent, Momentum, RMSProp, and Adam
- Use random minibatches to accelerate convergence and improve optimization (see the sketch below)

Gradient descent goes "downhill" on a cost function $J$. Think of it as walking down a hilly landscape, always stepping in the direction of steepest descent toward the lowest point.
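To make the list above concrete, here is a minimal NumPy sketch of two of those ideas: splitting the training set into random minibatches and applying a momentum-style update. The function names, data layout (examples as columns), and hyperparameter values are illustrative assumptions, not the notebook's own implementation.

```python
import numpy as np

def random_minibatches(X, Y, batch_size=64, seed=0):
    """Shuffle the examples (columns) and split them into minibatches."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]                     # number of training examples
    order = rng.permutation(m)
    X, Y = X[:, order], Y[:, order]
    return [(X[:, k:k + batch_size], Y[:, k:k + batch_size])
            for k in range(0, m, batch_size)]

def momentum_step(params, grads, velocity, learning_rate=0.01, beta=0.9):
    """Momentum update: v <- beta*v + (1-beta)*grad; theta <- theta - alpha*v."""
    for name in params:
        velocity[name] = beta * velocity[name] + (1 - beta) * grads[name]
        params[name] = params[name] - learning_rate * velocity[name]
    return params, velocity

# Toy usage with made-up data, parameters, and gradients.
X = np.random.randn(3, 200)                     # 3 features, 200 examples
Y = (np.random.rand(1, 200) > 0.5).astype(float)
batches = random_minibatches(X, Y, batch_size=64)
print([xb.shape[1] for xb, _ in batches])       # batch sizes, e.g. [64, 64, 64, 8]

params = {"W": np.zeros((1, 3)), "b": np.zeros((1, 1))}
velocity = {name: np.zeros_like(value) for name, value in params.items()}
grads = {"W": np.ones((1, 3)), "b": np.ones((1, 1))}
params, velocity = momentum_step(params, grads, velocity, learning_rate=0.1)
print(params["W"])                              # each entry moved from 0.0 to -0.01
```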