4.4.1.1.17 statsmodels.formula.api.wls (statsmodels API v1)

Leo Migdal

-Dec 4, 2025, 8:20 AM

A regression model with diagonal but non-identity covariance structure. The weights are presumed to be (proportional to) the inverse of the variance of the observations. That is, if the variables are to be transformed by 1/sqrt(W), you must supply weights = 1/W. The endog argument is the 1-d endogenous response variable (the dependent variable). The exog argument is a nobs x k array, where nobs is the number of observations and k is the number of regressors.

An intercept is not included by default and should be added by the user; see statsmodels.tools.add_constant(). The weights argument is a 1-d array of weights. If you supply 1/W, the variables are pre-multiplied by 1/sqrt(W). If no weights are supplied, the default value is 1 and WLS results are the same as OLS.

The from_formula class method creates a Model from a formula and dataframe.

The subset argument is an array-like object of booleans, integers, or index values that indicate the subset of df to use in the model. It assumes df is a pandas.DataFrame. The drop_cols argument lists columns to drop from the design matrix; it cannot be used to drop terms involving categoricals. Additional positional arguments (*args) are passed to the model. Additional keyword arguments (**kwargs) are also passed to the model, with one exception.

The eval_env keyword is passed to patsy. It can be either a patsy.EvalEnvironment object or an integer indicating the depth of the namespace to use. For example, the default eval_env=0 uses the calling namespace. If you wish to use a “clean” environment, set eval_env=-1.

This document describes the Formula API in statsmodels, which provides an R-style formula interface for specifying statistical models. The Formula API allows users to express model specifications using a concise, string-based syntax rather than directly managing design matrices.

This approach simplifies model creation and enhances readability by allowing users to focus on the statistical relationships rather than data manipulation details. For information about direct data management without formulas, see Data Management.

The Formula API provides a consistent interface for specifying models using R-like formulas. It leverages the patsy library for formula parsing and design matrix creation, which then feeds into statsmodels' model classes. Sources: statsmodels/formula/api.py, lines 12-32.

The Formula API provides formula-based constructors for many statsmodels model classes.

Each of these constructors is a convenience function that calls the from_formula method of the corresponding model class. The main statsmodels API is split into models: statsmodels.api: Cross-sectional models and methods. Canonically imported using import statsmodels.api as sm. statsmodels.tsa.api: Time-series models and methods. Canonically imported using import statsmodels.tsa.api as tsa.

statsmodels.formula.api: A convenience interface for specifying models using formula strings and DataFrames. This API directly exposes the from_formula class method of models that support the formula API. Canonically imported using import statsmodels.formula.api as smf. The API focuses on models and the most frequently used statistical tests and tools. Import Paths and Structure explains the design of the two API modules and how importing from the API differs from directly importing from the module where the model is defined. See the detailed topic pages in the User Guide for a complete list of available models, statistics, and tools.

The results include an estimate of the covariance matrix, (whitened) residuals and an estimate of scale. The method argument of fit can be “pinv” or “qr”. “pinv” uses the Moore-Penrose pseudoinverse to solve the least squares problem. “qr” uses the QR factorization. By default, the fit method uses the pseudoinverse of the design/exogenous variables to solve the least squares minimization.

Every tutorial you read shows a different way to import Statsmodels. One guide starts with import statsmodels.api as sm. Another uses from statsmodels.formula.api import ols. A third imports directly from submodules like from statsmodels.regression.linear_model import OLS. Which approach should you use?

The confusion stems from a deliberate design choice. Statsmodels offers multiple import paths because different users need different things. Researchers writing academic papers want one workflow. Data scientists doing quick exploratory analysis want another. Understanding these three approaches will save you from blindly copying code that doesn’t match your actual needs. The statsmodels.api module serves as your main gateway to the library.

When you import statsmodels.api as sm, you get access to the most commonly used models and functions through a clean namespace. Ordinary Least Squares becomes sm.OLS. Logistic regression becomes sm.Logit. The add_constant function becomes sm.add_constant. The statsmodels.formula.api module gives you R-style formula syntax. Instead of manually separating your endog and exog variables, you write a formula string that describes the relationship.

The lowercase function names (ols instead of OLS) signal that you’re using the formula interface. Direct imports pull specific classes or functions from their exact location in the library structure. You import only what you need, nothing more. When diving into statistical modeling with Python’s powerful Statsmodels library, preparing your data can sometimes feel like a separate, time-consuming task. Manually creating dummy variables, interaction terms, or transformations often adds complexity before you even fit your first model. This is where the Statsmodels Formula API shines!

Inspired by R’s elegant formula syntax, it provides a concise and intuitive way to define your models directly from a string, handled by the fantastic Patsy library under the hood. It simplifies your workflow, making your code cleaner and more readable. In this comprehensive guide, we’ll explore everything you need to know about the Statsmodels Formula API, from basic syntax to advanced transformations and interactions, empowering you to build sophisticated models with ease. The Statsmodels Formula API allows you to specify statistical models using a string-based formula, much like you would in R. This formula describes the relationship between your dependent (response) variable and your independent (predictor) variables. Its primary advantage is abstracting away the tedious data preparation steps.
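As a sketch of the data preparation a formula string can take over, the example below uses a categorical predictor and an interaction term. The column names and data are made up for illustration; C(...) marks a term as categorical and * expands to main effects plus their interaction.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one numeric predictor and a three-level category.
rng = np.random.default_rng(3)
n = 90
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "group": np.tile(["a", "b", "c"], n // 3),
})
df["y"] = (
    1.0
    + 0.5 * df["x"]
    + 2.0 * (df["group"] == "b")
    + rng.normal(scale=0.3, size=n)
)

# Dummy variables and interaction columns are generated from the formula.
model = smf.ols("y ~ C(group) * x", data=df)
res = model.fit()

# Inspect the generated design-matrix columns.
print(model.exog_names)
print(model.exog.shape)
```

With three levels in group, the design matrix has six columns: the intercept, two dummy variables, x, and two interaction columns.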

It automatically handles tasks like creating design matrices, generating dummy variables for categorical features, and even constructing interaction terms, all based on a simple formula string. This drastically reduces boilerplate code and improves model interpretability.

The fit_regularized method returns a regularized fit to a linear regression model. Only the coordinate descent algorithm is implemented. The maxiter argument is the maximum number of iteration cycles (an iteration cycle involves running coordinate descent on all variables). The alpha argument is the penalty weight.

If alpha is a scalar, the same penalty weight applies to all variables in the model. If a vector, it must have the same length as params, and contains a penalty weight for each coefficient. The L1_wt argument is the fraction of the penalty given to the L1 penalty term. It must be between 0 and 1 (inclusive). If 0, the fit is ridge regression. If 1, the fit is the lasso.
