Formula API | statsmodels/statsmodels | DeepWiki

Leo Migdal

This document describes the Formula API in statsmodels, which provides an R-style formula interface for specifying statistical models. The Formula API lets users express a model specification as a concise string instead of building design matrices by hand. This simplifies model creation and improves readability, because users can focus on the statistical relationships rather than on data manipulation. For information about direct data management without formulas, see Data Management. The Formula API provides a consistent interface for specifying models using R-like formulas. It uses the patsy library to parse formulas and build design matrices, which are then passed to statsmodels' model classes.

Sources: statsmodels/formula/api.py (lines 12-32).

The Formula API provides formula-based constructors for many statsmodels model classes. Each of these constructors is a convenience function that calls the from_formula method of the corresponding model class. The main statsmodels API is split into modules:

statsmodels.api: Cross-sectional models and methods. Canonically imported using import statsmodels.api as sm.
statsmodels.tsa.api: Time-series models and methods. Canonically imported using import statsmodels.tsa.api as tsa.
statsmodels.formula.api: A convenience interface for specifying models using formula strings and DataFrames. This API directly exposes the from_formula class method of models that support the formula API. Canonically imported using import statsmodels.formula.api as smf.

The API focuses on models and the most frequently used statistical tests and tools.

Import Paths and Structure explains the design of the two API modules and how importing from the API differs from directly importing from the module where the model is defined. See the detailed topic pages in the User Guide for a complete list of available models, statistics, and tools.

Every tutorial you read shows a different way to import statsmodels. One guide starts with import statsmodels.api as sm. Another uses from statsmodels.formula.api import ols. A third imports directly from submodules, as in from statsmodels.regression.linear_model import OLS.

Which approach should you use? The confusion stems from a deliberate design choice. Statsmodels offers multiple import paths because different users need different things. Researchers writing academic papers want one workflow. Data scientists doing quick exploratory analysis want another. Understanding these three approaches will save you from blindly copying code that doesn’t match your actual needs.

The statsmodels.api module serves as your main gateway to the library. When you import it as sm, you get access to the most commonly used models and functions through a clean namespace. Ordinary Least Squares becomes sm.OLS. Logistic regression becomes sm.Logit. The add_constant function becomes sm.add_constant. The statsmodels.formula.api module gives you R-style formula syntax.

Instead of manually separating your endog and exog variables, you write a formula string that describes the relationship. The lowercase function names (ols instead of OLS) signal that you're using the formula interface. Direct imports pull specific classes or functions from their exact location in the library structure. You import only what you need, nothing more.

This page documents the data handling and formula interface systems in statsmodels, which are responsible for processing input data in various formats, managing formula-based model specifications (similar to R), and handling data transformations for... The statsmodels library provides flexible data handling capabilities that allow users to specify models using either direct array inputs or a formula-based approach.

The data handling system processes various input types (NumPy arrays, pandas DataFrames, lists) and manages missing data, while the formula interface allows for concise model specification using R-style formulas. Sources: statsmodels/base/data.py (lines 56-95), statsmodels/formula/_manager.py (lines 168-250), statsmodels/formula/formulatools.py (lines 14-70).

The data handling system is designed to process input data in different formats while preserving metadata. It abstracts away the details of data storage to provide a consistent interface for model estimation. The ModelData class serves as the base class for handling different data types.

Model.from_formula creates a Model from a formula and dataframe. Besides the formula and the data, it accepts:

subset: An array-like object of booleans, integers, or index values that indicate the subset of df to use in the model. Assumes df is a pandas.DataFrame.
drop_cols: Columns to drop from the design matrix. Cannot be used to drop terms involving categoricals.
*args: Additional positional arguments that are passed to the model.
**kwargs: These are passed to the model with one exception. The eval_env keyword is passed to patsy. It can be either a patsy.EvalEnvironment object or an integer indicating the depth of the namespace to use. For example, the default eval_env=0 uses the calling namespace. If you wish to use a “clean” environment, set eval_env=-1.
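A short sketch of subset, assuming the default patsy backend and a synthetic DataFrame with hypothetical columns x, region, and y. The boolean mask restricts estimation to one region, and the np.log call inside the formula works because the default eval_env resolves names such as np in the calling namespace.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "x": rng.uniform(1.0, 10.0, size=100),
    "region": np.repeat(["north", "south"], 50),  # hypothetical grouping column
})
df["y"] = 0.3 + 1.2 * np.log(df["x"]) + rng.normal(scale=0.2, size=100)

# subset: a boolean mask selecting the rows of df used in the model
mask = (df["region"] == "north").to_numpy()

# np is found through the formula's evaluation environment (the caller's namespace)
res = smf.ols("y ~ np.log(x)", data=df, subset=mask).fit()

print(res.nobs)  # 50.0
```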

This document explains how statsmodels handles various types of input data, converts them to consistent internal formats, and manages data transformations throughout the modeling process. For formula-based model specification, see Formula API. The statsmodels library provides a robust data management system that processes different input data types (NumPy arrays, pandas DataFrames, Series, Python lists) and transforms them into standardized internal formats. This system also handles missing values, maintains metadata, and reattaches that metadata to results.

Sources: statsmodels/base/data.py (lines 57-505, 333-445, 453-505), statsmodels/formula/_manager.py (lines 168-894).

The core of the data management system is the ModelData class hierarchy, which processes different types of input data.

