Pull requests · huggingface/lighteval · GitHub
Your go-to toolkit for lightning-fast, flexible LLM evaluation, from Hugging Face's Leaderboard and Evals Team. Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends, whether your model is being served somewhere or already loaded in memory.
Dive deep into your model's performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up. Customization is at your fingertips: browse all the existing tasks and metrics, or effortlessly create your own custom task and custom metric tailored to your needs. Lighteval supports 1000+ evaluation tasks across multiple domains and languages; use this space to find what you need, or start from the overview of popular benchmarks. Note: Lighteval is currently completely untested on Windows and not supported there yet (it should be fully functional on Mac/Linux).
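To browse what ships with the package from the command line, recent releases expose a task-listing subcommand; `lighteval tasks list` below is an assumption and the exact spelling may differ between versions:

```bash
# Enumerate the evaluation tasks registered with Lighteval
# (assumed subcommand; check `lighteval --help` for your installed version).
lighteval tasks list

# Narrow the output to a benchmark family, e.g. MMLU.
lighteval tasks list | grep -i mmlu
```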
Evaluate your models using the most popular and efficient inference backends, and seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.
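A minimal sketch of a run against the transformers/accelerate backend; the model-args key, the `suite|task|num_fewshot|truncate_fewshot` task string, and the output flag spelling are assumptions that vary across Lighteval versions:

```bash
# Evaluate a Hub model on one task with the accelerate backend.
# Model-args key ("pretrained=...") and flag spellings are version-dependent assumptions.
lighteval accelerate \
    "pretrained=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0|0" \
    --output-dir ./results   # results can also be pushed to the Hub or S3
```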
Pull request #1037: Fix PERPLEXITY task evaluation crash due to incorrect sampling method detection.
Lighteval evaluates LLMs across multiple backends, whether it's transformers, tgi, vllm, or nanotron. It allows for many extras when installing; see here for a complete list.
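A hedged sketch of installing backend extras; the extra names mirror the backends listed above but may not match every release exactly:

```bash
# Core package plus optional backend dependencies.
# Extra names are assumptions based on the backends above; verify against the docs.
pip install "lighteval[accelerate]"   # transformers + accelerate backend
pip install "lighteval[vllm]"         # vLLM backend
pip install "lighteval[tgi]"          # Text Generation Inference client
```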
You can evaluate AI models on the Hub in multiple ways, and this page will guide you through the different options. Community leaderboards show how a model performs on a given task or domain. For example, there are leaderboards for question answering, reasoning, classification, vision, and audio. If you're tackling a new task, you can use a leaderboard to see how a model performs on it.
There are many community leaderboards on the Hub; check out all the leaderboards via this search, or use this dedicated Space to find a leaderboard for your task.

Lighteval can be installed from PyPI or from source. This guide covers all installation options and dependencies.
The simplest way to install Lighteval is from PyPI; this installs the core package with all essential dependencies for basic evaluation tasks. Source installation is recommended for developers who want to contribute to Lighteval or need the latest features.
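A minimal sketch of both installation paths:

```bash
# Install the released package from PyPI (core dependencies only).
pip install lighteval

# Or install from source for development or the latest features.
git clone https://github.com/huggingface/lighteval.git
cd lighteval
pip install -e .
```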
People Also Search
- Pull requests · huggingface/lighteval · GitHub
- GitHub - huggingface/lighteval: Lighteval is your all-in-one toolkit ...
- Lighteval - Hugging Face
- Releases · huggingface/lighteval - GitHub
- Workflow runs · huggingface/lighteval · GitHub
- Fix PERPLEXITY task by ScottHoang · Pull Request #1037 - GitHub
- GitHub - forkgitss/huggingface-lighteval: Lighteval is your all-in-one ...
- Issue #1056 · huggingface/lighteval - GitHub
- Evaluate on the Hub - Hugging Face
- Installation - Hugging Face