Hugging Face Lighteval GitHub Issues

Leo Migdal

I have been trying to run the given example: lighteval accelerate "pretrained=gpt2" "leaderboard|truthfulqa:mc|0|0" both locally and on Google Colab.

I have installed lighteval and it always runs out of memory, both locally and in Google Colab. I tried switching to lighteval vllm, but I ran into other unexpected issues there, so I'm really not sure what to do. With lighteval accelerate "pretrained=gpt2" "leaderboard|truthfulqa:mc|0|0" I expected it to evaluate the GPT-2 model on the TruthfulQA dataset, but that doesn't seem to be working properly. Please help!
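A sketch of a possible mitigation (not a confirmed fix): loading the model in half precision and capping the number of evaluated samples usually reduces memory pressure while debugging. The dtype model argument and the --max-samples flag are assumptions based on recent lighteval releases; confirm the exact names with lighteval accelerate --help.

    # Hedged sketch: half precision plus a small sample cap to lower memory use.
    # Argument names may differ across lighteval versions.
    lighteval accelerate "pretrained=gpt2,dtype=float16" "leaderboard|truthfulqa:mc|0|0" --max-samples 10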

When cloning the repo, I'm encountering some issues where certain test files cannot be created. Here's the error trace. I believe this is because the filenames contain special characters like "|" and ":". For example, running this example also gives a similar error.
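One possible workaround, sketched under the assumption that the offending files live in the repository's tests/ directory: clone without checking anything out, then exclude that directory with a sparse checkout (run from Git Bash; quoting differs in other Windows shells, and a recent git is required).

    # Hedged workaround sketch, not an official fix: skip checking out the test
    # files whose names contain characters Windows forbids ("|", ":").
    git clone --no-checkout https://github.com/huggingface/lighteval.git
    cd lighteval
    git sparse-checkout set --no-cone '/*' '!tests/'
    git checkout main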

Your go-to toolkit for lightning-fast, flexible LLM evaluation, from Hugging Face's Leaderboard and Evals Team. Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends, whether your model is being served somewhere or already loaded in memory. Dive deep into your model's performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up. Customization at your fingertips: browse all our existing tasks and metrics, or effortlessly create your own custom task and custom metric, tailored to your needs. Lighteval supports 1000+ evaluation tasks across multiple domains and languages.

Use this Space to find what you need, or browse the overview of popular benchmarks. Note: lighteval is currently completely untested on Windows, and we don't support it yet (it should be fully functional on Mac/Linux).

Evaluate your models using the most popular and efficient inference backends. Customization at your fingertips: create new tasks, metrics, or models tailored to your needs, or browse all our existing tasks and metrics. Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.
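For example, a run that stores aggregate results and per-sample details locally might look roughly like the sketch below. The --output-dir and --save-details flags are assumptions based on recent lighteval releases; check lighteval accelerate --help for the options your version actually exposes.

    # Sketch only: write aggregate scores and sample-by-sample details under ./evals.
    # Flag names are assumptions; verify them against your installed lighteval version.
    lighteval accelerate "pretrained=gpt2" "leaderboard|truthfulqa:mc|0|0" --output-dir ./evals --save-details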


We recommend using the --help flag to get more information about the available options for each command: lighteval --help. Lighteval can be used with several different commands, each optimized for different evaluation scenarios. To evaluate GPT-2 on the TruthfulQA benchmark with 🤗 Accelerate, run:
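    lighteval accelerate "pretrained=gpt2" "leaderboard|truthfulqa:mc|0|0"

Here "pretrained=gpt2" selects the model, and "leaderboard|truthfulqa:mc|0|0" is the task specification in suite|task|few-shot-count|truncation form; the final field controls whether the number of few-shot examples is automatically reduced when the prompt would otherwise be too long.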

Tasks have a function applied at the sample level and one at the corpus level. For example, for an accuracy-style metric, the sample-level function checks whether each individual prediction matches the gold answer, and the corpus-level function aggregates the per-sample scores into a single number, typically their mean.
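A toy illustration of that split (a simplified sketch, not lighteval's actual metric API):

    # Simplified sketch of the two-level idea, not lighteval's real interface:
    # one function scores a single sample, another aggregates over the corpus.
    def exact_match(prediction: str, gold: str) -> float:
        """Sample-level: 1.0 if the prediction matches the gold answer, else 0.0."""
        return float(prediction.strip() == gold.strip())

    def corpus_mean(sample_scores: list[float]) -> float:
        """Corpus-level: aggregate per-sample scores into one number."""
        return sum(sample_scores) / len(sample_scores) if sample_scores else 0.0

    scores = [exact_match(p, g) for p, g in [("Paris", "Paris"), ("Rome", "Paris")]]
    print(corpus_mean(scores))  # 0.5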

RepositoryStats indexes 719,291 repositories; of these, huggingface/lighteval is ranked #27,459 (96th percentile) for total stargazers and #81,975 for total watchers. GitHub reports the primary language for this repository as Python; among repositories using this language it is ranked #4,519/153,693. huggingface/lighteval is also tagged with popular topics, for which it is ranked: huggingface (#36/481). huggingface/lighteval has 36 open pull requests on GitHub, and 505 pull requests have been merged over the lifetime of the repository. GitHub issues are enabled; there are 180 open issues and 249 closed issues.

You can evaluate AI models on the Hub in multiple ways, and this page will guide you through the different options. Community leaderboards show how a model performs on a given task or domain. For example, there are leaderboards for question answering, reasoning, classification, vision, and audio.

If you're tackling a new task, you can use a leaderboard to see how a model performs on it. There are many more leaderboards on the Hub: check out all of them via this search, or use this dedicated Space to find a leaderboard for your task.
