Dataset Catalog

Leo Migdal

TensorFlow Datasets (TFDS) now supports the Croissant 🥐 format. Read the documentation to learn more, and see the getting-started guide for a quick introduction. Last updated: Tue, 02 Dec 2025 05:00:28 GMT

Improve the accuracy of your machine learning models with publicly available datasets. To save time on data discovery and preparation, use curated datasets that are ready for machine learning projects.

We learned a lot about the quality of datasets by examining the metadata of the datasets hosted by Hugging Face. The tables in this catalog list the metadata for only a small subset of these datasets, a subset kept small by the filtering we had to apply. Here are the details of that process:

How we group the remaining 95K datasets into tables:

Further details of our license analysis are in license-notes.md in the GitHub repo; here we highlight a few of the more interesting ones. The ScanCode LicenseDB project classifies each license into one of six categories. The 109K “good” datasets are categorized as shown in Table 1. For our purposes, the Permissive and Public Domain categories qualify as “open”, which yields 95K datasets. In total, 19 Permissive licenses were found, as shown in Table 2.
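The “open” filtering rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used to build this catalog: only the two “open” category names (Permissive and Public Domain) come from the text, while the license-to-category mapping, the "Unknown" fallback label, and the dataset records are hypothetical samples. In a real pipeline the mapping would be looked up in ScanCode LicenseDB.

```python
from collections import Counter

# ScanCode LicenseDB categories that the catalog treats as "open".
OPEN_CATEGORIES = {"Permissive", "Public Domain"}

# Hypothetical sample mapping from license keys to LicenseDB categories;
# a real pipeline would look these up in ScanCode LicenseDB instead.
LICENSE_CATEGORY = {
    "mit": "Permissive",
    "apache-2.0": "Permissive",
    "cc0-1.0": "Public Domain",
    "gpl-3.0": "Copyleft",
}


def categorize(datasets):
    """Count datasets per license category (unmapped keys count as 'Unknown')."""
    return Counter(LICENSE_CATEGORY.get(d["license"], "Unknown") for d in datasets)


def open_datasets(datasets):
    """Keep only datasets whose license falls in an 'open' category."""
    return [d for d in datasets
            if LICENSE_CATEGORY.get(d["license"]) in OPEN_CATEGORIES]


def permissive_licenses(datasets):
    """Return the sorted distinct license keys in the Permissive category."""
    return sorted({d["license"] for d in datasets
                   if LICENSE_CATEGORY.get(d["license"]) == "Permissive"})


# Hypothetical records standing in for Hugging Face dataset metadata.
sample = [
    {"id": "a", "license": "mit"},
    {"id": "b", "license": "cc0-1.0"},
    {"id": "c", "license": "gpl-3.0"},
    {"id": "d", "license": "apache-2.0"},
    {"id": "e", "license": "mit"},
]

print(categorize(sample))
print([d["id"] for d in open_datasets(sample)])
print(permissive_licenses(sample))
```

The three helpers mirror the three counts in the text: a per-category breakdown (as in Table 1), the “open” subset, and the list of distinct Permissive licenses (as in Table 2).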
