About
Meta Album is a meta-dataset created for few-shot learning, meta-learning, continual learning, and related research.
Meta Album consists of 40 datasets from 10 unique domains. Datasets are arranged in sets of 10, with one dataset from each domain.
It is a continuously growing meta-dataset. See our datasets in the Datasets section.
We repurposed datasets that were generously made available by their original creators; see the credits page.
All datasets are free to use for academic purposes, provided that proper credit is given. For your convenience, you may cite our paper, which references all original creators.
License
Meta-Album is released under a CC BY-NC 4.0 license, which permits non-commercial use for research purposes, provided that you cite us. Additionally, redistributed datasets have their own licenses; see the credits page. All resources made available through this website are provided “as is”. The curators of Meta-Album (and their home institutions and sponsors) who have worked on its preparation, this website, and the code provided to read and process data and run baseline methods make no warranties concerning the licensed material, including fitness for any purpose, non-infringement, absence of defects or errors, and accuracy, and they decline any liability for losses or other possible consequences that may arise from using such material.
This briefly summarizes the terms of the CC BY-NC 4.0 license and the disclaimer that the license includes.
Recommended use
The recommended use of Meta-Album is to conduct fundamental research on machine learning algorithms and conduct benchmarks, particularly in: few-shot learning, meta-learning, continual learning, transfer learning, and image classification.
Code
We provide code in our GitHub repository for:
- Data processing
- Data formatting
- Quality control
- Meta-Album use cases
- Challenge winners code (NeurIPS 2021 MetaDL competition)
Visit our GitHub repository for more details.
Datasets
We list in the tables below the data statistics of the original datasets. We generally chose image classification datasets with at least 20 classes having more than 40 examples per class. Each dataset was transformed into 128x128 pixel images. The data is available in 4 versions:
- Original data (from the original creators’ website)
- Meta Album extended = all classes having at least 40 examples per class, with images resized to 128x128 pixels
- Meta Album mini = same as Meta Album extended, but we randomly sampled only 40 examples for each class (hence the datasets are class-balanced)
- Meta Album micro = same as Meta Album mini, but only 20 randomly selected classes.
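The relationship between the extended, mini, and micro versions can be illustrated with a minimal sketch. It is not the code used to build Meta Album: the function names, the fixed random seed, and the toy data are assumptions made for illustration, and image resizing to 128x128 is left out. Only the thresholds (at least 40 examples per class, 40 sampled examples per class, 20 classes) come from the descriptions above.

```python
import random
from collections import defaultdict

MIN_EXAMPLES_PER_CLASS = 40  # threshold stated in the version descriptions
MICRO_NUM_CLASSES = 20       # number of classes kept in the micro version


def group_by_class(samples):
    """Group (image_id, label) pairs into {label: [image_id, ...]}."""
    by_class = defaultdict(list)
    for image_id, label in samples:
        by_class[label].append(image_id)
    return dict(by_class)


def make_extended(by_class, min_examples=MIN_EXAMPLES_PER_CLASS):
    """Keep only the classes that have at least `min_examples` images."""
    return {c: ids for c, ids in by_class.items() if len(ids) >= min_examples}


def make_mini(extended, per_class=MIN_EXAMPLES_PER_CLASS, rng=None):
    """Randomly sample exactly `per_class` images from every class."""
    rng = rng or random.Random(0)  # hypothetical fixed seed for repeatability
    return {c: rng.sample(ids, per_class) for c, ids in extended.items()}


def make_micro(mini, num_classes=MICRO_NUM_CLASSES, rng=None):
    """Randomly keep `num_classes` classes of the mini version."""
    rng = rng or random.Random(0)
    chosen = rng.sample(sorted(mini), num_classes)
    return {c: mini[c] for c in chosen}


# Toy data (hypothetical): 30 classes, where class k has 35 + k images.
samples = [
    (f"img_{k:02d}_{i:03d}", f"class_{k:02d}")
    for k in range(30)
    for i in range(35 + k)
]

extended = make_extended(group_by_class(samples))
mini = make_mini(extended)
micro = make_micro(mini)

print(len(extended), "classes in extended")
print(len(mini), "classes in mini,", sum(len(v) for v in mini.values()), "images")
print(len(micro), "classes in micro,", sum(len(v) for v in micro.values()), "images")
```

In this toy example, the five classes with fewer than 40 images are dropped from the extended version, so the mini version is class-balanced over the remaining 25 classes and the micro version keeps 20 of them. Note that `make_micro` raises a `ValueError` if fewer than 20 classes are available.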
Download Instructions
Our datasets are hosted on the OpenML platform. The following code will help you download the datasets.
Install OpenML for Python:
pip install openml
Download using the OpenML Python API:
# import openml
import openml

# download dataset with DATASET_ID. Check the dataset detail page for DATASET_ID
dataset = openml.datasets.get_dataset(DATASET_ID, download_data=True, download_all_files=True)

# display dataset info
print(dataset.name)
Datasets are downloaded to the OpenML cache directory. You can check its location with this code:
# display openml cache directory
print(openml.config.cache_directory)
Meta Album ID | Domain | Original Dataset | # Classes | # Images | More |
---|---|---|---|---|---|
LR_AM.BRD | Large Animals | Birds | 315 | 62,454 | Details |
SM_AM.PLK | Small Animals | Plankton | 102 | 477,513 | Details |
PLT.FLW | Plants | Flowers | 102 | 13,069 | Details |
PLT_DIS.PLT_VIL | Plant Diseases | Plant Village | 38 | 56,625 | Details |
MCR.BCT | Microscopy | Bacteria | 33 | 6,180 | Details |
REM_SEN.RESISC | Remote Sensing | RESISC | 45 | 34,100 | Details |
VCL.CRS | Vehicles | Cars | 196 | 24,825 | Details |
MNF.TEX | Manufacturing | Textures | 64 | 12,035 | Details |
HUM_ACT.SPT | Human Actions | 73 Sports | 73 | 14,136 | Details |
OCR.MD_MIX | OCR | Omniprint-MD-mix | 706 | 29,040 | Details |
Meta Album ID | Domain | Original Dataset | # Classes | # Images | More |
---|---|---|---|---|---|
LR_AM.DOG | Large Animals | Dogs | 120 | 26,080 | Details |
SM_AM.INS_2 | Small Animals | Insects 2 | 102 | 80,102 | Details |
PLT.PLT_NET | Plants | PlantNet | 25 | 122,488 | Details |
PLT_DIS.MED_LF | Plant Diseases | Medicinal Leaf | 26 | 3,396 | Details |
MCR.PNU | Microscopy | PanNuke | 19 | 7,090 | Details |
REM_SEN.RSICB | Remote Sensing | RSICB | 45 | 39,307 | Details |
VCL.APL | Vehicles | Airplanes | 21 | 11,265 | Details |
MNF.TEX_DTD | Manufacturing | Textures DTD | 47 | 8,320 | Details |
HUM_ACT.ACT_40 | Human Actions | Stanford 40 Actions | 40 | 5,749 | Details |
OCR.MD_5_BIS | OCR | Omniprint-MD-5-bis | 706 | 29,040 | Details |
Meta Album ID | Domain | Original Dataset | # Classes | # Images | More |
---|---|---|---|---|---|
LR_AM.AWA | Large Animals | Animals with Attributes | 50 | 40,118 | Details |
SM_AM.INS | Small Animals | Insects | 117 | 175,416 | Details |
PLT.FNG | Plants | Fungi | 25 | 16,922 | Details |
PLT_DIS.PLT_DOC | Plant Diseases | PlantDoc | 27 | 4,429 | Details |
MCR.PRT | Microscopy | Subcel. Human Protein | 21 | 16,690 | Details |
REM_SEN.RSD | Remote Sensing | RSD | 43 | 46,145 | Details |
VCL.BTS | Vehicles | Boats | 26 | 140,207 | Details |
MNF.TEX_ALOT | Manufacturing | Textures ALOT | 250 | 35,800 | Details |
HUM_ACT.ACT_410 | Human Actions | MPII Human Pose | 29 | 4,362 | Details |
OCR.MD_6 | OCR | Omniprint-MD-6 | 703 | 28,920 | Details |
Citation
If you use Meta Album, please cite our paper as follows:
@inproceedings{meta-album-2022,
  title={Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification},
  author={Ullah, Ihsan and Carrion, Dustin and Escalera, Sergio and Guyon, Isabelle M and Huisman, Mike and Mohr, Felix and van Rijn, Jan N and Sun, Haozhe and Vanschoren, Joaquin and Vu, Phan Anh},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  url = {https://meta-album.github.io/},
  year = {2022}
}