Subcellular Human Protein Dataset

Dataset details

Last updated: 15 Dec 2022
Meta Album ID MCR.PRT
Domain ID MCR
Domain Name Microscopy
Set Number 2
Dataset ID PRT
Dataset Name Subcellular Human Protein
Short Description Subcellular protein patterns in human cells
Long Description This dataset is a subset of the Subcellular dataset from the Human Protein Atlas project (https://www.proteinatlas.org/). The original dataset, which stems from the Human Protein Atlas Image Classification Kaggle competition (https://www.kaggle.com/competitions/human-protein-atlas-image-classification), comprises 31,072 RGBY images of size 512x512 px, each of which belongs to one or more of 28 classes. The labels correspond to protein organelle localizations. For Meta-Album, we made two modifications: (1) to turn the multi-label dataset into a multi-class dataset, we dropped all images belonging to more than one class, as well as images belonging to classes with fewer than 40 members; (2) we converted the remaining images to RGB by dropping the yellow channel, which was also common practice in the competition. Finally, as for all datasets in Meta-Album, the images were resized to 128x128 px.
# Classes 21
# Images 15050
Keywords human protein, subcellular
Data Format images
Image size 128x128
License (original data release) CC BY-SA 3.0
License URL (original data release) https://www.proteinatlas.org/about/licence
License (Meta-Album data release) CC BY-SA 3.0
License URL (Meta-Album data release) https://www.proteinatlas.org/about/licence
Source The Human Protein Atlas
Source URL https://proteinatlas.org, https://www.kaggle.com/c/human-protein-atlas-image-classification
Original Author Peter J Thul, Lovisa Akesson, Mikaela Wiking, Diana Mahdessian, Aikaterini Geladaki, Hammou Ait Blal, Tove Alm, Anna Asplund, Lars Bjork, Lisa Breckels, and others
Original contact contact@proteinatlas.org
Meta Album author Felix Mohr
Created Date 01 June 2022
Contact Name Felix Mohr
Contact Email meta-album@chalearn.org
Contact URL https://meta-album.github.io/
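
Modification (1) in the Long Description, turning the multi-label data into single-label data, can be sketched roughly as follows. This is a minimal illustration, not the Meta-Album preprocessing code: the function name `to_single_label`, the `(image_id, target)` record layout with space-separated labels, and the choice to count class sizes after the multi-label images have been removed are all assumptions. The yellow-channel removal and the resize to 128x128 are not shown.

```python
from collections import Counter

# Threshold taken from the description: classes with fewer than 40 members are dropped.
MIN_CLASS_SIZE = 40


def to_single_label(records, min_class_size=MIN_CLASS_SIZE):
    """Keep single-label images whose class is large enough.

    `records` is an iterable of (image_id, target) pairs, where `target`
    is a space-separated string of class labels (assumed layout).
    """
    # Step 1: drop every image that carries more than one label.
    single = [
        (image_id, target.split()[0])
        for image_id, target in records
        if len(target.split()) == 1
    ]
    # Step 2: count the remaining images per class (assumption: the size
    # threshold is applied after step 1) and drop the small classes.
    counts = Counter(label for _, label in single)
    return [(image_id, label) for image_id, label in single
            if counts[label] >= min_class_size]


# Toy example with a lowered threshold so the effect is visible.
records = [("a", "0"), ("b", "0 5"), ("c", "0"), ("d", "7")]
kept = to_single_label(records, min_class_size=2)
print(kept)  # [('a', '0'), ('c', '0')]
```

In the toy example, image "b" is dropped because it has two labels, and image "d" is dropped because its class has only one remaining member.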


Download Dataset from OpenML

Dataset Version    OpenML ID
Micro              44278
Mini               44308
Extended           44342

Code to download dataset using OpenML API

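      # import the OpenML Python client (install with: pip install openml)
      import openml

      # DATASET_ID is the OpenML ID of the chosen dataset version;
      # 44278 is the Micro version (Mini: 44308, Extended: 44342)
      DATASET_ID = 44278

      # download the dataset with DATASET_ID
      dataset = openml.datasets.get_dataset(DATASET_ID)

      # display dataset info
      print(dataset.name)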
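
To switch between the three versions without hard-coding an ID, the OpenML IDs listed above can be kept in a small lookup table. The sketch below is one possible arrangement: `OPENML_IDS` and `load_version` are hypothetical names, and the contents of the table returned by `get_data` are not described on this page, so inspect the result before relying on any particular column.

```python
# OpenML IDs of the three dataset versions listed on this page.
OPENML_IDS = {"Micro": 44278, "Mini": 44308, "Extended": 44342}


def load_version(version="Micro"):
    """Download one dataset version from OpenML and return (X, y).

    Hypothetical helper: calling it requires network access and the
    `openml` package.
    """
    # Imported lazily so the ID table is usable without openml installed.
    import openml

    dataset = openml.datasets.get_dataset(OPENML_IDS[version])
    # get_data returns (X, y, categorical_indicator, attribute_names);
    # y is None if the dataset declares no default target attribute.
    X, y, _, _ = dataset.get_data(target=dataset.default_target_attribute)
    return X, y


# Example (not executed here because it downloads data):
# X, y = load_version("Mini")
```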
              

Cite this dataset

@article{thul2017subcellular, 
  title={A subcellular map of the human proteome},
  author={Thul, Peter J and Akesson, Lovisa and Wiking, Mikaela and Mahdessian, Diana and Geladaki, Aikaterini and Ait Blal, Hammou and Alm, Tove and Asplund, Anna and Bjork, Lars and Breckels, Lisa M and others},
  journal={Science},
  volume={356},
  number={6340},
  year={2017},
  publisher={American Association for the Advancement of Science}
}

              

Cite Meta-Album

  @inproceedings{meta-album-2022,
    title={Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification},
    author={Ullah, Ihsan and Carrion, Dustin and Escalera, Sergio and Guyon, Isabelle M and Huisman, Mike and Mohr, Felix and van Rijn, Jan N and Sun, Haozhe and Vanschoren, Joaquin and Vu, Phan Anh},
    booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    url = {https://meta-album.github.io/},
    year = {2022}
  }
              