OmniPrint-MD-mix Dataset

Dataset details

Last updated: 15 Dec 2022
Meta Album ID OCR.MD_MIX
Domain ID OCR
Domain Name Optical Character Recognition
Set Number 0
Dataset ID MD_MIX
Dataset Name OmniPrint-MD-mix
Short Description Character images with a specific set of nuisance parameters
Long Description The OmniPrint-MD-mix dataset consists of 28 240 images (128x128, RGB) from 706 categories. The images are synthesized with OmniPrint, and no further processing was done. The OmniPrint synthesis parameters are as follows: font size is 192, image size is 128, the strength of the random perspective transformation is 0.04, left/right/top/bottom margins are all 20% of the image size, the strength of the pre-rasterization elastic transformation is 0.035, random translation is activated both horizontally and vertically, rotation is between -60 and 60 degrees, horizontal shear is between -0.5 and 0.5, and brightness, contrast, and color enhancement are each between 0.8333 and 1.2. The other parameters vary between images. We designed 20 settings, and each setting is used to synthesize 2 images per category. All images/textures consist of photos taken with a personal mobile phone.
# Classes 706
# Images 28240
Keywords ocr
Data Format images
Image size 128x128
License (original data release) CC BY 4.0
License URL (original data release) https://creativecommons.org/licenses/by/4.0/
License (Meta-Album data release) CC BY 4.0
License URL (Meta-Album data release) https://creativecommons.org/licenses/by/4.0/
Source OmniPrint
Source URL https://github.com/SunHaozhe/OmniPrint
Original Author Haozhe Sun
Original contact sunhaozhe275940200@gmail.com
Meta Album author Haozhe Sun
Created Date 25 June 2021
Contact Name Haozhe Sun
Contact Email meta-album@chalearn.org
Contact URL https://meta-album.github.io/
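
The synthesis parameters in the long description, together with the image-count arithmetic, can be written down as a small Python sketch. The dictionary keys and function names are hypothetical labels chosen for illustration, not OmniPrint's own option names, and drawing each ranged value uniformly is an assumption: the description does not say how OmniPrint samples within a range. The counts read the 20 settings as yielding 2 images per category, which is what makes 706 × 20 × 2 agree with the stated total of 28 240.

```python
import random

# Fixed settings stated in the description. Key names are hypothetical
# labels for this sketch, not OmniPrint's own option names.
FIXED_PARAMS = {
    "font_size": 192,
    "image_size": 128,
    "perspective_strength": 0.04,
    "margin_fraction": 0.20,             # left/right/top/bottom margins
    "pre_elastic_strength": 0.035,
    "random_translation": (True, True),  # (horizontal, vertical)
}

# Ranged settings stated in the description, as (low, high).
RANGES = {
    "rotation_deg": (-60.0, 60.0),
    "horizontal_shear": (-0.5, 0.5),
    "brightness": (0.8333, 1.2),
    "contrast": (0.8333, 1.2),
    "color_enhance": (0.8333, 1.2),
}

NUM_CLASSES = 706
NUM_SETTINGS = 20
IMAGES_PER_SETTING = 2


def images_per_class() -> int:
    """Each of the 20 settings yields 2 images for every category."""
    return NUM_SETTINGS * IMAGES_PER_SETTING


def total_images() -> int:
    """Total image count implied by the per-category count."""
    return NUM_CLASSES * images_per_class()


def margin_pixels() -> float:
    """Each margin is 20% of the image side length, in pixels."""
    return FIXED_PARAMS["margin_fraction"] * FIXED_PARAMS["image_size"]


def sample_transform(rng: random.Random) -> dict:
    """Draw one value for every ranged parameter.

    Uniform sampling is an assumption of this sketch; the description
    does not state OmniPrint's sampling distribution.
    """
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}


if __name__ == "__main__":
    print(images_per_class())
    print(total_images())
    print(margin_pixels())
    print(sample_transform(random.Random(0)))
```

With a 128-pixel image, the 20% margins come to 25.6 pixels per side.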

Download Dataset from OpenML

Dataset Version OpenML ID
Micro 44243
Mini 44287

Code to download the dataset using the OpenML API

      import openml

      # OpenML ID of the version to download (Micro: 44243, Mini: 44287)
      DATASET_ID = 44243

      # download the dataset with the given OpenML ID
      dataset = openml.datasets.get_dataset(DATASET_ID)

      # display dataset info
      print(dataset.name)
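
Beyond printing the name, the dataset object can be queried for its rows. The sketch below counts how many rows carry each label, using `OpenMLDataset.get_data(target=...)`, which returns `(X, y, categorical_indicator, attribute_names)`. The helper `class_counts` is hypothetical, and the sketch assumes the OpenML record sets a default target attribute holding the class label; if it does not, pass the label column's name as `target`.

```python
from collections import Counter
from typing import Optional


def class_counts(dataset, target: Optional[str] = None) -> Counter:
    """Count how many rows carry each label in an OpenML dataset object.

    `class_counts` is a hypothetical helper for this sketch. It relies on
    OpenMLDataset.get_data(target=...), which returns the tuple
    (X, y, categorical_indicator, attribute_names).
    """
    # Assumption: the label column is the dataset's default target
    # attribute unless a column name is passed explicitly.
    label_column = target or dataset.default_target_attribute
    if label_column is None:
        raise ValueError("no target column given and no default target set")
    _, y, _, _ = dataset.get_data(target=label_column)
    # Counter iterates over the label values (e.g. a pandas Series).
    return Counter(y)


# Usage (requires the openml package and network access):
#   import openml
#   dataset = openml.datasets.get_dataset(44243)  # Micro version
#   counts = class_counts(dataset)
#   print(len(counts), "classes,", sum(counts.values()), "rows")
```

Taking the dataset object as a parameter keeps the helper independent of the network, so it works the same for the Micro and Mini versions.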
              

Cite this dataset

@inproceedings{sun2021omniprint,
    title={OmniPrint: A Configurable Printed Character Synthesizer},
    author={Haozhe Sun and Wei-Wei Tu and Isabelle M Guyon},
    booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
    year={2021},
    url={https://openreview.net/forum?id=R07XwJPmgpl}
}
              

Cite Meta-Album

@inproceedings{meta-album-2022,
    title={Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification},
    author={Ullah, Ihsan and Carrion, Dustin and Escalera, Sergio and Guyon, Isabelle M and Huisman, Mike and Mohr, Felix and van Rijn, Jan N and Sun, Haozhe and Vanschoren, Joaquin and Vu, Phan Anh},
    booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    url={https://meta-album.github.io/},
    year={2022}
}
              