OmniPrint-MD-5-bis Dataset

Dataset details

Last updated: 15 Dec 2022
Meta Album ID OCR.MD_5_BIS
Domain ID OCR
Domain Name Optical Character Recognition
Set Number 1
Dataset ID MD_5_BIS
Dataset Name OmniPrint-MD-5-bis
Short Description Character images with a specific set of nuisance parameters
Long Description The OmniPrint-MD-5-bis dataset consists of 28 240 images (128x128, RGB) from 706 categories. The images were synthesized with OmniPrint, and no further processing was done. The OmniPrint synthesis parameters are as follows: the font size is 192; the image size is 128; the strength of the random perspective transformation is 0.04; the left, right, top, and bottom margins are each 20% of the image size; the strength of the pre-rasterization elastic transformation is 0.035; random translation is activated both horizontally and vertically; the image blending method is Poisson Image Editing; rotation is between -60 and 60 degrees; horizontal shear is between -0.5 and 0.5; the foreground is filled with a random color; and the background consists of images downloaded from Pexels (https://www.pexels.com/).
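As a rough illustration, the numeric settings in the long description can be collected in one place and used for two quick sanity checks. This is only a sketch: the dictionary keys are hypothetical labels chosen for readability and are not OmniPrint command-line options, and the images-per-category figure assumes that the categories are balanced, which the description does not state explicitly.

```python
# Values copied from the long description above. The key names are
# hypothetical labels for illustration, not OmniPrint options.
SYNTHESIS_PARAMS = {
    "font_size": 192,
    "image_size": 128,              # pixels per side
    "perspective_strength": 0.04,
    "margin_fraction": 0.20,        # same for left/right/top/bottom
    "elastic_strength": 0.035,      # pre-rasterization
    "rotation_degrees": (-60, 60),
    "horizontal_shear": (-0.5, 0.5),
}

NUM_IMAGES = 28240
NUM_CLASSES = 706


def margin_in_pixels(params):
    """Convert the relative margin into pixels for the given image size."""
    return params["margin_fraction"] * params["image_size"]


def images_per_class(num_images, num_classes):
    """Images per category, assuming a perfectly balanced dataset."""
    quotient, remainder = divmod(num_images, num_classes)
    if remainder != 0:
        raise ValueError("image count is not divisible by the class count")
    return quotient


print("margin (pixels):", margin_in_pixels(SYNTHESIS_PARAMS))
print("images per category:", images_per_class(NUM_IMAGES, NUM_CLASSES))
```

Under the balanced-categories assumption, the division is exact and gives 40 images per category; a 20% margin on a 128-pixel image corresponds to 25.6 pixels on each side.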
# Classes 706
# Images 28240
Keywords ocr
Data Format images
Image size 128x128
License (original data release) CC BY 4.0
License URL (original data release) https://creativecommons.org/licenses/by/4.0/
License (Meta-Album data release) CC BY 4.0
License URL (Meta-Album data release) https://creativecommons.org/licenses/by/4.0/
Source OmniPrint
Source URL https://github.com/SunHaozhe/OmniPrint
Original Author Haozhe Sun
Original contact sunhaozhe275940200@gmail.com
Meta Album author Haozhe Sun
Created Date 25 June 2021
Contact Name Haozhe Sun
Contact Email meta-album@chalearn.org
Contact URL https://meta-album.github.io/

Download Dataset from OpenML

Dataset Version    OpenML ID
Micro              44252
Mini               44296

Code to download dataset using OpenML API

      import openml

      # OpenML ID of the version to download (see the table above):
      # 44252 for Micro, 44296 for Mini
      DATASET_ID = 44252

      # download the dataset with the given OpenML ID
      dataset = openml.datasets.get_dataset(DATASET_ID)

      # display dataset info
      print(dataset.name)
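
When switching between the Micro and Mini versions, it may be convenient to look up the OpenML ID by version name instead of hard-coding it. The sketch below is one possible way to do that. The helper names `resolve_openml_id` and `fetch_version` are hypothetical and not part of the `openml` package; the only `openml` call used is `openml.datasets.get_dataset`, as in the snippet above.

```python
# OpenML IDs taken from the version table on this page.
OPENML_IDS = {"Micro": 44252, "Mini": 44296}


def resolve_openml_id(version):
    """Return the OpenML ID for a version name such as 'Micro' or 'mini'."""
    key = version.strip().capitalize()
    try:
        return OPENML_IDS[key]
    except KeyError:
        known = ", ".join(sorted(OPENML_IDS))
        raise ValueError(
            f"unknown version {version!r}; expected one of: {known}"
        ) from None


def fetch_version(version):
    """Fetch the dataset object for the named version (needs network access)."""
    # Imported here so that the ID lookup works without openml installed.
    import openml

    return openml.datasets.get_dataset(resolve_openml_id(version))


# The lookup itself needs neither the openml package nor a network connection.
for name in OPENML_IDS:
    print(name, resolve_openml_id(name))
```

Calling `fetch_version("Mini")` would then return the same kind of object as the snippet above, so `print(fetch_version("Mini").name)` displays the dataset name.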
              

Cite this dataset

@inproceedings{sun2021omniprint,
    title={OmniPrint: A Configurable Printed Character Synthesizer},
    author={Haozhe Sun and Wei-Wei Tu and Isabelle M Guyon},
    booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
    year={2021},
    url={https://openreview.net/forum?id=R07XwJPmgpl}
}
              

Cite Meta-Album

@inproceedings{meta-album-2022,
    title={Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification},
    author={Ullah, Ihsan and Carrion, Dustin and Escalera, Sergio and Guyon, Isabelle M and Huisman, Mike and Mohr, Felix and van Rijn, Jan N and Sun, Haozhe and Vanschoren, Joaquin and Vu, Phan Anh},
    booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    year={2022},
    url={https://meta-album.github.io/}
}
              