Monk - Nuggets (examples of data analysis)


Last update: Wed May 5 10:32:11 CEST 2021
  • Expected number of human labels given pattern distance from class centroid


    This graph shows the distribution of the number of human labels expected in the harvest,
    given the distance of a word or character sample from the corresponding class centroid.
    At the time of this analysis, 24 February 2017, a total of 257576 human-labeled or
    human-confirmed images had been harvested over the collections available then. At a pattern
    distance of 0.1 and below, at least 50 human-based labels can be expected. For a lifelong
    machine learning engine such as Monk, the challenge is to attract labelers to candidate
    samples that help the learning process enter a snowballing avalanche of label collection.
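
    The page does not state how the curve was computed. As a rough illustration only,
    one possible reading is a binned conditional mean: group samples by their distance
    to the class centroid and average the number of human labels within each bin. The
    function name, the (distance, n_labels) input layout, and the bin width below are
    assumptions made for this sketch and are not taken from Monk.

```python
from collections import defaultdict


def expected_labels_by_distance(samples, bin_width=0.1):
    """Estimate the mean number of human labels per distance bin.

    samples   : iterable of (distance, n_labels) pairs (hypothetical layout)
    bin_width : width of each distance bin (assumed value, not from Monk)

    Returns a dict mapping bin index k to the mean label count of all
    samples whose distance falls in [k * bin_width, (k + 1) * bin_width).
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")

    totals = defaultdict(float)  # sum of label counts per bin
    counts = defaultdict(int)    # number of samples per bin

    for distance, n_labels in samples:
        k = int(distance // bin_width)
        totals[k] += n_labels
        counts[k] += 1

    return {k: totals[k] / counts[k] for k in sorted(counts)}


# Made-up toy values for illustration; these are not Monk measurements.
toy_samples = [(0.05, 60), (0.08, 40), (0.15, 10), (0.25, 0), (0.27, 2)]
toy_result = expected_labels_by_distance(toy_samples)
```

    With real harvest data, the first bin of such a table would correspond to the
    "distance of 0.1 and below" region discussed above.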

    For a general discussion of this topic, see:

    Schomaker, L. (2021).
    Lifelong learning for text retrieval and recognition in
    historical handwritten document collections.
    In: Fischer, A., Liwicki, M. & Ingold, R. (Eds.),
    Handwritten Historical Document Analysis,
    Recognition, and Retrieval - State of the Art and Future Trends.
    Series in Machine Perception and Artificial Intelligence, Vol. 89,
    pp. 221-248. World Scientific Publishing.

    van Oosten, J-P. (2021).
    The snowball principle for handwritten word-image retrieval:
    The importance of labelled data and humans in the loop.
    [Dissertation, promotor L. Schomaker]
    University of Groningen. https://doi.org/10.33612/diss.160750597


Copyright 2008-2021 Lambert Schomaker