What is the thesis about?
Scene-Text Recognition (STR) is a sub-field of computer vision that tackles the problem of text localization and recognition in natural images. Since scene-text provides crucial semantic information for high-level tasks, continued research interest has resulted in great leaps in performance. Much of this success is thanks to the surge of deep learning, which has significantly pushed the capabilities of STR models. However, these models adopt a purely generic approach toward text extraction, where all text is treated indiscriminately and the possible semantics of the textual content are ignored. We identify and study two main disadvantages that follow from this generic nature.
The first is the reliance of the recognition step on vocabulary priors, which can degrade recognition performance on unseen words and morphological constructions. The second relates to the detection granularity, which we define as the boundary at which the network separates text into individual instances. Most networks establish this localization boundary at the word level. If a downstream application requires textual expressions that contain spaces or line breaks, generic STR detectors will split them into separate instances.
When: 21/10/2024