Pattern Recognition
News:
5/20/2026
Two papers accepted at ICDAR 2026
Building on prior work on the CM/1 dataset, one paper studies vision-language models such as PaliGemma and Donut under severe annotation scarcity, including settings with only 1% of available labels. The work [...] pre-training and synthetic document generation as strategies for low-data learning, and introduces CM/1v2, an extended dataset with additional annotated fields including nationality, place of birth, and religion [...] images, the method achieves new state-of-the-art results on the HisFragIR20 benchmark, with 97.9% Top-1 accuracy and 78.8% mAP. Together, the two papers contribute new datasets, methods, and evaluations for …