Excerpt: To acquire text field images without spatial distortions introduced by the previous document processing steps (document detection, per-field segmentation, etc.), we used the field bounding box coordinates provided in the dataset ground truth for each unique identity document, as well as the document boundary coordinates provided for each photo, scan, and video frame. To obtain rectified field images we extended the annotated bounding boxes...
Title : MIDV-2020: a comprehensive benchmark dataset for identity document analysis
Authors/Editors : Bulatov, K.B.
Emelianova, E.V.
Tropin, D.V.
Skoryukina, N.S.
Chernyshova, Y.S.
Sheshkus, A.V.
Usilin, S.A.
Ming, Z.
Burie, J.-C.
Luqman, M.M.
Arlazarov, V.V.
Keywords : document analysis, document recognition, identity documents, open data, video recognition, document location, text recognition, face detection
Publication date : Apr-2022
Publisher : Samara National Research University
Citation : Bulatov KB, Emelianova EV, Tropin DV, Skoryukina NS, Chernyshova YS, Sheshkus AV, Usilin SA, Ming Z, Burie JC, Luqman MM, Arlazarov VV. MIDV-2020: a comprehensive benchmark dataset for identity document analysis. Computer Optics 2022; 46(2): 252-270. DOI: 10.18287/2412-6179-CO-1006.
Series/Number : 46;2
Abstract : Identity document recognition is an important sub-field of document analysis, which deals with the tasks of robust document detection, type identification, and text field recognition, as well as identity fraud prevention and document authenticity validation, given photos, scans, or video frames of a captured identity document. A significant amount of research has been published on this topic in recent years; however, a chief difficulty for such research is the scarcity of datasets, because the subject matter is protected by security requirements. The few identity document datasets that are available lack diversity of document types, capturing conditions, or variability of document field values. In this paper, we present the MIDV-2020 dataset, which consists of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock identity documents, each with unique text field values and unique artificially generated faces, with rich annotation. The dataset contains 72409 annotated images in total, making it the largest publicly available identity document dataset as of the date of publication. We describe the structure of the dataset, its content and annotations, and present baseline experimental results to serve as a basis for future research. For the task of document location and identification, content-independent, feature-based, and semantic segmentation-based methods were evaluated. For the task of document text field recognition, the Tesseract system was evaluated on field and character levels with grouping by field alphabets and document types. For the task of face detection, the performance of a method based on Multi-Task Cascaded Convolutional Neural Networks was evaluated separately for different types of image input modes. The baseline evaluations show that the existing methods of identity document analysis have a lot of room for improvement given modern challenges. We believe that the proposed dataset will prove invaluable for the advancement of the field of document analysis and recognition.
URI (Uniform Resource Identifier) : 10.18287/2412-6179-CO-1006
http://repo.ssau.ru/handle/Zhurnal-Komputernaya-optika/MIDV2020-a-comprehensive-benchmark-dataset-for-identity-document-analysis-102063
Other identifiers : Dspace\SGAU\20230217\102063
GRNTI: 28.23.15
Appears in collections: Journal "Computer Optics"

Files in this item:
File : 12_Bulatov-Emelianova-Tropin-et al_KI-Lit-MA-SV-JuN2-Gr.pdf
Description : Main article
Size : 7.46 MB
Format : Adobe PDF


