Many heads but one brain: FusionBrain – a single multimodal multitask architecture and a competition

Shonenkov, A.V.; Karachev, D.K.; Arkhipkin, V.S.; Bakshandaeva, D.D.; Voronov, A.D.; Dimitrov, D.V.; Davydova, V.F.; Potanin, M.S.; Tutubalina, E.V.; Kuznetsov, A.V.; Petiushko, A.A.

Excerpt: 01), the final result can range from 0 to 4. We also measured the performance of state-of-the-art single-task models – PLBART [52], Easter2 [53], MDETR [48] – for each of the subtasks on our private test sets (see Table 4). It should be noted that the vast majority of models (including the state-of-the-art one) s...

Title:	Many heads but one brain: FusionBrain – a single multimodal multitask architecture and a competition
Authors/Editors:	Bakshandaeva, D.D.; Dimitrov, D.V.; Arkhipkin, V.S.; Shonenkov, A.V.; Potanin, M.S.; Karachev, D.K.; Kuznetsov, A.V.; Voronov, A.D.; Petiushko, A.A.; Davydova, V.F.; Tutubalina, E.V.
Keywords:	multimodality, multitask, bilinguality, foundation models, FusionBrain challenge
Publication date:	Feb-2023
Publisher:	Samara National Research University
Bibliographic citation:	Bakshandaeva D, Dimitrov D, Arkhipkin V, Shonenkov A, Potanin M, Karachev D, Kuznetsov A, Voronov A, Petiushko A, Davydova V, Tutubalina E. Many heads but one brain: FusionBrain – a single multimodal multitask architecture and a competition. Computer Optics 2023; 47(1): 185-195. DOI: 10.18287/2412-6179-CO-1220.
Series/number:	47;1
Abstract:	Supporting the current trend in the AI community, we present the AI Journey 2021 Challenge called FusionBrain, the first competition aimed at building a universal architecture that can process different modalities (in this case, images, texts, and code) and solve multiple vision and language tasks. The FusionBrain Challenge combines the following specific tasks: Code2code Translation, Handwritten Text Recognition, Zero-shot Object Detection, and Visual Question Answering. For each task we have created a dataset on which participants’ submissions are evaluated. Moreover, we have collected and made publicly available a new handwritten dataset in both English and Russian, which consists of 94,128 image-text pairs. We also propose a multimodal and multitask architecture as a baseline solution: it is built around a frozen foundation model and has been trained in both Fusion mode and Single-task mode. The proposed Fusion approach proves to be competitive with the task-specific approach while being more energy-efficient.
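
The "many heads, one brain" idea in the abstract can be illustrated with a minimal sketch, under our own assumptions and not as the authors' implementation: one frozen shared backbone serves a lightweight trainable head per task (assumed Fusion mode), compared by parameter count against keeping a separate backbone copy plus head for every task (assumed Single-task layout). The sizes, the task keys, and the helpers `fusion_model`, `single_task_models` and `count_params` are hypothetical; parameter counts are only a rough proxy for the energy comparison reported in the paper.

```python
from dataclasses import dataclass

# Hypothetical sizes for illustration only; they are NOT taken from the paper.
BACKBONE_PARAMS = 1_000_000   # shared "brain" (frozen foundation model)
HEAD_PARAMS = 10_000          # one lightweight "head" per task
# Hypothetical keys for the four FusionBrain subtasks.
TASKS = ["code2code", "htr", "zero_shot_od", "vqa"]


@dataclass(frozen=True)
class Module:
    name: str
    n_params: int
    trainable: bool


def fusion_model(tasks, backbone_params, head_params):
    """Assumed Fusion layout: one frozen backbone shared by all task heads."""
    modules = [Module("backbone", backbone_params, trainable=False)]
    modules += [Module(f"head:{t}", head_params, trainable=True) for t in tasks]
    return modules


def single_task_models(tasks, backbone_params, head_params):
    """Assumed Single-task layout: a separate frozen backbone copy and head per task."""
    modules = []
    for t in tasks:
        modules.append(Module(f"backbone:{t}", backbone_params, trainable=False))
        modules.append(Module(f"head:{t}", head_params, trainable=True))
    return modules


def count_params(modules, trainable_only=False):
    """Sum parameter counts, optionally restricted to trainable modules."""
    return sum(m.n_params for m in modules if m.trainable or not trainable_only)


if __name__ == "__main__":
    fused = fusion_model(TASKS, BACKBONE_PARAMS, HEAD_PARAMS)
    separate = single_task_models(TASKS, BACKBONE_PARAMS, HEAD_PARAMS)
    print("fusion total:", count_params(fused))              # 1040000
    print("single-task total:", count_params(separate))      # 4040000
    print("fusion trainable:", count_params(fused, True))    # 40000
    print("single-task trainable:", count_params(separate, True))  # 40000
```

Under these assumed sizes, the shared-backbone layout stores roughly a quarter of the parameters while training the same number of head parameters; actual savings depend on the real model sizes and training setup described in the paper.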
URI (Uniform Resource Identifier):	10.18287/2412-6179-CO-1220 http://repo.ssau.ru/handle/Zhurnal-Komputernaya-optika/Many-heads-but-one-brain-FusionBrain-–-a-single-multimodal-multitask-architecture-and-a-competition-102049
Other identifiers:	Dspace\SGAU\20230216\102049
Appears in collections:	Journal "Computer Optics"

Files in this item:

File	Description	Size	Format
21_Bakshandaeva_Dimitrov_Arkhipkin_Shonenkov_Potanin_Karachev_Kuznetsov-aut-MA-L-JuN2-gr.pdf	Main article	1.49 MB	Adobe PDF

All items in the electronic resource archive are protected by copyright, with all rights reserved.

Samara University Repository