Optimizing the loading of libsvm-format data when solving the two-class SVM problem by the decision rule averaging method for large training sets

Kurbakov, M.Yu.; Makarova, A.I.; Sulimova, V.V.

Samara University Repository

Title:	Optimizing the loading of libsvm-format data when solving the two-class SVM problem by the decision rule averaging method for large training sets
Other Titles:	Data load optimization for solving SVM problem via averaging decision rules method for big training sets
Authors:	Kurbakov, M.Yu.; Makarova, A.I.; Sulimova, V.V.
Issue Date:	May-2019
Publisher:	Novaya Tekhnika
Citation:	Kurbakov M.Yu. Optimizing the loading of libsvm-format data when solving the two-class SVM problem by the decision rule averaging method for large training sets / Kurbakov M.Yu., Makarova A.I., Sulimova V.V. // Proceedings of ITNT-2019 [Text]: 5th International Conference and Youth School "Information Technologies and Nanotechnologies", May 21-24, in 4 volumes / Samara National Research University named after S.P. Korolev (Samara University), Image Processing Systems Institute of RAS, Branch of the FSRC "Crystallography and Photonics" RAS; [ed. by V.A. Fursov]. Samara: Novaya Tekhnika, 2019. Vol. 4: Data Science, 2019. pp. 53-60.
Abstract:	The support vector machine (SVM) is one of the most convenient and effective tools for two-class recognition. However, several problems hinder its use for training on large volumes of data. In particular, the training procedure has high computational complexity, and the full data set has to be stored in RAM. In previous work we proposed the decision rule averaging method, which addresses the first problem: it quickly finds an approximate SVM solution that differs only slightly from the exact one. In this paper we address the second problem with a specialized data-handling scheme. The scheme is tailored to our approach and optimizes memory usage for large data volumes, so that the full training set never has to be loaded into RAM at once. It is based on the system mechanism of mapping files into memory, and it efficiently loads arbitrary subsamples of objects from a file in the conventional libsvm format. An experimental study shows that the scheme outperforms classical ways of working with data in the same format. The scheme can also be combined with any incremental training method that requires fast loading of arbitrary subsamples of objects from a libsvm file.
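The general idea of loading arbitrary subsamples through a memory-mapped libsvm file can be sketched in Python with the standard `mmap` module. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `build_line_index`, `parse_line`, `load_subsample` and `demo` are hypothetical. The code makes a single pass over the mapped file to record the byte offset of every object (line). After that, any subset of objects can be parsed straight from the mapping, and the operating system pages in only the parts of the file that are touched.

```python
import mmap
import os
import tempfile
from array import array
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers illustrating mmap-based subsample loading from a
# libsvm file; an assumption-based sketch, not the authors' implementation.


def build_line_index(mm: mmap.mmap) -> array:
    """One pass over the mapped file: record the start offset of every
    non-empty line, i.e. of every object stored in libsvm format."""
    offsets = array("q")  # 8 bytes per object, far smaller than the data itself
    pos, size = 0, len(mm)
    while pos < size:
        end = mm.find(b"\n", pos)
        if end == -1:  # last line without a trailing newline
            end = size
        if mm[pos:end].strip():  # skip blank lines
            offsets.append(pos)
        pos = end + 1
    return offsets


def parse_line(raw: bytes) -> Tuple[float, Dict[int, float]]:
    """Parse '<label> <index>:<value> ...' into a label and a sparse dict."""
    tokens = raw.decode("ascii").split()
    label = float(tokens[0])
    features: Dict[int, float] = {}
    for tok in tokens[1:]:
        idx, val = tok.split(":", 1)
        features[int(idx)] = float(val)
    return label, features


def load_subsample(mm: mmap.mmap, offsets: array,
                   indices: Sequence[int]) -> Tuple[List[float], List[Dict[int, float]]]:
    """Parse only the requested objects, reading them directly from the mapping."""
    labels: List[float] = []
    rows: List[Dict[int, float]] = []
    for i in indices:
        start = offsets[i]
        end = mm.find(b"\n", start)
        if end == -1:
            end = len(mm)
        label, features = parse_line(mm[start:end])
        labels.append(label)
        rows.append(features)
    return labels, rows


def demo(indices: Sequence[int]):
    """Write a tiny libsvm file, map it, and load the requested subsample."""
    data = b"+1 1:0.5 3:1.0\n-1 2:2.0\n\n+1 1:1.5 2:-0.5\n"
    fd, path = tempfile.mkstemp(suffix=".libsvm")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offsets = build_line_index(mm)
            return len(offsets), load_subsample(mm, offsets, indices)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo([2, 0]))
```

In this sketch only the offset table (one 64-bit integer per object) lives permanently in RAM, while the feature data stays in the file. A training procedure that repeatedly draws random subsamples, such as the averaging approach described in the abstract, could therefore call `load_subsample` with new index sets without re-reading the whole file.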
URI:	http://repo.ssau.ru/jspui/handle/123456789/11066
Appears in Collections:	Information Technologies and Nanotechnologies

Files in This Item:

File	Description	Size	Format
paper7.pdf	Main article	851.62 kB	Adobe PDF
