Excerpt: In other words, the text is represented as a single long string (a wide string). 3.3. Pre-processing of a text. The text is pre-processed: the text of a document is replaced by a filtered copy. For this purpose, the following steps are performed: • Removal of HTML tags. A text document can contain HTML tags. Because HTML tags are used for document formatting, they do not affect its contents. Therefore the presence of HTML tags will just interfere ...
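The HTML tag removal step mentioned in the excerpt could be sketched roughly as follows. This is only an illustration built on Python's standard `html.parser` module, not the authors' implementation; the names `strip_html` and `_TextExtractor` are hypothetical, and collapsing whitespace into one long string is an assumption based on the excerpt's "single long string" wording.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of a document (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are ignored.
        # Note: the contents of <script> and <style> also arrive here,
        # so a fuller filter would need to skip them explicitly.
        self._chunks.append(data)

    def text(self):
        return " ".join(self._chunks)


def strip_html(document: str) -> str:
    """Return the document text without HTML tags, as one long string."""
    parser = _TextExtractor()
    parser.feed(document)
    parser.close()  # flush any text still buffered by the parser
    # Assumption: runs of whitespace are collapsed to a single space.
    return re.sub(r"\s+", " ", parser.text()).strip()


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>Some   text</p></body></html>"
    print(strip_html(page))
```

Joining text nodes with a space keeps words from adjacent block elements apart, at the cost of occasionally adding a space next to inline tags; which trade-off suits duplicate detection depends on the later comparison steps.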
Full metadata record
DC Field | Value | Language
dc.contributor.author | E.V. Sharapova | -
dc.contributor.author | R.V. Sharapov | -
dc.date.accessioned | 2018-05-22 10:06:42 | -
dc.date.available | 2018-05-22 10:06:42 | -
dc.date.issued | 2018 | -
dc.identifier | Dspace\SGAU\20180518\69667 | ru
dc.identifier.citation | E.V. Sharapova. The problem of fuzzy duplicate detection of large texts / E.V. Sharapova, R.V. Sharapov // Proceedings of the IV International Conference and Youth School "Information Technologies and Nanotechnologies" (ITNT-2018). Samara: Новая техника, 2018. pp. 2565-2572. | ru
dc.identifier.uri | http://repo.ssau.ru/handle/Informacionnye-tehnologii-i-nanotehnologii/The-problem-of-fuzzy-duplicate-detection-of-large-texts-69667 | -
dc.description | Main article | ru
dc.description.abstract | In this paper, we consider the problem of fuzzy duplicate detection. We present the basic approaches to detecting text duplicates: distance between strings, fuzzy search algorithms without data indexing, and fuzzy search algorithms with data indexing. A review of existing methods for fuzzy duplicate detection is given. An algorithm for fuzzy duplicate detection is presented. The algorithm was implemented in the AVTOR.NET system. The use of text filtering, stemming, and character replacement allows the algorithm to find duplicates even in slightly modified texts. | ru
dc.language.iso | en_US | ru
dc.publisher | Новая техника | ru
dc.subject | fuzzy duplicate detecting | ru
dc.subject | fuzzy duplicate | ru
dc.subject | text | ru
dc.title | The problem of fuzzy duplicate detection of large texts | ru
dc.type | Article | ru
Appears in collections: Information Technologies and Nanotechnologies

Files in this item:
File | Description | Size | Format
The problem of fuzzy duplicate detection of large texts.pdf | Main article | 168.43 kB | Adobe PDF


