Excerpt: In other words, the text is represented as a single long string (a wide string). 3.3. Pre-processing of a text. The text is pre-processed: the text of a document is replaced by a filtered copy. For this purpose, the following steps are performed: • Removal of HTML tags. A text document can contain HTML tags. Because HTML tags are used for document formatting, they do not affect its contents. Therefore, the presence of HTML tags will just interfere ...
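The pre-processing step described in the excerpt (strip HTML tags, keep the text as one long string) can be sketched as below. This is a minimal illustration, not the authors' AVTOR.NET code: the function name `preprocess`, the regex-based tag removal, and the entity decoding and whitespace collapsing are assumptions made for the example.

```python
import html
import re

# Hypothetical pattern: anything that looks like an HTML tag, e.g. <p>, </div>,
# <a href="...">. A regex is a simplification; a real HTML parser copes better
# with edge cases such as comments or a literal "<" in plain text.
TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def preprocess(document: str) -> str:
    """Return a filtered copy of the document as one long string (hypothetical helper)."""
    # Remove HTML tags: they control formatting, not content.
    # Each tag becomes a space so words separated only by block tags stay apart.
    text = TAG_RE.sub(" ", document)
    # Decode entities such as &amp; or &nbsp; that remain after tag removal.
    text = html.unescape(text)
    # Collapse line breaks, tabs, non-breaking and repeated spaces so the
    # whole document becomes a single long string.
    return WHITESPACE_RE.sub(" ", text).strip()
```

For example, `preprocess("<p>Fuzzy   <b>duplicate</b>\n detection</p>")` gives `"Fuzzy duplicate detection"`. Replacing tags with a space rather than deleting them keeps `</p><p>` boundaries from gluing words together, at the cost of splitting a word that has an inline tag inside it.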
Full metadata record
DC field | Value | Language |
---|---|---|
dc.contributor.author | E.V. Sharapova | - |
dc.contributor.author | R.V. Sharapov | - |
dc.date.accessioned | 2018-05-22 10:06:42 | - |
dc.date.available | 2018-05-22 10:06:42 | - |
dc.date.issued | 2018 | - |
dc.identifier | Dspace\SGAU\20180518\69667 | ru |
dc.identifier.citation | E.V. Sharapova. The problem of fuzzy duplicate detection of large texts / E.V. Sharapova, R.V. Sharapov // Proceedings of the IV International Conference and Youth School "Information Technologies and Nanotechnologies" (ITNT-2018) - Samara: Novaya Tekhnika, 2018. - pp. 2565-2572. | ru |
dc.identifier.uri | http://repo.ssau.ru/handle/Informacionnye-tehnologii-i-nanotehnologii/The-problem-of-fuzzy-duplicate-detection-of-large-texts-69667 | - |
dc.description | Main article | ru |
dc.description.abstract | In the paper, we consider the problem of fuzzy duplicate detection. The basic approaches to detecting text duplicates are given: distance between strings, fuzzy search algorithms without data indexing, and fuzzy search algorithms with data indexing. A review of existing methods for fuzzy duplicate detection is provided. An algorithm for fuzzy duplicate detection is presented. The algorithm for detecting fuzzy duplicate texts was implemented in the AVTOR.NET system. The use of text filtering, stemming, and character replacement allows the algorithm to find duplicates even in slightly modified texts. | ru |
dc.language.iso | en_US | ru |
dc.publisher | Новая техника (Novaya Tekhnika) | ru |
dc.subject | fuzzy duplicate detection | ru |
dc.subject | fuzzy duplicate | ru |
dc.subject | text | ru |
dc.title | The problem of fuzzy duplicate detection of large texts | ru |
dc.type | Article | ru |
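The abstract lists distance between strings as one basic approach to detecting text duplicates. A common instance of such a distance is the Levenshtein (edit) distance, sketched below; the abstract does not say which distance the authors use, so this choice, the helper names `levenshtein`, `similarity` and `is_fuzzy_duplicate`, and the cut-off `THRESHOLD = 0.8` are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (classic dynamic programming)."""
    # Keep only the previous row of the DP table: O(len(b)) memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Hypothetical cut-off: pairs at or above it are reported as fuzzy duplicates.
THRESHOLD = 0.8


def is_fuzzy_duplicate(a: str, b: str, threshold: float = THRESHOLD) -> bool:
    return similarity(a, b) >= threshold
```

For example, `levenshtein("kitten", "sitting")` returns 3, and `is_fuzzy_duplicate("fuzzy duplicate", "fuzzy dupIicate")` is `True` because a single substituted character leaves a similarity of about 0.93. The running time is proportional to the product of the two lengths, which is one reason a pairwise edit distance on its own scales poorly to large texts and why indexed fuzzy search is an alternative.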
Appears in collections: | Информационные технологии и нанотехнологии (Information Technologies and Nanotechnologies) |
Files in this item:
File | Description | Size | Format |
---|---|---|---|
The problem of fuzzy duplicate detection of large texts.pdf | Main article | 168.43 kB | Adobe PDF |
All items in the electronic archive are protected by copyright, with all rights reserved.