A Study of the Effectiveness of Text Chunking Methodologies in Arabic Academic Information Retrieval Systems

Authors

  • Jaafar Salman, Department of Information Technology, Faculty of Information and Communication Technology, Tartous State University, Tartous, Syria
  • Reem Mhanna, Department of Information Technology, Faculty of Information and Communication Technology, Tartous State University, Tartous, Syria

Keywords:

Information Retrieval, Artificial Intelligence, Text Search, Text Chunking, Intelligent Systems, Retrieval-Augmented Generation (RAG).

Abstract

Extracting accurate information from long text documents is difficult to do quickly and efficiently. Retrieval systems built on artificial intelligence techniques are a promising solution, especially when integrated with large language models such as ChatGPT, which they supply with selected information from reliable, up-to-date sources.

This research is the first experimental analytical study aimed at designing an integrated text retrieval system for Arabic-language academic documents. The system divides documents into chunks, and its performance is tested under three main chunking methodologies: fixed chunking, structural chunking, and semantic chunking.

The first two methodologies were applied across six different scenarios, which combined three chunk sizes (small, medium, and large) with and without overlap. Semantic chunking formed the seventh scenario. These scenarios were tested on a set of peer-reviewed scientific articles from the University of Tartous Journal, with the objective of evaluating chunk independence, segmentation success, and retrieval accuracy at different numbers of returned results.
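
As a rough illustration of fixed chunking with and without overlap, the sketch below splits a token list into windows of a given size. The function name `fixed_chunks`, the whitespace-style token list, and the sizes used are assumptions made for this example only; the abstract does not state the actual chunk sizes, overlap amounts, or tokenization used in the study.

```python
def fixed_chunks(tokens, size, overlap=0):
    """Split a token list into fixed-size chunks (hypothetical helper).

    Consecutive chunks share `overlap` tokens; overlap=0 gives
    disjoint chunks. The last chunk may be shorter than `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the window already reached the end of the text
    return chunks


# Illustrative input: ten placeholder tokens standing in for words.
tokens = [f"w{i}" for i in range(10)]

no_overlap = fixed_chunks(tokens, size=4)
with_overlap = fixed_chunks(tokens, size=4, overlap=2)
```

With overlap, each chunk repeats the tail of the previous one, which can keep a sentence that straddles a boundary retrievable at the cost of storing more chunks.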

The results showed that semantic chunking achieved the highest accuracy at the top result, with excellent performance when multiple results were allowed. In contrast, the other methodologies produced lower accuracy in the first result but showed clear improvement when the top three to four results were considered.
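
One common way to measure retrieval accuracy at different numbers of returned results is the fraction of queries whose relevant chunk appears among the top k results. The sketch below assumes that definition and one relevant chunk per query; the abstract does not specify the exact metric, and the function name `hit_rate_at_k` and the toy data are hypothetical.

```python
def hit_rate_at_k(ranked_results, relevant, k):
    """Fraction of queries whose relevant chunk is in the top k results.

    ranked_results: dict mapping query id -> list of chunk ids, best first.
    relevant: dict mapping query id -> the single relevant chunk id
              (an assumption of this sketch).
    """
    if not ranked_results:
        raise ValueError("no queries to evaluate")
    hits = sum(
        1 for query_id, ranked in ranked_results.items()
        if relevant[query_id] in ranked[:k]
    )
    return hits / len(ranked_results)


# Toy data, invented for illustration only (not results from the study).
ranked_results = {
    "q1": ["c1", "c2", "c3"],
    "q2": ["c7", "c8", "c9"],
}
relevant = {"q1": "c1", "q2": "c9"}

at_1 = hit_rate_at_k(ranked_results, relevant, k=1)
at_3 = hit_rate_at_k(ranked_results, relevant, k=3)
```

Under this definition the score can only stay the same or rise as k grows, which is consistent with the pattern of methods that are weaker at the first result improving when the top three to four results are considered.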

These findings emphasize the importance of selecting an appropriate chunking methodology to improve text retrieval systems, and represent a step towards developing more accurate and efficient Arabic academic solutions.

Published

2026-04-01