Hossam Elshahaby
Cairo University, Egypt
Title: An end-to-end system for subtitle text extraction from movie videos
Abstract
We present a new technique for detecting text inside a complex graphical background, extracting it, and enhancing it so that it can be easily recognized by optical character recognition (OCR). The technique uses a deep neural network to extract features and classify each frame as text-containing or non-text-containing. An Error Handling and Correction (EHC) technique resolves classification errors, and a Multiple Frame Integration (MFI) algorithm is introduced to extract the graphical text from its background. Text enhancement is performed by adjusting contrast, minimizing noise, and increasing pixel resolution. A standalone Component-Off-The-Shelf (COTS) software package is used to recognize the text characters and to qualify the system's performance. The proposed solution generalizes to multilingual text, and a newly created dataset of videos in different languages was collected to serve as a benchmark. A new HMVGG16 Convolutional Neural Network (CNN) classifies frames as text-containing or non-text-containing with an accuracy of 98%. The introduced system achieves a weighted average caption extraction accuracy of 96.15%, and the Correctly Detected Characters (CDC) average recognition accuracy using the Abbyy SDK OCR engine is 97.75%.
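The abstract does not give implementation details for the MFI step, but the underlying idea, subtitle pixels persisting across consecutive frames while the background moves, can be sketched as a pixel-wise temporal integration. The Python sketch below is a hypothetical illustration: the function name, the averaging rule, and the use of NumPy are assumptions, not the paper's actual algorithm.

```python
import numpy as np

def multiple_frame_integration(frames):
    """Hypothetical sketch of Multiple Frame Integration (MFI).

    `frames` is a list of grayscale frames (H x W uint8 arrays) that the
    classifier flagged as text-containing. Subtitle pixels stay nearly
    constant from frame to frame, so a pixel-wise temporal average
    suppresses the moving background while preserving the static caption.
    """
    stack = np.stack([f.astype(np.float32) for f in frames], axis=0)
    return stack.mean(axis=0).astype(np.uint8)
```

Likewise, the enhancement step (contrast adjustment, noise minimization, resolution increase) could be realized as follows, assuming OpenCV; CLAHE, a median filter, and bicubic upscaling are plausible stand-ins for the operations the abstract leaves unspecified.

```python
import cv2

def enhance_caption(gray):
    """Prepare an integrated grayscale caption image for the OCR engine."""
    # Contrast adjustment (CLAHE chosen here; the paper's exact method
    # is not stated in the abstract).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    out = clahe.apply(gray)
    # Noise minimization with a small median filter.
    out = cv2.medianBlur(out, 3)
    # Increase pixel resolution (2x bicubic) before recognition.
    return cv2.resize(out, None, fx=2.0, fy=2.0,
                      interpolation=cv2.INTER_CUBIC)
```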