Hossam Elshahaby
Cairo University, Egypt
Title: An end-to-end system for subtitle text extraction from movie videos
Abstract
We present a new technique for detecting text inside a complex graphical background, extracting it, and enhancing it so that it can be easily recognized by optical character recognition (OCR). The technique uses a deep neural network to extract features and classify each frame as text-containing or non-text-containing. An Error Handling and Correction (EHC) technique resolves classification errors, and a Multiple Frame Integration (MFI) algorithm is introduced to extract the graphical text from its background. Text enhancement is performed by adjusting contrast, minimizing noise, and increasing pixel resolution. A standalone Component-Off-The-Shelf (COTS) software package is used to recognize the text characters and to qualify the system's performance. The proposed solution generalizes to multilingual text, and a newly created dataset of videos in different languages was collected to serve as a benchmark. A new HMVGG16 Convolutional Neural Network (CNN) classifies frames as text-containing or non-text-containing with an accuracy of 98%. The introduced system achieves a weighted average caption extraction accuracy of 96.15%, and the Correctly Detected Characters (CDC) average recognition accuracy using the Abbyy SDK OCR engine is 97.75%.
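The abstract does not give implementation details for the MFI step, but the underlying idea, subtitle pixels persisting across consecutive frames while the background moves, can be sketched as a pixel-wise temporal integration. The Python sketch below is a hypothetical illustration: the function name, the averaging rule, and the use of NumPy are assumptions, not the paper's actual algorithm.

```python
import numpy as np

def multiple_frame_integration(frames):
    """Hypothetical sketch of Multiple Frame Integration (MFI).

    `frames` is a list of grayscale frames (H x W uint8 arrays) that the
    classifier flagged as text-containing. Subtitle pixels stay nearly
    constant from frame to frame, so a pixel-wise temporal average
    suppresses the moving background while preserving the static caption.
    """
    stack = np.stack([f.astype(np.float32) for f in frames], axis=0)
    return stack.mean(axis=0).astype(np.uint8)
```

Likewise, the enhancement step (contrast adjustment, noise minimization, resolution increase) could be realized as follows, assuming OpenCV; CLAHE, a median filter, and bicubic upscaling are plausible stand-ins for the operations the abstract leaves unspecified.

```python
import cv2

def enhance_caption(gray):
    """Prepare an integrated grayscale caption image for the OCR engine."""
    # Contrast adjustment (CLAHE chosen here; the paper's exact method
    # is not stated in the abstract).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    out = clahe.apply(gray)
    # Noise minimization with a small median filter.
    out = cv2.medianBlur(out, 3)
    # Increase pixel resolution (2x bicubic) before recognition.
    return cv2.resize(out, None, fx=2.0, fy=2.0,
                      interpolation=cv2.INTER_CUBIC)
```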