Afzal, Muhammad

Classification of Live Video Stream from Pakistani News Channels (Urdu) using Deep Learning Latest Techniques / Muhammad Afzal - 134p. Soft Copy 30cm

In the contemporary era, information is of prime importance, and its extensive use by
social media and TV channels to shape public opinion and culture is quite evident. Videos form
the major portion of media content and carry more elaborate information than a single image.
Millions of videos pile up every day, and their segregation, classification and analysis are
daunting tasks. A live TV video stream contains audio, metadata and image frames carrying
multiple kinds of information, including on-screen text, all of which can contribute to video
classification. Utilizing each type of data, however, would require a separate study. In this work,
we have focused on classifying the video stream using deep learning (DL) neural networks,
which are well-established solutions for image classification, short-video classification and
gesture recognition.
In our study, we propose a mechanism for classifying long or live video streams
obtained from Pakistani TV news channels into five classes (Advertisement, News, Talk Show,
Sports and Entertainment Program) using supervised, pretrained DL neural networks. Due to the
non-availability of an authentic dataset on this subject, we created a customized dataset of
approximately 335 hours of video recorded from various sources, such as TV channels' websites
and YouTube. The videos were processed to extract image frames and prepare a trainable
dataset. For our experimentation, we mainly used ResNet variants (ResNet18, ResNet34,
ResNet50, ResNet101 and ResNet152) pretrained on the ImageNet dataset, along with a few
other models, namely AlexNet, ConvNeXt_Tiny, DenseNet121, SqueezeNet and VGG11, for
comparison. We then replaced the last classification layer of each network to match the number
of target classes and fine-tuned all the network weights on our dataset. We carried out various
experiments on these networks and achieved quite encouraging results, with accuracies ranging
from 95% to 99%. For testing videos on the trained models, a dynamic time-domain averaging
window was applied to diminish jitter in the output results. This approach can also be useful in
many other applications, including social media and advertisement analysis, classification of
short videos, and industrial and business automation.
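A minimal sketch of the transfer-learning setup described above, assuming PyTorch/torchvision (the listed model names match the torchvision model zoo); the dataset path, batch size, learning rate and number of epochs are illustrative placeholders, not values taken from the thesis:

```python
# Sketch only: ImageNet-pretrained ResNet50 with its final layer replaced
# by a 5-class head, then fine-tuned end to end on extracted frames.
import torch
import torch.nn as nn
from torchvision import models, datasets, transforms
from torch.utils.data import DataLoader

NUM_CLASSES = 5  # Advertisement, News, Talk Show, Sports, Entertainment Program

# Load an ImageNet-pretrained ResNet50 and swap the classification layer.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("frames/train", transform=transform)  # hypothetical path
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # illustrative settings

# All weights stay trainable, so the whole network is fine-tuned on the frames.
model.train()
for epoch in range(5):  # illustrative number of epochs
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

Any of the other backbones listed above (AlexNet, ConvNeXt_Tiny, DenseNet121, SqueezeNet, VGG11) can be adapted in the same way by replacing its final classifier layer with a 5-way linear head.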
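The abstract does not spell out the dynamic averaging rule, so the following sketch interprets it as a simple moving average of per-frame softmax probabilities over a rolling window; the window length and the helper name smoothed_stream_labels are hypothetical:

```python
# Sketch only: temporal smoothing of per-frame predictions to reduce jitter,
# interpreted here as a moving average over recent softmax outputs.
from collections import deque
import torch
import torch.nn.functional as F

CLASS_NAMES = ["Advertisement", "News", "Talk Show", "Sports", "Entertainment Program"]

def smoothed_stream_labels(model, frame_batches, window_size=25, device="cpu"):
    """Yield one smoothed class label per frame of a video stream."""
    model.eval()
    window = deque(maxlen=window_size)  # rolling buffer of per-frame probabilities
    with torch.no_grad():
        for frames in frame_batches:               # frames: (B, 3, 224, 224) tensors
            probs = F.softmax(model(frames.to(device)), dim=1)
            for p in probs:
                window.append(p)
                avg = torch.stack(list(window)).mean(dim=0)  # average over the window
                yield CLASS_NAMES[int(avg.argmax())]
```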


MS Robotics and Intelligent Machine Engineering

629.8