
.. _gsoc-proposal-template:

Enhanced Media Experience with AI-Powered Commercial Detection and Replacement
###############################################################################

Introduction
*************

The BeagleBone® AI-64 from the BeagleBoard.org Foundation is a complete system for developing artificial intelligence (AI) and machine-learning solutions with the convenience and expandability of the BeagleBone platform and onboard peripherals to start learning and building applications.

This project will leverage the capabilities of BeagleBoard’s powerful processing units to create a real-time, efficient solution that enhances media consumption experiences by seamlessly integrating custom audio streams during commercial breaks.

Summary links
=============

- **Contributor:** `Aryan Nanda <https://forum.beagleboard.org/u/aryan_nanda>`_
- **Mentors:** `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_
- **GSoC Repository:** TBD

Status
=======

This project is currently just a proposal.

Proposal
========
- Created accounts across `OpenBeagle <https://openbeagle.org/aryan_nanda>`_, `Discord <https://discord.com/users/758929156401528892>`_ and `Beagle Forum <https://forum.beagleboard.org/u/aryan_nanda>`_
- Pull request for the cross-compilation task: `#185 <https://github.com/jadonk/gsoc-application/pull/185>`_
- Created a project proposal using the `proposed template <https://gsoc.beagleboard.io/proposals/template.html>`_
- **Resume** - Find my resume `here <https://drive.google.com/file/d/1BblSPdncbjKf4qG7s9ldb7ssIhfGN5bA/view?usp=sharing>`_
- **Forum:** :fab:`discourse` `u/aryan_nanda <https://forum.beagleboard.org/u/aryan_nanda>`_ 
- **OpenBeagle:** :fab:`gitlab` `aryan_nanda <https://openbeagle.org/aryan_nanda>`_
- **Github:** :fab:`github` `AryanNanda17  <https://github.com/AryanNanda17>`_
- **School:** :fas:`school` `Veermata Jijabai Technological Institute (VJTI) <https://vjti.ac.in/>`_
- **Country:** :fas:`flag` India
- **Primary language:** :fas:`language` English, Hindi
- **Typical work hours:** :fas:`clock` 9AM-5PM Indian Standard Time
- **Previous GSoC participation:** :fab:`google` This would be my first time participating in GSoC

**About the Project**
**********************

**Project name:** Enhanced Media Experience with AI-Powered Commercial Detection and Replacement

Description
============

I propose developing **GStreamer plugins** that process video input based on its classification.
The plugins will identify commercials and either replace them with alternative content or obscure them,
while also substituting the audio with predefined streams. This aims to improve the media consumption
experience by eliminating unnecessary interruptions. I intend to **explore various video classification
models** to achieve accurate detection and to use TensorFlow Lite to leverage the **native accelerators of
the BeagleBone AI-64** for high-performance, real-time inference with minimal latency. Since real-time
performance is likely the most critical requirement of this project, I intend to test a few different
approaches to see which one works best.

Goals and Objectives
=====================

The goal of this project is to detect and replace commercials in video streams on BeagleBoard hardware
using a GStreamer pipeline that includes a model able to detect commercials accurately with minimal latency.
Several video classification models will be trained and their accuracy compared, partly through manual
analysis; the best-performing option will then be included in the GStreamer pipeline for real-time
inference. This is the result to be presented at the end of the project timeline.
For the phase 1 evaluation, the goal is to build a training dataset, preprocess it, and fine-tune and train a
video classification model. For the phase 2 evaluation, the goal is to use the best model identified in phase 1
for commercial detection, build a GStreamer pipeline around it, and use the native accelerators of the
BeagleBone AI-64 for high performance.

To accomplish this, the following objectives need to be met.

1. Phase 1:

   - Develop a dataset of videos and corresponding labels indicating the presence of commercials in specific segments.
   - Preprocess the dataset so that it is suitable as input to deep learning models, and divide it into train, validation and test sets.
   - Apply transfer learning: fine-tune several pre-trained deep learning models on the prepared dataset to identify the most accurate one for commercial detection in videos.
   - Save all trained models to local disk and run real-time inference with OpenCV to determine which model gives the best results at high performance.

2. Phase 2:

   - Based on the options tried in Phase 1, decide on the final model to be used in the GStreamer pipeline.
   - Compile the model and generate the artifacts needed to run it with the TFLite runtime.
   - Build a GStreamer pipeline that takes real-time media input and identifies the commercial segments in it.
   - When a commercial segment is identified, have the pipeline either replace it with alternative content or obscure it, while also substituting the audio with predefined streams.
   - Also try cutting the commercial out completely and splicing the remaining ends together.
   - Improve real-time performance using the native hardware accelerators present in the BeagleBone AI-64.

**Methods**
***********************
In this section, I describe in greater detail the methods I plan to use for the training dataset, the models, the GStreamer pipeline and model deployment.

Building training Dataset and Preprocessing
============================================
To train the model effectively, we need a dataset with accurate labels. Since a suitable commercial video
dataset isn't readily available, I'll create one. This dataset will consist of two classes: commercial and
non-commercial. By dividing the dataset into Commercial and Non-Commercial segments, I am focusing more on 
"Content Categorization". Separating the dataset into commercials and non-commercials allows our model to 
learn distinct features associated with each category. For commercials, this might include fast-paced editing, 
product logos, specific jingles, or other visual/audio cues. Non-commercial segments may include slower-paced 
scenes, dialogue, or narrative content.


To build this dataset, I'll refer to the `YouTube-8M dataset <https://research.google.com/youtube8m/>`_,
which includes videos categorized as TV advertisements. However, YouTube-8M provides encoded feature vectors
instead of the actual videos, so using it directly would not suit models that take raw frames as input and
would add significant latency at inference time. Therefore, I'll use it as a reference and download the videos
it labels as advertisements to build our dataset, using a web scraper to automate extraction of the commercial
video URLs. For the non-commercial part, I will download random videos from other categories of YouTube-8M.
Once the dataset is ready, I will preprocess it so that it is suitable as input to deep learning models.
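
Preprocessing will likely include turning each downloaded video into fixed-length clips of sampled frames, similar to the ``frame_step`` approach in the TensorFlow video classification tutorials. The sketch below only computes which frame indices to read. The helper names, clip length and step are hypothetical. Decoding (for example with OpenCV) is left out so the sketch stays self-contained.

.. code-block:: python

    def sample_frame_indices(num_frames, n_samples, frame_step=1, start=0):
        """Return n_samples frame indices spaced frame_step apart from start.

        Indices past the end of the video are clamped to the last frame, so a
        short clip is padded by repeating its final frame.
        """
        if num_frames <= 0:
            raise ValueError("video has no frames")
        last = num_frames - 1
        return [min(start + i * frame_step, last) for i in range(n_samples)]


    def clip_starts(num_frames, clip_len, frame_step):
        """Start frames of consecutive, non-overlapping clips that fit in the video.

        A video shorter than one clip still yields a single clip starting at 0.
        """
        span = clip_len * frame_step
        return list(range(0, max(num_frames - span, 0) + 1, span))


    # Usage: a 100-frame video cut into clips of 8 frames, taking every 5th frame.
    for s in clip_starts(100, clip_len=8, frame_step=5):
        print(s, sample_frame_indices(100, 8, frame_step=5, start=s))

Each clip then becomes one training sample labelled with the class of the video it came from.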


Moreover, I'll divide the dataset into train, validation and test sets. To keep the model from picking up
spurious temporal dependencies between consecutive training samples, I intend to shuffle the dataset randomly,
for example using ``tf.keras.preprocessing.image_dataset_from_directory()`` with ``shuffle=True`` on frames
extracted into per-class folders. This ensures that samples from the different class folders are presented to
the model in random order, which should help it learn the features that distinguish commercials from regular
content, and therefore the scene changes between them.
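
The shuffling and splitting idea can be sketched without TensorFlow at the level of file paths. The following is a minimal sketch under assumptions: the split fractions, seed and file names are hypothetical, and each entry is assumed to be a whole source video. That way, clips cut from the same video never end up in two different splits.

.. code-block:: python

    import random


    def split_dataset(videos_by_class, val_frac=0.15, test_frac=0.15, seed=42):
        """Shuffle and split {label: [video paths]} into train/val/test lists.

        Each class is split separately (stratified), so commercials and
        non-commercials keep the same proportions in every split. Each split is
        a list of (path, label) pairs, shuffled again so the classes interleave.
        """
        rng = random.Random(seed)  # fixed seed keeps the split reproducible
        splits = {"train": [], "val": [], "test": []}
        for label, paths in sorted(videos_by_class.items()):
            paths = list(paths)
            rng.shuffle(paths)
            n_test = int(len(paths) * test_frac)
            n_val = int(len(paths) * val_frac)
            splits["test"] += [(p, label) for p in paths[:n_test]]
            splits["val"] += [(p, label) for p in paths[n_test:n_test + n_val]]
            splits["train"] += [(p, label) for p in paths[n_test + n_val:]]
        for items in splits.values():
            rng.shuffle(items)
        return splits


    # Usage with hypothetical file names.
    data = {
        "commercial": [f"ads/ad_{i:03d}.mp4" for i in range(20)],
        "non_commercial": [f"content/clip_{i:03d}.mp4" for i in range(20)],
    }
    splits = split_dataset(data)
    print({name: len(items) for name, items in splits.items()})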


Video Classification models
============================
MoViNets are a good fit for our task because they can operate on streaming video for online inference. The main reason for trying MoViNets first is that they allow quick, continuous analysis of incoming video streams. MoViNet uses NAS (Neural Architecture Search) to balance accuracy and efficiency, incorporates stream buffers to keep memory usage constant, and improves accuracy via temporal ensembles. The MoViNet architecture uses 3D convolutions that are "causal": the output at time t is computed using only inputs up to time t, which allows efficient streaming.
This makes MoViNets a strong candidate for our case.
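
The two ideas behind MoViNet streaming, causal convolution and the stream buffer, can be shown in one dimension, along the temporal axis. This is a toy illustration and not MoViNet's implementation. The batch function left-pads the input so that output t depends only on inputs up to t. The streaming class produces the same outputs one sample at a time while keeping only the last k - 1 inputs in a fixed-size buffer.

.. code-block:: python

    def causal_conv1d(x, kernel):
        """Causal 1-D convolution: y[t] uses only x[t - k + 1 .. t].

        The input is left-padded with k - 1 zeros, so no future samples are used.
        kernel[-1] multiplies the current sample, kernel[0] the oldest one.
        """
        k = len(kernel)
        padded = [0.0] * (k - 1) + list(x)
        return [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(x))]


    class StreamingCausalConv1d:
        """Same operation, one sample at a time, with a fixed-size stream buffer."""

        def __init__(self, kernel):
            self.kernel = list(kernel)
            self.buffer = [0.0] * (len(kernel) - 1)  # the last k - 1 inputs

        def step(self, sample):
            window = self.buffer + [sample]
            out = sum(w * v for w, v in zip(self.kernel, window))
            self.buffer = window[1:]  # memory use stays constant
            return out


    x = [1.0, 2.0, 3.0, 4.0]
    kernel = [0.5, 0.5]
    batch = causal_conv1d(x, kernel)
    stream = StreamingCausalConv1d(kernel)
    online = [stream.step(v) for v in x]
    print(batch, online)

Because both versions give identical outputs, a causal model can be trained on whole clips and then run frame by frame on a live stream.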


Since we don't have a big dataset, we will use the pre-trained MoViNets model as a feature extractor 
and fine-tune it on our dataset. I will remove the classification layers of MoViNets and use its 
pre-trained weights to extract features from our dataset. Then, train a smaller classifier (e.g., a few fully connected layers) on top of these features.
This way we can use the features learned by MoViNets on the larger dataset with minimal risk of overfitting.
This can help improve the model's performance even with limited data.
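
The feature-extraction idea can be illustrated without any framework. Once the frozen backbone has turned each clip into a feature vector, only a small head is trained on top. In practice the head would be a few Keras ``Dense`` layers, as in the TensorFlow MoViNet transfer-learning tutorial. This sketch instead uses a single logistic-regression layer in plain Python to show that only the head's weights change. The helper names and the toy 2-D features, which stand in for real MoViNet outputs, are hypothetical.

.. code-block:: python

    import math


    def predict_proba(w, b, x):
        """Probability that feature vector x belongs to the commercial class."""
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)  # avoids overflow for large negative z
        return e / (1.0 + e)


    def train_linear_head(features, labels, epochs=200, lr=0.1):
        """Fit a logistic-regression head on frozen backbone features.

        features: equal-length feature vectors produced once by the frozen backbone.
        labels: 1 for commercial, 0 for non-commercial.
        Only w and b are learned; the backbone is never updated.
        """
        dim, n = len(features[0]), len(features)
        w, b = [0.0] * dim, 0.0
        for _ in range(epochs):
            grad_w, grad_b = [0.0] * dim, 0.0
            for x, y in zip(features, labels):
                err = predict_proba(w, b, x) - y  # d(cross-entropy)/d(logit)
                for i in range(dim):
                    grad_w[i] += err * x[i]
                grad_b += err
            w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
            b -= lr * grad_b / n
        return w, b


    feats = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]]
    labels = [0, 0, 1, 1]
    w, b = train_linear_head(feats, labels)
    print(predict_proba(w, b, [0.0, 0.0]), predict_proba(w, b, [3.0, 4.0]))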


- Source -> `MoViNets: Mobile Video Networks for Efficient Video Recognition <https://arxiv.org/pdf/2103.11511.pdf>`_

.. image:: Assets/Figure1.png
   :alt: Stream buffer in MoViNets

.. centered::
    Figure 1: Stream buffer in MoViNets

- Source -> `TensorFlow: transfer_learning_with_movinet <https://www.tensorflow.org/tutorials/video/transfer_learning_with_movinet>`_

.. image:: Assets/Figure2.png
   :alt: Standard Convolution Vs Causal Convolution

.. centered::
    Figure 2: Standard Convolution Vs Causal Convolution

If MoViNet does not perform well, then we can use other models such as ResNet-50 + LSTMs. Since a video is a series of frames, a naive video classification method would be to pass each frame through a CNN, classify each frame individually and independently of the others, label each frame with the class of highest probability, and assign the most frequent frame label to the video.

This naive approach suffers from "prediction flickering": the label changes rapidly when individual scenes are classified differently. To reduce flickering, I will use **rolling prediction averaging**. I will also maintain a queue of the last few frames; whenever a scene change is detected, all frames in the queue will be marked with the current result, allowing retroactive relabeling of the scene.
The depth of the queue will be determined through experimentation to find the optimal setting.
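
Here is a minimal sketch of rolling prediction averaging combined with the retroactive relabeling queue. It assumes the classifier emits one probability vector per frame, ordered as [non-commercial, commercial]. The class name and the window and queue sizes are hypothetical placeholders for the values to be found experimentally. In a live pipeline, relabeling only helps for frames that have not been displayed yet, so the queue depth also adds latency.

.. code-block:: python

    from collections import deque


    class RollingClassifier:
        """Smooth per-frame probabilities and relabel recent frames on a change."""

        def __init__(self, window=8, queue_depth=16):
            self.probs = deque(maxlen=window)         # recent probability vectors
            self.pending = deque(maxlen=queue_depth)  # recent frame ids
            self.current = None                       # current smoothed label
            self.labels = {}                          # frame_id -> label

        def update(self, frame_id, frame_probs):
            self.probs.append(frame_probs)
            n = len(self.probs)
            # Rolling average over the last `window` frames.
            avg = [sum(p[i] for p in self.probs) / n for i in range(len(frame_probs))]
            label = max(range(len(avg)), key=avg.__getitem__)
            self.pending.append(frame_id)
            if label != self.current:
                # Scene change: retroactively give the queued frames the new label.
                for fid in self.pending:
                    self.labels[fid] = label
                self.current = label
            else:
                self.labels[frame_id] = label
            return label


    rc = RollingClassifier(window=3, queue_depth=2)
    stream = [[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 2
    print([rc.update(i, p) for i, p in enumerate(stream)])
    print(rc.labels)

The single commercial-looking frame 3 does not flip the label on its own. When frame 4 confirms the change, frame 3 is relabelled retroactively.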

The Conv+LSTMs model should perform well because, like a Conv3D model, it considers both the spatial and temporal features of videos. It is not my first choice only because MoViNets are considered better suited to real-time performance.

- Source -> `deep-learning-for-videos-action-recognition-review <https://blog.qure.ai/notes/deep-learning-for-videos-action-recognition-review>`_

.. image:: Assets/Figure3.png
   :alt: Conv3D+LSTMs

.. centered::
    Figure 3: Conv+LSTMs

Optional method with Video Vision Transformers
-----------------------------------------------
This is a pure Transformer-based model that extracts spatio-temporal tokens from the input video, which are then encoded by a series of Transformer layers.
I have kept it as an optional method because our problem is binary classification (commercial or non-commercial), and such a complex model may be less efficient than the other models for this comparatively simple task.

Choosing the Best Performing model
===================================
After training the models, I'll assess their performance using evaluation metrics and run real-time inference on a sample video containing both commercial and non-commercial segments. I'll select the model with the highest accuracy that still meets the real-time requirement and integrate it into the GStreamer pipeline for further processing.
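
For the model comparison, accuracy alone can hide poor detection of the commercial class if the test set is imbalanced. Precision, recall and F1 for that class are therefore worth reporting alongside it. Per-frame latency should also be measured, because real-time performance matters. This is a framework-free sketch, and the function names are illustrative.

.. code-block:: python

    import time


    def binary_metrics(y_true, y_pred, positive=1):
        """Accuracy plus precision, recall and F1 for the commercial (positive) class."""
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == positive and p == positive)
        fp = sum(1 for t, p in pairs if t != positive and p == positive)
        fn = sum(1 for t, p in pairs if t == positive and p != positive)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        accuracy = sum(t == p for t, p in pairs) / len(pairs)
        return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


    def mean_latency_ms(predict, inputs, warmup=5):
        """Average wall-clock time of predict() per input, after a short warm-up."""
        for x in inputs[:warmup]:
            predict(x)
        start = time.perf_counter()
        for x in inputs:
            predict(x)
        return (time.perf_counter() - start) * 1000.0 / len(inputs)


    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    print(mean_latency_ms(lambda frame: sum(frame), [[1, 2, 3]] * 50))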


Model Execution on BeagleBone AI-64
=====================================
BeagleBone AI-64 Linux for Edge AI supports importing pre-trained custom models to run inference on target. Moreover, Edge AI BeagleBone AI-64 images have TensorFlow Lite already installed with acceleration enabled.

The Debian-based SDK uses pre-compiled DNN (Deep Neural Network) models and performs inference using open-source runtimes (OSRT) such as the TFLite runtime and ONNX Runtime.

To run inference on a DNN, the SDK expects the model and its associated artifacts in the following directory structure.

.. code-block:: text

    project_root
    ├── param.yaml
    ├── artifacts
    │   ├── 264_tidl_io_1.bin
    │   ├── 264_tidl_net.bin
    │   ├── 264_tidl_net.bin.layer_info.txt
    │   ├── 264_tidl_net.bin_netLog.txt
    │   ├── 264_tidl_net.bin.svg
    │   ├── allowedNode.txt
    │   └── runtimes_visualization.svg
    └── model
        └── ssd_mobilenet_v2_300_float.tflite

1. model: This directory contains the DNN to be used for inference.

2. artifacts: This directory contains the artifacts generated by compiling the DNN for the SDK.

3. param.yaml: A configuration file in yaml format to provide basic information about DNN, and associated pre and post processing parameters.

Therefore, after choosing the model to be used in the GStreamer pipeline, I will generate the artifacts directory by following the `edgeai-tidl-tools OSRT Python instructions <https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/examples/osrt_python/README.md>`_.
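
The compilation step could look roughly like the following sketch, modelled on my reading of the TFLite examples in edgeai-tidl-tools. On the host, the model is loaded with the TIDL import delegate, which writes the artifacts while calibration frames are passed through ``invoke()``. On the board, the runtime delegate loads those artifacts. The option keys, the library names ``tidl_model_import_tflite.so`` and ``libtidl_tfl_delegate.so``, and the helper names are assumptions, and should be checked against the installed edgeai-tidl-tools release. ``tflite_runtime`` is imported inside the function, so the options helper can be used without it.

.. code-block:: python

    import os


    def tidl_delegate_options(tidl_tools_path, artifacts_folder, tensor_bits=8,
                              calibration_frames=2, calibration_iterations=5,
                              debug_level=0):
        """Options for the TIDL TFLite delegate (keys assumed from TI's examples)."""
        return {
            "tidl_tools_path": tidl_tools_path,
            "artifacts_folder": artifacts_folder,
            "tensor_bits": tensor_bits,  # 8- or 16-bit fixed-point inference
            "debug_level": debug_level,
            "advanced_options:calibration_frames": calibration_frames,
            "advanced_options:calibration_iterations": calibration_iterations,
        }


    def make_interpreter(model_path, options, compile_model):
        """TFLite interpreter that offloads supported layers to TIDL.

        compile_model=True: host-side import delegate that writes the artifacts.
        compile_model=False: on-target delegate that uses the compiled artifacts.
        Both library names are assumptions based on TI's examples.
        """
        import tflite_runtime.interpreter as tflite  # only needed on host/target

        if compile_model:
            lib = os.path.join(options["tidl_tools_path"], "tidl_model_import_tflite.so")
        else:
            lib = "libtidl_tfl_delegate.so"
        delegate = tflite.load_delegate(lib, options)
        return tflite.Interpreter(model_path=model_path, experimental_delegates=[delegate])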

- Source -> `edgeai-tidl-tools <https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/examples/osrt_python/README.md>`_

.. image:: Assets/Figure4.png
   :alt: TFLite Runtime

.. centered::
    Figure 4: TFLite Runtime

GStreamer Pipeline
===================
At a high level, the data flow in the GStreamer pipeline can be split into three parts:

1. Input Pipeline - Grabs a frame from the input source. 
2. Output Pipeline - Sends the output to the display.
3. Compute Pipeline - Performs pre-processing, inference and post-processing.

I will create a GStreamer pipeline that receives a video input and processes it frame by frame. Each frame is split into two paths.
The “analytics” path resizes the input while maintaining the aspect ratio and crops it to match the resolution required by the deep learning network.
The “visualization” path is provided to the post-processing module, which overlays the detected classes. If a commercial is detected, the video frames are blurred and the audio is replaced.
If non-commercial content is detected, the normal visualization process continues without blurring or audio replacement.
The post-processed output is given to the hardware mosaic plugin, which positions and resizes the output window on an empty background before sending it to the display.
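
As a rough illustration of the obscuring step, the post-processing path could apply a box (mean) blur only while the classifier reports a commercial. The sketch below works on a small grayscale frame stored as nested lists and uses hypothetical helper names. On the board, the same operation would run on the RGB/NV12 buffers, for example with OpenCV's ``cv2.blur``. Audio substitution is not shown; one option would be a GStreamer ``input-selector`` that switches between the original and the replacement audio source.

.. code-block:: python

    def box_blur(frame, radius=2):
        """Blur a grayscale frame (list of rows) with a (2r+1)x(2r+1) mean filter.

        Pixels near the border average only over neighbours that exist.
        """
        h, w = len(frame), len(frame[0])
        out = []
        for y in range(h):
            row = []
            for x in range(w):
                ys = range(max(0, y - radius), min(h, y + radius + 1))
                xs = range(max(0, x - radius), min(w, x + radius + 1))
                vals = [frame[j][i] for j in ys for i in xs]
                row.append(sum(vals) / len(vals))
            out.append(row)
        return out


    def post_process(frame, is_commercial, radius=2):
        """Obscure the frame only while a commercial is being shown."""
        return box_blur(frame, radius) if is_commercial else frame


    checker = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]
    print(post_process(checker, is_commercial=True, radius=1))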

The following GStreamer input and output pipelines describe how I will build the GStreamer pipeline:

- GStreamer input pipeline:

.. code-block:: text

    v4l2src device=/dev/video0 ! \
    video/x-raw,format=NV12,width=1920,height=1080 ! \
    tiovxmultiscaler name=split_01 ! \
    split_01. ! queue ! video/x-raw, width=320, height=320 ! \
    tiovxdlpreproc data-type=10 channel-order=1 mean-0=128.000000 mean-1=128.000000 mean-2=128.000000 scale-0=0.007812 scale-1=0.007812 scale-2=0.007812 tensor-format=rgb out-pool-size=4 ! \
    application/x-tensor-tiovx ! appsink name=pre_0 max-buffers=2 drop=true \
    split_01. ! queue ! video/x-raw, width=1280, height=720 ! \
    tiovxdlcolorconvert target=1 out-pool-size=4 ! \
    video/x-raw, format=RGB ! appsink name=sen_0 max-buffers=2 drop=true

- GStreamer output pipeline:

.. code-block:: text

    appsrc format=GST_FORMAT_TIME is-live=true block=true do-timestamp=true name=post_0 ! \
        tiovxdlcolorconvert ! video/x-raw,format=NV12,width=1280,height=720 ! \
        queue ! mosaic_0.sink_0

    appsrc format=GST_FORMAT_TIME block=true num-buffers=1 name=background_0 ! \
        tiovxdlcolorconvert ! video/x-raw,format=NV12,width=1920,height=1080 ! \
        queue ! mosaic_0.background

    tiovxmosaic name=mosaic_0 \
        sink_0::startx=320  sink_0::starty=180  sink_0::width=1280  sink_0::height=720 \
        ! video/x-raw,format=NV12,width=1920,height=1080 ! \
        kmssink sync=false driver-name=tidss

- GStreamer compute pipeline:

NNStreamer provides efficient and flexible data streaming for machine learning applications, which makes it suitable for tasks such as running inference on video frames.
I will therefore use NNStreamer elements to run inference on the video frames.
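
One way the compute part could be expressed with NNStreamer is a launch description like the one below, built by a small hypothetical Python helper. It assumes a classifier that takes a single RGB frame per invocation. A streaming MoViNet export also takes state tensors, and the TIDL delegate settings for ``tensor_filter`` are not shown, so both would need extra handling. The input size and the normalization are placeholders.

.. code-block:: python

    def nnstreamer_pipeline(model_path, width=172, height=172,
                            source="v4l2src device=/dev/video0"):
        """gst-launch style description for a single-input TFLite classifier."""
        return " ! ".join([
            source,
            "videoconvert",
            "videoscale",
            f"video/x-raw,format=RGB,width={width},height={height}",
            "tensor_converter",  # video frame -> other/tensors
            # uint8 [0, 255] -> float32 [0, 1]
            "tensor_transform mode=arithmetic option=typecast:float32,div:255.0",
            f"tensor_filter framework=tensorflow-lite model={model_path}",
            "tensor_sink name=result",  # class scores are read from here
        ])


    print(nnstreamer_pipeline("model/classifier.tflite"))

On the board, the string could be passed to ``Gst.parse_launch()`` from PyGObject, with the class scores read from the ``tensor_sink`` element's ``new-data`` signal.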

.. image:: Assets/Figure5.png
   :alt: GStreamer Pipeline

.. centered::
    Figure 5: GStreamer Pipeline


.. image:: Assets/Figure6.png
   :alt: Project Workflow

.. centered::
    Figure 6: Project Workflow

The project will use the following software and hardware:

- `Python <https://www.python.org/>`_
- `C++ <https://isocpp.org/>`_
- `TensorFlow <https://www.tensorflow.org/>`_
- `TFLite <https://www.tensorflow.org/lite>`_ 
- `GStreamer <https://gstreamer.freedesktop.org/>`_
- `OpenCV <https://opencv.org/>`_
- `Build Systems <https://www.gnu.org/software/make/>`_
- Ability to capture and display video streams using `BeagleBone AI-64 <https://www.beagleboard.org/boards/beaglebone-ai-64>`_

**Timeline**
*************


Timeline summary
=================

.. table:: 

    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Date                   | Activity                                                                                                                                               |                                  
    +========================+========================================================================================================================================================+
    | February 26 - March 3  | Connect with possible mentors and request review on first draft                                                                                        |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | March 4 - March 10     | Complete prerequisites, verify value to community and request review on second draft                                                                   |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | March 11 - March 20    | Finalized timeline and request review on final draft                                                                                                   |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | March 21 - April 2     | Proposal review and Submit application                                                                                                                 |                               
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | April 3 - May 1        | Understanding GStreamer pipeline and TFLite runtime of BeagleBone AI-64.                                                                               |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | May 2 - May 10         | Start bonding and Discussing implementation ideas with mentors.                                                                                        |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | May 11 - May 31        | Focus on college exams.                                                                                                                                |                                  
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | June 1 - June 3        | Start coding and introductory video                                                                                                                    |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | June 3 - June 9        | :ref:`milestone #1<Milestone1>`  ->   Releasing introductory video and developing Commercial dataset                                                   |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | June 10 - June 16      | :ref:`milestone #2<Milestone2>`           ->   Developing Non-Commercial dataset and dataset Preprocessing                                             |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | June 17 - June 23      | :ref:`milestone #3<Milestone3>`           ->   Transfer learning and fine-tuning MoViNets architecture                                                 |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | June 24 - June 30      | :ref:`milestone #4<Milestone4>`           ->   Transfer learning and fine-tuning ResNet architecture                                                   |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | July 1 - July 7        | :ref:`milestone #5<Milestone5>`           ->   Evaluate performance metrics to choose the best-performing model.                                       |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | July 8 - July 14       | :ref:`Submit midterm evaluations <Submit midterm evaluation>`                                                                                          |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | July 15 - July 21      | :ref:`milestone #6<Milestone6>`           ->   Finalizing the best model by performing real-time inferencing                                           |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | July 22 - July 28      | :ref:`milestone #7<Milestone7>`           ->   Compiling the model and generating artifacts and building pre-processing part of GStreamer pipeline     |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | July 29 - August 4     | :ref:`milestone #8<Milestone8>`           ->   Building the compute pipeline using NNStreamer                                                          |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | August 5 - August 11   | :ref:`milestone #9<Milestone9>`            ->   Building the post-processing part of GStreamer pipeline                                                |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | August 12 - August 18  | :ref:`milestone #10<Milestone10>`        ->   Enhancing real-time performance                                                                          | 
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | August 19              | :ref:`Submit final project video, submit final work to GSoC site and complete final mentor evaluation<Final project video>`                            |
    +------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+

Timeline detailed
==================

Community Bonding Period (May 1st - May 10th)
==============================================

- Discuss implementation ideas with mentors.
- Discuss the scope of the project.

.. _Milestone1:

Milestone #1, Introductory YouTube video (June 3rd)
===================================================

- Make an introductory video.
- Commercial dataset acquisition:
    - Web-scrape videos marked as advertisements in the YouTube-8M dataset.
    - Ensure proper labeling and categorization of the commercial videos.

.. _Milestone2:

Milestone #2 (June 10th)
==========================

- Non-commercial dataset acquisition:
    - Web-scrape random videos from other categories of the YouTube-8M dataset.
    - Ensure diversity and relevance of non-commercial videos.
- Dataset preprocessing:
    - Preprocess acquired datasets for suitability in deep learning models.
    - Divide datasets into train, validation, and test sets.
    - Shuffle the data randomly so that the model does not learn spurious temporal dependencies between consecutive samples.

.. _Milestone3:

Milestone #3 (June 17th)
=========================

- Transfer learning and fine-tuning MoViNets architecture:
    - Apply transfer learning on MoViNets and fine-tune its last few layers.
    - Train MoViNets on the prepared dataset for video classification.

.. _Milestone4:

Milestone #4 (June 24th)
==========================

- Transfer learning and fine-tuning ResNet architecture:
    - Develop a ResNet-LSTMs model architecture for video classification.
    - Add LSTM layers on top of the ResNet features to capture temporal dependencies.
    - Train the ResNet-LSTMs model on the prepared dataset.

.. _Milestone5:

Milestone #5 (July 1st)
========================
- Finalize the best model:
    - Save all trained models to local disk
    - Evaluate performance metrics to choose the best-performing model.

.. _Submit midterm evaluation:

Submit midterm evaluations (July 8th)
=====================================

- Document the progress made during the first phase of the project.

.. important:: 
    
    **July 12 - 18:00 UTC:** Midterm evaluation deadline (standard coding period) 

.. _Milestone6:

Milestone #6 (July 15th)
=========================

- Finalize the best model:
    - Perform real-time inference using OpenCV to determine the model that yields the best results with high-performance.
    - Based on all the options tried in Phase 1, decide on the final model to be used in the GStreamer pipeline.

.. _Milestone7:

Milestone #7 (July 22nd)
=========================

- Compile the chosen model and generate artifacts for the TFLite runtime.
- Build the pre-processing part of the GStreamer pipeline:
    - Develop the pre-processing module that prepares video frames for inference.

.. _Milestone8:

Milestone #8 (July 29th)
=========================

- Build the compute pipeline using NNStreamer:
    - Use NNStreamer elements to run inference on video frames with the compiled model.

.. _Milestone9:

Milestone #9 (Aug 5th)
=======================

- Build the post-processing part of the GStreamer pipeline:
    - Develop the post-processing module that acts on the classification results.
    - Implement replacement or obscuring of commercial segments, and audio substitution.

.. _Milestone10:

Milestone #10 (Aug 12th)
========================

- Enhance real-time performance:
    - Optimize the GStreamer pipeline for real-time performance using native hardware accelerators.
    - Ensure smooth and efficient processing of video streams.


.. _Final project video:

Final YouTube video (Aug 19th)
===============================

- Submit final project video, submit final work to GSoC site and complete final mentor evaluation.

Final Submission (Aug 24th)
============================

.. important::

    **August 19 - 26 - 18:00 UTC:** Final week: GSoC contributors submit their final work 
    product and their final mentor evaluation (standard coding period)

    **August 26 - September 2 - 18:00 UTC:** Mentors submit final GSoC contributor 
    evaluations (standard coding period)

Initial results (September 3)
=============================

.. important::

    **September 3 - November 4:** GSoC contributors with extended timelines continue coding

    **November 4 - 18:00 UTC:** Final date for all GSoC contributors to submit their final work product and final evaluation

    **November 11 - 18:00 UTC:** Final date for mentors to submit evaluations for GSoC contributor projects with extended deadline

Experience and approach
***********************

This project requires prior experience with machine learning, multimedia processing and embedded systems. 

- As a starting point for this project, I built a `Sports Video Classification model <https://github.com/AryanNanda17/VideoProcessing-Based-on-Video-classifcation>`_ and performed **video processing based on video classification** with it using OpenCV (video processing driven by video classification is a core part of this project). `Demo <https://youtu.be/hoKE2dr2nT4>`_ :fas:`external-link`
- We will build a pure C++ GStreamer pipeline from input to output, so experience with C++ codebases and build systems is required.
    - Relevant contribution - `#123 <https://github.com/SRA-VJTI/Pixels_Seminar/pull/123>`_ (OpenCV/C++)
- I previously worked on `GestureSense <https://github.com/AryanNanda17/GestureSense/blob/master/GestureDetection/BgEliminationAndMotionDetection.py>`_, in which I performed image processing based on image classification using OpenCV/Python.
- I have experience with the ESP32 microcontroller and previously worked on `Multi-Code-Esp <https://github.com/AryanNanda17/multi_code_esp>`_, in which I built a multi-code ESP component.
- Experience in open source:
    - Contributed enhancements to the pymc repository:
        - `#7132 <https://github.com/pymc-devs/pymc/pull/7132>`_ (merged)
        - `#7125 <https://github.com/pymc-devs/pymc/pull/7125>`_ (merged)
    - Resolved an issue in the OpenCV repository (improved documentation):
        - `#22177 <https://github.com/opencv/opencv/issues/22177>`_ (merged)
- Contributions to `openbeagle.org/gsoc <https://openbeagle.org/gsoc/gsoc.beagleboard.io>`_:
    - Resolved a PDF page-break issue - `#33 <https://openbeagle.org/gsoc/gsoc.beagleboard.io/-/merge_requests/33>`_ (merged)
    - Added a new idea - `#25 <https://openbeagle.org/gsoc/gsoc.beagleboard.io/-/merge_requests/25>`_ (merged)
    - Improved documentation - `#23 <https://openbeagle.org/gsoc/gsoc.beagleboard.io/-/merge_requests/23#note_18477>`_ (merged)
    
Contingency
===========
- If I get stuck on my project and my mentor isn’t around, I will use the following resources:
    - `MoViNets <https://www.tensorflow.org/hub/tutorials/movinet>`_
    - `GStreamer Docs <https://gstreamer.freedesktop.org/>`_
    - `BeagleBone AI-64 <https://docs.beagleboard.org/latest/boards/beaglebone/ai-64/01-introduction.html>`_
    - `NNStreamer <https://nnstreamer.github.io/>`_
- Moreover, the BeagleBoard community is extremely helpful and active in answering questions, which makes it a great resource for guidance and clarification throughout the project.
- I intend to remain involved and provide ongoing support for this project beyond the duration of the GSoC timeline.

Benefit
=======

This project will not only enhance the media consumption experience for users of BeagleBoard hardware but also serve as an educational resource on integrating AI and machine learning capabilities into embedded systems. It will provide valuable insights into:

- The practical challenges of deploying neural network models in resource-constrained environments.
- The development of custom GStreamer plugins for multimedia processing.
- Real-world applications of machine learning in enhancing digital media experiences.

Misc
====

- The pull request for cross compilation: `#185 <https://github.com/jadonk/gsoc-application/pull/185>`_
- Relevant Coursework: `Neural Networks and Deep Learning <https://www.coursera.org/account/accomplishments/verify/LKHTEA9XRWML>`_, `Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization <https://www.coursera.org/account/accomplishments/verify/E52UFAHAY5UG>`_, `Convolutional Neural Networks <https://www.coursera.org/account/accomplishments/verify/9L4QL25AEL3L>`_