Compare revisions

Commits on Source (153)
Showing with 1184 additions and 66 deletions
image: beagle/sphinx-build-env:latest
# The Docker image that will be used to build your app
image: registry.git.beagleboard.org/docs/sphinx-build-env:latest
pages:
  tags:
    - docker-amd64
  before_script:
    - source ./venv-build-env.sh
  script:
    - "./gitlab-build.sh"
  artifacts:
    paths:
      - public
\ No newline at end of file
      - public
.. _C:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/c.html
.. _Assembly:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/assembly.html
.. _Verilog:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/verilog.html
.. _Zephyr:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/zephyr.html
.. _Linux:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/linux.html
.. _device-tree:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/device-tree.html
.. _FPGA:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/fpga.html
.. _basic wiring:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/basic-wiring.html
.. _motors:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/motors.html
.. _embedded serial interfaces:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/embedded-serial.html
.. _OpenBeagle CI:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/openbeagle-ci.html
.. _verification:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/verification.html
.. _wireless communications:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/wireless-communications.html
.. _Buildroot:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/buildroot.html
.. _RISC-V ISA:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/riscv.html
\ No newline at end of file
......@@ -17,15 +17,24 @@ from sphinx.application import Sphinx
sys.path.append(str(Path(".").resolve()))
project = 'gsoc.beagleboard.io'
copyright = '2024, BeagleBoard.org'
copyright = '2025, BeagleBoard.org'
author = 'BeagleBoard.org'
# Add epilog details to rst_epilog
rst_epilog = ""
rst_epilog_path = "_static/epilog/"
for (dirpath, dirnames, filenames) in os.walk(rst_epilog_path):
    for filename in filenames:
        with open(dirpath + filename) as f:
            rst_epilog += f.read()
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
extensions = [
"sphinx_design",
"sphinxcontrib.youtube",
"sphinxcontrib.images",
"sphinx_copybutton"
]
......@@ -122,8 +131,8 @@ html_theme_options = {
"use_edit_page_button": True,
"show_toc_level": 1,
"navbar_align": "right",
"show_nav_level": 2,
"announcement": "Welcome to the new site for BeagleBoard.org GSoC 2024 projects!",
"show_nav_level": 1,
"announcement": "Welcome to the site for BeagleBoard.org GSoC 2025 projects!",
# "show_version_warning_banner": True,
"navbar_center": ["navbar-nav"],
"navbar_start": ["navbar-logo"],
......@@ -168,7 +177,7 @@ html_context = {
latex_elements = {
"papersize": "a4paper",
"maketitle": open("_static/latex/title.tex").read(),
"preamble": open("_static/latex/preamble.tex").read(),
"preamble": open("_static/latex/preamble.tex").read() + r"\let\cleardoublepage\clearpage",
"sphinxsetup": ",".join(
(
"verbatimwithframe=false",
......@@ -181,6 +190,7 @@ latex_elements = {
),
}
sd_fontawesome_latex = True
latex_engine = "xelatex"
latex_logo = str("_static/images/logo-latex.pdf")
latex_documents = []
......
......@@ -12,14 +12,14 @@ Guides
Spend your summer break writing code and learning about open source development while earning money!
Accepted contributors work with a mentor and become a part of the open source community. Many become lifetime
open source developers! The 2024 contributor application window will be open from
`March 18th 2024 <https://developers.google.com/open-source/gsoc/timeline#march_18_-_1800_utc>`_ to
`April 2nd 2024 <https://developers.google.com/open-source/gsoc/timeline#april_2_-_1800_utc>`_!
open source developers! The 2025 contributor application window will be open from
`March 24 2025 <https://opensource.googleblog.com/2025/01/google-summer-of-code-2025-is-here.html>`_ to
`April 8 2025 <https://opensource.googleblog.com/2025/01/google-summer-of-code-2025-is-here.html>`_!
But don't wait for then to engage! Come to our `Discord <https://bbb.io/gsocchat>`_ and
`Forum <https://bbb.io/gsocml>`_ to share ideas today.
This section includes guides for :ref:`contributors <gsoc-contributor-guide>` & :ref:`mentors <gsoc-mentor-guide>` who want to participate
in GSoC 2024 with `BeagleBoard.org <www.beagleboard.org>`_. It's highly recommended to check `GSoC Frequently Asked Questions
in GSoC 2025 with `BeagleBoard.org <www.beagleboard.org>`_. It's highly recommended to check `GSoC Frequently Asked Questions
<https://developers.google.com/open-source/gsoc/faq>`_. For anyone who just wants to contribute to this site, we also have
a step by step :ref:`contribution guide <gsoc-site-editing-guide>`.
......
......@@ -14,7 +14,7 @@ Mentor Guide
become familiar with the code base and testing practices, to finally releasing their code on
`OpenBeagle <https://openbeagle.org/>`_ for the world to use!
You will also need be invited by an administrator to register on the GSoC site and request
You will also need to be invited by an administrator to register on the GSoC site and request
to be a mentor for `BeagleBoard.org <https://www.beagleboard.org/>`_.
Who Are Mentors?
......@@ -25,7 +25,7 @@ with a GSoC contributor. Mentors provide guidance such as pointers to useful doc
In addition to providing GSoC contributors with feedback and pointers, a mentor acts as an ambassador to help
GSoC contributors integrate into their project’s community. `BeagleBoard.org <https://www.beagleboard.org/>`_
always assigns more than one mentor to each GSoC contributor. Many members of the `BeagleBoard.org <https://www.
beagleboard.org/>`_ community also provide guidance to GSoC contributors without mentoring in an “official”
beagleboard.org/>`_ community also provides guidance to GSoC contributors without mentoring in an “official”
capacity, as much as they would answer anyone’s questions on our `Discord <https://bbb.io/gsocchat>`_ and our
`Forum <https://bbb.io/gsocml>`_.
......@@ -34,7 +34,7 @@ Idea Submission Process
Mentors should:
1. Submit projects ideas to our `Forum <https://bbb.io/gsocml>`_ and then
1. Submit project ideas to our `Forum <https://bbb.io/gsocml>`_ and then
2. Contribute an update to our :ref:`gsoc-project-ideas` page using our :ref:`gsoc-site-editing-guide` to promote their idea to contributors.
Only ideas deemed by administrators as being sufficiently supported by qualified mentors will be merged.
......@@ -44,11 +44,11 @@ Only ideas deemed by administrators as being sufficiently supported by qualified
BeagleBoard.org mentored GSoC projects are supposed to be for software projects that service the Beagle and general open source
embedded systems community, not theses, how-to guides, or "what I did over my summer vacation" ideas.
Prospective mentors, sudents will use our `Discord <https://bbb.io/gsocchat>`_ and `Forum <https://bbb.io/gsocml>`_
Prospective mentors, students will use our `Discord <https://bbb.io/gsocchat>`_ and `Forum <https://bbb.io/gsocml>`_
to make contact with you, so be sure to provide up-to-date information. Please feel free to add yourself to the mentors page, and we will monitor
and police that list. Acceptance as an official mentor with the ability to rate proposals and grade contributors will come via the Google system.
We will only approve official mentors who have a proven track record with Beagle, but welcome all community members to provide guidance to both
mentors and contributors to best service the community as a whole. Don’t be shy and don’t be offended when we edit. We are thrilled to have you on-board!
mentors and contributors to best serve the community as a whole. Don’t be shy, and don’t be offended when we edit. We are thrilled to have you on board!
......
......@@ -28,30 +28,31 @@ Ideas
| :bdg-info:`Low complexity` | :bdg-info-line:`90 hours` |
+------------------------------------+-------------------------------+
.. card:: Low-latency I/O RISC-V CPU core in FPGA fabric
.. tip::
Below are the latest project ideas; you can also check out our :ref:`gsoc-old-ideas` and :ref:`Past_Projects` for inspiration.
:fas:`microchip;pst-color-primary` FPGA gateware improvements :bdg-success:`Medium complexity` :bdg-success-line:`175 hours`
.. card:: A Conversational AI Assistant for BeagleBoard using RAG and Fine-tuning
^^^^
:fas:`brain;pst-color-secondary` Deep Learning :bdg-success:`Medium complexity` :bdg-success-line:`175 hours`
BeagleV-Fire features RISC-V 64-bit CPU cores and FPGA fabric. In that FPGA fabric, we'd like to
implement a RISC-V 32-bit CPU core with operations optimized for low-latency GPIO. This is similar
to the programmable real-time unit (PRU) RISC cores popularized on BeagleBone Black.
^^^^
| **Goal:** RISC-V-based CPU on BeagleV-Fire FPGA fabric with GPIO
| **Hardware Skills:** Verilog, Verification, FPGA
| **Software Skills:** RISC-V ISA, assembly, `Linux`_
| **Possible Mentors:** `Cyril Jean <https://forum.beagleboard.org/u/vauban>`_, `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_
BeagleBoard currently lacks an AI-powered assistant to help users troubleshoot errors. This project aims to address that need while also streamlining the onboarding process for new contributors, enabling them to get started more quickly.
| **Goal:** Develop a domain-specific chatbot for BeagleBoard using a combination of RAG and fine-tuning of an open-source LLM (like Llama 3, Mixtral, or Gemma). This chatbot will assist users with troubleshooting, provide information about BeagleBoard products, and streamline the onboarding process for new contributors.
| **Hardware Skills:** Ability to test applications on BeagleBone AI-64/BeagleY-AI and optimize for performance using quantization techniques.
| **Software Skills:** Python, RAG, Scraping techniques, Fine tuning LLMs, Gradio, Hugging Face Inference Endpoints, NLTK/spaCy, Git
| **Possible Mentors:** `Aryan Nanda <https://forum.beagleboard.org/u/aryan_nanda/>`_
++++
.. button-link:: https://forum.beagleboard.org/t/low-latency-risc-v-i-o-cpu-core/37156
.. button-link:: https://forum.beagleboard.org/t/beaglemind/40806
:color: danger
:expand:
:fab:`discourse;pst-color-light` Discuss on forum
.. card:: Update beagle-tester for mainline testing
:fab:`linux;pst-color-primary` Linux kernel improvements :bdg-success:`Medium complexity` :bdg-danger-line:`350 hours`
......@@ -63,8 +64,8 @@ Ideas
and device-tree overlays on various Beagle computers.
| **Goal:** Execution on Beagle test farm with over 30 mikroBUS boards testing all mikroBUS enabled cape interfaces (PWM, ADC, UART, I2C, SPI, GPIO and interrupt) performing weekly mainline Linux regression verification
| **Hardware Skills:** basic wiring, familiarity with embedded serial interfaces
| **Software Skills:** device-tree, `Linux`_, `C`_, continuous integration with GitLab, Buildroot
| **Hardware Skills:** `basic wiring`_, `embedded serial interfaces`_
| **Software Skills:** `device-tree`_, `Linux`_, `C`_, `OpenBeagle CI`_, `Buildroot`_
| **Possible Mentors:** `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_, `Anuj Deshpande <https://forum.beagleboard.org/u/Anuj_Deshpande>`_, `Dhruva Gole <https://forum.beagleboard.org/u/dhruvag2000>`_
++++
......@@ -86,7 +87,7 @@ Ideas
acceptable upstream.
| **Goal:** Add functional gaps, submit upstream patches for these drivers and respond to feedback
| **Hardware Skills:** Familiarity with wireless communication
| **Hardware Skills:** `wireless communications`_
| **Software Skills:** `C`_, `Linux`_
| **Possible Mentors:** `Ayush Singh <https://forum.beagleboard.org/u/ayush1325>`_, `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_
......@@ -108,7 +109,7 @@ Ideas
needs to be cleaned up. We can also work on support for Raspberry Pi if UCSD releases their Hat for it.
| **Goal:** Update librobotcontrol for Robotics Cape on BeagleBone AI, BeagleBone AI-64 and BeagleV-Fire
| **Hardware Skills:** Basic wiring, some DC motor familiarity
| **Hardware Skills:** `basic wiring`_, `motors`_
| **Software Skills:** `C`_, `Linux`_
| **Possible Mentors:** `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_, `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_
......@@ -122,7 +123,7 @@ Ideas
.. card:: Upstream Zephyr Support on BBAI-64 R5
:fas:`timeline;pst-color-secondary` RTOS/microkernel improvements :bdg-success:`Medium complexity` :bdg-success-line:`350 hours`
:fas:`timeline;pst-color-secondary` RTOS/microkernel improvements :bdg-success:`Medium complexity` :bdg-danger-line:`350 hours`
^^^^
......@@ -143,26 +144,9 @@ Ideas
:fab:`discourse;pst-color-light` Discuss on forum
.. card:: Enhanced Media Experience with AI-Powered Commercial Detection and Replacement
:fas:`brain;pst-color-secondary` Deep Learning :bdg-success:`Medium complexity` :bdg-success-line:`350 hours`
^^^^
Leveraging the capabilities of BeagleBoard’s powerful processing units, the project will focus on creating a real-time, efficient solution that enhances media consumption experiences by seamlessly integrating custom audio streams during commercial breaks.
| **Goal:** Build a deep learning model, training data set, training scripts, and a runtime for detection and modification of the video stream.
| **Hardware Skills:** Ability to capture and display video streams using `Beagleboard ai-64 <https://www.beagleboard.org/boards/beaglebone-ai-64>`_
| **Software Skills:** `Python <https://www.python.org/>`_, `TensorFlow <https://www.tensorflow.org/>`_, `TFlite <https://www.tensorflow.org/lite>`_, `Keras <https://www.tensorflow.org/guide/keras>`_, `GStreamer <https://gstreamer.freedesktop.org/>`_, `OpenCV <https://opencv.org/>`_
| **Possible Mentors:** `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_
++++
.. button-link:: https://forum.beagleboard.org/t/enhanced-media-experience-with-ai-powered-commercial-detection-and-replacement/37358
:color: danger
:expand:
:fab:`discourse;pst-color-light` Discuss on forum
.. button-link:: https://forum.beagleboard.org/tag/gsoc-ideas
:color: danger
......@@ -171,13 +155,7 @@ Ideas
:fab:`discourse;pst-color-light` Visit our forum to see newer ideas being discussed!
.. toctree::
:hidden:
.. tip::
You can also check out our :ref:`gsoc-old-ideas` and :ref:`Past_Projects` for inspiration.
.. _Linux:
https://docs.beagleboard.org/latest/intro/beagle101/linux.html
.. _C:
https://jkridner.beagleboard.io/docs/latest/intro/beagle101/learning-c.html
old/index
\ No newline at end of file
......@@ -17,13 +17,15 @@ into professional automation tasks, is strongly desired.
^^^^
- **Goal:** Complete implementation of librobotcontrol on BeagleBone AI/AI-64.
- **Hardware Skills:** Basic wiring
- **Software Skills:** C, Linux
- **Possible Mentors:** jkridner, lorforlinux
- **Expected Size of Project:** 350 hrs
- **Hardware Skills:** `basic wiring`_, `motors`_
- **Software Skills:** `C`_, `Linux`_
- **Possible Mentors:** `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_
- **Expected Size of Project:** 175 hrs
- **Rating:** Medium
- **Upstream Repository:** https://github.com/jadonk/librobotcontrol/tree/bbai
- **References:**
- **Upstream Repository:** `BeagleBoard.org / librobotcontrol · GitLab <https://openbeagle.org/beagleboard/librobotcontrol>`_
- **References:**
- `Robotics Control Library — BeagleBoard Documentation <https://docs.beagle.cc/projects/librobotcontrol/docs/index.html>`_
- `Robot Control Library: Main Page <https://old.beagleboard.org/static/librobotcontrol/>`_
- http://www.strawsondesign.com/docs/librobotcontrol/index.html
++++
......
......@@ -14,6 +14,48 @@ For some background, be sure to check out `simplify embedded edge AI development
<https://e2e.ti.com/blogs_/b/process/posts/simplify-embedded-edge-ai-development>`_
post from TI.
.. card:: Enhanced Media Experience with AI-Powered Commercial Detection and Replacement
:fas:`brain;pst-color-secondary` Deep Learning :bdg-success:`Medium complexity` :bdg-danger-line:`350 hours`
^^^^
Leveraging the capabilities of BeagleBoard’s powerful processing units, the project will focus on creating a real-time, efficient solution that enhances media consumption experiences by seamlessly integrating custom audio streams during commercial breaks.
| **Goal:** Build a deep learning model, training data set, training scripts, and a runtime for detection and modification of the video stream.
| **Hardware Skills:** Ability to capture and display video streams using `BeagleBone AI-64 <https://www.beagleboard.org/boards/beaglebone-ai-64>`_
| **Software Skills:** `Python <https://www.python.org/>`_, `TensorFlow <https://www.tensorflow.org/>`_, `TFlite <https://www.tensorflow.org/lite>`_, `Keras <https://www.tensorflow.org/guide/keras>`_, `GStreamer <https://gstreamer.freedesktop.org/>`_, `OpenCV <https://opencv.org/>`_
| **Possible Mentors:** `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_
++++
.. button-link:: https://forum.beagleboard.org/t/enhanced-media-experience-with-ai-powered-commercial-detection-and-replacement/37358
:color: danger
:expand:
:fab:`discourse;pst-color-light` Discuss on forum
.. card:: Embedded differentiable logic gate networks for real-time interactive and creative applications
:fas:`brain;pst-color-secondary` Creative AI :bdg-success:`Medium complexity` :bdg-danger-line:`350 hours`
^^^^
This project seeks to explore the potential of creative embedded AI, specifically using `Differentiable Logic (DiffLogic) <https://github.com/Felix-Petersen/difflogic>`_, by creating a system that can perform tasks like machine listening, sensor processing, sound and gesture classification, and generative AI.
| **Goal:** Develop an embedded machine learning system on BeagleBone that leverages `Differentiable Logic (DiffLogic) <https://github.com/Felix-Petersen/difflogic>`_ for real-time interactive music creation and environment sensing.
| **Hardware Skills:** Audio and sensor IO with `Bela.io <http://bela.io>`_
| **Software Skills:** Machine learning, deep learning, BeagleBone Programmable Real Time Unit (PRU) programming (see `PRU Cookbook <https://docs.beagleboard.org/latest/books/pru-cookbook/index.html>`_).
| **Possible Mentors:** `Jack Armitage <https://forum.beagleboard.org/u/jarm>`_, `Chris Kiefer <https://forum.beagleboard.org/u/luuma>`_
++++
.. button-link:: https://forum.beagleboard.org/t/embedded-differentiable-logic-gate-networks-for-real-time-interactive-and-creative-applications/37768
:color: danger
:expand:
:fab:`discourse;pst-color-light` Discuss on forum
.. card::
:fas:`brain;pst-color-secondary` **YOLO models on the X15/AI-64**
......
......@@ -3,6 +3,29 @@
FPGA based projects
####################
.. card:: Low-latency I/O RISC-V CPU core in FPGA fabric
:fas:`microchip;pst-color-primary` FPGA gateware improvements :bdg-success:`Medium complexity` :bdg-success-line:`175 hours`
^^^^
BeagleV-Fire features RISC-V 64-bit CPU cores and FPGA fabric. In that FPGA fabric, we'd like to
implement a RISC-V 32-bit CPU core with operations optimized for low-latency GPIO. This is similar
to the programmable real-time unit (PRU) RISC cores popularized on BeagleBone Black.
| **Goal:** RISC-V-based CPU on BeagleV-Fire FPGA fabric with GPIO
| **Hardware Skills:** `Verilog`_, `verification`_, `FPGA`_
| **Software Skills:** `RISC-V ISA`_, `assembly`_, `Linux`_
| **Possible Mentors:** `Cyril Jean <https://forum.beagleboard.org/u/vauban>`_, `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_
++++
.. button-link:: https://forum.beagleboard.org/t/low-latency-risc-v-i-o-cpu-core/37156
:color: danger
:expand:
:fab:`discourse;pst-color-light` Discuss on forum
.. card::
:fas:`microchip;pst-color-secondary` **RISC-V Based PRU on FPGA**
......
:orphan:
.. _gsoc-old-ideas:
Old GSoC Ideas
......
.. _gsoc-2024-projects:
:far:`calendar-days` 2024
##########################
.. note:: Only 3 out of 4 :ref:`accepted students <gsoc-2024-proposals>` were able to complete the program in 2024.
Enhanced Media Experience with AI-Powered Commercial Detection and Replacement
********************************************************************************
.. youtube:: Kagg8JycOfo
:width: 100%
| **Summary:** Leveraging the capabilities of BeagleBoard’s powerful processing units, the project will focus on creating a real-time, efficient solution that enhances media consumption experiences by seamlessly integrating custom audio streams during commercial breaks.
- Develop a neural network model: Combine Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to analyze video and audio data, accurately identifying commercial segments within video streams.
- Implement a real-time pipeline: Create a real-time pipeline for BeagleBoard that utilizes the trained model to detect commercials in real-time and replace them with alternative content or obfuscate them, alongside replacing the audio with predefined streams.
- Optimize for BeagleBoard: Ensure the entire system is optimized for real-time performance on BeagleBoard hardware, taking into account its unique computational capabilities and constraints.
**Contributor:** Aryan Nanda
**Mentors:** `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_, Kumar Abhishek
.. grid:: 2 2 2 2
.. grid-item::
.. button-link:: https://summerofcode.withgoogle.com/archive/2024/projects/UOX7iDEU
:color: info
:shadow:
:expand:
:fab:`google;pst-color-light` - GSoC Registry
.. grid-item::
.. button-ref:: gsoc-2024-proposal-aryan-nanda
:color: primary
:shadow:
:expand:
Proposal
Low-latency I/O RISC-V CPU core in FPGA fabric
************************************************
.. youtube:: ic0RRK6d3hg
:width: 100%
| **Summary:** Implementation of PRU subsystem on BeagleV-Fire’s FPGA fabric, resulting in a real-time microcontroller system working alongside the main CPU, providing low-latency access to I/O.
**Contributor:** Atharva Kashalkar
**Mentors:** `Cyril Jean <https://forum.beagleboard.org/u/vauban>`_, `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, Vedant Paranjape, Kumar Abhishek
.. grid:: 2 2 2 2
.. grid-item::
.. button-link:: https://summerofcode.withgoogle.com/archive/2024/projects/KjUoFlg2
:color: info
:shadow:
:expand:
:fab:`google;pst-color-light` - GSoC Registry
.. grid-item::
.. button-ref:: gsoc-2024-proposal-roger18
:color: primary
:shadow:
:expand:
Proposal
Differentiable Logic for Interactive Systems and Generative Music - Ian Clester
********************************************************************************
.. youtube:: NvHxMCF8sAQ
:width: 100%
| **Summary:** Developing an embedded machine learning system on BeagleBoard that leverages Differentiable Logic (DiffLogic) for real-time interactive music creation and environment sensing. The system will enable on-device learning, fine-tuning, and efficient processing for applications in new interfaces for musical expression.
**Contributor:** Ian Clester
**Mentors:** `Jack Armitage <https://forum.beagleboard.org/u/jarm/summary>`_, Chris Kiefer
.. grid:: 2 2 2 2
.. grid-item::
.. button-link:: https://summerofcode.withgoogle.com/archive/2024/projects/FBk0MM8g
:color: info
:shadow:
:expand:
:fab:`google;pst-color-light` - GSoC Registry
.. grid-item::
.. button-ref:: gsoc-2024-proposal-ijc
:color: primary
:shadow:
:expand:
Proposal
\ No newline at end of file
......@@ -14,6 +14,11 @@ GSoC over the previous years is given in the section that follows.
:margin: 4 4 0 0
:gutter: 4
.. grid-item-card:: :far:`calendar-days` 2024
:text-align: center
:link: gsoc-2024-projects
:link-type: ref
.. grid-item-card:: :far:`calendar-days` 2023
:text-align: center
:link: gsoc-2023-projects
......@@ -83,6 +88,7 @@ GSoC over the previous years is given in the section that follows.
:maxdepth: 1
:hidden:
2024
2023
2022
2021
......
proposals/2024/aryan_nanda/images/Figure5.png

211 KiB

proposals/2024/aryan_nanda/images/Figure6.png

108 KiB

.. _gsoc-2024-proposal-aryan-nanda:
.. _gsoc-proposal-template:
Enhanced Media Experience with AI-Powered Commercial Detection and Replacement
###############################################################################
Enhanced Media Experience with AI-Powered Commercial Detection and Replacement - Aryan Nanda
############################################################################################
Introduction
*************
The BeagleBone® AI-64 from the BeagleBoard.org Foundation is a complete system for developing artificial intelligence (AI) and machine-learning solutions with the convenience and expandability of the BeagleBone platform and onboard peripherals to start learning and building applications.
Leveraging the capabilities of BeagleBoard’s powerful processing units, the project will focus on creating a real-time, efficient solution that enhances media consumption experiences by seamlessly integrating custom audio streams during commercial breaks.
Summary links
......@@ -14,12 +14,13 @@ Summary links
- **Contributor:** `Aryan Nanda <https://forum.beagleboard.org/u/aryan_nanda>`_
- **Mentors:** `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_
- **GSoC Repository:** TBD
- **Repository:** `Main Code Repository on GitLab <https://openbeagle.org/aryan_nanda/gsoc_2024-enhanced_media_experience_with_ai-powered_commercial_detection_and_replacement>`_, `Mirror of Code Repository on GitHub <https://github.com/AryanNanda17/GSoC_2024-Enhanced_Media_Experience_with_AI-Powered_Commercial_Detection_and_Replacement>`_
- **Weekly Updates:** `Forum Thread <https://forum.beagleboard.org/t/weekly-progress-report-thread-enhanced-media-experience-with-ai-powered-commercial-detection-and-replacement/38487>`_
Status
=======
This project is currently just a proposal.
This project has been accepted for GSoC 2024.
Proposal
========
......@@ -30,7 +31,7 @@ Proposal
About
=====
- Find my Resume `here <https://drive.google.com/file/d/1BblSPdncbjKf4qG7s9ldb7ssIhfGN5bA/view?usp=sharing>`_
- **Resume** - Find my resume `here <https://drive.google.com/file/d/1UPXxEo_Z-qPHpVlnPLcai9cBInQj_c5j/view?usp=sharing>`_
- **Forum:** :fab:`discourse` `u/aryan_nanda <https://forum.beagleboard.org/u/aryan_nanda>`_
- **OpenBeagle:** :fab:`gitlab` `aryan_nanda <https://openbeagle.org/aryan_nanda>`_
- **Github:** :fab:`github` `AryanNanda17 <https://github.com/AryanNanda17>`_
......@@ -41,15 +42,15 @@ About
- **Previous GSoC participation:** :fab:`google` This would be my first time participating in GSOC
**About the Project**
********
**********************
**Project name:** Enhanced Media Experience with AI-Powered Commercial Detection and Replacement
Description
============
I propose developing a **GStreamer Plugin** capable of processing video inputs based on their classification.
The plugin will identify commercials and either replace them with alternative content or obscure them,
I propose developing **GStreamer Plugins** capable of processing video inputs based on their classification.
The plugins will identify commercials and either replace them with alternative content or obscure them,
while also substituting the audio with predefined streams. This enhancement aims to improve the media
consumption experience by eliminating unnecessary interruptions. I intend to **explore various video
classification models** to achieve accurate detection and utilize TensorFlow Lite to leverage the **native
......@@ -66,81 +67,189 @@ Comparison of different model accuracy can be done by doing some manual analysis
classification models and to finally use the best performing option to be included in the GStreamer pipeline
for inferencing of real-time videos. This would be the result presented at the end of the project timeline.
For phase 1 evaluation, the goal is to build a training dataset, preprocess it and fine-tune and train a Video
Classification model to identify commercials segments in a video accurately. For phase 2 evaluation, the goal
is to use the the best model identified in phase 1 for commercial detection and build a GStreamer pipeline
which would do video processing based on commercial segments classification and using native accelerators
present in BeagleBone-ai-64 for high-performance.
Classification model. For phase 2 evaluation, the goal is to use the best model identified in phase 1 for
commercial detection and build a GStreamer pipeline that uses the native accelerators present in BeagleBone AI-64
for high performance.
In order to accomplish this project the following objectives needs to be met.
In order to accomplish this project the following objectives need to be met.
1. Phase 1:-
- Develop a dataset of videos and corresponding labels indicating the presence of commercials in specific segments.
- Preprocess the dataset to ensure it's suitable for input into deep learning models. Moreover, divide the dataset into train, validation and test sets.
- Fine-tune various deep learning models and train them on the prepared dataset to identify the most accurate one for commercial detection in videos.
- Apply transfer learning and fine-tune various deep learning models, training them on the prepared dataset to identify the most accurate one for commercial detection in videos.
- Save all trained models to local disk and perform real-time inference using OpenCV to determine the model that yields the best results with high-performance.
2. Phase 2:-
- Based on all the options tried in Phase 1, decide on the final model to be used in the GStreamer pipeline.
- Compiling the model and generating artifacts so that we can use it in TFLite Runtime.
- Building a GStreamer pipeline that would take real-time input of media and would identify the commercial segments in it.
- If the commercial segment is identified the GStreamer pipeline would either replace them with alternative content or obscure them, while also substituting the audio with predefined streams.
- Enhancing the Real-time performance using native hardware Accelerators present in BeagleBone-Ai-64.
- I will also try to cut the commercial out completely and splice the ends.
- Enhancing the real-time performance using the native hardware accelerators present in BeagleBone AI-64.
**Methods**
***********************
In this section, I will specify in greater detail the methods I plan to use: the training dataset, the models, the GStreamer pipeline, and so on.
Building training Dataset
==================
To train the model effectively, we need a dataset with accurate labels. Since a suitable commercial video dataset isn't readily available, I'll create one. This dataset will consist of two classes: commercial and non-commercial. To build this dataset, I'll refer to the `Youtube-8M dataset <https://research.google.com/youtube8m/>`_, which includes videos categorized as TV advertisements. I'll download these videos and organize them into two folders: commercial and non-commercial. However, since the Youtube-8M dataset provides encoded feature vectors instead of the actual videos, direct usage would result in significant latency. Therefore, I'll use it as a reference and download the videos labeled by it as advertisements to build our dataset. After the dataset is ready I will preprocess it to ensure it's suitable for input into deep learning models. Moreover I'll divide the datset into train, validation and test set.
Building training Dataset and Preprocessing
============================================
To train the model effectively, we need a dataset with accurate labels. Since a suitable commercial video
dataset isn't readily available, I'll create one. This dataset will consist of two classes: commercial and
non-commercial. By dividing the dataset into Commercial and Non-Commercial segments, I am focusing more on
"Content Categorization". Separating the dataset into commercials and non-commercials allows our model to
learn distinct features associated with each category. For commercials, this might include fast-paced editing,
product logos, specific jingles, or other visual/audio cues. Non-commercial segments may include slower-paced
scenes, dialogue, or narrative content.
To build this dataset, I'll refer to the **Youtube-8M dataset** [1],
which includes videos categorized as TV advertisements. However, since the Youtube-8M dataset provides encoded
feature vectors instead of the actual videos, direct usage would result in significant latency. Therefore,
I'll use it as a reference and download the videos labeled by it as advertisements to build our dataset.
I will use a web scraper to automate this process by extracting the URLs of the commercial videos. For the
non-commercial part, I will download random videos from other categories of Youtube-8M dataset.
After the dataset is ready, I will preprocess it to ensure it's suitable for input into deep learning models.
Moreover, I'll divide the dataset into train, validation and test sets. To address temporal dependencies during
training, I intend to employ random shuffling of the dataset using
``tf.keras.preprocessing.image_dataset_from_directory()`` with ``shuffle=True``. This approach ensures that
videos from different folders are presented to the model randomly, allowing it to learn scene change detection
effectively.
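
A minimal sketch of how the downloaded clips could be shuffled and split is shown below; the ``dataset/commercial`` and ``dataset/non_commercial`` folder names and the 70/15/15 split are illustrative assumptions, not final choices.

.. code-block:: python

   # Hypothetical layout: dataset/commercial/*.mp4 and dataset/non_commercial/*.mp4
   import random
   from pathlib import Path

   def split_video_dataset(root="dataset", train=0.7, val=0.15, seed=42):
       """Collect (path, label) pairs and shuffle-split them into train/val/test."""
       samples = [(path, label)
                  for label in ("commercial", "non_commercial")
                  for path in sorted((Path(root) / label).glob("*.mp4"))]
       random.Random(seed).shuffle(samples)       # shuffle across both classes
       n_train = int(train * len(samples))
       n_val = int(val * len(samples))
       return (samples[:n_train],                 # training set
               samples[n_train:n_train + n_val],  # validation set
               samples[n_train + n_val:])         # test set

   train_set, val_set, test_set = split_video_dataset()
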
Video Classification models
============================
MoViNets is a good model for our task as it can operate on streaming videos for online inference. The main reason behind trying out MoViNets first is becaue it does quick and continuous analysis of incoming video streams. MoViNet utilizes NAS(Neural Architecture Search) to balance accuracy and efficiency, incorporates stream buffers for constant memory usage, and improves accuracy via temporal ensembles. The MoViNet architecture uses 3D convolutions that are "causal". Causal convolution ensures that the output at time t is computed using only inputs up to time t. This allows for efficient streaming.
This make MoViNets a perfect choice for our case. So, I will fine tune the MoViNets model and will train it on the commercial detection dataset.
**MoViNets** is a good model for our task as it can operate on streaming videos for online inference. The main reason for trying out MoViNets first is that it does quick and continuous analysis of incoming video streams. MoViNet utilizes NAS (Neural Architecture Search) to balance accuracy and efficiency, incorporates stream buffers for constant memory usage [Fig. 1], and improves accuracy via temporal ensembles [2]. The MoViNet architecture uses 3D convolutions that are "causal". Causal convolution ensures that the output at time t is computed using only inputs up to time t [2][Fig. 2]. This allows for efficient streaming.
This makes MoViNets a perfect choice for our case.
Since we don't have a big dataset, we will use the pre-trained MoViNets model as a feature extractor
and fine-tune it on our dataset. I will remove the classification layers of MoViNets and use its
pre-trained weights to extract features from our dataset. Then, train a smaller classifier (e.g., a few fully connected layers) on top of these features.
This way we can use the features learned by MoViNets on the larger dataset with minimal risk of overfitting.
This can help improve the model's performance even with limited data.
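
As a rough illustration of this feature-extractor approach, the sketch below freezes a pre-trained backbone and trains a small binary head on top; the ``backbone`` object (a MoViNet with its classification layers removed, e.g. loaded from TF Hub), the clip shape and the layer sizes are assumptions rather than final design decisions.

.. code-block:: python

   import tensorflow as tf

   def build_classifier(backbone, num_frames=8, size=172):
       """Attach a small trainable head to a frozen pre-trained feature extractor."""
       backbone.trainable = False                                # keep pre-trained weights fixed
       inputs = tf.keras.Input(shape=(num_frames, size, size, 3))
       features = backbone(inputs)                               # clip-level feature vector
       x = tf.keras.layers.Dense(64, activation="relu")(features)
       x = tf.keras.layers.Dropout(0.3)(x)
       outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # commercial vs. non-commercial
       model = tf.keras.Model(inputs, outputs)
       model.compile(optimizer="adam",
                     loss="binary_crossentropy",
                     metrics=["accuracy"])
       return model
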
.. image:: Assets/Figure1.png
.. image:: images/Figure1.png
:alt: Stream buffer in MoViNets
.. centered::
Figure 1: Stream buffer in MoViNets
Figure 1: Stream buffer in MoViNets [2]
.. image:: Assets/Figure2.png
.. image:: images/Figure2.png
:alt: Standard Convolution Vs Causal Convolution
.. centered::
Figure 2: Standard Convolution Vs Causal Convolution
Figure 2: Standard Convolution Vs Causal Convolution [2]
If MoViNet does not perform well than we can use other models like ResNet-50+LSTMs. Since a video is just a series of frames, a naive video classification method would be pass each frame from a video file through a CNN, classify each frame individually and independently of each other, choose the label with the largest corresponding probability, label the frame, and assign the most assigned image label to the video.
To solve the problem of "prediction flickering", where the label for the video changes rapidly when scenes get labeled differently. I will use **rolling prediction averaging** to reduce “flickering” in results.
The Conv+LSTMs model will perform well as it considers both the spatial and temporal features of videos just like a Conv3D model. The only reason it is not my first choice is because MoViNets are considered to be better for real-time performance.
If MoViNet does not perform well, then we can use other models like **Conv+LSTMs** [Fig. 3][3]. Since a video is just a series of frames, a naive video classification method would be to pass each frame from a video file through a CNN, classify each frame individually and independently of the others, choose the label with the largest corresponding probability, label the frame, and assign the most frequently assigned image label to the video.
To solve the problem of "prediction flickering", where the label for the video changes rapidly when scenes get labeled differently, I will use **rolling prediction averaging** to reduce “flickering” in results. I will also maintain a queue to store the last few frames, and whenever a scene change is detected, all frames in the queue will be marked with the current result, allowing for retroactive scene modification.
The depth of the queue will be determined through experimentation to find the optimal setting.
The Conv+LSTMs model will perform well as it considers both the spatial and temporal features of videos, just like a Conv3D model. The only reason it is not my first choice is that MoViNets are considered better for real-time performance.
.. image:: Assets/Figure3.png
.. image:: images/Figure3.png
:alt: Conv3D+LSTMs
.. centered::
Figure 3: Conv+LSTMs
Figure 3: Conv+LSTMs [3]
Optional method with Video Vision Transformers
-----------------------------------------------
This is a pure Transformer based model which extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
I have kept this as an optional method because our problem is of binary classification(either Commercial or Non-Commercial), so using such a complex model for this small problem may not be as efficient as other models.
Optional Methods
-----------------
- **ViViT**: A Video Vision Transformer, this is a pure Transformer-based model which extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers [4]. I have kept this as an optional method because our problem is one of binary classification (either Commercial or Non-Commercial), so using such a complex model for this small problem may not be as efficient as other models.
- **Audio fingerprinting**: This method involves extracting unique characteristics or features from audio signals to create a compact representation, often called a fingerprint. These fingerprints can then be compared against a database or used for various audio processing tasks [5]. I have kept it as an optional method because it may sometimes yield poorer results compared to deep learning models like MoViNets and Conv+LSTMs, particularly in tasks requiring complex audio understanding.
- **Scene Change Detection**: This approach involves detecting scene changes to segment the video into distinct segments or shots, based on the difference between the pixel values of two consecutive frames [6], and then applying the video classification model on the segmented frames (a minimal frame-differencing sketch is shown below). I have kept this as an optional approach because I think adding an additional step would cause unnecessary challenges.
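
For reference, here is a minimal frame-differencing sketch of the scene change detection idea; the grayscale mean-absolute-difference measure and the threshold value are illustrative assumptions, not a tuned detector.

.. code-block:: python

   import cv2
   import numpy as np

   def detect_scene_changes(path, threshold=30.0):
       """Yield frame indices where the mean absolute pixel difference jumps."""
       cap = cv2.VideoCapture(path)
       prev, idx = None, 0
       while True:
           ok, frame = cap.read()
           if not ok:
               break
           gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
           if prev is not None and np.mean(cv2.absdiff(gray, prev)) > threshold:
               yield idx                         # likely cut between consecutive frames
           prev, idx = gray, idx + 1
       cap.release()
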
Choosing the Best Performing model
===================================
After training the models, I'll assess their performance using evaluation metrics and conduct real-time inference on a sample video containing both commercial and non-commercial segments. I'll select the model with the highest accuracy and integrate it into the GStreamer pipeline for further processing.
I will choose the best performing model based on its performance on:-
1. **Evaluation metrics** (a minimal computation sketch follows the real-time inference example below)
- Accuracy: This metric provides a general overview of how well our model is performing overall. It's a good starting point for evaluating performance, but it might not be sufficient on its own, especially if the classes are imbalanced.
- Precision and Recall: These metrics provide insights into the model's ability to minimize false positives (precision) and false negatives (recall). Since our problem involves binary classification, precision and recall are essential for understanding the model's performance on each class (commercial and non-commercial).
- F1 Score: It provides a balanced measure of the model's performance. A high F1 score indicates that the model effectively balances precision and recall, ensuring both high accuracy and coverage in detecting commercials.
2. **Real-time inferencing**
- I will use OpenCV to evaluate real-time performance of the models. This ensures that our model's performance is evaluated under conditions similar to those it will encounter in deployment.
This code snippet illustrates how the model will be assessed in real-time, focusing on both **detection accuracy** and **frames per second (FPS)** as the primary evaluation metrics.
.. code-block:: python
   # Import necessary libraries
   from keras.models import load_model
   from collections import deque
   import numpy as np
   import pickle
   import cv2
   import time

   model = load_model("./our_trained_commercial_detection_model")  # Load the pre-trained Keras model
   lb = pickle.load(open("./videoClassification.pickle", "rb"))  # Load the label binarizer used during training
   mean = np.array([123.68, 116.779, 103.939], dtype="float32")  # Define the mean value for image preprocessing
   Queue = deque(maxlen=128)  # Define a deque (double-ended queue) to store predictions of past frames

   capture_video = cv2.VideoCapture("./example_clips/DemoVideo.mp4")  # Open the video file for reading
   (Width, Height) = (None, None)  # Initialize variables for storing frame dimensions and previous time
   ptime = 0

   # Loop through each frame of the video
   while True:
       (taken, frame) = capture_video.read()  # Read a frame from the video
       if not taken:  # Break the loop if no more frames are available
           break
       if Width is None or Height is None:  # Get frame dimensions if not already obtained
           (Width, Height) = frame.shape[:2]
       output = frame.copy()  # Make a copy of the frame for processing and classification

       #---------------------------Pre-Processing--------------------------#
       frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # Preprocess the frame: convert color space and resize
       frame = cv2.resize(frame, (sizeRequiredByModelHere)).astype("float32")
       frame -= mean  # Subtract the mean value for normalization

       #--------------------------model-inferencing-----------------------#
       preds = model.predict(np.expand_dims(frame, axis=0))[0]  # Make predictions on the preprocessed frame
       Queue.append(preds)  # Append the predictions to the deque
       results = np.array(Queue).mean(axis=0)  # Calculate the average of predictions from past frames
       i = np.argmax(results)  # Get the index of the class with the highest average prediction
       label = lb.classes_[i]  # Get the label corresponding to the predicted class

       #---------------------------Post-Processing--------------------------#
       if label == "commercial":  # Apply Gaussian blur to the output frame if the label is "commercial"
           output = cv2.GaussianBlur(output, (99, 99), 0)

       ctime = time.time()  # Calculate and display FPS
       fps = int(1 / (ctime - ptime))
       cv2.putText(output, str(fps), (20, 200), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 0, 255), 2)
       ptime = ctime

       cv2.imshow("In progress", output)  # Show the processed frame
       key = cv2.waitKey(1) & 0xFF  # Handle user input (press 'q' to quit)
       if key == ord("q"):
           break

   # Release resources when finished
   capture_video.release()
   cv2.destroyAllWindows()
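
To complement the real-time check, the evaluation metrics from point 1 could be computed on the held-out test set roughly as follows; the ``y_true``/``y_pred`` arrays are hypothetical per-clip labels and predictions, shown only to illustrate the comparison between candidate models.

.. code-block:: python

   from sklearn.metrics import accuracy_score, precision_recall_fscore_support

   def summarize(y_true, y_pred):
       """Report accuracy, precision, recall and F1 for one candidate model."""
       precision, recall, f1, _ = precision_recall_fscore_support(
           y_true, y_pred, average="binary")  # 1 = commercial, 0 = non-commercial
       return {"accuracy": accuracy_score(y_true, y_pred),
               "precision": precision, "recall": recall, "f1": f1}
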
Model Execution on BeagleBone AI-64
=====================================
The BeagleBone AI-64 Linux for Edge AI supports importing pre-trained custom models to run inference on target. Moreover, Edge AI BeagleBone AI-64 images have TensorFlow Lite already installed with acceleration enabled.
BeagleBone AI-64 Linux for Edge AI supports importing pre-trained custom models to run inference on target. Moreover, Edge AI BeagleBone AI-64 images have TensorFlow Lite already installed with acceleration enabled.
The Debian-based SDK makes use of pre-compiled DNN (Deep Neural Network) models and performs inference using various open source runtimes (OSRT) such as TFLite runtime, ONNX runtime, etc.
In order to infer a DNN, SDK expects the DNN and associated artifacts in the below directory structure.
In order to infer a DNN, the SDK expects the DNN and associated artifacts in the directory structure below [7].
.. code-block:: text
......@@ -158,77 +267,60 @@ In order to infer a DNN, SDK expects the DNN and associated artifacts in the bel
│ └── runtimes_visualization.svg
└── model
└── ssd_mobilenet_v2_300_float.tflite
└── my_commercial_detection_model.tflite
1. model: This directory contains the DNN being targeted to infer
1. model: This directory contains the DNN being targeted to infer.
2. artifacts: This directory contains the artifacts generated after the compilation of DNN for SDK.
3. param.yaml: A configuration file in yaml format to provide basic information about DNN, and associated pre and post processing parameters.
Therefore, after choosing the model to be used in GStreamer pipeline, I will generate the artifacts directory by following `these <https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/examples/osrt_python/README.md>`_ instructions.
Therefore, after choosing the model to be used in GStreamer pipeline, I will generate the artifacts directory by following the instructions mentioned in TexasInstruments:edgeai-tidl-tools examples [7].
.. image:: Assets/Figure4.png
.. image:: images/Figure4.png
:alt: TFLite Runtime
.. centered::
Figure 4: TFLite Runtime
Figure 4: TFLite Runtime [7]
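
Once the compiled model and its artifacts are in place, running it on the target with the TFLite runtime could look roughly like the sketch below; the TIDL delegate library name and its options are assumptions based on TI's edgeai-tidl-tools documentation, and omitting ``experimental_delegates`` would simply fall back to CPU execution.

.. code-block:: python

   import numpy as np
   import tflite_runtime.interpreter as tflite

   MODEL = "model/my_commercial_detection_model.tflite"
   # Assumed TIDL delegate name/options; drop `experimental_delegates` for CPU-only runs.
   delegate = tflite.load_delegate("libtidl_tfl_delegate.so",
                                   {"artifacts_folder": "artifacts"})
   interpreter = tflite.Interpreter(model_path=MODEL,
                                    experimental_delegates=[delegate])
   interpreter.allocate_tensors()

   inp = interpreter.get_input_details()[0]
   out = interpreter.get_output_details()[0]
   frame = np.zeros(inp["shape"], dtype=inp["dtype"])  # placeholder preprocessed input
   interpreter.set_tensor(inp["index"], frame)
   interpreter.invoke()
   print(interpreter.get_tensor(out["index"]))          # commercial / non-commercial scores
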
GStreamer Pipeline
===================
The data flow in the GStreamer pipeline at a high level can be split into 3-parts:-
The data flow in the GStreamer pipeline at a high level can be split into 3 parts [8]:-
1. Input Pipeline - Grabs a frame from the input source.
2. Output Pipeline - Sends the output to the display.
3. Compute Pipeline - Performs pre-processing, inference and post-processing.
I will create a GStreamer Pipeline that will receive an input of a video and it will grab it frame by frame. The frame will be split into two paths.
The “analytics” path resizes the input maintaining the aspect ratio and crops the input to match the resolution required to run the deep learning network.
The “visualization” path is provided to the post-processing module which overlays the detected classes. If a commercial video is detected, we apply blurring to the video frames and replace the audio.
If a non-commercial video is detected, proceed with the normal visualization process without blurring or replacing the audio.
Post-processed output is given to HW mosaic plugin which positions and resizes the output window on an empty background before sending to display.
The following GStreamer input and output pipeline describes how I will build the GStreamer pipeline:-
- GStreamer input pipeline:
I will create a GStreamer pipeline that will receive input from an **HDMI source** and grab it frame by frame. Each frame will be split into two paths.
.. code-block:: c++
v4l2src device=/dev/video0 ! \
video/x-raw,format=NV12,width=1920,height=1080 ! \
tiovxmultiscaler name=split_01 ! \
split_01. ! queue ! video/x-raw, width=320, height=320 ! \
tiovxdlpreproc data-type=10 channel-order=1 mean-0=128.000000 mean-1=128.000000 mean-2=128.000000 scale-0=0.007812 scale-1=0.007812 scale-2=0.007812 tensor-format=rgb out-pool-size=4 ! \
application/x-tensor-tiovx ! appsink name=pre_0 max-buffers=2 drop=true \
split_01. ! queue ! video/x-raw, width=1280, height=720 ! \
tiovxdlcolorconvert target=1 out-pool-size=4 ! \
video/x-raw, format=RGB ! appsink name=sen_0 max-buffers=2 drop=true
- GStreamer output pipeline:
.. code-block:: c++
appsrc format=GST_FORMAT_TIME is-live=true block=true do-timestamp=true name=post_0 ! \
tiovxdlcolorconvert ! video/x-raw,format=NV12,width=1280,height=720 ! \
queue ! mosaic_0.sink_0
appsrc format=GST_FORMAT_TIME block=true num-buffers=1 name=background_0 ! \
tiovxdlcolorconvert ! video/x-raw,format=NV12,width=1920,height=1080 ! \
queue ! mosaic_0.background
The “analytics” path normalizes the frame and resizes the input to match the resolution required to run the deep learning model.
The “visualization” path is provided to the post-processing module, which performs the post-processing required by the model.
If a commercial video is detected, we apply blurring to the video frames and replace the audio.
If a non-commercial video is detected, proceed with the normal visualization process without blurring or replacing the audio.
Post-processed output is then sent to display [8].
tiovxmosaic name=mosaic_0 \
sink_0::startx=320 sink_0::starty=180 sink_0::width=1280 sink_0::height=720 \
! video/x-raw,format=NV12,width=1920,height=1080 ! \
kmssink sync=false driver-name=tidss
NNStreamer provides efficient and flexible data streaming for machine learning
applications, making it suitable for tasks such as running inference on video frames.
So, I will use NNStreamer elements to run inference on the video frames; a minimal pipeline sketch follows below.
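
The sketch below shows roughly how such a pipeline could be assembled from Python with ``Gst.parse_launch``; the source element, caps and model path are placeholders rather than the final BeagleBone AI-64 configuration, while ``tensor_converter``, ``tensor_filter`` and ``tensor_sink`` are the NNStreamer elements referred to above.

.. code-block:: python

   import gi
   gi.require_version("Gst", "1.0")
   from gi.repository import Gst

   Gst.init(None)

   # Placeholder pipeline: capture frames, convert them to tensors, and run the
   # TFLite model inside the stream with NNStreamer's tensor_filter element.
   PIPELINE = (
       "v4l2src device=/dev/video0 ! videoconvert ! videoscale ! "
       "video/x-raw,width=172,height=172,format=RGB ! "
       "tensor_converter ! "
       "tensor_filter framework=tensorflow-lite model=model/my_commercial_detection_model.tflite ! "
       "tensor_sink name=predictions"
   )

   pipeline = Gst.parse_launch(PIPELINE)
   pipeline.set_state(Gst.State.PLAYING)
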
.. image:: Assets/Figure5.png
.. image:: images/Figure5.png
:alt: GStreamer Pipeline
.. centered::
Figure 5: GStreamer Pipeline
Figure 5: GStreamer Pipeline [8]
The above GStreamer pipeline is a demo pipeline inspired by edge_ai_apps/data_flows [8], and there could be a few more changes to it depending upon our specific needs.
- **"hdmisrc"** element is used for capturing audio and video data from an HDMI source.
- **"videoconvert"** ensure proper format conversion for display.
- **"tiovxcolorconvert"** is used to perform color space conversion.
- **"tiovxmultiscaler"** is used to perform multi-scaling operations on video frames. It allows us to efficiently scale the input frames to multiple desired resolutions in a single step.
- **"tiovxdlpreproc"** is used to perform pre-processing of input data in deep learning inference pipelines using TIOVX (TI OpenVX) framework.
- **"kmssink"** is used for displaying video on systems.
.. image:: Assets/Figure6.png
Project Workflow
===================
.. image:: images/Figure6.png
:alt: Project Workflow
.. centered::
......@@ -240,7 +332,7 @@ Software
- `Python <https://www.python.org/>`_
- `C++ <https://isocpp.org/>`_
- `TensorFlow <https://www.tensorflow.org/>`_
- `TFlite <https://www.tensorflow.org/lite>`_
- `TFLite <https://www.tensorflow.org/lite>`_
- `GStreamer <https://gstreamer.freedesktop.org/>`_
- `OpenCV <https://opencv.org/>`_
- `Build Systems <https://www.gnu.org/software/make/>`_
......@@ -248,10 +340,10 @@ Software
Hardware
========
- Ability to capture and display video streams using `Beagleboard ai-64 <https://www.beagleboard.org/boards/beaglebone-ai-64>`_
- Ability to capture and display video streams using `BeagleBone AI-64 <https://www.beagleboard.org/boards/beaglebone-ai-64>`_
**Timeline**
********
*************
Timeline summary
......@@ -259,140 +351,178 @@ Timeline summary
.. table::
+------------------------+----------------------------------------------------------------------------------------------------+
| Date | Activity |
+========================+====================================================================================================+
| April 3 - May 1 | Understanding GStreamer pipeline and TFLite runtime of BeagleBone-Ai-64. |
+------------------------+----------------------------------------------------------------------------------------------------+
| May 1 - May 10 | Discussing implementation ideas with mentors. |
+------------------------+----------------------------------------------------------------------------------------------------+
| May 10 - May 31 | Focus on college exams. |
+------------------------+----------------------------------------------------------------------------------------------------+
| June 1 - June 3 | Start coding and introductory video |
+------------------------+----------------------------------------------------------------------------------------------------+
| June 3 | Release introductory video and complete milestone #1 |
+------------------------+----------------------------------------------------------------------------------------------------+
| June 10 | :ref:`Complete milestone #2<Milestone2>` |
+------------------------+----------------------------------------------------------------------------------------------------+
| June 17 | :ref:`Complete milestone #3<Milestone3>` |
+------------------------+----------------------------------------------------------------------------------------------------+
| June 24 | :ref:`Complete milestone #4<Milestone4>` |
+------------------------+----------------------------------------------------------------------------------------------------+
| July 1 | :ref:`Complete milestone #5<Milestone5>` |
+------------------------+----------------------------------------------------------------------------------------------------+
| July 8 | Submit midterm evaluations |
+------------------------+----------------------------------------------------------------------------------------------------+
| July 15 | :ref:`Complete milestone #6<Milestone6>` |
+------------------------+----------------------------------------------------------------------------------------------------+
| July 22 | :ref:`Complete milestone #7<Milestone7>` |
+------------------------+----------------------------------------------------------------------------------------------------+
| July 29 | :ref:`Complete milestone #8<Milestone8>` |
+------------------------+----------------------------------------------------------------------------------------------------+
| August 5 | :ref:`Complete milestone #9<Milestone9>` |
+------------------------+----------------------------------------------------------------------------------------------------+
| August 12 | :ref:`Complete milestone #10<Milestone10>` |
+------------------------+----------------------------------------------------------------------------------------------------+
| August 19 | Submit final project video, submit final work to GSoC site and complete final mentor evaluation |
+------------------------+----------------------------------------------------------------------------------------------------+
+------------------------+--------------------------------------------------------------------------------------+
| Date | Activity |
+========================+======================================================================================+
| February 26 - March 3 | Connect with possible mentors and request review on first draft |
+------------------------+--------------------------------------------------------------------------------------+
| March 4 - March 10 | Complete prerequisites, verify value to community and request review on second draft |
+------------------------+--------------------------------------------------------------------------------------+
| March 11 - March 20 | Finalized timeline and request review on final draft |
+------------------------+--------------------------------------------------------------------------------------+
| March 21 - April 2 | Proposal review and Submit application |
+------------------------+--------------------------------------------------------------------------------------+
| April 3 - May 1 | Understanding GStreamer pipeline and TFLite runtime of BeagleBone AI-64. |
+------------------------+--------------------------------------------------------------------------------------+
| May 2 - May 10 | :ref:`ACRBonding` |
+------------------------+--------------------------------------------------------------------------------------+
| May 11 - May 31 | Focus on college exams. |
+------------------------+--------------------------------------------------------------------------------------+
| June 1 - June 3 | Start coding and introductory video |
+------------------------+--------------------------------------------------------------------------------------+
| June 3 - June 9 | :ref:`ACRMilestone1` |
+------------------------+--------------------------------------------------------------------------------------+
| June 10 - June 16 | :ref:`ACRMilestone2` |
+------------------------+--------------------------------------------------------------------------------------+
| June 17 - June 23 | :ref:`ACRMilestone3` |
+------------------------+--------------------------------------------------------------------------------------+
| June 24 - June 30 | :ref:`ACRMilestone4` |
+------------------------+--------------------------------------------------------------------------------------+
| July 1 - July 7 | :ref:`ACRMilestone5` |
+------------------------+--------------------------------------------------------------------------------------+
| July 8 - July 14 | :ref:`ACRSubmit-midterm-evaluations` |
+------------------------+--------------------------------------------------------------------------------------+
| July 15 - July 21 | :ref:`ACRMilestone6` |
+------------------------+--------------------------------------------------------------------------------------+
| July 22 - July 28 | :ref:`ACRMilestone7` |
+------------------------+--------------------------------------------------------------------------------------+
| July 29 - August 4 | :ref:`ACRMilestone8` |
+------------------------+--------------------------------------------------------------------------------------+
| August 5 - August 11 | :ref:`ACRMilestone9` |
+------------------------+--------------------------------------------------------------------------------------+
| August 12 - August 18 | :ref:`ACRMilestone10` |
+------------------------+--------------------------------------------------------------------------------------+
| August 19 | :ref:`ACRFinal-project-video` |
+------------------------+--------------------------------------------------------------------------------------+
Timeline detailed
=================
.. _ACRBonding:
Community Bonding Period (May 1st - May 10th)
----------------------------------------------

- Discuss implementation ideas with mentors.
- Discuss the scope of the project.
.. _ACRMilestone1:

Milestone #1, Releasing introductory video and developing commercial dataset (June 3)
------------------------------------------------------------------------------------------

- Making an introductory video.
- Commercial dataset acquisition:

  - Web-scrape videos marked as advertisements from the YouTube-8M dataset.
  - Ensure proper labeling and categorization of the commercial videos.
.. _ACRMilestone2:
Milestone #2, Developing non-commercial dataset and dataset preprocessing (June 10)
-------------------------------------------------------------------------------------
- Non-commercial dataset acquisition:

  - Web-scrape random videos from other categories of the YouTube-8M dataset.
  - Ensure diversity and relevance of the non-commercial videos.

- Dataset preprocessing (a rough sketch follows this list):

  - Preprocess the acquired datasets so they are suitable for deep learning models.
  - Divide the datasets into train, validation, and test sets.
  - Shuffle the data at the video level so that the temporal dependencies within each clip are preserved.
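
A minimal sketch of the video-level shuffle-and-split step is shown below; the directory layout (``commercial/`` vs. ``non_commercial/``), the file extension, and the 70/15/15 ratios are placeholders rather than final choices.

.. code-block:: python

   import random
   from pathlib import Path

   def split_dataset(root="dataset", train=0.7, val=0.15, seed=42):
       """Shuffle whole videos (not individual frames) and split them
       into train/validation/test sets."""
       samples = []
       for label, class_dir in enumerate(["non_commercial", "commercial"]):
           for video in (Path(root) / class_dir).glob("*.mp4"):
               samples.append((str(video), label))

       random.Random(seed).shuffle(samples)            # shuffle at the video level
       n_train = int(len(samples) * train)
       n_val = int(len(samples) * val)
       return (samples[:n_train],                      # train
               samples[n_train:n_train + n_val],       # validation
               samples[n_train + n_val:])              # test

   train_set, val_set, test_set = split_dataset()

Splitting whole videos rather than individual frames keeps all frames of a clip inside a single split, so each clip's temporal structure is preserved and there is no leakage between the train and test sets.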
.. _ACRMilestone3:
Milestone #3, Transfer learning and fine-tuning MoViNets architecture (June 17)
-------------------------------------------------------------------------------------
- Transfer learning and fine-tuning the MoViNets architecture (see the sketch below):

  - Apply transfer learning on MoViNets and fine-tune its last few layers.
  - Train MoViNets on the prepared dataset for video classification.
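
A rough transfer-learning sketch along the lines of the TensorFlow MoViNet tutorial is given below; the exact TF Hub handle, the frozen-backbone setup, and the single-unit sigmoid head are assumptions that would be tuned during this milestone.

.. code-block:: python

   import tensorflow as tf
   import tensorflow_hub as hub

   # Assumed TF Hub handle for a small MoViNet variant; any MoViNet
   # classifier export from the tutorial would be used the same way.
   MOVINET_HANDLE = ("https://tfhub.dev/tensorflow/movinet/"
                     "a0/base/kinetics-600/classification/3")

   encoder = hub.KerasLayer(MOVINET_HANDLE, trainable=False)  # freeze pre-trained weights

   inputs = tf.keras.layers.Input(shape=(None, None, None, 3),
                                  dtype=tf.float32, name="image")
   features = encoder(dict(image=inputs))          # MoViNet hub models take a dict input
   outputs = tf.keras.layers.Dense(1, activation="sigmoid")(features)  # commercial vs. content

   model = tf.keras.Model(inputs, outputs)
   model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                 loss="binary_crossentropy", metrics=["accuracy"])
   # model.fit(train_ds, validation_data=val_ds, epochs=5)   # datasets from Milestone #2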
.. _ACRMilestone4:
Milestone #4, Transfer learning and fine-tuning ResNet architecture (June 24)
-------------------------------------------------------------------------------------
- Transfer learning and fine-tuning the ResNet architecture (see the sketch below):

  - Add additional LSTM layers for extracting temporal dependencies.
  - Develop the ResNet-LSTM model architecture for video classification.
  - Train the ResNet-LSTM model on the prepared dataset.
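
A Keras sketch of the ResNet-LSTM idea is shown below; the clip length, frame size, and LSTM width are placeholder hyperparameters.

.. code-block:: python

   import tensorflow as tf
   from tensorflow.keras import layers

   NUM_FRAMES, HEIGHT, WIDTH = 16, 224, 224        # placeholder clip/frame dimensions

   # Pre-trained ResNet50 acts as a frozen per-frame feature extractor.
   resnet = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                           pooling="avg", input_shape=(HEIGHT, WIDTH, 3))
   resnet.trainable = False

   inputs = tf.keras.Input(shape=(NUM_FRAMES, HEIGHT, WIDTH, 3))
   x = layers.TimeDistributed(resnet)(inputs)      # (batch, frames, 2048) frame embeddings
   x = layers.LSTM(128)(x)                         # temporal dependencies across frames
   x = layers.Dropout(0.3)(x)
   outputs = layers.Dense(1, activation="sigmoid")(x)   # commercial vs. content

   model = tf.keras.Model(inputs, outputs)
   model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
   model.summary()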
.. _ACRMilestone5:
Milestone #5, Evaluate performance metrics to choose the best-performing model (July 1)
---------------------------------------------------------------------------------------
- Finalize the best model (see the sketch below):

  - Save all trained models to local disk.
  - Evaluate performance metrics to choose the best-performing model.
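
The comparison could look roughly like the sketch below; ``movinet_finetuned.keras`` and ``resnet_lstm.keras`` are hypothetical file names, and the test arrays come from the Milestone #2 split.

.. code-block:: python

   import tensorflow as tf
   from sklearn.metrics import classification_report, f1_score

   def pick_best_model(model_paths, x_test, y_test):
       """Load each saved candidate, score it on the held-out test split,
       and return the path of the model with the highest F1 score."""
       best_path, best_f1 = None, -1.0
       for path in model_paths:
           model = tf.keras.models.load_model(path)
           preds = (model.predict(x_test, verbose=0).ravel() > 0.5).astype(int)
           print(path)
           print(classification_report(y_test, preds, target_names=["content", "commercial"]))
           score = f1_score(y_test, preds)
           if score > best_f1:
               best_path, best_f1 = path, score
       return best_path

   # Hypothetical usage; file names and test arrays come from earlier milestones.
   # best = pick_best_model(["movinet_finetuned.keras", "resnet_lstm.keras"],
   #                        test_frames, test_labels)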
.. _ACRSubmit-midterm-evaluations:
Submit midterm evaluations (July 8th)
-------------------------------------------------------------------------------------
- Complete any remaining Phase 1 tasks.
- Document the progress made during the first phase of the project.
.. important::
**July 12 - 18:00 UTC:** Midterm evaluation deadline (standard coding period)
.. _ACRMilestone6:
Milestone #6, Finalizing the best model by performing real-time inferencing (July 15)
--------------------------------------------------------------------------------------
- Finalize the best model (a rough inference loop is sketched below):

  - Perform real-time inference using OpenCV to determine the model that yields the best results with high performance.
  - Based on all the options tried in Phase 1, decide on the final model to be used in the GStreamer pipeline.
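
A rough OpenCV benchmarking loop for this step is sketched below; ``best_model.keras``, the clip length, and the input video path are placeholders.

.. code-block:: python

   import collections
   import time

   import cv2
   import numpy as np
   import tensorflow as tf

   model = tf.keras.models.load_model("best_model.keras")   # assumed: saved in Milestone #5
   WINDOW = 16                                               # frames per clip fed to the model

   frames = collections.deque(maxlen=WINDOW)
   cap = cv2.VideoCapture("sample_broadcast.mp4")            # or 0 for a live camera

   while cap.isOpened():
       ok, frame = cap.read()
       if not ok:
           break
       frames.append(cv2.resize(frame, (224, 224)) / 255.0)

       if len(frames) == WINDOW:
           clip = np.expand_dims(np.array(frames, dtype=np.float32), 0)
           start = time.time()
           prob = float(model.predict(clip, verbose=0)[0][0])
           fps = WINDOW / (time.time() - start)
           label = "COMMERCIAL" if prob > 0.5 else "CONTENT"
           cv2.putText(frame, f"{label} ({prob:.2f}, {fps:.1f} fps)", (10, 30),
                       cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

       cv2.imshow("preview", frame)
       if cv2.waitKey(1) & 0xFF == ord("q"):
           break

   cap.release()
   cv2.destroyAllWindows()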
.. _ACRMilestone7:
Milestone #7, Compiling the model, generating artifacts, and building the pre-processing part of the GStreamer pipeline (July 22)
------------------------------------------------------------------------------------------------------------------------------------------
- Compile the chosen model and generate artifacts for the TFLite runtime (see the sketch below).
- Build the pre-processing part of the GStreamer pipeline:

  - Develop the pre-processing module to prepare video frames for inference.
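
A sketch of the conversion step and a quick sanity check with the TFLite interpreter is given below; generating the TIDL acceleration artifacts themselves follows the edgeai-tidl-tools documentation and is not reproduced here.

.. code-block:: python

   import numpy as np
   import tensorflow as tf

   # 1. Convert the chosen Keras model to a .tflite flatbuffer.
   model = tf.keras.models.load_model("best_model.keras")    # assumed from Milestone #6
   converter = tf.lite.TFLiteConverter.from_keras_model(model)
   converter.optimizations = [tf.lite.Optimize.DEFAULT]      # post-training quantization
   with open("commercial_detector.tflite", "wb") as f:
       f.write(converter.convert())

   # 2. Sanity-check the flatbuffer; on the BeagleBone AI-64 the TIDL
   #    delegate from edgeai-tidl-tools would be plugged in at this point.
   interpreter = tf.lite.Interpreter(model_path="commercial_detector.tflite")
   interpreter.allocate_tensors()
   inp = interpreter.get_input_details()[0]
   out = interpreter.get_output_details()[0]
   interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
   interpreter.invoke()
   print("output shape:", interpreter.get_tensor(out["index"]).shape)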
.. _ACRMilestone8:
Milestone #8, Building the compute pipeline using NNStreamer (July 29)
----------------------------------------------------------------------------------------------------------------------------
- Build the compute pipeline using NNStreamer (see the sketch below):

  - Implement NNStreamer to run inference on videos using the compiled model.
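
A rough single-frame NNStreamer pipeline is sketched below; the element names and properties follow the NNStreamer documentation, while the caps, the model path, and the frame aggregation needed for clip-based models (e.g. ``tensor_aggregator``) are assumptions to be refined during this milestone.

.. code-block:: python

   import gi
   gi.require_version("Gst", "1.0")
   from gi.repository import Gst

   Gst.init(None)

   # Decode video, convert frames to tensors, run the compiled .tflite
   # model with tensor_filter, and receive the scores in tensor_sink.
   PIPELINE = (
       "filesrc location=sample_broadcast.mp4 ! decodebin ! videoconvert ! videoscale ! "
       "video/x-raw,format=RGB,width=224,height=224 ! "
       "tensor_converter ! "
       "tensor_filter framework=tensorflow-lite model=commercial_detector.tflite ! "
       "tensor_sink name=results"
   )

   pipeline = Gst.parse_launch(PIPELINE)

   def on_new_data(sink, buffer):
       # Classification scores arrive here; the post-processing stage
       # (Milestone #9) decides what to do with them.
       print("got tensor buffer of", buffer.get_size(), "bytes")

   pipeline.get_by_name("results").connect("new-data", on_new_data)
   pipeline.set_state(Gst.State.PLAYING)
   bus = pipeline.get_bus()
   bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                          Gst.MessageType.EOS | Gst.MessageType.ERROR)
   pipeline.set_state(Gst.State.NULL)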
.. _ACRMilestone9:
Milestone #9, Building the post-processing part of GStreamer pipeline (August 5)
----------------------------------------------------------------------------------------------------------------------------
- Build the post-processing part of the GStreamer pipeline (one possible shape is sketched below):

  - Develop the post-processing module to perform actions based on the classification results.
  - Implement replacement or obscuring of commercial segments along with audio substitution.
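
One possible shape of the replacement logic is sketched below with a GStreamer ``input-selector``; the sources, the alternative content, and the wiring to the classifier callback are placeholders, and a second selector would handle the audio substitution in the same way.

.. code-block:: python

   import gi
   gi.require_version("Gst", "1.0")
   from gi.repository import Gst

   Gst.init(None)

   # The selector chooses between the live programme (sink_0) and an
   # alternative/obscuring source (sink_1); both sources are placeholders.
   pipeline = Gst.parse_launch(
       "input-selector name=sel ! videoconvert ! autovideosink "
       "uridecodebin uri=file:///tmp/broadcast.mp4 ! videoconvert ! sel.sink_0 "
       "videotestsrc pattern=smpte is-live=true ! videoconvert ! sel.sink_1"
   )
   selector = pipeline.get_by_name("sel")

   def on_classification(is_commercial: bool):
       """Called with each prediction coming out of the Milestone #8 pipeline."""
       # The request pads already exist because parse_launch linked them above.
       pad = selector.get_static_pad("sink_1" if is_commercial else "sink_0")
       selector.set_property("active-pad", pad)    # replace/obscure commercials

   pipeline.set_state(Gst.State.PLAYING)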
.. _ACRMilestone10:
Milestone #10, Enhancing real-time performance (August 12)
----------------------------------------------------------------------------------------------------------------------------
- Enhancing real-time performance:

  - Optimize the GStreamer pipeline for real-time performance using the native hardware accelerators of the BeagleBone AI-64.
  - Ensure smooth and efficient processing of video streams.
.. _ACRFinal-project-video:
Submit final project video, submit final work to GSoC site and complete final mentor evaluation (August 19)
----------------------------------------------------------------------------------------------------------------------------
- Submit final project video, submit final work to GSoC site and complete final mentor evaluation.
Final Submission (Aug 24th)
----------------------------------------------------------------------------
.. important::
**August 19 - 26 - 18:00 UTC:** Final week: GSoC contributors submit their final work
product and their final mentor evaluation (standard coding period)
**August 26 - September 2 - 18:00 UTC:** Mentors submit final GSoC contributor
evaluations (standard coding period)
Initial results (September 3)
----------------------------------------------------------------------------------------------------------------------------
.. important::
**September 3 - November 4:** GSoC contributors with extended timelines continue coding
**November 4 - 18:00 UTC:** Final date for all GSoC contributors to submit their final work product and final evaluation
**November 11 - 18:00 UTC:** Final date for mentors to submit evaluations for GSoC contributor projects with extended deadline
Experience and approach
***********************
This project requires prior experience with machine learning, multimedia processing and embedded systems.
- As a good starting point for this project, I built a `Sports Video Classification model <https://github.com/AryanNanda17/VideoProcessing-Based-on-Video-classifcation>`_ and did **video processing based on video classification** on it using OpenCV (video processing based on video classification is an important part of this project). `Demo <https://youtu.be/hoKE2dr2nT4>`_ :fas:`external-link`
- We will be building a pure C++ GStreamer pipeline from input to output, so experience with C++ codebases and build systems is required.
- Relevant contribution - `#123 <https://github.com/SRA-VJTI/Pixels_Seminar/pull/123>`_ (OpenCV/C++)
- I have previously worked on the project `GestureSense <https://github.com/AryanNanda17/GestureSense/blob/master/GestureDetection/BgEliminationAndMotionDetection.py>`_, in which I did image processing based on image classification using OpenCV/Python.
- I have past experience with the ESP32 microcontroller and previously worked on the project `Multi-Code-Esp <https://github.com/AryanNanda17/multi_code_esp>`_, in which I built a multi-code ESP component.
- Experience in open source:

  Contributed enhancements to the PyMC repository:

  - `#7132 <https://github.com/pymc-devs/pymc/pull/7132>`_ (merged)
  - `#7125 <https://github.com/pymc-devs/pymc/pull/7125>`_ (merged)

  Resolved one issue in the OpenCV repository (improved documentation):

  - `#22177 <https://github.com/opencv/opencv/issues/22177>`_ (merged)
- Contributions in `openbeagle.org/gsoc <https://openbeagle.org/gsoc/gsoc.beagleboard.io>`_

  - Resolved PDF page-break issue - `#33 <https://openbeagle.org/gsoc/gsoc.beagleboard.io/-/merge_requests/33>`_ (merged)
  - Added a new idea - `#25 <https://openbeagle.org/gsoc/gsoc.beagleboard.io/-/merge_requests/25>`_ (merged)
  - Improved documentation - `#23 <https://openbeagle.org/gsoc/gsoc.beagleboard.io/-/merge_requests/23#note_18477>`_ (merged)
Contingency
===========
- If I get stuck on my project and my mentor isn’t around, I will use the following resources:
- `MoViNets <https://www.tensorflow.org/hub/tutorials/movinet>`_
- `Video Visual Transformer <https://huggingface.co/docs/transformers/en/model_doc/vivit>`_
- `GStreamer Docs <https://gstreamer.freedesktop.org/>`_
- `BeagleBone AI-64 docs <https://docs.beagleboard.org/latest/boards/beaglebone/ai-64/01-introduction.html>`_
- `NNStreamer <https://nnstreamer.github.io/>`_
- Moreover, the BeagleBoard community is extremely helpful and active in resolving doubts, which makes it a great resource for project guidance and clarification.
- I intend to remain involved and provide ongoing support for this project beyond the duration of the GSoC timeline.
Benefit
========
Misc
====
- The pull request for cross-compilation: `#185 <https://github.com/jadonk/gsoc-application/pull/185>`_
- Relevant Coursework: `Neural Networks and Deep Learning <https://www.coursera.org/account/accomplishments/verify/LKHTEA9XRWML>`_, `Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization <https://www.coursera.org/account/accomplishments/verify/E52UFAHAY5UG>`_, `Convolutional Neural Networks <https://www.coursera.org/account/accomplishments/verify/9L4QL25AEL3L>`_
References
***********
1. YouTube: `YouTube-8M: A Large and Diverse Labeled Video Dataset <https://research.google.com/youtube8m/>`_
2. Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew Brown, Boqing Gong: `MoViNets: Mobile Video Networks for Efficient Video Recognition. <https://arxiv.org/pdf/2103.11511.pdf>`_
3. Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell: `Long-term Recurrent Convolutional Networks for Visual Recognition and Description <https://arxiv.org/pdf/1411.4389.pdf>`_
4. Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid: `ViViT: A Video Vision Transformer <https://arxiv.org/pdf/2103.15691.pdf>`_
5. Nilesh M. Patil, Dr. Milind U. Nemade: `Content-Based Audio Classification and Retrieval: A Novel Approach <https://www.academia.edu/40346310/Content_Based_Audio_Classification_and_Retrieval_A_Novel_Approach>`_
6. Igor Bieda, Anton Kisil, Taras Panchenko: `An Approach to Scene Change Detection <https://ieeexplore.ieee.org/document/9660887>`_
7. Texas Instruments: `edgeai-tidl-tools <https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/examples/osrt_python/README.md>`_
8. BeagleBone AI-64: `data-flows in edge_ai_apps <https://docs.beagleboard.org/latest/boards/beaglebone/ai-64/edge_ai_apps/data_flows.html>`_
.. _gsoc-2024-proposal-ijc:
Differentiable Logic for Interactive Systems and Generative Music - Ian Clester
###############################################################################
Introduction
*************
Summary links
=============
- **Contributor:** `Ian Clester <https://forum.beagleboard.org/u/ijc>`_
- **Mentors:** `Jack Armitage <https://forum.beagleboard.org/u/jarm>`_, `Chris Kiefer <https://forum.beagleboard.org/u/luuma>`_
- **GSoC:** `Google Summer of Code <https://summerofcode.withgoogle.com/archive/2023/projects/iTfGBkDk>`_
- **Weekly Updates:** `Forum Thread <https://forum.beagleboard.org/t/weekly-progress-report-differentiable-logic-for-interactive-systems-and-generative-music/38486>`_
- **Repository**: `embedded-difflogic <https://openbeagle.org/ijc/embedded-difflogic>`_
Status
=======
This project has been accepted for GSoC 2024.
About
=====
- **Forum:** :fab:`discourse` `u/ijc (Ian Clester) <https://forum.beagleboard.org/u/ijc>`_
- **OpenBeagle:** :fab:`gitlab` `openbeagle.org/ijc <https://openbeagle.org/ijc>`_
- **Discord:** :fas:`comments` `bbb.io/gsocchat <https://bbb.io/gsocchat>`_
- **Github:** :fab:`github` `ijc8 (Ian Clester) <https://github.com/ijc8>`_
- **School:** :fas:`school` Georgia Institute of Technology
- **Country:** :fas:`flag` United States
- **Primary language:** :fas:`language` English
- **Typical work hours:** :fas:`clock` 9AM-6PM US Eastern
- **Previous GSoC participation:** :fab:`google` `Better Faust on the Web (2023) <https://summerofcode.withgoogle.com/archive/2023/projects/L6oI4LhW>`_
Project
********
**Project name:** Differentiable Logic for Interactive Systems and Generative Music
Description
============
The general aim of this project is to enable the development of models that are suitably efficient for use in real-time interactive applications on embedded systems (particularly the BeagleBone-based Bela).
At the project's core is difflogic [1]_, a recent technique that employs a sparsely-connected network composed of basic logic gates (rather than densely-connected neurons with complex activation functions) to obtain small models and fast inference.
Thus, the first and foremost goal of the project is to enable a convenient workflow for developing difflogic models and running them on the Bela. The expected use case is developing and training models on a larger machine (e.g. a laptop, desktop, or server), followed by exporting the model to C and cross-compiling it for the BeagleBone - either the main CPU (ARM Cortex-A8) or the PRUs.
To support this workflow, I will develop wrappers for exporting compiled difflogic models for use in the various languages supported on Bela (C++, Pure Data, SuperCollider, Csound).
These wrappers will likely take inspiration from other projects that bring machine learning into computer music environments, such as `nn~ <https://github.com/acids-ircam/nn_tilde>`_ and `FluCoMa <https://www.flucoma.org/>`_.
This first goal, along with profiling and benchmarking the performance of difflogic models on both the main CPU and the PRUs, constitutes roughly the first half of the project.
The other, more exploratory half of the project consists of building out integrations and applications of difflogic for the rapid development of useful audio models.
To that end, I intend to explore the possibilities of combining difflogic networks with techniques such as DDSP (differentiable digital signal processing) [2]_, possibly also leveraging Faust auto-differentiation.
I also intend to investigate the feasibility of "porting" well-known ML architectures such as VAEs to difflogic networks, and of training difflogic networks to approximate the behavior of existing neural networks (i.e. knowledge distillation).
Audio models such as RAVE [3]_, PESTO [4]_, and Whisper [5]_ may be of particular interest.
Furthermore, I will explore opportunities to combine difflogic networks with other cheap, effective techniques like the $Q recognizer [6]_ for gestural control, linear predictive coding for audio analysis & resynthesis, and toolkits such as `RapidLib <https://github.com/jarmitage/RapidLibBela>`_.
Such combinations may be particularly useful for interactive machine learning (as in Wekinator [7]_), should fine-tuning difflogic models on-device prove too costly.
In this phase of the project, I will develop example applications involving sound analysis, classification, and synthesis, and experiment with interactive machine learning.
Finally, I intend to dedicate some time to a specific creative application: generating networks of logic gates to approximate particular sounds and exploring the space of such sound-generating networks.
This application is inspired by bytebeat [8]_, a practice which involves writing short expressions that describe audio as a function of time, generating music sample-by-sample.
Typically, these expressions involve many bit-twiddling operations, consisting primarily of logic gates (bitwise AND, OR, XOR, NOT) and shifts --- a fact that suggests a remarkably good fit for difflogic, wherein models consist of networks of gates.
Other inspirations include work on sound matching: reproducing a given sound or family of sounds by estimating synthesizer parameters [9]_, generating patches [10]_, or training models [11]_.
In this vein, I will attempt to train difflogic gates to reproduce particular sounds, treating the entire network as a bytebeat-style function of time (sample index) that outputs samples.
Thanks to the tricks difflogic employs to train a network of discrete gates, this approach will enable sound matching via gradient descent and backpropagation (as in e.g. DDSP) rather than evolutionary methods, while still ultimately generating a discrete function.
Lastly, I will build an interactive application to explore the space of sound-generating networks (e.g. by mutating a network, or morphing between two networks) and visualize the execution of logic gate networks.
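
As a concrete starting point for this workflow, the sketch below trains a tiny difflogic network in PyTorch using the ``LogicLayer`` and ``GroupSum`` modules from the difflogic package's examples; the layer sizes and the random data are placeholders, and the constructor arguments (e.g. device/implementation selection) may need adjusting per the difflogic README depending on whether CUDA is available.

.. code-block:: python

   import torch
   from difflogic import LogicLayer, GroupSum

   # Tiny difflogic classifier: binary input features in, two class scores out.
   # device="cpu" assumes the package's pure-Python/CPU implementation.
   model = torch.nn.Sequential(
       LogicLayer(in_dim=64, out_dim=256, device="cpu"),
       LogicLayer(in_dim=256, out_dim=256, device="cpu"),
       LogicLayer(in_dim=256, out_dim=64, device="cpu"),
       GroupSum(k=2, tau=10),
   )

   optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
   loss_fn = torch.nn.CrossEntropyLoss()

   x = (torch.rand(256, 64) > 0.5).float()     # placeholder binary features
   y = torch.randint(0, 2, (256,))             # placeholder labels

   for _ in range(100):                        # ordinary PyTorch training loop
       optimizer.zero_grad()
       loss = loss_fn(model(x), y)
       loss.backward()
       optimizer.step()

After training, the discretized network can be exported to plain C (the difflogic repository provides a compiled-model utility for this) and cross-compiled for the BeagleBone's Cortex-A8 or the PRUs, which is the workflow the wrappers described above are meant to streamline.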
Software
=========
- C
- C++
- Python
- PyTorch
- difflogic
- dasp
- Faust
- Linux
Hardware
========
- Bela
- BeagleBone Black
- Bela Cape
- Microphone
- Speaker
- OLED screen
Timeline
********
.. note:: This timeline is based on the `official GSoC timeline <https://developers.google.com/open-source/gsoc/timeline>`_
Timeline summary
=================
.. table::
+------------------------+----------------------------------------------------------------------------------------------------+
| Date | Activity |
+========================+====================================================================================================+
| February 26 | Connect with possible mentors and request review on first draft |
+------------------------+----------------------------------------------------------------------------------------------------+
| March 4 | Complete prerequisites, verify value to community and request review on second draft |
+------------------------+----------------------------------------------------------------------------------------------------+
| March 11 | Finalized timeline and request review on final draft |
+------------------------+----------------------------------------------------------------------------------------------------+
| March 21 | Submit application |
+------------------------+----------------------------------------------------------------------------------------------------+
| May 1 | Start bonding |
+------------------------+----------------------------------------------------------------------------------------------------+
| May 27 | Start coding and introductory video |
+------------------------+----------------------------------------------------------------------------------------------------+
| June 3 | Release introductory video and complete milestone #1 |
+------------------------+----------------------------------------------------------------------------------------------------+
| June 10 | Complete milestone #2 |
+------------------------+----------------------------------------------------------------------------------------------------+
| June 17 | Complete milestone #3 |
+------------------------+----------------------------------------------------------------------------------------------------+
| June 24 | Complete milestone #4 |
+------------------------+----------------------------------------------------------------------------------------------------+
| July 1 | Complete milestone #5 |
+------------------------+----------------------------------------------------------------------------------------------------+
| July 8 | Submit midterm evaluations |
+------------------------+----------------------------------------------------------------------------------------------------+
| July 15 | Complete milestone #6 |
+------------------------+----------------------------------------------------------------------------------------------------+
| July 22 | Complete milestone #7 |
+------------------------+----------------------------------------------------------------------------------------------------+
| July 29 | Complete milestone #8 |
+------------------------+----------------------------------------------------------------------------------------------------+
| August 5 | Complete milestone #9 |
+------------------------+----------------------------------------------------------------------------------------------------+
| August 12 | Complete milestone #10 |
+------------------------+----------------------------------------------------------------------------------------------------+
| August 19 | Submit final project video, submit final work to GSoC site and complete final mentor evaluation |
+------------------------+----------------------------------------------------------------------------------------------------+
Timeline detailed
=================
Community Bonding Period (May 1st - May 26th)
----------------------------------------------------------------------------
GSoC contributors get to know mentors, read documentation, get up to speed to begin working on their projects
Coding begins (May 27th)
----------------------------------------------------------------------------
Milestone #1, Introductory YouTube video (June 3rd)
----------------------------------------------------------------------------
- Setup development environment
- Train trivial difflogic network on laptop & run generated C on Bela (main CPU)
Milestone #2 (June 10th)
----------------------------------------------------------------------------
- Run difflogic network on PRU
- Perform feature extraction (FFT, MFCCs) on PRU
Milestone #3 (June 17th)
----------------------------------------------------------------------------
- Build wrappers to simplify use of difflogic networks in Bela projects
- C++ (namespace & wrapper around difflogic-generated C)
- SuperCollider (UGen)
Milestone #4 (June 24th)
----------------------------------------------------------------------------
- Build wrappers to simplify use of difflogic networks in Bela projects
- Pure Data (external)
- Csound (UDO)
Milestone #5 (July 1st)
----------------------------------------------------------------------------
- Explore feasibility of combining difflogic with DDSP techniques (via dasp and possibly Faust auto-differentiation)
- Use difflogic network to control synthesizer parameters
Submit midterm evaluations (July 8th)
----------------------------------------------------------------------------
.. important::
**July 12 - 18:00 UTC:** Midterm evaluation deadline (standard coding period)
Milestone #6 (July 15th)
----------------------------------------------------------------------------
- Investigate feasibility of interactive machine learning (e.g. fine-tuning) with difflogic networks
- Combine difflogic networks with complementary, cheap techniques (e.g. LPC, template matching via $Q, RapidLib)
Milestone #7 (July 22nd)
----------------------------------------------------------------------------
- Work on example applications
- Classify short mouth sounds for interactive system control (à la `parrot.py <https://github.com/chaosparrot/parrot.py>`_)
- Perform real-time pitch estimation (à la PESTO)
Milestone #8 (July 29th)
----------------------------------------------------------------------------
- Experiment with implementing popular architectures (e.g. VAEs, as in RAVE) as difflogic networks
- Experiment with difflogic knowledge distillation: training a difflogic network to approximate the behavior of a pre-trained, conventional neural network (student/teacher)
Milestone #9 (Aug 5th)
----------------------------------------------------------------------------
- Experiment with training difflogic networks for sound reconstruction
- Bytebeat-inspired: feed increasing timestamps to network, get subsequent audio samples out
Milestone #10 (Aug 12th)
----------------------------------------------------------------------------
- Creative application: Interactive exploration of space of difflogic sound reconstruction models
- "Glitch" - random perturbations of network (mutate gates & connections)
- "Morph" - interpolate (in terms of tree edit-distance) between different sound-generating networks
- Visualize difflogic networks & their execution
Final YouTube video (Aug 19th)
----------------------------------------------------------------------------
Submit final project video, submit final work to GSoC site
and complete final mentor evaluation
Final Submission (Aug 24th)
----------------------------------------------------------------------------
.. important::
**August 19 - 26 - 18:00 UTC:** Final week: GSoC contributors submit their final work
product and their final mentor evaluation (standard coding period)
**August 26 - September 2 - 18:00 UTC:** Mentors submit final GSoC contributor
evaluations (standard coding period)
Initial results (September 3)
----------------------------------------------------------------------------
.. important::
**September 3 - November 4:** GSoC contributors with extended timelines continue coding
**November 4 - 18:00 UTC:** Final date for all GSoC contributors to submit their final work product and final evaluation
**November 11 - 18:00 UTC:** Final date for mentors to submit evaluations for GSoC contributor projects with extended deadline
Experience and approach
***********************
I have extensive experience with embedded systems and real-time audio.
As an undergraduate, I worked on embedded systems during internships at Astranis and Google.
For a final class project, I developed a multi-effects pedal with a configurable signal chain in C using fixed-point arithmetic on the `Cypress PSoC 5 <https://www.infineon.com/cms/en/product/microcontroller/32-bit-psoc-arm-cortex-microcontroller/32-bit-psoc-5-lp-arm-cortex-m3/>`_ (an ARM-based system-on-a-chip with configurable digital and analog blocks).
My `master's work <https://dspace.mit.edu/handle/1721.1/129201>`_ involved localizing RFID tags using software-defined radios with framerates sufficient for interactive systems.
Currently, I am a teaching assistant for a class on Audio Software Engineering (in Rust, with a focus on real-time audio software), in which I have been responsible for preparing much of the material and lectures.
I have worked with a variety of microcontrollers and single-board computers, from writing assembly on the Intel 8051, to C++ on Arduinos and ESP32s, to Python and JS on Raspberry Pis.
I have also employed machine learning techniques to build interactive systems.
In a graduate course on multimodal user interaction, I gained experience with classic machine learning techniques, and employed cheap techniques for gesture recognition in a `tablet-based musical sketchpad <https://github.com/ijc8/notepad>`_.
In the meantime, I have been following developments in machine learning for audio (particularly those that are feasible to run locally, especially sans GPU), and I have experimented with models such as RAVE and Whisper (using the latter for a recent interactive audiovisual `hackathon project <https://github.com/ijc8/hackathon-2024>`_).
Much of my graduate work has focused on generative music and computational representations of music.
My recent work on `ScoreCard <https://ijc8.me/s>`_ has put an extreme emphasis on fitting music-generating programs (typically written in C) into efficient, self-contained packages that are small enough to store in a QR code (\< 3kB).
Previous projects such as `Blocks <https://ijc8.me/blocks>`_ (an audiovisual installation) and `kilobeat <https://ijc8.me/kilobeat>`_ (a collaborative livecoding tool) have probed the musical potential of extremely short fragments of code (bytebeat & floatbeat expressions).
These projects also explore methods of visualizing musical programs, either in terms of their output or their execution.
More information about my work is available on `my website <https://ijc8.me>`_ and `GitHub <https://github.com/ijc8>`_.
I am particularly interested in difflogic because it occupies an intersection between lightweight machine learning techniques (cheaper is better!) and compact representations of musical models (less is more!), and I am strongly motivated to see what it can do.
Contingency
===========
If I get stuck on something related to BeagleBoard or Bela development, I plan to take advantage of resources within those communities (such as documentation, forums, and Discord servers).
If I get stuck on something related to ML or DSP, I plan to refer back to reference texts and the papers and code of related work (DDSP, RAVE, PESTO, etc.), and I may reach out to colleagues within the ML space (such as those in the Music Information Retrieval lab within my department) for advice.
If I get stuck on something related to music or design, I plan to take a break and go on a walk. :-)
Benefit
========
The first half of this project will provide a straightforward means to develop models with difflogic and run them on embedded systems such as BeagleBoards and particularly Bela. (The wrappers for Bela's supported languages may also prove generally useful outside of embedded contexts.)
Making it easier for practitioners to use difflogic models in creative applications will, in turn, aid in the development of NIMEs and DMIs that can benefit from the small size and fast inference (and corresponding portability and low latency) of difflogic networks.
The second half of this project, depending on the results of my explorations, may demonstrate useful ways to combine difflogic with other ML & DSP techniques, and provide some useful and interesting audio-focused applications to serve as effective demonstrations of the possibilities for ML on the BeagleBoard and possible starting points for others.
Misc
====
`Here <https://github.com/jadonk/gsoc-application/pull/194>`_ is my pull request demonstrating cross-compilation and version control.
References
==========
.. [1] Petersen, F. et al. 2022. Deep Differentiable Logic Gate Networks. Proceedings of the 36th Conference on Neural Information Processing Systems (Oct. 2022).
.. [2] Engel, J. et al. 2020. DDSP: Differentiable Digital Signal Processing. Proceedings of the International Conference on Learning Representations (2020).
.. [3] Caillon, A. and Esling, P. 2021. RAVE: A variational autoencoder for fast and high-quality neural audio synthesis. arXiv.
.. [4] Riou, A. et al. 2023. PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective. Proceedings of the 24th International Society for Music Information Retrieval Conference (Sep. 2023).
.. [5] Radford, A. et al. 2023. Robust Speech Recognition via Large-Scale Weak Supervision. Proceedings of the 40th International Conference on Machine Learning (2023).
.. [6] Vatavu, R.-D. et al. 2018. $Q: a super-quick, articulation-invariant stroke-gesture recognizer for low-resource devices. Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services (New York, NY, USA, Sep. 2018), 1–12.
.. [7] Fiebrink, R. et al. 2009. A Meta-Instrument for Interactive, On-the-fly Machine Learning. Proceedings of the International Conference on New Interfaces for Musical Expression (2009), 280–285.
.. [8] Heikkilä, V.-M. 2011. Discovering novel computer music techniques by exploring the space of short computer programs. arXiv.
.. [9] Yee-King, M. and Roth, M. 2008. Synthbot: An unsupervised software synthesizer programmer. ICMC (2008).
.. [10] Macret, M. and Pasquier, P. 2014. Automatic design of sound synthesizers as pure data patches using coevolutionary mixed-typed cartesian genetic programming. Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation (New York, NY, USA, Jul. 2014), 309–316.
.. [11] Caspe, F. et al. 2022. DDX7: Differentiable FM Synthesis of Musical Instrument Sounds. Proceedings of the 23rd International Society for Music Information Retrieval Conference. (2022).