Compare revisions: gsoc.beagleboard.io

Commits on Source (56), showing 724 additions and 157 deletions
# The Docker image that will be used to build your app
image: beagle/sphinx-build-env:latest
image: registry.git.beagleboard.org/docs/sphinx-build-env:latest
pages:
tags:
- docker-amd64
before_script:
- source ./venv-build-env.sh
script:
- "./gitlab-build.sh"
artifacts:
......
.. _C:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/c.html
.. _Assembly:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/assembly.html
.. _Verilog:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/verilog.html
.. _Zephyr:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/zephyr.html
.. _Linux:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/linux.html
.. _device-tree:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/device-tree.html
.. _FPGA:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/fpga.html
.. _basic wiring:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/basic-wiring.html
.. _motors:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/motors.html
.. _embedded serial interfaces:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/embedded-serial.html
.. _OpenBeagle CI:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/openbeagle-ci.html
.. _verification:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/verification.html
.. _wireless communications:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/wireless-communications.html
.. _Buildroot:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/buildroot.html
.. _RISC-V ISA:
https://docs.beagleboard.cc/docs/latest/intro/beagle101/riscv.html
\ No newline at end of file
@@ -17,15 +17,24 @@ from sphinx.application import Sphinx
sys.path.append(str(Path(".").resolve()))
project = 'gsoc.beagleboard.io'
copyright = '2024, BeagleBoard.org'
copyright = '2025, BeagleBoard.org'
author = 'BeagleBoard.org'
# Add epilog details to rst_epilog
rst_epilog = ""
rst_epilog_path = "_static/epilog/"
for (dirpath, dirnames, filenames) in os.walk(rst_epilog_path):
    for filename in filenames:
        # os.path.join handles subdirectories and missing separators
        with open(os.path.join(dirpath, filename)) as f:
            rst_epilog += f.read()
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
extensions = [
"sphinx_design",
"sphinxcontrib.youtube",
"sphinxcontrib.images",
"sphinx_copybutton"
]
@@ -122,8 +131,8 @@ html_theme_options = {
"use_edit_page_button": True,
"show_toc_level": 1,
"navbar_align": "right",
"show_nav_level": 2,
"announcement": "Welcome to the new site for BeagleBoard.org GSoC 2024 projects!",
"show_nav_level": 1,
"announcement": "Welcome to the site for BeagleBoard.org GSoC 2025 projects!",
# "show_version_warning_banner": True,
"navbar_center": ["navbar-nav"],
"navbar_start": ["navbar-logo"],
@@ -181,6 +190,7 @@ latex_elements = {
),
}
sd_fontawesome_latex = True
latex_engine = "xelatex"
latex_logo = str("_static/images/logo-latex.pdf")
latex_documents = []
......
@@ -12,14 +12,14 @@ Guides
Spend your summer break writing code and learning about open source development while earning money!
Accepted contributors work with a mentor and become a part of the open source community. Many become lifetime
open source developers! The 2024 contributor application window will be open from
`March 18th 2024 <https://developers.google.com/open-source/gsoc/timeline#march_18_-_1800_utc>`_ to
`April 2nd 2024 <https://developers.google.com/open-source/gsoc/timeline#april_2_-_1800_utc>`_!
open source developers! The 2025 contributor application window will be open from
`March 24 2025 <https://opensource.googleblog.com/2025/01/google-summer-of-code-2025-is-here.html>`_ to
`April 8 2025 <https://opensource.googleblog.com/2025/01/google-summer-of-code-2025-is-here.html>`_!
But don't wait for then to engage! Come to our `Discord <https://bbb.io/gsocchat>`_ and
`Forum <https://bbb.io/gsocml>`_ to share ideas today.
This section includes guides for :ref:`contributors <gsoc-contributor-guide>` & :ref:`mentors <gsoc-mentor-guide>` who want to participate
in GSoC 2024 with `BeagleBoard.org <www.beagleboard.org>`_. It's highly recommended to check `GSoC Frequently Asked Questions
in GSoC 2025 with `BeagleBoard.org <https://www.beagleboard.org>`_. It's highly recommended to check the `GSoC Frequently Asked Questions
<https://developers.google.com/open-source/gsoc/faq>`_. For anyone who just wants to contribute to this site, we also have
a step-by-step :ref:`contribution guide <gsoc-site-editing-guide>`.
......
@@ -28,29 +28,53 @@ Ideas
| :bdg-info:`Low complexity` | :bdg-info-line:`90 hours` |
+------------------------------------+-------------------------------+
.. card:: Low-latency I/O RISC-V CPU core in FPGA fabric
.. tip::
Below are the latest project ideas. You can also check out our :ref:`gsoc-old-ideas` and :ref:`Past_Projects` for inspiration.
:fas:`microchip;pst-color-primary` FPGA gateware improvements :bdg-success:`Medium complexity` :bdg-success-line:`175 hours`
.. card:: Upstream Greybus module for Zephyr
^^^^
:fas:`timeline;pst-color-secondary` RTOS/microkernel improvements :bdg-success:`Medium complexity` :bdg-danger-line:`350 hours`
BeagleV-Fire features RISC-V 64-bit CPU cores and FPGA fabric. In that FPGA fabric, we'd like to
implement a RISC-V 32-bit CPU core with operations optimized for low-latency GPIO. This is similar
to the programmable real-time unit (PRU) RISC cores popularized on BeagleBone Black.
^^^^
| **Goal:** RISC-V-based CPU on BeagleV-Fire FPGA fabric with GPIO
| **Hardware Skills:** Verilog, Verification, FPGA
| **Software Skills:** RISC-V ISA, assembly, `Linux`_
| **Possible Mentors:** `Cyril Jean <https://forum.beagleboard.org/u/vauban>`_, `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_
Greybus support for Zephyr is currently an out-of-tree module, which makes it hard to maintain and nearly impossible for others to test and contribute to.
The goal of this project is to bring the Greybus and BeagleConnect Technology ecosystem to a point where it is easier to test and maintain. It can be considered a continuation of the Replace GBridge project from GSoC 2023.
| **Goal:** Add testing, cleanup mikroBUS support, submit upstream and respond to feedback.
| **Hardware Skills:** basic wiring
| **Software Skills:** device-tree, C, ZephyrRTOS, Linux, TCP.
| **Possible Mentors:** `Ayush Singh <https://forum.beagleboard.org/u/ayush1325>`_, `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_
++++
.. button-link:: https://forum.beagleboard.org/t/low-latency-risc-v-i-o-cpu-core/37156
.. button-link:: https://forum.beagleboard.org/t/upstream-greybus-module-for-zephyr/41170
:color: danger
:expand:
:fab:`discourse;pst-color-light` Discuss on forum
.. card:: A Conversational AI Assistant for BeagleBoard using RAG and Fine-tuning
:fas:`brain;pst-color-secondary` Deep Learning :bdg-success:`Medium complexity` :bdg-success-line:`175 hours`
^^^^
BeagleBoard currently lacks an AI-powered assistant to help users troubleshoot errors. This project aims to address that need while also streamlining the onboarding process for new contributors, enabling them to get started more quickly.
| **Goal:** Develop a domain-specific chatbot for BeagleBoard using a combination of RAG and fine-tuning of an open-source LLM (like Llama 3, Mixtral, or Gemma). This chatbot will assist users with troubleshooting, provide information about BeagleBoard products, and streamline the onboarding process for new contributors.
| **Hardware Skills:** Ability to test applications on BeagleBone AI-64/BeagleY-AI and optimize for performance using quantization techniques.
| **Software Skills:** Python, RAG, scraping techniques, fine-tuning LLMs, Gradio, Hugging Face Inference Endpoints, NLTK/spaCy, Git
| **Possible Mentors:** `Aryan Nanda <https://forum.beagleboard.org/u/aryan_nanda/>`_
++++
.. button-link:: https://forum.beagleboard.org/t/beaglemind/40806
:color: danger
:expand:
:fab:`discourse;pst-color-light` Discuss on forum
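For a rough illustration of the retrieval step in such a RAG pipeline, the sketch below ranks a few placeholder documentation snippets against a user question with TF-IDF before handing them to an LLM; the snippets, names, and prompt format are illustrative assumptions, not part of the project design.

.. code-block:: python

   from sklearn.feature_extraction.text import TfidfVectorizer
   from sklearn.metrics.pairwise import cosine_similarity

   # Placeholder documentation snippets; a real system would index the
   # scraped BeagleBoard docs and forum posts instead.
   docs = [
       "BeagleY-AI boots from a microSD card flashed with the latest image.",
       "Use bb-imager or Etcher to flash BeagleBoard images to a card.",
       "BeagleBone AI-64 exposes mikroBUS, PWM, ADC, UART, I2C and SPI.",
   ]
   vectorizer = TfidfVectorizer().fit(docs)
   doc_vecs = vectorizer.transform(docs)

   def retrieve(question: str, k: int = 2) -> list[str]:
       # Rank documents by cosine similarity to the question.
       q_vec = vectorizer.transform([question])
       scores = cosine_similarity(q_vec, doc_vecs)[0]
       return [docs[i] for i in scores.argsort()[::-1][:k]]

   context = "\n".join(retrieve("How do I flash an image?"))
   prompt = f"Answer using this context:\n{context}\nQ: How do I flash an image?"
   # `prompt` would then go to the chosen LLM (Llama 3, Mixtral, Gemma, ...).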
.. card:: Update beagle-tester for mainline testing
@@ -63,8 +87,8 @@ Ideas
and device-tree overlays on various Beagle computers.
| **Goal:** Execution on Beagle test farm with over 30 mikroBUS boards testing all mikroBUS enabled cape interfaces (PWM, ADC, UART, I2C, SPI, GPIO and interrupt) performing weekly mainline Linux regression verification
| **Hardware Skills:** basic wiring, familiarity with embedded serial interfaces
| **Software Skills:** device-tree, `Linux`_, `C`_, continuous integration with GitLab, Buildroot
| **Hardware Skills:** `basic wiring`_, `embedded serial interfaces`_
| **Software Skills:** `device-tree`_, `Linux`_, `C`_, `OpenBeagle CI`_, `Buildroot`_
| **Possible Mentors:** `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_, `Anuj Deshpande <https://forum.beagleboard.org/u/Anuj_Deshpande>`_, `Dhruva Gole <https://forum.beagleboard.org/u/dhruvag2000>`_
++++
@@ -86,7 +110,7 @@ Ideas
acceptable upstream.
| **Goal:** Add functional gaps, submit upstream patches for these drivers and respond to feedback
| **Hardware Skills:** Familiarity with wireless communication
| **Hardware Skills:** `wireless communications`_
| **Software Skills:** `C`_, `Linux`_
| **Possible Mentors:** `Ayush Singh <https://forum.beagleboard.org/u/ayush1325>`_, `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_
@@ -108,7 +132,7 @@ Ideas
needs to be cleaned up. We can also work on support for Raspberry Pi if UCSD releases their Hat for it.
| **Goal:** Update librobotcontrol for Robotics Cape on BeagleBone AI, BeagleBone AI-64 and BeagleV-Fire
| **Hardware Skills:** Basic wiring, some DC motor familiarity
| **Hardware Skills:** `basic wiring`_, `motors`_
| **Software Skills:** `C`_, `Linux`_
| **Possible Mentors:** `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_, `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_
@@ -122,7 +146,7 @@ Ideas
.. card:: Upstream Zephyr Support on BBAI-64 R5
:fas:`timeline;pst-color-secondary` RTOS/microkernel improvements :bdg-success:`Medium complexity` :bdg-success-line:`350 hours`
:fas:`timeline;pst-color-secondary` RTOS/microkernel improvements :bdg-success:`Medium complexity` :bdg-danger-line:`350 hours`
^^^^
@@ -143,42 +167,37 @@ Ideas
:fab:`discourse;pst-color-light` Discuss on forum
.. card:: Enhanced Media Experience with AI-Powered Commercial Detection and Replacement
:fas:`brain;pst-color-secondary` Deep Learning :bdg-success:`Medium complexity` :bdg-success-line:`350 hours`
^^^^
Leveraging the capabilities of BeagleBoard’s powerful processing units, the project will focus on creating a real-time, efficient solution that enhances media consumption experiences by seamlessly integrating custom audio streams during commercial breaks.
| **Goal:** Build a deep learning model, training data set, training scripts, and a runtime for detection and modification of the video stream.
| **Hardware Skills:** Ability to capture and display video streams using `Beagleboard ai-64 <https://www.beagleboard.org/boards/beaglebone-ai-64>`_
| **Software Skills:** `Python <https://www.python.org/>`_, `TensorFlow <https://www.tensorflow.org/>`_, `TFlite <https://www.tensorflow.org/lite>`_, `Keras <https://www.tensorflow.org/guide/keras>`_, `GStreamer <https://gstreamer.freedesktop.org/>`_, `OpenCV <https://opencv.org/>`_
| **Possible Mentors:** `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_
++++
.. card:: ALSA linux driver for Bela cape
.. card:: Embedded differentiable logic gate networks for real-time interactive and creative applications
:fas:`wand-sparkles;pst-color-danger` Linux kernel audio :bdg-success:`Medium complexity` :bdg-success-line:`175 hours or longer`
:fas:`brain;pst-color-secondary` Creative AI :bdg-success:`Medium complexity` :bdg-success-line:`350 hours`
^^^^
This project seeks to explore the potential of creative embedded AI, specifically using `Differentiable Logic (DiffLogic) <https://github.com/Felix-Petersen/difflogic>`_, by creating a system that can perform tasks like machine listening, sensor processing, sound and gesture classification, and generative AI.
The Bela cape rev C comes with an ES9080Q 8-channel DAC and a TLV320AIC3104 stereo codec. The former has no support on Linux, while the latter is well supported. This project involves:
- writing a driver for the ES9080Q that interacts with the existing McASP and DMA drivers
- making the ES9080Q and TLV320AIC3104 show up as a single ALSA device with 2
inputs and 10 outputs, exploring whether this can happen via a simple wrapper
in the device tree or whether it requires a dedicated driver.
| **Goal:** Develop an embedded machine learning system on BeagleBone that leverages `Differentiable Logic (DiffLogic) <https://github.com/Felix-Petersen/difflogic>`_ for real-time interactive music creation and environment sensing.
| **Hardware Skills:** Audio and sensor IO with `Bela.io <http://bela.io>`_
| **Software Skills:** Machine learning, deep learning, BeagleBone Programmable Real Time Unit (PRU) programming (see `PRU Cookbook <https://docs.beagleboard.org/latest/books/pru-cookbook/index.html>`_).
| **Possible Mentors:** `Jack Armitage <https://forum.beagleboard.org/u/jarm>`_, `Chris Kiefer <https://forum.beagleboard.org/u/luuma>`_
| **Goal:** Upstream support of ES9080Q; simultaneous access to ES9080Q and TLV320AIC3104 via ALSA
| **Hardware Skills:** basic wiring, logic analyzer
| **Software Skills:** `C`_ or `Rust`_, `Linux`_
| **Possible Mentors:** `Giulio Moro <https://forum.beagleboard.org/u/giuliomoro>`_
| **Upstream Repository:** `Design files for the Bela cape Rev C <https://github.com/BelaPlatform/bela-hardware/tree/master/cape/bela_cape_rev_C3>`_
| **References:** `The Bela repo shows how to access these devices with a custom driver and DMA running on the PRU. <https://github.com/BelaPlatform/Bela/blob/master/pru/pru_rtaudio_irq.p>`_
++++
.. button-link:: https://forum.beagleboard.org/t/enhanced-media-experience-with-ai-powered-commercial-detection-and-replacement/37358
.. button-link:: https://forum.beagleboard.org/t/alsa-drivers-for-es9080q-and-tlv320aic3104/41532
:color: danger
:expand:
:fab:`discourse;pst-color-light` Discuss on forum
.. button-link:: https://forum.beagleboard.org/tag/gsoc-ideas
:color: danger
:expand:
@@ -186,13 +205,7 @@ Ideas
:fab:`discourse;pst-color-light` Visit our forum to see newer ideas being discussed!
.. toctree::
:hidden:
.. tip::
You can also check out our :ref:`gsoc-old-ideas` and :ref:`Past_Projects` for inspiration.
.. _Linux:
https://docs.beagleboard.org/latest/intro/beagle101/linux.html
.. _C:
https://jkridner.beagleboard.io/docs/latest/intro/beagle101/learning-c.html
old/index
@@ -17,13 +17,15 @@ into professional automation tasks, is strongly desired.
^^^^
- **Goal:** Complete implementation of librobotcontrol on BeagleBone AI/AI-64.
- **Hardware Skills:** Basic wiring
- **Software Skills:** C, Linux
- **Possible Mentors:** jkridner, lorforlinux
- **Expected Size of Project:** 350 hrs
- **Hardware Skills:** `basic wiring`_, `motors`_
- **Software Skills:** `C`_, `Linux`_
- **Possible Mentors:** `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_
- **Expected Size of Project:** 175 hrs
- **Rating:** Medium
- **Upstream Repository:** https://github.com/jadonk/librobotcontrol/tree/bbai
- **References:**
- **Upstream Repository:** `BeagleBoard.org / librobotcontrol · GitLab <https://openbeagle.org/beagleboard/librobotcontrol>`_
- **References:**
- `Robotics Control Library — BeagleBoard Documentation <https://docs.beagle.cc/projects/librobotcontrol/docs/index.html>`_
- `Robot Control Library: Main Page <https://old.beagleboard.org/static/librobotcontrol/>`_
- http://www.strawsondesign.com/docs/librobotcontrol/index.html
++++
......
@@ -14,6 +14,48 @@ For some background, be sure to check out `simplify embedded edge AI development
<https://e2e.ti.com/blogs_/b/process/posts/simplify-embedded-edge-ai-development>`_
post from TI.
.. card:: Enhanced Media Experience with AI-Powered Commercial Detection and Replacement
:fas:`brain;pst-color-secondary` Deep Learning :bdg-success:`Medium complexity` :bdg-danger-line:`350 hours`
^^^^
Leveraging the capabilities of BeagleBoard’s powerful processing units, the project will focus on creating a real-time, efficient solution that enhances media consumption experiences by seamlessly integrating custom audio streams during commercial breaks.
| **Goal:** Build a deep learning model, training data set, training scripts, and a runtime for detection and modification of the video stream.
| **Hardware Skills:** Ability to capture and display video streams using `BeagleBone AI-64 <https://www.beagleboard.org/boards/beaglebone-ai-64>`_
| **Software Skills:** `Python <https://www.python.org/>`_, `TensorFlow <https://www.tensorflow.org/>`_, `TFlite <https://www.tensorflow.org/lite>`_, `Keras <https://www.tensorflow.org/guide/keras>`_, `GStreamer <https://gstreamer.freedesktop.org/>`_, `OpenCV <https://opencv.org/>`_
| **Possible Mentors:** `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_
++++
.. button-link:: https://forum.beagleboard.org/t/enhanced-media-experience-with-ai-powered-commercial-detection-and-replacement/37358
:color: danger
:expand:
:fab:`discourse;pst-color-light` Discuss on forum
.. card:: Embedded differentiable logic gate networks for real-time interactive and creative applications
:fas:`brain;pst-color-secondary` Creative AI :bdg-success:`Medium complexity` :bdg-danger-line:`350 hours`
^^^^
This project seeks to explore the potential of creative embedded AI, specifically using `Differentiable Logic (DiffLogic) <https://github.com/Felix-Petersen/difflogic>`_, by creating a system that can perform tasks like machine listening, sensor processing, sound and gesture classification, and generative AI.
| **Goal:** Develop an embedded machine learning system on BeagleBone that leverages `Differentiable Logic (DiffLogic) <https://github.com/Felix-Petersen/difflogic>`_ for real-time interactive music creation and environment sensing.
| **Hardware Skills:** Audio and sensor IO with `Bela.io <http://bela.io>`_
| **Software Skills:** Machine learning, deep learning, BeagleBone Programmable Real Time Unit (PRU) programming (see `PRU Cookbook <https://docs.beagleboard.org/latest/books/pru-cookbook/index.html>`_).
| **Possible Mentors:** `Jack Armitage <https://forum.beagleboard.org/u/jarm>`_, `Chris Kiefer <https://forum.beagleboard.org/u/luuma>`_
++++
.. button-link:: https://forum.beagleboard.org/t/embedded-differentiable-logic-gate-networks-for-real-time-interactive-and-creative-applications/37768
:color: danger
:expand:
:fab:`discourse;pst-color-light` Discuss on forum
.. card::
:fas:`brain;pst-color-secondary` **YOLO models on the X15/AI-64**
......
@@ -3,6 +3,29 @@
FPGA based projects
####################
.. card:: Low-latency I/O RISC-V CPU core in FPGA fabric
:fas:`microchip;pst-color-primary` FPGA gateware improvements :bdg-success:`Medium complexity` :bdg-success-line:`175 hours`
^^^^
BeagleV-Fire features RISC-V 64-bit CPU cores and FPGA fabric. In that FPGA fabric, we'd like to
implement a RISC-V 32-bit CPU core with operations optimized for low-latency GPIO. This is similar
to the programmable real-time unit (PRU) RISC cores popularized on BeagleBone Black.
| **Goal:** RISC-V-based CPU on BeagleV-Fire FPGA fabric with GPIO
| **Hardware Skills:** `Verilog`_, `verification`_, `FPGA`_
| **Software Skills:** `RISC-V ISA`_, `assembly`_, `Linux`_
| **Possible Mentors:** `Cyril Jean <https://forum.beagleboard.org/u/vauban>`_, `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_
++++
.. button-link:: https://forum.beagleboard.org/t/low-latency-risc-v-i-o-cpu-core/37156
:color: danger
:expand:
:fab:`discourse;pst-color-light` Discuss on forum
.. card::
:fas:`microchip;pst-color-secondary` **RISC-V Based PRU on FPGA**
......
:orphan:
.. _gsoc-old-ideas:
Old GSoC Ideas
......
.. _gsoc-2024-projects:
:far:`calendar-days` 2024
##########################
.. note:: Only 3 out of 4 :ref:`accepted students <gsoc-2024-proposals>` were able to complete the program in 2024.
Enhanced Media Experience with AI-Powered Commercial Detection and Replacement
********************************************************************************
.. youtube:: Kagg8JycOfo
:width: 100%
| **Summary:** Leveraging the capabilities of BeagleBoard’s powerful processing units, the project will focus on creating a real-time, efficient solution that enhances media consumption experiences by seamlessly integrating custom audio streams during commercial breaks.
- Develop a neural network model: Combine Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to analyze video and audio data, accurately identifying commercial segments within video streams.
- Implement a real-time pipeline: Create a real-time pipeline for BeagleBoard that utilizes the trained model to detect commercials in real-time and replace them with alternative content or obfuscate them, alongside replacing the audio with predefined streams.
- Optimize for BeagleBoard: Ensure the entire system is optimized for real-time performance on BeagleBoard hardware, taking into account its unique computational capabilities and constraints.
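For illustration, a minimal Keras sketch of the CNN+RNN combination summarized above; layer sizes, frame count, and input resolution are assumptions, not the project's actual architecture.

.. code-block:: python

   import tensorflow as tf
   from tensorflow.keras import layers, models

   def build_commercial_detector(frames=16, h=172, w=172):
       # TimeDistributed CNN extracts per-frame spatial features;
       # an LSTM then models temporal structure across frames.
       model = models.Sequential([
           layers.Input(shape=(frames, h, w, 3)),
           layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu")),
           layers.TimeDistributed(layers.MaxPooling2D()),
           layers.TimeDistributed(layers.Conv2D(64, 3, activation="relu")),
           layers.TimeDistributed(layers.GlobalAveragePooling2D()),
           layers.LSTM(64),
           layers.Dense(1, activation="sigmoid"),  # commercial vs. content
       ])
       model.compile(optimizer="adam", loss="binary_crossentropy",
                     metrics=["accuracy"])
       return model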
**Contributor:** Aryan Nanda
**Mentors:** `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_, Kumar Abhishek
.. grid:: 2 2 2 2
.. grid-item::
.. button-link:: https://summerofcode.withgoogle.com/archive/2024/projects/UOX7iDEU
:color: info
:shadow:
:expand:
:fab:`google;pst-color-light` - GSoC Registry
.. grid-item::
.. button-ref:: gsoc-2024-proposal-aryan-nanda
:color: primary
:shadow:
:expand:
Proposal
Low-latency I/O RISC-V CPU core in FPGA fabric
************************************************
.. youtube:: ic0RRK6d3hg
:width: 100%
| **Summary:** Implementation of PRU subsystem on BeagleV-Fire’s FPGA fabric, resulting in a real-time microcontroller system working alongside the main CPU, providing low-latency access to I/O.
**Contributor:** Atharva Kashalkar
**Mentors:** `Cyril Jean <https://forum.beagleboard.org/u/vauban>`_, `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, Vedant Paranjape, Kumar Abhishek
.. grid:: 2 2 2 2
.. grid-item::
.. button-link:: https://summerofcode.withgoogle.com/archive/2024/projects/KjUoFlg2
:color: info
:shadow:
:expand:
:fab:`google;pst-color-light` - GSoC Registry
.. grid-item::
.. button-ref:: gsoc-2024-proposal-roger18
:color: primary
:shadow:
:expand:
Proposal
Differentiable Logic for Interactive Systems and Generative Music - Ian Clester
********************************************************************************
.. youtube:: NvHxMCF8sAQ
:width: 100%
| **Summary:** Developing an embedded machine learning system on BeagleBoard that leverages Differentiable Logic (DiffLogic) for real-time interactive music creation and environment sensing. The system will enable on-device learning, fine-tuning, and efficient processing for applications in new interfaces for musical expression.
**Contributor:** Ian Clester
**Mentors:** `Jack Armitage <https://forum.beagleboard.org/u/jarm/summary>`_, Chris Kiefer
.. grid:: 2 2 2 2
.. grid-item::
.. button-link:: https://summerofcode.withgoogle.com/archive/2024/projects/FBk0MM8g
:color: info
:shadow:
:expand:
:fab:`google;pst-color-light` - GSoC Registry
.. grid-item::
.. button-ref:: gsoc-2024-proposal-ijc
:color: primary
:shadow:
:expand:
Proposal
\ No newline at end of file
@@ -14,6 +14,11 @@ GSoC over the previous years is given in the section that follows.
:margin: 4 4 0 0
:gutter: 4
.. grid-item-card:: :far:`calendar-days` 2024
:text-align: center
:link: gsoc-2024-projects
:link-type: ref
.. grid-item-card:: :far:`calendar-days` 2023
:text-align: center
:link: gsoc-2023-projects
@@ -83,6 +88,7 @@ GSoC over the previous years is given in the section that follows.
:maxdepth: 1
:hidden:
2024
2023
2022
2021
......
.. _gsoc-2024-proposal-aryan-nanda:
.. _gsoc-proposal-template:
Enhanced Media Experience with AI-Powered Commercial Detection and Replacement
###############################################################################
Enhanced Media Experience with AI-Powered Commercial Detection and Replacement - Aryan Nanda
############################################################################################
Introduction
*************
@@ -15,12 +14,13 @@ Summary links
- **Contributor:** `Aryan Nanda <https://forum.beagleboard.org/u/aryan_nanda>`_
- **Mentors:** `Jason Kridner <https://forum.beagleboard.org/u/jkridner>`_, `Deepak Khatri <https://forum.beagleboard.org/u/lorforlinux>`_
- **GSoC Repository:** TBD
- **Repository:** `Main Code Repository on GitLab <https://openbeagle.org/aryan_nanda/gsoc_2024-enhanced_media_experience_with_ai-powered_commercial_detection_and_replacement>`_, `Mirror of Code Repository on GitHub <https://github.com/AryanNanda17/GSoC_2024-Enhanced_Media_Experience_with_AI-Powered_Commercial_Detection_and_Replacement>`_
- **Weekly Updates:** `Forum Thread <https://forum.beagleboard.org/t/weekly-progress-report-thread-enhanced-media-experience-with-ai-powered-commercial-detection-and-replacement/38487>`_
Status
=======
This project is currently just a proposal.
This project has been accepted for GSoC 2024.
Proposal
========
@@ -130,13 +130,13 @@ This way we can use the features learned by MoViNets on the larger dataset with
This can help improve the model's performance even with limited data.
.. image:: Assets/Figure1.png
.. image:: images/Figure1.png
:alt: Stream buffer in MoViNets
.. centered::
Figure 1: Stream buffer in MoViNets [2]
.. image:: Assets/Figure2.png
.. image:: images/Figure2.png
:alt: Standard Convolution Vs Causal Convolution
.. centered::
@@ -150,7 +150,7 @@ The depth of the queue will be determined through experimentation to find the op
The Conv+LSTMs model will perform well as it considers both the spatial and temporal features of videos just like a Conv3D model. The only reason it is not my first choice is because MoViNets are considered to be better for real-time performance.
.. image:: Assets/Figure3.png
.. image:: images/Figure3.png
:alt: Conv3D+LSTMs
.. centered::
@@ -277,7 +277,7 @@ In order to infer a DNN, SDK expects the DNN and associated artifacts in the bel
Therefore, after choosing the model to be used in the GStreamer pipeline, I will generate the artifacts directory by following the instructions in the TexasInstruments:edgeai-tidl-tools examples [7].
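As a hedged sketch of what loading such artifacts could look like with the TFLite runtime's TIDL delegate: the delegate path and option key follow TI's edgeai-tidl-tools examples as I understand them and should be treated as assumptions.

.. code-block:: python

   import tflite_runtime.interpreter as tflite

   # Paths and option keys are assumptions based on TI's examples.
   delegate = tflite.load_delegate(
       "/usr/lib/libtidl_tfl_delegate.so",
       {"artifacts_folder": "model-artifacts/"},
   )
   interpreter = tflite.Interpreter(
       model_path="model.tflite",
       experimental_delegates=[delegate],
   )
   interpreter.allocate_tensors()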
.. image:: Assets/Figure4.png
.. image:: images/Figure4.png
:alt: TFLite Runtime
.. centered::
@@ -303,7 +303,7 @@ NNStreamer provides efficient and flexible data streaming for machine learning
applications, making it suitable for tasks such as running inference on video frames.
So, I will use NNStreamer elements to run inference on videos.
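For illustration, a minimal Python sketch of such a pipeline built from NNStreamer elements; the file names, caps, and the ``tensor_filter`` framework string are assumptions.

.. code-block:: python

   import gi
   gi.require_version("Gst", "1.0")
   from gi.repository import Gst

   Gst.init(None)
   # Decode video, convert frames to tensors, run the TFLite model,
   # and collect classification results on a tensor_sink.
   pipeline = Gst.parse_launch(
       "filesrc location=input.mp4 ! decodebin ! videoconvert ! "
       "video/x-raw,format=RGB,width=172,height=172 ! tensor_converter ! "
       "tensor_filter framework=tensorflow-lite model=model.tflite ! "
       "tensor_sink name=results"
   )
   pipeline.set_state(Gst.State.PLAYING)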
.. image:: Assets/Figure5.png
.. image:: images/Figure5.png
:alt: GStreamer Pipeline
.. centered::
@@ -320,7 +320,7 @@ The above GStreamer pipeline is a demo pipeline inspired from edge_ai_apps/data_
Project Workflow
===================
.. image:: Assets/Figure6.png
.. image:: images/Figure6.png
:alt: Project Workflow
.. centered::
@@ -351,73 +351,75 @@ Timeline summary
.. table::
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Date | Activity |
+========================+========================================================================================================================================================+
| February 26 - March 3 | Connect with possible mentors and request review on first draft |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| March 4 - March 10 | Complete prerequisites, verify value to community and request review on second draft |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| March 11 - March 20 | Finalized timeline and request review on final draft |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| March 21 - April 2 | Proposal review and Submit application |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| April 3 - May 1 | Understanding GStreamer pipeline and TFLite runtime of BeagleBone AI-64. |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| May 2 - May 10 | Start bonding and Discussing implementation ideas with mentors. |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| May 11 - May 31 | Focus on college exams. |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| June 1 - June 3 | Start coding and introductory video |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| June 3 - June 9 | :ref:`milestone #1<Milestone1>` -> Releasing introductory video and developing Commercial dataset |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| June 10 - June 16 | :ref:`milestone #2<Milestone2>` -> Developing Non-Commercial dataset and dataset Preprocessing |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| June 17 - June 23 | :ref:`milestone #3<Milestone3>` -> Transfer learning and fine-tuning MoViNets architecture |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| June 24 - June 30 | :ref:`milestone #4<Milestone4>` -> Transfer learning and fine-tuning ResNet architecture |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| July 1 - July 7 | :ref:`milestone #5<Milestone5>` -> Evaluate performance metrics to choose the best-performing model. |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| July 8 - July 14 | :ref:`Submit midterm evaluations <Submit midterm evaluation>` |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| July 15 - July 21 | :ref:`milestone #6<Milestone6>` -> Finalizing the best model by performing real-time inferencing |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| July 22 - July 28 | :ref:`milestone #7<Milestone7>` -> Compiling the model and generating artifacts and building pre-processing part of GStreamer pipeline |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| July 29 - August 4 | :ref:`milestone #8<Milestone8>` -> Building the compute pipeline using NNStreamer |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| August 5 - August 11 | :ref:`milestone #9<Milestone9>` -> Building the post-processing part of GStreamer pipeline |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| August 12 - August 18 | :ref:`milestone #10<Milestone10>` -> Enhancing real-time performance |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| August 19 | :ref:`Submit final project video, submit final work to GSoC site and complete final mentor evaluation<Final project video>` |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
+------------------------+--------------------------------------------------------------------------------------+
| Date | Activity |
+========================+======================================================================================+
| February 26 - March 3 | Connect with possible mentors and request review on first draft |
+------------------------+--------------------------------------------------------------------------------------+
| March 4 - March 10 | Complete prerequisites, verify value to community and request review on second draft |
+------------------------+--------------------------------------------------------------------------------------+
| March 11 - March 20 | Finalized timeline and request review on final draft |
+------------------------+--------------------------------------------------------------------------------------+
| March 21 - April 2 | Proposal review and Submit application |
+------------------------+--------------------------------------------------------------------------------------+
| April 3 - May 1 | Understanding GStreamer pipeline and TFLite runtime of BeagleBone AI-64. |
+------------------------+--------------------------------------------------------------------------------------+
| May 2 - May 10 | :ref:`ACRBonding` |
+------------------------+--------------------------------------------------------------------------------------+
| May 11 - May 31 | Focus on college exams. |
+------------------------+--------------------------------------------------------------------------------------+
| June 1 - June 3 | Start coding and introductory video |
+------------------------+--------------------------------------------------------------------------------------+
| June 3 - June 9 | :ref:`ACRMilestone1` |
+------------------------+--------------------------------------------------------------------------------------+
| June 10 - June 16 | :ref:`ACRMilestone2` |
+------------------------+--------------------------------------------------------------------------------------+
| June 17 - June 23 | :ref:`ACRMilestone3` |
+------------------------+--------------------------------------------------------------------------------------+
| June 24 - June 30 | :ref:`ACRMilestone4` |
+------------------------+--------------------------------------------------------------------------------------+
| July 1 - July 7 | :ref:`ACRMilestone5` |
+------------------------+--------------------------------------------------------------------------------------+
| July 8 - July 14 | :ref:`ACRSubmit-midterm-evaluations` |
+------------------------+--------------------------------------------------------------------------------------+
| July 15 - July 21 | :ref:`ACRMilestone6` |
+------------------------+--------------------------------------------------------------------------------------+
| July 22 - July 28 | :ref:`ACRMilestone7` |
+------------------------+--------------------------------------------------------------------------------------+
| July 29 - August 4 | :ref:`ACRMilestone8` |
+------------------------+--------------------------------------------------------------------------------------+
| August 5 - August 11 | :ref:`ACRMilestone9` |
+------------------------+--------------------------------------------------------------------------------------+
| August 12 - August 18 | :ref:`ACRMilestone10` |
+------------------------+--------------------------------------------------------------------------------------+
| August 19 | :ref:`ACRFinal-project-video` |
+------------------------+--------------------------------------------------------------------------------------+
Timeline detailed
==================
.. _ACRBonding:
Community Bonding Period (May 1st - May 10th)
==============================================
----------------------------------------------
- Discuss implementation ideas with mentors.
- Discuss the scope of the project.
.. _Milestone1:
.. _ACRMilestone1:
Milestone #1, Introductory YouTube video (June 3rd)
===================================================
Milestone #1, Releasing introductory video and developing commercial dataset (June 3)
-------------------------------------------------------------------------------------
- Making an Introductory Video.
- Commercial dataset acquisition:
- Web scrape videos marked as advertisements from YouTube 8-M dataset.
- Ensure proper labeling and categorization of commercial videos.
.. _Milestone2:
.. _ACRMilestone2:
Milestone #2 (June 10th)
==========================
Milestone #2, Developing non-commercial dataset and dataset preprocessing (June 10)
-------------------------------------------------------------------------------------
- Non-commercial dataset acquisition:
- Web scrape random videos from other categories of YouTube 8-M dataset.
@@ -427,38 +429,39 @@ Milestone #2 (June 10th)
- Divide datasets into train, validation, and test sets.
- Perform random shuffling of data to maintain temporal dependencies.
.. _Milestone3:
.. _ACRMilestone3:
Milestone #3 (June 17th)
=========================
Milestone #3, Transfer learning and fine-tuning MoViNets architecture (June 17)
-------------------------------------------------------------------------------------
- Transfer learning and fine-tuning MoViNets architecture:
- Apply transfer learning on MoViNets and fine-tune its last few layers.
- Train MoViNets on the prepared dataset for video classification.
.. _Milestone4:
.. _ACRMilestone4:
Milestone #4 (June 24th)
==========================
Milestone #4, Transfer learning and fine-tuning ResNet architecture (June 24)
-------------------------------------------------------------------------------------
- Transfer learning and fine-tuning ResNet architecture:
- Adding additional layers of LSTMs for extracting temporal dependencies.
- Developing ResNet-LSTMs model architecture for video classification.
- Train the ResNet-LSTMs model on the prepared dataset.
.. _Milestone5:
.. _ACRMilestone5:
Milestone #5, Evaluate performance metrics to choose the best-performing model (July 1)
---------------------------------------------------------------------------------------
Milestone #5 (July 1st)
========================
- Finalize the best model:
- Save all trained models to local disk
- Evaluate performance metrics to choose the best-performing model.
.. _Submit midterm evaluation:
.. _ACRSubmit-midterm-evaluations:
Submit midterm evaluations (July 8th)
=====================================
-------------------------------------------------------------------------------------
- Document the progress made during the first phase of the project.
@@ -466,60 +469,60 @@ Submit midterm evaluations (July 8th)
**July 12 - 18:00 UTC:** Midterm evaluation deadline (standard coding period)
.. _Milestone6:
.. _ACRMilestone6:
Milestone #6 (July 15th)
=========================
Milestone #6, Finalizing the best model by performing real-time inferencing (July 15)
--------------------------------------------------------------------------------------
- Finalize the best model:
- Perform real-time inference using OpenCV to determine the model that yields the best results with high-performance.
- Based on all the options tried in Phase 1, decide on the final model to be used in the GStreamer pipeline.
.. _Milestone7:
.. _ACRMilestone7:
Milestone #7 (July 22nd)
=========================
Milestone #7, Compiling the model and generating artifacts and building pre-processing part of GStreamer pipeline (July 22)
----------------------------------------------------------------------------------------------------------------------------
- Compile the chosen model and generate artifacts for TFLite runtime.
- Building the pre-processing part of GStreamer pipeline:
- Develop the pre-processing module to prepare video frames for inference.
.. _Milestone8:
.. _ACRMilestone8:
Milestone #8 (July 29th)
=========================
Milestone #8, Building the compute pipeline using NNStreamer (July 29)
----------------------------------------------------------------------------------------------------------------------------
- Building the compute pipeline using NNStreamer:
- Implement NNStreamer for inferencing videos using the compiled model.
.. _Milestone9:
.. _ACRMilestone9:
Milestone #9 (Aug 5th)
=======================
Milestone #9, Building the post-processing part of GStreamer pipeline (August 5)
----------------------------------------------------------------------------------------------------------------------------
- Building the post-processing part of GStreamer pipeline:
- Develop the post-processing module to perform actions based on classification results.
- Implement replacement or obscuring of commercial segments and audio substitution.
.. _Milestone10:
.. _ACRMilestone10:
Milestone #10 (Aug 12th)
========================
Milestone #10, Enhancing real-time performance (August 12)
----------------------------------------------------------------------------------------------------------------------------
- Enhancing real-time performance:
- Optimize the GStreamer pipeline for real-time performance using native hardware accelerators.
- Ensure smooth and efficient processing of video streams.
.. _Final project video:
.. _ACRFinal-project-video:
Final YouTube video (Aug 19th)
===============================
Submit final project video, submit final work to GSoC site and complete final mentor evaluation (August 19)
----------------------------------------------------------------------------------------------------------------------------
- Submit final project video, submit final work to GSoC site and complete final mentor evaluation.
Final Submission (Aug 24th)
============================
----------------------------------------------------------------------------------------------------------------------------
.. important::
@@ -530,7 +533,7 @@ Final Submission (Aug 24th)
evaluations (standard coding period)
Initial results (September 3)
=============================
----------------------------------------------------------------------------------------------------------------------------
.. important::
**September 3 - November 4:** GSoC contributors with extended timelines continue coding
@@ -565,7 +568,7 @@ Contingency
- If I get stuck on my project and my mentor isn’t around, I will use the following resources:
- `MoViNets <https://www.tensorflow.org/hub/tutorials/movinet>`_
- `GStreamer Docs <https://gstreamer.freedesktop.org/>`_
- `BeagleBone AI-64 <https://docs.beagleboard.org/latest/boards/beaglebone/ai-64/01-introduction.html>`_
- `BeagleBone AI-64 docs <https://docs.beagleboard.org/latest/boards/beaglebone/ai-64/01-introduction.html>`_
- `NNStreamer <https://nnstreamer.github.io/>`_
- Moreover, the BeagleBoard community is extremely helpful and active in resolving doubts, which makes it a great source of guidance and clarification for the project.
- I intend to remain involved and provide ongoing support for this project beyond the duration of the GSoC timeline.
......
.. _gsoc-proposal-template:
.. _gsoc-2024-proposal-ijc:
Proposal template
#################
Differentiable Logic for Interactive Systems and Generative Music - Ian Clester
###############################################################################
Introduction
*************
@@ -9,104 +9,89 @@ Introduction
Summary links
=============
- **Contributor:** `Alec Denny <https://forum.beagleboard.org/u/alecdenny/summary>`_
- **Mentors:** `Jack Armitage <https://forum.beagleboard.org/u/jarm/summary>`_, `Chris Kiefer <https://forum.beagleboard.org/u/luuma/summary>`_
- **Code:** `Google Summer of Code / greybus / cc1352-firmware · GitLab <https://openbeagle.org/gsoc/greybus/cc1352-firmware>`_
- **GSoC:** `Google Summer of Code <https://forum.beagleboard.org/t/embedded-differentiable-logic-gate-networks-for-real-time-interactive-and-creative-applications/37768>`_
- **Contributor:** `Ian Clester <https://forum.beagleboard.org/u/ijc>`_
- **Mentors:** `Jack Armitage <https://forum.beagleboard.org/u/jarm>`_, `Chris Kiefer <https://forum.beagleboard.org/u/luuma>`_
- **GSoC:** `Google Summer of Code <https://summerofcode.withgoogle.com/archive/2023/projects/iTfGBkDk>`_
- **Weekly Updates:** `Forum Thread <https://forum.beagleboard.org/t/weekly-progress-report-differentiable-logic-for-interactive-systems-and-generative-music/38486>`_
- **Repository**: `embedded-difflogic <https://openbeagle.org/ijc/embedded-difflogic>`_
Status
=======
This project is currently just a proposal.
Proposal
========
This project has been accepted for GSoC 2024.
About
=====
- **Forum:** :fab:`discourse` `u/alecdenny (Alec Denny) <https://forum.beagleboard.org/u/alecdenny/summary>`_
- **OpenBeagle:** :fab:`gitlab` `alecdenny (Alec Denny) <https://openbeagle.org/alecdenny>`_
- **Github:** :fab:`github` `alecdenny (Alec Denny) <https://github.com/alecdenny>`_
- **School:** :fas:`school` Columbia College Chicago
- **Forum:** :fab:`discourse` `u/ijc (Ian Clester) <https://forum.beagleboard.org/u/ijc>`_
- **OpenBeagle:** :fab:`gitlab` `openbeagle.org/ijc <https://openbeagle.org/ijc>`_
- **Discord:** :fas:`comments` `bbb.io/gsocchat <https://bbb.io/gsocchat>`_
- **Github:** :fab:`github` `ijc8 (Ian Clester) <https://github.com/ijc8>`_
- **School:** :fas:`school` Georgia Institute of Technology
- **Country:** :fas:`flag` United States
- **Primary language:** :fas:`language` English
- **Typical work hours:** :fas:`clock` 8AM-5PM US Pacific
- **Previous GSoC participation:** :fab:`google` N/A
- **Typical work hours:** :fas:`clock` 9AM-6PM US Eastern
- **Previous GSoC participation:** :fab:`google` `Better Faust on the Web (2023) <https://summerofcode.withgoogle.com/archive/2023/projects/L6oI4LhW>`_
Project
********
**Project name:** Granular style transfer instrument with differentiable logic gate networks
**Project name:** Differentiable Logic for Interactive Systems and Generative Music
Description
============
The general aim of this project is to enable the development of models that are suitably efficient for use in real-time interactive applications on embedded systems (particularly the BeagleBone-based Bela).
At the project's core is difflogic [1]_, a recent technique that employs sparsely-connected networks composed of basic logic gates (rather than densely-connected neurons with complex activation functions) to obtain small models and fast inference.
Thus, the first and foremost goal of the project is to enable a convenient workflow for developing difflogic models and running them on the Bela. The expected use case is developing and training models on a larger machine (e.g. a laptop, desktop, or server), followed by exporting the model to C and cross-compiling it for the BeagleBone - either the main CPU (ARM Cortex-A8) or the PRUs.
To support this workflow, I will develop wrappers for exporting compiled difflogic models for use in the various languages supported on Bela (C++, Pure Data, SuperCollider, Csound).
These wrappers will likely take inspiration from other projects that bring machine learning into computer music environments, such as `nn~ <https://github.com/acids-ircam/nn_tilde>`_ and `FluCoMa <https://www.flucoma.org/>`_.
This first goal, along with profiling and benchmarking the performance of difflogic models on both the main CPU and the PRUs, constitutes roughly the first half of the project.
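For concreteness, a minimal difflogic sketch of the train-on-a-larger-machine half of this workflow; layer sizes, the class count, and keyword names follow my reading of the difflogic README and are assumptions.

.. code-block:: python

   import torch
   from difflogic import LogicLayer, GroupSum  # pip install difflogic

   # A tiny network of learnable logic gates; GroupSum pools gate
   # outputs into k class scores. Sizes here are illustrative.
   model = torch.nn.Sequential(
       LogicLayer(64, 4096, device="cpu"),
       LogicLayer(4096, 4096, device="cpu"),
       GroupSum(k=4, tau=30),
   )
   x = (torch.rand(8, 64) > 0.5).float()  # boolean-valued inputs
   scores = model(x)                      # differentiable during training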
The other, more exploratory half of the project consists of building out integrations and applications of difflogic for the rapid development of useful audio models.
To that end, I intend to explore the possibilities of combining difflogic networks with techniques such as DDSP (differentiable digital signal processing) [2]_, possibly also leveraging Faust auto-differentiation.
I also intend to investigate the feasibility of "porting" well-known ML architectures such as VAEs to difflogic networks, and of training difflogic networks to approximate the behavior of existing neural networks (i.e. knowledge distillation).
Audio models such as RAVE [3]_, PESTO [4]_, and Whisper [5]_ may be of particular interest.
Furthermore, I will explore opportunities to combine difflogic networks with other cheap, effective techniques like the $Q recognizer [6]_ for gestural control, linear predictive coding for audio analysis & resynthesis, and toolkits such as `RapidLib <https://github.com/jarmitage/RapidLibBela>`_.
Such combinations may be particularly useful for interactive machine learning (as in Wekinator [7]_), should fine-tuning difflogic models on-device prove too costly.
In this phase of the project, I will develop example applications involving sound analysis, classification, and synthesis, and experiment with interactive machine learning.
Finally, I intend to dedicate some time to a specific creative application: generating networks of logic gates to approximate particular sounds and exploring the space of such sound-generating networks.
This application is inspired by bytebeat [8]_, a practice which involves writing short expressions that describe audio as a function of time, generating music sample-by-sample.
Typically, these expressions involve many bit-twiddling operations, consisting primarily of logic gates (bitwise AND, OR, XOR, NOT) and shifts --- a fact that suggests a remarkably good fit for difflogic, wherein models consist of networks of gates.
Other inspirations include work on sound matching: reproducing a given sound or family of sounds by estimating synthesizer parameters [9]_, generating patches [10]_, or training models [11]_.
In this vein, I will attempt to train difflogic gates to reproduce particular sounds, treating the entire network as a bytebeat-style function of time (sample index) that outputs samples.
Thanks to the tricks difflogic employs to train a network of discrete gates, this approach will enable sound matching via gradient descent and backpropagation (as in e.g. DDSP) rather than evolutionary methods, while still ultimately generating a discrete function.
Lastly, I will build an interactive application to explore the space of sound-generating networks (e.g. by mutating a network, or morphing between two networks) and visualize the execution of logic gate networks.
Create an AI-powered granular style transfer algorithm runnable on the Bela board, with the goal of creating a usable instrument. The algorithm will allow users to provide their own style target sample, whose timbral characteristics can be explored through interaction via the microphone or audio input. The synthesis engine will be based on granular / concatenative synthesis methods; the AI component will be a network that maps input spectral features to a predicted point in the sample from which to play back audio.
Neural Network:
The AI component of the synthesis engine will have input in the form of `MFCCs <https://en.wikipedia.org/wiki/Mel-frequency_cepstrum>`_, coefficients which describe the frequency spectrum in terms of amplitude. MFCCs are like the Fourier transform’s amplitude spectrum, but with amplitude and frequency scaled to match the characteristics of human perception. The model will use this information to predict from what point in the target sample to play back audio. Based on its input, the network should output the most perceptually similar grain from the sample. To train the network, MFCCs from each grain of the sample will be input, with the prediction being the time offset at which the grain occurs in the sample. The neural network itself will be based on differentiable logic gate networks, whose neurons have boolean inputs and outputs and each perform one of 16 logic gate operations [1]. These networks can contain fewer neurons and be much smaller because each neuron's operation can be complex, and weights do not need to be stored, only which logic gate operation to perform [1].
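A minimal sketch of the per-grain MFCC analysis described above; the grain size, sample rate, and coefficient count are arbitrary choices for illustration.

.. code-block:: python

   import librosa

   # Load the target sample and compute MFCCs for one 512-sample grain.
   y, sr = librosa.load("target_sample.wav", sr=44100)
   grain = y[:512]
   mfcc = librosa.feature.mfcc(y=grain, sr=sr, n_mfcc=13,
                               n_fft=512, hop_length=512)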
Output representation will be similar to `thermometer format <https://en.wikipedia.org/wiki/Unary_coding>`_, which is closest to how classification is achieved in models previously trained by the authors and our mentors. The classification examples use binary adders, which sum a group of output neurons in order to predict the most likely class. In this case, many output neurons will be summed to determine from which section of the sample to play back grains. As an example, if there are 1000 binary output neurons, when 500 are 1s and 500 are 0s, the output of the model would be interpreted as a 50% time offset into the target sample. Since the resolution is limited, grains will be played back from around the range specified by the network’s prediction. I have chosen to limit the resolution this way based on the capabilities of the models I’ve read about; if there is a way to represent time with more detail than thermometer format, I would be open to using that as well.
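A small sketch of decoding such a thermometer-style output into a time offset, per the 1000-neuron example above.

.. code-block:: python

   import numpy as np

   def decode_offset(bits: np.ndarray, sample_len: int) -> int:
       # The fraction of 1s among the binary outputs maps linearly to
       # a relative position in the target sample.
       fraction = bits.sum() / bits.size  # e.g. 500 of 1000 -> 0.5
       return int(fraction * sample_len)

   bits = np.random.randint(0, 2, size=1000)  # stand-in network output
   offset = decode_offset(bits, sample_len=44100 * 10)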
In order to feed the model enough data, vocal data will be analyzed and connected to its most similar timbral points in the sample using algorithms found in concatenative synthesis. This synthetic data can be used to condition the network for vocal inputs. Ideally the inclusion of this synthetic data will allow for vocal interaction from the user to navigate the granular synthesis engine. By synthetic data, I am not referring to synthesized vocal sounds, but rather vocal sounds that have been prepared in the format of MFCCs, and tied to their most similar grain in the target sample. Concatenative synthesis hosts a family of techniques by which individual grains of audio can be analyzed, compared and connected to each other based on similarity [2]. Training the model to replicate this process for inputs with features far outside those of the target sample could allow for an instrument that is better conditioned for any user input, but here I think focusing on vocal inputs would be most relevant.
.. image:: artifact.png
:width: 400
:alt: Alternative text
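A minimal sketch of the grain-pairing step described above, assuming ``librosa`` for MFCC extraction and simple Euclidean nearest-neighbour matching (file names, grain size, feature count, and distance metric are all placeholder design choices):

.. code-block:: python

   import librosa
   import numpy as np

   GRAIN = 2048  # grain length in samples (placeholder)

   def grain_mfccs(y, sr, n_mfcc=13):
       """One MFCC vector per (non-overlapping) grain."""
       mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                   n_fft=GRAIN, hop_length=GRAIN)
       return mfcc.T  # shape: (n_grains, n_mfcc)

   target, sr = librosa.load("target_sample.wav", sr=None)
   voice, _ = librosa.load("vocal_input.wav", sr=sr)

   target_feats = grain_mfccs(target, sr)
   voice_feats = grain_mfccs(voice, sr)

   # For each vocal grain, find the most similar target grain; the pair
   # (vocal MFCCs, matched grain's fractional offset) is a training example.
   dists = np.linalg.norm(voice_feats[:, None] - target_feats[None, :], axis=-1)
   matched_offsets = dists.argmin(axis=1) / len(target_feats)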
Resynthesis:
Resynthesis will rely on the neural network to determine which part of the sample to play back. The user should be able to record a sample to drive the style transfer, or even use live input to the device. When preparing the training data and collecting input on the instrument, the pitch of each grain should be determined. While triggering grains, they can be resampled to match the pitch of the input (ideally relatively closely). I would start by simply resampling to match pitch between instrument input and model output, but there is room to explore frequency-domain techniques for matching both pitch and timbre. A variety of controls could be added if time permits, giving the user access to familiar granular synthesis controls.
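For the pitch-matching step, a minimal sketch of the naive resampling approach (per-grain pitch estimates are assumed to come from elsewhere, e.g. an autocorrelation tracker):

.. code-block:: python

   import numpy as np

   def repitch_grain(grain: np.ndarray, grain_f0: float, target_f0: float) -> np.ndarray:
       """Resample a grain so its pitch moves from grain_f0 toward target_f0.

       Reading the grain faster (ratio > 1) raises its pitch; it also
       shortens the grain, which is usually acceptable at grain timescales.
       """
       ratio = target_f0 / grain_f0
       n_out = max(1, int(len(grain) / ratio))
       positions = np.linspace(0, len(grain) - 1, n_out)
       return np.interp(positions, np.arange(len(grain)), grain)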
Potential Controls (a parameter sketch follows this list):
- Play Speed: controls the rate at which the instrument plays through user-recorded loops. Since the MFCCs from the performance are translated into grains from the sample, lowering or raising the speed at which we move through the control audio could allow for a variety of creative uses.
- Mix: Allow the user to mix in the sound of the recorded audio that is controlling the synth.
- Grain Size: change the size of grains played back by the resynthesis engine.
- Grain Trigger Rate: change the rate at which control audio is polled for input to the model (and used to trigger grain output from the synthesis component).
- Pitch Correction: control the degree to which grains are stretched to match the pitch of control audio.
- Randomness: control the amount of randomness applied to the time offset at which grains are triggered.
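Collected in one place, the controls above might map onto a parameter block like the following; every name, default, and range is illustrative only:

.. code-block:: python

   from dataclasses import dataclass

   @dataclass
   class GranularControls:
       play_speed: float = 1.0        # rate of scrubbing through the control loop
       mix: float = 0.0               # 0 = resynthesis only, 1 = control audio only
       grain_size: int = 2048         # grain length in samples
       trigger_rate_hz: float = 20.0  # how often control audio is polled / grains fire
       pitch_correction: float = 1.0  # 0 = no repitching, 1 = fully match input pitch
       randomness: float = 0.0        # jitter added to predicted time offsets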
Goals:
1. Granular style transfer network using differentiable logic gate networks
2. Handheld instrument for navigating the network using audio input
Related Work:
Here are a few similar instruments and pieces of software:

- `<https://gitlab.com/then-try-this/samplebrain>`_
- `<https://dillonbastan.com/store/maxforlive/index.php?product=coalescence>`_
- `<https://learn.flucoma.org>`_
- `<https://github.com/ircam-ismm/catart-mubu>`_
- `<https://www.xlnaudio.com/products/xo>`_

Software
=========
- C
- C++
- Python
- PyTorch
- difflogic
- dasp
- Max/MSP
- Faust
- Linux
Hardware
========
- Bela (BeagleBone Black + Bela Cape)
- Knobs / buttons
- Microphone
- Speaker
- Audio I/O
- OLED screen
Timeline
********
.. note:: This timeline is based on the `official GSoC timeline <https://developers.google.com/open-source/gsoc/timeline>`_
Timeline detailed
=================
Community Bonding Period (May 1st - May 26th)
==============================================
GSoC contributors get to know their mentors, read documentation, and get up to speed to begin working on their projects.
Coding begins (May 27th)
=========================
Milestone #1, Introductory YouTube video (June 3rd)
===================================================
Explore options for concatenative synthesis (i.e., how to prepare synthetic data). Get acquainted with training difflogic networks.

- Set up development environment
- Train trivial difflogic network on laptop & run generated C on Bela (main CPU)
Milestone #2 (June 10th)
==========================
Prototype DSP and granular synthesis options (possibly in Max). Start the granular synthesis engine.

- Run difflogic network on PRU
- Perform feature extraction (FFT, MFCCs) on PRU
Milestone #3 (June 17th)
=========================
Generate training data from target samples; generate synthetic data from vocal training data.

- Build wrappers to simplify use of difflogic networks in Bela projects
- C++ (namespace & wrapper around difflogic-generated C)
- SuperCollider (UGen)
Milestone #4 (June 24th)
==========================
Train networks and get them running on the board.

- Build wrappers to simplify use of difflogic networks in Bela projects
- Pure Data (external)
- Csound (UDO)
Milestone #5 (July 1st)
========================
Create an interface for training models based on user-supplied style targets.

- Explore feasibility of combining difflogic with DDSP techniques (via dasp and possibly Faust auto-differentiation)
- Use difflogic network to control synthesizer parameters
Submit midterm evaluations (July 8th)
=====================================
.. important::
**July 12 - 18:00 UTC:** Midterm evaluation deadline (standard coding period)
Milestone #6 (July 15th)
=========================
On-board granular synthesis engine.

- Investigate feasibility of interactive machine learning (e.g. fine-tuning) with difflogic networks
- Combine difflogic networks with complementary, cheap techniques (e.g. LPC, template matching via $Q, RapidLib)
Milestone #7 (July 22nd)
=========================
Develop the recording / playback interface.

- Work on example applications
- Classify short mouth sounds for interactive system control (à la `parrot.py <https://github.com/chaosparrot/parrot.py>`_)
- Perform real-time pitch estimation (à la PESTO)
Milestone #8 (July 29th)
=========================
Add additional parameters and functionality to the synthesis engine.

- Experiment with implementing popular architectures (e.g. VAEs, as in RAVE) as difflogic networks
- Experiment with difflogic knowledge distillation: training a difflogic network to approximate the behavior of a pre-trained, conventional neural network (student/teacher)
Milestone #9 (Aug 5th)
=======================
Figure out extra controls and inputs / outputs for the board.

- Experiment with training difflogic networks for sound reconstruction
- Bytebeat-inspired: feed increasing timestamps to the network, get subsequent audio samples out (see the sketch below)
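For reference, the bytebeat idea in its original form: audio as a pure function of an integer timestamp, which is exactly the input/output contract such a network would learn to approximate (the expression below is an arbitrary illustration, not a specific classic):

.. code-block:: python

   import numpy as np

   t = np.arange(8000 * 5, dtype=np.uint32)  # 5 seconds at 8 kHz
   samples = ((t * (t >> 10 | t >> 8)) & 0xFF).astype(np.uint8)
   # A difflogic network would take the bits of t as input and be trained
   # to reproduce the bits of each sample: a tiny generative model.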
Milestone #10 (Aug 12th)
========================
Assemble the instrument and add finishing touches.

- Creative application: interactive exploration of the space of difflogic sound-reconstruction models
- "Glitch" - random perturbations of the network (mutate gates & connections)
- "Morph" - interpolate (in terms of tree edit-distance) between different sound-generating networks
- Visualize difflogic networks & their execution
Final YouTube video (Aug 19th)
===============================
Submit the final project video, submit final work to the GSoC site, and complete the final mentor evaluation.
Final Submission (Aug 24th)
============================
.. important::
**August 19 - 26 - 18:00 UTC:** Final week: GSoC contributors submit their final work product and their final mentor evaluations (standard coding period)
Initial results (September 3)
=============================
.. important::
**September 3 - November 4:** GSoC contributors with extended timelines continue coding
**November 11 - 18:00 UTC:** Final date for mentors to submit evaluations for GSoC contributor projects with extended deadline
Experience and approach
***********************
I have extensive experience with embedded systems and real-time audio.
As an undergraduate, I worked on embedded systems during internships at Astranis and Google.
For a final class project, I developed a multi-effects pedal with a configurable signal chain in C using fixed-point arithmetic on the `Cypress PSoC 5 <https://www.infineon.com/cms/en/product/microcontroller/32-bit-psoc-arm-cortex-microcontroller/32-bit-psoc-5-lp-arm-cortex-m3/>`_ (an ARM-based system-on-a-chip with configurable digital and analog blocks).
My `master's work <https://dspace.mit.edu/handle/1721.1/129201>`_ involved localizing RFID tags using software-defined radios with framerates sufficient for interactive systems.
Currently, I am a teaching assistant for a class on Audio Software Engineering (in Rust, with a focus on real-time audio software), in which I have been responsible for preparing much of the material and lectures.
I have worked with a variety of microcontrollers and single-board computers, from writing assembly on the Intel 8051, to C++ on Arduinos and ESP32s, to Python and JS on Raspberry Pis.
I have previously used PyTorch to train small neural networks for granular synthesis; my college capstone used simple audio features from analyzing individual grains to rearrange source samples into new compositions (examples `here <https://alecdenny.wordpress.com/>`__).
I have also manipulated small convolutional image synthesis models for a personal project (video `here <https://www.dropbox.com/scl/fi/mmnkrnr3l7jec3zqjfcbu/iputmyhandinthefiredraft1.mp4?rlkey=nsxpjvotrqmftidh4igbvbsxw&dl=0>`__).
I am currently building the interface for a VST plugin in C++, through which I have gained experience with real-time audio code and musical interface design; this project is a collaboration with Nathan Blair (demo `here <https://www.dropbox.com/scl/fi/w75xzhwyjgcfvejkrg8fv/waveshinedemo.mov?rlkey=iuxey52f6hq52pfjvui6uv058&dl=0>`__).
I have experience with spectral processing and synthesis, and I am currently working on WavKitchen, a Python framework for spectral synthesis that mimics the structure of 3D graphics shaders (source `here <https://github.com/alecdenny/wavkitchen>`__).
I have also employed machine learning techniques to build interactive systems.
In a graduate course on multimodal user interaction, I gained experience with classic machine learning methods, and employed cheap techniques for gesture recognition in a `tablet-based musical sketchpad <https://github.com/ijc8/notepad>`_.
In the meantime, I have been following developments in machine learning for audio (particularly those that are feasible to run locally, especially sans GPU), and I have experimented with models such as RAVE and Whisper (using the latter for a recent interactive audiovisual `hackathon project <https://github.com/ijc8/hackathon-2024>`_).
I make audio / visual art and have coded a variety of visualizers and interfaces for performance, which I had the chance to use all over the US last year.
Much of my graduate work has focused on generative music and computational representations of music.
My recent work on `ScoreCard <https://ijc8.me/s>`_ has put an extreme emphasis on fitting music-generating programs (typically written in C) into efficient, self-contained packages that are small enough to store in a QR code (\< 3kB).
Previous projects such as `Blocks <https://ijc8.me/blocks>`_ (an audiovisual installation) and `kilobeat <https://ijc8.me/kilobeat>`_ (a collaborative livecoding tool) have probed the musical potential of extremely short fragments of code (bytebeat & floatbeat expressions).
These projects also explore methods of visualizing musical programs, either in terms of their output or their execution.
More information about my work is available on `my website <https://ijc8.me>`_ and `GitHub <https://github.com/ijc8>`_.
I am particularly interested in difflogic because it occupies an intersection between lightweight machine learning techniques (cheaper is better!) and compact representations of musical models (less is more!), and I am strongly motivated to see what it can do.
Contingency
===========
If I get stuck on something related to BeagleBoard or Bela development, I plan to take advantage of resources within those communities (such as documentation, forums, and Discord servers).
I studied music technology in a program that focused more on composition and performance, but I have taught myself most of what I know about machine learning, C++, and digital signal processing. If I get stuck, I know I have experience bringing myself up to speed on these topics, and I know where to find resources for them. The main thing I can see myself getting stuck on is programming the board itself: I have limited experience with hardware, though I have used Arduino before in school.
If I get stuck on something related to ML or DSP, I plan to refer back to reference texts and the papers and code of related work (DDSP, RAVE, PESTO, etc.), and I may reach out to colleagues within the ML space (such as those in the Music Information Retrieval lab within my department) for advice.
If I get stuck on something related to music or design, I plan to take a break and go on a walk. :-)
Benefit
========
The first half of this project will provide a straightforward means to develop models with difflogic and run them on embedded systems such as BeagleBoards and particularly Bela. (The wrappers for Bela's supported languages may also prove generally useful outside of embedded contexts.)
Making it easier for practitioners to use difflogic models in creative applications will, in turn, aid in the development of NIMEs and DMIs that can benefit from the small size and fast inference (and corresponding portability and low latency) of difflogic networks.
Generative AI is a major topic in music software and hardware right now. One of the main constraints for developers is the feasibility of running or training models for real-time audio applications. Differentiable logic gate networks offer a novel approach to shrinking models for these use cases. One remaining concern I had after reviewing the literature was the ability to generate high-resolution outputs from the model, i.e., to represent continuous values with both accuracy and trainability. This architecture pushes the potential uses of AI on hardware into the realm of generative synthesis in a way that sidesteps some of these difficulties.
For creatives using AI instruments, the challenges lie in interfacing with the AI, as well as originality. AI can shrink very complex synthesis algorithms down to a handful of controls, but it can struggle to come up with easily nameable parameters, or it may give the user too many parameters to feel comfortable using them. For this project, the control interface is primarily the user's voice, but potentially any audio or sound in their immediate environment. Because the synthesis engine is not entirely AI, familiar granular synthesis controls are also available, adding a level of familiarity to the instrument. Originality with AI instruments poses a challenge for artists when massive datasets are required for one to work, or when outputs could contain copyrighted material. With this model, artists can train models themselves, with no potential for accidental infringement.
The potential benefits of this project are a working library for generative AI synthesis on the BeagleBoard, as well as an exploration of potential use cases for small neural networks in electronic instruments. These benefits are geared more toward the NIME and music AI communities than the BeagleBoard community specifically; however, I think creating this instrument and allowing users to train models and upload them to the board could yield a low-commitment entry point for people in those communities to be introduced to BeagleBoards.
This project would yield a reusable neural network architecture, and the synthesis component could be greatly extended: frequency-domain pitch shifting or cross-synthesis techniques could build a variety of instruments on top of this project's framework. Creating an interface to prepare data and train models for use on the board could also prove useful for other projects. The second half of this project, depending on the results of my explorations, may demonstrate useful ways to combine difflogic with other ML & DSP techniques, and provide some useful and interesting audio-focused applications to serve as effective demonstrations of the possibilities for ML on the BeagleBoard and possible starting points for others.
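As one illustration of that extension space, a naive cross-synthesis pass (imposing one signal's magnitude spectrum on another's phases) takes only a few lines; this sketch assumes ``librosa`` and is illustrative rather than part of the proposed deliverables:

.. code-block:: python

   import numpy as np
   import librosa

   def cross_synthesize(a, b, n_fft=2048, hop=512):
       """Combine the magnitudes of `a` with the phases of `b`."""
       n = min(len(a), len(b))
       A = librosa.stft(a[:n], n_fft=n_fft, hop_length=hop)
       B = librosa.stft(b[:n], n_fft=n_fft, hop_length=hop)
       hybrid = np.abs(A) * np.exp(1j * np.angle(B))
       return librosa.istft(hybrid, hop_length=hop)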
References
==========
1. F. Petersen, C. Borgelt, H. Kuehne, and O. Deussen, "Deep Differentiable Logic Gate Networks," in *Proc. NeurIPS*, 2022. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2022/file/0d3496dd0cec77a999c98d35003203ca-Paper-Conference.pdf
2. D. Schwarz, "Concatenative Sound Synthesis: The Early Years," *Journal of New Music Research*, vol. 35, Mar. 2006. [Online]. Available: https://hal.science/hal-01161361/document
Misc
====
To do: the merge request for this proposal has not been made yet.
`Here <https://github.com/jadonk/gsoc-application/pull/194>`_ is my pull request demonstrating cross-compilation and version control.
References
==========
.. [1] Petersen, F. et al. 2022. Deep Differentiable Logic Gate Networks. Proceedings of the 36th Conference on Neural Information Processing Systems (Oct. 2022).
.. [2] Engel, J. et al. 2020. DDSP: Differentiable Digital Signal Processing. Proceedings of the International Conference on Learning Representations (2020).
.. [3] Caillon, A. and Esling, P. 2021. RAVE: A variational autoencoder for fast and high-quality neural audio synthesis. arXiv.
.. [4] Riou, A. et al. 2023. PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective. Proceedings of the 24th International Society for Music Information Retrieval Conference (Sep. 2023).
.. [5] Radford, A. et al. 2023. Robust Speech Recognition via Large-Scale Weak Supervision. Proceedings of the 40th International Conference on Machine Learning (2023).
.. [6] Vatavu, R.-D. et al. 2018. $Q: a super-quick, articulation-invariant stroke-gesture recognizer for low-resource devices. Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services (New York, NY, USA, Sep. 2018), 1–12.
.. [7] Fiebrink, R. et al. 2009. A Meta-Instrument for Interactive, On-the-fly Machine Learning. Proceedings of the International Conference on New Interfaces for Musical Expression (2009), 280–285.
.. [8] Heikkilä, V.-M. 2011. Discovering novel computer music techniques by exploring the space of short computer programs. arXiv.
.. [9] Yee-King, M. and Roth, M. 2008. Synthbot: An unsupervised software synthesizer programmer. ICMC (2008).
.. [10] Macret, M. and Pasquier, P. 2014. Automatic design of sound synthesizers as pure data patches using coevolutionary mixed-typed cartesian genetic programming. Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation (New York, NY, USA, Jul. 2014), 309–316.
.. [11] Caspe, F. et al. 2022. DDX7: Differentiable FM Synthesis of Musical Instrument Sounds. Proceedings of the 23rd International Society for Music Information Retrieval Conference. (2022).
.. _gsoc-2024-proposals:
:far:`calendar-days` 2024
##########################
.. toctree::
   :maxdepth: 1

   ijc/index
   aryan_nanda/index
   roger18/index
   melta101/index