Speaker
This documentation is under construction and incomplete. We are in the process of choosing a speaker that works well with our robots.
At the moment, we are using this speaker from Digikey (1). It can be wired up to the Jetson Orin as such:
Here is an example of code to play a wav file on the speaker:
import numpy as np
import sounddevice as sd
import soundfile as sf
sound, sr = sf.read("test.wav")  # sound is a numpy array; sr is the sampling rate
sd.play(sound, sr, blocking=True)  # blocking=True waits until playback finishes
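The playback example assumes a test.wav already exists. If no recording is at hand, a short test tone can be generated with the Python standard library alone. This is a minimal sketch: the helper names tone_wav_bytes and write_test_tone, and the tone parameters, are illustrative choices and not part of our codebase.

```python
import io
import math
import struct
import wave

def tone_wav_bytes(freq_hz=440.0, seconds=1.0, rate_hz=44100, amplitude=0.3):
    """Return the bytes of a mono 16-bit PCM WAV file containing a sine tone."""
    n_frames = int(rate_hz * seconds)
    frames = bytearray()
    for i in range(n_frames):
        sample = amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz)
        frames += struct.pack("<h", int(sample * 32767))  # little-endian int16
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wav.setframerate(rate_hz)  # stored in the header; sf.read returns it as sr
        wav.writeframes(bytes(frames))
    return buf.getvalue()

def write_test_tone(path="test.wav", **kwargs):
    """Write the tone to disk so that sf.read(path) has something to load."""
    with open(path, "wb") as f:
        f.write(tone_wav_bytes(**kwargs))
```

Calling write_test_tone() once creates a one-second 440 Hz test.wav in the current directory, which the playback code can then read.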
The default device that sounddevice uses is most often the right one. Even if two speakers are used, they will both be plugged into one amp. If additional speakers or mics are plugged in, or if the default selection is incorrect, you can list the available devices using sd.query_devices() and select one using sd.default.device = ...
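One way to pick a device by name instead of hard-coding an index is sketched below. The helper names find_output_device and use_output_device are hypothetical; the sketch assumes the entries returned by sd.query_devices() are dicts with "name" and "max_output_channels" keys, and that the position in the list is the device index.

```python
def find_output_device(devices, name_substring):
    """Return the index of the first output-capable device whose name contains
    name_substring (case-insensitive), or None if nothing matches."""
    needle = name_substring.lower()
    for index, dev in enumerate(devices):
        if dev["max_output_channels"] > 0 and needle in dev["name"].lower():
            return index
    return None

def use_output_device(name_substring):
    """Make the matching device the sounddevice default (needs PortAudio)."""
    import sounddevice as sd  # imported lazily so the helper above has no dependencies

    index = find_output_device(sd.query_devices(), name_substring)
    if index is None:
        raise ValueError(f"no output device matching {name_substring!r}")
    sd.default.device = index  # a single value applies to both input and output
    return index
```

For example, use_output_device("usb") would select the first output device with "usb" in its name. The actual name of our amp as reported by the Jetson is not stated here, so check the output of sd.query_devices() first.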
From Bits to Sound
This is the path sound takes on a Linux system.
- Sound Server: Sits between applications and the driver and multiplexes their audio streams. Examples include PulseAudio, PipeWire, and JACK.
- Driver: The Advanced Linux Sound Architecture (ALSA) drives the digital to analog conversion and writes the bytes to the output.
- Sound Card: Acts as the Digital to Analog Converter (DAC), controlled by the driver. There may be additional hardware here, such as amps and pre-amps.
- Speaker: Converts the electrical signal to sound.
Our implementation uses soundfile and sounddevice, Python libraries for reading audio files and for playing them, respectively. sounddevice interacts with the sound card through PortAudio, an abstraction layer on top of ALSA. These can be set up on a fresh Jetson using sudo apt-get install libportaudio2 and pip install soundfile sounddevice. ALSA is built into the kernel on the Jetson and does not need to be installed.
PortAudio lives here: github. ALSA lives here: github.
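To confirm that the pieces described above are in place before trying to play anything, a quick check along these lines may help. It is only a sketch: audio_stack_report is a hypothetical helper, and it only checks that the PortAudio shared library and the two Python packages can be found, not that audio actually comes out of the speaker.

```python
import ctypes.util
import importlib.util

def audio_stack_report():
    """Report whether PortAudio and the Python audio packages can be located."""
    return {
        # Name of the shared library (e.g. from libportaudio2), or None if missing
        "portaudio": ctypes.util.find_library("portaudio"),
        # True if the package is importable in the current environment
        "sounddevice": importlib.util.find_spec("sounddevice") is not None,
        "soundfile": importlib.util.find_spec("soundfile") is not None,
    }

for name, status in audio_stack_report().items():
    print(f"{name}: {status if status else 'NOT FOUND'}")
```

If portaudio is reported as not found, installing libportaudio2 is the likely fix. If a package is missing, check that the right conda environment is active.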
Other Terms
- Sampling Rate = how many times per second the signal is sampled. 44.1 kHz is common for audio, 48 kHz for video, and higher-specification audio uses higher sampling rates such as 96 kHz.
- Multiplexing = the process of combining multiple signals (e.g. Spotify and YouTube) into a single signal so they play at the same time. May require resampling if the sampling rates are different.
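The two terms above can be illustrated with a toy mixer. This is a simplified sketch, not how PulseAudio or PipeWire actually do it: it resamples with plain linear interpolation and mixes by summing and clipping. All function names are made up for the example.

```python
import math

def sine(freq_hz, rate_hz, millis):
    """Generate a sine tone as a list of floats in [-1, 1]."""
    n = rate_hz * millis // 1000  # number of samples = rate * duration
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (simple, not high fidelity)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate        # fractional position in the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)   # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

def mix(a, b):
    """Sum two streams sample by sample and clip the result to [-1, 1]."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))  # pad the shorter stream with silence
    b = b + [0.0] * (n - len(b))
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]

music = sine(440, 44100, 10)   # 10 ms at 44.1 kHz -> 441 samples
video = sine(880, 48000, 10)   # 10 ms at 48 kHz   -> 480 samples
music_48k = resample_linear(music, 44100, 48000)  # now 480 samples
mixed = mix(music_48k, video)  # one 48 kHz stream carrying both tones
```

The same 10 ms of audio takes 441 samples at 44.1 kHz but 480 at 48 kHz, which is why the streams have to be brought to a common rate before they can be added together.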
Set up
Flash the Jetson with the default NVIDIA image: https://docs.nvidia.com/sdk-manager/download-run-sdkm/index.html
- Download the NVIDIA SDK Manager. The host machine must run Ubuntu 20.04.
- For the Orin, flash by shorting the GND and FC REC pins while the power jack is plugged in, then connect over USB Type-C. For the AGX, just connect over USB-C (it must be the port next to the pins).
- Select JetPack 6.0 and follow the steps. On step 2, check everything.
- Downloading and installing can take about 1 hour.
(download miniconda)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh
chmod +x Miniconda3-latest-Linux-aarch64.sh
./Miniconda3-latest-Linux-aarch64.sh
source ~/.bashrc
(download your code using soundfile, sounddevice)
git clone https://github.com/kscalelabs/functions.git
cd functions
git branch -r
git checkout -b audio
(download dependencies)
conda create --name func_conda python=3.8.19
conda activate func_conda
pip install numpy sounddevice soundfile
sudo apt-get install libportaudio2
Then run! Note that the automatically chosen sd.default.device is usually the right one, so comment out the line that sets it unless you run into issues.
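As an alternative to commenting the device line in and out, the override could be made an optional command-line argument. The script below is a hypothetical sketch, not the code in the functions repository: the argument names are invented, and it assumes sounddevice and soundfile are installed as described above.

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Play a WAV file on the speaker.")
    parser.add_argument("path", nargs="?", default="test.wav",
                        help="audio file to play")
    parser.add_argument("--device", default=None,
                        help="device index or name substring; omit to use the default")
    return parser.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)
    # Imported here so that --help works even if the audio stack is not set up yet
    import sounddevice as sd
    import soundfile as sf

    if args.device is not None:
        # Indices arrive as strings from the command line; names are passed through
        sd.default.device = int(args.device) if args.device.isdigit() else args.device
    sound, sr = sf.read(args.path)
    sd.play(sound, sr, blocking=True)
```

With a call to main() at the bottom of the file, running the script with no arguments uses the default device, and passing --device 2 (or a name) is only needed when the default turns out to be wrong.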