Microphone

⚠️

This documentation is under construction. We currently use the ReSpeaker Mic Array v2.0 for our robots.

Hardware Setup

We use the ReSpeaker Mic Array v2.0 (product link), which features:

4 high-performance digital microphones
12 RGB LEDs
USB connectivity
Built-in algorithms for DOA (Direction of Arrival) and beamforming

The microphone connects to the Jetson Orin via USB.

Recording Audio

Here’s a basic example of recording audio using the microphone:

import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 44100  # Hz; shared by recording and file writing so the two always match

def record_audio(duration=10.0, sample_rate=SAMPLE_RATE, channels=1):
    """Record audio from the default microphone.

    Args:
        duration (float): Recording duration in seconds
        sample_rate (int): Sample rate in Hz
        channels (int): Number of audio channels

    Returns:
        numpy.ndarray: Recorded samples with shape (frames, channels)
    """
    print("Recording...")
    audio_data = sd.rec(
        int(duration * sample_rate),
        samplerate=sample_rate,
        channels=channels,
        blocking=True,  # wait until the recording has finished
    )
    print("Done recording!")
    return audio_data

# Example usage: record 10 seconds and save them as a WAV file
audio = record_audio()
sf.write("recorded_audio.wav", audio, SAMPLE_RATE)

Device Selection

The default audio device is usually correct, but you can:

List available devices: sd.query_devices()
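Set a specific device: sd.default.device = device_id (a device index or a substring of the device name; assign an (input, output) pair to set the two separately)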
Get current device info: sd.query_devices(sd.default.device[0])
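
When several audio devices are attached, selecting the ReSpeaker by name is less fragile than hard-coding an index. The following is a minimal sketch, assuming the array's device name contains the substring "ReSpeaker" (check the output of sd.query_devices() on your machine to confirm). find_input_device and select_respeaker are hypothetical helper names, not part of sounddevice.

```python
def find_input_device(devices, name_fragment="ReSpeaker"):
    """Return the index of the first input-capable device whose name matches.

    `devices` is a sequence of dict-like entries with "name" and
    "max_input_channels" keys, which is the shape of the entries
    returned by sd.query_devices().
    """
    for index, info in enumerate(devices):
        has_inputs = info["max_input_channels"] > 0
        if has_inputs and name_fragment.lower() in info["name"].lower():
            return index
    return None  # no matching input device found


def select_respeaker():
    """Make the ReSpeaker the default input device, if it is present."""
    import sounddevice as sd  # imported lazily so the helper above has no dependencies

    index = find_input_device(sd.query_devices())
    if index is not None:
        # Keep the current output device and change only the input device.
        sd.default.device = (index, sd.default.device[1])
    return index
```

Calling select_respeaker() before record_audio() would route the recording through the array; it returns None when no matching device is found, in which case the default device stays unchanged.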

Audio Signal Path

For the ReSpeaker MEMS microphones:

Microphone Array → MP34DT01 MEMS microphones capture audio
Digital Processing → Outputs PDM (Pulse-Density Modulation) data
USB Interface → Handles data transfer to the Jetson
Software Stack

Physical Principles of Audio Recording

MEMS Microphone Technology

The ReSpeaker uses MEMS (Micro-Electro-Mechanical Systems) microphones, specifically the MP34DT01. Here’s how they work:

Mechanical Structure
- A thin membrane (diaphragm) suspended over a fixed backplate
- Forms a variable capacitor where the membrane moves with sound waves
- Typical membrane size: 0.5-1mm diameter, ~1μm thick

Sound to Electrical Conversion

Sound Wave → Membrane Vibration → Capacitance Change → Electrical Signal
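
To get a feel for the capacitance change in this chain, the membrane and backplate can be approximated as an ideal parallel-plate capacitor, C = ε₀A/g. This is only a rough sketch: the 2 μm air gap and the 10 nm deflection are assumed illustrative values, not MP34DT01 specifications, and plate_capacitance is a hypothetical helper.

```python
import math

EPSILON_0 = 8.854e-12  # vacuum permittivity in F/m (air is very close to this)


def plate_capacitance(diameter_m, gap_m):
    """Capacitance of an ideal circular parallel-plate capacitor in air."""
    area = math.pi * (diameter_m / 2) ** 2
    return EPSILON_0 * area / gap_m


# 1 mm membrane (upper end of the typical range); the 2 um gap is an assumed value.
rest = plate_capacitance(1e-3, 2.0e-6)
# Sound pressure pushes the membrane 10 nm closer to the backplate (assumed deflection).
deflected = plate_capacitance(1e-3, 2.0e-6 - 10e-9)

print(f"rest: {rest * 1e12:.3f} pF, deflected: {deflected * 1e12:.3f} pF")
```

Under these assumptions the capacitance is a few picofarads and changes by a fraction of a percent, which is why the signal has to be amplified and converted close to the membrane.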

Digital Output
- The MP34DT01 is a digital MEMS mic: instead of an analog voltage, it outputs PDM (Pulse Density Modulation)
- PDM represents analog amplitude through pulse density:
```
Analog: ～～～～
PDM:   _▄_▄▄_▄_▄▄▄_▄  (Higher amplitude = more pulses)
```
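
The relationship between amplitude and pulse density can be reproduced with a first-order sigma-delta modulator. This is a textbook stand-in used for illustration; the modulator actually implemented inside the MP34DT01 is not described here, and pdm_encode and pulse_density are hypothetical names.

```python
def pdm_encode(samples):
    """Encode samples in [-1.0, 1.0] as a list of 0/1 bits (first-order sigma-delta)."""
    bits = []
    integrator = 0.0
    feedback = 0.0  # previous output bit mapped to -1.0 or +1.0 (0.0 before the first bit)
    for x in samples:
        # Accumulate the error between the input and what the output has represented so far.
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def pulse_density(bits):
    """Fraction of bits that are 1."""
    return sum(bits) / len(bits)


loud = pdm_encode([0.5] * 1000)    # constant positive amplitude
quiet = pdm_encode([-0.5] * 1000)  # constant negative amplitude
print(pulse_density(loud), pulse_density(quiet))
```

For a constant input x, the density of ones settles near (x + 1) / 2, so the positive input yields a denser pulse train than the negative one.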

Array Processing

The ReSpeaker’s 4-mic array enables:

Beamforming

Mic 1: ----sound1----→
Mic 2: ---sound1-delay→  → Digital Processing → Enhanced Signal
Mic 3: --sound1-delay2→
Mic 4: -sound1-delay3-→

Uses the arrival-time differences between mics
Constructively combines signals arriving from the desired direction
Destructively attenuates signals arriving from other directions
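
The simplest form of this idea is delay-and-sum beamforming: shift each channel by its expected arrival delay, then average. The sketch below assumes the per-mic delays are known in advance and are whole numbers of samples; the ReSpeaker's built-in algorithm is not documented here and is likely more sophisticated. delay_and_sum is a hypothetical helper.

```python
import math


def delay_and_sum(channels, delays):
    """Align channels by their arrival delays (in samples) and average them.

    channels: list of equal-length sample lists, one per mic
    delays:   non-negative integer delay of each channel relative to the earliest mic
    """
    n = len(channels[0])
    out_len = n - max(delays)  # only the region where every channel has data
    output = []
    for i in range(out_len):
        # Reading channel k at i + delays[k] undoes its delay.
        total = sum(ch[i + d] for ch, d in zip(channels, delays))
        output.append(total / len(channels))
    return output


# Simulate one source reaching four mics 0, 1, 2 and 3 samples late.
signal = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
delays = [0, 1, 2, 3]
channels = [[0.0] * d + signal[: len(signal) - d] for d in delays]

aligned = delay_and_sum(channels, delays)
```

With the correct delays the four copies add in phase and `aligned` reproduces the source signal; steering with delays that do not match the source would average misaligned copies and reduce its amplitude.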

Direction of Arrival (DOA)

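Estimates the sound source angle from the time delay between a pair of mics, assuming a distant (far-field) source:

θ = arcsin(c × Δt / d)
where:
θ = angle of arrival, measured from the broadside (perpendicular) of the mic pair
c = speed of sound (~343 m/s in air at 20 °C)
Δt = time delay between the two mics (s)
d = distance between the two mics (m)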
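
A small sketch of this formula for a single mic pair follows. It assumes a speed of sound of 343 m/s and that the angle is measured from the broadside (perpendicular) of the pair; doa_angle_deg is a hypothetical helper, and the ratio is clamped because measurement noise can push c × Δt / d slightly outside [-1, 1].

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def doa_angle_deg(delay_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Angle of arrival in degrees for one mic pair, from the inter-mic delay."""
    ratio = c * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # keep asin() in its valid domain
    return math.degrees(math.asin(ratio))


spacing = 0.0436  # m, the approximate ReSpeaker mic spacing
# A delay equal to half the maximum possible delay (d / c) for this spacing.
half_max_delay = 0.5 * spacing / SPEED_OF_SOUND
angle = doa_angle_deg(half_max_delay, spacing)
print(f"{angle:.1f} degrees")
```

A delay of zero corresponds to a source straight ahead of the pair, and the largest physically possible delay, d / c, corresponds to a source on the line through both mics.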

Sampling Process

Temporal Sampling
```
Continuous: ～～～～～
Sampled:    • • • • •
```
- 44.1kHz standard (captures up to 22.05kHz per Nyquist)
- Each sample is typically 16-bit (65,536 amplitude levels)

Spatial Sampling
- 4 mics in a circular array
- Spatial Nyquist frequency: f = c/(2d); above it, direction estimates become ambiguous (spatial aliasing)
- Here d is the mic spacing (~4.36 cm on the ReSpeaker)
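
The numbers in this section can be checked with a few lines of arithmetic. The sketch assumes a speed of sound of 343 m/s; the function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def temporal_nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def quantization_levels(bits):
    """Number of distinct amplitude values for a given bit depth."""
    return 2 ** bits


def spatial_nyquist_hz(mic_spacing_m, c=SPEED_OF_SOUND):
    """Frequency whose half-wavelength equals the mic spacing: f = c / (2d)."""
    return c / (2 * mic_spacing_m)


print(temporal_nyquist_hz(44100))        # Hz
print(quantization_levels(16))           # levels
print(round(spatial_nyquist_hz(0.0436)))  # Hz, for a 4.36 cm spacing
```

With a 4.36 cm spacing the spatial Nyquist frequency comes out at roughly 3.9 kHz, well below the temporal Nyquist limit, so spacing rather than sample rate limits unambiguous direction estimation at high frequencies.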

Digital Signal Chain

                                            ┌→ Beamforming
Sound → MEMS → PDM → Decimation Filter → PCM┼→ DOA
                                            └→ Audio Output
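
The decimation step in this chain turns the 1-bit PDM stream into multi-bit PCM samples by low-pass filtering and downsampling. The sketch below uses the crudest possible filter, a block average over 64 bits; real decimators typically use CIC and FIR filter stages, and the decimation factor of 64 is an assumed value. pdm_to_pcm is a hypothetical helper.

```python
def pdm_to_pcm(bits, decimation=64):
    """Convert a PDM bit list to PCM samples in [-1.0, 1.0] by block averaging.

    Each group of `decimation` bits becomes one PCM sample; trailing bits
    that do not fill a whole group are dropped.
    """
    pcm = []
    for start in range(0, len(bits) - decimation + 1, decimation):
        block = bits[start:start + decimation]
        density = sum(block) / decimation  # fraction of ones in this block
        pcm.append(2.0 * density - 1.0)    # map density 0..1 to amplitude -1..1
    return pcm


# A stream with 75% ones represents a constant amplitude of +0.5.
samples = pdm_to_pcm([1, 1, 1, 0] * 32, decimation=64)
print(samples)
```

The resulting PCM samples are what the beamforming, DOA and audio output branches operate on.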
