tonic.audio_augmentations

Module Contents

Classes

RandomTimeStretch

Time-stretch an audio sample by a fixed rate.

RandomPitchShift

Shift the pitch of a waveform by n_steps steps.

RandomAmplitudeScale

Scales the maximum amplitude of the incoming signal to a random amplitude chosen from a range.

AddWhiteNoise

Add white noise to the data sample with a known ratio.

RIR

Convolves a RIR (room impulse response) with the data sample.

class tonic.audio_augmentations.RandomTimeStretch

Time-stretch an audio sample by a fixed rate.

Parameters:
  • samplerate (float) – sample rate of the sample

  • sample_length (int) – sample length in seconds

  • factors (float) – range of desired factors for time stretch

  • aug_index (int) – index of the chosen factor for time stretch. It is chosen randomly from the desired range if not passed at initialization

  • caching (bool) – if we are caching, the DiskCached dataset will dynamically pass the copy index of the data item to the transform (to set aug_index). Otherwise, aug_index is chosen randomly on every call of the transform

  • fix_length (bool) – if True, the time-stretched signal is returned with a fixed length (samplerate * sample_length)

  • audio (np.ndarray) – data sample

Returns:

time-stretched data sample

Return type:

np.ndarray

samplerate: float
sample_length: int
factors: list
aug_index: int = 0
caching: bool = False
fix_length: bool = True
__call__(audio: numpy.ndarray)
Parameters:

audio (numpy.ndarray) –
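The way factors, aug_index and fix_length interact can be sketched in plain NumPy. This is an illustration only, not tonic's implementation: naive_time_stretch and fix_to_length are hypothetical helpers, the linear-interpolation resampling also changes the pitch (a real time-stretch would not), and both the convention that a factor above 1 shortens the signal and the zero-padding are assumptions.

```python
import numpy as np


def naive_time_stretch(audio, factor):
    """Resample by linear interpolation to len(audio) / factor samples.

    Assumption: factor > 1 shortens the signal. Unlike a true
    time-stretch, this simple resampling also shifts the pitch.
    """
    n_out = max(1, int(round(len(audio) / factor)))
    old_positions = np.arange(len(audio))
    new_positions = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(new_positions, old_positions, audio)


def fix_to_length(audio, samplerate, sample_length):
    """Truncate or zero-pad to samplerate * sample_length samples (assumed behaviour)."""
    target = int(samplerate * sample_length)
    if len(audio) >= target:
        return audio[:target]
    return np.pad(audio, (0, target - len(audio)))


samplerate, sample_length = 16000, 1
audio = np.sin(2 * np.pi * 440 * np.arange(samplerate) / samplerate)

factors = [0.8, 1.0, 1.25]   # hypothetical range of stretch factors
aug_index = 2                # selects factors[2] == 1.25

stretched = naive_time_stretch(audio, factors[aug_index])
fixed = fix_to_length(stretched, samplerate, sample_length)
```

With fix_length enabled, every output has the same number of samples regardless of the chosen factor, which keeps batches of augmented samples stackable.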

class tonic.audio_augmentations.RandomPitchShift

Shift the pitch of a waveform by n_steps steps.

Parameters:
  • samplerate (float) – sample rate of the sample

  • factors (float) – range of desired factors for pitch shift

  • aug_index (int) – index of the chosen factor for pitch shift. It is chosen randomly from the desired range if not passed at initialization

  • caching (bool) – if we are caching, the DiskCached dataset will dynamically pass the copy index of the data item to the transform (to set aug_index). Otherwise, aug_index is chosen randomly on every call of the transform

  • audio (np.ndarray) – data sample

Returns:

pitch-shifted data sample

Return type:

np.ndarray

samplerate: float
factors: list
aug_index: int = 0
caching: bool = False
__call__(audio: numpy.ndarray)
Parameters:

audio (numpy.ndarray) –
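The relationship between caching and aug_index described above can be sketched as follows. This is a minimal illustration under stated assumptions, not tonic's code: choose_factor is a hypothetical helper, and the way the DiskCached dataset actually hands over the copy index is not shown in this page.

```python
import random


def choose_factor(factors, caching=False, aug_index=0, rng=random):
    """Return (index, factor) for one call of a transform.

    When caching is True, the aug_index supplied from outside is used
    as-is (assumed to be set by the caching dataset). Otherwise a new
    index is drawn at random on every call.
    """
    if not caching:
        aug_index = rng.randrange(len(factors))
    return aug_index, factors[aug_index]


factors = [-2, -1, 1, 2]  # hypothetical pitch-shift steps

# Caching: the externally supplied index decides the factor.
idx_cached, factor_cached = choose_factor(factors, caching=True, aug_index=3)

# No caching: the index is re-drawn on each call.
idx_random, factor_random = choose_factor(factors, caching=False)
```

Fixing the index per cached copy means that a cached item always corresponds to the same augmentation, while the uncached path yields a fresh random choice on each call.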

class tonic.audio_augmentations.RandomAmplitudeScale

Scales the maximum amplitude of the incoming signal to a random amplitude chosen from a range.

Parameters:
  • samplerate (float) – sample rate of the sample

  • min_amp (float) – minimum of the amplitude range in volts

  • max_amp (float) – maximum of the amplitude range in volts

  • factors (float) – range of desired factors for amplitude scaling

  • aug_index (int) – index of the chosen factor for amplitude scaling. It is chosen randomly from the desired range if not passed at initialization

  • caching (bool) – if we are caching, the DiskCached dataset will dynamically pass the copy index of the data item to the transform (to set aug_index). Otherwise, aug_index is chosen randomly on every call of the transform

  • data (np.ndarray) – input (single- or multi-channel) signal.

Returns:

scaled version of the signal.

Return type:

np.ndarray

samplerate: float
min_amp: float = 0.058
max_amp: float = 0.159
factors: list
aug_index: int = 0
caching: bool = False
__post_init__()
__call__(audio: numpy.ndarray)
Parameters:

audio (numpy.ndarray) –
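Scaling the maximum amplitude to a randomly chosen target can be sketched in NumPy as below. This is an assumption-laden illustration rather than tonic's implementation: scale_peak is a hypothetical helper, the target is drawn uniformly from [min_amp, max_amp], and how the class populates factors is not covered here.

```python
import numpy as np


def scale_peak(audio, target_amp):
    """Rescale audio so that its largest absolute value equals target_amp."""
    peak = np.max(np.abs(audio))
    if peak == 0:
        # A silent signal cannot be rescaled; return it unchanged.
        return audio.copy()
    return audio * (target_amp / peak)


rng = np.random.default_rng(0)
min_amp, max_amp = 0.058, 0.159          # defaults listed for the class
target = rng.uniform(min_amp, max_amp)   # assumed sampling scheme

audio = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000))
scaled = scale_peak(audio, target)
```

Because the whole signal is multiplied by a single gain, the waveform shape is preserved and only its level changes.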

class tonic.audio_augmentations.AddWhiteNoise

Add white noise to the data sample with a known ratio.

Parameters:
  • samplerate (float) – sample rate of the sample

  • factors (float) – range of desired ratios for added noise

  • aug_index (int) – index of the chosen factor for noise. It is chosen randomly from the desired range if not passed at initialization

  • caching (bool) – if we are caching, the DiskCached dataset will dynamically pass the copy index of the data item to the transform (to set aug_index). Otherwise, aug_index is chosen randomly on every call of the transform

  • audio (np.ndarray) – data sample

Returns:

data sample with added noise

Return type:

np.ndarray

samplerate: float
factors: list
aug_index: int = 0
caching: bool = False
__call__(audio: numpy.ndarray)
Parameters:

audio (numpy.ndarray) –
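Adding white noise at a known ratio can be sketched in NumPy as follows. The page does not define the ratio, so this sketch assumes it means noise RMS divided by signal RMS; add_white_noise is a hypothetical helper and not tonic's implementation.

```python
import numpy as np


def add_white_noise(audio, ratio, rng):
    """Add Gaussian white noise whose RMS is ratio * RMS(audio).

    Assumption: 'ratio' is interpreted as noise RMS over signal RMS.
    """
    signal_rms = np.sqrt(np.mean(audio ** 2))
    noise = rng.standard_normal(audio.shape)
    # Normalise the noise so its RMS matches the requested ratio exactly.
    noise *= ratio * signal_rms / np.sqrt(np.mean(noise ** 2))
    return audio + noise


rng = np.random.default_rng(0)
audio = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000))
noisy = add_white_noise(audio, ratio=0.1, rng=rng)
```

Tying the noise level to the signal level keeps the augmentation comparable across recordings of different loudness.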

class tonic.audio_augmentations.RIR

Convolves a RIR (room impulse response) with the data sample.

Parameters:
  • samplerate (float) – sample rate of the sample

  • rir_audio (str) – path to a sample room impulse response in the .wav format

  • caching (bool) – if we are caching, the DiskCached dataset will dynamically pass the copy index of the data item to the transform (to set aug_index). Otherwise, aug_index is chosen randomly on every call of the transform

  • audio (np.ndarray) – data sample

Returns:

data sample convolved with RIR

Return type:

np.ndarray

samplerate: float
rir_audio: str
caching: bool = False
__call__(audio)
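Convolving a sample with a room impulse response can be sketched with numpy.convolve. This is an illustrative sketch, not tonic's implementation: apply_rir is a hypothetical helper, the impulse response is a tiny synthetic array instead of a .wav file loaded from rir_audio, and trimming the result to the input length is an assumption.

```python
import numpy as np


def apply_rir(audio, rir):
    """Convolve a mono signal with an impulse response, trimmed to len(audio)."""
    return np.convolve(audio, rir, mode="full")[: len(audio)]


audio = np.array([1.0, 0.0, 0.0, 0.0, 0.5])

# Synthetic impulse response: direct sound plus a quieter echo two samples later.
rir = np.array([1.0, 0.0, 0.25])

wet = apply_rir(audio, rir)  # -> [1.0, 0.0, 0.25, 0.0, 0.5]
```

An impulse response consisting of a single 1.0 leaves the signal unchanged, which is a convenient sanity check for any convolution-based augmentation.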