:py:mod:`tonic`
===============

.. py:module:: tonic


Subpackages
-----------
.. toctree::
   :titlesonly:
   :maxdepth: 3

   datasets/index.rst
   functional/index.rst


Submodules
----------
.. toctree::
   :titlesonly:
   :maxdepth: 1

   audio_augmentations/index.rst
   audio_transforms/index.rst
   cached_dataset/index.rst
   collation/index.rst
   dataset/index.rst
   download_utils/index.rst
   io/index.rst
   sliced_dataset/index.rst
   slicers/index.rst
   transforms/index.rst
   utils/index.rst


Package Contents
----------------

Classes
~~~~~~~

.. autoapisummary::

   tonic.Aug_DiskCachedDataset
   tonic.CachedDataset
   tonic.DiskCachedDataset
   tonic.MemoryCachedDataset
   tonic.Dataset
   tonic.SlicedDataset


Attributes
~~~~~~~~~~

.. autoapisummary::

   tonic.__version__


.. py:class:: Aug_DiskCachedDataset


   Bases: :py:obj:`DiskCachedDataset`

   Aug_DiskCachedDataset is a child class of DiskCachedDataset with further customizations to
   handle augmented copies of a sample. The goal of this customization is to map the indices of
   cached files (copies) to augmentation parameters. This is useful for augmentations whose
   parameters take a small set of discrete, non-probabilistic values. For instance, an audio
   sample is augmented with noise and the SNR can take only N=5 values. Passing copy_index to
   the augmentation class as an init argument ensures that each copy will be a distinct augmented
   sample with a trackable parameter.

   - The ``generate_all`` method generates all augmented versions of a sample.
   - The ``generate_copy`` method generates the missing variant (augmented version).

   All transforms applied to the dataset are therefore categorized by the keys "pre_aug",
   "augmentations" and "post_aug".

   :param all_transforms: A dictionary passed to this class containing information about all transforms.
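
   The mapping from copy index to augmentation parameter can be sketched as follows. This is a
   minimal illustration, not tonic code: the class ``AddNoiseAtFixedSNR``, its arguments and the
   SNR grid are hypothetical, and the transform only tags the sample instead of mixing in noise.

   .. code-block:: python

      # Hypothetical augmentation: the class name, its arguments and the SNR
      # grid are illustrative assumptions, not part of tonic.
      SNR_VALUES_DB = [0, 5, 10, 15, 20]  # N=5 discrete parameter values


      class AddNoiseAtFixedSNR:
          """Selects its SNR from a fixed grid using the index of the cached copy."""

          def __init__(self, copy_index: int):
              self.copy_index = copy_index
              self.snr_db = SNR_VALUES_DB[copy_index % len(SNR_VALUES_DB)]

          def __call__(self, sample):
              # A real transform would mix noise into ``sample`` at ``self.snr_db``.
              # Here the sample is only tagged so that the mapping stays visible.
              return {"data": sample, "snr_db": self.snr_db}


      # One augmented variant per copy index, each with a known SNR.
      variants = [AddNoiseAtFixedSNR(copy_index=i)("clip") for i in range(5)]
      snrs = [variant["snr_db"] for variant in variants]

   Because the parameter is derived from the copy index alone, the SNR of any cached copy can be
   recovered later without storing it separately.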

   .. py:attribute:: all_transforms
      :type: dict | None

      
   .. py:method:: __post_init__()


   .. py:method:: generate_all(item)


   .. py:method:: generate_copy(item, copy)


   .. py:method:: __getitem__(item) -> tuple[object, object]


.. py:class:: CachedDataset(*args, **kwargs)


   Bases: :py:obj:`DiskCachedDataset`

   Deprecated class that points to DiskCachedDataset for now but will be removed in a future
   release.

   Please use MemoryCachedDataset or DiskCachedDataset in the future.


.. py:class:: DiskCachedDataset


   DiskCachedDataset caches the data samples to the hard drive for subsequent reads, thereby
   potentially improving data loading speeds. If dataset is None, then the length of this dataset
   will be inferred from the number of files in the caching folder. Pay attention to the cache
   path you're providing, as DiskCachedDataset will simply check if there is a file present with
   the index that it is looking for. When using train/test splits, it is wise to also take that
   into account in the cache path.

   .. note:: DiskCachedDataset cannot detect a change to the transform that is applied before caching and will
             keep serving the previously cached file. To avoid stale samples, either clear your cache folder
             manually when needed, incorporate all transformation parameters into the cache path (which creates
             a tree of cache files), or use reset_cache=True.

   .. note:: Caching PyTorch tensors writes NumPy arrays to disk, so a sample loaded from the cache is an array even if you expect a tensor. The recommendation is to defer the transform to tensor as late as possible.

   :param dataset: Dataset to be cached to disk. Can be None, if only files in cache_path should be used.
   :param cache_path: The preferred path where the cache will be written to and read from.
   :param reset_cache: When True, will clear out the cache path during initialisation. Default is False.
   :param transform: Transforms to be applied on the data.
   :param target_transform: Transforms to be applied on the label/targets.
   :param transforms: A callable of transforms that is applied to both data and labels at the same time.
   :param num_copies: Number of copies of each sample to be cached.
                      This is a useful parameter if the dataset is being augmented with slow, random transforms.
   :param compress: Whether to apply lightweight lzf compression, default is True.
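
   One way to follow the advice about cache paths is to derive the folder from the split and from
   the parameters of the transforms applied before caching. The sketch below assumes a
   hypothetical helper ``build_cache_path`` (not part of tonic) and made-up parameter names. It
   only builds a string that could then be passed as ``cache_path``.

   .. code-block:: python

      import hashlib
      import json
      import os


      def build_cache_path(root: str, split: str, transform_params: dict) -> str:
          """Hypothetical helper (not part of tonic): derive a cache folder from the
          dataset split and the parameters of the transforms applied before caching."""
          # Serialize with sorted keys so equal parameters always give the same digest.
          encoded = json.dumps(transform_params, sort_keys=True).encode("utf-8")
          digest = hashlib.sha1(encoded).hexdigest()[:10]
          return os.path.join(root, split, digest)


      train_path = build_cache_path("./cache/my_dataset", "train", {"time_window": 1000})
      test_path = build_cache_path("./cache/my_dataset", "test", {"time_window": 1000})
      changed_path = build_cache_path("./cache/my_dataset", "train", {"time_window": 2000})

   With this scheme, train and test samples never share a folder, and changing a transform
   parameter leads to a fresh folder instead of stale files.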

   .. py:attribute:: dataset
      :type: collections.abc.Iterable

      
   .. py:attribute:: cache_path
      :type: str

      
   .. py:attribute:: reset_cache
      :type: bool
      :value: False

      
   .. py:attribute:: transform
      :type: collections.abc.Callable | None

      
   .. py:attribute:: target_transform
      :type: collections.abc.Callable | None

      
   .. py:attribute:: transforms
      :type: collections.abc.Callable | None

      
   .. py:attribute:: num_copies
      :type: int
      :value: 1

      
   .. py:attribute:: compress
      :type: bool
      :value: True

      
   .. py:method:: __post_init__()


   .. py:method:: __getitem__(item) -> tuple[object, object]


   .. py:method:: __len__()


.. py:class:: MemoryCachedDataset


   MemoryCachedDataset caches the samples to memory to substantially improve data loading
   speeds. However, you have to keep a close eye on memory consumption while loading your samples,
   which can increase rapidly when converting events to rasters/frames. If your transformed
   dataset doesn't fit into memory, yet you still want to cache samples to speed up training,
   consider using `DiskCachedDataset` instead.

   :param dataset: Dataset to be cached to memory.
   :param device: Device to cache to. This is preferably a torch device. Will cache to CPU memory if None (default).
   :param transform: Transforms to be applied on the data
   :param target_transform: Transforms to be applied on the label/targets
   :param transforms: A callable of transforms that is applied to both data and labels at the same time.
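
   The general idea of caching samples in memory can be illustrated with a small stand-alone
   wrapper. This is a conceptual sketch only and not tonic's implementation: ``TinyMemoryCache``
   and ``CountingDataset`` are hypothetical, and the sketch assumes that the transform is applied
   after the cache lookup.

   .. code-block:: python

      class TinyMemoryCache:
          """Conceptual illustration only; this is not tonic's implementation."""

          def __init__(self, dataset, transform=None):
              self.dataset = dataset
              self.transform = transform
              self.samples = {}  # index -> (data, target), filled on first access

          def __getitem__(self, index):
              if index not in self.samples:
                  self.samples[index] = self.dataset[index]  # slow path, runs once
              data, target = self.samples[index]
              if self.transform is not None:
                  data = self.transform(data)  # applied after the cache lookup
              return data, target

          def __len__(self):
              return len(self.dataset)


      class CountingDataset:
          """Toy dataset that counts how often its samples are loaded."""

          def __init__(self):
              self.loads = 0

          def __getitem__(self, index):
              self.loads += 1
              return [index, index + 1], index % 2

          def __len__(self):
              return 4


      source = CountingDataset()
      cached = TinyMemoryCache(source, transform=lambda data: [2 * x for x in data])
      first = cached[1]
      second = cached[1]  # served from memory, the source is not read again

   Every cached sample stays in memory for the lifetime of the wrapper, which is why memory
   consumption grows with the number and size of the samples accessed.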

   .. py:attribute:: dataset
      :type: collections.abc.Iterable

      
   .. py:attribute:: device
      :type: str | None

      
   .. py:attribute:: transform
      :type: collections.abc.Callable | None

      
   .. py:attribute:: target_transform
      :type: collections.abc.Callable | None

      
   .. py:attribute:: transforms
      :type: collections.abc.Callable | None

      
   .. py:attribute:: samples_dict
      :type: dict

      
   .. py:method:: __getitem__(index)


   .. py:method:: __len__()


.. py:class:: Dataset(save_to: str, transform: collections.abc.Callable | None = None, target_transform: collections.abc.Callable | None = None, transforms: collections.abc.Callable | None = None)


   Base class for Tonic datasets which download public data.

   Contains a few helper functions to reduce duplicated code.

   .. py:method:: __repr__()

      Return repr(self).


   .. py:method:: download() -> None

      Downloads from a given URL, places the file into the target folder and verifies the file hash.
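
      The verification step can be pictured with a short stand-alone sketch. The helper
      ``file_matches_hash`` is hypothetical and not tonic's API, and the choice of SHA-256 is an
      assumption, since the hash algorithm is not stated here.

      .. code-block:: python

         import hashlib
         import os
         import tempfile


         def file_matches_hash(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
             """Hypothetical helper (not tonic's API): compare a file's SHA-256 digest
             with an expected hex string, reading in chunks to bound memory use."""
             digest = hashlib.sha256()
             with open(path, "rb") as handle:
                 for chunk in iter(lambda: handle.read(chunk_size), b""):
                     digest.update(chunk)
             return digest.hexdigest() == expected_hex


         # Self-contained demonstration on a temporary file.
         with tempfile.TemporaryDirectory() as folder:
             path = os.path.join(folder, "archive.bin")
             with open(path, "wb") as handle:
                 handle.write(b"hello")
             ok = file_matches_hash(path, hashlib.sha256(b"hello").hexdigest())
             corrupted = file_matches_hash(path, hashlib.sha256(b"other").hexdigest())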


.. py:class:: SlicedDataset


   The primary use case for a SlicedDataset is to cut existing examples in a dataset into
   smaller chunks. To do so, it takes an iterable dataset and a slicing method as input. It then
   generates metadata about the slices and where to find them in each original sample. The new
   dataset length is the total number of slices across all samples.

   :param dataset: A dataset object which implements the __getitem__ and __len__ methods.
   :param slicer: An object which implements the tonic.slicers.Slicer protocol, meaning that
                  it doesn't have to inherit from it but must implement all of its methods.
   :param metadata_path: File path where slice metadata should be stored, so that it does not
                         have to be recomputed the next time. If None, metadata will be recomputed
                         every time.
   :param transform: Transforms to be applied on the data
   :param target_transform: Transforms to be applied on the label/targets
   :param transforms: A callable of transforms that is applied to both data and labels at the same time.
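
   How slice metadata determines the new dataset length can be sketched without tonic. The
   class ``FixedLengthSlicer`` is hypothetical, and its method names ``get_slice_metadata`` and
   ``slice_with_metadata`` are assumptions about the protocol that should be checked against
   tonic.slicers before relying on them.

   .. code-block:: python

      class FixedLengthSlicer:
          """Hypothetical slicer that cuts a sequence into windows of equal length."""

          def __init__(self, length: int):
              self.length = length

          def get_slice_metadata(self, data, targets):
              # (start, stop) index pairs; a trailing partial window is dropped.
              stops = range(self.length, len(data) + 1, self.length)
              return [(stop - self.length, stop) for stop in stops]

          def slice_with_metadata(self, data, targets, metadata):
              return [data[start:stop] for start, stop in metadata], targets


      samples = [(list(range(10)), 0), (list(range(7)), 1)]  # toy (data, target) pairs
      slicer = FixedLengthSlicer(length=3)

      # One metadata list per original sample.
      metadata = [slicer.get_slice_metadata(data, target) for data, target in samples]

      # Flat index: slice number -> (sample index, position of the slice in that sample).
      slice_index = [
          (i, j) for i, slices in enumerate(metadata) for j in range(len(slices))
      ]
      total_slices = len(slice_index)

   In this toy case the first sample yields three slices and the second yields two, so the
   sliced view has five items although the wrapped dataset has only two samples.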

   .. py:attribute:: dataset
      :type: collections.abc.Iterable

      
   .. py:attribute:: slicer
      :type: tonic.slicers.Slicer

      
   .. py:attribute:: metadata_path
      :type: str | None

      
   .. py:attribute:: transform
      :type: collections.abc.Callable | None

      
   .. py:attribute:: target_transform
      :type: collections.abc.Callable | None

      
   .. py:attribute:: transforms
      :type: collections.abc.Callable | None

      
   .. py:method:: __post_init__()

      Will try to read metadata from disk to know where slices start and stop for each sample.

      If no metadata_path is provided or no file slice_metadata.h5 is found in that path,
      metadata will be generated from scratch.


   .. py:method:: generate_metadata()

      Slices every sample in the wrapped dataset and returns start and stop metadata for each
      slice.


   .. py:method:: __getitem__(item) -> Any


   .. py:method:: __len__()


.. py:data:: __version__