Real-Life Indoor Sound Event Dataset (ReaLISED) for Sound Event Classification (SEC)

  1. Mohino-Herranz, Inma 1
  2. García-Gómez, Joaquín 2
  3. Aguilar-Ortega, Miguel 2
  4. Utrilla-Manso, Manuel 2
  5. Gil-Pita, Roberto 2
  6. Rosa-Zurera, Manuel 2
  1. University of Alcalá; National Institute of Aerospace Technology (INTA)
  2. Universidad de Alcalá, Alcalá de Henares, Spain (ROR: https://ror.org/04pmn0e78)

Publisher: Zenodo

Year of publication: 2022

Type: Dataset

License: CC BY 4.0

Abstract

The Real-Life Indoor Sound Event Dataset (ReaLISED) offers the scientific community the possibility of testing Sound Event Classification (SEC) algorithms with new recordings of real indoor audio events. The full set comprises 2479 recordings of isolated sounds from 18 event classes, with a total duration of 3624.51 seconds. The 18 event classes are: beater, cooking, cupboard/wardrobe, dishwasher, door, drawer, furniture movement, microwave, object falling, smoke extractor, speech, switch, television, vacuum cleaner, walking, washing machine, water tap, and window. The number of events per class ranges from 104 for the “Window” class to 190 for the “Speech” class, with a mean of 138 events and a standard deviation of 25.

Four Olympus LS-100 recorders were used. Audio was recorded in stereo at a sampling frequency of 44.1 kHz with 24 bits per sample, using the medium microphone sensitivity setting. The distance between the recorder and the sound source was approximately 30-40 cm.

In addition to the event class label, extra information that completes the description of the sound source is provided for each recording, so that it can be exploited in future work with other research purposes.

The dataset is distributed as the set of .flac files that compose it. Each file name is built from 5 pieces of information separated by underscores (“_”), in the format “abc_123_45_67_8.flac”:

“abc”: the first three letters indicate the source that produces the sound. This segment can take 18 different values: ‘bea’ (beater), ‘coo’ (cooking), ‘cup’ (cupboard/wardrobe), ‘dis’ (dishwasher), ‘doo’ (door), ‘dra’ (drawer), ‘fur’ (furniture movement), ‘mic’ (microwave), ‘obj’ (object falling), ‘smo’ (smoke extractor), ‘spe’ (speech), ‘swi’ (switch), ‘tel’ (television), ‘vac’ (vacuum cleaner), ‘wal’ (walking), ‘was’ (washing machine), ‘wat’ (water tap), and ‘win’ (window).

“123”: this set of digits identifies the event among those produced by the source identified with “abc”. This segment can take any value between ‘001’ and ‘190’, the latter being the largest number of events of any single class in the dataset (speech).

“45”: this set of digits identifies the action that produces the sound. This segment can take 11 different values: ‘01’ (close), ‘02’ (open), ‘03’ (throw), ‘04’ (turn on), ‘05’ (turn off), ‘06’ (move), ‘07’ (plug), ‘08’ (unplug), ‘09’ (raise), ‘10’ (lower), and ‘00’ (there is no information about the action).

“67”: this set of digits identifies the material the sound source is made of. This segment can take 14 different values: ‘01’ (wood), ‘02’ (glass), ‘03’ (metal), ‘04’ (plastic), ‘05’ (ceramic), ‘06’ (synthetic), ‘07’ (cardboard), ‘08’ (marble), ‘09’ (floating platform), ‘10’ (platelet), ‘11’ (wicker), ‘12’ (carpet), ‘13’ (medium-density fibreboard, MDF), and ‘00’ (there is no information about the material).

“8”: the last digit gives approximate information about the intensity of the recorded sound. It can take 4 different values: ‘1’ (low intensity), ‘2’ (medium intensity), ‘3’ (high intensity), and ‘0’ (there is no information about the intensity).

For clarity, some examples of audio file names using this code are shown below:

“doo_040_02_00_3.flac” is the name of the 40th file in the door class, described as “opening a door of unknown material with high intensity”.

“fur_058_06_01_2.flac” is the name of the 58th file in the furniture movement class, described as “moving a piece of wooden furniture with medium intensity”.

“vac_001_00_00_0.flac” is the name of the 1st audio file in the vacuum cleaner class, described as “using the vacuum cleaner, with no information about the action, the material, or the intensity”.
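
The naming scheme described above can be decoded mechanically. The following is a minimal Python sketch of such a decoder, not a tool shipped with the dataset. The lookup tables are transcribed from the code lists in this abstract, while the names `ClipInfo` and `parse_realised_name` are hypothetical and introduced only for illustration. The sketch assumes it receives a bare file name (no directory part) and that a code of ‘00’ or ‘0’ should be reported as missing information (`None`).

```python
import re
from typing import NamedTuple, Optional

# Lookup tables transcribed from the code lists in the abstract.
SOURCES = {
    "bea": "beater", "coo": "cooking", "cup": "cupboard/wardrobe",
    "dis": "dishwasher", "doo": "door", "dra": "drawer",
    "fur": "furniture movement", "mic": "microwave", "obj": "object falling",
    "smo": "smoke extractor", "spe": "speech", "swi": "switch",
    "tel": "television", "vac": "vacuum cleaner", "wal": "walking",
    "was": "washing machine", "wat": "water tap", "win": "window",
}
ACTIONS = {
    "00": None, "01": "close", "02": "open", "03": "throw", "04": "turn on",
    "05": "turn off", "06": "move", "07": "plug", "08": "unplug",
    "09": "raise", "10": "lower",
}
MATERIALS = {
    "00": None, "01": "wood", "02": "glass", "03": "metal", "04": "plastic",
    "05": "ceramic", "06": "synthetic", "07": "cardboard", "08": "marble",
    "09": "floating platform", "10": "platelet", "11": "wicker",
    "12": "carpet", "13": "medium-density fibreboard (MDF)",
}
INTENSITIES = {"0": None, "1": "low", "2": "medium", "3": "high"}

# Pattern for the "abc_123_45_67_8.flac" layout.
_NAME_RE = re.compile(r"^([a-z]{3})_(\d{3})_(\d{2})_(\d{2})_(\d)\.flac$")


class ClipInfo(NamedTuple):
    """Hypothetical container for the decoded fields (illustration only)."""
    source: str
    index: int
    action: Optional[str]      # None when the code is '00'
    material: Optional[str]    # None when the code is '00'
    intensity: Optional[str]   # None when the code is '0'


def parse_realised_name(filename: str) -> ClipInfo:
    """Decode a bare ReaLISED-style file name into its five fields."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a ReaLISED-style file name: {filename!r}")
    src, idx, act, mat, inten = match.groups()
    try:
        return ClipInfo(SOURCES[src], int(idx), ACTIONS[act],
                        MATERIALS[mat], INTENSITIES[inten])
    except KeyError as exc:
        # The layout matched, but one code is not in the documented lists.
        raise ValueError(f"unknown code {exc.args[0]!r} in {filename!r}") from None


print(parse_realised_name("doo_040_02_00_3.flac"))
# ClipInfo(source='door', index=40, action='open', material=None, intensity='high')
```

Raising `ValueError` for undocumented codes, rather than silently returning `None`, keeps "no information" (an explicit ‘00’ or ‘0’ code) distinct from a malformed or unexpected file name.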