
Caerii/IS426_Audio2Spectrogram


IS426_Audio2Spectrogram

Processes audio into usable spectrograms for classification

The pipeline has several processing steps.

The program first asks which classes of data you are going to process. At the moment it only processes one large audio clip; later versions will process all the files within a directory.

This creates empty class folders, and the program then asks you to select the files on your computer that correspond to that class. (Currently only one class is processed at a time; multiclass processing is planned for the next version.)

These files are copied into the "./data" folder in the program's root directory.
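A minimal sketch of the folder creation and copy step, assuming the class name and the selected file paths have already been collected. The helper name `copy_class_files` is hypothetical and is not the repository's actual code.

```python
import shutil
from pathlib import Path


def copy_class_files(class_name, source_files, data_root="./data"):
    """Create <data_root>/<class_name> if needed and copy the selected files into it.

    Hypothetical helper: the repository's own file-selection code may differ.
    Returns the list of destination paths.
    """
    class_dir = Path(data_root) / class_name
    class_dir.mkdir(parents=True, exist_ok=True)  # empty class folder
    copied = []
    for src in source_files:
        src = Path(src)
        dest = class_dir / src.name
        shutil.copy2(src, dest)  # copies file contents and metadata
        copied.append(dest)
    return copied
```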

So you've got these audio files. There is a function, waveformIngestion(), that takes:
- the sample rate (frequency) of the audio data, such as 44100 or 48000 Hz, which determines the array size for the audio
- the URL of each audio file
- the "chunk" size that you want to break the audio down into

This is applied to every audio clip.
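The sample rate fixes how many array elements one second of audio occupies: a clip lasting d seconds at sample rate f holds f × d samples, so a 3-second chunk at 44100 Hz is 132,300 samples. The sketch below assumes the chunk size is given in seconds and that a trailing partial chunk is dropped; both are assumptions, not confirmed by the repository, and `chunk_samples` is a hypothetical name.

```python
def chunk_samples(samples, sample_rate, chunk_seconds):
    """Split a 1-D sequence of audio samples into equal, non-overlapping chunks.

    Assumptions (not from the repository): chunk size is given in seconds,
    and leftover samples shorter than one chunk are discarded.
    """
    chunk_len = int(sample_rate * chunk_seconds)  # samples per chunk
    if chunk_len <= 0:
        raise ValueError("sample_rate * chunk_seconds must be at least 1 sample")
    n_chunks = len(samples) // chunk_len  # grows in proportion to clip length
    return [samples[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]


# A 10-second clip at 44100 Hz is 441000 samples; 3-second chunks give 3 full chunks.
clip = [0.0] * (44100 * 10)
chunks = chunk_samples(clip, 44100, 3)
print(len(chunks), len(chunks[0]))  # 3 132300
```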

Next, this function autogenerates a folder structure. Each audio clip is split into chunks, so the number of chunks grows with the total length of the clip, and each clip spawns one folder containing all of its chunks.

This means that each class folder will contain many folders, corresponding to the number of examples that exist for that particular class.
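A minimal sketch of this folder layout using Python's standard `wave` module, assuming uncompressed PCM WAV input, a chunk size in seconds, and that trailing audio shorter than one chunk is dropped. The helper `write_chunk_folders` and the `chunk_000.wav` naming are hypothetical, not the repository's own conventions.

```python
import wave
from pathlib import Path


def write_chunk_folders(wav_path, class_dir, chunk_seconds):
    """Split one WAV file into fixed-length chunks stored in their own folder.

    Hypothetical helper: creates <class_dir>/<clip name>/chunk_000.wav, ...
    Trailing audio shorter than one chunk is dropped (an assumption).
    Returns the list of chunk paths written.
    """
    wav_path = Path(wav_path)
    clip_dir = Path(class_dir) / wav_path.stem  # one folder per source clip
    clip_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(wav_path), "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        chunk_frames = int(frame_rate * chunk_seconds)
        if chunk_frames <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        n_chunks = src.getnframes() // chunk_frames
        for i in range(n_chunks):
            frames = src.readframes(chunk_frames)  # reads sequentially from the clip
            out_path = clip_dir / f"chunk_{i:03d}.wav"
            with wave.open(str(out_path), "wb") as dst:
                dst.setnchannels(n_channels)
                dst.setsampwidth(sample_width)
                dst.setframerate(frame_rate)
                dst.writeframes(frames)
            written.append(out_path)
    return written
```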

As a test, this repository processes Steve Lacy's song "Bad Habits".

A copy of this folder structure is then used to generate a spectrogram for each chunk, turning every chunk into a Mel power-law spectrogram.
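How the repository computes its spectrograms isn't shown here; it may rely on a library such as librosa. As a rough, NumPy-only sketch of a mel power spectrogram for one chunk, under assumed parameters (Hann window, n_fft=2048, hop of 512 samples, 128 HTK-style mel bands), with a dB conversion as one common way to compress the range before saving an image:

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumed choice; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 equally spaced points on the mel scale, converted back to Hz
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mel_power_spectrogram(samples, sample_rate, n_fft=2048, hop=512, n_mels=128):
    """Return an (n_mels, n_frames) array of mel-band power for one chunk."""
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < n_fft:
        raise ValueError("chunk is shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    frames = np.stack([samples[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    return mel_filterbank(sample_rate, n_fft, n_mels) @ power.T


def power_to_db(mel_power, floor=1e-10):
    # Log compression so the saved image has a usable dynamic range
    return 10.0 * np.log10(np.maximum(mel_power, floor))
```

Each row of the result is one mel band and each column one time frame, which is the 2-D "image" a CNN later consumes.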

These chunks, labeled by the folder they sit in, can then be used to train a machine learning classification model based on convolutional neural networks (CNNs).
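Before training, the folder names have to be turned into numeric labels. A minimal sketch, assuming the spectrograms are saved as PNG files under a hypothetical `<root>/<class>/<clip>/` layout that mirrors the chunk folders; the resulting (path, label) pairs could then be fed to whichever CNN framework you use. `labeled_examples` is a hypothetical name.

```python
from pathlib import Path


def labeled_examples(spectrogram_root, suffix=".png"):
    """Collect (file path, integer label) pairs, where the label is the class folder.

    Assumed layout (hypothetical): <root>/<class>/<clip>/<chunk><suffix>.
    Returns (examples, class_names); class_names[label] gives the class name.
    """
    root = Path(spectrogram_root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: i for i, name in enumerate(class_names)}
    examples = []
    for name in class_names:
        # rglob descends into the per-clip subfolders inside each class folder
        for path in sorted((root / name).rglob(f"*{suffix}")):
            examples.append((path, label_of[name]))
    return examples, class_names
```

Sorting the class names keeps the label numbering stable between runs.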

Presto: we have reduced audio files to an image format from which features can be extracted and learned using advances in computer vision, without any specialized domain knowledge!
