Backup of the paper's README.md

Baby-Crying-Detection-Based-on-Audio-and-Video-Fusion

Introduction

This repository contains the dataset and code for the paper Research of Infant Crying Detection Method Based on Audio and Video Fusion. We have created a public dataset of baby crying and provide the code showing how we improved baby-crying detection accuracy. The following sections describe how to use the data and how to detect crying with the audio and video fusion method.

Dataset

We collected data from the AudioSet [1] project, whose aim is to provide a standard dataset for audio analysis. First, we obtained YouTube URLs and time intervals from AudioSet, which gave 2300 samples that had not yet been carefully processed. Next, we processed those samples and collected counter-examples from UCF101 [2]. The resulting data distribution is shown in the table below. Every video sample lasts 4 s; every audio sample also lasts 4 s and corresponds to the content of its video.

| class         | baby crying | not baby crying | total |
|---------------|-------------|-----------------|-------|
| audio dataset | 1200        | 1300            | 2500  |
| video dataset | 1200        | 1300            | 2500  |

You can download the unprocessed raw data from here. We have also put the data on Baidu Cloud: baiducloud, extraction code: sjpq.

Download the processed video data from here.

Download the processed audio data from here.

A suggestion: you can build your own work on the raw dataset, or place the processed audio and video data at the correct file paths and run our code directly.

How to run our code?

Tip:

We developed our code in Google Colab, so only a .ipynb file is published. The final push also contains some fixes relative to the paper. The aim is to provide the dataset for your own work and to give you some example code if you want to use the audio and video fusion method.

Framework


1

This directory contains the processed video data. It is named 3DCNN because we first created it for the 3D-convolution example.

2

The SVM directory contains the processed audio data; it is used first in the SVM example. Both directories 1 and 2 have two classes: crying and no crying.

3

We processed our data with the ffmpeg library; however, that processing code has been moved into finishwork.ipynb, so this file is only a processing demo and is not needed to run our work.
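As a rough illustration of that preprocessing, the sketch below cuts a 4 s video clip and extracts its audio track with the ffmpeg command-line tool. The file names, directory layout, and 16 kHz sample rate are assumptions for this example, not the exact settings in finishwork.ipynb.

```python
import subprocess

def cut_clip(src, start, out_video, out_audio, duration=4):
    """Cut a 4 s clip from src and save the video and audio parts separately."""
    # Video-only clip (the audio track is saved separately below).
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start), "-t", str(duration),
         "-i", src, "-an", out_video],
        check=True,
    )
    # Audio track as a mono 16 kHz WAV file.
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start), "-t", str(duration),
         "-i", src, "-vn", "-ac", "1", "-ar", "16000", out_audio],
        check=True,
    )

# Hypothetical example: one raw crying video split into the two processed datasets.
cut_clip("raw/crying_0001.mp4", 12.0,
         "3DCNN/crying/crying_0001.mp4",
         "SVM/crying/crying_0001.wav")
```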

4

This text file contains all the metadata of the baby-crying dataset; you can download the 2300 videos listed in a.text. Each entry has the YouTube ID, the crying start time, and the crying end time. We hope this file helps you trace the original sources.
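If you want to fetch the original clips yourself, something like the sketch below may help. It assumes each line of a.text holds a YouTube ID, a start time, and an end time separated by whitespace (the real delimiter may differ), and it uses the third-party yt-dlp command-line tool, which is not part of our code.

```python
import subprocess

# Assumed format of a.text: "<youtube_id> <start_seconds> <end_seconds>" per line.
with open("a.text") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        video_id, start, end = parts[0], parts[1], parts[2]
        url = f"https://www.youtube.com/watch?v={video_id}"
        # Download the full video; the crying segment can then be cut out
        # with ffmpeg using the start/end times (see the sketch under item 3).
        subprocess.run(
            ["yt-dlp", "-f", "mp4", "-o", f"raw/{video_id}.mp4", url],
            check=False,  # some videos may no longer be available
        )
```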

5

This is the key file of our work; almost all of the work is in this file, but it is not the final version used in the paper. For the final version, please contact QQ 2425497621, because it involves cooperation with a hospital.

Steps (run finishwork.ipynb in Colab)

**step 1: process the data**

First, we use four code cells to process the raw data (compare the ffmpeg sketch under item 3 above); this generates the 3DCNN and SVM directories.

**step 2: extract the input data**

We extract features and network inputs from the 3DCNN and SVM directories and save them.
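As a hedged sketch of what this step can look like, the snippet below computes MFCC features from a 4 s audio clip with librosa and stacks evenly sampled gray-scale frames from the matching video with OpenCV. The exact features, frame counts, and array shapes used in finishwork.ipynb may differ.

```python
import cv2
import librosa
import numpy as np

def audio_features(wav_path, sr=16000, n_mfcc=20):
    """MFCC matrix for one clip (shape: n_mfcc x time frames)."""
    y, _ = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def video_frames(mp4_path, n_frames=16, size=(112, 112)):
    """Evenly sampled gray-scale frames (shape: n_frames x H x W)."""
    cap = cv2.VideoCapture(mp4_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(cv2.resize(gray, size))
    cap.release()
    idx = np.linspace(0, len(frames) - 1, n_frames).astype(int)
    return np.stack([frames[i] for i in idx])

# Hypothetical file names matching the directory layout above.
mfcc = audio_features("SVM/crying/crying_0001.wav")
clip = video_frames("3DCNN/crying/crying_0001.mp4")
np.save("features/crying_0001_audio.npy", mfcc)
np.save("features/crying_0001_video.npy", clip)
```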

**step 3: train SVM**

Train on the data with the SVM method.
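A minimal version of this step with scikit-learn could look like the sketch below, where X holds one saved audio feature vector per clip and y the crying / not-crying labels. The kernel and parameters are illustrative, not the values tuned in our experiments, and the feature files are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# X: one feature vector per audio clip (e.g. averaged MFCCs); y: 1 = crying, 0 = not crying.
X = np.load("features/audio_X.npy")
y = np.load("features/audio_y.npy")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print("SVM test accuracy:", clf.score(X_test, y_test))
```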

**step 4: train CNN**

Train on the data with the CNN method.
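One way to build such a CNN in tensorflow.keras is sketched below, treating each MFCC matrix as a single-channel image. The 20 x 126 input shape and the layer sizes are assumptions for illustration, not the architecture from finishwork.ipynb.

```python
import tensorflow as tf

# Hypothetical input: a 20 x 126 MFCC matrix per 4 s clip, one channel.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20, 126, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # crying vs. not crying
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_x, train_y, validation_split=0.1, epochs=20, batch_size=32)
```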

**step 5: train 3DCNN**

Train on the data with the 3DCNN method. In this cell we first solve the problem of timing synchronization; for details, please read the function read_data().
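A minimal tensorflow.keras Conv3D model over stacks of synchronized frames might look like the sketch below. The frame count, frame size, and layer sizes are assumptions, and the timing-synchronization logic itself lives in the notebook's read_data() and is not reproduced here.

```python
import tensorflow as tf

# Hypothetical input: 16 gray-scale frames of 112 x 112 pixels per 4 s clip.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16, 112, 112, 1)),
    tf.keras.layers.Conv3D(16, kernel_size=3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling3D(pool_size=(1, 2, 2)),
    tf.keras.layers.Conv3D(32, kernel_size=3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling3D(pool_size=(2, 2, 2)),
    tf.keras.layers.GlobalAveragePooling3D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```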

**step 6: train LSTM**

Train on the data with the LSTM method.
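A hedged sketch of an LSTM over the MFCC time steps (one time step per MFCC frame) is shown below; the 126 x 20 input shape and the layer sizes are again assumptions for illustration.

```python
import tensorflow as tf

# Hypothetical input: 126 time steps of 20 MFCC coefficients per 4 s clip.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(126, 20)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```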

**step 7: train the audio and video fusion method**

Train on the data with the audio and video fusion method. There are four fusion methods, implemented with the Keras framework. We suggest using tensorflow.keras, which also supports TPU.
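The sketch below shows just one possible late-fusion pattern in tensorflow.keras, concatenating an audio branch and a video branch before a joint classifier. It is an illustration under assumed input shapes, not one of the four fusion networks from finishwork.ipynb.

```python
import tensorflow as tf

# Audio branch: LSTM over MFCC frames (assumed shape 126 x 20).
audio_in = tf.keras.layers.Input(shape=(126, 20), name="audio")
a = tf.keras.layers.LSTM(64)(audio_in)

# Video branch: Conv3D over frame stacks (assumed shape 16 x 112 x 112 x 1).
video_in = tf.keras.layers.Input(shape=(16, 112, 112, 1), name="video")
v = tf.keras.layers.Conv3D(16, 3, activation="relu", padding="same")(video_in)
v = tf.keras.layers.MaxPooling3D((2, 2, 2))(v)
v = tf.keras.layers.GlobalAveragePooling3D()(v)

# Late fusion: concatenate both representations, then classify.
x = tf.keras.layers.Concatenate()([a, v])
x = tf.keras.layers.Dense(64, activation="relu")(x)
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs=[audio_in, video_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit({"audio": audio_x, "video": video_x}, y, epochs=20, batch_size=16)
```

Other fusion variants can be obtained by moving the concatenation earlier (feature level) or later (decision level) in the graph.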

Result

The results of our final work.


References

[1] Gemmeke J F, Ellis D P W, Freedman D, et al. Audio set: An ontology and human-labeled dataset for audio events[C]//2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017: 776-780.

[2] Soomro K, Zamir A R, Shah M. UCF101: A dataset of 101 human action classes from videos in the wild[R]. Technical Report CRCV-TR-12-01, November 2012.

Supplementary explanation

If the readme.md file does not display correctly, please read my blog here.