I am working on a deep learning model to detect deep-fake voices. For the data preprocessing, I have followed already-published papers to the letter. The problem comes when I train the model: the CNN starts at roughly 51% accuracy and ends at roughly 54%. I am feeding spectrogram images (arrays, basically) to the model. I don't know whether any of my fundamentals are wrong, and I want to improve the accuracy through training.

My dataset is 1,000 audio files, and I can increase it to 4,500 if needed. But my model needs to show promise of learning first, and the accuracy staying so close to its starting value from start to end tells me otherwise.
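Since 51-54% is barely above chance for a balanced binary task, here is a sanity check I have been considering to rule out pipeline issues. It uses the same X_train / y_train / X_val arrays as the training code below; the standardization step is just one plausible choice, not something taken from the papers I followed:

import numpy as np

# Class balance: near-chance accuracy on a balanced set means the model is
# not learning; on an imbalanced set it may just predict the majority class.
labels, counts = np.unique(y_train, return_counts=True)
print("label counts:", dict(zip(labels, counts)))

# Value range: dB-scaled spectrograms can span e.g. [-80, 0], and unscaled
# inputs often stall ReLU networks, so standardize with training statistics.
print("input range:", X_train.min(), X_train.max())
mean, std = X_train.mean(), X_train.std()
X_train = (X_train - mean) / (std + 1e-8)
X_val = (X_val - mean) / (std + 1e-8)  # reuse training stats for val/test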

import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np


def create_cnn_model(input_shape):
    model = models.Sequential()
    model.add(layers.InputLayer(input_shape=input_shape))

    # Block 1
    model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Block 2
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Block 3
    model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Block 4
    model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Classifier head
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(128, activation='relu'))

    # Single sigmoid unit for binary real/fake classification
    model.add(layers.Dense(1, activation='sigmoid'))

    return model


# X_train, y_train, X_val, y_val are the preprocessed spectrogram arrays
# and labels (built earlier in my preprocessing code).
input_shape = (X_train.shape[1], X_train.shape[2], 1)
cnn_model = create_cnn_model(input_shape)

cnn_model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

history = cnn_model.fit(X_train, y_train, epochs=50, batch_size=32,
                        validation_data=(X_val, y_val))

# This evaluates on the validation set, so report it as validation accuracy.
val_loss, val_acc = cnn_model.evaluate(X_val, y_val)
print(f"Validation Accuracy: {val_acc * 100:.2f}%")

1 Answer


Low accuracy at train time often points to underfitting the data distribution. You can try some of the following to improve the fit / increase the model's capacity (a sketch of both follows below):

  1. Use more neurons in the fully connected layers.
  2. Add more depth, in both the convolutional and fully connected layers.

Feeding more data to the model might do more harm than good at this point: with the model's capacity fixed, a larger dataset is effectively harder to fit, which would exacerbate the underfitting problem.
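For concreteness, a minimal sketch of both suggestions, widening the dense head and adding a fifth convolutional block to your create_cnn_model. This is an untested variation, not a verified fix, and the extra pooling assumes both spatial dimensions of the spectrogram are at least 32:

from tensorflow.keras import layers, models

def create_higher_capacity_cnn(input_shape):
    model = models.Sequential()
    model.add(layers.InputLayer(input_shape=input_shape))
    # Suggestion 2: a fifth convolutional block (512 filters); each 2x2
    # pooling halves the spatial dimensions, hence the >= 32 assumption.
    for filters in (32, 64, 128, 256, 512):
        model.add(layers.Conv2D(filters, (3, 3), activation='relu', padding='same'))
        model.add(layers.Conv2D(filters, (3, 3), activation='relu', padding='same'))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    # Suggestion 1: wider fully connected head.
    model.add(layers.Dense(1024, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dense(1, activation='sigmoid'))
    return model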
