I am working on a deep learning model to detect deepfake voices. For the data preprocessing, I followed published papers to a T. The problem appears when I train the model: the CNN starts at about 51% accuracy and ends at about 54%. I am feeding spectrogram images (arrays, basically) to the model. I don't know whether something fundamental is wrong with my setup, and I want the accuracy to improve with training.
My dataset has 1000 audio files, and I can increase it to 4500 if needed. But the model needs to show some promise of learning first, and the accuracy staying nearly flat from start to finish suggests it is not learning.
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np

def create_cnn_model(input_shape):
    """Build a VGG-style binary classifier for single-channel spectrograms."""
    model = models.Sequential()
    model.add(layers.InputLayer(input_shape=input_shape))

    # Block 1: two 3x3 convolutions with 32 filters, then 2x2 downsampling
    model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Block 2: 64 filters
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Block 3: 128 filters
    model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Block 4: 256 filters
    model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2)))

    # Classifier head: dense layers with dropout, sigmoid output for real vs. fake
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dense(1, activation='sigmoid'))
    return model

input_shape = (X_train.shape[1], X_train.shape[2], 1)
cnn_model = create_cnn_model(input_shape)
cnn_model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
history = cnn_model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_val, y_val))
# Note: this evaluates on the validation split, so the number is a validation accuracy
val_loss, val_acc = cnn_model.evaluate(X_val, y_val)
print(f"Validation Accuracy: {val_acc * 100:.2f}%")
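
Since an accuracy stuck between 51% and 54% is close to chance for two classes, it may help to inspect the arrays that go into fit before changing the model. The sketch below is only an illustration: summarize_split is a hypothetical helper (not part of Keras or TensorFlow), and it assumes the spectrograms are stored as a numeric array of shape (n_samples, height, width) with 0/1 labels. It reports the value range of the inputs, whether any NaN or infinite values are present, the class counts, and the accuracy a constant majority-class guess would reach. The demo arrays are random stand-ins, not real data.

```python
import numpy as np


def summarize_split(X, y):
    """Return basic statistics for a spectrogram array X and binary labels y.

    Hypothetical helper for illustration; assumes y contains only 0 and 1.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y).astype(int).ravel()
    counts = np.bincount(y, minlength=2)  # counts[0] = label 0, counts[1] = label 1
    return {
        "shape": X.shape,
        "min": float(X.min()),
        "max": float(X.max()),
        "mean": float(X.mean()),
        "std": float(X.std()),
        "has_nan_or_inf": bool(not np.isfinite(X).all()),
        "class_counts": counts.tolist(),
        # Accuracy obtained by always predicting the most frequent class
        "majority_baseline": float(counts.max() / counts.sum()),
    }


# Demo on random stand-in data (assumption: the real arrays share this layout)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 8, 8))
y_demo = np.array([0, 1, 0, 1, 1, 0, 1, 1, 0, 1])
stats = summarize_split(X_demo, y_demo)
print(stats)
```

On the real data this would be called as summarize_split(X_train, y_train) and summarize_split(X_val, y_val). If the reported accuracy is about the same as majority_baseline, or if the input values span a very large range, that would point at the data or its scaling rather than at the architecture.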