Deepfake Video Detection
Dual InceptionResNet + BiLSTM achieving 94.5% accuracy
Overview
As synthetic media becomes indistinguishable from real content, detection systems become critical. I built a detection pipeline that combines dual InceptionResNet CNNs with BiLSTM temporal modeling, trained on the Celeb-DF dataset, and served predictions through a Flask REST API with per-frame confidence scoring. The final 94.5% accuracy came from careful architecture choices and ensemble methods.
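Per-frame confidence scores have to be reduced to one video-level verdict before the API can return an answer. A minimal sketch of that aggregation step is below; the function name, thresholds, and response fields are illustrative, not the actual service code:

```python
def video_verdict(frame_scores, threshold=0.5, min_flagged=0.3):
    """Aggregate per-frame fake probabilities into a single verdict.

    A video is flagged when the mean score crosses `threshold`, or when a
    sufficient fraction of individual frames look manipulated -- the second
    condition catches short spliced segments that a plain mean would dilute.
    (Hypothetical helper; thresholds here are placeholders.)
    """
    mean = sum(frame_scores) / len(frame_scores)
    flagged = sum(s >= threshold for s in frame_scores) / len(frame_scores)
    return {
        "fake": mean >= threshold or flagged >= min_flagged,
        "mean_confidence": round(mean, 3),
        "flagged_fraction": round(flagged, 3),
    }
```

Returning both the verdict and the per-frame statistics lets API consumers apply their own stricter cutoffs without re-running inference.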
The Problem
Synthetic media generation has advanced rapidly, making deepfakes increasingly difficult to distinguish from authentic content. This poses risks for misinformation, identity fraud, and trust in digital media. Reliable detection systems are becoming critical infrastructure for content verification.
The Approach
I developed a multi-modal detection pipeline that analyzes facial artifacts at multiple levels: spatial-domain analysis for visual inconsistencies, frequency-domain analysis for GAN fingerprints, and temporal consistency checks for video content.
The system used an InceptionResNet backbone fine-tuned on a diverse dataset of both genuine and synthetically generated faces. It was built with Python and PyTorch, using OpenCV for image processing and FFmpeg for video decoding. Adversarial training improved robustness against newer generation techniques.
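Before any face analysis can happen, FFmpeg samples frames from the input video at a fixed rate. A sketch of how such an invocation might be assembled (the helper name and paths are illustrative; `fps=` is a standard FFmpeg video filter):

```python
def ffmpeg_frame_cmd(video_path, out_dir, fps=5):
    """Build an FFmpeg command that samples `fps` frames per second,
    writing numbered PNGs for the downstream face-detection stage.
    (Illustrative helper; the real pipeline's rate may differ.)"""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps={fps}",          # resample to a fixed frame rate
        f"{out_dir}/frame_%05d.png",  # frame_00001.png, frame_00002.png, ...
    ]
```

Sampling at a fixed rate rather than taking every frame keeps the BiLSTM's input sequence length bounded regardless of source frame rate.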
Key Decisions
The frequency-domain analysis proved particularly valuable — GAN-generated images leave subtle spectral artifacts that are invisible to human observers but detectable through Fourier analysis. Combining spatial and frequency features gave the model complementary detection signals.
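The spectral-artifact idea can be illustrated with an azimuthally averaged power spectrum, a common compact fingerprint for this kind of analysis: GAN upsampling tends to leave periodic peaks in it. The sketch below uses a deliberately naive pure-Python 2D DFT so it is self-contained; real code would use an FFT library on full-size crops:

```python
import cmath
import math

def dft2(img):
    """Naive 2D DFT via separable 1D transforms (fine for tiny patches)."""
    n = len(img)
    def dft1(seq):
        return [sum(seq[x] * cmath.exp(-2j * math.pi * k * x / n)
                    for x in range(n)) for k in range(n)]
    rows = [dft1(r) for r in img]
    cols = [dft1([rows[y][k] for y in range(n)]) for k in range(n)]
    return [[cols[k][y] for k in range(n)] for y in range(n)]

def radial_power_profile(img):
    """Azimuthally averaged log power spectrum of a square grayscale patch.

    Averaging the spectrum over rings of equal frequency magnitude collapses
    it to a 1D curve; upsampling artifacts show up as bumps at high radii.
    """
    n = len(img)
    F = dft2(img)
    c = n // 2
    bins, counts = [0.0] * (c + 1), [0] * (c + 1)
    for y in range(n):
        for x in range(n):
            # map indices to signed frequencies so DC sits at radius 0
            fy, fx = (y + c) % n - c, (x + c) % n - c
            r = min(c, round(math.hypot(fy, fx)))
            bins[r] += math.log1p(abs(F[y][x]) ** 2)
            counts[r] += 1
    return [b / cnt for b, cnt in zip(bins, counts)]
```

Feeding this profile alongside spatial CNN features is one way spatial and frequency signals can be combined, as the paragraph above describes.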
I implemented an adversarial training loop where the detector was continuously challenged with increasingly sophisticated synthetic samples, preventing overfitting to specific generation methods.
Impact
The system achieved 94.5% detection accuracy across multiple deepfake generation methods. The research findings were published, contributing to the broader effort to develop reliable synthetic media detection.