Movie Genre Classifier

Project Description: Movie Poster Genre Classification

Overview

This project focuses on classifying movie genres based on poster images using a combination of web scraping, database management, and machine learning. By scraping data from public movie databases, processing poster images into feature vectors, and performing genre classification, the project demonstrates the integration of data engineering and deep learning techniques to solve real-world challenges.

Key Steps

Web Scraping:
- Objective: Scrape movie metadata, including titles, genres, and poster image URLs, from IMDb or TMDb.
- Implementation:
  - Used BeautifulSoup and requests to extract data such as titles and genres.
  - Downloaded poster images for vectorization and analysis.
- Outcome: Collected a comprehensive dataset of movie metadata and poster images for further processing.
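
The extraction step above might look roughly like the sketch below. The HTML snippet and the CSS selectors (div.movie, .title, .genre, img.poster) are hypothetical stand-ins; real IMDb or TMDb pages use different markup that changes over time. In practice the HTML string would come from something like requests.get(url, timeout=10).text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not the real IMDb/TMDb structure.
SAMPLE_HTML = """
<div class="movie">
  <h2 class="title">Example Movie</h2>
  <span class="genre">Drama</span>
  <span class="genre">Thriller</span>
  <img class="poster" src="https://example.com/posters/1.jpg">
</div>
"""


def parse_movies(html):
    """Extract title, genres, and poster URL from each movie card."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for card in soup.select("div.movie"):
        title = card.select_one(".title")
        poster = card.select_one("img.poster")
        if title is None or poster is None:
            continue  # skip cards that lack a title or a poster
        movies.append({
            "title": title.get_text(strip=True),
            "genres": [g.get_text(strip=True) for g in card.select(".genre")],
            "poster_url": poster.get("src"),
        })
    return movies


movies = parse_movies(SAMPLE_HTML)
```

Keeping the parser separate from the download step makes it possible to test the selectors against saved pages without hitting the site.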
Data Storage:
- Objective: Organize metadata and poster embeddings into a structured SQL database.
- Implementation:
  - Designed a schema with three main tables:
    - Movies: Includes movie titles and poster images.
    - Genres: Stores unique genre names.
    - Movie-Genres Relationship: Links movies to their respective genres.
  - Stored poster embeddings (256-dimensional feature vectors) in a separate vectors table for efficient retrieval.
- Outcome: A scalable database for storing and querying metadata and embeddings.
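
A minimal sketch of the schema described above, assuming SQLite as the engine (the project only says "SQL database"). Table and column names, and the add_movie helper, are illustrative choices rather than the project's actual definitions.

```python
import sqlite3

# Assumed schema: three main tables plus a separate vectors table.
SCHEMA = """
CREATE TABLE movies (
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    poster BLOB                      -- raw image bytes
);
CREATE TABLE genres (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE movie_genres (
    movie_id INTEGER NOT NULL REFERENCES movies(id),
    genre_id INTEGER NOT NULL REFERENCES genres(id),
    PRIMARY KEY (movie_id, genre_id)
);
CREATE TABLE vectors (
    movie_id  INTEGER PRIMARY KEY REFERENCES movies(id),
    embedding BLOB NOT NULL          -- packed 256-dimensional float vector
);
"""


def add_movie(conn, title, poster_bytes, genre_names):
    """Insert a movie and link it to its genres, creating genres as needed."""
    cur = conn.execute(
        "INSERT INTO movies (title, poster) VALUES (?, ?)", (title, poster_bytes)
    )
    movie_id = cur.lastrowid
    for name in genre_names:
        conn.execute("INSERT OR IGNORE INTO genres (name) VALUES (?)", (name,))
        genre_id = conn.execute(
            "SELECT id FROM genres WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute("INSERT INTO movie_genres VALUES (?, ?)", (movie_id, genre_id))
    return movie_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_movie(conn, "Example Movie", b"\x89PNG", ["Drama", "Thriller"])
add_movie(conn, "Another Movie", b"\x89PNG", ["Drama"])

# All movies tagged with a given genre, via the relationship table.
rows = conn.execute(
    """
    SELECT m.title FROM movies m
    JOIN movie_genres mg ON mg.movie_id = m.id
    JOIN genres g ON g.id = mg.genre_id
    WHERE g.name = ? ORDER BY m.title
    """,
    ("Drama",),
).fetchall()
```

The relationship table is what lets one movie carry several genres without duplicating genre names.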
Image Vectorization:
- Objective: Convert poster images into embeddings that capture visual features.
- Implementation:
  - Fine-tuned a pre-trained ResNet34 model to generate 256-dimensional embeddings.
  - Resized and normalized each poster before passing it to the network, so inputs are consistent across the dataset.
- Outcome: Generated embeddings for all poster images, representing them as high-dimensional feature vectors.
Genre Classification:
- Objective: Train a neural network to predict movie genres based on poster embeddings.
- Implementation:
  - Used a multi-label classification setup, as movies often belong to multiple genres.
  - Handled class imbalance by computing and applying appropriate weights during training.
  - Evaluated performance using metrics such as F1-score, precision, and recall.
- Outcome: Developed a model that predicts one or more genres per poster, assessed with F1-score, precision, and recall rather than plain accuracy.
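
The class weighting and the evaluation metrics above can be illustrated in plain Python. This is a sketch under assumptions: the weighting rule (negatives divided by positives per genre) is one common choice, not necessarily the one used in the project, and the labels are toy data. Weights of this form could be passed to a loss such as torch.nn.BCEWithLogitsLoss through its pos_weight argument.

```python
GENRES = ["Action", "Comedy", "Drama"]

# Toy multi-hot labels: one row per movie, one column per genre.
y_true = [
    [1, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
]
y_pred = [
    [1, 0, 1],
    [0, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
]


def positive_weights(labels):
    """Per-genre weight = negatives / positives, so rare genres count more."""
    n = len(labels)
    weights = []
    for j in range(len(labels[0])):
        pos = sum(row[j] for row in labels)
        weights.append((n - pos) / pos if pos else 1.0)
    return weights


def micro_prf(truth, pred):
    """Micro-averaged precision, recall, and F1 over all movie-genre pairs."""
    tp = fp = fn = 0
    for t_row, p_row in zip(truth, pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


weights = positive_weights(y_true)
precision, recall, f1 = micro_prf(y_true, y_pred)
```

In the toy data, Action and Comedy each appear once in four movies, so they receive three times the weight of Drama, which appears twice.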
Vector Search:
- Objective: Perform similarity searches to classify or recommend movies based on poster vectors.
- Implementation:
  - Stored embeddings in the SQL database for efficient querying.
  - Calculated similarities using Euclidean distance between vectors.
- Outcome: Enabled genre classification and recommendations for new posters by querying the vector database.
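
A brute-force version of the Euclidean search above can be sketched with the standard library alone. It assumes SQLite, embeddings stored as packed float32 blobs, and a hypothetical vectors table; the 4-dimensional toy vectors stand in for the 256-dimensional ones.

```python
import math
import sqlite3
from array import array


def pack(vec):
    """Serialize a list of floats to float32 bytes for BLOB storage."""
    return array("f", vec).tobytes()


def unpack(blob):
    """Inverse of pack: float32 bytes back to a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


def nearest(conn, query, k=3):
    """Return the k (distance, movie_id) pairs closest to query."""
    scored = []
    for movie_id, blob in conn.execute("SELECT movie_id, embedding FROM vectors"):
        scored.append((math.dist(query, unpack(blob)), movie_id))
    scored.sort()
    return scored[:k]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vectors (movie_id INTEGER PRIMARY KEY, embedding BLOB NOT NULL)")
toy_vectors = {
    1: [0.0, 0.0, 0.0, 0.0],
    2: [1.0, 0.0, 0.0, 0.0],
    3: [3.0, 4.0, 0.0, 0.0],
}
for movie_id, vec in toy_vectors.items():
    conn.execute("INSERT INTO vectors VALUES (?, ?)", (movie_id, pack(vec)))

neighbors = nearest(conn, [0.0, 0.0, 0.0, 0.0], k=2)
```

This scans every stored vector on each query, which is fine for small collections; the genres of the nearest neighbors can then be looked up to label or recommend for a new poster.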

Skills Demonstrated

Web Scraping: Extracted structured data from unstructured web pages using BeautifulSoup and requests.
Database Management: Designed and implemented a relational database schema, handling binary data storage for images.
Deep Learning:
- Fine-tuned a CNN for feature extraction and genre classification.
- Implemented custom layers to generate embeddings for vector search.
Data Engineering:
- Managed large datasets of images and metadata.
- Performed data augmentation and preprocessing to ensure model robustness.
Machine Learning: Used metrics like F1-score and recall to assess multi-label classification performance.
Optimization: Leveraged GPU acceleration for model training and batch processing for efficiency.

Key Outcomes

Created a scalable pipeline for scraping, storing, and analyzing movie posters and metadata.
Built a multi-label classification model for genre prediction with robust performance.
Enabled vector-based similarity search for recommendations or classification of new posters.

Next Steps

Expand Dataset: Include more diverse movies and genres to improve model generalization.
Enhance Search: Integrate more advanced vector search frameworks like FAISS or Milvus for scalability.
Deploy Solution: Build an interactive web interface for users to upload posters and get genre recommendations.

This project illustrates a practical application of machine learning in the entertainment domain, showcasing the potential of integrating data engineering with AI solutions.

To try the demo, run the project's Docker Compose containers locally.
