Supervised, Real-Time LSTM Classification
This guide details an automated workflow for training a temporal deep learning model (LSTM) to classify animal behaviors in real time. The toolkit builds on custom-trained pose-estimation models to create a robust, supervised classification pipeline driven by a single command-line interface.
🚀 Project Structure
The toolkit is organized into a modular structure for better maintainability and clarity. All operations are run from the root of the integrapose_lstm/ directory.
📁 integrapose_lstm/
├─ 📁 data_processing/
│ ├─ features.py
│ ├─ loader.py
│ ├─ preparation.py
│ └─ process_clips.py
├─ 📁 gui/
│ └─ clipper.py
├─ 📁 inference/
│ └─ real_time.py
├─ 📁 training/
│ ├─ models.py
│ └─ train.py
├─ 📜 main.py
├─ 📜 run_training.py
├─ 📜 config.json
└─ 📜 requirements.txt
The Complete End-to-End Workflow
Phase 1: Pose Model Training (IntegraPose Main Toolkit)
- Input: Raw video frames for keypoint labeling.
- Action: Use the main IntegraPose GUI to label keypoints and train a custom YOLO-pose model.
- Output: A trained pose-estimation model (e.g., yolo_pose_model.pt).
Phase 2: Labeled Clip & Pose Data Generation (This Toolkit)
- Input: A long, unlabeled video and your trained YOLO model.
- Action 1 (Clipping): Run python main.py clip to create and save short video clips of specific behaviors.
- Action 2 (Automated Pose Estimation): Run python main.py process to automatically generate .txt pose data for all clips.
- Action 3 (Consolidation): Run python main.py prepare to create the final, unified training dataset.
- Output: A folder of labeled pose data ready for training.
Phase 3: LSTM Classifier Training (This Toolkit)
- Input: The labeled pose dataset from Phase 2.
- Action: Run python main.py train.
- Output: A trained LSTM classifier (lstm_classifier.pt) and normalization statistics (norm_stats.pkl).
Phase 4: Real-Time Deployment (This Toolkit)
- Input: Both the yolo_pose_model.pt and the lstm_classifier.pt.
- Action: Run python main.py infer on a new video.
- Output: Live, frame-by-frame behavioral classification.
Step-by-Step Guide
Prerequisites
- You must have a custom-trained YOLO-pose model from the main IntegraPose toolkit.
- Install all required packages by running pip install -r requirements.txt in your terminal.
Stage 1: Create Labeled Video Clips
Generate a dataset of short video clips, each containing a single, clear example of a behavior, using the Video Clipper GUI.
Configuration (`config.json`)
First, define your behaviors and keyboard shortcuts in the dataset_preparation section of your config file.
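A minimal sketch of what that section might contain is shown below; the exact key names (such as a behaviors map from keypress to label) are illustrative assumptions, so mirror whatever your config.json actually defines:

```json
{
  "dataset_preparation": {
    "behaviors": {
      "w": "Walking",
      "r": "Rearing"
    }
  }
}
```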
Running the GUI & Labeling
- Run python main.py clip to launch the Video Clipper GUI, then click Load Video to open the long video you want to label.
- Find a behavior, then use Set Start Frame and Set End Frame to mark its boundaries.
- Assign a label with a keypress (e.g., w for "Walking").
- Click Add Clip to Queue. You can queue many clips for multiple behaviors.
- When finished, click Extract All Queued Clips. The GUI will save the clips into a directory structure like the one below.
📁 clips_for_labeling/
├─ 📁 Walking/
│ └─ your_video_Walking_frames_100_250.mp4
└─ 📁 Rearing/
└─ your_video_Rearing_frames_550_700.mp4
Stage 2: Automate Pose Data Generation
This command automates the previously manual process of generating keypoint data for all your video clips.
Configuration (`config.json`)
Ensure the path to your trained YOLO model is correct in the inference_params section.
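As an illustration, the entry might look something like the sketch below; the key name pose_model_path is a placeholder, so use whichever key your inference_params section actually defines for the model path:

```json
{
  "inference_params": {
    "pose_model_path": "path/to/yolo_pose_model.pt"
  }
}
```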
Execution
Run python main.py process. The script will scan all subfolders in your source_directory, run your YOLO model on every .mp4 file, and save the resulting .txt pose files directly into the correct behavior folder.
Automation Complete!
The tedious manual step of running YOLO and moving files is now handled by a single command. Each behavior folder now contains both video clips and their corresponding pose data.
Stage 3: Prepare the Final Dataset
This script automates the final consolidation of your pose data into a training-ready format.
Configuration and Execution
Ensure your data.pose_directory in config.json is set to where you want the final dataset to be stored, then run python main.py prepare.
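For reference, the corresponding config entry might look like the sketch below; the path value is only a placeholder:

```json
{
  "data": {
    "pose_directory": "path/to/final_training_dataset"
  }
}
```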
The script scans the subfolders, finds all .txt files, replaces their internal class ID with the correct one from your config, and copies them into the final training directory.
Stage 4: Train the LSTM Classifier
This stage trains the LSTM model on your prepared dataset.
Understanding the Sequence Length
A critical parameter is the sequence_length in your config, which defines the LSTM's "rolling buffer" or "memory".
This value dictates how many consecutive frames the LSTM analyzes to make a single prediction. A shorter length (e.g., 15 frames at 30fps = 0.5s) is good for quick behaviors, while a longer length may better capture complex sequences but will have a greater classification delay.
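As a rough illustration, a half-second window at 30 fps would be configured like the sketch below; the enclosing training_params section name is a hypothetical placement, so put sequence_length wherever your config.json expects it:

```json
{
  "training_params": {
    "sequence_length": 15
  }
}
```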
Running the Training
Run python main.py train. The script will compute features, create sequences, and train the model, saving the best model (lstm_classifier.pt) and normalization data (norm_stats.pkl) to your output folder.
Stage 5: Run Real-Time Inference
Deploy your models for live classification.
Configuration
Update the video_source in your config.json to point to the video you wish to analyze.
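For example, assuming video_source sits in the inference_params section (adjust if your config places it elsewhere; the path is a placeholder):

```json
{
  "inference_params": {
    "video_source": "path/to/new_video.mp4"
  }
}
```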
Execution
Run python main.py infer. A window will appear showing the live classification; press q to quit the video stream.