Supervised, Real-Time LSTM Classification
This guide details an automated workflow for training a temporal deep learning model (LSTM) to classify animal behaviors in real time. The toolkit builds on custom-trained pose-estimation models to create a robust, supervised classification pipeline driven by a single command-line interface.
🚀 Project Structure
The toolkit is organized into a modular structure for better maintainability and clarity. All operations are run from the root of the integrapose_lstm/ directory.
📁 integrapose_lstm/
├─ 📁 data_processing/
│ ├─ features.py
│ ├─ loader.py
│ ├─ preparation.py
│ └─ process_clips.py
├─ 📁 gui/
│ └─ clipper.py
├─ 📁 inference/
│ └─ real_time.py
├─ 📁 training/
│ ├─ models.py
│ └─ train.py
├─ 📜 main.py
├─ 📜 run_training.py
├─ 📜 config.json
└─ 📜 requirements.txt
The Complete End-to-End Workflow
Phase 1: Pose Model Training (IntegraPose Main Toolkit)
- Input: Raw video frames for keypoint labeling.
- Action: Use the main IntegraPose GUI to label keypoints and train a custom YOLO-pose model.
- Output: A trained pose-estimation model (e.g., yolo_pose_model.pt).
Phase 2: Labeled Clip & Pose Data Generation (This Toolkit)
- Input: A long, unlabeled video and your trained YOLO model.
- Action 1 (Clipping): Run python main.py clip to create and save short video clips of specific behaviors.
- Action 2 (Automated Pose Estimation): Run python main.py process to automatically generate .txt pose data for all clips.
- Action 3 (Consolidation): Run python main.py prepare to create the final, unified training dataset.
- Output: A folder of labeled pose data ready for training.
Phase 3: LSTM Classifier Training (This Toolkit)
- Input: The labeled pose dataset from Phase 2.
- Action: Run python main.py train.
- Output: A trained LSTM classifier (lstm_classifier.pt) and normalization statistics (norm_stats.pkl).
Phase 4: Real-Time Deployment (This Toolkit)
- Input: Both the yolo_pose_model.pt and the lstm_classifier.pt.
- Action: Run python main.py infer on a new video.
- Output: Live, frame-by-frame behavioral classification.
Step-by-Step Guide
Prerequisites
- You must have a custom-trained YOLO-pose model from the main IntegraPose toolkit.
- Install all required packages by running pip install -r requirements.txt in your terminal.
Stage 1: Create Labeled Video Clips
Generate a dataset of short video clips, each containing a single, clear example of a behavior, using the Video Clipper GUI.
Configuration (`config.json`)
First, define your behaviors and keyboard shortcuts in the dataset_preparation section of your config file.
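A minimal sketch of what that section might contain is shown below; the exact key names (such as a behaviors map from keypress to label) are illustrative assumptions, so mirror whatever your config.json actually defines:

```json
{
  "dataset_preparation": {
    "behaviors": {
      "w": "Walking",
      "r": "Rearing"
    }
  }
}
```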
Running the GUI & Labeling
- Run python main.py clip to launch the Video Clipper GUI, then click Load Video to open the long video you want to label.
- Find a behavior, then use Set Start Frame and Set End Frame to mark its boundaries.
- Assign a label with a keypress (e.g., w for "Walking").
- Click Add Clip to Queue. You can queue many clips for multiple behaviors.
- When finished, click Extract All Queued Clips. The GUI will save the clips into a directory structure like the one below.
📁 clips_for_labeling/
├─ 📁 Walking/
│ └─ your_video_Walking_frames_100_250.mp4
└─ 📁 Rearing/
└─ your_video_Rearing_frames_550_700.mp4
Stage 2: Automate Pose Data Generation
This command automates the previously manual process of generating keypoint data for all your video clips.
Configuration (`config.json`)
Ensure the path to your trained YOLO model is correct in the inference_params section.
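As an illustration, the entry might look something like the sketch below; the key name pose_model_path is a placeholder, so use whichever key your inference_params section actually defines for the model path:

```json
{
  "inference_params": {
    "pose_model_path": "path/to/yolo_pose_model.pt"
  }
}
```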
Execution
Run python main.py process. The script will scan all subfolders in your source_directory, run your YOLO model on every .mp4 file, and save the resulting .txt pose files directly into the correct behavior folder.
Automation Complete!
The tedious manual step of running YOLO and moving files is now handled by a single command. Each behavior folder now contains both video clips and their corresponding pose data.
Stage 3: Prepare the Final Dataset
This script automates the final consolidation of your pose data into a training-ready format.
Configuration and Execution
Ensure your data.pose_directory in config.json is set to where you want the final dataset to be stored, then run python main.py prepare.
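For reference, the corresponding config entry might look like the sketch below; the path value is only a placeholder:

```json
{
  "data": {
    "pose_directory": "path/to/final_training_dataset"
  }
}
```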
The script scans the subfolders, finds all .txt files, replaces their internal class ID with the correct one from your config, and copies them into the final training directory.
Stage 4: Train the LSTM Classifier
This stage trains the LSTM model on your prepared dataset.
Understanding the Sequence Length
A critical parameter is the sequence_length in your config, which defines the LSTM's "rolling buffer" or "memory".
This value dictates how many consecutive frames the LSTM analyzes to make a single prediction. A shorter length (e.g., 15 frames at 30fps = 0.5s) is good for quick behaviors, while a longer length may better capture complex sequences but will have a greater classification delay.
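As a rough illustration, a half-second window at 30 fps would be configured like the sketch below; the enclosing training_params section name is a hypothetical placement, so put sequence_length wherever your config.json expects it:

```json
{
  "training_params": {
    "sequence_length": 15
  }
}
```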
Running the Training
Run python main.py train. The script will compute features, create sequences, and train the model, saving the best model (lstm_classifier.pt) and normalization data (norm_stats.pkl) to your output folder.
Stage 5: Run Real-Time Inference
Deploy your models for live classification.
Configuration
Update the video_source in your config.json to point to the video you wish to analyze.
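For example, assuming video_source sits in the inference_params section (adjust if your config places it elsewhere; the path is a placeholder):

```json
{
  "inference_params": {
    "video_source": "path/to/new_video.mp4"
  }
}
```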
Execution
Run python main.py infer. A window will appear showing the live classification; press q to quit the video stream.