Sub-Behavioral Analysis with LSTM Autoencoders
Go beyond broad behavioral labels. This guide explains how to use a Sequence-to-Sequence (Seq2Seq) LSTM Autoencoder to discover subtle, stereotyped variations—or "sub-behaviors"—within the same general action, like different types of walking or grooming.
How It Works: Learning Behavioral Signatures
1. VAE Pre-training
First, a Variational Autoencoder (VAE) is trained on all individual poses. Its job is to learn a compressed, "latent" representation for every single frame, simplifying the postural information before analyzing sequences.
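As a rough illustration, a minimal pose VAE could look like the sketch below (PyTorch; the layer sizes, class name, and loss weighting are assumptions, not the tool's actual implementation):

```python
import torch
import torch.nn as nn

class PoseVAE(nn.Module):
    """Sketch: compress one pose (flattened keypoints + angles) into a latent vector."""

    def __init__(self, input_dim: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU())
        self.fc_mu = nn.Linear(64, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(64, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, input_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    """Reconstruction error plus KL divergence to the unit-Gaussian prior."""
    recon_loss = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl
```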
2. LSTM Seq2Seq Autoencoder
The core of the method. An LSTM Encoder reads a sequence of VAE-encoded poses and compresses the entire movement into a single fixed-length vector, its "embedding." An LSTM Decoder then tries to reconstruct the original sequence from this vector alone. Because reconstruction only succeeds if the embedding preserves the sequence's dynamics, the encoder learns to produce embeddings that serve as behavioral signatures.
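The encoder/decoder pair can be sketched as follows (again PyTorch; the dimensions, the zero-initialized teacher-forcing input, and all names are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Seq2SeqAutoencoder(nn.Module):
    """Sketch: encode a sequence of VAE latents into one embedding, then
    reconstruct the sequence from that embedding alone."""

    def __init__(self, latent_dim: int = 8, hidden_dim: int = 32):
        super().__init__()
        self.encoder = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, latent_dim)

    def forward(self, seq):                    # seq: (batch, time, latent_dim)
        _, (h, c) = self.encoder(seq)          # h[-1] is the sequence embedding
        dec_in = torch.zeros_like(seq)         # shift inputs by one step as a
        dec_in[:, 1:] = seq[:, :-1]            # simple teacher-forcing scheme
        dec_out, _ = self.decoder(dec_in, (h, c))
        return self.out(dec_out), h[-1]        # reconstruction, embedding
```

Training minimizes the reconstruction error (e.g., MSE between the output and the input sequence); after training, only the embeddings are kept.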
3. HDBSCAN Clustering
Finally, all the sequence embeddings are collected. HDBSCAN, a density-based clustering algorithm, groups these embeddings by similarity; sequences that do not belong to any dense group are labeled as noise rather than forced into a cluster. Each remaining cluster represents a distinct, stereotyped sub-behavior.
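Conceptually, this step reduces to a few lines with the hdbscan package (the embeddings file is a hypothetical placeholder; the real script handles this internally):

```python
import numpy as np
import hdbscan

embeddings = np.load("sequence_embeddings.npy")  # one row per sequence (placeholder file)

clusterer = hdbscan.HDBSCAN(min_cluster_size=15)  # mirrors clustering_params
labels = clusterer.fit_predict(embeddings)        # label -1 marks noise sequences

for cluster_id in sorted(set(labels)):
    print(cluster_id, int((labels == cluster_id).sum()), "sequences")
```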
Step-by-Step Guide
Prepare the Configuration File
This entire analysis is controlled by a single config.json file. You must edit this file to point to your data and set the analysis parameters. The key sections are broken down below, each followed by an illustrative snippet.
Data and Feature Parameters
- pose_directory: Path to the folder containing YOLO .txt output files.
- video_file: Path to the corresponding video file, used to convert normalized coordinates back to pixel values and to generate the output videos.
- output_directory: Where all results (videos, plots, CSVs) will be saved.
- keypoint_names: An ordered list of your keypoint names, matching the model's output.
- angles_to_compute: Triplets of keypoints defining the joint angles to compute as additional features.
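For illustration, a hypothetical data section might look like this (all paths, keypoint names, and the angle triplet are placeholders to adapt to your project):

```json
{
  "pose_directory": "data/yolo_poses/",
  "video_file": "data/session_01.mp4",
  "output_directory": "results/",
  "keypoint_names": ["nose", "left_ear", "right_ear", "spine", "tail_base"],
  "angles_to_compute": [["left_ear", "nose", "right_ear"]]
}
```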
Set Model Hyperparameters
In the same config.json file, you define the architecture and training settings for the VAE and LSTM models.
- vae_params: Controls the VAE that pre-processes the poses. latent_dim is the most important parameter here.
- lstm_autoencoder_params: Controls the main Seq2Seq model. hidden_dim defines the size of the sequence embedding.
- clustering_params: min_cluster_size tells HDBSCAN the minimum number of sequences required to form a distinct sub-behavior cluster.
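A sketch of these sections is shown below; only latent_dim, hidden_dim, and min_cluster_size are documented above, so every other key and all values are assumptions to check against your own config.json:

```json
{
  "vae_params": {"latent_dim": 8, "epochs": 50, "batch_size": 64},
  "lstm_autoencoder_params": {"hidden_dim": 32, "sequence_length": 30, "epochs": 100},
  "clustering_params": {"min_cluster_size": 15}
}
```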
Run the Analysis Script
Once your config.json is saved, open a terminal, activate your Python environment, and run the main analysis script.
The script will execute the full pipeline automatically: loading data, training models, clustering, and generating outputs. Monitor the terminal for progress updates and any potential warnings.
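A typical invocation might look like this (the script name and flag are hypothetical; use the entry point of your installation):

```
python run_analysis.py --config config.json
```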
Interpret the Outputs
The most important results will be in your specified output directory.
Sub-Behavior Videos
The script generates a video for each discovered sub-cluster, named according to the original behavior and the new sub-cluster label (e.g., behavior_0_subcluster_0.mp4, behavior_0_subcluster_1.mp4).
By watching these videos, you can assign meaningful names to the clusters. For example, you might find that "Walking" (Behavior 0) was automatically segmented into "fast walking" (Sub-cluster 0) and "hesitant walking" (Sub-cluster 1).
UMAP Visualization
A plot named umap_visualization_behavior_class_ids.png will be saved. This plot shows the final sequence embeddings projected into 2D space, colored by their original broad behavior label. It helps you visualize how different behaviors are represented and whether the discovered sub-clusters are distinct.
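If you want to re-project the embeddings yourself, e.g. with different UMAP settings, a minimal sketch using umap-learn could look like this (both file names are hypothetical placeholders, not outputs the script is guaranteed to produce):

```python
import numpy as np
import umap
import matplotlib.pyplot as plt

embeddings = np.load("sequence_embeddings.npy")   # placeholder file name
behavior_ids = np.load("behavior_class_ids.npy")  # placeholder file name

# Project the sequence embeddings to 2D and color by broad behavior label.
coords = umap.UMAP(n_components=2).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], c=behavior_ids, cmap="tab10", s=5)
plt.colorbar(label="broad behavior label")
plt.savefig("umap_custom.png", dpi=150)
```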
Data Report
A detailed CSV file, behavior_clusters_full.csv, contains the original detection data along with the final assigned sub-cluster label for each frame, allowing for further quantitative analysis.
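For example, a quick pandas summary of frames per sub-behavior might look like this (the column names are assumptions; inspect the CSV header for the actual ones):

```python
import pandas as pd

df = pd.read_csv("results/behavior_clusters_full.csv")  # path follows output_directory

# Frames per (broad behavior, sub-cluster) pair; column names assumed.
counts = df.groupby(["behavior", "subcluster"]).size()
print(counts)
```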