Unsupervised Behavioral Segmentation
Discover the hidden language of behavior. This guide details an advanced, unsupervised workflow that automatically segments continuous pose data into discrete, meaningful behavioral "syllables" or motifs without any prior labels.
The Three-Stage Pipeline
1. VAE: Pose Simplifier
The Variational Autoencoder (VAE) learns to compress high-dimensional pose features (angles, distances) into a simple, low-dimensional latent space. This captures the "essence" of each posture while removing noise.
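For intuition, here is a minimal sketch of such a model, assuming PyTorch; the class name and layer sizes are illustrative, not the tool's actual architecture.

```python
# Minimal pose VAE sketch (PyTorch assumed; sizes are illustrative).
import torch
import torch.nn as nn

class PoseVAE(nn.Module):
    def __init__(self, n_features: int, latent_dim: int = 8):
        super().__init__()
        # Encoder maps a pose feature vector to the mean and log-variance of a latent Gaussian.
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.to_mu = nn.Linear(64, latent_dim)
        self.to_logvar = nn.Linear(64, latent_dim)
        # Decoder reconstructs the pose features from a latent sample.
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping gradients flowing to mu and logvar.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error plus KL divergence to a standard-normal prior.
    recon_err = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_err + kl
```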
2. LSTM: Dynamics Modeler
The Long Short-Term Memory (LSTM) network learns the rules of movement by predicting the next pose in a sequence. It models the temporal flow of behavior. (See note below).
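A minimal sketch of such a dynamics model, again assuming PyTorch; the hidden size and the next-step objective are illustrative choices. The bidirectional option mentioned later corresponds to passing bidirectional=True to nn.LSTM (and doubling the prediction head's input size).

```python
# Minimal latent-dynamics LSTM sketch (PyTorch assumed; sizes are illustrative).
import torch
import torch.nn as nn

class LatentDynamicsLSTM(nn.Module):
    def __init__(self, latent_dim: int = 8, hidden_dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, z_seq):
        # z_seq: (batch, time, latent_dim) sequence of VAE latent vectors.
        out, _ = self.lstm(z_seq)
        # Predict the latent pose at t+1 from the hidden state at t.
        return self.head(out)

# Training objective (sketch): predict the sequence shifted one step forward.
#   pred = model(z_seq[:, :-1])
#   loss = nn.functional.mse_loss(pred, z_seq[:, 1:])
```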
3. HMM: Behavior Segmenter
The Hidden Markov Model (HMM) takes the VAE-encoded (simplified) pose sequences and groups them into a set of recurring, stereotyped hidden states. These states are the discovered behavioral syllables.
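A minimal sketch of this stage using hmmlearn's GaussianHMM (an assumed backend; the tool's internals may differ). Each frame's latent vector is treated as an emission, and Viterbi decoding labels every frame with one of the discovered states.

```python
# Behavioral segmentation sketch with hmmlearn (assumed library).
import numpy as np
from hmmlearn.hmm import GaussianHMM

def segment_latents(latents, n_states=10, seed=0):
    # latents: list of (T_i, latent_dim) arrays, one per recording, from the VAE encoder.
    X = np.concatenate(latents)                  # stack all sequences for fitting
    lengths = [len(seq) for seq in latents]      # hmmlearn needs per-sequence lengths
    hmm = GaussianHMM(n_components=n_states, covariance_type="diag",
                      n_iter=100, random_state=seed)
    hmm.fit(X, lengths)
    # Viterbi decoding assigns one behavioral state (syllable) to every frame.
    return [hmm.predict(seq) for seq in latents]
```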
🚨 Important Note on the LSTM's Role
In this specific workflow, the LSTM is trained to model the temporal dynamics of behavior but is **not** a sequence-to-sequence (Seq2Seq) model and is **not directly used for the final segmentation**. The VAE provides its latent space output directly to the HMM, which performs the segmentation. The trained LSTM can be valuable for downstream tasks like anomaly detection but is an optional component in the primary discovery pipeline.
Step-by-Step Guide
Setup Project and Data
Begin by setting up your project in **Tab 1: Setup & Input**. This process is identical to the standard HMM setup.
- Define your Keypoint Names and Behavior Names.
- Create at least one data group (e.g., "Control") and add your data sources (Pose Directory + Video File). The VAE will be trained on the data from the **first group** you add.
- Define your Skeleton Connections and Normalization Reference Points for accurate feature calculation.
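For intuition, the kind of feature calculation these settings drive can be sketched as follows; the helper names, and the specific choice of normalized bone lengths plus angles at shared joints, are illustrative assumptions rather than the tool's exact feature set.

```python
# Illustrative pose feature computation (NumPy); the tool's actual features may differ.
import numpy as np

def joint_angle(a, b, c):
    # Angle at keypoint b formed by segments b->a and b->c, in radians.
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def pose_features(kpts, skeleton, ref_pair):
    # kpts: (n_keypoints, 2) coordinates; skeleton: list of (i, j) bone index pairs
    # in chain order; ref_pair: two keypoint indices whose distance normalizes body size.
    scale = np.linalg.norm(kpts[ref_pair[0]] - kpts[ref_pair[1]]) + 1e-8
    dists = [np.linalg.norm(kpts[i] - kpts[j]) / scale for i, j in skeleton]
    angles = [joint_angle(kpts[i], kpts[j], kpts[k])
              for (i, j), (j2, k) in zip(skeleton, skeleton[1:]) if j == j2]
    return np.array(dists + angles)
```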
Configure VAE/LSTM Parameters
Navigate to **Tab 2: Analysis Parameters** and configure the settings for the unsupervised pipeline.
- Set the **Analysis Type** to VAE/LSTM.
- In the "VAE/LSTM Parameters" section, define the core architecture:
  - Latent Dimensions: The size of the compressed pose representation. A value between 8 and 16 is a good starting point.
  - VAE Epochs: Number of training iterations for the VAE. 100-200 is typical.
  - Num Behavioral Clusters (HMM): This sets the number of hidden states for the HMM to discover. It's the most important parameter for defining the granularity of your behavioral syllables.
  - Min Postural Cluster Size (HDBSCAN): Used for an optional, separate clustering of static poses. Does not affect the main HMM segmentation.
- (Optional) Check Run Bidirectional LSTM after VAE if you wish to train the dynamics model.
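For reference, the fields above map onto a parameter set along these lines (the key names are hypothetical, not the tool's configuration format):

```python
# Hypothetical parameter set mirroring the GUI fields (names are illustrative).
params = {
    "analysis_type": "VAE/LSTM",
    "latent_dim": 8,                   # Latent Dimensions: 8-16 is a reasonable start
    "vae_epochs": 150,                 # VAE Epochs: 100-200 is typical
    "n_behavior_clusters": 10,         # Num Behavioral Clusters (HMM hidden states)
    "min_postural_cluster_size": 50,   # HDBSCAN; affects postural clustering only
    "run_bidirectional_lstm": False,   # optional dynamics model
}
```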
💡 Tip: Use "Suggest Parameters"
If you are unsure where to start, load your data in Tab 1 and then click the "Suggest Parameters" button. The tool will analyze your dataset size and suggest reasonable starting values for UMAP, HDBSCAN, and the VAE latent dimensions.
Execute the Analysis
Go to **Tab 3: Execute & Visualize**.
- Select your Output Folder.
- Click the **"Run Analysis"** button.
- Monitor the progress bar and status updates. The process will involve:
  - Computing high-dimensional feature vectors.
  - Training the VAE on your baseline group.
  - Encoding all pose data into the VAE's latent space.
  - Training the HMM on the latent sequences to discover behavioral states.
  - Generating reports, plots, and videos.
Interpret the Outputs
Once the analysis is complete, several key outputs will be generated in your specified output folder and can be previewed in the GUI.
Visualizations (Tab 3)
Use the "Plot to Display" dropdown to view:
- latent_space: A 2D UMAP projection of the VAE's latent space. It shows the overall structure of all postures in your dataset.
- postural_clusters: The latent space colored by the HDBSCAN postural clusters. This shows groups of statically similar poses.
- behavioral_clusters: The latent space colored by the HMM behavioral states. This is the key plot, showing how the discovered temporal syllables map onto the posture space.
- vae_comparison: If you have multiple groups, this heatmap shows the distance between group centroids in the latent space, quantifying overall behavioral differences.
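To reproduce a view like behavioral_clusters outside the GUI, a minimal sketch with umap-learn and matplotlib (assumed libraries; styling is illustrative) looks like this:

```python
# Plot the VAE latent space colored by HMM behavioral states (illustrative).
import matplotlib.pyplot as plt
import umap

def plot_behavioral_clusters(latent_vectors, state_labels, out_path="behavioral_clusters.png"):
    # latent_vectors: (n_frames, latent_dim) VAE embeddings; state_labels: (n_frames,) HMM states.
    emb = umap.UMAP(n_neighbors=30, min_dist=0.1, random_state=0).fit_transform(latent_vectors)
    plt.figure(figsize=(6, 6))
    plt.scatter(emb[:, 0], emb[:, 1], c=state_labels, cmap="tab20", s=2)
    plt.colorbar(label="HMM behavioral state")
    plt.title("Latent space colored by behavioral syllable")
    plt.savefig(out_path, dpi=150)
```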
Generated Videos
In your output folder, you will find a video for each discovered behavioral cluster (e.g., behavior_cluster_0.mp4, behavior_cluster_1.mp4, etc.).
These videos are montages of all instances where the animal was in that specific HMM state. **Watching these videos is the most important step for assigning a human-interpretable name (e.g., "sniffing," "turning") to each automatically discovered syllable.**
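The tool generates these montages automatically, but for intuition, collecting the frames of one HMM state into a clip can be sketched with OpenCV (illustrative and simplified; exact frame-to-state alignment is assumed):

```python
# Write a clip of all frames assigned to one behavioral state (OpenCV, illustrative).
import cv2
import numpy as np

def write_cluster_clip(video_path, frame_states, target_state, out_path, fps=30):
    # frame_states: per-frame HMM state labels aligned with the video's frames.
    frame_states = np.asarray(frame_states)
    cap = cv2.VideoCapture(video_path)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for idx in np.flatnonzero(frame_states == target_state):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            writer.write(frame)
    cap.release()
    writer.release()
```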