# Tutorial: Conditioned Generation of 3D Human Motions (Action2Motion)

Action2Motion can be seen as the inverse of action recognition: given a prescribed action type, it aims to generate plausible 3D human motion sequences. The generated motions should be diverse enough to cover the entire action-conditioned motion space, while each sampled sequence should faithfully resemble natural human body articulation dynamics. Motivated by these objectives, Action2Motion respects the physical constraints of human kinematics by using Lie algebra theory to represent natural human motions, and it uses a temporal Variational Auto-Encoder (VAE) that encourages diverse sampling of the motion space.
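To make the Lie algebra idea more concrete, here is a minimal sketch of the exponential map from an axis-angle vector (an element of so(3)) to a rotation matrix, using Rodrigues' formula. It is an illustration only and not the code used by Action2Motion or GenMotion; the function name `axis_angle_to_matrix` is hypothetical. The point it shows is that rotating a bone vector leaves its length unchanged, which is one reason rotation-based parameterizations suit skeletal motion.

```python
import numpy as np

def axis_angle_to_matrix(w):
    """Map an axis-angle vector in so(3) to a rotation matrix in SO(3).

    Hypothetical helper for illustration; uses Rodrigues' formula.
    """
    w = np.asarray(w, dtype=float)
    theta = np.linalg.norm(w)          # rotation angle in radians
    if theta < 1e-8:                   # near-zero rotation: return identity
        return np.eye(3)
    k = w / theta                      # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],  # skew-symmetric cross-product matrix
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Rotate a unit-length bone along x by 90 degrees about the z axis.
R = axis_angle_to_matrix([0.0, 0.0, np.pi / 2])
bone = np.array([1.0, 0.0, 0.0])
rotated = R @ bone

print(np.round(rotated, 6))            # the bone now points along y
print(np.linalg.norm(rotated))         # length is preserved
```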

For more details, see the [original implementation](https://ericguo5513.github.io/action-to-motion/) and the [paper](https://arxiv.org/pdf/2007.15240.pdf).

## Dataset

To obtain the pre-processed dataset, please refer to this [GitHub repository](https://github.com/Mathux/ACTOR) and agree to the license. The following code shows examples from the `HumanAct12` dataset.

In [1]:
# Set data path
data_path = "E://researches/action-to-motion/dataset/humanact12"

## Training

In [2]:
import torch

In [3]:
from genmotion.algorithm.action2motion.configs import params
from genmotion.algorithm.action2motion.utils import paramUtil
from genmotion.algorithm.action2motion.dataset import MotionFolderDatasetHumanAct12, MotionDataset

In [5]:
opt = params.TrainingConfig()
print(vars(opt))

{'arbitrary_len': False, 'batch_size': 8, 'checkpoints_dir': './checkpoints/vae', 'clip_set': './dataset/pose_clip_full.csv', 'coarse_grained': True, 'dataset_type': 'humanact12', 'decoder_hidden_layers': 2, 'dim_z': 30, 'eval_every': 2000, 'gpu_id': 0, 'hidden_size': 128, 'isTrain': True, 'is_continue': False, 'iters': 50000, 'lambda_align': 0.5, 'lambda_kld': 0.001, 'lambda_trajec': 0.8, 'lie_enforce': False, 'motion_length': 60, 'name': 'act2motion', 'no_trajectory': False, 'plot_every': 50, 'posterior_hidden_layers': 1, 'print_every': 20, 'prior_hidden_layers': 1, 'save_every': 2000, 'save_latest': 50, 'skip_prob': 0, 'tf_ratio': 0.6, 'time_counter': True, 'use_geo_loss': False, 'use_lie': True}


In [6]:
import torch
print("torch version:", torch.__version__)

torch version: 1.7.1


In [7]:
device = torch.device("cuda:" + str(opt.gpu_id) if torch.cuda.is_available() else "cpu")

In [8]:
joints_num = 0
input_size = 72
data = None

In [9]:
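if opt.dataset_type == "humanact12":
    input_size = 72
    joints_num = 24
    raw_offsets = paramUtil.humanact12_raw_offsets
    kinematic_chain = paramUtil.humanact12_kinematic_chain
    data = MotionFolderDatasetHumanAct12(data_path, opt, lie_enforce=opt.lie_enforce)
else:
    # This tutorial only covers HumanAct12; fail early instead of leaving `data` as None
    raise ValueError("Unsupported dataset_type: " + str(opt.dataset_type))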

Total number of frames 90099, videos 1191, action types 12


In [11]:
data[0][0].shape

(64, 72)
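The shape `(64, 72)` reads as 64 frames with 72 values per frame, which matches `joints_num = 24` times 3 values per joint. The sketch below reshapes a stand-in array into a per-joint view. It assumes the 72 values are laid out joint by joint (joint-major order), which the tutorial does not state, and it uses a zero array in place of the real `data[0][0]`.

```python
import numpy as np

joints_num = 24                      # value set for HumanAct12 above
motion = np.zeros((64, 72))          # stand-in for data[0][0], same shape

# Assumption: each frame stores 3 values per joint, joint after joint.
per_joint = motion.reshape(motion.shape[0], joints_num, 3)

print(per_joint.shape)               # (frames, joints, 3)
```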

In [15]:
opt.dim_category = len(data.labels)
# With arbitrary_len, motion length is not limited, but the batch size must be 1
if opt.arbitrary_len:
    opt.batch_size = 1
    motion_loader = torch.utils.data.DataLoader(data, batch_size=opt.batch_size, drop_last=True, num_workers=1, shuffle=True)
else:
    motion_dataset = MotionDataset(data, opt)
    motion_loader = torch.utils.data.DataLoader(motion_dataset, batch_size=opt.batch_size, drop_last=True, num_workers=2, shuffle=True)

In [17]:
len(motion_loader)

148
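The value 148 is consistent with the numbers printed earlier: 1191 videos, a batch size of 8, and `drop_last=True`, which discards the final incomplete batch. The check below assumes that `MotionDataset` yields one item per video, which is not shown in this tutorial.

```python
num_videos = 1191                    # from the dataset summary printed above
batch_size = 8                       # opt.batch_size

# With drop_last=True, only full batches are kept.
full_batches = num_videos // batch_size
dropped_samples = num_videos % batch_size

print(full_batches, dropped_samples)  # 148 7
```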

In [18]:
opt.pose_dim = input_size

if opt.time_counter:
    opt.input_size = input_size + opt.dim_category + 1
else:
    opt.input_size = input_size + opt.dim_category

opt.output_size = input_size
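With 72 pose values, 12 action types, and the time counter enabled, `opt.input_size` works out to 72 + 12 + 1 = 85, while `opt.output_size` stays at 72. The sketch below shows one plausible way such a per-step input could be assembled: the pose, a one-hot action label, and a scalar time counter concatenated along the feature axis. The concatenation order, the normalization of the time counter, and the example labels are assumptions for illustration, not taken from the GenMotion source. It also assumes `opt.dim_category` evaluates to 12.

```python
import numpy as np

pose_dim, dim_category, motion_length = 72, 12, 60
batch_size = 8

pose = np.zeros((batch_size, pose_dim))            # stand-in pose for one time step
labels = np.array([0, 3, 5, 11, 2, 2, 7, 9])       # hypothetical action labels
one_hot = np.eye(dim_category)[labels]             # (8, 12) one-hot action encoding

t = 10                                             # current time step
# Assumption: time counter is the step index scaled to [0, 1].
time_counter = np.full((batch_size, 1), t / (motion_length - 1))

step_input = np.concatenate([pose, one_hot, time_counter], axis=1)
print(step_input.shape)                            # (8, 85)
```

In PyTorch the same assembly would use `torch.cat(..., dim=1)` on tensors placed on `device`.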