Temporal Convolutional Networks for BMX Dynamics

The Question

How good is AI at learning vehicle dynamics — and where does a learned model stop being trustworthy? A half-car/bicycle system, with its simple formulation yet rich dynamics, is an ideal testbed: interesting enough to be non-trivial, cheap enough to generate the large datasets a neural network needs. The experiment trains a sequence model to reproduce the bike's dynamics and then probes the limits of what it has actually learned.

Background

Temporal Convolutional Networks (TCNs), in the generic form now in common use, were described in 2018 by researchers at Carnegie Mellon University and Intel Labs as an alternative to Recurrent Neural Networks (RNNs) for sequence modelling and forecasting. The key finding was that convolutional neural networks (CNNs), traditionally associated with image classification, can outperform RNNs on a wide range of sequence-modelling tasks. With their interpretable structure and efficient training, TCNs are well suited to learning dynamical systems from simulation data.

The Physical Model

The underlying vehicle model is a half-car/bicycle system with four degrees of freedom: two for the chassis and one for each wheel, capturing vertical dynamics, pitch, and independent suspension motion. Unilateral tyre contact is included, meaning the bicycle can become airborne — a detail that matters on technical downhill terrain. The model was run as a forward dynamic simulation along 1000 procedurally generated downhill courses, producing approximately 8 hours of state trajectories and 350 km of course data.
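
The unilateral contact condition can be sketched as a tyre force that is clamped at zero, so the ground can push on the wheel but never pull it down. The sketch below assumes a linear spring-damper tyre; the function name, the 0.25 m radius, and the stiffness and damping values are illustrative assumptions, not parameters of the model described here.

```python
def tyre_normal_force(wheel_z, wheel_vz, ground_z, ground_vz=0.0,
                      radius=0.25, stiffness=1.0e5, damping=500.0):
    """Normal force (N) from a spring-damper tyre that can push but never pull.

    All default values are illustrative assumptions, not taken from the model.
    """
    # Compression is positive when the wheel centre is closer to the ground
    # than the unloaded tyre radius.
    compression = radius - (wheel_z - ground_z)
    if compression <= 0.0:
        return 0.0  # wheel is airborne: no contact, no force

    # Positive rate means the tyre is being compressed further.
    compression_rate = -(wheel_vz - ground_vz)
    force = stiffness * compression + damping * compression_rate

    # During fast rebound the damper term could make the sum negative;
    # clamping keeps the contact unilateral.
    return max(force, 0.0)


print(tyre_normal_force(wheel_z=0.50, wheel_vz=0.0, ground_z=0.0))  # airborne
print(tyre_normal_force(wheel_z=0.24, wheel_vz=0.0, ground_z=0.0))  # 10 mm of compression
```

The clamp is what makes the dynamics switch between two regimes (contact and flight), which is part of what the network later has to reproduce.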

To prepare the data for the TCN, the model's constraint equations were orthogonalized so the network could learn states directly, without being asked to satisfy algebraic constraints it was not designed for.

Architecture

The TCN architecture reflects the multi-timescale structure of vehicle dynamics. Temporal blocks on the left capture behaviour at different time scales. Within each block, convolutional filters scan the historical state sequence for motion patterns: kernel size determines the temporal reach within each scale, and filter count determines the number of distinct patterns searched. A pattern-of-patterns layer in the middle combines these representations. Chomp1d enforces causality by trimming the padded convolution output so that no prediction depends on future samples, Dropout provides regularization, and ReLU introduces nonlinearity. Without that nonlinear activation, the stacked convolutions would collapse into a single linear map, and the TCN would reduce to an elaborate linear regression.
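
The role of the chomp step can be illustrated with a minimal single-channel sketch in plain Python. It pads the sequence on both sides, as a padded convolution layer would, and then trims the trailing outputs so that each output sample sees only current and past inputs. The function names are hypothetical, and the `dilation` argument is an assumption based on the standard TCN design; the dilation schedule actually used is not stated here.

```python
def causal_conv1d(signal, kernel, dilation=1):
    """Single-channel causal convolution: output[t] depends only on signal[:t + 1]."""
    k = len(kernel)
    pad = (k - 1) * dilation
    if pad == 0:
        return [kernel[0] * v for v in signal]

    # Symmetric zero padding, as a padded convolution layer would apply.
    padded = [0.0] * pad + list(signal) + [0.0] * pad

    # Plain (cross-correlation style) convolution over the padded sequence.
    n_out = len(padded) - pad
    full = [
        sum(kernel[j] * padded[t + j * dilation] for j in range(k))
        for t in range(n_out)
    ]

    # "Chomp": drop the last `pad` outputs, which are the ones that would
    # otherwise be aligned with future samples.
    return full[:-pad]


def relu(values):
    """Elementwise nonlinearity applied after the convolution."""
    return [max(0.0, v) for v in values]


history = [1.0, 2.0, 3.0, 4.0]
print(causal_conv1d(history, [1.0, 1.0]))               # [1.0, 3.0, 5.0, 7.0]
print(causal_conv1d(history, [1.0, 1.0], dilation=2))   # [1.0, 2.0, 4.0, 6.0]
```

With kernel `[1.0, 1.0]`, each output is the sum of the current sample and the one `dilation` steps earlier; changing the last entry of `history` leaves all earlier outputs unchanged, which is the causality property.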

Course geometry is encoded with 10 features per discretization point and compressed by a small MLP into a 128-dimensional embedding before entering the TCN.
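
The shape of this encoding step can be sketched with a two-layer MLP in plain Python. Only the 10 features per point and the 128-dimensional output come from the description above; the assumption that a window of 32 points is flattened before compression, the hidden width of 256, the random weights, and all function names are hypothetical placeholders.

```python
import random


def make_linear(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply a fully connected layer to a vector."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def encode_course(points, hidden_layer, output_layer):
    """Flatten a window of course points and compress it to one embedding vector."""
    flat = [feature for point in points for feature in point]
    return linear(output_layer, relu(linear(hidden_layer, flat)))


FEATURES_PER_POINT = 10  # from the text
EMBEDDING_DIM = 128      # from the text
WINDOW_POINTS = 32       # assumption: number of course points per window
HIDDEN_DIM = 256         # assumption: hidden layer width

rng = random.Random(0)
hidden_layer = make_linear(WINDOW_POINTS * FEATURES_PER_POINT, HIDDEN_DIM, rng)
output_layer = make_linear(HIDDEN_DIM, EMBEDDING_DIM, rng)

# Placeholder course window with random feature values.
window = [[rng.uniform(-1.0, 1.0) for _ in range(FEATURES_PER_POINT)]
          for _ in range(WINDOW_POINTS)]
embedding = encode_course(window, hidden_layer, output_layer)
print(len(embedding))  # 128
```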

Training and Generalization

Hyperparameter choice is critical. Strongly prioritizing teacher forcing (given ground-truth past states, predict the next step) over autoregressive rollout (the network's own predictions fed back in as inputs) during the first 20 to 50 epochs is essential for training stability. Velocity states also need a high loss weight relative to position states to suppress drift in long rollouts.
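
A minimal sketch of how these two choices might be combined in a loss function is shown below. It is not the training code used here: the schedule shape, the 30-epoch warm-up, the 5% and 50% rollout shares, the 5:1 velocity weighting, and the toy one-step models are all illustrative assumptions, and a real TCN would take a window of past states rather than a single state.

```python
def autoregressive_fraction(epoch, warmup_epochs=30, ramp_epochs=20,
                            warmup_fraction=0.05, final_fraction=0.5):
    """Share of the total loss assigned to autoregressive rollout at this epoch."""
    if epoch < warmup_epochs:
        return warmup_fraction  # early training: almost pure teacher forcing
    progress = min(1.0, (epoch - warmup_epochs) / ramp_epochs)
    return warmup_fraction + (final_fraction - warmup_fraction) * progress


def weighted_mse(predicted, target, weights):
    """Squared error with one weight per state component, normalized by the weights."""
    total = sum(w * (p - t) ** 2 for p, t, w in zip(predicted, target, weights))
    return total / sum(weights)


def teacher_forced(step_model, true_states):
    """Predict each next state from the ground-truth previous state."""
    return [step_model(state) for state in true_states[:-1]]


def autoregressive(step_model, initial_state, n_steps):
    """Predict a rollout by feeding each prediction back in as the next input."""
    predictions, state = [], initial_state
    for _ in range(n_steps):
        state = step_model(state)
        predictions.append(state)
    return predictions


def combined_loss(step_model, true_states, weights, epoch):
    """Blend teacher-forced and rollout losses according to the epoch schedule."""
    targets = true_states[1:]
    tf_preds = teacher_forced(step_model, true_states)
    ar_preds = autoregressive(step_model, true_states[0], len(targets))
    tf_loss = sum(weighted_mse(p, t, weights) for p, t in zip(tf_preds, targets)) / len(targets)
    ar_loss = sum(weighted_mse(p, t, weights) for p, t in zip(ar_preds, targets)) / len(targets)
    alpha = autoregressive_fraction(epoch)
    return (1.0 - alpha) * tf_loss + alpha * ar_loss


DT = 0.01
STATE_WEIGHTS = (1.0, 5.0)  # (position, velocity); the 5:1 ratio is an assumption


def true_step(state):
    """Toy ground-truth dynamics: velocity decays by 10% per step."""
    position, velocity = state
    return (position + DT * velocity, 0.9 * velocity)


def imperfect_step(state):
    """Toy learned model with a small velocity error."""
    position, velocity = state
    return (position + DT * velocity, 0.95 * velocity)


true_states = [(0.0, 1.0)]
for _ in range(20):
    true_states.append(true_step(true_states[-1]))

early = combined_loss(imperfect_step, true_states, STATE_WEIGHTS, epoch=0)
late = combined_loss(imperfect_step, true_states, STATE_WEIGHTS, epoch=60)
print(early, late)
```

In this toy case the same model scores a larger loss late in training than early, because the rollout term exposes errors that compound over the sequence while the teacher-forced term resets to the ground truth at every step.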

The trained TCN predicts positions and velocities accurately across the test distribution, including airborne phases. The key constraint on its use is also the clearest: the network learns to predict only the quantities it was trained on. Deriving suspension forces or tyre loads from predicted positions, when those quantities were not explicitly included as training targets, produces unreliable results.
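
One plausible contributor to this, offered here as an illustration rather than a finding of the experiment, is that stiff elements amplify small position errors into large force errors. The short calculation below assumes a linear spring with an illustrative stiffness of 100 kN/m and a 1 mm error in predicted deflection; both numbers and the function name are hypothetical.

```python
def spring_force(deflection_m, stiffness_n_per_m):
    """Force (N) in a linear spring for a given deflection (m)."""
    return stiffness_n_per_m * deflection_m


STIFFNESS = 1.0e5             # N/m, illustrative assumption
true_deflection = 0.010       # m
predicted_deflection = 0.011  # m, i.e. a 1 mm prediction error

true_force = spring_force(true_deflection, STIFFNESS)
derived_force = spring_force(predicted_deflection, STIFFNESS)

force_error = derived_force - true_force
relative_error = force_error / true_force
print(f"force error: {force_error:.0f} N ({relative_error:.0%} of the true force)")
```

Under these assumptions a position error of 1 mm, which would count as accurate for a trajectory, becomes a force error of roughly 100 N, or about 10% of the true load.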