ConSciNet
This repository contains the trained models, data and code used in the publication: https://arxiv.org/abs/2103.10905
view repo
Given an unknown dynamic system such as a coupled harmonic oscillator with n springs and point masses. We are often interested in gaining insights into its physical parameters, i.e. stiffnesses and masses, by observing trajectories of motion. How do we achieve this from video frames or time-series data and without the knowledge of the dynamics model? We present a neural framework for estimating physical parameters in a manner consistent with the underlying physics. The neural framework uses a deep latent variable model to disentangle the system physical parameters from canonical coordinate observations. It then returns a Hamiltonian parameterization that generalizes well with respect to the discovered physical parameters. We tested our framework with simple harmonic oscillators, n=1, and noisy observations and show that it discovers the underlying system parameters and generalizes well with respect to these discovered parameters. Our model also extrapolates the dynamics of the system beyond the training interval and outperforms a non-physically constrained baseline model. Our source code and datasets can be found at this URL: https://github.com/gbarber94/ConSciNet.
READ FULL TEXT VIEW PDF
In this paper, we introduce Symplectic ODE-Net (SymODEN), a deep learnin...
read it
Canonical transformation plays a fundamental role in simplifying and sol...
read it
We present a novel architecture named Neural Physicist (NeurPhy) to lear...
read it
This paper presents a machine learning framework (GP-NODE) for Bayesian
...
read it
For many of the physical phenomena around us, we have developed sophisti...
read it
In this study, we propose a method for extracting the hidden algebraic
s...
read it
Robots performing tasks in dynamic environments would benefit greatly fr...
read it
This repository contains the trained models, data and code used in the publication: https://arxiv.org/abs/2103.10905
Neural networks have produced strong results across a range of disciplines and have addressed problems in Image classification [14, 13, 15]
[2, 5, 9], Protein folding [25], and so on. These successes are in part due to a neural network’s ability to extract and learn complex features and relations from data. Often these learned features and relations are not easily accessible and are referred to as the model’s “black box”. This is problematic for modeling physical systems where we often have strong prior knowledge of the underlying physics, constraints, and relations a network must learn. Standard neural networks do not have this knowledge and must learn directly from data. In practice, even for simple physical systems, this proves to be a challenging task with networks often failing to generalize a system’s behavior beyond the training interval. To address this issue neural architectures have been designed to explicitly embed physical constraints and symmetries [6, 17, 26, 20, 7] in the description of classical dynamical systems.Classical dynamical systems are described by Hamiltonian mechanics[21, 23, 27]. The principle of Hamiltonian mechanics was first incorporated into the design of neural networks nearly three decades ago [4]. In recent years this area has seen renewed interest [6, 26, 1, 18]. Here we will focus our discussion around [6]’s Hamiltonian Neural Network architecture for learning exactly conserved quantities from data in an unsupervised manner. Hamiltonian neural networks (HNNs) draw their inspiration from Hamiltonian mechanics and function through parameterizing a system’s Hamiltonian function with a neural architecture. The Hamiltonian of a system is a function that relates the system’s total energy to its canonical coordinates i.e. its generalized coordinates and momentum . The Hamiltonian of a physical dynamical system is given by:
(1) |
Where the total energy is the sum of the system’s kinetic and potential energies. The dynamics or the time evolution of the states is given by:
(2) |
In a HNN, the model receives observations of a system’s canonical coordinates as input. This input is passed through a feed-forward neural network and an energy-like value
is output from the network. This section of the model is a neural network parameterization of Equation 1:(3) |
The partial derivatives with respect to the input canonical coordinates of this network parameterization are then computed with the automatic differentiation library Autograd [16]. This section of the model is a parameterization of Equation 2:
(4) |
The objective of a HNN model is to minimize the mean squared error metric between the networks partial derivatives, Equation 4, and the derivatives of the input canonical coordinates . After training, the network can be treated as the underlying time evolution model and can be evaluated over a time range with an ODE solver.
This approach brings with it many aspects of the underlying Hamiltonian and is successful in generating predictions that respect the conservation of energy constraint outside the training interval. The HNN approach however does not maintain the same levels of flexibility and generative properties of the underlying Hamiltonian function in regards to a system’s physical parameters. For example, let us consider the Hamiltonian for a simple pendulum system shown in Figure 3.
(5) |
Where is the mass, is gravitational constant, is length, is the position defined by , is the angular momentum, . From this formulation, we can readily adjust the length parameter to simulate a new system. A HNN is incapable of this generalization. In [6] the system’s physical parameters are held constant and absorbed into the canonical coordinates. In the current work we will draw inspiration from work in physical parameter discovery in particular SciNet [11], to address this limitation and extend a HNNs generalization ability.
In this paper, we will propose a neural framework for joint parameter discovery and generative modeling of Hamiltonian systems. We will then utilize this framework to disentangle the physical parameter of a Hamiltonian system from its canonical coordinates and return a Hamiltonian parameterization that maintains the flexibility and generative properties of the system’s underlying Hamiltonian with regards to the system’s physical parameters. We make the following contributions: 1. We propose a neural framework for join parameter discovery and generative modeling of Hamiltonian systems that merges deep latent variable models and physical constraint embedded neural networks, 2. We disentangle physical parameter like values from noisy coordinate observations, 3. We return a neural network parameterization of a system’s Hamiltonian that generalizes well with regards to the discovered physical parameters and extrapolates the dynamics in physically consistent manner beyond the training interval. We will start with an overview of deep latent variables for parameter discovery with a focus on [11] and progress to our proposed architecture.
Given an unknown dynamical system, we are often interested in gaining insight into its governing parameters. For simple linear systems, standard curve fitting and regression methods can return functions with interpretable parameters and coefficients. However, for high-dimensional nonlinear systems, this can prove challenging. Neural networks act as universal function approximators [10] and as such, they are a natural choice for modeling the function of such a system. These networks learn features and parameters from the data during training, to drive their predictive accuracy. The learned features and parameters are not easily interpreted in a physical context and are instead masked within the nonlinear combinations and transformations executed by the network. To acquire physically interpretable parameters we will turn to deep latent variable models, a family of neural architectures that utilize bottlenecks to extract minimal representations.
Deep latent variable models typically contain two components, an encoder that maps the input to a reduced latent space, and a decoder that utilizes the latent variables to make a prediction. The bottleneck encourages the encoder to learn a minimal representation of the input. A deep latent variable model’s utility is limited by the quality of the learned latent representation. A good minimal representation will allow inference and the extraction of physically meaningful parameters.
There are two major considerations when designing a deep latent variable model for parameter discovery: the size of the latent vector and latent variable disentanglement. The size of the latent vector is generally fixed prior to training. Here if we have prior knowledge of a system’s minimal representation we can determine its size. For example, take an object moving with constant velocity, the minimal representation required to describe its next position includes its current position and velocity. The latent vector size for such a model could then be set to two. If we lack prior knowledge about the system the second consideration, disentanglement in the latent space becomes central. Ideally, a model with a good disentanglement will not encoded information into latent variables beyond those required for a minimal representation. For the case of an object with constant velocity, if the model was supplied with more than two latent variables (say six) it should only encode information in two. The SciNet architecture
[11] has produced these types of disentangled representations for systems in classical and quantum mechanics.The SciNet approach seeks to jointly disentangle a system’s physical parameters from coordinate observations and model its behavior with a neural network parameterized function of the disentangled parameters and some auxiliary variables. SciNet’s encoder is molded after that of a -VAE [8] encoder, as input it receives coordinate observations and as output it returns a reduced set of latent variables :
. SciNet’s decoder receives the latent variables and auxiliary variables as input and models a function of the input in terms of the target output:
.As a motivating example let us formulate a SciNet approach for an object moving with constant velocity. Given the objects initial position and velocity , it’s position at any time point is given by: . SciNet aims to learn a parameterization of this function and its input parameters. To accomplish this we can vary the objects the speed and take an observations of its trajectory at each speed as input to the encoder, select as our auxiliary latent variable and seek to output from the decoder. SciNet’s decoder is then given by a function of and the latent variables output from the encoder: .
As is provided to the system its minimal representation includes two constants: , . In this case, as is held constant it will not be disentangled by the network. During training SciNet’s encoder will learn to map from an input trajectory length to a minimal latent representation encoding information into only one latent variable a velocity like a parameter: . The decoder for this system can then be written as: . After training this parameterization can be split from the model and used to simulate new systems. Similar to adjusting the velocity in the underlying equation, the velocity like parameter can be adjusted to simulate an object traveling a different speeds. This generative property is bounded by the range of the physical parameters present in the training data.
The parameterization returned in a SciNet maintains a similar level of flexibility to a system’s underlying function in regards to its physical parameters. SciNet however does not directly embed physical constraints and can return physically inconsistent predictions. In the current work, we will propose a novel HNN SciNet hybrid architecture for energy conserving systems. Our approach maintains the flexibility and generative properties of a system’s underlying Hamiltonian in regards to a system’s physical parameters. We call this framework of joint parameter discovery and generative modeling Constrained SciNet (ConSciNet).
Constrained SciNet is a deep latent variable model that disentangles a system’s physical parameters from canonical coordinate observations and returns a Hamiltonian parameterization that generalizes with respect to the discovered parameters. Conceptually the proposed Constrained SciNet architecture follows a similar information flow to that of a standard SciNet with an important distinction in that the decoder has been replaced with a modified HNN and the auxiliary latent variables have been fixed by the system’s canonical coordinates. The general layout of a ConSciNet model is illustrated in Figure 1. Structurally the ConSciNet model is composed of three components: an encoder, a decoder, and an interface.
The Encoder as in a standard SciNet is molded after the encoder of a Variational AutoEncoder [12, 22]
with ELU activation functions
[3]. As input, the encoder receives observations of the system’s canonical coordinates over some time interval. As output, the encoder returns a reduced latent vector . The size of can be freely chosen prior to training. Here as we are interested in evaluating the model’s performance in latent variable disentanglement we set the size of to 3, a value greater than the minimum representations required for our trials. Our encoder is then given as: .The Interface receives the latent variables and auxiliary latent variables as input and structures this input for the decoder. Here the auxiliary latent variables are taken as one set of canonical coordinates per target. Limiting the coordinates passed to the decoder prevents the modified HNN decoder from directly learning a representation of the parameters itself and instead encourages the latent variables passed from the encoder to contain this representation. The Interface returns:
The Decoder is a modified HNN. In addition to the canonical coordinates it receives the latent variables as input: . As in a HNN the partial derivatives of this network are computed with respect to the canonical coordinates with Autograd [16]. The latent variables are treated as physical parameters and their partial derivatives are not returned. The inclusion of the physical parameter representation in the HNN’s input allows the model to generalize with respect to a system’s physical parameters, as in the underlying Hamiltonian formulation. Our decoder is given as: .
The objective function the ConSciNet model is trained with is given in Equation 6:
(6) |
This loss function is composed of two components. The first corresponds to the HNN loss and measures the performance of the decoder. The second component corresponds to the
-VAE loss (KL-divergence) and encourages a disentangled representation. Heuristically, we found small values for
around 0.005 struck a good balance between latent variable disentanglement and model performance.Baseline model. In addition to the ConSciNet model, a Baseline model without the Hamiltonian parameterization was trained as a point of comparison. The architecture for the Baseline model is sketched in Figure 2 and resembles that of a standard SciNet. The Baseline model’s decoder did not parameterize the Hamiltonian and instead consisted of a standard feed-forward network that directly output . The objective function the Baseline model is trained with is given in Equation 7:
(7) |
Implementation
. We implemented the ConSciNet and Baseline model in PyTorch
[19]. Our models and source code are available at https://github.com/gbarber94/ConSciNet.The ConSciNet model’s performance and parameter generalization ability was assessed on two simple dynamic systems from [6]: an ideal pendulum and ideal mass-spring system. For both systems, their physical parameters were varied during data generation and noise was added.
The Hamiltonian of an Ideal pendulum system given in Equation 5
was used to simulate multiple systems. For each simulation the length parameter was varied between 0.3 and 0.8 and its time evolution was evaluated over the range 0 to 10. For each time evolution, a noise vector of equal length was then added. The noise vector was sampled from a normal distribution with mean
. In total 50,000 trajectories of length 100 were generated. These trajectories were used as input into the encoder. For every trajectory, a time point was randomly selected over the generation range. At this point, the system’s time evolution was evaluated and its derivative stored. The coordinate evaluation at this point was used as the auxiliary latent variables and its derivative was used as the model’s target.(8) |
The Hamiltonian of an Ideal mass-spring system is given in Equation 8 where is the spring constant, is the mass, is the position and is the linear momentum given by . Here we sampled values for between 0.1 and 0.5, and values for between 0.5 and 1. For each parameter pairs we simulated, the system’s time evolutions from [0,10] and added a noise sampled from a normal distribution with mean and standard deviation . In total 50,000 trajectories of length 100 were generated. The model’s targets and auxiliary latent variables were sampled following the same procedure described in Task 1.
The performance of the ConSciNet and Baseline model (SciNet with a standard feed-forward decoder) were evaluated in terms of their latent variable disentanglement and generative functionality. After training, both models were split at their interface. In both tasks, the encoder was used to assess latent variable disentanglement. Trajectories with known physical parameters were passed to the encoders and their output latent variables were stored. The known physical parameters and their corresponding latent variables were then examined to identify latent variables encoding physical parameter like values.
Similarly, for both tasks, the decoder was used to assess the generative functionality of the models. The latent variable or variables identified as corresponding to physical parameters were interpolated between the minimum and maximum values of the corresponding physical parameters. The remaining latent variables that did not encode information were taken as 0. These values were then formatted to match the input latent vector structure of the decoder to create a parameter argument. For example, if the first entry in the latent vector corresponds to a length-like parameter
and all others encoded no information the parameter argument would be given by: . A parameter argument was generated for each interpolated latent variable. An ODE solver was then used to integrate the decoders and return coordinate predictions for each parameter argument. Here we used SciPy’s [28] solve_ivp function to integrate the network with a tolerance of 1e-12. A 4th order Runge-Kutta [24] method was selected for the solver. During this evaluation, an initial canonical coordinate value was supplied and the parameter argument was held constant. These network-generated trajectories were then assessed for physical constancy and alignment to the ground truth.We found that both the ConSciNet and Baseline models were able to successfully disentangle a length-like parameter from the noisy input coordinates. In Figure 4 we present the disentanglement results for both models. Here we plot the latent variable activations output by the encoder against the length value of the input trajectory.
From Figure 4(a), the ConSciNet model encoded in the 1st latent variable and encoded values close to 0 in the remainder. From Figure 4(b) the Baseline model encoded the value in the 3rd latent variable and values close to 0 in the remainder. The specific position of the latent variable encoding information and its sign do not convey information. These values are subject to change if the model is retrained. The discovered parameter is not the exact underlying length parameter rather it is some transformation of the value. In this case, the transformation is roughly linear.
Pendulum interpolated | MSE: t-span[0,20] | ||
---|---|---|
length | Baseline | ConSciNet |
0.3 | 0.160261 | 0.028775 |
0.5 | 0.023244 | 0.000613 |
0.6 | 0.022039 | 0.000117 |
0.8 | 0.025094 | 0.001503 |
We found the network parameterization learned in ConSciNet’s decoder maintained similar generative properties to the system’s underlying time evolution equations with regards to the physical parameters. ConSciNet was able to generalize across the discovered parameter domain and accurately extrapolate the time evolutions of the system governed by these parameters beyond the training interval t-span [0,10]. In Figure 5 we present trajectories generated from evaluating the decoder at four values of .
The evaluation values for were selected using a scheme that allowed for a ground truth comparison. From Figure 4, it is apparent that the mapping between the ground truth parameter and the parameter-like value learned by the network can be modeled with a simple polynomial fit. Here we fit a cubic polynomial for each model mapping from to . We then linearly interpolated four values for between the minimum and maximum values present in the data and mapped these values to their corresponding values to obtain the evaluation values of for each model.
In Figure 5(a), we present phase plots generated from evaluating the models at each value over the time interval t-span of [0,20]. The ConSciNet model generated trajectories that aligned with the ground truth trajectories and successfully respected the conservation of energy constraint over the evaluation window. The Baseline model on the other hand deviated from the ground truth trajectories over time and failed to conserve energy outside of the training interval, t-span of [0,10]. Its predictions generally gained or lost energy as the evaluation interval expanded beyond the training interval.
In Figure 5(b), we highlight the generalization ability of the models in regards to the discovered parameters, over the time interval t-span of [0,10]. As the length like parameter is increased the period decreases aligning with the expected physical behavior. The Baseline model while matching this behavior suffers in accuracy in comparison to the ConSciNet model.
The individual results here for each value of are similar to [6]’s results for a system with fixed parameters suggesting the ConSciNet approach successfully generalizes an HNN with regards to a system’s parameters.
The success of the Baseline model in parameter disentanglement and it’s predictive performance over the training range indicates that the SciNet framework can be used as in our approach to model time derivatives. This may have applications in parameter disentanglement for NeuralODEs.
Mass-spring interpolated | MSE: t-span[0,20] | ||
---|---|---|
Baseline MSE | ConSciNet MSE | |
0.1 | 0.185356 | 0.066921 |
0.2 | 0.336565 | 0.020660 |
0.4 | 0.491436 | 0.129221 |
0.5 | 0.485869 | 0.020968 |
Mass-spring interpolated mass | MSE: t-span[0,20] | ||
---|---|---|
Baseline MSE | ConSciNet MSE | |
0.5 | 0.645808 | 0.353912 |
0.7 | 0.207255 | 0.010912 |
0.8 | 0.845841 | 0.163615 |
1.0 | 0.911760 | 0.525072 |
In the ideal mass-spring trial the ConSciNet and Baseline models both returned a good disentangled latent representation with both models only encoding information in two of the provided latent variables. This was the expected behavior as only the mass and spring coefficient were varied in the input data.
In Figure 6 we present the disentanglement results for both models. The ConSciNet model encoded a spring coefficient-like value in the 1st latent variable position and a mass-like parameter in the 3rd latent variable position, Figure 6(a). The Baseline model encoded a spring coefficient-like value in the 2nd latent variable position and a mass-like parameter in the 3rd latent variable position, Figure 6(b). The remaining latent variable in both models presented as a flat surface with values close to 0 indicating little to no information was encoded.
In the mass-spring trial we found the network parameterization learned in ConSciNet’s decoder maintained similar generative properties to the system’s underlying time evolution equations with regards to the physical parameters. As in the pendulum trial ConSciNet was able to generalize across the discovered parameter domain and extrapolate the time evolutions of the system governed by these parameters beyond the training interval t-span [0,10]. In Figure 7 we showcase trajectories generated from evaluating the decoders at four increasing values of and .
The like parameter evaluation values were selected using a similar scheme to the pendulum trial to allow a ground truth comparison. Here instead of a cubic polynomial we fit a 5th order polynomial to map between the like parameters and their corresponding ground truth parameters . A 5th order polynomial was selected to provided greater flexibility given the curvature present in Figure 6. This simple polynomial mapping between the like parameters and the ground truth parameter is sufficient to capture the physical behavior trend, however accuracy in both models is decreased in comparison to the pendulum trial, MSE Tables 1, 2 and 3. The need for establishing a robust mapping between the network disentangled parameters and some underlying physical parameter is a limitation of the current approach, particularly when the underlying physical parameters are unknown. In Figure 7(a) and (b), we highlight the generalization ability of the models in regards to and respectively. As the spring coefficient like parameter and are increased independently the amplitude of the momentum increases aligning with the expected physical behavior.
We have presented a neural framework for joint parameter discovery and generative modeling of Hamiltonian systems. Our approach merges aspects of deep latent variable models and Hamiltonian Neural Networks. We evaluated our approach on noisy simple dynamical systems. Here we demonstrated that it discovered parameters akin to the underlying physical parameters and returned a Hamiltonian parameterization that generalized well with respect to the disentangled parameters. Our approach respected the conservation of energy constraint and outperformed a non-physically constrained baseline model. The baseline model struggled to extrapolate beyond the training interval and lost/gained energy. The conservation of energy constraint embedded in our model limits our approach to energy conserving systems. The decoder and auxiliary latent variables used in our approach could be modified to embed another constraint. For example, the decoder could be replaced with an even or odd network to embed an even or odd symmetry. We believe that merging physical constraint embedded neural networks and deep latent variable models can provide a strong framework for improving network interpretability and generalization ability.
Pytorch: an imperative style, high-performance deep learning library
. arXiv preprint arXiv:1912.01703. Cited by: §3.Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations
. Journal of Computational Physics 378, pp. 686–707. Cited by: §1.Stochastic backpropagation and approximate inference in deep generative models
. InInternational conference on machine learning
, pp. 1278–1286. Cited by: §3.
Comments
There are no comments yet.