Add code to train a simple turret controller

This uses Soft Actor-Critic with automatic temperature adjustment to
train a controller for last year's turret.  This is a stepping stone to
training a more complicated turret for the swerve.  I still need to
parallelize it, play with the hyper-parameters, and teach it how to go
to a nonzero goal.

Change-Id: I1357b5fbf8549acac4ee0b94ef8f2636867c28ad
Signed-off-by: Austin Schuh <austin.linux@gmail.com>
diff --git a/frc971/control_loops/swerve/physics_test.py b/frc971/control_loops/swerve/physics_test.py
index 6e7ddf4..a45bf46 100644
--- a/frc971/control_loops/swerve/physics_test.py
+++ b/frc971/control_loops/swerve/physics_test.py
@@ -59,6 +59,7 @@
         Xdot = self.position_swerve_full_dynamics(X, U)
         Xdot_jax = jax_dynamics.full_dynamics(self.coefficients, X[:, 0], U[:,
                                                                             0])
+        self.assertEqual(Xdot.shape[0], Xdot_jax.shape[0])
 
         self.assertLess(
             numpy.linalg.norm(Xdot[:, 0] - Xdot_jax),
@@ -71,6 +72,9 @@
         velocity_physics_jax = jax_dynamics.velocity_dynamics(
             self.coefficients, X_velocity[:, 0], U[:, 0])
 
+        self.assertEqual(velocity_physics.shape[0],
+                         velocity_physics_jax.shape[0])
+
         self.assertLess(
             numpy.linalg.norm(velocity_physics[:, 0] - velocity_physics_jax),
             2e-8,