Optimize discretization of A/Q for EKF
Because we recompute the discretization of A/Q for every single step of
the EKF, it had been taking up ~70-80% of the total EKF computation
time. This change knocks that down to roughly 50% of the total EKF
computation time (a ~3-4x overall speed improvement).
Because the discretizations of A/Q depend on both dt and the heading of
the robot, they are not trivially cacheable. Possible future
improvements:
-Only recalculate the discretization when the timestep changes,
 assuming that the heading estimate changes by small enough amounts
 between steps that we can skip re-discretizing unless dt has changed
 (a rough sketch of this follows the list).
-Figure out how to cache 7x7 components of each that should stay
constant for constant dt.
-Determine whether any convenient relationships hold with respect to
dt/heading that would let us make a reasonably small interpolation
table.
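
As a rough sketch of the first item (a hypothetical illustration only:
the struct and function names are not part of this change, and it
assumes the code lives alongside DiscretizeQAFast with Eigen and
<chrono> available):

  struct CachedDiscretization {
    ::std::chrono::nanoseconds dt{0};
    Eigen::Matrix<double, 7, 7> A_d, Q_d;
  };

  // Reuse the cached discretization unless dt changes, accepting the
  // small error from the slowly-changing heading baked into A_continuous.
  void UpdateDiscretization(
      const Eigen::Matrix<double, 7, 7> &A_continuous,
      const Eigen::Matrix<double, 7, 7> &Q_continuous,
      ::std::chrono::nanoseconds dt, CachedDiscretization *cache) {
    if (dt != cache->dt) {
      DiscretizeQAFast(Q_continuous, A_continuous, dt, &cache->Q_d,
                       &cache->A_d);
      cache->dt = dt;
    }
  }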
Change-Id: I8838d4837f380a2411ab9da99d542cdbe9999e41
diff --git a/frc971/control_loops/c2d.h b/frc971/control_loops/c2d.h
index 1bf99a2..b6f4198 100644
--- a/frc971/control_loops/c2d.h
+++ b/frc971/control_loops/c2d.h
@@ -64,8 +64,60 @@
Eigen::Matrix<Scalar, num_states, num_states> phi22 =
phi.block(num_states, num_states, num_states, num_states);
- *Q_d = phi22.transpose() * phi12;
- *Q_d = (*Q_d + Q_d->transpose()) / static_cast<Scalar>(2.0);
+ Qtemp = phi22.transpose() * phi12;
+ *Q_d = (Qtemp + Qtemp.transpose()) / static_cast<Scalar>(2.0);
+}
+
+// Does a faster approximation for discretizing A/Q, for use when solving a
+// 2Nx2N matrix exponential is too expensive.
+// Basic reasoning/method:
+// The original algorithm takes a matrix exponential of a 2N x 2N matrix (the
+// block matrix made up of A and Q). This is extremely expensive for large-ish
+// matrices. This function takes advantage of the structure of the matrix
+// being exponentiated and notes that we care about two things:
+// 1) The exponential of A*t, which is only NxN and so is relatively cheap.
+// 2) The upper-right quarter of the 2Nx2N matrix, which we can approximate
+// with a Taylor series out to several terms and still be substantially
+// cheaper than taking the big exponential.
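+// In block form this is Van Loan's method: with M = [[-A, Q], [0, A^T]] * dt
+// and expm(M) = [[phi11, phi12], [0, phi22]], we get phi22 = expm(A^T * dt),
+// so A_d = phi22^T and Q_d = phi22^T * phi12 (then symmetrized). This
+// function builds phi12 and phi22 directly from truncated Taylor series
+// instead of forming the full 2Nx2N exponential.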
+template <typename Scalar, int num_states>
+void DiscretizeQAFast(
+ const Eigen::Matrix<Scalar, num_states, num_states> &Q_continuous,
+ const Eigen::Matrix<Scalar, num_states, num_states> &A_continuous,
+ ::std::chrono::nanoseconds dt,
+ Eigen::Matrix<Scalar, num_states, num_states> *Q_d,
+ Eigen::Matrix<Scalar, num_states, num_states> *A_d) {
+ Eigen::Matrix<Scalar, num_states, num_states> Qtemp =
+ (Q_continuous + Q_continuous.transpose()) / static_cast<Scalar>(2.0);
+ Scalar dt_d =
+ ::std::chrono::duration_cast<::std::chrono::duration<Scalar>>(dt).count();
+ Eigen::Matrix<Scalar, num_states, num_states> last_term = Qtemp;
+ Scalar last_coeff = dt_d;
+ const Eigen::Matrix<Scalar, num_states, num_states> At =
+ A_continuous.transpose();
+ Eigen::Matrix<Scalar, num_states, num_states> Atn = At;
+ Eigen::Matrix<Scalar, num_states, num_states> phi12 = last_term * last_coeff;
+ Eigen::Matrix<Scalar, num_states, num_states> phi22 =
+ At.Identity() + Atn * last_coeff;
+ // TODO(james): Tune this once we have the robot up; ii < 6 is enough to get
+ // beyond any real numerical issues. 5 should be fine, but just happened to
+ // kick a test over to failing. 4 would probably work.
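+ // Each pass adds the next Taylor term: next_term follows the recurrence
+ // next_term = -A * last_term + Q * (A^T)^(ii-1), last_coeff tracks
+ // dt^ii / ii!, and phi22 accumulates the series for expm(A^T * dt).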
+ for (int ii = 2; ii < 6; ++ii) {
+ Eigen::Matrix<Scalar, num_states, num_states> next_term =
+ -A_continuous * last_term + Qtemp * Atn;
+ last_coeff *= dt_d / static_cast<Scalar>(ii);
+ phi12 += next_term * last_coeff;
+
+ last_term = next_term;
+
+ Atn *= At;
+ phi22 += last_coeff * Atn;
+ }
+ Eigen::Matrix<Scalar, num_states, num_states> phi22t =
+ phi22.transpose();
+
+ Qtemp = phi22t * phi12;
+ *Q_d = (Qtemp + Qtemp.transpose()) / static_cast<Scalar>(2.0);
+ *A_d = phi22t;
}
} // namespace controls