Handle missing data at startup better
We had a case where the data was sent *before* the start of the log
file, and the timestamp was sent *after* the start of the log file. So,
we couldn't match the data because it wasn't logged, but the data wasn't
supposed to be logged whereas the timestamp was supposed to be logged.
The CHECK that fires is trying to catch when data goes away in the
middle of a log and notify the user that we have missing data. That is
not the case here.
Be more aggressive here and declare any message with missing data where
the data was before the start of the log to be fine and ignore it.
We'll see if there are more cases.
Change-Id: Ia233b0b88699fc3febf2fac1fbd7ee10b03631cd
diff --git a/aos/events/logging/logger.cc b/aos/events/logging/logger.cc
index eb7221d..2dc98a9 100644
--- a/aos/events/logging/logger.cc
+++ b/aos/events/logging/logger.cc
@@ -1437,12 +1437,18 @@
// simulation.
state->Send(std::move(timestamped_message));
} else if (!ignore_missing_data_ &&
+ // When starting up, we can have data which was sent before the
+ // log starts, but the timestamp was after the log starts. This
+ // is unreasonable to avoid, so ignore the missing data.
+ timestamped_message.monotonic_remote_time >=
+ state->monotonic_remote_start_time(
+ timestamped_message.channel_index) &&
!FLAGS_skip_missing_forwarding_entries) {
- // We've found a timestamp without data. This likely means that we are
- // at the end of the log file. Record it and CHECK that in the rest of
- // the log file, we don't find any more data on that channel. Not all
- // channels will end at the same point in time since they can be in
- // different files.
+ // We've found a timestamp without data that we expect to have data for.
+ // This likely means that we are at the end of the log file. Record it
+ // and CHECK that in the rest of the log file, we don't find any more
+ // data on that channel. Not all channels will end at the same point in
+ // time since they can be in different files.
VLOG(1) << "Found the last message on channel "
<< timestamped_message.channel_index;
@@ -1454,7 +1460,8 @@
// Now that we found the end of one channel, artificially stop the
// rest. It is confusing when part of your data gets replayed but not
- // all.
+ // all. Read the rest of the messages and drop them on the floor while
+ // doing some basic validation.
while (state->OldestMessageTime() != monotonic_clock::max_time) {
TimestampedMessage next = state->PopOldest();
// Make sure that once we have seen the last message on a channel,
@@ -2098,6 +2105,10 @@
// mapper directly.
filter->Pop(event_loop_->node(), result.monotonic_event_time);
}
+ VLOG(1) << "Popped " << result
+ << configuration::CleanedChannelToString(
+ event_loop_->configuration()->channels()->Get(
+ factory_channel_index_[result.channel_index]));
return result;
}
diff --git a/aos/events/logging/logger.h b/aos/events/logging/logger.h
index 3cb44f6..d142bd8 100644
--- a/aos/events/logging/logger.h
+++ b/aos/events/logging/logger.h
@@ -457,9 +457,9 @@
// Returns true if the channel exists on the node and was logged.
template <typename T>
bool HasLoggedChannel(std::string_view name, const Node *node = nullptr) {
- const Channel *channel = configuration::GetChannel(logged_configuration(), name,
- T::GetFullyQualifiedName(), "", node,
- true);
+ const Channel *channel =
+ configuration::GetChannel(logged_configuration(), name,
+ T::GetFullyQualifiedName(), "", node, true);
if (channel == nullptr) return false;
return channel->logger() != LoggerConfig::NOT_LOGGED;
}
@@ -497,7 +497,7 @@
// Class to manage sending RemoteMessages on the provided node after the
// correct delay.
- class RemoteMessageSender{
+ class RemoteMessageSender {
public:
RemoteMessageSender(aos::Sender<message_bridge::RemoteMessage> sender,
EventLoop *event_loop);
@@ -606,6 +606,12 @@
->node_event_loop_factory_->monotonic_now();
}
+ // Returns the start time of the remote for the provided channel.
+ monotonic_clock::time_point monotonic_remote_start_time(
+ size_t channel_index) {
+ return channel_source_state_[channel_index]->monotonic_start_time();
+ }
+
distributed_clock::time_point RemoteToDistributedClock(
size_t channel_index, monotonic_clock::time_point time) {
return channel_source_state_[channel_index]