Inbound email processing
2016-09-09

IMPORTANT NOTE: This announcement only applies to emails sent from your email client (Outlook, Mail, Gmail Web, etc.). It does not apply to messages sent from Communicate -> Send an Email, Newsletters, Event Invites, etc. It also does not apply to customers using subdomain emails.

Lately you may have noticed that emails sent to mailing lists from your email client weren't getting delivered. As of this morning, they are being delivered again. Some of you brought this to our attention. Thank you for your help debugging this problem.

At first we thought it was a problem with specific email addresses, so we focused our attention on researching individual complaints. That turned out to be a dead end.

This morning I double-checked the process that checks for new messages and puts them in our email processing queue (we call him Mailman). Years ago we had problems with that guy, but for the past couple of years he's been rock solid. Additionally, he's monitored by another process, so when he crashes we get notified.

At first glance everything looked okay. Mailman was running and looked healthy. So then I checked the last time he processed a message and, well... I wish I'd been wearing dark-colored pants today. He hadn't processed a message since August 31.

So I killed him. And started a new one. And this Mailman started processing messages right away, clearing out all the messages we've been sitting on since August 31.

This is really embarrassing to me personally for two reasons.

First, we thought we had this critical part of TroopTrack well-monitored. We were wrong. I'm sorry we messed this up. We are going to figure out a better way of monitoring Mailman so we can tell not only that he's alive, but also that he's doing his job.
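
To make the difference between "alive" and "doing his job" concrete, here is a rough sketch in Python of a check that looks at both. It is only an illustration, not our actual monitoring code: the function name, its inputs, the one-hour threshold, and the times of day in the example are all assumptions made up for this post.

```python
from datetime import datetime, timedelta


def mailman_status(is_running, last_processed_at, now, max_idle=timedelta(hours=1)):
    """Classify a hypothetical Mailman worker as 'down', 'stalled', or 'ok'.

    is_running        -- True if the process exists (what a crash monitor sees)
    last_processed_at -- datetime of the last processed message, or None
    now               -- current time, passed in so the check is easy to test
    max_idle          -- assumed longest acceptable gap between messages
    """
    if not is_running:
        # The process is gone: this is the case a crash monitor catches.
        return "down"
    if last_processed_at is None or now - last_processed_at > max_idle:
        # The process exists but has not handled a message recently.
        return "stalled"
    return "ok"


# Illustrative values: running, but last message handled on August 31.
now = datetime(2016, 9, 9, 9, 0)
print(mailman_status(True, datetime(2016, 8, 31, 12, 0), now))  # prints "stalled"
```

The threshold is the tricky part: a quiet night with no incoming mail would also look "stalled", so a real check would probably need to consider whether messages are actually waiting before raising an alarm.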

Second, I should have checked Mailman first. It only takes a few minutes to verify that he is processing messages, and it was a critical mistake for me not to do that immediately. To not do that for nine days is a humiliating error in judgment.

Thank you to everyone who helped us debug this problem and a sincere apology to everyone impacted by this mistake.
