Every app developer wants their app to become popular. For a messaging app, sending out 400k push notifications in one day is a good measure of success, right? Not always. This is a story of how a server glitch can make things get out of hand very, very quickly!
Background: talk to myself
Last year I had a funny idea for an app. What if you could talk to everyone else who has the same name as you? Maybe I’d end up chatting with John Travolta or John Cleese. I poked away at the app in my spare time and released talk to myself last fall (its App Store review story is a whole other kettle of fish).
talk to myself is a quirky little free app, and has achieved some mild level of popularity. It’s gotten over 11 thousand downloads and 60 thousand messages posted so far.
When you launch the app, you enter your name, and then you’re in a conversation with other people who have the same name. Say hello, and when others respond you’ll get a push (like any other messaging app).
Besides being a fun side project, it was a good way to learn some new things as a developer. I built it on the StackMob platform, which was a great way to get a server-based app going without having to deal with, you know, a server.
Time for an update!
This January I decided to work on an update for talk to myself, with a few new user facing features as well as improvements on the network side of things. I saw that StackMob had released a new version of their iOS SDK, so I dutifully updated the pod and made the necessary integration changes. There were a few quirks but overall it went pretty smoothly.
One thing I did notice during development was that push notifications weren’t working; I was getting a 404 error. I chalked this up to a glitch in my integration and put it off for a while. Cut to late January, as I’m preparing a beta build. At that point I was making a build that ran against the production backend, and I found that push notifications still weren’t working. A peek at the push notification chart revealed the problem was much worse than I had thought:
Push notifications had suddenly gone from around 1300 a day to zero, on or around January 17th. It wasn’t that users had suddenly stopped sending messages, either. The API usage metrics were looking pretty normal:
I immediately filed a support request with StackMob and awaited their response.
The fix is in
Just two hours later (at 11 pm their time) I got a response, though I was asleep by then. Throughout the following day I sent more technical details and StackMob worked to solve the problem.
Out of nowhere, that afternoon my phone and iPad started pinging. A lot. Non-stop. Push notifications from talk to myself were flooding in faster than the device could display them! And the strangest thing was, they were from all kinds of different names. Normally you’ll just get notifications from people with your name; I was seeing messages from dozens of different names. I immediately sent an email to the technician at StackMob:
I quickly checked with friends & colleagues and confirmed that they were experiencing what seemed to be the same thing: every push from every user was being sent to every device (i.e. a broadcast). I sent some more frantic messages to StackMob, while logging in to my Apple Developer account to prepare to revoke the push certificates (basically, pulling the plug). As I was about to do so, the notifications stopped and the StackMob tech emailed to let me know.
The real fix, and the gory details
The craziness over, I relaxed a little and waited as the StackMob technicians continued to investigate the problem. It turned out to be a combination of problems – mine and StackMob’s.
StackMob had recently changed their server push API, but the iOS SDK was still hitting the old API. To keep apps working properly, they translate calls to the old API into calls to the new one. Reasonable enough. However, one change in behaviour involved what happens when you send a push to an empty list of users. In the old API it did nothing; in the new API it broadcasts to all users.
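To make that difference concrete, here is a minimal Python sketch of the two behaviours. It is not StackMob's code: the function names and the device set are hypothetical, and the only thing taken from the story is the rule that an empty recipient list was a no-op in the old API and a broadcast in the new one.

```python
from typing import List, Set

# Hypothetical set of every device registered for push.
ALL_DEVICES: Set[str] = {"alice-phone", "bob-phone", "carol-tablet"}


def new_api_push(recipients: List[str]) -> Set[str]:
    """Model of the new semantics: an empty list means 'broadcast to all'."""
    if not recipients:
        return set(ALL_DEVICES)
    return set(recipients) & ALL_DEVICES


def old_api_push_naive(recipients: List[str]) -> Set[str]:
    """Naive translation: forwards the call unchanged, so [] becomes a broadcast."""
    return new_api_push(recipients)


def old_api_push_compat(recipients: List[str]) -> Set[str]:
    """Translation that keeps the old semantics: [] notifies nobody."""
    if not recipients:
        return set()
    return new_api_push(recipients)


# Each function returns the set of devices that would be notified.
print(old_api_push_naive([]))             # every device
print(old_api_push_compat([]))            # no devices
print(old_api_push_compat(["bob-phone"]))
```

The point of the sketch is that a compatibility layer has to preserve edge-case behaviour, not just the call signature.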
That wouldn’t have been a problem, except that in talk to myself, lots of users enter the app to find themselves as the first person chatting with their name. They send a message, and the app sends a push to all of the other users in that chat – an empty list of users. Whoops! I had missed optimizing out that call before, since it never did anything. But on that day, it did a lot.
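The client-side optimization amounts to a guard: skip the push call entirely when there is nobody to notify. The sketch below illustrates the idea in Python rather than the app's actual iOS code; `notify_chat_members` and the `send_push` callback are hypothetical names standing in for whatever the SDK provides.

```python
from typing import Callable, Iterable, List


def notify_chat_members(
    sender: str,
    members: Iterable[str],
    send_push: Callable[[List[str]], None],
) -> bool:
    """Push to everyone in the chat except the sender.

    Returns True if a push was sent, False if it was skipped because
    the sender is the only person in the chat.
    """
    recipients = [m for m in members if m != sender]
    if not recipients:
        # Never hand the backend an empty list: its meaning
        # (no-op vs. broadcast) is up to the server.
        return False
    send_push(recipients)
    return True


# Usage: record calls in a list instead of hitting a real push service.
calls: List[List[str]] = []
notify_chat_members("john-1", ["john-1"], calls.append)            # skipped
notify_chat_members("john-1", ["john-1", "john-2"], calls.append)  # sent
print(calls)
```

With the guard in place, the app no longer depends on how the server interprets an empty recipient list.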
One person installed the app and said “Hi,” which broadcast a push notification to everyone else who had the app installed – including a number of users who had already been the first with their name. They launched the app and also said “Hi” – causing a tremendous flood of broadcasts!
The broadcast bug was in effect for just 5 minutes. Over four hundred thousand push notifications were sent in that time! Here’s that same chart – but captured later, with January 28th’s data included.
The above chart has the same date range as before, but the previous 1300 push notifications a day are indistinguishable from zero. Yowza!
By the following day, StackMob had determined a solution to the problem. With some trepidation, I gave them the go-ahead to flip the switch and activate the fix. I waited a few minutes, and my devices did not go crazy. I fired up the app and sent a message to John from my iPhone, and my iPad pinged. Just once, the right number of times!
Push notifications were working properly again, but there was sure to be some fallout from this event. Every user had been bombarded with at least 40 push notifications in 5 minutes, telling them someone (who didn’t share their name) had sent them a message. There was no such message to be found upon entering the app, of course (in keeping with the principle of not sending data in your notifications).
I decided to send out a broadcast of my own, apologizing for the problem and letting people know that notifications were now working properly again. Perhaps that helped, because the app only received one review mentioning the incident (and it was reasonably positive to boot).
One concern was that with an incident like this, you could have a lot of people deleting the app. Fortunately, that doesn’t seem to have happened, and people continue to use the app!
Version 1.2 went up on the App Store on February 5th, and includes the optimization to make sure this broadcast problem can never happen again. It also includes automatic smiley faces, the ability to change your name (once) if you make a mistake entering it, and various other fixes and improvements. And while talk to myself may never naturally get to 400k push notifications a day (let alone in 5 minutes), at least I have a good story to tell!
One final note: while this problem did occur on StackMob’s watch, I have to give them props for their rapid developer support response. The fact that their SDK is open source has also been very helpful in diagnosing and fixing problems. I’m happy to continue using them for talk to myself and other projects!