27 Sep 2017

A bizarre "403 Forbidden" bug-hunting adventure

I've been trying to figure out a strange bug in Exist for iOS over the past couple of months, and finally managed to fix it yesterday. Without Josh's help I would still be scratching my head over this bizarre issue.

It started with one user reporting the bug. They couldn't submit any data to Exist, all of a sudden. The app's POST methods were always failing, but the GET methods were fine. They could open the app and view their data, but they couldn't submit custom tags or a mood rating for the day.

This is obviously problematic. A big part of the mobile app's functionality is to help users get their data into Exist, since our web app offers more in-depth "read" capabilities with graphs and charts. In the background, Exist for iOS also sends the users Apple Health data (if they've turned the feature on) to Exist, so I knew those POST requests would also be failing.

Not long after this first report, I got a second report of the same bug. When users submit a bug report from within the app, a text file of logs gets attached to their email. The logs from both users told me this was a 403 error. But that makes no sense. A 403 error means the user isn't authorised. Usually that means they need to log in. But if the user wasn't authorised to make a POST request, they wouldn't be authorised to do a GET request, either. But both users were able to open the app and read their data from the API without issue. They just couldn't POST any data.

Of course, the first thing that made sense was that my POST and GET requests worked differently, somehow. My POST request method must look for the user's auth token in a different place to the GET requests, and the token had somehow been moved or destroyed. Remember, not all users had this problem. And I couldn't replicate it at all. So my app's POST requests worked fine—most of the time. But for these particular users, POST requests were failing every single time after something particular happened.

But there was nothing obviously wrong with my code. My POST and GET requests had the same headers, the same timeout parameters, and looked in the same place for the user's auth token. I couldn't find anything different between them that seemed like it could be causing this bug.

At this point, I had feedback from the users experiencing this bug that logging out and back in (which clears the app's keychain and cache data) didn't solve the issue, but that deleting and reinstalling the app did. This only puzzled me further, but at least it gave me something I could tell users who had discovered this bug.

After more fruitless exploring of my networking code, and every interaction my app has with the keychain, and many discussions that led to dead ends, Josh put on his serious debugging hat and helped me dig into this further. See, although I was seeing a 403 "forbidden" error, I was also getting an error message I'd never seen before:

CSRF Failed: Referer checking failed - no Referer.

Initially I'd thought this must be a clue as to what the issue was, and went digging into GitHub issues for Django Rest Framework, which we use to help provide our server-side API. But I didn't turn up anything useful, and Josh was adamant that I must not be sending the Authorization header correctly in these cases. CSRF, or "Cross-Site Request Forgery", is a method of preventing malicious requests by making sure all requests match specific criteria. Josh knew that if it didn't find an OAuth2 token in the request, the API fell back to using cookie-based "session authentication", so developers can look at it in their browser if they're logged in to Exist. This session authentication is the spot where CSRF validation happens, so to trigger that, somehow the API must not have received my auth token.

While I googled for ideas on how my auth headers might be getting stripped out of my requests, Josh took to the command line to attempt to replicate the error message I was seeing. But here's the rub: he couldn't do it. Sending no auth header, incorrect auth headers, whatever he tried ended in a more specific authorisation error. He couldn't get the server to return the CSRF error that I was seeing.

With a little more digging, Josh was able to find this comment on a GitHub issue that explains how CSRF won't ever be triggered unless the user is authorised. All of this CSRF business was starting to point to the fact that the server seemed to think my app was a browser (or at least was pretending to be a browser) for these POST requests. But I wasn't sending any cookies, or doing anything else to make my app seem like a browser. And it was only in these particular cases—I think four users have reported this, in total—whereas all our other users were able to do POST requests just fine.

I suddenly remembered that pre-iOS 11 using a Safari View Controller in your app would share the cookies and other data from Safari with your app's web view. This seemed like a dead-end, since I'm not using any Safari View Controllers in my app—but I do use a WKWebView. In a couple of places I load a simple help page from our server inside a web view.

Josh realised my app might be somehow getting cookies from visiting our site inside this web view, and then sharing those cookies with AFNetworking's underlying NSURLSession, which is handling my requests. We didn't really understand how this could happen, since those help pages are just simple text-based pages with no links to Exist or web app menus of any kind, but it seemed to make sense. And when we turned up confirmation that NSURLConnection, which used to underly AFNetworking before NSURLSession was introduced, does automatically use cookies from an app's web view in all future requests, it seemed we were onto something.

After much trial and error, we discovered the culprit. One of those help pages helpfully links the user to a related settings page within our app which we had totally forgotten about. Because it's inside a web view, tapping that link opens the settings page in our web app inside the web view, where some users would go ahead and log in so they could adjust those settings.

For those users, the web view stored the cookies created when they completed a browser log in, and NSURLSession oh so helpfully auto-fucking-matically decided I would probably—no, definitely—want to send those cookies in every single request my app did after that. Forever and always.

So here's why it was all so confusing: whenever I did a GET request, those cookies were enough for the server to respond with the user's data, as if they were using a browser. But when doing a POST request, Django Rest Framework adds an extra layer of security with CSRF, as it should. And so the CSRF check was failing every time, because I wasn't sending a Referer header, or any other related CSRF bits. So the server really did think my app was a browser, but not for any obvious or visible reason that I could find in my code.

And of course, I didn't even know about these cookies, so I wasn't deleting them on log out. The only way to clear them was to delete the app entirely and reinstall it.

I've since fixed this bug by removing the link to said settings screen entirely, and stopping the web view from persisting cookies with a nonPersistentDataStore like so:

WKWebViewConfiguration *config = [WKWebViewConfiguration new];
config.websiteDataStore = [WKWebsiteDataStore nonPersistentDataStore];
self.webView = [[WKWebView alloc] initWithFrame:CGRectZero configuration:config];

Unfortunately, I can't release the fix yet, because Apple is holding us up in other fun ways. But as soon as we can release updates to the App Store again, I'll be able to send the results of this long and arduous bug fix journey to all my users.