I made a web page that predicts where I'm going. It was a fun little academic exercise, and I thought some of the challenges were interesting.
Ever since it came out that Apple was tracking and keeping location data on everyone's iPhones, and owners could get to that data my curiosity was piqued. Apple was quick to stop tracking so much data and to limit access to the database, but I saved off my phone's database of locations while I could. Later, I installed a new app, OpenPaths, that intentionally and continuously logs my phone's locations and makes that data available to me.
Now that I was logging my phone's locations in the background, I could ask myself what I wanted to do with the data. I knew what I wanted to do right away! I wanted the computer to answer the following question:
Given where I've been, where am I probably going now?
I'd make a webpage whose only job it was to display its answer to that question. It's a simple web page to look at, but the devil's in the details.
The human brain is wonderfully better at answering that sort of question than the computer. It's a matter of pattern recognition across at least three dimesions: time and two-dimensional space.
I started with the way I'd think about answering that question:
If there aren't any notable exceptions, like travelling for work or vacation, then I follow a bi-weekly schedule more closely than a weekly schedule. So I'd look at where I was at this time of day two weeks ago.
To translate that into an algorithm for the webpage a few things need to happen. I have to codify what a "notable exception" is. Perhaps it's being more than 100 miles away from home for more than a day or two. Or, if I'm currently away on vacation, then the program shouldn't be looking at what I've been doing two weeks ago at home.
Here's the algorithm that the computer uses:
- If I was near here two weeks ago, consider where I was going back then at this time.
- Otherwise, consider where I was at this day of the week last week.
- If I was away on each of those occasions, then how about where I was yesterday at this time?
Simple enough. But the question of "two weeks ago at this time of day" itself is a bit ambiguous. Two things get in the way of that that the human brain just automatically figures out. One, time-of-day is local. If I am in California today, but I was in Hawaii two weeks ago, I can pretty easily calculate "breakfast time" for either. But the computer would have to first translate latitude and longitude coordinates into time-zones on the Earth. Then it'd be able to calculate relative time-of-day for either week at either location. The other issue, daylight savings time throws off the way a program might naïvely calculate "two weeks ago."
Account for daylight savings time
"Two weeks ago at this time of day" is a loaded phrase. The naïve approach for a computer that keeps track of time by incrementing seconds would be to subtract the number of seconds in a day, and do that 14 times. But that doesn't account for daylight savings time, which would throw off the results for 4 weeks out of a year.
My program uses Python, which has a library to translate time from epoch timestamps (which are used by OpenPath's library) to a calendar date and time-of-day format that's more native to the human mind. So the actual code calculates "number-of-seconds to this time-of-date two weeks ago" as follows:
now - time.mktime( (datetime.fromtimestamp( now ) - timedelta( days = 14, hours=0 )).timetuple() )
Huh, that's sort of wordy compared to the naïve alternative, but the important thing is that this approach is always correct. Once we've figured out when "two weeks ago" actually is, then we can calculate what "where" and "how far" actually mean...
If you're near the equator, then calculating short distances using latitude and longitude can be approximated by an equation based on Pythagorean Theorem. What's funny is that if you search the web looking for the algorithm in Python, you'll usually see something like the following function:
def distance(p1, p2): return math.sqrt((p1 - p2)**2 + (p1 - p2)**2)
But as long as you're importing the math module anyway, it'd be even more direct if you just used math's own "hypot" (short for "hypotenuse") function:
def distance(p1, p2): return math.hypot(p1 - p2, p1 - p2)
But that calculates planar distance, and we're not on a plane. We're essentially on a sphere. So it's better to use the Haversine formula if we want to get an accurate distance between two points defined by latitude/longitude coordinates.
Now that timedelta and the Haversine formula handle the "when" and "where" in my fuzzy algorithm, it's time to take a look at the presentation of the data.
The Webpage Itself
So much for the algorithm. What about the quality of the webpage itself?
It's a small webpage. So it's only sensible and intuitive that it'd be a quick and responsive webpage, too. But the webpage wouldn't work without making relatively long queries to two remote services.
- Retrieve new location data from the remote OpenPaths API service.
- Retrieve specific map data from the Google Maps API service.
In between the two big remote queries, the program needs to perform the actual prediction for where I'm going to be based on the OpenPaths data, and send the predicted points to Google Maps. There's no way to avoid the fact that the webpage is going to take a few seconds to do all its work.
The best work-around for that is two things:
- Ajax. The web server can quickly serve a simple HTML web page to the client which'll get displayed for the user right away. Then the browser can make another request to the server for just the data that takes a long time to calculate and retrieve.
- Caching. Once I've retrieved raw datapoints and made the prediction calculations, then that prediction shouldn't change for a few minutes. I can save off my prediction and immediately hand it back the next time the webpage is requested, if it's requested relatively soon.
It's critical to me that the webpage be small and simple. It has to get to the point as quickly as possible. But it's also important to me that I give credit to the tools and services I used to make it possible. That called for some credits to be put in a footer.
I wanted the footer to be relative to the browser's window viewing the page. But if that window was too short, then the footer would end up overwriting or being overwritten by the map or the text above the map. The fix for that was some clever CSS that put the footer at the bottom of the window, but never let it cover up the important part of the page, the map.
So far so good. But then I discovered something unexpected...
Sadly OpenPaths seems to collect bad data from my phone occasionally while it's at rest. All of the recorded and predicted movement in the map below is due to bogus data from OpenPaths.
All of the points along the same angle that extends to the south east are bad data. The phone didn't go anywhere all that time. I have no idea why it sometimes pretends to travel to that part of town, but I don't like it. This called for another interesting algorithm that's better suited for a human brain:
If the datapoints smell fishy, don't use them.
It's really easy for me to detect which datapoints are bad, and not only just because I know where my phone's been. It's because there's a certain pattern, the angle and distance traveled by the bogus points. So I've got a work-in-progress algorithm the elides points that smell fishy.
A Secret Mode
The main point to the site was the prediction. But as long as I had all this historical data, it seemed like a shame if I couldn't easily look it up, too. So there's a secret mode, impossible to find and discover. (Since nowadays people don't read long blogs or actually type in the URL bar of their browsers.)
If you add an HTTP "GET" parameter, t, to the URL, the website will return a corresponding location history instead of a prediction of where it thinks I'm going to be. t can take one of three different forms, a UNIX timestamp, an RFC 2822 date and time, or a negative number of days to look back. Here are some examples:
I flew in to Los Angeles on that day. Timestamps are good if you're already dealing with them or want a relatively short token to represent an absolute time. Otherwise, they're an epoch fail waiting to happen.
Took the kids to Disneyland that day. Fun! That date format is handy if you want to browse my location history and are thinking in terms of calendar dates.
What'd I do last week? This is handy if I don't need an absolute time and date, but just want an offset from the current time and date.
Finally, I've got a micro site that was really fun to build and with which I'm quite pleased.