The Map is not the Territory


Seek ground truth whenever possible to accelerate learning.

TJ Palanca

The map is not the territory; seek ground truth whenever possible to accelerate learning.

uberHOP is a small example from my own experience. The product was a point-to-point service (a.k.a. UV Express) that Uber launched in Manila, along with Seattle and Toronto.

The way it worked was simple: you would request a specific route during peak hours, and we would batch you with up to 6 people in a high-occupancy vehicle along that route.

uberHOP needed high occupancy to become profitable

The pricing was at a 70% discount to uberX (the traditional ride product), and drivers were guaranteed earnings, so there was a minimum average occupancy needed to hit profitability. To get to that high occupancy, we needed to ensure that the routes selected were of high quality.
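To make the occupancy constraint concrete, here is a rough sketch of the break-even arithmetic. Only the 70% discount comes from the story above; the uberX fare and the guaranteed driver earnings are made-up numbers, and the model ignores commissions and every other cost, so treat it as an illustration rather than the actual uberHOP economics.

```python
def break_even_occupancy(uberx_fare, discount, guaranteed_earnings):
    """Riders per trip needed for collected fares to cover the driver guarantee.

    Simplifying assumption: the only cost is the guaranteed payout and the
    only revenue is the discounted fare paid by each rider.
    """
    fare_per_rider = uberx_fare * (1 - discount)
    return guaranteed_earnings / fare_per_rider


# Hypothetical inputs: a 200-peso uberX fare for the route and a
# 300-peso guaranteed payout per trip. Only the 70% discount is from the post.
needed = break_even_occupancy(uberx_fare=200.0, discount=0.70,
                              guaranteed_earnings=300.0)
print(f"Riders needed per trip: {needed:.1f}")
```

With these invented inputs, each rider pays about 60 pesos, so a trip needs roughly five of the six seats filled just to cover the guarantee. That is why route quality mattered so much.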

Initial approach: Clustering!

My first instinct as a data person was clustering. We needed to find pairs of longitude and latitude that had enough pickup and dropoff density in them to have a decent chance of becoming profitable.

The launch routes were selected using this method, but we had limited success: even after the novelty period wore off, cancellation rates remained high.

I tried different algorithms, distance metrics, map features, and dispatch radii, all for only incremental gains.
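For readers who want a feel for the density search described above, here is a minimal sketch. It is not the algorithm we actually used: it swaps in simple grid binning for a real clustering algorithm, and the function names, cell size, threshold, and sample coordinates are all hypothetical.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.01):
    """Snap a coordinate to a grid cell (0.01 degrees is roughly 1 km near Manila)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def dense_routes(trips, min_trips, cell_deg=0.01):
    """Count trips per (pickup cell, dropoff cell) pair and keep the dense pairs.

    `trips` is an iterable of (pickup_lat, pickup_lon, dropoff_lat, dropoff_lon).
    """
    counts = Counter(
        (grid_cell(p_lat, p_lon, cell_deg), grid_cell(d_lat, d_lon, cell_deg))
        for p_lat, p_lon, d_lat, d_lon in trips
    )
    return [(pair, n) for pair, n in counts.most_common() if n >= min_trips]


# Made-up trips: two share nearly the same origin and destination, one does not.
trips = [
    (14.5547, 121.0244, 14.5866, 121.0614),
    (14.5545, 121.0248, 14.5869, 121.0611),
    (14.6760, 121.0437, 14.5353, 120.9822),
]
routes = dense_routes(trips, min_trips=2)
for (pickup_cell, dropoff_cell), n in routes:
    print(pickup_cell, "->", dropoff_cell, f"({n} trips)")
```

A real pipeline would use historical trip data and something like density-based clustering with a proper geographic distance, but the shape of the question is the same: which origin-destination pairs have enough demand to fill a vehicle?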