Addresses are immutable. At least, that’s what the average person believes. Places, we’re taught, are discrete points on a map, as inelastic and permanent as mountains. But mountains aren’t as solid as they seem. Given enough time and energy, even the biggest boulders eventually erode — and so does one’s faith in location data, which crumbles when you realize how amorphous and indefinite addresses really are.
Consider, for example, just a few of the address idiosyncrasies software developers often encounter when they’re programming location-based applications: Sometimes, building numbers appear more than once in different places on the same street. Sometimes, one building has multiple numbers. Some addresses contain fractions, indicating a place within a place. Street names regularly recur, sometimes even in the same city. So do city names, often in the same country and sometimes even in the same state. Some roads have multiple names. Occasionally, buildings, streets, cities, counties and countries are even renamed or assigned new postal codes, rendering their prior “location” nonexistent.
Landmarks that may or may not have addresses can be just as ambiguous. Perhaps even more so. Imagine, for instance, airports, stadiums, convention centers and parks, to name just a few examples: The United States’ busiest airport, Hartsfield-Jackson Atlanta International Airport, spans 4,700 acres — including five runways with an average length of approximately 10,000 feet, more than 30,000 public parking spaces, and a 156-acre terminal complex that encompasses more than 200 concessions and seven concourses with nearly 200 gates.
The brand-new Allegiant Stadium in Las Vegas, meanwhile, features 65,000 seats, 127 suites, 297 restrooms and 2,200 doors. America’s largest convention center, Chicago’s McCormick Place, includes 2.6 million square feet of exhibit halls, 173 meeting rooms and six ballrooms. And one of America’s best-known parks, New York’s Central Park, comprises 843 acres with 136 acres of woodlands, 250 acres of lawns, seven man-made lakes and ponds, 36 bridges and arches, and 58 miles of pedestrian paths.
If you ask a friend to meet you at any of the aforementioned places, neither a name nor an address will suffice; with infinite possible positions at which to rendezvous inside them, more precise instructions or coordinates are needed. That’s why geographic information systems and location-based services are so essential. They make it easy to answer the question, “Where?”
In theory, anyway. In reality, a GIS application is only as effective as the location encoding system on which it rests, and location encoding systems are as imperfect and obscure as the locations they seek to codify. This is especially apparent during data conflation, when programmers have to determine whether two disparate data points (for example, 123 Main St. versus 123 Main Street; 456 W. 1st Ave. versus 456 West First Avenue; or 789 Broadway St. with one set of coordinates versus 789 Broadway St. with another set just a few meters away) refer to the same point on a map or to different ones.
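Part of the conflation problem can be made concrete with a minimal normalization sketch in Python. It assumes a tiny, hypothetical set of abbreviation rules; the lookup tables and the function names `normalize_address` and `same_address` are illustrative and not taken from any particular library. Each variant is mapped to one canonical token before the two strings are compared.

```python
import re

# Hypothetical, deliberately tiny rule tables; production normalizers use far larger lists.
SUFFIXES = {"street": "st", "avenue": "ave", "boulevard": "blvd", "road": "rd"}
DIRECTIONS = {"west": "w", "east": "e", "north": "n", "south": "s"}
ORDINALS = {"first": "1st", "second": "2nd", "third": "3rd", "fourth": "4th"}


def normalize_address(address: str) -> str:
    """Lowercase, replace punctuation with spaces and map known variants to one canonical token."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    canonical = []
    for token in tokens:
        token = SUFFIXES.get(token, token)
        token = DIRECTIONS.get(token, token)
        token = ORDINALS.get(token, token)
        canonical.append(token)
    return " ".join(canonical)


def same_address(a: str, b: str) -> bool:
    """Treat two address strings as the same if they normalize identically."""
    return normalize_address(a) == normalize_address(b)


print(same_address("123 Main St.", "123 Main Street"))  # True
print(same_address("456 W. 1st Ave.", "456 West First Avenue"))  # True
```

Under these rules the first two pairs collapse to the same string, while the two 789 Broadway St. records compare as identical even though their coordinates differ, which is exactly the case string rules cannot settle. Real rules are also ambiguous in ways this sketch ignores: “St.” can mean “Saint” as well as “Street.”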
The more you think about it, the more you realize that location data is like an impressionist painting: What looks cohesive from afar is actually extremely chaotic up close.
In the GIS community, the consequences of chaos are stark. Without an easy way to encode, share and understand location data, individuals, businesses, governments and other users are forced to waste valuable time, money and talent on data conflation instead of data analysis.
Fortunately, with the right tools, chaos can be controlled. And in the case of location data, the best tool for the job is a universal standard identifier that is free and open and that makes place-based information easy to ingest, share, merge, manage and, ultimately, analyze.
There’s just one problem: While there have been many admirable attempts to create a universal location encoding system, so far none has gained widespread adoption.
Enter Placekey. The GIS community’s newest universal standard identifier, Placekey, promises to succeed by making location information unambiguous, accessible and interoperable. Its secret? Instead of defining locations according to coordinates, it defines them according to context.
The Long Quest for Location
Although web-based services have brought GIS challenges and opportunities to the forefront, humankind has been searching for an effective location encoding standard for centuries. Take latitude and longitude, for example. As early as 600 B.C., the Phoenicians were using the sun and stars to determine latitude. In the second century B.C., the Greek astronomer Hipparchus first proposed calculating longitude by measuring the difference between local time and “absolute time,” although his method could not work without an accurate clock. In the 18th century, Englishman John Harrison finally built one: a spring-powered clock known as a marine chronometer.
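The time-based principle behind longitude fits in one line. Because Earth turns 360 degrees in 24 hours, each hour of difference between local solar time and the reference (“absolute”) time corresponds to 15 degrees of longitude. The worked example below assumes local noon is observed when the reference clock reads 15:00, meaning the sun reaches the observer three hours later than the reference meridian.

```latex
\Delta\lambda = \frac{360^\circ}{24\ \text{h}}\,\Delta t = 15^\circ\,\text{h}^{-1}\cdot\Delta t

% Worked example: local noon observed at 15:00 reference time
\Delta t = 15{:}00 - 12{:}00 = 3\ \text{h}
\quad\Rightarrow\quad
\Delta\lambda = 15^\circ\,\text{h}^{-1}\times 3\ \text{h} = 45^\circ \text{ west of the reference meridian}
```

The whole method therefore depends on knowing the reference time at sea, which is why an accurate portable clock was the missing piece.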
Today, GPS uses the same time-based principles to calculate latitude and longitude from space. But latitude and longitude express only point locations, whereas an ideal location encoding system will also allow the expression of non-point locations, such as landmarks, regions, or a specific office on the 29th floor of a building. And so, the search for an alternative continues.
Location Standards: Past and Present
The ubiquity of data and compute, along with the growing complexity of the questions data science aims to answer, has put a spotlight on the need for a location identifier that can effortlessly join datasets. Many well-adopted and valuable identifiers already exist to solve other problems, such as navigation, and the identifiers considered below offer plenty of lessons worth learning and emulating.
Here are some of the notable attempts that have aimed at creating a universal standard identifier:
- Geohash: Geohashing is a geocoding method whereby latitude and longitude are encoded into a short string of numbers and letters that denotes a rectangular area on a map; each additional character narrows that area, so more characters equate with more precision. While this approach is well suited to describing two-dimensional points and parcels on a map, it fails to account for a third dimension (an apartment building, for example, with multiple units across multiple stories) and provides no context about the people and places that occupy space. Put another way: Geohashing describes where, but not who or what.
- Mapcode: Designed to be short, memorable and shareable, mapcodes distill locations into brief alphanumeric codes made up of two groups of letters and numbers separated by a period. Unfortunately, some locations can have multiple codes. And while mapcodes are free and open, decoding them requires support from the nonprofit Mapcode Foundation, which is charged with maintaining and distributing the mapcodes data file. As with geohashes, mapcodes help you find and reach a precise point on a map, but they say nothing about what occupies it.
- What3Words (W3W): Built on the premise that street addresses don’t always point to precise locations, W3W is a web-based location service that has divided the Earth into 3-by-3-meter squares, each of which it has assigned a unique and immutable three-word identifier. The result is a simple way to encode, remember and share very precise locations, which makes W3W a great tool for precise navigation and human-to-machine communication. Because identifiers are randomly assigned, however, they lack consistency and order: the identifiers of places that are near one another bear no relationship to each other, which makes W3W less ideal for machine-to-machine communication. W3W is also neither free nor open, which is an additional barrier to adoption.
- Open Location Code (OLC): Conceived by Google, OLC is an open-source geocoding system built around location identifiers known as “plus codes,” which are designed to function like street addresses and to be used in place of them, particularly in places where street addresses don’t exist. Like geohashes, plus codes are derived from latitude and longitude and expressed in the form of a short alphanumeric code. For that reason, plus codes share many of the upsides of geohashing, and they have been widely adopted. But because plus codes are derived from latitude and longitude, they ultimately can denote only geographic coordinates; they cannot define the discrete places that occupy those coordinates or distinguish between them.
- CitoCode: Like many other location encoding systems, CitoCode was conceived to create short geocodes that are easier to use, remember and share than complex addresses or long sets of coordinates. CitoCodes are unique in several respects. For one, they can be encoded not only with locations, but also with context, including names, phone numbers, websites, email addresses and descriptive text. Also, they’re user-generated; codes are chosen, not assigned. That they can be personalized means CitoCodes function well for memory and marketing purposes, but not for consistent, universal addressing and navigation.
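To see how a geohash narrows an area character by character, the following Python sketch implements the widely documented geohash scheme. It alternately bisects the longitude and latitude ranges, records one bit per step, and maps every 5 bits to a character from the geohash base-32 alphabet. The function name `geohash_encode` is illustrative.

```python
# Geohash base-32 alphabet (omits a, i, l and o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash bits alternate, starting with longitude

    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # point lies in the upper half: keep it
            rng[0] = mid
        else:
            bits <<= 1  # point lies in the lower half: keep it
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, precision=5))  # ezs42
print(geohash_encode(42.6, -5.6, precision=3))  # ezs
```

Encoding the same point at lower precision yields a prefix of the longer hash, which is why shared leading characters signal nearby points. Nothing in the output, however, says anything about what occupies the cell.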
A Better Way
The vast majority of existing location encoding systems — including Geohash, Mapcode, W3W and OLC — share one significant blemish: They’re solely based on grids.
Grids are useful tools for finding points on a map and navigating between them. Those points could be virtually anything, from a spot on the beach where you’re meeting friends to the rear entrance at a music venue. As previously noted, however, points on a map convey only position. They lack context. And from a data science perspective, that means they also lack consistency and correctness.
Imagine, for example, your own home. When you enter your address, Grid A puts a location pin in your living room. Grid B drops one just a few feet away in your bedroom. When they’re conflating data in order to create a location-based service or application, data scientists must determine whether those pins belong to the same place or not. One data scientist might decide that they do, because they’re located in the same home. Another might decide that they don’t, because they’re located in different rooms. If you consider this conundrum on a larger scale, you can begin to understand the difficulties that arise when joining data through purely grid-based systems.
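The disagreement between the two data scientists amounts to choosing different distance thresholds. The sketch below assumes the standard haversine great-circle formula and two hypothetical pins roughly 3 meters apart; the helper names `haversine_m` and `same_place` are illustrative. Whether the pins count as “the same place” depends entirely on the threshold passed in.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def same_place(pin_a: tuple[float, float], pin_b: tuple[float, float], threshold_m: float) -> bool:
    """Grid-style matching: two pins 'match' if they fall within threshold_m of each other."""
    return haversine_m(*pin_a, *pin_b) <= threshold_m


# Hypothetical pins for the same home: Grid A (living room) and Grid B (bedroom).
grid_a_pin = (40.000000, -75.000000)
grid_b_pin = (40.000030, -75.000000)

print(round(haversine_m(*grid_a_pin, *grid_b_pin), 1))  # 3.3 (meters)
print(same_place(grid_a_pin, grid_b_pin, threshold_m=5.0))  # True
print(same_place(grid_a_pin, grid_b_pin, threshold_m=2.0))  # False
```

Neither threshold is wrong; each encodes a different, unstated definition of “place,” which is the inconsistency the article attributes to purely grid-based joins.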
Instead of another grid-based system that defines spaces, what’s therefore needed is a location encoding system that defines places.
Placekey is a new universal identifier that already has more than 1,000 member organizations, including Esri, CARTO, Nielsen, JLL, Dun & Bradstreet, SafeGraph, Landgrid, the City of Boston, and Experian.
Designed to make it as easy as possible to access, share and combine location data, and to do so in a hyper-local manner, Placekey’s free and open API is intended to generate identifiers for virtually every place in the world. Initially, that means any place in the United States with a postal address, including vertical places (e.g., apartments or units within a multi-story building) and points of interest (e.g., a coffee shop within a retail store or a restaurant inside an airport terminal). Eventually, however, it will mean any place worldwide, with or without a postal address, including areas or regions (e.g., school districts, counties, neighborhoods).
Every Placekey includes a “Where Part” and an optional “What Part.” The “Where Part” encodes a place’s geographic coordinates as three character sequences of three characters each, using Uber’s open-source H3 hierarchical hex-grid system. The leftmost sequence denotes a large region, the middle sequence a smaller region within it and the rightmost sequence an even smaller region, so one can judge how close two locations are by reading their “Where Parts” from left to right and counting how many characters they have in common.
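The left-to-right comparison can be sketched in a few lines of Python. The function `shared_where_prefix` and the sample strings are hypothetical illustrations, not real Placekeys or part of any official Placekey library, and the sketch assumes only the hyphenated three-by-three layout described above. As with any hierarchical grid, two places on either side of a cell boundary can be close together yet share a short prefix, so the count is a heuristic rather than a distance.

```python
def shared_where_prefix(where_a: str, where_b: str) -> int:
    """Count how many leading characters two Where Parts share, ignoring hyphens."""
    a = where_a.replace("-", "")
    b = where_b.replace("-", "")
    shared = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        shared += 1
    return shared


# Illustrative Where Parts only; these are not real Placekeys.
home = "5vg-82n-kzz"
neighbor = "5vg-82n-kxp"
farther_away = "5vg-7gq-tvz"

print(shared_where_prefix(home, neighbor))      # 7
print(shared_where_prefix(home, farther_away))  # 3
```

A higher count suggests the two places fall within the same smaller region; a count of 9 means the Where Parts are identical.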
Combining geographical location with address and POI data in a free and open format is what makes Placekey distinctly capable of solving the problems unique to joining disparate geospatial datasets. Placekey’s What@Where structure is built to provide the context and consistency needed to match different kinds of datasets, all in a single string that is easy to store in databases. Weather or elevation data, for example, can be linked to the “Where Part.” The “What Part” can account for multiple POIs at a single address and for different POIs that occupy the same address over time. The latter is especially powerful: although a Placekey’s “Where Part” and the address encoding of its “What Part” remain consistent regardless of which POI occupies a given place, the POI encoding of its “What Part” changes along with the place’s identity, making it possible to add a time component to data matching.
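A join on the What@Where structure might look like the hypothetical Python sketch below. `parse_placekey` and `join_on_where` are illustrative helpers, not the official Placekey client library; they assume only that a Placekey is written as What@Where, with an empty What Part appearing as a leading “@” (an assumption here), and they do not validate the encodings themselves. Because every POI at an address shares one Where Part, a Where-keyed table such as weather can be attached to all of them in one pass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedPlacekey:
    what: Optional[str]  # address/POI encoding; None for a Where-only key
    where: str           # three hyphen-separated groups, e.g. "5vg-82n-kzz"


def parse_placekey(placekey: str) -> ParsedPlacekey:
    """Split a What@Where string into its two parts (no validation of the encodings)."""
    what, sep, where = placekey.partition("@")
    if not sep or not where:
        raise ValueError(f"not a What@Where string: {placekey!r}")
    return ParsedPlacekey(what=what or None, where=where)


def join_on_where(places: list[dict], by_where: dict[str, dict]) -> list[dict]:
    """Attach Where-level attributes (e.g. weather) to every place sharing that Where Part."""
    joined = []
    for record in places:
        where = parse_placekey(record["placekey"]).where
        joined.append({**record, **by_where.get(where, {})})
    return joined


# Hypothetical records: two POIs at the same address share one Where Part.
places = [
    {"name": "Coffee shop", "placekey": "227-222@5vg-82n-kzz"},
    {"name": "Bookstore", "placekey": "227-223@5vg-82n-kzz"},
]
weather = {"5vg-82n-kzz": {"temp_c": 18}}

for row in join_on_where(places, weather):
    print(row["name"], row["temp_c"])
```

Joining on the full string instead would match records only when both the location and the specific POI agree, which is the stricter comparison needed when the question is about a business rather than a spot on the map.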
Therein lies Placekey’s real promise: to convert addresses and places to a single string that can bring disparate datasets together. According to various surveys on the subject, data scientists spend anywhere from 20 percent to 80 percent of their time collecting and preparing data. Imagine a world in which they could spend that time analyzing it instead. In fields as diverse as advertising, emergency management, urban planning, public health, energy, transportation and logistics, to name just a few, the result would be a deeper understanding of places and of the people and communities that reside in and around them, along with more effective ways to solve the global problems we’re destined to face.
In that world — the one that Placekey intends to build — location data will no longer be disparate; it will be unifying.
Placekey officially launches on Oct. 7, 2020. To learn more, or to retrieve a Placekey using the Placekey API, visit placekey.io.