Research project to establish a standard and technology for creating unique, permanent ids for "places".
Bèr Kessels be1fbb63e0 Add requirements to allow for merging and splitting 2 months ago

OpenPlaceID

OpenPlaceReviews, like other niche maps, needs a persistent identifier when referring to a “place”.

OpenStreetMap (OSM) uses IDs for its internal structure, for example to identify a node. Nodes may change even when the feature they refer to does not.

Consider some scenarios and issues:

  • A swimming-pool S is a single point (node), with ID 1337. A mapper replaces that pool with an area (polygon) which gets ID 2001. The feature is still swimming-pool S, but the OSM ID has changed.
  • A restaurant R moves to the other side of town. One mapper might move the point to the new location, so the point keeps the same OSM ID. Another mapper might delete the old point and add a new one at the new location, so the point gets a new OSM ID.
  • A shop S closes and shop T opens on the same spot. A mapper edits the name and some tags of point S to turn it into shop T. The place is a new one, but the OSM ID has remained the same.

There are many more scenarios in which the OSM ID might change while the “place” it refers to did not, and in which the OSM ID did not change while the “place” it refers to did.

There is a clear disconnect between what OSM needs from its IDs and what a place database needs from its IDs.

Requirements

It must be:

  • Independent
  • Project-agnostic
  • Verifiable (including history)
  • Not limited to OSM

It may be:

  • Decentralized
  • Hierarchical

The problem has been described in detail in the OSM wiki. Some projects have taken a shot at developing or describing something that could work. None of them has been finished, and none is still active.

Wikidata

Wikidata has a uniform and universal ID mechanism. The OSM tag wikidata= links from OSM to Wikidata. Wikidata, in turn, links back to OSM relations.

There are over 1.3 million wikidata tags in OSM. This is still only 0.02% of all nodes, and just under 5% of all “relations”.

For example, businesses are mostly untagged.

QID microservice

A microservice has been proposed that takes a Wikidata ID and resolves it to the relevant OSM URL.

It maintains an internal mapping between Wikidata IDs and their OSM counterpart elements (relations, ways, nodes). At the time of writing, the demo server returns errors.

Nominatim

Nominatim’s place_id is only an internal parameter of the engine. You cannot use place_id for anything; it is a technical database key and depends on a single Nominatim instance.

Overpass

The solution is to link to the object with a certain property, usually a certain combination of tags. If a unique object exists, you are redirected to the object’s web page at openstreetmap.org. Otherwise, if for example the referred way has been split, a search result page shows all possible objects.

From Overpass_API/Permanent_ID
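A minimal sketch of that lookup logic, assuming the candidate OSM objects are already loaded in memory. `OsmObject` and `resolve_by_tags` are hypothetical names for illustration; this is not how the Overpass API itself is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OsmObject:
    """Stand-in for an OSM element (hypothetical, for illustration only)."""
    osm_type: str          # "node", "way" or "relation"
    osm_id: int
    tags: dict[str, str]


def resolve_by_tags(objects: list[OsmObject], query: dict[str, str]) -> str | list[OsmObject]:
    """Return an openstreetmap.org URL if exactly one object carries all query tags,
    otherwise return every matching candidate (possibly none)."""
    matches = [o for o in objects if all(o.tags.get(k) == v for k, v in query.items())]
    if len(matches) == 1:
        match = matches[0]
        return f"https://www.openstreetmap.org/{match.osm_type}/{match.osm_id}"
    return matches
```

If a way carrying those tags has been split in two, the same query returns both halves instead of a single URL, which mirrors the search-result-page fallback in the quote.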

Google

Google uses a place-id.

Some highlights and relevant parts:

A place-id spans across projects and APIs:

You can use the same place ID across the Places API and a number of Google Maps Platform APIs.

A place-id is discovered using a “fuzzy” search.

A common way of using place IDs is to search for a place .. then use the returned place ID to retrieve place details.

A place-id may be replaced or moved, in which case the API should find the new id and redirect there.

In particular, some types of place IDs may sometimes cause a NOT_FOUND response, or the API may return a different place ID in the response.

Place IDs don’t just refer to points of interest or “businesses”.

Place IDs are available for most locations, including businesses, landmarks, parks, and intersections.

Proposed solution

Taking the requirements and nice-to-haves into consideration, several categories of technical solutions can be identified.

DHT, or distributed Key-Value storage.

As a user, I want to be able to look up the OSM type-ID-tuple for a certain OpenPlaceID.

As a user, I want to be able to look up the OpenPlaceID for an OSM ID.

As a user, I want to be able to look up the OpenPlaceID for a Google place-ID, a Yelp Business ID, or an ID from any other third-party database.

As a user, I want to be able to look up the Yelp Business ID or Google place-ID for a certain OpenPlaceID.

As a user, I want to be able to insert new OpenPlaceIDs for missing places into a globally accessible database.

As a user, I want to be able to edit the OSM type-ID-tuple that belongs to an OpenPlaceID when the OSM data has changed.

As a user, I want to be able to replace the OSM type-ID-tuple of an OpenPlaceID with a reference to another, existing OpenPlaceID, so as to redirect to that feature when it was merged, moved or otherwise changed.

As a user, I want to be able to split a feature into two or more features.

As a user, I want to be able to merge two or more OpenPlaceIDs into one.
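One possible way to record merges and splits, assuming an OpenPlaceID is never deleted but instead keeps a list of successor IDs, is sketched below. `PlaceRecord`, `PlaceRegistry`, the random-hex IDs and the "first ID survives a merge" rule are hypothetical choices, not part of any existing system.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class PlaceRecord:
    """One OpenPlaceID entry (hypothetical schema)."""
    mappings: dict[str, str] = field(default_factory=dict)  # namespace -> external ID
    successors: list[str] = field(default_factory=list)     # set after a merge or split


class PlaceRegistry:
    """In-memory stand-in for the shared key-value store."""

    def __init__(self) -> None:
        self.records: dict[str, PlaceRecord] = {}

    def insert(self, mappings: dict[str, str]) -> str:
        place_id = uuid.uuid4().hex  # assumption: random IDs
        self.records[place_id] = PlaceRecord(dict(mappings))
        return place_id

    def merge(self, ids: list[str]) -> str:
        """Merge several IDs into the first one; the others point to it."""
        survivor, *merged = ids
        for place_id in merged:
            record = self.records[place_id]
            # Keep the survivor's own mappings; only copy namespaces it lacks.
            for namespace, external_id in record.mappings.items():
                self.records[survivor].mappings.setdefault(namespace, external_id)
            record.mappings.clear()
            record.successors = [survivor]
        return survivor

    def split(self, place_id: str, parts: list[dict[str, str]]) -> list[str]:
        """Replace one ID by several new ones; the old ID points to all of them."""
        new_ids = [self.insert(part) for part in parts]
        record = self.records[place_id]
        record.mappings.clear()
        record.successors = new_ids
        return new_ids
```

Keeping retired IDs as successor pointers, rather than deleting them, is what later lets a redirection service answer for old IDs.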

An OSM type-ID-tuple means that we store and return not just the numeric OSM ID, since that can conflict with the IDs of other element types: node 1337 might conflict with way 1337. We ensure uniqueness by adding the type: node, way or relation.
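A sketch of such a tuple in Python. `OsmType`, `OsmRef` and the `type/id` string form (borrowed from openstreetmap.org URL paths such as `/node/1337`) are illustrative assumptions.

```python
from enum import Enum
from typing import NamedTuple


class OsmType(Enum):
    NODE = "node"
    WAY = "way"
    RELATION = "relation"


class OsmRef(NamedTuple):
    """An OSM type-ID-tuple; the numeric ID alone is not unique across types."""
    osm_type: OsmType
    osm_id: int

    def __str__(self) -> str:
        return f"{self.osm_type.value}/{self.osm_id}"

    @classmethod
    def parse(cls, text: str) -> "OsmRef":
        """Parse 'node/1337' style text; raises ValueError on an unknown type."""
        type_name, _, id_text = text.partition("/")
        return cls(OsmType(type_name), int(id_text))
```

Because the type is part of the tuple, `OsmRef(OsmType.NODE, 1337)` and `OsmRef(OsmType.WAY, 1337)` compare unequal and can coexist as keys in the same store.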

The requirement not to be limited to OSM requires the DHT or key-value database to allow multiple values, or multiple mappings, per OpenPlaceID. E.g. one OpenPlaceID might refer to an OSM type-ID-tuple, a Google place ID, a Yelp Business ID and so on.
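A minimal in-memory model of this multi-valued mapping. It assumes each external database is identified by a namespace label such as `osm`, `google` or `yelp` (hypothetical labels) and that a place holds at most one ID per namespace; `PlaceMappings` is an illustrative name.

```python
from __future__ import annotations

from collections import defaultdict


class PlaceMappings:
    """OpenPlaceID -> {namespace -> external ID}: one place, many external IDs."""

    def __init__(self) -> None:
        self._data: defaultdict[str, dict[str, str]] = defaultdict(dict)

    def set(self, place_id: str, namespace: str, external_id: str) -> None:
        """Attach, or replace, the external ID for one namespace of a place."""
        self._data[place_id][namespace] = external_id

    def get(self, place_id: str, namespace: str) -> str | None:
        # .get() on the defaultdict does not create empty entries for unknown IDs.
        return self._data.get(place_id, {}).get(namespace)

    def all_for(self, place_id: str) -> dict[str, str]:
        return dict(self._data.get(place_id, {}))
```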

The requirement to reverse-look-up OpenPlaceIDs for a certain external ID (OSM, Google etc.) need not necessarily be handled by the database; it could be handled by clients, or by database(s) built specifically for this case from the original mapping.

This storage would be distributed by nature and neither owned nor run by any one entity, so as to avoid a single point of failure. While that is the goal, an intermediate solution may rely on more centralised setups until the proper distribution algorithms are fully in place.

Candidate solutions include several of the larger blockchains or several of the more established DHTs. An alternative could be a DNS-like setup, in which there is a central authority (root) that only delegates parts of the database, until the final authority that handles the OpenPlaceID-to-OSM-ID mapping is reached.
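The reverse lookup can be derived on the client side by inverting the forward mapping. A sketch, assuming the forward mapping has the shape `OpenPlaceID -> {namespace: external ID}`; `build_reverse_index` is a hypothetical helper.

```python
from __future__ import annotations

from collections import defaultdict


def build_reverse_index(forward: dict[str, dict[str, str]]) -> dict[tuple[str, str], set[str]]:
    """Invert OpenPlaceID -> {namespace: external ID}
    into (namespace, external ID) -> set of OpenPlaceIDs."""
    reverse: defaultdict[tuple[str, str], set[str]] = defaultdict(set)
    for place_id, externals in forward.items():
        for namespace, external_id in externals.items():
            reverse[(namespace, external_id)].add(place_id)
    return dict(reverse)
```

The values are sets because nothing in the forward mapping stops two OpenPlaceIDs from pointing at the same external ID; a client can treat a set with more than one member as a conflict to resolve.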
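The DNS-like alternative can be illustrated with a toy resolver. It assumes, hypothetically, that OpenPlaceIDs carry dotted delegation labels (e.g. `eu.nl.0f3a`) read left to right, unlike real DNS, which reads right to left. `Authority` and `resolve` are illustrative names, and no networking is modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Authority:
    """One zone in a DNS-like tree: it delegates labels and/or holds final records."""
    name: str
    delegations: dict[str, Authority] = field(default_factory=dict)
    records: dict[str, str] = field(default_factory=dict)  # full OpenPlaceID -> OSM ref


def resolve(root: Authority, place_id: str) -> tuple[str | None, list[str]]:
    """Follow delegations label by label from the root.

    Returns the mapping held by the last authority reached (or None),
    plus the names of the authorities consulted along the way."""
    authority, path = root, [root.name]
    for label in place_id.split("."):
        if label not in authority.delegations:
            break  # this authority is the final one for the ID
        authority = authority.delegations[label]
        path.append(authority.name)
    return authority.records.get(place_id), path
```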

Search microservice

As a developer, I want to access a search service that allows me to query for a place and receive one or more OpenPlaceIDs in return.

As a developer, I want to be able to run this search service on premise or embedded in my software, so as not to rely on third parties.

Such a microservice would use a decentralized, global mapping.

A search query would return a list of results. When the query is precise enough to match a single place, the result is a list with one item.

A search result must contain enough metadata for a developer or user to determine which features are being returned. A bare list of OpenPlaceIDs would not suffice, as they are meaningless on their own. Including names, types or other distinguishing attributes helps identify the correct OpenPlaceID among the results.
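A sketch of a result type that carries such metadata, with a deliberately naive matcher. `SearchResult`, its fields and `search` are assumptions; a real service would query a proper index rather than scan a list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SearchResult:
    """One hit: the OpenPlaceID plus the metadata a human needs to pick the right one."""
    place_id: str
    name: str
    place_type: str   # e.g. "monument", "swimming_pool"
    locality: str


def search(index: list[SearchResult], text: str) -> list[SearchResult]:
    """Case-insensitive substring match on the name; always returns a list,
    which holds a single item when the query is precise enough."""
    needle = text.casefold()
    return [result for result in index if needle in result.name.casefold()]
```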

Search Index

The search index maps OpenPlaceIDs to their searchable attributes. Attributes can be values such as boundaries, locations, names, types or any other relevant tag.

The search index must either be in a format that can be converted for local database systems (such as Elasticsearch or PostgreSQL), or already be in such a format.

The search index would need to be either reproducible or globally accessible.

Reproducible

Each server running the search service would need to be able to build a mapping of searchable attributes to their OpenPlaceIDs.

One way to achieve this is to iterate over all the OpenPlaceIDs in the global key-value storage (possibly through the redirection service) and attach the attributes found in OSM to the index.

While slow, this is fully reproducible and deterministic for a given OSM database snapshot.
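A sketch of that rebuild, assuming the key-value store can be read as `OpenPlaceID -> OSM ref` and the OSM snapshot as `OSM ref -> tags`. The function name and the default list of searchable tag keys are hypothetical. Sorting the output is what makes two runs over the same snapshot produce identical indexes regardless of iteration order.

```python
from __future__ import annotations


def build_search_index(
    place_ids: dict[str, str],                # OpenPlaceID -> OSM ref, read from the key-value store
    osm_snapshot: dict[str, dict[str, str]],  # OSM ref -> tags, from one fixed OSM snapshot
    searchable_keys: tuple[str, ...] = ("name", "amenity", "leisure"),
) -> list[tuple[str, str, str]]:
    """Return sorted (OpenPlaceID, tag key, tag value) rows for every searchable tag."""
    rows = []
    for place_id, osm_ref in place_ids.items():
        tags = osm_snapshot.get(osm_ref)
        if tags is None:
            continue  # the mapped element is absent from this snapshot; skip it
        for key in searchable_keys:
            if key in tags:
                rows.append((place_id, key, tags[key]))
    # Sorting removes any dependence on dict or store iteration order.
    return sorted(rows)
```

The resulting rows can be bulk-loaded into whichever local database the service uses.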

Globally accessible

An alternative, or additional, feature is to publish a second database next to the key-value storage that contains a pre-built index.

Search services can use this to bootstrap their index-building, as a fallback when their own data is outdated, or both.

Redirection microservice

As a developer, I want to fetch an OpenPlaceID and be redirected to the correct OSM URI of that node, way, or relation.

As a developer, I want to be able to follow multiple redirections leading to an OSM URI of that node, way, or relation, when the nodes, ways or relations were merged, moved or otherwise changed.
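Following a chain of redirections needs a guard against loops and overly long chains. A sketch, assuming redirects between OpenPlaceIDs and the final OSM URIs are kept in two plain dictionaries; `follow_redirects`, `RedirectLoopError` and the 20-hop limit are hypothetical choices.

```python
from __future__ import annotations


class RedirectLoopError(Exception):
    """Raised when a redirect chain revisits an ID or exceeds the hop limit."""


def follow_redirects(
    place_id: str,
    redirects: dict[str, str],   # OpenPlaceID -> successor OpenPlaceID (after a merge or move)
    osm_uris: dict[str, str],    # OpenPlaceID -> current OSM URI
    max_hops: int = 20,
) -> str | None:
    """Return the OSM URI at the end of the chain, or None if the chain dead-ends."""
    visited: list[str] = []
    current = place_id
    while current not in osm_uris:
        if current not in redirects:
            return None  # unknown or retired ID: nothing to redirect to
        if current in visited or len(visited) >= max_hops:
            raise RedirectLoopError(" -> ".join(visited + [current]))
        visited.append(current)
        current = redirects[current]
    return osm_uris[current]
```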

As a developer, I want to get the correct HTTP status codes for features that once existed but no longer exist (410 Gone), that never existed (404 Not Found), or that are otherwise unavailable (404 Not Found).
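A sketch of the status selection using Python's `http.HTTPStatus`. The `retired` set (IDs that once existed) and the choice of 302 for the redirect are assumptions: since the OSM element behind an OpenPlaceID can change, a non-permanent redirect keeps clients from caching a stale target. `response_for` is a hypothetical name.

```python
from __future__ import annotations

from http import HTTPStatus


def response_for(
    place_id: str,
    osm_uris: dict[str, str],   # OpenPlaceID -> current OSM URI
    retired: set[str],          # OpenPlaceIDs that existed once but no longer do
) -> tuple[HTTPStatus, str | None]:
    """Pick the status code and the Location header value (if any) for one lookup."""
    if place_id in osm_uris:
        # 302 rather than 301: the target may change when the OSM data changes.
        return HTTPStatus.FOUND, osm_uris[place_id]
    if place_id in retired:
        return HTTPStatus.GONE, None       # 410: existed once, no longer does
    return HTTPStatus.NOT_FOUND, None      # 404: never existed or otherwise unavailable
```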

As a developer, I want to be able to run this redirection service on premise or embedded in my software, so as not to rely on third parties.

Immediate work and investigation

Determine what DHTs might work for storing the amounts of data that OSM needs.

  • ~1.3 million Wikidata-OSM mappings immediately
  • ~15.3 million amenities in OSM
  • ~71.1 million named objects in OSM

Also determine what would warrant a “place-id” and how many of those unique “features” there are in OSM.

Further investigation

The IDs could represent interlinked sets or tree (nested) structures.

Tree structures, nested sets

For example, say we have this monument.

This represents a tree:

  Europe » the Netherlands » Amsterdam » National Monument

IDs could represent that:

  Europe (.eu) » the Netherlands (.eu.nl) » Amsterdam (.eu.nl.amsterdam) » National Monument (.eu.nl.amsterdam.natmon)
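Working with such dotted IDs reduces to string operations. A sketch, assuming the leading-dot format of the example above; `parent`, `ancestors` and `is_within` are hypothetical helpers.

```python
from __future__ import annotations


def parent(place_id: str) -> str | None:
    """'.eu.nl.amsterdam' -> '.eu.nl'; a top-level ID such as '.eu' has no parent."""
    head, _, _ = place_id.rpartition(".")
    return head or None


def ancestors(place_id: str) -> list[str]:
    """All enclosing IDs, nearest first."""
    result = []
    current = parent(place_id)
    while current is not None:
        result.append(current)
        current = parent(current)
    return result


def is_within(place_id: str, area_id: str) -> bool:
    """True if place_id lies somewhere below area_id in the tree."""
    # The trailing dot stops '.eu.nlx' from matching '.eu.nl'.
    return place_id.startswith(area_id + ".")
```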

Obviously this example is naïve. It needs further investigation into how to avoid conflicts, and it might present issues when trees grow wildly unbalanced. While the example uses human-readable names, an implementation could use fully randomized IDs built from basic character sets.

There are ID schemes that allow such hierarchical setups; URIs and DNS are examples of such systems.
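Randomized segments could be generated with Python's `secrets` module. A sketch, assuming a lowercase-alphanumeric alphabet and 8-character segments (both arbitrary); `random_segment` and `new_child_id` are hypothetical names, and the retry only avoids collisions against a locally known set of IDs, not across a distributed system.

```python
from __future__ import annotations

import secrets
import string

# Assumption: only lowercase ASCII letters and digits are allowed in a segment.
ALPHABET = string.ascii_lowercase + string.digits


def random_segment(length: int = 8) -> str:
    """Return a random segment drawn with a cryptographically secure generator."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def new_child_id(parent_id: str, existing: set[str], length: int = 8) -> str:
    """Append a fresh random segment to parent_id, retrying on a local collision."""
    while True:
        candidate = f"{parent_id}.{random_segment(length)}"
        if candidate not in existing:
            return candidate
```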

Graph structure

The tree above requires every element to have exactly one parent. In reality, many elements are not that simple. E.g. Turkey: would it be “.eu.tr” or “.asia.tr”? The same holds for roads, water bodies, and areas spanning borders.

A graph structure allows for more elaborate relations, in which an “element” can have multiple parents. Quite often, this means a relation needs to be named, e.g. “natmon--is in-->amsterdam”.
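A small labelled graph shows how multiple parents and named relations fit together. A sketch; `PlaceGraph`, the relation name `"is in"` and the example IDs are illustrative assumptions.

```python
from __future__ import annotations

from collections import defaultdict, deque


class PlaceGraph:
    """Directed graph of place IDs with a named relation on every edge."""

    def __init__(self) -> None:
        self._edges: defaultdict[str, set[tuple[str, str]]] = defaultdict(set)

    def relate(self, subject: str, relation: str, obj: str) -> None:
        """Record an edge such as natmon --is in--> amsterdam."""
        self._edges[subject].add((relation, obj))

    def parents(self, place_id: str, relation: str = "is in") -> set[str]:
        return {obj for rel, obj in self._edges.get(place_id, set()) if rel == relation}

    def all_containers(self, place_id: str, relation: str = "is in") -> set[str]:
        """Everything reachable via `relation`, following every parent (breadth-first)."""
        found: set[str] = set()
        queue = deque([place_id])
        while queue:
            current = queue.popleft()
            for parent in self.parents(current, relation):
                if parent not in found:  # also guards against cycles
                    found.add(parent)
                    queue.append(parent)
        return found
```

With this model Turkey simply gets two `is in` edges, one to Europe and one to Asia, instead of forcing a choice between “.eu.tr” and “.asia.tr”.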

There might be ID-schemes that encode such relations and structures.

Further reading