|Bèr Kessels be1fbb63e0 Add requirements to allow for merging and splitting||11 months ago|
OpenplaceReviews, as well as other niche maps, have a need for a persistent identifier when referring to a “place”.
OpenStreetMap (OSM) uses IDs for its internal structure, for example to identify a node. Nodes may change when the feature they refer to does not.
Consider some scenarios and issues:
There are many more scenarios in which the OSM-id might change while the “place” it refers to did not, and in which the OSM-id did not change while the “place” it refers to did.
There is a clear disconnect between what OSM needs from its IDs and what a place database needs from its IDs.
It must be:
It may be:
The problem has been described in detail in the OSM wiki. Some projects have taken a shot at developing or describing something that could work. None of them has been finished or is still active.
There are over 1.3 million wikidata tags in OSM. That still covers only 0.02% of all nodes, and just under 5% of all “relations”.
Businesses, for example, are mostly untagged.
A microservice has been proposed that takes a wikidata ID and resolves that to the relevant OSM URL.
It maintains an internal mapping between wikidata IDs and their OSM counterpart elements (relations, ways, nodes). The demo server is now giving errors.
Nominatim’s place_id is only an internal parameter of the engine. You cannot use place_id for anything: it is a technical database key and depends on a single Nominatim instance.
The solution is to link to the object with a certain property, usually a certain combination of tags. If a unique object exists, you are redirected to the object’s web page at openstreetmap.org. Otherwise, if for example the referred way has been split, a search result page shows all possible objects.
Google uses a place-id.
Some highlights and relevant parts:
A place-id spans across projects and APIs:
You can use the same place ID across the Places API and a number of Google Maps Platform APIs.
A place-id is discovered, using a “fuzzy” search.
A common way of using place IDs is to search for a place .. then use the returned place ID to retrieve place details.
A place-id may be replaced or moved, in which case the API should find the new id and redirect there.
In particular, some types of place IDs may sometimes cause a NOT_FOUND response, or the API may return a different place ID in the response.
Place-Ids don’t just refer to points of interest or “businesses”.
Place IDs are available for most locations, including businesses, landmarks, parks, and intersections.
Taking the requirements and nice-to-haves into consideration, several categories of technical solutions can be identified.
As a user, I want to be able to look up the OSM type-ID-tuple for a certain OpenPlaceID.
As a user, I want to be able to look up the OpenPlaceID for an OSM Id.
As a user, I want to be able to look up the OpenPlaceID for a Google place-Id, a Yelp Business Id, or an id from any other third party database.
As a user, I want to be able to look up the Yelp Business Id, or Google place-id for a certain OpenPlaceID.
As a user, I want to be able to insert new OpenPlaceIDs for missing places into a global accessible database.
As a user, I want to be able to edit the OSM type-ID-tuple that belongs to an OpenPlaceID, when the OSM data has changed.
As a user, I want to be able to change the OSM type-ID-tuple of an existing OpenPlaceID, so as to redirect to another feature if that feature merged, moved or otherwise changed.
As a user, I want to be able to split a feature into two or more features.
As a user, I want to be able to merge two or more OpenPlaceIDs into one.
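The merge and split stories above could be sketched as operations on a plain key-value mapping. This is only an illustration under assumptions: the record shape (`state`, `osm`, `target`), the function names and the use of random hex ids are all hypothetical, and the choice to keep the original OpenPlaceID on one part after a split is just one possible policy.

```python
import uuid


def merge(records, open_place_ids, survivor):
    """Merge several OpenPlaceIDs into one by redirecting the others to `survivor`."""
    if survivor not in records:
        raise KeyError(survivor)
    for open_place_id in open_place_ids:
        if open_place_id != survivor:
            # The merged-away id stays resolvable: it now points at the survivor.
            records[open_place_id] = {"state": "moved", "target": survivor}


def split(records, open_place_id, extra_osm_refs, new_id=lambda: uuid.uuid4().hex):
    """Keep `open_place_id` on its current feature and mint new ids for the other parts.

    Assumption: the original id keeps pointing at one of the parts.
    """
    if open_place_id not in records:
        raise KeyError(open_place_id)
    created = []
    for osm_ref in extra_osm_refs:
        fresh = new_id()
        records[fresh] = {"state": "active", "osm": osm_ref}
        created.append(fresh)
    return created
```

Keeping merged-away ids as redirects, rather than deleting them, means references held by third parties keep working.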
An OSM type-ID-tuple means that we return and store not just the numeric OSM id, since that can conflict with IDs of other elements. E.g. Node 1337 might conflict with Way 1337. We ensure uniqueness by adding the type: node, way or relation.
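As a minimal sketch of such a tuple (the names `OsmRef` and `parse_osm_ref` and the `type/id` string form are assumptions made for illustration, not part of any existing API):

```python
from typing import NamedTuple

OSM_TYPES = ("node", "way", "relation")


class OsmRef(NamedTuple):
    """An OSM type-ID tuple: the element type plus its numeric id."""

    osm_type: str
    osm_id: int

    def url(self) -> str:
        # Element pages on openstreetmap.org follow the /<type>/<id> pattern.
        return f"https://www.openstreetmap.org/{self.osm_type}/{self.osm_id}"


def parse_osm_ref(text: str) -> OsmRef:
    """Parse a string such as "node/1337" into an OsmRef."""
    osm_type, _, raw_id = text.partition("/")
    if osm_type not in OSM_TYPES:
        raise ValueError(f"unknown OSM element type: {osm_type!r}")
    return OsmRef(osm_type, int(raw_id))
```

With this shape, `OsmRef("node", 1337)` and `OsmRef("way", 1337)` compare as different values, so they can safely be used side by side as dictionary keys.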
The requirement not to be limited to OSM only requires the DHT or key-value database to allow multiple values, or multiple mappings, for one OpenPlaceID. E.g. one OpenPlaceID might refer to an OSM type-ID-tuple, a Google Place ID, a Yelp Business Id and so on.
The requirement to reverse look up placeIds for a certain external Id (OSM, Google etc.) need not necessarily be handled by the database; it could be done by clients or by database(s) built specially for this case from the original mapping.
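To illustrate the two points above, the sketch below holds several external ids per OpenPlaceID and derives a reverse lookup table from that forward mapping on the client side. The OpenPlaceID format, the source names and all ids are made up for the example.

```python
from collections import defaultdict

# Hypothetical forward mapping: one OpenPlaceID, several external ids.
mapping = {
    "opid-0001": {
        "osm": "node/1337",
        "google": "example-google-id",
        "yelp": "example-yelp-id",
    },
    "opid-0002": {"osm": "way/1337"},
}


def build_reverse_index(mapping):
    """Derive (source, external id) -> set of OpenPlaceIDs from the forward mapping."""
    reverse = defaultdict(set)
    for open_place_id, external_ids in mapping.items():
        for source, external_id in external_ids.items():
            reverse[(source, external_id)].add(open_place_id)
    return dict(reverse)


reverse = build_reverse_index(mapping)
```

Because the reverse table is derived entirely from the forward mapping, it can be rebuilt at any time and never needs to be stored in the global database itself.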
This storage would be distributed by nature and neither owned nor run by any one entity, so as to avoid having a single point of failure. While that is the goal, an intermediate solution may rely on more centralised setups until the proper distribution algorithms are fully in place.
Obvious candidates are several of the larger blockchains or several of the more established DHTs. An alternative could be a setup like DNS, in which there is a central authority (root) that only delegates parts of the database, until we reach the final authority that handles the OpenPlaceID-to-OSM-id mapping.
As a developer, I want to access a search service that allows me to query for a place and receive one or more OpenPlaceIDs in return.
As a developer, I want to be able to run this search service on premise or embedded in my software, so as not to rely on third parties.
Such a microservice would use a decentralized, global mapping.
A search query would return a list of results. When the search query is precise enough to return only one single result, that would be a list with one item.
A search result must contain enough metadata for a developer or user, to determine what features are being returned. E.g. only a list of OpenPlaceIDs would not suffice, as they are meaningless. Including names, types or other distinguishable attributes would help identifying the correct OpenPlaceID from the result.
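A search result along these lines could look like the sketch below. The field names, the sample entries and the substring matching are assumptions chosen for illustration; a real service would use a proper search engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SearchResult:
    """One hit: the id plus enough metadata to tell hits apart."""

    open_place_id: str
    name: str
    feature_type: str
    lat: float
    lon: float


# Hypothetical index contents, made up for the example.
INDEX = [
    SearchResult("opid-0001", "Example Cafe", "cafe", 52.0, 4.0),
    SearchResult("opid-0002", "Example Park", "park", 52.1, 4.1),
]


def search(query: str) -> List[SearchResult]:
    """Return every indexed place whose name contains the query, ignoring case."""
    needle = query.casefold()
    return [result for result in INDEX if needle in result.name.casefold()]
```

A precise query such as `search("cafe")` still returns a list, here with a single item, which matches the behaviour described above.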
The search index maps OpenPlaceIDs with their searchable attributes. Attributes can be values such as boundaries, locations, names, types or any other interesting tag.
The search index must be in a format that either can be converted to local database systems (such as Elasticsearch or PostgreSQL) or is already in such a format.
The search index would need to be either reproducible or globally accessible.
Each server running the search service would need to be able to build a mapping of searchable attributes to their OpenPlaceIDs.
One way to achieve this is to iterate over all the OpenPlaceIDs in the global key-value storage (possibly through the redirection service) and from there attach the attributes as found in OSM to the index.
While slow, this is fully reproducible and deterministic for a given OSM database snapshot.
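The iteration described above might be sketched as follows. Both inputs are stand-ins: `kv_store` is assumed to map OpenPlaceIDs to OSM references and `osm_snapshot` to map OSM references to tag dictionaries. The choice of indexed tags is also an assumption.

```python
def build_search_index(kv_store, osm_snapshot, wanted_tags=("name", "amenity", "tourism")):
    """Build OpenPlaceID -> searchable attributes from the mapping and an OSM snapshot."""
    index = {}
    # Sorting the keys makes the result independent of the order in which
    # the key-value storage happens to return them.
    for open_place_id in sorted(kv_store):
        osm_ref = kv_store[open_place_id]
        tags = osm_snapshot.get(osm_ref)
        if tags is None:
            continue  # The element is not present in this snapshot.
        index[open_place_id] = {key: tags[key] for key in wanted_tags if key in tags}
    return index
```

Two servers that run this over the same mapping and the same snapshot end up with identical indexes, which is what makes the index reproducible.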
An alternative or additional feature is to publish a second database next to the key-value storage, containing a pre-made index.
Search services can use this to either bootstrap the index-building, or to fall back to, in case of outdated data. Or both.
As a developer, I want to fetch an OpenPlaceID and be redirected to the correct OSM URI of that node, way, or relation.
As a developer, I want to be able to follow multiple redirections leading to an OSM URI of that node, way, or relation, when the nodes, ways or relations were merged, moved or otherwise changed.
As a developer, I want to get the correct HTTP status codes for features that once existed but no longer exist (410 Gone), have never existed (404 Not Found), or are otherwise not available (404 Not Found).
As a developer, I want to be able to run this redirection service on premise or embedded in my software, so as not to rely on third parties.
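The resolution logic behind these stories could be sketched as below. The record shape (`state`, `osm`, `target`), the hop limit and the use of 302 for a successful redirect are assumptions; only the 410 and 404 cases come from the stories above.

```python
def resolve(open_place_id, records, max_hops=10):
    """Follow redirects and return (HTTP status, OSM URL or None)."""
    seen = set()
    current = open_place_id
    while current not in seen and len(seen) < max_hops:
        seen.add(current)
        record = records.get(current)
        if record is None:
            return 404, None  # Never existed.
        if record["state"] == "gone":
            return 410, None  # Existed once, no longer does.
        if record["state"] == "active":
            return 302, f"https://www.openstreetmap.org/{record['osm']}"
        current = record["target"]  # state == "moved": follow the redirect.
    return 404, None  # Redirect loop or too many hops: not available.


# Hypothetical records, made up for the example.
records = {
    "a": {"state": "moved", "target": "b"},
    "b": {"state": "active", "osm": "node/1337"},
    "c": {"state": "gone"},
    "d": {"state": "moved", "target": "e"},
    "e": {"state": "moved", "target": "d"},
}
```

Since the function only needs a dictionary-like `records` object, the same logic could run inside a hosted service or embedded in client software.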
Determine what DHTs might work for storing the amounts of data that OSM needs.
Also determine what would warrant a “place-id” and how many of those unique “features” there are in OSM.
The IDs could represent interlinked sets or a tree (nested) structure.
For example, say, we have this monument
This represents a tree:
Europe » the Netherlands » Amsterdam » National Monument
IDs could represent that:
Europe (.eu) » the Netherlands (.eu.nl) » Amsterdam (.eu.nl.amsterdam) » National Monument (.eu.nl.amsterdam.natmon)
Obviously this example is naïve. It needs further investigation into how to avoid conflicts, and it might cause issues when trees grow wildly unbalanced. While the example uses human-readable names, the implementation could use fully randomized ids built from a basic character set.
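To make the dotted notation concrete, the sketch below derives the ancestors of such an id and tests containment. It assumes the leading-dot, most-general-first form used in the example; the function names are hypothetical.

```python
def ancestors(place_id):
    """Return every ancestor of a dotted id, from the root down to the parent."""
    labels = place_id.strip(".").split(".")
    return ["." + ".".join(labels[:depth]) for depth in range(1, len(labels))]


def is_within(child_id, parent_id):
    """True if child_id lies strictly below parent_id in the tree."""
    # Appending the dot avoids matching ".eu.nlx" against ".eu.nl".
    return child_id.startswith(parent_id + ".")
```

For `.eu.nl.amsterdam.natmon` this yields `.eu`, `.eu.nl` and `.eu.nl.amsterdam`, so a lookup can walk up the tree without consulting any database.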
There might be ID-schemes that allow such hierarchical setups; URIs and DNS are examples of such systems.
The tree above requires every element to have exactly one parent. In reality, many elements are not so simple. E.g. Turkey: would that be “.eu.tr” or “.asia.tr”? The same goes for roads, water, areas spanning borders etc.
A graph structure allows for more elaborate relations, in which an “element” can have multiple parents. Quite often, this will lead to a situation in which a relation needs to be named: “natmon--is in-->amsterdam”.
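Such named relations can be sketched as a list of (subject, relation, object) triples. The short element names and the helper functions are assumptions for illustration only.

```python
# Each edge is a named relation between two elements.
EDGES = [
    ("natmon", "is in", "amsterdam"),
    ("amsterdam", "is in", "nl"),
    ("nl", "is in", "eu"),
    ("tr", "is in", "eu"),
    ("tr", "is in", "asia"),
]


def parents(element, relation="is in"):
    """Return the direct parents of an element for one named relation."""
    return sorted(obj for subj, rel, obj in EDGES if subj == element and rel == relation)


def all_ancestors(element, relation="is in"):
    """Return every element reachable by repeatedly following the relation."""
    found, queue = set(), [element]
    while queue:
        for parent in parents(queue.pop(), relation):
            if parent not in found:
                found.add(parent)
                queue.append(parent)
    return found
```

Here `parents("tr")` returns both `asia` and `eu`, which a single-parent tree cannot express.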
There might be ID-schemes that encode such relations and structures.