Requirements for fannish resource identifiers

Published: 2023-05-09T18:31:26-07:00.

The following blogpost summarizes a discussion held in the Fandom Coders Discord about I·D requirements for various types of fannish resources, and how these resources might federate out or be handled by other services. Our goal is to create a decentralized network of fannish platforms, so figuring out resource identification requirements is an important first step.

Note that in the discussion which follows, a “resource” might be a work, an author, a tag, a bookmark, or something else… anything which might be a metadata subject.

① Resources should have Tag U·R·I’s.

Resource Tag U·R·I’s should be U·R·I’s of the form :⁠—

tag:<domain>,<date>:<path>

—: where <domain> is the domain name of a site, <date> is some date (in YYYY-MM-DD format), and <path> is some path decided by the person or people who owned <domain> at <date> to uniquely identify the resource. Tag U·R·I’s are ideal for fannish resources for the following reasons :⁠—

In order for a fannish resource to be published on the internet, it must be published at a domain on a date. So these requirements are easily satisfied.
No external registration (beyond owning a domain name) is necessary to mint U·R·I’s, and no maintenance is necessary.
The domain name in the Tag U·R·I indicates who should be the trusted party when it comes to information about the resource: <domain>. If you hear about the resource from somewhere else, you know to view the information you receive with some level of suspicion.

Some additional notes :⁠—

The term “Tag U·R·I” has no relation to the normal fannish use of “tag”; it’s just what the U·R·I scheme happens to be called.
<date> does not (and maybe should not) have to be the actual date a resource was created. My recommendation would be to set <date> to the date that a service was founded, so that e·g if a service dies and a new one is started at the same domain, the two generate clearly distinguishable U·R·I’s.
It’s not possible to distinguish between beneficial reasons for content changes at <domain> (an author editing a work) and malicious ones (hostile domain takeover). It’s also not possible to verify that the people at <domain> actually controlled the domain at <date>. But if people play by the rules, an accidental name collision will never happen.
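
As a rough illustration of the format described above, here is a minimal Python sketch for minting and parsing Tag U·R·I’s. The names (TagUri, mint_tag_uri, parse_tag_uri) and the example domain are made up for this sketch, and the pattern is deliberately loose: it only accepts the tag:<domain>,<date>:<path> shape used in this post and is not a complete validator for the scheme.

```python
import re
from dataclasses import dataclass

# Loose pattern for tag:<domain>,<date>:<path>. This sketch only accepts a
# plain domain name and the full YYYY-MM-DD date form used in this post.
TAG_RE = re.compile(
    r"^tag:(?P<domain>[A-Za-z0-9.-]+),(?P<date>\d{4}-\d{2}-\d{2}):(?P<path>.*)$"
)


@dataclass(frozen=True)
class TagUri:
    domain: str
    date: str
    path: str

    def __str__(self) -> str:
        return f"tag:{self.domain},{self.date}:{self.path}"


def parse_tag_uri(text: str) -> TagUri:
    """Split a Tag URI into its domain, date, and path parts."""
    match = TAG_RE.match(text)
    if match is None:
        raise ValueError(f"not a Tag URI of the expected form: {text!r}")
    return TagUri(**match.groupdict())


def mint_tag_uri(domain: str, founded: str, path: str) -> TagUri:
    """Mint a Tag URI, using the service's founding date as <date>."""
    # Round-trip through the parser so malformed input is rejected early.
    return parse_tag_uri(f"tag:{domain.lower()},{founded}:{path}")
```

A service founded on 2023-05-09 at the (hypothetical) domain archive.example could then call mint_tag_uri("archive.example", "2023-05-09", "works/42") for each resource, keeping the date fixed for the lifetime of the service.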

② Resources should have canonical U·R·L’s containing their Tag.

The canonical U·R·L for a resource should look like this :⁠—

https://<domain>/<subpath>/tag:<domain>,<date>:<path>

There are a few important things of note here :⁠—

Both instances of <domain> must be the same, or else the U·R·L is not canonical.
The entire Tag U·R·I is present in the U·R·L, allowing it to be identified even if the U·R·L ceases to be dereferenceable.
<path> may contain anything that U·R·I syntax permits, including a query or fragment part.
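
The canonical U·R·L rules above might be checked with something like the following Python sketch. The function names and example domains are assumptions for illustration only. Because <path> may itself carry a query or fragment part, the sketch finds the Tag U·R·I by searching the raw U·R·L string for “/tag:” rather than relying on a U·R·L parser’s idea of where the path ends.

```python
from urllib.parse import urlsplit


def canonical_url(domain: str, subpath: str, tag_uri: str) -> str:
    """Build https://<domain>/<subpath>/<tag_uri>."""
    return f"https://{domain}/{subpath.strip('/')}/{tag_uri}"


def split_resource_url(url: str) -> tuple:
    """Return (host, tag_uri) for a URL that embeds a Tag URI."""
    host = urlsplit(url).hostname
    marker = url.find("/tag:")
    if host is None or marker == -1:
        raise ValueError(f"no embedded Tag URI found in {url!r}")
    # Everything after the slash is treated as the Tag URI, including any
    # query or fragment part that belongs to <path>.
    return host, url[marker + 1:]


def is_canonical(url: str) -> bool:
    """True when the URL's host matches the <domain> inside its Tag URI."""
    host, tag_uri = split_resource_url(url)
    tag_domain = tag_uri[len("tag:"):].split(",", 1)[0]
    return host == tag_domain.lower()
```

With this, a U·R·L served from archive.example that embeds tag:archive.example,… counts as canonical, while the same Tag U·R·I served from any other host does not.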

It is possible for resources to be mirrored. Mirrors must have U·R·L’s like the following :⁠—

https://<mirror-domain>/<mirror-subpath>/tag:<domain>,<date>:<path>

—: that is, the same easily‐recognizable Tag U·R·I, but at a different domain and subpath. Mirrors must identify the canonical U·R·L of the resource they are mirroring. owl:sameAs might be one mechanism of doing this in R·D·F.
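
One possible way for a mirror to state which canonical U·R·L it mirrors is to publish a single owl:sameAs statement. The Python sketch below emits that statement as one N-Triples line; the helper names and example U·R·L’s are hypothetical, and owl:sameAs is only the candidate mechanism mentioned above, not a settled choice.

```python
# Full IRI of the owl:sameAs property.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def tag_of(url: str) -> str:
    """Return the Tag URI embedded in a canonical or mirror URL."""
    marker = url.find("/tag:")
    if marker == -1:
        raise ValueError(f"no embedded Tag URI found in {url!r}")
    return url[marker + 1:]


def mirror_statement(mirror_url: str, canonical_url: str) -> str:
    """Build an N-Triples line linking a mirror to its canonical URL."""
    # A mirror must carry exactly the same Tag URI as the original.
    if tag_of(mirror_url) != tag_of(canonical_url):
        raise ValueError("mirror and canonical URLs carry different Tag URIs")
    return f"<{mirror_url}> <{OWL_SAME_AS}> <{canonical_url}> ."
```

The check before the statement is built reflects the rule that a mirror keeps the same Tag U·R·I and changes only the domain and subpath.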

③ Crossposted resources should link to each other.

If a work is crossposted in two locations, one is not necessarily “canonical” and the other a “mirror”. Likely, both will be canonical and have their own Tag U·R·I’s (and this is a good thing). Crossposted works should instead identify themselves by linking to each other in some reciprocal fashion. We may need to come up with our own metadata term for specifying this, but see e·g dcterms:hasFormat and dcterms:isFormatOf which encode a similar (but not necessarily reciprocal in the same way) relationship.
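
To make “reciprocal” concrete, here is a small Python sketch of how a discovery platform might confirm crossposts. It assumes the platform has already collected, for each Tag U·R·I, the set of Tag U·R·I’s that work claims as crossposts (by whatever metadata term ends up being used); the function name and data shape are assumptions of this sketch. A pair is accepted only when each side links to the other.

```python
def reciprocal_pairs(claims: dict) -> set:
    """Return the pairs of Tag URIs that claim each other as crossposts.

    `claims` maps a Tag URI to the set of Tag URIs it links to.
    """
    pairs = set()
    for source, targets in claims.items():
        for target in targets:
            # Keep the pair only if the link is returned by the other side.
            if source != target and source in claims.get(target, set()):
                pairs.add(frozenset((source, target)))
    return pairs
```

A one-sided claim is simply ignored here; a real platform might instead surface it as unconfirmed.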

④ Platforms should only trust mirrors as a last resort.

And with copious warnings. If at all possible, platforms should direct users to the canonical U·R·L associated with a resource. However, this may not be possible (if an archive moves or goes down). In that case, a platform may direct users to a mirror, with a warning that the mirrored version is not the original published work and may differ in significant ways.
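
The fallback order described here could be sketched in Python roughly as follows. How a platform decides that the canonical U·R·L is unreachable, and how it picks among several mirrors, are left open above, so this sketch takes reachability as an input and simply uses the first mirror; all names and the warning text are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MIRROR_WARNING = (
    "This is a mirrored copy, not the original published work, "
    "and it may differ in significant ways."
)


@dataclass(frozen=True)
class Target:
    url: str
    warning: Optional[str] = None


def choose_target(
    canonical_url: str,
    canonical_reachable: bool,
    mirror_urls: Sequence[str],
) -> Optional[Target]:
    """Prefer the canonical URL; fall back to a mirror only as a last resort."""
    if canonical_reachable:
        return Target(canonical_url)
    if mirror_urls:
        # Last resort: send the user to a mirror, always with a warning.
        return Target(mirror_urls[0], MIRROR_WARNING)
    return None
```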

Additional thoughts.

These things were either only briefly touched on, or else are my own ideas which came as I was writing this post.

Mirroring should be explicitly opt·in, and ideally automated (to reduce the likelihood of intentional or unintentional error). We will need to develop protocols for this.
For added security, publishing platforms might implement WebFinger, to guard against mirrors which correctly identify works they control but misidentify their path (thus making them appear to be down). Discovery platforms may, and probably should, attempt to make a WebFinger request for the resource with its Tag U·R·I instead of trusting the canonical path. However, supporting WebFinger should not be required of all publishing platforms, and the attack vector from mirrors in this sense is pretty small.
WebFinger or ordinary H·T·T·P redirects could be used to forward services to new “canonical” U·R·L’s in the case that a service moves. However, this trail would only be followable for as long as the redirects or WebFinger endpoint remain up at the original domain.
Instead of mirroring tags, a service might indicate that its version of a tag is intended to be synonymous with another service’s version of a tag using skos:closeMatch. The stronger statement skos:exactMatch requires agreement from both services. Tag mirrors are useful in case the canonical service for a tag goes down, but should not be relied upon otherwise.
Publishing platforms may serve a “tombstone” at the canonical U·R·L for a resource, indicating that it was intentionally deleted. In this case, a mirrored version must not be used.
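
For the WebFinger idea above, a discovery platform would need to turn a Tag U·R·I into a lookup request. A minimal Python sketch, assuming the request goes to the <domain> named in the Tag U·R·I and uses the standard /.well-known/webfinger endpoint with a resource query parameter (the function name is made up for this sketch):

```python
from urllib.parse import quote


def webfinger_url(tag_uri: str) -> str:
    """Build the WebFinger lookup URL for a Tag URI.

    Assumes the lookup is sent to the <domain> named inside the Tag URI.
    """
    if not tag_uri.startswith("tag:") or "," not in tag_uri:
        raise ValueError(f"not a Tag URI of the expected form: {tag_uri!r}")
    domain = tag_uri[len("tag:"):].split(",", 1)[0]
    # Percent-encode the whole Tag URI so it is a single query parameter value.
    resource = quote(tag_uri, safe="")
    return f"https://{domain}/.well-known/webfinger?resource={resource}"
```

The response, if any, would then tell the platform where the resource currently lives, independently of whatever path a mirror claims.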