Lady’s Weblog

Requirements for fannish resource identifiers

Lady

Published: .

The following blogpost is a summary of a discussion which was held in the Fandom Coders discord about I·D requirements for various types of fannish resources, and how these things might federate out or be handled by other services. Our goal is to create a decentralized network of fannish platforms, so figuring out resource identification requirements is an important first step.

Note that in the discussion which follows, a “resource” might be a work, an author, a tag, a bookmark, or something else… anything which might be a metadata subject.

① Resources should have Tag U·R·I’s.

Resource Tag U·R·I’s should be U·R·I’s of the form:

tag:<domain>,<date>:<path>

where <domain> is the domain name of a site, <date> is some date (in YYYY-MM-DD format), and <path> is some path decided by the person or people who owned <domain> at <date> to uniquely identify the resource. Tag U·R·I’s are ideal for fannish resources for the following reasons:

  • In order for a fannish resource to be published on the internet, it must be published at a domain on a date. So these requirements are easily satisfied.

  • No external registration (beyond owning a domain name) is necessary to mint U·R·I’s, and no maintenance is necessary.

  • The domain name in the Tag U·R·I indicates who should be the trusted party when it comes to information about the resource: <domain>. If you hear about the resource from somewhere else, you know to view the information you receive with some level of suspicion.

Some additional notes:

  • The term “Tag U·R·I” has no relation to the normal fannish use of “tag”; it’s just what the U·R·I scheme happens to be called.

  • <date> does not (and maybe should not) have to be the actual date a resource was created. My recommendation would be to set <date> to the date that a service was founded, so that e·g if a service dies and a new one is started at the same domain, the two generate clearly distinguishable U·R·I’s.

  • It’s not possible to distinguish between beneficial reasons for content changes at <domain> (an author editing a work) and malicious ones (hostile domain takeover). It’s also not possible to verify that the people at <domain> actually controlled the domain at <date>. But if people play by the rules, an accidental name collision will never happen.
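As a rough illustration of the shape described above, here is a minimal Python sketch for minting and parsing Tag U·R·I’s. The function names, the `TagURI` tuple, and the `archive.example` domain are all hypothetical, and the pattern only accepts the full YYYY-MM-DD dates this post recommends; it is not a complete validator for the scheme.

```python
import re
from typing import NamedTuple


class TagURI(NamedTuple):
    domain: str
    date: str
    path: str


# Assumption: a deliberately loose pattern for tag:<domain>,<date>:<path>,
# accepting only full YYYY-MM-DD dates and a non-empty path.
_TAG_RE = re.compile(r"tag:([^,:/]+),(\d{4}-\d{2}-\d{2}):(.+)")


def mint_tag(domain: str, date: str, path: str) -> str:
    """Assemble a Tag URI string from its three parts."""
    return f"tag:{domain},{date}:{path}"


def parse_tag(uri: str) -> TagURI:
    """Split a Tag URI into (domain, date, path), or raise ValueError."""
    match = _TAG_RE.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a Tag URI of the expected form: {uri!r}")
    return TagURI(match.group(1), match.group(2), match.group(3))


# Two services founded at the same (hypothetical) domain on different
# dates mint different identifiers for the same path.
old = mint_tag("archive.example", "2019-05-01", "works/123")
new = mint_tag("archive.example", "2024-02-10", "works/123")
print(old)
print(new)
print(parse_tag(old).domain)
```

Using the founding date of the service as <date> keeps the identifier stable for the life of that service, which is the point of the recommendation in the notes above.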

② Resources should have canonical U·R·L’s containing their Tag.

The canonical U·R·L for a resource should look like this:

https://<domain>/<subpath>/tag:<domain>,<date>:<path>

There are a few important things of note here:

  1. Both instances of <domain> must be the same, or else the U·R·L is not canonical.

  2. The entire Tag U·R·I is present in the U·R·L, allowing it to be identified even if the U·R·L ceases to be dereferenceable.

  3. <path> may contain anything, including a query or fragment part.
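The three points above can be sketched in Python roughly as follows. The helper names and the example domains are hypothetical. The sketch takes everything after the first "/tag:" as the Tag U·R·I verbatim, which is one way of reading point 3, not something the discussion settled.

```python
from typing import Optional
from urllib.parse import urlsplit


def canonical_url(domain: str, subpath: str, tag: str) -> str:
    """Build https://<domain>/<subpath>/<tag>, tolerating an empty subpath."""
    parts = [p for p in (subpath.strip("/"), tag) if p]
    return f"https://{domain}/" + "/".join(parts)


def extract_tag(url: str) -> Optional[str]:
    """Recover the embedded Tag URI from a URL without dereferencing it."""
    start = url.find("/tag:")
    return url[start + 1:] if start != -1 else None


def is_canonical(url: str) -> bool:
    """True when the URL's host matches the domain inside its Tag URI."""
    tag = extract_tag(url)
    host = urlsplit(url).hostname  # lowercased by urlsplit
    if tag is None or host is None:
        return False
    tag_domain, comma, _ = tag[len("tag:"):].partition(",")
    return comma == "," and tag_domain.lower() == host


tag = "tag:archive.example,2019-05-01:works/123"
url = canonical_url("archive.example", "works", tag)
print(url)
print(is_canonical(url))
print(is_canonical("https://mirror.example/cache/" + tag))
```

Because `extract_tag` never touches the network, a discovery platform could still identify the resource after the host has gone away.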

It is possible for resources to be mirrored. Mirrors must have U·R·L’s like the following:

https://<mirror-domain>/<mirror-subpath>/tag:<domain>,<date>:<path>

that is, the same easily-recognizable Tag U·R·I, but at a different domain and subpath. Mirrors must identify the canonical U·R·L of the resource they are mirroring. owl:sameAs might be one mechanism for doing this in R·D·F.
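If owl:sameAs were the mechanism chosen, a mirror’s statement could be serialized as a single N-Triples line. The sketch below is only an illustration of that option: the function name and the example U·R·L’s are hypothetical, and it assumes neither U·R·L contains characters (such as spaces or angle brackets) that would need escaping inside an I·R·I.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(mirror_url: str, canonical_url: str) -> str:
    """Format '<mirror> owl:sameAs <canonical> .' as one N-Triples line."""
    return f"<{mirror_url}> <{OWL_SAME_AS}> <{canonical_url}> ."


tag = "tag:archive.example,2019-05-01:works/123"
print(same_as_triple(
    "https://mirror.example/cache/" + tag,
    "https://archive.example/works/" + tag,
))
```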

③ Crossposted resources should link to each other.

If a work is crossposted in two locations, one is not necessarily “canonical” and the other a “mirror”. Likely, both will be canonical and have their own Tag U·R·I’s (and this is a good thing). Crossposted works should instead identify themselves by linking to each other in some reciprocal fashion. We may need to come up with our own metadata term for specifying this, but see e·g dcterms:hasFormat and dcterms:isFormatOf which encode a similar (but not necessarily reciprocal in the same way) relationship.
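Whatever metadata term ends up being used, the reciprocity requirement itself is easy to check. The following Python sketch assumes a hypothetical `claims` mapping from each work’s Tag U·R·I to the set of Tag U·R·I’s it says it is crossposted with; how a platform would gather those claims is left open here.

```python
from typing import Dict, Set


def is_reciprocal(a: str, b: str, claims: Dict[str, Set[str]]) -> bool:
    """A crosspost link counts only if each work points at the other."""
    return b in claims.get(a, set()) and a in claims.get(b, set())


# Hypothetical claims gathered from two archives.
first = "tag:archive.example,2019-05-01:works/123"
second = "tag:other.example,2021-03-15:fic/987"
claims = {first: {second}, second: {first}}
print(is_reciprocal(first, second, claims))
```

A one-sided claim (only one work naming the other) fails the check, so a third party cannot attach itself to a work just by asserting a link.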

④ Platforms should only trust mirrors as a last resort.

And with copious warnings. If at all possible, platforms should direct users to the canonical U·R·L associated with a resource. However, this may not be possible (if an archive moves or goes down). In that case, a platform may direct users to a mirror, with a warning that the mirrored version is not the original published work and may differ in significant ways.
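The fallback order described here might be sketched as below. The `Resolution` type, the warning text, and the idea of passing reachability in as a boolean are all assumptions made for the sake of a small example; a real platform would decide reachability from an actual request.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MIRROR_WARNING = (
    "This is a mirrored copy, not the original published work, "
    "and it may differ from the original in significant ways."
)


@dataclass
class Resolution:
    url: Optional[str]
    warning: Optional[str] = None


def resolve(canonical_url: str, canonical_reachable: bool,
            mirrors: Sequence[str]) -> Resolution:
    """Prefer the canonical URL; fall back to a mirror only with a warning."""
    if canonical_reachable:
        return Resolution(canonical_url)
    if mirrors:
        return Resolution(mirrors[0], MIRROR_WARNING)
    return Resolution(None, "This work is currently unavailable.")


tag = "tag:archive.example,2019-05-01:works/123"
result = resolve("https://archive.example/works/" + tag, False,
                 ["https://mirror.example/cache/" + tag])
print(result.url)
print(result.warning)
```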

Additional thoughts.

These things were either only briefly touched on, or else are my own ideas which came as I was writing this post.

  • Mirroring should be explicitly opt·in, and ideally automated (to reduce the likelihood of intentional or unintentional error). We will need to develop protocols for this.

  • For added security, publishing platforms might implement WebFinger, to guard against mirrors which correctly identify works they control but misidentify their path (thus making them appear to be down). Discovery platforms may, and probably should, attempt to make a WebFinger request for the resource with its Tag U·R·I instead of trusting the canonical path. However, supporting WebFinger should not be required of all publishing platforms, and the attack vector from mirrors in this sense is pretty small.

  • WebFinger or ordinary H·T·T·P redirects could be used to forward services to new “canonical” U·R·L’s in the case that a service moves. However, this trail would only be followable for as long as the redirects or WebFinger endpoint remain up at the original domain.

  • Instead of mirroring tags, a service might indicate that its version of a tag is intended to be synonymous with another service’s version of a tag using skos:closeMatch. The stronger statement skos:exactMatch requires agreement from both services. Tag mirrors are useful in case the canonical service for a tag goes down, but should not be relied upon otherwise.

  • Publishing platforms may serve a “tombstone” at the canonical U·R·L for a resource, indicating that it was intentionally deleted. In this case, a mirrored version must not be used.
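For the WebFinger idea above, the request a discovery platform would make is a GET to the /.well-known/webfinger endpoint on the domain named in the Tag U·R·I, with the Tag U·R·I percent-encoded in the `resource` query parameter. A minimal Python sketch of building that request U·R·L follows; the function name and example domain are hypothetical, and which link relation the response should use to point at the canonical U·R·L is not something this post decides.

```python
from urllib.parse import urlencode


def webfinger_url(tag: str) -> str:
    """Build the WebFinger query URL for a Tag URI at its own domain."""
    if not tag.startswith("tag:") or "," not in tag:
        raise ValueError(f"not a Tag URI of the expected form: {tag!r}")
    # The domain is everything between "tag:" and the first comma.
    domain = tag[len("tag:"):].split(",", 1)[0]
    query = urlencode({"resource": tag})  # percent-encodes ':' ',' and '/'
    return f"https://{domain}/.well-known/webfinger?{query}"


print(webfinger_url("tag:archive.example,2019-05-01:works/123"))
```

Asking the domain in the Tag U·R·I directly means a mirror’s claim about where the canonical copy lives never has to be trusted.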