Chaz Meyers (cpm) wrote in suggestions,

RSS should respect <guid>

Short, concise description of the idea
<guid> provides a unique identifier and (sometimes) a valid link. <guid> should be used when possible instead of <link>.

Full description of the idea

<link> is used in many cases to link to an entry. This is useful since many feeds only post excerpts, and many blogs have their own commenting mechanisms. LiveJournal also uses it to figure out whether an <item> is new. If it's not new, LJ updates the original post instead of posting a new one. <link> is an optional element in every version of RSS except for 0.91 and 1.0.

<guid> was introduced in RSS 2.0 and allows posters to specify a unique identifier for every <item>. Since <link> isn't guaranteed to be unique or consistent, <guid> should be preferred when both <link> and <guid> are present. That's what it's there for, after all.
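A minimal sketch of that preference, written in Python rather than the Perl that LJ runs. The helper name item_key and the sample feed are made up for illustration and are not taken from LJ's code:

```python
import xml.etree.ElementTree as ET

def item_key(item):
    """Return the string used to decide whether an <item> was seen before.

    Prefers <guid>, falls back to <link>, and returns None if neither exists.
    """
    guid = (item.findtext("guid") or "").strip()
    if guid:
        return guid
    link = (item.findtext("link") or "").strip()
    return link or None

# Hypothetical feed: two items share one <link> but carry distinct <guid>s.
FEED = """<rss version="2.0"><channel>
  <item><title>First take</title><link>http://example.com/story</link>
        <guid isPermaLink="false">tag:example.com,2003:1</guid></item>
  <item><title>Second take</title><link>http://example.com/story</link>
        <guid isPermaLink="false">tag:example.com,2003:2</guid></item>
  <item><title>No guid</title><link>http://example.com/other</link></item>
</channel></rss>"""

keys = [item_key(item) for item in ET.fromstring(FEED).iter("item")]
```

Keyed on <link> alone, the first two items would collide. Keyed this way, all three stay distinct.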

If <guid> is present, its value must be a link to the entry unless the optional isPermaLink attribute is set to false. So, if <guid> is present and is a permalink, it should be displayed along with <link>. If <guid> and <link> are both present and contain the same URL, only one should be displayed.
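The display rule above could look roughly like this. It is a Python sketch, links_to_display is a hypothetical name, and listing <link> before <guid> is my own choice rather than anything the spec requires:

```python
import xml.etree.ElementTree as ET

def links_to_display(item):
    """Return the URLs to show for one <item>, without duplicates."""
    links = []
    link = (item.findtext("link") or "").strip()
    if link:
        links.append(link)

    guid_el = item.find("guid")
    if guid_el is not None:
        guid = (guid_el.text or "").strip()
        # A missing isPermaLink attribute means "true" in RSS 2.0.
        # Lower-casing the value is a lenient choice made for this sketch.
        is_permalink = guid_el.get("isPermaLink", "true").strip().lower() == "true"
        if guid and is_permalink and guid not in links:
            links.append(guid)
    return links

# Hypothetical items covering the cases described above.
same = ET.fromstring(
    "<item><link>http://example.com/a</link><guid>http://example.com/a</guid></item>")
both = ET.fromstring(
    "<item><link>http://example.com/news</link><guid>http://example.com/a</guid></item>")
opaque = ET.fromstring(
    '<item><link>http://example.com/a</link><guid isPermaLink="false">id-42</guid></item>')
guid_only = ET.fromstring("<item><guid>http://example.com/a</guid></item>")
```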

An ordered list of benefits

  • Feeds such as diveintomark, which do not include full articles and use <guid> instead of <link>, would become useful.
  • A less hackish way to tell whether an <item> is unique.
  • Feeds can contain multiple <item>s <link>ing to one news article without LJ assuming that the entire feed is full of duplicate <link>s that cannot be trusted.

An ordered list of problems/issues involved

  • XML::RSS does not support RSS 2.0, and that is what we're using right now. You can retrieve <guid>, but you cannot tell whether isPermaLink is absent, 'true', or 'false'.
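To make the three cases concrete, here is a small Python sketch using the standard-library ElementTree as a stand-in for any parser that exposes attributes. The function name guid_info and the sample items are invented for illustration:

```python
import xml.etree.ElementTree as ET

def guid_info(item):
    """Return (guid_text, isPermaLink_attribute) for an <item>.

    The attribute is None when absent, otherwise the raw string from the
    feed. Returns None when the item has no <guid> at all.
    """
    guid_el = item.find("guid")
    if guid_el is None:
        return None
    return (guid_el.text or "").strip(), guid_el.get("isPermaLink")

absent = ET.fromstring("<item><guid>http://example.com/1</guid></item>")
true_ = ET.fromstring(
    '<item><guid isPermaLink="true">http://example.com/2</guid></item>')
false_ = ET.fromstring(
    '<item><guid isPermaLink="false">tag:example.com,2003:3</guid></item>')

states = [guid_info(i)[1] for i in (absent, true_, false_)]
```

Here states keeps the three cases apart as None, "true", and "false", which is the information the bullet above says is missing.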

An organized list, or a few short paragraphs detailing suggestions for implementation

Only a few lines need to be changed to favor <guid> over <link> when determining uniqueness. The isPermaLink issue is another matter. I have three ideas.

  1. Wait for the XML::RSS people to fix the problem. RSS 2.0 has been around since 2002. If they haven't added support for it by now, I doubt they will any time soon. This is the theoretical best solution, but I wouldn't hold my breath on this one.
  2. Use a generic XML parser, such as XML::Simple or XML::Parser. This solution is not ideal because a mess of unrelated code will have to be modified. However, I think this is the cleanest reliable solution.
  3. Forget isPermaLink and use a regex to see if <guid> looks like a URL. This will probably work 99% of the time, but the spec does warn against that. It clearly states that if isPermaLink is false, RSS applications may not assume that it is a link to the article. I suppose it is possible for someone to use identifiers as their <guid> that look like URLs but do not point to valid resources.
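For idea 3, the heuristic might be sketched as follows. This is Python, the pattern is deliberately loose, and both URL_RE and looks_like_url are hypothetical names, so treat it as an illustration of the trade-off rather than a recommended check:

```python
import re

# Loose pattern: http or https scheme, a host, then an optional path.
URL_RE = re.compile(r"^https?://[^\s/]+(/\S*)?$", re.IGNORECASE)

def looks_like_url(guid):
    """Guess whether a <guid> value is a URL, ignoring isPermaLink entirely."""
    return bool(URL_RE.match(guid.strip()))

permalink = looks_like_url("http://example.com/2003/01/entry.html")
tag_uri = looks_like_url("tag:example.com,2003:1")
# An identifier that merely has URL shape also passes, which is exactly
# the case the spec warns about when isPermaLink is false.
url_shaped_id = looks_like_url("http://example.com/ids/42")
```

The last value shows the weakness: a URL-shaped identifier that points nowhere is indistinguishable from a real permalink without the attribute.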

Aside: Apologies for the original non-HTML-escaped posting and the redundant second posting. After I posted the first time, the results page pointed me to a link saying that the database was unavailable. The link it produced had no id. Perhaps that is a bug in the support template thing? Either way, I assumed the post did not go through. Then I read the comment asking to escape the tags.

Tags: ~ historical