James Tauber

Atom, Google Reader and Duplicates on Planets

For a while I've wondered why posts syndicated across multiple planets don't get picked up by Google Reader as duplicates (and automatically marked read once I've read them the first time around).

I wasn't sure whether the problem was my feed, the planets, or Google Reader, so I decided to investigate further, using my own feed as the source and the three planets my site is syndicated to (that I know of).

Let's take my post Cleese and Disk Images.

My feed gives both an id and a link for both the feed itself and each individual entry. That makes it possible, at least, for planets and readers to do the Right Thing. So I don't think the problem is my feed.
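To make the id/link distinction concrete, here is a minimal sketch that pulls both values out of an Atom document with Python's standard library. The sample feed, its ids and its URLs are made-up placeholders (not the actual feed), and `id_and_link` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed: every id and URL here is a placeholder.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2008:blog</id>
  <title>Example Blog</title>
  <link rel="alternate" href="http://example.org/blog/"/>
  <updated>2008-11-06T00:00:00Z</updated>
  <entry>
    <id>tag:example.org,2008:blog/some-post</id>
    <title>Some Post</title>
    <link rel="alternate" href="http://example.org/blog/2008/11/some-post/"/>
    <updated>2008-11-06T00:00:00Z</updated>
  </entry>
</feed>"""


def id_and_link(element):
    """Return (atom:id, alternate link href) for a feed or entry element."""
    atom_id = element.findtext(ATOM + "id")
    href = None
    # findall with a bare tag name only looks at direct children,
    # so a feed's own link is not confused with its entries' links.
    for link in element.findall(ATOM + "link"):
        # A link without a rel attribute is treated as rel="alternate".
        if link.get("rel", "alternate") == "alternate":
            href = link.get("href")
            break
    return atom_id, href


feed = ET.fromstring(SAMPLE)
feed_id, feed_link = id_and_link(feed)
entry_id, entry_link = id_and_link(feed.find(ATOM + "entry"))
```

The id is the permanent identity of the feed or entry; the link is merely where it can currently be read. A planet that republishes the entry can keep the id unchanged even though its own feed lives at a different URL.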

On the Official Planet Python:

On both the Unofficial Planet Python and on Sam Ruby's Planet Intertwingly:

Note that the handling of the author by the latter two feeds is correct per the Atom RFC, although I have noticed that Safari's feed reader gets this wrong and, despite the author in the source element, uses the inherited author from the planet feed itself.
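The author rule in the Atom RFC can be sketched as a three-step lookup: the entry's own author, then the author in its source element, and only then the author of the containing feed. The planet feed below is a made-up placeholder, and `entry_authors` is a hypothetical helper name, not anything a particular reader ships.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical planet feed: the entry has no author of its own,
# but its source element carries the original feed's author.
PLANET = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>tag:planet.example.org,2008:feed</id>
  <title>Example Planet</title>
  <updated>2008-11-06T00:00:00Z</updated>
  <author><name>Planet Maintainer</name></author>
  <entry>
    <id>tag:example.org,2008:blog/some-post</id>
    <title>Some Post</title>
    <updated>2008-11-06T00:00:00Z</updated>
    <source>
      <id>tag:example.org,2008:blog</id>
      <title>Example Blog</title>
      <author><name>Original Author</name></author>
    </source>
  </entry>
</feed>"""


def author_names(container):
    """Names of the atom:author elements directly inside container."""
    return [a.findtext(ATOM + "name") for a in container.findall(ATOM + "author")]


def entry_authors(feed, entry):
    """Resolve an entry's authors: entry first, then source, then feed."""
    names = author_names(entry)
    if names:
        return names
    source = entry.find(ATOM + "source")
    if source is not None:
        names = author_names(source)
        if names:
            return names
    # Only fall back to the containing feed when nothing closer applies.
    return author_names(feed)


planet = ET.fromstring(PLANET)
entry = planet.find(ATOM + "entry")
authors = entry_authors(planet, entry)
```

A reader that skips the middle step attributes the post to the planet's author, which is the behaviour described above for Safari.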

But, in short, the Atom-feed-based Planets do the Right Thing, although IMHO the RSS-1.0-based Official Planet Python does not. That may not be the Planet's fault. The RSS 1.0 Spec (or any RSS for that matter) may not make the distinction between id and link.

So given that my feed and two of the planet feeds do the right thing, I guess that places the blame with Google Reader.

Why does Google Reader not honour the entry id and automatically mark duplicates as already read once you've read the entry the first time? That's my pony request for Google Reader.
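What I'm asking for amounts to keying read state on the entry id rather than on the subscription an entry arrived through. The following is only a sketch of that idea, not a description of how Google Reader works; the `ReadState` class, the subscription names and the ids are all hypothetical.

```python
class ReadState:
    """Tracks read entries by Atom id, independent of subscription."""

    def __init__(self):
        self._read_ids = set()

    def mark_read(self, entry_id):
        self._read_ids.add(entry_id)

    def unread(self, entries):
        """Filter (subscription, entry_id) pairs down to unread ones.

        Ids are compared as exact strings, which is how the Atom RFC
        says ids are to be compared. An id that shows up twice in the
        same batch is only returned once.
        """
        seen = set()
        result = []
        for subscription, entry_id in entries:
            if entry_id in self._read_ids or entry_id in seen:
                continue
            seen.add(entry_id)
            result.append((subscription, entry_id))
        return result


# Hypothetical example: one post arriving via three subscriptions.
POST = "tag:example.org,2008:blog/some-post"
OTHER = "tag:example.org,2008:blog/other-post"
incoming = [
    ("source-feed", POST),
    ("planet-a", POST),
    ("planet-b", POST),
    ("planet-a", OTHER),
]

state = ReadState()
before = state.unread(incoming)   # the duplicate copies collapse to one
state.mark_read(POST)
after = state.unread(incoming)    # reading it once clears every copy
```

Note that this naive version trusts every feed to report ids honestly.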

And by the way, the same thing applies to feeds themselves, not just entries. Feedburner, for example, does the right thing and passes through the id of a source Atom feed into its own Atom feed version. However, if you subscribe to both the source and the Feedburner version of a feed, Google Reader doesn't identify them as the same feed. Of course, if either is RSS, I'd assume all bets are off.

So, in summary, Atom supports doing the Right Thing. The Atom-based Planets do the Right Thing. Google Reader doesn't take advantage of this.

Categories: google, atom_format, blogging

Comments (9)

Eric Florenzano on Nov. 6, 2008:

I totally agree! Every time you, Simon Willison, or a handful of other people post anything, I get about 4 copies of that post in my inbox. It's not frustrating enough to make me change any of my subscriptions, but it's frustrating enough to be...well...frustrating.

Andreas on Nov. 6, 2008:

ohh i so agree! this problem has grown now with everybody writing a post once a day in november!

michele on Nov. 6, 2008:

same here... :-(

daryl on Nov. 6, 2008:

Thanks for looking into this. I am glad it was not just me noticing this :)

gmf on Nov. 6, 2008:

Just for your interest:

http://planet.thehazeltree.org/

Phil Ringnalda on Nov. 6, 2008:

Last time I remember having this discussion, the bottom line was that to avoid preemptive posts, where I guess your id for a particular post (much easier with some ids) and publish one before you can, the first time a reader spots duplicates from two particular sources, it has to ask which one is authoritative for what part of the id.

Hard to imagine a pretty UI that asks you to figure out how someone composes their ids, and what parts identify the feed, the post, and the provider.

James Tauber on Nov. 6, 2008:

You know, Phil, that does sound familiar.

Calvin Spealman on Nov. 7, 2008:

If Google Reader gets around to fixing this and being smarter about marking things read between feeds, I'd love to see the extra mile and have it honor my web history (which I use via Google Toolbar in Firefox and Google Chrome) and mark items read if I've gone to the link before it even arrived. I get my own posts from the feeds, and often have read something from a link in another blogger's post before getting to the original item.

Chris Leary on Nov. 10, 2008:

Agreed -- I've wanted this feature for a long time. No reason to mark the same article as read twice!

Created: Nov. 6, 2008
Last Modified: Nov. 6, 2008
Author: James Tauber