Fix 303 with client-side redirects
April 6, 2012
I am trying to stay far away from the current TAG discussions of httpRange-14 (now just HR14). I did my time, years ago. I came up with the best solution to date: use “303 See Other”. It’s not pretty, but so far it is the best we’ve got.
I gather now the can of worms is open again. I’m not really hungry for worms, but someone mentioned that the reason it’s open again is that use of 303 is just too inefficient. And if that’s the only problem, I think I know the answer.
If a site is doing a lot of redirects, in a consistent pattern, it should publish its rewrite rules, so the clients can do them locally.
Here’s a strawman proposal:
As an example deployment, consider DBPedia. Everything which is the primary subject of a Wikipedia entry has a URL of the form http://dbpedia.org/resource/page_title. When the client does a GET on that URL, it receives a 303 See Other redirect to either http://dbpedia.org/data/page_title or http://dbpedia.org/page/page_title, depending on the requested content type.
So, with this proposal, DBPedia would publish, at http://dbpedia.org/.well-known/rewrite-rules this content:
RewriteRule /resource/(.*) /data/$1 303
This would allow clients to rewrite their /resource/ URLs and fetch the /data/ pages directly, without ever going through the 303 redirect dance.
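To make this concrete, here is a minimal client-side sketch in Python. It is my own illustration, not part of the proposal text: the exact file format is an assumption (one Apache-style RewriteRule per line, as in the example above), and the `parse_rules` and `rewrite` names are hypothetical.

```python
import re
import urllib.parse

def parse_rules(text):
    """Parse a hypothetical /.well-known/rewrite-rules document.

    Each line is assumed to follow the mod_rewrite-like shape used above:
        RewriteRule <pattern> <substitution> <status>
    """
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "RewriteRule":
            _, pattern, subst, status = parts
            # Translate $1-style backreferences into Python's \1 form.
            rules.append((re.compile(pattern), subst.replace("$1", r"\1"), status))
    return rules

def rewrite(url, rules):
    """Apply the first matching 303 rule to the URL's path locally,
    skipping the network round trip the redirect would have cost."""
    parts = urllib.parse.urlsplit(url)
    for pattern, subst, status in rules:
        m = pattern.match(parts.path)
        if m and status == "303":
            return urllib.parse.urlunsplit(parts._replace(path=m.expand(subst)))
    return url  # no rule matched; fetch the original URL as usual
```

A client would fetch the rules document once per origin, cache it, and run every candidate URL through `rewrite` before issuing a GET.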
The content-negotiation issue could be handled by traditional means at the /page/* address. When the requested media type is not a data format, the response could use a Content-Location header, or a 307 Temporary Redirect. The redirect is much less painful here; it is a rare operation compared to the number of requests a Semantic Web client makes when fetching all the data about a set of subjects.
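As a sketch of that fallback (again my own illustration, not the post's: the media-type list, the /data/-to-/page/ mapping, and the `handle_data_request` name are all assumptions based on the DBPedia example), a handler for the rewritten target might look like:

```python
# Hypothetical handler for a rewritten data URL. Data formats are served
# directly; a non-data request (e.g. a browser asking for HTML) gets the
# rare one-time 307 redirect described above.
DATA_TYPES = ("text/turtle", "application/rdf+xml", "application/ld+json")

def handle_data_request(path, accept):
    """Return (status, headers, body) for a GET on a /data/ path."""
    if any(t in accept for t in DATA_TYPES):
        body = b"# RDF serialization of " + path.encode("utf-8")  # placeholder
        return 200, {"Content-Type": "text/turtle"}, body
    # Not a data format: redirect once to the human-readable page.
    return 307, {"Location": path.replace("/data/", "/page/", 1)}, b""
```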
My biggest worry about this proposal is that RewriteRules are error-prone: if these files get out of date, or a client implementation is buggy, the results would be very hard to debug. I think this could be largely addressed by Web servers generating this resource at runtime, serializing the appropriate parts of the internal data structures they use for rewriting.
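One way to do that, sketched here (the table layout and the `rewrite_rules_document` name are my invention), is to generate the published document from the same structure the server's own redirect handler consults:

```python
# Hypothetical: the server's internal redirect table, the single source
# of truth consulted by its 303 handler.
REDIRECT_TABLE = [
    ("/resource/(.*)", "/data/$1", 303),
]

def rewrite_rules_document(table=REDIRECT_TABLE):
    """Serialize the internal table as the /.well-known/rewrite-rules body,
    so the published rules cannot drift from what the server actually does."""
    return "\n".join(
        f"RewriteRule {pattern} {subst} {status}"
        for pattern, subst, status in table
    ) + "\n"
```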
This could be useful for the HTML Web, too. I don’t know how common redirects are in normal Web browsing or Web crawling. It’s possible the browser vendors and search engines would appreciate this. Or they might think it’s just Semantic Web wackiness.
So, that’s it. No more performance hit from 303 See Other. Now, can we close up this can of worms?
ETA: Added the DBPedia example; also clarified the implications for the HTML Web.