UPDATE 2023-03-27: This page is obsolete, as it refers to a prior version of this blog. However, it may be of historical interest.
As noted in my discussion of URI rewriting, we can use Apache to enforce canonical URI forms for HTML files and directories, but we need a plugin to enforce canonical forms for URIs handled by Blosxom. I’ve therefore written a new canonicaluri plugin. It checks whether the requested URI is in the canonical form for the type of page being requested and, if it is not, does a browser redirect to the canonical form of the URI.
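The heart of the plugin is a compare-and-redirect step. Here is a minimal Python sketch of that step. It is not the plugin's actual code. The function name redirect_response, its parameters, and the choice of a permanent (301) redirect are all assumptions, because the text above says only "browser redirect".

```python
from http import HTTPStatus


def redirect_response(base_url, requested_path, canonical_path):
    """Return (status line, headers) for a redirect, or None if no redirect is needed.

    Hypothetical helper: base_url is the blog root (e.g. "http://www.example.com/blog"),
    requested_path and canonical_path are paths relative to that root.
    """
    if requested_path == canonical_path:
        return None  # already canonical: serve the page normally
    # Assumption: a permanent redirect; the original text does not say which code is used.
    status = HTTPStatus.MOVED_PERMANENTLY
    return f"{status.value} {status.phrase}", [("Location", base_url + canonical_path)]
```

A permanent redirect tells browsers and crawlers to treat the canonical URI as the page's real address. A temporary (302) redirect would work just as well for sending the browser to the right place.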
The canonical forms are defined as follows:
- URIs for the blog root, categories, and date-based archives should not have an index.* component if the flavour being requested is the default flavour (normally “html”). If an index.* component is not present, the URI should have one (and only one) trailing slash.
- URIs for individual entry pages should not have a trailing slash. They also should not have a flavour extension if the flavour being requested is the default flavour (e.g., “html”).
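The rules above can be sketched in Python as a function from a requested path to its canonical form. This is a hedged illustration, not the plugin's code. Several parts are assumptions. The page type is passed in as is_entry, whereas the real plugin works it out from Blosxom's own state after the extensionless plugin has run. DEFAULT_FLAVOUR is assumed to be "html". The forms used for non-default flavours (index.rss for archives, foo.rss for entries) are also assumptions, since the rules above only spell out the default-flavour case.

```python
import re

DEFAULT_FLAVOUR = "html"  # assumption: the blog's configured default flavour

# Matches ".../index.FLAVOUR" or ".../name.FLAVOUR"; group 1 is the path
# without the index component or extension, group 2 is the flavour.
_FLAVOURED = re.compile(r"(.*?)(?:/index)?\.(\w+)")


def split_request(path):
    """Split a request path into (base, flavour); flavour is None if absent."""
    path = path.rstrip("/")  # trailing slashes never belong to the base
    m = _FLAVOURED.fullmatch(path)
    if m:
        return m.group(1), m.group(2)
    return path, None


def canonical_path(path, is_entry, default_flavour=DEFAULT_FLAVOUR):
    """Return the canonical form of a path relative to the blog root."""
    base, flavour = split_request(path)
    flavour = flavour or default_flavour
    if is_entry:
        # Individual entries: no trailing slash, and an extension only
        # for a non-default flavour (assumed form: "/foo.rss").
        return base if flavour == default_flavour else f"{base}.{flavour}"
    # Blog root, categories, date archives: exactly one trailing slash, and
    # an index.* component only for a non-default flavour.
    if flavour == default_flavour:
        return base + "/"
    return f"{base}/index.{flavour}"


def redirect_target(path, is_entry):
    """Return the canonical path if a redirect is needed, else None."""
    canonical = canonical_path(path, is_entry)
    return None if canonical == path else canonical
```

With "foo" as a category, redirect_target("/foo/index.html", False) returns "/foo/". With "foo" as an entry, redirect_target("/foo.html", True) returns "/foo". An already canonical path returns None, so no redirect happens.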
For example, if you request either of the URIs
http://www.example.com/blog/foo
http://www.example.com/blog/foo/index.html
where “foo” is a category and “html” is the default flavour, this plugin will force a redirect to the canonical URI
http://www.example.com/blog/foo/
Similarly, if you request either of the URIs
http://www.example.com/blog/foo/
http://www.example.com/blog/foo.html
where “foo” is an individual entry and “html” is the default flavour, this plugin will force a redirect to the canonical URI
http://www.example.com/blog/foo
Note that this plugin should be used in conjunction with the extensionless plugin and should be configured to run after that plugin, in order to recognize extensionless URIs for individual entries; otherwise redirection will fail for individual entry pages.
Also note that this plugin relies on the Apache URI rewriting rules to enforce the restriction that a URI should never have more than one trailing slash. As written, the plugin can’t handle this case. It gets the URI path from the path_info() function, and by the time the plugin sees the path_info() value, any excess trailing slashes in the original URI have already been stripped.
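Catching this case requires looking at the raw request path before the extra slashes are removed. The following Python sketch shows the test and the corresponding fix applied to that raw path. The helper names are hypothetical, and the code only stands in for the logic of the Apache rewrite rules, which are not reproduced here.

```python
import re

# Two or more slashes at the very end of the path.
_EXTRA_SLASHES = re.compile(r"/{2,}\Z")


def has_extra_trailing_slashes(raw_path):
    """True if the raw request path ends in more than one slash."""
    return _EXTRA_SLASHES.search(raw_path) is not None


def collapse_trailing_slashes(raw_path):
    """Replace a run of trailing slashes with a single slash."""
    return _EXTRA_SLASHES.sub("/", raw_path)
```

For example, collapse_trailing_slashes("/blog/foo///") returns "/blog/foo/", which is the form a redirect should point to. A path that already ends in a single slash is left unchanged.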
See the plugin code itself for the full documentation. If you run into problems with the plugin (or if you just use it and like it), please send me email.