Similar Posts: WordPress Plugin

Updated

13th January, 2007

Similar Posts Version 2.0.0 beta is now available. Largely rewritten, it includes better handling of extended character sets, some new options, and many more possible styles of display. Version 2 now has its own page where future developments will be documented.

Comments are now closed for this post but can be added to the new page.

  • Version 1.14 just fixes two bugs. The Posts plugins were not working if more than one was installed! Also, some users were getting odd characters appearing where they shouldn’t.
  • Version 1.13 allows some parameters (like ‘before_title’) to be blank, or to be more complex. It also allows trimming an excerpt so that it ends with a word or a sentence and not in mid-word. NB excerpt_length is now counted in characters and not words as previously.
  • Version 1.12 fixes a bug with the option to show static pages.
  • Version 1.11 improves the sanity checking of parameter values to avoid database errors. Also lets you skip over a number of posts if you so wish (though this makes more sense for the related Recent Posts plugin).
  • Version 1.10 adds the ability to exclude certain authors on a multi-author blog. The text to show if no matches are found is now customisable and there is a new option for displaying links. All the options can be set via the options page but new in this version they can also be specified via a query-style parameter. This gives the flexibility to use Similar Posts in several places with different behaviour. I have also built three new plugins which use the same infrastructure: Random Posts, Recent Posts, and Recent Comments.
  • Version 1.03 adds the ability to exclude static pages or certain categories of post. It also has an improved stopword list based on the one used by MySQL.
  • Version 1.02 is a bug-fix and security release. It protects against a vulnerability where stray characters in the matching terms could cause database errors. This update also fixes a potential naming conflict between the internal names of its options and those of other plugins and establishes sensible default values for these options. When you update the installation you should visit the options page and check the settings are to your liking.
  • Version 1.01 restores the ability to use keywords or otherwise tweak the terms used to find similar posts. It also allows you to import keywords previously assigned using the Related Posts plugin.
  • Version 1.00

Description

I’ve been looking for a better alternative to the Related Posts plugin for WordPress. At least better on a blog like mine … and maybe yours too. Related Posts finds matches to other posts using the post’s title or any keywords that you care to define. The titles of my posts unfortunately reflect their context rather than their content, and I have too many of them to go back and decide on keywords for them all. So I wrote a new plugin, Similar Posts, using a different algorithm which finds related posts based on the contents of a post and the pattern of word-usage rather than just its title.

You can see it in operation in the ‘Related Reading’ section of my sidebar (on single-post pages).

Instructions

  1. Download the latest version of Similar Posts.
  2. Upload the whole plugin folder (Similar_Posts) to your /wp-content/plugins/ directory. (Similar Posts can be installed without uninstalling Related Posts if you want to try out the difference.)
  3. Go to your Admin|Plugins page and activate Similar Posts. This will automatically add an index to your posts table to enable fast matching. Don’t be alarmed if this takes a few moments.
  4. Put <?php similar_posts(); ?> at the place in your WP loop where you want the list of similar posts to appear. By default the plugin wraps each post with <li> and </li> but that can be changed. Use the Admin|Options|Similar Posts page to set how you want the list of posts displayed.

Acknowledgements

Similar Posts is based on Related Posts 2.02 by Alexander Malov and Mike Lu. I’ve also used some code from Rich Boakes and Ken Cheung.

Under the Hood

By default Similar Posts scans a post each time it needs to find words to match. For long posts this can add an unwanted overhead. The plugin can speed up the process by caching the search terms as a custom field (named ‘similarterms’). When a post is published for the first time or subsequently edited the custom field gets updated. Since you probably have a lot of posts you won’t want to edit each one manually to cache the terms. Instead, the Admin|Options|Similar Posts page lets you process all your posts in one go. It won’t overwrite any terms that are already cached so there is also a button to clear all terms.
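To give a feel for what "scanning a post for words to match" might involve, here is a minimal sketch. It is not the plugin's actual code: the function name, the tiny stop list, and the "keep the most frequent words" rule are all assumptions made for illustration. The resulting terms are the kind of thing that could be cached in the ‘similarterms’ custom field.

```php
<?php
// Sketch only: NOT the plugin's real code. The function name, the stop
// list, and the frequency-based ranking are assumptions for illustration.
function sketch_extract_terms(string $content, array $stopwords, int $max = 20): array
{
    // Drop markup and normalise case so "Cat" and "cat" count together.
    $text = strtolower(strip_tags($content));

    // Split into words; format 1 returns the words as a list.
    $words = str_word_count($text, 1);

    // Discard stop words and words of one or two letters.
    $words = array_filter($words, function ($w) use ($stopwords) {
        return strlen($w) > 2 && !in_array($w, $stopwords, true);
    });

    // Rank the remaining words by how often they occur.
    $counts = array_count_values($words);
    arsort($counts);

    // Keep only the $max most frequent words.
    return array_slice(array_keys($counts), 0, $max);
}

$terms = sketch_extract_terms(
    '<p>Cat cat cat, dog dog and a bird on the mat.</p>',
    ['and', 'the'],
    2
);
echo implode(' ', $terms), "\n";
```

Note that str_word_count() works on single bytes, so a sketch like this shares the accented-character limitation discussed in the comments further down.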

While you are editing a post you can modify the ‘similarterms’ custom field to add keywords or replace the automatically generated terms altogether. Note that the field will not be visible until you have saved the post at least once. If you ever want to regenerate the default terms just delete the current set.

If you have previously used the Related Posts plugin to assign keywords to posts you can now import them all from the Options page.

The Similar_Posts folder contains a file, ‘en.words.php’, which is used to supply the ‘stop list’ of common words that you want to exclude as search terms. It is supplied this way so that if your blog is in a different language you can use an alternative stop list. Similar Posts checks the WPLANG constant to see which language WordPress is using and looks for a file on that basis. For example, if the language code is ‘fr’ for French, Similar Posts will look for a file ‘fr.words.php’. The language files must be in the same directory as similar-posts.php.
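The file lookup described above could be sketched roughly as follows. The function name is hypothetical, and falling back to ‘en.words.php’ when no matching file exists is my assumption here, not documented behaviour.

```php
<?php
// Sketch only: the function name and the fallback to English are
// assumptions; the post only says a file is looked up by language code.
function sketch_stopword_file(string $lang, string $dir): string
{
    // e.g. 'fr' becomes '<dir>/fr.words.php'
    $candidate = $dir . '/' . $lang . '.words.php';

    if ($lang !== '' && is_file($candidate)) {
        return $candidate;
    }

    // Assumed fallback: use the English list that ships with the plugin.
    return $dir . '/en.words.php';
}

// Usage, with WPLANG standing in for the WordPress language constant:
// $file = sketch_stopword_file(defined('WPLANG') ? WPLANG : '', __DIR__);
```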

The way the list is displayed can be set from the Options|Similar Posts
page. You can exclude certain categories of post, for example, or
change the code that comes before and after the link.

These general options can be overridden in specific cases by passing a query-style parameter, e.g.:

<?php random_posts('limit=10'); ?>
lists 10 random posts
<?php random_posts('none_text=sorry&show_static=false'); ?>
lists the default number of posts, excluding static pages, and specifies what to display if there are none

If you do not specify an option its value is taken from the options page.
This means you can use the template tag in different ways in different places.
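For the curious, the way a query-style parameter overrides the saved options can be sketched like this. It is an illustration only: the function name and the sample saved values are hypothetical, and the plugin's own parser may well differ.

```php
<?php
// Sketch only: the function name and sample values are hypothetical.
function sketch_merge_options(string $query, array $saved): array
{
    // parse_str() decodes 'a=1&b=2' into ['a' => '1', 'b' => '2'].
    parse_str($query, $overrides);

    foreach ($overrides as $key => $value) {
        if (!is_string($value)) {
            continue; // leave nested values such as 'a[]=1' untouched
        }
        if ($value === 'true' || $value === 'false') {
            // Query values arrive as strings, so map them to booleans.
            $overrides[$key] = ($value === 'true');
        } elseif (preg_match('/^\d+$/', $value)) {
            // Whole numbers such as 'limit=10' become integers.
            $overrides[$key] = (int) $value;
        }
    }

    // Later arrays win in array_merge(), so overrides replace saved values
    // and anything not mentioned keeps its options-page value.
    return array_merge($saved, $overrides);
}

$saved   = ['limit' => 5, 'show_static' => true, 'none_text' => 'None found'];
$options = sketch_merge_options('none_text=sorry&show_static=false', $saved);
```

Here ‘limit’ keeps its saved value of 5 because the parameter string does not mention it.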

The full list of parameters is as follows (with the default value in parentheses):

limit
maximum number of posts to show (5)
skip
how many posts to skip before listing (0)
show_static
include static pages (false)
show_private
include password-protected posts (false)
excluded_cats
comma separated list of categories to exclude (by ID) (9999, the default means none)
excluded_authors
comma separated list of authors to exclude (by ID) (9999, the default means none)
none_text
what to show if no posts match–can be plain text or a permalink
before_title
what to show before a link (<li>)
after_title
what to show after a link (</li>)

trim_before
remove the first instance of ‘before_title’ (false)
show_excerpt
include a snippet of the post after the link (false)
excerpt_length
how long an excerpt should be (50 characters)
excerpt_format
‘char’, the default, does nothing, ‘word’ trims the excerpt to the last full word, and ‘sent’ to the full sentence. If the excerpt would be trimmed to nothing no trimming is applied.
ellipsis
add ‘ …’ after the excerpt
before_excerpt
what to show before an excerpt ()
after_excerpt
what to show after an excerpt ()
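The three excerpt_format settings can be illustrated with a short sketch. The function name is hypothetical, it treats a full stop as the only sentence ending, and it counts bytes rather than characters, so it is only a rough model of the behaviour described above.

```php
<?php
// Sketch only: hypothetical helper, single-byte text assumed, and a
// full stop is treated as the only sentence terminator.
function sketch_trim_excerpt(string $text, int $length, string $format = 'char'): string
{
    // Nothing to trim if the whole text already fits.
    if (strlen($text) <= $length) {
        return $text;
    }

    $excerpt = substr($text, 0, $length);

    if ($format === 'word') {
        // Cut back to the last space so no word is split.
        $pos     = strrpos($excerpt, ' ');
        $trimmed = ($pos === false) ? '' : substr($excerpt, 0, $pos);
    } elseif ($format === 'sent') {
        // Cut back to the end of the last complete sentence.
        $pos     = strrpos($excerpt, '.');
        $trimmed = ($pos === false) ? '' : substr($excerpt, 0, $pos + 1);
    } else {
        // 'char' leaves the excerpt exactly as cut.
        return $excerpt;
    }

    // If trimming would leave nothing, no trimming is applied.
    return ($trimmed === '') ? $excerpt : $trimmed;
}

$text = 'The quick brown fox. It jumps over.';
echo sketch_trim_excerpt($text, 12, 'word'), "\n";
echo sketch_trim_excerpt($text, 25, 'sent'), "\n";
```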

Feedback

If you try this plugin leave a comment here to let me know how you get on.

89 replies on “Similar Posts: WordPress Plugin”

  1. thanks father rob. i had no idea you were a jesuit. i was very active in the vietnamese dong hanh movement, which recently joined CLC.

  2. Hey Rob,
    Thank you for the nice plugin.
    I went checking the similarterms custom fields, just to take a look at how good the word selection looked, and found out that there may be an i18n bug.
    Words that have special characters in them are chopped at the special character (sometimes before, sometimes after the character), and no special character ever shows up. Sometimes they appear as blank spaces, sometimes the rest of the word is cut off.
    The language in the blog is defined by the WP variable, and i use the UTF8 option in WP. I am running a local copy of my site for testing.

    I would love to use the plugin!
    Thank you for your attention.

  3. eduardo: The problem is that PHP’s string functions do treat utf-8 encoded strings as though they were 8-bit characters. I use str_word_count to count the words in content and lots of accented characters are treated as separators. I think there is a way round this using regular expressions but it will probably be quite a bit slower. I’ll experiment.

  4. Hi again Rob,
    I am no programmer, but i went checking the manual for str_word_count and found out that there was a third argument added in PHP 5.1, which might make the plugin work with UTF8 characters. I could run only the plugin in PHP5, my server allows me to do that.
    Do you think it could work? There’s one catch: when i look into my database with MySQLAdmin, i see characters like é depicted as é , for example, even though my WP settings are for UTF8.

  5. eduardo: I believe that MySQL itself handles utf-8 but that it’s the admin program which seems to show them incorrectly.

    As for the extra argument in str_word_count … give it a try if you like and let me know. It seems you have to specify the ‘extra’ characters. Since PHP 5 is not available on many hosts I wouldn’t consider it as a general solution.

    I’m trying using regular expressions.

  6. Sorry php was deleted.

    A little addition like: ‘categorysame=true’); could make the software look for sim. posts but only if they’re in the same category as the current one.

  7. Great job, this is what I was exactly looking for. There are few others but not as good as this. Please consider to use a keyword of “related posts” and “related articles” which is commonly used than “similar” so many people will find your plug-in.

  8. Net superior to everything I’ve used in the past, including some software for content analysis. By the way, any intention to add a network analysis component to it (something like AutoMap-a conceptual network analysis program)?

  9. Hey Rob,
    I tried the php5 thing, but it didn’t work, i got the same errors when i ran the php5 function under php4. I suppose it may be because WP is under php4 and i had only your plugin under php5.
    Anyway, this is getting a little too complex for me, i will just wait for your regex solution.
    Thank you for your attention!

  10. eduardo: A new version should be out within days. I’m discovering how much slower development is when you have four related plugins sharing a lot of code!

  11. Is there a way to use the words from similar posts in the metadata keywords… that would be handy… Since it does work… just need some , between the words then.

  12. I have installed Similar Posts Plugin v2.0.4 in wp2.0.5. My site is bi-lingual (English & myLanguage). Similar Posts displays well with English posts, but myLanguage (UNICODE) posts don’t display any similar posts. I checked the ‘similarterms’ box in the edit section of the post, but it does not display any terms. I even put in some terms manually and updated the post. But no result!! Will you please suggest any correction in the script for my expected results?

  13. do you think it’s best to release a new version soon considering that wp 2.1 is coming out on the 22nd?
