Similar Posts

Caveat

Unfortunately, due to ill-health, this plugin has not been developed or supported properly for some years. It works with the latest versions of WordPress (including on this website) but could possibly conflict with any WordPress features added after 2008 — e.g. custom post types — if you use them.

Purpose

This plugin displays a list of posts which are related or similar to the current post.

This is version 2.6.2.0 (download the latest version). It is compatible with WordPress 1.5–2.6.2.

  • 2.6.2.0 fixes a problem with stemming and stop words and offers a new fuzzy matching capability; supplies a .pot file making internationalisation possible; introduces the {imagealt} output tag and allows {excerpt} to output whole sentences; the content filter and the widget can now take a parameter string; output can be automatically placed after post content without editing theme files.
  • 2.6.1.2 fixes the German-language stemmer which should have been encoded as utf8.
  • 2.6.1.1 fixes the Italian-language stemmer which was crashing under PHP4.
  • 2.6.1.0 allows the current post to be marked manually where the automatic mechanism fails; when used as a widget the plugin now honours the setting to show nothing when there is no output; {commenterlink} now applies the appropriate WordPress filter; and fixed a problem with some installations not finding the right language files.
  • 2.6.0.1 fixes the option to include attachments and adds a parameter to the {imagesrc} output tag to append a suffix to the image name.
  • 2.5.0.11 has a new option to include posts which are attachments; a new output template tag {authorurl} which points to the archive of the author’s posts; new behaviour for the {php} output tag which can now accept other output tags in the code; and includes a fix for MySQL problems in some locales.
  • 2.5.0.10 provides the ability to select from two algorithms for term extraction; allows you to specify post relationships by hand; and fixes a problem indexing tags in some languages.
  • 2.5.0.9 adds an option to match the current post’s author and extends the options for snippet and excerpt output tags to make the ‘more’ text into a link.
  • 2.5.0.8 adds an option to show posts by status, i.e., published/private/draft/future, changes the {categorynames} and {categorylinks} output tags by applying the ‘single_cat_name’ filter, and fixes a bug in WordPress pre-2.2 that stopped installation code running on Windows servers.
  • 2.5.0 improves the CJK matching algorithm by using bigrams. Also introduces a new output tag {imagesrc}, and adds more parameters to {image}. Fixes bugs with empty categories, excluded posts, and the option to omit current posts.
  • 2.5b28 improves the matching algorithm and adds an experimental mode for blogs in Chinese, Korean, or Japanese.
  • 2.5b27 fixes a bug with the bulk indexing of tags.
  • 2.5b25 makes some important changes: the {image} output tag now serves real thumbnails (couple of bug fixes too); output can now be sorted as you choose with sub-headings included; the {date:raw} tag modifier has been added to help the sorting; the ‘trim_before’ option has been replaced with the more logical ‘divider’.
  • 2.5b24 fixes the content filter to stop recursive replacement, {gravatar} to allow for ‘identicon’ etc., and {commenter} to allow trimming, and removes a warning in safe mode.
  • 2.5b23 brings a new option to filter on custom fields and adds proper nesting of braces in {if}.
  • 2.5b22 moves the manage menu under settings as a subpage, restores automatic indexing on activation, fixes conflicts with the legacy Similar Posts Feed plugin, fixes bugs in several output tags, and introduces the option to show only pages.
  • 2.5b20 doubles the speed of indexing and reduces the memory footprint considerably.
  • 2.5b19 fixes a bug when snippets are stripped of extra tags.
  • 2.5b18 fixes a problem with filtering the output and introduces the conditional tag {if:condition:yes:no}.
  • 2.5b16 fixes a problem with {php}.
  • 2.5b15 fixes some more installation problems and one or two bugs, and adds the ‘included posts’ setting.
  • 2.5b14 fixes some kinds of installation problems.
  • 2.5b11 fixes some widget problems.
  • 2.5b10 fixes some (?) of the problems folks have been having with no posts found. Most such errors seem to arise when the proper table is not created, and this version addresses that.
  • 2.5b9 has new features and improvements.
  • 2.3.6 restores the widgetiness I managed to remove in 2.3.5!
  • 2.3.5 has been rebuilt to save memory and can match the current post’s tags. It also fixes a bug with categories in WordPress < 2.3.
  • 2.3.4 now works as a widget.
  • 2.3.3 beta adds the ability to include as well as exclude categories and authors and is able to find posts by tag.
  • 2.3.2 beta fixes a conflict between tags and categories.
  • 2.3.1 beta fixes a stupid bug in category exclusion.
  • 2.3.0 beta is compatible with WP 2.3, fixes the {author} bug, and a number of problems related to versions of MySQL.
  • 2.1.1 beta fixes a badly chosen fallback value for the number of terms used to match similar posts.

Ideally, similarity or relatedness would be based on a post’s meaning. Tagging systems try to add meaning after the fact but suffer from two deficiencies, one practical and the other theoretical. When a blog already has many posts it can be impractical to retrofit a tagging system by tagging every post by hand. ‘Automatic’ services, like Yahoo’s, tend to produce too many suggestions which need to be culled, again by hand.

The theoretical problem with tagging is that it tries to pin down a meaning for a post by categorising it under a small number of types, whether those types belong to a predetermined hierarchy or arise by ‘folk’ classification. In fact, a post has a variety of meanings, a multitude of ways it can be related to other posts. Meaning doesn’t just lie in the intention of the author or in the classification of the reader; meaning also inhabits the text itself. Meaning is in the words.

The Similar Posts plugin compares posts by comparing their words. MySQL has a sophisticated full-text searching facility with a carefully tuned algorithm for judging the similarity between texts. Similar Posts extracts representative words from a post’s content, title, and tags and uses the full-text index to find the best matches between posts. This simple approach gives surprisingly good results.

The results can be tweaked in several ways to tailor them for your blog. By default the plugin chooses the 20 most frequent words to make its matches but the number is adjustable. It is worth experimenting to see how many words give the best results for your blog; the number has hardly any impact on speed, even if you set the value high enough to include the whole post. The relative importance given to words in your title may be adjusted so that well-chosen titles can be used to advantage or titles with little relevance downplayed. Similarly, tags can be used to improve matching or not, according to your blog and its needs.
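One way to picture the relative weights is as a single scored full-text query. The following is only a sketch, not the plugin's actual code: the table name similar_posts_index, its columns, and the function build_similarity_sql are all hypothetical, and it assumes each column has its own FULLTEXT index. Only the MATCH ... AGAINST syntax itself is standard MySQL.

```php
<?php
// Sketch only: the table name, column names and function name are hypothetical.
// Assumes a MySQL table with a separate FULLTEXT index on each scored column.
// Column names come from trusted configuration, never from user input.
function build_similarity_sql(array $weights, int $limit = 5): string
{
    $parts = [];
    foreach ($weights as $column => $weight) {
        if ($weight <= 0) {
            continue; // a zero weight leaves that column out of the score
        }
        // One bound parameter per column carries the extracted terms.
        $parts[] = sprintf('%.2F * MATCH (%s) AGAINST (?)', $weight, $column);
    }
    if ($parts === []) {
        throw new InvalidArgumentException('At least one weight must be positive.');
    }
    $score = implode(' + ', $parts);

    // The last placeholder is the current post's ID, so a post never matches itself.
    return "SELECT post_id, ($score) AS score FROM similar_posts_index"
        . " WHERE post_id <> ? HAVING score > 0"
        . " ORDER BY score DESC LIMIT " . max(1, $limit);
}
```

Calling build_similarity_sql(['content' => 0.5, 'title' => 0.3, 'tags' => 0.2]) would produce a query with three scored columns; raising the title weight would favour posts whose titles share terms with the current post.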

It is also possible to override the automatic similarity ranking by using a custom field. In the post edit screen create a custom field called ‘sp_similar’ with the ID value of the post to which you wish to ‘link’. You can link to multiple posts by entering a comma-delimited list of IDs.
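As an illustration of how a comma-delimited ‘sp_similar’ value might be turned into a clean list of post IDs, here is a minimal sketch. The function name parse_similar_ids is hypothetical and this is not the plugin's own parsing code; in WordPress the raw value could be read with get_post_meta(), which is deliberately left out so the example runs on its own.

```php
<?php
// Sketch only: parse_similar_ids() is a hypothetical helper, not part of the plugin.
function parse_similar_ids(string $raw): array
{
    $ids = [];
    foreach (explode(',', $raw) as $piece) {
        $piece = trim($piece);
        // Keep only pieces made up entirely of digits.
        if (preg_match('/^\d+$/', $piece) === 1 && (int) $piece > 0) {
            $ids[] = (int) $piece;
        }
    }
    // Drop duplicates while keeping the order in which IDs were entered.
    return array_values(array_unique($ids));
}
```

With this sketch, a field value such as "12, 7,,abc, 12" would yield the IDs 12 and 7, silently ignoring the empty and non-numeric entries.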

The plugin has a settings page which lets you change how the output is generated and displayed. There is also a management page where you can change settings which affect the index.

Note: Similar Posts needs to know the ID of the post for which it is generating related posts. WordPress keeps track of that information in a global variable but unfortunately some other plugins can corrupt the data before Similar Posts gets a chance to use it. Similar Posts tries various tricks to get round this but sometimes it fails. The usual symptom is a list of similar posts that stays the same from page to page. You can help Similar Posts out by marking the current post manually by adding a line to your theme files. Find the place where the_content(); is used to display the current post and right after it put similar_posts_mark_current();.

Installation Instructions

  1. If upgrading from a previous version, first deactivate the plugin via the Plugins page and delete the plugin folder from your server.
  2. If you have been using the Similar Posts Feed plugin you should deactivate it as it is now obsolete.
  3. Upload the plugin folder to your /wp-content/plugins/ directory. You will also need to install the Post-Plugin Library.
  4. Go to your admin Plugins page and activate Similar Posts. This will automatically add a new table to enable fast, flexible full-text matching. If the plugin reports that there was a problem creating the table, first try deactivating and reactivating the plugin.
  5. Put <?php similar_posts(); ?> at the place in your theme files where you want the list of similar posts to appear. Lorelle on WordPress has a good guide to modifying themes for plugins. If you are averse to editing template files you can also place the post listing automatically, either as a widget in the sidebar of your widget-aware theme or after each post (from the plugin’s Placement submenu).
  6. Use the admin Settings|Similar Posts pages to set all the available options. Alternatively, the options can be overridden by passing a parameter to the similar_posts template tag.

Usage and Options

The configuration page will help you to set up the plugin to your satisfaction.

The Index Management Page

Using this settings subpage you can re-index your blog. There are two main settings which affect the indexing.

The first setting concerns extended characters. PHP is, by default, not very good at handling text that isn’t in English and you might find Similar Posts mangles extended characters. If so, you can get the plugin to use PHP’s multi-byte string library (mbstring) if it is available.

The second setting attempts to handle words with related meanings. For example, ‘animal’ and ‘animals’ should probably not count as two distinct words, nor ‘follow’, ‘follows’, ‘following’, etc. You can choose to build the index using a stemming algorithm that groups such words as one (if there is one available for your language) or you can try the fuzzy matching algorithm. Whether it is better to be strict or to be relaxed will depend on your website.
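To make the idea of stemming concrete, here is a deliberately naive sketch that strips a couple of English suffixes and then counts words by stem. It is a toy for illustration, not one of the stemmers shipped with the plugin; the function names naive_stem and count_stems are hypothetical, and a real stemmer handles far more cases.

```php
<?php
// Toy suffix stripper for illustration only; real stemmers are much more careful.
function naive_stem(string $word): string
{
    $word = strtolower($word);
    foreach (['ing', 's'] as $suffix) {
        $stemLength = strlen($word) - strlen($suffix);
        // Only strip when at least three characters of stem remain.
        if ($stemLength >= 3 && substr($word, -strlen($suffix)) === $suffix) {
            return substr($word, 0, $stemLength);
        }
    }
    return $word;
}

// Count words after reducing each one to its stem.
function count_stems(array $words): array
{
    $counts = [];
    foreach ($words as $word) {
        $stem = naive_stem($word);
        $counts[$stem] = ($counts[$stem] ?? 0) + 1;
    }
    return $counts;
}
```

Here ‘animal’ and ‘animals’ end up under one stem, as do ‘follow’, ‘follows’ and ‘following’, so the frequency counts reflect the concept rather than the surface form.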

A third setting is for blogs written mainly in Chinese, Korean, or Japanese. The MySQL fulltext index used by Similar Posts has problems with these languages but this setting tries several ways to work around the issues. The setting currently only works when posts are encoded as UTF-8. I would be very glad to get opinions from users familiar with these languages.
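The changelog above notes that version 2.5.0 uses bigrams for CJK matching. As a rough sketch of what splitting text into bigrams means (assuming UTF-8 input, as the setting requires), each pair of adjacent characters is treated as if it were a word. The function name cjk_bigrams is hypothetical and the plugin's real implementation may differ, for example in how it treats spaces and punctuation.

```php
<?php
// Sketch only: cjk_bigrams() is a hypothetical helper, not the plugin's code.
function cjk_bigrams(string $text): array
{
    // The /u modifier makes PCRE treat the string as UTF-8, so each array
    // element is one whole character rather than one byte.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // the input was not valid UTF-8
    }
    $bigrams = [];
    for ($i = 0; $i + 1 < count($chars); $i++) {
        // Overlapping pairs: characters 0-1, 1-2, 2-3, ...
        $bigrams[] = $chars[$i] . $chars[$i + 1];
    }
    return $bigrams;
}
```

For the three-character string ‘東京都’ this gives the two overlapping pairs ‘東京’ and ‘京都’, which a word-based index can then store and match like ordinary words.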

To avoid excessive memory use the indexing routine processes posts in batches of 100. This figure can be reduced to shrink the memory consumption even further.
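The batching idea can be sketched as a loop that only ever holds one batch of posts in memory. This is an assumption about the general technique rather than the plugin's code: index_in_batches and the two callbacks are hypothetical, and in practice the fetch callback would run a database query with a LIMIT and an offset.

```php
<?php
// Sketch only: the function and callback names are hypothetical.
// $fetchBatch(int $offset, int $limit) returns an array of posts (empty when done).
// $indexPost($post) indexes a single post.
function index_in_batches(callable $fetchBatch, callable $indexPost, int $batchSize = 100): int
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }
    $indexed = 0;
    for ($offset = 0; ; $offset += $batchSize) {
        // Only one batch is held in memory at a time.
        $posts = $fetchBatch($offset, $batchSize);
        if (count($posts) === 0) {
            break; // no more posts to index
        }
        foreach ($posts as $post) {
            $indexPost($post);
            $indexed++;
        }
    }
    return $indexed;
}
```

A smaller batch size means more round trips to the database but a lower peak memory use, which is the trade-off the setting exposes.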

Language Issues

The underlying MySQL full-text indexing is obviously very locale-dependent: how words are divided or punctuation handled, what words are treated as noise, etc. all vary from language to language. For the Similar Posts plugin to work well, the version of MySQL on your server must be properly set up for the appropriate language.

Similar Posts generates the terms it matches on by analysing the word frequency of a post while ignoring the most common ‘noise’ words — in English, words like ‘of’, ‘and’, ‘across’, ‘someone’, etc. It uses a so-called ‘stop list’ of common English words to ignore. In fact it uses the stop list a standard English installation of MySQL uses. Obviously this list will be useless for other languages so Similar Posts makes the stop list pluggable.
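The word-frequency approach described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the plugin's actual routine: extract_terms is a hypothetical name, the stop list is passed in as a plain array, and lower-casing with strtolower() only handles ASCII letters reliably.

```php
<?php
// Sketch only: extract_terms() is a hypothetical helper, not the plugin's code.
function extract_terms(string $text, array $stopWords, int $numTerms = 20): array
{
    // Split on anything that is not a letter or digit; /u treats input as UTF-8.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return []; // the input was not valid UTF-8
    }
    $stop = array_flip($stopWords); // flip for fast isset() lookups
    $counts = [];
    foreach ($words as $word) {
        if (!isset($stop[$word])) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    // Most frequent first; ties are broken alphabetically so results are stable.
    uksort($counts, function ($a, $b) use ($counts) {
        return ($counts[$b] <=> $counts[$a]) ?: strcmp((string) $a, (string) $b);
    });
    return array_map('strval', array_slice(array_keys($counts), 0, $numTerms));
}
```

Swapping in a different stop list is just a matter of passing a different array, which is the sense in which the list is pluggable.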

The Similar_Posts folder contains a subfolder, ‘languages’, with stop lists and stemmers for German, English, French, Spanish, and Italian. The plugin checks the WPLANG constant (defined in wp-config.php) to see which language WordPress is using and looks for a file on that basis. If WPLANG is undefined or the appropriate file cannot be found the default English list is used.
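The lookup with an English fallback might look something like the sketch below. The directory layout (one subfolder per two-letter language code), the file name stoplist.php, and the function pick_language_file are all assumptions made for illustration; the plugin's real file naming is not documented here.

```php
<?php
// Sketch only: the folder layout, file name and function name are hypothetical.
function pick_language_file(string $languagesDir, string $wplang, string $fileName = 'stoplist.php'): string
{
    $default = "$languagesDir/en/$fileName";

    // WPLANG values look like "de_DE"; assume the first two letters name the language.
    $code = strtolower((string) substr($wplang, 0, 2));
    if (preg_match('/^[a-z]{2}$/', $code) !== 1) {
        return $default; // WPLANG empty or malformed
    }

    $candidate = "$languagesDir/$code/$fileName";
    // Fall back to English when no file exists for the detected language.
    return is_file($candidate) ? $candidate : $default;
}
```

Inside WordPress the second argument could be supplied as defined('WPLANG') ? WPLANG : '', so an undefined constant falls through to the English default.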

If you are looking for help setting up a stop list in a language other than English, a good resource can be found at http://www.ranks.nl/stopwords. Stemmers in PHP are harder to come by. You can work out how to adapt any you find by inspecting the provided stemming files.

414 replies on “Similar Posts”

  1. Naveed Taj Ghouri: Thank you for your comment. I have just visited your site and I think I might know the problem: the text in your posts seems to be presented as images. Is that correct? If it is then Similar Posts would have no text to index. It should however be able to give matches based on your tags and titles and if it is giving a fatal error I need to fix that.

    Can you submit a bug report from the Similar Posts settings page? It will give me a little extra information that might help. Thank you.

  2. gero: You are right! I investigated and found a bug in the bulk indexing of tags that isn’t present in the indexing of individual posts. When you save a post it gets reindexed and the matches suddenly get better. I have fixed the bug and will be posting an update shortly. Thanks for finding the problem!

  3. nonhocapito: I can understand how irritating that must be. It feels a bit technical to have as an ordinary option. Would it satisfy you to have it as a ‘hidden’ option you can get at from the global options.php?

  4. Tony: As it stands, Similar Posts needs to have a post to find similarities to, so it can’t be done. Since searches tend to be on short phrases or a few keywords my first thought is that similarity might not work very well — a simple site search might be better.

    What do you think? If the idea has some merit I’d be glad to look into implementing it.

  5. Rob, thanks for the great plugin. I am using it to show 5 titles of similar posts on my single post page. Is there any way to have the output include rel="nofollow" so that the page rank from the post is returned only to the home page?

    Thanks,
    Jeremy

  6. Jeremy: You just have to build the link yourself, i.e., instead of
    <li>{link}</li>

    use

    <li><a href="{url}" rel="nofollow">{title}</a></li>

    You can also include a title attribute if you wish.

  7. Hello Rob,
    just installed version 2.5b27 and rebuilt the index.
    Now the similar posts really look great again! Thanks for the bugfix.

    My advice is, that everybody should rebuild the index (options -> manage the index), to get better results.

    Greetings
    gero

  8. gero: I spotted another bug — if you use the extended character option — and uploaded v2.5b28 to fix it.

    The new version also contains an experiment in handling Chinese, Korean, and Japanese text.

  9. Hi Rob,

    I just upgraded both WP and Similar Posts and I have 2 questions.

    1) I have enabled the sort order on date-descending, but the output does not display sorted that way.

    2) I understand that the similarterms custom field is now in a table. I used to edit that field frequently to improve results. Is there no way to do that now? Could an edit feature be added? Tagging is not really a comparable replacement for editing terms. For one thing, you can’t remove terms that way.

    Thanks for the great plugin. I hope you can help with these issues.

  10. Rob, I can get the string.

    $uri = $_SERVER['REQUEST_URI'];
    $uri = rtrim($uri, '/');
    $new_string = str_replace("/", " ", $uri );
    $final_string = str_replace("-", " ", $new_string);
    

    … and I can pass the string to a WordPress search, albeit, it is not automatic.

    <form method="get" id="searchform" action="http://mysite.us/">
    <div>
    <input type="text" value="<?php echo htmlspecialchars($final_string); ?>" name="s" id="s" >
    <input type="submit" id="searchsubmit" value="Search" >
    </div>
    </form>

    However, I would like to pass along the string to the similar posts plugin to return “Posts that you might have been looking for ...” on the 404 page itself.

    What do you think?

  11. Hello Rob,
    my blog is in german (with umlauts) but I never used the extended character option. I never noticed any big differences in the results. I switched it on now and yes, I must say that after reindexing, the results got again a bit better.
    I love your plugin very much. It is quite amazing, how it finds similar posts without making any manual connections. Great!

    gero

  12. Hi again Rob.

    I figured out the sort problem on my own (I added ‘raw’ to the date format), but I would still like to know if anything can be done about editing the terms.

    I was able to significantly improve the results by manipulating the similarterms in the custom field. I already miss not being able to do that anymore.

    (I mistakenly posted this in comments on a different post on your site. Sorry for the repeat)

  13. Tony: You’d have to hack the function sp_terms_to_match. Right now it takes the ID of the current post and gathers terms to put into the search. You could instead inject your own terms.

    I’ll be interested to see what you come up with!

  14. Mark @ News Corpse: I missed the first comment somehow. I’m glad you figured it out.

    There’s no way to edit the terms now as they are generated on the fly. Does the use of tags not help?

  15. Mark @ News Corpse: Sorry Mark. I just found your original comment in spam and brought it back to life. Please discount my mention of tagging above!

    The table you mention doesn’t exactly store the ‘similar terms’ — rather it stores a pre-processed representation of post content, title, and tags. When it comes to query time, that representation for the current post is further processed to generate the ‘similar terms’ which are then compared with the representations for every other post. Editing the table is possible but you would be editing both the search terms (if they were judged significant) and the text to be searched.

    If you want to experiment load up phpMyAdmin (the wp plugin is very helpful) and see if editing does what you want it to. I’d be interested to know the answer.

  16. Thanks for your response (and for finding my lost question).

    If I understand you, it seems the whole method of operation has changed from previous versions. But let me give you some examples of why access to search terms was useful for me.

    1) There were often multiple duplications of words in similarterms. I sometimes saw a word repeated 7 or 8 times so I would remove some of the duplicates.

    2) The plugin would sometimes pick up words that were not relevant. For instance, in a post about the cost of groceries, the sentence: “My sister paid $3.00 for a quart of milk,” might index milk, quart, paid, and sister. But “sister” isn’t useful for similarity in this context. Plus it could match it with another post about a nun. So I would remove “sister” and get better matches.

    3) I could add terms that might not even be in the post so that it would use them to match other posts that I knew also had those terms. I wouldn’t want to clutter up my tags with these sort of words.

    I know similarity matching is part science, part magic, and nothing will be perfect. You’ve done a damn nice job of it. One idea I had is to have the plugin produce a display of what terms it’s indexing. Then allow a custom field for the user to input additional search terms and terms to exclude (preceded by a minus sign). Then the user could fiddle with the results without having to leave the post to edit the DB.

    Does any of this make sense?

  17. One more thing. In the previous version I noticed that the plugin would set the search terms when WordPress autosaved the post – even if I was not finished writing it. So, many terms didn’t get into the index unless I added them later. However, if I wrote the whole post without entering a title, this would not happen because it didn’t get autosaved until the title was entered. I guess that’s a WP thing, but it definitely affected the plugin’s choice of search terms.

    Does that play into the current version of Similar Posts in any way?

  18. Mark: Last point first: yes WordPress brought in autosave after Similar Posts had hooked into the save function and since I was trying to preserve the custom field once it was saved it got messy.

    1) The duplications were in fact a way of getting MySQL to give extra weight to the most frequent terms

    2) and 3) are very valid points but they rely upon a once and for all indexing of the posts and I have tried to shift to a more dynamic model where you can experiment with different approaches to index and term extraction and the relative weights of different factors.

    The idea of a separate scheme to add or subtract terms is a possibility I will bear in mind–especially if I can find a way that doesn’t impact performance too much.

  19. Thanks. I agree that a dynamic model has some attractive benefits. And overall, a plugin that requires no intervention from the user is the ideal. But I just think that’s fairly impossible and that the ability to fine tune results is useful.

    I had another idea: What if words in the post could be tagged so that the plugin would include/exclude them? Something like include this and exclude that

    You wouldn’t need a custom field for this and it wouldn’t be any more difficult than using italics. It could even be added to the quicktags.js.

  20. Rob,
    2 quick things. The first is, since the update just before this one, Similar posts is including the current post as one that is similar – to itself. In one case tonight, it presented the current post as similar to itself twice in the same post.

    The other thing is, I apologize for confusing the thumbnail issue. What I am hoping for is the ability to specify the image for Similar Posts to use if one is provided in the post. This way, the YouTube videos and any other posts without images will have a thumbnail.
    many thanks,
    Sue

  21. Mark @ News Corpse: That’s a good idea! The only catch is what happens to the added markup if — heaven forbid — you should ever abandon Similar Posts. I’ll give it some more thought. Thanks for being persistent.

  22. Hello Rob,

    I have set the relative importance as follows:
    content:0 % title:100 % tags:0 %
    hoping to get exact match of post titles(which will be names of restaurants in my blog) but it doesn’t seem to work.

    I have chosen “Show nothing if no matches” but irrelevant posts are listed like in these two posts:
    http://www.potatomato.com/seat/2008/04/28/jean-paul-hvin-2/
    http://www.potatomato.com/seat/2008/05/06/chez-inno/

    While these two posts have the same titles but no “similar posts” are listed.
    http://www.potatomato.com/seat/2007/09/28/la-rochelle-minami-aoyama/
    http://www.potatomato.com/seat/2007/09/04/la-rochelle-minami-aoyama-2/

    I have set “treat as Chinese, Korean, or Japanese?” to yes.

  23. seat: Mmm food!

    Similarity is judged according to MySQL’s complicated word-based algorithm, so exact matches are always unreliable. Add in the difficulty Japanese presents to MySQL and all bets are off!

    The current attempt to work with C/J/K is very much a first try. It treats individual characters as if they were ‘words’ — a real hack. The next version which will appear soon is based on digrams which seems to get good results in the literature on similarity but, while better, is not likely to give you exact matches.

    I will be very interested to get feedback from you to make this work better.

  24. Thank you for the quick response. So getting an exact match is no easy feat… I look forward to the new version. Keep up the good work! 🙂

  25. Hey,

    Very nice plugin!

    I was having issues with it lately though, new posts wouldn’t have any related posts found, and the server logs be filled with database errors from WordPress.

    Apparently you have misplaced a closing parenthesis on line 71 when determining whether there are posts to exclude or not, and when there’s not then $exclude_posts is still set to true, resulting in a query with a ” ID NOT IN ( ) ” which of course gets an SQL error.

    
    $exclude_posts = (trim($options['excluded_posts'] !== ''));
    

    while you should have this (same as with $include_posts) :
    
    $exclude_posts = (trim($options['excluded_posts']) !== '');
    

    Thanks again for the great work!
    -fred

  26. hmm… things got messed up in the code block it seems.
    Anyways, here it goes again, hopefully it will work this time:
    $exclude_posts = (trim($options['excluded_posts'] !== ''));
    I think should be this:
    $exclude_posts = (trim($options['excluded_posts']) !== '');

  27. We are upgrading our blog to 2.5 right now (yet to be deployed) and I must say… you are the best plugin author I have come across, hands down. You’ve thought of everything and make it so easy to make adjustments and tweaks. I’m wishing I had an extra thumb so I could give you 3 thumbs up rather than just two.

    Thanks!

  28. hi there. very cool plugin. thanks so much for your efforts!

    I’m a newbie and am embarrassed to ask.

    To work this plugin, I just install it (had the designer do it) and then put this code IN my posts as I write.

    Is that correct?

    the new site, nearly live where we are using your stuff is here:

    http://bradmo.theblogstudio.com/

    THANKS!

    Brad

  29. brad montgomery: It looks like your code got eaten by the comment box so I can’t really say…

    It all depends where you want the similar posts listing to appear. If you want it in a regular spot on all pages of a particular kind then it is best to put it in the theme files. But if you want it appearing inside some posts and not others then you can insert the code <!--SimilarPosts--> where you wish and, as long as the option is turned on, you should see the list.

  30. Hi,nice plugin, but I’m interested if I can show instead of related post titles the related posts custom field.

  31. No matches with updated similar posts plugin. I tried reinstalling without effect. Next step is working with php?????
    I guess the updated plugin is no longer working for me in that case. Too bad. It was a good one for awhile.

  32. Hi, Rob!

    I use next output template:
    {link} – {snipet:50}

    Every post contains the same first part like this:
    “This is post about something
    #Unique part

    How I can delete not_unique part of post (“This is post…”) or any other words from snipets?

    p.s. may be something like {snipet:50;delete:”this is post”,…)

  33. Hi!

    The plugin works fine, but my category name filter doesn’t apply to it. My filter hooks to single_cat_title, list_cats, get_category and the_category, but still the category title comes out unfiltered. Why’s that and what can I do to fix it?

  34. Mikko: Are you willing to try something for me? Open the post-plugin-library/output_tags.php file and find the function otf_categorynames. Replace the return $value with

    return apply_filters('the_category', $value, $ext, '');

    You can do the same with otf_categorylinks too.

    Please let me know if that allows your category name filter to work properly.

  35. Thanks, that does the trick, except that I run into problems with my filter – but that’s my problem, and I can try to solve it now.

    You see, I have names as categories, and the names are formatted “lastname, firstname”. For display purposes, I have a filter that flips them “firstname lastname”. Getting that to work with multiple categories is tricky; this is something that was much easier in Movable Type. But I’m learning here…

  36. I switched from category names to category links (which is what I actually wanted in the first place), and that fixed things for me, everything works now as I want. As soon as I get my tags imported from MT, it’ll be interesting to see how the similarities come up – I expect an improvement, as my MT similar posts feature matched only tags. Even without the tags the results are pretty good. Thanks for the great plugin, I really like the flexibility!

    That change would probably make sense for general purposes, though.

  37. Rob, I’m getting the following error
    Fatal error: Call to undefined function ppl_display_status() in /home/.heebee/maland/mywartremover.com/wp-content/plugins/similar-posts/similar-posts-admin.php on line 74

    I have tried uninstalling and re-installing, deleting files from the server and re-uploading, and everything else I could see in this comment thread. Can you give me any advice?

    thanks

  38. Heath: My apologies — you received a development version from trunk rather than the latest stable version. The links have been fixed. Please download again and things should be OK.
