
Similar Posts: WordPress Plugin

October 4th, 2006

Updated 13th January, 2007

Similar Posts Version 2.0.0 beta is now available. Largely rewritten, it includes better handling of extended character sets, some new options, and many more possible styles of display. Version 2 now has its own page where future developments will be documented.

Comments are now closed for this post but can be added to the new page.

  • Version 1.14 just fixes two bugs. The Post plugins were not working if more than one was installed! Also some users were getting odd characters appearing where they shouldn’t.
  • Version 1.13 allows some parameters (like ‘before_title’) to be blank, or to be more complex, e.g., to contain HTML list markup. Also allows trimming an excerpt so it ends with a word or a sentence and not in mid-word. NB excerpt_length is now counted in characters and not words as previously.
  • Version 1.12 fixes a bug with the option to show static pages.
  • Version 1.11 improves the sanity checking of parameter values to avoid database errors. Also lets you skip over a number of posts if you so wish (though this makes more sense for the related Recent Posts plugin).
  • Version 1.10 adds the ability to exclude certain authors on a multi-author blog. The text to show if no matches are found is now customisable and there is a new option for displaying links. All the options can be set via the options page but new in this version they can also be specified via a query-style parameter. This gives the flexibility to use Similar Posts in several places with different behaviour. I have also built three new plugins which use the same infrastructure: Random Posts, Recent Posts, and Recent Comments.
  • Version 1.03 adds the ability to exclude static pages or certain categories of post. It also has an improved stopword list based on the one used by MySQL.
  • Version 1.02 is a bug-fix and security release. It protects against a vulnerability where stray characters in the matching terms could cause database errors. This update also fixes a potential naming conflict between the internal names of its options and those of other plugins and establishes sensible default values for these options. When you update the installation you should visit the options page and check the settings are to your liking.
  • Version 1.01 restores the ability to use keywords or otherwise tweak the terms used to find similar posts. It also allows you to import keywords previously assigned using the Related Posts plugin.
  • Version 1.00

Description

I’ve been looking for a better alternative to the Related Posts plugin for WordPress. At least better on a blog like mine … and maybe yours too. Related Posts finds matches to other posts using the post’s title or any keywords that you care to define. The titles of my posts unfortunately reflect their context rather than their content and I have too many of them to go back and decide on keywords for them all. So I wrote a new plugin, Similar Posts, using a different algorithm which finds related posts based on the contents of a post and the pattern of word-usage rather than just its title.

You can see it in operation in the ‘Related Reading’ section of my sidebar (on single-post pages).

Instructions

  1. Download the latest version of Similar Posts.
  2. Upload the whole plugin folder (Similar_Posts) to your /wp-content/plugins/ directory. (Similar Posts can be installed without uninstalling Related Posts if you want to try out the difference.)
  3. Go to your Admin|Plugins page and activate Similar Posts. This will automatically add an index to your posts table to enable fast matching. Don’t be alarmed if this takes a few moments.
  4. Put <?php similar_posts(); ?> at the place in your WP loop where you want the list of similar posts to appear. By default the plugin wraps each post with <li> and </li> but that can be changed. Use the Admin|Options|Similar Posts page to set how you want the list of posts displayed.

Acknowledgements

Similar Posts is based on Related Posts 2.02 by Alexander Malov and Mike Lu. I’ve also used some code from Rich Boakes and Ken Cheung.

Under the Hood

By default Similar Posts scans a post each time it needs to find words to match. For long posts this can add an unwanted overhead. The plugin can speed up the process by caching the search terms as a custom field (named ‘similarterms’). When a post is published for the first time or subsequently edited the custom field gets updated. Since you probably have a lot of posts you won’t want to edit each one manually to cache the terms. Instead, the Admin|Options|Similar Posts page lets you process all your posts in one go. It won’t overwrite any terms that are already cached so there is also a button to clear all terms.
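To give a feel for what "finding words to match" involves, here is a rough sketch in modern PHP. It is not the plugin's actual code: the function name `extract_terms`, the limit of 20 terms, the minimum word length, and the ASCII-only splitting rule are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: pick the most frequent non-stop words from a post.
// Name, limit, minimum length and tokenising rule are assumptions.
function extract_terms(string $content, array $stopwords, int $limit = 20): array
{
    // Strip markup and lower-case the text before splitting into words.
    $text = strtolower(strip_tags($content));
    // ASCII-only split: accented characters would be treated as separators.
    $words = preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];
    foreach ($words as $word) {
        // Skip very short words and anything on the stop list.
        if (strlen($word) < 3 || in_array($word, $stopwords, true)) {
            continue;
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }

    arsort($counts); // most frequent first
    // Numeric words become integer keys, so cast everything back to strings.
    return array_map('strval', array_slice(array_keys($counts), 0, $limit));
}

$terms = extract_terms(
    '<p>The cat sat. The cat purred at the other cat.</p>',
    ['the', 'other']
);
echo implode(' ', $terms), "\n";
```

A list like this, joined with spaces, is the kind of value that could be cached in the ‘similarterms’ custom field so the post does not have to be rescanned each time.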

While you are editing a post you can modify the ‘similarterms’ custom field to add keywords or replace the automatically generated terms altogether. Note that the field will not be visible until you have saved the post at least once. If you ever want to regenerate the default terms just delete the current set.

If you have previously used the Related Posts plugin to assign keywords to posts you can now import them all from the Options page.

The Similar_Posts folder contains a file, ‘en.words.php’, which is used to supply the ‘stop list’ of common words that you want to exclude as search terms. It is supplied this way so that if your blog is in a different language you can use an alternative stop list. Similar Posts checks the WPLANG constant to see which language WordPress is using and looks for a file on that basis. For example, if the language code is ‘fr’ for French, Similar Posts will look for a file ‘fr.words.php’. The language files must be in the same directory as similar-posts.php.
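The file lookup can be pictured roughly as follows. This is only a sketch: the helper name `stoplist_path` is made up, and falling back to ‘en.words.php’ when no matching file exists is an assumption rather than confirmed plugin behaviour.

```php
<?php
// Hypothetical helper: choose the stop-list file for a language code.
// The fallback to 'en.words.php' is an assumption.
function stoplist_path(string $lang, string $dir): string
{
    $candidate = $dir . '/' . $lang . '.words.php';
    if ($lang !== '' && is_file($candidate)) {
        return $candidate;
    }
    return $dir . '/en.words.php';
}

// In WordPress the language code would come from the WPLANG constant.
$lang = defined('WPLANG') ? WPLANG : '';
echo stoplist_path($lang, __DIR__), "\n";
```

With a language code of ‘fr’ and a ‘fr.words.php’ file sitting next to the plugin, the function returns the French list; otherwise it returns the English one.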

The way the list is displayed can be set from the Options|Similar Posts
page. You can exclude certain categories of post, for example, or
change the code that comes before and after the link.

These general options can be overridden in specific cases by passing a query-style parameter, e.g.:

<?php random_posts('limit=10'); ?>
lists 10 random posts
<?php random_posts('none_text=sorry&show_static=false'); ?>
lists the default number of posts, excluding static pages, and specifies what to display if there are none

If you do not specify an option its value is taken from the options page.
This means you can use the template tag in different ways in different places.
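One way to picture how a query-style parameter overrides the saved options is the sketch below. It is an illustration only: `resolve_options` and the sample `$saved` array are hypothetical, and the plugin's real merging code may differ. PHP's built-in `parse_str` does the splitting.

```php
<?php
// Hypothetical sketch: lay a query-style string over the saved options.
function resolve_options(string $args, array $saved): array
{
    parse_str($args, $given); // 'limit=10&skip=2' becomes an array of strings

    $options = $saved;
    foreach ($given as $name => $value) {
        if (!array_key_exists($name, $saved)) {
            continue; // ignore parameter names that are not known options
        }
        // Convert the string to the type of the saved value it replaces.
        if (is_bool($saved[$name])) {
            $value = filter_var($value, FILTER_VALIDATE_BOOLEAN);
        } elseif (is_int($saved[$name])) {
            $value = (int) $value;
        }
        $options[$name] = $value;
    }
    return $options;
}

// Stand-in for values read from the options page (made up for the example).
$saved = ['limit' => 5, 'skip' => 0, 'show_static' => false, 'none_text' => 'None found'];
$options = resolve_options('limit=10&none_text=sorry', $saved);
print_r($options);
```

Anything not named in the string keeps its saved value, which is why the same template tag can behave differently in different places.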

The full list of parameters is as follows (with the default value in parentheses):

  • limit: maximum number of posts to show (5)
  • skip: how many posts to skip before listing (0)
  • show_static: include static pages (false)
  • show_private: include password-protected posts (false)
  • excluded_cats: comma-separated list of categories to exclude (by ID) (9999, the default means none)
  • excluded_authors: comma-separated list of authors to exclude (by ID) (9999, the default means none)
  • none_text: what to show if no posts match; can be plain text or a permalink
  • before_title: what to show before a link (<li>)
  • after_title: what to show after a link (</li>)
  • trim_before: remove the first instance of ‘before_title’ (false)
  • show_excerpt: include a snippet of the post after the link (false)
  • excerpt_length: how long an excerpt should be (50 characters)
  • excerpt_format: ‘char’, the default, does nothing, ‘word’ trims the excerpt to the last full word, and ‘sent’ to the full sentence. If the excerpt would be trimmed to nothing no trimming is applied.
  • ellipsis: add ‘ …’ after the excerpt
  • before_excerpt: what to show before an excerpt ()
  • after_excerpt: what to show after an excerpt ()
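The three excerpt_format modes can be sketched like this. It is a simplified illustration, not the plugin's code: `trim_excerpt` is a hypothetical name, a sentence is assumed to end at a full stop, and lengths are counted in bytes (`mb_substr` would be the choice for UTF-8 text).

```php
<?php
// Hypothetical sketch of the 'char', 'word' and 'sent' excerpt formats.
function trim_excerpt(string $text, int $length = 50, string $format = 'char'): string
{
    if (strlen($text) <= $length) {
        return $text; // nothing to cut
    }
    $excerpt = substr($text, 0, $length); // byte-based cut

    if ($format === 'word') {
        // Cut back to the last space inside the excerpt.
        $pos = strrpos($excerpt, ' ');
        $trimmed = $pos === false ? '' : substr($excerpt, 0, $pos);
    } elseif ($format === 'sent') {
        // Cut back to the last full stop, keeping the stop itself.
        $pos = strrpos($excerpt, '.');
        $trimmed = $pos === false ? '' : substr($excerpt, 0, $pos + 1);
    } else {
        return $excerpt; // 'char': leave the cut where it fell
    }

    // If trimming would leave nothing, fall back to the untrimmed excerpt.
    return $trimmed === '' ? $excerpt : $trimmed;
}

$text = 'The quick brown fox jumps. Over the lazy dog again.';
echo trim_excerpt($text, 18, 'char'), "\n";
echo trim_excerpt($text, 18, 'word'), "\n";
echo trim_excerpt($text, 30, 'sent'), "\n";
```

Note that the ‘sent’ call with a length of 18 would find no full stop and so return the plain 18-character cut, which matches the rule that no trimming is applied when the result would be empty.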

Feedback

If you try this plugin leave a comment here to let me know how you get on.

Entry Filed under: Programming,WP Plugins

89 Comments

  • 1. GaryP  |  October 7th, 2006 at 7:33 pm

    I have installed “Similar Posts 1.0″ and like the results. I have noticed what I think is a bug. If the ‘related words’ include a word with an apostrophe then it gives an SQL error.

    It does not happen with every post that has a single quote – but may just be because the word with the apostrophe is not one of the 20 words chosen.

    Example: http://www.tricityrealestatenews.com/archives/sundance-ridge/

  • 2. Rob  |  October 8th, 2006 at 11:59 am

    Gary: Have a look at version 1.02. I hope it fixes your problem. Please let me know.

  • 3. Orlando  |  October 9th, 2006 at 3:04 pm

    The version 1.02 is OK! This is really a useful blogging tool! I inserted the code in the comments template. Thank you!

  • 4. Rob  |  October 9th, 2006 at 3:19 pm

    Orlando: Thank you. I’m glad it works well for you. Did you find or create a set of common stop words in Portuguese? If so I’d be glad to include them for download.

  • 5. Orlando  |  October 9th, 2006 at 11:30 pm

    Hi Rob, I’m working on a full and complete list of common stop words in Portuguese. I’ll keep you posted.

  • 6. alex  |  October 13th, 2006 at 12:40 pm

    error, while downloading latest version,
    it has 187 bytes and contains:

    [root@50 plugins]# cat similar.zip

    Fatal error: Call to undefined function: filename() … [snipped]

  • 7. Rob  |  October 13th, 2006 at 2:14 pm

    alex: it should be back in operation again … I had installed a plugin to monitor downloads and it wasn’t working properly. Thanks for the heads up.

  • 8. Yazeed  |  October 14th, 2006 at 3:55 pm

    hi there
    i am having a bit of trouble figuring out where to put the code in
    i want the plugin to work on the post page and not in the main page, where should i insert the code to get that?

  • 9. Orlando  |  October 14th, 2006 at 9:28 pm

    @yazeed:
    It depends on yr theme. I think you are using the K2 theme. Put it in the SINGLE template just before the
    <?php get_sidebar(); ?>
    tag

  • 10. Upekshapriya  |  October 16th, 2006 at 11:32 am

    Thanks for the plugin – it also gives me a better list of related/similar posts compared with WordPress Related Entries 2.0.

    Just wondering if you might be able to include options in the admin panel for removing static pages and certain categories from display as Ken Cheung has described in his post Excluding Categories from the Related Posts Plugin?

  • 11. Rob  |  October 16th, 2006 at 11:49 am

    Upekshapriya: Thanks for the ideas. I’ll see if I can use them without slowing the plugin dramatically.

  • 12. Rob  |  October 16th, 2006 at 6:34 pm

    Upekshapriya: Have a look at the latest version. I have added the exclusions you suggested. Let me know if they work well for you. I found a slight problem with Ken Cheung’s code for excluding categories but I hope I’ve fixed it.

  • 13. Upekshapriya  |  October 16th, 2006 at 11:40 pm

    That’s brill. So much easier than fiddling around with the SQL in the code. A great improvement. Thanks very much.

  • 14. Rob  |  October 16th, 2006 at 11:53 pm

    Upekshapriya: Thanks. Let me know if you have any more good ideas…

  • 15. Becky  |  November 6th, 2006 at 2:33 am

    Brilliant. Much more accurate than related posts and better options.
    You saved me, thankyou :)
    One thing that would make it perfect though is the ability to have more fields to have apart from just title as a link.
    How about the options to choose from date of posting, author and number of comments?

  • 16. Rob  |  November 6th, 2006 at 9:00 am

    Thanks Becky. Those additions would certainly be possible. I’m a little concerned about the performance hit they might cause. I’ll look into the possibility.

  • 17. Becky  |  November 6th, 2006 at 11:26 am

    Thats a good point. I have no idea about coding and tbh installing your plugin was the first time I ever touched the loop. I was too scared before. Now I am not scared of it. lol

    Just thinking about it some more, I think really the only useful info would be the date of posting, anything else can be seen on the post page. But even then all that is really not necessary, its quite perfect as it is :)

  • 18. Jonas  |  November 12th, 2006 at 3:48 pm

    Awesome plugin! Like it a lot… Just installed it on http://www.myuninstalledlife.com.

    What about a new feature with “Most popular posts” and maybe with the function to filter on certain category etc.?

  • 19. Rob  |  November 12th, 2006 at 4:12 pm

    Jonas: Thanks! I’m always interested to get suggestions for new features. I guess in this case it all depends what you mean by popular. Do you mean most viewed–which would be hard to implement? Or most commented which would be easy? Or something else altogether? Let me know…

  • 20. Kevin Donahue  |  November 18th, 2006 at 5:51 pm

    BRILLIANT!!! Exactly what I wanted. Thank you!

  • 21. Rob  |  November 18th, 2006 at 6:19 pm

    Kevin: Thank you!

  • 22. Zerobae  |  November 23rd, 2006 at 5:04 pm

    Thank you, this is a very useful plug-in.

    I second the request for a pollibility to add the date of posting to the links.

    Cheers,
    Zerobae

  • 23. Zerobae  |  November 24th, 2006 at 11:13 am

    Yeah, and the word is possibility. Sorry…

  • 24. Rob  |  November 25th, 2006 at 9:16 am

    The possibility is well on the way to being a reality–it is working on my test site–but I’ve implemented it as part of a major overhaul of how the plugin displays and formats its output so it will be a little while longer before I post it.

  • 25. Zerobae  |  November 25th, 2006 at 10:22 am

    Cool. Thank you!

  • 26. JH  |  November 25th, 2006 at 4:19 pm

    How can I change the wrapping in the Admin Panel that will result with the html code below? Thanks.

    
    <li>Post Title <br/>
    Excerpt</li>
    <li>Post Title <br/>
    Excerpt</li>
    ....
    

  • 27. Rob  |  November 25th, 2006 at 5:33 pm

    JH: You can’t right now but you will be able to in the next version which I am working on at the moment.

    You could get a similar effect by using a definition list instead of ordinary lists. Wrap the title in dt tags and the excerpt in dd tags (the options page makes that suggestion) and play with the css if you need to.

    Good luck.

  • 28. JH  |  November 25th, 2006 at 5:46 pm

    Rob,

    Thanks. I’ll look forward to it.

    Another suggestion is a field between Post Title and Excerpt fields where we can insert template tags like:

    
    <?php the_date(); ?>
    <?php comments_number(); ?>
    ...
    

    Thanks.

  • 29. JH  |  November 28th, 2006 at 4:39 am

    On field,

    What to display if no posts can be found ? It could, e.g., be a simple message or a hyperlink to a favourite post.:

    can I add php code?

  • 30. Rob  |  November 28th, 2006 at 2:06 pm

    JH: No, whatever you use will simply be echoed as text. I have no idea what might happen if you have a plugin installed which allows you to run php code.

  • 31. Jonas  |  November 30th, 2006 at 7:26 pm

    I was more looking into most popular for most visited. I guess this is not built-into WordPress that it calculates the number of visited? Otherwise I don’t know how to figure out most popular since most commented doesn’t always mean most popular?

  • 32. Rob  |  December 2nd, 2006 at 3:22 pm

    Jonas: I was afraid of that! I can’t see a way to access the most visited posts without using some kind of statistics code/plugin to keep track of that extra information. It wouldn’t be hard to do in itself but it would only cover the period after you started counting.

    Maybe that’s the kind of data that makes sense anyway–the most visited posts in a particular time-frame–but it would involve storing timestamp data for every post access. I don’t like the sound of that since I’m already overflowing my database quota…

    Opinions anyone?

  • 33. RA  |  December 3rd, 2006 at 11:00 am

    Thanks for this great plugin. It has worked very well, and I just love finding such great tools on the web from individuals like yourself. Very helpful!

    I am wondering if there is any way to see a full list of words which were used on any given post to match other posts to it. For instance, if I click on post number 10, and similar posts shows that post 34, 24, and 14 are related to 10, I would like to be able to see what keywords it determined were relevant to post number 10, so I can better weed out the most common keywords. Right now, I am finding the results to be too broad. I’m guessing I need to add more stop words, but I have nothing to go by.

  • 34. Rob  |  December 3rd, 2006 at 11:30 am

    RA: Thanks for your comment. The words that are used to make the match are stored in a custom field called ‘similarterms’. You can view it on the write posts or edit posts page by making the custom fields block visible.

    If you want to add words to the list (or remove them) you can just type them in and then press ‘Update’. Pressing ‘Delete’ will get the plugin to regenerate the default set.

    Let me know if that helps.

  • 35. RA  |  December 3rd, 2006 at 3:29 pm

    Perfect! That makes this plugin all the more useful. Now it’s time to roll up the sleeves and apply a bit of elbow grease…

  • 36. RA  |  December 3rd, 2006 at 3:45 pm

    Another quick question…how many “common” terms are necessary before a match happens? Is it simply one common word, or a number of them?

  • 37. Rob  |  December 3rd, 2006 at 6:06 pm

    RA: For the lowdown you could read the documentation for MySQL full text indexing!

    Basically MySQL scores every post for how well it matches the query terms, something it calls ‘relevance’. The manual says:

    Relevance is computed based on the number of words in the row, the number of unique words in that row, the total number of words in the collection, and the number of documents (rows) that contain a particular word.

    Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Consequently, a word that is present in many documents has a lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row.

    I hope that helps!

  • 38. Tin  |  December 5th, 2006 at 2:19 pm

    hi, this is a much more robust plugin than the related post plugin. i have a question though. is it possible to display the dates of the related entries instead of their titles? and how would i go about doing that?

  • 39. Rob  |  December 5th, 2006 at 3:25 pm

    Tin: I’m working on a new version which will give you the ability to customise the output in the way you describe as well as many others. I hope it won’t be too long before it is ready.

  • 40. Tin  |  December 5th, 2006 at 4:38 pm

    hi rob, that is great news. i will be looking forward to it!

  • 41. Mr Papa  |  December 5th, 2006 at 7:38 pm

    Hey Rob, nice plugin…

    I am having an issue with the latest version though. I am getting a few characters of gibberish before the link to the first similar post. This only happens in the most recent post. It is happening on four different sites. You can see one via the url linked to my name on this comment.

    I was running version 1.10 with no issues and this occurred when I upgrade to the latest 1.13 version. Maybe I messed up something…

    Thanks…

    Mr Papa

  • 42. Rob  |  December 6th, 2006 at 12:52 am

    Mr Papa: That’s very mysterious. I’ll see what I can discover…

  • 43. Rob  |  December 6th, 2006 at 12:54 am

    Mr Papa: Do you have the same problem if excerpts are turned off?

  • 44. Mr Papa  |  December 6th, 2006 at 1:11 am

    yes, it is still there with comments off…

    If I a make a new post, it gets fixed on the old post, but the new post now exhibits the problem…

    I have played with a bunch of other options combinations also with no luck…

    Mr Papa

  • 45. Mr Papa  |  December 6th, 2006 at 1:14 am

    oopss.. meant excerpt off…

    Mr Papa

  • 46. Tin  |  December 6th, 2006 at 1:28 am

    hi rob, i am getting this error:
    Problem creating the full text index for Similar Posts. Please check the instructions on how to create the index manually.
    Warning: Cannot modify header information – headers already sent by (output started at /home/ragmusco/public_html/blog/wp-includes/wp-l10n.php:43) in /home/ragmusco/public_html/blog/wp-includes/pluggable-functions.php on line 269

    Any suggestions? Thanks

  • 47. Rob  |  December 6th, 2006 at 11:47 am

    Tin: You have had the plugin working OK before and now it is giving an index error? Has the index been deleted or corrupted? Can you check in PHPMyAdmin or somesuch? The spec for the index is in the readme.txt file.

    Have you installed any new plugins in between it working and not working?

    I’ve never seen that kind of error before. Thanks for letting me know and I hope we can track it down quickly.

  • 48. Tin  |  December 6th, 2006 at 2:34 pm

    hi Rob,

    i’ve used your plugin on one wordpress blog just fine this past week. i installed it on a second wordpress blog just recently and got that error. the error showed up when i activated the plugin for the first time on the 2nd blog.

    the PHPmyadmin shows a normal fulltext entry for similarpost. i don’t know if this has anything to do with conflicts with another plugin or not.

  • 49. Mr Papa  |  December 6th, 2006 at 2:53 pm

    Rob, that file you sent me fixed the funky characters showing up! Thanks man!

    Mr Papa

  • 50. Rob  |  December 6th, 2006 at 4:56 pm

    Update: A Version 1.14 is an interim fix of two bugs. In one, some users were getting spurious characters showing up before the post listing. The other bug meant that the Post plugins which were meant to work together in fact caused each other to fail. D’oh!

  • 51. Tin  |  December 7th, 2006 at 12:12 am

    Hi Rob,

    were you able to squeeze in the date output instead of title output feature?

    thanks

  • 52. Rob  |  December 7th, 2006 at 8:13 am

    Tin: Not in this release sorry. Soooooooooon.

  • 53. Tin  |  December 7th, 2006 at 12:08 pm

    thanks father rob. i had no idea you were a jesuit. i was very active in the vietnamese dong hanh movement, which recently joined CLC.

  • 54. eduardo  |  December 18th, 2006 at 9:37 pm

    Hey Rob,
    Thank you for the nice plugin.
    I went checking the similarterms custom fields, just to take a look at how good the word selection looked, and found out that there may be a i18n bug.
    Words that have special characters in them are chopped at the special character (sometimes before, sometimes after the character), and no special character ever shows up. Sometimes they appear as blank spaces, sometimes the rest of the word is cut off.
    The language in the blog is defined by the WP variable, and i use the UTF8 option in WP. I am running a local copy of my site for testing.

    I would love to use the plugin!
    Thank you for your attention.

  • 55. Rob  |  December 19th, 2006 at 11:22 am

    eduardo: I’ll look into it. Thanks.

  • 56. Rob  |  December 19th, 2006 at 3:13 pm

    eduardo: The problem is that PHP’s string functions do treat utf-8 encoded strings as though they were 8-bit characters. I use str_word_count to count the words in content and lots of accented characters are treated as separators. I think there is a way round this using regular expressions but it will probably be quite a bit slower. I’ll experiment.

  • 57. eduardo  |  December 19th, 2006 at 11:50 pm

    Thank you for your prompt answer, Rob. I will be waiting for the results!

  • 58. eduardo  |  December 20th, 2006 at 2:04 pm

    Hi again Rob,
    I am no programmer, but i went checking the manual for str_word_count and found out that there was a third argument added on PHP5.1, which might make the plugin work with UTF8 characters. I could run only the plugin in PHP5, my server allows me to do that.
    Do you think it could work? There’s one catch: when i look into my database with MySQLAdmin, i see characters depicted as é , for example, even though my WP settings are for UTF8.

  • 59. Rob  |  December 20th, 2006 at 6:38 pm

    eduardo: I believe that MySQL itself handles utf-8 but that it’s the admin program which seems to show them incorrectly.

    As for, the extra argument in str_word_count … give it a try if you like and let me know. It seems you have to specify the ‘extra’ characters. Since PHP 5 is not available on many hosts I wouldn’t consider it as a general solution.

    I’m trying using regular expressions.

  • 60. Dennis  |  December 21st, 2006 at 9:46 am

    Can you add a parameter to only search similar posts in a specific category? Something like

    That would really help me!

  • 61. Dennis  |  December 22nd, 2006 at 7:45 am

    Sorry php was deleted.

    A little addition like: ‘categorysame=true’); could make the software look for sim. posts but only if they’re in the same category as the current one.

  • 62. Rob  |  December 22nd, 2006 at 8:05 am

    Dennis: sounds interesting idea.

  • 63. Ahamed  |  December 22nd, 2006 at 8:12 pm

    Great job, this is what I was exactly looking for. There are few others but not as good as this. Please consider to use a keyword of “related posts” and “related articles” which is commonly used than “similar” so many people will find your plug-in.

  • 64. Rob  |  December 23rd, 2006 at 11:59 pm

    Ahamed: Thank you and thanks for the advice.

  • 65. Sorin Matei  |  December 28th, 2006 at 5:06 am

    Net superior to everything I’ve used in the past, including some software for content analysis. By the way, any intention to add a network analysis component to it (something like AutoMap-a conceptual network analysis program)?

  • 66. eduardo  |  December 30th, 2006 at 10:50 am

    Hey Rob,
    I tried the php5 thing, but it didn’t work, i got the same errors when i ran the php5 function under php4. I suppose it may be because WP is under php4 and i had only your plugin under php5.
    Anyway, this is getting a little too complex for me, i will just wait for your regex solution.
    Thank you for your attention!

  • 67. Rob  |  December 30th, 2006 at 12:44 pm

    eduardo: A new version should be out within days. I’m discovering how much slower development is when you have four related plugins sharing a lot of code!

  • 68. Dennis  |  January 9th, 2007 at 8:41 pm

    Is there a way to use the words from similar posts in the metadata keywords… that would be handy… Since it does the work… just need some commas between the words then.

  • 69. bearded  |  January 10th, 2007 at 8:03 am

    I have installed Similar Posts Plugin v2.0.4 in wp2.0.5. My site is bi-lingual (English & myLanguage). Similar Posts displays well with English posts but myLanguage (UNICODE) posts don’t display any similar posts. I checked the ‘similarterms’ box in the edit section of the post, but it does not display any terms. I even put some terms in manually and updated the post. But no result!! Will you please suggest any correction in the script for my expected results?

  • 70. Rob  |  January 11th, 2007 at 12:14 am

    bearded: In the next day or so I’ll be posting a new version of the plugin which has an option to handle extended characters more adequately.

  • 71. Tin  |  January 11th, 2007 at 9:49 am

    do you think it’s best to release a new version soon considering that wp 2.1 is coming out on the 22nd?

  • 72. Rob  |  January 11th, 2007 at 1:31 pm

    Tin: The plugins are working fine under the 2.1 betas so I’m hoping there’ll be no problems.

  • 73. Rob  |  January 13th, 2007 at 11:11 am

