MySQL, difference between IN BOOLEAN MODE and IN NATURAL LANGUAGE MODE

         

csdude55

8:01 pm on Mar 19, 2020 (gmt 0)

I have queries that look like this:

SELECT DISTINCT id, MATCH(column) AGAINST('%s' IN BOOLEAN MODE) AS score
FROM table
WHERE
MATCH(column) AGAINST ('%s' IN BOOLEAN MODE)
ORDER BY score DESC

I don't actually USE "score" in the results, though; it's just there for sorting.

But in retrospect, I'm not really sure why I'm using "IN BOOLEAN MODE". I realize that I could do this and get the same results (it sorts by "relevance" automatically):

SELECT DISTINCT id
FROM table
WHERE
MATCH(column) AGAINST ('%s' IN NATURAL LANGUAGE MODE)

I've read the docs on both, but I'm not really sure what the difference is between the two modes:

Natural Language Mode:
[dev.mysql.com...]

Boolean Mode:
[dev.mysql.com...]

I don't use InnoDB.

I know that the user can use + and - in Boolean Mode, as well as other operators, but I can't expect users to know that. And I'm not really sure that they would make it past mysqli_real_escape_string(), anyway. I guess that I could set up a form where the user selects options, and then plug the operators in myself?
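For the form idea, a minimal PHP sketch might look like the following. The function name build_boolean_query(), the $mode values 'all' and 'any', and the $exclude field are all made up for illustration. It drops anything that isn't a letter, digit, or apostrophe, so operators typed by the user are removed and only the ones chosen through the form get added. On the escaping question: mysqli_real_escape_string() only escapes quotes, backslashes, and a few control characters, so + and - would pass through it untouched.

```php
<?php
// Hypothetical helper: turn raw form input into a BOOLEAN MODE search string.
// $mode is an assumed form field: 'all' = every word required, 'any' = no operator.
// $exclude is an assumed second text field holding words that must not appear.
function build_boolean_query(string $input, string $mode = 'all', string $exclude = ''): string
{
    // Replace everything except letters, digits and apostrophes with a space,
    // so user-typed operators (+ - < > ( ) ~ * " @) cannot alter the query.
    $clean = function (string $s): array {
        $s = (string) preg_replace("/[^\\p{L}\\p{N}']+/u", ' ', $s);
        return preg_split('/\s+/', trim($s), -1, PREG_SPLIT_NO_EMPTY);
    };

    $parts = [];
    foreach ($clean($input) as $word) {
        // '+' marks a word as required; without it the word is optional.
        $parts[] = ($mode === 'all' ? '+' : '') . $word;
    }
    foreach ($clean($exclude) as $word) {
        // '-' marks a word that must not be present.
        $parts[] = '-' . $word;
    }
    return implode(' ', $parts);
}

echo build_boolean_query('apple +banana', 'all', 'cherry'); // +apple +banana -cherry
```

The resulting string would still need to go through escaping, or better a bound parameter in a prepared statement, before it reaches AGAINST (... IN BOOLEAN MODE).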

I have no clue what the "50% threshold" is that it mentions, either.

What do you guys and gals think, which is better for a simple search form that users submit from PHP?

lammert

10:05 pm on Mar 19, 2020 (gmt 0)

If you don't use the boolean operators, then Natural and Boolean mode are roughly the same, but they don't necessarily produce the same results. Both modes skip words that match the pre-compiled stopword list. On MyISAM tables, Natural mode additionally ignores words that appear in 50% or more of the rows, while Boolean mode does not apply that threshold. If the texts in your database are in a language other than English, Natural mode may therefore cope better with very common words, because the 50% rule automatically ignores frequent words that are missing from the English stopword list.
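To picture the 50% rule, here is a toy PHP sketch that lists the words occurring in at least half of a set of rows. It is only an illustration: the function name words_over_half() and the sample rows are invented, and this is not how MySQL computes the threshold internally.

```php
<?php
// Toy model only: return the words that occur in at least half of the rows.
// $rows stands in for the values of the full-text column.
function words_over_half(array $rows): array
{
    $counts = [];
    foreach ($rows as $text) {
        // Count each word once per row, case-insensitively.
        $words = array_unique(str_word_count(strtolower($text), 1));
        foreach ($words as $w) {
            $counts[$w] = ($counts[$w] ?? 0) + 1;
        }
    }

    $common = [];
    foreach ($counts as $w => $n) {
        // "50% or more of the rows" means n / total >= 0.5.
        if ($n * 2 >= count($rows)) {
            $common[] = $w;
        }
    }
    sort($common);
    return $common;
}

$rows = ['the red apple', 'the green pear', 'a red car', 'the blue sky'];
print_r(words_over_half($rows)); // 'red' (2 of 4 rows) and 'the' (3 of 4 rows)
```

In this model a search for "red" would be treated as too common even though it is not a stop word, which is why the threshold can give surprising results on small tables.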