Forum Moderators: open

Message Too Old, No Replies

MySQL partitions

         

csdude55

5:52 pm on Nov 18, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm running MariaDB 10.3.32, which I understand is more or less identical to MySQL. Everything is MyISAM.

I'm working with a table for user's Private Messages. It is 2.5G in size (2.1G data, 433M index) and has 10.4 million rows.

One column is "folder", which is VARCHAR(25). I have 3 built-in folders ("inbox", "sent", "deleted"), but the user can create as many folders as they want to organize their messages.

Some users have a LOT of messages, which makes their initial SELECT query run slower than I'd like (more than 5 seconds). In an attempt to speed things up, @brotherhood_of_LAN led me to looking at partitions.

Would it be reasonable to create 3 partitions based on the "folder" column (one for inbox, one for sent, one for deleted), and then everything else would be in the original table without a separate partition?

If so, how would I do that? I've read this, but it's not clear on how to do it for text values:

[dev.mysql.com...]

mack

8:55 pm on Nov 18, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



This may not be the ideal solution but by adding pagination into your output you can insert a "LIMIT" into your query thus speeding up the process. Selecting by limiting the number of results would greatly speed up each query.

Sorry if I am misunderstanding the question...

Mack.

csdude55

5:37 am on Nov 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've done that, and I honestly can't understand why some of the queries are so slow! For example, my username has 35,000 messages in it, with 8,366 in the "inbox", and here's the query that's in the slow log:

# Query_time: 6.225232 Lock_time: 0.000122 Rows_sent: 21 Rows_examined: 8366
# Rows_affected: 0 Bytes_sent: 23616
SET timestamp=1636605307;
SELECT private_messages.id, username, folder, status, postdate, sender, subject, message, pm_img.image
FROM private_messages
LEFT JOIN pm_img
ON pm_img.id = private_messages.id
WHERE username='example' AND
folder='inbox' AND
(status = "New" OR status = "" OR status IS NULL) AND
COALESCE(pm_img.sorter,0) = 0
ORDER BY postdate DESC
LIMIT 21;


EXPLAIN shows that it's using an index on private_messages that's on username, folder, status, and postdate, and an index on pm_img that's on id, sorter, and image (which is weird since I have an index on id and sorter that I THOUGHT it would use instead, but no).

I should also mention that pm_img is much smaller, only 9.9M and 375,000 rows.

Removing the JOIN didn't help, either; it still took over 6 seconds.

tangor

6:13 am on Nov 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



At some point databases become unwieldy. Have you looked at breaking the query down by YEARS (or months or whatever)? It is one thing to have historical archiving, it is another thing to have reasonable search for what "most folks" can remember ... which is "list week", "last month", "last year".

Just askin' ...

How this fits with partitions I have no clue, but I do know that targeted queries generally get better results.

csdude55

6:36 am on Nov 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I haven't really thought about that, @tangor... maybe I should. I know it's not uncommon for some users to come back sporadically (mainly the ones that only use the classifieds or personals) and I don't want them to just LOSE messages with their old friends or customers. That's why I was thinking about partitioning based on the folder.

I could be totally wrong about this, but my thought process with partitioning is that it's more or less the same as if I had 4 tables:

pm_inbox
pm_sent
pm_deleted
pm_other

and then selected from the appropriate table based on the folder they selected. But partitioning it would make the query more automated rather than manually modifying the table in the query.

If that's correct then I could apply the same logic to other large tables (eg, partitioning forum posts by date to sort of auto-archive old posts).

But would it even help? I have no idea. In the query above it looks like it's finding 21 out of 8,366 and using indexes, so why does it take 6 seconds? Would eliminating rows outside of those 8,366 affect the query speed? I wouldn't think so, but this wouldn't be the first time that something makes no sense to me but works anyway! LOL

csdude55

7:25 pm on Nov 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I haven't tested it yet, but I THINK that I can accomplish my goal using LIST COLUMNS / VALUES IN:

[dev.mysql.com...]

Another option will be harder since it requires modifying a lot of scripts, but might be faster? I already have a table for all of the folders created by users, so I could add a column for an autoincrement ID to it, ensuring that it starts with 0 for inbox, 1 for sent, and 2 for deleted. Then I could update private_messages.folder with the matching number instead of text, then change the column to an INT (probably SMALLINT or MEDIUMINT; I have about 4,000 folders created). If I do this then I could use LIST instead of LIST COLUMNS. Is that better? I have no clue.

Unless someone has already gone down this path and can give me some insight, my plan is to create a new table that's partitioned in such a way, copy data over 50,000 rows at a time over a few days (to minimize server load), then in a few days I can test and compare query speed times.

csdude55

7:14 am on Nov 20, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I started to partition the new table, then realized that my logic isn't going to work. Assuming that it would actually work without errors and I create 4 partitions with these VALUES IN:

inbox
sent
deleted
other

then I would have to have a value for all of the user defined folders of "other". Which means that I'd have to create another column for the name of the defined folder.

This leads me back to the more complicated idea of changing the folders to numbers, and then setting up the partitions like:

ALTER TABLE private_messages
PARTITION BY RANGE (folder) (
PARTITION inbox VALUES LESS THAN (1),
PARTITION sent VALUES LESS THAN (2),
PARTITION deleted VALUES LESS THAN (3),
PARTITION other VALUES LESS THAN MAXVALUE,
)


LESS THAN throws me off. I'm guessing that I would define "inbox" as 0, "sent" as 1, and "deleted" as 3, so "LESS THAN (1)" would only match 0?

If you can suggest a way to match EQUAL instead of LESS THAN, I'd love to know!

Either way, once it's done then I can modify the queries like:

// in PHP
$partition = 'other';

if ($folder == 'inbox' || $folder == 'sent' || $folder == 'deleted') {
$partition = $folder;
}

$query = sprintf("SELECT * FROM private_messages PARTITION (%s) WHERE username='%s'",
mysqli_real_escape_string($partition),
mysqli_real_escape_string('example')
);


In theory that should be a lot faster to process, but I guess we'll see! :-)

csdude55

8:02 pm on Nov 20, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Update:

I created a test table with partitions and copied all of the rows over where the "username" field was mine. So about 35,000 rows.

As far as I can tell, it had no noticeable impact on query speed. That's just from testing it once, though; after that it was caching the query, so I couldn't get an average or anything.

The bottleneck appears to be the joining with the other table, not the size of the primary table. I might post a new thread in the near future about the specific query, but first I'll push things around a little first and see what I can figure out on my own.

tangor

11:36 pm on Nov 20, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Keep us posted! My databases number in the tens of thousands, nothing like what you are wrangling.

csdude55

1:31 am on Nov 21, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What I THINK is happening is that the query is taking all 8,366 rows, using it to check pm_img.sorter, and then when it's done with all of the rows it returns the limit of 21. I only see it in the slow log when the user has a lot of messages in the folder, so that's the only explanation I can think of.

There has to be a better way of getting that data :-(

I tried this variation, but it didn't do any better:

SELECT private_messages.id, username, folder, status, postdate, sender, subject, message, pm_img.image
FROM private_messages
LEFT JOIN pm_img
ON
pm_img.id = private_messages.id AND
pm_img.sorter = 0
WHERE
username = 'example' AND
folder = 'inbox' AND
(status = "New" OR status = "" OR status IS NULL)
ORDER BY postdate DESC
LIMIT 21;


I really just need a boolean for pm_img.image, so I'm tempted to modify the whole table to add a boolean column instead of joining like that. That's going to be a huge pain, but it would be a LOT faster to process.

My second thought is to do a query for private_messages without a join, then loop through the results in PHP and create a second query for pm_img; something like this (typed for the example, not tested):

$pm_query = <<<EOF
SELECT private_messages.id, username, folder, status, postdate, sender, subject, message, pm_img.image
FROM private_messages
WHERE
username = 'example' AND
folder = 'inbox' AND
(status = "New" OR status = "" OR status IS NULL)
ORDER BY postdate DESC
EOF;

$results = mysqli_query($pm_query, $dbh);

$arr = [];
$img_query = false;

if ($results)
while (list($id, $username, $folder, $status, $postdate, $sender, $subject, $message) = mysqli_fetch_row($results)) {
// there's probably a better way to put all of the data in an associative array, but you get what I mean
$arr[$id]['username'] = $username;

if ($img_query) $img_query .= ' OR ';
$img_query .= 'id=' . $id;
}

if ($img_query)
$img_query = 'SELECT * FROM pm_img WHERE (' . $img_query . ') and sorter=0';

$images = mysqli_result($img_query, $dbh);

if ($images)
while (list($img_id, $sorter, $image) = mysqli_fetch_row($images)) {
$arr[$img_id]['id'] = 1;
}


That's going to require a modification to several scripts, but I think it would be faster than any of my other ideas.

I'd love to hear a simple option to fix the original query to be faster, though!