Forum Moderators: open

Message Too Old, No Replies

Thoughts on setting up a new PM system

         

csdude55

4:47 am on Nov 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a current Private Message system, hand rolled. I built it sometime around 2004, and even though I've made several cosmetic updates since then, there's been no major backend overhaul.

I want to change things up, make the system a little more modern. And I figure that now's the time to work on performance improvements, too :-)

My current system
I have one table for messages, with columns:

id
username
folder
status (ENUM('New', NULL)
postdate
sender
subject
message

I have a second table for images, with columns:

id
sorter (0-9)
image

For the list of messages, this is the query in PHP:

SELECT private_messages.id, username, folder, status, postdate, sender, subject, message, pm_img.image
FROM private_messages
LEFT JOIN pm_img
ON
pm_img.id = private_messages.id AND
pm_img.sorter = 0
WHERE
username = '%s' AND
folder = '%s'
ORDER BY postdate DESC
LIMIT 21;


Then when they view the message, it's simply:

SELECT * FROM private_messages WHERE id='%s' LIMIT 1



My updated plan
I plan to eliminate "subject" entirely, and group together all messages from the same sender so that it's more like a text chain.

I think that I can eliminate the JOIN altogether, too; no need to spend resources just to show an image of a paperclip to denote an attachment if nobody really cares anymore.

My options that I'm considering are:

Option 1. Modify the existing list query to something like:

SELECT id, status, postdate, sender, message
FROM private_messages
WHERE
username = '%s' AND
(folder = '%s' or folder = 'sent')
GROUP BY sender
ORDER BY postdate DESC
LIMIT 21;


That doesn't quite work right (it's not including all of the senders), but you get the idea.

Option 2. Create a second table that stores the info for the most recent message sent per user, and only query it for the list:

# would have a UNIQUE on username, folder, and sender
SELECT id, status, postdate, sender, message
FROM pm_list
WHERE
username = '%s' AND
(folder = '%s' or folder = 'sent')
ORDER BY postdate DESC
LIMIT 21;


Then I could eliminate "status" from private_messages.

The second option should process faster, since "list" would have a much smaller table (and index) to query, and I could eliminate some data and one index from private_messages.

BUT, in the rare case that I have to moderate something (eg, a scammer comes on and sends 1,000 messages while I'm asleep) it would be a lot harder to clean up. Right now all I have to do is delete the rows from private_messages, but if I use option (2) then I'd have to delete them AND update pm_list for each user.

Which would you do? Or is there a third option I should consider?

NickMNS

2:47 pm on Nov 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Option 2. Create a second table that stores the info for the most recent message sent per user, and only query it for the list:

Don't use SQL for this. Use something like Redis pub/sub.
[redis.io...]

The second option should process faster, since "list" would have a much smaller table (and index) to query,

Redis will push new messages that arrive in the queue to any user subscribed to the channel. Thus the "query" is eliminated completely. This typically requires an async pattern to work, so I'm not sure how that is handled with PHP but I'm certain that there is way to do it.

csdude55

6:26 pm on Nov 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well shoot, man. I had NEVER heard of Redis before, now I'm seeing that it's the most popular NoSQL database since like 2011?!

This is opening some doors, but I'm getting mixed information on whether it's going to be better or worse. And it seems like a TON of work to find out :-O But, down the rabbit hole I go...

Preliminary research makes me think that I should use MySQL to hold the messages themselves, and use Redis JUST in lieu of that new second table for the list. Would you agree? If so, I think I would do the same thing for my message boards.

Once concern is something I'm reading from several years ago, so I'm not sure if it's still applicable:

it is not recommended to use redis if the data you want to store in it exceeds the amount of ram in your server

[news.ycombinator.com...]

I'm paying for 8G of RAM (but I see they're giving me 10G), and as of right now I'm using 5G of it. Currently the entire private_messages table is over 10G. I don't think that this new table would be over 1G, but still... is this something I need to worry about, or has it improved since that article was written?

I'm also getting mixed information on redis.io on the pricing. It took some digging to find the pricing, but I finally found it:

[redis.com...]

The calculator shows that I need the "Flexible" plan of $0.881 /hour. Since my site is active 24/7, does that mean 24 x 30 = 720 hours /month, 720 x $0.881 = $634.32 /month? I... I don't see that one happening :-O

NickMNS

6:42 pm on Nov 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Preliminary research makes me think that I should use MySQL to hold the messages themselves, and use Redis JUST in lieu of that new second table for the list

This is what I do, mind you with MongoDB instead of mySQL.

I don't think that this new table would be over 1G

I would think not because the messages aren't really stored, they are queued while waiting to be read. Once read, the message is deleted from Redis, you need to be sure to keep a copy in your DB to save for history's sake. Alternately you can process the message as soon as it arrives, use it's arrival to trigger a notification to the user and then move the message to the DB. When the user goes to read the new message, you simply return the messages from the DB and the new one will be there. Like this the queue never really has time to grow unless there is a lot of traffic or the DB is down.

The calculator shows that I need the "Flexible" plan of $0.881 /hour.

There is a free open-source community version. What you describe is hosted/managed version.
[redis.io...]

csdude55

7:03 pm on Nov 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



To be clear, then, wouldn't I still need the second table for the list in MySQL?

I love the idea of triggering the notification, I've been trying to find a good way to do that for YEARS! But I still need to have a system in place for users to view their full list of messages (on which they would click to read the PMs).

NickMNS

7:12 pm on Nov 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You still need the Database to store the messages. I'm not sure if this is done with your first or second table, but you need something.

The only real challenge I see for you is that to do this I use an async framework that uses redis' blocking feature that blocks a thread while it waits for messages. If you don't have asynchronous thread management Redis would block your main thread, essentially blocking your server. But you could switch to a polling system instead. I'm not sure what PHP offers in terms of async functionalities.

csdude55

7:25 pm on Nov 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What language are you using? I'm not opposed to learning something new, or to setting up a second VPS strictly for the PMs. Pushing notifications SHOULD increase my pageviews substantially, so if it's necessary then (in theory) it should be worth the expense.

csdude55

7:27 pm on Nov 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



BTW, for future readers, I mentioned that my query in Option 1 didn't include all of the senders so it wasn't quite right. That original query:

SELECT id, status, postdate, sender, message
FROM private_messages
WHERE
username = '%s' AND
(folder = '%s' or folder = 'sent')
GROUP BY sender
ORDER BY postdate DESC
LIMIT 21;


I fixed the query, but it's SUPER slow so it's not practical:

# First modification, query speed 21.7974s
SELECT id, status, postdate, sender, message
FROM private_messages
WHERE id IN (
SELECT MAX(id) AS id
FROM private_messages
WHERE
username = '%s' AND
(folder = '%s' OR folder = 'sent')
GROUP BY sender
ORDER BY id DESC
)
ORDER BY postdate DESC

# Second modification, changed to INNER JOIN so that I could use LIMIT in the subquery
# This had no discernible impact on query speed, though
SELECT private_messages.id, status, postdate, sender, message
FROM private_messages
INNER JOIN (
SELECT MAX(id) AS id
FROM private_messages
WHERE
username = '%s' AND
(folder = '%s' OR folder = 'sent')
GROUP BY sender
ORDER BY id DESC
LIMIT 21
) AS two
ON private_messages.id = two.id
ORDER BY postdate DESC


I ran the subquery by itself, and it took 16.8s to complete! So MAX(id) is definitely the bottleneck.

My last thought would be to SELECT all rows that match (no LIMIT), then in PHP loop through all of them to eliminate older duplicates. I don't think that would be practical in my application, though, so at this point I think that I'm going to abandon the Option 1 idea entirely.

NickMNS

7:57 pm on Nov 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The system I have working is build using Python Quart Framework, which is an Async implementation of Flask. It uses some plain vanilla JS on the front end.

If I had to build it again I would build the entire front-end with React.js and use Quart to build a REST type server.

Note that if you are going to create a whole separate build and you don't need to use mySQL, I would then recommend MongoDB. The newer versions (>4 I believe) have some of the functionalities of Redis built in. You could likely build a simple messaging system without requiring Redis.

You can have a listener that listens for changes to a collections (table). So when a sender sends a message, it sends a request to the server endpoint /receive-msg/from/to, once the request is received it updates the collection with the message, this triggers the listener, that in turn triggers the notification to the user. Simple.
Download info here: [docs.mongodb.com...]
More details on how to here: [docs.mongodb.com...]