I originally set up this structure about 8 years ago, moving PMs from flat-text to MySQL. That setup had 2 rows for every PM sent:
id
username
sender
folder
status ('New' or null)
postdate
subject
message
So if "Joe" sent a message to "Angie" then it would insert like this:
INSERT INTO pm VALUES
(NULL, 'Joe', 'Angie', 'sent', NULL, 20181018213400, 'Test subject', 'Test message'),
(NULL, 'Angie', 'Joe', 'inbox', 'New', 20181018213400, 'Test subject', 'Test message')
The reason for doing 2 rows like that was because, every once in awhile, I'll have a suspected scammer or spam bot send a bunch of messages that I would want to remove from the Inbox, but wanted to leave in the "sent" folder until I was sure.
There's also another table for "usernames", of course.
After some discussion on here, I'm trying to change this system up a bit. I want to group messages between two users together, but still give both people the ability to create a new thread (Gmail style), as well as be able to move individual messages to different folders.
So the new structure will be to have two tables: "pm" and "pm_data". Table "pm" will have:
id (Auto Increment)
refid (reference id to pm_data)
username
sender
folder
status (0 or 1)
postdate
subject
Then "pm_data" will have:
refid (Auto Increment)
hash (UNIQUE, an MD5 of message)
message
Then when they view the message, the query would look like this:
SELECT * FROM pm
JOIN pm_data ON pm.id = pm_data.refid
WHERE username='%s' AND sender='%s' AND subject='%s'
ORDER BY postdate DESC
This method should cut the database size in about half, because there would only be one copy of the message instead of 2. Then the indexes should be faster, and in theory the whole system should load faster.
So. Beyond that, I'm considering 2 other modifications:
1. Move "subject" to a third table, and instead use a "group_id" column. I don't think I would need to make it UNIQUE like I do "pm_data", but then "pm" would have an INT column to reference the subject instead of having the subject in each row.
2. Someone else suggested giving each username an ID#, too, and then in "pm" I would store the numbers instead of the names.
If I did that then the query would look something like this:
SELECT * FROM pm
JOIN pm_data ON pm.id = pm_data.refid
JOIN pm_subject ON pm.subjectid = pm_subject.group_id
JOIN users_id ####
WHERE pm.userid='%s' AND pm.senderid='%s' AND pm.subjectid='%s'
ORDER BY postdate DESC
I'm not quite sure how to join the users_id table there, by the way, because I would need to get the username for both pm.userid and pm.senderid...
So the question here is, do you guys think that doing 3 JOINs like that would be inevitably faster than just the one? It would be marginally harder to maintain, but if it makes the index smaller and each page load faster then it could / should result in more pageviews per session, which would make it worthwhile.