Forum Moderators: open

Message Too Old, No Replies

Unique ID number or username?

         

csdude55

10:29 pm on Apr 20, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've had a username database for a long time, and it has about 250,000 rows.

I'm splitting it up in to 3 separate tables: one with registration data, one with optional profile information (name, gender, etc), and one with a list of photos (one row per photo, so if each user uploads 2 photos then it would end up being 500,000 rows; if each user uploads 10 then it would be 2.5 million rows).

I currently have the "username" column as a PRIMARY.

By splitting them out, though, I'm not sure if there's a significant value to assigning an autoincrement ID# to each username, then joining the tables based on that ID instead of by the username.

Moderating on the backend would be marginally more complicated; eg, I would have to find someone's ID# before I could appropriately edit their data under the image table (unless I include the username in each table, too, I guess).

But would it make the general queries faster? Or the index / data smaller?

What do you guys think, does the benefit outweigh the inconvenience?

csdude55

10:29 pm on Apr 20, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is in MySQL... the thread kept getting a 403 Forbidden error and I took that out of the subject while trying to get it to post. I think the error came when I said the username field is a var char.

tangor

10:38 pm on Apr 20, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For ordinary service unique ID is preferred---that's why they are termed "unique". Users (real names) can be replicated, unless you require those to be unique as well, in which case, problem solved. :)

User names are already unique, just keep it that way.

250K entries is not that intense ... is there a reason to break up the table?

csdude55

6:04 am on Apr 21, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In this case the username has to be unique, so it can't be replicated. I just wasn't sure if an integer would process faster or something.

The original table just had a single field for "image", and now I'm giving the user the ability to upload more than one photo; so that's getting moved to a separate table.

Then I figured that, while I'm at it, I might as well move the optional profile data, too. Right now about 200,000 of those rows have nothing in those fields, so it's a whole bunch of empty space for no reason. I only actually use any of it when someone views that user's profile page, anyway, so a single table with 50,000 rows would be easier to manage and, in theory, faster to load.

Once I get it all done I'm planning a large marketing campaign, so I'm HOPING that those 250,000 rows will at least double within the next year. Which still really isn't huge, but I'm trying to make everything nice and fast, JUST in case things blow up and I end up with a few million within the next few years :-D

graeme_p

12:29 pm on Apr 21, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think integer primary keys can give you better performance. If you Google a bit for more about surrogate primary keys and integer primary keys you will find some stuff on index fragmentation, and other issues but a lot of it is DB specific. There seems to be a lot of disagreement about this though!

It may not be relevant to you, but if its possible you may need to change usernames then using integer IDs will mean foreign keys will not need changing if you do.

LifeinAsia

7:03 pm on Apr 21, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



User names are already unique, just keep it that way.
You should be better performance with integer keys. Plus changing a user's name means having to change the data in multiple tables (whereas the ID wouldn't change- thus the reason for using something other than the username as a foreign key).

(unless I include the username in each table, too, I guess)
Ick! That defeats the purpose of normalizing your data into multiple tables and creates all sort of issues/

tangor

1:07 am on Apr 22, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We're talking about the difference between integer and alphanumeric unique, right? At what point (volume wise) does numeric perform better than alphanumeric unique?

And where does mysql fail in index performance?, a few billion or so?

Just something to think about.

graeme_p

1:12 pm on Apr 22, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@tangor, you are probably right. Worrying at this point is probably premature optimisation.

Ick! That defeats the purpose of normalizing your data into multiple tables and creates all sort of issues/


Yes, unless the username is included in other tables as a foreign key, and only as a foreign key.

LifeinAsia

4:31 pm on Apr 22, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Getting into the technical weeds...

It mostly comes down to storage and how many keys can fit into an index block. INT values require 4 bytes of storage- so if using a VAR CHAR key, more than 4 characters (which I'm assuming to be true in your case- 4-character usernames would be very user unfriendly) will require more storage, thus fewer keys being able to fit into each index block. In general, this means that more data has to be read into memory for searches/sorts.

Will it make a huge difference in your case at this point? Probably not much of a noticeable difference. (If it does, I'd say you have a problem elsewhere- either seriously lacking memory, CPU, and/or disk speed on the DB server.) If you have lots of pages with lots of user comments (each of which needing fields from the other tables), then it might start to make a difference.