Forum Moderators: open
I have a membership table with a registered email address for each record, say:-
Table: Membership
field1: username
field2: userid
field3: email_address
I want to pull a query result set listing popular email domains (so right of the "@") ordered most popular first.
For example, the resulting set might look something like:-
bob@ widgetmail.com
sam@ widgetmail.com
wil@ widgetmail.com
frd@ widgetmail.com
zip@ widgetmail.com
zap@ example.com
grp@ example.com
zor@ example.com
I'm guessing that I can perhaps use a RegEx query, but I want to do it without having to first specify all the domain names....
Make sense? Possible?
Thanks,
TJ
I don't think that will necessarily order by the most popular.
You're right. It's still actually a good start, but I think to be really useful I need it in popularity order.
Is there anyway to introduce a "count" field for each domain found, then order by count?
I appreciate it'll be a sluggish query and probably take ages to execute, but in my particular case that doesn't matter - it'll only be performed once in a blue moon and I can always do the query offline on my testbed PC.
Thanks,
TJ
SELECT SUBSTRING(email FROM POSITION('@' IN email)), COUNT(SUBSTRING(email FROM POSITION('@' IN email))) AS domain_count FROM users GROUP BY SUBSTRING(email FROM POSITION('@' IN email)) ORDER BY domain_count DESC LIMIT 30;
Thanks for the help guys.
(PS 0.22 seconds to run this over a DB of 30k users which I didn't think was bad at all).
TJ
Only problem is that query won't be able to make use of an index
Functional indexes are "still" on the MySQL To Do list as far as I know but have been for a long time (as in years) with a medium priority. I suppose a workaround might be to CREATE VIEW if necessary.
I don't think that will necessarily order by the most popular.
Correct. I was hurried off this weekend so I didn't finish the prioritization but thought that might be enough to get TJ going. Actually, a more refined list might use a subquery, something along the lines of ...
SELECT
email.email_address
FROM email
INNER JOIN
(
SELECT
SUBSTRING(email_address FROM POSITION('@' IN email_address)) AS domain,
COUNT(SUBSTRING(email_address FROM POSITION('@' IN email_address))) AS domain_count
FROM email
GROUP BY SUBSTRING(email_address FROM POSITION('@' IN email_address))
) AS email2
ON (SUBSTRING(email.email_address FROM POSITION('@' IN email.email_address)) = email2.domain)
ORDER BY domain_count DESC, email.email_address
;
For example, the resulting set might look something like:-bob@ widgetmail.com
sam@ widgetmail.com
wil@ widgetmail.com
frd@ widgetmail.com
zip@ widgetmail.com
zap@ example.com
grp@ example.com
zor@ example.com
... as opposed to just the domain name. I thought you had intended to get the email addresses, not just the domain.