Forum Moderators: open

Message Too Old, No Replies

MySQL : Getting "common" email addresses

Hotmail, Gmail etc

         

trillianjedi

10:55 am on Aug 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I wonder if this is possible, if not by query perhaps via a script?

I have a membership table with a registered email address for each record, say:-

Table: Membership
field1: username
field2: userid
field3: email_address

I want to pull a query result set listing popular email domains (so right of the "@") ordered most popular first.

For example, the resulting set might look something like:-

bob@ widgetmail.com
sam@ widgetmail.com
wil@ widgetmail.com
frd@ widgetmail.com
zip@ widgetmail.com
zap@ example.com
grp@ example.com
zor@ example.com

I'm guessing that I can perhaps use a RegEx query, but I want to do it without having to first specify all the domain names....

Make sense? Possible?

Thanks,

TJ

coopster

2:09 pm on Aug 26, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



SELECT * FROM email ORDER BY SUBSTRING(email_address FROM POSITION('@' IN email_address)), email_address;

zCat

4:05 pm on Aug 26, 2006 (gmt 0)

10+ Year Member



Only problem is that query won't be able to make use of an index, which might cause performance problems with a large dataset. The best solution would be a functional index, but I don't think MySQL supports these; otherwise an extra field containing either just the domain part of the address or the address in reverse could be created when inserting / updating the data.

trillianjedi

5:40 pm on Aug 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Coop - brilliant - thanks!

zCat - absolutely right and good point, but in my particular case I only need to run this once.... or at least very very rarely.

Thanks guys,

TJ

FalseDawn

8:44 pm on Aug 26, 2006 (gmt 0)

10+ Year Member




SELECT * FROM email ORDER BY SUBSTRING(email_address FROM POSITION('@' IN email_address)), email_address;

I don't think that will necessarily order by the most popular.

trillianjedi

11:17 am on Aug 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't think that will necessarily order by the most popular.

You're right. It's still actually a good start, but I think to be really useful I need it in popularity order.

Is there anyway to introduce a "count" field for each domain found, then order by count?

I appreciate it'll be a sluggish query and probably take ages to execute, but in my particular case that doesn't matter - it'll only be performed once in a blue moon and I can always do the query offline on my testbed PC.

Thanks,

TJ

trillianjedi

4:00 pm on Aug 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Got it, example top 30 registration email domains:-


SELECT SUBSTRING(email FROM POSITION('@' IN email)), COUNT(SUBSTRING(email FROM POSITION('@' IN email))) AS domain_count FROM users GROUP BY SUBSTRING(email FROM POSITION('@' IN email)) ORDER BY domain_count DESC LIMIT 30;

Thanks for the help guys.

(PS 0.22 seconds to run this over a DB of 30k users which I didn't think was bad at all).

TJ

coopster

5:27 pm on Aug 28, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member




Only problem is that query won't be able to make use of an index

Functional indexes are "still" on the MySQL To Do list as far as I know but have been for a long time (as in years) with a medium priority. I suppose a workaround might be to CREATE VIEW if necessary.


I don't think that will necessarily order by the most popular.

Correct. I was hurried off this weekend so I didn't finish the prioritization but thought that might be enough to get TJ going. Actually, a more refined list might use a subquery, something along the lines of ...

SELECT 
email.email_address
FROM email
INNER JOIN
(
SELECT
SUBSTRING(email_address FROM POSITION('@' IN email_address)) AS domain,
COUNT(SUBSTRING(email_address FROM POSITION('@' IN email_address))) AS domain_count
FROM email
GROUP BY SUBSTRING(email_address FROM POSITION('@' IN email_address))
) AS email2
ON (SUBSTRING(email.email_address FROM POSITION('@' IN email.email_address)) = email2.domain)
ORDER BY domain_count DESC, email.email_address
;

You may need to refine it somewhat but you get the idea ...

trillianjedi

7:51 am on Aug 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



but you get the idea

It looks to be a more complex query, but I'm not sure that I see the advantage?

Is it a speed thing?

TJ

coopster

5:48 pm on Aug 29, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



No, it gives the email address detail as you asked in your first post ...

For example, the resulting set might look something like:-

bob@ widgetmail.com
sam@ widgetmail.com
wil@ widgetmail.com
frd@ widgetmail.com
zip@ widgetmail.com
zap@ example.com
grp@ example.com
zor@ example.com

... as opposed to just the domain name. I thought you had intended to get the email addresses, not just the domain.