Forum Moderators: open

Message Too Old, No Replies

What is the best way to prevent duplicates in mySQL?

Wanted to get a few opinions of the best way to do this.

         

webworker us

8:55 pm on Oct 28, 2007 (gmt 0)

10+ Year Member



Hey Guys,

I have a table with the following:

ID - Primary Key
MasterUser - Detail Key (to a master table)
JobDate - Date job performed
StartTime - text
EndTime - text
Notes - text
Closed - bool (job open or closed)
Type - int (Job type)
Fee - text

I have been building a PHP application to allow people to sign up for jobs. Any user can sign up for any number of jobs, and the same job could be signed up for by multiple users.

The part I'm wondering about is preventing a user from signing up for the same job more than once.

One approach I had considered, was to do a query before the insert to see if the MasterUser and JobDate (the two fields I would judge duplication by) were already in the table. However, that seemed like it would be pretty slow. What I was really hoping to do was to use the INSERT ON DUPLICATE KEY stuff, but I'm not really sure how to define what makes my records "duplicate". The last thought I had had was some kind of INSERT INTO WHERE NOT IN(SELECT...) but I don't think that is valid SQL.

Any suggestions?

Thanks,

Casey

SeanW

12:02 am on Oct 29, 2007 (gmt 0)

10+ Year Member



If there's an index on MasterUser,JobDate your lookup will be very fast. You're probably going to be joining on them anyway, so the index will help.

Sean

webworker us

12:34 am on Oct 29, 2007 (gmt 0)

10+ Year Member



Hey Sean,

Thanks for the reply. I have the table indexed on ID number, but not on MasterUser or the date. At most, a person could have four "months" worth of inserts, so that would be 120ish selects. Would indexing those two fields be enough to offset that?

Casey

SeanW

12:43 am on Oct 29, 2007 (gmt 0)

10+ Year Member



If you have to do 120 selects to find out of they've applied for a job, you've done something wrong with the design or have explained it to me wrong ;)

You can also check dates in ranges, ie

select * from foo where applied <= "2007-10-10" and applied >="2007-06-10"

Sean

webworker us

1:11 am on Oct 29, 2007 (gmt 0)

10+ Year Member



I think my explaination is lacking. Let me try again. The purpose of this PHP page is to allow users to quickly apply for jobs over an entire calendar year. What I have setup is a page where first they enter the details of their job offer: time to work, pay, etc. From there, they are taken to a page that has a calendar, seen 4 months at a time. Each date is a checkbox, and if they check the box for a given date, they want to add their job offer to that date (In case you were curious, the reason there are so many adds is this is a child care site, so people are posting their availability for the year).

As they go from 4 month block to 4 month block, I add their job offers to the database. Specifically, I have each date as an input field in a form, and before the next page is displayed, the page loops through the form data building a huge INSERT INTO then hitting the database.

The reason for all the selects would be to check each of the dates that was "checked" by the user. I wouldn't mind building them into a huge select, and being able to say:

INSERT INTO jobs VALUES (), (), (), () WHERE NOT IN (SELECT date, date, date, date), but I don't think that is valid (could be totally wrong...wouldn't be the first time).

If there is a much better way to do this, please say so, and I'll gladly change.

Thanks again,

Casey

SeanW

2:04 am on Oct 29, 2007 (gmt 0)

10+ Year Member



As I understand it you're editing that quarter, so it's possible I could set my availability for the quarter once, and then add/remove days as necessary. So you've got to worry about the case where a day is added, a day is deleted, and where a day doesn't change.

In that case, when the user submits the form, read in the data for the days covered by the calendar from the database in one query.

Then loop through the results of the form. By comparing the data from the database and the values from the form, you know which days have to be added or deleted.

If you want to add in one query, you can do

INSERT INTO table (colA, colB, colC) VALUES (1,2,3), (4,5,6),(7,8,9)

We've gone somewhat astray from your initial question, which can technically be solved with a unique index on those two columns (one index on both columns). You'll get errors if you try to insert a duplicate row, which is why the above method of reading in all the rows and then writing out the changes is best in any case.

Sean

webworker us

2:52 am on Oct 29, 2007 (gmt 0)

10+ Year Member



I get lucky with this page in that it only has to handle adds. Edits and deletes are handled differently.

I'll try doing the select on the date range then checking against it and see how that goes. Anything I should watch out for so I don't needlessly loop through the recordset?

My own thought was check the record date against the form date. If the date on the form is less than the record date, add it to the INSERT INTO, otherwise increment the recordset, check again.

Casey

SeanW

3:21 am on Oct 29, 2007 (gmt 0)

10+ Year Member



My own thought was check the record date against the form date. If the date on the form is less than the record date, add it to the INSERT INTO, otherwise increment the recordset, check again.

That's a cleverer way to do it, just make sure your query has an ORDER BY on the date, because you never know what order you'll get the data back.

Sean

mjwalshe

5:09 pm on Nov 21, 2007 (gmt 0)

10+ Year Member



That looks like a many to many relsion ship you need to use junction tables

so you have

tbl_User
tbl_Job
tbl_UserJob (this joins it and you would put a constaint on here)

the if SELECT UserId,JobId from tbl_UserJob returns nothing it's ok to post

ZydoSEO

7:24 pm on Nov 24, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I agree with mjwalshe. This is definitely a Many-To-Many relationship (Many users can sign up for Many Jobs) and therefore you should have a relationship table as he described... Something like:

tUser w/ UserID as the primary key
tJob w/ JobID as the primary key

and a relationship table:

tUserJob or tJobUser w/ (UserID,JobID,JobDate) together forming the primary key with foreign key constraints back to the tUser and tJob tables so that you can't insert JobIDs in the relationship table if the JobID does not exist in tJob (and likewise with UserIDs and tUser).

I'm not sure I understand the application, but if the user picks a particular job and then you show them the calendar so they can sign up for that job on particular dates, then I'd suggest somehow graying out the dates for that job where that user already has a row in the relationship table for that job. Then they can only check 'additional' dates for that job and you should never get a duplicate key error on an insert. The UI prevents duplicates on the front end.

The other approach would be allow them to attempt to sign up for any date (including ones they already signed up for) and catch it on the backend when you're doing the inserts. If inserting one date at a time (probably inefficient), you could simply do inserts and ignore DUPLICATE KEY errors. If inserting in mass you should be able to do something similar to the following:

create a tmp table #tmpDatesSelected (DateSelected)

insert all dates checked on the UI into #tmpDatesSelected

INSERT INTO tUserJob (UserID,JobID,JobDate)
SELECT 'joeuser', -- sample UserID
14151, -- sample Job
ds.DateSelected
FROM tDate ds
LEFT JOIN tUSerJob uj ON ds.DateSelected = uj.JobDate
WHERE uj.JobDate is NULL

This should only attempt to insert rows into tUserJob where there is not already a row in tUserJob for that UserID, JobID, JobDate combination.

[edited by: ZydoSEO at 7:26 pm (utc) on Nov. 24, 2007]

webworker us

1:16 am on Nov 25, 2007 (gmt 0)

10+ Year Member



Hey guys thanks for the comments,

It is definitely a many to many relationship. Sadly when I first conceived this application, I didn't include the table you described.

At first I used ON DUPLICATE KEY ... UPDATE but the database on the production site does not support this command.

Therefore I think I'm going to try graying out the dates as you suggest, since that will visually let the user know "Oh I already have a job here".

I do one huge insert for the reasons you mention.

Thanks,

Casey