I just discovered that a site I'm helping with has thousands of pages from their staging server indexed in Google.
That creates massive amounts of duplicate content that can't be blamed on scrapers!
Check to make sure this isn't happening to you.
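The quickest check is a site: query in Google against your staging hostname (staging.example.com below is just a placeholder; substitute your own):

site:staging.example.com

If that search returns results, your staging pages are in the index.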
You can keep pages out of the search engine indexes by adding a noindex robots meta tag to the <head> ... </head> section:
<meta name="robots" content="noindex">
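If you can't edit the markup, or the files aren't HTML at all (PDFs, for example), Google also honors the same directive when it's sent as an HTTP response header:

X-Robots-Tag: noindex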
Don't block spiders in robots.txt if you're doing this. Google et al. will only see the noindex directive if they're able to spider the page.
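In other words, a staging robots.txt like this is exactly what you don't want while the noindex tags are doing their work:

User-agent: *
Disallow: /

That keeps crawlers from fetching the pages at all, so anything already indexed stays indexed and never sees the noindex.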
Also, remember to remove the noindex directive when you publish the content to the main site.
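One way to make that hard to forget is to emit the tag only when the code detects a staging environment. Here's a minimal sketch in Python, assuming a STAGING environment variable (the variable name is just an example) that is set only on the staging server:

import os

def robots_meta_tag():
    # Emit the noindex tag only on staging; production gets an empty string.
    if os.environ.get("STAGING"):
        return '<meta name="robots" content="noindex">'
    return ''

Your page template would then include the result of robots_meta_tag() inside <head>, and publishing to the main site needs no manual cleanup.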