Protecting Written Content from Reuse by Other Publishers

CommandDork

3:23 pm on Sep 23, 2022 (gmt 0)

So after all these decades, has there been any truly effective way for publishers of original site content to protect their written work from being stolen by other publishers?

Just curious - it's been 20+ years and I'm still having to file DMCA notices with Google Search, AdSense, and hosting providers. It becomes a terrible waste of time that could be spent doing other things. If browsers stopped allowing the ability for users to view code, I think that would be a step in the right direction.

Has anyone found any useful method for protecting their stuff?

tangor

5:25 pm on Sep 23, 2022 (gmt 0)

Passwords ... but that defeats the purpose of getting new users.

Not posting it? Nah, see above.

Automating your DMCA process? That helps, marginally.

Pretty sure the answer is "NO".

Dimitri

9:44 pm on Sep 23, 2022 (gmt 0)

If browsers stopped allowing the ability for users to view code, I think that would be a step in the right direction.

How is that related to copying the content?

If you mean copying the text, then someone can simply select the text and copy and paste it.

If you mean copying the whole page, including its layout, then this is certainly not achieved by viewing the HTML source code of the page. Instead, an automatic process will be involved: a robot fetching the page and then extracting its content.

From the moment the content is viewable, it's reproducible. You can even extract the textual content of an image, using OCR technologies.

Now, there are some tricks to limit some forms of scraping.

- you can block access to your pages when the IP address belongs to a datacenter. You can also restrict access to IP addresses from the countries you target.

- you can populate your page asynchronously: a piece of JavaScript fetches the content from your server and injects it into the page. In that case, a "view source" will show only the JavaScript code, which you can obfuscate. However, the DevTools included in all web browsers will show the rendered HTML after the insertions. Also, most scrapers use bots which can render JavaScript, but most of these bots also run from an IP address which belongs to a datacenter.

- you can check whether the favicon has been retrieved by the IP before serving the content. All browsers seem to automatically retrieve the favicon when a page is visited. This fetch comes after the download of the page, so you would need to populate the page asynchronously and wait until you detect the favicon fetch (or check whether it was already fetched on a previously visited page). Most bots do not retrieve the favicon. Bot writers will know what to do now :-)

- you can also test whether the browser accessing the page supports the latest technologies, since a lot of bots are a bit outdated (but with Chromium code, it's easy to make a bot which perfectly emulates a human visit). For example, reject all connections which are not HTTP/2 or which do not support Brotli compression, test whether the browser supports WebP images, etc. The lack of support for the latest technologies is a hint that the request might not come from a human.
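The checks in the list above could be combined into a single score, roughly as in the sketch below. It is only an illustration under several assumptions: the function names, the `DATACENTER_NETWORKS` placeholder list (documentation-only address ranges), and the idea of reading WebP support from the `Accept` header are all hypothetical choices, not something taken from a specific server or library. HTTP/3 is also accepted here alongside HTTP/2, which goes slightly beyond the rule as stated.

```python
import ipaddress

# Hypothetical datacenter ranges. These are documentation-only networks
# (RFC 5737), standing in for a real, regularly updated list.
DATACENTER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(ip):
    """Return True if the address falls inside a listed datacenter range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_NETWORKS)


def bot_suspicion_score(ip, http_version, headers, favicon_seen_ips):
    """Count how many of the heuristics the request trips (0 to 5).

    headers: dict mapping lowercase header names to their values.
    favicon_seen_ips: set of IPs that have already fetched the favicon.
    """
    score = 0
    # Heuristic 1: request comes from a datacenter address.
    if is_datacenter_ip(ip):
        score += 1
    # Heuristic 2: client does not speak HTTP/2 (or newer).
    if http_version not in ("HTTP/2", "HTTP/3"):
        score += 1
    # Heuristic 3: client does not advertise Brotli ("br") compression.
    raw_encodings = headers.get("accept-encoding", "").split(",")
    encodings = [e.split(";")[0].strip() for e in raw_encodings]
    if "br" not in encodings:
        score += 1
    # Heuristic 4: client does not advertise WebP (assumed signal, weak).
    if "image/webp" not in headers.get("accept", ""):
        score += 1
    # Heuristic 5: this IP has never fetched the favicon.
    if ip not in favicon_seen_ips:
        score += 1
    return score
```

A higher score only means more hints, not proof. Some real browsers do not list `image/webp` in the `Accept` header of a page request, so that check in particular would need testing against real traffic before rejecting anyone on it.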

But I am drifting away, and only considering blocking automatic scraping. You will never be able to prevent a human from manually copying the content of your HTML pages, as far as I know.

phranque

11:51 pm on Sep 23, 2022 (gmt 0)

If browsers stopped allowing the ability for users to view code...

browsers are simply a subset of user agents.
you can easily make document requests without rendering the document using open source user agents such as curl, lwp-request, etc

tangor

1:34 am on Sep 24, 2022 (gmt 0)

^^^True, which is why all are denied in .htaccess. :)

CommandDork

1:39 am on Sep 26, 2022 (gmt 0)

Understood. Thanks all :/

Sgt_Kickaxe

7:53 am on Sep 26, 2022 (gmt 0)

There is absolutely no way, and will never be a way, to stop someone from copying snippets of a page. It's needed for discussion and is considered legal; Google's search engine is based on it.

As for copying an entire page, you probably can't stop it, but search typically makes it pointless to do in the long run.

According to Google's John Mueller, every section of a page is converted into a hash value, and pages with content too similar to existing content get treated as duplicates by search. Yes, people can outrank you with your content for a while, but as your page builds authority it becomes less likely that they will keep doing so. You might need DMCAs for theft of fresher content, but after 20 years I'd expect that copies won't get traffic from search.
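To give a feel for how hash-based duplicate detection can work in general, here is a minimal sketch using word shingles and Jaccard similarity. This is a common textbook technique, not Google's actual method, which is not described in this thread. The function names and the shingle size of 5 are assumptions for illustration.

```python
import hashlib


def shingle_hashes(text, size=5):
    """Hash every run of `size` consecutive words in the text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < size:
        # Text shorter than one shingle: hash it as a single unit.
        return {hashlib.sha1(" ".join(words).encode("utf-8")).hexdigest()}
    return {
        hashlib.sha1(" ".join(words[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(len(words) - size + 1)
    }


def similarity(a, b, size=5):
    """Jaccard similarity of the two texts' shingle-hash sets (0.0 to 1.0)."""
    ha, hb = shingle_hashes(a, size), shingle_hashes(b, size)
    if not ha or not hb:
        return 0.0
    return len(ha & hb) / len(ha | hb)
```

Running `similarity(my_article, suspected_copy)` on your own text and a scraped page gives a rough number for how much was lifted. A near-verbatim copy scores close to 1.0, and unrelated text scores 0.0.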

All that aside, instead of scouring the net for copies, just watch your traffic totals. If a page loses traffic, a quick search for that page will tell you if theft is the cause. Spend more time creating and less time policing. If it's online, someone somewhere will copy it.
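Watching traffic totals can be automated with something as simple as the sketch below. It assumes you can export daily page views per URL from your analytics. The function name, the 7-day window, and the 50% drop threshold are hypothetical values to tune, not recommendations.

```python
def pages_losing_traffic(daily_views, recent_days=7, drop_ratio=0.5):
    """Return pages whose recent average views fell below
    drop_ratio times their earlier average.

    daily_views: dict mapping page URL to a list of daily view counts,
    oldest first.
    """
    flagged = []
    for page, views in daily_views.items():
        # Need at least as much history as the recent window to compare.
        if len(views) < 2 * recent_days:
            continue
        baseline = views[:-recent_days]
        recent = views[-recent_days:]
        baseline_avg = sum(baseline) / len(baseline)
        recent_avg = sum(recent) / len(recent)
        if baseline_avg > 0 and recent_avg < drop_ratio * baseline_avg:
            flagged.append(page)
    return flagged
```

Any page it flags is only a candidate for the quick search mentioned above. Seasonality or a ranking change can cause the same drop.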

Kendo

11:40 am on Sep 26, 2022 (gmt 0)

Many online tutors are moving toward copy protection and DRM to protect their intellectual property.

engine

12:03 pm on Sep 26, 2022 (gmt 0)

The best thing that I found was to embed identifiers in the content so that copies were easier to track. I'd make a note of each identifier in a spreadsheet/database, as I'd usually forget about it. Every so often I'd run a trawl looking for the most recent scrapes, and then take action accordingly.
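The post does not say what form the identifiers took. One possible approach, shown purely as an assumption, is an invisible watermark made of zero-width characters that encodes a numeric ID you record in your spreadsheet. The function names and the 16-bit width are hypothetical.

```python
ZERO = "\u200b"  # zero-width space, encodes bit 0
ONE = "\u200c"   # zero-width non-joiner, encodes bit 1


def embed_identifier(text, identifier, bits=16):
    """Insert an invisible bit pattern for `identifier` after the first word."""
    if not 0 <= identifier < 2 ** bits:
        raise ValueError("identifier does not fit in the given bit width")
    # Most significant bit first.
    mark = "".join(
        ONE if (identifier >> i) & 1 else ZERO for i in reversed(range(bits))
    )
    head, sep, tail = text.partition(" ")
    return head + mark + sep + tail


def extract_identifier(text):
    """Recover the identifier from marked text, or None if no mark is found."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    return int(bits, 2) if bits else None
```

For example, `embed_identifier(article_text, 1234)` produces text that looks unchanged on screen, and `extract_identifier()` on a scraped copy returns 1234 if the mark survived. A scraper that strips or normalises unusual characters will remove it, so a distinctive visible phrase per article is a sturdier alternative for the trawl.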

I also didn't let it anger me.