I'm working on a new feature for one of my sites: a comparison engine that lets users compare widgets. This is not an e-commerce thing; it is simply a tool that allows users to compare the features of widgets.
The idea is to have a link from the widget page, say widget-A, to the comparison page /widgets/compare/widget-A.html. On that page the user can select another widget to compare with widget-A, so the URL would change to /widgets/compare/widget-A.html?item1=widget-B. The page will be updated using an XHR request, and the URL will be updated using pushState().
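To make the URL scheme concrete, here is a rough sketch of how the comparison URLs could be built. The function name `compare_url` and the parameter names beyond `item1` (`item2`, `item3`, ...) are my own assumptions for illustration; only the path and `item1` come from the example above.

```python
from urllib.parse import urlencode


def compare_url(base_widget, others=()):
    """Build the comparison URL for base_widget against zero or more other widgets.

    Hypothetical helper: the query parameters are numbered item1, item2, ...
    Only item1 appears in the plan above; the rest are an assumption.
    """
    path = f"/widgets/compare/{base_widget}.html"
    # Number the additional widgets starting at 1: item1, item2, ...
    params = {f"item{i}": slug for i, slug in enumerate(others, start=1)}
    # With no other widgets selected, the URL is just the comparison page itself.
    return f"{path}?{urlencode(params)}" if params else path


print(compare_url("widget-A"))
print(compare_url("widget-A", ["widget-B"]))
```

The same string would be what gets passed to pushState() on the client after the XHR request completes.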
The question is: what are the SEO implications of this? The first thing that comes to mind is duplicate content. I am basically increasing the total number of URLs to the number of widgets to the power of 2, or actually to the power of 3 or 5, as I plan to allow multiple widgets to be compared at once. The resulting number is going to be big, because my number of widgets is on the order of hundreds of thousands. What should I do?
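As a back-of-the-envelope check on how big this gets, here is a small sketch. The figure of 100,000 widgets is only an illustrative round number, not my real count, and the function names are made up. It counts both every ordered URL (A vs. B and B vs. A as separate pages) and every distinct set of widgets (A vs. B counted once).

```python
from math import comb


def ordered_urls(n_widgets, k):
    """URLs if every ordering of k widgets is its own page (n to the power of k)."""
    return n_widgets ** k


def distinct_sets(n_widgets, k):
    """Distinct comparisons if order does not matter and widgets do not repeat."""
    return comb(n_widgets, k)


n = 100_000  # illustrative assumption, not the real number of widgets
for k in (2, 3, 5):
    print(k, ordered_urls(n, k), distinct_sets(n, k))
```

Even for two widgets at a time, this gives 10,000,000,000 ordered URLs and 4,999,950,000 distinct pairs, so treating A vs. B and B vs. A as the same page only cuts the total roughly in half.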
On the one hand, the basic data displayed on these pages for each widget already exists on the widget page, so I'm not really adding anything new in terms of data. Should I simply block this content from indexing?
On the other hand, users will certainly be looking for answers to questions such as "is widget-A bigger than widget-B?" In the current state the user would need to visit two pages to get the answer, so Google is unlikely to associate my site with this query, whereas the comparison page would answer the question directly. So having the content indexed would be of interest, both for me and for my users. Do I simply leave it up to Google to decide?
I plan to include links to specific comparison pages from the widget page. So there would be links from the widget-A page to comparisons with similar widgets, say widget-Aa. I already have a list of links to the similar widget pages. This would lead Google to discover the pages naturally and would restrict the total number of pages to index, but it would also restrict the comparisons to like items, which may not be the comparisons of highest interest (if that makes any sense). Simply put, Google would index widget-A vs. widget-Aa (highly similar, low interest) but would never discover or index widget-A vs. widget-Z (low similarity, high interest).
Any ideas? Am I overthinking this? Build it and they will come?