Forum Moderators: Robert Charlton & goodroi
Google : Rethinking Search: Making Experts out of Dilettantes
This paper envisions a unified model-based approach to building IR systems that eliminates the need for indexes as we know them today by encoding all of the knowledge for a given corpus in a model that can be used for a wide range of tasks. As the remainder of this paper shows, once everything is viewed through a model-centric lens instead of an index-centric one, many new and interesting opportunities emerge to significantly advance IR systems. If successful, IR models that synthesize elements of classical IR systems and modern large-scale NLP models have the potential to yield a transformational shift in thinking and a significant leap in capabilities across a wide range of IR tasks, such as document retrieval, question answering, summarization, classification, recommendation, etc.
If all of these research ambitions were to come to fruition, the resulting system would be a very early version of the system that we envisioned in the introduction. That is, the resulting system would be able to provide expert answers to a wide range of information needs in a way that neither modern IR systems, question answering systems, or pre-trained LMs can do today.
Some of the key benefits of the model-based IR paradigm described herein include:
• It abstracts away the long-lived, and possibly unnecessary, distinction between “retrieval” and “scoring”.
• It results in a unified model that encodes all of the knowledge contained in a corpus, eliminating the need for traditional indexes.
• It allows for dozens of new tasks to easily be handled by the model, either via multi-task learning or via few-shot learning, with minimal amounts of labelled training data.
• It allows seamless integration of multiple modalities and languages within a unified model.
Do informational sites need to become multi-lingual asap?
"We send billions of visits to websites every day, and the traffic we’ve sent to the open web has increased every year since Google Search was first created.
…we’ve seen that as we’ve introduced more of these features over the last two decades, the traffic we’re driving to the web has also grown — showing that this is helpful for both consumers and businesses."
[edited by: martinibuster at 3:39 am (utc) on May 27, 2021]
The author (Google Search Liaison Danny Sullivan) also noted that people use search differently than in the past, which can produce queries that need an instant answer but not a click.
Google’s Danny Sullivan offered as examples...
People look for quick facts
We send billions of visits to websites every day, and the traffic we’ve sent to the open web has increased every year since Google Search was first created.
Many of the links went to other sites, like the mortgage referral site Bankrate.com, even though those sites cited CelebrityNetWorth as their source.
It feels like it takes about ten years for people in this community to grudgingly accept change and by the time they do the world has changed again and you're still ten years behind.
Scraping is not the correct word. The use of that word in the context of normal indexing of a website is misleading. It's called indexing.
Modern search engines are an excellent example of human-in-the-loop: The human crafts a query, gets a ranked list of candidate documents to peruse, either finding what they're looking for or issuing a new query.
By clicking through to the underlying documents, the human is in a position to evaluate the trustworthiness of the information there. Is this a source that I trust? Can I trace back where it comes from? Is it from a context that is congruent with my query?
Further, that Google’s AI ethics board is no more and looks to stay gone is not a positive sign.
you're dismissing what doesn't conform to your pre-existing belief and rationalizing ideas (not facts) to shore up your belief
...if Googlebot does it, it is "indexing" and it's ok, but if another bot "indexes" your content and re-uses it on their site, it's spam. Got it.
The technology described in this paper continues to push the limits and further erodes the agreement; essentially, it pushes Google "indexing" ever closer to "scraping".
But to suggest that we should blindly trust Google, and that any technological innovation that is good for Google is good for us is just as nonsensical."
I am also asking everyone to not allow the conditioning of years of misleading clickbait articles
"...in chess the threat is often stronger than the execution...
...from what I see in this thread it has clearly succeeded in obfuscating much and revealing little."
The Q&As are full of wrong answers