By Hannes Westermann, PhD student in Artificial Intelligence and Law at the Université de Montréal and student researcher at JusticIA and the Cyberjustice Laboratory.
The internet gives us access to an enormous amount of information. Billions of websites provide us with the means to learn about almost any conceivable subject, to stay in touch with our friends, to get instantaneous access to news from all over the world and to access an endless supply of entertainment.
However, far more information and content is available on the internet than any person could consume in a lifetime. To make sense of and navigate this wealth of data, we often rely on services that filter and rank the information for us, such as Google, which claimed to serve at least 2 trillion searches annually in 2016, equivalent to over 63,000 per second. These services are backed by algorithms that sift through billions of documents and determine what to show us and in which order.
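The per-second figure follows from the annual one by simple division. A quick check, assuming a 365-day year (2016 was a leap year, which lowers the result slightly but keeps it above 63,000):

```python
# Convert the claimed annual search volume into searches per second.
SEARCHES_PER_YEAR = 2_000_000_000_000  # "at least 2 trillion", as claimed by Google
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year

searches_per_second = SEARCHES_PER_YEAR / SECONDS_PER_YEAR
print(f"{searches_per_second:,.0f} searches per second")
```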
What is ranking?
Imagine that you want to learn more about artificial intelligence. You go to a search engine and enter the query “artificial intelligence” into the search box. From the billions of sources and websites available on the internet, the search engine must decide which links to show in a list on your screen, and in which order. You likely expect the most relevant sites to be at the top of the result list. But how can the search engine determine this relevance?
The first step would be to filter the websites and only consider websites that contain the term “artificial intelligence”. However, according to Google, there are over 800 million websites that contain this term. The algorithm thus needs to find a way to decide which of these results to show first.
Some traditional ways to sort a result list with very simple algorithms would be sorting:
- by date (most recent first)
- by how often the term appears in the website
- by how often this site has been visited
These criteria would be enough to build a simple search engine with matching results, but they might not give the results of the highest quality or relevance, due to the amount and complexity of information on the internet. Further, they would be easy to game: a person who wanted to appear high in the list of results could simply publish an article once per hour and include the term as often as possible.
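The three simple criteria above, and the ease with which they can be gamed, can be illustrated with a minimal sketch. The `Page` structure, the example URLs and all the numbers are invented for illustration; no real search engine is claimed to work this way.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    # Hypothetical record for one website; every field is an assumption.
    url: str
    text: str
    published: date
    visits: int


def term_count(page: Page, term: str) -> int:
    """Count case-insensitive occurrences of the search term in the page text."""
    return page.text.lower().count(term.lower())


def matching(pages: list[Page], term: str) -> list[Page]:
    """Filter step: keep only the pages that contain the term at all."""
    return [p for p in pages if term_count(p, term) > 0]


def rank_by_date(pages: list[Page], term: str) -> list[Page]:
    """Most recently published pages first."""
    return sorted(matching(pages, term), key=lambda p: p.published, reverse=True)


def rank_by_term_frequency(pages: list[Page], term: str) -> list[Page]:
    """Pages that mention the term most often first."""
    return sorted(matching(pages, term), key=lambda p: term_count(p, term), reverse=True)


def rank_by_visits(pages: list[Page], term: str) -> list[Page]:
    """Most visited pages first."""
    return sorted(matching(pages, term), key=lambda p: p.visits, reverse=True)


pages = [
    Page("encyclopedia.example",
         "Artificial intelligence is the study of intelligent machines.",
         date(2018, 5, 1), 90_000),
    # A page written only to game the ranking: recent and stuffed with the term.
    Page("spam.example", "artificial intelligence " * 50, date(2020, 8, 1), 12),
    Page("cooking.example", "A recipe for pancakes.", date(2020, 7, 1), 5_000),
]

query = "artificial intelligence"
for rank in (rank_by_date, rank_by_term_frequency, rank_by_visits):
    print(rank.__name__, [p.url for p in rank(pages, query)])
```

In this toy data set the keyword-stuffed page comes first both by date and by term frequency, and only the visit count places the encyclopedia entry on top.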
Modern search engines therefore use much more complex ranking systems, using hundreds of criteria to determine how to rank results. When Google receives a search query, it performs a sophisticated analysis to establish the meaning of the search terms, the relevance of webpages, factors about the quality of the content and the usability of the website, and the context of the search, such as what it knows about the user. Interested parties have identified 210 factors that probably affect the ranking of websites. Facebook, which uses ranking systems to determine what to show to users in their news feed, claims to take into account thousands of signals in determining what to show next.
These algorithms can provide enormous value to us as users, by making it easier for us to find information of a certain quality and relevance. However, it is important to keep in mind that changes to the algorithms can change the order of results that show up in the result list after a search. The ranking of search results does not only promote certain results, but also places other results in later pages where they are unlikely to be found by the user.
Which links appear on the first page of results plays a huge role in how people find relevant information. A study has shown that over 90% of clicks after a Google search go to a link on the first page of matching results, with 33% of clicks going to the very first result. Given that 30% of traffic to news sites comes from Google, and that 68% of American adults claim that they at least occasionally get their news from social media, the ranking of information might play a significant role in shaping users’ understanding of the world and influencing their political opinions.
This means that the developers of the algorithms have immense power, knowingly or unknowingly. Altering the parameters of the algorithms will affect how information is ranked and can thus influence the information that will be presented to billions of individuals on a daily basis.
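How a change in parameters can reorder results can be sketched with a weighted sum of signals. This is only an illustration: the three signal names, the weights and the scores are invented, and real ranking systems are far more complex and not public.

```python
def score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-page signals into one number using a weighted sum."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


def rank(candidates: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Return the candidate URLs ordered from highest to lowest score."""
    return sorted(candidates, key=lambda url: score(candidates[url], weights), reverse=True)


# Hypothetical signal values between 0 and 1 for two matching pages.
candidates = {
    "a.example": {"relevance": 0.9, "quality": 0.4, "usability": 0.5},
    "b.example": {"relevance": 0.7, "quality": 0.9, "usability": 0.8},
}

# Two hypothetical parameter settings for the same algorithm.
balanced_weights = {"relevance": 0.5, "quality": 0.3, "usability": 0.2}
relevance_heavy_weights = {"relevance": 0.9, "quality": 0.05, "usability": 0.05}

print(rank(candidates, balanced_weights))
print(rank(candidates, relevance_heavy_weights))
```

With the balanced weights, b.example is ranked first; with the relevance-heavy weights, a.example overtakes it, although neither page has changed.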
These ranking algorithms can have a profound impact on our lives. In determining which sources of information are presented to us, and when we see them, they can affect the way we feel, form our opinions and create a picture of the world, society, political systems and other people, both locally and globally. A study with 4,556 participants showed that biased search rankings can shift the voting preference of undecided voters by 20% or more, while also being able to remain undetected by the participants. Facebook has demonstrated that changing the algorithm to favor negative posts will influence the mood of the user to also be more negative.
Transparency in ranking algorithms
Given the influence of ranking algorithms described above, the question arises to what extent their operation is transparent, so that the factors that go into a ranking decision can be understood.
In general, due to the massive commercial value of the algorithms, the companies protect them as trade secrets. This is a form of intellectual property protection for information that is commercially valuable to its holder because it is secret, and that the holder actively keeps secret. Google’s search algorithm is, alongside the Coca-Cola recipe, a textbook example of this type of protection.
Google argues that this form of protection is vital for maintaining the quality of search results. It describes building the algorithm as a kind of cat-and-mouse game. Google picks indicators of high quality of a website to surface valuable content. However, websites then try to find these indicators and optimize for the indicators, rather than actually improving the quality of their website. Having to reveal the algorithm would therefore make it easier for website creators to trick the algorithm, thus hampering the quality of search results.
Generally, this would imply that the ranking algorithms do not have to be revealed. Recently, there has been a lot of debate regarding a “right to an explanation” of automated decisions, particularly as regulated in articles 15 and 22 of the General Data Protection Regulation (GDPR), the new EU privacy regulation. It is an open question whether these legal provisions might give rise to a right to get information about automated ranking decisions. There are, however, several indications that this might not be the case. First of all, the GDPR is only applicable to the processing of personal data. Ranking content, such as links to websites, is based to some extent on the content itself (such as the website content in relation to a search term, or the number of likes a piece of content has received) rather than on personal information about the user. Therefore, ranking algorithms might partially or wholly fall outside of the GDPR. Further, even if the GDPR were applicable, the obligation to disclose the exact content of an algorithm behind a decision is hotly debated, with researchers claiming that this does not have support in the legislation. Where such an obligation is seen to exist, it often refers to decisions “which produces legal effects concerning him or her or similarly significantly affects him or her”, such as decisions regarding recruiting or granting a loan. A decision on how to rank search results to be presented to a user has at most an indirect effect on the user and seems unlikely to fall under this categorization.
That said, there seems to be a movement towards opening up parts of the algorithms to the user. Facebook, for example, has introduced a feature to allow individuals to gain information about the order of items in the newsfeed.
Despite the importance of the ranking algorithms, people might not be aware of their existence. A study revealed that 62.5% of participants were not aware that a complex algorithm was ranking and selecting content, and assumed that they would see all posts by their friends on Facebook.
The algorithms have tremendous commercial value for their developers, and their efficiency and quality might be impeded if they were disclosed. The proposed “right to an explanation”, as discussed in some jurisdictions in connection with automated decisions that directly affect people, does not seem to extend to algorithm-based decisions on how to rank information returned in response to a user query.
In light of the crucial role that ranking algorithms play in providing people with information, it is important to always remain aware of the role that they play, and how they might affect us by the choices they make for us in ranking the information that they present.
This content has been updated on 13 August 2020 at 16 h 36 min.