At the end of July, Google finally let us have a closer look at Manifest v3 – a revision of how Chrome extensions work. This update has been quite controversial due to concerns that it will steal much of the power from ad-blockers. And while we dislike ads as much as the next person, we feel privacy is a much bigger issue.
It’s easy to understand that we’re being watched on the internet, but it’s difficult to feel it. There’s no shady figure peering at our screens from the darkness and neither do we hear muted breathing when on FaceTime. Yet the online world clearly knows which shoes we were looking to buy on Amazon (as it keeps on repeating), and that’s just the beginning.
In light of the controversial Chrome update, we decided to take a closer look at the 50 most popular websites in the world (according to SimilarWeb), to see just how much our activities are being tracked. A few of the websites are really bad, while some are quite benign – apparently, the porn industry respects your privacy more than some of the companies that form the backbone of the internet.
How are you being tracked online?
Most websites use little text files called “cookies” to enable certain features – sites read them to remember your login credentials, which articles you’ve already read, your language preferences, etc. These are stored on your hard drive and are called first-party cookies because they’re given to you by the website you’re visiting. Yet they’re (usually) not the only cookies you’re getting when you visit a website.
To help monetize their website or their service, many websites rely on advertising and analytics tools. By doing so, they allow third parties to run scripts on their website, and these third parties also create cookies on your device. Unlike the more benign first-party cookies, however, these can track you from one place on the internet to another.
Let’s take one of Google’s advertising services – DoubleClick. Of the Top 50 websites we looked at, 18 use the services of DoubleClick. Let’s say you’ve visited one of the sites serviced by DoubleClick – Reddit. You now have a cookie that has a unique identifier, the time you saw the ad, your IP address, and where on the site you were when you saw it – a home improvement subreddit. Later, you go to Twitch.tv, which also uses DoubleClick and can therefore read the original cookie on your device. Your unique identifier connects your two interests (home improvement and video games) and increases DoubleClick’s targeted advertising accuracy. This may sound abstract, but it really isn’t – just go to this page to see how well Google knows you.
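To make the Reddit-to-Twitch example concrete, here is a minimal sketch of the mechanism. It is a toy simulation, not DoubleClick's actual code: the `AdNetwork` class, the `serve_ad` method, and the page paths are all made up for illustration, and a plain dictionary stands in for the browser's cookie store.

```python
import uuid
from collections import defaultdict


class AdNetwork:
    """Toy third-party ad server embedded on many sites (hypothetical)."""

    def __init__(self):
        # unique identifier -> list of (site, page) visits seen so far
        self.profiles = defaultdict(list)

    def serve_ad(self, cookie_jar, site, page):
        """Called whenever a page embedding this network is loaded.

        `cookie_jar` stands in for the browser's cookies for the ad
        network's own domain, which is why the same cookie is visible
        no matter which site embeds the ad.
        """
        if "uid" not in cookie_jar:
            cookie_jar["uid"] = uuid.uuid4().hex  # first visit: set the ID
        self.profiles[cookie_jar["uid"]].append((site, page))
        return cookie_jar["uid"]


network = AdNetwork()
browser_cookies = {}  # one browser, one cookie jar for the ad domain

first = network.serve_ad(browser_cookies, "reddit.com", "r/HomeImprovement")
second = network.serve_ad(browser_cookies, "twitch.tv", "/directory/gaming")

print(first == second)          # the identifier is the same on both sites
print(network.profiles[first])  # both visits end up in one profile
```

Neither site ever sees the other's data directly. The link between the two visits exists only on the ad network's side, keyed by the identifier in the cookie.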
Google is omnipresent and huge, which is a curse, but also a blessing – it’s harder for them to do anything too egregious. The same can’t be said of the thousands of smaller advertisers and web analysis services littering the internet as a whole and the Top 50 websites in particular. For example, Google doesn’t sell the raw data they gather to third parties, but many others do. As a matter of fact, this is often an important source of revenue.
Cookies are not the only way businesses track consumers. The past decade has seen a great deal of innovation in this sphere, as a result of which there are many tools to keep tabs on you. This includes:
Web beacons. These are transparent images the size of 1 pixel, which have to be retrieved from a different server than the one hosting the website you’re visiting. By retrieving one, your browser gives away certain information, such as your IP address and browser type. Web beacons are widely used in conjunction with cookies.
Flash cookies (Local Shared Objects). Similar to regular cookies, only capable of holding much more data, longer-lasting, and more difficult to get rid of. They are notorious for their use in creating zombie cookies – a case where Flash cookies are used as backup for regular cookies, so that they can be restored after deletion.
HTML5 Local Storage. HTML5 introduced local storage in the browser, which websites can use as an alternative to regular cookies in some cases.
Cached content. To save load times, browsers cache content and reset the cache based on server-given expiration headers. This can be used (and is being used) to infer when a user has already visited a certain website. It’s difficult to prevent unless you want to constantly delete your cache.
Browser fingerprinting. Most people use plugins and extensions, making their browsers different. Websites can identify these differences and, together with Canvas fingerprinting, identify you in particular. Think you can’t be identified? Check here.
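The fingerprinting idea from the list above can be sketched in a few lines. This is a simplified illustration under our own assumptions: the attribute names and values are invented, the `canvas_hash` is a placeholder rather than a real Canvas rendering, and real fingerprinting scripts collect far more signals.

```python
import hashlib
import json


def fingerprint(attributes):
    """Hash a dict of browser attributes into a short, stable identifier."""
    # Sort keys so the same attributes always serialize the same way.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


visitor_a = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Europe/Vilnius",
    "plugins": ["pdf-viewer", "ad-blocker"],
    "canvas_hash": "9f2c",  # placeholder for a hash of rendered canvas pixels
}
# Same setup, but with one extra extension installed.
visitor_b = dict(visitor_a, plugins=["pdf-viewer", "ad-blocker", "dark-mode"])

# The same browser produces the same identifier on every visit...
print(fingerprint(visitor_a) == fingerprint(dict(visitor_a)))
# ...while a slightly different browser gets a different one.
print(fingerprint(visitor_a) == fingerprint(visitor_b))
```

No cookie is stored at any point, which is why clearing cookies does nothing against this kind of identification.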
What’s worse – 100 benign trackers or 1 bad one?
Our article ranks websites based on the number of trackers. The main reason for this choice is simplicity, but there’s more to it than that. From the perspective of privacy, virtually all trackers reduce your privacy to a greater or lesser extent, because they all increase your footprint on the web. And, even if they themselves are benign, some of the services they use may not be.
It’s certainly true that not all trackers are equal, but the thick legalese of many privacy policies makes it very difficult to gauge the differences. Therefore, we felt the number of trackers was a more objective unit of measure.
A brief note on the third-party vendors mentioned below
Looking through the trackers on the Top 50 websites, we chose several to illustrate a few points about the issues native to tracking technologies on the web. These are not necessarily the worst trackers for privacy found on these websites, and certainly not the worst overall.
|Tracker||Why is it bad for privacy?|
|AdGear||Shares personal consumer data with third-parties|
|BlueKai & Datalogix (Oracle)||Gathers consumer info to help with ad targeting and uses loyalty card data to gauge marketing success in terms of real-world purchases|
|Hotjar||Session replay script: records your individual browsing session (avoiding sensitive data fields – if everything goes as planned)|
|MaxPoint Interactive||Uses an extensive range of tracking technologies to “pinpoint qualified customers interested in purchasing your product with more precision than using traditional zip codes.”|
|PulsePoint||Healthcare ad service. Collects highly sensitive medical data|
|Quantcast||Bad reputation for using permanent “zombie cookies” in the past|
Stories of the Top 50
Across the Top 50 websites, we found 459 third-party trackers from 132 different sources. That means that, by visiting these 50 websites, you’re broadcasting your visit to at least 182 entities: the 50 sites themselves plus the 132 third parties. In reality, the number is significantly higher because, firstly, most of these third parties are sharing data with at least a few others.
Secondly, cookie syncing – the practice of sharing user data between different platforms – means that the conservative number of 182 is probably far from the truth.
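Cookie syncing is easier to picture with a small model. The sketch below is a toy simulation with hypothetical names (`Tracker`, `sync`, `import_from`, and the IDs are all invented). In a real browser the ID exchange usually rides on a redirect or pixel request that carries one tracker's ID to the other, whereas here it is a direct function call.

```python
class Tracker:
    """Toy tracker that knows users only by its own cookie ID (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.interests = {}  # own_id -> set of interests
        self.id_map = {}     # partner name -> {partner_id: own_id}

    def observe(self, own_id, interest):
        self.interests.setdefault(own_id, set()).add(interest)

    def sync(self, partner, own_id, partner_id):
        """Record that `own_id` and the partner's `partner_id` are one browser."""
        self.id_map.setdefault(partner.name, {})[partner_id] = own_id
        partner.id_map.setdefault(self.name, {})[own_id] = partner_id

    def import_from(self, partner):
        """Merge the partner's data for every user whose IDs have been synced."""
        for partner_id, own_id in self.id_map.get(partner.name, {}).items():
            self.interests.setdefault(own_id, set()).update(
                partner.interests.get(partner_id, set())
            )


ad_a = Tracker("tracker-a")
ad_b = Tracker("tracker-b")

ad_a.observe("A-123", "home improvement")
ad_b.observe("B-987", "video games")

# After the sync, tracker A can attach tracker B's data to its own profile.
ad_a.sync(ad_b, own_id="A-123", partner_id="B-987")
ad_a.import_from(ad_b)

print(sorted(ad_a.interests["A-123"]))
```

Once two trackers have matched their identifiers, data gathered by one can flow to the other, even for sites where only one of them is present.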
Yet not all websites in the Top 50 are equal, so let’s take a closer look at some of the details you are transmitting as you visit the sites.
Disclaimer: Beyond the Top 50, there are certainly websites that collect more sensitive data than the ones mentioned here.
Who has the most third-party trackers?
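The Top 50 doesn’t lack for cookies and pixels, but here’s a clear list of the 10 that have the most third parties. These are, in order of appearance:
Mail.ru
Accuweather.com
Ebay.com
OK.ru
Fandom.com
Samsung.com
Bbc.co.uk
Amazon.com
Reddit.com
WordPress.com
Let’s look at each of these in turn.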
Mail.ru: 61 third-party trackers
Mail.ru is Russia’s biggest tech company, with its hands in many different cookie jars. For starters, they own all the biggest Russian social media sites, like VK.com and ok.ru. Mail.ru is a website that does a lot of things, not least of which is deliver news, and one of the main ways online news portals monetize is through ads. As such, it’s no surprise that Mail.ru has a whopping 54 advertising trackers on the homepage.
Some of these have no qualms about collecting and using PII and even sensitive data (such as health information, religion, political affiliations, etc.). A few examples are PulsePoint – a healthcare ad service and information platform which lives and dies by their knowledge of very personal details – and Quantcast, which is notorious for having used zombie cookies in the past.
Needless to say, users should be skeptical about their privacy on large Russian sites regardless of the third-party services they are using. Data from Russian companies as large as Mail.Ru is a matter of national interest, and national interest takes precedence over individual freedoms in Russia.
Accuweather.com: 59 third-party trackers
The world’s favorite weather website and app takes a close second in the race to get more advertising trackers. It has 51 of them, some from large, well-known advertisers (like Google, Facebook, AOL, Adobe, and Yahoo), others from lesser known, shadier sources.
DataXu is one example. From its privacy policy:
“Sometimes our customers include their own (or their partners’) pixels or similar technologies within advertisements that dataxu serves to websites.”
AdGear is not much better. Here’s a sampler:
“We share certain personal information about consumers with third parties who provide us with information to enable targeted advertising on website(s) and application(s) that use our technology.”
This sounds extremely vague and could be used in any instance where personal information is shared with a third party.
Ebay.com: 37 third-party trackers
Like many online marketplaces, Ebay relies quite heavily on advertising to monetize their site. The company’s waning fortunes over the past decade have not helped at all. Ebay features trackers from industry giants like Google, Facebook, Adobe, AOL, and Yahoo, as well as from smaller vendors.
OK.ru: 35 third-party trackers
Odnoklassniki is a Russian social network owned by the Mail.Ru Group – this makes it part of the largest Russian internet company. In this particular case, discussing specific third parties seems pointless: it is understood that there is no privacy for users of large Russian websites, at least where issues important to the government are involved.
Fandom.com: 33 third-party trackers
Fandom, formerly known as Wikia and Wikicities, is a wiki hosting service – a site containing lots of encyclopedias for films, TV shows, and games. The site makes a lot of its money from advertising, which should be clear considering the 25 advertising-related third parties on the site. Fandom also uses other external services – website analytics (5 trackers) and 2 trackers to facilitate customer interaction.
Two notable tracker names that come up on the Fandom website are BlueKai and Datalogix, both of which are part of the Oracle Data Cloud. BlueKai is responsible for obtaining data to help ad targeting (i.e. tracking what sites you’re visiting and thus determining what your interests are), while Datalogix provides information on the offline success of marketing (telling advertisers how many consumers purchased a product after seeing an ad).
In case you’re wondering how Datalogix is able to do that, at least partially the answer lies with data gathered by supermarket loyalty cards. In other words, this is a somewhat creepy connection between you of the computer and you of the supermarket.
Samsung.com: 25 third-party trackers
The Korean technology maker uses the services of many advertising and analytics vendors. An example of the latter is Blue Triangle – an A/B testing service. Put simply, the service checks different versions of the website in terms of their success in getting a customer to do something (buy, comment, click a link – whatever it might be).
Bbc.co.uk: 25 third-party trackers
Despite being funded by taxpayers, the BBC doesn’t ignore advertising revenue. The British media organization has 25 third parties on their UK website, including 19 advertising-related ones.
Most of the third parties on the BBC website are relatively harmless in isolation.
Amazon.com: 24 third-party trackers
The largest online marketplace in the world uses the services of many third-party trackers. You’ll see the names of some that are on many of the sites in the Top 50 (like Aggregate Knowledge, Advertising.com, or BidSwitch), as well as a few that we have already said some bad things about (PulsePoint, BlueKai). The point to make about Amazon, however, is how Amazon itself tracks your actions.
Your user profile, search queries, and wish list are an obvious example, but there are some avenues of tracking users you might not have considered. Amazon’s streaming services, such as Amazon Prime, are a good example. As are the various gadgets made by Amazon – the Kindle, the Fire TV Stick, and the Echo are just a few notable examples.
To expand on this, we can say that Amazon has been extremely successful at leveraging all this customer data in their Amazon Web Services (AWS). Through a simple API, clients of AWS are able to access and use the deep learning technologies powering Amazon products. As such, we can say that under the veneer of retail Amazon hides a business that is very much about data, and very much not about your privacy.
Reddit.com: 22 third-party trackers
The front page of the internet has seen huge success over the last decade. Its semi-autonomous subreddits have something for virtually anyone to enjoy – free, we might add. As usual, however, “free” means “powered by ads.”
Reddit has 17 ad-centric third parties onsite, one of which is MaxPoint Interactive, which uses “cookies, pixel tags, web beacons, server logs, HTML5 local storage, Flash local shared objects (LSOs), statistical identifiers, mobile advertising identifiers (such as IDFA and Android Advertising ID), software development kits (SDKs), server-to-server transfers, and non-cookie technologies” to collect information about users and “pinpoint qualified customers interested in purchasing your product with more precision than using traditional zip codes.”
WordPress.com: 21 third-party trackers
The world’s favorite content management system is great, but the same can’t be said for the privacy practices of its official website. Besides the aforementioned DataXu and Quantcast, WordPress.com also uses Hotjar – a popular session replay vendor.
Session replay is one of the most important of the analytical processes Hotjar performs. Whenever you visit a website with Hotjar, it records what you’re doing – where your mouse hovers, what you type in, what you click, and so on. This is a useful tool for website owners who want to optimize their webpages, but it’s also a privacy risk.
There have been cases where session replay companies have accidentally recorded login credentials – usernames and passwords. These tools are supposed to ignore anything written into password fields. Yet unfortunately, the way the technology was designed is bound to occasionally cause problems. For starters, passwords are not the only type of sensitive info you may divulge on a website.
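A rough sketch shows why masking only password fields is not enough. This is not Hotjar's code or any vendor's actual logic: the `record_session` function, the event format, and the sample values are assumptions made purely for illustration.

```python
def record_session(events, masked_types=("password",)):
    """Return what a naive recorder would store for a list of input events.

    Each event is a dict with a field name, an HTML input type, and the
    typed value. Only fields whose type is in `masked_types` are hidden.
    """
    recording = []
    for event in events:
        value = event["value"]
        if event["type"] in masked_types:
            value = "*" * len(value)
        recording.append({"field": event["field"], "value": value})
    return recording


events = [
    {"field": "username", "type": "text", "value": "jane.doe"},
    {"field": "password", "type": "password", "value": "hunter2"},
    # A "show password" toggle commonly switches the input type to text.
    {"field": "password", "type": "text", "value": "hunter2"},
    {"field": "card_number", "type": "text", "value": "4111 1111 1111 1111"},
]

for entry in record_session(events):
    print(entry)
```

In this toy model the password is hidden only while the field is of type `password`. The revealed password and the card number are recorded in full, because the recorder has no way of knowing that those text fields are sensitive.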
We found Canvas fingerprinting scripts on 5 of the websites we checked. Is this such a huge deal?
We decided to single out Canvas because it can’t be blocked unless you’re using the Tor browser, and regular ad blockers won’t show you which sites are using it. Moreover, we feel it indicates the lengths sites will go to for consumer data.
It’s telling that of the 5 sites using Canvas fingerprinting in the Top 50, 1 is Russian (vk.com) and 2 are Chinese (qq.com and aliexpress.com) – two countries that are not known for their stellar privacy credentials.
The other two are Western sites we’ve already mentioned on the naughty list – ebay.com and accuweather.com.
Porn – a bastion of privacy
There are 4 porn websites among the Top 50 most popular sites on the internet:
They account for a total of 9 third-party trackers, 5 of which are on Pornhub.com. None of these trackers are particularly bad for user privacy – 3 websites use the services of an adult advertising vendor (either TrafficJunky or Pornvertising), whereas xhamster.com only has 2 Google trackers (DoubleClick and Google Analytics). Pornhub also uses AOL’s Advertising.com and Index Exchange.
Perhaps it is understandable that websites serving something as sensitive as porn would feel the need to ensure their users’ privacy. Yet that certainly wasn’t always the case and the porn industry has cleaned up (no pun intended) considerably since the days of rampant adware and other nasty practices.
Nowadays, porn sites are some of the best at privacy – certainly a lot better than most of your favorite shops and media outlets.
The most prolific spies of the Top 50
As expected, Google trackers are by far the most numerous among Top 50 websites – we found 97 of them in total (more than 20% of all trackers). Facebook comes in second with 18 trackers. This is generally representative of the internet as a whole. Here is the full list of the more popular third parties:
For some of these companies, cookies and pixels are only one of many ways they gather data about users. Google, for example, has plenty of other resources they can rely on, starting with Gmail and YouTube, continuing with Android devices, and far from ending with Chrome. Meanwhile, Facebook is essentially a personal information database, crowdsourced by the users themselves.
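For readers who want to check the shares quoted in this section, the arithmetic is short. The figures below are the ones reported in this article (459 trackers in total, 97 from Google, 18 from Facebook, 50 sites). The variable names are ours.

```python
total_trackers = 459  # third-party trackers found across the Top 50
sites = 50
counts = {"Google": 97, "Facebook": 18}

# Share of all trackers per company, in percent, rounded to one decimal.
shares = {
    company: round(count / total_trackers * 100, 1)
    for company, count in counts.items()
}
average_per_site = round(total_trackers / sites, 1)

for company, share in shares.items():
    print(f"{company}: {counts[company]} trackers, {share}% of all trackers")
print(f"Average per site: {average_per_site}")
```

Google's share works out to about 21%, which matches the "more than 20%" figure, while Facebook's 18 trackers amount to roughly 4%.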