30% of the 1000 largest sites use scripts for hidden identification

27 August 2020

A group of researchers from Mozilla, the University of Iowa and the University of California has published the results of a study on the use of hidden user identification code on websites. Hidden identification (browser fingerprinting) means generating identifiers from indirect data about the browser, such as screen resolution, the list of supported MIME types, specific header parameters (HTTP/2 and HTTPS), installed plugins and fonts, the availability of certain Web APIs, graphics-card-specific rendering behavior observed through WebGL and Canvas, CSS manipulation, default values, network port scanning, and patterns of mouse and keyboard use.
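To make the idea concrete, here is a minimal sketch of how indirect attributes could be folded into one identifier. The attribute names, the `fingerprintId` helper and the choice of a 32-bit FNV-1a hash are assumptions made for illustration; the study does not attribute this exact approach to any particular script.

```typescript
// Hypothetical attribute bag; a real script would fill it from browser APIs.
type Attributes = Record<string, string | number | string[]>;

// 32-bit FNV-1a over UTF-16 code units, used here only because it is tiny.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Sort keys so the same attributes always serialize to the same string.
function fingerprintId(attrs: Attributes): string {
  const serialized = Object.keys(attrs)
    .sort()
    .map((key) => key + "=" + JSON.stringify(attrs[key]))
    .join(";");
  return fnv1a(serialized).toString(16);
}
```

Two visits that report the same attributes get the same identifier without anything being stored on the device, and a change in any one attribute will usually produce a different identifier.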

A study of the 100,000 most popular sites in the Alexa ranking showed that 9040 of them (10.18%) use code to covertly identify visitors. Among the thousand most popular sites, such code was found in 30.60% of cases (266 sites), and among sites ranked from the thousandth to the ten-thousandth, in 24.45% of cases (2010 sites). Hidden identification is mostly found in scripts supplied by third-party services for fighting fraud and filtering out bots, as well as in advertising networks and user-tracking systems.
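The counts and percentages above do not line up with the nominal group sizes: 9040 out of 100,000 would be 9.04%, not 10.18%. One plausible reading, which is an assumption and not something the article states, is that each percentage was computed against the sites actually analyzed in that group. The sketch below back-calculates the denominators that reading would imply.

```typescript
// Count and percentage pairs exactly as quoted in the article.
const reported = [
  { group: "top 100,000", sites: 9040, percent: 10.18 },
  { group: "top 1,000", sites: 266, percent: 30.6 },
  { group: "1,000 to 10,000", sites: 2010, percent: 24.45 },
];

// Denominator implied by count / (percent / 100); an inference, not source data.
function impliedTotal(sites: number, percent: number): number {
  return Math.round(sites / (percent / 100));
}

const implied = reported.map((r) => ({
  group: r.group,
  total: impliedTotal(r.sites, r.percent),
}));
```

Under that assumption the groups would contain roughly 88,800, 870 and 8,220 analyzed sites respectively.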


To detect code that performs hidden identification, the researchers developed the FP-Inspector toolkit, whose code is available under the MIT license. The toolkit combines machine learning with static and dynamic analysis of JavaScript code. According to the authors, machine learning significantly improved detection accuracy and uncovered 26% more problematic scripts than manually specified heuristics.
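For comparison, a manually specified heuristic of the kind mentioned above might look like the following sketch. It is not FP-Inspector's code or feature set: the list of API names, the `looksLikeFingerprinting` helper and the threshold of three are all hypothetical, chosen only to show what a hand-written static rule can and cannot see.

```typescript
// Illustrative API names only; this is not FP-Inspector's feature set.
const SUSPICIOUS_APIS: string[] = [
  "toDataURL",
  "getSupportedExtensions",
  "getLayoutMap",
  "AudioContext",
  "mozRTCSessionDescription",
];

// Static check: which of the listed names appear anywhere in the script text?
function matchedApis(source: string): string[] {
  return SUSPICIOUS_APIS.filter((name) => source.indexOf(name) !== -1);
}

// Hypothetical rule: flag a script that mentions several distinct APIs.
function looksLikeFingerprinting(source: string, threshold: number = 3): boolean {
  return matchedApis(source).length >= threshold;
}
```

A substring rule like this misses any script that builds API names at run time or is otherwise obfuscated, and it flags harmless code that merely mentions the names, which illustrates why a fixed rule has limited accuracy.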

Many of the identified identification scripts were missing from the common Disconnect, Adsafe, DuckDuckGo, Justuno and EasyPrivacy block lists. After being notified, the developers of the EasyPrivacy block list created a separate section for hidden identification scripts. In addition, FP-Inspector revealed some new ways of using Web APIs for identification that had not previously been seen in practice.

For example, the study found use of keyboard layout information (getLayoutMap); residual data in the cache (the Performance API is used to analyze delivery delays, which reveals whether the user has visited a certain domain and whether the page has been opened before); permissions set in the browser (access to the Notification, Geolocation and Camera APIs); and the presence of specialized peripherals and rare sensors (gamepads, virtual reality headsets, proximity sensors). It also recorded probing for browser-specific APIs and differences in API behavior (AudioWorklet, setTimeout, mozRTCSessionDescription), as well as use of the AudioContext API to determine characteristics of the audio system.
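Probing for browser-specific APIs can be sketched as a simple presence check. The `probeApis` helper and the bit-string output are assumptions for illustration; the probed names are the ones the article mentions, and the scope object is passed in so that the sketch does not depend on a real browser.

```typescript
// Names taken from the article; any other probe list would work the same way.
const PROBED_APIS: string[] = [
  "AudioWorklet",
  "mozRTCSessionDescription",
  "AudioContext",
];

// Returns one character per probed name: "1" if the scope exposes it, else "0".
// In a browser the scope would be the global object (window).
function probeApis(
  scope: Record<string, unknown>,
  names: string[] = PROBED_APIS
): string {
  return names.map((name) => (name in scope ? "1" : "0")).join("");
}
```

Different browsers and versions expose different subsets of such names, so the resulting string adds a few more distinguishing bits to an identifier.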

The study also examined how far protections against hidden identification, which block network requests or restrict API access, disrupt the normal functioning of sites. It shows that restricting APIs only for scripts flagged by FP-Inspector causes less breakage than the stricter blanket restrictions that Brave and Tor Browser apply to API calls that could leak data.
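The difference between selective and blanket restriction can be sketched as a wrapper that neutralizes an API only for flagged callers. This is an illustration under stated assumptions, not how FP-Inspector, Brave or Tor Browser implement it: `restrictFor`, the flagged URL list and the `callerUrl` callback are hypothetical, and a real browser would have to work out the calling script itself.

```typescript
// Wraps `original` so that flagged scripts receive `fallback` instead of the
// real result, while every other caller keeps the normal behavior.
function restrictFor<Args extends unknown[], R>(
  original: (...args: Args) => R,
  flagged: string[],
  callerUrl: () => string, // hypothetical hook that reports the calling script
  fallback: R
): (...args: Args) => R {
  return (...args: Args): R =>
    flagged.indexOf(callerUrl()) !== -1 ? fallback : original(...args);
}
```

A blanket restriction corresponds to always returning the fallback, which also affects legitimate scripts on the page; the selective version leaves those untouched.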

Alex