BSOD with shadow hooks #26

Closed
purripurri opened this issue Dec 23, 2018 · 10 comments

@purripurri

purripurri commented Dec 23, 2018

hi satoshi :)

i'm trying to utilize your wonderful hv to explore a process that does not want me there. it's protected with a kernel driver and running it while ddimon is running causes me to basically freeze and get a dpc_watchdog_violation bsod several minutes later. this seems to only happen if i have shadow hooks on functions that are exported by ntoskrnl. shadow hooking ssdt functions causes no issues whatsoever.

by any chance, would you happen to have an idea what might be triggering this and what i can do about it?

from crash dump:

ffffb280`124d5bc8 fffff801`481e619b : 00000000`00000133 00000000`00000001 00000000`00001e00 fffff801`4845e378 : nt!KeBugCheckEx
ffffb280`124d5bd0 fffff801`480533ba : 000000e4`c2a298da ffffb280`12480180 00000000`00000282 00000000`00000000 : nt!KeAccumulateTicks+0x19049b
ffffb280`124d5c30 fffff801`4896751b : 000000e4`c2a27e20 ffffda03`43f7f010 00000000`00000000 ffffda03`3c8e0a00 : nt!KeClockInterruptNotify+0x9da
ffffb280`124d5f40 fffff801`48118a65 : ffffda03`3c8e0a00 00000000`00000000 00000000`00001000 ffffc2d9`40098c20 : hal!HalpTimerClockIpiRoutine+0x1b
ffffb280`124d5f70 fffff801`481bc95a : ffffd907`f124f2a0 ffffda03`3c8e0a00 00000000`00000000 ffffc2d9`40098c20 : nt!KiCallInterruptServiceRoutine+0xa5
ffffb280`124d5fb0 fffff801`481bce47 : ffff9780`00080700 ffffda03`3c8e0a00 ffffd907`f124f480 00000000`00000000 : nt!KiInterruptSubDispatchNoLockNoEtw+0xea
ffffd907`f124f220 fffff801`481ccd00 : fffff801`4816c141 ffff9780`00080700 00000000`00000001 00000000`00001011 : nt!KiInterruptDispatchNoLockNoEtw+0x37
ffffd907`f124f3b8 fffff801`4816c141 : ffff9780`00080700 00000000`00000001 00000000`00001011 00000000`00000000 : nt!memcpy+0x240
ffffd907`f124f3c0 fffff801`4816bf57 : fffff801`484d0000 ffffd907`f124f510 00000000`00001000 00000000`00000001 : nt!MiCopySinglePage+0x105
ffffd907`f124f410 fffff801`4bf18215 : ffffda03`43f7f010 00000000`00000000 00000000`00000000 fffff801`48011000 : nt!MmCopyMemory+0x1d7
purripurri changed the title from "Curious BSOD" to "BSOD with shadow hooks" Dec 23, 2018
@tandasat
Owner

Hi,

While I cannot give you an exact answer, I would suggest checking the following points to narrow down the cause:

  • what the problematic functions are
  • whether the hooked function is executed without any problem before the bug check, or the first execution of that triggers the bug check

The hang issue may happen when the hooked function is called too often at a high IRQL. This often happens due to some form of loop condition (e.g., when you hook a function called by the timer interrupt handler and the VM-exit to handle the hook takes long enough to trigger the timer interrupt).

@purripurri
Author

i forgot to mention that this same issue occurs even if i do not set the int3 breakpoint, just creating the shadow pages but not actually hooking.

it happens on —any— ntoskrnl export when the driver in question is loaded, i cannot discern whether it is a deliberate hypervisor deterrent or some type of accidental occurrence

@tandasat
Owner

That sounds weird, as DdiMon hooks some NT-exported functions without any issues as it is. If that is no longer the case, please let me know your environment so I can try to reproduce it.

  • Commit of DdiMon (eg, 2099af7)
  • OS version (ie, output of the "ver" command, or N/A)
  • Architecture: (ie, x86, x64, or Both)
  • Hardware (eg, Physical, VMware 14, Bochs, etc)

Otherwise, I am guessing the issue is somewhere around the EPT manipulation logic you added. If you can explain what changes you made, it would help me think of possible causes. Specifically, I am not sure what "when the driver in question is loaded" means. Are you hooking the IAT of certain drivers on load?

@purripurri
Author

purripurri commented Dec 24, 2018

sorry i think i explained it poorly, let me try again. i am trying to explore a process that is protected by a kernel driver. everything works fine with ddimon until i try to start that process, at which point its driver will be loaded and cause me to crash. i am quite convinced that it is deliberately done as an anti-hypervisor measure. it is only happening if i have shadow pages on ntoskrnl exports, which leads me to believe that when the driver is loaded they are enumerating all ntoskrnl exports and somehow testing them for shadow pages. i also tested with hvpp project, i did not have this issue, but i would prefer to use ddimon. i looked at how both projects handle ept violation vmexit but it looks quite similar to me, so i'm a bit lost on why one suffers this issue and not the other.

i have changed no ept logic or anything else. this happens with a fresh compile, it's not an issue with ddimon itself but rather an issue with this conflicting software which i would like to figure out.

this is generally what my stack looks like when that anti-hv driver causes me to freeze and bsod

ffffb280`124d5bc8 fffff801`481e619b : 00000000`00000133 00000000`00000001 00000000`00001e00 fffff801`4845e378 : nt!KeBugCheckEx
ffffb280`124d5bd0 fffff801`480533ba : 000000e4`c2a298da ffffb280`12480180 00000000`00000282 00000000`00000000 : nt!KeAccumulateTicks+0x19049b
ffffb280`124d5c30 fffff801`4896751b : 000000e4`c2a27e20 ffffda03`43f7f010 00000000`00000000 ffffda03`3c8e0a00 : nt!KeClockInterruptNotify+0x9da
ffffb280`124d5f40 fffff801`48118a65 : ffffda03`3c8e0a00 00000000`00000000 00000000`00001000 ffffc2d9`40098c20 : hal!HalpTimerClockIpiRoutine+0x1b
ffffb280`124d5f70 fffff801`481bc95a : ffffd907`f124f2a0 ffffda03`3c8e0a00 00000000`00000000 ffffc2d9`40098c20 : nt!KiCallInterruptServiceRoutine+0xa5
ffffb280`124d5fb0 fffff801`481bce47 : ffff9780`00080700 ffffda03`3c8e0a00 ffffd907`f124f480 00000000`00000000 : nt!KiInterruptSubDispatchNoLockNoEtw+0xea
ffffd907`f124f220 fffff801`481ccd00 : fffff801`4816c141 ffff9780`00080700 00000000`00000001 00000000`00001011 : nt!KiInterruptDispatchNoLockNoEtw+0x37
ffffd907`f124f3b8 fffff801`4816c141 : ffff9780`00080700 00000000`00000001 00000000`00001011 00000000`00000000 : nt!memcpy+0x240
ffffd907`f124f3c0 fffff801`4816bf57 : fffff801`484d0000 ffffd907`f124f510 00000000`00001000 00000000`00000001 : nt!MiCopySinglePage+0x105
ffffd907`f124f410 fffff801`4bf18215 : ffffda03`43f7f010 00000000`00000000 00000000`00000000 fffff801`48011000 : nt!MmCopyMemory+0x1d7

@tandasat
Owner

Hi, thanks for the great details. This is perhaps because of the use of MTF, and is rather a bug in DdiMon; i.e., it is a compatibility issue, but the cause is DdiMon preventing normal-ish operations.

When DdiMon detects that a processor is trying to read from or write to the shadowed page, it lets the processor do so by making the page readable-not-executable, but it also sets MTF and reverts the page to non-readable-but-executable after the read instruction completes. This means that if someone attempts to read the page 5 times, it leads to 5 EPT-violation exits and 5 MTF exits.
https://github.com/tandasat/DdiMon/blob/master/DdiMon/shadow_hook.cpp#L293

My theory in this case is that the driver reads a whole page (4096 bytes) at high IRQL (probably done internally by the MmCopyMemory API). Assuming 16 bytes per read, this would trigger 4096 / 16 = 256 reads and, at two VM-exits per read (one EPT violation plus one MTF), 512 VM-exits, which is slow enough to fire the watchdog bug check.

I am not sure if that's really the case, but I think it is a possibility. Hvpp restores the page protection in a lazy manner, IIRC. That is, it leaves the page readable-not-executable after an EPT violation caused by a read, until someone actually tries to execute that page. This is certainly much faster in a scenario like this. You could verify this theory by calling MmCopyMemory with a large size against the shadowed page.
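To make the arithmetic above concrete, here is a minimal user-mode sketch that counts VM-exits for the two strategies. It is only a model, not DdiMon or hvpp code: the names `SimulateMtfRestore`, `SimulateLazyRestore`, and `PageCopyThenExecute` are hypothetical, and the 16-bytes-per-read figure is the same assumption used in the estimate above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of accesses to one shadowed page.
enum class Access { kRead, kExecute };
enum class PageView { kExecOnly, kReadWrite };

struct ExitCount {
  std::size_t ept_violations = 0;
  std::size_t mtf_exits = 0;
  std::size_t Total() const { return ept_violations + mtf_exits; }
};

// MTF model: the page is exec-only at rest. Every read faults, the page is
// made readable for one instruction, and an MTF exit flips it straight back.
inline ExitCount SimulateMtfRestore(const std::vector<Access>& accesses) {
  ExitCount count;
  for (const Access access : accesses) {
    if (access == Access::kRead) {
      ++count.ept_violations;  // exec-only -> read/write
      ++count.mtf_exits;       // read/write -> exec-only after one step
    }
    // Execute accesses always find the page exec-only, so no exit.
  }
  return count;
}

// Lazy model: keep whichever view the last fault installed and only switch
// when an access needs the other view. No MTF is involved.
inline ExitCount SimulateLazyRestore(const std::vector<Access>& accesses) {
  ExitCount count;
  PageView view = PageView::kExecOnly;
  for (const Access access : accesses) {
    const PageView needed = (access == Access::kRead) ? PageView::kReadWrite
                                                      : PageView::kExecOnly;
    if (view != needed) {
      ++count.ept_violations;
      view = needed;
    }
  }
  return count;
}

// Access pattern from the theory: copy a whole page in fixed-size reads,
// then execute from the page once.
inline std::vector<Access> PageCopyThenExecute(std::size_t page_size,
                                               std::size_t bytes_per_read) {
  assert(bytes_per_read != 0 && page_size % bytes_per_read == 0);
  std::vector<Access> accesses(page_size / bytes_per_read, Access::kRead);
  accesses.push_back(Access::kExecute);
  return accesses;
}
```

For `PageCopyThenExecute(4096, 16)`, the MTF model reports 256 EPT violations plus 256 MTF exits, while the lazy model reports two EPT violations in total (one for the first read, one for the execute). The model ignores the per-exit cost, which is what actually matters for the watchdog.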

@purripurri
Author

oh! that makes perfect sense! i was suspecting something of the sort but my knowledge is not nearly enough to solve on my own. thank you for such a clear explanation!!! i will attempt to lazy the ept violation vmexits a bit and see if i can get around it. closing the issue as i think you have perfectly answered my question. thanks again satoshi! and happy holidays!

@purripurri
Author

purripurri commented Dec 25, 2018

you were absolutely right regarding the freezes! i tried to "lazy" the ept violation vmexit handler as you mentioned and it indeed stopped the crashes with this particular application. though i have run into a new issue. it seems trivial but for some reason i cannot get it to work correctly. most of the time it is fine, but will occasionally return a read page when it should return an exec page. i'm sure i am making a mistake somewhere, but after about 8 hours i am kind of at the end of my wits. i cannot seem to find why this is happening. would it be asking too much to request some help on how to correctly implement the "lazy" way like hvpp?

i apologize if this is a dumb question. i'm far from your level, i'm just trying to learn. :) this is what i attempted to do. i changed ShHandleEptViolation to take an additional parameter so i could specify whether i wanted to return exec or r/w page and not have to use MTF.

// Handles EPT violation VM-exit.
_Use_decl_annotations_ void ShHandleEptViolation(
    ShadowHookData* sh_data, const SharedShadowHookData* shared_sh_data,
    EptData* ept_data, void* fault_va, BOOLEAN exec) {
  if (!ShpIsShadowHookActive(shared_sh_data)) {
    return;
  }

  const auto info = ShpFindPatchInfoByPage(shared_sh_data, fault_va);
  if (!info) {
    return;
  }

  if (exec) {
    ShpEnablePageShadowingForExec(*info, ept_data);
    return;
  }
  // EPT violation was caused because a guest tried to read or write to a page
  // where currently set as execute only for protecting a hook. Let a guest
  // read or write a page from a read/write shadow page and run a single
  // instruction.
  ShpEnablePageShadowingForRW(*info, ept_data);
}

and i changed the ept violation exit handler like so:

// Deal with EPT violation VM-exit.
_Use_decl_annotations_ void EptHandleEptViolation(
    EptData *ept_data, ShadowHookData *sh_data,
    SharedShadowHookData *shared_sh_data) {
  const EptViolationQualification exit_qualification = {
      UtilVmRead(VmcsField::kExitQualification)};

  const auto fault_pa = UtilVmRead64(VmcsField::kGuestPhysicalAddress);
  const auto fault_va = reinterpret_cast<void *>(
      exit_qualification.fields.valid_guest_linear_address
          ? UtilVmRead(VmcsField::kGuestLinearAddress)
          : 0);

  if (exit_qualification.fields.ept_readable ||
      exit_qualification.fields.ept_writeable ||
      exit_qualification.fields.ept_executable) {
    // EPT entry is present. Permission violation.
    if (exit_qualification.fields.caused_by_translation) {
      // Tell EPT violation when it is caused due to read or write violation.
      const auto read_failure = exit_qualification.fields.read_access &&
                                !exit_qualification.fields.ept_readable;
      const auto write_failure = exit_qualification.fields.write_access &&
                                 !exit_qualification.fields.ept_writeable;

      if(exit_qualification.fields.read_access)
      {
        if (read_failure) {
          ShHandleEptViolation(sh_data, shared_sh_data, ept_data, fault_va,
            FALSE);
        }
      } else if(exit_qualification.fields.write_access)
      {
        if (write_failure) {
          ShHandleEptViolation(sh_data, shared_sh_data, ept_data, fault_va,
            FALSE);
        }
      }
      else
      {
        ShHandleEptViolation(sh_data, shared_sh_data, ept_data, fault_va,
          TRUE);
      }
      
    }
    return;
  }

but as i mentioned, read pages are often returned when i expect exec pages and the hooks are not called as a result. :/

@Brut7

Brut7 commented Dec 26, 2018

it should help you.
void EptHandleEptViolation()
{
  ......
  // EPT entry is present. Permission violation.
  if (exit_qualification.fields.caused_by_translation)
  {
    // Tell EPT violation when it is caused due to read or write violation.
    const auto read_failure = exit_qualification.fields.read_access &&
                              !exit_qualification.fields.ept_readable;
    const auto write_failure = exit_qualification.fields.write_access &&
                               !exit_qualification.fields.ept_writeable;
    const auto execute_failure = exit_qualification.fields.execute_access &&
                                 !exit_qualification.fields.ept_executable;
    if (read_failure || write_failure)
    {
      HandleEptViolationRW((ShadowPageInfo*)shared_sh_data, ept_data, fault_va, guest_ip);
    }
    else if (execute_failure)
    {
      HandleEptViolationExecute((ShadowPageInfo*)shared_sh_data, ept_data, fault_va, guest_ip);
    }
  }
  return;
}

void EnablePageShadowingForRW()
{
  ......
  ept_pt_entry->fields.write_access = true;
  ept_pt_entry->fields.read_access = true;
  ept_pt_entry->fields.execute_access = false;
}

void EnablePageShadowingForExec()
{
  ......
  ept_pt_entry->fields.write_access = false;
  ept_pt_entry->fields.read_access = false;
  ept_pt_entry->fields.execute_access = true;
}
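The dispatch rule in the snippet above can be pulled out into a small pure function that is easy to check in user mode. This is a sketch under stated assumptions: `Qualification` is a simplified stand-in that only mirrors the field names used in this thread (not the real bit layout), and `ShadowAction`, `ChooseShadowAction`, and `CheckDispatchExamples` are hypothetical names.

```cpp
#include <cassert>

// Simplified stand-in for the exit qualification fields used in the thread.
struct Qualification {
  bool read_access;     // the faulting access was a data read
  bool write_access;    // the faulting access was a data write
  bool execute_access;  // the faulting access was an instruction fetch
  bool ept_readable;    // the EPT entry allowed reads
  bool ept_writeable;   // the EPT entry allowed writes
  bool ept_executable;  // the EPT entry allowed execution
};

enum class ShadowAction { kNone, kShowReadWritePage, kShowExecPage };

// Picks the page view from what failed, not merely from what was attempted.
inline ShadowAction ChooseShadowAction(const Qualification& q) {
  const bool read_failure = q.read_access && !q.ept_readable;
  const bool write_failure = q.write_access && !q.ept_writeable;
  const bool execute_failure = q.execute_access && !q.ept_executable;
  if (read_failure || write_failure) {
    return ShadowAction::kShowReadWritePage;
  }
  if (execute_failure) {
    return ShadowAction::kShowExecPage;
  }
  return ShadowAction::kNone;
}

// Usage example: the two transitions the lazy scheme relies on.
inline void CheckDispatchExamples() {
  // Read hits the exec-only view: switch to the read/write page.
  const Qualification read_on_exec_only{true, false, false,
                                        false, false, true};
  assert(ChooseShadowAction(read_on_exec_only) ==
         ShadowAction::kShowReadWritePage);

  // Instruction fetch hits the read/write view: switch to the exec page.
  const Qualification fetch_on_read_write{false, false, true,
                                          true, true, false};
  assert(ChooseShadowAction(fetch_on_read_write) ==
         ShadowAction::kShowExecPage);
}
```

Keeping the decision separate from the EPT entry updates makes it possible to test each combination of access and permission bits without a hypervisor.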

@purripurri
Author

purripurri commented Dec 26, 2018

edit: this is indeed perfect. i had attempted this originally but made an embarrassing mistake causing it not to work. thank you @Brut7! happy holidays!

@Brut7

Brut7 commented Dec 26, 2018

Forgot to say that the function ShpFindPatchInfoByPage is not correct. The EPT hook is set up on the physical address, but the search is conducted on the virtual address. A physical page can be accessed through a different virtual address (for example, a memory mapping). This results in unhandled EPT violations, which lead to a bug check.
Some corrected code:
//store physical address in hook function
m_HookInfo->PaPatchAddress = UtilPaFromVa(hook_address);

//fix for ShpFindPatchInfoByPage
ULONG64 fault_pa = UtilVmRead64(VmcsField::kGuestPhysicalAddress);
if (PAGE_ALIGN(m_HookInfo->PaPatchAddress) == PAGE_ALIGN(fault_pa))
{
return m_HookInfo;
}
and just fix all calls
EptGetEptPtEntry(ept_data, UtilPaFromVa(info.patch_address));
to
EptGetEptPtEntry(ept_data, info->PaPatchAddress);
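The aliasing problem described above can be shown with a small user-mode model. This is only a sketch: `HookInfo`, `FindHookByFaultPa`, and `FindHookByFaultVa` are hypothetical names, and the addresses used with them are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kPageSize = 4096;

// Rounds an address down to the start of its 4 KB page.
constexpr std::uint64_t PageAlign(std::uint64_t address) {
  return address & ~(kPageSize - 1);
}

// Hypothetical, trimmed-down hook record. The physical address is captured
// once when the hook is installed.
struct HookInfo {
  std::uint64_t patch_va;
  std::uint64_t patch_pa;
};

// Lookup keyed on the guest physical address reported by the EPT violation.
inline const HookInfo* FindHookByFaultPa(const std::vector<HookInfo>& hooks,
                                         std::uint64_t fault_pa) {
  for (const auto& hook : hooks) {
    if (PageAlign(hook.patch_pa) == PageAlign(fault_pa)) {
      return &hook;
    }
  }
  return nullptr;
}

// Lookup keyed on the faulting virtual address, shown for contrast. It
// misses when the same physical page is reached through another mapping.
inline const HookInfo* FindHookByFaultVa(const std::vector<HookInfo>& hooks,
                                         std::uint64_t fault_va) {
  for (const auto& hook : hooks) {
    if (PageAlign(hook.patch_va) == PageAlign(fault_va)) {
      return &hook;
    }
  }
  return nullptr;
}
```

With one hook at a made-up virtual address backed by physical page 0x2a51000, a fault on that physical page through a second mapping is found by `FindHookByFaultPa` but missed by `FindHookByFaultVa`, which corresponds to the unhandled EPT violation case.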
