# Getting Started

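These notes record the code and commentary I produce while working through the book 《Python绝技》 (*Violent Python*), so I use Jupyter Notebook to record and share them. I like this tool a lot.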

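Besides writing and running the code for each feature, the notes also describe the tools and data used in each part. Once I have finished the whole book, I may summarize all the required tools and Python libraries in the README.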

Without further ado, let's begin!

# Cracking UNIX Passwords
## The crypt Algorithm

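Unix computes password hashes with a salted crypt: the crypt function takes the password and a salt as arguments and returns a hash. Python's `crypt` module provides a `crypt()` function that computes this hash.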

Suppose the salt is "NT". Hashing two passwords that share their first eight characters, "jupyterr" and "jupyterrere", gives:

In [1]:
import crypt

salt = "NT"

# case 1:
passwd = "jupyterr"
crypt.crypt(passwd, salt)

'NT/Xpm78Dk6hk'

In [2]:
# case 2:
passwd = "jupyterrere"
crypt.crypt(passwd, salt)

'NT/Xpm78Dk6hk'

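This simple example shows that **the first two characters of the hash are the salt itself**. From how the Unix crypt function works, we know that it is based on the **DES** encryption algorithm[^DES]. DES is a block cipher built from substitutions and permutations, and the salt perturbs the algorithm. crypt accepts a password of at most 8 characters; as the two cases above show, any characters beyond the eighth are discarded, which is why both results are identical. The salt is a two-character string.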
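The two observations above (salt in the first two characters, password cut off after eight) can be sketched without calling crypt at all. The helper names `split_des_hash` and `effective_password` are made up for this illustration; they only model the layout of the hash string, not the DES computation itself.

```python
DES_CRYPT_LEN = 13      # 2 salt characters + 11 hash characters
DES_MAX_PASSWORD = 8    # characters beyond the eighth are ignored


def split_des_hash(cipher_text):
    """Split a 13-character DES-crypt string into (salt, digest)."""
    if len(cipher_text) != DES_CRYPT_LEN:
        raise ValueError(
            f"expected {DES_CRYPT_LEN} characters, got {len(cipher_text)}"
        )
    return cipher_text[:2], cipher_text[2:]


def effective_password(passwd):
    """Return the part of the password that DES-crypt actually uses."""
    return passwd[:DES_MAX_PASSWORD]


salt, digest = split_des_hash("NT/Xpm78Dk6hk")
print(salt, digest)

# Both passwords reduce to "jupyterr", so crypt gives them the same hash.
print(effective_password("jupyterr") == effective_password("jupyterrere"))
```

This is why a cracker only ever needs to try candidates of up to eight characters against this kind of hash.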

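Hashes computed this way are weak against collisions and brute-force attacks, so in practice Linux uses stronger hashing schemes[^linux_crypt]. Still, crypt is enough to understand how password cracking works.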

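Given the above, once we have the hash we can hash every word in a word list with the same salt and compare the results; a match reveals the plaintext password. A weak password is therefore easy to recover by enumeration. Here is a simple cracker: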

In [3]:
import crypt

# Assume the word list contains only these words; in practice it could be
# read from a file holding many candidate passwords.
thesaurus = ["jupyter", "123456", "abcdef", "notebook"]

def test_pwd(user_pwd):
    user = user_pwd.split(':')[0].strip()
    cipher_text = user_pwd.split(":")[1].strip()
    
    if len(cipher_text) != 13:
        raise Exception(f"length of cipher text must be 13 (found: {len(cipher_text)}; `{user}: {cipher_text}`)!")
    salt = cipher_text[:2]
        
    # try every candidate password
    for w in thesaurus:
        if (crypt.crypt(w, salt) == cipher_text):
            print(f"[+] plain password of `{user}: {cipher_text}` found: `{w}`")
            return
    print(f"[+] plain password of `{user}: {cipher_text}` not found")

# test three user:hash entries
user_pwd = ['user1:d36j1XGIvC.6g', 'hello1:WH8PqwO5uDWMc', 'nobody:Y7/8PqwOc']
for up in user_pwd:
    try:
        test_pwd(up)
    except Exception as e:
        print(f"[-] {str(e)}")

[+] plain password of `user1: d36j1XGIvC.6g` found: `notebook`
[+] plain password of `hello1: WH8PqwO5uDWMc` not found
[-] length of cipher text must be 13 (found: 9; `nobody: Y7/8PqwOc`)!
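As the comment in the cell notes, the candidates would normally come from a file. Below is a minimal sketch of that variant. Everything in it is an assumption for illustration: `load_wordlist`, `dictionary_attack`, and the temporary file are hypothetical, and `toy_hash` is a SHA-256 stand-in rather than the real crypt algorithm, chosen so the example also runs on Python 3.13+, where the `crypt` module has been removed. On a system that still has it, `crypt.crypt` could be passed as `hash_fn` instead.

```python
import hashlib
import os
import tempfile


def load_wordlist(path):
    """Yield non-empty candidate passwords, one per line."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        for line in fh:
            word = line.strip()
            if word:
                yield word


def toy_hash(word, salt):
    """Stand-in for crypt.crypt(word, salt); NOT the real crypt algorithm."""
    return salt + hashlib.sha256((salt + word).encode("utf-8")).hexdigest()


def dictionary_attack(cipher_text, candidates, hash_fn, salt_len=2):
    """Return the first candidate whose hash equals cipher_text, else None."""
    salt = cipher_text[:salt_len]
    for word in candidates:
        if hash_fn(word, salt) == cipher_text:
            return word
    return None


# Demo: write a small word list to a temporary file, then read it back.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("jupyter\n123456\n\nabcdef\nnotebook\n")
    wordlist_path = tmp.name

words = list(load_wordlist(wordlist_path))
os.remove(wordlist_path)

found = dictionary_attack(toy_hash("notebook", "d3"), words, toy_hash)
missing = dictionary_attack(toy_hash("zzz", "WH"), words, toy_hash)
print(found, missing)
```

Passing the hash function in as a parameter keeps the enumeration loop independent of the hashing scheme, so the same loop would serve for the SHA-512 case in the next section.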


## Cracking SHA-512 Passwords

Many systems today no longer store user password hashes in `/etc/passwd`, and they use a SHA-512-based scheme rather than DES. The hashed passwords can be found in `/etc/shadow`.

In [4]:
import re
from hashlib import sha512
from crypt import crypt

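def get_passwd_cipher():
    # IPython shell capture: feed the root password ('123' here) to su and read /etc/shadow.
    shadow_text = !echo '123' | su -c 'cat /etc/shadow' root
    # Keep only lines with exactly three '$' ("$id$salt$hash"). Under a Chinese locale
    # the su prompt '密码：' ("Password:") may be prepended to a line, so strip it.
    return [
        x.split('：')[1] if '密码：' in x else x
        for x in shadow_text
        if len(re.findall(r'\$', x)) == 3
    ]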

def load_words_dictionary():
    return ['123456', '123', 'abcd', 'hello']

def enum_passwd(thesaurus, cipher_text_list):
    def match(thesaurus, cipher):
        user = cipher.split(':')[0]
        pwd = cipher.split(':')[1]
        print(f'user: {user}; password: {pwd}')
        
        pwd_id = pwd.split('$')[1]
        # Keep the whole "$id$salt" prefix: crypt() expects it as its salt argument.
        salt = pwd[: pwd.rindex('$')]
        pwd_cipher = pwd.split('$')[3]
        print(f'id: {pwd_id}; salt: {salt}; password_cipher: {pwd_cipher}')
        
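        for w in thesaurus:
            # Exploratory: a single plain SHA-512 digest of word + salt. The $6$ scheme
            # is more involved than this, so the digest does not match pwd_cipher.
            sha_obj = sha512(w.encode('utf-8'))
            sha_obj.update(salt.encode('utf-8'))
            sha_w = sha_obj.digest()
            print(sha_w)

            # crypt() with the "$6$<salt>" prefix returns a hash in shadow format,
            # which can be compared directly with the password field.
            print(crypt(w, salt))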
    
    for cipher_text in cipher_text_list:
        match(thesaurus, cipher_text)
    
enum_passwd(load_words_dictionary(), get_passwd_cipher())

user: root; password: $6$S1mpkc6dpOx7G01l$V/hCAF9UmjpTC1gUPqQqb2/atLkJAPwMlrAOe.3heOQsxHdMsxWV/5OTiTotL1e3iRQk4IpehZ3.greyuYmMc0
id: 6; salt: $6$S1mpkc6dpOx7G01l; password_cipher: V/hCAF9UmjpTC1gUPqQqb2/atLkJAPwMlrAOe.3heOQsxHdMsxWV/5OTiTotL1e3iRQk4IpehZ3.greyuYmMc0
b'\xbc\xddl2A\xb3\xc61\xf6ID\x18\x10\xcd\xcc\xa84D\xf6]\xb3+\x8f9M\r\x01\xe8s\xb2m\xb9\x02f\xa5N\x98^\x83\xa6\xd0"\x9bI\xbd.\xb8\xfe\x8e|\xceN\xa2\xc9\xc6\xceh_\xbc\xfe\x13\xcc\xf0Y'
$6$S1mpkc6dpOx7G01l$O..6UuqI4C1Nlkm4BPdSeeF2.xv4NccsK87MeCRoscHEfjtbBvVgqlrDSlkKVeD5YsbAcCMRah3JDwkikxlov.
b")\xe0\x14h\x0c\xcaR\xf5i\xdf\x98*\xf8\x1c(\xb5\x9bk\xe0\xff>\xda\xb7`\n\xeam\xe6\xebd^\xf2N\x8f\xa7\xf0\xb0\xbb\xc5\x88_\xba\xfd4[\xd3\x12'\xa2\x12\xf8\xa02\xa7Q\x9d{\x92\xcfv\xfb\x1ei\xe4"
$6$S1mpkc6dpOx7G01l$V/hCAF9UmjpTC1gUPqQqb2/atLkJAPwMlrAOe.3heOQsxHdMsxWV/5OTiTotL1e3iRQk4IpehZ3.greyuYmMc0
b'\x82\xe6\xa6\x1bt\xf4\xb4\x18o\x8f\xe7\xae\xe7\x07\xd2Q\xf6\x85\xc5WT\x86r\x12\x11x\xce\xa7\x19\xf8Q\xa1\xa9\xc0\xfd\xbak\xb8@\x82\xf6\x02\xbb

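Reading the source shows that Python's `crypt` module calls into `_crypt`, which is a compiled library on the system: '$ANACONDA_HOME/lib/python3.8/lib-dynload/_crypt.cpython-38-x86_64-linux-gnu.so'. So in the end this is probably still Linux's own crypt function? I need to look into that function further.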
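For reference, the `$id$salt$hash` layout that the cell above takes apart by hand can be sketched as a small parser. The names `ShadowHash` and `parse_shadow_entry` are hypothetical, the trailing date fields in the sample line are made up, and the id table only lists the three prefixes I am sure of; hashes with an extra `rounds=` field are deliberately rejected to keep the sketch short.

```python
from typing import NamedTuple

# Common crypt id prefixes; other ids exist and are reported as unknown here.
KNOWN_IDS = {"1": "MD5", "5": "SHA-256", "6": "SHA-512"}


class ShadowHash(NamedTuple):
    user: str
    scheme: str
    salt_prefix: str   # "$id$salt", the form passed to crypt() as the salt
    digest: str


def parse_shadow_entry(line):
    """Parse 'user:$id$salt$hash:...' into its parts."""
    fields = line.strip().split(":")
    if len(fields) < 2:
        raise ValueError("expected at least 'user:hash'")
    user, pwd = fields[0], fields[1]

    # "$6$salt$digest".split("$") -> ["", "6", "salt", "digest"]
    parts = pwd.split("$")
    if len(parts) != 4 or parts[0] != "":
        raise ValueError(f"unsupported hash field for {user!r}")
    _, pwd_id, salt, digest = parts

    scheme = KNOWN_IDS.get(pwd_id, f"unknown id {pwd_id}")
    return ShadowHash(user, scheme, f"${pwd_id}${salt}", digest)


# Hash taken from the output above; the fields after it are illustrative only.
sample = (
    "root:$6$S1mpkc6dpOx7G01l$V/hCAF9UmjpTC1gUPqQqb2/atLkJAPwMlrAOe.3heOQsxHd"
    "MsxWV/5OTiTotL1e3iRQk4IpehZ3.greyuYmMc0:18500:0:99999:7:::"
)
entry = parse_shadow_entry(sample)
print(entry.user, entry.scheme, entry.salt_prefix)
```

With the entry parsed this way, a match check reduces to comparing `crypt(word, entry.salt_prefix)` with the full password field, which is what the last `print` in the cell above shows for the candidate `123`.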

# References

[^DES]: [The DES algorithm](https://baike.baidu.com/item/des%E7%AE%97%E6%B3%95/10306073#:~:text=DES%E7%AE%97%E6%B3%95%E4%B8%BA%20%E5%AF%86%E7%A0%81%E4%BD%93%E5%88%B6%20%E4%B8%AD%E7%9A%84%E5%AF%B9%E7%A7%B0%E5%AF%86%E7%A0%81%E4%BD%93%E5%88%B6%EF%BC%8C%E5%8F%88%E8%A2%AB%E7%A7%B0%E4%B8%BA%E7%BE%8E%E5%9B%BD,%E6%95%B0%E6%8D%AE%E5%8A%A0%E5%AF%86%E6%A0%87%E5%87%86%20%EF%BC%8C%E6%98%AF1972%E5%B9%B4%E7%BE%8E%E5%9B%BDIBM%E5%85%AC%E5%8F%B8%E7%A0%94%E5%88%B6%E7%9A%84%E5%AF%B9%E7%A7%B0%E5%AF%86%E7%A0%81%E4%BD%93%E5%88%B6%20%E5%8A%A0%E5%AF%86%E7%AE%97%E6%B3%95%20%E3%80%82)
[^linux_crypt]: [The Linux crypt function](https://blog.csdn.net/liuxingen/article/details/46673305)