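## Without authentication

The page can be accessed normally with or without a proxy, but because the request is not authenticated, the returned page is missing most of its elements.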

In [14]:
import requests


r = requests.get('https://gerrit.ext.net.me.com/gerrit/c/MN/5G/NB/gnb/+/4767531')
print(r)

'''
with open("gerrit_verify_false.txt", "w", encoding="utf-8") as f:
    f.write(r.text)
'''

<Response [200]>


'\nwith open("gerrit_verify_false.txt", "w", encoding="utf-8") as f:\n    f.write(r.text)\n'

With the proxy the result is the same, which rules out the proxy as the cause.

In [10]:
proxies = {
  "http": "http://10.144.1.10:8080",
  "https": "http://10.144.1.10:8080",
}

r = requests.get('https://gerrit.ext.net.me.com/gerrit/c/MN/5G/NB/gnb/+/4767531', proxies=proxies)
print(r.text)

<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<meta name="description" content="Gerrit Code Review">
<meta name="referrer" content="never">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=0">
<noscript>To use PolyGerrit, please enable JavaScript in your browser settings, and then refresh this page.</noscript><script>window.CLOSURE_NO_DEPS = true;window.CANONICAL_PATH = '\/gerrit';window.STATIC_RESOURCE_PATH = '\/gerrit';</script>
<link rel="icon" type="image/x-icon" href="/gerrit/favicon.ico">
<link rel="preload" href="/gerrit/fonts/RobotoMono-Regular.woff2" as="font" type="font/woff2" crossorigin="anonymous">
<link rel="preload" href="/gerrit/fonts/RobotoMono-Regular.woff" as="font" type="font/woff" crossorigin="anonymous">
<link rel="preload" href="/gerrit/fonts/Roboto-Regular.woff2" as="font" type="font/woff2" crossorigin="anonymous">
<link rel="preload" href="/gerrit/fonts/Roboto-Regular.woff" as="font" type="font/woff" c

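## How to authenticate?

Visiting the page manually without being logged in shows the screen below. The content fetched above is in fact this unauthenticated page:

![](./gerrt_request_login.png)

According to [How to "log in" to a website using Python's Requests module?](https://stackoverflow.com/questions/11892729/how-to-log-in-to-a-website-using-pythons-requests-module), logging in programmatically requires a few pieces of information from the login page:

- The URL the login form is submitted to
- The `name` attributes of the username and password input fields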


In [26]:
import requests

payload = {
    'username': '--',
    'password': '--'
}

login_url = 'https://gerrit.ext.net.me.com/gerrit/login/%2F%2Fc%2FMN%2F5G%2FNB%2Fgnb%2F%2B%2F4767531'
data_url = 'https://gerrit.ext.net.me.com/gerrit/c/MN/5G/NB/gnb/+/4767531'
with requests.Session() as s:
    # Submit the login form. Redirects are not followed so the raw login response can be inspected.
    p = s.post(login_url, data=payload, allow_redirects=False)
    # Save the response body to check whether it is a successful login page.
    with open("gerrit_post_response.txt", "w", encoding="utf-8") as f:
        f.write(p.text)
    print(p)
    print(p.history)

    # An authorised request. The session already keeps the login cookies,
    # so passing them explicitly is redundant but harmless.
    r = s.get(data_url, cookies=p.cookies, allow_redirects=False)
    with open("gerrit_authorised_response.txt", "w", encoding="utf-8") as f:
        f.write(r.text)
    print(r)
    print(r.history)


<Response [302]>
[]
<Response [200]>
[]


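## Trying Selenium

Even after authenticating as above, the fetched content is still sparse. The likely cause is that the page builds its content with JavaScript, which requests does not execute, so the full page never gets rendered. Selenium drives a real browser, so it is worth a try.

With Selenium, the page needs some time after login to finish loading. The `sleep()` calls below provide that delay; without them, very little data is returned.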
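
A fixed delay either wastes time or turns out too short on a slow day. One alternative is to poll for the condition that matters. The sketch below is a generic helper under that assumption; `wait_until` is a hypothetical name, not part of Selenium, and the demo uses a stand-in condition rather than a real browser.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Stand-in for a page whose content only appears on the third poll.
polls = {"count": 0}


def links_loaded():
    polls["count"] += 1
    return polls["count"] >= 3


wait_until(links_loaded, timeout=2.0, interval=0.01)
print(polls["count"])  # 3
```

With a real driver, the condition could be something like `lambda: "SCT.fuse.asib_abio" in driver.page_source`. Selenium also ships its own `WebDriverWait` class for this purpose.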

In [66]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time
import re

#login_url = 'https://gerrit.ext.net.me.com/gerrit/login/%2F%2Fc%2FMN%2F5G%2FNB%2Fgnb%2F%2B%2F4767531'
login_url = 'https://gerrit.ext.net.me.com/gerrit/login/%2F%2Fq%2Fstatus%3Aopen'
data_url = 'https://gerrit.ext.net.me.com/gerrit/c/MN/5G/NB/gnb/+/4777769'
driver = webdriver.Firefox()
driver.get(login_url)
# The find_element_by_* helpers were removed in Selenium 4; use find_element(By.<KIND>, value).
driver.find_element(By.NAME, "username").send_keys("--")
driver.find_element(By.NAME, "password").send_keys("--")
driver.find_element(By.ID, "b_signin").click()
time.sleep(5)

driver.get(data_url)
time.sleep(5)
with open("gerrit_selenium.txt", "w", encoding="utf-8") as f:
    f.write(driver.page_source)

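# Name the parser explicitly so the result does not depend on which parsers are installed.
bs = BeautifulSoup(driver.page_source, "html.parser")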
builds = bs.find_all("a", string=re.compile(".*https://ece-ci.dynamic.me-net.net/job/MASTER/job/GNB/job/UPLANE/job/L2-LO/job/SCT.fuse.asib_abio/.*"))
'''
[
<a class="style-scope gr-linked-text" href="https://ece-ci.dynamic.me-net.net/job/MASTER/job/GNB/job/UPLANE/job/L2-LO/job/SCT.fuse.asib_abio/37383/" rel="noopener" target="_blank">https://ece-ci.dynamic.me-net.net/job/MASTER/job/GNB/job/UPLANE/job/L2-LO/job/SCT.fuse.asib_abio/37383/</a>, 
<a class="style-scope gr-linked-text" href="https://ece-ci.dynamic.me-net.net/job/MASTER/job/GNB/job/UPLANE/job/L2-LO/job/SCT.fuse.asib_abio/37424/" rel="noopener" target="_blank">https://ece-ci.dynamic.me-net.net/job/MASTER/job/GNB/job/UPLANE/job/L2-LO/job/SCT.fuse.asib_abio/37424/</a>, 
<a class="style-scope gr-linked-text" href="https://ece-ci.dynamic.me-net.net/job/MASTER/job/GNB/job/UPLANE/job/L2-LO/job/SCT.fuse.asib_abio/37619/" rel="noopener" target="_blank">https://ece-ci.dynamic.me-net.net/job/MASTER/job/GNB/job/UPLANE/job/L2-LO/job/SCT.fuse.asib_abio/37619/</a>]
'''

s = builds[-1].string
s = s[s.find('SCT.fuse.asib_abio'):]
print(s)
print(s.split('/'))
build_id = s.split('/')[1]
print(build_id)
driver.quit()

SCT.fuse.asib_abio/38137
['SCT.fuse.asib_abio', '38137']
38137
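
The `find()` and `split()` steps above depend on the job name being hard-coded. As a possible alternative, a regular expression can pull the job name and build ID out of the link in one step. This is only a sketch: `BUILD_URL_RE` and `parse_build_url` are hypothetical names, and the pattern assumes the URL ends with `/job/<name>/<number>`, optionally followed by a slash, as in the links shown above.

```python
import re

# Last "/job/<name>/<digits>" segment at the end of the URL, trailing slash optional.
BUILD_URL_RE = re.compile(r"/job/(?P<job>[^/]+)/(?P<build_id>\d+)/?$")


def parse_build_url(url):
    """Return (job_name, build_id) as strings, or None if the URL does not match."""
    match = BUILD_URL_RE.search(url)
    if match is None:
        return None
    return match.group("job"), match.group("build_id")


url = (
    "https://ece-ci.dynamic.me-net.net/job/MASTER/job/GNB/job/UPLANE"
    "/job/L2-LO/job/SCT.fuse.asib_abio/37619/"
)
print(parse_build_url(url))  # ('SCT.fuse.asib_abio', '37619')
```

Returning `None` for a non-matching link makes it easy to skip unrelated anchors instead of failing on an index error.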


A test: check whether the CI job links are reachable

In [74]:
import requests

def respond_ok(code):
    if code == 200:
        print("OK")
    else:
        print("NOK")

url1 = 'https://es5gci43.emea.me-net.net:54001/MASTER/GNB/UPLANE/L2-LO/SCT.fuse.asib_abio/38346/artifacts/'
r1 = requests.get(url1, verify=False)
respond_ok(r1.status_code)

url2 = 'https://es5gci43.emea.me-net.net:54001/MASTER/GNB/UPLANE/L2-LO/SCT.fuse.asib_abio/38021/artifacts/'
r2 = requests.get(url2, verify=False)
respond_ok(r2.status_code)



OK
NOK
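
When several links need checking, the same test can be wrapped in a helper that also treats connection failures as "not available". The sketch below is one way to do that. `check_links`, `fake_get_status`, and the example URLs are hypothetical, and the status lookup is passed in as a function so the example runs without network access.

```python
def check_links(urls, get_status):
    """Map each URL to True if it answered 200, False otherwise."""
    results = {}
    for url in urls:
        try:
            results[url] = get_status(url) == 200
        except OSError:
            # Connection errors and timeouts count as "not available".
            results[url] = False
    return results


# Fake status lookup standing in for real HTTP requests.
fake_statuses = {
    "https://ci.example/38346/artifacts/": 200,
    "https://ci.example/38021/artifacts/": 404,
}


def fake_get_status(url):
    if url not in fake_statuses:
        raise ConnectionError("host unreachable")
    return fake_statuses[url]


results = check_links(
    list(fake_statuses) + ["https://ci.example/missing/"], fake_get_status
)
for url, ok in results.items():
    print("OK " if ok else "NOK", url)
```

For real requests, the lookup could be `lambda u: requests.get(u, verify=False, timeout=10).status_code`. Exceptions raised by requests derive from `OSError`, so the `except` clause covers them.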


