# Paint Transformer
<img src=https://i.imgur.com/mqlEuC9.png />
<img src=https://github.com/Huage001/PaintTransformer/raw/main/picture/2.gif width=30%/>
<img src=https://github.com/Huage001/PaintTransformer/raw/main/picture/1.gif width=30%/>
<img src=https://github.com/Huage001/PaintTransformer/raw/main/picture/3.gif width=30%/>


[Paint Transformer: Feed Forward Neural Painting with Stroke Prediction](https://arxiv.org/pdf/2108.03798.pdf)<br>
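This paper was published by Baidu in August 2021.<br>
Let's first review the papers covered earlier.<br>
The earlier methods share three traits:
* They all use VGG to extract features, and use those features to measure content and style distances
* Training a model requires a large number of content images and style images
* The output is a single static image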

----
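The approach in this Baidu paper, by contrast, <font size=6rem>needs only a single stroke image</font><br>
With no content images at all, it can train a model that paints step by step<br>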
<img src=https://imgur.com/U9XZPHE.png/><br>
Their approach is quite clever.<br>
Because the stroke image is so simple, we can vary its
* color
* angle
* width and height
* center position
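
To make the parameter list above concrete, here is a minimal sketch of sampling random stroke parameters and painting them onto a canvas. The names `Stroke`, `random_stroke`, and `render_stroke` are hypothetical, and the brush is assumed to be a solid rotated rectangle drawn with NumPy; this is an illustration of the idea, not the renderer used in the PaintTransformer repository.

```python
import math
from dataclasses import dataclass

import numpy as np


@dataclass
class Stroke:  # hypothetical container, not a class from the repository
    cx: float     # center x as a fraction of canvas width, in [0, 1]
    cy: float     # center y as a fraction of canvas height, in [0, 1]
    w: float      # stroke length as a fraction of canvas size
    h: float      # stroke thickness as a fraction of canvas size
    theta: float  # rotation angle in radians
    color: tuple  # (r, g, b), each in [0, 1]


def random_stroke(rng):
    """Sample every stroke parameter uniformly at random."""
    cx, cy = rng.uniform(0.0, 1.0, size=2)
    w, h = rng.uniform(0.05, 0.5, size=2)
    theta = rng.uniform(0.0, math.pi)
    color = tuple(rng.uniform(0.0, 1.0, size=3))
    return Stroke(cx, cy, w, h, theta, color)


def render_stroke(canvas, s):
    """Return a copy of `canvas` (H x W x 3) with stroke `s` painted on top."""
    height, width = canvas.shape[:2]
    ys, xs = np.mgrid[0:height, 0:width]
    # Pixel centers in normalized coordinates, relative to the stroke center.
    dx = (xs + 0.5) / width - s.cx
    dy = (ys + 0.5) / height - s.cy
    # Rotate the offsets into the stroke's own frame.
    u = dx * math.cos(s.theta) + dy * math.sin(s.theta)
    v = -dx * math.sin(s.theta) + dy * math.cos(s.theta)
    inside = (np.abs(u) <= s.w / 2) & (np.abs(v) <= s.h / 2)
    out = canvas.copy()
    out[inside] = s.color
    return out


# Paint eight random strokes onto a black canvas.
rng = np.random.default_rng(0)
canvas = np.zeros((64, 64, 3))
for _ in range(8):
    canvas = render_stroke(canvas, random_stroke(rng))
```

Because every image produced this way comes with the exact parameters that generated it, the labels needed for supervised training are available for free.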

<img src=https://i.imgur.com/LqPRFoV.png /><br>
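By varying these parameters we can render random images, which turns the task into an object detection problem.<br>
Render the canvas $I_c$ from the background strokes $S_b$, then paint the foreground strokes $S_f$ on top of $I_c$ to render the target $I_t$. An object detection model takes $I_c, I_t$ as input and predicts $S_r$, and painting $S_r$ onto $I_c$ gives $I_r$. The loss pushes $S_r$ toward $S_f$ and $I_r$ toward $I_t$.
> Notation
* $I$: image
* $S$: stroke parameters ($S_b$ and $S_f$ are randomly generated; $S_r$ is predicted)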
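
The self-training loop described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: `paint`, `make_training_sample`, and `self_training_loss` are hypothetical names, strokes are grayscale axis-aligned boxes, and the predicted strokes are assumed to be already aligned one-to-one with $S_f$. A real detection-style objective would first have to match predictions to targets, and the paper's actual loss is more elaborate than this.

```python
import numpy as np


def paint(canvas, strokes):
    """Stand-in renderer: each row (cx, cy, w, h, gray) becomes an axis-aligned box."""
    out = canvas.copy()
    size = out.shape[0]  # assumes a square canvas
    for cx, cy, w, h, gray in strokes:
        x0, x1 = np.clip([(cx - w / 2) * size, (cx + w / 2) * size], 0, size).astype(int)
        y0, y1 = np.clip([(cy - h / 2) * size, (cy + h / 2) * size], 0, size).astype(int)
        out[y0:y1, x0:x1] = gray
    return out


def make_training_sample(rng, n_background=4, n_foreground=4, size=32):
    """Build one (I_c, I_t, S_f) triple from random strokes only."""
    S_b = rng.uniform(0.1, 0.9, size=(n_background, 5))
    S_f = rng.uniform(0.1, 0.9, size=(n_foreground, 5))
    I_c = paint(np.zeros((size, size)), S_b)  # canvas so far
    I_t = paint(I_c, S_f)                     # target after the next strokes
    return I_c, I_t, S_f


def self_training_loss(I_c, I_t, S_f, S_r):
    """Stroke-level plus pixel-level distance for a prediction S_r."""
    I_r = paint(I_c, S_r)
    stroke_loss = np.abs(S_r - S_f).mean()  # S_r should be close to S_f
    pixel_loss = ((I_r - I_t) ** 2).mean()  # I_r should be close to I_t
    return stroke_loss + pixel_loss


rng = np.random.default_rng(0)
I_c, I_t, S_f = make_training_sample(rng)
S_guess = rng.uniform(0.1, 0.9, size=S_f.shape)  # placeholder for a model's output
print(self_training_loss(I_c, I_t, S_f, S_f))      # a perfect prediction scores 0
print(self_training_loss(I_c, I_t, S_f, S_guess))  # a random guess scores higher
```

Note that no photograph or painting appears anywhere in this loop: the inputs and the labels are both produced by the renderer.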

In [None]:
#@title Download the code
!git clone https://github.com/Huage001/PaintTransformer.git
!mv PaintTransformer/* .
!rm -r PaintTransformer

Cloning into 'PaintTransformer'...
remote: Enumerating objects: 104, done.
remote: Total 104 (delta 0), reused 0 (delta 0), pack-reused 104
Receiving objects: 100% (104/104), 12.58 MiB | 18.70 MiB/s, done.
Resolving deltas: 100% (25/25), done.
mv: cannot move 'PaintTransformer/inference' to './inference': Directory not empty
mv: cannot move 'PaintTransformer/picture' to './picture': Directory not empty
mv: cannot move 'PaintTransformer/train' to './train': Directory not empty


In [None]:
#@title Download the model
%%bash
cd inference
gdown 1NDD54BLligyr8tzo8QGI5eihZisXK1nq

Downloading...
From: https://drive.google.com/uc?id=1NDD54BLligyr8tzo8QGI5eihZisXK1nq
To: /content/inference/model.pth
100%|██████████| 36.3M/36.3M [00:00<00:00, 255MB/s]


In [None]:
#@title Download the image
url = "https://i.pinimg.com/originals/cb/cc/4d/cbcc4d43bf2e5a5baf4931b635f35253.jpg" #@param {type:"string"}
!wget {url} \
  -O inference/input/photo.jpg

--2022-08-20 20:57:47--  https://i.pinimg.com/originals/cb/cc/4d/cbcc4d43bf2e5a5baf4931b635f35253.jpg
Resolving i.pinimg.com (i.pinimg.com)... 104.110.240.146, 104.110.240.74, 2a04:4e42:65::84
Connecting to i.pinimg.com (i.pinimg.com)|104.110.240.146|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 94010 (92K) [image/jpeg]
Saving to: ‘inference/input/photo.jpg’


2022-08-20 20:57:48 (11.9 MB/s) - ‘inference/input/photo.jpg’ saved [94010/94010]



In [None]:
# `nvidia-smi` prints an error containing "failed" when no GPU is attached
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
  has_gpu = False
else:
  print(gpu_info)
  has_gpu = True

Thu Aug 25 22:24:49 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
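if has_gpu:
  # The original code outputs a single image; switch it to the step-by-step painting (animation) mode
  !sed -i 's/need_animation=False/need_animation=True/g' ./inference/inference.py

# Point the script at the image we just downloaded
!sed -i 's/chicago.jpg/photo.jpg/g' ./inference/inference.py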

In [None]:
%%bash
cd inference
python inference.py

It must be under serial mode if animation results are required, so serial flag is set to True!
25.385197162628174


  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]


In [None]:
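import cv2

if not has_gpu:
  # Without a GPU only the final still image was produced, so just display it.
  print("do not generate video")
  img = cv2.imread("/content/inference/output/photo.jpg")
  import plotly.express as px
  fig = px.imshow(img[..., ::-1])  # OpenCV loads BGR; reverse channels to RGB for display
  fig.show()
else:
  # Stitch the per-step frames written by inference.py into a video.
  path_format = "/content/inference/output/photo/{:03d}.jpg"
  sample_img = cv2.imread(path_format.format(1))
  h, w = sample_img.shape[:2]
  fps = 30
  output_path = "out.mp4"
  out_video_writer = cv2.VideoWriter(
      output_path,
      cv2.VideoWriter_fourcc(*"mp4v"),
      fps,
      (w, h)
  )
  for idx in range(200):
    path = path_format.format(idx+1)
    img = cv2.imread(path)
    if img is None:  # fewer than 200 frames exist; stop instead of writing an empty frame
      break
    out_video_writer.write(img)
  out_video_writer.release()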

In [None]:
from IPython.display import HTML
# Show a sample result hosted online (the video generated above is saved as out.mp4)
HTML("""<video src="https://i.imgur.com/dhNR30a.mp4" width=70% controls autoplay/>""")

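# After-class challenge
Paint Transformer directly outputs each stroke's
* position
* angle
* size
* color

It then uses these parameters to paint the reference image onto the canvas. Because these parameters are available, we can modify how they are drawn:
* Brush: for example, switch to a rectangle, an ellipse, an arrow, and so on
* Motion: for example, make the strokes around the fireplace wobble slightly over time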
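
As a starting point for the challenge, the two ideas above could look roughly like this. Everything here is a hypothetical sketch: `inside_brush`, `wobble`, and the `"theta"` key are illustrative names, not part of the PaintTransformer code, and strokes are assumed to be plain dictionaries of predicted parameters.

```python
import math


def inside_brush(u, v, w, h, shape="rectangle"):
    """Test whether the offset (u, v) from the stroke center, measured in the
    stroke's own rotated frame, falls inside a brush of size w x h."""
    if shape == "rectangle":
        return abs(u) <= w / 2 and abs(v) <= h / 2
    if shape == "ellipse":
        return (u / (w / 2)) ** 2 + (v / (h / 2)) ** 2 <= 1.0
    raise ValueError(f"unknown brush shape: {shape}")


def wobble(stroke, t, amplitude=0.05, period=1.0):
    """Return a copy of `stroke` whose angle swings sinusoidally at time t (seconds)."""
    moved = dict(stroke)
    moved["theta"] = stroke["theta"] + amplitude * math.sin(2 * math.pi * t / period)
    return moved


# The same point lies inside a rectangular brush but outside an elliptical one.
print(inside_brush(0.2, 0.09, 0.5, 0.2, "rectangle"))  # True
print(inside_brush(0.2, 0.09, 0.5, 0.2, "ellipse"))    # False

# Re-render each frame with wobble(stroke, frame_index / fps) to animate a stroke.
stroke = {"theta": 0.3}
print(wobble(stroke, t=0.25)["theta"])
```

Applying `wobble` only to strokes whose center falls inside a chosen region (the fireplace, say) keeps the rest of the painting still.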

Related work:<br>
[Stylized Neural Painting](https://arxiv.org/abs/2011.08114)<br>