In [1]:
from happytransformer import HappyGeneration

#### 1. Creating the model

In [2]:
happy_gen = HappyGeneration("GPT-NEO", "EleutherAI/gpt-neo-1.3B")

01/29/2022 09:56:26 - INFO - happytransformer.happy_transformer -   Using model: cpu


#### 2. Generating text

In [3]:
result = happy_gen.generate_text("We must invest in ")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [4]:
print(result)

GenerationResult(text='\nthe future of our energy \ninfrastructure.\nAnd we must do it now.\nAnd we must do it now.\nAnd we must do it now.\nAnd we must do it now.\nAnd we must do it now')


In [5]:
print(result.text)


the future of our energy 
infrastructure.
And we must do it now.
And we must do it now.
And we must do it now.
And we must do it now.
And we must do it now


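#### 3. Greedy algorithm (default)
* Picks the single most probable token at every step. The drawback is that the output easily falls into repeating the same phrase, as in the output above.
* Fix: block repeated n-grams (sequences of n consecutive tokens). Setting an n-gram size prevents any n-gram of that size from appearing twice in the output.
* Import the `GENSettings` class and set the n-gram size that must not repeat.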
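The n-gram blocking described above can be sketched on a toy model. This is a minimal illustration, not the implementation used by happytransformer or transformers: `TOY_MODEL`, `banned_next_tokens`, and `greedy_decode` are hypothetical names, and the probabilities are made up for the example.

```python
# Toy next-token table (made-up probabilities): last token -> {candidate: probability}
TOY_MODEL = {
    "we": {"must": 0.9, "now": 0.1},
    "must": {"do": 0.8, "invest": 0.2},
    "do": {"it": 0.9, "now": 0.1},
    "it": {"now": 0.7, "we": 0.3},
    "now": {"we": 0.6, "it": 0.4},
    "invest": {"now": 0.6, "we": 0.4},
}


def banned_next_tokens(tokens, n):
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if n <= 0:
        return set()  # 0 disables the constraint
    prefix = tokens[len(tokens) - (n - 1):]  # last n-1 generated tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n - 1] == prefix:
            banned.add(tokens[i + n - 1])
    return banned


def greedy_decode(start, steps, no_repeat_ngram_size=0):
    """Greedy decoding: always append the most probable token that is not banned."""
    tokens = [start]
    for _ in range(steps):
        candidates = TOY_MODEL[tokens[-1]]
        banned = banned_next_tokens(tokens, no_repeat_ngram_size)
        allowed = {t: p for t, p in candidates.items() if t not in banned}
        if not allowed:
            break  # every candidate would repeat an n-gram
        tokens.append(max(allowed, key=allowed.get))
    return tokens


print(" ".join(greedy_decode("we", 8)))
print(" ".join(greedy_decode("we", 8, no_repeat_ngram_size=2)))
```

Without blocking, the toy model loops through the same phrase; with `no_repeat_ngram_size=2`, every bigram in the output is unique because the decoder falls back to the next-best candidate whenever the top one would repeat a bigram.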

In [6]:
from happytransformer import GENSettings

In [26]:
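# no_repeat_ngram_size=2: no 2-gram may appear twice (the default, 0, disables this check)
# max_length: upper bound on the number of generated tokens (defaults: min_length=10, max_length=50)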
greedy_settings = GENSettings(no_repeat_ngram_size=2, max_length=50)

In [27]:
greedy_result = happy_gen.generate_text('We must invest in ', args=greedy_settings)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [28]:
print(greedy_result)

GenerationResult(text='\nthe future of our energy   and  our  economy.  We must  invest  in the  future  of  the energy industry.\n\nWe need to invest to  create jobs,  to create  new technologies, to')


In [29]:
print(greedy_result.text)


the future of our energy   and  our  economy.  We must  invest  in the  future  of  the energy industry.

We need to invest to  create jobs,  to create  new technologies, to


In [32]:
# len() counts characters, not tokens, so this value is not directly comparable to max_length
len('We need to invest to  create jobs,  to create  new technologies, to')

67

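#### 4. Generic sampling: a text generation algorithm
* Generates the sentence one token at a time, drawing each next token at random according to the model's probability distribution.
* The drawback is that, with a large vocabulary, many unlikely tokens share the probability mass, so sampling can pick words that do not fit the context.
* Fix: use the temperature parameter (here between 0 and 1).
    * Close to 0, sampling approaches the greedy algorithm (almost all probability goes to the top token).
    * At 1, tokens are sampled from the model's unmodified distribution over the whole vocabulary.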
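To see how temperature reshapes the distribution, the following sketch applies a temperature-scaled softmax to a few made-up logits. The function name `softmax_with_temperature` and the logit values are assumptions for illustration; a real model does this over its full vocabulary.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature nearly all of the probability lands on the highest-scoring token, which is why sampling then behaves like greedy decoding; at 1 the logits are left unscaled.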

In [46]:
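### do_sample=True: sample the next token from the probability distribution instead of always taking the most probable one
### top_k=0: disable top-k filtering so that every token in the vocabulary is a candidate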

generic_sampling_settings = GENSettings(do_sample=True, top_k=0, temperature=1, max_length=50)

In [47]:
generic_sampling_result = happy_gen.generate_text("We must invest in ", args=generic_sampling_settings)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [48]:
print(generic_sampling_result.text)


new energy infrastructure

back to work
(27-Dec-2011)

In today’s globalized world, the 
increasingly erratic nature of energy demand requires 
investment across the energy system if our planet


#### 5. Top-k sampling
* Samples only from the k most probable tokens rather than from the whole vocabulary.
* To avoid repeated token sequences as well, set `no_repeat_ngram_size`.
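A minimal sketch of the top-k idea, using only the standard library. The names `top_k_filter` and `sample_top_k` and the logit values are hypothetical; this is not the code that happytransformer or transformers runs internally.

```python
import math
import random


def top_k_filter(logits, k):
    """Keep the k highest logits and set the rest to -inf; k <= 0 disables filtering."""
    if k <= 0 or k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    # Ties at the threshold are all kept, so slightly more than k may survive.
    return [x if x >= threshold else float("-inf") for x in logits]


def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Sample a token index from the temperature-scaled top-k distribution."""
    filtered = top_k_filter(logits, k)
    scaled = [x / temperature for x in filtered]
    peak = max(scaled)
    # exp(-inf) is 0.0, so filtered-out tokens get zero weight
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [1.0, 3.0, 2.0, 0.5]  # made-up scores for four candidate tokens
rng = random.Random(0)
print([sample_top_k(logits, k=2, temperature=0.5, rng=rng) for _ in range(10)])
```

With `k=2`, only the indices of the two highest-scoring tokens can ever be drawn. Dividing by the temperature does not change the ranking of the logits, so applying it before or after the filter selects the same k tokens.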

In [54]:
top_k_sampling_settings = GENSettings(do_sample=True, top_k=50, temperature=0.5, max_length=50, no_repeat_ngram_size=2)

In [57]:
top_k_sampling_result = happy_gen.generate_text("We must invest in ", args=top_k_sampling_settings)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [58]:
print(top_k_sampling_result.text)


the next generation of   technologies.

We need to invest  in the next  generation  of energy  and  communications  infrastructures.  We need  to  invest,  not  only in new  power


#### 6. GPT-2
* Developed by OpenAI, which was initially reluctant to release the full model.
* gpt2-xl: 1.5 billion parameters
* gpt2-large: a smaller version (774 million parameters)

In [59]:
happy_gpt2 = HappyGeneration("GPT2", 'gpt2-large')

01/29/2022 16:09:45 - INFO - happytransformer.happy_transformer -   Using model: cpu


In [62]:
result_gpt2 = happy_gpt2.generate_text('We must invest in ', args=top_k_sampling_settings)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [63]:
print(result_gpt2.text)

 the future of our children and the future generations.  
I want to see a world where the children of the world are taught to love each other, to respect the earth, and to be good citizens. I want a place where everyone
