## Fine-tuning

In this session we will run a simple fine-tuning job using [datasets](https://huggingface.co/docs/datasets/main/en/loading#local-and-remote-files). According to the official documentation, you need to assemble a dataset suited to your task, and that dataset has to match the [loss function](https://sbert.net/docs/sentence_transformer/loss_overview.html) you choose.

Let's start by looking at the dataset and deciding what task we want to solve.

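## Teaching the model "similarity"

Suppose we are building a model that sorts sentences into 'fashion', 'economy', and 'sports'. How do we get the model to feel that "This sweater is the latest trend" and "Paris Fashion Week is opening soon" are 'similar'? And conversely, how do we get it to feel that "The stock market fell" is 'different'?

A person can tell at a glance, but a computer cannot. So we have to teach the model what counts as 'similar' and what counts as 'different'. One way to do that is with a **loss function**. You can think of the loss as a 'penalty score' that tells the model how badly it is doing. The model learns by trying to reduce that penalty.

The process of getting the model to express the meaning of a sentence as numbers is called **embedding**. Each sentence is given a 'meaning address' in a high-dimensional space, like a point on a map.

Our goal: sentences with similar meanings should get addresses that are close together on this 'meaning map', and sentences with different meanings should get addresses that are far apart.

- "This sweater is the latest trend" (fashion) <--- close ---> "Paris Fashion Week is opening soon" (fashion)
- "This sweater is the latest trend" (fashion) <--- far ---> "The stock market fell" (economy)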
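To make "close" and "far" concrete, here is a minimal sketch that measures closeness with cosine similarity. The three-dimensional vectors are made-up values for illustration only; real embeddings come from the model and have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means they point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical "meaning addresses" (not produced by any real model).
sweater = [0.9, 0.1, 0.0]       # "This sweater is the latest trend" (fashion)
fashion_week = [0.8, 0.2, 0.1]  # "Paris Fashion Week is opening soon" (fashion)
stock_market = [0.1, 0.9, 0.2]  # "The stock market fell" (economy)

sim_same = cosine_similarity(sweater, fashion_week)
sim_diff = cosine_similarity(sweater, stock_market)

print(f"fashion vs fashion: {sim_same:.3f}")
print(f"fashion vs economy: {sim_diff:.3f}")
```

With these toy values the two fashion vectors score much higher than the fashion/economy pair, which is the behaviour we want the trained model to show on real sentences.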

## Triplet

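So how do we teach 'close' and 'far'? This is where the triplet, a group of three sentences, comes in.

- **Anchor**: the sentence the comparison is centered on. (e.g. "This sweater is the latest trend" [fashion])
- **Positive**: a sentence with a meaning 'similar' to the anchor, so it should sit close to the anchor. (e.g. "Paris Fashion Week is opening soon" [fashion])
- **Negative**: a sentence with a meaning 'different' from the anchor, so it should sit far from the anchor. (e.g. "The stock market fell" [economy])

The basic idea of triplet loss: the distance between the anchor and the positive must be smaller than the distance between the anchor and the negative.

On top of that, it is not enough for it to be merely smaller. It has to be smaller by at least a **margin**.

- distance(anchor, positive) + margin < distance(anchor, negative)

We will come back to the margin later, so just keep it in mind for now.

If this condition holds, the model is doing well and the penalty (loss) is 0.
If it does not hold (the positive is farther away than the negative, or not close enough relative to it), the model is doing badly, so it receives a penalty that says: "Pull the anchor and the positive closer together, and push the negative farther away!"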
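The condition above is commonly written as a hinge: `max(0, d(anchor, positive) - d(anchor, negative) + margin)`. The sketch below evaluates that expression on toy 2D points. Euclidean distance and `margin=1.0` are illustrative assumptions here, not the settings used later in the training run.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther away than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor = [0.0, 0.0]
positive = [0.5, 0.0]       # distance 0.5 from the anchor
far_negative = [3.0, 0.0]   # distance 3.0: condition satisfied
near_negative = [1.0, 0.0]  # distance 1.0: inside the margin

loss_ok = triplet_loss(anchor, positive, far_negative)
loss_bad = triplet_loss(anchor, positive, near_negative)

print(loss_ok)   # no penalty
print(loss_bad)  # positive penalty, the model still has work to do
```

Note that `near_negative` is already farther away than the positive, yet it is still penalized because the gap is smaller than the margin.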

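## Hands-on with BatchAllTripletLoss

Let's take the name BatchAllTripletLoss apart piece by piece, starting from the end. `Loss` is the 'penalty' system described above. `Triplet` means it uses the **anchor-positive-negative** trio we just learned about. Because every anchor needs a positive and a negative, each sentence must carry a class label, and the official documentation requires at least two examples per label class so that a positive can always be found. `Batch` refers to the way training data is processed in small groups rather than all at once, and `All` means the loss is computed over every valid triplet that can be formed inside such a group. So how do we organize the dataset?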

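### Building the dataset

First, every sentence in the batch takes a turn as the anchor. Sentences that **share the anchor's label form positive pairs with it, and sentences with a different label (e.g. fashion vs. economy) form negative pairs**.

- F1, F2 (fashion sentences): (F1, F2) is a positive pair
- E1, E2 (economy sentences)
- F1, E1: a negative pair

So when F1 is the anchor, one of the resulting triplets looks like this:

- (anchor: F1, positive: F2, negative: E1)

That is enough theory for now; let's learn the rest through practice. Keep the concepts above in mind as you follow along.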
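As a rough sketch of the "all" part, the snippet below enumerates every valid (anchor, positive, negative) combination for the four-sentence batch F1, F2, E1, E2. The helper `all_valid_triplets` is a hypothetical name written for this illustration; the library builds the equivalent combinations internally from the labels, so you never write this yourself.

```python
from itertools import permutations

def all_valid_triplets(labels):
    """Return (anchor, positive, negative) index triplets for one batch of labels."""
    triplets = []
    # Every ordered pair of distinct sentences is a candidate (anchor, positive).
    for a, p in permutations(range(len(labels)), 2):
        if labels[a] != labels[p]:
            continue  # a positive must share the anchor's label
        for n in range(len(labels)):
            if labels[n] != labels[a]:  # a negative must have a different label
                triplets.append((a, p, n))
    return triplets

batch_sentences = ["F1", "F2", "E1", "E2"]
batch_labels = ["fashion", "fashion", "economy", "economy"]

triplets = all_valid_triplets(batch_labels)
named = [tuple(batch_sentences[i] for i in t) for t in triplets]

for anchor, positive, negative in named:
    print(f"(anchor: {anchor}, positive: {positive}, negative: {negative})")
```

Even this tiny batch yields several triplets, because every sentence acts as an anchor and is paired with every possible positive and negative.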

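### So what can we do with this?

#### Forming semantic clusters

With this loss function, sentences from the same category gather close together in the embedding space, while sentences from different categories are pushed far apart. In other words, it encourages the 'fashion' sentences to form one group and the 'economy' sentences to form another, much like friends with similar interests gathering together.

#### Improving downstream tasks

- **Similar-sentence search**: a request like "Find other articles similar to this fashion article!" returns more accurate results.
- **Classification**: adding a simple classifier on top of the embeddings can also improve performance on the 'fashion', 'economy', 'sports' classification task.
- **Clustering**: it also helps with grouping sentences by meaning when no labels are available.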
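To illustrate the classification point, here is a minimal nearest-centroid sketch: each class is summarized by the average of its embeddings, and a new sentence is assigned to the closest average. The 2D vectors are invented for the example and the helper names are hypothetical; with real embeddings the same idea applies in more dimensions.

```python
import math

def centroid(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_centroid(vector, centroids):
    """Return the label whose centroid is closest to `vector`."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))

# Made-up embeddings that are already well clustered by class.
toy_embeddings = {
    "fashion": [[0.9, 0.1], [0.8, 0.2]],
    "economy": [[0.1, 0.9], [0.2, 0.8]],
    "sport": [[-0.8, -0.7], [-0.9, -0.6]],
}
centroids = {label: centroid(vecs) for label, vecs in toy_embeddings.items()}

query = [0.7, 0.3]  # hypothetical embedding of a new fashion-like sentence
predicted = nearest_centroid(query, centroids)
print(predicted)
```

The tighter and more separated the clusters are, the more reliable such a simple classifier becomes, which is why the clustering effect of the loss matters for downstream tasks.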

```python
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses
from datasets import Dataset

model = SentenceTransformer("all-MiniLM-L6-v2")
```

```python
label_map = {
    0: 'fashion',
    1: 'economy',
    2: 'sport',
}
```

```python
# Existing sentences and labels (20 per class)
existing_sentences = [
    # Fashion
    "This new collection features vibrant colors and bold patterns.",
    "The fashion show in Paris showcased the latest trends for spring/summer.",
    "She is known for her impeccable sense of style and elegant outfits.",
    "Vintage clothing is making a huge comeback this year.",
    "Accessorizing with the right handbag can elevate any look.",
    "Sustainable fashion is becoming increasingly important to consumers.",
    "He is a renowned designer, famous for his avant-garde creations.",
    "The Met Gala is a major event in the fashion world.",
    "She launched her own line of eco-friendly clothing.",
    "Denim jackets are a timeless fashion staple.",
    "The latest footwear trends include chunky sneakers and minimalist sandals.",
    "He always keeps up with the latest fashion magazines.",
    "Her style is a mix of bohemian and chic.",
    "This season's must-have item is a tailored blazer.",
    "Fashion bloggers have a significant influence on trends.",
    "The textile industry is exploring new sustainable materials.",
    "She wore a stunning gown on the red carpet.",
    "Athleisure wear continues to be a popular fashion choice.",
    "He prefers a minimalist approach to fashion.",
    "The brand is celebrated for its high-quality craftsmanship and luxurious fabrics.",

    # Economy
    "The central bank announced an increase in interest rates.",
    "Stock market volatility has been high in recent weeks.",
    "Inflation is a major concern for many households.",
    "The unemployment rate fell to its lowest level in a decade.",
    "Global supply chain disruptions are affecting businesses worldwide.",
    "The government unveiled a new economic stimulus package.",
    "Foreign direct investment has increased significantly this quarter.",
    "The GDP growth forecast for next year is optimistic.",
    "Small businesses are struggling with rising operational costs.",
    "The cryptocurrency market experienced a sharp downturn.",
    "Trade negotiations between the two countries have stalled.",
    "The housing market is showing signs of cooling down.",
    "Consumer spending is a key driver of economic growth.",
    "The national debt has reached a record high.",
    "Economists are debating the risk of a recession.",
    "The tech sector continues to drive innovation and economic expansion.",
    "Tax cuts are expected to boost corporate profits.",
    "The gig economy is transforming the labor market.",
    "Emerging markets offer significant growth opportunities.",
    "The price of oil has a major impact on the global economy.",

    # Sport
    "The home team secured a dramatic victory in the final minutes.",
    "She won the gold medal in the 100-meter dash.",
    "The championship game will be held next Sunday.",
    "He is considered one of the greatest athletes of all time.",
    "The team is training hard for the upcoming tournament.",
    "The World Cup brings together nations in a celebration of football.",
    "Injuries have plagued the team throughout the season.",
    "The transfer window saw several high-profile player moves.",
    "She set a new world record in the swimming competition.",
    "Basketball fans are eagerly anticipating the playoffs.",
    "The rivalry between the two clubs is legendary.",
    "He announced his retirement after a long and successful career.",
    "The Olympics showcase a wide variety of sporting events.",
    "The underdog team pulled off a stunning upset.",
    "He scored a hat-trick in yesterday's match.",
    "Sports analytics are increasingly used to improve team performance.",
    "The marathon runners braved the challenging weather conditions.",
    "She is a rising star in the world of tennis.",
    "The new stadium has state-of-the-art facilities.",
    "Fantasy sports leagues are incredibly popular among fans."
]

existing_labels = [
    # Fashion
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    # Economy
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    # Sport
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
]
```

```python
# --- Additional data ---

# Fashion: 83 additional sentences
new_fashion_sentences = [
    "Floral prints are expected to be very popular this spring.",
    "She has a keen eye for spotting upcoming fashion trends.",
    "The new boutique on Main Street offers unique, handcrafted jewelry.",
    "Layering different textures is a key styling technique for fall.",
    "He invested in a classic trench coat that will never go out of style.",
    "The fashion industry is slowly embracing more inclusive sizing.",
    "Online thrift stores are a great way to find affordable vintage pieces.",
    "She attended a workshop on sustainable fabric dyeing techniques.",
    "The red carpet was filled with celebrities wearing haute couture.",
    "Monochromatic outfits can create a very sophisticated and elongated silhouette.",
    "Sneaker culture has become a significant part of modern fashion.",
    "Many designers are now focusing on gender-neutral clothing lines.",
    "The exhibition showcased a retrospective of the designer's iconic work.",
    "She prefers minimalist jewelry to complement her understated style.",
    "Capsule wardrobes are gaining popularity for their simplicity and versatility.",
    "The right pair of sunglasses can instantly elevate your look.",
    "Fashion forecasting involves analyzing social, economic, and technological trends.",
    "Many brands are now using recycled materials in their collections.",
    "He is known for his experimental approach to menswear.",
    "The documentary explored the history of denim and its cultural impact.",
    "She often incorporates bold accessories into her otherwise simple outfits.",
    "The latest runway shows featured a lot of oversized silhouettes.",
    "Finding the perfect fit is crucial for any tailored garment.",
    "The fashion magazine's cover story featured an up-and-coming model.",
    "Ethical sourcing of materials is a growing concern for fashion brands.",
    "She is a fashion influencer with millions of followers on social media.",
    "The costume design in the film was widely praised for its historical accuracy.",
    "Power dressing, characterized by sharp tailoring, is making a comeback.",
    "The fashion school is renowned for producing talented young designers.",
    "Upcycling old clothes into new garments is an eco-friendly practice.",
    "The museum is hosting an exhibit on 20th-century fashion photography.",
    "She has a collection of vintage silk scarves from around the world.",
    "Knitwear is a cozy and stylish option for colder months.",
    "The designer's signature style often includes asymmetrical cuts.",
    "Comfort and style are no longer mutually exclusive in fashion.",
    "The fashion industry contributes significantly to the global economy.",
    "He is learning the art of bespoke tailoring from a master craftsman.",
    "She uses a mood board to gather inspiration for her designs.",
    "The color palette for the upcoming season includes earthy tones and bright accents.",
    "Digital fashion and NFTs are new frontiers being explored by designers.",
    "The right belt can completely transform the look of an outfit.",
    "She curated a collection of her favorite sustainable fashion brands.",
    "Street style photography captures everyday fashion inspiration.",
    "The brand's marketing campaign focused on body positivity.",
    "He is attending a trade show to source new fabrics for his collection.",
    "She loves to experiment with different hairstyles and makeup looks.",
    "The fashion council announced new initiatives to support emerging talent.",
    "Pattern clashing can be a bold and stylish fashion statement.",
    "She is studying fashion merchandising and marketing.",
    "The tailoring on the suit was impeccable, fitting him perfectly.",
    "Investing in high-quality basics is key to a versatile wardrobe.",
    "The rise of fast fashion has raised concerns about its environmental impact.",
    "She designed her own wedding dress, incorporating personal touches.",
    "He collects rare and vintage sneakers as a hobby.",
    "The fashion editor provided a critique of the latest collections.",
    "Choosing the right fabric is essential for the drape and feel of a garment.",
    "She attended a lecture on the cultural significance of traditional attire.",
    "The boutique specializes in avant-garde pieces from independent designers.",
    "He is an expert in identifying counterfeit luxury goods.",
    "The fashion house is celebrating its 50th anniversary this year.",
    "She follows several fashion blogs for daily style inspiration.",
    "The new line of activewear is designed for both performance and style.",
    "He is known for his androgynous and boundary-pushing fashion choices.",
    "The fashion show featured elaborate set designs and theatrical presentations.",
    "She is passionate about promoting ethical and fair-trade fashion.",
    "The children's clothing line uses organic cotton and non-toxic dyes.",
    "He prefers custom-made shirts for a perfect fit and personalized details.",
    "The fashion photographer captured the model's effortless elegance.",
    "She is a stylist who works with celebrities for red carpet events.",
    "The trend of 'quiet luxury' focuses on high-quality, understated pieces.",
    "He is skilled in pattern making and garment construction.",
    "The fashion brand collaborated with a famous artist on a limited-edition collection.",
    "She believes that true style is about expressing individuality.",
    "The history of corsetry reflects changing ideals of the female form.",
    "He shops for vintage finds at flea markets and second-hand stores.",
    "The textile museum displays fabrics from different cultures and eras.",
    "She is writing her thesis on the impact of social media on fashion trends.",
    "The designer's use of color is always innovative and surprising.",
    "He is a master artisan who creates handmade leather goods.",
    "The fashion startup aims to disrupt the industry with its direct-to-consumer model.",
    "She is known for her effortlessly chic Parisian style.",
    "The trend for micro-bags continues, despite their impracticality.",
    "He is restoring a collection of antique garments for a museum display."
]
new_fashion_labels = [0] * len(new_fashion_sentences)
```

```python
# Economy: 81 additional sentences
new_economy_sentences = [
    "The finance minister presented the annual budget to parliament.",
    "Global markets reacted positively to the trade deal announcement.",
    "Econometric models are used to forecast future economic trends.",
    "The nation's current account surplus widened in the last quarter.",
    "Venture capital funding for tech startups has reached an all-time high.",
    "The World Bank revised its global growth projections downwards.",
    "Monetary policy aims to control inflation and stabilize the currency.",
    "The real estate sector is experiencing a period of rapid expansion.",
    "Fiscal stimulus measures are designed to boost aggregate demand.",
    "The country is heavily reliant on commodity exports for its revenue.",
    "Shareholders are concerned about the company's declining profitability.",
    "The rise of e-commerce has fundamentally changed the retail landscape.",
    "Labor market reforms are intended to increase flexibility and employment.",
    "The International Monetary Fund provided a bailout package to the struggling nation.",
    "Consumer confidence index dropped amid concerns about job security.",
    "The impact of automation on employment is a subject of ongoing debate.",
    "The government is promoting renewable energy to diversify its power sources.",
    "Developing countries are seeking more equitable global trade rules.",
    "The bond market remained stable despite the stock market fluctuations.",
    "Productivity growth has been sluggish in many advanced economies.",
    "The central bank is considering quantitative easing to inject liquidity.",
    "Microfinance institutions play a crucial role in poverty alleviation.",
    "The nation's credit rating was upgraded by a major rating agency.",
    "Supply-side economics emphasizes tax cuts and deregulation.",
    "The merger of the two companies is expected to create significant synergies.",
    "Income inequality has become a more prominent issue in recent years.",
    "The agricultural sector faces challenges due to climate change.",
    "The tourism industry is a major contributor to the country's GDP.",
    "The government is investing heavily in infrastructure development.",
    "Capital flight can destabilize a nation's economy.",
    "The effectiveness of sanctions as a foreign policy tool is debated.",
    "The manufacturing PMI showed a slight improvement last month.",
    "The digital divide can exacerbate existing economic disparities.",
    "The national currency appreciated against the US dollar.",
    "Market liberalization has led to increased competition in several sectors.",
    "The concept of a circular economy is gaining traction worldwide.",
    "The country's foreign exchange reserves are at a comfortable level.",
    "Behavioral economics studies how psychological factors influence decisions.",
    "The trade deficit narrowed due to a surge in exports.",
    "The government implemented new regulations to protect consumers.",
    "The informal economy provides livelihoods for a significant portion of the population.",
    "The impact of demographic shifts on pension systems is a concern.",
    "Austerity measures were implemented to reduce government debt.",
    "The logistics industry is vital for efficient global trade.",
    "The startup ecosystem is thriving with new innovative companies.",
    "The country aims to become a regional hub for financial services.",
    "The value of the national currency is pegged to a basket of foreign currencies.",
    "R&D investment is crucial for long-term economic competitiveness.",
    "The debate over minimum wage levels continues among policymakers.",
    "The sharing economy has created new business models and opportunities.",
    "The government is working to attract more foreign investment.",
    "The price of essential commodities has been rising steadily.",
    "The nation is transitioning towards a knowledge-based economy.",
    "The impact of geopolitical tensions on energy prices is significant.",
    "The financial services sector is undergoing rapid technological disruption.",
    "The government is implementing reforms to improve the ease of doing business.",
    "The country's economic recovery is showing positive signs.",
    "The central bank's independence is crucial for effective monetary policy.",
    "The role of international financial institutions in global governance is often discussed.",
    "The stock exchange launched a new index for green bonds.",
    "The country is rich in natural resources, which form the backbone of its economy.",
    "The effectiveness of development aid is a complex issue.",
    "The technology sector is the fastest-growing part of the economy.",
    "The government is focused on reducing the budget deficit.",
    "The global economic outlook remains uncertain due to various factors.",
    "The concept of universal basic income is being trialed in some regions.",
    "The country has a highly skilled workforce, which is a key economic asset.",
    "The impact of Brexit on the UK economy is still being assessed.",
    "The agricultural subsidies are a contentious issue in trade negotiations.",
    "The corporate tax rate was recently lowered to stimulate investment.",
    "The financial literacy of the population is important for economic stability.",
    "The government is promoting export-oriented industries.",
    "The shadow banking system poses risks to financial stability.",
    "The country is investing in human capital development through education.",
    "The recent economic data suggests a slowdown in growth.",
    "The sovereign wealth fund manages the country's oil revenues.",
    "The principles of free market capitalism drive many economic policies.",
    "The nation is working towards achieving sustainable development goals.",
    "The impact of intellectual property rights on innovation is debated.",
    "The government is trying to diversify its economy away from oil dependence.",
    "The global financial crisis of 2008 had far-reaching consequences."
]
new_economy_labels = [1] * len(new_economy_sentences)
```

```python
# Sport: 81 additional sentences
new_sport_sentences = [
    "The athlete trained relentlessly for years to achieve this level.",
    "The coach's halftime speech motivated the team to play better.",
    "The draft combine allows teams to evaluate prospective players.",
    "She is a pioneer in her sport, breaking barriers for women.",
    "The final match went into overtime, keeping fans on the edge of their seats.",
    "The sports agency represents some of the biggest names in the industry.",
    "He holds the national record for the long jump.",
    "The team's defense was impenetrable throughout the game.",
    "The winter sports season is a highlight for many enthusiasts.",
    "The sports psychologist helped the athlete overcome mental blocks.",
    "The rookie player exceeded all expectations in his first season.",
    "The stadium was packed with cheering fans wearing team colors.",
    "She is a versatile player, capable of playing multiple positions.",
    "The team is focusing on improving their set-piece strategies.",
    "The referee's decision was controversial and sparked debate among fans.",
    "The sports commentator provided insightful analysis during the broadcast.",
    "He is known for his incredible sportsmanship and fair play.",
    "The team's new training facility is equipped with cutting-edge technology.",
    "The youth academy is crucial for developing future talent.",
    "She competed in the gymnastics floor exercise with grace and precision.",
    "The rivalry between the two cities extends to their sports teams.",
    "The team celebrated their victory with a parade through the city.",
    "He is a veteran player who brings experience and leadership to the team.",
    "The sports federation announced new rules to enhance player safety.",
    "The Paralympics showcase the incredible abilities of athletes with disabilities.",
    "She is a dominant force in women's tennis, with multiple Grand Slam titles.",
    "The team's offensive line struggled to protect the quarterback.",
    "The sports drink company is a major sponsor of the event.",
    "He made a spectacular diving catch to save the game.",
    "The team's morale is high after their recent winning streak.",
    "The sports museum features memorabilia from legendary athletes.",
    "She is an advocate for gender equality in sports.",
    "The biathlon combines cross-country skiing and rifle shooting.",
    "The team is working on their conditioning and endurance.",
    "The sports agent negotiated a lucrative contract for his client.",
    "He is recovering from a knee injury and hopes to return next season.",
    "The opening ceremony of the games was a spectacular display.",
    "She is a role model for young aspiring athletes in her community.",
    "The team's mascot is a beloved figure among the fans.",
    "The sports scientist is analyzing player performance data.",
    "He is known for his powerful serves and aggressive playing style.",
    "The team needs to improve their away game performance.",
    "The sports betting industry has grown significantly in recent years.",
    "She is a world champion in martial arts, specializing in Taekwondo.",
    "The team's fan club organizes trips to support them at away games.",
    "The coach is under pressure after a series of losses.",
    "He is a talented young golfer with a promising future.",
    "The sports apparel brand launched a new line endorsed by the athlete.",
    "She participated in the equestrian show jumping event.",
    "The team's success is attributed to their strong teamwork and chemistry.",
    "The sports journalist wrote an in-depth article about the team's history.",
    "He is a skilled archer, consistently hitting the bullseye.",
    "The team is studying video footage of their opponents to prepare.",
    "The sports complex includes a swimming pool, tennis courts, and a gym.",
    "She is a competitive swimmer specializing in the butterfly stroke.",
    "The team's manager made some strategic substitutions during the game.",
    "He is a legendary figure in the world of boxing.",
    "The sports organization is committed to promoting fair play and anti-doping.",
    "She is training for the heptathlon, which involves seven different events.",
    "The team's supporters created an incredible atmosphere in the stadium.",
    "The sports therapist is helping players with injury prevention and rehabilitation.",
    "He is a skilled tactician, known for his clever game plans.",
    "The team is aiming to qualify for the continental championship.",
    "The sports memorabilia market can be very lucrative.",
    "She is a decorated Olympian with multiple medals.",
    "The team's goalkeeper made several crucial saves.",
    "The sports academy provides holistic development for young athletes.",
    "He is a professional skateboarder known for his innovative tricks.",
    "The team is building its strategy around its star player.",
    "The sports event was broadcast live to millions of viewers worldwide.",
    "She is a leading expert in sports nutrition.",
    "The team's colors are red and white.",
    "He is a former champion who now works as a coach.",
    "The sports governing body is investigating alleged rule violations.",
    "She is a talented figure skater, known for her artistic expression.",
    "The team is adapting to the new coach's playing style.",
    "The sports festival aims to promote physical activity among youth.",
    "He is a professional cyclist competing in a grand tour.",
    "The team's performance in the first half was disappointing.",
    "The sports community mourned the passing of a legendary athlete.",
    "She is a world-class sprinter, holding several national records."
]
new_sport_labels = [2] * len(new_sport_sentences)
```

```python
# --- Build the final dataset ---
final_sentences = existing_sentences + new_fashion_sentences + new_economy_sentences + new_sport_sentences
final_labels = existing_labels + new_fashion_labels + new_economy_labels + new_sport_labels

# Check the number of examples per class
fashion_count = final_labels.count(0)
economy_count = final_labels.count(1)
sport_count = final_labels.count(2)

print(f"Total sentences: {len(final_sentences)}")
print(f"Total labels: {len(final_labels)}")
print(f"Fashion sentences: {fashion_count}")
print(f"Economy sentences: {economy_count}")
print(f"Sport sentences: {sport_count}")
```

```
Total sentences: 305
Total labels: 305
Fashion sentences: 103
Economy sentences: 101
Sport sentences: 101
```


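Before training, let's first check how similar the base model already considers these sentences. Comparing the first ten fashion sentences with the next ten gives scores that mostly fall between about 0.2 and 0.5 (a few are as low as 0.04 or as high as 0.63), so the model does not treat them as particularly similar.
After that we will train on this dataset. The loss we use is described in the official documentation for [BatchAllTripletLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchalltripletloss).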

```python
# Keep only the fashion sentences (label 0). A plain final_sentences[:100] slice
# would also pick up economy and sport sentences from the original 60.
fashion = [s for s, label in zip(final_sentences, final_labels) if label == 0]

f1, f2 = fashion[:10], fashion[10:20]

e1 = model.encode(f1)
e2 = model.encode(f2)

model.similarity(e1, e2)
```

```
tensor([[0.2667, 0.2121, 0.2471, 0.2434, 0.2809, 0.1967, 0.2159, 0.2609, 0.1829,
         0.3568],
        [0.3857, 0.4224, 0.2886, 0.3052, 0.4230, 0.2564, 0.2260, 0.4032, 0.3134,
         0.3338],
        [0.2944, 0.3834, 0.6157, 0.2682, 0.3475, 0.2215, 0.5477, 0.4469, 0.4194,
         0.4647],
        [0.3452, 0.4686, 0.2090, 0.3456, 0.4318, 0.3924, 0.2087, 0.4733, 0.3216,
         0.4444],
        [0.0973, 0.1027, 0.0387, 0.1523, 0.1285, 0.0417, 0.0834, 0.2244, 0.2159,
         0.0443],
        [0.3513, 0.4974, 0.2300, 0.2995, 0.4946, 0.6340, 0.2203, 0.5391, 0.4572,
         0.5120],
        [0.2145, 0.3671, 0.3570, 0.1353, 0.2833, 0.2068, 0.1621, 0.2441, 0.3905,
         0.3455],
        [0.2888, 0.3409, 0.2868, 0.2841, 0.3305, 0.1996, 0.3928, 0.4286, 0.2796,
         0.4177],
        [0.2582, 0.3019, 0.3636, 0.2815, 0.3160, 0.3901, 0.3931, 0.3565, 0.4152,
         0.3961],
        [0.2769, 0.4349, 0.2498, 0.3740, 0.4538, 0.3047, 0.1045, 0.4137, 0.3340,
         0.4624]])
```

```python
train_dataset = Dataset.from_dict({
    "sentence": final_sentences,
    "label": final_labels
})
```

```python
loss = losses.BatchAllTripletLoss(model)
```

```python
trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=loss,
)

trainer.train()
```

```
TrainOutput(global_step=117, training_loss=3.695770263671875, metrics={'train_runtime': 4.0464, 'train_samples_per_second': 226.13, 'train_steps_per_second': 28.915, 'total_flos': 0.0, 'train_loss': 3.695770263671875, 'epoch': 3.0})
```
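The `global_step=117` in the output can be reproduced with a little arithmetic. The sketch below assumes the trainer ran with a batch size of 8 on a single device; no training arguments are passed in the code above, so this is an assumption about the defaults in use rather than something the output states directly. The epoch count comes from `'epoch': 3.0`.

```python
import math

num_samples = 305  # "Total sentences" printed earlier
batch_size = 8     # assumption: default per-device batch size, single device
num_epochs = 3     # from 'epoch': 3.0 in TrainOutput

# The last batch of each epoch is smaller, so round up.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch)  # 39
print(total_steps)      # 117
```

Under that assumption each training step sees only 8 sentences, so the set of triplets the loss can build at each step is limited to whatever classes happen to land in that small batch.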

```python
f1, f2 = fashion[:10], fashion[10:20]

e1 = model.encode(f1)
e2 = model.encode(f2)

model.similarity(e1, e2)
```

```
tensor([[0.9869, 0.9859, 0.9908, 0.9708, 0.9898, 0.9675, 0.9813, 0.9875, 0.9889,
         0.9878],
        [0.9959, 0.9961, 0.9909, 0.9859, 0.9970, 0.9633, 0.9880, 0.9965, 0.9945,
         0.9927],
        [0.9943, 0.9935, 0.9947, 0.9863, 0.9943, 0.9634, 0.9914, 0.9950, 0.9931,
         0.9917],
        [0.9955, 0.9942, 0.9910, 0.9877, 0.9959, 0.9715, 0.9870, 0.9967, 0.9938,
         0.9941],
        [0.9923, 0.9921, 0.9838, 0.9745, 0.9934, 0.9690, 0.9801, 0.9928, 0.9933,
         0.9907],
        [0.9937, 0.9964, 0.9862, 0.9795, 0.9971, 0.9756, 0.9815, 0.9955, 0.9971,
         0.9946],
        [0.9889, 0.9900, 0.9934, 0.9795, 0.9915, 0.9661, 0.9840, 0.9905, 0.9910,
         0.9899],
        [0.9933, 0.9947, 0.9877, 0.9854, 0.9953, 0.9617, 0.9894, 0.9950, 0.9929,
         0.9915],
        [0.9935, 0.9937, 0.9902, 0.9795, 0.9955, 0.9804, 0.9832, 0.9952, 0.9964,
         0.9948],
        [0.9958, 0.9948, 0.9890, 0.9831, 0.9962, 0.9737, 0.9835, 0.9968, 0.9954,
         0.9940]])
```