[PaLM Paper Review] PaLM: Scaling Language Modeling with Pathways

코딩무민 2022. 4. 26. 19:27

1. Key Summary

Recent models

Encoder-only and encoder-decoder architectures such as BERT and T5 have achieved strong results on NLP tasks using pre-training objectives such as masked language modeling (MLM) and span corruption.

Limitations of these models

Fine-tuning requires a substantial number of task-specific training examples
Fitting the model to a task requires updating model parameters → adds complexity to model fine-tuning & deployment

GPT-3 Model

Extremely large autoregressive LMs used for few-shot prediction
→ decoder-only Transformer architecture & standard left-to-right LM objective
Performs well without 1) large-scale task-specific data collection or 2) model parameter updates
→ addresses the limitations of BERT and T5
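
To make the "no parameter updates" point concrete, here is a minimal sketch of how a few-shot prompt can be assembled: the solved examples are simply placed in the input text ahead of the query, and the model is asked to continue. The function name `build_few_shot_prompt`, the `Input:`/`Output:` template, and the sentiment examples are all hypothetical and chosen for illustration; they are not taken from the GPT-3 or PaLM papers.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Concatenate k solved examples and one unsolved query into a single prompt.

    The template below is an illustrative assumption, not a format from the paper.
    """
    blocks = []
    if instruction:
        blocks.append(instruction)
    # Each demonstration is shown to the model as plain text; no weights change.
    for text, label in examples:
        blocks.append(f"Input: {text}\nOutput: {label}")
    # The query uses the same template but leaves the answer for the model to fill in.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


examples = [
    ("The movie was wonderful.", "positive"),
    ("I want my money back.", "negative"),
]
prompt = build_few_shot_prompt(
    examples,
    "The plot dragged on forever.",
    instruction="Classify the sentiment.",
)
print(prompt)
```

The resulting string would be passed to an autoregressive LM, which predicts the next tokens left to right. Adapting to a new task then means changing the examples in the prompt rather than fine-tuning and redeploying a model.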

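Post-GPT-3 models (GLaM, Gopher, Chinchilla, Megatron-Turing NLG, LaMDA)

: Like GPT-3, all are based on the Transformer architecture

Four sources of improvement over GPT-3

Scaling model size (depth and width)
Increasing the number of training tokens
Training on cleaner datasets from more diverse sources
Increasing model capacity without increasing compute cost, via sparsely activated modules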

PaLM

: Focuses on "Scaling the size" among the four approaches above

→ Trains a 540-billion-parameter model on 780 billion tokens of high-quality text

→ Made possible by Pathways
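
For a rough sense of what these two numbers imply, the sketch below computes the tokens-per-parameter ratio and a back-of-the-envelope training compute estimate. The 6 × parameters × tokens rule is a commonly used approximation for dense Transformers and is an assumption here, not a figure from this post; the function names are hypothetical.

```python
N_PARAMS = 540e9  # 540 billion parameters (from the summary above)
N_TOKENS = 780e9  # 780 billion training tokens (from the summary above)


def tokens_per_parameter(n_tokens, n_params):
    """How many training tokens the model sees per parameter."""
    return n_tokens / n_params


def approx_training_flops(n_params, n_tokens):
    """Rough training compute for a dense Transformer.

    Assumption: about 6 FLOPs per parameter per token (forward + backward pass).
    This is a rule of thumb, not a measured value.
    """
    return 6.0 * n_params * n_tokens


ratio = tokens_per_parameter(N_TOKENS, N_PARAMS)
flops = approx_training_flops(N_PARAMS, N_TOKENS)
print(f"tokens per parameter: {ratio:.2f}")
print(f"approx. training FLOPs: {flops:.2e}")
```

Under this approximation the estimate lands on the order of 10^24 FLOPs, which gives some intuition for why a system like Pathways is needed to train at this scale.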

2. Paper Link

https://arxiv.org/abs/2204.02311

3. Paper Walkthrough Link

https://coding-moomin.notion.site/PaLM-Scaling-Language-Modeling-with-Pathways-1f078d7e77284728ae7a31a4ca9dbefd
