- Introduction
- Base
  - Attention Is All You Need
  - GPT
  - GPT2
  - GPT3
  - InstructGPT
- Models
  - Llama 3
  - Llama 3 Source Code
  - DeepSeek-V2
  - DeepSeek-Coder-V2
  - DeepSeek V3
  - DeepSeek-R1
  - Qwen 2.5
  - Qwen2.5-Coder
  - Qwen3
  - Gemini 2.5
  - Seed1.5-Thinking
  - Seed-Coder
  - OpenCoder
  - AM-Thinking-v1
- Techniques
  - Byte Pair Encoding (BPE)
  - Normalization
  - RoPE
  - Extending context window of LLMs
  - Multi-Head Latent Attention
  - DeepSeekMoE
  - Parallelisms
  - Flash Attention
  - Flash Attention 2
- Benchmarks
  - HumanEval
  - MBPP
  - EvalPlus
  - LiveCodeBench
  - CRUXEval
  - BigCodeBench
  - SWE-bench
  - General Benchmarks
  - Math & Science Benchmarks
  - Alignment Benchmarks
- Data
  - APPS
  - CodeContests
  - TACO
  - SELF-INSTRUCT
  - Code Alpaca
  - WizardCoder
  - Magicoder
  - MAGPIE
  - Acecoder
  - KodCode
  - Instag
  - Tree-sitter
- SFT
  - SFT with RL2
  - Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
  - LIMA: Less Is More for Alignment
  - Breaking the Attention Trap in Code LLMs
  - OpenCodeReasoning
  - OpenThoughts
  - Not All Correct Answers Are Equal
- RM
  - Constitutional AI: Harmlessness from AI Feedback
  - RLAIF vs. RLHF
  - RLCD
  - West-of-N
  - Efficient Exploration for LLMs
  - DeepSeek-GRM
  - AdaptiveStep
- DPO
  - DPO
- RL
  - PPO
  - GRPO
  - REINFORCE++
  - Understanding R1-Zero-Like Training
  - DAPO
  - VAPO
  - GSPO
  - Skywork Open Reasoner 1
- Reasoning
  - Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  - Let’s Verify Step by Step
  - Training Language Models to Self-Correct via Reinforcement Learning
  - s1: Simple test-time scaling
  - Concise Reasoning via Reinforcement Learning
  - ShorterBetter
- Agent
  - Agents in Software Engineering
  - REACT
  - Reflexion
  - CodeAct
  - AGENTLESS
  - SWE-agent
  - SWE-smith
- References
# Introduction
**Note:** These are notes on LLM (large language model) papers.